Commit 215a4d6d authored by Yuxin Wu

update docs

parent a6936913
......@@ -9,6 +9,17 @@ which you can use if your data format is simple.
In general, you probably need to write a source DataFlow to produce data for your task,
and then compose it with existing modules (e.g. mapping, batching, prefetching, ...).
The easiest way to create a DataFlow that loads custom data is to wrap a custom generator, e.g.:
```python
def my_data_loader():
    while True:
        # load data from somewhere
        yield [my_array, my_label]
dataflow = DataFromGenerator(my_data_loader)
```
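To try the wrapping pattern without installing the library, here is a minimal, self-contained sketch. `GeneratorFlow` is a hypothetical stand-in for `DataFromGenerator`, not its actual implementation; it only mimics the behaviour described in this commit (accepting a callable that returns an iterable, or an iterable directly), and the loader yields placeholder lists instead of real arrays.

```python
import itertools


class GeneratorFlow:
    """Hypothetical stand-in for DataFromGenerator (illustration only)."""

    def __init__(self, gen):
        # Accept either a callable returning an iterable, or an iterable itself.
        self._gen = gen if callable(gen) else (lambda: gen)

    def get_data(self):
        # Yield one datapoint (a list of components) at a time.
        for datapoint in self._gen():
            yield datapoint


def my_data_loader():
    """Placeholder loader: yields [features, label] forever."""
    for i in itertools.count():
        yield [[i, i + 1], i % 2]


dataflow = GeneratorFlow(my_data_loader)
# The loader never terminates, so take a bounded slice of the stream.
first_three = list(itertools.islice(dataflow.get_data(), 3))
```

Because the loader is infinite, a consumer should take a bounded slice (here with `itertools.islice`) rather than call `list()` on the whole stream.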
To write a more complicated DataFlow, you need to inherit from the base `DataFlow` class.
Usually, you just need to implement the `get_data()` method, which yields one datapoint at a time.
```python
class MyDataFlow(DataFlow):
......@@ -25,9 +36,9 @@ Optionally, you can implement the following two methods:
+ `reset_state()`. It is guaranteed that the actual process which runs a DataFlow will invoke this method before using it.
So if this DataFlow needs to do something after a `fork()`, you should put it here.
The convention is that, `reset_state()` must be called once and usually only once for each DataFlow instance.
`reset_state()` must be called once and only once for each DataFlow instance.
A typical situation is when your DataFlow uses random number generator (RNG). Then you would need to reset the RNG here.
A typical example is when your DataFlow uses a random number generator (RNG). In that case you need to reset the RNG here.
Otherwise, child processes will have the same random seed. The `RNGDataFlow` base class does this for you.
You can subclass `RNGDataFlow` to access `self.rng` whose seed has been taken care of.
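The text above says `RNGDataFlow` handles seeding for you. For readers who want to see the idea by hand, the following is a minimal sketch, assuming a standalone class (`RandomFlow`, hypothetical and not part of the library) and the standard-library `random` module. The seeding scheme shown is one plausible choice, not necessarily what `RNGDataFlow` does.

```python
import os
import random


class RandomFlow:
    """Hypothetical DataFlow-like class (illustration only)."""

    def __init__(self):
        # The RNG is deliberately NOT created here: the constructor may run
        # in a parent process before fork().
        self.rng = None

    def reset_state(self):
        # Assumption: mixing the process ID with OS entropy gives each
        # worker process its own seed.
        seed = (os.getpid() << 32) ^ int.from_bytes(os.urandom(4), "little")
        self.rng = random.Random(seed)

    def get_data(self):
        while True:
            yield [self.rng.random()]


flow = RandomFlow()
flow.reset_state()  # called once, in the process that consumes the data
sample = next(flow.get_data())
```

Creating the RNG in `reset_state()` rather than in `__init__()` is the point of the sketch: state built before a `fork()` is copied into every child, so all children would otherwise draw the same sequence.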
......@@ -37,13 +48,14 @@ module, you can take them as a reference.
#### More Data Processing
You can put any data processing you need in the source DataFlow, or write a new DataFlow for data
You can put any data processing you need in the source DataFlow you write, or you can write a new DataFlow for data
processing on top of the source DataFlow, e.g.:
```python
class ProcessingDataFlow(DataFlow):
    def __init__(self, ds):
        self.ds = ds
    def get_data(self):
        for datapoint in self.ds.get_data():
            # do something
......
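As a runnable illustration of stacking a processing DataFlow on top of a source, here is a small sketch. Both classes (`SourceFlow`, `ScaleFlow`) and the `factor` parameter are hypothetical and do not inherit from the library's `DataFlow`, so the example runs on its own; only the `get_data()` composition pattern is taken from the text above.

```python
class SourceFlow:
    """Hypothetical source: yields [value, label] datapoints."""

    def get_data(self):
        for i in range(4):
            yield [i, i % 2]


class ScaleFlow:
    """Hypothetical processing flow that wraps another flow."""

    def __init__(self, ds, factor):
        self.ds = ds
        self.factor = factor

    def get_data(self):
        # Pull each datapoint from the wrapped flow, transform it, re-yield it.
        for value, label in self.ds.get_data():
            yield [value * self.factor, label]


processed = list(ScaleFlow(SourceFlow(), factor=10).get_data())
```

Keeping the transformation in its own class leaves the source reusable: the same `SourceFlow` can feed several different processing steps.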
......@@ -104,6 +104,7 @@ class DataFromGenerator(DataFlow):
"""
Args:
    gen: iterable, or a callable that returns an iterable
    size: deprecated
"""
if not callable(gen):
    self._gen = lambda: gen
......