Shashank Suhas / seminar-breakout / Commits

Commit 215a4d6d, authored Mar 16, 2018 by Yuxin Wu
Parent: a6936913

update docs

Showing 2 changed files, with 21 additions and 8 deletions:

- docs/tutorial/extend/dataflow.md (+20 −8)
- tensorpack/dataflow/raw.py (+1 −0)

docs/tutorial/extend/dataflow.md
@@ -9,6 +9,17 @@ which you can use if your data format is simple.

In general, you probably need to write a source DataFlow to produce data for your task,
and then compose it with existing modules (e.g. mapping, batching, prefetching, ...).
The easiest way to create a DataFlow to load custom data, is to wrap a custom generator, e.g.:

```python
def my_data_loader():
    while True:
        # load data from somewhere
        yield [my_array, my_label]

dataflow = DataFromGenerator(my_data_loader)
```

To write more complicated DataFlow, you need to inherit the base `DataFlow` class.
Usually, you just need to implement the `get_data()` method which yields a datapoint every time.

```python
class MyDataFlow(DataFlow):
```
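To see the generator-wrapping pattern above end to end, here is a minimal standalone sketch. The `DataFromGenerator` class below is a simplified stand-in, not tensorpack's actual implementation, and the loader yields a finite number of synthetic datapoints instead of looping forever so the example terminates:

```python
class DataFromGenerator:
    """Simplified stand-in for tensorpack's DataFromGenerator:
    wraps a generator function (or a plain iterable) as a dataflow."""
    def __init__(self, gen):
        # accept either a callable returning an iterable, or an iterable
        self._gen = gen if callable(gen) else (lambda: gen)

    def get_data(self):
        # yield datapoints produced by the wrapped generator
        for dp in self._gen():
            yield dp

def my_data_loader():
    # load data from somewhere; here, finite synthetic [array, label] pairs
    for i in range(3):
        yield [i, i % 2]

dataflow = DataFromGenerator(my_data_loader)
print(list(dataflow.get_data()))  # → [[0, 0], [1, 1], [2, 0]]
```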
@@ -25,9 +36,9 @@ Optionally, you can implement the following two methods:

```diff
 * `reset_state()`. It is guaranteed that the actual process which runs a DataFlow will invoke this method before using it.
   So if this DataFlow needs to do something after a `fork()`, you should put it here.
-  The convention is that, `reset_state()` must be called once and usually only once for each DataFlow instance.
+  `reset_state()` must be called once and only once for each DataFlow instance.
-  A typical situation is when your DataFlow uses random number generator (RNG). Then you would need to reset the RNG here.
+  A typical example is when your DataFlow uses random number generator (RNG). Then you would need to reset the RNG here.
   Otherwise, child processes will have the same random seed. The `RNGDataFlow` base class does this for you.
+  You can subclass `RNGDataFlow` to access `self.rng` whose seed has been taken care of.
```
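The `reset_state()` convention described in the hunk above can be sketched in plain Python. The base class below is a hypothetical illustration of the `RNGDataFlow` idea (per-instance `self.rng`, reseeded per process), not tensorpack's actual code:

```python
import os
import random

class RNGDataFlowSketch:
    """Hypothetical sketch of the RNGDataFlow idea: each instance owns
    self.rng, and reset_state() reseeds it so forked child processes
    do not all inherit the same random seed."""
    def __init__(self):
        self.rng = random.Random()

    def reset_state(self):
        # mix the pid with fresh entropy so each process diverges after fork()
        self.rng = random.Random(os.getpid() ^ int.from_bytes(os.urandom(4), "little"))

class RandomNoise(RNGDataFlowSketch):
    def get_data(self):
        while True:
            yield [self.rng.random(), self.rng.random()]

df = RandomNoise()
df.reset_state()          # called once, before the dataflow is used
dp = next(df.get_data())  # a datapoint of two random floats
```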
@@ -37,13 +48,14 @@ module, you can take them as a reference.

```diff
 #### More Data Processing
-You can put any data processing you need in the source DataFlow, or write a new DataFlow for data
+You can put any data processing you need in the source DataFlow you write, or you can write a new DataFlow for data
 processing on top of the source DataFlow, e.g.:
```

```python
class ProcessingDataFlow(DataFlow):
    def __init__(self, ds):
        self.ds = ds

    def get_data(self):
        for datapoint in self.ds.get_data():
            # do something
```
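The `ProcessingDataFlow` snippet in the hunk above is a fragment; a runnable standalone sketch of the same wrapping pattern might look like this (no tensorpack import; the source class and the squaring transform are illustrative):

```python
class SourceDataFlow:
    """Toy source dataflow yielding [value] datapoints."""
    def get_data(self):
        for i in range(4):
            yield [i]

class ProcessingDataFlow:
    """Wraps another dataflow and transforms each datapoint it yields."""
    def __init__(self, ds):
        self.ds = ds

    def get_data(self):
        for datapoint in self.ds.get_data():
            # do something: here, square the value as a stand-in transform
            yield [datapoint[0] ** 2]

df = ProcessingDataFlow(SourceDataFlow())
print(list(df.get_data()))  # → [[0], [1], [4], [9]]
```

Because the wrapper only calls `self.ds.get_data()`, such processing stages compose freely: the `ds` argument can itself be another `ProcessingDataFlow`.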
tensorpack/dataflow/raw.py

@@ -104,6 +104,7 @@ class DataFromGenerator(DataFlow):

```diff
         """
         Args:
             gen: iterable, or a callable that returns an iterable
+            size: deprecated
         """
         if not callable(gen):
             self._gen = lambda: gen
```
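The `if not callable(gen)` branch shown in the raw.py hunk normalizes both accepted input forms to a callable. The helper below is a hypothetical standalone sketch of just that logic, with illustrative names:

```python
def normalize_gen(gen):
    """Mimic the raw.py logic: accept either an iterable or a callable
    returning an iterable, and always return a callable."""
    if not callable(gen):
        return lambda: gen
    return gen

# both forms now support the same call style
from_list = normalize_gen([[1], [2]])
from_func = normalize_gen(lambda: iter([[1], [2]]))
print(list(from_list()), list(from_func()))  # → [[1], [2]] [[1], [2]]
```

One design consequence worth noting: a plain generator object wrapped this way is exhausted after a single pass, whereas a callable can build a fresh iterable on every call.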