Commit af61ebbc authored by Yuxin Wu

update docs

parent fbbd435a
@@ -44,7 +44,9 @@ It's Yet Another TF wrapper, but different in:
    + Data-parallel distributed training is off-the-shelf to use. It is as slow as Google's official benchmark.
 3. Focus on __large datasets__.
-   + It's painful to read/preprocess data through TF. tensorpack helps you load large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
+   + It's painful to read/preprocess data through TF.
+     tensorpack helps you load large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
+     It also naturally works with TF Queues or tf.data.
 4. Interface of extensible __Callbacks__.
    Write a callback to implement everything you want to do apart from the training iterations, and
......
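For the "large datasets" point in the hunk above, such a pure-Python pipeline might look like the sketch below, assuming the `dataset.ILSVRC12` loader bundled with tensorpack; the path and process count are placeholders:

```python
from tensorpack.dataflow import dataset, BatchData, PrefetchDataZMQ

# Read ImageNet filenames/labels and decode JPEGs in pure Python
df = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
# Autoparallelize: run the pipeline in 25 worker processes, connected over ZMQ
df = PrefetchDataZMQ(df, 25)
# Stack datapoints into batches of 256
df = BatchData(df, 256)
```

The resulting `df` can then be fed to TF Queues or `tf.data`, as the hunk notes.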
@@ -50,7 +50,7 @@ the rest of the data pipeline.
 If you're using DataFlow with tensorpack, also see [Input Pipeline tutorial](input-source.html)
 on how tensorpack further accelerates data loading in the graph.
-Nevertheless, tensorpack support data loading with native TF operators as well.
+Nevertheless, tensorpack supports data loading with native TF operators / TF datasets as well.

 ### Use DataFlow outside Tensorpack
 DataFlow is __independent__ of both tensorpack and TensorFlow.
......
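Because DataFlow is independent of both, using it outside tensorpack is plain Python iteration. A minimal sketch, assuming the `reset_state()`/`get_data()` interface of this version and the bundled MNIST loader:

```python
from tensorpack.dataflow import dataset

df = dataset.Mnist('train')  # any DataFlow; nothing here touches TensorFlow
df.reset_state()             # initialize state (e.g. RNG) once before iterating
for dp in df.get_data():     # dp is a list of components, here [image, label]
    img, label = dp
```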
@@ -72,8 +72,9 @@ For example,
 1. Come from a DataFlow and be fed to the graph.
 2. Come from a DataFlow and be prefetched on CPU by a TF queue.
 3. Come from a DataFlow, prefetched on CPU by a TF queue, then prefetched on GPU by a TF StagingArea.
-4. Come from some TF native reading pipeline.
-5. Come from some ZMQ pipe, where the load/preprocessing may happen on a different machine.
+4. Come from a DataFlow, and be further processed by `tf.data.Dataset`.
+5. Come from some TF native reading pipeline.
+6. Come from some ZMQ pipe, where the load/preprocessing may happen on a different machine.

 When you set `TrainConfig(dataflow=)`, tensorpack trainers automatically add proper prefetching for you.
 You can also use the `TrainConfig(data=)` option to use a customized `InputSource`.
......
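A sketch of those two options, where `df` stands in for a real DataFlow and `model` for a `ModelDesc` (both placeholders):

```python
from tensorpack import TrainConfig, QueueInput

# Option 1: hand the trainer a DataFlow; prefetching is added automatically.
config = TrainConfig(dataflow=df, model=model)

# Option 2: choose the InputSource yourself, e.g. prefetch through a TF queue.
# (QueueInput may need to be imported from tensorpack.train depending on version.)
config = TrainConfig(data=QueueInput(df), model=model)
```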
@@ -60,10 +60,10 @@ class TestDataSpeed(ProxyDataFlow):
 class BatchData(ProxyDataFlow):
     """
-    Concat datapoints into batches.
+    Stack datapoints into batches.
     It produces datapoints of the same number of components as ``ds``, but
     each component has one new extra dimension of size ``batch_size``.
-    A batch can be either a list of original components, or (by default)
+    The batch can be either a list of original components, or (by default)
     a numpy array of original components.
     """
@@ -71,15 +71,14 @@ class BatchData(ProxyDataFlow):
         """
         Args:
             ds (DataFlow): When ``use_list=False``, the components of ``ds``
-                must be either scalars or :class:`np.ndarray`, and
-                components has to have consistent shape across ``ds``.
+                must be either scalars or :class:`np.ndarray`, and have to be consistent in shapes.
             batch_size(int): batch size
             remainder (bool): When the remaining datapoints in ``ds`` are not
                 enough to form a batch, whether or not to also produce the remaining
                 data as a smaller batch.
-                If set to False, all generated datapoints are guranteed to have the same batch size.
+                If set to False, all produced datapoints are guaranteed to have the same batch size.
             use_list (bool): if True, each component will contain a list
-                of datapoints instead of an numpy array of datapoints. This also avoids an extra copy.
+                of datapoints instead of a numpy array with an extra dimension.
         """
         super(BatchData, self).__init__(ds)
         if not remainder:
......
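As a concrete illustration of the docstring above, a usage sketch with MNIST (all images are 28x28, so component shapes are consistent as required):

```python
from tensorpack.dataflow import dataset, BatchData

df = dataset.Mnist('train')              # yields [image, label] datapoints
df = BatchData(df, 128, remainder=True)  # also emit the final, smaller batch
# Each datapoint is now [images, labels]: numpy arrays with a new leading
# batch axis of size 128 (the last batch may be smaller, since remainder=True).
```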