Commit 1e9342a5 authored by Yuxin Wu

update docs

parent f6313a07
@@ -7,7 +7,6 @@ Since it is simply a generator interface, you can use the DataFlow in any Python
or your own code as well.

**What we are going to do**: We'll use the ILSVRC12 dataset, which contains 1.28 million images.
The original images (JPEG compressed) are 140G in total.
The average resolution is about 400x350 <sup>[[1]]</sup>.
@@ -37,10 +36,11 @@ Some things to know before reading:
before doing any optimizations.
The benchmark code for this tutorial can be found in [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ImageNet),
including a comparison with a similar pipeline built with `tf.data`.

## Random Read

### Basic
We start from a simple DataFlow:
```python
from tensorpack.dataflow import *
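# The rest of this snippet is collapsed in the diff view. Below is a minimal
# sketch (an assumption, not the original code) of what such a simple DataFlow
# typically looks like with the standard tensorpack.dataflow API; the dataset
# path is a placeholder.
ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
ds = BatchData(ds0, 256, use_list=True)  # use_list: no ndarray conversion yet
TestDataSpeed(ds).start()                # benchmark the raw reading speed
```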
@@ -64,6 +64,8 @@ On a good filesystem you probably can already observe good speed here (e.g. 5 it
because we are doing heavy random read on the filesystem (regardless of whether `shuffle` is True).
Image decoding in `cv2.imread` could also be a bottleneck at this early stage.

### Parallel Prefetch
We will now add the cheapest pre-processing to get an ndarray in the end instead of a list
(because training will need ndarrays eventually):
```eval_rst
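.. code-block:: python

   # The original snippet is collapsed in this diff view. A rough sketch of the
   # idea (an assumption, not the original code): resize so that every
   # datapoint becomes a fixed-shape uint8 ndarray, then batch.
   ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
   ds1 = AugmentImageComponent(ds0, [imgaug.Resize((224, 224))])
   ds = BatchData(ds1, 256)
```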
@@ -85,11 +87,12 @@ Now it's time to add threads or processes:
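```python
# The opening lines of this snippet are collapsed in the diff view; `ds1` here
# is assumed to be the augmented ILSVRC12 reader built in the previous snippet.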
ds = PrefetchDataZMQ(ds1, nr_proc=25)
ds = BatchData(ds, 256)
```
Here we fork 25 processes to run `ds1`, and collect their output through the ZMQ IPC protocol,
which is faster than `multiprocessing.Queue`. You can also apply prefetch after batching, of course.
### Parallel Map
The above DataFlow might be fast, but since it forks the ImageNet reader (`ds0`),
it's **not a good idea to use it for validation** (for reasons mentioned at the top; more details in the [documentation](../modules/dataflow.html#tensorpack.dataflow.PrefetchDataZMQ)).
Alternatively, you can use multi-threaded preprocessing like this:
```eval_rst
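.. code-block:: python

   # The original snippet is collapsed in this diff view. A sketch (an
   # assumption, not the original code) of the thread-based variant using
   # MultiThreadMapData: read file names cheaply, then decode the images in a
   # pool of threads inside the current process.
   import cv2

   ds0 = dataset.ILSVRC12Files('/path/to/ILSVRC12', 'train', shuffle=True)
   ds1 = MultiThreadMapData(
       ds0, nr_thread=25,
       map_func=lambda dp: [cv2.imread(dp[0], cv2.IMREAD_COLOR), dp[1]],
       buffer_size=1000)
   ds = BatchData(ds1, 256)
```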
@@ -138,11 +141,11 @@ Let's summarize what the above dataflow does:
3. Both 1 and 2 happen together in a separate process, and the results are sent back to the main process through ZeroMQ.
4. The main process makes batches, and other tensorpack modules will then take care of how they should go into the graph.

There is also `MultiProcessMapData` for you to use.
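A minimal sketch (an assumption, not from the original page) of how the process-based variant could look, following the same read-then-decode pattern as the threaded snippet above:

```python
import cv2

def decode(dp):
    # dp = [filename, label]; decode the JPEG into an ndarray
    return [cv2.imread(dp[0], cv2.IMREAD_COLOR), dp[1]]

ds0 = dataset.ILSVRC12Files('/path/to/ILSVRC12', 'train', shuffle=True)
ds1 = MultiProcessMapData(ds0, 25, decode)   # 25 worker processes
ds = BatchData(ds1, 256)
```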
## Sequential Read

### Save and Load a Single-File DataFlow
Random read may not be a good idea when the data is not on an SSD.
We can also dump the dataset into one single LMDB file and read it sequentially.
@@ -190,6 +193,8 @@ the added line above maintains a buffer of datapoints and shuffle them once a wh
It will not affect the model as long as the buffer is large enough,
but it can also consume much memory if too large.

### Augmentations & Parallel Prefetch
Then we add the necessary transformations:
```eval_rst
.. code-block:: python
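
   # The original snippet is collapsed in this diff view. A sketch (an
   # assumption, not the original code) of a typical pipeline at this stage:
   # locally shuffle the sequential LMDB stream, decode the stored JPEG bytes,
   # augment, parallelize, then batch. The LMDB path and the augmentor list
   # `lots_of_augmentors` are placeholders.
   import cv2
   import numpy as np

   ds = LMDBData('/path/to/ILSVRC12-train.lmdb', shuffle=False)
   ds = LocallyShuffleData(ds, 50000)        # shuffle within a large buffer
   ds = LMDBDataPoint(ds)                    # deserialize to [jpeg_bytes, label]
   ds = MapDataComponent(
       ds, lambda x: cv2.imdecode(np.frombuffer(x, np.uint8), cv2.IMREAD_COLOR), 0)
   ds = AugmentImageComponent(ds, lots_of_augmentors)
   ds = PrefetchDataZMQ(ds, 25)              # run everything above in 25 processes
   ds = BatchData(ds, 256)
```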
@@ -243,7 +248,7 @@ So DataFlow will not be a serious bottleneck if configured properly.

## Distributed DataFlow
To further scale your DataFlow, you can even run it on multiple machines and collect the results on the
training machine. E.g.:
```python
# Data Machine #1, process 1-20:
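# The rest of this snippet is collapsed in the diff view. A sketch (an
# assumption, not the original code) of the distributed pattern with
# tensorpack's ZMQ helpers; `MyLargeDataFlow` and the addresses are placeholders.
ds = MyLargeDataFlow()
send_dataflow_zmq(ds, 'tcp://training-machine:8877')    # push datapoints out

# Training machine, training process:
# ds = RemoteDataZMQ('tcp://0.0.0.0:8877')              # pull from all senders
```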
@@ -30,7 +30,7 @@ down your training by 10%. Think about how many more copies are made during your
Failure to hide the data preparation latency is the major reason why people
cannot see good GPU utilization. You should __always choose a framework that enables latency hiding.__
However, most other TensorFlow wrappers are designed without latency hiding in mind.
Tensorpack has built-in mechanisms to hide the latency of the above stages.
This is one of the reasons why tensorpack is [faster](https://github.com/tensorpack/benchmarks).
@@ -47,11 +47,12 @@ People often think they should use `tf.data` because it's fast.
* Indeed it's often fast, but not necessarily. With Python you have access to many other fast libraries, which might be unsupported in TF.
* Python may be just fast enough.

Keep in mind: as long as data loading speed can keep up with training, and the latency of all four blocks in the
above figure is hidden, __a faster reader brings no gains to overall throughput__.
For most types of problems, up to the scale of multi-GPU ImageNet training,
Python can offer enough speed if you use a fast library (e.g. `tensorpack.dataflow`).
See the [Efficient DataFlow](/tutorial/efficient-dataflow.html) tutorial on how to build a fast Python reader with `tensorpack.dataflow`.
### TensorFlow Reader: Cons
The disadvantage of the TF reader is obvious and it's huge: it's __too complicated__.

@@ -73,7 +74,7 @@ To support all these features which could've been done with __3 lines of code in
API, or ask [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
(i.e. Python again) to the rescue.

It only makes sense to use TF to read data if your data is originally very clean and well-formatted.
If not, you may feel like writing a script to format your data, but then you're almost writing a Python loader already!

Think about it: it's a waste of time to write a Python script to transform from some format to a TF-friendly format,
@@ -108,3 +109,14 @@ If you need to use TF reading ops directly, either define a `tf.data.Dataset`
and use `TFDatasetInput`, or use `TensorInput`.
Refer to the documentation of these `InputSource` classes for more details.

```eval_rst
.. note:: **InputSource requires tensorpack**

   `tensorpack.dataflow` is a pure Python library for efficient data loading which can be used
   independently, without TensorFlow or tensorpack trainers.
   However, the `InputSource` interface does require tensorpack and cannot be
   used without tensorpack trainers.
   Without tensorpack trainers, you'll have to optimize the copy latency by yourself.
```
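As a quick illustration of this independence, here is a minimal sketch (an assumption, not from the original page) of driving a DataFlow directly from plain Python, e.g. to feed some other framework; `MyDataFlow` and `run_training_step` are placeholders:

```python
df = MyDataFlow()            # any DataFlow built with tensorpack.dataflow
df.reset_state()             # must be called once before iterating
for dp in df.get_data():     # each dp is a list of components, e.g. [image, label]
    run_training_step(dp)    # hand the datapoint to your own training code
```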