Commit 1e9342a5 authored by Yuxin Wu

update docs

parent f6313a07
@@ -7,7 +7,6 @@ Since it is simply a generator interface, you can use the DataFlow in any Python
or your own code as well.

**What we are going to do**: We'll use the ILSVRC12 dataset, which contains 1.28 million images.
The original images (JPEG compressed) are 140G in total.
The average resolution is about 400x350 <sup>[[1]]</sup>.
@@ -37,10 +36,11 @@ Some things to know before reading:
before doing any optimizations.
The benchmark code for this tutorial can be found in [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ImageNet),
including a comparison with a similar pipeline built with `tf.data`.

## Random Read

### Basic
We start from a simple DataFlow:
```python
from tensorpack.dataflow import *
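# The rest of this snippet is collapsed in the diff view. Below is a minimal
# sketch (an assumption, not the original code) of what such a simple DataFlow
# typically looks like with the standard tensorpack.dataflow API; the dataset
# path is a placeholder.
ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
ds = BatchData(ds0, 256, use_list=True)  # use_list: no ndarray conversion yet
TestDataSpeed(ds).start()                # benchmark the raw reading speed
```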
@@ -64,6 +64,8 @@ On a good filesystem you probably can already observe good speed here (e.g. 5 it
because we are doing heavy random read on the filesystem (regardless of whether `shuffle` is True).
Image decoding in `cv2.imread` could also be a bottleneck at this early stage.

### Parallel Prefetch
We will now add the cheapest pre-processing to get an ndarray in the end instead of a list
(because training will need ndarrays eventually):
```eval_rst
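.. code-block:: python

   # The original snippet is collapsed in this diff view. A rough sketch of the
   # idea (an assumption, not the original code): resize so that every
   # datapoint becomes a fixed-shape uint8 ndarray, then batch.
   ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
   ds1 = AugmentImageComponent(ds0, [imgaug.Resize((224, 224))])
   ds = BatchData(ds1, 256)
```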
@@ -85,11 +87,12 @@ Now it's time to add threads or processes:
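```python
# The opening lines of this snippet are collapsed in the diff view; `ds1` here
# is assumed to be the augmented ILSVRC12 reader built in the previous snippet.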
ds = PrefetchDataZMQ(ds1, nr_proc=25)
ds = BatchData(ds, 256)
```
Here we fork 25 processes to run `ds1`, and collect their output through the ZMQ IPC protocol,
which is faster than `multiprocessing.Queue`. You can also apply prefetch after batching, of course.
### Parallel Map
The above DataFlow might be fast, but since it forks the ImageNet reader (`ds0`),
it's **not a good idea to use it for validation** (for reasons mentioned at the top; more details in the [documentation](../modules/dataflow.html#tensorpack.dataflow.PrefetchDataZMQ)).
Alternatively, you can use multi-threaded preprocessing like this:
```eval_rst
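.. code-block:: python

   # The original snippet is collapsed in this diff view. A sketch (an
   # assumption, not the original code) of the thread-based variant using
   # MultiThreadMapData: read file names cheaply, then decode the images in a
   # pool of threads inside the current process.
   import cv2

   ds0 = dataset.ILSVRC12Files('/path/to/ILSVRC12', 'train', shuffle=True)
   ds1 = MultiThreadMapData(
       ds0, nr_thread=25,
       map_func=lambda dp: [cv2.imread(dp[0], cv2.IMREAD_COLOR), dp[1]],
       buffer_size=1000)
   ds = BatchData(ds1, 256)
```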
@@ -138,11 +141,11 @@ Let's summarize what the above dataflow does:
3. Both 1 and 2 happen together in a separate process, and the results are sent back to the main process through ZeroMQ.
4. The main process makes batches, and other tensorpack modules will then take care of how they should go into the graph.

There is also `MultiProcessMapData` for you to use.
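A minimal sketch (an assumption, not from the original page) of how the process-based variant could look, following the same read-then-decode pattern as the threaded snippet above:

```python
import cv2

def decode(dp):
    # dp = [filename, label]; decode the JPEG into an ndarray
    return [cv2.imread(dp[0], cv2.IMREAD_COLOR), dp[1]]

ds0 = dataset.ILSVRC12Files('/path/to/ILSVRC12', 'train', shuffle=True)
ds1 = MultiProcessMapData(ds0, 25, decode)   # 25 worker processes
ds = BatchData(ds1, 256)
```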
## Sequential Read

### Save and Load a Single-File DataFlow
Random read may not be a good idea when the data is not on an SSD.
We can also dump the dataset into one single LMDB file and read it sequentially.
@@ -190,6 +193,8 @@ the added line above maintains a buffer of datapoints and shuffle them once a wh
It will not affect the model as long as the buffer is large enough,
but it can also consume much memory if too large.

### Augmentations & Parallel Prefetch
Then we add the necessary transformations:
```eval_rst
.. code-block:: python
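
   # The original snippet is collapsed in this diff view. A sketch (an
   # assumption, not the original code) of a typical pipeline at this stage:
   # locally shuffle the sequential LMDB stream, decode the stored JPEG bytes,
   # augment, parallelize, then batch. The LMDB path and the augmentor list
   # `lots_of_augmentors` are placeholders.
   import cv2
   import numpy as np

   ds = LMDBData('/path/to/ILSVRC12-train.lmdb', shuffle=False)
   ds = LocallyShuffleData(ds, 50000)        # shuffle within a large buffer
   ds = LMDBDataPoint(ds)                    # deserialize to [jpeg_bytes, label]
   ds = MapDataComponent(
       ds, lambda x: cv2.imdecode(np.frombuffer(x, np.uint8), cv2.IMREAD_COLOR), 0)
   ds = AugmentImageComponent(ds, lots_of_augmentors)
   ds = PrefetchDataZMQ(ds, 25)              # run everything above in 25 processes
   ds = BatchData(ds, 256)
```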
@@ -243,7 +248,7 @@ So DataFlow will not be a serious bottleneck if configured properly.

## Distributed DataFlow
To further scale your DataFlow, you can even run it on multiple machines and collect the results on the
training machine. E.g.:
```python
# Data Machine #1, process 1-20:
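# The rest of this snippet is collapsed in the diff view. A sketch (an
# assumption, not the original code) of the distributed pattern with
# tensorpack's ZMQ helpers; `MyLargeDataFlow` and the addresses are placeholders.
ds = MyLargeDataFlow()
send_dataflow_zmq(ds, 'tcp://training-machine:8877')    # push datapoints out

# Training machine, training process:
# ds = RemoteDataZMQ('tcp://0.0.0.0:8877')              # pull from all senders
```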
@@ -30,7 +30,7 @@ down your training by 10%. Think about how many more copies are made during your
Failure to hide the data preparation latency is the major reason why people
cannot see good GPU utilization. You should __always choose a framework that enables latency hiding.__
However, most other TensorFlow wrappers are designed without latency hiding in mind.
Tensorpack has built-in mechanisms to hide the latency of the above stages.
This is one of the reasons why tensorpack is [faster](https://github.com/tensorpack/benchmarks).
@@ -47,11 +47,12 @@ People often think they should use `tf.data` because it's fast.
* Indeed it's often fast, but not necessarily. With Python you have access to many other fast libraries, which might be unsupported in TF.
* Python may be just fast enough.

Keep in mind: as long as data loading speed can keep up with training, and the latency of all four blocks in the
above figure is hidden, __a faster reader brings no gains to overall throughput__.
For most types of problems, up to the scale of multi-GPU ImageNet training,
Python can offer enough speed if you use a fast library (e.g. `tensorpack.dataflow`).
See the [Efficient DataFlow](/tutorial/efficient-dataflow.html) tutorial on how to build a fast Python reader with `tensorpack.dataflow`.
### TensorFlow Reader: Cons
The disadvantage of the TF reader is obvious and it's huge: it's __too complicated__.

@@ -73,7 +74,7 @@ To support all these features which could've been done with __3 lines of code in
API, or ask [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
(i.e. Python again) to the rescue.

It only makes sense to use TF to read data if your data is originally very clean and well-formatted.
If not, you may feel like writing a script to format your data, but then you're almost writing a Python loader already!

Think about it: it's a waste of time to write a Python script to transform from some format to a TF-friendly format,
@@ -108,3 +109,14 @@ If you need to use TF reading ops directly, either define a `tf.data.Dataset`
and use `TFDatasetInput`, or use `TensorInput`.
Refer to the documentation of these `InputSource` classes for more details.

```eval_rst
.. note:: **InputSource requires tensorpack**

   `tensorpack.dataflow` is a pure Python library for efficient data loading which can be used
   independently, without TensorFlow or tensorpack trainers.
   However, the `InputSource` interface does require tensorpack and cannot be
   used without tensorpack trainers.
   Without tensorpack trainers, you'll have to optimize the copy latency by yourself.
```
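As a quick illustration of this independence, here is a minimal sketch (an assumption, not from the original page) of driving a DataFlow directly from plain Python, e.g. to feed some other framework; `MyDataFlow` and `run_training_step` are placeholders:

```python
df = MyDataFlow()            # any DataFlow built with tensorpack.dataflow
df.reset_state()             # must be called once before iterating
for dp in df.get_data():     # each dp is a list of components, e.g. [image, label]
    run_training_step(dp)    # hand the datapoint to your own training code
```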