Commit ccef4d4f authored by Yuxin Wu

update docs

parent 796a4353
@@ -18,7 +18,7 @@ One good thing about having a standard interface is to be able to provide
the greatest code reusability.
There are a lot of existing DataFlow utilities in tensorpack, which you can use to compose
a complex DataFlow with a long data pipeline. A common pipeline usually
would __read from disk (or other sources), apply transformations, group into batches,
prefetch data__, etc. A simple example is the following:
````python
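# A minimal sketch of such a pipeline (MyDataFlow is the custom source reader
# discussed below; the augmentation, batch and prefetch settings are illustrative):
from tensorpack.dataflow import AugmentImageComponent, BatchData, PrefetchData, imgaug

df = MyDataFlow(dir='/my/data', shuffle=True)                # read from disk
df = AugmentImageComponent(df, [imgaug.Resize((225, 225))])  # apply transformations
df = BatchData(df, 128)                                      # group into batches
df = PrefetchData(df, 3)                                     # prefetch data
````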
@@ -35,16 +35,17 @@ You can find more complicated DataFlow in the [ResNet training script](../exampl
with all the data preprocessing.
Unless you are working with standard data types (image folders, LMDB, etc.),
you would usually want to write the source DataFlow (`MyDataFlow` in the above example) for your data format.
See [another tutorial](extend/dataflow.html)
for simple instructions on writing a DataFlow.
Once you have the source reader, all the [existing DataFlows](../modules/dataflow.html) are ready for you to complete
the rest of the data pipeline.
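For illustration, a minimal source DataFlow might look like the sketch below, assuming the `get_data()` interface of this era of the library; `load_my_index` and `read_image` are hypothetical helpers for your own format:

````python
from tensorpack.dataflow import DataFlow

class MyDataFlow(DataFlow):
    """Sketch of a source reader that yields [image, label] datapoints."""
    def __init__(self, dir, shuffle=True):
        self.dir = dir
        self.shuffle = shuffle

    def get_data(self):
        # iterate over your own index and produce one datapoint at a time
        for fname, label in load_my_index(self.dir, shuffle=self.shuffle):
            yield [read_image(fname), label]
````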
### Why DataFlow

1. It's easy: write everything in pure Python, and reuse existing utilities.
On the contrary, writing data loaders in TF operators is usually painful, and performance is hard to tune.
See more discussions in [Python Reader or TF Reader](input-source.html#python-reader-or-tf-reader).
2. It's fast: see [Efficient DataFlow](efficient-dataflow.html)
on how to build a fast DataFlow with parallelism (a minimal sketch follows this list).
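A minimal sketch of that parallelism, assuming the `PrefetchDataZMQ` utility (the process count is illustrative):

````python
from tensorpack.dataflow import PrefetchDataZMQ

# Fork 25 identical copies of the DataFlow in separate processes and
# collect their outputs through ZMQ pipes.
df = PrefetchDataZMQ(df, nr_proc=25)
````

Each worker runs its own copy of `df`, so this pattern fits readers that are already randomized or shuffled.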
If you're using DataFlow with tensorpack, also see the [Input Pipeline tutorial](input-source.html)
@@ -60,9 +60,18 @@ The benefits of using TensorFlow ops are:
and TF `StagingArea` can help hide the "Copy to GPU" latency.
They are used by most examples in tensorpack.
The benefit of using a Python reader is obvious: it's __much much easier__.
Reading data is a much more complicated and much less structured job than training a model.
You need to handle different data formats and corner cases in noisy data,
all of which require logical operations, conditional operations, loops, etc. These operations
are __naturally not suitable__ for a graph computation framework.
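As a concrete illustration of why such logic is natural in Python, a hypothetical reader that handles corrupt or inconsistent records might look like:

````python
import cv2  # one example of a library you are free to use in a Python reader

def read_records(file_list):
    """Hypothetical loader: plain loops and conditionals deal with noisy data."""
    for fname, label in file_list:
        img = cv2.imread(fname)
        if img is None:           # corner case: unreadable or corrupt file
            continue              # skipping it is trivial in Python
        if img.ndim == 2:         # corner case: grayscale instead of color
            img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        yield [img, label]
````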
It only makes sense to use TF to read data if your data is originally very clean and well-formatted.
You may want to write a script to clean your data, but then you're almost writing a Python loader already!

Think about it: it's a waste of time to write a Python script to transform raw data into TFRecords,
and then a TF script to transform TFRecords into tensors.
The intermediate step (TFRecords) doesn't have to exist.
## InputSource