Commit 8e2428d9 authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent d43c8a28
@@ -7,7 +7,7 @@ DataFlow is a library to build Python iterators for efficient data loading.
**Definition**: A DataFlow is something that has a `get_data()` generator method,
which yields `datapoints`.
A datapoint is a **list** of Python objects which are called the `components` of a datapoint.
**Example**: to train on the MNIST dataset, you may need a DataFlow with a `get_data()` method
that yields datapoints (lists) of two components:
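The two-component datapoint described above can be sketched with a toy stand-in. Note that `FakeMnist` is a hypothetical class for illustration only, not tensorpack's real MNIST loader:

```python
import random

# Hypothetical stand-in for an MNIST DataFlow (not tensorpack's real class).
# get_data() yields datapoints, each a list of two components:
# a 28x28 "image" (nested lists of floats) and an integer label in [0, 9].
class FakeMnist:
    def __init__(self, size):
        self.size = size

    def get_data(self):
        rng = random.Random(0)
        for _ in range(self.size):
            image = [[rng.random() for _ in range(28)] for _ in range(28)]
            label = rng.randint(0, 9)
            yield [image, label]

df = FakeMnist(size=2)
for dp in df.get_data():
    assert len(dp) == 2       # a datapoint is a list of two components
    assert len(dp[0]) == 28   # component 1: 28x28 image
```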
@@ -46,7 +46,7 @@ the rest of the data pipeline.
1. It's easy: write everything in pure Python, and reuse existing utilities.
On the contrary, writing data loaders in TF operators is usually painful, and performance is hard to tune.
2. It's fast: see [Efficient DataFlow](efficient-dataflow.html)
on how to build a fast DataFlow with parallelism.
If you're using DataFlow with tensorpack, also see [Input Pipeline tutorial](input-source.html)
on how tensorpack further accelerates data loading in the graph.
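The parallelism mentioned above boils down to hiding data-preparation latency behind computation. A toy illustration of the idea, not tensorpack's implementation, is a background thread that fills a bounded queue from a (possibly slow) generator while the consumer iterates:

```python
import queue
import threading

# Toy sketch of parallel prefetching / latency hiding -- NOT tensorpack's
# implementation. A background thread pulls datapoints from a generator
# into a bounded queue, so the consumer rarely blocks on data preparation.
def prefetch(gen, buffer_size=16):
    q = queue.Queue(maxsize=buffer_size)
    _end = object()  # sentinel marking exhaustion of the source generator

    def worker():
        for dp in gen:
            q.put(dp)       # blocks when the buffer is full (backpressure)
        q.put(_end)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        dp = q.get()
        if dp is _end:
            return
        yield dp

slow_source = (i * i for i in range(5))
print(list(prefetch(slow_source)))  # -> [0, 1, 4, 9, 16]
```

Tensorpack's own parallel DataFlow wrappers use processes rather than a single thread, which avoids the GIL for CPU-heavy preprocessing; the queue-based structure is the same idea.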
@@ -55,7 +55,7 @@ Nevertheless, tensorpack supports data loading with native TF operators / TF data
### Use DataFlow (outside Tensorpack)
Existing tensorpack trainers work with DataFlow out-of-the-box.
If you use DataFlow in some custom code, call `reset_state()` first to initialize it,
and then use the generator however you like:
```python
df = SomeDataFlow()
...
```
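The snippet above is truncated; a self-contained sketch of the same pattern, using a hypothetical `RandomFlow` stand-in rather than a real tensorpack class, looks like this:

```python
import random

# Hypothetical stand-in for `SomeDataFlow` (not a tensorpack class).
# reset_state() initializes per-process state such as RNG seeds; outside
# a tensorpack trainer it must be called once before using get_data().
class RandomFlow:
    def reset_state(self):
        self.rng = random.Random(42)

    def get_data(self):
        while True:
            yield [self.rng.randint(0, 9)]  # each datapoint: one component

df = RandomFlow()
df.reset_state()            # initialize before iterating
gen = df.get_data()
batch = [next(gen) for _ in range(4)]
assert all(len(dp) == 1 for dp in batch)
```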
## Write a Trainer
**These contents are subject to change in later versions.**
The existing trainers should be enough for single-cost optimization tasks.
If you want to do something different during training, first consider writing it as a callback,
or write an issue to see if there is a better solution than creating new trainers.
...
@@ -3,7 +3,7 @@
This tutorial contains some general discussions on the topic of
"how to read data efficiently to work with TensorFlow",
and how tensorpack supports these methods.
You don't have to read it because these are details under the tensorpack interface,
but knowing them could help you understand the efficiency and choose the best input pipeline for your task.
@@ -31,7 +31,7 @@ down your training by 10%. Think about how many more copies are made during your
Failure to hide the data preparation latency is the major reason why people
cannot see good GPU utilization. __Always choose a framework that allows latency hiding.__
However, most other TensorFlow wrappers are designed to be `feed_dict` based.
This is the major reason why tensorpack is [faster](https://github.com/tensorpack/benchmarks).

## Python Reader or TF Reader?
@@ -66,7 +66,7 @@ handle corner cases in noisy data, preprocess, etc.

## InputSource
`InputSource` is an abstract interface in tensorpack, to describe where the inputs come from and how they enter the graph.
For example, they can:
1. Come from a DataFlow and be fed to the graph.
...