Commit 0c53df74 authored by Yuxin Wu

update docs

parent fb2a1f34
@@ -49,20 +49,20 @@ or other tensorpack examples.
### Parallelize the Pipeline

DataFlow includes **carefully optimized** parallel runners and parallel mappers: `Multi{Thread,Process}{Runner,MapData}`.
Runners execute multiple clones of a dataflow in parallel.
Mappers execute a mapping function in parallel on top of an existing dataflow.
You can find details in the [API docs](../modules/dataflow.html) under the
"parallel" and "parallel_map" sections.
The [Parallel DataFlow tutorial](parallel-dataflow.html) gives a deeper dive
on how to use them to optimize your data pipeline.
### Run the DataFlow

When training with tensorpack, typically it is the `InputSource` interface that runs the DataFlow.
When using DataFlow alone without tensorpack,
you need to call `reset_state()` first to initialize it,
and then use the generator however you like:
@@ -76,6 +76,7 @@ for dp in df:
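The full snippet is elided in this diff. As an illustration only, a minimal sketch of that usage, with a hypothetical `MyDataFlow` class standing in for any real DataFlow (tensorpack's classes follow this same protocol):

```python
class MyDataFlow:
    """A tiny stand-in implementing the DataFlow protocol: reset_state(), then iterate."""

    def reset_state(self):
        # Real DataFlows (re-)initialize RNGs and worker processes here.
        self.initialized = True

    def __iter__(self):
        # Yield datapoints; a datapoint is typically a list of components.
        for i in range(3):
            yield [i, i * i]

df = MyDataFlow()
df.reset_state()                  # must be called once before iterating
datapoints = [dp for dp in df]    # use the generator however you like
print(datapoints)                 # [[0, 0], [1, 1], [2, 4]]
```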
### Why DataFlow?

It's **easy and fast**.
For more discussions, see [Why DataFlow?](/tutorial/philosophy/dataflow.html)

Nevertheless, using DataFlow is not required in tensorpack.
Tensorpack supports data loading with native TF operators / TF datasets as well.
@@ -15,7 +15,7 @@ A tensorpack DataFlow can be parallelized across CPUs in the following two ways:
In this pattern, multiple identical DataFlows run on multiple CPUs,
and put results in a queue.
The master receives the output from the queue.

To use this pattern with multi-processing, you can do:
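The exact snippet is elided in this diff. As a hedged sketch of the pattern itself (identical producers feeding one shared queue, the master consuming), here it is in plain Python, using threads for brevity where tensorpack's `MultiProcessRunnerZMQ` would use separate processes and ZMQ sockets:

```python
import queue
import threading

def dataflow(worker_id):
    # An identical "dataflow" clone; each worker runs its own copy.
    for i in range(5):
        yield (worker_id, i)

q = queue.Queue()

def runner(worker_id):
    # Each parallel runner pushes its datapoints into the shared queue.
    for dp in dataflow(worker_id):
        q.put(dp)

workers = [threading.Thread(target=runner, args=(w,)) for w in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# The master simply drains the queue; ordering across workers is nondeterministic.
results = []
while not q.empty():
    results.append(q.get())
print(len(results))  # 20 datapoints: 4 clones x 5 each
```

Note the consequence the tutorial discusses below: each clone runs the whole dataflow, so randomness and data distribution across clones need care.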
@@ -38,7 +38,7 @@ You can find them at the
### Distribute Tasks to Multiple Workers

In this pattern, the master sends datapoints (the tasks)
to multiple workers.
The workers are responsible for executing a (possibly expensive) mapping
function on the datapoints, and send the results back to the master.
@@ -58,7 +58,7 @@ d2 = MultiProcessMapData(dp, num_proc=20, f)
The main difference between this pattern and the first is:

1. `d1` is not executed in parallel. Only `f` runs in parallel.
   Therefore you don't have to worry about randomness or data distribution shift.
   But you need to make `d1` very efficient (e.g. let it produce small metadata).
2. More communication is required, because it needs to send data to workers.
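As a sketch of this second pattern (the master dispatches datapoints, workers apply the expensive `f`), using the stdlib `concurrent.futures` in place of tensorpack's `MultiProcessMapData`:

```python
from concurrent.futures import ThreadPoolExecutor

def f(dp):
    # Stand-in for an expensive mapping function (e.g. image decoding/augmentation).
    return dp * dp

# d1: a cheap dataflow that only produces small metadata (here, plain ints).
d1 = range(10)

# The master consumes d1 serially and farms each datapoint out to a worker pool;
# only f runs in parallel, exactly as point 1 above describes.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(f, d1))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```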
See its [API documentation](../modules/dataflow.html#tensorpack.dataflow.MultiProcessMapData)
@@ -66,8 +66,8 @@ to learn more details.
## Threads & Processes

Both the above two patterns can be used with
__either multi-threading or multi-processing__, with the following builtin DataFlows:

* [MultiProcessRunnerZMQ](../modules/dataflow.html#tensorpack.dataflow.MultiProcessRunnerZMQ)
  or [MultiProcessRunner](../modules/dataflow.html#tensorpack.dataflow.MultiProcessRunner)
@@ -82,16 +82,17 @@ Using threads and processes have their pros and cons:
1. Threads in Python are limited by the [GIL](https://wiki.python.org/moin/GlobalInterpreterLock).
   Threads in one process cannot interpret Python statements in parallel.
   As a result, multi-threading may not scale well if the workers spend a
   significant amount of time in the Python interpreter.
2. Processes need to pay the overhead of communication with each other.
Though __processes are most commonly used__,
the best choice of the above parallel utilities varies across machines and tasks.
You can even combine threads and processes sometimes.

Note that in tensorpack, all the multiprocessing DataFlows with "ZMQ" in their names create
__zero Python threads__: this is a key implementation detail that makes tensorpack DataFlow
faster than the alternatives in Keras or PyTorch.

For a new task, you often need to do a quick benchmark to choose the best pattern.
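Such a benchmark can be as simple as timing raw iteration speed. A minimal sketch, with a trivial generator standing in for a real dataflow (tensorpack also ships a `TestDataSpeed` helper for this purpose):

```python
import time

def benchmark(df, warmup=10, count=100):
    """Measure the datapoints/second of an iterable dataflow."""
    it = iter(df)
    for _ in range(warmup):            # let worker startup / caches settle
        next(it)
    start = time.perf_counter()
    for _ in range(count):
        next(it)
    elapsed = time.perf_counter() - start
    return count / elapsed

def dummy_dataflow():
    # Stand-in for a real dataflow to benchmark.
    while True:
        yield [0] * 100

speed = benchmark(dummy_dataflow())
print(f"{speed:.0f} datapoints/s")
```

Run the same measurement against each parallelization variant (threads vs. processes, runner vs. mapper) and keep the fastest.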
See the [Performance Tuning Tutorial](performance-tuning.html)
@@ -11,7 +11,7 @@ Note that this article may contain subjective opinions and we're happy to hear d
Your data pipeline **only has to be fast enough**.

In practice, you should always first make sure your data pipeline runs
asynchronously with your training.
The method to do so is different in each training framework,
and in tensorpack this is automatically done by the [InputSource](/tutorial/extend/input-source.html)
@@ -23,8 +23,8 @@ the data pipeline only needs to be as fast as the training.
It only has to be fast enough.

If you have used other data loading libraries, you may doubt
how easy it is to make a data pipeline fast enough with pure Python.
In fact, it is usually not hard with DataFlow, because it's carefully optimized.
For example: if you train a ResNet-50 on ImageNet,
DataFlow is fast enough for you unless you use