Commit 816d04e6 authored by Yuxin Wu

update docs

parent 141ab53c
# DataFlow

DataFlow is a pure-Python library to create iterators for efficient data loading.

### What is DataFlow

**Definition**: A DataFlow instance is an idiomatic Python iterator object that has a `__iter__()` method
which yields `datapoints`, and optionally a `__len__()` method returning the size of the DataFlow.
A datapoint is a **list or dict** of Python objects, which are called the `components` of the datapoint.
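For illustration, here is a minimal sketch of such an iterator written against this definition (the class name and the fake data are made up for the example; subclassing `tensorpack.dataflow.DataFlow` is assumed to be the usual way to provide `__iter__`):

```python
import numpy as np
from tensorpack.dataflow import DataFlow

class FakeImageData(DataFlow):
    """A DataFlow whose datapoints are [image, label] lists of two components."""
    def __init__(self, size=100):
        self._size = size

    def __iter__(self):
        for _ in range(self._size):
            # one datapoint: a list of two components
            yield [np.random.rand(224, 224, 3).astype('float32'), 0]

    def __len__(self):
        # optional: the size of this DataFlow
        return self._size
```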
...
@@ -16,15 +16,20 @@ then apply complicated preprocessing to it.
We hope to reach a speed of **1k~5k images per second**, to keep GPUs busy.

Some things to know before reading:
1. You only need the data loader to be **fast enough, but not faster**.
   See [How Fast Do You Actually Need](philosophy/dataflow.html#how-fast-do-you-actually-need) for details.
   For smaller datasets (e.g. several GBs of images with lightweight preprocessing),
   a simple reader plus some multiprocess runner is usually fast enough.
   Therefore you don't have to understand this tutorial in depth, unless you really find your data loader to be the bottleneck.
   **Premature optimization is the root of all evil.** Always benchmark and make sure you need optimization before optimizing (see the `TestDataSpeed` sketch below).
2. Having a fast Python generator **alone** may or may not improve your overall training speed.
   You need mechanisms to hide the latency of **all** preprocessing stages, as mentioned in the
   [InputSource tutorial](extend/input-source.html).
3. Reading the training set is different from reading the validation set.
   In training it's OK to reorder, regroup, or even duplicate some datapoints, as long as the
   data distribution stays the same.
   But in validation we often need the exact set of data, to be able to compute a correct and comparable score.
   This will affect how we build the DataFlow.
4. The actual performance would depend not only on the disk, but also on memory (for caching) and CPU (for data processing).
...
@@ -33,11 +38,13 @@ Some things to know before reading:
The solutions in this tutorial may not help you.
To improve your own DataFlow, read the
[performance tuning tutorial](performance-tuning.html#investigate-dataflow)
before performing or asking about any actual optimizations.
The benchmark code for this tutorial can be found in [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ImageNet),
including a comparison with a similar pipeline built with `tf.data`.

This tutorial could be a bit complicated for people new to system architectures, but you do need these techniques to run fast enough on ImageNet-scale datasets.
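As a concrete form of the "benchmark first" advice above, any DataFlow can be timed with the `TestDataSpeed` utility that appears later in this tutorial (a minimal sketch; `df` stands for whatever DataFlow you have built):

```python
from tensorpack.dataflow import TestDataSpeed

# Iterate `df` for 1000 datapoints and print the measured speed,
# so you know whether the loader is actually the bottleneck.
TestDataSpeed(df, size=1000).start()
```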
## Random Read

### Basic
...
@@ -275,7 +282,7 @@ TestDataSpeed(df).start()
## Common Issues on Windows:

1. Windows does not support the IPC protocol of ZMQ. You can only use `MultiProcessRunner`,
   `MultiThreadRunner`, and `MultiThreadMapData`. But you cannot use
   `MultiProcessRunnerZMQ` or `MultiProcessMapData` (which is an alias of `MultiProcessMapDataZMQ`).
2. Windows needs to pickle your dataflow to run it in multiple processes (see the sketch below).
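A minimal sketch of what point 2 implies (assumptions: `MultiProcessRunner`'s `(ds, num_prefetch, num_proc)` signature, and that the dataflow class must live at module level so it can be pickled):

```python
from tensorpack.dataflow import DataFlow, MultiProcessRunner

class MyDataFlow(DataFlow):
    # Defined at module level (not inside a function), so Windows can pickle it.
    def __iter__(self):
        for i in range(100):
            yield [i]

if __name__ == '__main__':  # required on Windows, where subprocesses are spawned
    df = MultiProcessRunner(MyDataFlow(), num_prefetch=128, num_proc=4)
    df.reset_state()  # initialize the runner before iterating
    for dp in df:
        pass  # consume datapoints
```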
...
@@ -13,16 +13,25 @@ Basic Tutorials

.. toctree::
  :maxdepth: 1

  trainer
  training-interface
  callback
  symbolic
  save-load
  summary
  inference
  faq

DataFlow Tutorials
========================

.. toctree::
  :maxdepth: 1

  dataflow
  philosophy/dataflow
  extend/dataflow
  efficient-dataflow
Advanced Tutorials
==================

...
@@ -30,18 +39,9 @@ Advanced Tutorials

.. toctree::
  :maxdepth: 1

  extend/input-source
  extend/augmentor
  extend/model
  extend/callback
  extend/trainer
  performance-tuning
...
@@ -143,7 +143,7 @@ it assumes your dataset has a `__len__` and supports `__getitem__`,
which does not work when you have a dynamic/unreliable data source,
or when you need to filter your data on the fly.

`torch.utils.data.DataLoader` is quite good, although it also makes some
**bad assumptions on batching** and is not always efficient.

1. It assumes you always do batch training, have a constant batch size, and
...
@@ -152,6 +152,7 @@ or when you need to filter your data on the fly.
2. Its multiprocessing implementation is efficient on `torch.Tensor`,
   but inefficient for generic data types or numpy arrays.
   Also, its implementation [does not always clean up the subprocesses correctly](https://github.com/pytorch/pytorch/issues/16608).

On the other hand, DataFlow:
...
@@ -4,7 +4,7 @@
Tensorpack contains a small collection of common model primitives,
such as conv/deconv, fc, bn, pooling layers.
However, tensorpack is model-agnostic, which means
**you do not need to use tensorpack's symbolic layers and can skip this tutorial.**

These layers were written only because there were no alternatives when tensorpack was first developed.
Nowadays, many of these implementations actually call `tf.layers` directly.
...
@@ -84,20 +84,20 @@ All models are trained with 8 NVIDIA V100s, unless otherwise noted.
Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be reproduced.

| Backbone | mAP<br/>(box;mask) | Detectron mAP <sup>[1](#ft1)</sup><br/> (box;mask) | Time <br/>(on 8 V100s) | Configurations <br/> (click to expand) |
| - | - | - | - | - |
| R50-C4 | 34.1 | | 7.5h | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[140000,180000,200000]` </details> |
| R50-C4 | 35.6 | 34.8 | 23h | <details><summary>standard</summary>`MODE_MASK=False` </details> |
| R50-FPN | 37.5 | 36.7 | 11h | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
| R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23.5h | <details><summary>standard</summary>this is the default </details> |
| R50-FPN | 38.2;34.8 | 37.7;33.9 | 13.5h | <details><summary>standard</summary>`MODE_FPN=True` </details> |
| R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 25h | <details><summary>2x</summary>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=[240000,320000,360000]` </details> |
| R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 31h | <details><summary>2x+GN</summary>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` </details> |
| R50-FPN | 41.7;36.2 | | 17h | <details><summary>+Cascade</summary>`MODE_FPN=True FPN.CASCADE=True` </details> |
| R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 28h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 18h | <details><summary>standard</summary>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] <sup>[2](#ft2)</sup> | | 69h | <details><summary>3x+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=[420000,500000,540000]` </details> |
| R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] <sup>[3](#ft3)</sup> | 47.4;40.5 | 28h (on 64 V100s) | <details><summary>9x+GN+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=[1500000,1580000,1620000]`<br/>`BACKBONE.FREEZE_AT=0`</details> |
[R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz
[R50FPN2x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50FPN2x.npz
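For context on the "Configurations" column: these are config overrides for the FasterRCNN example, and (as an assumption based on the example's usual invocation, with a hypothetical COCO path) they would be passed to its `train.py` on the command line, e.g.:

```
./train.py --config MODE_MASK=False MODE_FPN=True DATA.BASEDIR=/path/to/COCO/DIR
```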
...