...
then apply complicated preprocessing to it.
We hope to reach a speed of **1k~5k images per second**, to keep GPUs busy.
Some things to know before reading:
1. You only need the data loader to be **fast enough, but not faster**.
See [How Fast Do You Actually Need](philosophy/dataflow.html#how-fast-do-you-actually-need) for details.
For smaller datasets (e.g. several GBs of images with lightweight preprocessing),
a simple reader plus some multiprocess runner is usually fast enough.
Therefore you don't have to understand this tutorial in depth, unless you really find your data loader to be the bottleneck.
**Premature optimization is the root of all evil.** Always benchmark and make sure you need optimization before optimizing (see the benchmark sketch below).
2. Having a fast Python generator **alone** may or may not improve your overall training speed.
You need mechanisms to hide the latency of **all** preprocessing stages, as mentioned in the
[InputSource tutorial](extend/input-source.html).
3. Reading the training set and reading the validation set are different.
In training it's OK to reorder, regroup, or even duplicate some datapoints, as long as the
data distribution stays the same.
But in validation we often need the exact set of data, to be able to compute a correct and comparable score.
This will affect how we build the DataFlow (see the sketch below).
4. The actual performance depends not only on the disk, but also on memory (for caching) and CPU (for data processing).
...
...
before performing or asking about any actual optimizations.
The benchmark code for this tutorial can be found in [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ImageNet),
including comparison with a similar pipeline built with `tf.data`.
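Concretely, benchmarking a dataflow and building different training/validation dataflows (points 1 and 3 above) can look like the following minimal sketch; `/path/to/ILSVRC12` is a placeholder path and the numbers are only for illustration:

```python
# A minimal sketch; '/path/to/ILSVRC12' is a placeholder path.
from tensorpack.dataflow import dataset, TestDataSpeed

# Training: reordering/shuffling is fine, because only the data
# distribution needs to stay the same.
df_train = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
# Validation: keep the exact, unshuffled set so the score is
# correct and comparable.
df_val = dataset.ILSVRC12('/path/to/ILSVRC12', 'val', shuffle=False)

# Benchmark before optimizing: print the raw iteration speed.
# If it already reaches your target (e.g. a few thousand images
# per second), no further optimization is needed.
TestDataSpeed(df_train, size=1000).start()
```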
This tutorial could be a bit complicated for people new to system architectures, but you do need these techniques to run fast enough on ImageNet-scale datasets.
## Random Read
### Basic
...
...
## Common Issues on Windows:
1. Windows does not support the IPC protocol of ZMQ. You can only use `MultiProcessRunner`,
`MultiThreadRunner`, and `MultiThreadMapData`. But you cannot use
`MultiProcessRunnerZMQ` or `MultiProcessMapData` (which is an alias of `MultiProcessMapDataZMQ`).
2. Windows needs to pickle your dataflow to run it in multiple processes (see the sketch below).
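To work within these constraints, here is a minimal sketch of a Windows-friendly pipeline; the toy list dataset and the `square` mapper are hypothetical placeholders for your own data and preprocessing:

```python
# A minimal sketch for Windows; the dataset and mapper are toy placeholders.
from tensorpack.dataflow import DataFromList, MapData, MultiProcessRunner

def square(dp):
    # Defined at module level (not as a lambda), so that the whole
    # dataflow remains picklable, as Windows multiprocessing requires.
    return [dp[0] ** 2]

if __name__ == '__main__':  # required for multiprocessing on Windows
    df = DataFromList([[i] for i in range(1000)])
    df = MapData(df, square)
    # No ZMQ here: MultiProcessRunner relies on Python multiprocessing,
    # which works on Windows, unlike MultiProcessRunnerZMQ.
    df = MultiProcessRunner(df, num_prefetch=64, num_proc=2)
    df.reset_state()
    for dp in df:
        pass  # consume datapoints, e.g. feed them to training
```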