update docs

Commit 72674731 authored Jul 24, 2018 by Yuxin Wu
6 changed files
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -3,7 +3,7 @@ An issue has to be one of the following:
 - [ ] Feature Requests
 - [ ] Questions on Using/Understanding Tensorpack
-For any unexpected problems, __PLEASE ALWAYS INCLUDE__:
+## For any unexpected problems, __PLEASE ALWAYS INCLUDE__:
 1. What you did:
  + If you're using examples:
    + What's the command you run:
@@ -21,18 +21,17 @@ For any unexpected problems, __PLEASE ALWAYS INCLUDE__:
      You can install Tensorpack master by `pip install -U git+https://github.com/ppwwyyxx/tensorpack.git`.:
  + Hardware information, if relevant.
-For efficiency issues, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html
+About efficiency issues, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html
-Feature Requests:
+## Feature Requests:
 + You can implement a lot of features by extending Tensorpack
  (See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
  It does not have to be added to Tensorpack unless you have a good reason.
 + "Could you improve/implement an example/paper ?" 
-  -- the answer is: we don't know, and we don't take feature requests for
+  -- the answer is: we have no plans to do so and we don't take feature requests for
-  examples. You should do it yourself with Tensorpack. If you don't know how to
+  examples. If you don't know how to do it, you may ask a usage question.
-  do it, you may ask a usage question.
-Usage Questions:
+## Usage Questions:
 + Read the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials) first.
 + We answer "HOW to do X with Tensorpack" for a well-defined X.

--- a/examples/DoReFa-Net/README.md
+++ b/examples/DoReFa-Net/README.md
@@ -13,14 +13,14 @@ Alternative link to this page: [http://dorefa.net](http://dorefa.net)
 This is a good set of baselines for research in model quantization.
 These quantization techniques, when applied on AlexNet, achieves the following ImageNet performance in this implementation:
-| Model                              | Bit Width <br/> (weights, activations, gradients) | Top 1 Validation Error <sup>[1](#ft1)</sup>|
+| Model                              | Bit Width <br/> (weights, activations, gradients) | Top 1 Validation Error <sup>[1](#ft1)</sup>                                   |
-|:----------------------------------:|:-------------------------------------------------:|:----------------------:|
+|:----------------------------------:|:-------------------------------------------------:|:-----------------------------------------------------------------------------:|
 | Full Precision<sup>[2](#ft2)</sup> | 32,32,32                                          | 40.3%                                                                         |
 | TTQ                                | t,32,32                                           | 42.0%                                                                         |
 | BWN                                | 1,32,32                                           | 44.6%                                                                         |
 | BNN                                | 1,1,32                                            | 51.9%                                                                         |
 | DoReFa                             | 1,2,32                                            | 46.6%                                                                         |
-| DoReFa                             | 1,2,6                                             | 46.8%                  |
+| DoReFa                             | 1,2,6                                             | 46.8% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/alexnet-126.npz) |
 | DoReFa                             | 1,2,4                                             | 54.0%                                                                         |
 <a id="ft1">1</a>: These numbers were obtained by training on 8 GPUs with a total batch size of 256.

--- a/examples/FasterRCNN/README.md
+++ b/examples/FasterRCNN/README.md
@@ -11,10 +11,11 @@ with the support of:
 + [Group Normalization](https://arxiv.org/abs/1803.08494)
 ## Dependencies
-+ Python 3; TensorFlow >= 1.6 (1.4 or 1.5 can run but may crash due to a TF bug);
++ Python 3; OpenCV.
-+ [pycocotools](https://github.com/cocodataset/cocoapi/tree/master/PythonAPI/), OpenCV.
++ TensorFlow >= 1.6 (1.4 or 1.5 can run but may crash due to a TF bug);
++ pycocotools: `pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'`
 + Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/FasterRCNN/)
-  from tensorpack model zoo. Use the models with "-AlignPadding".
+  from tensorpack model zoo.
 + COCO data. It needs to have the following directory structure:
 ```
 COCO/DIR/
@@ -57,7 +58,7 @@ To predict on an image (and show output in a window):
 ```
 Evaluate the performance of a model on COCO.
-(Several trained models can be downloaded in [model zoo](http://models.tensorpack.com/FasterRCNN):
+(Several trained models can be downloaded in [model zoo](http://models.tensorpack.com/FasterRCNN)):
 ```
 ./train.py --evaluate output.json --load /path/to/COCO-R50C4-MaskRCNN-Standard.npz \
    --config MODE_MASK=True DATA.BASEDIR=/path/to/COCO/DIR
@@ -69,19 +70,19 @@ Evaluation or prediction will need the same `--config` used during training.
 These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
 MaskRCNN results contain both box and mask mAP.
- | Backbone | mAP<br/>(box;mask) | Detectron mAP <br/> (box;mask) | Time           | Configurations <br/> (click to expand)                                                                                                                                                           |
+ | Backbone | mAP<br/>(box;mask)                                                                                    | Detectron mAP <br/> (box;mask) | Time on 8 V100s | Configurations <br/> (click to expand)                                                                                                                                                                         |
 | -        | -                                                                                                     | -                              | -               | -                                                                                                                                                                                                              |
- | R50-C4   | 33.1               |                                | 18h on 8 V100s | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
+ | R50-C4   | 33.1                                                                                                  |                                | 18h             | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details>               |
- | R50-C4   | 36.6               | 36.5                           | 44h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False` </details>                                                                                                                                 |
+ | R50-C4   | 36.6                                                                                                  | 36.5                           | 44h             | <details><summary>standard</summary>`MODE_MASK=False` </details>                                                                                                                                               |
- | R50-FPN  | 37.5               | 37.9<sup>[1](#ft1)</sup>       | 28h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details>                                                                                                                   |
+ | R50-FPN  | 37.4                                                                                                  | 37.9<sup>[1](#ft1)</sup>       | 30h             | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details>                                                                                                                                 |
- | R50-C4   | 36.8;32.1          |                                | 39h on 8 P100s | <details><summary>quick</summary>`MODE_MASK=True FRCNN.BATCH_PER_IM=256`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details>                                                               |
+ | R50-C4   | 37.8;33.1 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50C4-MaskRCNN-Standard.npz)    | 37.8;32.8                      | 49h             | <details><summary>standard</summary>`MODE_MASK=True` </details>                                                                                                                                                |
- | R50-C4   | 37.8;33.1          | 37.8;32.8                      | 49h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True` </details>                                                                                                                                  |
+ | R50-FPN  | 38.2;34.9 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50FPN-MaskRCNN-Standard.npz)   | 38.6;34.5<sup>[1](#ft1)</sup>  | 32h             | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details>                                                                                                                                  |
- | R50-FPN  | 38.2;34.9          | 38.6;34.5<sup>[1](#ft1)</sup>  | 32h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details>                                                                                                                    |
+ | R50-FPN  | 38.5;34.8                                                                                             | 38.6;34.2<sup>[2](#ft2)</sup>  | 34h             | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details>                                                                        |
- | R50-FPN  | 38.5;34.8          | 38.6;34.2<sup>[2](#ft2)</sup>  | 34h on 8 V100s | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details>                                                          |
+ | R50-FPN  | 39.5;35.2                                                                                             | 39.5;34.4<sup>[2](#ft2)</sup>  | 34h             | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details>                                                                   |
- | R50-FPN  | 39.5;35.2          | 39.5;34.4<sup>[2](#ft2)</sup>  | 34h on 8 V100s | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details>                                                          |
+ | R50-FPN  | 40.0;36.1 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50FPN-MaskRCNN-StandardGN.npz) | 40.3;35.7                      | 44h             | <details><summary>standard+GN</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head |
- | R101-C4  | 40.8;35.1          |                                | 63h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details>                                                                                      |
+ | R101-C4  | 40.8;35.1 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101C4-MaskRCNN-Standard.npz)   |                                | 63h             | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details>                                                                                                    |
- <a id="ft1">1</a>: Slightly different configurations.
+ <a id="ft1">1</a>: This implementation has slightly different configurations from detectron (e.g. batch size).
 <a id="ft2">2</a>: Numbers taken from [Group Normalization](https://arxiv.org/abs/1803.08494)

--- a/examples/ResNet/README.md
+++ b/examples/ResNet/README.md
@@ -15,14 +15,14 @@ This recipe has better performance than most open source implementations.
 In fact, many papers that claim to "improve" ResNet only compete with a lower
 baseline and they actually cannot beat this ResNet recipe.
-| Model              | Top 5 Error | Top 1 Error |
+| Model       | Top 5 Error | Top 1 Error | Download                                                                     |
-|:-------------------|-------------|------------:|
+|:------------|:------------|:-----------:|:----------------------------------------------------------------------------:|
-| ResNet18           |     10.50%  |      29.66% |
+| ResNet18    | 10.50%      | 29.66%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet18.npz)    |
-| ResNet34  		 |     8.56%   |      26.17% |
+| ResNet34    | 8.56%       | 26.17%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet34.npz)    |
-| ResNet50           |     6.85%   |      23.61% |
+| ResNet50    | 6.85%       | 23.61%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet50.npz)    |
-| ResNet50-SE       |     6.24%   |      22.64% |
+| ResNet50-SE | 6.24%       | 22.64%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet50-SE.npz) |
-| ResNet101         |     6.04%   |      21.95% |
+| ResNet101   | 6.04%       | 21.95%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet101.npz)   |
-| ResNet152         |     5.78%   |      21.51% |
+| ResNet152   | 5.78%       | 21.51%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet152.npz)   |
 To train, first decompress ImageNet data into [this structure](http://tensorpack.readthedocs.io/en/latest/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12), then:
 ```bash

--- a/tensorpack/callbacks/prof.py
+++ b/tensorpack/callbacks/prof.py
@@ -13,7 +13,7 @@ from tensorflow.python.client import timeline
 from .base import Callback
 from ..utils import logger
 from ..utils.concurrency import ensure_proc_terminate, start_proc_mask_signal
-from ..utils.gpu import get_nr_gpu
+from ..utils.gpu import get_num_gpu
 from ..utils.nvml import NVMLContext
 __all__ = ['GPUUtilizationTracker', 'GraphProfiler', 'PeakMemoryTracker']
@@ -40,7 +40,7 @@ class GPUUtilizationTracker(Callback):
        if devices is None:
            env = os.environ.get('CUDA_VISIBLE_DEVICES')
            if env is None:
-                self._devices = list(range(get_nr_gpu()))
+                self._devices = list(range(get_num_gpu()))
                logger.warn("[GPUUtilizationTracker] Both devices and CUDA_VISIBLE_DEVICES are None! "
                            "Will monitor all {} visible GPUs!".format(len(self._devices)))
            else:
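
Note on the hunk above: the rename from `get_nr_gpu` to `get_num_gpu` leaves the fallback order unchanged. An explicit `devices` argument is used if given, otherwise `CUDA_VISIBLE_DEVICES` is consulted, and only when both are unset are all visible GPUs monitored. A minimal standalone sketch of that order follows. `resolve_devices` and its parameters are hypothetical names, not part of tensorpack, and the handling of a set `CUDA_VISIBLE_DEVICES` (the `else:` branch, which the diff does not show) is an assumption.

```python
import os


def resolve_devices(devices=None, num_gpu=0, environ=None):
    """Hypothetical helper mirroring the fallback order in the hunk above.

    `num_gpu` stands in for the value returned by `get_num_gpu()`.
    """
    environ = os.environ if environ is None else environ
    if devices is not None:
        # The caller chose the devices explicitly.
        return list(devices)
    env = environ.get('CUDA_VISIBLE_DEVICES')
    if env is None:
        # Neither source is set: fall back to every visible GPU.
        return list(range(num_gpu))
    # ASSUMPTION: the variable holds comma-separated integer indices.
    # The real `else:` branch is not part of this diff.
    return [int(x) for x in env.split(',') if x.strip()]
```

For example, `resolve_devices(num_gpu=4, environ={})` yields `[0, 1, 2, 3]`, which corresponds to the case that triggers the warning in the hunk.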

--- a/tensorpack/models/batch_norm.py
+++ b/tensorpack/models/batch_norm.py
@@ -221,8 +221,8 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
        if sync_statistics == 'nccl':
            if six.PY3 and TF_version <= (1, 9) and ctx.is_main_training_tower:
-                logger.warn("A TensorFlow bug will cause cross-GPU BatchNorm to fail. "
+                logger.warn("A bug in TensorFlow<=1.9 will cause cross-GPU BatchNorm to fail. "
-                            "Apply this patch: https://github.com/tensorflow/tensorflow/pull/20360")
+                            "Upgrade or apply this patch manually: https://github.com/tensorflow/tensorflow/pull/20360")
            from tensorflow.contrib.nccl.ops import gen_nccl_ops
            shared_name = re.sub('tower[0-9]+/', '', tf.get_variable_scope().name)
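
Note on the hunk above: the warning is guarded by a tuple comparison, `TF_version <= (1, 9)`, and the NCCL `shared_name` is derived by stripping the per-tower prefix from the variable scope name. The sketch below illustrates both ideas in isolation. `parse_version` and `strip_tower_prefix` are hypothetical helpers; how tensorpack actually computes `TF_version` is not shown in this diff, so the parsing here is an assumption.

```python
import re


def parse_version(version_string):
    """Turn a string such as '1.9.0' into (1, 9) for tuple comparison.

    ASSUMPTION: only the major and minor components matter, as the
    `(1, 9)` literal in the hunk suggests.
    """
    major, minor = version_string.split('.')[:2]
    return int(major), int(minor)


def strip_tower_prefix(scope_name):
    """Remove 'towerN/' so every tower maps to the same shared name."""
    return re.sub('tower[0-9]+/', '', scope_name)


# Tuples compare component by component, so 1.10 sorts after 1.9.
# Comparing the raw strings would wrongly place '1.10' before '1.9'.
needs_warning = parse_version('1.9.0') <= (1, 9)
fixed_release = parse_version('1.10.0') <= (1, 9)
```

Stripping the prefix means `tower0/bn` and `tower1/bn` resolve to the same name, `bn`.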