1. This implementation does not use specialized CUDA ops (e.g. NMS, ROIAlign),
so it might be slower than other highly-optimized implementations
(see the ROIAlign sketch after this list).
With the CUDA kernel for NMS (available only in TF master) and `HorovodTrainer`,
this implementation can train a standard R50-FPN at 50 img/s on 8 V100s,
compared to 35 img/s in [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md#end-to-end-faster-and-mask-r-cnn-baselines)
and [mmdetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/MODEL_ZOO.md#mask-r-cnn),
and 59 img/s in [torchvision](https://pytorch.org/blog/torchvision03/#detection-models).
1. If CuDNN warmup is on, training starts very slowly and takes about
10k steps (or more if scale augmentation is used) to reach maximum speed.
As a result, the ETA is also inaccurate at the beginning
(see the autotune note after this list).
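To illustrate the point about specialized ops, here is a rough sketch (not this
repo's actual code) of how an ROIAlign-style crop can be built from the stock
`tf.image.crop_and_resize` op. It assumes a batch size of 1 and glosses over the
half-pixel alignment a faithful ROIAlign needs:

```python
import tensorflow as tf

def roi_align_from_stock_ops(featuremap, boxes, crop_size):
    """featuremap: [1, H, W, C]; boxes: [K, 4] pixel coordinates (x1, y1, x2, y2)."""
    h = tf.cast(tf.shape(featuremap)[1], tf.float32)
    w = tf.cast(tf.shape(featuremap)[2], tf.float32)
    x1, y1, x2, y2 = tf.split(boxes, 4, axis=1)
    # crop_and_resize expects boxes as normalized [y1, x1, y2, x2]
    normalized = tf.concat(
        [y1 / (h - 1), x1 / (w - 1), y2 / (h - 1), x2 / (w - 1)], axis=1)
    # all boxes come from image 0, since we assume a batch of one image
    box_indices = tf.zeros([tf.shape(boxes)[0]], dtype=tf.int32)
    return tf.image.crop_and_resize(
        featuremap, normalized, box_indices, [crop_size, crop_size])
```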
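On the warmup note: the slow start comes from cuDNN benchmarking kernels for
each new input shape, which is why scale augmentation (more shapes) lengthens
it. In TensorFlow, cuDNN autotuning can generally be switched off via the
`TF_CUDNN_USE_AUTOTUNE` environment variable; whether that variable is exactly
the switch this repo calls "CuDNN warmup" is an assumption:

```python
import os

# Must be set before TensorFlow initializes its GPU devices.
# "0" disables cuDNN autotuning: startup becomes fast and the ETA meaningful,
# but the chosen kernels may be slower than the autotuned ones.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

import tensorflow as tf  # noqa: E402  (import after the env var is set)
```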
...
...
If all images have the same spatial size (in which case the per-GPU computation is *still different*),
then an 85%~90% scaling efficiency is observed when using 8 V100s and `HorovodTrainer`.
1. To reduce RAM usage on the host: (1) make sure you're using the "spawn" start
method as set in `train.py`; (2) reduce `buffer_size` or `NUM_WORKERS` in `data.py`
(which may negatively impact throughput). Training needs less than 10G of host RAM
if `NUM_WORKERS=0`; see the sketch below.
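For reference, a minimal sketch of point (1); `train.py` is described as already
setting this up, so the snippet is illustrative rather than something to add:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # With "spawn", dataflow workers start from a fresh interpreter instead of
    # forking (and therefore copying, on write) the trainer's full address
    # space, which is what keeps host RAM bounded.
    mp.set_start_method("spawn")
    # ... then build the dataflow and trainer as train.py does.
```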