Commit 9e8e9fb8 authored by Yuxin Wu

add benchmark

parent a7f4094d
@@ -16,7 +16,7 @@ This is a minimal implementation that simply contains these files:
### Implementation Notes
#### Data:
1. It's easy to train on your own data, by calling `DatasetRegistry.register(name, lambda: YourDatasetSplit())`
and modifying `cfg.DATA.*` accordingly. Afterwards, "name" can be used in `cfg.DATA.TRAIN`.
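A minimal sketch of such a registration: only `DatasetRegistry.register` and `cfg.DATA.TRAIN` are quoted from the note above; the split class, roidb fields, import paths and other `cfg.DATA.*` keys below are illustrative assumptions and should be checked against `dataset.py` / `config.py` in this directory.

```python
import numpy as np

from dataset import DatasetRegistry, DatasetSplit   # import paths assumed
from config import config as cfg                     # import paths assumed


class MyDatasetSplit(DatasetSplit):
    """Hypothetical split; a real split may also need inference/evaluation methods."""

    def training_roidbs(self):
        # One dict per image. Field names follow the COCO example in this directory:
        # float32 XYXY boxes in absolute coordinates, 1-based category ids ("class"),
        # and a per-box is_crowd flag.
        return [{
            "file_name": "/path/to/images/000001.jpg",
            "boxes": np.asarray([[10, 20, 200, 150]], dtype=np.float32),
            "class": np.asarray([1], dtype=np.int32),
            "is_crowd": np.asarray([0], dtype=np.int8),
        }]


DatasetRegistry.register("my_train", lambda: MyDatasetSplit())
cfg.DATA.TRAIN = ("my_train",)
cfg.DATA.NUM_CATEGORY = 2                    # number of foreground classes (example value)
cfg.DATA.CLASS_NAMES = ["BG", "cat", "dog"]  # background + foreground class names (example)
```

Registering through a lambda keeps the split lazily constructed: it is only built when training actually requests that dataset.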
@@ -38,7 +38,7 @@ Data:
which is probably not the optimal way.
A TODO is to generate bounding boxes from segmentation, so more augmentations can be naturally supported.
#### Model:
1. Floating-point boxes are defined like this:
@@ -54,15 +54,25 @@ Model:
GPUs (the `BACKBONE.NORM=SyncBN` option).
Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`) which has better performance.
#### Efficiency:
Training throughput (larger is better) of a standard R50-FPN Mask R-CNN on 8 V100s:

| Implementation | Throughput (img/s) |
| - | - |
| [torchvision](https://pytorch.org/blog/torchvision03/#segmentation-models) | 59 |
| tensorpack | 50 |
| [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md#end-to-end-faster-and-mask-r-cnn-baselines) | 35 |
| [mmdetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/MODEL_ZOO.md#mask-r-cnn) | 35 |
| [Detectron](https://github.com/facebookresearch/Detectron) | 19 |
| [matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/) | 11 |
1. This implementation does not use specialized CUDA ops (e.g. ROIAlign),
and does not use batches of images.
Therefore it might be slower than other highly-optimized implementations.
Our number in the table above uses the CUDA kernel for NMS (available only in TF
master with [PR30893](https://github.com/tensorflow/tensorflow/pull/30893))
and `TRAINER=horovod`; see the example command after this list.
1. If CuDNN warmup is on, the training will start very slowly and take about
10k steps (or more if scale augmentation is used) to reach maximum speed.
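For reference, a Horovod launch for the benchmark above looks roughly like the sketch below; the `mpirun` options, paths, and all config flags other than `TRAINER=horovod` are placeholders rather than the verified benchmark command.

```bash
# Hypothetical 8-GPU launch of a standard R50-FPN Mask R-CNN with the Horovod trainer.
# Assumes a TF build containing the NMS CUDA kernel (PR30893) and a working Horovod install.
mpirun -np 8 --bind-to none \
    ./train.py --config \
    MODE_MASK=True MODE_FPN=True \
    DATA.BASEDIR=/path/to/COCO/DIR \
    BACKBONE.WEIGHTS=/path/to/ImageNet-R50-AlignPadding.npz \
    TRAINER=horovod
```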
@@ -98,9 +98,9 @@ Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be
| R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 24h | <details><summary>2x</summary>`TRAIN.LR_SCHEDULE=2x` </details> |
| R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 29h | <details><summary>2x+GN</summary>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`TRAIN.LR_SCHEDULE=2x` </details> |
| R50-FPN | 41.7;36.2 [:arrow_down:][R50FPN1xCas] | | 16h | <details><summary>+Cascade</summary>`FPN.CASCADE=True` </details> |
| R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 27h | <details><summary>standard</summary>`MODE_FPN=False`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] <sup>[2](#ft2)</sup> | 40.0;35.9 | 17h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] | | 64h | <details><summary>3x+Cascade+TrainAug</summary>`FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` </details> |
| R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] <sup>[3](#ft3)</sup> | 47.4;40.5 | 28h (on 64 V100s) | <details><summary>9x+GN+Cascade+TrainAug</summary>`FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=9x`<br/>`BACKBONE.FREEZE_AT=0`</details> |
[R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz
@@ -116,7 +116,9 @@ Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be
We compare models that have identical training & inference cost between the two implementations.
Their numbers can be different due to small implementation details.
<a id="ft2">2</a>: Our mAP is __10+ point__ better than the official model in [matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/releases/tag/v2.0) with the same R101-FPN backbone. <a id="ft2">2</a>: Our mAP is __7 point__ better than the official model in
[matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/releases/tag/v2.0) which has the same architecture.
Our implementation is also [5x faster](https://github.com/tensorpack/benchmarks/tree/master/MaskRCNN).
<a id="ft3">3</a>: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883). <a id="ft3">3</a>: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883).
Note that our training strategy is slightly different: we enable cascade throughout the entire training. Note that our training strategy is slightly different: we enable cascade throughout the entire training.