Commit 9e8e9fb8 authored by Yuxin Wu

add benchmark

parent a7f4094d
@@ -16,7 +16,7 @@ This is a minimal implementation that simply contains these files:
### Implementation Notes
Data:
#### Data:
1. It's easy to train on your own data by calling `DatasetRegistry.register(name, lambda: YourDatasetSplit())`
and modifying `cfg.DATA.*` accordingly. Afterwards, "name" can be used in `cfg.DATA.TRAIN` (see the sketch below).
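As an illustration only (not taken from this README), the minimal sketch below shows what such a registration might look like. It assumes the `DatasetSplit`-style interface described in `dataset.py`; the class name `MyTrainSplit`, the import path, and the roidb field names are assumptions that should be checked against `dataset.py`.

```python
import numpy as np

# Assumed import path; in this example the registry and the base class live in dataset.py.
from dataset import DatasetRegistry, DatasetSplit


class MyTrainSplit(DatasetSplit):
    """Hypothetical split. Field names follow the roidb convention documented in dataset.py."""

    def training_roidbs(self):
        # Return one dict per training image.
        return [{
            "file_name": "/path/to/image.jpg",
            "boxes": np.asarray([[10, 20, 200, 240]], dtype=np.float32),  # XYXY, float
            "class": np.asarray([1], dtype=np.int32),    # category ids; 0 is reserved for background
            "is_crowd": np.asarray([0], dtype=np.int8),
        }]


DatasetRegistry.register("my_data_train", lambda: MyTrainSplit())
# Afterwards, "my_data_train" can be used in cfg.DATA.TRAIN.
```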
@@ -38,7 +38,7 @@ Data:
which is probably not optimal.
A TODO is to generate bounding boxes from segmentation masks (see the sketch below), so that more augmentations can be supported naturally.
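Purely for illustration (this helper is not part of the codebase), deriving a tight box from a binary mask is a small numpy operation, assuming the edge-based floating-point box convention described under Model below:

```python
import numpy as np

def mask_to_box(mask):
    """Tight float box (x1, y1, x2, y2) around a binary HxW mask.
    Standalone sketch, not code from this example."""
    ys, xs = np.where(mask)
    # +1 because, with edge-based boxes, the pixel at (x, y) is covered
    # by the box (x, y, x + 1, y + 1).
    return np.asarray([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1], dtype=np.float32)
```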
Model:
#### Model:
1. Floating-point boxes are defined like this:
@@ -54,15 +54,25 @@ Model:
GPUs (the `BACKBONE.NORM=SyncBN` option).
Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`), which has better performance (see the sketch below).
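As a rough, non-authoritative sketch: these norm choices are ordinary config keys, so besides the command line they can also be set programmatically before training. The import path below assumes this example's `config.py` exposes its global config object as `config`.

```python
# Minimal sketch of setting the normalization options in Python,
# equivalent to passing BACKBONE.NORM=... FPN.NORM=... on the command line.
from config import config as cfg  # assumed import path; see this example's config.py

cfg.BACKBONE.NORM = "GN"   # or "SyncBN" when training with enough GPUs
cfg.FPN.NORM = "GN"        # GroupNorm in the FPN as well (used by the GN entries in the table below)
```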
Efficiency:
#### Efficiency:
1. This implementation does not use specialized CUDA ops (e.g. NMS, ROIAlign).
Training throughput (larger is better) of a standard R50-FPN Mask R-CNN on 8 V100s:
| Implementation | Throughput (img/s) |
| - | - |
| [torchvision](https://pytorch.org/blog/torchvision03/#segmentation-models) | 59 |
| tensorpack | 50 |
| [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md#end-to-end-faster-and-mask-r-cnn-baselines) | 35 |
| [mmdetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/MODEL_ZOO.md#mask-r-cnn) | 35 |
| [Detectron](https://github.com/facebookresearch/Detectron) | 19 |
| [matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/) | 11 |
1. This implementation does not use specialized CUDA ops (e.g. ROIAlign),
and does not use batch of images.
Therefore it might be slower than other highly-optimized implementations.
With CUDA kernel of NMS (available only in TF master) and `HorovodTrainer`,
this implementation can train a standard R50-FPN at 50 img/s on 8 V100s,
compared to 35 img/s in [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md#end-to-end-faster-and-mask-r-cnn-baselines)
and [mmdetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/MODEL_ZOO.md#mask-r-cnn),
and 59 img/s in [torchvision](https://pytorch.org/blog/torchvision03/#detection-models).
Our number in the table above uses the CUDA kernel of NMS (available only in TF
master with [PR30893](https://github.com/tensorflow/tensorflow/pull/30893))
and `TRAINER=horovod`.
1. If CuDNN warmup is on, training starts very slowly and takes about
10k steps (or more if scale augmentation is used) to reach its maximum speed.
@@ -99,8 +99,8 @@ Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can b
| R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 29h | <details><summary>2x+GN</summary>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` <br/>`TRAIN.LR_SCHEDULE=2x` |
| R50-FPN | 41.7;36.2 [:arrow_down:][R50FPN1xCas] | | 16h | <details><summary>+Cascade</summary>`FPN.CASCADE=True` </details> |
| R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 27h | <details><summary>standard</summary>`MODE_FPN=False`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 17h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] <sup>[2](#ft2)</sup> | | 64h | <details><summary>3x+Cascade+TrainAug</summary>` FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` </details> |
| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] <sup>[2](#ft2)</sup> | 40.0;35.9 | 17h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] | | 64h | <details><summary>3x+Cascade+TrainAug</summary>` FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` </details> |
| R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] <sup>[3](#ft3)</sup> | 47.4;40.5 | 28h (on 64 V100s) | <details><summary>9x+GN+Cascade+TrainAug</summary>` FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=9x`<br/>`BACKBONE.FREEZE_AT=0`</details> |
[R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz
@@ -116,7 +116,9 @@ Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can b
We compare models that have identical training & inference cost between the two implementations.
Their numbers can be different due to small implementation details.
<a id="ft2">2</a>: Our mAP is __10+ point__ better than the official model in [matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/releases/tag/v2.0) with the same R101-FPN backbone.
<a id="ft2">2</a>: Our mAP is __7 points__ better than the official model in
[matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/releases/tag/v2.0), which has the same architecture.
Our implementation is also [5x faster](https://github.com/tensorpack/benchmarks/tree/master/MaskRCNN).
<a id="ft3">3</a>: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883).
Note that our training strategy is slightly different: we enable cascade throughout the entire training.