1. This implementation does not use specialized CUDA ops (e.g. ROIAlign),
and does not batch images together.
Therefore it might be slower than other highly-optimized implementations.
With the CUDA kernel of NMS (available only in TF master with
[PR30893](https://github.com/tensorflow/tensorflow/pull/30893))
and `TRAINER=horovod`,
this implementation can train a standard R50-FPN at 50 img/s on 8 V100s,
compared to 35 img/s in [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md#end-to-end-faster-and-mask-r-cnn-baselines)
and [mmdetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/MODEL_ZOO.md#mask-r-cnn),
and 59 img/s in [torchvision](https://pytorch.org/blog/torchvision03/#detection-models).
1. If CuDNN warmup is on, the training will start very slowly and take about
10k steps (or more if scale augmentation is used) to reach maximum speed.
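
As a rough sketch of the multi-GPU setup mentioned above, an 8-GPU horovod run might be launched as below. This is a hedged example, not a verbatim command from this repo: the config keys (`MODE_FPN`, `DATA.BASEDIR`, `BACKBONE.WEIGHTS`) and the paths are assumptions — check `train.py --help` for the exact options.

```bash
# Hypothetical 8-GPU launch with horovod; config keys and paths are
# assumptions -- verify against train.py in this repository.
horovodrun -np 8 python3 train.py --config \
    MODE_FPN=True \
    DATA.BASEDIR=/path/to/COCO \
    BACKBONE.WEIGHTS=/path/to/ImageNet-R50.npz \
    TRAINER=horovod
```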
Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be roughly reproduced.
We compare models that have identical training & inference cost between the two implementations.
Their numbers can be different due to small implementation details.
<a id="ft2">2</a>: Our mAP is __10+ point__ better than the official model in [matterport/Mask_RCNN](https://github.com/matterport/Mask_RCNN/releases/tag/v2.0) with the same R101-FPN backbone.
Our implementation is also [5x faster](https://github.com/tensorpack/benchmarks/tree/master/MaskRCNN).
<a id="ft3">3</a>: This entry does not use ImageNet pre-training. Detectron numbers are taken from Fig. 5 in [Rethinking ImageNet Pre-training](https://arxiv.org/abs/1811.08883).
Note that our training strategy is slightly different: we enable cascade throughout the entire training.