<a id="ft1">1</a>: This implementation uses slightly different configurations from Detectron (e.g., batch size). Here we compare models that have identical training and inference cost in the two implementations. However, their numbers still differ because of many small implementation details.
<a id="ft2">2</a>: Numbers taken from [Group Normalization](https://arxiv.org/abs/1803.08494).
Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can
be roughly reproduced: some numbers are better and some are worse, probably because of many small implementation details.
Note that most of these numbers are better than those reported in the paper.