Commit 5af84e93 authored by Yuxin Wu's avatar Yuxin Wu

MaskRCNN: Set TRAIN.LR_SCHEDULE=1x

parent b0cbfd0d
...@@ -56,9 +56,9 @@ To train on a single machine: ...@@ -56,9 +56,9 @@ To train on a single machine:
To run distributed training, set `TRAINER=horovod` and refer to [HorovodTrainer docs](http://tensorpack.readthedocs.io/modules/train.html#tensorpack.train.HorovodTrainer). To run distributed training, set `TRAINER=horovod` and refer to [HorovodTrainer docs](http://tensorpack.readthedocs.io/modules/train.html#tensorpack.train.HorovodTrainer).
Options can be changed by either the command line or the `config.py` file (recommended). All options can be changed by either the command line or the `config.py` file (recommended).
Some reasonable configurations are listed in the table below. Some reasonable configurations are listed in the table below.
See [config.py](config.py) for details about each config. See [config.py](config.py) for details about how to correctly set `BACKBONE.WEIGHTS` and other configs.
### Inference: ### Inference:
...@@ -92,13 +92,13 @@ Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can b ...@@ -92,13 +92,13 @@ Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can b
| R50-FPN | 37.5 | 36.7 | 11h | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> | | R50-FPN | 37.5 | 36.7 | 11h | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
| R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23.5h | <details><summary>standard</summary>this is the default, no changes in config needed </details> | | R50-C4 | 36.2;31.8 [:arrow_down:][R50C41x] | 35.8;31.4 | 23.5h | <details><summary>standard</summary>this is the default, no changes in config needed </details> |
| R50-FPN | 38.2;34.8 | 37.7;33.9 | 13.5h | <details><summary>standard</summary>`MODE_FPN=True` </details> | | R50-FPN | 38.2;34.8 | 37.7;33.9 | 13.5h | <details><summary>standard</summary>`MODE_FPN=True` </details> |
| R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 25h | <details><summary>2x</summary>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=[240000,320000,360000]` </details> | | R50-FPN | 38.9;35.4 [:arrow_down:][R50FPN2x] | 38.6;34.5 | 25h | <details><summary>2x</summary>`MODE_FPN=True`<br/>`TRAIN.LR_SCHEDULE=2x` </details> |
| R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 31h | <details><summary>2x+GN</summary>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` <br/>`TRAIN.LR_SCHEDULE=[240000,320000,360000]` | | R50-FPN-GN | 40.4;36.3 [:arrow_down:][R50FPN2xGN] | 40.3;35.7 | 31h | <details><summary>2x+GN</summary>`MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` <br/>`TRAIN.LR_SCHEDULE=2x` |
| R50-FPN | 41.7;36.2 | | 17h | <details><summary>+Cascade</summary>`MODE_FPN=True FPN.CASCADE=True` </details> | | R50-FPN | 41.7;36.2 | | 17h | <details><summary>+Cascade</summary>`MODE_FPN=True FPN.CASCADE=True` </details> |
| R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 28h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> | | R101-C4 | 40.1;34.6 [:arrow_down:][R101C41x] | | 28h | <details><summary>standard</summary>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 18h | <details><summary>standard</summary>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> | | R101-FPN | 40.7;36.8 [:arrow_down:][R101FPN1x] | 40.0;35.9 | 18h | <details><summary>standard</summary>`MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]` </details> |
| R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] <sup>[2](#ft2)</sup> | | 69h | <details><summary>3x+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=[420000,500000,540000]` </details> | | R101-FPN | 46.6;40.3 [:arrow_down:][R101FPN3xCasAug] <sup>[2](#ft2)</sup> | | 69h | <details><summary>3x+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`TEST.RESULT_SCORE_THRESH=1e-4`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=3x` </details> |
| R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] <sup>[3](#ft3)</sup> | 47.4;40.5 | 28h (on 64 V100s) | <details><summary>9x+GN+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=[1500000,1580000,1620000]`<br/>`BACKBONE.FREEZE_AT=0`</details> | | R101-FPN-GN<br/>(From Scratch) | 47.7;41.7 [:arrow_down:][R101FPN9xGNCasAugScratch] <sup>[3](#ft3)</sup> | 47.4;40.5 | 28h (on 64 V100s) | <details><summary>9x+GN+Cascade+TrainAug</summary>`MODE_FPN=True FPN.CASCADE=True`<br/>`BACKBONE.RESNET_NUM_BLOCKS=[3,4,23,3]`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head`<br/>`PREPROC.TRAIN_SHORT_EDGE_SIZE=[640,800]`<br/>`TRAIN.LR_SCHEDULE=9x`<br/>`BACKBONE.FREEZE_AT=0`</details> |
[R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz [R50C41x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50C41x.npz
[R50FPN2x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50FPN2x.npz [R50FPN2x]: http://models.tensorpack.com/FasterRCNN/COCO-MaskRCNN-R50FPN2x.npz
......
...@@ -79,7 +79,7 @@ _C = config # short alias to avoid coding ...@@ -79,7 +79,7 @@ _C = config # short alias to avoid coding
# mode flags --------------------- # mode flags ---------------------
_C.TRAINER = 'replicated' # options: 'horovod', 'replicated' _C.TRAINER = 'replicated' # options: 'horovod', 'replicated'
_C.MODE_MASK = True # FasterRCNN or MaskRCNN _C.MODE_MASK = True # Faster R-CNN or Mask R-CNN
_C.MODE_FPN = False _C.MODE_FPN = False
# dataset ----------------------- # dataset -----------------------
...@@ -100,7 +100,7 @@ _C.DATA.NUM_WORKERS = 10 ...@@ -100,7 +100,7 @@ _C.DATA.NUM_WORKERS = 10
# backbone ---------------------- # backbone ----------------------
_C.BACKBONE.WEIGHTS = '' _C.BACKBONE.WEIGHTS = ''
# To train from scratch, set it to empty # To train from scratch, set it to empty, and set FREEZE_AT to 0
# To train from ImageNet pre-trained models, use the one that matches your # To train from ImageNet pre-trained models, use the one that matches your
# architecture from http://models.tensorpack.com under the 'FasterRCNN' section. # architecture from http://models.tensorpack.com under the 'FasterRCNN' section.
# To train from an existing COCO model, use the path to that file, and change # To train from an existing COCO model, use the path to that file, and change
...@@ -110,7 +110,7 @@ _C.BACKBONE.RESNET_NUM_BLOCKS = [3, 4, 6, 3] # for resnet50 ...@@ -110,7 +110,7 @@ _C.BACKBONE.RESNET_NUM_BLOCKS = [3, 4, 6, 3] # for resnet50
# RESNET_NUM_BLOCKS = [3, 4, 23, 3] # for resnet101 # RESNET_NUM_BLOCKS = [3, 4, 23, 3] # for resnet101
_C.BACKBONE.FREEZE_AFFINE = False # do not train affine parameters inside norm layers _C.BACKBONE.FREEZE_AFFINE = False # do not train affine parameters inside norm layers
_C.BACKBONE.NORM = 'FreezeBN' # options: FreezeBN, SyncBN, GN, None _C.BACKBONE.NORM = 'FreezeBN' # options: FreezeBN, SyncBN, GN, None
_C.BACKBONE.FREEZE_AT = 2 # options: 0, 1, 2 _C.BACKBONE.FREEZE_AT = 2 # options: 0, 1, 2. How many stages in backbone to freeze (not training)
# Use a base model with TF-preferred padding mode, # Use a base model with TF-preferred padding mode,
# which may pad more pixels on right/bottom than top/left. # which may pad more pixels on right/bottom than top/left.
...@@ -136,11 +136,7 @@ _C.TRAIN.STARTING_EPOCH = 1 # the first epoch to start with, useful to continue ...@@ -136,11 +136,7 @@ _C.TRAIN.STARTING_EPOCH = 1 # the first epoch to start with, useful to continue
# the base learning rate are computed from BASE_LR and LR_SCHEDULE. # the base learning rate are computed from BASE_LR and LR_SCHEDULE.
# Therefore, there is *no need* to modify the config if you only change the number of GPUs. # Therefore, there is *no need* to modify the config if you only change the number of GPUs.
_C.TRAIN.LR_SCHEDULE = [120000, 160000, 180000] # "1x" schedule in detectron _C.TRAIN.LR_SCHEDULE = "1x" # "1x" schedule in detectron
# _C.TRAIN.LR_SCHEDULE = [240000, 320000, 360000] # "2x" schedule in detectron
# Longer schedules for from-scratch training (https://arxiv.org/abs/1811.08883):
# _C.TRAIN.LR_SCHEDULE = [960000, 1040000, 1080000] # "6x" schedule in detectron
# _C.TRAIN.LR_SCHEDULE = [1500000, 1580000, 1620000] # "9x" schedule in detectron
_C.TRAIN.EVAL_PERIOD = 25 # period (epochs) to run evaluation _C.TRAIN.EVAL_PERIOD = 25 # period (epochs) to run evaluation
# preprocessing -------------------- # preprocessing --------------------
...@@ -200,10 +196,10 @@ _C.FPN.FRCNN_CONV_HEAD_DIM = 256 ...@@ -200,10 +196,10 @@ _C.FPN.FRCNN_CONV_HEAD_DIM = 256
_C.FPN.FRCNN_FC_HEAD_DIM = 1024 _C.FPN.FRCNN_FC_HEAD_DIM = 1024
_C.FPN.MRCNN_HEAD_FUNC = 'maskrcnn_up4conv_head' # choices: maskrcnn_up4conv_{,gn_}head _C.FPN.MRCNN_HEAD_FUNC = 'maskrcnn_up4conv_head' # choices: maskrcnn_up4conv_{,gn_}head
# Mask-RCNN # Mask R-CNN
_C.MRCNN.HEAD_DIM = 256 _C.MRCNN.HEAD_DIM = 256
# Cascade-RCNN, only available in FPN mode # Cascade R-CNN, only available in FPN mode
_C.FPN.CASCADE = False _C.FPN.CASCADE = False
_C.CASCADE.IOUS = [0.5, 0.6, 0.7] _C.CASCADE.IOUS = [0.5, 0.6, 0.7]
_C.CASCADE.BBOX_REG_WEIGHTS = [[10., 10., 5., 5.], [20., 20., 10., 10.], [30., 30., 15., 15.]] _C.CASCADE.BBOX_REG_WEIGHTS = [[10., 10., 5., 5.], [20., 20., 10., 10.], [30., 30., 15., 15.]]
...@@ -261,6 +257,18 @@ def finalize_configs(is_training): ...@@ -261,6 +257,18 @@ def finalize_configs(is_training):
os.environ['TF_AUTOTUNE_THRESHOLD'] = '1' os.environ['TF_AUTOTUNE_THRESHOLD'] = '1'
assert _C.TRAINER in ['horovod', 'replicated'], _C.TRAINER assert _C.TRAINER in ['horovod', 'replicated'], _C.TRAINER
lr = _C.TRAIN.LR_SCHEDULE
if isinstance(lr, six.string_types):
if lr.endswith("x"):
LR_SCHEDULE_KITER = {
"{}x".format(k):
[180 * k - 120, 180 * k - 40, 180 * k]
for k in range(2, 10)}
LR_SCHEDULE_KITER["1x"] = [120, 160, 180]
_C.TRAIN.LR_SCHEDULE = [x * 1000 for x in LR_SCHEDULE_KITER[lr]]
else:
_C.TRAIN.LR_SCHEDULE = eval(lr)
# setup NUM_GPUS # setup NUM_GPUS
if _C.TRAINER == 'horovod': if _C.TRAINER == 'horovod':
import horovod.tensorflow as hvd import horovod.tensorflow as hvd
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment