Commit 85586fc5 authored by Yuxin Wu

update docs

parent 72674731
......@@ -70,17 +70,17 @@ Evaluation or prediction will need the same `--config` used during training.
These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
MaskRCNN results contain both box and mask mAP.
| Backbone | mAP<br/>(box;mask) | Detectron mAP <br/> (box;mask) | Time on 8 V100s | Configurations <br/> (click to expand) |
| - | - | - | - | - |
| R50-C4 | 33.1 | | 18h | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
| R50-C4 | 36.6 | 36.5 | 44h | <details><summary>standard</summary>`MODE_MASK=False` </details> |
| R50-FPN | 37.4 | 37.9<sup>[1](#ft1)</sup> | 30h | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
| R50-C4 | 37.8;33.1 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50C4-MaskRCNN-Standard.npz) | 37.8;32.8 | 49h | <details><summary>standard</summary>`MODE_MASK=True` </details> |
| R50-FPN | 38.2;34.9 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50FPN-MaskRCNN-Standard.npz) | 38.6;34.5<sup>[1](#ft1)</sup> | 32h | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details> |
| R50-FPN | 38.5;34.8 | 38.6;34.2<sup>[2](#ft2)</sup> | 34h | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details> |
| R50-FPN | 39.5;35.2 | 39.5;34.4<sup>[2](#ft2)</sup> | 34h | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> |
| R50-FPN | 40.0;36.1 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50FPN-MaskRCNN-StandardGN.npz) | 40.3;35.7 | 44h | <details><summary>standard+GN</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` </details> |
| R101-C4 | 40.8;35.1 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101C4-MaskRCNN-Standard.npz) | | 63h | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
<a id="ft1">1</a>: This implementation uses slightly different configurations from Detectron (e.g., batch size).
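The Configurations column lists `KEY=VALUE` overrides, where dotted keys such as `FPN.NORM` address nested sections. As an illustration only, here is a minimal sketch of turning such strings into a nested dict. `parse_overrides` is a hypothetical helper, not tensorpack's own config loader, and it assumes each value is either a Python literal (`True`, `64`, `[3,4,23,3]`) or else a plain string (`GN`, `fastrcnn_4conv1fc_gn_head`):

```python
import ast


def parse_overrides(items):
    """Hypothetical helper: turn ['MODE_MASK=True', 'FPN.NORM=GN'] into a nested dict.

    Values are parsed as Python literals when possible; anything that is not
    a valid literal (e.g. GN) is kept as a string.
    """
    cfg = {}
    for item in items:
        key, _, raw = item.partition('=')
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a literal, keep the raw string
        *parents, leaf = key.split('.')
        node = cfg
        for part in parents:  # walk/create nested sections like FPN or BACKBONE
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg
```

Using the same override list for training and evaluation yields the same dict, which is the point of reusing the training `--config` at evaluation time.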
......
......@@ -21,9 +21,10 @@ To print flops:
```
It will print about 75Mflops, because the paper counts multiply+add as 1 flop.
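The discrepancy above is a counting convention. A minimal sketch (the helper names are hypothetical) of the multiply-add count of a possibly grouped 2D convolution, and of how the reported number doubles when a multiply and an add are counted as two separate FLOPs:

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k, groups=1):
    """Multiply-adds of a k x k 2D convolution (bias ignored).

    Each output element of each output channel needs (c_in / groups) * k * k
    multiply-adds, because a grouped conv only sees its own input group.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return h_out * w_out * c_out * (c_in // groups) * k * k


def flops(macs, count_madd_as_one=True):
    """Report FLOPs: 1 per multiply-add (paper convention) or 2 (multiply and add separately)."""
    return macs if count_madd_as_one else 2 * macs


macs = conv2d_macs(28, 28, 240, 240, 1, groups=8)
print(macs)                                   # 5644800
print(flops(macs, count_madd_as_one=False))   # 11289600
```

So a 1x1 convolution with 240 input and output channels in 8 groups on a 28x28 map costs 5,644,800 multiply-adds, or 11,289,600 FLOPs under the two-operation convention.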
Evaluate the [pretrained model](http://models.tensorpack.com/ShuffleNet/):
Download and evaluate the pretrained model:
```
./shufflenet.py --eval --data /path/to/ilsvrc --load /path/to/model
wget http://models.tensorpack.com/ImageNetModels/ShuffleNet.npz
./shufflenet.py --eval --data /path/to/ilsvrc --load ShuffleNet.npz
```
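To sanity-check the downloaded file before running evaluation, the archive can be opened with NumPy. A minimal sketch, assuming `ShuffleNet.npz` is a standard NumPy `.npz` archive of named arrays; `summarize_npz` is a hypothetical helper:

```python
import numpy as np


def summarize_npz(path):
    """Return ({array_name: shape}, total number of scalars) for an .npz archive."""
    with np.load(path) as archive:  # NpzFile; arrays are loaded lazily by name
        shapes = {name: archive[name].shape for name in archive.files}
    total = sum(int(np.prod(shape)) for shape in shapes.values())
    return shapes, total
```

For example, `shapes, total = summarize_npz('ShuffleNet.npz')` lists every stored array and the total parameter count.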
### AlexNet
......@@ -50,8 +51,8 @@ See `./vgg16.py --help` for usage.
|:------------------------------------------|---------------------|--------------------:|
| 29~30% (large variation with random seed) | 28% | 27.6% |
Note that the purpose of this experiment in the paper is not to claim that GroupNorm is better
than BatchNorm; therefore, the training settings and hyperparameters have not been individually tuned for the best accuracy.
### ResNet
......
......@@ -112,7 +112,7 @@ if __name__ == '__main__':
parser.add_argument('--data', help='ILSVRC dataset dir')
parser.add_argument('--load', help='load a model for training or evaluation')
parser.add_argument('--fake', help='use FakeData to debug or benchmark this model', action='store_true')
parser.add_argument('--data_format', help='image data format',
parser.add_argument('--data-format', help='image data format',
default='NCHW', choices=['NCHW', 'NHWC'])
parser.add_argument('-d', '--depth', help='ResNet depth',
type=int, default=50, choices=[18, 34, 50, 101, 152])
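Renaming the flag from `--data_format` to `--data-format` does not change how the value is read in code: argparse derives the attribute name from the first long option and replaces dashes with underscores, so it is still `args.data_format`. Command lines that use the old `--data_format` spelling will be rejected as unrecognized arguments. A self-contained check (the argument list passed to `parse_args` is just an example):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data-format', help='image data format',
                    default='NCHW', choices=['NCHW', 'NHWC'])
parser.add_argument('-d', '--depth', help='ResNet depth',
                    type=int, default=50, choices=[18, 34, 50, 101, 152])

# '--data-format' is stored under the attribute name 'data_format'
args = parser.parse_args(['--data-format', 'NHWC', '-d', '101'])
print(args.data_format, args.depth)  # NHWC 101
```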
......
......@@ -25,7 +25,7 @@ class ILSVRCMeta(object):
def __init__(self, dir=None):
if dir is None:
dir = get_dataset_path('ilsvrc_metadata')
self.dir = dir
self.dir = os.path.expanduser(dir)
mkdir_p(self.dir)
f = os.path.join(self.dir, 'synsets.txt')
if not os.path.isfile(f):
......@@ -141,6 +141,7 @@ class ILSVRC12Files(RNGDataFlow):
Same as in :class:`ILSVRC12`.
"""
assert name in ['train', 'test', 'val'], name
dir = os.path.expanduser(dir)
assert os.path.isdir(dir), dir
self.full_dir = os.path.join(dir, name)
self.name = name
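Both hunks above add `os.path.expanduser` because `os.path.isdir` and `os.path.join` treat `~` literally, so a path like `~/data/ilsvrc` would fail the `isdir` assertion even if the directory exists. A minimal sketch of the same validation pattern; `resolve_split_dir` is a hypothetical helper, not part of the dataset classes:

```python
import os

SPLITS = ('train', 'test', 'val')


def resolve_split_dir(root, name):
    """Expand '~' in root, check that it exists, and return the directory of one split."""
    assert name in SPLITS, name
    root = os.path.expanduser(root)  # '~/data' -> '/home/<user>/data'
    assert os.path.isdir(root), root
    return os.path.join(root, name)
```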
......
......@@ -242,9 +242,15 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
shared_name=shared_name + '_NCCL_mean_square') * (1.0 / num_dev)
elif sync_statistics == 'horovod':
# Require https://github.com/uber/horovod/pull/331
import horovod
hvd_version = tuple(map(int, horovod.__version__.split('.')))
assert hvd_version >= (0, 13, 6), "sync_statistics needs horovod>=0.13.6 !"
import horovod.tensorflow as hvd
batch_mean = hvd.allreduce(batch_mean, average=True)
batch_mean_square = hvd.allreduce(batch_mean_square, average=True)
if hvd.size() == 1:
logger.warn("BatchNorm(sync_statistics='horovod') is used with only one process!")
else:
batch_mean = hvd.allreduce(batch_mean, average=True)
batch_mean_square = hvd.allreduce(batch_mean_square, average=True)
batch_var = batch_mean_square - tf.square(batch_mean)
batch_mean_vec = batch_mean
batch_var_vec = batch_var
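With `sync_statistics='horovod'`, the mean and the mean of squares are each averaged across processes with `hvd.allreduce(..., average=True)`, and the variance is then recovered as E[x^2] - E[x]^2. The NumPy sketch below simulates this with one array shard per process, with the plain average standing in for the allreduce (`synced_moments` is a hypothetical name). It reproduces the global batch statistics exactly only when every shard has the same number of samples; with a single process it reduces to the local statistics, which is what the `hvd.size() == 1` warning guards against.

```python
import numpy as np


def synced_moments(shards):
    """Combine per-process BatchNorm statistics by averaging E[x] and E[x^2]."""
    means = [s.mean(axis=0) for s in shards]                # local batch_mean
    mean_squares = [np.square(s).mean(axis=0) for s in shards]  # local batch_mean_square
    batch_mean = np.mean(means, axis=0)                      # stands in for allreduce(average=True)
    batch_mean_square = np.mean(mean_squares, axis=0)
    batch_var = batch_mean_square - np.square(batch_mean)    # Var[x] = E[x^2] - E[x]^2
    return batch_mean, batch_var


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))                 # global batch: 32 samples, 8 channels
mean, var = synced_moments(np.split(x, 4))   # 4 "processes", 8 samples each
print(np.allclose(mean, x.mean(axis=0)), np.allclose(var, x.var(axis=0)))  # True True
```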
......
......@@ -101,13 +101,17 @@ class MapGradient(GradientProcessor):
"""
Apply a function to all gradients whose variable names match the regex.
Keep the other gradients unchanged.
It can be used for gradient clipping, etc.
"""
def __init__(self, func, regex='.*'):
"""
Args:
func: takes a grad or (grad, var) pair and returns a grad. If it returns None, the
gradient is discarded (hence no update to the variable will happen).
func: a user-supplied function which takes one or two arguments.
The argument(s) can be either a `grad` tensor, or `grad` and `var`.
The function should return the new gradient to be used.
If it returns None, the gradient is discarded (hence no update to the variable will happen).
regex (str): used to match variables. Defaults to matching all variables.
"""
args = inspect.getargspec(func).args
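The `func` contract described in the docstring (one argument `grad`, or two arguments `grad` and `var`; returning `None` discards the gradient) can be sketched without TensorFlow. In this hypothetical version, gradients are plain numbers paired with variable-name strings instead of `tf.Variable`s, the arity is detected with `inspect.signature` rather than the deprecated `inspect.getargspec`, and the regex is assumed to have to match the whole name:

```python
import inspect
import re


def map_gradients(grads_and_names, func, regex='.*'):
    """Hypothetical sketch: apply func to gradients whose variable name fully matches regex.

    func takes (grad) or (grad, name) and returns the new grad; returning None
    drops the pair, so that variable would receive no update.
    """
    nargs = len(inspect.signature(func).parameters)
    assert nargs in (1, 2), nargs
    result = []
    for grad, name in grads_and_names:
        if re.fullmatch(regex, name):
            grad = func(grad) if nargs == 1 else func(grad, name)
            if grad is None:
                continue  # gradient discarded
        result.append((grad, name))
    return result


# e.g. clip only the 'fc/...' gradients to [-1, 1]
pairs = [(5.0, 'fc/W'), (-3.0, 'fc/b'), (7.0, 'conv/W')]
clipped = map_gradients(pairs, lambda g: max(-1.0, min(1.0, g)), regex='fc/.*')
```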
......@@ -196,15 +200,14 @@ class PrintGradient(MapGradient):
class CheckGradient(MapGradient):
"""
Check for numeric issues.
See :func:`tf.check_numerics` for more information.
Run :func:`tf.check_numerics` for each gradient.
"""
def __init__(self):
super(CheckGradient, self).__init__(self._mapper)
def _mapper(self, grad, var):
# this is very slow.... see #3649
# this was very slow.... see #3649
# op = tf.Assert(tf.reduce_all(tf.is_finite(var)), [var], summarize=100)
grad = tf.check_numerics(grad, 'CheckGradient/' + var.op.name)
return grad
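`tf.check_numerics` passes its input through unchanged and makes the run fail if the tensor contains NaN or Inf. A NumPy stand-in for that per-gradient check, as a sketch only; `check_numerics` here is a hypothetical helper that tags the error with the variable name, similar to the `'CheckGradient/' + var.op.name` message above:

```python
import numpy as np


def check_numerics(grad, name):
    """Return grad unchanged, or raise if it contains NaN or Inf."""
    grad = np.asarray(grad)
    if not np.all(np.isfinite(grad)):
        raise FloatingPointError('CheckGradient/' + name + ': found NaN or Inf')
    return grad
```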
......@@ -215,26 +218,26 @@ class ScaleGradient(MapGradient):
Scale certain gradients by a multiplier.
"""
def __init__(self, multipliers, verbose=True, log=None):
def __init__(self, multipliers, verbose=True):
"""
Args:
multipliers (tuple or list): tuple of (regex, float), or list of tuples.
multipliers (tuple or list): tuple of (regex, float), or list of such tuples.
verbose (bool): whether to print logs or not
log: deprecated
Example:
Use double learning rate for all the biases (as in Caffe):
Use double learning rate for all the biases (as in Caffe), and freeze layer0:
.. code-block:: python
ScaleGradient(('.*/b', 2))
from tensorpack.tfutils import optimizer, gradproc
opt = optimizer.apply_grad_processors(
opt, [gradproc.ScaleGradient(
[('.*/b', 2.), ('layer0/.*', 0.)]
)])
"""
if not isinstance(multipliers, list):
multipliers = [multipliers]
self.multipliers = multipliers
if log is not None:
logger.warn("'log' in ScaleGradient(..) is renamed to 'verbose'.")
verbose = log
assert verbose in [True, False], verbose
self._verbose = verbose
super(ScaleGradient, self).__init__(self._mapper)
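The multiplier semantics in the docstring example can be mimicked in plain Python. This is a sketch under stated assumptions, not tensorpack's `_mapper`: names are strings, the first regex that fully matches a name wins, a single `(regex, float)` tuple is wrapped into a list as in `__init__` above, and a multiplier of `0.` drops the gradient so the variable is frozen:

```python
import re


def scale_gradients(grads_and_names, multipliers):
    """Hypothetical sketch of ScaleGradient-style scaling on (grad, name) pairs."""
    if not isinstance(multipliers, list):
        multipliers = [multipliers]  # accept a single (regex, float) tuple
    result = []
    for grad, name in grads_and_names:
        for regex, mult in multipliers:
            if re.fullmatch(regex, name):
                grad = None if mult == 0 else grad * mult  # 0 means "freeze"
                break  # assumption: first matching rule wins
        if grad is not None:
            result.append((grad, name))
    return result


pairs = [(1.0, 'fc/b'), (1.0, 'layer0/W'), (1.0, 'fc/W')]
scaled = scale_gradients(pairs, [('.*/b', 2.), ('layer0/.*', 0.)])
```

Here `fc/b` gets twice its gradient, `layer0/W` gets none, and `fc/W` is untouched.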
......