Commit 220303dc authored by Yuxin Wu

update docs

parent eb2492c4
@@ -49,3 +49,6 @@ Speed:
 3. With a large roi batch size (e.g. >= 256), GPU utilization should stay above 90%.
+4. This implementation is about 14% slower than Detectron,
+   probably due to the lack of specialized ops (e.g. AffineChannel, ROIAlign) in TensorFlow.
+   It's certainly faster than other TF implementations.
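(Editor's note: AffineChannel, mentioned above, is Detectron's per-channel affine transform, i.e. a frozen BatchNorm. A minimal sketch of what it computes, composed from elementary TF broadcast ops rather than one fused kernel; the function name and NCHW layout are illustrative, not this repo's actual code:)

```python
import tensorflow as tf

def affine_channel(x, scale, bias):
    """Per-channel affine y = x * scale + bias, assuming NCHW layout.

    Each broadcast multiply/add is a separate pass over memory,
    which is why a single fused op (as in Caffe2) would be faster.
    """
    scale = tf.reshape(scale, [1, -1, 1, 1])
    bias = tf.reshape(bias, [1, -1, 1, 1])
    return x * scale + bias
```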
@@ -40,7 +40,7 @@ Predict on an image (and show output in a window):
 ./train.py --predict input.jpg --load /path/to/model
 ```

-Evaluate the performance of a model and save to json.
+Evaluate the performance of a model on COCO, and save results to json.
 (Pretrained models can be downloaded in [model zoo](http://models.tensorpack.com/FasterRCNN)):
 ```
 ./train.py --evaluate output.json --load /path/to/model
@@ -61,11 +61,10 @@ MaskRCNN results contain both bbox and segm mAP.
 |R-101 |512 |(800, 1333)|280k |40.1/34.4 |70h on 8 P100s|
 |R-101 |512 |(800, 1333)|360k |40.8/35.1 |63h on 8 V100s|

-The two 360k models have identical configurations with
-`R50-C4-2x` configuration in
+The two R-50 360k models have the same configuration __and mAP__
+as the `R50-C4-2x` entries in
 [Detectron Model Zoo](https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#end-to-end-faster--mask-r-cnn-baselines).
-They get the __same performance__ with the official models, and are about 14% slower than the official implementation,
-probably due to the lack of specialized ops (e.g. AffineChannel, ROIAlign) in TensorFlow.
+So far this seems to be the only open source re-implementation that can reproduce mAP in Detectron.

 ## Notes
...
@@ -334,13 +334,15 @@ class EvalCallback(Callback):

 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
-    parser.add_argument('--gpu', help='comma separated list of GPU(s) to use.')
-    parser.add_argument('--load', help='load model')
-    parser.add_argument('--logdir', help='logdir', default='train_log/maskrcnn')
+    parser.add_argument('--gpu', help='comma separated list of GPU(s) to use. Defaults to all available ones')
+    parser.add_argument('--load', help='load model for evaluation or training')
+    parser.add_argument('--logdir', help='log directory', default='train_log/maskrcnn')
     parser.add_argument('--datadir', help='override config.BASEDIR')
     parser.add_argument('--visualize', action='store_true')
-    parser.add_argument('--evaluate', help='path to the output json eval file')
-    parser.add_argument('--predict', help='path to the input image file')
+    parser.add_argument('--evaluate', help="Run evaluation on COCO. "
+                                           "This option is the path to the output json evaluation file")
+    parser.add_argument('--predict', help="Run prediction on a given image. "
+                                          "This argument is the path to the input image file")
     args = parser.parse_args()
     if args.datadir:
         config.BASEDIR = args.datadir
...
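(Editor's note: `--evaluate` and `--predict` select mutually exclusive modes, and argparse can enforce that directly. A minimal sketch of the idea, not the repo's actual code:)

```python
import argparse

parser = argparse.ArgumentParser()
# Only one of the two modes may be given on the command line.
mode = parser.add_mutually_exclusive_group()
mode.add_argument('--evaluate', help='path to the output json evaluation file')
mode.add_argument('--predict', help='path to the input image file')

parser.parse_args(['--evaluate', 'out.json'])  # accepted
# parser.parse_args(['--evaluate', 'out.json', '--predict', 'x.jpg'])
# would exit with "argument --predict: not allowed with argument --evaluate"
```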
@@ -4,13 +4,13 @@
 Training examples with __reproducible performance__.

 __The word "reproduce" should always mean reproduce performance__.
-With the magic of SGD, wrong deep learning code often appears to still work,
-especially if you try it on toy datasets.
+With the magic of SGD, wrong deep learning code often appears to work, especially if you try it on toy datasets.
+GitHub is full of such deep learning code that "implements" but does not "reproduce" methods.
 See [Unawareness of Deep Learning Mistakes](https://medium.com/@ppwwyyxx/unawareness-of-deep-learning-mistakes-d5b5774da0ba).

 We refuse toy examples.
 Instead of showing you 10 arbitrary networks trained on toy datasets with random final performance,
-tensorpack examples try to faithfully replicate experiments and performance in the paper as much as possible,
+tensorpack examples try to faithfully replicate experiments and performance in the paper,
 so you're confident that they are correct.
@@ -19,7 +19,7 @@ These are all the toy examples in tensorpack. They are supposed to be just demos.
 + [An illustrative MNIST example with explanation of the framework](basics/mnist-convnet.py)
 + Tensorpack supports any symbolic libraries. See the same MNIST example written with [tf.layers](basics/mnist-tflayers.py), [tf-slim](basics/mnist-tfslim.py), and [with weights visualizations](basics/mnist-visualizations.py)
 + A tiny [Cifar ConvNet](basics/cifar-convnet.py) and [SVHN ConvNet](basics/svhn-digit-convnet.py)
-+ If you've used Keras, check out [Keras examples](keras)
++ If you've used Keras, check out [Keras+Tensorpack examples](keras)
 + [A boilerplate file to start with, for your own tasks](boilerplate.py)

 ## Vision:
...
@@ -11,18 +11,23 @@ The training follows the __exact__ recipe used by the [Training ImageNet in 1 Hour paper]
 and gets the same performance.
 Models can be [downloaded here](http://models.tensorpack.com/ResNet/).

+This recipe has better performance than most open source implementations.
+In fact, many papers that claim to "improve" ResNet only compete with a lower
+baseline and they actually cannot beat this ResNet recipe.
+
 | Model       | Top 5 Error | Top 1 Error |
 |:------------|-------------|------------:|
 | ResNet18    | 10.50%      | 29.66%      |
 | ResNet34    | 8.56%       | 26.17%      |
 | ResNet50    | 6.85%       | 23.61%      |
 | ResNet50-SE | 6.24%       | 22.64%      |
 | ResNet101   | 6.04%       | 21.95%      |
 | ResNet152   | 5.78%       | 21.51%      |

 To train, first decompress ImageNet data into [this structure](http://tensorpack.readthedocs.io/en/latest/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12), then:
 ```bash
-./imagenet-resnet.py --data /path/to/original/ILSVRC --gpu 0,1,2,3 -d 50 [--mode resnet/preact/se]
+./imagenet-resnet.py --data /path/to/original/ILSVRC -d 50 [--mode resnet/preact/se]
+# See ./imagenet-resnet.py -h for other options.
 ```
 You should be able to see good GPU utilization (95%~99%), if your data is fast enough.
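(Editor's note: to confirm the data is in fact fast enough before suspecting the model, one can benchmark the input pipeline in isolation. A minimal sketch assuming tensorpack's dataflow utilities; the dataset path is a placeholder:)

```python
from tensorpack.dataflow import BatchData, TestDataSpeed, dataset

# Build just the input pipeline, with no model attached.
df = dataset.ILSVRC12('/path/to/original/ILSVRC', 'train', shuffle=True)
df = BatchData(df, 32)

# Iterates the dataflow alone and prints its throughput (samples/sec).
TestDataSpeed(df, size=500).start()
```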
@@ -37,6 +42,7 @@ See the [tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient
 This script only converts and runs ImageNet-ResNet{50,101,152} Caffe models [released by MSRA](https://github.com/KaimingHe/deep-residual-networks).
 Note that the architecture is different from the `imagenet-resnet.py` script and the models are not compatible.
+ResNets have evolved; generally you should not cite these numbers as baselines in your paper.

 Usage:
 ```bash
@@ -76,7 +82,7 @@ Note that the architecture is different from the official preact-ResNet18.
 Usage:
 ```bash
 ./cifar10-preact18-mixup.py          # train without mixup
 ./cifar10-preact18-mixup.py --mixup  # with mixup
 ```
 Results of the reference code can be reproduced.
...
@@ -145,7 +145,7 @@ if __name__ == '__main__':
     parser.add_argument('-n', '--num_units',
                         help='number of units in each stage',
                         type=int, default=18)
-    parser.add_argument('--load', help='load model')
+    parser.add_argument('--load', help='load model for training')
     args = parser.parse_args()
     NUM_UNITS = args.num_units
...
@@ -63,6 +63,8 @@ def get_config(model, fake=False):
     batch = args.batch // nr_tower
     logger.info("Running on {} towers. Batch size per tower: {}".format(nr_tower, batch))
+    if batch < 32 or batch > 64:
+        logger.warn("Batch size per tower not in [32, 64]. This may lead to worse accuracy than reported.")
     if fake:
         data = QueueInput(FakeData(
             [[batch, 224, 224, 3], [batch]], 1000, random=False, dtype='uint8'))
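(Editor's note: a quick worked example of how the per-tower batch check added above behaves; all values here are illustrative:)

```python
# total batch size, number of towers -> per-tower batch and whether it warns
for total, nr_tower in [(256, 8), (256, 2), (64, 4)]:
    batch = total // nr_tower
    warn = batch < 32 or batch > 64
    print("total={} towers={} per-tower={} warn={}".format(
        total, nr_tower, batch, warn))
# total=256 towers=8 per-tower=32  warn=False
# total=256 towers=2 per-tower=128 warn=True
# total=64  towers=4 per-tower=16  warn=True
```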
@@ -106,17 +108,18 @@ def get_config(model, fake=False):

 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
-    parser.add_argument('--gpu', help='comma separated list of GPU(s) to use.')
+    parser.add_argument('--gpu', help='comma separated list of GPU(s) to use. Defaults to all available ones')
     parser.add_argument('--data', help='ILSVRC dataset dir')
-    parser.add_argument('--load', help='load model')
-    parser.add_argument('--fake', help='use fakedata to test or benchmark this model', action='store_true')
-    parser.add_argument('--data_format', help='specify NCHW or NHWC',
-                        type=str, default='NCHW')
-    parser.add_argument('-d', '--depth', help='resnet depth',
+    parser.add_argument('--load', help='load a model for training or evaluation')
+    parser.add_argument('--fake', help='use FakeData to debug or benchmark this model', action='store_true')
+    parser.add_argument('--data_format', help='image data format',
+                        default='NCHW', choices=['NCHW', 'NHWC'])
+    parser.add_argument('-d', '--depth', help='ResNet depth',
                         type=int, default=50, choices=[18, 34, 50, 101, 152])
-    parser.add_argument('--eval', action='store_true')
-    parser.add_argument('--batch', default=256, type=int,
-                        help='total batch size. 32 per GPU gives best accuracy, higher values should be similarly good')
+    parser.add_argument('--eval', action='store_true', help='run offline evaluation instead of training')
+    parser.add_argument('--batch', default=256, type=int,
+                        help="total batch size. "
+                             "Note that it's best to keep per-GPU batch size in [32, 64] to obtain the best accuracy.")
     parser.add_argument('--mode', choices=['resnet', 'preact', 'se'],
                         help='variants of resnet to use', default='resnet')
     args = parser.parse_args()
...