Commit 631f011f authored by Yuxin Wu

update docs

parent 1567c2dc
@@ -4,6 +4,7 @@
__Everything__ other than the training iterations happens in the callbacks.
Most of the fancy things you want to do will probably end up here.
+Callbacks are called during training.
The time when each callback method gets called is demonstrated in this snippet.
```python
def train(self):
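    # ... the remaining lines of the snippet are collapsed in this view.
    # What follows is a paraphrased sketch of the scheduling, not the
    # verbatim source; names like `starting_epoch` are illustrative.
    callbacks.setup_graph()
    # ... create session, initialize session, finalize graph ...
    callbacks.before_train()
    for epoch in range(starting_epoch, max_epoch):
        callbacks.before_epoch()
        for step in range(steps_per_epoch):
            self.run_step()  # callbacks.{before,after}_run are hooked into this sess.run
            callbacks.trigger_step()
        callbacks.after_epoch()
        callbacks.trigger_epoch()
    callbacks.after_train()
```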
@@ -28,7 +29,7 @@ Note that at each place, each callback will be called in the order they are given
### Explain the Callback Methods
To write a callback, subclass `Callback` and implement the corresponding underscore-prefixed methods.
-You can overwrite any of the following methods to define a new callback:
+You can overwrite any of the following methods in the new callback:
* `_setup_graph(self)`
@@ -61,7 +62,7 @@ You can overwrite any of the following methods to define a new callback:
* `_before_epoch(self)`, `_after_epoch(self)`
`_trigger_epoch` should be enough for most cases, as can be seen from the scheduling snippet above.
-Use these two methods __only__ when you really need something to happen __immediately__ before/after an epoch.
+These two methods should be used __only__ when you really need something to happen __immediately__ before/after an epoch.
And when you do need to use them, make sure they are very fast, to avoid affecting other callbacks that use them.
* `_before_run(self, ctx)`, `_after_run(self, ctx, values)`
@@ -79,8 +80,9 @@ You can overwrite any of the following methods to define a new callback:
```
The training loop would become `sess.run([training_op, my_op])`.
-This is different from `sess.run(training_op); sess.run(my_op);`,
-which is what you would get if you write `self.trainer.sess.run(my_op)` in `_trigger_step`.
+However, if you write `my_op.run()` in `_trigger_step`, the training loop would become
+`sess.run(training_op); sess.run(my_op);`.
Usually the difference matters; please choose carefully (see the sketch after this list).
* `_trigger_step(self)`
@@ -90,7 +92,7 @@ You can overwrite any of the following methods to define a new callback:
* `_trigger_epoch(self)`
-Do something after each epoch has finished. Will call `self.trigger()` by default.
+Do something after each epoch has finished. This method calls `self.trigger()` by default.
* `_trigger(self)`
...
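To make these methods concrete, here is a minimal custom callback sketched against the API described above. It fetches an extra tensor in the same `sess.run` as the training op; the tensor name `total_loss:0` is illustrative, not from this repo:

```python
import tensorflow as tf
from tensorpack.callbacks import Callback

class PrintLoss(Callback):
    """Minimal sketch: fetch an extra tensor together with the training op."""

    def _setup_graph(self):
        # The graph is finalized after setup, so look up tensors now.
        self._loss = self.graph.get_tensor_by_name('total_loss:0')  # illustrative name

    def _before_run(self, ctx):
        # Returning SessionRunArgs merges this fetch into the training sess.run().
        return tf.train.SessionRunArgs(fetches=self._loss)

    def _after_run(self, ctx, values):
        # values.results holds the fetched value from the combined run.
        self._last_loss = values.results

    def _trigger_epoch(self):
        print('loss after epoch: {}'.format(self._last_loss))
```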
@@ -289,7 +289,6 @@ if __name__ == '__main__':
        history_len=4)
    E._init_memory()
-    for k in E.get_data():
+    for _ in E.get_data():
        import IPython as IP
        IP.embed(config=IP.terminal.ipapp.load_default_config())
-        pass
@@ -123,9 +123,6 @@ class Model(ModelDesc):
    def build_graph(self, theta, image, gt_image, gt_filter):
        kernel_size = 9
-        image = image
-        gt_image = gt_image
        theta = tf.reshape(theta, [BATCH, 1, 1, 1]) - np.pi
        image = tf.reshape(image, [BATCH, SHAPE, SHAPE, 1])
        gt_image = tf.reshape(gt_image, [BATCH, SHAPE, SHAPE, 1])
...
@@ -71,7 +71,7 @@ prediction will need to be run with the corresponding training configs.
## Results
-These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
+These models are trained on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be roughly reproduced.
Mask R-CNN results contain both box and mask mAP.
...
@@ -204,7 +204,6 @@ if __name__ == '__main__':
    parser.add_argument('--data', help='Image directory', required=True)
    parser.add_argument('--mode', choices=['AtoB', 'BtoA'], default='AtoB')
    parser.add_argument('-b', '--batch', type=int, default=1)
-    global args
    args = parser.parse_args()
    if args.gpu:
        os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu
...
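For reference, the deleted `global args` was a no-op: at module scope every name is already global, and a `global` statement only matters inside a function. A minimal illustration (hypothetical code, not from this script):

```python
args = None          # a module-level assignment already binds a global name

def main():
    global args      # needed only inside a function to rebind the global
    args = "parsed"  # without the statement above, this would bind a local
```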
@@ -2,7 +2,7 @@
ImageNet training code of ResNet, ShuffleNet, DoReFa-Net, AlexNet, Inception, VGG with tensorpack.
To train any of the models, just do `./{model}.py --data /path/to/ilsvrc`.
-More options are available in `./{model}.py -h`.
+More options are available in `./{model}.py --help`.
The expected format of the data directory is described in the [docs](http://tensorpack.readthedocs.io/en/latest/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12).
Some pretrained models can be downloaded at the [tensorpack model zoo](http://models.tensorpack.com/).
@@ -12,12 +12,12 @@ Reproduce ImageNet results of the following two papers:
+ [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083)
+ [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164)
-| Model | Flops | Top 1 Error | Flags |
-|:---------------------------------------------------------------------------------------------------------|:------|:-----------:|:-------------:|
-| ShuffleNetV1 0.5x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV1-0.5x-g=8.npz) | 40M | 40.8% | `-r=0.5` |
-| ShuffleNetV1 1x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV1-1x-g=8.npz) | 140M | 32.6% | `-r=1` |
-| ShuffleNetV2 0.5x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV2-0.5x.npz) | 41M | 39.5% | `-r=0.5 --v2` |
-| ShuffleNetV2 1x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV2-1x.npz) | 146M | 30.6% | `-r=1 --v2` |
+| Model | Flops | Top-1 Error | Claimed Error | Flags |
+|:---------------------------------------------------------------------------------------------------------|:------|:-----------:|:-------------:|:-------------:|
+| ShuffleNetV1 0.5x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV1-0.5x-g=8.npz) | 40M | 40.8% | 42.3% | `-r=0.5` |
+| ShuffleNetV1 1x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV1-1x-g=8.npz) | 140M | 32.6% | 32.4% | `-r=1` |
+| ShuffleNetV2 0.5x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV2-0.5x.npz) | 41M | 39.5% | 39.7% | `-r=0.5 --v2` |
+| ShuffleNetV2 1x [:arrow_down:](http://models.tensorpack.com/ImageNetModels/ShuffleNetV2-1x.npz) | 146M | 30.6% | 30.6% | `-r=1 --v2` |
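For instance, the Flags column combines with the generic training command from the top of this README; assuming the script in this directory is `shufflenet.py`:

```bash
# Hypothetical invocation: train ShuffleNetV2 0.5x with the flags from the table
./shufflenet.py --data /path/to/ilsvrc -r=0.5 --v2
```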
To print flops:
```bash
@@ -34,14 +34,14 @@ wget http://models.tensorpack.com/ImageNetModels/ShuffleNetV2-0.5x.npz
This AlexNet script is quite close to the settings in its [original
paper](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks).
-Trained with 64x2 batch size, the script reaches 58% single-crop validation
-accuracy after 100 epochs (21 hours on 2 V100s).
+Trained with 2 GPUs and 64 batch size per GPU, the script reaches 58% single-crop validation
+accuracy after 100 epochs (21h on 2 V100s).
It also adds to TensorBoard first-layer filter visualizations similar to those in the paper.
See `./alexnet.py --help` for usage.
### VGG16
-This VGG16 script, when trained with 32x8 batch size, reaches the following
+This VGG16 script, when trained with 8 GPUs and 32 batch size per GPU, reaches the following
validation error after 100 epochs (30h with 8 P100s). This is the code for the VGG
experiments in the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
See `./vgg16.py --help` for usage.
@@ -51,7 +51,7 @@ See `./vgg16.py --help` for usage.
| 29~30% (large variation with random seed) | 28% | 27.6% |
Note that the purpose of this experiment in the paper is not to claim GroupNorm is better
-than BatchNorm; therefore, the training settings and hyperparameters have not been individually tuned for best accuracy.
+than BatchNorm; therefore, the training settings and hyperparameters may not be the best for BatchNorm.
### Inception-BN
...
@@ -22,8 +22,6 @@ from imagenet_utils import (
    get_imagenet_dataflow,
    ImageNetModel, GoogleNetResize, eval_on_ILSVRC12)
-TOTAL_BATCH_SIZE = 1024
@layer_register(log_shape=True)
def DepthConv(x, out_channel, kernel_shape, padding='SAME', stride=1,
@@ -235,6 +233,7 @@ if __name__ == '__main__':
    parser.add_argument('--group', type=int, default=8, choices=[3, 4, 8],
                        help="Number of groups for ShuffleNetV1")
    parser.add_argument('--v2', action='store_true', help='Use ShuffleNetV2')
+    parser.add_argument('--batch', type=int, default=1024, help='total batch size')
    parser.add_argument('--load', help='path to load a model from')
    parser.add_argument('--eval', action='store_true')
    parser.add_argument('--flops', action='store_true', help='print flops and exit')
@@ -246,6 +245,10 @@ if __name__ == '__main__':
    if args.v2 and args.group != parser.get_default('group'):
        logger.error("group= is not used in ShuffleNetV2!")
+    if args.batch != 1024:
+        logger.warn("Total batch size != 1024, you need to change other hyperparameters to get the same results.")
+    TOTAL_BATCH_SIZE = args.batch
    model = Model()
    if args.eval:
...
@@ -49,7 +49,7 @@ class TestDataSpeed(ProxyDataFlow):
        self.ds.reset_state()
        itr = self.ds.__iter__()
        if self.warmup:
-            for d in tqdm.trange(self.warmup, **get_tqdm_kwargs()):
+            for _ in tqdm.trange(self.warmup, **get_tqdm_kwargs()):
                next(itr)
        # add smoothing for speed benchmark
        with get_tqdm(total=self.test_size,
...
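For context, `TestDataSpeed` benchmarks how fast a DataFlow yields datapoints. A usage sketch, assuming a DataFlow `df` and that the constructor exposes the `warmup` count used above:

```python
from tensorpack.dataflow import TestDataSpeed

# Iterate 5000 datapoints (after a 100-datapoint warmup) and report the speed.
TestDataSpeed(df, size=5000, warmup=100).start()
```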
@@ -84,7 +84,8 @@ class TowerTrainer(Trainer):
    def get_predictor(self, input_names, output_names, device=0):
        """
-        Returns a callable predictor built under ``TowerContext(is_training=False)``.
+        This method will build the tower under ``TowerContext(is_training=False)``,
+        and returns a callable predictor with input placeholders & output tensors in this tower.

        Args:
            input_names (list): list of input names, matching the inputs declared for the trainer.
...
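As a usage sketch of the docstring above (the input/output names are illustrative, declared elsewhere by the model):

```python
# Build an inference tower once, then call the returned predictor directly.
pred = trainer.get_predictor(['input'], ['prob'])  # 'input'/'prob' are illustrative
outputs = pred(batch_of_images)                    # list of output values, here [prob]
```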
@@ -53,8 +53,7 @@ def _get_time_str():
    return datetime.now().strftime('%m%d-%H%M%S')

-# logger file and directory:
-global LOG_DIR, _FILE_HANDLER
+# globals: logger file and directory:
LOG_DIR = None
_FILE_HANDLER = None
...