Commit 964f5d03 authored by Yuxin Wu

update docs; add BN WD in resnet

parent 274c7544
@@ -34,9 +34,9 @@ See [tutorials and documentations](http://tensorpack.readthedocs.io/tutorial/ind
 ## Examples:
-We refuse toy examples.
-Instead of showing you 10 arbitrary networks trained on toy datasets,
-[Tensorpack examples](examples) faithfully replicate papers and care about reproducing numbers,
+We refuse toy examples. We refuse low-quality implementations.
+Unlike most open source repos which only __implement__ papers,
+[Tensorpack examples](examples) faithfully __reproduce__ papers,
 demonstrating its __flexibility__ for actual research.
 ### Vision:
...
@@ -20,9 +20,9 @@ Claimed performance in the paper can be reproduced, on several games I've tested
 ![DQN](curve-breakout.png)
-On one (Maxwell) TitanX, Double-DQN took ~18 hours of training to reach a score of 400 on breakout.
-Double-DQN with nature paper setting runs at 60 batches (3840 trained frames, 240 seen frames, 960 game frames) per second on TitanX.
+On one GTX 1080Ti, the ALE version took ~3 hours of training to reach 21 (maximum) score on
+Pong, ~15 hours of training to reach 400 score on Breakout.
+It runs at 50 batches (~3.2k trained frames, 200 seen frames, 800 game frames) per second on GTX 1080Ti.
 ## How to use
...
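A quick check on the numbers above: assuming the standard settings in this example (batch size 64, frame skip 4, one training step every 4 seen frames), 50 batches/s works out to 50 × 64 = 3200 trained frames, 50 × 4 = 200 seen frames, and 200 × 4 = 800 game frames per second.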
@@ -423,7 +423,8 @@ if __name__ == '__main__':
     if args.visualize or args.evaluate or args.predict:
         if not tf.test.is_gpu_available():
             from tensorflow.python.framework import test_util
-            assert test_util.IsMklEnabled(), "Inference requires either GPU support or MKL support!"
+            assert get_tf_version_tuple() >= (1, 7) and test_util.IsMklEnabled(), \
+                "Inference requires either GPU support or MKL support!"
         assert args.load
     finalize_configs(is_training=False)
...
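The guard added above needs the version check because `test_util.IsMklEnabled()` only exists in TF >= 1.7. A standalone sketch of the same logic, using a hypothetical stand-in for tensorpack's `get_tf_version_tuple`:

```python
import tensorflow as tf
from tensorflow.python.framework import test_util

def get_tf_version_tuple():
    # hypothetical stand-in for tensorpack's helper:
    # parse a version string like "1.13.1" into (1, 13) for easy comparison
    return tuple(map(int, tf.__version__.split('.')[:2]))

if not tf.test.is_gpu_available():
    # IsMklEnabled() is only available in TF >= 1.7, hence the version guard
    assert get_tf_version_tuple() >= (1, 7) and test_util.IsMklEnabled(), \
        "Inference requires either GPU support or MKL support!"
```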
@@ -9,14 +9,15 @@ Github is full of deep learning code that "implements" but does not "reproduce"
 methods, and you'll not know whether the implementation is actually correct.
 See [Unawareness of Deep Learning Mistakes](https://medium.com/@ppwwyyxx/unawareness-of-deep-learning-mistakes-d5b5774da0ba).
-We refuse toy examples.
-Instead of showing you 10 arbitrary networks trained on toy datasets with random final performance,
-tensorpack examples try to faithfully replicate experiments and performance in the paper,
+We refuse toy examples. We refuse low-quality implementations.
+Unlike most open source repos which only __implement__ methods,
+[Tensorpack examples](examples) faithfully __reproduce__
+experiments and performance in the paper,
 so you're confident that they are correct.
 ## Getting Started:
-These are all the toy examples in tensorpack. They are supposed to be just demos.
+These are the only toy examples in tensorpack. They are supposed to be just demos.
 + [An illustrative MNIST example with explanation of the framework](basics/mnist-convnet.py)
 + Tensorpack supports any symbolic libraries. See the same MNIST example written with [tf.layers](basics/mnist-tflayers.py), [tf-slim](basics/mnist-tfslim.py), and [with weights visualizations](basics/mnist-visualizations.py)
 + A tiny [Cifar ConvNet](basics/cifar-convnet.py) and [SVHN ConvNet](basics/svhn-digit-convnet.py)
@@ -27,7 +28,7 @@ These are all the toy examples in tensorpack. They are supposed to be just demos
 | Name | Performance |
 | --- | --- |
 | Train [ResNet](ResNet), [ShuffleNet and other models](ImageNetModels) on ImageNet | reproduce paper |
 | [Train Mask/Faster R-CNN on COCO](FasterRCNN) | reproduce paper |
 | [Generative Adversarial Network(GAN) variants](GAN), including DCGAN, InfoGAN, <br/> Conditional GAN, WGAN, BEGAN, DiscoGAN, Image to Image, CycleGAN | visually reproduce |
 | [DoReFa-Net: training binary / low-bitwidth CNN on ImageNet](DoReFa-Net) | reproduce paper |
 | [Fully-convolutional Network for Holistically-Nested Edge Detection(HED)](HED) | visually reproduce |
@@ -37,7 +38,7 @@ These are all the toy examples in tensorpack. They are supposed to be just demos
 | Single-image super-resolution using [EnhanceNet](SuperResolution) | |
 | Learn steering filters with [Dynamic Filter Networks](DynamicFilterNetwork) | visually reproduce |
 | Load a pre-trained [AlexNet, VGG, or Convolutional Pose Machines](CaffeModels) | |
 | Load a pre-trained [FlowNet2-S, FlowNet2-C, FlowNet2](OpticalFlow) | |
 ## Reinforcement Learning:
 | Name | Performance |
...
@@ -103,16 +103,25 @@ def get_config(model):
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
+    # generic:
     parser.add_argument('--gpu', help='comma separated list of GPU(s) to use. Default to use all available ones')
-    parser.add_argument('--data', help='ILSVRC dataset dir')
+    parser.add_argument('--eval', action='store_true', help='run offline evaluation instead of training')
     parser.add_argument('--load', help='load a model for training or evaluation')
+
+    # data:
+    parser.add_argument('--data', help='ILSVRC dataset dir')
     parser.add_argument('--fake', help='use FakeData to debug or benchmark this model', action='store_true')
     parser.add_argument('--symbolic', help='use symbolic data loader', action='store_true')
-    parser.add_argument('--data-format', help='image data format',
+
+    # model:
+    parser.add_argument('--data-format', help='the image data layout used by the model',
                         default='NCHW', choices=['NCHW', 'NHWC'])
     parser.add_argument('-d', '--depth', help='ResNet depth',
                         type=int, default=50, choices=[18, 34, 50, 101, 152])
-    parser.add_argument('--eval', action='store_true', help='run offline evaluation instead of training')
+    parser.add_argument('--weight-decay-norm', action='store_true',
+                        help="apply weight decay on normalization layers (gamma & beta). "
+                             "This is used in torch/pytorch, and slightly "
+                             "improves validation accuracy of large models.")
     parser.add_argument('--batch', default=256, type=int,
                         help="total batch size. "
                         "Note that it's best to keep per-GPU batch size in [32, 64] to obtain the best accuracy."
@@ -126,6 +135,9 @@ if __name__ == '__main__':
     model = Model(args.depth, args.mode)
     model.data_format = args.data_format
+    if args.weight_decay_norm:
+        model.weight_decay_pattern = ".*/W|.*/gamma|.*/beta"
+
     if args.eval:
         batch = 128    # something that can run on one gpu
         ds = get_imagenet_dataflow(args.data, 'val', batch)
...
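The new flag works by widening the regex that selects variables for L2 regularization, from just the conv/FC weights (`.*/W`) to also the `gamma` and `beta` of normalization layers. A minimal sketch of regex-based weight decay (a hypothetical helper, not tensorpack's own `regularize_cost`):

```python
import re
import tensorflow as tf

def l2_cost_by_pattern(pattern, weight_decay):
    # collect trainable variables whose names match the pattern,
    # e.g. ".*/W|.*/gamma|.*/beta", and build a single L2 cost from them
    regex = re.compile(pattern)
    matched = [v for v in tf.trainable_variables() if regex.match(v.name)]
    return tf.multiply(weight_decay,
                       tf.add_n([tf.nn.l2_loss(v) for v in matched]),
                       name='l2_regularize_cost')
```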
@@ -198,7 +198,7 @@ def Conv2DTranspose(
     else:
         # Our own implementation, to avoid Keras bugs. https://github.com/tensorflow/tensorflow/issues/25946
         assert kernel_regularizer is None and bias_regularizer is None and activity_regularizer is None, \
-            "Unsupported arguments due to bug in TensorFlow 1.13"
+            "Unsupported arguments due to Keras bug in TensorFlow 1.13"
         data_format = get_data_format(data_format, keras_mode=False)
         shape_dyn = tf.shape(inputs)
         strides2d = shape2d(strides)
...
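The workaround mentioned in the comment computes `conv2d_transpose`'s output shape from the runtime `tf.shape(inputs)` rather than from static shapes, so unknown batch or spatial sizes still work. A minimal sketch of that idea, assuming NHWC layout and 'SAME' padding (the names here are illustrative, not tensorpack's):

```python
import tensorflow as tf

def conv2d_transpose_dynamic(inputs, kernel, stride):
    # kernel shape: [kh, kw, out_channels, in_channels]
    shape_dyn = tf.shape(inputs)  # usable even when shapes are unknown at graph-build time
    out_shape = tf.stack([shape_dyn[0],             # batch
                          shape_dyn[1] * stride,    # height
                          shape_dyn[2] * stride,    # width
                          int(kernel.shape[2])])    # output channels
    return tf.nn.conv2d_transpose(inputs, kernel, out_shape,
                                  strides=[1, stride, stride, 1],
                                  padding='SAME')
```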
@@ -226,7 +226,7 @@ def is_training_name(name):
         return True
     if name.endswith('/Adagrad'):
         return True
-    if name.startswith('EMA/'):  # all the moving average summaries
+    if name.startswith('EMA/') or '/EMA/' in name:  # all the moving average summaries
         return True
     if name.startswith('AccumGrad') or name.endswith('/AccumGrad'):
         return True
...
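The broadened check above also catches EMA summaries created inside a name scope (e.g. under a training tower), not only those at the top level. A quick hypothetical sanity check of the predicate:

```python
def is_ema_name(name):
    # same predicate as in the diff above
    return name.startswith('EMA/') or '/EMA/' in name

assert is_ema_name('EMA/loss')          # matched before and after this change
assert is_ema_name('tower0/EMA/loss')   # only matched after this change
assert not is_ema_name('EMAx/loss')     # no full 'EMA' path segment, so not matched
```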
@@ -56,7 +56,7 @@ def launch_train_with_config(config, trainer):
     2. Call `trainer.setup_graph` with the input as well as `config.model`.
     3. Call `trainer.train` with rest of the attributes of config.
-    See tutorial at
+    See the `related tutorial
     <https://tensorpack.readthedocs.io/tutorial/training-interface.html#with-modeldesc-and-trainconfig>`_
     to learn more.
...
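For reference, a minimal usage sketch of `launch_train_with_config`, following the three steps in the docstring (assuming `MyModel` is a `ModelDesc` subclass defined elsewhere):

```python
from tensorpack import TrainConfig, SimpleTrainer, launch_train_with_config
from tensorpack.dataflow import FakeData

# FakeData stands in for a real dataflow; MyModel is hypothetical
config = TrainConfig(
    model=MyModel(),
    dataflow=FakeData([[64, 224, 224, 3], [64]], size=100),
    max_epoch=1,
)
launch_train_with_config(config, SimpleTrainer())
```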