Commit 67e1ac5b authored by Yuxin Wu

[DoReFa] more models

parent deaf1de2
......@@ -81,5 +81,6 @@ To run distributed training, first install horovod properly, then refer to the
documentation of [HorovodTrainer](../modules/train.html#tensorpack.train.HorovodTrainer).
Tensorpack has implemented some other distributed trainers using TF's native API,
-but TF's native support for distributed training isn't very high-performance even today.
+but TensorFlow is not actively supporting its distributed training features, and
+its native distributed performance isn't very good even today.
Therefore those trainers are not actively maintained and are not recommended for use.
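As a concrete illustration of the recommended alternative, here is a minimal, hedged sketch of launching training with Horovod through tensorpack's public API (it assumes a `TrainConfig` named `config` has already been built elsewhere in the training script):

```python
# Minimal sketch: data-parallel training via HorovodTrainer instead of the
# TF-native distributed trainers. `config` (a TrainConfig) is assumed to exist.
from tensorpack.train import HorovodTrainer, launch_train_with_config

launch_train_with_config(config, HorovodTrainer())
# launch with e.g.:  horovodrun -np 8 python train.py
# or:                mpirun -np 8 python train.py
```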
......@@ -14,17 +14,17 @@ This is a good set of baselines for research in model quantization.
These quantization techniques, when applied to AlexNet, achieve the following ImageNet performance in this implementation:
| Model | Bit Width <br/> (weights, activations, gradients) | Top 1 Validation Error <sup>[1](#ft1)</sup> |
-|:----------------------------------:|:-------------------------------------------------:|:-------------------------------------------------------------------------------:|
+|:----------------------------------:|:-------------------------------------------------:|:--------------------------------------------------------------------------------:|
| Full Precision<sup>[2](#ft2)</sup> | 32,32,32 | 40.3% |
| TTQ | t,32,32 | 42.0% |
-| BWN | 1,32,32 | 44.6% |
-| BNN | 1,1,32 | 51.9% |
+| BWN | 1,32,32 | 44.3% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,32,32.npz) |
+| BNN | 1,1,32 | 51.5% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,1,32.npz) |
| DoReFa | 8,8,8 | 42.0% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-8,8,8.npz) |
| DoReFa | 1,2,32 | 46.6% |
| DoReFa | 1,2,6 | 46.8% [:arrow_down:](http://models.tensorpack.com/DoReFa-Net/AlexNet-1,2,6.npz) |
| DoReFa | 1,2,4 | 54.0% |
<a id="ft1">1</a>: These numbers were obtained by training on 8 GPUs with a total batch size of 256.
<a id="ft1">1</a>: These numbers were obtained by training on 8 GPUs with a total batch size of 256 (otherwise the performance may become slightly different).
The DoReFa-Net models reach slightly better accuracy than reported in our paper, due to
more sophisticated data augmentation.
......
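For readers who want to try the checkpoints linked in the table above, here is a hedged sketch of restoring one of the released `.npz` files (the filename is illustrative; `DictRestore` and `get_model_loader` are the same tensorpack helpers this example script uses):

```python
# Hedged sketch: load a released DoReFa-Net checkpoint for inference or evaluation.
# 'AlexNet-1,2,6.npz' is an illustrative filename taken from the table above.
import numpy as np
from tensorpack.tfutils.sessinit import DictRestore, get_model_loader

session_init = DictRestore(dict(np.load('AlexNet-1,2,6.npz')))
# or, equivalently for .npz checkpoints:
session_init = get_model_loader('AlexNet-1,2,6.npz')
```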
......@@ -13,11 +13,13 @@ import sys
from tensorpack import *
from tensorpack.tfutils.summary import add_param_summary
from tensorpack.tfutils.sessinit import get_model_loader
from tensorpack.tfutils.varreplace import remap_variables
from tensorpack.dataflow import dataset
from tensorpack.utils.gpu import get_num_gpu
-from imagenet_utils import get_imagenet_dataflow, fbresnet_augmentor, ImageNetModel
+from imagenet_utils import (
+    get_imagenet_dataflow, fbresnet_augmentor, ImageNetModel, eval_on_ILSVRC12)
from dorefa import get_dorefa, ternarize
"""
......@@ -199,6 +201,7 @@ if __name__ == '__main__':
parser.add_argument('--dorefa', required=True,
help='number of bits for W,A,G, separated by comma. W="t" means TTQ')
parser.add_argument('--run', help='run on a list of images with the pretrained model', nargs='*')
parser.add_argument('--eval', action='store_true')
args = parser.parse_args()
dorefa = args.dorefa.split(',')
......@@ -215,6 +218,11 @@ if __name__ == '__main__':
assert args.load.endswith('.npz')
run_image(Model(), DictRestore(dict(np.load(args.load))), args.run)
sys.exit()
if args.eval:
    BATCH_SIZE = 128
    ds = get_data('val')
    eval_on_ILSVRC12(Model(), get_model_loader(args.load), ds)
    sys.exit()
nr_tower = max(get_num_gpu(), 1)
BATCH_SIZE = TOTAL_BATCH_SIZE // nr_tower
......
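Regarding the new `--eval` path added above: it is invoked along the lines of `./alexnet-dorefa.py --data /path/to/ilsvrc --dorefa 1,2,6 --load AlexNet-1,2,6.npz --eval`. The `get_data('val')` call is the script's dataflow helper; below is a hedged sketch of what such a helper is assumed to look like, built only from the imports shown earlier (the real one may differ in details):

```python
# Hedged sketch of a get_data() helper (illustrative; based on the imports above).
def get_data(name):
    is_train = name == 'train'
    augmentors = fbresnet_augmentor(is_train)   # standard ResNet-style augmentation pipeline
    return get_imagenet_dataflow(args.data, name, BATCH_SIZE, augmentors)
```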
......@@ -80,7 +80,7 @@ MaskRCNN results contain both box and mask mAP.
| R50-FPN | 39.8;35.5 | 39.5;34.4<sup>[2](#ft2)</sup> | 34h | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> |
| R50-FPN | 40.3;36.4 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R50FPN-MaskRCNN-StandardGN.npz) | 40.3;35.7 | 44h | <details><summary>standard+GN</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.NORM=GN BACKBONE.NORM=GN`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head`<br/>`FPN.MRCNN_HEAD_FUNC=maskrcnn_up4conv_gn_head` </details> |
| R101-C4 | 41.7;35.5 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101C4-MaskRCNN-Standard.npz) | | 63h | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
-| R101-FPN | 40.7;36.9[:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101FPN-MaskRCNN-Standard.npz) | 40.9;36.4 | 40h | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
+| R101-FPN | 40.7;36.9 [:arrow_down:](http://models.tensorpack.com/FasterRCNN/COCO-R101FPN-MaskRCNN-Standard.npz) | 40.9;36.4 | 40h | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True`<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
<a id="ft1">1</a>: This implementation has slightly different configurations from detectron (e.g. batch size).
......
......@@ -102,8 +102,8 @@ def image_preprocess(image, bgr=True):
mean = mean[::-1]
std = std[::-1]
image_mean = tf.constant(mean, dtype=tf.float32)
-image_std = tf.constant(std, dtype=tf.float32)
-image = (image - image_mean) / image_std
+image_invstd = tf.constant(1.0 / std, dtype=tf.float32)
+image = (image - image_mean) * image_invstd
return image
......
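The change above replaces a per-element division by the std with a multiplication by a precomputed reciprocal, which is mathematically equivalent but avoids a division op in the graph. A self-contained, hedged sketch of the same idea (illustrative function name; NumPy is used so the reciprocal also works when `std` is a plain Python list):

```python
# Hedged sketch: mean/std normalization using a precomputed reciprocal of std.
import numpy as np
import tensorflow as tf

def normalize_image(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    image_mean = tf.constant(mean, dtype=tf.float32)
    # divide once here, instead of a per-element division inside the graph
    image_invstd = tf.constant(1.0 / np.asarray(std), dtype=tf.float32)
    return (image - image_mean) * image_invstd
```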
......@@ -81,7 +81,7 @@ def rpn_losses(anchor_labels, anchor_boxes, label_logits, box_logits):
add_moving_summary(*summaries)
# Per-level loss summaries in FPN may appear lower due to the use of a small placeholder.
-# But the total loss is still the same. TODO make the summary op smarter
+# But the total RPN loss will be fine. TODO make the summary op smarter
placeholder = 0.
label_loss = tf.nn.sigmoid_cross_entropy_with_logits(
labels=tf.to_float(valid_anchor_labels), logits=valid_label_logits)
......
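The comment above refers to a pattern where a constant placeholder is summarized when a level has no valid anchors, so the per-level summary never averages over an empty tensor; this can make per-level numbers look lower while the summed RPN loss stays correct. A hedged sketch of that pattern (`valid_anchor_labels`, `valid_label_logits` and `nr_valid` follow the hunk above or are assumed; the 256 normalizer is an assumption, not the file's exact constant):

```python
import tensorflow as tf

# Hedged sketch of the placeholder pattern (illustrative, not the verbatim file content).
placeholder = 0.
label_loss = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=tf.to_float(valid_anchor_labels), logits=valid_label_logits)
label_loss = tf.reduce_sum(label_loss) * (1. / 256)   # 256: assumed number of sampled anchors
label_loss = tf.where(tf.equal(nr_valid, 0), placeholder, label_loss, name='label_loss')
```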
......@@ -3,7 +3,7 @@ ImageNet training code of ResNet, ShuffleNet, DoReFa-Net, AlexNet, Inception, VG
To train any of the models, just do `./{model}.py --data /path/to/ilsvrc`.
Expected format of data directory is described in [docs](http://tensorpack.readthedocs.io/en/latest/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12).
-Pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/).
+Some pretrained models can be downloaded at [tensorpack model zoo](http://models.tensorpack.com/).
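As a quick way to verify the data-directory layout mentioned above, the ILSVRC12 dataflow can be constructed directly; a hedged sketch (the path is a placeholder):

```python
# Hedged sketch: sanity-check the ILSVRC12 directory layout by building the dataflow.
from tensorpack.dataflow import dataset

ds = dataset.ILSVRC12('/path/to/ilsvrc', 'val', shuffle=False)
print(ds.size())   # expected: 50000 images in the validation set
```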
### ShuffleNet
......@@ -35,11 +35,7 @@ accuracy after 100 epochs (21 hours on 2 V100s).
It also writes the first-layer filter visualizations to TensorBoard, similar to those in the paper.
See `./alexnet.py --help` for usage.
-### Inception-BN, VGG16
-This Inception-BN script reaches 27% single-crop validation error after 300k steps with 6 GPUs.
-The training recipe is very different from the original paper because the paper
-is a bit vague on these details.
+### VGG16
This VGG16 script, when trained with 32x8 batch size, reaches the following
validation error after 100 epochs (30h with 8 P100s). This is the code for the VGG
......@@ -53,6 +49,12 @@ See `./vgg16.py --help` for usage.
Note that the purpose of this experiment in the paper is not to claim GroupNorm is better
than BatchNorm; therefore, the training settings and hyperparameters have not been individually tuned for best accuracy.
+### Inception-BN
+This Inception-BN script reaches 27% single-crop validation error after 300k steps with 6 GPUs.
+The training recipe is very different from the original paper because the paper
+is a bit vague on these details.
### ResNet
See [ResNet examples](../ResNet). It includes variants like pre-activation
......
......@@ -72,9 +72,10 @@ class DistributedParameterServerBuilder(DataParallelBuilder, DistributedBuilderB
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
However this implementation hasn't been well tested.
It probably still has issues in model saving, etc.
Also, the TensorFlow team is not actively maintaining its distributed training features.
Check :class:`HorovodTrainer` and
`ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
-for faster distributed examples.
+for better distributed training support.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
......@@ -143,10 +144,11 @@ class DistributedReplicatedBuilder(DataParallelBuilder, DistributedBuilderBase):
It is an equivalent of ``--variable_update=distributed_replicated`` in
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
-Note that the performance of this trainer is still not satisfactory.
+Note that the performance of this trainer is still not satisfactory,
+and the TensorFlow team is not actively maintaining its distributed training features.
Check :class:`HorovodTrainer` and
`ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
-for faster distributed examples.
+for better distributed training support.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
......
......@@ -11,6 +11,7 @@ from ..utils import logger
__all__ = []
# TODO should also describe model_variables
def describe_trainable_vars():
"""
Print a description of the current model parameters.
......
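For completeness, the helper touched above can be called directly after the model graph has been built; a hedged usage sketch (assuming the file above is `tensorpack/tfutils/model_utils.py`):

```python
# Hedged sketch: print a table of all trainable variables (name, shape, #params)
# of the graph that has been built in the current default graph.
from tensorpack.tfutils.model_utils import describe_trainable_vars

describe_trainable_vars()
```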