Commit dd138d5a authored by Yuxin Wu's avatar Yuxin Wu

silent autograph warnings; update docs

parent 23239bd7
......@@ -26,10 +26,10 @@ matrix:
      env: TF_VERSION=1.3.0 TF_TYPE=release
    - os: linux
      python: 2.7
      env: TF_VERSION=1.12.0 TF_TYPE=release
      env: TF_VERSION=1.14.0 TF_TYPE=release
    - os: linux
      python: 3.6
      env: TF_VERSION=1.12.0 TF_TYPE=release PYPI=true
      env: TF_VERSION=1.14.0 TF_TYPE=release PYPI=true
    - os: linux
      python: 2.7
      env: TF_TYPE=nightly
......
......@@ -66,15 +66,17 @@ Efficiency:
1. After warmup, the training speed will slowly decrease due to more accurate proposals.
1. The code should have around 80~90% GPU utilization on V100s, and 85%~90% scaling
efficiency from 1 V100 to 8 V100s.
1. The code should have around 85~90% GPU utilization on one V100.
Scalability isn't very meaningful since the amount of computation each GPU performs is data-dependent.
If all images have the same spatial size (in which case the per-GPU computation is *still different*),
then an 85%~90% scaling efficiency is observed when using 8 V100s and `HorovodTrainer`.
1. This implementation does not use specialized CUDA ops (e.g. AffineChannel, ROIAlign).
Therefore it might be slower than other highly-optimized implementations.
1. To reduce RAM usage on host: (1) make sure you're using the "spawn" method as
set in `train.py`; (2) reduce `buffer_size` or `NUM_WORKERS` in `data.py`
(which may negatively impact your throughput). The training needs <10G RAM if `NUM_WORKERS=0`.
(which may negatively impact your throughput). The training only needs <10G RAM if `NUM_WORKERS=0` (see the "spawn" sketch after this hunk).
1. Inference is unoptimized. Tensorpack is a training interface, therefore it
does not help you on optimized inference. In fact, the current implementation
......
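
The RAM note above hinges on Python's multiprocessing start method: with `fork`, every data worker inherits a copy of the parent process's memory, while `spawn` starts workers from a fresh interpreter. A minimal, generic sketch of forcing "spawn" — not the repository's actual `train.py`; `NUM_WORKERS` and `load_sample` are illustrative names:

```python
# Minimal sketch: force the "spawn" start method so data-loading workers
# do not fork-copy the trainer process's RAM. Names here are illustrative.
import multiprocessing as mp

NUM_WORKERS = 4  # fewer workers -> less host RAM, possibly lower throughput


def load_sample(idx):
    # placeholder for per-sample loading / augmentation
    return idx


if __name__ == "__main__":
    mp.set_start_method("spawn")
    with mp.Pool(NUM_WORKERS) as pool:
        for sample in pool.imap(load_sample, range(16)):
            pass  # feed `sample` into the training input pipeline
```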
......@@ -257,10 +257,6 @@ def finalize_configs(is_training):
        if _C.TRAINER == 'horovod':
            import horovod.tensorflow as hvd
            ngpu = hvd.size()
            if ngpu == hvd.local_size():
                logger.warn("It's not recommended to use horovod for single-machine training. "
                            "Replicated trainer is more stable and has the same efficiency.")
        else:
            assert 'OMPI_COMM_WORLD_SIZE' not in os.environ
            ngpu = get_num_gpu()
......
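
For context on the removed warning: on a single machine both trainers are available in tensorpack, and the choice mainly changes how processes map to GPUs. A rough sketch of the contrast (the `make_trainer` helper is hypothetical, not part of the repository):

```python
# Illustrative only: the two single-machine options mentioned in the removed
# warning. Both trainer classes are part of tensorpack.train.
from tensorpack.train import HorovodTrainer, SyncMultiGPUTrainerReplicated


def make_trainer(use_horovod, num_gpus):
    if use_horovod:
        # One process per GPU; launched externally, e.g. `horovodrun -np 8 python train.py`.
        return HorovodTrainer()
    # A single process drives all GPUs with replicated copies of the graph.
    return SyncMultiGPUTrainerReplicated(num_gpus)
```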
......@@ -121,9 +121,10 @@ class TrainingDataPreprocessor:
    def __init__(self, cfg):
        self.cfg = cfg
        self.aug = imgaug.AugmentorList(
            [CustomResize(cfg.PREPROC.TRAIN_SHORT_EDGE_SIZE, cfg.PREPROC.MAX_SIZE), imgaug.Flip(horiz=True)]
        )
        self.aug = imgaug.AugmentorList([
            CustomResize(cfg.PREPROC.TRAIN_SHORT_EDGE_SIZE, cfg.PREPROC.MAX_SIZE),
            imgaug.Flip(horiz=True)
        ])

    def __call__(self, roidb):
        fname, boxes, klass, is_crowd = roidb["file_name"], roidb["boxes"], roidb["class"], roidb["is_crowd"]
......
......@@ -56,6 +56,15 @@ def internal_update_bn_ema(xn, batch_mean, batch_var,
        return tf.identity(xn, name='output')


try:
    # When BN is used as an activation, Keras layers try to autograph.convert it.
    # This leads to massive warnings, so we disable it.
    from tensorflow.python.autograph.impl.api import do_not_convert as disable_autograph
except ImportError:
    def disable_autograph():
        return lambda x: x


@layer_register()
@convert_to_tflayer_args(
    args_names=[],
......@@ -66,6 +75,7 @@ def internal_update_bn_ema(xn, batch_mean, batch_var,
        'decay': 'momentum',
        'use_local_stat': 'training'
    })
@disable_autograph()
def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
              center=True, scale=True,
              beta_initializer=tf.zeros_initializer(),
......
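
The point of the `ImportError` fallback above is that `@disable_autograph()` remains a valid decorator on any TF version; only the warning suppression is lost when the internal helper is missing. A self-contained sketch of the pattern (`toy_fn` is a made-up function, not part of tensorpack):

```python
# Self-contained illustration of the fallback pattern used in the commit.
try:
    # TF's internal helper marks a function so autograph will not try to convert it.
    from tensorflow.python.autograph.impl.api import do_not_convert as disable_autograph
except ImportError:
    # No-op stand-in: keeps `@disable_autograph()` valid when the helper is absent.
    def disable_autograph():
        return lambda x: x


@disable_autograph()
def toy_fn(x):
    return x * 2


print(toy_fn(3))  # -> 6, with or without the real TF helper
```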