Commit 4f0cb51e authored by Yuxin Wu's avatar Yuxin Wu

notes

parent dd7ddd94
@@ -7,6 +7,7 @@ Use Keras to define a model and train it with efficient tensorpack trainers.
### Simple Examples:
[mnist-keras.py](mnist-keras.py): a simple MNIST model written mostly in tensorpack style, but using a Keras model as the symbolic function.
[mnist-keras-v2.py](mnist-keras-v2.py): the same MNIST model written in Keras style.
### ImageNet Example:
@@ -16,5 +17,11 @@ reproduce exactly the same setting of [tensorpack ResNet example](../ResNet) on
 It has:
 + ResNet-50 model modified from [keras.applications](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/_impl/keras/applications/resnet50.py)
-+ Multi-GPU data-parallel __training and validation__ which scales (With 8 V100s, still has >90% GPU utilization and finished training in 21 hours)
-+ Good accuracy (same as the [tensorpack ResNet example](../ResNet))
++ Multi-GPU data-parallel __training and validation__ which scales
+  + With 8 V100s, still has >90% GPU utilization and finished 100 epochs in 19.5 hours
++ Good accuracy (same as [tensorpack ResNet example](../ResNet))
+
+Keras alone is not efficient enough to work on large models like this.
+In addition to tensorpack, [horovod](https://github.com/uber/horovod/blob/master/examples/keras_imagenet_resnet50.py)
+can also help you train large models with Keras.
@@ -179,7 +179,7 @@ if __name__ == '__main__':
             [(0, 0.1), (3, BASE_LR)], interp='linear'),  # warmup
         ScheduledHyperParamSetter(
             'learning_rate',
-            [(30, BASE_LR * 0.1), (60, BASE_LR * 1e-2), (85, BASE_LR * 1e-3), (100, BASE_LR * 1e-4)]),
+            [(30, BASE_LR * 0.1), (60, BASE_LR * 1e-2), (85, BASE_LR * 1e-3)]),
         GPUUtilizationTracker()
     ]
     if not args.fake:
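For reference, the combined warmup + step-decay schedule that the two `ScheduledHyperParamSetter` callbacks above implement can be sketched as a plain function of the epoch number (a minimal sketch; `base_lr` stands in for `BASE_LR`, whose value is computed elsewhere in the script):

```python
def lr_at_epoch(epoch, base_lr):
    """Sketch of the schedule above: linear warmup from (0, 0.1) to
    (3, base_lr), then step decay at epochs 30 / 60 / 85."""
    if epoch <= 3:
        # linear interpolation between (0, 0.1) and (3, base_lr), as
        # ScheduledHyperParamSetter does with interp='linear'
        return 0.1 + (base_lr - 0.1) * epoch / 3.0
    if epoch < 30:
        return base_lr
    if epoch < 60:
        return base_lr * 0.1
    if epoch < 85:
        return base_lr * 1e-2
    return base_lr * 1e-3
```

With the last decay point `(100, BASE_LR * 1e-4)` removed by this commit, the rate stays at `base_lr * 1e-3` from epoch 85 until the (now shorter) end of training.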
@@ -189,6 +189,6 @@ if __name__ == '__main__':
     M.fit(
         steps_per_epoch=100 if args.fake else 1281167 // TOTAL_BATCH_SIZE,
-        max_epoch=105,
+        max_epoch=100,
         callbacks=callbacks
     )
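The `steps_per_epoch` arithmetic above makes one epoch equal one full pass over ImageNet's 1,281,167 training images. A minimal sketch of the computation (the batch size here is an assumed example, not a value from the script):

```python
# Number of ImageNet training images, as used in the diff above.
IMAGENET_TRAIN_IMAGES = 1281167

# Assumed example configuration: 8 GPUs * 64 images per GPU.
total_batch_size = 8 * 64

# Integer division, matching `1281167 // TOTAL_BATCH_SIZE` in the diff;
# the remainder (a partial final batch) is dropped.
steps_per_epoch = IMAGENET_TRAIN_IMAGES // total_batch_size
print(steps_per_epoch)  # -> 2502
```

Together with the `max_epoch` change, the commit shortens training from 105 to 100 epochs, consistent with dropping the final learning-rate decay at epoch 100.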