Commit 4f0cb51e authored by Yuxin Wu

notes

parent dd7ddd94
@@ -7,6 +7,7 @@ Use Keras to define a model a train it with efficient tensorpack trainers.
 ### Simple Examples:
 [mnist-keras.py](mnist-keras.py): a simple MNIST model written mostly in tensorpack style, but use Keras model as symbolic functions.
 [mnist-keras-v2.py](mnist-keras-v2.py): the same MNIST model written in Keras style.
 ### ImageNet Example:
@@ -16,5 +17,11 @@ reproduce exactly the same setting of [tensorpack ResNet example](../ResNet) on
 It has:
 + ResNet-50 model modified from [keras.applications](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/_impl/keras/applications/resnet50.py)
-+ Multi-GPU data-parallel __training and validation__ which scales (With 8 V100s, still has >90% GPU utilization and finished training in 21 hours)
-+ Good accuracy (same as the [tensorpack ResNet example](../ResNet))
++ Multi-GPU data-parallel __training and validation__ which scales
++ With 8 V100s, still has >90% GPU utilization and finished 100 epochs in 19.5 hours
++ Good accuracy (same as [tensorpack ResNet example](../ResNet))
+Keras alone is not efficient enough to work on large models like this.
+In addition to tensorpack, [horovod](https://github.com/uber/horovod/blob/master/examples/keras_imagenet_resnet50.py)
+can also help you to train large models with Keras.
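As a rough sanity check on the new README figure, the average throughput it implies can be worked out from the ImageNet training-set size used in `steps_per_epoch` (1,281,167 images). This is a back-of-the-envelope sketch only: `avg_throughput` is a hypothetical helper, and the result is a wall-clock average rather than a measured per-step rate. It counts a full 1,281,167 images per epoch even though `steps_per_epoch` floors, and it folds any validation or input-pipeline time into the 19.5 hours.

```python
IMAGES_PER_EPOCH = 1281167  # ImageNet train-set size, as in the steps_per_epoch expression


def avg_throughput(epochs, hours, num_gpus, images_per_epoch=IMAGES_PER_EPOCH):
    """Hypothetical helper: wall-clock average images/s, overall and per GPU."""
    total = epochs * images_per_epoch / (hours * 3600)  # images divided by seconds
    return total, total / num_gpus


total, per_gpu = avg_throughput(epochs=100, hours=19.5, num_gpus=8)
print(f"{total:.0f} img/s overall, {per_gpu:.0f} img/s per GPU")
# prints: 1825 img/s overall, 228 img/s per GPU
```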
@@ -179,7 +179,7 @@ if __name__ == '__main__':
             [(0, 0.1), (3, BASE_LR)], interp='linear'), # warmup
         ScheduledHyperParamSetter(
             'learning_rate',
-            [(30, BASE_LR * 0.1), (60, BASE_LR * 1e-2), (85, BASE_LR * 1e-3), (100, BASE_LR * 1e-4)]),
+            [(30, BASE_LR * 0.1), (60, BASE_LR * 1e-2), (85, BASE_LR * 1e-3)]),
         GPUUtilizationTracker()
     ]
     if not args.fake:
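The effect of dropping the `(100, BASE_LR * 1e-4)` entry can be checked with a small pure-Python model of the two `ScheduledHyperParamSetter` callbacks. The model assumes tensorpack's documented semantics as we understand them: a pair `(ep, val)` takes effect after epoch `ep` completes (so `(0, ...)` applies before the first epoch), the warmup setter interpolates linearly between its points, and the step setter's value persists until its next entry. `scheduled_lr` and `lr_per_epoch` are hypothetical helpers, not tensorpack API, and `BASE_LR = 1.0` is a placeholder chosen so the decay factors are easy to read.

```python
import math

OLD_STEPS = [(30, 0.1), (60, 1e-2), (85, 1e-3), (100, 1e-4)]  # paired with max_epoch=105
NEW_STEPS = [(30, 0.1), (60, 1e-2), (85, 1e-3)]               # paired with max_epoch=100


def scheduled_lr(point, base_lr, steps):
    """Hypothetical helper: learning rate in effect once `point` epochs have completed."""
    if point <= 3:
        # Warmup setter: linear interpolation from (0, 0.1) to (3, base_lr).
        return 0.1 + (base_lr - 0.1) * point / 3
    lr = base_lr
    for ep, factor in steps:
        # Step setter: each entry holds until the next one is reached.
        if point >= ep:
            lr = base_lr * factor
    return lr


def lr_per_epoch(max_epoch, base_lr, steps):
    """Learning rate used while training epoch e (1-indexed): the value set after epoch e - 1."""
    return [scheduled_lr(e - 1, base_lr, steps) for e in range(1, max_epoch + 1)]


BASE_LR = 1.0  # placeholder, not the example's real BASE_LR
old_lrs = lr_per_epoch(105, BASE_LR, OLD_STEPS)
new_lrs = lr_per_epoch(100, BASE_LR, NEW_STEPS)

print(old_lrs[:100] == new_lrs)                       # True: first 100 epochs match
print(sum(math.isclose(lr, 1e-4) for lr in old_lrs))  # 5: epochs 101-105 at BASE_LR * 1e-4
print(sum(math.isclose(lr, 1e-4) for lr in new_lrs))  # 0
```

Under these assumptions the first 100 epochs are identical in both versions; the old configuration only appended epochs 101 to 105 at `BASE_LR * 1e-4`. With `max_epoch=100`, an entry at epoch 100 would fire only after training has finished, so it no longer has any effect.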
@@ -189,6 +189,6 @@ if __name__ == '__main__':
     M.fit(
         steps_per_epoch=100 if args.fake else 1281167 // TOTAL_BATCH_SIZE,
-        max_epoch=105,
+        max_epoch=100,
         callbacks=callbacks
     )
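The `steps_per_epoch` expression defines an epoch as a whole number of batches, so `max_epoch` directly fixes the total number of training steps. The sketch below mirrors that arithmetic; `schedule_length` is a hypothetical helper, and `TOTAL_BATCH_SIZE = 512` is a purely illustrative value, since the diff does not show the real one.

```python
IMAGES_PER_EPOCH = 1281167  # constant from the steps_per_epoch expression in the diff


def schedule_length(total_batch_size, max_epoch, fake=False):
    """Hypothetical helper mirroring the M.fit arguments.

    Returns (steps per epoch, total steps, images short of one full pass per epoch).
    """
    # Same expression as the diff: 100 steps in fake mode, else floor division.
    steps_per_epoch = 100 if fake else IMAGES_PER_EPOCH // total_batch_size
    # Floor division means one "epoch" can fall slightly short of the dataset size.
    shortfall = 0 if fake else IMAGES_PER_EPOCH - steps_per_epoch * total_batch_size
    return steps_per_epoch, steps_per_epoch * max_epoch, shortfall


TOTAL_BATCH_SIZE = 512  # illustrative assumption, not the value used in the example
new = schedule_length(TOTAL_BATCH_SIZE, max_epoch=100)
old = schedule_length(TOTAL_BATCH_SIZE, max_epoch=105)
print(new)              # (2502, 250200, 143)
print(old[1] - new[1])  # 12510 fewer steps after this commit
```

The absolute numbers depend on the real batch size, but the relationship holds for any value: lowering `max_epoch` from 105 to 100 removes exactly five epochs' worth of steps.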