Shashank Suhas / seminar-breakout / Commits

Commit b785bf77
authored Aug 02, 2017 by Yuxin Wu
update trainer doc
parent 2d856f94
Showing 2 changed files with 26 additions and 19 deletions:

  docs/tutorial/trainer.md            +25  -18
  examples/ResNet/imagenet-resnet.py   +1   -1
docs/tutorial/trainer.md
@@ -5,37 +5,44 @@ Training is **running something again and again**.
 Tensorpack base trainer implements the logic of __running the iteration__.
 Users or derived trainers should implement __what the iteration is__.

+### Common Trainers
+
 Most neural network training tasks are single-cost optimization.
 Tensorpack provides some trainer implementations for such tasks.
 These trainers will build the graph based on the given `ModelDesc`, and minimizes `ModelDesc.cost`.

-Existing trainers were implemented with certain prefetch mechanism,
-which will run significantly faster than a naive `sess.run(..., feed_dict={...})`.
-There are also Multi-GPU trainers which include the logic of data-parallel Multi-GPU training.
-You can enable them by just one line, and all the necessary logic to achieve the best performance was baked into the trainers already.
-For example, SyncMultiGPUTrainer can train ResNet50 as fast as the [official tensorflow benchmark](https://github.com/tensorflow/benchmarks).
-
 To use trainers, pass a `TrainConfig` to configure them:

 ```python
 config = TrainConfig(
     model=MyModel()
     dataflow=my_dataflow,
+    # data=my_inputsource, # alternatively, use a customized InputSource
     callbacks=[...]
 )

-# start training (with a slow trainer. See 'tutorials - Input Pipeline' for details):
-# SimpleTrainer(config).train()
-
-# start training with queue prefetch:
-QueueInputTrainer(config).train()
+# start training:
+SomeTrainer(config, other_arguments).train()

 # start multi-GPU training with a synchronous update:
-# SyncMultiGPUTrainer(config).train()
+# SyncMultiGPUTrainerParameterServer(config).train()
 ```

-Trainers just run __some__ iterations, so there is no limit in where the data come from
-or what to do in an iteration.
-For example, [GAN trainer](../examples/GAN/GAN.py) minimizes
-two cost functions alternatively.
+When you set the DataFlow (rather than the InputSource) in the config,
+tensorpack trainers automatically pick up certain prefetch mechanism,
+which will run faster than a naive `sess.run(..., feed_dict={...})`.
+You can set the InputSource instead, to customize this behavior.
+
+Existing multi-GPU trainers include the logic of data-parallel training.
+You can enable them by just one line, and all the necessary logic to achieve the best performance was baked into the trainers already.
+The trainers can reach the same performance as the [official tensorflow benchmark](https://github.com/tensorflow/benchmarks).
+
+Please note that, in data-parallel training, all towers (all replicates of the model) will take
+tensors from the InputSource (instead of taking one for all and split). So the total batch size
+would be multiplied by the number of GPUs.
+
+### Custom Trainers
+
+Trainers just run __some__ iterations, so there is no limit in where the data come from or what to do in an iteration.
+For example, [GAN trainer](../examples/GAN/GAN.py) minimizes two cost functions alternatively.
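The new note on data-parallel training has a concrete implication worth spelling out: every tower pulls its own tensors from the InputSource, so the per-step batch seen by the optimizer grows with the number of GPUs. Below is a minimal sketch of how the updated example might look in practice; `MyModel`, `my_raw_dataflow`, the batch size of 32, the GPU count, and the `nr_tower` field are illustrative assumptions rather than part of this commit.

```python
# Sketch only: MyModel and my_raw_dataflow are placeholder names, and the
# batch size / GPU count are arbitrary choices made for illustration.
from tensorpack import TrainConfig, SyncMultiGPUTrainerParameterServer
from tensorpack.dataflow import BatchData

NR_GPU = 4
PER_GPU_BATCH = 32              # each tower consumes one such batch per step

# my_raw_dataflow is assumed to yield individual training samples
my_dataflow = BatchData(my_raw_dataflow, PER_GPU_BATCH)

config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,       # a DataFlow: a prefetch mechanism is picked automatically
    callbacks=[],               # add callbacks as needed
    nr_tower=NR_GPU,            # assumed field of the era for requesting multiple towers
)

# All NR_GPU towers read from the same InputSource, so the effective
# batch size is NR_GPU * PER_GPU_BATCH = 128 samples per step.
SyncMultiGPUTrainerParameterServer(config).train()
```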
examples/ResNet/imagenet-resnet.py
@@ -106,7 +106,7 @@ def get_config(fake=False, data_format='NCHW'):
             ClassificationError('wrong-top1', 'val-error-top1'),
             ClassificationError('wrong-top5', 'val-error-top5')]),
         ScheduledHyperParamSetter('learning_rate',
-                                  [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]),
+                                  [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5), (105, 1e-6)]),
         HumanHyperParamSetter('learning_rate'),
     ],
     steps_per_epoch=5000,
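The only change to imagenet-resnet.py is one extra entry in the learning-rate schedule, so the rate now decays once more, to 1e-6 after epoch 105, instead of stopping at 1e-5. The schedule passed to ScheduledHyperParamSetter is a list of (epoch, value) pairs applied piecewise-constantly. The helper below is purely illustrative (it is not part of tensorpack or of this commit) and assumes an initial rate of 0.1; it only shows which value the extended schedule implies at a given epoch.

```python
# Illustrative helper, not tensorpack code: resolve the value a schedule of
# (epoch, value) pairs implies at a given epoch, treating it as piecewise constant.
SCHEDULE = [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5), (105, 1e-6)]

def scheduled_value(epoch, schedule=SCHEDULE, initial=0.1):
    """Return the hyperparameter value in effect at `epoch`."""
    value = initial                 # assumed starting learning rate
    for e, v in schedule:
        if epoch >= e:
            value = v               # the latest reached schedule point wins
        else:
            break
    return value

assert scheduled_value(10) == 0.1      # before the first decay point
assert scheduled_value(100) == 1e-5    # between epochs 95 and 105
assert scheduled_value(110) == 1e-6    # the new final step added by this commit
```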