Commit e0c1ee77 authored by Yuxin Wu

update docs about trainer

parent fa69c70a
...@@ -51,10 +51,10 @@ the rest of the data pipeline.

If you're using DataFlow with tensorpack, also see [Input Pipeline tutorial](input-source.html)
on how tensorpack further accelerates data loading in the graph.
Nevertheless, tensorpack supports data loading with native TF operators / TF datasets as well.
### Use DataFlow (outside Tensorpack)

Normally, the tensorpack `InputSource` interface links DataFlow to the graph for training.
If you use DataFlow in some custom code, call `reset_state()` first to initialize it,
and then use the generator however you like:
```python
...
```
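As a concrete illustration, the pattern looks roughly like this (a minimal sketch; `Mnist` and `BatchData` are just one possible DataFlow, and `get_data()` is the generator interface of this tensorpack version):

```python
from tensorpack.dataflow import BatchData, dataset

df = BatchData(dataset.Mnist('train'), 128)   # any DataFlow can be used the same way
df.reset_state()                              # initialize it once before iterating
for datapoint in df.get_data():               # get_data() is a plain Python generator
    images, labels = datapoint                # a datapoint is a list of components
    pass                                      # ... use the data however you like
```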
## Understand Trainer
### Role of Trainer
Tensorpack follows the "define-and-run" paradigm. Training happens in two steps:

1. __Define__: Build the graph for the model.
Users can call whatever TensorFlow functions they need to set up the graph.
Users may or may not use tensorpack `InputSource`, `ModelDesc` or other utilities to build the graph.
The goal of this step is to define "what to run" in later training steps,
and it can happen __either inside or outside__ a tensorpack trainer.
2. __Run__: Train the model (the [Trainer.train() method](../modules/train.html#tensorpack.train.Trainer.train)):
1. Setup callbacks/monitors.
2. Finalize graph, initialize session.
3. Run the training loop.
### Assumptions of Base Trainer
* Q: What types of training can you do with tensorpack?
* A: Anything that runs in a loop.
In research we do training of various kinds.
Tensorpack trainers avoid making assumptions on what type of training
you want to do (e.g., it doesn't have to be batched, SGD-like, or have `X` (inputs) and `y` (outputs)).
The only assumption is that your training follows this pattern:
```python
for epoch_num in range(starting_epoch, max_epoch):
for local_step in range(steps_per_epoch):
run_step()
```
1. Training is **running some iterations**.
Tensorpack base trainer implements the logic of __running the iteration__.
Users or derived trainers should implement __what the iteration is__.
2. Trainer assumes the existence of __"epoch"__, i.e. that the iterations run in double for-loops.
But `steps_per_epoch` can be any number you set
and it only affects the [schedule of callbacks](extend/callback.html).
In other words, an "epoch" in tensorpack is the __default period to run callbacks__ (validation, summary, checkpoint, etc.).
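To make the role of `steps_per_epoch` concrete, the double loop together with callbacks can be pictured like this (a conceptual sketch, not the actual tensorpack source; `callbacks` stands for everything registered with the trainer):

```python
for epoch_num in range(starting_epoch, max_epoch):
    for local_step in range(steps_per_epoch):
        run_step()                   # whatever the trainer defines as one iteration
        callbacks.trigger_step()     # cheap per-step hooks
    callbacks.trigger_epoch()        # validation, summary, checkpoint, ...
```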
### How Existing (Single-Cost) Trainers Work
Most neural network training tasks are single-cost optimization.
Tensorpack provides some trainer implementations for such tasks.
These trainers will take care of step 1 (define the graph), with the following arguments:
1. Some `InputDesc`, the metadata about the input.
2. An `InputSource`, where the input comes from. See [Input Pipeline](input-source.html).
3. A function which takes input tensors and returns the cost.
4. A function which returns an optimizer.
These are documented in [SingleCostTrainer.setup_graph](../modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph).
In practice you will rarely call this method directly; use the [high-level interface](training-interface.html#with-modeldesc-and-trainconfig) instead.
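For illustration, the four arguments might look roughly like this (a sketch only; `my_dataflow` is a placeholder, and the exact import paths and signatures are the ones documented on the linked API page):

```python
import tensorflow as tf
from tensorpack import InputDesc, QueueInput

inputs_desc = [InputDesc(tf.float32, [None, 28, 28], 'image'),   # 1. metadata about the input
               InputDesc(tf.int32, [None], 'label')]
my_input = QueueInput(my_dataflow)            # 2. where the input comes from (any DataFlow)

def get_cost_fn(image, label):                # 3. input tensors -> a cost tensor
    logits = tf.layers.dense(tf.reshape(image, [-1, 28 * 28]), 10)
    return tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)

def get_opt_fn():                             # 4. returns an optimizer
    return tf.train.AdamOptimizer(1e-3)
```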
### Write a Trainer
The existing trainers should be enough for single-tower single-cost optimization tasks.
If you just want to do some extra work during training, first consider writing it as a callback,
or write an issue to see if there is a better solution than creating new trainers.
If your task is fundamentally different from single-cost optimization, you will need to write a trainer.
The existing common trainers all implement two things:
1. Set up the graph and input pipeline, using the given `InputSource` and `get_cost_fn`.
2. Minimize `model.cost` in each iteration.

But you can customize training by either using or inheriting the base `Trainer` class.
Trainers just run __some__ iterations, so there is no limit on where the data comes from or what an iteration does.
You will need to define two things for a new Trainer:

1. Define the graph.
   Add any tensors and ops you like, either before creating the trainer or inside `Trainer.__init__`.
2. Define the iteration. There are two ways to do it (see the sketch after this list):
   1. Set `Trainer.train_op`. This op will be run by default.
   2. Subclass `Trainer` and override the `run_step()` method. This way you can do more than just running an op.
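A minimal sketch of such a trainer, under the assumptions above (the top-level import and the no-argument `Trainer()` constructor reflect this version of the API; `loss` and `optimizer` are placeholders):

```python
from tensorpack import Trainer

class MyTrainer(Trainer):
    def __init__(self, loss, optimizer):
        super(MyTrainer, self).__init__()
        # 1. the graph: add any tensors/ops you like here (or build them beforehand)
        self.train_op = optimizer.minimize(loss)   # 2a. the default iteration runs this op

    # 2b. alternatively, override run_step() to do more than running a single op:
    # def run_step(self):
    #     self.hooked_sess.run(self.train_op)
    #     # ... fetch extra tensors, update Python-side state, etc.
```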
...
...@@ -13,7 +13,7 @@ A High Level Glance

They will eventually be wrapped under the same ``InputSource`` interface and go through prefetching.
* You can use any TF-based symbolic function library to define a model, including
a small set of functions within tensorpack. ``ModelDesc`` is an interface to connect the graph with the
``InputSource`` interface.
* tensorpack trainers manage the training loops for you.
...@@ -38,7 +38,6 @@ User Tutorials

dataflow
input-source
symbolic
trainer
training-interface

...@@ -47,8 +46,19 @@ User Tutorials

summary
faq
Performance
============

.. toctree::
  :maxdepth: 1

  efficient-dataflow
  performance-tuning
Extend Tensorpack
==================

.. toctree::
  :maxdepth: 1

...@@ -58,10 +68,3 @@ Extend Tensorpack

  extend/model
  extend/callback
  extend/trainer
...@@ -102,6 +102,6 @@ For example,

Come from some `InputSource`, then prefetched on GPU by a TF StagingArea.
4. Come from a DataFlow, and further processed by `tf.data.Dataset`.
5. [TensorInput](../modules/input_source.html#tensorpack.input_source.TensorInput):
Come from some TF reading ops (see the sketch after this list).
6. Come from some ZMQ pipe, where the load/preprocessing may happen on a different machine.
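For item 5, a rough sketch of wrapping TF reading ops with `TensorInput` (the `tf.data` pipeline and the numpy arrays `images`/`labels` are placeholders for your own reading ops; the constructor arguments follow the `TensorInput(get_tensor_fn, size=None)` signature):

```python
import tensorflow as tf
from tensorpack import TensorInput

def get_tensors():
    ds = tf.data.Dataset.from_tensor_slices((images, labels)).repeat().batch(32)
    img, label = ds.make_one_shot_iterator().get_next()
    return [img, label]

my_input = TensorInput(get_tensors, size=1000)   # size: how many iterations it can serve (optional)
```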
# Trainers

Tensorpack trainers contain the logic of:

1. Building the graph.
2. Running the iterations (with callbacks).

Usually you won't touch these methods directly, but use a
[higher-level interface](training-interface.html) on trainers.
You'll only need to __select__ what trainer to use.
### Tower Trainer

Following the terminology in TensorFlow,
a "tower" function is something that takes input tensors and adds __one replicate__ of the model to the graph.
Most types of neural-network training could fall into this category.
This concept is used mainly to support:

1. Data-parallel multi-GPU training, where a replicate is built on each GPU.
2. Automatically building the graph for inference, where a replicate is built under inference mode.
### MultiGPU Trainers

For data-parallel multi-GPU training, different [multi-GPU trainers](http://tensorpack.readthedocs.io/en/latest/modules/train.html)
implement different parallel logic, all reaching the same performance as the
[official TF benchmark](https://www.tensorflow.org/performance/benchmarks).
It takes only one line of code change to use them.
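For example, switching to data-parallel training is typically just a different trainer object passed at launch time (a sketch; `config` is a `TrainConfig` as in the training-interface tutorial, and `8` is the number of GPUs/towers):

```python
from tensorpack import SyncMultiGPUTrainerParameterServer, launch_train_with_config

# before: launch_train_with_config(config, SimpleTrainer())
launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(8))
# note: the total batch size becomes (batch size of your DataFlow) * 8
```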
Note some common problems when using these trainers:

1. In each iteration, all GPUs (all replicates of the model) take tensors from the `InputSource`,
instead of taking one batch and splitting it among them.
So the total batch size becomes ``(batch size of InputSource/DataFlow) * #GPU``.
Splitting one tensor across GPUs makes little sense; it only puts unnecessary shape constraints on the data.
By letting each GPU train on its own input tensors, they can train on inputs of different shapes simultaneously.

2. Your model code (the tower function) will get called multiple times.
You'll need to be very careful when modifying global state in those functions, e.g. adding ops to TF collections.

### Custom Trainers

You can easily write a trainer for other types of training.
See [Write a Trainer](extend/trainer.html).
# Training Interface

Tensorpack trainers have a verbose interface for maximum flexibility.
Then, there are interfaces built on top of trainers to simplify their use
when you don't want to customize too much.
### With ModelDesc and TrainConfig

This is an interface that's most familiar to old tensorpack users,
and is now mainly useful for single-cost tasks.
A lot of examples are written in this interface.

[SingleCost trainers](../modules/train.html#tensorpack.train.SingleCostTrainer)
expect 4 arguments to set up the graph: `InputDesc`, `InputSource`, a get_cost function, and an optimizer.
`ModelDesc` describes a model by packing 3 of them together into one object:
```python
...@@ -65,7 +53,7 @@ config = TrainConfig(
)

trainer = SomeTrainer()
# trainer = SyncMultiGPUTrainerParameterServer(8)
launch_train_with_config(config, trainer)
```
See the docs of

...@@ -73,3 +61,19 @@ See the docs of

and
[launch_train_with_config](../modules/train.html#tensorpack.train.launch_train_with_config)
for usage and detailed functionalities.
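For reference, a `ModelDesc` packing the three pieces together might look roughly like this (a sketch based on the pre-1.0 API; the underscored method names `_get_inputs`, `_build_graph`, `_get_optimizer` are an assumption and may differ across versions):

```python
import tensorflow as tf
from tensorpack import ModelDesc, InputDesc

class MyModel(ModelDesc):
    def _get_inputs(self):              # the InputDesc metadata
        return [InputDesc(tf.float32, [None, 28, 28], 'image'),
                InputDesc(tf.int32, [None], 'label')]

    def _build_graph(self, inputs):     # the get_cost function: must set self.cost
        image, label = inputs
        logits = tf.layers.dense(tf.reshape(image, [-1, 28 * 28]), 10)
        self.cost = tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)

    def _get_optimizer(self):           # the optimizer
        return tf.train.AdamOptimizer(1e-3)
```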
### Raw Trainer Interface

You can also access methods of the trainer directly, for finer control:

__Build__ the graph: For a general trainer, build the graph by yourself.
For a single-cost trainer, build the graph with
[SingleCostTrainer.setup_graph](../modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph).

__Run__ the iterations: Call
[Trainer.train()](../modules/train.html#tensorpack.train.Trainer.train),
or
[Trainer.train_with_defaults()](../modules/train.html#tensorpack.train.Trainer.train_with_defaults),
which applies some default options for normal use cases.

Read the API documentation for detailed usage.
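A sketch of this raw flow, reusing the four ingredients described in the trainer tutorial (`inputs_desc`, `my_input`, `get_cost_fn`, `get_opt_fn` are the hypothetical objects defined there; keyword names follow the linked API docs and may vary between versions):

```python
from tensorpack import SimpleTrainer

trainer = SimpleTrainer()                                             # a single-cost trainer
trainer.setup_graph(inputs_desc, my_input, get_cost_fn, get_opt_fn)   # build the graph
trainer.train_with_defaults(
    callbacks=[],            # your extra callbacks
    steps_per_epoch=100,
    max_epoch=10)            # run the iterations
```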
...@@ -316,7 +316,9 @@ class BatchQueueInput(QueueInput):

# TODO tensor inputs can be drained? look at the new dataset API.
class TensorInput(FeedfreeInput):
    """ Input from a list of tensors, e.g. a TF data reading pipeline.

    The PTB training example shows how to use it.
    """
    def __init__(self, get_tensor_fn, size=None):
        """
...
...@@ -122,7 +122,7 @@ class SingleCostTrainer(TowerTrainer):

Note:
    1. `get_cost_fn` will always be called under a :class:`TowerContext`,
       which will contain information about reuse,
       training/inference, scope name, etc.
    2. `get_cost_fn` might get called multiple times for data-parallel training or inference.
    3. To respect variable reuse, use `tf.get_variable` instead of
...
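For point 3, a hypothetical `get_cost_fn` that stays reuse-safe by only creating variables through `tf.get_variable` (here indirectly via `tf.layers`; the function and tensor names are illustrative):

```python
import tensorflow as tf

def get_cost_fn(feature, label):
    # tf.layers creates its weights with tf.get_variable, which lets the
    # TowerContext share them when this function is called once per tower
    logits = tf.layers.dense(feature, 10, name='fc')
    return tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)
```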