Commit af667ff4 authored by Yuxin Wu

update docs

parent 6a1e822d
# Build the Graph
This tutorial explains how a graph is built in tensorpack.
### ModelDesc
`ModelDesc` is an abstraction over the most common type of model people train.
It assumes:
1. Training is single-cost optimization, performed by a single `tf.train.Optimizer`.
2. The graph can be trivially duplicated for data-parallel training or inference.
If your task is single-cost optimization,
you can subclass `ModelDesc` and implement several methods:
```python
class MyModel(ModelDesc):
    def _get_inputs(self):
        return [InputDesc(...), InputDesc(...)]

    def _build_graph(self, inputs):
        tensorA, tensorB = inputs
        # build the graph
        self.cost = xxx  # define the cost tensor

    def _get_optimizer(self):
        return tf.train.GradientDescentOptimizer(0.1)
```
`_get_inputs` should define the metainfo of all the inputs your graph may need.
`_build_graph` should add tensors/operations to the graph, where
the argument `inputs` is the list of input tensors matching `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries.
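For instance, a hypothetical image-classification model could describe its two inputs as follows (a sketch only; the dtypes, shapes, and names are illustrative assumptions, not taken from this doc):
```python
def _get_inputs(self):
    # each InputDesc gives the dtype, shape, and name of one input tensor
    return [InputDesc(tf.float32, (None, 28, 28), 'input'),
            InputDesc(tf.int32, (None,), 'label')]
```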
### How it is Used
Most tensorpack trainers expect a `ModelDesc`, and use it as a __description
of the TF graph to be built__.
These trainers will use `_get_inputs` to connect the given `InputSource` to the graph.
They'll then use `_build_graph` to create the backbone model, use `_get_optimizer` to create the minimization op, and run that op in each training step.
Note that data-parallel multi-GPU trainers will call `_build_graph` __multiple times__ on each GPU.
A trainer may also make __extra calls__ to `_build_graph` for inference, if used by some callbacks.
`_build_graph` will always be called under some `TowerContext`, which provides context information
(e.g. training or inference, whether variables are reused, the scope name) for your access.
Also, to respect variable reuse among multiple calls, use `tf.get_variable()` instead of `tf.Variable` in `_build_graph`,
if you need to create any variables.
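For example, a minimal `_build_graph` for the hypothetical two inputs above (a sketch only; layer sizes and names are assumptions) creates its variables with `tf.get_variable`, so that later calls on other towers reuse the same weights:
```python
def _build_graph(self, inputs):
    image, label = inputs                    # matches the InputDesc list from _get_inputs
    image = tf.reshape(image, [-1, 28 * 28])
    # tf.get_variable (not tf.Variable), so the variables are shared across towers
    W = tf.get_variable('W', shape=[28 * 28, 10])
    b = tf.get_variable('b', shape=[10], initializer=tf.zeros_initializer())
    logits = tf.matmul(image, W) + b
    self.cost = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=label),
        name='cost')
```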
### Build It Manually
When you need to deal with a complicated graph, it may be easier to build the graph manually.
You are free to do so as long as you tell the trainer what to do in each step.
Check out [Write a Trainer](extend/trainer.html)
for how to use a custom graph with a trainer.
......@@ -39,9 +39,9 @@ User Tutorials
dataflow
input-source
efficient-dataflow
graph
symbolic
trainer
training-interface
callback
summary
faq
......
# Trainer
Tensorpack trainers prepare and run the training, which consists of the following steps:
1. __Build graph__ for the model.
Users can call whatever TensorFlow functions they need to set up the graph,
and may or may not use tensorpack's `InputSource` and `ModelDesc` to do so.
This step defines "what to run" in every training step.
2. Train the model (the [Trainer.train() method](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.Trainer.train)):
1. Setup callbacks/monitors.
2. Finalize the graph, initialize session.
3. Run the main loop.
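As a rough sketch, the two steps above correspond to the following calls (a hedged illustration using the `SingleCostTrainer` interface shown later in this commit; all names here are placeholders you would define yourself):
```python
# Hedged sketch only: `MySingleCostTrainer`, `inputs_desc`, `my_input`, and the
# remaining arguments are placeholders, not actual tensorpack attributes.
trainer = MySingleCostTrainer()
trainer.setup_graph(inputs_desc, my_input,             # 1. build the graph
                    get_cost_fn, get_opt_fn)
trainer.train(callbacks, monitors,                      # 2. run the training loop
              session_creator, session_init,
              steps_per_epoch, starting_epoch, max_epoch)
```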
## Assumptions of Base Trainer
In research we train models in many different ways.
Tensorpack trainers try to avoid making assumptions about the type of training
you want to do (e.g., it doesn't have to be batched, SGD-like, or have `X` (inputs) and `y` (outputs)).
The only assumption the tensorpack `Trainer` class makes about your training is that it
follows this pattern:
```python
for epoch_num in range(starting_epoch, max_epoch):
    for local_step in range(steps_per_epoch):
        run_step()  # the iteration, implemented by the trainer
```
......@@ -15,47 +33,36 @@ Tensorpack base trainer implements the logic of __running the iteration__.
Users or derived trainers should implement __what the iteration is__.
2. Trainer assumes the existence of an __"epoch"__, i.e. that the iterations run in double for-loops.
An epoch doesn't need to be a full pass of your dataset; the epoch size can be any number you set,
and it only affects the [schedule of callbacks](extend/callback.html).
In other words, an "epoch" in tensorpack is the __default period to run callbacks__ (validation, summary, checkpoint, etc.).
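For instance (a hedged sketch; `steps_per_epoch` and `max_epoch` are attributes of `TrainConfig`, which appears in the next section):
```python
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    callbacks=[...],        # by default, callbacks are triggered once per epoch
    steps_per_epoch=500,    # an "epoch" here is just 500 iterations,
    max_epoch=100,          # regardless of the dataset size
)
```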
### Single-Cost Trainers
Most neural network training tasks are single-cost optimization.
Tensorpack provides some trainer implementations for such tasks.
These trainers will build the graph by themselves, with the following arguments:
1. Some `InputDesc`, the metadata about the input.
2. An `InputSource`, where the input comes from. See [Input Pipeline](input-source.html).
3. A function which takes input tensors and returns the cost.
4. A function which returns an optimizer.

The simplest way to use these trainers is to pass a
`TrainConfig` to the `launch_train_with_config` high-level wrapper:
```python
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    # data=my_inputsource,  # alternatively, use a customized InputSource
    callbacks=[...]
)
trainer = SomeTrainer()
# multi-GPU training with synchronous update:
# trainer = SyncMultiGPUTrainerParameterServer([0, 1, 2])
launch_train_with_config(config, trainer)
```
When you set the DataFlow (rather than the InputSource) in the config,
`launch_train_with_config` automatically adopts a certain prefetch mechanism, as mentioned
in the [Input Pipeline](input-source.html) tutorial.
You can set the InputSource instead to customize this behavior.
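For example (a hedged sketch; `QueueInput` is assumed here as one of the built-in `InputSource` implementations that wraps a DataFlow):
```python
# pass an InputSource explicitly via `data=` instead of `dataflow=`,
# to control the prefetching behavior yourself
config = TrainConfig(
    model=MyModel(),
    data=QueueInput(my_dataflow),
    callbacks=[...]
)
```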
See [SingleCostTrainer.setup_graph](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph)
for details.
Existing multi-GPU trainers include the logic of data-parallel training.
You can enable them with just one line, and all the necessary logic to achieve the best performance is already baked into the trainers.
The trainers can reach the same performance as the [official tensorflow benchmark](https://www.tensorflow.org/performance/benchmarks).
Please note that in data-parallel training, in each iteration all towers (all replicates of the model) will take
tensors from the `InputSource` (instead of taking one batch and splitting it among them). So the total batch size
would be ``(batch size of InputSource/DataFlow) * #GPU``.
For example, a DataFlow producing batches of 64 on 4 GPUs gives an effective batch size of 256 per iteration.
There are also high-level wrappers that have a slightly simpler interface (but exist mainly for old users).
See [High-Level Training Interface](training-interface.html).
### Custom Trainers
You can easily write a trainer for other types of training.
......
# Training Interface
Tensorpack trainers provide a low-level API which requires a number of options to set up.
There are high-level interfaces built on top of the trainers to simplify their use,
for when you don't need to customize too much.
### With ModelDesc and TrainConfig
[SingleCost trainers](trainer.html#single-cost-trainers)
expect `InputDesc`, `InputSource`, a get_cost function, and an optimizer.
`ModelDesc` describes a model by packing three of these together into one object:
```python
class MyModel(ModelDesc):
    def _get_inputs(self):
        return [InputDesc(...), InputDesc(...)]

    def _build_graph(self, inputs):
        tensorA, tensorB = inputs
        # build the graph
        self.cost = xxx  # define the cost tensor

    def _get_optimizer(self):
        return tf.train.GradientDescentOptimizer(0.1)
```
`_get_inputs` should define the metainfo of all the inputs your graph may need.
`_build_graph` should add tensors/operations to the graph, where
the argument `inputs` is a list of tensors which will match `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries.
But you need to follow the requirements of
[get_cost_fn](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph),
because `_build_graph` will be used as part of `get_cost_fn`.
Finally, you need to set `self.cost`.
After defining such a model, use it with `TrainConfig` and `launch_train_with_config`:
```python
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    # data=my_inputsource,  # alternatively, use a customized InputSource
    callbacks=[...]
)
trainer = SomeTrainer()
# trainer = SyncMultiGPUTrainerParameterServer([0, 1, 2])
launch_train_with_config(config, trainer)
```
See the docs of
[launch_train_with_config](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.launch_train_with_config)
for its usage and detailed functionalities.
......@@ -90,7 +90,7 @@ class GANTrainer(TowerTrainer):
class SeparateGANTrainer(TowerTrainer):
""" A GAN trainer which runs two optimization ops with a certain ratio, one in each step. """
""" A GAN trainer which runs two optimization ops with a certain ratio."""
def __init__(self, input, model, d_period=1, g_period=1):
"""
Args:
......
......@@ -356,7 +356,7 @@ class SingleCostTrainer(TowerTrainer):
Single-cost trainer has a :meth:`setup_graph` method which takes
(inputs_desc, input, get_cost_fn, get_opt_fn), and builds the training operations from them.
To use a :class:`SingleCostTrainer` object, call `trainer.setup_graph(...); trainer.train(...)`.
"""
@call_only_once
......@@ -368,14 +368,16 @@ class SingleCostTrainer(TowerTrainer):
inputs_desc ([InputDesc]):
input (InputSource):
get_cost_fn ([tf.Tensor] -> tf.Tensor): callable, takes some input tensors and returns a cost tensor.
get_opt_fn (-> tf.train.Optimizer): callable which returns an
optimizer. Will only be called once.
Returns:
[Callback]: a (possibly empty) list of callbacks needed for training.
These callbacks will be automatically added when you call `train()`.
So you can usually ignore the return value.
Note:
1. `get_cost_fn` will always be called under a :class:`TowerContext`,
which will contain information about reuse,
training/inference, scope name, etc.
2. `get_cost_fn` might get called multiple times for data-parallel training or inference.
3. To respect variable reuse, use `tf.get_variable` instead of
`tf.Variable` in `get_cost_fn`.
"""
get_cost_fn = TowerFuncWrapper(get_cost_fn, inputs_desc)
get_opt_fn = memoized(get_opt_fn)
......@@ -386,11 +388,17 @@ class SingleCostTrainer(TowerTrainer):
internal_callbacks = input_callbacks + train_callbacks
for cb in internal_callbacks:
self._register_callback(cb)
return internal_callbacks
# TODO register directly instead of return?
@abstractmethod
def _setup_graph(self, input, get_cost_fn, get_opt_fn):
pass
"""
Implement the logic to build the graph, with an :class:`InputSource`
that has been set up already.
Returns:
[Callback]: list of callbacks needed
"""
def _setup_input(self, inputs_desc, input):
assert not input.setup_done()
......
......@@ -44,7 +44,14 @@ def apply_default_prefetch(input_source_or_dataflow, trainer, towers):
def launch_train_with_config(config, trainer):
"""
Train with a :class:`TrainConfig` and a :class:`Trainer`, to
mimic the old training interface. It basically does the following
3 things (and you can easily do them by yourself):
1. Setup the :class:`InputSource` with automatic prefetching,
for `config.data` or `config.dataflow`.
2. Call `trainer.setup_graph` with the :class:`InputSource`,
as well as `config.model`.
3. Call `trainer.train` with the rest of the attributes of config.
Args:
config (TrainConfig):
......@@ -79,7 +86,4 @@ def launch_train_with_config(config, trainer):
trainer.setup_graph(
inputs_desc, input,
model._build_graph_get_cost, model.get_optimizer)
trainer.train(
config.callbacks, config.monitors,
config.session_creator, config.session_init,
config.steps_per_epoch, config.starting_epoch, config.max_epoch)
trainer.train_with_config(config)