Commit af667ff4 authored by Yuxin Wu

update docs

parent 6a1e822d
# Build the Graph
This tutorial explains how a graph is built in tensorpack.
### ModelDesc
`ModelDesc` is an abstraction over the most common type of model people train.
It assumes:
1. Training is single-cost optimization, performed by a single `tf.train.Optimizer`.
2. The graph can be trivially duplicated for data-parallel training or inference.
If your task is single-cost optimization,
you can subclass `ModelDesc` and implement several methods:
```python
class MyModel(ModelDesc):
    def _get_inputs(self):
        return [InputDesc(...), InputDesc(...)]

    def _build_graph(self, inputs):
        tensorA, tensorB = inputs
        # build the graph
        self.cost = xxx  # define the cost tensor

    def _get_optimizer(self):
        return tf.train.GradientDescentOptimizer(0.1)
```
`_get_inputs` should define the metainfo of all the inputs your graph may need.
`_build_graph` should add tensors/operations to the graph, where
the argument `inputs` is the list of input tensors matching `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries.
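For instance, a hypothetical image-classification model could describe its two inputs as follows (a sketch only; the dtypes, shapes, and names are illustrative assumptions, not taken from this doc):
```python
def _get_inputs(self):
    # each InputDesc gives the dtype, shape, and name of one input tensor
    return [InputDesc(tf.float32, (None, 28, 28), 'input'),
            InputDesc(tf.int32, (None,), 'label')]
```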
### How it is Used
Most tensorpack trainers expect a `ModelDesc`, and use it as a __description
of the TF graph to be built__.
These trainers will use `_get_inputs` to connect the given `InputSource` to the graph.
They'll then use `_build_graph` to create the backbone model, use `_get_optimizer` to create the minimization op, and run that op in each training step.
Note that data-parallel multi-GPU trainers will call `_build_graph` __multiple times__ on each GPU.
A trainer may also make __extra calls__ to `_build_graph` for inference, if used by some callbacks.
`_build_graph` will always be called under some `TowerContext`, which provides context information
(e.g. training or inference, whether variables are reused, the scope name) for your access.
Also, to respect variable reuse among multiple calls, use `tf.get_variable()` instead of `tf.Variable` in `_build_graph`,
if you need to create any variables.
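For example, a minimal `_build_graph` for the hypothetical two inputs above (a sketch only; layer sizes and names are assumptions) creates its variables with `tf.get_variable`, so that later calls on other towers reuse the same weights:
```python
def _build_graph(self, inputs):
    image, label = inputs                    # matches the InputDesc list from _get_inputs
    image = tf.reshape(image, [-1, 28 * 28])
    # tf.get_variable (not tf.Variable), so the variables are shared across towers
    W = tf.get_variable('W', shape=[28 * 28, 10])
    b = tf.get_variable('b', shape=[10], initializer=tf.zeros_initializer())
    logits = tf.matmul(image, W) + b
    self.cost = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=label),
        name='cost')
```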
### Build It Manually
When you need to deal with a complicated graph, it may be easier to build the graph manually.
You are free to do so as long as you tell the trainer what to do in each step.
Check out [Write a Trainer](extend/trainer.html)
for how to use a custom graph with a trainer.
......@@ -39,9 +39,9 @@ User Tutorials
dataflow
input-source
efficient-dataflow
graph
symbolic
trainer
training-interface
callback
summary
faq
......
# Trainer
Tensorpack trainers prepare and run the training, which consists of the following steps:
1. __Build graph__ for the model.
Users can call whatever TensorFlow functions they need to set up the graph,
and may or may not use tensorpack's `InputSource` and `ModelDesc` to do so.
This step defines "what to run" in every training step.
2. Train the model (the [Trainer.train() method](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.Trainer.train)):
1. Setup callbacks/monitors.
2. Finalize the graph, initialize session.
3. Run the main loop.
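As a rough sketch, the two steps above correspond to the following calls (a hedged illustration using the `SingleCostTrainer` interface shown later in this commit; all names here are placeholders you would define yourself):
```python
# Hedged sketch only: `MySingleCostTrainer`, `inputs_desc`, `my_input`, and the
# remaining arguments are placeholders, not actual tensorpack attributes.
trainer = MySingleCostTrainer()
trainer.setup_graph(inputs_desc, my_input,             # 1. build the graph
                    get_cost_fn, get_opt_fn)
trainer.train(callbacks, monitors,                      # 2. run the training loop
              session_creator, session_init,
              steps_per_epoch, starting_epoch, max_epoch)
```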
## Assumptions of Base Trainer
In research we train models in many different ways.
Tensorpack trainers try to avoid making assumptions about the type of training
you want to do (e.g., it doesn't have to be batched, SGD-like, or have `X` (inputs) and `y` (outputs)).
The only assumption the tensorpack `Trainer` class makes about your training is that it
follows this pattern:
```python
for epoch_num in range(starting_epoch, max_epoch):
    for local_step in range(steps_per_epoch):
        run_step()  # the iteration, implemented by the trainer
```
......@@ -15,47 +33,36 @@ Tensorpack base trainer implements the logic of __running the iteration__.
Users or derived trainers should implement __what the iteration is__.
2. Trainer assumes the existence of an __"epoch"__, i.e. that the iterations run in double for-loops.
An epoch doesn't need to be a full pass of your dataset; the epoch size can be any number you set,
and it only affects the [schedule of callbacks](extend/callback.html).
In other words, an "epoch" in tensorpack is the __default period to run callbacks__ (validation, summary, checkpoint, etc.).
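For instance (a hedged sketch; `steps_per_epoch` and `max_epoch` are attributes of `TrainConfig`, which appears in the next section):
```python
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    callbacks=[...],        # by default, callbacks are triggered once per epoch
    steps_per_epoch=500,    # an "epoch" here is just 500 iterations,
    max_epoch=100,          # regardless of the dataset size
)
```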
### Single-Cost Trainers
Most neural network training tasks are single-cost optimization.
Tensorpack provides some trainer implementations for such tasks.
These trainers will build the graph by themselves, with the following arguments:
1. Some `InputDesc`, the metadata about the input.
2. An `InputSource`, where the input comes from. See [Input Pipeline](input-source.html).
3. A function which takes input tensors and returns the cost.
4. A function which returns an optimizer.

The simplest way to use these trainers is to pass a
`TrainConfig` to the `launch_train_with_config` high-level wrapper:
```python
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    # data=my_inputsource,  # alternatively, use a customized InputSource
    callbacks=[...]
)
trainer = SomeTrainer()
# multi-GPU training with synchronous update:
# trainer = SyncMultiGPUTrainerParameterServer([0, 1, 2])
launch_train_with_config(config, trainer)
```
When you set the DataFlow (rather than the InputSource) in the config,
`launch_train_with_config` automatically adopts a certain prefetch mechanism, as mentioned
in the [Input Pipeline](input-source.html) tutorial.
You can set the InputSource instead to customize this behavior.
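For example (a hedged sketch; `QueueInput` is assumed here as one of the built-in `InputSource` implementations that wraps a DataFlow):
```python
# pass an InputSource explicitly via `data=` instead of `dataflow=`,
# to control the prefetching behavior yourself
config = TrainConfig(
    model=MyModel(),
    data=QueueInput(my_dataflow),
    callbacks=[...]
)
```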
See [SingleCostTrainer.setup_graph](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph)
for details.
Existing multi-GPU trainers include the logic of data-parallel training.
You can enable them with just one line, and all the necessary logic to achieve the best performance is already baked into the trainers.
The trainers can reach the same performance as the [official tensorflow benchmark](https://www.tensorflow.org/performance/benchmarks).
Please note that in data-parallel training, in each iteration all towers (all replicates of the model) will take
tensors from the `InputSource` (instead of taking one batch and splitting it among them). So the total batch size
would be ``(batch size of InputSource/DataFlow) * #GPU``.
For example, a DataFlow producing batches of 64 on 4 GPUs gives an effective batch size of 256 per iteration.
There are also high-level wrappers that have a slightly simpler interface (but exist mainly for old users).
See [High-Level Training Interface](training-interface.html).
### Custom Trainers
You can easily write a trainer for other types of training.
......
# Training Interface
Tensorpack trainers provide a low-level API which requires a number of options to set up.
There are high-level interfaces built on top of the trainers to simplify their use,
for when you don't need to customize too much.
### With ModelDesc and TrainConfig
[SingleCost trainers](trainer.html#single-cost-trainers)
expect `InputDesc`, `InputSource`, a get_cost function, and an optimizer.
`ModelDesc` describes a model by packing three of these together into one object:
```python
class MyModel(ModelDesc):
    def _get_inputs(self):
        return [InputDesc(...), InputDesc(...)]

    def _build_graph(self, inputs):
        tensorA, tensorB = inputs
        # build the graph
        self.cost = xxx  # define the cost tensor

    def _get_optimizer(self):
        return tf.train.GradientDescentOptimizer(0.1)
```
`_get_inputs` should define the metainfo of all the inputs your graph may need.
`_build_graph` should add tensors/operations to the graph, where
the argument `inputs` is a list of tensors which will match `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries.
But you need to follow the requirements of
[get_cost_fn](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph),
because `_build_graph` will be used as part of `get_cost_fn`.
Finally, you need to set `self.cost`.
After defining such a model, use it with `TrainConfig` and `launch_train_with_config`:
```python
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    # data=my_inputsource,  # alternatively, use a customized InputSource
    callbacks=[...]
)
trainer = SomeTrainer()
# trainer = SyncMultiGPUTrainerParameterServer([0, 1, 2])
launch_train_with_config(config, trainer)
```
See the docs of
[launch_train_with_config](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.launch_train_with_config)
for its usage and detailed functionalities.
......@@ -90,7 +90,7 @@ class GANTrainer(TowerTrainer):
class SeparateGANTrainer(TowerTrainer):
""" A GAN trainer which runs two optimization ops with a certain ratio, one in each step. """
""" A GAN trainer which runs two optimization ops with a certain ratio."""
def __init__(self, input, model, d_period=1, g_period=1):
"""
Args:
......
......@@ -356,7 +356,7 @@ class SingleCostTrainer(TowerTrainer):
Single-cost trainer has a :meth:`setup_graph` method which takes
(inputs_desc, input, get_cost_fn, get_opt_fn), and builds the training operations from them.
To use a :class:`SingleCostTrainer` object, call `trainer.setup_graph(...); trainer.train(...)`.
"""
@call_only_once
......@@ -368,14 +368,16 @@ class SingleCostTrainer(TowerTrainer):
inputs_desc ([InputDesc]):
input (InputSource):
get_cost_fn ([tf.Tensor] -> tf.Tensor): callable, takes some input tensors and returns a cost tensor.
get_opt_fn (-> tf.train.Optimizer): callable which returns an
optimizer. Will only be called once.
Returns:
[Callback]: a (possibly empty) list of callbacks needed for training.
These callbacks will be automatically added when you call `train()`.
So you can usually ignore the return value.
Note:
1. `get_cost_fn` will always be called under a :class:`TowerContext`,
which will contain information about reuse,
training/inference, scope name, etc.
2. `get_cost_fn` might get called multiple times for data-parallel training or inference.
3. To respect variable reuse, use `tf.get_variable` instead of
`tf.Variable` in `get_cost_fn`.
"""
get_cost_fn = TowerFuncWrapper(get_cost_fn, inputs_desc)
get_opt_fn = memoized(get_opt_fn)
......@@ -386,11 +388,17 @@ class SingleCostTrainer(TowerTrainer):
internal_callbacks = input_callbacks + train_callbacks
for cb in internal_callbacks:
self._register_callback(cb)
return internal_callbacks
# TODO register directly instead of return?
@abstractmethod
def _setup_graph(self, input, get_cost_fn, get_opt_fn):
pass
"""
Implement the logic to build the graph, with an :class:`InputSource`
that has been set up already.
Returns:
[Callback]: list of callbacks needed
"""
def _setup_input(self, inputs_desc, input):
assert not input.setup_done()
......
......@@ -44,7 +44,14 @@ def apply_default_prefetch(input_source_or_dataflow, trainer, towers):
def launch_train_with_config(config, trainer):
"""
Train with a :class:`TrainConfig` and a :class:`Trainer`, to
mimic the old training interface. It basically does the following
3 things (and you can easily do them by yourself):
1. Setup the :class:`InputSource` with automatic prefetching,
for `config.data` or `config.dataflow`.
2. Call `trainer.setup_graph` with the :class:`InputSource`,
as well as `config.model`.
3. Call `trainer.train` with the rest of the attributes of config.
Args:
config (TrainConfig):
......@@ -79,7 +86,4 @@ def launch_train_with_config(config, trainer):
trainer.setup_graph(
inputs_desc, input,
model._build_graph_get_cost, model.get_optimizer)
trainer.train(
config.callbacks, config.monitors,
config.session_creator, config.session_init,
config.steps_per_epoch, config.starting_epoch, config.max_epoch)
trainer.train_with_config(config)