Tensorpack follows the "define-and-run" paradigm. Training has two steps:

1. __Define__: Build the graph for the model.
   Users can call whatever tensorflow functions they need to setup the graph.
   Users may or may not use tensorpack `InputSource`, `ModelDesc` or other utilities to build the graph.
   The goal of this step is to define "what to run" in later training steps,
   and it can happen either inside or outside the tensorpack trainer.

2. __Run__: Train the model (the [Trainer.train() method](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.Trainer.train)):

   1. Setup callbacks/monitors.
   2. Finalize the graph, initialize the session.
   3. Run the training loop.
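Concretely, the two steps often look like the following minimal sketch, using the high-level `TrainConfig` / `launch_train_with_config` interface. Here `MyModel` (a `ModelDesc`) and `get_my_dataflow()` are hypothetical placeholders assumed to be defined elsewhere:

```python
from tensorpack import TrainConfig, SimpleTrainer, launch_train_with_config
from tensorpack.callbacks import ModelSaver

# Step 1 (__Define__) happens when the trainer asks MyModel to build the graph.
config = TrainConfig(
    model=MyModel(),              # hypothetical ModelDesc defining inputs, cost, optimizer
    dataflow=get_my_dataflow(),   # hypothetical DataFlow providing the training data
    callbacks=[ModelSaver()],
    steps_per_epoch=100,
    max_epoch=10,
)

# Step 2 (__Run__): setup callbacks/monitors, finalize the graph,
# initialize the session, and run the training loop.
launch_train_with_config(config, SimpleTrainer())
```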
## Assumptions of Base Trainer
...
...
In other words, an "epoch" in tensorpack is the __default period to run callbacks__.
Most neural network training tasks are single-cost optimization.
Tensorpack provides some trainer implementations for such tasks.
These trainers will take care of step 1 (building the graph), with the following arguments:
1. Some `InputDesc`, the metadata about the input.
2. An `InputSource`, where the input comes from. See [Input Pipeline](input-source.html).
3. A function which takes input tensors and returns the cost.
4. A function which returns an optimizer.
These arguments are documented in more detail in [SingleCostTrainer.setup_graph](http://tensorpack.readthedocs.io/en/latest/modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph).
Often you won't use this method directly, but use the [high-level interface](training-interface.html#with-modeldesc-and-trainconfig) instead.
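As a rough illustration of these four arguments (a sketch only, assuming `tf` is TensorFlow and `df` is an existing DataFlow yielding `[image, label]` batches):

```python
import tensorflow as tf
from tensorpack import InputDesc, QueueInput, SimpleTrainer

def get_cost(image, label):
    # takes the input tensors declared below and returns the cost to minimize
    logits = tf.layers.dense(tf.layers.flatten(image), 10)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits))

def get_opt():
    return tf.train.AdamOptimizer(1e-3)

trainer = SimpleTrainer()
trainer.setup_graph(
    [InputDesc(tf.float32, [None, 28, 28, 1], 'image'),   # 1. metadata about the inputs
     InputDesc(tf.int32, [None], 'label')],
    QueueInput(df),     # 2. an InputSource wrapping the DataFlow `df`
    get_cost,           # 3. function: input tensors -> cost
    get_opt)            # 4. function: () -> optimizer
# Afterwards, Trainer.train() (or the high-level interface) runs the training loop.
```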
Existing multi-GPU trainers include the logic of single-cost data-parallel training.
You can enable them with just one line; all the necessary logic to achieve the best performance is already baked into the trainers.
The trainers can reach the same performance as the [official tensorflow benchmark](https://www.tensorflow.org/performance/benchmarks).
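For example, switching the sketch above to data-parallel training is just a matter of picking a different trainer (assuming the same hypothetical `config`):

```python
from tensorpack import SyncMultiGPUTrainerParameterServer, launch_train_with_config

# run the same single-cost training data-parallel on 2 GPUs
launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(2))
```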
...
...
Please note that in data-parallel training, in each iteration all towers (all replicates of the model) will take
tensors from the `InputSource` (instead of taking one for all and splitting it). So the total batch size
would be ``(batch size of InputSource/DataFlow) * #GPU``.
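For example, if your DataFlow produces batches of 64 samples and you train on 4 GPUs, every iteration consumes 4 such batches, for a total batch size of 256.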
### Custom Trainers
You can easily write a trainer for other types of training.
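As a rough sketch of what such a trainer could look like (this assumes the `Trainer` base class calls `run_step()` once per iteration and exposes `hooked_sess`; treat it as an illustration rather than a drop-in implementation):

```python
from tensorpack import Trainer

class MyCustomTrainer(Trainer):
    """A hypothetical trainer that simply runs a user-provided op every step."""

    def __init__(self, my_op):
        super(MyCustomTrainer, self).__init__()
        self.my_op = my_op          # any op(s) you built yourself, to run each iteration

    def run_step(self):
        # called once per step inside the training loop driven by Trainer.train()
        self.hooked_sess.run(self.my_op)
```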