Tensorpack follows the "define-and-run" paradigm. Therefore a training script has two steps:

1. __Define__: Build graph for the model.
   Users can call whatever tensorflow functions to set up the graph.
   Users may or may not use tensorpack `InputSource`, `ModelDesc` or other utilities to build the graph.
   The goal of this step is to define "what to run" in later training steps,
   and it can happen __either inside or outside__ tensorpack trainer.

2. __Run__: Train the model (the [Trainer.train() method](/modules/train.html#tensorpack.train.Trainer.train)).
### Assumptions of Base Trainer

* Q: What types of training can you do with tensorpack?
* A: Anything that runs in a loop.

In research we do training of various kinds.
Tensorpack trainers avoid making assumptions on what type of training
you want to do (e.g., it doesn't have to be batched, SGD-like, or have `X` (inputs) and `y` (outputs)).
The only assumption is that your training follows this pattern:
```python
for epoch_num in range(starting_epoch, max_epoch):
    for local_step in range(steps_per_epoch):
        run_step()
```
1. Training is **running some iterations**.
   The tensorpack base trainer implements the logic of __running the iteration__.
   Users or derived trainers should implement __what the iteration is__ (see the sketch after this list).

2. Trainer assumes the existence of an __"epoch"__, i.e. that the iterations run in double for-loops.
   But `steps_per_epoch` can be any number you set,
   and it only affects the [schedule of callbacks](callback.html).
   In other words, an "epoch" in tensorpack is the __default period to run callbacks__ (validation, summary, checkpoint, etc.).
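To make point 1 concrete, here is a minimal sketch of a derived trainer. It assumes a `self.train_op` was created during graph definition; the base class supplies the loop, the session, and the callback scheduling:

```python
from tensorpack import Trainer

class MyTrainer(Trainer):
    def run_step(self):
        # Define "what one iteration is". The base Trainer runs this
        # inside the double for-loop shown above.
        self.hooked_sess.run(self.train_op)
```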
### How Existing (Single-Cost) Trainers Work

Most neural network training tasks are single-cost optimization.
Tensorpack provides some trainer implementations for such tasks.
These trainers will take care of step 1 (define the graph), with the following arguments:

1. Some `tf.TensorSpec`, the signature of the input.
2. An `InputSource`, where the input comes from. See [Input Pipeline](input-source.html).
3. A function which takes input tensors and returns the cost.
4. A function which returns an optimizer.

These are documented in [SingleCostTrainer.setup_graph](/modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph).
In practice you'll not use this method directly, but use the [high-level interface](/tutorial/training-interface.html#with-modeldesc-and-trainconfig) instead, as sketched below.
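For concreteness, here is a minimal sketch of how the cost and optimizer functions typically look when written through the `ModelDesc` interface (the architecture, shapes, and learning rate below are made up for illustration):

```python
import tensorflow as tf
from tensorpack import ModelDesc

class MyModel(ModelDesc):
    def inputs(self):
        # Argument 1: the signature of the input, as tf.TensorSpec.
        return [tf.TensorSpec([None, 28, 28], tf.float32, 'image'),
                tf.TensorSpec([None], tf.int64, 'label')]

    def build_graph(self, image, label):
        # Argument 3: take input tensors, return the cost to minimize.
        logits = tf.layers.dense(tf.layers.flatten(image), 10)
        return tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)

    def optimizer(self):
        # Argument 4: return an optimizer.
        return tf.train.GradientDescentOptimizer(1e-2)
```

Argument 2, the `InputSource`, is supplied separately when training is launched (e.g., as the `dataflow` of a `TrainConfig`).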
### Tower Trainer

[TowerTrainer](/modules/train.html#tensorpack.train.TowerTrainer) is a trainer that uses a user-provided "tower function" to build models.
All existing trainers in tensorpack are subclasses of ``TowerTrainer``,
because this concept is able to cover most types of neural-network training tasks.

#### What is Tower Function

Following the terminology in TensorFlow,
a __tower function__ is a callable that takes input tensors and adds __one replicate__ of the model to the graph.
In short, __the tower function builds your model__.
If you can write a function that builds your model, then you can use `TowerTrainer`.

The concept of "tower" is used mainly to support:

1. Data-parallel multi-GPU training, where a replicate is built on each GPU.
2. Graph construction for inference, where a replicate is built under inference mode.

A user needs to provide a tower function to use `TowerTrainer`.
In particular, when working with the commonly used `ModelDesc` interface, the `build_graph`
method will be part of the tower function. A standalone sketch is shown below.
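As a sketch (layer sizes and names are illustrative only), a tower function is just an ordinary Python callable over input tensors:

```python
import tensorflow as tf

def tower_func(image, label):
    # Builds one replicate of the model. A data-parallel trainer may
    # call this once per GPU, each time under a different scope that
    # the trainer controls.
    logits = tf.layers.dense(tf.layers.flatten(image), 10, name='fc')
    return tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)
```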
#### Rules of Tower Function

The tower function needs to follow some rules:

1. __It may get called multiple times__ for data-parallel training or inference. As a result:
   * You'll need to be careful when modifying global states, e.g.,
     adding ops to collections or setting attributes of a model instance.
   * To use a tensorflow-hub module, you need to initialize the
     module outside the tower function, and call the module inside the tower function.
2. It must __respect variable collections__:
   * (Required) Only put variables __trainable by gradient descent__ into `TRAINABLE_VARIABLES`.
   * (Recommended) Put non-trainable variables that need to be used in inference into `MODEL_VARIABLES`.
3. It must __respect variable scope names__:

   The name of any trainable variable created in the function must look like "variable_scope_name/other/scopes/and/name".
   Strictly speaking, the name of any trainable variable must:

   * Start with the name of the enclosing variable_scope when the tower function is called.
   * Not use the same variable_scope's name twice in its name.
   * Not depend on name_scope's name.
   * Not depend on any tensor's name (because the tensor's name may depend on name_scope's name).

   Tensorpack layers create variables based on the name given to the layer:
   e.g., `Conv2D('test', x)` will open a variable scope named "test".
   In order to respect the above rules,
   the name of the layer must not depend on name_scope's name or any tensor's name.
4. It must __respect variable scope reuse__ (see the sketch after this list):
   * The creation of any trainable variables must __respect reuse__ variable scope.
     To respect variable reuse (i.e. sharing), use `tf.get_variable` instead of `tf.Variable` in the function.

     On the other hand, for a non-trainable variable, it may be desirable to not reuse it between towers.
     In this case, `tf.Variable` can be used to ensure creation of new variables in each tower even when `reuse=True`.
   * Do not modify the reuse option (e.g., by `scope.reuse_variables()`) of a variable
     scope that is not created by you. This affects others' code. You can always
     open new scopes if you need the reuse option.
5. It must not create scopes or variables containing the name 'tower', as it is
   reserved for special use.

These conventions are easy to follow, and most layer wrappers (e.g.,
tf.layers/slim/tensorlayer) do follow them. Note that certain Keras layers do not
follow these conventions and will need some workarounds if used within tensorpack.
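Rule 4 in particular is easy to see in a short sketch (the shapes and names are arbitrary):

```python
import tensorflow as tf

def tower_func(x):
    # Trainable variable: created via tf.get_variable, so when the trainer
    # calls this function again under reuse=True, the existing variable is
    # shared instead of a new one being created (rule 4).
    w = tf.get_variable('w', shape=[4, 10])  # assumes x has shape [batch, 4]

    # Non-trainable, per-tower state: tf.Variable ignores the reuse flag,
    # so every call of the tower function creates a fresh copy.
    counter = tf.Variable(0, trainable=False, name='counter')

    return tf.matmul(x, w)
```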
#### What You Can Do Inside a Tower Function
1. Call any symbolic functions as long as they follow the above rules.
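   For example, tensorpack's own symbolic layers and raw TensorFlow ops can be mixed freely. A sketch (assuming a 4-D `image` tensor and the `Conv2D(name, input, channels, kernel_size)` positional form):

   ```python
   import tensorflow as tf
   from tensorpack import Conv2D

   def tower_func(image):
       # A tensorpack layer: opens a variable scope named "conv1", creating
       # variables such as "conv1/W" (consistent with the naming rules above).
       x = Conv2D('conv1', image, 32, 3)
       # Plain TensorFlow symbolic functions can be mixed in freely.
       return tf.reduce_mean(x, name='mean_activation')
   ```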