Commit 7780c64b authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent a131b8fd
@@ -4,11 +4,10 @@
The existing trainers should be enough for single-cost optimization tasks.
If you want to do something different during training, first consider writing it as a callback,
or write an issue to see if there is a better solution than creating new trainers.
For certain tasks, you do need a new trainer.
Trainers just run __some__ iterations, so there is no limit on where the data come from or what to do in an iteration.

The existing common trainers all implement two things:
1. Set up the graph and input pipeline, using the given `TrainConfig`.
2. Minimize `model.cost` in each iteration.
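Stripped down to plain Python, those two responsibilities look roughly like this. The `Config` and `ToyTrainer` classes below are hypothetical stand-ins for illustration, not the real tensorpack API:

```python
# Schematic sketch of a single-cost trainer; `Config` and `ToyTrainer`
# are made-up names, not real tensorpack classes.
class Config:
    def __init__(self, steps_per_epoch, max_epoch):
        self.steps_per_epoch = steps_per_epoch
        self.max_epoch = max_epoch

class ToyTrainer:
    def __init__(self, config):
        # 1. "Set up the graph and input pipeline" from the config.
        self.config = config
        self.cost = 100.0          # stand-in for model.cost

    def run_step(self):
        # 2. "Minimize model.cost" -- stand-in for running a train op.
        self.cost *= 0.9

    def train(self):
        for _ in range(self.config.max_epoch):
            for _ in range(self.config.steps_per_epoch):
                self.run_step()

trainer = ToyTrainer(Config(steps_per_epoch=10, max_epoch=2))
trainer.train()
# after 20 steps, trainer.cost has been repeatedly "minimized"
```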
@@ -25,3 +24,4 @@ But you can customize it by using the base `Trainer` class.
2. Subclass `Trainer` and override the `run_step()` method. This way you can do something more than running an op.
   There are several different [GAN trainers](../../examples/GAN/GAN.py) for reference.
The implementation of [SimpleTrainer](https://github.com/ppwwyyxx/tensorpack/blob/master/tensorpack/train/simple.py) may also be helpful.
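To get a feel for what overriding `run_step()` buys you, here is a schematic GAN-style trainer in plain Python: each iteration runs two update steps instead of one. The class names are hypothetical sketches, not the real API; see the linked GAN trainers for actual implementations.

```python
# Hypothetical sketch, not the real tensorpack Trainer API.
class Trainer:
    def run_step(self):
        raise NotImplementedError

class ToyGANTrainer(Trainer):
    """One iteration = one discriminator update + one generator update."""
    def __init__(self):
        self.d_updates = 0
        self.g_updates = 0

    def run_step(self):
        self.d_updates += 1   # stand-in for running the discriminator min op
        self.g_updates += 1   # stand-in for running the generator min op

t = ToyGANTrainer()
for _ in range(5):
    t.run_step()
# both sub-networks were updated once per iteration
```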
@@ -32,16 +32,24 @@ the argument `inputs` is the list of input tensors matching `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries.
### How It Is Used

Most tensorpack trainers expect a `ModelDesc`, and use it as a __description
of the TF graph__ (except for the input pipeline).
These trainers will use `_get_inputs` to connect the given `InputSource` to the graph.
They'll then use `_build_graph` to create the backbone model, use `_get_optimizer` to create the minimization op, and run it.
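The wiring described above can be caricatured in a few lines of plain Python. The `ToyModelDesc` below is a made-up stand-in: real `ModelDesc` methods build TF tensors and ops, not numbers.

```python
# Made-up sketch of how a trainer drives a ModelDesc; not real tensorpack code.
class ToyModelDesc:
    def _get_inputs(self):
        return ["image", "label"]            # placeholders for input tensors

    def _build_graph(self, inputs):
        self.cost = float(len(inputs))       # stand-in for a cost tensor

    def _get_optimizer(self):
        return lambda cost: cost * 0.5       # stand-in for a minimize op

def setup_training(model):
    inputs = model._get_inputs()        # connect the InputSource to the graph
    model._build_graph(inputs)          # build the backbone model and cost
    minimize = model._get_optimizer()   # create the minimization op
    return minimize(model.cost)         # ...which the trainer then runs

result = setup_training(ToyModelDesc())
```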
Note that data-parallel multi-GPU trainers will call `_build_graph` __multiple times__, once on each GPU.
A trainer may also make __extra calls__ to `_build_graph` for inference, when required by some callbacks.
`_build_graph` will always be called under some `TowerContext`, which you can access for
information about the call (e.g. training or inference, reuse or not, scope name).
Also, to make variable reuse work across these multiple calls, use `tf.get_variable()` instead of `tf.Variable()`.
### Build It Manually

When you need to deal with a complicated graph, it may be easier to build the graph manually.
You are free to do so as long as you tell the trainer what to do in each step.
Check out [Write a Trainer](http://tensorpack.readthedocs.io/en/latest/tutorial/extend/trainer.html)
for using a custom graph with a trainer.
@@ -5,16 +5,16 @@ In research we do training of various kinds.
The only assumption the tensorpack `Trainer` class makes about your training is that it
follows this pattern:
```python
for epoch_num in range(starting_epoch, max_epoch):
    for local_step in range(steps_per_epoch):
        run_step()
```
1. Training is **running some iterations**.
   Tensorpack base trainer implements the logic of __running the iteration__.
   Users or derived trainers should implement __what the iteration is__.
2. Trainer assumes the existence of __"epoch"__, i.e. that the iterations run in double for-loops.
   But an epoch doesn't need to be a full pass of your dataset: ``steps_per_epoch`` can be any number you set,
   and it only affects the [schedule of callbacks](http://tensorpack.readthedocs.io/en/latest/tutorial/extend/callback.html).
   In other words, an "epoch" is the __default period__ to run callbacks (validation, summary, checkpoint, etc.).
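To see that ``steps_per_epoch`` only controls when callbacks fire, here is the double loop above with a hypothetical logging callback attached:

```python
# Sketch: an "epoch" is just the callback period, not a dataset pass.
def train(steps_per_epoch, max_epoch, callbacks):
    global_step = 0
    for epoch in range(1, max_epoch + 1):
        for _ in range(steps_per_epoch):
            global_step += 1             # run_step() would go here
        for cb in callbacks:
            cb(epoch, global_step)       # callbacks fire once per "epoch"
    return global_step

log = []
train(steps_per_epoch=3, max_epoch=2,
      callbacks=[lambda epoch, step: log.append((epoch, step))])
# log is [(1, 3), (2, 6)]: two callback triggers, regardless of dataset size
```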