Commit edbcf18a authored by Yuxin Wu

docs update

parent 03c16776
@@ -37,13 +37,17 @@ TrainConfig(
    # schedule the learning rate based on epoch number
    ScheduledHyperParamSetter('learning_rate',
                              [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]),
    # can manually set the learning rate during training
    HumanHyperParamSetter('learning_rate'),
    # send validation error to my phone through pushbullet
    SendStat('curl -u your_id_xxx: https://api.pushbullet.com/v2/pushes \\
             -d type=note -d title="validation error" \\
             -d body={val-error-top1} > /dev/null 2>&1',
             'val-error-top1'),
    # record GPU utilizations during training
    GPUUtilizationTracker(),
    # can pause the training and start a debug shell, to observe what's going on
    InjectShell(shell='ipython')
],
extra_callbacks=[ # these callbacks are enabled by default already
    # maintain and summarize moving average of some tensors defined in the model (e.g. training loss, training error)
@@ -69,6 +73,8 @@ TrainConfig(
Notice that callbacks cover every detail of training, ranging from graph operations to the progress bar.
This means you can customize every part of the training to your preference, e.g. display something
different in the progress bar, or evaluate part of the summaries at a different frequency.
These features may not always be useful, but think about how messy the main loop would look if you
were to write all this logic inside the loop yourself.
See [Write a callback](http://tensorpack.readthedocs.io/en/latest/tutorial/extend/callback.html)
on how to implement a callback.
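To make the idea concrete, here is a toy sketch in plain Python (not tensorpack's actual trainer) of why callbacks keep the main loop clean: the loop only announces events, and all extra logic lives in callback objects. The class and method names (`Callback`, `trigger_epoch`, `LearningRateSetter`) only loosely mirror tensorpack's API and are hypothetical.

```python
# Toy illustration of a callback-driven training loop (not tensorpack code).
class Callback:
    def before_train(self): pass
    def trigger_epoch(self, epoch): pass
    def after_train(self): pass

class LearningRateSetter(Callback):
    """Hypothetical stand-in for ScheduledHyperParamSetter."""
    def __init__(self, schedule):
        self.schedule = dict(schedule)  # {epoch: lr}
        self.lr = None
    def trigger_epoch(self, epoch):
        if epoch in self.schedule:
            self.lr = self.schedule[epoch]

def train(num_epochs, callbacks):
    # The main loop stays tiny: it only fires events at well-defined points.
    for cb in callbacks:
        cb.before_train()
    for epoch in range(1, num_epochs + 1):
        # ... one epoch of actual training would run here ...
        for cb in callbacks:
            cb.trigger_epoch(epoch)
    for cb in callbacks:
        cb.after_train()

lr_setter = LearningRateSetter([(3, 1e-3), (5, 1e-4)])
train(6, [lr_setter])
print(lr_setter.lr)  # 0.0001 -- the epoch-5 schedule entry fired last
```

Adding a new behaviour (sending stats, pausing for a shell) then means writing one more `Callback` subclass, never touching the loop.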
@@ -126,7 +126,7 @@ If you identify this as a bottleneck, you can also use:
Let's summarize what the above dataflow does:
1. One thread iterates over a shuffled list of (filename, label) pairs, and puts them into a queue of size 1000.
2. 25 worker threads take pairs and make them into (preprocessed image, label) pairs.
3. Both 1 and 2 happen in one separate process, and the results are sent back to the main process through ZeroMQ.
4. The main process makes batches, and other tensorpack modules will then take care of how they should go into the graph.
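The four steps above can be sketched in plain Python with threads and queues, in place of tensorpack's dataflow classes (the separate-process and ZeroMQ details of steps 3 are omitted; all names and sizes here are illustrative):

```python
# Rough sketch of the reader -> workers -> batching pipeline described above.
import queue
import threading

# Pretend dataset: (filename, label) pairs; imagine this list is shuffled.
filenames = [("img%d.jpg" % i, i % 10) for i in range(100)]

q1 = queue.Queue(maxsize=1000)   # step 1: queue of (filename, label)
q2 = queue.Queue()               # step 2: queue of (preprocessed image, label)
NUM_WORKERS = 25

def reader():
    for pair in filenames:
        q1.put(pair)
    for _ in range(NUM_WORKERS):
        q1.put(None)             # one poison pill per worker

def worker():
    while True:
        pair = q1.get()
        if pair is None:
            break
        fname, label = pair
        # stand-in for real JPEG decoding + augmentation
        q2.put(("preprocessed:" + fname, label))

threads = [threading.Thread(target=reader)]
threads += [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# step 4: the main thread makes batches of 32
items = [q2.get() for _ in range(q2.qsize())]
batches = [items[i:i + 32] for i in range(0, len(items), 32)]
print(len(batches))  # 4 batches: 32 + 32 + 32 + 4 items
```

In the real pipeline the reader and workers live in another process and the batches are what finally enter the graph.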
......
@@ -32,15 +32,17 @@ The script expects a metagraph file which is also saved by `ModelSaver`.
## How to load a model / do transfer learning
All model loading (in either training or testing) is through the `session_init` initializer
in `TrainConfig` or `PredictConfig`.
The common choices for this option are `SaverRestore`, which restores a
TF checkpoint, or `DictRestore`, which restores a dict. (`get_model_loader` is a small helper to
decide which one to use from a file name.)
Doing transfer learning is trivial.
Variable restoring is completely based on name match between
the current graph and the `SessionInit` initializer.
Therefore, if you want to load some model, just use the same variable name
so the old value will be loaded into the variable.
If you want to re-train some layer, just rename it.
Unmatched variables on both sides will be printed as a warning.
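The name-matching rule above can be sketched in plain Python (this is an illustration of the rule, not tensorpack's actual restore code; the function and variable names are hypothetical):

```python
# Sketch of name-based variable restoring: exact name matches get the old
# value, unmatched names on either side produce a warning.
def restore_by_name(graph_vars, checkpoint):
    restored = {}
    for name, init_value in graph_vars.items():
        if name in checkpoint:
            restored[name] = checkpoint[name]   # old value loaded
        else:
            print("WARN: %s in graph but not in checkpoint" % name)
            restored[name] = init_value         # keeps its fresh initialization
    for name in checkpoint:
        if name not in graph_vars:
            print("WARN: %s in checkpoint but not in graph" % name)
    return restored

ckpt = {"conv1/W": "old_conv_w", "fc/W": "old_fc_w"}
# Renaming fc -> fc_new means it no longer matches, so it gets re-trained
# from a fresh initialization, exactly as described above.
graph = {"conv1/W": "fresh_init", "fc_new/W": "fresh_init"}
print(restore_by_name(graph, ckpt))
```

Matching is purely by name, which is why renaming a layer is all it takes to opt it out of restoring.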
......
@@ -32,10 +32,10 @@ the argument `inputs` is the list of input tensors matching `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries.
**How does it work**: Most tensorpack trainers expect a `ModelDesc`.
The trainers will use `_get_inputs` to connect `InputSource` to the graph,
use `_build_graph` to create the backbone model and minimization op, and so on.
Note that data-parallel multi-GPU trainers will call `_build_graph` __multiple times__ on each GPU.
A trainer may also make __extra calls__ to `_build_graph` for inference, if used by some callbacks.
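A toy illustration of the multiple-call behaviour (plain Python, not tensorpack's real trainer; class and method names only mirror the `ModelDesc` interface described above):

```python
# Toy data-parallel build: the trainer calls _build_graph once per GPU,
# so the method must be safe to run multiple times.
class ModelDesc:
    def __init__(self):
        self.build_calls = 0
    def _get_inputs(self):
        return ["image", "label"]            # placeholder input specs
    def _build_graph(self, inputs, gpu):
        self.build_calls += 1
        # stand-in for building one model replica ("tower") on this GPU
        return "tower%d(%s)" % (gpu, ",".join(inputs))

def data_parallel_build(model, num_gpus):
    inputs = model._get_inputs()
    return [model._build_graph(inputs, g) for g in range(num_gpus)]

m = ModelDesc()
towers = data_parallel_build(m, 4)
print(m.build_calls)   # 4 -- one tower per GPU
```

This is why `_build_graph` should avoid side effects that break when it runs more than once (e.g. creating variables without proper scoping).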
### Build It Manually
@@ -43,4 +43,5 @@ A trainer may also make __extra calls__ to `_build_graph` for inference, if used
When you need to deal with a complicated graph, it may be easier to build the graph manually.
You are free to do so as long as you tell the trainer what to do in each step.
Check out [Write a Trainer](http://tensorpack.readthedocs.io/en/latest/tutorial/extend/trainer.html)
for using a custom graph with a trainer.
@@ -31,10 +31,10 @@ class SendStat(Callback):
class InjectShell(Callback):
    """
    Allow users to create a specific file as a signal to pause
    and iteratively debug the training.
    When triggered, it detects whether the file exists, and opens an
    IPython/pdb shell if yes.
    """
    def __init__(self, file='INJECT_SHELL.tmp', shell='ipython'):
......
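The trigger logic the docstring describes can be sketched in a few lines of plain Python: poll for a user-created file, and open a debug shell if it appears (here we only report the decision instead of actually launching IPython/pdb; the helper name is hypothetical):

```python
# Sketch of InjectShell's trigger condition: the user pauses training by
# creating a sentinel file (e.g. `touch INJECT_SHELL.tmp`).
import os
import tempfile

def should_open_shell(trigger_file):
    """True iff the user has created the trigger file."""
    return os.path.isfile(trigger_file)

tmpdir = tempfile.mkdtemp()
trigger = os.path.join(tmpdir, "INJECT_SHELL.tmp")

print(should_open_shell(trigger))   # False: file not created yet, keep training
open(trigger, "w").close()          # user runs: touch INJECT_SHELL.tmp
print(should_open_shell(trigger))   # True: pause and drop into a shell
os.remove(trigger)                  # removing the file resumes training
```

Because the check runs between training steps, the user can pause at any point without modifying the training script.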