Commit 57619f3a authored by Yuxin Wu

more docs for writing callbacks. (fix #287)

parent e6cf9885
It's Yet Another TF wrapper, but different in:
+ Tensorpack trainer is almost always faster than `feed_dict`-based wrappers.
Even on a tiny CNN example, the training runs [2x faster](https://gist.github.com/ppwwyyxx/8d95da79f8d97036a7d67c2416c851b6) than the equivalent Keras code.
+ Data-parallel multi-GPU training is off-the-shelf to use. It is as fast as Google's [benchmark code](https://github.com/tensorflow/benchmarks).
+ Data-parallel distributed training is off-the-shelf to use. It is as slow as Google's [benchmark code](https://github.com/tensorflow/benchmarks).
3. Focus on large datasets.
+ It's painful to read/preprocess data with TF. Use __DataFlow__ to efficiently process large datasets such as ImageNet in __pure Python__ (see the sketch below).
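
As a rough sketch of the DataFlow idea (the class name and the data it yields are invented for illustration; it follows the `get_data` generator interface):

```python
from tensorpack.dataflow import DataFlow, BatchData

class FakeImages(DataFlow):  # hypothetical dataflow, for illustration
    def get_data(self):
        # A DataFlow yields datapoints; each datapoint is a list of components.
        for i in range(100):
            yield [i]

# Dataflows compose in pure Python, e.g. batching 32 datapoints at a time:
df = BatchData(FakeImages(), batch_size=32)
```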
## Write a callback
The places where each method gets called are demonstrated in this snippet:
```python
def main_loop():
    # ... build the graph for training ...
    callbacks.setup_graph()
    # ... create and initialize the session ...
    callbacks.before_train()
    for epoch in range(starting_epoch, max_epoch):
        callbacks.before_epoch()
        for step in range(steps_per_epoch):
            run_step()  # callbacks.{before,after}_run are hooked with the session
            callbacks.trigger_step()
        callbacks.after_epoch()
        callbacks.trigger_epoch()
    callbacks.after_train()
```
## Explain the callback methods
You can override any of the following methods to define a new callback:
* `_setup_graph(self)`
To separate "define" from "run", and to avoid the common mistake of creating ops inside
loops, all changes to the graph should be made in this method. No session has been created at this time.

Set up the ops / tensors in the graph which you might need to use in the callback. You can use
[`graph.get_tensor_by_name`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name)
to access those already defined in the training tower, or use
[`self.trainer.get_predictor(..)`](http://tensorpack.readthedocs.io/en/latest/modules/train.html?highlight=get_predictor#tensorpack.train.Trainer.get_predictor)
to create a callable evaluation function in the predict tower (see the sketch after this list).
* `_before_train(self)`
Can be used to run some manual initialization of variables, or start some services for the training.
* `_after_train(self)`
Usually some finalization work.
* `_before_epoch(self)`, `_after_epoch(self)`
Use them only when you really need something to happen __immediately__ before/after an epoch.
Otherwise, `_trigger_epoch` should be enough.
* `_before_run(self, ctx)`, `_after_run(self, ctx, values)`

These are the equivalents of [`tf.train.SessionRunHook`](https://www.tensorflow.org/api_docs/python/tf/train/SessionRunHook):
use them to run extra ops, fetch extra tensors, or feed extra values __along with__
each hooked `session.run` call (see the sketch after this list).

* `_trigger_epoch(self)`

Do something after each epoch has finished. Will call `self.trigger()` by default.

* `_trigger(self)`

By default it is called by `_trigger_epoch`, but you can customize the scheduling of
this method with `PeriodicTrigger`, to let it run every k steps or every k epochs.
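
Putting the pieces together: below is a minimal sketch of a custom callback, assuming the training tower defines a tensor named `accuracy:0` and the model has tensors named `input` / `output` (the class name and all tensor names here are made up for illustration):

```python
import tensorflow as tf
from tensorpack.callbacks import Callback

class AccuracyTracker(Callback):  # hypothetical callback, for illustration
    def _setup_graph(self):
        # All graph changes happen here; no session has been created yet.
        self.acc_tensor = self.graph.get_tensor_by_name('accuracy:0')
        # A callable evaluation function in the predict tower
        # ('input' / 'output' are assumed tensor names):
        self.pred_func = self.trainer.get_predictor(['input'], ['output'])

    def _before_run(self, ctx):
        # Ask the hooked session to fetch one extra tensor with this step:
        return tf.train.SessionRunArgs(fetches=self.acc_tensor)

    def _after_run(self, ctx, values):
        # values.results holds the extra fetches requested in _before_run:
        self.last_acc = values.results

    def _trigger(self):
        # Called once per epoch by default; wrap the callback in
        # PeriodicTrigger to run this every k steps or epochs instead.
        self.trainer.monitors.put_scalar('train-accuracy', self.last_acc)
```

Like any other callback, it would then be passed to the trainer in the callbacks list of the training config.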
## What you can do in the callback
* Access tensors / ops in either training / inference mode (need to create them in `_setup_graph`).
* Write stuff to the monitor backend with `self.trainer.monitors.put_xxx`.
The monitors might direct your events to a TensorFlow events file, a JSON file, stdout, etc.
You can read back history monitor data as well. See the docs for [Monitors](http://tensorpack.readthedocs.io/en/latest/modules/callbacks.html#tensorpack.callbacks.Monitors) and the sketch after this list.
* Access the current status of training, such as `epoch_num` and `global_step`. See [here](http://tensorpack.readthedocs.io/en/latest/modules/callbacks.html#tensorpack.callbacks.Callback).
* Anything else that can be done with plain Python.
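
For example, a short sketch of writing to and reading back from the monitors (the callback and metric names are made up; `get_latest` is the history accessor from the Monitors docs linked above):

```python
from tensorpack.callbacks import Callback

class StatusLogger(Callback):  # hypothetical callback, for illustration
    def _trigger_epoch(self):
        # Send a scalar to the monitor backend; depending on the configured
        # monitors it may go to TF event files, a JSON file, stdout, etc.
        self.trainer.monitors.put_scalar('my-metric', 42.0)
        # Read back the stored history and the current training status:
        latest = self.trainer.monitors.get_latest('my-metric')
        print('epoch {}, global step {}: my-metric = {}'.format(
            self.epoch_num, self.global_step, latest))
```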