Commit f636a657 authored by Yuxin Wu

update docs

parent 16c04d1f
@@ -71,6 +71,7 @@ monitors=[ # monitors are a special kind of callbacks. these are also ena
Notice that callbacks cover every detail of training, ranging from graph operations to the progress bar.
This means you can customize every part of the training to your preference, e.g. display something
different in the progress bar, evaluate part of the summaries at a different frequency, etc.
These features may not always be useful, but think about how messy the main loop would look if you
were to write this logic together with the loops, and how easy your life would be if you could enable
these features with one line when you need them.
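For instance (a hedged sketch: the callback names below come from tensorpack's callbacks module, but the metric name and the schedule values are made up for illustration), enabling such a feature is typically a single entry in the callbacks list:
```python
from tensorpack.callbacks import ModelSaver, MinSaver, ScheduledHyperParamSetter

callbacks = [
    ModelSaver(),                      # save a checkpoint every epoch
    MinSaver('val-error'),             # also keep the checkpoint with the lowest 'val-error'
    ScheduledHyperParamSetter(         # decay the learning rate on a fixed schedule
        'learning_rate', [(30, 1e-2), (60, 1e-3)]),
]
```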
@@ -54,7 +54,7 @@ the rest of the data pipeline.
Nevertheless, tensorpack supports data loading with native TF operators / TF datasets as well.
### Use DataFlow (outside Tensorpack)
The tensorpack `InputSource` interface works with DataFlow out-of-the-box.
If you use DataFlow in some custom code, call `reset_state()` first to initialize it,
and then use the generator however you like:
```python
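# A minimal sketch of the usage described above; `MyDataFlow` and `run_step`
# are placeholders for your own DataFlow and consumer code.
df = MyDataFlow()
df.reset_state()            # initialize the DataFlow before first use
for dp in df.get_data():    # dp is a list of components, e.g. [image, label]
    run_step(dp)
```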
@@ -18,7 +18,7 @@ We will need to reach a speed of, roughly **1k ~ 2k images per second**, to keep
Some things to know before reading:
1. For smaller datasets (e.g. several GBs of images with lightweight preprocessing), a simple reader plus some prefetch should usually work well enough.
Therefore you don't have to understand this tutorial in depth unless you really find your data to be the bottleneck.
This tutorial can be a bit involved for people new to system architectures, but you do need these techniques to run fast enough on an ImageNet-scale dataset.
2. Having a fast Python generator **alone** may or may not improve your overall training speed.
You need mechanisms to hide the latency of **all** preprocessing stages, as mentioned in the
[previous tutorial](input-source.html); a rough sketch of such a pipeline follows.
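As a rough sketch of what such a pipeline can look like (the component names follow the tensorpack dataflow module, but treat the exact arguments and the dataset path as assumptions for illustration):
```python
from tensorpack.dataflow import BatchData, PrefetchDataZMQ, dataset

df = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)  # raw image reader
df = BatchData(df, 256)                # group datapoints into batches of 256
df = PrefetchDataZMQ(df, nr_proc=25)   # run the pipeline in 25 processes to hide its latency
```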
@@ -20,31 +20,32 @@ def train(self):
```
Note that at each place, the callbacks will be called in the order they are given to the trainer.
### Explain the Callback Methods
To write a callback, subclass `Callback` and implement the corresponding underscore-prefixed methods.
You can override any of the following methods to define a new callback:
* `_setup_graph(self)`
Create any ops / tensors in the graph which you might need to use in the callback.
This method exists to separate "define" and "run", and also to
avoid the common mistake of creating ops inside
loops. All changes to the graph should be made in this method.
To access ops which are already defined,
you can use TF methods such as
[`graph.get_tensor_by_name`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).
If you're using a `TowerTrainer` instance, more tools are available:
* Use `self.trainer.tower_func.towers` to access the
[tower handles](../modules/tfutils.html#tensorpack.tfutils.tower.TowerTensorHandles),
and therefore the tensors in each tower.
* [self.get_tensors_maybe_in_tower()](../modules/callbacks.html#tensorpack.callbacks.Callback.get_tensors_maybe_in_tower)
is a helper function to access tensors in the first training tower.
* [self.trainer.get_predictor()](../modules/train.html#tensorpack.train.TowerTrainer.get_predictor)
is a helper function to create a callable under inference mode.
* `_before_train(self)`
Can be used to run some manual initialization of variables, or start some services for the training.
@@ -60,7 +61,7 @@ Otherwise, `_trigger_epoch` should be enough.
* `_before_run(self, ctx)`, `_after_run(self, ctx, values)`
These are the equivalents of [tf.train.SessionRunHook](https://www.tensorflow.org/api_docs/python/tf/train/SessionRunHook).
Please refer to the TensorFlow documentation for the detailed API.
They are used to run extra ops / eval extra tensors / feed extra values __along with__ the actual training iterations.
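Putting these pieces together, a minimal custom callback might look like the sketch below. This is an illustration, not tensorpack's own code: the tensor name `'tower0/loss:0'` is a made-up example, and you should check the exact `Callback` API of your version.
```python
import tensorflow as tf
from tensorpack.callbacks import Callback

class TrackLoss(Callback):
    def _setup_graph(self):
        # All graph access happens here, never inside the run loop.
        self._loss = self.graph.get_tensor_by_name('tower0/loss:0')  # hypothetical tensor name

    def _before_run(self, ctx):
        # Fetch the loss tensor along with the regular training iteration.
        return tf.train.SessionRunArgs(fetches=self._loss)

    def _after_run(self, ctx, run_values):
        # run_values.results contains whatever _before_run asked to fetch.
        self._last_loss = run_values.results

    def _trigger_epoch(self):
        print('loss at the end of this epoch:', self._last_loss)
```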
@@ -25,7 +25,7 @@ The reasons are:
Let's do some simple math: according to [tensorflow/benchmarks](https://www.tensorflow.org/performance/benchmarks),
4 P100 GPUs can train ResNet50 at 852 images/sec, and the size of those images is 852\*224\*224\*3\*4 bytes = 489MB.
Assuming you have 5GB/s `memcpy` bandwidth (roughly what a single-threaded copy achieves), simply copying the data once would take 0.1s -- slowing
down your training by 10%. Think about how many more copies are made during your preprocessing.
Failure to hide the data preparation latency is the major reason why people
@@ -74,6 +74,7 @@ Let's take a look at what users are asking for:
* [Handle dataset that's not a multiple of batch size](https://github.com/tensorflow/tensorflow/issues/13745)
* [Take variable-length np array](https://github.com/tensorflow/tensorflow/issues/13018)
* [Different levels of determinism](https://github.com/tensorflow/tensorflow/issues/13932)
To support these features, which could've been done with 3 lines of code in Python, you need either a new TF
API, or to ask [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
(i.e. Python again) to come to the rescue.
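To make the "a few lines of Python" point concrete, here is a small illustration (not tensorpack or TF code) of the first request above, batching a dataset whose size is not a multiple of the batch size, with a plain generator:
```python
def batches(datapoints, batch_size):
    buf = []
    for dp in datapoints:
        buf.append(dp)
        if len(buf) == batch_size:
            yield buf          # a full batch
            buf = []
    if buf:
        yield buf              # the leftover, smaller batch
```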
@@ -15,7 +15,7 @@ This is how TensorFlow summaries eventually get logged/saved/printed:
It runs ops in the `SUMMARIES` collection (by default) every epoch (by default),
and writes results to the monitors.
3. __Where to Log__:
Several monitors are [enabled by default](../modules/train.html#tensorpack.train.DEFAULT_MONITORS).
* A [TFEventWriter](../modules/callbacks.html#tensorpack.callbacks.TFEventWriter)
writes things to an event file used by tensorboard.
* A [ScalarPrinter](../modules/callbacks.html#tensorpack.callbacks.ScalarPrinter)
@@ -36,7 +36,7 @@ are likely to have too much variance. To address this issue, you can:
[MovingAverageSummary](../modules/callbacks.html#tensorpack.callbacks.MovingAverageSummary)
callback (enabled by default).
### Other Logging Data
Besides TensorFlow summaries,
a callback can also write other data to the monitor backend at any time once the training has started.
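As a minimal sketch of what "writing other data to the monitor backend" could look like (the callback class, the metric name, and the `_compute_error` helper are hypothetical; only the `monitors` interface follows tensorpack's documented API):
```python
from tensorpack.callbacks import Callback

class PutValidationError(Callback):
    def _trigger_epoch(self):
        err = self._compute_error()   # hypothetical helper computing a validation metric
        # The value is dispatched to every registered monitor (event file, logs, JSON, ...).
        self.trainer.monitors.put_scalar('val-error', err)
```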
@@ -6,7 +6,7 @@ Tensorpack follows the "define-and-run" paradigm. A training has two steps:
1. __Define__: Build graph for the model.
Users can call whatever tensorflow functions to set up the graph.
Users may or may not use tensorpack `InputSource`, `ModelDesc` or other utilities to build the graph.
The goal of this step is to define "what to run" in later training steps,
and it can happen either inside or outside a tensorpack trainer.
2. __Run__: Train the model (the [Trainer.train() method](../modules/train.html#tensorpack.train.Trainer.train)):
@@ -58,7 +58,7 @@ Existing multi-GPU trainers include the logic of single-cost data-parallel train
You can enable them with just one line, and all the necessary logic to achieve the best performance has been baked into the trainers already.
The trainers can reach the same performance as the [official tensorflow benchmark](https://www.tensorflow.org/performance/benchmarks).
Please note that in data-parallel training, in each iteration all GPUs (all replicas of the model) will take
tensors from the `InputSource` (instead of taking one batch and splitting it). So the total batch size
would be ``(batch size of InputSource/DataFlow) * #GPU``.
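For example (a hedged sketch; `config` is assumed to be an existing `TrainConfig`, and the trainer class name follows tensorpack's training tutorial, so check your version): if the DataFlow in `config` produces batches of 64, training on 4 GPUs gives an effective total batch size of 64 * 4 = 256.
```python
from tensorpack import launch_train_with_config
from tensorpack.train import SyncMultiGPUTrainerParameterServer

# The one-line switch to data-parallel training on 4 GPUs.
launch_train_with_config(config, SyncMultiGPUTrainerParameterServer(4))
```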