Following the [ResNet example](../examples/ResNet), we need images in their original resolution,
so we will read the original dataset (instead of a down-sampled version), and
then apply complicated preprocessing to it.
We will need to reach a speed of roughly **1k ~ 2k images per second** to keep GPUs busy.
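To tell whether a pipeline is anywhere near that target, it helps to measure it in isolation first. Below is a minimal sketch, assuming tensorpack's dataflow API (`dataset.ILSVRC12` and `TestDataSpeed`); the dataset path is a placeholder, and this is not yet the optimized pipeline this tutorial builds.

```python
# Minimal throughput check -- a sketch, not the final pipeline.
# '/path/to/ILSVRC12' is a placeholder for your dataset location.
from tensorpack.dataflow import dataset, TestDataSpeed

# Read the original, full-resolution ImageNet training set.
df = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)

# Iterate over 5000 datapoints and report the speed; each datapoint
# is one image here, so the number printed is images per second.
TestDataSpeed(df, size=5000).start()
```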
Some things to know before reading:
1. For smaller datasets (e.g. several GBs of images with lightweight preprocessing), a simple reader plus some prefetch should usually work well enough (see the first sketch after this list).
   Therefore you don't have to understand this tutorial in depth unless you really find your data to be the bottleneck.
   Figure out the bottleneck first, before trying to optimize any piece in the whole system.
   This tutorial could be a bit complicated for people new to system architectures, but you do need these techniques to run fast enough on an ImageNet-sized dataset.
2. Having a fast Python generator **alone** may or may not improve your overall training speed.
   You need mechanisms to hide the latency of **all** preprocessing stages, as mentioned in the
   [previous tutorial](input-source.html).
3. Reading the training set and the validation set are different tasks.
   In training it's OK to reorder, regroup, or even duplicate some datapoints, as long as the
   data distribution roughly stays the same.
   But in validation we often need the exact set of data, to be able to compute a correct and comparable score.
   This will affect how we build the DataFlow (see the second sketch after this list).
4. The actual performance depends not only on the disk, but also on memory (for caching) and CPU (for data processing).
   You may need to tune the parameters (#processes, #threads, size of buffer, etc.)
   or change the pipeline for new tasks and new machines to achieve the best performance.
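For the "simple reader plus some prefetch" case in point 1, a sketch could look like the following. It assumes tensorpack's dataflow API; `MultiProcessRunnerZMQ` is the current name of the prefetch component (older versions call it `PrefetchDataZMQ`), and the augmentor and parameter values are illustrative, not tuned.

```python
# Point 1: a simple reader plus prefetch, sketched with tensorpack.
from tensorpack.dataflow import (
    dataset, imgaug, AugmentImageComponent, BatchData, MultiProcessRunnerZMQ)

# Lightweight preprocessing -- a single resize, as a placeholder.
augmentors = [imgaug.Resize((224, 224))]

df = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
df = AugmentImageComponent(df, augmentors)
df = BatchData(df, 256)
# Run everything above in 25 forked processes, collecting results
# over ZMQ; this hides the read + preprocess latency from the trainer.
df = MultiProcessRunnerZMQ(df, num_proc=25)
```

Note that `MultiProcessRunnerZMQ` forks the whole dataflow above it, so each process reads and preprocesses independently; with a shuffled training set, the reordering and occasional duplication this introduces is acceptable, per point 3.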
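Point 3 usually shows up as nothing more than different flags on the same reader: shuffled, remainder-dropping batches for training; sequential, exact-once reading for validation. A sketch, under the same API assumptions as above:

```python
# Point 3: same reader, different configuration for train vs. validation.
from tensorpack.dataflow import dataset, BatchData

# Training: shuffled, and the last incomplete batch is dropped
# (remainder=False) -- reordering or regrouping datapoints is fine.
df_train = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
df_train = BatchData(df_train, 256, remainder=False)

# Validation: sequential, every datapoint exactly once, and the last
# incomplete batch is kept (remainder=True) so the score is comparable.
df_val = dataset.ILSVRC12('/path/to/ILSVRC12', 'val', shuffle=False)
df_val = BatchData(df_val, 256, remainder=True)
```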