Commit 9185744d authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent 910cfaec
Potential Bugs/Feature Requests/Usage Questions Only:
Any unexpected problems: PLEASE always include
1. What you did:
+ If you're using examples:
+ What's the command you run:
+ Have you made any changes to code? Post them if any:
+ If not, describe what you did that may be relevant.
But we may not be able to resolve it since there is no reproducible code.
2. What you observed, e.g. as many logs as possible.
@@ -13,10 +13,17 @@ Any unexpected problems: PLEASE always include
5. About efficiency, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html
Feature Requests:
+ Improve an existing feature, or add a new feature.
+ You can implement a lot of features by extending tensorpack
(See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
It does not have to be added to tensorpack unless you have a good reason.
+ We don't take example requests.
Usage Questions:
+ Read the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials) first.
+ We answer "HOW to do X in tensorpack" for a specific well-defined X.
We don't answer general machine learning questions,
such as "how to improve my model" or "I don't understand the paper".
You can also use gitter (https://gitter.im/tensorpack/users) for more casual discussions.
@@ -12,7 +12,7 @@ Tensorpack is a training interface based on TensorFlow.
It's Yet Another TF wrapper, but different in:
1. Focus on __training speed__.
+ Speed comes for free with tensorpack -- it uses TensorFlow in the __efficient way__ with no extra overhead.
On various CNNs, it runs 1.5~1.7x faster than the equivalent Keras code.
+ Data-parallel multi-GPU training is off-the-shelf to use. It runs as fast as Google's [official benchmark](https://www.tensorflow.org/performance/benchmarks).
@@ -68,3 +68,15 @@ Dependencies:
pip install -U git+https://github.com/ppwwyyxx/tensorpack.git
# or add `--user` to avoid system-wide installation.
```
## Citing Tensorpack:
If you use Tensorpack in your research or wish to refer to the examples, please cite with:
```
@misc{wu2016tensorpack,
title={Tensorpack},
author={Wu, Yuxin and others},
howpublished={\url{https://github.com/tensorpack/}},
year={2016}
}
```
@@ -10,7 +10,7 @@ It's Yet Another TF wrapper, but different in:
- Focus on **training speed**.
- Speed comes for free with tensorpack -- it uses TensorFlow in the
**efficient way** with no extra overhead. On various CNNs, it runs 1.5~1.7x faster than the equivalent Keras code.
- Data-parallel multi-GPU training is off-the-shelf to use. It is as fast as Google's
`official benchmark <https://www.tensorflow.org/performance/benchmarks>`_.
...
@@ -34,7 +34,7 @@ Then it is a good time to open an issue.
1. Learn `tf.stop_gradient`. You can simply use `tf.stop_gradient` in your model code in many situations (e.g. to freeze the first several layers; see the sketch after this list).
2. [varreplace.freeze_variables](../modules/tfutils.html#tensorpack.tfutils.varreplace.freeze_variables) returns a context where variables are frozen.
It is implemented by the `custom_getter` argument of `tf.variable_scope` -- learn it to gain more control over what & how variables are frozen.
3. [ScaleGradient](../modules/tfutils.html#tensorpack.tfutils.gradproc.ScaleGradient) can be used to set the gradients of some variables to 0.
But it may be slow, since variables still have gradients.
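A minimal sketch of option 1, using plain `tf.stop_gradient` inside the model code (the layer names and shapes below are made up for illustration):

```python
import tensorflow as tf

def build_graph(image):
    # Freeze conv0: nothing before `tf.stop_gradient` receives a gradient,
    # so its variables are never updated.
    x = tf.layers.conv2d(image, 32, 3, activation=tf.nn.relu, name='conv0')
    x = tf.stop_gradient(x)
    x = tf.layers.conv2d(x, 64, 3, activation=tf.nn.relu, name='conv1')
    return tf.layers.dense(tf.layers.flatten(x), 10, name='fc')
```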
......
...@@ -8,9 +8,11 @@ in TensorFlow checkpoint format. ...@@ -8,9 +8,11 @@ in TensorFlow checkpoint format.
One checkpoint typically includes a `.data-xxxxx` file and a `.index` file. One checkpoint typically includes a `.data-xxxxx` file and a `.index` file.
Both are necessary. Both are necessary.
`tf.train.NewCheckpointReader` is the best tool to parse a TensorFlow checkpoint.
We have two example scripts that demo its usage, but read the [TF docs](https://www.tensorflow.org/api_docs/python/tf/train/NewCheckpointReader) for details.
[scripts/ls-checkpoint.py](../scripts/ls-checkpoint.py)
demos how to print all variables and their shapes in a checkpoint.
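For instance, a minimal sketch of inspecting a checkpoint by hand (the checkpoint path is a placeholder):

```python
import tensorflow as tf

# Placeholder path: use your own checkpoint prefix (without the .data-xxxxx / .index suffix).
reader = tf.train.NewCheckpointReader('/path/to/train_log/model-10000')
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)
# reader.get_tensor(name) returns the value of a single variable.
```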
[scripts/dump-model-params.py](../scripts/dump-model-params.py) can be used to remove unnecessary variables in a checkpoint.
It takes a metagraph file (which is also saved by `ModelSaver`) and only saves variables that the model needs at inference time.
...
@@ -25,8 +25,8 @@ This is how TensorFlow summaries eventually get logged/saved/printed:
All the "what, when, where" can be customized in either the graph or with the callbacks/monitors setting.
Since TF summaries are evaluated infrequently (every epoch) by default, if the content is data-dependent, the values
could have high variance. To address this issue, you can:
1. Change "When to Log": log more frequently, but note that certain summaries can be expensive to
log. You may want to use a separate collection for frequent logging.
2. Change "What to Log": you can call
@@ -40,7 +40,7 @@ are likely to have too much variance. To address this issue, you can:
Besides TensorFlow summaries,
a callback can also write other data to the monitor backend at any time once training has started,
by `self.trainer.monitors.put_xxx`.
As long as the type of data is supported, the data will be dispatched to and logged to the same place.
As a result, tensorboard will show not only summaries in the graph, but also your custom data.
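For example, a sketch of a custom callback doing this (the callback name and the logged quantity are made up; `put_scalar` is one of the `put_xxx` methods):

```python
from tensorpack.callbacks import Callback

class LogSomething(Callback):
    """A made-up callback: push a custom scalar to the monitor backend every epoch."""
    def _trigger_epoch(self):
        value = 42.0  # compute whatever quantity you want to track here
        self.trainer.monitors.put_scalar('custom/value', value)
```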
...
@@ -80,11 +80,10 @@ with TowerContext('some_name_or_empty_string', is_training=False):
When defining the model you can construct the graph using whatever library you feel comfortable with.
Usually, slim/tflearn/tensorlayer are just symbolic function wrappers; calling them is no different
from calling `tf.add`. You may need to be careful how regularizations/BN updates are supposed
to be handled in those libraries, though.
It is a bit different to use sonnet/Keras.
sonnet/Keras manage the variable scope with their own model classes, and calling their symbolic functions
always creates a new variable scope. See the [Keras example](../examples/keras) for how to use it within tensorpack.
@@ -9,6 +9,7 @@ Tensorpack trainers contain logic of:
Usually you won't touch these methods directly, but use the
[higher-level interface](training-interface.html) on trainers.
You'll only need to __select__ what trainer to use.
But some basic knowledge of how they work is useful:
### Tower Trainer
@@ -16,22 +17,22 @@ Following the terminology in TensorFlow,
a __tower function__ is a callable that takes input tensors and adds __one replicate__ of the model to the graph.
Most types of neural-network training could fall into this category.
All trainers in tensorpack are subclasses of [TowerTrainer](../modules/train.html#tensorpack.train.TowerTrainer).
The concept of tower is used mainly to support:
1. Data-parallel multi-GPU training, where a replicate is built on each GPU.
2. Graph construction for inference, where a replicate is built under inference mode.
You'll provide a tower function to use `TowerTrainer`.
The function needs to follow some conventions:
1. It will always be called under a `TowerContext`,
which will contain information about reuse, training/inference, scope name, etc.
2. __It might get called multiple times__ for data-parallel training or inference.
3. To respect variable reuse, use `tf.get_variable` instead of
`tf.Variable` in the function, unless you want to force creation of new variables.
In particular, when working with the `ModelDesc` interface, its `build_graph` method will be the tower function.
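For illustration, a minimal `ModelDesc` sketch whose `build_graph` serves as the tower function. The exact method names (`inputs`, `optimizer`) are assumptions that have changed across tensorpack versions, so check the API docs of the version you use:

```python
import tensorflow as tf
from tensorpack import ModelDesc

class MyModel(ModelDesc):
    def inputs(self):
        return [tf.placeholder(tf.float32, (None, 28, 28, 1), 'image'),
                tf.placeholder(tf.int32, (None,), 'label')]

    def build_graph(self, image, label):
        # The tower function: it may run once per GPU, so variables are created
        # through tf.get_variable (which tf.layers uses internally) to respect reuse.
        logits = tf.layers.dense(tf.layers.flatten(image), 10, name='fc')
        cost = tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)
        return cost

    def optimizer(self):
        return tf.train.AdamOptimizer(1e-3)
```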
### MultiGPU Trainers
@@ -44,11 +45,11 @@ It takes only one line of code change to use them.
Note some __common problems__ when using these trainers:
1. In each iteration, all GPUs (all replicates of the model) take tensors from the `InputSource`,
instead of taking one batch and splitting it among them.
So the total batch size would become ``(batch size of InputSource/DataFlow) * #GPU``.
Splitting a tensor for data-parallel training makes no sense at all; it only puts unnecessary shape constraints on the data.
By letting each GPU train on its own input tensors, they can train on inputs of different shapes simultaneously.
2. The tower function (your model code) will get called multiple times (see the sketch below).
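A sketch of the one-line change, reusing the hypothetical `MyModel` from the sketch above. `FakeData` only stands in for a real DataFlow, and its exact arguments here are an assumption -- substitute your own data:

```python
from tensorpack import TrainConfig, SyncMultiGPUTrainerReplicated, launch_train_with_config
from tensorpack.dataflow import FakeData

# Each GPU takes its own batch of 64 from the DataFlow, so the total batch size is 64 * #GPU.
df = FakeData([[64, 28, 28, 1], [64]], size=1000, dtype=['float32', 'int32'])
config = TrainConfig(model=MyModel(), dataflow=df, max_epoch=2)
# Compared to launch_train_with_config(config, SimpleTrainer()),
# switching the trainer object is the only change needed for 2-GPU training:
launch_train_with_config(config, SyncMultiGPUTrainerReplicated(2))
```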
...
@@ -70,6 +70,10 @@ class DistributedParameterServerBuilder(DataParallelBuilder, DistributedBuilderB
It is an equivalent of ``--variable_update=parameter_server`` in
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
However this implementation hasn't been well tested.
It probably still has issues in model saving, etc.
Check `ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
for fast and correct distributed examples.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
@@ -138,6 +142,9 @@ class DistributedReplicatedBuilder(DataParallelBuilder, DistributedBuilderBase):
It is an equivalent of ``--variable_update=distributed_replicated`` in
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
Note that the performance of this trainer is still not satisfactory.
Check `ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
for fast and correct distributed examples.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
...
@@ -14,7 +14,7 @@ __all__ = []
def describe_trainable_vars():
"""
Print a description of the current model parameters.
Skip variables starting with "tower", as they are just duplicates built by data-parallel logic.
"""
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
if len(train_vars) == 0:
@@ -51,6 +51,8 @@ def describe_trainable_vars():
def get_shape_str(tensors):
"""
Internally used by the layer registry to print shapes of inputs/outputs of layers.
Args:
tensors (list or tf.Tensor): a tensor or a list of tensors
Returns:
...
@@ -198,11 +198,11 @@ def add_param_summary(*summary_lists, **kwargs):
def add_moving_summary(*args, **kwargs):
"""
Summarize the moving average for scalar tensors.
This function is a no-op if not called from the main training tower.
Args:
args: scalar tensors to summarize
decay (float): the decay rate. Defaults to 0.95.
collection (str or None): the name of the collection to add EMA-maintaining ops.
The default will work together with the default
...