Commit 229e991a authored by Yuxin Wu

update docs

parent 63f656c8
......@@ -6,12 +6,12 @@ This example is mainly to demonstrate:
1. How to train an RNN with persistent state between iterations. Here it simply manages the state inside the graph (see the sketch below).
2. How to use a TF reader pipeline instead of a DataFlow, for both training & inference.
It trains a language model on the PTB dataset, basically an equivalent of the PTB example
in [tensorflow/models](https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb)
It trains a language model on the PTB dataset, and reimplements an equivalent of the PTB example
in [tensorflow/models](https://github.com/tensorflow/models/blob/v1.13.0/tutorials/rnn/ptb/ptb_word_lm.py)
with its "medium" config.
It has the same performance & speed as the original example as well.
It has the same performance as the original example as well.
Note that the data pipeline is completely copied from the tensorflow example.
Note that the input data pipeline is completely copied from the tensorflow example.
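The persistent-state idea from point 1 can be pictured with a minimal sketch like the one below. This is not the example's actual code; the shapes, names, and the use of `dynamic_rnn` are assumptions made purely for illustration. The key point is that the RNN state lives in non-trainable variables inside the graph and is written back at the end of every iteration.
```python
# Minimal sketch (illustrative only): persistent LSTM state kept in the graph.
import tensorflow as tf

BATCH, HIDDEN, STEPS = 20, 650, 35          # made-up sizes
cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN)

# Non-trainable variables hold the state across training iterations.
state_c = tf.get_variable('state_c', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())
state_h = tf.get_variable('state_h', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())

inputs = tf.placeholder(tf.float32, [BATCH, STEPS, HIDDEN])
initial = tf.nn.rnn_cell.LSTMStateTuple(state_c, state_h)
outputs, final = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial)

# Writing the final state back makes it persist to the next iteration;
# run `update_state` together with the training op.
update_state = tf.group(tf.assign(state_c, final.c),
                        tf.assign(state_h, final.h))
```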
To Train:
```
......
......@@ -163,12 +163,12 @@ def BatchNorm(inputs, axis=None, *, training=None, momentum=0.9, epsilon=1e-5,
* "default": same as "collection". Because this is the default behavior in TensorFlow.
* "skip": do not update EMA. This can be useful when you reuse a batch norm layer in several places
but do not want them to all update your EMA.
* "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS`.
* "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS` in the first training tower.
The ops in the collection will be run automatically by the callback :class:`RunUpdateOps`, along with
your training iterations. This can waste compute if your training iterations do not always depend
on the BatchNorm layer.
* "internal": EMA is updated inside this layer itself by control dependencies.
In standard scenarios, it has similar speed to "collection". But it has some more benefits:
* "internal": EMA is updated in the first training tower inside this layer itself by control dependencies.
In standard scenarios, it has similar speed to "collection". But it supports more scenarios:
1. BatchNorm is used inside dynamic control flow.
The collection-based update does not support dynamic control flows.
......
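To make the difference between the "collection" and "internal" modes above concrete, here is an illustrative sketch (not the layer's real implementation; `ema_update_op` is a stand-in for the op that updates the moving mean/variance):
```python
# Illustrative sketch of the two EMA-update styles described above.
import tensorflow as tf

def collection_style(output, ema_update_op):
    # "collection": only register the update op; a callback such as
    # RunUpdateOps later fetches tf.GraphKeys.UPDATE_OPS and runs it
    # together with the training step.
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, ema_update_op)
    return output

def internal_style(output, ema_update_op):
    # "internal": make the layer's own output depend on the update, so
    # the EMA is refreshed whenever the output is computed -- this also
    # works inside dynamic control flow, where collected ops may not run.
    with tf.control_dependencies([ema_update_op]):
        return tf.identity(output)
```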
......@@ -158,7 +158,11 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
are supposed to be in-sync).
But this cheap operation may help prevent
certain numerical issues in practice.
Note that in cases such as BatchNorm, the variables may not be in sync.
Note that in cases such as BatchNorm, the variables may not be in sync:
e.g., non-master workers may not maintain EMAs.
When benchmarking, disable this option.
"""
@map_arg(gpus=_int_to_range)
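Conceptually, "broadcasting the variables" here means overwriting every replica's copy with the first tower's values. The sketch below is not tensorpack's implementation; the `tower0/` naming convention and the helper name are assumed only for illustration.
```python
# Hypothetical helper: build one op that copies tower 0's variables
# onto the other towers' replicas (names like 'tower1/w:0' are assumed).
import tensorflow as tf

def build_broadcast_op(all_vars):
    by_suffix = {}
    for v in all_vars:
        prefix, _, suffix = v.name.partition('/')
        by_suffix.setdefault(suffix, {})[prefix] = v

    assigns = []
    for copies in by_suffix.values():
        master = copies.get('tower0')
        if master is None:
            continue
        for prefix, v in copies.items():
            if prefix != 'tower0':
                # Overwrite the replica with the master copy's current value.
                assigns.append(tf.assign(v, master.read_value()))
    return tf.group(*assigns)

# Running the returned op once in a while (e.g. every epoch) re-syncs
# replicas that drifted, such as BatchNorm EMAs kept only on tower 0.
```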
......@@ -403,7 +407,11 @@ class HorovodTrainer(SingleCostTrainer):
Theoretically this is a no-op (because the variables
are supposed to be in-sync).
But this cheap operation may help prevent certain numerical issues in practice.
Note that in cases such as BatchNorm, the variables may not be in sync.
Note that in cases such as BatchNorm, the variables may not be in sync:
e.g., non-master workers may not maintain EMAs.
When benchmarking, disable this option.
"""
def __init__(self, average=True, compression=None):
......
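For the Horovod trainer, the same kind of periodic re-sync can be expressed with Horovod's public TF1 API. A hedged sketch of the idea (not the trainer's exact code):
```python
# Sketch: re-sync all workers' variables from rank 0 using Horovod.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()
# ... build model, optimizer (wrapped in hvd.DistributedOptimizer), etc. ...

# Op that overwrites every worker's global variables with rank 0's values.
bcast_op = hvd.broadcast_global_variables(0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Run once after initialization, and e.g. once per epoch, to re-sync
    # copies that drift (such as BN EMAs only the master worker maintains).
    sess.run(bcast_op)
```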