Commit 229e991a authored by Yuxin Wu

update docs

parent 63f656c8
......@@ -6,12 +6,12 @@ This example is mainly to demonstrate:
1. How to train an RNN with persistent state between iterations. Here it simply manages the state inside the graph (see the sketch below).
2. How to use a TF reader pipeline instead of a DataFlow, for both training & inference.
It trains a language model on the PTB dataset, basically an equivalent of the PTB example
in [tensorflow/models](https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb)
It trains a language model on the PTB dataset, and reimplements an equivalent of the PTB example
in [tensorflow/models](https://github.com/tensorflow/models/blob/v1.13.0/tutorials/rnn/ptb/ptb_word_lm.py)
with its "medium" config.
It has the same performance & speed as the original example as well.
It has the same performance as the original example as well.
Note that the data pipeline is completely copied from the tensorflow example.
Note that the input data pipeline is completely copied from the tensorflow example.
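The persistent-state idea from point 1 can be pictured with a minimal sketch like the one below. This is not the example's actual code; the shapes, names, and the use of `dynamic_rnn` are assumptions made purely for illustration. The key point is that the RNN state lives in non-trainable variables inside the graph and is written back at the end of every iteration.
```python
# Minimal sketch (illustrative only): persistent LSTM state kept in the graph.
import tensorflow as tf

BATCH, HIDDEN, STEPS = 20, 650, 35          # made-up sizes
cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN)

# Non-trainable variables hold the state across training iterations.
state_c = tf.get_variable('state_c', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())
state_h = tf.get_variable('state_h', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())

inputs = tf.placeholder(tf.float32, [BATCH, STEPS, HIDDEN])
initial = tf.nn.rnn_cell.LSTMStateTuple(state_c, state_h)
outputs, final = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial)

# Writing the final state back makes it persist to the next iteration;
# run `update_state` together with the training op.
update_state = tf.group(tf.assign(state_c, final.c),
                        tf.assign(state_h, final.h))
```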
To Train:
```
......
......@@ -163,12 +163,12 @@ def BatchNorm(inputs, axis=None, *, training=None, momentum=0.9, epsilon=1e-5,
* "default": same as "collection". Because this is the default behavior in TensorFlow.
* "skip": do not update EMA. This can be useful when you reuse a batch norm layer in several places
but do not want them to all update your EMA.
* "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS`.
* "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS` in the first training tower.
The ops in the collection will be run automatically by the callback :class:`RunUpdateOps`, along with
your training iterations. This can waste compute if your training iterations do not always depend
on the BatchNorm layer.
* "internal": EMA is updated inside this layer itself by control dependencies.
In standard scenarios, it has similar speed to "collection". But it has some more benefits:
* "internal": EMA is updated in the first training tower inside this layer itself by control dependencies.
In standard scenarios, it has similar speed to "collection". But it supports more scenarios:
1. BatchNorm is used inside dynamic control flow.
The collection-based update does not support dynamic control flows.
......
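To make the difference between the "collection" and "internal" modes above concrete, here is an illustrative sketch (not the layer's real implementation; `ema_update_op` is a stand-in for the op that updates the moving mean/variance):
```python
# Illustrative sketch of the two EMA-update styles described above.
import tensorflow as tf

def collection_style(output, ema_update_op):
    # "collection": only register the update op; a callback such as
    # RunUpdateOps later fetches tf.GraphKeys.UPDATE_OPS and runs it
    # together with the training step.
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, ema_update_op)
    return output

def internal_style(output, ema_update_op):
    # "internal": make the layer's own output depend on the update, so
    # the EMA is refreshed whenever the output is computed -- this also
    # works inside dynamic control flow, where collected ops may not run.
    with tf.control_dependencies([ema_update_op]):
        return tf.identity(output)
```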
......@@ -158,7 +158,11 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
are supposed to be in-sync).
But this cheap operation may help prevent
certain numerical issues in practice.
Note that in cases such as BatchNorm, the variables may not be in sync.
Note that in cases such as BatchNorm, the variables may not be in sync:
e.g., non-master workers may not maintain EMAs.
When benchmarking, disable this option.
"""
@map_arg(gpus=_int_to_range)
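Conceptually, "broadcasting the variables" here means overwriting every replica's copy with the first tower's values. The sketch below is not tensorpack's implementation; the `tower0/` naming convention and the helper name are assumed only for illustration.
```python
# Hypothetical helper: build one op that copies tower 0's variables
# onto the other towers' replicas (names like 'tower1/w:0' are assumed).
import tensorflow as tf

def build_broadcast_op(all_vars):
    by_suffix = {}
    for v in all_vars:
        prefix, _, suffix = v.name.partition('/')
        by_suffix.setdefault(suffix, {})[prefix] = v

    assigns = []
    for copies in by_suffix.values():
        master = copies.get('tower0')
        if master is None:
            continue
        for prefix, v in copies.items():
            if prefix != 'tower0':
                # Overwrite the replica with the master copy's current value.
                assigns.append(tf.assign(v, master.read_value()))
    return tf.group(*assigns)

# Running the returned op once in a while (e.g. every epoch) re-syncs
# replicas that drifted, such as BatchNorm EMAs kept only on tower 0.
```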
......@@ -403,7 +407,11 @@ class HorovodTrainer(SingleCostTrainer):
Theoretically this is a no-op (because the variables
are supposed to be in-sync).
But this cheap operation may help prevent certain numerical issues in practice.
Note that in cases such as BatchNorm, the variables may not be in sync.
Note that in cases such as BatchNorm, the variables may not be in sync:
e.g., non-master workers may not maintain EMAs.
When benchmarking, disable this option.
"""
def __init__(self, average=True, compression=None):
......
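For the Horovod trainer, the same kind of periodic re-sync can be expressed with Horovod's public TF1 API. A hedged sketch of the idea (not the trainer's exact code):
```python
# Sketch: re-sync all workers' variables from rank 0 using Horovod.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()
# ... build model, optimizer (wrapped in hvd.DistributedOptimizer), etc. ...

# Op that overwrites every worker's global variables with rank 0's values.
bcast_op = hvd.broadcast_global_variables(0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Run once after initialization, and e.g. once per epoch, to re-sync
    # copies that drift (such as BN EMAs only the master worker maintains).
    sess.run(bcast_op)
```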