Shashank Suhas / seminar-breakout · Commits

Commit 229e991a, authored Sep 02, 2020 by Yuxin Wu
Parent: 63f656c8

    update docs

Showing 3 changed files with 17 additions and 9 deletions (+17 / -9)
examples/PennTreebank/README.md    +4 / -4
tensorpack/models/batch_norm.py    +3 / -3
tensorpack/train/trainers.py       +10 / -2
examples/PennTreebank/README.md
@@ -6,12 +6,12 @@ This example is mainly to demonstrate:
 1. How to train an RNN with persistent state between iterations. Here it simply manages the state inside the graph.
 2. How to use a TF reader pipeline instead of a DataFlow, for both training & inference.
 
-It trains an language model on PTB dataset, basically an equivalent of the PTB example
-in [tensorflow/models](https://github.com/tensorflow/models/tree/master/tutorials/rnn/ptb)
+It trains an language model on PTB dataset, and reimplements an equivalent of the PTB example
+in [tensorflow/models](https://github.com/tensorflow/models/blob/v1.13.0/tutorials/rnn/ptb/ptb_word_lm.py)
 with its "medium" config.
-It has the same performance & speed as the original example as well.
+It has the same performance as the original example as well.
 
-Note that the data pipeline is completely copied from the tensorflow example.
+Note that the input data pipeline is completely copied from the tensorflow example.
 
 To Train:
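The first README bullet above (persistent RNN state managed inside the graph) comes down to keeping the recurrent state in non-trainable variables and assigning the final state back after each iteration. Below is a minimal TF1-style sketch of that pattern; it is not code from this repository, and all names and sizes are illustrative.

```python
import tensorflow as tf

BATCH, HIDDEN, STEPS = 20, 650, 35  # illustrative sizes, roughly the "medium" config

cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN)

# Persistent state: non-trainable variables that survive across sess.run() calls,
# so each training iteration continues from where the previous one stopped.
state_c = tf.get_variable('state_c', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())
state_h = tf.get_variable('state_h', [BATCH, HIDDEN], trainable=False,
                          initializer=tf.zeros_initializer())

inputs = tf.placeholder(tf.float32, [BATCH, STEPS, HIDDEN])  # already-embedded tokens
outputs, last_state = tf.nn.dynamic_rnn(
    cell, inputs,
    initial_state=tf.nn.rnn_cell.LSTMStateTuple(state_c, state_h))

# Write the final state back into the variables; run this together with the
# train op (e.g. via a control dependency) so the state carries forward.
update_state = tf.group(state_c.assign(last_state.c),
                        state_h.assign(last_state.h))
```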
tensorpack/models/batch_norm.py
@@ -163,12 +163,12 @@ def BatchNorm(inputs, axis=None, *, training=None, momentum=0.9, epsilon=1e-5,
         * "default": same as "collection". Because this is the default behavior in TensorFlow.
         * "skip": do not update EMA. This can be useful when you reuse a batch norm layer in several places
           but do not want them to all update your EMA.
-        * "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS`.
+        * "collection": Add EMA update ops to collection `tf.GraphKeys.UPDATE_OPS` in the first training tower.
           The ops in the collection will be run automatically by the callback :class:`RunUpdateOps`, along with
           your training iterations. This can waste compute if your training iterations do not always depend
           on the BatchNorm layer.
-        * "internal": EMA is updated inside this layer itself by control dependencies.
-          In standard scenarios, it has similar speed to "collection". But it has some more benefits:
+        * "internal": EMA is updated in the first training tower inside this layer itself by control dependencies.
+          In standard scenarios, it has similar speed to "collection". But it supports more scenarios:
 
          1. BatchNorm is used inside dynamic control flow.
             The collection-based update does not support dynamic control flows.
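To illustrate the "collection" behaviour this docstring contrasts with "internal", here is a plain TF1 sketch (not tensorpack code) of how UPDATE_OPS-based EMA updates get wired into a training step:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32])
y = tf.layers.batch_normalization(x, training=True)  # registers EMA update ops in UPDATE_OPS
loss = tf.reduce_mean(tf.square(y))

# "collection" style: something outside the layer must run the collected ops.
# In tensorpack this is the RunUpdateOps callback; here it is a control dependency.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # The EMA updates now run on every training step, even when a step does not
    # actually depend on the BatchNorm output -- the "wasted compute" case above.
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```

The "internal" mode instead attaches the EMA update to the layer's own output through control dependencies, which is why it keeps working inside dynamic control flow (tf.cond / tf.while_loop), where collection-based updates cannot be relied on.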
tensorpack/train/trainers.py
@@ -158,7 +158,11 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
            are supposed to be in-sync).
            But this cheap operation may help prevent
            certain numerical issues in practice.
-           Note that in cases such as BatchNorm, the variables may not be in sync.
+           Note that in cases such as BatchNorm, the variables may not be in sync:
+           e.g., non-master worker may not maintain EMAs.
+
+           For benchmark, disable this option.
     """
 
     @map_arg(gpus=_int_to_range)

@@ -403,7 +407,11 @@ class HorovodTrainer(SingleCostTrainer):
            Theoretically this is a no-op (because the variables
            are supposed to be in-sync).
            But this cheap operation may help prevent certain numerical issues in practice.
-           Note that in cases such as BatchNorm, the variables may not be in sync.
+           Note that in cases such as BatchNorm, the variables may not be in sync:
+           e.g., non-master worker may not maintain EMAs.
+
+           For benchmark, disable this option.
     """
 
     def __init__(self, average=True, compression=None):
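Both docstring hunks refer to a periodic variable broadcast that can be turned off for benchmarks. A hedged usage sketch follows; the diff does not show the option's name, so the `BROADCAST_EVERY_EPOCH` attribute used below is an assumption about the tensorpack API, not something confirmed by this commit:

```python
from tensorpack.train.trainers import HorovodTrainer, SyncMultiGPUTrainerReplicated

# Multi-machine case (signature matches the hunk above: average=True, compression=None).
trainer = HorovodTrainer(average=True)
trainer.BROADCAST_EVERY_EPOCH = False  # assumed attribute name; skip per-epoch broadcast when benchmarking

# Single-machine replicated case; `gpus` may be an int thanks to @map_arg(gpus=_int_to_range).
multi_gpu_trainer = SyncMultiGPUTrainerReplicated(gpus=8)
multi_gpu_trainer.BROADCAST_EVERY_EPOCH = False  # assumed attribute name
```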