Shashank Suhas / seminar-breakout / Commits

Commit c366eebf, authored Feb 14, 2019 by Yuxin Wu:
Force `internal_update=True` in Cross-GPU BatchNorm.
Parent: 49675590
Showing 5 changed files with 33 additions and 17 deletions (+33 / -17):

examples/A3C-Gym/train-atari.py    +1   -1
tensorpack/models/batch_norm.py    +26  -14
tensorpack/train/base.py           +1   -1
tensorpack/train/tower.py          +4   -0
tensorpack/utils/gpu.py            +1   -1
examples/A3C-Gym/train-atari.py

@@ -127,7 +127,7 @@ class Model(ModelDesc):
         lr = tf.get_variable('learning_rate', initializer=0.001, trainable=False)
         opt = tf.train.AdamOptimizer(lr, epsilon=1e-3)
-        gradprocs = [MapGradient(lambda grad: tf.clip_by_average_norm(grad, 0.1)),
+        gradprocs = [MapGradient(lambda grad: tf.clip_by_norm(grad, 0.1 * tf.cast(tf.size(grad), tf.float32))),
                      SummaryGradient()]
         opt = optimizer.apply_grad_processors(opt, gradprocs)
         return opt
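The change above replaces the deprecated `tf.clip_by_average_norm(grad, 0.1)` with `tf.clip_by_norm` and a threshold scaled by the number of gradient elements. The two are equivalent, because clipping the average L2 norm at 0.1 is the same as clipping the total L2 norm at 0.1 * size(grad). A minimal sketch to check the equivalence, assuming TF 1.x (where both ops exist):

```python
# Minimal sketch (assumes TF 1.x, where tf.clip_by_average_norm still exists).
# Clipping the average L2 norm at 0.1 equals clipping the total L2 norm at 0.1 * size.
import numpy as np
import tensorflow as tf

grad = tf.constant(np.random.randn(4, 5).astype('float32'))
old_clip = tf.clip_by_average_norm(grad, 0.1)                               # deprecated op
new_clip = tf.clip_by_norm(grad, 0.1 * tf.cast(tf.size(grad), tf.float32))  # replacement

with tf.Session() as sess:
    a, b = sess.run([old_clip, new_clip])
    print(np.allclose(a, b))  # expected: True
```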
tensorpack/models/batch_norm.py

@@ -40,7 +40,7 @@ def get_bn_variables(n_out, use_scale, use_bias, beta_init, gamma_init):
 def update_bn_ema(xn, batch_mean, batch_var,
-                  moving_mean, moving_var, decay, internal_update):
+                  moving_mean, moving_var, decay):
     update_op1 = moving_averages.assign_moving_average(
         moving_mean, batch_mean, decay, zero_debias=False,
         name='mean_ema_op')
@@ -48,13 +48,11 @@ def update_bn_ema(xn, batch_mean, batch_var,
         moving_var, batch_var, decay, zero_debias=False,
         name='var_ema_op')

-    if internal_update:
-        with tf.control_dependencies([update_op1, update_op2]):
-            return tf.identity(xn, name='output')
-    else:
-        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_op1)
-        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_op2)
-        return tf.identity(xn, name='output')
+    # When sync_statistics is True, always enable internal_update.
+    # Otherwise the update ops (only executed on main tower)
+    # will hang when some BatchNorm layers are unused (https://github.com/tensorpack/tensorpack/issues/1078)
+    with tf.control_dependencies([update_op1, update_op2]):
+        return tf.identity(xn, name='output')


 @layer_register()
@@ -82,14 +80,19 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
     1. Accepts an alternative `data_format` option when `axis` is None. For 2D input, this argument will be ignored.
     2. Default value for `momentum` and `epsilon` is different.
     3. Default value for `training` is automatically obtained from tensorpack's `TowerContext`, but can be overwritten.
-    4. Support the `internal_update` option, which enables the use of BatchNorm layer inside conditionals.
+    4. Support the `internal_update` option, which cover more use cases than the standard collection-based update.
     5. Support the `sync_statistics` option, which is very useful in small-batch models.

     Args:
         internal_update (bool): if False, add EMA update ops to
           `tf.GraphKeys.UPDATE_OPS`. If True, update EMA inside the layer by control dependencies.
-          They are very similar in speed, but `internal_update=True` can be used
-          when you have conditionals in your model, or when you have multiple networks to train.
+          They are very similar in speed, but `internal_update=True` is recommended and can be helpful when:
+
+          1. BatchNorm is used inside dynamic control flow.
+             The collection-based update does not support dynamic control flows.
+          2. BatchNorm layer is sometimes unused (e.g., when you have two networks to train alternatively).
+             Putting all update ops into a single collection will waste a lot of compute.
+
           Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/14699
         sync_statistics (str or None): one of None, "nccl", or "horovod".
@@ -106,8 +109,15 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
             If not None, per-GPU E[x] and E[x^2] among all GPUs are averaged to compute
             global mean & variance. Therefore each GPU needs to have the same batch size.
-            The BatchNorm layer on each GPU needs to use the same name (`BatchNorm('name', input)`), so that
-            statistics can be reduced. If names do not match, this layer will hang.
+            The synchronization is based on the current variable scope + the name of the layer
+            (`BatchNorm('name', input)`). Therefore, you need to make sure that:
+
+            1. The BatchNorm layer on different GPUs needs to have the same name, so that
+               statistics can be synchronized. If names do not match, this layer will hang.
+            2. Different BatchNorm layers in one tower cannot share the same name.
+            3. A BatchNorm layer needs to be executed for the same number of times by all GPUs.
+               If different GPUs execute one BatchNorm layer for different number of times
+               (e.g., if some GPUs do not execute it), this layer may hang.

             This option only has effect in standard training mode.
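The moment averaging described in this docstring is plain arithmetic: because every GPU contributes the same batch size, averaging the per-GPU E[x] and E[x^2] gives the global moments, and the global variance follows as E[x^2] - E[x]^2. A NumPy sketch with the nccl/horovod all-reduce abstracted into a simple mean (shapes and values are illustrative):

```python
# Sketch of the cross-GPU statistics: average per-GPU E[x] and E[x^2], then
# recover the global variance as E[x^2] - E[x]^2.
import numpy as np

per_gpu_batches = [np.random.randn(32, 64) for _ in range(4)]  # same batch size on every GPU

ex  = np.mean([b.mean(axis=0) for b in per_gpu_batches], axis=0)         # all-reduced E[x]
ex2 = np.mean([(b ** 2).mean(axis=0) for b in per_gpu_batches], axis=0)  # all-reduced E[x^2]
global_mean = ex
global_var = ex2 - ex ** 2

all_x = np.concatenate(per_gpu_batches, axis=0)
assert np.allclose(global_mean, all_x.mean(axis=0))
assert np.allclose(global_var, all_x.var(axis=0))
```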
@@ -115,6 +125,9 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
             `MegDet: A Large Mini-Batch Object Detector <https://arxiv.org/abs/1711.07240>`_.
             Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/18222.

+            When `sync_statistics` is enabled, `internal_update` will be set to True automatically.
+            This is to avoid running `UPDATE_OPS`, which requires synchronization.
+
     Variable Names:

     * ``beta``: the bias term. Will be zero-inited by default.
@@ -290,8 +303,7 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
         if ctx.is_main_training_tower:
             ret = update_bn_ema(
-                xn, batch_mean_vec, batch_var_vec, moving_mean, moving_var,
-                momentum, internal_update)
+                xn, batch_mean_vec, batch_var_vec, moving_mean, moving_var, momentum)
         else:
             ret = tf.identity(xn, name='output')
tensorpack/train/base.py

@@ -127,7 +127,7 @@ class Trainer(object):
         2. Increase the global_step
         3. Evaluate some summaries

-        Typically you do not want to use ``hooked_sess.run`` in callbacks,
+        Typically you __should not__ use ``hooked_sess.run`` in callbacks,
         because it is for the "training iteration". If you just want to evaluate
         some tensors, use ``sess.run`` if the tensors does not depend on the inputs,
         or more generally, use `before_run/after_run` to evaluate the tensors **along with**
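A sketch of the pattern this docstring recommends: rather than calling `hooked_sess.run` inside a callback, fetch extra tensors along with the training iteration from `_before_run`/`_after_run` (tensorpack's `Callback` mirrors `tf.train.SessionRunHook`). The tensor name `'total_cost:0'` below is only an illustrative placeholder:

```python
# Illustrative sketch, assuming tensorpack's Callback interface; the fetched
# tensor name is a placeholder and depends on your graph.
import tensorflow as tf
from tensorpack.callbacks import Callback

class FetchAlongside(Callback):
    def _setup_graph(self):
        # Look up a tensor that already exists in the training graph.
        self._cost = tf.get_default_graph().get_tensor_by_name('total_cost:0')

    def _before_run(self, ctx):
        # Ask the hooked session to fetch it together with the training iteration.
        return tf.train.SessionRunArgs(fetches=self._cost)

    def _after_run(self, run_context, run_values):
        cost = run_values.results
        # Use the value here (e.g., accumulate or log it).
```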
tensorpack/train/tower.py

@@ -85,6 +85,10 @@ class TowerTrainer(Trainer):
         This method will build the trainer's tower function under ``TowerContext(is_training=False)``,
         and returns a callable predictor with input placeholders & output tensors in this tower.

+        This method handles the common case of inference with the same tower function.
+        If you want to do inference with a different tower function, you can always build the tower by yourself,
+        under a "reuse" variable scope and a `TowerContext(is_training=False)`.
+
         Args:
             input_names (list): list of input names, matching the inputs declared for the trainer.
             output_names(list): list of tensor names without the tower prefix.
tensorpack/utils/gpu.py

@@ -40,7 +40,7 @@ def get_num_gpu():
         built_with_cuda = tf.test.is_built_with_cuda()
         if not built_with_cuda and ret > 0:
-            logger.warn(message + "But TensorFlow was not built with CUDA support!")
+            logger.warn(message + "But TensorFlow was not built with CUDA support and could not use GPUs!")
         return ret

     env = os.environ.get('CUDA_VISIBLE_DEVICES', None)