Project: seminar-breakout (Shashank Suhas)

Commit 65d0f0b9, authored Aug 24, 2017 by Yuxin Wu (parent: 81f4b575)

    use fused_batch_norm when use_local_stat=False&is_training
Showing 2 changed files, with 29 additions and 16 deletions (+29 / -16):

    tensorpack/models/batch_norm.py    +28 / -15
    tensorpack/tfutils/common.py        +1 /  -1
tensorpack/models/batch_norm.py:

@@ -94,12 +94,17 @@ def BatchNorm(x, use_local_stat=None, decay=0.9, epsilon=1e-5,
     * ``variance/EMA``: the moving average of variance.
 
     Note:
-        In multi-GPU training, moving averages across GPUs are not aggregated.
-        This is consistent with most frameworks.
-        However, all GPUs use the moving averages on the first GPU (instead of
-        their own), this is inconsistent with most frameworks (but consistent
-        with the official inceptionv3 example).
-
+        1. About multi-GPU training: moving averages across GPUs are not aggregated.
+           Batch statistics are computed indepdently. This is consistent with most frameworks.
+        2. Combinations of ``use_local_stat`` and ``ctx.is_training``:
+            * ``use_local_stat == is_training``: standard BN, EMA are
+              maintained during training and used during inference.
+            * ``use_local_stat and not is_training``: still use local (batch)
+              statistics in inference.
+            * ``not use_local_stat and is_training``: use EMA to normalize in
+              training. This is useful when you load a pre-trained BN and
+              don't want to fine tune the EMA. EMA will not be updated in
+              this case.
     """
     shape = x.get_shape().as_list()
     ndims = len(shape)
@@ -131,16 +136,24 @@ def BatchNorm(x, use_local_stat=None, decay=0.9, epsilon=1e-5,
             xn = tf.squeeze(xn, [1, 2])
     else:
         if ctx.is_training:
-            logger.warn("[BatchNorm] Using moving_mean/moving_variance in training.")
-        # non-fused op is faster for inference
-        if ndims == 4 and data_format == 'NCHW':
-            [g, b, mm, mv] = [reshape_for_bn(_, ndims, n_out, data_format)
-                              for _ in [gamma, beta, moving_mean, moving_var]]
-            xn = tf.nn.batch_normalization(x, mm, mv, b, g, epsilon)
-        else:
-            # avoid the reshape if possible (when channel is the last dimension)
-            xn = tf.nn.batch_normalization(
-                x, moving_mean, moving_var, beta, gamma, epsilon)
+            if ctx.index == 0:  # only warn in first tower
+                logger.warn("[BatchNorm] Using moving_mean/moving_variance in training.")
+            # Using moving_mean/moving_variance in training, which means we
+            # loaded a pre-trained BN and only fine-tuning the affine part.
+            xn, _, _ = tf.nn.fused_batch_norm(
+                x, gamma, beta,
+                mean=moving_mean, variance=moving_var, epsilon=epsilon,
+                data_format=data_format, is_training=False)
+        else:
+            # non-fused op is faster for inference  # TODO test if this is still true
+            if ndims == 4 and data_format == 'NCHW':
+                [g, b, mm, mv] = [reshape_for_bn(_, ndims, n_out, data_format)
+                                  for _ in [gamma, beta, moving_mean, moving_var]]
+                xn = tf.nn.batch_normalization(x, mm, mv, b, g, epsilon)
+            else:
+                # avoid the reshape if possible (when channel is the last dimension)
+                xn = tf.nn.batch_normalization(
+                    x, moving_mean, moving_var, beta, gamma, epsilon)
 
     # maintain EMA only on one GPU is OK, even in replicated mode.
     # because training time doesn't use EMA
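For reference, a minimal sketch of what the new branch computes, assuming TensorFlow 1.x graph mode; the shapes and variable names below are illustrative placeholders, not tensorpack's. When use_local_stat is False during training, the input is normalized with the stored moving statistics through tf.nn.fused_batch_norm(..., is_training=False), which should match the non-fused tf.nn.batch_normalization call that the removed branch used:

    # Sketch only: TF 1.x graph mode; shapes and variable names are illustrative.
    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [4, 32, 32, 64])   # NHWC input
    gamma = tf.get_variable('gamma', [64], initializer=tf.ones_initializer())
    beta = tf.get_variable('beta', [64], initializer=tf.zeros_initializer())
    moving_mean = tf.get_variable('moving_mean', [64], trainable=False,
                                  initializer=tf.zeros_initializer())
    moving_var = tf.get_variable('moving_var', [64], trainable=False,
                                 initializer=tf.ones_initializer())

    # Non-fused call (what the removed branch used):
    xn_ref = tf.nn.batch_normalization(x, moving_mean, moving_var, beta, gamma, 1e-5)

    # Fused call (what the commit switches to when use_local_stat=False and is_training):
    xn_fused, _, _ = tf.nn.fused_batch_norm(
        x, gamma, beta,
        mean=moving_mean, variance=moving_var,
        epsilon=1e-5, data_format='NHWC', is_training=False)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ref, fused = sess.run(
            [xn_ref, xn_fused],
            feed_dict={x: np.random.rand(4, 32, 32, 64).astype('float32')})
        # Expected to be negligible (rounding plus fused_batch_norm's minimum epsilon).
        print(np.abs(ref - fused).max())

The switch to the fused kernel in the training-time path is presumably for speed; the diff deliberately keeps the non-fused op for the inference-only path, per its own comment ("non-fused op is faster for inference").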
tensorpack/tfutils/common.py:

@@ -44,7 +44,7 @@ def get_default_sess_config(mem_fraction=0.99):
     conf.gpu_options.allocator_type = 'BFC'
     conf.gpu_options.allow_growth = True
-    # Hurt performance in 8xP100 training
+    # May hurt performance
     # conf.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
     return conf
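The hunk shows only a few lines of get_default_sess_config. As a rough sketch of a session config with the options visible here, assuming TensorFlow 1.x and assuming (not shown in the hunk) that mem_fraction is wired to gpu_options.per_process_gpu_memory_fraction:

    # Sketch only: TF 1.x. The per_process_gpu_memory_fraction line is an assumption,
    # since the visible hunk does not show how mem_fraction is used.
    import tensorflow as tf

    def make_sess_config(mem_fraction=0.99):
        conf = tf.ConfigProto()
        conf.gpu_options.per_process_gpu_memory_fraction = mem_fraction
        conf.gpu_options.allocator_type = 'BFC'
        conf.gpu_options.allow_growth = True
        # May hurt performance:
        # conf.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
        return conf

    sess = tf.Session(config=make_sess_config())

allow_growth avoids grabbing all GPU memory up front, and the global XLA JIT line stays commented out because, per the updated comment, enabling it may hurt performance.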