Shashank Suhas / seminar-breakout · Commits

Commit a9950705 authored Jan 09, 2020 by Yuxin Wu

    fix bug when combining DataParallelInferenceRunner+BatchNorm (since cc2322bb)

Parent: d2f95645

Showing 3 changed files with 9 additions and 7 deletions:
  .github/ISSUE_TEMPLATE/unexpected-problems---bugs.md  (+1, -1)
  tensorpack/models/batch_norm.py  (+2, -1)
  tensorpack/train/trainers.py  (+6, -5)
.github/ISSUE_TEMPLATE/unexpected-problems---bugs.md

@@ -59,7 +59,7 @@ If this command failed, tell us your version of Python/TF/tensorpack.
 Note that:
-+ You can install Tensorpack master by `pip install -U git+https://github.com/tensorpack/tensorpack.git`
++ You can install tensorpack master by `pip install -U git+https://github.com/tensorpack/tensorpack.git` and see if your issue is already solved.
 + If you're not using tensorpack under a normal command line shell (e.g.,
   using an IDE or jupyter notebook), please retry under a normal command line shell.
tensorpack/models/batch_norm.py

@@ -195,7 +195,8 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
         ema_update = "collection"
     # Logic:
     # 1. EMA update is possible only when we compute batch statistics (training=True)
-    # 2. We know that in training, non-main training tower does not need EMA update
+    # 2. We know that in training, non-main training tower does not need EMA
+    #    update (unless you need, e.g., inference during training on all towers)
     #    We don't know about what to do in prediction context, so be conservative and do the update.
     # 3. User can explicit disable update by "skip".
     do_ema_update = training and \
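The revised comment spells out when the EMA update should run. As a rough illustration only, that decision could be sketched as a standalone predicate; `should_do_ema_update` is a hypothetical helper, not tensorpack's API, and the real condition (truncated above at `do_ema_update = training and \`) also consults the tower/prediction context:

```python
def should_do_ema_update(training, is_main_tower, ema_update):
    """Hypothetical helper mirroring the commented logic (simplified)."""
    if ema_update == "skip":
        return False          # 3. user explicitly disabled the update
    if not training:
        return False          # 1. no batch statistics are computed
    # 2. in training, only the main tower needs to maintain the EMA
    return is_main_tower

# the main training tower updates the EMA; a non-main tower does not
assert should_do_ema_update(True, True, "collection") is True
assert should_do_ema_update(True, False, "collection") is False
```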
tensorpack/train/trainers.py

@@ -157,10 +157,9 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
         are supposed to be in-sync).
         But this cheap operation may help prevent
         certain numerical issues in practice.
+        Note that in cases such as BatchNorm, the variables may not be in sync.
     """
 
-    BROADCAST_EVERY_EPOCH = False
-
     @map_arg(gpus=_int_to_range)
     def __init__(self, gpus, average=True, mode=None):
         """

@@ -180,6 +179,8 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
         mode = mode.lower()
 
         self._builder = SyncMultiGPUReplicatedBuilder(gpus, average, mode)
+        self.BROADCAST_EVERY_EPOCH = True
+
         super(SyncMultiGPUTrainerReplicated, self).__init__()
 
     def _setup_graph(self, input, get_cost_fn, get_opt_fn):

@@ -384,8 +385,8 @@ class HorovodTrainer(SingleCostTrainer):
         Whether to broadcast the variables every epoch.
         Theoretically this is a no-op (because the variables
         are supposed to be in-sync).
-        But this cheap operation may help prevent
-        certain numerical issues in practice.
+        But this cheap operation may help prevent certain numerical issues in practice.
+        Note that in cases such as BatchNorm, the variables may not be in sync.
     """
 
     def __init__(self, average=True, compression=None):

@@ -413,7 +414,7 @@ class HorovodTrainer(SingleCostTrainer):
         logger.info("[HorovodTrainer] local rank={}".format(self._local_rank))
         super(HorovodTrainer, self).__init__()
-        self.BROADCAST_EVERY_EPOCH = False
+        self.BROADCAST_EVERY_EPOCH = True
 
     def mpi_enabled(self):
         """
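The trainers.py change moves `BROADCAST_EVERY_EPOCH` from a class-level default of `False` to an instance attribute set to `True` in `__init__`. The distinction matters in Python: an instance attribute shadows the class default, and a user can still flip the flag on one trainer after construction without affecting other instances or the class. A minimal sketch of that pattern (class names here are illustrative, not tensorpack's real hierarchy):

```python
class Trainer:
    # class-level default, shared by every instance that does not override it
    BROADCAST_EVERY_EPOCH = False

class ReplicatedTrainer(Trainer):
    def __init__(self):
        # instance attribute shadows the class default, as the commit does
        self.BROADCAST_EVERY_EPOCH = True

trainer = ReplicatedTrainer()
assert trainer.BROADCAST_EVERY_EPOCH is True       # new per-instance default
trainer.BROADCAST_EVERY_EPOCH = False              # user opt-out, this instance only
assert Trainer.BROADCAST_EVERY_EPOCH is False      # class default unchanged
assert ReplicatedTrainer().BROADCAST_EVERY_EPOCH   # fresh instances still True
```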