Shashank Suhas / seminar-breakout / Commits

Commit a9950705, authored Jan 09, 2020 by Yuxin Wu
fix bug when combining DataParallelInferenceRunner+BatchNorm (since cc2322bb)

parent d2f95645
Showing 3 changed files with 9 additions and 7 deletions:
  .github/ISSUE_TEMPLATE/unexpected-problems---bugs.md   +1 -1
  tensorpack/models/batch_norm.py                        +2 -1
  tensorpack/train/trainers.py                           +6 -5
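For context, a minimal sketch (not part of this commit) of the kind of setup the fix targets: running DataParallelInferenceRunner over several GPUs while training a model that uses BatchNorm. MyModel, train_df and val_df are hypothetical placeholders, and ScalarStats('cost') is only an example inferencer.

# Hypothetical sketch of the DataParallelInferenceRunner + BatchNorm combination.
from tensorpack.callbacks import DataParallelInferenceRunner, ScalarStats
from tensorpack.train import SyncMultiGPUTrainerReplicated, TrainConfig, launch_train_with_config

config = TrainConfig(
    model=MyModel(),       # placeholder ModelDesc whose graph contains BatchNorm layers
    dataflow=train_df,     # placeholder training DataFlow
    callbacks=[
        # Builds inference towers on both GPUs; this relies on the BatchNorm
        # EMA variables staying in sync across towers.
        DataParallelInferenceRunner(val_df, [ScalarStats('cost')], [0, 1]),
    ],
)
launch_train_with_config(config, SyncMultiGPUTrainerReplicated([0, 1]))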
.github/ISSUE_TEMPLATE/unexpected-problems---bugs.md

@@ -59,7 +59,7 @@ If this command failed, tell us your version of Python/TF/tensorpack.
 Note that:
-+ You can install Tensorpack master by `pip install -U git+https://github.com/tensorpack/tensorpack.git`
++ You can install tensorpack master by `pip install -U git+https://github.com/tensorpack/tensorpack.git`
   and see if your issue is already solved.
 + If you're not using tensorpack under a normal command line shell (e.g.,
   using an IDE or jupyter notebook), please retry under a normal command line shell.
tensorpack/models/batch_norm.py

@@ -195,7 +195,8 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
         ema_update = "collection"
     # Logic:
     # 1. EMA update is possible only when we compute batch statistics (training=True)
-    # 2. We know that in training, non-main training tower does not need EMA update
+    # 2. We know that in training, non-main training tower does not need EMA
+    #    update (unless you need, e.g., inference during training on all towers)
     #    We don't know about what to do in prediction context, so be conservative and do the update.
     # 3. User can explicit disable update by "skip".
     do_ema_update = training and \
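The comment block in this hunk encodes the rule that decides whether a tower updates the EMA statistics. As a reading aid, here is a hedged paraphrase of that rule as a standalone function; the argument names is_main_training_tower and is_training_context are illustrative, not tensorpack's actual attribute names.

def should_do_ema_update(training, is_main_training_tower, is_training_context, ema_update):
    """Illustrative paraphrase of the decision described in the comments above."""
    # 1. EMA update is possible only when batch statistics are computed.
    if not training:
        return False
    # 3. The user can explicitly disable the update with ema_update="skip".
    if ema_update == "skip":
        return False
    # 2. In a training context only the main training tower needs the EMA update;
    #    in a prediction context, be conservative and do the update.
    return is_main_training_tower or not is_training_context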
tensorpack/train/trainers.py

@@ -157,10 +157,9 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
     are supposed to be in-sync).
     But this cheap operation may help prevent
     certain numerical issues in practice.
+
+    Note that in cases such as BatchNorm, the variables may not be in sync.
     """
-
-    BROADCAST_EVERY_EPOCH = False
-
     @map_arg(gpus=_int_to_range)
     def __init__(self, gpus, average=True, mode=None):
         """

@@ -180,6 +179,8 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
         mode = mode.lower()

         self._builder = SyncMultiGPUReplicatedBuilder(gpus, average, mode)
+        self.BROADCAST_EVERY_EPOCH = True
+
         super(SyncMultiGPUTrainerReplicated, self).__init__()

     def _setup_graph(self, input, get_cost_fn, get_opt_fn):
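With this change the per-epoch broadcast is turned on in __init__ (previously it was disabled by a class attribute). A minimal sketch, assuming a 2-GPU setup, of how a user could still opt out on the trainer instance, e.g. for benchmarking:

# Sketch only: after this commit the constructor sets BROADCAST_EVERY_EPOCH = True.
from tensorpack.train import SyncMultiGPUTrainerReplicated

trainer = SyncMultiGPUTrainerReplicated(2)   # an int is expanded to a GPU range by @map_arg
assert trainer.BROADCAST_EVERY_EPOCH is True
# Overriding the instance attribute disables the per-epoch broadcast again, at the
# risk of BatchNorm EMA variables drifting out of sync across towers (see docstring).
trainer.BROADCAST_EVERY_EPOCH = False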
@@ -384,8 +385,8 @@ class HorovodTrainer(SingleCostTrainer):
     Whether to broadcast the variables every epoch.
     Theoretically this is a no-op (because the variables
     are supposed to be in-sync).
     But this cheap operation may help prevent
-    certain numerical issues in practice.
+    certain numerical issues in practice. Note that in cases such as BatchNorm, the variables may not be in sync.
     """

     def __init__(self, average=True, compression=None):

@@ -413,7 +414,7 @@ class HorovodTrainer(SingleCostTrainer):
         logger.info("[HorovodTrainer] local rank={}".format(self._local_rank))
         super(HorovodTrainer, self).__init__()
-        self.BROADCAST_EVERY_EPOCH = False
+        self.BROADCAST_EVERY_EPOCH = True

     def mpi_enabled(self):
         """
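The same default flips for HorovodTrainer in the hunks above: its __init__ now sets the flag to True instead of False. A hedged sketch of the corresponding usage, assuming Horovod is installed and the script is launched with horovodrun or mpirun:

# Sketch only: HorovodTrainer also defaults to BROADCAST_EVERY_EPOCH = True after this commit.
# Typically launched as, e.g.:  horovodrun -np 4 python train.py
from tensorpack.train import HorovodTrainer

trainer = HorovodTrainer(average=True)
# Set in __init__ by this commit; it can still be overridden on the instance:
# trainer.BROADCAST_EVERY_EPOCH = False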