Shashank Suhas / seminar-breakout
Commit 2d661d6d authored Aug 18, 2020 by Yuxin Wu
update docs
parent 379e9a07
Showing 6 changed files with 10 additions and 9 deletions
.github/ISSUE_TEMPLATE/unexpected-problems---bugs.md (+2 -2)
examples/FasterRCNN/NOTES.md (+1 -1)
examples/FasterRCNN/train.py (+0 -1)
tensorpack/graph_builder/training.py (+1 -2)
tensorpack/models/batch_norm.py (+3 -0)
tensorpack/train/trainers.py (+3 -3)
.github/ISSUE_TEMPLATE/unexpected-problems---bugs.md
@@ -50,10 +50,10 @@ If you expect higher speed, please read
 http://tensorpack.readthedocs.io/tutorial/performance-tuning.html
 before posting.
-If you expect the model to converge / work better, note that we do not help you on how to train a new model.
+If you expect the model to converge / work better, note that we do not help you on how to improve a model.
 Only in one of the two conditions can we help with it:
 (1) You're unable to reproduce the results documented in tensorpack examples.
-(2) It appears to be a tensorpack bug.
+(2) It indicates a tensorpack bug.
 ### 4. Your environment:
examples/FasterRCNN/NOTES.md
@@ -48,7 +48,7 @@ This is a minimal implementation that simply contains these files:
 3. We currently only support single image per GPU in this example.
-4. Because of (3), BatchNorm statistics are supposed to be freezed during fine-tuning.
+4. Because of (3), BatchNorm statistics are supposed to be frozen during fine-tuning.
 5. An alternative to freezing BatchNorm is to sync BatchNorm statistics across
    GPUs (the `BACKBONE.NORM=SyncBN` option).
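The alternative described in note 5 can be illustrated with a small pure-Python sketch. This is not tensorpack code: `local_stats` and `synced_stats` are hypothetical helpers, the activation values are made up, and every replica is assumed to hold the same number of values. Each replica contributes its mean and its mean of squares; averaging those across replicas recovers the statistics of the pooled batch.

```python
from statistics import fmean


def local_stats(values):
    """Mean and variance computed from one replica's values only."""
    mean = fmean(values)
    mean_sq = fmean([v * v for v in values])
    return mean, mean_sq - mean * mean


def synced_stats(per_replica_values):
    """Average E[x] and E[x^2] over replicas, then derive the variance.

    Assumes every replica holds the same number of values.
    """
    means = [fmean(vals) for vals in per_replica_values]
    mean_sqs = [fmean([v * v for v in vals]) for vals in per_replica_values]
    mean = fmean(means)
    mean_sq = fmean(mean_sqs)
    return mean, mean_sq - mean * mean


# Made-up activations for one channel on two GPUs (one image each).
gpu0 = [0.0, 1.0, 2.0, 3.0]
gpu1 = [10.0, 11.0, 12.0, 13.0]

print("gpu0 only:", local_stats(gpu0))
print("gpu1 only:", local_stats(gpu1))
print("synced:   ", synced_stats([gpu0, gpu1]))
```

In this toy data each replica sees a narrow slice of the distribution, so its local variance is much smaller than the synced one. Noisy per-GPU statistics of this kind are a common reason to either freeze BatchNorm or synchronize it when each GPU holds a single image.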
examples/FasterRCNN/train.py
@@ -115,6 +115,5 @@ if __name__ == '__main__':
     if is_horovod:
         trainer = HorovodTrainer(average=False)
     else:
-        # nccl mode appears faster than cpu mode
         trainer = SyncMultiGPUTrainerReplicated(cfg.TRAIN.NUM_GPUS, average=False)
     launch_train_with_config(traincfg, trainer)
tensorpack/graph_builder/training.py
@@ -211,8 +211,7 @@ class SyncMultiGPUReplicatedBuilder(DataParallelBuilder):
         self._mode = mode
         if self._mode == 'hierarchical' and len(towers) != 8:
-            logger.warn("mode='hierarchical' require 8 GPUs. Fallback to mode='nccl'.")
-            self._mode = 'nccl'
+            raise ValueError("mode='hierarchical' require 8 GPUs.")
     def call_for_each_tower(self, tower_fn):
         """
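The behavioural change in this hunk is that an unsupported configuration now fails when the builder is constructed instead of silently switching to another mode. A minimal standalone sketch of that fail-fast pattern follows; `check_mode` is a hypothetical helper written for illustration and is not part of tensorpack.

```python
def check_mode(mode, num_towers):
    """Return `mode` unchanged, or raise if it cannot be honoured.

    Hypothetical helper: mirrors the condition in the hunk above, where
    'hierarchical' is only accepted with exactly 8 towers.
    """
    if mode == 'hierarchical' and num_towers != 8:
        raise ValueError("mode='hierarchical' requires 8 GPUs.")
    return mode


print(check_mode('nccl', 4))
print(check_mode('hierarchical', 8))
try:
    check_mode('hierarchical', 4)
except ValueError as err:
    print("rejected:", err)
```

Raising here means a caller who explicitly asked for `'hierarchical'` learns about the mismatch immediately, rather than training in a different mode than the one requested.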
tensorpack/models/batch_norm.py
@@ -75,6 +75,9 @@ def get_sync_bn_mean_var(inputs, red_axis, sync_statistics):
         assert TF_version >= (1, 10), \
             "Cross-GPU BatchNorm is only supported in TF>=1.10 ." \
             "Upgrade TF or apply this patch manually: https://github.com/tensorflow/tensorflow/pull/20360"
+        if TF_version >= (1, 15):
+            logger.warn("BatchNorm(sync_statistics='nccl') may produce incorrect results due "
+                        "to bug in TF>=1.15: https://github.com/tensorflow/tensorflow/issues/41539")
         if TF_version <= (1, 12):
             try:
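The version gates in this hunk rely on tuple comparison, which orders versions element by element. The sketch below shows the same idea in isolation. `version_tuple` and `check_sync_bn_support` are hypothetical names, the messages are paraphrased, and the parser assumes a plain numeric `major.minor` prefix; none of this is tensorpack's own code.

```python
import warnings


def version_tuple(version):
    """Turn a 'major.minor[.patch]' string into (major, minor)."""
    major, minor = version.split('.')[:2]
    return int(major), int(minor)


def check_sync_bn_support(version):
    """Hypothetical helper mirroring the two version gates in the hunk.

    Returns True when no known problem applies, False after warning.
    """
    tf_version = version_tuple(version)
    # Tuples compare element by element, so (1, 9) < (1, 10) < (2, 0).
    if tf_version < (1, 10):
        raise RuntimeError("Cross-GPU BatchNorm needs TF>=1.10")
    if tf_version >= (1, 15):
        warnings.warn("sync_statistics='nccl' may produce incorrect results in TF>=1.15")
        return False
    return True


print(check_sync_bn_support('1.14.0'))
print(check_sync_bn_support('2.3.0'))
```

Comparing the raw strings would give the wrong order (`'1.9' > '1.10'` is true for strings), which is why the version is converted to a tuple of integers first.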
tensorpack/train/trainers.py
@@ -168,10 +168,10 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
             gpus (int or [int]): list of GPU ids.
             average (bool): whether to average or sum gradients.
             mode (str or None): Gradient aggregation mode.
-                Supported values: ['nccl', 'hierarchical', 'cpu'].
+                Supported values: ['nccl', 'hierarchical', 'cpu', 'gpu'].
+                These modes may differ in speed.
                 Default to pick automatically by heuristics.
-                These modes may have slight (within 5%) differences in speed.
-                "hierarchical" mode was designed for DGX-like 8GPU machines.
+                "hierarchical" mode was designed for DGX-like 8-GPU machines.
         """
         self.devices = gpus
         if mode is not None:
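The `average` flag documented in this docstring decides how per-GPU gradients are combined. The toy sketch below uses plain Python lists in place of tensors; `aggregate` is a hypothetical function with made-up inputs, not the tensorpack implementation.

```python
def aggregate(per_gpu_grads, average=True):
    """Combine one gradient vector per GPU, element by element."""
    num_gpus = len(per_gpu_grads)
    summed = [sum(parts) for parts in zip(*per_gpu_grads)]
    if average:
        return [g / num_gpus for g in summed]
    return summed


grads = [[0.5, -1.0], [1.5, 3.0]]  # made-up gradients from two GPUs
print("average:", aggregate(grads, average=True))
print("sum:    ", aggregate(grads, average=False))
```

With `N` GPUs the summed gradient is `N` times the averaged one, so the two settings generally call for different learning rates.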