Commit 2d661d6d
authored Aug 18, 2020 by Yuxin Wu

update docs

parent 379e9a07

Showing 6 changed files with 10 additions and 9 deletions

.github/ISSUE_TEMPLATE/unexpected-problems---bugs.md   +2 / -2
examples/FasterRCNN/NOTES.md                           +1 / -1
examples/FasterRCNN/train.py                           +0 / -1
tensorpack/graph_builder/training.py                   +1 / -2
tensorpack/models/batch_norm.py                        +3 / -0
tensorpack/train/trainers.py                           +3 / -3

.github/ISSUE_TEMPLATE/unexpected-problems---bugs.md

@@ -50,10 +50,10 @@ If you expect higher speed, please read
 http://tensorpack.readthedocs.io/tutorial/performance-tuning.html
 before posting.

-If you expect the model to converge / work better, note that we do not help you on how to train a new model.
+If you expect the model to converge / work better, note that we do not help you on how to improve a model.
 Only in one of the two conditions can we help with it:
 (1) You're unable to reproduce the results documented in tensorpack examples.
-(2) It appears to be a tensorpack bug.
+(2) It indicates a tensorpack bug.

 ### 4. Your environment:

examples/FasterRCNN/NOTES.md

@@ -48,7 +48,7 @@ This is a minimal implementation that simply contains these files:
 3. We currently only support single image per GPU in this example.
-4. Because of (3), BatchNorm statistics are supposed to be freezed during fine-tuning.
+4. Because of (3), BatchNorm statistics are supposed to be frozen during fine-tuning.
 5. An alternative to freezing BatchNorm is to sync BatchNorm statistics across GPUs (the `BACKBONE.NORM=SyncBN` option).
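
The `BACKBONE.NORM=SyncBN` option mentioned in note 5 is a config key of this example. Below is a minimal sketch, not part of this commit, of flipping that key from Python; it assumes the example's attribute-style config module (the same `cfg` object that train.py reads `cfg.TRAIN.NUM_GPUS` from).

    # Hedged sketch: replace the frozen-BatchNorm behaviour described in note 4
    # with cross-GPU SyncBN. Assumes examples/FasterRCNN/config.py exposes the
    # attribute-style `cfg` object used throughout the example.
    from config import config as cfg

    cfg.BACKBONE.NORM = 'SyncBN'   # sync BN statistics across GPUs instead of freezing them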

examples/FasterRCNN/train.py

@@ -115,6 +115,5 @@ if __name__ == '__main__':
     if is_horovod:
         trainer = HorovodTrainer(average=False)
     else:
-        # nccl mode appears faster than cpu mode
         trainer = SyncMultiGPUTrainerReplicated(cfg.TRAIN.NUM_GPUS, average=False)
     launch_train_with_config(traincfg, trainer)
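
For context, a hedged sketch of the launch code around this hunk; the model, dataflow, and the `use_horovod` flag are hypothetical placeholders, while `TrainConfig`, the two trainers, and `launch_train_with_config` are the tensorpack APIs that appear in the diff.

    # Hedged sketch of how the trainer selection above is driven. MyDetectionModel,
    # my_dataflow and use_horovod are placeholders, not names from this commit.
    from tensorpack import TrainConfig, launch_train_with_config
    from tensorpack.train import HorovodTrainer, SyncMultiGPUTrainerReplicated

    traincfg = TrainConfig(model=MyDetectionModel(),   # hypothetical ModelDesc subclass
                           dataflow=my_dataflow,       # hypothetical DataFlow
                           max_epoch=50)

    if use_horovod:                                    # e.g. when launched with horovodrun
        trainer = HorovodTrainer(average=False)        # sum, not average, gradients across workers
    else:
        trainer = SyncMultiGPUTrainerReplicated(8, average=False)
    launch_train_with_config(traincfg, trainer)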

tensorpack/graph_builder/training.py

@@ -211,8 +211,7 @@ class SyncMultiGPUReplicatedBuilder(DataParallelBuilder):
         self._mode = mode

         if self._mode == 'hierarchical' and len(towers) != 8:
-            logger.warn("mode='hierarchical' require 8 GPUs. Fallback to mode='nccl'.")
-            self._mode = 'nccl'
+            raise ValueError("mode='hierarchical' require 8 GPUs.")

     def call_for_each_tower(self, tower_fn):
         """
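
Assuming, as the context above suggests, that this check runs in the builder's constructor, a hedged illustration of the new behavior: requesting 'hierarchical' aggregation with a tower count other than 8 now fails loudly instead of silently falling back to 'nccl'.

    # Hedged sketch (not from the commit): construct the builder with four towers
    # and 'hierarchical' mode; the commit makes this raise instead of warning.
    from tensorpack.graph_builder.training import SyncMultiGPUReplicatedBuilder

    try:
        SyncMultiGPUReplicatedBuilder([0, 1, 2, 3], average=True, mode='hierarchical')
    except ValueError as err:
        print(err)   # mode='hierarchical' require 8 GPUs.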

tensorpack/models/batch_norm.py

@@ -75,6 +75,9 @@ def get_sync_bn_mean_var(inputs, red_axis, sync_statistics):
         assert TF_version >= (1, 10), \
             "Cross-GPU BatchNorm is only supported in TF>=1.10 ." \
             "Upgrade TF or apply this patch manually: https://github.com/tensorflow/tensorflow/pull/20360"

+        if TF_version >= (1, 15):
+            logger.warn("BatchNorm(sync_statistics='nccl') may produce incorrect results due "
+                        "to bug in TF>=1.15: https://github.com/tensorflow/tensorflow/issues/41539")
         if TF_version <= (1, 12):
             try:
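
The guarded code path is reached through the `sync_statistics` argument of tensorpack's `BatchNorm` layer. A hedged usage sketch, with illustrative names, of where that argument would appear in a model:

    # Hedged sketch: inside a tensorpack model (i.e. under a TowerContext, as in
    # ModelDesc.build_graph), cross-GPU BatchNorm is requested via sync_statistics;
    # that is the path on which the new TF>=1.15 warning is emitted.
    from tensorpack.models import BatchNorm

    def norm_block(x):
        # 'nccl' syncs BN mean/var across GPUs; 'horovod' is the Horovod variant
        return BatchNorm('bn', x, sync_statistics='nccl')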

tensorpack/train/trainers.py

@@ -168,10 +168,10 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
             gpus (int or [int]): list of GPU ids.
             average (bool): whether to average or sum gradients.
             mode (str or None): Gradient aggregation mode.
-                Supported values: ['nccl', 'hierarchical', 'cpu'].
-                These modes may differ in speed.
+                Supported values: ['nccl', 'hierarchical', 'cpu', 'gpu'].
                 Default to pick automatically by heuristics.
-                "hierarchical" mode was designed for DGX-like 8GPU machines.
+                These modes may have slight (within 5%) differences in speed.
+                "hierarchical" mode was designed for DGX-like 8-GPU machines.
         """
         self.devices = gpus
         if mode is not None:
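
A hedged example of passing the documented `mode` explicitly (GPU count and value are illustrative); leaving `mode=None` keeps the automatic heuristic described above.

    # Hedged sketch: choose the gradient aggregation mode by hand. 'hierarchical'
    # is meant for DGX-like 8-GPU machines; 'nccl' is the usual explicit choice.
    from tensorpack.train import SyncMultiGPUTrainerReplicated

    trainer = SyncMultiGPUTrainerReplicated(8, average=True, mode='nccl')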