Commit dd138d5a, authored Jul 05, 2019 by Yuxin Wu
silent autograph warnings; update docs
Parent: 23239bd7
Showing 5 changed files with 21 additions and 12 deletions:
.travis.yml (+2, -2)
examples/FasterRCNN/NOTES.md (+5, -3)
examples/FasterRCNN/config.py (+0, -4)
examples/FasterRCNN/data.py (+4, -3)
tensorpack/models/batch_norm.py (+10, -0)
.travis.yml
@@ -26,10 +26,10 @@ matrix:
       env: TF_VERSION=1.3.0 TF_TYPE=release
     - os: linux
       python: 2.7
-      env: TF_VERSION=1.12.0 TF_TYPE=release
+      env: TF_VERSION=1.14.0 TF_TYPE=release
     - os: linux
       python: 3.6
-      env: TF_VERSION=1.12.0 TF_TYPE=release PYPI=true
+      env: TF_VERSION=1.14.0 TF_TYPE=release PYPI=true
     - os: linux
       python: 2.7
       env: TF_TYPE=nightly
examples/FasterRCNN/NOTES.md
@@ -66,15 +66,17 @@ Efficiency:

 1. After warmup, the training speed will slowly decrease due to more accurate proposals.

-1. The code should have around 80~90% GPU utilization on V100s, and 85%~90% scaling
-   efficiency from 1 V100 to 8 V100s.
+1. The code should have around 85~90% GPU utilization on one V100.
+   Scalability isn't very meaningful since the amount of computation each GPU perform is data-dependent.
+   If all images have the same spatial size (in which case the per-GPU computation is *still different*),
+   then a 85%~90% scaling efficiency is observed when using 8 V100s and `HorovodTrainer`.

 1. This implementation does not use specialized CUDA ops (e.g. AffineChannel, ROIAlign).
    Therefore it might be slower than other highly-optimized implementations.

 1. To reduce RAM usage on host: (1) make sure you're using the "spawn" method as
    set in `train.py`; (2) reduce `buffer_size` or `NUM_WORKERS` in `data.py`
-   (which may negatively impact your throughput). The training needs <10G RAM if `NUM_WORKERS=0`.
+   (which may negatively impact your throughput). The training only needs <10G RAM if `NUM_WORKERS=0`.

 1. Inference is unoptimized. Tensorpack is a training interface, therefore it
    does not help you on optimized inference. In fact, the current implementation
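The RAM-reduction advice in that note relies on Python's multiprocessing start method and the dataflow worker count. A minimal sketch of the generic mechanism, assuming `NUM_WORKERS` and `buffer_size` are the knobs the note refers to (the actual code in `train.py`/`data.py` is not shown in this diff):

# Sketch only, not the project's code: with the "spawn" start method each
# worker process starts fresh instead of fork-copying the parent's memory,
# which is what keeps host RAM low.
import multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("spawn")   # what "using the 'spawn' method" refers to
    NUM_WORKERS = 0                # assumption: fewer/zero workers trades throughput for <10G RAM
    # ... build the dataflow with NUM_WORKERS workers and a smaller buffer_size ...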
examples/FasterRCNN/config.py
@@ -257,10 +257,6 @@ def finalize_configs(is_training):
     if _C.TRAINER == 'horovod':
         import horovod.tensorflow as hvd
         ngpu = hvd.size()
-        if ngpu == hvd.local_size():
-            logger.warn(
-                "It's not recommended to use horovod for single-machine training. "
-                "Replicated trainer is more stable and has the same efficiency.")
     else:
         assert 'OMPI_COMM_WORLD_SIZE' not in os.environ
         ngpu = get_num_gpu()
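For context on the removed check: in Horovod, `hvd.size()` is the total number of processes across all machines, while `hvd.local_size()` counts only the processes on the current machine, so equality means a single-machine run. A minimal sketch of that check in isolation (not the project's code):

# Sketch only: the single-machine detection this commit removes from config.py.
import horovod.tensorflow as hvd

hvd.init()
if hvd.size() == hvd.local_size():
    # every worker lives on this machine -> single-machine horovod run,
    # which the removed warning said a replicated trainer handles equally well
    print("single-machine horovod run")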
examples/FasterRCNN/data.py
@@ -121,9 +121,10 @@ class TrainingDataPreprocessor:
     def __init__(self, cfg):
         self.cfg = cfg
-        self.aug = imgaug.AugmentorList(
-            [CustomResize(cfg.PREPROC.TRAIN_SHORT_EDGE_SIZE, cfg.PREPROC.MAX_SIZE),
-             imgaug.Flip(horiz=True)])
+        self.aug = imgaug.AugmentorList([
+            CustomResize(cfg.PREPROC.TRAIN_SHORT_EDGE_SIZE, cfg.PREPROC.MAX_SIZE),
+            imgaug.Flip(horiz=True)
+        ])

     def __call__(self, roidb):
         fname, boxes, klass, is_crowd = roidb["file_name"], roidb["boxes"], roidb["class"], roidb["is_crowd"]
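The change above is purely cosmetic reformatting; the preprocessor still composes a short-edge resize and a horizontal flip into one pipeline. A hedged usage sketch, assuming the tensorpack `imgaug` API of that era where an `AugmentorList` is applied to an image via `augment()` (treat the method name and `Resize` stand-in as assumptions; `CustomResize` lives in the example's own code):

# Sketch under assumptions: applying an equivalent two-step pipeline to one image.
import numpy as np
from tensorpack.dataflow import imgaug

aug = imgaug.AugmentorList([
    imgaug.Resize((600, 600)),   # stand-in for CustomResize
    imgaug.Flip(horiz=True),
])
img = np.zeros((480, 640, 3), dtype=np.uint8)
out = aug.augment(img)           # assumed entry point; boxes need the matching coordinate transform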
tensorpack/models/batch_norm.py
@@ -56,6 +56,15 @@ def internal_update_bn_ema(xn, batch_mean, batch_var,
     return tf.identity(xn, name='output')


+try:
+    # When BN is used as an activation, keras layers try to autograph.convert it
+    # This leads to massive warnings so we disable it.
+    from tensorflow.python.autograph.impl.api import do_not_convert as disable_autograph
+except ImportError:
+    def disable_autograph():
+        return lambda x: x
+
+
 @layer_register()
 @convert_to_tflayer_args(
     args_names=[],
@@ -66,6 +75,7 @@ def internal_update_bn_ema(xn, batch_mean, batch_var,
         'decay': 'momentum',
         'use_local_stat': 'training'
     })
+@disable_autograph()
 def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
               center=True, scale=True,
               beta_initializer=tf.zeros_initializer(),
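This is the core of the commit: `BatchNorm` is wrapped with `disable_autograph()` so that Keras-driven `autograph.convert` no longer floods the log with warnings, and the `try/except` degrades to an identity decorator when the TF-internal `do_not_convert` symbol is missing. A minimal sketch of that fallback behaviour (the decorated function name is hypothetical, for illustration only):

# Sketch of the fallback path: when the import fails, disable_autograph()
# returns an identity decorator, so the wrapped function is left untouched.
def disable_autograph():
    return lambda x: x

@disable_autograph()
def bn_like_layer(x):          # hypothetical stand-in for BatchNorm
    return x * 2

assert bn_like_layer(3) == 6   # behaves exactly as if it were undecorated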