Commit f63e0ee4 authored Sep 01, 2018 by Yuxin Wu
update docs; Mask R-CNN horovod mode eval only on master machine
parent 7b8728f9
Showing 10 changed files with 61 additions and 45 deletions

.github/ISSUE_TEMPLATE.md +9 -3
.github/ISSUE_TEMPLATE/feature-requests.md +2 -1
CHANGES.md +1 -1
examples/FasterRCNN/NOTES.md +2 -3
examples/FasterRCNN/README.md +2 -2
examples/FasterRCNN/config.py +4 -0
examples/FasterRCNN/train.py +29 -25
tensorpack/dataflow/parallel.py +8 -6
tensorpack/libinfo.py +3 -3
tensorpack/utils/serialize.py +1 -1

.github/ISSUE_TEMPLATE.md
+## DO NOT post an issue if you're seeing this. You're at the wrong place.
+To post an issue, please:
+1. Click the "New Issue" button
+2. __Choose your category__!
+3. __Read instructions there__!
 An issue has to be one of the following:
 - Unexpected Problems / Potential Bugs
 - Feature Requests
 - Questions on Using/Understanding Tensorpack
-To post an issue, please click "New Issue", choose your category, and read
-instructions there.

.github/ISSUE_TEMPLATE/feature-requests.md
@@ -7,8 +7,9 @@ about: Suggest an idea for Tensorpack
 + Note that you can implement a lot of features by extending Tensorpack
   (See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
   It does not have to be added to Tensorpack unless you have a good reason.
 + "Could you improve/implement an example/paper ?"
   -- The answer is: we have no plans to do so. We don't consider feature
   requests for examples or implement a paper for you, unless it demonstrates
   some Tensorpack features not yet demonstrated in the existing examples.
-  If you don't know how to do it, you may ask a usage question.
+  If you don't know how to do something yourself, you may ask a usage question.

CHANGES.md
@@ -11,7 +11,7 @@ TensorFlow itself also changes API and those are not listed here.
 + [2018/08/27] msgpack is used again for "serialization to disk", because pyarrow
   has no compatibility between versions. To use pyarrow instead, `export TENSORPACK_COMPATIBLE_SERIALIZE=pyarrow`.
-+ [2018/04/05] msgpack is replaced by pyarrow in favor of its speed. If you want old behavior, `export TENSORPACK_SERIALIZE=msgpack`.
++ [2018/04/05] msgpack is replaced by pyarrow in favor of its speed. If you want old behavior, `export TENSORPACK_SERIALIZE=msgpack`. It's later found that pyarrow is unstable and may lead to crash.
 + [2018/03/20] `ModelDesc` starts to use simplified interfaces:
   + `_get_inputs()` renamed to `inputs()` and returns `tf.placeholder`s.
   + `build_graph(self, tensor1, tensor2)` returns the cost tensor directly.

examples/FasterRCNN/NOTES.md
@@ -46,11 +46,10 @@ Model:
 Speed:

-1. The training will start very slowly due to convolution warmup, until about
+1. If cudnn warmup is on, the training will start very slowly, until about
    10k steps (or more if scale augmentation is used) to reach a maximum speed.
    As a result, the ETA is also inaccurate at the beginning.
-   You can disable warmup by `export TF_CUDNN_USE_AUTOTUNE=0`, which makes the
-   training faster at the beginning, but perhaps not in the end.
+   Warmup is by default on when no scale augmentation is used.

 1. After warmup, the training speed will slowly decrease due to more accurate proposals.

examples/FasterRCNN/README.md
-# Faster-RCNN / Mask-RCNN on COCO
+# Faster R-CNN / Mask R-CNN on COCO
 This example provides a minimal (2k lines) and faithful implementation of the following papers:

 + [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
@@ -73,7 +73,7 @@ prediction will need to be run with the corresponding training configs.
 These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
 Performance in [Detectron](https://github.com/facebookresearch/Detectron/) can be roughly reproduced.
-MaskRCNN results contain both box and mask mAP.
+Mask R-CNN results contain both box and mask mAP.

 | Backbone | mAP<br/>(box;mask) | Detectron mAP <sup>[1](#ft1)</sup><br/>(box;mask) | Time on 8 V100s | Configurations <br/> (click to expand) |
 | - | - | - | - | - |

examples/FasterRCNN/config.py
@@ -215,6 +215,10 @@ def finalize_configs(is_training):
         assert len(_C.CASCADE.BBOX_REG_WEIGHTS) == num_cascade

     if is_training:
+        train_scales = _C.PREPROC.TRAIN_SHORT_EDGE_SIZE
+        if train_scales[1] - train_scales[0] > 100:
+            # don't warmup if augmentation is on
+            os.environ['TF_CUDNN_USE_AUTOTUNE'] = '0'
         os.environ['TF_AUTOTUNE_THRESHOLD'] = '1'
         assert _C.TRAINER in ['horovod', 'replicated'], _C.TRAINER

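The warmup rule added to `finalize_configs` can be read as a small predicate on the training scale range. A minimal sketch, assuming `TRAIN_SHORT_EDGE_SIZE` is a `(min, max)` pair as the indexing in the hunk suggests; the helper name `autotune_env_for` and the returned dict are illustrative and not part of the commit:

```python
def autotune_env_for(train_scales, max_range=100):
    """Return the environment overrides the hunk would apply (hypothetical helper).

    train_scales: (min_short_edge, max_short_edge), assumed to be a 2-element pair.
    max_range: scale range above which cuDNN autotune is switched off.
    """
    env = {}
    if train_scales[1] - train_scales[0] > max_range:
        # Scale augmentation is considered "on": turn cuDNN autotune (warmup) off.
        env['TF_CUDNN_USE_AUTOTUNE'] = '0'
    # Set unconditionally in the hunk, regardless of the scale range.
    env['TF_AUTOTUNE_THRESHOLD'] = '1'
    return env


if __name__ == '__main__':
    print(autotune_env_for((800, 800)))   # fixed scale: autotune left untouched
    print(autotune_env_for((640, 800)))   # range of 160: autotune disabled
```

Note that the comparison is strict, so a range of exactly 100 leaves autotune untouched.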
examples/FasterRCNN/train.py
@@ -24,7 +24,6 @@ from tensorpack import *
 from tensorpack.tfutils.summary import add_moving_summary
 from tensorpack.tfutils import optimizer
 from tensorpack.tfutils.common import get_tf_version_tuple
-from tensorpack.utils.serialize import loads, dumps
 import tensorpack.utils.viz as tpviz

 from coco import COCODetection
@@ -417,16 +416,14 @@ class EvalCallback(Callback):
             self.dataflows = [get_eval_dataflow(shard=k, num_shards=self.num_predictor)
                               for k in range(self.num_predictor)]
         else:
-            if hvd.size() > hvd.local_size():
-                logger.warn("Distributed evaluation with horovod is unstable. Sometimes MPI hangs for unknown reasons.")
+            # Only eval on the first machine.
+            # Alternatively, can eval on all ranks and use allgather, but allgather sometimes hangs
+            self._horovod_run_eval = hvd.rank() == hvd.local_rank()
+            if self._horovod_run_eval:
                 self.predictor = self._build_coco_predictor(0)
-            self.dataflow = get_eval_dataflow(shard=hvd.rank(), num_shards=hvd.size())
+                self.dataflow = get_eval_dataflow(shard=hvd.local_rank(), num_shards=hvd.local_size())
-            # use uint8 to aggregate strings
-            self.local_result_tensor = tf.placeholder(tf.uint8, shape=[None], name='local_result_string')
-            self.concat_results = hvd.allgather(self.local_result_tensor, name='concat_results')
-            local_size = tf.expand_dims(tf.size(self.local_result_tensor), 0)
-            self.string_lens = hvd.allgather(local_size, name='concat_sizes')
+            self.barrier = hvd.allreduce(tf.random_normal(shape=[1]))

     def _build_coco_predictor(self, idx):
         graph_func = self.trainer.get_predictor(self._in_names, self._out_names, device=idx)
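The condition `hvd.rank() == hvd.local_rank()` picks out the processes on the first machine only if global ranks are laid out contiguously, machine by machine. A minimal sketch of that reasoning under this assumption, with plain integers standing in for the Horovod calls; the function names are illustrative:

```python
def runs_eval(rank, local_rank):
    """Mirror of the predicate in the hunk: True when global rank equals local rank."""
    return rank == local_rank


def simulate(num_machines, gpus_per_machine):
    """List the global ranks that would run evaluation.

    Assumes rank = machine_index * gpus_per_machine + local_rank,
    i.e. ranks are assigned contiguously machine by machine.
    """
    selected = []
    for machine in range(num_machines):
        for local_rank in range(gpus_per_machine):
            rank = machine * gpus_per_machine + local_rank
            if runs_eval(rank, local_rank):
                selected.append(rank)
    return selected


if __name__ == '__main__':
    # Two machines with four GPUs each: only the first machine's ranks qualify.
    print(simulate(num_machines=2, gpus_per_machine=4))
```

Under this layout the evaluation shards are indexed by `local_rank` out of `local_size`, which matches the new `get_eval_dataflow` call.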
@@ -443,6 +440,7 @@ class EvalCallback(Callback):
         logger.info("[EvalCallback] Will evaluate every {} epochs".format(interval))

     def _eval(self):
+        logdir = args.logdir
         if cfg.TRAINER == 'replicated':
             with ThreadPoolExecutor(max_workers=self.num_predictor, thread_name_prefix='EvalWorker') as executor, \
                     tqdm.tqdm(total=sum([df.size() for df in self.dataflows])) as pbar:
@@ -451,23 +449,26 @@ class EvalCallback(Callback):
                     futures.append(executor.submit(eval_coco, dataflow, pred, pbar))
                 all_results = list(itertools.chain(*[fut.result() for fut in futures]))
         else:
+            if self._horovod_run_eval:
                 local_results = eval_coco(self.dataflow, self.predictor)
-            results_as_arr = np.frombuffer(dumps(local_results), dtype=np.uint8)
-            sizes, concat_arrs = tf.get_default_session().run(
-                [self.string_lens, self.concat_results],
-                feed_dict={self.local_result_tensor: results_as_arr})
+                output_partial = os.path.join(
+                    logdir, 'outputs{}-part{}.json'.format(self.global_step, hvd.local_rank()))
+                with open(output_partial, 'w') as f:
+                    json.dump(local_results, f)
+            self.barrier.eval()
             if hvd.rank() > 0:
                 return
             all_results = []
-            start = 0
-            for size in sizes:
-                substr = concat_arrs[start: start + size]
-                results = loads(substr.tobytes())
-                all_results.extend(results)
-                start = start + size
+            for k in range(hvd.local_size()):
+                output_partial = os.path.join(
+                    logdir, 'outputs{}-part{}.json'.format(self.global_step, k))
+                with open(output_partial, 'r') as f:
+                    obj = json.load(f)
+                all_results.extend(obj)
+                os.unlink(output_partial)

         output_file = os.path.join(
-            logger.get_logger_dir(), 'outputs{}.json'.format(self.global_step))
+            logdir, 'outputs{}.json'.format(self.global_step))
         with open(output_file, 'w') as f:
             json.dump(all_results, f)
         try:
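The new `_eval` path replaces the `allgather` of serialized bytes with per-rank JSON files that rank 0 merges after a barrier. A minimal sketch of just the file handling, without Horovod or TensorFlow; the function names and the sample records are illustrative, and it assumes every evaluating process sees the same `logdir`, which is plausible here because evaluation is confined to one machine:

```python
import json
import os
import tempfile


def write_partial(logdir, step, local_rank, results):
    """Each evaluating process dumps its own results to a per-rank JSON file."""
    path = os.path.join(logdir, 'outputs{}-part{}.json'.format(step, local_rank))
    with open(path, 'w') as f:
        json.dump(results, f)
    return path


def merge_partials(logdir, step, local_size):
    """Read every part file, concatenate the results, and delete the parts.

    Must only run once all writers have finished (the role of the barrier).
    """
    all_results = []
    for k in range(local_size):
        path = os.path.join(logdir, 'outputs{}-part{}.json'.format(step, k))
        with open(path, 'r') as f:
            all_results.extend(json.load(f))
        os.unlink(path)
    return all_results


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as logdir:
        # Two simulated local ranks, each with a made-up detection record.
        write_partial(logdir, 100, 0, [{'image_id': 1, 'score': 0.9}])
        write_partial(logdir, 100, 1, [{'image_id': 2, 'score': 0.8}])
        print(merge_partials(logdir, 100, local_size=2))
```

In the commit the synchronization point is `self.barrier.eval()`, an `allreduce` on a dummy tensor; the sketch simply runs the writers before the merge.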
@@ -572,6 +573,9 @@ if __name__ == '__main__':
         if not is_horovod:
             callbacks.append(GPUUtilizationTracker())

+        if is_horovod and hvd.rank() > 0:
+            session_init = None
+        else:
             if args.load:
                 session_init = get_model_loader(args.load)
             else:

tensorpack/dataflow/parallel.py
@@ -447,9 +447,11 @@ class PlasmaGetData(ProxyDataFlow):
             yield dp


-try:
-    import pyarrow.plasma as plasma
-except ImportError:
-    from ..utils.develop import create_dummy_class
-    PlasmaPutData = create_dummy_class('PlasmaPutData', 'pyarrow')   # noqa
-    PlasmaGetData = create_dummy_class('PlasmaGetData', 'pyarrow')   # noqa
+plasma = None
+# These plasma code is only experimental
+# try:
+#     import pyarrow.plasma as plasma
+# except ImportError:
+#     from ..utils.develop import create_dummy_class
+#     PlasmaPutData = create_dummy_class('PlasmaPutData', 'pyarrow')   # noqa
+#     PlasmaGetData = create_dummy_class('PlasmaGetData', 'pyarrow')   # noqa

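The fallback that this hunk comments out follows a common optional-dependency pattern: if the import fails, the public class names are rebound to placeholders that raise only when someone tries to use them. A minimal sketch of that pattern; `make_dummy_class` is a hypothetical stand-in for `create_dummy_class`, whose real implementation is not shown in this diff, and the module name is a placeholder chosen so that the import fails:

```python
def make_dummy_class(name, dependency):
    """Build a placeholder class that fails only when it is instantiated.

    Hypothetical stand-in for the helper used in the hunk; the real one may differ.
    """
    def __init__(self, *args, **kwargs):
        raise ImportError(
            "Cannot use '{}' because the dependency '{}' is not installed.".format(
                name, dependency))
    return type(name, (object,), {'__init__': __init__})


try:
    import a_module_that_is_assumed_missing as backend  # placeholder module name
except ImportError:
    backend = None
    Reader = make_dummy_class('Reader', 'a_module_that_is_assumed_missing')


if __name__ == '__main__':
    try:
        Reader()
    except ImportError as e:
        # Importing this file succeeded; the error surfaces only on use.
        print(e)
```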
tensorpack/libinfo.py
@@ -37,11 +37,11 @@ os.environ['TF_SYNC_ON_FINISH'] = '0' # will become default
 os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'
 os.environ['TF_GPU_THREAD_COUNT'] = '2'

-# Available in TF1.6+. Haven't seen different performance on R50.
-# NOTE TF set it to 0 by default, because:
+# Available in TF1.6+ & cudnn7. Haven't seen different performance on R50.
+# NOTE we disable it because:
 # this mode may use scaled atomic integer reduction that may cause a numerical
 # overflow for certain input data range.
-# os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1'
+os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '0'

 try:
     import tensorflow as tf  # noqa

tensorpack/utils/serialize.py
@@ -64,7 +64,7 @@ except ImportError:
     dumps_msgpack = create_dummy_func(  # noqa
         'dumps_msgpack', ['msgpack', 'msgpack_numpy'])

-if os.environ.get('TENSORPACK_SERIALIZE', None) == 'msgpack':
+if pa is None or os.environ.get('TENSORPACK_SERIALIZE', None) == 'msgpack':
     loads = loads_msgpack
     dumps = dumps_msgpack
 else:
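The changed condition makes msgpack the fallback whenever pyarrow could not be imported, in addition to the explicit `TENSORPACK_SERIALIZE=msgpack` override. A minimal sketch of the selection logic; the function name is illustrative, and the `'pyarrow'` result for the `else` branch is an assumption based on the CHANGES.md entry, since that branch is elided in the diff:

```python
import os


def choose_serializer(pyarrow_module, environ=os.environ):
    """Return the name of the serializer the condition in the hunk would select.

    pyarrow_module: the imported module, or None if the import failed
    (the role `pa` plays in the hunk).
    """
    if pyarrow_module is None or environ.get('TENSORPACK_SERIALIZE', None) == 'msgpack':
        return 'msgpack'
    # Assumed default; the else branch is not shown in the diff.
    return 'pyarrow'


if __name__ == '__main__':
    fake_pa = object()  # stands in for a successfully imported pyarrow
    print(choose_serializer(None, {}))
    print(choose_serializer(fake_pa, {'TENSORPACK_SERIALIZE': 'msgpack'}))
    print(choose_serializer(fake_pa, {}))
```

Before this change, a missing pyarrow with no override would have fallen through to the `else` branch.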