Commit 4142b9e7 authored Feb 28, 2018 by Yuxin Wu

docs and deprecations

Parent: 9edc0ca5

Showing 8 changed files with 37 additions and 131 deletions (+37 -131).
docs/conf.py                                  +1   -1
docs/tutorial/performance-tuning.md           +16  -16
examples/ResNet/README.md                     +3   -1
tensorpack/callbacks/inference.py             +2   -19
tensorpack/callbacks/monitor.py               +12  -8
tensorpack/dataflow/common.py                 +1   -1
tensorpack/tfutils/sessinit.py                +2   -0
tensorpack/tfutils/symbolic_functions.py      +0   -85
docs/conf.py
@@ -376,7 +376,7 @@ def autodoc_skip_member(app, what, name, obj, skip, options):
             'PeriodicRunHooks',
             'apply_default_prefetch',
-            'guided_relu', 'saliency_map', 'get_scalar_var', 'psnr',
+            'saliency_map', 'get_scalar_var', 'psnr',
             'prediction_incorrect', 'huber_loss', 'SoftMax'
             ]:
         return True
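For context, this list is consulted by a Sphinx `autodoc-skip-member` handler: returning `True` hides the named (deprecated or removed) symbols from the generated API docs, which is why the now-deleted `guided_relu` no longer needs an entry. Below is a minimal sketch of how such a handler is typically wired up in a Sphinx `conf.py`; the abbreviated skip list and the `setup` wiring are illustrative, and only the handler signature and the names come from the diff:

```python
def autodoc_skip_member(app, what, name, obj, skip, options):
    # Hide deprecated or removed symbols from the generated API documentation.
    if name in ['saliency_map', 'get_scalar_var', 'psnr',
                'prediction_incorrect', 'huber_loss', 'SoftMax']:
        return True
    return skip


def setup(app):
    # Standard Sphinx hook: register the handler for the autodoc-skip-member event.
    app.connect('autodoc-skip-member', autodoc_skip_member)
```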
docs/tutorial/performance-tuning.md
@@ -2,41 +2,39 @@
 # Performance Tuning
 
 __We do not know why your training is slow__
 (and most of the times it's not a tensorpack problem).
-Performance is different across machines and tasks.
-So you need to figure out most parts by your own.
+Performance is different across machines and tasks,
+so you need to figure out most parts by your own.
 
 Here's a list of things you can do when your training is slow.
-If you need help improving the speed,
-PLEASE do them and include your findings.
+If you ask for help understanding and improving the speed, PLEASE do them and include your findings.
 
 ## Figure out the bottleneck
 
-1. If you use feed-based input (unrecommended) and datapoints are large, data is likely to become the
-   bottleneck.
+1. If you use feed-based input (unrecommended) and datapoints are large, data is likely to become the bottleneck.
 2. If you use queue-based input + dataflow, you can look for the queue size statistics in
-   training log. Ideally the queue should be near-full (default size is 50).
+   training log. Ideally the input queue should be near-full (default size is 50).
    If the size is near-zero, data is the bottleneck.
 3. If GPU utilization is low, it may be because of slow data, or some ops are inefficient. Also make sure GPUs are not locked in P8 state.
 
 ## Benchmark the components
 
-1. Use `DummyConstantInput(shapes)` as the `InputSource`.
+1. (usually not needed) Use `data=DummyConstantInput(shapes)` for training,
    so that the iterations only take data from a constant tensor.
-   This will help find out the slow operations you're using in the graph.
+   This will benchmark the graph without the overhead of data.
 2. Use `dataflow=FakeData(shapes, random=False)` to replace your original DataFlow by a constant DataFlow.
-   This is almost the same as (1), i.e., it removes the overhead of data.
+   This is almost the same as (1).
 3. If you're using a TF-based input pipeline you wrote, you can simply run it in a loop and test its speed.
 4. Use `TestDataSpeed(mydf).start()` to benchmark your DataFlow.
 
 A benchmark will give you more precise information about which part you should improve.
+Note that you should only look at iteration speed after about 50 iterations, since everything is slow at the beginning.
 
 ## Investigate DataFlow
 
 Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial, so you know what your DataFlow is doing.
 
-Benchmark your DataFlow with modifications to understand which part is the bottleneck. Some examples
-include:
+Benchmark your DataFlow with modifications to understand which part is the bottleneck. Some examples include:
 
-1. Benchmark only raw reader (and perhaps add some parallel prefetching).
+1. Benchmark only raw reader (and perhaps add some parallelism).
 2. Gradually add some pre-processing and see how the performance changes.
 3. Change the number of parallel processes or threads.
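As an aside for readers following the tutorial steps above, here is a minimal sketch of how the benchmarking utilities named there are typically invoked; the shapes and batch size are made-up placeholders, and only `FakeData(shapes, random=False)` and `TestDataSpeed(mydf).start()` come from the text:

```python
from tensorpack.dataflow import FakeData, TestDataSpeed

# Hypothetical datapoint shapes: an image batch and a label batch.
shapes = [[64, 224, 224, 3], [64]]

# (2) A constant DataFlow that removes the real data pipeline from the measurement.
fake_df = FakeData(shapes, random=False)

# (4) Benchmark a DataFlow by itself; substitute your real DataFlow for fake_df.
TestDataSpeed(fake_df).start()
```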
@@ -52,17 +50,19 @@ know the reason and improve it accordingly, e.g.:
 ## Investigate TensorFlow
 
-When you're sure that data is not a bottleneck (e.g. when queue is always full), you can start to
+When you're sure that data is not a bottleneck (e.g. when the logs show that queue is almost full), you can start to
 worry about the model.
 
-You can add a `GraphProfiler` callback when benchmarking the graph. It will
+A naive but effective way is to remove ops from your model to understand how much time they cost.
+Or you can use `GraphProfiler` callback to benchmark the graph. It will
 dump runtime tracing information (to either TensorBoard or chrome) to help diagnose the issue.
+Remember not to use the first several iterations.
 
 ### Slow with single-GPU
 
 This is literally saying TF ops are slow. Usually there isn't much you can do, except to optimize the kernels.
 But there may be something cheap you can try:
 
-1. You can visualize copies across devices in chrome.
+1. Visualize copies across devices in chrome.
    It may help to change device placement to avoid some CPU-GPU copies.
    It may help to replace some CPU-only ops with equivalent GPU ops to avoid copies.
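To illustrate the `GraphProfiler` advice above, here is a rough sketch of adding the callback to a training configuration; the model, dataflow, and the exact `TrainConfig` wiring are assumptions rather than part of this commit:

```python
from tensorpack import TrainConfig
from tensorpack.callbacks import GraphProfiler

config = TrainConfig(
    model=my_model,          # hypothetical ModelDesc
    dataflow=my_dataflow,    # hypothetical DataFlow
    callbacks=[
        GraphProfiler(),     # dumps runtime tracing info (TensorBoard/chrome) for later inspection
    ],
)
```

Remember to ignore the first several iterations when reading the profile, as the tutorial change above notes.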
examples/ResNet/README.md
@@ -26,7 +26,9 @@ To train, first decompress ImageNet data into [this structure](http://tensorpack
 ```
 You should be able to see good GPU utilization (95%~99%), if your data is fast enough.
-The default data pipeline is probably OK for most systems.
+It can finish training [within 20 hours](http://dawn.cs.stanford.edu/benchmark/ImageNet/train.html) on AWS p3.16xlarge.
+
+The default data pipeline is probably OK for most SSD systems.
 See the [tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient-dataflow.html) on other options to speed up your data.
tensorpack/callbacks/inference.py
@@ -9,7 +9,6 @@ from six.moves import zip
 from .base import Callback
 from ..utils import logger
-from ..utils.utils import execute_only_once
 from ..utils.stats import RatioCounter, BinaryStatistics
 from ..tfutils.common import get_op_tensor_name
@@ -55,17 +54,9 @@ class Inferencer(Callback):
         """
         Return a list of tensor names (guaranteed not op name) this inferencer needs.
         """
-        try:
-            ret = self._get_fetches()
-        except NotImplementedError:
-            logger.warn("Inferencer._get_output_tensors was deprecated and renamed to _get_fetches")
-            ret = self._get_output_tensors()
+        ret = self._get_fetches()
         return [get_op_tensor_name(n)[1] for n in ret]
 
-    def _get_output_tensors(self):
-        pass
-
     def _get_fetches(self):
         raise NotImplementedError()
@@ -77,15 +68,7 @@ class Inferencer(Callback):
             results(list): list of results this inferencer fetched. Has the same
             length as ``self._get_fetches()``.
         """
-        try:
-            self._on_fetches(results)
-        except NotImplementedError:
-            if execute_only_once():
-                logger.warn("Inferencer._datapoint was deprecated and renamed to _on_fetches.")
-            self._datapoint(results)
-
-    def _datapoint(self, results):
-        pass
+        self._on_fetches(results)
 
     def _on_fetches(self, results):
         raise NotImplementedError()
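With the `_get_output_tensors`/`_datapoint` fallbacks removed above, custom inferencers have to implement the renamed hooks directly. A minimal sketch of what that looks like; the subclass, tensor name, and statistic are hypothetical, and the `_before_inference`/`_after_inference` hooks are assumed from the `Inferencer` base class rather than shown in this diff:

```python
import numpy as np
from tensorpack.callbacks import Inferencer


class MeanActivation(Inferencer):
    """Hypothetical inferencer that averages one output tensor over the validation set."""

    def _get_fetches(self):
        # Tensor names (not op names) to fetch for every datapoint.
        return ['some_output_tensor']          # hypothetical tensor name

    def _before_inference(self):
        self._values = []

    def _on_fetches(self, results):
        # `results` has the same length as the list returned by _get_fetches().
        self._values.append(results[0].mean())

    def _after_inference(self):
        # Scalars returned here are reported to the training monitors.
        return {'mean_activation': np.mean(self._values)}
```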
tensorpack/callbacks/monitor.py
@@ -19,7 +19,7 @@ from ..tfutils.summary import create_scalar_summary, create_image_summary
 from .base import Callback
 
 __all__ = ['TrainingMonitor', 'Monitors',
-           'TFSummaryWriter', 'TFEventWriter', 'JSONWriter',
+           'TFEventWriter', 'JSONWriter',
            'ScalarPrinter', 'SendMonitorData']
@@ -108,7 +108,7 @@ class Monitors(Callback):
     _chief_only = False
 
     def __init__(self, monitors):
-        self._scalar_history = ScalarHistory().set_chief_only(False)
+        self._scalar_history = ScalarHistory()
         self._monitors = monitors + [self._scalar_history]
         for m in self._monitors:
             assert isinstance(m, TrainingMonitor), m
@@ -172,7 +172,7 @@ class Monitors(Callback):
     def put_event(self, evt):
         """
-        Put an tf.Event.
+        Put an :class:`tf.Event`.
         `step` and `wall_time` fields of :class:`tf.Event` will be filled automatically.
 
         Args:
@@ -185,12 +185,18 @@ class Monitors(Callback):
     def get_latest(self, name):
         """
         Get latest scalar value of some data.
+
+        If you run multiprocess training, keep in mind that
+        the data is perhaps only available on chief process.
         """
         return self._scalar_history.get_latest(name)
 
     def get_history(self, name):
         """
         Get a history of the scalar value of some data.
+
+        If you run multiprocess training, keep in mind that
+        the data is perhaps only available on chief process.
         """
         return self._scalar_history.get_history(name)
@@ -240,11 +246,6 @@ class TFEventWriter(TrainingMonitor):
         self._writer.close()
 
 
-def TFSummaryWriter(*args, **kwargs):
-    logger.warn("TFSummaryWriter was renamed to TFEventWriter!")
-    return TFEventWriter(*args, **kwargs)
-
-
 class JSONWriter(TrainingMonitor):
     """
     Write all scalar data to a json file under ``logger.get_logger_dir()``, grouped by their global step.
@@ -397,6 +398,9 @@ class ScalarHistory(TrainingMonitor):
     """
     Only used by monitors internally.
     """
+
+    _chief_only = False
+
     def _setup_graph(self):
         self._dic = defaultdict(list)
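The new docstring caveat above ("the data is perhaps only available on chief process") matters when reading monitored scalars back. A rough sketch of calling `get_latest`/`get_history` from a callback, assuming the usual `self.trainer.monitors` handle; the callback class and the scalar name are made up:

```python
from tensorpack.callbacks import Callback


class PlateauLogger(Callback):
    """Hypothetical callback that reads a monitored scalar after each epoch."""

    def _trigger_epoch(self):
        # 'val-error' is a made-up scalar name; in multiprocess training this
        # history may only exist on the chief process.
        latest = self.trainer.monitors.get_latest('val-error')
        history = self.trainer.monitors.get_history('val-error')
        print('val-error: latest={}, {} records so far'.format(latest, len(history)))
```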
tensorpack/dataflow/common.py
@@ -688,7 +688,7 @@ class PrintData(ProxyDataFlow):
         self.num = num
         if label:
-            log_deprecated("PrintData(label, ...", "Use PrintData(name, ... instead.")
+            log_deprecated("PrintData(label, ...", "Use PrintData(name, ... instead.", "2018-05-01")
             self.name = label
         else:
             self.name = name
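Per the deprecation above, `PrintData` should now be constructed with `name` rather than `label`. A small self-contained sketch; the `FakeData` stand-in, shapes, and the name are placeholders:

```python
from tensorpack.dataflow import FakeData, PrintData

df = FakeData([[32, 28, 28, 3], [32]], random=False)   # stand-in for a real DataFlow
df = PrintData(df, num=2, name='after-augmentation')   # inspect the first 2 datapoints, tagged by `name`
```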
tensorpack/tfutils/sessinit.py
@@ -8,6 +8,7 @@ import tensorflow as tf
 import six
 from ..utils import logger
+from ..utils.develop import deprecated
 from .common import get_op_tensor_name
 from .varmanip import (SessionUpdate, get_savename_from_varname,
                        is_training_name, get_checkpoint_path)
@@ -261,6 +262,7 @@ def get_model_loader(filename):
     return SaverRestore(filename)
 
 
+@deprecated("Write the logic yourself!", "2018-06-01")
 def TryResumeTraining():
     """
     Try loading latest checkpoint from ``logger.get_logger_dir()``, only if there is one.
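Since `TryResumeTraining` is now marked deprecated ("Write the logic yourself!"), the plain loader defined in the same file is the direct alternative. A minimal sketch, with a placeholder checkpoint path:

```python
from tensorpack.tfutils.sessinit import get_model_loader

# Per the context above, get_model_loader() returns a SaverRestore for a TF checkpoint.
session_init = get_model_loader('/path/to/train_log/model-checkpoint')  # placeholder path
```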
tensorpack/tfutils/symbolic_functions.py
@@ -3,7 +3,6 @@
 import tensorflow as tf
-from contextlib import contextmanager
 import numpy as np
 
 from ..utils.develop import deprecated
@@ -17,19 +16,6 @@ def prediction_incorrect(logits, label, topk=1, name='incorrect_vector'):
                    tf.float32, name=name)
 
 
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def accuracy(logits, label, topk=1, name='accuracy'):
-    """
-    Args:
-        logits: shape [B,C].
-        label: shape [B].
-        topk(int): topk
-    Returns:
-        a single scalar
-    """
-    return tf.reduce_mean(tf.cast(tf.nn.in_top_k(logits, label, topk), tf.float32),
-                          name=name)
-
-
 def flatten(x):
     """
     Flatten the tensor.
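The removed `accuracy` helper's deprecation message asks users to implement it themselves; its one-line body (visible in the removed hunk above) can simply be copied into user code:

```python
import tensorflow as tf

def accuracy(logits, label, topk=1, name='accuracy'):
    # Same computation as the helper removed above: fraction of datapoints whose
    # label falls within the top-k predictions.
    return tf.reduce_mean(tf.cast(tf.nn.in_top_k(logits, label, topk), tf.float32), name=name)
```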
@@ -47,54 +33,6 @@ def batch_flatten(x):
     return tf.reshape(x, tf.stack([tf.shape(x)[0], -1]))
 
 
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def class_balanced_cross_entropy(pred, label, name='cross_entropy_loss'):
-    """
-    The class-balanced cross entropy loss,
-    as in `Holistically-Nested Edge Detection
-    <http://arxiv.org/abs/1504.06375>`_.
-
-    Args:
-        pred: of shape (b, ...). the predictions in [0,1].
-        label: of the same shape. the ground truth in {0,1}.
-    Returns:
-        class-balanced cross entropy loss.
-    """
-    with tf.name_scope('class_balanced_cross_entropy'):
-        z = batch_flatten(pred)
-        y = tf.cast(batch_flatten(label), tf.float32)
-
-        count_neg = tf.reduce_sum(1. - y)
-        count_pos = tf.reduce_sum(y)
-        beta = count_neg / (count_neg + count_pos)
-
-        eps = 1e-12
-        loss_pos = -beta * tf.reduce_mean(y * tf.log(z + eps))
-        loss_neg = (1. - beta) * tf.reduce_mean((1. - y) * tf.log(1. - z + eps))
-        cost = tf.subtract(loss_pos, loss_neg, name=name)
-    return cost
-
-
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def class_balanced_sigmoid_cross_entropy(logits, label, name='cross_entropy_loss'):
-    """
-    This function accepts logits rather than predictions, and is more numerically stable than
-    :func:`class_balanced_cross_entropy`.
-    """
-    with tf.name_scope('class_balanced_sigmoid_cross_entropy'):
-        y = tf.cast(label, tf.float32)
-
-        count_neg = tf.reduce_sum(1. - y)
-        count_pos = tf.reduce_sum(y)
-        beta = count_neg / (count_neg + count_pos)
-
-        pos_weight = beta / (1 - beta)
-        cost = tf.nn.weighted_cross_entropy_with_logits(logits=logits, targets=y, pos_weight=pos_weight)
-        cost = tf.reduce_mean(cost * (1 - beta))
-
-        zero = tf.equal(count_pos, 0.0)
-    return tf.where(zero, 0.0, cost, name=name)
-
-
 def print_stat(x, message=None):
     """ A simple print Op that might be easier to use than :meth:`tf.Print`.
     Use it like: ``x = print_stat(x, message='This is x')``.
@@ -206,29 +144,6 @@ def psnr(prediction, ground_truth, maxp=None, name='psnr'):
     return psnr
 
 
-@contextmanager
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def guided_relu():
-    """
-    Returns:
-        A context where the gradient of :meth:`tf.nn.relu` is replaced by
-        guided back-propagation, as described in the paper:
-        `Striving for Simplicity: The All Convolutional Net
-        <https://arxiv.org/abs/1412.6806>`_
-    """
-    from tensorflow.python.ops import gen_nn_ops   # noqa
-
-    @tf.RegisterGradient("GuidedReLU")
-    def GuidedReluGrad(op, grad):
-        return tf.where(0. < grad,
-                        gen_nn_ops._relu_grad(grad, op.outputs[0]),
-                        tf.zeros(grad.get_shape()))
-
-    g = tf.get_default_graph()
-    with g.gradient_override_map({'Relu': 'GuidedReLU'}):
-        yield
-
-
 @deprecated("Please implement it by yourself.", "2018-04-28")
 def saliency_map(output, input, name="saliency_map"):
     """