Commit 4142b9e7 authored by Yuxin Wu

docs and deprecations

parent 9edc0ca5
@@ -376,7 +376,7 @@ def autodoc_skip_member(app, what, name, obj, skip, options):
             'PeriodicRunHooks',
             'apply_default_prefetch',
-            'guided_relu', 'saliency_map', 'get_scalar_var', 'psnr',
+            'saliency_map', 'get_scalar_var', 'psnr',
             'prediction_incorrect', 'huber_loss', 'SoftMax'
             ]:
         return True
@@ -2,41 +2,39 @@
 # Performance Tuning
 
 __We do not know why your training is slow__ (and most of the time it's not a tensorpack problem).
-Performance is different across machines and tasks.
-So you need to figure out most parts by your own.
+Performance is different across machines and tasks,
+so you need to figure out most parts on your own.
 Here's a list of things you can do when your training is slow.
-If you need help improving the speed,
-PLEASE do them and include your findings.
+If you ask for help understanding and improving the speed, PLEASE do them and include your findings.
 
 ## Figure out the bottleneck
 
-1. If you use feed-based input (unrecommended) and datapoints are large, data is likely to become the
-   bottleneck.
+1. If you use feed-based input (not recommended) and datapoints are large, data is likely to become the bottleneck.
 2. If you use queue-based input + dataflow, you can look for the queue size statistics in
-   training log. Ideally the queue should be near-full (default size is 50).
+   the training log. Ideally the input queue should be near-full (default size is 50).
    If the size is near-zero, data is the bottleneck.
 3. If GPU utilization is low, it may be because of slow data, or because some ops are inefficient. Also make sure GPUs are not locked in the P8 power-saving state.
 
 ## Benchmark the components
 
-1. Use `DummyConstantInput(shapes)` as the `InputSource`.
+1. (usually not needed) Use `data=DummyConstantInput(shapes)` for training,
    so that the iterations only take data from a constant tensor.
-   This will help find out the slow operations you're using in the graph.
+   This will benchmark the graph without the overhead of data.
 2. Use `dataflow=FakeData(shapes, random=False)` to replace your original DataFlow by a constant DataFlow.
-   This is almost the same as (1), i.e., it removes the overhead of data.
+   This is almost the same as (1).
 3. If you're using a TF-based input pipeline you wrote, you can simply run it in a loop and test its speed.
 4. Use `TestDataSpeed(mydf).start()` to benchmark your DataFlow.
 
 A benchmark will give you more precise information about which part you should improve.
+Note that you should only look at iteration speed after about 50 iterations, since everything is slow at the beginning.
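
Below is a minimal sketch (not part of this commit) of steps 2 and 4 above, using `FakeData` and `TestDataSpeed`; the shapes are hypothetical placeholders for a batched image/label pipeline:

```python
from tensorpack.dataflow import FakeData, TestDataSpeed

# Step 2: a constant DataFlow producing fake (image, label) batches of the
# given shapes, so training runs without any real data overhead.
df = FakeData([[64, 224, 224, 3], [64]], random=False)

# Step 4: measure raw DataFlow throughput. Only judge the speed after the
# first ~50 iterations, since everything is slow at the beginning.
TestDataSpeed(df, size=1000).start()
```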
 
 ## Investigate DataFlow
 
 Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial, so you know what your DataFlow is doing.
 
-Benchmark your DataFlow with modifications to understand which part is the bottleneck. Some examples
-include:
+Benchmark your DataFlow with modifications to understand which part is the bottleneck. Some examples include:
 
-1. Benchmark only the raw reader (and perhaps add some parallel prefetching).
+1. Benchmark only the raw reader (and perhaps add some parallelism).
 2. Gradually add some pre-processing and see how the performance changes.
 3. Change the number of parallel processes or threads.
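
A hedged sketch of items 1 and 3 above: time the raw reader alone, then wrap it in parallel prefetching and compare. `LMDBData` and `PrefetchDataZMQ` follow the tensorpack API of this era; the path is a placeholder:

```python
from tensorpack.dataflow import LMDBData, PrefetchDataZMQ, TestDataSpeed

# Item 1: the raw reader alone, without decoding or augmentation.
raw = LMDBData('/path/to/train.lmdb', shuffle=False)
TestDataSpeed(raw, size=2000).start()

# Item 3: vary the number of parallel processes and compare throughput.
# With nr_proc > 1 each process runs its own copy of the reader, which
# duplicates data but is fine for a pure speed test.
parallel = PrefetchDataZMQ(LMDBData('/path/to/train.lmdb', shuffle=False), nr_proc=4)
TestDataSpeed(parallel, size=2000).start()
```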
@@ -52,17 +50,19 @@ know the reason and improve it accordingly, e.g.:
 
 ## Investigate TensorFlow
 
-When you're sure that data is not a bottleneck (e.g. when queue is always full), you can start to
+When you're sure that data is not a bottleneck (e.g. when the logs show that the queue is almost full), you can start to
 worry about the model.
 
-You can add a `GraphProfiler` callback when benchmarking the graph. It will
+A naive but effective way is to remove ops from your model to understand how much time they cost.
+Or you can use the `GraphProfiler` callback to benchmark the graph. It will
 dump runtime tracing information (to either TensorBoard or chrome) to help diagnose the issue.
+Remember not to use the timings from the first several iterations.
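
For reference, a hedged sketch of attaching the profiler (the keyword arguments reflect the tensorpack API around this commit and may differ in other versions):

```python
from tensorpack.callbacks import GraphProfiler

# Dumps per-step chrome tracing files (open them via chrome://tracing) and,
# optionally, profiling events that TensorBoard can display.
profiler = GraphProfiler(dump_tracing=True, dump_event=True)
# ...then pass it to your trainer, e.g. TrainConfig(callbacks=[profiler, ...])
```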
 
 ### Slow with single-GPU
 
 This is literally saying TF ops are slow. Usually there isn't much you can do, except to optimize the kernels.
 But there may be something cheap you can try:
 
-1. You can visualize copies across devices in chrome.
+1. Visualize copies across devices in chrome.
    It may help to change device placement to avoid some CPU-GPU copies.
    It may help to replace some CPU-only ops with equivalent GPU ops to avoid copies.
@@ -26,7 +26,9 @@ To train, first decompress ImageNet data into [this structure](http://tensorpack
 ```
 
 You should be able to see good GPU utilization (95%~99%), if your data is fast enough.
-The default data pipeline is probably OK for most systems.
+It can finish training [within 20 hours](http://dawn.cs.stanford.edu/benchmark/ImageNet/train.html) on AWS p3.16xlarge.
+The default data pipeline is probably OK for most SSD systems.
 See the [tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient-dataflow.html) on other options to speed up your data.
 
 ![imagenet](imagenet-resnet.png)
@@ -9,7 +9,6 @@ from six.moves import zip
 
 from .base import Callback
 from ..utils import logger
-from ..utils.utils import execute_only_once
 from ..utils.stats import RatioCounter, BinaryStatistics
 from ..tfutils.common import get_op_tensor_name
@@ -55,17 +54,9 @@ class Inferencer(Callback):
         """
         Return a list of tensor names (guaranteed not op name) this inferencer needs.
         """
-        try:
-            ret = self._get_fetches()
-        except NotImplementedError:
-            logger.warn("Inferencer._get_output_tensors was deprecated and renamed to _get_fetches")
-            ret = self._get_output_tensors()
+        ret = self._get_fetches()
         return [get_op_tensor_name(n)[1] for n in ret]
 
-    def _get_output_tensors(self):
-        pass
-
     def _get_fetches(self):
         raise NotImplementedError()
@@ -77,15 +68,7 @@ class Inferencer(Callback):
             results(list): list of results this inferencer fetched. Has the same
                 length as ``self._get_fetches()``.
         """
-        try:
-            self._on_fetches(results)
-        except NotImplementedError:
-            if execute_only_once():
-                logger.warn("Inferencer._datapoint was deprecated and renamed to _on_fetches.")
-            self._datapoint(results)
-
-    def _datapoint(self, results):
-        pass
+        self._on_fetches(results)
 
     def _on_fetches(self, results):
         raise NotImplementedError()
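
With the fallbacks removed, custom inferencers must implement the renamed hooks directly. A minimal hypothetical sketch (the tensor name and averaging logic are illustrative only):

```python
class ScalarMeanInferencer(Inferencer):
    """Average one scalar tensor (e.g. a loss) over the inference run."""

    def __init__(self, tensor_name):
        self._name = tensor_name

    def _before_inference(self):
        self._sum, self._cnt = 0.0, 0

    def _get_fetches(self):          # renamed from _get_output_tensors
        return [self._name]

    def _on_fetches(self, results):  # renamed from _datapoint
        self._sum += float(results[0])
        self._cnt += 1

    def _after_inference(self):
        # the returned dict is put into the monitors as scalar statistics
        return {self._name: self._sum / max(self._cnt, 1)}
```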
@@ -19,7 +19,7 @@ from ..tfutils.summary import create_scalar_summary, create_image_summary
 from .base import Callback
 
 __all__ = ['TrainingMonitor', 'Monitors',
-           'TFSummaryWriter', 'TFEventWriter', 'JSONWriter',
+           'TFEventWriter', 'JSONWriter',
            'ScalarPrinter', 'SendMonitorData']
@@ -108,7 +108,7 @@ class Monitors(Callback):
     _chief_only = False
 
     def __init__(self, monitors):
-        self._scalar_history = ScalarHistory().set_chief_only(False)
+        self._scalar_history = ScalarHistory()
         self._monitors = monitors + [self._scalar_history]
         for m in self._monitors:
             assert isinstance(m, TrainingMonitor), m
@@ -172,7 +172,7 @@ class Monitors(Callback):
     def put_event(self, evt):
         """
-        Put an tf.Event.
+        Put a :class:`tf.Event`.
         `step` and `wall_time` fields of :class:`tf.Event` will be filled automatically.
 
         Args:
@@ -185,12 +185,18 @@ class Monitors(Callback):
     def get_latest(self, name):
         """
         Get the latest scalar value of some data.
+
+        If you run multiprocess training, keep in mind that
+        the data may only be available on the chief process.
         """
         return self._scalar_history.get_latest(name)
 
     def get_history(self, name):
         """
         Get a history of the scalar value of some data.
+
+        If you run multiprocess training, keep in mind that
+        the data may only be available on the chief process.
         """
         return self._scalar_history.get_history(name)
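
A hedged sketch of reading these values from a hypothetical callback (under multiprocess training the scalar may exist only on the chief process, as the docstrings note):

```python
from tensorpack.callbacks import Callback
from tensorpack.utils import logger

class BestValueTracker(Callback):
    """Hypothetical callback: log whenever a monitored scalar hits a new minimum."""

    def _setup_graph(self):
        self._best = float('inf')

    def _trigger_epoch(self):
        val = self.trainer.monitors.get_latest('val-error-top1')  # stat name is illustrative
        if val < self._best:
            self._best = val
            logger.info("New best val-error-top1: {}".format(val))
```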
@@ -240,11 +246,6 @@ class TFEventWriter(TrainingMonitor):
         self._writer.close()
 
 
-def TFSummaryWriter(*args, **kwargs):
-    logger.warn("TFSummaryWriter was renamed to TFEventWriter!")
-    return TFEventWriter(*args, **kwargs)
-
-
 class JSONWriter(TrainingMonitor):
     """
     Write all scalar data to a json file under ``logger.get_logger_dir()``, grouped by their global step.
@@ -397,6 +398,9 @@ class ScalarHistory(TrainingMonitor):
     """
     Only used by monitors internally.
     """
+
+    _chief_only = False
+
     def _setup_graph(self):
         self._dic = defaultdict(list)
@@ -688,7 +688,7 @@ class PrintData(ProxyDataFlow):
         self.num = num
         if label:
-            log_deprecated("PrintData(label, ...", "Use PrintData(name, ... instead.")
+            log_deprecated("PrintData(label, ...", "Use PrintData(name, ... instead.", "2018-05-01")
             self.name = label
         else:
             self.name = name
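
For reference, a small sketch (not part of this commit) of the rename this deprecation points to:

```python
from tensorpack.dataflow import FakeData, PrintData

df = FakeData([[2, 2]], size=4, random=False)
df = PrintData(df, num=1, name='after-fake')   # was: PrintData(df, num=1, label='after-fake')
```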
@@ -8,6 +8,7 @@ import tensorflow as tf
 import six
 
 from ..utils import logger
+from ..utils.develop import deprecated
 from .common import get_op_tensor_name
 from .varmanip import (SessionUpdate, get_savename_from_varname,
                        is_training_name, get_checkpoint_path)
@@ -261,6 +262,7 @@ def get_model_loader(filename):
     return SaverRestore(filename)
 
 
+@deprecated("Write the logic yourself!", "2018-06-01")
 def TryResumeTraining():
     """
     Try loading latest checkpoint from ``logger.get_logger_dir()``, only if there is one.
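
A hedged sketch of the "write the logic yourself" replacement the deprecation message asks for: resume from the latest checkpoint in the log directory if one exists, otherwise start fresh. `make_session_init` is a hypothetical helper name:

```python
import tensorflow as tf
from tensorpack.tfutils.sessinit import JustCurrentSession, SaverRestore
from tensorpack.utils import logger

def make_session_init():
    """Hypothetical replacement for the deprecated TryResumeTraining()."""
    log_dir = logger.get_logger_dir()
    ckpt = tf.train.latest_checkpoint(log_dir) if log_dir else None
    if ckpt is None:
        return JustCurrentSession()   # nothing to resume from
    logger.info("Resuming from checkpoint: " + ckpt)
    return SaverRestore(ckpt)
```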
@@ -3,7 +3,6 @@
 import tensorflow as tf
-from contextlib import contextmanager
 import numpy as np
 
 from ..utils.develop import deprecated
@@ -17,19 +16,6 @@ def prediction_incorrect(logits, label, topk=1, name='incorrect_vector'):
                    tf.float32, name=name)
 
 
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def accuracy(logits, label, topk=1, name='accuracy'):
-    """
-    Args:
-        logits: shape [B,C].
-        label: shape [B].
-        topk(int): topk
-    Returns:
-        a single scalar
-    """
-    return tf.reduce_mean(tf.cast(tf.nn.in_top_k(logits, label, topk), tf.float32), name=name)
-
-
 def flatten(x):
     """
     Flatten the tensor.
@@ -47,54 +33,6 @@ def batch_flatten(x):
     return tf.reshape(x, tf.stack([tf.shape(x)[0], -1]))
 
 
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def class_balanced_cross_entropy(pred, label, name='cross_entropy_loss'):
-    """
-    The class-balanced cross entropy loss,
-    as in `Holistically-Nested Edge Detection
-    <http://arxiv.org/abs/1504.06375>`_.
-
-    Args:
-        pred: of shape (b, ...). the predictions in [0,1].
-        label: of the same shape. the ground truth in {0,1}.
-    Returns:
-        class-balanced cross entropy loss.
-    """
-    with tf.name_scope('class_balanced_cross_entropy'):
-        z = batch_flatten(pred)
-        y = tf.cast(batch_flatten(label), tf.float32)
-
-        count_neg = tf.reduce_sum(1. - y)
-        count_pos = tf.reduce_sum(y)
-        beta = count_neg / (count_neg + count_pos)
-
-        eps = 1e-12
-        loss_pos = -beta * tf.reduce_mean(y * tf.log(z + eps))
-        loss_neg = (1. - beta) * tf.reduce_mean((1. - y) * tf.log(1. - z + eps))
-        cost = tf.subtract(loss_pos, loss_neg, name=name)
-    return cost
-
-
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def class_balanced_sigmoid_cross_entropy(logits, label, name='cross_entropy_loss'):
-    """
-    This function accepts logits rather than predictions, and is more numerically stable than
-    :func:`class_balanced_cross_entropy`.
-    """
-    with tf.name_scope('class_balanced_sigmoid_cross_entropy'):
-        y = tf.cast(label, tf.float32)
-
-        count_neg = tf.reduce_sum(1. - y)
-        count_pos = tf.reduce_sum(y)
-        beta = count_neg / (count_neg + count_pos)
-
-        pos_weight = beta / (1 - beta)
-        cost = tf.nn.weighted_cross_entropy_with_logits(logits=logits, targets=y, pos_weight=pos_weight)
-        cost = tf.reduce_mean(cost * (1 - beta))
-
-        zero = tf.equal(count_pos, 0.0)
-    return tf.where(zero, 0.0, cost, name=name)
-
-
 def print_stat(x, message=None):
     """ A simple print Op that might be easier to use than :meth:`tf.Print`.
     Use it like: ``x = print_stat(x, message='This is x')``.
@@ -206,29 +144,6 @@ def psnr(prediction, ground_truth, maxp=None, name='psnr'):
     return psnr
 
 
-@contextmanager
-@deprecated("Please implement it by yourself.", "2018-02-28")
-def guided_relu():
-    """
-    Returns:
-        A context where the gradient of :meth:`tf.nn.relu` is replaced by
-        guided back-propagation, as described in the paper:
-        `Striving for Simplicity: The All Convolutional Net
-        <https://arxiv.org/abs/1412.6806>`_
-    """
-    from tensorflow.python.ops import gen_nn_ops   # noqa
-
-    @tf.RegisterGradient("GuidedReLU")
-    def GuidedReluGrad(op, grad):
-        return tf.where(0. < grad,
-                        gen_nn_ops._relu_grad(grad, op.outputs[0]),
-                        tf.zeros(grad.get_shape()))
-
-    g = tf.get_default_graph()
-    with g.gradient_override_map({'Relu': 'GuidedReLU'}):
-        yield
-
-
 @deprecated("Please implement it by yourself.", "2018-04-28")
 def saliency_map(output, input, name="saliency_map"):
     """