Commit 440bf631 authored by Yuxin Wu

update docs

parent defced98
 # Efficient DataFlow
-This tutorial gives an overview of how to build an efficient DataFlow, using ImageNet
-dataset as an example.
+This tutorial gives an overview of how to build an efficient DataFlow, using ImageNet dataset as an example.
 Our goal in the end is to have
 a __Python generator__ which yields preprocessed ImageNet images and labels as fast as possible.
-Since it is simply a generator interface, you can use the DataFlow in other Python-based frameworks (e.g. Keras)
+Since it is simply a generator interface, you can use the DataFlow in any Python-based frameworks (e.g. PyTorch, Keras)
 or your own code as well.
 **What we are going to do**: We'll use ILSVRC12 dataset, which contains 1.28 million images.
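Because the end product is just a Python generator, the consuming side needs no framework-specific glue. A minimal sketch of that idea, assuming a hypothetical `fake_imagenet_stream` that stands in for the real DataFlow and yields tiny nested-list "images" instead of decoded JPEGs (both function names are invented for this illustration):

```python
import random


def fake_imagenet_stream(num_samples, height=4, width=4, seed=0):
    """Hypothetical stand-in for a DataFlow: yields (image, label) pairs."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        # A tiny HxWx3 "image" of uint8-like values; real code would decode a JPEG here.
        image = [[[rng.randrange(256) for _ in range(3)]
                  for _ in range(width)] for _ in range(height)]
        label = rng.randrange(1000)  # ILSVRC12 has 1000 classes
        yield image, label


def batches(stream, batch_size):
    """Group a stream of (image, label) pairs into lists of at most batch_size."""
    images, labels = [], []
    for image, label in stream:
        images.append(image)
        labels.append(label)
        if len(images) == batch_size:
            yield images, labels
            images, labels = [], []
    if images:  # emit the final, possibly smaller batch
        yield images, labels


# Any consumer (a training loop, Keras, PyTorch, ...) can simply iterate.
batch_sizes = [len(lbls) for _, lbls in batches(fake_imagenet_stream(10), 4)]
```

Nothing in the consumer depends on how the samples are produced, which is what makes the generator interface portable.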
@@ -13,10 +12,10 @@ The average resolution is about 400x350 <sup>[[1]]</sup>.
 Following the [ResNet example](../examples/ResNet), we need images in their original resolution,
 so we will read the original dataset (instead of a down-sampled version), and
 then apply complicated preprocessing to it.
-We will need to reach a speed of, roughly **1k ~ 2k images per second**, to keep GPUs busy.
+We aim to reach a speed of, roughly **1k~3k images per second**, to keep GPUs busy.
 Some things to know before reading:
-1. For smaller datasets (e.g. several GBs of images with lightweight preprocessing), a simple reader plus some prefetch should usually work well enough.
+1. For smaller datasets (e.g. several GBs of images with lightweight preprocessing), a simple reader plus some multiprocess prefetch should usually work well enough.
 Therefore you don't have to understand this tutorial in depth unless you really find your data being the bottleneck.
 This tutorial could be a bit complicated for people new to system architectures, but you do need these to be able to run fast enough on ImageNet-scale dataset.
 2. Having a fast Python generator **alone** may or may not improve your overall training speed.
@@ -31,6 +30,9 @@ Some things to know before reading:
 You may need to tune the parameters (#processes, #threads, size of buffer, etc.)
 or change the pipeline for new tasks and new machines to achieve the best performance.
+The benchmark code for this tutorial can be found in [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ImageNet),
+including comparison with a similar (but simpler) pipeline built with `tf.data`.
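Before tuning any of these parameters, it helps to measure how many items per second a generator yields on its own. The helper below is a rough sketch and not the code from the benchmark repository; `measure_throughput`, `dummy_reader`, and their parameters are names chosen for illustration:

```python
import itertools
import time


def measure_throughput(generator, num_items, warmup=0):
    """Return items/second over `num_items` items, after skipping `warmup` items."""
    it = iter(generator)
    for _ in itertools.islice(it, warmup):
        pass  # discard warm-up items so start-up cost is not counted
    start = time.perf_counter()
    consumed = sum(1 for _ in itertools.islice(it, num_items))
    elapsed = time.perf_counter() - start
    return consumed / elapsed if elapsed > 0 else float('inf')


def dummy_reader():
    """Illustrative stand-in for a real reader: yields integers forever."""
    for i in itertools.count():
        yield i


rate = measure_throughput(dummy_reader(), num_items=10000, warmup=100)
print("{:.0f} items/sec".format(rate))
```

Comparing this number with the rate the training step can consume tells you whether the data side is the bottleneck at all.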
 ## Random Read
 We start from a simple DataFlow:
......
@@ -42,12 +42,11 @@ Both are supported in tensorpack, while we recommend using Python.
 ### TensorFlow Reader: Pros
 * Faster read/preprocessing.
-* Potentially true, but not necessarily. With Python you can call a variety of other fast libraries, which
-you might not have a good support in TF. For example, LMDB could be faster than TFRecords.
+* Often true, but not necessarily. With Python you have access to many other fast libraries, which might be unsupported in TF.
 * Python may be just fast enough.
-As long as data preparation runs faster than training, and the latency of all four blocks in the
-above figure is hidden, it makes no difference at all.
+As long as data preparation keeps up with training, and the latency of all four blocks in the
+above figure is hidden, running faster brings no more gains to overall throughput.
 For most types of problems, up to the scale of multi-GPU ImageNet training,
 Python can offer enough speed if you use a fast library (e.g. `tensorpack.dataflow`).
 See the [Efficient DataFlow](efficient-dataflow.html) tutorial on how to build a fast Python reader with DataFlow.
@@ -56,15 +55,15 @@ Both are supported in tensorpack, while we recommend using Python.
 * True. But as mentioned above, the latency can usually be hidden.
-In tensorpack, TF queues are used to hide the "Copy to TF" latency,
+In tensorpack, TF queues are usually used to hide the "Copy to TF" latency,
 and TF `StagingArea` can help hide the "Copy to GPU" latency.
 They are used by most examples in tensorpack.
 ### TensorFlow Reader: Cons
 The disadvantage of TF reader is obvious and it's huge: it's __too complicated__.
-Unlike running a mathematical model, reading data is a complicated and badly-structured task.
+Unlike running a mathematical model, reading data is a complicated and poorly-structured task.
-You need to handle different data format, handle corner cases in noisy data,
+You need to handle different formats, handle corner cases, noisy data,
 which all require condition operations, loops, sometimes even exception handling. These operations
 are __naturally not suitable__ for a symbolic graph.
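In plain Python, the conditions, loops, and exception handling mentioned above are a few lines of ordinary control flow. A small sketch, assuming a made-up record format of `"<label>,<payload>"` strings; `parse_record` and `read_clean` are illustrative names, not tensorpack APIs:

```python
def parse_record(line):
    """Parse 'label,payload'; raise ValueError on malformed input."""
    label_str, sep, payload = line.partition(',')
    if not sep or not payload:
        raise ValueError("missing payload: {!r}".format(line))
    return int(label_str), payload  # int() raises ValueError on a bad label


def read_clean(lines):
    """Yield parsed records, skipping blank lines and counting malformed ones."""
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:        # corner case: empty line
            continue
        try:
            yield parse_record(line)
        except ValueError:  # noisy data: drop the record and keep going
            skipped += 1
    print("skipped {} malformed records".format(skipped))


raw = ["3,cat.jpg", "", "oops", "x,dog.jpg", "7,", "1,bird.jpg"]
records = list(read_clean(raw))
```

Expressing the same skip-and-continue behaviour with graph ops would need conditional and error-handling constructs for every branch taken here.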
@@ -97,7 +96,7 @@ For example,
 1. [FeedInput](../modules/input_source.html#tensorpack.input_source.FeedInput):
 Come from a DataFlow and get fed to the graph (slow).
 2. [QueueInput](../modules/input_source.html#tensorpack.input_source.QueueInput):
-Come from a DataFlow and get prefetched on CPU by a TF queue.
+Come from a DataFlow and get buffered on CPU by a TF queue.
 3. [StagingInput](../modules/input_source.html#tensorpack.input_source.StagingInput):
 Come from some `InputSource`, then prefetched on GPU by a TF StagingArea.
 4. [TFDatasetInput](http://tensorpack.readthedocs.io/en/latest/modules/input_source.html#tensorpack.input_source.TFDatasetInput)
@@ -105,6 +104,10 @@ For example,
 5. [dataflow_to_dataset](http://tensorpack.readthedocs.io/en/latest/modules/input_source.html#tensorpack.input_source.TFDatasetInput.dataflow_to_dataset)
 Come from a DataFlow, and further processed by `tf.data.Dataset`.
 6. [TensorInput](../modules/input_source.html#tensorpack.input_source.TensorInput):
-Come from some tensors you wrote.
+Come from some tensors you define (can be reading ops, for example).
 7. [ZMQInput](http://tensorpack.readthedocs.io/en/latest/modules/input_source.html#tensorpack.input_source.ZMQInput)
-Come from some ZeroMQ pipe, where the load/preprocessing may happen on a different machine.
+Come from some ZeroMQ pipe, where the reading/preprocessing may happen in a different process or even a different machine.
+Typically, we recommend `QueueInput + StagingInput` as it's good for most use cases.
+If your data has to come from a separate process for whatever reasons, use `ZMQInput`.
+If you still like to use TF reading ops, define a `tf.data.Dataset` and use `TFDatasetInput`.
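The idea of buffering a DataFlow in a queue can be illustrated without TensorFlow. The sketch below is a pure-Python analogy and not how `QueueInput` is implemented: a background thread fills a bounded `queue.Queue` while the consumer drains it, so producing and consuming overlap. `buffered` and `_DONE` are illustrative names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def buffered(generator, size=8):
    """Yield items from `generator`, produced ahead of time by a background thread."""
    q = queue.Queue(maxsize=size)  # bounded, so the producer cannot run far ahead

    def producer():
        try:
            for item in generator:
                q.put(item)  # blocks while the buffer is full
        finally:
            q.put(_DONE)     # always signal the end, even if the source fails

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()       # blocks while the buffer is empty
        if item is _DONE:
            return
        yield item


result = list(buffered((i * i for i in range(5)), size=2))
```

The buffer size trades memory for tolerance to jitter in the producer. In this simplified version an exception in the source ends the stream silently, and a consumer that stops early leaves the daemon thread blocked until the process exits.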
@@ -135,7 +135,6 @@ def eval_on_ILSVRC12(model, sessinit, dataflow):
 class ImageNetModel(ModelDesc):
-    weight_decay = 1e-4
     image_shape = 224
     """
@@ -146,21 +145,34 @@ class ImageNetModel(ModelDesc):
     image_dtype = tf.uint8
     """
-    Whether to apply weight decay on BN parameters.
+    Either 'NCHW' or 'NHWC'
     """
-    weight_decay_on_bn = False
+    data_format = 'NCHW'
     """
-    Either 'NCHW' or 'NHWC'
+    Whether the image is BGR or RGB. If using DataFlow, then it should be BGR.
     """
-    data_format = 'NCHW'
+    image_bgr = True
+    weight_decay = 1e-4
+    """
+    To apply on normalization parameters, use '.*/W|.*/gamma|.*/beta'
+    """
+    weight_decay_pattern = '.*/W'
+    """
+    Scale the loss, for whatever reasons (e.g., gradient averaging, fp16 training, etc)
+    """
+    loss_scale = 1.
     def inputs(self):
         return [tf.placeholder(self.image_dtype, [None, self.image_shape, self.image_shape, 3], 'input'),
                 tf.placeholder(tf.int32, [None], 'label')]
     def build_graph(self, image, label):
-        image = ImageNetModel.image_preprocess(image, bgr=True)
+        image = ImageNetModel.image_preprocess(image, bgr=self.image_bgr)
+        assert self.data_format in ['NCHW', 'NHWC']
         if self.data_format == 'NCHW':
             image = tf.transpose(image, [0, 3, 1, 2])
@@ -168,17 +180,19 @@ class ImageNetModel(ModelDesc):
         loss = ImageNetModel.compute_loss_and_error(logits, label)
         if self.weight_decay > 0:
-            if self.weight_decay_on_bn:
-                pattern = '.*/W|.*/gamma|.*/beta'
-            else:
-                pattern = '.*/W'
-            wd_loss = regularize_cost(pattern, tf.contrib.layers.l2_regularizer(self.weight_decay),
+            wd_loss = regularize_cost(self.weight_decay_pattern,
+                                      tf.contrib.layers.l2_regularizer(self.weight_decay),
                                       name='l2_regularize_loss')
             add_moving_summary(loss, wd_loss)
             total_cost = tf.add_n([loss, wd_loss], name='cost')
         else:
             total_cost = tf.identity(loss, name='cost')
             add_moving_summary(total_cost)
+        if self.loss_scale != 1.:
+            logger.info("Scaling the total loss by {} ...".format(self.loss_scale))
+            return total_cost * self.loss_scale
+        else:
-        return total_cost
+            return total_cost
     @abstractmethod
......
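The change above replaces the boolean `weight_decay_on_bn` with a regular expression, `weight_decay_pattern`, over variable names. The sketch below only illustrates which names the two patterns from the diff would select. It assumes the pattern is applied with `re.search`-style matching, and the variable names are made up for the example:

```python
import re


def select_names(pattern, names):
    """Return the names the pattern matches somewhere (re.search semantics, assumed)."""
    return [n for n in names if re.search(pattern, n)]


# Hypothetical variable names in 'scope/variable' style.
names = ["conv0/W", "conv0/bn/gamma", "conv0/bn/beta", "linear/W", "linear/b"]

weights_only = select_names('.*/W', names)                  # the new default
with_norm = select_names('.*/W|.*/gamma|.*/beta', names)    # also decay normalization params
```

A pattern gives finer control than a flag: a subclass can target any subset of variables by overriding one class attribute.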
@@ -141,7 +141,7 @@ def shape4d(a, data_format='channels_last'):
 @memoized
-def log_once(message, func):
+def log_once(message, func='info'):
     """
     Log certain message only once. Call this function more than one times with
     the same message will result in no-op.
......
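The log-once behaviour comes from memoizing the function on its arguments. A standalone sketch of the same idea, assuming `memoized` behaves like an unbounded cache and using `functools.lru_cache` and the standard `logging` module in place of tensorpack's own decorator and logger:

```python
import functools
import logging

logger = logging.getLogger("log_once_sketch")


@functools.lru_cache(maxsize=None)
def log_once(message, func='info'):
    """Log `message` once; repeated calls with the same arguments are no-ops."""
    getattr(logger, func)(message)


class _ListHandler(logging.Handler):
    """Collects logged messages so the effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = _ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

for _ in range(3):
    log_once("dataset is small, consider caching")  # logged only the first time
log_once("deprecated option", 'warning')
```

With `lru_cache`, `log_once(msg)` and `log_once(msg, 'info')` are cached under different keys, so callers of this sketch should stick to one call style per message.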
@@ -114,7 +114,13 @@ Press any other key to exit. """)
             shutil.move(dirname, backup_name)
             info("Directory '{}' backuped to '{}'".format(dirname, backup_name))  # noqa: F821
         elif act == 'd':
-            shutil.rmtree(dirname)
+            try:
+                shutil.rmtree(dirname)
+            except OSError:
+                num_files = len([x for x in os.listdir(dirname) if x[0] != '.'])
+                if num_files > 0:
+                    raise
         elif act == 'n':
             dirname = dirname + _get_time_str()
             info("Use a new log directory {}".format(dirname))  # noqa: F821
......
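The new branch tolerates a failed `shutil.rmtree` as long as only hidden files remain in the directory. The diff does not say why; one plausible reason (an assumption) is that some filesystems leave placeholder dotfiles behind while a file is still open. A self-contained sketch of the same logic, with `remove_dir_tolerant` as an illustrative name:

```python
import os
import shutil
import tempfile


def remove_dir_tolerant(dirname):
    """Delete `dirname`; ignore a failure if only hidden files are left behind."""
    try:
        shutil.rmtree(dirname)
    except OSError:
        visible = [x for x in os.listdir(dirname) if x[0] != '.']
        if visible:
            raise  # real files survived, so the deletion genuinely failed


# Usage on a throwaway directory.
root = tempfile.mkdtemp()
logdir = os.path.join(root, "train_log")
os.makedirs(logdir)
with open(os.path.join(logdir, "log.log"), "w") as f:
    f.write("hello\n")

remove_dir_tolerant(logdir)
removed = not os.path.exists(logdir)
shutil.rmtree(root)  # clean up the temporary parent directory
```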
@@ -58,14 +58,14 @@ class RatioCounter(object):
         self._tot = 0
         self._cnt = 0
-    def feed(self, cnt, tot=1):
+    def feed(self, count, total=1):
         """
         Args:
             cnt(int): the count of some event of interest.
             tot(int): the total number of events.
         """
-        self._tot += tot
-        self._cnt += cnt
+        self._tot += total
+        self._cnt += count
     @property
     def ratio(self):
@@ -74,13 +74,21 @@ class RatioCounter(object):
         return self._cnt * 1.0 / self._tot
     @property
-    def count(self):
+    def total(self):
         """
         Returns:
             int: the total
         """
         return self._tot
+    @property
+    def count(self):
+        """
+        Returns:
+            int: the total
+        """
+        return self._cnt
 class Accuracy(RatioCounter):
     """ A RatioCounter with a fancy name """
......
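A condensed, self-contained sketch of the counter as it looks after this change, reconstructed from the two hunks above. The constructor and the zero-total guard in `ratio` are assumptions, since those lines are elided in the diff, and the docstrings here use the renamed arguments:

```python
class RatioCounter(object):
    """Accumulates `count` out of `total` and reports their ratio."""

    def __init__(self):
        # Assumed constructor: start both counters at zero.
        self._tot = 0
        self._cnt = 0

    def feed(self, count, total=1):
        """
        Args:
            count(int): the count of some event of interest.
            total(int): the total number of events.
        """
        self._tot += total
        self._cnt += count

    @property
    def ratio(self):
        if self._tot == 0:  # assumed guard against division by zero
            return 0
        return self._cnt * 1.0 / self._tot

    @property
    def total(self):
        """Returns the accumulated total."""
        return self._tot

    @property
    def count(self):
        """Returns the accumulated count."""
        return self._cnt


acc = RatioCounter()
acc.feed(3, total=4)  # 3 correct out of a batch of 4
acc.feed(1, total=4)  # 1 correct out of the next batch of 4
```

Note that the meaning of `count` changes in this commit: it used to return the total and now returns the event count, so existing callers that relied on the old property would need to switch to `total`.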