Commit ba293da8 authored by Yuxin Wu

update docs

parent 6a0d33d1
@@ -22,8 +22,9 @@ It's Yet Another TF high-level API, with __speed__, __readability__ and __flexibility__
   some benchmark scripts.
 2. Focus on __large datasets__.
-   + It's unnecessary to read/preprocess data with a new language called TF.
-     Tensorpack helps you load large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
+   + [You don't need `tf.data`](http://tensorpack.readthedocs.io/tutorial/input-source.html#tensorflow-reader-cons).
+     It's unnecessary and painful to process data with a new language called TF.
+     Tensorpack helps you efficiently load large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
 3. It's not a model wrapper.
   + There are too many symbolic function wrappers in the world. Tensorpack includes only a few common models.
...
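For readers who wonder what "pure Python with autoparallelization" looks like in practice, here is a minimal sketch using `tensorpack.dataflow`. The directory path, augmentor, batch size and process count below are illustrative assumptions, not values from this commit:

```python
# Minimal sketch of a pure-Python ImageNet pipeline with tensorpack DataFlow.
from tensorpack.dataflow import dataset, imgaug, AugmentImageComponent, BatchData, PrefetchDataZMQ

ds = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)   # yields [image, label] datapoints
ds = AugmentImageComponent(ds, [imgaug.Resize((224, 224))])         # preprocessing in plain Python / OpenCV
ds = PrefetchDataZMQ(ds, nr_proc=25)                                # parallelize the above over 25 processes
ds = BatchData(ds, 64)

ds.reset_state()
for images, labels in ds.get_data():   # newer versions also support plain `for dp in ds`
    pass                               # hand the batch to the trainer / InputSource
```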
@@ -29,8 +29,9 @@ Assuming you have 5GB/s `memcpy` bandwidth (roughly like this if you run single-
 down your training by 10%. Think about how many more copies are made during your preprocessing.
 Failure to hide the data preparation latency is the major reason why people
-cannot see good GPU utilization. __Always choose a framework that allows latency hiding.__
+cannot see good GPU utilization. You should __always choose a framework that enables latency hiding.__
 However most other TensorFlow wrappers are designed to be `feed_dict` based.
+Tensorpack has built-in mechanisms to hide latency of the above stages.
 This is the major reason why tensorpack is [faster](https://github.com/tensorpack/benchmarks).
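To make the bandwidth argument above concrete, here is a back-of-envelope calculation with assumed numbers (the batch size and step time are illustrative, not taken from the original tutorial):

```python
# Illustrative arithmetic only: one extra memcpy of a batch can already cost ~10% of a step.
batch_bytes = 256 * 224 * 224 * 3 * 4        # a float32 batch of 256 ImageNet-sized crops, ~154 MB
copy_seconds = batch_bytes / (5 * 1024**3)   # at 5 GB/s memcpy bandwidth, ~0.03 s per copy
step_seconds = 0.3                           # assumed duration of one training iteration
print(copy_seconds / step_seconds)           # ~0.1, i.e. roughly a 10% slowdown per extra copy
```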
## Python Reader or TF Reader ?
@@ -40,32 +41,25 @@ either Python code or TensorFlow operators, or a mix of two.
 Both are supported in tensorpack, while we recommend using Python.

 ### TensorFlow Reader: Pros
-* Faster read/preprocessing.
-  * Often true, but not necessarily. With Python you have access to many other fast libraries, which might be unsupported in TF.
-  * Python may be just fast enough.
+People often think they should use `tf.data` because it's fast.
+* Indeed it's often fast, but not necessarily. With Python you have access to many other fast libraries, which might be unsupported in TF.
+* Python may be just fast enough.
   As long as data preparation keeps up with training, and the latency of all four blocks in the
-  above figure is hidden, running faster brings no more gains to overall throughput.
+  above figure is hidden, __faster reader brings no gains to overall throughput__.
   For most types of problems, up to the scale of multi-GPU ImageNet training,
   Python can offer enough speed if you use a fast library (e.g. `tensorpack.dataflow`).
   See the [Efficient DataFlow](efficient-dataflow.html) tutorial on how to build a fast Python reader with DataFlow.
 * No "Copy to TF" (i.e. `feed_dict`) stage.
   * True. But as mentioned above, the latency can usually be hidden.
     In tensorpack, TF queues are usually used to hide the "Copy to TF" latency,
     and TF `StagingArea` can help hide the "Copy to GPU" latency.
     They are used by most examples in tensorpack.

 ### TensorFlow Reader: Cons
 The disadvantage of TF reader is obvious and it's huge: it's __too complicated__.
-Unlike running a mathematical model, reading data is a complicated and poorly-structured task.
-You need to handle different formats, handle corner cases, noisy data, combination of data,
-which require condition operations, loops, data structures, sometimes even exception handling. These operations
-are __naturally not suitable__ for a symbolic graph.
+Unlike running a mathematical model, data processing is a complicated and poorly-structured task.
+You need to handle different formats, handle corner cases, noisy data, combination of data.
+Doing these require condition operations, loops, data structures, sometimes even exception handling.
+These operations are __naturally not the right task for a symbolic graph__.

 Let's take a look at what users are asking for `tf.data`:
 * Different ways to [pad data](https://github.com/tensorflow/tensorflow/issues/13969), [shuffle data](https://github.com/tensorflow/tensorflow/issues/14518)
@@ -75,14 +69,14 @@ Let's take a look at what users are asking for `tf.data`:
 * [Sort/skip some data](https://github.com/tensorflow/tensorflow/issues/14250)
 * [Write data to files](https://github.com/tensorflow/tensorflow/issues/15014)

-To support all these features which could've been done with __3 lines of code in Python __, you need either a new TF
+To support all these features which could've been done with __3 lines of code in Python__, you need either a new TF
 API, or ask [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
 (i.e. Python again) to the rescue.

 It only makes sense to use TF to read data, if your data is originally very clean and well-formatted.
 If not, you may feel like writing a script to format your data, but then you're almost writing a Python loader already!

-Think about it: it's a waste of time to write a Python script to transform from raw data to TF-friendly format,
+Think about it: it's a waste of time to write a Python script to transform from some format to TF-friendly format,
 then a TF script to transform from this format to tensors.
 The intermediate format doesn't have to exist.
 You just need the right interface to connect Python to the graph directly, efficiently.
...
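The "right interface" mentioned above can be as thin as handing a Python generator to the graph. Below is a hedged sketch using the standard TF 1.x `Dataset.from_generator`; `python_reader` is a made-up stand-in for whatever Python loader (e.g. a DataFlow) you already have:

```python
# Sketch: connect an arbitrary Python reader to the TF graph without an intermediate format.
import tensorflow as tf

def python_reader():
    # arbitrary Python: loops, conditions, exception handling, any third-party library
    for i in range(1000):
        yield [i, i, i], i % 10   # (features, label)

ds = tf.data.Dataset.from_generator(
    python_reader,
    output_types=(tf.float32, tf.int64),
    output_shapes=(tf.TensorShape([3]), tf.TensorShape([])))
features, label = ds.make_one_shot_iterator().get_next()   # plain tensors for the rest of the graph
```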
@@ -315,7 +315,7 @@ def sample_fast_rcnn_targets(boxes, gt_boxes, gt_labels):
 @under_name_scope()
 def crop_and_resize(image, boxes, box_ind, crop_size, pad_border=True):
     """
-    Better-aligned version of tf.image.crop_and_resize, following our definition of floating point boxes.
+    Aligned version of tf.image.crop_and_resize, following our definition of floating point boxes.

     Args:
         image: NCHW
@@ -375,7 +375,7 @@ def crop_and_resize(image, boxes, box_ind, crop_size, pad_border=True):
     image_shape = tf.shape(image)[2:]
     boxes = transform_fpcoor_for_tf(boxes, image_shape, [crop_size, crop_size])
-    image = tf.transpose(image, [0, 2, 3, 1])   # 1hwc
+    image = tf.transpose(image, [0, 2, 3, 1])   # nhwc
     ret = tf.image.crop_and_resize(
         image, boxes, tf.to_int32(box_ind),
         crop_size=[crop_size, crop_size])
...
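For context on the `# nhwc` comment fixed above: `tf.image.crop_and_resize` only accepts NHWC input, so the NCHW feature map is transposed before the call and back afterwards. A stripped-down sketch of that pattern (it omits tensorpack's own box-coordinate transform, `transform_fpcoor_for_tf`):

```python
# Sketch of the NCHW <-> NHWC handling around tf.image.crop_and_resize (TF 1.x).
import tensorflow as tf

def crop_and_resize_nchw(image, boxes, box_ind, crop_size):
    """image: NCHW; boxes: normalized [y1, x1, y2, x2] per box; returns NCHW crops."""
    image = tf.transpose(image, [0, 2, 3, 1])        # NCHW -> NHWC, the only layout the op accepts
    crops = tf.image.crop_and_resize(
        image, boxes, tf.to_int32(box_ind),
        crop_size=[crop_size, crop_size])            # NHWC crops
    return tf.transpose(crops, [0, 3, 1, 2])         # back to NCHW
```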
@@ -9,7 +9,7 @@ The article [Towards Efficient Multi-GPU Training in Keras with TensorFlow](http
 has mentioned some of it.
 Even on a single GPU, tensorpack can run [1.2~2x faster](https://github.com/tensorpack/benchmarks/tree/master/other-wrappers)
-than the equivalent Keras code. The gap becomes larger when you scale.
+than the equivalent Keras code. The gap becomes larger when you scale to multiple GPUs.
 Tensorpack and [horovod](https://github.com/uber/horovod/blob/master/examples/keras_imagenet_resnet50.py)
 are the only two tools I know that can scale the training of a large Keras model.

@@ -28,15 +28,15 @@ It has:
 + ResNet-50 model modified from [keras.applications](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/_impl/keras/applications/resnet50.py).
   (We put stride on 3x3 conv in each bottleneck, which is different from certain other implementations).
 + Multi-GPU data-parallel __training and validation__ which scales
-  + Finished 100 epochs in 19.5 hours on 8 V100s, with >90% GPU utilization.
+  + Finished 100 epochs in 19 hours on 8 V100s, with >90% GPU utilization.
   + Still slightly slower than native tensorpack examples.
 + Good accuracy (same as [tensorpack ResNet example](../ResNet))

 ### Note:
-Keras support is __not official__. Keras does not use variable scopes or variable
-collections, which contradicts with tensorpack trainers.
-Therefore, not all Keras layers are supported in tensorpack.
+Keras support is __not official__. Keras does not respect variable scopes or variable
+collections, which contradicts with TensorFlow conventions and tensorpack trainers.
+Therefore, the support in tensorpack is experimental.
 These simple examples can run within tensorpack smoothly, but note that a future version
-of Keras may still break them (unlikely, though).
+of Keras may break them (unlikely, though).
@@ -211,4 +211,4 @@ class PeakMemoryTracker(Callback):
         results = rv.results
         if results is not None:
             for mem, dev in zip(results, self._devices):
-                self.trainer.monitors.put_scalar('PeakMemory(MB) ' + dev, mem / 1e6)
+                self.trainer.monitors.put_scalar('PeakMemory(MB)' + dev, mem / 1e6)
@@ -75,6 +75,7 @@ class KerasModelCaller(object):
                 "This was automatically corrected by tensorpack.".format(n))

         # Keras models might not use this collection at all (in some versions).
+        # This is a BC-breaking change of tf.keras: https://github.com/tensorflow/tensorflow/issues/19643
         restore_collection(update_ops_backup)
         for op in model.updates:
             tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, op)
...
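The comment added in this hunk refers to a behavior change in tf.keras (tensorflow#19643): layers such as BatchNorm may no longer register their update ops in `tf.GraphKeys.UPDATE_OPS`, so graph-level training code that reads this collection silently misses them. The snippet below shows the same pattern in isolation, with an arbitrary Keras model standing in for the user's model:

```python
# Generic illustration (not the tensorpack source): re-register Keras update ops
# into the collection that graph-based training loops usually read.
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None)     # any model containing BatchNorm layers

for op in model.updates:                                  # update ops tracked by Keras itself
    tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, op)     # make them visible through the collection

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)   # now includes the moving-average updates
```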