Commit b2ff230d authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent 09bac481
@@ -63,12 +63,12 @@ Both are supported in tensorpack, while we recommend using Python.
### TensorFlow Reader: Cons
The disadvantage of the TF reader is obvious, and it's huge: it's __too complicated__.
Unlike running a mathematical model, reading data is a complicated and badly-structured task.
You need to handle different data formats and corner cases in noisy data,
all of which require conditional operations, loops, and sometimes even exception handling. These operations
are __naturally not suitable__ for a symbolic graph.
Let's take a look at what users are asking `tf.data` for:
* Different ways to [pad data](https://github.com/tensorflow/tensorflow/issues/13969), [shuffle data](https://github.com/tensorflow/tensorflow/issues/14518)
* [Handle none values in data](https://github.com/tensorflow/tensorflow/issues/13865)
* [Handle dataset that's not a multiple of batch size](https://github.com/tensorflow/tensorflow/issues/13745)
@@ -76,16 +76,16 @@ Let's take a look at what users are asking for:
* [Sort/skip some data](https://github.com/tensorflow/tensorflow/issues/14250)
* [Write data to files](https://github.com/tensorflow/tensorflow/issues/15014)
To support all these features which could've been done with __3 lines of code in Python__, you need either a new TF
API, or ask [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
(i.e. Python again) to the rescue.
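For instance, skipping broken datapoints and padding to a fixed length, two of the requests above, really do take only a few lines of plain Python. A minimal sketch (`clean_and_pad` is an illustrative name, not a tensorpack or TF API):

```python
# A sketch of the kind of per-item logic that is awkward in a symbolic
# graph but trivial in plain Python: skip missing datapoints, then pad
# (or truncate) each sequence to a fixed length.

def clean_and_pad(samples, maxlen, pad_value=0):
    """Drop None values and pad/truncate each sequence to `maxlen`."""
    for seq in samples:
        if seq is None:          # handle missing data: just skip it
            continue
        yield list(seq[:maxlen]) + [pad_value] * (maxlen - len(seq))

print(list(clean_and_pad([[1, 2], None, [3, 4, 5, 6]], maxlen=3)))
# → [[1, 2, 0], [3, 4, 5]]
```

Expressing the same logic with graph ops requires conditionals and shape gymnastics; in Python it is one loop.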
It only makes sense to use TF to read data if your data is originally very clean and well-formatted.
If not, you may feel like writing a script to format your data, but then you're almost writing a Python loader already!
Think about it: it's a waste of time to write a Python script to transform from raw data to a TF-friendly format,
then a TF script to transform from this format to tensors.
The intermediate format doesn't have to exist.
You just need the right interface to connect Python to the graph directly, efficiently.
`tensorpack.InputSource` is such an interface.
@@ -95,13 +95,16 @@ You just need the right interface to connect Python to the graph directly, efficiently.
For example,
1. [FeedInput](../modules/input_source.html#tensorpack.input_source.FeedInput):
Come from a DataFlow and get fed to the graph (slow).
2. [QueueInput](../modules/input_source.html#tensorpack.input_source.QueueInput):
Come from a DataFlow and get prefetched on CPU by a TF queue.
3. [StagingInput](../modules/input_source.html#tensorpack.input_source.StagingInput):
Come from some `InputSource`, then prefetched on GPU by a TF StagingArea.
4. [TFDatasetInput](http://tensorpack.readthedocs.io/en/latest/modules/input_source.html#tensorpack.input_source.TFDatasetInput):
Come from a `tf.data.Dataset`.
5. [dataflow_to_dataset](http://tensorpack.readthedocs.io/en/latest/modules/input_source.html#tensorpack.input_source.TFDatasetInput.dataflow_to_dataset):
Come from a DataFlow, and further processed by `tf.data.Dataset`.
6. [TensorInput](../modules/input_source.html#tensorpack.input_source.TensorInput):
Come from some tensors you wrote.
7. [ZMQInput](http://tensorpack.readthedocs.io/en/latest/modules/input_source.html#tensorpack.input_source.ZMQInput):
Come from some ZeroMQ pipe, where the load/preprocessing may happen on a different machine.
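The prefetching idea behind `QueueInput` can be illustrated in plain Python: a background thread pulls datapoints from a generator into a bounded queue so the consumer rarely waits on data loading. This is only a conceptual sketch (no tensorpack or TF involved; `prefetch` is a made-up helper, not library API):

```python
# Sketch of queue-based prefetching: a worker thread fills a bounded
# queue while the consumer (think: the training step) drains it.
import queue
import threading

def prefetch(generator, capacity=8):
    q = queue.Queue(maxsize=capacity)
    _END = object()                      # sentinel marking exhaustion

    def worker():
        for datapoint in generator:
            q.put(datapoint)             # blocks when the queue is full
        q.put(_END)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _END:
            return
        yield item

print(sum(prefetch(iter(range(5)))))     # consumes 0..4 through the queue
```

The real `QueueInput` does the same thing with a TF queue, so the dequeue becomes part of the graph.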
@@ -170,14 +170,16 @@ class DataParallelInferenceRunner(InferenceRunnerBase):
"""
Inference with data-parallel support on multiple GPUs.
It will build one predict tower on each GPU, and run prediction
with a large total batch.
"""
def __init__(self, input, infs, gpus):
"""
Args:
input (DataFlow or QueueInput)
gpus (int or list[int]): number of GPUs, or a list of GPU ids
"""
if isinstance(gpus, int):
gpus = list(range(gpus))
self._tower_names = ['InferenceTower{}'.format(k) for k in range(len(gpus))]
if isinstance(input, DataFlow):
input = QueueInput(input)
......
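The data-parallel idea above is simple: one large total batch is sharded across the predict towers, one shard per GPU. A minimal sketch of the sharding step (assuming nothing about tensorpack's internals; `split_batch` is an illustrative name):

```python
# Sketch of data-parallel sharding: split one large total batch into
# nearly-equal per-GPU shards, keyed by GPU id.
def split_batch(batch, gpus):
    n = len(gpus)
    per = (len(batch) + n - 1) // n          # ceil division
    return {g: batch[i * per:(i + 1) * per] for i, g in enumerate(gpus)}

shards = split_batch(list(range(8)), gpus=[0, 1, 2])
print(shards)                                # GPU 2 gets the short tail shard
```

Each shard is then fed to the predict tower built on the corresponding GPU.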
......@@ -217,9 +217,9 @@ class ScheduledHyperParamSetter(HyperParamSetter):
param: same as in :class:`HyperParamSetter`.
schedule (list): with the format ``[(epoch1, val1), (epoch2, val2), (epoch3, val3)]``.
Each ``(ep, val)`` pair means to set the param
to "val" **after** the completion of epoch `ep`.
If ep == 0, the value will be set before the first epoch
(because by default the first is epoch 1).
interp: None: no interpolation. 'linear': linear interpolation
Example:
......
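The documented schedule semantics can be sketched as a plain function (this is an illustration of the docstring above, not tensorpack's implementation; `value_at` is a made-up name):

```python
# Sketch of the documented semantics: after finishing epoch `ep` the param
# takes `val`; with interp='linear', values between listed epochs are
# linearly interpolated.
def value_at(schedule, epoch, interp=None):
    schedule = sorted(schedule)
    value = None
    for (ep, val), nxt in zip(schedule, schedule[1:] + [None]):
        if epoch >= ep:
            if interp == 'linear' and nxt is not None and epoch < nxt[0]:
                frac = (epoch - ep) / (nxt[0] - ep)
                return val + frac * (nxt[1] - val)
            value = val
    return value

sched = [(0, 0.1), (10, 0.01), (20, 0.001)]
print(value_at(sched, 5))                    # step schedule: still 0.1
print(value_at(sched, 5, interp='linear'))   # halfway between 0.1 and 0.01
```

With `interp=None` the param changes in steps at the listed epochs; with `'linear'` it ramps smoothly between them.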