Commit af61ebbc authored by Yuxin Wu

update docs

parent fbbd435a
@@ -44,7 +44,9 @@ It's Yet Another TF wrapper, but different in:
    + Data-parallel distributed training is off-the-shelf to use. It is as slow as Google's official benchmark.
 3. Focus on __large datasets__.
-   + It's painful to read/preprocess data through TF. tensorpack helps you load large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
+   + It's painful to read/preprocess data through TF.
+     tensorpack helps you load large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
+     It also naturally works with TF Queues or tf.data.
 4. Interface of extensible __Callbacks__.
    Write a callback to implement everything you want to do apart from the training iterations, and
......
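For the "large datasets" point in the hunk above, such a pure-Python pipeline might look like the sketch below, assuming the `dataset.ILSVRC12` loader bundled with tensorpack; the path and process count are placeholders:

```python
from tensorpack.dataflow import dataset, BatchData, PrefetchDataZMQ

# Read ImageNet filenames/labels and decode JPEGs in pure Python
df = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
# Autoparallelize: run the pipeline in 25 worker processes, connected over ZMQ
df = PrefetchDataZMQ(df, 25)
# Stack datapoints into batches of 256
df = BatchData(df, 256)
```

The resulting `df` can then be fed to TF Queues or `tf.data`, as the hunk notes.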
@@ -50,7 +50,7 @@ the rest of the data pipeline.
 If you're using DataFlow with tensorpack, also see [Input Pipeline tutorial](input-source.html)
 on how tensorpack further accelerates data loading in the graph.
-Nevertheless, tensorpack support data loading with native TF operators as well.
+Nevertheless, tensorpack supports data loading with native TF operators / TF datasets as well.

 ### Use DataFlow outside Tensorpack
 DataFlow is __independent__ of both tensorpack and TensorFlow.
......
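Because DataFlow is independent of both, using it outside tensorpack is plain Python iteration. A minimal sketch, assuming the `reset_state()`/`get_data()` interface of this version and the bundled MNIST loader:

```python
from tensorpack.dataflow import dataset

df = dataset.Mnist('train')  # any DataFlow; nothing here touches TensorFlow
df.reset_state()             # initialize state (e.g. RNG) once before iterating
for dp in df.get_data():     # dp is a list of components, here [image, label]
    img, label = dp
```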
@@ -72,8 +72,9 @@ For example,
 1. Come from a DataFlow and be fed to the graph.
 2. Come from a DataFlow and be prefetched on CPU by a TF queue.
 3. Come from a DataFlow, prefetched on CPU by a TF queue, then prefetched on GPU by a TF StagingArea.
-4. Come from some TF native reading pipeline.
-5. Come from some ZMQ pipe, where the load/preprocessing may happen on a different machine.
+4. Come from a DataFlow, and be further processed by `tf.data.Dataset`.
+5. Come from some TF native reading pipeline.
+6. Come from some ZMQ pipe, where the load/preprocessing may happen on a different machine.

 When you set `TrainConfig(dataflow=)`, tensorpack trainers automatically add proper prefetching for you.
 You can also use the `TrainConfig(data=)` option to use a customized `InputSource`.
......
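A sketch of those two options, where `df` stands in for a real DataFlow and `model` for a `ModelDesc` (both placeholders):

```python
from tensorpack import TrainConfig, QueueInput

# Option 1: hand the trainer a DataFlow; prefetching is added automatically.
config = TrainConfig(dataflow=df, model=model)

# Option 2: choose the InputSource yourself, e.g. prefetch through a TF queue.
# (QueueInput may need to be imported from tensorpack.train depending on version.)
config = TrainConfig(data=QueueInput(df), model=model)
```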
@@ -60,10 +60,10 @@ class TestDataSpeed(ProxyDataFlow):
 class BatchData(ProxyDataFlow):
     """
-    Concat datapoints into batches.
+    Stack datapoints into batches.
     It produces datapoints of the same number of components as ``ds``, but
     each component has one new extra dimension of size ``batch_size``.
-    A batch can be either a list of original components, or (by default)
+    The batch can be either a list of original components, or (by default)
     a numpy array of original components.
     """
@@ -71,15 +71,14 @@ class BatchData(ProxyDataFlow):
         """
         Args:
             ds (DataFlow): When ``use_list=False``, the components of ``ds``
-                must be either scalars or :class:`np.ndarray`, and
-                components has to have consistent shape across ``ds``.
+                must be either scalars or :class:`np.ndarray`, and have to be consistent in shapes.
             batch_size(int): batch size
             remainder (bool): When the remaining datapoints in ``ds`` are not
                 enough to form a batch, whether or not to also produce the remaining
                 data as a smaller batch.
-                If set to False, all generated datapoints are guranteed to have the same batch size.
+                If set to False, all produced datapoints are guaranteed to have the same batch size.
             use_list (bool): if True, each component will contain a list
-                of datapoints instead of an numpy array of datapoints. This also avoids an extra copy.
+                of datapoints instead of a numpy array with an extra dimension.
         """
         super(BatchData, self).__init__(ds)
         if not remainder:
......
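As a concrete illustration of the docstring above, a usage sketch with MNIST (all images are 28x28, so component shapes are consistent as required):

```python
from tensorpack.dataflow import dataset, BatchData

df = dataset.Mnist('train')              # yields [image, label] datapoints
df = BatchData(df, 128, remainder=True)  # also emit the final, smaller batch
# Each datapoint is now [images, labels]: numpy arrays with a new leading
# batch axis of size 128 (the last batch may be smaller, since remainder=True).
```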