Commit 59801ff8 authored by Yuxin Wu

docs update

parent ac2fa1bc
...@@ -41,7 +41,7 @@ It's Yet Another TF wrapper, but different in:
+ Data-parallel multi-GPU training is available off-the-shelf. It is as fast as Google's [benchmark code](https://github.com/tensorflow/benchmarks).
3. Focus on large datasets.
+ It's painful to read/preprocess data from TF. Use __DataFlow__ to process large datasets such as ImageNet in pure Python.
+ DataFlows have a unified interface, so you can compose and reuse them to perform complex preprocessing.
4. An interface of extensible __Callbacks__.
...
...@@ -14,13 +14,13 @@ a numpy array of shape (64, 28, 28), and an array of shape (64,).
### Composition of DataFlow
One good thing about having a standard interface is that it enables
the greatest code reusability.
There are a lot of existing modules in tensorpack which you can use to compose
a complex DataFlow with a long pre-processing pipeline. A whole pipeline usually
would __read from disk (or other sources), apply augmentations, group into batches,
prefetch data__, etc. A simple example is as follows:
```python
# a DataFlow you implement to produce [tensor1, tensor2, ..] lists from whatever sources:
df = MyDataFlow(shuffle=True)
# resize the image component of each datapoint
df = AugmentImageComponent(df, [imgaug.Resize((225, 225))])
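# -- illustrative continuation of the pipeline (a sketch; BatchData and
# -- PrefetchDataZMQ are assumed to be the standard tensorpack dataflow modules) --
# group the datapoints into batches of 128
df = BatchData(df, 128)
# run the whole pipeline in 3 separate processes and prefetch the produced batches
df = PrefetchDataZMQ(df, 3)
```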
...@@ -35,8 +35,6 @@ with all the data preprocessing.
All these modules are written in Python,
so you can easily implement whatever operations/transformations you need,
without worrying about adding operators to TensorFlow.
In the meantime, thanks to the prefetching, it can still run fast enough for
tasks as large as ImageNet training.
Unless you are working with standard data types (image folders, LMDB, etc.),
you would usually want to write your own DataFlow.
...@@ -70,3 +68,4 @@ training: we only need data to be __fast enough__.
DataFlow is fast enough for problems up to the scale of multi-GPU ImageNet training.
See the [efficient dataflow tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient-dataflow.html)
for details.
Therefore, for most use cases, writing format conversion/preprocessing code with TensorFlow operators doesn't help you at all.
...@@ -11,7 +11,7 @@ If you have such a mapping function `f` already, you can simply use `imgaug.MapI
augmentor, or use `MapDataComponent(df, f, index)` as the DataFlow.
In other words, for simple mapping you do not need to write an augmentor.
An augmentor may do something more than applying a mapping. The interface you will need to implement
is:
```python
...@@ -28,8 +28,8 @@ It does the following extra things for you:
1. `self.rng` is a `np.random.RandomState` object,
guaranteed to have different seeds when you use multiprocess prefetch.
In multiprocess settings, you have to use it to generate random numbers.
2. Random parameter generation and the actual augmentation are separated. This allows you to apply the
same transformation to several images together (with `AugmentImageComponents`),
which is essential to tasks such as segmentation (a sketch follows below).
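To make this concrete, here is a minimal sketch of a custom augmentor. It assumes the interface elided above consists of `_get_augment_params(img)` (draw the random parameters with `self.rng`) and `_augment(img, params)` (apply them); the brightness augmentor itself is a made-up example:
```python
import numpy as np
from tensorpack.dataflow import imgaug

class MyBrightness(imgaug.ImageAugmentor):
    def __init__(self, delta):
        super(MyBrightness, self).__init__()
        self._init(locals())    # remember the constructor arguments as attributes

    def _get_augment_params(self, img):
        # draw parameters with self.rng, NOT np.random, so every prefetch
        # process gets a different seed
        return self.rng.uniform(-self.delta, self.delta)

    def _augment(self, img, delta):
        # deterministic, given the parameters drawn above
        return np.clip(img.astype('float32') + delta, 0, 255).astype(img.dtype)
```
It could then be applied with `AugmentImageComponent(df, [MyBrightness(20)])`, or to several images together with `AugmentImageComponents`.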
...@@ -2,12 +2,8 @@
### Write a DataFlow
There are several existing DataFlows, e.g. `ImageFromFile`, `DataFromList`, which you can
use if your data format is simple.
However, in general, you will probably need to write a new DataFlow to produce data for your task.
Usually, you just need to implement the `get_data()` method which yields a datapoint every time.
```python
...@@ -21,12 +17,17 @@ class MyDataFlow(DataFlow):
Optionally, a DataFlow can implement the following two methods:
+ `size()`. Return the number of elements the generator can produce. Certain tensorpack features might require this.
+ `reset_state()`. It is guaranteed that the actual process which runs a DataFlow will invoke this method before using it.
So if this DataFlow needs to do something after a `fork()`, you should put it here.
A typical situation is when your DataFlow uses a random number generator (RNG); you would then need to reset the RNG here.
Otherwise, child processes will have the same random seed. The `RNGDataFlow` base class does this for you (see the sketch below).
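Putting the pieces together, here is a minimal, hypothetical DataFlow built on `RNGDataFlow`; the in-memory data source and shapes are made up:
```python
import numpy as np
from tensorpack.dataflow import RNGDataFlow

class MyDataFlow(RNGDataFlow):
    def __init__(self, shuffle=True):
        self.shuffle = shuffle
        self.data = np.random.rand(1000, 28, 28)   # pretend this was loaded from disk

    def size(self):
        return len(self.data)

    def get_data(self):
        idxs = np.arange(self.size())
        if self.shuffle:
            # self.rng is (re-)seeded by RNGDataFlow.reset_state() after fork()
            self.rng.shuffle(idxs)
        for i in idxs:
            yield [self.data[i]]   # a datapoint is a list of components
```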
With a "low-level" DataFlow defined, you can then compose it with existing modules (e.g. batching, prefetching, ...).
DataFlow implementations for several well-known datasets are provided in the
[dataflow.dataset](http://tensorpack.readthedocs.io/en/latest/modules/tensorpack.dataflow.dataset.html)
module, you can take them as a reference.
With a "low-level" DataFlow defined, you can then compose it with existing modules.
...@@ -2,9 +2,9 @@
## Implement a layer
Symbolic functions should be nothing new to you.
Using symbolic functions in tensorpack is the same as in TensorFlow: you can use any symbolic functions you have
made or seen elsewhere together with tensorpack layers.
You can use symbolic functions from slim/tflearn/tensorlayer, and even Keras/Sonnet ([with some tricks](../../examples/mnist-keras.py)).
So you never **have to** implement a tensorpack layer.
If you would like, you can make a symbolic function become a "layer" by following some simple rules, and then gain benefits from the framework.
...@@ -19,7 +19,7 @@ def Conv2D(x, out_channel, kernel_shape,
nl=tf.nn.relu, split=1, use_bias=True):
```
Basically, a tensorpack layer is just a symbolic function, but with the following rules:
+ It is decorated by `@layer_register`.
+ The first argument is its "input". It must be a **tensor or a list of tensors**.
...@@ -31,7 +31,7 @@ By making a symbolic function a "layer", the following things will happen:
Everything happening in this function will be under the variable scope 'conv0'.
You can register the layer with `use_scope=False` to disable this feature.
+ Static shapes of input/output will be printed to screen.
+ `argscope` will work for all its arguments except the input tensor(s).
+ It will work with `LinearWrap`: you can use it if the output of one layer matches the input of the next layer.
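As an illustration, here is a hypothetical layer written against these rules (the import path of `layer_register` is assumed; check the layout of your tensorpack version):
```python
import tensorflow as tf
from tensorpack.models import layer_register

@layer_register()
def Scale(x, gamma_init=1.0):
    """A made-up 'layer': multiply the input by a single trainable scalar."""
    gamma = tf.get_variable('gamma', [], initializer=tf.constant_initializer(gamma_init))
    return x * gamma

# Because it is registered, the first positional argument at call time is the
# scope name, and the decorated function works with argscope / LinearWrap:
# out = Scale('scale0', some_tensor, gamma_init=0.5)
```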
There are also some (non-layer) symbolic functions in the `tfutils.symbolic_functions` module.
...
...@@ -5,13 +5,14 @@
The library tries to __support__ everything, but it cannot really __include__ everything.
The interface tries to be flexible enough so you can put any XYZ on it.
You can either implement it under the interface or simply wrap some existing Python code.
See [Extend Tensorpack](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack)
for more details.
If you think:
1. The framework has a limitation in its interface so your XYZ cannot be supported, OR
2. Your XYZ is very common or very well-defined, so it would be nice to include it.
Then it is a good time to open an issue.
...@@ -34,8 +35,8 @@ The script expects a metagraph file which is also saved by `ModelSaver`.
All model loading (in either training or testing) is through the `session_init` option
in `TrainConfig` or `PredictConfig`.
It accepts a `SessionInit` instance, where the common options are `SaverRestore`, which restores
a TF checkpoint, or `DictRestore`, which restores a dict. (`get_model_loader` is a small helper to
decide which one to use from a file name.)
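For example (a minimal sketch; the import paths, checkpoint path, file name and input/output names below are assumed, and `model` is taken to be defined elsewhere):
```python
from tensorpack import PredictConfig, SaverRestore, get_model_loader

# restore a TF checkpoint explicitly ...
config = PredictConfig(
    model=model,
    session_init=SaverRestore('train_log/run1/model-10000'),
    input_names=['input'],
    output_names=['prob'],
)

# ... or let the helper choose between SaverRestore and DictRestore from the file name
config.session_init = get_model_loader('converted_params.npy')
```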
Doing transfer learning is straightforward. Variable restoring is completely based on name match between
the current graph and the `SessionInit` initializer.
...
...@@ -2,7 +2,7 @@
# Input Sources
This tutorial covers how data goes from DataFlow or other sources to the TensorFlow graph.
You don't have to read it, because these are details hidden under the tensorpack interface, but knowing them can help you understand the efficiency.
`InputSource` is an abstract interface in tensorpack describing where the inputs come from and how they enter the graph.
For example,
...@@ -18,7 +18,7 @@ to customize your `InputSource`.
## Use Prefetch
In general, `feed_dict` is slow and should never appear in training loops.
i.e., when you use TensorFlow without any wrappers, you should avoid loops like this:
```python
while True:
...@@ -26,9 +26,9 @@ while True:
minimize_op.run(feed_dict={'X': X, 'y': y})
```
However, when you need to load data from the Python side, this is the only available interface in frameworks such as Keras or tflearn.
This is part of the reason why [tensorpack is faster](https://gist.github.com/ppwwyyxx/8d95da79f8d97036a7d67c2416c851b6) than examples from other frameworks.
You could use something like this instead, to prefetch data into the graph in one thread and hide the copy latency:
```python
# Thread 1:
while True:
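    # -- illustrative continuation (a sketch; the loader and enqueue op names are made up) --
    X, y = get_some_data()                         # hypothetical Python-side loader
    enqueue_op.run(feed_dict={'X': X, 'y': y})     # push the data into a TF queue

# Thread 2 (the training loop): the graph dequeues its own input,
# so no feed_dict is needed and the copy latency is hidden
while True:
    minimize_op.run()
```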
...
...@@ -18,7 +18,7 @@ class MyModel(ModelDesc):
Basically, `_get_inputs` should define the metainfo of all the possible placeholders your graph may need.
`_build_graph` should add tensors/operations to the graph, where
the argument `inputs` is the list of input tensors matching `_get_inputs`.
You can use any symbolic functions in `_build_graph`, including TensorFlow core library
functions and other symbolic libraries (see below).
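A minimal sketch of such a `ModelDesc` (the input names, shapes and the loss are made up, and `InputDesc` is assumed to be the metainfo class returned by `_get_inputs`):
```python
import tensorflow as tf
from tensorpack import ModelDesc, InputDesc

class MyModel(ModelDesc):
    def _get_inputs(self):
        # dtype, shape and name of every placeholder the graph may need
        return [InputDesc(tf.float32, (None, 28, 28), 'image'),
                InputDesc(tf.int32, (None,), 'label')]

    def _build_graph(self, inputs):
        image, label = inputs    # tensors, in the same order as _get_inputs
        logits = tf.layers.dense(tf.reshape(image, [-1, 28 * 28]), 10)
        self.cost = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits),
            name='cost')
```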
...
...@@ -9,6 +9,7 @@ import glob
from .base import Callback
from ..utils import logger
from ..tfutils.common import get_tf_version_number
__all__ = ['ModelSaver', 'MinSaver', 'MaxSaver']
...@@ -43,11 +44,19 @@ class ModelSaver(Callback):
for key in self.var_collections:
    vars.extend(tf.get_collection(key))
self.path = os.path.join(self.checkpoint_dir, 'model')
if get_tf_version_number() <= 1.1:
    self.saver = tf.train.Saver(
        var_list=vars,
        max_to_keep=self.keep_recent,
        keep_checkpoint_every_n_hours=self.keep_freq,
        write_version=tf.train.SaverDef.V2)
else:
    self.saver = tf.train.Saver(
        var_list=vars,
        max_to_keep=self.keep_recent,
        keep_checkpoint_every_n_hours=self.keep_freq,
        write_version=tf.train.SaverDef.V2,
        save_relative_paths=True)
self.meta_graph_written = False

def _trigger(self):
...
...@@ -15,6 +15,7 @@ __all__ = ['get_default_sess_config',
'get_op_tensor_name',
'get_tensors_by_names',
'get_op_or_tensor_by_name',
'get_tf_version_number',
]
...@@ -134,3 +135,10 @@ def get_op_or_tensor_by_name(name):
    return f(name)
else:
    return list(map(f, name))

def get_tf_version_number():
    """
    Return a float (for comparison), indicating tensorflow version.
    """
    return float('.'.join(tf.VERSION.split('.')[:2]))
...@@ -12,6 +12,7 @@ from six.moves import zip, range
from ..utils import logger
from ..utils.naming import TOWER_FREEZE_KEYS
from ..utils.concurrency import LoopThread
from ..tfutils.common import get_tf_version_number
from ..tfutils.tower import TowerContext
from ..tfutils.collection import backup_collection, restore_collection
from ..tfutils.gradproc import FilterNoneGrad, ScaleGradient
...@@ -28,8 +29,8 @@ __all__ = ['MultiGPUTrainerBase', 'SyncMultiGPUTrainer',
def _check_tf_version():
    assert get_tf_version_number() >= 1.1, \
        "TF version {} is too old to run multi GPU training!".format(tf.VERSION)

def apply_prefetch_policy(config, use_stage=True):
...