Commit 9185744d authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent 910cfaec
Potential Bugs/Feature Requests/Usage Questions Only:
Any unexpected problems: PLEASE always include
1. What you did:
+ If you're using examples:
+ What's the command you run:
+ Have you made any changes to code? Post them if any:
+ If not, describe what you did that may be relevant.
But we may not be able to resolve it since there is no reproducible code.
2. What you observed, e.g. as many logs as possible.
@@ -13,10 +13,17 @@ Any unexpected problems: PLEASE always include
5. About efficiency, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html
Feature Requests:
+ Improve an existing feature, or add a new feature.
+ You can implement a lot of features by extending tensorpack
(See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
It does not have to be added to tensorpack unless you have a good reason.
+ We don't take example requests.
Usage Questions:
+ Read the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials) first.
+ We answer "HOW to do X in tensorpack" for a specific well-defined X.
We don't answer general machine learning questions,
such as "how to improve my model" or "I don't understand the paper".
You can also use gitter (https://gitter.im/tensorpack/users) for more casual discussions.
@@ -12,7 +12,7 @@ Tensorpack is a training interface based on TensorFlow.
It's Yet Another TF wrapper, but different in:
1. Focus on __training speed__.
+ Speed comes for free with tensorpack -- it uses TensorFlow in the __efficient way__ with no extra overhead.
On various CNNs, it runs 1.5~1.7x faster than the equivalent Keras code.
+ Data-parallel multi-GPU training is off-the-shelf to use. It runs as fast as Google's [official benchmark](https://www.tensorflow.org/performance/benchmarks).
@@ -68,3 +68,15 @@ Dependencies:
pip install -U git+https://github.com/ppwwyyxx/tensorpack.git
# or add `--user` to avoid system-wide installation.
```
## Citing Tensorpack:
If you use Tensorpack in your research or wish to refer to the examples, please cite with:
```
@misc{wu2016tensorpack,
title={Tensorpack},
author={Wu, Yuxin and others},
howpublished={\url{https://github.com/tensorpack/}},
year={2016}
}
```
@@ -10,7 +10,7 @@ It's Yet Another TF wrapper, but different in:
- Focus on **training speed**.
- Speed comes for free with tensorpack -- it uses TensorFlow in the
**efficient way** with no extra overhead. On various CNNs, it runs 1.5~1.7x faster than the equivalent Keras code.
- Data-parallel multi-GPU training is off-the-shelf to use. It is as fast as Google's
`official benchmark <https://www.tensorflow.org/performance/benchmarks>`_.
...
@@ -34,7 +34,7 @@ Then it is a good time to open an issue.
1. Learn `tf.stop_gradient`. You can simply use `tf.stop_gradient` in your model code in many situations (e.g. to freeze the first several layers; see the sketch after this list).
2. [varreplace.freeze_variables](../modules/tfutils.html#tensorpack.tfutils.varreplace.freeze_variables) returns a context where variables are frozen.
It is implemented by the `custom_getter` argument of `tf.variable_scope` -- learn it to gain more control over what & how variables are frozen.
3. [ScaleGradient](../modules/tfutils.html#tensorpack.tfutils.gradproc.ScaleGradient) can be used to set the gradients of some variables to 0.
But it may be slow, since variables still have gradients.
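A minimal sketch of option 1, using plain `tf.stop_gradient` inside the model code (the layer names and shapes below are made up for illustration):

```python
import tensorflow as tf

def build_graph(image):
    # Freeze conv0: nothing before `tf.stop_gradient` receives a gradient,
    # so its variables are never updated.
    x = tf.layers.conv2d(image, 32, 3, activation=tf.nn.relu, name='conv0')
    x = tf.stop_gradient(x)
    x = tf.layers.conv2d(x, 64, 3, activation=tf.nn.relu, name='conv1')
    return tf.layers.dense(tf.layers.flatten(x), 10, name='fc')
```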
......
...@@ -8,9 +8,11 @@ in TensorFlow checkpoint format. ...@@ -8,9 +8,11 @@ in TensorFlow checkpoint format.
One checkpoint typically includes a `.data-xxxxx` file and a `.index` file. One checkpoint typically includes a `.data-xxxxx` file and a `.index` file.
Both are necessary. Both are necessary.
`tf.train.NewCheckpointReader` is the best tool to parse a TensorFlow checkpoint.
We have two example scripts that demo its usage, but read the [TF docs](https://www.tensorflow.org/api_docs/python/tf/train/NewCheckpointReader) for details.
[scripts/ls-checkpoint.py](../scripts/ls-checkpoint.py)
demos how to print all variables and their shapes in a checkpoint.
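For instance, a minimal sketch of inspecting a checkpoint by hand (the checkpoint path is a placeholder):

```python
import tensorflow as tf

# Placeholder path: use your own checkpoint prefix (without the .data-xxxxx / .index suffix).
reader = tf.train.NewCheckpointReader('/path/to/train_log/model-10000')
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)
# reader.get_tensor(name) returns the value of a single variable.
```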
[scripts/dump-model-params.py](../scripts/dump-model-params.py) can be used to remove unnecessary variables in a checkpoint.
It takes a metagraph file (which is also saved by `ModelSaver`) and only saves variables that the model needs at inference time.
...
@@ -25,8 +25,8 @@ This is how TensorFlow summaries eventually get logged/saved/printed:
All the "what, when, where" can be customized in either the graph or with the callbacks/monitors setting.
Since TF summaries are evaluated infrequently (every epoch) by default, if the content is data-dependent, the values
could have high variance. To address this issue, you can:
1. Change "When to Log": log more frequently, but note that certain summaries can be expensive to
log. You may want to use a separate collection for frequent logging.
2. Change "What to Log": you can call
@@ -40,7 +40,7 @@ are likely to have too much variance. To address this issue, you can:
Besides TensorFlow summaries,
a callback can also write other data to the monitor backend at any time once training has started,
by `self.trainer.monitors.put_xxx`.
As long as the type of data is supported, the data will be dispatched to and logged to the same place.
As a result, tensorboard will show not only summaries in the graph, but also your custom data.
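For example, a sketch of a custom callback doing this (the callback name and the logged quantity are made up; `put_scalar` is one of the `put_xxx` methods):

```python
from tensorpack.callbacks import Callback

class LogSomething(Callback):
    """A made-up callback: push a custom scalar to the monitor backend every epoch."""
    def _trigger_epoch(self):
        value = 42.0  # compute whatever quantity you want to track here
        self.trainer.monitors.put_scalar('custom/value', value)
```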
...
@@ -80,11 +80,10 @@ with TowerContext('some_name_or_empty_string', is_training=False):
When defining the model you can construct the graph using whatever library you feel comfortable with.
Usually, slim/tflearn/tensorlayer are just symbolic function wrappers; calling them is no different
from calling `tf.add`. You may need to be careful how regularizations/BN updates are supposed
to be handled in those libraries, though.
It is a bit different to use sonnet/Keras.
sonnet/Keras manage the variable scope with their own model classes, and calling their symbolic functions
always creates a new variable scope. See the [Keras example](../examples/keras) for how to use it within tensorpack.
@@ -9,6 +9,7 @@ Tensorpack trainers contain logic of:
Usually you won't touch these methods directly, but use the
[higher-level interface](training-interface.html) on trainers.
You'll only need to __select__ what trainer to use.
But some basic knowledge of how they work is useful:
### Tower Trainer
@@ -16,22 +17,22 @@ Following the terminology in TensorFlow,
a __tower function__ is a callable that takes input tensors and adds __one replicate__ of the model to the graph.
Most types of neural-network training could fall into this category.
All trainers in tensorpack are subclasses of [TowerTrainer](../modules/train.html#tensorpack.train.TowerTrainer).
The concept of tower is used mainly to support:
1. Data-parallel multi-GPU training, where a replicate is built on each GPU.
2. Graph construction for inference, where a replicate is built under inference mode.
You'll provide a tower function to use `TowerTrainer`.
The function needs to follow some conventions:
1. It will always be called under a `TowerContext`,
which will contain information about reuse, training/inference, scope name, etc.
2. __It might get called multiple times__ for data-parallel training or inference.
3. To respect variable reuse, use `tf.get_variable` instead of
`tf.Variable` in the function, unless you want to force creation of new variables.
In particular, when working with the `ModelDesc` interface, its `build_graph` method will be the tower function.
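For illustration, a minimal `ModelDesc` sketch whose `build_graph` serves as the tower function. The exact method names (`inputs`, `optimizer`) are assumptions that have changed across tensorpack versions, so check the API docs of the version you use:

```python
import tensorflow as tf
from tensorpack import ModelDesc

class MyModel(ModelDesc):
    def inputs(self):
        return [tf.placeholder(tf.float32, (None, 28, 28, 1), 'image'),
                tf.placeholder(tf.int32, (None,), 'label')]

    def build_graph(self, image, label):
        # The tower function: it may run once per GPU, so variables are created
        # through tf.get_variable (which tf.layers uses internally) to respect reuse.
        logits = tf.layers.dense(tf.layers.flatten(image), 10, name='fc')
        cost = tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)
        return cost

    def optimizer(self):
        return tf.train.AdamOptimizer(1e-3)
```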
### MultiGPU Trainers
@@ -44,11 +45,11 @@ It takes only one line of code change to use them.
Note some __common problems__ when using these trainers:
1. In each iteration, all GPUs (all replicates of the model) take tensors from the `InputSource`,
instead of taking one batch and splitting it among them.
So the total batch size would become ``(batch size of InputSource/DataFlow) * #GPU``.
Splitting a tensor for data-parallel training makes no sense at all; it only puts unnecessary shape constraints on the data.
By letting each GPU train on its own input tensors, they can train on inputs of different shapes simultaneously.
2. The tower function (your model code) will get called multiple times (see the sketch below).
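A sketch of the one-line change, reusing the hypothetical `MyModel` from the sketch above. `FakeData` only stands in for a real DataFlow, and its exact arguments here are an assumption -- substitute your own data:

```python
from tensorpack import TrainConfig, SyncMultiGPUTrainerReplicated, launch_train_with_config
from tensorpack.dataflow import FakeData

# Each GPU takes its own batch of 64 from the DataFlow, so the total batch size is 64 * #GPU.
df = FakeData([[64, 28, 28, 1], [64]], size=1000, dtype=['float32', 'int32'])
config = TrainConfig(model=MyModel(), dataflow=df, max_epoch=2)
# Compared to launch_train_with_config(config, SimpleTrainer()),
# switching the trainer object is the only change needed for 2-GPU training:
launch_train_with_config(config, SyncMultiGPUTrainerReplicated(2))
```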
...
@@ -70,6 +70,10 @@ class DistributedParameterServerBuilder(DataParallelBuilder, DistributedBuilderB
It is an equivalent of ``--variable_update=parameter_server`` in
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
However this implementation hasn't been well tested.
It probably still has issues in model saving, etc.
Check `ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
for fast and correct distributed examples.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
@@ -138,6 +142,9 @@ class DistributedReplicatedBuilder(DataParallelBuilder, DistributedBuilderBase):
It is an equivalent of ``--variable_update=distributed_replicated`` in
`tensorflow/benchmarks <https://github.com/tensorflow/benchmarks>`_.
Note that the performance of this trainer is still not satisfactory.
Check `ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
for fast and correct distributed examples.
Note:
1. Gradients are not averaged across workers, but applied to PS variables
...
@@ -14,7 +14,7 @@ __all__ = []
def describe_trainable_vars():
"""
Print a description of the current model parameters.
Skip variables starting with "tower", as they are just duplicates built by data-parallel logic.
"""
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
if len(train_vars) == 0:
@@ -51,6 +51,8 @@ def describe_trainable_vars():
def get_shape_str(tensors):
"""
Internally used by the layer registry to print shapes of inputs/outputs of layers.
Args:
tensors (list or tf.Tensor): a tensor or a list of tensors
Returns:
...
@@ -198,11 +198,11 @@ def add_param_summary(*summary_lists, **kwargs):
def add_moving_summary(*args, **kwargs):
"""
Summarize the moving average for scalar tensors.
This function is a no-op if not called from the main training tower.
Args:
args: scalar tensors to summarize
decay (float): the decay rate. Defaults to 0.95.
collection (str or None): the name of the collection to add EMA-maintaining ops.
The default will work together with the default
...