Commit f557aaa9 authored by Yuxin Wu

update docs

parent d2c5cc16
...@@ -11,7 +11,6 @@ cache: ...@@ -11,7 +11,6 @@ cache:
addons: addons:
apt: apt:
packages: packages:
- pandoc
- libprotobuf-dev - libprotobuf-dev
- protobuf-compiler - protobuf-compiler
...@@ -42,7 +41,7 @@ matrix: ...@@ -42,7 +41,7 @@ matrix:
install: install:
- pip install -U pip # the pip version on travis is too old - pip install -U pip # the pip version on travis is too old
- pip install flake8 scikit-image opencv-python pypandoc - pip install flake8 scikit-image opencv-python
- pip install . - pip install .
# check that dataflow can be imported alone # check that dataflow can be imported alone
- python -c "import tensorpack.dataflow" - python -c "import tensorpack.dataflow"
......
![Tensorpack](.github/tensorpack.png) ![Tensorpack](.github/tensorpack.png)
Tensorpack is a training interface based on TensorFlow. Tensorpack is a neural network training interface based on TensorFlow.
[![Build Status](https://travis-ci.org/tensorpack/tensorpack.svg?branch=master)](https://travis-ci.org/tensorpack/tensorpack) [![Build Status](https://travis-ci.org/tensorpack/tensorpack.svg?branch=master)](https://travis-ci.org/tensorpack/tensorpack)
[![ReadTheDoc](https://readthedocs.org/projects/tensorpack/badge/?version=latest)](http://tensorpack.readthedocs.io/en/latest/index.html) [![ReadTheDoc](https://readthedocs.org/projects/tensorpack/badge/?version=latest)](http://tensorpack.readthedocs.io/en/latest/index.html)
...@@ -12,7 +12,7 @@ Tensorpack is a training interface based on TensorFlow. ...@@ -12,7 +12,7 @@ Tensorpack is a training interface based on TensorFlow.
It's Yet Another TF high-level API, with __speed__, __readability__ and __flexibility__ built together. It's Yet Another TF high-level API, with __speed__, __readability__ and __flexibility__ built together.
1. Focus on __training speed__. 1. Focus on __training speed__.
+ Speed comes for free with tensorpack -- it uses TensorFlow in the __efficient way__ with no extra overhead. + Speed comes for free with Tensorpack -- it uses TensorFlow in the __efficient way__ with no extra overhead.
On common CNNs, it runs training [1.2~5x faster](https://github.com/tensorpack/benchmarks/tree/master/other-wrappers) than the equivalent Keras code. On common CNNs, it runs training [1.2~5x faster](https://github.com/tensorpack/benchmarks/tree/master/other-wrappers) than the equivalent Keras code.
+ Data-parallel multi-GPU/distributed training strategy is off-the-shelf to use. + Data-parallel multi-GPU/distributed training strategy is off-the-shelf to use.
...@@ -28,7 +28,7 @@ It's Yet Another TF high-level API, with __speed__, __readability__ and __flexib ...@@ -28,7 +28,7 @@ It's Yet Another TF high-level API, with __speed__, __readability__ and __flexib
3. It's not a model wrapper. 3. It's not a model wrapper.
+ There are too many symbolic function wrappers in the world. Tensorpack includes only a few common models. + There are too many symbolic function wrappers in the world. Tensorpack includes only a few common models.
But you can use any symbolic function library inside tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/.... But you can use any symbolic function library inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....
See [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutorials) to know more about these features. See [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutorials) to know more about these features.
...@@ -36,7 +36,7 @@ See [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutori ...@@ -36,7 +36,7 @@ See [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutori
We refuse toy examples. We refuse toy examples.
Instead of showing you 10 arbitrary networks trained on toy datasets, Instead of showing you 10 arbitrary networks trained on toy datasets,
[tensorpack examples](examples) faithfully replicate papers and care about reproducing numbers, [Tensorpack examples](examples) faithfully replicate papers and care about reproducing numbers,
demonstrating its flexibility for actual research. demonstrating its flexibility for actual research.
### Vision: ### Vision:
...@@ -67,7 +67,7 @@ Dependencies: ...@@ -67,7 +67,7 @@ Dependencies:
+ TensorFlow >= 1.3.0 (Optional if you only want to use `tensorpack.dataflow` alone as a data processing library) + TensorFlow >= 1.3.0 (Optional if you only want to use `tensorpack.dataflow` alone as a data processing library)
``` ```
# install git, then: # install git, then:
pip install -U git+https://github.com/tensorpack/tensorpack.git pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to avoid system-wide installation. # or add `--user` to avoid system-wide installation.
``` ```
......
...@@ -12,7 +12,7 @@ with the support of: ...@@ -12,7 +12,7 @@ with the support of:
## Dependencies ## Dependencies
+ Python 3; TensorFlow >= 1.6 (1.4 or 1.5 can run but may crash due to a TF bug); + Python 3; TensorFlow >= 1.6 (1.4 or 1.5 can run but may crash due to a TF bug);
+ [pycocotools](https://github.com/pdollar/coco/tree/master/PythonAPI/), OpenCV. + [pycocotools](https://github.com/cocodataset/cocoapi/tree/master/PythonAPI/), OpenCV.
+ Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/FasterRCNN/) + Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/FasterRCNN/)
from tensorpack model zoo. Use the models with "-AlignPadding". from tensorpack model zoo. Use the models with "-AlignPadding".
+ COCO data. It needs to have the following directory structure: + COCO data. It needs to have the following directory structure:
...@@ -33,17 +33,22 @@ COCO/DIR/ ...@@ -33,17 +33,22 @@ COCO/DIR/
## Usage ## Usage
To train: ### Train:
On a single machine:
``` ```
./train.py --config \ ./train.py --config \
MODE_MASK=True MODE_FPN=True \ MODE_MASK=True MODE_FPN=True \
DATA.BASEDIR=/path/to/COCO/DIR \ DATA.BASEDIR=/path/to/COCO/DIR \
BACKBONE.WEIGHTS=/path/to/ImageNet-R50-Pad.npz \ BACKBONE.WEIGHTS=/path/to/ImageNet-R50-Pad.npz \
``` ```
To run distributed training, set `TRAINER=horovod` and refer to [HorovodTrainer docs](http://tensorpack.readthedocs.io/modules/train.html#tensorpack.train.HorovodTrainer).
Options can be changed by either the command line or the `config.py` file. Options can be changed by either the command line or the `config.py` file.
Recommended configurations are listed in the table below. Recommended configurations are listed in the table below.
The code is only valid for training with 1, 2, 4 or 8 GPUs. The code is only valid for training with 1, 2, 4 or >=8 GPUs.
Not training with 8 GPUs may result in different performance from the table below. Not training with 8 GPUs may result in different performance from the table below.
To predict on an image (and show output in a window): To predict on an image (and show output in a window):
...@@ -64,17 +69,17 @@ Evaluation or prediction will need the same `--config` used during training. ...@@ -64,17 +69,17 @@ Evaluation or prediction will need the same `--config` used during training.
These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95. These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
MaskRCNN results contain both box and mask mAP. MaskRCNN results contain both box and mask mAP.
| Backbone | mAP<br/>(box/mask) | Detectron mAP <br/> (box/mask) | Time | Configurations <br/> (click to expand) | | Backbone | mAP<br/>(box;mask) | Detectron mAP <br/> (box;mask) | Time | Configurations <br/> (click to expand) |
| - | - | - | - | - | | - | - | - | - | - |
| R50-C4 | 33.1 | | 18h on 8 V100s | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> | | R50-C4 | 33.1 | | 18h on 8 V100s | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
| R50-C4 | 36.6 | 36.5 | 44h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False` </details> | | R50-C4 | 36.6 | 36.5 | 44h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False` </details> |
| R50-FPN | 37.5 | 37.9<sup>[1](#ft1)</sup> | 28h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> | | R50-FPN | 37.5 | 37.9<sup>[1](#ft1)</sup> | 28h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
| R50-C4 | 36.8/32.1 | | 39h on 8 P100s | <details><summary>quick</summary>`MODE_MASK=True FRCNN.BATCH_PER_IM=256`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> | | R50-C4 | 36.8;32.1 | | 39h on 8 P100s | <details><summary>quick</summary>`MODE_MASK=True FRCNN.BATCH_PER_IM=256`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
| R50-C4 | 37.8/33.1 | 37.8/32.8 | 49h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True` </details> | | R50-C4 | 37.8;33.1 | 37.8;32.8 | 49h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True` </details> |
| R50-FPN | 38.2/34.9 | 38.6/34.5<sup>[1](#ft1)</sup> | 32h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details> | | R50-FPN | 38.2;34.9 | 38.6;34.5<sup>[1](#ft1)</sup> | 32h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details> |
| R50-FPN | 38.5/34.8 | 38.6/34.2<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details> | | R50-FPN | 38.5;34.8 | 38.6;34.2<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details> |
| R50-FPN | 39.5/35.2 | 39.5/34.4<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> | | R50-FPN | 39.5;35.2 | 39.5;34.4<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> |
| R101-C4 | 40.8/35.1 | | 63h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> | | R101-C4 | 40.8;35.1 | | 63h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
<a id="ft1">1</a>: Slightly different configurations. <a id="ft1">1</a>: Slightly different configurations.
......
numpy
six
termcolor>=1.1
tabulate>=0.7.7
tqdm>4.11.1
pyarrow>=0.9.0
pyzmq>=16
subprocess32; python_version < '3.0'
functools32; python_version < '3.0'
import setuptools import setuptools
version = int(setuptools.__version__.split('.')[0])
assert version > 30, "tensorpack installation requires setuptools > 30"
from setuptools import setup from setuptools import setup
import os from os import path
import platform import platform
import shutil
import sys
# setup metainfo version = int(setuptools.__version__.split('.')[0])
CURRENT_DIR = os.path.dirname(__file__) assert version > 30, "tensorpack installation requires setuptools > 30"
libinfo_py = os.path.join(CURRENT_DIR, 'tensorpack/libinfo.py')
exec(open(libinfo_py, "rb").read()) this_directory = path.abspath(path.dirname(__file__))
# produce rst readme for pypi # setup metainfo
try: libinfo_py = path.join(this_directory, 'tensorpack', 'libinfo.py')
import pypandoc last_line = open(libinfo_py, "rb").readlines()[-1].strip()
long_description = pypandoc.convert_file('README.md', 'rst') exec(last_line)
description_type = 'text/x-rst'
except ImportError:
long_description = open('README.md').read()
description_type = 'text/markdown'
# configure requirements with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
reqfile = os.path.join(CURRENT_DIR, 'requirements.txt') long_description = f.read()
req = [x.strip() for x in open(reqfile).readlines()]
setup( setup(
name='tensorpack', name='tensorpack',
version=__version__, version=__version__, # noqa
description='Neural Network Toolbox on TensorFlow', description='Neural Network Toolbox on TensorFlow',
long_description=long_description, long_description=long_description,
long_description_content_type=description_type, long_description_content_type='text/markdown',
install_requires=req, install_requires=[
"numpy",
"six",
"termcolor>=1.1",
"tabulate>=0.7.7",
"tqdm>4.11.1",
"pyarrow>=0.9.0",
"pyzmq>=16",
"subprocess32; python_version < '3.0'",
"functools32; python_version < '3.0'",
],
tests_require=['flake8', 'scikit-image'], tests_require=['flake8', 'scikit-image'],
extras_require={ extras_require={
'all': ['pillow', 'scipy', 'h5py', 'lmdb>=0.92', 'matplotlib', 'scikit-learn'] + \ 'all': ['pillow', 'scipy', 'h5py', 'lmdb>=0.92', 'matplotlib', 'scikit-learn'] +
['python-prctl'] if platform.system() == 'Linux' else [], ['python-prctl'] if platform.system() == 'Linux' else [],
'all: python_version < "3.0"': ['tornado'], 'all: python_version < "3.0"': ['tornado'],
}, },
......
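A note on the `'all'` entry of `extras_require` in the `setup.py` hunk above: a Python conditional expression binds more loosely than `+`, so `a + b if cond else []` parses as `(a + b) if cond else []`. The sketch below uses a hypothetical `is_linux` flag in place of `platform.system() == 'Linux'`, and shortened stand-in lists, to show the difference. The commit does not say which behavior was intended.

```python
# Hypothetical, shortened stand-ins for the lists in setup.py.
base = ['pillow', 'scipy']
linux_only = ['python-prctl']


def extras_as_written(is_linux):
    # Same shape as the expression in the diff.
    # Parses as: (base + linux_only) if is_linux else []
    return base + linux_only if is_linux else []


def extras_parenthesized(is_linux):
    # Only the Linux-specific package is conditional.
    return base + (linux_only if is_linux else [])


print(extras_as_written(False))
print(extras_parenthesized(False))
```

Both forms give the same list when `is_linux` is true. They differ only on other platforms.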
...@@ -10,10 +10,12 @@ __all__ = ['PeriodicTrigger', 'PeriodicCallback', 'EnableCallbackIf'] ...@@ -10,10 +10,12 @@ __all__ = ['PeriodicTrigger', 'PeriodicCallback', 'EnableCallbackIf']
class PeriodicTrigger(ProxyCallback): class PeriodicTrigger(ProxyCallback):
""" """
Trigger a callback every k global steps or every k epochs by its :meth:`trigger()` method. Trigger a callback every k global steps or every k epochs by its :meth:`trigger()` method.
Most existing callbacks which do something every epoch are implemented Most existing callbacks which do something every epoch are implemented
with :meth:`trigger()` method. with :meth:`trigger()` method. By default the :meth:`trigger()` method will be called every epoch.
This wrapper can make the callback run at a different frequency.
All other methods (``before/after_run``, ``trigger_step``, etc) of the input callback are unaffected. All other methods (``before/after_run``, ``trigger_step``, etc) of the given callback are unaffected.
""" """
def __init__(self, triggerable, every_k_steps=None, every_k_epochs=None): def __init__(self, triggerable, every_k_steps=None, every_k_epochs=None):
......
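The frequency change described in the `PeriodicTrigger` docstring can be pictured with a toy wrapper. This is only a sketch, not tensorpack's implementation: `ToyCallback` and `ToyPeriodicTrigger` are hypothetical names, the epoch number is passed explicitly for simplicity, and the `every_k_steps` option is left out.

```python
class ToyCallback:
    """Hypothetical callback that records the epochs at which it was triggered."""

    def __init__(self):
        self.triggered_at = []

    def trigger(self, epoch):
        self.triggered_at.append(epoch)


class ToyPeriodicTrigger:
    """Forward trigger() to the wrapped callback only every k epochs."""

    def __init__(self, callback, every_k_epochs):
        self.callback = callback
        self.every_k_epochs = every_k_epochs

    def trigger(self, epoch):
        if epoch % self.every_k_epochs == 0:
            self.callback.trigger(epoch)


inner = ToyCallback()
wrapped = ToyPeriodicTrigger(inner, every_k_epochs=3)
for epoch in range(1, 10):  # the training loop calls trigger() once per epoch
    wrapped.trigger(epoch)
print(inner.triggered_at)
```

Without the wrapper, `inner.trigger()` would run on all nine epochs. With it, only epochs 3, 6 and 9 reach the wrapped callback.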
...@@ -52,4 +52,6 @@ except ImportError: ...@@ -52,4 +52,6 @@ except ImportError:
_HAS_TF = False _HAS_TF = False
# This line has to be the last line of the file.
# setup.py will use it to determine the version
__version__ = '0.8.6' __version__ = '0.8.6'
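The "last line" convention noted in the comment above lets `setup.py` pick up the version without running the rest of `libinfo.py`. A minimal sketch of that idea follows. It assumes a temporary stand-in file rather than the real `tensorpack/libinfo.py`, and it executes the line in an explicit dictionary, whereas the `setup.py` in this commit calls `exec` directly at module level. `read_version` is a hypothetical helper name.

```python
import os
import tempfile

# Hypothetical stand-in for libinfo.py: some module-level code,
# followed by the version assignment as the very last line.
source = (
    "import os  # module-level code that setup.py should not need to run\n"
    "# This line has to be the last line of the file.\n"
    "__version__ = '0.8.6'\n"
)


def read_version(path):
    # Read only the final line and execute it in an isolated namespace.
    with open(path, "rb") as f:
        last_line = f.readlines()[-1].strip().decode("utf-8")
    namespace = {}
    exec(last_line, namespace)
    return namespace["__version__"]


with tempfile.TemporaryDirectory() as tmpdir:
    libinfo = os.path.join(tmpdir, "libinfo.py")
    with open(libinfo, "w") as f:
        f.write(source)
    version = read_version(libinfo)

print(version)
```

The approach works only while the version assignment stays on the final line, which is why the commit adds the warning comment to `libinfo.py`.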
...@@ -220,7 +220,7 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5, ...@@ -220,7 +220,7 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
batch_mean_square = tf.reduce_mean(tf.square(inputs), axis=red_axis) batch_mean_square = tf.reduce_mean(tf.square(inputs), axis=red_axis)
if sync_statistics == 'nccl': if sync_statistics == 'nccl':
if six.PY3 and TF_version <= 1.8 and ctx.is_main_training_tower: if six.PY3 and TF_version <= 1.9 and ctx.is_main_training_tower:
logger.warn("A TensorFlow bug will cause cross-GPU BatchNorm to fail. " logger.warn("A TensorFlow bug will cause cross-GPU BatchNorm to fail. "
"Apply this patch: https://github.com/tensorflow/tensorflow/pull/20360") "Apply this patch: https://github.com/tensorflow/tensorflow/pull/20360")
......
...@@ -295,15 +295,15 @@ class HorovodTrainer(SingleCostTrainer): ...@@ -295,15 +295,15 @@ class HorovodTrainer(SingleCostTrainer):
.. code-block:: bash .. code-block:: bash
# change trainer to HorovodTrainer(), then # First, change trainer to HorovodTrainer(), then
CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -np 4 --output-filename mylog python train.py CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -np 4 --output-filename mylog python train.py
To use for distributed training: To use for distributed training:
.. code-block:: bash .. code-block:: bash
# change trainer to HorovodTrainer(), then # First, change trainer to HorovodTrainer(), then
/path/to/mpirun -np 8 -H server1:4,server2:4 \\ mpirun -np 8 -H server1:4,server2:4 \\
-bind-to none -map-by slot \\ -bind-to none -map-by slot \\
--output-filename mylog -x LD_LIBRARY_PATH \\ --output-filename mylog -x LD_LIBRARY_PATH \\
python train.py python train.py
...@@ -312,14 +312,15 @@ class HorovodTrainer(SingleCostTrainer): ...@@ -312,14 +312,15 @@ class HorovodTrainer(SingleCostTrainer):
# There are other MPI options that can potentially improve performance especially on special hardwares. # There are other MPI options that can potentially improve performance especially on special hardwares.
Note: Note:
1. There are several options in Horovod installation and in MPI command line that can improve speed. 1. To reach the maximum speed in your system, there are many options to tune
for Horovod installation and in the MPI command line.
See Horovod docs for details. See Horovod docs for details.
2. Due to a TF bug, you must not initialize CUDA context before training. 2. Due to a TF bug, you must not initialize CUDA context before the trainer starts training.
Therefore TF functions like `is_gpu_available()` or `list_local_devices()` Therefore TF functions like `is_gpu_available()` or `list_local_devices()`
must be avoided. must be avoided.
2. MPI does not like fork(). If your dataflow contains multiprocessing, it may cause problems. 2. MPI does not like `fork()`. If your dataflow contains multiprocessing, it may cause problems.
3. MPI sometimes fails to kill all processes. Be sure to check it afterwards. 3. MPI sometimes fails to kill all processes. Be sure to check it afterwards.
...@@ -337,8 +338,8 @@ class HorovodTrainer(SingleCostTrainer): ...@@ -337,8 +338,8 @@ class HorovodTrainer(SingleCostTrainer):
See :meth:`callback.set_chief_only()`. Most callbacks have a reasonable See :meth:`callback.set_chief_only()`. Most callbacks have a reasonable
default already, but certain callbacks may not behave properly by default. Report an issue if you find any. default already, but certain callbacks may not behave properly by default. Report an issue if you find any.
+ You can use Horovod API such as `hvd.rank()` to know which process you are. + You can use Horovod API such as `hvd.rank()` to know which process you are and choose
Chief process has rank 0. different code path. Chief process has rank 0.
5. Due to these caveats, see 5. Due to these caveats, see
`ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_ `ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
...@@ -395,7 +396,7 @@ class HorovodTrainer(SingleCostTrainer): ...@@ -395,7 +396,7 @@ class HorovodTrainer(SingleCostTrainer):
session_creator.config.gpu_options.visible_device_list = str(self._local_rank) session_creator.config.gpu_options.visible_device_list = str(self._local_rank)
try: try:
session_creator.config.inter_op_parallelism_threads = mp.cpu_count() // hvd.local_size() session_creator.config.inter_op_parallelism_threads = mp.cpu_count() // hvd.local_size()
except AttributeError: except AttributeError: # old horovod does not have local_size
pass pass
super(HorovodTrainer, self).initialize( super(HorovodTrainer, self).initialize(
session_creator, session_init) session_creator, session_init)
......