Commit f557aaa9 authored by Yuxin Wu

update docs

parent d2c5cc16
......@@ -11,7 +11,6 @@ cache:
addons:
apt:
packages:
- pandoc
- libprotobuf-dev
- protobuf-compiler
......@@ -42,7 +41,7 @@ matrix:
install:
- pip install -U pip # the pip version on travis is too old
- pip install flake8 scikit-image opencv-python pypandoc
- pip install flake8 scikit-image opencv-python
- pip install .
# check that dataflow can be imported alone
- python -c "import tensorpack.dataflow"
......
![Tensorpack](.github/tensorpack.png)
Tensorpack is a training interface based on TensorFlow.
Tensorpack is a neural network training interface based on TensorFlow.
[![Build Status](https://travis-ci.org/tensorpack/tensorpack.svg?branch=master)](https://travis-ci.org/tensorpack/tensorpack)
[![ReadTheDoc](https://readthedocs.org/projects/tensorpack/badge/?version=latest)](http://tensorpack.readthedocs.io/en/latest/index.html)
......@@ -12,7 +12,7 @@ Tensorpack is a training interface based on TensorFlow.
It's Yet Another TF high-level API, with __speed__, __readability__ and __flexibility__ built in.
1. Focus on __training speed__.
+ Speed comes for free with tensorpack -- it uses TensorFlow in an __efficient way__ with no extra overhead.
+ Speed comes for free with Tensorpack -- it uses TensorFlow in an __efficient way__ with no extra overhead.
On common CNNs, it runs training [1.2~5x faster](https://github.com/tensorpack/benchmarks/tree/master/other-wrappers) than the equivalent Keras code.
+ Data-parallel multi-GPU/distributed training strategies are available off the shelf.
......@@ -28,7 +28,7 @@ It's Yet Another TF high-level API, with __speed__, __readability__ and __flexib
3. It's not a model wrapper.
+ There are too many symbolic function wrappers in the world. Tensorpack includes only a few common models.
But you can use any symbolic function library inside tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....
But you can use any symbolic function library inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/....
See [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutorials) to know more about these features.
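To make the points above concrete, here is a minimal training-script sketch (not part of this commit) following the ModelDesc API of this tensorpack generation; `MyModel` and its one-layer network are invented for illustration:
```
# Minimal sketch, assuming the tensorpack-0.8.x ModelDesc API; `MyModel` is hypothetical.
import tensorflow as tf
from tensorpack import ModelDesc, TrainConfig, SimpleTrainer, launch_train_with_config
from tensorpack.dataflow import BatchData, dataset

class MyModel(ModelDesc):
    def inputs(self):  # declare the input tensors
        return [tf.placeholder(tf.float32, (None, 28, 28), 'image'),
                tf.placeholder(tf.int32, (None,), 'label')]

    def build_graph(self, image, label):  # build the graph; return the cost to minimize
        logits = tf.layers.dense(tf.reshape(image, [-1, 28 * 28]), 10)
        return tf.losses.sparse_softmax_cross_entropy(label, logits)

    def optimizer(self):
        return tf.train.AdamOptimizer(1e-3)

config = TrainConfig(model=MyModel(),
                     dataflow=BatchData(dataset.Mnist('train'), 128))
launch_train_with_config(config, SimpleTrainer())  # swap SimpleTrainer for a multi-GPU trainer
```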
......@@ -36,7 +36,7 @@ See [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutori
We refuse toy examples.
Instead of showing you 10 arbitrary networks trained on toy datasets,
[tensorpack examples](examples) faithfully replicate papers and care about reproducing numbers,
[Tensorpack examples](examples) faithfully replicate papers and care about reproducing numbers,
demonstrating its flexibility for actual research.
### Vision:
......@@ -67,7 +67,7 @@ Dependencies:
+ TensorFlow >= 1.3.0 (Optional if you only want to use `tensorpack.dataflow` alone as a data processing library)
```
# install git, then:
pip install -U git+https://github.com/tensorpack/tensorpack.git
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to avoid system-wide installation.
```
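Since TensorFlow is optional for this use case, here is a minimal sketch of using `tensorpack.dataflow` alone (the iteration API of this generation is `get_data()`):
```
from tensorpack.dataflow import BatchData, dataset

ds = dataset.Mnist('train')  # a built-in DataFlow; any DataFlow composes the same way
ds = BatchData(ds, 64)       # group 64 datapoints into one batch
ds.reset_state()             # required once before iterating
for dp in ds.get_data():     # dp is [images, labels]
    pass
```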
......
......@@ -12,7 +12,7 @@ with the support of:
## Dependencies
+ Python 3; TensorFlow >= 1.6 (1.4 or 1.5 can run but may crash due to a TF bug);
+ [pycocotools](https://github.com/pdollar/coco/tree/master/PythonAPI/), OpenCV.
+ [pycocotools](https://github.com/cocodataset/cocoapi/tree/master/PythonAPI/), OpenCV.
+ Pre-trained [ImageNet ResNet model](http://models.tensorpack.com/FasterRCNN/)
from tensorpack model zoo. Use the models with "-AlignPadding".
+ COCO data. It needs to have the following directory structure:
......@@ -33,17 +33,22 @@ COCO/DIR/
## Usage
To train:
### Train:
On a single machine:
```
./train.py --config \
MODE_MASK=True MODE_FPN=True \
DATA.BASEDIR=/path/to/COCO/DIR \
BACKBONE.WEIGHTS=/path/to/ImageNet-R50-Pad.npz
```
To run distributed training, set `TRAINER=horovod` and refer to [HorovodTrainer docs](http://tensorpack.readthedocs.io/modules/train.html#tensorpack.train.HorovodTrainer).
Options can be changed by either the command line or the `config.py` file.
Recommended configurations are listed in the table below.
The code is only valid for training with 1, 2, 4 or 8 GPUs.
The code is only valid for training with 1, 2, 4 or >=8 GPUs.
Training with a number of GPUs other than 8 may result in performance different from the table below.
To predict on an image (and show output in a window):
......@@ -64,17 +69,17 @@ Evaluation or prediction will need the same `--config` used during training.
These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95.
MaskRCNN results contain both box and mask mAP.
| Backbone | mAP<br/>(box/mask) | Detectron mAP <br/> (box/mask) | Time | Configurations <br/> (click to expand) |
| Backbone | mAP<br/>(box;mask) | Detectron mAP <br/> (box;mask) | Time | Configurations <br/> (click to expand) |
| - | - | - | - | - |
| R50-C4 | 33.1 | | 18h on 8 V100s | <details><summary>super quick</summary>`MODE_MASK=False FRCNN.BATCH_PER_IM=64`<br/>`PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
| R50-C4 | 36.6 | 36.5 | 44h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False` </details> |
| R50-FPN | 37.5 | 37.9<sup>[1](#ft1)</sup> | 28h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=False MODE_FPN=True` </details> |
| R50-C4 | 36.8/32.1 | | 39h on 8 P100s | <details><summary>quick</summary>`MODE_MASK=True FRCNN.BATCH_PER_IM=256`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
| R50-C4 | 37.8/33.1 | 37.8/32.8 | 49h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True` </details> |
| R50-FPN | 38.2/34.9 | 38.6/34.5<sup>[1](#ft1)</sup> | 32h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details> |
| R50-FPN | 38.5/34.8 | 38.6/34.2<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details> |
| R50-FPN | 39.5/35.2 | 39.5/34.4<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> |
| R101-C4 | 40.8/35.1 | | 63h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
| R50-C4 | 36.8;32.1 | | 39h on 8 P100s | <details><summary>quick</summary>`MODE_MASK=True FRCNN.BATCH_PER_IM=256`<br/>`TRAIN.LR_SCHEDULE=[150000,230000,280000]` </details> |
| R50-C4 | 37.8;33.1 | 37.8;32.8 | 49h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True` </details> |
| R50-FPN | 38.2;34.9 | 38.6;34.5<sup>[1](#ft1)</sup> | 32h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True MODE_FPN=True` </details> |
| R50-FPN | 38.5;34.8 | 38.6;34.2<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` </details> |
| R50-FPN | 39.5;35.2 | 39.5;34.4<sup>[2](#ft2)</sup> | 34h on 8 V100s | <details><summary>standard+ConvGNHead</summary>`MODE_MASK=True MODE_FPN=True`<br/>`FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` </details> |
| R101-C4 | 40.8;35.1 | | 63h on 8 V100s | <details><summary>standard</summary>`MODE_MASK=True `<br/>`BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` </details> |
<a id="ft1">1</a>: Slightly different configurations.
......
numpy
six
termcolor>=1.1
tabulate>=0.7.7
tqdm>4.11.1
pyarrow>=0.9.0
pyzmq>=16
subprocess32; python_version < '3.0'
functools32; python_version < '3.0'
import setuptools
version = int(setuptools.__version__.split('.')[0])
assert version > 30, "tensorpack installation requires setuptools > 30"
from setuptools import setup
import os
from os import path
import platform
import shutil
import sys
# setup metainfo
CURRENT_DIR = os.path.dirname(__file__)
libinfo_py = os.path.join(CURRENT_DIR, 'tensorpack/libinfo.py')
exec(open(libinfo_py, "rb").read())
version = int(setuptools.__version__.split('.')[0])
assert version > 30, "tensorpack installation requires setuptools > 30"
this_directory = path.abspath(path.dirname(__file__))
# produce rst readme for pypi
try:
import pypandoc
long_description = pypandoc.convert_file('README.md', 'rst')
description_type = 'text/x-rst'
except ImportError:
long_description = open('README.md').read()
description_type = 'text/markdown'
# setup metainfo
libinfo_py = path.join(this_directory, 'tensorpack', 'libinfo.py')
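# libinfo.py guarantees that its last line is `__version__ = ...` (see tensorpack/libinfo.py);
# exec'ing only that line avoids executing the rest of libinfo.py, which tries to import TensorFlow.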
last_line = open(libinfo_py, "rb").readlines()[-1].strip()
exec(last_line)
# configure requirements
reqfile = os.path.join(CURRENT_DIR, 'requirements.txt')
req = [x.strip() for x in open(reqfile).readlines()]
with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
long_description = f.read()
setup(
name='tensorpack',
version=__version__,
version=__version__, # noqa
description='Neural Network Toolbox on TensorFlow',
long_description=long_description,
long_description_content_type=description_type,
install_requires=req,
long_description_content_type='text/markdown',
install_requires=[
"numpy",
"six",
"termcolor>=1.1",
"tabulate>=0.7.7",
"tqdm>4.11.1",
"pyarrow>=0.9.0",
"pyzmq>=16",
"subprocess32; python_version < '3.0'",
"functools32; python_version < '3.0'",
],
tests_require=['flake8', 'scikit-image'],
extras_require={
'all': ['pillow', 'scipy', 'h5py', 'lmdb>=0.92', 'matplotlib', 'scikit-learn'] + \
'all': ['pillow', 'scipy', 'h5py', 'lmdb>=0.92', 'matplotlib', 'scikit-learn'] +
(['python-prctl'] if platform.system() == 'Linux' else []),  # parentheses keep the base extras on non-Linux
'all: python_version < "3.0"': ['tornado'],
},
......
......@@ -10,10 +10,12 @@ __all__ = ['PeriodicTrigger', 'PeriodicCallback', 'EnableCallbackIf']
class PeriodicTrigger(ProxyCallback):
"""
Trigger a callback every k global steps or every k epochs by its :meth:`trigger()` method.
Most existing callbacks which do something every epoch are implemented
with the :meth:`trigger()` method.
with the :meth:`trigger()` method. By default, :meth:`trigger()` is called every epoch.
This wrapper can make the callback run at a different frequency.
All other methods (``before/after_run``, ``trigger_step``, etc) of the input callback are unaffected.
All other methods (``before/after_run``, ``trigger_step``, etc) of the given callback are unaffected.
"""
def __init__(self, triggerable, every_k_steps=None, every_k_epochs=None):
......
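A hedged usage sketch for the wrapper above, following the signature shown (`ModelSaver` is one such `trigger()`-based callback):
```
from tensorpack.callbacks import ModelSaver, PeriodicTrigger

# Run ModelSaver's trigger() every 10 epochs instead of the default every epoch:
callbacks = [PeriodicTrigger(ModelSaver(), every_k_epochs=10)]
```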
......@@ -52,4 +52,6 @@ except ImportError:
_HAS_TF = False
# This line has to be the last line of the file.
# setup.py will use it to determine the version
__version__ = '0.8.6'
......@@ -220,7 +220,7 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
batch_mean_square = tf.reduce_mean(tf.square(inputs), axis=red_axis)
if sync_statistics == 'nccl':
if six.PY3 and TF_version <= 1.8 and ctx.is_main_training_tower:
if six.PY3 and TF_version <= 1.9 and ctx.is_main_training_tower:
logger.warn("A TensorFlow bug will cause cross-GPU BatchNorm to fail. "
"Apply this patch: https://github.com/tensorflow/tensorflow/pull/20360")
......
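For context, a hedged sketch of the call that reaches this code path (the leading scope-name argument follows the usual tensorpack layer convention; on Python 3 with TF <= 1.9 the patch above is required):
```
from tensorpack.models import BatchNorm

# Inside each GPU's tower function: aggregate BN statistics across towers over NCCL.
x = BatchNorm('bn', x, sync_statistics='nccl')
```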
......@@ -295,15 +295,15 @@ class HorovodTrainer(SingleCostTrainer):
.. code-block:: bash
# change trainer to HorovodTrainer(), then
# First, change trainer to HorovodTrainer(), then
CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -np 4 --output-filename mylog python train.py
To use it for distributed training:
.. code-block:: bash
# change trainer to HorovodTrainer(), then
/path/to/mpirun -np 8 -H server1:4,server2:4 \\
# First, change trainer to HorovodTrainer(), then
mpirun -np 8 -H server1:4,server2:4 \\
-bind-to none -map-by slot \\
--output-filename mylog -x LD_LIBRARY_PATH \\
python train.py
......@@ -312,14 +312,15 @@ class HorovodTrainer(SingleCostTrainer):
# There are other MPI options that can potentially improve performance, especially on specialized hardware.
Note:
1. There are several options in Horovod installation and in MPI command line that can improve speed.
1. To reach maximum speed on your system, there are many options to tune
in the Horovod installation and the MPI command line.
See Horovod docs for details.
2. Due to a TF bug, you must not initialize CUDA context before training.
2. Due to a TF bug, you must not initialize CUDA context before the trainer starts training.
Therefore TF functions like `is_gpu_available()` or `list_local_devices()`
must be avoided.
2. MPI does not like fork(). If your dataflow contains multiprocessing, it may cause problems.
2. MPI does not like `fork()`. If your dataflow contains multiprocessing, it may cause problems.
3. MPI sometimes fails to kill all processes. Be sure to check it afterwards.
......@@ -337,8 +338,8 @@ class HorovodTrainer(SingleCostTrainer):
See :meth:`callback.set_chief_only()`. Most callbacks have a reasonable
default already, but certain callbacks may not behave properly by default. Report an issue if you find any.
+ You can use Horovod API such as `hvd.rank()` to know which process you are.
Chief process has rank 0.
+ You can use the Horovod API such as `hvd.rank()` to know which process you are and choose
a different code path accordingly. Chief process has rank 0 (a short sketch follows at the end of this section).
5. Due to these caveats, see
`ResNet-Horovod <https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod>`_
......@@ -395,7 +396,7 @@ class HorovodTrainer(SingleCostTrainer):
session_creator.config.gpu_options.visible_device_list = str(self._local_rank)
try:
session_creator.config.inter_op_parallelism_threads = mp.cpu_count() // hvd.local_size()
except AttributeError:
except AttributeError: # old horovod does not have local_size
pass
super(HorovodTrainer, self).initialize(
session_creator, session_init)
......
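As noted above, a minimal sketch of rank-based branching with the standard Horovod API:
```
import horovod.tensorflow as hvd

hvd.init()
if hvd.rank() == 0:   # the chief process has rank 0
    print("chief among", hvd.size(), "processes")
```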