Commit 11932e68 authored by Yuxin Wu

Switch to trainer v2 by default. (#458)

parent 4ad831ac
Bug Reports/Feature Requests/Usage Questions Only:
Bug Reports (including performance bug):
Some part of code (either the library or examples) doesn't work as expected.
PLEASE always include the following:
Bug Reports: PLEASE always include
1. What you did. (command you run if using examples; post or describe your code if not)
2. What you observed, e.g. logs.
3. What you expected, if not obvious.
4. Your environment (TF version, cudnn version, number & type of GPUs), if it matters.
5. About low performance, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html
Feature Requests:
1. Improve an existing feature.
......
# Performance Tuning
__We do not know why your training is slow__.
Performance is different on every machine, so you need to figure out most parts on your own.
Here's a list of things you can do when your training is slow.
And if you're going to open an issue about slow training, PLEASE do them and include your findings.
If you're going to open an issue about slow training, PLEASE do them and include your findings.
## Figure out the bottleneck
......@@ -18,16 +21,15 @@ And if you're going to open an issue about slow training, PLEASE do them and inc
so that the iterations don't take any data from the Python side but train on a constant tensor.
This will help find out the slow operations you're using in the graph.
2. Use `dataflow=FakeData(shapes, random=False)` to replace your original DataFlow with a constant DataFlow.
Compared to using `DummyConstantInput`, this will include the extra Python-TF overhead, which is supposed to be negligible.
This has a similar effect to (1), i.e., it eliminates the overhead of data.
3. If you're using a TF-based input pipeline you wrote, you can simply run it in a loop and test its speed.
4. Use `TestDataSpeed(mydf).start()` to benchmark your DataFlow.
A benchmark will give you more precise information about which part you should improve.
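The benchmarking idea behind steps 2 and 4 can be illustrated in plain Python, assuming only that a DataFlow behaves like a Python iterable of datapoints. This is a minimal sketch, not tensorpack code: `benchmark`, `constant_dataflow` and `slow_dataflow` are hypothetical helpers. The first times iteration over a source, the second mimics a constant source in the spirit of `FakeData(..., random=False)`, and the third stands in for a DataFlow with expensive preprocessing.

```python
import itertools
import time


def benchmark(dataflow, num_points=1000):
    """Iterate over `dataflow` and return datapoints per second.

    Hypothetical stand-in for the idea behind TestDataSpeed, not its implementation.
    """
    start = time.perf_counter()
    count = 0
    for _ in itertools.islice(dataflow, num_points):
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def constant_dataflow(datapoint):
    """Yield the same datapoint forever (a constant source, like FakeData with random=False)."""
    while True:
        yield datapoint


def slow_dataflow(datapoint, delay=0.001):
    """Hypothetical DataFlow whose preprocessing costs `delay` seconds per datapoint."""
    while True:
        time.sleep(delay)  # simulate expensive decoding/augmentation
        yield datapoint


if __name__ == "__main__":
    dp = [[0.0] * 32, 1]
    fast = benchmark(constant_dataflow(dp))
    slow = benchmark(slow_dataflow(dp), num_points=200)
    print(f"constant: {fast:.0f} dp/s, slow: {slow:.0f} dp/s")
```

If training speeds up noticeably when the constant source replaces your real DataFlow, and the real DataFlow benchmarks far slower than the constant one, the data side is the bottleneck.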
## Improve DataFlow
## Investigate DataFlow
Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial,
so that you have an idea of what your DataFlow is doing.
Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial, so you know what your DataFlow is doing.
Benchmark your DataFlow with modifications and you'll understand why it runs slowly. Some examples
include:
......@@ -46,7 +48,7 @@ know the reason and improve it accordingly, e.g.:
anything (network, ZMQ pipe, Python-TF copy etc.)
5. Use distributed data preprocessing, with `send_dataflow_zmq` and `RemoteDataZMQ`.
## Improve TensorFlow
## Investigate TensorFlow
When you're sure that data is not a bottleneck (e.g. when the queue is always full), you can start to
worry about the model.
......@@ -69,9 +71,8 @@ But there may be something cheap you can try:
If you're unable to scale to multiple GPUs almost linearly:
1. First make sure that the ResNet example can scale. Run it with `--fake` to use fake data.
If not, it's a bug or an environment setup problem.
2. Then note that your model may have a different communication-computation pattern or other
characteristics that affect efficiency.
2. Then note that your model may have a different communication-computation pattern that affects efficiency.
There isn't a simple answer to this.
You may try a different multi-GPU trainer; the speed can vary a lot sometimes.
Note that scalability measurement always trains with the same "batch size per GPU", not the same total equivalent batch size.
Note that scalability is always measured with the same "batch size per GPU", not the same total equivalent batch size.
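To make this convention concrete, here is a small sketch of how one might compute scaling efficiency, assuming throughput is measured in samples per second with a fixed batch size per GPU. The function names and the step times in the usage block are made up for illustration and are not tensorpack measurements.

```python
def throughput(batch_size_per_gpu, num_gpus, seconds_per_step):
    """Samples per second when every GPU processes `batch_size_per_gpu` samples each step."""
    return batch_size_per_gpu * num_gpus / seconds_per_step


def scaling_efficiency(throughput_1gpu, throughput_ngpu, num_gpus):
    """Measured n-GPU throughput as a fraction of ideal linear scaling (1.0 = linear)."""
    return throughput_ngpu / (throughput_1gpu * num_gpus)


if __name__ == "__main__":
    # Hypothetical numbers: batch size 64 per GPU in both runs.
    single = throughput(64, 1, 0.256)  # about 250 samples/s
    eight = throughput(64, 8, 0.32)    # about 1600 samples/s
    print(f"{scaling_efficiency(single, eight, 8):.2f}")  # prints 0.80
```

Because the per-GPU batch stays fixed, the 8-GPU run does 8 times more work per step; a step time that stays close to the single-GPU step time is what near-linear scaling looks like.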
......@@ -13,7 +13,7 @@ import tensorflow as tf
import six
from six.moves import queue
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.utils.concurrency import ensure_proc_terminate, start_proc_mask_signal
from tensorpack.utils.serialize import dumps
......
......@@ -7,7 +7,7 @@ import os
import argparse
from six.moves import range
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.gradproc import SummaryGradient, GlobalNormClip
import tensorflow as tf
......
......@@ -12,7 +12,7 @@ import operator
import six
from six.moves import range
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils import symbolic_functions, summary, optimizer
from tensorpack.tfutils.gradproc import GlobalNormClip
......
......@@ -8,7 +8,7 @@ import argparse
import cv2
import tensorflow as tf
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from DQNModel import Model as DQNModel
......
......@@ -6,7 +6,7 @@
import os
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
import tensorflow as tf
......
......@@ -7,7 +7,7 @@ import argparse
import os
import imp
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
......
......@@ -10,7 +10,7 @@ import numpy as np
import os
import sys
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import prediction_incorrect
from tensorpack.tfutils.summary import add_moving_summary, add_param_summary
......
......@@ -4,9 +4,7 @@
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>
import argparse
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import prediction_incorrect
from tensorpack.tfutils.summary import add_moving_summary, add_param_summary
......
......@@ -6,12 +6,11 @@ import argparse
import numpy as np
import tensorflow as tf
import cv2
import os
from scipy.signal import convolve2d
from six.moves import range, zip
import multiprocessing
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.utils import logger
from tensorpack.utils.viz import *
......
......@@ -13,7 +13,7 @@ import numpy as np
import json
import tensorflow as tf
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.tfutils import optimizer
......
......@@ -3,9 +3,6 @@
# File: BEGAN.py
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.utils.gpu import get_nr_gpu
......
......@@ -9,7 +9,7 @@ import os
import cv2
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.utils.viz import interactive_imshow, stack_patches
import tensorpack.tfutils.symbolic_functions as symbf
......
......@@ -8,7 +8,7 @@ import argparse
import glob
from six.moves import range
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.tfutils.scope_utils import auto_reuse_variable_scope
......
......@@ -8,7 +8,7 @@ import numpy as np
import os
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.utils.viz import stack_patches
from tensorpack.tfutils.scope_utils import auto_reuse_variable_scope
......
......@@ -8,7 +8,7 @@ import argparse
from six.moves import map, zip
import numpy as np
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.tfutils.scope_utils import auto_reuse_variable_scope
......
......@@ -10,7 +10,7 @@ import glob
import os
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.utils.viz import stack_patches
from tensorpack.tfutils.summary import add_moving_summary
......
......@@ -3,9 +3,6 @@
# File: Improved-WGAN.py
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.utils.globvars import globalns as G
......
......@@ -9,7 +9,7 @@ import tensorflow as tf
import os
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.utils import viz
from tensorpack.tfutils.scope_utils import auto_reuse_variable_scope, under_name_scope
......
......@@ -3,9 +3,6 @@
# File: WGAN.py
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.utils.globvars import globalns as G
......
......@@ -9,7 +9,7 @@ import argparse
from six.moves import zip
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
from tensorpack.utils.gpu import get_nr_gpu
......
......@@ -7,7 +7,7 @@ import argparse
import os
import tensorflow as tf
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import prediction_incorrect
from tensorpack.tfutils.summary import add_moving_summary
......
......@@ -9,7 +9,7 @@ import os
import tensorflow as tf
import multiprocessing
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import prediction_incorrect
from tensorpack.tfutils.summary import add_moving_summary
......
......@@ -7,7 +7,7 @@ import numpy as np
import os
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils import optimizer, summary, gradproc
from tensorpack.utils import logger
......
......@@ -7,7 +7,7 @@ import numpy as np
import argparse
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.symbolic_functions import *
from tensorpack.tfutils.summary import *
......
......@@ -6,7 +6,7 @@
import argparse
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary, add_param_summary
from tensorpack.utils.gpu import get_nr_gpu
......
......@@ -5,7 +5,7 @@
import argparse
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import logger, QueueInput
from tensorpack.models import *
from tensorpack.callbacks import *
......
......@@ -9,7 +9,7 @@ import numpy as np
import os
import multiprocessing
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
import tensorflow as tf
from tensorflow.contrib.layers import variance_scaling_initializer
from tensorpack import *
......
......@@ -9,7 +9,7 @@ import cv2
import tensorflow as tf
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import logger, QueueInput, InputDesc, PlaceholderInput, TowerContext
from tensorpack.models import *
from tensorpack.callbacks import *
......
......@@ -3,13 +3,11 @@
# File: mnist-embeddings.py
import numpy as np
import os
import argparse
import tensorflow as tf
import tensorflow.contrib.slim as slim
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import add_moving_summary
from tensorpack.utils.gpu import change_gpu
......
......@@ -9,7 +9,7 @@ import tensorflow as tf
import os
import argparse
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
from tensorpack.tfutils import sesscreate, optimizer, summary
......
......@@ -5,7 +5,7 @@
import os
import argparse
import tensorflow as tf
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
"""
......
......@@ -6,7 +6,7 @@ import tensorflow as tf
import argparse
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.tfutils.summary import *
from tensorpack.dataflow import dataset
......
......@@ -10,7 +10,7 @@ MNIST ConvNet example.
about 0.6% validation error after 30 epochs.
"""
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
# Just import everything into current namespace
from tensorpack import *
from tensorpack.tfutils import summary
......
......@@ -4,8 +4,6 @@
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>
import tensorflow as tf
import os
from tensorflow import keras
KL = keras.layers
......@@ -14,7 +12,7 @@ This is an mnist example demonstrating how to use Keras symbolic function inside
This way you can define models in Keras-style, and benefit from the more efficient trainers in tensorpack.
"""
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
from tensorpack.utils.argtools import memoized
......
......@@ -12,7 +12,7 @@ the only differences are:
2. use slim names to summarize weights
"""
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
import tensorflow as tf
......
......@@ -9,7 +9,7 @@ import argparse
MNIST ConvNet example with weights/activations visualization.
"""
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
import tensorflow as tf
......
......@@ -6,7 +6,7 @@
import argparse
import os
os.environ['TENSORPACK_TRAIN_API'] = 'v2' # will become default soon
from tensorpack import *
from tensorpack.dataflow import dataset
from tensorpack.tfutils.summary import *
......
......@@ -16,8 +16,8 @@ if _HAS_TF:
from tensorpack.callbacks import *
from tensorpack.tfutils import *
# In development. Default to v1
if _os.environ.get('TENSORPACK_TRAIN_API', 'v1') == 'v2':
# Default to v2
if _os.environ.get('TENSORPACK_TRAIN_API', 'v2') == 'v2':
from tensorpack.train import *
else:
from tensorpack.trainv1 import *
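The effect of the package-level import change above (which appears to be in tensorpack's `__init__.py`) can be summarized with a stand-alone function that reproduces the same condition. `select_train_api` is a hypothetical helper written for illustration: it returns the name of the module that would be star-imported instead of importing it, and takes the environment as a parameter so it can be checked without tensorpack installed.

```python
import os


def select_train_api(environ=None):
    """Return the trainer module chosen by TENSORPACK_TRAIN_API.

    Mirrors the condition in the diff: an unset variable now defaults to 'v2'.
    Hypothetical helper, not part of tensorpack.
    """
    if environ is None:
        environ = os.environ
    if environ.get('TENSORPACK_TRAIN_API', 'v2') == 'v2':
        return 'tensorpack.train'
    return 'tensorpack.trainv1'


if __name__ == "__main__":
    print(select_train_api({}))                              # prints tensorpack.train
    print(select_train_api({'TENSORPACK_TRAIN_API': 'v1'}))  # prints tensorpack.trainv1
```

Under this condition, leaving `TENSORPACK_TRAIN_API` unset or setting it to `'v2'` selects `tensorpack.train`; any other value, such as `'v1'`, falls back to `tensorpack.trainv1`. Because the check runs when tensorpack is imported, the variable has to be set before `from tensorpack import *`, which is why the example scripts set it directly above that import.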
......