Commit 0f0a9c53 authored by Yuxin Wu

update docs; check TF is built with CUDA

parent ef62f188
@@ -11,26 +11,27 @@ Any unexpected problems: __PLEASE ALWAYS INCLUDE__:
   + If not, tell us what you did that may be relevant.
     But we may not be able to resolve it if there is no reproducible code.
   + Better to paste what you did instead of describing it.
-2. What you observed, e.g. as much logs as possible.
+2. What you observed, e.g. the entire log:
   + Better to paste what you observed instead of describing it.
 3. What you expected, if not obvious.
 4. Your environment:
   + Python version.
   + TF version: `python -c 'import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)'`.
-  + Tensorpack version: `python3 -c 'import tensorpack; print(tensorpack.__version__)'`. You can install Tensorpack master by `pip install -U git+https://github.com/ppwwyyxx/tensorpack.git`.
+  + Tensorpack version: `python3 -c 'import tensorpack; print(tensorpack.__version__)'`.
+    You can install Tensorpack master by `pip install -U git+https://github.com/ppwwyyxx/tensorpack.git`.
 5. About efficiency, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html

 Feature Requests:
 + You can implement a lot of features by extending tensorpack
   (See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
   It does not have to be added to tensorpack unless you have a good reason.
-+ We don't take feature requests for implementing new techniques.
++ We don't take feature requests for implementing new papers.
   If you don't know how, ask it as a usage question.

 Usage Questions:
 + Read the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials) first.
-+ We answer "HOW to do X in tensorpack" for a specific well-defined X.
++ We answer "HOW to do X in tensorpack" for a well-defined X.
   We don't answer general machine learning questions,
   such as "what networks to use" or "I don't understand the paper".
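For convenience, the three environment checks the template asks for can be gathered in one run. This is a small sketch, not part of the template, assuming TF 1.x (where `tf.GIT_VERSION` and `tf.VERSION` exist) and that tensorpack is installed:

```
# Collect the environment info requested by the issue template in one script.
# Assumes TF 1.x; tf.GIT_VERSION/tf.VERSION are the TF 1.x version attributes.
import sys
print("Python:", sys.version.replace("\n", " "))

import tensorflow as tf
print("TF:", tf.GIT_VERSION, tf.VERSION)

import tensorpack
print("Tensorpack:", tensorpack.__version__)
```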
@@ -57,7 +57,10 @@ def fbresnet_augmentor(isTrain):
     if isTrain:
         augmentors = [
             GoogleNetResize(),
-            imgaug.RandomOrderAug(  # Remove these augs if your CPU is not fast enough
+            # It's OK to remove these augs if your CPU is not fast enough.
+            # Removing brightness/contrast/saturation does not have a significant effect on accuracy.
+            # Removing lighting leads to a tiny drop in accuracy.
+            imgaug.RandomOrderAug(
                 [imgaug.BrightnessScale((0.6, 1.4), clip=False),
                  imgaug.Contrast((0.6, 1.4), clip=False),
                  imgaug.Saturation(0.4, rgb=False),
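For context on what the new comments refer to: the "lighting" augmentor is the PCA-based color jitter from the AlexNet recipe. The following is an assumed sketch of the full training branch; `GoogleNetResize` is a helper defined in the example's own `imagenet_utils` module, and the eigenvalue/eigenvector constants are the standard ImageNet PCA values:

```
# Sketch (assumed) of the full training-time augmentor list in the fbresnet
# recipe. The Lighting aug is the one the comments above call "lighting".
import numpy as np
from tensorpack import imgaug
from imagenet_utils import GoogleNetResize  # helper defined in this example folder

augmentors = [
    GoogleNetResize(),
    imgaug.RandomOrderAug(
        [imgaug.BrightnessScale((0.6, 1.4), clip=False),
         imgaug.Contrast((0.6, 1.4), clip=False),
         imgaug.Saturation(0.4, rgb=False),
         # PCA-based color jitter ("lighting"), standard ImageNet constants:
         imgaug.Lighting(0.1,
                         eigval=np.asarray([0.2175, 0.0188, 0.0045]) * 255.0,
                         eigvec=np.array([[-0.5675, 0.7192, 0.4009],
                                          [-0.5808, -0.0045, -0.8140],
                                          [-0.5836, -0.6948, 0.4203]],
                                         dtype='float32'))]),
    imgaug.Flip(horiz=True),
]
```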
@@ -31,7 +31,7 @@ To train, first decompress ImageNet data into [this structure](http://tensorpack
 ```
 You should be able to see good GPU utilization (95%~99%), if your data is fast enough.
-It can finish training [within 20 hours](http://dawn.cs.stanford.edu/benchmark/ImageNet/train.html) on AWS p3.16xlarge.
+With batch=64x8, it can finish 100 epochs in 16 hours on AWS p3.16xlarge (8 V100s).
 The default data pipeline is probably OK for machines with SSD & 20 CPU cores.
 See the [tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient-dataflow.html) on other options to speed up your data.
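As a back-of-the-envelope check of the new figure (not a measured result), 100 epochs in 16 hours over the standard ~1.28M ImageNet-1k training images works out to roughly 2200 images/s in total, i.e. about 280 images/s per V100:

```
# Throughput implied by "100 epochs in 16 hours" on 8 GPUs.
# Assumes the standard 1,281,167 ImageNet-1k training images.
images_per_epoch = 1281167
epochs, hours, num_gpu = 100, 16, 8

imgs_per_sec = images_per_epoch * epochs / (hours * 3600.)
print("total: %.0f img/s, per GPU: %.0f img/s"
      % (imgs_per_sec, imgs_per_sec / num_gpu))
# -> total: ~2224 img/s, per GPU: ~278 img/s
```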
@@ -119,7 +119,8 @@ if __name__ == '__main__':
     parser.add_argument('--eval', action='store_true', help='run offline evaluation instead of training')
     parser.add_argument('--batch', default=256, type=int,
                         help="total batch size. "
-                        "Note that it's best to keep per-GPU batch size in [32, 64] to obtain the best accuracy.")
+                        "Note that it's best to keep per-GPU batch size in [32, 64] to obtain the best accuracy. "
+                        "Pretrained models listed in README were trained with batch=32x8.")
     parser.add_argument('--mode', choices=['resnet', 'preact', 'se'],
                         help='variants of resnet to use', default='resnet')
     args = parser.parse_args()
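The "total batch size" is split evenly across GPUs, so `--batch 512` on 8 GPUs is the batch=64x8 setting the README mentions, and `--batch 256` on 8 GPUs is the batch=32x8 used for the pretrained models. An illustrative sketch of deriving the per-GPU size, picking up `args` from the snippet above (the range check is ours, only `get_num_gpu` is tensorpack's):

```
# Illustrative only: derive the per-GPU batch from the total, and warn
# when it leaves the [32, 64] range the help text recommends.
from tensorpack.utils.gpu import get_num_gpu

num_gpu = max(get_num_gpu(), 1)
assert args.batch % num_gpu == 0, "total batch must be divisible by #GPUs"
batch_per_gpu = args.batch // num_gpu   # e.g. 512 // 8 == 64
if not (32 <= batch_per_gpu <= 64):
    print("warning: per-GPU batch %d is outside [32, 64]" % batch_per_gpu)
```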
@@ -39,6 +39,8 @@ class DataParallelBuilder(GraphBuilder):
         """
         if len(towers) > 1:
             logger.info("[DataParallel] Training a model of {} towers.".format(len(towers)))
+            if not tf.test.is_built_with_cuda():
+                logger.warn("TensorFlow was not built with CUDA support!")
         self.towers = towers
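`tf.test.is_built_with_cuda()` is a cheap check of how the installed TF binary was compiled, so the warning can fire before any session or device is touched. A minimal standalone sketch of the same guard (the `num_towers` value is hypothetical):

```
# Minimal sketch of the guard added above, runnable on its own:
# warn when multi-GPU training is requested against a CPU-only TF build.
import tensorflow as tf
from tensorpack.utils import logger

num_towers = 2  # hypothetical: pretend the user asked for 2 GPUs
if num_towers > 1 and not tf.test.is_built_with_cuda():
    logger.warn("TensorFlow was not built with CUDA support!")
```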
@@ -27,26 +27,37 @@ def get_num_gpu():
     Returns:
         int: #available GPUs in CUDA_VISIBLE_DEVICES, or in the system.
     """
+
+    def warn_return(ret, message):
+        try:
+            import tensorflow as tf
+        except ImportError:
+            return ret
+
+        built_with_cuda = tf.test.is_built_with_cuda()
+        if not built_with_cuda and ret > 0:
+            logger.warn(message + "But TensorFlow was not built with CUDA support!")
+        return ret
+
     env = os.environ.get('CUDA_VISIBLE_DEVICES', None)
     if env is not None:
-        return len(env.split(','))
+        return warn_return(len(env.split(',')), "Found non-empty CUDA_VISIBLE_DEVICES. ")
     output, code = subproc_call("nvidia-smi -L", timeout=5)
     if code == 0:
         output = output.decode('utf-8')
-        return len(output.strip().split('\n'))
-    else:
-        try:
-            # Use NVML to query device properties
-            with NVMLContext() as ctx:
-                return ctx.num_devices()
-        except Exception:
-            # Fallback
-            # Note this will initialize all GPUs and therefore has side effect
-            # https://github.com/tensorflow/tensorflow/issues/8136
-            logger.info("Loading local devices by TensorFlow ...")
-            from tensorflow.python.client import device_lib
-            local_device_protos = device_lib.list_local_devices()
-            return len([x.name for x in local_device_protos if x.device_type == 'GPU'])
+        return warn_return(len(output.strip().split('\n')), "Found nvidia-smi. ")
+    try:
+        # Use NVML to query device properties
+        with NVMLContext() as ctx:
+            return warn_return(ctx.num_devices(), "NVML found nvidia devices. ")
+    except Exception:
+        # Fallback
+        # Note this will initialize all GPUs and therefore has side effect
+        # https://github.com/tensorflow/tensorflow/issues/8136
+        logger.info("Loading local devices by TensorFlow ...")
+        from tensorflow.python.client import device_lib
+        local_device_protos = device_lib.list_local_devices()
+        return len([x.name for x in local_device_protos if x.device_type == 'GPU'])

 get_nr_gpu = get_num_gpu
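With the new `warn_return` wrapper, every successful detection path (CUDA_VISIBLE_DEVICES, nvidia-smi, NVML) passes through one place that cross-checks the GPU count against the TF build. Callers don't change; a usage sketch:

```
# Usage is unchanged; the CUDA-build warning is a side effect of the call.
from tensorpack.utils.gpu import get_num_gpu

n = get_num_gpu()
# On a machine with GPUs but a CPU-only TF wheel, this logs e.g.
# "Found nvidia-smi. But TensorFlow was not built with CUDA support!"
print("training on %d GPU(s)" % n)
```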