Commit 1a5d3f4f authored by Yuxin Wu

get_nr_gpu() now really counts the GPUs

parent 1e790a93
@@ -59,9 +59,10 @@ there are ways to understand which one is the bottleneck:
 ### Load ImageNet efficiently
-We take ImageNet dataset as an example of how to optimize a DataFlow for speed.
+We take ImageNet dataset as an example of how to optimize a DataFlow.
 We use ILSVRC12 training set, which contains 1.28 million images.
-Following the [ResNet example](../examples/ResNet), our pre-processing need images in their original resolution, so we don't resize them.
+Following the [ResNet example](../examples/ResNet), our pre-processing need images in their original resolution, so we'll read the original
+dataset instead of a down-sampled version here.
 The average resolution is about 400x350 <sup>[[1]]</sup>.
 The original images (JPEG compressed) are 140G in total.
......
 ### Code and models for Atari games in gym
-Implemented A3C in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783).
+Implemented Multi-GPU version of the A3C algorithm in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783).
 Results of the same code trained on 47 different Atari games were uploaded on OpenAI Gym.
 You can see them in [my gym page](https://gym.openai.com/users/ppwwyyxx).
@@ -8,14 +8,16 @@ Most of them are the best reproducible results on gym.
 ### To train on an Atari game:
-`./train-atari.py --env Breakout-v0 --gpu 0`
+`CUDA_VISIBLE_DEVICES=0 ./train-atari.py --env Breakout-v0`
 It should run at a speed of 6~10 iteration/s on 1 GPU plus 12+ CPU cores.
-Training with a significant slower speed (e.g. on CPU) will give bad performance,
+Training with a significant slower speed (e.g. on CPU) will result in very bad score,
 probably because of async issues.
 The pre-trained models are all trained with 4 GPUs for about 2 days.
+But note that multi-GPU doesn't give you obvious speedup here,
+because the bottleneck is not computation but data.
-Occasionally processes may not get terminated completely, therefore it is suggested to use systemd-run to run any
+Occasionally, processes may not get terminated completely, therefore it is suggested to use `systemd-run` to run any
 multiprocess Python program to get a cgroup dedicated for the task.
 ### To run a pretrained Atari model for 100 episodes:
......
@@ -254,8 +254,8 @@ if __name__ == '__main__':
     elif args.task == 'eval':
         eval_model_multithread(cfg, EVAL_EPISODE)
     else:
-        if args.gpu:
-            nr_gpu = get_nr_gpu()
+        nr_gpu = get_nr_gpu()
+        if nr_gpu > 0:
             if nr_gpu > 1:
                 predict_tower = range(nr_gpu)[-nr_gpu // 2:]
             else:
......
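Note on the hunk above: the slice `range(nr_gpu)[-nr_gpu // 2:]` can be easy to misread, because unary minus binds tighter than `//`. The sketch below only evaluates that one expression for a few GPU counts; `predict_towers` is a hypothetical helper name used for illustration and is not part of the commit, and nothing here covers the elided `else:` branch.

```python
def predict_towers(nr_gpu):
    """Return the GPU indices picked by range(nr_gpu)[-nr_gpu // 2:]."""
    # -nr_gpu // 2 parses as (-nr_gpu) // 2, which floors toward
    # negative infinity, so the slice keeps the last ceil(nr_gpu / 2)
    # indices rather than the last floor(nr_gpu / 2).
    return list(range(nr_gpu))[-nr_gpu // 2:]


if __name__ == '__main__':
    # The commit only takes this path when nr_gpu > 1.
    for n in (2, 3, 4, 8):
        print(n, predict_towers(n))
```

With an odd count such as 3, the slice selects two of the three indices, so the prediction side gets the larger share.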
@@ -7,6 +7,7 @@ import numpy as np
 import tensorflow as tf
 import os
 import sys
+import cv2
 import argparse
 from tensorpack import *
......
@@ -5,6 +5,7 @@
 import os
 from .utils import change_env
+from . import logger
 __all__ = ['change_gpu', 'get_nr_gpu']
@@ -26,5 +27,10 @@ def get_nr_gpu():
         int: the number of GPU from ``CUDA_VISIBLE_DEVICES``.
     """
     env = os.environ.get('CUDA_VISIBLE_DEVICES', None)
-    assert env is not None, 'gpu not set!'  # TODO
-    return len(env.split(','))
+    if env is not None:
+        return len(env.split(','))
+    logger.info("Loading local devices by TensorFlow ...")
+    from tensorflow.python.client import device_lib
+    device_protos = device_lib.list_local_devices()
+    gpus = [x.name for x in device_protos if x.device_type == 'GPU']
+    return len(gpus)
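Review note: one edge case may be worth checking in the new `get_nr_gpu()`. In Python, `''.split(',')` returns `['']`, so an empty `CUDA_VISIBLE_DEVICES` would be counted as one GPU. The sketch below is a standalone illustration of the same two-step lookup, not the committed code. `count_gpus`, its `environ` parameter, and its `list_devices` callback are hypothetical names introduced here so the logic can run without TensorFlow, and treating an empty value as zero GPUs is an assumption about the intended behavior.

```python
import os


def count_gpus(environ=None, list_devices=None):
    """Count GPUs from CUDA_VISIBLE_DEVICES, else from a device lister.

    environ: mapping to read instead of os.environ (hypothetical hook).
    list_devices: callable returning device-type strings such as
        'CPU' or 'GPU'; it stands in for the TensorFlow device query.
    """
    environ = os.environ if environ is None else environ
    value = environ.get('CUDA_VISIBLE_DEVICES')
    if value is not None:
        # Drop empty entries so that an empty string counts as 0 GPUs
        # (assumption: an empty value means no device is visible).
        return len([d for d in value.split(',') if d.strip()])
    if list_devices is None:
        return 0
    # Fall back to enumerating devices, as the commit does via TensorFlow.
    return sum(1 for kind in list_devices() if kind == 'GPU')


if __name__ == '__main__':
    print(count_gpus({'CUDA_VISIBLE_DEVICES': '0,1'}))
    print(count_gpus({'CUDA_VISIBLE_DEVICES': ''}))
    print(count_gpus({}, list_devices=lambda: ['CPU', 'GPU', 'GPU']))
```

Passing the environment and the device lister as arguments keeps the function testable on machines without a GPU; the committed version reads `os.environ` and TensorFlow directly.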