Commit 1a5d3f4f authored by Yuxin Wu

get_nr_gpu() now really counts the GPUs

parent 1e790a93
@@ -59,9 +59,10 @@ there are ways to understand which one is the bottleneck:
### Load ImageNet efficiently
-We take ImageNet dataset as an example of how to optimize a DataFlow for speed.
+We take ImageNet dataset as an example of how to optimize a DataFlow.
We use ILSVRC12 training set, which contains 1.28 million images.
-Following the [ResNet example](../examples/ResNet), our pre-processing need images in their original resolution, so we don't resize them.
+Following the [ResNet example](../examples/ResNet), our pre-processing needs images in their original resolution, so we'll read the original
+dataset instead of a down-sampled version here.
The average resolution is about 400x350 <sup>[[1]]</sup>.
The original images (JPEG compressed) are 140G in total.
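Before optimizing, it helps to time the DataFlow alone, decoupled from training. A minimal sketch of such a timing loop, using a stand-in generator rather than the real ImageNet DataFlow (the names here are illustrative, not tensorpack API):

```python
import time

def fake_dataflow(n=1000):
    # Stand-in generator for a DataFlow yielding (image, label) datapoints.
    # A real benchmark would iterate the actual ImageNet DataFlow instead.
    for i in range(n):
        yield ([0] * 224, i % 1000)

start = time.time()
count = sum(1 for _ in fake_dataflow())
elapsed = time.time() - start
print("%d datapoints in %.3fs" % (count, elapsed))
```

If this loop alone cannot keep up with the GPU's consumption rate, the DataFlow is the bottleneck and is worth optimizing first.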
......
### Code and models for Atari games in gym
-Implemented A3C in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783).
+Implemented Multi-GPU version of the A3C algorithm in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783).
Results of the same code trained on 47 different Atari games were uploaded on OpenAI Gym.
You can see them in [my gym page](https://gym.openai.com/users/ppwwyyxx).
@@ -8,14 +8,16 @@ Most of them are the best reproducible results on gym.
### To train on an Atari game:
-`./train-atari.py --env Breakout-v0 --gpu 0`
+`CUDA_VISIBLE_DEVICES=0 ./train-atari.py --env Breakout-v0`
It should run at a speed of 6~10 iterations/s on 1 GPU plus 12+ CPU cores.
-Training with a significant slower speed (e.g. on CPU) will give bad performance,
+Training at a significantly slower speed (e.g. on CPU) will result in a very bad score,
probably because of async issues.
The pre-trained models are all trained with 4 GPUs for about 2 days.
But note that multi-GPU doesn't give you an obvious speedup here,
because the bottleneck is not computation but data.
-Occasionally processes may not get terminated completely, therefore it is suggested to use systemd-run to run any
+Occasionally, processes may not get terminated completely, so it is suggested to use `systemd-run` to run any
multiprocess Python program to get a cgroup dedicated for the task.
### To run a pretrained Atari model for 100 episodes:
......
@@ -254,8 +254,8 @@ if __name__ == '__main__':
    elif args.task == 'eval':
        eval_model_multithread(cfg, EVAL_EPISODE)
    else:
-        if args.gpu:
-            nr_gpu = get_nr_gpu()
+        nr_gpu = get_nr_gpu()
+        if nr_gpu > 0:
            if nr_gpu > 1:
                predict_tower = range(nr_gpu)[-nr_gpu // 2:]
            else:
......
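The slicing in the branch above assigns the latter half (rounded up) of the GPU ids as prediction towers. A quick standalone illustration of that expression (the function name is hypothetical):

```python
def predict_towers(nr_gpu):
    # The latter half (rounded up) of the GPU ids become prediction towers;
    # note that -nr_gpu // 2 uses floor division, so odd counts round up.
    return list(range(nr_gpu))[-nr_gpu // 2:]

print(predict_towers(4))  # [2, 3]
print(predict_towers(3))  # [1, 2]  (-3 // 2 == -2 with floor division)
```

With 4 GPUs, GPUs 2 and 3 serve prediction while 0 and 1 train; with a single GPU the list is just `[0]`.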
@@ -7,6 +7,7 @@ import numpy as np
import tensorflow as tf
import os
import sys
+import cv2
import argparse
from tensorpack import *
......
@@ -5,6 +5,7 @@
import os
from .utils import change_env
+from . import logger
__all__ = ['change_gpu', 'get_nr_gpu']
@@ -26,5 +27,10 @@ def get_nr_gpu():
        int: the number of GPU from ``CUDA_VISIBLE_DEVICES``.
    """
    env = os.environ.get('CUDA_VISIBLE_DEVICES', None)
-    assert env is not None, 'gpu not set!'  # TODO
-    return len(env.split(','))
+    if env is not None:
+        return len(env.split(','))
+    logger.info("Loading local devices by TensorFlow ...")
+    from tensorflow.python.client import device_lib
+    device_protos = device_lib.list_local_devices()
+    gpus = [x.name for x in device_protos if x.device_type == 'GPU']
+    return len(gpus)
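The new behavior can be exercised in isolation. A minimal sketch mirroring the committed logic (the function name is hypothetical, and the empty-string guard is an extra assumption not present in the committed version):

```python
import os

def count_gpus():
    # Mirror the new get_nr_gpu(): trust CUDA_VISIBLE_DEVICES when set,
    # otherwise fall back to asking TensorFlow for local devices.
    env = os.environ.get('CUDA_VISIBLE_DEVICES', None)
    if env is not None:
        # An empty string means "no GPUs visible" (extra guard, see lead-in).
        return 0 if env == '' else len(env.split(','))
    # Heavy import, done lazily; note this may initialize all visible GPUs.
    from tensorflow.python.client import device_lib
    protos = device_lib.list_local_devices()
    return len([x for x in protos if x.device_type == 'GPU'])

os.environ['CUDA_VISIBLE_DEVICES'] = '0,2'
print(count_gpus())  # 2
```

The environment-variable path stays cheap, while the TensorFlow fallback only runs when `CUDA_VISIBLE_DEVICES` is unset.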