[Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783),
with <500 lines of code.
Results of the same code trained on 47 different Atari games were uploaded to OpenAI Gym.
You can see them on [my gym page](https://gym.openai.com/users/ppwwyyxx).
Most of them were the best reproducible results on gym.
However, OpenAI later completely removed the leaderboard from their site.
### To train on an Atari game:
...

The speed is about 6~10 iterations/s on 1 GPU plus 12+ CPU cores.
With 2 TitanX + 20+ CPU cores, by setting `SIMULATOR_PROC=240, PREDICT_BATCH_SIZE=30, PREDICTOR_THREAD_PER_GPU=6`, it can improve to 16 it/s (2K images/s).
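A minimal sketch of what that tuning looks like, assuming (as the wording above suggests) the three settings are module-level constants near the top of `train-atari.py`; check your copy of the script for where they actually live:

```python
# Hypothetical excerpt of train-atari.py, tuned for 2 GPUs and 20+ CPU cores.
SIMULATOR_PROC = 240           # number of parallel game-simulator processes
PREDICT_BATCH_SIZE = 30        # states batched into a single forward pass
PREDICTOR_THREAD_PER_GPU = 6   # inference threads feeding each GPU
```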
Note that the network architecture is larger than what's used in the original paper.
The pretrained models are all trained with 4 GPUs for about 2 days.
But on simple games like Breakout, you can get good performance within several hours.
Also note that multi-GPU doesn't give you obvious speedup here,
because the bottleneck in this implementation is not computation but simulation.
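To illustrate why (a toy sketch of the batched-prediction pattern, not the actual tensorpack code): simulator processes push states into a shared queue and a few predictor threads drain it in batches, so extra GPUs simply idle whenever simulation cannot keep the queue full.

```python
import queue

state_queue = queue.Queue()  # filled by the many simulator (game) processes

def predictor_loop(predict_fn, batch_size=30):
    """Toy predictor thread: batch whatever states are ready, then run one forward pass."""
    while True:
        batch = [state_queue.get()]                     # block until at least one state arrives
        while len(batch) < batch_size:
            try:
                batch.append(state_queue.get_nowait())  # take whatever else is already queued
            except queue.Empty:
                break                                   # simulators fell behind: the GPU is under-fed
        predict_fn(batch)                               # GPU work; cheap compared to simulation here
```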
Some practical notes:
...

Download models from [model zoo](http://models.tensorpack.com/OpenAIGym/).
Watch the agent play:
`./train-atari.py --task play --env Breakout-v0 --load Breakout-v0.npz`
All models above are trained with the `-v0` variant of atari games.
Note that this variant is quite different from DeepMind papers, so the scores are not directly comparable.
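For reference, a minimal sketch of the two settings using the standard `gym` API (environment IDs only; the exact emulator parameters behind each ID depend on your gym version):

```python
import gym

# "-v0": the variant used for the models above; each action is repeated a random
# number of frames, among other differences from the DeepMind setup.
env_v0 = gym.make("Breakout-v0")

# "Deterministic-v4": fixed frame skip and no action-repeat randomness,
# much closer to the settings used in the DeepMind papers.
env_dm = gym.make("BreakoutDeterministic-v4")
```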