These trainers will take care of step 1 (define the graph), with the following arguments:
4. A function which returns an optimizer.
These are documented in [SingleCostTrainer.setup_graph](../modules/train.html#tensorpack.train.SingleCostTrainer.setup_graph).
In practice you'll not use this method directly, but use [high-level interface](../tutorial/training-interface.html#with-modeldesc-and-trainconfig) instead.
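As a rough illustration of that high-level interface, below is a minimal sketch using `ModelDesc` and `TrainConfig`; the toy network, the `FakeData` input, and the hyper-parameters are placeholders chosen only to keep the snippet self-contained, not something taken from this example.

```python
import tensorflow as tf
from tensorpack import ModelDesc, TrainConfig, SimpleTrainer, launch_train_with_config
from tensorpack.dataflow import FakeData

class MyModel(ModelDesc):
    def inputs(self):
        # The signature of the input tensors.
        return [tf.TensorSpec([None, 28, 28], tf.float32, 'image'),
                tf.TensorSpec([None], tf.int32, 'label')]

    def build_graph(self, image, label):
        # Take the input tensors and return the cost to minimize.
        logits = tf.layers.dense(tf.layers.flatten(image), 10)
        return tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits))

    def optimizer(self):
        # Return the optimizer to use.
        return tf.train.AdamOptimizer(1e-3)

# FakeData only provides random batches so the sketch runs end-to-end.
df = FakeData([[64, 28, 28], [64]], size=1000, dtype=['float32', 'int32'])
config = TrainConfig(model=MyModel(), dataflow=df, max_epoch=1)
launch_train_with_config(config, SimpleTrainer())
```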
[Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783).
Results of the code trained on 47 different Atari games were uploaded to OpenAI Gym and are available for download.
Most of them were the best reproducible results on gym.
However, OpenAI later removed the leaderboard from their site.
### To train on an Atari game:
`./train-atari.py --env Breakout-v0 --gpu 0`
In each iteration it trains on a batch of 128 new states.
The speed is about 20 iterations/s (2.5k images/s) on 1 V100 GPU plus 12+ CPU cores.
Note that the network architecture is larger than what's used in the original paper.
The pretrained models are all trained with 4 GPUs for about 2 days.
Also note that multi-GPU doesn't give you obvious speedup here,
because the bottleneck in this implementation is not computation but simulation.
But on simple games like Breakout, you can get decent performance within several hours.
For example, it takes only __2 hours__ on a V100 to reach 400 average score on Breakout.
Some practical notes:
...
...
The most notable differences are:
+ An episode is limited to 60000 steps.
+ Loss of a life is not the end of an episode.
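For illustration, here is a hedged sketch of how these two settings could be expressed with plain `gym` wrappers; the actual environment setup used by `train-atari.py` lives in the example's own wrapper code and may differ:

```python
import gym
from gym.wrappers import TimeLimit

# Illustrative only: not the wrapper stack actually used by this example.
env = gym.make("Breakout-v0").unwrapped   # drop gym's default step limit

# An episode is capped at 60000 environment steps.
env = TimeLimit(env, max_episode_steps=60000)

# No "episodic life" wrapper is applied, so losing a life does not end the
# episode; only the emulator's game-over signal terminates it.
```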
Also see the [DQN implementation in tensorpack](../DeepQNetwork).