Commit cdbbbc5d authored by Yuxin Wu

a3c curve

parent e750306b
-Reproduce DQN in:
+Reproduce the following methods:
++ Nature-DQN in:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
-and Double-DQN in:
++ Double-DQN in:
[Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461)
-Can reproduce the claimed performance, on several games I've tested with.
++ A3C in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783). (I
+used a modified version where each batch contains transitions from different simulators, which I called "Batch A3C".)
+Claimed performance in the paper can be reproduced, on several games I've tested with.
-![DQN](curve-breakout.png)
+![DQN](DoubleDQN-breakout.png)
-A demo trained with Double-DQN on breakout game is available at [youtube](https://youtu.be/o21mddZtE5Y).
+A demo trained with Double-DQN on breakout is available at [youtube](https://youtu.be/o21mddZtE5Y).
+DQN would typically take 2~3 days of training to reach a score of 400 on breakout, but it only takes <4 hours on 1 GPU with my A3C implementation.
+This is probably the fastest RL trainer you'd find.
## How to use
@@ -30,4 +36,4 @@ To visualize the agent:
./DQN.py --rom breakout.bin --task play --load pretrained.model
```
-A3C code and curve will be available soon. It learns much faster.
+A3C code will be released at the end of August.
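
The "Batch A3C" variant is only described in one sentence in the README text above: every training batch mixes transitions coming from many concurrent simulators, rather than each actor computing its own update from its own trajectory. Below is a minimal, hypothetical Python sketch of that batching idea; the `Simulator` class, the `train_step` stub, and the queue/batch sizes are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical sketch of the "Batch A3C" batching idea (not tensorpack code):
# many simulator threads feed one queue; each training batch is drawn from
# whatever mixture of simulators produced transitions most recently.
import queue
import random
import threading


class Simulator(threading.Thread):
    """One actor: steps its own environment copy and emits transitions."""

    def __init__(self, sim_id, trans_queue):
        super().__init__(daemon=True)
        self.sim_id = sim_id
        self.trans_queue = trans_queue

    def run(self):
        state = 0  # stand-in for env.reset()
        while True:
            action = random.randrange(4)   # stand-in for sampling from the policy
            reward = random.random()       # stand-in for env.step(action)
            next_state = state + 1
            # Each transition is tagged with the simulator that produced it.
            self.trans_queue.put((self.sim_id, state, action, reward, next_state))
            state = next_state


def train_step(batch):
    # Placeholder for one policy/value gradient update on the batch.
    sims = {trans[0] for trans in batch}
    print("updating on %d transitions from %d different simulators"
          % (len(batch), len(sims)))


def batch_a3c_loop(num_simulators=16, batch_size=128, num_batches=10):
    trans_queue = queue.Queue(maxsize=10000)
    for i in range(num_simulators):
        Simulator(i, trans_queue).start()
    for _ in range(num_batches):
        # Unlike vanilla A3C (one update per actor), a batch here mixes
        # transitions from different simulators before each update.
        batch = [trans_queue.get() for _ in range(batch_size)]
        train_step(batch)


if __name__ == "__main__":
    batch_a3c_loop()
```

Mixing simulators inside one batch keeps a single GPU busy with large updates, which is consistent with the speed-up claimed above.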
@@ -15,7 +15,7 @@ from .utils import change_env, get_dataset_dir
from .fs import download
from . import logger
-__all__ = ['load_caffe']
+__all__ = ['load_caffe', 'get_caffe_pb']
CAFFE_PROTO_URL = "https://github.com/BVLC/caffe/raw/master/src/caffe/proto/caffe.proto"
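
For context on the `get_caffe_pb` name newly exported above: given the `CAFFE_PROTO_URL` constant and the `download`/`logger` imports shown in this hunk, such a helper plausibly downloads `caffe.proto` and compiles it with `protoc` so the Caffe protobuf definitions can be imported. The following is a hedged sketch of that idea only; the cache directory, the use of `urllib` instead of the module's own `download` helper, and the exact `protoc` invocation are assumptions.

```python
# Hypothetical sketch of a get_caffe_pb() helper, inferred from the imports
# and CAFFE_PROTO_URL shown in the diff above; not the repository's actual code.
import importlib
import os
import subprocess
import sys
import urllib.request

CAFFE_PROTO_URL = "https://github.com/BVLC/caffe/raw/master/src/caffe/proto/caffe.proto"


def get_caffe_pb(cache_dir="~/.cache/caffe_proto"):
    """Download caffe.proto, compile it with protoc, and import the result."""
    cache_dir = os.path.expanduser(cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    proto_path = os.path.join(cache_dir, "caffe.proto")
    if not os.path.isfile(proto_path):
        # The real module would presumably reuse its own `download` helper.
        urllib.request.urlretrieve(CAFFE_PROTO_URL, proto_path)
    if not os.path.isfile(os.path.join(cache_dir, "caffe_pb2.py")):
        # Requires the protobuf compiler (protoc) to be installed on PATH.
        subprocess.check_call(["protoc", "--python_out=.", "caffe.proto"],
                              cwd=cache_dir)
    sys.path.insert(0, cache_dir)
    return importlib.import_module("caffe_pb2")
```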