a3c curve

Commit cdbbbc5d authored Jul 12, 2016 by Yuxin Wu
4 changed files
--- a/examples/Atari2600/DoubleDQN-breakout.png
+++ b/examples/Atari2600/DoubleDQN-breakout.png
--- a/examples/Atari2600/README.md
+++ b/examples/Atari2600/README.md
-Reproduce DQN in:
+Reproduce the following methods:

+ Nature-DQN in:
 [Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)

-and Double-DQN in:
-
+ Double-DQN in:
 [Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461)

-Can reproduce the claimed performance, on several games I've tested with.
+ A3C in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783). (I
+use a modified version in which each batch contains transitions from different simulators, which I call "Batch A3C".)
+
+The performance claimed in the paper can be reproduced on the several games I've tested.
+
+![DQN](curve-breakout.png)

-![DQN](DoubleDQN-breakout.png)
+A demo trained with Double-DQN on the Breakout game is available on [YouTube](https://youtu.be/o21mddZtE5Y).

-A demo trained with Double-DQN on breakout is available at [youtube](https://youtu.be/o21mddZtE5Y).
+DQN typically takes 2-3 days of training to reach a score of 400 on Breakout, but my A3C implementation gets there in under 4 hours on 1 GPU.
+This is probably the fastest RL trainer you'll find.

 ## How to use

@@ -30,4 +36,4 @@ To visualize the agent:
 ./DQN.py --rom breakout.bin --task play --load pretrained.model
 ```

-A3C code and curve will be available soon. It learns much faster.
+A3C code will be released at the end of August.
--- a/examples/Atari2600/curve-breakout.png
+++ b/examples/Atari2600/curve-breakout.png
--- a/tensorpack/utils/loadcaffe.py
+++ b/tensorpack/utils/loadcaffe.py
@@ -15,7 +15,7 @@ from .utils import change_env, get_dataset_dir
 from .fs import download
 from . import logger

-__all__ = ['load_caffe']
+__all__ = ['load_caffe', 'get_caffe_pb']

 CAFFE_PROTO_URL = "https://github.com/BVLC/caffe/raw/master/src/caffe/proto/caffe.proto"