Commit 4bc639da authored by Yuxin Wu

update readme

parent 3270acb8
@@ -12,8 +12,12 @@ See some interesting [examples](https://github.com/ppwwyyxx/tensorpack/tree/mast
 ## Features:
-Focused on modularity:
-+ Models has scoped abstraction of common models.
-+ Dataflow defines data preprocessing in pure Python.
-+ Callbacks systems controls training behavior.
+Focused on modularity. Just have to define the three components in training:
+1. The model, or the graph. Define its input and output. `models/` has some scoped abstraction of common models.
+2. The data. All data producer has a unified `DataFlow` interface, and this interface can be chained
+to perform complex preprocessing. It uses multiprocess to avoid performance bottleneck.
+3. The callbacks. They include everything you want to do besides the training iterations:
+change hyperparameters, save model, print logs, run validation, and more.
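The three-component split described in the new README text (model, chained data producers, callbacks) can be illustrated with a toy, pure-Python sketch. Every name in it (`Flow`, `ListFlow`, `MapFlow`, `EpochCallback`, `HalveLearningRate`, `RecordLoss`, `Trainer`) is hypothetical and is not tensorpack's API. The "model" is a one-parameter linear fit chosen only for illustration, and the multiprocess prefetching the README mentions is left out.

```python
# Toy sketch of the "model / data / callbacks" split. All names here are
# hypothetical stand-ins, NOT tensorpack's API; multiprocess prefetching
# is deliberately omitted.


class Flow:
    """Hypothetical data producer: iterating it yields datapoints."""

    def __iter__(self):
        raise NotImplementedError


class ListFlow(Flow):
    """Yields (x, y) datapoints from an in-memory list."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)


class MapFlow(Flow):
    """Wraps another Flow and applies func to every datapoint,
    so preprocessing steps can be chained."""

    def __init__(self, upstream, func):
        self.upstream = upstream
        self.func = func

    def __iter__(self):
        for dp in self.upstream:
            yield self.func(dp)


class EpochCallback:
    """Hypothetical hook that runs after each epoch, outside the iterations."""

    def after_epoch(self, trainer, epoch):
        pass


class HalveLearningRate(EpochCallback):
    """Changes a hyperparameter: halves the learning rate after each epoch."""

    def after_epoch(self, trainer, epoch):
        trainer.lr *= 0.5


class RecordLoss(EpochCallback):
    """Logs the mean training loss of each epoch."""

    def __init__(self):
        self.history = []

    def after_epoch(self, trainer, epoch):
        self.history.append(trainer.epoch_loss)


class Trainer:
    """The toy "model": fits y = w * x by SGD on the squared error."""

    def __init__(self, flow, callbacks, lr=0.1):
        self.flow = flow
        self.callbacks = callbacks
        self.lr = lr
        self.w = 0.0
        self.epoch_loss = 0.0

    def train(self, epochs):
        for epoch in range(epochs):
            losses = []
            for x, y in self.flow:
                err = self.w * x - y
                losses.append(err * err)
                self.w -= self.lr * 2 * err * x  # d(err**2)/dw = 2 * err * x
            self.epoch_loss = sum(losses) / len(losses)
            for cb in self.callbacks:
                cb.after_epoch(self, epoch)


# Chain a source and a preprocessing step, then train with two callbacks.
raw = ListFlow([(float(x), 2.0 * x) for x in range(1, 5)])
scaled = MapFlow(raw, lambda dp: (dp[0] / 4.0, dp[1] / 4.0))
recorder = RecordLoss()
trainer = Trainer(scaled, [HalveLearningRate(), recorder], lr=0.5)
trainer.train(epochs=5)
print(round(trainer.w, 6), recorder.history)
```

Because each preprocessing step only wraps its upstream producer, steps can be stacked without changing the training loop. Likewise, the logging and hyperparameter changes live in callbacks rather than in `Trainer.train`.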
-Implement DQN in:
+Reproduce DQN in:
 **Human-level Control Through Deep Reinforcement Learning**
@@ -6,10 +6,26 @@ and Double-DQN in:
 **Deep Reinforcement Learning with Double Q-learning**
-To run:
+Can reproduce the claimed performance, on several games I've tested with.
+![DQN](https://github.com/ppwwyyxx/tensorpack/raw/master/examples/Atari2600/DoubleDQN-breakout.png)
+A demo trained with Double-DQN on breakout is available at [youtube](https://youtu.be/o21mddZtE5Y).
+## How to use
+Download [atari roms](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms) to
+`$TENSORPACK_DATASET/atari_rom` (defaults to tensorpack/dataflow/dataset/atari_rom).
+To train:
 ```
-./DQN.py --rom breakout.rom --gpu 0
+./DQN.py --rom breakout.bin --gpu 0
 ```
-Can reproduce the claimed performance, on games I've tested with (curves will be available soon).
-A demo trained with Double-DQN on breakout is available at [youtube](https://youtu.be/o21mddZtE5Y).
+Training speed is about 7.3 iteration/s on 1 Tesla M40. It takes days to learn well (see figure above).
+To play:
+```
+./DQN.py --rom breakout.bin --task play --load pretrained.model
+```
+A3C code and curve will be available soon. It learns much faster.
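For a rough sense of scale, here is simple arithmetic on the quoted figure. It is not a measured result. At about 7.3 iterations per second, one day of training corresponds to roughly:

$$
7.3\ \tfrac{\text{iterations}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} = 630\,720\ \tfrac{\text{iterations}}{\text{day}} \approx 6.3 \times 10^{5}\ \tfrac{\text{iterations}}{\text{day}}
$$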
@@ -9,5 +9,7 @@ The validation error here is computed on test set.
 ![cifar10](https://github.com/ppwwyyxx/tensorpack/raw/master/examples/ResNet/cifar10-resnet.png)
-Download model:
-[Cifar10 n=18](https://drive.google.com/open?id=0B308TeQzmFDLeHpSaHAxWGV1WDg)
+<!--
+-Download model:
+-[Cifar10 n=18](https://drive.google.com/open?id=0B308TeQzmFDLeHpSaHAxWGV1WDg)
+-->
@@ -26,6 +26,7 @@ __all__ = ['AtariPlayer']
 def log_once():
     logger.warn("https://github.com/mgbellemare/Arcade-Learning-Environment/pull/171 is not merged!")
+ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms"
 _ALE_LOCK = threading.Lock()
 class AtariPlayer(RLEnvironment):
@@ -51,7 +52,8 @@ class AtariPlayer(RLEnvironment):
         super(AtariPlayer, self).__init__()
         if not os.path.isfile(rom_file) and '/' not in rom_file:
             rom_file = os.path.join(get_dataset_dir('atari_rom'), rom_file)
-        assert os.path.isfile(rom_file), "rom {} not found".format(rom_file)
+        assert os.path.isfile(rom_file), \
+            "rom {} not found. Please download at {}".format(rom_file, ROM_URL)
         try:
             ALEInterface.setLoggerMode(ALEInterface.Logger.Warning)
......
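The ROM lookup touched in the last hunk can be sketched as a standalone function. It follows the same rule as the `AtariPlayer` code: a bare name that is not an existing file and contains no `/` is looked up in the dataset directory. `resolve_rom_path` and its `dataset_dir` parameter are hypothetical, and `dataset_dir` stands in for `get_dataset_dir('atari_rom')`. Unlike the diff, the sketch raises `FileNotFoundError` instead of using `assert`.

```python
import os
import tempfile

# URL copied from the diff; the function below is a hypothetical sketch.
ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms"


def resolve_rom_path(rom_file, dataset_dir):
    """Return a path to an existing ROM file.

    A name that is not an existing file and contains no '/' is looked up
    inside dataset_dir (a stand-in for get_dataset_dir('atari_rom')).
    """
    if not os.path.isfile(rom_file) and '/' not in rom_file:
        rom_file = os.path.join(dataset_dir, rom_file)
    if not os.path.isfile(rom_file):
        # Explicit exception instead of assert, so the hint survives `python -O`.
        raise FileNotFoundError(
            "rom {} not found. Please download at {}".format(rom_file, ROM_URL))
    return rom_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as rom_dir:
        # An empty placeholder file stands in for a downloaded ROM.
        open(os.path.join(rom_dir, "breakout.bin"), "wb").close()
        print(resolve_rom_path("breakout.bin", rom_dir))
```

The switch to an exception is a design choice, not what the commit does. Python skips `assert` statements when run with `-O`, so an explicit exception keeps the download hint in every mode.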