Commit 978f06a6 authored by Yuxin Wu

update readme

parent ceba3217
@@ -44,7 +44,7 @@ the rest of the data pipeline.
### Why DataFlow
1. It's easy: write everything in pure Python, and reuse existing utilities.
-  On the contrary, writing data loaders in TF operators or other frameworks is usually painful.
+  On the contrary, writing data loaders in TF operators is usually painful, and performance is hard to tune.
2. It's fast: see [Efficient DataFlow](efficient-dataflow.html)
   on how to build a fast DataFlow with parallel prefetching.
   If you're using DataFlow with tensorpack, also see [Input Pipeline tutorial](input-source.html)
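For illustration only (not part of this commit): a minimal sketch of what a pure-Python DataFlow could look like, reusing an existing batching utility. The class name and the dummy integer data are made up, and depending on the tensorpack version the methods may be named `__iter__`/`__len__` instead of `get_data`/`size`.

```python
# Hypothetical example: a DataFlow written in plain Python, then batched.
from tensorpack.dataflow import DataFlow, BatchData

class FakeIntegers(DataFlow):
    """Yields dummy datapoints [x, 2*x]; replace with real loading logic."""
    def get_data(self):      # named __iter__ in newer tensorpack versions
        for i in range(100):
            yield [i, 2 * i]

    def size(self):          # named __len__ in newer tensorpack versions
        return 100

df = BatchData(FakeIntegers(), batch_size=16)
df.reset_state()             # must be called once before iterating
for batch in df.get_data():
    pass                     # hand each batch to the rest of the pipeline
```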
@@ -89,9 +89,10 @@ to let this method run every k steps or every k epochs.
### What you can do in the callback
* Access tensors / ops in either training / inference mode (need to create them in `_setup_graph`).
-  To create a callable function under inference mode, use `self.trainer.get_predictor`.
+  `self.trainer.get_predictor` is a helper function to create a callable under inference mode.
* Write stuff to the monitor backend, by `self.trainer.monitors.put_xxx`.
  The monitors might direct your events to TensorFlow events file, JSON file, stdout, etc.
  You can get history monitor data as well. See the docs for [Monitors](../../modules/callbacks.html#tensorpack.callbacks.Monitors)
* Access the current status of training, such as `epoch_num`, `global_step`. See [here](../../modules/callbacks.html#tensorpack.callbacks.Callback)
+ * Stop training by `raise StopTraining()` (with `from tensorpack.train import StopTraining`).
* Anything else that can be done with plain python.
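For illustration only (not part of this commit): a rough sketch of a callback that ties the points above together. The tensor names, the metric name, and the stopping condition are all hypothetical.

```python
# Hypothetical callback sketch using the Callback interface linked above.
from tensorpack.callbacks import Callback
from tensorpack.train import StopTraining

class MyCallback(Callback):
    def _setup_graph(self):
        # Create tensors/ops here; get_predictor returns a callable
        # that runs under inference mode ('input'/'output' are placeholders).
        self._pred = self.trainer.get_predictor(['input'], ['output'])

    def _trigger(self):
        # Write a (made-up) scalar to the monitor backend
        # (TensorFlow events file, JSON file, stdout, ...).
        self.trainer.monitors.put_scalar('my_metric', 0.0)
        # Access the current status of training.
        print("epoch {}, step {}".format(self.epoch_num, self.global_step))
        # Stop training once some condition is met.
        if self.epoch_num >= 100:
            raise StopTraining()
```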
- ### Code and models for Atari games in gym
+ ### A3C code and models for Atari games in gym
- Implemented Multi-GPU version of the A3C algorithm in [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783).
+ Multi-GPU version of the A3C algorithm in
+ [Asynchronous Methods for Deep Reinforcement Learning](http://arxiv.org/abs/1602.01783),
+ with <500 lines of code.
Results of the same code trained on 47 different Atari games were uploaded on OpenAI Gym.
You can see them in [my gym page](https://gym.openai.com/users/ppwwyyxx).
@@ -15,7 +17,7 @@ The speed is about 6~10 iterations/s on 1 GPU plus 12+ CPU cores.
With 2 TitanX + 20+ CPU cores, by setting `SIMULATOR_PROC=240, PREDICT_BATCH_SIZE=30, PREDICTOR_THREAD_PER_GPU=6`, it can improve to 16 it/s (2K images/s).
Note that the network architecture is larger than what's used in the original paper.
- The pre-trained models are all trained with 4 GPUs for about 2 days.
+ The uploaded models are all trained with 4 GPUs for about 2 days.
But on simple games like Breakout, you can get good performance within several hours.
Also note that multi-GPU doesn't give you obvious speedup here,
because the bottleneck in this implementation is not computation but data.
@@ -25,7 +27,7 @@ Some practical notes:
1. Prefer Python 3.
2. Occasionally, processes may not get terminated completely. It is suggested to use `systemd-run` to run any
   multiprocess Python program to get a cgroup dedicated for the task.
- 3. Training with a significant slower speed (e.g. on CPU) will result in very bad score, probably because of async issues.
+ 3. Training with a significantly slower speed (e.g. on CPU) will result in a very bad score, probably because of the slightly off-policy implementation.

### To test a model: