Commit 74e3eeef authored by Yuxin Wu

docs update

parent a9958037
@@ -47,6 +47,14 @@ Another good thing about Dataflow is that it is independent of
tensorpack internals. You can just use it as an efficient data processing pipeline,
and plug it into other frameworks.
To use a DataFlow, you'll need to call `reset_state()` first to initialize it, and then use the generator however you
want:
```python
df = get_some_df()
df.reset_state()
generator = df.get_data()
```
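Each datapoint yielded by the generator is a list of components. A minimal consumption sketch (`process` here is a hypothetical placeholder for your own code):
```python
for dp in generator:
    # dp is a list of components, e.g. [image_array, label]
    process(dp)  # hypothetical placeholder for your own processing
```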
### Write your own Dataflow
There are several existing Dataflow, e.g. ImageFromFile, DataFromList, which you can
......
@@ -157,7 +157,7 @@ Then we add necessary transformations:
ds = AugmentImageComponent(ds, lots_of_augmentors)
ds = BatchData(ds, 256)
```
1. `LMDBDataPoint` deserializes the datapoints (from string to [jpeg_string, label])
2. Use OpenCV to decode the first component into ndarray
3. Apply augmentations to the ndarray (a sketch of the full pipeline follows)
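Putting the three steps together, a minimal sketch of the whole pipeline; the LMDB path is a placeholder, and the `np.frombuffer` decode assumes the jpeg strings were serialized as raw bytes:
```python
import cv2
import numpy as np
from tensorpack.dataflow import (LMDBDataPoint, MapDataComponent,
                                 AugmentImageComponent, BatchData)

ds = LMDBDataPoint('/path/to/train.lmdb', shuffle=True)  # placeholder path
# step 2: decode the jpeg string (component 0) into an ndarray with OpenCV
ds = MapDataComponent(
    ds, lambda s: cv2.imdecode(np.frombuffer(s, np.uint8), cv2.IMREAD_COLOR), 0)
# step 3: augment, then batch; `lots_of_augmentors` is a list of imgaug augmentors
ds = AugmentImageComponent(ds, lots_of_augmentors)
ds = BatchData(ds, 256)
```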
......
@@ -10,8 +10,8 @@ Most of them are the best reproducible results on gym.
`CUDA_VISIBLE_DEVICES=0 ./train-atari.py --env Breakout-v0`
The speed is about 6~10 iterations/s on 1 GPU plus 12+ CPU cores.
In each iteration it trains on a batch of 128 new states. The network architecture is larger than what's used in the original paper.
The pre-trained models are all trained with 4 GPUs for about 2 days.
But on simple games like Breakout, you can get good performance within several hours.
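For multi-GPU training like the pre-trained models, the script takes a comma-separated device list via its `--gpu` flag (assuming your checkout has the same argument): `./train-atari.py --env Breakout-v0 --gpu 0,1,2,3`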
......
@@ -100,7 +100,6 @@ class SimulatorMaster(threading.Thread):
    defining callbacks when a transition or an episode is finished.
    """
    class ClientState(object):
        def __init__(self):
            self.memory = []    # list of Experience
@@ -176,6 +175,7 @@ class SimulatorMaster(threading.Thread):
        self.context.destroy(linger=0)
# ------------------- the following code is not used at all. Just experimental
class SimulatorProcessDF(SimulatorProcessBase):
    """ A simulator which contains a forward model itself, allowing
    it to produce data points directly """
......
@@ -90,7 +90,7 @@ class Model(ModelDesc):
             .MaxPooling('pool2', 2)
             .Conv2D('conv3', out_channel=64, kernel_shape=3)
             # the original arch is 2x faster
             # .Conv2D('conv0', image, out_channel=32, kernel_shape=8, stride=4)
             # .Conv2D('conv1', out_channel=64, kernel_shape=4, stride=2)
             # .Conv2D('conv2', out_channel=64, kernel_shape=3)
......
@@ -24,6 +24,8 @@ My Batch-A3C implementation only took <2 hours.
Both were trained on one GPU with an extra GPU for simulation.
Double-DQN runs at 18 batches/s (1152 frames/s) on TitanX.
Note that I wasn't using the network architecture in the paper.
If switched to the network in the paper, it could run 2x faster.
## How to use
......
@@ -187,7 +187,7 @@ class ExpReplay(DataFlow, Callback):
        history = np.stack(history, axis=2)
        # assume batched network
        q_values = self.predictor([[history]])[0][0]    # this is the bottleneck
        act = np.argmax(q_values)
        reward, isOver = self.player.action(act)
        self.mem.append(Experience(old_s, act, reward, isOver))
......