Commit 74e3eeef authored by Yuxin Wu

docs update

parent a9958037
@@ -47,6 +47,14 @@ Another good thing about Dataflow is that it is independent of
tensorpack internals. You can just use it as an efficient data processing pipeline,
and plug it into other frameworks.
To use a DataFlow, you'll need to call `reset_state()` first to initialize it, and then use the generator however you
want:
```python
df = get_some_df()
df.reset_state()
generator = df.get_data()
```
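Each datapoint yielded by the generator is a list of components. A minimal consumption sketch (`process` here is a hypothetical placeholder for your own code):
```python
for dp in generator:
    # dp is a list of components, e.g. [image_array, label]
    process(dp)  # hypothetical placeholder for your own processing
```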
### Write your own Dataflow
There are several existing Dataflow, e.g. ImageFromFile, DataFromList, which you can
......
@@ -157,7 +157,7 @@ Then we add necessary transformations:
ds = AugmentImageComponent(ds, lots_of_augmentors)
ds = BatchData(ds, 256)
```
1. `LMDBDataPoint` deserializes the datapoints (from string to [jpeg_string, label])
2. Use OpenCV to decode the first component into ndarray
3. Apply augmentations to the ndarray (a sketch of the full pipeline follows)
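Putting the three steps together, a minimal sketch of the whole pipeline; the LMDB path is a placeholder, and the `np.frombuffer` decode assumes the jpeg strings were serialized as raw bytes:
```python
import cv2
import numpy as np
from tensorpack.dataflow import (LMDBDataPoint, MapDataComponent,
                                 AugmentImageComponent, BatchData)

ds = LMDBDataPoint('/path/to/train.lmdb', shuffle=True)  # placeholder path
# step 2: decode the jpeg string (component 0) into an ndarray with OpenCV
ds = MapDataComponent(
    ds, lambda s: cv2.imdecode(np.frombuffer(s, np.uint8), cv2.IMREAD_COLOR), 0)
# step 3: augment, then batch; `lots_of_augmentors` is a list of imgaug augmentors
ds = AugmentImageComponent(ds, lots_of_augmentors)
ds = BatchData(ds, 256)
```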
......
@@ -10,8 +10,8 @@ Most of them are the best reproducible results on gym.
`CUDA_VISIBLE_DEVICES=0 ./train-atari.py --env Breakout-v0`
The speed is about 6~10 iterations/s on 1 GPU plus 12+ CPU cores.
In each iteration it trains on a batch of 128 new states. The network architecture is larger than what's used in the original paper.
The pre-trained models are all trained with 4 GPUs for about 2 days.
But on simple games like Breakout, you can get good performance within several hours.
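For multi-GPU training like the pre-trained models, the script takes a comma-separated device list via its `--gpu` flag (assuming your checkout has the same argument): `./train-atari.py --env Breakout-v0 --gpu 0,1,2,3`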
......
@@ -100,7 +100,6 @@ class SimulatorMaster(threading.Thread):
    defining callbacks when a transition or an episode is finished.
    """
    class ClientState(object):
        def __init__(self):
            self.memory = []    # list of Experience
@@ -176,6 +175,7 @@ class SimulatorMaster(threading.Thread):
        self.context.destroy(linger=0)
# ------------------- the following code is not used at all. Just experimental
class SimulatorProcessDF(SimulatorProcessBase):
    """ A simulator which contains a forward model itself, allowing
    it to produce data points directly """
......
@@ -90,7 +90,7 @@ class Model(ModelDesc):
             .MaxPooling('pool2', 2)
             .Conv2D('conv3', out_channel=64, kernel_shape=3)
             # the original arch is 2x faster
             # .Conv2D('conv0', image, out_channel=32, kernel_shape=8, stride=4)
             # .Conv2D('conv1', out_channel=64, kernel_shape=4, stride=2)
             # .Conv2D('conv2', out_channel=64, kernel_shape=3)
......
@@ -24,6 +24,8 @@ My Batch-A3C implementation only took <2 hours.
Both were trained on one GPU with an extra GPU for simulation.
Double-DQN runs at 18 batches/s (1152 frames/s) on TitanX.
Note that I wasn't using the network architecture in the paper.
If switched to the network in the paper, it could run 2x faster.
## How to use
......
@@ -187,7 +187,7 @@ class ExpReplay(DataFlow, Callback):
        history = np.stack(history, axis=2)
        # assume batched network
        q_values = self.predictor([[history]])[0][0]    # this is the bottleneck
        act = np.argmax(q_values)
        reward, isOver = self.player.action(act)
        self.mem.append(Experience(old_s, act, reward, isOver))
......