update docs

22978393 · Yuxin Wu · 7cd93047
Commit 22978393 authored Sep 17, 2017 by Yuxin Wu
Showing 3 changed files with 9 additions and 20 deletions

docs/tutorial/performance-tuning.md +6 -4

examples/DoReFa-Net/README.md +2 -15

tensorpack/train/base.py +1 -1

--- a/docs/tutorial/performance-tuning.md
+++ b/docs/tutorial/performance-tuning.md
@@ -10,7 +10,7 @@ Here's a list of things you can do when your training is slow:
 2. If you use queue-based input + dataflow, you can look for the queue size statistics in
 	 training log. Ideally the queue should be near-full (default size is 50).
 	 If the size is near-zero, data is the bottleneck.
-3. If the GPU utilization is low, data is likely to be the bottleneck. Also make sure GPUs are not locked in P8 state.
+3. If the GPU utilization is low, it may be because of slow data, or some ops are on CPU. Also make sure GPUs are not locked in P8 state.

 ## Benchmark the components
 1. Use `data=DummyConstantInput(shapes)` in `TrainConfig`,
@@ -48,11 +48,13 @@ know the reason and improve it accordingly, e.g.:
 ## Improve TensorFlow

 You can add a `GraphProfiler` callback when benchmarking the graph. It will
-dump TF tracing information (to either TensorBoard or chrome) to help diagnose the issue.
+dump runtime tracing information (to either TensorBoard or chrome) to help diagnose the issue.

 Usually there isn't much you can do if a TF op is slow, except to optimize the kernels.
 But there may be something cheap you can try:
-1. Device placement of ops can affect speed,
-	 sometimes it helps to change device placement to avoid some copy.
+1. You can visualize copies across devices in chrome.
+	 It may help to change device placement to avoid copies.
+	 It may help to replace some CPU-only ops with equivalent GPU ops to avoid copies.
+
 2. Sometimes there are several mathematically equivalent ways of writing the same model
 	 with different speed.
--- a/examples/DoReFa-Net/README.md
+++ b/examples/DoReFa-Net/README.md
@@ -18,27 +18,14 @@ Alternative link to this page: [http://dorefa.net](http://dorefa.net)

 ## Preparation:

-To use the script. You'll need:
-
-+ TensorFlow >= 1.0.0 (>=1.1 for MultiGPU)
-
-+ OpenCV bindings for Python
-
-+ [tensorpack](https://github.com/ppwwyyxx/tensorpack):
-
-```
-git clone https://github.com/ppwwyyxx/tensorpack
-pip install --user -r tensorpack/requirements.txt
-pip install --user scipy
-export PYTHONPATH=$PYTHONPATH:`readlink -f tensorpack`
-```
++ Install [tensorpack](https://github.com/ppwwyyxx/tensorpack) and scipy.

 + Look at the docstring in `*-dorefa.py` to see detailed usage and performance.

 ## Support

 Please use [github issues](https://github.com/ppwwyyxx/tensorpack/issues) for any issues related to the code itself.
-Send email to the authors for general questions related to the paper.
+Please send email to the authors for general questions related to the paper.

 ## Citation


--- a/tensorpack/train/base.py
+++ b/tensorpack/train/base.py
@@ -40,7 +40,7 @@ class MaintainStepCounter(Callback):
        # ensure it exists
        gs_var = get_global_step_var()
        with tf.name_scope(None):
-            with tf.device(gs_var.device):
+            with self.graph.colocate_with(gs_var):
                self.gs_incr_op = tf.assign_add(
                    gs_var, 1,
                    name=GLOBAL_STEP_INCR_OP_NAME).op