update docs

edb1f6c3 · Yuxin Wu · 72385a85 · edb1f6c3 · edb1f6c3
Commit edb1f6c3 authored Nov 06, 2017 by Yuxin Wu

Showing with 5 additions and 9 deletions

docs/tutorial/performance-tuning.md +1 -1

examples/DoReFa-Net/alexnet-dorefa.py +4 -8

--- a/docs/tutorial/performance-tuning.md
+++ b/docs/tutorial/performance-tuning.md
@@ -71,6 +71,6 @@ If you're unable to scale to multiple GPUs almost linearly:
 2. Then note that your model may have a different communication-computation pattern or other
 	 characteristics that affects efficiency.
 	 There isn't a simple answer to this.
-	 Changing different multi-GPU trainers may affect the speed significantly sometimes.
+	 You may try a different multi-GPU trainer; the speed can vary a lot sometimes.

 Note that scalibility measurement always trains with the same "batch size per GPU", not the same total equivalent batch size.
--- a/examples/DoReFa-Net/alexnet-dorefa.py
+++ b/examples/DoReFa-Net/alexnet-dorefa.py
@@ -33,21 +33,17 @@ This is our attempt to reproduce it on tensorpack & TensorFlow.
 Accuracy:
    Trained with 4 GPUs and (W,A,G)=(1,2,6), it can reach top-1 single-crop validation error of 47.6%,
    after 70 epochs. This number is better than what's in the paper
-    due to more sophisticated augmentors.
+    due to more sophisticated augmentations.

-    Note that the effective batch size in SyncMultiGPUTrainer is actually
-    BATCH_SIZE * NUM_GPU. With a different number of GPUs in use, things might
-    be a bit different, especially for learning rate.
-
-    With (W,A,G)=(32,32,32) -- full precision baseline
+    With (W,A,G)=(32,32,32) -- full precision baseline, 41.4% error.
    With (W,A,G)=(1,32,32) -- BWN
    With (W,A,G)=(1,2,6), 47.6% error
-    With (W,A,G)=(1,2,4)
+    With (W,A,G)=(1,2,4), 58.4% error

 Speed:
    About 11 iteration/s on 4 P100s. (Each epoch is set to 10000 iterations)
    Note that this code was written early without using NCHW format. You
-    should expect a speed up after switching to NCHW format.
+    should expect a speed up if the code is ported to NCHW format.

 To Train, for example:
    ./alexnet-dorefa.py --dorefa 1,2,6 --data PATH --gpu 0,1