Commit d4799335 authored by Yuxin Wu

update docs

parent 53903072
@@ -17,7 +17,7 @@ It's Yet Another TF wrapper, but different in:
+ Data-parallel multi-GPU training is off-the-shelf to use. It scales as well as Google's [official benchmark](https://www.tensorflow.org/performance/benchmarks).
+ See [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks) for the benchmark scripts.
+ Distributed data-parallel training is also supported and scales well. See [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks) for more benchmark scripts.
2. Focus on __large datasets__.
+ It's unnecessary to read/preprocess data with a new language called TF.
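
For context on the "off-the-shelf" claim above, here is a minimal sketch of data-parallel multi-GPU training with tensorpack, assuming the `launch_train_with_config` / `SyncMultiGPUTrainerReplicated` API of this era; `MyModel` and `my_dataflow` are hypothetical placeholders, not part of this commit.

```python
from tensorpack import TrainConfig, SyncMultiGPUTrainerReplicated, launch_train_with_config

# MyModel (a ModelDesc subclass) and my_dataflow (a DataFlow yielding
# training samples) are hypothetical placeholders, not part of this commit.
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    max_epoch=100,
)
# The model code stays unchanged; picking a multi-GPU trainer is the only
# step needed to train data-parallel on 2 GPUs.
launch_train_with_config(config, SyncMultiGPUTrainerReplicated(2))
```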
@@ -145,6 +145,11 @@ class ImageNetModel(ModelDesc):
"""
image_dtype = tf.uint8
"""
Whether to apply weight decay on BN parameters.
"""
weight_decay_on_bn = False
def __init__(self, data_format='NCHW'):
self.data_format = data_format
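
Since `weight_decay_on_bn` is a plain class attribute, a user model opts in by overriding it. A minimal sketch; the subclass below is hypothetical:

```python
# Hypothetical subclass that also applies weight decay to BN parameters.
class MyResNetModel(ImageNetModel):
    weight_decay_on_bn = True  # decay gamma/beta in addition to W
```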
@@ -161,7 +166,11 @@ class ImageNetModel(ModelDesc):
         loss = ImageNetModel.compute_loss_and_error(logits, label)
 
         if self.weight_decay > 0:
-            wd_loss = regularize_cost('.*/W', tf.contrib.layers.l2_regularizer(self.weight_decay),
+            if self.weight_decay_on_bn:
+                pattern = '.*/W|.*/gamma|.*/beta'
+            else:
+                pattern = '.*/W'
+            wd_loss = regularize_cost(pattern, tf.contrib.layers.l2_regularizer(self.weight_decay),
                                       name='l2_regularize_loss')
             add_moving_summary(loss, wd_loss)
             total_cost = tf.add_n([loss, wd_loss], name='cost')
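
The two patterns above are ordinary regexes matched against trainable-variable names. Below is a standalone illustration of what each one selects; the variable names are typical tensorpack-style names chosen for illustration, and `re.match` merely stands in for however `regularize_cost` matches internally:

```python
import re

# Illustrative variable names only; not taken from this commit.
names = ['conv1/W', 'conv1/b', 'bn1/gamma', 'bn1/beta', 'linear/W']
for pattern in ['.*/W', '.*/W|.*/gamma|.*/beta']:
    matched = [n for n in names if re.match(pattern, n)]
    print(pattern, '->', matched)
# .*/W -> ['conv1/W', 'linear/W']
# .*/W|.*/gamma|.*/beta -> ['conv1/W', 'bn1/gamma', 'bn1/beta', 'linear/W']
```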
@@ -3,7 +3,7 @@
 Training examples with __reproducible performance__.
 
-__The word "reproduce" should always means reproduce performance__.
+__The word "reproduce" should always mean reproduce performance__.
 With the magic of SGD, wrong deep learning code often appears to still work,
 especially if you try it on toy datasets.
 See [Unawareness of Deep Learning Mistakes](https://medium.com/@ppwwyyxx/unawareness-of-deep-learning-mistakes-d5b5774da0ba).