Commit d4799335 authored by Yuxin Wu

update docs

parent 53903072
@@ -17,7 +17,7 @@ It's Yet Another TF wrapper, but different in:
    + Data-parallel multi-GPU training is off-the-shelf to use. It scales as well as Google's [official benchmark](https://www.tensorflow.org/performance/benchmarks).
-   + See [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks) for the benchmark scripts.
+   + Distributed data-parallel training is also supported and scales well. See [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks) for more benchmark scripts.
 2. Focus on __large datasets__.
    + It's unnecessary to read/preprocess data with a new language called TF.
...
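For context on the README change above: "off-the-shelf" means the model code stays the same and only the trainer is swapped. A minimal sketch against tensorpack's trainer interface (`MyModel` and `my_dataflow` are hypothetical placeholders for a `ModelDesc` subclass and a `DataFlow`):

```python
from tensorpack import (TrainConfig, SyncMultiGPUTrainerReplicated,
                        launch_train_with_config)

# `MyModel` (a ModelDesc subclass) and `my_dataflow` (a DataFlow) are
# hypothetical placeholders for a real model and input pipeline.
config = TrainConfig(
    model=MyModel(),
    dataflow=my_dataflow,
    max_epoch=100,
)

# Data-parallel training on 2 GPUs: the model is unchanged, only the
# trainer differs from the single-GPU case.
launch_train_with_config(config, SyncMultiGPUTrainerReplicated(2))
```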
@@ -145,6 +145,11 @@ class ImageNetModel(ModelDesc):
     """
     image_dtype = tf.uint8
 
+    """
+    Whether to apply weight decay on BN parameters.
+    """
+    weight_decay_on_bn = False
+
     def __init__(self, data_format='NCHW'):
         self.data_format = data_format
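Since `weight_decay_on_bn` is a class attribute rather than a constructor argument, enabling it is a one-line override; a hypothetical subclass:

```python
# Hypothetical subclass: also apply L2 weight decay to BN's gamma/beta.
class MyImageNetModel(ImageNetModel):
    weight_decay_on_bn = True
```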
@@ -161,7 +166,11 @@ class ImageNetModel(ModelDesc):
         loss = ImageNetModel.compute_loss_and_error(logits, label)
 
         if self.weight_decay > 0:
-            wd_loss = regularize_cost('.*/W', tf.contrib.layers.l2_regularizer(self.weight_decay),
+            if self.weight_decay_on_bn:
+                pattern = '.*/W|.*/gamma|.*/beta'
+            else:
+                pattern = '.*/W'
+            wd_loss = regularize_cost(pattern, tf.contrib.layers.l2_regularizer(self.weight_decay),
                                       name='l2_regularize_loss')
             add_moving_summary(loss, wd_loss)
             total_cost = tf.add_n([loss, wd_loss], name='cost')
...
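The second hunk folds the new switch into the regex handed to `regularize_cost`: by default only weights named `W` are decayed, and the wider pattern additionally matches BN's `gamma` and `beta`. A standalone sketch of the same regex-based selection in plain TF 1.x (`l2_cost_by_pattern` is an illustrative helper, not tensorpack's implementation):

```python
import re
import tensorflow as tf

def l2_cost_by_pattern(pattern, weight_decay):
    # Select trainable variables whose name matches the regex, in the
    # spirit of regularize_cost('.*/W|.*/gamma|.*/beta', ...).
    matched = [v for v in tf.trainable_variables()
               if re.match(pattern, v.op.name)]
    if not matched:
        return tf.constant(0.0, name='l2_regularize_loss')
    # Sum the L2 losses of the matched variables, scaled by the decay.
    return tf.multiply(weight_decay,
                       tf.add_n([tf.nn.l2_loss(v) for v in matched]),
                       name='l2_regularize_loss')
```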
@@ -3,7 +3,7 @@
 Training examples with __reproducible performance__.
 
-__The word "reproduce" should always means reproduce performance__.
+__The word "reproduce" should always mean reproduce performance__.
 With the magic of SGD, wrong deep learning code often appears to still work,
 especially if you try it on toy datasets.
 See [Unawareness of Deep Learning Mistakes](https://medium.com/@ppwwyyxx/unawareness-of-deep-learning-mistakes-d5b5774da0ba).
...