Commit e086f05a authored by yselivonchyk's avatar yselivonchyk Committed by Yuxin Wu

ResNet-mixup example: align implementation with the one referenced by the original paper. (#571)

* Align implementation with the reference implementation used by the paper.

The pre-activation ResNet-18 from https://github.com/kuangliu/pytorch-cifar uses pre-activation blocks with 2 consecutive convolution layers per block. The existing implementation was using 3.

Weight decay was set incorrectly.

Architecture aligned with the main repository's approach: defined functions for the bottleneck and regular PreActResNet blocks.

Support for multiple depths added.
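
The multiple-depth support mentioned above follows the standard ResNet depth configurations (basic blocks for 18/34, bottleneck blocks for 50+). A minimal sketch of how depth maps to block type and stage counts; the names `DEPTH_CONFIGS` and `num_conv_layers` are illustrative placeholders, not the example's actual API:

```python
# Standard PreActResNet depth configurations: block type and the
# number of blocks in each of the 4 stages. (Illustrative sketch,
# not the actual tensorpack example code.)
DEPTH_CONFIGS = {
    18: ("basic", [2, 2, 2, 2]),
    34: ("basic", [3, 4, 6, 3]),
    50: ("bottleneck", [3, 4, 6, 3]),
    101: ("bottleneck", [3, 4, 23, 3]),
}

def num_conv_layers(depth):
    """Count weighted layers to verify the config matches its name."""
    block, counts = DEPTH_CONFIGS[depth]
    # Basic blocks hold 2 convolutions, bottleneck blocks hold 3.
    convs_per_block = 2 if block == "basic" else 3
    # +2 for the initial convolution and the final fully-connected layer.
    return sum(counts) * convs_per_block + 2
```

Each configuration's weighted-layer count recovers the depth in its name, e.g. `num_conv_layers(18)` is 18.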

* PreActivation block: no BnRelu should appear outside of the residual branch

* Code migration clean up: blocks rearranged, variable names aligned

* Correct reference implementation: BnRelu is used in the identity branch only before a convolutional layer.
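
The op ordering described by the two points above (BN-ReLU only inside the residual branch, and on the shortcut path only when a projection convolution follows) can be sketched structurally. This is an illustrative sketch of the layer sequence, not the actual tensorpack code; op names are placeholders:

```python
def preact_basic_block(project_shortcut=False):
    """Op ordering for a pre-activation basic block (2 conv layers).

    Returns (residual_ops, shortcut_ops) as lists of placeholder op
    names. A bottleneck block would instead use 3 convolutions.
    """
    # The first BN-ReLU pre-activates the input; its output feeds the
    # residual branch and, when projecting, the shortcut conv as well.
    preact = ["bn", "relu"]
    residual = preact + ["conv3x3", "bn", "relu", "conv3x3"]
    # The plain identity shortcut carries the raw input with no BN-ReLU;
    # a BN-ReLU appears on that path only before a 1x1 projection conv
    # (used when stride or channel count changes).
    shortcut = preact + ["conv1x1"] if project_shortcut else ["identity"]
    return residual, shortcut
```

No BN-ReLU appears after the addition of the two branches, matching the pre-activation design.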

* Updated model accuracies after a single run

* Documentation update

* closer to mixup experiment settings

* fix lint
parent 26e609f8
@@ -67,9 +67,8 @@ Reproduce the mixup pre-act ResNet-18 CIFAR10 experiment, in the paper:
* [mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412).
Please note that this preact18 architecture is
[different](https://github.com/kuangliu/pytorch-cifar/blob/master/models/preact_resnet.py)
from `cifar10-resnet18.py`.
This implementation follows exact settings from the [author's code](https://github.com/hongyi-zhang/mixup).
Note that the architecture is different from the official preact-ResNet18.
Usage:
```bash
@@ -77,7 +76,5 @@ Usage:
./cifar10-preact18-mixup.py --mixup # with mixup
```
Validation error with the original LR schedule (100-150-200): __5.0%__ without mixup, __3.8%__ with mixup.
This matches the number in the paper.
With 2x LR schedule: 4.7% without mixup, and 3.2% with mixup.
Results of the reference code can be reproduced.
In one run it gives me: 5.48% without mixup; __4.17%__ with mixup (alpha=1).