update readme

ae2bd873 · Yuxin Wu · d8b4b4d7
Commit ae2bd873 authored Dec 02, 2017 by Yuxin Wu
5 changed files
--- a/.gitignore
+++ b/.gitignore
 # tensorpack-specific stuff
 train_log
 tensorpack/user_ops/obj
-
 *.npy
-*.bin
+*.npz
+*.caffemodel
 *.tfmodel
 *.meta
 *.log*
-model-*
-.gitignore
-*.caffemodel
+*.bin
 *.png
 *.jpg
 checkpoint
 *.json
 *.prototxt
 *.txt
+*.tgz
+*.gz

 # my personal stuff
 snippet
 examples/private
+examples-old
 TODO.md
+.gitignore
+.vimrc.local

-*.gz

 # Byte-compiled / optimized / DLL files
 __pycache__/

--- a/examples/FasterRCNN/utils/README.md
+++ b/examples/FasterRCNN/utils/README.md
@@ -2,5 +2,5 @@
 # Some third-party helper functions

 + generate_anchors.py: copied from [py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/rpn/generate_anchors.py).
-+ box_ops.py: modified from [TF object detection API](https://github.com/tensorflow/models/blob/master/object_detection/core/box_list_ops.py).
+ box_ops.py: modified from [TF object detection API](https://github.com/tensorflow/models/blob/master/research/object_detection/core/box_list_ops.py).

--- a/examples/GAN/README.md
+++ b/examples/GAN/README.md
@@ -12,7 +12,7 @@ Reproduce the following GAN-related methods, 100~200 lines each:

 + [Wasserstein GAN](https://arxiv.org/abs/1701.07875)

-+ Improved Wasserstein GAN ([Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028))
+ Improved Wasserstein GAN, i.e. WGAN-GP ([Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028))

 + DiscoGAN ([Learning to Discover Cross-Domain Relations with Generative Adversarial Networks](https://arxiv.org/abs/1703.05192))

@@ -23,7 +23,7 @@ Reproduce the following GAN-related methods, 100~200 lines each:

 Please see the __docstring__ in each script for detailed usage and pretrained models. MultiGPU training is supported.

-## DCGAN.py
+## [DCGAN.py](DCGAN.py)

 Reproduce DCGAN following the setup in [dcgan.torch](https://github.com/soumith/dcgan.torch).

@@ -35,7 +35,7 @@ Reproduce DCGAN following the setup in [dcgan.torch](https://github.com/soumith/

 ![vec](demo/DCGAN-CelebA-vec.jpg)

-## Image2Image.py
+## [Image2Image.py](Image2Image.py)

 Image-to-Image translation following the setup in [pix2pix](https://github.com/phillipi/pix2pix).

@@ -45,7 +45,7 @@ For example, with the cityscapes dataset, it learns to generate semantic segment

 This is a visualization from tensorboard. Left to right: original, ground truth, model output.

-## InfoGAN-mnist.py
+## [InfoGAN-mnist.py](InfoGAN-mnist.py)

 Reproduce the mnist experiment in InfoGAN.
 It assumes 10 latent variables corresponding to a categorical distribution, 2 latent variables corresponding to a uniform distribution.
@@ -57,18 +57,18 @@ It then maximizes mutual information between these latent variables and the imag
 * Middle: 1 continuous latent variable controlled the rotation.
 * Right: another continuous latent variable controlled the thickness.

-## ConditionalGAN-mnist.py
+## [ConditionalGAN-mnist.py](ConditionalGAN-mnist.py)

 Train a simple GAN on mnist, conditioned on the class labels.

-## WGAN.py, Improved-WGAN.py, BEGAN.py
+## [WGAN.py](WGAN.py), [Improved-WGAN.py](Improved-WGAN.py), [BEGAN.py](BEGAN.py)

 These variants are implemented by some small modifications on top of DCGAN.py.
 Some BEGAN samples:

 ![began-sample](demo/BEGAN-CelebA-samples.jpg)

-## CycleGAN.py, DiscoGAN-CelebA.py
+## [CycleGAN.py](CycleGAN.py), [DiscoGAN-CelebA.py](DiscoGAN-CelebA.py)

 Reproduce CycleGAN with the original datasets, and DiscoGAN on CelebA. They are pretty much the same idea with different architecture.
 CycleGAN horse-to-zebra in tensorboard:

--- a/examples/ResNet/README.md
+++ b/examples/ResNet/README.md

-## imagenet-resnet.py
+## [imagenet-resnet.py](imagenet-resnet.py)

 __Training__ code of three variants of ResNet on ImageNet:

@@ -30,7 +30,7 @@ See the [tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient

 ![imagenet](imagenet-resnet.png)

-## load-resnet.py
+## [load-resnet.py](load-resnet.py)

 This script only converts and runs ImageNet-ResNet{50,101,152} Caffe models [released by MSRA](https://github.com/KaimingHe/deep-residual-networks).
 Note that the architecture is different from the `imagenet-resnet.py` script and the models are not compatible.
@@ -52,7 +52,7 @@ The per-pixel mean used here is slightly different from the original.
 | ResNet 101         |      7.11%  |      23.54% |
 | ResNet 152         |      6.71%  |      23.21% |

-## cifar10-resnet.py
+## [cifar10-resnet.py](cifar10-resnet.py)

 Reproduce pre-activation ResNet on CIFAR10.

@@ -61,12 +61,15 @@ Reproduce pre-activation ResNet on CIFAR10.
 Also see a [DenseNet implementation](https://github.com/YixuanLi/densenet-tensorflow) of the paper [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993).


-## cifar10-preact18-mixup.py
+## [cifar10-preact18-mixup.py](cifar10-preact18-mixup.py)

-Reproduce mixup pre-activation ResNet18 on CIFAR10.
-Please notice that this preact18 architecture is
+Reproduce the mixup pre-act ResNet-18 CIFAR10 experiment, in the paper:
+
+* [mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412).
+
+Please note that this preact18 architecture is
 [different](https://github.com/kuangliu/pytorch-cifar/blob/master/models/preact_resnet.py)
-as the [mixup paper](https://arxiv.org/abs/1710.09412) said.
+from `cifar10-resnet18.py`.

 Usage:
 ```bash
@@ -75,5 +78,6 @@ Usage:
 ```

 Validation error with the original LR schedule (100-150-200): __5.0%__ without mixup, __3.8%__ with mixup.
+This matches the number in the paper.

 With 2x LR schedule: 4.7% without mixup, and 3.2% with mixup.
--- a/examples/ResNet/load-resnet.py
+++ b/examples/ResNet/load-resnet.py
 #!/usr/bin/env python
 # -*- coding: UTF-8 -*-
 # File: load-resnet.py
-# Author: Eric Yujia Huang yujiah1@andrew.cmu.edu
+# Author: Eric Yujia Huang <yujiah1@andrew.cmu.edu>
 #         Yuxin Wu <ppwwyyxx@gmail.com>

 import cv2
@@ -11,7 +11,6 @@ import argparse
 import re
 import numpy as np
 import six
-from tensorflow.contrib.layers import variance_scaling_initializer

 from tensorpack import *
 from tensorpack.utils import logger
@@ -45,8 +44,7 @@ class Model(ModelDesc):
        image = tf.transpose(image, [0, 3, 1, 2])
        with argscope([Conv2D, MaxPooling, GlobalAvgPooling, BatchNorm],
                      data_format='NCHW'), \
-                argscope(Conv2D, nl=tf.identity, use_bias=False,
-                         W_init=variance_scaling_initializer(mode='FAN_OUT')):
+                argscope(Conv2D, nl=tf.identity, use_bias=False):
            logits = (LinearWrap(image)
                      .Conv2D('conv0', 64, 7, stride=2, nl=BNReLU, padding='VALID')
                      .MaxPooling('pool0', shape=3, stride=2, padding='SAME')