Shashank Suhas / seminar-breakout · Commits
Commit 7f505225 · authored Sep 25, 2018 by Yuxin Wu
Pre/Post processing in ImageNetModel
parent 30c7a97c
Showing 4 changed files with 68 additions and 36 deletions (+68 -36)
docs/tutorial/inference.md                  +33 -19
docs/tutorial/trainer.md                    +9 -4
examples/ImageNetModels/imagenet_utils.py   +20 -11
examples/keras/imagenet-resnet-keras.py     +6 -2
docs/tutorial/inference.md
@@ -22,29 +22,20 @@ You can use this predicate to choose a different code path in inference mode.

 ## Inference After Training

 Tensorpack is a training interface -- __it doesn't care what happened after training__.
-You have everything you need for inference or model diagnosis after training:
+You already have everything you need for inference or model diagnosis after training:

-1. The trained weights: tensorpack saves them in standard TF checkpoint format.
-2. The model: you've already written it yourself with TF symbolic functions.
+1. The model (the graph): you've already written it yourself with TF symbolic functions.
+2. The trained parameters: tensorpack saves them in standard TF checkpoint format.

 Therefore, you can build the graph for inference, load the checkpoint, and apply
 any processing or deployment TensorFlow supports.
-And you'll need to read TF docs and __do it on your own__.
+These are unrelated to tensorpack, and you'll need to read TF docs and __do it on your own__.

-### Don't Use Training Metagraph for Inference
+### Step 1: build the model

-Metagraph is the wrong abstraction for a "model".
-It stores the entire graph which contains not only the mathematical model, but also all the
-training settings (queues, iterators, summaries, evaluations, multi-gpu replications).
-Therefore it is usually wrong to import a training metagraph for inference.
-
-It's also very common to change the graph for inference.
-For example, you may need a different data layout for CPU inference,
-or you may need placeholders in the inference graph (which may not even exist in
-the training graph). However metagraph is not designed to be easily modified at all.
-To do inference, it's best to recreate a clean graph (and save it if needed).
-To construct a new graph, you can simply:
+You can build a graph however you like, with pure TensorFlow. If your model is written with
+tensorpack's `ModelDesc`, you can also build it like this:

 ```python
 a, b = tf.placeholder(...), tf.placeholder(...)
 # call ANY symbolic functions on a, b. e.g.:
...
@@ -52,11 +43,34 @@ with TowerContext('', is_training=False):
     model.build_graph(a, b)
 ```

+```eval_rst
+.. note:: **Do not use metagraph for inference!**
+
+   Metagraph is the wrong abstraction for a "model".
+   It stores the entire graph, which contains not only the mathematical model but also all the
+   training settings (queues, iterators, summaries, evaluations, multi-gpu replications).
+   Therefore it is usually wrong to import a training metagraph for inference.
+
+   It's also very common to change the graph for inference.
+   For example, you may need a different data layout for CPU inference,
+   or you may need placeholders in the inference graph (which may not even exist in
+   the training graph). However, metagraph is not designed to be easily modified at all.
+   To do inference, it's best to recreate a clean graph (and save it if needed) by yourself.
+```
+
+### Step 2: load the checkpoint
+
+You can just use `tf.train.Saver` for all the work.
+Alternatively, use tensorpack's `SaverRestore(path).init(tf.get_default_session())`.

 ### OfflinePredictor

-The only tool tensorpack has for after-training inference is
-[OfflinePredictor](../modules/predict.html#tensorpack.predict.OfflinePredictor),
-a simple function to build the graph and return a callable for you.
-Check out examples and docs for its usage.
+Tensorpack provides one tool,
+[OfflinePredictor](../modules/predict.html#tensorpack.predict.OfflinePredictor),
+to merge the above two steps together.
+It has simple functionalities to build the graph, load the checkpoint, and return a callable for you.
+Check out examples and docs for its usage.
+
+OfflinePredictor is only for quick demo purposes.
+It runs inference on numpy arrays, so it may not be the most efficient way to run inference.
...
docs/tutorial/trainer.md
@@ -81,10 +81,15 @@ Note some __common problems__ when using these trainers:
    all GPUs take tensors from the `InputSource`.
    So the total batch size across all GPUs would become ``(batch size of InputSource) * #GPU``.

-   Splitting a tensor for data-parallel training makes no sense at all. First,
-   it wastes time because typically data is concatenated into batches by the user.
-   Second, this puts unnecessary shape constraints on the data.
-   By letting each GPU train on its own input tensors, they can train on inputs of different shapes simultaneously.
+   ```eval_rst
+   .. note::
+
+       Splitting a tensor for data-parallel training (as done by frameworks like Keras)
+       makes no sense at all.
+       First, it wastes time doing the split, because the data has typically already been
+       concatenated into batches by the user.
+       Second, it puts unnecessary shape constraints on the data: the inputs on each
+       GPU would need to have consistent shapes.
+   ```

 2. The tower function (your model code) will get called multiple times on each GPU.
    You must follow the above-mentioned rules of the tower function.
...
...
examples/ImageNetModels/imagenet_utils.py
@@ -168,18 +168,24 @@ class ImageNetModel(ModelDesc):
     """
     loss_scale = 1.

+    """
+    Label smoothing (See tf.losses.softmax_cross_entropy)
+    """
+    label_smoothing = 0.
+
     def inputs(self):
         return [tf.placeholder(self.image_dtype, [None, self.image_shape, self.image_shape, 3], 'input'),
                 tf.placeholder(tf.int32, [None], 'label')]

     def build_graph(self, image, label):
-        image = ImageNetModel.image_preprocess(image, bgr=self.image_bgr)
+        image = self.image_preprocess(image)
         assert self.data_format in ['NCHW', 'NHWC']
         if self.data_format == 'NCHW':
             image = tf.transpose(image, [0, 3, 1, 2])

         logits = self.get_logits(image)
-        loss = ImageNetModel.compute_loss_and_error(logits, label)
+        loss = ImageNetModel.compute_loss_and_error(
+            logits, label, label_smoothing=self.label_smoothing)

         if self.weight_decay > 0:
             wd_loss = regularize_cost(self.weight_decay_pattern,
...
@@ -212,26 +218,29 @@ class ImageNetModel(ModelDesc):
         tf.summary.scalar('learning_rate-summary', lr)
         return tf.train.MomentumOptimizer(lr, 0.9, use_nesterov=True)

-    @staticmethod
-    def image_preprocess(image, bgr=True):
+    def image_preprocess(self, image):
         with tf.name_scope('image_preprocess'):
             if image.dtype.base_dtype != tf.float32:
                 image = tf.cast(image, tf.float32)
-            image = image * (1.0 / 255)
             mean = [0.485, 0.456, 0.406]    # rgb
             std = [0.229, 0.224, 0.225]
-            if bgr:
+            if self.image_bgr:
                 mean = mean[::-1]
                 std = std[::-1]
-            image_mean = tf.constant(mean, dtype=tf.float32)
-            image_std = tf.constant(std, dtype=tf.float32)
+            image_mean = tf.constant(mean, dtype=tf.float32) * 255.
+            image_std = tf.constant(std, dtype=tf.float32) * 255.
             image = (image - image_mean) / image_std
             return image

     @staticmethod
-    def compute_loss_and_error(logits, label):
-        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=label)
+    def compute_loss_and_error(logits, label, label_smoothing=0.):
+        if label_smoothing == 0.:
+            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=label)
+        else:
+            nclass = logits.shape[-1]
+            loss = tf.losses.softmax_cross_entropy(
+                tf.one_hot(label, nclass), logits, label_smoothing=label_smoothing)
         loss = tf.reduce_mean(loss, name='xentropy-loss')

 def prediction_incorrect(logits, label, topk=1, name='incorrect_vector'):
...
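The `label_smoothing` option added to `compute_loss_and_error` delegates to `tf.losses.softmax_cross_entropy`. The underlying math can be sketched in numpy (an illustration of the formula, not the tensorpack code): the one-hot target is mixed with a uniform distribution over the classes before taking cross-entropy.

```python
import numpy as np

def smoothed_softmax_xent(logits, label, label_smoothing=0.0):
    """Cross-entropy with label smoothing, mirroring the math behind
    tf.losses.softmax_cross_entropy's label_smoothing argument."""
    nclass = logits.shape[-1]
    onehot = np.eye(nclass)[label]
    # true class gets (1 - eps) + eps/K, every other class gets eps/K
    target = onehot * (1.0 - label_smoothing) + label_smoothing / nclass
    # numerically stable log-softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(target * log_softmax).sum(axis=-1)

logits = np.array([[2.0, 1.0, 0.0]])
plain = smoothed_softmax_xent(logits, [0], label_smoothing=0.0)
smooth = smoothed_softmax_xent(logits, [0], label_smoothing=0.1)
```

With `label_smoothing=0.` this reduces to ordinary sparse softmax cross-entropy, which is why the TF code keeps the cheaper `sparse_softmax_cross_entropy_with_logits` path for that case.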
examples/keras/imagenet-resnet-keras.py
@@ -16,7 +16,7 @@ from tensorpack.contrib.keras import KerasModel
 from tensorpack.callbacks import *
 from tensorflow.python.keras.layers import *

-from imagenet_utils import get_imagenet_dataflow, fbresnet_augmentor, ImageNetModel
+from imagenet_utils import get_imagenet_dataflow, fbresnet_augmentor

 TOTAL_BATCH_SIZE = 512
...
@@ -90,7 +90,11 @@ def resnet50(image):
     input = Input(tensor=image)

 def image_preprocess(image):
-    image = ImageNetModel.image_preprocess(image)
+    image = tf.cast(image, tf.float32)
+    image = image * (1.0 / 255)
+    mean = [0.485, 0.456, 0.406][::-1]
+    std = [0.229, 0.224, 0.225][::-1]
+    image = (image - tf.constant(mean, dtype=tf.float32)) / tf.constant(std, dtype=tf.float32)
     image = tf.transpose(image, [0, 3, 1, 2])
     return image
...
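The commit leaves two equivalent formulations of the same normalization: the Keras example scales pixels to [0, 1] and then subtracts mean/std, while `imagenet_utils.py` now keeps pixels in [0, 255] and folds the 1/255 factor into the constants. A quick numpy check (with made-up pixel values) shows the two are algebraically identical:

```python
import numpy as np

# Made-up pixel values in [0, 255], for illustration only.
pixels = np.array([[123.0, 117.0, 104.0]])

mean = np.array([0.485, 0.456, 0.406])  # ImageNet rgb mean, in [0, 1] units
std = np.array([0.229, 0.224, 0.225])

# Keras-example style: scale to [0, 1] first, then normalize.
scaled_first = (pixels / 255.0 - mean) / std

# imagenet_utils.py style after this commit: fold 1/255 into the constants.
folded = (pixels - mean * 255.0) / (std * 255.0)
```

Folding the scale into the constants saves one elementwise multiply over the whole image tensor.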