Commit 353cd04f authored by Yuxin Wu

Workaround dilated conv bugs in tf.layers.Conv2D (#1110).

Another bug in tf.layers .. maybe I should never switch to it.
parent 505e28eb
@@ -51,3 +51,26 @@ Therefore, transfer learning is trivial.
 If you want to load a pre-trained model, just use the same variable names.
 If you want to re-train some layer, just rename either the variables in the
 graph or the variables in your loader.
+
+## Resume Training
+
+"Resume training" means "loading the last known checkpoint".
+Therefore you should refer to the [previous section](#load-a-model-to-a-session)
+on how to load a model.
+
+```eval_rst
+.. note:: **A checkpoint does not resume everything!**
+
+    The TensorFlow checkpoint only saves TensorFlow variables,
+    which means other Python state that is not stored in TensorFlow variables will not be saved
+    and resumed. This often includes:
+
+    1. The training epoch number. You can set it by providing a `starting_epoch` to
+       your resume job.
+    2. State in your callbacks. Certain callbacks maintain state
+       (e.g., the current best accuracy) in Python, which cannot be saved automatically.
+```
+
+The [AutoResumeTrainConfig](../modules/train.html#tensorpack.train.AutoResumeTrainConfig)
+is an alternative to `TrainConfig` which applies some heuristics to
+automatically resume both the checkpoint and the epoch number from your log directory.
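
For concreteness, a minimal sketch of both resume styles; `model` and `dataflow` are assumptions standing in for your own `ModelDesc` and `DataFlow`, and the checkpoint path is made up:

```python
# Minimal sketch, assuming `model` (a ModelDesc) and `dataflow` (a DataFlow)
# are defined elsewhere; the checkpoint path below is made up.
from tensorpack import SimpleTrainer, TrainConfig, launch_train_with_config
from tensorpack.tfutils.sessinit import SaverRestore
from tensorpack.train import AutoResumeTrainConfig

# Manual resume: load the checkpoint and set the epoch number yourself,
# because the epoch number is Python state and is not in the checkpoint.
config = TrainConfig(
    model=model,
    dataflow=dataflow,
    session_init=SaverRestore('train_log/run1/model-10000'),
    starting_epoch=31,
    max_epoch=100,
)

# Automatic resume: let AutoResumeTrainConfig find the latest checkpoint and
# the epoch number in the log directory using its heuristics.
config = AutoResumeTrainConfig(model=model, dataflow=dataflow, max_epoch=100)

launch_train_with_config(config, SimpleTrainer())
```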
@@ -7,14 +7,14 @@ However, tensorpack is model-agnostic, which means
 **you can skip this tutorial and do not need to use tensorpack's symbolic layers.**
 These layers were written only because there were no alternatives when tensorpack was first developed.
-Nowadays, these implementation actually call `tf.layers` directly.
+Nowadays, many of these implementations actually call `tf.layers` directly.
 
 __Tensorpack will not add any more layers__ into its core library because this is
 not the focus of tensorpack, and there are many other alternative symbolic
 libraries today.
 
 Today, you can just use `tf.layers` or any other symbolic libraries inside tensorpack.
 If you use the tensorpack implementations, you can also benefit from `argscope` and `LinearWrap` to
-simplify the code.
+simplify the code, and also run into fewer bugs than with `tf.layers`.
 
 Note that to keep backward compatibility of code and pre-trained models, tensorpack layers
 have some small differences with `tf.layers`, including variable names and default options.
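
To make the `argscope` / `LinearWrap` point concrete, a minimal sketch (the layer names and sizes here are made up, not from this commit):

```python
# Minimal sketch of argscope + LinearWrap; layer names and sizes are made up.
import tensorflow as tf
from tensorpack import argscope, LinearWrap
from tensorpack.models import Conv2D, FullyConnected, MaxPooling

def build_logits(image):
    # argscope sets default arguments for Conv2D within the block;
    # LinearWrap chains layers, feeding each output into the next layer.
    with argscope(Conv2D, kernel_size=3, activation=tf.nn.relu, filters=32):
        return (LinearWrap(image)
                .Conv2D('conv0')                # uses the argscope defaults
                .MaxPooling('pool0', 2)
                .Conv2D('conv1', filters=64)    # overrides one default
                .FullyConnected('fc0', 512, activation=tf.nn.relu)
                .FullyConnected('fc1', 10, activation=tf.identity)())
```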
@@ -54,7 +54,10 @@ def Conv2D(
         kernel_initializer = tf.contrib.layers.variance_scaling_initializer(2.0)
     else:
         kernel_initializer = tf.keras.initializers.VarianceScaling(2.0, distribution='untruncated_normal')
-    if split == 1:
+    dilation_rate = shape2d(dilation_rate)
+
+    if split == 1 and dilation_rate == [1, 1]:
+        # tf.layers.Conv2D has bugs with dilations (https://github.com/tensorflow/tensorflow/issues/26797)
         with rename_get_variable({'kernel': 'W', 'bias': 'b'}):
             layer = tf.layers.Conv2D(
                 filters,
@@ -92,7 +95,7 @@ def Conv2D(
 
         out_channel = filters
         assert out_channel % split == 0
-        assert dilation_rate == (1, 1) or get_tf_version_tuple() >= (1, 5), 'TF>=1.5 required for group dilated conv'
+        assert dilation_rate == [1, 1] or get_tf_version_tuple() >= (1, 5), 'TF>=1.5 required for dilated conv.'
 
         kernel_shape = shape2d(kernel_size)
         filter_shape = kernel_shape + [in_channel / split, out_channel]
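
With this workaround in place, a dilated convolution such as the following hypothetical call takes tensorpack's own convolution branch rather than `tf.layers.Conv2D` (`image` is an assumed NHWC tensor defined elsewhere):

```python
# Hypothetical usage sketch: a dilation_rate other than (1, 1) now goes through
# tensorpack's own convolution path instead of the buggy tf.layers.Conv2D one.
from tensorpack.models import Conv2D

out = Conv2D('conv_dilated', image, filters=64, kernel_size=3, dilation_rate=2)
```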