Commit 353cd04f authored by Yuxin Wu

Workaround dilated conv bugs in tf.layers.Conv2D (#1110).

Another bug in tf.layers .. maybe I should never switch to it.
parent 505e28eb
@@ -10,7 +10,7 @@ Both are necessary.
`tf.train.NewCheckpointReader` is the official tool to parse TensorFlow checkpoints.
Read the [TF docs](https://www.tensorflow.org/api_docs/python/tf/train/NewCheckpointReader) for details.
Tensorpack also provides a small tool to load checkpoints; see
[load_chkpt_vars](../modules/tfutils.html#tensorpack.tfutils.varmanip.load_chkpt_vars)
for details.
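
For example, a minimal sketch of both tools (the checkpoint path and the variable name below are made up):

```python
import tensorflow as tf

# Parse a checkpoint with the official reader:
reader = tf.train.NewCheckpointReader('/path/to/model-10000')
print(reader.get_variable_to_shape_map())  # {variable name: shape}
w = reader.get_tensor('conv0/W')           # numpy array of one variable

# Or use tensorpack's helper, which returns {variable name: numpy array}:
from tensorpack.tfutils.varmanip import load_chkpt_vars
var_dict = load_chkpt_vars('/path/to/model-10000')
```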
@@ -51,3 +51,26 @@ Therefore, transfer learning is trivial.
If you want to load a pre-trained model, just use the same variable names.
If you want to re-train some layer, just rename either the variables in the
graph or the variables in your loader.
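
As a sketch of the loader-side renaming (the layer names here are made up), one can load the checkpoint into a dict, rename keys, and feed it to `DictRestore`:

```python
from tensorpack.tfutils.varmanip import load_chkpt_vars
from tensorpack.tfutils.sessinit import DictRestore

var_dict = load_chkpt_vars('/path/to/pretrained')  # {name: numpy array}
# Point the pre-trained 'fc0' weights at a renamed layer in the new graph:
var_dict = {name.replace('fc0', 'fc_new'): value
            for name, value in var_dict.items()}
session_init = DictRestore(var_dict)  # pass as session_init when training
```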
## Resume Training
"resume training" means "loading the last known checkpoint".
Therefore you should refer to the [previous section](#load-a-model-to-a-session)
on how to load a model.
```eval_rst
.. note:: **A checkpoint does not resume everything!**

    The TensorFlow checkpoint only saves TensorFlow variables,
    which means other Python state that is not a TensorFlow variable will not be saved
    and resumed. This often includes:

    1. The training epoch number. You can set it by providing a `starting_epoch`
       to your resume job.
    2. State in your callbacks. Certain callbacks maintain state
       (e.g., the current best accuracy) in Python, which cannot be saved automatically.
```

The [AutoResumeTrainConfig](../modules/train.html#tensorpack.train.AutoResumeTrainConfig)
is an alternative to `TrainConfig` which applies some heuristics to
automatically resume both the checkpoint and the epoch number from your log directory.
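
A minimal sketch of both options (`model`, `dataflow`, the checkpoint path, and the epoch number are placeholders):

```python
from tensorpack.train import TrainConfig, AutoResumeTrainConfig
from tensorpack.tfutils.sessinit import SaverRestore

# Option 1: resume manually; load the last checkpoint and set the epoch yourself:
config = TrainConfig(
    model=model, dataflow=dataflow,
    session_init=SaverRestore('train_log/run/model-10000'),
    starting_epoch=42,
)

# Option 2: let AutoResumeTrainConfig find the checkpoint and the epoch
# number in the log directory by itself:
config = AutoResumeTrainConfig(model=model, dataflow=dataflow)
```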
@@ -7,14 +7,14 @@ However, tensorpack is model-agnostic, which means
**you can skip this tutorial and do not need to use tensorpack's symbolic layers.**
These layers were written only because there were no alternatives when tensorpack was first developed.
-Nowadays, these implementations actually call `tf.layers` directly.
+Nowadays, many of these implementations actually call `tf.layers` directly.
__Tensorpack will not add any more layers__ into its core library, because this is
not the focus of tensorpack, and there are many other alternative symbolic
libraries today.

Today, you can just use `tf.layers` or any other symbolic library inside tensorpack.
If you use the tensorpack implementations, you can also benefit from `argscope` and `LinearWrap` to
-simplify the code.
+simplify the code, and also have fewer bugs than `tf.layers`.
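
For example, a small sketch of what `argscope` and `LinearWrap` give you (layer names and sizes are arbitrary):

```python
import tensorflow as tf
from tensorpack import argscope, LinearWrap, Conv2D, MaxPooling, FullyConnected

def build_graph(image):
    # argscope sets default arguments for every Conv2D inside the block;
    # LinearWrap chains layers without repeating the intermediate tensors:
    with argscope(Conv2D, kernel_size=3, activation=tf.nn.relu):
        return (LinearWrap(image)
                .Conv2D('conv0', filters=32)
                .MaxPooling('pool0', 2)
                .Conv2D('conv1', filters=64)
                .FullyConnected('fc0', 10)())
```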
Note that to keep backward compatibility of code and pre-trained models, tensorpack layers
have some small differences from `tf.layers`, including variable names and default options
(e.g., tensorpack's `Conv2D` stores its weights as `W`/`b` rather than `tf.layers`'
`kernel`/`bias`).
@@ -111,13 +111,13 @@ always creates new variable scope. See the [Keras example](../examples/keras) fo
```eval_rst
.. note:: **It's best to not trust others' layers!**

    For non-standard layers that are not included in TensorFlow or Tensorpack, it's best to implement them yourself.
    Non-standard layers often do not have a mathematical definition that people
    all agree on, and different people can implement them differently.
    Also, deep learning models on github often have bugs, especially when there are
    no reproduced experiments with the code.

    For your own good, it's best to implement the layers yourself.
    This is also why Tensorpack does not contain non-standard layers.
```
@@ -54,7 +54,10 @@ def Conv2D(
        kernel_initializer = tf.contrib.layers.variance_scaling_initializer(2.0)
    else:
        kernel_initializer = tf.keras.initializers.VarianceScaling(2.0, distribution='untruncated_normal')
-    if split == 1:
+    dilation_rate = shape2d(dilation_rate)
+    if split == 1 and dilation_rate == [1, 1]:
+        # tf.layers.Conv2D has bugs with dilations (https://github.com/tensorflow/tensorflow/issues/26797)
        with rename_get_variable({'kernel': 'W', 'bias': 'b'}):
            layer = tf.layers.Conv2D(
                filters,
@@ -92,7 +95,7 @@ def Conv2D(
        out_channel = filters
        assert out_channel % split == 0
-        assert dilation_rate == (1, 1) or get_tf_version_tuple() >= (1, 5), 'TF>=1.5 required for group dilated conv'
+        assert dilation_rate == [1, 1] or get_tf_version_tuple() >= (1, 5), 'TF>=1.5 required for dilated conv.'
        kernel_shape = shape2d(kernel_size)
        filter_shape = kernel_shape + [in_channel / split, out_channel]
...
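
With this workaround in place, a dilated convolution built through tensorpack's `Conv2D` no longer reaches the buggy `tf.layers.Conv2D` code path; a minimal sketch (the input shape and layer name are illustrative):

```python
import tensorflow as tf
from tensorpack import Conv2D

image = tf.placeholder(tf.float32, [None, 32, 32, 3])
# dilation_rate != 1 now takes the manual convolution path instead of
# tf.layers.Conv2D (see https://github.com/tensorflow/tensorflow/issues/26797):
out = Conv2D('conv_dilated', image, filters=16, kernel_size=3, dilation_rate=2)
```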