Commit 353cd04f authored by Yuxin Wu

Workaround dilated conv bugs in tf.layers.Conv2D (#1110).

Another bug in tf.layers .. maybe I should never switch to it.
parent 505e28eb
@@ -10,7 +10,7 @@ Both are necessary.
`tf.train.NewCheckpointReader` is the official tool to parse TensorFlow checkpoints.
Read the [TF docs](https://www.tensorflow.org/api_docs/python/tf/train/NewCheckpointReader) for details.
Tensorpack also provides a small tool to load checkpoints; see
[load_chkpt_vars](../modules/tfutils.html#tensorpack.tfutils.varmanip.load_chkpt_vars)
for details.
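
For example, a minimal sketch of both tools (the checkpoint path and the variable name below are made up):

```python
import tensorflow as tf

# Parse a checkpoint with the official reader:
reader = tf.train.NewCheckpointReader('/path/to/model-10000')
print(reader.get_variable_to_shape_map())  # {variable name: shape}
w = reader.get_tensor('conv0/W')           # numpy array of one variable

# Or use tensorpack's helper, which returns {variable name: numpy array}:
from tensorpack.tfutils.varmanip import load_chkpt_vars
var_dict = load_chkpt_vars('/path/to/model-10000')
```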
@@ -51,3 +51,26 @@ Therefore, transfer learning is trivial.
If you want to load a pre-trained model, just use the same variable names.
If you want to re-train some layer, just rename either the variables in the
graph or the variables in your loader.
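
As a sketch of the loader-side renaming (the layer names here are made up), one can load the checkpoint into a dict, rename keys, and feed it to `DictRestore`:

```python
from tensorpack.tfutils.varmanip import load_chkpt_vars
from tensorpack.tfutils.sessinit import DictRestore

var_dict = load_chkpt_vars('/path/to/pretrained')  # {name: numpy array}
# Point the pre-trained 'fc0' weights at a renamed layer in the new graph:
var_dict = {name.replace('fc0', 'fc_new'): value
            for name, value in var_dict.items()}
session_init = DictRestore(var_dict)  # pass as session_init when training
```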
## Resume Training
"resume training" means "loading the last known checkpoint".
Therefore you should refer to the [previous section](#load-a-model-to-a-session)
on how to load a model.
```eval_rst
.. note:: **A checkpoint does not resume everything!**

    The TensorFlow checkpoint only saves TensorFlow variables,
    which means other Python state that is not a TensorFlow variable will not be saved
    and resumed. This often includes:

    1. The training epoch number. You can set it by providing a `starting_epoch`
       to your resume job.
    2. State in your callbacks. Certain callbacks maintain state
       (e.g., the current best accuracy) in Python, which cannot be saved automatically.
```

The [AutoResumeTrainConfig](../modules/train.html#tensorpack.train.AutoResumeTrainConfig)
is an alternative to `TrainConfig` which applies some heuristics to
automatically resume both the checkpoint and the epoch number from your log directory.
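
A minimal sketch of both options (`model`, `dataflow`, the checkpoint path, and the epoch number are placeholders):

```python
from tensorpack.train import TrainConfig, AutoResumeTrainConfig
from tensorpack.tfutils.sessinit import SaverRestore

# Option 1: resume manually; load the last checkpoint and set the epoch yourself:
config = TrainConfig(
    model=model, dataflow=dataflow,
    session_init=SaverRestore('train_log/run/model-10000'),
    starting_epoch=42,
)

# Option 2: let AutoResumeTrainConfig find the checkpoint and the epoch
# number in the log directory by itself:
config = AutoResumeTrainConfig(model=model, dataflow=dataflow)
```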
@@ -7,14 +7,14 @@ However, tensorpack is model-agnostic, which means
**you can skip this tutorial and do not need to use tensorpack's symbolic layers.**
These layers were written only because there were no alternatives when tensorpack was first developed.
-Nowadays, these implementations actually call `tf.layers` directly.
+Nowadays, many of these implementations actually call `tf.layers` directly.
__Tensorpack will not add any more layers__ into its core library, because this is
not the focus of tensorpack, and there are many other alternative symbolic
libraries today.

Today, you can just use `tf.layers` or any other symbolic library inside tensorpack.
If you use the tensorpack implementations, you can also benefit from `argscope` and `LinearWrap` to
-simplify the code.
+simplify the code, and also have fewer bugs than `tf.layers`.
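
For example, a small sketch of what `argscope` and `LinearWrap` give you (layer names and sizes are arbitrary):

```python
import tensorflow as tf
from tensorpack import argscope, LinearWrap, Conv2D, MaxPooling, FullyConnected

def build_graph(image):
    # argscope sets default arguments for every Conv2D inside the block;
    # LinearWrap chains layers without repeating the intermediate tensors:
    with argscope(Conv2D, kernel_size=3, activation=tf.nn.relu):
        return (LinearWrap(image)
                .Conv2D('conv0', filters=32)
                .MaxPooling('pool0', 2)
                .Conv2D('conv1', filters=64)
                .FullyConnected('fc0', 10)())
```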
Note that to keep backward compatibility of code and pre-trained models, tensorpack layers
have some small differences from `tf.layers`, including variable names and default options
(e.g., tensorpack's `Conv2D` stores its weights as `W`/`b` rather than `tf.layers`'
`kernel`/`bias`).
@@ -111,13 +111,13 @@ always creates new variable scope. See the [Keras example](../examples/keras) fo
```eval_rst
.. note:: **It's best to not trust others' layers!**

    For non-standard layers that are not included in TensorFlow or Tensorpack, it's best to implement them yourself.
    Non-standard layers often do not have a mathematical definition that people
    all agree on, and different people can implement them differently.
    Also, deep learning models on github often have bugs, especially when there are
    no reproduced experiments with the code.

    For your own good, it's best to implement the layers yourself.
    This is also why Tensorpack does not contain non-standard layers.
```
@@ -54,7 +54,10 @@ def Conv2D(
        kernel_initializer = tf.contrib.layers.variance_scaling_initializer(2.0)
    else:
        kernel_initializer = tf.keras.initializers.VarianceScaling(2.0, distribution='untruncated_normal')
-    if split == 1:
+    dilation_rate = shape2d(dilation_rate)
+    if split == 1 and dilation_rate == [1, 1]:
+        # tf.layers.Conv2D has bugs with dilations (https://github.com/tensorflow/tensorflow/issues/26797)
        with rename_get_variable({'kernel': 'W', 'bias': 'b'}):
            layer = tf.layers.Conv2D(
                filters,
@@ -92,7 +95,7 @@ def Conv2D(
        out_channel = filters
        assert out_channel % split == 0
-        assert dilation_rate == (1, 1) or get_tf_version_tuple() >= (1, 5), 'TF>=1.5 required for group dilated conv'
+        assert dilation_rate == [1, 1] or get_tf_version_tuple() >= (1, 5), 'TF>=1.5 required for dilated conv.'
        kernel_shape = shape2d(kernel_size)
        filter_shape = kernel_shape + [in_channel / split, out_channel]
...
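
With this workaround in place, a dilated convolution built through tensorpack's `Conv2D` no longer reaches the buggy `tf.layers.Conv2D` code path; a minimal sketch (the input shape and layer name are illustrative):

```python
import tensorflow as tf
from tensorpack import Conv2D

image = tf.placeholder(tf.float32, [None, 32, 32, 3])
# dilation_rate != 1 now takes the manual convolution path instead of
# tf.layers.Conv2D (see https://github.com/tensorflow/tensorflow/issues/26797):
out = Conv2D('conv_dilated', image, filters=16, kernel_size=3, dilation_rate=2)
```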