Commit 1f844978 authored by Yuxin Wu

update docs

parent f9bf5407
...@@ -12,10 +12,18 @@ TensorFlow itself also changes API and those are not listed here.
The concept of `InputDesc` was replaced by its equivalent in TF:
`tf.TensorSpec`. This may be a breaking change if you have customized
code that relies on internals of `InputDesc`.
To use `tf.TensorSpec` in your `ModelDesc`:
```python
    def inputs(self):
        return [tf.TensorSpec((None, 28, 28, 1), tf.float32, 'image'),
                tf.TensorSpec((None,), tf.int32, 'label')]
```
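For comparison, a hedged sketch of what the same method looked like with the old `InputDesc` API (argument order as it was commonly written, `InputDesc(type, shape, name)`; shown only as a migration reference):
```python
# Old style (before InputDesc was replaced by tf.TensorSpec); no longer valid after this change:
#     def inputs(self):
#         return [InputDesc(tf.float32, (None, 28, 28, 1), 'image'),
#                 InputDesc(tf.int32, (None,), 'label')]
```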
+ [2018/08/27] msgpack is used for "serialization to disk", because pyarrow
has no compatibility between versions. To use pyarrow instead, `export TENSORPACK_COMPATIBLE_SERIALIZE=pyarrow`.
+ [2018/04/05] <del>msgpack is replaced by pyarrow in favor of its speed. If you want old behavior,
`export TENSORPACK_SERIALIZE=msgpack`.</del>
It was later found that pyarrow is unstable and may lead to crashes,
so the default serialization was changed back to msgpack.
+ [2018/03/20] `ModelDesc` starts to use simplified interfaces (a minimal sketch follows after this list):
  + `_get_inputs()` renamed to `inputs()` and returns `tf.placeholder`s.
  + `build_graph(self, tensor1, tensor2)` returns the cost tensor directly.
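A minimal sketch of the simplified interface, put together here for illustration (the class name, layers, and shapes are made up; only the `inputs()` / `build_graph()` conventions reflect the documented change):
```python
import tensorflow as tf
from tensorpack import ModelDesc

class MyToyModel(ModelDesc):   # hypothetical model, for illustration only
    def inputs(self):
        # New style: a plain `inputs()` method returning placeholders.
        return [tf.placeholder(tf.float32, (None, 28, 28, 1), 'image'),
                tf.placeholder(tf.int32, (None,), 'label')]

    def build_graph(self, image, label):
        # New style: input tensors arrive as arguments; the cost tensor is returned directly.
        logits = tf.layers.dense(tf.layers.flatten(image), 10)
        cost = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits)
        return tf.reduce_mean(cost, name='cost')
```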
...
## [imagenet-resnet.py](imagenet-resnet.py)
__Training__ code of 4 variants of ResNet on ImageNet:
* [Original ResNet](https://arxiv.org/abs/1512.03385)
* [Pre-activation ResNet](https://arxiv.org/abs/1603.05027)
* [Squeeze-and-Excitation ResNet](https://arxiv.org/abs/1709.01507)
* [ResNeXt](https://arxiv.org/abs/1611.05431)
The training follows the exact recipe used by the [Training ImageNet in 1 Hour paper](https://arxiv.org/abs/1706.02677)
and gets the same performance.
__Distributed training__ code & results can be found at [tensorpack/benchmarks](https://github.com/tensorpack/benchmarks/tree/master/ResNet-Horovod).
...@@ -16,7 +17,7 @@ In fact, many papers that claim to "improve" ResNet by .5% only compete with a l
baseline and they actually cannot beat this ResNet recipe.
| Model            | Top 5 Error | Top 1 Error | Download |
|:-----------------|:------------|:-----------:|:---------------------------------------------------------------------------------:|
| ResNet18         | 10.50%      | 29.66%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet18.npz) |
| ResNet34         | 8.56%       | 26.17%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet34.npz) |
| ResNet50         | 6.85%       | 23.61%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet50.npz) |
...@@ -25,15 +26,15 @@ baseline and they actually cannot beat this ResNet recipe.
| ResNeXt101-32x4d | 5.73%       | 21.05%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNeXt101-32x4d.npz) |
| ResNet152        | 5.78%       | 21.51%      | [:arrow_down:](http://models.tensorpack.com/ResNet/ImageNet-ResNet152.npz) |
To reproduce the above results,
first decompress ImageNet data into [this structure](http://tensorpack.readthedocs.io/modules/dataflow.dataset.html#tensorpack.dataflow.dataset.ILSVRC12), then:
```bash
./imagenet-resnet.py --data /path/to/original/ILSVRC -d 50 --mode resnet --batch 512
# See ./imagenet-resnet.py -h for other options.
```
You should be able to see good GPU utilization (95%~99%) if your data is fast enough.
With batch=64x8, ResNet50 training can finish 100 epochs in 16 hours on AWS p3.16xlarge (8 V100s).
The default data pipeline is probably OK for machines with SSD & 20 CPU cores.
See the [tutorial](http://tensorpack.readthedocs.io/tutorial/efficient-dataflow.html) for other options to speed up your data pipeline.
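For reference, a hedged sketch of a multi-process input pipeline in the spirit of that tutorial (the dataflow classes named below are from this era's API, but the path, augmentor, process count, and batch size are illustrative assumptions to tune for your machine):
```python
from tensorpack.dataflow import dataset, imgaug, AugmentImageComponent, BatchData, PrefetchDataZMQ

# Read training images from the ILSVRC12 directory structure described above.
ds = dataset.ILSVRC12('/path/to/original/ILSVRC', 'train', shuffle=True)
# Apply an (illustrative) augmentation to the image component of each datapoint.
ds = AugmentImageComponent(ds, [imgaug.Resize(224)])
# Run reading + augmentation in parallel worker processes so the GPUs never wait on data.
ds = PrefetchDataZMQ(ds, nr_proc=20)
ds = BatchData(ds, 256)
```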
...
...@@ -11,7 +11,7 @@ from six.moves import range, zip
from ..compat import tfv1
from ..callbacks.base import Callback, CallbackFactory
from ..callbacks.graph import RunOp
from ..dataflow import DataFlow, MapData, RepeatedData, DataFlowTerminated
from ..tfutils.common import get_op_tensor_name
from ..tfutils.dependency import dependency_of_fetches
from ..tfutils.summary import add_moving_summary
...@@ -164,18 +164,19 @@ class EnqueueThread(ShareSessionThread):
                    self.op.run(feed_dict=feed)
            except (tf.errors.CancelledError, tf.errors.OutOfRangeError):
                pass
            except DataFlowTerminated:
                logger.info("[EnqueueThread] DataFlow has terminated.")
            except Exception as e:
                if isinstance(e, RuntimeError) and 'closed Session' in str(e):
                    pass
                else:
                    logger.exception("[EnqueueThread] Exception in thread {}:".format(self.name))
            finally:
                try:
                    self.close_op.run()
                except Exception:
                    pass
                logger.info("[EnqueueThread] Thread {} Exited.".format(self.name))

    def reinitialize_dataflow(self):
        self._itr = self.dataflow.__iter__()
...
...@@ -146,7 +146,19 @@ class AccumGradOptimizer(ProxyOptimizer):
    :math:`k` times larger learning rate, but uses much less memory.

    Note that this implementation may not support all models.
    E.g., it currently doesn't support sparse gradient update.

    This optimizer can be used in any TensorFlow code (with or without tensorpack).

    Example:

    .. code-block:: python

        from tensorpack.tfutils.optimizer import AccumGradOptimizer

        myopt = tf.train.GradientDescentOptimizer(0.01)
        myopt = AccumGradOptimizer(myopt, niter=5)
        train_op = myopt.minimize(loss)
    """

    def __init__(self, opt, niter):
...
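Within tensorpack itself, the natural place to apply this wrapper is a model's `optimizer()` method; a hedged sketch follows (the optimizer choice, learning rate, and `niter` are arbitrary illustrations, not a recommended configuration):
```python
import tensorflow as tf
from tensorpack import ModelDesc
from tensorpack.tfutils.optimizer import AccumGradOptimizer

class MyToyModel(ModelDesc):   # hypothetical model; inputs()/build_graph() omitted for brevity
    def optimizer(self):
        base_opt = tf.train.MomentumOptimizer(0.01, 0.9)
        # Accumulate gradients over 8 iterations before applying them,
        # which behaves like an 8x larger batch without the extra memory cost.
        return AccumGradOptimizer(base_opt, niter=8)
```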