seminar-breakout (Shashank Suhas) · Commits

Commit 891dc488, authored May 26, 2019 by Yuxin Wu
Parent: eccba14e

    update docs

Showing 3 changed files with 30 additions and 11 deletions:
- README.md (+1 / -1)
- docs/tutorial/philosophy/dataflow.md (+5 / -5)
- tensorpack/tfutils/varreplace.py (+24 / -5)
README.md @ 891dc488

@@ -22,7 +22,7 @@ It's Yet Another TF high-level API, with __speed__, and __flexibility__ built together.
    some benchmark scripts.
 2. Focus on __large datasets__.
-   + [You don't usually need `tf.data`](http://tensorpack.readthedocs.io/tutorial/extend/input-source.html#tensorflow-reader-cons).
+   + [You don't usually need `tf.data`](https://tensorpack.readthedocs.io/tutorial/philosophy/dataflow.html#alternative-data-loading-solutions).
    Symbolic programming often makes data processing harder.
    Tensorpack helps you efficiently process large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization.
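The link change above points readers at DataFlow, tensorpack's pure-Python loading pipeline. As a minimal sketch of the idea (illustrative only, not part of this commit; it uses the documented DataFromList, MapData, and BatchData classes and assumes tensorpack is installed):

```python
# Illustrative sketch of a DataFlow pipeline, built purely in Python.
from tensorpack.dataflow import BatchData, DataFromList, MapData

df = DataFromList([[i] for i in range(100)])   # any Python list can be a data source
df = MapData(df, lambda dp: [dp[0] * 2])       # arbitrary pure-Python processing
df = BatchData(df, batch_size=8)               # group datapoints into batches
df.reset_state()                               # required once before iteration

for batch in df:                               # iterates like a Python generator
    print(batch[0].shape)                      # each component is stacked: (8,)
    break
```

For the autoparallelization the README mentions, the same pipeline can be wrapped in tensorpack's MultiProcessRunnerZMQ (named PrefetchDataZMQ in older releases) to run the Python processing across worker processes.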
docs/tutorial/philosophy/dataflow.md @ 891dc488

@@ -42,7 +42,7 @@ And for us, we may optimize DataFlow even more, but we just haven't found the reason to do so.
 Certain libraries advocate for a new binary data format (e.g., TFRecords, RecordIO).
 Do you need to use them?
-We think you usually do not. Not after you try DataFlow.
+We think you usually do not, at least not after you try DataFlow, because they are:
 1. **Not Easy**: To use the new binary format,
    you need to write a script, to process your data from its original format,

@@ -102,10 +102,10 @@ __it's extremely inflexible__.
 Why would you ever want to do anything in a computation graph? Here are the possible reasons:
-1. Automatic differentiation
-2. Run the computation on different devices
-3. Serialize the description of your computation
-4. Automatic performance optimization
+* Automatic differentiation
+* Run the computation on different devices
+* Serialize the description of your computation
+* Automatic performance optimization
 They all make sense for training neural networks, but **not much for data loading**.
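The first hunk above argues that a new binary format is usually unnecessary. Worth noting is that when sequential-read speed genuinely matters, tensorpack can produce one without the hand-written conversion script the "Not Easy" point complains about. A hedged sketch using the library's LMDBSerializer (here `df` is assumed to be an existing DataFlow, the path is a placeholder, and the lmdb package must be installed):

```python
# Hedged sketch, not part of this commit: serialize any DataFlow to one
# binary LMDB file, then stream it back with sequential reads.
from tensorpack.dataflow import LMDBSerializer

LMDBSerializer.save(df, '/tmp/dataset.lmdb')                     # one-time conversion
df_fast = LMDBSerializer.load('/tmp/dataset.lmdb', shuffle=False)
df_fast.reset_state()
```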
tensorpack/tfutils/varreplace.py @ 891dc488

@@ -61,8 +61,8 @@ def freeze_variables(stop_gradient=True, skip_collection=False):
     Return a context to freeze variables,
     by wrapping ``tf.get_variable`` with a custom getter.
     It works by either applying ``tf.stop_gradient`` on the variables,
-    or by keeping them out of the ``TRAINABLE_VARIABLES`` collection, or
-    both.
+    or keeping them out of the ``TRAINABLE_VARIABLES`` collection, or
+    both. Both options have their own pros and cons.

     Example:
         .. code-block:: python
@@ -72,16 +72,35 @@ def freeze_variables(stop_gradient=True, skip_collection=False):
     Args:
         stop_gradient (bool): if True, variables returned from `get_variable`
-            will be wrapped with `tf.stop_gradient` and therefore has no
-            gradient when used later.
+            will be wrapped with `tf.stop_gradient`.
+
+            Note that the created variables may still have gradient when accessed
+            by other approaches (e.g. by name, or by collection).
+            For example, they may still have a gradient in weight decay.
             Also note that this makes `tf.get_variable` returns a Tensor instead of a Variable,
-            which may break existing code.
+            which may break existing contract.
             Therefore, it's recommended to use the `skip_collection` option instead.
         skip_collection (bool): if True, do not add the variable to
             ``TRAINABLE_VARIABLES`` collection, but to ``MODEL_VARIABLES``
             collection. As a result they will not be trained by default.
+
+    Note:
+        `stop_gradient` only stops variables returned by `get_variable` **within the context** to
+        contribute no gradient in this context. Therefore it may not completely freeze the variables.
+        For example:
+
+        1. If a variable is created, or reused outside of the context, it can still contribute to the
+           gradient of other tensors.
+        2. If a freezed variable is accessed by other approaches (e.g., by names, by collections),
+           it can still contribute to the gradient of other tensors.
+           For example, weight decay cannot be stopped by a `stop_gradient` context.
+
+        `skip_collection` has to be used the first time the variable is created.
+        Once `skip_collection` is used, the variable is not a trainable variable anymore,
+        and will be completely freezed from gradient update in tensorpack's single-cost trainer.
+
+        Choose the option carefully depend on what you need.
     """
     def custom_getter(getter, *args, **kwargs):
         trainable = kwargs.get('trainable', True)
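The expanded docstring recommends `skip_collection` over `stop_gradient`. A minimal usage sketch of that recommendation (TF1-style graph code, not part of the diff; the variable names are placeholders):

```python
# Hedged usage sketch: freeze variables created inside the context,
# following the docstring's recommendation to prefer skip_collection.
import tensorflow as tf
from tensorpack.tfutils.varreplace import freeze_variables

with freeze_variables(stop_gradient=False, skip_collection=True):
    # Variables created here go to MODEL_VARIABLES instead of
    # TRAINABLE_VARIABLES, so the single-cost trainer will not update them.
    w = tf.get_variable('frozen_w', shape=[3, 3])

v = tf.get_variable('trainable_v', shape=[3])  # created outside: trainable as usual
```

Under the hood, `freeze_variables` installs the `custom_getter` whose first lines appear at the end of the diff: it inspects the `trainable` kwarg of each `tf.get_variable` call and, depending on the two options, keeps the variable out of `TRAINABLE_VARIABLES` and/or wraps the returned value in `tf.stop_gradient`.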