Commit ca096908 authored by Yuxin Wu

update docs

parent 6c482180
@@ -8,10 +8,10 @@ The default logging behavior should be good enough for normal use cases, so you
This is how TensorFlow summaries eventually get logged/saved/printed:
1. __What to Log__: Define what you want to log in the graph, by just calling `tf.summary.xxx`.
   When you call `tf.summary.xxx` in your graph code, TensorFlow adds an op to
   the `tf.GraphKeys.SUMMARIES` collection (by default).
   Tensorpack further removes summaries (in the default collection) that do not come from the first training tower.
2. __When to Log__: [MergeAllSummaries](../modules/callbacks.html#tensorpack.callbacks.MergeAllSummaries)
callback is one of the [default callbacks](../modules/train.html#tensorpack.train.DEFAULT_CALLBACKS).
It runs ops in the `tf.GraphKeys.SUMMARIES` collection (by default) every epoch (by default),
@@ -25,28 +25,31 @@ This is how TensorFlow summaries eventually get logged/saved/printed:
* A [JSONWriter](../modules/callbacks.html#tensorpack.callbacks.JSONWriter)
saves scalars to a JSON file.
All the "what, when, where" can be customized in either the graph or with the callbacks/monitors setting:
1. You can call `tf.summary.xxx(collections=[...])` to add your custom summaries to a different collection.
1. You can use the `MergeAllSummaries(key=...)` callback to write a different collection of summaries to monitors (see the sketch after this list).
1. You can use `PeriodicCallback` or `MergeAllSummaries(period=...)` to make the callback execute more or less frequently.
1. You can tell the trainer to use different monitors.
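
For example, here is a minimal sketch of the first three options above (the collection name `'my_summaries'`, the period, and the dummy tensor are made up for illustration):

```python
import tensorflow as tf
from tensorpack import MergeAllSummaries

# A dummy scalar, standing in for a real tensor in your graph:
my_loss = tf.reduce_mean(tf.random_normal([8]), name='my_loss')

# Put the summary into a custom collection instead of the default one:
tf.summary.scalar('my_loss', my_loss, collections=['my_summaries'])

# Merge & write that collection every 100 steps; the default collection
# is still written once per epoch by the default callback:
callback = MergeAllSummaries(period=100, key='my_summaries')
```

The resulting callback would then be passed to the trainer together with the other callbacks.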
The design goal of disentangling "what, when, where" is to make components reusable.
Suppose you have `M` items to log
(possibly from different places, not necessarily the graph)
and `N` backends to log your data to; you
automatically obtain all the `MxN` combinations.
That said, if you only care about logging one specific tensor in the graph (e.g. for
debugging purposes), you can check out the
[FAQ](http://tensorpack.readthedocs.io/tutorial/faq.html#how-to-print-dump-intermediate-results-in-training)
for easier options.
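
For instance, changing the "where" is just a matter of handing the trainer a different list of monitors. A hedged sketch, assuming the other `TrainConfig` fields are filled in as usual; here the JSON backend is dropped while the TensorBoard and console backends are kept:

```python
from tensorpack import TrainConfig, TFEventWriter, ScalarPrinter

config = TrainConfig(
    # ... model=..., data=..., callbacks=... as usual ...
    monitors=[TFEventWriter(), ScalarPrinter()],
)
```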
### Noisy TensorFlow Summaries
Since TF summaries are evaluated infrequently (every epoch) by default,
if the content is data-dependent (e.g., training loss),
the infrequently-sampled values could have high variance.
To address this issue, you can:
1. Change "When to Log": log more frequently, but note that certain large summaries can be expensive to
log. You may want to use a separate collection for frequent logging.
2. Change "What to Log": you can call
[tfutils.summary.add_moving_summary](../modules/tfutils.html#tensorpack.tfutils.summary.add_moving_summary)
......
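
For the second option, a minimal sketch of `add_moving_summary` (the dummy tensor stands in for a data-dependent scalar such as the training loss; normally you would call this inside your graph-building code):

```python
import tensorflow as tf
from tensorpack.tfutils.summary import add_moving_summary

# A dummy data-dependent scalar, standing in for e.g. the training loss:
loss = tf.reduce_mean(tf.random_normal([8]), name='total_loss')

# Maintain an exponential moving average of the scalar and summarize the
# smoothed value, so that infrequent sampling becomes much less noisy:
add_moving_summary(loss)
```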
@@ -17,6 +17,7 @@ IMAGE_SIZE = 28
class Model(ModelDesc):
    # See tutorial at https://tensorpack.readthedocs.io/tutorial/training-interface.html#with-modeldesc-and-trainconfig
    def inputs(self):
        """
        Define all the inputs (with type, shape, name) that the graph will need.
@@ -25,8 +26,8 @@ class Model(ModelDesc):
                tf.TensorSpec((None,), tf.int32, 'label')]

    def build_graph(self, image, label):
        """This function should build the model which takes the input variables (defined above)
        and return the cost at the end."""

        # In tensorflow, inputs to the convolution function are assumed to be
        # NHWC. Add a single channel here.
@@ -35,7 +36,10 @@ class Model(ModelDesc):
        image = image * 2 - 1  # center the pixel values at zero

        # The context manager `argscope` sets the default option for all the layers under
        # this context. Here we use 32-channel convolutions with 3x3 kernels.
        # See tutorial at https://tensorpack.readthedocs.io/tutorial/symbolic.html
        with argscope(Conv2D, kernel_size=3, activation=tf.nn.relu, filters=32):
            # LinearWrap is just syntax sugar.
            # See tutorial at https://tensorpack.readthedocs.io/tutorial/symbolic.html
            logits = (LinearWrap(image)
                      .Conv2D('conv0')
                      .MaxPooling('pool0', 2)
@@ -58,6 +62,8 @@ class Model(ModelDesc):
        # 1. written to tensorboard
        # 2. written to stat.json
        # 3. printed after each epoch
        # You can also just call `tf.summary.scalar`. But a moving summary has some other benefits.
        # See tutorial at https://tensorpack.readthedocs.io/tutorial/summary.html
        train_error = tf.reduce_mean(1 - correct, name='train_error')
        summary.add_moving_summary(train_error, accuracy)
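        # (For comparison, the plain-TF alternative `tf.summary.scalar('train_error', train_error)`
        # would log the instant value rather than a moving average.)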
@@ -88,6 +94,8 @@ class Model(ModelDesc):
def get_data():
    # We don't need any fancy data loading for this simple example.
    # See dataflow tutorial at https://tensorpack.readthedocs.io/tutorial/dataflow.html
    train = BatchData(dataset.Mnist('train'), 128)
    test = BatchData(dataset.Mnist('test'), 256, remainder=True)
@@ -110,18 +118,24 @@ if __name__ == '__main__':
    config = TrainConfig(
        model=Model(),
        # The input source for training. FeedInput is slow; this is just for demo purposes.
        # In practice it's best to use QueueInput or others.
        # See tutorial at https://tensorpack.readthedocs.io/tutorial/extend/input-source.html
        data=FeedInput(dataset_train),
        # We use a few simple callbacks in this demo.
        # See tutorial at https://tensorpack.readthedocs.io/tutorial/callback.html
        callbacks=[
            ModelSaver(),   # save the model after every epoch
            InferenceRunner(    # run inference (for validation) after every epoch
                dataset_test,   # the DataFlow instance used for validation
                ScalarStats(    # produce `val_accuracy` and `val_cross_entropy_loss`
                    ['cross_entropy_loss', 'accuracy'], prefix='val')),
            # MaxSaver needs to come after InferenceRunner to obtain its score
            MaxSaver('val_accuracy'),   # save the model with the highest accuracy
        ],
        steps_per_epoch=steps_per_epoch,
        max_epoch=100,
    )
    # Use a simple trainer in this demo.
    # More trainers with multi-GPU or distributed functionalities are available.
    # See tutorial at https://tensorpack.readthedocs.io/tutorial/trainer.html
    launch_train_with_config(config, SimpleTrainer())