Commit 9b62b218 authored by Yuxin Wu

update docs

parent 881c4ee6
@@ -54,6 +54,7 @@ extensions = [
 'sphinx.ext.autodoc',
 'sphinx.ext.todo',
 'sphinx.ext.napoleon',
+#'sphinx.ext.autosectionlabel',
 # 'sphinx.ext.coverage',
 'sphinx.ext.mathjax',
 'sphinx.ext.intersphinx',
...
@@ -8,7 +8,7 @@ might not be correct.
 .. toctree::
   :maxdepth: 2
-  user/tutorials
+  tutorial/index
   casestudies/index
   modules/index
...
-# Callbacks
+# Callback
 Apart from the actual training iterations that minimize the cost,
 you almost surely would like to do something else during training.
...
# Efficient Data Loading
This tutorial gives an overview of how to efficiently load data in tensorpack, using the ImageNet
dataset as an example.
Note that the actual performance depends not only on the disk, but also on
memory (for caching) and CPU (for data processing), so the solution in this tutorial is
not necessarily the best for every scenario.
### Use TensorFlow queues
In general, ``feed_dict`` is slow and should never appear in your critical loop,
i.e., you should avoid loops like this:
```python
while True:
X, y = get_some_data()
minimize_op.run(feed_dict={'X': X, 'y': y})
```
However, when you need to load data from the Python side, this is the only interface available in frameworks such as Keras and tflearn.
You should use something like this instead:
```python
# Thread 1:
while True:
X, y = get_some_data()
enqueue.run(feed_dict={'X': X, 'y': y}) # feed data to a TensorFlow queue
# Thread 2:
while True:
minimize_op.run() # minimize_op was built from dequeued tensors
```
This is already handled automatically by tensorpack trainers (unless you use the demo ``SimpleTrainer``);
see [Trainer](trainer.md) for details.
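To make the two-thread pattern above concrete, here is a minimal sketch in plain TensorFlow 1.x. It is not tensorpack's actual implementation; `get_some_data()` and `build_model()` are hypothetical helpers, and the queue capacity and shapes are illustrative assumptions:

```python
import threading
import tensorflow as tf

# Placeholders are only used on the enqueue side, never in the training loop.
X_ph = tf.placeholder(tf.float32, [None, 224, 224, 3])
y_ph = tf.placeholder(tf.int32, [None])

# A FIFO queue buffers preprocessed batches between the data thread and the training thread.
queue = tf.FIFOQueue(capacity=50, dtypes=[tf.float32, tf.int32])
enqueue_op = queue.enqueue([X_ph, y_ph])

# The training graph is built from the *dequeued* tensors, so running the
# train op needs no feed_dict.
X, y = queue.dequeue()
logits = build_model(X)                       # hypothetical model-building function
cost = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
minimize_op = tf.train.AdamOptimizer(1e-3).minimize(cost)

def feed_queue(sess):
    while True:
        X_batch, y_batch = get_some_data()    # hypothetical Python data source
        sess.run(enqueue_op, feed_dict={X_ph: X_batch, y_ph: y_batch})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    threading.Thread(target=feed_queue, args=(sess,), daemon=True).start()
    while True:
        sess.run(minimize_op)                 # no feed_dict in the critical loop
```

The point is only that the expensive ``feed_dict`` happens off the critical path; tensorpack's queue-based trainers wire this up for you.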
TensorFlow also provides a staging interface which may further improve the speed; this is tracked in
[issue #140](https://github.com/ppwwyyxx/tensorpack/issues/140).
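As a rough sketch only (not the mechanism discussed in that issue), staging with TensorFlow's ``tf.contrib.staging.StagingArea`` might look like this, where `X`, `y` and `minimize_op` refer to the queue example above:

```python
# The StagingArea keeps the next batch ready (e.g. already transferred to the
# GPU) while the current batch is being consumed by the training step.
stage = tf.contrib.staging.StagingArea(dtypes=[tf.float32, tf.int32])
stage_op = stage.put([X, y])
X_staged, y_staged = stage.get()   # build the model on these tensors instead

# Training loop (after one warm-up run of `stage_op`):
#   sess.run([minimize_op, stage_op])   # train on batch i while staging batch i+1
```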
### Figure out your bottleneck
@@ -5,10 +5,11 @@ Tutorials
 Test.
 .. toctree::
-  :maxdepth: 2
+  :maxdepth: 1
   glance
   dataflow
-  models
+  efficient-data
+  model
   trainer
-  callbacks
+  callback
-# Trainers
+# Trainer
 Training is basically **running something again and again**.
 Tensorpack base trainer implements the logic of *running the iteration*,
@@ -12,15 +12,14 @@ therefore you can use these trainers as long as you set `self.cost` in `ModelDesc`,
 as done in most examples.
 Most existing trainers were implemented with a TensorFlow queue to prefetch and buffer
-training data, which is significantly faster than
-a naive `sess.run(..., feed_dict={...})`.
+training data, which is faster than a naive `sess.run(..., feed_dict={...})`.
 There are also multi-GPU trainers which include the logic of data-parallel multi-GPU training,
 with either synchronous update or asynchronous update. You can enable multi-GPU training
 by just changing one line.
 To use trainers, pass a `TrainConfig` to configure them:
-````python
+```python
 config = TrainConfig(
     dataflow=my_dataflow,
     optimizer=tf.train.AdamOptimizer(0.01),
@@ -36,7 +35,7 @@ config = TrainConfig(
 # start multi-GPU training with synchronous update:
 SyncMultiGPUTrainer(config).train()
-````
+```
 Trainers just run some iterations, so there is no limit on where the data comes from
 or what to do in an iteration.
...
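For illustration, switching between single-GPU and multi-GPU training with the same `config` might look like this; the trainer class names are taken from tensorpack of roughly this vintage and should be treated as assumptions:

```python
# Single-GPU training with the queue-based input pipeline:
QueueInputTrainer(config).train()

# Data-parallel multi-GPU training with synchronous updates --
# only the trainer class changes:
SyncMultiGPUTrainer(config).train()

# Or with asynchronous gradient updates:
AsyncMultiGPUTrainer(config).train()
```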