Commit 757f5a39 authored by Patrick Wieschollek, committed by Yuxin Wu

another pass over the documentation (#238)

parent 8f64dd6d
@@ -2,7 +2,7 @@ Welcome to tensorpack!
======================================
tensorpack is in early development.
All tutorials are drafts for now. You can get an idea from them, but the details
might not be correct.
.. toctree::
...
# Callbacks
Apart from the actual training iterations that minimize the cost,
you almost surely would like to do something else during training.
Callbacks are an interface to describe what to do besides the
training iterations defined by the trainers.
@@ -15,7 +15,7 @@ There are several places where you might want to do something else:
* After the training (e.g. send the model somewhere, send a message to your phone)
By writing callbacks to implement these tasks, you can reuse the code as long as
you are using tensorpack trainers. For example, these are the callbacks I used when training
a ResNet:
```python
@@ -58,6 +58,6 @@ TrainConfig(
)
```
Notice that callbacks cover every detail of training, ranging from graph operations to the progress bar.
This means you can customize every part of the training to your preference, e.g. display something
different in the progress bar, evaluate part of the summaries at a different frequency, etc.
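To make the collapsed snippet above concrete, here is a rough sketch of what such a configuration can look like. It is not the exact code from the ResNet example (that part is collapsed in this diff); the callbacks are real tensorpack callbacks, but the `my_dataflow`/`MyModel` names, the schedule, and the error-tensor names are placeholders:

```python
TrainConfig(
    dataflow=my_dataflow,      # placeholder: your training DataFlow
    model=MyModel(),           # placeholder: your ModelDesc
    callbacks=[
        ModelSaver(),          # save the model after every epoch
        ScheduledHyperParamSetter('learning_rate',
                                  [(30, 1e-2), (60, 1e-3)]),    # decay the learning rate on a schedule
        InferenceRunner(dataset_val,
                        [ClassificationError('wrong-top1', 'val-error-top1')]),  # run validation every epoch
    ],
)
```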
@@ -13,7 +13,7 @@ a numpy array of shape (64, 28, 28), and an array of shape (64,).
### Composition of DataFlow
One good thing about having a standard interface is that it enables
the greatest code reusability.
There are a lot of existing modules in tensorpack which you can use to compose
complex DataFlow instances with a long pre-processing pipeline. A whole pipeline usually
would __read from disk (or other sources), apply augmentations, group into batches,
@@ -35,10 +35,10 @@ with all the data preprocessing.
All these modules are written in Python,
so you can easily implement whatever operations/transformations you need,
without worrying about adding operators to TensorFlow.
At the same time, thanks to the prefetching, it can still run fast enough for
tasks as large as ImageNet training.
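As a hedged illustration of such a composition (the dataset, the augmentor, and the numbers are arbitrary placeholders; the wrappers are the tensorpack dataflow modules mentioned above):

```python
ds = dataset.Mnist('train')                                # read from a data source
ds = AugmentImageComponent(ds, [imgaug.Resize((32, 32))])  # apply augmentations to the image component
ds = BatchData(ds, 64)                                     # group datapoints into batches
ds = PrefetchData(ds, 3, 2)                                # prefetch a small buffer in 2 extra processes
```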
Unless you are working with standard data types (image folders, LMDB, etc.),
you would usually want to write your own DataFlow.
See [another tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/extend/dataflow.html)
for details.
@@ -49,10 +49,10 @@ for details.
### Use DataFlow outside Tensorpack
Another good thing about DataFlow is that it is independent of
tensorpack internals. You can just use it as an efficient data processing pipeline
and plug it into other frameworks.
To use a DataFlow independently, you will need to call `reset_state()` first to initialize it,
and then use the generator however you want:
```python
df = SomeDataFlow()
...
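# The rest of the original snippet is collapsed in this diff. A hedged sketch of how
# it continues (`reset_state()` and `get_data()` are the actual DataFlow methods;
# the loop body is only illustrative):
df.reset_state()                  # initialize (e.g. RNG after fork) before first use
for datapoint in df.get_data():   # get_data() yields datapoints
    print(datapoint)              # each datapoint is a list of components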
@@ -11,16 +11,16 @@ We use ILSVRC12 training set, which contains 1.28 million images.
The original images (JPEG compressed) are 140G in total.
The average resolution is about 400x350 <sup>[[1]]</sup>.
Following the [ResNet example](../examples/ResNet), we need images in their original resolution,
so we will read the original dataset instead of a down-sampled version, and
apply complicated preprocessing to it.
We will need to reach a speed of roughly 1000 images per second to keep GPUs busy.
Note that the actual performance depends not only on the disk, but also on
memory (for caching) and CPU (for data processing).
You will need to tune the parameters (#processes, #threads, size of buffer, etc.)
or change the pipeline for new tasks and new machines to achieve the best performance.
This tutorial is quite complicated because you do need knowledge of hardware & system to run fast on an ImageNet-sized dataset.
However, for __small datasets__ (e.g., several GBs), a proper prefetch should work well enough.
## Random Read
@@ -33,21 +33,21 @@ ds1 = BatchData(ds0, 256, use_list=True)
TestDataSpeed(ds1).start_test()
```
Here `ds0` simply reads original images from the filesystem. It is implemented by:
```python
for filename, label in filelist:
    yield [cv2.imread(filename), label]
```
And `ds1` batches the datapoints from `ds0`, so that we can measure the speed of this DataFlow in terms of "batches per second".
By default, `BatchData`
will stack the datapoints into a `numpy.ndarray`, but since the images are originally of different shapes, we use
`use_list=True` so that it just produces lists.
On an SSD you probably can already observe good speed here (e.g. 5 it/s, that is 1280 samples/s), but on an HDD the speed may be just 1 it/s,
because we are doing heavy random reads on the filesystem (regardless of whether `shuffle` is True).
We will now add the cheapest pre-processing to get an ndarray in the end instead of a list
(because TensorFlow will need ndarrays eventually):
```eval_rst
.. code-block:: python
@@ -90,14 +90,14 @@ put results into a buffer of size 1000.
To reduce the effect of GIL, you can then uncomment the line so that everything above it (including all the
threads) happens in an independent process.
There is no definite answer to whether it is faster to use threads or processes.
Processes avoid the cost of GIL but bring the cost of communication.
You can also try a combination of both (several processes, each with several threads).
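As a rough sketch of the process-based variant described above (the augmentor list and the numbers are placeholders; `AugmentImageComponent`, `PrefetchDataZMQ`, and `BatchData` are the tensorpack wrappers this section refers to):

```python
ds = AugmentImageComponent(ds0, lots_of_augmentors)  # decode / augment each image (placeholder augmentor list)
ds = PrefetchDataZMQ(ds, 25)                         # run everything above in 25 forked processes
ds = BatchData(ds, 256)                              # batch the results
```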
## Sequential Read
Random read is usually not a good idea, especially if the data is not on an SSD.
We can also dump the dataset into one single file and read it sequentially.
```python
@@ -149,8 +149,8 @@ As a reference, on Samsung SSD 850, the uncached speed is about 16it/s.
```
Instead of shuffling all the training data in every epoch (which would require random read),
the added line above maintains a buffer of datapoints and shuffles them once in a while.
It will not affect the model as long as the buffer is large enough,
but it can consume a lot of memory if the buffer is too large.
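A hedged sketch of how such a shuffle buffer fits into the sequential pipeline (the LMDB path and the buffer size are placeholders; `LMDBData`, `LocallyShuffleData`, and `BatchData` are the dataflow classes this section describes):

```python
ds = LMDBData('/path/to/ILSVRC12-train.lmdb', shuffle=False)  # sequential read from one big file
ds = LocallyShuffleData(ds, 50000)      # the "added line": shuffle within a buffer of 50k datapoints
ds = BatchData(ds, 256, use_list=True)
```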
Then we add the necessary transformations:
```eval_rst
@@ -194,18 +194,18 @@ Let me summarize what the above DataFlow does:
1. One process reads the LMDB file, shuffles the datapoints in a buffer, and puts them into a `multiprocessing.Queue` (used by `PrefetchData`).
2. 25 processes take items from the queue, decode and process them into [image, label] pairs, and
send them through ZMQ IPC pipes.
3. The main process takes data from the pipe and feeds it into the graph, according to
how the `Trainer` is implemented.
The above DataFlow can run at a speed of 5~10 batches per second if you have good CPUs, RAM, disks, and augmentors.
As a reference, tensorpack can train ResNet-18 (a shallow ResNet) at 4.5 batches (of 256 samples) per second on 4 old TitanX cards.
So DataFlow will not be a serious bottleneck if configured properly.
## More Efficient DataFlow
To work with larger datasets (or smaller networks, or more GPUs) you could be severely bounded by the CPU or disk speed of a single machine.
One way is to optimize the preprocessing routine (e.g. write something in C++ or use TF reading operators).
Another way to scale is to run the DataFlow in a distributed fashion and collect the data on the
training machine. E.g.:
```python
# Data Machine #1, process 1-20:
...
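# The rest of the original example is collapsed in this diff. A hedged sketch of the
# idea (`send_dataflow_zmq` and `RemoteDataZMQ` are the tensorpack remote-dataflow
# utilities; the DataFlow class and the addresses are placeholders):
df = MyHeavyDataFlow()
send_dataflow_zmq(df, 'tcp://<training-machine>:8877')   # push datapoints over the network

# Training Machine:
df = RemoteDataZMQ('tcp://0.0.0.0:8877')                 # collect datapoints from all senders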
### Write an image augmentor
The first thing to note: an augmentor is a part of the DataFlow, so you can always
[write a DataFlow](http://tensorpack.readthedocs.io/en/latest/tutorial/extend/dataflow.html)
to apply whatever operations to your data, rather than writing an augmentor.
Augmentors just sometimes make things easier.
@@ -9,9 +9,9 @@ Augmentors just sometimes make things easier.
An augmentor maps images to images.
If you have such a mapping function `f` already, you can simply use `imgaug.MapImage(f)` as the
augmentor, or use `MapDataComponent(df, f, index)` as the DataFlow.
In other words, for simple mappings you do not need to write an augmentor.
An augmentor does more than just apply the mapping. The interface you will need to implement
is:
```python
@@ -27,9 +27,9 @@ class MyAug(imgaug.ImageAugmentor):
It does the following extra things for you:
1. `self.rng` is a `np.random.RandomState` object,
guaranteed to have different seeds when you use multiprocess prefetch.
In multiprocess settings, you will always need it to generate random numbers.
2. Random parameters and the actual augmentation are separated. This allows you to apply the
same random transformation to several images (with `AugmentImageComponents`),
which is essential for tasks such as segmentation.
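As a concrete illustration of this interface (a hypothetical brightness augmentor, not taken from the library; `imgaug.ImageAugmentor`, `self._init`, `self.rng`, and the two underscore methods are the actual API described above):

```python
import numpy as np

class MyBrightness(imgaug.ImageAugmentor):
    def __init__(self, delta=30):
        super(MyBrightness, self).__init__()
        self._init(locals())          # store `delta` on self

    def _get_augment_params(self, img):
        # draw the random parameter with self.rng, not with np.random
        return self.rng.uniform(-self.delta, self.delta)

    def _augment(self, img, delta):
        # apply the deterministic transformation given the parameter
        return np.clip(img.astype('float32') + delta, 0, 255).astype('uint8')
```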
@@ -3,13 +3,13 @@
There are several existing DataFlows, e.g. ImageFromFile, DataFromList, which you can
use to read images or load data from a list.
However, in general, you will probably need to write a new DataFlow to produce data for your task.
DataFlow implementations for several well-known datasets are provided in the
[dataflow.dataset](http://tensorpack.readthedocs.io/en/latest/modules/tensorpack.dataflow.dataset.html)
module; you can take them as a reference.
Usually, you just need to implement the `get_data()` method which yields a datapoint every time.
```python
class MyDataFlow(DataFlow):
    def get_data(self):
@@ -22,12 +22,11 @@ class MyDataFlow(DataFlow):
Optionally, DataFlow can implement the following two methods:
+ `size()`. Return the number of elements the generator can produce. Certain modules might require this.
For example, only DataFlows with the same number of elements can be joined together.
+ `reset_state()`. It is guaranteed that the actual process which runs a DataFlow will invoke this method before using it.
So if this DataFlow needs to do something after a `fork()`, you should put it here.
A typical situation is when your DataFlow uses a random number generator (RNG). Then you would need to reset the RNG here.
Otherwise, child processes will have the same random seed. The `RNGDataFlow` class does this already.
With a "low-level" DataFlow defined, you can then compose it with existing modules.
@@ -2,12 +2,12 @@
## Implement a layer
Symbolic functions should be nothing new to you.
Using symbolic functions is not special in tensorpack: you can use any symbolic functions you have
made or seen elsewhere with tensorpack layers.
You can use symbolic functions from slim/tflearn/tensorlayer, and even Keras ([with some tricks](../../examples/mnist-keras.py)).
So you never **have to** implement a tensorpack layer.
If you would like, you can make a symbolic function a "layer" by following some simple rules, and then gain benefits from the framework.
Take a look at the [Convolutional Layer](../../tensorpack/models/conv2d.py#L14) implementation for an example of how to define a layer:
@@ -34,10 +34,10 @@ By making a symbolic function a "layer", the following things will happen:
+ `argscope` will then work for all its arguments except the input tensor(s).
+ It will work with `LinearWrap`: you can use it if the output of one layer matches the input of the next layer.
There are also some (non-layer) symbolic functions in the `tfutils.symbolic_functions` module.
There is no rule about what kind of symbolic functions should be made a layer -- they are quite
similar anyway. However, in general, I define the following symbolic functions as layers:
+ Functions which contain variables. A variable scope is almost always helpful for such functions.
+ Functions which are commonly referred to as "layers", such as pooling. This makes a model
definition more straightforward.
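As a hedged sketch of such a layer (a made-up scaling layer; `layer_register` is the real tensorpack decorator, but the import path and decorator arguments shown here are assumptions and may differ between versions):

```python
import tensorflow as tf
from tensorpack.models import layer_register   # assumed import path

@layer_register()
def ScaleLayer(x, init_scale=1.0):
    # the variable lives under this layer's scope, so `argscope` and
    # `LinearWrap` can treat it like the built-in layers
    scale = tf.get_variable('scale', shape=[],
                            initializer=tf.constant_initializer(init_scale))
    return tf.multiply(x, scale, name='output')

# usage: the first (string) argument becomes the name/scope of this layer instance
# y = ScaleLayer('myscale', x, init_scale=2.0)
```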
@@ -3,9 +3,9 @@
## Does it support data format X / augmentation Y / layer Z?
The library tries to __support__ everything, but it cannot really __include__ everything.
For your XYZ, you can either implement it yourself or use any existing Python code and wrap it
with the tensorpack interface. See [Extend Tensorpack](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack)
for more details.
@@ -13,7 +13,7 @@ If you think:
1. The framework has limitations in its interface so your XYZ cannot be supported, OR
2. Your XYZ is very common, or very well-defined, so it would be nice to include it.
Then it is a good time to open an issue.
## How to dump/inspect a model
@@ -35,13 +35,13 @@ All model loading (in either training or testing) is through the `session_init`
in `TrainConfig` or `PredictConfig`.
It accepts a `SessionInit` instance, where the common options are `SaverRestore`, which restores a
TF checkpoint, or `DictRestore`, which restores a dict. `get_model_loader` is a small helper to
decide which one to use from a file name.
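A hedged sketch of the three common choices (the file names and `param_dict` are placeholders; `SaverRestore`, `DictRestore`, and `get_model_loader` are the helpers mentioned above):

```python
session_init = SaverRestore('train_log/run1/model-10000')  # restore a TF checkpoint (placeholder path)
session_init = DictRestore(param_dict)                     # param_dict: {variable name: numpy array}
session_init = get_model_loader('ResNet18.npy')            # picks one of the above based on the file name
# then pass it as TrainConfig(..., session_init=session_init) or PredictConfig(..., session_init=session_init)
```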
Doing transfer learning is straightforward. Variable restoring is completely based on name matching between
the current graph and the `SessionInit` initializer.
Therefore, if you want to load some model, just use the same name.
If you want to re-train some layer, just rename it.
Unmatched variables on both sides will be printed as warnings.
To freeze some variables, there are [different ways](https://github.com/ppwwyyxx/tensorpack/issues/87#issuecomment-270545291)
with pros and cons.
@@ -7,9 +7,9 @@ A High Level Glance
* :doc:`dataflow` is a set of extensible tools to help you define your input data with ease and speed.
It provides a uniform interface so that data processing modules can be chained together.
It allows you to load and process your data in pure Python and accelerate it by prefetching.
See also :doc:`tf-queue` and :doc:`efficient-dataflow` for more details about the efficiency of data
processing.
* You can use any TF-based symbolic function library to define a model in tensorpack.
@@ -19,8 +19,8 @@ A High Level Glance
Both DataFlow and models can be used outside tensorpack, as just a data processing library and a symbolic
function library. Tensorpack trainers integrate these two components and add more convenient features.
* tensorpack :doc:`trainer` manages the training loops for you, so you will not have to worry about
details such as multi-GPU training. At the same time, it keeps the power of customization
through callbacks.
* Callbacks are like ``tf.train.SessionRunHook``, or plugins, or extensions. During training,
...
@@ -77,7 +77,7 @@ with TowerContext('', is_training=True):
When defining the model, you can construct the graph using whatever library you feel comfortable with.
Usually, slim/tflearn/tensorlayer are just symbolic functions; calling them is no different
from calling `tf.add`. However, it is a bit different to use sonnet/Keras.
sonnet/Keras manage the variable scope with their own model classes, and calling their symbolic functions
always creates a new variable scope. See the [Keras example](../examples/mnist-keras.py) for how to
...
# How data goes into the graph
This tutorial covers how data goes from DataFlow to the TensorFlow graph.
These are tensorpack internal details, but they are important to know
if you care about efficiency.
## Use TensorFlow queues
@@ -15,7 +15,7 @@ while True:
    minimize_op.run(feed_dict={'X': X, 'y': y})
```
However, when you need to load data from the Python side, this is the only available interface in frameworks such as Keras or tflearn.
This is part of the reason why [tensorpack is faster](https://gist.github.com/ppwwyyxx/8d95da79f8d97036a7d67c2416c851b6) than examples from other packages.
You should use something like this instead:
```python
@@ -42,12 +42,11 @@ reading / preprocessing ops in C++ if there isn't one for your task.
## Figure out the bottleneck
For training, we will only worry about the throughput but not the latency.
Threads 1 & 2 run in parallel, and the faster one will block to wait for the slower one.
So the overall throughput will appear to be that of the slower one.
There isn't a way to accurately benchmark the two threads while they are running without introducing overhead. However, there are ways to understand which one is the bottleneck:
1. Use the average occupancy (size) of the queue. This information is summarized after every epoch.
If the queue is nearly empty, then the data thread is the bottleneck.
...
# Trainer
Training is **running something again and again**.
The tensorpack base trainer implements the logic of *running the iteration*,
and other trainers implement *what the iteration is*.
Most neural network training tasks are single-cost optimization.
Tensorpack provides some trainer implementations for such tasks.
These trainers by default minimize `ModelDesc.cost`.
Therefore, you can use these trainers as long as you set `self.cost` in `ModelDesc._build_graph()`,
as most examples do.
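A minimal sketch of what that looks like inside `_build_graph()` (the input declaration is omitted because its exact form differs between versions; `FullyConnected` is a real tensorpack layer, but the network itself is only a placeholder):

```python
class MyModel(ModelDesc):
    # input declaration omitted; see any example for how to declare inputs

    def _build_graph(self, inputs):
        feature, label = inputs                  # assume `feature` is already a flat vector
        logits = FullyConnected('fc', feature, 10, nl=tf.identity)
        cost = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=label)
        self.cost = tf.reduce_mean(cost, name='cost')   # the single cost the trainer minimizes
```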
Most existing trainers were implemented with a TensorFlow queue to prefetch and buffer
training data, which is faster than a naive `sess.run(..., feed_dict={...})`.
There are also multi-GPU trainers which include the logic of data-parallel multi-GPU training,
with either synchronous or asynchronous updates. You can enable multi-GPU training
by just changing one line.
@@ -32,11 +32,11 @@ config = TrainConfig(
# start training with queue prefetch:
# QueueInputTrainer(config).train()
# start multi-GPU training with synchronous updates:
SyncMultiGPUTrainer(config).train()
```
Trainers just run some iterations, so there is no limit to where the data comes from
or what to do in an iteration.
For example, the [GAN trainer](../examples/GAN/GAN.py) minimizes
two cost functions alternately.
...