NOTE: this page is a bit outdated. Will be updated soon.
Since building a neural network for digit classification feels a bit outdated, we will instead create a fictional network that learns to colorize grayscale images. In this case study, you will learn to do the following using TensorPack:
- DataFlow
  + create a basic dataflow containing images
  + debug your dataflow
  + add custom manipulation to your data such as converting to Lab-space
  + efficiently prefetch data
- Network
  + define a neural network architecture for regression
  + integrate summary functions of TensorFlow
- Training
  + create a training configuration
- Callbacks
  + write your own callback to export predicted images after each epoch
### DataFlow
The basic idea is to gather a large number of images, resize them to the same size and extract
the luminance channel after converting from RGB to Lab. For demonstration purposes, we will split
the dataflow definition into separate steps, though it might be more efficient to combine some steps.
#### Reading data
The first node in the dataflow is the image reader. You can implement the reader however you want, but there are some existing ones we can use, e.g.:
- use the lmdb files you probably already have for the Caffe framework
- collect images from a specific directory
- read ImageNet dataset if you have already downloaded these images
We will simply use a directory that contains many RGB images. This is as simple as:
Note that we've also added batching and prefetching, so the dataflow now generates batches of shape (32, 256, 256, 3), and does so faster.
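What batching does to the shapes can be pictured without any library. The following is a minimal pure-Python sketch, not TensorPack's implementation: `read_images` and `batch` are hypothetical stand-ins for an image-reading dataflow and a batching wrapper, and tiny nested lists replace real images.

```python
def read_images(num_images, height=2, width=2):
    """Hypothetical reader: yields one datapoint (a list with one component) per image."""
    for i in range(num_images):
        # a fake "RGB image": height x width pixels with 3 channels each
        image = [[[i, i, i] for _ in range(width)] for _ in range(height)]
        yield [image]


def batch(dataflow, batch_size):
    """Group consecutive datapoints so each component gains a leading batch axis.

    An incomplete trailing batch is dropped in this sketch.
    """
    buffer = []
    for datapoint in dataflow:
        buffer.append(datapoint)
        if len(buffer) == batch_size:
            # transpose: list of datapoints -> list of batched components
            yield [list(component) for component in zip(*buffer)]
            buffer = []


batches = list(batch(read_images(num_images=10), batch_size=4))
first_component = batches[0][0]
print(len(batches), len(first_component))
```

Each yielded datapoint still has a single component, but that component now holds `batch_size` images, which is the same change that turns a (256, 256, 3) image into a (32, 256, 256, 3) batch.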
But wait! The alert reader makes a critical observation! As input we need the L channel *only*, and the RGB image should be added as ground-truth data. Let's fix that.
Here, we simply apply a mapping function to each datapoint, transforming its single component into two: the first is the L channel of the Lab color space, and the second is the original RGB image itself.
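To make the idea of such a mapping function concrete, here is a small pure-Python sketch. It is an illustration under assumptions, not the code used in this tutorial: `srgb_to_lightness` and `to_l_and_rgb` are hypothetical helpers, the lightness is computed per pixel with the standard sRGB to CIE L* formulas (D65 white point), and a real pipeline would use an image library on whole arrays instead.

```python
def srgb_to_lightness(r, g, b):
    """CIE L* (0..100) of one 8-bit sRGB pixel, assuming a D65 white point."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # relative luminance Y computed from linear RGB
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    delta = 6.0 / 29.0
    f = y ** (1.0 / 3.0) if y > delta ** 3 else y / (3 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0


def to_l_and_rgb(datapoint):
    """Map a one-component datapoint [rgb] to two components [L, rgb]."""
    rgb = datapoint[0]
    lightness = [[srgb_to_lightness(*pixel) for pixel in row] for row in rgb]
    return [lightness, rgb]


image = [[(0, 0, 0), (255, 255, 255)]]  # one row: a black and a white pixel
l_channel, original = to_l_and_rgb([image])
print(l_channel)
```

The black pixel maps to a lightness of about 0 and the white pixel to about 100, which matches the range reported for the first component below.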
The output when using `PrintData` should be like:
```
datapoint 0<2 with 2 components consists of
dp 0: is ndarray of shape (32, 256, 256) with range [0, 100.0000]
dp 1: is ndarray of shape (32, 256, 256, 3) with range [0, 221.6387]
datapoint 1<2 with 2 components consists of
dp 0: is ndarray of shape (32, 256, 256) with range [0, 100.0000]
dp 1: is ndarray of shape (32, 256, 256, 3) with range [0, 249.6030]
```
Well, this is probably not the most efficient way to encode this process. But it clearly demonstrates how much flexibility the `dataflow` gives you.
You can easily insert your own functions, and utilize the pre-defined modules at the same time.
### Network
If you are surprised how far we have already come, you will enjoy how easy it is to define a network model. The simplest model is probably:
```python
class Model(ModelDesc):
    def _get_inputs(self):
        pass

    def _build_graph(self, input_vars):
        self.cost = 0
```
The framework expects:
- a definition of inputs in `_get_inputs`
- a computation graph containing the actual network layers in `_build_graph`
- in a single-cost optimization problem, a member `self.cost` representing the loss function we would like to minimize
#### Define inputs
Our dataflow produces data which looks like `[(32, 256, 256), (32, 256, 256, 3)]`.
The first entry is the luminance channel as input and the latter is the original RGB image with all three channels. So we will write
This is pretty straightforward, isn't it? We defined the shapes of the inputs and gave each entry a name.
You can certainly use 32 instead of `None`, but since the model itself doesn't really need to know
the batch size, using `None` offers the extra flexibility to run inference with a different batch size in the same graph.
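The effect of leaving the batch dimension open can be illustrated with a tiny shape check. This is only a sketch of the idea, not how TensorFlow validates shapes internally; `shape_matches` is a hypothetical helper in which `None` acts as a wildcard.

```python
def shape_matches(declared, actual):
    """Return True if `actual` fits `declared`; None in `declared` matches any size."""
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))


luminance_shape = (None, 256, 256)   # batch size left open
fixed_shape = (32, 256, 256)         # batch size hard-coded

print(shape_matches(luminance_shape, (32, 256, 256)))  # a training batch
print(shape_matches(luminance_shape, (1, 256, 256)))   # a single image at inference
print(shape_matches(fixed_shape, (1, 256, 256)))       # batch size differs
```

With the open batch dimension both the training batch and the single image are accepted, whereas the hard-coded shape rejects anything that is not a batch of 32.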
From now on, the `input_vars` in `_build_graph(self, input_vars)` will be the tensors with the shapes defined in the method `_get_inputs`. We can therefore write
There are probably many better tutorials about defining your network model. And there are definitely [better models](../../examples/GAN/image2image.py). You should check them later. A good way to understand layers from this library is to play with those examples.
It should be noted that you can write your models using [tfSlim](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim),
which comes with [architectures and pre-trained models](https://github.com/tensorflow/models/tree/master/slim/nets) for image classification.
TensorPack automatically handles regularization and batchnorm updates from tfSlim. And you can directly load these pre-trained checkpoints from state-of-the-art models in TensorPack. Isn't this cool?
The remaining part is a boring L2-loss function given by:
We also add a plot of the moving average of the cost tensor, and add some intermediate results to the "images" tab inside TensorBoard. The summaries are written after each epoch.
Note that you can certainly use `tf.summary.scalar('cost', self.cost)` instead, but then you'll only see a single cost value (rather than a moving average), which is much less informative.
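Why a moving average is more informative than single values is easy to see on a handful of numbers. The sketch below is plain Python under stated assumptions: `l2_loss` and `moving_average` are hypothetical helpers, and both the decay of 0.5 and the choice to initialize the average with the first value are picked for illustration, not taken from TensorPack.

```python
def l2_loss(prediction, target):
    """Sum of squared differences between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


def moving_average(values, decay=0.9):
    """Exponential moving average; the first value initializes the average."""
    averaged = []
    current = None
    for v in values:
        current = v if current is None else decay * current + (1 - decay) * v
        averaged.append(current)
    return averaged


print(l2_loss([1.0, 2.0], [0.0, 4.0]))

noisy_costs = [4.0, 8.0, 2.0, 6.0]   # cost values from four consecutive steps
smoothed = moving_average(noisy_costs, decay=0.5)
print(smoothed)
```

The raw costs jump between 2 and 8, while the smoothed sequence stays between 4 and 6, so a trend is much easier to read from the averaged curve.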
### Training
Let's summarize: we have a model and data.
The missing piece which stitches these parts together is the training protocol.
It is only a [configuration](http://tensorpack.readthedocs.io/modules/tensorpack.train.html#tensorpack.train.TrainConfig)
For the dataflow, we already implemented `get_data` in the first part. Specifying the learning rate is done by
The model was implemented, and `max_epoch` is set to 100.
The alert reader, who had almost gone to sleep, makes some noise: "Where is `dataset.size()` coming from?"
This method is implemented by `ImageFromFile` and is forwarded by all mappings.
If you have 42 images in your directory, then this value would be 42.
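How a size can be "forwarded" through mappings is sketched below in plain Python. The classes `ListSource` and `Mapped` are hypothetical stand-ins, not TensorPack classes; they only show that a one-to-one mapping can delegate `size()` to the dataflow it wraps.

```python
class ListSource:
    """Hypothetical source dataflow over an in-memory list of file names."""

    def __init__(self, files):
        self.files = files

    def size(self):
        return len(self.files)

    def get_data(self):
        for name in self.files:
            yield [name]


class Mapped:
    """Hypothetical one-to-one mapping wrapper that forwards size() unchanged."""

    def __init__(self, source, func):
        self.source = source
        self.func = func

    def size(self):
        # a one-to-one mapping never adds or removes datapoints
        return self.source.size()

    def get_data(self):
        for datapoint in self.source.get_data():
            yield self.func(datapoint)


files = ["img_%02d.jpg" % i for i in range(42)]
dataset = Mapped(Mapped(ListSource(files), lambda dp: [dp[0].upper()]),
                 lambda dp: [dp[0], len(dp[0])])
print(dataset.size())
```

Two stacked mappings still report the size of the underlying source, because neither of them changes the number of datapoints.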
Satisfied with this answer, the alert reader leaves the room.
But he will miss the most interesting part: callbacks. We will cover them in the next section.
### Callbacks
Up to this point, we have covered all the necessary parts of a deep learning pipeline, which are common to GANs, image recognition and embedding learning.
But sometimes you want to add your own code to do something extra. We will now add functionality that exports some entries of the tensor `prediction`.
Remember, this tensor is the result of the decoder part in our network.
To modularize the code, there is a plug-in mechanism called callbacks. Our callback looks like
```python
class OnlineExport(Callback):
    def __init__(self):
        pass

    def _setup_graph(self):
        pass

    def _trigger_epoch(self):
        pass
```
So it has 3 methods, although there are some more.
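When these hooks run can be pictured with a toy training loop. The following is a minimal pure-Python sketch of the plug-in idea, not TensorPack's trainer: `SketchCallback`, `EpochLogger` and `train` are hypothetical names, and the hook names are simplified.

```python
class SketchCallback:
    """Hypothetical minimal base class with two hooks."""

    def setup_graph(self):
        pass

    def trigger_epoch(self):
        pass


class EpochLogger(SketchCallback):
    """Records when each hook is called."""

    def __init__(self):
        self.log = []
        self.epoch = 0

    def setup_graph(self):
        self.log.append("setup")

    def trigger_epoch(self):
        self.epoch += 1
        self.log.append("epoch %d" % self.epoch)


def train(callbacks, max_epoch):
    for cb in callbacks:
        cb.setup_graph()          # called once, before training starts
    for _ in range(max_epoch):
        # ... the training steps of one epoch would run here ...
        for cb in callbacks:
            cb.trigger_epoch()    # called after every epoch


logger = EpochLogger()
train([logger], max_epoch=3)
print(logger.log)
```

The setup hook fires exactly once before the first epoch, and the epoch hook fires once per epoch, which is the natural place for exporting predicted images.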
TensorPack is conservative regarding the computation graph.
After the network is constructed and all callbacks are initialized, the graph is read-only.
So once training has started, there is no way to modify the graph, even though that is exactly what we would like to do for inference.
You therefore need to define the whole graph, including the part used for inference, before training starts.
Let us fill in some parts:
```python
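import cv2
from skimage import color  # assumption: color.rgb2lab comes from scikit-image

class OnlineExport(Callback):
    def __init__(self):
        self.cc = 0
        # read the image (OpenCV loads BGR), flip to RGB and extract the luminance channel
        self.example_input = color.rgb2lab(cv2.imread('myimage.jpg')[:, :, ::-1])[:, :, 0]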