Commit 7ce3d7ab authored by Yuxin Wu

update docs. use INTER_LINEAR by default.

parent 72c97317
@@ -3,7 +3,7 @@
As creating a neural network for digit classification seems to be a bit outdated, we will create a fictional network that learns to colorize grayscale images. In this case-study, you will learn to do the following using TensorPack.
- DataFlow
  + create a basic dataflow containing images
  + debug your dataflow
  + add custom manipulation to your data such as converting to Lab-space
@@ -16,94 +16,86 @@
- Callbacks
  + write your own callback to export predicted images after each epoch
## DataFlow
The basic idea is to gather a huge amount of images, resize them to the same size and extract
the luminance channel after converting from RGB to Lab. For demonstration purposes, we will split
the dataflow definition into separate steps, though it might be more efficient to combine some steps.
### Reading data
The first node in the dataflow is the image reader. You can implement the reader however you want, but there are some existing ones we can use, e.g.:
- use the lmdb files you probably already have for the Caffe framework
- collect images from a specific directory
- read the ImageNet dataset if you have already downloaded these images

We will simply use a directory containing many RGB images. This is as simple as:
```python
from tensorpack import *
import glob, os

imgs = glob.glob(os.path.join('/media/data/img', '*.jpg'))
ds = ImageFromFile(imgs, channel=3, shuffle=True)
ds = PrintData(ds, num=2)  # only for debugging
```
Running this will give you:
```
[0112 18:59:47 @common.py:600] DataFlow Info:
datapoint 0<2 with 1 components consists of
   dp 0: is ndarray of shape (1920, 2560, 3) with range [0.0000, 255.0000]
datapoint 1<2 with 1 components consists of
   dp 0: is ndarray of shape (850, 1554, 3) with range [0.0000, 255.0000]
```
To actually access the datapoints generated by the dataflow, you can use:
```python
ds.reset_state()
for dp in ds.get_data():
    print(dp[0])  # this is an RGB image!
```
This kind of iteration is used behind the scenes to feed data for training. Note the call to `reset_state()`, which is required before manually iterating a dataflow. There is some other magic behind the scenes, too: the dataflow checks whether an image actually is an RGB image with 3 channels and skips any grayscale images.
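If you only want to peek at a couple of datapoints while debugging, a small sketch (reusing the `ds` defined above) is:

```python
import itertools

ds.reset_state()
for dp in itertools.islice(ds.get_data(), 3):
    print(dp[0].shape)  # e.g. (1920, 2560, 3)
```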
### Manipulate incoming data
Now, training a ConvNet which is not fully convolutional requires images of known shape, but our
directory may contain images of different sizes. Let us add a resize step to the dataflow:
```python
imgs = glob.glob(os.path.join('/media/data/img', '*.jpg'))
ds = ImageFromFile(imgs, channel=3, shuffle=True)
ds = AugmentImageComponent(ds, [imgaug.Resize((256, 256))])
ds = PrintData(ds, num=2)  # only for debugging
```
It's time to convert the RGB image into the Lab space. In Python, you would do something like this:

```python
import cv2

rgb = get_my_image()
lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2Lab)
```
We should add this to our dataflow:
```python
import cv2

imgs = glob.glob(os.path.join('/media/data/img', '*.jpg'))
ds = ImageFromFile(imgs, channel=3, shuffle=True)
ds = AugmentImageComponent(ds, [imgaug.Resize((256, 256))])
ds = MapDataComponent(ds, lambda im: cv2.cvtColor(im, cv2.COLOR_RGB2Lab))
ds = PrintData(ds, num=2)  # only for debugging
```
Alternatively, we can also define `rgb2lab` as an augmentor, to make the code more compact:
```python
import cv2

def get_data():
    augs = [imgaug.Resize((256, 256)),
            imgaug.MapImage(lambda im: cv2.cvtColor(im, cv2.COLOR_RGB2Lab))]

    imgs = glob.glob(os.path.join('/media/data/img', '*.jpg'))
    ds = ImageFromFile(imgs, channel=3, shuffle=True)
    ds = AugmentImageComponent(ds, augs)
    ds = BatchData(ds, 32)
    ds = PrefetchData(ds, 4)  # use queue size 4
    return ds
```
Note that we've also added batch and prefetch, so the dataflow now generates batches of shape (32, 256, 256, 3), and does so faster.
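To double-check the shapes yourself, here is a quick sketch (assuming the `get_data` above and enough images in the directory to fill a batch):

```python
ds = get_data()
ds.reset_state()             # required before manually iterating a dataflow
first_batch = next(ds.get_data())
print(first_batch[0].shape)  # expected: (32, 256, 256, 3)
```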
But wait! The alert reader makes a critical observation! For the input we need the L channel *only*, and we should add the RGB image as ground-truth data. Let's fix that.
```python
from tensorpack import *
import glob, os
import cv2

def get_data():
    imgs = glob.glob(os.path.join('/media/data/img', '*.jpg'))
    ds = ImageFromFile(imgs, channel=3, shuffle=True)
    ds = AugmentImageComponent(ds, [imgaug.Resize((256, 256))])
    ds = MapData(ds, lambda dp: [cv2.cvtColor(dp[0], cv2.COLOR_RGB2Lab)[:, :, 0], dp[0]])
    ds = BatchData(ds, 32)
    ds = PrefetchData(ds, 4)  # use queue size 4
    return ds
```
Here, we simply apply a mapping function to each datapoint, turning the single component into two components: the first is the L channel, and the second is the original RGB image. The output when using `PrintData` should look like:
```
datapoint 0<2 with 2 components consists of
   dp 0: is ndarray of shape (32, 256, 256) with range [0, 100.0000]
   dp 1: is ndarray of shape (32, 256, 256, 3) with range [0, 221.6387]
datapoint 1<2 with 2 components consists of
   dp 0: is ndarray of shape (32, 256, 256) with range [0, 100.0000]
   dp 1: is ndarray of shape (32, 256, 256, 3) with range [0, 249.6030]
```
Well, this is probably not the most efficient way to encode this process. But it clearly demonstrates how much flexibility the `dataflow` gives.
You can easily insert your own functions, and use the pre-defined modules at the same time.
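For example, a hypothetical normalization step (an illustration, not part of the original example) could be inserted right after the existing `MapData` and before `BatchData`, scaling the L channel from [0, 100] to [-1, 1]:

```python
def normalize_luminance(dp):
    # dp is [L_channel, rgb]; rescale L from [0, 100] to [-1, 1], keep the ground truth untouched
    luminance, rgb = dp
    return [luminance / 50.0 - 1.0, rgb]

ds = MapData(ds, normalize_luminance)
```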
## Network
If you are surprised how far we already are, you will enjoy how easy it is to define a network model. The simplest possible model is probably:
```python
class Model(ModelDesc):
    def _get_input_vars(self):
        pass

    def _build_graph(self, input_vars):
        self.cost = 0
```
The framework expects:

- a definition of inputs in `_get_input_vars`
- a computation graph containing the actual network layers in `_build_graph`
- in a single-cost optimization problem, a member `self.cost` representing the loss function we would like to minimize.
### Define inputs
Our dataflow produces data which looks like `[(32, 256, 256), (32, 256, 256, 3)]`.
The first entry is the luminance channel as input and the second is the original RGB image with all three channels. So we will write:
```python
def _get_input_vars(self):
    return [InputVar(tf.float32, (None, 256, 256), 'luminance'),
            InputVar(tf.int32, (None, 256, 256, 3), 'rgb')]
```
This is pretty straightforward, isn't it? We define the shapes of the inputs and give each entry a name, which will help us later when building the inference mechanism.
You can certainly use 32 instead of `None`, but since the model itself doesn't really need to know
the batch size, using `None` offers the extra flexibility to run inference with a different batch size in the same graph.
From now on, the `input_vars` in `_build_graph(self, input_vars)` will be tensors with the shapes defined in `_get_input_vars`. We can therefore write:
```python
class Model(ModelDesc):
    def _get_input_vars(self):
        return [InputVar(tf.float32, (None, 256, 256), 'luminance'),
                InputVar(tf.int32, (None, 256, 256, 3), 'rgb')]

    def _build_graph(self, input_vars):
        luminance, rgb = input_vars  # (None, 256, 256), (None, 256, 256, 3)
        self.cost = 0
```
### Define Architecture
So all we need to do is to define a network layout

```math
f\colon \mathbb{R}^{b \times 256 \times 256} \to \mathbb{R}^{b \times 256 \times 256 \times 3}
```

mapping our input to a plausible RGB image.
The process of coming up with such a network architecture is usually a soup of experience,
a lot of trial and error, and much time, laced with magic or simply chance, depending on what you prefer.
We will use an auto-encoder with a lot of convolutions to squeeze the information through a bottleneck
(encoder) and then upsample from a hopefully meaningful compact representation (decoder).
Because we are fancy, we will use a U-net layout with skip-connections.
```python
NF = 64
with argscope(Conv2D, kernel_shape=4, stride=2,
              nl=lambda x, name: LeakyReLU(BatchNorm('bn', x), name=name)):
    # encoder
    e1 = Conv2D('conv1', luminance, NF, nl=LeakyReLU)
    e2 = Conv2D('conv2', e1, NF * 2)
    e3 = Conv2D('conv3', e2, NF * 4)
    e4 = Conv2D('conv4', e3, NF * 8)
    e5 = Conv2D('conv5', e4, NF * 8)
    e6 = Conv2D('conv6', e5, NF * 8)
    e7 = Conv2D('conv7', e6, NF * 8)
    e8 = Conv2D('conv8', e7, NF * 8, nl=BNReLU)  # 1x1
with argscope(Deconv2D, nl=BNReLU, kernel_shape=4, stride=2):
    # decoder
    e8 = Deconv2D('deconv1', e8, NF * 8)
    e8 = Dropout(e8)
    e8 = ConcatWith(e8, 3, e7)

    e7 = Deconv2D('deconv2', e8, NF * 8)
    e7 = Dropout(e7)
    e7 = ConcatWith(e7, 3, e6)

    e6 = Deconv2D('deconv3', e7, NF * 8)
    e6 = Dropout(e6)
    e6 = ConcatWith(e6, 3, e5)

    e5 = Deconv2D('deconv4', e6, NF * 8)
    e5 = Dropout(e5)
    e5 = ConcatWith(e5, 3, e4)

    e4 = Deconv2D('deconv5', e5, NF * 4)
    e4 = Dropout(e4)
    e4 = ConcatWith(e4, 3, e3)

    e3 = Deconv2D('deconv6', e4, NF * 2)
    e3 = Dropout(e3)
    e3 = ConcatWith(e3, 3, e2)

    e2 = Deconv2D('deconv7', e3, NF * 1)
    e2 = Dropout(e2)
    e2 = ConcatWith(e2, 3, e1)

    prediction = Deconv2D('prediction', e2, 3, nl=tf.tanh)
```
There are probably many better tutorials about defining your network model. And there are definitely [better models](../../examples/GAN/image2image.py). You should check them later. A good way to understand the layers in this library is to play with those examples.

It should be noted that you can write your models using [tfSlim](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim),
which comes with [architectures and pre-trained models](https://github.com/tensorflow/models/tree/master/slim/nets) for image classification. TensorPack automatically handles regularization and batchnorm updates from tfSlim, and you can directly load these pre-trained checkpoints from state-of-the-art models in TensorPack. Isn't this cool?
The remaining part is a boring L2-loss function given by:
```python
# cast the int32 ground truth to float before subtracting; TF op names must not contain spaces
self.cost = tf.nn.l2_loss(prediction - tf.cast(rgb, tf.float32), name="L2_loss")
```
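Since `prediction` comes out of a `tanh` and therefore lives in [-1, 1], one possible refinement (my assumption, not part of the original example) is to scale the ground truth into the same range before computing the loss:

```python
# map the uint8-range ground truth [0, 255] roughly onto the tanh range [-1, 1]
rgb_scaled = tf.cast(rgb, tf.float32) / 128.0 - 1.0
self.cost = tf.nn.l2_loss(prediction - rgb_scaled, name="L2_loss")
```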
### Pimp the TensorBoard output
It is a good idea to track the progress of your training session using TensorBoard.
TensorPack provides several functions to simplify the output of summaries and the visualization of intermediate states.
The following two lines
```python
add_moving_summary(self.cost)
tf.summary.image('colorized', prediction, max_outputs=10)
```
add a plot of the moving average of the cost tensor, and add some intermediate results to the "images" tab inside TensorBoard. The summary is written after each epoch.
Note that you could use `tf.summary.scalar('cost', self.cost)` instead, but then you would only see the raw per-step value (rather than a moving average), which is much less informative.
## Training
Let's summarize: we have a model and data.
The missing piece which stitches these parts together is the training protocol.
It is nothing more than a [configuration](http://tensorpack.readthedocs.io/en/latest/modules/tensorpack.train.html#tensorpack.train.TrainConfig).
For the dataflow, we already implemented `get_data` in the first part. Specifying the learning rate is done by
```python
lr = symbolic_functions.get_scalar_var('learning_rate', 1e-4, summary=True)
```
This essentially creates a non-trainable variable with initial value `1e-4` and also tracks this value inside TensorBoard.
Let's have a look at the entire code:
```python
def get_config():
    logger.auto_set_dir()
    dataset = get_data()
    lr = symbolic_functions.get_scalar_var('learning_rate', 1e-4, summary=True)
    return TrainConfig(
        # dataflow, optimizer, callbacks and model are collapsed in this hunk
        step_per_epoch=dataset.size(),
        max_epoch=100,
    )
```
There is not really anything new here.
The model was implemented, and `max_epoch` is set to 100, which means 100 runs over the entire dataset.
The alert reader, who had almost gone to sleep, makes some noise: "Where is `dataset.size()` coming from?"
This method is implemented by `ImageFromFile` and is forwarded by all mappings.
If you have 42 images in your directory, then this value would be 42.
Satisfied with this answer, the alert reader went out of the room.
But he will miss the most interesting part: the callback section. We will cover this in the next section.
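If you are curious, you can query this value directly (a quick sketch reusing `get_data` from above):

```python
ds = get_data()
print(ds.size())  # the number of datapoints the dataflow produces per epoch
```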
## Callbacks
Until this point, we spoke about all the necessary parts of deep learning pipelines which are common for GANs, image recognition and embedding learning.
But sometimes you want to add your own code to do something extra. We will now add some functionality which exports a few entries of the tensor `prediction`.
Remember, this tensor is the result of the decoder part of our network.
To modularize the code, there is a plug-in mechanism called callbacks. Our callback looks like this:
```python
class OnlineExport(Callback):
    def __init__(self):
        pass

    def _setup_graph(self):
        pass

    def _trigger_epoch(self):
        pass
```
So it has 3 methods, although there are some more.
TensorPack is conservative regarding the computation graph: after the network is constructed and all callbacks are initialized, the graph is read-only.
So once you have started training, there is no way of modifying the graph, which we actually want to do for inference.
You'll need to define the whole graph before training starts.
Let us fill in some parts:
```python
class OnlineExport(Callback):
    def __init__(self):
        self.cc = 0
        # read an RGB image and extract the luminance channel
        self.example_input = color.rgb2lab(cv2.imread('myimage.jpg')[:, :, ::-1])[:, :, 0]

    def _setup_graph(self):
        self.predictor = self.trainer.get_predict_func(['luminance'], ['prediction/output'])

    def _trigger_epoch(self):
        pass
```
Can you remember the method `_get_input_vars` in our model? We used the name `luminance` to identify one input.
If not, this is the best time to go back in this text and read how to specify input variables for the network.
In the deconvolution step there was also:
```python
prediction = Deconv2D('prediction', e2, 3, nl=tf.tanh)  # the name of the output tensor is 'prediction/output'
```
Usually the name of the output tensor of a layer is in the API documentation. If you are uncertain,
you can simply `print(prediction)` to find out the name.
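For instance (the exact string depends on your TensorFlow version, so treat this as a sketch):

```python
print(prediction)
# Tensor("prediction/output:0", shape=(?, 256, 256, 3), dtype=float32)
```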
These two names allow us to build the inference part of the network:
```python
self.predictor = self.trainer.get_predict_func(['luminance'], ['prediction/output'])
```
This is very convenient because in `_trigger_epoch` we can use:
```python
def _trigger_epoch(self):
    hopefully_cool_rgb = self.predictor([[self.example_input]])[0][0]
```
to do inference. Together this looks like:
```python
class OnlineExport(Callback):
    def __init__(self):
        self.cc = 0
        self.example_input = color.rgb2lab(cv2.imread('myimage.jpg')[:, :, ::-1])[:, :, 0]

    def _setup_graph(self):
        self.predictor = self.trainer.get_predict_func(['luminance'], ['prediction/output'])

    def _trigger_epoch(self):
        hopefully_cool_rgb = self.predictor([[self.example_input]])[0][0]
        cv2.imwrite("export%04i.jpg" % self.cc, hopefully_cool_rgb)
        self.cc += 1
```
One note about the predictor: it allows any number of inputs and outputs, so it takes a list of input names and a list of output names
to create a predictor, and the resulting predictor takes a list of input values and returns a list of output values.
Also, remember that our graph takes and returns a whole batch, but we only have a single image. This explains the double brackets `[[self.example_input]]`
and the double indexing `[0][0]` above.
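To make the indexing explicit, here is the same call unpacked step by step (a sketch of what happens inside `_trigger_epoch`):

```python
batch_of_inputs = [self.example_input]       # a batch containing our single (256, 256) image
outputs = self.predictor([batch_of_inputs])  # one entry per requested output tensor
rgb_batch = outputs[0]                       # shape (1, 256, 256, 3)
hopefully_cool_rgb = rgb_batch[0]            # the single predicted image
```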
Finally, do not forget to add `OnlineExport` to the callbacks in your `TrainConfig`. Then the
inference will be executed after every epoch.
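A sketch of how this registration could look inside `get_config` (the optimizer and the other callbacks here are placeholders, not part of the original snippet):

```python
return TrainConfig(
    dataset=dataset,
    optimizer=tf.train.AdamOptimizer(lr),
    callbacks=Callbacks([
        StatPrinter(),
        ModelSaver(),
        OnlineExport(),  # our new callback
    ]),
    model=Model(),
    step_per_epoch=dataset.size(),
    max_epoch=100,
)
```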
This section showed only a simple callback, but there is a lot more a callback can do: manipulate summary statistics, run inference,
dump parameters, modify hyperparameters, etc. You can check out the pre-defined callbacks to see how
to implement a powerful one.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# File: PTB-LSTM.py
# Author: Yuxin Wu <ppwwyyxxc@gmail.com>

import tensorflow as tf
@@ -53,11 +53,10 @@ class Model(ModelDesc):
        input, nextinput = input_vars
        initializer = tf.random_uniform_initializer(-0.05, 0.05)

        cell = rnn.BasicLSTMCell(num_units=HIDDEN_SIZE, forget_bias=0.0)
        if is_training:
            cell = rnn.DropoutWrapper(cell, output_keep_prob=DROPOUT)
        cell = rnn.MultiRNNCell([cell] * NUM_LAYER)

        def get_v(n):
            return tf.get_variable(n, [BATCH, HIDDEN_SIZE],
@@ -71,13 +70,13 @@ class Model(ModelDesc):
        input_feature = tf.nn.embedding_lookup(embeddingW, input)  # B x seqlen x hiddensize
        input_feature = Dropout(input_feature, DROPOUT)

        with tf.variable_scope('LSTM', initializer=initializer):
            input_list = tf.unstack(input_feature, num=SEQ_LEN, axis=1)  # seqlen x (Bxhidden)
            outputs, last_state = rnn.static_rnn(cell, input_list, state_var, scope='rnn')

        # seqlen x (Bxrnnsize)
        output = tf.reshape(tf.concat_v2(outputs, 1), [-1, HIDDEN_SIZE])  # (Bxseqlen) x hidden
        logits = FullyConnected('fc', output, VOCAB_SIZE, nl=tf.identity, W_init=initializer, b_init=initializer)
        xent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits, labels=symbolic_functions.flatten(nextinput))
@@ -33,4 +33,4 @@ Note to contributors:
Example needs to satisfy one of the following:
+ Reproduce performance of a published or well-known paper.
+ Get state-of-the-art performance on some task.
+ Illustrate a new way of using the library that is currently not covered.
@@ -499,12 +499,12 @@ class PrintData(ProxyDataFlow):
    .. code-block:: none

        [0110 09:22:21 @common.py:589] DataFlow Info:
        datapoint 0<2 with 4 components consists of
           dp 0: is float of shape () with range [0.0816501893251]
           dp 1: is ndarray of shape (64, 64) with range [0.1300, 0.6895]
           dp 2: is ndarray of shape (64, 64) with range [-1.2248, 1.2177]
           dp 3: is ndarray of shape (9, 9) with range [-0.6045, 0.6045]
        datapoint 1<2 with 4 components consists of
           dp 0: is float of shape () with range [5.88252075399]
           dp 1: is ndarray of shape (64, 64) with range [0.0072, 0.9371]
           dp 2: is ndarray of shape (64, 64) with range [-0.9011, 0.8491]
@@ -539,7 +539,7 @@ class PrintData(ProxyDataFlow):
            string: debug message
        """
        if isinstance(el, list):
            return "%s is list of %i elements" % (" " * (depth * 2), len(el))
        else:
            el_type = el.__class__.__name__
@@ -593,7 +593,7 @@ class PrintData(ProxyDataFlow):
msg = [""] msg = [""]
for i, dummy in enumerate(cutoff(ds.get_data(), self.num)): for i, dummy in enumerate(cutoff(ds.get_data(), self.num)):
if isinstance(dummy, list): if isinstance(dummy, list):
msg.append("datapoint %i<%i with %i elements consists of" % (i, self.num, len(dummy))) msg.append("datapoint %i<%i with %i components consists of" % (i, self.num, len(dummy)))
for k, entry in enumerate(dummy): for k, entry in enumerate(dummy):
msg.append(self._analyze_input_data(entry, k)) msg.append(self._analyze_input_data(entry, k))
label = "" if self.label is "" else " (" + self.label + ")" label = "" if self.label is "" else " (" + self.label + ")"
@@ -17,7 +17,7 @@ class ImageFromFile(RNGDataFlow):
""" """
Args: Args:
files (list): list of file paths. files (list): list of file paths.
channel (int): 1 or 3. Produce RGB images if channel==3. channel (int): 1 or 3. Will convert grayscale to RGB images if channel==3.
resize (tuple): (h, w). If given, resize the image. resize (tuple): (h, w). If given, resize the image.
""" """
assert len(files), "No image files given to ImageFromFile!" assert len(files), "No image files given to ImageFromFile!"
@@ -14,7 +14,7 @@ class Rotation(ImageAugmentor):
""" Random rotate the image w.r.t a random center""" """ Random rotate the image w.r.t a random center"""
def __init__(self, max_deg, center_range=(0, 1), def __init__(self, max_deg, center_range=(0, 1),
interp=cv2.INTER_CUBIC, interp=cv2.INTER_LINEAR,
border=cv2.BORDER_REPLICATE): border=cv2.BORDER_REPLICATE):
""" """
Args: Args:
@@ -43,7 +43,7 @@ class RotationAndCropValid(ImageAugmentor):
    Note that this will produce images of different shapes.
    """

    def __init__(self, max_deg, interp=cv2.INTER_LINEAR):
        """
        Args:
            max_deg (float): max abs value of the rotation degree (in angle).
@@ -49,7 +49,7 @@ class Flip(ImageAugmentor):
class Resize(ImageAugmentor):
    """ Resize image to a target size"""

    def __init__(self, shape, interp=cv2.INTER_LINEAR):
        """
        Args:
            shape: (h, w) tuple or a int
@@ -85,7 +85,7 @@ class ResizeShortestEdge(ImageAugmentor):
        h, w = img.shape[:2]
        scale = self.size / min(h, w)
        desSize = map(int, [scale * w, scale * h])
        ret = cv2.resize(img, tuple(desSize), interpolation=cv2.INTER_LINEAR)
        if img.ndim == 3 and ret.ndim == 2:
            ret = ret[:, :, np.newaxis]
        return ret
@@ -95,7 +95,7 @@ class RandomResize(ImageAugmentor):
""" Randomly rescale w and h of the image""" """ Randomly rescale w and h of the image"""
def __init__(self, xrange, yrange, minimum=(0, 0), aspect_ratio_thres=0.15, def __init__(self, xrange, yrange, minimum=(0, 0), aspect_ratio_thres=0.15,
interp=cv2.INTER_CUBIC): interp=cv2.INTER_LINEAR):
""" """
Args: Args:
xrange (tuple): (min, max) range of scaling ratio for w xrange (tuple): (min, max) range of scaling ratio for w
@@ -20,6 +20,11 @@ def replace_get_variable(fn):
    Returns:
        a context where ``tf.get_variable`` and
        ``variable_scope.get_variable`` are replaced with ``fn``.

    Note that originally ``tf.get_variable ==
    tensorflow.python.ops.variable_scope.get_variable``. But some code, such as
    some in `rnn_cell/`, uses the latter one to get variables, therefore both
    need to be replaced.
    """
    old_getv = tf.get_variable
    old_vars_getv = variable_scope.get_variable