Commit 833fc5e2, authored Jul 04, 2018 by Yuxin Wu
update docs
parent ebd7332d

Showing 2 changed files with 51 additions and 29 deletions:
docs/tutorial/inference.md (+21 -11)
tensorpack/models/batch_norm.py (+30 -18)
docs/tutorial/inference.md @ 833fc5e2
@@ -12,32 +12,42 @@ There are two ways to do inference during training.
 2. If your inference follows the paradigm of:
    "fetch some tensors for each input, and aggregate the results".
    You can use the `InferenceRunner` interface with some `Inferencer`.
    This will further support prefetch & data-parallel inference.
    More details to come.
 In both methods, your tower function will be called again, with `TowerContext.is_training==False`.
-You can build a different graph using this predicate.
+You can use this predicate to choose a different code path in inference mode.
 ## Inference After Training
-Tensorpack doesn't care what happened after training.
-It saves models to standard checkpoint format, plus a metagraph protobuf file.
-They are sufficient to use with whatever deployment methods TensorFlow supports.
+Tensorpack is a training interface -- it doesn't care what happened after training.
+It saves models to standard checkpoint format.
+You can build the graph for inference, load the checkpoint, and then use whatever deployment methods TensorFlow supports.
 But you'll need to read TF docs and do it on your own.
-Please note that, the metagraph saved during training is the training graph.
-But sometimes you need a different one for inference.
+### Don't Use Training Metagraph for Inference
+Metagraph is the wrong abstraction for a "model".
+It stores the entire graph which contains not only the model, but also all the
+training settings (queues, iterators, summaries, evaluations, multi-gpu replications).
+Therefore it is usually wrong to import a training metagraph for inference.
+It's also very common to change the graph for inference.
 For example, you may need a different data layout for CPU inference,
-or you may need placeholders in the inference graph, or the training graph contains multi-GPU replication
-which you want to remove. In fact, directly import a huge training metagraph is usually not a good idea for deployment.
-In this case, you can always construct a new graph by simply:
+or you may need placeholders in the inference graph (which may not even exist in
+the training graph). However metagraph is not designed to be easily modified at all.
+To do inference, it's best to recreate a clean graph (and save it if needed).
+To construct a new graph, you can simply:
 ```python
 a, b = tf.placeholder(...), tf.placeholder(...)
-# call symbolic functions on a, b
+# call ANY symbolic functions on a, b. e.g.:
+with TowerContext('', is_training=False):
+    model.build_graph(a, b)
 ```
+### OfflinePredictor
 The only tool tensorpack has for after-training inference is [OfflinePredictor](../modules/predict.html#tensorpack.predict.OfflinePredictor),
 a simple function to build the graph and return a callable for you.
 It is mainly for quick demo purposes.
...
tensorpack/models/batch_norm.py @ 833fc5e2
@@ -92,16 +92,26 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
             They are very similar in speed, but `internal_update=True` can be used
             when you have conditionals in your model, or when you have multiple networks to train.
             Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/14699
-        sync_statistics: either None or "nccl". By default (None), it uses statistics of the input tensor to normalize.
-            When set to "nccl", this layer must be used under tensorpack multi-gpu trainers,
-            and it then uses per-machine (multiple GPU) statistics to normalize.
-            Note that this implementation averages the per-tower E[x] and E[x^2] among towers to compute
-            global mean&variance. The result is the global mean&variance only if each tower has the same batch size.
+        sync_statistics (str or None): one of None, "nccl", or "horovod".
+            By default (None), it uses statistics of the input tensor to normalize.
+            This is the standard way BatchNorm was done in most frameworks.
+            When set to "nccl", this layer must be used under tensorpack's multi-GPU trainers.
+            It uses the aggregated statistics of the whole batch (across all GPUs) to normalize.
+            When set to "horovod", this layer must be used under tensorpack's :class:`HorovodTrainer`.
+            It uses the aggregated statistics of the whole batch (across all MPI ranks) to normalize.
+            Note that on single machine this is significantly slower than the "nccl" implementation.
+            This implementation averages the per-GPU E[x] and E[x^2] among GPUs to compute
+            global mean & variance. Therefore each GPU needs to have the same batch size.
             This option has no effect when not training.
-            This option is also known as "Cross-GPU BatchNorm" as mentioned in https://arxiv.org/abs/1711.07240.
-            Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/18222
+            This option is also known as "Cross-GPU BatchNorm" as mentioned in:
+            `MegDet: A Large Mini-Batch Object Detector <https://arxiv.org/abs/1711.07240>`_.
+            Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/18222.
     Variable Names:
...
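The docstring's requirement that each GPU have the same batch size follows from the arithmetic: averaging per-device E[x] and E[x^2] with equal weights matches the pooled statistics only when every device contributes the same number of samples. The following is a minimal pure-Python sketch of that arithmetic, not tensorpack code; the function names `moments`, `averaged_stats`, and `pooled_stats` and the sample values are made up for illustration.

```python
def moments(xs):
    """Return (E[x], E[x^2]) for the samples on one device."""
    n = len(xs)
    return sum(xs) / n, sum(x * x for x in xs) / n


def averaged_stats(batches):
    """Average per-device E[x] and E[x^2] with equal weights, then
    recover the variance as E[x^2] - E[x]^2."""
    per_device = [moments(b) for b in batches]
    k = len(per_device)
    mean = sum(m for m, _ in per_device) / k
    mean_sq = sum(s for _, s in per_device) / k
    return mean, mean_sq - mean ** 2


def pooled_stats(batches):
    """Reference: mean and variance over all samples pooled together."""
    mean, mean_sq = moments([x for b in batches for x in b])
    return mean, mean_sq - mean ** 2


# Hypothetical data: two devices, first with equal then unequal batch sizes.
equal = [[1.0, 2.0], [3.0, 4.0]]
unequal = [[1.0], [2.0, 3.0, 4.0]]

print(averaged_stats(equal), pooled_stats(equal))
print(averaged_stats(unequal), pooled_stats(unequal))
```

With equal batch sizes the two results agree. With unequal sizes the equal-weight average gives too much weight to the smaller batch, so both the mean and the variance drift away from the pooled values.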
@@ -217,19 +227,21 @@ def BatchNorm(inputs, axis=None, training=None, momentum=0.9, epsilon=1e-5,
                 from tensorflow.contrib.nccl.ops import gen_nccl_ops
                 shared_name = re.sub('tower[0-9]+/', '', tf.get_variable_scope().name)
                 num_dev = ctx.total
-                batch_mean = gen_nccl_ops.nccl_all_reduce(
-                    input=batch_mean,
-                    reduction='sum',
-                    num_devices=num_dev,
-                    shared_name=shared_name + '_NCCL_mean') * (1.0 / num_dev)
-                batch_mean_square = gen_nccl_ops.nccl_all_reduce(
-                    input=batch_mean_square,
-                    reduction='sum',
-                    num_devices=num_dev,
-                    shared_name=shared_name + '_NCCL_mean_square') * (1.0 / num_dev)
+                if num_dev == 1:
+                    logger.warn("BatchNorm(sync_statistics='nccl') is used with only one tower!")
+                else:
+                    batch_mean = gen_nccl_ops.nccl_all_reduce(
+                        input=batch_mean,
+                        reduction='sum',
+                        num_devices=num_dev,
+                        shared_name=shared_name + '_NCCL_mean') * (1.0 / num_dev)
+                    batch_mean_square = gen_nccl_ops.nccl_all_reduce(
+                        input=batch_mean_square,
+                        reduction='sum',
+                        num_devices=num_dev,
+                        shared_name=shared_name + '_NCCL_mean_square') * (1.0 / num_dev)
             elif sync_statistics == 'horovod':
                 # Require https://github.com/uber/horovod/pull/331
-                # Proof-of-concept, not ready yet.
                 import horovod.tensorflow as hvd
                 batch_mean = hvd.allreduce(batch_mean, average=True)
                 batch_mean_square = hvd.allreduce(batch_mean_square, average=True)
...
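The NCCL branch derives `shared_name` by stripping the `tower[0-9]+/` prefix from the current variable-scope name, so the same layer built under different towers ends up with an identical name. This presumably lets the all-reduce ops on each device be matched up with one another. The sketch below reproduces only that string manipulation with the standard library; the helper `nccl_shared_name` and the scope names are hypothetical, chosen to resemble what a multi-tower graph might produce.

```python
import re


def nccl_shared_name(scope_name, suffix):
    """Strip the per-tower prefix from a variable-scope name and append a
    suffix, mirroring the re.sub expression used in the NCCL branch."""
    return re.sub('tower[0-9]+/', '', scope_name) + suffix


# Hypothetical scope names for the same layer built under two towers.
names = [nccl_shared_name(s, '_NCCL_mean')
         for s in ('tower0/conv1/bn', 'tower1/conv1/bn')]
print(names)
```

Both entries come out identical, which is the point of removing the tower prefix. A scope name without such a prefix passes through unchanged apart from the suffix.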