Commit 96f8f96e authored by Yuxin Wu

update docs

parent 14964cc7
@@ -5,11 +5,13 @@ __We do not know why your training is slow__ (and most of the times it's not a t
Tensorpack is designed to be high-performance, as can be seen in the [benchmarks](https://github.com/tensorpack/benchmarks).
But performance is different across machines and tasks,
so it's not easy to understand what goes wrong without doing some investigation on your own.
Tensorpack has some tools to make it easier to understand the performance.
Here is a list of things you can do to understand why your training is slow.
If you ask for help to understand and improve the speed, PLEASE do the
investigations below, and post your hardware information together with your findings:
what changes you've made and what performance numbers you've seen.
## Figure out the bottleneck
@@ -40,18 +42,29 @@ A benchmark will give you more precise information about which part you should i
## Investigate DataFlow
Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial, so you know what your DataFlow is doing.
Then, make modifications and benchmark to understand what in the data pipeline is your bottleneck.
Do __NOT__ look at training speed when you benchmark a DataFlow; only use the output of
[TestDataSpeed](../modules/dataflow.html#tensorpack.dataflow.TestDataSpeed).
Some example things to try:
1. Benchmark only the raw reader (and perhaps add some parallelism); see the sketch after this list.
2. Gradually add some pre-processing and see how the performance changes.
3. Change the number of parallel processes or threads.
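
For example, a minimal sketch of steps 1 and 3 (the `DataFromList` source and the `nr_proc` value below are placeholders for your own reader and your own tuning):

```python
from tensorpack.dataflow import DataFromList, PrefetchDataZMQ, TestDataSpeed

# Stand-in for your raw reader: replace DataFromList with your own DataFlow.
df = DataFromList([[0.0]] * 10000, shuffle=False)
# Optional parallelism over the raw reader; tune nr_proc and re-benchmark.
df = PrefetchDataZMQ(df, nr_proc=4)
# Benchmark the dataflow alone; do NOT judge it by training speed.
TestDataSpeed(df, size=5000).start()
```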
A DataFlow could be blocked by CPU/disk/network/IPC bandwidth.
Do __NOT__ optimize the DataFlow before knowing what it is blocked on.
By benchmarking with modifications to your dataflow, you can see which
component is the bottleneck. For example, with a simple
dataflow, you can usually do the following:
1. If your dataflow becomes fast enough after removing some pre-processing (e.g.
   augmentations), then the pre-processing is the bottleneck.
1. Without pre-processing, your dataflow is just reading + parallelism, which
   includes both the reading cost and the multiprocess communication cost.
   You can now let your reader produce only a single float after reading a large
   amount of data, so that the pipeline keeps the parallel reading but has negligible
   communication cost (see the sketch after this list).
   If this becomes fast enough, it means that communication is the bottleneck.
   If pure parallel reading is still not fast enough, your raw reader is the bottleneck.
1. In practice the dataflow can be more complicated and you'll need to design
   your own strategies to understand its performance.
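
As a sketch of the second check above (`my_reader` is a placeholder for your own raw-reader DataFlow):

```python
from tensorpack.dataflow import MapData, PrefetchDataZMQ, TestDataSpeed

# Reduce every datapoint to a single float immediately after reading, so the
# worker processes still pay the full reading cost but send almost nothing over IPC.
df = MapData(my_reader, lambda dp: [0.0])
df = PrefetchDataZMQ(df, nr_proc=4)
TestDataSpeed(df, size=5000).start()
# Fast now? Communication was the bottleneck. Still slow? The raw reader is.
```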
Once you understand what the bottleneck is, you can try some improvements, such as:
1. Use a single-file database to avoid random reads on hard disk (see the sketch below).
2. Use fewer pre-processing steps, or write faster ones with whatever tools you have.
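
For the first point, one possible approach is tensorpack's `LMDBSerializer` (a sketch; `df` stands for your pure-reader DataFlow and the path is a placeholder):

```python
from tensorpack.dataflow import LMDBSerializer

# One-time dump of the dataflow into a single LMDB file:
LMDBSerializer.save(df, '/path/to/data.lmdb')
# Afterwards, read it back sequentially instead of doing random reads:
df = LMDBSerializer.load('/path/to/data.lmdb', shuffle=False)
```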
@@ -50,6 +50,9 @@ l = func(l, *args, **kwargs)
l = FullyConnected('fc1', l, 10, activation=tf.identity)
```
If you need to access the output of some layer and use it with some other
operations, then simply don't use `LinearWrap`, because the graph is no longer linear.
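
For instance, a branching graph can be written with plain layer calls (a sketch; `image` and the layer names are illustrative):

```python
import tensorflow as tf
from tensorpack import Conv2D, FullyConnected

# `image` is a placeholder for your input tensor.
l = Conv2D('conv1', image, 32, 3)
branch = Conv2D('conv_side', l, 32, 3)  # a side branch reusing the output of conv1
l = Conv2D('conv2', l, 32, 3)
l = tf.concat([l, branch], axis=-1)     # merging the two paths: the graph is not linear
l = FullyConnected('fc1', l, 10, activation=tf.identity)
```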
### Access Relevant Tensors
The variables inside the layer will be named `name/W`, `name/b`, etc.
@@ -60,7 +63,7 @@ l = Conv2D('conv1', l, 32, 3)
print(l.variables.W)
print(l.variables.b)
```
But note that this is a __hacky__ way and may not work with future versions of TensorFlow.
Also this method doesn't work with `LinearWrap`, and cannot access the variables created by an activation function.
The output of a layer is usually named `name/output` unless documented differently in the API.
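
So, as a sketch (assuming a layer created as `Conv2D('conv1', ...)` in the default graph, TF 1.x API), its output tensor can be fetched by name:

```python
import tensorflow as tf

# Fetch the output of the layer named 'conv1' by its conventional tensor name:
out = tf.get_default_graph().get_tensor_by_name('conv1/output:0')
```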
@@ -51,6 +51,8 @@ The tower function needs to follow some rules:
On the other hand, for a non-trainable variable, it may be desirable to not reuse it between towers.
In this case, `tf.Variable` can be used to ensure creation of new variables in each tower even when `reuse=True` (see the sketch after this list).
* Do not modify the reuse option (e.g., by `scope.reuse_variables()`) of a variable
scope that is not created by you. This affects others' code.
4. It cannot create scopes or variables containing the name 'tower', as it is
reserved for special use.
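
A minimal sketch of the `tf.Variable` rule above (the function and variable names are illustrative, TF 1.x API):

```python
import tensorflow as tf

def tower_func(image):
    # tf.get_variable would be shared across towers once reuse=True;
    # tf.Variable always creates a fresh, per-tower variable instead.
    local_step = tf.Variable(0, trainable=False, name='local_step')
    ...
```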
@@ -42,8 +42,8 @@ def proposal_metrics(iou):
@under_name_scope()
def sample_fast_rcnn_targets(boxes, gt_boxes, gt_labels):
    """
    Sample some boxes from all proposals for training.
    #fg is guaranteed to be > 0, because ground truth boxes will be added as proposals.
    Args:
        boxes: nx4 region proposals, floatbox
@@ -43,6 +43,10 @@ os.environ['TF_GPU_THREAD_COUNT'] = '2'
# overflow for certain input data range.
os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '0'
# Available since TF 1.12. issue#15874
os.environ['TF_ENABLE_WHILE_V2'] = '1'
os.environ['TF_ENABLE_COND_V2'] = '1'
try:
    import tensorflow as tf  # noqa
    _version = tf.__version__.split('.')