Commit f42036ac authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent f417c49f
@@ -6,11 +6,11 @@ https://github.com/tensorpack/tensorpack/issues/new?template=unexpected-problems
Otherwise, you can post here for:
1. Feature Requests:
+ Note that you can implement a lot of features by extending Tensorpack
(See http://tensorpack.readthedocs.io/tutorial/index.html#extend-tensorpack).
It does not have to be added to Tensorpack unless you have a good reason.
2. Questions on Using/Understanding Tensorpack:
+ Your question is probably answered in [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutorials). Read it first.
+ We answer "HOW to do X with Tensorpack" for a well-defined X.
We also answer "HOW/WHY Tensorpack does X" for some X that Tensorpack or its examples are doing.
...
@@ -5,7 +5,7 @@ about: Suggest an idea for Tensorpack
---
+ Note that you can implement a lot of features by extending Tensorpack
(See http://tensorpack.readthedocs.io/tutorial/index.html#extend-tensorpack).
It does not have to be added to Tensorpack unless you have a good reason.
+ "Could you improve/implement an example/paper?"
...
@@ -38,7 +38,9 @@ For example, CPU/GPU utilization, output images, tensorboard curves, if relevant
### 3. What you expected, if not obvious.
If you expect higher speed, please read
http://tensorpack.readthedocs.io/tutorial/performance-tuning.html
before posting.
If you expect higher accuracy, we can help only in one of these two conditions:
(1) You're unable to match the accuracy documented in tensorpack examples.
...
@@ -7,7 +7,7 @@ about: More general questions about Tensorpack.
+ If you did something with tensorpack and it failed, please use the "Unexpected Problems /
Bugs" category.
+ Your question is probably answered in [tutorials](http://tensorpack.readthedocs.io/tutorial/index.html#user-tutorials). Read it first.
+ We answer "HOW to do X with Tensorpack" for a well-defined specific X.
X must be something that you conceptually know how to do, but are unable to do due to lack of knowledge about Tensorpack.
...
@@ -64,7 +64,7 @@ Dependencies:
+ Python 2.7 or 3.3+. Python 2.7 is supported until [it retires in 2020](https://pythonclock.org/).
+ Python bindings for OpenCV. (Optional, but required by a lot of features)
+ TensorFlow ≥ 1.3, < 2. (Optional, if you only want to use `tensorpack.dataflow` alone as a data processing library)
```
pip install --upgrade git+https://github.com/tensorpack/tensorpack.git
# or add `--user` to install to user's local directories
...
# Performance Tuning
__We do not know why your training is slow__
(and most of the time it's not due to issues in tensorpack).
Tensorpack is designed to be high-performance, as can be seen in the [benchmarks](https://github.com/tensorpack/benchmarks).
But performance differs across machines and tasks,
so it's hard for others to understand what went wrong without some investigation on your own.
Tensorpack has some tools to make it easier to understand the performance.
Here is a list of things you can do to understand why your training is slow.
@@ -42,8 +43,9 @@ A benchmark will give you more precise information about which part you should improve.
## Investigate DataFlow
Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial, so you know what your DataFlow is doing.
Then, make modifications and benchmark them to understand which
part of the data pipeline is your bottleneck.
Do __NOT__ look at training speed when you benchmark a DataFlow. Only look at the output of `TestDataSpeed`.
A DataFlow could be blocked by CPU/disk/network/IPC bandwidth.
Do __NOT__ optimize the DataFlow before knowing what it is blocked on.
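The idea behind `TestDataSpeed` can be sketched in plain Python: iterate over the data pipeline by itself, with no training attached, and report datapoints per second. This is an illustrative stdlib-only sketch, not tensorpack's actual implementation; `toy_dataflow` is a made-up stand-in for your real DataFlow.

```python
# Benchmark a data pipeline's raw speed, in the spirit of tensorpack's
# `TestDataSpeed`: iterate over the generator alone and measure throughput.
import time


def toy_dataflow(n):
    """A hypothetical dataflow: yields n dummy datapoints."""
    for i in range(n):
        yield [i]


def benchmark_dataflow(df, warmup=50):
    """Return datapoints/second, discarding warm-up datapoints."""
    it = iter(df)
    # Discard the first few datapoints so startup cost doesn't skew the number.
    for _ in range(warmup):
        next(it, None)
    count = 0
    start = time.perf_counter()
    for _ in it:
        count += 1
    elapsed = time.perf_counter() - start
    return count / max(elapsed, 1e-9)


if __name__ == "__main__":
    speed = benchmark_dataflow(toy_dataflow(100000))
    print("%.0f datapoints/s" % speed)
```

If this number alone is lower than the training speed you need, the model is not the problem and the pipeline is where to look.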
@@ -55,8 +57,8 @@ dataflow, you can usually do the following:
augmentations), then the pre-processing is the bottleneck.
1. Without pre-processing, your dataflow is just reading + parallelism, which
includes both reading cost and the multiprocess communication cost.
You can now let your reader produce only a single integer after reading a large
amount of data, so that the pipeline contains only the parallel reading cost,
with negligible communication cost.
If this becomes fast enough, it means that communication is the bottleneck.
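This isolation trick can be sketched with the stdlib alone: run the same parallel reader twice, once sending the full payload across the process boundary and once sending only a single integer, then compare throughput. The payload sizes and names here are made up for demonstration and are not part of tensorpack.

```python
# Compare a parallel reader that ships full datapoints over IPC against one
# that ships a single integer per datapoint. A large gap means the
# multiprocess communication, not the reading itself, is the bottleneck.
import multiprocessing as mp
import time


def reader(q, n, payload_bytes):
    for _ in range(n):
        data = b"x" * payload_bytes  # simulated "read" result
        if payload_bytes:
            q.put(data)  # full payload: pays the full IPC cost
        else:
            q.put(1)     # single integer: negligible IPC cost
    q.put(None)          # sentinel: reader is done


def measure(n, payload_bytes):
    """Return datapoints/second for one reader process."""
    q = mp.Queue(maxsize=64)
    p = mp.Process(target=reader, args=(q, n, payload_bytes))
    start = time.perf_counter()
    p.start()
    while q.get() is not None:
        pass
    p.join()
    return n / (time.perf_counter() - start)


if __name__ == "__main__":
    full = measure(2000, 1 << 20)  # 1 MB per datapoint
    tiny = measure(2000, 0)        # single integer per datapoint
    print("full payload: %.0f dp/s, integer only: %.0f dp/s" % (full, tiny))
```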
@@ -64,7 +66,8 @@ dataflow, you can usually do the following:
1. In practice the dataflow can be more complicated and you'll need to design
your own strategies to understand its performance.
Once you've understood which part is the bottleneck,
you can start optimizing that part with methods such as:
1. Use a single-file database to avoid random reads on hard disk.
2. Use fewer pre-processing steps, or write faster ones with whatever tools you have.
@@ -74,19 +77,21 @@ Once you've understood which part is the bottleneck, you can start optimizing
## Investigate TensorFlow
When you're sure that data is not a bottleneck (e.g. when the logs show that the queue is almost full),
you can investigate and optimize the model.
A naive but effective way is to remove ops from your model to understand how much time they cost.
Alternatively, you can use the `GraphProfiler` callback to benchmark the graph. It will
dump runtime tracing information (to either TensorBoard or chrome) to help diagnose the issue.
Remember not to use the first several iterations.
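The warning about the first iterations can be made concrete: early steps include one-time costs (graph finalization, memory allocator warm-up, kernel autotuning), so only steady-state iterations should be timed. A minimal sketch, where `run_step` is a hypothetical stand-in for one training step:

```python
# Time training steps while excluding warm-up iterations, so one-time
# startup costs don't distort the measured per-step time.
import time


def run_step():
    time.sleep(0.001)  # placeholder for an actual session.run(...) call


def time_steps(n_steps, warmup=3):
    """Return average seconds/step over steady-state iterations only."""
    for _ in range(warmup):  # run, but do not time, the warm-up steps
        run_step()
    start = time.perf_counter()
    for _ in range(n_steps):
        run_step()
    return (time.perf_counter() - start) / n_steps


if __name__ == "__main__":
    print("%.4f s/step" % time_steps(10))
```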
### Slow on single-GPU
This is literally saying TF ops are slow. Usually there isn't much you can do, except to optimize the kernels.
But there may be something cheap you can try:
1. Visualize copies across devices in the profiler.
It may help to change device placement to avoid some CPU-GPU copies.
It may help to replace some CPU-only ops with equivalent GPU ops to avoid copies.
...