Shashank Suhas / seminar-breakout · Commits

Commit 96f8f96e, authored Nov 14, 2018 by Yuxin Wu

    update docs

parent 14964cc7

Showing 5 changed files with 38 additions and 16 deletions (+38 / -16)
- docs/tutorial/performance-tuning.md (+26 / -13)
- docs/tutorial/symbolic.md (+4 / -1)
- docs/tutorial/trainer.md (+2 / -0)
- examples/FasterRCNN/model_frcnn.py (+2 / -2)
- tensorpack/libinfo.py (+4 / -0)
docs/tutorial/performance-tuning.md

```diff
@@ -5,11 +5,13 @@ __We do not know why your training is slow__ (and most of the times it's not a t
 Tensorpack is designed to be high-performance, as can be seen in the
 [benchmarks](https://github.com/tensorpack/benchmarks).
 But performance is different across machines and tasks,
-so you need to figure out what goes wrong by your own.
+so it's not easy to understand what goes wrong without doing some investigations by your own.
+Tensorpack has some tools to make it easier to understand the performance.
 
-Here's a list of things you can do when your training is slow.
-If you ask for help to understand and improve the speed, PLEASE do them and include your findings.
+Here is a list of things you can do to understand why your training is slow.
+If you ask for help to understand and improve the speed, PLEASE do the
+investigations below, post your hardware information and your findings from the investigation, such as what changes
+you've made and what performance numbers you've seen.
 
 ## Figure out the bottleneck
@@ -40,18 +42,29 @@ A benchmark will give you more precise information about which part you should i
 ## Investigate DataFlow
 
 Understand the [Efficient DataFlow](efficient-dataflow.html) tutorial, so you know what your DataFlow is doing.
-Then, make modifications and benchmark to understand which part of dataflow is the bottleneck.
-Use [TestDataSpeed](../modules/dataflow.html#tensorpack.dataflow.TestDataSpeed).
-Do __NOT__ look at training speed when you benchmark a DataFlow.
-Some example things to try:
-
-1. Benchmark only the raw reader (and perhaps add some parallelism).
-2. Gradually add some pre-processing and see how the performance changes.
-3. Change the number of parallel processes or threads.
+Then, make modifications and benchmark to understand what in the data pipeline is your bottleneck.
+Do __NOT__ look at training speed when you benchmark a DataFlow, only use the output of `TestDataSpeed`.
 
 A DataFlow could be blocked by CPU/disk/network/IPC bandwidth.
-Only by benchmarking will you know the reason and improve it accordingly, e.g.:
+Do __NOT__ optimize the DataFlow before knowing what it is blocked on.
+
+By benchmarking with modifications to your dataflow, you can see which
+components is the bottleneck of your dataflow. For example, with a simple
+dataflow, you can usually do the following:
+
+1. If your dataflow becomes fast enough after removing some pre-processing (e.g.
+   augmentations), then the pre-processing is the bottleneck.
+1. Without pre-processing, your dataflow is just reading + parallelism, which
+   includes both reading cost and the multiprocess communication cost.
+   You can now let your reader produce only a single float after reading a large
+   amount of data, so that the pipeline contains only parallel reading, but negligible
+   communication cost any more.
+   If this becomes fast enough, it means that communication is the bottleneck.
+   If pure parallel reading is still not fast enough, it means your raw reader is the bottleneck.
+1. In practice the dataflow can be more complicated and you'll need to design
+   your own strategies to understand its performance.
+
+Once you've understand what is the bottleneck, you can try some improvements such as:
 
 1. Use single-file database to avoid random read on hard disk.
 2. Use fewer pre-processings or write faster ones with whatever tools you have.
```
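The benchmarking strategy the new text describes (measure the raw reader alone, then add pre-processing back and compare) can be sketched without tensorpack at all. The `read_raw` and `augment` functions below are hypothetical stand-ins for your own reader and pre-processing; in tensorpack itself you would wrap the DataFlow in `TestDataSpeed` instead of writing the `benchmark` helper by hand.

```python
import time

def read_raw(n):
    """Hypothetical raw reader: yields n dummy samples."""
    for i in range(n):
        yield [float(i)] * 1000  # pretend this came from disk

def augment(sample):
    """Hypothetical pre-processing step."""
    return [x * 2.0 + 1.0 for x in sample]

def benchmark(stream, name):
    """Drain a sample stream and report throughput, as TestDataSpeed would."""
    start = time.perf_counter()
    count = sum(1 for _ in stream)
    elapsed = time.perf_counter() - start
    print("%s: %d samples, %.0f samples/sec" % (name, count, count / elapsed))
    return count / elapsed

# 1. Benchmark the raw reader alone.
raw_speed = benchmark(read_raw(2000), "raw reader")

# 2. Add pre-processing back and compare. A large drop here means
#    the pre-processing, not the reader, is your bottleneck.
aug_speed = benchmark((augment(s) for s in read_raw(2000)), "reader + augment")
```

The point of measuring the stages in isolation, rather than watching training speed, is that the numbers are not confounded by GPU compute or the rest of the pipeline.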
docs/tutorial/symbolic.md

````diff
@@ -50,6 +50,9 @@ l = func(l, *args, **kwargs)
 l = FullyConnected('fc1', l, 10, activation=tf.identity)
 ```
 
+If you need to access the output of some layer and use it with some other
+operations, then just don't use `LinearWrap`, because the graph is not linear anymore.
+
 ### Access Relevant Tensors
 
 The variables inside the layer will be named `name/W`, `name/b`, etc.
@@ -60,7 +63,7 @@ l = Conv2D('conv1', l, 32, 3)
 print(l.variables.W)
 print(l.variables.b)
 ```
-But note that this is a hacky way and may not work with future versions of TensorFlow.
+But note that this is a __hacky__ way and may not work with future versions of TensorFlow.
 Also this method doesn't work with LinearWrap, and cannot access the variables created by an activation function.
 
 The output of a layer is usually named `name/output` unless documented differently in the API.
````
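The added advice ("don't use `LinearWrap` when the graph is not linear anymore") is easier to see with a toy version of such a wrapper. This `Chain` class is a hypothetical simplification, not tensorpack's `LinearWrap`: it can only thread a single value through a sequence of layers, so any branch that reuses an intermediate result forces you out of the wrapper.

```python
class Chain:
    """Toy linear chaining wrapper: each apply() feeds the current value
    into a function and returns a new Chain holding the result."""
    def __init__(self, value):
        self._value = value

    def apply(self, func, *args, **kwargs):
        return Chain(func(self._value, *args, **kwargs))

    def __call__(self):
        return self._value

def scale(x, k):
    return x * k

def shift(x, b):
    return x + b

# Linear graph: fine with the wrapper.
out = Chain(3).apply(scale, 2).apply(shift, 1)()  # 3 * 2 + 1

# Non-linear graph: the intermediate value is needed in two places,
# so drop the wrapper and name the tensor explicitly instead.
mid = scale(3, 2)
out2 = shift(mid, 1) + mid  # the branch reuses `mid`
```

The chaining style only ever exposes the latest value, which is exactly why a diamond-shaped graph cannot be expressed inside it.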
docs/tutorial/trainer.md

```diff
@@ -51,6 +51,8 @@ The tower function needs to follow some rules:
    On the other hand, for a non-trainable variable, it may be desirable to not reuse it between towers.
    In this case, `tf.Variable` can be used to ensure creation of new variables in each tower even when `reuse=True`.
+   * Do not modify the reuse option (e.g., by `scope.reuse_variables()`) of a variable
+     scope that is not created by you. This affects other's code.
 4. It cannot create scopes or variables containing the name 'tower', as it is
    reserved for special use.
```
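The rule above turns on the difference between variables fetched through `tf.get_variable` (shared across towers when `reuse=True`) and ones created with `tf.Variable` (always fresh). A hypothetical pure-Python registry makes the two semantics concrete; this is a sketch of the behavior, not TensorFlow's implementation.

```python
_registry = {}

def get_variable(name, init):
    """Sketch of tf.get_variable-style semantics: same name -> same object."""
    if name not in _registry:
        _registry[name] = {"name": name, "value": init}
    return _registry[name]

def variable(name, init):
    """Sketch of tf.Variable-style semantics: every call makes a new object,
    even if a variable with that name already exists."""
    return {"name": name, "value": init}

# Two "towers" building the same layer:
w_tower0 = get_variable("fc/W", 0.0)
w_tower1 = get_variable("fc/W", 0.0)
assert w_tower0 is w_tower1          # shared (reused) between towers

# A non-trainable counter each tower keeps for itself:
step_t0 = variable("step", 0)
step_t1 = variable("step", 0)
assert step_t0 is not step_t1        # each tower gets its own copy
```

Flipping the reuse option of a scope you did not create corresponds, in this sketch, to mutating someone else's registry behind their back, which is why the rule forbids it.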
examples/FasterRCNN/model_frcnn.py

```diff
@@ -42,8 +42,8 @@ def proposal_metrics(iou):
 @under_name_scope()
 def sample_fast_rcnn_targets(boxes, gt_boxes, gt_labels):
     """
-    Sample some ROIs from all proposals for training.
-    #fg is guaranteed to be > 0, because grount truth boxes are added as RoIs.
+    Sample some boxes from all proposals for training.
+    #fg is guaranteed to be > 0, because ground truth boxes will be added as proposals.
 
     Args:
         boxes: nx4 region proposals, floatbox
```
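The corrected docstring notes that `#fg` is guaranteed positive because the ground-truth boxes are appended to the proposals before sampling. A small numpy sketch of that guarantee (the 0.5 threshold and helper names here are illustrative, not the exact FasterRCNN code):

```python
import numpy as np

def iou_matrix(boxes, gt_boxes):
    """Pairwise IoU between two sets of xyxy boxes."""
    x1 = np.maximum(boxes[:, None, 0], gt_boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gt_boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gt_boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_b = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def sample_targets(proposals, gt_boxes, fg_thresh=0.5):
    # Appending the ground-truth boxes means every gt box has at least one
    # candidate with IoU == 1 (itself), so the foreground set is never empty.
    boxes = np.concatenate([proposals, gt_boxes], axis=0)
    best_iou = iou_matrix(boxes, gt_boxes).max(axis=1)
    fg = np.where(best_iou >= fg_thresh)[0]
    return boxes, fg

proposals = np.array([[0., 0., 1., 1.]])   # a single poor proposal
gt = np.array([[10., 10., 20., 20.]])      # far from every proposal
boxes, fg = sample_targets(proposals, gt)
assert len(fg) > 0                         # guaranteed by the concat
```

Without the concatenation, an image whose proposals all miss the objects would produce an empty foreground set and break the loss computation.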
tensorpack/libinfo.py

```diff
@@ -43,6 +43,10 @@ os.environ['TF_GPU_THREAD_COUNT'] = '2'
 # overflow for certain input data range.
 os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '0'
 
+# Available since 1.12. issue#15874
+os.environ['TF_ENABLE_WHILE_V2'] = '1'
+os.environ['TF_ENABLE_COND_V2'] = '1'
+
 try:
     import tensorflow as tf  # noqa
     _version = tf.__version__.split('.')
```
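The new lines set TensorFlow feature flags through the process environment. Such flags are typically read when TensorFlow is first imported or first builds the relevant op, which is presumably why `libinfo.py` assigns them before its own `import tensorflow`. The pattern itself is plain `os.environ`; the use of `setdefault` below is an illustrative choice (libinfo.py assigns unconditionally) that lets a user override a flag from the shell.

```python
import os

# Feature flags must be in the environment before the library that reads
# them is imported; os.environ affects this process and its children.
os.environ.setdefault('TF_ENABLE_WHILE_V2', '1')
os.environ.setdefault('TF_ENABLE_COND_V2', '1')

# Only after the flags are set is it safe to import the library:
# import tensorflow as tf  # noqa
```

Setting these variables after the library has already been imported elsewhere in the process has no effect, which is the usual pitfall with this pattern.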