Project: Shashank Suhas / seminar-breakout

Commit 0f0a9c53, authored May 11, 2018 by Yuxin Wu

    update docs; check TF is built with CUDA

parent ef62f188

Showing 6 changed files with 40 additions and 22 deletions (+40 −22)
.github/ISSUE_TEMPLATE.md                  +5  −4
examples/ImageNetModels/imagenet_utils.py  +4  −1
examples/ResNet/README.md                  +1  −1
examples/ResNet/imagenet-resnet.py         +2  −1
tensorpack/graph_builder/training.py       +2  −0
tensorpack/utils/gpu.py                    +26 −15
.github/ISSUE_TEMPLATE.md

@@ -11,26 +11,27 @@ Any unexpected problems: __PLEASE ALWAYS INCLUDE__:
    + If not, tell us what you did that may be relevant.
      But we may not be able to resolve it if there is no reproducible code.
    + Better to paste what you did instead of describing them.
-2. What you observed, e.g. as much logs as possible.
+2. What you observed, e.g. the entire log:
    + Better to paste what you observed instead of describing them.
 3. What you expected, if not obvious.
 4. Your environment:
    + Python version.
    + TF version: `python -c 'import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)'`.
-   + Tensorpack version: `python3 -c 'import tensorpack; print(tensorpack.__version__)'`. You can install Tensorpack master by `pip install -U git+https://github.com/ppwwyyxx/tensorpack.git`.
+   + Tensorpack version: `python3 -c 'import tensorpack; print(tensorpack.__version__)'`.
+     You can install Tensorpack master by `pip install -U git+https://github.com/ppwwyyxx/tensorpack.git`.
 5. About efficiency, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html

 Feature Requests:
 + You can implement a lot of features by extending tensorpack
   (See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
   It does not have to be added to tensorpack unless you have a good reason.
-+ We don't take feature requests for implementing new techniques.
++ We don't take feature requests for implementing new papers.
   If you don't know how, ask it as a usage question.

 Usage Questions:
 + Read the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials) first.
-+ We answer "HOW to do X in tensorpack" for a specific well-defined X.
++ We answer "HOW to do X in tensorpack" for a well-defined X.
   We don't answer general machine learning questions,
   such as "what networks to use" or "I don't understand the paper".
...
examples/ImageNetModels/imagenet_utils.py

@@ -57,7 +57,10 @@ def fbresnet_augmentor(isTrain):
     if isTrain:
         augmentors = [
             GoogleNetResize(),
+            # It's OK to remove these augs if your CPU is not fast enough.
+            # Removing brightness/contrast/saturation does not have a significant effect on accuracy.
+            # Removing lighting leads to a tiny drop in accuracy.
             imgaug.RandomOrderAug(
-                # Remove these augs if your CPU is not fast enough
                 [imgaug.BrightnessScale((0.6, 1.4), clip=False),
                  imgaug.Contrast((0.6, 1.4), clip=False),
                  imgaug.Saturation(0.4, rgb=False),
...
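The comments added in the diff above note that the color augmentors may be removed on slow CPUs; `imgaug.RandomOrderAug` applies its sub-augmentors in a random order each time. A minimal sketch of that idea, not tensorpack's actual implementation (the function name and toy augmentors here are invented for illustration):

```python
import random

def random_order_aug(image, augmentors, rng=random):
    """Apply every augmentor exactly once, in a randomly shuffled order."""
    order = list(augmentors)
    rng.shuffle(order)
    for aug in order:
        image = aug(image)
    return image

# Toy stand-ins for BrightnessScale / Contrast acting on a scalar "image":
brighten = lambda x: x + 1
double = lambda x: x * 2

result = random_order_aug(10, [brighten, double])
assert result in (21, 22)  # 10*2+1 or (10+1)*2, depending on the sampled order
```

Because the ops are not commutative in general, each sample sees a slightly different composite transform, which is the point of randomizing the order.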
examples/ResNet/README.md

@@ -31,7 +31,7 @@ To train, first decompress ImageNet data into [this structure](http://tensorpack
 ```
 You should be able to see good GPU utilization (95%~99%), if your data is fast enough.
-It can finish training [within 20 hours](http://dawn.cs.stanford.edu/benchmark/ImageNet/train.html) on AWS p3.16xlarge.
+With batch=64x8, it can finish 100 epochs in 16 hours on AWS p3.16xlarge (8 V100s).
 The default data pipeline is probably OK for machines with SSD & 20 CPU cores.
 See the [tutorial](http://tensorpack.readthedocs.io/en/latest/tutorial/efficient-dataflow.html) on other options to speed up your data.
...
examples/ResNet/imagenet-resnet.py

@@ -119,7 +119,8 @@ if __name__ == '__main__':
     parser.add_argument('--eval', action='store_true', help='run offline evaluation instead of training')
     parser.add_argument('--batch', default=256, type=int,
                         help="total batch size. "
-                        "Note that it's best to keep per-GPU batch size in [32, 64] to obtain the best accuracy.")
+                        "Note that it's best to keep per-GPU batch size in [32, 64] to obtain the best accuracy."
+                        "Pretrained models listed in README were trained with batch=32x8.")
     parser.add_argument('--mode', choices=['resnet', 'preact', 'se'],
                         help='variants of resnet to use', default='resnet')
     args = parser.parse_args()
...
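The `--batch` flag above is the total batch size, split evenly across GPUs, which is why the help text recommends keeping the per-GPU share in [32, 64]. A hedged sketch of that arithmetic (`num_gpu = 8` is an assumed value for illustration, not read from hardware):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch', default=256, type=int,
                    help="total batch size; keep per-GPU batch size in [32, 64]")
args = parser.parse_args(['--batch', '256'])  # parse an explicit argv for the demo

num_gpu = 8  # hypothetical GPU count, for illustration only
per_gpu_batch = args.batch // num_gpu
assert per_gpu_batch == 32  # 256 / 8 falls inside the recommended [32, 64] range
```

Under this convention, batch=32x8 in the diff reads as "32 per GPU on 8 GPUs", i.e. a total batch of 256.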
tensorpack/graph_builder/training.py

@@ -39,6 +39,8 @@ class DataParallelBuilder(GraphBuilder):
         """
         if len(towers) > 1:
             logger.info("[DataParallel] Training a model of {} towers.".format(len(towers)))
+            if not tf.test.is_built_with_cuda():
+                logger.warn("TensorFlow was not built with CUDA support!")
         self.towers = towers
...
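The two added lines warn when multi-tower (multi-GPU) training is requested but the installed TensorFlow was built without CUDA, in which case all towers would silently run on CPU. A standalone sketch of the same guard, assuming only that `tf.test.is_built_with_cuda()` is available (it is a public TensorFlow API); the function name and the use of `warnings` instead of tensorpack's `logger` are this sketch's own choices:

```python
import warnings

def check_cuda_build(num_towers):
    """Warn when >1 training towers are requested on a CPU-only TensorFlow build."""
    if num_towers <= 1:
        return
    try:
        import tensorflow as tf
    except ImportError:
        return  # nothing to check if TensorFlow is not installed
    if not tf.test.is_built_with_cuda():
        warnings.warn("TensorFlow was not built with CUDA support!")

check_cuda_build(8)  # emits the warning only on a CPU-only TF build
```

Warning rather than raising matches the diff's behavior: a CPU-only build can still train, just not where the user likely expects.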
tensorpack/utils/gpu.py

@@ -27,26 +27,37 @@ def get_num_gpu():
     Returns:
         int: #available GPUs in CUDA_VISIBLE_DEVICES, or in the system.
     """
+    def warn_return(ret, message):
+        try:
+            import tensorflow as tf
+        except ImportError:
+            return ret
+
+        built_with_cuda = tf.test.is_built_with_cuda()
+        if not built_with_cuda and ret > 0:
+            logger.warn(message + "But TensorFlow was not built with CUDA support!")
+        return ret
+
     env = os.environ.get('CUDA_VISIBLE_DEVICES', None)
     if env is not None:
-        return len(env.split(','))
+        return warn_return(len(env.split(',')), "Found non-empty CUDA_VISIBLE_DEVICES. ")
     output, code = subproc_call("nvidia-smi -L", timeout=5)
     if code == 0:
         output = output.decode('utf-8')
-        return len(output.strip().split('\n'))
+        return warn_return(len(output.strip().split('\n')), "Found nvidia-smi. ")
     else:
         try:
             # Use NVML to query device properties
             with NVMLContext() as ctx:
-                return ctx.num_devices()
+                return warn_return(ctx.num_devices(), "NVML found nvidia devices. ")
         except Exception:
             # Fallback
             # Note this will initialize all GPUs and therefore has side effect
             # https://github.com/tensorflow/tensorflow/issues/8136
             logger.info("Loading local devices by TensorFlow ...")
             from tensorflow.python.client import device_lib
             local_device_protos = device_lib.list_local_devices()
             return len([x.name for x in local_device_protos if x.device_type == 'GPU'])


 get_nr_gpu = get_num_gpu
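`get_num_gpu` tries its sources in order: CUDA_VISIBLE_DEVICES, then `nvidia-smi -L`, then NVML, then TensorFlow's device list, and the new `warn_return` wraps the first three so a CUDA-less TF build is flagged. The first branch can be exercised without any GPU; here is a sketch of just that parsing step (the function name and the injectable `environ` parameter are this sketch's additions):

```python
import os

def count_visible_devices(environ=os.environ):
    """Return the GPU count implied by CUDA_VISIBLE_DEVICES, or None if unset."""
    env = environ.get('CUDA_VISIBLE_DEVICES', None)
    if env is None:
        return None
    # Mirrors len(env.split(',')) in the diff; note that an empty string
    # still splits into one entry, hence the "non-empty" wording in the warning.
    return len(env.split(','))

assert count_visible_devices({'CUDA_VISIBLE_DEVICES': '0,1,2'}) == 3
assert count_visible_devices({}) is None
```

Passing a dict instead of reading `os.environ` directly makes the branch testable in isolation, which the nested helper in the diff is not.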