Shashank Suhas / seminar-breakout / Commits / f831a46e

Commit f831a46e, authored Apr 09, 2019 by Yuxin Wu
Do not make unnecessary calls to NVML (#1134)
parent 6f02650d

Showing 1 changed file with 17 additions and 16 deletions

tensorpack/callbacks/prof.py  +17  -16
@@ -67,7 +67,7 @@ class GPUUtilizationTracker(Callback):
         self._evt.set()

     def _after_epoch(self):
-        while self._evt.is_set():   # unlikely
+        while self._evt.is_set():   # unlikely, unless the epoch is extremely fast
             pass
         self._evt.set()
@@ -87,20 +87,21 @@ class GPUUtilizationTracker(Callback):
         self._proc.terminate()

     def worker(self, evt, rst_queue, stop_evt):
-        while True:
-            try:
-                evt.wait()  # start epoch
-                evt.clear()
-                if stop_evt.is_set():   # or on exit
-                    return
-
-                stats = np.zeros((len(self._devices),), dtype='f4')
-                cnt = 0
-                with NVMLContext() as ctx:
+        with NVMLContext() as ctx:
+            devices = [ctx.device(i) for i in self._devices]
+            while True:
+                try:
+                    evt.wait()  # start epoch
+                    evt.clear()
+                    if stop_evt.is_set():   # or on exit
+                        return
+
+                    stats = np.zeros((len(self._devices),), dtype='f4')
+                    cnt = 0
                     while True:
                         time.sleep(1)
-                        data = [ctx.device(i).utilization()['gpu'] for i in self._devices]
+                        data = [d.utilization()['gpu'] for d in devices]
                         data = list(map(float, data))
                         stats += data
                         cnt += 1
@@ -115,10 +116,10 @@ class GPUUtilizationTracker(Callback):
                             cnt -= 1
                         rst_queue.put(stats / cnt)
                         break
-            except Exception:
-                logger.exception("Exception in GPUUtilizationTracker.worker")
-                rst_queue.put(-1)
-                return
+                except Exception:
+                    logger.exception("Exception in GPUUtilizationTracker.worker")
+                    rst_queue.put(-1)
+                    return

 # Can add more features from tfprof
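The change hoists `NVMLContext` out of the per-epoch loop: the NVML library is initialized once and device handles are created once, instead of re-entering the context (and re-querying handles) on every utilization read. A minimal sketch of the same before/after pattern, using a hypothetical mock context in place of tensorpack's real `NVMLContext` so it runs without an NVIDIA GPU:

```python
class MockNVMLContext:
    """Stand-in for NVMLContext; counts how many times it is opened."""
    opens = 0

    def __enter__(self):
        MockNVMLContext.opens += 1
        return self

    def __exit__(self, *exc):
        return False

    def device(self, i):
        return MockDevice(i)


class MockDevice:
    def __init__(self, i):
        self.index = i

    def utilization(self):
        return {'gpu': 50.0}  # fake utilization reading


def poll_old(device_ids, n_epochs):
    """Old pattern: open a fresh context on every polling round."""
    for _ in range(n_epochs):
        with MockNVMLContext() as ctx:
            readings = [ctx.device(i).utilization()['gpu'] for i in device_ids]
    return readings


def poll_new(device_ids, n_epochs):
    """New pattern: one context, device handles created once up front."""
    with MockNVMLContext() as ctx:
        devices = [ctx.device(i) for i in device_ids]
        for _ in range(n_epochs):
            readings = [d.utilization()['gpu'] for d in devices]
    return readings


MockNVMLContext.opens = 0
poll_old([0, 1], 5)
old_opens = MockNVMLContext.opens  # context opened once per round: 5

MockNVMLContext.opens = 0
poll_new([0, 1], 5)
new_opens = MockNVMLContext.opens  # context opened exactly once: 1
```

The same idea applies to any per-iteration acquisition of an expensive resource: acquire once outside the loop, reuse the handle inside it.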