Shashank Suhas / seminar-breakout
Commit 6e0b509b authored Apr 09, 2019 by Yuxin Wu
Catch exception in GPUUtilization.worker (#1134)
parent d04c8444
Showing 1 changed file with 33 additions and 25 deletions (whitespace changes hidden)

tensorpack/callbacks/prof.py (+33 -25)
@@ -75,6 +75,9 @@ class GPUUtilizationTracker(Callback):
         # Don't do this in after_epoch because
         # before,after_epoch are supposed to be extremely fast by design.
         stats = self._queue.get()
+        if stats == -1:
+            from ..train.base import StopTraining
+            raise StopTraining("GPUUtilizationTracker.worker has failed.")
         for idx, dev in enumerate(self._devices):
             self.trainer.monitors.put_scalar('GPUUtil/{}'.format(dev), stats[idx])
@@ -85,6 +88,7 @@ class GPUUtilizationTracker(Callback):
     def worker(self, evt, rst_queue, stop_evt):
         while True:
+            try:
                 evt.wait()  # start epoch
                 evt.clear()
                 if stop_evt.is_set():  # or on exit
@@ -111,6 +115,10 @@ class GPUUtilizationTracker(Callback):
                                 cnt -= 1
                             rst_queue.put(stats / cnt)
                             break
+            except Exception:
+                logger.exception("Exception in GPUUtilizationTracker.worker")
+                rst_queue.put(-1)
+                return

     # Can add more features from tfprof
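The pattern in this commit (the worker catches its own exception, logs it, and pushes a sentinel so that the consumer's blocking `get()` returns and can abort) can be sketched in isolation as follows. This is a minimal illustration, not the tensorpack code: it uses `threading` and `queue.Queue` for self-containment, and `WorkerFailed`, `FAILED`, `worker`, and `collect` are hypothetical names standing in for `StopTraining`, `-1`, and the callback methods. The sketch also checks `isinstance(stats, int)` before comparing, on the assumption that a real result may be a NumPy array, where `stats == -1` would compare elementwise rather than yield a single boolean.

```python
import logging
import queue
import threading

logger = logging.getLogger(__name__)

FAILED = -1  # sentinel the worker puts on the queue when it crashes (assumed value)


class WorkerFailed(RuntimeError):
    """Hypothetical stand-in for tensorpack's StopTraining."""


def worker(sample, rst_queue):
    """Compute one result and report it; on any error, report FAILED instead."""
    try:
        rst_queue.put(sample())
    except Exception:
        # Log the traceback, then unblock the consumer with the sentinel.
        logger.exception("Exception in worker")
        rst_queue.put(FAILED)


def collect(sample):
    """Run the worker in a thread and return its result, or raise WorkerFailed."""
    rst_queue = queue.Queue()
    th = threading.Thread(target=worker, args=(sample, rst_queue), daemon=True)
    th.start()
    # Without the sentinel, this get() would block forever if the worker died.
    stats = rst_queue.get()
    th.join()
    # Check the type first so that array-like results are never compared to -1.
    if isinstance(stats, int) and stats == FAILED:
        raise WorkerFailed("worker has failed.")
    return stats
```

Calling `collect` with a function that returns a list of readings returns that list; calling it with a function that raises logs the traceback and raises `WorkerFailed` in the caller instead of hanging.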