Shashank Suhas / seminar-breakout · Commits

Commit a4d4eafc, authored Dec 27, 2018 by Yuxin Wu

    misc small change

parent ed702d1d

Showing 4 changed files, with 10 additions and 4 deletions (+10 -4)
tensorpack/tfutils/tower.py    +2 -0
tensorpack/train/base.py       +2 -1
tensorpack/train/trainers.py   +5 -1
tensorpack/utils/utils.py      +1 -2
tensorpack/tfutils/tower.py

```diff
@@ -252,6 +252,8 @@ class TowerFuncWrapper(object):
     each time the function is called.

     :class:`TowerTrainer` needs this so that it knows how to build a predictor.
+
+    Conceptually, this class is roughly equivalent to `tf.function` with input signature, introduced in TF 2.0.
     """

     def __init__(self, tower_fn, inputs_desc):
```
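The added docstring line compares `TowerFuncWrapper` to `tf.function` with an input signature. A minimal pure-Python sketch of that idea, with hypothetical names and no TensorFlow dependency: a wrapper that checks each call against a declared signature and records every invocation, which is what lets a trainer later rebuild a predictor from the same function.

```python
class FuncWithSignature:
    """Hypothetical sketch: wrap a function with a declared input signature
    and record every call, similar in spirit to TowerFuncWrapper."""

    def __init__(self, fn, input_names):
        self._fn = fn
        self._input_names = list(input_names)
        self.calls = []  # history of (inputs, output), for later reuse

    def __call__(self, **inputs):
        # Enforce the declared signature, like tf.function's input_signature.
        assert set(inputs) == set(self._input_names), \
            "expected inputs {}".format(self._input_names)
        out = self._fn(**inputs)
        self.calls.append((inputs, out))
        return out


f = FuncWithSignature(lambda image, label: len(image) + label,
                      input_names=["image", "label"])
result = f(image=[1, 2, 3], label=10)
```

Calling `f` with a missing or extra keyword raises immediately, and `f.calls` keeps the history the real wrapper would use to build a predictor.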
tensorpack/train/base.py

```diff
@@ -46,7 +46,8 @@ class TrainLoop(object):
         self.starting_epoch = int(starting_epoch)
         self.max_epoch = int(max_epoch)
         self.steps_per_epoch = int(steps_per_epoch)
-        assert self.steps_per_epoch > 0 and self.max_epoch > 0
+        # Allow empty epoch (no steps), if we want to run the callbacks only.
+        assert self.steps_per_epoch >= 0 and self.max_epoch >= 0

         self._epoch_num = starting_epoch - 1
```
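The relaxed assertion (`>= 0` instead of `> 0`) permits a run with zero steps per epoch, so epoch-level callbacks still fire without any training steps. A hedged sketch of why that matters, using hypothetical helper names rather than tensorpack's actual loop:

```python
def run_training(steps_per_epoch, max_epoch, run_step, on_epoch_end):
    # Mirrors the relaxed check in the diff: zero steps/epochs are allowed,
    # so a "training" run can consist of callbacks only.
    assert steps_per_epoch >= 0 and max_epoch >= 0
    for epoch in range(1, max_epoch + 1):
        for _ in range(steps_per_epoch):
            run_step()
        on_epoch_end(epoch)


steps_done = []
epochs_seen = []
# An "empty epoch" run: no steps execute, but the epoch callback still fires.
run_training(steps_per_epoch=0, max_epoch=2,
             run_step=lambda: steps_done.append(1),
             on_epoch_end=epochs_seen.append)
```

With the old `> 0` check, constructing such a callbacks-only loop would have been rejected up front.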
tensorpack/train/trainers.py

```diff
@@ -375,6 +375,7 @@ class HorovodTrainer(SingleCostTrainer):
         hvd.init()
         self.is_chief = hvd.rank() == 0
         self._local_rank = hvd.local_rank()
+        self._rank = hvd.rank()
         self._average = average
         logger.info("[HorovodTrainer] local rank={}".format(self._local_rank))
         super(HorovodTrainer, self).__init__()
@@ -435,7 +436,10 @@ class HorovodTrainer(SingleCostTrainer):
         # TODO:
         # 1. a allgather helper to concat strings
         # 2. check variables on each rank match each other, print warnings, and broadcast the common set.
-        logger.info("Broadcasting initialized variables ...")
+        if self.is_chief:
+            logger.info("Broadcasting initialized variables ...")
+        else:
+            logger.info("Rank {} waiting for initialization broadcasting ...".format(self._rank))
         self.sess.run(self._broadcast_op)
```
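The second hunk changes the log message so the chief (rank 0) and the waiting workers report different things. The branching can be isolated as a tiny pure-Python sketch (a hypothetical standalone helper, no Horovod required):

```python
def broadcast_log_message(rank):
    # Same branching as the diff above: rank 0 (the chief) broadcasts,
    # every other rank waits for the broadcast to arrive.
    is_chief = rank == 0
    if is_chief:
        return "Broadcasting initialized variables ..."
    return "Rank {} waiting for initialization broadcasting ...".format(rank)


# Simulate a 3-process launch: one chief and two workers.
messages = [broadcast_log_message(r) for r in range(3)]
```

Logging the global rank (the new `self._rank`) rather than `local_rank` matters on multi-node runs, where several processes share `local_rank` 0.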
tensorpack/utils/utils.py

```diff
@@ -179,7 +179,6 @@ def _pick_tqdm_interval(file):
             return 15

     if 'OMPI_COMM_WORLD_SIZE' in os.environ:
-        if int(os.environ['OMPI_COMM_WORLD_SIZE']) > 8:
-            return 60
+        return 60

     # If not a tty, don't refresh progress bar that often
```
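This hunk drops the world-size threshold: whenever `OMPI_COMM_WORLD_SIZE` is set (an Open MPI launch, e.g. under Horovod), the slow refresh interval applies regardless of how many workers there are, since every worker writes its own log. A simplified, hypothetical sketch of the resulting logic (the `1` fallback is an assumption, not tensorpack's actual default chain):

```python
import os


def pick_tqdm_interval(environ=None):
    # Simplified sketch of the logic after the diff: under Open MPI every
    # worker writes logs, so refresh the progress bar far less often,
    # independent of the world size.
    if environ is None:
        environ = os.environ
    if 'OMPI_COMM_WORLD_SIZE' in environ:
        return 60
    return 1  # hypothetical default for an interactive tty


# A 4-worker MPI launch now also gets the slow interval; before this
# commit, only launches with more than 8 workers did.
interval = pick_tqdm_interval({'OMPI_COMM_WORLD_SIZE': '4'})
```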