Commit bed3fa19, authored Nov 23, 2017 by Yuxin Wu
Support horovod distributed training (#422)
Parent: 9c2e2226
Showing 2 changed files with 22 additions and 3 deletions:

  docs/tutorial/intro.rst        +1   -1
  tensorpack/train/trainers.py   +21  -2
docs/tutorial/intro.rst

@@ -15,7 +15,7 @@ which as a result makes people think TensorFlow is slow.
 Tensorpack uses TensorFlow efficiently, and hides these details under its APIs.
 You no longer need to learn about
-multi-GPU model replication, variables synchronization, queues, tf.data -- anything that's unrelated to the model itself.
+multi-GPU model replication, device placement, variables synchronization, queues -- anything that's unrelated to the model itself.
 You still need to learn to write models with TF, but everything else is taken care of by tensorpack, in the efficient way.

 A High Level Glance
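
The doc change above is easier to see with a concrete example. The following is a minimal sketch (not part of this commit) of how the same model code is reused across trainers; MyModel and my_dataflow are hypothetical placeholders for a user-defined ModelDesc and DataFlow, and it assumes the launch_train_with_config entry point of the trainer API touched by this commit.

    # Sketch only: MyModel and my_dataflow are hypothetical placeholders
    # for a user-defined ModelDesc and DataFlow.
    from tensorpack import TrainConfig, launch_train_with_config
    from tensorpack.train import SimpleTrainer, SyncMultiGPUTrainerReplicated

    config = TrainConfig(model=MyModel(), dataflow=my_dataflow, max_epoch=100)

    # Single-GPU training:
    launch_train_with_config(config, SimpleTrainer())

    # Data-parallel training on 4 GPUs: same model and config, different trainer.
    # Replication, device placement and variable synchronization are handled internally.
    # launch_train_with_config(config, SyncMultiGPUTrainerReplicated(4))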
tensorpack/train/trainers.py

@@ -212,9 +212,28 @@ class DistributedTrainerReplicated(SingleCostTrainer):

 class HorovodTrainer(SingleCostTrainer):
     """
-    Horovod trainer, currently support multi-GPU training.
+    Horovod trainer, support multi-GPU and distributed training.
+    It will use the first k GPUs in CUDA_VISIBLE_DEVICES.
+
+    To use for multi-GPU training:
+
+        CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -np 4 --output-filename mylog python train.py
+
+    To use for distributed training:
+
+        /path/to/mpirun -np 8 -H server1:4,server2:4  \
+            -bind-to none -map-by slot \
+            --output-filename mylog -x LD_LIBRARY_PATH -x CUDA_VISIBLE_DEVICES=0,1,2,3 \
+            python train.py
+
+    Note:
+        1. If using all GPUs, you can always skip the `CUDA_VISIBLE_DEVICES` option.
+        2. About performance, horovod is expected to be slightly
+           slower than native tensorflow on multi-GPU training, but faster in distributed training.
+        3. Due to the use of MPI, training is less informative (no progress bar).
+           It's recommended to use other multi-GPU trainers for single-node
+           experiments, and scale to multi nodes by horovod.
     """

     def __init__(self):
         hvd.init()
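
For context, the train.py referenced by the mpirun commands in the new docstring could look roughly like the sketch below; it is not part of this commit. MyModel and my_dataflow are hypothetical placeholders, and the script assumes the launch_train_with_config API from this same trainer module.

    # train.py: sketch only; MyModel and my_dataflow are hypothetical placeholders.
    from tensorpack import TrainConfig, launch_train_with_config
    from tensorpack.train import HorovodTrainer

    if __name__ == '__main__':
        config = TrainConfig(model=MyModel(), dataflow=my_dataflow, max_epoch=100)
        # HorovodTrainer calls hvd.init() in its constructor; each MPI process
        # then trains on one of the first k GPUs in CUDA_VISIBLE_DEVICES.
        launch_train_with_config(config, HorovodTrainer())

Such a script would be launched with the multi-GPU command from the docstring, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -np 4 --output-filename mylog python train.py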