Shashank Suhas / seminar-breakout / Commits

Commit 5a868442, authored Feb 16, 2019 by Yuxin Wu

some docs improvement about multi-gpu (fix #1084)

parent 24c1ec26
Showing 2 changed files with 5 additions and 11 deletions:

docs/tutorial/trainer.md      +4 -2
tensorpack/train/config.py    +1 -9
docs/tutorial/trainer.md

@@ -83,7 +83,9 @@ Note some __common problems__ when using these trainers:

 1. In each iteration, instead of taking one input tensor for all GPUs and split,
    all GPUs take tensors from the `InputSource`.
-   So the total batch size across all GPUs would become ``(batch size of InputSource) * #GPU``.
+   So the total batch size across all GPUs is ``(batch size of InputSource) * #GPU``.
+   You may want to change `steps_per_epoch` or learning rate appropriately according
+   to the total batch size.

    ```eval_rst
    .. note::
 ...

@@ -96,7 +98,7 @@ Note some __common problems__ when using these trainers:
    ```
 2. The tower function (your model code) will get called once on each GPU.
-   You must follow the abovementieond rules of tower function.
+   You must follow the abovementioned rules of tower function.

 ### Distributed Trainers
 ...
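The added note about adjusting `steps_per_epoch` and the learning rate can be made concrete with a bit of arithmetic. A minimal sketch, assuming a per-GPU batch size set by the `InputSource` and the common linear learning-rate scaling heuristic; the constants (`NUM_GPUS`, `BATCH`, `DATASET_SIZE`, `BASE_LR`) are illustrative and not part of the commit:

```python
# Illustrative numbers; only the arithmetic mirrors the tutorial text.
NUM_GPUS = 4
BATCH = 32            # batch size produced by the InputSource, i.e. per GPU
DATASET_SIZE = 50000  # number of training samples
BASE_LR = 0.1         # learning rate tuned for single-GPU training

# Total batch size across all GPUs, as the tutorial states:
total_batch = BATCH * NUM_GPUS                 # 32 * 4 = 128

# Each step now consumes NUM_GPUS input batches, so covering the
# dataset once takes proportionally fewer steps:
steps_per_epoch = DATASET_SIZE // total_batch  # 50000 // 128 = 390

# A common heuristic (not mandated by tensorpack): scale the
# learning rate linearly with the number of GPUs.
learning_rate = BASE_LR * NUM_GPUS             # 0.1 * 4 = 0.4

print(total_batch, steps_per_epoch, learning_rate)
```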
tensorpack/train/config.py

@@ -93,7 +93,7 @@ class TrainConfig(object):
         starting_epoch (int): The index of the first epoch.
         steps_per_epoch (int): the number of steps (defined by :meth:`Trainer.run_step`) to run in each epoch.
-            Defaults to the input data size.
+            Defaults to the input data size. You may want to divide it by the #GPUs in multi-GPU training.
         max_epoch (int): maximum number of epoch to run training.
     """
 ...

@@ -156,14 +156,6 @@ class TrainConfig(object):
         self.starting_epoch = int(starting_epoch)
         self.max_epoch = int(max_epoch)

-        if 'nr_tower' in kwargs:
-            self.nr_tower = kwargs.pop('nr_tower')
-        if 'tower' in kwargs:
-            self.tower = kwargs.pop('tower')
-        else:
-            self.tower = [0]
-
         assert len(kwargs) == 0, "Unknown arguments: {}".format(kwargs.keys())


 class AutoResumeTrainConfig(TrainConfig):
     """
 ...
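For context, after this change the set of GPUs is chosen by the trainer rather than by `nr_tower`/`tower` keyword arguments on `TrainConfig`. A minimal launch sketch, assuming tensorpack's documented `launch_train_with_config` and `SyncMultiGPUTrainerReplicated` APIs; `MyModel` and `my_dataflow` are hypothetical placeholders for a `ModelDesc` subclass and a `DataFlow`:

```python
from tensorpack import (TrainConfig, SyncMultiGPUTrainerReplicated,
                        launch_train_with_config)

NUM_GPUS = 2

config = TrainConfig(
    model=MyModel(),        # hypothetical ModelDesc subclass
    dataflow=my_dataflow,   # hypothetical DataFlow yielding one batch per GPU per step
    # Per the updated docstring: steps_per_epoch defaults to the input data
    # size, so divide by #GPUs if an "epoch" should still cover the data once.
    steps_per_epoch=50000 // NUM_GPUS,  # 50000 is an illustrative data size
    max_epoch=100,
)

# GPUs are specified on the trainer; TrainConfig no longer accepts
# 'nr_tower'/'tower' kwargs after this commit.
launch_train_with_config(config, SyncMultiGPUTrainerReplicated(NUM_GPUS))
```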