Shashank Suhas / seminar-breakout · Commits · 61a5960c

Commit 61a5960c, authored Mar 29, 2017 by Yuxin Wu

    docs about distributed data (fix #202)

Parent: cddb713f
Changes: 1 changed file, with 23 additions and 3 deletions (+23 −3)

docs/tutorial/efficient-dataflow.md
...
...
@@ -198,11 +198,31 @@ The above DataFlow can run at a speed of 5~10 batches per second, if you have go
As a reference, tensorpack can train ResNet-18 (a shallow ResNet) at 4.5 batches (of 256 samples) per second on 4 old TitanX.
So DataFlow won't be a serious bottleneck if configured properly.
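To see whether your own DataFlow keeps up, you can simply time it. A minimal, generic sketch (not tensorpack code; `measure_batches_per_sec` is a hypothetical helper — tensorpack's `TestDataSpeed` does this more thoroughly) that measures the throughput of any iterable of batches:

```python
import time

def measure_batches_per_sec(dataflow, warmup=50, count=500):
    """Return the rate (batches/sec) of any iterable of batches."""
    it = iter(dataflow)
    for _ in range(warmup):      # let caches and prefetch pipelines fill up
        next(it)
    start = time.time()
    for _ in range(count):
        next(it)
    return count / (time.time() - start)

# Example with a trivial in-memory stand-in for a real DataFlow:
fake_df = iter(range(10**6))
rate = measure_batches_per_sec(fake_df)
```

Compare the measured rate against the batches/sec your training step consumes; the DataFlow only needs to be faster than the trainer.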
## More Efficient DataFlow
To work with larger datasets (or smaller networks, or more GPUs), you could be seriously bounded by the CPU or disk speed of a single machine. In that case it's best to run the DataFlow in a distributed fashion and collect the results on the training machine. E.g.:
```python
# Data Machine #1, process 1-20:
df = MyLargeData()
send_dataflow_zmq(df, 'tcp://1.2.3.4:8877')
```
```python
# Data Machine #2, process 1-20:
df = MyLargeData()
send_dataflow_zmq(df, 'tcp://1.2.3.4:8877')
```
```python
# Training Machine, process 1-10:
df = MyLargeData()
send_dataflow_zmq(df, 'ipc:///tmp/ipc-socket')
```
```python
# Training Machine, training process
df = RemoteDataZMQ('ipc:///tmp/ipc-socket', 'tcp://0.0.0.0:8877')
TestDataSpeed(df).start_test()
```
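The pattern above is a fan-in: many sender processes push datapoints toward one address, and a single receiver pulls from all of them and merges the streams. A rough stdlib-only sketch of the same fan-in idea, using threads and a shared queue instead of ZMQ sockets (the names `sender` and `receive` are illustrative stand-ins, not tensorpack's implementation):

```python
import queue
import threading

def sender(q, worker_id, n):
    # Stand-in for send_dataflow_zmq: push datapoints produced by one worker.
    for i in range(n):
        q.put((worker_id, i))

def receive(q, total):
    # Stand-in for RemoteDataZMQ: yield datapoints from whichever sender is ready.
    for _ in range(total):
        yield q.get()

q = queue.Queue()
workers = [threading.Thread(target=sender, args=(q, w, 100)) for w in range(4)]
for t in workers:
    t.start()
received = list(receive(q, 400))   # 4 senders x 100 datapoints each
for t in workers:
    t.join()
```

With real ZMQ sockets the senders can live on other machines, but the receiver-side logic — pull from one endpoint, don't care which sender produced the datapoint — stays the same.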
[1]: #ref
...
...