Commit 88d949db
authored Dec 10, 2018 by Yuxin Wu

update docs (#1006)
parent f221d7f3

Showing 2 changed files with 18 additions and 8 deletions:
README.md  (+2 / -2)
tensorpack/dataflow/common.py  (+16 / -6)

README.md

@@ -8,7 +8,7 @@ Tensorpack is a neural network training interface based on TensorFlow.
 [ ](http://models.tensorpack.com)
 
 ## Features:
 
-It's Yet Another TF high-level API, with __speed__, __readability__ and __flexibility__ built together.
+It's Yet Another TF high-level API, with __speed__, and __flexibility__ built together.
 
 1. Focus on __training speed__.
    + Speed comes for free with Tensorpack -- it uses TensorFlow in the __efficient way__ with no extra overhead.
@@ -37,7 +37,7 @@ See [tutorials and documentations](http://tensorpack.readthedocs.io/tutorial/ind
 We refuse toy examples.
 Instead of showing you 10 arbitrary networks trained on toy datasets,
 [Tensorpack examples](examples) faithfully replicate papers and care about reproducing numbers,
-demonstrating its flexibility for actual research.
+demonstrating its __flexibility__ for actual research.
 
 ### Vision:
   + [Train ResNet](examples/ResNet) and [other models](examples/ImageNetModels) on ImageNet.

tensorpack/dataflow/common.py

@@ -577,12 +577,22 @@ def SelectComponent(ds, idxs):
 class LocallyShuffleData(ProxyDataFlow, RNGDataFlow):
-    """ Maintain a pool to buffer datapoints, and shuffle
-        before producing them.
+    """ Buffer the datapoints from a given dataflow, and shuffle them
+        before producing them.
         This can be used as an alternative when a complete random read is too expensive
         or impossible for the data source.
+
+        This dataflow has the following behavior:
+
+        1. It takes datapoints from the given dataflow `ds` to an internal buffer of fixed size.
+           Each datapoint is duplicated for `nr_reuse` times.
+        2. Once the buffer is full, this dataflow starts to yield data from the beginning of the buffer,
+           and new datapoints will be added to the end of the buffer. This is like a FIFO queue.
+        3. The internal buffer is shuffled after every `shuffle_interval` datapoints that come from `ds`.
 
         To maintain shuffling states, this dataflow is not reentrant.
-        The iterator will run indefinitely because after mixing the datapoints, it does not make sense to stop anywhere.
+        Datapoints from one pass of `ds` will get mixed with datapoints from a different pass.
+        As a result, the iterator of this dataflow will run indefinitely
+        because it does not make sense to stop the iteration anywhere.
     """
 
     def __init__(self, ds, buffer_size, nr_reuse=1, shuffle_interval=None):
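The numbered behavior added to the docstring describes a bounded FIFO pool with periodic shuffling. As a rough standalone sketch of that behavior (plain Python, not tensorpack code; the name `local_shuffle` and the use of `random.shuffle` are illustrative assumptions):

import random
from collections import deque

def local_shuffle(source, buffer_size, nr_reuse=1, shuffle_interval=None):
    # Fixed-size FIFO buffer: duplicate each incoming item nr_reuse times,
    # reshuffle the pool after every shuffle_interval incoming items, and once
    # the buffer is full, yield from its front while new items join at the back.
    if shuffle_interval is None:
        shuffle_interval = max(1, buffer_size // 3)
    buf = deque(maxlen=buffer_size)
    cnt = 0
    for item in source:
        cnt = (cnt + 1) % shuffle_interval
        if cnt == 0:
            pool = list(buf)
            random.shuffle(pool)                    # periodic shuffle of the whole pool
            buf = deque(pool, maxlen=buffer_size)
        for _ in range(nr_reuse):
            if len(buf) == buf.maxlen:
                yield buf.popleft()                 # buffer full: emit the oldest datapoint
            buf.append(item)                        # new (possibly duplicated) datapoint at the back

The real dataflow additionally repeats `ds` forever (the `self._inf_iter` seen in the last hunk below), which is why its iterator never stops on its own.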
@@ -591,11 +601,11 @@ class LocallyShuffleData(ProxyDataFlow, RNGDataFlow):
             ds (DataFlow): input DataFlow.
             buffer_size (int): size of the buffer.
             nr_reuse (int): duplicate each datapoints several times into the buffer to improve
-                speed, but may hurt your model.
+                speed, but duplication may hurt your model.
             shuffle_interval (int): shuffle the buffer after this many
                 datapoints were produced from the given dataflow. Frequent shuffle on large buffer
-                may affect speed, but infrequent shuffle may affect randomness.
+                may affect speed, but infrequent shuffle may not provide enough randomness.
                 Defaults to buffer_size / 3
         """
         ProxyDataFlow.__init__(self, ds)
         self.q = deque(maxlen=buffer_size)
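Going by the parameters documented above, this dataflow typically sits between a cheap sequential reader and batching. A usage sketch (the serialized-dataset path is a placeholder, and the choice of LMDBSerializer and BatchData around it is only one plausible pipeline):

from tensorpack.dataflow import LMDBSerializer, LocallyShuffleData, BatchData

# Sequential reads from a serialized dataset are cheap, but a full random
# shuffle over it is not, so approximate it with a large shuffle buffer.
df = LMDBSerializer.load('/path/to/train.lmdb', shuffle=False)    # placeholder path
df = LocallyShuffleData(df, buffer_size=50000)    # shuffle_interval defaults to buffer_size / 3
df = BatchData(df, batch_size=64)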
@@ -620,7 +630,7 @@ class LocallyShuffleData(ProxyDataFlow, RNGDataFlow):
         for dp in self._inf_iter:
             self._iter_cnt = (self._iter_cnt + 1) % self.shuffle_interval
             # fill queue
-            if self._iter_cnt % self.shuffle_interval == 0:
+            if self._iter_cnt == 0:
                 self.rng.shuffle(self.q)
             for _ in range(self.nr_reuse):
                 if self.q.maxlen == len(self.q):
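The one-line code change in this last hunk is behavior-preserving: `self._iter_cnt` is updated modulo `self.shuffle_interval` on the preceding line, so it always stays in the range [0, shuffle_interval) and the extra `% self.shuffle_interval` in the condition was redundant. A quick check of that equivalence (plain Python, with a made-up interval):

shuffle_interval = 5
cnt = 0
for _ in range(20):
    cnt = (cnt + 1) % shuffle_interval
    # cnt is always in [0, shuffle_interval), so the two tests agree
    assert (cnt % shuffle_interval == 0) == (cnt == 0)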