Shashank Suhas / seminar-breakout / Commits / 215a4d6d

Commit 215a4d6d, authored Mar 16, 2018 by Yuxin Wu
parent a6936913

update docs

Showing 2 changed files with 21 additions and 8 deletions (+21, -8):

- docs/tutorial/extend/dataflow.md (+20, -8)
- tensorpack/dataflow/raw.py (+1, -0)

docs/tutorial/extend/dataflow.md
@@ -9,6 +9,17 @@ which you can use if your data format is simple.
 In general, you probably need to write a source DataFlow to produce data for your task,
 and then compose it with existing modules (e.g. mapping, batching, prefetching, ...).
+The easiest way to create a DataFlow to load custom data, is to wrap a custom generator, e.g.:
+
+```python
+def my_data_loader():
+  while True:
+    # load data from somewhere
+    yield [my_array, my_label]
+
+dataflow = DataFromGenerator(my_data_loader)
+```
+
 To write more complicated DataFlow, you need to inherit the base `DataFlow` class.
 Usually, you just need to implement the `get_data()` method which yields a datapoint every time.
 ```python
 class MyDataFlow(DataFlow):
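The diff collapses the rest of the class body here. For orientation only, a minimal sketch of such a subclass following the description above; the fake digit/label data is invented for illustration and is not the file's actual contents:

```python
from tensorpack.dataflow import DataFlow

class MyDataFlow(DataFlow):
    """Sketch of a source DataFlow: get_data() yields one datapoint at a time."""
    def get_data(self):
        for k in range(100):
            # Fake data for illustration; a real DataFlow would load from disk etc.
            yield [k % 10, k % 2]

    def size(self):
        # Optional (see the next hunk): number of datapoints per epoch.
        return 100
```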
@@ -24,12 +35,12 @@ Optionally, you can implement the following two methods:
 + `size()`. Return the number of elements the generator can produce. Certain tensorpack features might use it.
 + `reset_state()`. It is guaranteed that the actual process which runs a DataFlow will invoke this method before using it.
   So if this DataFlow needs to do something after a `fork()`, you should put it here.
-  The convention is that, `reset_state()` must be called once and usually
+  The convention is that, `reset_state()` must be called once and
   only once for each DataFlow instance.
-  A typical situation
+  A typical example
   is when your DataFlow uses random number generator (RNG). Then you would need to reset the RNG here.
   Otherwise, child processes will have the same random seed. The `RNGDataFlow` base class does this for you.
   You can subclass `RNGDataFlow` to access `self.rng` whose seed has been taken care of.

 DataFlow implementations for several well-known datasets are provided in the
 [dataflow.dataset](../../modules/dataflow.dataset.html)
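Since the paragraph above is terse about what `RNGDataFlow` buys you, here is a sketch, not part of the commit and with invented crop logic, of a subclass that uses `self.rng` safely across forked processes:

```python
import numpy as np
from tensorpack.dataflow import RNGDataFlow

class RandomCropFlow(RNGDataFlow):
    """Yields a random 32x32 crop per image. RNGDataFlow.reset_state()
    re-seeds self.rng in each process, so forked workers diverge."""
    def __init__(self, images):
        self.images = images  # list of HxWxC numpy arrays (illustrative)

    def size(self):
        return len(self.images)

    def get_data(self):
        for img in self.images:
            # self.rng is a per-process numpy RandomState
            y = self.rng.randint(img.shape[0] - 32)
            x = self.rng.randint(img.shape[1] - 32)
            yield [img[y:y + 32, x:x + 32]]
```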
@@ -37,15 +48,16 @@ module, you can take them as a reference.
 #### More Data Processing
-You can put any data processing you need in the source DataFlow, or write a new DataFlow for data
+You can put any data processing you need in the source DataFlow you write, or you can
+write a new DataFlow for data
 processing on top of the source DataFlow, e.g.:
 ```python
 class ProcessingDataFlow(DataFlow):
   def __init__(self, ds):
     self.ds = ds

   def get_data(self):
     for datapoint in self.ds.get_data():
       # do something
       yield new_datapoint
 ```
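As a usage note, a processing DataFlow like this composes with the rest of the library. The chaining below is a sketch: `my_data_loader` comes from the earlier hunk, and `BatchData` is tensorpack's built-in batcher:

```python
from tensorpack.dataflow import BatchData, DataFromGenerator

df = DataFromGenerator(my_data_loader)  # source, from the example above
df = ProcessingDataFlow(df)             # custom per-datapoint processing
df = BatchData(df, 64)                  # built-in batching on top

df.reset_state()                        # must be called once before use
for datapoint in df.get_data():
    pass                                # each datapoint is now a batch of 64
```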
tensorpack/dataflow/raw.py

@@ -104,6 +104,7 @@ class DataFromGenerator(DataFlow):
"""
"""
Args:
Args:
gen: iterable, or a callable that returns an iterable
gen: iterable, or a callable that returns an iterable
size: deprecated
"""
"""
if
not
callable
(
gen
):
if
not
callable
(
gen
):
self
.
_gen
=
lambda
:
gen
self
.
_gen
=
lambda
:
gen
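To make the one-line docstring change above concrete: `gen` may be a callable (re-invoked to get a fresh iterable) or a plain iterable (wrapped as `lambda: gen`, as the code shows). A small sketch, assuming the documented `DataFromGenerator` behavior:

```python
from tensorpack.dataflow import DataFromGenerator

def make_gen():
    for i in range(5):
        yield [i]

df = DataFromGenerator(make_gen)     # callable: a fresh generator per epoch
df2 = DataFromGenerator(make_gen())  # iterable: wrapped as `lambda: gen`, so a
                                     # one-shot generator lasts one epoch only

df.reset_state()
for dp in df.get_data():
    print(dp)                        # [0], [1], ..., [4]
```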