Shashank Suhas / seminar-breakout / Commits

Commit 215a4d6d, authored Mar 16, 2018 by Yuxin Wu
Parent: a6936913

update docs

Showing 2 changed files, with 21 additions and 8 deletions:

- docs/tutorial/extend/dataflow.md (+20 −8)
- tensorpack/dataflow/raw.py (+1 −0)

docs/tutorial/extend/dataflow.md
@@ -9,6 +9,17 @@ which you can use if your data format is simple.

In general, you probably need to write a source DataFlow to produce data for your task,
and then compose it with existing modules (e.g. mapping, batching, prefetching, ...).
The easiest way to create a DataFlow to load custom data, is to wrap a custom generator, e.g.:

```python
def my_data_loader():
    while True:
        # load data from somewhere
        yield [my_array, my_label]

dataflow = DataFromGenerator(my_data_loader)
```

To write more complicated DataFlow, you need to inherit the base `DataFlow` class.
Usually, you just need to implement the `get_data()` method which yields a datapoint every time.

```python
class MyDataFlow(DataFlow):
```
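To see the generator-wrapping pattern above end to end, here is a minimal standalone sketch. The `DataFromGenerator` class below is a simplified stand-in, not tensorpack's actual implementation, and the loader yields a finite number of synthetic datapoints instead of looping forever so the example terminates:

```python
class DataFromGenerator:
    """Simplified stand-in for tensorpack's DataFromGenerator:
    wraps a generator function (or a plain iterable) as a dataflow."""
    def __init__(self, gen):
        # accept either a callable returning an iterable, or an iterable
        self._gen = gen if callable(gen) else (lambda: gen)

    def get_data(self):
        # yield datapoints produced by the wrapped generator
        for dp in self._gen():
            yield dp

def my_data_loader():
    # load data from somewhere; here, finite synthetic [array, label] pairs
    for i in range(3):
        yield [i, i % 2]

dataflow = DataFromGenerator(my_data_loader)
print(list(dataflow.get_data()))  # → [[0, 0], [1, 1], [2, 0]]
```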
@@ -25,9 +36,9 @@ Optionally, you can implement the following two methods:

```diff
 * `reset_state()`. It is guaranteed that the actual process which runs a DataFlow will invoke this method before using it.
   So if this DataFlow needs to do something after a `fork()`, you should put it here.
-  The convention is that, `reset_state()` must be called once and usually only once for each DataFlow instance.
+  `reset_state()` must be called once and only once for each DataFlow instance.
-  A typical situation is when your DataFlow uses random number generator (RNG). Then you would need to reset the RNG here.
+  A typical example is when your DataFlow uses random number generator (RNG). Then you would need to reset the RNG here.
   Otherwise, child processes will have the same random seed. The `RNGDataFlow` base class does this for you.
+  You can subclass `RNGDataFlow` to access `self.rng` whose seed has been taken care of.
```
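The `reset_state()` convention described in the hunk above can be sketched in plain Python. The base class below is a hypothetical illustration of the `RNGDataFlow` idea (per-instance `self.rng`, reseeded per process), not tensorpack's actual code:

```python
import os
import random

class RNGDataFlowSketch:
    """Hypothetical sketch of the RNGDataFlow idea: each instance owns
    self.rng, and reset_state() reseeds it so forked child processes
    do not all inherit the same random seed."""
    def __init__(self):
        self.rng = random.Random()

    def reset_state(self):
        # mix the pid with fresh entropy so each process diverges after fork()
        self.rng = random.Random(os.getpid() ^ int.from_bytes(os.urandom(4), "little"))

class RandomNoise(RNGDataFlowSketch):
    def get_data(self):
        while True:
            yield [self.rng.random(), self.rng.random()]

df = RandomNoise()
df.reset_state()          # called once, before the dataflow is used
dp = next(df.get_data())  # a datapoint of two random floats
```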
@@ -37,13 +48,14 @@ module, you can take them as a reference.

```diff
 #### More Data Processing
-You can put any data processing you need in the source DataFlow, or write a new DataFlow for data
+You can put any data processing you need in the source DataFlow you write, or you can write a new DataFlow for data
 processing on top of the source DataFlow, e.g.:
```

```python
class ProcessingDataFlow(DataFlow):
    def __init__(self, ds):
        self.ds = ds

    def get_data(self):
        for datapoint in self.ds.get_data():
            # do something
```
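The `ProcessingDataFlow` snippet in the hunk above is a fragment; a runnable standalone sketch of the same wrapping pattern might look like this (no tensorpack import; the source class and the squaring transform are illustrative):

```python
class SourceDataFlow:
    """Toy source dataflow yielding [value] datapoints."""
    def get_data(self):
        for i in range(4):
            yield [i]

class ProcessingDataFlow:
    """Wraps another dataflow and transforms each datapoint it yields."""
    def __init__(self, ds):
        self.ds = ds

    def get_data(self):
        for datapoint in self.ds.get_data():
            # do something: here, square the value as a stand-in transform
            yield [datapoint[0] ** 2]

df = ProcessingDataFlow(SourceDataFlow())
print(list(df.get_data()))  # → [[0], [1], [4], [9]]
```

Because the wrapper only calls `self.ds.get_data()`, such processing stages compose freely: the `ds` argument can itself be another `ProcessingDataFlow`.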
tensorpack/dataflow/raw.py

@@ -104,6 +104,7 @@ class DataFromGenerator(DataFlow):

```diff
         """
         Args:
             gen: iterable, or a callable that returns an iterable
+            size: deprecated
         """
         if not callable(gen):
             self._gen = lambda: gen
```
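The `if not callable(gen)` branch shown in the raw.py hunk normalizes both accepted input forms to a callable. The helper below is a hypothetical standalone sketch of just that logic, with illustrative names:

```python
def normalize_gen(gen):
    """Mimic the raw.py logic: accept either an iterable or a callable
    returning an iterable, and always return a callable."""
    if not callable(gen):
        return lambda: gen
    return gen

# both forms now support the same call style
from_list = normalize_gen([[1], [2]])
from_func = normalize_gen(lambda: iter([[1], [2]]))
print(list(from_list()), list(from_func()))  # → [[1], [2]] [[1], [2]]
```

One design consequence worth noting: a plain generator object wrapped this way is exhausted after a single pass, whereas a callable can build a fresh iterable on every call.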