Commit ccef4d4f
authored Nov 01, 2017 by Yuxin Wu
update docs

parent 796a4353
Showing 2 changed files with 16 additions and 6 deletions:

  docs/tutorial/dataflow.md      +4 -3
  docs/tutorial/input-source.md  +12 -3
docs/tutorial/dataflow.md

@@ -18,7 +18,7 @@ One good thing about having a standard interface is to be able to provide
 the greatest code reusability.
 There are a lot of existing DataFlow utilities in tensorpack, which you can use to compose
 complex DataFlow with a long data pipeline. A common pipeline usually
-would __read from disk (or other sources), apply augmentations, group into batches,
+would __read from disk (or other sources), apply transformations, group into batches,
 prefetch data__, etc. A simple example is as the following:
 ````python
...
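The example inside the fence is truncated in this capture. For orientation, a minimal sketch of such a pipeline, assuming standard tensorpack DataFlow utilities (`dataset.Cifar10`, `AugmentImageComponent`, `BatchData`, `PrefetchDataZMQ`), might look like:

````python
from tensorpack.dataflow import (dataset, imgaug, AugmentImageComponent,
                                 BatchData, PrefetchDataZMQ)

# read (image, label) datapoints from a standard dataset
df = dataset.Cifar10('train')
# apply a transformation to the image component of each datapoint
df = AugmentImageComponent(df, [imgaug.Flip(horiz=True)])
# group datapoints into batches of 128
df = BatchData(df, 128)
# run the pipeline in 3 parallel processes, prefetching results over ZeroMQ
df = PrefetchDataZMQ(df, 3)
````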
@@ -35,16 +35,17 @@ You can find more complicated DataFlow in the [ResNet training script](../exampl
 with all the data preprocessing.
 Unless you are working with standard data types (image folders, LMDB, etc),
-you would usually want to write the base DataFlow (`MyDataFlow` in the above example) for your data format.
+you would usually want to write the source DataFlow (`MyDataFlow` in the above example) for your data format.
 See [another tutorial](extend/dataflow.html)
 for simple instructions on writing a DataFlow.
-Once you have the base reader, all the [existing DataFlows](../modules/dataflow.html) are ready for you to complete
+Once you have the source reader, all the [existing DataFlows](../modules/dataflow.html) are ready for you to complete
 the rest of the data pipeline.

 ### Why DataFlow
 1. It's easy: write everything in pure Python, and reuse existing utilities.
    On the contrary, writing data loaders in TF operators is usually painful, and performance is hard to tune.
+   See more discussions in [Python Reader or TF Reader](input-source.html#python-reader-or-tf-reader).
 2. It's fast: see [Efficient DataFlow](efficient-dataflow.html)
    on how to build a fast DataFlow with parallelism.

 If you're using DataFlow with tensorpack, also see [Input Pipeline tutorial](input-source.html)
...
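The hunk above renames the "base DataFlow" to the "source DataFlow" (`MyDataFlow`). For context, a minimal sketch of such a source DataFlow, assuming the 2017-era tensorpack interface (a `get_data` generator plus `size`; `load_my_format` is a hypothetical helper, not part of tensorpack):

````python
from tensorpack.dataflow import DataFlow

class MyDataFlow(DataFlow):
    """A source DataFlow yielding [image, label] datapoints
    read from a custom on-disk format."""
    def __init__(self, filenames):
        self._filenames = filenames

    def get_data(self):
        # a datapoint is a list of components, yielded one at a time
        for fname in self._filenames:
            image, label = load_my_format(fname)  # hypothetical loader
            yield [image, label]

    def size(self):
        # number of datapoints per epoch
        return len(self._filenames)
````

Existing DataFlows such as `BatchData` or `PrefetchDataZMQ` can then wrap `MyDataFlow(...)` to complete the rest of the pipeline.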
docs/tutorial/input-source.md

@@ -60,9 +60,18 @@ The benefits of using TensorFlow ops are:
 and TF `StagingArea` can help hide the "Copy to GPU" latency.
 They are used by most examples in tensorpack.
-The benefits of using Python reader is obvious:
-it's much much easier to write Python to read different data format,
-handle corner cases in noisy data, preprocess, etc.
+The benefits of using Python reader is obvious: it's __much much easier__.
+Reading data is a much more complicated and much less structured job than training a model.
+You need to handle different data format, handle corner cases in noisy data,
+which all require logical operations, condition operations, loops, etc. These operations
+are __naturally not suitable__ for a graph computation framework.
+It only makes sense to use TF to read data, if your data is originally very clean and well-formated.
+You may want to write a script to clean your data, then you're almost writing a Python loader already!
+Think about it: it's a waste of time to write a Python script to transform from raw data to TFRecords,
+then a TF script to transform from TFRecords to tensors.
+The intermediate step (TFRecords) doesn't have to exist.

 ## InputSource
...
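As an illustration of the kind of logic the added paragraphs have in mind, here is a small sketch of a pure-Python reader with corner-case handling (`decode_image` and the filtering thresholds are assumptions for illustration, not from the commit):

````python
import os

def read_datapoints(filenames):
    """Plain-Python reading: branches, loops, and error handling
    that are awkward to express as graph ops."""
    for fname in filenames:
        if os.path.getsize(fname) == 0:
            continue  # skip empty or truncated files in noisy data
        try:
            image, label = decode_image(fname)  # hypothetical decoder
        except ValueError:
            continue  # drop corrupt records instead of crashing
        if image.shape[0] < 32 or image.shape[1] < 32:
            continue  # discard images too small to be useful
        yield [image, label]
````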