@@ -63,12 +63,12 @@ Both are supported in tensorpack, while we recommend using Python.
### TensorFlow Reader: Cons
The disadvantage of the TF reader is obvious, and it's huge: it's __too complicated__.
Unlike running a mathematical model, reading data is a complicated and badly-structured task.
You need to handle different data formats and corner cases in noisy data,
all of which require conditional operations, loops, and sometimes even exception handling. These operations
are __naturally not suitable__ for a symbolic graph.
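To make this concrete, here is a minimal plain-Python sketch of the kind of control flow a loader needs for noisy input. It assumes a hypothetical JSON-lines format with `features` and `label` fields; the format and the `parse_records` name are illustrative, not part of tensorpack or TensorFlow.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[List[float], int]]:
    """Yield (features, label) pairs from raw JSON lines (hypothetical format),
    silently skipping records that are blank, malformed, or incomplete."""
    for line in lines:
        line = line.strip()
        if not line:  # corner case: blank line
            continue
        try:
            rec = json.loads(line)
            feats, label = rec["features"], int(rec["label"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # corner case: malformed JSON, missing key, or bad label
        if isinstance(feats, (int, float)):  # format variant: scalar instead of list
            feats = [feats]
        elif not isinstance(feats, list):  # None or any other unexpected type
            continue
        yield feats, label


raw = [
    '{"features": [1, 2], "label": 0}',
    '',
    'not json',
    '{"features": null, "label": 1}',
    '{"features": 3.5, "label": "1"}',
]
print(list(parse_records(raw)))
```

Every branch here (`continue`, `try`/`except`, `isinstance` checks) is ordinary Python, whereas expressing the same logic as graph ops would need `tf.cond`, `tf.while_loop` and custom error handling.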
Let's take a look at what users are asking for in `tf.data`:
* Different ways to [pad data](https://github.com/tensorflow/tensorflow/issues/13969), [shuffle data](https://github.com/tensorflow/tensorflow/issues/14518)
* [Handle None values in data](https://github.com/tensorflow/tensorflow/issues/13865)
* [Handle a dataset whose size is not a multiple of the batch size](https://github.com/tensorflow/tensorflow/issues/13745)
...
...
@@ -76,16 +76,16 @@ Let's take a look at what users are asking for:
* [Sort/skip some data](https://github.com/tensorflow/tensorflow/issues/14250)
* [Write data to files](https://github.com/tensorflow/tensorflow/issues/15014)
To support all these features, which could have been done with __3 lines of code in Python__, you need either a new TF
API, or [Dataset.from_generator](https://www.tensorflow.org/versions/r1.4/api_docs/python/tf/contrib/data/Dataset#from_generator)
(i.e. Python again) to come to the rescue.
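As an illustration of how little code these features take in Python, here is a minimal sketch that drops `None` samples, optionally shuffles, batches, handles a final partial batch, and pads variable-length samples. The `batches` function and its parameters are hypothetical, and it loads everything into memory for simplicity; it is not tensorpack's API.

```python
import random
from typing import Iterable, Iterator, List, Optional


def batches(samples: Iterable[Optional[List[int]]], batch_size: int,
            pad_value: int = 0, drop_remainder: bool = False,
            shuffle_seed: Optional[int] = None) -> Iterator[List[List[int]]]:
    """Hypothetical batcher: filter, shuffle, batch and pad in plain Python."""
    data = [s for s in samples if s is not None]  # handle None values (skip them)
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(data)  # any custom shuffling logic
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        if len(batch) < batch_size and drop_remainder:
            break  # dataset size is not a multiple of batch_size
        max_len = max(len(s) for s in batch)
        # pad every sample in this batch to the length of the longest one
        yield [s + [pad_value] * (max_len - len(s)) for s in batch]


print(list(batches([[1], None, [2, 3], [4, 5, 6]], batch_size=2)))
```

Changing the policy (e.g. keep vs. drop the last partial batch, or sort by length before batching) is a one-line edit here rather than a new graph API.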
It only makes sense to use TF to read data if your data is originally very clean and well-formatted.
If not, you may feel like writing a script to format your data, but then you're almost writing a Python loader already!
Think about it: it's a waste of time to write a Python script to transform raw data into a TF-friendly format,
then a TF script to transform that format into tensors.
The intermediate format doesn't have to exist.
You just need the right interface to connect Python to the graph directly and efficiently.
`tensorpack.InputSource` is such an interface.
...
...
@@ -95,13 +95,16 @@ You just need the right interface to connect Python to the graph directly, effic