1. `LMDBDataPoint` deserializes the datapoints (from string to [jpeg_string, label])
2. Use OpenCV to decode the first component into ndarray
3. Apply augmentations to the ndarray
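
The three steps above can be sketched for a single datapoint as follows. This is only an illustration and not tensorpack's implementation: `pickle` stands in for whatever serializer the LMDB file was written with, and `deserialize`, `decode_jpeg`, `horizontal_flip`, and `process` are hypothetical helper names. Only `cv2.imdecode` and `np.frombuffer` are real library calls.

```python
import pickle

import numpy as np


def deserialize(raw):
    """Step 1: serialized bytes -> (jpeg_bytes, label).

    Assumption: pickle is a stand-in for the real serializer.
    """
    jpeg_bytes, label = pickle.loads(raw)
    return jpeg_bytes, label


def decode_jpeg(jpeg_bytes):
    """Step 2: JPEG bytes -> HxWx3 uint8 ndarray (BGR order) via OpenCV."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    buf = np.frombuffer(jpeg_bytes, dtype=np.uint8)
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError("could not decode image")
    return img


def horizontal_flip(img):
    """Step 3: a trivial augmentation, standing in for real augmentors."""
    return img[:, ::-1]


def process(raw, decode=decode_jpeg, augment=horizontal_flip):
    """Run all three steps on one datapoint and return [image, label]."""
    jpeg_bytes, label = deserialize(raw)
    img = decode(jpeg_bytes)
    return [augment(img), label]
```

Steps 2 and 3 are pure functions of a single datapoint, so they can run in separate worker processes without sharing any state.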
...
@@ -188,6 +190,7 @@ launch the underlying DataFlow in one independent process, and only parallelize
(`PrefetchDataZMQ` is faster but not fork-safe, so the first prefetch has to be `PrefetchData`. This is [issue#138](https://github.com/ppwwyyxx/tensorpack/issues/138))
Let me summarize what the above DataFlow does:
1. One process reads the LMDB file, shuffles the datapoints in a buffer, and puts them into a `multiprocessing.Queue` (used by `PrefetchData`).
2. 25 processes take items from the queue, decode and process them into [image, label] pairs, and