1.`LMDBDataPoint` deserialize the datapoints (from string to [jpeg_string, label])
2. Use OpenCV to decode the first component into ndarray
3. Apply augmentations to the ndarray
...
...
@@ -188,6 +190,7 @@ launch the underlying DataFlow in one independent process, and only parallelize
(`PrefetchDataZMQ` is faster but not fork-safe, so the first prefetch has to be `PrefetchData`. This is [issue#138](https://github.com/ppwwyyxx/tensorpack/issues/138))
Let me summarize what the above DataFlow does:
1. One process reads LMDB file, shuffle them in a buffer and put them into a `multiprocessing.Queue` (used by `PrefetchData`).
2. 25 processes take items from the queue, decode and process them into [image, label] pairs, and