...then apply complicated preprocessing to it.
We aim to reach a speed of roughly **1k~3k images per second**, to keep GPUs busy.
Some things to know before reading:
1. For smaller datasets (e.g. several GBs of images with lightweight preprocessing), a simple reader plus a multiprocess runner should usually work well enough.
Therefore you don't have to understand this tutorial in depth unless you really find your data to be the bottleneck.
This tutorial could be a bit complicated for people new to system architectures, but you do need these techniques to run fast enough on an ImageNet-scale dataset.
2. Having a fast Python generator **alone** may or may not improve your overall training speed.
...

On a good filesystem you probably can already observe good speed here (e.g. 5 it/s), but on slower storage it will be much slower,
because we are doing heavy random read on the filesystem (regardless of whether `shuffle` is True).
Image decoding in `cv2.imread` could also be a bottleneck at this early stage.
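The baseline being measured at this stage is elided above; as a rough sketch only (not the tutorial's exact code), a plain reader plus a speed test could look like the following, assuming tensorpack's `dataset.ILSVRC12`, `BatchData` and `TestDataSpeed`, with a placeholder dataset path:

```python
from tensorpack.dataflow import dataset, BatchData, TestDataSpeed

# Plain reader: each datapoint is [image (decoded by cv2 inside the reader), label].
# Disk random reads and JPEG decoding are the dominant costs at this stage.
ds = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
ds = BatchData(ds, 256, use_list=True)  # keep components as lists; images still have different shapes
TestDataSpeed(ds).start()               # tqdm shows it/s; multiply by the batch size to get images/s
```

At a batch size of 256, the **1k~3k images per second** target corresponds to roughly 4~12 it/s.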

### Parallel Runner
We will now add the cheapest pre-processing, to get an ndarray in the end instead of a list
(because training will need an ndarray eventually):
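The corresponding snippet is elided here; a minimal sketch of this step, assuming tensorpack's `AugmentImageComponent` and an illustrative augmentor list (the specific augmentors below are stand-ins, not a recommendation), might look like this:

```python
from tensorpack.dataflow import dataset, imgaug, AugmentImageComponent, BatchData

ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
# Example augmentors only: crop/resize every image to a fixed 224x224,
# so that BatchData can stack each batch into a single ndarray.
lots_of_augmentors = [imgaug.GoogleNetRandomCropAndResize(), imgaug.Flip(horiz=True)]
ds1 = AugmentImageComponent(ds0, lots_of_augmentors)  # augment component 0 (the image)
ds = BatchData(ds1, 256)  # each datapoint is now an ndarray batch, e.g. shape (256, 224, 224, 3)
```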
Now it's time to add threads or processes:
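The snippet for this step is also elided; as a stand-in consistent with the description below, assuming `MultiProcessRunnerZMQ` (the renamed `PrefetchDataZMQ`; argument names may differ slightly across tensorpack versions), it could look like this:

```python
from tensorpack.dataflow import (dataset, imgaug, AugmentImageComponent,
                                 BatchData, MultiProcessRunnerZMQ)

ds0 = dataset.ILSVRC12('/path/to/ILSVRC12', 'train', shuffle=True)
lots_of_augmentors = [imgaug.GoogleNetRandomCropAndResize(), imgaug.Flip(horiz=True)]  # as in the sketch above
ds1 = AugmentImageComponent(ds0, lots_of_augmentors)
# Fork 25 worker processes, each running its own copy of ds1 (read + augment),
# and collect their outputs in the main process over ZMQ IPC.
ds = MultiProcessRunnerZMQ(ds1, num_proc=25)
ds = BatchData(ds, 256)
```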
Here we fork 25 processes to run `ds1`, and collect their output through the ZMQ IPC protocol,
which is faster than `multiprocessing.Queue`. You can also apply the parallel runner after batching, of course.

### Parallel Map
The above DataFlow might be fast, but since it forks the ImageNet reader (`ds0`),
it's **not a good idea to use it for validation** (for the reasons mentioned at the top; more details in the [documentation](../modules/dataflow.html#tensorpack.dataflow.MultiProcessRunnerZMQ)).
Alternatively, you can use multi-threaded preprocessing like this:
```eval_rst
...