Commit 560bc84e authored by Yuxin Wu

update docs

parent ae818ecd
@@ -155,6 +155,8 @@ The above script builds a DataFlow which produces jpeg-encoded ImageNet data.
 We store the jpeg string as a numpy array because the function `cv2.imdecode` later expect this format.
 Please note we can only use 1 prefetch process to speed up. If `nr_proc>1`, `ds1` will take data
 from several forks of `ds0`, then neither the content nor the order of `ds1` will be the same as `ds0`.
+See [documentation](http://localhost:8000/modules/dataflow.html#tensorpack.dataflow.PrefetchDataZMQ)
+about caveats of `PrefetchDataZMQ`.
 It will generate a database file of 140G. We build a DataFlow to read this LMDB file sequentially:
 ```
......
@@ -151,10 +151,11 @@ class Monitors(Callback):
     def put_image(self, name, val):
         """
         Put an image.
         Args:
             name (str):
             val (np.ndarray): 2D, 3D (HWC) or 4D (NHWC) numpy array of images
                 in range [0,255]. If channel is 3, assumed to be RGB.
         """
         assert isinstance(val, np.ndarray)
         arr = image_to_nhwc(val)
......
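The `put_image` docstring in the hunk above accepts 2D, 3D (HWC), or 4D (NHWC) arrays. Below is a minimal sketch of how such inputs could be normalized to NHWC, assuming NumPy is available. `to_nhwc` is a hypothetical helper written for illustration only; it is not the `image_to_nhwc` called in the diff, whose implementation is not shown here.

```python
import numpy as np


def to_nhwc(arr):
    """Hypothetical helper: expand a 2D, 3D (HWC) or 4D (NHWC) array to 4D NHWC."""
    if arr.ndim == 2:
        # H x W -> 1 x H x W x 1 (single grayscale image)
        return arr[np.newaxis, :, :, np.newaxis]
    if arr.ndim == 3:
        # H x W x C -> 1 x H x W x C (single image with channels)
        return arr[np.newaxis, ...]
    if arr.ndim == 4:
        # Already N x H x W x C
        return arr
    raise ValueError("expected a 2D, 3D or 4D array, got %dD" % arr.ndim)


gray = to_nhwc(np.zeros((32, 48), dtype=np.uint8))
rgb = to_nhwc(np.zeros((32, 48, 3), dtype=np.uint8))
batch = to_nhwc(np.zeros((5, 32, 48, 3), dtype=np.uint8))
```

In this sketch a 3D input is always read as a single HWC image, matching the docstring's wording; a real implementation may need extra rules for ambiguous shapes.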
@@ -126,10 +126,15 @@ class PrefetchDataZMQ(ProxyDataFlow):
     collect datapoints from `ds` in each process by ZeroMQ IPC pipe.
     Note:
-        1. The underlying dataflow worker will be forked multiple times When ``nr_proc>1``.
-           As a result, unless the underlying dataflow is fully shuffled, the data distribution
-           produced by this dataflow will be different.
-           (e.g. you are likely to see duplicated datapoints at the beginning)
+        1. An iterator cannot run faster automatically -- the underlying dataflow worker
+           will be forked ``nr_proc`` times. As a result, we have the following
+           guarantee on the dataflow correctness:
+
+           a. When ``nr_proc=1``, the dataflow produces the same data as ``ds`` in the same order.
+           b. When ``nr_proc>1``, the dataflow produces the same distribution
+              of data as ``ds`` if each sample from ``ds`` is i.i.d. (e.g. fully shuffled).
+              You probably only want to use it for training.
         2. Once :meth:`reset_state` is called, this dataflow becomes not fork-safe.
            i.e., if you fork an already reset instance of this dataflow,
            it won't be usable in the forked process.
......
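The note on `nr_proc` in the hunk above can be illustrated without tensorpack. The sketch below is a pure-Python simulation, not the `PrefetchDataZMQ` implementation: `source` and `simulate_forked` are hypothetical names, and the round-robin interleaving is an assumption made to keep the output deterministic (real worker processes would deliver datapoints in an unpredictable order).

```python
from itertools import islice


def source():
    """A deterministic, unshuffled stand-in for ``ds``."""
    for i in range(6):
        yield i


def simulate_forked(make_iter, nr_proc):
    """Merge nr_proc independent copies of the same iterator, round-robin.

    Each copy plays the role of one forked worker that starts reading
    the underlying data from the beginning.
    """
    workers = [make_iter() for _ in range(nr_proc)]
    while workers:
        for w in list(workers):
            try:
                yield next(w)
            except StopIteration:
                workers.remove(w)


# One worker: same data, same order as the source.
one = list(simulate_forked(source, 1))

# Two workers: the first six datapoints already contain duplicates.
two = list(islice(simulate_forked(source, 2), 6))

print(one)
print(two)
```

With one simulated worker the output matches the source exactly. With two, every datapoint appears twice near the start, which is the duplication the old docstring warned about; only if each sample were i.i.d. would the merged stream keep the same distribution.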