Commit afba8dee authored by Yuxin Wu

update docs

parent 0a0b387e
@@ -20,7 +20,7 @@ feel free to delete everything in this template.
 It's always better to copy-paste what you did than to describe them.
 
-Please try to provide enough information to let other __reproduce__ your issues.
+Please try to provide enough information to let others __reproduce__ your issues.
 Without reproducing the issue, we may not be able to investigate it.
 
 ### 2. What you observed:
@@ -44,11 +44,11 @@ If you expect higher speed, please read
 http://tensorpack.readthedocs.io/tutorial/performance-tuning.html
 before posting.
 
-If you expect certain accuracy, only in one of the two conditions can we help with it:
-(1) You're unable to reproduce the accuracy documented in tensorpack examples.
+If you expect certain training results (e.g., accuracy), only in one of the two conditions can we help with it:
+(1) You're unable to reproduce the results documented in tensorpack examples.
 (2) It appears to be a tensorpack bug.
 
-Otherwise, how to train a model to certain accuracy is a machine learning question.
+Otherwise, how to train a model is a machine learning question.
 We do not answer machine learning questions and it is your responsibility to
 figure out how to make your models more accurate.
......
@@ -60,21 +60,28 @@ Model:
 6. Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`) which has better performance.
 
-Speed:
+Efficiency:
 1. If CuDNN warmup is on, the training will start very slowly, until about
    10k steps (or more if scale augmentation is used) to reach a maximum speed.
    As a result, the ETA is also inaccurate at the beginning.
-   CuDNN warmup is by default on when no scale augmentation is used.
+   CuDNN warmup is by default enabled when no scale augmentation is used.
 1. After warmup, the training speed will slowly decrease due to more accurate proposals.
-1. The code should have around 70% GPU utilization on V100s, and 85%~90% scaling
+1. The code should have around 80~90% GPU utilization on V100s, and 85%~90% scaling
    efficiency from 1 V100 to 8 V100s.
 1. This implementation does not use specialized CUDA ops (e.g. AffineChannel, ROIAlign).
    Therefore it might be slower than other highly-optimized implementations.
+1. To reduce RAM usage on host: (1) make sure you're using the "spawn" method as
+   set in `train.py`; (2) reduce `buffer_size` or `NUM_WORKERS` in `data.py`
+   (which may negatively impact your throughput). The training needs <10G RAM if `NUM_WORKERS=0`.
+1. Inference is unoptimized. Tensorpack is a training interface, therefore it
+   does not help you on optimized inference.
 
 Possible Future Enhancements:
 1. Define a better interface to load different datasets.
......
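A minimal sketch of the RAM advice added in the hunk above, assuming nothing beyond the Python standard library; the constants are illustrative stand-ins for the identically named knobs in `train.py` / `data.py`, not the actual tensorpack code:

```python
import multiprocessing as mp

NUM_WORKERS = 0     # 0 disables worker processes entirely; training then needs <10G host RAM
BUFFER_SIZE = 200   # a smaller prefetch buffer lowers RAM usage, possibly at some throughput cost

if __name__ == '__main__':
    # "spawn" starts fresh interpreters instead of fork()-ing, so child
    # processes do not inherit (and duplicate) the parent's memory pages.
    mp.set_start_method('spawn')
    # ... build the dataflow with NUM_WORKERS workers and BUFFER_SIZE prefetching,
    # then start training ...
```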
@@ -92,9 +92,7 @@ class LMDBData(RNGDataFlow):
         logger.info("Found {} entries in {}".format(self._size, self._lmdb_path))
 
         # Clean them up after finding the list of keys, since we don't want to fork them
-        self._lmdb.close()
-        del self._lmdb
-        del self._txn
+        self._close_lmdb()
 
     def _set_keys(self, keys=None):
         def find_keys(txn, size):
@@ -131,6 +129,11 @@ class LMDBData(RNGDataFlow):
                                map_size=1099511627776 * 2, max_readers=100)
         self._txn = self._lmdb.begin()
 
+    def _close_lmdb(self):
+        self._lmdb.close()
+        del self._lmdb
+        del self._txn
+
     def reset_state(self):
         self._guard = DataFlowReentrantGuard()
         super(LMDBData, self).reset_state()
......
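The refactor above pairs the existing `_open_lmdb()` with a new `_close_lmdb()`. A simplified, self-contained sketch of the resulting open/close lifecycle (`_LMDBHandle` is a hypothetical stand-in, not the full `LMDBData` class):

```python
import lmdb

class _LMDBHandle:
    def __init__(self, path):
        self._path = path
        self._open_lmdb()
        self._size = self._txn.stat()['entries']
        # Close before any fork/spawn so the handle is never shared across processes.
        self._close_lmdb()

    def _open_lmdb(self):
        self._lmdb = lmdb.open(self._path, readonly=True, lock=False)
        self._txn = self._lmdb.begin()

    def _close_lmdb(self):
        self._lmdb.close()
        del self._lmdb
        del self._txn

    def reset_state(self):
        # Each worker process re-opens its own LMDB handle here.
        self._open_lmdb()
```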