Commit afba8dee authored by Yuxin Wu

update docs

parent 0a0b387e
@@ -20,7 +20,7 @@ feel free to delete everything in this template.
 It's always better to copy-paste what you did than to describe them.
 
-Please try to provide enough information to let other __reproduce__ your issues.
+Please try to provide enough information to let others __reproduce__ your issues.
 Without reproducing the issue, we may not be able to investigate it.
 
 ### 2. What you observed:
@@ -44,11 +44,11 @@ If you expect higher speed, please read
 http://tensorpack.readthedocs.io/tutorial/performance-tuning.html
 before posting.
 
-If you expect certain accuracy, only in one of the two conditions can we help with it:
-(1) You're unable to reproduce the accuracy documented in tensorpack examples.
+If you expect certain training results (e.g., accuracy), only in one of the two conditions can we help with it:
+(1) You're unable to reproduce the results documented in tensorpack examples.
 (2) It appears to be a tensorpack bug.
 
-Otherwise, how to train a model to certain accuracy is a machine learning question.
+Otherwise, how to train a model is a machine learning question.
 We do not answer machine learning questions and it is your responsibility to
 figure out how to make your models more accurate.
......
@@ -60,21 +60,28 @@ Model:
 6. Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`) which has better performance.
 
-Speed:
+Efficiency:
 1. If CuDNN warmup is on, the training will start very slowly, until about
    10k steps (or more if scale augmentation is used) to reach a maximum speed.
    As a result, the ETA is also inaccurate at the beginning.
-   CuDNN warmup is by default on when no scale augmentation is used.
+   CuDNN warmup is by default enabled when no scale augmentation is used.
 1. After warmup, the training speed will slowly decrease due to more accurate proposals.
-1. The code should have around 70% GPU utilization on V100s, and 85%~90% scaling
+1. The code should have around 80~90% GPU utilization on V100s, and 85%~90% scaling
    efficiency from 1 V100 to 8 V100s.
 1. This implementation does not use specialized CUDA ops (e.g. AffineChannel, ROIAlign).
    Therefore it might be slower than other highly-optimized implementations.
+1. To reduce RAM usage on host: (1) make sure you're using the "spawn" method as
+   set in `train.py`; (2) reduce `buffer_size` or `NUM_WORKERS` in `data.py`
+   (which may negatively impact your throughput). The training needs <10G RAM if `NUM_WORKERS=0`.
+1. Inference is unoptimized. Tensorpack is a training interface, therefore it
+   does not help you on optimized inference.
 
 Possible Future Enhancements:
 1. Define a better interface to load different datasets.
......
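A minimal sketch of the RAM advice added in the hunk above, assuming nothing beyond the Python standard library; the constants are illustrative stand-ins for the identically named knobs in `train.py` / `data.py`, not the actual tensorpack code:

```python
import multiprocessing as mp

NUM_WORKERS = 0     # 0 disables worker processes entirely; training then needs <10G host RAM
BUFFER_SIZE = 200   # a smaller prefetch buffer lowers RAM usage, possibly at some throughput cost

if __name__ == '__main__':
    # "spawn" starts fresh interpreters instead of fork()-ing, so child
    # processes do not inherit (and duplicate) the parent's memory pages.
    mp.set_start_method('spawn')
    # ... build the dataflow with NUM_WORKERS workers and BUFFER_SIZE prefetching,
    # then start training ...
```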
@@ -92,9 +92,7 @@ class LMDBData(RNGDataFlow):
         logger.info("Found {} entries in {}".format(self._size, self._lmdb_path))
 
         # Clean them up after finding the list of keys, since we don't want to fork them
-        self._lmdb.close()
-        del self._lmdb
-        del self._txn
+        self._close_lmdb()
 
     def _set_keys(self, keys=None):
         def find_keys(txn, size):
@@ -131,6 +129,11 @@ class LMDBData(RNGDataFlow):
                                map_size=1099511627776 * 2, max_readers=100)
         self._txn = self._lmdb.begin()
 
+    def _close_lmdb(self):
+        self._lmdb.close()
+        del self._lmdb
+        del self._txn
+
     def reset_state(self):
         self._guard = DataFlowReentrantGuard()
         super(LMDBData, self).reset_state()
......
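The refactor above pairs the existing `_open_lmdb()` with a new `_close_lmdb()`. A simplified, self-contained sketch of the resulting open/close lifecycle (`_LMDBHandle` is a hypothetical stand-in, not the full `LMDBData` class):

```python
import lmdb

class _LMDBHandle:
    def __init__(self, path):
        self._path = path
        self._open_lmdb()
        self._size = self._txn.stat()['entries']
        # Close before any fork/spawn so the handle is never shared across processes.
        self._close_lmdb()

    def _open_lmdb(self):
        self._lmdb = lmdb.open(self._path, readonly=True, lock=False)
        self._txn = self._lmdb.begin()

    def _close_lmdb(self):
        self._lmdb.close()
        del self._lmdb
        del self._txn

    def reset_state(self):
        # Each worker process re-opens its own LMDB handle here.
        self._open_lmdb()
```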