Commit 02eb02e2 authored by Yuxin Wu's avatar Yuxin Wu

update docs

parent 5df194d9
PLEASE finish reading to show some respect to the authors.
An issue has to be one of the following:
- Unexpected Problems / Potential Bugs
- Feature Requests
- Questions on Using/Understanding Tensorpack
## For any unexpected problems, __PLEASE ALWAYS INCLUDE__:
1. What you did:
+ If you're using examples:
+ What's the command you run:
+ Have you made any changes to code? Paste them if any:
+ If not, tell us what you did that may be relevant.
  But we may not investigate it if there is no reproducible code.
+ It's better to paste what you did than to describe it.
2. What you observed, including but not limited to the __entire__ logs.
+ It's better to paste what you observed than to describe it.
3. What you expected, if not obvious.
4. Your environment:
+ Python version.
+ TF version: `python -c 'import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)'`.
+ Tensorpack version: `python -c 'import tensorpack; print(tensorpack.__version__)'`.
You can install Tensorpack master by `pip install -U git+https://github.com/ppwwyyxx/tensorpack.git`.
+ Hardware information, if relevant.
For efficiency issues, PLEASE first read http://tensorpack.readthedocs.io/en/latest/tutorial/performance-tuning.html
## Feature Requests:
+ You can implement a lot of features by extending Tensorpack
(See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
It does not have to be added to Tensorpack unless you have a good reason.
+ "Could you improve/implement an example/paper ?"
-- The answer is: we have no plans to do so. We don't take feature requests for
examples or implement a paper for you. If you don't know how to do it, you may ask a usage question.
## Usage Questions:
+ Read the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials) first.
+ We answer "HOW to do X with Tensorpack" for a well-defined X.
We also answer "HOW/WHY Tensorpack does X" for some X that Tensorpack or its examples are doing.
We don't answer general machine learning questions, such as "why my training doesn't converge", "what networks to use" or "I don't understand the paper".
You can also use gitter (https://gitter.im/tensorpack/users) for more casual discussions.
---
name: Feature Requests
about: Suggest an idea for Tensorpack
---
+ Note that you can implement a lot of features by extending Tensorpack
(See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
It does not have to be added to Tensorpack unless you have a good reason.
+ "Could you improve/implement an example/paper?"
-- The answer is: we have no plans to do so. We don't take feature requests for
examples or implement a paper for you. If you don't know how to do it, you may ask a usage question.
---
name: Unexpected Problems / Bugs
about: Report unexpected problems about Tensorpack or its examples.
---
---
about: More general questions about Tensorpack.
---
+ Your question is probably answered in the [tutorials](http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#user-tutorials). Read them first.
+ We answer "HOW to do X with Tensorpack" for a well-defined X. + We answer "HOW to do X with Tensorpack" for a well-defined X.
We also answer "HOW/WHY Tensorpack does X" for some X that Tensorpack or its examples are doing. We also answer "HOW/WHY Tensorpack does X" for some X that Tensorpack or its examples are doing.
Model:
<p align="center"> <img src="https://user-images.githubusercontent.com/1381301/31527740-2f1b38ce-af84-11e7-8de1-628e90089826.png"> </p> <p align="center"> <img src="https://user-images.githubusercontent.com/1381301/31527740-2f1b38ce-af84-11e7-8de1-628e90089826.png"> </p>
2. We use ROIAlign, and `tf.image.crop_and_resize` is __NOT__ ROIAlign.
3. We currently only support a single image per GPU.
4. Because of (3), BatchNorm statistics are supposed to be frozen during fine-tuning.
5. An alternative to freezing BatchNorm is to sync BatchNorm statistics across
   GPUs (the `BACKBONE.NORM=SyncBN` option). This would require [my bugfix](https://github.com/tensorflow/tensorflow/pull/20360)
   which is available since TF 1.10. You can manually apply the patch to use it.
   For now the total batch size is at most 8, so this option does not improve the model by much.
6. Another alternative to BatchNorm is GroupNorm (`BACKBONE.NORM=GN`), which has better performance.
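The normalization options above differ mainly in where statistics come from. As a rough illustration (a minimal NumPy sketch, not Tensorpack's actual implementation, and omitting the learned scale/shift parameters), GroupNorm computes statistics per sample over channel groups, so nothing depends on the batch and there is nothing to freeze or sync across GPUs:

```python
import numpy as np

def group_norm(x, num_groups=32, eps=1e-5):
    """Normalize an NCHW tensor per (sample, channel-group).

    Statistics are computed over each group's channels and all spatial
    positions of a single sample, so unlike BatchNorm they do not depend
    on the batch.
    """
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must be divisible by num_groups"
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)
```

Frozen BatchNorm, by contrast, simply applies fixed pre-trained mean/variance instead of computing them from the (too small) batch.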
Speed:
1. The training will start very slowly due to convolution warmup, until about 10k
   steps to reach a maximum speed.
   You can disable warmup by `export TF_CUDNN_USE_AUTOTUNE=0`, which makes the
   training faster at the beginning, but perhaps not in the end.
1. After warmup the training speed will slowly decrease due to more accurate proposals.
1. This implementation is about 10% slower than detectron,
   probably due to the lack of specialized ops (e.g. AffineChannel, ROIAlign) in TensorFlow.
   It's certainly faster than other TF implementations.
1. The code should have around 70% GPU utilization on V100s, and 85%~90% scaling
   efficiency from 1 V100 to 8 V100s.
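The scaling-efficiency figure above is the achieved speedup divided by the ideal linear speedup. A small illustrative helper (the throughput numbers in the comment are hypothetical, not measurements from this repo):

```python
def scaling_efficiency(throughput_1, throughput_n, n):
    """Fraction of the ideal n-fold speedup achieved when scaling to n GPUs."""
    return throughput_n / (n * throughput_1)

# Hypothetical numbers: 100 img/s on 1 GPU and 720 img/s on 8 GPUs
# -> 720 / (8 * 100) = 0.9, i.e. 90% scaling efficiency.
```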
Possible Future Enhancements:
1. Define an interface to load a custom dataset.
1. Support batch>1 per GPU.
1. Use dedicated ops to improve speed. (e.g. a TF implementation of ROIAlign op
   can be found in [light-head RCNN](https://github.com/zengarden/light_head_rcnn/tree/master/lib/lib_kernel))
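The claim above that `tf.image.crop_and_resize` is not ROIAlign comes down to the sampling grid: crop_and_resize aligns the box corners with the outermost samples, while ROIAlign splits the box into bins and samples each bin at its center. A NumPy sketch of the two 1-D coordinate conventions (illustrative only, not the actual kernels, which also do bilinear interpolation and per-bin averaging):

```python
import numpy as np

def crop_and_resize_coords(x1, x2, k):
    # Corner-aligned grid: the box endpoints map exactly onto the first
    # and last of the k samples (tf.image.crop_and_resize's convention).
    return x1 + (x2 - x1) * np.arange(k) / (k - 1)

def roi_align_coords(x1, x2, k):
    # Bin-center grid: the box is split into k equal bins and each bin
    # is sampled at its center (ROIAlign's convention).
    return x1 + (x2 - x1) * (np.arange(k) + 0.5) / k
```

For a box spanning [0, 4] with 4 samples, the first grid is [0, 1.33, 2.67, 4] while the second is [0.5, 1.5, 2.5, 3.5]; the mismatch is why one cannot substitute for the other.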
MaskRCNN results contain both box and mask mAP.
## Notes
[NOTES.md](NOTES.md) has some notes about implementation details & speed.
In `class EvalCallback(Callback)`:

    if len(self.epochs_to_eval) < 15:
        logger.info("[EvalCallback] Will evaluate at epoch " + str(sorted(self.epochs_to_eval)))
    else:
        if cfg.TRAINER == 'horovod':
            logger.warn("[EvalCallback] Evaluation is single-GPU only and quite slow under horovod mode.")
        logger.info("[EvalCallback] Will evaluate every {} epochs".format(interval))

    def _eval(self):
        ...