Commit 2d661d6d authored by Yuxin Wu

update docs

parent 379e9a07
@@ -50,10 +50,10 @@ If you expect higher speed, please read
 http://tensorpack.readthedocs.io/tutorial/performance-tuning.html
 before posting.
-If you expect the model to converge / work better, note that we do not help you on how to train a new model.
+If you expect the model to converge / work better, note that we do not help you on how to improve a model.
 Only in one of the two conditions can we help with it:
 (1) You're unable to reproduce the results documented in tensorpack examples.
-(2) It appears to be a tensorpack bug.
+(2) It indicates a tensorpack bug.
 ### 4. Your environment:
......
@@ -48,7 +48,7 @@ This is a minimal implementation that simply contains these files:
 3. We currently only support single image per GPU in this example.
-4. Because of (3), BatchNorm statistics are supposed to be freezed during fine-tuning.
+4. Because of (3), BatchNorm statistics are supposed to be frozen during fine-tuning.
 5. An alternative to freezing BatchNorm is to sync BatchNorm statistics across
    GPUs (the `BACKBONE.NORM=SyncBN` option).
......
@@ -115,6 +115,5 @@ if __name__ == '__main__':
     if is_horovod:
         trainer = HorovodTrainer(average=False)
     else:
-        # nccl mode appears faster than cpu mode
         trainer = SyncMultiGPUTrainerReplicated(cfg.TRAIN.NUM_GPUS, average=False)
     launch_train_with_config(traincfg, trainer)
@@ -211,8 +211,7 @@ class SyncMultiGPUReplicatedBuilder(DataParallelBuilder):
         self._mode = mode
         if self._mode == 'hierarchical' and len(towers) != 8:
-            logger.warn("mode='hierarchical' require 8 GPUs. Fallback to mode='nccl'.")
-            self._mode = 'nccl'
+            raise ValueError("mode='hierarchical' require 8 GPUs.")
     def call_for_each_tower(self, tower_fn):
         """
......
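Note on the hunk above: the builder no longer falls back to `'nccl'` with a warning when `mode='hierarchical'` is requested on a machine without 8 GPUs; it now raises. The behavior after this commit can be sketched in isolation as follows. `check_mode` is a hypothetical stand-in written for illustration, not a tensorpack function; only the condition and the error message are taken from the diff.

```python
def check_mode(mode, num_towers):
    """Hypothetical stand-in for the check in SyncMultiGPUReplicatedBuilder.

    Mirrors the new behavior: an unsupported combination fails fast
    instead of silently switching to another aggregation mode.
    """
    if mode == 'hierarchical' and num_towers != 8:
        # Same message as in the diff above.
        raise ValueError("mode='hierarchical' require 8 GPUs.")
    return mode


if __name__ == '__main__':
    print(check_mode('nccl', 4))          # any tower count is accepted
    print(check_mode('hierarchical', 8))  # accepted only with 8 towers
    try:
        check_mode('hierarchical', 4)
    except ValueError as e:
        print("rejected:", e)
```

Failing fast means a caller who explicitly asks for `'hierarchical'` learns about the mismatch immediately, rather than training in a different mode than requested.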
@@ -75,6 +75,9 @@ def get_sync_bn_mean_var(inputs, red_axis, sync_statistics):
         assert TF_version >= (1, 10), \
             "Cross-GPU BatchNorm is only supported in TF>=1.10 ." \
             "Upgrade TF or apply this patch manually: https://github.com/tensorflow/tensorflow/pull/20360"
+        if TF_version >= (1, 15):
+            logger.warn("BatchNorm(sync_statistics='nccl') may produce incorrect results due "
+                        "to bug in TF>=1.15: https://github.com/tensorflow/tensorflow/issues/41539")
         if TF_version <= (1, 12):
             try:
......
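Note on the hunk above: the version gates compare `TF_version` against tuples such as `(1, 10)` and `(1, 15)`. A minimal sketch of why tuples are used, assuming `TF_version` is a `(major, minor)` tuple of ints derived from the version string (the helper names below are hypothetical and not part of tensorpack):

```python
def version_tuple(version_string):
    """Hypothetical helper: '1.15.0' -> (1, 15)."""
    return tuple(int(part) for part in version_string.split('.')[:2])


def needs_sync_bn_warning(version_string):
    """True when the new warning from this commit would apply (TF >= 1.15)."""
    return version_tuple(version_string) >= (1, 15)


if __name__ == '__main__':
    # Tuples compare element by element, so 1.9 sorts before 1.10.
    print(version_tuple('1.9.0') < version_tuple('1.10.0'))
    # Comparing the raw strings would get this wrong.
    print('1.9.0' < '1.10.0')
    print(needs_sync_bn_warning('1.14.0'), needs_sync_bn_warning('2.3.0'))
```

With this representation, a check like `TF_version <= (1, 12)` matches 1.12 and everything older, provided the tuple holds only the major and minor components.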
@@ -168,10 +168,10 @@ class SyncMultiGPUTrainerReplicated(SingleCostTrainer):
             gpus (int or [int]): list of GPU ids.
             average (bool): whether to average or sum gradients.
             mode (str or None): Gradient aggregation mode.
-                Supported values: ['nccl', 'hierarchical', 'cpu'].
+                Supported values: ['nccl', 'hierarchical', 'cpu', 'gpu'].
+                These modes may differ in speed.
                 Default to pick automatically by heuristics.
-                These modes may have slight (within 5%) differences in speed.
-                "hierarchical" mode was designed for DGX-like 8GPU machines.
+                "hierarchical" mode was designed for DGX-like 8-GPU machines.
         """
         self.devices = gpus
         if mode is not None:
......