Commit Graph

35 Commits

Author SHA1 Message Date
Maggie Moss
84b14f3a10 Fix error suppression syntax in utils and nn (#166242)
Fixes the syntax for `pyrefly: ignore` comments so they only suppress a specific error category. No functional changes.
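For context, a minimal illustration of the two suppression forms; the bracketed error-code syntax is an assumption about pyrefly's format (mirroring mypy's `type: ignore[code]`), used purely for illustration:

```python
# Too broad (assumed behavior): silences every pyrefly error on the line.
x: int = "a"  # pyrefly: ignore

# Category-specific form (assumed syntax): silences only one error kind.
y: int = "b"  # pyrefly: ignore[bad-assignment]
```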

Test plan:
pyrefly check
lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166242
Approved by: https://github.com/oulgen, https://github.com/cyyever
2025-10-26 05:21:07 +00:00
Maggie Moss
c855f8632e Pyrefly suppressions 7/n (#164913)
Adds suppressions so that pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete the relevant entries from the project-excludes field in the pyrefly.toml file
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
 INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913
Approved by: https://github.com/oulgen
2025-10-08 07:27:17 +00:00
Tom Ritchford
c0582fd0f8 Remove unused Python variables in torch/[b-z]* (#136963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
Xuehai Pan
62ccf6d7cd [BE] enable UFMT for torch/nn/modules (#128594)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128594
Approved by: https://github.com/mikaylagawarecki
2024-06-23 05:37:57 +00:00
PyTorch MergeBot
d4022b4658 Revert "[BE] enable UFMT for torch/nn/modules (#128594)"
This reverts commit 95ac2d6482.

Reverted https://github.com/pytorch/pytorch/pull/128594 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128594#issuecomment-2181788935))
2024-06-21 00:50:08 +00:00
Xuehai Pan
95ac2d6482 [BE] enable UFMT for torch/nn/modules (#128594)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128594
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #128596
2024-06-17 16:29:25 +00:00
Aaron Orenstein
27f9d3b0a1 Flip default value for mypy disallow_untyped_defs [8/11] (#127845)
See #127836 for details.
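For context, a hedged example of the kind of function that flipping `disallow_untyped_defs` to true makes mypy flag (illustrative code, not from the PR):

```python
# With disallow_untyped_defs = True, mypy reports:
#   error: Function is missing a type annotation  [no-untyped-def]
def scale(x, factor):
    return x * factor


# Fully annotated version that passes the check.
def scale_typed(x: float, factor: float) -> float:
    return x * factor
```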

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127845
Approved by: https://github.com/oulgen
ghstack dependencies: #127842, #127843, #127844
2024-06-08 18:49:56 +00:00
Aaron Gokaslan
ea7d70aecc [BE]: ruff FURB136: replace ternary with min/max (preview) (#114382)
Replaces ternary if/else statements with simple min/max calls when appropriate.
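A hedged illustration of the FURB136 rewrite (values are illustrative):

```python
a, b = 3, 7

# Before: ternary expressing a maximum/minimum.
highest = a if a > b else b
lowest = a if a < b else b

# After: the equivalent builtins that ruff suggests.
highest = max(a, b)
lowest = min(a, b)
```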
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114382
Approved by: https://github.com/albanD
2023-11-22 22:10:01 +00:00
Justin Chu
79c5e33349 [BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436)
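For context, a hedged illustration of the pyupgrade-style (UP) rewrites such an autoformat applies (snippets are illustrative, not taken from the PR):

```python
# Before (older idioms that UP rules rewrite):
#   class Widget(object): ...
#   super(Widget, self).__init__()
#   label = "value: %s" % value

# After:
class Widget:
    def __init__(self, value: int) -> None:
        super().__init__()
        self.label = f"value: {value}"
```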
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet, https://github.com/albanD
2023-07-21 07:38:46 +00:00
shibo19
05854212dd add syncBN support for custom device (#104250)
Fixes #ISSUE_NUMBER
there are some hard checks for `cuda`, so I optimized the check so that we can run it on other devices.
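A hedged sketch of the kind of relaxation described (illustrative, not the actual diff): replace a CUDA-only assertion with a check that merely rejects CPU tensors, so other accelerator backends pass.

```python
import torch


def _check_input_device(input: torch.Tensor) -> None:
    # Before (hard CUDA check): assert input.is_cuda
    # After: allow any non-CPU device (cuda, custom backends, ...).
    if input.device.type == "cpu":
        raise ValueError("SyncBatchNorm expected the input to be on a non-CPU device")
```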
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104250
Approved by: https://github.com/albanD
2023-07-17 15:41:39 +00:00
Danni Li
b33d63d97b [BE] Use ValueError for input.dim check in torch.nn.modules (#105127)
Summary: Use ValueError for the input.dim check instead of AssertionError.
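A hedged sketch of the pattern (modeled on the `_check_input_dim` helpers in `torch.nn.modules`; the expected rank here is illustrative):

```python
import torch


def _check_input_dim(input: torch.Tensor) -> None:
    # Before: assert input.dim() == 4, which surfaces as an AssertionError.
    # After: raise ValueError, the conventional error for bad argument values.
    if input.dim() != 4:
        raise ValueError(f"expected 4D input (got {input.dim()}D input)")
```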

Fix: #104839

Test Plan: Please see GitHub actions.

Differential Revision: D47427998

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105127
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-13 23:20:46 +00:00
Kazuaki Ishizaki
a531a464fd Fix typos under torch/nn directory (#97594)
This PR fixes typos in the comments of `.py` files under the `torch/nn` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97594
Approved by: https://github.com/dagitses, https://github.com/kit1980
2023-04-10 22:07:15 +00:00
Andrew Gu
3686416a57 [SyncBatchNorm] Support running with low precision parameters (#98332)
This PR fixes https://github.com/pytorch/pytorch/issues/96203.

**Details**
When using `nn.SyncBatchNorm` with the model converted to FP16, there is a dtype discrepancy in `SyncBatchNorm.forward()`, causing an error like:
```
 File "/.../pytorch/torch/nn/modules/_functions.py", line 91, in forward
    mean, invstd = torch.batch_norm_gather_stats_with_counts(
RuntimeError: Expected counts to have type Half but got Float
```
[`torch.batch_norm_gather_stats_with_counts()`](fe9da29842/torch/nn/modules/_functions.py (L88-L97)) requires the `running_mean`, `running_var`, and `counts` to have the same dtype. However, when the model has been converted to FP16, only `running_mean` and `running_var` use FP16, while the `counts` are in FP32 due to [`mean` being in FP32](fe9da29842/torch/nn/modules/_functions.py (L25-L30)). This PR resolves this by casting `counts` from FP32 to FP16, instead of the alternative of casting `mean` and `invstd` from FP32 to FP16.

Moreover, for the backward, this PR casts `weight` from FP16 to FP32 to match the dtype of `mean` and `invstd` as required by `torch.batch_norm_backward_elemt()`, instead of the alternative of casting `mean` and `invstd` from FP32 to FP16.
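A hedged sketch of the dtype alignment in the forward path (requires a CUDA device; shapes and names are illustrative, and a single rank is simulated locally instead of gathering over a process group):

```python
import torch

if torch.cuda.is_available():
    input = torch.randn(4, 3, 8, 8, device="cuda", dtype=torch.half)
    running_mean = torch.zeros(3, device="cuda", dtype=torch.half)
    running_var = torch.ones(3, device="cuda", dtype=torch.half)

    # Per-rank statistics come back in FP32 even for half inputs.
    mean, invstd = torch.batch_norm_stats(input, 1e-5)
    count = torch.full((1,), input.numel() // input.size(1),
                       device="cuda", dtype=torch.float32)

    # The fix: cast `count` to the running-stats dtype (FP16 here) rather than
    # downcasting mean/invstd, so all dtypes seen by the op agree.
    mean_all, invstd_all = torch.batch_norm_gather_stats_with_counts(
        input, mean.unsqueeze(0), invstd.unsqueeze(0),
        running_mean, running_var, 0.1, 1e-5,
        count.to(running_mean.dtype))
```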

**Test Plan**
I dug up this run command from 2021:
For `world_size` in `{1,2}` and `backend` in `{nccl, gloo}`:
```
WORLD_SIZE=world_size BACKEND=backend  python -m pytest test/distributed/test_distributed_spawn.py -k test_DistributedDataParallel_SyncBatchNorm_half -vs
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98332
Approved by: https://github.com/rohan-varma
2023-04-05 00:00:30 +00:00
Howard Huang
9497552771 Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521)
Summary: Fixes https://github.com/pytorch/pytorch/issues/88568

`_all_gather_base` is deprecated, so this replaces its usage with `all_gather_into_tensor`.
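A hedged sketch of the replacement (must run under an initialized process group, e.g. via torchrun; `local_stats` is an illustrative name, not the actual variable in SyncBatchNorm):

```python
import torch
import torch.distributed as dist


def gather_stats(local_stats: torch.Tensor) -> torch.Tensor:
    world_size = dist.get_world_size()
    combined = torch.empty(world_size * local_stats.numel(),
                           dtype=local_stats.dtype, device=local_stats.device)
    # Before: dist._all_gather_base(combined, local_stats)  # deprecated
    dist.all_gather_into_tensor(combined, local_stats)
    return combined.view(world_size, -1)
```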

Test Plan: CI

Differential Revision: D41479983

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521
Approved by: https://github.com/wz337
2022-11-24 19:41:17 +00:00
Xiao Wang
f5df685090 Enable channels_last_3d on SyncBatchNorm (#88401)
This PR enabled the use of fast channels_last kernels on SyncBatchNorm with channels_last_3d memory format.
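A hedged, single-process illustration of opting into the 3D channels-last memory format (uses `BatchNorm3d` as a stand-in for `SyncBatchNorm`, which needs a process group):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm3d(8),  # stand-in for nn.SyncBatchNorm in a DDP setup
).to(memory_format=torch.channels_last_3d)

x = torch.randn(2, 3, 4, 16, 16).to(memory_format=torch.channels_last_3d)
y = model(x)
print(y.shape)
```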

With a small benchmark script here https://github.com/pytorch/pytorch/issues/88021#issuecomment-1299059859, on V100, I got

master:
```
DDP channels_last=False, run_forward_backward, time: 0.8945400714874268 sec
DDP channels_last=True, run_forward_backward, time: 1.4736433029174805 sec
```

This PR:
```
DDP channels_last=False, run_forward_backward, time: 0.8927242755889893 sec
DDP channels_last=True, run_forward_backward, time: 0.48697471618652344 sec
```

This PR is a follow-up of https://github.com/pytorch/pytorch/pull/46906

Close https://github.com/pytorch/pytorch/issues/88021
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88401
Approved by: https://github.com/ngimel
2022-11-15 19:25:53 +00:00
Shen Li
1884d7fbe9 Avoid CPU Sync in SyncBatchNorm When Capturing CUDA Graphs
We recently updated `SyncBatchNorm` to support empty input batches.
The new code removes stats from ranks with empty inputs. However,
this change breaks CUDA graph capture as it forces CPU sync. This
commit uses `is_current_stream_capturing()` to guard the new code
path, and only runs the new code when not capturing CUDA Graphs. To
support empty inputs with CUDA graph capturing, we might need to
update CUDA kernels for `batch_norm_backward_elemt` and
`batch_norm_gather_stats_with_counts`. See #78656.
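A hedged sketch of the guard (illustrative; the real code lives in `torch/nn/modules/_functions.py` and operates on the gathered per-rank counts):

```python
import torch


def filter_counts(count_all: torch.Tensor) -> torch.Tensor:
    # Dropping ranks with empty inputs needs a data-dependent shape, which
    # forces a device-to-host sync and is illegal during CUDA graph capture.
    if not torch.cuda.is_current_stream_capturing():
        count_all = count_all[count_all != 0]
    return count_all
```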

Fixes #78549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78666

Approved by: https://github.com/albanD
2022-06-03 04:32:57 +00:00
Shen Li
87ab665ba6 Fix SyncBatchNorm for empty inputs (#74944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74944

fixes #36530

Prior to this commit, SyncBatchNorm crashes with the following
error message.

```
File "..../torch/nn/modules/_functions.py", line 17, in forward
    mean, invstd = torch.batch_norm_stats(input, eps)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, 3, -1] because the unspecified dimension size -1 can be any value and is ambiguous
```

This PR adds a dedicated branch to handle empty inputs. When a process
receives empty inputs, it will set its local `mean`, `invstd`, and `count`
to zero, and participate in the `all_gather` collective communications in
the forward pass. Then `mean` and `invstd` with zero count will be
filtered out before computing global mean and invstd. In the backward
pass, it also participates in the `all_reduce` communication with zero
tensors to unblock its peers.
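A hedged, self-contained illustration of the filtering step (made-up gathered tensors standing in for the outputs of the all_gather):

```python
import torch

# Pretend two ranks gathered stats; rank 1 saw an empty input and contributed zeros.
mean_all = torch.tensor([[0.5, -0.1], [0.0, 0.0]])
invstd_all = torch.tensor([[1.2, 0.9], [0.0, 0.0]])
count_all = torch.tensor([6.0, 0.0])

# Drop entries with a zero count before computing the global mean/invstd.
mask = count_all > 0
mean_all, invstd_all, count_all = mean_all[mask], invstd_all[mask], count_all[mask]
```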

Differential Revision: D35273409

Test Plan: Imported from OSS

Reviewed By: datumbox

Pulled By: mrshenli

fbshipit-source-id: 1cee51eea866773c329b3fbf5da2be8a5fee6f0f
(cherry picked from commit f8e2a2357240ebe7b7a058047d376a5300bdeda9)
2022-04-01 23:48:30 +00:00
Yanli Zhao
2733555ed1 replace all_gather with more efficient collective api _all_gather_base (#57769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57769

_all_gather_base saves the extra copies incurred by all_gather, so it is more efficient
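A hedged illustration of the copy being saved (illustrative function; needs an initialized process group to actually run):

```python
import torch
import torch.distributed as dist


def gather(local: torch.Tensor):
    world_size = dist.get_world_size()

    # all_gather: fills a list of per-rank tensors that must then be
    # concatenated, i.e. one extra copy of all gathered data.
    bufs = [torch.empty_like(local) for _ in range(world_size)]
    dist.all_gather(bufs, local)
    via_list = torch.cat(bufs)

    # _all_gather_base: writes every rank's data straight into one flat,
    # preallocated buffer, skipping the concat copy.
    via_base = torch.empty(world_size * local.numel(),
                           dtype=local.dtype, device=local.device)
    dist._all_gather_base(via_base, local)
    return via_list, via_base
```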

Test Plan: unit test

Reviewed By: SciPioneer

Differential Revision: D28227193

fbshipit-source-id: ddd8590095a5b45676497a71ed792a457f9825c6
2021-05-24 11:34:45 -07:00
Nikita Shulga
69c5fd1e00 SyncBatchNorm.forward() to handle optional weight (#54568)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54568

Reviewed By: ezyang

Differential Revision: D27285822

Pulled By: malfet

fbshipit-source-id: 4f7b489d80294cb2509eec4f6c4aa22d5c47b35d
2021-04-01 08:02:21 -07:00
Xiao Wang
d30f4d1dfd Migrate apex.parallel.SyncBatchNorm channels_last to pytorch (#46906)
Summary:
per title

This PR did
- Migrate `apex.parallel.SyncBatchNorm` channels_last to pytorch `torch.nn.SyncBatchNorm`
- Fix a TODO here by fusing the `sum` and `div` kernels into the backward elementwise kernel
b167402e2e/torch/nn/modules/_functions.py (L76-L95)

Todo
- [x] Discuss a regression introduced in https://github.com/pytorch/pytorch/pull/37133#discussion_r512530389, which is the synchronized copy here
b167402e2e/torch/nn/modules/_functions.py (L32-L34)

**Comment**: This PR uses apex version for the size check. Test passed and I haven't seen anything wrong so far.

- [x] The restriction to use channels_last kernel will be like this
```
inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) {
  return self.is_contiguous(at::MemoryFormat::ChannelsLast) || self.ndimension() == 2;
}
```
I think we can relax that for channels_last_3d as well?

**Comment**: we don't have benchmark for this now, will check this and add functionality later when needed.
- [x] Add test
- [x] Add benchmark

Detailed benchmark is at https://github.com/xwang233/code-snippet/tree/master/syncbn-channels-last

Close https://github.com/pytorch/pytorch/issues/50781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46906

Reviewed By: albanD

Differential Revision: D26771437

Pulled By: malfet

fbshipit-source-id: d00387044e9d43ac7e6c0e32a2db22c63d1504de
2021-03-03 15:29:45 -08:00
Nikita Shulga
bf4fcab681 Fix SyncBatchNorm usage without stats tracking (#50126)
Summary:
In `batch_norm_gather_stats_with_counts_cuda` use `input.scalar_type()` if `running_mean` is not defined
In `SyncBatchNorm` forward function create count tensor with `torch.float32` type if `running_mean` is None
Fix a few typos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50126

Test Plan:
```
python -c "import torch;print(torch.batch_norm_gather_stats_with_counts( torch.randn(1, 3, 3, 3, device='cuda'), mean = torch.ones(2, 3, device='cuda'), invstd = torch.ones(2, 3, device='cuda'), running_mean = None, running_var = None  , momentum = .1, eps = 1e-5, counts = torch.ones(2, device='cuda')))"
```

Fixes https://github.com/pytorch/pytorch/issues/49730

Reviewed By: ngimel

Differential Revision: D25797930

Pulled By: malfet

fbshipit-source-id: 22a91e3969b5e9bbb7969d9cc70b45013a42fe83
2021-01-07 18:31:13 -08:00
albanD
ccd646696b Fix Module backward hooks for all Tensor inputs/outputs (#46163)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/598

This is BC-breaking as we now explicitly don't call the hook when there are no Tensors at the top level of the output.
This feature was not working anyway, as the returned grad_input/grad_output were wrong (not respecting the output structure, and wrong inputs for a multi-Node Module).

This is also BC-breaking as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s, whereas we used to return incorrect results before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46163

Reviewed By: ailzhang, mruberry

Differential Revision: D24894180

Pulled By: albanD

fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b
2020-12-18 09:04:36 -08:00
Vasiliy Kuznetsov
b167402e2e [redo] Fix SyncBatchNorm forward pass for non-default process group (#43861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43861

This is a redo of https://github.com/pytorch/pytorch/pull/38874, and
fixing my original bug from
https://github.com/pytorch/pytorch/pull/38246.

Test Plan:
CI

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23418816

fbshipit-source-id: 2a3a3d67fc2d03bb0bf30a87cce4e805ac8839fb
2020-09-02 10:44:46 -07:00
Vasiliy Kuznetsov
f64d24c941 speed up SyncBatchNorm by batching distributed communication (#38246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38246

Speeds up SyncBatchNorm by batching the distributed communication.
Initial benchmarks show a ~15+% speed improvement on MobileNetV2 and
EfficientNetB3 on a single machine with 8 gpus. Improvement
vs baseline increases as # of gpus increases.
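A hedged sketch of the batching idea (needs an initialized process group; names are illustrative rather than the exact ones in `_functions.py`):

```python
import torch
import torch.distributed as dist


def gather_batched(mean, invstd, count, process_group):
    world_size = dist.get_world_size(process_group)
    # Pack the per-rank statistics into one tensor so a single all_gather
    # replaces several separate collectives.
    combined = torch.cat([mean, invstd, count.view(1)])
    gathered = [torch.empty_like(combined) for _ in range(world_size)]
    dist.all_gather(gathered, combined, group=process_group)
    # Unpack back into per-rank mean / invstd / count tensors.
    mean_all, invstd_all, count_all = torch.stack(gathered).split(
        [mean.numel(), invstd.numel(), 1], dim=1)
    return mean_all, invstd_all, count_all
```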

Test Plan:
verified that before+after intermediate values in fwd/bwd pass are equivalent (with `torch.allclose`)

benchmark runner:
https://gist.github.com/vkuzo/7b1ce1b1b051ee6d46877d0f18ab9b1f

results (1 forward pass + 1 backward pass, 1 machine, 8x Tesla-P100, batch_size=20 per node):
```
model           gpus  before_ms after_ms  speedup
efficientnet-b3 2     660       654       0.00909
efficientnet-b3 4     777       710       0.08623
efficientnet-b3 8     988       838       0.15182
mobilenet-v2    2     267       266       0.00375
mobilenet-v2    4     328       289       0.1189
mobilenet-v2    8     453       373       0.1766
```

Imported from OSS

Differential Revision: D21505905

fbshipit-source-id: 3e796343fce8329a2e17671d60ae66c0387924e7
2020-05-13 11:21:42 -07:00
elmirador
ae755a73d3 SyncBatchNorm size check update (#37133)
Summary:
Update the requirements on input dimensions for torch.nn.SyncBatchNorm:
1. Checks the aggregated batch size `count_all` instead of the batch size in each DDP process (https://github.com/pytorch/pytorch/issues/36865); see the sketch after this list
2. Added test function for SyncBatchNorm where every process only has 1 input
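A hedged sketch of check (1) (illustrative tensor name):

```python
import torch


def check_aggregated_size(count_all: torch.Tensor) -> None:
    # Validate the batch size summed across all DDP processes, not the batch
    # size seen by each individual process.
    size = int(count_all.view(-1).sum())
    if size <= 1:
        raise ValueError(
            f"Expected more than 1 value per channel when training, got input size {size}")
```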
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37133

Differential Revision: D21331120

Pulled By: zhaojuanmao

fbshipit-source-id: ef3d1937990006609cfe4a68a64d90276c5085f2
2020-05-01 18:01:30 -07:00
gzygzy9211
ab2a9ab925 Non-blocking SyncBatchNorm update (#36659)
Summary:
As shown in https://github.com/pytorch/pytorch/issues/36452, SyncBatchNorm can block the host thread due to the ``MemcpyDtoH`` and ``MemcpyHtoD`` copies involved in handling the argument ``counts`` for the native function ``batch_norm_gather_stats_with_counts``.

- This fix changes the signature of ``batch_norm_gather_stats_with_counts`` to
```c++
std::tuple<Tensor, Tensor> batch_norm_gather_stats_with_counts_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean, const Tensor& running_var, double momentum, double epsilon, const Tensor& counts)
```
so it can directly receive ``counts`` as a CUDA tensor rather than as an ``IntArrayRef`` whose data is in host memory.

- This fix also improves the implementation of the ``SyncBatchNorm`` function so that constructing the ``counts`` tensor does not cause an additional ``MemcpyHtoD``, which would also block the host thread.
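A hedged sketch of the Python-side idea in the second point (illustrative names): build ``counts`` directly on the GPU so no blocking host-to-device copy is needed.

```python
import torch


def make_counts(input: torch.Tensor, mean: torch.Tensor) -> torch.Tensor:
    # Construct the count tensor on the input's device in one shot, instead of
    # building it on the CPU and copying it over (MemcpyHtoD).
    return torch.full((1,), input.numel() // input.size(1),
                      dtype=mean.dtype, device=input.device)
```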
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36659

Differential Revision: D21196991

Pulled By: ngimel

fbshipit-source-id: 84a529e6cf22e03618fecbb8f070ec452f81229e
2020-04-23 10:22:19 -07:00
Jie
289d52c120 Fixing SyncBN dgrad (#36382)
Summary:
Previous PR https://github.com/pytorch/pytorch/issues/22248, which provides support for varying batch sizes across processes, doesn't account for mean_dy/mean_dy_xmu on the backward path, which produces a wrong dgrad.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36382

Differential Revision: D20984446

Pulled By: ngimel

fbshipit-source-id: 80066eee83760b275d61e2cdd4e86facca5577fd
2020-04-13 21:08:31 -07:00
Xiao Wang
c1dd70688a Fix deprecated python "add" calls (#33428)
Summary:
This PR fixes the Python `add` calls that use the deprecated signature `add(Scalar, Tensor)`; the alternative signature `add(Tensor, alpha=Scalar)` is used instead.
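A hedged illustration of the rewrite (values are illustrative):

```python
import torch

x = torch.randn(3)
y = torch.randn(3)

# Deprecated: scalar first, i.e. x.add(2.0, y), meaning x + 2.0 * y.
# Supported signature: pass the tensor and give the scalar as `alpha`.
out = x.add(y, alpha=2.0)
```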

cc csarofeen zasdfgbnm ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428

Differential Revision: D20002534

Pulled By: vincentqb

fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130
2020-02-26 09:02:31 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
jiej
9c7e604c60 SyncBatchNorm Update on input dimension checks (#29626)
Summary:
update the requirements on input dimensions for `torch.nn.SyncBatchNorm`:
1. 2D inputs are now permitted, https://github.com/pytorch/pytorch/issues/20204 ;
2. at least two elements along the normalization plane are required (BatchNorm behavior);
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626

Differential Revision: D18492531

Pulled By: albanD

fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e
2019-11-18 16:09:51 -08:00
Gregory Chanan
23fde77d3d Remove Module._backend as it's not used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25342

Test Plan: Imported from OSS

Differential Revision: D17101571

Pulled By: gchanan

fbshipit-source-id: 2cda46fe197e26a1cacb5e912f535809973d306e
2019-08-29 15:43:49 -07:00
root
8640aef505 Add support for non-affine batch norm with float stats and half inputs (#22750)
Summary:
This PR creates support for non-affine batch norm with float running estimates and half inputs.
Changes were made similar to https://github.com/pytorch/pytorch/issues/16735.

I couldn't find a specific test for `SyncBatchNorm`, so I used [this code](https://gist.github.com/ptrblck/ab45bfcde6df55ac28a7be18531f4718) to test it.

cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22750

Differential Revision: D17119965

Pulled By: ezyang

fbshipit-source-id: 2e8c5d63fc3c636b8a1338c43c9c101a0f5e9b22
2019-08-29 14:04:37 -07:00
Gregory Chanan
a8ae33ce27 Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (#25339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25339

This is to get rid of backend-specific dispatch in modules; this autograd function is no longer backend-specific, so
it doesn't need to live in a backend-specific location.

Test Plan: Imported from OSS

Differential Revision: D17101576

Pulled By: gchanan

fbshipit-source-id: f4f0bd3ecc2d4dbd8cdfedbaabcadb8c603d2507
2019-08-29 09:55:11 -07:00
Shuaipeng Li
29ec4769bb Fix SyncBatchNorm running var update issue (#22248)
Summary:
## Fix https://github.com/pytorch/pytorch/issues/22192

+ change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)`
+ change cuda & cuda head
```cuda
// before
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(
    const Tensor& self, const Tensor& mean, const Tensor& invstd,
    const Tensor& running_mean, const Tensor& running_var,
    double momentum, double epsilon, int64_t count);

// after
std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(
    const Tensor& self, const Tensor& mean, const Tensor& invstd,
    const Tensor& running_mean, const Tensor& running_var,
    double momentum, double epsilon, const Tensor& counts);
```
+ change python interface
```python
class SyncBatchNorm(Function):
    def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248

Differential Revision: D16002146

Pulled By: mrshenli

fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956
2019-07-03 17:17:59 -07:00
jiej
39669316a6 (#14267)
Summary:
- Summary:

Added synchronized batch normalization, which allows synchronization of stats across mini-batches between processes within a process group.
Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API)

- User-facing api:

1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)

2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)

- supported use case:
DistributedDataParallel with ***single-gpu multi-process***

a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:

  1. use layers directly:

     torch.nn.SyncBatchNorm(...)

     similar API as with torch.nn.BatchNormXd(...)
     with added argument `process_group` which is used to limit the scope of
     synchronization within each process group. Default value is None, which
     implies synchronization across all GPUs

  2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)

     recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`
     preserving values of parameters/buffers.
     the utility function also allows user to specify process_group value to all
     converted layers.

b. user wraps their model with
   `torch.nn.parallel.DistributedDataParallel`; from this point, the user
   should follow the general guidelines in the DDP usage guide

- Error checking

For use cases not supported, we error out:

1. Application launched without ddp:
   > import torch
   > sbn = torch.nn.SyncBatchNorm(10).cuda()
   > inp = torch.randn(5, 10, 3, 3).cuda()
   > sbn(inp) --> Error!
   > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

2. Application launched using DDP with multi-GPU per-process:
   > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
   > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
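A hedged usage sketch of the conversion utility described above (shown via the `SyncBatchNorm.convert_sync_batchnorm` classmethod available in current PyTorch; an initialized process group and DDP wrapping are still required for a real run):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
# Recursively replace every BatchNormXd with SyncBatchNorm, preserving
# parameters/buffers; a process_group can be passed to limit the sync scope.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(sync_model)
```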
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267

Differential Revision: D14270035

Pulled By: ezyang

fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d
2019-03-06 13:39:11 -08:00