Commit Graph

24 Commits

Author SHA1 Message Date
Justin Chu
01abbfbaae [BE] Fix all B022 useless-contextlib-suppress (#100335)
No arguments passed to contextlib.suppress. No exceptions will be suppressed and therefore this context manager is redundant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100335
Approved by: https://github.com/Skylion007
2023-04-30 18:47:40 +00:00
Sergii Dymchenko
35bf5bac26 Fix "sandcastle_skip_if decorator name is confusing" (#95649)
Fixes https://github.com/pytorch/pytorch/issues/89473
See the issue https://github.com/pytorch/pytorch/issues/89473

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95649
Approved by: https://github.com/atalman, https://github.com/malfet
2023-03-03 09:29:40 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Jeff Daily
72502b94f3 correct use of torch.backends.cudnn.flags() (#93182)
Fixes #77467.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93182
Approved by: https://github.com/ngimel
2023-01-28 06:50:06 +00:00
Howard Huang
7a0f29b776 Allow Process Group to support multiple backends (#88330) (#90997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88330

### Implementation
Move backend-specific (NCCL, Gloo, etc) collective implementations to corresponding `Backend` class. Update ProcessGroup to support multiple backends and use dispatcher to calls backends based on tensor device type.

### Changes

#### c++ changes (ProcessGroup files, `Ops.cpp`, `init.cpp`)
- Update pybind definitions for new process group base class and new backend class
- Update pybinded backend class with collective definitions to keep BC with Python PG instances (e.g. `dist.ProcessGroupGloo`, `dist.ProcessGroupNCCL`) which are used in tests
- Switch `ProcessGroupGloo`, `ProcessGroupNCCL`, `ProcessGroupMPI`, `ProcessGroupUCC` to derive from the `Backend` class.
- Update CPU/CUDA `Ops.cpp` and `OpsImpl.cpp` to perform this dispatching by querying the backend using the device type
- Update internal dispatched implementation of `barrier` to use a tensor which allows operation to be dispatched.
- Update `allgather` collective to use `TensorList`. For some reason it was using the default implementation of `allgather` rather than dispatching it correctly. I still don't understand why and had originally filed an issue in 85122.

#### python changes (`distributed_c10d.py`, test files)
- Add BackendConfig class to specify the default configurations of backends and `get_backend_config()` API
- `get_backend()` deprecation warning
- `init_process_group` how returns a generic `ProcessGroup` object, it contains a list of backends (the ones stated above) which it will dispatch operations to.
- `new_group` updated to return the same as above
- Update `test_c10d_gloo.py`, Update `DistributedDataParallelTest` to use `init_process_group`, Update `ReducerTest`, update `test_broadcast_coalesced_gloo` to move from PG instance and gloo options
- Update `test_c10d_nccl.py`, Update `DistributedDataParallelTest` to use `init_process_group`
- Specific tests updated: `test_Backend_enum_class`

### Changes missing
- lazy initialization of backends
- support parsing of BackendConfig

### open questions
- Pure Python PG extensions (https://github.com/pytorch/pytorch/pull/66338)

# Example

This is a basic script (using 2 backends within a process group)

```python
# python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 basic_scenario.py
import torch.distributed as dist
import torch
import os

if __name__ == "__main__":
    rank = os.environ.get("RANK")
    # initialize with both gloo and nccl
    dist.init_process_group()
    # with gloo
    dist.all_reduce(torch.tensor([1.0]))
    print(f"Rank {rank} finished")
    # with nccl
    dist.all_reduce(torch.tensor([1.0], device=f"cuda:{rank}"))
```

Test Plan: Imported from OSS

Differential Revision: D42069829

Pulled By: H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90997
Approved by: https://github.com/awgu, https://github.com/fduwjj
2022-12-16 23:15:00 +00:00
Philip Meier
bc73affdad prepare removal of deprecated functionality in torch.testing (#87969)
_Redo of #86586 with all BC breaking changes granularly placed into separate commits._

---

Per title. Deprecation happened on Feb 25, 2022 in c6f1bbc0ac, which made it into the 1.12 release. Since it is now 245 days later and the next release will be 1.14, the removals later in the stack comply with the [BC policy](https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#minimizing-the-disruption-of-bc-breaking-changes).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87969
Approved by: https://github.com/mruberry
2022-11-02 14:04:48 +00:00
Charlie Yan
ffae7308c9 Enable test: distributed/algorithms/quantization/test_quantization (#80097)
fixes  https://github.com/pytorch/pytorch/issues/69017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80097
Approved by: https://github.com/wanchaol
2022-07-01 01:32:33 +00:00
Wanchao Liang
cf3a5160f8 [BE] move init_multigpu_helper to common_distributed (#67050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67050

This PR moves init_multi_gpu_helper to common_distributed so that it could be shared by different distributed tests.
ghstack-source-id: 141370119

Test Plan: wait for ci.

Reviewed By: mrshenli

Differential Revision: D31842644

fbshipit-source-id: c7bad25d6cef9bdce7ad1fb6c60c1cad4b765702
2021-10-22 17:16:11 -07:00
Jane Xu
34051d74da Add test owner to distributed files starting with test_ (#66797)
Summary:
Action based on https://github.com/pytorch/pytorch/issues/66232

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66797

Reviewed By: gchanan

Differential Revision: D31761389

Pulled By: janeyx99

fbshipit-source-id: c27c9ab4acec1eb71d5edd4538cd113b770dfc6c
2021-10-19 10:55:20 -07:00
Yi Wang
bf9d66586c [DDP Comm Hook] Create a noop hook for performance debugging (#64344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64344

As title.

Additionally, avoid using numpy array in test_ddp_hooks.py.
ghstack-source-id: 137170449

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks -- test_ddp_comm_hook_noop_hook

Reviewed By: rohan-varma

Differential Revision: D30693220

fbshipit-source-id: e17f0d1c6198863cf20a53566f586a6bff602522
2021-09-01 17:36:22 -07:00
Nima Elyasi
d1f3d85fd8 fix GradBucket.is_last() logic (#63768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63768

passed number of buckets to GradBucket constructor, to check if index is equal to num_buckets - 1 in the .is_last() function.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks

test output: https://www.internalfb.com/intern/testinfra/testconsole/testrun/8162774375985873/

Reviewed By: SciPioneer, mrshenli

Differential Revision: D30455913

fbshipit-source-id: 8c67ca69cbf191d6e189e09248407eb167bb24b6
2021-09-01 09:29:13 -07:00
Marjan Fariborz
6a76ee04de Adding alltoall_single collective to collective quantization API (#63154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63154

The collective quantization API now supports alltoall, alltoall_single, and allscatter. The test is also included.
ghstack-source-id: 136856877

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/algorithms/quantization:DistQuantizationTests_nccl -- test_all_to_all_single_bfp16

Reviewed By: wanchaol

Differential Revision: D30255251

fbshipit-source-id: 856f4fa12de104689a03a0c8dc9e3ecfd41cad29
2021-08-27 12:46:31 -07:00
Marjan Fariborz
3b284ab024 Adding BFP16 quantization/dequantization support to OSS (#63059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63059

Supporting BFP16 quantization method to OSS. Currently only support CPU
ghstack-source-id: 136639528

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D30194538

fbshipit-source-id: ac248567ad8028457c2a91b77ef2ce81709fce53
2021-08-25 23:41:34 -07:00
Marjan Fariborz
c545b099aa Separating quantization test from distributed_test (#63058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63058

Dedicating separate tests for different quantization methods. Currently supporting FP16 method.
ghstack-source-id: 136499767

Test Plan: uck test mode/dev //caffe2/test/distributed/algorithms/quantization:quantization_gloo_fork -- name_of_the_test

Reviewed By: wanchaol

Differential Revision: D30142580

fbshipit-source-id: 3aacec1a231a662067d2b48c001f0c69fefcdd60
2021-08-24 01:44:55 -07:00
Pritam Damania
2d671ca41b [8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513
2021-08-20 12:23:18 -07:00
Pritam Damania
f7611b31aa [4/N] Enable opt-asan for distributed unit tests. (#62051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051

The goal here is to enable opt-asan for "spawn" based unit tests since
this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for
"spawn" unit tests as well.

This means we can completely remove fork unit tests from the code base since
the only purpose for these tests was to run ASAN.
ghstack-source-id: 135523770

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29854514

fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b
2021-08-10 22:38:31 -07:00
Andrew Gu
62a90c227f Make _Join, _Joinable, _JoinHook public (#62605)
Summary:
**Overview:**
This removes the preceding `_` from `_Join`, `_Joinable`, and `_JoinHook` in preparation for adding the generic join context manager tutorial (see [here](https://github.com/pytorch/tutorials/pull/1610)). This also adds a docs page, which can be linked from the tutorial. [Here](https://github.com/pytorch/pytorch/files/6919475/render.pdf) is a render of the docs page.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62605

Test Plan:
`DistributedDataParallel.join()`:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```

`ZeroRedundancyOptimizer`:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```
NOTE: DDP overlap tests are failing due to a landing race. See https://github.com/pytorch/pytorch/pull/62592. Once the fix is landed, I will rebase, and tests should be passing.

`Join`:
```
gpurun4 python test/distributed/algorithms/test_join.py
```

Reviewed By: mrshenli

Differential Revision: D30055544

Pulled By: andwgu

fbshipit-source-id: a5ce1f1d9f1904de3bdd4edd0b31b0a612d87026
2021-08-03 12:20:11 -07:00
Andrew Gu
5186fa2831 Fix c10d -> dist in test_ddp_hooks.py (#61864)
Summary:
**Overview:**
The existing `test_ddp_hooks.py` test file uses a prefix `c10d`, which is not defined in the file, meaning the test errors if left as is. This renames each `c10d` prefix to `dist`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61864

Test Plan:
All four tests pass when run:
```
gpurun python test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
```

Reviewed By: ejguan

Differential Revision: D29783860

Pulled By: andwgu

fbshipit-source-id: 16bdd2dfcb76192964246148f14851a74f8907c8
2021-07-22 07:20:41 -07:00
Andrew Gu
c2cc6a9396 Add generic join unit tests (#61786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61786

This adds unit tests for the generic join context manager.

```
gpurun python test/distributed/algorithms/test_join.py
```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746646

Pulled By: andwgu

fbshipit-source-id: 2933d85783c2225574c4b77bfb90064690c6e668
2021-07-20 12:13:05 -07:00
Hong Xu
1b35b1a0c4 Properly skip distributed tests when distributed module is not built (#52945)
Summary:
Currently there is some code that intends to skip distributed tests if
the distributed module is not built. However, they are missing in some
test files; and in some other test files they are checked after
distributed module is imported, which leads to failure.  This is
generating a lot of headaches when testing minimal builds locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52945

Reviewed By: anjali411

Differential Revision: D26848241

Pulled By: ezyang

fbshipit-source-id: 983a848844add40869a86f3c9413503a3659b115
2021-03-05 10:28:47 -08:00
Kyle Chen
e0b6252de0 [ROCm] Enable test_ddp_hooks.py test cases (#52403)
Summary:
Re-enabling these test cases for ROCm because they are passing.

jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52403

Reviewed By: jbschlosser, SciPioneer

Differential Revision: D26516727

Pulled By: malfet

fbshipit-source-id: 6c70805eda39b0aadfbeb30a527af3906d2da867
2021-02-18 15:51:18 -08:00
Rong Rong
bc591d76a1 add skip_if_rocm to all requires_nccl tests (#45158)
Summary:
requires_nccl annotation should skip_if_rocm as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45158

Reviewed By: seemethere

Differential Revision: D23879952

Pulled By: walterddr

fbshipit-source-id: 818fb31ab75d5f02e77fe3f1367faf748855bee7
2020-09-24 08:37:49 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Sinan Nasir
1a79d7bb28 DDP communication hook examples (#43310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43310

In this diff, we prepared some example DDP communication hooks [#40848](https://github.com/pytorch/pytorch/pull/40848):

1\. `allreduce_hook`: This DDP communication hook just calls ``allreduce`` using ``GradBucket`` tensors. Once gradient tensors are aggregated across all workers, its ``then`` callback takes the mean and returns the result. If user registers this hook DDP results is expected to be same as the case where no hook was registered. Hence, this won't change behavior of DDP and user can use this as a reference or modify this hook to log useful information or any other purposes while unaffecting DDP behavior.

2\. `allgather_then_aggregate_hook` Similar to ``allreduce_hook``, this hook first gathers ``GradBucket`` tensors and its ``then`` callback aggregates the gathered gradient tensors and takes mean. Instead of ``allreduce`` this hook uses ``allgather``. Note that with W workers, both the computation and communication time scale as O(W) for allgather compared to O(logW) for allreduce. Therefore, this hook is expected to be much slower than ``allreduce_hook`` although both essentially do the same thing with the gradients.

3\. `fp16_compress_hook` This DDP communication hook implements a simple gradient compression approach that converts ``GradBucket`` tensors whose type is assumed to be ``torch.float32`` to half-precision floating point format (``torch.float16``). It allreduces those ``float16`` gradient tensors. Once compressed gradient tensors are allreduced, its then callback called ``decompress`` converts the aggregated result back to ``float32`` and takes the mean.

4\. `quantization_pertensor_hook` does quantization per tensor and uses the idea in https://pytorch.org/docs/master/generated/torch.quantize_per_tensor.html.  Note that we separately send scale and zero_point (two floats per rank) before quantized tensors.

5\. `quantization_perchannel_hook` does quantization per channel similar to https://pytorch.org/docs/master/generated/torch.quantize_per_channel.html. The main motivation is that after the initial QSGD study diff, we realized that for considerably large gradient tensors such as a tensor that contains 6 million floats quantizing dividing it into smaller channels (512 float chunks) and quantizing independently may significantly increase the resolution and result with lower error.
ghstack-source-id: 110923269

Test Plan:
python torch/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
Couldn't download test skip set, leaving all tests enabled...
.....
----------------------------------------------------------------------
Ran 4 tests in 26.724s

OK

Internal testing:
```
buck run mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks
```

Reviewed By: malfet

Differential Revision: D22937999

fbshipit-source-id: 274452e7932414570999cb978ae77a97eb3fb0ec
2020-08-28 18:59:14 -07:00