12 Commits

Author SHA1 Message Date
Xuehai Pan
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
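
As a minimal sketch of what this rule targets (the function names are hypothetical and not taken from the PR), the two forms below behave identically at runtime, because Python instantiates an exception class that is raised without being called. Only the redundant call is flagged; a `raise` that passes arguments is left alone.

```python
def fail_with_parentheses() -> None:
    # Flagged by the RSE rule: the empty call adds nothing.
    raise NotImplementedError()


def fail_without_parentheses() -> None:
    # Preferred form: Python instantiates the class when it is raised.
    raise NotImplementedError


def fail_with_message() -> None:
    # Not affected: the exception is constructed with an argument.
    raise ValueError("unsupported backend")
```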

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
Yuanhao Ji
e3effa5855 Enable UFMT on all of test/distributed (#123539)
Partially addresses #123062

Ran lintrunner on:

- `test/distributed`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539
Approved by: https://github.com/ezyang
2024-04-17 06:46:02 +00:00
PyTorch MergeBot
52be63eb2c Revert "Enable UFMT on all of test/distributed (#123539)"
This reverts commit 89ac37fe91.

Reverted https://github.com/pytorch/pytorch/pull/123539 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123539#issuecomment-2058329471))
2024-04-16 06:33:21 +00:00
Yuanhao Ji
89ac37fe91 Enable UFMT on all of test/distributed (#123539)
Partially addresses #123062

Ran lintrunner on:

- `test/distributed`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539
Approved by: https://github.com/ezyang
2024-04-16 03:23:56 +00:00
Jane Xu
eb8b80b76f Add test owners for elastic tests (#67293)
Summary:
Follow-up to a discussion with the distributed and r2p teams: the tests under elastic in distributed should be owned by oncall: r2p and not by distributed.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67293

Reviewed By: jbschlosser

Differential Revision: D31973779

Pulled By: janeyx99

fbshipit-source-id: 05875a7600c6eb1da1310a48e1e32a1a69461c55
2021-10-28 08:32:50 -07:00
Can Balioglu
339d3bf394 [2/n] [torch/elastic] Introduce C10dRendezvousBackend. (#55636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55636

This diff introduces:

- The `C10dRendezvousBackend` type to support C10d stores as rendezvous backends.
- A fix to the `TCPStore.compare_set()` function to support non-existent keys.
- A placeholder `c10d-experimental` registry to instantiate C10d-backed rendezvous backends via `get_rendezvous_handler()`.
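
The compare-and-set behavior mentioned in the list above can be illustrated with a small in-memory stand-in. This is only a sketch under stated assumptions: `InMemoryStore` is a hypothetical class, not the `TCPStore` API, and it assumes that an empty expected value is how a caller says "the key should not exist yet". The return convention (the value held after the call, or `None`) is also chosen for illustration.

```python
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical key-value store used only to illustrate compare-and-set."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def compare_set(self, key: str, expected: bytes, desired: bytes) -> Optional[bytes]:
        """Set `key` to `desired` only if its current value matches `expected`.

        Returns the value held by the store after the call, or None if the
        key is still absent.
        """
        current = self._data.get(key)
        if current is None:
            # Assumption: an empty `expected` matches a non-existent key.
            if expected == b"":
                self._data[key] = desired
            return self._data.get(key)
        if current == expected:
            self._data[key] = desired
        return self._data[key]
```

With this sketch, a first writer can create a key by passing `b""` as the expected value, and later writers succeed only if they present the value currently stored.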
ghstack-source-id: 126312162

Test Plan: Run the existing and newly-introduced unit/integration tests.

Reviewed By: tierex

Differential Revision: D27654492

fbshipit-source-id: 09f498138b35186de4b0e174adb33fb5b5aa4b52
2021-04-12 22:20:27 -07:00
Can Balioglu
493a233c04 [torch/elastic] Revise the rendezvous handler registry logic. (#55466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55466

Improve the implementation and the unit test coverage of `RendezvousHandlerRegistry`.
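
For readers unfamiliar with the pattern, a handler registry of this kind maps a backend name to a factory function. The following is a rough sketch, not the code from this diff: `HandlerRegistry`, its method signatures, and the use of a plain `dict` for parameters are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical factory type: takes rendezvous parameters, returns a handler.
HandlerCreator = Callable[[dict], object]


class HandlerRegistry:
    """Illustrative registry mapping backend names to handler factories."""

    def __init__(self) -> None:
        self._creators: Dict[str, HandlerCreator] = {}

    def register(self, backend: str, creator: HandlerCreator) -> None:
        if not backend:
            raise ValueError("The backend name must be a non-empty string.")
        existing = self._creators.get(backend)
        # Re-registering the same creator is harmless; a different one is an error.
        if existing is not None and existing is not creator:
            raise ValueError(f"The backend '{backend}' is already registered.")
        self._creators[backend] = creator

    def create_handler(self, backend: str, params: dict) -> object:
        try:
            creator = self._creators[backend]
        except KeyError:
            raise ValueError(f"The backend '{backend}' is not registered.") from None
        return creator(params)
```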

### Note
See the original diff (D27442325 (df299dbd7d)) that had to be reverted due to an unexpected Python version incompatibility between the internal and external PyTorch CI tests.

Test Plan: Run the existing and newly-introduced unit tests.

Reviewed By: tierex

Differential Revision: D27623215

fbshipit-source-id: 51538d0f154f64e04f685a95d40d805b478c93f9
2021-04-07 20:43:20 -07:00
Brian Hirsh
bf70fe69ae Revert D27442325: [torch/elastic] Revise the rendezvous handler registry logic.
Test Plan: revert-hammer

Differential Revision: D27442325 (df299dbd7d)

Original commit changeset: 8519a2caacbe

fbshipit-source-id: f10452567f592c23ae79ca31556a2a77546726b1
2021-04-06 06:17:14 -07:00
Can Balioglu
df299dbd7d [torch/elastic] Revise the rendezvous handler registry logic.
Summary: Improve the implementation and the unit test coverage of `RendezvousHandlerRegistry`.

Test Plan: Run the existing and newly-introduced unit tests.

Reviewed By: tierex

Differential Revision: D27442325

fbshipit-source-id: 8519a2caacbe2e3ce5d9a02e87a910503dea27d7
2021-04-05 23:38:29 -07:00
Can Balioglu
359d0a0205 [torch/elastic] Improve the implementation of RendezvousParameters and add its unit tests. (#146)
Summary:
Pull Request resolved: https://github.com/pytorch/elastic/pull/146

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54807

Improve the implementation and the unit test coverage of `RendezvousParameters`.

Test Plan: Run the existing and newly-introduced unit tests.

Reviewed By: kiukchung

Differential Revision: D27342444

fbshipit-source-id: 88de356c0a799844a739eb9105185bb8c1acf11f
2021-04-05 23:38:27 -07:00
Can Balioglu
bad8d34780 [torch/elastic] Revise the rendezvous exception types. (#54803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54803

Revise the rendezvous exception types to align their naming convention more closely with the standard Python exception types.
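
As a sketch of the convention being referred to, standard Python exceptions derive from `Exception` and end in `Error` (`ValueError`, `TimeoutError`), usually under one shared base class per package. The class names below are illustrative only and may not match the types introduced by this diff.

```python
class RendezvousError(Exception):
    """Illustrative base type for rendezvous failures."""


class RendezvousTimeoutError(RendezvousError):
    """Illustrative: the rendezvous did not complete in time."""


class RendezvousClosedError(RendezvousError):
    """Illustrative: the rendezvous was closed before it completed."""


def describe(exc: Exception) -> str:
    # A shared base class lets callers handle every rendezvous failure at once.
    if isinstance(exc, RendezvousError):
        return f"rendezvous failure: {type(exc).__name__}"
    return "unrelated failure"
```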

Test Plan: Run the existing test suite.

Reviewed By: H-Huang

Differential Revision: D27327505

fbshipit-source-id: 862c59222f9ca61a0e5afde89ae8f226090b4f92
2021-04-05 23:36:50 -07:00
Kiuk Chung
ba75cedfc5 [1/n][torch/elastic][upstream] Move torchelastic/rendezvous to torch/distributed/rendezvous (#53172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53172

Pull Request resolved: https://github.com/pytorch/elastic/pull/141

Upstreams two modules to torch:

1. `torchelastic.rendezvous`
2. `torchelastic.utils`

These modules were chosen as `[1/n]` since they are the leaf modules in torchelastic.

==== NOTES: ====
1. I'm disabling the etcd_rendezvous and etcd_server tests in CircleCI for the moment, since I need to edit the test Docker images to contain the etcd server binary (there are 4-5 test images, one for each platform, so it will take some time to set up the environments and test) - T85992919.

2. I've fixed all lint errors in the Python files, but some remain in the cpp files of ZeusRendezvous. I took a look at them, and I don't want to fix the linter errors right now for 2 major reasons:
     1. Some of them are more than formatting changes (e.g. std::move vs pass by value) and I don't want to introduce bundled changes with the move
     2. The old rendezvous code (the one we forked from in caffe2/fb) has the same problems and I think it's better for us to deal with this when we deprecate caffe2/fb/rendezvous in favor of the one in torchelastic - T86012579.

Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/utils/test/...
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/utils/data/test/...
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/rendezvous/test/...
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/rendezvous/fb/...
buck test mode/dev-nosan //pytorch/elastic/torchelastic/...
```
+ Sandcastle

Reviewed By: H-Huang

Differential Revision: D26718746

fbshipit-source-id: 67cc0350c3d847221cb3c3038f98f47915362f51
2021-03-05 11:27:57 -08:00