We have a plethora of error types for the various errors raised from c10d, including `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError`, etc.
This results in messy error-handling code somewhat like this:
```
if "NCCL" in exception_str:
...
if "Timed out initializing process group in store based barrier on rank" in exception_str:
...
if "The client socket has timed out after" in exception_str:
...
if "Broken pipe" in exception_str:
...
if "Connection reset by peer" in exception_str:
...
```
To address this issue, in this PR I've added these error types (a brief usage sketch follows the list):
1. **DistError** - the base type of all distributed errors
2. **DistBackendError** - this already existed and referred to PG backend errors
3. **DistStoreError** - for errors originating from the store
4. **DistNetworkError** - for general network errors coming from the socket library
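With this hierarchy, the string matching above can collapse into ordinary exception handling. A minimal sketch, assuming the new classes are exported under `torch.distributed` alongside the existing `DistBackendError`:
```
import torch.distributed as dist

try:
    # Assumes the usual rendezvous environment (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) is set.
    dist.init_process_group(backend="nccl")
except dist.DistStoreError:
    ...  # errors originating from the store, e.g. store-based barrier timeouts
except dist.DistNetworkError:
    ...  # socket-level errors: broken pipe, connection reset, client socket timeout
except dist.DistBackendError:
    ...  # backend errors such as NCCL failures
except dist.DistError:
    ...  # any other distributed error
```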
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108191
Approved by: https://github.com/H-Huang
We have a plethora of error types for the various errors raised from c10d, including `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError`, etc.
This results in messy error-handling code somewhat like this:
```
if "NCCL" in exception_str:
...
if "Timed out initializing process group in store based barrier on rank" in exception_str:
...
if "The client socket has timed out after" in exception_str:
...
if "Broken pipe" in exception_str:
...
if "Connection reset by peer" in exception_str:
...
```
To address this issue, in this PR I've added these error types:
1. **DistError** - the base type of all distributed errors
2. **DistBackendError** - this already existed and referred to PG backend errors
3. **DistStoreError** - for errors originating from the store
4. **DistNetworkError** - for general network errors coming from the socket library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107651
Approved by: https://github.com/H-Huang
Fixes #101911
Currently, `DTensor` supports CUDA and CPU. This PR makes some changes for easier integration with the `ort` backend.
* The `Backend.NAME` attribute now has the value `name` instead of `NAME` for backends registered through `register_backend(name)`; this matches the pattern for built-in backends like nccl (see the sketch after this list).
* remove unused `_check_for_nccl_backend` function
* add a test case that moves parameters to the device in the `partition_fn` - a scenario that's useful for big models
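A small sketch of the attribute change from the first bullet; the `dummy` backend name and the stub constructor are made up for illustration:
```
import torch.distributed as dist

def _create_dummy_pg(store, rank, size, timeout):
    # Stub constructor; a real third-party backend would build and return its backend object here.
    raise NotImplementedError

dist.Backend.register_backend("dummy", _create_dummy_pg)

# The generated attribute now holds the lower-case name, matching built-in backends:
assert dist.Backend.DUMMY == "dummy"  # previously this was "DUMMY"
assert dist.Backend.NCCL == "nccl"
```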
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101914
Approved by: https://github.com/wanchaol
Add `is_backend_available` for c10d backends, covering both the built-in backends and third-party backends registered through `Backend.register_backend`.
There is a related discussion in https://github.com/pytorch/pytorch/pull/101775#discussion_r1199253553
> For example in python constructor for their backend they should explicitly add the is_X_available. Or if defining in C++ they should modify pybind like this https://github.com/H-Huang/torch_collective_extension/blob/main/custom_backend/include/dummy.hpp#L98-L101 to also add their own is_available property
It is a natural choice for users to add their own `is_available` when they create a backend. One possible way to let users call `is_X_available` in the same way as the native checks is to dynamically add a `torch.distributed.is_dummy_available()` function, for example. This is why we want to dynamically add `is_X_available` to `torch.distributed` in `register_backend`.
> Or we could add an Is_available(backend) function, that checks for the backend.
Providing a public function is indeed another good approach. We have implemented `is_backend_available` in https://github.com/pytorch/pytorch/pull/101945, which supports both built-in backends and third-party backends.
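A brief usage sketch, assuming the helper is exported at the `torch.distributed` level as `is_backend_available`:
```
import torch.distributed as dist

# Built-in backends are resolved through their native is_X_available() checks...
print(dist.is_backend_available("gloo"))
print(dist.is_backend_available("nccl"))

# ...while a third-party backend counts as available once it has been registered,
# e.g. via Backend.register_backend("dummy", ...):
# print(dist.is_backend_available("dummy"))
```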
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101945
Approved by: https://github.com/H-Huang
Summary: with the new c10d API, we don't need all ranks to call `new_group`. Integrate with the new API so that every rank calls `new_group` just 3 times, with a local barrier among the members of the group.
Reviewed By: xunnanxu, eeggl
Differential Revision: D45315615
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100518
Approved by: https://github.com/kumpera
Add a `use_local_synchronization` argument to `new_group`.
When this argument is True, `new_group` performs the store barrier only on the ranks that are part of the group, not the whole cluster.
This addresses both the scalability and composability problems associated with `new_group`.
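A minimal usage sketch, assuming the default process group is already initialized; the member ranks below are illustrative:
```
import torch.distributed as dist

# Only the member ranks call new_group; with use_local_synchronization=True the store
# barrier involves just those members, so the rest of the cluster is never blocked.
group_ranks = [0, 1]
if dist.get_rank() in group_ranks:
    pg = dist.new_group(ranks=group_ranks, use_local_synchronization=True)
```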
Fixes #81291.
This relands #84224.
As part of the original PR I ran a quick benchmark creating 3 PGs per rank with both settings; perf is the following:
`new_group` with `use_local_synchronization=False`:
| World Size | Time (in secs) |
| --- | ----------- |
| 4 | 0.12 |
| 8 | 0.25 |
| 16 | 0.51 |
| 32 | 0.87 |
| 64 | 1.50 |
| 128 | 2.87 |
`new_group` with `use_local_synchronization=True`:
| World Size | Time (in secs) |
| --- | ----------- |
| 4 | 0.05 |
| 8 | 0.04 |
| 16 | 0.03 |
| 32 | 0.03 |
| 64 | 0.04 |
| 128 | 0.04 |
Scaling for `use_local_synchronization=False` is sub-linear because the number of process groups created as a multiple of world_size decreases as we go up. It's 6 with world_size 4 and 192 with world_size 128.
Scaling for `use_local_synchronization=True` is constant as the number of store barriers executed per rank remains constant at 3.
Setup:
1 AWS host, backend gloo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99931
Approved by: https://github.com/xw285cornell
Summary:
Update the store-based barrier from using `add()`, which overloads rank 0 with requests, to a single request every 10 seconds to handle the last joined worker.
Added an optional `logging_interval` arg to `_store_based_barrier`.
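A conceptual sketch of the wait-based pattern described above; this is not the internal implementation, and the key names, signature, and logging are made up:
```
import time
from datetime import timedelta

def store_based_barrier_sketch(store, world_size,
                               timeout=timedelta(minutes=5),
                               logging_interval=timedelta(seconds=10)):
    # Each rank checks in exactly once instead of repeatedly hitting rank 0 with add() calls.
    arrived = store.add("barrier/arrived", 1)
    if arrived == world_size:
        # The last worker to join releases everyone with a single set().
        store.set("barrier/last_worker", "1")
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        try:
            # Block on the sentinel key, waking up once per logging_interval to log progress.
            store.wait(["barrier/last_worker"], logging_interval)
            return
        except RuntimeError:
            if time.monotonic() > deadline:
                raise
            print(f"Waiting on store-based barrier (this rank was arrival {arrived}/{world_size})")
```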
Test Plan:
```
pytest test/distributed/test_c10d_common.py -vsk test_store_based_barrier
```
Reviewed By: rohan-varma
Differential Revision: D44430531
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98000
Approved by: https://github.com/kumpera
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, more performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the `set()` call.
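An illustrative before/after of the kind of rewrite involved (made-up snippets, not taken from the diff):
```
data = [1, 2, 2, 3]

# Before: unnecessary generator expressions passed to set().
squares = set(x * x for x in data)
unique = set(x for x in data)

# After: a set comprehension, and the useless generator collapsed into a direct set() call.
squares = {x * x for x in data}
unique = set(data)
```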
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
### About this PR
* Update the `all_reduce` op to dispatch to CPU and CUDA implementations. Right now they both perform the same logic, so this is essentially a no-op.
* Update the test to validate that a separate device implementation is not supported.
### About this stack
In the future we will repurpose `ProcessGroup` to instead contain a list of backends (`ProcessGroupNCCL`/`Gloo`/`UCC`) and dispatch to them based on tensor type. The CPU and CUDA implementations will be updated so that the process group selects its CPU and CUDA backends respectively.
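A standalone sketch of the per-device dispatch idea, using a hypothetical `demo::double_it` op registered through `torch.library` (not the actual c10d op schema or registration path):
```
import torch
from torch.library import Library

lib = Library("demo", "DEF")
lib.define("double_it(Tensor t) -> Tensor")

def double_it_cpu(t):
    # CPU kernel; as in this PR, both kernels currently share the same logic.
    return t * 2

def double_it_cuda(t):
    # CUDA kernel, registered separately so the dispatcher can route by tensor device.
    return t * 2

lib.impl("double_it", double_it_cpu, "CPU")
lib.impl("double_it", double_it_cuda, "CUDA")

print(torch.ops.demo.double_it(torch.ones(3)))            # routed to the CPU kernel
# torch.ops.demo.double_it(torch.ones(3, device="cuda"))  # would route to the CUDA kernel
```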
Differential Revision: [D39506979](https://our.internmc.facebook.com/intern/diff/D39506979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83810
Approved by: https://github.com/kwen2501
### About this PR
* Update the `broadcast` op to dispatch to CPU and CUDA implementations. Right now they both perform the same logic, so this is essentially a no-op.
* Add a test to validate that a separate device implementation is not supported.
### About this stack
In the future we will repurpose `ProcessGroup` to instead contain a list of backends (`ProcessGroupNCCL`/`Gloo`/`UCC`) and dispatch to them based on tensor type. The CPU and CUDA implementations will be updated so that the process group selects its CPU and CUDA backends respectively.
Differential Revision: [D38876771](https://our.internmc.facebook.com/intern/diff/D38876771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83735
Approved by: https://github.com/kwen2501
While passing tensors with different dtypes doesn't crash, it doesn't produce sensible results.
We see data tearing instead of casting.
It's not clear we want to support transparent casting, so for now we fail when such input is presented.
Fixes #84525
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84664
Approved by: https://github.com/rohan-varma