pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	f4f1a5b5b3	Revert "Move functional collectives to the right namespace (#97793)" This reverts commit `184bfbc3d7`. Reverted https://github.com/pytorch/pytorch/pull/97793 on behalf of https://github.com/atalman due to breaks internal builds	2023-03-31 16:02:07 +00:00
Rodrigo Kumpera	184bfbc3d7	Move functional collectives to the right namespace (#97793) This moves them from `torch._C._nn` to `torch._C._dist`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97793 Approved by: https://github.com/albanD	2023-03-30 22:18:13 +00:00
Wanchao Liang	848bf8103b	fix functional collective to not generate getattr node (#97924) Use mesh.get_dim_groups directly instead of doing mesh tensor operations. This helps us get rid of the getattr ops during tracing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97924 Approved by: https://github.com/kumpera	2023-03-30 20:14:50 +00:00
Kazuaki Ishizaki	35fd5c548e	Fix typos under torch/distributed directory (#95638) This PR fixes typos in comments and messages of `.py` files under the torch/distributed directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95638 Approved by: https://github.com/usamah1, https://github.com/H-Huang, https://github.com/kit1980	2023-03-27 21:13:44 +00:00
Rodrigo Kumpera	c7bd9b9490	Switch AsyncCollectiveTensor to be a wrapper subclass. (#96105) Our usage is of a wrapper, so it makes sense that we use one. This makes it possible for FakeTensorMode to work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96105 Approved by: https://github.com/wanchaol, https://github.com/wconstab	2023-03-10 15:13:32 +00:00
Rodrigo Kumpera	5b2ab0dd4f	Multiple fixes for functional collectives. (#95897) _functional_collectives.py: Ensure we always wait on all collectives. derivatives.yaml: Mark all_reduce as non-differentiable. gen_variable_type.py: Add all_reduce to DONT_ENFORCE_TENSOR_IMPL_USE_COUNT. common_dtensor.py: Replace dist.barrier with all_reduce. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95897 Approved by: https://github.com/wconstab, https://github.com/fegin	2023-03-06 15:35:07 +00:00
Will Constable	92a2107375	Support Inductor collectives with wait or collective outside graph (#95893) Inductor implementations of collectives/wait must match the eager implementations in _functional_collectives in how they interact with the _register_tensor_work API. If they do, then splitting a collective-wait pair so that one half is in a compiled graph should work fine. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95893 Approved by: https://github.com/kumpera	2023-03-03 09:00:48 +00:00
Wanchao Liang	f397d1700f	Inductor reduce_scatter_tensor (#95764) This adds reduce_scatter to the functional collectives and adds the inductor lowering support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95764 Approved by: https://github.com/kumpera	2023-03-02 22:05:30 +00:00
Rodrigo Kumpera	3e8eedd78e	Round of fixes for functional collectives (#95714) Move collective registration to torch.__init__ to handle multipy warmup. Fix all_reduce with non-contiguous tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95714 Approved by: https://github.com/wconstab	2023-03-01 17:52:14 +00:00
Will Constable	cc6da7b901	Inductor allgather_into_tensor (#95530) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95530 Approved by: https://github.com/kumpera	2023-02-27 21:38:36 +00:00
PyTorch MergeBot	d950f45577	Revert "[Functional Collectives] Migrate DeviceMesh::all_reduce to use functional all_reduce. (#95009)" This reverts commit `0765dbc25e`. Reverted https://github.com/pytorch/pytorch/pull/95009 on behalf of https://github.com/jeanschmidt due to this PR is causing internal breakages. Check https://fburl.com/diff/me41urq8	2023-02-27 19:21:58 +00:00
Rodrigo Kumpera	0765dbc25e	[Functional Collectives] Migrate DeviceMesh::all_reduce to use functional all_reduce. (#95009) BC: This changes the signature and semantics of DeviceMesh::all_reduce. DeviceMesh::all_reduce now uses a functional collective under the hood, which makes it more easily traceable. You no longer need to use CommTensor to get a trace. all_reduce is now async-only and uses AsyncCollectiveTensor to ensure proper stream synchronization. Signature changed: removed the `async_op` param and changed the return type from `Optional[Work]` to `torch.Tensor`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95009 Approved by: https://github.com/wanchaol	2023-02-24 02:10:55 +00:00
Rodrigo Kumpera	e22d791287	[PTD] Introduce tracing friendly collectives. (#93990) This change adds torch.distributed.traceable_collectives. This experimental API enables collectives to be fully traced by dynamo and FX. See #93173 for the RFC. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93990 Approved by: https://github.com/wconstab, https://github.com/wanchaol, https://github.com/H-Huang	2023-02-16 15:35:01 +00:00

13 Commits