Commit Graph

9 Commits

Author SHA1 Message Date
Sergii Dymchenko
d083b44818 Remove unused rank from _AllGatherBase backward (#81515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81515
Approved by: https://github.com/mrshenli
2022-07-15 15:30:07 +00:00
pritam
500fb24715 Ensure tensors are contiguous in functional all_gather.
We called `tensor.contiguous()` in the forward pass; however, this was
after `out_tensor_list` was built, which left `out_tensor_list`
containing non-contiguous tensors and caused errors.

Fix this by moving the `contiguous()` call above the construction of `out_tensor_list`.
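A minimal single-process sketch of the ordering fix; `functional_all_gather` and its `world_size` argument are illustrative stand-ins, not the real c10d API:

```python
import torch

def functional_all_gather(tensor, world_size):
    # Make the input contiguous BEFORE allocating out_tensor_list, so
    # every entry inherits the contiguous layout.
    tensor = tensor.contiguous()
    out_tensor_list = [torch.empty_like(tensor) for _ in range(world_size)]
    # dist.all_gather(out_tensor_list, tensor) would run here in real code.
    return out_tensor_list

t = torch.arange(6.0).reshape(2, 3).t()  # transposed view: non-contiguous
assert not t.is_contiguous()
outs = functional_all_gather(t, world_size=2)
assert all(o.is_contiguous() for o in outs)
```

Calling `contiguous()` after building the list would leave each `empty_like` output mirroring the non-contiguous layout, which is what the bug was.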

Differential Revision: [D37222870](https://our.internmc.facebook.com/intern/diff/D37222870/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79747

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-06-17 01:27:11 +00:00
pritam
b9e3d722c4 Use appropriate dtype for sharded linear implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79255

We use several collective operations in our sharded linear
implementation, and for many collectives we do not set the `dtype` of the
output tensor appropriately. As a result, using a dtype such as `torch.float16`
(rather than the default `torch.float32`) results in errors.

Fixing this across the board and adding appropriate tests.
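A hedged sketch of the fix; `alloc_gather_output` is a hypothetical helper, but it shows the idea: allocate collective output tensors from the input's dtype and device instead of the `torch.float32` default:

```python
import torch

def alloc_gather_output(inp, world_size):
    # empty_like propagates the input's dtype/device to each output,
    # so float16 inputs get float16 outputs.
    return [torch.empty_like(inp) for _ in range(world_size)]

x = torch.ones(4, 8, dtype=torch.float16)
outs = alloc_gather_output(x, world_size=2)
assert all(o.dtype == torch.float16 for o in outs)
```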

Differential Revision: [D37059752](https://our.internmc.facebook.com/intern/diff/D37059752/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-06-10 07:32:15 +00:00
pritam
44aa4ad894 Use _all_gather_base and fuse matmul for sharded linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78477

Use `_all_gather_base` instead of `all_gather` for column-wise sharding,
since `_all_gather_base` returns a single fused tensor that can be used to
perform one matmul instead of looping through the gathered shards and
performing multiple matmuls.

This improves performance for col-wise sharding.
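A single-process sketch of why the fused gather helps (shapes are illustrative, and `torch.cat` stands in for the collectives): with `all_gather` you get a list of shards and run one matmul per shard, while `_all_gather_base` hands back one flat tensor, so a single matmul covers all shards:

```python
import torch

world_size, n, in_features, out_features = 4, 2, 8, 3
shards = [torch.randn(n, in_features) for _ in range(world_size)]
weight_t = torch.randn(in_features, out_features)

# Looped path (all_gather-style): one matmul per gathered shard.
looped = torch.cat([s @ weight_t for s in shards], dim=0)

# Fused path (_all_gather_base-style): one flat tensor, one matmul.
fused_input = torch.cat(shards, dim=0)  # stand-in for the fused output
fused = fused_input @ weight_t

assert torch.allclose(looped, fused)
```

Both paths compute the same result; the fused path simply replaces a Python loop of small matmuls with one larger, better-utilized matmul.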

Differential Revision: [D36754385](https://our.internmc.facebook.com/intern/diff/D36754385/)

Approved by: https://github.com/aazzolini, https://github.com/wanchaol
2022-06-01 17:17:34 +00:00
Alban Desmaison
da3c848dfa Make distributed raise ImportError when not available
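A short sketch of the guard this enables for callers (the branch structure is illustrative): builds compiled without distributed support now surface an `ImportError` up front instead of failing later, and `torch.distributed.is_available()` lets code check first:

```python
import torch

# On builds without distributed support, importing distributed-only
# modules raises ImportError after this change; check availability first.
if torch.distributed.is_available():
    import torch.distributed as dist
else:
    dist = None  # distributed APIs are unusable on this build
```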
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75975

Approved by: https://github.com/mrshenli
2022-04-20 13:05:18 +00:00
Sherlock Huang
752ab799bf Support noncontiguous inputs for torch.distributed.nn.functional.all_gather/reducescatter/gather
Fixes #73515

The backward for AllGather is ReduceScatter. I am wondering whether there is a deeper reason why it is currently implemented as All2All with an explicit sum.

ReduceScatter also has a lower communication payload than All2All.

In addition, `dist.reduce_scatter` accepts a non-contiguous `input_tensor_list`.
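A pure-Python, single-process simulation of the two ways to compute AllGather's backward (the values and helper structure are made up for illustration):

```python
world_size = 3
# grads[r][k]: gradient on rank r w.r.t. the k-th gathered slice.
grads = [[(r + 1) * (k + 1) for k in range(world_size)]
         for r in range(world_size)]

# Old path -- all_to_all then explicit local sum: rank k receives
# grads[src][k] from every src, then sums the chunks itself.
received = [[grads[src][k] for src in range(world_size)]
            for k in range(world_size)]
a2a_result = [sum(chunks) for chunks in received]

# New path -- reduce_scatter: the sum happens inside the collective and
# only the reduced slice is delivered, so less data crosses the wire.
rs_result = [sum(grads[src][k] for src in range(world_size))
             for k in range(world_size)]

assert a2a_result == rs_result
```

Both paths yield the same per-rank gradient; ReduceScatter just moves one reduced slice per rank instead of `world_size` unreduced chunks.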

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75276
Approved by: https://github.com/H-Huang
2022-04-15 02:35:45 +00:00
Junjie Wang
7c2489bdae [PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68786

To enable autograd for the sharded linear, we found we needed to make some changes to the current NN functional API (the c10d API with autograd enabled), so we made the following changes:

1. Add a new API, `reduce_scatter`, since we need it for row-wise sharding.
2. Modify the `all_to_all` API to make sure it is consistent with the one in `distributed_c10d.py`.
3. Found that the C++ signature of `reduce_scatter` was missing an input parameter; added more unit tests to cover these cases.
4. Sync the NN tests from Gloo to NCCL.
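An illustrative single-process model of the duality the differentiable API relies on, with made-up values and helper names (this is not the c10d implementation): `reduce_scatter`'s backward is `all_gather`, and vice versa:

```python
WORLD_SIZE = 2

def reduce_scatter(inputs_per_rank):
    # inputs_per_rank[r][k] is the slice rank r contributes toward rank k;
    # rank k receives the sum of its column.
    return [sum(inputs_per_rank[r][k] for r in range(WORLD_SIZE))
            for k in range(WORLD_SIZE)]

def all_gather(per_rank_values):
    # Every rank ends up holding the full list of per-rank values.
    return [list(per_rank_values) for _ in range(WORLD_SIZE)]

out = reduce_scatter([[1, 2], [3, 4]])  # rank k receives the k-th sum
assert out == [4, 6]
assert all_gather(out) == [[4, 6], [4, 6]]
```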
ghstack-source-id: 144860208

Test Plan: CI + Unit Test

Reviewed By: pritamdamania87

Differential Revision: D32569674

fbshipit-source-id: 9bd613f91bbf7a39eede0af32a5a5db0f2ade43b
2021-12-06 13:38:58 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Emilio Castillo
233e4ebdb6 Implement autograd functions for c10d communication operations (#40762)
Summary:
Closes https://github.com/pytorch/pytorch/issues/40702, Fixes https://github.com/pytorch/pytorch/issues/40690

Currently WIP, but I would appreciate some feedback. Functions should be double-differentiable.

Unlike b35cdc5200/torch/nn/parallel/_functions.py, this PR generates a list of tensors instead of aggregating the received data into a single tensor. Is this behavior correct?

Thanks!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40762

Reviewed By: glaringlee

Differential Revision: D24758889

Pulled By: mrshenli

fbshipit-source-id: 79285fb4b791cae3d248f34e2aadb11c9ab10cce
2021-01-26 07:52:51 -08:00