pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Tianyu Liu	d2ad9aa2f2	[dtensor][tp] add a ParallelStyle PrepareModuleInputOutput (#150372 ) Needed this class because `parallelize_module` takes a dict, which doesn't allow `PrepareModuleInput` and `PrepareModuleOutput` to be applied at the same time. The `PrepareModuleInputOutput` in this PR initializes two variables `prepare_module_input` and `prepare_module_output` and uses them to process module / inputs / outputs. I had another implementation which put all code in `PrepareModuleInputOutput` and let `PrepareModuleInput` and `PrepareModuleOutput` inherit the monolithic `PrepareModuleInputOutput`. But it is 1. less clean 2. conceptually abusing inheritance because `PrepareModuleInput` shouldn't be able to access class methods of `PrepareModuleOutput` and vice versa Pull Request resolved: https://github.com/pytorch/pytorch/pull/150372 Approved by: https://github.com/wanchaol	2025-04-01 19:15:43 +00:00
Tianyu Liu	5d6ac2dced	[dtensor] add op support for select_backward and slice_backward (#150357 ) Inheriting and rebasing @awgu 's PR https://github.com/pytorch/pytorch/pull/149071 - fixed an issue for `select_backward` and an issue for `slice_backward` - removed `_experimental_ops.py` as it becomes empty Pull Request resolved: https://github.com/pytorch/pytorch/pull/150357 Approved by: https://github.com/awgu, https://github.com/XilunWu	2025-04-01 19:15:25 +00:00
Keke Zhai	68414512e6	Implement aten.select.int sharding strategy (#149842 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149842 Approved by: https://github.com/XilunWu	2025-03-27 20:49:00 +00:00
_githubsgi	f0e1a0838c	Enabling xpu in OffsetBasedRNGTracker . (#148360 ) Else torch.distributed breaks on xpu devices. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148360 Approved by: https://github.com/zhangxiaoli73, https://github.com/guangyey, https://github.com/gujinghui, https://github.com/XilunWu, https://github.com/kwen2501 Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-03-27 10:55:05 +00:00
Francisco Massa	0a60a0cad4	Let pointwise sharding take arg with largest number of dims in case of ties (#149721 ) Before, we would take the first argument with the largest number of shards, regardless of whether it had fewer dims than another arg with the same number of shards but more dimensions. This would lead to potentially fewer sharding options. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149721 Approved by: https://github.com/tianyu-l	2025-03-24 15:39:39 +00:00
Brian Hirsh	1c6b517e19	DTensor: more generically support CompositeImplicitAutograd ops under inference mode (#149514 ) Today, if you run DTensor (or any tensor subclass) under __torch_dispatch__, you will start seeing `CompositeImplicitAutograd` ops show up in the torch_dispatch. "handling" these ops is trivial: you can just tell them to decompose into their constituent ops. Normally this decomposing happens in autograd, above DTensor, but inference_mode turns autograd off, forcing the subclass to handle the op directly. It looks like previously we manually added a few CompositeImplicitAutograd entries to DTensor (e.g. linear), but this PR tries to support these ops a bit more generically. The main difference is that DTensor now needs to check if a given op is `CompositeImplicitAutograd` before attempting to run sharding prop. I ran a quick microbenchmark for the below code with `timeit`, which gave me overhead on the order of ~1us, which is hopefully not too bad for eager mode: ``` def fast_function(): return torch._C._dispatch_has_kernel_for_dispatch_key(op_call.name(), torch._C.DispatchKey.CompositeImplicitAutograd) import timeit time_taken = timeit.timeit(fast_function, number=1000) # printed 0.12..., aka 1.2us print(f'func={str(op_call)}, time={str(time_taken)}') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149514 Approved by: https://github.com/kwen2501, https://github.com/albanD, https://github.com/wanchaol	2025-03-21 22:09:19 +00:00
Yuanhao Ji	bf6621d08f	[Distributed] Add `repr` methods for `ParallelStyle`s (#149478 ) Fixes #149470 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149478 Approved by: https://github.com/wanchaol	2025-03-21 03:59:25 +00:00
Francisco Massa	9b92828d4b	Add batch dim sharding rule to sdpa (#149253 ) This is a trivial rule that for most cases isn't needed, but if we want to consider that the input data is actually `Shard(0)` (instead of `Replicated()` as it is currently assumed), then we need this rule. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149253 Approved by: https://github.com/XilunWu	2025-03-18 07:54:02 +00:00
Aaron Gokaslan	a0ac63cbd9	[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/149257 Approved by: https://github.com/jansel	2025-03-18 00:46:07 +00:00
PyTorch MergeBot	24cfeec2c7	Revert "[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257 )" This reverts commit `bfee141666`. Reverted https://github.com/pytorch/pytorch/pull/149257 on behalf of https://github.com/malfet due to Let's see if it helps restore compiler benchmark sanity, see `8bc7bd94a5/1` ([comment](https://github.com/pytorch/pytorch/pull/149257#issuecomment-2731133812))	2025-03-17 22:57:00 +00:00
Aaron Gokaslan	bfee141666	[BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/149257 Approved by: https://github.com/jansel	2025-03-16 23:52:58 +00:00
Tugsbayasgalan Manlaibaatar	6b1b95ad2a	Support subclass constructor capturing in export (#147014 ) Notable TODOs: 1. Need to implement AutogradHOP to get rid of subclasses before serializing 2. Need to implement mechanism to figure out what subclasses will be used in export when they are not expressed in the inputs Differential Revision: [D69640673](https://our.internmc.facebook.com/intern/diff/D69640673) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147014 Approved by: https://github.com/bdhirsh	2025-03-16 18:19:19 +00:00
Wenjie Yang	115fc98cc0	Migrate aten.split.Tensor from using Sharding Rule to Sharding Strategy (#149106 ) Summary: Use Sharding Strategy for aten.split.Tensor instead of sharding rule Test Plan: pytest test/distributed/tensor/test_dtensor_ops.py -s -k split Reviewers: xilunwu Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/149106 Approved by: https://github.com/XilunWu, https://github.com/tianyu-l	2025-03-15 04:03:40 +00:00
Andrew Gu	a8b1767ae5	[DTensor] Fix `local_map` with multi-threading (#149070 ) Using `nonlocal device_mesh` is not safe with multi-threading Pull Request resolved: https://github.com/pytorch/pytorch/pull/149070 Approved by: https://github.com/wanchaol	2025-03-13 10:58:59 +00:00
Francisco Massa	ea86b8d315	Fix redistribution cost for all-reduce (#148761 ) This issue seems to have been introduced in https://github.com/pytorch/pytorch/pull/119897. With the current implementation, it might be more favorable to perform a reduce_scatter followed by an all-gather than simply an all-reduce. Thanks @lw for the helpful discussions on getting this PR out! Pull Request resolved: https://github.com/pytorch/pytorch/pull/148761 Approved by: https://github.com/Skylion007, https://github.com/lw, https://github.com/tianyu-l, https://github.com/fegin	2025-03-10 12:13:11 +00:00
Xilun Wu	e2a0296e80	[dtensor] add CuDNN SDPA op support to DTensor (#148537 ) ### Summary This PR adds `_scaled_dot_product_cudnn_attention` and `_scaled_dot_product_cudnn_attention_backward` to DTensor ops ### Test `pytest test/distributed/tensor/test_attention.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148537 Approved by: https://github.com/drisspg, https://github.com/fegin	2025-03-06 23:44:40 +00:00
PyTorch MergeBot	c9edd37ffb	Revert "[dtensor] add aten._scaled_dot_product_cudnn_attention.default op support (#148377 )" This reverts commit `9eef457c02`. Reverted https://github.com/pytorch/pytorch/pull/148377 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/13683650448/job/38261818684) [HUD commit link](`9eef457c02`) probably landrace ([comment](https://github.com/pytorch/pytorch/pull/148377#issuecomment-2701903810))	2025-03-05 19:45:16 +00:00
Xilun Wu	9eef457c02	[dtensor] add aten._scaled_dot_product_cudnn_attention.default op support (#148377 ) ### Summary This PR adds `_scaled_dot_product_cudnn_attention` to DTensor ops and tests it with unit test. This should allow Context Parallel and Tensor Parallel to use cudnn SDPA. ### Test `pytest test/distributed/tensor/test_attention.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148377 Approved by: https://github.com/drisspg	2025-03-05 19:09:52 +00:00
Wanchao Liang	f859722f70	[dtensor] refactor sharding prop to handle cross mesh computation (#147869 ) as titled, this PR moves the same mesh check from the sharding propagation level to each individual operator level. This is to allow more flexibility for each individual operator to check whether the operator can be run on the same mesh or not. For example, before this PR, if a user has two DTensor params that live on different DeviceMeshes and wants to run a `for_each` operator on them individually, it would error out with a cross mesh error. But for foreach computation there could be DTensors that live on different meshes, as long as the meshes are the same in a "zipped way". This should also fix https://github.com/pytorch/pytorch/issues/134212 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147869 Approved by: https://github.com/tianyu-l	2025-03-04 18:30:44 +00:00
Xilun Wu	4106aa33eb	[dtensor][fix] fix _scaled_dot_product_flash_attention sharding (#148125 ) ### Summary https://github.com/pytorch/pytorch/pull/146372/ changed the op signature of `_scaled_dot_product_flash_attention` and as a consequence DTensor needs to change its sharding defined at `40ad5e01df/torch/distributed/tensor/_ops/_matrix_ops.py (L232)` ### Test `pytest test/distributed/tensor/test_attention.py` ### Follow-up It's still unclear why the CP unit tests were not run over the original PR which is BC-breaking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148125 Approved by: https://github.com/tianyu-l, https://github.com/fegin	2025-02-28 09:26:43 +00:00
Xuehai Pan	995df34b19	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144547 Approved by: https://github.com/kwen2501	2025-02-28 07:35:56 +00:00
Aaron Gokaslan	3b4b23ab0b	[BE][Ez]: Remove extra copy in dtensor parallel loss (#148096 ) Remove an extra copy of the input to `_log_softmax` when there is a dtype and memory format change. Fuse the copies instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148096 Approved by: https://github.com/jansel, https://github.com/wconstab	2025-02-28 05:42:32 +00:00
Xilun Wu	ef61c290e1	[DTensor][random] defer DTensor RNG state sync until first random op call or manual_seed call; support more flexible OffsetBasedRNGTracker init (#147025 ) Resolves https://github.com/pytorch/pytorch/issues/146767. May also resolve https://github.com/pytorch/pytorch/issues/147584. ### Summary This PR removes the RNG tracker init from the `distribute_tensor` call for the following reasons: 1. if the user does not use random ops on DTensor, there's no need to init DTensor RNG which currently requires CUDA device to be present. 2. this complies with the 0-communication semantic of `src_data_rank=None` shard distribution. Besides, `OffsetBasedRNGTracker` only accepts `DeviceMesh` argument to its constructor method. ### Consequence DTensor RNG initialization is delayed till the first DTensor random ops call or `torch.distributed.tensor.random.manual_seed`. ### Test `pytest test/distributed/tensor/test_random_ops.py` `pytest test/distributed/tensor/parallel/test_tp_random_state.py` `pytest test/distributed/tensor/parallel/test_tp_style.py` Differential Revision: [D70201856](https://our.internmc.facebook.com/intern/diff/D70201856) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147025 Approved by: https://github.com/kwen2501	2025-02-26 17:33:22 +00:00
Ke Wen	4879f8f919	[TP] Add warning when module is distributed twice (#147006 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147006 Approved by: https://github.com/XilunWu	2025-02-13 06:49:17 +00:00
Tianyu Liu	ac0f206f3c	[dtensor] fix side-effect on dtype for _like ops (#146869 ) fixes https://github.com/pytorch/pytorch/issues/146749 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146869 Approved by: https://github.com/yifuwang, https://github.com/janeyx99, https://github.com/ngimel	2025-02-12 08:42:14 +00:00
Xilun Wu	c4d835fbab	[DTensor][conv] add DTensor convolution_backward op support for case where the input Tensor has requires_grad=False (#142278 ) Fixes #142058 ## Summary The DTensor `convolution_backward` op throws an exception when the input Tensor has `requires_grad=False`, which happens if the conv layer is the first layer in the model. The ATen `convolution_backward` op usually returns 3 Tensors (grad_input, grad_weight, grad_bias), and `grad_input` is actually an Optional[Tensor] which can be `None` in the case mentioned above. However, the DTensor sharding propagation rule and the corresponding TP conv backward implementation both assume that `grad_input` exists. ## Fix Allow `grad_input` to be `None` for the `convolution_backward` op. ## Test `pytest test/distributed/tensor/test_convolution_ops.py` ## Follow-up The current implementation of DTensor conv op also ignores `output_mask` and this may need further care. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142278 Approved by: https://github.com/bdhirsh	2025-02-10 07:06:40 +00:00
Xilun Wu	5cc1b54a91	[2/N][cp][example] flex attention in context parallel (backward pass) (#146397 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146397 Approved by: https://github.com/fegin ghstack dependencies: #145896	2025-02-06 19:50:02 +00:00
Xilun Wu	6220c64aea	[1/N][cp][example] flex attention in context parallel (forward pass) (#145896 ) Description This is an example of how FlexAttention can be used in a context parallel fashion. Right now it's only a flex_attention call with collectives added and has no load balancer, but we're about to add the missing parts step by step: 1. backward pass 2. static load balancing for causal masking 3. dynamic load balancing for other general maskings 4. automatic collective insertion solution 5. non-intrusive context parallel APIs Test `torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/tensor/examples/flex_attention_cp.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145896 Approved by: https://github.com/fegin, https://github.com/Skylion007	2025-02-06 19:50:02 +00:00
Aaron Gokaslan	292af3cc89	[BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408 ) Apply ruff rule about implicit string concatenation, this autofixes strings that are all the same type and on the same line. These lines are broken up likely as the result of autoformatters in the past. All fixes are automated using the autofixes in ISC001. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146408 Approved by: https://github.com/justinchuby, https://github.com/janeyx99	2025-02-04 19:07:04 +00:00
Stas Bekman	3aeccf2a28	DeepSpeed github repo move sync (#146320 ) DeepSpeed has moved to a new repo on github https://github.com/deepspeedai/DeepSpeed This PR updates this repo to use the new URL. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146320 Approved by: https://github.com/awgu	2025-02-03 23:20:49 +00:00
wz337	6f5c8fb128	[DTensor] Add pointwise ops strategy for `aten.minimum` (#145816 ) Need it for Shampoo optimizer. `9c5700ad5e/matrix_functions.py (L240-L242)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/145816 Approved by: https://github.com/XilunWu	2025-01-29 01:19:01 +00:00
Xilun Wu	2ce70da96c	[cp] override compute_log_sumexp to True for aten._scaled_dot_product_efficient_attention.default if False (#145421 ) ## Description Our current CP doesn't support efficient attention when `compute_log_sumexp=False`. `compute_log_sumexp=False` only if that `requires_grad=False` and since PP's [shape inference](`d95a6babcc/torch/distributed/pipelining/stage.py (L1387)`) happens under `torch.no_grad()` context , we need to override `compute_log_sumexp` to `True` in our CP attention implementation. ## Test - Test PP+FSDP+CP w/ `mixed_precision = "float32"` in torchtitan - `pytest test/distributed/tensor/test_attention.py -s -k test_ring_attention_sdpa` Before: <img width="1880" alt="image" src="https://github.com/user-attachments/assets/872ff583-295e-4751-a280-cf7f2d41c61a" /> After: <img width="2988" alt="image" src="https://github.com/user-attachments/assets/4bdcc2e5-22a5-427a-91a5-82206d5bd78f" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/145421 Approved by: https://github.com/H-Huang, https://github.com/tianyu-l	2025-01-24 06:17:54 +00:00
Aaron Orenstein	c95efc37ba	PEP585 update - torch/distributed/tensor (#145141 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145141 Approved by: https://github.com/bobrenjc93	2025-01-18 20:01:59 +00:00
bobrenjc93	08be9ec312	Migrate from Tuple -> tuple in torch/distributed (#144258 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144258 Approved by: https://github.com/aorenste	2025-01-10 08:34:54 +00:00
Wanchao Liang	b1c2c3967a	[dtensor] deprecate _shard_tensor to use src_data_rank=None (#144171 ) as titled, we can achieve no comm sharding for the inference case with src_data_rank=None, so deprecate the private API Pull Request resolved: https://github.com/pytorch/pytorch/pull/144171 Approved by: https://github.com/awgu	2025-01-09 22:26:45 +00:00
Andrew Gu	8ac005ddb8	[DTensor] Add `aten.view.dtype` op support (#144404 ) Fixes https://github.com/pytorch/pytorch/issues/144286 Viewing a tensor to a different dtype does not require any redistribution and can use the default strategy. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144404 Approved by: https://github.com/wanchaol	2025-01-08 23:11:22 +00:00
Xuehai Pan	dcc3cf7066	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 ) The fixes are generated by: ```bash ruff check --fix --preview --unsafe-fixes --select=E226 . lintrunner -a --take "RUFF,PYFMT" --all-files ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-08 21:55:00 +00:00
Luca Wehrstedt	defbf0d339	[DTensor] Add strategy for _scaled_mm (#143760 ) This is done by copying the one for a regular mm, and enforcing that the scales have the same sharding scheme as their respective operands. This works because scales are 2-d tensors that must "broadcast" to the operands. This broadcasting is trivial when scales have dimensions of 1 or N, which are the only options we currently support. Note, however, that after this PR scales will be allowed to have the mesh's world size as a dimension (in certain cases). This works because, when mapped to the local shard, it becomes a dimension of 1, which can be handled by the operator. Note that when using row-wise _scaled_mm for tensor (sequence) parallelism, this situation arises naturally! Because of these specificities, the test is rather complex, as it specifically tests all these behaviors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143760 Approved by: https://github.com/tianyu-l	2025-01-06 16:35:47 +00:00
Aaron Orenstein	45ef3309e3	[BE] typing for decorators (#144161 ) Summary: Untyped decorators strip annotations from the decorated items. - _compile - _inductor/fx_passes/post_grad - _inductor/lowering - _library/custom_ops - _meta_registrations - _ops - _refs/nn/functional - ao/quantization/quantizer/xnnpack_quantizer_utils - distributed/_composable/contract - fx/experimental/graph_gradual_typechecker - fx/experimental/migrate_gradual_types/constraint_generator - optim/optimizer - signal/windows/windows - testing/_internal/common_device_type - torch/_inductor/decomposition - utils/flop_counter Test Plan: unit tests Differential Revision: D62302684 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-01-04 16:40:09 +00:00
Wanchao Liang	eb7a303d21	[dtensor] expose the __create_chunk_list__ in the doc (#144100 ) as titled, this PR exposes this dunder method as a public API in the doc, so that different checkpoint implementations can leverage this protocol, instead of exposing a separate API Pull Request resolved: https://github.com/pytorch/pytorch/pull/144100 Approved by: https://github.com/awgu ghstack dependencies: #144099	2025-01-03 20:06:23 +00:00
Wanchao Liang	48a05ee773	[dtensor] improve doc of the DTensor class (#144099 ) as titled: explicitly list all public members to make sure the public API stays consistent, also use groupwise as the member order to make doc look better Pull Request resolved: https://github.com/pytorch/pytorch/pull/144099 Approved by: https://github.com/awgu	2025-01-03 05:35:44 +00:00
Wanchao Liang	0431d47eaa	[tp] propagate src_data_rank kwarg in TP API (#144005 ) as titled, this PR propagates the src_data_rank in the TP API, so that module level APIs could leverage the flexibility to choose src_data_rank, and avoid the communication if it does not need to Pull Request resolved: https://github.com/pytorch/pytorch/pull/144005 Approved by: https://github.com/tianyu-l ghstack dependencies: #143883	2025-01-02 05:35:52 +00:00
Wanchao Liang	f242dbb76f	[dtensor] add src_data_rank to distribute_tensor API (#143883 ) As titled, this PR adds a kwarg src_data_rank to the distribute_tensor API, to allow users to specify a specific rank as the full tensor source data. Previously we by default specified group_rank=0 as the source of truth for single device semantic, this new option: * gives advanced users the flexibility to choose the source data rank * allows users to specify None explicitly, which means we will skip the communications needed (scatter/broadcast) for the cases that do not care about single device semantic (i.e. loading from a checkpoint) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143883 Approved by: https://github.com/XilunWu, https://github.com/tianyu-l	2025-01-02 05:35:52 +00:00
Luca Wehrstedt	aec3b46274	[DTensor] Add aten.amin/amax to linear_reduction_strategy (#143747 ) In the same vein as https://github.com/pytorch/pytorch/pull/134206, these two ops still seemed missing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143747 Approved by: https://github.com/kwen2501	2024-12-24 13:36:40 +00:00
Xuehai Pan	b77406a9ec	[BE][CI] bump `ruff` to 0.8.4 (#143753 ) Changes: 1. Bump `ruff` from 0.7.4 to 0.8.4 2. Change `%`-formatted strings to f-string 3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753 Approved by: https://github.com/Skylion007	2024-12-24 12:24:10 +00:00
Tom Ritchford	f1cbf4b1b5	Enable ruff's unused variable checking everywhere in pytorch (#136965 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136965 Approved by: https://github.com/cyyever, https://github.com/albanD	2024-12-22 02:33:11 +00:00
bobrenjc93	8e78345d69	remove allow-untyped-defs from distributed/tensor/experimental/__init__.py (#143583 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143583 Approved by: https://github.com/awgu	2024-12-19 20:25:28 +00:00
Aaron Orenstein	401b1498d2	[BE] typing for decorators - distributed/_tensor/ops/utils (#142139 ) Test Plan: unit tests Differential Revision: D62302679 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142139 Approved by: https://github.com/Skylion007, https://github.com/kwen2501	2024-12-16 21:19:33 +00:00
lzhang2	b7ad52abb0	Use new group instead of split group on non-CUDA device (#141469 ) Motivation: Currently, `split_group` only works for the NCCL backend. https://github.com/pytorch/pytorch/blob/main/torch/distributed/distributed_c10d.py#L4745. So we need to use `new_group` on other non-CUDA devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141469 Approved by: https://github.com/kwen2501, https://github.com/gujinghui, https://github.com/albanD	2024-12-13 05:11:33 +00:00
Jane Xu	fd65bd755d	[BE] replace incorrect .. note:: invocations (#142868 ) Something I've noticed is that a lot of the distributed sites don't render on our docs at all, but if they ever do, the notes will render properly now 😛 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142868 Approved by: https://github.com/albanD	2024-12-11 19:58:18 +00:00
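
The tie-breaking rule described in commit 0a60a0cad4 (#149721) can be sketched in plain Python. This is only an illustration of the rule as the commit message states it, not the DTensor implementation: `ArgSpec`, `num_shards`, `ndim`, and both `pick_*` helpers are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgSpec:
    """Hypothetical stand-in for one operand's sharding description."""
    name: str
    num_shards: int  # how many shardings the operand carries
    ndim: int        # number of tensor dimensions


def pick_first_max_shards(args):
    # Behaviour before the change, as described: the first argument
    # with the largest number of shards wins, whatever its rank.
    return max(args, key=lambda a: a.num_shards)


def pick_with_dim_tiebreak(args):
    # Behaviour after the change, as described: among arguments with
    # the same number of shards, prefer the one with more dimensions.
    return max(args, key=lambda a: (a.num_shards, a.ndim))


args = [ArgSpec("bias", num_shards=1, ndim=1),
        ArgSpec("activation", num_shards=1, ndim=3)]
print(pick_first_max_shards(args).name)   # bias
print(pick_with_dim_tiebreak(args).name)  # activation
```

Following an operand with more dimensions leaves more dimensions available to shard, which is presumably why the old choice could yield fewer sharding options.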

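The concern behind commit ea86b8d315 (#148761) is that an all-reduce should not be estimated as more expensive than a reduce_scatter followed by an all-gather. A minimal sketch of that sanity check, assuming a textbook ring-collective cost model rather than PyTorch's actual cost function; every function name, the fixed launch overhead, and the units below are assumptions made for illustration.

```python
def ring_pass_seconds(num_ranks, size_gb, bandwidth_gb_per_s):
    # One ring pass moves (n - 1) / n of the data through each rank.
    return (num_ranks - 1) / num_ranks * size_gb / bandwidth_gb_per_s


def all_gather_cost(num_ranks, size_gb, bandwidth_gb_per_s, launch_s=1e-5):
    # Assumed model: one launch overhead plus one ring pass.
    return launch_s + ring_pass_seconds(num_ranks, size_gb, bandwidth_gb_per_s)


def reduce_scatter_cost(num_ranks, size_gb, bandwidth_gb_per_s, launch_s=1e-5):
    # Assumed model: same traffic volume as all-gather.
    return launch_s + ring_pass_seconds(num_ranks, size_gb, bandwidth_gb_per_s)


def all_reduce_cost(num_ranks, size_gb, bandwidth_gb_per_s, launch_s=1e-5):
    # Assumed model: one launch overhead plus two ring passes
    # (a reduce-scatter phase followed by an all-gather phase).
    return launch_s + 2 * ring_pass_seconds(num_ranks, size_gb, bandwidth_gb_per_s)


n, size, bw = 8, 1.0, 100.0
two_step = reduce_scatter_cost(n, size, bw) + all_gather_cost(n, size, bw)
one_step = all_reduce_cost(n, size, bw)
print(one_step <= two_step)  # True
```

Under this model the single all-reduce is never costlier than the two-step route, so a cost function that ranks them the other way would steer redistribution planning toward the extra collective.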

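The "zipped way" mesh check mentioned in commit f859722f70 (#147869) can be illustrated with a short sketch. It assumes each foreach argument is a list and that only the meshes at the same list position need to agree; meshes are modelled as plain strings, and `same_mesh_zipped` is a hypothetical helper, not a function from `torch.distributed.tensor`.

```python
def same_mesh_zipped(*mesh_lists):
    """Return True if, at every position, all lists name the same mesh."""
    # Lists of different lengths cannot be matched position by position.
    if len({len(meshes) for meshes in mesh_lists}) > 1:
        return False
    # Compare the meshes that sit at the same index across all lists.
    return all(len(set(group)) == 1 for group in zip(*mesh_lists))


param_meshes = ["mesh_a", "mesh_b"]  # two params on different meshes
grad_meshes = ["mesh_a", "mesh_b"]   # their grads, on matching meshes

print(same_mesh_zipped(param_meshes, grad_meshes))           # True
print(same_mesh_zipped(param_meshes, ["mesh_b", "mesh_a"]))  # False
```

A single global same-mesh check would reject the first call because `mesh_a` and `mesh_b` differ, even though each element-wise pair is consistent.
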
223 Commits