pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Brian Hirsh	1c6b517e19	DTensor: more generically support CompositeImplicitAutograd ops under inference mode (#149514 ) Today, if you run DTensor (or any tensor subclass) under __torch_dispatch__, you will start seeing `CompositeImplicitAutograd` ops show up in the torch_dispatch. "handling" these ops is trivial: you can just tell them to decompose into their constituent ops. Normally this decomposing happens in autograd, above DTensor, but inference_mode turns autograd off, forcing the subclass to handle the op directly. It looks like previously we manually added a few CompositeImplicitAutograd entries to DTensor (e.g. linear), but this PR tries to support these ops a bit more generically. The main difference is that DTensor now needs to check if a given op is `CompositeImplicitAutograd` before attempting to run sharding prop. I ran a quick microbenchmark for the below code with `timeit`, which gave me overhead on the order of ~1us, which is hopefully not too bad for eager mode: ``` def fast_function(): return torch._C._dispatch_has_kernel_for_dispatch_key(op_call.name(), torch._C.DispatchKey.CompositeImplicitAutograd) import timeit time_taken = timeit.timeit(fast_function, number=1000) # printed 0.12..., aka 1.2us print(f'func={str(op_call)}, time={str(time_taken)}') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/149514 Approved by: https://github.com/kwen2501, https://github.com/albanD, https://github.com/wanchaol	2025-03-21 22:09:19 +00:00
Wanchao Liang	f859722f70	[dtensor] refactor sharding prop to handle cross mesh computation (#147869 ) as titled, this PR moves the same mesh check from the sharding propagation level to each individual operator level. This is to allow more flexibility for each individual operator to check the operator can be run on the same mesh or not. For example, before this PR if user have two DTensor params that lives on different DeviceMesh, and want to run `for_each` operator on them individually, it would error out with cross mesh error. But for foreach computation there could be DTensors that live on different meshes, as long as the the mesh are the same in a "zipped way". This should also fix https://github.com/pytorch/pytorch/issues/134212 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147869 Approved by: https://github.com/tianyu-l	2025-03-04 18:30:44 +00:00
Xuehai Pan	995df34b19	[BE][PYFMT] migrate PYFMT for `torch.{distributed,distributions}` to `ruff format` (#144547 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144547 Approved by: https://github.com/kwen2501	2025-02-28 07:35:56 +00:00
Xilun Wu	ef61c290e1	[DTensor][random] defer DTensor RNG state sync until first random op call or manual_seed call; support more flexible OffsetBasedRNGTracker init (#147025 ) Resolves https://github.com/pytorch/pytorch/issues/146767. May also resolve https://github.com/pytorch/pytorch/issues/147584. ### Summary This PR removes the RNG tracker init from the `distribute_tensor` call for the following reasons: 1. if the user does not use random ops on DTensor, there's no need to init DTensor RNG which currently requires CUDA device to be present. 2. this complies with the 0-communication semantic of `src_data_rank=None` shard distribution. Besides, `OffsetBasedRNGTracker` only accepts `DeviceMesh` argument to its constructor method. ### Consequence DTensor RNG initialization is delayed till the first DTensor random ops call or `torch.distributed.tensor.random.manual_seed`. ### Test `pytest test/distributed/tensor/test_random_ops.py` `pytest test/distributed/tensor/parallel/test_tp_random_state.py` `pytest test/distributed/tensor/parallel/test_tp_style.py` Differential Revision: [D70201856](https://our.internmc.facebook.com/intern/diff/D70201856) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147025 Approved by: https://github.com/kwen2501	2025-02-26 17:33:22 +00:00
Aaron Orenstein	c95efc37ba	PEP585 update - torch/distributed/tensor (#145141 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145141 Approved by: https://github.com/bobrenjc93	2025-01-18 20:01:59 +00:00
bobrenjc93	08be9ec312	Migrate from Tuple -> tuple in torch/distributed (#144258 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144258 Approved by: https://github.com/aorenste	2025-01-10 08:34:54 +00:00
Aaron Orenstein	45ef3309e3	[BE] typing for decorators (#144161 ) Summary: Untyped decorators strip annotations from the decorated items. - _compile - _inductor/fx_passes/post_grad - _inductor/lowering - _library/custom_ops - _meta_registrations - _ops - _refs/nn/functional - ao/quantization/quantizer/xnnpack_quantizer_utils - distributed/_composable/contract - fx/experimental/graph_gradual_typechecker - fx/experimental/migrate_gradual_types/constraint_generator - optim/optimizer - signal/windows/windows - testing/_internal/common_device_type - torch/_inductor/decomposition - utils/flop_counter Test Plan: unit tests Differential Revision: D62302684 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144161 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-01-04 16:40:09 +00:00
Ke Wen	8bdcdae733	[DTensor] Support matmul in inference_mode (#142197 ) Fixes #142190 . The solution is to add a `decompose_handler` for `aten.matmul`, similar to how we handle `aten.linear`. With the decomposition, `aten.matmul` becomes `aten.mm` which has sharding strategy registered with DTensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142197 Approved by: https://github.com/XilunWu, https://github.com/wz337	2024-12-06 07:15:05 +00:00
Xilun Wu	c55191f3a2	[dtensor][random] add 1d and 2d model meta init tests (#141731 ) Summary Added tests for model meta init on 1-d mesh (TP) and 2-d mesh (FSDP+TP). This exploits the issue where DTensor RNG failed to initialize weights differently across FSDP ranks. Test `pytest test/distributed/_tensor/test_random_ops.py -s -k meta_init` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141731 Approved by: https://github.com/wconstab	2024-11-29 07:59:20 +00:00
Tom Ritchford	c0582fd0f8	Remove unused Python variables in torch/[b-z]* (#136963 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963 Approved by: https://github.com/ezyang	2024-10-19 16:45:22 +00:00
Wanchao Liang	cfc227ad43	[reland][dtensor] move DTensor to public namespace (#134203 ) reland of https://github.com/pytorch/pytorch/pull/133113 I have to create a new PR because the previous reverted PR could not either be rebased, or imported successfully :( ---- Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve the BC for users still using the torch.distributed._tensor, I added a shim script to redirect old path calls to the new module The BC preserving is evidented by the fact that all DTensor tests are still working without changing the public imports. So it's safe to land the changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/134203 Approved by: https://github.com/tianyu-l	2024-09-08 17:08:40 +00:00
PyTorch MergeBot	35f36363ec	Revert "[dtensor] move DTensor to public namespace (#133113 )" This reverts commit `2ee6b97464`. Reverted https://github.com/pytorch/pytorch/pull/133113 on behalf of https://github.com/wanchaol due to looks like it break some internal type imports ([comment](https://github.com/pytorch/pytorch/pull/133113#issuecomment-2295670911))	2024-08-19 05:00:19 +00:00
Wanchao Liang	2ee6b97464	[dtensor] move DTensor to public namespace (#133113 ) Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes: * many path renames and path import fixes * a dedicated doc page without too much content yet (adding in the next PRs) * To preserve the BC for users still using the `torch.distributed._tensor`, I added a shim script to redirect old path calls to the new module The BC preserving is evidented by the fact that all DTensor tests are still working without changing the public imports. So it's safe to land the changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/133113 Approved by: https://github.com/XilunWu ghstack dependencies: #133305, #133306	2024-08-17 05:09:52 +00:00

13 Commits