pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
linhaifeng	695cb0d342	[2/N][Fix] Fix typo in test folder (#166374 ) Fix typo in test folder. _typos.toml ```bash [default.extend-words] nd = "nd" arange = "arange" Nd = "Nd" GLOBALs = "GLOBALs" hte = "hte" iy = "iy" PN = "PN" Dout = "Dout" optin = "optin" gam = "gam" PTD = "PTD" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/166374 Approved by: https://github.com/cyyever, https://github.com/ezyang	2025-10-29 03:02:07 +00:00
Ved Thorat	21b48f8dfa	Fixes torch.compile(nn.ModuleList()) changes bool() behavior (#159208 ) Fixes #159139 ## The Cause The bug occurs because the OptimizedModule wrapper in torch._dynamo.eval_frame doesn't define a `__len__` method. This causes Python's bool() check to fall back to the default object truthiness (always True) instead of correctly evaluating containers with len() == 0 as False. ## The Fix A very easy fix. I just added the `__len__` method to the OptimizedModule class in torch._dynamo.eval_frame to delegate the call to the original module ```python def __len__(self): """ Proxy the len() call to the original module to fix truthiness checks. """ return len(self._orig_mod) ``` This successfully fixes the issue. The script now works as expected. ## Reproduction Script ```python import torch import torch.nn as nn # Create an empty nn.ModuleList original = nn.ModuleList() # Compile it using torch.compile compiled = torch.compile(original) # Compare their boolean evaluations print(f"bool(original): {bool(original)}") print(f"bool(compiled): {bool(compiled)}") # Trigger failure if they differ assert bool(original) == bool(compiled), "BUG: truthiness behavior mismatch after compilation" ``` ## Output bool(original): False bool(compiled): False Pull Request resolved: https://github.com/pytorch/pytorch/pull/159208 Approved by: https://github.com/andrewboldi, https://github.com/Lucaskabela Co-authored-by: pushkar-hue <pushkarsharma.rtm@gmail.com> Co-authored-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 19:21:24 +00:00
William Wen	32fe4f681e	[dynamo] fix keyerror in resume_execution (again) (#166040 ) Fixes https://github.com/pytorch/pytorch/issues/166176 The error I attempted to fix in https://github.com/pytorch/pytorch/pull/162318 was still appearing internally. Surprised that this wasn't caught anywhere 😰 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166040 Approved by: https://github.com/Lucaskabela ghstack dependencies: #166036	2025-10-28 07:04:29 +00:00
William Wen	ebb2b2e894	[dynamo] fix store attr graph break in with block (#166036 ) Fixes https://github.com/pytorch/pytorch/issues/166033 Differential Revision: [D85198055](https://our.internmc.facebook.com/intern/diff/D85198055) Pull Request resolved: https://github.com/pytorch/pytorch/pull/166036 Approved by: https://github.com/Lucaskabela	2025-10-28 07:04:29 +00:00
Animesh Jain	8e1e4ee8e0	[reland][dynamo][easy] Support torch.accelerator.current_accelerator (#166327 ) Reland https://github.com/pytorch/pytorch/pull/165734 Pull Request resolved: https://github.com/pytorch/pytorch/pull/166327 Approved by: https://github.com/Lucaskabela	2025-10-27 23:41:43 +00:00
Animesh Jain	2e8e9a59a8	Revert "[dynamo][easy] Support torch.accelerator.current_accelerator (#165734 )" (#166094 ) This reverts commit `c18ddfc572`. Discovers some latent issues causing internal failures. Will fix those issues first and resend the PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/166094 Approved by: https://github.com/bdhirsh	2025-10-23 01:24:46 +00:00
Yuanyuan Chen	fdab48a7c1	Enable all PIE rules on ruff (#165814 ) This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are ``` PIE796 Enum contains duplicate value: {value} PIE808 Unnecessary start argument in range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814 Approved by: https://github.com/ezyang	2025-10-18 07:36:18 +00:00
PyTorch MergeBot	24520b8386	Revert "Enable all PIE rules on ruff (#165814 )" This reverts commit `c79dfdc655`. Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))	2025-10-18 07:21:08 +00:00
Yuanyuan Chen	c79dfdc655	Enable all PIE rules on ruff (#165814 ) This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are ``` PIE796 Enum contains duplicate value: {value} PIE808 Unnecessary start argument in range ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814 Approved by: https://github.com/ezyang	2025-10-18 06:40:12 +00:00
Animesh Jain	c18ddfc572	[dynamo][easy] Support torch.accelerator.current_accelerator (#165734 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165734 Approved by: https://github.com/Skylion007	2025-10-17 22:04:19 +00:00
jmaczan	cff1b20771	Patch the flex_attention._get_mod_type to not use inspect.signature when computing num_positional_args (an alternative fix for flex attention graph break on create_block_mask) (#164923 ) The initial fix for inspect.signature does not use the right approach (https://github.com/pytorch/pytorch/pull/164349#pullrequestreview-3306614010). As @williamwen42 suggests (https://github.com/pytorch/pytorch/pull/164349#issuecomment-3379222885), for now we can just get rid of the `inspect.signature` call in flex_attention to resolve this high priority issue (https://github.com/pytorch/pytorch/issues/164247#issuecomment-3378673179). In this PR I did exactly this: I limited the scope of the fix to just computing `num_positional_args` in `flex_attention._get_mod_type` based on properties returned by `NestedUserFunctionVariable.const_getattr` (some were missing so I added them) Fixes #164247 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164923 Approved by: https://github.com/williamwen42	2025-10-17 17:44:45 +00:00
jmaczan	003dd13073	[dynamo, guards] Better error messages when generated guard fails on the same frame (#165242 ) Not sure what exactly we want to have in the message, but that's easy to adjust. I tried to find a reliable test to reproduce this message (happens only when a guard fails right after it's created), but I ended up mocking a `guard_manager.check` function to return `False` to trigger this behavior. I think that's fine, because any other case that we pick (like datetime.now()), we want to patch one day anyway, so every time we make the next patch, will need to chase for another repro test @williamwen42 Fixes #164990 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165242 Approved by: https://github.com/williamwen42	2025-10-16 01:05:31 +00:00
Guilherme Leobas	e6f766c7d7	[Dynamo] Fixes for exceptions (#153966 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153966 Approved by: https://github.com/Lucaskabela	2025-10-14 22:03:58 +00:00
PyTorch MergeBot	a2f34bdd7c	Revert "Patch the flex_attention._get_mod_type to not use inspect.signature when computing num_positional_args (an alternative fix for flex attention graph break on create_block_mask) (#164923 )" This reverts commit `3401665110`. Reverted https://github.com/pytorch/pytorch/pull/164923 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/164923#issuecomment-3403654378))	2025-10-14 21:20:49 +00:00
jmaczan	3401665110	Patch the flex_attention._get_mod_type to not use inspect.signature when computing num_positional_args (an alternative fix for flex attention graph break on create_block_mask) (#164923 ) The initial fix for inspect.signature does not use the right approach (https://github.com/pytorch/pytorch/pull/164349#pullrequestreview-3306614010). As @williamwen42 suggests (https://github.com/pytorch/pytorch/pull/164349#issuecomment-3379222885), for now we can just get rid of the `inspect.signature` call in flex_attention to resolve this high priority issue (https://github.com/pytorch/pytorch/issues/164247#issuecomment-3378673179). In this PR I did exactly this: I limited the scope of the fix to just computing `num_positional_args` in `flex_attention._get_mod_type` based on properties returned by `NestedUserFunctionVariable.const_getattr` (some were missing so I added them) Fixes #164247 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164923 Approved by: https://github.com/williamwen42	2025-10-14 18:29:15 +00:00
Animesh Jain	af7ca55ced	[export][dynamo] Fallback to slowpath for MultiHeadAttention for strict export (#164721 ) In https://github.com/pytorch/pytorch/pull/106824, export decided to slow-path for MultiHeadAttention module (look into the PR description as to why). But that PR eventually caused a divergence between Dynamo and export. Today, strict-export does not inline into builtin modules (like MultiHeadAttention), and therefore make_fx sees the original nn.Module and takes the slow path. But compile inlines into the nn module, and at this time the condition `_is_make_fx_tracing` is False. As a result, Dynamo takes a fast path, resulting in a different op being called. This divergence is undesirable. There are 2 ways to fix it 1) Make export take the fast path - As explained in the https://github.com/pytorch/pytorch/pull/106824 , this might be difficult. So, we go to (2) 2) Make compile as well take the slow path - This is easy to implement. The con here is that Pytorch eager and compile will use different operators, which can cause numerics issues etc. Since (2) is easy to do, we will follow this path. We are tracking the issue in https://github.com/pytorch/pytorch/issues/164062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164721 Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan	2025-10-09 03:25:15 +00:00
William Wen	af4c29fea8	[dynamo, nested graph breaks] fix nested step graph break related issues (#162737 ) Turns out codegen'ing a nested step graph break is significantly more complicated than first thought. The optimized function should actually do: - call graph/load values/do side effects etc. - call into the leaf's resume function, but skipped (this essentially step graph break function for just the leaf function) - call into all the other resume functions, traced. This PR also adds `torch._dynamo.step_unsupported()`, which can be used for internal testing purposes to better test step graph break handling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162737 Approved by: https://github.com/Lucaskabela ghstack dependencies: #160601	2025-10-08 22:02:52 +00:00
Laith Sakka	2035f6b2e6	use check_size instead of check_is_size in ops.py (#164668 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164668 Approved by: https://github.com/angelayi ghstack dependencies: #164664, #164665, #164667	2025-10-08 14:23:38 +00:00
Animesh Jain	cac5e13e13	[dynamo] Inline nn module calls using __call__ methods (#164817 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164817 Approved by: https://github.com/SherlockNoMad, https://github.com/mlazos	2025-10-07 08:57:20 +00:00
Lucas Kabela	979e10f7d6	[Bugfix] Match eager stride semantics for cloned tensors with preserve_format in compile (#163017 ) Fixes #161010 by making `clone_meta` match the semantics of strides for eager mode. This is: * Case 1: Tensor is_non_overlapping_and_dense; in this case, stride should match input tensor stride * Case 2: Otherwise, stride should be contiguous computed from input tensor using `compute_elementwise_output_strides` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163017 Approved by: https://github.com/williamwen42, https://github.com/xmfan Co-authored-by: morrison-turnansky <mturnans@redhat.com>	2025-09-19 19:41:33 +00:00
Prachi Gupta	c0142f5c06	[ROCm] Enabling several UTs (#161715 ) All these UTs are working as is, just removing the skip - test_p2p_ipc - test_repros.py: working, added fp8 support - test_activation_checkpointing.py - test_content_store.py - test_cuda_multigpu.py - test_compute_comm_reordering.py - test_segment_reductions.py - test_dataloader.py - test_math_ops.py - test_loop_ordering.py - test_control_flow.py - distributed_test.py - test_mem_tracker.py - test_fsdp_optim_state.py - test_fully_shard_mixed_precision.py: skipped for < ROCm7.0 - test_aot_inductor_custom_ops.py - test_c10d_ops_nccl.py - test_eager_transforms.py - test_sparse_csr.py - test_inductor_collectives.py - test_fake_tensor.py - test_cupy_as_tensor.py - test_cuda.py: enable UTs that are working - test_matmul_cuda.py: enable UTs that are working Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/161715 Approved by: https://github.com/msaroufim Co-authored-by: Mark Saroufim <marksaroufim@fb.com>	2025-09-09 15:49:21 +00:00
William Wen	26a1b9cce2	[dynamo] fix resume_execution.py KeyError in Python 3.11+ (#162318 ) Fixes https://github.com/pytorch/pytorch/issues/162313 Differential Revision: [D81938289](https://our.internmc.facebook.com/intern/diff/D81938289) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162318 Approved by: https://github.com/Lucaskabela, https://github.com/mlazos, https://github.com/anijain2305	2025-09-08 20:26:24 +00:00
PyTorch MergeBot	8235c4f65d	Revert "[ROCm] Enabling several UTs (#161715 )" This reverts commit `b9ba612f7a`. Reverted https://github.com/pytorch/pytorch/pull/161715 on behalf of https://github.com/jeanschmidt due to Need to revert in order to revert https://github.com/pytorch/pytorch/pull/159473, feel free to merge it back once conflicts are cleared ([comment](https://github.com/pytorch/pytorch/pull/161715#issuecomment-3264040604))	2025-09-07 21:03:17 +00:00
Prachi Gupta	b9ba612f7a	[ROCm] Enabling several UTs (#161715 ) All these UTs are working as is, just removing the skip - test_p2p_ipc - test_repros.py: working, added fp8 support - test_activation_checkpointing.py - test_content_store.py - test_cuda_multigpu.py - test_compute_comm_reordering.py - test_segment_reductions.py - test_dataloader.py - test_math_ops.py - test_loop_ordering.py - test_control_flow.py - distributed_test.py - test_mem_tracker.py - test_fsdp_optim_state.py - test_fully_shard_mixed_precision.py: skipped for < ROCm7.0 - test_aot_inductor_custom_ops.py - test_c10d_ops_nccl.py - test_eager_transforms.py - test_sparse_csr.py - test_inductor_collectives.py - test_fake_tensor.py - test_cupy_as_tensor.py - test_cuda.py: enable UTs that are working - test_matmul_cuda.py: enable UTs that are working Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/161715 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily	2025-09-04 20:43:03 +00:00
Animesh Jain	600c25e9a1	[dynamo] Graph break on torch.cuda.synchronize (#161925 ) Today, AOTDispatcher ignores cuda.synchronize. Even if we wrap it in some HOP, we need it to be a barrier op to prevent any inductor reordering. So we graph break instead. Fixes https://github.com/pytorch/pytorch/issues/160751 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161925 Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/mlazos	2025-09-02 19:00:21 +00:00
can-gaa-hou	c0ed87c82d	[Dynamo] Fix weakref.proxy error when `torch.compile` (#161508 ) Fixes #159258 The error occurs when we attempt to create a weak reference from a weak reference proxy. `e9d42b3880/torch/_dynamo/guards.py (L2910-L2915)` In fact, we shouldn't create a weak reference from another reference or proxy, as it would check in CPython. `f60f8225ed/Objects/weakrefobject.c (L410-L418)` However, `__weakrefoffset__` is not equal to 0 when the `guarded_object` is in `weakref.ProxyTypes`, and it will wrongly create a weak reference for the `weakref.ProxyTypes`. I think this could be a bug from CPython, but we can prevent it by adding more weakref type checks (`weakref.ProxyTypes` contains `weakref.ProxyType` and `weakref.CallableProxyType`) here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161508 Approved by: https://github.com/Lucaskabela, https://github.com/anijain2305, https://github.com/malfet	2025-08-28 22:34:18 +00:00
Animesh Jain	3d406429b0	[dynamo][vllm] Support typing.get_type_hints (#161362 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161362 Approved by: https://github.com/Skylion007, https://github.com/StrongerXi, https://github.com/jansel	2025-08-27 09:55:31 +00:00
Michael Lazos	be55d7ac9e	Revert "[Dynamo] Allow inlining into AO quantization modules (#152934 )" (#161567 ) This reverts commit `20e2ca3e29`. Fixes https://github.com/pytorch/pytorch/issues/157434 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161567 Approved by: https://github.com/Lucaskabela	2025-08-27 03:33:04 +00:00
William Wen	b074cbaedd	[dynamo] allow resume functions to have name in both freevars and varnames (#161544 ) fixes https://github.com/pytorch/pytorch/issues/161542 Differential Revision: [D81073109](https://our.internmc.facebook.com/intern/diff/D81073109) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161544 Approved by: https://github.com/StrongerXi, https://github.com/anijain2305	2025-08-27 00:25:16 +00:00
Simon Fan	8aad3a60ce	[dynamo] propagate tensor metadata on Tensor.__setitem__(tensor) (#161036 ) Fixes silent incorrectness for autograd function tracing, where we rely on FakeTensor metadata (requires_grad) to determine whether to HOP or not: `5ee464db5c/torch/_dynamo/variables/misc.py (L671)` Stared at this with @anijain2305 yesterday, `Tensor.__setitem__` can update tensor metadata, and we can just run the fake prop and extract the output metadata from the updated FakeTensor. FIXES https://github.com/pytorch/pytorch/issues/160901 It should also be the root cause behind the issue in https://github.com/pytorch/torchtitan/pull/1604 @bdhirsh @ruisizhang123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161036 Approved by: https://github.com/anijain2305 ghstack dependencies: #160805	2025-08-22 04:43:22 +00:00
Peter Y. Yeh	e389a08dcd	AMD/ROCm OCP Micro-scaling Format (mx-fp8/mx-fp4) Support (#151360 ) - This pull request introduces support for the [OCP Micro-scaling (MX) format](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf), with a focus on compatibility with AMD ROCm 7.0 and the gfx950 architecture. This PR also establishes the foundation for enabling MX-FPX features in [TorchAO](https://github.com/pytorch/ao/issues/2229) on the AMD platform. - Validation (ROCm 7.0 + gfx950 required): `111 relevant tests passing.` > PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k test_blockwise -v Co-author: @jagadish-amd. Thank you for the efforts leading validation on gfx950 with ROCm 7.0. ----------------------------------- This pull request introduces support for new scalar types and scaling methods, particularly for ROCm 7.0 and gfx950, and refines testing for these features. Key changes include adding constraints for matrix dimensions, enabling block-wise scaling, and updating tests to accommodate new data types. ### Support for new scalar types and scaling methods: * `aten/src/ATen/cuda/CUDABlas.cpp`: Added constraints for matrix dimensions when using `Float8_e8m0fnu` with block-wise scaling, ensuring dimensions are multiples of 32. Updated compatibility checks to support ROCm 7.0 for `Float8_e8m0fnu` and `Float8_e4m3fn`. * `aten/src/ATen/native/cuda/Blas.cpp`: Introduced block-wise scaling for `Float8_e8m0fnu`, with checks for ROCm 7.0 and GPU architecture `gfx950`. Added validation for supported scalar types and matrix dimensions. ### Updates to scalar type mappings: * `aten/src/ATen/cuda/CUDADataType.h`: Extended scalar type mappings to support `Float4_e2m1fn_x2` for ROCm 7.0. * `aten/src/ATen/cuda/tunable/GemmHipblaslt.h`: Added a constexpr mapping for `Float4_e2m1fn_x2` based on ROCm version. ### Enhancements to testing (@jagadish-amd): * `test/test_matmul_cuda.py`: Updated tests to include new scalar types (`Float4_e2m1fn_x2`) and recipes (`mxfp4`). Added logic to handle different scaling recipes and validate compatibility with ROCm and CUDA versions. These changes improve compatibility with newer hardware and software versions, enhance functionality for matrix operations, and ensure robust testing for the added features. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151360 Approved by: https://github.com/drisspg, https://github.com/malfet	2025-08-18 16:43:09 +00:00
Simon Fan	c8205cb354	[autograd] match 0-dim gradients device type regardless of subclassness (#160165 ) Not sure if there are some subclasses where outer.dim() == 0 but you wouldn't want to move it? FIXES https://github.com/pytorch/pytorch/issues/160084 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160165 Approved by: https://github.com/ezyang, https://github.com/albanD	2025-08-11 17:57:32 +00:00
William Wen	fd606a3a91	[dynamo] update pytorch-labs -> meta-pytorch in graph break URLs (#159975 ) Related PR: https://github.com/meta-pytorch/compile-graph-break-site/pull/30 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159975 Approved by: https://github.com/Lucaskabela	2025-08-06 23:57:31 +00:00
Lucas Kabela	5d89634ca8	Graph break with error message (#158800 ) Fixes #157452 Test with ``` python test/dynamo/test_repros.py ReproTests.test_nn_parameter_ctor_graph_breaks ``` ### Release Notes Change to nn.Parameter Constructor Behavior in Dynamo Semantic change introduced in the nn.Parameter constructor; previously, if the constructor lacked a clean source, the system would attempt to infer arguments to construct a clone and lift this synthetic proxy in the computation graph. This approach had many potential edge cases and was difficult to reason about. The new behavior defaults to graph breaking when the nn.Parameter constructor does not have a clean source. Users are now suggested to manually move the constructor out of the graph in such cases. This change improves clarity and reduces complexity in graph construction and debugging. Users can escape hatch to old semantics with `torch.dynamo.config.graph_break_on_nn_param_ctor=False` if this cannot be done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158800 Approved by: https://github.com/anijain2305	2025-07-29 17:34:49 +00:00
Xu Han	dfcb07bdfa	[Inductor] Temporarily disable failing Windows UTs. (#159163 ) Temporarily disable the Windows UTs that are failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159163 Approved by: https://github.com/desertfire	2025-07-25 20:25:36 +00:00
PyTorch MergeBot	8d2a1d6e18	Revert "Graph break with error message (#158800 )" This reverts commit `cae4746952`. Reverted https://github.com/pytorch/pytorch/pull/158800 on behalf of https://github.com/clee2000 due to broke some tests on main inductor/test_distributed_patterns.py::DistributedPatternTests::test_nn_param_return4 [GH job link](https://github.com/pytorch/pytorch/actions/runs/16507837934/job/46685704688) [HUD commit link](`cae4746952`), note to self: bad TD, but also dynamo/test_repros failed but didn't get skipped by TD so maybe a landrace, or I just blaming the wrong commit entirely.. ([comment](https://github.com/pytorch/pytorch/pull/158800#issuecomment-3115224608))	2025-07-24 22:45:58 +00:00
Lucas Kabela	cae4746952	Graph break with error message (#158800 ) Fixes #157452 Test with ``` python test/dynamo/test_repros.py ReproTests.test_nn_parameter_ctor_graph_breaks ``` ### Release Notes Change to nn.Parameter Constructor Behavior in Dynamo Semantic change introduced in the nn.Parameter constructor; previously, if the constructor lacked a clean source, the system would attempt to infer arguments to construct a clone and lift this synthetic proxy in the computation graph. This approach had many potential edge cases and was difficult to reason about. The new behavior defaults to graph breaking when the nn.Parameter constructor does not have a clean source. Users are now suggested to manually move the constructor out of the graph in such cases. This change improves clarity and reduces complexity in graph construction and debugging. Users can escape hatch to old semantics with `torch.dynamo.config.graph_break_on_nn_param_ctor=False` if this cannot be done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158800 Approved by: https://github.com/anijain2305	2025-07-24 21:05:17 +00:00
Animesh Jain	1b456c580d	[dynamo][guards] Add type info of the guarded value in guard managers (#158765 ) tlparse looks like this: [image] This will aid in reading guards. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158765 Approved by: https://github.com/Lucaskabela, https://github.com/StrongerXi	2025-07-23 16:59:15 +00:00
William Wen	7d2ceaff21	[dynamo] skip tracing functions registered in sys.monitoring (#158171 ) Fixes https://github.com/pytorch/pytorch/issues/158164 This was fixed by applying `skip_code_recursive` to any function registered to `sys.monitoring` (via `PyThreadState_GET()->interp->monitoring_callables`). This check is done whenever we attempt to set the eval frame callback from Python. Microbenchmark: `benchmarks/dynamo/microbenchmarks/overheads.py`: BEFORE: ``` requires_grad=False eager 7.1us (warmup=0.0s) compiled 24.6us (warmup=10.0s) requires_grad=True eager 8.9us (warmup=0.0s) compiled 57.8us (warmup=0.1s) inference_mode() eager 6.5us (warmup=0.0s) compiled 23.4us (warmup=0.1s) ``` AFTER: ``` requires_grad=False eager 7.0us (warmup=0.0s) compiled 23.2us (warmup=15.2s) requires_grad=True eager 9.0us (warmup=0.0s) compiled 55.1us (warmup=0.1s) inference_mode() eager 6.4us (warmup=0.0s) compiled 22.2us (warmup=0.1s) ``` Followup thought: how do we let users know that a frame is skipped because the code object is a callable registered to sys.monitoring? (or any other reason?) Differential Revision: [D78530528](https://our.internmc.facebook.com/intern/diff/D78530528) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158171 Approved by: https://github.com/jansel	2025-07-22 18:02:30 +00:00
Simon Fan	07c4c2a792	[dynamo][be] hide warnings without invalidating warnings cache (#158520 ) I feel uneasy about touching `__warningregistry__` since it is an undocumented, private surface. The only public API hook that doesn't increment the warnings version seems to be https://docs.python.org/3/library/warnings.html#warnings.showwarning. So we could play whack-a-mole with all the warning muters in compile so that they just don't display warnings, and we wouldn't invalidate the warnings cache. This PR adds it for torch/_dynamo, and I didn't find any warnings versioning mutation from torch/_inductor. There is a behavior change if someone calls a compiled graph with simplefilter("error"): ```python # e.g. test/dynamo_expected_failures/TestAutogradFallback.test_no_autograd_kernel_inplace_mode_nothing with warnings.catch_warnings(): warnings.simplefilter("error") # turns all warnings into errors compiled_fn() # will throw if any of the muted warnings fire ``` FIXES https://github.com/pytorch/pytorch/issues/128427 A note for the future: The warnings module doesn't offer a thread-safe way of using it. Even regular filters have this problem, directly editing `__warningregistry__` would be very bad, and this PR would mute all threads. Someone will need to build a thread-safe warnings interface. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158520 Approved by: https://github.com/anijain2305, https://github.com/zou3519	2025-07-18 22:02:31 +00:00
Sam Larsen	bff69f25c2	[BE][testing] fix test/dynamo/test_repros:test_longtensor_list (#158458 ) Summary: This test is failing internally because the number of underlying calls to the rng differ by virtue of various library initializations that get sucked in with an internal build. Test Plan: `buck test '@fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_repros.py::ReproTests::test_longtensor_list' --run-disabled` Pull Request resolved: https://github.com/pytorch/pytorch/pull/158458 Approved by: https://github.com/jansel	2025-07-17 17:27:00 +00:00
Simon Fan	7cf31b4a42	[dynamo] fix NamedTupleVariable cloning (#158190 ) FIXES https://github.com/pytorch/pytorch/issues/157945 ## Explanation 1. Some VTs add additional attrs e.g. NamedTupleVariable has "dynamic_attributes" `a0308edb6c/torch/_dynamo/variables/lists.py (L1048-L1051)` 2. VT.clone passes everything by dict, includes "dynamic_attributes" `a0308edb6c/torch/_dynamo/variables/base.py (L255-L259)` 3. Non-handled args become kwargs in VT's `__init__`, `super().__init__()` passes kwargs to Base VT `a0308edb6c/torch/_dynamo/variables/lists.py (L1048-L1051)` 4. Base VT's `__init__` gets unexpected "dynamic_attributes" kwarg `a0308edb6c/torch/_dynamo/variables/base.py (L609-L613)` You could also let Base VT's `__init__` ignore additional kwargs, but that seemed a bit too permissive, and I don't think many VT's add these derived class only attrs. ## After fix ```python ===== __compiled_fn_1_7f9541ed_e166_43fe_8322_c5225ce4207f ===== /home/xmfan/core/miniconda3/envs/0712/lib/python3.12/site-packages/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module): def forward(self, L_x_: "f32[4, 8, 6][48, 6, 1]cpu"): l_x_ = L_x_ # File: /home/xmfan/core/a/torchtitan/wtf.py:10 in forward, code: U, S = torch.linalg.svd(x)[:2] linalg_svd = torch._C._linalg.linalg_svd(l_x_); l_x_ = None U: "f32[4, 8, 8][64, 1, 8]cpu" = linalg_svd[0] S: "f32[4, 6][6, 1]cpu" = linalg_svd[1]; linalg_svd = None # File: /home/xmfan/core/a/torchtitan/wtf.py:11 in forward, code: reduced = U[:, :, :self.k] @ torch.diag_embed(S[:, :self.k]) getitem_3: "f32[4, 8, 5][64, 1, 8]cpu" = U[(slice(None, None, None), slice(None, None, None), slice(None, 5, None))]; U = None getitem_4: "f32[4, 5][6, 1]cpu" = S[(slice(None, None, None), slice(None, 5, None))]; S = None diag_embed: "f32[4, 5, 5][25, 5, 1]cpu" = torch.diag_embed(getitem_4); getitem_4 = None reduced: "f32[4, 8, 5][40, 5, 1]cpu" = getitem_3 @ diag_embed; getitem_3 = diag_embed = None return (reduced,) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/158190 Approved by: https://github.com/StrongerXi	2025-07-14 23:39:25 +00:00
Huy Do	2af7c67e48	Mitigate some flaky tests in trunk (#157756 ) (not really fix these issues, but we should be able to close them. This also allows CI from the PR to test them) Fixes https://github.com/pytorch/pytorch/issues/156579 Fixes https://github.com/pytorch/pytorch/issues/156580 Fixes https://github.com/pytorch/pytorch/issues/126867 Pull Request resolved: https://github.com/pytorch/pytorch/pull/157756 Approved by: https://github.com/clee2000	2025-07-08 07:07:11 +00:00
Xuehai Pan	02715d0876	[BE][5/6] fix typos in test/ (test/dynamo/) (#157639 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157639 Approved by: https://github.com/yewentao256, https://github.com/jansel ghstack dependencies: #157638	2025-07-06 06:34:25 +00:00
William Wen	52e4e41cbc	[dynamo] do not issue lru_cache warning for functions in the top-level torch namespace (#157598 ) `lru_cache` usage warning was being raised for `torch.get_device_module()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157598 Approved by: https://github.com/Sidharth123-cpu	2025-07-04 08:17:50 +00:00
Nikita Shulga	5e636d664a	[BE] `@serialTest` decorator must be called (#157388 ) Otherwise it turns the test into a trivial one (that always succeeds), as the following example demonstrates ```python import torch from torch.testing._internal.common_utils import serialTest, run_tests, TestCase class MegaTest(TestCase): @serialTest def test_foo(self): if hasattr(self.test_foo, "pytestmark"): print("foo has attr and it is", self.test_foo.pytestmark) print("foo") @serialTest() def test_bar(self): if hasattr(self.test_bar, "pytestmark"): print("bar has attr and it is", self.test_bar.pytestmark) print("bar") if __name__ == "__main__": run_tests() ``` That will print ``` test_bar (__main__.MegaTest.test_bar) ... bar has attr and it is [Mark(name='serial', args=(), kwargs={})] bar ok test_foo (__main__.MegaTest.test_foo) ... ok ---------------------------------------------------------------------- Ran 2 tests in 0.013s ``` Added an assert in the decorator that the arg is a boolean, to prevent such silent skips in the future Pull Request resolved: https://github.com/pytorch/pytorch/pull/157388 Approved by: https://github.com/clee2000	2025-07-02 19:15:19 +00:00
William Wen	bdb7819166	[dynamo, nested graph breaks] remove recursive cell/freevar in instruction tx (#154078 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154078 Approved by: https://github.com/StrongerXi, https://github.com/jansel	2025-07-02 13:36:14 +00:00
Ryan Guo	a4b59498c5	Fix fake kernel for the `out=...` variant of `unbind_copy` (#156643 ) `unbind_copy(..., out=...)` returns None rather than the `out` argument (see https://github.com/pytorch/pytorch/issues/130829#issuecomment-2283936222), but the old fake kernel didn't account for that and caused an assertion failure in `pushPyOutToStack`. This patch fixes that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156643 Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/bdhirsh ghstack dependencies: #156642	2025-06-27 01:34:07 +00:00
Ryan Guo	89aa708b39	[core] Dispatch to `at::nansum_out` rather than `at::native::nansum_out` (#156642 ) Calling `at::native::nansum_out` causes the fake kernel to dispatch to a `make_reduction` call and then segfaults later due to the `mutable_data_ptr` call in `TensorIteratorBase::build`. It also causes fake tensor propagation issue in Dynamo. The added tests demonstrate the aforementioned 2 issues. This patch fixes it by dispatching to `at::nansum_out` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156642 Approved by: https://github.com/zou3519	2025-06-27 01:34:07 +00:00
William Wen	6089ebcf6d	[dynamo] fix segfault due to dangling CacheEntry backend pointer (#156527 ) Fixes https://github.com/pytorch/pytorch/issues/155057 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156527 Approved by: https://github.com/anijain2305, https://github.com/jansel	2025-06-26 23:51:08 +00:00
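
The `bool()` mismatch described in #159208 above follows from Python's truthiness protocol: an object that defines neither `__bool__` nor `__len__` is always truthy. The following is a minimal sketch in plain Python with no torch dependency. `Container`, `Wrapper`, and `LenProxyWrapper` are hypothetical stand-ins for `nn.ModuleList` and `OptimizedModule`, not the real classes; only the `_orig_mod` attribute name is taken from the commit message.

```python
class Container:
    """Hypothetical stand-in for a container such as nn.ModuleList."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Wrapper:
    """Hypothetical wrapper that does not forward __len__ (the buggy shape)."""

    def __init__(self, orig):
        self._orig_mod = orig


class LenProxyWrapper(Wrapper):
    """Same wrapper, but delegating len() to the wrapped object."""

    def __len__(self):
        return len(self._orig_mod)


empty = Container()
truthiness = {
    "original": bool(empty),                    # uses Container.__len__
    "wrapper": bool(Wrapper(empty)),            # falls back to object truthiness
    "proxy": bool(LenProxyWrapper(empty)),      # uses the delegated __len__
}
print(truthiness)
```

Only the wrapper without `__len__` disagrees with the wrapped object here, which matches the cause given in the commit message.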


538 Commits