Yuanyuan Chen
fc8ac1216c
[4/N] Remove unused loop variables in tests ( #166690 )
...
This PR removes unused loop variables in tests.
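A minimal, hypothetical illustration of the kind of change involved (not taken from the PR):
```python
# Hypothetical test snippet: the loop variable is unused, so it is replaced
# with "_" to make the intent explicit and satisfy linters.
results = []

# before
for i in range(3):
    results.append("ran")

# after
for _ in range(3):
    results.append("ran")

assert len(results) == 6
```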
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166690
Approved by: https://github.com/justinchuby , https://github.com/mlazos
2025-10-31 10:20:48 +00:00
Sherlock Huang
34d6ef7022
Update gm.print_readable to include Annotation ( #165397 )
...
Sample output
```
[rank0]: # Annotation: {'compile_with_inductor': 'flex_attention'} File: /data/users/bahuang/pytorch/torch/nn/attention/flex_attention.py:1490 in flex_attention, code: out, lse, max_scores = flex_attention_hop(
[rank0]: score_mod_2 = self.score_mod_2
[rank0]: mask_fn_2 = self.mask_fn_2
[rank0]: flex_attention_1 = torch.ops.higher_order.flex_attention(xq_5, xk_5, xv_3, score_mod_2, (2048, 2048, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_indices, 128, 128, mask_fn_2), 0.25, {'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'WRITE_DQ': True, 'OUTPUT_LOGSUMEXP': True, 'OUTPUT_MAX': False}, (), (g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___mask_mod___closure___0_cell_contents,)); xq_5 = xk_5 = xv_3 = score_mod_2 = mask_fn_2 = None
[rank0]: out_2: "bf16[8, 4, 2048, 16]" = flex_attention_1[0]; flex_attention_1 = None
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165397
Approved by: https://github.com/yushangdi , https://github.com/anijain2305 , https://github.com/mlazos
2025-10-28 13:54:38 +00:00
Yuanyuan Chen
fdab48a7c1
Enable all PIE rules on ruff ( #165814 )
...
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
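A minimal sketch of code that would trip these two rules (hypothetical snippets, not taken from the changed files):
```python
import enum

class Color(enum.Enum):
    RED = 1
    CRIMSON = 1  # PIE796: enum contains duplicate value 1 (CRIMSON aliases RED)

for i in range(0, 3):  # PIE808: the explicit 0 start argument is unnecessary
    print(Color.RED, i)
```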
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
PyTorch MergeBot
24520b8386
Revert "Enable all PIE rules on ruff ( #165814 )"
...
This reverts commit c79dfdc655 .
Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863 ))
2025-10-18 07:21:08 +00:00
Yuanyuan Chen
c79dfdc655
Enable all PIE rules on ruff ( #165814 )
...
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
PyTorch MergeBot
e50dc40d28
Revert "Update gm.print_readable to include Annotation ( #165397 )"
...
This reverts commit 7a65770013 .
Reverted https://github.com/pytorch/pytorch/pull/165397 on behalf of https://github.com/malfet due to I don't know how/why, but it breaks windows tests, see 2e22b1a61e/1 ([comment](https://github.com/pytorch/pytorch/pull/165397#issuecomment-3417428128 ))
2025-10-17 22:35:50 +00:00
Sherlock Huang
7a65770013
Update gm.print_readable to include Annotation ( #165397 )
...
Sample output
```
[rank0]: # Annotation: {'compile_with_inductor': 'flex_attention'} File: /data/users/bahuang/pytorch/torch/nn/attention/flex_attention.py:1490 in flex_attention, code: out, lse, max_scores = flex_attention_hop(
[rank0]: score_mod_2 = self.score_mod_2
[rank0]: mask_fn_2 = self.mask_fn_2
[rank0]: flex_attention_1 = torch.ops.higher_order.flex_attention(xq_5, xk_5, xv_3, score_mod_2, (2048, 2048, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_kv_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___q_indices, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_num_blocks, g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___full_q_indices, 128, 128, mask_fn_2), 0.25, {'PRESCALE_QK': False, 'ROWS_GUARANTEED_SAFE': False, 'BLOCKS_ARE_CONTIGUOUS': False, 'WRITE_DQ': True, 'OUTPUT_LOGSUMEXP': True, 'OUTPUT_MAX': False}, (), (g____import_torchtitan_dot_models_dot_attention___flex_attention_block_masks___block_causal___none___mask_mod___closure___0_cell_contents,)); xq_5 = xk_5 = xv_3 = score_mod_2 = mask_fn_2 = None
[rank0]: out_2: "bf16[8, 4, 2048, 16]" = flex_attention_1[0]; flex_attention_1 = None
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165397
Approved by: https://github.com/yushangdi , https://github.com/anijain2305
2025-10-17 18:35:18 +00:00
soulitzer
71aefd5595
[reland] Allow setting grad_dtype on leaf tensors ( #164751 )
...
ghstack-source-id: e44b3941530be83a630ec93f1478eec741ffca2e
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/162815
Relanding due to internal weirdness. Separate PR to codev w/o ghstack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164751
Approved by: https://github.com/albanD
2025-10-08 20:23:13 +00:00
PyTorch MergeBot
3ddf2018d0
Revert "Support setting grad_dtype on leaf tensors ( #162815 )"
...
This reverts commit dca73982c5 .
Reverted https://github.com/pytorch/pytorch/pull/162815 on behalf of https://github.com/yangw-dev due to break internal test D83850533, see more details below ([comment](https://github.com/pytorch/pytorch/pull/162815#issuecomment-3367498501 ))
2025-10-03 23:14:28 +00:00
soulitzer
dca73982c5
Support setting grad_dtype on leaf tensors ( #162815 )
...
`grad_dtype` is a new attribute on Tensor to control gradient dtype:
- Access/setting is leaf-only.
- grad_dtype is respected (1) when assigning to .grad, and (2) in the engine after the previous node produces incoming gradients for AccumulateGrad (see the table below for details).
- Not setting grad_dtype preserves the current behavior. Accessing it returns `t.dtype`.
- `grad_dtype` cannot be set when there is already a `.grad` present and the dtypes conflict.
| `grad_dtype` setting | Setting `.grad` manually | Incoming gradient from autograd engine |
|-----------------------|--------------------------|-----------------------------------------|
| **Default (tensor’s dtype)** | `.grad` must match tensor’s dtype | Engine casts incoming grad to tensor’s dtype |
| **Set to specific dtype** | `.grad` must match that dtype | Engine casts incoming grad to the specified dtype |
| **Set to `None`** | `.grad` may be any dtype | Engine does not cast; accepts incoming grad dtype as-is |
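A minimal usage sketch, assuming the behavior described in the table above (the attribute is introduced by this PR and is not verified here against a released build):
```python
import torch

w = torch.randn(4, requires_grad=True)   # leaf tensor, dtype float32

# Default: grad_dtype reports the tensor's own dtype.
assert w.grad_dtype == torch.float32

# Ask the engine to keep this leaf's gradient in bfloat16.
w.grad_dtype = torch.bfloat16
(w * 2.0).sum().backward()
print(w.grad.dtype)                       # expected: torch.bfloat16

# Setting grad_dtype to None would let .grad hold any dtype with no casting.
```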
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162815
Approved by: https://github.com/albanD
2025-10-02 23:09:07 +00:00
Simon Fan
821458d97a
[dynamo][hop] Introduce Local Map HOP ( #161458 )
...
Can't actually deploy it because of: https://github.com/pytorch/pytorch/issues/161456
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161458
Approved by: https://github.com/ydwu4
2025-09-17 09:32:38 +00:00
PyTorch MergeBot
e7c3f802ff
Revert "[dynamo][hop] Introduce Local Map HOP ( #161458 )"
...
This reverts commit 505458db80 .
Reverted https://github.com/pytorch/pytorch/pull/161458 on behalf of https://github.com/jeffdaily due to broke rocm tests ([comment](https://github.com/pytorch/pytorch/pull/161458#issuecomment-3299230458 ))
2025-09-16 15:14:36 +00:00
Simon Fan
505458db80
[dynamo][hop] Introduce Local Map HOP ( #161458 )
...
Can't actually deploy it because of: https://github.com/pytorch/pytorch/issues/161456
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161458
Approved by: https://github.com/ydwu4
2025-09-16 00:37:40 +00:00
Guilherme Leobas
789d494212
Defer loading hipify until it is needed ( #160824 )
...
Saves a few milliseconds when running a test case:
Before:
```
$ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow
frames [('total', 1), ('ok', 1)]
inline_call []
.
----------------------------------------------------------------------
Ran 1 test in 1.497s
```
After:
```
$ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow
frames [('total', 1), ('ok', 1)]
inline_call []
.
----------------------------------------------------------------------
Ran 1 test in 0.909s
```
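A minimal sketch of the deferred-import pattern (a cheap stand-in module is used here; the actual change defers the hipify import inside PyTorch):
```python
import importlib

_heavy_mod = None

def _get_heavy_mod():
    """Import the expensive module only on first use and cache it."""
    global _heavy_mod
    if _heavy_mod is None:
        # "json" stands in for the expensive module (hipify in the real PR).
        _heavy_mod = importlib.import_module("json")
    return _heavy_mod

# Tests that never reach this code path never pay the import cost.
print(_get_heavy_mod().dumps({"deferred": True}))
```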
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160824
Approved by: https://github.com/zou3519
2025-09-02 15:27:37 +00:00
soulitzer
8171d6052e
Clear custom autograd Function ctx.to_save earlier ( #161171 )
...
Fixes https://github.com/pytorch/pytorch/issues/161186
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161171
Approved by: https://github.com/albanD
2025-09-02 03:26:31 +00:00
PyTorch MergeBot
13b65196db
Revert "Defer loading hipify until it is needed ( #160824 )"
...
This reverts commit 403a3a393c .
Reverted https://github.com/pytorch/pytorch/pull/160824 on behalf of https://github.com/atalman due to Broke slow tests test_utils.py::TestHipifyTrie::test_special_char_export_trie_to_regex [GH job link](https://github.com/pytorch/pytorch/actions/runs/17387051351/job/49355619371 ) [HUD commit link](403a3a393c ) ([comment](https://github.com/pytorch/pytorch/pull/160824#issuecomment-3243281628 ))
2025-09-01 21:34:13 +00:00
Guilherme Leobas
403a3a393c
Defer loading hipify until it is needed ( #160824 )
...
Saves a few milliseconds when running a test case:
Before:
```
$ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow
frames [('total', 1), ('ok', 1)]
inline_call []
.
----------------------------------------------------------------------
Ran 1 test in 1.497s
```
After:
```
$ PYTORCH_TEST_WITH_DYNAMO=1 python test/dynamo/cpython/3_13/test_float.py GeneralFloatCases.test_float_pow
frames [('total', 1), ('ok', 1)]
inline_call []
.
----------------------------------------------------------------------
Ran 1 test in 0.909s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160824
Approved by: https://github.com/zou3519
2025-09-01 20:57:41 +00:00
Boyuan Feng
5f1010fbb3
[Graph Partition] Pass all OSS unit tests ( #154667 )
...
Graph partition leads to a 6.2% speedup on vision_maskrcnn and a 5.8% speedup on yolov3 [P1819700563](https://www.internalfb.com/phabricator/paste/view/P1819700563 ), a 39.5% speedup on speech_transformer inference [P1830602200](https://www.internalfb.com/phabricator/paste/view/P1830602200 ), and an 85% speedup on speech_transformer training [P1831115315](https://www.internalfb.com/phabricator/paste/view/P1831115315 ).
Running the same diff on two different days shows a speedup on average in both runs.
[first TorchInductor Benchmark ci run](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2021%20Jul%202025%2016%3A37%3A55%20GMT&stopTime=Mon%2C%2028%20Jul%202025%2016%3A37%3A55%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=bf/partition-turn-on&lCommit=75ef90fe89b82c967362a2d40fdf1af047202bc2&rBranch=main&rCommit=abcb24f4de11f8fedf2c2c9ff53b6092ef42306d )
<img width="1885" height="752" alt="image" src="https://github.com/user-attachments/assets/13bba9fc-5dbf-42ad-8558-d54f7e367b41 " />
[second TorchInductorBenchmark ci run](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Wed%2C%2023%20Jul%202025%2016%3A38%3A27%20GMT&stopTime=Wed%2C%2030%20Jul%202025%2016%3A38%3A27%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=bf/partition-turn-on&lCommit=66de27e29338c26b1be94733049868cb0309ea52&rBranch=main&rCommit=70d2e9ba455c3c910f6f95b24171c8eee7bc00bf )
<img width="2513" height="1030" alt="image" src="https://github.com/user-attachments/assets/3a413dcb-2314-4292-919a-7ca181f9eeac " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154667
Approved by: https://github.com/eellison
2025-08-12 04:37:58 +00:00
PyTorch MergeBot
09381f5dac
Revert "[Graph Partition] Pass all OSS unit tests ( #154667 )"
...
This reverts commit ca7315c171 .
Reverted https://github.com/pytorch/pytorch/pull/154667 on behalf of https://github.com/clee2000 due to broke inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_lpmf [GH job link](https://github.com/pytorch/pytorch/actions/runs/16885961204/job/47836769279 ) [HUD commit link](ca7315c171 ) note to self: bad TD ([comment](https://github.com/pytorch/pytorch/pull/154667#issuecomment-3176805477 ))
2025-08-11 20:34:27 +00:00
Boyuan Feng
ca7315c171
[Graph Partition] Pass all OSS unit tests ( #154667 )
...
Graph partition leads to a 6.2% speedup on vision_maskrcnn and a 5.8% speedup on yolov3 [P1819700563](https://www.internalfb.com/phabricator/paste/view/P1819700563 ), a 39.5% speedup on speech_transformer inference [P1830602200](https://www.internalfb.com/phabricator/paste/view/P1830602200 ), and an 85% speedup on speech_transformer training [P1831115315](https://www.internalfb.com/phabricator/paste/view/P1831115315 ).
Running the same diff on two different days shows a speedup on average in both runs.
[first TorchInductor Benchmark ci run](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Mon%2C%2021%20Jul%202025%2016%3A37%3A55%20GMT&stopTime=Mon%2C%2028%20Jul%202025%2016%3A37%3A55%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=bf/partition-turn-on&lCommit=75ef90fe89b82c967362a2d40fdf1af047202bc2&rBranch=main&rCommit=abcb24f4de11f8fedf2c2c9ff53b6092ef42306d )
<img width="1885" height="752" alt="image" src="https://github.com/user-attachments/assets/13bba9fc-5dbf-42ad-8558-d54f7e367b41 " />
[second TorchInductorBenchmark ci run](https://hud.pytorch.org/benchmark/compilers?dashboard=torchinductor&startTime=Wed%2C%2023%20Jul%202025%2016%3A38%3A27%20GMT&stopTime=Wed%2C%2030%20Jul%202025%2016%3A38%3A27%20GMT&granularity=hour&mode=inference&dtype=bfloat16&deviceName=cuda%20(h100)&lBranch=bf/partition-turn-on&lCommit=66de27e29338c26b1be94733049868cb0309ea52&rBranch=main&rCommit=70d2e9ba455c3c910f6f95b24171c8eee7bc00bf )
<img width="2513" height="1030" alt="image" src="https://github.com/user-attachments/assets/3a413dcb-2314-4292-919a-7ca181f9eeac " />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154667
Approved by: https://github.com/eellison
2025-08-11 16:25:12 +00:00
ghostspiders
af10f1f86c
Fix requires_cuda to requires_cuda_and_triton ( #160222 )
...
Fixes #159399
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160222
Approved by: https://github.com/janeyx99
2025-08-10 07:05:52 +00:00
gaoyvfeng
50f23ff6f8
Rename HAS_CUDA to HAS_CUDA_AND_TRITON ( #159883 )
...
Fixes #159399
"Modified torch.testing._internal.inductor_utils and test/inductor"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159883
Approved by: https://github.com/janeyx99
2025-08-08 15:44:52 +00:00
yuchengliu1
1cdd665526
fix test_verbose_logs_dynamic_shapes with MSVC ( #159573 )
...
The `typeid` operator produces different output in different compilers. There is a good example on [cppreference](https://www.en.cppreference.com/w/cpp/language/typeid.html ).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159573
Approved by: https://github.com/angelayi , https://github.com/jansel
2025-08-04 15:56:53 +00:00
xinan.lin
2ffb510942
[Break XPU][Inductor UT] Fix failures introduced by community. ( #159463 )
...
Fixes #159000 , Fixes #159335 , Fixes #159334 , Fixes #159332 , Fixes #159331 , Fixes #159330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159463
Approved by: https://github.com/jansel
2025-07-31 08:37:41 +00:00
Xuehai Pan
17687eb792
[BE][4/6] fix typos in test/ (test/inductor/) ( #157638 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157638
Approved by: https://github.com/yewentao256 , https://github.com/jansel
2025-07-06 06:34:25 +00:00
Xuehai Pan
f5e6e52f25
[BE][PYFMT] migrate PYFMT for test/inductor/ to ruff format ( #148186 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148186
Approved by: https://github.com/jansel
2025-06-24 11:12:11 +00:00
Simon Fan
c06c2569ee
[ca] Support TorchDispatchMode via pass through ( #156516 )
...
The CA initial trace just proxies nodes without dispatching any ops, so we should hide it from ambient TorchDispatchModes.
In terms of differences with eager autograd engine:
- For function mode, CA additionally disables/re-enables `_set_multithreading_enabled`
- For dispatch mode:
  - accumulate grad doesn't go down the stealing path (inaccurate compile-time refcount), so the grad `detach` ops are `copy_` instead
  - Since we always initial trace with dynamic shapes, and we filter out sizes, there's 1 aten.empty.memory_format for each mark_dynamic'd scalar
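For reference, a minimal TorchDispatchMode of the kind referred to above as an "ambient" mode; this only illustrates what such a mode observes and is not the PR's implementation:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):
    """Sees every aten op dispatched while the mode is active."""
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print("dispatched:", func)
        return func(*args, **kwargs)

with LoggingMode():
    torch.ones(2) + 1  # ops here are logged; CA's proxy-only trace is hidden
```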
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156516
Approved by: https://github.com/jansel
ghstack dependencies: #156374 , #156509
2025-06-21 18:33:47 +00:00
Arsh Zahed
c09b054878
Add runtime profiler info for AOTDispatcher prologue ( #155785 )
...
Fixes #155721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155785
Approved by: https://github.com/bdhirsh
2025-06-21 03:34:07 +00:00
Simon Fan
17b38b850e
[ca] Allow using compiled autograd context managers during backward runtime ( #156120 )
...
Added an invariant that nested compiled autograd context managers must exit before their parent context manager. This allows us to defer the thread check.
FIXES https://github.com/pytorch/pytorch/issues/152219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156120
Approved by: https://github.com/jansel
ghstack dependencies: #155521 , #155480
2025-06-18 03:01:15 +00:00
Simon Fan
9ff9c28fe8
[ca] Functionalize AccumulateGrad ( #155521 )
...
This PR changes compiled autograd's handling of gradient accumulation by proxying it as a `call_accumulate_grad`, which does the .grad mutation in Python bytecode for dynamo to see. For eager, the only change is that the leaf invariant check was moved up.
Before:
- Compiled Autograd Engine: proxies call to inductor accumulate_grad op
- Dynamo: polyfills the inductor accumulate_grad op (not respecting all of the AccumulateGrad implementation, e.g. sparse gradients and the gradient layout contract)
```python
new_grad_strided: "f32[s21]" = torch.empty_like(getitem_1); getitem_1 = None
copy_: "f32[s21]" = new_grad_strided.copy_(aot3_tangents_1); copy_ = None
```
- AOTAutograd: functionalizes the copy_
After:
- Compiled Autograd Engine: proxies call to `call_accumulate_grad`, which calls `torch._dynamo.compiled_autograd.ops.AccumulateGrad`/`AccumulateGrad_apply_functional_no_hooks_ivalue`, similar to other functional autograd implementations, but also sets .grad from python. Hooks are still handled separately from this call.
- Dynamo: `torch._dynamo.compiled_autograd.ops.AccumulateGrad` was allow_in_graph'd
- AOTAutograd: traces into the op, with FunctionalTensors.
While functionalizing the tensors, we insert an autograd Error node to ensure that we don't use the autograd meta from tracing. This clashes with the "leaf variable has been moved into the graph interior" error check; I could not find a way to identify a FunctionalTensor subclass from C++, so I bypass that check for Error nodes in the compiled case.
In the CI PR, this fixes 19 tests relating to sparse tensors; more fixes are hidden by an earlier failure in dynamo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155521
Approved by: https://github.com/jansel
2025-06-16 18:45:02 +00:00
Simon Fan
212575f994
[ca] Annotate AccumulateGrad branching and add polyfill tests ( #155289 )
...
Annotates AccumulateGrad and tracks the semantics of AccumulateGrad's polyfill, except for Scenario 1.4 (cloning MKLDNN new_grad) and Scenario 2.2 (vmap-incompatible).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155289
Approved by: https://github.com/jansel , https://github.com/albanD
2025-06-12 02:10:52 +00:00
Shunting Zhang
61e13782dd
[inductor] handle -1 for pointless view pairs ( #155295 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155295
Approved by: https://github.com/laithsakka , https://github.com/jansel
2025-06-11 22:20:36 +00:00
Simon Fan
87b002b6fb
[ca] make torch.compile API respect ambient disable contexts ( #155473 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155473
Approved by: https://github.com/jansel
2025-06-11 19:09:29 +00:00
Animesh Jain
271ca679a8
[reland][dynamo] Record the pre-graph bytecode using fast record function event ( #154974 )
...
reland of https://github.com/pytorch/pytorch/pull/154769
@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154974
Approved by: https://github.com/Lucaskabela , https://github.com/jansel
2025-06-06 13:11:03 +00:00
PyTorch MergeBot
e01fde8213
Revert "[reland][dynamo] Record the pre-graph bytecode using fast record function event ( #154974 )"
...
This reverts commit bee9c70c5d .
Reverted https://github.com/pytorch/pytorch/pull/154974 on behalf of https://github.com/malfet due to Broke inductor tests, see 3c72b9fd8f/1 ([comment](https://github.com/pytorch/pytorch/pull/154974#issuecomment-2944370617 ))
2025-06-05 13:36:21 +00:00
Animesh Jain
bee9c70c5d
[reland][dynamo] Record the pre-graph bytecode using fast record function event ( #154974 )
...
reland of https://github.com/pytorch/pytorch/pull/154769
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154974
Approved by: https://github.com/Lucaskabela , https://github.com/jansel
2025-06-05 07:25:04 +00:00
PyTorch MergeBot
a7e496a896
Revert "[dynamo] Record the pre-graph bytecode using fast record function event ( #154769 )"
...
This reverts commit 409c396a48 .
Reverted https://github.com/pytorch/pytorch/pull/154769 on behalf of https://github.com/seemethere due to This fails internal tests see [fburl.com/diff/67gyp7gp](https://fburl.com/diff/67gyp7gp ) ([comment](https://github.com/pytorch/pytorch/pull/154769#issuecomment-2933629894 ))
2025-06-03 06:13:49 +00:00
Animesh Jain
409c396a48
[dynamo] Record the pre-graph bytecode using fast record function event ( #154769 )
...

Adds another event to the profiler traces. This can help us find models where the pre-graph bytecode is very expensive.
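A hedged sketch of how one might look for such an event in a trace (the exact event name added by this PR is not reproduced here):
```python
import torch
from torch.profiler import profile, ProfilerActivity

def fn(x):
    return torch.relu(x) + 1

compiled = torch.compile(fn)
x = torch.randn(8)
compiled(x)  # compile outside the profiled region

with profile(activities=[ProfilerActivity.CPU]) as prof:
    compiled(x)

# Pre-graph bytecode cost would show up among the recorded events.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```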
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154769
Approved by: https://github.com/zou3519 , https://github.com/williamwen42 , https://github.com/StrongerXi , https://github.com/jansel
2025-06-02 22:33:27 +00:00
Simon Fan
d2f506cae8
[ca] disable ca for functorch grad and run all HOO tests ( #154147 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154147
Approved by: https://github.com/zou3519
ghstack dependencies: #154133
2025-05-28 22:06:13 +00:00
Simon Fan
857f21631d
[ca] fix hop_db tests ( #154133 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154133
Approved by: https://github.com/zou3519
2025-05-28 22:06:13 +00:00
Simon Fan
d1f1ff8610
[ddp] propagate use_python_reducer to C++ reducer ( #152735 )
...
The C++ Reducer is silently incorrect under CA; its implementation no-ops the collective. I'm guessing that it was no-op'd because, in DDP + python reducer, the C++ reducer is still being initialized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152735
Approved by: https://github.com/fegin
ghstack dependencies: #153300 , #152689
2025-05-16 01:38:03 +00:00
Simon Fan
5aea57d653
[ca][dynamo] always run eager checkpoint region's recomputation in eager ( #153300 )
...
I slap disable on the recomputation hook; otherwise the partitioner may save fewer or more activations and mismatch the expected eager count in checkpoint. See the code comment `Note: [compiled autograd and checkpoint unpack hook]`.
This fixes all non-nested checkpointing tests. I also wrap nested checkpointing tests, and a few of them still fail.
This also seems to fix all PYTORCH_TEST_WITH_DYNAMO checkpointing tests except for `TestAutograd.test_checkpointing_without_reentrant_custom_function_works`. For those tests, it looks like we fail to HOPify the checkpointed region, and when the backward executes the unpack hooks, dynamo tries to trace them. This messes up the internal state tracking of checkpointing, with some tests raising _StopRecomputationError and others raising the same count mismatch error as CA.
FIXES https://github.com/pytorch/pytorch/issues/127115
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153300
Approved by: https://github.com/jansel
2025-05-16 01:37:48 +00:00
PyTorch MergeBot
236b08cbf8
Revert "[ca][dynamo] always run eager checkpoint region's recomputation in eager ( #153300 )"
...
This reverts commit 4863e5c843 .
Reverted https://github.com/pytorch/pytorch/pull/153300 on behalf of https://github.com/malfet due to Looks like it breaks rocm, see fa8543454a/1 ([comment](https://github.com/pytorch/pytorch/pull/153300#issuecomment-2884489459 ))
2025-05-15 16:58:52 +00:00
Simon Fan
4863e5c843
[ca][dynamo] always run eager checkpoint region's recomputation in eager ( #153300 )
...
I slap disable on the recomputation hook; otherwise the partitioner may save fewer or more activations and mismatch the expected eager count in checkpoint. See the code comment `Note: [compiled autograd and checkpoint unpack hook]`.
This fixes all non-nested checkpointing tests. I also wrap nested checkpointing tests, and a few of them still fail.
This also seems to fix all PYTORCH_TEST_WITH_DYNAMO checkpointing tests except for `TestAutograd.test_checkpointing_without_reentrant_custom_function_works`. For those tests, it looks like we fail to HOPify the checkpointed region, and when the backward executes the unpack hooks, dynamo tries to trace them. This messes up the internal state tracking of checkpointing, with some tests raising _StopRecomputationError and others raising the same count mismatch error as CA.
FIXES https://github.com/pytorch/pytorch/issues/127115
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153300
Approved by: https://github.com/jansel
2025-05-15 08:10:35 +00:00
Simon Fan
216e28f7e9
[ca] run xfails up until their last passing backend ( #153279 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153279
Approved by: https://github.com/jansel
ghstack dependencies: #153193 , #153222
2025-05-13 16:42:10 +00:00
Simon Fan
a80eb84a5f
[ca] support higher order gradients (create_graph=True) ( #153222 )
...
Adds create_graph support if you don't compile, or compile only with torch.compile(backend="eager").
Using a backend that uses AOTDispatch produces a post-dispatch AOT backward, whose double backward will be silently incorrect if the forward trace involved any ops that are not composite implicit.
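For context, the plain eager create_graph pattern this refers to (a double-backward example, independent of compiled autograd):
```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)  # 3 * x**2 = 27
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)                # 6 * x    = 18
print(dy_dx.item(), d2y_dx2.item())
```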
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153222
Approved by: https://github.com/jansel
ghstack dependencies: #153193
2025-05-13 16:42:09 +00:00
Simon Fan
37efaf4af9
[ca][api] config api shouldn't error with optimize_assert ( #153193 )
...
Toggling on `torch._dynamo.config.compiled_autograd = True` was causing export to error (optimize_assert didn't have `rebuild_ctx` defined). Separately, add a way to `rebuild_ctx` for `optimize_assert` since it is a public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153193
Approved by: https://github.com/jansel
2025-05-13 16:42:02 +00:00
Simon Fan
500cbeee4e
[dynamo][ca] support dynamic annotations on tensors in ListVariables/TupleVariables ( #152119 )
...
Together with https://github.com/pytorch/pytorch/pull/151962 , FIXES https://github.com/pytorch/pytorch/issues/133575
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152119
Approved by: https://github.com/jansel
ghstack dependencies: #151731 , #151962
2025-05-08 15:12:16 +00:00
Simon Fan
6dea8ef555
[ca] hide unused scalar int sizes from dynamo ( #151962 )
...
together with https://github.com/pytorch/pytorch/pull/151731 , FIXES https://github.com/pytorch/pytorch/issues/113129 https://github.com/pytorch/pytorch/issues/146168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151962
Approved by: https://github.com/jansel
ghstack dependencies: #151731
2025-05-08 15:12:16 +00:00
Simon Fan
8f380b239f
[ca] mark scalar int sizes as dynamic via tensor wrapping ( #151731 )
...
This is the only way to support dynamic shapes on scalars right now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151731
Approved by: https://github.com/jansel
2025-05-08 15:12:08 +00:00