Commit Graph

525 Commits

Author SHA1 Message Date
Yuanyuan Chen
3255e7872b Enable all flake8-logging-format rules (#164655)
These rules are enabled by removing existing suppressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655
Approved by: https://github.com/janeyx99, https://github.com/mlazos
2025-10-19 00:59:28 +00:00
Yuanyuan Chen
fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
PyTorch MergeBot
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc655.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
Yuanyuan Chen
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
Tugsbayasgalan Manlaibaatar
08c97b4a1f Don't run compile inside kernel invocation (#165687)
When we call torch.compile during fake tensor prop, we shouldn't actually compile because we can't guarantee that the compiled artifact can be fake tensor prop-d. (for example, inductor backend). Instead we should just skip compiling. However, the inner compile will be triggered when being executed in runtime.

Fixes: https://github.com/pytorch/pytorch/issues/151328

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165687
Approved by: https://github.com/zou3519
2025-10-17 19:03:57 +00:00
Animesh Jain
c9b2a09530 [export] Turn on install_free_tensors flag (#164691)
The final step in removing the discrepancy between
torch.compile(fullgraph=True) and torch.export(strict=True).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164691
Approved by: https://github.com/avikchaudhuri
2025-10-14 15:33:50 +00:00
PyTorch MergeBot
fa3916f466 Revert "[export] Turn on install_free_tensors flag (#164691)"
This reverts commit 220a34118f.

Reverted https://github.com/pytorch/pytorch/pull/164691 on behalf of https://github.com/seemethere due to Breaks some internal things, both me and author agreed that revert was the best course of action ([comment](https://github.com/pytorch/pytorch/pull/164691#issuecomment-3400013759))
2025-10-14 03:58:12 +00:00
PyTorch MergeBot
1803d40c99 Reapply "[export] Turn on install_free_tensors flag (#164691)" (#165353)
This reverts commit 9166f6120f.

Reverted https://github.com/pytorch/pytorch/pull/165353 on behalf of https://github.com/seemethere due to This is causing merge conflicts since a dependent PR wasn't reverted ([comment](https://github.com/pytorch/pytorch/pull/165353#issuecomment-3400006587))
2025-10-14 03:52:50 +00:00
Animesh Jain
9166f6120f Revert "[export] Turn on install_free_tensors flag (#164691)" (#165353)
This reverts commit 220a34118f.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165353
Approved by: https://github.com/seemethere
2025-10-13 23:40:11 +00:00
Animesh Jain
220a34118f [export] Turn on install_free_tensors flag (#164691)
The final step in removing the discrepancy between
torch.compile(fullgraph=True) and torch.export(strict=True).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164691
Approved by: https://github.com/avikchaudhuri
2025-10-11 04:26:09 +00:00
PyTorch MergeBot
34ac9b61cb Revert "[export] Turn on install_free_tensors flag (#164691)"
This reverts commit 0e9b3a772a.

Reverted https://github.com/pytorch/pytorch/pull/164691 on behalf of https://github.com/izaitsevfb due to breaks tests internally, author asked to revert, see [D84230990](https://www.internalfb.com/diff/D84230990) ([comment](https://github.com/pytorch/pytorch/pull/164691#issuecomment-3387718323))
2025-10-09 22:53:50 +00:00
Animesh Jain
0e9b3a772a [export] Turn on install_free_tensors flag (#164691)
The final step in removing the discrepancy between
torch.compile(fullgraph=True) and torch.export(strict=True).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164691
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #164721
2025-10-09 03:25:15 +00:00
Maggie Moss
c855f8632e Pyrefly suppressions 7/n (#164913)
Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
 INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913
Approved by: https://github.com/oulgen
2025-10-08 07:27:17 +00:00
Animesh Jain
0e5773b7fa [dynamo][export] Do not graph break on torch.autograd._profiler_enabled for export (#164418)
Actually we would like to not graph break even in the case of Dynamo. But there is a weird-unsolved bug with Kineto + Dynamo when there are distributed jobs that lead to NCCL timeouts. This bug is a rare edege case, but we have not been able to root cause it yet.

But for export, we do not anticipate JIT tracing in distributed job training and therefore this PR is safe for export.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164418
Approved by: https://github.com/StrongerXi, https://github.com/williamwen42
2025-10-02 09:00:00 +00:00
James Wu
bfe9e60ffb Simplify PrecompileContext to no longer be a CacheArtifactManager (#162886)
Summary:
This diff does a big refactor of PrecompileContext to make it considerably simpler: instead of being a CacheArtifactManager and managing a bunch of bytes, it simply stores two things: dynamo cache entries and backend cache entries. When asked, it stitches them together into PrecompileCacheEntries, which are stored by DynamoCache.

This structure then allows us to register DynamoCache to the regular Megacache API, instead of having two separate APIs that are confusing. It also lets us remove the autotune cache integration, since MegaCache API will automatically store autotune cache entries.

The intent here is that users who want to use caching precompile will simply be able to use torch.compiler.save_cache_artifacts as before, just with `torch.dynamo.config.caching_precompile` set to True. They can also directly interact with PrecompileContext if they wish to specifically only load Precompile entries, using PrecompileContext.create_cache_entries().

Saving single entries and such with DynamoCache still works normally.

Test Plan:
All existing unit tests pass.

Rollback Plan:

Differential Revision: D82380307

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162886
Approved by: https://github.com/zhxchen17
2025-09-20 01:24:37 +00:00
Scott Wolchok
6b608dfe81 Add DISABLE_JUSTKNOBS to torch/_utils_internal.py and use it for dynamo _maybe_set_eval_frame (#162298)
If JustKnobs is disabled (as it always is in OSS), we can easily avoid an extra layer of Python function call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162298
Approved by: https://github.com/ezyang
2025-09-15 23:00:39 +00:00
James Wu
e156a07171 [Precompile] [RFC] Implement aot_compile_module (#162171)
This PR adds a new interface _aot_compile to `OptimizedModule`, so that the following is possible:

```
mod = SimpleLinearModule()
inputs = [
            ModelInput(
                args=(torch.randn(3, 3),),
                kwargs={},
                contexts=[torch.no_grad(), eval_mode(model)],
            ),
            ModelInput(
                args=(torch.randn(3, 3),), kwargs={}, contexts=[train_mode(model)]
            ),
        ]
        assert isinstance(model, torch._dynamo.eval_frame.OptimizedModule)
        model._aot_compile(
            inputs,
        )
```

After this PR, you can AOT precompile NanoGPT and use it to train directly. I'll share my fork of the repo to make this work.

## ModelInput
The `ModelInput` API is a work in progress; for now it represents a set of inputs and contexts to instruct the compiler to compile. Most commonly, this is "compile an eval mode with no grad, and  a training mode with grad", but also contains things like autocasting contexts, etc.

## Dispatch
Dispatching is super simple here, we just iterate through all the precompiled fullgraphs and check guards for each one until there's one htat passes. I'm a bit worried that having this in python code is going to be too expensive. The guard checks are happening in C++ anyway, though, so the only python bottlenecked step here is just the for loop, so perhaps the overhead will not be high. I'll work on measuring this, though.

## TODOs

This PR does not support `mod.compile()`, only `torch.compile(mod)`. In order to support `mod.compile()`, we'll need to update torch.nn.Module with an updated implementation — I can add that frontend later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162171
Approved by: https://github.com/zhxchen17
2025-09-14 23:32:28 +00:00
Tugsbayasgalan Manlaibaatar
047603d35b New export implementation with flat inp/out (#162167)
This is my first attempt of building new export API. The main thing it addresses is correctly getting input and output relations. Subsequent diffs willl add functionality for dynamic shapes, nn_module_stack etc.

Differential Revision: [D81793205](https://our.internmc.facebook.com/intern/diff/D81793205)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162167
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
2025-09-06 20:03:52 +00:00
William Wen
f36f285953 [dynamo] change error_on_graph_break/fullgraph semantics (#161747)
This PR implements the semantics change to `torch._dynamo.error_on_graph_break`:
- ~`torch.compile` now has a new `error_on_graph_break` kwarg that serves as a lower-priority toggle for erroring/continuing on graph breaks~
- `error_on_graph_break` is a new internal `torch.compile `setting that is lower-priority than `fullgraph`. It allows the user to toggle erroring/continuing on graph breaks.
- `error_on_graph_break` does nothing when `fullgraph=True`
- `error_on_graph_break` does NOT guarantee a single graph

Followup [DONE]: need to change the programming model docs to reflect the 3 graph break modes for compilation:
- `fullgraph=True`: enforce one graph, no graph breaks, cannot be toggled
- `fullgraph=False, error_on_graph_break=True`: errors on graph breaks, latter can be toggled during compile time
- `fullgraph=False, error_on_graph_break=False`: resumes tracing on graph breaks, latter can be toggled during compile time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161747
Approved by: https://github.com/mlazos
ghstack dependencies: #161739
2025-09-04 17:10:17 +00:00
William Wen
8678d831c4 [dynamo] rename set_fullgraph to error_on_graph_break (#161739)
Renaming `set_fullgraph` to `error_on_graph_break` for now. There are no semantic differences yet. In a followup PR, we will introduce a new `torch.compile` option `error_on_graph_break` that has lower priority than `fullgraph` so that `fullgraph` really returns 1 graph.

I could keep `set_fullgraph` as a deprecated alias for `error_on_graph_break` for now, but I'm hoping that won't be necessary since it's still private API (there are no internal callsites yet, and there are no significant OSS callsites yet).

 cc @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela @mlazos @guilhermeleobas @xmfan as primary users for `set_fullgraph`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161739
Approved by: https://github.com/xmfan, https://github.com/Lucaskabela, https://github.com/anijain2305, https://github.com/mlazos
2025-09-04 01:15:06 +00:00
Avik Chaudhuri
3c45af079a kill allow_complex_guards_as_runtime_asserts (#161794)
Summary:
[reland]
Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was a export-only concept).

Test Plan:
updated tests

Rollback Plan:

Differential Revision: D81334984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161794
Approved by: https://github.com/zhxchen17
2025-09-04 00:17:01 +00:00
PyTorch MergeBot
47742081c9 Revert "kill allow_complex_guards_as_runtime_asserts (#160198)"
This reverts commit 69d91b94ba.

Reverted https://github.com/pytorch/pytorch/pull/160198 on behalf of https://github.com/jeffdaily due to let's revert again instead of waiting for forward fix, see earlier comments ([comment](https://github.com/pytorch/pytorch/pull/160198#issuecomment-3235165462))
2025-08-28 22:50:37 +00:00
Avik Chaudhuri
69d91b94ba kill allow_complex_guards_as_runtime_asserts (#160198)
Summary: Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was a export-only concept).

Test Plan:
updated tests

Rollback Plan:

Differential Revision: D79903317

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160198
Approved by: https://github.com/ezyang
2025-08-28 19:36:19 +00:00
PyTorch MergeBot
a8270dd124 Revert "kill allow_complex_guards_as_runtime_asserts (#160198)"
This reverts commit 196232bb93.

Reverted https://github.com/pytorch/pytorch/pull/160198 on behalf of https://github.com/atalman due to dynamo/test_activation_checkpointing.py::ActivationCheckpointingViaTagsTestsCUDA::test_compile_selective_checkpoint_triton_kernel_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/17289619543/job/49074475338) [HUD commit link](196232bb93) ([comment](https://github.com/pytorch/pytorch/pull/160198#issuecomment-3234013520))
2025-08-28 15:40:37 +00:00
Avik Chaudhuri
196232bb93 kill allow_complex_guards_as_runtime_asserts (#160198)
Summary: Since `allow_complex_guards_as_runtime_asserts` is now sync'd with `prefer_deferred_runtime_asserts_over_guards`, we can kill the former (especially since it was a export-only concept).

Test Plan:
updated tests

Rollback Plan:

Differential Revision: D79903317

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160198
Approved by: https://github.com/ezyang
2025-08-28 07:59:29 +00:00
zhxchen17
c36d18d7e8 [rfc] aot precompile with custom backend api (#161383)
Adding a new feature to torch.compile(fullgraph=True) which "aot_compile" a function with given example inputs.

On user side it should look like:

```
def foo(x, y):
    return x + y

compiled_fn = torch.compile(fullgraph=True).aot_compile(((torch.randn(3, 4), torch.randn(3, 4)), {}))
```

This is different from the traditional `torch.compile` workflow where compiled object will be a drop-in replacement for the original eager model:
```
tensor input -> torch.compile() -> tensor output (and populates the cache entry)
```
`aot_compile` will instead return a compiled function as result, and it's purely functional and doesn't populate the compile cache entry in dynamo:
```
tensor input -> aot_compile() -> compiled function
```
The aot compiled function will be savable and loadable on disk as well:
```
torch.compile(fullgraph=True).aot_compile(...).save_compiled_function('my/path')
compiled_fn = torch.compiler.load_compiled_function("my/path")
```

Right now we treat compiler backend as a blackbox and it needs to implement the following interface to make compile artifacts serialzable:
```
class SerializableCallable:
    def save_compile_artifacts(): ....
    def load_compile_artifacts(): ....
```
We haven't implemented this for inductor yet, but this shouldn't be an issue since we gate this feature through `torch._dynamo.config.aot_compile` (which defaults to False), and this will be left as follow up PR to the current PR.

Differential Revision: [D80914270](https://our.internmc.facebook.com/intern/diff/D80914270/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161383
Approved by: https://github.com/tugsbayasgalan
2025-08-27 21:26:25 +00:00
William Wen
8b78ba07b1 [dynamo, nested graph breaks] add nested graph break tests (#144516)
Note: nested graph break tests (and wrapped tests) are xfailed/skipped for now - we will iteratively enable the tests as more of the nested graph break implementation is complete.

Differential Revision: [D81084809](https://our.internmc.facebook.com/intern/diff/D81084809)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144516
Approved by: https://github.com/anijain2305
2025-08-27 03:00:56 +00:00
angelayi
4d078cfc4e [fx] Add is_fx_symbolic_tracing flag (#161385)
Fixes https://github.com/pytorch/pytorch/issues/135276

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161385
Approved by: https://github.com/pianpwk
2025-08-26 22:26:27 +00:00
PyTorch MergeBot
6686974ddd Revert "[dynamo, nested graph breaks] add nested graph break tests (#144516)"
This reverts commit 9a756c2d71.

Reverted https://github.com/pytorch/pytorch/pull/144516 on behalf of https://github.com/atalman due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/144516#issuecomment-3225659358))
2025-08-26 20:40:17 +00:00
Zhengxu Chen
74124d1b46 [reland] [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#161514)
Summary:
convert_frame.compile_frame used to take a callback transform function which will capture the frame object it has, but the frame information is not passed directly into compile_frame function.

This PR changes the signature of compile_frame so that frame information is directly passed in the function without taking a callback. This makes it easier to build fullgraph capture API on top of compile_frame.

Test Plan:
CI

Rollback Plan:

Differential Revision: D81041296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161514
Approved by: https://github.com/tugsbayasgalan
2025-08-26 19:16:05 +00:00
PyTorch MergeBot
e795450a35 Revert "[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)"
This reverts commit 447d34b5f8.

Reverted https://github.com/pytorch/pytorch/pull/160900 on behalf of https://github.com/atalman due to reverting since can't land existing diff internally, will need to reland it ([comment](https://github.com/pytorch/pytorch/pull/160900#issuecomment-3224029031))
2025-08-26 12:45:59 +00:00
William Wen
9a756c2d71 [dynamo, nested graph breaks] add nested graph break tests (#144516)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144516
Approved by: https://github.com/anijain2305
ghstack dependencies: #157971, #159281
2025-08-26 00:57:58 +00:00
zhxchen17
447d34b5f8 [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)
convert_frame.compile_frame used to take a callback transform function which will capture the frame object it has, but the frame information is not passed directly into compile_frame function.

This PR changes the signature of compile_frame so that frame information is directly passed in the function without taking a callback. This makes it easier to build fullgraph capture API on top of compile_frame.
@exported-using-ghexport

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801/)

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160900
Approved by: https://github.com/tugsbayasgalan, https://github.com/anijain2305
2025-08-25 23:16:21 +00:00
PyTorch MergeBot
3e210f90c2 Revert "[dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)"
This reverts commit 1113e7de30.

Reverted https://github.com/pytorch/pytorch/pull/160900 on behalf of https://github.com/atalman due to executorch failure ([comment](https://github.com/pytorch/pytorch/pull/160900#issuecomment-3221372096))
2025-08-25 18:56:18 +00:00
zhxchen17
1113e7de30 [dynamo] Refactor convert_frame.compile_frame to be self contained function. [5/n] (#160900)
convert_frame.compile_frame used to take a callback transform function which will capture the frame object it has, but the frame information is not passed directly into compile_frame function.

This PR changes the signature of compile_frame so that frame information is directly passed in the function without taking a callback. This makes it easier to build fullgraph capture API on top of compile_frame.
@exported-using-ghexport

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801/)

Differential Revision: [D80469801](https://our.internmc.facebook.com/intern/diff/D80469801)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160900
Approved by: https://github.com/tugsbayasgalan, https://github.com/anijain2305
2025-08-25 14:53:54 +00:00
Lucas Kabela
d52bb67ac3 typing registry.py (#160367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160367
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363, #160364, #160365, #160366
2025-08-15 02:09:31 +00:00
Lucas Kabela
9faca5f260 typing debugging.py (#160364)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160364
Approved by: https://github.com/Skylion007
ghstack dependencies: #160362, #160363
2025-08-15 02:09:31 +00:00
Zhengxu Chen
16d15445f8 Fullgraph graph capture with dynamo. (#159749)
Summary:
Following up on Avik's doc https://docs.google.com/document/d/11RW0Bbkp1QwFbEu8rCNW5d7wUFaEkxbL0uLyqcc2jTk/edit?tab=t.0

We are experimenting with a new API which utilizes torch.compile(fullgraph=True) and intend to use it to replace the old dynamo.export() API.

This PR adds a prototype for the API described in the doc.

Test Plan:
test_misc -- -k test_aot_capture

Rollback Plan:

Differential Revision: D79534608

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159749
Approved by: https://github.com/tugsbayasgalan
2025-08-12 22:06:18 +00:00
Lucas Kabela
2b1ae29960 [Dynamo][Better Engineering] Add typing annotations to guard and source (#158397) (#159491)
Summary:
X-link: https://github.com/pytorch/executorch/pull/12986

As part of better engineering week, we would like to improve out type support to improve dev experience in dynamo

This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py`

Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Unannotated | Lines Total | % lines covered | Funcs Unannotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  1227 | 2208 | 55.57% | 207 | 362 | 57.18% |
| This PR | 2217 | 2217 | 100.00% | 362 | 362 | 100.00% |
| Delta    | +990 | +9 | +44.43% | +155 | 0 | +42.82% |

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 jerryzh168 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Reviewed By: JacobSzwejbka, yangw-dev

Differential Revision: D79199389

Pulled By: Lucaskabela

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159491
Approved by: https://github.com/anijain2305, https://github.com/yangw-dev
2025-07-30 22:57:50 +00:00
PyTorch MergeBot
d987a6f7f0 Revert "[Dynamo][Better Engineering] Add typing annotations to guard and source (#158397)"
This reverts commit abcb24f4de.

Reverted https://github.com/pytorch/pytorch/pull/158397 on behalf of https://github.com/yangw-dev due to Suggested to fix failing internal signals on D78911890 ([comment](https://github.com/pytorch/pytorch/pull/158397#issuecomment-3133823766))
2025-07-29 19:49:40 +00:00
zhxchen17
90c241dedd [precompile] Support user defined function calls from bytecode. (#158947)
Previously precompile was implemented under the assumption that dynamo always inlines the user code and generate resume functions when a graph break is hit. In cases like nanogpt training, there exists nontrivial amount of code causing dynamo to fail the speculation and stop inlining certain type of user function. This results in more code objects to be tracked by CompilePackage.

Since these new code objects are user defined, we need to also serialize the location of these code so that we can load the precompile entries to the these code objects in another process.

With this fix, we are able to run nanogpt inference+training with precompile under torchbench.

Differential Revision: [D78691422](https://our.internmc.facebook.com/intern/diff/D78691422/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158947
Approved by: https://github.com/jamesjwu
2025-07-24 20:10:57 +00:00
Lucas Kabela
abcb24f4de [Dynamo][Better Engineering] Add typing annotations to guard and source (#158397)
As part of better engineering week, we would like to improve out type support to improve dev experience in dynamo

This PR adds strict typing support to a critical set of files for dynamo, `source.py` and the base `_guards.py`

Running
```
mypy torch/_dynamo/source.py torch/_guards.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Unannotated | Lines Total | % lines covered | Funcs Unannotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  1227 | 2208 | 55.57% | 207 | 362 | 57.18% |
| This PR | 2217 | 2217 | 100.00% | 362 | 362 | 100.00% |
| Delta    | +990 | +9 | +44.43% | +155 | 0 | +42.82% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158397
Approved by: https://github.com/anijain2305
2025-07-24 15:55:18 +00:00
James Wu
f55c5d085e [Precompile] Various small bugfixes, add CachingPrecompile to torchbench (#158847)
This PR addresses a few small bugfixes needed to make NanoGPT inference work, and also adds a new `--caching-precompile` argument to torchbench. With `--caching-precompile`, after every benchmark we save precompile artifacts to DynamoCache, allowing us to test caching precompile on all existing benchmarks.

The following bugfixes are in this PR to make all of this work:
- Fix global variables being pruned with DUPLICATE_INPUT guards. DUPLICATE_INPUT guards have additional vars from the second input, which we track with additional_local_vars, but we never tracked additional global variables. This fixes the issue. (See torch/_dynamo/guards.py changes)
- Return None from PRecompileContext.serialize() if no new dynamo compiles occurred. There's no reason to save artifacts (i.e. autotuning artifacts, etc) if no dynamo_compile occurred, so we return None early. We may later want to support editing existing dynamo artifacts as a TODO, but that's upcoming.
- log `dynamo_start` on CompilePackage.load: This is only needed so that tlparse doesn't ignore TORCH_TRACE logs generated when caching precompile hits. If there are no actual compiles, we never log a "dynamo_start" entry, which makes internal tlparse ignore the TORCH_TRACE file.

## Test Plan

After this PR, the following now works:
```
TORCH_LOGS=dynamo tlp python benchmarks/dynamo/torchbench.py --only nanogpt --performance  --inference --backend inductor  --caching-precompile --warm-start-latency
```
tlparse result (internal):
Cold Start (6 seconds):
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_vk9nkp4m.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

Warm Start (~1 s):
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_5l4iwrpm.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

The 1 second of warm start here can be improved: the costs here are mostly in starting up workers and triton and initializing CUDA, a lot of which should not be included in the compile time cost in real world scenarios where these are already loaded before training begins.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158847
Approved by: https://github.com/zhxchen17
2025-07-24 14:09:54 +00:00
PyTorch MergeBot
76be282e3a Revert "[Precompile] Various small bugfixes, add CachingPrecompile to torchbench (#158847)"
This reverts commit d898d0d437.

Reverted https://github.com/pytorch/pytorch/pull/158847 on behalf of https://github.com/jithunnair-amd due to Broke ROCm CI jobs on MI200 and MI300 ([comment](https://github.com/pytorch/pytorch/pull/158847#issuecomment-3109664713))
2025-07-23 18:25:46 +00:00
James Wu
d898d0d437 [Precompile] Various small bugfixes, add CachingPrecompile to torchbench (#158847)
This PR addresses a few small bugfixes needed to make NanoGPT inference work, and also adds a new `--caching-precompile` argument to torchbench. With `--caching-precompile`, after every benchmark we save precompile artifacts to DynamoCache, allowing us to test caching precompile on all existing benchmarks.

The following bugfixes are in this PR to make all of this work:
- Fix global variables being pruned with DUPLICATE_INPUT guards. DUPLICATE_INPUT guards have additional vars from the second input, which we track with additional_local_vars, but we never tracked additional global variables. This fixes the issue. (See torch/_dynamo/guards.py changes)
- Return None from PRecompileContext.serialize() if no new dynamo compiles occurred. There's no reason to save artifacts (i.e. autotuning artifacts, etc) if no dynamo_compile occurred, so we return None early. We may later want to support editing existing dynamo artifacts as a TODO, but that's upcoming.
- log `dynamo_start` on CompilePackage.load: This is only needed so that tlparse doesn't ignore TORCH_TRACE logs generated when caching precompile hits. If there are no actual compiles, we never log a "dynamo_start" entry, which makes internal tlparse ignore the TORCH_TRACE file.

## Test Plan

After this PR, the following now works:
```
TORCH_LOGS=dynamo tlp python benchmarks/dynamo/torchbench.py --only nanogpt --performance  --inference --backend inductor  --caching-precompile --warm-start-latency
```
tlparse result (internal):
Cold Start (6 seconds):
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_vk9nkp4m.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

Warm Start (~1 s):
https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpAWe0zD/dedicated_log_torch_trace_5l4iwrpm.log/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

The 1 second of warm start here can be improved: the costs here are mostly in starting up workers and triton and initializing CUDA, a lot of which should not be included in the compile time cost in real world scenarios where these are already loaded before training begins.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158847
Approved by: https://github.com/zhxchen17
2025-07-23 15:06:54 +00:00
Simon Fan
07c4c2a792 [dynamo][be] hide warnings without invalidating warnings cache (#158520)
I feel uneasy about touching `__warningregistry__` since it is undocumented and private surface. The only public API hook that doesn't increment warnings version seems to be https://docs.python.org/3/library/warnings.html#warnings.showwarning.

So we could wack a mole all the warnings muters in compile to just not display warnings, and we wouldn't invalidate warnings cache. This PR adds it for torch/_dynamo, and I didn't find any warnings versioning mutation from torch/_inductor.

There is a behavior change if someone calls a compiled graph with simplefilter("error"):
```python
# e.g. test/dynamo_expected_failures/TestAutogradFallback.test_no_autograd_kernel_inplace_mode_nothing
with warnings.catch_warnings():
    warnings.simplefilter("error")  # turns all warnings into errors
    compiled_fn()  # will throw if any of the muted warnings fire
```

FIXES https://github.com/pytorch/pytorch/issues/128427

A note for the future: The warnings module doesn't offer a thread safe way of using it. Even regular filters have this problem, directly editing `__warningregistry__` would be very bad, and this PR would mute all threads. Someone will need to build a thread safe warnings interface.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158520
Approved by: https://github.com/anijain2305, https://github.com/zou3519
2025-07-18 22:02:31 +00:00
Lucas Kabela
583138d170 [Dynamo][Better Engineering] Add typing for comptime, cache, and convert_frame (#158379)
As part of better engineering week, we would like to improve out type support to improve dev experience in dynamo

This PR adds strict typing support to a critical tracing point for dynamo, primarily for`comptime.py` but also `cache_size.py` and `convert_frame.py`.

Running
```
mypy torch/_dynamo/comptime.py torch/_dynamo/cache_size.py torch/_dynamo/convert_frame.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Unannotated | Lines Total | % lines covered | Funcs Unannotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  1837 | 2215 | 82.93% | 45 | 82 | 54.88% |
| This PR | 2230 | 2230 | 100.00% | 82 | 82 | 100.00% |
| Delta    | +393 | +15 | +17.07% | +37 | 0 | +45.12% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158379
Approved by: https://github.com/mlazos
2025-07-18 02:11:57 +00:00
Lucas Kabela
a4d753295e [Dynamo][Better Engineering] Add enhanced typing support to _dynamo/eval_frame.py (#158276)
As part of better engineering week, we would like to improve out type support to improve dev experience in dynamo

This PR adds strict typing support to the main entrypoint for dynamo, `eval_frame.py`

Running
```
mypy torch/_dynamo/eval_frame.py --linecount-report /tmp/coverage_log
```

| -------- | Lines Unannotated | Lines Total | % lines covered | Funcs Unannotated | Funcs Total | % funcs covered |
| -------- | ------- | -------- | ------- | ------- | ------- | ------- |
| Main  |  623 | 2232 | 27.91% | 19 | 68 | 27.94% |
| This PR | 2285 | 2285 | 100.00% | 68 | 68 | 100.00% |
| Delta    | +1662 | +63 | +72.09% | +49 | 0 | +72.06% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158276
Approved by: https://github.com/williamwen42

Co-authored-by: William Wen <williamwen@meta.com>
2025-07-16 23:31:10 +00:00
Edward Z. Yang
b40c0b61eb Make guard collective logging less chatty (#157995)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/157995
Approved by: https://github.com/Microve, https://github.com/albanD, https://github.com/Skylion007
2025-07-10 17:18:37 +00:00
James Wu
be56a8d7ac Automatically load and save dynamo entries via caching_precompile (#155913)
This PR adds a new config option, `caching_precompile`, and a `DynamoCache`, which loads and saves Dynamo Cache entries automatically. It also hooks up DynamoCache to PrecompileContext, so that we can save multiple cache entries.

When this configuration is turned on, we:
- Automatically create and initialize a CompilePackage on every torch.compile
- Automatically use BundledAutogradcache
- Automatically save the CompilePackage entry to DynamoCache after every compile

You can also use PrecompileContext.serialize() to manually serialize a full object.

I've added unit tests to exhibit this behavior.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155913
Approved by: https://github.com/zhxchen17
2025-07-07 23:57:17 +00:00