Previously, when processing `sym_and(a, b, c)`, symbolic shapes wouldn't process a, b, and c individually and store their implications. This led to data-dependent errors on individual checks: e.g., we stored `u0 >= 0 & u0 <= 10` but then couldn't figure out `u0 <= 10` on its own.
This PR handles that, and also makes `sym_and`/`sym_or` user-code friendly, for testing.
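A minimal sketch of the kind of pattern this targets (not code from the PR; the import path of `sym_and` and the need for `capture_scalar_outputs` are assumptions):
```python
import torch
import torch._dynamo
from torch.fx.experimental.symbolic_shapes import sym_and  # import path is an assumption

torch._dynamo.config.capture_scalar_outputs = True  # keep .item() in the graph

def fn(x):
    u0 = x.item()  # unbacked SymInt
    # One combined check; with this change its conjuncts are recorded individually.
    torch._check(sym_and(u0 >= 0, u0 <= 10))
    # This individual bound should now follow from the stored implications
    # instead of raising a data-dependent error.
    torch._check(u0 <= 10)
    return x.new_zeros(u0)

print(torch.compile(fn, fullgraph=True)(torch.tensor([5])))
```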
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154737
Approved by: https://github.com/laithsakka
#153622 introduced a hook for getting the relevant code objects after frame tracing. The idea is to have vLLM use this instead of monkey-patching `inline_call_()` to determine the source code files to hash. Unfortunately, the hook runs too late; the vLLM backend needs access to the set of source code filenames while it's running.
This PR replaces the newly-added hook with a utility function that a backend can call to get this information. I've made the change in vLLM and can verify that this allows the information to be queried at the right time.
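Roughly, the intended usage looks like the sketch below; the helper name `get_traced_code()` and its return value (a list of code objects) are assumptions for illustration, not a statement of the final API:
```python
import torch

def my_backend(gm, example_inputs):
    # Hypothetical: query the source files of the frames traced into this
    # graph while the backend is still running, e.g. to hash their contents
    # into a cache key the way vLLM wants to.
    from torch._dynamo.utils import get_traced_code  # name is an assumption
    filenames = {code.co_filename for code in get_traced_code()}
    print(sorted(filenames))
    return gm.forward

@torch.compile(backend=my_backend)
def fn(x):
    return x + 1

fn(torch.ones(3))
```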
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155249
Approved by: https://github.com/zou3519
Precompiled bytecode looks like the following:
```
pre-graph bytecode
...
compiled graph code
...
post-graph bytecode
```
In the pre-graph bytecode we have calls into helper functions like `torch._dynamo.utils.call_size`, which invoke `@disable` from inside the bytecode.
Normally torch.compile() handles these frames fine, but for precompile we load the bytecode from a clean dynamo state and want a way to assert that a recompile never happens. The current way to ensure this is set_stance("fail_on_recompile") (open to other ideas for testing this, but IMO this is the closest thing we have today).
That approach doesn't work when util functions like call_size() are involved, so this PR fixes a number of places to make sure "fail_on_recompile" can skip through the functions that are meant to be skipped during compilation.
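A minimal sketch of the testing pattern described above, using a toy function rather than the actual precompile flow:
```python
import torch

@torch.compile(dynamic=True)
def fn(x):
    # size() access goes through helpers like torch._dynamo.utils.call_size
    return x * x.size(0)

fn(torch.randn(4))  # compile once

# Any recompile from here on raises instead of silently recompiling; frames
# that are meant to be skipped (e.g. call_size) must not trip this stance.
with torch.compiler.set_stance("fail_on_recompile"):
    fn(torch.randn(8))
```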
Differential Revision: [D76156867](https://our.internmc.facebook.com/intern/diff/D76156867/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155363
Approved by: https://github.com/jamesjwu, https://github.com/jansel
ghstack dependencies: #155329
On this line, we see that the bw_compiler that dynamo uses for AOTAutograd automatically disables the backward callable:
05dd638ee9/torch/_dynamo/backends/common.py (L76)
This disables dynamo in the bw_compiler itself, but also disables the callable the compiler returns.
On an AOTAutogradCache hit, however, we never call the bw_compiler, so we don't disable dynamo properly. This only matters in certain cases of CPU tensors' backwards, where the backward runs in Python land and dynamo unnecessarily tries to trace through the inductor-generated code. It also only matters when the backward is accessed outside of dynamo itself (say, in a graph break in eager mode), since dynamo already disables the forward function properly.
```
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] TorchDynamo attempted to trace the following frames: [
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * fn /home/jjwu/test.py:9
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * cast /data/users/jjwu/a/pytorch-env/lib/python3.10/typing.py:1737
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * call /tmp/torchinductor_jjwu/rq/crq327nhoyjzog5n3qlchauucdrunrtutwmmoh7ipoe2ngnson5s.py:35
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * fn /home/jjwu/test.py:9
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * cast /data/users/jjwu/a/pytorch-env/lib/python3.10/typing.py:1737
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] * call /tmp/torchinductor_jjwu/rq/crq327nhoyjzog5n3qlchauucdrunrtutwmmoh7ipoe2ngnson5s.py:35
I0605 09:58:40.135000 3981970 torch/_dynamo/eval_frame.py:517] ]
```
This PR fixes the issue and adds a unit test showing that, with or without a cache hit, the frames dynamo traces are identical.
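A rough repro sketch of the scenario (not the PR's unit test): a CPU graph whose backward runs in Python outside of dynamo, where the compiled backward callable must stay disabled even on a cache hit.
```python
import torch

@torch.compile
def fn(x):
    return (x * x).sum()

x = torch.randn(4, requires_grad=True)  # CPU tensor
fn(x).backward()  # backward runs in the eager autograd engine, outside dynamo
# With the fix, dynamo does not attempt to trace the inductor-generated
# backward code here, whether this was an AOTAutogradCache hit or miss.
print(x.grad)
```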
Fixes https://github.com/pytorch/pytorch/issues/154536
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155251
Approved by: https://github.com/bdhirsh, https://github.com/anijain2305
Add comprehensive module docstring explaining built-in function and type
variable tracking, including handling of Python built-ins, type constructors,
operators, and special constructs during symbolic execution.
Originally generated by Claude but reviewed and edited by me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155402
Approved by: https://github.com/Skylion007
ghstack dependencies: #155403
Add comprehensive module docstring explaining side effect tracking and
management, including mutation tracking, context changes, aliasing,
and state preservation during symbolic execution.
Originally generated by Claude but reviewed and edited by me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155403
Approved by: https://github.com/williamwen42
Add comprehensive module docstring explaining the tracing rules and policies
that govern TorchDynamo's compilation decisions, including skip rules,
inlining policies, and library-specific handling.
Originally generated by Claude but reviewed and edited by me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155401
Approved by: https://github.com/williamwen42
Add a few missing return type annotations in torch._logging and use ruff to infer the obvious ones.
LazyStr now properly checks the return type of the Callable and the args and kwargs passed to it.
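A sketch of the kind of typing this describes, using ParamSpec to tie the args/kwargs to the callable; the class name and fields only loosely mirror torch._logging's lazy-string helper, so treat the exact signature as an assumption:
```python
from typing import Callable, Generic
from typing_extensions import ParamSpec

P = ParamSpec("P")

class LazyString(Generic[P]):
    def __init__(
        self, func: Callable[P, str], *args: P.args, **kwargs: P.kwargs
    ) -> None:
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __str__(self) -> str:
        # Evaluated only when the log record is actually formatted.
        return self.func(*self.args, **self.kwargs)
```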
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155345
Approved by: https://github.com/ezyang
We observed that the guard overhead at runtime, measured with profiler traces, was higher than what this profiling function reported at compile time. After investigation, we found that f_locals were already in the CPU cache, which made the guard overhead look much smaller when profiling during compilation. To be more realistic, we flush the cache here.
Profiling the guard overhead during compilation (in addition to at runtime) allows faster iteration time, as well as logging in tlparse and internal databases.
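Conceptually, the fix amounts to something like the sketch below (a Python illustration, not the actual guard-profiling code): evict f_locals from the CPU caches before timing the guard check so the compile-time number better matches runtime behavior.
```python
import time
import torch

def profile_guard_check(check_fn, f_locals, flush_mb=64):
    # Hypothetical helper: touch a buffer larger than the last-level cache so
    # f_locals are no longer cache-hot when we time the guard evaluation.
    flush = torch.empty(flush_mb * 1024 * 1024, dtype=torch.uint8)
    flush.fill_(0)
    start = time.perf_counter()
    check_fn(f_locals)
    return time.perf_counter() - start
```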
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154764
Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/StrongerXi
ghstack dependencies: #154769
AOTAutogradCache uses FXGraphCache, which uses the tracing context to get the ShapeEnv. Although the TracingContext's global_context is cleared by the time we get around to reusing it, we don't actually need the global_context; we just need the ShapeEnv in the TracingContext, which isn't cleared at the end of dynamo and does persist. This PR adds the tracing context manager around the specialized compile to ensure our caching infrastructure can get access to the ShapeEnv. A test was also added to prove correctness.
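Schematically, the change wraps the specialized compile in a tracing context that carries the ShapeEnv-bearing fake mode, roughly like the sketch below (names from torch._guards; the exact call site in the PR differs):
```python
from torch._guards import TracingContext, tracing

def compile_with_shape_env(compile_fn, fake_mode, *args, **kwargs):
    # The caching layers (FXGraphCache / AOTAutogradCache) look up the
    # ShapeEnv via the ambient TracingContext; global_context being cleared
    # earlier doesn't matter because only fake_mode.shape_env is needed.
    with tracing(TracingContext(fake_mode)):
        return compile_fn(*args, **kwargs)
```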
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153526
Approved by: https://github.com/jamesjwu, https://github.com/zou3519
ghstack dependencies: #153433, #153449