Fixes https://github.com/pytorch/pytorch/issues/138920. See comments there for details.
I still need to find a smaller repro to write an actual test, but with the guards suppressed I no longer see the specialization in the CA graph in the linked example:
```
aot1_view_3: ... = torch.ops.aten.view.default(aot1_tangents_1, [aot1_sym_size_int, 48, 1])
aot1_view_4: ... = torch.ops.aten.view.default(aot1_view_3, [aot1_sym_size_int, 48])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138968
Approved by: https://github.com/yf225, https://github.com/xmfan
Dynamo stance was recently added in https://github.com/pytorch/pytorch/pull/137504. When the Dynamo stance is "force_eager" (the user explicitly wants to fall back to eager), we would like Compiled Autograd to fall back to eager as well. This allows the Traceable FSDP2 use case to work, since "eager forward + compiled autograd backward" is not supported for Traceable FSDP2.
In general, if a user wants "eager forward + compiled autograd backward", they should explicitly run the forward in eager instead of applying compile and then setting the stance to "force_eager".
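A minimal sketch of the intended behavior (illustrative only, not taken from the PR); it assumes the `torch.compiler.set_stance` API from #137504 and the `torch._dynamo.compiled_autograd.enable` context manager:
```python
import torch
from torch._dynamo import compiled_autograd

@torch.compile
def fn(x):
    return x.sin().sum()

# User explicitly opts out of compilation.
torch.compiler.set_stance("force_eager")

x = torch.randn(4, requires_grad=True)
with compiled_autograd.enable(torch.compile):
    out = fn(x)      # runs in eager because of the stance
    out.backward()   # with this change, Compiled Autograd also falls back to eager
```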
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138113
Approved by: https://github.com/xmfan
ghstack dependencies: #138105
Inductor mutates the AOT backward graph. One solution would be to copy the graph, but since we don't know whether compiled autograd will be applied, always cloning it would be expensive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136741
Approved by: https://github.com/jansel
ghstack dependencies: #135663
Part of #134054.
This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change, so we're landing these 'type: ignore' comments for pytorch in advance of their actually being needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
Partially addresses https://github.com/pytorch/pytorch/issues/130170 for float scalars saved from the forward pass of a custom C++ autograd function. With this PR, compiled autograd no longer recaptures when the float value changes, but downstream support isn't there yet: 4bdb4bbd86/torch/_dynamo/config.py (L58-L61)
Currently, any non-tensor passed in ctx->saved_data is specialized on by compiled autograd. To stop specializing on float values, we lift the floats. We also require user code to use IValue::toSymFloat instead of IValue::toDouble so that the SymFloat can be swapped for a proxy during compiled autograd tracing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133048
Approved by: https://github.com/jansel
ghstack dependencies: #132771
Addresses https://github.com/pytorch/pytorch/issues/130170 for int scalars saved from the forward pass of a custom C++ autograd function.
Currently, any non-tensor passed in ctx->saved_data is specialized on by compiled autograd. To stop specializing on int values, we lift the ints. We also require user code to use IValue::toSymInt instead of IValue::toInt so that the SymInt can be swapped for a proxy during compiled autograd tracing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132771
Approved by: https://github.com/jansel
Instead of having a separate context variable for SymDispatchMode, we
now simply delegate to the current active proxy tensor mode when we
need to trace a SymInt. We maintain a separate `__sym_dispatch__` magic
method as the calling convention is different than `__torch_dispatch__`.
Consolidating the modes in this way means that we can consistently
disable both of these modes in tandem simply by removing the mode
from the proxy mode infra slot.
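For context, a small self-contained example (not taken from the PR) of the SymInt tracing this machinery serves; it assumes the public `make_fx` API with `tracing_mode="symbolic"`:
```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    # Under symbolic tracing, x.shape[0] is a SymInt; arithmetic on it is
    # dispatched through the active proxy tensor mode so that the SymInt
    # computation shows up in the traced graph.
    return x.new_zeros(x.shape[0] * 2)

gm = make_fx(f, tracing_mode="symbolic")(torch.randn(3))
print(gm.code)
```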
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132674
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
The current implementation of compiled_autograd_enabled_count affects the entire region under the context manager, so if the context manager wraps torch.compile calls unrelated to the backward, they are affected too:
- no lazy compile for the compiled forward
- no AOT autograd cache for inference graphs
We instead maintain a flag while executing the compiled backward callable, which isolates the special handling to the compiled backward graph.
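A rough sketch of the scenario (illustrative only; it assumes the `torch._dynamo.compiled_autograd.enable` context manager with `torch.compile` as the compiler):
```python
import torch
from torch._dynamo import compiled_autograd

@torch.compile
def inference_fn(x):
    # Unrelated inference graph compiled inside the region.
    return x.cos()

model = torch.compile(torch.nn.Linear(4, 4))
x = torch.randn(2, 4, requires_grad=True)

with compiled_autograd.enable(torch.compile):
    inference_fn(torch.randn(4))  # previously also lost lazy compile / AOT autograd cache
    loss = model(x).sum()
    loss.backward()               # only this compiled backward needs the special handling
```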
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128982
Approved by: https://github.com/jansel
ghstack dependencies: #127960, #128905
Adds support for `Variable._execution_engine.queue_callback()`, which is used in FSDP2.
Important tests:
- `pytest -rA test/inductor/test_compiled_autograd.py::TestCompiledAutograd::test_callback_graph_break_throws_error`
- `pytest -rA test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_callback_adds_callback`
- `PYTORCH_TEST_WITH_DYNAMO=1 python test/test_autograd.py -k TestAutograd.test_callback_adds_callback`
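Beyond the tests above, a minimal sketch of the pattern this enables (illustrative only; it assumes queueing the callback from a tensor hook under compiled autograd):
```python
import torch
from torch._dynamo import compiled_autograd
from torch.autograd import Variable

done = []

def hook(grad):
    # Queue a callback that runs once the whole backward pass finishes.
    Variable._execution_engine.queue_callback(lambda: done.append(True))
    return grad

x = torch.randn(4, requires_grad=True)
x.register_hook(hook)

with compiled_autograd.enable(torch.compile):
    x.sum().backward()
# done == [True] after the backward completes
```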
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126366
Approved by: https://github.com/xmfan
The most common case is CPU scalars used for the philox random seed. Right now, any CPU input causes cudagraphs to skip the entire graph. We need both the traced graph and the runtime inputs to be cudaified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125382
Approved by: https://github.com/jansel
- `FakeContext` hides all fields other than ctx.saved_tensors; this causes Dynamo to error when autograd.Function.backward uses other attrs on ctx, and it also doesn't allow a fallback to eager.
- If we remove it, we still can't fall back to eager: the node's variables are already freed (ctx.saved_tensors throws).
- However, we can fall back to "pseudo-eager" by using a duck-typed ctx and routing ctx.saved_tensors to the lifted tensors (see the sketch below).
- Dynamo tries to inline external_utils.call_backward and treats BackwardCFunction as an AutogradFunctionContextVariable (only used up until we create the fake context: FakeBackwardCFunction).
- We call_function backward from the forward class AutogradFunctionVariable, and we still pass in the fake context as a UserDefinedObjectVariable (we can later use AutogradFunctionContextVariable + HOO graph speculation).
Fixes #125489, #124827
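A minimal sketch of the kind of backward this change handles (illustrative only; `Scale` is a made-up example, and it assumes the `torch._dynamo.compiled_autograd.enable` context manager):
```python
import torch
from torch._dynamo import compiled_autograd

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, factor):
        ctx.save_for_backward(x)
        ctx.factor = factor  # attribute beyond saved_tensors
        return x * factor

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Reading ctx.factor here used to fail under compiled autograd,
        # because the fake ctx only exposed saved_tensors.
        return grad_out * ctx.factor, None

x = torch.randn(3, requires_grad=True)
with compiled_autograd.enable(torch.compile):
    Scale.apply(x, 2.0).sum().backward()
```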
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125661
Approved by: https://github.com/jansel