Previously when processing `sym_and(a, b, c)`, symbolic shapes wouldn't individually process a, b, and c and store their implications. This would lead to data-dependent errors on individual checks; e.g., we stored `u0 >= 0 & u0 <= 10` but then couldn't figure out `u0 <= 10` on its own.
This PR handles that, and also makes `sym_and`/`sym_or` user-code friendly, for testing.
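A minimal sketch of the kind of check this enables (assuming `sym_and` is importable from `torch.fx.experimental.symbolic_shapes`, which this PR makes user-code friendly):
```
import torch
# Assumed import path for sym_and; adjust if it lives elsewhere.
from torch.fx.experimental.symbolic_shapes import sym_and

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def fn(x):
    u0 = x.item()  # unbacked symbol u0
    # The conjunction below now also records the individual implications
    # u0 >= 0 and u0 <= 10, not just the combined expression ...
    torch._check(sym_and(u0 >= 0, u0 <= 10))
    # ... so a later standalone check no longer hits a data-dependent error.
    torch._check(u0 <= 10)
    return x + u0

fn(torch.tensor([5]))
```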
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154737
Approved by: https://github.com/laithsakka
Fixes #129673
### Summary:
Modifying a tensor by reshaping it in place (e.g., `unsqueeze_`) should cause a graph break; however, when the method was accessed through the `torch.Tensor` API rather than as an attribute of the tensor itself, the code crashed with an error (see the attached issue).
The traced paths differ depending on which stack variable gets popped:
* `self.unsqueeze_` pops a `LazyVariableTracker` which gets resolved to `TensorVariable`, so looking up the method triggers the call to `var_getattr` in `_dynamo/variables/tensor.py`; since this is an inplace view (metadata mutation) on a graph input, it is not well supported and should fall back (see [L446](1017927c83/torch/_dynamo/variables/tensor.py (L446)) in that file)
* `torch.Tensor.unsqueeze` pops a `UserDefinedClassVariable`, so looking up the method triggers the call to `var_getattr` in `_dynamo/variables/user_defined.py` on [L273](a8f6b40e36/torch/_dynamo/variables/user_defined.py (L273)). This path tries to build a variable tracker from the popped object, which resolves to a trace rule and, being a Tensor method, is resolved to `TorchInGraphFunctionVariable` on [L3767](a8f6b40e36/torch/_dynamo/trace_rules.py (L3767))
So, one straightforward option is to check whether the function is an inplace view on an input tensor in `torch.py` when we resolve `__call__` for the `TorchInGraphFunctionVariable` instead; this resolves the bug by providing a graph break.
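A minimal repro sketch covering both call paths described above (the second previously crashed and now graph breaks instead):
```
import torch

def via_attr(x):
    x.unsqueeze_(0)                  # attribute path: TensorVariable.var_getattr
    return x + 1

def via_tensor_api(x):
    torch.Tensor.unsqueeze_(x, 0)    # class path: TorchInGraphFunctionVariable
    return x + 1

x = torch.randn(3)
torch.compile(via_attr, backend="eager")(x.clone())        # graph breaks
torch.compile(via_tensor_api, backend="eager")(x.clone())  # used to crash; now graph breaks
```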
### Test
```
pytest test/dynamo/test_functions.py::FunctionTests::test_unsqueeze_inplace
```
Results in
```
Running 1 items in this shard
test/dynamo/test_functions.py . [100%]
=========================================================================================== 1 passed in 9.16s ==========================================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150573
Approved by: https://github.com/anijain2305
This patch effectively ignores traceable_tensor_subclasses, allowing
Dynamo to always try tracing into the `__torch_function__` of tensor
subclasses. This helps us with 2 things:
1. allowing users to directly benefit from better compilation of tensor
subclasses, by just upgrading PyTorch, without having to change legacy
library code (see earlier patches in the stack for examples).
2. potentially exposing more issues in compiling tensor subclasses, so we
can get signals and improve them.
As a consequence, this exposed (and the patch fixes) 2 subtle bugs:
1. In `build_torch_function_fn`, we could get
`torch._C._disabled_torch_function_impl` either because we have a
`Parameter` subclass without a `__torch_function__` override or because
we have a tensor subclass with a `__torch_dispatch__` override. We graph
break on this for now, and plan to add support -- the logic for
simulating `torch._C._disabled_torch_function_impl` is already in
`SuperVariable`; we just need to reuse it.
2. Sometimes we create a `SyntheticLocalSource` and need to remove all the
guards installed on it, but we only removed the ones whose source
_is_ the created synthetic source `s` and forgot about chained
sources like `s.foo`; this showed up as
`SYNTHETIC_LOCAL['tmp_0'].__torch_function__.__func__`.
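As an illustration, a minimal tensor subclass like the following can now be compiled without any `traceable_tensor_subclasses` registration (a sketch, not a test from this PR):
```
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Trivial override that defers to the default behavior; Dynamo now
        # always tries to trace into this method.
        return super().__torch_function__(func, types, args, kwargs or {})

@torch.compile(backend="eager")
def f(x):
    return torch.sin(x) + 1

out = f(torch.randn(4).as_subclass(MyTensor))
```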
Differential Revision: [D71906141](https://our.internmc.facebook.com/intern/diff/D71906141)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149792
Approved by: https://github.com/jansel, https://github.com/mlazos
ghstack dependencies: #149482, #149483, #149484
This PR does the following (a small sketch follows the list):
* Sets `inference_mode` to False and enables `no_grad` for `convert_frame` if `inference_mode` is on globally.
* Turns off `inference_mode` for fake tensor propagation. This ensures that converting a real inference tensor to a fake tensor removes the inference-ness.
* Graph breaks on `is_inference` and `is_inference_mode_enabled`.
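A small sketch of the resulting behavior (backend and shapes are illustrative):
```
import torch

@torch.compile(backend="eager")
def f(x):
    # Dynamo now graph breaks on this call rather than tracing it.
    if torch.is_inference_mode_enabled():
        return x + 1
    return x - 1

# Compilation itself runs with inference_mode(False) + no_grad, and fake
# tensor prop strips the inference-ness of real inference tensors.
with torch.inference_mode():
    out = f(torch.randn(3))
```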
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149321
Approved by: https://github.com/jansel, https://github.com/zou3519
As title, this enables a `nonstrict_trace`-ed function to take in an object
whose type has been `pytree.register_constant`-ed, as long as the object
existed outside the `torch.compile` region. This also forces Dynamo to
emit an `EQUALS_MATCH` guard on the object.
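A hypothetical usage sketch (the `Config` class is illustrative, and the `torch.utils._pytree` import path for `register_constant` is assumed):
```
import torch
import torch.utils._pytree as pytree
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    scale: int

pytree.register_constant(Config)

@torch._dynamo.nonstrict_trace
def scaled(x, cfg):
    return x * cfg.scale

cfg = Config(scale=2)  # created outside the torch.compile region

@torch.compile(fullgraph=True, backend="eager")
def f(x):
    # Dynamo emits an EQUALS_MATCH guard on cfg.
    return scaled(x, cfg)

f(torch.randn(3))
```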
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148007
Approved by: https://github.com/zou3519
ghstack dependencies: #148385
As title. Without this patch we get the following error:
Tweaking the `allow_non_fake_inputs` flag on the fake tensor mode doesn't
quite work for AOTAutograd, which also needs to fake-tensor-propagate the
`nonstrict_trace`-ed function, but that happens _after_ Dynamo has handled
the `nonstrict_trace` processing and put the `flat_apply(...)` node into the graph.
So we can't easily enable the `allow_non_fake_inputs` flag on the current
fake mode temporarily when AOTAutograd processes a `flat_apply` node coming
from Dynamo's `nonstrict_trace` handling. After discussing with zou3519,
I decided to add a global `FakeTensorTLS` that contains an
`allow_non_fake_inputs_override` flag, and to patch the `nonstrict_trace`-ed
function to temporarily tweak this flag during its execution.
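An illustration of the pattern only, not the actual `FakeTensorTLS` internals:
```
import threading
from contextlib import contextmanager

# Illustrative-only sketch: a global TLS object carrying an override flag
# that is flipped just for the duration of the nonstrict_trace-ed call.
class _FakeTensorTLS(threading.local):
    allow_non_fake_inputs_override = None  # None means "no override"

fake_tensor_tls = _FakeTensorTLS()

@contextmanager
def allow_non_fake_inputs_override(value=True):
    prev = fake_tensor_tls.allow_non_fake_inputs_override
    fake_tensor_tls.allow_non_fake_inputs_override = value
    try:
        yield
    finally:
        fake_tensor_tls.allow_non_fake_inputs_override = prev

# Usage: the fake-tensor prop of the wrapped function runs under this
# override, while the fake mode's own flag stays untouched.
with allow_non_fake_inputs_override():
    pass  # run the nonstrict_trace-ed function here
```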
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147572
Approved by: https://github.com/zou3519
ghstack dependencies: #146714, #146367, #146950, #147571
## Context
> **Note:** `mark_traceable` got renamed to `nonstrict_trace` after
> offline discussion. The reasons are (1) it aligns with `torch.export`'s
> `nonstrict` notion, and (2) it's more definitive in behavior suggestion.
1. [Overall Design](https://docs.google.com/document/d/1O-dR2ZQaJQVt_v67AVcDCw2yJLtqgkZFwoXK0buEWRg/edit?tab=t.0)
2. [Dynamo graph representation with `torch._higher_order_ops.flat_apply`](https://docs.google.com/document/d/1YHl5nPTJvYeCPE5TO9uA18DPWNgUYGE4gCn6bFvXcBM/edit?tab=t.0#heading=h.xtw3hhbro4gn)
## Summary
This patch adds a `torch._dynamo.nonstrict_trace` decorator, which
currently is an enhanced version of `torch._dynamo.allow_in_graph` (see
docstring for their differences). Specifically, this patch focuses on
the UI and functionality prototyping/plumbing.
The main enhancement is supporting more input types, and the
implementation challenge lies in reconstructing the input objects from
Dynamo `VariableTracker` (while accounting for buffered side-effects and
guards). This patch takes a middle ground (a simple implementation with a
bit of user labor; a usage sketch follows the list), by
1. asking the user to provide pytree registration for non-proxy-able
input types,
2. letting Dynamo trace through `pytree_flatten` (which accounts for
buffered side-effects and guards automatically),
3. and passing in the TreeSpec as a graph attribute constant into
`torch._higher_order_ops.flat_apply` (which unflattens the inputs and
invokes the underlying function).
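A usage sketch of the three steps above (the `Point` type and function names are illustrative):
```
import torch
import torch.utils._pytree as pytree
from dataclasses import dataclass

@dataclass
class Point:
    x: torch.Tensor
    y: torch.Tensor

# 1. User-provided pytree registration for the non-proxy-able input type.
pytree.register_pytree_node(
    Point,
    lambda p: ((p.x, p.y), None),
    lambda children, _: Point(*children),
)

# 2/3. Dynamo traces through pytree_flatten of the inputs and puts a
# flat_apply(...) node (plus the TreeSpec constant) into the graph instead
# of tracing into the decorated function.
@torch._dynamo.nonstrict_trace
def norm(p: Point):
    return (p.x * p.x + p.y * p.y).sqrt()

@torch.compile(fullgraph=True, backend="eager")
def f(p):
    return norm(p) + 1

f(Point(torch.randn(3), torch.randn(3)))
```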
## Next Steps
In subsequent patches, we will try to support the following:
- annotating on class method
- reads to global tensors
- inputs that contain `pytree.register_constant`-ed instances.
- function as input
- more output types (e.g., any pytree-registered type)
- `torch.nn.Module` as inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146367
Approved by: https://github.com/zou3519
ghstack dependencies: #146714
Fixes https://github.com/pytorch/pytorch/issues/144907
```
class Foo(torch.nn.Module):
    def forward(self, val):
        return torch.full((80, 2), val, dtype=torch.float32)

export(Foo(), args=(torch.tensor(1),))
```
When we have a `torch.full` call like above, where the fill value is a scalar Tensor and not a scalar value, the FX graph from `_dynamo.export()` contains a single node: the full op. We run into a `PendingUnbackedSymbolNotFound` error, because the `item()` call is implicit; the UnbackedSymInt is extracted but goes directly into the data of the output tensor value, and we're then unable to locate it when we try to compute unbacked bindings.
On the other hand, non-strict export doesn't face this, because an explicit `item()` (i.e., `local_scalar_dense`) node is inserted, and the unbacked binding is directly the example value of that node.
This PR adds a Dynamo handler to imitate what happens in non-strict.
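Roughly, the handler makes the strict-export trace look like the explicit form below (illustrative):
```
import torch
from torch.export import export

class Foo(torch.nn.Module):
    def forward(self, val):
        # Explicit scalar extraction: this becomes its own
        # aten._local_scalar_dense node whose example value carries the
        # unbacked symbol, so the unbacked binding can be located.
        fill = val.item()
        return torch.full((80, 2), fill, dtype=torch.float32)

export(Foo(), args=(torch.tensor(1),))
```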
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144999
Approved by: https://github.com/angelayi
BuiltinVariable.call_hasattr() overrides the base class method but actually behaves differently. The base is `obj.call_hasattr(tx, attr)`, while BuiltinVariable's version is `<unused>.call_hasattr(tx, obj, attr)`.
The BuiltinVariable version is used as a pattern from `call_self_handler()` for `BuiltinVariable(hasattr)`. I think the other version is just used for internal `hasattr(obj, name)` checks, so I renamed that one to `call_obj_hasattr`.
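Sketch of the two shapes described above (signatures only, simplified):
```
class VariableTracker:
    def call_obj_hasattr(self, tx, name):
        # "does the object this VT represents have attribute `name`?"
        # (previously named call_hasattr)
        ...

class BuiltinVariable(VariableTracker):
    def call_hasattr(self, tx, obj, name):
        # models a user-level hasattr(obj, name) call, dispatched from
        # call_self_handler() for BuiltinVariable(hasattr)
        ...
```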
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145551
Approved by: https://github.com/anijain2305
Prior to this patch, Dynamo conveniently modelled torch profiler context
objects (e.g., `torch.profiler.profile`) as `NullContextVariable`,
because `torch.compile` ignores the effect of these profiler contexts.
However, the semantics of these profiler contexts diverge from
`contextlib.nullcontext` in the `__enter__` function, where the former
returns `self` and the latter returns `None`. This causes subtle errors,
as observed in #125021.
This patch adds back a `ProfilerContextVariable`, which addresses the
aforementioned semantic discrepancy.
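A minimal sketch of the discrepancy:
```
import torch

@torch.compile(backend="eager")
def f(x):
    # torch.profiler.profile().__enter__ returns self, while
    # contextlib.nullcontext().__enter__ returns None; modeling the profiler
    # as nullcontext therefore made `prof` None inside compiled code.
    with torch.profiler.profile() as prof:
        y = x + 1
    return y, prof is not None

out, prof_ok = f(torch.randn(3))
assert prof_ok
```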
Fixes #125021.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145537
Approved by: https://github.com/zou3519, https://github.com/williamwen42
We added an is_export flag under `torch.compiler.is_exporting`. This comes in handy when we want to run some special logic at the user level and system level (e.g., higher up in the stack).
In increasing scope (a usage sketch follows the list):
- `_is_fx_tracing` is set to True when we are under `symbolic_trace` or `make_fx`.
- `is_exporting` is set to True when we're doing strict or non-strict export, which internally has a step that calls `make_fx` and sets `_is_fx_tracing` to True.
- `is_compiling` is set to True when we're doing strict export, non-strict export, or `torch.compile`.
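A usage sketch of the flags (ordering matters, since `is_compiling` is also True under export):
```
import torch

def preprocess(x):
    if torch.compiler.is_exporting():
        # strict or non-strict export
        return x + 1
    if torch.compiler.is_compiling():
        # torch.compile; is_compiling is also True under export, so the
        # export branch above must be checked first
        return x + 2
    # plain eager
    return x + 3
```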
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142425
Approved by: https://github.com/avikchaudhuri
**Overview**
This PR moves `torch/distributed/_composable/fsdp` to `torch/distributed/fsdp/_fully_shard` and makes public APIs available from `torch.distributed.fsdp`, e.g.:
```
from torch.distributed.fsdp import fully_shard
```
This is targeting 2.6 release. I rewrote some of the documentation with (hopefully) improved phrasing.
**Changes for Reland**
- Preserved the public objects from `torch/distributed/_composable/fsdp/fully_shard.py` so that the import path still works internally
- Added a unit test that we can do `from torch.distributed._composable.fsdp.fully_shard import FSDPModule`
Differential Revision: [D66890387](https://our.internmc.facebook.com/intern/diff/D66890387)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141868
Approved by: https://github.com/kwen2501, https://github.com/wconstab, https://github.com/weifengpy, https://github.com/fegin, https://github.com/XilunWu
Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
**Overview**
This PR moves `torch/distributed/_composable/fsdp` to `torch/distributed/fsdp/_fully_shard` and makes public APIs available from `torch.distributed.fsdp`, e.g.:
```
from torch.distributed.fsdp import fully_shard
```
This is targeting 2.6 release. I rewrote some of the documentation with (hopefully) improved phrasing.
**Follow-Ups**
- [x] Add some explanation in the docs about FSDP1 vs. FSDP2
- [ ] Move unit tests from `test/distributed/_composable/fsdp` to `test/distributed/fsdp/fully_shard/`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141868
Approved by: https://github.com/kwen2501, https://github.com/wconstab, https://github.com/weifengpy
Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
As title, this also uncovered a few invalid use cases; the cases that
cause errors are fixed in separate patches prior to this one, and the
rest are fixed in this patch.
This patch also moves a few `.source` mutations to variable construction,
to increase the coverage of the validation.
Fixes #133027.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141717
Approved by: https://github.com/jansel
ghstack dependencies: #141713, #141714, #141715, #141902, #141716
There has been a series of attempts to support resizing via the `out=`
argument of torch operators, as in `torch.sigmoid(x, out=y)`, where `y`
would have a different shape before and after this expression. Prior to
this patch, we had some checks to graph break if the shape changed.
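A minimal sketch of the pattern in question; Dynamo graph breaks on the shape change and the op runs in eager:
```
import torch

@torch.compile(backend="eager")
def f(x, y):
    # y has a different shape before and after this call, so Dynamo
    # graph breaks here rather than tracing through the resize.
    return torch.sigmoid(x, out=y)

x = torch.randn(4, 4)
y = torch.empty(2)  # wrong shape: resized to (4, 4) by the out= call in eager
f(x, y)
```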
This patch
1. extends the existing check and graph break to any shape change, not
just for a `TensorVariable` with a source field, and
2. removes an old code path which was introduced to address the shape
change but became obsolete in that regard because we added extra
checks to graph break upon shape change. Moreover, this old code path
is unsound: it tries to replace references to the old
`TensorVariable` with the new one returned by `wrap_fx_proxy`, but it only
does the replacement in `symbolic_locals`, which breaks when cells
are involved. In general the old `TensorVariable` could be _anywhere_;
think of the `replace_all` we had for immutable VTs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140202
Approved by: https://github.com/jansel
ghstack dependencies: #140035, #140036, #140149, #140150, #140151, #140201
## `VariableTracker::build()` hides the Builders
### The problem
In the current code, creating a `VariableTracker` involves choosing one of two `Builder` classes and either calling a method, or calling a constructor that creates an object that you immediately call, [like this](083c9149b7/torch/_dynamo/variables/functions.py (L761-L768)).
Variations on this code are repeated in many places.
Moreover, the `Builder` classes have a lot of dependencies, so they have to be loaded late in the import process to avoid circular imports, and they end up being repeatedly imported at local scope.
### The solution
In this commit, the import from `builder` and the logic of choosing and calling the Builder class are hidden behind a single static factory method, `VariableTracker.build()`, which is easier to reason about and to import.
This commit is a net reduction of over 150 lines of code, achieved by removing repetitive logic and unnecessary local imports.
**CHANGES:** Originally the name of the static method was `VariableTracker.create()`, but a static method on a derived class, `LazyVariableTracker.create()`, already exists with an irreconcilably different signature, so the new static method was renamed to `VariableTracker.build()`.
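A rough before/after sketch (builder and method names as described above; treat exact signatures as approximate):
```
from torch._dynamo.variables.base import VariableTracker

def build_vt_before(tx, value, source=None):
    # Old pattern: import the builders locally (to dodge circular imports)
    # and pick one depending on whether a source is available.
    from torch._dynamo.variables.builder import SourcelessBuilder, VariableBuilder
    if source is not None:
        return VariableBuilder(tx, source)(value)
    return SourcelessBuilder.create(tx, value)

def build_vt_after(tx, value, source=None):
    # New pattern: one static factory hides the choice and the late import.
    return VariableTracker.build(tx, value, source)
```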
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135714
Approved by: https://github.com/jansel
This PR implements tracing of `with` contexts for TorchFunction modes that have the default enter/exit behavior (i.e., pushing/popping the mode).
Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call
resume fn structure:
1. enter context
2. jump
...
3. exit context
The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).
So for torch function modes the structure of our output code is this:
1. graph call
2. mutate tf mode stack to replay mutations
3. unsupported code
4. on exception restore stack
5. resume function
Then our resume fn looks like this:
1. no-op enter torch function mode
2. jump
3. exit tf mode
To implement the no-op enter of the torch function mode, I added a torch function mode in polyfill which no-op enters but exits normally. This is needed because we still want to trace the `with` context in the resume function and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).
Separately from the bytecode, Dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally, at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However, once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects, because we want to preserve the state of the mode stack as is, so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, Dynamo would pop the mode off of the symbolic stack and not update the true Python torch function mode stack with the suffix bytecode.

All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.
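A small example of the now-supported pattern (a default-behavior mode entered via `with` inside a compiled function that graph breaks; the mode class is illustrative):
```
import torch
from torch.overrides import TorchFunctionMode

class NoopMode(TorchFunctionMode):
    # Default __enter__/__exit__ (push/pop the mode); only __torch_function__
    # is overridden, and it just defers to the original op.
    def __torch_function__(self, func, types, args=(), kwargs=None):
        return func(*args, **(kwargs or {}))

@torch.compile(backend="eager")
def f(x):
    with NoopMode():
        y = torch.sin(x)
        torch._dynamo.graph_break()  # unsupported code in the middle
        return torch.cos(y)

f(torch.randn(3))
```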
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114
Approved by: https://github.com/yanboliang