Commit Graph

111 Commits

Author SHA1 Message Date
PyTorch MergeBot
6f60c65a3a Revert "[dynamo] Log guard latency (#145132)"
This reverts commit 0a310d7388.

Reverted https://github.com/pytorch/pytorch/pull/145132 on behalf of https://github.com/anijain2305 due to CI failures observed after PR was merged ([comment](https://github.com/pytorch/pytorch/pull/145132#issuecomment-2611268421))
2025-01-24 00:11:50 +00:00
Animesh Jain
0a310d7388 [dynamo] Log guard latency (#145132)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145132
Approved by: https://github.com/ezyang
ghstack dependencies: #145351, #145420
2025-01-23 23:30:07 +00:00
Aaron Orenstein
a79100ab11 PEP585 update - torch/_dynamo (#145105)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145105
Approved by: https://github.com/bobrenjc93
2025-01-18 20:47:11 +00:00
bobrenjc93
1fe3af2c68 Migrate from Tuple -> tuple in torch/_dynamo (#144261)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144261
Approved by: https://github.com/aorenste, https://github.com/zou3519
2025-01-10 07:45:57 +00:00
Oguz Ulgen
9ee242213b [RFC] Introduce cache hot loading APIs (a.k.a. "Mega-cache") (#143341)
This PR essentially introduces two new APIs
* torch.compiler.save_cache_artifacts
* torch.compiler.load_cache_artifacts

which aim to create a mega cache experience where the user can start collecting cache artifacts, and later call the save API to fetch them. In the next attempt, the user can "hot load" the cache artifacts via the load function.

This bundling approach reduces the need to rely on porting individual files one by one, or relying on many network requests.

Note that these APIs CANNOT log to structured logging as these functions will be called before and after compilation, as opposed to during compilation. Due to this limitation, the API returns a struct that the user can log with.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143341
Approved by: https://github.com/jansel
2025-01-07 23:13:24 +00:00
Kasperi Apell
a7915c56f6 Propagate callable parameter types using ParamSpec (#142306) (#143797)
The codebase has a few locations where callable parameter type information is lost when the unpackings *args and **kwargs are typed as Any. Refactor these instances to retain type information using typing_extensions.ParamSpec.

Also, in these functions, enforce return type with TypeVar.

Addresses #142306

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143797
Approved by: https://github.com/Skylion007

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
2024-12-29 23:03:14 +00:00
Simon Fan
4ee166b82f [ca] add compiled autograd to CompileId (#141907)
tlparse PR: https://github.com/ezyang/tlparse/pull/83

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141907
Approved by: https://github.com/ezyang
2024-12-21 00:41:24 +00:00
William Wen
18261e9f39 [dynamo] implement framelocals mapping as c++ object (#140063)
Implements https://github.com/pytorch/pytorch/issues/93753 - move frame local guard accessors to C++.

Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible.

Some implementation notes:
- `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells.
- `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily.
- We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed.
- We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up)

dynamo_guard_eval.py microbenchmark update:
- 713.2us -> 630.0us (3.10)
- 598.8us -> 530.7us (3.12)

Other followups:
- Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds).
- Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140063
Approved by: https://github.com/jansel
ghstack dependencies: #142117, #142430
2024-12-17 18:54:27 +00:00
Michael Lazos
49e4307686 [Dynamo] add debug logging for graph region expansion (#141382)
This PR adds debug logging for the region expansion algorithm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141382
Approved by: https://github.com/williamwen42
ghstack dependencies: #141381
2024-12-11 02:22:21 +00:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
Ryan Guo
85dd7b84cf [dynamo] Add a DynamoFrameType type above Python frame object (#140330)
This patch introduces a `DynamoFrameType` to serve as a layer between
Dynamo and different versions of Python frame object. In
`DynamoFrameType`, we only register attributes Dynamo cares about (e.g.,
`f_code`, `f_locals`, etc.

This will be helpful when it comes to adding new attributes to this
`DynamoFrameType`, or dealing with Python version changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140330
Approved by: https://github.com/jansel, https://github.com/williamwen42
2024-11-15 17:17:30 +00:00
Yidi Wu
ab42967238 [hop free symbols] lift free symbols in example_value when create_graph_input (#138363)
There are 4 parts (they are hard to further break into smaller ones cause they're highly coupled) in this PR:
1. **Whenever we call create_graph_input, we try to bind the symbols in the graph input.**
We've enforced the invariant that all create_graph_inputs calls must provide an example value, we could intercept at the create_graph_input calls (This PR only handles free symbols in tensors).
2. **We cache the bound_symbols** to avoid lift the same symbol repeated.
3. For lifted symbols, we re-used  **lifted_freevars** i.e. the mapping between symbol proxy in parent graph to the lifted phs in current subgraph, which we handle lifted tensors. In this way, all hops that supports lifted tensors should be able to handle lifted_symints automatically (at least in dynamo part).
4. For **unbacked symbols** created during tracing, we need to also bound these symbols to its proxy. This is to support the tests cases where we want to lift unbacked symbols as input. We need the proxy of the unbacked symbol in parent graph in order to properly create the args to the hop.
5. We change all the tests after free symbols are lifted in subgraphs. And also supports the lifted symbols in existing higher order ops.

**The interaction of nested tracers:**
The previous design for lifting tensor closures is that: suppose we're in nested tracers, whenever we see a new proxy that's not created by create tracer, we recursively look for the proxy in parent tracer until we find the tracer that creates this proxy (either a placeholder or some intermediate results). More detail is in Note [Nested SubgraphTracer and free_variable handling].

Given the above design, the plan for lifting the free symbols is: whenever we lift a free tensor to be the inputs of current subgraph, we'll look at the symbols in it and bind the symbols at the same time.

For example, suppose we have the following function:
```python
def f(x: [s1, s2]):
  def true_f():
    def true_f_inner():
      return x.sin()
```
what will happen in time order:

1. we create a subtracer 1 and start to speculate the outer cond's true_f
2. we create a another subtracer 2 and start to speculate the inner cond's true_f_inner.
3. dynamo realize the tensor input x by calling wrap_tensor in top-level to create graph input x (tracer 0), we bind the symbol s1, s2 after ph for x is created. So the graph now looks like:
```python
def gm(s1, s2, x):
```
4. when seeing TensorVariable.call_method of x,  tracer2 wants to create a call_function(sin, proxy_of_x), but it finds that proxy_of_x is not created by current tracer. So it recursively look up its parent tracer1 and find parent tracer1 also doesn't track this proxy_of_x then it finds the root tracer0, who is the creator of it and tracks it as a ph. Then tracer 1 create_graph_input  to lift the closure to its input ph1 and add (proxy_of_x: ph1) k-v in **lifted_freevars**  of tracer 1.
Now the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(x):
```
5. Since there are free symbols inside this new tensor input, tracer 1 also binds the symbols (maybe_bind_symbol), which calls create_graph_input for s1 and s2. Now the graph looks like
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
```
6. then it goes back to tracer 2, and call create_graph_input for x and get ph2, tracer 2's **lifted_freevars** records (ph1, ph2). and tracer 2 also binds the symbols in this new tensor input. Now the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
    def true_gm_inner(s1, s2, x):
```
7. Finally the sin call_function node is created by tracer 2.

**This PR also handles the following cases:**
- What if we lift two tensors share the same symbol? e.g. x1 [s1, s2], x2 [s2, s3]? Each subtracer maintains bound_symbols as a cache that maps a symbol.expr to its proxy in current tracer. So when we see x1, we'll track s1 and s2 as inputs and bound s1 to ph1, s2 to ph2. So when we try to bind symbols of x2, s2 will already be tracked so no graph input is created.
- what if a subgraph close over a symint? e.g.
```python
def f(x):
  def true_f():
    c = x.size(0)
   def true_fn_inner():
     return c
```
When we speculate true_fn_inner, we find proxy_of_c is not tracked by tracer 2, so it recursively looks up its parent. At this point, x and its symbols have been lifted as input of true_f (as a result of lifting x during tracing true_f in tracer 1. Specifically the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
    def true_gm_inner():
```
So tracer 2 is able to find that s1 have been tracked as ph in tracer 1 so it returns back to gm and call create_graph_input on s1. The graph now looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
    def true_gm_inner(s1):
     return s1
```

-  What if subgraph close over an unbacked symint? e.g.
```python
def f(x):
  def true_f():
    c =  x.item()
    def true_f_inner():
      return c
```
When x.item() is called, proxy_of_c and its symnode variable is created for tracer 1, and we also call track_unbacked_symbols to record this relationship. So when tracer 2 finds proxy_of_c is not created by current tracer, it recursivelly looks up its parent tracer and finds that that expression u0 has been tracked as a result of track_unbacked_symbol in tracer 1. So it will stop the recursion and create_graph_input u0 in tracer 2. Graph looks like:
```python
def f(x):
  def true_f(s1, s2, x):
    c = x.item()
    def true_gm_inner(u0):
      return u0
    cond(pred, true_gm_inner, false_gm_inner, (c,))
```

- what if subgraph close over a tensor with unbacked symint shape?
```python
def f(x):
  def true_f():
    c = x.item()
    r = torch.randn((c,))
    def true_f_inner():
      return r + 1
```
This is the same as the case of closing over tensors with backed shapes. where we first lift r, then bind u0 in it, which recursively bind_symint of u0 in its parent and found u0 is tracked in parent tracer as a result of .item() call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138363
Approved by: https://github.com/zou3519
2024-11-07 04:44:32 +00:00
Animesh Jain
dba6887dc6 [dynamo][refactor][config-cleanp] Use guard_manager consistently instead of check_fn (#138896)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138896
Approved by: https://github.com/williamwen42, https://github.com/jansel
ghstack dependencies: #138512
2024-10-26 15:14:46 +00:00
Animesh Jain
e714ebf664 [dynamo][testing] Update AOTEagerandRecordGraphs backend (#138231)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138231
Approved by: https://github.com/StrongerXi, https://github.com/mlazos, https://github.com/aakhundov
2024-10-18 02:35:00 +00:00
Adnan Akhundov
809ff3b274 Add host-side Triton TMA support to Dynamo (#137677)
This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes:

- Here we assume the availability of the host-side TMA API added to upstream Triton in https://github.com/triton-lang/triton/pull/4498. As of time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). OSS Triton pin update should be done in December 2024.
- To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), `TMADescriptorVariable` (for TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created.
- The newly introduced variables have `reconstruct` methods used in case of graph breaks.
- The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for how the captured HOP arguments look like.
- In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors.
- In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors. So that downstream AOTAutograd can perform functionalizations as required.
- JIT Inductor and AOT Inductor support will be implemented in follow-up PRs.

Differential Revision: [D64404928](https://our.internmc.facebook.com/intern/diff/D64404928)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137677
Approved by: https://github.com/zou3519
2024-10-16 02:18:48 +00:00
Michael Lazos
e41dffbedd [Dynamo] Trace enter/exit of TorchFunctionModes (#135422) (#137114)
This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode)

Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call

resume fn structure:
1. enter context
2. jump
...
3. exit context

The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).

So for torch function modes the structure of our output code is this:

1. graph call
2. mutate tf mode stack to replay mutations
4. unsupported code
5. on exception restore stack
6. resume function

Then our resume fn looks like this:

1. no-op enter torch function mode
2. jump
3.  exit tf mode

To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).

Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However once again, in the torch function mode case, in the event of a graph we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114
Approved by: https://github.com/yanboliang
2024-10-09 02:29:40 +00:00
PyTorch MergeBot
d34b617bb9 Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422) (#137114)"
This reverts commit 51bc839b94.

Reverted https://github.com/pytorch/pytorch/pull/137114 on behalf of https://github.com/huydhn due to The top of the stack has been reverted but it leaves trunk in a broken state, so I try to revert the rest of the stack ([comment](https://github.com/pytorch/pytorch/pull/137114#issuecomment-2400765603))
2024-10-08 20:33:17 +00:00
Michael Lazos
51bc839b94 [Dynamo] Trace enter/exit of TorchFunctionModes (#135422) (#137114)
This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode)

Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call

resume fn structure:
1. enter context
2. jump
...
3. exit context

The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).

So for torch function modes the structure of our output code is this:

1. graph call
2. mutate tf mode stack to replay mutations
4. unsupported code
5. on exception restore stack
6. resume function

Then our resume fn looks like this:

1. no-op enter torch function mode
2. jump
3.  exit tf mode

To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).

Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However once again, in the torch function mode case, in the event of a graph we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137114
Approved by: https://github.com/yanboliang
2024-10-07 18:55:26 +00:00
Animesh Jain
289df45cee Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422)" (#136590)
This reverts commit 7743149b2b.

Reverts
* https://github.com/pytorch/pytorch/pull/135503
* https://github.com/pytorch/pytorch/pull/135502
* https://github.com/pytorch/pytorch/pull/135422

This passes this test. Earlier, the getitem would stay like a getitem in the Fx graph. But now the fake tensor propagations fails saying that .item is called. It seems that torch function is not getting triggered while fake tensor propagation.

```
import torch
from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention
from torch._inductor.lowering import make_pointwise, register_lowering
from torch._inductor.virtualized import ops
from torch.nn.attention.flex_attention import create_block_mask

torch.set_default_device('cuda')

flex_attention = torch.compile(flex_attention, dynamic=False)

prefix_lengths = torch.arange(8)
def prefix_lm(b, h, q, kv):
    return prefix_lengths[b] >= kv

mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136590
Approved by: https://github.com/Chillee
2024-09-25 21:10:43 +00:00
Bob Ren
32727b9859 Add types to _dynamo/testing.py (#136402)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136402
Approved by: https://github.com/jansel
2024-09-24 10:23:54 +00:00
Michael Lazos
1b9daeb240 [Dynamo] Trace enter/exit of TorchFunctionModes (#135422)
This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode)

Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call

resume fn structure:
1. enter context
2. jump
...
3. exit context

The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).

So for torch function modes the structure of our output code is this:

1. graph call
2. mutate tf mode stack to replay mutations
4. unsupported code
5. on exception restore stack
6. resume function

Then our resume fn looks like this:

1. no-op enter torch function mode
2. jump
3.  exit tf mode

To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).

Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However once again, in the torch function mode case, in the event of a graph we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444
2024-09-14 18:52:22 +00:00
PyTorch MergeBot
f3180f0088 Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422)"
This reverts commit 7743149b2b.

Reverted https://github.com/pytorch/pytorch/pull/135422 on behalf of https://github.com/mlazos due to broke python test/quantization/pt2e/test_numeric_debugger.py TestNumericDebugger.test_re_export_preserve_handle modified yesterday ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2350937008))
2024-09-14 10:02:55 +00:00
Michael Lazos
7743149b2b [Dynamo] Trace enter/exit of TorchFunctionModes (#135422)
This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode)

Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call

resume fn structure:
1. enter context
2. jump
...
3. exit context

The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).

So for torch function modes the structure of our output code is this:

1. graph call
2. mutate tf mode stack to replay mutations
4. unsupported code
5. on exception restore stack
6. resume function

Then our resume fn looks like this:

1. no-op enter torch function mode
2. jump
3.  exit tf mode

To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).

Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However once again, in the torch function mode case, in the event of a graph we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444
2024-09-14 02:41:08 +00:00
PyTorch MergeBot
ac169795a9 Revert "[Dynamo] Trace enter/exit of TorchFunctionModes (#135422)"
This reverts commit 2af3b8ffd8.

Reverted https://github.com/pytorch/pytorch/pull/135422 on behalf of https://github.com/albanD due to Broke tests on main ([comment](https://github.com/pytorch/pytorch/pull/134732#issuecomment-2348886378))
2024-09-13 12:52:57 +00:00
Michael Lazos
2af3b8ffd8 [Dynamo] Trace enter/exit of TorchFunctionModes (#135422)
This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (ie pushing/popping the mode)

Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call

resume fn structure:
1. enter context
2. jump
...
3. exit context

The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).

So for torch function modes the structure of our output code is this:

1. graph call
2. mutate tf mode stack to replay mutations
4. unsupported code
5. on exception restore stack
6. resume function

Then our resume fn looks like this:

1. no-op enter torch function mode
2. jump
3.  exit tf mode

To implement the no-op enter of the torch function mode I added torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).

Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However once again, in the torch function mode case, in the event of a graph we do not want to perform any exit side effects because we want to preserve the state of the mode stack as is so that we will properly update the stack with bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode. All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135422
Approved by: https://github.com/williamwen42
ghstack dependencies: #134732, #133137, #135443, #135444
2024-09-13 08:41:24 +00:00
Michael Lazos
d9ae92cd6e [Dynamo] Support for proxying frozen dataclasses (#134846)
Fixes https://github.com/pytorch/pytorch/issues/133858

Details: Previously Dynamo would treat dataclasses as UserDefinedVariables. This was non-desirable if we would like to proxy the value into the graph, which is needed for TensorSubclassMetadata. To rectify this, frozen dataclasses are now able to be proxied similarly to NamedTuples. We require the object to be frozen, because if arbitrary mutation were allowed, we would need to replay those mutations in the graph after construction of the object.

For tracing construction of the variable, the generated `__init__` for the dataclass uses `object.__setattr__` because frozen dataclasses throw errors on the usual `__setattr__` invocation. With this treatment, no special handling is needed in dynamo for frozen dataclass construction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134846
Approved by: https://github.com/bdhirsh, https://github.com/anijain2305
2024-09-04 22:17:00 +00:00
Xu Han
2ec149cd3e [inductor] fix test_functional_call_sequential_params_and_buffers expectation on Windows (#134394)
This UT actual code only one empty line wrap difference(`linear` and `add`) between Windows/Linux, and the context is right.
Reproduce UTs:
```cmd
pytest test\dynamo\test_higher_order_ops.py -v -k test_functional_call_sequential_params_and_buffers
```

We can add `empty_line_normalizer` to fix it.

```cmd
______________________________________________________________________________________________ FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers _______________________________________________________________________________________________
Traceback (most recent call last):
  File "D:\xu_git\dnnl_cb\pytorch\test\dynamo\test_higher_order_ops.py", line 3676, in test_functional_call_sequential_params_and_buffers
    self.assertExpectedInline(
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2871, in assertExpectedInline
    return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 271, in assertExpectedInline
    self.assertMultiLineEqualMaybeCppStack(expect, actual, msg=help_text)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 292, in assertMultiLineEqualMaybeCppStack
    self.assertMultiLineEqual(expect, actual, *args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 1226, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 675, in fail
    raise self.failureException(msg)
AssertionError: 'clas[509 chars]one\n        add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n        add: "f32[1, 1]" = linear + l_b[71 chars],)\n'
  class GraphModule(torch.nn.Module):
      def forward(self, L_params_l1_weight_: "f32[1, 1]", L_params_l1_bias_: "f32[1]", L_buffers_buffer_: "f32[1]", L_inputs_: "f32[1, 1]"):
          l_params_l1_weight_ = L_params_l1_weight_
          l_params_l1_bias_ = L_params_l1_bias_
          l_buffers_buffer_ = L_buffers_buffer_
          l_inputs_ = L_inputs_

          linear: "f32[1, 1]" = torch._C._nn.linear(l_inputs_, l_params_l1_weight_, l_params_l1_bias_);  l_inputs_ = l_params_l1_weight_ = l_params_l1_bias_ = None
+ <<<< (difference is here )
          add: "f32[1, 1]" = linear + l_buffers_buffer_;  linear = l_buffers_buffer_ = None
          return (add,)
 : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)

To execute this test, run the following from the base repo dir:
    python test\dynamo\test_higher_order_ops.py FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
========================================================================================================================== short test summary info ==========================================================================================================================
FAILED [0.4275s] test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers - AssertionError: 'clas[509 chars]one\n        add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n        add: "f32[1, 1]" = linear + l_b[71 chars],)\n'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134394
Approved by: https://github.com/jansel

Co-authored-by: Jason Ansel <jansel@jansel.net>
2024-08-26 01:41:20 +00:00
Xuehai Pan
e74ba1b34a [BE][Easy][15/19] enforce style for empty lines in import segments in torch/_d*/ (#129767)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129767
Approved by: https://github.com/anijain2305
2024-07-31 21:18:11 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
Edward Z. Yang
e836ee1955 Enhancements to recompiles logs (#130043)
----

- We now record on CacheEntry what the compile id that populated it was, so now we can say why a specific frame was rejected
- Add structured log for recompiles under name artifact "recompile_reasons". As it stands, it's not terribly structured, but this was the easiest thing I could do to start
- Slightly reformat multi-reason printing; since we only report one guard failure seems better to have it as a single line

Example output:

```
V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] Recompiling function f in /data/users/ezyang/a/pytorch/b.py:3
V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles]     triggered by the following guard failure(s):
V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles]     - 0/0: tensor 'L['x']' size mismatch at index 0. expected 4, actual 5
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130043
Approved by: https://github.com/anijain2305
2024-07-09 03:40:56 +00:00
PyTorch MergeBot
4bbadeee8a Revert "Set simdlen based on ATEN_CPU_CAPABILITY (#123514)"
This reverts commit b66e3f0957.

Reverted https://github.com/pytorch/pytorch/pull/123514 on behalf of https://github.com/clee2000 due to broke test/inductor/test_torchinductor.py::CpuTests::test_new_cpp_build_logical_cpu on periodic test on the no gpu tests b66e3f0957 https://github.com/pytorch/pytorch/actions/runs/9453518547/job/26040077301 ([comment](https://github.com/pytorch/pytorch/pull/123514#issuecomment-2159433432))
2024-06-10 22:46:01 +00:00
CaoE
b66e3f0957 Set simdlen based on ATEN_CPU_CAPABILITY (#123514)
It is part of https://github.com/pytorch/pytorch/issues/123224. Set simdlen based on the environment ATEN_CPU_CAPABILITY to control CPU vec ISA like eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123514
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2024-06-10 09:02:14 +00:00
Aaron Orenstein
dcfa7702c3 Flip default value for mypy disallow_untyped_defs [1/11] (#127838)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838
Approved by: https://github.com/oulgen
2024-06-08 18:16:33 +00:00
William Wen
d44ab8ba6d [dynamo] utility to generate bytecode from template function (#127359)
This will be helpful in reducing some of the hardcoded and python-version-dependent bytecode generation in various places in dynamo - e.g. resume function generation and object reconstruction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127359
Approved by: https://github.com/jansel
ghstack dependencies: #127329
2024-05-30 06:37:32 +00:00
William Wen
f17572fcf6 add 3.12 inductor CI tests (#126218)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126218
Approved by: https://github.com/huydhn, https://github.com/desertfire
2024-05-16 22:29:24 +00:00
Edward Z. Yang
e93b57a570 Add propagate_real_tensors mode for unbacked (#125115)
A common complaint when working with data-dependent code in PyTorch is that it's hard to tell how far you are from the finish line: every time a GuardOnDataDependentSymNode error is hit, you have to somehow fix or workaround it to see the next one.

This PR adds a new mode `torch._functorch.config.fake_tensor_propagate_real_tensors` which modifies fake tensors to also propagate real tensors. This means that when we try to guard on a data-dependent SymNode, we can actually produce a real result. We also produce a warning which you should consult to figure out what the crux points are.

I ran this on vision_maskrcnn. In the baseline (without this mode), the model has 27 graph breaks, resulting in 40 graphs. With this mode on, the model has only 11 graph breaks, resulting in 15 graphs (the remaining graph breaks are due to missing functionality for item() on float tensor and some other Dynamo missing features.) You get a list of things that would have errored like this:

```
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> False
```

Potential later follow ups:

* Improve the warning messages (in particular, should provide user frames)
* GC real tensors when they are no longer needed by tracing. Right now, this will use A LOT of memory, equal to as if your GC was broken and every intermediate tensor was kept live

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125115
Approved by: https://github.com/IvanKobzarev
2024-05-02 15:28:26 +00:00
William Wen
d6c713884a [dynamo, 3.12] xfail refleaking tests due to buggy getattr_static (#125062)
For tracking https://github.com/pytorch/pytorch/issues/124302 so that we can re-enable the test once 3.12 updates with the bug fix for https://github.com/python/cpython/issues/118013.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125062
Approved by: https://github.com/anijain2305, https://github.com/jansel
2024-04-30 22:40:47 +00:00
Simon Fan
c12c85e919 Revert "[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)" (#125246)
This reverts commit 62b5738a8b.

https://github.com/pytorch/pytorch/pull/119729/ regresses cudagraph dashboard. Moving the one-time per iteration loss from CPU to CUDA is somehow causing a lot of copies:

current (top) vs with revert (bottom)
![image](https://github.com/pytorch/pytorch/assets/9547562/62dfbf66-7edc-4a3c-ba7f-1ec057fba950)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125246
Approved by: https://github.com/eellison
2024-04-30 22:39:53 +00:00
Simon Fan
62b5738a8b [benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)
aten.div's output device will be its numerator's device. so it is acceptable to do cuda / cpu type divisions. post grad passes operate only on graphs and can't handle runtime graph inputs. so we change user code to move inputs to cuda for cudagraph. this affects any graph that has cpu tensors as graph inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119729
Approved by: https://github.com/eellison
2024-04-26 03:22:26 +00:00
PyTorch MergeBot
48a016157d Revert "[benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)"
This reverts commit c021c9b8e4.

Reverted https://github.com/pytorch/pytorch/pull/119729 on behalf of https://github.com/jeanschmidt due to one PR in this stack seems to have broken linux pull cuda12 tests ([comment](https://github.com/pytorch/pytorch/pull/119729#issuecomment-2076750595))
2024-04-25 09:26:25 +00:00
Simon Fan
c021c9b8e4 [benchmark][cudagraph] Explicitly call aten.div with CUDA denominator for cudagraphs (#119729)
aten.div's output device will be its numerator's device. so it is acceptable to do cuda / cpu type divisions. post grad passes operate only on graphs and can't handle runtime graph inputs. so we change user code to move inputs to cuda for cudagraph. this affects any graph that has cpu tensors as graph inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119729
Approved by: https://github.com/eellison
2024-04-25 03:38:09 +00:00
Animesh Jain
a32eac345f [dynamo] Return gm.forward for eager backend (#124109)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124109
Approved by: https://github.com/yanboliang, https://github.com/jansel
ghstack dependencies: #124445
2024-04-20 14:11:05 +00:00
William Wen
812bae09be [dynamo] fix 3.11+ refleak (#124238)
Fixes https://github.com/pytorch/pytorch/issues/119607 for 3.11+.

In 3.11+, `_PyFrame_FastToLocalsWithError` could implicity run `COPY_FREE_VARS` on the original frame, leading to double incref's since the dynamo shadow frame can rerun `COPY_FREE_VARS`. So the solution is to skip the first `COPY_FREE_VARS` instruction in the shadow frame if it was already executed in the original frame.

Also move the location for clearing the original frame in 3.12 to handle error cases more thoroughly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124238
Approved by: https://github.com/jansel
2024-04-18 03:02:29 +00:00
Shunting Zhang
df5829d0ba [inductor] let rand_strided support fp8 (#124120)
I'm working on https://fb.workplace.com/groups/1075192433118967/posts/1411161629522044/ (this is a meta internal link about a inefficient inner/persistent reduction kernel generated by inductor). I found the generated benchmark code for a kernel ( https://gist.github.com/shunting314/13a0105f72a1c54d9c220370c7fd3845 ) can not be run since rand_strided failed to generate tensors for fp8. Errors are like

```
RuntimeError: "normal_kernel_cpu" not implemented for 'Float8_e4m3fn'
```
for CPU
or
```
RuntimeError: "normal_kernel_cuda" not implemented for 'Float8_e4m3fn'
```
for GPU

This PR work around that problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124120
Approved by: https://github.com/Chillee, https://github.com/jansel
2024-04-16 04:15:56 +00:00
Aaron Gokaslan
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
Peter Bell
6939279a17 [dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)
Fixes #114844

In the linked issue we have
```
compiled_module = torch.compile(module)
compiled_module.x = ...
compiled_module(...)  # Mutates self.x
```
Where since the module mutates `self.x` you would expect `compiled_module.x`
to be updated but actually `compiled_module.x = ...` sets an attribute "x"
on the `OptimizedModule` object while the forward method of the module mutates
`module.x`.

This gives the expected behavior by forwarding `compiled_module.__setattr__`
down to `module.__setattr__`. There is already a corresponding `__getattr__`
so now `compiled_module.x` becomes an alias for `module.x`.

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098
Approved by: https://github.com/ezyang, https://github.com/lezcano
2024-04-01 14:30:44 +00:00
William Wen
a9b27bbbe9 [dynamo, 3.12] update jump instructions (#122530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122530
Approved by: https://github.com/jansel
ghstack dependencies: #122146, #122335, #122354, #122355, #122356, #122449, #122455, #122456
2024-03-27 20:39:39 +00:00
William Wen
2564f6cf0e [dynamo, 3.12] Allocate Dynamo shadow frames by mimicking CPython (#122146)
Python 3.12 changed a few things with how `_PyInterpreterFrame`s are allocated and freed:
- Frames are now required to be placed on the Python frame stack. In 3.11, we could allocate frames anywhere in memory. In 3.12, we now need to use `THP_PyThreadState_BumpFramePointerSlow`/`push_chunk`/`allocate_chunk`. This method of allocating/freeing frames is also compatible with 3.11.
- The eval frame function is now responsible for clearing the frame (see https://docs.python.org/3/whatsnew/changelog.html#id128, the point about "...which now clear the frame.")

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122146
Approved by: https://github.com/jansel
2024-03-27 20:39:39 +00:00
PyTorch MergeBot
f631586084 Revert "[dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)"
This reverts commit b6982bf2b2.

Reverted https://github.com/pytorch/pytorch/pull/122098 on behalf of https://github.com/atalman due to Failing internally ([comment](https://github.com/pytorch/pytorch/pull/122098#issuecomment-2021233604))
2024-03-26 18:54:17 +00:00
Peter Bell
b6982bf2b2 [dynamo] Forward OptimizedModule.__setattr__ to the wrapped module (#122098)
Fixes #114844

In the linked issue we have
```
compiled_module = torch.compile(module)
compiled_module.x = ...
compiled_module(...)  # Mutates self.x
```
Where since the module mutates `self.x` you would expect `compiled_module.x`
to be updated but actually `compiled_module.x = ...` sets an attribute "x"
on the `OptimizedModule` object while the forward method of the module mutates
`module.x`.

This gives the expected behavior by forwarding `compiled_module.__setattr__`
down to `module.__setattr__`. There is already a corresponding `__getattr__`
so now `compiled_module.x` becomes an alias for `module.x`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122098
Approved by: https://github.com/ezyang, https://github.com/lezcano
2024-03-26 00:52:12 +00:00