Commit Graph

219 Commits

Author SHA1 Message Date
Yidi Wu
824474cb35 [cond] support output sizes mismatch in front end (#147130)
This PR finishes https://github.com/pytorch/pytorch/pull/137615 by addressing the TODOs and comments left there.
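
For illustration, a minimal hedged sketch of the kind of program this enables (shapes and names are my own, not from the PR): cond branches whose outputs differ in size, which previously raised in the front end.

```python
import torch

@torch.compile(fullgraph=True)
def f(pred, x):
    # The branches return tensors of different sizes; with this PR the
    # mismatch is handled in the front end instead of erroring.
    return torch.cond(pred, lambda x: x[:2], lambda x: x[:3], (x,))

f(torch.tensor(True), torch.randn(5, 4))
```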

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147130
Approved by: https://github.com/zou3519
2025-02-25 20:28:41 +00:00
eellison
92b7e610ab [Inductor changes] Invoke Quant (#139102)
Adds an `invoke_quant` higher-order operator, as proposed [here](https://docs.google.com/document/d/1s2PfJlq6Q1F8l11CkTIC69BW1rEnGEgs6YmBC7hu8rA/edit?tab=t.0).

The primary motivations are:

- Unifying scattered reasoning for quant operators throughout the code base

- Ease of pattern matching - see the very large pattern-match expression [here](949fdd2997/torch/_inductor/fx_passes/post_grad.py (L390-L426)). Compare that with the pattern I have in the tests:

```
        @register_graph_pattern(
            CallFunction(
                torch.ops.aten.mm,
                CallFunction(
                    torch.ops.higher_order.invoke_quant,
                    Ignored(),
                    Ignored(),
                    Ignored(),
                    scheme="nf4",
                ),
                Arg(),
            ),
            pass_dict=test_pass,
        )
```

- Ability to specify inductor-specific logic, like codegen'ing the operators in lower precision or forcing fusion into a matmul.

Example graph:

```python
 ===== AFTER POST GRAD =====
 /data/users/eellison/pytorch/torch/fx/_lazy_graph_module.py class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
         # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(*args, **kwargs, quant_options=self)  # type: ignore[call-arg]
        repeated_subgraph0 = self.repeated_subgraph0
        invoke_quant: "f32[8][1]cpu" = torch.ops.higher_order.invoke_quant(repeated_subgraph0, arg0_1, arg1_1, scheme = 'nf4');  repeated_subgraph0 = arg0_1 = arg1_1 = None
        return (invoke_quant,)

    class repeated_subgraph0(torch.nn.Module):
        def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
             # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(*args, **kwargs, quant_options=self)  # type: ignore[call-arg]
            mul: "f32[8][1]cpu" = torch.ops.aten.mul.Tensor(arg0_1, arg1_1);  arg0_1 = None
            add: "f32[8][1]cpu" = torch.ops.aten.add.Tensor(mul, arg1_1);  mul = arg1_1 = None
            return add
```

The schema for `invoke_quant` is `torch.ops.higher_order.invoke_quant(subgraph, *args, scheme=None)`, where `scheme` will not always be present.

I wasn't sure exactly how the inductor-specific configurations like `codegen_in_low_precision` should be passed through. I didn't want to stuff them all in as kwargs, and I didn't want them to affect pattern matching, so they will be stored in the meta of the node itself. Following that, I wanted the invocation of the hop to match how it will show up in the graph, so I decided to have it be an object that is then invoked for the tracing.

```
invoke_quant = InvokeQuant(codegen_low_precision=True)
invoke_quant(gn, (x, y), scheme="nf4")
```
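
For completeness, a hedged end-to-end sketch of that usage (the import path is assumed from the file path in the graph dump above; the rest mirrors the snippet):

```python
import torch
from torch._higher_order_ops.invoke_quant import InvokeQuant  # path assumed

def gn(a, b):
    return a * b + b

invoke_quant = InvokeQuant(codegen_low_precision=True)

@torch.compile
def f(x, y):
    # args are packed in a tuple, per the current calling convention
    return invoke_quant(gn, (x, y), scheme="nf4")

f(torch.randn(8), torch.randn(8))
```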
Todo: stop requiring args to be packed in a tuple; will do that following https://github.com/pytorch/pytorch/pull/139162.

Feedback welcome.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139102
Approved by: https://github.com/Chillee
2025-02-08 19:30:19 +00:00
Yanbo Liang
bd8d7b1b74 [Dynamo][Trace PyDispatcher] Remove disable from HigherOrderOperator.__call__ (#146270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146270
Approved by: https://github.com/zou3519
2025-02-03 21:47:54 +00:00
Simon Fan
2e197c8a2d [dynamo][hop] test torch.compiling all HOPs (#145422)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145422
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2025-01-31 20:45:22 +00:00
Ryan Guo
eaec97ab1f [dynamo] Properly prune dead input cell object (#145781)
This patch models input cell objects as "newly created" rather than
"pre-existing" Python objects (see the added documentation for why this
captures the semantics more accurately).

This enables the `SideEffects.prune_dead_object_new` algorithm to prune
away writes to input cell objects which are no longer relevant; this
didn't happen prior to this patch because we modelled them as
pre-existing objects, which forced us to codegen their attribute
mutations.

Fixes #145564.
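
A hedged sketch of the pattern this enables (the repro shape is illustrative, not taken from the issue):

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    c = x.sin()  # `c` lives in a cell created by this frame, captured by `g`
    def g():
        nonlocal c
        c = c + 1  # write to the cell; dead once `g` and `c` are unused
    g()
    return x.cos()

f(torch.randn(3))
```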

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145781
Approved by: https://github.com/williamwen42, https://github.com/jansel
2025-01-28 18:28:13 +00:00
Animesh Jain
19584b28fd [dynamo][dicts] Consolidate dict(..) construction (#144342)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342
Approved by: https://github.com/StrongerXi
2025-01-20 04:42:06 +00:00
PyTorch MergeBot
5e6e6200bf Revert "[dynamo][dicts] Consolidate dict(..) construction (#144342)"
This reverts commit a54a784b82.

Reverted https://github.com/pytorch/pytorch/pull/144342 on behalf of https://github.com/kit1980 due to breaking internal builds, see D68125388 ([comment](https://github.com/pytorch/pytorch/pull/144342#issuecomment-2597184167))
2025-01-17 00:32:09 +00:00
Animesh Jain
a54a784b82 [dynamo][dicts] Consolidate dict(..) construction (#144342)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144342
Approved by: https://github.com/StrongerXi
2025-01-13 22:24:56 +00:00
Yidi Wu
c36f94b373 [while_loop][dynamo] auto-unspecialize int input and output to unbacked symints (#143106)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143106
Approved by: https://github.com/zou3519
ghstack dependencies: #143105, #143545
2025-01-03 19:01:07 +00:00
Tom Ritchford
d25e6e623f Fix unused Python variables in test/[a-d]* (#134665)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134665
Approved by: https://github.com/albanD
2024-12-13 22:13:12 +00:00
Yidi Wu
7111cd6ee0 [hop][BE] add util diff_meta with prettier error message. (#142162)
The error message changes from:
```python
-torch._dynamo.exc.Unsupported: Expected branches to return tensors with same metadata. [(tensor_pair, difference)...]:[('pair0:', TensorMetadata(shape=torch.Size([4, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}), TensorMetadata(shape=torch.Size([2, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}))]
```
to
```python
+torch._dynamo.exc.Unsupported: Expect branches to return tensors with same metadata but find pair[0] differ in 'shape', where lhs is TensorMetadata(shape=torch.Size([4, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={}) and rhs is TensorMetadata(shape=torch.Size([2, 3]), dtype=torch.float32, requires_grad=False, stride=(3, 1), memory_format=None, is_quantized=False, qparams={})
```
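
A hedged sketch of code that would trigger this message (shapes chosen to match the [4, 3] vs [2, 3] mismatch above):

```python
import torch

@torch.compile(fullgraph=True)
def f(pred, x):
    # true branch returns f32[4, 3], false branch returns f32[2, 3]
    return torch.cond(pred, lambda x: x, lambda x: x[:2], (x,))

f(torch.tensor(True), torch.randn(4, 3))
```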

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142162
Approved by: https://github.com/zou3519
2024-12-10 21:54:28 +00:00
Yidi Wu
9ced54a51a [hop] lift free symbols in slice (#142385)
Before the change, we got an "unfound proxy" error when linting the subgraph.

After the change, we have the following dynamo graph for the dynamic_shape test.

```python
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]  /data/users/yidi/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.Module):
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]     def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", L_x_: "f32[s0, s1, s2][s1*s2, s2, 1]cpu"):
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         l_x_ = L_x_
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]          # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:307 in f, code: i = x.size(0) - 2
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         sub: "Sym(s0 - 2)" = s0 - 2
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]          # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:308 in f, code: j = x.size(1) - 3
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         sub_1: "Sym(s1 - 3)" = s1 - 3
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]          # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:310 in f, code: return wrap(lambda x: x[:i, :j, k:], x)
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         wrap_body_0 = self.wrap_body_0
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         wrap = torch.ops.higher_order.wrap(wrap_body_0, s0, s1, s2, l_x_, sub, sub_1);  wrap_body_0 = s0 = s1 = s2 = l_x_ = sub = sub_1 = None
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         getitem: "f32[s0 - 2, s1 - 3, 0][s1*s2, s2, 1]cpu" = wrap[0];  wrap = None
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         return (getitem,)
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]     class wrap_body_0(torch.nn.Module):
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]         def forward(self, s0: "Sym(s0)", s1: "Sym(s1)", s2: "Sym(s2)", l_x_: "f32[s0, s1, s2][s1*s2, s2, 1]cpu", sub: "Sym(s0 - 2)", sub_1: "Sym(s1 - 3)"):
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]              # File: /data/users/yidi/pytorch/test/dynamo/test_higher_order_ops.py:310 in <lambda>, code: return wrap(lambda x: x[:i, :j, k:], x)
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]             getitem: "f32[s0 - 2, s1 - 3, 0][s1*s2, s2, 1]cpu" = l_x_[(slice(None, sub, None), slice(None, sub_1, None), slice(s2, None, None))];  l_x_ = sub = sub_1 = s2 = None
V1209 11:11:06.187000 4091124 torch/_dynamo/output_graph.py:1346] [0/2] [__graph_code]             return (getitem,)
```

We lift sub and sub_1 because they're compound expressions that are directly used as arguments of the getitem node. We lift s0, s1 and s2 because they're basic symbols in the tensor input.
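
A hedged reconstruction of the test shape from the graph above (the import path and `k = x.size(2)` are inferred, not copied from the test):

```python
import torch
from torch._higher_order_ops.wrap import wrap  # import path assumed

def f(x):
    i = x.size(0) - 2
    j = x.size(1) - 3
    k = x.size(2)  # slice(s2, None, None) in the graph suggests k = x.size(2)
    return wrap(lambda x: x[:i, :j, k:], x)

torch.compile(f, dynamic=True, fullgraph=True)(torch.randn(4, 5, 6))
```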

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142385
Approved by: https://github.com/zou3519
2024-12-10 21:52:30 +00:00
Ryan Guo
9d54cd1504 [dynamo] Undo some jvp old workarounds in functorch (#142081)
This basically undoes some workarounds introduced in #119926, the
root causes of which have been fixed by #142078 and other changes in
Dynamo.

Now that Dynamo traces the spec comparison code, the test also needs updating:
- removing the `_jvp_treespec_compare` calls from the fx graph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142081
Approved by: https://github.com/zou3519
ghstack dependencies: #142078, #142080
2024-12-06 08:06:53 +00:00
Ryan Guo
59de5e867b [dynamo] Undo some vjp old workarounds in functorch (#142080)
This basically undoes most of the workarounds introduced in #119405, the
root causes of which have been fixed by #142078 and other changes in
Dynamo.

Now that Dynamo traces the spec comparison code, the test also needs updating:
1. renaming `o` to `primals_out`
2. removing the `_vjp_treespec_compare` calls from the fx graph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142080
Approved by: https://github.com/zou3519
ghstack dependencies: #142078
2024-12-06 08:06:53 +00:00
PyTorch MergeBot
ad37afd590 Revert "Always unspecialize float in OSS (#138922)"
This reverts commit ba5253da9b.

Reverted https://github.com/pytorch/pytorch/pull/138922 on behalf of https://github.com/yf225 due to perf regression on torchbench ([comment](https://github.com/pytorch/pytorch/pull/138922#issuecomment-2499277511))
2024-11-26 00:03:03 +00:00
Bob Ren
ba5253da9b Always unspecialize float in OSS (#138922)
Fixes https://github.com/pytorch/pytorch/issues/107277
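
A hedged sketch of what "unspecialize float" means in practice (a behavioral claim, not a test from the PR): the float argument is traced symbolically instead of being burned into the graph, so new values should not force a recompile.

```python
import torch

@torch.compile
def f(x, alpha: float):
    return x * alpha

f(torch.randn(3), 0.5)
f(torch.randn(3), 0.25)  # with unspecialized floats, ideally no recompilation
```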

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138922
Approved by: https://github.com/ezyang

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
2024-11-24 01:58:13 +00:00
PyTorch MergeBot
a8c90e5140 Revert "Always unspecialize float in OSS (#138922)"
This reverts commit 6d779d0549.

Reverted https://github.com/pytorch/pytorch/pull/138922 on behalf of https://github.com/huydhn due to Sorry for reverting your change but there is some slow tests failing after this land ([comment](https://github.com/pytorch/pytorch/pull/138922#issuecomment-2495076878))
2024-11-22 23:18:36 +00:00
Bob Ren
6d779d0549 Always unspecialize float in OSS (#138922)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138922
Approved by: https://github.com/ezyang

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
2024-11-22 17:54:42 +00:00
Guilherme Leobas
7ced49d2cc Raise exception if vmap (eager) calls compiled function (#140439)
Fixes #138422

This is not a proper fix for #138422, but more of a way to prevent a user from seeing a nasty error inside the C++ code.
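
A hedged sketch of the now-guarded pattern (exact error text not shown here):

```python
import torch

@torch.compile
def f(x):
    return x.sin()

# Calling a compiled function from eager vmap now raises a clear Python
# exception instead of failing deep inside the C++ functorch layers.
torch.vmap(f)(torch.randn(3, 4))
```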

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140439
Approved by: https://github.com/zou3519
2024-11-19 16:27:48 +00:00
Ryan Guo
ea1d11cf74 [dynamo] Represent all cells as NewCellVariable (#140153)
In addition to `NewCellVariable`, Dynamo has 3 ways of modeling cell objects:
1. For cells captured and created by the root frame, represent them as
   their contents in `root_tx.symbolic_locals`, which `LOAD_DEREF` and
   `STORE_DEREF` update directly, without going through `SideEffects`.
2. `ClosureVariable`: this is created when cells from (1) are captured
   by a newly created function Dynamo is about to inline. It's a named
   handle that redirects `LOAD_DEREF` and `STORE_DEREF` back to (1),
   keeping `root_tx.symbolic_locals` up-to-date.
3. For cells that are captured by both the root frame and some
   pre-existing function Dynamo is about to inline, represent those
   cells as contents, and do not allow writes to them.

Note that (2) and (3) are mainly to conform with (1) -- to make sure
Dynamo has a consistent modeling of cells for the same cell objects.

In this patch, we represent all of these cells as `NewCellVariable`. The
main new code paths introduced are:
- using `NewCellVariable` to model cell objects created by the root
  frame (the cells are passed in as input to `InstructionTranslator`);
  this is what allows us to get rid of all 3 legacy paths above.
- adding a new `AutoDerefLocalSource` to deal with the python-code
  level (guards) and bytecode level (codegen) auto-dereferencing
  behavior, when accessing pre-existing python cells. This also
  involves a tiny update to guard manager generation.
- plumbing some extra info into `LocalSource` and `CellVariable` so that
  we can still emit `LOAD_DEREF`, `STORE_DEREF`, `LOAD_CLOSURE` (instead
  of `make_cell`, `cell_contents` attribute access, and `LOAD_FAST`),
  which is important for readability, performance, and some
  assumptions `bytecode_transformation.py` makes.

As a result, this patch removes a lot of the now-dead code paths and
TODOs. Notably, it significantly simplified the `prune_dead_locals`
function, which was duplicating a lot of the logic from
`prune_dead_object_new`; this conveniently closes #137123.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140153
Approved by: https://github.com/jansel
ghstack dependencies: #140330, #140152, #140436, #140435
2024-11-15 17:17:30 +00:00
zeshengzong
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, replace `clone.detach` with `detach.clone`.
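
The preferred order, for reference:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.detach().clone()  # detach first, then clone
# rather than x.clone().detach(), which records the clone in autograd
# only to detach the result afterwards.
```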

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
Yidi Wu
ab42967238 [hop free symbols] lift free symbols in example_value when create_graph_input (#138363)
There are five parts in this PR (they are hard to break into smaller pieces because they're highly coupled):
1. **Whenever we call create_graph_input, we try to bind the symbols in the graph input.**
Since we've enforced the invariant that all create_graph_input calls must provide an example value, we can intercept at the create_graph_input calls (this PR only handles free symbols in tensors).
2. **We cache the bound_symbols** to avoid lifting the same symbol repeatedly.
3. For lifted symbols, we re-use **lifted_freevars**, i.e. the mapping from a symbol's proxy in the parent graph to the lifted phs in the current subgraph, which is how we already handle lifted tensors. In this way, all hops that support lifted tensors should be able to handle lifted symints automatically (at least in the dynamo part).
4. For **unbacked symbols** created during tracing, we also need to bind these symbols to their proxies. This is to support the test cases where we want to lift unbacked symbols as input. We need the proxy of the unbacked symbol in the parent graph in order to properly create the args to the hop.
5. We change all the tests after free symbols are lifted in subgraphs, and also support the lifted symbols in existing higher order ops.

**The interaction of nested tracers:**
The previous design for lifting tensor closures is: suppose we're in nested tracers; whenever we see a new proxy that's not created by the current tracer, we recursively look for the proxy in the parent tracer until we find the tracer that created it (either as a placeholder or as some intermediate result). More detail is in Note [Nested SubgraphTracer and free_variable handling].

Given the above design, the plan for lifting the free symbols is: whenever we lift a free tensor to be an input of the current subgraph, we look at the symbols in it and bind them at the same time.

For example, suppose we have the following function:
```python
def f(x: [s1, s2]):
  def true_f():
    def true_f_inner():
      return x.sin()
```
What happens, in time order:

1. we create a subtracer 1 and start to speculate the outer cond's true_f
2. we create another subtracer 2 and start to speculate the inner cond's true_f_inner.
3. dynamo realizes the tensor input x by calling wrap_tensor at top level to create graph input x (tracer 0); we bind the symbols s1, s2 after the ph for x is created. So the graph now looks like:
```python
def gm(s1, s2, x):
```
4. when seeing TensorVariable.call_method of x, tracer 2 wants to create a call_function(sin, proxy_of_x), but it finds that proxy_of_x is not created by the current tracer. So it recursively looks up its parent tracer 1, finds that tracer 1 also doesn't track this proxy_of_x, and then finds the root tracer 0, which is the creator of proxy_of_x and tracks it as a ph. Tracer 1 then calls create_graph_input to lift the closure to its input ph1 and adds the (proxy_of_x: ph1) k-v pair to the **lifted_freevars** of tracer 1.
Now the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(x):
```
5. Since there are free symbols inside this new tensor input, tracer 1 also binds the symbols (maybe_bind_symbol), which calls create_graph_input for s1 and s2. Now the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
```
6. then it goes back to tracer 2, which calls create_graph_input for x and gets ph2; tracer 2's **lifted_freevars** records (ph1: ph2). Tracer 2 also binds the symbols in this new tensor input. Now the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
    def true_gm_inner(s1, s2, x):
```
7. Finally the sin call_function node is created by tracer 2.

**This PR also handles the following cases:**
- What if we lift two tensors that share the same symbol? e.g. x1: [s1, s2], x2: [s2, s3]. Each subtracer maintains bound_symbols as a cache that maps a symbol's expr to its proxy in the current tracer. So when we see x1, we track s1 and s2 as inputs and bind s1 to ph1 and s2 to ph2. When we then try to bind the symbols of x2, s2 is already tracked, so no new graph input is created.
- What if a subgraph closes over a symint? e.g.
```python
def f(x):
  def true_f():
    c = x.size(0)
    def true_fn_inner():
      return c
```
When we speculate true_fn_inner, we find that proxy_of_c is not tracked by tracer 2, so tracer 2 recursively looks up its parent. At this point, x and its symbols have been lifted as inputs of true_f (as a result of lifting x while tracing true_f in tracer 1). Specifically, the graph looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
    def true_gm_inner():
```
So tracer 2 finds that s1 has been tracked as a ph in tracer 1, stops the recursion, and calls create_graph_input on s1 in tracer 2. The graph now looks like:
```python
def gm(s1, s2, x):
  def true_gm(s1, s2, x):
    def true_gm_inner(s1):
     return s1
```

- What if a subgraph closes over an unbacked symint? e.g.
```python
def f(x):
  def true_f():
    c = x.item()
    def true_f_inner():
      return c
```
When x.item() is called, proxy_of_c and its symnode variable are created for tracer 1, and we also call track_unbacked_symbols to record this relationship. So when tracer 2 finds that proxy_of_c is not created by the current tracer, it recursively looks up its parent tracers and finds that the expression u0 has already been tracked, as a result of track_unbacked_symbol in tracer 1. So it stops the recursion and calls create_graph_input for u0 in tracer 2. The graph looks like:
```python
def f(x):
  def true_f(s1, s2, x):
    c = x.item()
    def true_gm_inner(u0):
      return u0
    cond(pred, true_gm_inner, false_gm_inner, (c,))
```

- What if a subgraph closes over a tensor with an unbacked symint shape?
```python
def f(x):
  def true_f():
    c = x.item()
    r = torch.randn((c,))
    def true_f_inner():
      return r + 1
```
This is the same as the case of closing over tensors with backed shapes: we first lift r, then bind u0 in it, which recursively calls bind_symint on u0 in the parent and finds that u0 is tracked in the parent tracer as a result of the .item() call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138363
Approved by: https://github.com/zou3519
2024-11-07 04:44:32 +00:00
Yidi Wu
dc3a6a9d08 [hop free symbols][refactor] make create_graph_input always take example_value (#138428)
Code refactoring only. We move the wrap_to_fake_tensor logic out of wrap_fx_proxy for placeholders, to provide the invariant that **all graph inputs must set their example values when the inputs are created**. This invariant helps us identify all the free symbols in the top-level graph and its sub-graphs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138428
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #138345
2024-11-04 22:47:49 +00:00
PyTorch MergeBot
b6b9596607 Revert "[dynamo] Fix constant propagation in builtins and UserClasses (#131354)"
This reverts commit 44257c063e.

Reverted https://github.com/pytorch/pytorch/pull/131354 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems to break some internal tests ([comment](https://github.com/pytorch/pytorch/pull/131354#issuecomment-2451050605))
2024-11-01 00:13:20 +00:00
Tom Ritchford
44257c063e [dynamo] Fix constant propagation in builtins and UserClasses (#131354)
* Fixes https://github.com/pytorch/pytorch/issues/118675
* Replaces https://github.com/pytorch/pytorch/pull/118994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131354
Approved by: https://github.com/jansel, https://github.com/anijain2305
2024-10-30 12:47:20 +00:00
Yuanhao Ji
1a2dc89f17 [Dynamo] Allow torch.cond() to handle empty arguments (#138190)
Fixes #138150

```python
import torch

@torch.compile(fullgraph=True)
def foo(x, y, z):
    def f():
        return y + 2

    def g():
        return z + 1

    return torch.cond(x, f, g)

print(foo(torch.zeros(1), torch.ones(1), torch.ones(1))) # tensor([2.])
print(foo(torch.ones(1), torch.ones(1), torch.ones(1))) # tensor([3.])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138190
Approved by: https://github.com/ezyang, https://github.com/zou3519
2024-10-26 15:26:21 +00:00
Ryan Guo
dd7c2899bd [dynamo] Properly prune dead cell local variables (#136891)
This patch updates the `prune_dead_locals` logic to do slightly more aggressive pruning of cell local variables in the absence of side effects, e.g., a cell variable can be pruned when its user function(s) will never be used again.

See the added tests for examples; note that a few tests in `test/dynamo/test_higher_order_ops.py` also got updated because we no longer return the unnecessary graph output.

Fixes #127350, #124653
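
A hedged illustration of the kind of pruning described (the example shape is mine):

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    tmp = x + 1        # captured only by `fn`
    def fn():
        return tmp.sum()
    return x.cos()     # `fn` is never used again, so `tmp`'s cell can be pruned

f(torch.randn(3))
```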

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136891
Approved by: https://github.com/jansel, https://github.com/anijain2305, https://github.com/williamwen42, https://github.com/zou3519
2024-10-10 18:21:24 +00:00
Yanbo Liang
a408cfcbf1 [torch.compile] torch.vmap supports dynamic shapes + enable flex attention create_block_mask dynamic shapes (#137163)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137163
Approved by: https://github.com/Chillee
2024-10-04 05:16:04 +00:00
Bob Ren
13ec343afe clean up capture_func_transforms flag (#136960)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136960
Approved by: https://github.com/guilhermeleobas, https://github.com/jansel
2024-10-04 01:10:52 +00:00
PyTorch MergeBot
9223c16208 Revert "Fix constant propagation in builtins and UserClasses (#131354)"
This reverts commit dd4a51b39a.

Reverted https://github.com/pytorch/pytorch/pull/131354 on behalf of https://github.com/atalman due to Breaks torchrec tests ([comment](https://github.com/pytorch/pytorch/pull/131354#issuecomment-2375417145))
2024-09-25 23:01:03 +00:00
Tom Ritchford
dd4a51b39a Fix constant propagation in builtins and UserClasses (#131354)
* Fixes https://github.com/pytorch/pytorch/issues/118675
* Replaces https://github.com/pytorch/pytorch/pull/118994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131354
Approved by: https://github.com/jansel, https://github.com/anijain2305
2024-09-25 13:03:40 +00:00
Yidi Wu
b07d0a22f5 [hop] require hops to override __call__. (#134352)
Fixes https://github.com/pytorch/pytorch/issues/133719 by making `__call__` of hops an abstractmethod.
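
A hedged sketch of the resulting contract (names are illustrative):

```python
from torch._ops import HigherOrderOperator

class MyHop(HigherOrderOperator):
    def __init__(self):
        super().__init__("my_hop")

    # __call__ is now an abstractmethod, so subclasses must override it.
    def __call__(self, subgraph, *operands):
        return super().__call__(subgraph, *operands)

my_hop = MyHop()
```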

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134352
Approved by: https://github.com/zou3519
2024-08-28 19:56:40 +00:00
Wuxun Zhang
1d231ff8ba [HOO] add hints_wrapper to support passing context hints (#132860)
Fixes #126393

The implementation code is based on feedback here (https://github.com/pytorch/pytorch/pull/121639#issuecomment-2223948842).

Hints are passed as kwargs of the `hints_wrapper` op. Nested hints are also supported.
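
A hedged usage sketch based on this description (import path and exact signature assumed):

```python
import torch
from torch._higher_order_ops.hints_wrap import hints_wrapper  # path assumed

def body(x, y):
    return torch.relu(x + y)

x, y = torch.randn(3), torch.randn(3)
# body positional args, body kwargs, then the hints themselves as a kwarg
out = hints_wrapper(body, (x, y), {}, hints={"schedule": "sequential"})
```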

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132860
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-08-26 18:21:22 +00:00
Xu Han
2ec149cd3e [inductor] fix test_functional_call_sequential_params_and_buffers expectation on Windows (#134394)
The actual output of this UT differs between Windows and Linux only by an empty-line wrap (between `linear` and `add`); the content itself is correct.
To reproduce the UT:
```cmd
pytest test\dynamo\test_higher_order_ops.py -v -k test_functional_call_sequential_params_and_buffers
```

We can add `empty_line_normalizer` to fix it.

```cmd
______________________________________________________________________________________________ FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers _______________________________________________________________________________________________
Traceback (most recent call last):
  File "D:\xu_git\dnnl_cb\pytorch\test\dynamo\test_higher_order_ops.py", line 3676, in test_functional_call_sequential_params_and_buffers
    self.assertExpectedInline(
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2871, in assertExpectedInline
    return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 271, in assertExpectedInline
    self.assertMultiLineEqualMaybeCppStack(expect, actual, msg=help_text)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 292, in assertMultiLineEqualMaybeCppStack
    self.assertMultiLineEqual(expect, actual, *args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 1226, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 675, in fail
    raise self.failureException(msg)
AssertionError: 'clas[509 chars]one\n        add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n        add: "f32[1, 1]" = linear + l_b[71 chars],)\n'
  class GraphModule(torch.nn.Module):
      def forward(self, L_params_l1_weight_: "f32[1, 1]", L_params_l1_bias_: "f32[1]", L_buffers_buffer_: "f32[1]", L_inputs_: "f32[1, 1]"):
          l_params_l1_weight_ = L_params_l1_weight_
          l_params_l1_bias_ = L_params_l1_bias_
          l_buffers_buffer_ = L_buffers_buffer_
          l_inputs_ = L_inputs_

          linear: "f32[1, 1]" = torch._C._nn.linear(l_inputs_, l_params_l1_weight_, l_params_l1_bias_);  l_inputs_ = l_params_l1_weight_ = l_params_l1_bias_ = None
+ <<<< (difference is here )
          add: "f32[1, 1]" = linear + l_buffers_buffer_;  linear = l_buffers_buffer_ = None
          return (add,)
 : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)

To execute this test, run the following from the base repo dir:
    python test\dynamo\test_higher_order_ops.py FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
========================================================================================================================== short test summary info ==========================================================================================================================
FAILED [0.4275s] test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers - AssertionError: 'clas[509 chars]one\n        add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n        add: "f32[1, 1]" = linear + l_b[71 chars],)\n'
```
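
A minimal sketch of what such a normalizer could do (the helper actually added in the PR may differ):

```python
import re

def empty_line_normalizer(text: str) -> str:
    # Collapse runs of blank lines so Windows and Linux graph dumps
    # compare equal.
    return re.sub(r"\n\s*\n", "\n", text)
```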

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134394
Approved by: https://github.com/jansel

Co-authored-by: Jason Ansel <jansel@jansel.net>
2024-08-26 01:41:20 +00:00
Yidi Wu
a23d86c178 [hop] ban creating hop by directly instantiating HigherOrderOperator. (#133645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133645
Approved by: https://github.com/zou3519
2024-08-23 17:28:02 +00:00
PyTorch MergeBot
1491a61769 Revert "[hop] ban creating hop by directly instantiating HigherOrderOperator. (#133645)"
This reverts commit 696107efcb.

Reverted https://github.com/pytorch/pytorch/pull/133645 on behalf of https://github.com/ydwu4 due to breaking ci. probably due to land race ([comment](https://github.com/pytorch/pytorch/pull/133645#issuecomment-2302866106))
2024-08-21 19:33:14 +00:00
Yidi Wu
696107efcb [hop] ban creating hop by directly instantiating HigherOrderOperator. (#133645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133645
Approved by: https://github.com/zou3519
ghstack dependencies: #133521
2024-08-21 17:34:21 +00:00
Guilherme Leobas
a9954d22f8 Raise exception if torch.func.* calls torch.compile functions (#128736)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128736
Approved by: https://github.com/zou3519
2024-08-08 20:21:44 +00:00
Zhengxu Chen
942ffd1b2d Make the __module__ name of HOO to be always "torch.ops.higher_order" (#132775)
Summary: It seems that we can just make this the default, so that going forward all the ops printed in the graph show up under torch.ops.higher_order.
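
A quick check of the described behavior (hedged; attribute access shown for one existing HOP):

```python
import torch

print(torch.ops.higher_order.cond.__module__)  # "torch.ops.higher_order"
```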

Test Plan: CI

Differential Revision: D60530900

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132775
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-08-08 16:55:09 +00:00
angelayi
a270800f0b [export][reland] Add print_readable to unflattened module (#132817)
Reland https://github.com/pytorch/pytorch/pull/128617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132817
Approved by: https://github.com/pianpwk
2024-08-08 06:05:30 +00:00
rzou
2073ddfd1c Actually report the HOP and subclass/mode when there isn't a registration (#132550)
Test Plan:
- tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132550
Approved by: https://github.com/ydwu4
2024-08-06 21:33:10 +00:00
David Berard
5973aec671 [fx] python_code(verbose=True): show size/strides for all tensors (#132192)
python_code(verbose=True) (or print_readable()) generates a string with the code representing the fx graph, with extra annotations indicating the size or stride of each tensor. Currently, it only shows sizes/strides for FakeTensors provided in metadata. For subclass tensors like NestedTensor, the outer class (provided in the node metadata) will be a non-FakeTensor while the inner tensors will be fake. This PR expands the conditional to show sizes/strides for all tensors, not just FakeTensors.

Testing: I ran the test script below with `TORCH_LOGS=+dynamo` and found in the logs the graph shown in the excerpt - we see that the input nested tensor has sizes and strides associated with it. Also, I stacked a diff on top of this one that forces the readable graph to be generated whenever PT2 is in use in tests, which should hopefully find any issues; https://github.com/pytorch/pytorch/pull/132195 shows no significant failures except for preexisting ones.

test script:
```python
import torch

def fn(x):
    return x.cos()

nt = torch.nested.nested_tensor_from_jagged(
    torch.randn(10, 10),
    torch.tensor([0, 1, 3, 6, 10]),
)

torch.compile(fn)(nt)
```

logs excerpt:
```
[0/0] [__graph_code] TRACED GRAPH
[0/0] [__graph_code]  ===== __compiled_fn_1 =====
[0/0] [__graph_code]  /data/users/dberard/pytorch/torch/fx/_lazy_graph_module.py class GraphModule(torch.nn.M

[0/0] [__graph_code]     def forward(self, L_x_: "f32[4, zf1, 10][10*zf1, 10, 1]cpu", zf1: "Sym(zf1)"):
[0/0] [__graph_code]         l_x_ = L_x_
[0/0] [__graph_code]
[0/0] [__graph_code]          # File: /data/users/dberard/scripts/nt_print_graph.py:4 in fn, code: return x.c

[0/0] [__graph_code]         cos: "f32[4, zf1, 10][10*zf1, 10, 1]cpu" = l_x_.cos();  l_x_ = None
[0/0] [__graph_code]         return (cos,)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132192
Approved by: https://github.com/Chillee
2024-08-03 02:54:32 +00:00
PyTorch MergeBot
3855ac5a5d Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit ab9791c0e3.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/angelayi due to never got landed internally due to weird flow... sorry ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2264224466))
2024-08-01 23:47:29 +00:00
Oguz Ulgen
920f0426ae Add None return type to init -- tests rest (#132376)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132376
Approved by: https://github.com/jamesjwu
ghstack dependencies: #132335, #132351, #132352
2024-08-01 15:44:51 +00:00
YangQun1
589aef4bb0 Fix py codegen to delete values that don't have any users (#131028)
Fixes #131025
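
A hedged illustration of the fix: a traced value with no users is now released in the generated Python rather than kept alive for the whole call.

```python
import torch.fx as fx

def f(x):
    unused = x + 1  # node with no users
    return x * 2

gm = fx.symbolic_trace(f)
print(gm.code)  # the unused add's result should now be freed (set to None)
```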

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131028
Approved by: https://github.com/ezyang
2024-08-01 03:18:37 +00:00
ekamiti
9e473fd868 Make adding Buffers more like adding Parameters (#125971)
Add semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the register_buffer method has not been changed. The persistent parameter on the Buffer type indicates whether the buffer should be persistent or not. The other non-test changes get the new Buffer type recognized by inductor and dynamo. The remaining changes are test updates verifying that the Buffer type can be used as a drop-in replacement for register_buffer, as it just leads to register_buffer being called. Normal tensors can still be used as buffers, so these changes are intended to be backwards compatible.

Fixes #35735
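
A hedged sketch of the new usage, mirroring nn.Parameter:

```python
import torch
from torch import nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.stat = nn.Buffer(torch.zeros(3))                      # persistent
        self.scratch = nn.Buffer(torch.zeros(3), persistent=False)

m = M()
print([name for name, _ in m.named_buffers()])  # ['stat', 'scratch']
```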

Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
2024-07-31 10:32:40 +00:00
Guilherme Leobas
a843178529 Let dynamo inline functional_call (#128646)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128646
Approved by: https://github.com/zou3519
2024-07-30 14:22:23 +00:00
angelayi
ab9791c0e3 [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```
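
A hedged usage sketch (the module `M` and inputs are placeholders):

```python
import torch

ep = torch.export.export(M(), (torch.randn(2, 3),))
unflattened = torch.export.unflatten(ep)
unflattened.print_readable()
```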

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-07-30 00:41:44 +00:00
PyTorch MergeBot
f72266ecea Revert "Let dynamo inline functional_call (#128646)"
This reverts commit 5aab1acc84.

Reverted https://github.com/pytorch/pytorch/pull/128646 on behalf of https://github.com/clee2000 due to the newly added test dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers [GH job link](https://github.com/pytorch/pytorch/actions/runs/10147452270/job/28058682000) [HUD commit link](5aab1acc84) is broken, probably a landrace since it passed on PR ([comment](https://github.com/pytorch/pytorch/pull/128646#issuecomment-2256375501))
2024-07-29 16:26:50 +00:00
Guilherme Leobas
5aab1acc84 Let dynamo inline functional_call (#128646)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128646
Approved by: https://github.com/zou3519
ghstack dependencies: #129091, #130490
2024-07-29 15:41:03 +00:00