Commit Graph

860 Commits

Author SHA1 Message Date
Yanbo Liang
da341d0d48 [Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)
This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-12-09 05:11:44 +00:00
PyTorch MergeBot
e8e4141773 Revert "[Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)"
This reverts commit e61d6b42f0.

Reverted https://github.com/pytorch/pytorch/pull/113432 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing dynamo tests in trunk e61d6b42f0, landrace? ([comment](https://github.com/pytorch/pytorch/pull/113432#issuecomment-1847787981))
2023-12-08 20:15:39 +00:00
Michael Lazos
1c3a4a864c Remove always restore (#115317)
Removes always-restore, assuming that a HOP will clean up any leftover state from tracing fwd + bwd.

This required a minor change to the autograd-function variable higher-order op: if we are tracing the forward, DON'T add the call_function node into the main graph, since we are only tracing it for the purposes of speculation. Instead, return the result directly to be passed to the backward for speculation. This was the only observable side effect on the output graph that I found.

Test plan:
test_smoke_from_test_autograd in test_autograd_function.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115317
Approved by: https://github.com/voznesenskym, https://github.com/jansel
2023-12-08 18:17:37 +00:00
Yanbo Liang
e61d6b42f0 [Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)
This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-12-08 17:15:14 +00:00
Iris Zhang (PyTorch)
23fa9621e4 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193)
Summary:

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
We created stubs for the public classes and methods in torch.distributed.device_mesh so that it can be imported whether or not distributed is available.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/115099
Prior to landing, all CI signals passed. ShipIt added the "ci/trunk" label to the PR but DID NOT wait for it and went ahead with committing. More context can be found in the reverted PR above.

Test Plan: CI.

Differential Revision: D51861018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115193
Approved by: https://github.com/fegin
2023-12-08 08:44:32 +00:00
voznesenskym
2c84616a94 Move the shape env symint cache to a symbol cache, better routing for subclass fakification [re-pr 115227] (#115396)
Context:

Joel sees that unless he manually writes to the fake tensor memo, fakification seems to produce spurious symbols! Voz (me) objects, saying that not only is directly writing to memo a bad pattern, but recursively invoking fakification on tensor subclass elements in dynamo should suffice! Joel says that while he morally agrees, he has a test proving otherwise, a most perplexing situation.

Digging in, I figured out that while *we were* making fake tensors correctly, with properly cached symbols and the like, we were *also* incorrectly creating spurious symbols, leading the test to fail.

Before this PR, we would only cache source->symint. This was generally fine, but meant that you would create a symbol, then potentially throw it out due to the symint cache. For example, the cache-hit flow was:

make a symbol (ex: s2) -> use it to make a symint -> hit the cache (my_source-s1)

Now, in this example,  you have a symbol in your val_to_var/var_to_val (s2) that is unused. This is sound, but wasteful, and furthermore, misleading.

This was causing a test added in a PR in this stack to fail, specifically, because the test was using

```
curr_var_to_val = {
    str(k): v for k, v in context.fake_mode.shape_env.var_to_val.items()
}
```

to validate that no new symbols were being created (that is, that recursively creating fake tensors for subclasses was working).

The test is correct, but the implementation of caching would make (by this method of observation) cache hits look like cache misses.

So, the fix here is to move the cache up to be a general symbol cache, rather than only a cache for symints.

The initial implementation did that! But then, it ran into some interesting errors when it came to replay. When replaying symbol creation, behaviors would diverge in the new shape env! How could that be? The answer is because creating a new shape_env resulted in us replaying symbol creation... but with a cache from a different shape env! This was short circuiting symbol creation - and so, adding an extra layer to the cache for id(shape_env) fixes the problem.
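
A hypothetical sketch of the resulting cache shape (names are illustrative, not the actual dynamo data structures): keyed first by the shape env's identity, then by the source, and storing whole symbols rather than only symints:

```python
# Illustrative sketch only; dynamo's real cache lives elsewhere and carries more policy.
from typing import Any, Callable, Dict, Tuple


class SymbolCache:
    def __init__(self) -> None:
        # (id(shape_env), source_name) -> symbol, so replay against a *new*
        # shape env never short-circuits symbol creation with stale entries
        self._cache: Dict[Tuple[int, str], Any] = {}

    def get_or_create(self, shape_env: Any, source_name: str,
                      create_symbol: Callable[[], Any]) -> Any:
        key = (id(shape_env), source_name)
        if key not in self._cache:
            # only allocate a symbol on a true miss; no spurious s_n left behind
            self._cache[key] = create_symbol()
        return self._cache[key]
```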

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115396
Approved by: https://github.com/mlazos
2023-12-08 05:02:21 +00:00
Michael Lazos
18d57dde2d Remove remaining uses of copy_graphstate (#115321)
After auditing higher_order_ops.py, I found that the graph checkpoints were only getting used in the event of an exception, so it is safe to remove them because we now restart analysis in that case.

To make this clearer, the current state is the following:
Checkpoint side effects
Capture subgraph
if graph break:
  restore as usual
else:
  throw away inlining translator and subgraph tracer
Restore side effects

After this change, it becomes the following:
Checkpoint side effects
Capture subgraph
if graph break:
  restart analysis
else:
  throw away inlining translator and subgraph tracer
Restore side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115321
Approved by: https://github.com/jansel, https://github.com/zou3519
2023-12-07 22:35:02 +00:00
ydwu4
dd6ae6d3b4 [HigherOrderOp] Remove additional get item calls in MapHigherOrder. (#115207)
As titled, this PR removes the unnecessary getitem call from the graph that's manipulated in MapHigherOrder. There, we want to get the first-dim slice of the original tensor for speculation, but using call_method would accidentally create a get_item call in the graph, so we avoid it by calling unpack_var_sequence on the input tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115207
Approved by: https://github.com/yanboliang
ghstack dependencies: #115115, #115204, #115205
2023-12-07 17:06:44 +00:00
ydwu4
8b74735878 [HigherOrderOp] make MapHigherOrder create map_impl call_function node instead of map (#115205)
We want to remove the map_wrapper and replace it with dynamo always on. This is the first step of this plan.

In this PR, we make dynamo directly generate map_impl nodes. This doesn't touch the eager logic yet. So the execution path after this PR looks like: 1. `dynamo -> map_impl` when torch.compile is on (before this PR, it was `dynamo -> map_wrapper -> map_impl`), and 2. `map_wrapper -> map_impl` in eager (this PR didn't touch the logic here).

The added TODO(yidi) is addressed in the following PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115205
Approved by: https://github.com/yanboliang
ghstack dependencies: #115115, #115204
2023-12-07 17:06:44 +00:00
ydwu4
be3efbebb6 [HigherOrderOp] make MapHigherOrder use should_flatten_output=True (#115204)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115204
Approved by: https://github.com/yanboliang
ghstack dependencies: #115115
2023-12-07 17:06:35 +00:00
ydwu4
998c87f93c [BE][HigherOrderOp] extract redundant code that unflattens the output (#115115)
We need this function to unflatten the variable tracker for HOPs that want pytree output support, e.g. map.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115115
Approved by: https://github.com/yanboliang
2023-12-07 17:06:28 +00:00
Michael Lazos
3c882925da Make subclass type instances constants (like UserDefinedClasses) (#115323)
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115323
Approved by: https://github.com/oulgen
2023-12-07 08:10:59 +00:00
Joel Schlosser
3a18211622 Guard on subclass inner tensors (#114965)
This PR introduces guarding on subclass inner tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114965
Approved by: https://github.com/voznesenskym
ghstack dependencies: #114311, #115212
2023-12-07 01:47:48 +00:00
Jon Chuang
83cb6a75ad [dynamo] add list iterator contains (#115237)
Fixes https://github.com/pytorch/pytorch/issues/115236

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115237
Approved by: https://github.com/jansel
2023-12-06 22:26:16 +00:00
rzou
67c8ad7285 Fix autograd.Function x enum input x torch.compile (#115206)
Fixes https://github.com/pytorch/pytorch/issues/114777. We treat Enums
like we do ConstantVariable.
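
A minimal sketch of the kind of pattern this enables (hypothetical function and enum names; backend and flags chosen only for illustration):

```python
import enum
import torch


class Mode(enum.Enum):
    SIN = 0
    COS = 1


class MyFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, mode):
        ctx.save_for_backward(x)
        ctx.mode = mode  # the Enum member rides along as a constant
        return x.sin() if mode is Mode.SIN else x.cos()

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        d = x.cos() if ctx.mode is Mode.SIN else -x.sin()
        return grad * d, None  # no gradient for the Enum argument


@torch.compile(backend="eager", fullgraph=True)
def f(x, mode):
    return MyFn.apply(x, mode)


out = f(torch.randn(3, requires_grad=True), Mode.SIN)
```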

Test Plan:
New test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115206
Approved by: https://github.com/yanboliang
ghstack dependencies: #115185, #115186, #115187
2023-12-06 15:18:25 +00:00
Jason Ansel
f4c67ffff4 [dynamo] Improve support for dynamic shapes str.format and _assert (#115203)
This removes a graph break in vision_maskrcnn.
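
A hedged sketch of the kind of code this targets (illustrative; the assert message formats a dynamic size):

```python
import torch


@torch.compile(backend="eager", dynamic=True)
def f(x):
    # both the assert and the formatted message involve a dynamic shape
    assert x.size(0) > 1, "expected more than one row, got {}".format(x.size(0))
    return x * 2


out = f(torch.randn(4, 3))
```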

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115203
Approved by: https://github.com/yanboliang
2023-12-06 04:54:45 +00:00
rzou
b0b190f7c0 More descriptive error message for unsupported inputs to HOP (#115187)
Test Plan:
See updated tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115187
Approved by: https://github.com/ydwu4, https://github.com/yanboliang
ghstack dependencies: #115185, #115186
2023-12-06 01:29:03 +00:00
rzou
b5b011a5cd Expand input types for HOPs that use manually_set_subgraph_inputs=False (#115186)
Previously we only supported Tensor, Constants, and SymNode. We lift
that restriction (there's not really a good reason for it). HOPs like
torch.cond, torch.map already do input validation (those are the ones
that can only support Tensor, Constant, and SymNode inputs).

Test Plan:
New test for `wrap`, which is a HOP that has
manually_set_subgraph_inputs=False

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115186
Approved by: https://github.com/ydwu4, https://github.com/yanboliang
ghstack dependencies: #115185
2023-12-06 01:29:03 +00:00
rzou
bc46347152 Refactor how HOPs create new args to subgraphs (#115185)
This PR combines the logic for Tensor and SymNode.

Test Plan:
- Existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115185
Approved by: https://github.com/ydwu4, https://github.com/yanboliang
2023-12-06 01:29:03 +00:00
Yanbo Liang
4620170008 [Dynamo] Revert multiple PRs since they triggered compilation stuck internally (#115126)
Revert the following PRs to mitigate an internal compilation hang:
#113432
#114016
#114507
#114196
#114739
#114669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115126
Approved by: https://github.com/xush6528
2023-12-05 22:35:37 +00:00
Joel Schlosser
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors
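
A minimal sketch of a wrapper subclass implementing the updated protocol above (illustrative only; error handling and full `__torch_dispatch__` coverage omitted):

```python
import torch
import torch.utils._pytree as pytree


class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, inner: torch.Tensor):
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.shape, dtype=inner.dtype, device=inner.device
        )

    def __init__(self, inner: torch.Tensor):
        self.inner = inner

    def __tensor_flatten__(self):
        # attrs is the list of inner-tensor attribute names; ctx is anything to guard on
        return ["inner"], None

    @staticmethod
    def __tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride):
        # outer_size / outer_stride carry the (possibly symbolic) expected outer metadata;
        # the returned tensor is checked against them at the PT2 call-site
        return WrapperTensor(inner_tensors["inner"])

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        unwrap = lambda t: t.inner if isinstance(t, WrapperTensor) else t
        out = func(*pytree.tree_map(unwrap, args), **pytree.tree_map(unwrap, kwargs))
        return WrapperTensor(out) if isinstance(out, torch.Tensor) else out
```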

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
Jason Ansel
4b8ddbbc7e [dynamo] Improve graph break message for copy.deepcopy (#115120)
I was curious what hf_T5_generate was trying to deepcopy, so I updated the error message:
Before:
```
STATS graph_break
  ("'skip function deepcopy in file /home/jansel/conda/envs/pytorch/lib/python3.10/copy.py'', skipped according skipfiles.SKIP_DIRS'", 3)
  ...
```
After:
```
STATS graph_break
  ('copy.deepcopy UserDefinedObjectVariable(GenerationConfig)', 3)
  ...
```

Related issue: #115122

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115120
Approved by: https://github.com/oulgen
ghstack dependencies: #115095, #115046, #115057, #115119
2023-12-05 19:01:31 +00:00
Jason Ansel
522bae20df [dynamo] Support any() on SymNodeVariable (#115119)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115119
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095, #115046, #115057
2023-12-05 19:01:31 +00:00
Jason Ansel
88642d44d9 [dynamo] Add RestrictedListSubclassVariable (#115057)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115057
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095, #115046
2023-12-05 19:01:23 +00:00
Jason Ansel
a97ed2470a [dynamo] Support hasattr on dataclass (#115046)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115046
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095
2023-12-05 19:01:14 +00:00
Nikita Shulga
a827ac71f2 Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)"
This reverts commit eaa64339d6.
2023-12-05 08:59:36 -08:00
Iris Zhang (PyTorch)
eaa64339d6 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)
Summary:
Rename _device_mesh.py to device_mesh.py, update all callsites, and add documentation.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/114991
It was failing a public module binding test on macOS due to the change in import order for torch/distributed/fsdp/_common_utils.py. Since the original import still works, we remove the changes in this file.

Test Plan: CI.

Differential Revision: D51825114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-05 05:44:52 +00:00
Jason Ansel
3d0bbb24a1 [dynamo] Improve support for list subclasses (#115052)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115052
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: #114830, #115047, #115048
2023-12-05 01:31:33 +00:00
Jason Ansel
fe690f430a [dynamo] Fix dict.get with no default (#115048)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115048
Approved by: https://github.com/eellison, https://github.com/oulgen
ghstack dependencies: #114830, #115047
2023-12-05 01:31:33 +00:00
Yanbo Liang
8ef44e6110 [autograd.Function] Fix torch.compile w/ once_differentiable leads to opaque graph break (#113625)
Fixes #106893

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113625
Approved by: https://github.com/zou3519
2023-12-04 21:37:06 +00:00
Jason Ansel
a70c85ce90 [dynamo] Improve support for inspect.signature().parameters (#115047)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115047
Approved by: https://github.com/oulgen
ghstack dependencies: #114830
2023-12-04 19:08:36 +00:00
Xuehai Pan
3fbfa8cd0a [dynamo] support dict.copy() / OrderedDict.copy() / defaultdict.copy() (#115012)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115012
Approved by: https://github.com/jansel
ghstack dependencies: #115010, #115011
2023-12-04 01:50:10 +00:00
Xuehai Pan
917a52d2a2 [dynamo] support dict.update(seq2) / OrderedDict.update(seq2) / defaultdict.update(seq2) (#115011)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115011
Approved by: https://github.com/jansel
ghstack dependencies: #115010
2023-12-04 01:50:10 +00:00
Xuehai Pan
2e8ac5ea93 [dynamo] support dict.fromkeys() / OrderedDict.fromkeys() / defaultdict.fromkeys() (#115010)
Add support for `dict.fromkeys`, `OrderedDict.fromkeys`, and `defaultdict.fromkeys`.

Fixes #114963

- #114963
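
A small sketch of the now-traceable pattern (illustrative names and backend):

```python
import torch


@torch.compile(backend="eager", fullgraph=True)
def f(x):
    d = dict.fromkeys(("scale", "shift"), 1.0)  # traced instead of graph-breaking
    return x * d["scale"] + d["shift"]


out = f(torch.randn(3))
```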

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115010
Approved by: https://github.com/jansel
2023-12-04 01:49:59 +00:00
Tugsbayasgalan Manlaibaatar
7f49603ed3 Fix https://github.com/pytorch/pytorch/issues/114899 (#114985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114985
Approved by: https://github.com/ydwu4
2023-12-03 05:24:02 +00:00
PyTorch MergeBot
3a2e2044cd Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)"
This reverts commit 729ac7317a.

Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))
2023-12-02 17:55:51 +00:00
Iris Zhang (PyTorch)
729ac7317a [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)
Summary:

Same content of changes as https://github.com/pytorch/pytorch/pull/114710

Rename _device_mesh.py to device_mesh.py, update all callsites, and add documentation.
ghstack-source-id: 208980207
exported-using-ghexport

Test Plan: CI.

Reviewed By: wanchaol

Differential Revision: D51629761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-12-02 04:39:41 +00:00
voznesenskym
4cfe997490 [dynamo] handle setting .data on a tensor (#113080)
**Dynamo**

We don't want setattr in the graph. Setting .data has interesting implications for both aliasing and the autograd engine.

The safe recipe is:

1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.
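
A rough sketch of that recipe spelled out in user code (illustrative only; the version-counter step uses internal autograd hooks in the real implementation and is only noted as a comment):

```python
import torch


def set_data_explicitly(dst: torch.Tensor, src: torch.Tensor) -> None:
    # roughly what `dst.data = src` amounts to
    with torch.no_grad():      # 1) disable grad so autograd doesn't record the mutation
        dst.set_(src)          # 2) swap in src's storage and metadata
    # 3) lower dst's version counter so the autograd engine doesn't see the bump
    #    (done via an internal API inside PyTorch; omitted here)


x = torch.randn(3, requires_grad=True)
set_data_explicitly(x, torch.ones(2))   # same observable effect as `x.data = torch.ones(2)`
```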

**aot_autograd**

For aot_autograd, there's another snag.

Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.

The new plan here is to:
1) Record tensor fakification policy in dynamo
2) provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113080
Approved by: https://github.com/bdhirsh
2023-12-02 00:35:44 +00:00
David Berard
3fc58a6bbe Revert "Make offsets dynamic by default (#113734)" (#114889)
This reverts commit 7c38b76efe.

If a graph has a lot of inputs which are views (with nonzero storage offset), then the check for overlapping tensor views will add a lot of guards (n^2?)

b35ca2cb94/torch/_functorch/_aot_autograd/input_output_analysis.py (L256-L260)

This was causing very slow compilations on an internal model.

Differential Revision: [D51733774](https://our.internmc.facebook.com/intern/diff/D51733774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114889
Approved by: https://github.com/ckluk2, https://github.com/YuqingJ, https://github.com/aaronenyeshi
2023-12-01 16:49:42 +00:00
Yanbo Liang
ab5385fc50 [Dynamo][6.3/N] Further cleanup torch.py (#114669)
A follow-up PR to clean up what I found during the refactor of torch.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114669
Approved by: https://github.com/jansel
2023-12-01 04:08:29 +00:00
Yanbo Liang
7f40640342 [Dynamo] Support torch.amp.autocast as decorator (#114845)
Fixes #114818
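
A small sketch of autocast used as a decorator under torch.compile (illustrative backend and dtype):

```python
import torch


@torch.amp.autocast(device_type="cpu", dtype=torch.bfloat16)  # autocast as a decorator
def matmul_bf16(a, b):
    return a @ b


@torch.compile(backend="eager")
def f(a, b):
    return matmul_bf16(a, b)


out = f(torch.randn(8, 8), torch.randn(8, 8))
```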

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114845
Approved by: https://github.com/jansel
2023-11-30 23:54:57 +00:00
vfdev
f93ea14309 [dynamo] Added support for math ops on ints with dynamic shapes (#114507)
Fixes #114218

```
import math
import torch

def func(x, a):
    b = math.floor(a + 0.5)
    b = math.radians(a) + b
    y = x + b
    return y

cfunc = torch.compile(func, dynamic=True, fullgraph=True, backend="eager")
x = torch.tensor([0, 1, 2, 3], dtype=torch.float32)
a = 12

out = cfunc(x, a)
```

```
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG] TRACED GRAPH
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]  ===== __compiled_fn_0 =====
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]  <eval_with_key>.0 class GraphModule(torch.nn.Module):
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]     def forward(self, L_a_ : torch.SymInt, s1 : torch.SymInt, L_x_ : torch.Tensor):
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_a_ = L_a_
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_x_ = L_x_
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:7, code: b = math.floor(a + 0.5)
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add = l_a_ + 0.5
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         floor = math_floor(add);  add = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: /pytorch/torch/_dynamo/polyfill.py:28, code: return math.pi / 180.0 * x
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         mul = 0.017453292519943295 * l_a_;  l_a_ = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:9, code: b = math.radians(a) + b
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_1 = mul + floor;  mul = floor = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:13, code: y = x + b
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         y = l_x_ + add_1;  l_x_ = add_1 = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         return (y,)
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114507
Approved by: https://github.com/lezcano
2023-11-30 14:11:57 +00:00
rzou
ce4bff4013 [dynamo] fix functools.wraps on nested functions (#114279)
Updated version of #108885 addressing the review. In this PR:
- We add a VT.can_reconstruct utility that checks if VT.reconstruct()
  does something.
- If functools.wraps(fn) is passed a `fn` that either has a source or
  has .can_reconstruct() == True, then we stash the source (or the VT)
- Later on, we use the source (or VT.reconstruct) to actually
  reconstruct the object in codegen.
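
A minimal sketch of the nested-function pattern this fixes (illustrative decorator and backend):

```python
import functools
import torch


def traced(fn):
    @functools.wraps(fn)        # fn is a nested function with no global source
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


@torch.compile(backend="eager", fullgraph=True)
def f(x):
    @traced
    def inner(y):
        return y.sin()
    return inner(x)


out = f(torch.randn(3))
```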

Test Plan:
- New tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114279
Approved by: https://github.com/voznesenskym
2023-11-28 22:34:59 +00:00
voznesenskym
ddf1cb7870 AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection of metadata mutations / set_() mutations smarter, so it detects no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with the existing `has_data_mutation()` plus a new check for `set_()`-style storage mutations, which can more accurately distinguish between data mutations vs. `set_()` calls vs. metadata mutations.

**This PR is still silently incorrect in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
ghstack dependencies: #113926
2023-11-28 19:33:35 +00:00
Bin Bao
0bef97fac3 [dynamo] Support itertools.groupby (#114192)
Summary: for https://github.com/pytorch/pytorch/issues/108698
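
A small sketch of the kind of usage this enables (illustrative names and backend):

```python
import itertools
import torch


@torch.compile(backend="eager", fullgraph=True)
def f(x, keys):
    out = x
    for key, group in itertools.groupby(keys):  # now traceable by dynamo
        out = out + key * len(list(group))
    return out


out = f(torch.randn(3), [1, 1, 2, 3, 3, 3])
```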

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114192
Approved by: https://github.com/jansel
2023-11-28 14:58:59 +00:00
lezcano
79ee99e6d2 [easy] Dispatch torch.from_numpy to torch.as_tensor (#114609)
...rather than detaching the tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114609
Approved by: https://github.com/larryliu0820, https://github.com/voznesenskym
ghstack dependencies: #114608
2023-11-28 12:04:37 +00:00
lezcano
0bb2600c28 Allow to differentiate through NumPy code (#114608)
With this PR it is possible to differentiate through NumPy code modulo
the usual caveats that apply to differentiation:
- That there are no graph breaks
- That the decomposition in `torch._numpy` is differentiable

@ev-br and I were somewhat careful to achieve the second point, but it is not tested through and through, so YMMV.
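
A hedged sketch of what this enables, in the spirit of the torch.compile + NumPy interop (assumes no graph breaks; illustrative example):

```python
import numpy as np
import torch


@torch.compile(fullgraph=True)
def f(x):
    y = x.numpy()          # traced to torch._numpy, so it stays differentiable
    z = np.sin(y) * 2.0
    return torch.from_numpy(z).sum()


x = torch.randn(3, requires_grad=True)
f(x).backward()
print(x.grad)              # 2 * cos(x), from differentiating "through" the NumPy code
```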

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114608
Approved by: https://github.com/voznesenskym
2023-11-28 12:04:37 +00:00
Angela Yi
dffa5f3f23 [dynamo][reland] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#114167)
Summary: Reland of https://github.com/pytorch/pytorch/pull/111960, Fixes https://github.com/pytorch/pytorch/issues/111917

Original PR broke some internal tests which the current diff has resolved.

Test Plan: CI

Differential Revision: D51473196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114167
Approved by: https://github.com/jon-chuang, https://github.com/zou3519
2023-11-28 00:27:23 +00:00
ydwu4
2ac0b61e60 [HigherOrderOp] dedup repeated get_attr placeholders in branches of cond (#112874)
We further de-duplicate the duplicated get_attr nodes.

For the code below:
```python
def test_cond_free_variable_in_both_branches(self):
    backend = EagerAndRecordGraphs()
    cnt = CompileCounterWithBackend(backend)

    z = torch.ones(4, 4)

    class Foo(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.register_buffer("buffer", torch.ones(6, 4))

        def forward(self, x, y):
            def true_fn(x):
                return x.sum() + self.buffer.sum() + z.sum()

            def false_fn(x):
                return x.sum() - z.sum() - self.buffer.sum()

            return control_flow.cond(y, true_fn, false_fn, [x])

    mod_for_compile = torch.compile(
        Foo(), backend=cnt, dynamic=True, fullgraph=True
    )
```

Before de-duplication, we have the following graph module:
```python
class GraphModule(torch.nn.Module):
    def forward(self, L_y_ : torch.Tensor, L_x_ : torch.Tensor, s0 : torch.SymInt, L_z_ : torch.Tensor):
        l_y_ = L_y_
        l_x_ = L_x_
        l_z_ = L_z_

        # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1243, code: return x.sum() + self.buffer.sum() + z.sum()
        l__self___buffer = self.L__self___buffer

        # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1246, code: return x.sum() - z.sum() - self.buffer.sum()
        l__self___buffer_1 = self.L__self___buffer

        # File: /home/yidi/local/pytorch/torch/_higher_order_ops/cond.py:118, code: return cond_op(pred, true_fn, false_fn, operands)
        cond_true_0 = self.cond_true_0
        cond_false_0 = self.cond_false_0
        cond = torch.ops.higher_order.cond(l_y_, cond_true_0, cond_false_0, [l_x_, l_z_, l__self___buffer, l__self___buffer_1]);  l_y_ = cond_true_0 = cond_false_0 = l_x_ = l_z_ = l__self___buffer = l__self___buffer_1 = None
        return (cond,)

    class GraphModule(torch.nn.Module):
        def forward(self, l_x_, l_z_, l__self___buffer_true_branch, l__self___buffer_1_false_branch):
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1243, code: return x.sum() + self.buffer.sum() + z.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l__self___buffer_true_branch.sum();  l__self___buffer_true_branch = None
            add = sum_1 + sum_2;  sum_1 = sum_2 = None
            sum_3 = l_z__1.sum();  l_z__1 = None
            add_1 = add + sum_3;  add = sum_3 = None
            return add_1

    class GraphModule(torch.nn.Module):
        def forward(self, l_x_, l_z_, l__self___buffer_true_branch, l__self___buffer_1_false_branch):
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1246, code: return x.sum() - z.sum() - self.buffer.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l_z__1.sum();  l_z__1 = None
            sub = sum_1 - sum_2;  sum_1 = sum_2 = None
            sum_3 = l__self___buffer_1_false_branch.sum();  l__self___buffer_1_false_branch = None
            sub_1 = sub - sum_3;  sub = sum_3 = None
            return sub_1
```

After de-duplication, we have the following graph module:
```python
class GraphModule(torch.nn.Module):
    def forward(self, L_x_ : torch.Tensor, L_y_ : torch.Tensor, s0 : torch.SymInt, L_z_ : torch.Tensor):
        l_x_ = L_x_
        l_y_ = L_y_
        l_z_ = L_z_

        # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1232, code: return x.sum() + self.buffer.sum() + z.sum()
        l__self___buffer = self.L__self___buffer

        # File: /home/yidi/local/pytorch/torch/_higher_order_ops/cond.py:118, code: return cond_op(pred, true_fn, false_fn, operands)
        cond_true_0 = self.cond_true_0
        cond_false_0 = self.cond_false_0
        cond = torch.ops.higher_order.cond(l_y_, cond_true_0, cond_false_0, [l__self___buffer, l_x_, l_z_]);  l_y_ = cond_true_0 = cond_false_0 = l__self___buffer = l_x_ = l_z_ = None
        return (cond,)

    class GraphModule(torch.nn.Module):
        def forward(self, l__self___buffer, l_x_, l_z_):
            l__self___buffer_1 = l__self___buffer
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1232, code: return x.sum() + self.buffer.sum() + z.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l__self___buffer_1.sum();  l__self___buffer_1 = None
            add = sum_1 + sum_2;  sum_1 = sum_2 = None
            sum_3 = l_z__1.sum();  l_z__1 = None
            add_1 = add + sum_3;  add = sum_3 = None
            return add_1

    class GraphModule(torch.nn.Module):
        def forward(self, l__self___buffer_1, l_x_, l_z_):
            l__self___buffer_2 = l__self___buffer_1
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1235, code: return x.sum() - z.sum() - self.buffer.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l_z__1.sum();  l_z__1 = None
            sub = sum_1 - sum_2;  sum_1 = sum_2 = None
            sum_3 = l__self___buffer_2.sum();  l__self___buffer_2 = None
            sub_1 = sub - sum_3;  sub = sum_3 = None
            return sub_1

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112874
Approved by: https://github.com/zou3519
2023-11-27 22:07:42 +00:00
voznesenskym
081c5b3adc Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926) (#114526)
Summary:

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of the trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end-result) state rather than their expected fresh state. The solution here is to start a fresh fake mode and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down over concerns about complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, whether that is an anti-pattern (it is), and whether we want to hack around it with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526
Approved by: https://github.com/Chillee, https://github.com/huydhn
2023-11-26 23:40:32 +00:00