Changes:
1. Add a `_private_register_pytree_node` API in both the C++ and Python pytree implementations. The C++ version registers the node only for the C++ pytree; the Python version registers it only for the Python pytree.
2. Do not allow registering a type as a pytree node twice in the Python pytree.
3. Add a thread lock to the Python pytree node registration API.
4. The old `_register_pytree_node` API now calls `_private_register_pytree_node` and raises a deprecation warning.
5. Add a new `register_pytree_node` API that registers the node type in both the C++ and Python implementations (see the sketch after this list).
6. Add tests to ensure a warning is raised when the old `_register_pytree_node` function is called.
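For reference, a minimal sketch of registering a custom node through the new unified API (a sketch against `torch.utils._pytree` as of this PR; per item 5 above, the same call also registers the C++ side):
```python
import torch.utils._pytree as pytree

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# flatten returns (children, context); unflatten inverts it.
pytree.register_pytree_node(
    Point,
    lambda p: ((p.x, p.y), None),
    lambda children, context: Point(*children),
)

leaves, spec = pytree.tree_flatten(Point(1, 2))  # leaves == [1, 2]
restored = pytree.tree_unflatten(leaves, spec)
```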
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
Summary: Previously, when we had two dynamic shape symbols `s0` and `s1` bound by the relationship `s1 == s0 + 1`, even when the range constraints were set in accordance with the relationship (e.g., to `[2, 1024]` for `s0` and to `[3, 1025]` for `s1`), `torch._dynamo.export` raised an error saying that the constraint is violated. Here we add a range check between the expression and the constraint and, if the ranges match, don't declare the constraint violated.
We also add a flag to disable the dim constraint solver in `torch._dynamo.export` (not set by default, for BC), passed down from `torch._export.aot_compile`. This is because, even for simple constraints like `s1 == s0 + 1`, the solver claims that the constraint is too complex and that the dimension `s0` must be specialized. The new flag is not exposed as part of the public API (i.e., the one without `_`s in the module names).
Both changes are required to unblock PT2 compilation of an internal model with AOT Inductor.
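To make the idea concrete, here is an illustrative sketch of the added range check (not the exact implementation):
```python
import sympy

s0 = sympy.Symbol("s0", integer=True, positive=True)
expr = s0 + 1                  # s1 is bound by s1 == s0 + 1
s0_range = (2, 1024)           # declared range of s0
s1_range = (3, 1025)           # declared range of s1

# expr is monotonically increasing in s0, so substituting the endpoints
# of s0's range yields the range of the expression.
derived = (expr.subs(s0, s0_range[0]), expr.subs(s0, s0_range[1]))
assert derived == s1_range     # ranges agree => no constraint violation
```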
Test Plan:
```
$ python test/inductor/test_aot_inductor.py -k test_shifted_constraint_ranges
s...
----------------------------------------------------------------------
Ran 4 tests in 53.247s
OK (skipped=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114024
Approved by: https://github.com/zhxchen17
Summary:
Based on discussions with Sherlock + Zhengxu in D51118067, updated the internal thrift schema to match the OSS schema.
Verifier failures:
* Test contains a None as input, resulting in no meta["val"]
* Test contains torch.autograd.grad_mode.set_grad_enabled as an op, which also results in no meta["val"]
* torch.autograd.grad_mode.set_grad_enabled is also not a valid op
* Test adds a "parameter" to the state dict but the parameter is not an nn.Parameter, causing an assertion failure
So to bypass these failures I did the following hacks(?):
* Before creating the exported program in deserialization, populate nodes w/o meta["val"] with meta["val"] = None
* Add torch.autograd.grad_mode.set_grad_enabled to the skip opset
* Duplicated ExportGraphSignature into aot_export.py so that the graph signature checks will be skipped
Configerator changes in D51343615
Test Plan: CI
Reviewed By: zhxchen17
Differential Revision: D51342921
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113810
Approved by: https://github.com/zhxchen17
Fixes #110100 by making aot_export_module use dynamo.export's fake_mode in export.
Test Plan:
Add new tests. One of the tests places the fake tensor on a CUDA device manually, and we are able to export the program and preserve the device information in the final produced graph module, even on a machine that has a CPU-only build of PyTorch installed. One workaround we need is to set all tensors' requires_grad to False, since fake tensors with CUDA devices don't compose well with aot_autograd right now.
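A minimal sketch of the fake-CUDA behavior the test relies on; fake tensors only track device metadata, so no GPU is required:
```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    # No real CUDA allocation happens, so this works on a CPU-only build.
    x = torch.empty(8, 16, device="cuda")

print(x.device)  # cuda:0
```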
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113681
Approved by: https://github.com/SherlockNoMad
Applies PLW0108, which removes unnecessary lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet; these are the autofixes from the rule.
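An example of the pattern this rule rewrites:
```python
# Before the autofix: the lambda does nothing but forward its argument.
handler = lambda msg: print(msg)

# After the autofix: reference the callable directly.
handler = print
```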
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
Previous discussion: https://github.com/pytorch/pytorch/pull/109476
In this PR, I made following additions to the original PR:
1) The unlifted graph module now runs the runtime assertions in its `forward` call.
2) When we retrace, we run the assertions to make sure the user is tracing the module with inputs that are consistent with the assumptions we made during the first tracing. The way I do this is to create a new graph module type with a modified call method. The runtime assertions run under torchdynamo.disable so that they execute directly in eager mode; we don't want them to become a traced part of the graph.
3) Both ep.module and capture_pre_autograd now return _UnliftedGraphModule.
Differential Revision: [D51078056](https://our.internmc.facebook.com/intern/diff/D51078056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110222
Approved by: https://github.com/zhxchen17
Subsumes https://github.com/pytorch/pytorch/pull/110794
Fixes https://github.com/pytorch/pytorch/issues/110315
This is not really a 100% sound fix, a deeper analysis of the bug can be found at https://docs.google.com/document/d/1y-nRAPdbZEji52MPKYzC0U3VhvW9yEAEDqP5t5GhWZ0/edit
The general idea behind the fix here is that we are going to play fast and loose with user-defined classes: as Dynamo is written today, we are willing to pull out these types and directly manipulate them (e.g., look at their `__mro__`, etc.) without an intervening VariableTracker. As such, when I use `python_type` to extract the Python type of a VT, or manually read out the `__bases__` of a type (which may be a user-defined class), and the result is sourceless, all I need to do is use SourcelessBuilder instead of ConstantVariable to make sure it gets wrapped in the correct VT class.
The approach in https://github.com/pytorch/pytorch/pull/110794 was "more correct", but we'd have to go substantially further to get it all working. So I am doing this to unblock suo for now.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113390
Approved by: https://github.com/suo
Summary: yolo fixing issues. See Test plan
Test Plan:
buck2 run 'fbcode//mode/dev' fbcode//executorch/examples/portable/test:test_export -- -r test_mv3_export_to_executorch
[Need ACL to repro this, but the error message looks straightforward]
buck2 test 'fbcode//mode/dev-nosan' fbcode//pye/model_inventory/nlu_stella_cap:nlu_stella_cap_test -- --exact 'pye/model_inventory/nlu_stella_cap:nlu_stella_cap_test - test_export_to_backend_dynamic_quantized (pye.model_inventory.nlu_stella_cap.NluStellaCapTest.NluStellaCapTest)'
Differential Revision: D51128480
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113296
Approved by: https://github.com/tugsbayasgalan
`install_config_module` makes a regular module into a ConfigModule with
extra methods defined on it. mypy thinks those extra methods (or module
functions) are undefined since it cannot analyze something so
dynamic. As a workaround, I've created a fake module that defines these
extra functions, which I import into the config modules during type
checking.
As part of this change, I've also added more types to config_utils.py
and enabled typechecking for torch/_dynamo/config.py.
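A sketch of the pattern (module paths are illustrative; the PR works against config_utils.py and a typing stub whose exact location may differ):
```python
import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only imported while type checking; at runtime, install_config_module()
    # defines these names dynamically on the module.
    from torch.utils._config_typing import *  # noqa: F401, F403

verbose = False  # an ordinary config value

from torch.utils._config_module import install_config_module  # illustrative path
install_config_module(sys.modules[__name__])
```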
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112130
Approved by: https://github.com/jansel
Summary: Turn on the verifier check for the exported program ctor. Note that this effectively detects a large surface of spec violations, so we also spent some time fixing them one by one in this diff.
Test Plan: CI
Differential Revision: D51014944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113075
Approved by: https://github.com/angelayi
Custom classes that are serialized with pytree are serialized by default with `f"{class.__module__}.{class.__name__}"`. This creates a dependency from our serialized program directly into the outer Python environment: if a user moves the class to a different directory, the serialized program can no longer be loaded. So, we will require users to pass in an FQN if they want to serialize their custom treespec type.
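For illustration, a sketch of registration with an explicit FQN (keyword name as in current `torch.utils._pytree`; it may have landed under a different spelling):
```python
import torch.utils._pytree as pytree

class MyConfig:
    def __init__(self, a, b):
        self.a, self.b = a, b

# The explicit FQN keeps the serialized program loadable even if the
# class later moves to a different module.
pytree.register_pytree_node(
    MyConfig,
    lambda c: ((c.a, c.b), None),
    lambda children, context: MyConfig(*children),
    serialized_type_name="mypkg.configs.MyConfig",
)
```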
Differential Revision: [D50886366](https://our.internmc.facebook.com/intern/diff/D50886366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112428
Approved by: https://github.com/suo
The split_cat fx passes expect `example_value` metadata on every node. However, the graph module from _export_torch_ir does not contain this metadata, causing the split_cat fx passes to not run. So I added a pass that populates this metadata on every node in the graph.
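A hypothetical sketch of such a pass, mirroring the fake-tensor "val" metadata into the "example_value" slot that the split_cat passes read:
```python
import torch.fx

def populate_example_values(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Hypothetical; the real pass may derive example_value differently.
    for node in gm.graph.nodes:
        if "example_value" not in node.meta and "val" in node.meta:
            node.meta["example_value"] = node.meta["val"]
    return gm
```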
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112415
Approved by: https://github.com/frank-wei
Updates `_export.aot_compile` to pass a torch IR graph to inductor, allowing inductor to now run the pre_grad_passes, and reuse more of inductor's code.
Also updates the API to only return the `so_path`, without returning the exported program. The pytree call spec is now serialized and placed inside the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and do the pytree flatten/unflatten in Python.
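A hypothetical caller-side sketch; `runner` stands in for whatever object loads the compiled .so and exposes the serialized specs, and `runner.run` is not an API defined by this PR:
```python
import torch.utils._pytree as pytree

def call_compiled(runner, *args, **kwargs):
    in_spec_str, out_spec_str = runner.get_call_spec()
    in_spec = pytree.treespec_loads(in_spec_str)
    out_spec = pytree.treespec_loads(out_spec_str)

    flat_inputs, actual_spec = pytree.tree_flatten((args, kwargs))
    assert actual_spec == in_spec, "inputs do not match the exported call spec"
    flat_outputs = runner.run(flat_inputs)
    return pytree.tree_unflatten(flat_outputs, out_spec)
```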
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
Did some easy fixes from enabling TRY200. Most of these seem like oversights rather than intentional. The proper way to silence intentional cases is with `from None`, to note that you thought about whether the exception should contain the cause and decided against it.
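An example of the kind of fix TRY200 asks for:
```python
import json

def load_config(path):
    try:
        with open(path) as f:
            return json.load(f)
    except OSError as e:
        # Chain the original exception so the cause isn't lost.
        raise RuntimeError(f"could not load config from {path}") from e
```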
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111496
Approved by: https://github.com/malfet
Summary: In AOTInductor ABI-compatible mode, we don't serialize missing args with default values.
Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops
Differential Revision: D50345729
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111425
Approved by: https://github.com/angelayi
Previously we were generating a graph to add runtime assertions on inputs and then running that graph to check input constraints. This PR checks input constraints directly.
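A hypothetical sketch of the direct check; the real code works off the exported program's range constraints rather than this simplified mapping:
```python
def check_input_constraints(flat_args, dim_ranges):
    # dim_ranges: {(input_index, dim): (lower, upper)} -- hypothetical shape
    for (i, dim), (lower, upper) in dim_ranges.items():
        size = flat_args[i].shape[dim]
        if not lower <= size <= upper:
            raise RuntimeError(
                f"Input {i}: size {size} at dim {dim} is outside [{lower}, {upper}]"
            )
```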
Differential Revision: D50289970
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111262
Approved by: https://github.com/zhxchen17
Summary:
This error was run into when running ExportPassBase on an exported model with lifted constant tensors:
```
File "/data/users/angelayi/pytorch/torch/_subclasses/fake_tensor.py", line 1444, in dispatch
len(kwargs) == 0 and len(args) == 1 and type(args[0]) is torch.Tensor
AssertionError: (FakeTensor(..., size=(s0,)),) {}
While executing %lift_fresh_copy_1 : [num_users=1] = call_function[target=torch.ops.aten.lift_fresh_copy.default](args = (%_lifted_tensor_constant99,), kwargs = {})
Original traceback:
File "" in forward
mean = torch.tensor([0.485, 0.456, 0.406]).reshape(3, 1, 1)
```
In ExportPassBase, we retrace using the fake tensors in the placeholder nodes, but when we run into these lift_fresh_copy operators, they can't be called with the fake tensors.
Test Plan: CI
Reviewed By: chakriu
Differential Revision: D50211827
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111140
Approved by: https://github.com/zhxchen17
Summary:
Previously we designed the GraphSignature format as a bunch of input and output node names. After a discussion in the design meeting we decided to change the format to make the signature more self-contained. Now the signature format looks like the following:
```
[
InputSpec(
kind=InputKind.USER_INPUT,
arg=TensorArgument(name="arg0_1"),
target=None,
),
...
]
```
Test Plan: CI
Reviewed By: angelayi
Differential Revision: D49876258
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111017
Approved by: https://github.com/angelayi
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130
Changes:
* `torch.export` will return a functional ATen graph but not lowered to core aten decompositions (CompositeImplicitAutograd decomps still run)
* `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table (see the usage sketch after the code block below).
Calling convention for Executorch stays the same:
```
pre_autograd_graph = capture_pre_autograd_graph(f, args, ...)
aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...)
# Within to_edge we decompose to core aten and then convert to edge
edge_graph = exir.to_edge(aten_graph_no_decomps)
```
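Outside of Executorch, the two-step flow looks roughly like this (module and shapes are illustrative):
```python
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.layer_norm(x, [4])

ep = export(M(), (torch.randn(2, 4),))  # functional ATen, core decomps not yet run
core_ep = ep.run_decompositions()       # defaults to the Core ATen table
```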
Test Plan: CI
Differential Revision: D50172210
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111030
Approved by: https://github.com/ydwu4
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130
Changes:
* `torch.export` will return a functional ATen graph w/o decompositions
* `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table.
Calling convention for Executorch stays the same:
```
pre_autograd_graph = capture_pre_autograd_graph(f, args, ...)
aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...)
# Within to_edge we decompose to core aten and then convert to edge
edge_graph = exir.to_edge(aten_graph_no_decomps)
```
Test Plan: CI
Differential Revision: D49742989
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110410
Approved by: https://github.com/ydwu4
We want to get to a point where most UserErrors link to exportdb examples. This PR makes passing case names non-optional to make this intent clearer and encourage developers who raise UserErrors to make or point to examples that make fixing such errors more obvious for users.
In addition, sometimes there are multiple examples that are relevant to an error. Thus this PR also enables passing multiple case names.
Retry of #110733 which was reverted due to a landrace.
Differential Revision: [D50087148](https://our.internmc.facebook.com/intern/diff/D50087148/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110878
Approved by: https://github.com/gmagogsfm, https://github.com/tugsbayasgalan
We want to get to a point where most `UserError`s link to `exportdb` examples. This PR makes passing case names non-optional to make this intent clearer and encourage developers who raise `UserError`s to make or point to examples that make fixing such errors more obvious for users.
In addition, sometimes there are multiple examples that are relevant to an error. Thus this PR also enables passing multiple case names.
Differential Revision: [D50020465](https://our.internmc.facebook.com/intern/diff/D50020465/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110733
Approved by: https://github.com/zhxchen17
Previously, `Dim` definitions that shared the same name but had different ranges were allowed to appear in the `dynamic_shapes` argument of an `export` call. They would correspond to the *same* dynamic dimension (identified by the shared name), whose effective range was the *intersection* of the different ranges.
However this behavior can be confusing, because having different definitions with the same name is more likely than not unintentional. Therefore, this PR makes it a user error.
We still allow different definitions with the same name to exist at the same time (no global uniqueness) as long as they are not mixed in the same `export` call. Redefinitions with the same bounds are also allowed, in case they are accidentally created by executing the same code multiple times.
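A sketch of the new rule (the error surfaces when conflicting definitions meet in the same `export` call, not at construction):
```python
from torch.export import Dim

batch = Dim("batch", min=2, max=1024)
batch_alias = Dim("batch", min=2, max=1024)  # ok: same name, same bounds

# Passing this alongside `batch` in one export call is now a user error.
conflicting = Dim("batch", min=3, max=512)
```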
Differential Revision: [D49965944](https://our.internmc.facebook.com/intern/diff/D49965944/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110638
Approved by: https://github.com/zhxchen17
Summary: The runtime assertions inserted in `torch._export.export` by the `_AddRuntimeAssertionsForInlineConstraintsPass` lead to errors in AOT Inductor like #109884. In `torch._export.aot_compile`, export and AOT compilation are run consecutively, which would lead to the above issue if any assertions are inserted.
In this PR, we're adding a new parameter / flag to `torch._export.aot_compile`, `remove_runtime_assertions`, to remove the assertions inserted during export before AOT compilation. The flag is set to `False` for BC.
Additionally, we remove the flag `add_runtime_assertions_for_inline_constraints` recently added to `torch._dynamo.config`, as it can lead to undesirable `torch._export` behavior and is no longer required for AOT Inductor testing purposes.
Test Plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110710
Approved by: https://github.com/zhxchen17, https://github.com/chenyang78
Ideally all `_dynamo.exc.UserError`s should have "case names", i.e., link to examples in `exportdb`.
This PR adds case names to several instances of `_dynamo.exc.UserError`. In particular, looking at coverage based on `UserErrorType`:
* `DYNAMIC_CONTROL_FLOW`, `ANTI_PATTERN`, and `STANDARD_LIBRARY` are fully covered.
* `CONSTRAINT_VIOLATION` and `DYNAMIC_DIM` have no coverage. We don't seem to have any dedicated examples of specifying dynamic shapes in `exportdb` (although they are used in some other examples without explanation, to avoid some specialization that would make such examples moot).
* `INVALID_INPUT` is only partly covered. Frankly this is tedious to cover via examples.
Differential Revision: [D49928518](https://our.internmc.facebook.com/intern/diff/D49928518/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110555
Approved by: https://github.com/angelayi, https://github.com/ydwu4
Summary:
See wrapper.codegen_reinterpret_view(): it returns a temporary handle for a tensor, which has the following problem.
```
# NB, the return handle here represents a temporary tensor, which will be automatically
# released.
# Here's a sample usage in the cpp wrapper code:
# ```
# aoti_torch_addmm_out(
# buf1,
# arg1_1,
# RAIIAtenTensorHandle(tmp_tensor_handle_0),
# buf0,
# 1L,
# 1L));
# ```
# RAIIAtenTensorHandle(tmp_tensor_handle_0) will be released after the call to addmm_out.
# This could be problematic when it's used in a different pattern, for example:
# ````
# AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
# aoti_torch_proxy_executor_call_function(..., tensor_args);
# ````
# RAIIAtenTensorHandle(tmp_tensor_handle_2) will be invalid when it's used in the latter
# kernel call.
return f"RAIIAtenTensorHandle({tmp_name})"
```
As a result, ProxyExecutor would generate the following code, which causes an invalid memory access.
Before:
```
// Source Nodes: [fn_with_tuple_output], Original ATen: [fb.fn_with_tuple_output]
AtenTensorHandle tmp_tensor_handle_2;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__reinterpret_tensor(buf3, 2, int_array_0, int_array_1, 0L, &tmp_tensor_handle_2));
...
AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
int64_t int_args[] = {1};
aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, int_args, 3, tensor_args);
buf3.reset();
```
With the fix in this diff, ProxyExecutor generates the following code instead.
After:
```
// Source Nodes: [fn_with_tuple_output], Original ATen: [fb.fn_with_tuple_output]
AtenTensorHandle tmp_tensor_handle_2;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__reinterpret_tensor(buf3, 2, int_array_0, int_array_1, 0L, &tmp_tensor_handle_2));
...
aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, std::vector<int64_t>{1}.data(), 3, std::vector<AtenTensorHandle>{RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6}.data());
buf3.reset();
```
I am not exactly a big fan of such `std::vector{...}.data()` for creating a temp array, but I can't think of another fix.
Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops
Reviewed By: desertfire
Differential Revision: D49758764
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110451
Approved by: https://github.com/desertfire
Summary:
Previously we used export's constraints to specify that all batch-size dimensions are dynamic. This was done by creating one range constraint `dynamic_dim(inp[0][0], lower, upper)`, followed by `dynamic_dim(inp[0][0]) == dynamic_dim(inp[i][0])` for every input `i`.
Through the new `dynamic_shapes` API, we can use `Dim("batch_size")` on every dimension to specify which dimensions are dynamic and equal to each other, and `None` otherwise: `{i: [Dim("batch_size", lower, upper), None] for every input i}`
Note: `dynamic_shapes` and `constraints` utilize the same "constraints" backend so this diff should be idempotent.
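Illustratively, the before/after looks roughly like this (tensor shapes and bounds are made up):
```python
import torch
from torch.export import Dim, dynamic_dim

inputs = tuple(torch.randn(4, 8) for _ in range(3))

# Before: a range constraint on one input plus a pairwise equality per input
constraints = [dynamic_dim(inputs[0], 0) >= 2, dynamic_dim(inputs[0], 0) <= 1024]
constraints += [dynamic_dim(t, 0) == dynamic_dim(inputs[0], 0) for t in inputs[1:]]

# After: a single named Dim shared by every input's batch dimension
batch_size = Dim("batch_size", min=2, max=1024)
dynamic_shapes = tuple({0: batch_size} for _ in inputs)
```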
Test Plan: `buck2 run @//mode/dev-nosan //caffe2/torch/fb/model_transform/experimental/benchmark/test/aotinductor:test_aot_inductor_benchmark`
Reviewed By: chenyang78, aakhundov
Differential Revision: D49784351
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110360
Approved by: https://github.com/desertfire
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130
`exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table.
Splitting up this diff from the following one (D49742989) to make migrating Executorch easier:
1. Land this diff
2. Wait for a PyTorch nightly to include this diff
3. Update Executorch's PyTorch nightly
4. Land the following diff to have export() return no decomps
Test Plan: Tested in following diff
Differential Revision: D49743208
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110236
Approved by: https://github.com/gmagogsfm
Summary: with the grid computed in terms of unbacked `SymInt`s, it can happen that the grid is zero-sized. This causes a CUDA error on `cuLaunchKernel` in the AOT Inductor codegen.
In this PR, when the grid contains unbacked `SymInt`s, a check is added around the `launchKernel` call in AOT Inductor's C++ wrapper codegen to make sure that the grid is not zero-sized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110312
Approved by: https://github.com/chenyang78
An `ExportedProgram`'s `__call__` signature is different from the original module, so `dynamic_shapes` that follow the original signature would fail when applied to re-export an `ExportedProgram`.
This PR fixes this issue; in other words, the original `dynamic_shapes` should now work when re-exporting.
Differential Revision: D49764011
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110276
Approved by: https://github.com/tugsbayasgalan
A resubmit of https://github.com/pytorch/pytorch/pull/108447. Copy over the descriptions:
This is a follow-up of the discussion in https://github.com/pytorch/pytorch/pull/108356, where we want to replace source_fn with source_fn_stack
Before this PR, for the following example:
```python
backend = EagerAndRecordGraphs()
@torch.compile(backend=backend, fullgraph=True)
def cond_f(pred, pred2, x, y):
def true_fn(pred2, x, y):
return x + y
def false_fn(pred2, x, y):
def true_fn2(x, y):
return x.sin() - y.cos()
def false_fn2(x, y):
return x.cos() - y.sin()
return control_flow.cond(pred2, true_fn2, false_fn2, (x, y))
return control_flow.cond(pred, true_fn, false_fn, (pred2, x, y))
```
The graph captured is shown below:
```python
class GraphModule(torch.nn.Module):
def forward(self, L_pred_ : torch.Tensor, L_pred2_ : torch.Tensor, L_x_ : torch.Tensor, L_y_ : torch.Tensor):
l_pred_ = L_pred_
l_pred2_ = L_pred2_
l_x_ = L_x_
l_y_ = L_y_
cond_true_1 = self.cond_true_1
cond_false_1 = self.cond_false_1
cond = torch.ops.higher_order.cond(l_pred_, cond_true_1, cond_false_1, [l_pred2_, l_x_, l_y_]); l_pred_ = cond_true_1 = cond_false_1 = l_pred2_ = l_x_ = l_y_ = None
return (cond,)
class GraphModule(torch.nn.Module):
def forward(self, l_pred2_, l_x_, l_y_):
add = l_x_ + l_y_; l_x_ = l_y_ = None
return add
class GraphModule(torch.nn.Module):
def forward(self, l_pred2_, l_x_, l_y_):
cond_true_0 = self.cond_true_0
cond_false_0 = self.cond_false_0
cond = torch.ops.higher_order.cond(l_pred2_, cond_true_0, cond_false_0, [l_x_, l_y_]); l_pred2_ = cond_true_0 = cond_false_0 = l_x_ = l_y_ = None
return cond
class GraphModule(torch.nn.Module):
def forward(self, l_x_, l_y_):
sin = l_x_.sin(); l_x_ = None
cos = l_y_.cos(); l_y_ = None
sub = sin - cos; sin = cos = None
return sub
class GraphModule(torch.nn.Module):
def forward(self, l_x_, l_y_):
cos = l_x_.cos(); l_x_ = None
sin = l_y_.sin(); l_y_ = None
sub = cos - sin; cos = sin = None
return sub
```
the source_fn for inner cond, sin, cos will be a (name, target) tuple:
```
('cond', <torch._ops.HigherOrderOperator object at xxx>)
('sin', 'sin')
('cos', 'cos')
('sub', <built-in function sub>)
```
After this PR, the source_fn_stack will be a list of (name, target) tuples. The bottom of the stack is the end of the list.
```
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>)],
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('sin', 'sin')],
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cos', 'cos')]
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('sub', <built-in function sub>)]
```
Test Plan:
See the added tests in test_higher_order_ops.py and the modified existing test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108595
Approved by: https://github.com/angelayi, https://github.com/zou3519
Recently we updated the `export` API to take an experimental `dynamic_shapes` argument that was meant to subsume the existing `constraints` argument.
This PR deprecates `constraints` (with a warning on its use, but without actually removing it). Simultaneously it replaces all uses of `constraints` in docs, examples, and tests with corresponding uses of `dynamic_shapes` (preserving behavior). This exercise fortunately revealed some minor bugs in the implementation which have also been fixed in this PR.
Some uses of `constraints` still remain, e.g., when `torch._dynamo.export` is called directly. (Meta-internal uses will be updated in a separate diff.)
Differential Revision: D49676049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110143
Approved by: https://github.com/tugsbayasgalan
Based on William's recent diff on preserving node metadata during retracing, we no longer need to skip dynamo when retracing. This softens our previous restriction of not allowing any new constraints from the user side, because we can now utilize dynamo to analyze through constraints. As a result, re-export can technically happen with any new constraints. This opens up another question: is it ok to use looser constraints on the retrace? If we allow loose constraints, we can technically diverge from eager behaviour, because for example we could have eliminated unsafe control flow based on a previous assumption. But we can also argue this is ok, because we can say we treat the exported callable as independent from its original source code.
We could technically ban loose constraints inside export, but my concern is that we would be breaking abstraction by special-casing checks on ExportedProgram.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109476
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
Our experience using `constraints` / `dynamic_dim` with the existing export API has found it to be (subjectively) clunky and (objectively) verbose in common cases.
This PR implements a new design for the export API that replaces the use of `constraints` / `dynamic_dim` with a new way of specifying dynamic shapes, involving the following concepts:
* a constructor `Dim` for first-class named dynamic dimensions with ranges (similar to `functorch.dim`, and analogous to internal symbolic sizes)
* a mechanism that uses the above in `export` calls to associate inputs to their dynamic shape specifications (`dynamic_shapes`)
Design doc: https://docs.google.com/presentation/d/168U7XK72C_WSsZpGESP6Cho9udh193fi0gfjxCNcJ4E/edit#slide=id.p (Meta-only). Note that we only implement Option 1 in that doc. An older version of this PR also implemented Option 3, which is an alternative way of specifying dynamic shapes using tensor type annotations on the exported callable; but we have moved that to future work for now.
See docs for these new features in `torch.export`. The existing `torch.export.export` is modified to use the new API, `torch._export.export__RC__`, whenever `constraints=None`. We have not deprecated the existing API yet, but will do in a follow-up.
Constraint violation errors arising through use of the new API will now contain suggested fixes using the new API. No longer do we need to report all specializations for static dimensions and suggest all constraints over dynamic dimensions to fix such errors. Instead, due to the redesign, the suggested fixes are much more concise, only involving modifying the definitions of relevant `Dim`s.
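A minimal sketch of the new specification style (shown against the `torch.export.export` surface where this eventually landed; shapes and bounds are illustrative):
```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

batch = Dim("batch", min=2, max=1024)
ep = export(
    M(),
    (torch.randn(4, 8), torch.randn(4, 8)),
    dynamic_shapes={"x": {0: batch}, "y": {0: batch}},
)
```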
Differential Revision: [D48919204](https://our.internmc.facebook.com/intern/diff/D48919204/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108448
Approved by: https://github.com/suo, https://github.com/gmagogsfm
We now have two types of functionalization, C++ Functionalization (through the `Functionalize` dispatch key), and python functionalization (through the `FunctionalTensorMode` torch_dispatch mode).
This means that all higher order ops need custom functionalization rules for the python variant too. I added them here, as well as a helper function `dispatch_functionalize()` - equivalent to `torch.func.functionalize()`, except that it uses `FunctionalTensorMode`.
In theory we could have secretly switched `torch.func.functionalize` to use `FunctionalTensorMode`. This would be BC-breaking, though, since `FunctionalTensorMode` isn't composable with the other functorch transforms (the functorch layer-mode stack doesn't know how to re-order torch_dispatch modes arbitrarily).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108656
Approved by: https://github.com/zou3519
ghstack dependencies: #109024, #109248
Fix: #107315
This PR enables dynamo to trace through the `pytree` API by inlining its functions. In
order to do so, a few details of `pytree` had to be changed.
In summary, this PR:
- Introduces `TreeSpecVariable` for representing `TreeSpec` instances
- Specializes `<type>.__bases__` call, returning a `TupleVariable`
- Enables the call to `id` builtin function for every variable that implements
`as_python_constant` method
- Specializes `ConstantVariable.call_method` for its (un)flatten functions
- Implements `UserDefinedObjectVariable.as_python_constant`
- Modifies `pytree` by:
  - Makes `SUPPORTED_NODES` a map of ids (instead of types) to `NodeDef`
  - Removes the `functools.wraps` call, since it can't be inlined
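With these changes, something like the following should trace without graph breaks (backend and flags are illustrative):
```python
import torch
import torch.utils._pytree as pytree

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    leaves, spec = pytree.tree_flatten({"a": x, "b": (x + 1,)})
    return pytree.tree_unflatten([leaf * 2 for leaf in leaves], spec)

f(torch.randn(3))
```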
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108533
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
ghstack dependencies: #109201
Summary:
X-link: https://github.com/pytorch/executorch/pull/359
When exporting using enable_aot (through the torch.export path), we want to lift all constant tensors as buffers to the exported program. The ScalarToTensor pass in EXIR's aten_to_edge passes will create some constant tensors in the graph, so we will need to run a lift_constant_tensors pass afterwards.
Note that this only needs to be applied when exporting using the torch.export path because in the original path, nothing is lifted.
Test Plan: CI
Differential Revision: D49207492
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109382
Approved by: https://github.com/cccclai
Previously, the code for passing inputs to the exported program was:
```
if kwargs:
return (args, kwargs)
else:
return args
```
However, this causes some inconsistency: if the original input contains both args and kwargs, the treespec is a tuple containing a tuple of arguments and a dictionary of keyword arguments, but if the original input only contained args, the treespec is just a tuple of arguments. This inconsistency causes some inconveniences in the runtime.
So I updated the code to just always keep the kwargs around.
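A minimal sketch of the updated behavior (the helper name is hypothetical):
```python
def combine_args_kwargs(args, kwargs):
    # Always a 2-tuple, so the treespec has a consistent shape.
    return (args, kwargs or {})
```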
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109160
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
Summary:
If there are no inline constraints added, just return the original graph.
We want to do this because this pass sometimes messes up the node names;
before we actually fix that, we can make the behavior a bit less buggy
by skipping no-op passes.
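A hypothetical sketch of the guard (the real pass body is elided):
```python
import torch.fx

def add_inline_constraint_assertions(gm: torch.fx.GraphModule, inline_constraints):
    if not inline_constraints:
        return gm  # no-op: leave the graph, and its node names, untouched
    ...  # insert runtime assertion nodes, then gm.recompile()
    return gm
```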
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109395
Approved by: https://github.com/angelayi