Summary:
_dynamo_graph_capture_for_export in its current form has a compatibility issue
with the main torch.compile() path, even though we reuse fullgraph_capture as the
bytecode tracer. The reason is that we flip on many export-specific flags
and even trace a wrapped function, which causes divergence from
torch.compile() again.
This PR instead creates a new implementation of dynamo_graph_capture_for_export
that relies entirely on fullgraph capture plus post-processing of CaptureOutput,
so that we avoid inverting the phases of the PT2 compiler stack.
This also benefits the precompile workflow, since we want a feature that
accepts only pytree inputs and ships portable Python wrappers in the package. In
other words, I think the code here is sharable between export and precompile
for exporting a portable graph.
Test Plan:
===================================================================== test session starts =====================================================================
platform linux -- Python 3.12.11, pytest-7.3.2, pluggy-1.6.0
rootdir: /data/users/zhxchen17/pytorch
configfile: pytest.ini
plugins: xdoctest-1.1.0, hypothesis-5.35.1, xdist-3.3.1, subtests-0.13.1, rerunfailures-14.0, flakefinder-1.1.0, cpp-2.3.0, anyio-4.10.0
collected 9 items
Running 9 items in this shard
test/distributed/tensor/test_dtensor_export.py ........x [100%]
================================================================ 8 passed, 1 xfailed in 11.42s ================================================================
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165562
Approved by: https://github.com/tugsbayasgalan
Differential Revision: [D82603767](https://our.internmc.facebook.com/intern/diff/D82603767)
Previously, I forgot to handle the call_module case; those nodes will now have export_root prepended to their names. Basically I want to clean up something like:
```
graph():
    %l_self_export_root_sub_mod = call_module[target=l_self_export_root_sub_mod](%x, %y)
    %l_self_export_root_sub_mod_1 = call_module[target=l_self_export_root_sub_mod](%x, %y)
```
The Dynamo graph can have call_module nodes with mangled names due to our wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163136
Approved by: https://github.com/avikchaudhuri
Summary: This PR introduces shape guards to export. Previously, only value ranges, equalities, and specializations were tracked for symbolic expressions, and a forward hook checked them. Now we instead create a function that checks shape guards and call it in the exported program.
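Purely as an illustration of the idea (the function below is hypothetical, not the actual generated code): where a forward hook previously validated symbolic-shape constraints, the exported program now calls something shaped like this checker.
```python
# Hypothetical sketch: a shape-guard checker for a program traced with
# dynamic dims s0 = x.size(0), s1 = y.size(0) and a collected guard s0 == s1.
def _check_shape_guards(x, y):
    if x.size(0) != y.size(0):
        raise RuntimeError(
            f"Shape guard violated: expected x.size(0) == y.size(0), "
            f"got {x.size(0)} and {y.size(0)}"
        )
```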
Test Plan:
updated several tests
Differential Revision: D80713603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161178
Approved by: https://github.com/tugsbayasgalan
Summary:
In preparation for checking shape guards in export, this PR effectively switches `prefer_deferred_runtime_asserts_over_guards` to `False`, matching Dynamo.
Actually, that's a lie: we switch it to `allow_complex_guards_as_runtime_asserts`, which is `False` by default but can be set to `True` via an internal API. This keeps the two flags synchronized, so we should be able to kill `allow_complex_guards_as_runtime_asserts` at this point.
Test Plan:
updated tests
Differential Revision: D79734206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160111
Approved by: https://github.com/tugsbayasgalan
Previously specialization error messages would render sources that were pretty far from source-code names. E.g., given args named `x, y, zs`, the source for `y.size()[0]` would be rendered as `args[0][1].size()[0]`.
This is because we created artificial local names following `(args, kwargs)` structure instead of reusing signatures. This PR fixes that situation.
Basically we map prefixes of key paths that correspond to original arg names to root sources corresponding to those names; the rest of the key paths hang from these root sources.
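As a purely illustrative sketch (the helper and its name are hypothetical, not this PR's code), the idea is to flatten `(args, kwargs)` with key paths and replace the leading `args[i]`/`kwargs[k]` component with the parameter name from the signature:
```python
import inspect
import torch
import torch.utils._pytree as pytree

def name_keypaths(fn, args, kwargs):
    # Hypothetical helper: bind inputs to the signature, then hang each
    # pytree key path off the corresponding parameter name.
    bound = inspect.signature(fn).bind(*args, **kwargs)
    named = {}
    for name, value in bound.arguments.items():
        for keypath, leaf in pytree.tree_flatten_with_path(value)[0]:
            named[name + pytree.keystr(keypath)] = leaf
    return named

def f(x, y, zs):
    return x + y[0] + zs["a"]

paths = name_keypaths(f, (torch.randn(2), [torch.randn(2)]), {"zs": {"a": torch.randn(2)}})
print(list(paths))  # ['x', 'y[0]', "zs['a']"] -> e.g. "y[0].size()[0]" in error messages
```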
Differential Revision: [D76461391](https://our.internmc.facebook.com/intern/diff/D76461391/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155738
Approved by: https://github.com/bobrenjc93
Let's first explore a couple of problems related to replacements and runtime assertions.
#### example problem 1
Suppose we have a runtime assertion that u0==s0, where u0 is an input coming from mark_unbacked. A replacement u0=s0 will be added, so the function f(u0, s0) becomes f(s0, s0); this leads to the assert not being inserted during insert_deferred_runtime_asserts.
The reason is that insert_deferred_runtime_asserts inserts each assertion only once all of its input symbols have been seen, but u0 will never be seen. The same thing can happen when we defer an assertion on backed symbols, e.g. s0==s2, etc.
#### example problem 2
Consider u0==s0, where u0 comes from a call to .item(). Imagine that later on s0 specializes to 2. In that case s0 will not be seen as an input during insert_deferred_runtime_asserts and the assertion won't be inserted into the graph. Worse, Inductor will generate code in the cpp wrapper that refers to s0 even though it does not exist, causing a failure.
internal xref: https://fb.workplace.com/groups/1075192433118967/permalink/1669766396994898/
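A minimal repro sketch of the shape of problem 2 (the module and sizes here are illustrative, not the internal repro): an unbacked symbol from `.item()` is asserted equal to a backed input dimension, and if that dimension later specializes, the assertion can lose its input symbol.
```python
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x, n):
        u0 = n.item()                  # unbacked symbol u0 from .item()
        torch._check(u0 == x.size(0))  # deferred runtime assertion u0 == s0
        return x + u0

# x's first dim is the backed symbol s0; if s0 later specializes (e.g. to 2),
# the assertion above can no longer find s0 among the graph inputs.
ep = export(M(), (torch.randn(4), torch.tensor(4)), dynamic_shapes=({0: Dim("s0")}, None))
```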
## The solution:
The runtime assertion insertion loop depends on detecting that the symbols used in the runtime assertions have been seen; those symbols are either graph inputs or generated in the graph by data-dependent ops like .item().
The issues above happen when the symbols are graph inputs. To force those symbols to exist in the graph and to be seen by the runtime assertion insertion, we do not apply replacements on placeholder expressions during codegen or during runtime assertion insertion.
This should not have performance overhead, since we already optimized the graph with replacements; the only effect is not mistakenly dropping graph inputs that are used in runtime assertions.
I added extended testing. One unrelated follow-up I noticed is that we might want to rename unbacked symbols in runtime assertions when we do unbacked renaming, but that's a different issue.
Other approaches that did not work:
#### Ban replacements on unbacked symbols
1. Does not work when we defer runtime assertions on backed symbols, e.g. s0==s1. We could also ban such replacements, but then problem 2 becomes more problematic.
2. It degrades the quality of reasoning.
#### Apply specializations to runtime assertions before codegen
1. Can fix some issues, but may also turn runtime assertions into no-ops.
2. Does not fix the case where runtime assertions are not inserted during insert_deferred_runtime_asserts because an input is not detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153661
Approved by: https://github.com/jansel
Add semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the register_buffer method has not been changed. The persistent parameter on the Buffer type indicates whether the buffer should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by Inductor and Dynamo. The remaining changes are test changes to make sure that the Buffer type can be used as a drop-in replacement for register_buffer, as it just leads to register_buffer being called. Normal tensors can still be used as buffers, so these changes are intended to be backwards compatible.
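A minimal usage sketch of the new type (assuming it is exposed as torch.nn.Buffer): assigning a Buffer in a module behaves like calling register_buffer.
```python
import torch
from torch import nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        # Equivalent to self.register_buffer("running_mean", torch.zeros(4))
        self.running_mean = nn.Buffer(torch.zeros(4))
        # Non-persistent buffer: registered, but excluded from the state_dict
        self.scratch = nn.Buffer(torch.zeros(4), persistent=False)

m = M()
print(len(list(m.buffers())))   # 2: both buffers are registered
print(list(m.state_dict()))     # ['running_mean']: only the persistent one is saved
```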
Fixes #35735
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
Summary:
_decompose_exported_program() ran into an issue with trace_joint, where trace_joint() produces values with mismatching FakeModes. This adds a fake mode context to aot_export_module() so that doesn't happen.
Thanks to tugsbayasgalan for the fix!
Test Plan: test_experimental
Differential Revision: D58977694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129421
Approved by: https://github.com/tugsbayasgalan, https://github.com/zhxchen17
Summary:
WARNING: This API is highly unstable and will be subject to change in the future.
Add a prototype to "decompose" an ExportedProgram into a joint graph form, so that we can compute gradients on this graph.
Test Plan: buck test mode/opt caffe2/torch/fb/export:test_experimental
Differential Revision: D55657917
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128847
Approved by: https://github.com/tugsbayasgalan
## Goal
As title
## Design
Based on the fact that each TorchScript module has a `code` property that provides the source code of its `forward` function, I implemented a function that extrapolates the `forward` signature using the AST parser.
Some other tradeoffs considered:
* Directly parsing the source code as a string --> would be very buggy
* Directly using Python's `compile` function to get the function object --> raises a lot of exceptions because of missing packages or undefined variable names
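A small sketch of the AST approach (my own illustration, not the code from this PR): parse the TorchScript-generated source from `.code` and read the argument names off the `forward` definition.
```python
import ast
import textwrap
import torch

def forward_arg_names(scripted: torch.jit.ScriptModule):
    # `code` is the TorchScript-generated Python source of forward().
    tree = ast.parse(textwrap.dedent(scripted.code))
    fn = next(
        node for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and node.name == "forward"
    )
    return [a.arg for a in fn.args.args[1:]]  # skip `self`

scripted = torch.jit.script(torch.nn.Linear(4, 4))
print(forward_arg_names(scripted))  # e.g. ['input']
```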
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126787
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
This PR switches export IR from aot-dispatch to pre-dispatch IR.
**What is pre-dispatch IR and why should you care?**
Currently the default IR returned by torch.export can contain only functional ATen operators after ALL pytorch dispatcher decompositions (for example, CompositeImplicitAutograd) run.
In contrast, pre-dispatch IR refers to an IR that can contain all functional ATen operators (i.e., not just from the core subset), before any decomposition happens, as well as operators that manipulate autograd state. Pre-dispatch IR closely resembles eager PyTorch computation, but is still functional and serializable by torch.export. As a result:
- You can train the pre-dispatch IR in eager mode as the IR contains necessary information for the autograd engine to automatically generate a backward graph.
- You can write sound graph transformations more easily as the IR is functional.
- Since it is an ATen IR, it is still normalized. For example, torch.add has multiple overloads, but aten.add.Tensor is unique in this IR.
If you want to get the core aten IR out of torch.export, you will need to:
```
ep = torch.export.export(M(), inputs)
ep_for_core_aten = ep.run_decompositions()
```
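As a small, illustrative sketch of the "train in eager mode" point above (my example, not from the PR): the module recovered from a pre-dispatch ExportedProgram still works with autograd.
```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)
    def forward(self, x):
        return self.lin(x).sum()

ep = torch.export.export(M(), (torch.randn(2, 4),))
m = ep.module()                 # runnable module built from the pre-dispatch IR
loss = m(torch.randn(2, 4))
loss.backward()                 # autograd still works on the pre-dispatch graph
print([p.grad is not None for p in m.parameters()])
```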
Differential Revision: [D57172986](https://our.internmc.facebook.com/intern/diff/D57172986)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125860
Approved by: https://github.com/zhxchen17
This PR switches export IR from aot-dispatch to pre-dispatch IR.
**What is pre-dispatch IR and why should you care?**
Currently the default IR returned by torch.export can contain only functional ATen operators after ALL pytorch dispatcher decompositions (for example, CompositeImplicitAutograd) run.
In contrast, pre-dispatch IR refers to an IR that can contain all functional ATen operators (i.e., not just from the core subset), before any decomposition happens, as well as operators that manipulate autograd state. Pre-dispatch IR closely resembles eager PyTorch computation, but is still functional and serializable by torch.export. As a result:
- You can train the pre-dispatch IR in eager mode as the IR contains necessary information for the autograd engine to automatically generate a backward graph.
- You can write sound graph transformations more easily as the IR is functional.
- Since it is an ATen IR, it is still normalized. For example, torch.add has multiple overloads, but aten.add.Tensor is unique in this IR.
If you want to get the core aten IR out of `torch.export`, you will need to:
```
ep = torch.export.export(M(), inputs)
ep_for_core_aten = ep.run_decompositions()
```
Differential Revision: [D56273267](https://our.internmc.facebook.com/intern/diff/D56273267)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123573
Approved by: https://github.com/gmagogsfm
Previously, we didn't expand the shape of the example_value of map to match the inputs (edit: the first mapped dimension). This PR fixes that bug. To make this easier, we change _call_function_and_unflatten_output to accept example_values directly instead of retrieving them from the variable trackers.
Also remove a redundant call_function node in the strict_mode higher-order op in Dynamo.
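To illustrate the invariant being fixed (an illustrative example, not one of the tests): the output of map keeps the leading mapped dimension of its input, so the example_value traced by Dynamo must do the same.
```python
import torch
from functorch.experimental import control_flow

def body(x, y):
    return x + y

@torch.compile(fullgraph=True, backend="eager")
def fn(xs, y):
    # map applies `body` over the first (mapped) dimension of xs, so the traced
    # example_value / output must keep that leading dimension of size 3.
    return control_flow.map(body, xs, y)

out = fn(torch.randn(3, 4), torch.randn(4))
print(out.shape)  # torch.Size([3, 4])
```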
Test Plan:
existing tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124203
Approved by: https://github.com/ezyang, https://github.com/zou3519
Summary:
This PR restores original names to placeholder nodes, replacing the default names arg0_1, arg1_1, and so on.
User inputs now follow the signature of mod.forward(), for example forward(x, y) produces nodes x, y. If the tensors are nested in dictionaries, lists, tuples, or dataclasses, the names are a concatenation of the path to the tensor, e.g. x = {'a': torch.randn(4), 'b': [torch.randn(4), torch.randn(4)]} produces nodes x_a, x_b_0, x_b_1.
Parameters, buffers, constants, and custom objects follow the FQN of the object, prefixed by "p", "b", "c", and "obj" respectively. For example, self.bar.l0.weight gets you p_bar_l0_weight.
Effect tokens are named token_1, token_2, and so on, since they are not grounded in model inputs or named attributes.
note: breaking the original diff into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Examples:
```python
# params, buffers, constants, inputs, torch.cond
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_l0_weight: "f32[4, 4]", p_l0_bias: "f32[4]", c_alpha: "f32[4]", b_beta: "f32[4]", x_0_a: "f32[4, 4]", y: "f32[4, 4]"):
            # No stacktrace found for following nodes
            mul: "f32[4, 4]" = torch.ops.aten.mul.Tensor(x_0_a, x_0_a)
            t: "f32[4, 4]" = torch.ops.aten.t.default(p_l0_weight); p_l0_weight = None
            addmm: "f32[4, 4]" = torch.ops.aten.addmm.default(p_l0_bias, y, t); p_l0_bias = y = t = None
            return addmm

# model code
class Bar(torch.nn.Module):
    def forward(self, x):
        return x * x

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bar = Bar()
        self.l0 = torch.nn.Linear(4, 4)
        self.alpha = torch.randn(4)
        self.register_buffer('beta', torch.randn(4))
    def forward(self, x, y):
        x = x[0]['a']
        mul = self.bar(x)
        z1 = self.l0(y)
        return z1

# custom objects, dataclasses, tokens, constant inputs
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, token_1: "f32[0]", obj_attr, data_x: "f32[4, 4]", data_y: "f32[4, 4]", mode):
            # No stacktrace found for following nodes
            mul: "f32[4, 4]" = torch.ops.aten.mul.Scalar(data_x, 30); data_x = None
            div: "f32[4, 4]" = torch.ops.aten.div.Tensor_mode(data_y, 1.0, rounding_mode = 'floor'); data_y = None
            add: "f32[4, 4]" = torch.ops.aten.add.Tensor(mul, div); mul = div = None
            with_effects = torch._higher_order_ops.effects.with_effects(token_1, torch.ops._TorchScriptTesting.takes_foo.default, obj_attr, add); token_1 = obj_attr = add = None
            getitem: "f32[0]" = with_effects[0]
            getitem_1: "f32[4, 4]" = with_effects[1]; with_effects = None
            return (getitem, getitem_1)

# model code
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attr = torch.classes._TorchScriptTesting._Foo(10, 20)
    def forward(self, data, a=1.0, mode="floor"):
        x = self.attr.add_tensor(data.x) + torch.div(data.y, a, rounding_mode=mode)
        x = torch.ops._TorchScriptTesting.takes_foo(self.attr, x)
        return x

@dataclass
class DataClass:
    x: Tensor
    y: Tensor

register_dataclass_as_pytree_node(
    DataClass,
    serialized_type_name="test.DataClass"
)

args = (DataClass(x=torch.randn(4, 4), y=torch.randn(4, 4)), )
kwargs = {'mode': 'floor'}
ep = torch.export.export(Foo(), args, kwargs, strict=False)
```
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456418
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122904
Approved by: https://github.com/angelayi, https://github.com/thiagocrepaldi
This is a proof-of-concept implementation of how people can use a marker `mark_strict` to enable TorchDynamo while exporting under non-strict mode. The main idea is that `mark_strict` will turn into an HOO which then utilizes Dynamo to do correctness analysis the same way torch.cond works today. There are some notable limitations:
1. This API is not meant for public use yet
2. Strict region can't work with arbitrary container inputs
3. We don't preserve `nn_module_stack` and other node metadata for the strict region.
4. The strict_mode HOO will show up in the final graph. This is undesirable in the long term, but for short-term experiments it should be good enough. Will fix this in a follow-up PR.
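Since the approach leans on the torch.cond analogy mentioned above, here is a small illustrative sketch of how an existing control-flow HOO behaves under non-strict export today (this shows torch.cond, not `mark_strict`, which is omitted because it is not a public API):
```python
import torch
from functorch.experimental.control_flow import cond

class M(torch.nn.Module):
    def forward(self, x):
        def true_fn(x):
            return x.sin()
        def false_fn(x):
            return x.cos()
        # cond is a higher-order op; its branches become subgraphs in the
        # exported program, and in eager mode Dynamo traces them for analysis.
        return cond(x.sum() > 0, true_fn, false_fn, [x])

ep = torch.export.export(M(), (torch.randn(4),), strict=False)
print(ep.graph)  # contains a higher_order.cond node with two branch submodules
```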
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114658
Approved by: https://github.com/ydwu4