This PR switches export IR from aot-dispatch to pre-dispatch IR.
**What is pre-dispatch IR and why should you care?**
Currently the default IR returned by torch.export can contain only functional ATen operators after ALL pytorch dispatcher decompositions (for example, CompositeImplicitAutograd) run.
In contrast, pre-dispatch IR refers to an IR that can contain all functional ATen operators (i.e., not just from the core subset), before any decomposition happens, as well as operators that manipulate autograd state. Pre-dispatch IR closely resembles eager PyTorch computation, but is still functional and serializable by torch.export. As a result:
You can train the pre-dispatch IR in eager mode as the IR contains necessary information for the autograd engine to automatically generate a backward graph.
You can write sound graph transformations more easily as the IR is functional.
Since it is an ATen IR, it is still normalized. For example, torch.add has multiple overloads, but aten.add.Tensor is unique in this IR.
If you want to get the core aten IR out of torch.export, you will need to:
```
ep = torch.export.export(M(), inputs)
ep_for_core_aten = ep.run_decompositions()
```
Differential Revision: [D57172986](https://our.internmc.facebook.com/intern/diff/D57172986)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125860
Approved by: https://github.com/zhxchen17
Summary: This fix does three things:
1. When we add inputs from partioner to the top level graph module, we insert in the order of partioner which is not guaranteed to be same as original graph inputs. This PR fixes that.
2. When we replace autograd ops with HOP, we create new submodules and access their outputs via getitem calls. As a result, previous node names associated with getitem gets updated, resulting in the graph being different from produced graph signature. So I just update the graph signature accordingly.
3. We run runtime_assertion pass before autograd HOP pass because the constraints won't be populated correctly.
Differential Revision: [D57130314](https://our.internmc.facebook.com/intern/diff/D57130314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125793
Approved by: https://github.com/zhxchen17
Summary:
By default, some inferred dynamic shapes guards/constraints that are not expressible with the current dynamic shapes language will lead to specialization to the concrete input values provided. If disable_forced_specializations is set to True, we will not specialize, and will not perform runtime checks on such produced guards. Instead, we allow the user to specify arbitrary shapes, and fail during runtime if the inputs are invalid. Constraints expressible with the language (e.g. ranges, linear derived dims) will still be enforced, and behavior for all other guards remains the same.
Cases where we typically specialize are reshapes:
```
x: [4, 6] # [s0, s1]
x = x.reshape([x.shape[0] - 1, -1])
# this emits a guard Mod(s0*s1, s0-1) = 0, we specialize on s0=4, s1=6
x: [4, 6], y: [24] # [s0, s1], [s2]
x = x.reshape([-1]) + y
# this emits a guard s0*s1 = s2, we specialize on s0=4, s1=6, s2=24
```
For now only applicable for non-strict mode (need to figure out how to pass this flag into dynamo's call of produce_guards).
Test Plan: Added test case that checks compilation, runtime, and suggested fixes behavior.
Differential Revision: D56361177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124949
Approved by: https://github.com/avikchaudhuri
Fixes [internal error](https://fb.workplace.com/groups/1075192433118967/permalink/1416709435633930/).
The issue is that the asserting nodes added in the `insert_deferred_runtime_assertion` pass do not contain metadata that the ExportedProgram requires the graph to have. One solution to fix this is to retrace the entire module, or another solution is to manually add back this metadata.
This diff implements the latter solution (manually add back the metadata) through hooking into fx.graph's `create_node` function, and adding export-specific metadata for every node that is created. The reason I did this is so that the `insert_deferred_runtime_assertion` does not have to know about what metadata export wants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125414
Approved by: https://github.com/zhxchen17, https://github.com/BoyuanFeng
Summary:
Fixing the implementation of `_flatten_dynamic_shapes()`, to follow how `_process_dynamic_shapes()` does it. The previous implementation would misinterpret some nested dynamic shapes specs, causing it to miss out on some shapes specs, for example with nested inputs/constant input tuples:
```
inputs = (
(2, 1),
(
torch.randn(2, 1),
torch.randn(2, 2),
torch.randn(2, 3),
)
)
dynamic_shapes = (
(None, None),
(
None,
None,
None,
)
)
```
This would get interpreted as 2 shapes specs for 2d and 3d tensors. Fixing so this doesn't happen.
Test Plan: Existing export tests
Differential Revision: D56894923
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125415
Approved by: https://github.com/angelayi
To fix data-dependent errors we want to recommend that people use `torch._check*` APIs. The `constrain_as*` APIs should be fully subsumed by them, and in the future we should kill them entirely.
Differential Revision: D56774333
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125253
Approved by: https://github.com/ezyang
This PR introduces a new way of building `dynamic_shapes` for export. The idea is to build up a mapping from input tensors to the dynamic shapes that should be assigned to their corresponding fake tensors.
This mapping is automatically converted to the current form of `dynamic_shapes`, which must exactly match the structure of inputs. We do this by using pytree utils.
With the current `dynamic_shapes`, we had to be careful about user-defined classes that are registered with pytree, since such classes are not necessarily polymorphic containers; they may be fine containing tensors, but not dynamic shapes. Thus we had decided to allow input instances of such classes to be associated with dynamic shapes in flattened form. This decision needs to be mirrored in this PR as well. To make it easier to keep these code paths in sync, we refactor the current recursive procedure for associating inputs with dynamic shapes to use the same pytree utils. This needs minor fixes to a few tests where `dynamic_shapes` were not exactly matching the structure of inputs.
Differential Revision: D56551992
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124898
Approved by: https://github.com/zhxchen17
Summary:
Fixes https://github.com/pytorch/pytorch/issues/122842
Currently, calling ep.module() on an ExportedProgram leads to a GraphModule with a default forward signature (e.g. arg_0, arg_1, ...). This leads to original placeholder names disappearing for retracing/re-exporting.
Fixing this issue by creating a forward_arg_names field (will take renaming suggestions for this), that stores the positional & keyword arg names that are used. These names aren't present in the call_spec currently stored, and requires a major version bump for the ExportedProgram schema.
Test Plan: Tests exist for export, but names are now changed from generic (e.g. arg_0, arg_1) to follow user inputs (e.g. x, y)
Differential Revision: D56484994
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124765
Approved by: https://github.com/zhxchen17
Summary:
There are multiple things implemented incorrectly in non strict for reparametrizing state dict:
1. The same fake tensor should be generated for duplicated weights.
2. We should snapshot state dict in the beginning to always hold the invariant that ep.state_dict == mod.state_dict()
3. We will overwrite real weights with fake weights if we don't restore the weights in LIFO ordering.
4. We don't turn on strict checking which could sliently fail on corner cases.
This diff aims to solve all these issues at once.
Test Plan: CI
Differential Revision: D56505020
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124847
Approved by: https://github.com/pianpwk
Summary: We use to skip tensor.to() during tracing when the device is the same. This will bring some performance improvement in eager but making graph capture losing the semantics from original model. In this diff, we add an additional condition to skip the fast path when we don't have actual data inside a tensor, which is the case when we're using FakeTensor / FunctionalTensor to trace the model. This won't have perf impact on previous eager models while making sure we can capture the _to_copy() node in the graph.
Test Plan: buck test mode/opt caffe2/test:test_export -- -r device_to
Differential Revision: D55969674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123732
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
This PR switches export IR from aot-dispatch to pre-dispatch IR.
**What is pre-dispatch IR and why should you care?**
Currently the default IR returned by torch.export can contain only functional ATen operators after ALL pytorch dispatcher decompositions (for example, CompositeImplicitAutograd) run.
In contrast, pre-dispatch IR refers to an IR that can contain all functional ATen operators (i.e., not just from the core subset), before any decomposition happens, as well as operators that manipulate autograd state. Pre-dispatch IR closely resembles eager PyTorch computation, but is still functional and serializable by torch.export. As a result:
- You can train the pre-dispatch IR in eager mode as the IR contains necessary information for the autograd engine to automatically generate a backward graph.
- You can write sound graph transformations more easily as the IR is functional.
- Since it is an ATen IR, it is still normalized. For example, torch.add has multiple overloads, but aten.add.Tensor is unique in this IR.
If you want to get the core aten IR out of `torch.export`, you will need to:
```
ep = torch.export.export(M(), inputs)
ep_for_core_aten = ep.run_decompositions()
```
Differential Revision: [D56273267](https://our.internmc.facebook.com/intern/diff/D56273267)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123573
Approved by: https://github.com/gmagogsfm
Summary:
With pre-dispatch export and ep.run_decompositions(), range constraints are updated through looking at ShapeEnv.var_to_range. However the lower bounds on these may be incorrect - analysis on un-specialized symbols are done with lower bounds of 2, which mismatch with user-specified bounds (may be 0, 1).
This updates `_get_updated_range_constraints()` to use the old range constraints if possible.
Test Plan: Existing pre-dispatch/dynamic shapes test case.
Differential Revision: D55899872
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123602
Approved by: https://github.com/tugsbayasgalan
Differential Revision: D56200666
Previously, when we hit the Functionalize kernel for lift_fresh_copy, we directly dispatch self.clone() to proxy dispatch. As a result, we end up receiving a functional tensor at proxy dispatch. As a work around, I unwrap self manually. Not sure, why it works ok in aot-dispatch tho
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124198
Approved by: https://github.com/bdhirsh
Summary:
note: breaking the original diff D55225818 into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...
This PR supports constant argument placeholder (e.g. forward(self, x, y=1)) names and de/serialization, by adding a name field for ConstantArguments in the graph signature, and ConstantInputSpec in the input specs for serialization.
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55506949
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123590
Approved by: https://github.com/angelayi, https://github.com/zhxchen17
This PR ensures that assignment of attributes of primitive type work without needing any code changes in non-strict mode. (In a previous PR we banned attribute assignments of tensor type unless such attributes are registered as buffers.)
While strict mode errors on (all) attribute assignments, non-strict doesn't care, so one might assume that this kind of attribute assignment should already work in non-strict. However, there's a problem: we run through the program once for metadata collection and then run through it again for tracing, so the values observed during tracing (and potentially burned into the graph) do not reflect what should have been observed had the metadata collection pass not run.
So the only thing this PR needs to do is restore values of assigned attributes of primitive type once the metadata collection pass has run. We do this by moving the attribute assignment detecting context manager from the overall `aot_export` call in `_trace.py` to the metadata collection pass in `aot_autograd.py`, and extending it. The rest of the PR moves some utils around.
Differential Revision: D56047952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123898
Approved by: https://github.com/angelayi
Summary:
Currently, torch.export (through AOTAutograd) compiles with a torch.no_grad() wrapper, which affects the presence of `set_grad_enabled` nodes in pre-dispatch export graphs. This changes the wrapper to nullcontext (i.e. enable grad) if `pre_dispatch=True`.
An example that previously failed without `with torch.no_grad()` is below:
```
class Model(torch.nn.Module):
def forward(self, x, y):
with torch.enable_grad():
x = x + y
return x
model = Model()
exported_program = torch.export._trace._export(
model,
(torch.tensor(2), torch.tensor(3)),
dynamic_shapes=None,
pre_dispatch=True,
strict=False
)
```
The pass would inline the add call, but then try to construct a HOO subgraph with no inputs/outputs:
```
def forward(self):
_set_grad_enabled_1 = torch._C._set_grad_enabled(False)
```
Test Plan: Test case checking that nullcontext & no_grad wrappers lead to different export behaviors (regarding set grad subgraphs).
Reviewed By: tugsbayasgalan
Differential Revision: D55777804
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123671
Approved by: https://github.com/tugsbayasgalan
Summary:
note: breaking the original diff [D55225818](https://www.internalfb.com/diff/D55225818) into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...
This PR propagates node names to higher-order-op subgraph placeholders, retaining the top-level names and handling naming collisions by suffixing other non-placeholder nodes in the subgraph with an index. This is the same handling as in fx.Graph/fx.Node, but implemented separately as a pass.
Since the input schemas of HOO subgraphs are very different, they are enumerated in _name_hoo_subgraph_placeholders(). Currently cond, map_impl, and wrap_with_set_grad_enabled are handled, but other ops can be easily added.
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123587
Approved by: https://github.com/angelayi
In non-strict, assignment of attributes in a model causes their state to contain fake tensors post-tracing, which leads to incorrect results on running the exported model. We now error when this happens, asking the user to use buffers instead.
Next, we add support for assignment of buffers. The final values of the buffers turn into outputs of the graph. Since the buffers are already lifted as inputs and populated with the initial values when the model is run, this leads to a simple programming model where the driver of the model can feed the outputs back as inputs for successive runs.
Differential Revision: D55146852
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122337
Approved by: https://github.com/bdhirsh, https://github.com/tugsbayasgalan
This doesn't entirely fix the original problem that prompted this, but
it seems to just be getting stuck in export constraint formatting now
which seems like progress to me.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122827
Approved by: https://github.com/avikchaudhuri
Summary:
This PR restores original names to placeholder nodes, replacing the default names arg0_1, arg1_1, and so on.
User inputs now follow the signature of mod.forward(), for example forward(x, y) produces nodes x, y. If the tensors are nested in dictionaries, lists, tuples, or dataclasses, the names are a concatenation of the path to the tensor, e.g. x = {'a': torch.randn(4), 'b': [torch.randn(4), torch.randn(4)]} produces nodes x_a, x_b_0, x_b_1.
Parameters, buffers, constants, and custom objects follow the FQN of the object, prefixed by "p", "b", "c", and "obj" respectively. For example, self.bar.l0.weight gets you p_bar_l0_weight.
Effect tokens are named token_1, token_2, and so on, since they are not grounded in model inputs or named attributes.
note: breaking the original diff into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Examples:
```python
# params, buffers, constants, inputs, torch.cond
ExportedProgram:
class GraphModule(torch.nn.Module):
def forward(self, p_l0_weight: "f32[4, 4]", p_l0_bias: "f32[4]", c_alpha: "f32[4]", b_beta: "f32[4]", x_0_a: "f32[4, 4]", y: "f32[4, 4]"):
# No stacktrace found for following nodes
mul: "f32[4, 4]" = torch.ops.aten.mul.Tensor(x_0_a, x_0_a)
t: "f32[4, 4]" = torch.ops.aten.t.default(p_l0_weight); p_l0_weight = None
addmm: "f32[4, 4]" = torch.ops.aten.addmm.default(p_l0_bias, y, t); p_l0_bias = y = t = None
return addmm
# model code
class Bar(torch.nn.Module):
def forward(self, x):
return x * x
class Foo(torch.nn.Module):
def __init__(self):
super().__init__()
self.bar = Bar()
self.l0 = torch.nn.Linear(4, 4)
self.alpha = torch.randn(4)
self.register_buffer('beta', torch.randn(4))
def forward(self, x, y):
x = x[0]['a']
mul = self.bar(x)
z1 = self.l0(y)
return z1
# custom objects, dataclasses, tokens, constant inputs
ExportedProgram:
class GraphModule(torch.nn.Module):
def forward(self, token_1: "f32[0]", obj_attr, data_x: "f32[4, 4]", data_y: "f32[4, 4]", mode):
# No stacktrace found for following nodes
mul: "f32[4, 4]" = torch.ops.aten.mul.Scalar(data_x, 30); data_x = None
div: "f32[4, 4]" = torch.ops.aten.div.Tensor_mode(data_y, 1.0, rounding_mode = 'floor'); data_y = None
add: "f32[4, 4]" = torch.ops.aten.add.Tensor(mul, div); mul = div = None
with_effects = torch._higher_order_ops.effects.with_effects(token_1, torch.ops._TorchScriptTesting.takes_foo.default, obj_attr, add); token_1 = obj_attr = add = None
getitem: "f32[0]" = with_effects[0]
getitem_1: "f32[4, 4]" = with_effects[1]; with_effects = None
return (getitem, getitem_1)
# model code
class Foo(torch.nn.Module):
def __init__(self):
super().__init__()
self.attr = torch.classes._TorchScriptTesting._Foo(10, 20)
def forward(self, data, a=1.0, mode="floor"):
x = self.attr.add_tensor(data.x) + torch.div(data.y, a, rounding_mode=mode)
x = torch.ops._TorchScriptTesting.takes_foo(self.attr, x)
return x
dataclass
class DataClass:
x: Tensor
y: Tensor
register_dataclass_as_pytree_node(
DataClass,
serialized_type_name="test.DataClass"
)
args = (DataClass(x=torch.randn(4, 4), y=torch.randn(4, 4)), )
kwargs = {'mode': 'floor'}
ep = torch.export.export(Foo(), args, kwargs, strict=False)
```
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456418
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122904
Approved by: https://github.com/angelayi, https://github.com/thiagocrepaldi
This addresses 2 issues with stack_trace metadata:
- stack_trace is currently missing from nodes in non-strict export
- in strict mode, stack_trace is populated for placeholder nodes, which may not be well-defined (with multiple uses)
We filter the call stack during tracing for calls from forward() methods, or ops in `torch.__init__.py` (e.g. sym_size_int, sym_constrain_range, etc.) to populate stack_trace. A node-level check is also added to _export_non_strict().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121034
Approved by: https://github.com/angelayi
This PR adds a new metadata, `torch_fn` which is meant to replace `source_fn_stack` as `source_fn_stack` is not entirely well defined between strict/nonstrict. Previous discussion [here](https://docs.google.com/document/d/1sPmmsmh6rZFWH03QBOe49MaXrQkP8SxoG8AOMb-pFk4/edit#heading=h.anmx9qknhvm).
`torch_fn` represents the torch function that a particular aten operator came from. For example, `torch.nn.Linear` goes down to the `torch.nn.functional.linear` at the `__torch_function__` layer, and then `aten.t/aten.addmm` in the `__torch_dispatch__` layer. So the nodes `aten.t/aten.addmm` will now have the `torch_fn` metadata containing the `torch.nn.functional.linear`.
The `torch_fn` metadata is a tuple of 2 strings: a unique identifier for each torch function call, and the actual torch function `f"{fn.__class__}.{fn.__name__}"`. The purpose of the first value is to distinguish between 2 consecutive calls to the same function. For example, if we had 2 calls to `torch.nn.Linear`, the nodes and corresponding metadata would look something like:
```
aten.t - ("linear_1", "builtin_function_or_method.linear"),
aten.addmm - ("linear_1", "builtin_function_or_method.linear"),
aten.t - ("linear_2", "builtin_function_or_method.linear"),
aten.addmm - ("linear_2", "builtin_function_or_method.linear"),
```
Higher order ops -- currently we can get the torch_fn metadata for nodes within the HOO's subgraph, but after retracing, this becomes the `(cond, higher_order_op.cond)` :( This is because `fx_traceback.set_current_meta` points to the cond node in the toplevel graph, rather than the original node in the subgraph. I think this is because `fx.Interpreter` does not go into the cond subgraphs. (will discuss with Yidi more ab this)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122693
Approved by: https://github.com/tugsbayasgalan