Summary: Adds non-strict implementation of training IR export. Any expected non-strict training IR failures are also either existing strict training IR or non-strict failures (no new failures added). 4 strict training IR failures also resolved.
Refraining from unifying export/export_for_training, per @ydwu4's feedback :)
Test Plan: added test_export_training_ir_to_run_decomp_non_strict.py for non-strict training IR
Differential Revision: D59349454
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130062
Approved by: https://github.com/ydwu4, https://github.com/zhxchen17
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)
Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0) # 2*s0
w = z.repeat(y.shape[0]) # 2*s0*s1
_w = w.shape[0]
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```
Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```
Test Plan:
contbuild & OSS CI, see 940e4477ab
Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D59543603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
Fixes the example in #118304 for `torch._functorch.aot_autograd.aot_export_module` and `torch.export.export`.
On a high level, the issue is caused by not detecting fake_mode when there's no input.
Change plan:
1) we add a `dynamic_shapes: Union[bool, None] = None` arg to `aot_export_module` and `_aot_export_function`.
2) if the input is not a graph module, then we can only rely on this `dynamic_shapes` input arg.
3) If the input is a graph module, then we can traverse the graph and check.
4) So we check if the input mod is a graph module or just a module, and do 2) or 3) depending on the type.
Fixes#129927
Bug source: dynamo's fake_mode is not detected correctly in `_convert_input_to_fake` in `_traced.py` when there’s no input to the graph). So in ` _strict_export_lower_to_aten_ir`, we create another fake_mode. `dynamo_fake_mode` is not the same as the fake_mode used by dynamo.
Change plan:
check `gm_torch_level` graph's node meta "example_value" for fake mode in addition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129928
Approved by: https://github.com/angelayi
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0) # 2*s0
w = z.repeat(y.shape[0]) # 2*s0*s1
_w = w.shape[0]
# something with _w ...
# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```
Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)
# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0) # 2*s0
w = z.repeat(y.shape[0]) # 2*s0*s1
_w = w.shape[0]
# something with _w ...
# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```
Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)
# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
Recently we decided to split export IR into two different IRs (training vs inference). In the inference IR, one major change we decided to introduce was we wanted to keep the composite ops that user specified in the IR. This PR does that by overriding the CompositeImplicitAutograd decomp in export inference path.
Differential Revision: [D58701607](https://our.internmc.facebook.com/intern/diff/D58701607)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128077
Approved by: https://github.com/bdhirsh
Summary: Today meta['val'] on placeholder nodes doesn't preserve the consistent requires_grad information with the original inputs. Seems there's no easy way to fix this directly at proxy tensor layer. This is useful for reexporting joint graph.
Test Plan: test_preserve_requires_grad_placeholders
Differential Revision: D58555651
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128656
Approved by: https://github.com/tugsbayasgalan
Summary: When we export already traced module, it seems to be modifying some global state causing the traced modules to fail to run. For now, we are only logging for test cases, so it is probs ok to trace fresh copy to be used in export for now.
Test Plan: CI
Differential Revision: D57983518
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127577
Approved by: https://github.com/pianpwk
With the current state of export's dynamic shapes, we struggle with guards and constraints that are beyond the current dynamic shapes language, expressed with dims and derived dims. While we can compile and guarantee correctness for guards within the current language (e.g. min/max ranges, linear relationships, integer divisibility) we struggle to dynamically compile guards which extend beyond that.
For these "complex" guards, we typically do either of the following: 1) raise a constraint violation error, along the lines of "not all values of <symbol> in the specified range satisfy <guard>", with or without suggested fixes, 2) specialize to the provided static values and suggest removing dynamism, or 3) fail compilation due to some arbitrary unsupported case. Previous [work](https://github.com/pytorch/pytorch/pull/124949) went towards resolving this by disabling forced specializations, instead allowing the user to fail at runtime with incorrect inputs.
In this PR, relying on [hybrid backed-unbacked symints](https://github.com/pytorch/pytorch/issues/121749), [deferred runtime asserts](https://github.com/pytorch/pytorch/blob/main/torch/fx/passes/runtime_assert.py), and the function [_is_supported_equivalence()](d7de4c9d80/torch/fx/experimental/symbolic_shapes.py (L1824)), we add a flag `_allow_complex_guards_as_runtime_asserts` which allows the user to compile exported programs containing these guards and maintain dynamism, while adding correctness checks as runtime assertions in the graph.
Hybrid backed-unbacked symints allow us to easily bypass "implicit" guards emitted from computation - guards that we ~expect to be true. Popular examples revolve around reshapes:
```
# reshape
def forward(self, x, y): # x: [s0, s1], y: [s2]
return x.reshape([-1]) + y # guard s0 * s1 = s2
This leads to the following exported program
class GraphModule(torch.nn.Module):
def forward(self, x: "f32[s0, s1]", y: "f32[s2]"):
sym_size_int: "Sym(s2)" = torch.ops.aten.sym_size.int(y, 0)
mul: "Sym(-s2)" = -1 * sym_size_int; sym_size_int = None
sym_size_int_1: "Sym(s0)" = torch.ops.aten.sym_size.int(x, 0)
sym_size_int_2: "Sym(s1)" = torch.ops.aten.sym_size.int(x, 1)
mul_1: "Sym(s0*s1)" = sym_size_int_1 * sym_size_int_2; sym_size_int_1 = sym_size_int_2 = None
add: "Sym(s0*s1 - s2)" = mul + mul_1; mul = mul_1 = None
eq: "Sym(Eq(s0*s1 - s2, 0))" = add == 0; add = None
_assert_scalar = torch.ops.aten._assert_scalar.default(eq, "Runtime assertion failed for expression Eq(s0*s1 - s2, 0) on node 'eq'"); eq = None
view: "f32[s0*s1]" = torch.ops.aten.view.default(x, [-1]); x = None
add_1: "f32[s0*s1]" = torch.ops.aten.add.Tensor(view, y); view = y = None
return (add_1,)
```
Another case is symbol divisibility:
```
def forward(self, x): # x: [s0, s1]
return x.reshape([-1, x.shape[0] - 1]) # Eq(Mod(s0 * s1, s0 - 1), 0)
```
Applying deferred runtime asserts also helps dynamic compilation for "explicit" complex guards that typically cause problems for export. For example we can generate runtime asserts for not-equal guards, and complex conditions like the following:
```
class Foo(torch.nn.Module):
def forward(self, x, y):
# check that negation of first guard also shows up as runtime assertion
if x.shape[0] == y.shape[0]: # False
return x + y
elif x.shape[0] == y.shape[0] ** 3: # False
return x + 2, y + 3
elif x.shape[0] ** 2 == y.shape[0] * 3: # True
return x * 2.0, y * 3.0
```
For the above graph we will generate 3 runtime assertions: the negation of the first 2, and the 3rd condition as a guard.
One additional benefit here over the current state of exported programs is that this adds further correctness guarantees - previously with explicit complex guards, if compilation succeeded, the guards would be ignored at runtime, treated as given.
As shown above, the runtime asserts appear as math ops in the graph, generated by the sympy interpreter, resulting in an _assert_scalar call. There is an option to avoid adding these asserts into the graph, by setting `TORCH_DYNAMO_DO_NOT_EMIT_RUNTIME_ASSERTS=1`. This results in the "original" computation graph, with dynamism, and any incorrect inputs will fail on ops during runtime. Further work could go into prettifying the printer, so the majority of the graph isn't guard-related.
Ideally this PR would subsume and remove the recently added [_disable_forced_specializations](https://github.com/pytorch/pytorch/pull/124949) flag, but that flag still handles one additional case of specialization: single-variable equalities where the symbol is solvable for a concrete value: see this [PR](https://github.com/pytorch/pytorch/pull/126925)
This PR doesn't change any behavior around data-dependent errors/unbacked symints yet, that could be further work.
NOTE: will take naming change suggestions for the flag :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127129
Approved by: https://github.com/avikchaudhuri
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125543
This PR address 2 issues with derived dim suggested fixes, 1) newly introduced roots, and 2) root swapping.
1 | Newly introduced roots appear with modulo guards, e.g. Mod(dx, 2) = 0 suggests dx is a derived dim equal to 2 * _dx, introducing a new root _dx. Currently the final suggested fixes handle this correctly, but we can get intermediate results where related derived dims don't rely on a unified root, and are a mixture of min/max range and derived suggestions.
For example:
```
"dx": {"eq": 3*_dx-1, "max": 36}
"dy": {"eq": dx+1}
This should lead to suggested fixes
_dx = Dim('_dx', max=12)
dx = 3 * _dx - 1
dy = 3 * _dx
```
This PR prettifies the suggested fixes routine by unifying to a single root, and making each intermediate suggestion either a derived dim or min/max range, not both.
2 | The current suggested fixes for derived dims can lead to root dims/derived dims being swapped, e.g. `dy - 1, dy` -> `dx, dx + 1`. This leads to problematic suggested fixes that look like `dy - 1 = Dim("dy - 1")` since we don't have access to the original variable name.
This PR only adds a suggested fix for the root dim, and removes all other derived suggestions.
For example, with the export test case test_derived_dim_out_of_order_simplified:
```
_dimz = torch.export.Dim("_dimz", min=6, max=8)
dimy = _dimz - 1
dimx = dimy - 1
dimz = torch.export.Dim("dimz", min=6, max=8) # doesn't work, should be = _dimz
class Foo(torch.nn.Module):
def forward(self, x, y, z):
return x + y[1:] + z[2:]
foo = Foo()
u, v, w = torch.randn(5), torch.randn(6), torch.randn(7)
export(
foo,
(u, v, w),
dynamic_shapes=({0: dimx}, {0: dimy}, {0: dimz}),
)
```
Before:
```
Suggested fixes:
_dimz = Dim('_dimz', min=3, max=9223372036854775807) # 2 <= _dimz - 1 <= 9223372036854775806
_dimz - 2 = Dim('_dimz - 2', min=4, max=6)
_dimz = Dim('_dimz', min=2, max=9223372036854775806) # 2 <= _dimz <= 9223372036854775806
_dimz - 1 = _dimz - 1
dimz = _dimz
```
New suggested fixes:
```
Suggested fixes:
dimz = _dimz
```
Note: This assumes the specified derived relations between dims are correct. This should be valid because: 1) if the relation is plain wrong (e.g. (dx, dx - 1) provided with inputs (6, 4)), this gets caught in beforehand in produce_guards. 2) if the relation is correct but does not match the emitted guard, for example:
```
def forward(self, x, y):
return x.reshape([-1]) + y # guard: s0 * 2 = s1
dx = Dim("dx")
export(
model,
(torch.randn(6, 2), torch.randn(12)),
dynamic_shapes={"x": (dx, 2), "y": (dx + 6, )}
)
```
This produces two linear equations, leading to specialization since a) produce_guards is able to solve for a concrete value, and b) the export constraint solver will anyways force specializations due to range constraints.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125543
Approved by: https://github.com/avikchaudhuri
Summary: We want to track how well torch.jit.trace can be converted to export in large scale. As a first step, we log all of torch.jit.trace unittests whether we can convert the traced module to export module OR we can export the model directly
Test Plan: CI
Differential Revision: D57629682
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126900
Approved by: https://github.com/SherlockNoMad
## Goal
As title
## Design
Based on the fact that each TorchScript module has a `code` property which provides the original source code for the `forward` function, I implemented a function to extrapolate `forward` function signature by using the AST parser.
Some other tradeoff
* Directly parsing src code as string --> will be very buggy
* Directly using `compile` function in Python to get the function object --> raises a lot of exceptions because of missing packages or undefined variable names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126787
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
This PR requires a little justification, but let's start with what it does first:
1. When you have a 0d CPU scalar int64/float64 tensor input to a graph, we will preallocate a backed SymInt/SymFloat corresponding to what you would get if you call item() on this tensor. This means you can freely change your input to be a Python int/float or a Tensor with an item() call and end up with exactly the same level of expressivity (specifically, you can guard on the internal SymInt/SymFloat no matter what). By default, the source of the backed SymInt/SymFloat is `L['tensor'].item()`, but if you have promoted a float input into a Tensor, we will cancel out `torch.as_tensor(L['float']).item()` into just `L['float']`.
2. We switch wrap_symfloat to use this, instead of hand crafting the new SymNodeVariable. Everything works out, except that we carefully pass the item() result to tracked fakes (and not the fake Tensor argument)
OK, so why do this at all? There is some marginal benefit where now some item() calls on scalar inputs can be guarded on, but IMO this is a pretty marginal benefit, and if it was the only reason, I wouldn't do this. The real reason for this is that I need to be able to propagate fake tensors through the graphs that are produced by Dynamo, and if I am doing the old custom wrap_symfloat logic, there's no way I can do this, because ordinarily an item() call will cause an unbacked SymInt when I reallocate.
The other obvious way to solve the problem above is to make a HOP alternative that item() that "bakes in" the backed SymInt its supposed to return. But this strategy seems more parsimonious, and it does have the marginal benefit I mentioned above. The main downside is that what I have to do next, is make it so that when I run tensor computation, I also apply the equivalent operations to the SymInt/SymFloat as well. That's next PR.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126245
Approved by: https://github.com/eellison
ghstack dependencies: #126637
Some operations have a scalar input parameter, like `torch.add(a, b, alpha=2.0)`. Currently, the aot compile does not support such a case because it requires the signature of the captured graph to align with the operation's signature. This means that some inputs in the captured graph may be scalar(float, int, bool, etc.). It breaks the assumption of `compile_fx_aot` as it assumes all the example inputs are tensor - 0f6ce45bcb/torch/_inductor/compile_fx.py (L1048)
This PR intends to support such cases by allowing not-aligned signature and filtering out the non-Tensor parameters.
Captured graph for `torch.add(a, b, alpha=2.0)`
```
opcode name target args kwargs
------------- -------- --------------- ---------------- --------------
placeholder arg0_1 arg0_1 () {}
placeholder arg1_1 arg1_1 () {}
call_function add aten.add.Tensor (arg0_1, arg1_1) {'alpha': 2.0}
output output_1 output ((add,),) {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124177
Approved by: https://github.com/jansel, https://github.com/desertfire, https://github.com/jgong5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125758
Aliased and unused params are currently an issue for strict-mode export. For a model like this:
```
def __init__(self):
# ...
self.alpha = nn.Parameter(torch.randn(4))
self.beta = self.alpha
self.gamma = self.alpha
def forward(self, x):
return x + self.beta
```
Dynamo will trace only 1 parameter (beta) and assign a dynamo name (e.g. `L__self___beta`) which can be difficult to match to the correct FQN in the original eager module. This leads to export graph signature potentially having the incorrect target FQN for the parameter, leading to downstream issues unflattening (the parameter may be assigned to the wrong target attribute, mismatching the relevant placeholder node in the unflattened module).
This handles aliasing issues by assigning all tensors present in the state dict as module attributes, even if they're unused. Still, only the used tensors will appear in the graph's forward pass.
Another issue that exists is weight-sharing is not maintained in unflattening (all params/buffers are re-cloned) - handle this by checking tensor ids too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125758
Approved by: https://github.com/zhxchen17
Summary: This fix does three things:
1. When we add inputs from partioner to the top level graph module, we insert in the order of partioner which is not guaranteed to be same as original graph inputs. This PR fixes that.
2. When we replace autograd ops with HOP, we create new submodules and access their outputs via getitem calls. As a result, previous node names associated with getitem gets updated, resulting in the graph being different from produced graph signature. So I just update the graph signature accordingly.
3. We run runtime_assertion pass before autograd HOP pass because the constraints won't be populated correctly.
Differential Revision: [D57130314](https://our.internmc.facebook.com/intern/diff/D57130314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125793
Approved by: https://github.com/zhxchen17
Summary:
By default, some inferred dynamic shapes guards/constraints that are not expressible with the current dynamic shapes language will lead to specialization to the concrete input values provided. If disable_forced_specializations is set to True, we will not specialize, and will not perform runtime checks on such produced guards. Instead, we allow the user to specify arbitrary shapes, and fail during runtime if the inputs are invalid. Constraints expressible with the language (e.g. ranges, linear derived dims) will still be enforced, and behavior for all other guards remains the same.
Cases where we typically specialize are reshapes:
```
x: [4, 6] # [s0, s1]
x = x.reshape([x.shape[0] - 1, -1])
# this emits a guard Mod(s0*s1, s0-1) = 0, we specialize on s0=4, s1=6
x: [4, 6], y: [24] # [s0, s1], [s2]
x = x.reshape([-1]) + y
# this emits a guard s0*s1 = s2, we specialize on s0=4, s1=6, s2=24
```
For now only applicable for non-strict mode (need to figure out how to pass this flag into dynamo's call of produce_guards).
Test Plan: Added test case that checks compilation, runtime, and suggested fixes behavior.
Differential Revision: D56361177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124949
Approved by: https://github.com/avikchaudhuri
Fixes [internal error](https://fb.workplace.com/groups/1075192433118967/permalink/1416709435633930/).
The issue is that the asserting nodes added in the `insert_deferred_runtime_assertion` pass do not contain metadata that the ExportedProgram requires the graph to have. One solution to fix this is to retrace the entire module, or another solution is to manually add back this metadata.
This diff implements the latter solution (manually add back the metadata) through hooking into fx.graph's `create_node` function, and adding export-specific metadata for every node that is created. The reason I did this is so that the `insert_deferred_runtime_assertion` does not have to know about what metadata export wants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125414
Approved by: https://github.com/zhxchen17, https://github.com/BoyuanFeng
A re-land of #124239.
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.
The basic idea is to only fakify the script object during tracing (i.e. aot_export). After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.
Concretely, we
fakify all the script object inputs, and module attributes (gathered by constant_attrs).
patch the module's attributes with fakified script object
right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125490
Approved by: https://github.com/angelayi
Summary:
Fixing the implementation of `_flatten_dynamic_shapes()`, to follow how `_process_dynamic_shapes()` does it. The previous implementation would misinterpret some nested dynamic shapes specs, causing it to miss out on some shapes specs, for example with nested inputs/constant input tuples:
```
inputs = (
(2, 1),
(
torch.randn(2, 1),
torch.randn(2, 2),
torch.randn(2, 3),
)
)
dynamic_shapes = (
(None, None),
(
None,
None,
None,
)
)
```
This would get interpreted as 2 shapes specs for 2d and 3d tensors. Fixing so this doesn't happen.
Test Plan: Existing export tests
Differential Revision: D56894923
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125415
Approved by: https://github.com/angelayi
To fix data-dependent errors we want to recommend that people use `torch._check*` APIs. The `constrain_as*` APIs should be fully subsumed by them, and in the future we should kill them entirely.
Differential Revision: D56774333
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125253
Approved by: https://github.com/ezyang
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.
The basic idea is to `only fakify the script object during tracing (i.e. aot_export)`. After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.
Concretely, we
1. fakify all the script object inputs, and module attributes (gathered by constant_attrs).
2. patch the module's attributes with fakified script object
3. right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124239
Approved by: https://github.com/zou3519
This PR introduces a new way of building `dynamic_shapes` for export. The idea is to build up a mapping from input tensors to the dynamic shapes that should be assigned to their corresponding fake tensors.
This mapping is automatically converted to the current form of `dynamic_shapes`, which must exactly match the structure of inputs. We do this by using pytree utils.
With the current `dynamic_shapes`, we had to be careful about user-defined classes that are registered with pytree, since such classes are not necessarily polymorphic containers; they may be fine containing tensors, but not dynamic shapes. Thus we had decided to allow input instances of such classes to be associated with dynamic shapes in flattened form. This decision needs to be mirrored in this PR as well. To make it easier to keep these code paths in sync, we refactor the current recursive procedure for associating inputs with dynamic shapes to use the same pytree utils. This needs minor fixes to a few tests where `dynamic_shapes` were not exactly matching the structure of inputs.
Differential Revision: D56551992
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124898
Approved by: https://github.com/zhxchen17
Summary:
Fixes https://github.com/pytorch/pytorch/issues/122842
Currently, calling ep.module() on an ExportedProgram leads to a GraphModule with a default forward signature (e.g. arg_0, arg_1, ...). This leads to original placeholder names disappearing for retracing/re-exporting.
Fixing this issue by creating a forward_arg_names field (will take renaming suggestions for this), that stores the positional & keyword arg names that are used. These names aren't present in the call_spec currently stored, and requires a major version bump for the ExportedProgram schema.
Test Plan: Tests exist for export, but names are now changed from generic (e.g. arg_0, arg_1) to follow user inputs (e.g. x, y)
Differential Revision: D56484994
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124765
Approved by: https://github.com/zhxchen17
Summary: When I was debugging an issue, this silent error makes the debugging harder. It is better to error earlier with more descriptive error message.
Test Plan: None
Differential Revision: D56312433
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124411
Approved by: https://github.com/zhxchen17
Summary:
There are multiple things implemented incorrectly in non strict for reparametrizing state dict:
1. The same fake tensor should be generated for duplicated weights.
2. We should snapshot state dict in the beginning to always hold the invariant that ep.state_dict == mod.state_dict()
3. We will overwrite real weights with fake weights if we don't restore the weights in LIFO ordering.
4. We don't turn on strict checking which could sliently fail on corner cases.
This diff aims to solve all these issues at once.
Test Plan: CI
Differential Revision: D56505020
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124847
Approved by: https://github.com/pianpwk
The process for populating range_constraints follows separate methods for non-strict (`make_constraints`), and strict (`_process_constraints`). The strict method is somewhat more convoluted, and the analysis that Dynamo performs for strict is already present as part of the non-strict process in make_constraints (produce_guards(), running the export constraint solver).
This PR kills _process_constraints() and replaces calls with make_constraints, without duplicating the work that Dynamo already does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123985
Approved by: https://github.com/avikchaudhuri
Summary:
The current _AddRuntimeAssertionsForInlineConstraintsPass has 2 known issues caused by its use of torch.fx.Interpreter:
1. SymInt-related ops (e.g. item()) are executed, causing new Unbacked SymInts to appear in the graph during the pass.
2. The graph is reconstructed, and node names/indices can be different from before, causing mismatches with `module_call_graph`, and leading to issues during unflattening.
This refactors the pass to use PassBase instead of _ExportPassBaseDeprecatedDoNotUse, only constructing new nodes for assertions.
Test Plan: This pass is called on all strict-mode export calls with range_constraints, test that behavior remains unchanged.
Differential Revision: D56360137
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124503
Approved by: https://github.com/zhxchen17