Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.
Having these `# type: ignore` linger around is not ideal for two reasons:
- They are verbose/ugly syntatically.
- They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore.
I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward.
This PR should have no effect on runtime at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
For better tracking, we need to make maybe aliasing/mutating ops with proper tag. We need to special case native_batch_norm because it is not a CIA but has a wrong schema. I guess native_batch_norm will be removed at some point, so until then we just keep it around.
D60347117
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131990
Approved by: https://github.com/bdhirsh
Summary:
When we have both `set_grad` and `autocast` HOP, name collision might happen when we try to inline a node.
For exmaple, for a GraphModule like this:
```
GraphModule(
(submod_0): GraphModule(
(submod_1): GraphModule()
)
(submod_1): GraphModule()
(submod_2): GraphModule()
)
```
when we inline `submod_0`, we might accidentally overwrite `submod_1`.
In this PR, we fix this by check if the graph module already has an attribute with the same name, if so, we use the next "submod_{i}", until no name collision.
Partially fixes https://github.com/pytorch/pytorch/issues/140589.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_predispatch_autocast_and_set_grad
```
Differential Revision: D66200994
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141169
Approved by: https://github.com/angelayi
In this PR, we implement lazy dictionary for export decomp behaviour for following reasons:
1. Custom op loading can happen after import time, as a result, the decomp table might not be able to pick up the decomp. Therefore we try to delay materialization as late as possible.
I intentionally seperated out the core_aten_decomp to not have any custom CIA ops in this PR to mitigate the risk of getting reverted but in the future, core_aten_decomp under torch/_decomp will exist as an alias to official export table (torch.export.default_decompositions)
Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650
Approved by: https://github.com/justinchuby, https://github.com/bdhirsh
Resolve#137540
Summary:
We might get different state_dict and named_parameters result when the module has registered custom state_dict_hooks.
For exported_program's state_dict, we want the state_dict to reflect the actual module hierarchy at runtime, and it might be different from the model's state_dict() output if the model has state_dict hooks.
To do weight swapping, one needs to either re-export or turn-off the hooks when saving model's state_dict().
Previously, ExportedProgram uses nn.Module's state_dict() method to populate its own state_dict, but it doesn't work for some models (e.g. llama3_3_vision) because ExportedProgram's state_dict and an nn.Module's state_dict have some subtle differences semantically.
nn.Module's state_dict is about how the state should be serialized, and it reflects the structure of the original user model code. In contrast, export specializes on a “run” of a model, and its state_dict needs to reflect the runtime module hierarchy.
One example where these two are different is TorchTune's Llama3_2_vision text decoder. Here, a FusionLayer is added as a local optimization and it is not part of the "static model definition". In runtime, we have mod.layers[3].layer.sa_norm.scale.
But in nn.Module's state_dict, the authors of the model added a state_dict hook to remove the "layer" in mod.state_dict() to reflect the static model definition, so we have mod.state_dict()["layers.3.sa_norm.scale"].
In this Diff, we change ExportedProgram to populate its state_dict using named_parameters() and named_buffers() instead. So in ExportedProgram's state_dict, we have "layers.3.layer.sa_norm.scale", which reflects the runtime module hierarchy.
Now one problem this presents is weight swapping. Since ExportedProgram's state and the model's state is not the same anymore, weight swapping procedure also needs to change slightly.
In internal Ads and RecSys models deployment, weight swapping is where they have one model that is currently being being deployed and serving traffic, and they want to swap out the weights with newly trained model weights without having to redo the whole exporting/lowering process and create a new artifact. So they would move the deployed model’s pointer to the state dict over to the new state dict. Because of this, it’s previously a requirement that the FQNs are matching between the exported and the eager model’s state dict.
The new ExportedProgram's state dict still supports weight swapping, but the state_dict to be swapped needs to be obtained from torch.export.exported_program instead of model.state_dict() if the model has state_dict hooks.
The new requirement is that the FQNs are matching between the exported’s state dict and the state_dict obtained from `_disabled_load_state_dict_hooks(M)` context manager. One benefit of having this new API is that we are now in full control within export of gathering and updating the model state.
If a model doesn't have any state_dict hooks, one can still use model.state_dict() for weight swapping, so it's BC.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_export_for_training_with_state_dict_hooks
```
Differential Revision: D64080561
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137609
Approved by: https://github.com/angelayi, https://github.com/pianpwk
Summary:
Fixes https://github.com/pytorch/pytorch/issues/134778
The previous D62304294 broke some executorch tests. It has already been reverted.
In this diff, `_collect_param_buffer_metadata()` is modified in a way that when a `call_function` node is encountered and its input nodes include `get_attr`. We skip the fields that have been collected previously and only collect rest of the fields. This prevents over-writing.
Test Plan:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//executorch/backends/xnnpack/test:test_xnnpack_ops
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_re_export_preserve_handle
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_run_decompositions_preserve_handle
```
Differential Revision: D62514208
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135720
Approved by: https://github.com/zhxchen17, https://github.com/jerryzh168
Starter version of automatic dynamic shapes for export.
Creates enums `DIM.AUTO`, `DIM.STATIC`, allowing user to specify `AUTO` for dims in dynamic_shapes specs, meaning that corresponding dims are treated as dynamic, and relevant guards will do what's necessary (e.g. refine ValueRanges, set replacements based on equality, or even set static) without raising ConstraintViolationErrors. Basically allows the user to say, "a bunch of these dims can be dynamic, let export do model analysis and return the program with maximum possible dynamism, without complaining".
The usage for specifying `dynamic_shapes` is now:
```
AUTO -> dynamic by default, return whatever produce_guards() says, even if it's static
None/int/STATIC -> static
Dim/DerivedDim -> same as before - will complain if the min/max range is invalid, or if dims related to this are unspecified.
```
Caveat 1: specifying `AUTO` for a dim won't guarantee it'll be dynamic:
- specifying `AUTO` for a dim will return the maximum possible dynamism given your program and other specified constraints, but this can still mean you'll get a static program. For example, with the program below, x is specified dynamic, but it's equal to y, which is specified static, and with how we currently do things we won't promote y to dynamic, but will demote(?) x to static. So this can be surprising if you don't fully know your model, and/or missed one of your other inputs when specifying auto-dynamic shapes.
```
class Foo(torch.nn.Module):
def forward(self, x, y):
return x + y
inputs = (torch.randn(6), torch.randn(6))
export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": None})
```
Caveat 2: specifying `AUTO` and Dims in the same spec is still problematic:
- The way Dims/DerivedDims are currently handled is very strict. A Dim represents a symbol, and we require a user to specify the symbol for all dims governed by the symbol - that's why we've seen errors in the past like `The values of x must always be related to y by ...`, asking the user to specify the exact relation as in the program. We also require the specified min/max range to be a subset of the valid range from model analysis. All this doesn't compose well with specifying `AUTO` just yet - for example in the program below, ideal behavior could be to return a dynamic program, where `dx = x.size(0) = y.size(0)` has range (3,6). Unfortunately this crashes, and correct behavior is to specify `dx` for both inputs. So currently we raise a UserError and crash if both Dims + `AUTO` are present in the spec.
```
class Foo(torch.nn.Module):
def forward(self, x, y):
return x + y
inputs = (torch.randn(6), torch.randn(6))
export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": {0: Dim("dx", min=3, max=6)}}) # this doesn't work, because x & y and related
```
Implementation details:
This is done by setting `assume_static_by_default=False`, and doing a transform on the `dynamic_shapes` spec to preserve semantics. `assume_static_by_default=False` will treat unspecified dims or Nones as dynamic. This is the opposite of what `export.export()` currently does - unspecified Dims/Nones are treated as static. Historically this static-by-default behavior, where the user deals with fewer guards, has been desirable, and we would like to respect that in this implementation. So this internal spec transformation is added, `_transform_shapes_for_default_dynamic()`, does the spec conversion necessary to be compatbile with dynamic by default. Specifically, AUTOs are converted into Nones, and Nones/unspecified dims are filled in with explicitly static constraints.
For example, this would look like, for a 3-d tensor: `{0: DIM.AUTO, 1: None, 2: Dim("dx")} -> {0: None, 1: 32, 2: Dim("dx")}`
This does seem overly complicated, but it's done to preserve dynamic shapes semantics for `torch._dynamo.export()`, which already uses `assume_static_by_default=False`, and follows the same process for generating shape constraints , via `_process_dynamic_shapes`. There the semantics are:
```
None/unspecified: dynamic by default
Dim/DerivedDim: also a strict assertion
```
If we don't care about BC for `_dynamo.export(dynamic_shapes)`, then we can just modify semantics for `_process_dynamic_shapes()` and change all the relevant tests in `test/dynamo/test_export.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133620
Approved by: https://github.com/avikchaudhuri
The process of inlining HOO subgraphs (e.g. set_grad_enabled) seems to break node.users when a node is present in multiple subgraphs, for example:
```
class SetGradCase(torch.nn.Module):
def forward(self, x):
_x = x.shape[0] + 2
_xx = _x + 2
with torch.no_grad():
y = _x * 4
return _xx, y
```
The `_x` node contains 2 users (_xx and y) after being inlined, but with inspection it seems to only contain y as a user.
Previously we were completely clearing node.users for output nodes in HOO subgraphs before inlining them - we should just be deleting the subgraph output nodes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133144
Approved by: https://github.com/larryliu0820, https://github.com/ydwu4
Summary:
A re-land of D60006710.
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.
edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.
The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design
edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module().
Differential Revision: D60532628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307
Approved by: https://github.com/zhxchen17
Summary:
- make default DCE pass check schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added).
- mark Proxy dump as NotImplemented for better error msg
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
Summary:
Reland of D60206382.
Suggested in https://github.com/pytorch/pytorch/issues/128394.
If there's an autocast context manager, the predispatch (strict) graph can look something like:
```
class <lambda>(torch.nn.Module):
def forward(self, x: "f32[1]"):
...
_enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None
_exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None
return (mm_1,)
```
But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.
Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.
Test Plan:
CI
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_autocast"
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_set_grad"
```
Verified that now we can export the llama model in gh issue 128394 and the gemma model in gh issue 131829 without error.
Differential Revision: D60770038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132677
Approved by: https://github.com/angelayi
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.
**edit:** also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.
The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990
Approved by: https://github.com/pianpwk
Sets `prefer_deferred_runtime_asserts_over_guards=True` for export, so any guards emitted from `SymNode.expect_true` (for example, guards that are implicitly required to be true for an op to succeed) won't lead to constraint violations. Instead these should appear in the graph as runtime asserts, or potentially as replacement expressions for placeholder shapes.
For example, this reshape op should emit s0 * s1 = s2, deferred as a runtime assert.
```
x = torch.randn(4, 8) # [s0, s1]
y = torch.randn(32) # [s2]
out = x.reshape(-1) + y
# this emits Eq(s0 * s1, s2), and we represent y's shape as [s0*s1] in the graph.
```
However, other complex guards can still cause export to fail, for instance guards emitted from `SymNode.guard_bool/guard_size_oblivious` (e.g. explicit if-else conditions in user code or lower-level op implementations hit during tracing) can still raise constraint violations. These can be deferred with `allow_complex_guards_as_runtime_asserts=True`. We don't yet make this default, because while this makes export more likely to succeed, it results in non-trivial asserts being emitted that often represent specialization to a variant of the op, or checks related to 0/1 specialization.
We also remove forced specializations for export and kill the `_disable_forced_specializations` flag - now any guard we can't express with Dims/DerivedDims either are handled with Hybrid SymInts, or should be resolved with rewriting or deferring.
Follow up:
Currently, `ShapeEnv._set_replacement()` is called for complex equality expressions (e.g. s2 -> s0*s1 in the example above), and the ExportedProgram stores `s0*s1` in the input placeholder. This isn't checked for validity when the program is run, so an option is to avoid replacement and/or runtime assert on equality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130775
Approved by: https://github.com/avikchaudhuri
Summary: Previously, when we inline the subgraphs that doesn't have a different require_grad environment, we didn't clean up the nodes's users in subgraph and direcly used them to to replace the output of the call_modules. This records dead depencies in node.users. This PR fixes this.
Test Plan:
Added a new test.
Also see the torchrec tests:
Step 1:
buck run mode/dev-nosan //aimp/experimental/pt2:pt2_export -- --model-entity-id 934687114 --output /tmp/934687114.zip --use-torchrec-eager-mp --use-manifold
Step 2:
buck run mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true aimp/cli:cli -- --platform=aps --template=disagg_gpu_aps_pt2 --pt2 --model-entity-id=934687114 non-request-only-tagging torchrec-shard-and-quantize gpu-disagg-split assign-device materialize-weights script-and-save
Differential Revision: D59132214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129716
Approved by: https://github.com/angelayi
## Goal
As title
## Design
Based on the fact that each TorchScript module has a `code` property which provides the original source code for the `forward` function, I implemented a function to extrapolate `forward` function signature by using the AST parser.
Some other tradeoff
* Directly parsing src code as string --> will be very buggy
* Directly using `compile` function in Python to get the function object --> raises a lot of exceptions because of missing packages or undefined variable names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126787
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
Summary:
By default, some inferred dynamic shapes guards/constraints that are not expressible with the current dynamic shapes language will lead to specialization to the concrete input values provided. If disable_forced_specializations is set to True, we will not specialize, and will not perform runtime checks on such produced guards. Instead, we allow the user to specify arbitrary shapes, and fail during runtime if the inputs are invalid. Constraints expressible with the language (e.g. ranges, linear derived dims) will still be enforced, and behavior for all other guards remains the same.
Cases where we typically specialize are reshapes:
```
x: [4, 6] # [s0, s1]
x = x.reshape([x.shape[0] - 1, -1])
# this emits a guard Mod(s0*s1, s0-1) = 0, we specialize on s0=4, s1=6
x: [4, 6], y: [24] # [s0, s1], [s2]
x = x.reshape([-1]) + y
# this emits a guard s0*s1 = s2, we specialize on s0=4, s1=6, s2=24
```
For now only applicable for non-strict mode (need to figure out how to pass this flag into dynamo's call of produce_guards).
Test Plan: Added test case that checks compilation, runtime, and suggested fixes behavior.
Differential Revision: D56361177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124949
Approved by: https://github.com/avikchaudhuri
Summary:
note: breaking the original diff D55225818 into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...
This PR supports constant argument placeholder (e.g. forward(self, x, y=1)) names and de/serialization, by adding a name field for ConstantArguments in the graph signature, and ConstantInputSpec in the input specs for serialization.
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55506949
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123590
Approved by: https://github.com/angelayi, https://github.com/zhxchen17
Summary:
note: breaking the original diff [D55225818](https://www.internalfb.com/diff/D55225818) into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...
This PR propagates node names to higher-order-op subgraph placeholders, retaining the top-level names and handling naming collisions by suffixing other non-placeholder nodes in the subgraph with an index. This is the same handling as in fx.Graph/fx.Node, but implemented separately as a pass.
Since the input schemas of HOO subgraphs are very different, they are enumerated in _name_hoo_subgraph_placeholders(). Currently cond, map_impl, and wrap_with_set_grad_enabled are handled, but other ops can be easily added.
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123587
Approved by: https://github.com/angelayi
Summary:
This PR restores original names to placeholder nodes, replacing the default names arg0_1, arg1_1, and so on.
User inputs now follow the signature of mod.forward(), for example forward(x, y) produces nodes x, y. If the tensors are nested in dictionaries, lists, tuples, or dataclasses, the names are a concatenation of the path to the tensor, e.g. x = {'a': torch.randn(4), 'b': [torch.randn(4), torch.randn(4)]} produces nodes x_a, x_b_0, x_b_1.
Parameters, buffers, constants, and custom objects follow the FQN of the object, prefixed by "p", "b", "c", and "obj" respectively. For example, self.bar.l0.weight gets you p_bar_l0_weight.
Effect tokens are named token_1, token_2, and so on, since they are not grounded in model inputs or named attributes.
note: breaking the original diff into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Examples:
```python
# params, buffers, constants, inputs, torch.cond
ExportedProgram:
class GraphModule(torch.nn.Module):
def forward(self, p_l0_weight: "f32[4, 4]", p_l0_bias: "f32[4]", c_alpha: "f32[4]", b_beta: "f32[4]", x_0_a: "f32[4, 4]", y: "f32[4, 4]"):
# No stacktrace found for following nodes
mul: "f32[4, 4]" = torch.ops.aten.mul.Tensor(x_0_a, x_0_a)
t: "f32[4, 4]" = torch.ops.aten.t.default(p_l0_weight); p_l0_weight = None
addmm: "f32[4, 4]" = torch.ops.aten.addmm.default(p_l0_bias, y, t); p_l0_bias = y = t = None
return addmm
# model code
class Bar(torch.nn.Module):
def forward(self, x):
return x * x
class Foo(torch.nn.Module):
def __init__(self):
super().__init__()
self.bar = Bar()
self.l0 = torch.nn.Linear(4, 4)
self.alpha = torch.randn(4)
self.register_buffer('beta', torch.randn(4))
def forward(self, x, y):
x = x[0]['a']
mul = self.bar(x)
z1 = self.l0(y)
return z1
# custom objects, dataclasses, tokens, constant inputs
ExportedProgram:
class GraphModule(torch.nn.Module):
def forward(self, token_1: "f32[0]", obj_attr, data_x: "f32[4, 4]", data_y: "f32[4, 4]", mode):
# No stacktrace found for following nodes
mul: "f32[4, 4]" = torch.ops.aten.mul.Scalar(data_x, 30); data_x = None
div: "f32[4, 4]" = torch.ops.aten.div.Tensor_mode(data_y, 1.0, rounding_mode = 'floor'); data_y = None
add: "f32[4, 4]" = torch.ops.aten.add.Tensor(mul, div); mul = div = None
with_effects = torch._higher_order_ops.effects.with_effects(token_1, torch.ops._TorchScriptTesting.takes_foo.default, obj_attr, add); token_1 = obj_attr = add = None
getitem: "f32[0]" = with_effects[0]
getitem_1: "f32[4, 4]" = with_effects[1]; with_effects = None
return (getitem, getitem_1)
# model code
class Foo(torch.nn.Module):
def __init__(self):
super().__init__()
self.attr = torch.classes._TorchScriptTesting._Foo(10, 20)
def forward(self, data, a=1.0, mode="floor"):
x = self.attr.add_tensor(data.x) + torch.div(data.y, a, rounding_mode=mode)
x = torch.ops._TorchScriptTesting.takes_foo(self.attr, x)
return x
dataclass
class DataClass:
x: Tensor
y: Tensor
register_dataclass_as_pytree_node(
DataClass,
serialized_type_name="test.DataClass"
)
args = (DataClass(x=torch.randn(4, 4), y=torch.randn(4, 4)), )
kwargs = {'mode': 'floor'}
ep = torch.export.export(Foo(), args, kwargs, strict=False)
```
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456418
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122904
Approved by: https://github.com/angelayi, https://github.com/thiagocrepaldi
Summary: When fake tensors are passed to a graph module and we do runtime assertions on them, we can accidentally trigger specialization guards. It's better to just relax the checking for these.
Test Plan: confirmed that problem in T181400371 is now fixed
Differential Revision: D54658960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121460
Approved by: https://github.com/angelayi
With the current `Dim`-based dynamic shapes API for export, one can express that shapes of different input shapes must be equal by reusing the same `Dim`. However, non-trivial relationships between such input shapes cannot be expressed.
Recently we are seeing more and more examples of code that require this additional expressibility, e.g., where a pair of shapes might differ by one, or a shape might be double another (or simply even).
This PR introduces the concept of a "derived" `Dim`, i.e., a linear arithmetic expression over a `Dim`. By using a combination of `Dim`s and derived `Dim`s to specify input shapes, the desired relationships can be expressed naturally. E.g., a pair of shapes might be `dim` and `dim + 1`, or `dim` and `2*dim`, or even `2*dim` and `dim + 1`.
We extend the current infrastructure that translates `Dim`s to deprecated `dynamic_dim`-based constraints to work with derived `Dim`s. As usual, we raise constraint violation errors when shape guards cannot be verified given a dynamic shapes spec; suggest fixes; and raise runtime errors when future inputs violate the spec.
Importantly, some guards that used to cause forced specializations in the constraint solver because they were deemed "too complex" now do not do so, because they can now be specified as constraints. Since this was what motivated the introduction of a `disable_constraint_solver` flag to some internal APIs, we may not need that flag any more.
Note that shapes of placeholders in exported programs can now contain symbolic expressions and not just symbols.
Differential Revision: D53254587
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118729
Approved by: https://github.com/ezyang
As titled. Before the PR, after we split then inline_, there will be getitem calls in the graph while the original graph module doesn't have them. This PR removes the additional get_item calls by inlining.
Test Plan:
Added new test cases for graphs that return multiple outputs and takes multiple inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119913
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #119732, #119736, #119810
This pr is the 1/N pr of transforming the global state mutating ops such as torch._C.set_grad_enabled calls in pre-dispatch graph into a higher order op so that the graph becomes more functional. We make use of split_module to help us do the transformation.
This pr preserves the node.name in original module by adding a new kwarg `keep_original_node_name` to split_module.
For a graph looks like this:
```python
def forward(self, arg_0):
arg0_1, = fx_pytree.tree_flatten_spec(([arg_0], {}), self._in_spec)
add = torch.ops.aten.add.Tensor(arg0_1, 1); arg0_1 = None
sin = torch.ops.aten.sin.default(add); add = None
sum_1 = torch.ops.aten.sum.default(sin); sin = None
_set_grad_enabled = torch._C._set_grad_enabled(False)
add_1 = torch.ops.aten.add.Tensor(sum_1, 1); sum_1 = None
_set_grad_enabled_1 = torch._C._set_grad_enabled(True)
sub = torch.ops.aten.sub.Tensor(add_1, 1)
return pytree.tree_unflatten((add_1, sub), self._out_spec)
```
Before the change, split graph returns the following graphs and subgraphs (notice the change from `add` -> `add_tensor`, `sin` -> `sin_default`:
```python
def forward(self, arg_0):
arg0_1, = fx_pytree.tree_flatten_spec(([arg_0], {}), self._in_spec)
submod_0 = self.submod_0(arg0_1); arg0_1 = None
submod_1 = self.submod_1(submod_0); submod_0 = None
submod_2 = self.submod_2(submod_1)
return pytree.tree_unflatten((submod_1, submod_2), self._out_spec)
# submod_0
def forward(self, arg0_1):
add_tensor = torch.ops.aten.add.Tensor(arg0_1, 1); arg0_1 = None
sin_default = torch.ops.aten.sin.default(add_tensor); add_tensor = None
sum_default = torch.ops.aten.sum.default(sin_default); sin_default = None
return sum_default
# submod_1
def forward(self, sum_1):
_set_grad_enabled = torch._C._set_grad_enabled(False)
add_tensor = torch.ops.aten.add.Tensor(sum_1, 1); sum_1 = None
return add_tensor
# submod_2
def forward(self, add_1):
_set_grad_enabled = torch._C._set_grad_enabled(True)
sub_tensor = torch.ops.aten.sub.Tensor(add_1, 1); add_1 = None
return sub_tensor
""")
```
After the change, the test produce the following graph, all the node names in original graph module are preserved in sub_modules.
```python
def forward(self, arg_0):
sub, = fx_pytree.tree_flatten_spec(([arg_0], {}), self._in_spec)
submod_0 = self.submod_0(sub); sub = None
submod_1 = self.submod_1(submod_0); submod_0 = None
submod_2 = self.submod_2(submod_1)
return pytree.tree_unflatten((submod_1, submod_2), self._out_spec)
# submod_0
def forward(self, arg0_1):
add = torch.ops.aten.add.Tensor(arg0_1, 1); arg0_1 = None
sin = torch.ops.aten.sin.default(add); add = None
sum_1 = torch.ops.aten.sum.default(sin); sin = None
return sum_1
# submod_1
def forward(self, sum_1):
_set_grad_enabled = torch._C._set_grad_enabled(False)
add_1 = torch.ops.aten.add.Tensor(sum_1, 1); sum_1 = None
return add_1
# submod_2
def forward(self, add_1):
_set_grad_enabled_1 = torch._C._set_grad_enabled(True)
sub = torch.ops.aten.sub.Tensor(add_1, 1); add_1 = None
return sub
```
Note that currently, we call split_module on the graph after pre-dispatch aot. The difference is even larger if we `split_module` the graph module produced by dynamo, where all the original variables names in user program are preserved after dynamo but lost after `split_module` without this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119732
Approved by: https://github.com/tugsbayasgalan