Summary:
Fixes T198245910.
In previous diff D60532628 that causes the test failure, we fix the in-consistency caused by constant tensors is accidentally reigistered as buffer by deleting the buffer and re assign them as constant.
However, this broke several existing tests in pyspeech when the exported program is re-traced with torch.jit.trace (which is an anti-pattern we probably should have some alignment), the jit tracer finds this constant tensor requiring grad and errors out.
This PR force constant attr not requiring grad, which is the correct behavior. A better fix is finding out where the constants are created in user code and why it requires grad. But this has low roi so we warn user about it.
Test Plan: See failures in T198245910.
Differential Revision: D60974869
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133031
Approved by: https://github.com/angelayi
Summary:
A re-land of D60006710.
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.
edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.
The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design
edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module().
Differential Revision: D60532628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307
Approved by: https://github.com/zhxchen17
Summary:
- make default DCE pass check schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added).
- mark Proxy dump as NotImplemented for better error msg
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
Summary:
This pr fixes all the places in strict export stack where the output node's meta is not preserved correctly. However, we're getting a new error for the test we intend to fix: `buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle"`:
The `get_attr` nodes has wrong metadata. I guess there are more things need to be fixed to get it working but it's beyond the scope of this PR.
Test Plan: buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle"
Differential Revision: D60198221
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131706
Approved by: https://github.com/yushangdi
Summary:
This diff reverts D59561509
D59561509: [FX][export] DCE pass, check schema for node impurity (#130395) by yushangdi causes the following test failure:
Tests affected:
- [cogwheel:cogwheel_mtia_cmf_m5_shrunk_test#test_flow_with_verification](https://www.internalfb.com/intern/test/844425041436985/)
Here's the Multisect link:
https://www.internalfb.com/multisect/6533402
Here are the tasks that are relevant to this breakage:
T191383430: 10+ tests unhealthy for ads_mtia_inference
The backout may land if someone accepts it.
If this diff has been generated in error, you can Commandeer and Abandon it.
Test Plan: NA
Differential Revision: D60029318
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131341
Approved by: https://github.com/angelayi
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.
**edit:** also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.
The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990
Approved by: https://github.com/pianpwk
Fixes the failure in `test/export/test_export_training_ir_to_run_decomp.py ` caused by dead code elimination removing node with side effects.
For background, in export, we may want to export higher-level IRs that are not functional, so we need to check for side effects more carefully.
A call_function node is impure if it has at least one mutable argument.
Fixed the tests below:
test_to_module_with_mutated_buffer_multiple_update_sub_later
test_export_input_mutation_static_shape
test_buffer_util
Another attempt modifying the original DCE pass is made in PR #130395, but it breaks some other tests, so here we add a flag and use it for export only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130552
Approved by: https://github.com/pianpwk
This PR intends to support the aten operations with the `out` tensor.
Currently, the AOT compile always does **NOT** keep input tensor mutations. According to the comments, this is because it has not encountered such a use case.
> For now there's no use case involving keeping input mutations in the graph (which we can only do in the inference case anyway). We can add this later if we need to.
However, for aten operations, it is popular that the `out` tensor is an input parameter and needs to be mutated. This PR intends to support it by adding a `keep_inference_input_mutations` flag to `aot_inductor.keep_inference_input_mutations`. This flag can provide flexibility to the callee in deciding whether the AOT compile needs to keep input tensor mutations in the graph.
Take `clamp` as an example as follows.
```python
out_tensor = torch.randn(128, dtype=torch.float, device=device).fill_(-2.0)
inp_tensor = torch.randn(128, dtype=torch.float, device=device).fill_(1.0)
min_tensor = inp_tensor - 0.05
max_tensor = inp_tensor + 0.05
torch.clamp(input=inp_tensor, min=min_tensor, max=max_tensor, out=out_tensor)
```
W/O this PR
```python
def forward(self):
arg0_1: "f32[128]"; arg1_1: "f32[128]"; arg2_1: "f32[128]"; arg3_1: "f32[128]";
arg0_1, arg1_1, arg2_1, arg3_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
clamp_min: "f32[128]" = torch.ops.aten.clamp_min.Tensor(arg0_1, arg1_1); arg0_1 = arg1_1 = None
clamp_max: "f32[128]" = torch.ops.aten.clamp_max.Tensor(clamp_min, arg2_1); clamp_min = arg2_1 = None
return (clamp_max, clamp_max)
```
W/ this PR
```python
def forward(self):
arg0_1: "f32[128]"; arg1_1: "f32[128]"; arg2_1: "f32[128]"; arg3_1: "f32[128]";
arg0_1, arg1_1, arg2_1, arg3_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
clamp_min: "f32[128]" = torch.ops.aten.clamp_min.Tensor(arg0_1, arg1_1); arg0_1 = arg1_1 = None
clamp_max: "f32[128]" = torch.ops.aten.clamp_max.Tensor(clamp_min, arg2_1); clamp_min = arg2_1 = None
copy_: "f32[128]" = torch.ops.aten.copy_.default(arg3_1, clamp_max); arg3_1 = clamp_max = None
return (copy_,)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124926
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/angelayi
This PR intends to support the aten operations with the `out` tensor.
Currently, the AOT compile always does **NOT** keep input tensor mutations. According to the comments, this is because it has not encountered such a use case.
> For now there's no use case involving keeping input mutations in the graph (which we can only do in the inference case anyway). We can add this later if we need to.
However, for aten operations, it is popular that the `out` tensor is an input parameter and needs to be mutated. This PR intends to support it by adding a `keep_inference_input_mutations` flag to `aot_inductor.keep_inference_input_mutations`. This flag can provide flexibility to the callee in deciding whether the AOT compile needs to keep input tensor mutations in the graph.
Take `clamp` as an example as follows.
```python
out_tensor = torch.randn(128, dtype=torch.float, device=device).fill_(-2.0)
inp_tensor = torch.randn(128, dtype=torch.float, device=device).fill_(1.0)
min_tensor = inp_tensor - 0.05
max_tensor = inp_tensor + 0.05
torch.clamp(input=inp_tensor, min=min_tensor, max=max_tensor, out=out_tensor)
```
W/O this PR
```python
def forward(self):
arg0_1: "f32[128]"; arg1_1: "f32[128]"; arg2_1: "f32[128]"; arg3_1: "f32[128]";
arg0_1, arg1_1, arg2_1, arg3_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
clamp_min: "f32[128]" = torch.ops.aten.clamp_min.Tensor(arg0_1, arg1_1); arg0_1 = arg1_1 = None
clamp_max: "f32[128]" = torch.ops.aten.clamp_max.Tensor(clamp_min, arg2_1); clamp_min = arg2_1 = None
return (clamp_max, clamp_max)
```
W/ this PR
```python
def forward(self):
arg0_1: "f32[128]"; arg1_1: "f32[128]"; arg2_1: "f32[128]"; arg3_1: "f32[128]";
arg0_1, arg1_1, arg2_1, arg3_1, = fx_pytree.tree_flatten_spec([], self._in_spec)
clamp_min: "f32[128]" = torch.ops.aten.clamp_min.Tensor(arg0_1, arg1_1); arg0_1 = arg1_1 = None
clamp_max: "f32[128]" = torch.ops.aten.clamp_max.Tensor(clamp_min, arg2_1); clamp_min = arg2_1 = None
copy_: "f32[128]" = torch.ops.aten.copy_.default(arg3_1, clamp_max); arg3_1 = clamp_max = None
return (copy_,)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124926
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/angelayi
Summary:
Fixes https://github.com/pytorch/pytorch/issues/122842
Currently, calling ep.module() on an ExportedProgram leads to a GraphModule with a default forward signature (e.g. arg_0, arg_1, ...). This leads to original placeholder names disappearing for retracing/re-exporting.
Fixing this issue by creating a forward_arg_names field (will take renaming suggestions for this), that stores the positional & keyword arg names that are used. These names aren't present in the call_spec currently stored, and requires a major version bump for the ExportedProgram schema.
Test Plan: Tests exist for export, but names are now changed from generic (e.g. arg_0, arg_1) to follow user inputs (e.g. x, y)
Differential Revision: D56484994
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124765
Approved by: https://github.com/zhxchen17
Summary: Change to get_buffer from the input plain_graph_module instead of the new stateful_gm when restoring non_persistent buffers, since the stateful_gm doesn't contain the buffer yet.
Test Plan:
Added test case.
`buck test caffe2/test:test_export -- test_unlift_nonpersistent_buffer`
Differential Revision: D54216772
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120715
Approved by: https://github.com/zhxchen17
Summary:
X-link: https://github.com/pytorch/executorch/pull/1817
Basic support for non-persistent buffers, which are buffers that do not show up in the state dict.
One weird twist is that most of our other systems (FX, aot_export, dynamo) have completely buggy handling of non-persistent buffers. I tried to go on a wild goose chase to fix them all, but it got to be too much. So I introduced some sad rewrite passes in `_export` make the final state dict correctly align with the original module's state dict.
This exposed some bugs/ambiguous handling of parameters/buffers in existing test code. For example, `TestSaveLoad.test_save_buffer` traced over a module that was not in the root module hierarchy and caused some weird behavior. I think we should error explicitly on use cases like this: https://github.com/pytorch/pytorch/issues/118410. For now I just rewrote the tests or skipped them.
As a side effect, this diff tightened up quite a few sloppy behaviors around state dict handling:
- Tensor attributes were getting promoted to be buffers—bad!
- Tracing through a module not in the children of the root module would add its parameters/buffers to the state dict—bad!
This behavior is unlikely to show up in user code since the model would be totally broken, but did show up in a bunch of tests.
#buildmore
Test Plan:
unit tests
sandcastle
Differential Revision: D53340041
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118969
Approved by: https://github.com/guangy10, https://github.com/huydhn, https://github.com/titaiwangms
Summary: We only need to deepcopy the graph because we're modifying the graph by unlifting its parameter/buffer inputs. We don't need to deepcopy the graph module state/contents. This causes an error when the graph module contains an ExecuTorch LoweredModule which stores tensors.
Test Plan: Fixes the following diff
Differential Revision: D53290077
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118821
Approved by: https://github.com/tugsbayasgalan
Summary:
X-link: https://github.com/pytorch/executorch/pull/1769
Basic support for non-persistent buffers, which are buffers that do not show up in the state dict.
One weird twist is that most of our other systems (FX, aot_export, dynamo) have completely buggy handling of non-persistent buffers. I tried to go on a wild goose chase to fix them all, but it got to be too much. So I introduced some sad rewrite passes in `_export` make the final state dict correctly align with the original module's state dict.
This exposed some bugs/ambiguous handling of parameters/buffers in existing test code. For example, `TestSaveLoad.test_save_buffer` traced over a module that was not in the root module hierarchy and caused some weird behavior. I think we should error explicitly on use cases like this: https://github.com/pytorch/pytorch/issues/118410. For now I just rewrote the tests or skipped them.
Test Plan: added a unit test
Differential Revision: D53253905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118722
Approved by: https://github.com/SherlockNoMad, https://github.com/angelayi
I don't think we should be unlifting HOO submodules.
What is the constract of unlifting? It is: restore the original calling convention of the module, undoing the transformation in which we lift parameters, buffers, and constants to inputs in the graph.
Unlifting does *not* make any guarantees about what's going on inside the module. It's still a flat module. So why should we lift the cond/map submodules? It doesn't have anything to do with the contract stated above; it's some internal stuff that doesn't affect how the module will be called.
Further, this code as written modifies the state dict; adding a new buffer that is actually duplicate of a previous buffer. Modifying the state dict from the original eager module is never correct.
Differential Revision: [D53160713](https://our.internmc.facebook.com/intern/diff/D53160713/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118610
Approved by: https://github.com/zhxchen17
ghstack dependencies: #118607, #118608, #118609
This PR rewrites two paths to use the newly-added keypaths API in pytree:
First: we were hand-rolling a tree_map during fakification because we wanted to track sources. This PR uses keypaths instead, which can do the same thing without needing custom code.
Second: our constraint error formatting was referencing placeholder names in error messages. These placeholder names are not otherwise user-visible, so they are super confusing to users (e.g. "which input does arg1_3 correspond to?"). This diff uses the `keystr` API to format the error message.
This necessitated some small refactors—generating the keystr is expensive so doing it in an f-string was very bad.
It can also be further improved—we can inspect the signature so that instead of `*args[0]` we can give people the actual argument name, which would be the ideal UX. But leaving that for later.
Differential Revision: [D53139358](https://our.internmc.facebook.com/intern/diff/D53139358/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118609
Approved by: https://github.com/zhxchen17
ghstack dependencies: #118607, #118608
tree_flatten_spec is bad; it isn't synced up with `register_pytree_node` so it will not handle arbitrary custom pytrees. It's also not really maintained.
We only use it for two purposes:
- To retain kwarg ordering stability, so that if the user passes in kwargs in a different order things will still work.
- To do "structural" checks that ignore types.
In both cases, tree_flatten_spec is probably *not* the ideal way to implement the desired behavior.
## kwargs ordering
- tree_flatten_spec overwrites the behavior of ALL dictionaries, not just kwargs. This is not correct, dictionary ordering is meaningful in Python, and it's pretty trivial to write a program that relies on dict ordering.
- For kwargs, we do sort of expect that the order in which arguments are passed shouldn't matter. BUT there is one exception: `**kwargs`. In fact, [PEP 468](https://peps.python.org/pep-0468/) was introduced specifically to clarify that ordering does matter when the function being called uses `**kwargs`.
In this diff I introduce a utility function that *only* reorders kwargs. This gets us most of the way to correct—dicts are no longer reordered, but kwargs can be passed in any order.
A "fully correct" solution would need fix the corner case from PEP468. We could detect whether the top-level fn being traced uses `**kwargs` (via `inspect`), then serialize a flag for it. In ExportedProgram, we would check that flag and only re-order if `**kwargs` was unused; otherwise error if the key order doesn't match. This is a super corner case though, so I'll file it as a followup task.
## structural equivalence checking
This is another use case, where again `tree_flatten_spec` is too broad. Generally we want to treat a precise two types as the same, not override the behavior of comparison generally. So I introduce an `is_equivalent` util for this purpose.
Differential Revision: [D53168420](https://our.internmc.facebook.com/intern/diff/D53168420/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118608
Approved by: https://github.com/zhxchen17
ghstack dependencies: #118607
Summary: While turning on .module() for all the export tests, I uncovered some bugs with .module() and while fixing them I ended up rewriting some of the code... Some of the bugs were:
* bad kwargs support on the unlifted module
* no support for user input mutations
* (at the commit hash i was working off of) no support for custom objects
* there were no tests on unlifting weights from cond/map submodules
Test Plan: CI
Differential Revision: D53075380
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118272
Approved by: https://github.com/suo
Added support for constant outputs. We will just embed the constant directly into the output, like `return (x, 1)`.
Also adds support for None input/outputs. For None inputs we address it the same way we do to constants, which is that a placeholder with no users will be inserted into the graph, and the None will be embedded into whatever operator is using the None. For None outputs, we will also address the same way we do constants, which is that we embed it into the output, like `return (x, None)`.
Differential Revision: D52881070
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117894
Approved by: https://github.com/zhxchen17
Summary: lifted tensor constants were not being treated the same way as named buffers when unlifting, i.e. getting name correction to convert "." in FQNS to "_" for proper names. Additionally, future torchbind object support will allow objects to be registered, so only register_buffer for lifted constants if the value is a tensor.
Differential Revision: D52367846
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116266
Approved by: https://github.com/angelayi