Commit Graph

882 Commits

Author SHA1 Message Date
Yidi Wu
a23d86c178 [hop] ban creating hop by directly instantiating HigherOrderOperator. (#133645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133645
Approved by: https://github.com/zou3519
2024-08-23 17:28:02 +00:00
Avik Chaudhuri
b454c51060 remove dynamic_dim (#134211)
Summary: As promised in https://github.com/pytorch/pytorch/pull/134045.

Test Plan: existing

Differential Revision: D61646937

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134211
Approved by: https://github.com/angelayi
2024-08-23 04:13:03 +00:00
Shangdi Yu
b0cf287b46 [export][training ir migration] Fix getitem not exist (#134259)
Summary:
Make quantization tests compatible with the new training IR.

With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node.

We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir.

For now, we are just rolling out the training ir for fbcode internal tests.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion

buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D61292102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
2024-08-22 22:00:14 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So landing these 'type: ignore' for pytorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
Avik Chaudhuri
0d7ac1966a kill sharing of constraints (#134045)
Summary:
Previously, reuse of the same `Dim` was encoded by "sharing" internal constraints among constraint targets. This kind of sharing, implemented using `shared` fields between `_Constraint`s, was originally motivated by `dynamic_dim`, specifically to support `==` between `dynamic_dim`s, but we no longer need to maintain this overcomplicated structure: we can simply use names of `Dims` to directly encode sharing information.

Thus this PR vastly simplifies the structure of `_Constraint` by removing `shared` fields. As a result, both `_Constraint` and its moral subclass, `_DerivedConstraint`, are 1-1 with `Dim` and its moral subclass, `DerivedDim`.

Note that this will break `==` over `dynamic_dim`, so an immediate follow-up will be to remove `dynamic_dim` entirely from our public API. (It's been more than 6 months since the deprecation warning anyway.) I just didn't want to deal with that process in the same PR.

Test Plan: existing

Differential Revision: D61559413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134045
Approved by: https://github.com/pianpwk
2024-08-22 04:40:47 +00:00
Yiming Zhou
7b20514f8e [export] Device remapping in export (#133660)
Implemented `move_to_device_pass()` function in `torch._export.passes`.

The user has to explicitly call this method to move the exported program from one torch.device to another one.

Fixes https://github.com/pytorch/pytorch/issues/121761
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133660
Approved by: https://github.com/angelayi
2024-08-22 01:03:35 +00:00
PyTorch MergeBot
1491a61769 Revert "[hop] ban creating hop by directly instantiating HigherOrderOperator. (#133645)"
This reverts commit 696107efcb.

Reverted https://github.com/pytorch/pytorch/pull/133645 on behalf of https://github.com/ydwu4 due to breaking ci. probably due to land race ([comment](https://github.com/pytorch/pytorch/pull/133645#issuecomment-2302866106))
2024-08-21 19:33:14 +00:00
Shangdi Yu
5fcfccefc6 [export] Migrate capture_pre_autograd_graph to _export_for_training (#132815)
Summary: as title

Test Plan: CI

Differential Revision: D60860909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132815
Approved by: https://github.com/tugsbayasgalan
2024-08-21 19:00:41 +00:00
Yidi Wu
696107efcb [hop] ban creating hop by directly instantiating HigherOrderOperator. (#133645)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133645
Approved by: https://github.com/zou3519
ghstack dependencies: #133521
2024-08-21 17:34:21 +00:00
Shangdi Yu
8337b4d96e [training ir migration] Fix ReorderConvertTest (#134010)
Summary:
Change ReorderConvertTest to work with the new `capture_pre_autograd_graph` implementation using D61175223.

Note that now `ReorderConvertTest` doesn't work with the old `capture_pre_autograd_graph` anymore.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/passes/tests:optimize_test -- -r ReorderConvertTest
```

Differential Revision: D61507772

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134010
Approved by: https://github.com/tugsbayasgalan
2024-08-21 04:48:43 +00:00
Sherlock Huang
41fab40be7 [report_exportability] Avoid re-exporting duplicated modules (#133930)
Summary:
Skip re-exporting modules with the duplicated types to speed up the exportability tests.

In real models, there are many duplicated modules, and mostly have the same export issues.

Test Plan: Existing CI

Differential Revision: D61504630

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133930
Approved by: https://github.com/angelayi
2024-08-20 22:11:57 +00:00
PyTorch MergeBot
49f6ea6dd9 Revert "[report_exportability] Avoid re-exporting duplicated modules (#133930)"
This reverts commit 278bc985d7.

Reverted https://github.com/pytorch/pytorch/pull/133930 on behalf of https://github.com/izaitsevfb due to breaks lint ([comment](https://github.com/pytorch/pytorch/pull/133930#issuecomment-2299513046))
2024-08-20 18:44:09 +00:00
Sherlock Huang
278bc985d7 [report_exportability] Avoid re-exporting duplicated modules (#133930)
Summary:
Skip re-exporting modules with the duplicated types to speed up the exportability tests.

In real models, there are many duplicated modules, and mostly have the same export issues.

Test Plan: Existing CI

Differential Revision: D61504630

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133930
Approved by: https://github.com/angelayi

Co-authored-by: bearzx <bearzx@fb.com>
2024-08-20 18:20:49 +00:00
Angela Yi
a1a869f2f5 [ts_converter][reland] Add support for LinearOpContext and Conv2dOpContext in quantization pass (#133622)
Summary: Reland of D60871242

Test Plan: CI

Differential Revision: D61352600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133622
Approved by: https://github.com/SherlockNoMad
2024-08-16 01:55:45 +00:00
Angela Yi
29c4b4ea5a [executorch] Refactor delegation code (#132773)
Summary: Refactoring partitioner-based delegation to prepare for allowing buffer mutations in the delegate (following diff).

Test Plan: CI

Differential Revision: D60813405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132773
Approved by: https://github.com/ydwu4, https://github.com/cccclai
2024-08-15 22:52:12 +00:00
Sherlock Huang
09a489b177 Fix serialization for tensor list output (#133539)
Summary: Some element of tensor list output doesn't not have a user. In such case, create a name as `{node_name}_unused_{index}` for it.

Test Plan: OSS CI

Differential Revision: D61309011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133539
Approved by: https://github.com/zhxchen17
2024-08-15 20:31:44 +00:00
Shangdi Yu
d3b458e603 [export] Do not use export.export for capture_pre_autograd_graph (#133370)
Summary:
Do not use export.export for `capture_pre_autograd_graph` in unittests anymore.

#buildall

Test Plan: CI

Reviewed By: tugsbayasgalan

Differential Revision: D60996041

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133370
Approved by: https://github.com/tugsbayasgalan
2024-08-15 17:37:45 +00:00
Pian Pawakapan
a75248528f [export] refactor _process_dynamic_shapes (#133391)
Sorryyyyy for another refactor. This splits `_process_dynamic_shapes` into 3 parts:
1. `_combine_args` - mostly the same thing
2. `_check_dynamic_shapes`, which is responsible for raising 99% of UserErrors if the dynamic shapes spec is invalid (minus 1 UserError with DerivedDims)
3.  `_process_dynamic_shapes`, which for now, is the same thing, minus the stuff in 2.

This refactor is helpful for incoming automatic dynamic shapes work, because, we're switching to `assume_static_by_default=False`, which is what `_dynamo.export` currently does. This means any unspecified dims are allocated a symbol, in contrast to export today which keeps unspecified dims static. Historically this has been desirable - export users don't want too much dynamism. So we want to change how the spec is translated into constraints.

This means when we switch over to automatic dynamic shapes, we want to plug in something in between steps 2. and 3. which patches up the spec for `assume_static_by_default=False`, filling in static shapes for any unspecified dims, and potentially clearing out the auto-dynamic dims (since they're no-ops). We would do this in-between 2. and 3. to keep `_process_dynamic_shapes` semantically the same, since it's used with `_dynamo.export`.

We could do this without a refactor, plugging in this transform before `_process_dynamic_shapes`, but since that function's responsible for both spec checking + constraint production, moving spec checking to before we transform the specs helps guarantee we're raising errors on what the user's specified, and not an internal export bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133391
Approved by: https://github.com/avikchaudhuri
2024-08-15 16:21:21 +00:00
Xuehai Pan
758a0a88a2 [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200)
This PR removes unnecessary `pass` statement. This is semanticly safe because the bytecode for the Python code does not change.

Note that if there is a docstring in the function, a empty function does not need a `pass` statement as placeholder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980
2024-08-15 15:50:19 +00:00
Zhengxu Chen
f23dbefe52 [export] Support "custom" metadata field. (#131912)
Summary:
Add a special field in Graph and Node level metadata called "custom" which should be mapped to a json-serializable object, and we guarantee this field should be always preversed across the following transformations:
1. copy/deepcopy
2. run_decompositions()
3. serialization
4. re-exporting

Test Plan: :test_export -- -r custom_tag

Reviewed By: angelayi

Differential Revision: D60291839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131912
Approved by: https://github.com/angelayi
2024-08-14 01:09:01 +00:00
Pian Pawakapan
4671e98656 [export] fix node.users when inlining HOOs (#133144)
The process of inlining HOO subgraphs (e.g. set_grad_enabled) seems to break node.users when a node is present in multiple subgraphs, for example:
```
class SetGradCase(torch.nn.Module):
    def forward(self, x):
        _x = x.shape[0] + 2
        _xx = _x + 2
        with torch.no_grad():
            y = _x * 4
        return _xx, y
```

The `_x` node contains 2 users (_xx and y) after being inlined, but with inspection it seems to only contain y as a user.

Previously we were completely clearing node.users for output nodes in HOO subgraphs before inlining them - we should just be deleting the subgraph output nodes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133144
Approved by: https://github.com/larryliu0820, https://github.com/ydwu4
2024-08-13 03:21:42 +00:00
Shangdi Yu
b06959e614 [export] change deepcopy to copy in _replace_with_hop passes (#133142)
Summary:
Add back the change in 19897a1647.

The change was lost in refactoring due to a bad rebase.

Test Plan:
CI

```
buck2 run 'fbcode//mode/dev-nosan'  fbcode//torchrec/distributed/tests:test_pt2 -- --filter-text test_sharded_quant_fpebc_non_strict_export
```

Differential Revision: D61052687

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133142
Approved by: https://github.com/ydwu4
2024-08-12 17:15:04 +00:00
PyTorch MergeBot
9641abe97a Revert "[export] change deepcopy to copy in _replace_with_hop passes (#133142)"
This reverts commit 2d71f03db1.

Reverted https://github.com/pytorch/pytorch/pull/133142 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/133142#issuecomment-2284327241))
2024-08-12 15:48:11 +00:00
Shangdi Yu
2d71f03db1 [export] change deepcopy to copy in _replace_with_hop passes (#133142)
Summary:
Add back the change in 19897a1647.

The change was lost in refactoring due to a bad rebase.

Test Plan:
CI

```
buck2 run 'fbcode//mode/dev-nosan'  fbcode//torchrec/distributed/tests:test_pt2 -- --filter-text test_sharded_quant_fpebc_non_strict_export
```

Differential Revision: D61052687

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133142
Approved by: https://github.com/ydwu4
2024-08-11 21:47:52 +00:00
Avik Chaudhuri
c8275e25a7 fix requirement for error classification (#133122)
Test Plan: none

Differential Revision: D61039300

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133122
Approved by: https://github.com/yushangdi
2024-08-10 04:59:09 +00:00
Avik Chaudhuri
3899465268 relax unification checks when size-like symbols can be 0 (#133112)
Test Plan: Fixes test failure in https://www.internalfb.com/diff/D51127481

Differential Revision: D61031307

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133112
Approved by: https://github.com/angelayi
2024-08-10 00:57:49 +00:00
Shangdi Yu
574cdf1232 [export] Merge functions in replace set_grad/autocast with HOO (#132724)
Summary: as title

Test Plan: CI

Differential Revision: D60701648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132724
Approved by: https://github.com/ydwu4
2024-08-09 17:25:07 +00:00
Avik Chaudhuri
22ea248aa8 dynamic shapes mismatch errors (#132982)
Summary: When PyTree detects a structural mismatch between inputs and dynamic shapes, the error messages are quite horrible. This PR fixes these error messages by adding, for each kind of error, the path to the point where the error happens and an actionable reason for the error.

Test Plan: added test with several cases

Differential Revision: D60956976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132982
Approved by: https://github.com/yushangdi
2024-08-09 02:22:32 +00:00
Shangdi Yu
3c5b246d3c [export] Remove Proxy from exported programs and modules (#132956)
Summary: Remove Proxy from exported programs and modules because they cannot be deepcopied or pickeled.

Test Plan:
CI

```
buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test/quantization:test_quantization -- -r  qat_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r  test_fold_bn_erases_bn_node
```

Differential Revision: D60940832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132956
Approved by: https://github.com/angelayi
2024-08-09 00:00:20 +00:00
Jiashen Cao
fa8c34301a [ts-migration]: Quantized ops to standard ops pass. (#133026)
#### Description
Transform quantized operation properly. Add de/quantization before and after the quantized operation.

#### Test Plan
`pytest test/export/test_converter.py -s -k test_ts2ep_convert_quantized_model`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133026
Approved by: https://github.com/angelayi
2024-08-08 23:10:17 +00:00
Edward Z. Yang
1f66487c69 [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770
Approved by: https://github.com/bdhirsh
2024-08-08 23:07:23 +00:00
PyTorch MergeBot
6f99e97f0a Revert "[ts-migration]: Support quantized operation transformation (#131915)"
This reverts commit 0e8541766f.

Reverted https://github.com/pytorch/pytorch/pull/131915 on behalf of https://github.com/ezyang due to test broken on windows 0e8541766f ([comment](https://github.com/pytorch/pytorch/pull/131915#issuecomment-2275974907))
2024-08-08 14:30:35 +00:00
PyTorch MergeBot
d1f73fd844 Revert "[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)"
This reverts commit 902c6f3a19.

Reverted https://github.com/pytorch/pytorch/pull/132770 on behalf of https://github.com/ezyang due to Removed API was recommitted ([comment](https://github.com/pytorch/pytorch/pull/132770#issuecomment-2275749689))
2024-08-08 12:54:34 +00:00
Edward Z. Yang
902c6f3a19 [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770
Approved by: https://github.com/bdhirsh
ghstack dependencies: #132674, #132675, #132421, #132062, #132767, #132769
2024-08-08 12:03:25 +00:00
Edward Z. Yang
54efd43022 [BE] Simplify code interacting with get_proxy_mode/enable_tracing (#132675)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132675
Approved by: https://github.com/Skylion007, https://github.com/ydwu4, https://github.com/zou3519
ghstack dependencies: #132674
2024-08-08 12:03:00 +00:00
Jiashen Cao
0e8541766f [ts-migration]: Support quantized operation transformation (#131915)
#### Description
Transform quantized operation properly. Add de/quantization before and after the quantized operation.

#### Test Plan
`pytest test/export/test_converter.py -s -k test_ts2ep_convert_quantized_model`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131915
Approved by: https://github.com/angelayi
2024-08-08 06:34:53 +00:00
Angela Yi
45d0e90bd3 [export] Allow str outputs (#132808)
Summary: Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1478413606130179/

Test Plan: CI

Differential Revision: D60850712

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132808
Approved by: https://github.com/ydwu4
2024-08-08 02:20:59 +00:00
Yidi Wu
bbf568aac8 Split of "[reland] [export] fix zero arg export in training_ir and constant tensor handling" (#132307)
Summary:
A re-land of D60006710.
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.

edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.
The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design

edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module().

Differential Revision: D60532628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307
Approved by: https://github.com/zhxchen17
2024-08-08 01:36:16 +00:00
Xu Han
a9036e1cf8 [inductor] raise unsupport msg in capture_pre_autograd_graph on Windows (#132841)
Debuged with @leslie-fang-intel , and we found that: https://github.com/pytorch/pytorch/issues/132561 and https://github.com/pytorch/pytorch/issues/132569 are all failed by `capture_pre_autograd_graph` not work well on Windows.

So, we added some code to raise message and let end user known that.

Detailed:
For https://github.com/pytorch/pytorch/issues/132561
```cmd
Traceback (most recent call last):
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 549, in _callTestMethod
    method()
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2918, in wrapper
    method(*args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 1515, in wrapper
    fn(*args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 399, in wrapper
    fn(*args, **kwargs)
  File "D:\xu_git\dnnl_cb\pytorch\test\quantization\pt2e\test_x86inductor_quantizer.py", line 1737, in test_qat_conv2d
    self._test_quantizer(
  File "D:\xu_git\dnnl_cb\pytorch\test\quantization\pt2e\test_x86inductor_quantizer.py", line 553, in _test_quantizer
    m = capture_pre_autograd_graph(
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_export\__init__.py", line 121, in capture_pre_autograd_graph
    raise RuntimeError("capture_pre_autograd_graph not yet supported on Windows")
RuntimeError: capture_pre_autograd_graph not yet supported on Windows

To execute this test, run the following from the base repo dir:
    python test\quantization\pt2e\test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor.test_qat_conv2d

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
```

For https://github.com/pytorch/pytorch/issues/132569
```cmd
Traceback (most recent call last):
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 59, in testPartExecutor
    yield
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 549, in _callTestMethod
    method()
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2918, in wrapper
    method(*args, **kwargs)
  File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_torchinductor.py", line 11218, in new_test
    return value(self)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_dynamo\testing.py", line 312, in _fn
    return fn(*args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_cpu_cpp_wrapper.py", line 155, in fn
    _, code = test_torchinductor.run_and_get_cpp_code(
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_inductor\utils.py", line 1863, in run_and_get_cpp_code
    result = fn(*args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 415, in wrapper
    fn(*args, **kwargs)
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 367, in wrapper
    fn(*args, **kwargs)
  File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_mkldnn_pattern_matcher.py", line 1668, in test_qlinear_gelu_cpu
    self._qlinear_unary_cpu_test_helper((torch.randn((2, 4)),), gelu)
  File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_mkldnn_pattern_matcher.py", line 1615, in _qlinear_unary_cpu_test_helper
    self._test_common(
  File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_mkldnn_pattern_matcher.py", line 165, in _test_common
    convert_model = _generate_qdq_quantized_model(
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 2949, in _generate_qdq_quantized_model
    export_model = capture_pre_autograd_graph(
  File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_export\__init__.py", line 121, in capture_pre_autograd_graph
    raise RuntimeError("capture_pre_autograd_graph not yet supported on Windows")
RuntimeError: capture_pre_autograd_graph not yet supported on Windows

To execute this test, run the following from the base repo dir:
    python test\inductor\test_cpu_cpp_wrapper.py -k DynamicShapesCppWrapperCpuTests.test_qlinear_gelu_cpu_dynamic_shapes_cpp_wrapper

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
--------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------
W0807 13:24:34.291000 11228 torch\_export\__init__.py:64] +============================+
W0807 13:24:34.291000 11228 torch\_export\__init__.py:65] |     !!!   WARNING   !!!    |
W0807 13:24:34.291000 11228 torch\_export\__init__.py:66] +============================+
W0807 13:24:34.291000 11228 torch\_export\__init__.py:67] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
W0807 13:24:34.291000 11228 torch\_export\__init__.py:68] Please switch to use torch.export instead.
```

Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132841
Approved by: https://github.com/jgong5, https://github.com/ezyang
2024-08-08 00:28:07 +00:00
Edward Z. Yang
9282e6ca78 Don't use _disable_current_modes as decorator (#132809)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132809
Approved by: https://github.com/albanD
ghstack dependencies: #132801, #132802, #132804
2024-08-07 23:59:46 +00:00
angelayi
c327710a87 [export] Publicize validate function (#132777)
as titled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132777
Approved by: https://github.com/zhxchen17
2024-08-07 23:10:05 +00:00
PyTorch MergeBot
9d476fee53 Revert "[BE] Simplify code interacting with get_proxy_mode/enable_tracing (#132675)"
This reverts commit c2bccfd431.

Reverted https://github.com/pytorch/pytorch/pull/132675 on behalf of https://github.com/PaliC due to We need to now revert https://github.com/pytorch/pytorch/pull/132216 in OSS and there is a dependency on this pr ([comment](https://github.com/pytorch/pytorch/pull/132674#issuecomment-2274062785))
2024-08-07 18:25:33 +00:00
Shangdi Yu
825002c9c6 [export][fx] More robust DCE pass (#132764)
Summary:
- make default DCE pass check schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added).

- mark Proxy dump as NotImplemented for better error msg

- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.

More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.

Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test/quantization:test_quantization -- -r  qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r  test_fold_bn_erases_bn_node
```

Reviewed By: angelayi

Differential Revision: D60319175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
2024-08-06 22:27:22 +00:00
Edward Z. Yang
c2bccfd431 [BE] Simplify code interacting with get_proxy_mode/enable_tracing (#132675)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132675
Approved by: https://github.com/Skylion007, https://github.com/ydwu4, https://github.com/zou3519
ghstack dependencies: #132674
2024-08-06 18:13:22 +00:00
Jiashen Cao
ca7ce2fca1 [ts-migration][1/N]: Add prim::Loop for constant number of iterations and condition (#131418)
#### Description
This PR adds prim::Loop support for the simplest case where the number of iteration is constant and the loop termination condition is also a constant.

[PR by stages](https://docs.google.com/document/d/1q6OprW3HBHbYPwEyE_DikBn-uzmhnN284Cmen_CnlhI/edit?usp=sharing)

#### Test Plan
Add reprod example.
* `pytest test/export/test_converter.py -s -k test_ts2ep_with_loop`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131418
Approved by: https://github.com/angelayi
2024-08-06 16:51:08 +00:00
Shangdi Yu
93fad2f0f2 [export] Fix import in D60427208 (#132707)
Summary:
D60427208 broke APS release by failing our NE  deterministric test. https://www.internalfb.com/intern/test/562950111197340/

This Diff fixes it.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//aps_models/ads/gmp/tests/ne/e2e_deterministic_tests:gmp_e2e_ne_tests -- --filter-text test_mtml_instagram_model_474023725_single_gpu_with_ir
```

Differential Revision: D60790203

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132707
Approved by: https://github.com/ydwu4
2024-08-06 02:35:17 +00:00
Shangdi Yu
4a2cf50edf [export][reland] Convert autocast to HOO (#132677)
Summary:
Reland of D60206382.

Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_autocast"
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_set_grad"
```

Verified that now we can export the llama model in  gh issue 128394 and the gemma model in  gh issue 131829 without error.

Differential Revision: D60770038

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132677
Approved by: https://github.com/angelayi
2024-08-05 22:34:52 +00:00
PyTorch MergeBot
a3ea96b762 Revert "[export] Convert autocast to HOO (#131914)"
This reverts commit aec948adfc.

Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/davidberard98 due to PR shouldn't have been relanded by the bot, phabricator diff did not have any recent changes and is still internally reverted ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2269797388))
2024-08-05 19:52:09 +00:00
Shangdi Yu
aec948adfc [export] Convert autocast to HOO (#131914)
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
parsh --build-flags fbcode//mode/dev-nosan  fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```

Reviewed By: angelayi

Differential Revision: D60206382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
2024-08-05 18:52:12 +00:00
Avik Chaudhuri
27f61eba58 serde sympy functions (#132493)
Summary: Sympy functions appearing in symbolic expressions inside tensor metadata were not being deserialized properly.

Test Plan: updated test

Differential Revision: D60573150

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132493
Approved by: https://github.com/pianpwk
2024-08-05 08:08:50 +00:00
Jiashen Cao
09fcd792eb [Fix]: ScriptObject lifting issue (#130952)
#### Issue
ScriptObject was treated as normal attribute by the converter previously. This PR lifts it to be a constant and convert it directly to a GetAttr fx node. ScriptObject would also trigger `CallMethod` and this PR adds that support as well.

#### Test Plan
Add test case for ScriptObject.
`pytest test/export/test_converter.py -s -k test_convert_script_object`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130952
Approved by: https://github.com/angelayi
2024-08-04 16:52:45 +00:00
PyTorch MergeBot
d984105748 Revert "[export] Convert autocast to HOO (#131914)"
This reverts commit b28c01d90d.

Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/ezyang due to Failing lint, but was covered up by master failure on lint ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2267248773))
2024-08-04 02:10:35 +00:00
Pian Pawakapan
a896fb1b36 check unsupported sympy functions for runtime asserts (#132457)
Some sympy Functions aren't supported by sympy_interp(); we can't turn them into FX nodes, so currently the runtime asserts CSE pass avoids CSE'ing on any expression containing a sympy Function. https://github.com/pytorch/pytorch/pull/132325 started tracking unsupported functions, so we switch the check to that to be more precise. We also check for and skip unsupported functions when adding asserts - previously we only did the check for CSE, and not adding new expressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132457
Approved by: https://github.com/avikchaudhuri
2024-08-03 10:17:25 +00:00
Jiashen Cao
159d508f03 [Fix]: prim::If with multiple outputs and input return directly (#131779)
#### Issue
Test is not working for prim::Loop with multiple outputs. Additionally fix issue where input is directly returned, which is not supported by HigherOrderOp.

#### Test Plan
`pytest test/export/test_converter.py -s -k test_convert_if_multiple_out`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131779
Approved by: https://github.com/angelayi, https://github.com/SherlockNoMad
2024-08-03 08:07:21 +00:00
Shangdi Yu
b28c01d90d [export] Convert autocast to HOO (#131914)
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
parsh --build-flags fbcode//mode/dev-nosan  fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```

Reviewed By: angelayi

Differential Revision: D60206382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
2024-08-03 05:48:57 +00:00
Shangdi Yu
a503136583 [export] Detect whether case_name is registered in exportdb (#132420)
Summary:
- moves logging functionalities into `torch/_export/db/logging.py` file.
- add a check in `_dynamo/eval_frame.py` to check for optional input and error out with `UnsupportedError`
- change the case name of `torch_sym_int` to `unsupported_operator`
- Check if the case name is registered in exportdb, if so, we give a link to the case in exportdb.
- TODO: add test

Test Plan:
CI

Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging:

```
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633,  1.2414, -0.1071],
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086]         [-0.1936, -0.9425, -0.0824]])
E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case.                 https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input
......
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching
    raise Unsupported(
torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet
```

It also logs a `export.error.classified` event in Scuba.

Reviewed By: zhxchen17

Differential Revision: D60427208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420
Approved by: https://github.com/zhxchen17
2024-08-03 01:08:48 +00:00
Sherlock Huang
e3513fb2af [ts_converter]handle python list append, list add, aten.to.dtype+mutation_op pattern (#132529)
Summary:
#### Description
Add support for aten::append with a python function that returns a new list with the appended element. We then update the `fx_node` in the `name_to_node` mapping.

aten::append contributed by Jiashen Cao <jiashenc@meta.com>

Fix conversion for csr_ranker_test

```
    model_name: csr_ranker_test_4.ptl
    has_ts_model: True
    has_sample_inputs: True
    ops_maybe_missing_meta: set()
    script_objects: set()
    ts_can_run: True
    ts_run_exception: None
    can_convert: True
    convert_exception: None
    ep_result_correct: True
    ep_run_exception: None
    can_package: True
    package_exception: None
    sigmoid_can_run: False
    sigmoid_run_exception: RuntimeError('not for symbolics')
    sigmoid_result_correct: None
```

Test Plan:
test_aten_add_t
test_aten_append_t
test_aten_to_dtype_with_mutating_storage

buck2 run mode/opt sigmoid/inference/ts_migration:main -- --mode test_one --model_name csr_ranker_test

Differential Revision: D60635893

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132529
Approved by: https://github.com/jiashenC
2024-08-02 23:32:37 +00:00
Yidi Wu
19897a1647 [export] change deepcopy to copy in _replace_set_grad_with_hop pass.. (#132181)
Summary:
Fixes T197371132.

Previously, we call copy.deepcopy to avoid mutating the original signature. However, this causes errors when the signature reference a FakeScriptObject, which then references a real torch.ScriptObject due to "The tensor has a non-zero number of elements, but its data is not allocated yet."

We therefore just change it to a shallow copy. This should be good enough for guarding the signature.

Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_ebc_non_strict_export"

Differential Revision: D60476839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132181
Approved by: https://github.com/BoyuanFeng
2024-08-02 17:57:09 +00:00
Mikayla Gawarecki
87ddf70fc6 Set weights_only=False in export deserialize_torch_artifact (#132348)
Context:

We are planning to make a BC breaking change to `torch.load` by flipping the default for `weights_only` from `False` --> `True` in a future release. With `weights_only=True`, a custom unpickler is used that limits what can be loaded to state_dicts containing tensors (there is also a way for the user to allowlist specific things to be loaded). The goal of this is to attempt to prevent remote execution of arbitrary code when using `torch.load`.

To my understanding, in export, `torch.load` is used internally to load arbitrary objects, so we should set `weights_only=False` here to prevent the flip from breaking export.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132348
Approved by: https://github.com/angelayi
2024-08-01 23:25:07 +00:00
angelayi
010fc7858a [export] Fix serialization of OpOverload w/ SymInt outputs (#132126)
Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1473575486613991/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132126
Approved by: https://github.com/ydwu4
2024-08-01 17:22:04 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
Justin Chu
0d88dd0f77 [TS2E] Remove reference to torch.onnx internals (#132186)
Instead, this PR moves the code to the converter to avoid dependence. Feel free to refactor it afterward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132186
Approved by: https://github.com/angelayi
2024-08-01 15:08:02 +00:00
Xuehai Pan
e7eeee473c [BE][Easy][14/19] enforce style for empty lines in import segments in torch/_[a-c]*/ and torch/_[e-h]*/ and torch/_[j-z]*/ (#129765)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129765
Approved by: https://github.com/ezyang
2024-07-31 10:42:50 +00:00
Jiashen Cao
ff377e16ab Improve logging in the TSConverter (#132082)
Summary: Currently, running explain with TORCH_LOGS enabled will cause duplicate loggings because explain uses the exact same code path for covnersion. This PR just disables logging when it is running explain. And move all logging to convert() to prevent from logging from __init__ when we are just using explain.

Test Plan: Manual testing with attached outputs.

Reviewed By: SherlockNoMad, angelayi

Differential Revision: D60199007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132082
Approved by: https://github.com/ydwu4
2024-07-30 21:37:44 +00:00
PyTorch MergeBot
945bf78894 Revert "[BE] typing for decorators - fx/_compatibility (#131568)"
This reverts commit 193f62fde9.

Reverted https://github.com/pytorch/pytorch/pull/131568 on behalf of https://github.com/clee2000 due to same as https://github.com/pytorch/pytorch/pull/131572#issuecomment-2254328359 but I clicked the wrong link by accident.  This is where it actually starts ([comment](https://github.com/pytorch/pytorch/pull/131568#issuecomment-2254330781))
2024-07-28 03:43:39 +00:00
Yidi Wu
404a8ae8f6 [export] fix set_grad x tensor constant. (#131787)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/130379.

The original error is verifier finds that the placeholder nodes' meta[''val"] are missing in subgraph of WrapSetGradEnabled hop.

In this PR, we fixed it by re-ordering the replace_set_grad_with_hop_pass with lift_constant_tensor pass because only after lift_constant_pass, all the constant attrs start to have meta["val"].

Test Plan: buck2 test test:test_export -- -r "test_setgrad_lifted_tensor"

Differential Revision: D60244935

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131787
Approved by: https://github.com/yushangdi
2024-07-26 16:41:59 +00:00
Avik Chaudhuri
5b05ad9697 fix non-persistent buffers (#131756)
Summary:
Dynamo doesn't track whether buffers are `persistent`. This led to some ugly code where we would mark buffers as always persistent when creating signatures, then later check whether the buffers were not in the state dict to infer whether they were non-persistent, and use this to fix up the signature.

This PR instead defines a utility to look up all the non-persistent buffers registered inside a module (this information is recorded in a private `_non_persistent_buffers_set` module attribute), and uses it to (a) correctly set the persistent flag on buffers when creating signatures (b) transfer this information to a Dynamo-traced graph module, which then causes non-persistent buffers to (correctly) not show up in the state dict.

Test Plan: existing tests + new case with non-persistent buffer in nested module

Differential Revision: D60224656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131756
Approved by: https://github.com/zhxchen17, https://github.com/ydwu4
2024-07-26 04:45:30 +00:00
Aaron Orenstein
193f62fde9 [BE] typing for decorators - fx/_compatibility (#131568)
See #131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131568
Approved by: https://github.com/justinchuby, https://github.com/oulgen, https://github.com/zou3519
2024-07-25 22:24:19 +00:00
angelayi
f063027d54 [aoti] Fix constant inputs passed to aoti (#131594)
In cases where the program takes in a constant, export will specialize on the constant and embed the constant into the graph, with the graph containing a placeholder node with no users. However, inductor errors further down as typically in torch.compile, these constants don't show up as inputs. Since these constants are already embedded in the graph, we will just ignore these inputs while compiling with AOTI, and filter out the non-tensor inputs during the runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131594
Approved by: https://github.com/desertfire
2024-07-25 16:22:15 +00:00
Sherlock Huang
96e8df6a3a [ts_converter] Support prim::max and prim::if with multiple outputs (#131593)
Summary: As title.

Test Plan: test_converter.py

Reviewed By: angelayi

Differential Revision: D60147455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131593
Approved by: https://github.com/ydwu4
2024-07-25 16:13:31 +00:00
Justin Chu
c88c90a897 [TS2E] Improve logging (#131711)
Serializing the text without having to do so can be costly for large outputs like ExportedProgram

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131711
Approved by: https://github.com/ydwu4
2024-07-25 13:40:10 +00:00
angelayi
ab609d6aa6 [ts_convert] Update conversion for aten.tensor (#131549)
Fixes aten::tensor issues in edgeml models
P1492137675
| suite   |   #models |   #has_ts_model |   #has_sample_inputs |   #ts_can_run |   #can_convert |   #ep_result_correct |   #can_package |   #sigmoid_can_run |   #sigmoid_result_correct |
|---------|-----------|-----------------|----------------------|---------------|----------------|----------------------|----------------|--------------------|---------------------------|
| EDGEML  |        34 |              25 |                   23 |            21 |              2 |                    2 |              2 |                  2 |                         2 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131549
Approved by: https://github.com/jiashenC, https://github.com/SherlockNoMad
2024-07-25 01:11:03 +00:00
Angela Yi
7535b23a25 [export] Fix set_grad hoo if output is empty (#131511)
Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1467707973867409/

Test Plan: CI

Differential Revision: D60135531

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131511
Approved by: https://github.com/ydwu4
2024-07-24 23:17:20 +00:00
Shangdi Yu
f9322c26b2 Remove _export/exported_program.py (#131597)
Summary:
We removed references to _export/exported_program.py in executorch
in D60052318. Now we can remove this file.

Update the pin to executorch.

Test Plan: contbuild & OSS CI:

Differential Revision: D60072980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131597
Approved by: https://github.com/avikchaudhuri
2024-07-24 22:04:17 +00:00
angelayi
b90aa18569 [aoti] Add initial custom op support (#127034)
Re-land of https://github.com/pytorch/pytorch/pull/125242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127034
Approved by: https://github.com/malfet
2024-07-24 20:29:55 +00:00
Jiashen Cao
31da9ee711 Use explain function to provide more meaningful information when conversion failed. (#131214)
Summary: In the script of testing different families of models, when the conversion failed, we switch to use output from the explain function to provide more meaningful information.

Test Plan:
Manual testing with attatched log information.

```
buck2 run mode/dev-nosan sigmoid/inference/ts_migration:main -- --mode test_all --test_suites ads_merge --model_id 440779101
```

```
Processing 440779101_5455.predictor.disagg.gpu.merge

    model_name: 440779101_5455.predictor.disagg.gpu.merge
    has_ts_model: True
    has_sample_inputs: True
    ops_maybe_missing_meta: set()
    ts_can_run: True
    ts_run_exception: None
    can_convert: False
    convert_exception: Unsupported nodes are found in the following list:

        0. prim::Loop [%14259 : int = prim::Loop(%14258, %1129, %1126), scope: torch.fx.graph_module.GraphModule:: # <torch_package_1>.caffe2/torch/fb/predictor/modules/tensors_to_device_module.py💯19]

        1. prim::Loop [%14326 : int = prim::Loop(%1115, %1129, %14259), scope: torch.fx.graph_module.GraphModule:: # <torch_package_1>.caffe2/torch/fb/predictor/modules/tensors_to_device_module.py💯19]
    ep_result_correct: None
    ep_run_exception: None
    can_package: None
    package_exception: None
    sigmoid_can_run: None
    sigmoid_run_exception: None
    sigmoid_result_correct: None
```

Reviewed By: SherlockNoMad

Differential Revision: D59971446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131214
Approved by: https://github.com/angelayi
2024-07-24 02:42:18 +00:00
Shangdi Yu
fdc9a1404e Remove _BLACK_LISTED_OPS (#131361)
Summary: remove _BLACK_LISTED_OPS after https://github.com/pytorch/pytorch/pull/100749

Test Plan: contbuild & OSS CI

Differential Revision: D60056130

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131361
Approved by: https://github.com/angelayi
2024-07-24 00:15:27 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Avik Chaudhuri
94f22eb6b2 refactor post-trace fakification in strict (#131421)
Summary:
Previously it was unclear what `_convert_input_to_fake` actually does (used in strict), and in particular how it is different from `make_fake_inputs` (used in non-strict).

This PR splits that function to work purely on user inputs, then renames it to `extract_fake_inputs` and adds a comment clarifying what it does—namely, it extracts fake inputs from a given graph module instead of "converting inputs to fake inputs" (as suggested by the current name) or "making fake inputs" (as happens in non-strict, where no tracing has taken place yet).

The remainder of that function used to also fakify params and buffers. It turns out that this part is identical to what happens in non-strict, hence we also pull `make_fake_inputs` out from `non_strict_utils` into `_trace`, merge it with another util, and make both modes call it.

Test Plan: existing tests

Differential Revision: D60084442

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131421
Approved by: https://github.com/zhxchen17
2024-07-23 18:23:03 +00:00
Shangdi Yu
f85c35872b Remove GraphModuleOpUpgrader in _export.serde.upgrade.py (#131373)
Summary: Remove GraphModuleOpUpgrader in _export.serde.upgrade.py and the file

Test Plan: contbuild & OSS CI

Differential Revision: D60067937

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131373
Approved by: https://github.com/angelayi
2024-07-23 18:09:44 +00:00
Zhengxu Chen
3aa45cae77 [export] Removed deprecated dialect field from EP schema. [2/2] (#131344)
Summary: Not landable until we've updated the pin of executorch.

Test Plan: CI

Reviewed By: SherlockNoMad

Differential Revision: D59759620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131344
Approved by: https://github.com/SherlockNoMad, https://github.com/ydwu4
2024-07-23 16:05:10 +00:00
Avik Chaudhuri
1e5ecc4277 move save/load from _export to export (#131353)
Test Plan: existing tests

Differential Revision: D60053905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131353
Approved by: https://github.com/angelayi
2024-07-23 00:48:28 +00:00
PyTorch MergeBot
b9912f31ef Revert "[export] fix zero arg export in training_ir (#130990)"
This reverts commit 50436d5bdb.

Reverted https://github.com/pytorch/pytorch/pull/130990 on behalf of https://github.com/clee2000 due to failing some executorch and torchrec tests internally D60006710 ([comment](https://github.com/pytorch/pytorch/pull/130990#issuecomment-2243395316))
2024-07-22 16:49:25 +00:00
Yidi Wu
50436d5bdb [export] fix zero arg export in training_ir (#130990)
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.

**edit:** also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.

The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990
Approved by: https://github.com/pianpwk
2024-07-20 02:35:13 +00:00
Jiashen Cao
9b5c70878b [Fix] Missing parameter happens when retracing an already jit.scripted module (#129787)
#### Issue
Model parameters sometime do not appear in the `named_parameters()` function. For example, when trying to jit.trace an already jit.scripted model. This PR fixes that by relying on `state_dict` to get both parameters`requires_grad=True` and buffers.

#### Test Plan
* `pytest test/export/test_converter.py -s -k test_convert_retrace_nested_scripted_modules`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129787
Approved by: https://github.com/angelayi
2024-07-19 16:58:48 +00:00
Jiashen Cao
686b7f046a [Fix]: TSConverter handles call ops with multiple outputs (#129294)
#### Issue
* Current call ops does not handle IR with multiple outputs. If an op has multiple outputs, we add an implicit unpack to map output. E.g.,
```
%5 : Tensor, %6 : Tensor = aten::max(%x.1, %3, %4), scope: export.test_converter.M:: # /data/users/jiashenc/pytorch/test/export/test_converter.py:774:20
```
* There are some cases that `prim::If` sub-blocks do not return any outputs. E.g.,
```
%9 : bool = aten::gt(%8, %3), scope: export.test_converter.M::/torch.nn.modules.pooling.AdaptiveMaxPool2d::pool # <string>:5:9
   = prim::If(%9), scope: export.test_converter.M::/torch.nn.modules.pooling.AdaptiveMaxPool2d::pool # <string>:5:2
    block0():
      -> ()
    block1():
       = prim::RaiseException(%5, %4), scope: export.test_converter.M::/torch.nn.modules.pooling.AdaptiveMaxPool2d::pool # <string>:5:2
      -> ()
```

#### Test Plan
We did an exhaustive search of all torch APIs that can return multiple outputs. We sample some of common ones and add new test cases based on those.
* `pytest test/export/test_converter.py -s -k test_ts2ep_multi_outputs_on_call_ops`

#### Appendix
* aten ops that return multiple outputs.
```
aten._batch_norm_impl_index
aten._batch_norm_no_update
aten._batch_norm_with_update
aten._batch_norm_with_update_functional
aten._cudnn_rnn
aten._efficient_attention_backward
aten._efficient_attention_forward
aten._embedding_bag
aten._embedding_bag_forward_only
aten._flash_attention_backward
aten._flash_attention_forward
aten._fused_adam
aten._fused_dropout
aten._fused_moving_avg_obs_fq_helper
aten._linalg_det
aten._linalg_eigh
aten._linalg_slogdet
aten._linalg_solve_ex
aten._linalg_svd
aten._native_batch_norm_legit
aten._native_batch_norm_legit_functional
aten._native_batch_norm_legit_no_training
aten._pack_padded_sequence
aten._prelu_kernel_backward
aten._scaled_dot_product_efficient_attention
aten._scaled_dot_product_efficient_attention_backward
aten._scaled_dot_product_flash_attention
aten._scaled_dot_product_flash_attention_backward
aten._scaled_dot_product_flash_attention_for_cpu
aten._scaled_dot_product_flash_attention_for_cpu_backward
aten._thnn_fused_lstm_cell
aten._thnn_fused_lstm_cell_backward_impl
aten._unique2
aten._weight_norm_interface
aten.adaptive_max_pool2d
aten.adaptive_max_pool3d
aten.aminmax
aten.batch_norm_backward
aten.convolution_backward
aten.cudnn_batch_norm
aten.cudnn_batch_norm_backward
aten.cummax
aten.cummin
aten.fractional_max_pool2d
aten.frexp
aten.grid_sampler_2d_backward
aten.grid_sampler_3d_backward
aten.gru
aten.linalg_cholesky_ex
aten.linalg_eig
aten.linalg_inv_ex
aten.linalg_ldl_factor_ex
aten.linalg_lu
aten.linalg_lu_factor_ex
aten.linalg_qr
aten.linear_backward
aten.log_sigmoid_forward
aten.lstm
aten.lu_unpack
aten.max
aten.max_pool2d_with_indices
aten.max_pool3d_with_indices
aten.median
aten.min
aten.miopen_batch_norm
aten.miopen_batch_norm_backward
aten.mkldnn_rnn_layer
aten.mkldnn_rnn_layer_backward
aten.mode
aten.multilabel_margin_loss_forward
aten.nanmedian
aten.native_batch_norm
aten.native_batch_norm_backward
aten.native_dropout
aten.native_group_norm
aten.native_group_norm_backward
aten.native_layer_norm
aten.native_layer_norm_backward
aten.nll_loss2d_forward
aten.nll_loss_forward
aten.quantized_gru
aten.quantized_lstm
aten.rnn_relu
aten.rnn_tanh
aten.sort
aten.std_mean
aten.topk
aten.triangular_solve
aten.unique_dim
aten.var_mean
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129294
Approved by: https://github.com/angelayi
2024-07-18 21:55:18 +00:00
Pian Pawakapan
745324e487 [export] turn on hybrid symints by default (#130775)
Sets `prefer_deferred_runtime_asserts_over_guards=True` for export, so any guards emitted from `SymNode.expect_true` (for example, guards that are implicitly required to be true for an op to succeed) won't lead to constraint violations. Instead these should appear in the graph as runtime asserts, or potentially as replacement expressions for placeholder shapes.

For example, this reshape op should emit s0 * s1 = s2, deferred as a runtime assert.
```
x = torch.randn(4, 8)  # [s0, s1]
y = torch.randn(32)  # [s2]
out = x.reshape(-1) + y
# this emits Eq(s0 * s1, s2), and we represent y's shape as [s0*s1] in the graph.
```

However, other complex guards can still cause export to fail, for instance guards emitted from `SymNode.guard_bool/guard_size_oblivious` (e.g. explicit if-else conditions in user code or lower-level op implementations hit during tracing) can still raise constraint violations. These can be deferred with `allow_complex_guards_as_runtime_asserts=True`. We don't yet make this default, because while this makes export more likely to succeed, it results in non-trivial asserts being emitted that often represent specialization to a variant of the op, or checks related to 0/1 specialization.

We also remove forced specializations for export and kill the `_disable_forced_specializations` flag - now any guard we can't express with Dims/DerivedDims either are handled with Hybrid SymInts, or should be resolved with rewriting or deferring.

Follow up:
Currently, `ShapeEnv._set_replacement()` is called for complex equality expressions (e.g. s2 -> s0*s1 in the example above), and the ExportedProgram stores `s0*s1` in the input placeholder. This isn't checked for validity when the program is run, so an option is to avoid replacement and/or runtime assert on equality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130775
Approved by: https://github.com/avikchaudhuri
2024-07-18 17:40:58 +00:00
Zhengxu Chen
5484c86021 [export] Fully support extension op in serialization/deserialization. (#130851)
Summary: Finishing up the mechanism to "register" certain types of operators to a registry so that the serializer can handle them correctly. This is expected to be firstly used by executorch.

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_export_with_extension_op_serialization

Differential Revision: D59825148

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130851
Approved by: https://github.com/angelayi
2024-07-18 16:47:53 +00:00
Boyuan Feng
90105a4f3e [ts-migration] Support RaiseException, prim::Unitialized, prim::Enter, and prim::Exit (#129416)
- Support raise exception. It's behavior matches non-strict export now, thanks to @ydwu4's [PR](https://github.com/pytorch/pytorch/pull/128709).
- Support prim::Unitialized, prim::Enter, and prim::Exit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129416
Approved by: https://github.com/angelayi
2024-07-17 21:59:52 +00:00
PyTorch MergeBot
0eb43ed189 Revert "[ts-migration] Support RaiseException, prim::Unitialized, prim::Enter, and prim::Exit (#129416)"
This reverts commit f0faecd291.

Reverted https://github.com/pytorch/pytorch/pull/129416 on behalf of https://github.com/clee2000 due to broke lint, but for for torch/_inductor/codecache.py this time https://github.com/pytorch/pytorch/actions/runs/9981737836/job/27586013811 f0faecd291 ([comment](https://github.com/pytorch/pytorch/pull/129416#issuecomment-2234387254))
2024-07-17 21:55:48 +00:00
Boyuan Feng
f0faecd291 [ts-migration] Support RaiseException, prim::Unitialized, prim::Enter, and prim::Exit (#129416)
- Support raise exception. It's behavior matches non-strict export now, thanks to @ydwu4's [PR](https://github.com/pytorch/pytorch/pull/128709).
- Support prim::Unitialized, prim::Enter, and prim::Exit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129416
Approved by: https://github.com/angelayi
2024-07-17 21:27:45 +00:00
PyTorch MergeBot
1bf4a44b33 Revert "[ts-migration] Support RaiseException, prim::Unitialized, prim::Enter, and prim::Exit (#129416)"
This reverts commit ef0511245a.

Reverted https://github.com/pytorch/pytorch/pull/129416 on behalf of https://github.com/clee2000 due to broke lint for test/export/test_converter.py https://github.com/pytorch/pytorch/actions/runs/9979009143/job/27577181982 ef0511245a.  Probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/129416#issuecomment-2234067407))
2024-07-17 19:21:52 +00:00
Boyuan Feng
ef0511245a [ts-migration] Support RaiseException, prim::Unitialized, prim::Enter, and prim::Exit (#129416)
- Support raise exception. It's behavior matches non-strict export now, thanks to @ydwu4's [PR](https://github.com/pytorch/pytorch/pull/128709).
- Support prim::Unitialized, prim::Enter, and prim::Exit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129416
Approved by: https://github.com/angelayi
2024-07-17 17:48:36 +00:00
Jiashen Cao
67e22d6c61 [Fix]: Convert operator that does specialization to its symbolic counterpart (#129578)
#### Issue
During conversion, use symbolic operator when exist.

#### Test Plan
`pytest test/export/test_converter.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129578
Approved by: https://github.com/angelayi
2024-07-16 17:19:57 +00:00
Jiashen Cao
3f031b96c6 [Fix] Correctly identifying arguments for sub-blocks with renaming logic during TorchScript to ExportedProgram conversion (#128386)
#### Issue
Fix two issues related to inputs lifting when there are sub-blocks.
* Some inputs may appear in the nested sub-blocks, which need a recursive search to identify which arguments need to be lifted / passed in the top-level block.
* Some inputs to the sub-block are intermediate results, meaning their names are only number. This will cause issue during code generation (i.e., invalid argument name). We rename those to valid names.

#### Test Plan
* `pytest test/export/test_converter.py -s -k test_convert_nn_module_with_nested_if_and_param`
* `test/export/test_converter.py -s -k test_hidden_input_name`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128386
Approved by: https://github.com/angelayi
2024-07-15 22:48:13 +00:00
Aaron Orenstein
567482973d typing fake_tensor.py (#128041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128041
Approved by: https://github.com/eellison
ghstack dependencies: #129182
2024-07-13 06:07:40 +00:00
Yidi Wu
0bf9a091ec [torchbind] add tracing_mode support (#129586)
Sometimes, it could be difficult to write a fake class e.g. when the original implementation is using some third-party libraries or users are certain that the class is safe to trace with the real object.

This PR allows user to specify their intention by implementing a "safe_to_trace_with_real_obj" method on their script class.

Test Plan:
`pytest test/export/test_torchbind.py -k safe`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129586
Approved by: https://github.com/zou3519
2024-07-12 18:01:47 +00:00
Pian Pawakapan
988ed4d5db [export] clean up allow_complex_guards_as_runtime_asserts flag (#130596)
Summary: removes underscore, cleans up dead code in DimConstraints

Test Plan: existing export tests

Reviewed By: angelayi

Differential Revision: D59612746

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130596
Approved by: https://github.com/angelayi
2024-07-12 17:17:11 +00:00
Tarun Karuturi
ff25dfca5a Save quantization_tag in export graph serialization (#127473)
Summary: `quantization_tag` is a first class citizen metadata in quantization flows that is preserved by it. As we'll want to store the quantized exported graphs we also need to preserve this metadata as it's used in later flows. Only json supported metadata will be allowed to be serialized.

Test Plan: Added test case

Differential Revision: D57939282

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127473
Approved by: https://github.com/angelayi
2024-07-12 05:06:40 +00:00
Zhengxu Chen
726a287271 [export] Expand verifier to be multiple on ExportedProgram (#130364)
Summary: This diff updates the ExportedProgram class in PyTorch to allow for multiple verifiers to be attached to it. This is done by adding a new field to the ExportedProgram schema called "verifiers" which is a list of strings representing the names of the verifiers to be attached to the program. The verifiers are loaded using the "load_verifier" function which is defined in the "torch._export.serde.serialize" module. The "exported_program.dialect" field is also deprecated in favor of the "verifiers" field.

Test Plan: CI

Differential Revision: D59408546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130364
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2024-07-11 20:34:49 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
Pian Pawakapan
1b3b4c2fb9 [runtime asserts] deduplicate runtime asserts & CSE (#128599) (#130380)
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)

Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]

s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Test Plan:
contbuild & OSS CI, see 940e4477ab

Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D59543603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
2024-07-10 19:23:37 +00:00
Shangdi Yu
c83b941141 [export] add dynamic shapes argument and infer from graph nodes (#129928)
Fixes the example in #118304 for `torch._functorch.aot_autograd.aot_export_module` and `torch.export.export`.

On a high level, the issue is caused by not detecting fake_mode when there's no input.

Change plan:

1) we add a  `dynamic_shapes: Union[bool, None] = None` arg to `aot_export_module` and `_aot_export_function`.

2) if the input is not a graph module, then we can only rely on this `dynamic_shapes` input arg.

3) If the input is a graph module, then we can traverse the graph and check.

4) So we check if the input mod is a graph module or just a module, and do 2) or 3) depending on the type.

Fixes #129927

Bug source: dynamo's fake_mode is not detected correctly in `_convert_input_to_fake` in `_traced.py` when there’s no input to the graph). So in ` _strict_export_lower_to_aten_ir`, we create another fake_mode. `dynamo_fake_mode` is not the same as the fake_mode used by dynamo.

Change plan:
check `gm_torch_level` graph's node meta "example_value" for fake mode in addition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129928
Approved by: https://github.com/angelayi
2024-07-10 15:51:05 +00:00
PyTorch MergeBot
9c9744c3ac Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 940e4477ab.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))
2024-07-09 21:03:49 +00:00
Yidi Wu
cb4bec311a Fix nodes has more than one output users after replace_set_grad_with_hop pass (#129716)
Summary: Previously, when we inline the subgraphs that doesn't have a different require_grad environment, we didn't clean up the nodes's users in subgraph and direcly used them to  to replace the output of  the call_modules. This records dead depencies in node.users. This PR fixes this.

Test Plan:
Added a new test.

Also see the torchrec tests:
Step 1:
buck run mode/dev-nosan //aimp/experimental/pt2:pt2_export -- --model-entity-id 934687114 --output /tmp/934687114.zip --use-torchrec-eager-mp --use-manifold

Step 2:
buck run mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true aimp/cli:cli --  --platform=aps --template=disagg_gpu_aps_pt2 --pt2 --model-entity-id=934687114 non-request-only-tagging torchrec-shard-and-quantize gpu-disagg-split assign-device materialize-weights script-and-save

Differential Revision: D59132214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129716
Approved by: https://github.com/angelayi
2024-07-09 17:04:03 +00:00
Pian Pawakapan
940e4477ab [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-07 20:10:14 +00:00
PyTorch MergeBot
963f430d13 Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 0267b2ddcb.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a landrace and fails inductor/test_cudagraph_trees in trunk 0267b2ddcb ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2211690518))
2024-07-06 07:20:05 +00:00
Pian Pawakapan
0267b2ddcb [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-06 03:44:49 +00:00
Jiashen Cao
7c5f3cd049 Add explain function to TSConverter. (#129968)
Summary: The explain function does a conversion dry run to provide feedback on which operators are not supported / fail the conversion to the users.

Test Plan: * `pytest test/export/test_converter.py`

Differential Revision: D59251934

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129968
Approved by: https://github.com/angelayi
2024-07-05 18:04:29 +00:00
Zhengxu Chen
042d764872 [export] Update example inputs format for DB. (#129982)
Summary: To give user a simpler example code, we are getting rid of ExportArgs in favor of example_args and example_kwargs.

Test Plan: CI

Differential Revision: D59288920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129982
Approved by: https://github.com/angelayi
2024-07-03 17:53:15 +00:00
Boyuan Feng
3ef44df667 [ts-migration] support prim::SetAttr and fix prim::GetAttr (#129440)
- Lifting Tensor Constant attributes to buffers: TorchScript does not automatically lift tensor constant attributes to buffers. So previous converter cannot access tensor constant attributes. This PR fixed the issue.
- Add SetAttr support for tensor attributes by copy_.
- Add SetAttr support for non-tensor attributes. In particular, we maintain the current value of non-tensor attributes in `name_to_non_tensor_attribute_node`, similar to an interpreter pass on non-tensor attributes. So we can support the following use case:
```python
 def forward(self, x):
      c1 = self.count
      self.count += 1
      c2 = self.count
      return x + c1 + c2
```
- Fixed a bug in GetAttr to support the following use case:
```python
def forward(self, inp):
  x = self.buffer
  self.buffer += 1
  y = self.buffer
  return x + y + inp
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129440
Approved by: https://github.com/angelayi
2024-06-29 05:08:13 +00:00
Tugsbayasgalan Manlaibaatar
ec284d3a74 Prototype for export_for_training (#129092)
This PR implements export_for_training where the IR is not-functional, pre-dispatch aten IR. The general strategy:
1. Call dynamo to get torch IR
2. Lift param/buffer
3. call make_fx

TODO:
1. run_decomp doesn't work
2. not-strict is not supported

Differential Revision: [D59069087](https://our.internmc.facebook.com/intern/diff/D59069087)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129092
Approved by: https://github.com/zhxchen17
ghstack dependencies: #128077
2024-06-27 18:27:11 +00:00
Jiashen Cao
b6689e0fb8 [ts migration] add logging as part of torch logging system (#129405)
#### Description
Add more verbose logging of conversion process. Output which IR is being converted, which function is used to do conversion, and whether it succeeds.

#### Example
`TORCH_LOGS="+export,ts2ep_conversion" pytest test/export/test_converter.py -s -k test_prim_tolist`
```
test/export/test_converter.py I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734] TorchScript graph
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734] graph(%x.1 : Long(3, strides=[1], requires_grad=0, device=cpu)):
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]   %1 : __torch__.export.test_converter.___torch_mangle_1.Module = prim::CreateObject()
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]   %2 : int = prim::Constant[value=1](), scope: export.test_converter.Module::
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]   %3 : int = prim::Constant[value=0](), scope: export.test_converter.Module::
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]   %4 : int[] = prim::tolist(%x.1, %2, %3), scope: export.test_converter.Module::
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]   return (%4)
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]
I0624 13:19:26.416000 140608224474112 torch/_export/converter.py:734]
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert [%1 : __torch__.export.test_converter.___torch_mangle_1.Module = prim::CreateObject()]
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert using [convert_prim_CreateObject] succeeds
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert [%2 : int = prim::Constant[value=1](), scope: export.test_converter.Module::]
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert using [convert_prim_Constant] succeeds
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert [%3 : int = prim::Constant[value=0](), scope: export.test_converter.Module::]
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert using [convert_prim_Constant] succeeds
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert [%4 : int[] = prim::tolist(%x.1, %2, %3), scope: export.test_converter.Module::]
V0624 13:19:26.417000 140608224474112 torch/_export/converter.py:690] Convert using [convert_prim_tolist] succeeds
I0624 13:19:26.427000 140608224474112 torch/_export/converter.py:760] TS2EPConverter IR-to-IR conversion succeeds
```

#### Test Plan
`pytest test/export/test_converter`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129405
Approved by: https://github.com/angelayi
2024-06-27 00:20:20 +00:00
Zhengxu Chen
e58ef5b65f [export] Rewrite exportdb formatting. (#129260)
Summary: It'll be easier to generate examples if the code doesn't depend on exportdb library.

Test Plan: CI

Differential Revision: D58886554

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129260
Approved by: https://github.com/tugsbayasgalan
2024-06-25 21:04:53 +00:00
Jiashen Cao
45f2876934 [Fix] NumToTensor resulted from numel() and size() in TSCovnerter (#128761)
#### Issue
In jit.trace, torch.numel() is automatically cast to a `LongTensor`. But during conversion, we lost the casting part. `prim::NumToTensor` was previously converted to `torch.ops.aten.scalar_tensor`, which uses the same `dtype` as the input tensor instead of `LongTensor`. in this PR, we add a casting to convert it to the correct `dtype`.

#### Test Plan
We activate previously failing test case.
* `pytest test/export/test_converter.py -s -k test_implicit_constant_to_tensor_handling`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128761
Approved by: https://github.com/angelayi
2024-06-25 20:20:03 +00:00
Yidi Wu
dd00f5e78d Fixes T192448049 (#129146)
Differential Revision: D58767610

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129146
Approved by: https://github.com/angelayi
2024-06-25 17:50:15 +00:00
Zhengxu Chen
665d6ea05b [export] Fix IR canonlization. (#129401)
Summary: as title. we should unpack results from _canonicalize_graph.

Differential Revision: D58963429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129401
Approved by: https://github.com/tugsbayasgalan
2024-06-25 16:33:02 +00:00
Boyuan Feng
04a5d3228e [ts migration] Support prim::tolist and aten::len (#128894)
Support prim::tolist and aten::len. Add unit tests for prim::min.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128894
Approved by: https://github.com/angelayi
2024-06-18 19:11:07 +00:00
Jiashen Cao
316b729677 [Fix] TS converter constant to tensor (#128442)
#### Issue
Tensor constant was previously lifted directly as an input in the fx graph, which results errors for multiple test cases with tensor constant. This PR introduces a fix to convert tensor constant to a `GetAttr` in the fx graph.

This PR also introduces other fixes to maintain a valid `state_dict` for exported program when there are tensor constants. In short, after tensor constants are converted as `GetAttr`, they are treated as buffers during retracing. The fix will convert those back from buffer to constant.

#### Test Plan
Add new test cases that generate tensor constants
* `pytest test/export/test_converter.py -s -k test_implicit_constant_to_tensor_handling`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128442
Approved by: https://github.com/angelayi
2024-06-17 16:42:43 +00:00
Zhengxu Chen
bfad0aee44 [export] Preserve requires_grad for export inputs. (#128656)
Summary: Today meta['val'] on placeholder nodes doesn't preserve the consistent requires_grad information with the original inputs. Seems there's no easy way to fix this directly at proxy tensor layer. This is useful for reexporting joint graph.

Test Plan: test_preserve_requires_grad_placeholders

Differential Revision: D58555651

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128656
Approved by: https://github.com/tugsbayasgalan
2024-06-17 16:26:08 +00:00
angelayi
e9c6e8369c Torchbind call method + effects support (#128397)
Adds effect token support to torchbind method calls by allowing `with_effects` to take in `torch.ops._higher_order_ops.call_torchbind` as an input.

Here is the print from `TORCH_LOGS="aot" python test/export/test_torchbind.py -k test_compile_obj_torchbind_op`:
```python
def forward(self, arg0_1: "f32[0]", arg1_1: "f32[2]", arg2_1):
    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1266 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.cos())
    cos: "f32[2]" = torch.ops.aten.cos.default(arg1_1)
    with_effects = torch._higher_order_ops.effects.with_effects(arg0_1, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, cos);  arg0_1 = cos = None
    getitem: "f32[0]" = with_effects[0];  with_effects = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1267 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.cos() + 1)
    cos_1: "f32[2]" = torch.ops.aten.cos.default(arg1_1)
    add: "f32[2]" = torch.ops.aten.add.Tensor(cos_1, 1);  cos_1 = None
    with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, add);  getitem = add = None
    getitem_2: "f32[0]" = with_effects_1[0];  with_effects_1 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1268 in f, code: torch.ops._TorchScriptTesting.queue_pop(tq)
    with_effects_2 = torch._higher_order_ops.effects.with_effects(getitem_2, torch.ops._TorchScriptTesting.queue_pop.default, arg2_1);  getitem_2 = None
    getitem_4: "f32[0]" = with_effects_2[0];  with_effects_2 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1269 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.sin())
    sin: "f32[2]" = torch.ops.aten.sin.default(arg1_1);  arg1_1 = None
    with_effects_3 = torch._higher_order_ops.effects.with_effects(getitem_4, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, sin);  getitem_4 = sin = None
    getitem_6: "f32[0]" = with_effects_3[0];  with_effects_3 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1270 in f, code: return tq.pop(), tq.pop() + tq.size(), tq
    with_effects_4 = torch._higher_order_ops.effects.with_effects(getitem_6, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'pop');  getitem_6 = None
    getitem_8: "f32[0]" = with_effects_4[0]
    getitem_9: "f32[2]" = with_effects_4[1];  with_effects_4 = None
    with_effects_5 = torch._higher_order_ops.effects.with_effects(getitem_8, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'pop');  getitem_8 = None
    getitem_10: "f32[0]" = with_effects_5[0]
    getitem_11: "f32[2]" = with_effects_5[1];  with_effects_5 = None
    with_effects_6 = torch._higher_order_ops.effects.with_effects(getitem_10, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'size');  getitem_10 = arg2_1 = None
    getitem_12: "f32[0]" = with_effects_6[0];  with_effects_6 = None
    add_1: "f32[2]" = torch.ops.aten.add.Tensor(getitem_11, 0);  getitem_11 = None
    return (getitem_12, getitem_9, add_1)
```

In order to support this, this PR makes the following changes:
* Adds `FakeScriptObject` to `CustomObjArgument`, which will be put on the `meta["val"]` of nodes representing torchbind objects.
* Adds pickle/deepcopy support to FunctionSchema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128397
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-06-14 21:28:17 +00:00
Zhengxu Chen
be0eec9031 [export] Improve static typing in tracer. (#128552)
Summary: as title.

Test Plan: CI

Differential Revision: D58485487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128552
Approved by: https://github.com/angelayi
2024-06-14 17:57:37 +00:00
lezcano
0fdd8d84fa Do not generate -1* in SymPy expressions when canonicalising (#128411)
Partially addresses https://github.com/pytorch/pytorch/issues/128150
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128411
Approved by: https://github.com/ezyang
ghstack dependencies: #128410
2024-06-13 16:49:59 +00:00
Yidi Wu
e9b81e4edf Fakify torch bind input by default (#128454)
Summary: Try a reland of https://github.com/pytorch/pytorch/pull/127116 after some fixes landed

Differential Revision: D58418251

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128454
Approved by: https://github.com/angelayi
2024-06-13 16:25:11 +00:00
Edward Z. Yang
2229884102 Introduce int_oo (#127693)
In a previous life, we used sympy.oo to represent the lower/upper bounds of integer ranges. Later, we changed this to be sys.maxsize - 1 for a few reasons: (1) sometimes we do tests on a value being exactly sys.maxsize, and we wanted to avoid a data dependent guard in this case, (2) sympy.oo corresponds to floating point infinity, so you get incorrect types for value ranges with oo, and (3) you can do slightly better reasoning if you assume that input sizes fall within representable 64-bit integer range.

After working in the sys.maxsize regime for a bit, I've concluded that this was actually a bad idea. Specifically, the problem is that you end up with sys.maxsize in your upper bound, and then whenever you do any sort of size-increasing computation like size * 2, you end up with 2 * sys.maxsize, and you end up doing a ton of arbitrary precision int computation that is totally unnecessary. A symbolic bound is better.

But especially after #126905, we can't go back to using sympy.oo, because that advertises that it's not an integer, and now your ValueRanges is typed incorrectly. So what do we do? We define a new numeric constant `int_oo`, which is like `sympy.oo` but it advertises `is_integer`. **test/test_sympy_utils.py** describes some basic properties of the number, and **torch/utils/_sympy/numbers.py** has the actual implementation.

The rest of the changes of the PR are working out the implications of this change. I'll give more commentary as inline comments.

Fixes https://github.com/pytorch/pytorch/issues/127396

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127693
Approved by: https://github.com/lezcano
ghstack dependencies: #126905
2024-06-13 04:08:20 +00:00
Angela Yi
3bc2004f91 [ts_converter] Fix prim::dtype (#128517)
Summary: prim::dtype has the signature `(Tensor a) -> int`, where it gets the dtype of the tensor and returns the integer corresponding to this dtype based on the enum in ScalarType.h. Previously we were converting prim::dtype by returning the actual dtype of the tensor (ex. torch.float32). This causes some incorrect control flow to behavior, specifically where it checks if `prim::dtype(tensor) in [3, 5, 7]`, where [3, 5, 7] correspond to torch.int32, torch.float16, torch.float64. This control flow would always returns False because we would be comparing torch.float32 against the integers [3, 5, 7], which is a type mismatch.

Test Plan: 7/22 internal models now are convertable and runnable in eager and sigmoid! P1410243909

Reviewed By: jiashenC

Differential Revision: D58469232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128517
Approved by: https://github.com/jiashenC
2024-06-12 23:02:50 +00:00
Zhengxu Chen
0444e89931 [export] Remove replace_sym_size_ops_pass (#128443)
Summary: Not needed anymore.

Test Plan: CI

Differential Revision: D58429458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128443
Approved by: https://github.com/angelayi
2024-06-12 21:03:06 +00:00
PyTorch MergeBot
5d8c7f39d4 Revert "Introduce int_oo (#127693)"
This reverts commit 9cab5987bd.

Reverted https://github.com/pytorch/pytorch/pull/127693 on behalf of https://github.com/clee2000 due to sorry executorch CI is a bit weird regarding pins, I'll make a chat with mergen with the choices of what to do and how it'll affect executorch CI, reverting for now to prevent more divergences in the meantime ([comment](https://github.com/pytorch/pytorch/pull/127693#issuecomment-2161775400))
2024-06-11 23:36:08 +00:00
Jiashen Cao
739aa224ec [Fix] Parameter un/lifting issues in the TorchScript to ExportedProgram converter (#127975)
This PR fixes issues related to parameters and inputs lifting in the converter.

#### Issue 1
```
> Graph[linear.weights, bias.weights, x.1]
%1 ...
%2 ...
%3 = CreateObject()

	> Block 0[]
        %linear.0 = GetAttr(linear)[%3]

	             > Block 0.0[]
	             %weight.0 = GetAttr(weights)[%linear.0]

	> Block 1[]
	...
```
* Model parameters for the top level module should be unlifted, while parameters from sub-blocks should be lifted.
#### Fixes
* Bottom-up traversal (i.e., start from the inner most block) to figure out which parameters to be lifted for sub-blocks.

#### Test Plan
* Add test cases for nested block without control flow `pytest test/export/test_converter.py -s -k test_convert_nn_module_with_nested_param`
* Add test cases for nested block with control flow `pytest test/export/test_converter.py -s -k test_convert_nn_module_with_nested_if_and_param`

#### Outcome
##### TorchScript
```
graph(%x.1 : Float(3, strides=[1], requires_grad=0, device=cpu),
      %m1.m1.linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %m1.m1.linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu),
      %m1.linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %m1.linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu),
      %m1.m2.linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %m1.m2.linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu),
      %linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu),
      %m2.m1.linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %m2.m1.linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu),
      %m2.linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %m2.linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu),
      %m2.m2.linear.weight : Float(3, 3, strides=[3, 1], requires_grad=0, device=cpu),
      %m2.m2.linear.bias : Float(3, strides=[1], requires_grad=0, device=cpu)):
  %15 : __torch__.export.test_converter.___torch_mangle_14.SuperNestedM1 = prim::CreateObject()
  %16 : NoneType = prim::Constant(), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
  %17 : int = prim::Constant[value=1](), scope: export.test_converter.SuperNestedM1:: # /data/users/jiashenc/pytorch/test/export/test_converter.py:342:34
  %18 : Tensor = aten::max(%x.1), scope: export.test_converter.SuperNestedM1:: # /data/users/jiashenc/pytorch/test/export/test_converter.py:342:19
  %19 : Tensor = aten::gt(%18, %17), scope: export.test_converter.SuperNestedM1:: # /data/users/jiashenc/pytorch/test/export/test_converter.py:342:19
  %20 : bool = aten::Bool(%19), scope: export.test_converter.SuperNestedM1:: # /data/users/jiashenc/pytorch/test/export/test_converter.py:342:19
  %21 : Tensor = prim::If(%20), scope: export.test_converter.SuperNestedM1:: # /data/users/jiashenc/pytorch/test/export/test_converter.py:342:16
    block0():
      %linear.6 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%15), scope: export.test_converter.SuperNestedM1::
      %m1.1 : __torch__.export.test_converter.___torch_mangle_15.NestedM = prim::GetAttr[name="m1"](%15), scope: export.test_converter.SuperNestedM1::
      %24 : Tensor = aten::sum(%x.1, %16), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:19
      %25 : Tensor = aten::gt(%24, %17), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:19
      %26 : bool = aten::Bool(%25), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:19
      %27 : Tensor = prim::If(%26), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:16
        block0():
          %linear.10 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m1.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %m1.3 : __torch__.export.test_converter.___torch_mangle_16.M = prim::GetAttr[name="m1"](%m1.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %linear.12 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m1.3), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %weight.4 : Tensor = prim::GetAttr[name="weight"](%linear.12), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %bias.4 : Tensor = prim::GetAttr[name="bias"](%linear.12), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %33 : Tensor = aten::linear(%x.1, %weight.4, %bias.4), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          %weight.6 : Tensor = prim::GetAttr[name="weight"](%linear.10), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %bias.6 : Tensor = prim::GetAttr[name="bias"](%linear.10), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %36 : Tensor = aten::linear(%33, %weight.6, %bias.6), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          -> (%36)
        block1():
          %linear.14 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m1.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %m2.3 : __torch__.export.test_converter.___torch_mangle_16.M = prim::GetAttr[name="m2"](%m1.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %linear.16 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m2.3), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %weight.8 : Tensor = prim::GetAttr[name="weight"](%linear.16), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %bias.8 : Tensor = prim::GetAttr[name="bias"](%linear.16), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %42 : Tensor = aten::linear(%x.1, %weight.8, %bias.8), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          %weight.2 : Tensor = prim::GetAttr[name="weight"](%linear.14), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %bias.2 : Tensor = prim::GetAttr[name="bias"](%linear.14), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1
          %45 : Tensor = aten::linear(%42, %weight.2, %bias.2), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m1 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          -> (%45)
      %weight.10 : Tensor = prim::GetAttr[name="weight"](%linear.6), scope: export.test_converter.SuperNestedM1::/torch.nn.modules.linear.Linear::linear
      %bias.10 : Tensor = prim::GetAttr[name="bias"](%linear.6), scope: export.test_converter.SuperNestedM1::/torch.nn.modules.linear.Linear::linear
      %48 : Tensor = aten::linear(%27, %weight.10, %bias.10), scope: export.test_converter.SuperNestedM1::/torch.nn.modules.linear.Linear::linear # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
      -> (%48)
    block1():
      %linear.8 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%15), scope: export.test_converter.SuperNestedM1::
      %m2.1 : __torch__.export.test_converter.___torch_mangle_15.NestedM = prim::GetAttr[name="m2"](%15), scope: export.test_converter.SuperNestedM1::
      %51 : Tensor = aten::sum(%x.1, %16), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:19
      %52 : Tensor = aten::gt(%51, %17), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:19
      %53 : bool = aten::Bool(%52), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:19
      %54 : Tensor = prim::If(%53), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/test/export/test_converter.py:327:16
        block0():
          %linear.1 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m2.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %m1 : __torch__.export.test_converter.___torch_mangle_16.M = prim::GetAttr[name="m1"](%m2.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %linear.5 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %weight.1 : Tensor = prim::GetAttr[name="weight"](%linear.5), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %bias.1 : Tensor = prim::GetAttr[name="bias"](%linear.5), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %60 : Tensor = aten::linear(%x.1, %weight.1, %bias.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          %weight.3 : Tensor = prim::GetAttr[name="weight"](%linear.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %bias.3 : Tensor = prim::GetAttr[name="bias"](%linear.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %63 : Tensor = aten::linear(%60, %weight.3, %bias.3), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          -> (%63)
        block1():
          %linear.3 : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m2.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %m2 : __torch__.export.test_converter.___torch_mangle_16.M = prim::GetAttr[name="m2"](%m2.1), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %linear : __torch__.torch.nn.modules.linear.___torch_mangle_17.Linear = prim::GetAttr[name="linear"](%m2), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %weight.5 : Tensor = prim::GetAttr[name="weight"](%linear), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %bias.5 : Tensor = prim::GetAttr[name="bias"](%linear), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %69 : Tensor = aten::linear(%x.1, %weight.5, %bias.5), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          %weight.12 : Tensor = prim::GetAttr[name="weight"](%linear.3), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %bias.12 : Tensor = prim::GetAttr[name="bias"](%linear.3), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2
          %72 : Tensor = aten::linear(%69, %weight.12, %bias.12), scope: export.test_converter.SuperNestedM1::/export.test_converter.NestedM::m2 # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
          -> (%72)
      %weight : Tensor = prim::GetAttr[name="weight"](%linear.8), scope: export.test_converter.SuperNestedM1::/torch.nn.modules.linear.Linear::linear
      %bias : Tensor = prim::GetAttr[name="bias"](%linear.8), scope: export.test_converter.SuperNestedM1::/torch.nn.modules.linear.Linear::linear
      %75 : Tensor = aten::linear(%54, %weight, %bias), scope: export.test_converter.SuperNestedM1::/torch.nn.modules.linear.Linear::linear # /data/users/jiashenc/pytorch/torch/nn/modules/linear.py:116:15
      -> (%75)
  return (%21)
```
##### ExportedProgram
```
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_linear_weight: "f32[3, 3]", p_linear_bias: "f32[3]", p_m1_linear_weight: "f32[3, 3]", p_m1_linear_bias: "f32[3]", p_m1_m1_linear_weight: "f32[3, 3]", p_m1_m1_linear_bias: "f32[3]", p_m1_m2_linear_weight: "f32[3, 3]", p_m1_m2_linear_bias: "f32[3]", p_m2_linear_weight: "f32[3, 3]", p_m2_linear_bias: "f32[3]", p_m2_m1_linear_weight: "f32[3, 3]", p_m2_m1_linear_bias: "f32[3]", p_m2_m2_linear_weight: "f32[3, 3]", p_m2_m2_linear_bias: "f32[3]", x_1: "f32[3]"):
            # No stacktrace found for following nodes
            max_1: "f32[]" = torch.ops.aten.max.default(x_1)
            gt: "b8[]" = torch.ops.aten.gt.Scalar(max_1, 1);  max_1 = None

            # File: <eval_with_key>.137:23 in forward, code: cond = torch.ops.higher_order.cond(l_args_0_, cond_true_2, cond_false_2, [l_args_3_0_, l_args_3_13_, l_args_3_5_, l_args_3_12_, l_args_3_14_, l_args_3_1_, l_args_3_3_, l_args_3_4_, l_args_3_7_, l_args_3_10_, l_args_3_11_, l_args_3_2_, l_args_3_6_, l_args_3_8_, l_args_3_9_]);  l_args_0_ = cond_true_2 = cond_false_2 = l_args_3_0_ = l_args_3_13_ = l_args_3_5_ = l_args_3_12_ = l_args_3_14_ = l_args_3_1_ = l_args_3_3_ = l_args_3_4_ = l_args_3_7_ = l_args_3_10_ = l_args_3_11_ = l_args_3_2_ = l_args_3_6_ = l_args_3_8_ = l_args_3_9_ = None
            true_graph_0 = self.true_graph_0
            false_graph_0 = self.false_graph_0
            conditional = torch.ops.higher_order.cond(gt, true_graph_0, false_graph_0, [p_linear_weight, p_linear_bias, x_1, p_m1_linear_weight, p_m1_m1_linear_bias, p_m1_linear_bias, p_m1_m2_linear_weight, p_m1_m2_linear_bias, p_m1_m1_linear_weight, p_m2_m2_linear_bias, p_m2_m1_linear_weight, p_m2_linear_weight, p_m2_m1_linear_bias, p_m2_m2_linear_weight, p_m2_linear_bias]);  gt = true_graph_0 = false_graph_0 = p_linear_weight = p_linear_bias = x_1 = p_m1_linear_weight = p_m1_m1_linear_bias = p_m1_linear_bias = p_m1_m2_linear_weight = p_m1_m2_linear_bias = p_m1_m1_linear_weight = p_m2_m2_linear_bias = p_m2_m1_linear_weight = p_m2_linear_weight = p_m2_m1_linear_bias = p_m2_m2_linear_weight = p_m2_linear_bias = None
            getitem: "f32[3]" = conditional[0];  conditional = None
            return (getitem,)

        class <lambda>(torch.nn.Module):
            def forward(self, p_linear_weight: "f32[3, 3]", p_linear_bias: "f32[3]", x_1: "f32[3]", p_m1_linear_weight: "f32[3, 3]", p_m1_m1_linear_bias: "f32[3]", p_m1_linear_bias: "f32[3]", p_m1_m2_linear_weight: "f32[3, 3]", p_m1_m2_linear_bias: "f32[3]", p_m1_m1_linear_weight: "f32[3, 3]", p_m2_m2_linear_bias: "f32[3]", p_m2_m1_linear_weight: "f32[3, 3]", p_m2_linear_weight: "f32[3, 3]", p_m2_m1_linear_bias: "f32[3]", p_m2_m2_linear_weight: "f32[3, 3]", p_m2_linear_bias: "f32[3]"):
                # File: <eval_with_key>.134:8 in forward, code: sum_default = torch.ops.aten.sum.default(l_args_3_5__1, dtype = None)
                sum_1: "f32[]" = torch.ops.aten.sum.default(x_1)

                # File: <eval_with_key>.134:9 in forward, code: gt_scalar = torch.ops.aten.gt.Scalar(sum_default, 1);  sum_default = None
                gt: "b8[]" = torch.ops.aten.gt.Scalar(sum_1, 1);  sum_1 = None

                # File: <eval_with_key>.134:12 in forward, code: cond = torch.ops.higher_order.cond(gt_scalar, cond_true_0, cond_false_0, [l_args_3_12__true_branch, l_args_3_1__true_branch, l_args_3_5__1, l_args_3_14__true_branch, l_args_3_7__true_branch, l_args_3_3__true_branch, l_args_3_4__true_branch]);  gt_scalar = cond_true_0 = cond_false_0 = l_args_3_12__true_branch = l_args_3_1__true_branch = l_args_3_5__1 = l_args_3_14__true_branch = l_args_3_7__true_branch = l_args_3_3__true_branch = l_args_3_4__true_branch = None
                true_graph_0 = self.true_graph_0
                false_graph_0 = self.false_graph_0
                conditional = torch.ops.higher_order.cond(gt, true_graph_0, false_graph_0, [p_m1_linear_weight, p_m1_linear_bias, x_1, p_m1_m1_linear_bias, p_m1_m1_linear_weight, p_m1_m2_linear_weight, p_m1_m2_linear_bias]);  gt = true_graph_0 = false_graph_0 = p_m1_linear_weight = p_m1_linear_bias = x_1 = p_m1_m1_linear_bias = p_m1_m1_linear_weight = p_m1_m2_linear_weight = p_m1_m2_linear_bias = None
                getitem: "f32[3]" = conditional[0];  conditional = None

                # File: <eval_with_key>.134:14 in forward, code: linear_default = torch.ops.aten.linear.default(getitem, l_args_3_0__1, l_args_3_13__1);  getitem = l_args_3_0__1 = l_args_3_13__1 = None
                linear: "f32[3]" = torch.ops.aten.linear.default(getitem, p_linear_weight, p_linear_bias);  getitem = p_linear_weight = p_linear_bias = None
                return (linear,)

            class <lambda>(torch.nn.Module):
                def forward(self, p_m1_linear_weight: "f32[3, 3]", p_m1_linear_bias: "f32[3]", x_1: "f32[3]", p_m1_m1_linear_bias: "f32[3]", p_m1_m1_linear_weight: "f32[3, 3]", p_m1_m2_linear_weight: "f32[3, 3]", p_m1_m2_linear_bias: "f32[3]"):
                    # File: <eval_with_key>.130:8 in forward, code: linear_default = torch.ops.aten.linear.default(l_args_3_5__1, l_args_3_7__true_branch, l_args_3_14__true_branch);  l_args_3_5__1 = l_args_3_7__true_branch = l_args_3_14__true_branch = None
                    linear: "f32[3]" = torch.ops.aten.linear.default(x_1, p_m1_m1_linear_weight, p_m1_m1_linear_bias);  x_1 = p_m1_m1_linear_weight = p_m1_m1_linear_bias = None

                    # File: <eval_with_key>.130:9 in forward, code: linear_default_1 = torch.ops.aten.linear.default(linear_default, l_args_3_12__1, l_args_3_1__1);  linear_default = l_args_3_12__1 = l_args_3_1__1 = None
                    linear_1: "f32[3]" = torch.ops.aten.linear.default(linear, p_m1_linear_weight, p_m1_linear_bias);  linear = p_m1_linear_weight = p_m1_linear_bias = None
                    return (linear_1,)

            class <lambda>(torch.nn.Module):
                def forward(self, p_m1_linear_weight: "f32[3, 3]", p_m1_linear_bias: "f32[3]", x_1: "f32[3]", p_m1_m1_linear_bias: "f32[3]", p_m1_m1_linear_weight: "f32[3, 3]", p_m1_m2_linear_weight: "f32[3, 3]", p_m1_m2_linear_bias: "f32[3]"):
                    # File: <eval_with_key>.131:8 in forward, code: linear_default = torch.ops.aten.linear.default(l_args_3_5__1, l_args_3_3__false_branch, l_args_3_4__false_branch);  l_args_3_5__1 = l_args_3_3__false_branch = l_args_3_4__false_branch = None
                    linear: "f32[3]" = torch.ops.aten.linear.default(x_1, p_m1_m2_linear_weight, p_m1_m2_linear_bias);  x_1 = p_m1_m2_linear_weight = p_m1_m2_linear_bias = None

                    # File: <eval_with_key>.131:9 in forward, code: linear_default_1 = torch.ops.aten.linear.default(linear_default, l_args_3_12__1, l_args_3_1__1);  linear_default = l_args_3_12__1 = l_args_3_1__1 = None
                    linear_1: "f32[3]" = torch.ops.aten.linear.default(linear, p_m1_linear_weight, p_m1_linear_bias);  linear = p_m1_linear_weight = p_m1_linear_bias = None
                    return (linear_1,)

        class <lambda>(torch.nn.Module):
            def forward(self, p_linear_weight: "f32[3, 3]", p_linear_bias: "f32[3]", x_1: "f32[3]", p_m1_linear_weight: "f32[3, 3]", p_m1_m1_linear_bias: "f32[3]", p_m1_linear_bias: "f32[3]", p_m1_m2_linear_weight: "f32[3, 3]", p_m1_m2_linear_bias: "f32[3]", p_m1_m1_linear_weight: "f32[3, 3]", p_m2_m2_linear_bias: "f32[3]", p_m2_m1_linear_weight: "f32[3, 3]", p_m2_linear_weight: "f32[3, 3]", p_m2_m1_linear_bias: "f32[3]", p_m2_m2_linear_weight: "f32[3, 3]", p_m2_linear_bias: "f32[3]"):
                # File: <eval_with_key>.135:8 in forward, code: sum_default = torch.ops.aten.sum.default(l_args_3_5__1, dtype = None)
                sum_1: "f32[]" = torch.ops.aten.sum.default(x_1)

                # File: <eval_with_key>.135:9 in forward, code: gt_scalar = torch.ops.aten.gt.Scalar(sum_default, 1);  sum_default = None
                gt: "b8[]" = torch.ops.aten.gt.Scalar(sum_1, 1);  sum_1 = None

                # File: <eval_with_key>.135:12 in forward, code: cond = torch.ops.higher_order.cond(gt_scalar, cond_true_1, cond_false_1, [l_args_3_2__false_branch, l_args_3_5__1, l_args_3_9__false_branch, l_args_3_11__false_branch, l_args_3_6__false_branch, l_args_3_10__false_branch, l_args_3_8__false_branch]);  gt_scalar = cond_true_1 = cond_false_1 = l_args_3_2__false_branch = l_args_3_5__1 = l_args_3_9__false_branch = l_args_3_11__false_branch = l_args_3_6__false_branch = l_args_3_10__false_branch = l_args_3_8__false_branch = None
                true_graph_0 = self.true_graph_0
                false_graph_0 = self.false_graph_0
                conditional = torch.ops.higher_order.cond(gt, true_graph_0, false_graph_0, [p_m2_linear_weight, x_1, p_m2_linear_bias, p_m2_m1_linear_weight, p_m2_m1_linear_bias, p_m2_m2_linear_bias, p_m2_m2_linear_weight]);  gt = true_graph_0 = false_graph_0 = p_m2_linear_weight = x_1 = p_m2_linear_bias = p_m2_m1_linear_weight = p_m2_m1_linear_bias = p_m2_m2_linear_bias = p_m2_m2_linear_weight = None
                getitem: "f32[3]" = conditional[0];  conditional = None

                # File: <eval_with_key>.135:14 in forward, code: linear_default = torch.ops.aten.linear.default(getitem, l_args_3_0__1, l_args_3_13__1);  getitem = l_args_3_0__1 = l_args_3_13__1 = None
                linear: "f32[3]" = torch.ops.aten.linear.default(getitem, p_linear_weight, p_linear_bias);  getitem = p_linear_weight = p_linear_bias = None
                return (linear,)

            class <lambda>(torch.nn.Module):
                def forward(self, p_m2_linear_weight: "f32[3, 3]", x_1: "f32[3]", p_m2_linear_bias: "f32[3]", p_m2_m1_linear_weight: "f32[3, 3]", p_m2_m1_linear_bias: "f32[3]", p_m2_m2_linear_bias: "f32[3]", p_m2_m2_linear_weight: "f32[3, 3]"):
                    # File: <eval_with_key>.132:8 in forward, code: linear_default = torch.ops.aten.linear.default(l_args_3_5__1, l_args_3_11__true_branch, l_args_3_6__true_branch);  l_args_3_5__1 = l_args_3_11__true_branch = l_args_3_6__true_branch = None
                    linear: "f32[3]" = torch.ops.aten.linear.default(x_1, p_m2_m1_linear_weight, p_m2_m1_linear_bias);  x_1 = p_m2_m1_linear_weight = p_m2_m1_linear_bias = None

                    # File: <eval_with_key>.132:9 in forward, code: linear_default_1 = torch.ops.aten.linear.default(linear_default, l_args_3_2__1, l_args_3_9__1);  linear_default = l_args_3_2__1 = l_args_3_9__1 = None
                    linear_1: "f32[3]" = torch.ops.aten.linear.default(linear, p_m2_linear_weight, p_m2_linear_bias);  linear = p_m2_linear_weight = p_m2_linear_bias = None
                    return (linear_1,)

            class <lambda>(torch.nn.Module):
                def forward(self, p_m2_linear_weight: "f32[3, 3]", x_1: "f32[3]", p_m2_linear_bias: "f32[3]", p_m2_m1_linear_weight: "f32[3, 3]", p_m2_m1_linear_bias: "f32[3]", p_m2_m2_linear_bias: "f32[3]", p_m2_m2_linear_weight: "f32[3, 3]"):
                    # File: <eval_with_key>.133:8 in forward, code: linear_default = torch.ops.aten.linear.default(l_args_3_5__1, l_args_3_8__false_branch, l_args_3_10__false_branch);  l_args_3_5__1 = l_args_3_8__false_branch = l_args_3_10__false_branch = None
                    linear: "f32[3]" = torch.ops.aten.linear.default(x_1, p_m2_m2_linear_weight, p_m2_m2_linear_bias);  x_1 = p_m2_m2_linear_weight = p_m2_m2_linear_bias = None

                    # File: <eval_with_key>.133:9 in forward, code: linear_default_1 = torch.ops.aten.linear.default(linear_default, l_args_3_2__1, l_args_3_9__1);  linear_default = l_args_3_2__1 = l_args_3_9__1 = None
                    linear_1: "f32[3]" = torch.ops.aten.linear.default(linear, p_m2_linear_weight, p_m2_linear_bias);  linear = p_m2_linear_weight = p_m2_linear_bias = None
                    return (linear_1,)

Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_linear_weight'), target='linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_linear_bias'), target='linear.bias', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m1_linear_weight'), target='m1.linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m1_linear_bias'), target='m1.linear.bias', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m1_m1_linear_weight'), target='m1.m1.linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m1_m1_linear_bias'), target='m1.m1.linear.bias', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m1_m2_linear_weight'), target='m1.m2.linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m1_m2_linear_bias'), target='m1.m2.linear.bias', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m2_linear_weight'), target='m2.linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m2_linear_bias'), target='m2.linear.bias', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m2_m1_linear_weight'), target='m2.m1.linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m2_m1_linear_bias'), target='m2.m1.linear.bias', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m2_m2_linear_weight'), target='m2.m2.linear.weight', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_m2_m2_linear_bias'), target='m2.m2.linear.bias', persistent=None), InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='x_1'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='getitem'), target=None)])
Range constraints: {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127975
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2024-06-10 23:24:16 +00:00
Aaron Orenstein
946f554c8f Flip default value for mypy disallow_untyped_defs [10+1/11] (#128293)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128293
Approved by: https://github.com/oulgen
2024-06-10 19:32:44 +00:00
Edward Z. Yang
9cab5987bd Introduce int_oo (#127693)
In a previous life, we used sympy.oo to represent the lower/upper bounds of integer ranges. Later, we changed this to be sys.maxsize - 1 for a few reasons: (1) sometimes we do tests on a value being exactly sys.maxsize, and we wanted to avoid a data dependent guard in this case, (2) sympy.oo corresponds to floating point infinity, so you get incorrect types for value ranges with oo, and (3) you can do slightly better reasoning if you assume that input sizes fall within representable 64-bit integer range.

After working in the sys.maxsize regime for a bit, I've concluded that this was actually a bad idea. Specifically, the problem is that you end up with sys.maxsize in your upper bound, and then whenever you do any sort of size-increasing computation like size * 2, you end up with 2 * sys.maxsize, and you end up doing a ton of arbitrary precision int computation that is totally unnecessary. A symbolic bound is better.

But especially after #126905, we can't go back to using sympy.oo, because that advertises that it's not an integer, and now your ValueRanges is typed incorrectly. So what do we do? We define a new numeric constant `int_oo`, which is like `sympy.oo` but it advertises `is_integer`. **test/test_sympy_utils.py** describes some basic properties of the number, and **torch/utils/_sympy/numbers.py** has the actual implementation.

The rest of the changes of the PR are working out the implications of this change. I'll give more commentary as inline comments.

Fixes https://github.com/pytorch/pytorch/issues/127396

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127693
Approved by: https://github.com/lezcano
ghstack dependencies: #126905
2024-06-10 19:09:53 +00:00
Edward Z. Yang
3964a3ec73 Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53 beyond what first coercing the integer to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

**Reland notes.** This requires this internal fbcode diff https://www.internalfb.com/phabricator/paste/view/P1403322587 but I cannot prepare the diff codev due to https://fb.workplace.com/groups/osssupport/posts/26343544518600814/

It also requires this Executorch PR https://github.com/pytorch/executorch/pull/3911 but the ET PR can be landed prior to this landing.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-09 06:20:25 +00:00
Aaron Orenstein
ea614fb2b1 Flip default value for mypy disallow_untyped_defs [2/11] (#127839)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127839
Approved by: https://github.com/oulgen
2024-06-08 18:23:08 +00:00
Aaron Orenstein
dcfa7702c3 Flip default value for mypy disallow_untyped_defs [1/11] (#127838)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127838
Approved by: https://github.com/oulgen
2024-06-08 18:16:33 +00:00
PyTorch MergeBot
ac51f782fe Revert "Complete revamp of float/promotion sympy handling (#126905)"
This reverts commit 2f7cfecd86.

Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/atalman due to Sorry need to revert - failing internally ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2155118778))
2024-06-07 16:01:46 +00:00
Jiashen Cao
56a3d276fe Handle custom op during TorchScript to ExportedProgram conversion (#127580)
#### Description
Handle custom ops during TorchScript to ExportedProgram covnersion
```python
torch.library.define(
    "mylib::foo",
    "(Tensor x) -> Tensor",
    lib=lib,
)

# PyTorch custorm op implementation
@torch.library.impl(
    "mylib::foo",
    "CompositeExplicitAutograd",
    lib=lib,
)
def foo_impl(x):
    return x + x

# Meta function of the custom op.
@torch.library.impl_abstract(
    "mylib::foo",
    lib=lib,
)
def foo_meta(x):
    return x + x

class M(torch.nn.Module):
    def forward(self, x):
        return torch.ops.mylib.foo(x)
```

#### Test Plan
* Add a test case where custom op is called and converted. `pytest test/export/test_converter.py -s -k test_ts2ep_converter_custom_op`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127580
Approved by: https://github.com/angelayi
2024-06-06 22:06:51 +00:00
Jiashen Cao
cd42b95047 Handle aten::__contains__ during TorchScript to ExportedProgram conversion (#127544)
#### Description
Add support for converting `prim::__contains__` from TorchScript IR to ExportedProgram, e.g.,
```python
class MIn(torch.nn.Module):
    def forward(self, x: torch.Tensor):
        return x.dtype in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```
#### Test Plan
* Add test cases to cover both contains IR resulted from primitive types or Tensor. `pytest test/export/test_converter.py -s -k test_ts2ep_converter_contains`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127544
Approved by: https://github.com/angelayi
2024-06-06 05:00:13 +00:00
Edward Z. Yang
2f7cfecd86 Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53 beyond what first coercing the integer to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-06 02:29:45 +00:00
Jiashen Cao
4f9fcd7156 Handle unpacking during TorchScript to ExportedProgram conversion (#127419)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127419
Approved by: https://github.com/angelayi
2024-06-05 15:27:13 +00:00
PyTorch MergeBot
d5cb5d623a Revert "Complete revamp of float/promotion sympy handling (#126905)"
This reverts commit fb696ef3aa.

Reverted https://github.com/pytorch/pytorch/pull/126905 on behalf of https://github.com/ezyang due to internal user reported ceiling equality simplification problem, I have a plan ([comment](https://github.com/pytorch/pytorch/pull/126905#issuecomment-2148805840))
2024-06-05 03:57:58 +00:00
Jiashen Cao
f4b05ce683 Add registry for TorchScript to ExportedProgram conversion (#127464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127464
Approved by: https://github.com/ydwu4, https://github.com/angelayi
2024-06-04 22:53:00 +00:00
PyTorch MergeBot
4c074a9b8b Revert "[torchbind] always fakify script object by default in non-strict export (#127116)"
This reverts commit c27882ffa8.

Reverted https://github.com/pytorch/pytorch/pull/127116 on behalf of https://github.com/atalman due to Failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/127116#issuecomment-2147459339))
2024-06-04 12:53:19 +00:00
Edward Z. Yang
fb696ef3aa Complete revamp of float/promotion sympy handling (#126905)
At a high level, the idea behind this PR is:

* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.

The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:

* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows for us to eventually generate accurate code for Python semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53 beyond what first coercing the integer to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)

In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations.  Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.

We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:

* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`

These changes have consequences. First, we need to make some administrative changes:

* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
  * In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
  * TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency this to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed precision equality is currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.

In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:

* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type

The new asserts uncovered necessary bug fixes:

* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol manually allocated in Inductor is marked `is_integer` so it's accepted to build expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1

Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
2024-06-04 11:47:32 +00:00
Boyuan Feng
2ad0e4197d [ts-migration] support aten::__is__, aten::__isnot__, aten::__not__, profiler::_record_function_enter_new, profiler::_record_function_exit (#127656)
Support more ops in ts converter and add unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127656
Approved by: https://github.com/SherlockNoMad
2024-06-04 04:51:29 +00:00
Yidi Wu
c27882ffa8 [torchbind] always fakify script object by default in non-strict export (#127116)
This diff can be risky for internal tests: any torchbind class that hasn't registered a fake class will fail and we should fix them. We've gained some confidence that this can work e2e by implementing FakeTensorQueue for TBE models in sigmoid with [D54210823](https://www.internalfb.com/diff/D54210823).

Differential Revision: [D57991002](https://our.internmc.facebook.com/intern/diff/D57991002)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127116
Approved by: https://github.com/zou3519
ghstack dependencies: #127113, #127114
2024-06-03 21:38:57 +00:00
angelayi
4d32de14b6 [export] Handle serializing duplicate getitem nodes (#127633)
We ran into a graph that looks something like the following, where we have 2 getitem calls to the same index (%getitem, %getitem_2 both query topk[0]):
```
graph():
    %x : [num_users=1] = placeholder[target=x]
    %topk : [num_users=3] = call_function[target=torch.ops.aten.topk.default](args = (%x, 2), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%topk, 0), kwargs = {})
    %getitem_1 : [num_users=1] = call_function[target=operator.getitem](args = (%topk, 1), kwargs = {})
    %getitem_2 : [num_users=1] = call_function[target=operator.getitem](args = (%topk, 0), kwargs = {})
    %mul_tensor : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%getitem, %getitem_2), kwargs = {})
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul_tensor, 2), kwargs = {})
    return (mul, getitem_1)
```

The duplicate getitem call gets created during a pass.. so there are a couple of solutions:

1. Change serializer to support the case of duplicate getitem calls
2. Change the pass so that it doesn’t produce duplicate getitem calls
3. Add a pass which dedups the getitem calls

As a framework, we should do 1 and 3 (through a CSE pass).

This PR implements solution 1. However, the serializer currently does some special handling for getitem nodes -- instead of directly serializing the getitem nodes, we serialize the output of the node that outputting a list of tensors (the %topk node in this example) into a list nodes for each output ([%getitem, %getitem_1]). This fails when we have duplicate getitem nodes to the same index (%getitem_2), since we do not record that duplicate getitem node anywhere. So, the solution this PR takes is that the serializer will deduplicate the getitem nodes (%getitem_2 will be replaced with %getitem). This would result in a sematically correct graph, but not necessarily node-to-node identical as the original fx graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127633
Approved by: https://github.com/ydwu4
2024-06-03 17:25:51 +00:00
Boyuan Feng
2cef2fc2b4 [ts migration] support aten::dim, aten::len, aten::__getitem__ (#127593)
- Add support for aten::dim, aten::len, aten::__getitem__ for torchscript to export converter.
- Add unit tests
Co-authored-by: cyy <cyyever@outlook.com>
Co-authored-by: Menglu Yu <mengluy@meta.com>
Co-authored-by: Animesh Jain <anijain@umich.edu>
Co-authored-by: Simon Fan <xmfan@meta.com>
Co-authored-by: Zain Rizvi <ZainR@meta.com>
Co-authored-by: Tugsbayasgalan (Tugsuu) Manlaibaatar <tmanlaibaatar@meta.com>
Co-authored-by: titaiwangms <titaiwang@microsoft.com>
Co-authored-by: Yueming Hao <yhao@meta.com>
Co-authored-by: IvanKobzarev <ivan.kobzarev@gmail.com>
Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Co-authored-by: Bin Bao <binbao@meta.com>
Co-authored-by: Feny Patel <fenypatel@meta.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: xinan.lin <xinan.lin@intel.com>
Co-authored-by: Zain Huda <zainhuda@meta.com>
Co-authored-by: Chien-Chin Huang <chienchin@fb.com>
Co-authored-by: Wei Wang <weiwan@nvidia.com>
Co-authored-by: Jason Ansel <jansel@meta.com>
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Iris Z <31293777+wz337@users.noreply.github.com>
Co-authored-by: Wang, Eikan <eikan.wang@intel.com>
Co-authored-by: angelayi <yiangela7@gmail.com>
Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
Co-authored-by: Yanbo Liang <ybliang8@gmail.com>
Co-authored-by: Catherine Lee <csl@fb.com>
Co-authored-by: Kwanghoon An <kwanghoon@meta.com>
Co-authored-by: Brian Hirsh <hirsheybar@fb.com>
Co-authored-by: Robert Mast <rmast@live.nl>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127593
Approved by: https://github.com/SherlockNoMad, https://github.com/malfet
2024-06-02 00:36:33 +00:00
angelayi
b2f5fd8efb [ts_converter] Basic support for prim::If conversion (#127336)
Script module:
```
graph(%self : __torch__.M,
      %x.1 : Tensor,
      %y.1 : Tensor):
  %11 : int = prim::Constant[value=1]()
  %5 : bool = aten::Bool(%x.1) # /data/users/angelayi/pytorch2/test/export/test_converter.py:27:19
  %21 : Tensor = prim::If(%5) # /data/users/angelayi/pytorch2/test/export/test_converter.py:27:16
    block0():
      %8 : Tensor = aten::mul(%y.1, %y.1) # /data/users/angelayi/pytorch2/test/export/test_converter.py:28:27
      -> (%8)
    block1():
      %12 : Tensor = aten::add(%y.1, %y.1, %11) # /data/users/angelayi/pytorch2/test/export/test_converter.py:30:27
      -> (%12)
  return (%21)
```
ExportedProgram:
```
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, x_1: "b8[]", y_1: "i64[]"):
            # File: <eval_with_key>.23:9 in forward, code: cond = torch.ops.higher_order.cond(l_args_0_, cond_true_0, cond_false_0, [l_args_3_0_]);  l_args_0_ = cond_true_0 = cond_false_0 = l_args_3_0_ = None
            true_graph_0 = self.true_graph_0
            false_graph_0 = self.false_graph_0
            conditional = torch.ops.higher_order.cond(x_1, true_graph_0, false_graph_0, [y_1]);  x_1 = true_graph_0 = false_graph_0 = y_1 = None
            return (conditional,)

        class <lambda>(torch.nn.Module):
            def forward(self, y_1: "i64[]"):
                # File: <eval_with_key>.20:6 in forward, code: mul_tensor = torch.ops.aten.mul.Tensor(l_args_3_0__1, l_args_3_0__1);  l_args_3_0__1 = None
                mul: "i64[]" = torch.ops.aten.mul.Tensor(y_1, y_1);  y_1 = None
                return mul

        class <lambda>(torch.nn.Module):
            def forward(self, y_1: "i64[]"):
                # File: <eval_with_key>.21:6 in forward, code: add_tensor = torch.ops.aten.add.Tensor(l_args_3_0__1, l_args_3_0__1, alpha = 1);  l_args_3_0__1 = None
                add: "i64[]" = torch.ops.aten.add.Tensor(y_1, y_1);  y_1 = None
                return add
```

This PR also adds support for TupleIndex and incorporates some changes from https://github.com/pytorch/pytorch/pull/127341
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127336
Approved by: https://github.com/BoyuanFeng
2024-05-31 17:46:16 +00:00
Boyuan Feng
4afc5c7bb9 [torchscript] Handle prim::device and prim::dtype (#127466)
- Support prim::device and prim::dtype during torchscript migration to export
- Add unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127466
Approved by: https://github.com/SherlockNoMad
2024-05-30 18:35:44 +00:00
Chen Lai
7827afca14 Copy the constant folding pass to the pass under export/passes folder (#127456)
It's a generic pass and I'm trying to find a good place to host it. It's currently needed by quantization flow. See context in D55930580, it's too much effort to land a fix in the inductor folder.

Differential Revision: [D57934182](https://our.internmc.facebook.com/intern/diff/D57934182/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127456
Approved by: https://github.com/angelayi
2024-05-30 18:04:08 +00:00
Jiashen Cao
d66f12674c Handle tuple and dict during TorchScript to ExportedProgram conversion (#127341)
* Add some test cases for testing List, Tuple, and Dict
* Refactor the conversion code slightly
* Add a logic to handle Dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127341
Approved by: https://github.com/SherlockNoMad, https://github.com/angelayi
2024-05-30 00:08:09 +00:00
Pian Pawakapan
8a31c2aa84 [export] allow complex guards as runtime asserts (#127129)
With the current state of export's dynamic shapes, we struggle with guards and constraints that are beyond the current dynamic shapes language, expressed with dims and derived dims. While we can compile and guarantee correctness for guards within the current language (e.g. min/max ranges, linear relationships, integer divisibility) we struggle to dynamically compile guards which extend beyond that.

For these "complex" guards, we typically do either of the following: 1) raise a constraint violation error, along the lines of "not all values of <symbol> in the specified range satisfy <guard>", with or without suggested fixes, 2) specialize to the provided static values and suggest removing dynamism, or 3) fail compilation due to some arbitrary unsupported case. Previous [work](https://github.com/pytorch/pytorch/pull/124949) went towards resolving this by disabling forced specializations, instead allowing the user to fail at runtime with incorrect inputs.

In this PR, relying on [hybrid backed-unbacked symints](https://github.com/pytorch/pytorch/issues/121749), [deferred runtime asserts](https://github.com/pytorch/pytorch/blob/main/torch/fx/passes/runtime_assert.py), and the function [_is_supported_equivalence()](d7de4c9d80/torch/fx/experimental/symbolic_shapes.py (L1824)), we add a flag `_allow_complex_guards_as_runtime_asserts` which allows the user to compile exported programs containing these guards and maintain dynamism, while adding correctness checks as runtime assertions in the graph.

Hybrid backed-unbacked symints allow us to easily bypass "implicit" guards emitted from computation - guards that we ~expect to be true. Popular examples revolve around reshapes:
```
# reshape
def forward(self, x, y):  # x: [s0, s1], y: [s2]
    return x.reshape([-1]) + y  # guard s0 * s1 = s2

This leads to the following exported program

class GraphModule(torch.nn.Module):
    def forward(self, x: "f32[s0, s1]", y: "f32[s2]"):
        sym_size_int: "Sym(s2)" = torch.ops.aten.sym_size.int(y, 0)
        mul: "Sym(-s2)" = -1 * sym_size_int;  sym_size_int = None
        sym_size_int_1: "Sym(s0)" = torch.ops.aten.sym_size.int(x, 0)
        sym_size_int_2: "Sym(s1)" = torch.ops.aten.sym_size.int(x, 1)
        mul_1: "Sym(s0*s1)" = sym_size_int_1 * sym_size_int_2;  sym_size_int_1 = sym_size_int_2 = None
        add: "Sym(s0*s1 - s2)" = mul + mul_1;  mul = mul_1 = None
        eq: "Sym(Eq(s0*s1 - s2, 0))" = add == 0;  add = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq, "Runtime assertion failed for expression Eq(s0*s1 - s2, 0) on node 'eq'");  eq = None

        view: "f32[s0*s1]" = torch.ops.aten.view.default(x, [-1]);  x = None
        add_1: "f32[s0*s1]" = torch.ops.aten.add.Tensor(view, y);  view = y = None
        return (add_1,)
```
Another case is symbol divisibility:
```
def forward(self, x):  # x: [s0, s1]
    return x.reshape([-1, x.shape[0] - 1])  # Eq(Mod(s0 * s1, s0 - 1), 0)
```

Applying deferred runtime asserts also helps dynamic compilation for "explicit" complex guards that typically cause problems for export. For example we can generate runtime asserts for not-equal guards, and complex conditions like the following:
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        # check that negation of first guard also shows up as runtime assertion
        if x.shape[0] == y.shape[0]:  # False
            return x + y
        elif x.shape[0] == y.shape[0] ** 3:  # False
            return x + 2, y + 3
        elif x.shape[0] ** 2 == y.shape[0] * 3:  # True
            return x * 2.0, y * 3.0
```
For the above graph we will generate 3 runtime assertions: the negation of the first 2, and the 3rd condition as a guard.

One additional benefit here over the current state of exported programs is that this adds further correctness guarantees - previously with explicit complex guards, if compilation succeeded, the guards would be ignored at runtime, treated as given.

As shown above, the runtime asserts appear as math ops in the graph, generated by the sympy interpreter, resulting in an _assert_scalar call. There is an option to avoid adding these asserts into the graph, by setting `TORCH_DYNAMO_DO_NOT_EMIT_RUNTIME_ASSERTS=1`. This results in the "original" computation graph, with dynamism, and any incorrect inputs will fail on ops during runtime. Further work could go into prettifying the printer, so the majority of the graph isn't guard-related.

Ideally this PR would subsume and remove the recently added [_disable_forced_specializations](https://github.com/pytorch/pytorch/pull/124949) flag, but that flag still handles one additional case of specialization: single-variable equalities where the symbol is solvable for a concrete value: see this [PR](https://github.com/pytorch/pytorch/pull/126925)

This PR doesn't change any behavior around data-dependent errors/unbacked symints yet, that could be further work.

NOTE: will take naming change suggestions for the flag :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127129
Approved by: https://github.com/avikchaudhuri
2024-05-29 17:15:25 +00:00
Jiashen Cao
10d2373abd Add a registry for GraphModuleSerializer (#126550)
This PR adds a registration function and a global registry for GraphModuleSerializer. After this PR, custom serialization methods can be done through registration instead of subclassing for ease of maintenance.

## Changes
- Add a test case where it injects custom op to test serialization.
- Add custom op handler
- Change allowed op for verifier
Co-authored-by: Zhengxu Chen <zhxchen17@outlook.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126550
Approved by: https://github.com/zhxchen17
2024-05-29 03:12:48 +00:00
Pian Pawakapan
f206c5c628 [export] handle new roots & root swapping in derived dims suggested fixes (#125543)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125543

This PR address 2 issues with derived dim suggested fixes, 1) newly introduced roots, and 2) root swapping.

1 | Newly introduced roots appear with modulo guards, e.g. Mod(dx, 2) = 0 suggests dx is a derived dim equal to 2 * _dx, introducing a new root _dx. Currently the final suggested fixes handle this correctly, but we can get intermediate results where related derived dims don't rely on a unified root, and are a mixture of min/max range and derived suggestions.

For example:
```
"dx": {"eq": 3*_dx-1, "max": 36}
"dy": {"eq": dx+1}
This should lead to suggested fixes
  _dx = Dim('_dx', max=12)
  dx = 3 * _dx - 1
  dy = 3 * _dx
```

This PR prettifies the suggested fixes routine by unifying to a single root, and making each intermediate suggestion either a derived dim or min/max range, not both.

2 | The current suggested fixes for derived dims can lead to root dims/derived dims being swapped, e.g. `dy - 1, dy` -> `dx, dx + 1`. This leads to problematic suggested fixes that look like `dy - 1 = Dim("dy - 1")` since we don't have access to the original variable name.

This PR only adds a suggested fix for the root dim, and removes all other derived suggestions.

For example, with the export test case test_derived_dim_out_of_order_simplified:
```
_dimz = torch.export.Dim("_dimz", min=6, max=8)
dimy = _dimz - 1
dimx = dimy - 1
dimz = torch.export.Dim("dimz", min=6, max=8)  # doesn't work, should be = _dimz

class Foo(torch.nn.Module):
    def forward(self, x, y, z):
        return x + y[1:] + z[2:]

foo = Foo()
u, v, w = torch.randn(5), torch.randn(6), torch.randn(7)
export(
    foo,
    (u, v, w),
    dynamic_shapes=({0: dimx}, {0: dimy}, {0: dimz}),
)
```

Before:
```
Suggested fixes:
  _dimz = Dim('_dimz', min=3, max=9223372036854775807)  # 2 <= _dimz - 1 <= 9223372036854775806
  _dimz - 2 = Dim('_dimz - 2', min=4, max=6)
  _dimz = Dim('_dimz', min=2, max=9223372036854775806)  # 2 <= _dimz <= 9223372036854775806
  _dimz - 1 = _dimz - 1
  dimz = _dimz
```

New suggested fixes:
```
Suggested fixes:
  dimz = _dimz
```

Note: This assumes the specified derived relations between dims are correct. This should be valid because: 1) if the relation is plain wrong (e.g. (dx, dx - 1) provided with inputs (6, 4)), this gets caught in beforehand in produce_guards. 2) if the relation is correct but does not match the emitted guard, for example:
```
def forward(self, x, y):
    return x.reshape([-1]) + y  # guard: s0 * 2 = s1
dx = Dim("dx")
export(
    model,
    (torch.randn(6, 2), torch.randn(12)),
    dynamic_shapes={"x": (dx, 2), "y": (dx + 6, )}
)
```
This produces two linear equations, leading to specialization since a) produce_guards is able to solve for a concrete value, and b) the export constraint solver will anyways force specializations due to range constraints.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125543
Approved by: https://github.com/avikchaudhuri
2024-05-28 20:41:43 +00:00
Jiashen Cao
254783ce80 [Fix]: populate input parameter name when convert TorchScript to ExportedProgram (#126787)
## Goal
As title

## Design
Based on the fact that each TorchScript module has a `code` property which provides the original source code for the `forward` function, I implemented a function to extrapolate `forward` function signature by using the AST parser.

Some other tradeoff
* Directly parsing src code as string --> will be very buggy
* Directly using `compile` function in Python to get the function object --> raises a lot of exceptions because of missing packages or undefined variable names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126787
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
2024-05-28 17:33:44 +00:00
Angela Yi
cbb79a2baf [export] Disable backend decomps for capture_pre_autograd (#127120)
Differential Revision: D57785713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127120
Approved by: https://github.com/ydwu4
2024-05-28 16:37:13 +00:00
Xuehai Pan
35ea5c6b22 [3/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torchgen (#127124)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127124
Approved by: https://github.com/Skylion007
ghstack dependencies: #127122, #127123
2024-05-25 19:20:03 +00:00
Sheng Fu
bbeb0906c4 Register creak_node_hook (#126671)
Differential Revision: D57469157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126671
Approved by: https://github.com/angelayi
2024-05-24 23:32:15 +00:00
Aaron Gokaslan
3cb16ebf08 [BE]: Update ruff to 0.4.5 (#126979)
Update ruff to 0.4.5 and addresses some false negatives that have been found in the newer version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126979
Approved by: https://github.com/ezyang
2024-05-24 18:38:35 +00:00
Angela Yi
cb6ef68caa Propagate tokens in aotautograd (#127028)
Test Plan: `buck run mode/dev-nosan //aimp/experimental/pt2:pt2_export -- --model-entity-id 938593492 --output /tmp/938593492.zip --use-torchrec-eager-mp --use-manifold`

Differential Revision: D57750072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127028
Approved by: https://github.com/tugsbayasgalan
2024-05-24 03:23:17 +00:00
Sherlock Huang
a63310eebc TorchScript 2 ExportedProgram Converter (#126920)
Summary:
Initial commit for TorchScript 2 ExportedProgram Converter.

TODO:
- Improve TorchScript IR coverage
- parameter and buffers should be owned by output ExportedProgram
- Experiment on conditional op conversion

Test Plan: buck2 run mode/dev-nosan fbcode//caffe2/test:test_export -- -r TestConverter

Differential Revision: D57694784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126920
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
2024-05-23 17:00:18 +00:00
angelayi
da7bf1d588 [export] Fix unflatten with empty nn_module_stack (#126785)
Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1433418843962989/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126785
Approved by: https://github.com/tugsbayasgalan
2024-05-23 08:34:25 +00:00
Edward Z. Yang
0d17aae242 Teach FakeTensor to fill in item_memo when converting scalar CPU tensor (#126245)
This PR requires a little justification, but let's start with what it does first:

1. When you have a 0d CPU scalar int64/float64 tensor input to a graph, we will preallocate a backed SymInt/SymFloat corresponding to what you would get if you call item() on this tensor. This means you can freely change your input to be a Python int/float or a Tensor with an item() call and end up with exactly the same level of expressivity (specifically, you can guard on the internal SymInt/SymFloat no matter what). By default, the source of the backed SymInt/SymFloat is `L['tensor'].item()`, but if you have promoted a float input into a Tensor, we will cancel out `torch.as_tensor(L['float']).item()` into just `L['float']`.
2. We switch wrap_symfloat to use this, instead of hand crafting the new SymNodeVariable. Everything works out, except that we carefully pass the item() result to tracked fakes (and not the fake Tensor argument)

OK, so why do this at all? There is some marginal benefit where now some item() calls on scalar inputs can be guarded on, but IMO this is a pretty marginal benefit, and if it was the only reason, I wouldn't do this. The real reason for this is that I need to be able to propagate fake tensors through the graphs that are produced by Dynamo, and if I am doing the old custom wrap_symfloat logic, there's no way I can do this, because ordinarily an item() call will cause an unbacked SymInt when I reallocate.

The other obvious way to solve the problem above is to make a HOP alternative that item() that "bakes in" the backed SymInt its supposed to return. But this strategy seems more parsimonious, and it does have the marginal benefit I mentioned above. The main downside is that what I have to do next, is make it so that when I run tensor computation, I also apply the equivalent operations to the SymInt/SymFloat as well. That's next PR.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126245
Approved by: https://github.com/eellison
ghstack dependencies: #126637
2024-05-22 15:25:38 +00:00
Jiashen Cao
ac1f0befcf Remove redundant serialization code (#126803)
After https://github.com/pytorch/pytorch/pull/123308, we no longer need separate serialization path to handle different types that exist in the nn_module metadata. This PR cleans up the redundant code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126803
Approved by: https://github.com/angelayi
2024-05-22 03:14:17 +00:00
Edward Z. Yang
97eef61474 Don't assume compare_arg is fx.Node (#126771)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126771
Approved by: https://github.com/Skylion007
2024-05-21 19:53:21 +00:00
Sherlock Huang
0d5ba547ec Tool for scouting exportability in one shot (#126471)
Summary:
Tool for scouting exportability issues in one shot.

- Collect sample inputs for all submodules by running eager inference with forward_pre_hook.
- Start from root module, recursively try exporting child modules, if current module export fails.

Limitations:
- only works for nn.module that contains tree-like submodules structure. this doesn't work for flatten GraphModule.

TODO: support dynamic_dims

Sample output: https://docs.google.com/spreadsheets/d/1jnixrqBTYbWO_y6AaKA13XqOZmeB1MQAMuWL30dGoOg/edit?usp=sharing

```
exportability_report =
        {
            '': UnsupportedOperatorException(func=<OpOverload(op='testlib.op_missing_meta', overload='default')>),
            'submod_1': UnsupportedOperatorException(func=<OpOverload(op='testlib.op_missing_meta', overload='default')>),
            'submod_2': None
        }
```

Test Plan: buck2 run mode/dev-nosan fbcode//caffe2/test:test_export -- -r TestExportTools

Differential Revision: D57466486

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126471
Approved by: https://github.com/zhxchen17
2024-05-18 00:10:46 +00:00
Tugsbayasgalan Manlaibaatar
bed1c600bb Experimental prototype for converting torch.jit.trace modules to export (#124449)
Differential Revision: [D56440613](https://our.internmc.facebook.com/intern/diff/D56440613)

We want to do this for following reasons:
1. There is current limitation in export tracing for torch.jit.trace d modules that cannot be easily upstreamed
2. We need to run internal CI regularly to understand feature gaps and continuously track them
3. Multiple people will be working on this prototype so it is better to have a checked in version so we don't always run into merge conflicts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124449
Approved by: https://github.com/angelayi, https://github.com/avikchaudhuri
2024-05-17 20:42:42 +00:00
PyTorch MergeBot
f89500030b Revert "Remove redundant serialization code (#126249)"
This reverts commit aab448e381.

Reverted https://github.com/pytorch/pytorch/pull/126249 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing sigmoid/frontend:serialization_test internally ([comment](https://github.com/pytorch/pytorch/pull/126249#issuecomment-2118233656))
2024-05-17 19:19:02 +00:00
Matthew Hoffman
81277baa0c Remove removed ruff rule TRY200 (#126256)
My TOML linter is complaining that "TRY200" is not acceptable for the `tool.ruff.lint` schema.

From the ruff docs: https://docs.astral.sh/ruff/rules/reraise-no-cause/

> This rule has been removed and its documentation is only available for historical reasons.
>
> This rule is identical to [B904](https://docs.astral.sh/ruff/rules/raise-without-from-inside-except/) which should be used instead.

and we are currently explicitly ignoring B904.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126256
Approved by: https://github.com/Skylion007
2024-05-17 16:31:05 +00:00
Tarun Karuturi
4b7eee3450 Print export warning only once in capture_pre_autograd (#126403)
Summary: Missed this in D57163341

Test Plan: CI

Differential Revision: D57442088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126403
Approved by: https://github.com/zhxchen17
2024-05-16 21:55:11 +00:00
Jiashen Cao
aab448e381 Remove redundant serialization code (#126249)
After https://github.com/pytorch/pytorch/pull/123308, we no longer need separate serialization path to handle different types that exist in the `nn_module` metadata. This PR cleans up the redundant code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126249
Approved by: https://github.com/angelayi
2024-05-16 19:22:20 +00:00
Wang, Eikan
08aa704d0c [1/N] Non-Tensor: Scalar Support: Enable aot compile to support aten operations with scalar input like alpha (#124177)
Some operations have a scalar input parameter, like `torch.add(a, b, alpha=2.0)`.  Currently, the aot compile does not support such a case because it requires the signature of the captured graph to align with the operation's signature. This means that some inputs in the captured graph may be scalar(float, int, bool, etc.). It breaks the assumption of `compile_fx_aot` as it assumes all the example inputs are tensor - 0f6ce45bcb/torch/_inductor/compile_fx.py (L1048)

This PR intends to support such cases by allowing not-aligned signature and filtering out the non-Tensor parameters.

Captured graph for `torch.add(a, b, alpha=2.0)`

```
opcode         name      target           args              kwargs
-------------  --------  ---------------  ----------------  --------------
placeholder    arg0_1    arg0_1           ()                {}
placeholder    arg1_1    arg1_1           ()                {}
call_function  add       aten.add.Tensor  (arg0_1, arg1_1)  {'alpha': 2.0}
output         output_1  output           ((add,),)         {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124177
Approved by: https://github.com/jansel, https://github.com/desertfire, https://github.com/jgong5
2024-05-16 05:15:55 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
26f6f98364 Forward fix failures for torch.export switch to predispatch (#126081)
Summary:
Fixes:
- executorch test
- torchrec test

Test Plan: CI

Differential Revision: D57282304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126081
Approved by: https://github.com/angelayi
2024-05-15 18:13:06 +00:00
angelayi
c712b0f8a3 [export] Fix runtime assertions to add call_function (#125878)
Fixes [internal issue](https://www.internalfb.com/intern/everpaste/?handle=GJCK9xUNpYXovnEBAHfuJ7vQLxZnbsIXAAAB)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125878
Approved by: https://github.com/ezyang
2024-05-14 00:57:50 +00:00
Pian Pawakapan
c9a258e474 [export] handle constant aliasing for export (#125509)
Summary: Currently export will [error out](2b5ae2611e/torch/export/_trace.py (L477)) if a constant is aliased. This PR supports this by modifying ConstantAttrMap to map constants to a list of FQNs instead of a single FQN, populating the ExportedProgram constants dict to contain multiple entries to the same constant.

Test Plan: added test case in test_export.py

Differential Revision: D56955654

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125509
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2024-05-10 00:14:37 +00:00
angelayi
13545fe68a [export] Don't create a new fake mode if dynamo tracing (#125185)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125185
Approved by: https://github.com/mikekgfb
2024-05-09 23:43:08 +00:00
Tarun Karuturi
eaaf0f3299 Print capture_pre_autograd_graph warning only once (#125848)
Summary: Print this warning only once to avoid flooding the logs of workflows where this is called frequently.

Test Plan: CI

Differential Revision: D57163341

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125848
Approved by: https://github.com/zhxchen17
2024-05-09 22:04:05 +00:00
Tugsbayasgalan Manlaibaatar
0e419b9146 Fix graph partitioner and make runtime assertion work with submodules in export (#125793)
Summary: This fix does three things:

1. When we add inputs from partioner to the top level graph module, we insert in the order of partioner which is not guaranteed to be same as original graph inputs. This PR fixes that.
2. When we replace autograd ops with HOP, we create new submodules and access their outputs via getitem calls. As a result, previous node names associated with getitem gets updated, resulting in the graph being different from produced graph signature. So I just update the graph signature accordingly.
3. We run runtime_assertion pass before autograd HOP pass because the constraints won't be populated correctly.

Differential Revision: [D57130314](https://our.internmc.facebook.com/intern/diff/D57130314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125793
Approved by: https://github.com/zhxchen17
2024-05-09 18:13:46 +00:00
Zhengxu Chen
3ccf107f01 [export] remove upgrader. (#125625)
Summary: talked to executorch team, seems we can remove this now.

Test Plan: CI

Differential Revision: D57013451

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125625
Approved by: https://github.com/larryliu0820
2024-05-09 16:30:12 +00:00
Pian Pawakapan
f4b2d50fd7 [export] disable_forced_specializations (#124949)
Summary:
By default, some inferred dynamic shapes guards/constraints that are not expressible with the current dynamic shapes language will lead to specialization to the concrete input values provided. If disable_forced_specializations is set to True, we will not specialize, and will not perform runtime checks on such produced guards. Instead, we allow the user to specify arbitrary shapes, and fail during runtime if the inputs are invalid. Constraints expressible with the language (e.g. ranges, linear derived dims) will still be enforced, and behavior for all other guards remains the same.

Cases where we typically specialize are reshapes:
```
x: [4, 6]  # [s0, s1]
x = x.reshape([x.shape[0] - 1, -1])
# this emits a guard Mod(s0*s1, s0-1) = 0, we specialize on s0=4, s1=6

x: [4, 6], y: [24]  # [s0, s1], [s2]
x = x.reshape([-1]) + y
# this emits a guard s0*s1 = s2, we specialize on s0=4, s1=6, s2=24
```

For now only applicable for non-strict mode (need to figure out how to pass this flag into dynamo's call of produce_guards).

Test Plan: Added test case that checks compilation, runtime, and suggested fixes behavior.

Differential Revision: D56361177

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124949
Approved by: https://github.com/avikchaudhuri
2024-05-08 18:42:39 +00:00
angelayi
8be4c1bc2f [export] Add metadata for nodes insert_deferred_runtime_asserts (#125414)
Fixes [internal error](https://fb.workplace.com/groups/1075192433118967/permalink/1416709435633930/).

The issue is that the asserting nodes added in the `insert_deferred_runtime_assertion` pass do not contain metadata that the ExportedProgram requires the graph to have. One solution to fix this is to retrace the entire module, or another solution is to manually add back this metadata.

This diff implements the latter solution (manually add back the metadata) through hooking into fx.graph's `create_node` function, and adding export-specific metadata for every node that is created. The reason I did this is so that the `insert_deferred_runtime_assertion` does not have to know about what metadata export wants.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125414
Approved by: https://github.com/zhxchen17, https://github.com/BoyuanFeng
2024-05-07 23:15:21 +00:00
Zhengxu Chen
8024e72326 [export] Warn on capture_pre_autograd_graph. (#125602)
Summary: capture_pre_autograd_graph is deprecated and torch.export won't able to provide timely fix for this API. To reduce some confusion around this we should explicitly give users clear warnings.

Test Plan: eyes

Reviewed By: tarun292

Differential Revision: D56955202

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125602
Approved by: https://github.com/angelayi
2024-05-07 22:51:17 +00:00
angelayi
0de9ce9bb3 [export] Fix serialization of empty torch artifact (#125542)
A previous PR added support for serializing/deserializing example inputs, but this fails when `example_inputs` is none.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125542
Approved by: https://github.com/pianpwk, https://github.com/BoyuanFeng, https://github.com/ydwu4
2024-05-07 15:54:45 +00:00
PyTorch MergeBot
7ffa5558ee Revert "[FX] Update type hints in torch.fx._compatibility.py (#125469)"
This reverts commit 235b4d6ec2.

Reverted https://github.com/pytorch/pytorch/pull/125469 on behalf of https://github.com/izaitsevfb due to breaks pyre in dependent projects (internal: see D56986361) ([comment](https://github.com/pytorch/pytorch/pull/125469#issuecomment-2096665396))
2024-05-06 18:36:43 +00:00
Aaron Gokaslan
1dd42e42c4 [BE]: Try TCH autofixes on torch/ (#125536)
Tries TCH autofixes and see what breaks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
Xuehai Pan
235b4d6ec2 [FX] Update type hints in torch.fx._compatibility.py (#125469)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125469
Approved by: https://github.com/Skylion007
ghstack dependencies: #125468
2024-05-05 19:30:22 +00:00
ydwu4
0302dc68bf [Reland] Fakify script object inputs and attributes for non-strict ex… (#125490)
A re-land of #124239.

This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.

The basic idea is to only fakify the script object during tracing (i.e. aot_export). After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.

Concretely, we

fakify all the script object inputs, and module attributes (gathered by constant_attrs).
patch the module's attributes with fakified script object
right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125490
Approved by: https://github.com/angelayi
2024-05-04 02:39:42 +00:00
Zhengxu Chen
12a69afa6d [export] Fix deserializer node meta handling. (#125454)
Summary: The code seems not needed because serializer shouldn't make any meaningful decision about what goes to node metadata.

Test Plan: CI

Differential Revision: D56918543

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125454
Approved by: https://github.com/angelayi
2024-05-03 16:51:08 +00:00
Pian Pawakapan
ef757a5c00 [export] use tree_map for _flatten_dynamic_shapes (#125415)
Summary:
Fixing the implementation of `_flatten_dynamic_shapes()`, to follow how `_process_dynamic_shapes()` does it. The previous implementation would misinterpret some nested dynamic shapes specs, causing it to miss out on some shapes specs, for example with nested inputs/constant input tuples:

```
inputs = (
    (2, 1),
    (
        torch.randn(2, 1),
        torch.randn(2, 2),
        torch.randn(2, 3),
    )
)

dynamic_shapes = (
    (None, None),
    (
        None,
        None,
        None,
    )
)
```
This would get interpreted as 2 shapes specs for 2d and 3d tensors. Fixing so this doesn't happen.

Test Plan: Existing export tests

Differential Revision: D56894923

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125415
Approved by: https://github.com/angelayi
2024-05-03 04:59:17 +00:00
PyTorch MergeBot
f1f142c44f Revert "Fakify script object inputs and attributes for non-strict export (#124239)"
This reverts commit ecc2e034f7.

Reverted https://github.com/pytorch/pytorch/pull/124239 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/124239#issuecomment-2089305447))
2024-05-01 23:56:00 +00:00
Avik Chaudhuri
746da8755c switch tests from constrain_as* to torch._check* (#125253)
To fix data-dependent errors we want to recommend that people use `torch._check*` APIs. The `constrain_as*` APIs should be fully subsumed by them, and in the future we should kill them entirely.

Differential Revision: D56774333

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125253
Approved by: https://github.com/ezyang
2024-05-01 21:01:27 +00:00
ydwu4
ecc2e034f7 Fakify script object inputs and attributes for non-strict export (#124239)
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.

The basic idea is to `only fakify the script object during tracing (i.e. aot_export)`. After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.

Concretely, we
1. fakify all the script object inputs, and module attributes (gathered by constant_attrs).
2. patch the module's attributes with fakified script object
3. right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124239
Approved by: https://github.com/zou3519
2024-04-30 15:57:25 +00:00
Aaron Gokaslan
3e1fb96964 [BE]: RUF018 - ban assignment in assert (#125125)
Ban assignment inside of assert. Python code should ideally not break with assertions disabled. Adds a ruff lint rule to enforce this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125125
Approved by: https://github.com/ezyang
2024-04-28 21:41:36 +00:00
Edward Z. Yang
7aa6bd7fa0 Refactor all top level usages of record_shapeenv_event to ShapeEnv class (#123735)
This ensures that first argument to record_shapeenv_event is a ShapeEnv
so we can appropriately short circuit when recording is not in progress.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123735
Approved by: https://github.com/ysiraichi, https://github.com/zou3519, https://github.com/albanD
2024-04-27 20:36:40 +00:00
Aaron Orenstein
a8574a9719 Fix global flake8 issues (#124771)
Prior to this `lintrunner --all-files --take FLAKE8` failed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771
Approved by: https://github.com/Skylion007
ghstack dependencies: #124428
2024-04-26 15:35:53 +00:00
Aaron Gokaslan
2f3b0befed [BE]: Apply ruff FURB 118. (#124743)
Replaces various lambdas with operator.itemgetter which is more efficient (as it's a builtin function). Particularly useful for when lambdas are used as 'key' functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743
Approved by: https://github.com/albanD, https://github.com/malfet
2024-04-26 14:34:52 +00:00
PyTorch MergeBot
1ac60484c1 Revert "Fix global flake8 issues (#124771)"
This reverts commit f01275934b.

Reverted https://github.com/pytorch/pytorch/pull/124771 on behalf of https://github.com/jeanschmidt due to Unfortunately, I needed to revert #123735 and this one depends on it. So please check if there are no merge conflicts or breakages and feel free to merge this PR again ([comment](https://github.com/pytorch/pytorch/pull/124428#issuecomment-2078699836))
2024-04-26 06:15:17 +00:00
PyTorch MergeBot
e607dc8abb Revert "Refactor all top level usages of record_shapeenv_event to ShapeEnv class (#123735)"
This reverts commit 87bec7db4e.

Reverted https://github.com/pytorch/pytorch/pull/123735 on behalf of https://github.com/jeanschmidt due to Breaking internal signals, more info in D56587358 ([comment](https://github.com/pytorch/pytorch/pull/123735#issuecomment-2078695590))
2024-04-26 06:10:58 +00:00
angelayi
724f8dd8c5 [export] Serialize empty list based on argument type (#123748)
Fixes https://github.com/pytorch/pytorch/issues/123480

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123748
Approved by: https://github.com/zhxchen17
2024-04-25 23:03:27 +00:00
Zhengxu Chen
7bb89bcaa4 [export] Fix state dict reparametrization in non-strict. (#124847)
Summary:

There are multiple things implemented incorrectly in non strict for reparametrizing state dict:
1. The same fake tensor should be generated for duplicated weights.
2. We should snapshot state dict in the beginning to always hold the invariant that ep.state_dict == mod.state_dict()
3. We will overwrite real weights with fake weights if we don't restore the weights in LIFO ordering.
4. We don't turn on strict checking which could sliently fail on corner cases.

This diff aims to solve all these issues at once.

Test Plan: CI

Differential Revision: D56505020

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124847
Approved by: https://github.com/pianpwk
2024-04-25 22:44:16 +00:00
angelayi
84fb96130f [export] Fix check for optional tensor returns (#123739)
Sorry for the delay! Addressing issue in https://www.internalfb.com/diff/D55455000?dst_version_fbid=1599488570890576&transaction_fbid=776042617791884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123739
Approved by: https://github.com/zhxchen17
2024-04-25 20:51:26 +00:00
Pian Pawakapan
93a319a4fc [export] kill _process_constraints() (#123985)
The process for populating range_constraints follows separate methods for non-strict (`make_constraints`), and strict (`_process_constraints`). The strict method is somewhat more convoluted, and the analysis that Dynamo performs for strict is already present as part of the non-strict process in make_constraints (produce_guards(), running the export constraint solver).

This PR kills _process_constraints() and replaces calls with make_constraints, without duplicating the work that Dynamo already does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123985
Approved by: https://github.com/avikchaudhuri
2024-04-25 16:58:57 +00:00
Aaron Orenstein
f01275934b Fix global flake8 issues (#124771)
Prior to this `lintrunner --all-files --take FLAKE8` failed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124771
Approved by: https://github.com/Skylion007
ghstack dependencies: #124428
2024-04-25 14:25:00 +00:00
Edward Z. Yang
87bec7db4e Refactor all top level usages of record_shapeenv_event to ShapeEnv class (#123735)
This ensures that first argument to record_shapeenv_event is a ShapeEnv
so we can appropriately short circuit when recording is not in progress.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123735
Approved by: https://github.com/ysiraichi, https://github.com/zou3519, https://github.com/albanD
ghstack dependencies: #124310, #124314, #124316, #124394, #124739, #124782, #124785
2024-04-25 14:02:48 +00:00
Edward Z. Yang
13ab24f192 Reimplement unbacked symbol bindings in Inductor (#124394)
This PR has a lot of "draw the rest of the fucking owl" energy. Here's how to break it down.

1. **torch/_inductor/graph.py** - We start by tightening unbacked symbol invariants. Specifically, as we lower FX nodes, we check whether or not every unbacked_binding recorded on the FX node meta, actually ends up getting bound (according to get_unbacked_symbol_defs) in all the buffers generated by the lowering. Hopefully this invariant is self evident. This leads to a lot of failures.
2. **torch/_inductor/ir.py** - Problem 1: There is softness in how Inductor computes defs of unbacked symbols in IR node. Previously, we tried to infer it by looking at the output sizes/strides/etc and see if new unbacked symbols popped up that we hadn't seen in the inputs. I don't know exactly what was buggy about the old code, but sometimes we would fail to notice an unbacked symbol had been bound, or rebind an unbacked symbol multiple times. Fortunately, thanks to the earlier PRs in our stack, we now have a nice list of unbacked symbol bindings from FX, so we now just store it directly on ExternKernel and use it directly to report defs. This has to be done twice: once for FallbackKernel (e.g., nonzero) and once for DynamicScalar (e.g., item) (see also **torch/_inductor/lowering.py**, **torch/_inductor/codegen/wrapper.py** and  **torch/_inductor/codegen/cpp_wrapper_cpu.py** for the lowering and codegen changes for item)
   * **process_kernel** - Sidequest! It turns out that Inductor lowering can reallocate unbacked symbols. This happens specifically when we repropagate fake tensors through the operator in `process_kernel`. This repropagation process is necessary because Inductor may have changed the strides of input tensors, and it must now recompute the strides so that it can continue to appropriately plan the rest of the lowering process. This is fine: we just make sure we do the rebind unbacked + compute_unbacked_bindings dance we've been doing previously in the PR stack. But instead of putting unbacked_bindings on a new FX node, they go straight into our unbacked_bindings on the Inductor IR node.
    * **codegen_unbacked_symbol_defs** - Sidequest! FallbackKernel lowering is done in two steps. First, you emit the FallbackKernel buffer. Then, you emit MultiOutput buffers which actually give access to the individual outputs of FallbackKernel, which may have been multi-output. There is a design decision here: does the FallbackKernel bind the unbacked symbols, or the MultiOutput buffer? Historically, we put the binding on MultiOutput buffer, because it's more convenient: the FallbackKernel buffer is fake, in fact, it doesn't even get a name in C++ codegen. But it's kind of inconsistent with the keypath model that we've been tracking unbacked bindings with: if you have a multi-output node, you'd expect a keypath like `[0].size()[0]` representing the first output's first dimension size. That suggests that it's the FallbackKernel that should define the things. So that was my first implementation. Unfortunately, the C++ codegen is too cursed and I could not understand how to make it work in that case. So now we just unsoundly assume you cannot have multi-output data dependent output, and do the codegen in MultiOutput. There are some comments explaining exactly what we are improperly assuming.
3. **_rename_unbacked_to** in **torch/fx/experimental/symbolic_shapes.py** - Previously, when we renamed unbacked symbols, we clobbered any facts we previously knew about them. So for example, if we had a replacement `u0 -> s0` but then we renamed u0 to u1, we would now setup the replacement `u0 -> u1`, clobbering the old replacement. This apparently didn't matter in earlier PRs in the stack, but with Inductor now on the ball, there were some tests that indicated this was a problem. The solution is easy: if u0 had a preexisting replacement, reapply it to u1. However...
    * **torch/_functorch/_aot_autograd/collect_metadata_analysis.py** - When we run forward analysis, this triggers fake tensor repropagation and fresh allocations. Previously, we just cleared out the pending symbols when finished the analysis. But with the change above, this would also migrate replacements to the new symbols... which are now dead. So now we explicitly suppress generation of these symbols with `ignore_fresh_unbacked_symbols` so that no rebinding happens at all.
    * **torch/_dynamo/eval_frame.py** - same deal; I just searched for all sites we called clear() on pending
4. The last step is fixing the long tail of extra problems that show up, now that unbacked_bindings are load bearing into Inductor
    * **torch/_dynamo/eval_frame.py** - Some of the exports are making copies of nodes without repropagating fake tensors, so in this case, it is important to also copy the `unbacked_bindings` (apparently this didn't matter before without the Inductor changes)
    * **torch/_export/pass_base.py** - I discover that this is doing fake tensor repropagation via a test suite failure. Do the same playbook as AOTAutograd: PropagateUnbackedSymInts too!  Actually, they also have implemented their own tracer as well, so do the same playbook as proxy_tensor: record unbacked_bindings on the newly traced nodes. UGH code duplication.
    * **torch/_subclasses/fake_tensor.py**, **torch/_subclasses/fake_impls.py** (with call site updates at  **torch/_functorch/_aot_autograd/traced_function_transforms.py** and **torch/fx/passes/fake_tensor_prop.py**) - What's this new epoch thing? I noticed that sometimes I would be retracing, call nonzero() on a fake tensor, and not allocate a new unbacked symbol. This is actually bad, because if I don't get a new unbacked symbol, I don't know there's a binding site, and `unbacked_bindings` is now missing a binding. The reason for this is memoization: if I reuse the exact same fake tensor on my retrace, it will already have an unbacked symint memoized on it and we will short circuit allocation. Well, that's no good. So I associate the memos with a fake tensor epoch, and every time you start a new fake tensor propagation from scratch, you bump the epoch so that I clear all the memos.
    * **torch/_inductor/scheduler.py** - I notice in unit tests that V.current_node is not always set when we call process_kernel. So I save it into the IR node and restore it when we are running `get_estimated_runtime`.
    * **torch/fx/experimental/symbolic_shapes.py** - A few things
      * **rebind_unbacked** (re **_tensor_version**). Ordinarily, when you have an unbacked SymInt, you persistently hvae it all the way to the end of the program. `_tensor_version` violates this: this generates an unbacked SymInt (for reasons I don't quite understand?) and then gets rid of it later. This triggered an assert violation. I think this op is kind of misusing unbacked SymInt, but I didn't know how to refactor it, so it gets a special case.
      * **rebind_unbacked** (re **Simplify SymBool binding**). Ugh, SymBool, what a pain in the butt. I have an assert that you can only rebind unbacked symbol to another unbacked symbol. This assert fails when a boolean is involved, because the result of running keypath on the result is not `u1`, it's `sympy.Piecewise(... sympy.Eq(u1, 1) ...)`. This is actually just `u1`, but Sympy doesn't know it because it doesn't know that `u1` value range is `[0, 1]`. So we manually implement the simplification needed to get the assert to pass.
      * **compute_unbacked_bindings** (re **This is pretty fragile**). There is a really funny disaster involving memoization and Inductor process kernel. Ordinarily when I retrace, if there was a memo hit in the old trace, there will be a memo hit in the new trace. However, Inductor process kernel breaks this, because it recreates fake tensor inputs to the operator call from scratch (since they might have different strides), and obviously these tensor inputs don't have the memo from the old one. I tried a little bit to try to manually transplant the memo to the new fake tensor but it seemed hopeless, so I just let the fresh symbol ride, allocating a new unbacked symbol. However, in one of our tests, we rely on knowing that the first nonzero call is equal to the second (memoized) nonzero call. The equality test looked pretty easy to discharge, so I just went ahead and added a deferred runtime assert to this effect and it worked.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124394
Approved by: https://github.com/jansel
ghstack dependencies: #124310, #124314, #124316
2024-04-25 02:08:59 +00:00
Zhengxu Chen
d40774f4ed [export] Fix up nn_module_stack for nodes occured around tracepoint ops. (#124457)
Summary: as title.

Test Plan:
hg checkout D55901896
buck run mode/opt torchrec/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_ebc

Differential Revision: D56340319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124457
Approved by: https://github.com/tugsbayasgalan
2024-04-23 20:26:44 +00:00
Pian Pawakapan
e112792a69 [export] refactor _AddRuntimeAssertionsForInlineConstraintsPass (#124503)
Summary:
The current _AddRuntimeAssertionsForInlineConstraintsPass has 2 known issues caused by its use of torch.fx.Interpreter:
1. SymInt-related ops (e.g. item()) are executed, causing new Unbacked SymInts to appear in the graph during the pass.
2. The graph is reconstructed, and node names/indices can be different from before, causing mismatches with `module_call_graph`, and leading to issues during unflattening.

This refactors the pass to use PassBase instead of _ExportPassBaseDeprecatedDoNotUse, only constructing new nodes for assertions.

Test Plan: This pass is called on all strict-mode export calls with range_constraints, test that behavior remains unchanged.

Differential Revision: D56360137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124503
Approved by: https://github.com/zhxchen17
2024-04-23 20:07:49 +00:00
Pian Pawakapan
10b9d4d19c [export] handle Dim.lower = 0, 1 for ep.run_decompositions() (#123602)
Summary:
With pre-dispatch export and ep.run_decompositions(), range constraints are updated through looking at ShapeEnv.var_to_range. However the lower bounds on these may be incorrect - analysis on un-specialized symbols are done with lower bounds of 2, which mismatch with user-specified bounds (may be 0, 1).

This updates `_get_updated_range_constraints()` to use the old range constraints if possible.

Test Plan: Existing pre-dispatch/dynamic shapes test case.

Differential Revision: D55899872

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123602
Approved by: https://github.com/tugsbayasgalan
2024-04-19 21:29:36 +00:00
angelayi
74bedbb9e1 [export] Serialize rational symint ranges (#123884)
Some symints result in rational ranges like 10/3 which runs into an error ([example](https://www.internalfb.com/intern/everpaste/?handle=GMG2AxkeoFUrh-UDAFcE8pKPgjoUbsIXAAAB)).

Ed will eventually get rid(?) of these rational ranges but as a workaround export can just clamp the results during serialization time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123884
Approved by: https://github.com/zhxchen17
2024-04-18 18:20:11 +00:00
Pian Pawakapan
90d1720861 [export] Restore original placeholder names (part 3: constant input de/serialization) (#123590)
Summary:
note: breaking the original diff D55225818 into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.

Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...

This PR supports constant argument placeholder (e.g. forward(self, x, y=1)) names and de/serialization, by adding a name field for ConstantArguments in the graph signature, and ConstantInputSpec in the input specs for serialization.

Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py

Differential Revision: D55506949

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123590
Approved by: https://github.com/angelayi, https://github.com/zhxchen17
2024-04-15 19:09:41 +00:00
Zhengxu Chen
951582949b [export] Enforce final classes in serialization. (#123861)
Summary: as title, these are private API and not meant to be used across repos.

Test Plan: CI

Differential Revision: D56027954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123861
Approved by: https://github.com/tugsbayasgalan
2024-04-12 15:44:56 +00:00
Pian Pawakapan
d0ccf599cc [export] Restore original placeholder names (part 2: higher-order-op subgraph naming) (#123587)
Summary:
note: breaking the original diff [D55225818](https://www.internalfb.com/diff/D55225818) into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.

Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...

This PR propagates node names to higher-order-op subgraph placeholders, retaining the top-level names and handling naming collisions by suffixing other non-placeholder nodes in the subgraph with an index. This is the same handling as in fx.Graph/fx.Node, but implemented separately as a pass.

Since the input schemas of HOO subgraphs are very different, they are enumerated in _name_hoo_subgraph_placeholders(). Currently cond, map_impl, and wrap_with_set_grad_enabled are handled, but other ops can be easily added.

Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py

Differential Revision: D55456749

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123587
Approved by: https://github.com/angelayi
2024-04-11 22:40:46 +00:00
Angela Yi
b287dbbc24 [export] Fix naming if state dict contains colons (#123601)
Test Plan:
buck2 run mode/opt //aps_models/pyper/ads:train\[inplace\] +training.ir_serializer=on_disk

https://www.internalfb.com/intern/everpaste/?handle=GICWmAB0g_Z1StMCAMxuhJI6U9pHbsIXAAAz

Reviewed By: tugsbayasgalan

Differential Revision: D55894742

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123601
Approved by: https://github.com/pianpwk
2024-04-09 21:25:08 +00:00
Pian Pawakapan
8bd6223730 [export] construct set_grad_enabled HOO subgraph inside other HOO subgraphs (#123391)
Summary:
Reference: https://github.com/pytorch/pytorch/pull/121736

Previously set_grad_enabled nodes in HOO subgraphs (e.g. cond) were inlined and not replaced with their own HOO subgraphs. This diff recursively does that.

Example:
```
class Model(torch.nn.Module):
    def forward(self, x, y):
        def true_fn(x, y):
            with torch.enable_grad():
                return x - y

        return torch.cond(
            x.sum() > 0,
            true_fn,
            lambda x, y: x + y,
            [x, y],
        )
```

Before (printing out `ep.graph_module.true_graph_0`):
```
        class <lambda>(torch.nn.Module):
            def forward(self, arg0_1: "i64[]", arg1_1: "i64[]"):
                # No stacktrace found for following nodes
                _set_grad_enabled = torch._C._set_grad_enabled(True)
                sub: "i64[]" = torch.ops.aten.sub.Tensor(arg0_1, arg1_1);  arg0_1 = arg1_1 = None
                _set_grad_enabled_1 = torch._C._set_grad_enabled(False)
                return (sub,)
```

After:
```
        class GraphModule(torch.nn.Module):
            def forward(self, arg0_1: "i64[]", arg1_1: "i64[]"):
                # No stacktrace found for following nodes
                submod_3 = self.submod_1
                sub: "i64[]" = torch._higher_order_ops.wrap.wrap_with_set_grad_enabled(True, submod_3, arg0_1, arg1_1);  submod_3 = arg0_1 = arg1_1 = None
                return (sub,)

            class GraphModule(torch.nn.Module):
                def forward(self, arg0_1: "i64[]", arg1_1: "i64[]"):
                    # No stacktrace found for following nodes
                    sub: "i64[]" = torch.ops.aten.sub.Tensor(arg0_1, arg1_1);  arg0_1 = arg1_1 = None
                    return sub
```

Differential Revision: D55770138

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123391
Approved by: https://github.com/tugsbayasgalan
2024-04-09 02:08:03 +00:00
Pian Pawakapan
42c2a5477c [export] nn_module_stack to return class name str (#123308)
Previously, `node.meta["nn_module_stack"]` had type `Dict[str, Tuple[str, class]]` when exported, and later `Dict[str, Tuple[str, str]]` after de/serialization. This PR changes it to consistently be `Dict[str, Tuple[str, str]]` for round-trippability, i.e.
```
{..., 'L__self___conv': ('conv', 'torch.nn.modules.conv.Conv2d')}
```

`source_fn_stack` is left untouched in this PR.

note: the `Union[type, str]` type annotations in ONNX are because ONNX goes through both `export.export()` and `_dynamo.export()` (which still has the original `Dict[str, Tuple[str, class]]` format). nn_module_stack from `export.export()` should consistently have the new format, and we verify/test for that in `_trace.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123308
Approved by: https://github.com/zhxchen17, https://github.com/thiagocrepaldi
2024-04-05 21:48:22 +00:00
Pian Pawakapan
d7f23f6826 [export] Restore original placeholder names (part 1: top-level renaming) (#122904)
Summary:
This PR restores original names to placeholder nodes, replacing the default names arg0_1, arg1_1, and so on.

User inputs now follow the signature of mod.forward(), for example forward(x, y) produces nodes x, y. If the tensors are nested in dictionaries, lists, tuples, or dataclasses, the names are a concatenation of the path to the tensor, e.g. x = {'a': torch.randn(4), 'b': [torch.randn(4), torch.randn(4)]} produces nodes x_a, x_b_0, x_b_1.

Parameters, buffers, constants, and custom objects follow the FQN of the object, prefixed by "p", "b", "c", and "obj" respectively. For example, self.bar.l0.weight gets you p_bar_l0_weight.
Effect tokens are named token_1, token_2, and so on, since they are not grounded in model inputs or named attributes.

note: breaking the original diff into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.

Examples:
```python
# params, buffers, constants, inputs, torch.cond

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_l0_weight: "f32[4, 4]", p_l0_bias: "f32[4]", c_alpha: "f32[4]", b_beta: "f32[4]", x_0_a: "f32[4, 4]", y: "f32[4, 4]"):
            # No stacktrace found for following nodes
            mul: "f32[4, 4]" = torch.ops.aten.mul.Tensor(x_0_a, x_0_a)
            t: "f32[4, 4]" = torch.ops.aten.t.default(p_l0_weight);  p_l0_weight = None
            addmm: "f32[4, 4]" = torch.ops.aten.addmm.default(p_l0_bias, y, t);  p_l0_bias = y = t = None
            return addmm

# model code

class Bar(torch.nn.Module):
    def forward(self, x):
        return x * x
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bar = Bar()
        self.l0 = torch.nn.Linear(4, 4)
        self.alpha = torch.randn(4)
        self.register_buffer('beta', torch.randn(4))
    def forward(self, x, y):
        x = x[0]['a']
        mul = self.bar(x)
        z1 = self.l0(y)
        return z1

# custom objects, dataclasses, tokens, constant inputs

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, token_1: "f32[0]", obj_attr, data_x: "f32[4, 4]", data_y: "f32[4, 4]", mode):
            # No stacktrace found for following nodes
            mul: "f32[4, 4]" = torch.ops.aten.mul.Scalar(data_x, 30);  data_x = None
            div: "f32[4, 4]" = torch.ops.aten.div.Tensor_mode(data_y, 1.0, rounding_mode = 'floor');  data_y = None
            add: "f32[4, 4]" = torch.ops.aten.add.Tensor(mul, div);  mul = div = None
            with_effects = torch._higher_order_ops.effects.with_effects(token_1, torch.ops._TorchScriptTesting.takes_foo.default, obj_attr, add);  token_1 = obj_attr = add = None
            getitem: "f32[0]" = with_effects[0]
            getitem_1: "f32[4, 4]" = with_effects[1];  with_effects = None
            return (getitem, getitem_1)

# model code

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attr = torch.classes._TorchScriptTesting._Foo(10, 20)
    def forward(self, data, a=1.0, mode="floor"):
        x = self.attr.add_tensor(data.x) + torch.div(data.y, a, rounding_mode=mode)
        x = torch.ops._TorchScriptTesting.takes_foo(self.attr, x)
        return x

dataclass
class DataClass:
    x: Tensor
    y: Tensor
register_dataclass_as_pytree_node(
    DataClass,
    serialized_type_name="test.DataClass"
)

args = (DataClass(x=torch.randn(4, 4), y=torch.randn(4, 4)), )
kwargs = {'mode': 'floor'}
ep = torch.export.export(Foo(), args, kwargs, strict=False)

```

Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py

Differential Revision: D55456418

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122904
Approved by: https://github.com/angelayi, https://github.com/thiagocrepaldi
2024-04-05 18:56:00 +00:00
Josh Fromm
0c8a165b43 [Export] Improve metadata and output parsing during deserialization (#122793)
Summary:
Deserialization of metadata could encounter a bug where commas are used in valid metadata names. This specifically occurs when a split of a `torch.nn.Sequential` stack is used, but may have other possible triggers. Because the deserialization relies on a comma based string split, such names trigger an error. This change uses a simple regular expression to ignore commas within parentheses to avoid the issue.

I add a test that constructs one such problematic sequential stack and show that it can be properly round-tripped with the improved splitting.

Similarly, deserialization could fail when outputs are not a tensor type. Although such outputs like None or constants are not very useful, they do show up in graphs and export should be able to support them. This change improves output node parsing and adds a corresponding test.

Test Plan: buck test //caffe2/test:test_export -- TestSerialize

Differential Revision: D55391674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122793
Approved by: https://github.com/zhxchen17
2024-04-05 00:25:37 +00:00
Pian Pawakapan
4b1b4db231 [export] Add stack_trace for non-strict export (#121034)
This addresses 2 issues with stack_trace metadata:
- stack_trace is currently missing from nodes in non-strict export
- in strict mode, stack_trace is populated for placeholder nodes, which may not be well-defined (with multiple uses)

We filter the call stack during tracing for calls from forward() methods, or ops in `torch.__init__.py` (e.g. sym_size_int, sym_constrain_range, etc.) to populate stack_trace. A node-level check is also added to _export_non_strict().

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121034
Approved by: https://github.com/angelayi
2024-04-04 22:35:33 +00:00
Boyuan Feng
64d743044d Add inline constraints to non-strict exported program (#123017)
Summary: This PR reduces the difference between strict and non-strict exported program by supporting inline_constraints for non-strict exported program,

Test Plan: CI

Differential Revision: D55547830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123017
Approved by: https://github.com/angelayi
2024-04-02 18:16:16 +00:00
angelayi
ed457c7dbe [export] Add torch_fn (#122693)
This PR adds a new metadata, `torch_fn` which is meant to replace `source_fn_stack` as `source_fn_stack` is not entirely well defined between strict/nonstrict. Previous discussion [here](https://docs.google.com/document/d/1sPmmsmh6rZFWH03QBOe49MaXrQkP8SxoG8AOMb-pFk4/edit#heading=h.anmx9qknhvm).

`torch_fn` represents the torch function that a particular aten operator came from. For example, `torch.nn.Linear` goes down to the `torch.nn.functional.linear` at the `__torch_function__` layer, and then `aten.t/aten.addmm` in the `__torch_dispatch__` layer. So the nodes `aten.t/aten.addmm` will now have the `torch_fn` metadata containing the `torch.nn.functional.linear`.

The `torch_fn` metadata is a tuple of 2 strings: a unique identifier for each torch function call, and the actual torch function `f"{fn.__class__}.{fn.__name__}"`. The purpose of the first value is to distinguish between 2 consecutive calls to the same function. For example, if we had 2 calls to `torch.nn.Linear`, the nodes and corresponding metadata would look something like:
```
aten.t - ("linear_1", "builtin_function_or_method.linear"),
aten.addmm - ("linear_1", "builtin_function_or_method.linear"),
aten.t - ("linear_2", "builtin_function_or_method.linear"),
aten.addmm - ("linear_2", "builtin_function_or_method.linear"),
```

Higher order ops -- currently we can get the torch_fn metadata for nodes within the HOO's subgraph, but after retracing, this becomes the `(cond, higher_order_op.cond)` :( This is because `fx_traceback.set_current_meta` points to the cond node in the toplevel graph, rather than the original node in the subgraph. I think this is because `fx.Interpreter` does not go into the cond subgraphs. (will discuss with Yidi more ab this)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122693
Approved by: https://github.com/tugsbayasgalan
2024-03-30 06:47:15 +00:00
Angela Yi
482d8bf1ea [aoti] Change aot_compile callsites (#122225)
Summary:
Replacing `torch._export.aot_compile` callsites with
```
ep = torch.export._trace._export(.., predispatch=True)   # Traces the given program into predispatch IR
so_path = torch._inductor.aot_compile_ep(ep, ...)  # Takes an exported program and compiles it into a .so
```

This allows us to explicitly split up the export step from AOTInductor. We can later modify tests to do `export + serialize + deserialize + inductor` to mimic internal production use cases better.

Test Plan: CI

Differential Revision: D54808612

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122225
Approved by: https://github.com/SherlockNoMad, https://github.com/khabinov
2024-03-29 21:34:20 +00:00
PyTorch MergeBot
3beb9d85a6 Revert "Add non strict inline constraints and runtime assertions to non-strict exported program (#122722)"
This reverts commit b693fff5d7.

Reverted https://github.com/pytorch/pytorch/pull/122722 on behalf of https://github.com/BoyuanFeng due to This breaks torchrec.distributed.tests.test_pt2.TestPt2: test_kjt__getitem__ ([comment](https://github.com/pytorch/pytorch/pull/122722#issuecomment-2026078351))
2024-03-28 20:42:35 +00:00
Boyuan Feng
b693fff5d7 Add non strict inline constraints and runtime assertions to non-strict exported program (#122722)
This PR reduces the difference between strict and non-strict exported program by

- Support `inline_constraints` for non-strict exported program
- Add runtime assertions for range constraints to non-strict exported program

After this PR, the following unit tests are no longer `expectedFailureNonStrict`:
- test_automatic_constrain_size
- test_export_with_inline_constraints
- test_redundant_asserts
- test_constrain_size_with_constrain_value
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122722
Approved by: https://github.com/pianpwk
2024-03-27 21:20:03 +00:00
Josh Fromm
0c47f8028e Keep example_inputs when saving and loading ExportedProgram (#122618)
Summary:
`torch.export` is a powerful tool for creating a structured and shareable package from arbitrary pytorch code. One great use case of `torch.export` is sharing models or subgraphs in a way that allows results to be easily replicated. However, in the current implementation of `export`, the `example_inputs` field is thrown out. When trying to replicate bugs, benchmarks, or behaviors, losing the original input shapes and values makes the process much messier.

This change adds saving and loading for the `example_inputs` attribute of an `ExportedProgram` when using `torch.export.save` and `torch.export.load`. This simple addition makes `ExportedPrograms`s a fantastic tool for performance and accuracy replication. For example, with this change we enable the following workflow:

```
# Script to create a reproducible accuracy issue with my model.
kwargs = {"fastmath_mode": True}
exp_program = export(my_model, sample_inputs, kwargs)
result = exp_program.module()(*sample_inputs, **kwargs)
# Uhoh, I dont like that result, lets send the module to a colleague to take a look.
torch.export.save(exp_program, "my_model.pt2")
```

My colleague can then easily reproduce my results llike so:

```
# Script to load and reproduce results from a saved ExportedProgram.
loaded_program = torch.export.load("my_model.pt2")
# The following line is enabled by this Diff, we pull out the arguments
# and options that caused the issue.
args, kwargs = loaded_program.example_inputs
reproduced_result = loaded_program.module()(*args, **kwargs)
# Oh I see what happened here, lets fix it.
```

Being able to share exact inputs and arguments makes `ExportedPrograms` much
more clean and powerful with little downside. The main potential issue with this change
is that it does slightly increase the size of saved programs. However, the size of
inputs will be much smaller than parameters in most cases. I am curious to hear
discussion on saved file size though.

The deserialization of `example_inputs` is currently implemented as `Optional`. Although this wont effect users of `export.save` and `export.load`, it does give backwards compatibility to any direct users of `serialize` and `deserialize`.

Test Plan:
This diff includes a new test which exercises the save / load flow with multiple args and kwargs.

```
buck test //caffe2/test:test_export -- TestSerialize
```

Differential Revision: D55294614

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122618
Approved by: https://github.com/zhxchen17
2024-03-26 03:32:44 +00:00
Edward Z. Yang
5891c5b3a6 Factor meta conversion through serializable MetaTensorDesc (#122044)
Fixes https://github.com/pytorch/pytorch/issues/121085

This PR pretty involved so pay attention to this description.  At a high
level, the refactor is intended to be mechanical: anywhere in
MetaConverter where previously we took a Tensor as argument, we now take
a MetaTensorDesc, which contains all of the information that we would
have queried off of the Tensor, but placed into a separate data
structure which we can serialize or use to recreate a fake tensor in
a separate fake tensor mode in exact fidelity to the original.

However, this transformation is not always entirely mechanical.  Here
is what you need to pay attention to:

- The memo table from real Tensor -> meta/fake Tensor is now broken
  into two memo tables: real Tensor -> stable int id -> meta/fake
  Tensor.  The stable int id is needed so that when we do serialization,
  we know when tensors/storages alias each other and can ensure we preserve
  this aliasing upon deserialization.

  The way I have implemented changes the weak reference behavior.
  Previously, when either the real Tensor OR the meta/fake Tensor went
  dead, we would remove the entry from the memo table.  Now, this only
  removes entries from one of the two memo tables.  This semantically
  makes sense, because the user may have held on to the stable int id
  out of band, and may expect a real Tensor to continue to be numbered
  consistently / expect to be able to lookup a meta/fake tensor from
  this id.  If this is unacceptable, it may be possible to rejigger
  the memo tables so that we have real Tensor -> stable int id
  and real Tensor -> meta/fake Tensor, but TBH I find the new
  implementation a lot simpler, and arranging the memo tables in this
  way means that I have to muck around with the real tensor to save
  to the memo table; in the current implementation, I never pass the
  Tensor to meta_tensor function AT ALL, which means it is impossible
  to accidentally depend on it.

- When I fill in the fields of MetaTensorDesc in describe_tensor, I need
  to be careful not to poke fields when they are not valid.  Previously,
  preconditions were implicitly checked via the conditional structure
  ("is this sparse? is this nested?") that is tested before we start
  reading attributes.  This structure has to be replicated in
  describe_tensor, and I have almost assuredly gotten it wrong on my
  first try (I'll be grinding through it on CI; a careful audit will
  help too, by auditing that I've tested all the same conditionals that
  the original access was guarded by.)

- I originally submitted https://github.com/pytorch/pytorch/pull/121821
  for the symbolic shapes change, but it turned out the way I did it
  there didn't actually work so well for this PR.  I ended up just
  inlining the symbolic shapes allocation logic into MetaConverter
  (look for calls to maybe_specialize_sym_int_with_hint), maybe there
  is a better way to structure it, but what I really want is to
  just read sizes/strides/offset directly off of MetaTensorDesc; I
  don't want another intermediate data structure.

- Some fields aren't serializable. These are documented as "NOT
  serializable".  ctx/type should morally be serializable and I just
  need to setup a contract with subclasses to let them be serialized.
  The fake_mode is used solely to test if we are refakefying with
  a pre-existing ShapeEnv and we want to reuse the SymInt
  directly--serializing this case is hopeless but I am kind of hoping
  after this refactor we do not need this at all.  view_func is not
  serializable because it's a bound C implemented method.  Joel has
  promised me that this is not too difficult to actually expose as a
  true data structure, but this is the edgiest of edge cases and there
  is no reason to deal with it right now.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122044
Approved by: https://github.com/eellison
2024-03-25 06:21:17 +00:00
Pian Pawakapan
3f99306452 [export] Remove from_export flag (#122500)
Summary: The flag from_export was incorrectly included in a previous diff (https://www.internalfb.com/diff/D54314379) - it was intended for helping with ExportedProgram verification, but was no longer needed in the final implementation.

Test Plan: Changes no functionality, test/export already covers everything

Differential Revision: D55205857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122500
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
2024-03-22 22:55:14 +00:00
Zhengxu Chen
7fd14ebb52 [export] Use randomized inputs to examples. (#122424)
Summary: as title. replacing all torch.ones to randn

Test Plan: CI

Reviewed By: tugsbayasgalan

Differential Revision: D55206441

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122424
Approved by: https://github.com/tugsbayasgalan
2024-03-22 15:32:28 +00:00
PyTorch MergeBot
f65373e278 Revert "Factor meta conversion through serializable MetaTensorDesc (#122044)"
This reverts commit e2d89e9704.

Reverted https://github.com/pytorch/pytorch/pull/122044 on behalf of https://github.com/jeanschmidt due to Seems that some landrace caused this PR to break lint ([comment](https://github.com/pytorch/pytorch/pull/122044#issuecomment-2015025490))
2024-03-22 12:46:21 +00:00
Edward Z. Yang
e2d89e9704 Factor meta conversion through serializable MetaTensorDesc (#122044)
Fixes https://github.com/pytorch/pytorch/issues/121085

This PR pretty involved so pay attention to this description.  At a high
level, the refactor is intended to be mechanical: anywhere in
MetaConverter where previously we took a Tensor as argument, we now take
a MetaTensorDesc, which contains all of the information that we would
have queried off of the Tensor, but placed into a separate data
structure which we can serialize or use to recreate a fake tensor in
a separate fake tensor mode in exact fidelity to the original.

However, this transformation is not always entirely mechanical.  Here
is what you need to pay attention to:

- The memo table from real Tensor -> meta/fake Tensor is now broken
  into two memo tables: real Tensor -> stable int id -> meta/fake
  Tensor.  The stable int id is needed so that when we do serialization,
  we know when tensors/storages alias each other and can ensure we preserve
  this aliasing upon deserialization.

  The way I have implemented changes the weak reference behavior.
  Previously, when either the real Tensor OR the meta/fake Tensor went
  dead, we would remove the entry from the memo table.  Now, this only
  removes entries from one of the two memo tables.  This semantically
  makes sense, because the user may have held on to the stable int id
  out of band, and may expect a real Tensor to continue to be numbered
  consistently / expect to be able to lookup a meta/fake tensor from
  this id.  If this is unacceptable, it may be possible to rejigger
  the memo tables so that we have real Tensor -> stable int id
  and real Tensor -> meta/fake Tensor, but TBH I find the new
  implementation a lot simpler, and arranging the memo tables in this
  way means that I have to muck around with the real tensor to save
  to the memo table; in the current implementation, I never pass the
  Tensor to meta_tensor function AT ALL, which means it is impossible
  to accidentally depend on it.

- When I fill in the fields of MetaTensorDesc in describe_tensor, I need
  to be careful not to poke fields when they are not valid.  Previously,
  preconditions were implicitly checked via the conditional structure
  ("is this sparse? is this nested?") that is tested before we start
  reading attributes.  This structure has to be replicated in
  describe_tensor, and I have almost assuredly gotten it wrong on my
  first try (I'll be grinding through it on CI; a careful audit will
  help too, by auditing that I've tested all the same conditionals that
  the original access was guarded by.)

- I originally submitted https://github.com/pytorch/pytorch/pull/121821
  for the symbolic shapes change, but it turned out the way I did it
  there didn't actually work so well for this PR.  I ended up just
  inlining the symbolic shapes allocation logic into MetaConverter
  (look for calls to maybe_specialize_sym_int_with_hint), maybe there
  is a better way to structure it, but what I really want is to
  just read sizes/strides/offset directly off of MetaTensorDesc; I
  don't want another intermediate data structure.

- Some fields aren't serializable. These are documented as "NOT
  serializable".  ctx/type should morally be serializable and I just
  need to setup a contract with subclasses to let them be serialized.
  The fake_mode is used solely to test if we are refakefying with
  a pre-existing ShapeEnv and we want to reuse the SymInt
  directly--serializing this case is hopeless but I am kind of hoping
  after this refactor we do not need this at all.  view_func is not
  serializable because it's a bound C implemented method.  Joel has
  promised me that this is not too difficult to actually expose as a
  true data structure, but this is the edgiest of edge cases and there
  is no reason to deal with it right now.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122044
Approved by: https://github.com/eellison
ghstack dependencies: #122018
2024-03-22 03:56:34 +00:00
Zhengxu Chen
b1fa0ce4aa [export] build the infra to rollout predispatch export. (#122326)
Test Plan:
fbcode:caffe2/test/quantization:test_quantization
fbcode:bolt/nn/executorch/backends/tests:qnn_test
fbcode:on_device_ai/helios/compiler_tests/...
fbcode:pyspeech/tests:pyspeech_utils_test_oss
fbcode:caffe2/test:quantization_pt2e_qat
fbcode:on_device_ai/Assistant/Jarvis/tests:test_custom_ops
fbcode:modai/test:test_modai
fbcode:executorch/exir/backend/test:test_partitioner

Differential Revision: D55133846

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122326
Approved by: https://github.com/tugsbayasgalan
2024-03-22 00:55:10 +00:00
Pian Pawakapan
c20bc18d59 [export] allow static constraints in dynamic_shapes (#121860)
This PR allows users to specify int values for dimensions in dynamic_shapes as well as None, for example:

```
class Foo(torch.nn.Module):
    def forward(self, x, y, z):
        ...

    foo = Foo()
    inputs = (torch.randn(4, 6), torch.randn(5, 4), torch.randn(3, 3))

for dynamic_shapes in [
    None
    ((4, 6), (5, 4), (3, 3)),
    ((None, 6), None, {0: 3, 1: 3})
]:
    _ = export(foo, inputs, dynamic_shapes=dynamic_shapes)
```

All of the above should produce the same ExportedProgram.

This is done by temporarily creating a static dim constraint during analysis, where vr.lower == vr.upper. These constraints are then deleted during _process_constraints(), and do not show up in the final ExportedProgram's range_constraints.

Additionally, export() will also fail if the shapes are mis-specified, for example:
```
_ = export(foo, inputs, dynamic_shapes=((5, None), None, None))
```
leads to `torch._dynamo.exc.UserError: Static shape constraint of 5 does not match input size of 4, for L['x'].size()[0]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121860
Approved by: https://github.com/avikchaudhuri
2024-03-21 16:59:59 +00:00
Sherlock Huang
ae913175c3 Fix GraphModuleDeserializer (#122342)
Summary: self.constants is used in self.deserialize_signature()

Test Plan: CI

Differential Revision: D55152971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122342
Approved by: https://github.com/zhxchen17
2024-03-21 02:27:39 +00:00
Zhengxu Chen
f8565c4a28 [sigmoid] Clean up serialization API. (#122102)
Summary: Entirely remove the old serializer code to avoid further confusion and code bloat.

Test Plan: CI

Reviewed By: SherlockNoMad

Differential Revision: D54857118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122102
Approved by: https://github.com/tugsbayasgalan
2024-03-20 03:45:36 +00:00
Pian Pawakapan
c5ffebebab [export] allow Dim(1,2) for export dynamic shapes (v2 after revert) (#121910)
Creating this after [PR](https://github.com/pytorch/pytorch/pull/121642) got reverted.

Current dynamic shapes implementation fixes lower range of Dims to be 2 for analysis, but allows 0/1 shapes during runtime. This leads to failures when initializing Dim(1,2). This PR sets the lower bound to 0, and avoids erroring out when conflicting with the generated (2, maxsize) constraint during analysis.

Also resolves a derived dim constraints issue with the following code:
```
class Bar(torch.nn.Module):
    def forward(self, x, y):
        return x + y[1:]

dx = Dim("dx", min=1, max=3)
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx+1, 1: None})
)
print(ep.range_constraints)
```

In main:
```
{s0: ValueRanges(lower=2, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=3, upper=4, is_bool=False)}
```

This PR:
```
{s0: ValueRanges(lower=1, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=2, upper=4, is_bool=False)}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121910
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
2024-03-19 19:08:05 +00:00
PyTorch MergeBot
d56ab7b020 Revert "[torch export][serialize] create a more compact stacktrace format for serialization (#121675)"
This reverts commit eae89138d8.

Reverted https://github.com/pytorch/pytorch/pull/121675 on behalf of https://github.com/jeanschmidt due to It seems that this PR broke lint jobs, I am reverting to confirm if this is the case ([comment](https://github.com/pytorch/pytorch/pull/121675#issuecomment-2007919486))
2024-03-19 19:02:09 +00:00
Wenting Wang
eae89138d8 [torch export][serialize] create a more compact stacktrace format for serialization (#121675)
Summary:
- we want fx nodes' stack trace format to be backward compatible and same as before in the program we export
- however in the serialized format, we would want to show a more compact stack_trace format, otherwise the nodes attributes are dominated by stack traces
- the diff implements the minimal in serialization process to dedupe node stack traces by resorting to a fileinfo_list and a filename_to_abbrev map, so we can use index to represent filenames, use lineno to represent lines.

Test Plan:
# llm
base on D54497918
```
buck2 run @//mode/dev-nosan fbcode//executorch/examples/models/llama2:export_llama -- -c ~/stories110M.pt -p ~/params.json
```
set up breakpoint after serialization/deserialization
- serialize
```
(Pdb) v_meta = [n.meta for n in exported_program.graph_module.graph.nodes]
(Pdb) paste_client.create_phabricator_paste_object(paste_creation_client_id=1093956601162697, content=str(v_meta)).number
1193647450
(Pdb) json_program = json.dumps(_dataclass_to_dict(serialized_graph.co_fileinfo_ordered_list),cls=EnumEncoder)
(Pdb) json_bytes = json_program.encode('utf-8')
(Pdb) paste_client.create_phabricator_paste_object(paste_creation_client_id=1093956601162697, content=str(json_bytes)).number
1193604333
(Pdb) sys.getsizeof(json_bytes)
3846
(Pdb) compressed_bytes = zstd.ZstdCompressor().compress(json_bytes)
(Pdb) sys.getsizeof(compressed_bytes)
1139
```
in P1193647450 (before serialization), search for `stack_trace`
in P1193604333 (after serialization), search for `stack_trace` and `co_fileinfo_ordered_list`

[note: didn't do compression in this diff since the size is pretty small and it adds complexity if we do compression]
- deserialize
```
(Pdb) v_meta = [n.meta for n in deserialized_exported_program.graph_module.graph.nodes]
(Pdb) paste_client.create_phabricator_paste_object(paste_creation_client_id=1093956601162697, content=str(v_meta)).number
1193629435
```
in P1193629435, search for `stack_trace`

# ads

Differential Revision: D54654443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121675
Approved by: https://github.com/angelayi
2024-03-19 17:58:12 +00:00
Pian Pawakapan
3bd38928ba [export] Improve consistency for nn_module_stack metadata, add checks to _trace.py (#120661)
We would like to improve consistency for nn_module_stack metadata in torch.export.

This PR ensures that all tests in test/export/test_export.py has the following constraints:
- Remove nn_module_stack for all placeholder & output nodes, for all modules and submodules
- Ensure nn_module_stack is present for all other node types for the top-level module (there is still an issue with torch.cond submodules having empty fields)
- Add these checks to _export() in _trace.py (we would add this in the Verifier, but downstream apps construct ExportedPrograms separate from _export(), and metadata may not be maintained there)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120661
Approved by: https://github.com/avikchaudhuri
2024-03-16 21:44:52 +00:00
Wenting Wang
dfc5e9325d format caffe2/torch/_export/serde/serialize.py (#121670)
Summary: black caffe2/torch/_export/serde/serialize.py

Test Plan: tests

Differential Revision: D54654847

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121670
Approved by: https://github.com/angelayi
2024-03-15 21:30:16 +00:00
angelayi
ef25d83a62 [export] Add serialization support for tokens (#121552)
Differential Revision: [D54906766](https://our.internmc.facebook.com/intern/diff/D54906766)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121552
Approved by: https://github.com/zhxchen17
2024-03-15 16:15:11 +00:00
Zhengxu Chen
c409292197 [sigmoid] Use deserializer from oss. (#121839)
Summary:
Old path:
thrift -> thrift deserializer -> graph module.
new path:
thrift -> python dataclass -> oss deserializer -> graph_module

Test Plan:
CI
buck2 test mode/dev-nosan caffe2/test/inductor/fb:test_aot_inductor_pt2_inference

Reviewed By: SherlockNoMad

Differential Revision: D54855251

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121839
Approved by: https://github.com/angelayi
2024-03-14 18:38:58 +00:00
PyTorch MergeBot
bf7ac4ddf7 Revert "[export] allow Dim(1,2) for export dynamic shapes (#121642)"
This reverts commit a8dcbf2749.

Reverted https://github.com/pytorch/pytorch/pull/121642 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/121642#issuecomment-1996121710))
2024-03-13 23:51:20 +00:00
Pian Pawakapan
a8dcbf2749 [export] allow Dim(1,2) for export dynamic shapes (#121642)
Current dynamic shapes implementation fixes lower range of Dims to be 2 for analysis, but allows 0/1 shapes during runtime. This leads to failures when initializing Dim(1,2). This PR sets the lower bound to 0, and avoids erroring out when conflicting with the generated (2, maxsize) constraint during analysis.

Also resolves a derived dim constraints issue with the following code:
```
class Bar(torch.nn.Module):
    def forward(self, x, y):
        return x + y[1:]

dx = Dim("dx", min=1, max=3)
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx+1, 1: None})
)
print(ep.range_constraints)
```

In main:
```
{s0: ValueRanges(lower=2, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=3, upper=4, is_bool=False)}
```

This PR:
```
{s0: ValueRanges(lower=1, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=2, upper=4, is_bool=False)}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121642
Approved by: https://github.com/avikchaudhuri
2024-03-13 22:59:07 +00:00
Sherlock Huang
dd568f4207 [Export, AOTInductor] Populate ShapeEnv's var_to_val during deserialization (#121759)
Summary:
Deserialization didn't populate ShapeEnv's `var_to_val` field properly, and AOTInductor is relying on this field to compile dynamic shape properly.
As a result, when AOTI failed at compiling a deserialized ExportedProgram.

Test Plan: buck2 test  mode/dev-nosan caffe2/test/inductor/fb:test_aot_inductor_pt2_inference

Differential Revision: D54559494

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121759
Approved by: https://github.com/avikchaudhuri
2024-03-13 21:28:25 +00:00
Avik Chaudhuri
7fe0cc53e9 make _process_dynamic_shapes an implementation detail (#121713)
Summary: `_process_dynamic_shapes` converts new dynamic shapes to old constraints, but in the future may not need to do so. Preparing for that future.

Test Plan: CI

Differential Revision: D54780374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121713
Approved by: https://github.com/tugsbayasgalan
2024-03-13 08:33:00 +00:00
Zhenghao Zhao
3461404869 [pt2 export]fix name collision on constant name (#121145)
Summary: Taking the right most part of the fqn will cause name conflict when having multiple instances of the same class. Changed to replace "." in fqn by "_" to avoid invalid syntax in input args.

Test Plan: added test case

Differential Revision: D54435230

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121145
Approved by: https://github.com/zhxchen17
2024-03-11 20:40:59 +00:00
Oguz Ulgen
660ec3d38d [Export] Fix bug removing node from wrong graph (#121574)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121574
Approved by: https://github.com/ydwu4
2024-03-10 04:46:11 +00:00
angelayi
e8836759d0 [export] Add effect token to export (#121424)
Following the creation of effect tokens (https://github.com/pytorch/pytorch/pull/120296), we want to now add support for these tokens in export because the calling/returning convention has changed. The inputs are now `(tokens, params, buffers, constants, user_inputs)` and the outputs are `(tokens, buffer_mutations, user_mutations, user_outputs)`. The graph looks something like:
```
graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %attr : [num_users=2] = placeholder[target=attr]
    %arg1_1 : [num_users=2] = placeholder[target=arg1_1]
    %with_effects : [num_users=2] = call_function[target=torch._higher_order_ops.effects.with_effects](args = (%arg0_1, _TorchScriptTesting.takes_foo.default, %attr, %arg1_1), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects, 0), kwargs = {})
    %getitem_1 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects, 1), kwargs = {})
    %with_effects_1 : [num_users=2] = call_function[target=torch._higher_order_ops.effects.with_effects](args = (%getitem, _TorchScriptTesting.takes_foo.default, %attr, %getitem_1), kwargs = {})
    %getitem_2 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects_1, 0), kwargs = {})
    %getitem_3 : [num_users=1] = call_function[target=operator.getitem](args = (%with_effects_1, 1), kwargs = {})
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, %getitem_3), kwargs = {})
    return (getitem_2, add)
```

During unlifting, we will first remove the tokens and with_effect calls using the `remove_effect_tokens` pass. (cc @SherlockNoMad on the pass to remove tokens). This is so that this won't change the calling conventions when retracing. The graph after unlifting looks something like:
```
graph():
    %attr_1 : [num_users=2] = get_attr[target=attr]
    %arg1_1 : [num_users=2] = placeholder[target=arg1_1]
    %takes_foo_default_1 : [num_users=1] = call_function[target=torch.ops._TorchScriptTesting.takes_foo.default](args = (%attr_1, %arg1_1), kwargs = {})
    %takes_foo_default : [num_users=1] = call_function[target=torch.ops._TorchScriptTesting.takes_foo.default](args = (%attr_1, %takes_foo_default_1), kwargs = {})
    %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, %takes_foo_default), kwargs = {})
    return (add,)
```

Serialization support will be added in a followup.
Note: tokens only affect custom ops that take in ScriptObjects, not ScriptObject methods yet.

Differential Revision: [D54639390](https://our.internmc.facebook.com/intern/diff/D54639390)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121424
Approved by: https://github.com/tugsbayasgalan
2024-03-09 02:43:26 +00:00
Avik Chaudhuri
b3f24b57fb fix accidental specialization with faketensor input checks (#121460)
Summary: When fake tensors are passed to a graph module and we do runtime assertions on them, we can accidentally trigger specialization guards. It's better to just relax the checking for these.

Test Plan: confirmed that problem in T181400371 is now fixed

Differential Revision: D54658960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121460
Approved by: https://github.com/angelayi
2024-03-08 08:02:37 +00:00
Zhengxu Chen
76f1461892 [export] Serialize union fields with single entry dict. (#121263) (#121337)
Summary:

remove "$type" and "$value" fields, instead only serialize as {type: value} for union fields directly.

bypass-github-export-checks

Test Plan: CI

Differential Revision: D54600943

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121337
Approved by: https://github.com/tugsbayasgalan
2024-03-07 21:24:28 +00:00
PyTorch MergeBot
23fb37fa41 Revert "[export] Serialize union fields with single entry dict. (#121263)"
This reverts commit 7feabe9b73.

Reverted https://github.com/pytorch/pytorch/pull/121263 on behalf of https://github.com/osalpekar due to A large number of inductor benchmarking jobs failing starting this PR. See for details: 7feabe9b73 ([comment](https://github.com/pytorch/pytorch/pull/121263#issuecomment-1981680049))
2024-03-06 19:58:55 +00:00