Summary:
Make quantization tests compatible with the new training IR.
With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node.
We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training IR and False otherwise. This way, the code supports both the training IR and the old IR.
For now, we are just rolling out the training IR for fbcode internal tests.
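A minimal sketch of how a test could branch on that flag; the helper name comes from this summary, but the old-IR batch norm op named below is an assumption:
```python
import torch

def expected_bn_pattern(using_training_ir: bool):
    # `using_training_ir` would come from capture_pre_autograd_graph_using_training_ir().
    if using_training_ir:
        # Training IR: a single-output batch norm node, so no trailing getitem.
        return torch.ops.aten.batch_norm.default, False
    # Old IR (op name assumed): batch norm returns a tuple, so a getitem node follows.
    return torch.ops.aten._native_batch_norm_legit.default, True
```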
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```
Reviewed By: andrewor14, tugsbayasgalan
Differential Revision: D61292102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
Part of #134054.
This corresponds to the PyTorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So we are landing these 'type: ignore' comments for PyTorch in advance of their actually being needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
Summary:
Previously, reuse of the same `Dim` was encoded by "sharing" internal constraints among constraint targets. This kind of sharing, implemented using `shared` fields between `_Constraint`s, was originally motivated by `dynamic_dim`, specifically to support `==` between `dynamic_dim`s, but we no longer need to maintain this overcomplicated structure: we can simply use names of `Dims` to directly encode sharing information.
Thus this PR vastly simplifies the structure of `_Constraint` by removing `shared` fields. As a result, both `_Constraint` and its moral subclass, `_DerivedConstraint`, are 1-1 with `Dim` and its moral subclass, `DerivedDim`.
Note that this will break `==` over `dynamic_dim`, so an immediate follow-up will be to remove `dynamic_dim` entirely from our public API. (It's been more than 6 months since the deprecation warning anyway.) I just didn't want to deal with that process in the same PR.
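A minimal sketch of sharing-by-name with the public `torch.export` API; the module and shapes are illustrative:
```python
import torch
from torch.export import Dim, export

class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

batch = Dim("batch")
# Reusing the same Dim for both inputs shares the name "batch"; the name alone now
# encodes the equality, with no `shared` constraint bookkeeping behind the scenes.
ep = export(
    Add(),
    (torch.randn(4, 3), torch.randn(4, 3)),
    dynamic_shapes={"x": {0: batch}, "y": {0: batch}},
)
```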
Test Plan: existing
Differential Revision: D61559413
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134045
Approved by: https://github.com/pianpwk
Summary:
Change ReorderConvertTest to work with the new `capture_pre_autograd_graph` implementation using D61175223.
Note that `ReorderConvertTest` no longer works with the old `capture_pre_autograd_graph`.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/passes/tests:optimize_test -- -r ReorderConvertTest
```
Differential Revision: D61507772
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134010
Approved by: https://github.com/tugsbayasgalan
Summary:
Skip re-exporting modules with duplicated types to speed up the exportability tests.
In real models there are many duplicated modules, and they mostly have the same export issues.
Test Plan: Existing CI
Differential Revision: D61504630
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133930
Approved by: https://github.com/angelayi
Co-authored-by: bearzx <bearzx@fb.com>
Summary: Some elements of a tensor list output do not have a user. In such cases, create the name `{node_name}_unused_{index}` for them.
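A hedged sketch of that naming rule; the helper and its call site are hypothetical:
```python
def name_for_list_output_element(node_name: str, user_name: str | None, index: int) -> str:
    # Elements of a tensor-list output that have a user keep that user's name;
    # unused elements get a synthesized `{node_name}_unused_{index}` name instead.
    return user_name if user_name is not None else f"{node_name}_unused_{index}"
```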
Test Plan: OSS CI
Differential Revision: D61309011
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133539
Approved by: https://github.com/zhxchen17
Sorryyyyy for another refactor. This splits `_process_dynamic_shapes` into 3 parts:
1. `_combine_args` - mostly the same thing
2. `_check_dynamic_shapes`, which is responsible for raising 99% of UserErrors if the dynamic shapes spec is invalid (minus 1 UserError with DerivedDims)
3. `_process_dynamic_shapes`, which, for now, is the same thing minus the checks in 2.
This refactor is helpful for incoming automatic dynamic shapes work, because we're switching to `assume_static_by_default=False`, which is what `_dynamo.export` currently does. This means any unspecified dims are allocated a symbol, in contrast to export today, which keeps unspecified dims static. Historically this has been desirable - export users don't want too much dynamism. So we want to change how the spec is translated into constraints.
This means when we switch over to automatic dynamic shapes, we want to plug in something in between steps 2. and 3. which patches up the spec for `assume_static_by_default=False`, filling in static shapes for any unspecified dims, and potentially clearing out the auto-dynamic dims (since they're no-ops). We would do this in-between 2. and 3. to keep `_process_dynamic_shapes` semantically the same, since it's used with `_dynamo.export`.
We could do this without a refactor, plugging in this transform before `_process_dynamic_shapes`, but since that function's responsible for both spec checking + constraint production, moving spec checking to before we transform the specs helps guarantee we're raising errors on what the user's specified, and not an internal export bug.
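A hedged sketch of the resulting three-stage flow; the driver function and the helpers' exact signatures are assumptions:
```python
from torch.export.dynamic_shapes import (  # private helpers named above
    _combine_args,
    _check_dynamic_shapes,
    _process_dynamic_shapes,
)

def build_constraints(func, args, kwargs, dynamic_shapes):
    combined_args = _combine_args(func, args, kwargs)      # 1. normalize user args
    _check_dynamic_shapes(combined_args, dynamic_shapes)   # 2. raise UserErrors on a bad spec
    # A transform for assume_static_by_default=False would slot in here, filling in
    # static shapes for unspecified dims before constraints are produced.
    return _process_dynamic_shapes(combined_args, dynamic_shapes)  # 3. produce constraints
```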
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133391
Approved by: https://github.com/avikchaudhuri
Summary:
Add a special field in Graph and Node level metadata called "custom", which should be mapped to a JSON-serializable object. We guarantee this field will always be preserved across the following transformations (see the sketch after the list):
1. copy/deepcopy
2. run_decompositions()
3. serialization
4. re-exporting
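A minimal sketch of tagging nodes with that "custom" field; the tag contents and the round-trip check are illustrative:
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

ep = torch.export.export(M(), (torch.randn(2, 2),))
for node in ep.graph.nodes:
    if node.op == "call_function":
        node.meta["custom"] = {"stage": "annotated"}  # must be JSON-serializable

# Per the guarantee above, the "custom" entries are expected to survive deepcopy,
# run_decompositions(), serialization, and re-export.
ep = ep.run_decompositions()
```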
Test Plan: :test_export -- -r custom_tag
Reviewed By: angelayi
Differential Revision: D60291839
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131912
Approved by: https://github.com/angelayi
The process of inlining HOO subgraphs (e.g. set_grad_enabled) seems to break node.users when a node is present in multiple subgraphs, for example:
```
class SetGradCase(torch.nn.Module):
    def forward(self, x):
        _x = x.shape[0] + 2
        _xx = _x + 2
        with torch.no_grad():
            y = _x * 4
        return _xx, y
```
The `_x` node should contain 2 users (`_xx` and `y`) after being inlined, but on inspection it only contains `y` as a user.
Previously we were completely clearing node.users for output nodes in HOO subgraphs before inlining them; we should instead only be deleting the subgraph output nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133144
Approved by: https://github.com/larryliu0820, https://github.com/ydwu4
Summary:
Add back the change in 19897a1647.
The change was lost in refactoring due to a bad rebase.
Test Plan:
CI
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//torchrec/distributed/tests:test_pt2 -- --filter-text test_sharded_quant_fpebc_non_strict_export
```
Differential Revision: D61052687
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133142
Approved by: https://github.com/ydwu4
Summary: When PyTree detects a structural mismatch between inputs and dynamic shapes, the error messages are quite horrible. This PR fixes these error messages by adding, for each kind of error, the path to the point where the error happens and an actionable reason for the error.
Test Plan: added test with several cases
Differential Revision: D60956976
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132982
Approved by: https://github.com/yushangdi
#### Description
Transform quantized operations properly by adding de/quantization before and after the quantized operation.
#### Test Plan
`pytest test/export/test_converter.py -s -k test_ts2ep_convert_quantized_model`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133026
Approved by: https://github.com/angelayi
#### Description
Transform quantized operations properly by adding de/quantization before and after the quantized operation.
#### Test Plan
`pytest test/export/test_converter.py -s -k test_ts2ep_convert_quantized_model`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131915
Approved by: https://github.com/angelayi
Summary:
A re-land of D60006710.
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.
edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
A constant tensor attr was lifted as a constant_tensor input, but it's not used in the graph after aot_autograd due to a shortcut in its decomposition. This causes the setattr to be removed by eliminate_dead_code, but the graph signature still contains the name of that buffer, which causes an inconsistency between the transformed graph and the ep's original signature after _unlift. It seems this has happened a few times, where some nodes are accidentally removed and we're left in an inconsistent state.
The alternative to removing it would be: every time we call eliminate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the metadata, but I think this deserves a complete design.
edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module().
Differential Revision: D60532628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307
Approved by: https://github.com/zhxchen17
Debugged with @leslie-fang-intel, and we found that https://github.com/pytorch/pytorch/issues/132561 and https://github.com/pytorch/pytorch/issues/132569 both fail because `capture_pre_autograd_graph` does not work on Windows.
So we added some code to raise a message and let the end user know that.
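A hedged sketch of the added guard; its exact placement in `torch/_export/__init__.py` and the platform check are inferred from the tracebacks below:
```python
import sys

def capture_pre_autograd_graph(*args, **kwargs):
    if sys.platform == "win32":
        raise RuntimeError("capture_pre_autograd_graph not yet supported on Windows")
    ...  # existing capture logic
```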
Detailed:
For https://github.com/pytorch/pytorch/issues/132561
```cmd
Traceback (most recent call last):
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 59, in testPartExecutor
yield
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 591, in run
self._callTestMethod(testMethod)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 549, in _callTestMethod
method()
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2918, in wrapper
method(*args, **kwargs)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 1515, in wrapper
fn(*args, **kwargs)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 399, in wrapper
fn(*args, **kwargs)
File "D:\xu_git\dnnl_cb\pytorch\test\quantization\pt2e\test_x86inductor_quantizer.py", line 1737, in test_qat_conv2d
self._test_quantizer(
File "D:\xu_git\dnnl_cb\pytorch\test\quantization\pt2e\test_x86inductor_quantizer.py", line 553, in _test_quantizer
m = capture_pre_autograd_graph(
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_export\__init__.py", line 121, in capture_pre_autograd_graph
raise RuntimeError("capture_pre_autograd_graph not yet supported on Windows")
RuntimeError: capture_pre_autograd_graph not yet supported on Windows
To execute this test, run the following from the base repo dir:
python test\quantization\pt2e\test_x86inductor_quantizer.py -k TestQuantizePT2EX86Inductor.test_qat_conv2d
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
```
For https://github.com/pytorch/pytorch/issues/132569
```cmd
Traceback (most recent call last):
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 59, in testPartExecutor
yield
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 591, in run
self._callTestMethod(testMethod)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 549, in _callTestMethod
method()
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2918, in wrapper
method(*args, **kwargs)
File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_torchinductor.py", line 11218, in new_test
return value(self)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_dynamo\testing.py", line 312, in _fn
return fn(*args, **kwargs)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\contextlib.py", line 79, in inner
return func(*args, **kwds)
File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_cpu_cpp_wrapper.py", line 155, in fn
_, code = test_torchinductor.run_and_get_cpp_code(
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_inductor\utils.py", line 1863, in run_and_get_cpp_code
result = fn(*args, **kwargs)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 415, in wrapper
fn(*args, **kwargs)
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 367, in wrapper
fn(*args, **kwargs)
File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_mkldnn_pattern_matcher.py", line 1668, in test_qlinear_gelu_cpu
self._qlinear_unary_cpu_test_helper((torch.randn((2, 4)),), gelu)
File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_mkldnn_pattern_matcher.py", line 1615, in _qlinear_unary_cpu_test_helper
self._test_common(
File "D:\xu_git\dnnl_cb\pytorch\test\inductor\test_mkldnn_pattern_matcher.py", line 165, in _test_common
convert_model = _generate_qdq_quantized_model(
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_quantization.py", line 2949, in _generate_qdq_quantized_model
export_model = capture_pre_autograd_graph(
File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\_export\__init__.py", line 121, in capture_pre_autograd_graph
raise RuntimeError("capture_pre_autograd_graph not yet supported on Windows")
RuntimeError: capture_pre_autograd_graph not yet supported on Windows
To execute this test, run the following from the base repo dir:
python test\inductor\test_cpu_cpp_wrapper.py -k DynamicShapesCppWrapperCpuTests.test_qlinear_gelu_cpu_dynamic_shapes_cpp_wrapper
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
--------------------------------------------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------------------------------------------
W0807 13:24:34.291000 11228 torch\_export\__init__.py:64] +============================+
W0807 13:24:34.291000 11228 torch\_export\__init__.py:65] | !!! WARNING !!! |
W0807 13:24:34.291000 11228 torch\_export\__init__.py:66] +============================+
W0807 13:24:34.291000 11228 torch\_export\__init__.py:67] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
W0807 13:24:34.291000 11228 torch\_export\__init__.py:68] Please switch to use torch.export instead.
```
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132841
Approved by: https://github.com/jgong5, https://github.com/ezyang
Summary:
- make default DCE pass check schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added).
- mark Proxy dump as NotImplemented for better error msg
- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.
More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.
Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node
```
Reviewed By: angelayi
Differential Revision: D60319175
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
Summary:
Reland of D60206382.
Suggested in https://github.com/pytorch/pytorch/issues/128394.
If there's an autocast context manager, the predispatch (strict) graph can look something like:
```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None
        return (mm_1,)
```
But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher-order operator and making a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.
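A hedged sketch of the resulting graph structure; the higher-order op name and the printed module below are illustrative, not the exact output of the pass:
```
class GraphModule(torch.nn.Module):
    def forward(self, rand, rand_1):
        submod_1 = self.submod_1
        # The block between _enter_autocast/_exit_autocast becomes a submodule,
        # invoked through a higher-order autocast wrapper instead of the raw
        # _enter/_exit nodes.
        wrapped = torch.ops.higher_order.wrap_with_autocast(
            'cuda', torch.bfloat16, True, None, submod_1, rand, rand_1)
        mm = wrapped[0]
        return (mm,)
```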
Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and do not create a submodule if the autocast args match the current autocast status.
Test Plan:
CI
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_autocast"
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_set_grad"
```
Verified that now we can export the llama model in gh issue 128394 and the gemma model in gh issue 131829 without error.
Differential Revision: D60770038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132677
Approved by: https://github.com/angelayi
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.
If there's an autocast context manager, the predispatch (strict) graph can look something like:
```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None
        return (mm_1,)
```
But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher-order operator and making a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.
Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and do not create a submodule if the autocast args match the current autocast status.
Test Plan:
CI
```
parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```
Reviewed By: angelayi
Differential Revision: D60206382
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
#### Issue
ScriptObject was previously treated as a normal attribute by the converter. This PR lifts it to be a constant and converts it directly to a GetAttr fx node. ScriptObject would also trigger `CallMethod`, and this PR adds that support as well.
#### Test Plan
Add test case for ScriptObject.
`pytest test/export/test_converter.py -s -k test_convert_script_object`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130952
Approved by: https://github.com/angelayi
Some sympy Functions aren't supported by sympy_interp(); we can't turn them into FX nodes, so currently the runtime asserts CSE pass avoids CSE'ing on any expression containing a sympy Function. https://github.com/pytorch/pytorch/pull/132325 started tracking unsupported functions, so we switch the check to that to be more precise. We also check for and skip unsupported functions when adding asserts - previously we only did the check for CSE, and not when adding new expressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132457
Approved by: https://github.com/avikchaudhuri
Summary:
- moves logging functionalities into `torch/_export/db/logging.py` file.
- add a check in `_dynamo/eval_frame.py` for optional inputs and error out with `UnsupportedError`
- change the case name of `torch_sym_int` to `unsupported_operator`
- Check if the case name is registered in exportdb, if so, we give a link to the case in exportdb.
- TODO: add test
Test Plan:
CI
Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging:
```
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633, 1.2414, -0.1071],
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] [-0.1936, -0.9425, -0.0824]])
E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case. https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input
......
File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching
raise Unsupported(
torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet
```
It also logs a `export.error.classified` event in Scuba.
Reviewed By: zhxchen17
Differential Revision: D60427208
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420
Approved by: https://github.com/zhxchen17
Summary:
#### Description
Add support for aten::append with a Python function that returns a new list containing the appended element. We then update the `fx_node` in the `name_to_node` mapping.
aten::append contributed by Jiashen Cao <jiashenc@meta.com>
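A hedged sketch of the functional-append approach; the converter's bookkeeping around `name_to_node` is an assumption:
```python
def aten_append(lst: list, el):
    # Functional form of aten::append: return a new list instead of mutating the
    # input, so the converter can rebind the list's name (its fx_node in
    # name_to_node) to the newly produced value.
    return lst + [el]
```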
Fix conversion for csr_ranker_test
```
model_name: csr_ranker_test_4.ptl
has_ts_model: True
has_sample_inputs: True
ops_maybe_missing_meta: set()
script_objects: set()
ts_can_run: True
ts_run_exception: None
can_convert: True
convert_exception: None
ep_result_correct: True
ep_run_exception: None
can_package: True
package_exception: None
sigmoid_can_run: False
sigmoid_run_exception: RuntimeError('not for symbolics')
sigmoid_result_correct: None
```
Test Plan:
test_aten_add_t
test_aten_append_t
test_aten_to_dtype_with_mutating_storage
buck2 run mode/opt sigmoid/inference/ts_migration:main -- --mode test_one --model_name csr_ranker_test
Differential Revision: D60635893
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132529
Approved by: https://github.com/jiashenC
Summary:
Fixes T197371132.
Previously, we called copy.deepcopy to avoid mutating the original signature. However, this causes errors when the signature references a FakeScriptObject, which in turn references a real torch.ScriptObject, due to "The tensor has a non-zero number of elements, but its data is not allocated yet."
We therefore just change it to a shallow copy. This should be good enough for guarding the signature.
Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_ebc_non_strict_export"
Differential Revision: D60476839
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132181
Approved by: https://github.com/BoyuanFeng
Context:
We are planning to make a BC breaking change to `torch.load` by flipping the default for `weights_only` from `False` --> `True` in a future release. With `weights_only=True`, a custom unpickler is used that limits what can be loaded to state_dicts containing tensors (there is also a way for the user to allowlist specific things to be loaded). The goal of this is to attempt to prevent remote execution of arbitrary code when using `torch.load`.
To my understanding, in export, `torch.load` is used internally to load arbitrary objects, so we should set `weights_only=False` here to prevent the flip from breaking export.
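A minimal illustration of pinning the legacy behavior at such a call site (the actual location inside export's deserialization code is not shown here):
```python
import torch

def _load_export_artifact(f):
    # Export deserializes arbitrary (non-tensor) objects, so opt out explicitly;
    # the future weights_only=True default must not apply to this call.
    return torch.load(f, weights_only=False)
```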
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132348
Approved by: https://github.com/angelayi
Summary: Currently, running explain with TORCH_LOGS enabled causes duplicate logging because explain uses the exact same code path as conversion. This PR just disables logging when running explain, and moves all logging to convert() to prevent logging from __init__ when we are just using explain.
Test Plan: Manual testing with attached outputs.
Reviewed By: SherlockNoMad, angelayi
Differential Revision: D60199007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132082
Approved by: https://github.com/ydwu4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/130379.
The original error is that the verifier finds the placeholder nodes' meta["val"] missing in the subgraph of the WrapSetGradEnabled HOP.
In this PR, we fix it by re-ordering the replace_set_grad_with_hop_pass with the lift_constant_tensor pass, because only after the lift_constant_tensor pass do all the constant attrs start to have meta["val"].
Test Plan: buck2 test test:test_export -- -r "test_setgrad_lifted_tensor"
Differential Revision: D60244935
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131787
Approved by: https://github.com/yushangdi
Summary:
Dynamo doesn't track whether buffers are `persistent`. This led to some ugly code where we would mark buffers as always persistent when creating signatures, then later check whether the buffers were not in the state dict to infer whether they were non-persistent, and use this to fix up the signature.
This PR instead defines a utility to look up all the non-persistent buffers registered inside a module (this information is recorded in a private `_non_persistent_buffers_set` module attribute), and uses it to (a) correctly set the persistent flag on buffers when creating signatures (b) transfer this information to a Dynamo-traced graph module, which then causes non-persistent buffers to (correctly) not show up in the state dict.
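A hedged sketch of such a utility; the function name is hypothetical, but `_non_persistent_buffers_set` is the private attribute mentioned above:
```python
import torch

def non_persistent_buffer_fqns(mod: torch.nn.Module) -> set[str]:
    # Walk all submodules and collect fully qualified names of buffers registered
    # with persistent=False, as recorded in each module's _non_persistent_buffers_set.
    fqns = set()
    for path, submod in mod.named_modules():
        for name in getattr(submod, "_non_persistent_buffers_set", ()):
            fqns.add(f"{path}.{name}" if path else name)
    return fqns
```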
Test Plan: existing tests + new case with non-persistent buffer in nested module
Differential Revision: D60224656
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131756
Approved by: https://github.com/zhxchen17, https://github.com/ydwu4
In cases where the program takes in a constant, export will specialize on the constant and embed it into the graph, leaving a placeholder node with no users. However, Inductor errors out further down, since in torch.compile these constants typically don't show up as inputs. Since these constants are already embedded in the graph, we just ignore these inputs while compiling with AOTI and filter out the non-tensor inputs at runtime.
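A hedged sketch of the runtime filtering described above; the wrapper is hypothetical and AOTI's actual runner interface is not shown:
```python
import torch

def drop_embedded_constant_inputs(args):
    # Non-tensor inputs were specialized into the AOTI-compiled graph at export time,
    # so only tensor inputs are forwarded to the compiled model at runtime.
    return tuple(a for a in args if isinstance(a, torch.Tensor))
```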
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131594
Approved by: https://github.com/desertfire
Summary:
We removed references to _export/exported_program.py in executorch
in D60052318. Now we can remove this file.
Update the pin to executorch.
Test Plan: contbuild & OSS CI:
Differential Revision: D60072980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131597
Approved by: https://github.com/avikchaudhuri
Summary: In the script for testing different families of models, when the conversion fails, we switch to using the output from the explain function to provide more meaningful information.
Test Plan:
Manual testing with attached log information.
```
buck2 run mode/dev-nosan sigmoid/inference/ts_migration:main -- --mode test_all --test_suites ads_merge --model_id 440779101
```
```
Processing 440779101_5455.predictor.disagg.gpu.merge
model_name: 440779101_5455.predictor.disagg.gpu.merge
has_ts_model: True
has_sample_inputs: True
ops_maybe_missing_meta: set()
ts_can_run: True
ts_run_exception: None
can_convert: False
convert_exception: Unsupported nodes are found in the following list:
0. prim::Loop [%14259 : int = prim::Loop(%14258, %1129, %1126), scope: torch.fx.graph_module.GraphModule:: # <torch_package_1>.caffe2/torch/fb/predictor/modules/tensors_to_device_module.py:100:19]
1. prim::Loop [%14326 : int = prim::Loop(%1115, %1129, %14259), scope: torch.fx.graph_module.GraphModule:: # <torch_package_1>.caffe2/torch/fb/predictor/modules/tensors_to_device_module.py:100:19]
ep_result_correct: None
ep_run_exception: None
can_package: None
package_exception: None
sigmoid_can_run: None
sigmoid_run_exception: None
sigmoid_result_correct: None
```
Reviewed By: SherlockNoMad
Differential Revision: D59971446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131214
Approved by: https://github.com/angelayi
Summary:
Previously it was unclear what `_convert_input_to_fake` actually does (used in strict), and in particular how it is different from `make_fake_inputs` (used in non-strict).
This PR splits that function to work purely on user inputs, then renames it to `extract_fake_inputs` and adds a comment clarifying what it does—namely, it extracts fake inputs from a given graph module instead of "converting inputs to fake inputs" (as suggested by the current name) or "making fake inputs" (as happens in non-strict, where no tracing has taken place yet).
The remainder of that function used to also fakify params and buffers. It turns out that this part is identical to what happens in non-strict, hence we also pull `make_fake_inputs` out from `non_strict_utils` into `_trace`, merge it with another util, and make both modes call it.
Test Plan: existing tests
Differential Revision: D60084442
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131421
Approved by: https://github.com/zhxchen17
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.
**edit:** also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
A constant tensor attr was lifted as a constant_tensor input, but it's not used in the graph after aot_autograd due to a shortcut in its decomposition. This causes the setattr to be removed by eliminate_dead_code, but the graph signature still contains the name of that buffer, which causes an inconsistency between the transformed graph and the ep's original signature after _unlift. It seems this has happened a few times, where some nodes are accidentally removed and we're left in an inconsistent state.
The alternative to removing it would be: every time we call eliminate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the metadata, but I think this deserves a complete design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990
Approved by: https://github.com/pianpwk
#### Issue
Model parameters sometimes do not appear in `named_parameters()`, for example when trying to jit.trace an already jit.scripted model. This PR fixes that by relying on `state_dict` to get both parameters (`requires_grad=True`) and buffers.
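A hedged sketch of collecting tensor attributes via `state_dict`; the helper name is hypothetical:
```python
import torch

def collect_tensor_attributes(mod: torch.nn.Module) -> dict[str, torch.Tensor]:
    # keep_vars=True returns the live Parameter/buffer objects, so parameters still
    # carry requires_grad=True and can be told apart from buffers, even for scripted
    # modules where named_parameters() may come back empty.
    return dict(mod.state_dict(keep_vars=True))
```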
#### Test Plan
* `pytest test/export/test_converter.py -s -k test_convert_retrace_nested_scripted_modules`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129787
Approved by: https://github.com/angelayi
Sets `prefer_deferred_runtime_asserts_over_guards=True` for export, so any guards emitted from `SymNode.expect_true` (for example, guards that are implicitly required to be true for an op to succeed) won't lead to constraint violations. Instead these should appear in the graph as runtime asserts, or potentially as replacement expressions for placeholder shapes.
For example, this reshape op should emit s0 * s1 = s2, deferred as a runtime assert.
```
x = torch.randn(4, 8) # [s0, s1]
y = torch.randn(32) # [s2]
out = x.reshape(-1) + y
# this emits Eq(s0 * s1, s2), and we represent y's shape as [s0*s1] in the graph.
```
However, other complex guards can still cause export to fail, for instance guards emitted from `SymNode.guard_bool/guard_size_oblivious` (e.g. explicit if-else conditions in user code or lower-level op implementations hit during tracing) can still raise constraint violations. These can be deferred with `allow_complex_guards_as_runtime_asserts=True`. We don't yet make this default, because while this makes export more likely to succeed, it results in non-trivial asserts being emitted that often represent specialization to a variant of the op, or checks related to 0/1 specialization.
We also remove forced specializations for export and kill the `_disable_forced_specializations` flag - now any guard we can't express with Dims/DerivedDims is either handled with Hybrid SymInts or should be resolved by rewriting or deferring.
Follow up:
Currently, `ShapeEnv._set_replacement()` is called for complex equality expressions (e.g. s2 -> s0*s1 in the example above), and the ExportedProgram stores `s0*s1` in the input placeholder. This isn't checked for validity when the program is run, so an option is to avoid replacement and/or runtime assert on equality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130775
Approved by: https://github.com/avikchaudhuri
Summary: Finishing up the mechanism to "register" certain types of operators to a registry so that the serializer can handle them correctly. This is expected to be firstly used by executorch.
Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_export_with_extension_op_serialization
Differential Revision: D59825148
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130851
Approved by: https://github.com/angelayi
#### Issue
Fix two issues related to inputs lifting when there are sub-blocks.
* Some inputs may appear in the nested sub-blocks, which need a recursive search to identify which arguments need to be lifted / passed in the top-level block.
* Some inputs to the sub-block are intermediate results, meaning their names are only numbers. This causes issues during code generation (i.e., invalid argument names). We rename those to valid names.
#### Test Plan
* `pytest test/export/test_converter.py -s -k test_convert_nn_module_with_nested_if_and_param`
* `test/export/test_converter.py -s -k test_hidden_input_name`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128386
Approved by: https://github.com/angelayi
Sometimes it can be difficult to write a fake class, e.g. when the original implementation uses some third-party libraries, or users are certain that the class is safe to trace with the real object.
This PR allows users to specify their intention by implementing a "safe_to_trace_with_real_obj" method on their script class.
Test Plan:
`pytest test/export/test_torchbind.py -k safe`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129586
Approved by: https://github.com/zou3519
Summary: `quantization_tag` is a first class citizen metadata in quantization flows that is preserved by it. As we'll want to store the quantized exported graphs we also need to preserve this metadata as it's used in later flows. Only json supported metadata will be allowed to be serialized.
Test Plan: Added test case
Differential Revision: D57939282
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127473
Approved by: https://github.com/angelayi
Summary: This diff updates the ExportedProgram class in PyTorch to allow for multiple verifiers to be attached to it. This is done by adding a new field to the ExportedProgram schema called "verifiers" which is a list of strings representing the names of the verifiers to be attached to the program. The verifiers are loaded using the "load_verifier" function which is defined in the "torch._export.serde.serialize" module. The "exported_program.dialect" field is also deprecated in favor of the "verifiers" field.
Test Plan: CI
Differential Revision: D59408546
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130364
Approved by: https://github.com/angelayi, https://github.com/ydwu4
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)
Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0) # 2*s0
w = z.repeat(y.shape[0]) # 2*s0*s1
_w = w.shape[0]
# something with _w ...
# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```
Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)
# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```
Test Plan:
contbuild & OSS CI, see 940e4477ab
Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D59543603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
Fixes the example in #118304 for `torch._functorch.aot_autograd.aot_export_module` and `torch.export.export`.
At a high level, the issue is caused by not detecting fake_mode when there's no input.
Change plan:
1) we add a `dynamic_shapes: Union[bool, None] = None` arg to `aot_export_module` and `_aot_export_function`.
2) if the input is not a graph module, then we can only rely on this `dynamic_shapes` input arg.
3) If the input is a graph module, then we can traverse the graph and check.
4) So we check if the input mod is a graph module or just a module, and do 2) or 3) depending on the type.
Fixes #129927
Bug source: dynamo's fake_mode is not detected correctly in `_convert_input_to_fake` in `_traced.py` when there's no input to the graph. So in `_strict_export_lower_to_aten_ir`, we create another fake_mode; `dynamo_fake_mode` is not the same as the fake_mode used by dynamo.
Change plan:
check `gm_torch_level` graph's node meta "example_value" for fake mode in addition.
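A hedged sketch of that check; `detect_fake_mode` is an existing helper in `torch._guards`, but the surrounding function and how export wires it in are assumptions:
```python
from torch._guards import detect_fake_mode

def _find_dynamo_fake_mode(gm_torch_level, fake_args):
    # Besides the (possibly empty) fake inputs, look at the "example_value" metadata
    # dynamo stores on graph nodes, so a fake mode is found even with no inputs.
    example_values = [
        node.meta["example_value"]
        for node in gm_torch_level.graph.nodes
        if "example_value" in node.meta
    ]
    return detect_fake_mode(list(fake_args) + example_values)
```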
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129928
Approved by: https://github.com/angelayi
Summary: Previously, when we inlined subgraphs that don't have a different requires_grad environment, we didn't clean up the nodes' users in the subgraph and directly used them to replace the output of the call_modules. This records dead dependencies in node.users. This PR fixes this.
Test Plan:
Added a new test.
Also see the torchrec tests:
Step 1:
buck run mode/dev-nosan //aimp/experimental/pt2:pt2_export -- --model-entity-id 934687114 --output /tmp/934687114.zip --use-torchrec-eager-mp --use-manifold
Step 2:
buck run mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true aimp/cli:cli -- --platform=aps --template=disagg_gpu_aps_pt2 --pt2 --model-entity-id=934687114 non-request-only-tagging torchrec-shard-and-quantize gpu-disagg-split assign-device materialize-weights script-and-save
Differential Revision: D59132214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129716
Approved by: https://github.com/angelayi
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0) # 2*s0
w = z.repeat(y.shape[0]) # 2*s0*s1
_w = w.shape[0]
# something with _w ...
# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```
Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)
# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
Summary: The explain function does a conversion dry run to give users feedback on which operators are not supported or fail the conversion.
Test Plan: * `pytest test/export/test_converter.py`
Differential Revision: D59251934
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129968
Approved by: https://github.com/angelayi
- Lifting tensor constant attributes to buffers: TorchScript does not automatically lift tensor constant attributes to buffers, so the previous converter could not access them. This PR fixes the issue.
- Add SetAttr support for tensor attributes by copy_.
- Add SetAttr support for non-tensor attributes. In particular, we maintain the current value of non-tensor attributes in `name_to_non_tensor_attribute_node`, similar to an interpreter pass on non-tensor attributes. So we can support the following use case:
```python
def forward(self, x):
    c1 = self.count
    self.count += 1
    c2 = self.count
    return x + c1 + c2
```
- Fixed a bug in GetAttr to support the following use case:
```python
def forward(self, inp):
    x = self.buffer
    self.buffer += 1
    y = self.buffer
    return x + y + inp
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129440
Approved by: https://github.com/angelayi
#### Issue
In jit.trace, torch.numel() is automatically cast to a `LongTensor`. But during conversion, we lost the casting part: `prim::NumToTensor` was previously converted to `torch.ops.aten.scalar_tensor`, which uses the same `dtype` as the input tensor instead of `LongTensor`. In this PR, we add a cast to convert it to the correct `dtype`.
#### Test Plan
We activate the previously failing test case.
* `pytest test/export/test_converter.py -s -k test_implicit_constant_to_tensor_handling`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128761
Approved by: https://github.com/angelayi
#### Issue
Tensor constants were previously lifted directly as inputs in the fx graph, which results in errors for multiple test cases with tensor constants. This PR introduces a fix to convert tensor constants to `GetAttr` nodes in the fx graph.
This PR also introduces other fixes to maintain a valid `state_dict` for the exported program when there are tensor constants. In short, after tensor constants are converted to `GetAttr`, they are treated as buffers during retracing. The fix converts those back from buffers to constants.
#### Test Plan
Add new test cases that generate tensor constants
* `pytest test/export/test_converter.py -s -k test_implicit_constant_to_tensor_handling`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128442
Approved by: https://github.com/angelayi
Summary: Today, meta['val'] on placeholder nodes doesn't preserve requires_grad information consistent with the original inputs. There seems to be no easy way to fix this directly at the proxy tensor layer. This is useful for re-exporting the joint graph.
Test Plan: test_preserve_requires_grad_placeholders
Differential Revision: D58555651
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128656
Approved by: https://github.com/tugsbayasgalan
In a previous life, we used sympy.oo to represent the lower/upper bounds of integer ranges. Later, we changed this to be sys.maxsize - 1 for a few reasons: (1) sometimes we do tests on a value being exactly sys.maxsize, and we wanted to avoid a data dependent guard in this case, (2) sympy.oo corresponds to floating point infinity, so you get incorrect types for value ranges with oo, and (3) you can do slightly better reasoning if you assume that input sizes fall within representable 64-bit integer range.
After working in the sys.maxsize regime for a bit, I've concluded that this was actually a bad idea. Specifically, the problem is that you end up with sys.maxsize in your upper bound, and then whenever you do any sort of size-increasing computation like size * 2, you end up with 2 * sys.maxsize, and you end up doing a ton of arbitrary precision int computation that is totally unnecessary. A symbolic bound is better.
But especially after #126905, we can't go back to using sympy.oo, because that advertises that it's not an integer, and now your ValueRanges is typed incorrectly. So what do we do? We define a new numeric constant `int_oo`, which is like `sympy.oo` but it advertises `is_integer`. **test/test_sympy_utils.py** describes some basic properties of the number, and **torch/utils/_sympy/numbers.py** has the actual implementation.
The rest of the changes of the PR are working out the implications of this change. I'll give more commentary as inline comments.
Fixes https://github.com/pytorch/pytorch/issues/127396
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127693
Approved by: https://github.com/lezcano
ghstack dependencies: #126905
Summary: prim::dtype has the signature `(Tensor a) -> int`; it gets the dtype of the tensor and returns the integer corresponding to this dtype based on the enum in ScalarType.h. Previously we were converting prim::dtype by returning the actual dtype of the tensor (ex. torch.float32). This caused some incorrect control flow behavior, specifically where it checks whether `prim::dtype(tensor) in [3, 5, 7]`, where [3, 5, 7] correspond to torch.int32, torch.float16, torch.float64. This control flow would always return False because we would be comparing torch.float32 against the integers [3, 5, 7], which is a type mismatch.
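A hedged sketch of the corrected conversion; the mapping follows the ScalarType.h enum, while the helper itself is hypothetical:
```python
import torch

def convert_prim_dtype(tensor: torch.Tensor) -> int:
    # prim::dtype is (Tensor a) -> int, so return the ScalarType enum value,
    # e.g. torch.int32 -> 3, torch.float16 -> 5, torch.float64 -> 7.
    scalar_type_index = {
        torch.uint8: 0, torch.int8: 1, torch.int16: 2, torch.int32: 3,
        torch.int64: 4, torch.float16: 5, torch.float32: 6, torch.float64: 7,
    }
    return scalar_type_index[tensor.dtype]
```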
Test Plan: 7/22 internal models now are convertable and runnable in eager and sigmoid! P1410243909
Reviewed By: jiashenC
Differential Revision: D58469232
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128517
Approved by: https://github.com/jiashenC
At a high level, the idea behind this PR is:
* Make it clearer what the promotion and int/float rules for various Sympy operations are. Operators that previously were polymorphic over int/float are now split into separate operators for clarity. We never do mixed int/float addition/multiplication etc in sympy, instead, we always promote to the appropriate operator. (However, equality is currently not done correctly.)
* Enforce strict typing on ValueRanges: if you have a ValueRange for a float, the lower and upper MUST be floats, and so forth for integers.
The story begins in **torch/utils/_sympy/functions.py**. Here, I make some changes to how we represent certain operations in sympy expressions:
* FloorDiv now only supports integer inputs; to do float floor division, do a truediv and then a trunc. Additionally, we remove the divide out addition by gcd optimization, because sympy gcd is over fields and is willing to generate rationals (but rationals are bad for ValueRange strict typing).
* ModularIndexing, LShift, RShift now assert they are given integer inputs.
* Mod only supports integer inputs; eventually we will support FloatMod (left for later work, when we build out Sympy support for floating operations). Unfortunately, I couldn't assert integer inputs here, because of a bad interaction with sympy's inequality solver that is used by the offline solver
* TrueDiv is split into FloatTrueDiv and IntTrueDiv. This allows us to eventually generate accurate code for Python-semantics IntTrueDiv, which is written in a special way to preserve precision when the inputs are >= 2**53, beyond what you get by first coercing the integers to floats and then doing true division.
* Trunc is split to TruncToFloat and TruncToInt.
* Round is updated to return a float, not an int, making it consistent with the round op handler in Inductor. To get Python-style conversion to int, we call TruncToInt on the result.
* RoundDecimal updated to consistently only ever return a float
* Add ToFloat for explicit coercion to float (required so we can enforce strict ValueRanges typing)
In **torch/__init__.py**, we modify SymInt and SymFloat to appropriately call into new bindings that route to these refined sympy operations. Also, we modify `torch.sym_min` and `torch.sym_max` to have promotion semantics (if one argument is a float, the return result is always a float), making them inconsistent with builtins.min/max, but possible to do type analysis without runtime information.
We also need to introduce some new op handlers in **torch/_inductor/ops_handler.py**:
* `to_int` for truncation to int64, directly corresponding to TruncToInt; this can be implemented by trunc and dtype, but with a dedicated handler it is more convenient for roundtripping in Sympy
* `int_truediv` for Python-style integer true division, which has higher precision than casting to floats and then running `truediv`
These changes have consequences. First, we need to make some administrative changes:
* Actually wire up these Sympy functions from SymInt/SymFloat in **torch/fx/experimental/sym_node.py**, including the new promotion rules (promote2)
* Add support for new Sympy functions in **torch/utils/_sympy/interp.py**, **torch/utils/_sympy/reference.py**
* In particular, in torch.utils._sympy.reference, we have a strong preference to NOT do nontrivial compute, instead, everything in ops handler should map to a singular sympy function
* TODO: I chose to roundtrip mod back to our Mod function, but I think I'm going to have to deal with the C/Python inconsistency to fix tests here
* Add printer support for the Sympy functions in **torch/_inductor/codegen/common.py**, **torch/_inductor/codegen/cpp_utils.py**, **torch/_inductor/codegen/triton.py**. `int_truediv` and mixed-precision equality are currently not implemented soundly, so we will lose precision in codegen for large values. TODO: The additions here are not exhaustive yet
* Update ValueRanges logic to use new sympy functions in **torch/utils/_sympy/value_ranges.py**. In general, we prefer to use the new Sympy function rather than try to roll things by hand, which is what was done previously for many VR analysis functions.
In **torch/fx/experimental/symbolic_shapes.py** we need to make some symbolic reasoning adjustments:
* Avoid generation of rational subexpressions by removing simplification of `x // y` into `floor(x / y)`. This simplification then triggers an addition simplification rule `(x + y) / c --> x / c + y / c` which is bad because x / c is a rational number now
* `_assert_bound_is_rational` is no more, we no longer generate rational bounds
* Don't intersect non-int value ranges with the `int_range`
* Support more sympy Functions for guard SYMPY_INTERP
* Assert the type of value range is consistent with the variable type
The new asserts uncovered necessary bug fixes:
* **torch/_inductor/codegen/cpp.py**, **torch/_inductor/select_algorithm.py**, **torch/_inductor/sizevars.py** - Ensure Wild/Symbol objects manually allocated in Inductor are marked `is_integer` so they are accepted when building expressions
* **torch/_inductor/utils.py** - make sure you actually pass in sympy.Expr to these functions
* **torch/_inductor/ir.py** - make_contiguous_strides_for takes int/SymInt, not sympy.Expr!
* **torch/export/dynamic_shapes.py** - don't use infinity to represent int ranges, instead use sys.maxsize - 1
Because of the removal of some symbolic reasoning that produced rationals, some of our symbolic reasoning has gotten worse and we are unable to simplify some guards. Check the TODO at **test/test_proxy_tensor.py**
**Reland notes.** This requires this internal fbcode diff https://www.internalfb.com/phabricator/paste/view/P1403322587 but I cannot prepare the diff codev due to https://fb.workplace.com/groups/osssupport/posts/26343544518600814/
It also requires this Executorch PR https://github.com/pytorch/executorch/pull/3911 but the ET PR can be landed prior to this landing.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126905
Approved by: https://github.com/xadupre, https://github.com/lezcano
#### Description
Handle custom ops during TorchScript-to-ExportedProgram conversion:
```python
import torch

lib = torch.library.Library("mylib", "FRAGMENT")  # library fragment holding the example op

torch.library.define(
    "mylib::foo",
    "(Tensor x) -> Tensor",
    lib=lib,
)

# PyTorch custom op implementation
@torch.library.impl(
    "mylib::foo",
    "CompositeExplicitAutograd",
    lib=lib,
)
def foo_impl(x):
    return x + x

# Meta function of the custom op.
@torch.library.impl_abstract(
    "mylib::foo",
    lib=lib,
)
def foo_meta(x):
    return x + x

class M(torch.nn.Module):
    def forward(self, x):
        return torch.ops.mylib.foo(x)
```
#### Test Plan
* Add a test case where custom op is called and converted. `pytest test/export/test_converter.py -s -k test_ts2ep_converter_custom_op`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127580
Approved by: https://github.com/angelayi
#### Description
Add support for converting `prim::__contains__` from TorchScript IR to ExportedProgram, e.g.,
```python
class MIn(torch.nn.Module):
    def forward(self, x: torch.Tensor):
        return x.dtype in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```
#### Test Plan
* Add test cases to cover contains IR resulting from both primitive types and Tensors. `pytest test/export/test_converter.py -s -k test_ts2ep_converter_contains`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127544
Approved by: https://github.com/angelayi
We ran into a graph that looks something like the following, where we have 2 getitem calls to the same index (%getitem, %getitem_2 both query topk[0]):
```
graph():
    %x : [num_users=1] = placeholder[target=x]
    %topk : [num_users=3] = call_function[target=torch.ops.aten.topk.default](args = (%x, 2), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%topk, 0), kwargs = {})
    %getitem_1 : [num_users=1] = call_function[target=operator.getitem](args = (%topk, 1), kwargs = {})
    %getitem_2 : [num_users=1] = call_function[target=operator.getitem](args = (%topk, 0), kwargs = {})
    %mul_tensor : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%getitem, %getitem_2), kwargs = {})
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%mul_tensor, 2), kwargs = {})
    return (mul, getitem_1)
```
The duplicate getitem call gets created during a pass, so there are a couple of solutions:
1. Change serializer to support the case of duplicate getitem calls
2. Change the pass so that it doesn’t produce duplicate getitem calls
3. Add a pass which dedups the getitem calls
As a framework, we should do 1 and 3 (through a CSE pass).
This PR implements solution 1. However, the serializer currently does some special handling for getitem nodes -- instead of directly serializing the getitem nodes, we serialize the output of the node that produces a list of tensors (the %topk node in this example) into a list of nodes, one for each output ([%getitem, %getitem_1]). This fails when we have duplicate getitem nodes for the same index (%getitem_2), since we do not record that duplicate getitem node anywhere. So the solution this PR takes is that the serializer will deduplicate the getitem nodes (%getitem_2 will be replaced with %getitem). This results in a semantically correct graph, but not one that is necessarily node-to-node identical to the original fx graph.
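For intuition, here is a minimal sketch of what solution 3 (a dedup/CSE-style pass over the fx graph) could look like; it is illustrative and not the serializer change in this PR:
```python
import operator
import torch

def dedup_getitems(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Map (multi-output node, index) -> first getitem node seen for that pair.
    seen = {}
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target is operator.getitem:
            key = (node.args[0], node.args[1])
            if key in seen:
                node.replace_all_uses_with(seen[key])
                gm.graph.erase_node(node)
            else:
                seen[key] = node
    gm.graph.lint()
    gm.recompile()
    return gm
```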
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127633
Approved by: https://github.com/ydwu4
With the current state of export's dynamic shapes, we struggle with guards and constraints that are beyond the current dynamic shapes language, expressed with dims and derived dims. While we can compile and guarantee correctness for guards within the current language (e.g. min/max ranges, linear relationships, integer divisibility) we struggle to dynamically compile guards which extend beyond that.
For these "complex" guards, we typically do either of the following: 1) raise a constraint violation error, along the lines of "not all values of <symbol> in the specified range satisfy <guard>", with or without suggested fixes, 2) specialize to the provided static values and suggest removing dynamism, or 3) fail compilation due to some arbitrary unsupported case. Previous [work](https://github.com/pytorch/pytorch/pull/124949) went towards resolving this by disabling forced specializations, instead allowing the user to fail at runtime with incorrect inputs.
In this PR, relying on [hybrid backed-unbacked symints](https://github.com/pytorch/pytorch/issues/121749), [deferred runtime asserts](https://github.com/pytorch/pytorch/blob/main/torch/fx/passes/runtime_assert.py), and the function [_is_supported_equivalence()](d7de4c9d80/torch/fx/experimental/symbolic_shapes.py (L1824)), we add a flag `_allow_complex_guards_as_runtime_asserts` which allows the user to compile exported programs containing these guards and maintain dynamism, while adding correctness checks as runtime assertions in the graph.
Hybrid backed-unbacked symints allow us to easily bypass "implicit" guards emitted from computation - guards that we ~expect to be true. Popular examples revolve around reshapes:
```
# reshape
def forward(self, x, y): # x: [s0, s1], y: [s2]
    return x.reshape([-1]) + y # guard s0 * s1 = s2
```
This leads to the following exported program:
```
class GraphModule(torch.nn.Module):
    def forward(self, x: "f32[s0, s1]", y: "f32[s2]"):
        sym_size_int: "Sym(s2)" = torch.ops.aten.sym_size.int(y, 0)
        mul: "Sym(-s2)" = -1 * sym_size_int; sym_size_int = None
        sym_size_int_1: "Sym(s0)" = torch.ops.aten.sym_size.int(x, 0)
        sym_size_int_2: "Sym(s1)" = torch.ops.aten.sym_size.int(x, 1)
        mul_1: "Sym(s0*s1)" = sym_size_int_1 * sym_size_int_2; sym_size_int_1 = sym_size_int_2 = None
        add: "Sym(s0*s1 - s2)" = mul + mul_1; mul = mul_1 = None
        eq: "Sym(Eq(s0*s1 - s2, 0))" = add == 0; add = None
        _assert_scalar = torch.ops.aten._assert_scalar.default(eq, "Runtime assertion failed for expression Eq(s0*s1 - s2, 0) on node 'eq'"); eq = None
        view: "f32[s0*s1]" = torch.ops.aten.view.default(x, [-1]); x = None
        add_1: "f32[s0*s1]" = torch.ops.aten.add.Tensor(view, y); view = y = None
        return (add_1,)
```
Another case is symbol divisibility:
```
def forward(self, x): # x: [s0, s1]
    return x.reshape([-1, x.shape[0] - 1]) # Eq(Mod(s0 * s1, s0 - 1), 0)
```
Applying deferred runtime asserts also helps dynamic compilation for "explicit" complex guards that typically cause problems for export. For example we can generate runtime asserts for not-equal guards, and complex conditions like the following:
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        # check that negation of first guard also shows up as runtime assertion
        if x.shape[0] == y.shape[0]:  # False
            return x + y
        elif x.shape[0] == y.shape[0] ** 3:  # False
            return x + 2, y + 3
        elif x.shape[0] ** 2 == y.shape[0] * 3:  # True
            return x * 2.0, y * 3.0
```
For the above graph we will generate 3 runtime assertions: the negation of the first 2, and the 3rd condition as a guard.
One additional benefit here over the current state of exported programs is that this adds further correctness guarantees - previously with explicit complex guards, if compilation succeeded, the guards would be ignored at runtime, treated as given.
As shown above, the runtime asserts appear as math ops in the graph, generated by the sympy interpreter, resulting in an _assert_scalar call. There is an option to avoid adding these asserts into the graph, by setting `TORCH_DYNAMO_DO_NOT_EMIT_RUNTIME_ASSERTS=1`. This results in the "original" computation graph, with dynamism, and any incorrect inputs will fail on ops during runtime. Further work could go into prettifying the printer, so the majority of the graph isn't guard-related.
Ideally this PR would subsume and remove the recently added [_disable_forced_specializations](https://github.com/pytorch/pytorch/pull/124949) flag, but that flag still handles one additional case of specialization: single-variable equalities where the symbol is solvable for a concrete value: see this [PR](https://github.com/pytorch/pytorch/pull/126925)
This PR doesn't change any behavior around data-dependent errors/unbacked symints yet, that could be further work.
NOTE: will take naming change suggestions for the flag :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127129
Approved by: https://github.com/avikchaudhuri
This PR adds a registration function and a global registry for GraphModuleSerializer. After this PR, custom serialization methods can be done through registration instead of subclassing for ease of maintenance.
## Changes
- Add a test case that injects a custom op to test serialization.
- Add a custom op handler
- Change the allowed ops for the verifier
Co-authored-by: Zhengxu Chen <zhxchen17@outlook.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126550
Approved by: https://github.com/zhxchen17
This PR addresses 2 issues with derived dim suggested fixes: 1) newly introduced roots, and 2) root swapping.
1 | Newly introduced roots appear with modulo guards, e.g. Mod(dx, 2) = 0 suggests dx is a derived dim equal to 2 * _dx, introducing a new root _dx. Currently the final suggested fixes handle this correctly, but we can get intermediate results where related derived dims don't rely on a unified root, and are a mixture of min/max range and derived suggestions.
For example:
```
"dx": {"eq": 3*_dx-1, "max": 36}
"dy": {"eq": dx+1}
```
This should lead to suggested fixes:
```
_dx = Dim('_dx', max=12)
dx = 3 * _dx - 1
dy = 3 * _dx
```
This PR prettifies the suggested fixes routine by unifying to a single root, and making each intermediate suggestion either a derived dim or min/max range, not both.
2 | The current suggested fixes for derived dims can lead to root dims/derived dims being swapped, e.g. `dy - 1, dy` -> `dx, dx + 1`. This leads to problematic suggested fixes that look like `dy - 1 = Dim("dy - 1")` since we don't have access to the original variable name.
This PR only adds a suggested fix for the root dim, and removes all other derived suggestions.
For example, with the export test case test_derived_dim_out_of_order_simplified:
```
_dimz = torch.export.Dim("_dimz", min=6, max=8)
dimy = _dimz - 1
dimx = dimy - 1
dimz = torch.export.Dim("dimz", min=6, max=8)  # doesn't work, should be = _dimz

class Foo(torch.nn.Module):
    def forward(self, x, y, z):
        return x + y[1:] + z[2:]

foo = Foo()
u, v, w = torch.randn(5), torch.randn(6), torch.randn(7)
export(
    foo,
    (u, v, w),
    dynamic_shapes=({0: dimx}, {0: dimy}, {0: dimz}),
)
```
Before:
```
Suggested fixes:
_dimz = Dim('_dimz', min=3, max=9223372036854775807) # 2 <= _dimz - 1 <= 9223372036854775806
_dimz - 2 = Dim('_dimz - 2', min=4, max=6)
_dimz = Dim('_dimz', min=2, max=9223372036854775806) # 2 <= _dimz <= 9223372036854775806
_dimz - 1 = _dimz - 1
dimz = _dimz
```
New suggested fixes:
```
Suggested fixes:
dimz = _dimz
```
Note: This assumes the specified derived relations between dims are correct. This should be valid because: 1) if the relation is plain wrong (e.g. (dx, dx - 1) provided with inputs (6, 4)), this gets caught beforehand in produce_guards. 2) if the relation is correct but does not match the emitted guard, for example:
```
def forward(self, x, y):
    return x.reshape([-1]) + y  # guard: s0 * 2 = s1

dx = Dim("dx")
export(
    model,
    (torch.randn(6, 2), torch.randn(12)),
    dynamic_shapes={"x": (dx, 2), "y": (dx + 6, )},
)
```
This produces two linear equations, leading to specialization since a) produce_guards is able to solve for a concrete value, and b) the export constraint solver will anyway force specializations due to range constraints.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125543
Approved by: https://github.com/avikchaudhuri
## Goal
As title
## Design
Based on the fact that each TorchScript module has a `code` property which provides the original source code for the `forward` function, I implemented a function to extract the `forward` function signature using the AST parser.
Some other tradeoffs considered:
* Directly parsing the source code as a string --> would be very buggy
* Directly using Python's `compile` function to get the function object --> raises a lot of exceptions because of missing packages or undefined variable names
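A rough sketch of the AST-based approach (names here are illustrative, not the converter's actual helpers):
```python
import ast
import torch

def forward_arg_names(scripted: torch.jit.ScriptModule):
    # `scripted.code` is the Python-like source of forward(); parse it without executing it.
    tree = ast.parse(scripted.code)
    fn = next(
        n for n in ast.walk(tree)
        if isinstance(n, ast.FunctionDef) and n.name == "forward"
    )
    return [a.arg for a in fn.args.args if a.arg != "self"]

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

print(forward_arg_names(torch.jit.script(M())))  # ['x', 'y']
```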
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126787
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127124
Approved by: https://github.com/Skylion007
ghstack dependencies: #127122, #127123
Summary:
Initial commit for the TorchScript-to-ExportedProgram converter.
TODO:
- Improve TorchScript IR coverage
- Parameters and buffers should be owned by the output ExportedProgram
- Experiment with conditional op conversion
Test Plan: buck2 run mode/dev-nosan fbcode//caffe2/test:test_export -- -r TestConverter
Differential Revision: D57694784
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126920
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
This PR requires a little justification, but let's start with what it does first:
1. When you have a 0d CPU scalar int64/float64 tensor input to a graph, we will preallocate a backed SymInt/SymFloat corresponding to what you would get if you call item() on this tensor. This means you can freely change your input to be a Python int/float or a Tensor with an item() call and end up with exactly the same level of expressivity (specifically, you can guard on the internal SymInt/SymFloat no matter what). By default, the source of the backed SymInt/SymFloat is `L['tensor'].item()`, but if you have promoted a float input into a Tensor, we will cancel out `torch.as_tensor(L['float']).item()` into just `L['float']`.
2. We switch wrap_symfloat to use this, instead of hand crafting the new SymNodeVariable. Everything works out, except that we carefully pass the item() result to tracked fakes (and not the fake Tensor argument)
OK, so why do this at all? There is some marginal benefit where now some item() calls on scalar inputs can be guarded on, but IMO this is a pretty marginal benefit, and if it was the only reason, I wouldn't do this. The real reason for this is that I need to be able to propagate fake tensors through the graphs that are produced by Dynamo, and if I am doing the old custom wrap_symfloat logic, there's no way I can do this, because ordinarily an item() call will cause an unbacked SymInt when I reallocate.
The other obvious way to solve the problem above is to make a HOP alternative to item() that "bakes in" the backed SymInt it's supposed to return. But this strategy seems more parsimonious, and it does have the marginal benefit I mentioned above. The main downside is that what I have to do next, is make it so that when I run tensor computation, I also apply the equivalent operations to the SymInt/SymFloat as well. That's next PR.
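Roughly, the user-facing equivalence looks like this (an illustration of the setup described above; the guarding details are compile internals):
```python
import torch

def f(x, scale):
    return x * scale.item()

compiled = torch.compile(f)
out = compiled(torch.randn(4), torch.tensor(2.0))  # scale is a 0d CPU float tensor
# ...which is meant to be as expressive as simply passing the Python float 2.0.
```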
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126245
Approved by: https://github.com/eellison
ghstack dependencies: #126637
Summary:
Tool for scouting exportability issues in one shot.
- Collect sample inputs for all submodules by running eager inference with forward_pre_hook (see the sketch after the sample output below).
- Start from root module, recursively try exporting child modules, if current module export fails.
Limitations:
- Only works for nn.Modules that contain a tree-like submodule structure; this doesn't work for a flattened GraphModule.
TODO: support dynamic_dims
Sample output: https://docs.google.com/spreadsheets/d/1jnixrqBTYbWO_y6AaKA13XqOZmeB1MQAMuWL30dGoOg/edit?usp=sharing
```
exportability_report = {
    '': UnsupportedOperatorException(func=<OpOverload(op='testlib.op_missing_meta', overload='default')>),
    'submod_1': UnsupportedOperatorException(func=<OpOverload(op='testlib.op_missing_meta', overload='default')>),
    'submod_2': None,
}
```
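The per-submodule input collection mentioned in the summary can be sketched roughly like this (illustrative; not the tool's actual code):
```python
import torch

def collect_sample_inputs(model: torch.nn.Module, *args):
    samples, handles = {}, []

    def make_hook(name):
        def hook(mod, inputs):
            # Keep only the first inputs tuple seen for each submodule.
            samples.setdefault(name, inputs)
        return hook

    for name, mod in model.named_modules():
        handles.append(mod.register_forward_pre_hook(make_hook(name)))
    model(*args)
    for h in handles:
        h.remove()
    return samples
```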
Test Plan: buck2 run mode/dev-nosan fbcode//caffe2/test:test_export -- -r TestExportTools
Differential Revision: D57466486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126471
Approved by: https://github.com/zhxchen17
Some operations have a scalar input parameter, like `torch.add(a, b, alpha=2.0)`. Currently, AOT compile does not support such a case because it requires the signature of the captured graph to align with the operation's signature. This means that some inputs in the captured graph may be scalars (float, int, bool, etc.), which breaks the assumption of `compile_fx_aot` that all the example inputs are tensors - 0f6ce45bcb/torch/_inductor/compile_fx.py (L1048)
This PR intends to support such cases by allowing a non-aligned signature and filtering out the non-Tensor parameters.
Captured graph for `torch.add(a, b, alpha=2.0)`:
```
opcode name target args kwargs
------------- -------- --------------- ---------------- --------------
placeholder arg0_1 arg0_1 () {}
placeholder arg1_1 arg1_1 () {}
call_function add aten.add.Tensor (arg0_1, arg1_1) {'alpha': 2.0}
output output_1 output ((add,),) {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124177
Approved by: https://github.com/jansel, https://github.com/desertfire, https://github.com/jgong5
Summary: This fix does three things:
1. When we add inputs from the partitioner to the top-level graph module, we insert them in the partitioner's order, which is not guaranteed to be the same as the original graph inputs. This PR fixes that.
2. When we replace autograd ops with HOP, we create new submodules and access their outputs via getitem calls. As a result, previous node names associated with getitem get updated, resulting in the graph being different from the produced graph signature. So I just update the graph signature accordingly.
3. We run the runtime_assertion pass before the autograd HOP pass because otherwise the constraints won't be populated correctly.
Differential Revision: [D57130314](https://our.internmc.facebook.com/intern/diff/D57130314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125793
Approved by: https://github.com/zhxchen17
Summary:
By default, some inferred dynamic shapes guards/constraints that are not expressible with the current dynamic shapes language will lead to specialization to the concrete input values provided. If disable_forced_specializations is set to True, we will not specialize, and will not perform runtime checks on such produced guards. Instead, we allow the user to specify arbitrary shapes, and fail during runtime if the inputs are invalid. Constraints expressible with the language (e.g. ranges, linear derived dims) will still be enforced, and behavior for all other guards remains the same.
Cases where we typically specialize are reshapes:
```
x: [4, 6] # [s0, s1]
x = x.reshape([x.shape[0] - 1, -1])
# this emits a guard Mod(s0*s1, s0-1) = 0, we specialize on s0=4, s1=6
x: [4, 6], y: [24] # [s0, s1], [s2]
x = x.reshape([-1]) + y
# this emits a guard s0*s1 = s2, we specialize on s0=4, s1=6, s2=24
```
For now only applicable for non-strict mode (need to figure out how to pass this flag into dynamo's call of produce_guards).
Test Plan: Added test case that checks compilation, runtime, and suggested fixes behavior.
Differential Revision: D56361177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124949
Approved by: https://github.com/avikchaudhuri
Fixes [internal error](https://fb.workplace.com/groups/1075192433118967/permalink/1416709435633930/).
The issue is that the asserting nodes added in the `insert_deferred_runtime_assertion` pass do not contain metadata that the ExportedProgram requires the graph to have. One solution is to retrace the entire module; another is to manually add back this metadata.
This diff implements the latter solution (manually adding back the metadata) by hooking into fx.Graph's `create_node` function and adding export-specific metadata for every node that is created. The reason I did this is so that `insert_deferred_runtime_assertion` does not have to know about what metadata export wants.
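A loose sketch of the hooking idea (hypothetical names, not the actual pass):
```python
import torch.fx

def with_default_meta(graph: torch.fx.Graph, default_meta: dict) -> torch.fx.Graph:
    # Wrap Graph.create_node so every node created from here on picks up the given metadata.
    original_create_node = graph.create_node

    def create_node(*args, **kwargs):
        node = original_create_node(*args, **kwargs)
        for key, value in default_meta.items():
            node.meta.setdefault(key, value)
        return node

    graph.create_node = create_node  # instance-level override; other graphs are untouched
    return graph
```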
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125414
Approved by: https://github.com/zhxchen17, https://github.com/BoyuanFeng
Summary: capture_pre_autograd_graph is deprecated and torch.export won't be able to provide timely fixes for this API. To reduce confusion around this, we should explicitly give users clear warnings.
Test Plan: eyes
Reviewed By: tarun292
Differential Revision: D56955202
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125602
Approved by: https://github.com/angelayi
A re-land of #124239.
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.
The basic idea is to only fakify the script object during tracing (i.e. aot_export). After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.
Concretely, we
1. fakify all the script object inputs, and module attributes (gathered by constant_attrs).
2. patch the module's attributes with fakified script object
3. right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125490
Approved by: https://github.com/angelayi
Summary:
Fixing the implementation of `_flatten_dynamic_shapes()`, to follow how `_process_dynamic_shapes()` does it. The previous implementation would misinterpret some nested dynamic shapes specs, causing it to miss out on some shapes specs, for example with nested inputs/constant input tuples:
```
inputs = (
    (2, 1),
    (
        torch.randn(2, 1),
        torch.randn(2, 2),
        torch.randn(2, 3),
    )
)
dynamic_shapes = (
    (None, None),
    (
        None,
        None,
        None,
    )
)
```
This would get interpreted as 2 shape specs, one for a 2d and one for a 3d tensor. The fix ensures this doesn't happen.
Test Plan: Existing export tests
Differential Revision: D56894923
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125415
Approved by: https://github.com/angelayi
To fix data-dependent errors we want to recommend that people use `torch._check*` APIs. The `constrain_as*` APIs should be fully subsumed by them, and in the future we should kill them entirely.
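For example, a small usage sketch of the recommended APIs:
```python
import torch

def f(x):  # x: integer tensor
    n = x.max().item()        # data-dependent scalar -> unbacked SymInt under export
    torch._check_is_size(n)   # n is a valid (non-negative) size
    torch._check(n <= 128)    # extra bound the compiler may rely on
    return torch.zeros(n)
```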
Differential Revision: D56774333
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125253
Approved by: https://github.com/ezyang
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.
The basic idea is to `only fakify the script object during tracing (i.e. aot_export)`. After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.
Concretely, we
1. fakify all the script object inputs, and module attributes (gathered by constant_attrs).
2. patch the module's attributes with fakified script object
3. right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124239
Approved by: https://github.com/zou3519
Summary:
There are multiple things implemented incorrectly in non-strict mode for reparametrizing the state dict:
1. The same fake tensor should be generated for duplicated weights.
2. We should snapshot state dict in the beginning to always hold the invariant that ep.state_dict == mod.state_dict()
3. We will overwrite real weights with fake weights if we don't restore the weights in LIFO ordering.
4. We don't turn on strict checking, which could silently fail on corner cases.
This diff aims to solve all these issues at once.
Test Plan: CI
Differential Revision: D56505020
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124847
Approved by: https://github.com/pianpwk
The process for populating range_constraints follows separate methods for non-strict (`make_constraints`), and strict (`_process_constraints`). The strict method is somewhat more convoluted, and the analysis that Dynamo performs for strict is already present as part of the non-strict process in make_constraints (produce_guards(), running the export constraint solver).
This PR kills _process_constraints() and replaces calls with make_constraints, without duplicating the work that Dynamo already does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123985
Approved by: https://github.com/avikchaudhuri
This PR has a lot of "draw the rest of the fucking owl" energy. Here's how to break it down.
1. **torch/_inductor/graph.py** - We start by tightening unbacked symbol invariants. Specifically, as we lower FX nodes, we check whether or not every unbacked_binding recorded on the FX node meta, actually ends up getting bound (according to get_unbacked_symbol_defs) in all the buffers generated by the lowering. Hopefully this invariant is self evident. This leads to a lot of failures.
2. **torch/_inductor/ir.py** - Problem 1: There is softness in how Inductor computes defs of unbacked symbols in IR node. Previously, we tried to infer it by looking at the output sizes/strides/etc and see if new unbacked symbols popped up that we hadn't seen in the inputs. I don't know exactly what was buggy about the old code, but sometimes we would fail to notice an unbacked symbol had been bound, or rebind an unbacked symbol multiple times. Fortunately, thanks to the earlier PRs in our stack, we now have a nice list of unbacked symbol bindings from FX, so we now just store it directly on ExternKernel and use it directly to report defs. This has to be done twice: once for FallbackKernel (e.g., nonzero) and once for DynamicScalar (e.g., item) (see also **torch/_inductor/lowering.py**, **torch/_inductor/codegen/wrapper.py** and **torch/_inductor/codegen/cpp_wrapper_cpu.py** for the lowering and codegen changes for item)
* **process_kernel** - Sidequest! It turns out that Inductor lowering can reallocate unbacked symbols. This happens specifically when we repropagate fake tensors through the operator in `process_kernel`. This repropagation process is necessary because Inductor may have changed the strides of input tensors, and it must now recompute the strides so that it can continue to appropriately plan the rest of the lowering process. This is fine: we just make sure we do the rebind unbacked + compute_unbacked_bindings dance we've been doing previously in the PR stack. But instead of putting unbacked_bindings on a new FX node, they go straight into our unbacked_bindings on the Inductor IR node.
* **codegen_unbacked_symbol_defs** - Sidequest! FallbackKernel lowering is done in two steps. First, you emit the FallbackKernel buffer. Then, you emit MultiOutput buffers which actually give access to the individual outputs of FallbackKernel, which may have been multi-output. There is a design decision here: does the FallbackKernel bind the unbacked symbols, or the MultiOutput buffer? Historically, we put the binding on MultiOutput buffer, because it's more convenient: the FallbackKernel buffer is fake, in fact, it doesn't even get a name in C++ codegen. But it's kind of inconsistent with the keypath model that we've been tracking unbacked bindings with: if you have a multi-output node, you'd expect a keypath like `[0].size()[0]` representing the first output's first dimension size. That suggests that it's the FallbackKernel that should define the things. So that was my first implementation. Unfortunately, the C++ codegen is too cursed and I could not understand how to make it work in that case. So now we just unsoundly assume you cannot have multi-output data dependent output, and do the codegen in MultiOutput. There are some comments explaining exactly what we are improperly assuming.
3. **_rename_unbacked_to** in **torch/fx/experimental/symbolic_shapes.py** - Previously, when we renamed unbacked symbols, we clobbered any facts we previously knew about them. So for example, if we had a replacement `u0 -> s0` but then we renamed u0 to u1, we would now set up the replacement `u0 -> u1`, clobbering the old replacement. This apparently didn't matter in earlier PRs in the stack, but with Inductor now on the ball, there were some tests that indicated this was a problem. The solution is easy: if u0 had a preexisting replacement, reapply it to u1. However...
* **torch/_functorch/_aot_autograd/collect_metadata_analysis.py** - When we run forward analysis, this triggers fake tensor repropagation and fresh allocations. Previously, we just cleared out the pending symbols when finished the analysis. But with the change above, this would also migrate replacements to the new symbols... which are now dead. So now we explicitly suppress generation of these symbols with `ignore_fresh_unbacked_symbols` so that no rebinding happens at all.
* **torch/_dynamo/eval_frame.py** - same deal; I just searched for all sites we called clear() on pending
4. The last step is fixing the long tail of extra problems that show up, now that unbacked_bindings are load bearing into Inductor
* **torch/_dynamo/eval_frame.py** - Some of the exports are making copies of nodes without repropagating fake tensors, so in this case, it is important to also copy the `unbacked_bindings` (apparently this didn't matter before without the Inductor changes)
* **torch/_export/pass_base.py** - I discover that this is doing fake tensor repropagation via a test suite failure. Do the same playbook as AOTAutograd: PropagateUnbackedSymInts too! Actually, they also have implemented their own tracer as well, so do the same playbook as proxy_tensor: record unbacked_bindings on the newly traced nodes. UGH code duplication.
* **torch/_subclasses/fake_tensor.py**, **torch/_subclasses/fake_impls.py** (with call site updates at **torch/_functorch/_aot_autograd/traced_function_transforms.py** and **torch/fx/passes/fake_tensor_prop.py**) - What's this new epoch thing? I noticed that sometimes I would be retracing, call nonzero() on a fake tensor, and not allocate a new unbacked symbol. This is actually bad, because if I don't get a new unbacked symbol, I don't know there's a binding site, and `unbacked_bindings` is now missing a binding. The reason for this is memoization: if I reuse the exact same fake tensor on my retrace, it will already have an unbacked symint memoized on it and we will short circuit allocation. Well, that's no good. So I associate the memos with a fake tensor epoch, and every time you start a new fake tensor propagation from scratch, you bump the epoch so that I clear all the memos.
* **torch/_inductor/scheduler.py** - I notice in unit tests that V.current_node is not always set when we call process_kernel. So I save it into the IR node and restore it when we are running `get_estimated_runtime`.
* **torch/fx/experimental/symbolic_shapes.py** - A few things
* **rebind_unbacked** (re **_tensor_version**). Ordinarily, when you have an unbacked SymInt, you persistently have it all the way to the end of the program. `_tensor_version` violates this: this generates an unbacked SymInt (for reasons I don't quite understand?) and then gets rid of it later. This triggered an assert violation. I think this op is kind of misusing unbacked SymInt, but I didn't know how to refactor it, so it gets a special case.
* **rebind_unbacked** (re **Simplify SymBool binding**). Ugh, SymBool, what a pain in the butt. I have an assert that you can only rebind unbacked symbol to another unbacked symbol. This assert fails when a boolean is involved, because the result of running keypath on the result is not `u1`, it's `sympy.Piecewise(... sympy.Eq(u1, 1) ...)`. This is actually just `u1`, but Sympy doesn't know it because it doesn't know that `u1` value range is `[0, 1]`. So we manually implement the simplification needed to get the assert to pass.
* **compute_unbacked_bindings** (re **This is pretty fragile**). There is a really funny disaster involving memoization and Inductor process kernel. Ordinarily when I retrace, if there was a memo hit in the old trace, there will be a memo hit in the new trace. However, Inductor process kernel breaks this, because it recreates fake tensor inputs to the operator call from scratch (since they might have different strides), and obviously these tensor inputs don't have the memo from the old one. I tried a little bit to try to manually transplant the memo to the new fake tensor but it seemed hopeless, so I just let the fresh symbol ride, allocating a new unbacked symbol. However, in one of our tests, we rely on knowing that the first nonzero call is equal to the second (memoized) nonzero call. The equality test looked pretty easy to discharge, so I just went ahead and added a deferred runtime assert to this effect and it worked.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124394
Approved by: https://github.com/jansel
ghstack dependencies: #124310, #124314, #124316
Summary:
The current _AddRuntimeAssertionsForInlineConstraintsPass has 2 known issues caused by its use of torch.fx.Interpreter:
1. SymInt-related ops (e.g. item()) are executed, causing new Unbacked SymInts to appear in the graph during the pass.
2. The graph is reconstructed, and node names/indices can be different from before, causing mismatches with `module_call_graph`, and leading to issues during unflattening.
This refactors the pass to use PassBase instead of _ExportPassBaseDeprecatedDoNotUse, only constructing new nodes for assertions.
Test Plan: This pass is called on all strict-mode export calls with range_constraints, test that behavior remains unchanged.
Differential Revision: D56360137
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124503
Approved by: https://github.com/zhxchen17
Summary:
With pre-dispatch export and ep.run_decompositions(), range constraints are updated by looking at ShapeEnv.var_to_range. However, the lower bounds on these may be incorrect - analysis on un-specialized symbols is done with a lower bound of 2, which can mismatch user-specified bounds (which may be 0 or 1).
This updates `_get_updated_range_constraints()` to use the old range constraints if possible.
Test Plan: Existing pre-dispatch/dynamic shapes test case.
Differential Revision: D55899872
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123602
Approved by: https://github.com/tugsbayasgalan
Summary:
note: breaking the original diff D55225818 into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...
This PR supports constant argument placeholder (e.g. forward(self, x, y=1)) names and de/serialization, by adding a name field for ConstantArguments in the graph signature, and ConstantInputSpec in the input specs for serialization.
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55506949
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123590
Approved by: https://github.com/angelayi, https://github.com/zhxchen17
Summary:
note: breaking the original diff [D55225818](https://www.internalfb.com/diff/D55225818) into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Stacked PR to restore original names to placeholder nodes, replacing the default names arg0_1, arg1_1, ...
This PR propagates node names to higher-order-op subgraph placeholders, retaining the top-level names and handling naming collisions by suffixing other non-placeholder nodes in the subgraph with an index. This is the same handling as in fx.Graph/fx.Node, but implemented separately as a pass.
Since the input schemas of HOO subgraphs are very different, they are enumerated in _name_hoo_subgraph_placeholders(). Currently cond, map_impl, and wrap_with_set_grad_enabled are handled, but other ops can be easily added.
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123587
Approved by: https://github.com/angelayi
Previously, `node.meta["nn_module_stack"]` had type `Dict[str, Tuple[str, class]]` when exported, and later `Dict[str, Tuple[str, str]]` after de/serialization. This PR changes it to consistently be `Dict[str, Tuple[str, str]]` for round-trippability, i.e.
```
{..., 'L__self___conv': ('conv', 'torch.nn.modules.conv.Conv2d')}
```
`source_fn_stack` is left untouched in this PR.
note: the `Union[type, str]` type annotations in ONNX are because ONNX goes through both `export.export()` and `_dynamo.export()` (which still has the original `Dict[str, Tuple[str, class]]` format). nn_module_stack from `export.export()` should consistently have the new format, and we verify/test for that in `_trace.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123308
Approved by: https://github.com/zhxchen17, https://github.com/thiagocrepaldi
Summary:
This PR restores original names to placeholder nodes, replacing the default names arg0_1, arg1_1, and so on.
User inputs now follow the signature of mod.forward(), for example forward(x, y) produces nodes x, y. If the tensors are nested in dictionaries, lists, tuples, or dataclasses, the names are a concatenation of the path to the tensor, e.g. x = {'a': torch.randn(4), 'b': [torch.randn(4), torch.randn(4)]} produces nodes x_a, x_b_0, x_b_1.
Parameters, buffers, constants, and custom objects follow the FQN of the object, prefixed by "p", "b", "c", and "obj" respectively. For example, self.bar.l0.weight gets you p_bar_l0_weight.
Effect tokens are named token_1, token_2, and so on, since they are not grounded in model inputs or named attributes.
note: breaking the original diff into 3 parts (top-level renaming, higher-order-op subgraphs, constant input de/serialization) because of its size.
Examples:
```python
# params, buffers, constants, inputs, torch.cond

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_l0_weight: "f32[4, 4]", p_l0_bias: "f32[4]", c_alpha: "f32[4]", b_beta: "f32[4]", x_0_a: "f32[4, 4]", y: "f32[4, 4]"):
            # No stacktrace found for following nodes
            mul: "f32[4, 4]" = torch.ops.aten.mul.Tensor(x_0_a, x_0_a)
            t: "f32[4, 4]" = torch.ops.aten.t.default(p_l0_weight); p_l0_weight = None
            addmm: "f32[4, 4]" = torch.ops.aten.addmm.default(p_l0_bias, y, t); p_l0_bias = y = t = None
            return addmm

# model code
class Bar(torch.nn.Module):
    def forward(self, x):
        return x * x

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.bar = Bar()
        self.l0 = torch.nn.Linear(4, 4)
        self.alpha = torch.randn(4)
        self.register_buffer('beta', torch.randn(4))

    def forward(self, x, y):
        x = x[0]['a']
        mul = self.bar(x)
        z1 = self.l0(y)
        return z1

# custom objects, dataclasses, tokens, constant inputs

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, token_1: "f32[0]", obj_attr, data_x: "f32[4, 4]", data_y: "f32[4, 4]", mode):
            # No stacktrace found for following nodes
            mul: "f32[4, 4]" = torch.ops.aten.mul.Scalar(data_x, 30); data_x = None
            div: "f32[4, 4]" = torch.ops.aten.div.Tensor_mode(data_y, 1.0, rounding_mode = 'floor'); data_y = None
            add: "f32[4, 4]" = torch.ops.aten.add.Tensor(mul, div); mul = div = None
            with_effects = torch._higher_order_ops.effects.with_effects(token_1, torch.ops._TorchScriptTesting.takes_foo.default, obj_attr, add); token_1 = obj_attr = add = None
            getitem: "f32[0]" = with_effects[0]
            getitem_1: "f32[4, 4]" = with_effects[1]; with_effects = None
            return (getitem, getitem_1)

# model code
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attr = torch.classes._TorchScriptTesting._Foo(10, 20)

    def forward(self, data, a=1.0, mode="floor"):
        x = self.attr.add_tensor(data.x) + torch.div(data.y, a, rounding_mode=mode)
        x = torch.ops._TorchScriptTesting.takes_foo(self.attr, x)
        return x

@dataclass
class DataClass:
    x: Tensor
    y: Tensor

register_dataclass_as_pytree_node(
    DataClass,
    serialized_type_name="test.DataClass"
)

args = (DataClass(x=torch.randn(4, 4), y=torch.randn(4, 4)), )
kwargs = {'mode': 'floor'}
ep = torch.export.export(Foo(), args, kwargs, strict=False)
```
Test Plan: verification checks on placeholder names for all export() calls, unit test in test/export/test_export.py
Differential Revision: D55456418
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122904
Approved by: https://github.com/angelayi, https://github.com/thiagocrepaldi
Summary:
Deserialization of metadata could encounter a bug where commas are used in valid metadata names. This specifically occurs when a split of a `torch.nn.Sequential` stack is used, but may have other possible triggers. Because the deserialization relies on a comma based string split, such names trigger an error. This change uses a simple regular expression to ignore commas within parentheses to avoid the issue.
I add a test that constructs one such problematic sequential stack and show that it can be properly round-tripped with the improved splitting.
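A rough sketch of the paren-aware splitting idea (the exact pattern in the PR may differ):
```python
import re

def split_outside_parens(s: str):
    # Split on commas that are not inside a (non-nested) pair of parentheses.
    return [part.strip() for part in re.split(r",\s*(?![^()]*\))", s)]

print(split_outside_parens("linear, seq(0, 1), relu"))
# ['linear', 'seq(0, 1)', 'relu']
```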
Similarly, deserialization could fail when outputs are not a tensor type. Although such outputs like None or constants are not very useful, they do show up in graphs and export should be able to support them. This change improves output node parsing and adds a corresponding test.
Test Plan: buck test //caffe2/test:test_export -- TestSerialize
Differential Revision: D55391674
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122793
Approved by: https://github.com/zhxchen17
This addresses 2 issues with stack_trace metadata:
- stack_trace is currently missing from nodes in non-strict export
- in strict mode, stack_trace is populated for placeholder nodes, which may not be well-defined (with multiple uses)
We filter the call stack during tracing for calls from forward() methods, or ops in `torch/__init__.py` (e.g. sym_size_int, sym_constrain_range, etc.), to populate stack_trace. A node-level check is also added to _export_non_strict().
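A loose sketch of that kind of frame filtering (illustrative; not the exact predicate used):
```python
import traceback

def filter_relevant_frames(stack):
    # Keep frames from forward() methods or from torch/__init__.py shape helpers.
    return [
        f for f in stack
        if f.name == "forward" or f.filename.endswith("torch/__init__.py")
    ]

frames = filter_relevant_frames(traceback.extract_stack())
stack_trace = "".join(traceback.format_list(frames))
```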
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121034
Approved by: https://github.com/angelayi
Summary: This PR reduces the difference between strict and non-strict exported program by supporting inline_constraints for non-strict exported program,
Test Plan: CI
Differential Revision: D55547830
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123017
Approved by: https://github.com/angelayi
This PR adds a new metadata, `torch_fn` which is meant to replace `source_fn_stack` as `source_fn_stack` is not entirely well defined between strict/nonstrict. Previous discussion [here](https://docs.google.com/document/d/1sPmmsmh6rZFWH03QBOe49MaXrQkP8SxoG8AOMb-pFk4/edit#heading=h.anmx9qknhvm).
`torch_fn` represents the torch function that a particular aten operator came from. For example, `torch.nn.Linear` goes down to the `torch.nn.functional.linear` at the `__torch_function__` layer, and then `aten.t/aten.addmm` in the `__torch_dispatch__` layer. So the nodes `aten.t/aten.addmm` will now have the `torch_fn` metadata containing the `torch.nn.functional.linear`.
The `torch_fn` metadata is a tuple of 2 strings: a unique identifier for each torch function call, and the actual torch function `f"{fn.__class__}.{fn.__name__}"`. The purpose of the first value is to distinguish between 2 consecutive calls to the same function. For example, if we had 2 calls to `torch.nn.Linear`, the nodes and corresponding metadata would look something like:
```
aten.t - ("linear_1", "builtin_function_or_method.linear"),
aten.addmm - ("linear_1", "builtin_function_or_method.linear"),
aten.t - ("linear_2", "builtin_function_or_method.linear"),
aten.addmm - ("linear_2", "builtin_function_or_method.linear"),
```
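As a hedged usage sketch (the module, shapes, and exact identifier strings are made up and may differ from the example above), the metadata can be inspected on the exported nodes like so:
```
import torch
from torch.export import export

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(4, 4)
        self.fc2 = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.fc2(self.fc1(x))

ep = export(M(), (torch.randn(2, 4),))
for node in ep.graph.nodes:
    if node.op == "call_function":
        # Each aten node carries a ("<fn>_<n>", "<class>.<fn>") pair in its metadata.
        print(node.target, node.meta.get("torch_fn"))
```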
Higher order ops -- currently we can get the torch_fn metadata for nodes within the HOO's subgraph, but after retracing, this becomes the `(cond, higher_order_op.cond)` :( This is because `fx_traceback.set_current_meta` points to the cond node in the toplevel graph, rather than the original node in the subgraph. I think this is because `fx.Interpreter` does not go into the cond subgraphs. (will discuss with Yidi more about this)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122693
Approved by: https://github.com/tugsbayasgalan
Summary:
Replacing `torch._export.aot_compile` callsites with
```
ep = torch.export._trace._export(.., predispatch=True) # Traces the given program into predispatch IR
so_path = torch._inductor.aot_compile_ep(ep, ...) # Takes an exported program and compiles it into a .so
```
This allows us to explicitly split up the export step from AOTInductor. We can later modify tests to do `export + serialize + deserialize + inductor` to mimic internal production use cases better.
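A hedged sketch of the intended split using only public APIs (the AOTInductor compilation entry point shown above is internal, so it is elided here; the module and file name are made up):
```
import torch
from torch.export import export, save, load

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) * 2

ep = export(M(), (torch.randn(8),))   # step 1: trace into an ExportedProgram
save(ep, "m.pt2")                     # optionally serialize ...
ep2 = load("m.pt2")                   # ... and deserialize to mimic production
# step 2: hand the ExportedProgram to AOTInductor to compile it into a .so
```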
Test Plan: CI
Differential Revision: D54808612
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122225
Approved by: https://github.com/SherlockNoMad, https://github.com/khabinov
This PR reduces the difference between strict and non-strict exported program by
- Support `inline_constraints` for non-strict exported program
- Add runtime assertions for range constraints to non-strict exported program (see the sketch below)
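A hedged sketch of the kind of program this covers, assuming current public APIs (the module and bounds are made up for illustration):
```
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        n = x.item()           # data-dependent (unbacked) symint
        torch._check(n >= 2)   # inline constraints on n
        torch._check(n <= 64)
        return torch.zeros(n)

# Non-strict export now carries these as range constraints / runtime assertions.
ep = export(M(), (torch.tensor(8),), strict=False)
print(ep.range_constraints)
```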
After this PR, the following unit tests are no longer `expectedFailureNonStrict`:
- test_automatic_constrain_size
- test_export_with_inline_constraints
- test_redundant_asserts
- test_constrain_size_with_constrain_value
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122722
Approved by: https://github.com/pianpwk
Summary:
`torch.export` is a powerful tool for creating a structured and shareable package from arbitrary pytorch code. One great use case of `torch.export` is sharing models or subgraphs in a way that allows results to be easily replicated. However, in the current implementation of `export`, the `example_inputs` field is thrown out. When trying to replicate bugs, benchmarks, or behaviors, losing the original input shapes and values makes the process much messier.
This change adds saving and loading for the `example_inputs` attribute of an `ExportedProgram` when using `torch.export.save` and `torch.export.load`. This simple addition makes `ExportedPrograms`s a fantastic tool for performance and accuracy replication. For example, with this change we enable the following workflow:
```
# Script to create a reproducible accuracy issue with my model.
kwargs = {"fastmath_mode": True}
exp_program = export(my_model, sample_inputs, kwargs)
result = exp_program.module()(*sample_inputs, **kwargs)
# Uh oh, I don't like that result, let's send the module to a colleague to take a look.
torch.export.save(exp_program, "my_model.pt2")
```
My colleague can then easily reproduce my results like so:
```
# Script to load and reproduce results from a saved ExportedProgram.
loaded_program = torch.export.load("my_model.pt2")
# The following line is enabled by this Diff, we pull out the arguments
# and options that caused the issue.
args, kwargs = loaded_program.example_inputs
reproduced_result = loaded_program.module()(*args, **kwargs)
# Oh I see what happened here, let's fix it.
```
Being able to share exact inputs and arguments makes `ExportedPrograms` much
cleaner and more powerful with little downside. The main potential issue with this change
is that it does slightly increase the size of saved programs. However, the size of
inputs will be much smaller than parameters in most cases. I am curious to hear
discussion on saved file size though.
The deserialization of `example_inputs` is currently implemented as `Optional`. Although this won't affect users of `export.save` and `export.load`, it does give backwards compatibility to any direct users of `serialize` and `deserialize`.
Test Plan:
This diff includes a new test which exercises the save / load flow with multiple args and kwargs.
```
buck test //caffe2/test:test_export -- TestSerialize
```
Differential Revision: D55294614
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122618
Approved by: https://github.com/zhxchen17
Fixes https://github.com/pytorch/pytorch/issues/121085
This PR is pretty involved, so pay attention to this description. At a high
level, the refactor is intended to be mechanical: anywhere in
MetaConverter where previously we took a Tensor as argument, we now take
a MetaTensorDesc, which contains all of the information that we would
have queried off of the Tensor, but placed into a separate data
structure which we can serialize or use to recreate a fake tensor in
a separate fake tensor mode in exact fidelity to the original.
However, this transformation is not always entirely mechanical. Here
is what you need to pay attention to:
- The memo table from real Tensor -> meta/fake Tensor is now broken
into two memo tables: real Tensor -> stable int id -> meta/fake
Tensor. The stable int id is needed so that when we do serialization,
we know when tensors/storages alias each other and can ensure we preserve
this aliasing upon deserialization.
The way I have implemented this changes the weak reference behavior.
Previously, when either the real Tensor OR the meta/fake Tensor went
dead, we would remove the entry from the memo table. Now, this only
removes entries from one of the two memo tables. This semantically
makes sense, because the user may have held on to the stable int id
out of band, and may expect a real Tensor to continue to be numbered
consistently / expect to be able to look up a meta/fake tensor from
this id. If this is unacceptable, it may be possible to rejigger
the memo tables so that we have real Tensor -> stable int id
and real Tensor -> meta/fake Tensor, but TBH I find the new
implementation a lot simpler, and arranging the memo tables in this
way means that I have to muck around with the real tensor to save
to the memo table; in the current implementation, I never pass the
Tensor to meta_tensor function AT ALL, which means it is impossible
to accidentally depend on it. (A toy sketch of this two-level memo appears after this list.)
- When I fill in the fields of MetaTensorDesc in describe_tensor, I need
to be careful not to poke fields when they are not valid. Previously,
preconditions were implicitly checked via the conditional structure
("is this sparse? is this nested?") that is tested before we start
reading attributes. This structure has to be replicated in
describe_tensor, and I have almost assuredly gotten it wrong on my
first try (I'll be grinding through it on CI; a careful audit will
help too, by auditing that I've tested all the same conditionals that
the original access was guarded by.)
- I originally submitted https://github.com/pytorch/pytorch/pull/121821
for the symbolic shapes change, but it turned out the way I did it
there didn't actually work so well for this PR. I ended up just
inlining the symbolic shapes allocation logic into MetaConverter
(look for calls to maybe_specialize_sym_int_with_hint), maybe there
is a better way to structure it, but what I really want is to
just read sizes/strides/offset directly off of MetaTensorDesc; I
don't want another intermediate data structure.
- Some fields aren't serializable. These are documented as "NOT
serializable". ctx/type should morally be serializable and I just
need to setup a contract with subclasses to let them be serialized.
The fake_mode is used solely to test if we are refakefying with
a pre-existing ShapeEnv and we want to reuse the SymInt
directly--serializing this case is hopeless but I am kind of hoping
after this refactor we do not need this at all. view_func is not
serializable because it's a bound C implemented method. Joel has
promised me that this is not too difficult to actually expose as a
true data structure, but this is the edgiest of edge cases and there
is no reason to deal with it right now.
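To make the memo-table restructuring above concrete, here is a toy sketch; it is not the actual MetaConverter code, and the names and the meta-tensor construction are simplified assumptions:
```
import torch
from torch.utils.weak import WeakIdKeyDictionary  # identity-keyed, weakly referenced

class TwoLevelMemo:
    """Toy model of: real Tensor -> stable int id -> meta/fake Tensor."""

    def __init__(self):
        self._next_id = 0
        self._real_to_id = WeakIdKeyDictionary()  # entry dies with the real tensor
        self._id_to_meta = {}                     # lives independently of the above

    def describe(self, real):
        # The stable id is what a serialized description would carry, so that
        # aliasing tensors/storages map to the same entry after deserialization.
        if real not in self._real_to_id:
            self._real_to_id[real] = self._next_id
            self._next_id += 1
        return self._real_to_id[real]

    def get_or_create_meta(self, real):
        key = self.describe(real)
        if key not in self._id_to_meta:
            self._id_to_meta[key] = torch.empty_like(real, device="meta")
        return self._id_to_meta[key]

memo = TwoLevelMemo()
x = torch.randn(3)
assert memo.get_or_create_meta(x) is memo.get_or_create_meta(x)  # aliasing preserved
```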
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122044
Approved by: https://github.com/eellison
ghstack dependencies: #122018
This PR allows users to specify int values for dimensions in dynamic_shapes as well as None, for example:
```
class Foo(torch.nn.Module):
    def forward(self, x, y, z):
        ...

foo = Foo()
inputs = (torch.randn(4, 6), torch.randn(5, 4), torch.randn(3, 3))
for dynamic_shapes in [
    None,
    ((4, 6), (5, 4), (3, 3)),
    ((None, 6), None, {0: 3, 1: 3}),
]:
    _ = export(foo, inputs, dynamic_shapes=dynamic_shapes)
```
All of the above should produce the same ExportedProgram.
This is done by temporarily creating a static dim constraint during analysis, where vr.lower == vr.upper. These constraints are then deleted during _process_constraints(), and do not show up in the final ExportedProgram's range_constraints.
Additionally, export() will also fail if the shapes are mis-specified, for example:
```
_ = export(foo, inputs, dynamic_shapes=((5, None), None, None))
```
leads to `torch._dynamo.exc.UserError: Static shape constraint of 5 does not match input size of 4, for L['x'].size()[0]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121860
Approved by: https://github.com/avikchaudhuri
Creating this after [PR](https://github.com/pytorch/pytorch/pull/121642) got reverted.
The current dynamic shapes implementation fixes the lower bound of Dims to 2 for analysis, but allows 0/1 shapes during runtime. This leads to failures when initializing Dim(1,2). This PR sets the lower bound to 0, and avoids erroring out when it conflicts with the generated (2, maxsize) constraint during analysis.
Also resolves a derived dim constraints issue with the following code:
```
class Bar(torch.nn.Module):
    def forward(self, x, y):
        return x + y[1:]

dx = Dim("dx", min=1, max=3)
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx+1, 1: None})
)
print(ep.range_constraints)
```
In main:
```
{s0: ValueRanges(lower=2, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=3, upper=4, is_bool=False)}
```
This PR:
```
{s0: ValueRanges(lower=1, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=2, upper=4, is_bool=False)}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121910
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
Summary:
- we want fx nodes' stack trace format to remain backward compatible and the same as before in the exported program
- however, in the serialized format we want a more compact stack_trace representation, otherwise the node attributes are dominated by stack traces
- the diff implements the minimal change in the serialization process to dedupe node stack traces, introducing a fileinfo_list and a filename_to_abbrev map so that an index represents each filename and a lineno represents each line (a rough sketch of the idea follows below)
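A hedged sketch of the dedupe idea; the data layout and names here are illustrative only, the real schema lives in the serialization code:
```
# Represent each stack_trace frame as (file_index, lineno, name) against a shared
# ordered file-info list, instead of repeating full filenames per node.
def compress_stack_traces(traces):
    fileinfo_list = []        # ordered list of unique filenames
    filename_to_abbrev = {}   # filename -> index into fileinfo_list
    compressed = []
    for frames in traces:     # frames: list of (filename, lineno, name)
        entry = []
        for filename, lineno, name in frames:
            if filename not in filename_to_abbrev:
                filename_to_abbrev[filename] = len(fileinfo_list)
                fileinfo_list.append(filename)
            entry.append((filename_to_abbrev[filename], lineno, name))
        compressed.append(entry)
    return fileinfo_list, compressed

files, packed = compress_stack_traces(
    [[("model.py", 10, "forward")],
     [("model.py", 12, "forward"), ("utils.py", 3, "helper")]]
)
print(files)   # ['model.py', 'utils.py']
print(packed)  # [[(0, 10, 'forward')], [(0, 12, 'forward'), (1, 3, 'helper')]]
```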
Test Plan:
# llm
base on D54497918
```
buck2 run @//mode/dev-nosan fbcode//executorch/examples/models/llama2:export_llama -- -c ~/stories110M.pt -p ~/params.json
```
set up breakpoint after serialization/deserialization
- serialize
```
(Pdb) v_meta = [n.meta for n in exported_program.graph_module.graph.nodes]
(Pdb) paste_client.create_phabricator_paste_object(paste_creation_client_id=1093956601162697, content=str(v_meta)).number
1193647450
(Pdb) json_program = json.dumps(_dataclass_to_dict(serialized_graph.co_fileinfo_ordered_list),cls=EnumEncoder)
(Pdb) json_bytes = json_program.encode('utf-8')
(Pdb) paste_client.create_phabricator_paste_object(paste_creation_client_id=1093956601162697, content=str(json_bytes)).number
1193604333
(Pdb) sys.getsizeof(json_bytes)
3846
(Pdb) compressed_bytes = zstd.ZstdCompressor().compress(json_bytes)
(Pdb) sys.getsizeof(compressed_bytes)
1139
```
in P1193647450 (before serialization), search for `stack_trace`
in P1193604333 (after serialization), search for `stack_trace` and `co_fileinfo_ordered_list`
[note: didn't do compression in this diff since the size is pretty small and it adds complexity if we do compression]
- deserialize
```
(Pdb) v_meta = [n.meta for n in deserialized_exported_program.graph_module.graph.nodes]
(Pdb) paste_client.create_phabricator_paste_object(paste_creation_client_id=1093956601162697, content=str(v_meta)).number
1193629435
```
in P1193629435, search for `stack_trace`
# ads
Differential Revision: D54654443
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121675
Approved by: https://github.com/angelayi
We would like to improve consistency for nn_module_stack metadata in torch.export.
This PR ensures that all tests in test/export/test_export.py satisfy the following constraints (a small sketch of the checked invariant follows the list):
- Remove nn_module_stack for all placeholder & output nodes, for all modules and submodules
- Ensure nn_module_stack is present for all other node types for the top-level module (there is still an issue with torch.cond submodules having empty fields)
- Add these checks to _export() in _trace.py (we would add this in the Verifier, but downstream apps construct ExportedPrograms separate from _export(), and metadata may not be maintained there)
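The sketch below is illustrative only (the module is made up and the real check lives in _export() in _trace.py); it shows the invariant on a freshly exported program:
```
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        return x.sin() + 1

ep = export(M(), (torch.randn(3),))
for node in ep.graph.nodes:
    stack = node.meta.get("nn_module_stack")
    if node.op in ("placeholder", "output"):
        print(node.op, "->", stack)        # expected: None / absent
    else:
        print(node.op, "->", bool(stack))  # expected: True for call_function nodes
```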
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120661
Approved by: https://github.com/avikchaudhuri
The current dynamic shapes implementation fixes the lower bound of Dims to 2 for analysis, but allows 0/1 shapes during runtime. This leads to failures when initializing Dim(1,2). This PR sets the lower bound to 0, and avoids erroring out when it conflicts with the generated (2, maxsize) constraint during analysis.
Also resolves a derived dim constraints issue with the following code:
```
class Bar(torch.nn.Module):
    def forward(self, x, y):
        return x + y[1:]

dx = Dim("dx", min=1, max=3)
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx+1, 1: None})
)
print(ep.range_constraints)
```
In main:
```
{s0: ValueRanges(lower=2, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=3, upper=4, is_bool=False)}
```
This PR:
```
{s0: ValueRanges(lower=1, upper=3, is_bool=False), s0 + 1: ValueRanges(lower=2, upper=4, is_bool=False)}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121642
Approved by: https://github.com/avikchaudhuri
Summary:
Deserialization didn't populate ShapeEnv's `var_to_val` field properly, and AOTInductor relies on this field to compile dynamic shapes properly.
As a result, AOTI failed at compiling a deserialized ExportedProgram.
Test Plan: buck2 test mode/dev-nosan caffe2/test/inductor/fb:test_aot_inductor_pt2_inference
Differential Revision: D54559494
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121759
Approved by: https://github.com/avikchaudhuri
Summary: Taking the rightmost part of the fqn causes name conflicts when there are multiple instances of the same class. Changed to replace "." in the fqn with "_" to avoid invalid syntax in input args.
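A tiny illustration of the difference (the FQNs are made up):
```
# Taking only the rightmost component collides across instances of the same class;
# replacing "." with "_" keeps names unique and syntactically valid as arg names.
fqns = ["blocks.0.linear.weight", "blocks.1.linear.weight"]
print([f.split(".")[-1] for f in fqns])     # ['weight', 'weight']  -> conflict
print([f.replace(".", "_") for f in fqns])  # ['blocks_0_linear_weight', 'blocks_1_linear_weight']
```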
Test Plan: added test case
Differential Revision: D54435230
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121145
Approved by: https://github.com/zhxchen17
Summary: When fake tensors are passed to a graph module and we do runtime assertions on them, we can accidentally trigger specialization guards. It's better to just relax the checking for these.
Test Plan: confirmed that problem in T181400371 is now fixed
Differential Revision: D54658960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121460
Approved by: https://github.com/angelayi