pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Tugsbayasgalan Manlaibaatar	2b21a653d8	Register CIA ops to FakeTensorMode directly in export (#140465 ) During export, we nub out most CIA ops to return NotImplemented to avoid decomposing them during tracing. To recover the existing shape propagation behavior, we register these CIA decomps directly as FakeTensorMode rules as well. The reason we have to do is because when we return NotImplemented, FakeTensor would fallback to running these CIAs with Meta backend causing device branching CIA ops to fail. (because now the device is Meta. One example is sdpa). If we register a kernel directly to FakeTensorMode, we won't fallback to Meta backend. Differential Revision: [D65716260](https://our.internmc.facebook.com/intern/diff/D65716260/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140465 Approved by: https://github.com/bdhirsh	2024-11-19 15:00:35 +00:00
Henry Tsang	350bc2a166	[export] Add support for symbool to make it usable for torch.cond (#138765 ) # Why? I want the following code to work. minimal repro: ``` class M(torch.nn.Module): def forward(self, dilate_flag): return dilate_flag.item() input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) model = M().cuda() ep = torch.export.export(model, input1, strict=True) path = torch._inductor.aot_compile(ep.module(), input1) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(input1) ``` error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program second error will be handled by https://github.com/pytorch/pytorch/pull/138760 # Motivation I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work. ``` class M(torch.nn.Module): def __init__(self) -> None: super().__init__() self.dilate_flag = 0 def forward(self, dilate_flag): self.dilate_flag = dilate_flag.item() def true_fn(dilate_flag): return dilate_flag.clone() def false_fn(dilate_flag): return dilate_flag.clone() torch.cond( self.dilate_flag, true_fn, false_fn, (dilate_flag,), ) return self.dilate_flag input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),) inputs = (input1, input2) model = M().cuda() for input in inputs: expected_output = model(input) ep = torch.export.export(model, input, strict=False) path = torch._inductor.aot_compile(ep.module(), input) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(*input) assert ( expected_output == actual_output ), f"henry they are not equal {expected_output} != {actual_output}" ``` Differential Revision: D64867504 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765 Approved by: https://github.com/ydwu4	2024-11-04 23:31:49 +00:00
Tugsbayasgalan Manlaibaatar	ae0e7042f6	Fix custom obj being input (#139209 ) Differential Revision: [D65158939](https://our.internmc.facebook.com/intern/diff/D65158939) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139209 Approved by: https://github.com/ydwu4 ghstack dependencies: #138658	2024-11-04 18:24:29 +00:00
Tugsbayasgalan Manlaibaatar	e080c89bdc	Make test_torchbind.py training IR compatible (#138658 ) In this diff, i make test_torchbind.py tests to handle training IR. Today in the training IR, we don't see the effect token and HOP because this happens at the FunctionalTensorMode. Maybe in the future, we should move this logic up to the training IR so that writing passes etc on training Ir is safer. But for the migration purposes, i think it is ok for now. I also fixed two bugs: 1. ep.module() doesn't register all aliased constants in the module. 2. When we retrace, we need to fakify the original Torchbind object. 3. We don't run any DCE on training IR so we need to add some more torch ops to verifier. Differential Revision: [D64853530](https://our.internmc.facebook.com/intern/diff/D64853530) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138658 Approved by: https://github.com/ydwu4, https://github.com/zhxchen17	2024-11-04 17:43:11 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	3a0c361899	Remove presere ops (#138371 ) Summary: CI #buildall Test Plan: CI Reviewed By: StellarrZ Differential Revision: D64151426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138371 Approved by: https://github.com/bdhirsh	2024-10-25 19:13:55 +00:00
Avik Chaudhuri	1d98a526dd	preserve signatures with multiple calls + buffer mutations (#138669 ) As called out in https://github.com/pytorch/pytorch/pull/137999, preserving signatures of multiple calls when buffer mutations are present was NYI. The main problem was that intermediate values of buffers were not tracked, so couldn't be propagated statefully between multiple calls (i.e., they would need to be explicitly passed around, defeating the unlifting needed for preserving signatures). This PR fixes this situation, by introducing module attributes that carry the necessary intermediate values of buffer mutations. In general, a buffer mutation can have several intermediate values it depends on recursively, even other buffers. So rather than tying an intermediate value with a particular buffer, we tie it with the submodules that create and read it. We install an attribute on all modules that create or read a particular intermediate value, sharing the same initial storage (i.e., initialized with the same empty tensor). For the module that creates this intermediate value, we copy the value into the corresponding attribute; and for the modules that read it, we read the corresponding attribute instead. Another complication that needed to be addressed was that a `run_decompositions` following an `export_for_training` was not preserving module call graphs, which is needed for unflattening and, in particular, used when remapping inputs. Fortunately some existing metadata already tracks provenance of nodes, which we could use to update a module call graph after functionalization / decomposition. Differential Revision: D64806175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138669 Approved by: https://github.com/tugsbayasgalan	2024-10-25 00:13:25 +00:00
Tugsbayasgalan Manlaibaatar	1f32a1fb80	Replace torch.export default decomp table to be lazily populated (#137650 ) In this PR, we implement lazy dictionary for export decomp behaviour for following reasons: 1. Custom op loading can happen after import time, as a result, the decomp table might not be able to pick up the decomp. Therefore we try to delay materialization as late as possible. I intentionally seperated out the core_aten_decomp to not have any custom CIA ops in this PR to mitigate the risk of getting reverted but in the future, core_aten_decomp under torch/_decomp will exist as an alias to official export table (torch.export.default_decompositions) Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650 Approved by: https://github.com/justinchuby, https://github.com/bdhirsh	2024-10-18 19:28:52 +00:00
Tugsbayasgalan Manlaibaatar	5fca2fd365	Try unify training and inference (#136888 ) Previously inference -> inference IR was going through a seperate flow from train -> inference decomposition. This diff unifies them so that we always retrace when decomposing. Joint IR decomp is still going through old flow (inference -> inference) but seems ok for now since it is still in experimental stage. Differential Revision: [D63062521](https://our.internmc.facebook.com/intern/diff/D63062521/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136888 Approved by: https://github.com/avikchaudhuri	2024-10-11 20:09:58 +00:00
Tugsbayasgalan Manlaibaatar	bb31e3f57e	Add original forward names to schema so that prettify pass works (#136887 ) When we run_decomp, we retrace if it is training IR. As a result, we do need to reliably store the oroiginal forward names when we run decomp. Differential Revision: [D63064453](https://our.internmc.facebook.com/intern/diff/D63064453/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136887 Approved by: https://github.com/angelayi	2024-10-08 04:21:02 +00:00
Pian Pawakapan	f33ffd01f2	[export] fix joint graph metadata (#136011 ) Differential Revision: D62652832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136011 Approved by: https://github.com/tugsbayasgalan	2024-10-07 19:36:44 +00:00
Tugsbayasgalan Manlaibaatar	73b07df042	Preserve custom ops via run_decomps (#136882 ) This is re-apply of https://github.com/pytorch/pytorch/pull/136773?fbclid=IwZXh0bgNhZW0CMTEAAR3SmginkvZcILVY7G2XDa_KosnV4DPmq1l6pkjPIM255QgJLKVAR90rGAU_aem_ZWpcVdUsmAGzOGiwbjtBDg. Note that this doesn't completely remove the _preserve_ops list from export mainly because we want to have small change to address failing executorch tests. All the complications included in this PR is deleted in the next PR. Differential Revision: [D63553086](https://our.internmc.facebook.com/intern/diff/D63553086/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136882 Approved by: https://github.com/bdhirsh	2024-10-01 17:38:00 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	e4e83a4ac4	Remove aten.item hack (#136663 ) Summary: Title Test Plan: CI Differential Revision: D63404353 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136663 Approved by: https://github.com/bdhirsh	2024-09-26 17:14:48 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	0b38fa154a	Fix meta registry in export (#136492 ) Summary: Title Test Plan: CI This fixes some breaking tests in executorch. I think the root cause is when we have aten::matmul which we are not preserving, we register meta implementation from C++ side. It seems like the C++ kernel doesn't work well with mix of FakeTensor and real tensor. This PR sidesteps this problem by always preferring python CIA decomp over C++ Cia decomp Differential Revision: D63297050 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136492 Approved by: https://github.com/bdhirsh	2024-09-25 17:53:02 +00:00
Tugsbayasgalan Manlaibaatar	1904b09e61	Create export_for_inference API and expose core_aten as public facing API (#135912 ) Differential Revision: [D62606908](https://our.internmc.facebook.com/intern/diff/D62606908) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135912 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #135080	2024-09-15 17:05:07 +00:00
Tugsbayasgalan Manlaibaatar	382fad58b3	Deprecate _preserve_ops and consolidate with decomp_table (#135080 ) In this PR, we deprecate _preserve_ops feature in run_decomposition API. We can't kill this API completely because Executorch team depends on it. As the syncing between two repos is non-trivial, I just leave this argument as deprecated for now. In the next PR, i will immediately remove it. After this PR, run_decompositions will only decompose what's inside the decomp table and preserve the rest by default. Note that this feature is only rolled out to OSS for now. Old code path is protected under IS_FBCODE flag. Differential Revision: [D62163161](https://our.internmc.facebook.com/intern/diff/D62163161/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135080 Approved by: https://github.com/justinchuby, https://github.com/avikchaudhuri, https://github.com/bdhirsh	2024-09-15 17:01:58 +00:00
Tugsbayasgalan Manlaibaatar	9d705605dd	Fix decomp behaviour in export training IR (#134801 ) Subset of changes in https://github.com/pytorch/pytorch/pull/132901, can't land the previous one because it is too complicated. Rest of the change will be implemented as follow up after export design meeting. This part just makes the training IR -> inference IR decomp to have the same path as normal export. Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134801 Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi	2024-09-05 06:37:44 +00:00
Zhengxu Chen	a19a7524f6	[export] Make sure getitem replacement are synced with module call graph. (#134830 ) Summary: When we are placing nodes in the graph, we should also replace the references in module_call_graph. Test Plan: buck2 run 'fbcode//mode/opt' torchrec/fb/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_vlea buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_serialize_empty_value_vlea' --run-disabled buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_deserialized_device_vle' --run-disabled Differential Revision: D62014035 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134830 Approved by: https://github.com/angelayi	2024-08-30 16:47:05 +00:00
Avik Chaudhuri	92e38a476f	preserve aten::to device in export training (#134622 ) Summary: With training IR, we cannot rely on trapping `to()` in `FunctionalTensor` because the regular decomposition kicks it first, and that can cause it to be optimized away. So instead we preserve it until we functionalize, and then replace it explicitly with `_to_copy()`. Test Plan: expected test failures go away Differential Revision: D61883878 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134622 Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan	2024-08-29 14:53:30 +00:00
Zhengxu Chen	3ef1cc8583	[export] Implement common_getitem_elimination pass. (#133618 ) Summary: In export, we will generate many redundant getitem nodes branching from the same source, inserted by runtime assertions or any passes. This is causing issues with any downstream system relying on any value being uniquely defined by a single node. I don't think it hurt to remove a bunch of getitem nodes only, so I just added to the ctor. Test Plan: rebase on D61256937 ``` buck2 run scripts/bearzx:pt2_export_playground ``` Differential Revision: D61351578 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133618 Approved by: https://github.com/tugsbayasgalan	2024-08-21 16:48:24 +00:00
Justin Chu	271ee90851	[easy] Fix type annotation for `ExportedProgram.run_decompositions` (#133720 ) Fix the tuple type annotation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133720 Approved by: https://github.com/Skylion007	2024-08-16 22:11:42 +00:00
Edward Z. Yang	1f66487c69	[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770 Approved by: https://github.com/bdhirsh	2024-08-08 23:07:23 +00:00
PyTorch MergeBot	d1f73fd844	Revert "[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 )" This reverts commit `902c6f3a19`. Reverted https://github.com/pytorch/pytorch/pull/132770 on behalf of https://github.com/ezyang due to Removed API was recommitted ([comment](https://github.com/pytorch/pytorch/pull/132770#issuecomment-2275749689))	2024-08-08 12:54:34 +00:00
Edward Z. Yang	902c6f3a19	[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770 Approved by: https://github.com/bdhirsh ghstack dependencies: #132674, #132675, #132421, #132062, #132767, #132769	2024-08-08 12:03:25 +00:00
Yidi Wu	bbf568aac8	Split of "[reland] [export] fix zero arg export in training_ir and constant tensor handling" (#132307 ) Summary: A re-land of D60006710. Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing. edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure: a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state. The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module(). Differential Revision: D60532628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307 Approved by: https://github.com/zhxchen17	2024-08-08 01:36:16 +00:00
angelayi	c327710a87	[export] Publicize validate function (#132777 ) as titled Pull Request resolved: https://github.com/pytorch/pytorch/pull/132777 Approved by: https://github.com/zhxchen17	2024-08-07 23:10:05 +00:00
Tugsbayasgalan Manlaibaatar	775c310c0c	Preserve source_fn_stack in the training IR decomp (#132033 ) Title Differential Revision: [D60377712](https://our.internmc.facebook.com/intern/diff/D60377712/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132033 Approved by: https://github.com/angelayi ghstack dependencies: #131988, #131995, #131999	2024-08-06 19:45:40 +00:00
Xuehai Pan	f3fce597e9	[BE][Easy][17/19] enforce style for empty lines in import segments in `torch/[a-c]/` and `torch/[e-n]/` (#129769 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769 Approved by: https://github.com/ezyang	2024-08-04 10:24:09 +00:00
Tugsbayasgalan Manlaibaatar	073430ebea	Don't check for autograd state when lowering to inference IR (#131988 ) When lowering to inference IR, we shouldn't error on autograd state changes because we will have preserved the autograd state change at the training level. I think the more correct way of implementing it would be to wrap autograd ops in HOP before decomposing, but that seems low ROI. Differential Revision: [D60346235](https://our.internmc.facebook.com/intern/diff/D60346235/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131988 Approved by: https://github.com/angelayi	2024-08-01 04:15:37 +00:00
Zhengxu Chen	7feaa73057	[export] Remove deprecated fields from ExportedProgram ctor. (#131697 ) Summary: as title. Test Plan: CI Reviewed By: SherlockNoMad Differential Revision: D60078426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131697 Approved by: https://github.com/ydwu4	2024-07-26 16:19:46 +00:00
Avik Chaudhuri	5b05ad9697	fix non-persistent buffers (#131756 ) Summary: Dynamo doesn't track whether buffers are `persistent`. This led to some ugly code where we would mark buffers as always persistent when creating signatures, then later check whether the buffers were not in the state dict to infer whether they were non-persistent, and use this to fix up the signature. This PR instead defines a utility to look up all the non-persistent buffers registered inside a module (this information is recorded in a private `_non_persistent_buffers_set` module attribute), and uses it to (a) correctly set the persistent flag on buffers when creating signatures (b) transfer this information to a Dynamo-traced graph module, which then causes non-persistent buffers to (correctly) not show up in the state dict. Test Plan: existing tests + new case with non-persistent buffer in nested module Differential Revision: D60224656 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131756 Approved by: https://github.com/zhxchen17, https://github.com/ydwu4	2024-07-26 04:45:30 +00:00
Avik Chaudhuri	83d19620f6	kill tmp _is_executorch flag (#131488 ) Test Plan: existing tests Differential Revision: D60126186 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131488 Approved by: https://github.com/ydwu4	2024-07-24 08:51:37 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
Avik Chaudhuri	94f22eb6b2	refactor post-trace fakification in strict (#131421 ) Summary: Previously it was unclear what `_convert_input_to_fake` actually does (used in strict), and in particular how it is different from `make_fake_inputs` (used in non-strict). This PR splits that function to work purely on user inputs, then renames it to `extract_fake_inputs` and adds a comment clarifying what it does—namely, it extracts fake inputs from a given graph module instead of "converting inputs to fake inputs" (as suggested by the current name) or "making fake inputs" (as happens in non-strict, where no tracing has taken place yet). The remainder of that function used to also fakify params and buffers. It turns out that this part is identical to what happens in non-strict, hence we also pull `make_fake_inputs` out from `non_strict_utils` into `_trace`, merge it with another util, and make both modes call it. Test Plan: existing tests Differential Revision: D60084442 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131421 Approved by: https://github.com/zhxchen17	2024-07-23 18:23:03 +00:00
Sherlock Huang	c1ef214046	Print ExportedProgram without color by default (#131399 ) Summary: Without plugin, colored ExportedProgram is not really readable. ![image](https://github.com/user-attachments/assets/319920a9-bb4b-4ad2-bcac-0c4f76973b11) Test Plan: CI Differential Revision: D60074481 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131399 Approved by: https://github.com/angelayi	2024-07-23 16:41:55 +00:00
angelayi	26f7dd286b	[export] Allow non-CIA ops to be preserved (#131075 ) I feel like the semantics of `run_decompositions(preserve_ops,...)` should be that we should always preserve whatever operator is put into `preserve_ops`, even if it's not CIA? Pull Request resolved: https://github.com/pytorch/pytorch/pull/131075 Approved by: https://github.com/bdhirsh	2024-07-23 00:41:48 +00:00
PyTorch MergeBot	b9912f31ef	Revert "[export] fix zero arg export in training_ir (#130990 )" This reverts commit `50436d5bdb`. Reverted https://github.com/pytorch/pytorch/pull/130990 on behalf of https://github.com/clee2000 due to failing some executorch and torchrec tests internally D60006710 ([comment](https://github.com/pytorch/pytorch/pull/130990#issuecomment-2243395316))	2024-07-22 16:49:25 +00:00
Yidi Wu	50436d5bdb	[export] fix zero arg export in training_ir (#130990 ) Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing. edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure: a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state. The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990 Approved by: https://github.com/pianpwk	2024-07-20 02:35:13 +00:00
Pian Pawakapan	745324e487	[export] turn on hybrid symints by default (#130775 ) Sets `prefer_deferred_runtime_asserts_over_guards=True` for export, so any guards emitted from `SymNode.expect_true` (for example, guards that are implicitly required to be true for an op to succeed) won't lead to constraint violations. Instead these should appear in the graph as runtime asserts, or potentially as replacement expressions for placeholder shapes. For example, this reshape op should emit s0 * s1 = s2, deferred as a runtime assert. ``` x = torch.randn(4, 8) # [s0, s1] y = torch.randn(32) # [s2] out = x.reshape(-1) + y # this emits Eq(s0 * s1, s2), and we represent y's shape as [s0s1] in the graph. ``` However, other complex guards can still cause export to fail, for instance guards emitted from `SymNode.guard_bool/guard_size_oblivious` (e.g. explicit if-else conditions in user code or lower-level op implementations hit during tracing) can still raise constraint violations. These can be deferred with `allow_complex_guards_as_runtime_asserts=True`. We don't yet make this default, because while this makes export more likely to succeed, it results in non-trivial asserts being emitted that often represent specialization to a variant of the op, or checks related to 0/1 specialization. We also remove forced specializations for export and kill the `_disable_forced_specializations` flag - now any guard we can't express with Dims/DerivedDims either are handled with Hybrid SymInts, or should be resolved with rewriting or deferring. Follow up: Currently, `ShapeEnv._set_replacement()` is called for complex equality expressions (e.g. s2 -> s0s1 in the example above), and the ExportedProgram stores `s0*s1` in the input placeholder. This isn't checked for validity when the program is run, so an option is to avoid replacement and/or runtime assert on equality. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130775 Approved by: https://github.com/avikchaudhuri	2024-07-18 17:40:58 +00:00
Zhengxu Chen	5484c86021	[export] Fully support extension op in serialization/deserialization. (#130851 ) Summary: Finishing up the mechanism to "register" certain types of operators to a registry so that the serializer can handle them correctly. This is expected to be firstly used by executorch. Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_export_with_extension_op_serialization Differential Revision: D59825148 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130851 Approved by: https://github.com/angelayi	2024-07-18 16:47:53 +00:00
angelayi	6c2c8ee15b	[export] Remove preserved ops from decomp list (#130970 ) Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1466016147369925/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/130970 Approved by: https://github.com/bdhirsh	2024-07-18 05:15:22 +00:00
Pian Pawakapan	d96c80649f	[export] constants & non-persistent buffers for training IR (#130864 ) Summary: Uses original ExportedProgram constants and graph signature to inform decompositions, so that constant tensors and non-persistent buffers are respected for training IR. Removes 7 test failures for training IR. Test Plan: test_export Differential Revision: D59820909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130864 Approved by: https://github.com/angelayi	2024-07-17 18:27:53 +00:00
Pian Pawakapan	18b7633bfb	[export] fix kwargs in run_decompositions() for training IR (#130553 ) Re-exporting GraphModule expects all inputs to be in args, though not in pytree-flattened format. This avoids failing when we run with a fx.Interpreter subclass in [AOTAutograd tracing](`973037be6a/torch/_functorch/_aot_autograd/traced_function_transforms.py (L760-L762)`). Removes 7 test failures for training IR export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130553 Approved by: https://github.com/zhxchen17, https://github.com/ydwu4	2024-07-11 22:53:18 +00:00
Zhengxu Chen	726a287271	[export] Expand verifier to be multiple on ExportedProgram (#130364 ) Summary: This diff updates the ExportedProgram class in PyTorch to allow for multiple verifiers to be attached to it. This is done by adding a new field to the ExportedProgram schema called "verifiers" which is a list of strings representing the names of the verifiers to be attached to the program. The verifiers are loaded using the "load_verifier" function which is defined in the "torch._export.serde.serialize" module. The "exported_program.dialect" field is also deprecated in favor of the "verifiers" field. Test Plan: CI Differential Revision: D59408546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130364 Approved by: https://github.com/angelayi, https://github.com/ydwu4	2024-07-11 20:34:49 +00:00
Pian Pawakapan	1b3b4c2fb9	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) (#130380 ) original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train) Summary: This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2s0 w = z.repeat(y.shape[0]) # 2s0s1 _w = w.shape[0] s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Test Plan: contbuild & OSS CI, see `940e4477ab` Original Phabricator Test Plan: Imported from GitHub, without a `Test Plan:` line. Differential Revision: D59543603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380 Approved by: https://github.com/izaitsevfb	2024-07-10 19:23:37 +00:00
PyTorch MergeBot	9c9744c3ac	Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599 )" This reverts commit `940e4477ab`. Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))	2024-07-09 21:03:49 +00:00
Pian Pawakapan	940e4477ab	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2s0 w = z.repeat(y.shape[0]) # 2s0s1 _w = w.shape[0] # something with _w ... # turns into -> s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) # turns into torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599 Approved by: https://github.com/ezyang	2024-07-07 20:10:14 +00:00
PyTorch MergeBot	963f430d13	Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599 )" This reverts commit `0267b2ddcb`. Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a landrace and fails inductor/test_cudagraph_trees in trunk `0267b2ddcb` ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2211690518))	2024-07-06 07:20:05 +00:00
Pian Pawakapan	0267b2ddcb	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2s0 w = z.repeat(y.shape[0]) # 2s0s1 _w = w.shape[0] # something with _w ... # turns into -> s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) # turns into torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599 Approved by: https://github.com/ezyang	2024-07-06 03:44:49 +00:00
Tugsbayasgalan Manlaibaatar	dabaebd339	Make run_decomp work (#129249 ) In this PR, we implement the first version of training_ir.run_decomp functionality. Since we don't return the modified buffers as extra output in training IR, our previous strategy of reusing graph signature won't work. In fact, this run_decomp is more similar to retracing. So i reuse some of export steps here. After this PR: export_for_training().run_decomp({}, _preserve_ops=[all 183 ops]) == export_for_predispatch() - autograd_manipulating_ops. Differential Revision: [D59069090](https://our.internmc.facebook.com/intern/diff/D59069090) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129249 Approved by: https://github.com/zhxchen17 ghstack dependencies: #128077, #129092	2024-06-27 19:16:07 +00:00
Tugsbayasgalan Manlaibaatar	90f6043368	Don't decompose functional composite ops in export inference IR (#128077 ) Recently we decided to split export IR into two different IRs (training vs inference). In the inference IR, one major change we decided to introduce was we wanted to keep the composite ops that user specified in the IR. This PR does that by overriding the CompositeImplicitAutograd decomp in export inference path. Differential Revision: [D58701607](https://our.internmc.facebook.com/intern/diff/D58701607) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128077 Approved by: https://github.com/bdhirsh	2024-06-26 23:07:55 +00:00

1 2 3

150 Commits