pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Shangdi Yu	46dd226702	Fakify torchbind objects in compile_fx and add tests for SigridTransformsInstanceTorchBind (#149529 ) Summary: We need to properly fakify torchbind objects, including the ones in graph module attributes, so the registered fake implementation works properly. - _fakify_script_objects in `compile_fx` - Allow fake torchbind objects in `torchbind_constants` Remove `node.meta["unbacked_bindings"]` for `aot_compile` in `compile_fx`. Otherwise `ShapeProp` will fail when trying to resolve the `unbacked_bindings` of `with_effect` tokens. Update `sigrid_transforms_test` to use the latest `torch._inductor.aot_compile` API. Add a test for `Fakify torchbind objects in compile_fx and add tests for SigridTransformsInstanceTorchBind` in `e2e_test`. Test Plan: ``` buck run //caffe2/torch/fb/sparsenn:sigrid_test -- -r test_transform_torch_bind buck run //sigmoid/inference/test:e2e_test_cpu -- -r SigridTransforms buck2 run mode/dev-nosan sigmoid/inference/ts_migration:pt2i_readiness_main -- --model_id 545017754 --test_suite ads_all --mode test_preproc ``` Differential Revision: D70013257 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149529 Approved by: https://github.com/angelayi	2025-03-21 18:58:28 +00:00
angelayi	bf34e228c5	[export] Beef up guard_added logs (#149465 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149465 Approved by: https://github.com/pianpwk	2025-03-20 23:02:07 +00:00
Avik Chaudhuri	6237495fcf	torch.Size input (#149414 ) Summary: Support for `torch.Size` inputs was patchy before because `unflatten_fn` for this type returned a tuple. This PR cleans this up. Fixes #149158 Test Plan: added test Differential Revision: D71403635 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149414 Approved by: https://github.com/yushangdi	2025-03-20 16:23:13 +00:00
Tugsbayasgalan Manlaibaatar	3b7bd6c63d	Fix dynamic shapes reordering bug (#149528 ) When we create constraints, we look at the ordering of kwargs according to the model signature. But when we trace, we use the ordering based on how the user passes in their kwargs. As a result, constraints and dynamic shapes end up in a different order, causing issues when they have different dynamic tensor specs. Differential Revision: [D71478578](https://our.internmc.facebook.com/intern/diff/D71478578) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149528 Approved by: https://github.com/ydwu4	2025-03-20 01:57:44 +00:00
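The ordering mismatch described in #149528 can be reproduced without PyTorch. The following is a minimal sketch, not the code from the PR: `forward` and `reorder_kwargs_like_signature` are hypothetical names used only to show how signature order and call-site order can differ, and how one side can be normalized to the other.

```python
import inspect


def forward(x, *, a=None, b=None):
    # Stand-in for a model's forward(); the signature declares a before b.
    return x, a, b


def reorder_kwargs_like_signature(fn, kwargs):
    """Return kwargs re-ordered to follow the parameter order of fn's signature."""
    order = list(inspect.signature(fn).parameters)
    return {name: kwargs[name] for name in order if name in kwargs}


# The user passes b before a, so the call-site order differs from the signature.
user_kwargs = {"b": 2, "a": 1}
print(list(user_kwargs))                                          # ['b', 'a']
print(list(reorder_kwargs_like_signature(forward, user_kwargs)))  # ['a', 'b']
```

If one component walks the kwargs in signature order and another in call-site order, per-argument specs get paired with the wrong tensors unless both agree on a single ordering.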
Yanan Cao (PyTorch)	fae79e91a0	Remove torch.export.export_for_inference (#149078 ) Summary: Remove torch.export.export_for_inference, it is redundant and can always be replaced with torch.export.export_for_training() + run_decompositions() Test Plan: unit tests Differential Revision: D71069057 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149078 Approved by: https://github.com/tugsbayasgalan	2025-03-19 19:57:18 +00:00
Pian Pawakapan	96828a2155	[export] refactor DimHints for type errors (#149424 ) Differential Revision: D71414367 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149424 Approved by: https://github.com/justinchuby, https://github.com/avikchaudhuri	2025-03-19 18:51:07 +00:00
Avik Chaudhuri	20874a1f46	debug ival swap (#149206 ) Summary: Recall that we use "ivals" to track intermediate values of mutations during unflattening. Previously, for each such intermediate value, we would create a hidden shared attribute that would be updated / read by respective submodules. Unfortunately this scheme doesn't work when some but not all of those submodules are swapped out. This is because the swapped in submodules have no knowledge of these hidden attributes. Thus the submodules that are not swapped out end up reading / updating dangling state. This PR does away with these hidden attributes. Instead, we directly read the underlying buffer or placeholder that was updated, and update those underlying buffers and placeholders in place. This makes the graphs look much closer to their eager origins. Test Plan: added some tests, ensured existing tests pass Differential Revision: D71203469 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149206 Approved by: https://github.com/tugsbayasgalan	2025-03-19 03:43:30 +00:00
angelayi	01a57981aa	[export] Add TracingContext (#149294 ) TracingContext is added to all tracing locations -- in torch.export this is where we call make_fx (for training IR) and aot_export_module (for inference IR), and in run_decompositions where we call aot_export_module Differential Revision: [D71298927](https://our.internmc.facebook.com/intern/diff/D71298927) Pull Request resolved: https://github.com/pytorch/pytorch/pull/149294 Approved by: https://github.com/ydwu4	2025-03-19 03:11:08 +00:00
angelayi	3b48c72141	[export] Minor refactor to trace.py (#149240 ) Minor refactor to trace.py * Removed `_strict_export_lower_to_aten_ir` in favor of just `_strict_export` and `_non_strict_export` * Matched the APIs of `_strict_export` and `_non_strict_export` * Instead of a `lower_to_aten_callback` which is a callable, or `dispatch_tracing_mode`, both functions take in a `_to_aten_func` which can be either `_export_to_aten_ir_make_fx` or `_export_to_aten_ir`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149240 Approved by: https://github.com/pianpwk	2025-03-18 21:40:30 +00:00
Yanan Cao (PyTorch)	a16ada41b9	Fix outdated docstring of torch.export.export regarding strict flag (#149077 ) Summary: Fix outdated docstring of torch.export.export regarding strict flag Test Plan: None, doc only change Differential Revision: D71068215 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149077 Approved by: https://github.com/zhxchen17	2025-03-17 22:29:20 +00:00
Yanan Cao (PyTorch)	ab45aaca97	Set non-strict export as default mode (#148790 ) Summary: - Flip the default value of strict argument in torch.export.export from True to False - Update test infra to cope with the change, some of them made the assumption of strict mode as default - Disabled some tests that fail in non-strict mode Test Plan: Sandcastle Differential Revision: D70228628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148790 Approved by: https://github.com/angelayi	2025-03-12 21:10:58 +00:00
Aditya Tiwari	bb9c426024	Typo Errors fixed in multiple files (#148262 ) # Fix typo errors across PyTorch codebase This PR fixes various spelling errors throughout the PyTorch codebase to improve documentation quality and code readability. ## Changes Made ### Documentation Fixes - Changed "seperate" to "separate" in multiple files: - `setup.py`: Build system documentation - `torch/_library/triton.py`: AOT compilation comments - `torch/csrc/dynamo/compiled_autograd.h`: Node compilation documentation - `torch/export/_unlift.py`: Pass population comments - `torch/export/exported_program.py`: Decomposition table notes ### Code Comments and Error Messages - Changed "occured" to "occurred" in: - `test/mobile/test_lite_script_module.py`: Exception handling comments - `torch/export/_draft_export.py`: Error message text - `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp`: MAGMA bug comment - `torch/csrc/utils/python_numbers.h`: Overflow handling comment - `torch/csrc/jit/OVERVIEW.md`: Graph compilation documentation - `torch/_dynamo/symbolic_convert.py`: Error explanation ### API Documentation - Changed "fullfill" to "fulfill" in `torch/distributed/checkpoint/state_dict_loader.py` - Changed "accross" to "across" in: - `torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp` - `torch/distributed/distributed_c10d.py` ## Motivation These changes improve code readability and maintain consistent spelling throughout the codebase. No functional changes were made; this is purely a documentation and comment improvement PR. ## Test Plan No testing required as these changes only affect comments and documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148262 Approved by: https://github.com/janeyx99 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2025-03-09 12:21:40 +00:00
Pian Pawakapan	c677f3251f	[export] don't use unbacked_renamings in export (#147574 ) Plan: avoid the use of unbacked renamings, and introduce a pass run in `_produce_aten_artifact` that recomputes unbacked bindings. Decided to do this because we don't serialize unbacked renamings (or any ShapeEnv state), so this used to compose poorly with de/serialization. This hopefully establishes the invariant that the unbacked binding keys are always in sync with the example values (i.e. same indices, and removed if the symbol is replaced / specialized). For de/serialization, we don't store unbacked bindings, and just rerun the pass. Involved a refactor of compute_unbacked_bindings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147574 Approved by: https://github.com/avikchaudhuri	2025-03-04 21:43:49 +00:00
Angela Yi	60205b0eb2	[export] Fix logging so that it doesn't result in max recursion error (#148231 ) Test Plan: buck2 run mode/dev-nosan sigmoid/inference/ts_migration:pt2i_readiness_main -- --model_id=487493491 --test_suite ads_all --mode test_full_model Produces https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmp2wsjQH/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100 Differential Revision: D70416613 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148231 Approved by: https://github.com/yiming0416	2025-03-04 20:47:25 +00:00
Angela Yi	6e0b09728a	[export] Remove report from draft-export output (#147558 ) Summary: This matches the export API. To print the report, people can just do `print(ep._report)`. This information is also displayed in the terminal after the draft_export call. Test Plan: CI Reviewed By: SherlockNoMad Differential Revision: D69689154 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147558 Approved by: https://github.com/pianpwk	2025-02-22 00:54:29 +00:00
Avik Chaudhuri	698f6f9fae	specify only some dimensions in shapes collection (#147534 ) Differential Revision: [D69936316](https://our.internmc.facebook.com/intern/diff/D69936316/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147534 Approved by: https://github.com/bobrenjc93	2025-02-21 22:02:42 +00:00
Zhengxu Chen	fdb1305ace	reland "[sigmoid] Test OSS model runner with test_export.py" (#147535 ) Summary: There are ~260 tests for all the corner cases of export in test_export.py. This PR utilizes them to test sigmoid in the OSS setting. Test Plan: buck test mode/opt caffe2/test:test_export -- -r _sigmoid Differential Revision: D69937387 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147535 Approved by: https://github.com/yiming0416	2025-02-20 23:45:13 +00:00
Aaron Orenstein	db4ce78d46	PEP585: More UP006 fixes (#146392 ) This should be the final PR before we can enable RUFF UP006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392 Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007	2025-02-20 06:18:13 +00:00
Gregory Comer	f63db6255f	Re-land exclude upsample_bilinear2d.vec and nearest2d.vec from default export decomposition table (#147153 ) Note: This is a re-land of https://github.com/pytorch/pytorch/pull/141791, which I reverted due to breaking some Meta-internal tests - an internal ET delegate did not handle the non-decomposed upsample_nearest2d, and it was not caught in CI. I've resolved that issue and should be ready to safely re-land. Summary: As upsample_bilinear2d.vec and upsample_nearest2d.vec are core ATen ops, they should not be decomposed by default in the export path. Because the operators have CompositeImplicitAutograd dispatch, their decomposition is registered by default. This change adds an override list for CIA decompositions being registered in the default decomp table. In the long-term, we likely will want to exclude decompositions for all core-tagged CIA ops, but this will require all consumers to be ready to handle the remaining two ops, avg_pool1d, and adaptive_avg_pool1d. Until they are ready, I believe an explicit override list is the safest option. Additionally, I've also removed the ExecuTorch XNNPACK delegate ConvertToUpsampleBilinear2d pass, as the pass breaks (and is not needed), given that the op is not decomposed. The purpose of this pass was originally to pattern match the decomposition and recompose it, but this is no longer necessary. Test Plan: Added a new test (`test_default_decomposition_core_cia_ops`) in test_export.py to verify that upsample_bilinear2d.vec (and in the future, other core-tagged CIA ops) are not decomposed by default. Also, I manually validated end to end with ExecuTorch that the op is not decomposed in to_edge (see N6238522). ``` buck test //caffe2/test:test_export -- test_default_decomposition_core_cia_ops ``` Differential Revision: D69625112 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147153 Approved by: https://github.com/manuelcandales	2025-02-19 23:03:29 +00:00
Angela Yi	2c3680ce38	[apf] Fix input adapter (#147238 ) Summary: Add support for inputs that no longer exist in `input_fields` but are not actually used by the original program. In this case, we just give each such input a dummy value based on the node's metadata. Test Plan: Verified for S488841 Differential Revision: D69328093 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147238 Approved by: https://github.com/pianpwk	2025-02-19 04:49:58 +00:00
Aaron Gokaslan	e738f7ba23	[BE]: Enable ruff rule SIM113 (#147290 ) Lint rule that tells the user to avoid keeping track of their own counter and to use the builtin enumerate when possible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147290 Approved by: https://github.com/jansel	2025-02-16 22:41:16 +00:00
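For readers unfamiliar with SIM113, here is a small illustration of the pattern the rule targets and the `enumerate` form it suggests. The variable names are made up for the example.

```python
names = ["x", "y", "z"]

# Pattern flagged by SIM113: a counter maintained by hand alongside the loop.
idx = 0
manual = []
for name in names:
    manual.append((idx, name))
    idx += 1

# Suggested form: let enumerate() supply the index.
with_enumerate = []
for i, name in enumerate(names):
    with_enumerate.append((i, name))

print(manual == with_enumerate)  # True
```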
Aaron Gokaslan	6344ca1dd4	[BE][Ez]: Apply FURB188: use str remove(pre|suf)fix (#146997 ) Since we are on 3.9, we can use this nice str builtin which is more readable and more efficient. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146997 Approved by: https://github.com/XuehaiPan, https://github.com/cyyever, https://github.com/jansel	2025-02-14 03:38:07 +00:00
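The rewrite that FURB188 encourages looks roughly like the sketch below. `strip_affixes_old` and `strip_affixes_new` are hypothetical helpers written for this illustration; `str.removeprefix` and `str.removesuffix` are the Python 3.9+ builtins the commit refers to.

```python
def strip_affixes_old(name: str, prefix: str, suffix: str) -> str:
    # Manual slicing: easy to get wrong (note the guard against an empty suffix,
    # since name[:-0] would return an empty string).
    if name.startswith(prefix):
        name = name[len(prefix):]
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def strip_affixes_new(name: str, prefix: str, suffix: str) -> str:
    # Python 3.9+: both methods return the string unchanged when there is no match.
    return name.removeprefix(prefix).removesuffix(suffix)


print(strip_affixes_old("module.layer.weight", "module.", ".weight"))  # layer
print(strip_affixes_new("module.layer.weight", "module.", ".weight"))  # layer
```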
angelayi	67cbbb29e0	[export] Dedup expression_created logs (#146859 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/146859 Approved by: https://github.com/pianpwk ghstack dependencies: #146532, #146533, #146534, #146858	2025-02-13 00:21:34 +00:00
angelayi	59bc5d0d71	[tlparse] Add stacktrace filter utility (#146858 ) Added a utility function for capturing the user stack and framework stacktrace. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146858 Approved by: https://github.com/bobrenjc93 ghstack dependencies: #146532, #146533, #146534	2025-02-13 00:21:34 +00:00
angelayi	43f5566c92	[export] Add additional tlparse logging (#146534 ) Added some additional logging so we can also run tlparse on generic export errors Pull Request resolved: https://github.com/pytorch/pytorch/pull/146534 Approved by: https://github.com/pianpwk ghstack dependencies: #146532, #146533	2025-02-13 00:21:34 +00:00
angelayi	b4bdbce1ac	[export] Use custom stream logger in draft-export (#146533 ) Using a custom logger so that we can store our own buffer to dedup logs that look the same. The schema for deduping is as follows: ```python if key == "missing_fake_kernel": return hash((key, data["op"])) # Same ops get deduped elif key == "mismatched_fake_kernel": return hash((key, data["op"], data["reason"])) # Same op and reason for errors get deduped elif key == "propagate_real_tensors": return hash((key, json.dumps(data["stack"]))) # Guards appearing on the same stacktrace get deduped elif key == "create_unbacked_symbol": return hash((key, json.dumps(data["stack"]))) # Unbacked symbols appearing on the same stacktrace get deduped ``` Notably, guards appearing on the same stacktrace get deduped. This is because there are some cases in PT2I models where a piece of code which creates a new unbacked symint + runs into a DDE gets called 800 times, causing 800 new symints to be created, and 800 propagate_real_tensor errors that are all the same expression. This is hard to look at, so we should just deduplicate this. The con of this is that if there exists multiple DDE on the same stacktrace, we will only show the first issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146533 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #146532	2025-02-13 00:21:34 +00:00
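The dedup schema quoted in #146533 can be exercised in isolation. The sketch below wraps that schema in a hypothetical `dedup_key` function and adds a hypothetical `dedup` helper around it; the record layout, the sample data, and the choice to never dedup unknown keys are assumptions made for illustration, not the draft-export logger itself.

```python
import json


def dedup_key(key, data):
    """Hash used to decide whether two log records count as 'the same'."""
    if key == "missing_fake_kernel":
        return hash((key, data["op"]))
    if key == "mismatched_fake_kernel":
        return hash((key, data["op"], data["reason"]))
    if key in ("propagate_real_tensors", "create_unbacked_symbol"):
        return hash((key, json.dumps(data["stack"])))
    return None  # assumption: any other record type is never deduplicated


def dedup(records):
    """Keep the first record for each dedup key, preserving order."""
    seen = set()
    kept = []
    for key, data in records:
        k = dedup_key(key, data)
        if k is not None:
            if k in seen:
                continue
            seen.add(k)
        kept.append((key, data))
    return kept


records = [
    ("propagate_real_tensors", {"stack": ["model.py:10"], "expr": "u0 < 0"}),
    ("propagate_real_tensors", {"stack": ["model.py:10"], "expr": "u1 < 0"}),
    ("missing_fake_kernel", {"op": "mylib.foo"}),
]
for key, data in dedup(records):
    print(key, data)
```

The second `propagate_real_tensors` record is dropped even though its expression differs, which is the trade-off the commit message mentions: only the first issue on a given stack trace is shown.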
Tugsbayasgalan Manlaibaatar	ebd992724f	Implement serializable getattr support for tensor subclasses (#145772 ) builtins.getattr is not serializable, so we replace it with a custom op that has more refined schema. Differential Revision: [D68899421](https://our.internmc.facebook.com/intern/diff/D68899421) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145772 Approved by: https://github.com/bdhirsh	2025-02-11 19:05:14 +00:00
PyTorch MergeBot	fe94ece375	Revert "Exclude upsample_bilinear2d.vec from default core ATen decomposition table (#141791 )" This reverts commit `3d604b17d9`. Reverted https://github.com/pytorch/pytorch/pull/141791 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/141791#issuecomment-2649717140))	2025-02-11 03:17:59 +00:00
PyTorch MergeBot	f38f1dcd82	Revert "move and fix logic to update unbacked bindings (#146115 )" This reverts commit `103c8b44bc`. Reverted https://github.com/pytorch/pytorch/pull/146115 on behalf of https://github.com/huydhn due to This change has been reverted internally D69129334 but the OSS revert failed https://github.com/pytorch/pytorch/pull/146437 ([comment](https://github.com/pytorch/pytorch/pull/146115#issuecomment-2649610877))	2025-02-11 01:26:36 +00:00
Gregory Comer	3d604b17d9	Exclude upsample_bilinear2d.vec from default core ATen decomposition table (#141791 ) As upsample_bilinear2d.vec is a core ATen op, it should not be decomposed by default in the export path. Because the operator has CompositeImplicitAutograd dispatch, its decomposition is registered by default. This change adds an override list for CIA decompositions being registered in the default decomp table. In the long-term, we likely will want to exclude decompositions for all core-tagged CIA ops, but this will require all consumers to be ready to handle the remaining three ops: upsample_nearest2d.vec, avg_pool1d, and adaptive_avg_pool1d. Until they are ready, I believe an explicit override list is the safest option. Additionally, I've also removed the ExecuTorch XNNPACK delegate ConvertToUpsampleBilinear2d pass, as the pass breaks (and is not needed), given that the op is not decomposed. The purpose of this pass was originally to pattern match the decomposition and un-decomposite it, but this is no longer necessary. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141791 Approved by: https://github.com/tugsbayasgalan, https://github.com/digantdesai	2025-02-10 19:30:19 +00:00
Avik Chaudhuri	103c8b44bc	move and fix logic to update unbacked bindings (#146115 ) Summary: Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here). This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde. Test Plan: added test D68880766 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115 Approved by: https://github.com/pianpwk	2025-02-07 22:41:19 +00:00
Pian Pawakapan	1c872803cb	[export][dynamic shapes] log provenance for locals & symbols for non-strict (#143378 ) Adds `dtrace_structured` logging so when a guard or real-tensor propagation assert is added, the relevant user code with local symbolic values & free symbols are logged, e.g. from the draft export CLI report (soon to be added to tlparse): 1. Guard added: ``` 1. Constraint violation error. The specified input dynamic_shapes spec was found to be incorrect during tracing. Specifically, this guard was added: Eq(s0, 3), where {'s0': "L['args'][0][0].size()[0]"}. This occured at the following stacktrace: File /data/users/pianpwk/pytorch/test/export/test_draft_export.py, lineno 267, in forward: assert a.shape[0] == 3 Locals: a: Tensor(shape: torch.Size([s0, 3]), stride: (3, 1), storage_offset: 0) Symbols: s0: L['args'][0][0].size()[0] ... ``` 2. Real tensor propagation: ``` 1. Data dependent error. When exporting, we were unable to evaluate the value of `u2 < 0`. This was encountered 8 times. This occurred at the following stacktrace: File /data/users/pianpwk/pytorch/test/export/test_draft_export.py, lineno 217, in forward: return res[:c_item] Locals: res: Tensor(shape: torch.Size([u0, u1]), stride: (Max(1, u1), 1), storage_offset: 0) c_item: u2 ... ``` Currently the values are extracted from the traceback, and are only valid for non-strict; strict seems to require storing & fakifying locals in the frames reporting by `TracingContext`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143378 Approved by: https://github.com/avikchaudhuri, https://github.com/bobrenjc93	2025-02-07 05:46:05 +00:00
Tugsbayasgalan Manlaibaatar	d2a2b9f8a7	Fix constants with non-functional operators (#145593 ) Previously, in non-strict path, we always error when trying to inplace update a constant tensor because those constant tensors are not actually wrapped by functional tensors. This is correct behaviour in torch.compile, because dynamo makes all constant tensors into buffers and AOTDispatcher just lifts them and wraps them in functional tensors. However, in non-strict, there is no such step that registers constants as buffers so AOTDispatcher panics when it sees these dangling constant tensors when functionalizing. Due to recent change in the IR, this is no longer an issue in non-strict path because we don't call AOTDispatcher at training IR level, but now it is a problem for both strict and non-strict when we lower to inference. (lowering to inference is very similar to non-strict tracing) As a result, we have at least one external (https://github.com/pytorch/pytorch/issues/141336) and internal issues reported due to this difference. To fix this, there are two ways: 1. Make functionalization be aware of constant tensors and map them to functional tensors on the fly. This makes functionalization invariant uglier and could potentially open up a gate for more nasty bugs. 2. Special handle this in export. This seems more aligned with what dynamo does today so I think we should do it this way. I think the current state could benefit from more refactors to make run_decompositions more similar to strict export (because both of them now handle this constant registering logic), but it is a bit complicated to do it now because the strict export version of this logic is also not complete (it doesn't take into account the export graph renaming pass, etc.). I will follow up with more refactors after this PR (T213466691) to unblock users faster. For future reference: Why are we not doing "turning constants into non-persistent buffers and never de-register"?
The reason is that some internal models rely on module.to working reliably to move params/buffers to the correct device. As a result, buffers are moved while constants are not. In the composability meeting, we agreed that export won't do device agnostic tracing going forward (it will provide a way to specify FakeTensor in CPU that can be configured to be run on GPU), so after that is done, we can always turn constants into non-persistent buffers which will simplify export's constant handling. Differential Revision: [D68610739](https://our.internmc.facebook.com/intern/diff/D68610739) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145593 Approved by: https://github.com/avikchaudhuri	2025-02-05 17:44:19 +00:00
Angela Yi	eb832b7bcc	[export] Fix draft-export logging (#146106 ) Summary: Fix issue where the lazyTraceHandler does not exist Test Plan: CI Differential Revision: D68928070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146106 Approved by: https://github.com/yiming0416	2025-02-05 05:49:22 +00:00
PyTorch MergeBot	f242da41c7	Revert "move and fix logic to update unbacked bindings (#146115 )" This reverts commit `0144613e6f`. Reverted https://github.com/pytorch/pytorch/pull/146115 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/146115#issuecomment-2635695958))	2025-02-05 04:51:39 +00:00
Angela Yi	6e03f4f90e	[export] Include metadata in FlatArgsAdapter (#146107 ) Summary: With https://github.com/pytorch/pytorch/pull/145956, which introduces storing a list of namedtuple field names when serializing, we now want to expose this list to the args adapter so that APS can utilize this information and remove extraneous inputs. Test Plan: No-op Differential Revision: D68928416 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146107 Approved by: https://github.com/pianpwk	2025-02-05 00:29:58 +00:00
angelayi	0c37c332da	[export] Additionally save pytree namedtuple field names (#145956 ) If a user passes in a namedtuple as an input, currently the input TreeSpec looks like: `TreeSpec(type=namedtuple, context="class_fqn", children_spec=[, ])` The user then saves the program containing this input TreeSpec. But what happens if they load it in a new environment where `class_fqn` now contains an additional field? This means that the exported program is now expected to take in another input. But since those fields were not used in the original program, users should be able to just drop those additional fields and the program will run successfully. This is needed/used in APS where they use unflattener's adapter to adapt the inputs based on the previously saved treespecs. There are a couple of [solutions](https://docs.google.com/document/d/1V4ZSdy-8PUISWc8RqvGu3DU01BVegJhHHPWqa1Io7Eg/edit?tab=t.0) for how we can address this, but eventually we settled on saving a side table mapping namedtuple types to their list of field names, which can then be accessed by the adapter. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145956 Approved by: https://github.com/zhxchen17	2025-02-04 04:42:30 +00:00
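The side-table idea from #145956 can be pictured with plain `collections.namedtuple`. This is a sketch under assumptions: `saved_fields`, the class name `my_pkg.Inputs`, and the `adapt` function are hypothetical and do not reflect the actual serialization format or the unflattener's adapter API.

```python
from collections import namedtuple

# Side table recorded when the program was saved: class FQN -> field names.
# "my_pkg.Inputs" is a made-up fully qualified name.
saved_fields = {"my_pkg.Inputs": ["x", "y"]}

# In the new environment, the same class has gained an extra field.
Inputs = namedtuple("Inputs", ["x", "y", "extra"])


def adapt(value, class_fqn, table):
    """Keep only the fields that existed at save time, in the saved order."""
    return tuple(getattr(value, name) for name in table[class_fqn])


print(adapt(Inputs(x=1, y=2, extra=3), "my_pkg.Inputs", saved_fields))  # (1, 2)
```

Because the lookup is by field name rather than by position, the adapted inputs stay correct even if the new field was inserted in the middle of the namedtuple.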
Tugsbayasgalan Manlaibaatar	041e08f9dc	Add buffers to parameterization rule (#145991 ) Differential Revision: [D68959513](https://our.internmc.facebook.com/intern/diff/D68959513) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145991 Approved by: https://github.com/bdhirsh	2025-02-03 16:49:03 +00:00
Avik Chaudhuri	0144613e6f	move and fix logic to update unbacked bindings (#146115 ) Summary: Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here). This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde. Test Plan: added test Differential Revision: D68880766 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115 Approved by: https://github.com/pianpwk	2025-02-02 10:43:55 +00:00
Avik Chaudhuri	cde5ddfd14	fix internal error with reorder submodules (#146181 ) Test Plan: hard to isolate as small repro Differential Revision: D68963033 Pull Request resolved: https://github.com/pytorch/pytorch/pull/146181 Approved by: https://github.com/angelayi	2025-02-01 00:30:42 +00:00
angelayi	1c9014a135	[export] Add tlparse to draft-export (#145810 ) Dependent on https://github.com/ezyang/tlparse/pull/87/files Pull Request resolved: https://github.com/pytorch/pytorch/pull/145810 Approved by: https://github.com/pianpwk	2025-01-29 19:26:00 +00:00
Pian Pawakapan	cbc4094298	[draft_export] add LOC for data-dep error logging (#145443 ) Summary: maybe this is too much info, but it's difficult to go through old draft export reports where the stack trace is out of sync with the current codebase. Data-dependent errors now look like: ``` 2. Data dependent error. When exporting, we were unable to evaluate the value of `u306`. This occurred at the following stacktrace: File /data/users/pianpwk/fbsource/buck-out/v2/gen/fbcode/78204cab86e8a0fb/sigmoid/inference/ts_migration/__pt2i_readiness_main__/pt2i_readiness_main#link-tree/caffe2/torch/fb/training_toolkit/common/proxy_module_thrift/embedding_bag_proxy.py, lineno 109, in _forward_impl: `if offsets[-1] > len(input):` As a result, it was specialized to evaluate to `261`, and asserts were inserted into the graph. Please add `torch._check(...)` to the original code to assert this data-dependent assumption. Please refer to https://docs.google.com/document/d/1kZ_BbB3JnoLbUZleDT6635dHs88ZVYId8jT-yTFgf3A/edit#heading=h.boi2xurpqa0o for more details. ``` This would be even more helpful for reports on torch-packaged models, but that requires some more work on PT2I-specific stack trace processing Test Plan: . Differential Revision: D68534017 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145443 Approved by: https://github.com/angelayi	2025-01-28 18:55:16 +00:00
Randolf Scholz	835e770bad	Use `typing.IO[bytes]` instead of `io.BytesIO` in annotations (#144994 ) Fixes #144976 Using approach ① `IO[bytes]`, but could also try with a protocol. ## Notes: - moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike` - Use `FileLike` annotation where it makes sense - made sure those functions also support `os.PathLike` - Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate. - Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`) - needed to make `torch.serialization._opener` generic to avoid LSP violations. - skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly... Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-01-27 18:08:07 +00:00
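To show why the broader annotation matters, here is a small standalone sketch. `read_header` is a hypothetical function, not part of `torch.serialization`; it only demonstrates that an `IO[bytes]` parameter accepts both an in-memory buffer and a real file opened in binary mode, whereas an `io.BytesIO` annotation would reject the latter under a type checker.

```python
import io
import tempfile
from typing import IO


def read_header(f: IO[bytes], n: int = 4) -> bytes:
    """Read the first n bytes from any binary file-like object."""
    return f.read(n)


# An in-memory buffer satisfies the annotation...
print(read_header(io.BytesIO(b"abcdef")))  # b'abcd'

# ...and so does a real file opened in binary mode.
with tempfile.TemporaryFile() as tmp:
    tmp.write(b"abcdef")
    tmp.seek(0)
    print(read_header(tmp))  # b'abcd'
```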
Avik Chaudhuri	42b8e233d9	serde unbacked bindings (#144894 ) Adds unbacked bindings during deserialization. These are carried by a node's metadata, and map pending fresh unbacked symbols to paths to such symbols inside the corresponding example value carried by the node's metadata. Since it is awkward to serialize paths, we only serialize the names of these symbols and reconstruct the paths on deserialization, using a shape env util. We also need to bump counters for unbacked symbols here, because the shape env util we use to create these symbols (when deserializing example values) doesn't do so, and not doing so makes later passes (like `run_decompositions`) crash because new unbacked symbols don't get new names. This is enough for non-strict. For strict, the unbacked bindings and example values in node metadata can get out of sync, because of running AOTAutograd as an additional step after Dynamo. So we have to sync those back. Differential Revision: [D68232274](https://our.internmc.facebook.com/intern/diff/D68232274/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144894 Approved by: https://github.com/pianpwk	2025-01-25 02:34:27 +00:00
Aaron Gokaslan	f3304571fc	[BE][Ez]: FURB148 - remove useless enumerate calls (#145619 ) Remove useless enumerate calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619 Approved by: https://github.com/drisspg	2025-01-24 23:37:15 +00:00
Pian Pawakapan	99367ecbed	[draft export] count how many times a data-dep error shows up (#145030 ) Summary: maybe this is helpful? Test Plan: draft_export Differential Revision: D68303934 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145030 Approved by: https://github.com/angelayi	2025-01-23 20:27:31 +00:00
Aaron Gokaslan	5ebca3015d	[BE]: Simplify set add with set update (#145152 ) Simplifies the set update slightly to be more readable and efficient. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145152 Approved by: https://github.com/XuehaiPan, https://github.com/albanD Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>	2025-01-23 20:18:13 +00:00
PyTorch MergeBot	6e53588789	Revert "[BE]: Simplify set add with set update (#145152 )" This reverts commit `0cb9b2284a`. Reverted https://github.com/pytorch/pytorch/pull/145152 on behalf of https://github.com/davidberard98 due to land race with https://github.com/pytorch/pytorch/pull/145165 broke lint ([comment](https://github.com/pytorch/pytorch/pull/145152#issuecomment-2608378172))	2025-01-22 22:14:26 +00:00
Aaron Gokaslan	0cb9b2284a	[BE]: Simplify set add with set update (#145152 ) Simplifies the set update slightly to be more readable and efficient. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145152 Approved by: https://github.com/XuehaiPan, https://github.com/albanD Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>	2025-01-22 21:31:13 +00:00
Zhengxu Chen	ac8ddf1150	[export][be] Clean up local imports from export [1/n] (#145287 ) Summary: as title Test Plan: CI Differential Revision: D68449844 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145287 Approved by: https://github.com/pianpwk	2025-01-22 19:09:17 +00:00
Aaron Orenstein	b6c5562c1f	PEP585 update - torch/export (#145165 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145165 Approved by: https://github.com/bobrenjc93	2025-01-19 20:56:55 +00:00
Zhengxu Chen	53256edff9	[export] Support module inputs for non strict mode. (#143925 ) Summary: Add experimental support for torch.nn.Module as input types. Before this change, we don't support module inputs but recently we saw some interesting use cases like gpt-fast https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L68 where we directly pass in a module input for different variants of the same models. Since we don't really care about non-param or non-buffer states in non strict mode, we don't care about those either and pretend they are like plain constants during tracing. We treat any module input like a nested container of tensor, and each time we will automatically register a pytree handler for these module types to flatten its state dict into a group of tensors. We will just inline any module method call during tracing like we did for `self` module in export_for_training. This will make input modules' behavior very similar to the training module in typical case, except that we don't record the inputs as parameter or buffers but rather just plain user inputs. Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_module_input Differential Revision: D67680827 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143925 Approved by: https://github.com/tugsbayasgalan	2025-01-16 17:30:36 +00:00
Pian Pawakapan	774f21a370	[export] handle buffer/input mutations for joint-graph (#144806 ) Summary: previous construction of GraphSignature output specs didn't consider buffer/user input mutations Test Plan: test_experimental Differential Revision: D68177409 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144806 Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri	2025-01-16 00:22:16 +00:00
Yidi Wu	c7dbee5106	[reland][export] don't decompose custom triton op when exporting (#144284 ) Summary: A reland of https://github.com/pytorch/pytorch/pull/142426. Copying the description over here: For torch.export (strict and non-strict), we don't do functional decomposition. Instead, we preserve the custom triton ops as custom ops. This is because we want the exported program to be high-level and serializable. The alternative: If we decompose the custom op to a functional hop and make it a node in exported program, we need to figure out ways of serializing the hop and its arguments, which can be triton.jited python functions and triton dtypes. This is undesirable because: - it can be tedious to maintain a layer that serializes the jited function (e.g. with a string) and dtypes. - changes to triton or the serialization logic for triton arguments can be BC breaking. - the exported program will expose the implementation detail (i.e. triton source code) for a specific backend (GPU) to users, which mixes levels of abstraction. Future plans: After this PR, in the short term, we expect users to have a separate aot_compile stage that compiles the exported program into a Cubin file on the same machine that users call export, which does autotuning and removes triton dependency and serve the model with Cubin. This guarantees that triton changes won't break BC. In the long term, we may export multiple cubins for the triton op directly. Test Plan: see new tests. Differential Revision: D67879685 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144284 Approved by: https://github.com/zou3519	2025-01-11 01:34:35 +00:00
angelayi	10ff6b8894	[export] Add pickle protocol (#142253 ) Fixes https://github.com/pytorch/pytorch/issues/142004 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142253 Approved by: https://github.com/avikchaudhuri	2025-01-10 19:49:07 +00:00
Xuehai Pan	dcc3cf7066	[BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415 ) The fixes are generated by: ```bash ruff check --fix --preview --unsafe-fixes --select=E226 . lintrunner -a --take "RUFF,PYFMT" --all-files ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-08 21:55:00 +00:00
Brian Muse	a5164a2b18	[BE] Clean up ExecuTorch Export Docstring (#141490 ) Summary: I noticed when looking at the docs for [`torch.export.load`](https://pytorch.org/docs/stable/_modules/torch/export.html#load) that it looked like there was a copy and paste error from the save command docstring since ep is not an actual parameter for load and it says "The exported program to save." This diff removes it from the docstring. Test Plan: Automated Testing Pull Request resolved: https://github.com/pytorch/pytorch/pull/141490 Approved by: https://github.com/JacobSzwejbka	2025-01-08 21:28:58 +00:00
Tugsbayasgalan Manlaibaatar	c68c38c673	Support getattr for tensor subclasses in pre-dispatch export via patching tensor.getattr (#143946 ) Previous discussion: https://github.com/pytorch/pytorch/pull/143671#issuecomment-2560112499 and https://github.com/pytorch/pytorch/pull/143671 Differential Revision: [D67693609](https://our.internmc.facebook.com/intern/diff/D67693609) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143946 Approved by: https://github.com/bdhirsh	2025-01-06 23:55:50 +00:00
bobrenjc93	edbda2fad8	remove allow-untyped-defs from torch/export/_remove_auto_functionalized_pass.py (#144230 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144230 Approved by: https://github.com/Skylion007	2025-01-06 22:23:19 +00:00
Pian Pawakapan	bba672e117	[docs/export] update dynamic_shapes docs (#142510 ) https://pytorch.org/docs/stable/export.html dynamic_shapes section formatting is messed up, fix & update documentation to be more user-friendly. Happy accepting nits :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142510 Approved by: https://github.com/yushangdi	2025-01-06 14:12:34 +00:00
bobrenjc93	64b197b603	remove allow-untyped-defs from export/_remove_auto_functionalized_pass.py (#144135 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144135 Approved by: https://github.com/Skylion007	2025-01-03 20:08:11 +00:00
Tom Ritchford	f1cbf4b1b5	Enable ruff's unused variable checking everywhere in pytorch (#136965 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136965 Approved by: https://github.com/cyyever, https://github.com/albanD	2024-12-22 02:33:11 +00:00
Avik Chaudhuri	51eacea8c4	graph module retracing without preserving MCS (#143676 ) Retracing while preserving module call signatures used to be a problem because graph modules don't have submodules at given paths. This led to a number of failing retraceability tests. By not trying to wrap modules with export tracepoints we can pass most of these tests; the only exception is where you do module swapping on retraced programs, which is still not possible. Differential Revision: [D67539304](https://our.internmc.facebook.com/intern/diff/D67539304/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143676 Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan ghstack dependencies: #143664	2024-12-21 07:57:43 +00:00
Avik Chaudhuri	bdeee82822	unflatten isinstance (#143664 ) When we unflatten, the submodules we generate (`InterpreterModule` or `InterpreterModuleDispatcher`) are not related by type to the original submodules `N`. This makes `isinstance(mod, N)` checks fail. Since we do not have the original types after export, the best we can do is expose a `type_name()` method that carries the original type name, which we do carry in `nn_module_stack` entries. Differential Revision: [D67526542](https://our.internmc.facebook.com/intern/diff/D67526542/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143664 Approved by: https://github.com/tugsbayasgalan	2024-12-21 01:07:10 +00:00
Tugsbayasgalan Manlaibaatar	0ce233b8ca	Support tensor subclass unwrapping (#141941 ) This PR adds support for export to unwrap/wrap subclasses AOT so that we can trace through subclass parameters. This will resolve the UX issue in torchao where users had to manually unwrap their subclasses before calling export. Differential Revision: [D67531057](https://our.internmc.facebook.com/intern/diff/D67531057) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141941 Approved by: https://github.com/bdhirsh	2024-12-21 00:29:31 +00:00
Yidi Wu	1e201422ed	[export] add is_exporting flag (#142425 ) We added an is_exporting flag, available as torch.compiler.is_exporting. This comes in handy when we need special logic at the user level or system level (e.g. higher up in the stack). In increasing scope: - `_is_fx_tracing` is set to True when we are under symbolic_trace or make_fx. - `is_exporting` is set to True when we're doing strict or non-strict export, which internally has a step that calls make_fx and sets _is_fx_tracing to True. - `is_compiling` is set to True when we're doing strict export, non-strict export, or torch.compile. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142425 Approved by: https://github.com/avikchaudhuri	2024-12-18 21:36:28 +00:00
Avik Chaudhuri	497f89ff83	fix dynamo nn module stack fqn (#142823 ) Dynamo can produce sources that have funny patterns in their `.name()` that break `nn_module_stack` fqns. Added a test that used to have `._modules` inside nn_module_stack fqns, now doesn't. (Unfortunately couldn't repro a case mentioned in the GH issue where `.slice(...)` is claimed to appear as well.) Fixes https://github.com/pytorch/pytorch/issues/141939 Differential Revision: [D67064189](https://our.internmc.facebook.com/intern/diff/D67064189/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142823 Approved by: https://github.com/pianpwk, https://github.com/zhxchen17	2024-12-12 07:02:13 +00:00
Avik Chaudhuri	db51308d9c	fix output node name (#142506 ) Fixes #142227 Differential Revision: [D67043283](https://our.internmc.facebook.com/intern/diff/D67043283/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142506 Approved by: https://github.com/ydwu4	2024-12-11 17:28:28 +00:00
Avik Chaudhuri	e3886fb13c	misc. fixes to unflatten (#142141 ) Combining several fixes to unflatten for bugs revealed by random graph testing. The fixes target two categories of bugs: 1. Some bugs show up as exponential blowups for largish system of nn modules. These are fixed by converting lists to sets, using caching, or otherwise rewriting to reuse computation more efficiently. 2. Other bugs were due to missing intermediate modules created when attributes such as submodules and buffers are accessed through longish paths before calling the corresponding intermediate modules, or missing attributes such as buffers and constants in submodules corresponding to multiple calls. Differential Revision: [D66659795](https://our.internmc.facebook.com/intern/diff/D66659795/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142141 Approved by: https://github.com/ydwu4	2024-12-10 03:45:13 +00:00
Shangdi Yu	bcddae14ec	Enhance "from_node" node meta to track source recursively (#142066 ) Summary: Change the "from_node" node meta format to be able to track the provenance of nodes recursively. The new "from_node" format is a list of NodeSource: ``` class NodeSource: self.node_name: str self.target: str self.graph_id: int self.pass_name: str self.action: str self.from_node: List[NodeSource] ``` This is in preparation for the inductor provenance tracking. For background, the inductor provenance tracking doc: https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?fbclid=IwZXh0bgNhZW0CMTEAAR0jUQ0Tf4ROLDED8Y_eIzrU0KVZVdRmyIQLp-avt-kGRPI_VgYVNyjH_q0_aem_HCQ_pxHDiwOkO9mQyWB2-g&tab=t.0 (internal only), Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_unflatten_multiple_graphs_state buck run mode/dev-nosan caffe2/test:fx -- -r node_source ``` Differential Revision: D66737916 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142066 Approved by: https://github.com/avikchaudhuri	2024-12-09 23:39:15 +00:00
Fabian Keller	5e8e1d725a	Remove some unused type ignores (round 1) (#142325 ) Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking. Having these `# type: ignore` linger around is not ideal for two reasons: - They are verbose/ugly syntactically. - They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore. I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward. This PR should have no effect on runtime at all. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325 Approved by: https://github.com/Skylion007, https://github.com/janeyx99	2024-12-09 18:23:46 +00:00
bhack	ae9cda0221	Add `truediv` support in export serializer (#136364 ) Fixes #136113 - [x] Initial `truediv` coverage - [ ] Expand/reduce coverage? - [x] Add tests - [x] Re-check docstrings - [ ] Linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364 Approved by: https://github.com/pianpwk Co-authored-by: Angela Yi <angelayi@meta.com> Co-authored-by: Pian Pawakapan <pianpwk@meta.com>	2024-12-05 17:33:33 +00:00
Yiming Zhou	31f2d4eb4e	[export] Update docs (#142011 ) Summary: Update export docs. Including: 1. Update the output graph. 2. Misc fixes for examples. Test Plan: CI Differential Revision: D66726729 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142011 Approved by: https://github.com/angelayi	2024-12-05 03:44:46 +00:00
Fabian Keller	f472b3aee1	improve typings around torch.export (#141829 ) This is another follow-up to https://github.com/pytorch/pytorch/pull/115074 / https://github.com/pytorch/pytorch/pull/141240 following the strategy discussed there (https://github.com/pytorch/pytorch/pull/115074#issuecomment-2480992230). This PR improves the type annotations around `torch._export`. Even though the PR introduces a few runtime type asserts, the runtime behavior should stay equivalent, because the failed assertions should have been immediate crashes anyway. CC @Skylion007 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/141829 Approved by: https://github.com/ezyang	2024-12-03 19:57:21 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	a6bea3d86d	Fix DCE in training IR to reflect correct record function op (#141899 ) Summary: The exit function is actually exit._recordFunction not exit.default Test Plan: CI Differential Revision: D66665359 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141899 Approved by: https://github.com/ydwu4	2024-12-03 01:59:37 +00:00
Fabian Keller	394c339691	improve typings in unflatten (#141817 ) A first follow-up to https://github.com/pytorch/pytorch/pull/115074 / https://github.com/pytorch/pytorch/pull/141240 following the strategy discussed there (https://github.com/pytorch/pytorch/pull/115074#issuecomment-2480992230). This PR improves the type annotations around `unflatten.py` which had been inaccurate due to the previously suppressed type checking on `torch.nn.Module`. CC @Skylion007 @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/141817 Approved by: https://github.com/Skylion007	2024-11-30 22:12:15 +00:00
Ivan Zaitsev	09a3eddc07	Revert #141066 and #141494 (#141721 ) manual revert due to merge conflicts note: #141494 was reverted out of order blocking automatic revert of #141066 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141721 Approved by: https://github.com/avikchaudhuri	2024-11-28 20:18:19 +00:00
PyTorch MergeBot	8c90a9a030	Revert "fix non termination in unflatten + state (#141494 )" This reverts commit `5d7c3701e4`. Reverted https://github.com/pytorch/pytorch/pull/141494 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/141494#issuecomment-2504639230))	2024-11-27 19:30:55 +00:00
PyTorch MergeBot	6e61ff4fd3	Revert "Add `truediv` support in export serializer (#136364 )" This reverts commit `1df440dc4e`. Reverted https://github.com/pytorch/pytorch/pull/136364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its doc build failure is legit ([comment](https://github.com/pytorch/pytorch/pull/136364#issuecomment-2502620732))	2024-11-27 03:24:31 +00:00
bhack	1df440dc4e	Add `truediv` support in export serializer (#136364 ) Fixes #136113 - [x] Initial `truediv` coverage - [ ] Expand/reduce coverage? - [x] Add tests - [x] Re-check docstrings - [ ] Linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364 Approved by: https://github.com/pianpwk Co-authored-by: Angela Yi <angelayi@meta.com> Co-authored-by: Pian Pawakapan <pianpwk@meta.com>	2024-11-27 00:31:47 +00:00
Zhengxu Chen	011650adc5	[sigmoid] Refactor out a helper function to insert const graph into top level graph. (#140854 ) Summary: Add the helper function to put a const graph back to the toplevel graph, can be useful when we're taking const graphs from delegates. Test Plan: CI Reviewed By: trieuat Differential Revision: D63031982 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140854 Approved by: https://github.com/SherlockNoMad	2024-11-26 20:07:46 +00:00
Avik Chaudhuri	5d7c3701e4	fix non termination in unflatten + state (#141494 ) With largish systems of nn modules with buffers, sinking params suffered from some kind of exponential blowup that is easily fixed by using a set instead of a list to keep track of unlifted buffer placeholders. Test Plan: added random dag test that failed previously Differential Revision: D66457661 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141494 Approved by: https://github.com/angelayi	2024-11-26 00:17:56 +00:00
Tugsbayasgalan Manlaibaatar	11c786dcb5	[BE] Make maybe_aliasing_or_mutating proper tag (#131990 ) For better tracking, we need to make maybe aliasing/mutating ops with proper tag. We need to special case native_batch_norm because it is not a CIA but has a wrong schema. I guess native_batch_norm will be removed at some point, so until then we just keep it around. D60347117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131990 Approved by: https://github.com/bdhirsh	2024-11-24 00:12:49 +00:00
Angela Yi	f6eeab7ea8	[export] Make unflattened module compileable (#141249 ) Test Plan: Fixes https://fb.workplace.com/groups/1028545332188949/permalink/1091988579177957/ Differential Revision: D66302806 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141249 Approved by: https://github.com/avikchaudhuri	2024-11-23 18:46:01 +00:00
Avik Chaudhuri	8b4ae29b1b	misc. fixes to unflatten (#141066 ) Handling of nested modules in unflatten had several bugs, which were caught by trying to preserve module call signatures for nested modules. * A module `k` encountered when calling `k.n()` before `k()` used to become an empty nn module. This caused some information to be dropped when `k()` was eventually called. Relatedly, we would also lose call counts for `k.n()` through different paths (say, when `k()` calls `n()`). * Deleting call-indexed modules and patching up their call sites was broken for nested modules when creating dispatcher modules, because of silliness when handling their fqns. An interesting aside is that we used random graph generation for testing some of these changes. A future PR will add the infra to create tests using these random graphs. Differential Revision: D66192799 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141066 Approved by: https://github.com/angelayi	2024-11-23 07:31:51 +00:00
angelayi	32583d915e	[export] Improve stacktrace filtering (#141285 ) Differential Revision: [D66321127](https://our.internmc.facebook.com/intern/diff/D66321127) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141285 Approved by: https://github.com/yushangdi ghstack dependencies: #141071, #141072	2024-11-22 20:55:04 +00:00
angelayi	53df1c11cd	[export] Add custom op guards (#141072 ) For custom ops that do not have a meta kernel, draft export automatically creates a meta kernel based on the tracing example inputs. To ensure that these assumptions made during tracing are clear to the user, we add assertions into the traced exported program: An example graph: ``` ExportedProgram: class GraphModule(torch.nn.Module): def forward(self, a: "f32[s0, s1]", b: "f32[s2, s3]"): # File: /data/users/angelayi/pytorch/test/export/test_draft_export.py:172 in forward, code: res1 = torch.ops.mylib.foo4(a, b) _assert_tensor_metadata = torch.ops.aten._assert_tensor_metadata(a, dtype = torch.float32, device = device(type='cpu')); _assert_tensor_metadata = None _assert_tensor_metadata_1 = torch.ops.aten._assert_tensor_metadata(b, dtype = torch.float32, device = device(type='cpu')); _assert_tensor_metadata_1 = None foo4: "f32[u2, u3]" = torch.ops.mylib.foo4.default(a, b); a = b = None return (foo4,) ``` Differential Revision: [D66321129](https://our.internmc.facebook.com/intern/diff/D66321129) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141072 Approved by: https://github.com/pianpwk ghstack dependencies: #141071	2024-11-22 20:55:04 +00:00
Edward Z. Yang	612122af8f	Fix type-safety of torch.nn.Module instances (#141240 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/141240 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-11-22 00:05:05 +00:00
Pian Pawakapan	e894219504	[export] fix loss_output in joint graph signature (#140974 ) Summary: joint-graph export is marking all outputs as LOSS_OUTPUT, fix so it marks only the correct one Test Plan: test_experimental Differential Revision: D66117412 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140974 Approved by: https://github.com/JacobSzwejbka	2024-11-21 23:57:07 +00:00
Pian Pawakapan	1132b6764a	[draft export] generate fake outputs when real tensor prop finds mismatches (#139766 ) Currently real tensor tracing raises MetadataMismatchErrors if registered fake kernels don't match the real kernels (e.g. shape, aliasing, dtype, etc.). This adds an option to use fake kernel inference to bypass mismatches - this option defaults to False for real tensor tracing, but is on for draft export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139766 Approved by: https://github.com/angelayi, https://github.com/zou3519	2024-11-21 08:01:09 +00:00
Tugsbayasgalan Manlaibaatar	87f9c1abe5	Change export IR to non-functional pre-dispatch IR (#139511 ) Differential Revision: [D65362160](https://our.internmc.facebook.com/intern/diff/D65362160) State after this IR: 1. Tests that require inference IR are replaced with ep.run_decomp({}), so export_for_training_run_decomp is sort of redundant, but I guess it is still nice that multiple rounds of retracing still work. In general, we need some auditing to reduce our redundant test coverage. 2. Once this PR has landed and has not been reverted for a week or so, I will replace the export_for_training calls with export, as they are the same thing now. 3. Added more tests to also cover the now "deprecated" old IR by patching export to use old export. For reviewers, please look at the internal version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139511 Approved by: https://github.com/ydwu4, https://github.com/angelayi, https://github.com/avikchaudhuri	2024-11-20 21:47:55 +00:00
Aaron Gokaslan	12e95aa4ee	[BE]: Apply PERF401 autofixes from ruff (#140980 ) * Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables. * list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize. * Manually went back and made mypy happy after the change. * Also fixed style lints in files covered by flake8 but not by pyfmt Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-11-20 17:52:07 +00:00
angelayi	cb6a21b033	[export] Add setattr for ep.example_inputs (#140990 ) Differential Revision: [D66136725](https://our.internmc.facebook.com/intern/diff/D66136725) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140990 Approved by: https://github.com/yushangdi, https://github.com/ydwu4	2024-11-20 02:49:20 +00:00
Tugsbayasgalan Manlaibaatar	2b21a653d8	Register CIA ops to FakeTensorMode directly in export (#140465 ) During export, we nub out most CIA ops to return NotImplemented to avoid decomposing them during tracing. To recover the existing shape propagation behavior, we register these CIA decomps directly as FakeTensorMode rules as well. The reason we have to do this is that when we return NotImplemented, FakeTensor would fall back to running these CIAs with the Meta backend, causing device-branching CIA ops to fail (because now the device is Meta; one example is sdpa). If we register a kernel directly to FakeTensorMode, we won't fall back to the Meta backend. Differential Revision: [D65716260](https://our.internmc.facebook.com/intern/diff/D65716260/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140465 Approved by: https://github.com/bdhirsh	2024-11-19 15:00:35 +00:00
Tugsbayasgalan Manlaibaatar	b86b5349cb	Ignore eager profiling code in training IR (#140826 ) Differential Revision: [D66010452](https://our.internmc.facebook.com/intern/diff/D66010452/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140826 Approved by: https://github.com/zhxchen17	2024-11-16 20:31:17 +00:00
Angela Yi	2b39a8db77	Refactor UnflattenedModule's adapt flat args (#140840 ) Test Plan: unblocks model launch Differential Revision: D66014709 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140840 Approved by: https://github.com/pianpwk	2024-11-16 05:09:37 +00:00
Shangdi Yu	8094b19620	Fix _out_spec (#140608 ) Summary: The gm_torch_level can be a _LazyGraphModule(GraphModule) instead of a GraphModule. When we call .recompile(), GraphModule populates the self._out_spec, but _LazyGraphModule(GraphModule).recompile() doesn't populate it. Test Plan: CI Differential Revision: D65902135 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140608 Approved by: https://github.com/tugsbayasgalan	2024-11-14 20:09:30 +00:00
Avik Chaudhuri	7691064768	dispatcher module for multiple graphs (#139439 ) Differential Revision: [D65307961](https://our.internmc.facebook.com/intern/diff/D65307961/) This PR introduces the concept of a "dispatcher" module `n` that carries multiple interpreter modules `n`, `n@1`, `n@2`, etc., each corresponding to a particular call of `n` and thus might carry a different specialized graph. We only do this when we're preserving module call signatures for `n`. The carried modules have the same number and order of calls to `n` appearing in the original module / exported program. In the unflattened module, all those calls go to the "dispatcher" module which internally tracks how many calls have been made so far and invokes the corresponding interpreter module. We reset this tracking after a successful or unsuccessful run of the unflattened module. Overall this makes swapping easier when module call signatures are preserved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139439 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #139438	2024-11-12 09:53:40 +00:00
Angela Yi	de509abe1c	[export] Dedup data-dependent errors based on stacktrace (#139540 ) Summary: Dedup the data-dependent errors based on the stacktrace it points to. Right now we just display every propagate-real-tensor log that shows up, but we actually can dedup them if they are due to the same piece of code (ex. there could be multiple calls to a piece of code that does some data dependent computation). This occurred when trying out draft export on the PT2I model zoo. For a specific model, previously we would get ~3k data dependent errors, but after deduping based on the stacktrace we now only get 4 errors. Test Plan: CI Differential Revision: D65374254 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139540 Approved by: https://github.com/pianpwk, https://github.com/zou3519	2024-11-05 18:16:05 +00:00
Henry Tsang	350bc2a166	[export] Add support for symbool to make it usable for torch.cond (#138765 ) # Why? I want the following code to work. minimal repro: ``` class M(torch.nn.Module): def forward(self, dilate_flag): return dilate_flag.item() input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) model = M().cuda() ep = torch.export.export(model, input1, strict=True) path = torch._inductor.aot_compile(ep.module(), input1) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(input1) ``` error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program second error will be handled by https://github.com/pytorch/pytorch/pull/138760 # Motivation I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work. ``` class M(torch.nn.Module): def __init__(self) -> None: super().__init__() self.dilate_flag = 0 def forward(self, dilate_flag): self.dilate_flag = dilate_flag.item() def true_fn(dilate_flag): return dilate_flag.clone() def false_fn(dilate_flag): return dilate_flag.clone() torch.cond( self.dilate_flag, true_fn, false_fn, (dilate_flag,), ) return self.dilate_flag input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),) inputs = (input1, input2) model = M().cuda() for input in inputs: expected_output = model(input) ep = torch.export.export(model, input, strict=False) path = torch._inductor.aot_compile(ep.module(), input) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(*input) assert ( expected_output == actual_output ), f"henry they are not equal {expected_output} != {actual_output}" ``` Differential Revision: D64867504 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765 Approved by: https://github.com/ydwu4	2024-11-04 23:31:49 +00:00
Tugsbayasgalan Manlaibaatar	ae0e7042f6	Fix custom obj being input (#139209 ) Differential Revision: [D65158939](https://our.internmc.facebook.com/intern/diff/D65158939) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139209 Approved by: https://github.com/ydwu4 ghstack dependencies: #138658	2024-11-04 18:24:29 +00:00
Tugsbayasgalan Manlaibaatar	e080c89bdc	Make test_torchbind.py training IR compatible (#138658 ) In this diff, I make the test_torchbind.py tests handle training IR. Today in the training IR, we don't see the effect token and HOP because this happens at the FunctionalTensorMode. Maybe in the future, we should move this logic up to the training IR so that writing passes etc. on training IR is safer. But for migration purposes, I think it is ok for now. I also fixed three bugs: 1. ep.module() doesn't register all aliased constants in the module. 2. When we retrace, we need to fakify the original Torchbind object. 3. We don't run any DCE on training IR so we need to add some more torch ops to verifier. Differential Revision: [D64853530](https://our.internmc.facebook.com/intern/diff/D64853530) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138658 Approved by: https://github.com/ydwu4, https://github.com/zhxchen17	2024-11-04 17:43:11 +00:00
angelayi	86db2cd194	[export] Initial draft export (#139383 ) Differential Revision: [D65288590](https://our.internmc.facebook.com/intern/diff/D65288590) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139383 Approved by: https://github.com/zou3519	2024-11-01 06:25:44 +00:00
angelayi	f14f245747	[export] Remove custom forward func in swap (#139126 ) Differential Revision: [D65100694](https://our.internmc.facebook.com/intern/diff/D65100694) Remove the custom forward function and instead move the pytree flatten/unflatten ops into the graph. This allows us to natively run via the interpreter. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139126 Approved by: https://github.com/avikchaudhuri	2024-10-30 16:50:57 +00:00
Yiming Zhou	48b55ca1b1	[export] Fix non-strict retracing with kwargs (#138927 ) Summary: `torch.fx.Interpreter.run()` only takes args as input. Currently we pass kwargs as well, which causes errors during retracing. Flattening the kwargs and concatenating them with args solves the issue. Several previously failing tests under `_retraceability_non_strict` now pass. Test Plan: ``` buck2 test @//mode/dev-nosan //caffe2/test:test_export -- -r _retraceability_non_strict ``` Differential Revision: D64980053 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138927 Approved by: https://github.com/angelayi	2024-10-29 04:31:21 +00:00
Avik Chaudhuri	9e06b5b5cb	fix unflatten with HOPs (#138978 ) Summary: Unflatten was broken for HOPs for a couple of reasons: (1) we didn't expect `get_attr` nodes in the exported program, but they can occur to hold graph arguments to HOPs; such attributes must be moved from the exported program to the corresponding unflattened submodule containing the HOP call. (2) we don't record metadata for graph arguments on serialization (there's nothing to hold it in our schema), and accordingly the `get_attr` nodes we create on deserialization don't have `nn_module_stack` metadata, which obviously wrecks unflatten. Test Plan: added a couple of tests Differential Revision: D65013647 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138978 Approved by: https://github.com/zhxchen17	2024-10-28 19:30:56 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	3a0c361899	Remove preserve ops (#138371 ) Summary: CI #buildall Test Plan: CI Reviewed By: StellarrZ Differential Revision: D64151426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138371 Approved by: https://github.com/bdhirsh	2024-10-25 19:13:55 +00:00
Gagan Jain	a6287b5c27	Fixing issue in move pass for copying Parameter (#138855 ) Summary: Fixing bug for Parameter copy during move pass of exported graph. Test Plan: UT runs on APS models. Differential Revision: D64876951 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138855 Approved by: https://github.com/pianpwk Co-authored-by: Gagan Jain <gaganj@meta.com>	2024-10-25 17:57:27 +00:00
Avik Chaudhuri	1d98a526dd	preserve signatures with multiple calls + buffer mutations (#138669 ) As called out in https://github.com/pytorch/pytorch/pull/137999, preserving signatures of multiple calls when buffer mutations are present was NYI. The main problem was that intermediate values of buffers were not tracked, so couldn't be propagated statefully between multiple calls (i.e., they would need to be explicitly passed around, defeating the unlifting needed for preserving signatures). This PR fixes this situation, by introducing module attributes that carry the necessary intermediate values of buffer mutations. In general, a buffer mutation can have several intermediate values it depends on recursively, even other buffers. So rather than tying an intermediate value with a particular buffer, we tie it with the submodules that create and read it. We install an attribute on all modules that create or read a particular intermediate value, sharing the same initial storage (i.e., initialized with the same empty tensor). For the module that creates this intermediate value, we copy the value into the corresponding attribute; and for the modules that read it, we read the corresponding attribute instead. Another complication that needed to be addressed was that a `run_decompositions` following an `export_for_training` was not preserving module call graphs, which is needed for unflattening and, in particular, used when remapping inputs. Fortunately some existing metadata already tracks provenance of nodes, which we could use to update a module call graph after functionalization / decomposition. Differential Revision: D64806175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138669 Approved by: https://github.com/tugsbayasgalan	2024-10-25 00:13:25 +00:00
Tugsbayasgalan Manlaibaatar	f4b3813989	Wrap autograd and autocast ops in training IR (#138516 ) Differential Revision: [D64732361](https://our.internmc.facebook.com/intern/diff/D64732361) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138516 Approved by: https://github.com/yushangdi ghstack dependencies: #138261	2024-10-23 00:37:54 +00:00
Pian Pawakapan	51045e6251	make DimHints compatible with Dims (#138490 ) Previously we'd been raising UserErrors when `Dim()` and DimHints (`Dim.AUTO/Dim.DYNAMIC`) were both specified in `dynamic_shapes`, this PR stops that, and uses `Dim()` objects to guide DimHints. The key to this was making the `EqualityConstraint` class happy when it checks that inferred equivalence relations were specified in the original `dynamic_shapes` spec, and this introduces a `RelaxedConstraint` object to mark the hinted dimensions, so equality checks between `RelaxedConstraints` and other constraints are treated as valid. Current behavior is that: ``` class Foo(torch.nn.Module): def forward(self, x, y): return x - y inputs = (torch.randn(4, 4), torch.randn(4, 4)) shapes = { "x": (Dim.AUTO, Dim("d1", min=3)), "y": (Dim("d0", max=8), Dim.DYNAMIC), } ep = export(Foo(), inputs, dynamic_shapes=shapes) ``` The dimensions marked `AUTO` and `DYNAMIC` will have max & min ranges of 8 & 3 respectively. Note that inferred equality between `Dim()` objects & `Dim.STATIC` will still raise errors - `Dim()` suggests not specializing to a constant. Differential Revision: D64636101 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138490 Approved by: https://github.com/avikchaudhuri	2024-10-22 07:43:48 +00:00
Tugsbayasgalan Manlaibaatar	9f7c26bef3	Fix training IR bug by changing passes order (#138292 ) Inserting runtime_assertions causes gm to have different names, but the graph signature was populated earlier. To avoid this kind of error in the future, I refactored these steps into a helper function. Differential Revision: [D64576251](https://our.internmc.facebook.com/intern/diff/D64576251) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138292 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #138266	2024-10-22 01:24:14 +00:00
Tugsbayasgalan Manlaibaatar	5adc33d3b8	Training IR should preserve custom metadata (#138266 ) Differential Revision: [D64576252](https://our.internmc.facebook.com/intern/diff/D64576252) @diff-train-skip-merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/138266 Approved by: https://github.com/yushangdi	2024-10-22 01:09:56 +00:00
Aaron Orenstein	07cc4bd3e2	typing compile_fx.py (#138033 ) Type annotations for compile_fx. - Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed. - There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138033 Approved by: https://github.com/Skylion007	2024-10-21 18:14:59 +00:00
Tom Ritchford	c0582fd0f8	Remove unused Python variables in torch/[b-z]* (#136963 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963 Approved by: https://github.com/ezyang	2024-10-19 16:45:22 +00:00
Tugsbayasgalan Manlaibaatar	1f32a1fb80	Replace torch.export default decomp table to be lazily populated (#137650 ) In this PR, we implement a lazy dictionary for export decomp behaviour for the following reasons: 1. Custom op loading can happen after import time; as a result, the decomp table might not be able to pick up the decomp. Therefore we try to delay materialization as late as possible. I intentionally separated out the core_aten_decomp to not have any custom CIA ops in this PR to mitigate the risk of getting reverted, but in the future, core_aten_decomp under torch/_decomp will exist as an alias to the official export table (torch.export.default_decompositions) Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650 Approved by: https://github.com/justinchuby, https://github.com/bdhirsh	2024-10-18 19:28:52 +00:00
Avik Chaudhuri	5d01126616	preserve module signature with multiple calls (#137999 ) Previously we would error when trying to preserve the call signature for a module when it was called multiple times. This PR can now do this without erroring. The fix is to propagate call indices in a few more places. Note that while this works in the presence of params, buffers, and tensor constants, preserving call signatures for multiple calls to a module when buffers are mutated is not supported yet. This is future work. The main problem is that we do not have enough metadata to `copy_` mutated buffers at the end of each call to a module, so the next call can read those buffers at the beginning. Making this work will likely need some explicit tracking of intermediate values of mutated buffers when collecting metadata during functionalization in export. Note also that we stop short of creating a single graph out of multiple graphs: that is still future work. So the unflattened module will still have different targets `n`, `n@1`, `n@2`, etc. for each call when we ask the module call signature of `n` to be preserved. However it is way easier to swap all of these targets with a replacement that behaves similar to the original, because all of these calls will respect the original module call signature. (In particular, any constant inputs will be carried by the calls.) Differential Revision: D64406945 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137999 Approved by: https://github.com/tugsbayasgalan	2024-10-18 07:30:22 +00:00
Tugsbayasgalan Manlaibaatar	f3c3f3a3c3	Fix assigning tensor with requires_grad as constant in export (#137997 ) When we insert constants into the unlifted graph, we need to detach them if they require grad, but when we detach we need to preserve the original aliasing information. Differential Revision: [D64406859](https://our.internmc.facebook.com/intern/diff/D64406859/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137997 Approved by: https://github.com/avikchaudhuri	2024-10-17 06:41:10 +00:00
Avik Chaudhuri	0e9708f907	tensor constant with wrapped method (#138091 ) Summary: Tensor constants can show up through wrapped methods, so that they may not always be found in constant attributes. They need to be fakified and their meta vals need to be found to create graph signatures nevertheless. Otherwise non-strict barfs. Longer term maybe we should pull this fakification up in non-strict. Test Plan: added test Differential Revision: D64480272 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138091 Approved by: https://github.com/tugsbayasgalan	2024-10-17 00:00:04 +00:00
Avik Chaudhuri	ed55d356de	[alt] fix unroll in successive unflatten (#137646 ) We use nn_module_stack in unflatten to recognize when module calls begin and end. However the current format is not sufficient to detect module call boundaries when we have successive calls to the same module, because the successive instructions (end of one call, begin of next call) have the same nn_module_stack. This causes us to effectively "unroll" successive calls to a single call. This can cause problems when preserving module call signatures because the outputs of the successive calls might be concatenated in the single call. Previously we introduced the concept of a "call index" to generate multiple graphs when unflattening, one per call. This PR pushes this concept into nn_module_stack itself. In particular, the keys of nn_module_stack now go from `key` to `key@call_index`. (In a previous attempt, https://github.com/pytorch/pytorch/pull/137457, instead values in nn_module_stack go from (fqn, type) to (fqn, type, call_index), which is BC-breaking.) Note that we still do not have the ability to preserve module call signatures for multiple calls to the same module. But now instead of randomly crashing we give a proper error. OTOH when not preserving module call signatures we simply generate multiple calls, each with its own graph, possibly deduplicated, matching what we would do for non-successive calls. Test Plan: Like D64014936 Differential Revision: D64136277 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137646 Approved by: https://github.com/angelayi	2024-10-12 15:53:52 +00:00
Tugsbayasgalan Manlaibaatar	5fca2fd365	Try unify training and inference (#136888 ) Previously inference -> inference IR was going through a separate flow from train -> inference decomposition. This diff unifies them so that we always retrace when decomposing. Joint IR decomp is still going through the old flow (inference -> inference) but that seems ok for now since it is still in the experimental stage. Differential Revision: [D63062521](https://our.internmc.facebook.com/intern/diff/D63062521/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136888 Approved by: https://github.com/avikchaudhuri	2024-10-11 20:09:58 +00:00
Avik Chaudhuri	8ee361ed13	fix test_retrace_pre_autograd (#137733 ) Test Plan: fixed Differential Revision: D64200918 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137733 Approved by: https://github.com/pianpwk, https://github.com/tugsbayasgalan	2024-10-11 03:46:22 +00:00
Shangdi Yu	9d4cb0d3eb	Fix param and buffer mapping for state_dict when there are state_dict hooks (#137609 ) Resolve #137540 Summary: We might get different state_dict and named_parameters result when the module has registered custom state_dict_hooks. For exported_program's state_dict, we want the state_dict to reflect the actual module hierarchy at runtime, and it might be different from the model's state_dict() output if the model has state_dict hooks. To do weight swapping, one needs to either re-export or turn-off the hooks when saving model's state_dict(). Previously, ExportedProgram uses nn.Module's state_dict() method to populate its own state_dict, but it doesn't work for some models (e.g. llama3_3_vision) because ExportedProgram's state_dict and an nn.Module's state_dict have some subtle differences semantically. nn.Module's state_dict is about how the state should be serialized, and it reflects the structure of the original user model code. In contrast, export specializes on a “run” of a model, and its state_dict needs to reflect the runtime module hierarchy. One example where these two are different is TorchTune's Llama3_2_vision text decoder. Here, a FusionLayer is added as a local optimization and it is not part of the "static model definition". In runtime, we have mod.layers[3].layer.sa_norm.scale. But in nn.Module's state_dict, the authors of the model added a state_dict hook to remove the "layer" in mod.state_dict() to reflect the static model definition, so we have mod.state_dict()["layers.3.sa_norm.scale"]. In this Diff, we change ExportedProgram to populate its state_dict using named_parameters() and named_buffers() instead. So in ExportedProgram's state_dict, we have "layers.3.layer.sa_norm.scale", which reflects the runtime module hierarchy. Now one problem this presents is weight swapping. Since ExportedProgram's state and the model's state is not the same anymore, weight swapping procedure also needs to change slightly. 
In internal Ads and RecSys models deployment, weight swapping is where they have one model that is currently being deployed and serving traffic, and they want to swap out the weights with newly trained model weights without having to redo the whole exporting/lowering process and create a new artifact. So they would move the deployed model’s pointer to the state dict over to the new state dict. Because of this, it was previously a requirement that the FQNs match between the exported and the eager model’s state dict. The new ExportedProgram's state dict still supports weight swapping, but the state_dict to be swapped needs to be obtained from torch.export.exported_program instead of model.state_dict() if the model has state_dict hooks. The new requirement is that the FQNs match between the exported program’s state dict and the state_dict obtained from the `_disabled_load_state_dict_hooks(M)` context manager. One benefit of having this new API is that we are now in full control within export of gathering and updating the model state. If a model doesn't have any state_dict hooks, one can still use model.state_dict() for weight swapping, so it's BC. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_export_for_training_with_state_dict_hooks ``` Differential Revision: D64080561 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137609 Approved by: https://github.com/angelayi, https://github.com/pianpwk	2024-10-11 01:33:50 +00:00
Avik Chaudhuri	365722f606	fix test_constant_output (#137547 ) Summary: Fixes a couple of problems: constants didn't have metadata before creating graph signatures, and graph signatures weren't updated when lifting constants. Test Plan: fixed test Differential Revision: D64081786 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137547 Approved by: https://github.com/tugsbayasgalan	2024-10-10 07:48:15 +00:00
Tugsbayasgalan Manlaibaatar	02013da038	Lift restriction on training IR for unflatten (#137470 ) Differential Revision: [D64025578](https://our.internmc.facebook.com/intern/diff/D64025578) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137470 Approved by: https://github.com/avikchaudhuri	2024-10-08 22:30:24 +00:00
Tugsbayasgalan Manlaibaatar	bb31e3f57e	Add original forward names to schema so that prettify pass works (#136887 ) When we run_decomp, we retrace if it is training IR. As a result, we need to reliably store the original forward names when we run decomp. Differential Revision: [D63064453](https://our.internmc.facebook.com/intern/diff/D63064453/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136887 Approved by: https://github.com/angelayi	2024-10-08 04:21:02 +00:00
Pian Pawakapan	f33ffd01f2	[export] fix joint graph metadata (#136011 ) Differential Revision: D62652832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136011 Approved by: https://github.com/tugsbayasgalan	2024-10-07 19:36:44 +00:00
angelayi	1dc1b85714	[export] Move swap to a different file (#137134 ) Refactor so that unflattener doesn't become too messy Differential Revision: [D63719648](https://our.internmc.facebook.com/intern/diff/D63719648/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137134 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #136191, #137102	2024-10-06 04:28:18 +00:00
angelayi	fa9cd46d12	[export] Update swap's forward function (#137102 ) Downstream APS code was failing to run the previously swapped module because of some fx.GraphModule forward function weirdness (P1594789677). So to fix this, I just attached a custom forward function which matches the unflattened module's forward function. Differential Revision: [D63683422](https://our.internmc.facebook.com/intern/diff/D63683422/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137102 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #136191	2024-10-06 04:25:36 +00:00
angelayi	52d7704b32	[export] Add optimization passes (#136191 ) Added an optimization pass to the swap function which removes extraneous pytrees. Currently it removes the pytree flatten/unflatten calls between modules in very specific scenarios (all the inputs of one module go into the other). Future work can be to remove the input pytree.flatten if the inputs go directly into an unflatten, and output pytree unflatten if the outputs are directly from a pytree.flatten. Differential Revision: [D62879820](https://our.internmc.facebook.com/intern/diff/D62879820) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136191 Approved by: https://github.com/avikchaudhuri	2024-10-06 04:22:42 +00:00
Avik Chaudhuri	17718209ea	fix specialization bug in unflatten + preserve_module_call_signature (#137363 ) Summary: In unflatten, when we generate module calls when their signature has been preserved, we do not pass the original constant args. This can cause strange effects, e.g., if the module is swapped out with itself, we may suddenly go down a different path than the original, or even crash. Test Plan: added a test Reviewed By: angelayi Differential Revision: D63913750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137363 Approved by: https://github.com/angelayi	2024-10-05 04:26:02 +00:00
Avik Chaudhuri	6a6a8b17b8	handle state tensors in training ir path (#137240 ) Summary: We had attribute assignment detection and handling of registered buffer assignments when using `aot_autograd`, but not when using just `make_fx`. Fixed. Test Plan: expanded coverage of `test_state_tensors` to use `export` instead of `torch.export.export` Differential Revision: D63802576 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137240 Approved by: https://github.com/tugsbayasgalan	2024-10-04 20:23:48 +00:00
Tugsbayasgalan Manlaibaatar	d2d14d14e3	[RELAND] Fix unlift to preserve aliased constants (#137310 ) Differential Revision: [D63864743](https://our.internmc.facebook.com/intern/diff/D63864743) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137310 Approved by: https://github.com/avikchaudhuri	2024-10-04 18:15:52 +00:00
Pian Pawakapan	6dcd773c57	[export] clean up dynamic markers from tensors (#137230 ) Summary: When we handle dynamic shapes markers like `Dim.AUTO, Dim.DYNAMIC`, we use dynamo decorators, attaching set attributes to the export input tensors, e.g. `x._dynamo_dynamic_indices = set()`. I thought this was fine, since it's done all the time with torch.compile, but it breaks some PT2Inference tests, specifically because unpickling a set attribute isn't possible with the C++ torch::jit::pickle_load call. We've agreed that the PT2Inference side will clone sample inputs & pickle the original inputs to be safe, but this still establishes a nice invariant that user-facing decorators are both ignored & cleaned out in the lifecycle of an export call. Test Plan: test_export Differential Revision: D63773534 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137230 Approved by: https://github.com/avikchaudhuri	2024-10-04 06:50:45 +00:00
Tugsbayasgalan Manlaibaatar	97634e4f82	Rollout infra for executorch migration to training IR (#132703 ) Title Differential Revision: [D60432217](https://our.internmc.facebook.com/intern/diff/D60432217/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132703 Approved by: https://github.com/tarun292	2024-10-04 04:33:08 +00:00
PyTorch MergeBot	525f6715bc	Revert "Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162 )" This reverts commit `f96020c246`. Reverted https://github.com/pytorch/pytorch/pull/137162 on behalf of https://github.com/jovianjaison due to Sorry for reverting your changes but many jobs are failing with NameError: name _recursive_getattr is not defined + a Lint job fails ([comment](https://github.com/pytorch/pytorch/pull/137162#issuecomment-2392036062))	2024-10-03 18:17:56 +00:00
Tugsbayasgalan Manlaibaatar	f96020c246	Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162 ) When we populate unlifted graph module, we actually only "unlift" constant tensor inputs which is problematic because export de-duplicates aliasing constants. As a result, we only register one constant instead of two constants. This PR fixes that by querying ep.constants table instead of ep.graph_signature.lifted_tensor_constants. Differential Revision: [D63743111](https://our.internmc.facebook.com/intern/diff/D63743111) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137162 Approved by: https://github.com/pianpwk	2024-10-03 17:28:53 +00:00
Avik Chaudhuri	cd5d1fe015	unflatten with specialized graphs per submodule call (#137013 ) Previously we were making a fairly restrictive assumption when unflattening an exported program: for any submodule, we would assert that the graph of every call to that submodule must be the same. This assertion is load-bearing, i.e., if we simply remove the assertion then we can get incorrect results, as shown by the following example. ``` class N(torch.nn.Module): def forward(self, x, b): if b: return x + 1 else: return x + 2 class M(torch.nn.Module): def __init__(self): super().__init__() self.n = N() def forward(self, x): x0 = x + 3 x1 = self.n(x0, True) x2 = x1 + 4 x3 = self.n(x2, False) return x3 + 5 m = M() inp = (torch.ones(1),) print(m(inp)) # tensor([16.]) ep = torch.export.export(m, inp) print(ep.module()(inp)) # tensor([16.]) unflattened = torch.export.unflatten(ep) print(unflattened(inp)) # tensor([15.]) ``` However, this goes against the spirit of specializing graphs when exporting: we should expect* that for every call to a submodule we might generate a different graph. The goal of this PR is to fix unflattening to handle multiple specialized graphs corresponding to multiple calls to the same submodule. The idea is simple: for every call to a child module `foo`, we will create potentially different child modules `foo`, `foo@1`, `foo@2`, etc. and use those names as targets in `callmodule` instructions in the parent graph. An immediate consequence of this is that the list of fqns in an unflattened module may not be the same as an exported module. Note that all these variants share the same parameters / buffers, so that multiple calls to the same submodule can share state as expected. However, as described so far this scheme may end up with needlessly too many submodules. Thus, between calls to the same submodule, if graphs are equal then we optimize away the extra submodules and reuse call names as much as possible. 
Moreover, when submodules are shared across fqns, we also try to de-duplicate graphs corresponding to their calls as much as possible. Note that no matter what, information about which submodule was called is still preserved, so that if a submodule has to be swapped with another, one can still find all calls to the former submodule and replace them with calls to the latter. A note on the choice of naming scheme for call names: instead of generating "sibling" modules `foo@1`, `foo@2`, etc. for `foo`, we had considered generating "children" modules `foo._1`, `foo._2`, etc. of `foo`. However this can cause spurious cycles when de-duplicating graphs. E.g., suppose that `foo` is an alias for `bar._1` and `foo._1` is an alias for `bar`, then we must either introduce a cycle or drop the opportunity to optimize. Another idea would be to make `foo` a dummy module that contains `foo._0` corresponding to the first call, but this necessitates too many changes to existing tests and hurts the common case. Differential Revision: D63642479 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137013 Approved by: https://github.com/pianpwk	2024-10-03 00:55:44 +00:00
Tugsbayasgalan Manlaibaatar	73b07df042	Preserve custom ops via run_decomps (#136882 ) This is a re-apply of https://github.com/pytorch/pytorch/pull/136773. Note that this doesn't completely remove the _preserve_ops list from export, mainly because we want a small change to address failing executorch tests. All the complications included in this PR are deleted in the next PR. Differential Revision: [D63553086](https://our.internmc.facebook.com/intern/diff/D63553086/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136882 Approved by: https://github.com/bdhirsh	2024-10-01 17:38:00 +00:00
Pian Pawakapan	cc2a66c55e	[export] hook up mark_dynamic to export Dims (#137029 ) Adds Dim.DYNAMIC which calls torch._dynamo.mark_dynamic() in the backend. Similar to Dim.AUTO in that it does automatic inference for ranges & relations, but errors out for specializations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137029 Approved by: https://github.com/avikchaudhuri	2024-10-01 17:05:09 +00:00
Ivan Zaitsev	b35f70da05	[ez] fixup the export of D62879819 (#136900 ) a line from D62879819 (#136190) went missing somehow Pull Request resolved: https://github.com/pytorch/pytorch/pull/136900 Approved by: https://github.com/atalman	2024-09-28 13:46:17 +00:00
Pian Pawakapan	6075f566cc	[export] simplify automatic dynamic shapes processing (#136591 ) Removing `_transform_shapes_for_default_dynamic` and `assume_static_by_default=False` as added in https://github.com/pytorch/pytorch/pull/133620. This reverts back to `assume_static_by_default=True`, using dynamo decorators (e.g. `maybe_mark_dynamic, mark_static`) for handling Dim.AUTO & Dim.STATIC instead. This is easier to maintain, as it doesn't require reasoning about "inverting" the dynamic_shapes specs, and also opens up usage of other decorators (`mark_dynamic, mark_unbacked`). On the user side this change has no effect, but internally this means dynamic behavior is determined only by the `dynamic_shapes` specs (ignoring user-side input decorators following https://github.com/pytorch/pytorch/pull/135536), while transferring this information for _DimHints via decorators, for Dynamo/non-strict to create symbolic_contexts accordingly, e.g. `7c6d543a5b/torch/_dynamo/variables/builder.py (L2646-L2666)` One caveat is we don't raise errors for dynamic decorators on the user side, since we don't know if they're from user markings, or from re-exporting with inputs we've previously marked. Differential Revision: D63358628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136591 Approved by: https://github.com/avikchaudhuri	2024-09-27 18:28:51 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	e4e83a4ac4	Remove aten.item hack (#136663 ) Summary: Title Test Plan: CI Differential Revision: D63404353 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136663 Approved by: https://github.com/bdhirsh	2024-09-26 17:14:48 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	0b38fa154a	Fix meta registry in export (#136492 ) Summary: Title Test Plan: CI This fixes some breaking tests in executorch. I think the root cause is when we have aten::matmul which we are not preserving, we register meta implementation from C++ side. It seems like the C++ kernel doesn't work well with mix of FakeTensor and real tensor. This PR sidesteps this problem by always preferring python CIA decomp over C++ Cia decomp Differential Revision: D63297050 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136492 Approved by: https://github.com/bdhirsh	2024-09-25 17:53:02 +00:00
Pian Pawakapan	7c6d543a5b	[export] fix _get_non_persistent_buffers for duplicates (#136552 ) Summary: Export's method _get_non_persistent_buffers doesn't check duplicate submodules, so we run into state_dict related issues if non-persistent buffers exist on shared submodules. Test Plan: test_export Differential Revision: D63332976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136552 Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan	2024-09-25 16:46:31 +00:00
angelayi	210b136c07	[export] Add experimental swap API (#136190 ) Prototyped the following API which takes in an ExportedProgram, a dictionary of fqn to modules to swap, and returns a (unlifted) GraphModule ``` _swap_modules( ep: ExportedProgram, modules_to_swap: Dict[str, torch.nn.Module] ) -> torch.fx.GraphModule: ``` Differential Revision: [D62879819](https://our.internmc.facebook.com/intern/diff/D62879819) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136190 Approved by: https://github.com/avikchaudhuri	2024-09-24 22:50:44 +00:00
Huanyu He	a4e9a1c90b	[TorchRec][PT2 IR][APF] short circuit the flatten/unflatten between EBC and KTRegroupAsDict modules (#136045 ) Summary: # context * for the root cause and background please refer to this [post](https://fb.workplace.com/groups/1028545332188949/permalink/1042204770823005/) * basic idea of this diff is to short circuit the pytree flatten-unflatten function pairs between two preserved modules, i.e., EBC/fpEBC and KTRegroupAsDict. NOTE: There could be multiple EBCs and one single KTRegroupAsDict as shown in the [pic](https://fburl.com/gslide/lcyt8eh3) {F1864810545} * short-circuiting the EBC-KTRegroupAsDict pairs is very special and a must in most of the cases due to the EBC key-order issue with distributed table lookup. * hide all the operations behind a control flag `short_circuit_pytree_ebc_regroup` to the torchrec main api call `decapsulate_ir_modules`, which should only be visible to the infra layer, not to the users. # details * The `_short_circuit_pytree_ebc_regroup` function finds all the EBCs/fpEBC and KTRegroupAsDict modules in an unflattened module. Retrieve their fqns and sort to in_fqns (regroup_fqns) and out_fqns (ebc_fqns). Because currently the fpEBC is swapped as a whole, we do some extra fqn logic to filter out the EBC that belongs to an up-level fpEBC. * a util function `prune_pytree_flatten_unflatten` removes the in-coming and out-going pytree flatten/unflatten function calls in the graph module, based on the given fqns. WARNING: The flag `short_circuit_pytree_ebc_regroup` should be turned on if EBCs are used and EBC sharding is needed. Assertions are also added if a `KTRegroupAsDict` module can't be found, or `finalize_interpreter_modules` is not `True`. # additional changes * absorb the `finalize_interpreter_modules` process inside the torchrec main api `decapsulate_ir_modules`. 
* set `graph.owning_module` in export.unflatten as required by the graph modification * add one more layer of `sparse_module` for closely mimicking the APF model structure. Test Plan: # run test * serializer ``` buck2 run fbcode//mode/opt fbcode//torchrec/ir/tests:test_serializer ``` * apf ``` buck2 run fbcode//mode/opt fbcode//aps_models/ads/gmp/tests/ne/e2e_deterministic_tests:gmp_e2e_ne_tests -- --filter-text 'test_mtml_instagram_model_562438350_single_gpu_with_ir' ``` * local mp run ``` ==== Finished E2E deterministic test for mtml_instagram_model_gmp_474023725_non_kjt_unary ==== finished test_mtml_instagram_model_562438350_single_gpu_with_ir Imports took: 6.0s! Profile with --import-profiler. Executed 1 example in 203.1s: Successful: 1 Failed: 0 Skipped: 0 Not executed: 8 https://testslide.readthedocs.io/ ``` Differential Revision: D62606738 Pull Request resolved: https://github.com/pytorch/pytorch/pull/136045 Approved by: https://github.com/angelayi	2024-09-17 18:42:56 +00:00
Tugsbayasgalan Manlaibaatar	1904b09e61	Create export_for_inference API and expose core_aten as public facing API (#135912 ) Differential Revision: [D62606908](https://our.internmc.facebook.com/intern/diff/D62606908) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135912 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #135080	2024-09-15 17:05:07 +00:00
Tugsbayasgalan Manlaibaatar	382fad58b3	Deprecate _preserve_ops and consolidate with decomp_table (#135080 ) In this PR, we deprecate the _preserve_ops feature in the run_decompositions API. We can't kill this API completely because the Executorch team depends on it. As the syncing between the two repos is non-trivial, I just leave this argument as deprecated for now. In the next PR, I will immediately remove it. After this PR, run_decompositions will only decompose what's inside the decomp table and preserve the rest by default. Note that this feature is only rolled out to OSS for now. Old code path is protected under IS_FBCODE flag. Differential Revision: [D62163161](https://our.internmc.facebook.com/intern/diff/D62163161/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135080 Approved by: https://github.com/justinchuby, https://github.com/avikchaudhuri, https://github.com/bdhirsh	2024-09-15 17:01:58 +00:00
Pian Pawakapan	b897ab0540	[export] ignore mark_dynamic() in export (#135536 ) Previously we were accommodating `torch._dynamo.mark_dynamic()` for export's dynamic shapes. Here we clean things up and ignore it, requiring users to specify an export input for `dynamic_shapes`. Note: there are 4 decorators relevant to export, `mark_dynamic, maybe_mark_dynamic, mark_static, mark_unbacked`. User calls that involve export have only been `mark_dynamic()`, and we use `maybe_mark_dynamic` under the hood for `Dim.AUTO`, but we could start using others. One reason I decided to not warn and just silently ignore is these decorators cause the tensors to carry dynamic info, and it'll be hard to tell whether the markers are from export or user calls when re-exporting with the same inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135536 Approved by: https://github.com/avikchaudhuri	2024-09-12 21:22:19 +00:00
Tugsbayasgalan Manlaibaatar	5a9ac83e94	Fix doc (#135551 ) Differential Revision: [D62412667](https://our.internmc.facebook.com/intern/diff/D62412667/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135551 Approved by: https://github.com/yushangdi ghstack dependencies: #135549	2024-09-10 07:18:44 +00:00
Tugsbayasgalan Manlaibaatar	c18052da0e	Add some minor doc improvement and ban using training IR for unflattener (#135549 ) Title Differential Revision: [D62412490](https://our.internmc.facebook.com/intern/diff/D62412490/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135549 Approved by: https://github.com/yushangdi	2024-09-10 06:48:42 +00:00
Avik Chaudhuri	de74aafff4	error on exporting ScriptModule (#135302 ) Test Plan: added test Differential Revision: D62279179 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135302 Approved by: https://github.com/yushangdi	2024-09-06 15:12:40 +00:00
Zhengxu Chen	116fd474da	[export] Expand coverage to more copied sym ops for unflattener. (#135119 ) Test Plan: buck2 test 'fbcode//mode/opt' fbcode//torchrec/ir/tests:test_serializer -- --run-disabled ``` File changed: fbcode//caffe2/torch/export/unflatten.py Buck UI: https://www.internalfb.com/buck2/2e0377e7-e2b6-4bd0-8133-a787245165a0 Test UI: https://www.internalfb.com/intern/testinfra/testrun/5066549824883887 Network: Up: 0B Down: 0B Jobs completed: 16. Time elapsed: 10.2s. Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Differential Revision: D62190172 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135119 Approved by: https://github.com/yushangdi	2024-09-05 21:58:20 +00:00
Angela Yi	9c38b00999	[export] Add ability to run eagerly on UnflattenedModule (#133996 ) Summary: Added the contextmanager, `_disable_interpreter`, which is meant to be put around a call to `unflatten`. This will generate an UnflattenedModule and sub-InterpreterModules which will not use torch.fx.Interpreter to run eagerly. We want to have this as a state of the module instead of a contextmanager around running the module because it's not clear where we are calling the unflattened module. This seems to improve the performance: https://fb.workplace.com/groups/1075192433118967/posts/1473590629945810/?comment_id=1473621763276030 Test Plan: CI Differential Revision: D60939034 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133996 Approved by: https://github.com/pianpwk	2024-09-05 20:28:42 +00:00
Tugsbayasgalan Manlaibaatar	9d705605dd	Fix decomp behaviour in export training IR (#134801 ) Subset of changes in https://github.com/pytorch/pytorch/pull/132901, can't land the previous one because it is too complicated. Rest of the change will be implemented as follow up after export design meeting. This part just makes the training IR -> inference IR decomp to have the same path as normal export. Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134801 Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi	2024-09-05 06:37:44 +00:00
Laith Sakka	c8ab9b06a2	Redesign custom op functionalization for better re-inplace (#134409 ) - The new implementation (auto_functionalized_v2) is enabled by default but can be disabled using an inductor flag. - In export mode the old implementation is used. Motivation Previous functionalization fails to re-inplace arguments when they are views over other tensors. see issue https://github.com/pytorch/pytorch/issues/131192 The new functionalization is easier to re-inplace for views. A) Functionalization pass. Consider a program: ``` func(t) x = t[0] y = t[1] foo(x, y) # custom operator with x, y mutable return (x, y, t) ``` - To functionalize `foo` we generate a function that operates on the base tensors of the inputs (x.base() and y.base()) and record how to regenerate the views out of the base for argument x by recording ```ViewInfo=(x.base(), x.size(), x.stride(), x.storage_offset())``` - Due to some limitations on the torch.export arguments format, we have to generate a lot of arguments, but this is something we can simplify in the future. For the example above we get the following function. ``` auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default, _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0 , _y_base_index = 0,_y_size = (), _y_stride = (), _y_storage_offset = 1 , _all_bases = [arg0_1]) ``` - In the code above: - _all_bases[t]: refers to a unique set of bases for all foo arguments. - for each argument x we have _x_base_index, _x_size, _x_stride, _x_storage_offset that can be used to (1) regenerate x from _all_bases[_x_base_index] or a copy of the base. - the output of auto_functionalized is foo output, followed by x tensors one for each base in _all_bases, that is a copy of the base tensor after observing the mutations of all the arguments that are views of that base. 
- for each use of a base in _all_bases or a view of it that comes after the call to foo, replace it with a view of the new output. For the function above, after functionalization we get: ``` def forward(self, arg0_1: "f32[2][1]cpu"): auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default, _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0, _y_base_index = 0, _y_size = (), _y_stride = (), _y_storage_offset = 1, _all_bases = [arg0_1]) getitem_1: "f32[2][1]cpu" = auto_functionalized[1]; auto_functionalized = None copy_: "f32[2][1]cpu" = torch.ops.aten.copy_.default(arg0_1, getitem_1); arg0_1 = copy_ = None # No stacktrace found for following nodes select_2: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 0) select_3: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 1); getitem_1 = None return (select_2, select_3) ``` B) Semantics of auto_functionalize The new semantics of auto_functionalize are as follows: 1. For each base in all_bases, copy the base and create all_bases copies. (if a base is inplaced we do not need to copy it) 2. For each arg, regenerate the arg from the copy of its base using the view information above. 3. return the original foo output followed by the new bases. C) Re-inplace pass since auto_functionalize now copies the bases, what we actually inplace is the bases. (run just like before but on the bases instead of args). 1. For each base b in _all_bases check if there is any use of base (or its aliases/views) after auto_functionalize (before it's overwritten with a copy) if there is not any, then inplace it (avoid copying it in step 1 above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/134409 Approved by: https://github.com/zou3519	2024-09-04 17:08:58 +00:00
Avik Chaudhuri	9f00317997	rationalize STATIC vs. None (#134877 ) Summary: A bit of refactoring to prepare to remove `None` as a way to specify static dimensions in dynamic shapes, given we already have `Dim.STATIC` for the same purpose. We will now warn whenever this happens. However no tests were modified because problematic uses of `None` still need to behave as they do today, until we are ready to remove support. It should be easy to port tests by replacing the warning function to raise instead. Note that other uses of `None`, such as for entire values (tensor or non-tensor) remain as is. Moving forward this should be the only purpose of `None` (at least externally). Finally, there's a bit of confusion in our representation now because `AUTO` also internally transforms to `None`. Renamed dynamic_shapes to transformed_dynamic_shapes where this happens. Overall the two forms (pre and post transformation) have different properties so should probably not be represented in the same format in the future. Test Plan: existing Differential Revision: D62040729 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134877 Approved by: https://github.com/pianpwk	2024-09-04 05:34:26 +00:00
Zhengxu Chen	a19a7524f6	[export] Make sure getitem replacement are synced with module call graph. (#134830 ) Summary: When we are placing nodes in the graph, we should also replace the references in module_call_graph. Test Plan: buck2 run 'fbcode//mode/opt' torchrec/fb/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_vlea buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_serialize_empty_value_vlea' --run-disabled buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_deserialized_device_vle' --run-disabled Differential Revision: D62014035 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134830 Approved by: https://github.com/angelayi	2024-08-30 16:47:05 +00:00
Laith Sakka	f5b0caee71	Rewrite `unsafe_remove_auto_functionalized_pass` using `decompose_auto_functionalized` (#134831 ) `unsafe_remove_auto_functionalized_pass` can be written as using `decompose_auto_functionalized`, this way we do not have to update it each time we do a change to `auto_functionalize` (Ex https://github.com/pytorch/pytorch/pull/134409) , and we avoid duplicate logics implemented in two different ways. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134831 Approved by: https://github.com/zou3519	2024-08-30 16:27:53 +00:00
Pian Pawakapan	36a6516290	[export] use single FQN for param_buffer_mapping (#134500 ) Fixes #133252 In strict mode, we have this routine for mapping traced parameters to their FQNs using tensor ids. Currently we assume there's at least 1 unique FQN for each traced parameter, but this seems to break with parameter reuse when call_module nodes are present. Adding a test case where this breaks. Fixes this by assigning the same FQN to all traced parameters with the same tensor id. This is fine because we return the original state_dict for the EP, and the unflattener has its own routine of handling aliasing: https://github.com/pytorch/pytorch/pull/125758 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134500 Approved by: https://github.com/angelayi	2024-08-29 17:06:31 +00:00
Avik Chaudhuri	92e38a476f	preserve aten::to device in export training (#134622 ) Summary: With training IR, we cannot rely on trapping `to()` in `FunctionalTensor` because the regular decomposition kicks in first, and that can cause it to be optimized away. So instead we preserve it until we functionalize, and then replace it explicitly with `_to_copy()`. Test Plan: expected test failures go away Differential Revision: D61883878 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134622 Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan	2024-08-29 14:53:30 +00:00
Avik Chaudhuri	ca03a14cf7	hang dim hint constants off Dim (#134702 ) Summary: Retry landing https://github.com/pytorch/pytorch/pull/134484 Test Plan: (see original) Differential Revision: D61925860 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134702 Approved by: https://github.com/pianpwk	2024-08-29 01:02:01 +00:00
Tugsbayasgalan Manlaibaatar	6dd3f81aaf	Add export_for_training as public API (#134677 ) Differential Revision: [D61912084](https://our.internmc.facebook.com/intern/diff/D61912084) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134677 Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17	2024-08-28 22:32:10 +00:00
PyTorch MergeBot	13d40f6fc5	Revert "hang dim hint constants off Dim (#134484 )" This reverts commit `c142af7209`. Reverted https://github.com/pytorch/pytorch/pull/134484 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/134484#issuecomment-2315749549))	2024-08-28 16:05:42 +00:00
Avik Chaudhuri	c142af7209	hang dim hint constants off Dim (#134484 ) Summary: Recently https://github.com/pytorch/pytorch/pull/133620 added support for automatic dynamic shapes, where a new enum, `DIM`, was introduced to provide hints like `AUTO` and `STATIC`. This PR is a nominal change where we expose the hints via the existing public `Dim` API, and remove `DIM` from the public API. The main motivation is to avoid having users need to import too many things. Test Plan: existing Differential Revision: D61807361 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134484 Approved by: https://github.com/angelayi	2024-08-28 14:35:40 +00:00
Pian Pawakapan	5ead965026	[export] don't duck size for DIM.AUTO (#134486 ) Summary: apparently DIM.AUTO leads to duck sizing, I didn't catch this. Doing the least intrusive fix possible by using `torch._dynamo.maybe_mark_dynamic()` under the hood. Test Plan: added test Differential Revision: D61809344 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134486 Approved by: https://github.com/avikchaudhuri	2024-08-27 23:00:26 +00:00
Avik Chaudhuri	8db8ac700d	line by line logging (#134298 ) Summary: Today there is no good mechanism to detect progress of non-strict export line-by-line in user code. This caused some pain recently in trying to find the exact line of user code that was triggering a bug where the process appeared stuck because deep down something was calling some symbolic shapes code that was suffering some exponential blowup. This PR adds an environment variable for extended debugging that will log the line of user code corresponding to every torch function call. It only works in non-strict export for now. Set this environment variable together with `TORCH_LOGS` enabled for `export` logs at `DEBUG` level (i.e., with a `+` prefix), e.g.: ``` TORCHEXPORT_EXTENDED_DEBUG_CURRENT_LOC=1 TORCH_LOGS="+export" ... ``` This will show logs with something like: ``` ... prim::device called at .../example.py:4284 in foo TensorBase.item called at .../example.py:4277 in bar ... ``` We already have an existing place to intercept torch functions where we process data-dependent errors in non-strict, so parking the logging there. An alternative place we could be doing this is where we add `stack_trace` metadata when generating code, but unfortunately at least the example that motivated this gets stuck before generating code, so that would be too late. Test Plan: ran it on some sample commands Differential Revision: D61692156 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134298 Approved by: https://github.com/angelayi	2024-08-25 02:57:11 +00:00
Pian Pawakapan	54ff320519	[export] refactor ExportGraphSignature construction (#134059 ) Refactors construction of ExportGraphSignature object for export & training IR, explicitly creating AOTAutograd signature for training IR. This will be helpful for upcoming refactors for placeholder naming & runtime asserts prettifying. Changes: - dedups `make_argument_spec` call, moved to export/graph_signature.py - `_sig_to_specs` wrapped into new function `_convert_to_export_graph_signature`, directly converts GraphSignature -> ExportGraphSignature - `_make_fx_helper` explicitly creates AOTAutograd GraphSignature object Pull Request resolved: https://github.com/pytorch/pytorch/pull/134059 Approved by: https://github.com/angelayi, https://github.com/ydwu4	2024-08-23 23:29:28 +00:00
Yiming Zhou	2cfc2da527	[export] Make move_to_device_pass function public (#134263 ) Summary: This is a follow-up of https://github.com/pytorch/pytorch/pull/133660 Here we make the `move_to_device_pass()` function public so users can import it with `from torch.export.passes import move_to_device_pass` Test Plan: CI Differential Revision: D61671310 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134263 Approved by: https://github.com/angelayi	2024-08-23 23:18:30 +00:00
Pian Pawakapan	8ff3a5be1b	[export] basic auto dynamic shapes (#133620 ) Starter version of automatic dynamic shapes for export. Creates enums `DIM.AUTO`, `DIM.STATIC`, allowing user to specify `AUTO` for dims in dynamic_shapes specs, meaning that corresponding dims are treated as dynamic, and relevant guards will do what's necessary (e.g. refine ValueRanges, set replacements based on equality, or even set static) without raising ConstraintViolationErrors. Basically allows the user to say, "a bunch of these dims can be dynamic, let export do model analysis and return the program with maximum possible dynamism, without complaining". The usage for specifying `dynamic_shapes` is now: ``` AUTO -> dynamic by default, return whatever produce_guards() says, even if it's static None/int/STATIC -> static Dim/DerivedDim -> same as before - will complain if the min/max range is invalid, or if dims related to this are unspecified. ``` Caveat 1: specifying `AUTO` for a dim won't guarantee it'll be dynamic: - specifying `AUTO` for a dim will return the maximum possible dynamism given your program and other specified constraints, but this can still mean you'll get a static program. For example, with the program below, x is specified dynamic, but it's equal to y, which is specified static, and with how we currently do things we won't promote y to dynamic, but will demote(?) x to static. So this can be surprising if you don't fully know your model, and/or missed one of your other inputs when specifying auto-dynamic shapes. ``` class Foo(torch.nn.Module): def forward(self, x, y): return x + y inputs = (torch.randn(6), torch.randn(6)) export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": None}) ``` Caveat 2: specifying `AUTO` and Dims in the same spec is still problematic: - The way Dims/DerivedDims are currently handled is very strict. 
A Dim represents a symbol, and we require a user to specify the symbol for all dims governed by the symbol - that's why we've seen errors in the past like `The values of x must always be related to y by ...`, asking the user to specify the exact relation as in the program. We also require the specified min/max range to be a subset of the valid range from model analysis. All this doesn't compose well with specifying `AUTO` just yet - for example in the program below, ideal behavior could be to return a dynamic program, where `dx = x.size(0) = y.size(0)` has range (3,6). Unfortunately this crashes, and correct behavior is to specify `dx` for both inputs. So currently we raise a UserError and crash if both Dims + `AUTO` are present in the spec. ``` class Foo(torch.nn.Module): def forward(self, x, y): return x + y inputs = (torch.randn(6), torch.randn(6)) export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": {0: Dim("dx", min=3, max=6)}}) # this doesn't work, because x & y are related ``` Implementation details: This is done by setting `assume_static_by_default=False`, and doing a transform on the `dynamic_shapes` spec to preserve semantics. `assume_static_by_default=False` will treat unspecified dims or Nones as dynamic. This is the opposite of what `export.export()` currently does - unspecified Dims/Nones are treated as static. Historically this static-by-default behavior, where the user deals with fewer guards, has been desirable, and we would like to respect that in this implementation. So an internal spec transformation, `_transform_shapes_for_default_dynamic()`, is added to do the spec conversion necessary to be compatible with dynamic by default. Specifically, AUTOs are converted into Nones, and Nones/unspecified dims are filled in with explicitly static constraints. 
For example, this would look like, for a 3-d tensor: `{0: DIM.AUTO, 1: None, 2: Dim("dx")} -> {0: None, 1: 32, 2: Dim("dx")}` This does seem overly complicated, but it's done to preserve dynamic shapes semantics for `torch._dynamo.export()`, which already uses `assume_static_by_default=False`, and follows the same process for generating shape constraints , via `_process_dynamic_shapes`. There the semantics are: ``` None/unspecified: dynamic by default Dim/DerivedDim: also a strict assertion ``` If we don't care about BC for `_dynamo.export(dynamic_shapes)`, then we can just modify semantics for `_process_dynamic_shapes()` and change all the relevant tests in `test/dynamo/test_export.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133620 Approved by: https://github.com/avikchaudhuri	2024-08-23 22:56:39 +00:00
Angela Yi	f5a2a22dc4	[export] Fix unflattener to respect nn.Parameter requires_grad (#134353 ) Summary: Fixes P1539870235 Test Plan: CI Differential Revision: D61726403 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134353 Approved by: https://github.com/pianpwk	2024-08-23 22:49:34 +00:00
Avik Chaudhuri	b454c51060	remove dynamic_dim (#134211 ) Summary: As promised in https://github.com/pytorch/pytorch/pull/134045. Test Plan: existing Differential Revision: D61646937 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134211 Approved by: https://github.com/angelayi	2024-08-23 04:13:03 +00:00
Aaron Orenstein	d95aedf5fd	[BE] typing for decorators - fx/_compatibility (part 1) (#134202 ) Part of #134054. This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change. So landing these 'type: ignore' for pytorch in advance of them actually being needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202 Approved by: https://github.com/Skylion007	2024-08-22 17:07:33 +00:00
Avik Chaudhuri	0d7ac1966a	kill sharing of constraints (#134045 ) Summary: Previously, reuse of the same `Dim` was encoded by "sharing" internal constraints among constraint targets. This kind of sharing, implemented using `shared` fields between `_Constraint`s, was originally motivated by `dynamic_dim`, specifically to support `==` between `dynamic_dim`s, but we no longer need to maintain this overcomplicated structure: we can simply use names of `Dims` to directly encode sharing information. Thus this PR vastly simplifies the structure of `_Constraint` by removing `shared` fields. As a result, both `_Constraint` and its moral subclass, `_DerivedConstraint`, are 1-1 with `Dim` and its moral subclass, `DerivedDim`. Note that this will break `==` over `dynamic_dim`, so an immediate follow-up will be to remove `dynamic_dim` entirely from our public API. (It's been more than 6 months since the deprecation warning anyway.) I just didn't want to deal with that process in the same PR. Test Plan: existing Differential Revision: D61559413 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134045 Approved by: https://github.com/pianpwk	2024-08-22 04:40:47 +00:00
Zhengxu Chen	3ef1cc8583	[export] Implement common_getitem_elimination pass. (#133618 ) Summary: In export, we will generate many redundant getitem nodes branching from the same source, inserted by runtime assertions or any passes. This is causing issues with any downstream system relying on any value being uniquely defined by a single node. I don't think it hurts to remove a bunch of getitem nodes only, so I just added it to the ctor. Test Plan: rebase on D61256937 ``` buck2 run scripts/bearzx:pt2_export_playground ``` Differential Revision: D61351578 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133618 Approved by: https://github.com/tugsbayasgalan	2024-08-21 16:48:24 +00:00
PyTorch MergeBot	49f6ea6dd9	Revert "[report_exportability] Avoid re-exporting duplicated modules (#133930 )" This reverts commit `278bc985d7`. Reverted https://github.com/pytorch/pytorch/pull/133930 on behalf of https://github.com/izaitsevfb due to breaks lint ([comment](https://github.com/pytorch/pytorch/pull/133930#issuecomment-2299513046))	2024-08-20 18:44:09 +00:00
Sherlock Huang	278bc985d7	[report_exportability] Avoid re-exporting duplicated modules (#133930 ) Summary: Skip re-exporting modules with the duplicated types to speed up the exportability tests. In real models, there are many duplicated modules, and mostly have the same export issues. Test Plan: Existing CI Differential Revision: D61504630 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133930 Approved by: https://github.com/angelayi Co-authored-by: bearzx <bearzx@fb.com>	2024-08-20 18:20:49 +00:00
Avik Chaudhuri	b0bafd2be5	remove tensor weak ref from constraint target (#133890 ) Summary: `_ConstraintTarget` is an internal data structure that has some redundancy: tensors are identified by their id but also carry a weak reference. The weak reference was probably useful a year back but everything is done with ids right now, and the lifetime of these tensors ensures that using their ids is OK. Test Plan: existing tests Differential Revision: D61488816 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133890 Approved by: https://github.com/tugsbayasgalan	2024-08-20 03:03:05 +00:00
Justin Chu	271ee90851	[easy] Fix type annotation for `ExportedProgram.run_decompositions` (#133720 ) Fix the tuple type annotation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133720 Approved by: https://github.com/Skylion007	2024-08-16 22:11:42 +00:00
Pian Pawakapan	a75248528f	[export] refactor _process_dynamic_shapes (#133391 ) Sorryyyyy for another refactor. This splits `_process_dynamic_shapes` into 3 parts: 1. `_combine_args` - mostly the same thing 2. `_check_dynamic_shapes`, which is responsible for raising 99% of UserErrors if the dynamic shapes spec is invalid (minus 1 UserError with DerivedDims) 3. `_process_dynamic_shapes`, which for now, is the same thing, minus the stuff in 2. This refactor is helpful for incoming automatic dynamic shapes work, because, we're switching to `assume_static_by_default=False`, which is what `_dynamo.export` currently does. This means any unspecified dims are allocated a symbol, in contrast to export today which keeps unspecified dims static. Historically this has been desirable - export users don't want too much dynamism. So we want to change how the spec is translated into constraints. This means when we switch over to automatic dynamic shapes, we want to plug in something in between steps 2. and 3. which patches up the spec for `assume_static_by_default=False`, filling in static shapes for any unspecified dims, and potentially clearing out the auto-dynamic dims (since they're no-ops). We would do this in-between 2. and 3. to keep `_process_dynamic_shapes` semantically the same, since it's used with `_dynamo.export`. We could do this without a refactor, plugging in this transform before `_process_dynamic_shapes`, but since that function's responsible for both spec checking + constraint production, moving spec checking to before we transform the specs helps guarantee we're raising errors on what the user's specified, and not an internal export bug. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133391 Approved by: https://github.com/avikchaudhuri	2024-08-15 16:21:21 +00:00
Zhengxu Chen	f23dbefe52	[export] Support "custom" metadata field. (#131912 ) Summary: Add a special field in Graph and Node level metadata called "custom" which should be mapped to a json-serializable object, and we guarantee this field is always preserved across the following transformations: 1. copy/deepcopy 2. run_decompositions() 3. serialization 4. re-exporting Test Plan: :test_export -- -r custom_tag Reviewed By: angelayi Differential Revision: D60291839 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131912 Approved by: https://github.com/angelayi	2024-08-14 01:09:01 +00:00
Pian Pawakapan	40061bd61e	[export] overwrite placeholder names when deepcopying (#133269 ) In joint-graph export we have a `copy.deepcopy(ep.graph_module)` call. This turns out to be an imperfect deepcopy, because deepcopy allows objects to overwrite their `__deepcopy__` methods. For fx.Graph, this ends up deferring to `Graph.create_node()`, which checks the graph namespace, and can avoid copying the exact name in niche examples, like where the name is a Python keyword (e.g. `input` gets renamed to `input_1`). Names like `input` happen because export's placeholder naming pass overwrites what the namespace creates, based on the model's `forward()` signature. So we can either 1) avoid overwriting such cases, which requires rewriting the naming pass logic, or 2) force another overwrite after deepcopying. This goes with 2). Pull Request resolved: https://github.com/pytorch/pytorch/pull/133269 Approved by: https://github.com/zhxchen17, https://github.com/dvorjackz, https://github.com/ydwu4	2024-08-13 10:20:43 +00:00
Yidi Wu	c44cb89e06	[export] detach constant tensors when they're not registered as buffer or parameter in unlift (#133031 ) Summary: Fixes T198245910. In the previous diff D60532628, which caused the test failure, we fixed the inconsistency caused by constant tensors being accidentally registered as buffers by deleting the buffer and reassigning them as constants. However, this broke several existing tests in pyspeech: when the exported program is re-traced with torch.jit.trace (which is an anti-pattern we probably should have some alignment on), the jit tracer finds this constant tensor requiring grad and errors out. This PR forces constant attrs to not require grad, which is the correct behavior. A better fix is finding out where the constants are created in user code and why they require grad. But this has low ROI, so we warn the user about it. Test Plan: See failures in T198245910. Differential Revision: D60974869 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133031 Approved by: https://github.com/angelayi	2024-08-09 20:33:52 +00:00
Avik Chaudhuri	22ea248aa8	dynamic shapes mismatch errors (#132982 ) Summary: When PyTree detects a structural mismatch between inputs and dynamic shapes, the error messages are quite horrible. This PR fixes these error messages by adding, for each kind of error, the path to the point where the error happens and an actionable reason for the error. Test Plan: added test with several cases Differential Revision: D60956976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132982 Approved by: https://github.com/yushangdi	2024-08-09 02:22:32 +00:00
Shangdi Yu	3c5b246d3c	[export] Remove Proxy from exported programs and modules (#132956 ) Summary: Remove Proxy from exported programs and modules because they cannot be deepcopied or pickled. Test Plan: CI ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node ``` Differential Revision: D60940832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132956 Approved by: https://github.com/angelayi	2024-08-09 00:00:20 +00:00
Edward Z. Yang	1f66487c69	[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770 Approved by: https://github.com/bdhirsh	2024-08-08 23:07:23 +00:00
PyTorch MergeBot	d1f73fd844	Revert "[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 )" This reverts commit `902c6f3a19`. Reverted https://github.com/pytorch/pytorch/pull/132770 on behalf of https://github.com/ezyang due to Removed API was recommitted ([comment](https://github.com/pytorch/pytorch/pull/132770#issuecomment-2275749689))	2024-08-08 12:54:34 +00:00
Edward Z. Yang	902c6f3a19	[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770 Approved by: https://github.com/bdhirsh ghstack dependencies: #132674, #132675, #132421, #132062, #132767, #132769	2024-08-08 12:03:25 +00:00
angelayi	a270800f0b	[export][reland] Add print_readable to unflattened module (#132817 ) Reland https://github.com/pytorch/pytorch/pull/128617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132817 Approved by: https://github.com/pianpwk	2024-08-08 06:05:30 +00:00
Angela Yi	45d0e90bd3	[export] Allow str outputs (#132808 ) Summary: Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1478413606130179/ Test Plan: CI Differential Revision: D60850712 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132808 Approved by: https://github.com/ydwu4	2024-08-08 02:20:59 +00:00
Yidi Wu	bbf568aac8	Split of "[reland] [export] fix zero arg export in training_ir and constant tensor handling" (#132307 ) Summary: A re-land of D60006710. Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-traceability failures because run_decomposition does a retracing. edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure: a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a shortcut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsistency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state. The alternative to removing it would be: every time we call eliminate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the metadata, but I think this deserves a complete design. edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module(). Differential Revision: D60532628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307 Approved by: https://github.com/zhxchen17	2024-08-08 01:36:16 +00:00
angelayi	c327710a87	[export] Publicize validate function (#132777 ) as titled Pull Request resolved: https://github.com/pytorch/pytorch/pull/132777 Approved by: https://github.com/zhxchen17	2024-08-07 23:10:05 +00:00
Shangdi Yu	825002c9c6	[export][fx] More robust DCE pass (#132764 ) Summary: - make default DCE pass check schema, - need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added). - mark Proxy dump as NotImplemented for better error msg - Remove Proxy from tensors when dumping models, as Proxy cannot be dumped. More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing. Test Plan: CI ``` - buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r qat_conv2d - test_export.py - buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export - buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et - buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r dce - buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False - buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False - buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_fold_bn_erases_bn_node ``` Reviewed By: angelayi Differential Revision: D60319175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764 Approved by: https://github.com/angelayi	2024-08-06 22:27:22 +00:00
Tugsbayasgalan Manlaibaatar	775c310c0c	Preserve source_fn_stack in the training IR decomp (#132033 ) Title Differential Revision: [D60377712](https://our.internmc.facebook.com/intern/diff/D60377712/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132033 Approved by: https://github.com/angelayi ghstack dependencies: #131988, #131995, #131999	2024-08-06 19:45:40 +00:00
Shangdi Yu	4a2cf50edf	[export][reland] Convert autocast to HOO (#132677 ) Summary: Reland of D60206382. Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential followup improvement: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status. Test Plan: CI ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_autocast" buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_set_grad" ``` Verified that now we can export the llama model in gh issue 128394 and the gemma model in gh issue 131829 without error. Differential Revision: D60770038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132677 Approved by: https://github.com/angelayi	2024-08-05 22:34:52 +00:00
PyTorch MergeBot	a3ea96b762	Revert "[export] Convert autocast to HOO (#131914 )" This reverts commit `aec948adfc`. Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/davidberard98 due to PR shouldn't have been relanded by the bot, phabricator diff did not have any recent changes and is still internally reverted ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2269797388))	2024-08-05 19:52:09 +00:00
Shangdi Yu	aec948adfc	[export] Convert autocast to HOO (#131914 ) Summary: Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential followup improvement: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status. Test Plan: CI ``` parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export run_tests("test_predispatch_autocast") ``` Reviewed By: angelayi Differential Revision: D60206382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914 Approved by: https://github.com/angelayi	2024-08-05 18:52:12 +00:00
Yidi Wu	618e2c9de4	fix torch rec test failure (#132437 ) Summary: Fixes T192448049. The module calls form an unusual call stack for the nodes: https://www.internalfb.com/phabricator/paste/view/P1507230978. This is currently not supported by the unflattener and needs some extra design to make it work. Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_fpebc_non_strict_export" Reviewed By: zhxchen17 Differential Revision: D60528900 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132437 Approved by: https://github.com/Skylion007	2024-08-05 18:06:07 +00:00
Xuehai Pan	f3fce597e9	[BE][Easy][17/19] enforce style for empty lines in import segments in `torch/[a-c]/` and `torch/[e-n]/` (#129769 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769 Approved by: https://github.com/ezyang	2024-08-04 10:24:09 +00:00
PyTorch MergeBot	d984105748	Revert "[export] Convert autocast to HOO (#131914 )" This reverts commit `b28c01d90d`. Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/ezyang due to Failing lint, but was covered up by master failure on lint ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2267248773))	2024-08-04 02:10:35 +00:00
Shangdi Yu	b28c01d90d	[export] Convert autocast to HOO (#131914 ) Summary: Suggested in https://github.com/pytorch/pytorch/issues/128394. If there's an autocast context manager, the predispatch (strict) graph can look something like: ``` class <lambda>(torch.nn.Module): def forward(self, x: "f32[1]"): ... _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None) mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1); rand = rand_1 = None _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast); _enter_autocast = None return (mm_1,) ``` But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`. Some potential followup improvement: 1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py` 2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status. Test Plan: CI ``` parsh --build-flags fbcode//mode/dev-nosan fbcode//caffe2/test:test_export run_tests("test_predispatch_autocast") ``` Reviewed By: angelayi Differential Revision: D60206382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914 Approved by: https://github.com/angelayi	2024-08-03 05:48:57 +00:00
Avik Chaudhuri	ed4493de0e	dim name is identifier (#132557 ) Summary: Dim names appear in suggested fixes so should be valid Python identifiers. Test Plan: none Differential Revision: D60696854 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132557 Approved by: https://github.com/pianpwk	2024-08-03 05:28:50 +00:00
Shangdi Yu	a503136583	[export] Detect whether case_name is registered in exportdb (#132420 ) Summary: - moves logging functionalities into `torch/_export/db/logging.py` file. - add a check in `_dynamo/eval_frame.py` to check for optional input and error out with `UnsupportedError` - change the case name of `torch_sym_int` to `unsupported_operator` - Check if the case name is registered in exportdb, if so, we give a link to the case in exportdb. - TODO: add test Test Plan: CI Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging: ``` E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633, 1.2414, -0.1071], E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] [-0.1936, -0.9425, -0.0824]]) E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case. https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input ...... File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching raise Unsupported( torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet ``` It also logs a `export.error.classified` event in Scuba. Reviewed By: zhxchen17 Differential Revision: D60427208 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420 Approved by: https://github.com/zhxchen17	2024-08-03 01:08:48 +00:00
PyTorch MergeBot	3855ac5a5d	Revert "[export] Add print_readable to unflattener (#128617 )" This reverts commit `ab9791c0e3`. Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/angelayi due to never got landed internally due to weird flow... sorry ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2264224466))	2024-08-01 23:47:29 +00:00
Oguz Ulgen	72d2dba992	Add None return type to init (#132335 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335 Approved by: https://github.com/albanD	2024-08-01 15:26:45 +00:00
Tugsbayasgalan Manlaibaatar	928adb7cc2	Fix empty fake mode problem (#131995 ) Title Differential Revision: [D60348541](https://our.internmc.facebook.com/intern/diff/D60348541/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131995 Approved by: https://github.com/angelayi ghstack dependencies: #131988	2024-08-01 04:55:37 +00:00
Tugsbayasgalan Manlaibaatar	073430ebea	Don't check for autograd state when lowering to inference IR (#131988 ) When lowering to inference IR, we shouldn't error on autograd state changes because we will have preserved the autograd state change at the training level. I think the more correct way of implementing it would be to wrap autograd ops in HOP before decomposing, but that seems low ROI. Differential Revision: [D60346235](https://our.internmc.facebook.com/intern/diff/D60346235/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131988 Approved by: https://github.com/angelayi	2024-08-01 04:15:37 +00:00
angelayi	ab9791c0e3	[export] Add print_readable to unflattener (#128617 ) Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](`17b45e905a/torch/fx/graph_module.py (L824)`)), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module. Example print from `python test/export/test_unflatten.py -k test_unflatten_nested` ``` class UnflattenedModule(torch.nn.Module): def forward(self, x: "f32[2, 3]"): # No stacktrace found for following nodes rootparam: "f32[2, 3]" = self.rootparam # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam); x = rootparam = None # No stacktrace found for following nodes foo: "f32[2, 3]" = self.foo(mul); mul = None bar: "f32[2, 3]" = self.bar(foo); foo = None return (bar,) class foo(torch.nn.Module): def forward(self, mul: "f32[2, 3]"): # No stacktrace found for following nodes child1param: "f32[2, 3]" = self.child1param nested: "f32[2, 3]" = self.nested(mul); mul = None # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param); nested = child1param = None return add class nested(torch.nn.Module): def forward(self, mul: "f32[2, 3]"): # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul); mul = None return div class bar(torch.nn.Module): def forward(self, add: "f32[2, 3]"): # No stacktrace found for following nodes child2buffer: "f32[2, 3]" = self.child2buffer # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer); add = child2buffer = None return sub ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617 Approved by: https://github.com/zhxchen17, https://github.com/pianpwk	2024-07-30 00:41:44 +00:00
Avik Chaudhuri	c49e857d32	[pt] immutable accessors in graph signature (#131940 ) Summary: splitting PT part of D60253955 Test Plan: existing tests Differential Revision: D60296909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131940 Approved by: https://github.com/angelayi, https://github.com/zhxchen17	2024-07-27 05:32:53 +00:00
Yidi Wu	404a8ae8f6	[export] fix set_grad x tensor constant. (#131787 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/130379. The original error is that the verifier finds the placeholder nodes' meta["val"] missing in the subgraph of the WrapSetGradEnabled hop. In this PR, we fixed it by re-ordering the replace_set_grad_with_hop_pass with lift_constant_tensor pass because only after lift_constant_pass, all the constant attrs start to have meta["val"]. Test Plan: buck2 test test:test_export -- -r "test_setgrad_lifted_tensor" Differential Revision: D60244935 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131787 Approved by: https://github.com/yushangdi	2024-07-26 16:41:59 +00:00
Zhengxu Chen	7feaa73057	[export] Remove deprecated fields from ExportedProgram ctor. (#131697 ) Summary: as title. Test Plan: CI Reviewed By: SherlockNoMad Differential Revision: D60078426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131697 Approved by: https://github.com/ydwu4	2024-07-26 16:19:46 +00:00
PyTorch MergeBot	7339c8ab28	Revert "immutable accessors in graph signature (#131807 )" This reverts commit `6fd28fc228`. Reverted https://github.com/pytorch/pytorch/pull/131807 on behalf of https://github.com/atalman due to Broke CI: [GH job link](https://github.com/pytorch/pytorch/actions/runs/10111847569/job/27965364355) [HUD commit link](`608057afe2`) ([comment](https://github.com/pytorch/pytorch/pull/131807#issuecomment-2252875417))	2024-07-26 14:21:12 +00:00
Avik Chaudhuri	6fd28fc228	immutable accessors in graph signature (#131807 ) Test Plan: existing tests Differential Revision: D60253955 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131807 Approved by: https://github.com/ydwu4	2024-07-26 08:56:19 +00:00
Avik Chaudhuri	2bf649f5ae	suggested fix for data-dependent error (#125378 ) Suggests fixes for data-dependent errors in non-strict export. Any data-dependent error has an unresolved condition on unbacked symints. A mechanizable strategy for fixing such errors, which this PR enables, is to "bash" them using `torch._check()`s. For each error we suggest using `torch._check()` on the condition or its negation. The user selects and copy-pastes the suggested fix and continues. For example, here's an existing data-dependent error message with the suffix following `<snip>...</snip>` added by this PR: ``` Could not guard on data-dependent expression Eq(u2, u1) (unhinted: Eq(u2, u1)). (Size-like symbols: u1) <snip>...</snip> User code: File "test/export/test_export.py", line 1944, in forward return r.view(items[0], items[2]) Suggested fixes (please choose one of the following): 1. torch._check(items[2] == r.shape[1]) 2. torch._check(items[2] != r.shape[1])" ``` Tests in this PR illustrate this workflow, by taking common examples of data-dependent errors and bashing them until success, purely based on suggested fixes. In particular, we test this workflow on the "puzzlers" in https://www.internalfb.com/intern/anp/view/?id=5330476 (thanks @ezyang). In terms of implementation, we focus on non-strict mode, where we can intercept torch function calls to install a handler that walks up the stack from the error, finding the closest non-torch frame and inspecting its locals for symints appearing in the error. The suggested fixes then access these symints through the local variables so that they can be (a) easily understood by the user (b) directly added to the code. Implementing this idea in strict mode is follow-up work; we have already investigated what it would take, and decided to separate it out of this PR for reasons described next. It's not too hard to map symints to locals in Dynamo (although it needs to happen elsewhere, i.e., intercepting torch function calls won't work). However, unfortunately this doesn't seem to be enough; the graph modules created by Dynamo when going through AOTAutograd can raise further data-dependent errors in some cases, and thus we need yet another mechanism to map symints to locals for graph modules, via captured source-level metadata and FX node walking. This latter component will require some care to build properly, or we might conclude it is altogether unnecessary and fix Dynamo instead. Differential Revision: D56867432 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125378 Approved by: https://github.com/ezyang	2024-07-26 08:34:50 +00:00
Avik Chaudhuri	5b05ad9697	fix non-persistent buffers (#131756 ) Summary: Dynamo doesn't track whether buffers are `persistent`. This led to some ugly code where we would mark buffers as always persistent when creating signatures, then later check whether the buffers were not in the state dict to infer whether they were non-persistent, and use this to fix up the signature. This PR instead defines a utility to look up all the non-persistent buffers registered inside a module (this information is recorded in a private `_non_persistent_buffers_set` module attribute), and uses it to (a) correctly set the persistent flag on buffers when creating signatures (b) transfer this information to a Dynamo-traced graph module, which then causes non-persistent buffers to (correctly) not show up in the state dict. Test Plan: existing tests + new case with non-persistent buffer in nested module Differential Revision: D60224656 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131756 Approved by: https://github.com/zhxchen17, https://github.com/ydwu4	2024-07-26 04:45:30 +00:00
Yidi Wu	2c1851f04e	[export] fix output node's meta (#131706 ) Summary: This PR fixes all the places in the strict export stack where the output node's meta is not preserved correctly. However, we're getting a new error for the test we intend to fix: `buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle"`: The `get_attr` nodes have wrong metadata. I guess there are more things that need to be fixed to get it working, but that's beyond the scope of this PR. Test Plan: buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle" Differential Revision: D60198221 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131706 Approved by: https://github.com/yushangdi	2024-07-25 18:44:21 +00:00
Yidi Wu	75c4176b05	[export][BE] consolidate export and export_for_training (#131496 ) Summary: This PR consolidates the implementation of export and export_for_training to maximize code re-use. Also add some type annotations and comments in the code for better readability. Test Plan: Existing tests. Differential Revision: D60130515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131496 Approved by: https://github.com/avikchaudhuri, https://github.com/pianpwk	2024-07-25 16:35:16 +00:00
Shangdi Yu	6bc8db1d32	Rename is_training flag to have more information (#131618 ) Summary: rename is_training flag into dispatch_tracing_mode = “make_fx” or “aot_export” Test Plan: OSS CI Differential Revision: D60154327 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131618 Approved by: https://github.com/ydwu4	2024-07-25 16:29:55 +00:00
Shangdi Yu	29c9f8c782	[export] Fix `graph_break` log registration error when importing export/_trace.py (#131523 ) Summary: When importing `_trace.py`, putting `torch._dynamo.exc.Unsupported` in the global variable ``_ALLOW_LIST`` can cause the import of ``export/_trace.py`` to fail with the error: ValueError: Artifact name: 'graph_breaks' not registered, please call register_artifact('graph_breaks') in torch._logging.registrations. The error is raised directly on the line `graph_breaks_log = torch._logging.getArtifactLogger(__name__, "graph_breaks")` in `_dynamo/exc.py`. I've checked that ``register_artifact('graph_breaks')`` does already exist in torch._logging.registrations. Explicitly calling `import torch._logging` doesn't fix the issue. (see T196719676) We move ``_ALLOW_LIST`` to be a local variable. Test Plan: buck2 test 'fbcode//mode/opt' fbcode//aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test -- --exact 'aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test - test_serialized_model_for_disagg_acc (aiplatform.modelstore.publish.utils.tests.fc_transform_utils_test.PrepareSerializedModelTest)' buck2 test 'fbcode//mode/opt' fbcode//aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test -- --exact 'aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test - test_serialized_test_dsnn_module (aiplatform.modelstore.publish.utils.tests.fc_transform_utils_test.PrepareSerializedModelTest)' Differential Revision: D60136706 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131523 Approved by: https://github.com/zhxchen17	2024-07-24 22:40:24 +00:00
Avik Chaudhuri	83d19620f6	kill tmp _is_executorch flag (#131488 ) Test Plan: existing tests Differential Revision: D60126186 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131488 Approved by: https://github.com/ydwu4	2024-07-24 08:51:37 +00:00
Aaron Orenstein	5a0068cc69	[BE] mypy: disallow untyped decorators (#131428 ) Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations. Step 1 - Enable the error and override in all the offending files. #131429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428 Approved by: https://github.com/justinchuby, https://github.com/oulgen	2024-07-23 21:50:55 +00:00
Avik Chaudhuri	94f22eb6b2	refactor post-trace fakification in strict (#131421 ) Summary: Previously it was unclear what `_convert_input_to_fake` actually does (used in strict), and in particular how it is different from `make_fake_inputs` (used in non-strict). This PR splits that function to work purely on user inputs, then renames it to `extract_fake_inputs` and adds a comment clarifying what it does—namely, it extracts fake inputs from a given graph module instead of "converting inputs to fake inputs" (as suggested by the current name) or "making fake inputs" (as happens in non-strict, where no tracing has taken place yet). The remainder of that function used to also fakify params and buffers. It turns out that this part is identical to what happens in non-strict, hence we also pull `make_fake_inputs` out from `non_strict_utils` into `_trace`, merge it with another util, and make both modes call it. Test Plan: existing tests Differential Revision: D60084442 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131421 Approved by: https://github.com/zhxchen17	2024-07-23 18:23:03 +00:00
Shangdi Yu	cfb9ccab6c	[export] Filter errors by exception type, add case name (#131327 ) Summary: - Log export errors to Scuba and mark them with "classified" and "unclassified" - Classify errors by exception type (ALLOW_LIST) and a `case_name` attribute - Add `case_name` for some exceptions. Test Plan: Running the code below logs a classified error to `torch_export_usage` table in Scuba. ``` import torch from torch._export.db.case import SupportLevel class TorchSymMin(torch.nn.Module): """ torch.sym_min operator is not supported in export. """ def forward(self, x): return x.sum() + torch.sym_min(x.size(0), 100) example_args = (torch.randn(3, 2),) tags = {"torch.operator"} support_level = SupportLevel.NOT_SUPPORTED_YET model = TorchSymMin() torch.export.export(model, example_args) ``` Differential Revision: D59981459 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131327 Approved by: https://github.com/zhxchen17	2024-07-23 18:01:13 +00:00
Sherlock Huang	c1ef214046	Print ExportedProgram without color by default (#131399 ) Summary: Without plugin, colored ExportedProgram is not really readable. ![image](https://github.com/user-attachments/assets/319920a9-bb4b-4ad2-bcac-0c4f76973b11) Test Plan: CI Differential Revision: D60074481 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131399 Approved by: https://github.com/angelayi	2024-07-23 16:41:55 +00:00
Shangdi Yu	29e2e2afb6	Revert D59561509: Multisect successfully blamed "D59561509: [FX][export] DCE pass, check schema for node impurity (#130395 )" for one test failure (#131341 ) Summary: This diff reverts D59561509 D59561509: [FX][export] DCE pass, check schema for node impurity (#130395) by yushangdi causes the following test failure: Tests affected: - [cogwheel:cogwheel_mtia_cmf_m5_shrunk_test#test_flow_with_verification](https://www.internalfb.com/intern/test/844425041436985/) Here's the Multisect link: https://www.internalfb.com/multisect/6533402 Here are the tasks that are relevant to this breakage: T191383430: 10+ tests unhealthy for ads_mtia_inference The backout may land if someone accepts it. If this diff has been generated in error, you can Commandeer and Abandon it. Test Plan: NA Differential Revision: D60029318 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131341 Approved by: https://github.com/angelayi	2024-07-23 05:23:47 +00:00
Avik Chaudhuri	1e5ecc4277	move save/load from _export to export (#131353 ) Test Plan: existing tests Differential Revision: D60053905 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131353 Approved by: https://github.com/angelayi	2024-07-23 00:48:28 +00:00
angelayi	26f7dd286b	[export] Allow non-CIA ops to be preserved (#131075 ) I feel like the semantics of `run_decompositions(preserve_ops,...)` should be that we should always preserve whatever operator is put into `preserve_ops`, even if it's not CIA? Pull Request resolved: https://github.com/pytorch/pytorch/pull/131075 Approved by: https://github.com/bdhirsh	2024-07-23 00:41:48 +00:00
PyTorch MergeBot	b9912f31ef	Revert "[export] fix zero arg export in training_ir (#130990 )" This reverts commit `50436d5bdb`. Reverted https://github.com/pytorch/pytorch/pull/130990 on behalf of https://github.com/clee2000 due to failing some executorch and torchrec tests internally D60006710 ([comment](https://github.com/pytorch/pytorch/pull/130990#issuecomment-2243395316))	2024-07-22 16:49:25 +00:00
Yidi Wu	50436d5bdb	[export] fix zero arg export in training_ir (#130990 ) Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few retraceability failures because run_decomposition does a retracing. edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure: a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a shortcut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsistency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state. The alternative to removing it would be: every time we call eliminate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the metadata, but I think this deserves a complete design. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990 Approved by: https://github.com/pianpwk	2024-07-20 02:35:13 +00:00
Pian Pawakapan	745324e487	[export] turn on hybrid symints by default (#130775 ) Sets `prefer_deferred_runtime_asserts_over_guards=True` for export, so any guards emitted from `SymNode.expect_true` (for example, guards that are implicitly required to be true for an op to succeed) won't lead to constraint violations. Instead these should appear in the graph as runtime asserts, or potentially as replacement expressions for placeholder shapes. For example, this reshape op should emit s0 * s1 = s2, deferred as a runtime assert. ``` x = torch.randn(4, 8) # [s0, s1] y = torch.randn(32) # [s2] out = x.reshape(-1) + y # this emits Eq(s0 * s1, s2), and we represent y's shape as [s0s1] in the graph. ``` However, other complex guards can still cause export to fail, for instance guards emitted from `SymNode.guard_bool/guard_size_oblivious` (e.g. explicit if-else conditions in user code or lower-level op implementations hit during tracing) can still raise constraint violations. These can be deferred with `allow_complex_guards_as_runtime_asserts=True`. We don't yet make this default, because while this makes export more likely to succeed, it results in non-trivial asserts being emitted that often represent specialization to a variant of the op, or checks related to 0/1 specialization. We also remove forced specializations for export and kill the `_disable_forced_specializations` flag - now any guard we can't express with Dims/DerivedDims either are handled with Hybrid SymInts, or should be resolved with rewriting or deferring. Follow up: Currently, `ShapeEnv._set_replacement()` is called for complex equality expressions (e.g. s2 -> s0s1 in the example above), and the ExportedProgram stores `s0*s1` in the input placeholder. This isn't checked for validity when the program is run, so an option is to avoid replacement and/or runtime assert on equality. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130775 Approved by: https://github.com/avikchaudhuri	2024-07-18 17:40:58 +00:00
Zhengxu Chen	5484c86021	[export] Fully support extension op in serialization/deserialization. (#130851 ) Summary: Finishing up the mechanism to "register" certain types of operators to a registry so that the serializer can handle them correctly. This is expected to be firstly used by executorch. Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_export_with_extension_op_serialization Differential Revision: D59825148 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130851 Approved by: https://github.com/angelayi	2024-07-18 16:47:53 +00:00
Shangdi Yu	27ded03545	[FX][export] DCE pass, check schema for node impurity (#130395 ) Change the default DCE pass to check node schema for impure nodes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395 Approved by: https://github.com/angelayi, https://github.com/jgong5	2024-07-18 16:31:40 +00:00
PyTorch MergeBot	d6ae8bbf16	Revert "[export] Add print_readable to unflattener (#128617 )" This reverts commit `9fee87e4cd`. Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/clee2000 due to broke inductor/test_flex_attention https://github.com/pytorch/pytorch/actions/runs/9984688318/job/27595182606 `433ef4e444` Not run on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2236867975))	2024-07-18 15:31:51 +00:00
angelayi	6c2c8ee15b	[export] Remove preserved ops from decomp list (#130970 ) Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1466016147369925/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/130970 Approved by: https://github.com/bdhirsh	2024-07-18 05:15:22 +00:00
PyTorch MergeBot	433ef4e444	Revert "[FX][export] DCE pass, check schema for node impurity (#130395 )" This reverts commit `e22b0acc76`. Reverted https://github.com/pytorch/pytorch/pull/130395 on behalf of https://github.com/yushangdi due to breaking tests, need to rebase and fix ([comment](https://github.com/pytorch/pytorch/pull/130395#issuecomment-2235192986))	2024-07-18 02:46:03 +00:00
angelayi	9fee87e4cd	[export] Add print_readable to unflattener (#128617 ) Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](`17b45e905a/torch/fx/graph_module.py (L824)`)), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module. Example print from `python test/export/test_unflatten.py -k test_unflatten_nested` ``` class UnflattenedModule(torch.nn.Module): def forward(self, x: "f32[2, 3]"): # No stacktrace found for following nodes rootparam: "f32[2, 3]" = self.rootparam # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam); x = rootparam = None # No stacktrace found for following nodes foo: "f32[2, 3]" = self.foo(mul); mul = None bar: "f32[2, 3]" = self.bar(foo); foo = None return (bar,) class foo(torch.nn.Module): def forward(self, mul: "f32[2, 3]"): # No stacktrace found for following nodes child1param: "f32[2, 3]" = self.child1param nested: "f32[2, 3]" = self.nested(mul); mul = None # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param); nested = child1param = None return add class nested(torch.nn.Module): def forward(self, mul: "f32[2, 3]"): # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul); mul = None return div class bar(torch.nn.Module): def forward(self, add: "f32[2, 3]"): # No stacktrace found for following nodes child2buffer: "f32[2, 3]" = self.child2buffer # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer); add = child2buffer = None return sub ``` Pull Request resolved: 
https://github.com/pytorch/pytorch/pull/128617 Approved by: https://github.com/zhxchen17, https://github.com/pianpwk	2024-07-18 01:36:01 +00:00
Shangdi Yu	e22b0acc76	[FX][export] DCE pass, check schema for node impurity (#130395 ) Change the default DCE pass to check node schema for impure nodes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395 Approved by: https://github.com/angelayi, https://github.com/jgong5	2024-07-18 00:55:20 +00:00
Pian Pawakapan	d96c80649f	[export] constants & non-persistent buffers for training IR (#130864 ) Summary: Uses original ExportedProgram constants and graph signature to inform decompositions, so that constant tensors and non-persistent buffers are respected for training IR. Removes 7 test failures for training IR. Test Plan: test_export Differential Revision: D59820909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130864 Approved by: https://github.com/angelayi	2024-07-17 18:27:53 +00:00
Pian Pawakapan	e8998d68c8	[export] add non-strict training IR (#130062 ) Summary: Adds non-strict implementation of training IR export. Any expected non-strict training IR failures are also either existing strict training IR or non-strict failures (no new failures added). 4 strict training IR failures also resolved. Refraining from unifying export/export_for_training, per @ydwu4's feedback :) Test Plan: added test_export_training_ir_to_run_decomp_non_strict.py for non-strict training IR Differential Revision: D59349454 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130062 Approved by: https://github.com/ydwu4, https://github.com/zhxchen17	2024-07-16 17:08:00 +00:00
Aaron Orenstein	567482973d	typing fake_tensor.py (#128041 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128041 Approved by: https://github.com/eellison ghstack dependencies: #129182	2024-07-13 06:07:40 +00:00
Pian Pawakapan	988ed4d5db	[export] clean up allow_complex_guards_as_runtime_asserts flag (#130596 ) Summary: removes underscore, cleans up dead code in DimConstraints Test Plan: existing export tests Reviewed By: angelayi Differential Revision: D59612746 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130596 Approved by: https://github.com/angelayi	2024-07-12 17:17:11 +00:00
Shangdi Yu	ea4b80e6d6	[FX][export] strict DCE pass, check schema for node impurity (#130552 ) Fixes the failure in `test/export/test_export_training_ir_to_run_decomp.py ` caused by dead code elimination removing node with side effects. For background, in export, we may want to export higher-level IRs that are not functional, so we need to check for side effects more carefully. A call_function node is impure if it has at least one mutable argument. Fixed the tests below: test_to_module_with_mutated_buffer_multiple_update_sub_later test_export_input_mutation_static_shape test_buffer_util Another attempt modifying the original DCE pass is made in PR #130395, but it breaks some other tests, so here we add a flag and use it for export only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130552 Approved by: https://github.com/pianpwk	2024-07-12 15:43:27 +00:00
Pian Pawakapan	18b7633bfb	[export] fix kwargs in run_decompositions() for training IR (#130553 ) Re-exporting GraphModule expects all inputs to be in args, though not in pytree-flattened format. This avoids failing when we run with a fx.Interpreter subclass in [AOTAutograd tracing](`973037be6a/torch/_functorch/_aot_autograd/traced_function_transforms.py (L760-L762)`). Removes 7 test failures for training IR export. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130553 Approved by: https://github.com/zhxchen17, https://github.com/ydwu4	2024-07-11 22:53:18 +00:00
Zhengxu Chen	726a287271	[export] Expand verifier to be multiple on ExportedProgram (#130364 ) Summary: This diff updates the ExportedProgram class in PyTorch to allow for multiple verifiers to be attached to it. This is done by adding a new field to the ExportedProgram schema called "verifiers" which is a list of strings representing the names of the verifiers to be attached to the program. The verifiers are loaded using the "load_verifier" function which is defined in the "torch._export.serde.serialize" module. The "exported_program.dialect" field is also deprecated in favor of the "verifiers" field. Test Plan: CI Differential Revision: D59408546 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130364 Approved by: https://github.com/angelayi, https://github.com/ydwu4	2024-07-11 20:34:49 +00:00
Yidi Wu	cd9bae30de	Allow kwargs in _remove_effect_tokens_pass (#130491 ) Summary: Previously, the remove_effect_tokens pass didn't pass kwargs to the internal nodes. This PR fixes that and adds a test for it. Test Plan: buck2 run caffe2/test:test_export -- -r test_remove_effect_token_kwargs Reviewed By: angelayi Differential Revision: D59603147 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130491 Approved by: https://github.com/angelayi	2024-07-11 19:03:19 +00:00
Pian Pawakapan	1b3b4c2fb9	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) (#130380 ) original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train) Summary: This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2s0 w = z.repeat(y.shape[0]) # 2s0s1 _w = w.shape[0] # something with _w ... # turns into -> s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 * s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) # turns into torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Test Plan: contbuild & OSS CI, see `940e4477ab` Original Phabricator Test Plan: Imported from GitHub, without a `Test Plan:` line. Differential Revision: D59543603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380 Approved by: https://github.com/izaitsevfb	2024-07-10 19:23:37 +00:00
Shangdi Yu	c83b941141	[export] add dynamic shapes argument and infer from graph nodes (#129928 ) Fixes the example in #118304 for `torch._functorch.aot_autograd.aot_export_module` and `torch.export.export`. On a high level, the issue is caused by not detecting fake_mode when there's no input. Change plan: 1) we add a `dynamic_shapes: Union[bool, None] = None` arg to `aot_export_module` and `_aot_export_function`. 2) if the input is not a graph module, then we can only rely on this `dynamic_shapes` input arg. 3) If the input is a graph module, then we can traverse the graph and check. 4) So we check if the input mod is a graph module or just a module, and do 2) or 3) depending on the type. Fixes #129927 Bug source: dynamo's fake_mode is not detected correctly in `_convert_input_to_fake` in `_traced.py` when there's no input to the graph. So in `_strict_export_lower_to_aten_ir`, we create another fake_mode. `dynamo_fake_mode` is not the same as the fake_mode used by dynamo. Change plan: check `gm_torch_level` graph's node meta "example_value" for fake mode in addition. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129928 Approved by: https://github.com/angelayi	2024-07-10 15:51:05 +00:00
PyTorch MergeBot	9c9744c3ac	Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599 )" This reverts commit `940e4477ab`. Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))	2024-07-09 21:03:49 +00:00
Pian Pawakapan	940e4477ab	[runtime asserts] deduplicate runtime asserts & CSE (#128599 ) This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example: ``` z = torch.cat([x, x], dim=0) # 2s0 w = z.repeat(y.shape[0]) # 2s0s1 _w = w.shape[0] # something with _w ... # turns into -> s0 = x.shape[0] s1 = y.shape[0] _w0 = 2 * s0 _w = _w0 * s1 ``` Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example: ``` torch.sym_constrain_range_for_size(n, min=2, max=16) torch.sym_constrain_range(n, min=4, max=20) torch._check(n >= 0) torch._check(n >= 3) torch._check(n <= 14) # turns into torch.sym_constrain_range_for_size(n) torch._check(n >= 4) torch._check(n <= 14) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599 Approved by: https://github.com/ezyang	2024-07-07 20:10:14 +00:00
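
Note on the bound-check deduplication described in `940e4477ab` / `1b3b4c2fb9`: the second example in those messages amounts to keeping only the tightest lower and upper bound on a symbol. The sketch below illustrates that idea in plain Python. It is not the PyTorch implementation; `tighten_bounds` and the `("ge" | "le", value)` encoding are hypothetical names invented for this illustration, and the range calls from the commit message are modelled simply as a pair of `>=` / `<=` checks.

```python
def tighten_bounds(checks):
    """Collapse single-symbol bound checks into one (lower, upper) pair.

    Each check is a (kind, value) tuple: ("ge", v) means n >= v and
    ("le", v) means n <= v. This encoding is an assumption made for the
    sketch, not something taken from PyTorch.
    """
    lower = None
    upper = None
    for kind, value in checks:
        if kind == "ge":
            # Keep the largest lower bound seen so far.
            lower = value if lower is None else max(lower, value)
        elif kind == "le":
            # Keep the smallest upper bound seen so far.
            upper = value if upper is None else min(upper, value)
        else:
            raise ValueError(f"unknown check kind: {kind!r}")
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("bounds are contradictory")
    return lower, upper


# The checks from the commit message, written as (kind, value) pairs:
# range [2, 16], range [4, 20], n >= 0, n >= 3, n <= 14.
checks = [
    ("ge", 2), ("le", 16),
    ("ge", 4), ("le", 20),
    ("ge", 0),
    ("ge", 3),
    ("le", 14),
]
print(tighten_bounds(checks))  # (4, 14)
```

The result matches the two remaining checks in the commit message (`n >= 4` and `n <= 14`); everything else is implied by them and can be dropped.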

771 Commits