Commit Graph

76 Commits

Author SHA1 Message Date
Avik Chaudhuri
ed55d356de [alt] fix unroll in successive unflatten (#137646)
We use nn_module_stack in unflatten to recognize when module calls begin and end. However, the current format is not sufficient to detect module call boundaries when we have successive calls to the same module, because the successive instructions (end of one call, begin of next call) have the same nn_module_stack. This causes us to effectively "unroll" successive calls into a single call, which can cause problems when preserving module call signatures because the outputs of the successive calls might be concatenated in the single call.

Previously we introduced the concept of a "call index" to generate multiple graphs when unflattening, one per call. This PR pushes this concept into nn_module_stack itself. In particular, the keys of nn_module_stack now go from `key` to `key@call_index`. (In a previous attempt, https://github.com/pytorch/pytorch/pull/137457, instead values in nn_module_stack go from (fqn, type) to (fqn, type, call_index), which is BC-breaking.)
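For illustration, here is a hedged sketch of what the metadata on two successive calls to the same submodule might look like under the new keying scheme (the exact key and value strings are assumptions, not taken from the implementation):

```python
# Hypothetical nn_module_stack metadata for two successive calls to self.sub.
# The call index is appended to the key, so the two call regions no longer collide.
first_call_stack = {
    "L__self__": ("", "M"),
    "L__self___sub": ("sub", "N"),       # first call: plain key
}
second_call_stack = {
    "L__self__": ("", "M"),
    "L__self___sub@1": ("sub", "N"),     # second call: key carries the call index
}
```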

Note that we still do not have the ability to preserve module call signatures for multiple calls to the same module. But now, instead of randomly crashing, we give a proper error. On the other hand, when not preserving module call signatures, we simply generate multiple calls, each with its own graph (possibly deduplicated), matching what we would do for non-successive calls.

Test Plan: Like D64014936

Differential Revision: D64136277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137646
Approved by: https://github.com/angelayi
2024-10-12 15:53:52 +00:00
Tugsbayasgalan Manlaibaatar
02013da038 Lift restriction on training IR for unflatten (#137470)
Differential Revision: [D64025578](https://our.internmc.facebook.com/intern/diff/D64025578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137470
Approved by: https://github.com/avikchaudhuri
2024-10-08 22:30:24 +00:00
angelayi
1dc1b85714 [export] Move swap to a different file (#137134)
Refactor so that the unflattener doesn't become too messy.

Differential Revision: [D63719648](https://our.internmc.facebook.com/intern/diff/D63719648/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137134
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #136191, #137102
2024-10-06 04:28:18 +00:00
angelayi
fa9cd46d12 [export] Update swap's forward function (#137102)
Downstream APS code was failing to run the previously swapped module because of some fx.GraphModule forward function weirdness (P1594789677). To fix this, I attached a custom forward function that matches the unflattened module's forward function.

Differential Revision: [D63683422](https://our.internmc.facebook.com/intern/diff/D63683422/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137102
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #136191
2024-10-06 04:25:36 +00:00
angelayi
52d7704b32 [export] Add optimization passes (#136191)
Added an optimization pass to the swap function which removes extraneous pytrees. Currently it removes the pytree flatten/unflatten calls between modules in very specific scenarios (all the inputs of one module go into the other).

Future work could remove the input pytree.flatten call when the inputs go directly into an unflatten, and the output pytree.unflatten call when the outputs come directly from a pytree.flatten.
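As a conceptual illustration of why this removal is sound (this is not the pass itself): a pytree flatten immediately followed by an unflatten with the matching spec is an identity round-trip, so when all of one module's outputs feed directly into the next module, the intermediate flatten/unflatten pair adds nothing and can be pruned.

```python
import torch
import torch.utils._pytree as pytree

inputs = {"a": torch.ones(2), "b": (torch.zeros(2), torch.ones(2))}
flat, spec = pytree.tree_flatten(inputs)
roundtrip = pytree.tree_unflatten(flat, spec)

# The round-trip preserves structure and leaves, so a flatten+unflatten pair
# between two modules carries no information.
flat2, spec2 = pytree.tree_flatten(roundtrip)
print(spec2 == spec)                               # True: same tree structure
print(all(a is b for a, b in zip(flat, flat2)))    # True: same leaf tensors
```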

Differential Revision: [D62879820](https://our.internmc.facebook.com/intern/diff/D62879820)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136191
Approved by: https://github.com/avikchaudhuri
2024-10-06 04:22:42 +00:00
Avik Chaudhuri
17718209ea fix specialization bug in unflatten + preserve_module_call_signature (#137363)
Summary: In unflatten, when we generate calls to modules whose signatures have been preserved, we do not pass the original constant args. This can cause strange effects: e.g., if the module is swapped out with itself, we may suddenly go down a different path than the original, or even crash.
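A hedged sketch of the kind of module this affects (illustrative only; exact behavior depends on the PyTorch version): the submodule call takes a constant argument, and its call signature is preserved during export.

```python
import torch

class N(torch.nn.Module):
    def forward(self, x, flag):
        return x + 1 if flag else x + 2   # `flag` arrives as a constant at the call site

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.n = N()
    def forward(self, x):
        return self.n(x, True)

ep = torch.export.export(M(), (torch.randn(3),), preserve_module_call_signature=("n",))
unflat = torch.export.unflatten(ep)
# With the fix, the preserved call to `n` receives its original constant arg (True),
# so swapping `n` for an eager copy of itself follows the same branch as eager mode.
print(unflat(torch.randn(3)))
```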

Test Plan: added a test

Reviewed By: angelayi

Differential Revision: D63913750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137363
Approved by: https://github.com/angelayi
2024-10-05 04:26:02 +00:00
Tugsbayasgalan Manlaibaatar
d2d14d14e3 [RELAND] Fix unlift to preserve aliased constants (#137310)
Differential Revision: [D63864743](https://our.internmc.facebook.com/intern/diff/D63864743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137310
Approved by: https://github.com/avikchaudhuri
2024-10-04 18:15:52 +00:00
PyTorch MergeBot
525f6715bc Revert "Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162)"
This reverts commit f96020c246.

Reverted https://github.com/pytorch/pytorch/pull/137162 on behalf of https://github.com/jovianjaison due to Sorry for reverting your changes but many jobs are failing with NameError: name _recursive_getattr is not defined + a Lint job fails ([comment](https://github.com/pytorch/pytorch/pull/137162#issuecomment-2392036062))
2024-10-03 18:17:56 +00:00
Tugsbayasgalan Manlaibaatar
f96020c246 Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162)
When we populate the unlifted graph module, we only "unlift" constant tensor inputs, which is problematic because export de-duplicates aliasing constants. As a result, we register only one constant instead of two. This PR fixes that by querying the ep.constants table instead of ep.graph_signature.lifted_tensor_constants.

Differential Revision: [D63743111](https://our.internmc.facebook.com/intern/diff/D63743111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137162
Approved by: https://github.com/pianpwk
2024-10-03 17:28:53 +00:00
Avik Chaudhuri
cd5d1fe015 unflatten with specialized graphs per submodule call (#137013)
Previously we were making a fairly restrictive assumption when unflattening an exported program: for any submodule, we would assert that the graph of every call to that submodule must be the same. This assertion is load-bearing, i.e., if we simply remove the assertion then we can get incorrect results, as shown by the following example.

```
    class N(torch.nn.Module):
        def forward(self, x, b):
            if b:
                return x + 1
            else:
                return x + 2

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.n = N()

        def forward(self, x):
            x0 = x + 3
            x1 = self.n(x0, True)
            x2 = x1 + 4
            x3 = self.n(x2, False)
            return x3 + 5

    m = M()
    inp = (torch.ones(1),)
    print(m(*inp))  # tensor([16.])
    ep = torch.export.export(m, inp)
    print(ep.module()(*inp))  # tensor([16.])

    unflattened = torch.export.unflatten(ep)
    print(unflattened(*inp))  # tensor([15.])
```

However, this goes against the spirit of specializing graphs when exporting: we should *expect* that for every call to a submodule we *might* generate a different graph. The goal of this PR is to fix unflattening to handle multiple specialized graphs corresponding to multiple calls to the same submodule.

The idea is simple: for every call to a child module `foo`, we will create potentially different child modules `foo`, `foo@1`, `foo@2`, etc. and use those names as targets in `call_module` instructions in the parent graph. An immediate consequence of this is that the list of fqns in an unflattened module may not be the same as in the exported module. Note that all these variants share the same parameters / buffers, so that multiple calls to the same submodule can share state as expected.

However, as described so far this scheme may end up with needlessly many submodules. Thus, between calls to the same submodule, if graphs are equal then we optimize away the extra submodules and reuse call names as much as possible. Moreover, when submodules are shared across fqns, we also try to de-duplicate graphs corresponding to their calls as much as possible. Note that no matter what, information about which submodule was called is still preserved, so that if a submodule has to be swapped with another, one can still find all calls to the former submodule and replace them with calls to the latter.

A note on the choice of naming scheme for call names: instead of generating "sibling" modules `foo@1`, `foo@2`, etc. for `foo`, we had considered generating "children" modules `foo._1`, `foo._2`, etc. of `foo`. However this can cause spurious cycles when de-duplicating graphs. E.g., if `foo` is an alias for `bar._1` and `foo._1` is an alias for `bar`, then we must either introduce a cycle or drop the opportunity to optimize. Another idea would be to make `foo` a dummy module that contains `foo._0` corresponding to the first call, but this necessitates too many changes to existing tests and hurts the common case.
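Continuing the example above, a hedged sketch of the expected behavior after this change (the printed names are assumptions based on the naming scheme described here and may differ by version):

```python
unflattened = torch.export.unflatten(ep)
print(unflattened(*inp))  # expected to match eager: tensor([16.]), since each call keeps its own graph

# The sibling call targets described above ("n", "n@1", ...) show up among the
# unflattened module's children; exact naming may differ by version.
print([name for name, _ in unflattened.named_children()])
```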

Differential Revision: D63642479

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137013
Approved by: https://github.com/pianpwk
2024-10-03 00:55:44 +00:00
Ivan Zaitsev
b35f70da05 [ez] fixup the export of D62879819 (#136900)
a line from D62879819 (#136190) went missing somehow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136900
Approved by: https://github.com/atalman
2024-09-28 13:46:17 +00:00
angelayi
210b136c07 [export] Add experimental swap API (#136190)
Prototyped the following API, which takes in an ExportedProgram and a dictionary mapping FQNs to the modules to swap in, and returns an (unlifted) GraphModule:
```
_swap_modules(
    ep: ExportedProgram, modules_to_swap: Dict[str, torch.nn.Module]
) -> torch.fx.GraphModule:
```
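A hedged usage sketch (the import path for this experimental API, and the requirement to preserve the target's module call signature, are assumptions on my part):

```python
import torch
from torch.export._swap import _swap_modules  # assumed location of the experimental API

class Sub(torch.nn.Module):
    def forward(self, x):
        return x + 1

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = Sub()
    def forward(self, x):
        return self.sub(x) * 2

ep = torch.export.export(M(), (torch.randn(2),), preserve_module_call_signature=("sub",))
gm = _swap_modules(ep, {"sub": Sub()})  # returns an unlifted torch.fx.GraphModule
print(gm(torch.randn(2)))
```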

Differential Revision: [D62879819](https://our.internmc.facebook.com/intern/diff/D62879819)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136190
Approved by: https://github.com/avikchaudhuri
2024-09-24 22:50:44 +00:00
Huanyu He
a4e9a1c90b [TorchRec][PT2 IR][APF] short circuit the flatten/unflatten between EBC and KTRegroupAsDict modules (#136045)
Summary:
# context
* for the root cause and background please refer to this [post](https://fb.workplace.com/groups/1028545332188949/permalink/1042204770823005/)
* the basic idea of this diff is to **short-circuit the pytree flatten-unflatten function pairs** between two preserved modules, i.e., EBC/fpEBC and KTRegroupAsDict.
NOTE: There could be multiple EBCs and one single KTRegroupAsDict as shown in the [pic](https://fburl.com/gslide/lcyt8eh3) {F1864810545}
* short-circuiting the EBC-KTRegroupAsDict pairs is special-cased and a must in most cases due to the EBC key-order issue with distributed table lookup.
* all the operations are hidden behind a control flag `short_circuit_pytree_ebc_regroup` on the torchrec main API call `decapsulate_ir_modules`, which should only be visible to the infra layer, not to users.

# details
* The `_short_circuit_pytree_ebc_regroup` function finds all the EBC/fpEBC and KTRegroupAsDict modules in an unflattened module, retrieves their FQNs, and sorts them into in_fqns (regroup_fqns) and out_fqns (ebc_fqns). Because the fpEBC is currently swapped as a whole, we do some extra FQN logic to filter out EBCs that belong to an enclosing fpEBC.
* a util function `prune_pytree_flatten_unflatten` removes the incoming and outgoing pytree flatten/unflatten function calls in the graph module, based on the given FQNs.

WARNING: The flag `short_circuit_pytree_ebc_regroup` should be turned on if EBCs are used and EBC sharding is needed. Assertions fire if a `KTRegroupAsDict` module can't be found or `finalize_interpreter_modules` is not `True`.

# additional changes
* absorb the `finalize_interpreter_modules` process inside the torchrec main api `decapsulate_ir_modules`.
* set `graph.owning_module` in export.unflatten as required by the graph modification
* add one more layer of `sparse_module` for closely mimicking the APF model structure.

Test Plan:
# run test
* serializer
```
buck2 run fbcode//mode/opt fbcode//torchrec/ir/tests:test_serializer
```
* apf
```
buck2 run fbcode//mode/opt fbcode//aps_models/ads/gmp/tests/ne/e2e_deterministic_tests:gmp_e2e_ne_tests -- --filter-text 'test_mtml_instagram_model_562438350_single_gpu_with_ir'
```
* local mp run
```
==== Finished E2E deterministic test for mtml_instagram_model_gmp_474023725_non_kjt_unary ====
finished
  test_mtml_instagram_model_562438350_single_gpu_with_ir
Imports took: 6.0s! Profile with --import-profiler.            --_ |""---__
Executed 1 example in 203.1s:                               |'.|  ||  .    """|
  Successful: 1                                             | ||  || /|\""-.  |
  Failed: 0                                                 | ||  ||  |    |  |
  Skipped: 0                                                | ||  ||  |   \|/ |
  Not executed: 8                                           |."|  ||  --"" '__|
https://testslide.readthedocs.io/                              --" |__---"""
```

Differential Revision: D62606738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136045
Approved by: https://github.com/angelayi
2024-09-17 18:42:56 +00:00
Tugsbayasgalan Manlaibaatar
c18052da0e Add some minor doc improvement and ban using training IR for unflattener (#135549)
Title

Differential Revision: [D62412490](https://our.internmc.facebook.com/intern/diff/D62412490/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135549
Approved by: https://github.com/yushangdi
2024-09-10 06:48:42 +00:00
Zhengxu Chen
116fd474da [export] Expand coverage to more copied sym ops for unflattener. (#135119)
Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//torchrec/ir/tests:test_serializer -- --run-disabled

```
File changed: fbcode//caffe2/torch/export/unflatten.py
Buck UI: https://www.internalfb.com/buck2/2e0377e7-e2b6-4bd0-8133-a787245165a0
Test UI: https://www.internalfb.com/intern/testinfra/testrun/5066549824883887
Network: Up: 0B  Down: 0B
Jobs completed: 16. Time elapsed: 10.2s.
Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Differential Revision: D62190172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135119
Approved by: https://github.com/yushangdi
2024-09-05 21:58:20 +00:00
Angela Yi
9c38b00999 [export] Add ability to run eagerly on UnflattenedModule (#133996)
Summary:
Added the context manager `_disable_interpreter`, which is meant to be put around a call to `unflatten`. This will generate an UnflattenedModule and sub-InterpreterModules which will not use torch.fx.Interpreter to run eagerly. We want this to be a state of the module rather than a context manager around running the module, because it is not clear where the unflattened module will be called.
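A hedged usage sketch of the context manager described above (the import location is an assumption):

```python
import torch
from torch.export.unflatten import _disable_interpreter  # assumed import path

class M(torch.nn.Module):
    def forward(self, x):
        return x.sin()

ep = torch.export.export(M(), (torch.randn(3),))
with _disable_interpreter():
    unflat = torch.export.unflatten(ep)   # built so submodules run eagerly, without fx.Interpreter

# The eager-execution choice is baked into the module state, so later calls outside
# the context manager also avoid the interpreter.
print(unflat(torch.randn(3)))
```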

This seems to improve the performance: https://fb.workplace.com/groups/1075192433118967/posts/1473590629945810/?comment_id=1473621763276030

Test Plan: CI

Differential Revision: D60939034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133996
Approved by: https://github.com/pianpwk
2024-09-05 20:28:42 +00:00
Zhengxu Chen
a19a7524f6 [export] Make sure getitem replacement are synced with module call graph. (#134830)
Summary: When we are placing nodes in the graph, we should also replace the references in module_call_graph.

Test Plan:
buck2 run 'fbcode//mode/opt' torchrec/fb/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_vlea
buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_serialize_empty_value_vlea' --run-disabled

buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_deserialized_device_vle' --run-disabled

Differential Revision: D62014035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134830
Approved by: https://github.com/angelayi
2024-08-30 16:47:05 +00:00
Angela Yi
f5a2a22dc4 [export] Fix unflattener to respect nn.Parameter requires_grad (#134353)
Summary: Fixes P1539870235

Test Plan: CI

Differential Revision: D61726403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134353
Approved by: https://github.com/pianpwk
2024-08-23 22:49:34 +00:00
angelayi
a270800f0b [export][reland] Add print_readable to unflattened module (#132817)
Reland https://github.com/pytorch/pytorch/pull/128617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132817
Approved by: https://github.com/pianpwk
2024-08-08 06:05:30 +00:00
Yidi Wu
618e2c9de4 fix torch rec test failure (#132437)
Summary: Fixes T192448049. The module calls form an unusual call stack for the nodes: https://www.internalfb.com/phabricator/paste/view/P1507230978. This is currently not supported by the unflattener and needs some extra design to make it work.

Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_fpebc_non_strict_export"

Reviewed By: zhxchen17

Differential Revision: D60528900

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132437
Approved by: https://github.com/Skylion007
2024-08-05 18:06:07 +00:00
PyTorch MergeBot
3855ac5a5d Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit ab9791c0e3.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/angelayi due to never got landed internally due to weird flow... sorry ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2264224466))
2024-08-01 23:47:29 +00:00
angelayi
ab9791c0e3 [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-07-30 00:41:44 +00:00
PyTorch MergeBot
d6ae8bbf16 Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit 9fee87e4cd.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/clee2000 due to broke inductor/test_flex_attention https://github.com/pytorch/pytorch/actions/runs/9984688318/job/27595182606 433ef4e444 Not run on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2236867975))
2024-07-18 15:31:51 +00:00
angelayi
9fee87e4cd [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-07-18 01:36:01 +00:00
Pian Pawakapan
1b3b4c2fb9 [runtime asserts] deduplicate runtime asserts & CSE (#128599) (#130380)
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)

Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```
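A hedged end-to-end sketch of where such runtime asserts come from (illustrative; the exact graph contents depend on the PyTorch version):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        n = x.shape[0]
        torch._check(n >= 3)
        torch._check(n >= 0)   # subsumed by the check above; deduplication can drop it
        return x + n

dim = torch.export.Dim("n", min=3, max=16)
ep = torch.export.export(M(), (torch.randn(4),), dynamic_shapes={"x": {0: dim}})
# Surviving runtime asserts appear as _assert_scalar / sym_constrain_range calls in the graph.
print(ep.graph)
```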

Test Plan:
contbuild & OSS CI, see 940e4477ab

Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D59543603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
2024-07-10 19:23:37 +00:00
PyTorch MergeBot
9c9744c3ac Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 940e4477ab.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))
2024-07-09 21:03:49 +00:00
Pian Pawakapan
940e4477ab [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-07 20:10:14 +00:00
PyTorch MergeBot
963f430d13 Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 0267b2ddcb.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause a landrace and fails inductor/test_cudagraph_trees in trunk 0267b2ddcb ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2211690518))
2024-07-06 07:20:05 +00:00
Pian Pawakapan
0267b2ddcb [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-06 03:44:49 +00:00
Pian Pawakapan
8758fedbfc [export] copy sym ops when respecting call module signature (#129153)
Summary:
Export, through AOTAutograd, [deduplicates](11ff5345d2/torch/fx/experimental/proxy_tensor.py (L198)) sym_size calls, which can cause issues during unflattening when the sym_size node is used in multiple submodules.

If preserve_module_call_signature is set, these nodes can't be passed between submodules as placeholders, so the calls (and any downstream un-duplicated nodes) must be copied. This PR adds that copying to the unflattener.
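A hedged sketch of the scenario (names and exact export arguments are illustrative): the same symbolic size feeds two preserved submodules, so the sym_size computation must be copied into each.

```python
import torch

class Child(torch.nn.Module):
    def forward(self, x):
        return x + x.shape[0]   # uses a symbolic size of the input

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Child()
        self.b = Child()
    def forward(self, x):
        return self.b(self.a(x))

dim = torch.export.Dim("n", min=2)
ep = torch.export.export(
    M(), (torch.randn(4),),
    dynamic_shapes={"x": {0: dim}},
    preserve_module_call_signature=("a", "b"),
)
unflat = torch.export.unflatten(ep)   # sym_size compute is copied into each submodule as needed
print(unflat(torch.randn(5)))
```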

Test Plan: export unflatten test case

Reviewed By: TroyGarden, angelayi

Differential Revision: D58697231

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129153
Approved by: https://github.com/angelayi
2024-06-21 01:40:22 +00:00
PyTorch MergeBot
5efe71f134 Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit 5d9a609b4f.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/huydhn due to Sorry for reverting your change but another failed test shows up in trunk inductor/test_flex_attention.py where it needs to be updated 5d9a609b4f.  I guess it is easier to revert and reland this ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2169030779))
2024-06-15 01:46:23 +00:00
angelayi
5d9a609b4f [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-06-15 00:26:04 +00:00
angelayi
e9c6e8369c Torchbind call method + effects support (#128397)
Adds effect token support to torchbind method calls by allowing `with_effects` to take in `torch.ops._higher_order_ops.call_torchbind` as an input.

Here is the print from `TORCH_LOGS="aot" python test/export/test_torchbind.py -k test_compile_obj_torchbind_op`:
```python
def forward(self, arg0_1: "f32[0]", arg1_1: "f32[2]", arg2_1):
    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1266 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.cos())
    cos: "f32[2]" = torch.ops.aten.cos.default(arg1_1)
    with_effects = torch._higher_order_ops.effects.with_effects(arg0_1, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, cos);  arg0_1 = cos = None
    getitem: "f32[0]" = with_effects[0];  with_effects = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1267 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.cos() + 1)
    cos_1: "f32[2]" = torch.ops.aten.cos.default(arg1_1)
    add: "f32[2]" = torch.ops.aten.add.Tensor(cos_1, 1);  cos_1 = None
    with_effects_1 = torch._higher_order_ops.effects.with_effects(getitem, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, add);  getitem = add = None
    getitem_2: "f32[0]" = with_effects_1[0];  with_effects_1 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1268 in f, code: torch.ops._TorchScriptTesting.queue_pop(tq)
    with_effects_2 = torch._higher_order_ops.effects.with_effects(getitem_2, torch.ops._TorchScriptTesting.queue_pop.default, arg2_1);  getitem_2 = None
    getitem_4: "f32[0]" = with_effects_2[0];  with_effects_2 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1269 in f, code: torch.ops._TorchScriptTesting.queue_push(tq, x.sin())
    sin: "f32[2]" = torch.ops.aten.sin.default(arg1_1);  arg1_1 = None
    with_effects_3 = torch._higher_order_ops.effects.with_effects(getitem_4, torch.ops._TorchScriptTesting.queue_push.default, arg2_1, sin);  getitem_4 = sin = None
    getitem_6: "f32[0]" = with_effects_3[0];  with_effects_3 = None

    # File: /data/users/angelayi/pytorch2/test/export/test_torchbind.py:1270 in f, code: return tq.pop(), tq.pop() + tq.size(), tq
    with_effects_4 = torch._higher_order_ops.effects.with_effects(getitem_6, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'pop');  getitem_6 = None
    getitem_8: "f32[0]" = with_effects_4[0]
    getitem_9: "f32[2]" = with_effects_4[1];  with_effects_4 = None
    with_effects_5 = torch._higher_order_ops.effects.with_effects(getitem_8, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'pop');  getitem_8 = None
    getitem_10: "f32[0]" = with_effects_5[0]
    getitem_11: "f32[2]" = with_effects_5[1];  with_effects_5 = None
    with_effects_6 = torch._higher_order_ops.effects.with_effects(getitem_10, torch.ops._higher_order_ops.call_torchbind, arg2_1, 'size');  getitem_10 = arg2_1 = None
    getitem_12: "f32[0]" = with_effects_6[0];  with_effects_6 = None
    add_1: "f32[2]" = torch.ops.aten.add.Tensor(getitem_11, 0);  getitem_11 = None
    return (getitem_12, getitem_9, add_1)
```

In order to support this, this PR makes the following changes:
* Adds `FakeScriptObject` to `CustomObjArgument`, which will be put on the `meta["val"]` of nodes representing torchbind objects.
* Adds pickle/deepcopy support to FunctionSchema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128397
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-06-14 21:28:17 +00:00
angelayi
b79d056e76 [export] FIx unflattener for preserving modules containing unused inputs (#128260)
Currently the unflattener fails if the module whose call signature it is preserving contains unused inputs/outputs.

This also fixes unflattener issues in D57829276.
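A hedged sketch of the failing pattern (module names are illustrative): the preserved submodule has an input it never uses.

```python
import torch

class Sub(torch.nn.Module):
    def forward(self, x, unused):
        return x * 2            # `unused` does not contribute to the output

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = Sub()
    def forward(self, x, y):
        return self.sub(x, y) + 1

ep = torch.export.export(
    M(), (torch.randn(2), torch.randn(2)),
    preserve_module_call_signature=("sub",),
)
unflat = torch.export.unflatten(ep)   # keeps `sub`'s full (x, unused) signature
print(unflat(torch.randn(2), torch.randn(2)))
```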

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128260
Approved by: https://github.com/pianpwk
2024-06-11 22:32:08 +00:00
PyTorch MergeBot
db2fa7b827 Revert "[export] FIx unflattener for preserving modules containing unused inputs (#128260)"
This reverts commit 093a4ff5f8.

Reverted https://github.com/pytorch/pytorch/pull/128260 on behalf of https://github.com/angelayi due to breaking windows test ([comment](https://github.com/pytorch/pytorch/pull/128260#issuecomment-2159050726))
2024-06-10 18:42:33 +00:00
angelayi
093a4ff5f8 [export] FIx unflattener for preserving modules containing unused inputs (#128260)
Currently the unflattener fails if the module whose call signature it is preserving contains unused inputs/outputs.

This also fixes unflattener issues in D57829276.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128260
Approved by: https://github.com/pianpwk
2024-06-10 18:39:33 +00:00
Aaron Orenstein
038b927590 Flip default value for mypy disallow_untyped_defs [7/11] (#127844)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127844
Approved by: https://github.com/oulgen
ghstack dependencies: #127842, #127843
2024-06-08 18:49:45 +00:00
Ke Wen
6abca6a564 [export][unflatten] More strictly respect scope when removing inputs (#127607)
Code snippet from TorchTitan (LLaMa):
```
for layer in self.layers.values():
    h = layer(h, self.freqs_cis)
```
`self.freqs_cis` is a buffer of the root module (`self`).
It is also an explicit arg in the call signature of the original `layer` modules.
If scope is not respected (`freqs_cis`'s scope only corresponds to the root), `_sink_param` can remove `freqs_cis` from `layer`'s call signature, resulting in a runtime error.

There are two fixes in this PR:
1. We filter out the `inputs_to_state` corresponding to the current scope, using existing code that does prefix matching.
2. We delay the removal of param inputs from `call_module` nodes' `args` until the `_sink_param` call on that submodule returns. That call now returns information on which inputs were actually removed by the submodule, which is more accurate than just doing:
```
    for node in call_module_nodes:
        node.args = tuple(filter(lambda n: n.name not in inputs_to_state, node.args))
```

Before the PR:
![Screenshot 2024-05-31 at 1 40 24 AM](https://github.com/pytorch/pytorch/assets/6676466/a2e06b18-44d5-40ca-b242-0edab45075b7)

After the PR:
![Screenshot 2024-05-31 at 1 43 41 AM](https://github.com/pytorch/pytorch/assets/6676466/b72afb94-cdfa-420d-b88b-29a92bf2a0c0)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127607
Approved by: https://github.com/pianpwk
2024-06-04 06:43:54 +00:00
angelayi
da7bf1d588 [export] Fix unflatten with empty nn_module_stack (#126785)
Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1433418843962989/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126785
Approved by: https://github.com/tugsbayasgalan
2024-05-23 08:34:25 +00:00
Pian Pawakapan
e046c59e5b [export] handle aliased/unused params for unflattening (#125758)

Aliased and unused params are currently an issue for strict-mode export. For a model like this:
```
def __init__(self):
    # ...
    self.alpha = nn.Parameter(torch.randn(4))
    self.beta = self.alpha
    self.gamma = self.alpha
def forward(self, x):
    return x + self.beta
```
Dynamo will trace only 1 parameter (beta) and assign it a dynamo name (e.g. `L__self___beta`), which can be difficult to match to the correct FQN in the original eager module. This leads to the export graph signature potentially having the incorrect target FQN for the parameter, causing downstream issues when unflattening (the parameter may be assigned to the wrong target attribute, mismatching the relevant placeholder node in the unflattened module).

This handles aliasing issues by assigning all tensors present in the state dict as module attributes, even if they're unused. Still, only the used tensors will appear in the graph's forward pass.

Another existing issue is that weight sharing is not maintained in unflattening (all params/buffers are re-cloned); we handle this by checking tensor ids too.
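A minimal sketch of the id-based aliasing check described here (the helper name is hypothetical, not the actual implementation):

```python
from collections import defaultdict
import torch

def group_aliased_tensors(named_tensors):
    """Group FQNs that point at the same underlying tensor object."""
    groups = defaultdict(list)
    for fqn, tensor in named_tensors:
        groups[id(tensor)].append(fqn)
    return [fqns for fqns in groups.values() if len(fqns) > 1]

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.randn(4))
        self.beta = self.alpha           # aliased parameter, as in the example above
    def forward(self, x):
        return x + self.beta

print(group_aliased_tensors(M().named_parameters(remove_duplicate=False)))
# e.g. [['alpha', 'beta']]
```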
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125758
Approved by: https://github.com/zhxchen17
2024-05-14 23:00:46 +00:00
Pian Pawakapan
c9a258e474 [export] handle constant aliasing for export (#125509)
Summary: Currently export will [error out](2b5ae2611e/torch/export/_trace.py (L477)) if a constant is aliased. This PR supports aliased constants by modifying ConstantAttrMap to map each constant to a list of FQNs instead of a single FQN, populating the ExportedProgram constants dict with multiple entries for the same constant.
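A hedged sketch of the aliasing pattern this enables (attribute names are illustrative, and the exact keys in `ep.constants` are an assumption):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        c = torch.ones(3)
        self.c1 = c      # the same constant tensor registered under two attribute names
        self.c2 = c
    def forward(self, x):
        return x + self.c1 + self.c2

ep = torch.export.export(M(), (torch.randn(3),))
# With this change, the constants table is expected to carry an entry per FQN,
# even though both FQNs alias one tensor.
print(list(ep.constants.keys()))
```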

Test Plan: added test case in test_export.py

Differential Revision: D56955654

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125509
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2024-05-10 00:14:37 +00:00
ydwu4
0302dc68bf [Reland] Fakify script object inputs and attributes for non-strict ex… (#125490)
A re-land of #124239.

This PR fakifies ScriptObject inputs and attributes in export non-strict mode by default.

The basic idea is to only fakify the script object during tracing (i.e. aot_export). After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as a constant tensor.

Concretely, we

1. fakify all the script object inputs and module attributes (gathered by constant_attrs).
2. patch the module's attributes with the fakified script objects.
3. right after aot_export, remove the patching (to avoid changing the original module), then modify the exported graph module's attributes to the real script objects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125490
Approved by: https://github.com/angelayi
2024-05-04 02:39:42 +00:00
PyTorch MergeBot
f1f142c44f Revert "Fakify script object inputs and attributes for non-strict export (#124239)"
This reverts commit ecc2e034f7.

Reverted https://github.com/pytorch/pytorch/pull/124239 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/124239#issuecomment-2089305447))
2024-05-01 23:56:00 +00:00
angelayi
a216d87c6b [export] Fix for unflattening modules with duplicate tensors (#125192)
In the given test case, we have a ModuleList of 3 modules (`norm.0`, `norm.1`, `norm.2`) which share the same `weight` and `bias` tensors. However, when we trace, they all end up pointing to one state dict name (e.g. `norm.2`).
```
graph():
    %p_norms_0_weight : [num_users=0] = placeholder[target=p_norms_0_weight]
    %p_norms_0_bias : [num_users=0] = placeholder[target=p_norms_0_bias]
    %p_norms_1_weight : [num_users=0] = placeholder[target=p_norms_1_weight]
    %p_norms_1_bias : [num_users=0] = placeholder[target=p_norms_1_bias]
    %p_norms_2_weight : [num_users=3] = placeholder[target=p_norms_2_weight]
    %p_norms_2_bias : [num_users=3] = placeholder[target=p_norms_2_bias]
    %input_ : [num_users=1] = placeholder[target=input_]
    %native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%input_, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm, 0), kwargs = {})
    %native_layer_norm_1 : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%getitem, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem_3 : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm_1, 0), kwargs = {})
    %native_layer_norm_2 : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%getitem_3, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem_6 : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm_2, 0), kwargs = {})
    return (getitem_6,)
```
This causes an error in the unflattener: after constructing the submodule for `norm.0`, its graph still points to `norm.2.weight` and `norm.2.bias`:
```
graph():
    %p_norms_2_bias : [num_users=1] = placeholder[target=p_norms_2_bias]
    %p_norms_2_weight : [num_users=1] = placeholder[target=p_norms_2_weight]
    %input_ : [num_users=1] = placeholder[target=input_]
    %native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%input_, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
    %getitem : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm, 0), kwargs = {})
    return getitem
```
Since the attributes are not within the same scope as the graph (`norm.0` vs. `norm.2`), they will not be added to the subgraph, causing an error.

So this PR handles the duplicate state dict attributes by modifying the `inputs_to_state` dict to map from node names to a list of possible state dict target names.
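A hedged sketch of the shape of that mapping (contents are illustrative, based on the graphs above):

```python
# before: one placeholder name -> a single state-dict target
inputs_to_state_before = {"p_norms_2_weight": "norms.2.weight"}

# after: one placeholder name -> all candidate targets, so each submodule scope
# (norms.0, norms.1, norms.2) can pick the FQN that falls within it
inputs_to_state_after = {
    "p_norms_2_weight": ["norms.0.weight", "norms.1.weight", "norms.2.weight"],
}
```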
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125192
Approved by: https://github.com/zhxchen17
2024-05-01 19:12:50 +00:00
ydwu4
ecc2e034f7 Fakify script object inputs and attributes for non-strict export (#124239)
This PR fakifies ScriptObject inputs and attributes in export non-strict mode by default.

The basic idea is to `only fakify the script object during tracing (i.e. aot_export)`. After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as a constant tensor.

Concretely, we
1. fakify all the script object inputs, and module attributes (gathered by constant_attrs).
2. patch the module's attributes with fakified script object
3. right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124239
Approved by: https://github.com/zou3519
2024-04-30 15:57:25 +00:00
Aaron Gokaslan
2f3b0befed [BE]: Apply ruff FURB 118. (#124743)
Replaces various lambdas with operator.itemgetter, which is more efficient (as it's a builtin function). This is particularly useful when lambdas are used as 'key' functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124743
Approved by: https://github.com/albanD, https://github.com/malfet
2024-04-26 14:34:52 +00:00
Zhengxu Chen
d40774f4ed [export] Fix up nn_module_stack for nodes occured around tracepoint ops. (#124457)
Summary: as title.

Test Plan:
hg checkout D55901896
buck run mode/opt torchrec/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_ebc

Differential Revision: D56340319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124457
Approved by: https://github.com/tugsbayasgalan
2024-04-23 20:26:44 +00:00
Zhengxu Chen
e1062f5738 [export] Add a printer to unflattened module. (#124315)
Summary: add a helper method to print the graph at every level of an unflattened module.

Test Plan: {F1489609684}

Differential Revision: D56263195

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124315
Approved by: https://github.com/tugsbayasgalan
2024-04-18 16:35:51 +00:00
Zhengxu Chen
b1aca36f4c [export] Allow legacy IR to be unflattened with weaker submodule ordering. (#123192)
Summary: In some cases we don't have information from the old IR about submodule ordering; in these cases the unflattener should still work in best-effort mode.

Differential Revision: D55642005

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123192
Approved by: https://github.com/angelayi
2024-04-02 23:08:55 +00:00
Zhengxu Chen
0465a90b00 [export][reland] Fix unflattened submodule ordering. (#122341) (#122507)
Summary:

Make sure the order of submodules is the same as in the original eager module.

bypass-github-export-checks

Test Plan: buck test mode/opt caffe2/test:test_export -- -r test_unflatten_submodule_ordering

Differential Revision: D55251277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122507
Approved by: https://github.com/tugsbayasgalan
2024-03-25 15:22:01 +00:00