Commit Graph

103 Commits

Author SHA1 Message Date
Angela Yi
6e03f4f90e [export] Include metadata in FlatArgsAdapter (#146107)
Summary:
With https://github.com/pytorch/pytorch/pull/145956, which introduces
storing a list of namedtuple field names when serializing, we now want to
expose this list to the args adapater so that APS can utilize this information
and remove extraneous inputs.

Test Plan: No-op

Differential Revision: D68928416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146107
Approved by: https://github.com/pianpwk
2025-02-05 00:29:58 +00:00
angelayi
0c37c332da [export] Additionally save pytree namedtuple field names (#145956)
If a user passes in a namedtuple as an input, currently the input TreeSpec looks like: `TreeSpec(type=namedtuple, context=”class_fqn”, children_spec=[*, *])`

The user then saves the program containing this input TreeSpec. But what happens if they load it in a new environment where `class_fqn` now contains an additional field?

This means that the exported program is now expected to take in another input. But since those fields were not used in the original program, users should be able just drop those additional fields and the program will run successfully. This is needed/used in APS where they use unflattener's adapter to adapt the inputs based on the previously saved treespecs.

There are a couple of [solutions](https://docs.google.com/document/d/1V4ZSdy-8PUISWc8RqvGu3DU01BVegJhHHPWqa1Io7Eg/edit?tab=t.0) for how we can address this, but eventually we settled on saving a side table mapping namedtuple types to their list of field names, which can then be accessed by the adapter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145956
Approved by: https://github.com/zhxchen17
2025-02-04 04:42:30 +00:00
Avik Chaudhuri
cde5ddfd14 fix internal error with reorder submodules (#146181)
Test Plan: hard to isolate as small repro

Differential Revision: D68963033

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146181
Approved by: https://github.com/angelayi
2025-02-01 00:30:42 +00:00
Aaron Gokaslan
f3304571fc [BE][Ez]: FURB148 - remove useless enumerate calls (#145619)
Remove useless enumerate calls

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619
Approved by: https://github.com/drisspg
2025-01-24 23:37:15 +00:00
Aaron Orenstein
b6c5562c1f PEP585 update - torch/export (#145165)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145165
Approved by: https://github.com/bobrenjc93
2025-01-19 20:56:55 +00:00
Xuehai Pan
dcc3cf7066 [BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415)
The fixes are generated by:

```bash
ruff check --fix --preview --unsafe-fixes --select=E226 .
lintrunner -a --take "RUFF,PYFMT" --all-files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415
Approved by: https://github.com/huydhn, https://github.com/Skylion007
2025-01-08 21:55:00 +00:00
Avik Chaudhuri
bdeee82822 unflatten isinstance (#143664)
When we unflatten, the submodules we generate (`InterpreterModule` or `InterpreterModuleDispatcher`) are not related by type to the original submodules `N`. This makes `isinstance(mod, N)` checks fail. Since we do not have the original types after export, the best we can do is expose a `type_name()` method that carries the original type name, which we do carry in `nn_module_stack` entries.

Differential Revision: [D67526542](https://our.internmc.facebook.com/intern/diff/D67526542/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143664
Approved by: https://github.com/tugsbayasgalan
2024-12-21 01:07:10 +00:00
Avik Chaudhuri
e3886fb13c misc. fixes to unflatten (#142141)
Combining several fixes to unflatten for bugs revealed by random graph testing.

The fixes target two categories of bugs:
1. Some bugs show up as exponential blowups for largish system of nn modules. These are fixes by converting lists to sets, using caching, or otherwise rewriting to reuse computation more effiicently.
2. Other bugs were due to missing intermediate modules created when attributes such as submodules and buffers are accessed through longish paths before calling the corresponding intermediate modules, or missing attributes such as buffers and constants in submodules corresponding to multiple calls.

Differential Revision: [D66659795](https://our.internmc.facebook.com/intern/diff/D66659795/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142141
Approved by: https://github.com/ydwu4
2024-12-10 03:45:13 +00:00
Fabian Keller
5e8e1d725a Remove some unused type ignores (round 1) (#142325)
Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.

Having these `# type: ignore` linger around is not ideal for two reasons:

- They are verbose/ugly syntatically.
- They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore.

I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward.

This PR should have no effect on runtime at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
2024-12-09 18:23:46 +00:00
bhack
ae9cda0221 Add truediv support in export serializer (#136364)
Fixes #136113

- [x] Inital `truediv` coverage
- [ ] Expand/reduce coverage?
- [x] Add tests
- [x] Re-check docstrings
- [ ] Linting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364
Approved by: https://github.com/pianpwk

Co-authored-by: Angela Yi <angelayi@meta.com>
Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
2024-12-05 17:33:33 +00:00
Fabian Keller
394c339691 improve typings in unflatten (#141817)
A first follow-up to https://github.com/pytorch/pytorch/pull/115074 / https://github.com/pytorch/pytorch/pull/141240 following the strategy discussed there (https://github.com/pytorch/pytorch/pull/115074#issuecomment-2480992230).

This PR improves the type annotations around `unflatten.py` which had been inaccurate due to the previously suppressed type checking on `torch.nn.Module`.

CC @Skylion007 @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141817
Approved by: https://github.com/Skylion007
2024-11-30 22:12:15 +00:00
Ivan Zaitsev
09a3eddc07 Revert #141066 and #141494 (#141721)
manual revert due to merge conflicts

note: #141494 was reverted out of order blocking automatic revert of #141066

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141721
Approved by: https://github.com/avikchaudhuri
2024-11-28 20:18:19 +00:00
PyTorch MergeBot
8c90a9a030 Revert "fix non termination in unflatten + state (#141494)"
This reverts commit 5d7c3701e4.

Reverted https://github.com/pytorch/pytorch/pull/141494 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/141494#issuecomment-2504639230))
2024-11-27 19:30:55 +00:00
PyTorch MergeBot
6e61ff4fd3 Revert "Add truediv support in export serializer (#136364)"
This reverts commit 1df440dc4e.

Reverted https://github.com/pytorch/pytorch/pull/136364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its doc build failure is legit ([comment](https://github.com/pytorch/pytorch/pull/136364#issuecomment-2502620732))
2024-11-27 03:24:31 +00:00
bhack
1df440dc4e Add truediv support in export serializer (#136364)
Fixes #136113

- [x] Inital `truediv` coverage
- [ ] Expand/reduce coverage?
- [x] Add tests
- [x] Re-check docstrings
- [ ] Linting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364
Approved by: https://github.com/pianpwk

Co-authored-by: Angela Yi <angelayi@meta.com>
Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
2024-11-27 00:31:47 +00:00
Avik Chaudhuri
5d7c3701e4 fix non termination in unflatten + state (#141494)
With largish systems of nn modules with buffers, sinking params suffered from some kind of exponential blowup that is easily fixed by using a set instead of a list to keep track of unlifted buffer placeholders.

Test Plan: added random dag test that failed previously

Differential Revision: D66457661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141494
Approved by: https://github.com/angelayi
2024-11-26 00:17:56 +00:00
Angela Yi
f6eeab7ea8 [export] Make unflattened module compileable (#141249)
Test Plan: Fixes https://fb.workplace.com/groups/1028545332188949/permalink/1091988579177957/

Differential Revision: D66302806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141249
Approved by: https://github.com/avikchaudhuri
2024-11-23 18:46:01 +00:00
Avik Chaudhuri
8b4ae29b1b misc. fixes to unflatten (#141066)
Handling of nested modules in unflatten had several bugs, which were caught by trying to preserve module call signatures for nested modules.
* A module `k` encountered when calling `k.n()` before `k()` used to become an empty nn module. This caused some information to be dropped when `k()` was eventually called. Relatedly, we would also lose call counts for `k.n()` through different paths (say, when `k()` calls `n()`).
* Deleting call-indexed modules and patching up their call sites was broken for nested modules when creating dispatcher modules, because of silliness when handling their fqns.

An interesting aside is that we used random graph generation for testing some of these changes. A future PR will add the infra to create tests using these random graphs.

Differential Revision: D66192799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141066
Approved by: https://github.com/angelayi
2024-11-23 07:31:51 +00:00
Edward Z. Yang
612122af8f Fix type-safety of torch.nn.Module instances (#141240)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141240
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-11-22 00:05:05 +00:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
Angela Yi
2b39a8db77 Refactor UnflattenedModule's adapt flat args (#140840)
Test Plan: unblocks model launch

Differential Revision: D66014709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140840
Approved by: https://github.com/pianpwk
2024-11-16 05:09:37 +00:00
Avik Chaudhuri
7691064768 dispatcher module for multiple graphs (#139439)
Differential Revision: [D65307961](https://our.internmc.facebook.com/intern/diff/D65307961/)

This PR introduces the concept of a "dispatcher" module `n` that carries multiple interpreter modules `n`, `n@1`, `n@2`, etc., each corresponding to a particular call of `n` and thus might carry a different specialized graph. We only do this when we're preserving module call signatures for `n`. The carried modules have the same number and order of calls to `n` appearing in the original module / exported program. In the unflattened module, all those calls go to the "dispatcher" module which internally tracks how many calls have been made so far and invokes the corresponding interpreter module. We reset this tracking after a successful or unsuccessful run of the unflattened module.

Overall this makes swapping easier when module call signatures are preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139439
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #139438
2024-11-12 09:53:40 +00:00
Henry Tsang
350bc2a166 [export] Add support for symbool to make it usable for torch.cond (#138765)
# Why?

I want the following code to work.

minimal repro:
```
class M(torch.nn.Module):
    def forward(self, dilate_flag):
        return dilate_flag.item()

input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),)
model = M().cuda()

ep = torch.export.export(model, input1, strict=True)
path = torch._inductor.aot_compile(ep.module(), input1)
aot_model = torch._export.aot_load(path, device="cuda")
actual_output = aot_model(*input1)
```

error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program

second error will be handled by https://github.com/pytorch/pytorch/pull/138760

# Motivation

I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work.

```
class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.dilate_flag = 0

    def forward(self, dilate_flag):
        self.dilate_flag = dilate_flag.item()

        def true_fn(dilate_flag):
            return dilate_flag.clone()

        def false_fn(dilate_flag):
            return dilate_flag.clone()

        torch.cond(
            self.dilate_flag,
            true_fn,
            false_fn,
            (dilate_flag,),
        )
        return self.dilate_flag

input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),)
input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),)
inputs = (input1, input2)
model = M().cuda()

for input in inputs:
    expected_output = model(*input)

    ep = torch.export.export(model, input, strict=False)
    path = torch._inductor.aot_compile(ep.module(), input)
    aot_model = torch._export.aot_load(path, device="cuda")
    actual_output = aot_model(*input)

    assert (
        expected_output == actual_output
    ), f"henry they are not equal {expected_output} != {actual_output}"
```

Differential Revision: D64867504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765
Approved by: https://github.com/ydwu4
2024-11-04 23:31:49 +00:00
angelayi
f14f245747 [export] Remove custom forward func in swap (#139126)
Differential Revision: [D65100694](https://our.internmc.facebook.com/intern/diff/D65100694)

Remove the custom forward function and instead move the pytree flatten/unflatten ops into the graph. This allows us to natively run via the interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139126
Approved by: https://github.com/avikchaudhuri
2024-10-30 16:50:57 +00:00
Avik Chaudhuri
9e06b5b5cb fix unflatten with HOPs (#138978)
Summary:
Unflatten was broken for HOPs for a couple of reasons:
(1) we didn't expect `get_attr` nodes in the exported program, but they can occur to hold graph arguments to HOPs; such attributes must be moved from the exported program to the corresponding unflattened submodule containing the HOP call.
(2) we don't record metadata for graph arguments on serialization (there's nothing to hold it in our schema), and accordingly the `get_attr` nodes we create on deserialization don't have `nn_module_stack` metadata, which obviously wrecks unflatten.

Test Plan: added a couple of tests

Differential Revision: D65013647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138978
Approved by: https://github.com/zhxchen17
2024-10-28 19:30:56 +00:00
Avik Chaudhuri
1d98a526dd preserve signatures with multiple calls + buffer mutations (#138669)
As called out in https://github.com/pytorch/pytorch/pull/137999, preserving signatures of multiple calls when buffer mutations are present was NYI. The main problem was that intermediate values of buffers were not tracked, so couldn't be propagated statefully between multiple calls (i.e., they would need to be explicitly passed around, defeating the unlifting needed for preserving signatures).

This PR fixes this situation, by introducing module attributes that carry the necessary intermediate values of buffer mutations. In general, a buffer mutation can have several intermediate values it depends on recursively, even other buffers. So rather than tying an intermediate value with a particular buffer, we tie it with the submodules that create and read it. We install an attribute on all modules that create or read a particular intermediate value, sharing the same initial storage (i.e., initialized with the same empty tensor). For the module that creates this intermediate value, we copy the value into the corresponding attribute; and for the modules that read it, we read the corresponding attribute instead.

Another complication that needed to be addressed was that a `run_decompositions` following an `export_for_training` was not preserving module call graphs, which is needed for unflattening and, in particular, used when remapping inputs. Fortunately some existing metadata already tracks provenance of nodes, which we could use to update a module call graph after functionalization / decomposition.

Differential Revision: D64806175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138669
Approved by: https://github.com/tugsbayasgalan
2024-10-25 00:13:25 +00:00
Avik Chaudhuri
5d01126616 preserve module signature with multiple calls (#137999)
Previously we would error when trying to preserve the call signature for a module when it was called multiple times. This PR can now do this without erroring. The fix is to propagate call indices in a few more places.

Note that while this works in the presence of params, buffers, and tensor constants, preserving call signatures for multiple calls to a module when buffers are mutated is not supported yet. This is future work. The main problem is that we do not have enough metadata to `copy_` mutated buffers at the end of each call to a module, so the next call can read those buffers at the beginning. Making this work will likely need some explicit tracking of intermediate values of mutated buffers when collecting metadata during functionalization in export.

Note also that we stop short of creating a single graph out of multiple graphs: that is still future work. So the unflattened module will still have different targets `n`, `n@1`, `n@2`, etc. for each call when we ask the module call signature of `n` to be preserved. However it is way easier to swap all of these targets with a replacement that behaves similar to the original, because all of these calls will respect the original module call signature. (In particular, any constant inputs will be carried by the calls.)

Differential Revision: D64406945

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137999
Approved by: https://github.com/tugsbayasgalan
2024-10-18 07:30:22 +00:00
Avik Chaudhuri
ed55d356de [alt] fix unroll in successive unflatten (#137646)
We use nn_module_stack in unflatten to recognize when module calls begin and end. However the current format is not sufficient to detect module call boundaries when we have successive calls to the same module, because the successive instructions (end of one call, begin of next call) have the same nn_module_stack. This causes us to effectively "unroll" successive calls to a single call. This can cause problems when preserving module call signatures because the outputs of the successive calls might be concatenated in the single call.

Previously we introduced the concept of a "call index" to generate multiple graphs when unflattening, one per call. This PR pushes this concept into nn_module_stack itself. In particular, the keys of nn_module_stack now go from `key` to `key@call_index`. (In a previous attempt, https://github.com/pytorch/pytorch/pull/137457, instead values in nn_module_stack go from (fqn, type) to (fqn, type, call_index), which is BC-breaking.)

Note that we still do not have the ability to preserve module call signatures for multiple calls to the same module. But now instead of randomly crashing we give a proper error. OTOH when not preserving module call signatures we simply generate multiple calls, each with its own graph, possibly deduplicated, matching what we would do for non-successive calls.

Test Plan: Like D64014936

Differential Revision: D64136277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137646
Approved by: https://github.com/angelayi
2024-10-12 15:53:52 +00:00
Tugsbayasgalan Manlaibaatar
02013da038 Lift restriction on training IR for unflatten (#137470)
Differential Revision: [D64025578](https://our.internmc.facebook.com/intern/diff/D64025578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137470
Approved by: https://github.com/avikchaudhuri
2024-10-08 22:30:24 +00:00
angelayi
1dc1b85714 [export] Move swap to a different file (#137134)
Refactor so that unflattener doesn't become too messy

Differential Revision: [D63719648](https://our.internmc.facebook.com/intern/diff/D63719648/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137134
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #136191, #137102
2024-10-06 04:28:18 +00:00
angelayi
fa9cd46d12 [export] Update swap's forward function (#137102)
Downstream APS code was failing to run the previously swapped module because of some fx.GraphModule forward function weirdness (P1594789677). So to fix this, I just attached a custom forward function which matches the unflattened module's forward function.

Differential Revision: [D63683422](https://our.internmc.facebook.com/intern/diff/D63683422/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137102
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #136191
2024-10-06 04:25:36 +00:00
angelayi
52d7704b32 [export] Add optimization passes (#136191)
Added an optimization pass to the swap function which removes extraneous pytrees. Currently it removes the pytree flatten/unflatten calls between modules in very specific scenarios (all the inputs of one module go into the other).

Future work can be to remove the input pytree.flatten if the inputs go directly into an unflatten, and output pytree unflatten if the outputs are directly from a pytree.flatten.

Differential Revision: [D62879820](https://our.internmc.facebook.com/intern/diff/D62879820)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136191
Approved by: https://github.com/avikchaudhuri
2024-10-06 04:22:42 +00:00
Avik Chaudhuri
17718209ea fix specialization bug in unflatten + preserve_module_call_signature (#137363)
Summary: In unflatten, when we generate module calls when their signature has been preserved, we do not pass the original constant args. This can cause strange effects, e.g., if the module is swapped out with itself, we may suddenly go down a different path than the original, or even crash.

Test Plan: added a test

Reviewed By: angelayi

Differential Revision: D63913750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137363
Approved by: https://github.com/angelayi
2024-10-05 04:26:02 +00:00
Tugsbayasgalan Manlaibaatar
d2d14d14e3 [RELAND] Fix unlift to preserve aliased constants (#137310)
Differential Revision: [D63864743](https://our.internmc.facebook.com/intern/diff/D63864743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137310
Approved by: https://github.com/avikchaudhuri
2024-10-04 18:15:52 +00:00
PyTorch MergeBot
525f6715bc Revert "Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162)"
This reverts commit f96020c246.

Reverted https://github.com/pytorch/pytorch/pull/137162 on behalf of https://github.com/jovianjaison due to Sorry for reverting your changes but many jobs are failing with NameError: name _recursive_getattr is not defined + a Lint job fails ([comment](https://github.com/pytorch/pytorch/pull/137162#issuecomment-2392036062))
2024-10-03 18:17:56 +00:00
Tugsbayasgalan Manlaibaatar
f96020c246 Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162)
When we populate unlifted graph module, we actually only "unlift" constant tensor inputs which is problematic because export de-duplicates aliasing constants. As a result, we only register one constant instead of two constants. This PR fixes that by querying ep.constants table instead of ep.graph_signature.lifted_tensor_constants.

Differential Revision: [D63743111](https://our.internmc.facebook.com/intern/diff/D63743111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137162
Approved by: https://github.com/pianpwk
2024-10-03 17:28:53 +00:00
Avik Chaudhuri
cd5d1fe015 unflatten with specialized graphs per submodule call (#137013)
Previously we were making a fairly restrictive assumption when unflattening an exported program: for any submodule, we would assert that the graph of every call to that submodule must be the same. This assertion is load-bearing, i.e., if we simply remove the assertion then we can get incorrect results, as shown by the following example.

```
    class N(torch.nn.Module):
        def forward(self, x, b):
            if b:
                return x + 1
            else:
                return x + 2

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.n = N()

        def forward(self, x):
            x0 = x + 3
            x1 = self.n(x0, True)
            x2 = x1 + 4
            x3 = self.n(x2, False)
            return x3 + 5

    m = M()
    inp = (torch.ones(1),)
    print(m(*inp))  # tensor([16.])
    ep = torch.export.export(m, inp)
    print(ep.module()(*inp))  # tensor([16.])

    unflattened = torch.export.unflatten(ep)
    print(unflattened(*inp))  # tensor([15.])
```

However, this goes against the spirit of specializing graphs when exporting: we should *expect* that for every call to a submodule we *might* generate a different graph. The goal of this PR is to fix unflattening to handle multiple specialized graphs corresponding to multiple calls to the same submodule.

The idea is simple: for every call to a child module `foo`, we will create potentially different child modules `foo`, `foo@1`, `foo@2`, etc. and use those names as targets in `callmodule` instructions in the parent graph. An immediate consequence of this is that the list of fqns in an unflattened module may not be the same as an exported module. Note that all these variants share the same parameters / buffers, so that multiple calls to the same submodule can share state as expected.

However, as described so far this scheme may end up with needlessly too many submodules. Thus, between calls to the same submodule, if graphs are equal then we optimize away the extra submodules and reuse call names as much as possible. Moreover, when submodules are shared across fqns, we also try to de-duplicate graphs corresponding to their calls as much as possible. Note that no matter what, information about which submodule was called is still preserved, so that if a submodule has to be swapped with another, one can still find all calls to the former submodule and replace them with calls to the latter.

A note on the choice of naming scheme for call names: instead of generating "sibling" modules `foo@1`, `foo@2`, etc. for `foo`, we had considered generating "children" modules `foo._1`, `foo._2`, etc. of `foo`. However this can cause spurious cycles when de-duplicating graphs. E.g., suppose that `foo` is an alias for `bar._1` and `foo._1` is an alias for `bar`, then we must either introduce a cycle or drop the opportunity to optimize. Another idea would be to make `foo` a dummy module that contains `foo._0` corresponding to the first call, but this necessitates too many changes to existing tests and hurts the common case.

Differential Revision: D63642479

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137013
Approved by: https://github.com/pianpwk
2024-10-03 00:55:44 +00:00
Ivan Zaitsev
b35f70da05 [ez] fixup the export of D62879819 (#136900)
a line from D62879819 (#136190) went missing somehow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136900
Approved by: https://github.com/atalman
2024-09-28 13:46:17 +00:00
angelayi
210b136c07 [export] Add experimental swap API (#136190)
Prototyped the following API which takes in an ExportedProgram, a dictionary of fqn to modules to swap, and returns a (unlifted) GraphModule
```
_swap_modules(
    ep: ExportedProgram, modules_to_swap: Dict[str, torch.nn.Module]
) -> torch.fx.GraphModule:
```

Differential Revision: [D62879819](https://our.internmc.facebook.com/intern/diff/D62879819)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136190
Approved by: https://github.com/avikchaudhuri
2024-09-24 22:50:44 +00:00
Huanyu He
a4e9a1c90b [TorchRec][PT2 IR][APF] short circuit the flatten/unflatten between EBC and KTRegroupAsDict modules (#136045)
Summary:
# context
* for the root cause and background please refer to this [post](https://fb.workplace.com/groups/1028545332188949/permalink/1042204770823005/)
* basica idea of this diff is to **short circuit the pytree flatten-unflatten function pairs** between two preserved modules, i.e., EBC/fpEBC and KTRegroupAsDict.
NOTE: There could be multiple EBCs and one single KTRegroupAsDict as shown in the [pic](https://fburl.com/gslide/lcyt8eh3) {F1864810545}
* short-circuiting the EBC-KTRegroupAsDict pairs are very special and a must in most of the cases due to the EBC key-order issue with distributed table lookup.
* hide all the operations behind a control flag `short_circuit_pytree_ebc_regroup` to the torchrec main api call `decapsulate_ir_modules`, which should only be visible to the infra layer, not to the users.

# details
* The `_short_circuit_pytree_ebc_regroup` function finds all the EBCs/fpEBC and KTRegroupAsDict modules in an unflattened module.  Retrieve their fqns and sort to in_fqns (regroup_fqns) and out_fqns (ebc_fqns). Because currently the fpEBC is swapped as a whole, so we do some extra fqn logic to filter out the EBC that belongs to an up-level fpEBC.
* a util function `prune_pytree_flatten_unflatten` removes the in-coming and out-going pytree flatten/unflatten function calls in the graph module, based on the given fqns.

WARNING: The flag `short_circuit_pytree_ebc_regroup` should be turned on if EBCs are used and EBC sharding is needed. Assertions are also added if can't find a `KTRegroupAsDict` module, or `finalize_interpreter_modules` is not `True`.

# additional changes
* absorb the `finalize_interpreter_modules` process inside the torchrec main api `decapsulate_ir_modules`.
* set `graph.owning_module` in export.unflatten as required by the graph modification
* add one more layer of `sparse_module` for closely mimicing the APF model structure.

Test Plan:
# run test
* serializer
```
buck2 run fbcode//mode/opt fbcode//torchrec/ir/tests:test_serializer
```
* apf
```
buck2 run fbcode//mode/opt fbcode//aps_models/ads/gmp/tests/ne/e2e_deterministic_tests:gmp_e2e_ne_tests -- --filter-text 'test_mtml_instagram_model_562438350_single_gpu_with_ir'
```
* local mp run
```
==== Finished E2E deterministic test for mtml_instagram_model_gmp_474023725_non_kjt_unary ====
finished
  test_mtml_instagram_model_562438350_single_gpu_with_ir
Imports took: 6.0s! Profile with --import-profiler.            --_ |""---__
Executed 1 example in 203.1s:                               |'.|  ||  .    """|
  Successful: 1                                             | ||  || /|\""-.  |
  Failed: 0                                                 | ||  ||  |    |  |
  Skipped: 0                                                | ||  ||  |   \|/ |
  Not executed: 8                                           |."|  ||  --"" '__|
https://testslide.readthedocs.io/                              --" |__---"""
```

Differential Revision: D62606738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136045
Approved by: https://github.com/angelayi
2024-09-17 18:42:56 +00:00
Tugsbayasgalan Manlaibaatar
c18052da0e Add some minor doc improvement and ban using training IR for unflattener (#135549)
Title

Differential Revision: [D62412490](https://our.internmc.facebook.com/intern/diff/D62412490/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135549
Approved by: https://github.com/yushangdi
2024-09-10 06:48:42 +00:00
Zhengxu Chen
116fd474da [export] Expand coverage to more copied sym ops for unflattener. (#135119)
Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//torchrec/ir/tests:test_serializer -- --run-disabled

```
File changed: fbcode//caffe2/torch/export/unflatten.py
Buck UI: https://www.internalfb.com/buck2/2e0377e7-e2b6-4bd0-8133-a787245165a0
Test UI: https://www.internalfb.com/intern/testinfra/testrun/5066549824883887
Network: Up: 0B  Down: 0B
Jobs completed: 16. Time elapsed: 10.2s.
Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Differential Revision: D62190172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135119
Approved by: https://github.com/yushangdi
2024-09-05 21:58:20 +00:00
Angela Yi
9c38b00999 [export] Add ability to run eagerly on UnflattenedModule (#133996)
Summary:
Added the contextmanager, `_disable_interpreter`, which is meant to put around a call to `unflatten`. This will generate an UnflattendModule and sub-InterpreterModules which will not use torch.fx.Interpreter to run eagerly. We want to have this as a state of the module instead of a contextmanager around running the module because it's not clear where we are calling the unflattened module.

This seems to improve the performance: https://fb.workplace.com/groups/1075192433118967/posts/1473590629945810/?comment_id=1473621763276030

Test Plan: CI

Differential Revision: D60939034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133996
Approved by: https://github.com/pianpwk
2024-09-05 20:28:42 +00:00
Zhengxu Chen
a19a7524f6 [export] Make sure getitem replacement are synced with module call graph. (#134830)
Summary: When we are placing nodes in the graph, we should also replace the references in module_call_graph.

Test Plan:
buck2 run 'fbcode//mode/opt' torchrec/fb/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_vlea
buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_serialize_empty_value_vlea' --run-disabled

buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_deserialized_device_vle' --run-disabled

Differential Revision: D62014035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134830
Approved by: https://github.com/angelayi
2024-08-30 16:47:05 +00:00
Angela Yi
f5a2a22dc4 [export] Fix unflattener to respect nn.Parameter requires_grad (#134353)
Summary: Fixes P1539870235

Test Plan: CI

Differential Revision: D61726403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134353
Approved by: https://github.com/pianpwk
2024-08-23 22:49:34 +00:00
angelayi
a270800f0b [export][reland] Add print_readable to unflattened module (#132817)
Reland https://github.com/pytorch/pytorch/pull/128617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132817
Approved by: https://github.com/pianpwk
2024-08-08 06:05:30 +00:00
Yidi Wu
618e2c9de4 fix torch rec test failure (#132437)
Summary: Fixes T192448049. The module call form an unusal call stack for the nodes: https://www.internalfb.com/phabricator/paste/view/P1507230978. This is currently not supported by unflattener and need some extra design to make it work.

Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_fpebc_non_strict_export"

Reviewed By: zhxchen17

Differential Revision: D60528900

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132437
Approved by: https://github.com/Skylion007
2024-08-05 18:06:07 +00:00
PyTorch MergeBot
3855ac5a5d Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit ab9791c0e3.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/angelayi due to never got landed internally due to weird flow... sorry ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2264224466))
2024-08-01 23:47:29 +00:00
angelayi
ab9791c0e3 [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-07-30 00:41:44 +00:00
PyTorch MergeBot
d6ae8bbf16 Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit 9fee87e4cd.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/clee2000 due to broke inductor/test_flex_attention https://github.com/pytorch/pytorch/actions/runs/9984688318/job/27595182606 433ef4e444 Not run on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2236867975))
2024-07-18 15:31:51 +00:00