Commit Graph

771 Commits

Author SHA1 Message Date
Shangdi Yu
46dd226702 Fakify torchbind objects in compile_fx and add tests for SigridTransformsInstanceTorchBind (#149529)
Summary:
We need to properly fakify torchbind objects, including the ones in graph module attributes, so the resgitered fake implementation works properly.

- _fakify_script_objects in `compile_fx`
- Allow fake torchbind objects in `torchbind_constants`

Remove `node.meta["unbacked_bindings"]` for `aot_compile` in `compile_fx`. Otherwise `ShapeProp` will fail when trying to resolve the `unbacked_bindings` of `with_effect` tokens.

Update `sigrid_transforms_test` to use the latest `torch._inductor.aot_compile` API.

Add a test for `Fakify torchbind objects in compile_fx and add tests for SigridTransformsInstanceTorchBind` in `e2e_test`.

Test Plan:
```
buck run //caffe2/torch/fb/sparsenn:sigrid_test -- -r test_transform_torch_bind

buck run //sigmoid/inference/test:e2e_test_cpu -- -r SigridTransforms

buck2 run mode/dev-nosan sigmoid/inference/ts_migration:pt2i_readiness_main -- --model_id 545017754 --test_suite ads_all --mode test_preproc

```

Differential Revision: D70013257

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149529
Approved by: https://github.com/angelayi
2025-03-21 18:58:28 +00:00
angelayi
bf34e228c5 [export] Beef up guard_added logs (#149465)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149465
Approved by: https://github.com/pianpwk
2025-03-20 23:02:07 +00:00
Avik Chaudhuri
6237495fcf torch.Size input (#149414)
Summary: Support for `torch.Size` inputs was patchy before because `unflatten_fn` for this type returned a tuple. This PR cleans this up.

Fixes #149158

Test Plan: added test

Differential Revision: D71403635

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149414
Approved by: https://github.com/yushangdi
2025-03-20 16:23:13 +00:00
Tugsbayasgalan Manlaibaatar
3b7bd6c63d Fix dynamic shapes repordering bug (#149528)
WHen we create constraints, we look at the ordering of kwargs according to model signature. But when we trace, we use the ordering that is created based on how user passes in their kwargs. As a result, constraints and dynamic shapes end up having a different order causing issues when they have different dynamic tensor specs.

Differential Revision: [D71478578](https://our.internmc.facebook.com/intern/diff/D71478578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149528
Approved by: https://github.com/ydwu4
2025-03-20 01:57:44 +00:00
Yanan Cao (PyTorch)
fae79e91a0 Remove torch.export.export_for_inference (#149078)
Summary: Remove torch.export.export_for_inference, it is redundant and can always be replaced with torch.export.export_for_training() + run_decompositions()

Test Plan: unit tests

Differential Revision: D71069057

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149078
Approved by: https://github.com/tugsbayasgalan
2025-03-19 19:57:18 +00:00
Pian Pawakapan
96828a2155 [export] refactor DimHints for type errors (#149424)
Differential Revision: D71414367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149424
Approved by: https://github.com/justinchuby, https://github.com/avikchaudhuri
2025-03-19 18:51:07 +00:00
Avik Chaudhuri
20874a1f46 debug ival swap (#149206)
Summary:
Recall that we use "ivals" to track intermediate values of mutations during unflattening. Previously, for each such intermediate value, we would create a hidden shared attribute that would be updated / read by respective submodules.

Unfortunately this scheme doesn't work when some but not all of those submodules are swapped out. This is because the swapped in submodules have no knowledge of these hidden attributes. Thus the submodules that are not swapped out end up reading / updating dangling state.

This PR does away with these hidden attributes. Instead, we directly read the underlying buffer or placeholder that was updated, and update those underlying buffers and placeholders in place. This makes the graphs look much closer to their eager origins.

Test Plan: added some tests, ensured existing tests pass

Differential Revision: D71203469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149206
Approved by: https://github.com/tugsbayasgalan
2025-03-19 03:43:30 +00:00
angelayi
01a57981aa [export] Add TracingContext (#149294)
TracingContext is added to all tracing locations -- in torch.export this is where we call make_fx (for training IR) and aot_export_module (for inference IR), and in run_decompositions where we call aot_export_module

Differential Revision: [D71298927](https://our.internmc.facebook.com/intern/diff/D71298927)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149294
Approved by: https://github.com/ydwu4
2025-03-19 03:11:08 +00:00
angelayi
3b48c72141 [export] Minor refactor to trace.py (#149240)
Minor refactor to trace.py
* Removed `_strict_export_lower_to_aten_ir` in favor of just `_strict_export` and `_non_strict_export`
* Matched the APIs of `_strict_export` and `_non_strict_export`
    * Instead of a `lower_to_aten_callback` which is a callable, or `dispatch_tracing_mode`, both functions take in a `_to_aten_func` which can be either `_export_to_aten_ir_make_fx` or `_export_to_aten_ir`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149240
Approved by: https://github.com/pianpwk
2025-03-18 21:40:30 +00:00
Yanan Cao (PyTorch)
a16ada41b9 Fix outdated docstring of torch.export.export regarding strict flag (#149077)
Summary: Fix outdated docstring of torch.export.export regarding strict flag

Test Plan: None, doc only change

Differential Revision: D71068215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149077
Approved by: https://github.com/zhxchen17
2025-03-17 22:29:20 +00:00
Yanan Cao (PyTorch)
ab45aaca97 Set non-strict export as default mode (#148790)
Summary:
- Flip the default value of strict argument in torch.export.export from True to False
- Update test infra to cope with the change, some of them made the assumption of strict mode as default
- Disabled some tests that fail in non-strict mode

Test Plan: Sandcastle

Differential Revision: D70228628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148790
Approved by: https://github.com/angelayi
2025-03-12 21:10:58 +00:00
Aditya Tiwari
bb9c426024 Typo Errors fixed in multiple files (#148262)
# Fix typo errors across PyTorch codebase

This PR fixes various spelling errors throughout the PyTorch codebase to improve documentation quality and code readability.

## Changes Made

### Documentation Fixes
- Changed "seperate" to "separate" in multiple files:
  - `setup.py`: Build system documentation
  - `torch/_library/triton.py`: AOT compilation comments
  - `torch/csrc/dynamo/compiled_autograd.h`: Node compilation documentation
  - `torch/export/_unlift.py`: Pass population comments
  - `torch/export/exported_program.py`: Decomposition table notes

### Code Comments and Error Messages
- Changed "occured" to "occurred" in:
  - `test/mobile/test_lite_script_module.py`: Exception handling comments
  - `torch/export/_draft_export.py`: Error message text
  - `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp`: MAGMA bug comment
  - `torch/csrc/utils/python_numbers.h`: Overflow handling comment
  - `torch/csrc/jit/OVERVIEW.md`: Graph compilation documentation
  - `torch/_dynamo/symbolic_convert.py`: Error explanation

### API Documentation
- Changed "fullfill" to "fulfill" in `torch/distributed/checkpoint/state_dict_loader.py`
- Changed "accross" to "across" in:
  - `torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp`
  - `torch/distributed/distributed_c10d.py`

## Motivation
These changes improve code readability and maintain consistent spelling throughout the codebase. No functional changes were made; this is purely a documentation and comment improvement PR.

## Test Plan
No testing required as these changes only affect comments and documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148262
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-09 12:21:40 +00:00
Pian Pawakapan
c677f3251f [export] don't use unbacked_renamings in export (#147574)
Plan: avoid the use of unbacked renamings, and introduce a pass run in `_produce_aten_artifact` that recomputes unbacked bindings. Decided to do this because in we don't serialize unbacked renamings (or any ShapeEnv state), so this used to compose poorly with de/serialization. This hopefully establishes the invariant that the unbacked binding keys are always in sync with the example values (i.e. same indices, and removed if the symbol is replaced / specialized).

For de/serialization, we don't stored unbacked bindings, and just rerun the pass.

Involved a refactor of compute_unbacked_bindings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147574
Approved by: https://github.com/avikchaudhuri
2025-03-04 21:43:49 +00:00
Angela Yi
60205b0eb2 [export] Fix logging so that it doesn't result in max recursion error (#148231)
Test Plan:
buck2 run mode/dev-nosan sigmoid/inference/ts_migration:pt2i_readiness_main -- --model_id=487493491 --test_suite ads_all --mode test_full_model

Produces https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmp2wsjQH/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=100

Differential Revision: D70416613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148231
Approved by: https://github.com/yiming0416
2025-03-04 20:47:25 +00:00
Angela Yi
6e0b09728a [export] Remove report from draft-export output (#147558)
Summary: This matches the export API. To print the report, people can just do `print(ep._report)`. This information is also displayed in the terminal after the draft_export call.

Test Plan: CI

Reviewed By: SherlockNoMad

Differential Revision: D69689154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147558
Approved by: https://github.com/pianpwk
2025-02-22 00:54:29 +00:00
Avik Chaudhuri
698f6f9fae specify only some dimensions in shapes collection (#147534)
Differential Revision: [D69936316](https://our.internmc.facebook.com/intern/diff/D69936316/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147534
Approved by: https://github.com/bobrenjc93
2025-02-21 22:02:42 +00:00
Zhengxu Chen
fdb1305ace reland "[sigmoid] Test OSS model runner with test_export.py" (#147535)
Summary: There are ~260 tests for all the corner cases of export from test_export.py. utitlizing to test sigmoid in the OSS setting.

Test Plan: buck test mode/opt caffe2/test:test_export -- -r _sigmoid

Differential Revision: D69937387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147535
Approved by: https://github.com/yiming0416
2025-02-20 23:45:13 +00:00
Aaron Orenstein
db4ce78d46 PEP585: More UP006 fixes (#146392)
This should be the final PR before we can enable RUFF UP006.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392
Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007
2025-02-20 06:18:13 +00:00
Gregory Comer
f63db6255f Re-land exclude upsample_bilinear2d.vec and nearest2d.vec from default export decomposition table (#147153)
Note: This is a re-land of https://github.com/pytorch/pytorch/pull/141791, which I reverted due to breaking some Meta-internal tests - an internal ET delegate did not handle the non-decomposed upsample_nearest2d, and it was not caught in CI. I've resolved that issue and should be ready to safely re-land.

Summary:
As upsample_bilinear2d.vec and upsample_nearest2d.vec are core ATen ops, they should not be decomposed by default in the export path. Because the operators have CompositeImplicitAutograd dispatch, their decomposition is registered by default. This change adds an override list for CIA decompositions being registered in the default decomp table.

In the long-term, we likely will want to exclude decompositions for all core-tagged CIA ops, but this will require all consumers to be ready to handle the remaining two ops, avg_pool1d, and adaptive_avg_pool1d. Until they are ready, I believe an explicit override list is the safest option.

Additionally, I've also removed the ExecuTorch XNNPACK delegate ConvertToUpsampleBilinear2d pass, as the pass breaks (and is not needed), given that the op is not decomposed. The purpose of this pass was originally to pattern match the decomposition and recompose it, but this is no longer necessary.

Test Plan:
Added a new test (`test_default_decomposition_core_cia_ops`) in test_export.py to verify that upsample_bilinear2d.vec (and in the future, other core-tagged CIA ops) are not decomposed by default. Also, I manually validated end to end with ExecuTorch that the op is not decomposed in to_edge (see N6238522).

```
buck test //caffe2/test:test_export -- test_default_decomposition_core_cia_ops
```

Differential Revision: D69625112

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147153
Approved by: https://github.com/manuelcandales
2025-02-19 23:03:29 +00:00
Angela Yi
2c3680ce38 [apf] Fix input adapter (#147238)
Summary: Add support for inputs that no longer exist in `input_fields`, but is not actually used by the original program. In this case, we just give it a dummy input based on the node's metadata.

Test Plan: Verified for S488841

Differential Revision: D69328093

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147238
Approved by: https://github.com/pianpwk
2025-02-19 04:49:58 +00:00
Aaron Gokaslan
e738f7ba23 [BE]: Enable ruff rule SIM113 (#147290)
Lint rules that tells the user to avoid keeping track of their own counter and use the builtin enumerate when possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147290
Approved by: https://github.com/jansel
2025-02-16 22:41:16 +00:00
Aaron Gokaslan
6344ca1dd4 [BE][Ez]: Apply FURB188: use str remove(pre|suf)fix (#146997)
Since we are on 3.9, we can use this nice str builtin which is more readable and more efficient.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146997
Approved by: https://github.com/XuehaiPan, https://github.com/cyyever, https://github.com/jansel
2025-02-14 03:38:07 +00:00
angelayi
67cbbb29e0 [export] Dedup expression_created logs (#146859)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146859
Approved by: https://github.com/pianpwk
ghstack dependencies: #146532, #146533, #146534, #146858
2025-02-13 00:21:34 +00:00
angelayi
59bc5d0d71 [tlparse] Add stacktrace filter utility (#146858)
Added a utility function for capturing the user stack and framework stacktrace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146858
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #146532, #146533, #146534
2025-02-13 00:21:34 +00:00
angelayi
43f5566c92 [export] Add additional tlparse logging (#146534)
Added some additional logging so we can also run tlparse on generic export errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146534
Approved by: https://github.com/pianpwk
ghstack dependencies: #146532, #146533
2025-02-13 00:21:34 +00:00
angelayi
b4bdbce1ac [export] Use custom stream logger in draft-export (#146533)
Using a custom logger so that we can store our own buffer to dedup logs that look the same. The schema for deduping is as follows:

```python
        if key == "missing_fake_kernel":
            return hash((key, data["op"]))  # Same ops get deduped
        elif key == "mismatched_fake_kernel":
            return hash((key, data["op"], data["reason"]))  # Same op and reason for errors get deduped
        elif key == "propagate_real_tensors":
            return hash((key, json.dumps(data["stack"])))  # Guards appearing on the same stacktrace get deduped
        elif key == "create_unbacked_symbol":
            return hash((key, json.dumps(data["stack"])))  # Unbacked symbols appearing on the same stacktrace get deduped
```

Notably, guards appearing on the same stacktrace get deduped. This is because there are some cases in PT2I models where a piece of code which creates a new unbacked symint + runs into a DDE gets called 800 times, causing 800 new symints to be created, and 800 propagate_real_tensor errors that are all the same expression. This is hard to look at, so we should just deduplicate this.

The con of this is that if there exists multiple DDE on the same stacktrace, we will only show the first issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146533
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #146532
2025-02-13 00:21:34 +00:00
Tugsbayasgalan Manlaibaatar
ebd992724f Implement serializable getattr support for tensor subclasses (#145772)
builtins.getattr is not serializable, so we replace it with a custom op that has more refined schema.

Differential Revision: [D68899421](https://our.internmc.facebook.com/intern/diff/D68899421)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145772
Approved by: https://github.com/bdhirsh
2025-02-11 19:05:14 +00:00
PyTorch MergeBot
fe94ece375 Revert "Exclude upsample_bilinear2d.vec from default core ATen decomposition table (#141791)"
This reverts commit 3d604b17d9.

Reverted https://github.com/pytorch/pytorch/pull/141791 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/141791#issuecomment-2649717140))
2025-02-11 03:17:59 +00:00
PyTorch MergeBot
f38f1dcd82 Revert "move and fix logic to update unbacked bindings (#146115)"
This reverts commit 103c8b44bc.

Reverted https://github.com/pytorch/pytorch/pull/146115 on behalf of https://github.com/huydhn due to This change has been reverted internally D69129334 but the OSS revert failed https://github.com/pytorch/pytorch/pull/146437 ([comment](https://github.com/pytorch/pytorch/pull/146115#issuecomment-2649610877))
2025-02-11 01:26:36 +00:00
Gregory Comer
3d604b17d9 Exclude upsample_bilinear2d.vec from default core ATen decomposition table (#141791)
As upsample_bilinear2d.vec is a core ATen op, it should not be decomposed by default in the export path. Because the operator has CompositeImplicitAutograd dispatch, its decomposition is registered by default. This change adds an override list for CIA decompositions being registered in the default decomp table.
In the long-term, we likely will want to exclude decompositions for all core-tagged CIA ops, but this will require all consumers to be ready to handle the remaining three ops: upsample_nearest2d.vec, avg_pool1d, and adaptive_avg_pool1d. Until they are ready, I believe an explicit override list is the safest option.

Additionally, I've also removed the ExecuTorch XNNPACK delegate ConvertToUpsampleBilinear2d pass, as the pass breaks (and is not needed), given that the op is not decomposed. The purpose of this pass was originally to pattern match the decomposition and un-decomposite it, but this is no longer necessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141791
Approved by: https://github.com/tugsbayasgalan, https://github.com/digantdesai
2025-02-10 19:30:19 +00:00
Avik Chaudhuri
103c8b44bc move and fix logic to update unbacked bindings (#146115)
Summary:
Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here).

This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde.

Test Plan: added test

 D68880766

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115
Approved by: https://github.com/pianpwk
2025-02-07 22:41:19 +00:00
Pian Pawakapan
1c872803cb [export][dynamic shapes] log provenance for locals & symbols for non-strict (#143378)
Adds `dtrace_structured` logging so when a guard or real-tensor propagation assert is added, the relevant user code with local symbolic values & free symbols are logged, e.g. from the draft export CLI report (soon to be added to tlparse):
1. Guard added:
```
1. Constraint violation error.
    The specified input dynamic_shapes spec was found to be incorrect during tracing.
    Specifically, this guard was added: Eq(s0, 3), where {'s0': "L['args'][0][0].size()[0]"}.
    This occured at the following stacktrace:
        File /data/users/pianpwk/pytorch/test/export/test_draft_export.py, lineno 267, in forward:
            assert a.shape[0] == 3

        Locals:
            a: Tensor(shape: torch.Size([s0, 3]), stride: (3, 1), storage_offset: 0)

        Symbols:
           s0: L['args'][0][0].size()[0]
...
```

2. Real tensor propagation:
```
1. Data dependent error.
    When exporting, we were unable to evaluate the value of `u2 < 0`.
    This was encountered 8 times.
    This occurred at the following stacktrace:
        File /data/users/pianpwk/pytorch/test/export/test_draft_export.py, lineno 217, in forward:
            return res[:c_item]

        Locals:
            res: Tensor(shape: torch.Size([u0, u1]), stride: (Max(1, u1), 1), storage_offset: 0)
            c_item: u2
...
```

Currently the values are extracted from the traceback, and are only valid for non-strict; strict seems to require storing & fakifying locals in the frames reporting by `TracingContext`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143378
Approved by: https://github.com/avikchaudhuri, https://github.com/bobrenjc93
2025-02-07 05:46:05 +00:00
Tugsbayasgalan Manlaibaatar
d2a2b9f8a7 Fix constants with non-functional operators (#145593)
Previously, in non-strict path, we always error when trying to inplace update a constant tensor because those constant tensors are not actually wrapped by functional tensors. This is correct behaviour in torch.compile, because dynamo makes all constant tensors into buffers and AOTDispatcher just lifts them and wraps them in functional tensors. However, in non-strict, there is no such step that registers constants as buffers so AOTDispatcher panics when it sees these dangling constant tensors when functioanalizing.

Due to recent change in the IR, this is no longer an issue in non-strict path because we don't call AOTDispatcher at training IR level, but now it is a problem for both strict and non-strict when we lower to inference. (lowering to inference is very similar to non-strict tracing) As a result, we have at least one external (https://github.com/pytorch/pytorch/issues/141336) and internal issues reported due to this difference.

To fix this, there are two ways:
1. Make functionalization be aware of constant tensors and map them to functional tensors on the fly. This makes functionalization invariant uglier and could potentially open up a gate for more nasty bugs.
2. Special handle this in export. This seems more aligned with what dynamo does today so i think we should do it this way. I think the current state could benefit from more refactors to make the run_deocmpositions to be more similar to strict export (because both of them now handle this constant registerinig logic) but it is bit complicated to do it now because strict export version of this logic is also not complete because it doesn't take into account of export graph renaming pass etc). I will follow up with more refactors after this PR (T213466691) to unblock users faster.

For future reference:

Why are we not doing "turning constants into non-persistent buffers and never de-register"? The reason is because in some internal models, they rely on module.to to reliably work to move params/buffers to correct device. As a result, buffers are moved while constants are not. In composibility meeting, we agreed that export won't do device agnostic tracing going forward (it will provide a way to specify FakeTensor in CPU that can be configured to be run on GPU), so after that is done, we can always turn constants into non-persistent buffers which will simplify export's constant handling.

Differential Revision: [D68610739](https://our.internmc.facebook.com/intern/diff/D68610739)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145593
Approved by: https://github.com/avikchaudhuri
2025-02-05 17:44:19 +00:00
Angela Yi
eb832b7bcc [export] Fix draft-export logging (#146106)
Summary: Fix issue where the lazyTraceHandler does not exist

Test Plan: CI

Differential Revision: D68928070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146106
Approved by: https://github.com/yiming0416
2025-02-05 05:49:22 +00:00
PyTorch MergeBot
f242da41c7 Revert "move and fix logic to update unbacked bindings (#146115)"
This reverts commit 0144613e6f.

Reverted https://github.com/pytorch/pytorch/pull/146115 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/146115#issuecomment-2635695958))
2025-02-05 04:51:39 +00:00
Angela Yi
6e03f4f90e [export] Include metadata in FlatArgsAdapter (#146107)
Summary:
With https://github.com/pytorch/pytorch/pull/145956, which introduces
storing a list of namedtuple field names when serializing, we now want to
expose this list to the args adapater so that APS can utilize this information
and remove extraneous inputs.

Test Plan: No-op

Differential Revision: D68928416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146107
Approved by: https://github.com/pianpwk
2025-02-05 00:29:58 +00:00
angelayi
0c37c332da [export] Additionally save pytree namedtuple field names (#145956)
If a user passes in a namedtuple as an input, currently the input TreeSpec looks like: `TreeSpec(type=namedtuple, context=”class_fqn”, children_spec=[*, *])`

The user then saves the program containing this input TreeSpec. But what happens if they load it in a new environment where `class_fqn` now contains an additional field?

This means that the exported program is now expected to take in another input. But since those fields were not used in the original program, users should be able just drop those additional fields and the program will run successfully. This is needed/used in APS where they use unflattener's adapter to adapt the inputs based on the previously saved treespecs.

There are a couple of [solutions](https://docs.google.com/document/d/1V4ZSdy-8PUISWc8RqvGu3DU01BVegJhHHPWqa1Io7Eg/edit?tab=t.0) for how we can address this, but eventually we settled on saving a side table mapping namedtuple types to their list of field names, which can then be accessed by the adapter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145956
Approved by: https://github.com/zhxchen17
2025-02-04 04:42:30 +00:00
Tugsbayasgalan Manlaibaatar
041e08f9dc Add buffers to parameterizaiton rule (#145991)
Differential Revision: [D68959513](https://our.internmc.facebook.com/intern/diff/D68959513)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145991
Approved by: https://github.com/bdhirsh
2025-02-03 16:49:03 +00:00
Avik Chaudhuri
0144613e6f move and fix logic to update unbacked bindings (#146115)
Summary:
Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here).

This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde.

Test Plan: added test

Differential Revision: D68880766

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115
Approved by: https://github.com/pianpwk
2025-02-02 10:43:55 +00:00
Avik Chaudhuri
cde5ddfd14 fix internal error with reorder submodules (#146181)
Test Plan: hard to isolate as small repro

Differential Revision: D68963033

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146181
Approved by: https://github.com/angelayi
2025-02-01 00:30:42 +00:00
angelayi
1c9014a135 [export] Add tlparse to draft-export (#145810)
Dependent on https://github.com/ezyang/tlparse/pull/87/files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145810
Approved by: https://github.com/pianpwk
2025-01-29 19:26:00 +00:00
Pian Pawakapan
cbc4094298 [draft_export] add LOC for data-dep error logging (#145443)
Summary:
maybe this is too much info, but it's difficult to go through old draft export reports where the stack trace is out of sync with the current codebase. Data-dependent errors now look like:
```
2. Data dependent error.
    When exporting, we were unable to evaluate the value of `u306`.
    This occurred at the following stacktrace:
    File /data/users/pianpwk/fbsource/buck-out/v2/gen/fbcode/78204cab86e8a0fb/sigmoid/inference/ts_migration/__pt2i_readiness_main__/pt2i_readiness_main#link-tree/caffe2/torch/fb/training_toolkit/common/proxy_module_thrift/embedding_bag_proxy.py, lineno 109, in _forward_impl:
         `if offsets[-1] > len(input):`
    As a result, it was specialized to evaluate to `261`, and asserts were inserted into the graph.
    Please add `torch._check(...)` to the original code to assert this data-dependent assumption.
    Please refer to https://docs.google.com/document/d/1kZ_BbB3JnoLbUZleDT6635dHs88ZVYId8jT-yTFgf3A/edit#heading=h.boi2xurpqa0o for more details.
```

This would be even more helpful for reports on torch-packaged models, but that requires some more work on PT2I-specific stack trace processing

Test Plan: .

Differential Revision: D68534017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145443
Approved by: https://github.com/angelayi
2025-01-28 18:55:16 +00:00
Randolf Scholz
835e770bad Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994)
Fixes #144976

Using appoach ① `IO[bytes]`, but could also try with a protocol.

## Notes:

- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike`
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2025-01-27 18:08:07 +00:00
Avik Chaudhuri
42b8e233d9 serde unbacked bindings (#144894)
Adds unbacked bindings during deserialization. These are carried by a node's metadata, and map pending fresh unbacked symbols to paths to such symbols inside the corresponding example value carried by the node's metadata.

Since it is awkward to serialize paths, we only serialize the names of these symbols and reconstruct the paths on deserialization, using a shape env util. We also need to bump counters for unbacked symbols here, because the shape env util we use to create these symbols (when deserializing example values) don't do so, and not doing so makes later passes (like `run_decompositions`) crash because new unbacked symbols don't get new names.

This is enough for non-strict. For strict, the unbacked bindings and example values in node metadata can get out of sync, because of running AOTAutograd as an additional step after Dynamo. So we have to sync those back.

Differential Revision: [D68232274](https://our.internmc.facebook.com/intern/diff/D68232274/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144894
Approved by: https://github.com/pianpwk
2025-01-25 02:34:27 +00:00
Aaron Gokaslan
f3304571fc [BE][Ez]: FURB148 - remove useless enumerate calls (#145619)
Remove useless enumerate calls

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619
Approved by: https://github.com/drisspg
2025-01-24 23:37:15 +00:00
Pian Pawakapan
99367ecbed [draft export] count how many times a data-dep error shows up (#145030)
Summary: maybe this is helpful?

Test Plan: draft_export

Differential Revision: D68303934

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145030
Approved by: https://github.com/angelayi
2025-01-23 20:27:31 +00:00
Aaron Gokaslan
5ebca3015d [BE]: Simplify set add with set update (#145152)
Simplifies the set update slightly to be more readable and efficient.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145152
Approved by: https://github.com/XuehaiPan, https://github.com/albanD

Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
2025-01-23 20:18:13 +00:00
PyTorch MergeBot
6e53588789 Revert "[BE]: Simplify set add with set update (#145152)"
This reverts commit 0cb9b2284a.

Reverted https://github.com/pytorch/pytorch/pull/145152 on behalf of https://github.com/davidberard98 due to land race with https://github.com/pytorch/pytorch/pull/145165 broke lint ([comment](https://github.com/pytorch/pytorch/pull/145152#issuecomment-2608378172))
2025-01-22 22:14:26 +00:00
Aaron Gokaslan
0cb9b2284a [BE]: Simplify set add with set update (#145152)
Simplifies the set update slightly to be more readable and efficient.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145152
Approved by: https://github.com/XuehaiPan, https://github.com/albanD

Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
2025-01-22 21:31:13 +00:00
Zhengxu Chen
ac8ddf1150 [export][be] Clean up local imports from export [1/n] (#145287)
Summary: as title

Test Plan: CI

Differential Revision: D68449844

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145287
Approved by: https://github.com/pianpwk
2025-01-22 19:09:17 +00:00
Aaron Orenstein
b6c5562c1f PEP585 update - torch/export (#145165)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145165
Approved by: https://github.com/bobrenjc93
2025-01-19 20:56:55 +00:00
Zhengxu Chen
53256edff9 [export] Support module inputs for non strict mode. (#143925)
Summary:
Add experimental support for torch.nn.Module as input types.

Before this change, we don't support module inputs but recently we saw some interesting use cases like gpt-fast https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L68 where we directly pass in a module input for different variants of the same models.

Since we don't really care about non-param or non-buffer states in non strict mode, we don't care about those either and pretend they are like plain constants during tracing. We treat any module input like a nested container of tensor, and each time we will automatically register a pytree handler for these module types to flatten its state dict into a group of tensors. We will just inline any module method call during tracing like we did for `self` module in export_for_training. This will make input modules' behavior very similar to the training module in typical case, except that we don't record the inputs as parameter or buffers but rather just plain user inputs.

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_module_input

Differential Revision: D67680827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143925
Approved by: https://github.com/tugsbayasgalan
2025-01-16 17:30:36 +00:00
Pian Pawakapan
774f21a370 [export] handle buffer/input mutations for joint-graph (#144806)
Summary: previous construction of GraphSignature output specs didn't consider buffer/user input mutations

Test Plan: test_experimental

Differential Revision: D68177409

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144806
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
2025-01-16 00:22:16 +00:00
Yidi Wu
c7dbee5106 [reland][export] don't decompose custom triton op when exporting (#144284)
Summary:
A reland of https://github.com/pytorch/pytorch/pull/142426.

Copying the description over here:

For torch.export (strict and non-strict), we don't do functional decomposition. Instead, we preserve the custom triton ops as custom ops. This is because we want the exported program to be high-level and serializable.

The alternative:
If we decompose the custom op to a functional hop and make it a node in exported program, we need to figure out ways of serializing the hop and its arguments, which can be triton.jited python functions and triton dtypes. This is undesireble because:

it can be tedious to maintain layer that serialize the jited function (e.g. with a string) and dtypes.
changes to triton or the serialization logic for triton arguments can be BC breaking
exported program will expose the implementation detail (i.e. triton source code) for a specific backend (GPU) to users, which mixes levels of abstraction.

Future plans:
After this PR, in the short term, we expect users to have a seperate aot_compile stage that compiles the exported program into a Cubin file on the same machine that users call export, which does autotuning and removes triton dependency and serve the model with Cubin. This guarantees that triton changes won't break BC.

In the long term, we may export multiple cubins for the triton op directly.

Test Plan: see new tests.

Differential Revision: D67879685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144284
Approved by: https://github.com/zou3519
2025-01-11 01:34:35 +00:00
angelayi
10ff6b8894 [export] Add pickle protocol (#142253)
Fixes https://github.com/pytorch/pytorch/issues/142004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142253
Approved by: https://github.com/avikchaudhuri
2025-01-10 19:49:07 +00:00
Xuehai Pan
dcc3cf7066 [BE] fix ruff rule E226: add missing whitespace around operator in f-strings (#144415)
The fixes are generated by:

```bash
ruff check --fix --preview --unsafe-fixes --select=E226 .
lintrunner -a --take "RUFF,PYFMT" --all-files
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144415
Approved by: https://github.com/huydhn, https://github.com/Skylion007
2025-01-08 21:55:00 +00:00
Brian Muse
a5164a2b18 [BE] Clean up ExecuTorch Export Docstring (#141490)
Summary: I noticed when looking at the docs for [`torch.export.load`](https://pytorch.org/docs/stable/_modules/torch/export.html#load) that it looked like there was a copy and paste error from the save command docstring since ep is not an actual parameter for load and it says "The exported program to save." This diff removes it from the docstring.

Test Plan: Automated Testing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141490
Approved by: https://github.com/JacobSzwejbka
2025-01-08 21:28:58 +00:00
Tugsbayasgalan Manlaibaatar
c68c38c673 Support getattr for tensor subclasses in pre-dispatch export via patching tensor.getattr (#143946)
Previous discussion: https://github.com/pytorch/pytorch/pull/143671#issuecomment-2560112499 and https://github.com/pytorch/pytorch/pull/143671

Differential Revision: [D67693609](https://our.internmc.facebook.com/intern/diff/D67693609)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143946
Approved by: https://github.com/bdhirsh
2025-01-06 23:55:50 +00:00
bobrenjc93
edbda2fad8 remove allow-untyped-defs from torch/export/_remove_auto_functionalized_pass.py (#144230)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144230
Approved by: https://github.com/Skylion007
2025-01-06 22:23:19 +00:00
Pian Pawakapan
bba672e117 [docs/export] update dynamic_shapes docs (#142510)
https://pytorch.org/docs/stable/export.html dynamic_shapes section formatting is messed up, fix & update documentation to be more user-friendly.

Happy accepting nits :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142510
Approved by: https://github.com/yushangdi
2025-01-06 14:12:34 +00:00
bobrenjc93
64b197b603 remove allow-untyped-defs from export/_remove_auto_functionalized_pass.py (#144135)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144135
Approved by: https://github.com/Skylion007
2025-01-03 20:08:11 +00:00
Tom Ritchford
f1cbf4b1b5 Enable ruff's unused variable checking everywhere in pytorch (#136965)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136965
Approved by: https://github.com/cyyever, https://github.com/albanD
2024-12-22 02:33:11 +00:00
Avik Chaudhuri
51eacea8c4 graph module retracing without preserving MCS (#143676)
Retracing while preserving module call signatures used to be a problem because graph modules don't have submodules at given paths. This led to a number of failing retracebility tests. By not trying to wrap modules with export tracepoints we can pass most of these tests; the only exception is where you do module swapping on retraced programs, which is still not possible.

Differential Revision: [D67539304](https://our.internmc.facebook.com/intern/diff/D67539304/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143676
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
ghstack dependencies: #143664
2024-12-21 07:57:43 +00:00
Avik Chaudhuri
bdeee82822 unflatten isinstance (#143664)
When we unflatten, the submodules we generate (`InterpreterModule` or `InterpreterModuleDispatcher`) are not related by type to the original submodules `N`. This makes `isinstance(mod, N)` checks fail. Since we do not have the original types after export, the best we can do is expose a `type_name()` method that carries the original type name, which we do carry in `nn_module_stack` entries.

Differential Revision: [D67526542](https://our.internmc.facebook.com/intern/diff/D67526542/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143664
Approved by: https://github.com/tugsbayasgalan
2024-12-21 01:07:10 +00:00
Tugsbayasgalan Manlaibaatar
0ce233b8ca Support tensor subclass unwrapping (#141941)
This PR adds support for export to unwrap/wrap subclasses AOT so that we can trace through subclass parameters. This will resolve the UX issue in torchao where users had to manually unwrap their subclasses before calling export.

Differential Revision: [D67531057](https://our.internmc.facebook.com/intern/diff/D67531057)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141941
Approved by: https://github.com/bdhirsh
2024-12-21 00:29:31 +00:00
Yidi Wu
1e201422ed [export] add is_exporting flag (#142425)
We added an is_export flag under torch.compiler.is_exporting. This comes handy when we try to do some special logic in user-level and system-level (e.g. in upper of the stack).

In increasing-scope:
- `_is_fx_tracing` is set to True when we use under symbolic_trace or make_fx.
- `is_exporting` is set to True when we're doing strict or non-strict export, which internally has a step that calls make_fx and set _is_fx_tracing to be True.
- `is_compiling` is set to True when we're either doing strict, non-strict export or torch.compile.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142425
Approved by: https://github.com/avikchaudhuri
2024-12-18 21:36:28 +00:00
Avik Chaudhuri
497f89ff83 fix dynamo nn module stack fqn (#142823)
Dynamo can produce sources that have funny patterns in their `.name()` that break `nn_module_stack` fqns. Added a test that used to have `._modules` inside nn_module_stack fqns, now doesn't. (Unfortunately couldn't repro a case mentioned in the GH issue where `.slice(...)` is claimed to appear as well.)

Fixes https://github.com/pytorch/pytorch/issues/141939

Differential Revision: [D67064189](https://our.internmc.facebook.com/intern/diff/D67064189/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142823
Approved by: https://github.com/pianpwk, https://github.com/zhxchen17
2024-12-12 07:02:13 +00:00
Avik Chaudhuri
db51308d9c fix output node name (#142506)
Fixes #142227

Differential Revision: [D67043283](https://our.internmc.facebook.com/intern/diff/D67043283/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142506
Approved by: https://github.com/ydwu4
2024-12-11 17:28:28 +00:00
Avik Chaudhuri
e3886fb13c misc. fixes to unflatten (#142141)
Combining several fixes to unflatten for bugs revealed by random graph testing.

The fixes target two categories of bugs:
1. Some bugs show up as exponential blowups for largish system of nn modules. These are fixes by converting lists to sets, using caching, or otherwise rewriting to reuse computation more effiicently.
2. Other bugs were due to missing intermediate modules created when attributes such as submodules and buffers are accessed through longish paths before calling the corresponding intermediate modules, or missing attributes such as buffers and constants in submodules corresponding to multiple calls.

Differential Revision: [D66659795](https://our.internmc.facebook.com/intern/diff/D66659795/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142141
Approved by: https://github.com/ydwu4
2024-12-10 03:45:13 +00:00
Shangdi Yu
bcddae14ec Enhance "from_node" node meta to track source recursively (#142066)
Summary:
Change the "from_node" node meta format to be able to track the provenance of nodes recursively.

The new "from_node" format is a a list node NodeSource:

```
class NodeSource:
	self.node_name: str
	self.target: str
	self.graph_id: int
	self.pass_name: str
	self.action: str
	self.from_node: List[NoedSource]
```

This is in preparation for the inductor provenance tracking. For background, the inductor provenance tracking doc: https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?fbclid=IwZXh0bgNhZW0CMTEAAR0jUQ0Tf4ROLDED8Y_eIzrU0KVZVdRmyIQLp-avt-kGRPI_VgYVNyjH_q0_aem_HCQ_pxHDiwOkO9mQyWB2-g&tab=t.0 (internal only),

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r test_unflatten_multiple_graphs_state
buck run mode/dev-nosan caffe2/test:fx -- -r node_source
```

Differential Revision: D66737916

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142066
Approved by: https://github.com/avikchaudhuri
2024-12-09 23:39:15 +00:00
Fabian Keller
5e8e1d725a Remove some unused type ignores (round 1) (#142325)
Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.

Having these `# type: ignore` linger around is not ideal for two reasons:

- They are verbose/ugly syntatically.
- They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore.

I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward.

This PR should have no effect on runtime at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
2024-12-09 18:23:46 +00:00
bhack
ae9cda0221 Add truediv support in export serializer (#136364)
Fixes #136113

- [x] Inital `truediv` coverage
- [ ] Expand/reduce coverage?
- [x] Add tests
- [x] Re-check docstrings
- [ ] Linting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364
Approved by: https://github.com/pianpwk

Co-authored-by: Angela Yi <angelayi@meta.com>
Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
2024-12-05 17:33:33 +00:00
Yiming Zhou
31f2d4eb4e [export] Update docs (#142011)
Summary:
Update export docs. Including:
1. Update the output graph.
2. Misc fixes for examples.

Test Plan: CI

Differential Revision: D66726729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142011
Approved by: https://github.com/angelayi
2024-12-05 03:44:46 +00:00
Fabian Keller
f472b3aee1 improve typings around torch.export (#141829)
This is another follow-up to https://github.com/pytorch/pytorch/pull/115074 / https://github.com/pytorch/pytorch/pull/141240 following the strategy discussed there (https://github.com/pytorch/pytorch/pull/115074#issuecomment-2480992230).

This PR improves the type annotations around `torch._export`. Even though the PR introduces a few runtime type asserts, the runtime behavior should stay equivalent, because the failed assertions should have been immediate crashes anyway.

CC @Skylion007 @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141829
Approved by: https://github.com/ezyang
2024-12-03 19:57:21 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
a6bea3d86d Fix DCe in training IR to reflect correct record function op (#141899)
Summary: The exit function is actually exit._recordFunction not exit.default

Test Plan: CI

Differential Revision: D66665359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141899
Approved by: https://github.com/ydwu4
2024-12-03 01:59:37 +00:00
Fabian Keller
394c339691 improve typings in unflatten (#141817)
A first follow-up to https://github.com/pytorch/pytorch/pull/115074 / https://github.com/pytorch/pytorch/pull/141240 following the strategy discussed there (https://github.com/pytorch/pytorch/pull/115074#issuecomment-2480992230).

This PR improves the type annotations around `unflatten.py` which had been inaccurate due to the previously suppressed type checking on `torch.nn.Module`.

CC @Skylion007 @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141817
Approved by: https://github.com/Skylion007
2024-11-30 22:12:15 +00:00
Ivan Zaitsev
09a3eddc07 Revert #141066 and #141494 (#141721)
manual revert due to merge conflicts

note: #141494 was reverted out of order blocking automatic revert of #141066

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141721
Approved by: https://github.com/avikchaudhuri
2024-11-28 20:18:19 +00:00
PyTorch MergeBot
8c90a9a030 Revert "fix non termination in unflatten + state (#141494)"
This reverts commit 5d7c3701e4.

Reverted https://github.com/pytorch/pytorch/pull/141494 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/141494#issuecomment-2504639230))
2024-11-27 19:30:55 +00:00
PyTorch MergeBot
6e61ff4fd3 Revert "Add truediv support in export serializer (#136364)"
This reverts commit 1df440dc4e.

Reverted https://github.com/pytorch/pytorch/pull/136364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its doc build failure is legit ([comment](https://github.com/pytorch/pytorch/pull/136364#issuecomment-2502620732))
2024-11-27 03:24:31 +00:00
bhack
1df440dc4e Add truediv support in export serializer (#136364)
Fixes #136113

- [x] Inital `truediv` coverage
- [ ] Expand/reduce coverage?
- [x] Add tests
- [x] Re-check docstrings
- [ ] Linting

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364
Approved by: https://github.com/pianpwk

Co-authored-by: Angela Yi <angelayi@meta.com>
Co-authored-by: Pian Pawakapan <pianpwk@meta.com>
2024-11-27 00:31:47 +00:00
Zhengxu Chen
011650adc5 [sigmoid] Refactor out a helper function to insert const graph into top level graph. (#140854)
Summary: Add the helper function to put a const graph back to the toplevel graph, can be useful when we're taking const graphs from delegates.

Test Plan: CI

Reviewed By: trieuat

Differential Revision: D63031982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140854
Approved by: https://github.com/SherlockNoMad
2024-11-26 20:07:46 +00:00
Avik Chaudhuri
5d7c3701e4 fix non termination in unflatten + state (#141494)
With largish systems of nn modules with buffers, sinking params suffered from some kind of exponential blowup that is easily fixed by using a set instead of a list to keep track of unlifted buffer placeholders.

Test Plan: added random dag test that failed previously

Differential Revision: D66457661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141494
Approved by: https://github.com/angelayi
2024-11-26 00:17:56 +00:00
Tugsbayasgalan Manlaibaatar
11c786dcb5 [BE] Make maybe_aliasing_or_mutating proper tag (#131990)
For better tracking, we need to make maybe aliasing/mutating ops with proper tag. We need to special case native_batch_norm because it is not a CIA but has a wrong schema. I guess native_batch_norm will be removed at some point, so until then we just keep it around.

D60347117
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131990
Approved by: https://github.com/bdhirsh
2024-11-24 00:12:49 +00:00
Angela Yi
f6eeab7ea8 [export] Make unflattened module compileable (#141249)
Test Plan: Fixes https://fb.workplace.com/groups/1028545332188949/permalink/1091988579177957/

Differential Revision: D66302806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141249
Approved by: https://github.com/avikchaudhuri
2024-11-23 18:46:01 +00:00
Avik Chaudhuri
8b4ae29b1b misc. fixes to unflatten (#141066)
Handling of nested modules in unflatten had several bugs, which were caught by trying to preserve module call signatures for nested modules.
* A module `k` encountered when calling `k.n()` before `k()` used to become an empty nn module. This caused some information to be dropped when `k()` was eventually called. Relatedly, we would also lose call counts for `k.n()` through different paths (say, when `k()` calls `n()`).
* Deleting call-indexed modules and patching up their call sites was broken for nested modules when creating dispatcher modules, because of silliness when handling their fqns.

An interesting aside is that we used random graph generation for testing some of these changes. A future PR will add the infra to create tests using these random graphs.

Differential Revision: D66192799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141066
Approved by: https://github.com/angelayi
2024-11-23 07:31:51 +00:00
angelayi
32583d915e [export] Improve stacktrace filtering (#141285)
Differential Revision: [D66321127](https://our.internmc.facebook.com/intern/diff/D66321127)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141285
Approved by: https://github.com/yushangdi
ghstack dependencies: #141071, #141072
2024-11-22 20:55:04 +00:00
angelayi
53df1c11cd [export] Add custom op guards (#141072)
For custom ops that do not have a meta kernel, draft export automatically creates a meta kernel based on the tracing example inputs. To ensure that these assumptions made during tracing is clear to the user, we add assertions into the traced exported program:

An example graph:
```
ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, a: "f32[s0, s1]", b: "f32[s2, s3]"):
             # File: /data/users/angelayi/pytorch/test/export/test_draft_export.py:172 in forward, code: res1 = torch.ops.mylib.foo4(a, b)
            _assert_tensor_metadata = torch.ops.aten._assert_tensor_metadata(a, dtype = torch.float32, device = device(type='cpu'));  _assert_tensor_metadata = None
            _assert_tensor_metadata_1 = torch.ops.aten._assert_tensor_metadata(b, dtype = torch.float32, device = device(type='cpu'));  _assert_tensor_metadata_1 = None
            foo4: "f32[u2, u3]" = torch.ops.mylib.foo4.default(a, b);  a = b = None
            return (foo4,)
```

Differential Revision: [D66321129](https://our.internmc.facebook.com/intern/diff/D66321129)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141072
Approved by: https://github.com/pianpwk
ghstack dependencies: #141071
2024-11-22 20:55:04 +00:00
Edward Z. Yang
612122af8f Fix type-safety of torch.nn.Module instances (#141240)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141240
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-11-22 00:05:05 +00:00
Pian Pawakapan
e894219504 [export] fix loss_output in joint graph signature (#140974)
Summary: joint-graph export is marking all outputs as LOSS_OUTPUT, fix so it marks only the correct one

Test Plan: test_experimental

Differential Revision: D66117412

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140974
Approved by: https://github.com/JacobSzwejbka
2024-11-21 23:57:07 +00:00
Pian Pawakapan
1132b6764a [draft export] generate fake outputs when real tensor prop finds mismatches (#139766)
Currently real tensor tracing raises MetadataMismatchErrors if registered fake kernels don't match the real kernels (e.g. shape, aliasing, dtype, etc.). This adds an option to use fake kernel inference to bypass mismatches - this option defaults to False for real tensor tracing, but is on for draft export.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139766
Approved by: https://github.com/angelayi, https://github.com/zou3519
2024-11-21 08:01:09 +00:00
Tugsbayasgalan Manlaibaatar
87f9c1abe5 Change export IR to non-functional pre-dispatch IR (#139511)
Differential Revision: [D65362160](https://our.internmc.facebook.com/intern/diff/D65362160)

State after this IR:
1. For the tests that require inference IR, they are replaced with ep.run_decomp({}) so export_for_training_run_decomp is sort of redundant but i guess it is still nice that multiple round of retracing still working. In general, we need some auditing to reduce our redundant testing coverages.
2. After this PR landed and not get reverted for a week or so, i will replace the export_for_training calls with export as they are the same thing now.
3. Added more tests to also cover now "deprecated" old IR by patching export to use old export. For reviewers, please look at the internal version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139511
Approved by: https://github.com/ydwu4, https://github.com/angelayi, https://github.com/avikchaudhuri
2024-11-20 21:47:55 +00:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
angelayi
cb6a21b033 [export] Add setattr for ep.example_inputs (#140990)
Differential Revision: [D66136725](https://our.internmc.facebook.com/intern/diff/D66136725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140990
Approved by: https://github.com/yushangdi, https://github.com/ydwu4
2024-11-20 02:49:20 +00:00
Tugsbayasgalan Manlaibaatar
2b21a653d8 Register CIA ops to FakeTensorMode directly in export (#140465)
During export, we nub out most CIA ops to return NotImplemented to avoid decomposing them during tracing. To recover the existing shape propagation behavior, we register these CIA decomps directly as FakeTensorMode rules as well. The reason we have to do is because when we return NotImplemented, FakeTensor would fallback to running these CIAs with Meta backend causing device branching CIA ops to fail. (because now the device is Meta. One example is sdpa). If we register a kernel directly to FakeTensorMode, we won't fallback to Meta backend.

Differential Revision: [D65716260](https://our.internmc.facebook.com/intern/diff/D65716260/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140465
Approved by: https://github.com/bdhirsh
2024-11-19 15:00:35 +00:00
Tugsbayasgalan Manlaibaatar
b86b5349cb Ignore eager profiling code in training IR (#140826)
Differential Revision: [D66010452](https://our.internmc.facebook.com/intern/diff/D66010452/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140826
Approved by: https://github.com/zhxchen17
2024-11-16 20:31:17 +00:00
Angela Yi
2b39a8db77 Refactor UnflattenedModule's adapt flat args (#140840)
Test Plan: unblocks model launch

Differential Revision: D66014709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140840
Approved by: https://github.com/pianpwk
2024-11-16 05:09:37 +00:00
Shangdi Yu
8094b19620 Fix _out_spec (#140608)
Summary: The gm_torch_level can be a _LazyGraphModule(GraphModule) instead of a GraphModule. When we call .recompile(), GraphModule populates the self._out_spec, but _LazyGraphModule(GraphModule).recompile() doesn't populate it.

Test Plan: CI

Differential Revision: D65902135

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140608
Approved by: https://github.com/tugsbayasgalan
2024-11-14 20:09:30 +00:00
Avik Chaudhuri
7691064768 dispatcher module for multiple graphs (#139439)
Differential Revision: [D65307961](https://our.internmc.facebook.com/intern/diff/D65307961/)

This PR introduces the concept of a "dispatcher" module `n` that carries multiple interpreter modules `n`, `n@1`, `n@2`, etc., each corresponding to a particular call of `n` and thus might carry a different specialized graph. We only do this when we're preserving module call signatures for `n`. The carried modules have the same number and order of calls to `n` appearing in the original module / exported program. In the unflattened module, all those calls go to the "dispatcher" module which internally tracks how many calls have been made so far and invokes the corresponding interpreter module. We reset this tracking after a successful or unsuccessful run of the unflattened module.

Overall this makes swapping easier when module call signatures are preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139439
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #139438
2024-11-12 09:53:40 +00:00
Angela Yi
de509abe1c [export] Dedup data-dependent errors based on stacktrace (#139540)
Summary:
Dedup the data-dependent errors based on the stacktrace it points to. Right now we just display every propagate-real-tensor log that shows up, but we actually can dedup them if they are due to the same piece of code (ex. there could multiple calls to a piece of code that does some data dependent computation).

This occurred when trying out draft export on the PT2I model zoo. For a specific model, previously we would get ~3k data dependent errors, but after deduping based on the stacktrace we now only get 4 errors.

Test Plan: CI

Differential Revision: D65374254

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139540
Approved by: https://github.com/pianpwk, https://github.com/zou3519
2024-11-05 18:16:05 +00:00
Henry Tsang
350bc2a166 [export] Add support for symbool to make it usable for torch.cond (#138765)
# Why?

I want the following code to work.

minimal repro:
```
class M(torch.nn.Module):
    def forward(self, dilate_flag):
        return dilate_flag.item()

input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),)
model = M().cuda()

ep = torch.export.export(model, input1, strict=True)
path = torch._inductor.aot_compile(ep.module(), input1)
aot_model = torch._export.aot_load(path, device="cuda")
actual_output = aot_model(*input1)
```

error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program

second error will be handled by https://github.com/pytorch/pytorch/pull/138760

# Motivation

I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work.

```
class M(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.dilate_flag = 0

    def forward(self, dilate_flag):
        self.dilate_flag = dilate_flag.item()

        def true_fn(dilate_flag):
            return dilate_flag.clone()

        def false_fn(dilate_flag):
            return dilate_flag.clone()

        torch.cond(
            self.dilate_flag,
            true_fn,
            false_fn,
            (dilate_flag,),
        )
        return self.dilate_flag

input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),)
input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),)
inputs = (input1, input2)
model = M().cuda()

for input in inputs:
    expected_output = model(*input)

    ep = torch.export.export(model, input, strict=False)
    path = torch._inductor.aot_compile(ep.module(), input)
    aot_model = torch._export.aot_load(path, device="cuda")
    actual_output = aot_model(*input)

    assert (
        expected_output == actual_output
    ), f"henry they are not equal {expected_output} != {actual_output}"
```

Differential Revision: D64867504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765
Approved by: https://github.com/ydwu4
2024-11-04 23:31:49 +00:00
Tugsbayasgalan Manlaibaatar
ae0e7042f6 Fix custom obj being input (#139209)
Differential Revision: [D65158939](https://our.internmc.facebook.com/intern/diff/D65158939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139209
Approved by: https://github.com/ydwu4
ghstack dependencies: #138658
2024-11-04 18:24:29 +00:00
Tugsbayasgalan Manlaibaatar
e080c89bdc Make test_torchbind.py training IR compatible (#138658)
In this diff, i make test_torchbind.py tests to handle training IR. Today in the training IR, we don't see the effect token and HOP because this happens at the FunctionalTensorMode. Maybe in the future, we should move this logic up to the training IR so that writing passes etc on training Ir is safer. But for the migration purposes, i think it is ok for now.  I also fixed two bugs:
1. ep.module() doesn't register all aliased constants in the module.
2. When we retrace, we need to fakify the original Torchbind object.
3. We don't run any DCE on training IR so we need to add some more torch ops to verifier.

Differential Revision: [D64853530](https://our.internmc.facebook.com/intern/diff/D64853530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138658
Approved by: https://github.com/ydwu4, https://github.com/zhxchen17
2024-11-04 17:43:11 +00:00
angelayi
86db2cd194 [export] Initial draft export (#139383)
Differential Revision: [D65288590](https://our.internmc.facebook.com/intern/diff/D65288590)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139383
Approved by: https://github.com/zou3519
2024-11-01 06:25:44 +00:00
angelayi
f14f245747 [export] Remove custom forward func in swap (#139126)
Differential Revision: [D65100694](https://our.internmc.facebook.com/intern/diff/D65100694)

Remove the custom forward function and instead move the pytree flatten/unflatten ops into the graph. This allows us to natively run via the interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139126
Approved by: https://github.com/avikchaudhuri
2024-10-30 16:50:57 +00:00
Yiming Zhou
48b55ca1b1 [export] Fix non-strict retracing with kwargs (#138927)
Summary:
`torch.fx.Interpreter.run()` only takes args as input. Currently we pass kwargs as well which causes errors during retracing.

Flatten the kwargs and concat them with args will solve the issue.

Several previously failing tests under `_retraceability_non_strict` now passes.

Test Plan:
```
buck2 test @//mode/dev-nosan //caffe2/test:test_export -- -r _retraceability_non_strict
```

Differential Revision: D64980053

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138927
Approved by: https://github.com/angelayi
2024-10-29 04:31:21 +00:00
Avik Chaudhuri
9e06b5b5cb fix unflatten with HOPs (#138978)
Summary:
Unflatten was broken for HOPs for a couple of reasons:
(1) we didn't expect `get_attr` nodes in the exported program, but they can occur to hold graph arguments to HOPs; such attributes must be moved from the exported program to the corresponding unflattened submodule containing the HOP call.
(2) we don't record metadata for graph arguments on serialization (there's nothing to hold it in our schema), and accordingly the `get_attr` nodes we create on deserialization don't have `nn_module_stack` metadata, which obviously wrecks unflatten.

Test Plan: added a couple of tests

Differential Revision: D65013647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138978
Approved by: https://github.com/zhxchen17
2024-10-28 19:30:56 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
3a0c361899 Remove presere ops (#138371)
Summary:
CI
#buildall

Test Plan: CI

Reviewed By: StellarrZ

Differential Revision: D64151426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138371
Approved by: https://github.com/bdhirsh
2024-10-25 19:13:55 +00:00
Gagan Jain
a6287b5c27 Fixing issue in move pass for copying Parameter (#138855)
Summary: Fixing bug for Parameter copy during move pass of exported graph.

Test Plan:
UT

runs on APS models.

Differential Revision: D64876951

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138855
Approved by: https://github.com/pianpwk

Co-authored-by: Gagan Jain <gaganj@meta.com>
2024-10-25 17:57:27 +00:00
Avik Chaudhuri
1d98a526dd preserve signatures with multiple calls + buffer mutations (#138669)
As called out in https://github.com/pytorch/pytorch/pull/137999, preserving signatures of multiple calls when buffer mutations are present was NYI. The main problem was that intermediate values of buffers were not tracked, so couldn't be propagated statefully between multiple calls (i.e., they would need to be explicitly passed around, defeating the unlifting needed for preserving signatures).

This PR fixes this situation, by introducing module attributes that carry the necessary intermediate values of buffer mutations. In general, a buffer mutation can have several intermediate values it depends on recursively, even other buffers. So rather than tying an intermediate value with a particular buffer, we tie it with the submodules that create and read it. We install an attribute on all modules that create or read a particular intermediate value, sharing the same initial storage (i.e., initialized with the same empty tensor). For the module that creates this intermediate value, we copy the value into the corresponding attribute; and for the modules that read it, we read the corresponding attribute instead.

Another complication that needed to be addressed was that a `run_decompositions` following an `export_for_training` was not preserving module call graphs, which is needed for unflattening and, in particular, used when remapping inputs. Fortunately some existing metadata already tracks provenance of nodes, which we could use to update a module call graph after functionalization / decomposition.

Differential Revision: D64806175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138669
Approved by: https://github.com/tugsbayasgalan
2024-10-25 00:13:25 +00:00
Tugsbayasgalan Manlaibaatar
f4b3813989 Wrap autograd and autocast ops in training IR (#138516)
Differential Revision: [D64732361](https://our.internmc.facebook.com/intern/diff/D64732361)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138516
Approved by: https://github.com/yushangdi
ghstack dependencies: #138261
2024-10-23 00:37:54 +00:00
Pian Pawakapan
51045e6251 make DimHints compatible with Dims (#138490)
Previously we'd been raising UserErrors when `Dim()` and DimHints (`Dim.AUTO/Dim.DYNAMIC`) were both specified in `dynamic_shapes`, this PR stops that, and uses `Dim()` objects to guide DimHints.

The key to this was making the `EqualityConstraint` class happy when it checks that inferred equivalence relations were specified in the original `dynamic_shapes` spec, and this introduces a `RelaxedConstraint` object to mark the hinted dimensions, so equality checks between `RelaxedConstraints` and other constraints are treated as valid.

Current behavior is that:
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        return x - y

inputs = (torch.randn(4, 4), torch.randn(4, 4))
shapes = {
    "x": (Dim.AUTO, Dim("d1", min=3)),
    "y": (Dim("d0", max=8), Dim.DYNAMIC),
}
ep = export(Foo(), inputs, dynamic_shapes=shapes)
```

The dimensions marked `AUTO` and `DYNAMIC` will have max & min ranges of 8 & 3 respectively. Note that inferred equality between `Dim()` objects & `Dim.STATIC` will still raise errors - `Dim()` suggests not specializing to a constant.

Differential Revision: D64636101

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138490
Approved by: https://github.com/avikchaudhuri
2024-10-22 07:43:48 +00:00
Tugsbayasgalan Manlaibaatar
9f7c26bef3 Fix training IR bug by changing passes order (#138292)
Inserting runtime_assertions cause gm to have different names but the graph signature was populated earlier. To avoid this kind of errors in the future, I refactored these steps into a helper function.

Differential Revision: [D64576251](https://our.internmc.facebook.com/intern/diff/D64576251)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138292
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #138266
2024-10-22 01:24:14 +00:00
Tugsbayasgalan Manlaibaatar
5adc33d3b8 Training IR should preserve custom metadata (#138266)
Differential Revision: [D64576252](https://our.internmc.facebook.com/intern/diff/D64576252)

@diff-train-skip-merge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138266
Approved by: https://github.com/yushangdi
2024-10-22 01:09:56 +00:00
Aaron Orenstein
07cc4bd3e2 typing compile_fx.py (#138033)
Type annotations for compile_fx.
- Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed.
- There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138033
Approved by: https://github.com/Skylion007
2024-10-21 18:14:59 +00:00
Tom Ritchford
c0582fd0f8 Remove unused Python variables in torch/[b-z]* (#136963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
Tugsbayasgalan Manlaibaatar
1f32a1fb80 Replace torch.export default decomp table to be lazily populated (#137650)
In this PR, we implement lazy dictionary for export decomp behaviour for following reasons:
1. Custom op loading can happen after import time, as a result, the decomp table might not be able to pick up the decomp. Therefore we try to delay materialization as late as possible.

I intentionally seperated out the core_aten_decomp to not have any custom CIA ops in this PR to mitigate the risk of getting reverted but in the future, core_aten_decomp under torch/_decomp will exist as an alias to official export table (torch.export.default_decompositions)

Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650
Approved by: https://github.com/justinchuby, https://github.com/bdhirsh
2024-10-18 19:28:52 +00:00
Avik Chaudhuri
5d01126616 preserve module signature with multiple calls (#137999)
Previously we would error when trying to preserve the call signature for a module when it was called multiple times. This PR can now do this without erroring. The fix is to propagate call indices in a few more places.

Note that while this works in the presence of params, buffers, and tensor constants, preserving call signatures for multiple calls to a module when buffers are mutated is not supported yet. This is future work. The main problem is that we do not have enough metadata to `copy_` mutated buffers at the end of each call to a module, so the next call can read those buffers at the beginning. Making this work will likely need some explicit tracking of intermediate values of mutated buffers when collecting metadata during functionalization in export.

Note also that we stop short of creating a single graph out of multiple graphs: that is still future work. So the unflattened module will still have different targets `n`, `n@1`, `n@2`, etc. for each call when we ask the module call signature of `n` to be preserved. However it is way easier to swap all of these targets with a replacement that behaves similar to the original, because all of these calls will respect the original module call signature. (In particular, any constant inputs will be carried by the calls.)

Differential Revision: D64406945

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137999
Approved by: https://github.com/tugsbayasgalan
2024-10-18 07:30:22 +00:00
Tugsbayasgalan Manlaibaatar
f3c3f3a3c3 Fix assigning tensor with requires_grad as constant in export (#137997)
When we insert cojstants into unlifted graph, we need to detach them if they require grad BUT when we detach we need to preserve the original aliasing information.

Differential Revision: [D64406859](https://our.internmc.facebook.com/intern/diff/D64406859/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137997
Approved by: https://github.com/avikchaudhuri
2024-10-17 06:41:10 +00:00
Avik Chaudhuri
0e9708f907 tensor constant with wrapped method (#138091)
Summary:
Tensor constants can show up through wrapped methods, so that they may not always be found in constant attributes. They need to be fakified and their meta vals need to be found to create graph signatures nevertheless. Otherwise non-strict barfs.

Longer term maybe we should pull this fakification up in non-strict.

Test Plan: added test

Differential Revision: D64480272

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138091
Approved by: https://github.com/tugsbayasgalan
2024-10-17 00:00:04 +00:00
Avik Chaudhuri
ed55d356de [alt] fix unroll in successive unflatten (#137646)
We use nn_module_stack in unflatten to recognize when module calls begin and end. However the current format is not sufficient to detect module call boundaries when we have successive calls to the same module, because the successive instructions (end of one call, begin of next call) have the same nn_module_stack. This causes us to effectively "unroll" successive calls to a single call. This can cause problems when preserving module call signatures because the outputs of the successive calls might be concatenated in the single call.

Previously we introduced the concept of a "call index" to generate multiple graphs when unflattening, one per call. This PR pushes this concept into nn_module_stack itself. In particular, the keys of nn_module_stack now go from `key` to `key@call_index`. (In a previous attempt, https://github.com/pytorch/pytorch/pull/137457, instead values in nn_module_stack go from (fqn, type) to (fqn, type, call_index), which is BC-breaking.)

Note that we still do not have the ability to preserve module call signatures for multiple calls to the same module. But now instead of randomly crashing we give a proper error. OTOH when not preserving module call signatures we simply generate multiple calls, each with its own graph, possibly deduplicated, matching what we would do for non-successive calls.

Test Plan: Like D64014936

Differential Revision: D64136277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137646
Approved by: https://github.com/angelayi
2024-10-12 15:53:52 +00:00
Tugsbayasgalan Manlaibaatar
5fca2fd365 Try unify training and inference (#136888)
Previously inference -> inference IR was going through a seperate flow from train -> inference decomposition. This diff unifies them so that we always retrace when decomposing. Joint IR decomp is still going through old flow (inference -> inference) but seems ok for now since it is still in experimental stage.

Differential Revision: [D63062521](https://our.internmc.facebook.com/intern/diff/D63062521/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136888
Approved by: https://github.com/avikchaudhuri
2024-10-11 20:09:58 +00:00
Avik Chaudhuri
8ee361ed13 fix test_retrace_pre_autograd (#137733)
Test Plan: fixed

Differential Revision: D64200918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137733
Approved by: https://github.com/pianpwk, https://github.com/tugsbayasgalan
2024-10-11 03:46:22 +00:00
Shangdi Yu
9d4cb0d3eb Fix param and buffer mapping for state_dict when there are state_dict hooks (#137609)
Resolve #137540

Summary:

We might get different state_dict and named_parameters result when the module has registered custom state_dict_hooks.
For exported_program's state_dict, we want the state_dict to reflect the actual module hierarchy at runtime, and it might be different from the model's state_dict() output if the model has state_dict hooks.
To do weight swapping, one needs to either re-export or turn-off the hooks when saving model's state_dict().
Previously, ExportedProgram uses nn.Module's state_dict() method to populate its own state_dict, but it doesn't work for some models (e.g. llama3_3_vision) because ExportedProgram's state_dict and an nn.Module's state_dict have some subtle differences semantically.

nn.Module's state_dict is about how the state should be serialized, and it reflects the structure of the original user model code. In contrast, export specializes on a “run” of a model, and its state_dict needs to reflect the runtime module hierarchy.

One example where these two are different is TorchTune's Llama3_2_vision text decoder. Here, a FusionLayer is added as a local optimization and it is not part of the "static model definition".  In runtime, we have mod.layers[3].layer.sa_norm.scale.

But in nn.Module's state_dict, the authors of the model added a state_dict hook to remove the "layer" in mod.state_dict() to reflect the static model definition, so we have mod.state_dict()["layers.3.sa_norm.scale"].
In this Diff, we change ExportedProgram to populate its state_dict using named_parameters() and named_buffers() instead. So in ExportedProgram's state_dict, we have "layers.3.layer.sa_norm.scale", which reflects the runtime module hierarchy.

Now one problem this presents is weight swapping. Since ExportedProgram's state and the model's state is not the same anymore, weight swapping procedure also needs to change slightly.

In internal Ads and RecSys models deployment, weight swapping is where they have one model that is currently being being deployed and serving traffic, and they want to swap out the weights with newly trained model weights without having to redo the whole exporting/lowering process and create a new artifact. So they would move the deployed model’s pointer to the state dict over to the new state dict. Because of this, it’s previously a requirement that the FQNs are matching between the exported and the eager model’s state dict.

The new ExportedProgram's state dict still supports weight swapping, but the state_dict to be swapped needs to be obtained from torch.export.exported_program instead of model.state_dict() if the model has state_dict hooks.
The new requirement is that the FQNs are matching between the exported’s state dict and the state_dict obtained from `_disabled_load_state_dict_hooks(M)` context manager. One benefit of having this new API is that we are now in full control within export of gathering and updating the model state.
If a model doesn't have any state_dict hooks, one can still use model.state_dict() for weight swapping, so it's BC.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export  -- -r  test_export_for_training_with_state_dict_hooks
```

Differential Revision: D64080561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137609
Approved by: https://github.com/angelayi, https://github.com/pianpwk
2024-10-11 01:33:50 +00:00
Avik Chaudhuri
365722f606 fix test_constant_output (#137547)
Summary: Fixes a couple of problems: constants didn't have metadata before creating graph signatures, and graph signatures weren't updated when lifting constants.

Test Plan: fixed test

Differential Revision: D64081786

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137547
Approved by: https://github.com/tugsbayasgalan
2024-10-10 07:48:15 +00:00
Tugsbayasgalan Manlaibaatar
02013da038 Lift restriction on training IR for unflatten (#137470)
Differential Revision: [D64025578](https://our.internmc.facebook.com/intern/diff/D64025578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137470
Approved by: https://github.com/avikchaudhuri
2024-10-08 22:30:24 +00:00
Tugsbayasgalan Manlaibaatar
bb31e3f57e Add original forward names to schema so that prettify pass works (#136887)
When we run_decomp, we retrace if it is training IR. As a result, we do need to reliably store the oroiginal forward names when we run decomp.

Differential Revision: [D63064453](https://our.internmc.facebook.com/intern/diff/D63064453/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136887
Approved by: https://github.com/angelayi
2024-10-08 04:21:02 +00:00
Pian Pawakapan
f33ffd01f2 [export] fix joint graph metadata (#136011)
Differential Revision: D62652832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136011
Approved by: https://github.com/tugsbayasgalan
2024-10-07 19:36:44 +00:00
angelayi
1dc1b85714 [export] Move swap to a different file (#137134)
Refactor so that unflattener doesn't become too messy

Differential Revision: [D63719648](https://our.internmc.facebook.com/intern/diff/D63719648/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137134
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #136191, #137102
2024-10-06 04:28:18 +00:00
angelayi
fa9cd46d12 [export] Update swap's forward function (#137102)
Downstream APS code was failing to run the previously swapped module because of some fx.GraphModule forward function weirdness (P1594789677). So to fix this, I just attached a custom forward function which matches the unflattened module's forward function.

Differential Revision: [D63683422](https://our.internmc.facebook.com/intern/diff/D63683422/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137102
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #136191
2024-10-06 04:25:36 +00:00
angelayi
52d7704b32 [export] Add optimization passes (#136191)
Added an optimization pass to the swap function which removes extraneous pytrees. Currently it removes the pytree flatten/unflatten calls between modules in very specific scenarios (all the inputs of one module go into the other).

Future work can be to remove the input pytree.flatten if the inputs go directly into an unflatten, and output pytree unflatten if the outputs are directly from a pytree.flatten.

Differential Revision: [D62879820](https://our.internmc.facebook.com/intern/diff/D62879820)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136191
Approved by: https://github.com/avikchaudhuri
2024-10-06 04:22:42 +00:00
Avik Chaudhuri
17718209ea fix specialization bug in unflatten + preserve_module_call_signature (#137363)
Summary: In unflatten, when we generate module calls when their signature has been preserved, we do not pass the original constant args. This can cause strange effects, e.g., if the module is swapped out with itself, we may suddenly go down a different path than the original, or even crash.

Test Plan: added a test

Reviewed By: angelayi

Differential Revision: D63913750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137363
Approved by: https://github.com/angelayi
2024-10-05 04:26:02 +00:00
Avik Chaudhuri
6a6a8b17b8 handle state tensors in training ir path (#137240)
Summary: We had attribute assignment detection and handling of registered buffer assignments when using `aot_autograd`, but not when using just `make_fx`. Fixed.

Test Plan: expanded coverage of `test_state_tensors` to use `export` instead of `torch.export.export`

Differential Revision: D63802576

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137240
Approved by: https://github.com/tugsbayasgalan
2024-10-04 20:23:48 +00:00
Tugsbayasgalan Manlaibaatar
d2d14d14e3 [RELAND] Fix unlift to preserve aliased constants (#137310)
Differential Revision: [D63864743](https://our.internmc.facebook.com/intern/diff/D63864743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137310
Approved by: https://github.com/avikchaudhuri
2024-10-04 18:15:52 +00:00
Pian Pawakapan
6dcd773c57 [export] clean up dynamic markers from tensors (#137230)
Summary:
When we handle dynamic shapes markers like `Dim.AUTO, Dim.DYNAMIC`, we use dynamo decorators, attaching set attributes to the export input tensors, e.g. `x._dynamo_dynamic_indices = set()`.

I thought this was fine, since it's done all the time with torch.compile, but it breaks some PT2Inference tests, specifically because unpickling a set attribute isn't possible with the C++ torch::jit::pickle_load call.

We've agreed that the PT2Inference side will clone sample inputs & pickle the original inputs to be safe, but this still establishes a nice invariant that user-facing decorators are both ignored & cleaned out in the lifecycle of an export call.

Test Plan: test_export

Differential Revision: D63773534

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137230
Approved by: https://github.com/avikchaudhuri
2024-10-04 06:50:45 +00:00
Tugsbayasgalan Manlaibaatar
97634e4f82 Rollout infra for executorch migration to training IR (#132703)
Title

Differential Revision: [D60432217](https://our.internmc.facebook.com/intern/diff/D60432217/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132703
Approved by: https://github.com/tarun292
2024-10-04 04:33:08 +00:00
PyTorch MergeBot
525f6715bc Revert "Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162)"
This reverts commit f96020c246.

Reverted https://github.com/pytorch/pytorch/pull/137162 on behalf of https://github.com/jovianjaison due to Sorry for reverting your changes but many jobs are failing with NameError: name _recursive_getattr is not defined + a Lint job fails ([comment](https://github.com/pytorch/pytorch/pull/137162#issuecomment-2392036062))
2024-10-03 18:17:56 +00:00
Tugsbayasgalan Manlaibaatar
f96020c246 Fix unlift to unblock training IR + run_decomp on aliasing constants (#137162)
When we populate unlifted graph module, we actually only "unlift" constant tensor inputs which is problematic because export de-duplicates aliasing constants. As a result, we only register one constant instead of two constants. This PR fixes that by querying ep.constants table instead of ep.graph_signature.lifted_tensor_constants.

Differential Revision: [D63743111](https://our.internmc.facebook.com/intern/diff/D63743111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137162
Approved by: https://github.com/pianpwk
2024-10-03 17:28:53 +00:00
Avik Chaudhuri
cd5d1fe015 unflatten with specialized graphs per submodule call (#137013)
Previously we were making a fairly restrictive assumption when unflattening an exported program: for any submodule, we would assert that the graph of every call to that submodule must be the same. This assertion is load-bearing, i.e., if we simply remove the assertion then we can get incorrect results, as shown by the following example.

```
    class N(torch.nn.Module):
        def forward(self, x, b):
            if b:
                return x + 1
            else:
                return x + 2

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.n = N()

        def forward(self, x):
            x0 = x + 3
            x1 = self.n(x0, True)
            x2 = x1 + 4
            x3 = self.n(x2, False)
            return x3 + 5

    m = M()
    inp = (torch.ones(1),)
    print(m(*inp))  # tensor([16.])
    ep = torch.export.export(m, inp)
    print(ep.module()(*inp))  # tensor([16.])

    unflattened = torch.export.unflatten(ep)
    print(unflattened(*inp))  # tensor([15.])
```

However, this goes against the spirit of specializing graphs when exporting: we should *expect* that for every call to a submodule we *might* generate a different graph. The goal of this PR is to fix unflattening to handle multiple specialized graphs corresponding to multiple calls to the same submodule.

The idea is simple: for every call to a child module `foo`, we will create potentially different child modules `foo`, `foo@1`, `foo@2`, etc. and use those names as targets in `callmodule` instructions in the parent graph. An immediate consequence of this is that the list of fqns in an unflattened module may not be the same as an exported module. Note that all these variants share the same parameters / buffers, so that multiple calls to the same submodule can share state as expected.

However, as described so far this scheme may end up with needlessly too many submodules. Thus, between calls to the same submodule, if graphs are equal then we optimize away the extra submodules and reuse call names as much as possible. Moreover, when submodules are shared across fqns, we also try to de-duplicate graphs corresponding to their calls as much as possible. Note that no matter what, information about which submodule was called is still preserved, so that if a submodule has to be swapped with another, one can still find all calls to the former submodule and replace them with calls to the latter.

A note on the choice of naming scheme for call names: instead of generating "sibling" modules `foo@1`, `foo@2`, etc. for `foo`, we had considered generating "children" modules `foo._1`, `foo._2`, etc. of `foo`. However this can cause spurious cycles when de-duplicating graphs. E.g., suppose that `foo` is an alias for `bar._1` and `foo._1` is an alias for `bar`, then we must either introduce a cycle or drop the opportunity to optimize. Another idea would be to make `foo` a dummy module that contains `foo._0` corresponding to the first call, but this necessitates too many changes to existing tests and hurts the common case.

Differential Revision: D63642479

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137013
Approved by: https://github.com/pianpwk
2024-10-03 00:55:44 +00:00
Tugsbayasgalan Manlaibaatar
73b07df042 Preserve custom ops via run_decomps (#136882)
This is re-apply of https://github.com/pytorch/pytorch/pull/136773?fbclid=IwZXh0bgNhZW0CMTEAAR3SmginkvZcILVY7G2XDa_KosnV4DPmq1l6pkjPIM255QgJLKVAR90rGAU_aem_ZWpcVdUsmAGzOGiwbjtBDg.

Note that this doesn't completely remove the _preserve_ops list from export mainly because we want to have small change to address failing executorch tests. All the complications included in this PR is deleted in the next PR.

Differential Revision: [D63553086](https://our.internmc.facebook.com/intern/diff/D63553086/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136882
Approved by: https://github.com/bdhirsh
2024-10-01 17:38:00 +00:00
Pian Pawakapan
cc2a66c55e [export] hook up mark_dynamic to export Dims (#137029)
Adds Dim.DYNAMIC which calls torch._dynamo.mark_dynamic() in the backend. Similar to Dim.AUTO in that it does automatic inference for ranges & relations, but errors out for specializations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137029
Approved by: https://github.com/avikchaudhuri
2024-10-01 17:05:09 +00:00
Ivan Zaitsev
b35f70da05 [ez] fixup the export of D62879819 (#136900)
a line from D62879819 (#136190) went missing somehow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136900
Approved by: https://github.com/atalman
2024-09-28 13:46:17 +00:00
Pian Pawakapan
6075f566cc [export] simplify automatic dynamic shapes processing (#136591)
Removing `_transform_shapes_for_default_dynamic` and `assume_static_by_default=False` as added in https://github.com/pytorch/pytorch/pull/133620.

This reverts back to `assume_static_by_default=True` with the use of dynamo decorators (e.g. `maybe_mark_dynamic, mark_static`, instead) for handling Dim.AUTO & Dim.STATIC instead. This is easier to maintain, as it doesn't requiring reasoning about "inverting" the dynamic_shapes specs, and also opens up usage of other decorators (`mark_dynamic, mark_unbacked`).

On the user side this change has no effect, but internally this means dynamic behavior is determined only by the `dynamic_shapes` specs (ignoring user-side input decorators following https://github.com/pytorch/pytorch/pull/135536), but transferring this information for _DimHints via decorators, for Dynamo/non-strict to create symbolic_contexts accordingly, e.g. 7c6d543a5b/torch/_dynamo/variables/builder.py (L2646-L2666)

One caveat is we don't raise errors for dynamic decorators on the user side, since we don't know if they're from user markings, or from re-exporting with inputs we've previously marked.

Differential Revision: D63358628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136591
Approved by: https://github.com/avikchaudhuri
2024-09-27 18:28:51 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
e4e83a4ac4 Remove aten.item hack (#136663)
Summary: Title

Test Plan: CI

Differential Revision: D63404353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136663
Approved by: https://github.com/bdhirsh
2024-09-26 17:14:48 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
0b38fa154a Fix meta registry in export (#136492)
Summary: Title

Test Plan: CI

This fixes some breaking tests in executorch. I think the root cause is when we have aten::matmul which we are not preserving, we register meta implementation from C++ side. It seems like the C++ kernel doesn't work well with mix of FakeTensor and real tensor. This PR sidesteps this problem by always preferring python CIA decomp over C++ Cia decomp

Differential Revision: D63297050

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136492
Approved by: https://github.com/bdhirsh
2024-09-25 17:53:02 +00:00
Pian Pawakapan
7c6d543a5b [export] fix _get_non_persistent_buffers for duplicates (#136552)
Summary: Export's method _get_non_persistent_buffers doesn't check duplicate submodules, so we run into state_dict related issues if non-persistent buffers exist on shared submodules.

Test Plan: test_export

Differential Revision: D63332976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136552
Approved by: https://github.com/avikchaudhuri, https://github.com/tugsbayasgalan
2024-09-25 16:46:31 +00:00
angelayi
210b136c07 [export] Add experimental swap API (#136190)
Prototyped the following API which takes in an ExportedProgram, a dictionary of fqn to modules to swap, and returns a (unlifted) GraphModule
```
_swap_modules(
    ep: ExportedProgram, modules_to_swap: Dict[str, torch.nn.Module]
) -> torch.fx.GraphModule:
```

Differential Revision: [D62879819](https://our.internmc.facebook.com/intern/diff/D62879819)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136190
Approved by: https://github.com/avikchaudhuri
2024-09-24 22:50:44 +00:00
Huanyu He
a4e9a1c90b [TorchRec][PT2 IR][APF] short circuit the flatten/unflatten between EBC and KTRegroupAsDict modules (#136045)
Summary:
# context
* for the root cause and background please refer to this [post](https://fb.workplace.com/groups/1028545332188949/permalink/1042204770823005/)
* basica idea of this diff is to **short circuit the pytree flatten-unflatten function pairs** between two preserved modules, i.e., EBC/fpEBC and KTRegroupAsDict.
NOTE: There could be multiple EBCs and one single KTRegroupAsDict as shown in the [pic](https://fburl.com/gslide/lcyt8eh3) {F1864810545}
* short-circuiting the EBC-KTRegroupAsDict pairs are very special and a must in most of the cases due to the EBC key-order issue with distributed table lookup.
* hide all the operations behind a control flag `short_circuit_pytree_ebc_regroup` to the torchrec main api call `decapsulate_ir_modules`, which should only be visible to the infra layer, not to the users.

# details
* The `_short_circuit_pytree_ebc_regroup` function finds all the EBCs/fpEBC and KTRegroupAsDict modules in an unflattened module.  Retrieve their fqns and sort to in_fqns (regroup_fqns) and out_fqns (ebc_fqns). Because currently the fpEBC is swapped as a whole, so we do some extra fqn logic to filter out the EBC that belongs to an up-level fpEBC.
* a util function `prune_pytree_flatten_unflatten` removes the in-coming and out-going pytree flatten/unflatten function calls in the graph module, based on the given fqns.

WARNING: The flag `short_circuit_pytree_ebc_regroup` should be turned on if EBCs are used and EBC sharding is needed. Assertions are also added if can't find a `KTRegroupAsDict` module, or `finalize_interpreter_modules` is not `True`.

# additional changes
* absorb the `finalize_interpreter_modules` process inside the torchrec main api `decapsulate_ir_modules`.
* set `graph.owning_module` in export.unflatten as required by the graph modification
* add one more layer of `sparse_module` for closely mimicing the APF model structure.

Test Plan:
# run test
* serializer
```
buck2 run fbcode//mode/opt fbcode//torchrec/ir/tests:test_serializer
```
* apf
```
buck2 run fbcode//mode/opt fbcode//aps_models/ads/gmp/tests/ne/e2e_deterministic_tests:gmp_e2e_ne_tests -- --filter-text 'test_mtml_instagram_model_562438350_single_gpu_with_ir'
```
* local mp run
```
==== Finished E2E deterministic test for mtml_instagram_model_gmp_474023725_non_kjt_unary ====
finished
  test_mtml_instagram_model_562438350_single_gpu_with_ir
Imports took: 6.0s! Profile with --import-profiler.            --_ |""---__
Executed 1 example in 203.1s:                               |'.|  ||  .    """|
  Successful: 1                                             | ||  || /|\""-.  |
  Failed: 0                                                 | ||  ||  |    |  |
  Skipped: 0                                                | ||  ||  |   \|/ |
  Not executed: 8                                           |."|  ||  --"" '__|
https://testslide.readthedocs.io/                              --" |__---"""
```

Differential Revision: D62606738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136045
Approved by: https://github.com/angelayi
2024-09-17 18:42:56 +00:00
Tugsbayasgalan Manlaibaatar
1904b09e61 Create export_for_inference API and expose core_aten as public facing API (#135912)
Differential Revision: [D62606908](https://our.internmc.facebook.com/intern/diff/D62606908)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135912
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #135080
2024-09-15 17:05:07 +00:00
Tugsbayasgalan Manlaibaatar
382fad58b3 Deprecate _preserve_ops and consolidate with decomp_table (#135080)
In this PR, we deprecate _preserve_ops feature in run_decomposition API. We can't kill this API completely because Executorch team depends on it. As the syncing between two repos is non-trivial, I just leave this argument as deprecated for now. In the next PR, i will immediately remove it.

After this PR, run_decompositions will only decompose what's inside the decomp table and preserve the rest by default. Note that this feature is only rolled out to OSS for now. Old code path is protected under IS_FBCODE flag.

Differential Revision: [D62163161](https://our.internmc.facebook.com/intern/diff/D62163161/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135080
Approved by: https://github.com/justinchuby, https://github.com/avikchaudhuri, https://github.com/bdhirsh
2024-09-15 17:01:58 +00:00
Pian Pawakapan
b897ab0540 [export] ignore mark_dynamic() in export (#135536)
Previously we were accomodating `torch._dynamo.mark_dynamic()` for export's dynamic shapes. Here we clean things up and ignore it, requiring users to specify an export input for `dynamic_shapes`.

Note: there's 4 decorators relevant to export, `mark_dynamic, maybe_mark_dynamic, mark_static, mark_unbacked`. User calls that involve export have only been `mark_dynamic()`, and we use `maybe_mark_dynamic` under the hood for `Dim.AUTO`, but we could start using others. One reason I decided to not warn and just silently ignore is these decorators cause the tensors to carry dynamic info, and it'll be hard to tell whether the markers are from export or user calls when re-exporting with the same inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135536
Approved by: https://github.com/avikchaudhuri
2024-09-12 21:22:19 +00:00
Tugsbayasgalan Manlaibaatar
5a9ac83e94 Fix doc (#135551)
Differential Revision: [D62412667](https://our.internmc.facebook.com/intern/diff/D62412667/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135551
Approved by: https://github.com/yushangdi
ghstack dependencies: #135549
2024-09-10 07:18:44 +00:00
Tugsbayasgalan Manlaibaatar
c18052da0e Add some minor doc improvement and ban using training IR for unflattener (#135549)
Title

Differential Revision: [D62412490](https://our.internmc.facebook.com/intern/diff/D62412490/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135549
Approved by: https://github.com/yushangdi
2024-09-10 06:48:42 +00:00
Avik Chaudhuri
de74aafff4 error on exporting ScriptModule (#135302)
Test Plan: added test

Differential Revision: D62279179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135302
Approved by: https://github.com/yushangdi
2024-09-06 15:12:40 +00:00
Zhengxu Chen
116fd474da [export] Expand coverage to more copied sym ops for unflattener. (#135119)
Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//torchrec/ir/tests:test_serializer -- --run-disabled

```
File changed: fbcode//caffe2/torch/export/unflatten.py
Buck UI: https://www.internalfb.com/buck2/2e0377e7-e2b6-4bd0-8133-a787245165a0
Test UI: https://www.internalfb.com/intern/testinfra/testrun/5066549824883887
Network: Up: 0B  Down: 0B
Jobs completed: 16. Time elapsed: 10.2s.
Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0
```

Differential Revision: D62190172

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135119
Approved by: https://github.com/yushangdi
2024-09-05 21:58:20 +00:00
Angela Yi
9c38b00999 [export] Add ability to run eagerly on UnflattenedModule (#133996)
Summary:
Added the contextmanager, `_disable_interpreter`, which is meant to put around a call to `unflatten`. This will generate an UnflattendModule and sub-InterpreterModules which will not use torch.fx.Interpreter to run eagerly. We want to have this as a state of the module instead of a contextmanager around running the module because it's not clear where we are calling the unflattened module.

This seems to improve the performance: https://fb.workplace.com/groups/1075192433118967/posts/1473590629945810/?comment_id=1473621763276030

Test Plan: CI

Differential Revision: D60939034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133996
Approved by: https://github.com/pianpwk
2024-09-05 20:28:42 +00:00
Tugsbayasgalan Manlaibaatar
9d705605dd Fix decomp behaviour in export training IR (#134801)
Subset of changes in https://github.com/pytorch/pytorch/pull/132901, can't land the previous one because it is too complicated. Rest of the change will be implemented as follow up after export design meeting. This part just makes the training IR -> inference IR decomp to have the same path as normal export.

Differential Revision: [D62000525](https://our.internmc.facebook.com/intern/diff/D62000525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134801
Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi
2024-09-05 06:37:44 +00:00
Laith Sakka
c8ab9b06a2 Redesign custom op functionlaization for better re-inplace (#134409)
- The new implementation (auto_functionalized_v2) is enabled by default but can be disable
 using an inductor flag.
- In export mode the old implementation is used.

**Motiviation**
Previous functionalization fails to re-inplace arguments when they are view over other tensors.
see issue https://github.com/pytorch/pytorch/issues/131192
The new functionalization is easier to re-inplace for views.

**A) Functionalizations pass**
consider a program:

```

func(t)
    x = t[0]
    y = t[1]
    foo(x, y) # custom operator with x, y mutable
    return (x, y, t)
```

- To functionalize `foo` we generate a function that operates on the base tensors of the inputs;  (x.base() and y.base())
and record how to regenerates the views out of the base for argument x by recording ```ViewInfo=(x.base(), x.size(), x.stride, x,storage_offset())```

- Due to some limitations on the torch.export arguments format, we have to generate alot of arguments, but this is something we can simplify in the future, for the example above we get the following function.

   ```
   auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default,
     _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0 ,
     _y_base_index = 0,_y_size = (), _y_stride = (), _y_storage_offset = 1   ,
     _all_bases = [arg0_1])
   ```
 -  In the code above:
        - _all_bases[t]: refers to a unique set of bases for all foo arguments.
        - for each argument x we have _x_base_index, _x_size, _x_stride, _x_storage_offset that can be used to (1)  regenerate x from _all_bases[_x_base_index] or a copy of a the base.

-  the output of auto_functionalized is foo output , followed by x tensors one for each base in  _all_bases, that is a copy of the base tensor after observing the mutations of the all the arguments that are views of that base.

-  for each use of a base in _all_bases or a view of it , that are after the call to foo, replace it with a view of the new output

 for the function above after functionalization we get :
 ```
    def forward(self, arg0_1: "f32[2][1]cpu"):
        auto_functionalized = torch.ops.higher_order.auto_functionalized(torch.ops.mylib.foo.default, _x_base_index = 0, _x_size = (), _x_stride = (), _x_storage_offset = 0, _y_base_index = 0, _y_size = (), _y_stride = (), _y_storage_offset = 1, _all_bases = [arg0_1])
        getitem_1: "f32[2][1]cpu" = auto_functionalized[1];  auto_functionalized = None
        copy_: "f32[2][1]cpu" = torch.ops.aten.copy_.default(arg0_1, getitem_1);  arg0_1 = copy_ = None

        # No stacktrace found for following nodes
        select_2: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 0)
        select_3: "f32[][]cpu" = torch.ops.aten.select.int(getitem_1, 0, 1);  getitem_1 = None
        return (select_2, select_3)
```

**B) Semantics of  auto_functionalize**
The new semantics of auto_functionalize is as the following:
1. For each base in all_bases, copy the base and create all_bases copies. (if a base is inplaced we do not need to copy it)
2. For each arg, regenerate the arg from the copy of its base using the view information above.
3. return the original foo output followed by the new bases.

**C) Re-inplace pass**
since auto_functionalize not copy the bases, what we actually inplace is the bases.
 (run just like before but on the beses instead of args).

1. For each base b in _all_bases check if there is any use of base (or its aliases/views) after auto_functionalize (before its overwritten with a copy) if there is not any, then inplace it (avoid copying it in step 1 above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134409
Approved by: https://github.com/zou3519
2024-09-04 17:08:58 +00:00
Avik Chaudhuri
9f00317997 rationalize STATIC vs. None (#134877)
Summary:
A bit of refactoring to prepare to remove `None` as a way to specify static dimensions in dynamic shapes, given we already have `Dim.STATIC` for the same purpose. We will now warn whenever this happens. However no tests were modified because problematic uses of `None` still need to behave as they do today, until we are ready to remove support. It should be easy to port tests by replacing the warning function to raise instead.

Note that other uses of `None`, such as for entire values (tensor or non-tensor) remain as is. Moving forward this should be the only purpose of `None` (at least externally).

Finally, there's a bit of confusion in our representation now because `AUTO` also internally transforms to `None`. Renamed dynamic_shapes to transformed_dynamic_shapes where this happens. Overall the two forms (pre and post transformation) have different properties so should probably not be represented in the same format in the future.

Test Plan: existing

Differential Revision: D62040729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134877
Approved by: https://github.com/pianpwk
2024-09-04 05:34:26 +00:00
Zhengxu Chen
a19a7524f6 [export] Make sure getitem replacement are synced with module call graph. (#134830)
Summary: When we are placing nodes in the graph, we should also replace the references in module_call_graph.

Test Plan:
buck2 run 'fbcode//mode/opt' torchrec/fb/ir/tests:test_serializer -- --filter-regex test_serialize_deserialize_vlea
buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_serialize_empty_value_vlea' --run-disabled

buck2 test 'fbcode//mode/opt' fbcode//torchrec/fb/ir/tests:test_serializer -- --exact 'torchrec/fb/ir/tests:test_serializer - torchrec.fb.ir.tests.test_serializer.TestSerializer: test_deserialized_device_vle' --run-disabled

Differential Revision: D62014035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134830
Approved by: https://github.com/angelayi
2024-08-30 16:47:05 +00:00
Laith Sakka
f5b0caee71 Rewrite unsafe_remove_auto_functionalized_pass using decompose_auto_functionalized (#134831)
`unsafe_remove_auto_functionalized_pass` can be written as using `decompose_auto_functionalized`, this way we do not have to update it each time we do a change to `auto_functionalize` (Ex https://github.com/pytorch/pytorch/pull/134409) , and we avoid duplicate logics implemented in two different ways.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134831
Approved by: https://github.com/zou3519
2024-08-30 16:27:53 +00:00
Pian Pawakapan
36a6516290 [export] use single FQN for param_buffer_mapping (#134500)
Fixes #133252

In strict mode, we have this routine for mapping traced parameters to their FQNs using tensor ids. Currently we assume there's at least 1 unique FQN for each traced parameter, but this seems to break with parameter reuse when call_module nodes are present. Adding a test case where this breaks.

Fixes this by assigning the same FQN to all traced parameters with the same tensor id. This is fine because we return the original state_dict for the EP, and the unflattener has its own routine of handling aliasing: https://github.com/pytorch/pytorch/pull/125758
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134500
Approved by: https://github.com/angelayi
2024-08-29 17:06:31 +00:00
Avik Chaudhuri
92e38a476f preserve aten::to device in export training (#134622)
Summary:
With training IR, we cannot rely on trapping `to()` in `FunctionalTensor` because the regular decomposition kicks it first, and that can cause it to be optimized away.

So instead we preserve it until we functionalize, and then replace it explicitly with `_to_copy()`.

Test Plan: expected test failures go away

Differential Revision: D61883878

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134622
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
2024-08-29 14:53:30 +00:00
Avik Chaudhuri
ca03a14cf7 hang dim hint constants off Dim (#134702)
Summary: Retry landing https://github.com/pytorch/pytorch/pull/134484

Test Plan: (see original)

Differential Revision: D61925860

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134702
Approved by: https://github.com/pianpwk
2024-08-29 01:02:01 +00:00
Tugsbayasgalan Manlaibaatar
6dd3f81aaf Add export_for_training as public API (#134677)
Differential Revision: [D61912084](https://our.internmc.facebook.com/intern/diff/D61912084)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134677
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
2024-08-28 22:32:10 +00:00
PyTorch MergeBot
13d40f6fc5 Revert "hang dim hint constants off Dim (#134484)"
This reverts commit c142af7209.

Reverted https://github.com/pytorch/pytorch/pull/134484 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/134484#issuecomment-2315749549))
2024-08-28 16:05:42 +00:00
Avik Chaudhuri
c142af7209 hang dim hint constants off Dim (#134484)
Summary: Recently https://github.com/pytorch/pytorch/pull/133620 added support for automatic dynamic shapes, where a new enum, `DIM`, was introduced to provide hints like `AUTO` and `STATIC`. This PR is a nominal change where we expose the hints via the existing public `Dim` API, and remove `DIM` from the public API. The main motivation is to avoid having users need to import too many things.

Test Plan: existing

Differential Revision: D61807361

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134484
Approved by: https://github.com/angelayi
2024-08-28 14:35:40 +00:00
Pian Pawakapan
5ead965026 [export] don't duck size for DIM.AUTO (#134486)
Summary: apparently DIM.AUTO leads to duck sizing, I didn't catch this. Doing the least intrusive fix possible by using `torch._dynamo.maybe_mark_dynamic()` under the hood.

Test Plan: added test

Differential Revision: D61809344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134486
Approved by: https://github.com/avikchaudhuri
2024-08-27 23:00:26 +00:00
Avik Chaudhuri
8db8ac700d line by line logging (#134298)
Summary:
Today there is no good mechanism to detect progress of non-strict export line-by-line in user code. This caused some pain recently in trying to find the exact line of user code that was triggering a bug where the process appeared stuck because deep down something was calling some symbolic shapes code that was suffering some exponential blowup.

This PR adds a environment variable for extended debugging that will log the line of user code corresponding to every torch function call. It only works in non-strict export for now. Prefix setting this environment variable with `TORCH_LOGS`  enabled for `export` logs at `DEBUG` level (i.e., with a `+` prefix), i.e.,.:

```
TORCHEXPORT_EXTENDED_DEBUG_CURRENT_LOC=1 TORCH_LOGS="+export" ...
```

This will show logs with something like:
```
...
prim::device called at .../example.py:4284 in foo
TensorBase.item called at .../example.py:4277 in bar
...
```

We already have an existing place to intercept torch functions where we process data-dependent errors in non-strict, so parking the logging there. An alternative place we could be doing this is where we add `stack_trace` metadata when generating code, but unfortunately at least the example that motivated this gets stuck before generating code, so that would be too late.

Test Plan: ran it on some sample commands

Differential Revision: D61692156

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134298
Approved by: https://github.com/angelayi
2024-08-25 02:57:11 +00:00
Pian Pawakapan
54ff320519 [export] refactor ExportGraphSignature construction (#134059)
Refactors construction of ExportGraphSignature object for export & training IR, explicitly creating AOTAutograd signature for training IR. This will be helpful for upcoming refactors for placeholder naming & runtime asserts prettifying.

Changes:
- dedups `make_argument_spec` call, moved to export/graph_signature.py
- `_sig_to_specs` wrapped into new function `_convert_to_export_graph_signature`, directly converts GraphSignature -> ExportGraphSignature
- `_make_fx_helper` explicitly creates AOTAutograd GraphSignature object
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134059
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2024-08-23 23:29:28 +00:00
Yiming Zhou
2cfc2da527 [export] Make move_to_device_pass function public (#134263)
Summary:
This is a follow-up of https://github.com/pytorch/pytorch/pull/133660

Here we make the `move_to_device_pass()` function publich so users can call it by `from torch.export.passes import move_to_device_pass`

Test Plan: CI

Differential Revision: D61671310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134263
Approved by: https://github.com/angelayi
2024-08-23 23:18:30 +00:00
Pian Pawakapan
8ff3a5be1b [export] basic auto dynamic shapes (#133620)
Starter version of automatic dynamic shapes for export.

Creates enums `DIM.AUTO`, `DIM.STATIC`, allowing user to specify `AUTO` for dims in dynamic_shapes specs, meaning that corresponding dims are treated as dynamic, and relevant guards will do what's necessary (e.g. refine ValueRanges, set replacements based on equality, or even set static) without raising ConstraintViolationErrors. Basically allows the user to say, "a bunch of these dims can be dynamic, let export do model analysis and return the program with maximum possible dynamism, without complaining".

The usage for specifying `dynamic_shapes` is now:
```
AUTO -> dynamic by default, return whatever produce_guards() says, even if it's static
None/int/STATIC -> static
Dim/DerivedDim -> same as before - will complain if the min/max range is invalid, or if dims related to this are unspecified.
```

Caveat 1: specifying `AUTO` for a dim won't guarantee it'll be dynamic:

- specifying `AUTO` for a dim will return the maximum possible dynamism given your program and other specified constraints, but this can still mean you'll get a static program. For example, with the program below, x is specified dynamic, but it's equal to y, which is specified static, and with how we currently do things we won't promote y to dynamic, but will demote(?) x to static. So this can be surprising if you don't fully know your model, and/or missed one of your other inputs when specifying auto-dynamic shapes.
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        return x + y
inputs = (torch.randn(6), torch.randn(6))
export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": None})
```

Caveat 2: specifying `AUTO` and Dims in the same spec is still problematic:

- The way Dims/DerivedDims are currently handled is very strict. A Dim represents a symbol, and we require a user to specify the symbol for all dims governed by the symbol - that's why we've seen errors in the past like `The values of x must always be related to y by ...`, asking the user to specify the exact relation as in the program. We also require the specified min/max range to be a subset of the valid range from model analysis. All this doesn't compose well with specifying `AUTO` just yet - for example in the program below, ideal behavior could be to return a dynamic program, where `dx = x.size(0) = y.size(0)` has range (3,6). Unfortunately this crashes, and correct behavior is to specify `dx` for both inputs. So currently we raise a UserError and crash if both Dims + `AUTO` are present in the spec.
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        return x + y
inputs = (torch.randn(6), torch.randn(6))
export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": {0: Dim("dx", min=3, max=6)}})  # this doesn't work, because x & y and related
```

Implementation details:

This is done by setting `assume_static_by_default=False`, and doing a transform on the `dynamic_shapes` spec to preserve semantics. `assume_static_by_default=False` will treat unspecified dims or Nones as dynamic. This is the opposite of what `export.export()` currently does - unspecified Dims/Nones are treated as static. Historically this static-by-default behavior, where the user deals with fewer guards, has been desirable, and we would like to respect that in this implementation. So this internal spec transformation is added, `_transform_shapes_for_default_dynamic()`, does the spec conversion necessary to be compatbile with dynamic by default. Specifically, AUTOs are converted into Nones, and Nones/unspecified dims are filled in with explicitly static constraints.

For example, this would look like, for a 3-d tensor: `{0: DIM.AUTO, 1: None, 2: Dim("dx")} -> {0: None, 1: 32, 2: Dim("dx")}`

This does seem overly complicated, but it's done to preserve dynamic shapes semantics for `torch._dynamo.export()`, which already uses `assume_static_by_default=False`, and follows the same process for generating shape constraints , via `_process_dynamic_shapes`. There the semantics are:
```
None/unspecified: dynamic by default
Dim/DerivedDim: also a strict assertion
```

If we don't care about BC for `_dynamo.export(dynamic_shapes)`, then we can just modify semantics for `_process_dynamic_shapes()` and change all the relevant tests in `test/dynamo/test_export.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133620
Approved by: https://github.com/avikchaudhuri
2024-08-23 22:56:39 +00:00
Angela Yi
f5a2a22dc4 [export] Fix unflattener to respect nn.Parameter requires_grad (#134353)
Summary: Fixes P1539870235

Test Plan: CI

Differential Revision: D61726403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134353
Approved by: https://github.com/pianpwk
2024-08-23 22:49:34 +00:00
Avik Chaudhuri
b454c51060 remove dynamic_dim (#134211)
Summary: As promised in https://github.com/pytorch/pytorch/pull/134045.

Test Plan: existing

Differential Revision: D61646937

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134211
Approved by: https://github.com/angelayi
2024-08-23 04:13:03 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So landing these 'type: ignore' for pytorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
Avik Chaudhuri
0d7ac1966a kill sharing of constraints (#134045)
Summary:
Previously, reuse of the same `Dim` was encoded by "sharing" internal constraints among constraint targets. This kind of sharing, implemented using `shared` fields between `_Constraint`s, was originally motivated by `dynamic_dim`, specifically to support `==` between `dynamic_dim`s, but we no longer need to maintain this overcomplicated structure: we can simply use names of `Dims` to directly encode sharing information.

Thus this PR vastly simplifies the structure of `_Constraint` by removing `shared` fields. As a result, both `_Constraint` and its moral subclass, `_DerivedConstraint`, are 1-1 with `Dim` and its moral subclass, `DerivedDim`.

Note that this will break `==` over `dynamic_dim`, so an immediate follow-up will be to remove `dynamic_dim` entirely from our public API. (It's been more than 6 months since the deprecation warning anyway.) I just didn't want to deal with that process in the same PR.

Test Plan: existing

Differential Revision: D61559413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134045
Approved by: https://github.com/pianpwk
2024-08-22 04:40:47 +00:00
Zhengxu Chen
3ef1cc8583 [export] Implement common_getitem_elimination pass. (#133618)
Summary:
In export, we will generate many redundant getitem nodes branching from the same source, inserted by runtime assertions or any passes. This is causing issues with any downstream system relying on any value being uniquely defined by a single node.

I don't think it hurt to remove a bunch of getitem nodes only, so I just added to the ctor.

Test Plan:
rebase on D61256937
```
buck2 run scripts/bearzx:pt2_export_playground
```

Differential Revision: D61351578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133618
Approved by: https://github.com/tugsbayasgalan
2024-08-21 16:48:24 +00:00
PyTorch MergeBot
49f6ea6dd9 Revert "[report_exportability] Avoid re-exporting duplicated modules (#133930)"
This reverts commit 278bc985d7.

Reverted https://github.com/pytorch/pytorch/pull/133930 on behalf of https://github.com/izaitsevfb due to breaks lint ([comment](https://github.com/pytorch/pytorch/pull/133930#issuecomment-2299513046))
2024-08-20 18:44:09 +00:00
Sherlock Huang
278bc985d7 [report_exportability] Avoid re-exporting duplicated modules (#133930)
Summary:
Skip re-exporting modules with the duplicated types to speed up the exportability tests.

In real models, there are many duplicated modules, and mostly have the same export issues.

Test Plan: Existing CI

Differential Revision: D61504630

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133930
Approved by: https://github.com/angelayi

Co-authored-by: bearzx <bearzx@fb.com>
2024-08-20 18:20:49 +00:00
Avik Chaudhuri
b0bafd2be5 remove tensor weak ref from constraint target (#133890)
Summary: `_ConstraintTarget` is an internal data structure that has some redundancy: tensors are identified by their id but also carry a weak reference. The weak reference was probably useful a year back but everything is done with ids right now, and the lifetime of these tensors ensures that using their ids is OK.

Test Plan: existing tests

Differential Revision: D61488816

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133890
Approved by: https://github.com/tugsbayasgalan
2024-08-20 03:03:05 +00:00
Justin Chu
271ee90851 [easy] Fix type annotation for ExportedProgram.run_decompositions (#133720)
Fix the tuple type annotation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133720
Approved by: https://github.com/Skylion007
2024-08-16 22:11:42 +00:00
Pian Pawakapan
a75248528f [export] refactor _process_dynamic_shapes (#133391)
Sorryyyyy for another refactor. This splits `_process_dynamic_shapes` into 3 parts:
1. `_combine_args` - mostly the same thing
2. `_check_dynamic_shapes`, which is responsible for raising 99% of UserErrors if the dynamic shapes spec is invalid (minus 1 UserError with DerivedDims)
3.  `_process_dynamic_shapes`, which for now, is the same thing, minus the stuff in 2.

This refactor is helpful for incoming automatic dynamic shapes work, because, we're switching to `assume_static_by_default=False`, which is what `_dynamo.export` currently does. This means any unspecified dims are allocated a symbol, in contrast to export today which keeps unspecified dims static. Historically this has been desirable - export users don't want too much dynamism. So we want to change how the spec is translated into constraints.

This means when we switch over to automatic dynamic shapes, we want to plug in something in between steps 2. and 3. which patches up the spec for `assume_static_by_default=False`, filling in static shapes for any unspecified dims, and potentially clearing out the auto-dynamic dims (since they're no-ops). We would do this in-between 2. and 3. to keep `_process_dynamic_shapes` semantically the same, since it's used with `_dynamo.export`.

We could do this without a refactor, plugging in this transform before `_process_dynamic_shapes`, but since that function's responsible for both spec checking + constraint production, moving spec checking to before we transform the specs helps guarantee we're raising errors on what the user's specified, and not an internal export bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133391
Approved by: https://github.com/avikchaudhuri
2024-08-15 16:21:21 +00:00
Zhengxu Chen
f23dbefe52 [export] Support "custom" metadata field. (#131912)
Summary:
Add a special field in Graph and Node level metadata called "custom" which should be mapped to a json-serializable object, and we guarantee this field should be always preversed across the following transformations:
1. copy/deepcopy
2. run_decompositions()
3. serialization
4. re-exporting

Test Plan: :test_export -- -r custom_tag

Reviewed By: angelayi

Differential Revision: D60291839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131912
Approved by: https://github.com/angelayi
2024-08-14 01:09:01 +00:00
Pian Pawakapan
40061bd61e [export] overwrite placeholder names when deepcopying (#133269)
In joint-graph export we have a `copy.deepcopy(ep.graph_module)` call. This turns out to be an imperfect deepcopy, because deepcopy allows objects to overwrite their `__deepcopy__` methods. For fx.Graph, this ends up deferring to `Graph.create_node()`, which checks the graph namespace, and can avoiding copying the exact name in niche examples, like where the name is a Python keyword (e.g. `input` gets renamed to `input_1`).

Names like `input` happen because export's placeholder naming pass overwrites what the namespace creates, based on the model's `forward()` signature. So we can either 1) avoid overwriting such cases, which requires rewriting the naming pass logic, or 2) force another overwrite after deepcopying. This goes with 2).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133269
Approved by: https://github.com/zhxchen17, https://github.com/dvorjackz, https://github.com/ydwu4
2024-08-13 10:20:43 +00:00
Yidi Wu
c44cb89e06 [export] detach constant tensors when they're not registered as buffer or parameter in unlift (#133031)
Summary:
Fixes T198245910.

In  previous diff D60532628 that causes the test failure, we fix the  in-consistency caused by constant tensors is accidentally reigistered as buffer by deleting the buffer and re assign them as constant.

However, this broke several existing tests in pyspeech when the exported program is re-traced with torch.jit.trace (which is an anti-pattern we probably should have some alignment), the jit tracer finds this constant tensor requiring grad and errors out.

This PR force constant attr not requiring grad, which is the correct behavior. A better fix is finding out where the constants are created in user code and why it requires grad. But this has low roi so we warn user about it.

Test Plan: See failures in T198245910.

Differential Revision: D60974869

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133031
Approved by: https://github.com/angelayi
2024-08-09 20:33:52 +00:00
Avik Chaudhuri
22ea248aa8 dynamic shapes mismatch errors (#132982)
Summary: When PyTree detects a structural mismatch between inputs and dynamic shapes, the error messages are quite horrible. This PR fixes these error messages by adding, for each kind of error, the path to the point where the error happens and an actionable reason for the error.

Test Plan: added test with several cases

Differential Revision: D60956976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132982
Approved by: https://github.com/yushangdi
2024-08-09 02:22:32 +00:00
Shangdi Yu
3c5b246d3c [export] Remove Proxy from exported programs and modules (#132956)
Summary: Remove Proxy from exported programs and modules because they cannot be deepcopied or pickeled.

Test Plan:
CI

```
buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test/quantization:test_quantization -- -r  qat_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r  test_fold_bn_erases_bn_node
```

Differential Revision: D60940832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132956
Approved by: https://github.com/angelayi
2024-08-09 00:00:20 +00:00
Edward Z. Yang
1f66487c69 [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770
Approved by: https://github.com/bdhirsh
2024-08-08 23:07:23 +00:00
PyTorch MergeBot
d1f73fd844 Revert "[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)"
This reverts commit 902c6f3a19.

Reverted https://github.com/pytorch/pytorch/pull/132770 on behalf of https://github.com/ezyang due to Removed API was recommitted ([comment](https://github.com/pytorch/pytorch/pull/132770#issuecomment-2275749689))
2024-08-08 12:54:34 +00:00
Edward Z. Yang
902c6f3a19 [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770
Approved by: https://github.com/bdhirsh
ghstack dependencies: #132674, #132675, #132421, #132062, #132767, #132769
2024-08-08 12:03:25 +00:00
angelayi
a270800f0b [export][reland] Add print_readable to unflattened module (#132817)
Reland https://github.com/pytorch/pytorch/pull/128617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132817
Approved by: https://github.com/pianpwk
2024-08-08 06:05:30 +00:00
Angela Yi
45d0e90bd3 [export] Allow str outputs (#132808)
Summary: Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1478413606130179/

Test Plan: CI

Differential Revision: D60850712

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132808
Approved by: https://github.com/ydwu4
2024-08-08 02:20:59 +00:00
Yidi Wu
bbf568aac8 Split of "[reland] [export] fix zero arg export in training_ir and constant tensor handling" (#132307)
Summary:
A re-land of D60006710.
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.

edit: also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.
The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design

edit 2: Also fix the inconsistency of graph signatures when param_constant is marked as lifted_tensor_constants but it's registered as parameters in the output of ep.module().

Differential Revision: D60532628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132307
Approved by: https://github.com/zhxchen17
2024-08-08 01:36:16 +00:00
angelayi
c327710a87 [export] Publicize validate function (#132777)
as titled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132777
Approved by: https://github.com/zhxchen17
2024-08-07 23:10:05 +00:00
Shangdi Yu
825002c9c6 [export][fx] More robust DCE pass (#132764)
Summary:
- make default DCE pass check schema,
- need to rebase onto https://github.com/pytorch/pytorch/pull/131651 after it's in phabricator (for now the change is manually added).

- mark Proxy dump as NotImplemented for better error msg

- Remove Proxy from tensors when dumping models, as Proxy cannot be dumped.

More details in https://docs.google.com/document/d/1G5vmTXjzxoyVGRI2kpA1gQukK_Glyg2NrE0Oh6Nlg9A/edit?usp=sharing.

Test Plan:
CI
```
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test/quantization:test_quantization -- -r  qat_conv2d
- test_export.py
- buck2 run 'fbcode//mode/dev-nosan' fbcode//modai/test:test_modai -- -r test_qat_stinson_htp_export
- buck2 run 'fbcode//mode/dev-nosan' fbcode//vizard_projects/ml_depth/tests:test_model -- -r test_qat_model_et
- buck2 run 'fbcode//mode/dev-nosan'  fbcode//caffe2/test:fx -- -r dce
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=False,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/backends/tests:qnn_test -- -r test_qat_bias=True,use_3d_input=False
- buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r  test_fold_bn_erases_bn_node
```

Reviewed By: angelayi

Differential Revision: D60319175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132764
Approved by: https://github.com/angelayi
2024-08-06 22:27:22 +00:00
Tugsbayasgalan Manlaibaatar
775c310c0c Preserve source_fn_stack in the training IR decomp (#132033)
Title

Differential Revision: [D60377712](https://our.internmc.facebook.com/intern/diff/D60377712/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132033
Approved by: https://github.com/angelayi
ghstack dependencies: #131988, #131995, #131999
2024-08-06 19:45:40 +00:00
Shangdi Yu
4a2cf50edf [export][reland] Convert autocast to HOO (#132677)
Summary:
Reland of D60206382.

Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_autocast"
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:test_export -- -r "test_predispatch_set_grad"
```

Verified that now we can export the llama model in  gh issue 128394 and the gemma model in  gh issue 131829 without error.

Differential Revision: D60770038

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132677
Approved by: https://github.com/angelayi
2024-08-05 22:34:52 +00:00
PyTorch MergeBot
a3ea96b762 Revert "[export] Convert autocast to HOO (#131914)"
This reverts commit aec948adfc.

Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/davidberard98 due to PR shouldn't have been relanded by the bot, phabricator diff did not have any recent changes and is still internally reverted ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2269797388))
2024-08-05 19:52:09 +00:00
Shangdi Yu
aec948adfc [export] Convert autocast to HOO (#131914)
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
parsh --build-flags fbcode//mode/dev-nosan  fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```

Reviewed By: angelayi

Differential Revision: D60206382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
2024-08-05 18:52:12 +00:00
Yidi Wu
618e2c9de4 fix torch rec test failure (#132437)
Summary: Fixes T192448049. The module call form an unusal call stack for the nodes: https://www.internalfb.com/phabricator/paste/view/P1507230978. This is currently not supported by unflattener and need some extra design to make it work.

Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_fpebc_non_strict_export"

Reviewed By: zhxchen17

Differential Revision: D60528900

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132437
Approved by: https://github.com/Skylion007
2024-08-05 18:06:07 +00:00
Xuehai Pan
f3fce597e9 [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769
Approved by: https://github.com/ezyang
2024-08-04 10:24:09 +00:00
PyTorch MergeBot
d984105748 Revert "[export] Convert autocast to HOO (#131914)"
This reverts commit b28c01d90d.

Reverted https://github.com/pytorch/pytorch/pull/131914 on behalf of https://github.com/ezyang due to Failing lint, but was covered up by master failure on lint ([comment](https://github.com/pytorch/pytorch/pull/131914#issuecomment-2267248773))
2024-08-04 02:10:35 +00:00
Shangdi Yu
b28c01d90d [export] Convert autocast to HOO (#131914)
Summary:
Suggested in https://github.com/pytorch/pytorch/issues/128394.

If there's an autocast context manager, the predispatch (strict) graph can look something like:

```
class <lambda>(torch.nn.Module):
    def forward(self, x: "f32[1]"):
        ...
        _enter_autocast = torch.amp.autocast_mode._enter_autocast('cuda', torch.bfloat16, True, None)
        mm: "f32[8, 8]" = torch.ops.aten.mm.default(rand, rand_1);  rand = rand_1 = None
        _exit_autocast = torch.amp.autocast_mode._exit_autocast(_enter_autocast);  _enter_autocast = None
        return (mm_1,)
```

But the operator `torch.amp.autocast_mode._enter_autocast` is not a valid ATen op. We remove these nodes by turning autocast into a higher order operator and make a submodule for the blocks between `_enter_autocast` and `_exit_autocast`.

Some potential followup improvement:
1) Merge some of the duplicated logic with `replace_set_grad_with_hop_pass.py`
2) Check the current autocast status (any enabled? dtype?) and not create a submodule if the autocast args matches current autocast status.

Test Plan:
CI

```
parsh --build-flags fbcode//mode/dev-nosan  fbcode//caffe2/test:test_export
run_tests("test_predispatch_autocast")
```

Reviewed By: angelayi

Differential Revision: D60206382

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131914
Approved by: https://github.com/angelayi
2024-08-03 05:48:57 +00:00
Avik Chaudhuri
ed4493de0e dim name is identifier (#132557)
Summary: Dim names appear in suggested fixes so should be valid Python identifiers.

Test Plan: none

Differential Revision: D60696854

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132557
Approved by: https://github.com/pianpwk
2024-08-03 05:28:50 +00:00
Shangdi Yu
a503136583 [export] Detect whether case_name is registered in exportdb (#132420)
Summary:
- moves logging functionalities into `torch/_export/db/logging.py` file.
- add a check in `_dynamo/eval_frame.py` to check for optional input and error out with `UnsupportedError`
- change the case name of `torch_sym_int` to `unsupported_operator`
- Check if the case name is registered in exportdb, if so, we give a link to the case in exportdb.
- TODO: add test

Test Plan:
CI

Running the example in https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input gives the following error logging:

```
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086] Parameter y is optional with a default value of tensor([[-0.1633,  1.2414, -0.1071],
E0730 10:53:33.687000 4155538 torch/_dynamo/eval_frame.py:1086]         [-0.1936, -0.9425, -0.0824]])
E0730 10:53:33.688000 4155538 torch/export/_trace.py:1043] See optional_input in exportdb for unsupported case.                 https://pytorch.org/docs/main/generated/exportdb/index.html#optional-input
......
  File "/data/users/shangdiy/fbsource/buck-out/v2/gen/fbcode/389acaeb40d57230/tutorials/pytorch/nntest/__torchtest__/torchtest#link-tree/torch/_dynamo/eval_frame.py", line 1091, in produce_matching
    raise Unsupported(
torch._dynamo.exc.Unsupported: Tracing through optional input is not supported yet
```

It also logs a `export.error.classified` event in Scuba.

Reviewed By: zhxchen17

Differential Revision: D60427208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132420
Approved by: https://github.com/zhxchen17
2024-08-03 01:08:48 +00:00
PyTorch MergeBot
3855ac5a5d Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit ab9791c0e3.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/angelayi due to never got landed internally due to weird flow... sorry ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2264224466))
2024-08-01 23:47:29 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
Tugsbayasgalan Manlaibaatar
928adb7cc2 Fix empty fake mode problem (#131995)
Title

Differential Revision: [D60348541](https://our.internmc.facebook.com/intern/diff/D60348541/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131995
Approved by: https://github.com/angelayi
ghstack dependencies: #131988
2024-08-01 04:55:37 +00:00
Tugsbayasgalan Manlaibaatar
073430ebea Don't check for autograd state when lowering to inference IR (#131988)
When lowering to inference IR, we shouldn't error on autograd state changes because we will have preserved the autograd state change at the training level. I think the more correct way of implementing it would be to wrap autograd ops in HOP before decomposing, but that seems low ROI.

Differential Revision: [D60346235](https://our.internmc.facebook.com/intern/diff/D60346235/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131988
Approved by: https://github.com/angelayi
2024-08-01 04:15:37 +00:00
angelayi
ab9791c0e3 [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-07-30 00:41:44 +00:00
Avik Chaudhuri
c49e857d32 [pt] immutable accessors in graph signature (#131940)
Summary: splitting PT part of D60253955

Test Plan: existing tests

Differential Revision: D60296909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131940
Approved by: https://github.com/angelayi, https://github.com/zhxchen17
2024-07-27 05:32:53 +00:00
Yidi Wu
404a8ae8f6 [export] fix set_grad x tensor constant. (#131787)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/130379.

The original error is verifier finds that the placeholder nodes' meta[''val"] are missing in subgraph of WrapSetGradEnabled hop.

In this PR, we fixed it by re-ordering the replace_set_grad_with_hop_pass with lift_constant_tensor pass because only after lift_constant_pass, all the constant attrs start to have meta["val"].

Test Plan: buck2 test test:test_export -- -r "test_setgrad_lifted_tensor"

Differential Revision: D60244935

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131787
Approved by: https://github.com/yushangdi
2024-07-26 16:41:59 +00:00
Zhengxu Chen
7feaa73057 [export] Remove deprecated fields from ExportedProgram ctor. (#131697)
Summary: as title.

Test Plan: CI

Reviewed By: SherlockNoMad

Differential Revision: D60078426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131697
Approved by: https://github.com/ydwu4
2024-07-26 16:19:46 +00:00
PyTorch MergeBot
7339c8ab28 Revert "immutable accessors in graph signature (#131807)"
This reverts commit 6fd28fc228.

Reverted https://github.com/pytorch/pytorch/pull/131807 on behalf of https://github.com/atalman due to Broke CI: [GH job link](https://github.com/pytorch/pytorch/actions/runs/10111847569/job/27965364355) [HUD commit link](608057afe2) ([comment](https://github.com/pytorch/pytorch/pull/131807#issuecomment-2252875417))
2024-07-26 14:21:12 +00:00
Avik Chaudhuri
6fd28fc228 immutable accessors in graph signature (#131807)
Test Plan: existing tests

Differential Revision: D60253955

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131807
Approved by: https://github.com/ydwu4
2024-07-26 08:56:19 +00:00
Avik Chaudhuri
2bf649f5ae suggested fix for data-dependent error (#125378)
Suggests fixes for data-dependent errors in non-strict export.

Any data-dependent error has an unresolved condition on unbacked symints. A mechanizable strategy for fixing such errors, which this PR enables, is to "bash" them using `torch._check()`s. For each error we suggest using `torch._check()` on the condition or its negation. The user selects and copy-pastes the suggested fix and continues.

For example, here's an existing data-dependent error message with the suffix following `<snip>...</snip>` added by this PR:
```
Could not guard on data-dependent expression Eq(u2, u1) (unhinted: Eq(u2, u1)).  (Size-like symbols: u1)

<snip>...</snip>

User code:
  File "test/export/test_export.py", line 1944, in forward
    return r.view(items[0], items[2])

Suggested fixes (please choose one of the following):
  1. torch._check(items[2] == r.shape[1])
  2. torch._check(items[2] != r.shape[1])"
```

Tests in this PR illustrate this workflow, by taking common examples of data-dependent errors and bashing them until success, purely based on suggested fixes. In particular, we test this workflow on the "puzzlers" in https://www.internalfb.com/intern/anp/view/?id=5330476 (thanks @ezyang).

In terms of implementation, we focus on non-strict mode, where we can intercept torch function calls to install a handler that walks up the stack from the error, finding the closest non-torch frame and inspecting its locals for symints appearing in the error. The suggested fixes then access these symints through the local variables so that they can be (a) easily understood by the user (b) directly added to the code.

Implementing this idea in strict mode is follow-up work—we have already investigated what it would take, and decided to separate it out of this PR for reasons described next.

It's not too hard to map symints to locals in Dynamo (although it needs to happen elsewhere, i.e., intercepting torch function calls won't work). However, unfortunately this doesn't seem to be enough; the graph modules created by Dynamo when going through AOTAutograd can raise further data-dependent errors in some cases, and thus we need yet another mechanism to map symints to locals for graph modules, via captured source-level metadata and FX node walking. This latter component will require some care to build properly, or we might conclude it is altogether unnecessary and fix Dynamo instead.

Differential Revision: D56867432

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125378
Approved by: https://github.com/ezyang
2024-07-26 08:34:50 +00:00
Avik Chaudhuri
5b05ad9697 fix non-persistent buffers (#131756)
Summary:
Dynamo doesn't track whether buffers are `persistent`. This led to some ugly code where we would mark buffers as always persistent when creating signatures, then later check whether the buffers were not in the state dict to infer whether they were non-persistent, and use this to fix up the signature.

This PR instead defines a utility to look up all the non-persistent buffers registered inside a module (this information is recorded in a private `_non_persistent_buffers_set` module attribute), and uses it to (a) correctly set the persistent flag on buffers when creating signatures (b) transfer this information to a Dynamo-traced graph module, which then causes non-persistent buffers to (correctly) not show up in the state dict.

Test Plan: existing tests + new case with non-persistent buffer in nested module

Differential Revision: D60224656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131756
Approved by: https://github.com/zhxchen17, https://github.com/ydwu4
2024-07-26 04:45:30 +00:00
Yidi Wu
2c1851f04e [export] fix output node's meta (#131706)
Summary:
This pr fixes all the places in strict export stack where the output node's meta is not preserved correctly. However, we're getting a new error for the test we intend to fix: `buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle"`:

The `get_attr` nodes has wrong metadata. I guess there are more things need to be fixed to get it working but it's beyond the scope of this PR.

Test Plan: buck2 run caffe2/test/quantization:test_quantization -- -r "test_re_export_preserve_handle"

Differential Revision: D60198221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131706
Approved by: https://github.com/yushangdi
2024-07-25 18:44:21 +00:00
Yidi Wu
75c4176b05 [export][BE] consolidate export and export_for_training (#131496)
Summary: This PR consolidates the implementation of export and export_for_training to maximize code re-use. Also add some type annotations and comments in the code for better readability.

Test Plan: Existing tests.

Differential Revision: D60130515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131496
Approved by: https://github.com/avikchaudhuri, https://github.com/pianpwk
2024-07-25 16:35:16 +00:00
Shangdi Yu
6bc8db1d32 Rename is_training flag to have more information (#131618)
Summary: rename is_training flag into dispatch_tracing_mode = “make_fx” or “aot_export”

Test Plan: OSS CI

Differential Revision: D60154327

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131618
Approved by: https://github.com/ydwu4
2024-07-25 16:29:55 +00:00
Shangdi Yu
29c9f8c782 [export] Fix graph_break log registration error when importing export/_trace.py (#131523)
Summary:
When importing `_trace.py`, put `torch._dynamo.exc.Unsupported` in the global variable ``_ALLOW_LIST`` can cause import to ``export/_trace.py`` to fail with error:

ValueError: Artifact name: 'graph_breaks' not registered, please call register_artifact('graph_breaks') in torch._logging.registrations.

The error is directly raise on line `graph_breaks_log = torch._logging.getArtifactLogger(__name__, "graph_breaks")` in `_dynamo/exc.py`. I've checked that ``register_artifact('graph_breaks')`` does already exist in torch._logging.registrations.

Explicitly call `import torch._logging` doesn't fix the issue.

(see T196719676)

We move ``_ALLOW_LIST`` to be a local variable.

Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test -- --exact 'aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test - test_serialized_model_for_disagg_acc (aiplatform.modelstore.publish.utils.tests.fc_transform_utils_test.PrepareSerializedModelTest)'

buck2 test 'fbcode//mode/opt' fbcode//aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test -- --exact 'aiplatform/modelstore/publish/utils/tests:fc_transform_utils_test - test_serialized_test_dsnn_module (aiplatform.modelstore.publish.utils.tests.fc_transform_utils_test.PrepareSerializedModelTest)'

Differential Revision: D60136706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131523
Approved by: https://github.com/zhxchen17
2024-07-24 22:40:24 +00:00
Avik Chaudhuri
83d19620f6 kill tmp _is_executorch flag (#131488)
Test Plan: existing tests

Differential Revision: D60126186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131488
Approved by: https://github.com/ydwu4
2024-07-24 08:51:37 +00:00
Aaron Orenstein
5a0068cc69 [BE] mypy: disallow untyped decorators (#131428)
Untyped decorators strip the types from their decorated function so even if the underlying function is fully typed then callers to it don't get any benefit from type annotations.

Step 1 - Enable the error and override in all the offending files.

#131429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131428
Approved by: https://github.com/justinchuby, https://github.com/oulgen
2024-07-23 21:50:55 +00:00
Avik Chaudhuri
94f22eb6b2 refactor post-trace fakification in strict (#131421)
Summary:
Previously it was unclear what `_convert_input_to_fake` actually does (used in strict), and in particular how it is different from `make_fake_inputs` (used in non-strict).

This PR splits that function to work purely on user inputs, then renames it to `extract_fake_inputs` and adds a comment clarifying what it does—namely, it extracts fake inputs from a given graph module instead of "converting inputs to fake inputs" (as suggested by the current name) or "making fake inputs" (as happens in non-strict, where no tracing has taken place yet).

The remainder of that function used to also fakify params and buffers. It turns out that this part is identical to what happens in non-strict, hence we also pull `make_fake_inputs` out from `non_strict_utils` into `_trace`, merge it with another util, and make both modes call it.

Test Plan: existing tests

Differential Revision: D60084442

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131421
Approved by: https://github.com/zhxchen17
2024-07-23 18:23:03 +00:00
Shangdi Yu
cfb9ccab6c [export] Filter errors by exception type, add case name (#131327)
Summary:
-  Log export errors to Scuba and mark them with "classified" and "unclassified"
- Classify errors by exception type (ALLOW_LIST) and a `case_name` attribute
- Add `case_name` for some exceptions.

Test Plan:
Running the code below logs a classified error to `torch_export_usage` table in Scuba.

```
import torch

from torch._export.db.case import SupportLevel

class TorchSymMin(torch.nn.Module):
    """
    torch.sym_min operator is not supported in export.
    """

    def forward(self, x):
        return x.sum() + torch.sym_min(x.size(0), 100)

example_args = (torch.randn(3, 2),)
tags = {"torch.operator"}
support_level = SupportLevel.NOT_SUPPORTED_YET
model = TorchSymMin()

torch.export.export(model, example_args)
``

Differential Revision: D59981459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131327
Approved by: https://github.com/zhxchen17
2024-07-23 18:01:13 +00:00
Sherlock Huang
c1ef214046 Print ExportedProgram without color by default (#131399)
Summary:
Without plugin, colored ExportedProgram is not really readable.

![image](https://github.com/user-attachments/assets/319920a9-bb4b-4ad2-bcac-0c4f76973b11)

Test Plan: CI

Differential Revision: D60074481

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131399
Approved by: https://github.com/angelayi
2024-07-23 16:41:55 +00:00
Shangdi Yu
29e2e2afb6 Revert D59561509: Multisect successfully blamed "D59561509: [FX][export] DCE pass, check schema for node impurity (#130395)" for one test failure (#131341)
Summary:
This diff reverts D59561509
D59561509: [FX][export] DCE pass, check schema for node impurity (#130395) by yushangdi causes the following test failure:

Tests affected:
- [cogwheel:cogwheel_mtia_cmf_m5_shrunk_test#test_flow_with_verification](https://www.internalfb.com/intern/test/844425041436985/)

Here's the Multisect link:
https://www.internalfb.com/multisect/6533402
Here are the tasks that are relevant to this breakage:
T191383430: 10+ tests unhealthy for ads_mtia_inference

The backout may land if someone accepts it.

If this diff has been generated in error, you can Commandeer and Abandon it.

Test Plan: NA

Differential Revision: D60029318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131341
Approved by: https://github.com/angelayi
2024-07-23 05:23:47 +00:00
Avik Chaudhuri
1e5ecc4277 move save/load from _export to export (#131353)
Test Plan: existing tests

Differential Revision: D60053905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131353
Approved by: https://github.com/angelayi
2024-07-23 00:48:28 +00:00
angelayi
26f7dd286b [export] Allow non-CIA ops to be preserved (#131075)
I feel like the semantics of `run_decompositions(preserve_ops,...)` should be that we should always preserve whatever operator is put into `preserve_ops`, even if it's not CIA?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131075
Approved by: https://github.com/bdhirsh
2024-07-23 00:41:48 +00:00
PyTorch MergeBot
b9912f31ef Revert "[export] fix zero arg export in training_ir (#130990)"
This reverts commit 50436d5bdb.

Reverted https://github.com/pytorch/pytorch/pull/130990 on behalf of https://github.com/clee2000 due to failing some executorch and torchrec tests internally D60006710 ([comment](https://github.com/pytorch/pytorch/pull/130990#issuecomment-2243395316))
2024-07-22 16:49:25 +00:00
Yidi Wu
50436d5bdb [export] fix zero arg export in training_ir (#130990)
Fixed TrainingIRToRunDecomp failures for test_tensor_attribute_zero_args and also a few re-tracability failures because run_decomposition does a retracing.

**edit:** also remove the eliminate_dead_code() in _unlift because of one onnx test failure:
a constant tensor attr was lifted as constant_tensor input but it's not used in the graph after aot_autograd due to a short cut in its decomposition. This causes the setattr to be removed by eliminate_dead_code but the graph signature still contains the name of that buffer, which causes an inconsitency between the transformed graph and ep's original signature after _unlift. And it seems that this has happened a few times where some nodes are accidentally removed and we're in an inconsistent state.

The alternative of removing it would be: every time we call elimiate_dead_code, we verify the consistency of the graph with 1. the graph before transformation and 2. all the meta datas but i think this deserves a complete design.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130990
Approved by: https://github.com/pianpwk
2024-07-20 02:35:13 +00:00
Pian Pawakapan
745324e487 [export] turn on hybrid symints by default (#130775)
Sets `prefer_deferred_runtime_asserts_over_guards=True` for export, so any guards emitted from `SymNode.expect_true` (for example, guards that are implicitly required to be true for an op to succeed) won't lead to constraint violations. Instead these should appear in the graph as runtime asserts, or potentially as replacement expressions for placeholder shapes.

For example, this reshape op should emit s0 * s1 = s2, deferred as a runtime assert.
```
x = torch.randn(4, 8)  # [s0, s1]
y = torch.randn(32)  # [s2]
out = x.reshape(-1) + y
# this emits Eq(s0 * s1, s2), and we represent y's shape as [s0*s1] in the graph.
```

However, other complex guards can still cause export to fail, for instance guards emitted from `SymNode.guard_bool/guard_size_oblivious` (e.g. explicit if-else conditions in user code or lower-level op implementations hit during tracing) can still raise constraint violations. These can be deferred with `allow_complex_guards_as_runtime_asserts=True`. We don't yet make this default, because while this makes export more likely to succeed, it results in non-trivial asserts being emitted that often represent specialization to a variant of the op, or checks related to 0/1 specialization.

We also remove forced specializations for export and kill the `_disable_forced_specializations` flag - now any guard we can't express with Dims/DerivedDims either are handled with Hybrid SymInts, or should be resolved with rewriting or deferring.

Follow up:
Currently, `ShapeEnv._set_replacement()` is called for complex equality expressions (e.g. s2 -> s0*s1 in the example above), and the ExportedProgram stores `s0*s1` in the input placeholder. This isn't checked for validity when the program is run, so an option is to avoid replacement and/or runtime assert on equality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130775
Approved by: https://github.com/avikchaudhuri
2024-07-18 17:40:58 +00:00
Zhengxu Chen
5484c86021 [export] Fully support extension op in serialization/deserialization. (#130851)
Summary: Finishing up the mechanism to "register" certain types of operators to a registry so that the serializer can handle them correctly. This is expected to be firstly used by executorch.

Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_export_with_extension_op_serialization

Differential Revision: D59825148

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130851
Approved by: https://github.com/angelayi
2024-07-18 16:47:53 +00:00
Shangdi Yu
27ded03545 [FX][export] DCE pass, check schema for node impurity (#130395)
Change the default DCE pass to check node schema for impure nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395
Approved by: https://github.com/angelayi, https://github.com/jgong5
2024-07-18 16:31:40 +00:00
PyTorch MergeBot
d6ae8bbf16 Revert "[export] Add print_readable to unflattener (#128617)"
This reverts commit 9fee87e4cd.

Reverted https://github.com/pytorch/pytorch/pull/128617 on behalf of https://github.com/clee2000 due to broke inductor/test_flex_attention https://github.com/pytorch/pytorch/actions/runs/9984688318/job/27595182606 433ef4e444 Not run on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/128617#issuecomment-2236867975))
2024-07-18 15:31:51 +00:00
angelayi
6c2c8ee15b [export] Remove preserved ops from decomp list (#130970)
Fixes https://fb.workplace.com/groups/1075192433118967/permalink/1466016147369925/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130970
Approved by: https://github.com/bdhirsh
2024-07-18 05:15:22 +00:00
PyTorch MergeBot
433ef4e444 Revert "[FX][export] DCE pass, check schema for node impurity (#130395)"
This reverts commit e22b0acc76.

Reverted https://github.com/pytorch/pytorch/pull/130395 on behalf of https://github.com/yushangdi due to breaking tests, need to rebase and fix ([comment](https://github.com/pytorch/pytorch/pull/130395#issuecomment-2235192986))
2024-07-18 02:46:03 +00:00
angelayi
9fee87e4cd [export] Add print_readable to unflattener (#128617)
Taking inspiration from `GraphModule.print_readable` (aka I copied its [code](17b45e905a/torch/fx/graph_module.py (L824))), I added a `print_readable` to the unflattened module, because it's kind of nontrivial to print the contents of this module.

Example print from `python test/export/test_unflatten.py -k test_unflatten_nested`
```
class UnflattenedModule(torch.nn.Module):
    def forward(self, x: "f32[2, 3]"):
        # No stacktrace found for following nodes
        rootparam: "f32[2, 3]" = self.rootparam

        # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:99 in forward, code: x = x * self.rootparam
        mul: "f32[2, 3]" = torch.ops.aten.mul.Tensor(x, rootparam);  x = rootparam = None

        # No stacktrace found for following nodes
        foo: "f32[2, 3]" = self.foo(mul);  mul = None
        bar: "f32[2, 3]" = self.bar(foo);  foo = None
        return (bar,)

    class foo(torch.nn.Module):
        def forward(self, mul: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child1param: "f32[2, 3]" = self.child1param
            nested: "f32[2, 3]" = self.nested(mul);  mul = None

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:79 in forward, code: return x + self.child1param
            add: "f32[2, 3]" = torch.ops.aten.add.Tensor(nested, child1param);  nested = child1param = None
            return add

        class nested(torch.nn.Module):
            def forward(self, mul: "f32[2, 3]"):
                # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:67 in forward, code: return x / x
                div: "f32[2, 3]" = torch.ops.aten.div.Tensor(mul, mul);  mul = None
                return div

    class bar(torch.nn.Module):
        def forward(self, add: "f32[2, 3]"):
            # No stacktrace found for following nodes
            child2buffer: "f32[2, 3]" = self.child2buffer

            # File: /data/users/angelayi/pytorch2/test/export/test_unflatten.py:87 in forward, code: return x - self.child2buffer
            sub: "f32[2, 3]" = torch.ops.aten.sub.Tensor(add, child2buffer);  add = child2buffer = None
            return sub
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128617
Approved by: https://github.com/zhxchen17, https://github.com/pianpwk
2024-07-18 01:36:01 +00:00
Shangdi Yu
e22b0acc76 [FX][export] DCE pass, check schema for node impurity (#130395)
Change the default DCE pass to check node schema for impure nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130395
Approved by: https://github.com/angelayi, https://github.com/jgong5
2024-07-18 00:55:20 +00:00
Pian Pawakapan
d96c80649f [export] constants & non-persistent buffers for training IR (#130864)
Summary: Uses original ExportedProgram constants and graph signature to inform decompositions, so that constant tensors and non-persistent buffers are respected for training IR. Removes 7 test failures for training IR.

Test Plan: test_export

Differential Revision: D59820909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130864
Approved by: https://github.com/angelayi
2024-07-17 18:27:53 +00:00
Pian Pawakapan
e8998d68c8 [export] add non-strict training IR (#130062)
Summary: Adds non-strict implementation of training IR export. Any expected non-strict training IR failures are also either existing strict training IR or non-strict failures (no new failures added). 4 strict training IR failures also resolved.

Refraining from unifying export/export_for_training, per @ydwu4's feedback :)

Test Plan: added test_export_training_ir_to_run_decomp_non_strict.py for non-strict training IR

Differential Revision: D59349454

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130062
Approved by: https://github.com/ydwu4, https://github.com/zhxchen17
2024-07-16 17:08:00 +00:00
Aaron Orenstein
567482973d typing fake_tensor.py (#128041)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128041
Approved by: https://github.com/eellison
ghstack dependencies: #129182
2024-07-13 06:07:40 +00:00
Pian Pawakapan
988ed4d5db [export] clean up allow_complex_guards_as_runtime_asserts flag (#130596)
Summary: removes underscore, cleans up dead code in DimConstraints

Test Plan: existing export tests

Reviewed By: angelayi

Differential Revision: D59612746

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130596
Approved by: https://github.com/angelayi
2024-07-12 17:17:11 +00:00
Shangdi Yu
ea4b80e6d6 [FX][export] strict DCE pass, check schema for node impurity (#130552)
Fixes the failure in `test/export/test_export_training_ir_to_run_decomp.py ` caused by dead code elimination removing node with side effects.

For background, in export, we may want to export higher-level IRs that are not functional, so we need to check for side effects more carefully.

 A call_function node is impure if it has at least one mutable argument.

Fixed the tests below:

test_to_module_with_mutated_buffer_multiple_update_sub_later
test_export_input_mutation_static_shape
test_buffer_util

Another attempt modifying the original DCE pass is made in PR #130395, but it breaks some other tests, so here we add a flag and use it for export only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130552
Approved by: https://github.com/pianpwk
2024-07-12 15:43:27 +00:00
Pian Pawakapan
18b7633bfb [export] fix kwargs in run_decompositions() for training IR (#130553)
Re-exporting GraphModule expects all inputs to be in args, though not in pytree-flattened format. This avoids failing when we run with a fx.Interpreter subclass in [AOTAutograd tracing](973037be6a/torch/_functorch/_aot_autograd/traced_function_transforms.py (L760-L762)).

Removes 7 test failures for training IR export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130553
Approved by: https://github.com/zhxchen17, https://github.com/ydwu4
2024-07-11 22:53:18 +00:00
Zhengxu Chen
726a287271 [export] Expand verifier to be multiple on ExportedProgram (#130364)
Summary: This diff updates the ExportedProgram class in PyTorch to allow for multiple verifiers to be attached to it. This is done by adding a new field to the ExportedProgram schema called "verifiers" which is a list of strings representing the names of the verifiers to be attached to the program. The verifiers are loaded using the "load_verifier" function which is defined in the "torch._export.serde.serialize" module. The "exported_program.dialect" field is also deprecated in favor of the "verifiers" field.

Test Plan: CI

Differential Revision: D59408546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130364
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2024-07-11 20:34:49 +00:00
Yidi Wu
cd9bae30de Allow kwargs in _remove_effect_tokens_pass (#130491)
Summary: Previously, remove_effect_tokens pass didn't pass kwargs to the internal nodes. This PR fix it and add a test for it.

Test Plan: buck2 run caffe2/test:test_export -- -r test_remove_effect_token_kwargs

Reviewed By: angelayi

Differential Revision: D59603147

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130491
Approved by: https://github.com/angelayi
2024-07-11 19:03:19 +00:00
Pian Pawakapan
1b3b4c2fb9 [runtime asserts] deduplicate runtime asserts & CSE (#128599) (#130380)
original PR: https://github.com/pytorch/pytorch/pull/128599 (re-created after revert + poisoned diff train)

Summary:
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]

s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Test Plan:
contbuild & OSS CI, see 940e4477ab

Original Phabricator Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Differential Revision: D59543603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130380
Approved by: https://github.com/izaitsevfb
2024-07-10 19:23:37 +00:00
Shangdi Yu
c83b941141 [export] add dynamic shapes argument and infer from graph nodes (#129928)
Fixes the example in #118304 for `torch._functorch.aot_autograd.aot_export_module` and `torch.export.export`.

On a high level, the issue is caused by not detecting fake_mode when there's no input.

Change plan:

1) we add a  `dynamic_shapes: Union[bool, None] = None` arg to `aot_export_module` and `_aot_export_function`.

2) if the input is not a graph module, then we can only rely on this `dynamic_shapes` input arg.

3) If the input is a graph module, then we can traverse the graph and check.

4) So we check if the input mod is a graph module or just a module, and do 2) or 3) depending on the type.

Fixes #129927

Bug source: dynamo's fake_mode is not detected correctly in `_convert_input_to_fake` in `_traced.py` when there’s no input to the graph). So in ` _strict_export_lower_to_aten_ir`, we create another fake_mode. `dynamo_fake_mode` is not the same as the fake_mode used by dynamo.

Change plan:
check `gm_torch_level` graph's node meta "example_value" for fake mode in addition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129928
Approved by: https://github.com/angelayi
2024-07-10 15:51:05 +00:00
PyTorch MergeBot
9c9744c3ac Revert "[runtime asserts] deduplicate runtime asserts & CSE (#128599)"
This reverts commit 940e4477ab.

Reverted https://github.com/pytorch/pytorch/pull/128599 on behalf of https://github.com/izaitsevfb due to breaking internal APS tests, see D59498864 ([comment](https://github.com/pytorch/pytorch/pull/128599#issuecomment-2218724762))
2024-07-09 21:03:49 +00:00
Pian Pawakapan
940e4477ab [runtime asserts] deduplicate runtime asserts & CSE (#128599)
This PR adds deduplication and CSE for runtime asserts. Existing size computation in the graph is CSE'd along with added runtime asserts, and redundant asserts are removed. Shape calls on intermediate tensors are also turned into compute on input sizes if possible, allowing intermediate tensors to be freed earlier. For example:
```
z = torch.cat([x, x], dim=0)  # 2*s0
w = z.repeat(y.shape[0])  # 2*s0*s1
_w = w.shape[0]
# something with _w ...

# turns into ->
s0 = x.shape[0]
s1 = y.shape[0]
_w0 = 2 * s0
_w = _w0 * s1
```

Additionally, constrain_range calls are deduplicated. Single-symbol bound checks for unbacked symbols (e.g. u0 >= 0, u0 <= 5) and sym_constrain_range.default calls are also removed, since they accumulate range info in the ShapeEnv, and are replaced with two _assert_scalar.default calls that check the min/max bounds. For example:
```
torch.sym_constrain_range_for_size(n, min=2, max=16)
torch.sym_constrain_range(n, min=4, max=20)
torch._check(n >= 0)
torch._check(n >= 3)
torch._check(n <= 14)

# turns into
torch.sym_constrain_range_for_size(n)
torch._check(n >= 4)
torch._check(n <= 14)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128599
Approved by: https://github.com/ezyang
2024-07-07 20:10:14 +00:00