Summary:
Adding "from_node" information that indicates which nodes are unlifted in `.module()` call.
The lifted nodes will have "ExportedProgram.module().unlift()" passname in the last entry of from_node.
Test Plan:
```
buck run fbcode//caffe2/test:test_export -- -r test_from_node_metadata_export
```
Rollback Plan:
Reviewed By: angelayi
Differential Revision: D75837494
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155053
Approved by: https://github.com/angelayi
Refactor the pt2 archive saving to consolidate the format of torch.export.save and torch._inductor.package.package_aoti.
This PR adds the following functions, which torch.export.save and AOTI packaging calls into:
```python
package_pt2(
f: FileLike,
*,
exported_programs: Optional[Union[ExportedProgram, dict[str, ExportedProgram]]] = None,
aoti_files: Optional[Union[list[str], dict[str, list[str]]]] = None,
extra_files: Optional[dict[str, Any]] = None,
) -> FileLike
@dataclass
class PT2ArchiveContents:
exported_programs: dict[str, ExportedProgram]
aoti_runners: dict[str, AOTICompiledModel]
extra_files: dict[str, Any]
load_pt2(f: FileLike) -> PT2ArchiveContents
```
Power users directly call into these APIs if they want to bundle multiple exported programs, aoti files, or extra metadata.
This is how the pt2 archive looks like ([spec](https://docs.google.com/document/d/1RQ4cmywilnFUT1VE-4oTGxwXdc8vowCSZsrRgo3wFA8/edit?tab=t.0)):
```
├── archive_format
├── version
├── .data
├── data
│ ├── aotinductor
│ │ └── model1
│ │ ├── model1.cpp
│ │ ├── model1.so # currently AOTI automatically moves weights in here, TODO to move it out
│ │ ├── cg7domx3woam3nnliwud7yvtcencqctxkvvcafuriladwxw4nfiv.cubin
│ │ └── cubaaxppb6xmuqdm4bej55h2pftbce3bjyyvljxbtdfuolmv45ex.cubin
│ ├── weights
│ │ ├── model1.pt # TODO to dedup weights between model1/model2
│ │ └── model2.pt
│ └── constants
│ │ ├── model1.pt # TODO to dedup weights between model1/model2
│ │ └── model2.pt
│ └── sample_inputs
│ ├── model1.pt # TODO to dedup weights between model1/model2
│ └── model2.pt
├── extra
│ └── user_metadata.txt
└── models
├── model1.json
└── model2.json
```
Future todos:
- unbundle the weights -- instead of .pt, we can use bin files, which will also allow us to dedup weights if we store multiple models
- update aoti_compile_and_package to also save the exported program
- integrate TNR with this packaging flow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152495
Approved by: https://github.com/yushangdi
Summary:
Before:
`from sigmoid.core.package.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_sigmoid_package`
After:
`from torch.export.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_pt2_package`
By merging the two PT2ArchiveReader/Writers, into using the native PytorchFileReader/Writer, the open source PT2 archive also changed to have an additional folder. However this PR still maintains support for loading an old PT2 archive which does not have the additional folder.
Before:
```
├── archive_format
├── byteorder
├── .data
│ ├── serialization_id
│ └── version
├── data
│ ├── aotinductor
```
After:
```
├── tmp
│ ├── archive_format
│ ├── byteorder
│ ├── .data
│ │ ├── serialization_id
│ │ └── version
│ ├── data
│ │ ├── aotinductor
```
Test Plan:
`buck2 test //sigmoid/...`
https://www.internalfb.com/intern/testinfra/testrun/5348024839248187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153795
Approved by: https://github.com/zhxchen17
Summary:
Before:
`from sigmoid.core.package.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_sigmoid_package`
After:
`from torch.export.pt2_archive import PT2ArchiveWriter, PT2ArchiveReader, is_pt2_package`
By merging the two PT2ArchiveReader/Writers, into using the native PytorchFileReader/Writer, the open source PT2 archive also changed to have an additional folder. However this PR still maintains support for loading an old PT2 archive which does not have the additional folder.
Before:
```
├── archive_format
├── byteorder
├── .data
│ ├── serialization_id
│ └── version
├── data
│ ├── aotinductor
```
After:
```
├── tmp
│ ├── archive_format
│ ├── byteorder
│ ├── .data
│ │ ├── serialization_id
│ │ └── version
│ ├── data
│ │ ├── aotinductor
```
Test Plan:
`buck2 test //sigmoid/...`
https://www.internalfb.com/intern/testinfra/testrun/5348024839248187
Differential Revision: D74616598
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153795
Approved by: https://github.com/zhxchen17
Basically adds native _IntWrapper support to dynamo. Here's my process of trying to make symint input support work on dynamo, and how I ended up with this approach [(doc)](https://docs.google.com/document/d/1GvNRQd8BnxlMay_hrEVgEta6VUeUW_hcFeRuB7q1nDY/edit?tab=t.0).
What I did was, before passing inputs to dynamo.export, I first wrap them with a class, `_IntWrapper`. When processing dynamic shapes, I will then add the corresponding dynamic shape specification to the `dynamism` field stored on the `_IntWrapper`. If there is no dynamism specified, then this will get unwrapped back to an integer. When dynamo tracing, when we encounter an `_IntWrapper`, we will convert this to a symint if the dynamism was specified as `Dim.DYNAMIC/AUTO`. Dynamo will then trace a graph that contains symint inputs, which will get passed to AOTAutograd and so on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152677
Approved by: https://github.com/pianpwk
Change logging.error to logging.exception to log additional information when relevant. A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
Summary:
1. Adding `input` field to `_adapt_flat_args` function
2. In `process_forward_inputs`, `reorder_kwargs` will now do nothing if no kwargs are provided (previously would error)
3. Pass `args` as input to `_adapt_flat_args`
These changes are made to update the InputAdapter
see more context in D73811508
Test Plan: see D73811508
Differential Revision: D73945419
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152575
Approved by: https://github.com/angelayi
We need to make function schema proxyable to trace a the auto_functionalized hop that takes function schema as inputs. The implementation basically follows how we support torchbind object:
1. upon seeing an untracked function schema arg, we creates a constant get_attr node
2. we track the function schema argument in export to support lift/unlift.
3. we need to support serde for functional schema. We'll add support for this in follow-up PRs.
However, compared with torchbind object:
1. we don't need a dynamo implementation, because the function schema is added when we auto_functionalize a hop to the argument of auto_functionalized. One potential use case is users re-traces an exported program with strict mode. Since non-strict is the default now, we don't see a use case yet.
2. we don't need an inductor implementation, because the function schema will go away after auto_functionalized re-inplacing pass.
edit: we greatly simplifies (and generalizes) the implementation following @zou3519 's suggestion of using pytree.register_constant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152073
Approved by: https://github.com/zou3519
ghstack dependencies: #152072
With `AdditionalInputs`, the behavior is the same as with tensors:
```python
class M(torch.nn.Module):
def forward(self, x, y):
return x + y
additional_inputs = torch.export.AdditionalInputs()
additional_inputs.add((5, 5))
additional_inputs.add((3, 5))
additional_inputs.add((5, 4))
ep = torch.export.export(
M(), (6, 7), dynamic_shapes=additional_inputs, strict=False
)
```
With `ShapesCollection`, we now need to wrap integer inputs as `_IntWrapper` so that we can have a unique identifier for each integer input.
```python
class M(torch.nn.Module):
def forward(self, x, y):
return x + y
from torch.export.dynamic_shapes import _IntWrapper
args = (_IntWrapper(5), _IntWrapper(5))
# Or we can do `args = pytree.tree_map_only(int, lambda a: _IntWrapper(a), orig_args)`
shapes_collection = torch.export.ShapesCollection()
shapes_collection[args[0]] = Dim.DYNAMIC
shapes_collection[args[1]] = Dim.DYNAMIC
ep = torch.export.export(
M(), args, dynamic_shapes=shapes_collection, strict=False
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151842
Approved by: https://github.com/pianpwk
If a custom operator does not contain a fake impl, currently draft-export will use the real-tensor propagation to get an output for the operator and continue tracing. However if we retrace the exported model using `ep.run_decompositions`, or `export`, or run the exported program with fake tensors, we'll still fail because there's no fake impl.
With this PR, after draft-export we will generate an operator profile for each operator call that we encounter, and store this on the report attached to the exported program `ep._report.op_profiles`. Users can then use `torch._library.fake_profile.register_fake_profile` to temporarily generate and register a fake impl based on these operator profiles. This way future fake tensor retracing will work.
The workflow would look something like:
```python
class M(torch.nn.Module):
def forward(self, a, b):
res = torch.ops.mylib.foo8(a, b) # no fake impl
return res
ep = export(M(), (torch.ones(3, 4), torch.ones(3, 4)) # this fails bc no fake impl
ep = draft_export(M(), (torch.ones(3, 4), torch.ones(3, 4))
ep.run_decompositions() # this fails bc no fake impl
# this registers fake impls based on the profiles
with torch._library.fake_profile.register_fake_profile(ep._report.op_profiles):
decomp = ep.run_decompositions() # this works
new_inp = (
torch.ones(2, 3, 4),
torch.ones(2, 3, 4),
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150809
Approved by: https://github.com/zou3519
Given an exception in torch.export, I want to try/catch it to add the message "hey try out draft-export!". Currently I only add this message for errors that draft-export is known to fix, like DataDependentErrors, ConstraintViolationErrors, and no fake impl.
Originally the error message looks like:
```
File "/data/users/angelayi/pytorch/torch/_library/custom_ops.py", line 626, in fake_impl
raise RuntimeError(
RuntimeError: There was no fake impl registered for <CustomOpDef(mylib::foo2)>. This is necessary for torch.compile/export/fx tracing to work. Please use `foo2_impl.register_fake` to add an fake impl.
```
Now, the error msg now looks something like:
```
File "/data/users/angelayi/pytorch/torch/_library/custom_ops.py", line 626, in fake_impl
raise RuntimeError(
RuntimeError: There was no fake impl registered for <CustomOpDef(mylib::foo2)>. This is necessary for torch.compile/export/fx tracing to work. Please use `foo2_impl.register_fake` to add an fake impl.
The error above occurred when calling torch.export.export. If you would like to view some more information about this error, and get a list of all other errors that may occur in your export call, you can rerun your program with the `DRAFT_EXPORT=1` envvar, or replace your `export()` call with `draft_export()`.
```
In python versions >= 3.11, we can use `exception.add_note` to add to the error message. However with previous versions I did a hack to modify `e.args`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151065
Approved by: https://github.com/pianpwk
ghstack dependencies: #151051
Summary:
as title
`export._trace._WrapperModule` is used to wrap functions into a Module so we can export the function.
We add `export._wrapper_utils` to `dynamo`'s `MOD_INLINELIST` so dynamo traces into `_WrapperModule`
Fixes https://github.com/pytorch/pytorch/issues/146867
Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test:test_export -- -r wrapper_module
```
Differential Revision: D72986826
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151264
Approved by: https://github.com/angelayi
Summary:
as title
`export._trace._WrapperModule` is used to wrap functions into a Module so we can export the function.
We add `export._wrapper_utils` to `dynamo`'s `MOD_INLINELIST` so dynamo traces into `_WrapperModule`
Fixes https://github.com/pytorch/pytorch/issues/146867
Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test:test_export -- -r wrapper_module
```
Differential Revision: D69434316
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146919
Approved by: https://github.com/angelayi
Summary:
- Flip default value of `strict` argument from True to False on torch.export.export_for_training API
- All callsites have been updated to provide this argument explicitly to avoid behavior change.
- If you see any breakages, that means you may have a new callsite that is missed, please set `strict=True` explicitly to the callsite to mitigage.
Test Plan: CI
Differential Revision: D72724975
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150941
Approved by: https://github.com/ydwu4
Changes decomposition behavior of `aten.to` to respect the aliasing/non-aliasing behavior in eager, and to specialize to the input/conversion dtype & device.
Before change: we always decompose `aten.to` into `_to_copy`, regardless of aliasing behavior. This leads us to ban mutations on the result of `_to_copy` when aliased, since we can't guarantee correct program semantics. This meant users had to explicitly call `.clone()` before mutating. In the special cases where we don’t ban mutations (e.g. dtype conversion), we add runtime assertions on the input & conversion dtype/devices in the decomposed program (see https://github.com/pytorch/pytorch/pull/142420).
After change: we decompose to the aliasing/non-aliasing behavior that matches eager, allowing mutations in all cases. We also add dtype/device assertions for all `aten.to` ops, starting in the pre-dispatch graph, basically specializing the program to the dtype/devices.
Differential Revision: D71229547
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149235
Approved by: https://github.com/tugsbayasgalan
Summary:
Instead of explicitly specifying dynamic shapes, it is possible to infer them from additional example inputs. Together with the example inputs provided to export, we can basically make any varying dim dynamic and keep any fixed dim static. This should be useful for prod scenarios that have access to tests and/or profiling data, yet are somewhat removed from the model authoring process.
However this alone is not satisfactory: the exported program by design has only one graph, representing one path through the model, and we cannot necessarily guarantee that this graph works for the additional example inputs because different guards might have been created if we had exported with them instead (corresponding to different traced paths). However, checking that the additional example inputs satisfy the guards created by the original export should be sufficient for generalization.
Now, while we don't preserve all guards in the exported program, we do check a subset of them as part of input matching. So we add a verification step at the end of export when such additional example inputs are provided. This should be enough for now.
Test Plan: added test (positive and negative cases)
Differential Revision: D72001771
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150144
Approved by: https://github.com/bobrenjc93
Lazos correctly pointed out this doesn't make sense for compile since
we graph break in compile. This results in tons of unwanted user log
spew. We do want this in export though since it's drastiaclly reduced
the support load for DDEs. This PR does the refactor to keep it in
export but remove it from compile
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149831
Approved by: https://github.com/mlazos
Summary: When we call torch.inference_mode, we seem to skip Autograd key causing the custom op export uses to be not decomposed properly before subclass dispatching starts. We fix this by force desugaring this op at Python key
Test Plan: test
Differential Revision: D71599541
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149698
Approved by: https://github.com/bdhirsh