Summary:
When a C++ custom op returns an uninitialized tensor, it will be marked as None in Python. For this scenario, the user should mark the possibly uninitialized return as Tensor? in the custom op schema.
This diff adds `as_optional_tensor` type to export schema and the support for optional tensor in AOTI proxy executor.
Test Plan:
```
buck2 run mode/dev-nosan caffe2/test/inductor:test_aot_inductor_custom_ops -- -r test_fn_with_optional_tensor_output
```
Differential Revision: D75262529
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154286
Approved by: https://github.com/desertfire
We need to make function schema proxyable to trace a the auto_functionalized hop that takes function schema as inputs. The implementation basically follows how we support torchbind object:
1. upon seeing an untracked function schema arg, we creates a constant get_attr node
2. we track the function schema argument in export to support lift/unlift.
3. we need to support serde for functional schema. We'll add support for this in follow-up PRs.
However, compared with torchbind object:
1. we don't need a dynamo implementation, because the function schema is added when we auto_functionalize a hop to the argument of auto_functionalized. One potential use case is users re-traces an exported program with strict mode. Since non-strict is the default now, we don't see a use case yet.
2. we don't need an inductor implementation, because the function schema will go away after auto_functionalized re-inplacing pass.
edit: we greatly simplifies (and generalizes) the implementation following @zou3519 's suggestion of using pytree.register_constant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152073
Approved by: https://github.com/zou3519
ghstack dependencies: #152072
In https://github.com/pytorch/pytorch/issues/151746, users ran into an error where a custom triton op cannot be resolved into an operator from string target. We improve the error message by reminding users to register the same custom operator at de-serialization time.
Now the error looks like this:
```python
torch._export.serde.serialize.SerializeError: We failed to resolve torch.ops.triton_kernel.add.default to an operator. If it's a custom op/custom triton op, this is usally because the custom op is not registered when deserializing. Please import the custom op to register it before deserializing. Otherwise, please file an issue on github. Unsupported target type for node Node(target='torch.ops.triton_kernel.add.default', inputs=[NamedArgument(name='x', arg=Argument(as_tensor=TensorArgument(name='linear')), kind=1), NamedArgument(name='y', arg=Argument(as_tensor=TensorArgument(name='mul')), kind=1)], outputs=[Argument(as_tensor=TensorArgument(name='add'))], metadata={'stack_trace': 'File "/data/users/yidi/pytorch/test.py", line 50, in forward\n output = triton_add(dense_output, bias)', 'nn_module_stack': 'L__self__,,__main__.SimpleModel', 'torch_fn': 'add.default_1;OpOverload.add.default'}, is_hop_single_tensor_return=None): <class 'str'>.```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152029
Approved by: https://github.com/jingsh
This has been pretty helpful for the size-oblivious rewrite. Wanted the variadic args version to avoid `sym_or(a, sym_or(b, sym_or(c, d)))` in favor of `sym_or(a, b, c, d)`. Happy to change this to ban the 1-arg version.
This is better than plain and/or because the whole symbolic expression gets preserved, and if we guard on it or defer as a runtime assert, we preserve all branches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150456
Approved by: https://github.com/laithsakka
Summary: Added a field `protocol` to `ExternKernelNodes` and all the lowering pass will always use the oss schema to serialize external kernel nodes from now on.
Test Plan: CI
Differential Revision: D72020444
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150197
Approved by: https://github.com/zhxchen17
An internal model was serialized in 2023, and is now breaking while loading with the following error:
```
File "<eval_with_key>.1675", line 4
def forward(self, arg1163_1, arg1164_1, , arg1166_1, , arg1168_1, arg1169_1, arg1170_1, , arg1172_1, arg1173_1, arg1174_1, arg1175_1, arg1176_1, arg1177_1, arg1178_1, arg1179_1, arg1180_1, arg1181_1, arg1182_1, arg1183_1, arg1184_1, arg1185_1, arg1186_1, arg1187_1, arg1188_1, arg1189_1, arg1190_1, arg1191_1, arg1192_1, arg1193_1, arg1194_1, arg1195_1, arg1196_1, arg1197_1, arg1198_1, arg1199_1, arg1200_1, arg1201_1, arg1202_1, arg1203_1, arg1204_1, arg1205_1, arg1206_1, arg1207_1, arg1208_1, arg1209_1, arg1210_1, arg1211_1, arg1212_1, arg1213_1, arg1214_1, arg1215_1, arg1216_1, , arg1218_1, arg1219_1, arg1220_1, arg1221_1, arg1222_1, arg1223_1, arg1224_1, , arg1226_1, arg1227_1, arg1228_1, , arg1230_1, , , , , , , , , , , , , , , ):
^
SyntaxError: invalid syntax
```
The syntax errors are due to inputs that are `None` when exporting. Prior to changes in https://github.com/pytorch/pytorch/pull/123590 (landed 4/2024), input specs for none inputs look like `InputSpec(userInput=UserInputSpec(arg=Argument(asNone=True)))`, and during deserialization when creating a node, we would just use a dummy name `arg`. After to those changes, the input specs for none inputs look like `InputSpec(constantInput=InputToConstantInputSpec(name='y', value=ConstantValue(asNone=True)))`, and when creating a node we would use the name `y` as the name. However the PR didn't handle the case if it's loading an old package which doesn't have this name, so ended up putting empty names in the placeholder nodes.
This error was uncovered after https://github.com/pytorch/pytorch/pull/149717, where we now use the GraphModule's python codegen to run the UnflattenedModule instead of going through the interpreter path. The placeholder nodes having empty names caused the python codegen to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150515
Approved by: https://github.com/yushangdi
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.
Resolves#75982. New tests are included in this PR.
- #75982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.
Resolves#75982. New tests are included in this PR.
- #75982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Summary:
**Codegen**
- Skip some codegen parts for torchbind (such as arg decleration) because they are loaded in proxy executor, so we do not need to declare torchbind args in cpp code
- Added a helper method to get the schema of CallTorchBind HOP. The returned schema is only the schema of `obj.method()`.
**Serialization**
Add support for torchbind object in serialization
- For CallTorchBind HOP, we need to handle it specially because of it's schema. The output serialized args is in the format of `(obj, method, *args, **kwargs)`.
- it.TorchBindObject inputs are serialized to `as_custom_obj` Argument.
**Packaging**
Add torchbind objects file and `custom_objs_config.json` file to generated files output of `aot_compile`.
The json file is stored in the `data/aotinductor/<model_name>` folder in pt2 archive.
The torchbind objects are stored in data/constants/ folder in pt2 archive.
The format of torchbind objects are `f"{CUSTOM_OBJ_FILENAME_PREFIX}{custom_obj_idx}"`. e.g. `custom_obj_0`.
CustomClassHolder objects implement their own pickle methods.
Note that this `custom_objs_config.json` file is different from the `model_constants_config.json` file produced in package_sigmoid(). The keys in `custom_objs_config` directly correspond to the arg name in extern nodes json.
The key in `model_constants_config.json` produced by `package_sigmoid` is the attribute name in the user mode code.
This is required for both internal and OSS torchbind support.
For OSS torchbind support, we also need to package torchbind_constants into the .pt2 output.
**Work Left**
We still need to add torchbind support in ProxyExecutor for inductor.aoti_load_package to work. See other diffs in the stack.
Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r schema
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r aot_compile
```
Differential Revision: D69490718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148506
Approved by: https://github.com/angelayi
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.
Resolves#75982. New tests are included in this PR.
- #75982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Plan: avoid the use of unbacked renamings, and introduce a pass run in `_produce_aten_artifact` that recomputes unbacked bindings. Decided to do this because in we don't serialize unbacked renamings (or any ShapeEnv state), so this used to compose poorly with de/serialization. This hopefully establishes the invariant that the unbacked binding keys are always in sync with the example values (i.e. same indices, and removed if the symbol is replaced / specialized).
For de/serialization, we don't stored unbacked bindings, and just rerun the pass.
Involved a refactor of compute_unbacked_bindings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147574
Approved by: https://github.com/avikchaudhuri
Summary:
Generate two helper functions for enum classes in generated_serialization_types.h
printEnum: will convert enum values into strings.
parseEnum: will convert strings into enum values.
Test Plan: CI
Differential Revision: D69604850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147126
Approved by: https://github.com/yiming0416
Summary: Currently inf is serialized as Infinity in JSON which is not standard compliant. Instead we will tweak all special floating points into strings and handle them at json layer.
Test Plan:
see D69060784
CI
Differential Revision: D69186425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146490
Approved by: https://github.com/yiming0416
Summary: Implement an oss version of modelrunner with clean dependencies. The new oss model runner only removes thrift and only use json header to load the model.
Test Plan: Test will be added in the next diff separately. (D69060784)
Differential Revision: D68846877
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146440
Approved by: https://github.com/SherlockNoMad
Summary:
Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here).
This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde.
Test Plan: added test
D68880766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115
Approved by: https://github.com/pianpwk
This enables a check that which a class which only inherits from immutable classes like str, tuple, and NamedTuple, also defined `__slots__` so they don't allocate memory unnecessarily. This also ensure contributors think about how they define their classes with subclass NamedTuples and str, of which we have many in our codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276
Approved by: https://github.com/aorenste
If a user passes in a namedtuple as an input, currently the input TreeSpec looks like: `TreeSpec(type=namedtuple, context=”class_fqn”, children_spec=[*, *])`
The user then saves the program containing this input TreeSpec. But what happens if they load it in a new environment where `class_fqn` now contains an additional field?
This means that the exported program is now expected to take in another input. But since those fields were not used in the original program, users should be able just drop those additional fields and the program will run successfully. This is needed/used in APS where they use unflattener's adapter to adapt the inputs based on the previously saved treespecs.
There are a couple of [solutions](https://docs.google.com/document/d/1V4ZSdy-8PUISWc8RqvGu3DU01BVegJhHHPWqa1Io7Eg/edit?tab=t.0) for how we can address this, but eventually we settled on saving a side table mapping namedtuple types to their list of field names, which can then be accessed by the adapter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145956
Approved by: https://github.com/zhxchen17
Summary:
Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here).
This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde.
Test Plan: added test
Differential Revision: D68880766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115
Approved by: https://github.com/pianpwk
Summary:
Previously, aoti compile node is represented as a kernel-less custom op in the exported program. The node was not eager runnable, which is a common practice for numerical validation during lowering.
I introduce a new HOP to address this.
The schema is following
```
aoti_call_delegate(lower_moduel: AOTInductorEPModule, original_gm: fx.GraphModule, weights: List[Tensor], inputs: List[Tensor])
```
There are a few problems exposed by HOP
- AOTI expects a FX graph with weights as getattr nodes, aka stateful graph. HOP expect graph_module arguments to be stateless. Export serializer also expect a stateless graph. Currently, to make AOTI happy, I am making `original_gm` stateful, and bypassing the serialization for `original_gm`.
- As a result, the HOP is not re-traceable, as functionalization on stateful graph module argument will fail.
Test Plan: buck2 test 'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:cpu_lowering_utils_test
Reviewed By: zhxchen17
Differential Revision: D68359391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145630
Approved by: https://github.com/zou3519
Instead of bumping symint counters when we process unbacked bindings during deserialization, it's better to bump them at the beginning based on what the symbols in the original shape env before serialization were. This allows symbols in unbacked bindings to have "gaps" that bumping alone would not be able to match.
Why is bumping counters important at all? It is because when the shape env coming out of deserialization is used later for propagating symints, say in run_decompositions, we don't want new names to clash with existing names (bad things happen).
Differential Revision: [D68798191](https://our.internmc.facebook.com/intern/diff/D68798191/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145882
Approved by: https://github.com/pianpwk