We need to make function schema proxyable to trace a the auto_functionalized hop that takes function schema as inputs. The implementation basically follows how we support torchbind object:
1. upon seeing an untracked function schema arg, we creates a constant get_attr node
2. we track the function schema argument in export to support lift/unlift.
3. we need to support serde for functional schema. We'll add support for this in follow-up PRs.
However, compared with torchbind object:
1. we don't need a dynamo implementation, because the function schema is added when we auto_functionalize a hop to the argument of auto_functionalized. One potential use case is users re-traces an exported program with strict mode. Since non-strict is the default now, we don't see a use case yet.
2. we don't need an inductor implementation, because the function schema will go away after auto_functionalized re-inplacing pass.
edit: we greatly simplifies (and generalizes) the implementation following @zou3519 's suggestion of using pytree.register_constant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152073
Approved by: https://github.com/zou3519
ghstack dependencies: #152072
In https://github.com/pytorch/pytorch/issues/151746, users ran into an error where a custom triton op cannot be resolved into an operator from string target. We improve the error message by reminding users to register the same custom operator at de-serialization time.
Now the error looks like this:
```python
torch._export.serde.serialize.SerializeError: We failed to resolve torch.ops.triton_kernel.add.default to an operator. If it's a custom op/custom triton op, this is usally because the custom op is not registered when deserializing. Please import the custom op to register it before deserializing. Otherwise, please file an issue on github. Unsupported target type for node Node(target='torch.ops.triton_kernel.add.default', inputs=[NamedArgument(name='x', arg=Argument(as_tensor=TensorArgument(name='linear')), kind=1), NamedArgument(name='y', arg=Argument(as_tensor=TensorArgument(name='mul')), kind=1)], outputs=[Argument(as_tensor=TensorArgument(name='add'))], metadata={'stack_trace': 'File "/data/users/yidi/pytorch/test.py", line 50, in forward\n output = triton_add(dense_output, bias)', 'nn_module_stack': 'L__self__,,__main__.SimpleModel', 'torch_fn': 'add.default_1;OpOverload.add.default'}, is_hop_single_tensor_return=None): <class 'str'>.```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152029
Approved by: https://github.com/jingsh
This has been pretty helpful for the size-oblivious rewrite. Wanted the variadic args version to avoid `sym_or(a, sym_or(b, sym_or(c, d)))` in favor of `sym_or(a, b, c, d)`. Happy to change this to ban the 1-arg version.
This is better than plain and/or because the whole symbolic expression gets preserved, and if we guard on it or defer as a runtime assert, we preserve all branches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150456
Approved by: https://github.com/laithsakka
An internal model was serialized in 2023, and is now breaking while loading with the following error:
```
File "<eval_with_key>.1675", line 4
def forward(self, arg1163_1, arg1164_1, , arg1166_1, , arg1168_1, arg1169_1, arg1170_1, , arg1172_1, arg1173_1, arg1174_1, arg1175_1, arg1176_1, arg1177_1, arg1178_1, arg1179_1, arg1180_1, arg1181_1, arg1182_1, arg1183_1, arg1184_1, arg1185_1, arg1186_1, arg1187_1, arg1188_1, arg1189_1, arg1190_1, arg1191_1, arg1192_1, arg1193_1, arg1194_1, arg1195_1, arg1196_1, arg1197_1, arg1198_1, arg1199_1, arg1200_1, arg1201_1, arg1202_1, arg1203_1, arg1204_1, arg1205_1, arg1206_1, arg1207_1, arg1208_1, arg1209_1, arg1210_1, arg1211_1, arg1212_1, arg1213_1, arg1214_1, arg1215_1, arg1216_1, , arg1218_1, arg1219_1, arg1220_1, arg1221_1, arg1222_1, arg1223_1, arg1224_1, , arg1226_1, arg1227_1, arg1228_1, , arg1230_1, , , , , , , , , , , , , , , ):
^
SyntaxError: invalid syntax
```
The syntax errors are due to inputs that are `None` when exporting. Prior to changes in https://github.com/pytorch/pytorch/pull/123590 (landed 4/2024), input specs for none inputs look like `InputSpec(userInput=UserInputSpec(arg=Argument(asNone=True)))`, and during deserialization when creating a node, we would just use a dummy name `arg`. After to those changes, the input specs for none inputs look like `InputSpec(constantInput=InputToConstantInputSpec(name='y', value=ConstantValue(asNone=True)))`, and when creating a node we would use the name `y` as the name. However the PR didn't handle the case if it's loading an old package which doesn't have this name, so ended up putting empty names in the placeholder nodes.
This error was uncovered after https://github.com/pytorch/pytorch/pull/149717, where we now use the GraphModule's python codegen to run the UnflattenedModule instead of going through the interpreter path. The placeholder nodes having empty names caused the python codegen to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150515
Approved by: https://github.com/yushangdi
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.
Resolves#75982. New tests are included in this PR.
- #75982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.
Resolves#75982. New tests are included in this PR.
- #75982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Summary:
**Codegen**
- Skip some codegen parts for torchbind (such as arg decleration) because they are loaded in proxy executor, so we do not need to declare torchbind args in cpp code
- Added a helper method to get the schema of CallTorchBind HOP. The returned schema is only the schema of `obj.method()`.
**Serialization**
Add support for torchbind object in serialization
- For CallTorchBind HOP, we need to handle it specially because of it's schema. The output serialized args is in the format of `(obj, method, *args, **kwargs)`.
- it.TorchBindObject inputs are serialized to `as_custom_obj` Argument.
**Packaging**
Add torchbind objects file and `custom_objs_config.json` file to generated files output of `aot_compile`.
The json file is stored in the `data/aotinductor/<model_name>` folder in pt2 archive.
The torchbind objects are stored in data/constants/ folder in pt2 archive.
The format of torchbind objects are `f"{CUSTOM_OBJ_FILENAME_PREFIX}{custom_obj_idx}"`. e.g. `custom_obj_0`.
CustomClassHolder objects implement their own pickle methods.
Note that this `custom_objs_config.json` file is different from the `model_constants_config.json` file produced in package_sigmoid(). The keys in `custom_objs_config` directly correspond to the arg name in extern nodes json.
The key in `model_constants_config.json` produced by `package_sigmoid` is the attribute name in the user mode code.
This is required for both internal and OSS torchbind support.
For OSS torchbind support, we also need to package torchbind_constants into the .pt2 output.
**Work Left**
We still need to add torchbind support in ProxyExecutor for inductor.aoti_load_package to work. See other diffs in the stack.
Test Plan:
```
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r schema
buck run fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r aot_compile
```
Differential Revision: D69490718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148506
Approved by: https://github.com/angelayi
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.
Resolves#75982. New tests are included in this PR.
- #75982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Plan: avoid the use of unbacked renamings, and introduce a pass run in `_produce_aten_artifact` that recomputes unbacked bindings. Decided to do this because in we don't serialize unbacked renamings (or any ShapeEnv state), so this used to compose poorly with de/serialization. This hopefully establishes the invariant that the unbacked binding keys are always in sync with the example values (i.e. same indices, and removed if the symbol is replaced / specialized).
For de/serialization, we don't stored unbacked bindings, and just rerun the pass.
Involved a refactor of compute_unbacked_bindings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147574
Approved by: https://github.com/avikchaudhuri
Summary: Currently inf is serialized as Infinity in JSON which is not standard compliant. Instead we will tweak all special floating points into strings and handle them at json layer.
Test Plan:
see D69060784
CI
Differential Revision: D69186425
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146490
Approved by: https://github.com/yiming0416
Summary: Implement an oss version of modelrunner with clean dependencies. The new oss model runner only removes thrift and only use json header to load the model.
Test Plan: Test will be added in the next diff separately. (D69060784)
Differential Revision: D68846877
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146440
Approved by: https://github.com/SherlockNoMad
Summary:
Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here).
This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde.
Test Plan: added test
D68880766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115
Approved by: https://github.com/pianpwk
If a user passes in a namedtuple as an input, currently the input TreeSpec looks like: `TreeSpec(type=namedtuple, context=”class_fqn”, children_spec=[*, *])`
The user then saves the program containing this input TreeSpec. But what happens if they load it in a new environment where `class_fqn` now contains an additional field?
This means that the exported program is now expected to take in another input. But since those fields were not used in the original program, users should be able just drop those additional fields and the program will run successfully. This is needed/used in APS where they use unflattener's adapter to adapt the inputs based on the previously saved treespecs.
There are a couple of [solutions](https://docs.google.com/document/d/1V4ZSdy-8PUISWc8RqvGu3DU01BVegJhHHPWqa1Io7Eg/edit?tab=t.0) for how we can address this, but eventually we settled on saving a side table mapping namedtuple types to their list of field names, which can then be accessed by the adapter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145956
Approved by: https://github.com/zhxchen17
Summary:
Previously we were touching up unbacked bindings between Dynamo and AOTAutograd in strict export, but the logic had a bug: if an unbacked symint gets substituted by a backed symint, we would put the backed symint in the unbacked bindings (the check `is_symbol` was not enough here).
This PR fixes this logic, and moreover, moves it into the serializer instead, because we don't need this adjustment outside serde.
Test Plan: added test
Differential Revision: D68880766
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146115
Approved by: https://github.com/pianpwk
Summary:
Previously, aoti compile node is represented as a kernel-less custom op in the exported program. The node was not eager runnable, which is a common practice for numerical validation during lowering.
I introduce a new HOP to address this.
The schema is following
```
aoti_call_delegate(lower_moduel: AOTInductorEPModule, original_gm: fx.GraphModule, weights: List[Tensor], inputs: List[Tensor])
```
There are a few problems exposed by HOP
- AOTI expects a FX graph with weights as getattr nodes, aka stateful graph. HOP expect graph_module arguments to be stateless. Export serializer also expect a stateless graph. Currently, to make AOTI happy, I am making `original_gm` stateful, and bypassing the serialization for `original_gm`.
- As a result, the HOP is not re-traceable, as functionalization on stateful graph module argument will fail.
Test Plan: buck2 test 'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:cpu_lowering_utils_test
Reviewed By: zhxchen17
Differential Revision: D68359391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145630
Approved by: https://github.com/zou3519
Instead of bumping symint counters when we process unbacked bindings during deserialization, it's better to bump them at the beginning based on what the symbols in the original shape env before serialization were. This allows symbols in unbacked bindings to have "gaps" that bumping alone would not be able to match.
Why is bumping counters important at all? It is because when the shape env coming out of deserialization is used later for propagating symints, say in run_decompositions, we don't want new names to clash with existing names (bad things happen).
Differential Revision: [D68798191](https://our.internmc.facebook.com/intern/diff/D68798191/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145882
Approved by: https://github.com/pianpwk
Adds unbacked bindings during deserialization. These are carried by a node's metadata, and map pending fresh unbacked symbols to paths to such symbols inside the corresponding example value carried by the node's metadata.
Since it is awkward to serialize paths, we only serialize the names of these symbols and reconstruct the paths on deserialization, using a shape env util. We also need to bump counters for unbacked symbols here, because the shape env util we use to create these symbols (when deserializing example values) don't do so, and not doing so makes later passes (like `run_decompositions`) crash because new unbacked symbols don't get new names.
This is enough for non-strict. For strict, the unbacked bindings and example values in node metadata can get out of sync, because of running AOTAutograd as an additional step after Dynamo. So we have to sync those back.
Differential Revision: [D68232274](https://our.internmc.facebook.com/intern/diff/D68232274/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144894
Approved by: https://github.com/pianpwk
Summary:
Add experimental support for torch.nn.Module as input types.
Before this change, we don't support module inputs but recently we saw some interesting use cases like gpt-fast https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L68 where we directly pass in a module input for different variants of the same models.
Since we don't really care about non-param or non-buffer states in non strict mode, we don't care about those either and pretend they are like plain constants during tracing. We treat any module input like a nested container of tensor, and each time we will automatically register a pytree handler for these module types to flatten its state dict into a group of tensors. We will just inline any module method call during tracing like we did for `self` module in export_for_training. This will make input modules' behavior very similar to the training module in typical case, except that we don't record the inputs as parameter or buffers but rather just plain user inputs.
Test Plan: buck run mode/opt caffe2/test:test_export -- -r test_module_input
Differential Revision: D67680827
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143925
Approved by: https://github.com/tugsbayasgalan
Summary: Introduce `is_hop_single_tensor_return` field to the `Node` class in serialization so that during deserialization when there is a single return, we know whether it is a tuple of a single element or a single element.
Test Plan:
```
buck2 run @mode/dev-nosan sigmoid/inference/test:e2e_test_cpu -- -r E2ETestCPUCond
buck2 run @mode/dev-nosan sigmoid/inference/test:test_passes -- -r test_const_folding2
```
Differential Revision: D66991624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143227
Approved by: https://github.com/zhxchen17
Summary:
When there is a `torch._check()` that checks if a sym_int is equal to some constant, it will generate 3 nodes in the graph with target `operation.ge`, `operator.le` and `operator.eq`. These operators belong to `_SYM_BOOL_OPS` but the `meta_val` of these nodes are are `bool` instead of `torch.SymBool`.
Similar things can happen to `torch.SymInt`, where a `node.target` belongs to `_SYM_INT_OPS` but `node.meta["val"]` is an `int` instead of `torch.SymInt`.
Therefore, we need to check both `meta_val` type and `node.target` type during serialization.
Test Plan:
```
buck2 run @mode/dev-nosan caffe2/test:test_export -- -r test_sym_bool_torch_check_equal
buck2 run @mode/dev-nosan caffe2/test:test_export -- -r test_sym_int_torch_check_equal
```
Differential Revision: D67883754
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144295
Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi