Summary:
Added the contextmanager, `_disable_interpreter`, which is meant to put around a call to `unflatten`. This will generate an UnflattendModule and sub-InterpreterModules which will not use torch.fx.Interpreter to run eagerly. We want to have this as a state of the module instead of a contextmanager around running the module because it's not clear where we are calling the unflattened module.
This seems to improve the performance: https://fb.workplace.com/groups/1075192433118967/posts/1473590629945810/?comment_id=1473621763276030
Test Plan: CI
Differential Revision: D60939034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133996
Approved by: https://github.com/pianpwk
Summary: Fixes T192448049. The module call form an unusal call stack for the nodes: https://www.internalfb.com/phabricator/paste/view/P1507230978. This is currently not supported by unflattener and need some extra design to make it work.
Test Plan: buck2 run 'fbcode//mode/opt' torchrec/distributed/tests:test_pt2 -- --filter-text "test_sharded_quant_fpebc_non_strict_export"
Reviewed By: zhxchen17
Differential Revision: D60528900
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132437
Approved by: https://github.com/Skylion007
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.
Fixes#35735
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
Summary:
Export, through AOTAutograd, [deduplicates](11ff5345d2/torch/fx/experimental/proxy_tensor.py (L198)) sym_size calls, which can cause issues during unflattening when the sym_size node is used in multiple submodules.
If preserve_call_module_signature is set, these nodes can't be passed between submodules as placeholders, so the calls (and any downstream un-duplicated nodes) must be copied. Adding this to unflattener
Test Plan: export unflatten test case
Reviewed By: TroyGarden, angelayi
Differential Revision: D58697231
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129153
Approved by: https://github.com/angelayi
Code snippet from TorchTitan (LLaMa):
```
for layer in self.layers.values():
h = layer(h, self.freqs_cis)
```
`self.freqs_cis` is a buffer of root module (`self`).
It is also an explicit arg in the call signature of original `layer` modules.
If not respecting scope -- `freqs_cis`'s scope only corresponds to root -- `_sink_param` can remove `freqs_cis` from `layer`'s call signature, resulting in runtime error.
There are two fixes in this PR:
1. We filter out the `inputs_to_state` corresponding to the current scope, using existing code that does prefix matching.
2. We delay the removal of param inputs from `call_module` nodes' `args`, till `_sink_param` call on that submodule returns. The return now returns information on which input is actually removed by the submodule, thus more accurate than just doing:
```
for node in call_module_nodes:
node.args = tuple(filter(lambda n: n.name not in inputs_to_state, node.args))
```
Before the PR:

After the PR:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127607
Approved by: https://github.com/pianpwk
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127125
Approved by: https://github.com/Skylion007
ghstack dependencies: #127122, #127123, #127124
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125758
Aliased and unused params are currently an issue for strict-mode export. For a model like this:
```
def __init__(self):
# ...
self.alpha = nn.Parameter(torch.randn(4))
self.beta = self.alpha
self.gamma = self.alpha
def forward(self, x):
return x + self.beta
```
Dynamo will trace only 1 parameter (beta) and assign a dynamo name (e.g. `L__self___beta`) which can be difficult to match to the correct FQN in the original eager module. This leads to export graph signature potentially having the incorrect target FQN for the parameter, leading to downstream issues unflattening (the parameter may be assigned to the wrong target attribute, mismatching the relevant placeholder node in the unflattened module).
This handles aliasing issues by assigning all tensors present in the state dict as module attributes, even if they're unused. Still, only the used tensors will appear in the graph's forward pass.
Another issue that exists is weight-sharing is not maintained in unflattening (all params/buffers are re-cloned) - handle this by checking tensor ids too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125758
Approved by: https://github.com/zhxchen17
A re-land of #124239.
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.
The basic idea is to only fakify the script object during tracing (i.e. aot_export). After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.
Concretely, we
fakify all the script object inputs, and module attributes (gathered by constant_attrs).
patch the module's attributes with fakified script object
right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125490
Approved by: https://github.com/angelayi
In the given test case, we have a ModuleList of 3 modules (`norm.0`, `norm.1`, `norm.2`) which share the same `weight` and `bias` tensors. However when we trace, they all end up pointing to one state dict name, (ex. `norm.2`).
```
graph():
%p_norms_0_weight : [num_users=0] = placeholder[target=p_norms_0_weight]
%p_norms_0_bias : [num_users=0] = placeholder[target=p_norms_0_bias]
%p_norms_1_weight : [num_users=0] = placeholder[target=p_norms_1_weight]
%p_norms_1_bias : [num_users=0] = placeholder[target=p_norms_1_bias]
%p_norms_2_weight : [num_users=3] = placeholder[target=p_norms_2_weight]
%p_norms_2_bias : [num_users=3] = placeholder[target=p_norms_2_bias]
%input_ : [num_users=1] = placeholder[target=input_]
%native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%input_, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
%getitem : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm, 0), kwargs = {})
%native_layer_norm_1 : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%getitem, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
%getitem_3 : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm_1, 0), kwargs = {})
%native_layer_norm_2 : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%getitem_3, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
%getitem_6 : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm_2, 0), kwargs = {})
return (getitem_6,)
```
This causes an error in the unflattener where after constructing the submodules for `norm.0`, it will have the graph pointing to `norm.2.weight` and `norm.2.bias`:
```
graph():
%p_norms_2_bias : [num_users=1] = placeholder[target=p_norms_2_bias]
%p_norms_2_weight : [num_users=1] = placeholder[target=p_norms_2_weight]
%input_ : [num_users=1] = placeholder[target=input_]
%native_layer_norm : [num_users=1] = call_function[target=torch.ops.aten.native_layer_norm.default](args = (%input_, [2, 2, 3], %p_norms_2_weight, %p_norms_2_bias, 1e-05), kwargs = {})
%getitem : [num_users=1] = call_function[target=operator.getitem](args = (%native_layer_norm, 0), kwargs = {})
return getitem
```
Since the attributes are not within the same scope of the graph, (`norm.0` vs. `norm.2`), they will not be added to the subgraph, causing an error.
So this PR handles the duplicate state dict attributes by modifying the `inputs_to_state` dict to map from node names to a list of possible state dict target names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125192
Approved by: https://github.com/zhxchen17
This PR fakify ScriptObject inputs and attributes in export non-strict mode by default.
The basic idea is to `only fakify the script object during tracing (i.e. aot_export)`. After we get the traced graph module, eagerly executing, serializing, or running more passes will use the real script objects. This is essentially treating the script object as constant tensor.
Concretely, we
1. fakify all the script object inputs, and module attributes (gathered by constant_attrs).
2. patch the module's attributes with fakified script object
3. right after aot_export, remove the patching (to avoid changing the original module) then modify the exported graph module's attribute to real script object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124239
Approved by: https://github.com/zou3519
Summary:
Make sure the order of submodules is the same as the original eager module.
bypass-github-export-checks
Test Plan: buck test mode/opt caffe2/test:test_export -- -r test_unflatten_submodule_ordering
Differential Revision: D55251277
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122507
Approved by: https://github.com/tugsbayasgalan
Summary:
When we export in on strict mode and turn on preserve_module_call_signature, the following assertion error will occur today:
```
child_split[: len(parent_split)] == parent_split
```
This is due to the fact that we're monkey patching forward call directly, which kinda breaks the attribute propagation in the tracer. It's actually better to implement this by using forward hook because we don't have to alter the original module structure at all during export.
Test Plan: CI
Differential Revision: D54102714
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120468
Approved by: https://github.com/ydwu4
Summary: `ExportedProgram` is an artifact produced by torch.export, containing the graph that is exported, along with other attributes about the original program such as the graph signature, state dict, and constants. One slightly confusing thing that users run into is that they treat the `ExportedProgram` as a `torch.nn.Module`, since the object is callable. However, as we do not plan to support all features that `torch.nn.Module`s have, like hooks, we want to create a distinction between it and the `ExportedProgram` by removing the `__call__` method. Instead users can create a proper `torch.nn.Module` through `exported_program.module()` and use that as a callable.
Test Plan: CI
Differential Revision: D53075378
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119466
Approved by: https://github.com/zhxchen17, https://github.com/thiagocrepaldi
This PR rewrites two paths to use the newly-added keypaths API in pytree:
First: we were hand-rolling a tree_map during fakification because we wanted to track sources. This PR uses keypaths instead, which can do the same thing without needing custom code.
Second: our constraint error formatting was referencing placeholder names in error messages. These placeholder names are not otherwise user-visible, so they are super confusing to users (e.g. "which input does arg1_3 correspond to?"). This diff uses the `keystr` API to format the error message.
This necessitated some small refactors—generating the keystr is expensive so doing it in an f-string was very bad.
It can also be further improved—we can inspect the signature so that instead of `*args[0]` we can give people the actual argument name, which would be the ideal UX. But leaving that for later.
Differential Revision: [D53139358](https://our.internmc.facebook.com/intern/diff/D53139358/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118609
Approved by: https://github.com/zhxchen17
ghstack dependencies: #118607, #118608