Commit Graph

74 Commits

Author SHA1 Message Date
angelayi
e43d33f4f7 [export] Support torch.sym* ops (#115854)
Fixes https://github.com/pytorch/pytorch/issues/108830 and https://github.com/pytorch/executorch/issues/1379#issuecomment-1853322866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115854
Approved by: https://github.com/zhxchen17
2023-12-18 17:48:47 +00:00
PyTorch MergeBot
50c9665f92 Revert "[export] Support torch.sym* ops (#115854)"
This reverts commit 347cb91946.

Reverted https://github.com/pytorch/pytorch/pull/115854 on behalf of https://github.com/atalman due to OSSCI oncall, broke multple jobs ([comment](https://github.com/pytorch/pytorch/pull/115854#issuecomment-1858486796))
2023-12-15 21:07:52 +00:00
angelayi
347cb91946 [export] Support torch.sym* ops (#115854)
Fixes https://github.com/pytorch/pytorch/issues/108830 and https://github.com/pytorch/executorch/issues/1379#issuecomment-1853322866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115854
Approved by: https://github.com/zhxchen17
2023-12-15 20:08:04 +00:00
Zhengxu Chen
ef6a0faf89 [export] Fix canonicalization. (#115830)
Summary: Add the missed layout argument branch.

Test Plan: buck2 test 'fbcode//mode/dev-nosan' fbcode//sigmoid/inference/test_gpu:export_package_sparse_toy_test

Differential Revision: D52166501

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115830
Approved by: https://github.com/angelayi
2023-12-14 22:48:26 +00:00
zhxchen17
d5286d7ea8 [export] Add canonical form for differentiating IR (#115589)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115589
Approved by: https://github.com/suo
2023-12-12 16:21:57 +00:00
Angela Yi
f0cc6364ed [export] Remove convert_to_cpu flag (#114775)
Test Plan: CI

Differential Revision: D51674158

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114775
Approved by: https://github.com/zhxchen17, https://github.com/SherlockNoMad
2023-11-30 01:59:52 +00:00
Angela Yi
05f071d922 [export] Fix state dict device serialization (#114695)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/114000
Will check with SherlockNoMad on why we need to convert to cpu after his PTO

Test Plan: CI

Differential Revision: D51629068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114695
Approved by: https://github.com/ydwu4
2023-11-29 05:05:22 +00:00
Tobias Ringwald
a28876832c Fixed an export problem when moving tensors to CPU during torch.export.save (#114029)
For whatever reason calling`.cpu()` on a `nn.Parameter` wrapping a CUDA tensor will return a plain (non-parameter) tensor. This PR fixes the symptom in the linked issue, but not the underlying issue.

Fixes #113999.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114029
Approved by: https://github.com/zhxchen17
2023-11-23 21:17:43 +00:00
Angela Yi
f961bda939 [export] Move serialized custom class objs to toplevel (#114371)
Summary:
Move the serialized CustomClassHolder objects to the toplevel SerializedArtifact instead of embedding the bytes in the graph.

Currently the CustomClassHolder objects are embedded in the graph instead of being lifted to the ExportedProgram, so there's some logic introduced to lift it to the higher level of the serialized ExportedProgram. However, once that CustomClassHolder objects get lifted, we can remove the TODOs I added.

Test Plan: CI

Reviewed By: zhxchen17

Differential Revision: D51479125

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114371
Approved by: https://github.com/ydwu4
2023-11-22 23:44:20 +00:00
Angela Yi
9fcf1f9632 [export] Update schema (#114172)
Summary: Will update CustomClassHolder in a followup

Test Plan: CI

Differential Revision: D51343522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114172
Approved by: https://github.com/zhxchen17
2023-11-22 16:43:43 +00:00
Zhengxu Chen
e4ec5545cd [export] Turn on verifier for serialization. (#113980)
Summary: as title.

Test Plan: CI

Differential Revision: D51435909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113980
Approved by: https://github.com/larryliu0820
2023-11-20 18:32:16 +00:00
Angela Yi
50101d59ba [export][retry] Move lifted tensors out of state_dict (#113689)
Test Plan: CI

Differential Revision: D51321532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113689
Approved by: https://github.com/zhxchen17
2023-11-15 09:24:49 +00:00
Zhengxu Chen
aa376e31fd [export] Enable verifier [2/n] (#113075)
Summary: Turn on verifier check for exportec program ctor. Note that this effectively detect a large surface of spec violations, so we also spend some time fixing them one by one in this diff.

Test Plan: CI

Differential Revision: D51014944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113075
Approved by: https://github.com/angelayi
2023-11-08 03:32:11 +00:00
Janet Yang
ef1f08c5a0 State_dict serialization for meta tensors (#112213)
Summary: Add cases for serializing meta tensors from state_dict

Test Plan: sandcastle

Differential Revision: D50718161

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112213
Approved by: https://github.com/zhxchen17, https://github.com/houseroad
2023-11-01 01:07:09 +00:00
Zhengxu Chen
f2a0bef35a [export] Upstream support of (tensor, tensor list) in op returns. (#111857)
Summary:
Upstreaming from internal to oss.
Diff: D49710320

Test Plan: buck2 build mode/opt sigmoid/inference/test_gpu:package_gen

Differential Revision: D50577490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111857
Approved by: https://github.com/SherlockNoMad
2023-10-25 21:38:12 +00:00
Sherlock Huang
4d45c21c3f [Export] Don't serialize missing args with default value (#111715)
Summary: Per https://docs.google.com/document/d/1FzWm-sHYwmRi3x_g036kOxd99KaYquUsA-L5JwOn8ys/edit

I wonder if this would break executorch? @larryliu0820
I see exir/serialize.py using export's GraphModuleSerializer.

Test Plan: Existing CIs

Differential Revision: D50519217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111715
Approved by: https://github.com/zhxchen17
2023-10-23 21:09:15 +00:00
Zhengxu Chen
9656ef88b6 [sigmoid] Switch to oss serializer. (#111455)
Differential Revision: D50348807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111455
Approved by: https://github.com/tugsbayasgalan
2023-10-20 18:19:05 +00:00
Sherlock Huang
b72a1402f5 [AOTInductor] ProxyExecutor skips serializing missing args with default value (#111425)
Summary: In AOTInductor ABI Compatible-mode, we don't serialize missing args with default value.

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D50345729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111425
Approved by: https://github.com/angelayi
2023-10-18 17:10:42 +00:00
Zhengxu Chen
17002d25c5 [export] Remove call_spec argument from ExportedProgram ctor. (#111407)
Summary: call_spec arg is not used anymore.

Test Plan: CI

Reviewed By: SherlockNoMad, tugsbayasgalan

Differential Revision: D50335365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111407
Approved by: https://github.com/izaitsevfb
2023-10-17 21:01:37 +00:00
Zhengxu Chen
ba7b9211ee [export] Update serialization schema to input/output specs. (#845) (#111204)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/845

Test Plan: CI

Differential Revision: D50191531

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111204
Approved by: https://github.com/angelayi
2023-10-13 22:19:56 +00:00
Zhengxu Chen
168bad5f23 [export] Reland "Fix graph signature data model to list of specs." (#111136)
Summary: reland D49876258

Test Plan: CI

Differential Revision: D50224384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111136
Approved by: https://github.com/angelayi
2023-10-13 02:04:29 +00:00
PyTorch MergeBot
42b89aea4b Revert "[export] Fix graph signature data model to list of specs. (#111017)"
This reverts commit 33b69509d3.

Reverted https://github.com/pytorch/pytorch/pull/111017 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111017#issuecomment-1759292161))
2023-10-12 09:52:33 +00:00
Zhengxu Chen
33b69509d3 [export] Fix graph signature data model to list of specs. (#111017)
Summary:
Previously we design the GraphSignature format as a bunch of inputs and outputs node names. After a discussion in the design meeting we decide to change the format to make signature more self-contained. Now the signature format look like the following:
```
[
InputSpec(
   kind=InputKind.USER_INPUT,
   arg=TensorArgument(name="arg0_1"),
   target=None,
),
...
]
```

Test Plan: CI

Reviewed By: angelayi

Differential Revision: D49876258

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111017
Approved by: https://github.com/angelayi
2023-10-12 03:39:04 +00:00
Tugsbayasgalan Manlaibaatar
cd275dc24f Remove RangeConstraints in favor of ValueRanges (#109859)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109859
Approved by: https://github.com/avikchaudhuri
2023-10-10 22:22:05 +00:00
Zhengxu Chen
be5dc3a00d [export] Update ArgumentSpec definition. (#110612)
Summary: Changing ArgumentSpec into a true union type in Python without changing serialization format.

Test Plan: CI

Differential Revision: D49871088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110612
Approved by: https://github.com/angelayi
2023-10-06 03:14:45 +00:00
Sherlock Huang
f1b94461aa [AOTInductor] ProxyExecutor support Dynamic Shape (#110526)
Summary:
Extend ProxyExecutor to support dynamic shape.

Example of ProxyExecutor invocation with symints.
```
    int64_t* arg0_1_size;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_get_sizes(arg0_1, &arg0_1_size));
    auto s0 = arg0_1_size[0];
    auto s1 = arg0_1_size[1];
    int64_t* arg1_1_size;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_get_sizes(arg1_1, &arg1_1_size));
    auto s2 = arg1_1_size[0];
    auto s3 = arg1_1_size[1];
    ...
    aoti_torch_proxy_executor_call_function(proxy_executor, 0, 15, std::vector<int64_t>{42, 16, 17, s0 + s1, s0 + s1, s2*s3, 45, 67, 16, 17, s2*s3, s2*s3, s0 + s1, 89, 910}.data(), 7, std::vector<AtenTensorHandle>{arg0_1, arg0_1, arg1_1, buf2, arg0_1, arg1_1, buf4}.data());
```

Example of serialized SymInt(s) arguments:
```
          {
            "name": "symint",
            "arg": {
              "asSymInt": {
                "asName": "s0 + s1"
              }
            }
          },
          {
            "name": "symints",
            "arg": {
              "asSymInts": [
                {
                  "asName": "s0 + s1"
                },
                {
                  "asName": "s2*s3"
                }
              ]
            }
          },
          ...
          {
            "name": "o_symint",
            "arg": {
              "asSymInt": {
                "asName": "s2*s3"
              }
            }
          },
          {
            "name": "o_symints",
            "arg": {
              "asSymInts": [
                {
                  "asName": "s2*s3"
                },
                {
                  "asName": "s0 + s1"
                }
              ]
            }
          },
```

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D49887555

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110526
Approved by: https://github.com/chenyang78
2023-10-05 19:05:20 +00:00
Sherlock Huang
50054b1a62 [AOTInductor] ProxyExecutor support ReinterpretView inputs (#110451)
Summary:
See wrapper.codegen_reinterpret_view(), it return a temporary handle for tensor, which has following problem.
```
            # NB, the return handle here represents a temporary tensor, which will be automatically
            # released.
            # Here's a sample usage in the cpp wrapper code:
            # ```
            # aoti_torch_addmm_out(
            #     buf1,
            #     arg1_1,
            #     RAIIAtenTensorHandle(tmp_tensor_handle_0),
            #     buf0,
            #     1L,
            #     1L));
            # ```
            # RAIIAtenTensorHandle(tmp_tensor_handle_0) will be released after the call to addmm_out.
            # This could be problematic when it's used in a different pattern, for example:
            # ````
            # AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
            # aoti_torch_proxy_executor_call_function(..., tensor_args);
            # ````
            # RAIIAtenTensorHandle(tmp_tensor_handle_2) will be invalid when it's used in the latter
            # kernel call.
            return f"RAIIAtenTensorHandle({tmp_name})"
```

As a result, ProxyExecutor would generate following code, which cause invalid memory access.

Before:

```
    // Source Nodes: [fn_with_tuple_output], Original ATen: [fb.fn_with_tuple_output]
    AtenTensorHandle tmp_tensor_handle_2;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__reinterpret_tensor(buf3, 2, int_array_0, int_array_1, 0L, &tmp_tensor_handle_2));
    ...
    AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
    int64_t int_args[] = {1};
    aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, int_args, 3, tensor_args);
    buf3.reset();
```

With fix in this diff, ProxyExecutor generates following code

After:

```
    // Source Nodes: [fn_with_tuple_output], Original ATen: [fb.fn_with_tuple_output]
    AtenTensorHandle tmp_tensor_handle_2;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__reinterpret_tensor(buf3, 2, int_array_0, int_array_1, 0L, &tmp_tensor_handle_2));
    ...
    aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, std::vector<int64_t>{1}.data(), 3, std::vector<AtenTensorHandle>{RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6}.data());
    buf3.reset();
```

I am not exactly a big fan of such `std::vector{...}.data()` for creating a temp array, but I can't think of another fix.

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Reviewed By: desertfire

Differential Revision: D49758764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110451
Approved by: https://github.com/desertfire
2023-10-04 02:20:31 +00:00
ydwu4
5f7eff0adb Replace node.meta source_fn with source_fn_stack (#108595)
A resubmit of https://github.com/pytorch/pytorch/pull/108447. Copy over the descriptions:

This is a follow-up of the discussion in https://github.com/pytorch/pytorch/pull/108356, where we want to repalce source_fn with source_fn_stack

Before this PR, for the following example:
```python
backend = EagerAndRecordGraphs()

@torch.compile(backend=backend, fullgraph=True)
def cond_f(pred, pred2, x, y):
    def true_fn(pred2, x, y):
        return x + y

    def false_fn(pred2, x, y):
        def true_fn2(x, y):
            return x.sin() - y.cos()

        def false_fn2(x, y):
            return x.cos() - y.sin()

        return control_flow.cond(pred2, true_fn2, false_fn2, (x, y))

    return control_flow.cond(pred, true_fn, false_fn, (pred2, x, y))
```
The graph captured is shown below:
```python
class GraphModule(torch.nn.Module):
    def forward(self, L_pred_ : torch.Tensor, L_pred2_ : torch.Tensor, L_x_ : torch.Tensor, L_y_ : torch.Tensor):
        l_pred_ = L_pred_
        l_pred2_ = L_pred2_
        l_x_ = L_x_
        l_y_ = L_y_

        cond_true_1 = self.cond_true_1
        cond_false_1 = self.cond_false_1
        cond = torch.ops.higher_order.cond(l_pred_, cond_true_1, cond_false_1, [l_pred2_, l_x_, l_y_]);  l_pred_ = cond_true_1 = cond_false_1 = l_pred2_ = l_x_ = l_y_ = None
        return (cond,)

    class GraphModule(torch.nn.Module):
        def forward(self, l_pred2_, l_x_, l_y_):
            add = l_x_ + l_y_;  l_x_ = l_y_ = None
            return add

    class GraphModule(torch.nn.Module):
        def forward(self, l_pred2_, l_x_, l_y_):
            cond_true_0 = self.cond_true_0
            cond_false_0 = self.cond_false_0
            cond = torch.ops.higher_order.cond(l_pred2_, cond_true_0, cond_false_0, [l_x_, l_y_]);  l_pred2_ = cond_true_0 = cond_false_0 = l_x_ = l_y_ = None
            return cond

        class GraphModule(torch.nn.Module):
            def forward(self, l_x_, l_y_):
                sin = l_x_.sin();  l_x_ = None
                cos = l_y_.cos();  l_y_ = None
                sub = sin - cos;  sin = cos = None
                return sub

        class GraphModule(torch.nn.Module):
            def forward(self, l_x_, l_y_):
                cos = l_x_.cos();  l_x_ = None
                sin = l_y_.sin();  l_y_ = None
                sub = cos - sin;  cos = sin = None
                return sub
```
the source_fn for inner cond, sin, cos will be a (name, target) tuple:
```
('cond', <torch._ops.HigherOrderOperator object at xxx>)
('sin', 'sin')
('cos', 'cos')
('sub'. <built-in function sub>)
```

After this pr, the source_fn_stack will be a list of (name, target) tuple. The bottom of stack is the end of the list.
```
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>)],
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('sin', 'sin')],
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cos', 'cos')]
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('sub', <built-in function sub>)]
```

Test Plan:
See added tests in test_higher_order_ops.py and modify existing test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108595
Approved by: https://github.com/angelayi, https://github.com/zou3519
2023-09-28 18:18:36 +00:00
Sherlock Huang
7f2b51c668 [AOTInductor] ProxyExecutor supports custom op with tuple output (#110140)
Summary:
Extend ProxyExecutor to support custom ops with tuple outputs.

Generated wrapper code for `out3, out4 = torch.ops.fb.fn_with_tuple_output(out2, 1)`

```
    AtenTensorHandle buf5_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf5_handle));
    RAIIAtenTensorHandle buf5(buf5_handle);
    AtenTensorHandle buf6_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf6_handle));
    RAIIAtenTensorHandle buf6(buf6_handle);
    AtenTensorHandle tensor_args_var_3[] = {buf3.get(), buf5.get(), buf6.get()};
    int64_t int_args_var_4[] = {1};
    aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, int_args_var_4, 3, tensor_args_var_3);
```

Test Plan: Test

Differential Revision: D49673994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110140
Approved by: https://github.com/chenyang78
2023-09-28 02:50:39 +00:00
Sherlock Huang
ec5bbef8af [AOTInductor] Switch ProxyExecutor to use AtenTensorHandle (#109748)
Summary: Switch ProxyExecutor to use AtenTensorHandle.

Test Plan: E2E Test

Differential Revision: D49471659

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109748
Approved by: https://github.com/yifuwang, https://github.com/desertfire, https://github.com/chenyang78
2023-09-27 17:51:30 +00:00
Sherlock Huang
293205c54b [AOTInductor] Fix aot_inductor/test:test_custom_ops (#109660)
Summary: Fix aot_inductor/test:test_custom_ops, which was broken by https://github.com/pytorch/pytorch/pull/109391

Test Plan: buck2 run mode/dev-nosan //deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D49438928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109660
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-09-20 07:44:39 +00:00
Angela Yi
98208e5160 [export] Update deserialized FakeTensorMode/ShapeEnv with same configs as export (#109522)
Summary: Deserialized FakeTensorMode/ShapeEnv should have the same configs as export: https://fburl.com/code/y7jxf5qw

Test Plan: CI

Differential Revision: D49377410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109522
Approved by: https://github.com/zhxchen17
2023-09-19 00:34:30 +00:00
Sherlock Huang
b9dfdc091b [AOTInductor][Reland] Proxy Executor for Extern Fallback kernels (#107279) (#108350)
Summary:

This is a prototype for running extern fallback kernels with a host side proxy executor.

Sample of generated cpp wrapper call:
```
        at::Tensor buf0;  // output buffer
        void* tensor_args_var_0[] = {&arg0_1, &arg0_1, &arg1_1, &arg0_1, &arg1_1, &buf0};
        int64_t int_args_var_1[] = {81, 81, 7, 7, 7, 81};
        proxy_executor->call_function("buf0", int_args_var_1, tensor_args_var_0);
```

- In my current implementation, proxy executor interprets the raw pointers according to the ops schema.
This assumes that custom op MUST have a valid schema registered to Dispatcher. (I would like to validate this assumption)
- I am using callboxed() API of the custom kernels. This is inevitable, as we wish to have a single call_function API for all possible custom kernels.

- These are all the input argument types I have support so far.
       union Argument {
         # Bool value does not matter
         1: bool asNone;
         2: TensorArgument asTensor;
         3: list<TensorArgument> asTensors;
         5: i64 asInt;
         7: list<i64> asInts;
         8: double asFloat;
         9: list<double> asFloats;
         10: string asString;
         10.5: list<string> asStrings;
         11: SymIntArgument asSymInt;
         12: list<SymIntArgument> asSymInts;
         13: ScalarType asScalarType;
         14: MemoryFormat asMemoryFormat;
         15: Layout asLayout;
         16: Device asDevice;
         17: bool asBool;
         18: list<bool> asBools;
       }

- Need a policy for handling unpopulated argument with default values. Here are the options, and it has BC  implications.
1. requires exported fx graph to explicitly populate default values, if users doesn't specify.
2. requires cpp wrapper to explicitly populate default values, if fx graph doesn't specify.
3. Proxy executor look up from opSchema for default values.

For fixing T162112344

Test Plan:
frontend:
buck2 run mode/dev-sand mode/inplace -c fbcode.enable_gpu_sections=True sigmoid/frontend:export_main

test:
 buck2 run mode/dev-sand //deeplearning/aot_inductor/test:test_custom_ops

backend:
buck2 run mode/dev-nosan //deeplearning/aot_inductor/fb:main

buck2 test 'fbcode//mode/opt' fbcode//caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark -- --exact 'caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark - test_aot_inductor_benchmark_cmf30x (caffe2.torch.fb.model_transform.experimental.benchmark.test.test_aot_inductor_benchmark.AOTInductorBenchmark)'

Reviewed By: suo

Differential Revision: D48747417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108350
Approved by: https://github.com/izaitsevfb
2023-09-02 17:14:10 +00:00
Zhengxu Chen
138fafe72d [export] Fix torch.export() issues for server use cases. (#108275)
Test Plan: In D48788843

Differential Revision: D48811793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108275
Approved by: https://github.com/tugsbayasgalan
2023-08-31 07:19:18 +00:00
angelayi
a432f37e49 Serialize pytree to json string (#106116)
Fixes https://github.com/pytorch/pytorch/pull/102577#issuecomment-1650905536

Serializing to json is more stable, and renamed the API:

```
# Takes in a treespec and returns the serialized treespec as a string. Also optionally takes in a protocol version number.
def treespec_dumps(treespec: TreeSpec, protocol: Optional[int] = None) -> str:
# Takes in a serialized treespec and outputs a TreeSpec
def treespec_loads(data: str) -> TreeSpec:
```

If users want to register their own serialization format for a given pytree, they can go through the `_register_treespec_serializer` API which optionally takes in a `getstate` and `setstate` function.
```
_register_treespec_serializer(type_, *, getstate, setstate)
# Takes in the context, and outputs a json-dumpable context
def getstate(context: Context) -> DumpableContext:
# Takes in a json-dumpable context, and reconstructs the original context
def setstate(dumpable_context: DumpableContext) -> Context:
```

We will serialize to the following dataclass, and then json.dump this it to string.
```
class TreeSpec
    type: Optional[str]  # a string name of the type. null for the case of a LeafSpec
    context: Optional[Any]  # optional, a json dumpable format of the context
    children_specs: List[TreeSpec],
}
```

If no getstate/setstate function is registered, we will by default serialize the context using `json.dumps/loads`. We will also serialize the type through `f"{typ.__module__}.{typ.__name__}"`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106116
Approved by: https://github.com/zou3519
2023-08-27 14:34:49 +00:00
zhxchen17
162109f6c2 [export] Don't save example_inputs for now. (#107978)
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107978
Approved by: https://github.com/angelayi
2023-08-26 14:36:56 +00:00
angelayi
4e9d7f878b [export] Serialize getattr nodes (#107924)
Turns out some graphs will result in getattr nodes...so let's serialize them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107924
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
2023-08-26 02:41:49 +00:00
angelayi
4b44b1861d [export] Store the arguments used to trace the exported program in itself (#107906)
Proper fix would be to do something like https://github.com/pytorch/pytorch/pull/107877, but since that depends on internal changes and it would take too long for diff train to land we will first just make OSS work using torch.save.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107906
Approved by: https://github.com/gmagogsfm
2023-08-25 16:04:58 +00:00
angelayi
1e71c51350 [export] Serialize map correctly (#107837)
Summary: Previously serializing graphs using map would error
because map returns a singleton tensor list rather than a
single tensor. So this diff adds support for if a higher order operator
returns a list of tensors as output.

We also run into an issue with roundtripping the source_fn on
map nodes/subgraphs. The source_fn originally is
<functorch.experimental._map.MapWrapper object at 0x7f80a0549930>, which
serializes to `functorch.experimental._map.map`. However, we are unable
to construct the function from this string. This should be fixed once
map becomes a fully supported operator like
torch.ops.higher_order.cond.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48631302](https://our.internmc.facebook.com/intern/diff/D48631302)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107837
Approved by: https://github.com/zhxchen17
ghstack dependencies: #107818
2023-08-24 06:47:50 +00:00
angelayi
1166f9a02c [export] Custom object serialization (#107666)
Some NvidaTRT folks were asking for a way to integrate the serialization of custom objects with export's serialization. After some discussion (more background [here](https://docs.google.com/document/d/1lJfxakmgeoEt50inWZ53MdUtOSa_0ihwCuPy_Ak--wc/edit)), we settled on a way for users to register their custom object's serializer/deserializer functions.

Since TorchScript's `.def_pickle` already exists for [registering custom classes](https://pytorch.org/tutorials/advanced/torch_script_custom_classes.html), and `tensorrt.ICudaEngine` already contains a `.def_pickle` implementation, we'll start off by reusing the existing framework and integrating it with export's serialization.

TorchScript's `.def_pickle` requires users to register two functions, which end up being the `__getstate__` and `__setstate__` methods on the class. The semantics of `__getstate__` and `__setstate__` in TorchScript are equivalent to that of Python pickle modules. This is then registered using pybind's `py::pickle` function [here](https://www.internalfb.com/code/fbsource/[f44e048145e4697bccfaec300798fce7daefb858]/fbcode/caffe2/torch/csrc/jit/python/script_init.cpp?lines=861-916) to be used with Python's pickle to initialize a ScriptObject with the original class, and set the state back to what it used to be.

I attempted to call `__getstate__` and `__setstate__` directly, but I couldn't figure out how to initial the object to be called with `__setstate__` in python. One option would be to create a `torch._C.ScriptObject` and then set the class and call `__setstate__`, but there is no constructor initialized for ScriptObjects. Another option would be to construct an instance of the serialized class itself, but if the class constructor required arguments, I wouldn't know what to initialize it with. In ScriptObject's `py::pickle` registration it directly creates the object [here](https://www.internalfb.com/code/fbsource/[f44e048145e4697bccfaec300798fce7daefb858]/fbcode/caffe2/torch/csrc/jit/python/script_init.cpp?lines=892-906), which is why I was thinking that just directly using Python's `pickle` will be ok since it is handled here.

So, what I did is that I check if the object is pickle-able, meaning it contains `__getstate__` and `__setstate__` methods, and if so, I serialize it with Python's pickle. TorchScript does have its own implementation of [pickle/unpickle](https://www.internalfb.com/code/fbsource/[59cbc569ccbcaae0db9ae100c96cf0bae701be9a][history]/fbcode/caffe2/torch/csrc/jit/serialization/pickle.h?lines=19%2C82), but it doesn't seem to have pybinded functions callable from python.

A question is -- is it ok to combine this pickle + json serialization?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107666
Approved by: https://github.com/gmagogsfm
2023-08-24 06:36:23 +00:00
angelayi
7bab98f161 [export] Serialize cond submodules (#107818)
Cond submodules only return a single tensor, which was not supported by the serializer. Since the serializer assumes that the graph always returns a list -- this is true for the toplevel graph from dynamo, but not true for the subgraphs.

Differential Revision: [D48622687](https://our.internmc.facebook.com/intern/diff/D48622687)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107818
Approved by: https://github.com/avikchaudhuri
2023-08-24 02:29:26 +00:00
Tarun Karuturi
e8278d6058 Support graphs which return get_attr nodes directly as output (#107610)
Summary: Currently serializing graphs which return get_attr's directly as output fails. This diff adds support for that only in EXIR serializer while we still support unlifted params.

Test Plan: Added test case.

Differential Revision: D48258552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107610
Approved by: https://github.com/angelayi
2023-08-22 23:16:10 +00:00
angelayi
a5efb5eb84 [export] Serialize constrain_as_size ops (#107386)
Since constrain_as_size has been fixed, I tried serializing it, but ran into some issues.
Notably, after each `.transform` call, I added a helper `_get_updated_range_constraints` to update the range constrains list. This is because when we retrace in a pass, the symbolic values being used changes, so we need to update this dictionary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107386
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
2023-08-21 15:24:11 +00:00
angelayi
63e9b5481d [export] Add schema version to serializer/deserializer (#107420)
Added a version number to the schema for BC issues. We will add this number to the serialized ExportedProgram and then when deserializing, if the number does not match up with the existing deserializer, we will error. We should update the number of there are any major changes to the schema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107420
Approved by: https://github.com/zhxchen17
2023-08-21 06:56:46 +00:00
Tugsbayasgalan Manlaibaatar
20c5add133 [export] Refactor constrain_as_value and constrain_as_size (#106591)
Some notable changes:
1. `constrain_as_size` allows min value to be less than 2 as it will unconditionally assume min >= 2 for compiler purposes. Instead, we add additional check to make sure max value is always greater than 2.
2. Previously, we used to runtime assert on the unbacked symint's val range which would be always between [2, max]. I modified this logic to assert on [0, max] unless user explicitly specifies the min range.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106591
Approved by: https://github.com/gmagogsfm, https://github.com/ezyang
2023-08-15 05:41:43 +00:00
Sherlock Huang
1e007d044d [AOTInductor] Prepare for ProxyExecutor, OSS only change (#107065)
Summary: Minor fixes to export schema and serialization

Test Plan: OSS CI

Differential Revision: D48280809

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107065
Approved by: https://github.com/zhxchen17
2023-08-14 20:04:45 +00:00
Zhengxu Chen
547ccae0db [export] Support preserving calling convention to some modules. (#106798)
Summary: APS use this feature to swap out some submodules after unflattening.

Test Plan: test_export_preserve_signature

Differential Revision: D48154341

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106798
Approved by: https://github.com/tugsbayasgalan
2023-08-11 21:17:45 +00:00
PyTorch MergeBot
745d29b0cc Revert "[export] Refactor constrain_as_value and constrain_as_size (#106591)"
This reverts commit 18989890bf.

Reverted https://github.com/pytorch/pytorch/pull/106591 on behalf of https://github.com/izaitsevfb due to Breaks inductor test on trunk ([comment](https://github.com/pytorch/pytorch/pull/106591#issuecomment-1675069091))
2023-08-11 16:37:47 +00:00
Tugsbayasgalan Manlaibaatar
18989890bf [export] Refactor constrain_as_value and constrain_as_size (#106591)
Some notable changes:
1. `constrain_as_size` allows min value to be less than 2 as it will unconditionally assume min >= 2 for compiler purposes. Instead, we add additional check to make sure max value is always greater than 2.
2. Previously, we used to runtime assert on the unbacked symint's val range which would be always between [2, max]. I modified this logic to assert on [0, max] unless user explicitly specifies the min range.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106591
Approved by: https://github.com/gmagogsfm, https://github.com/ezyang
2023-08-11 05:29:22 +00:00
Tarun Karuturi
3143d81f6c Add support for edge dialect ops in exir/serde (#106371)
Summary:
Adding support for edge dialect ops in `exir/serde`. This diff does the following:
- Moves the global `serialize_operator/deserialize_operator` implementations in`export/serde/serialize.py` into `GraphModuleSerializer` and `GraphModuleDeserializer`
- Adds implementations of `serialize_operator/deserialize_operator` inside `GraphModuleSerializer` and `GraphModuleDeserializer` in `exir/serde/serialize.py`

Test Plan: CI + Enabled edge dialect ops in `executorch/exir/tests/test_serde.py`

Differential Revision: D47938280

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106371
Approved by: https://github.com/angelayi
2023-08-02 20:09:15 +00:00