Commit Graph

253 Commits

Author SHA1 Message Date
PyTorch MergeBot
b2f09c1859 Revert "[compiled autograd] support custom ops backed by c++ autograd::Function (#120681)"
This reverts commit d27509c384.

Reverted https://github.com/pytorch/pytorch/pull/120681 on behalf of https://github.com/xmfan due to breaking internal builds, see D54707287 ([comment](https://github.com/pytorch/pytorch/pull/120681#issuecomment-1989542344))
2024-03-11 22:18:36 +00:00
Simon Fan
d27509c384 [compiled autograd] support custom ops backed by c++ autograd::Function (#120681)
- Adds support for custom ops backed by c++ custom autograd functions, e.g. fbgemm
- Include files more granularly to avoid namespace pollution and circular imports

limitations:
- Requires users to audit their code and opt in their custom autograd::Function via autograd::Function::is_traceable, plus possibly an additional compiled_args + apply_with_saved implementation. This was the only way I could think of to guarantee soundness.
- Will throw if we can't hash the saved_data, i.e. for any unimplemented type other than list and dict in at::IValue::hash b0cfa96e82/aten/src/ATen/core/ivalue.cpp (L364)
- Can technically silently fail if both the typeid hash and the typeid string name of the custom autograd::Function collide at the same time, and an identical autograd graph containing a different custom autograd::Function with an identical implementation is called. This case seems extremely unlikely, and the only alternative to hash collisions I can think of is compiling with reflection.
- Tensors not saved via save_variables are not lifted; they are specialized on the TensorImpl*'s hash (treated as a memory address). If needed, we can lift them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120681
Approved by: https://github.com/jansel
2024-03-08 20:43:29 +00:00
PyTorch MergeBot
2b1661c7a0 Revert "[compiled autograd] support custom ops backed by c++ autograd::Function (#120681)"
This reverts commit 05c256849b.

Reverted https://github.com/pytorch/pytorch/pull/120681 on behalf of https://github.com/izaitsevfb due to breaking internal builds, see D54617701 ([comment](https://github.com/pytorch/pytorch/pull/120681#issuecomment-1984214079))
2024-03-07 18:53:51 +00:00
Simon Fan
05c256849b [compiled autograd] support custom ops backed by c++ autograd::Function (#120681)
- Adds support for custom ops backed by c++ custom autograd functions, e.g. fbgemm
- Include files more granularly to avoid namespace pollution and circular imports

limitations:
- Requires users to audit their code and opt in their custom autograd::Function via autograd::Function::is_traceable, plus possibly an additional compiled_args + apply_with_saved implementation. This was the only way I could think of to guarantee soundness.
- Will throw if we can't hash the saved_data, i.e. for any unimplemented type other than list and dict in at::IValue::hash b0cfa96e82/aten/src/ATen/core/ivalue.cpp (L364)
- Can technically silently fail if both the typeid hash and the typeid string name of the custom autograd::Function collide at the same time, and an identical autograd graph containing a different custom autograd::Function with an identical implementation is called. This case seems extremely unlikely, and the only alternative to hash collisions I can think of is compiling with reflection.
- Tensors not saved via save_variables are not lifted; they are specialized on the TensorImpl*'s hash (treated as a memory address). If needed, we can lift them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120681
Approved by: https://github.com/jansel
2024-03-06 18:01:56 +00:00
Sheng Fu
31bfa59970 Capture primitive data type arguments for profiling python_function (#120949)
RECORD_FUNCTION in python_function only captures arguments that are Tensors. However, it is very common for users to pass non-tensor arguments to custom ops, for example the sequence length in a GPT attention custom op. My previous PR tried to capture all non-tensor arguments, but that turned out to be very expensive in some cases.

This PR adds support for primitive (or container-of-primitive) arguments in RECORD_FUNCTION.
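
A rough Python-level sketch of the kind of op this helps with: a custom autograd function that takes a primitive argument (the `ScaledSin` class and its `scale` parameter below are illustrative, not from this PR), run under the profiler so RECORD_FUNCTION sees the call.

```python
import torch
from torch.profiler import profile, ProfilerActivity

class ScaledSin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale: float):  # `scale` is a primitive, non-tensor argument
        ctx.save_for_backward(x)
        ctx.scale = scale
        return torch.sin(x) * scale

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * torch.cos(x) * ctx.scale, None

x = torch.randn(8, requires_grad=True)
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    ScaledSin.apply(x, 2.0).sum().backward()
# With this change, the RECORD_FUNCTION event for the python_function call can
# also capture the scalar 2.0, not just the tensor input.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```
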
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120949
Approved by: https://github.com/soulitzer
2024-03-06 05:09:22 +00:00
PyTorch MergeBot
4903e33e19 Revert "Capture non tensor arguments in record_function (#120017)"
This reverts commit 5c5b71b6ee.

Reverted https://github.com/pytorch/pytorch/pull/120017 on behalf of https://github.com/soulitzer due to regresses perf on autograd Function when using profiler ([comment](https://github.com/pytorch/pytorch/pull/120017#issuecomment-1969883792))
2024-02-28 20:43:33 +00:00
Jason Ansel
01ec8df6d8 [Compiled Autograd] Introduce BackwardState capture (#120382)
This adds support for backwards hooks that are *both*:
1) Interior to the graph; and
2) Dynamically generated (e.g. lambdas)

We do this by creating a BackwardState object that is used to register the hooks in the forward, then populated by dynamo *after* the forward runs.
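
A minimal sketch of the pattern this enables (the function and hook below are illustrative): a dynamically generated lambda hook registered on an intermediate tensor inside a compiled function.

```python
import torch

def fn(x):
    y = x.sin()
    # A dynamically generated hook (a lambda) on an intermediate tensor: it is
    # interior to the graph, so it is registered in the forward via
    # BackwardState and populated by dynamo after the forward runs.
    y.register_hook(lambda grad: grad * 2)
    return y.cos()

x = torch.randn(4, requires_grad=True)
torch.compile(fn)(x).sum().backward()
```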

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120382
Approved by: https://github.com/xmfan
2024-02-28 20:36:47 +00:00
Sheng Fu
5c5b71b6ee Capture non tensor arguments in record_function (#120017)
Summary: RECORD_FUNCTION only captures an argument when it is a Tensor. However, it is very common for users to pass arguments with primitive data types (int, float, index, bool). This diff adds support for non-tensor arguments in RECORD_FUNCTION.

Test Plan:
unit test
    buck test  mode/dev-nosan caffe2/test:profiler -- test_execution_trace_with_pt2 test_execution_trace_alone test_execution_trace_with_kineto test_execution_trace_start_stop test_execution_trace_repeat_in_loop test_execution_trace_no_capture

Differential Revision: D53674768

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120017
Approved by: https://github.com/soulitzer
2024-02-22 09:40:08 +00:00
cyy
8a3c241094 Remove unused header inclusion (#119667)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119667
Approved by: https://github.com/Skylion007
2024-02-12 05:36:25 +00:00
PyTorch MergeBot
dabb90f2a4 Revert "[Exception] [6/N] Remove use of torch::TypeError (#117964)"
This reverts commit 87335fabae.

Reverted https://github.com/pytorch/pytorch/pull/117964 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/117964#issuecomment-1913079096))
2024-01-27 08:44:34 +00:00
cyy
87335fabae [Exception] [6/N] Remove use of torch::TypeError (#117964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117964
Approved by: https://github.com/albanD
2024-01-25 03:35:58 +00:00
Simon Fan
9eb842cbd6 Compiled autograd: Lift autograd functions' backward and provide default key for custom autograd functions (#115573)
This PR adds support for torch.autograd.Function subclasses in compiled autograd. We do this by:
- Creating a uid for all torch.autograd.Function via its metaclass. This uid is used in the compiled autograd key, which is a subset of the cache key to the compiled graph
- "Lifting" the backward/saved_tensors, having them as input arguments in the compiled graph
  - Creating proxies to track the backward's inputs and outputs. Since the backward's outputs (grads) have to match the forward's inputs, we pass the node's `input_info` (forward's input sizes) to build the proxies tracking the backward's outputs.
  - Use a `FakeContext` class as a replacement for the autograd node's context object (`BackwardCFunction`) during tracing; it only supports passing saved_tensors from the forward to the backward
  - Index each backward, to support multiple torch.autograd.Functions in the same graph
  - Special-case `CompiledFunctionBackward`: lifting CompiledFunction would fail 4 tests and requires some skipfiles changes that I'd rather do in a separate PR

Example graph: test_custom_fn_saved_multiple_tensors (eager fw + compiled autograd)
```python
class MyFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return torch.sin(x), torch.sin(y)

    @staticmethod
    def backward(ctx, gO_x, gO_y):
        (x, y) = ctx.saved_tensors
        return gO_x * torch.cos(x), gO_y * torch.cos(y)
```
The backward is lifted via `getitem_5` and `call_backward`
```python
# Compiled autograd graph
 ===== Compiled autograd graph =====
 <eval_with_key>.0 class CompiledAutograd(torch.nn.Module):
    def forward(self, inputs, sizes, hooks):
        # No stacktrace found for following nodes
        getitem: "f32[]" = inputs[0]
        getitem_1: "f32[10]" = inputs[1]
        getitem_2: "f32[10]" = inputs[2]
        getitem_3: "f32[10]" = inputs[3]
        getitem_4: "f32[10]" = inputs[4];  inputs = None
        expand: "f32[10]" = torch.ops.aten.expand.default(getitem, [10]);  getitem = None
        mul: "f32[10]" = torch.ops.aten.mul.Tensor(expand, getitem_2);  getitem_2 = None
        mul_1: "f32[10]" = torch.ops.aten.mul.Tensor(expand, getitem_1);  expand = getitem_1 = None
        getitem_5 = hooks[0];  hooks = None
        call_backward = torch__dynamo_external_utils_call_backward(getitem_5, (getitem_3, getitem_4), mul_1, mul);  getitem_5 = mul_1 = mul = None
        getitem_6: "f32[10]" = call_backward[0]
        getitem_7: "f32[10]" = call_backward[1];  call_backward = None
        accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_4, getitem_7);  getitem_4 = getitem_7 = None
        accumulate_grad__1 = torch.ops.inductor.accumulate_grad_.default(getitem_3, getitem_6);  getitem_3 = getitem_6 = None
        return []
```

It is then inlined by dynamo
```python
# Dynamo graph
 ===== __compiled_fn_0 =====
 <eval_with_key>.1 class GraphModule(torch.nn.Module):
    def forward(self, L_inputs_0_ : torch.Tensor, L_inputs_1_ : torch.Tensor, L_inputs_2_ : torch.Tensor, L_inputs_3_ : torch.Tensor, L_inputs_4_ : torch.Tensor):
        getitem = L_inputs_0_
        getitem_1 = L_inputs_1_
        getitem_2 = L_inputs_2_
        x = L_inputs_3_
        y = L_inputs_4_

        # File: <eval_with_key>.0:10, code: expand = torch.ops.aten.expand.default(getitem, [10]);  getitem = None
        expand = torch.ops.aten.expand.default(getitem, [10]);  getitem = None

        # File: <eval_with_key>.0:11, code: mul = torch.ops.aten.mul.Tensor(expand, getitem_2);  getitem_2 = None
        mul = torch.ops.aten.mul.Tensor(expand, getitem_2);  getitem_2 = None

        # File: <eval_with_key>.0:12, code: mul_1 = torch.ops.aten.mul.Tensor(expand, getitem_1);  expand = getitem_1 = None
        mul_1 = torch.ops.aten.mul.Tensor(expand, getitem_1);  expand = getitem_1 = None

        # File: /data/users/xmfan/core/pytorch/test/inductor/test_compiled_autograd.py:412, code: return gO_x * torch.cos(x), gO_y * torch.cos(y)
        cos = torch.cos(x)
        getitem_6 = mul_1 * cos;  mul_1 = cos = None
        cos_1 = torch.cos(y)
        getitem_7 = mul * cos_1;  mul = cos_1 = None

        # File: <eval_with_key>.0:17, code: accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_4, getitem_7);  getitem_4 = getitem_7 = None
        accumulate_grad__default = torch.ops.inductor.accumulate_grad_.default(y, getitem_7);  y = getitem_7 = None

        # File: <eval_with_key>.0:18, code: accumulate_grad__1 = torch.ops.inductor.accumulate_grad_.default(getitem_3, getitem_6);  getitem_3 = getitem_6 = None
        accumulate_grad__default_1 = torch.ops.inductor.accumulate_grad_.default(x, getitem_6);  x = getitem_6 = None
        return ()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115573
Approved by: https://github.com/jansel
2024-01-10 18:01:28 +00:00
cyy
91bbcf8c71 [1/N] replace THPUtils_assert with TORCH_CHECK (#116675)
This PR replaces THPUtils_assert with TORCH_CHECK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116675
Approved by: https://github.com/albanD
2024-01-04 11:15:33 +00:00
Scott Wolchok
165f4f6ccf [PyTorch] Redirect c10::optional to std::optional (#101995)
We have C++17 now!

I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway.

Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang
2023-11-30 02:46:41 +00:00
Ying Liu
85b97605ab Enable set sequence nr (#114120)
Summary:
In some cases (especially those involving collective calls) we want to always kick off a collective call first, before going down another path.

For  example:

```
tbe lookup -> a2a ->
                     overarch
dense ------------->
```

if the forward code is written as

```
a2a_out = a2a
dense = dense_net
out = overarch(a2a_out, dense)
out.backward()
```

The current default is to run backward in the opposite order of the forward calls. However, there is no data dependency between a2a and dense, so in reality either of them could run first. We would like the a2a to run first because it provides optimal (on average) overlap.

Changing the seq_nr of a2a_out to something large enough would allow the autograd engine to kick it off first.

Test Plan: Tests incoming

Differential Revision: D51445261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114120
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-11-21 19:47:28 +00:00
soulitzer
c435b8c10a Fix autograd engine callback error propagation from device thread (#113702)
The existing try-catch doesn't work because it doesn't call err.persist(). This is in contrast to the try-catch for evaluate_function which does work because it calls into python_engine's thread_on_exception which calls persist.

Calling persist on a python_error stashes the PyErr state from the thread-local PyThreadState onto the python_error object, so that when this error object is stored onto the future and passed back to the calling cpu thread, python_engine's execute try-catch can then err.restore() the error state. Finally, the python_engine's execute would re-raise so that this is re-caught by the HANDLE_TH_ERRORS macro.

Fixes https://github.com/pytorch/pytorch/issues/75750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113702
Approved by: https://github.com/albanD
2023-11-17 20:17:02 +00:00
albanD
5e8be63e99 Allow specifiying inputs as GradientEdge in autograd APIs (#110867)
This can be useful for advanced users (like AOTAutograd) who don't want to keep the corresponding Tensor alive (for memory reasons, for example), or when an in-place op will change the Tensor's grad_fn (but gradients with respect to the original value are needed).

I went with a minimal API change but am open to suggestions.
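
A minimal sketch of the intended use, assuming the edge is obtained via `torch.autograd.graph.get_gradient_edge`: take the gradient with respect to an intermediate without keeping the tensor itself alive.

```python
import torch
from torch.autograd.graph import get_gradient_edge  # helper assumed to accompany this change

x = torch.randn(4, requires_grad=True)
y = x.exp()
out = (y * 3).sum()

# Keep only the lightweight GradientEdge, not the (potentially large) tensor y.
edge = get_gradient_edge(y)
del y

(g,) = torch.autograd.grad(out, inputs=(edge,))
print(g)  # gradient w.r.t. the intermediate y, i.e. a tensor of 3s
```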

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110867
Approved by: https://github.com/soulitzer
2023-10-12 04:08:44 +00:00
soulitzer
73f4c1a406 [reland2] Update custom Function preserve torch function when inputs returned as-is (#110895)

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110895
Approved by: https://github.com/albanD
2023-10-11 21:37:19 +00:00
PyTorch MergeBot
d1c157c598 Revert "[reland] Update custom Function preserve torch function when inputs r… (#110679)"
This reverts commit 563728f61c.

Reverted https://github.com/pytorch/pytorch/pull/110679 on behalf of https://github.com/kit1980 due to The diff has Meta-internal changes, please land from Phabricator ([comment](https://github.com/pytorch/pytorch/pull/110679#issuecomment-1753523182))
2023-10-09 19:09:01 +00:00
soulitzer
563728f61c [reland] Update custom Function preserve torch function when inputs returned as-is (#110679)

reland of https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749803837

Opening this without ghstack to do codev. In our PR, we changed the signature of `_wrap_outputs`. There is some internal code that calls `_wrap_outputs` directly, so we also need to update that callsite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110679
Approved by: https://github.com/albanD
2023-10-07 00:27:45 +00:00
PyTorch MergeBot
236afe73a2 Revert "Update custom Function preserve torch function when inputs returned as-is (#109825)"
This reverts commit 4e73eee93f.

Reverted https://github.com/pytorch/pytorch/pull/109825 on behalf of https://github.com/PaliC due to causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749802739))
2023-10-05 23:49:41 +00:00
soulitzer
4e73eee93f Update custom Function preserve torch function when inputs returned as-is (#109825)
Fixes https://github.com/pytorch/pytorch/issues/109805
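
A rough illustration of the scenario (the subclass and function below are made up for this sketch, not the exact repro from the issue): a Tensor subclass relying on `__torch_function__` is returned as-is from a custom Function, and with this fix the subclass type is expected to survive.

```python
import torch

class MyTensor(torch.Tensor):
    pass  # relies on the default __torch_function__ behavior

class Identity(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x  # input returned as-is

    @staticmethod
    def backward(ctx, g):
        return g

t = torch.randn(3, requires_grad=True).as_subclass(MyTensor)
out = Identity.apply(t)
print(type(out))  # expected to be MyTensor after this change
```
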
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109825
Approved by: https://github.com/albanD
2023-10-04 22:45:11 +00:00
cyy
d0ad848aa5 Enable misc clang-tidy checks (#110283)
This PR enables the misc-XX checks in clang-tidy. Meanwhile, I excluded some of them that require a lot of code changes and have no immediate benefit. Some additional fixes and suppressions were also applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283
Approved by: https://github.com/albanD
2023-09-30 10:39:52 +00:00
Pritam Damania
550b0ec3d4 Release GIL around VariableInfo::zeros to avoid deadlocks (#109454)
See https://github.com/pytorch/pytorch/issues/109074#issue-1891369807 and https://github.com/pytorch/pytorch/issues/109074#issuecomment-1718825855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109454
Approved by: https://github.com/albanD
2023-09-18 22:28:48 +00:00
cyy
a14d30d8d1 [1/N] apply clang-tidy in torch/csrc/autograd (#109032)
This PR begins a new series of patches for enabling clang-tidy checks in torch/csrc/autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109032
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-09-15 23:28:43 +00:00
cyy
36b8ca4e48 [2/N] apply clang-tidy in torch/csrc/autograd (#109277)
This PR follows the work of PR #109032.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109277
Approved by: https://github.com/albanD
2023-09-15 00:39:12 +00:00
Alex Settle
9ba0558d48 Add sequence_nr to aot_autograd to map forward ops to their corresponding backward ops (#103129)
Fixes #102375

Sequence_nr increments in the forward pass and decrements in the backward pass.  Backward ops with the same sequence_nr as a forward op represent the backward implementation for that op.  The long-term goal is to make this information available to the profiler so users can observe which ops are fused by the Inductor-generated OpenAI Triton kernels.

Added a test for this feature: **test/dynamo/test_aot_autograd.py::AotAutogradFallbackTests::test_aot_sequence_nr**.  The test case uses **aot_export_module()** to create a joint fwd/bwd fx graph.  Then it walks all the nodes in the fx graph using fx_graph.graph.nodes.  The seq_nr of each node is recorded in node.meta.  During the fwd pass the seq_nr increments, and it decrements during the bwd pass.  This allows the user to map forward ops to their corresponding bwd ops, which is useful for performance analysis.

Expected output from the test case:

```
SeqNr|OrigAten|SrcFn
0|aten.convolution.default|l__self___conv1
0|aten.add.Tensor|l__self___bn1
1|aten._native_batch_norm_legit_functional.default|l__self___bn1
2|aten.relu.default|l__self___relu1
3|aten.add.Tensor|add
4|aten.view.default|flatten
5|aten.t.default|l__self___fc1
6|aten.unsqueeze.default|l__self___fc1
7|aten.mm.default|l__self___fc1
8|aten.squeeze.dim|l__self___fc1
9|aten.add.Tensor|l__self___fc1
10|aten.sub.Tensor|l__self___loss_fn
11|aten.abs.default|l__self___loss_fn
12|aten.mean.default|l__self___loss_fn
12|aten.ones_like.default|
12|aten.expand.default|
12|aten.div.Scalar|
11|aten.sgn.default|
11|aten.mul.Tensor|
8|aten.unsqueeze.default|
7|aten.t.default|
7|aten.mm.default|
7|aten.t.default|
7|aten.t.default|
7|aten.mm.default|
6|aten.squeeze.dim|
5|aten.t.default|
4|aten.view.default|
2|aten.threshold_backward.default|
1|aten.native_batch_norm_backward.default|
0|aten.convolution_backward.default|
0|aten.add.Tensor|
```
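
A rough sketch of reading these values from an exported joint graph (the import path and the `seq_nr` meta key are assumed from the description above and may differ across releases):

```python
import torch
from torch._functorch.aot_autograd import aot_export_module  # location assumed

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.fc(x).relu().sum()  # scalar loss for the joint graph

gm, _ = aot_export_module(M(), [torch.randn(2, 4)], trace_joint=True, output_loss_index=0)
for node in gm.graph.nodes:
    # seq_nr increments through the forward ops and decrements through the
    # backward ops, so matching values pair a forward op with its backward.
    print(node.meta.get("seq_nr"), node.target)
```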

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103129
Approved by: https://github.com/soulitzer
2023-08-02 00:52:52 +00:00
Jason Ansel
457d01bcfd [Compiled Autograd] Remove TORCH_API from generated autograd nodes (#105286)
This works around the Windows symbol count issues in #103822.  Unfortunately, removing TORCH_API only works on Windows, but causes build issues on Linux, so we need the `#ifdef`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105286
Approved by: https://github.com/albanD
2023-07-27 02:33:14 +00:00
Jason Ansel
5a114f72bf [Compiled Autograd] Move to torch::dynamo::autograd namespace (#105854)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105854
Approved by: https://github.com/albanD
2023-07-27 00:36:47 +00:00
PyTorch MergeBot
e60af5c8e4 Revert "[Compiled Autograd] Move to torch::dynamo::autograd namespace (#105854)"
This reverts commit 26e3b4020f.

Reverted https://github.com/pytorch/pytorch/pull/105854 on behalf of https://github.com/PaliC due to breaking internal embedded device tests (details shared with author) ([comment](https://github.com/pytorch/pytorch/pull/105854#issuecomment-1650559375))
2023-07-25 21:09:18 +00:00
Jason Ansel
26e3b4020f [Compiled Autograd] Move to torch::dynamo::autograd namespace (#105854)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105854
Approved by: https://github.com/albanD
2023-07-25 01:14:04 +00:00
Jason Ansel
c902b84e0b Compiled autograd (#103822)
This branch:
1) converts the autograd tape into an FX graph
2) caches that conversion using a "shadow" graph
3) compiles and runs the generated FX graph instead of the normal autograd

What works currently:
1) Caching, capture, and initial integration
2) Backwards hooks
3) Inlining AotAutograd generated subgraphs
4) torch.compiling the generated FX graph
5) Auto-detecting dynamic shapes based on changes

Future work
1) Larger scale testing
2) Boxed calling convention, so memory can be freed incrementally
3) Support hooks on SavedTensor
4) Additional testing by running eager autograd tests under compiled_autograd.enable()
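
A minimal usage sketch (the exact `enable()` signature and compiler choice here are illustrative and may have evolved since this landed):

```python
import torch
from torch._dynamo import compiled_autograd

model = torch.nn.Linear(8, 8)
loss = model(torch.randn(2, 8)).sum()

# The autograd tape for this backward is converted to an FX graph, cached via
# the shadow graph, and compiled with the given compiler before running.
with compiled_autograd.enable(torch.compile):
    loss.backward()
```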

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103822
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-07-24 21:12:05 +00:00
soulitzer
c85468a94c [autograd Function] Add private API to not materialize grads for non-differentiable outputs (#104291)
Fixes https://github.com/pytorch/pytorch/issues/104272

This PR adds a new private API `materialize_non_diff_grads` (default True) such that when set to False, grad outputs corresponding to outputs marked non-differentiable receive None instead of a zero-filled tensor. This overrides the setting of `materialize_grads`, i.e. grad outputs corresponding to non-differentiable outputs would still be None even if `materialize_grads=True` (the default).
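
A sketch of the behavior, assuming the private flag is toggled on the ctx under the name from the PR title (the exact spelling and placement of the attribute are assumed here):

```python
import torch

class SplitOut(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Name taken from the PR title; exact spelling of the private flag is assumed.
        ctx.materialize_non_diff_grads = False
        aux = x.detach().clone()
        ctx.mark_non_differentiable(aux)
        return x * 2, aux

    @staticmethod
    def backward(ctx, g, g_aux):
        # With the flag off, g_aux is expected to arrive as None rather than a
        # zero-filled tensor, even though materialize_grads defaults to True.
        return g * 2

x = torch.randn(3, requires_grad=True)
out, aux = SplitOut.apply(x)
out.sum().backward()
```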

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104291
Approved by: https://github.com/albanD
2023-07-08 14:53:54 +00:00
Thiago Crepaldi
3834582327 [ONNX] Add autograd_inlining flag to torch.onnx.export (#104067)
Fixes #88286, Fixes #97160

Repro:

```python
import torch
import io
from torch.utils.checkpoint import checkpoint

class A(torch.nn.Module):
    # A supported module.
    def __init__(self):
        super(A, self).__init__()
        self.l1 = torch.nn.Linear(2, 2)

    def forward(self, x):
        return self.l1(x)

class B(torch.nn.Module):
    # This module is not exportable to ONNX because it
    # uses gradient-checkpointing. However, its two sub-modules
    # are exportable, so ORTModule should be used to compute them.
    def __init__(self):
        super(B, self).__init__()
        self.l1 = torch.nn.Linear(2, 2)
        self.a = A()

    def forward(self, x):
        def custom():
            def custom_forward(x_):
                return self.a(x_)

            return custom_forward

        z = self.l1(checkpoint(custom(), x))
        return z

torch.onnx.export(
    B(),
    (torch.randn(2, 2),),
    io.BytesIO(),
    autograd_inlining=True
)
```

`torch.onnx.export(autograd_inlining=True)` should repro the user error as this is the original execution path.
```bash
Traceback (most recent call last):
  File "repro88286.py", line 36, in <module>
    torch.onnx.export(
  File "<@beartype(torch.onnx.utils.export) at 0x7f0f011faee0>", line 385, in export
  File "/opt/pytorch/torch/onnx/utils.py", line 511, in export
    _export(
  File "/opt/pytorch/torch/onnx/utils.py", line 1576, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "<@beartype(torch.onnx.utils._model_to_graph) at 0x7f0f01187dc0>", line 11, in _model_to_graph
  File "/opt/pytorch/torch/onnx/utils.py", line 1130, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/pytorch/torch/onnx/utils.py", line 1006, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/pytorch/torch/onnx/utils.py", line 910, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/opt/pytorch/torch/jit/_trace.py", line 1269, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/pytorch/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/pytorch/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/pytorch/torch/jit/_trace.py", line 128, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/pytorch/torch/jit/_trace.py", line 119, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/opt/pytorch/torch/nn/modules/module.py", line 1502, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/pytorch/torch/nn/modules/module.py", line 1511, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/pytorch/torch/nn/modules/module.py", line 1492, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "repro88286.py", line 32, in forward
    z = self.l1(checkpoint(custom(), x))
  File "/opt/pytorch/torch/utils/checkpoint.py", line 412, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/pytorch/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
RuntimeError: _Map_base::at
```
By using `autograd_inlining=False`, the export still fails with a different error because autograd inlining is not enabled:

```bash
Traceback (most recent call last):
  File "repro88286.py", line 36, in <module>
    torch.onnx.export(
  File "<@beartype(torch.onnx.utils.export) at 0x7f6088b32ee0>", line 385, in export
  File "/opt/pytorch/torch/onnx/utils.py", line 511, in export
    _export(
  File "/opt/pytorch/torch/onnx/utils.py", line 1615, in _export
    ) = graph._export_onnx(  # type: ignore[attr-defined]
RuntimeError: ONNX export failed: Couldn't export Python operator CheckpointFunction
```
To allow `CheckpointFunction` into the onnx graph, the `operator_export_type=torch.onnx.OperatorExportTypes.ONNX_FALLTHROUGH` flag can be added to `torch.onnx.export`, which would lead to the following ONNX graph:

```bash
Exported graph: graph(%prim::PythonOp_0 : Float(2, 2, strides=[2, 1], requires_grad=0, device=cpu),
      %l1.weight : Float(2, 2, strides=[2, 1], requires_grad=1, device=cpu),
      %l1.bias : Float(2, strides=[1], requires_grad=1, device=cpu)):
  %/PythonOp_output_0 : Float(2, 2, strides=[2, 1], requires_grad=0, device=cpu) = ^CheckpointFunction[inplace=0, module="torch.utils.checkpoint", onnx_name="/PythonOp"](<function B.forward.<locals>.custom.<locals>.custom_forward at 0x7fdf9182f670>, True)(%prim::PythonOp_0), scope: __main__.B:: # /opt/pytorch/torch/autograd/function.py:506:0
  %6 : Float(2, 2, strides=[2, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="/l1/Gemm"](%/PythonOp_output_0, %l1.weight, %l1.bias), scope: __main__.B::/torch.nn.modules.linear.Linear::l1 # /opt/pytorch/torch/nn/modules/linear.py:114:0
  return (%6)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104067
Approved by: https://github.com/BowenBao, https://github.com/kit1980
2023-07-05 15:27:36 +00:00
soulitzer
896d997dd0 Remove incorrect THP{Cpp,}Function_traverse PyObject traversals (#102860)
Fixes https://github.com/pytorch/pytorch/issues/102174

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102860
Approved by: https://github.com/albanD
2023-06-02 22:05:25 +00:00
PandaNinjas
f0786ad776 Use %zu instead of %ld when formatting size_t (#101412)
This fixes compiling on systems where `size_t` is an `unsigned int` instead of an `unsigned long int` (32-bit Raspberry Pi OS is one example).
`%ld` expects a `long int`, while `%zu` is the dedicated specifier for a `size_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101412
Approved by: https://github.com/albanD
2023-05-16 02:45:55 +00:00
soulitzer
abe96654de [reland][BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#98051)

Fixes #ISSUE_NUMBER

Relanding this in a new non-ghstack PR so I can import this to do co-dev
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98051
Approved by: https://github.com/zou3519
2023-04-11 15:42:54 +00:00
PyTorch MergeBot
45acfc8574 Revert "[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)"
This reverts commit 313db584f3.

Reverted https://github.com/pytorch/pytorch/pull/97212 on behalf of https://github.com/soulitzer due to Internally someone relies on _wrap_outputs and we updated its signature
2023-03-30 22:03:07 +00:00
soulitzer
313db584f3 [BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)
Fixes https://github.com/pytorch/pytorch/issues/96887

We error out in BOTH cases: when the graph is created and when it is not.

Still bc-breaking, but not as severe because we are limiting it to the case where someone uses setup_context.

This makes setup_context and non-setup_context versions diverge in their behavior
- With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs.
- But now with the setup_context version, we produce an error for this case.
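
Roughly the pattern that now raises (a sketch, not the exact test from the PR):

```python
import torch

class PassThrough(torch.autograd.Function):
    @staticmethod
    def forward(x):
        return x  # input returned as-is

    @staticmethod
    def setup_context(ctx, inputs, output):
        # Saving a tensor that is an input returned as-is now raises, instead
        # of silently assuming the saved variable has the input's grad_fn.
        ctx.save_for_backward(output)

    @staticmethod
    def backward(ctx, g):
        return g

x = torch.randn(3, requires_grad=True)
y = PassThrough.apply(x)  # expected to error after this change
```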

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
2023-03-29 17:54:00 +00:00
PyTorch MergeBot
2ef6ffdfa1 Revert "[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)"
This reverts commit f3aca45a16.

Reverted https://github.com/pytorch/pytorch/pull/97212 on behalf of https://github.com/soulitzer due to TestAutogradFunctionCUDA.test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_True_cuda leaks
2023-03-28 18:30:51 +00:00
soulitzer
f3aca45a16 [BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)
Fixes https://github.com/pytorch/pytorch/issues/96887

We error out in BOTH cases: when the graph is created and when it is not.

Still bc-breaking, but not as severe because we are limiting it to the case where someone uses setup_context.

This makes setup_context and non-setup_context versions diverge in their behavior
- With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs.
- But now with the setup_context version, we produce an error for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
2023-03-28 03:14:32 +00:00
Aaron Gokaslan
8c8cd9539d Add missing moves to torch autograd (#92772)
Applies std::move in some additional places in torch/csrc/autograd where opportunities were found via static analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92772
Approved by: https://github.com/ezyang
2023-01-24 02:01:52 +00:00
soulitzer
a112814a7f Simplify retains grad hook implementation (#92604)
How the old retains_grad hooks were implemented:
- retains_grad hooks are stored on the autograd_meta, as entries in a vector
- upon registration, a wrapper hook CppFunctionTensorPreHook is created to wrap that vector, and then that wrapper hook is registered to the grad_fn, i.e., by appending it to a vector of retains_grad hooks on the grad_fn
- upon in-place, for the old grad_fn we set the retains_grad hook to nullptr, so that even though the old grad_fn still references the vector, the vector contains a single nullptr. For the new grad_fn, we create a new wrapper hook around the vector (storing the single retains_grad hook) on autograd_meta.

The new retains_grad hook implementation:
- we store std::function by value, and we store it on the grad_fn rather than the autograd_meta
- a single grad_fn can have multiple outputs, so it can potentially hold multiple retains_grad hooks. We use an unordered_map (previously a vector).
- on in-place we remove the hook from the old grad_fn and put it in the new grad_fn (a small implication of this change is that we now need access to both the old grad_fn and the new grad_fn; this isn't a problem)

Other details:
- CppFunctionTensorPreHook took a shared_ptr to vector of std::function. In our new implementation, we add a new wrapper hook CppFunctionSingleTensorPreHook, which takes a single std::function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92604
Approved by: https://github.com/albanD
2023-01-23 20:10:46 +00:00
soulitzer
1bc60c6b31 [reland] Improve hooks ordering behavior (#92559)
This reverts commit e525f433e1.

Original PR:  #85849
Fixes #ISSUE_NUMBER

In addition to reverting the revert, this PR:
- defines the virtual destructor of FunctionPreHook in the header. Why? Presumably the internal build imports the header from somewhere, but does not have function_hooks.cpp (where the virtual destructor was previously defined) in the same compilation unit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92559
Approved by: https://github.com/albanD
2023-01-19 08:17:32 +00:00
PyTorch MergeBot
e525f433e1 Revert "Improve hooks ordering behavior (#85849)"
This reverts commit 049838f249.

Reverted https://github.com/pytorch/pytorch/pull/85849 on behalf of https://github.com/albanD due to fails internal build
2023-01-18 15:27:22 +00:00
Richard Zou
98b78aa11c [autograd.Function] setup_context always appears on the Function (#92312)
Previously, we used the existence of setup_context to decide whether
forward should take a ctx object.

To be consistent with all the other staticmethods (which always exist on the
autograd.Function), this PR changes it so that whether the user overrides
setup_context determines whether forward takes a ctx object.
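
For reference, a small sketch of the two styles (illustrative function; what matters is whether `setup_context` is overridden):

```python
import torch

# "Traditional" style: setup_context is NOT overridden, so forward takes ctx.
class MulOld(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, c):
        ctx.c = c
        return x * c

    @staticmethod
    def backward(ctx, g):
        return g * ctx.c, None

# New style: overriding setup_context means forward does NOT take ctx.
class MulNew(torch.autograd.Function):
    @staticmethod
    def forward(x, c):
        return x * c

    @staticmethod
    def setup_context(ctx, inputs, output):
        _, c = inputs
        ctx.c = c

    @staticmethod
    def backward(ctx, g):
        return g * ctx.c, None

x = torch.randn(3, requires_grad=True)
MulOld.apply(x, 2.0).sum().backward()
MulNew.apply(x, 2.0).sum().backward()
```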

Fixes https://github.com/pytorch/pytorch/issues/91451

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92312
Approved by: https://github.com/albanD, https://github.com/soulitzer
2023-01-18 02:55:42 +00:00
soulitzer
049838f249 Improve hooks ordering behavior (#85849)
Addresses: https://github.com/pytorch/pytorch/issues/35802

Design doc: https://docs.google.com/document/d/19xSib7FFknRQ5f3ptGFUmiOt3BrgXSUlTQH2xMcZJYg/edit#

### Changes in this PR

#### Implementation
- We now have 3 fields: pre_hooks, retains_grad_hooks, and tensor_pre_hooks, so that we can more precisely define their ordering and when they are executed.
- Since retains_grad uses an entirely new field, we cannot reuse the old retains_grad logic. We refactor retains_grad to call directly into the variable.cpp logic. Other logic in variable.cpp that handles cpp hooks must also be updated.

#### Hooks ordering and execution:
- Defines pre-hooks registered on tensor to run before pre-hooks registered on grad_fn
- Updates pre-hooks registered on tensor to always run, even if they are the inputs= to .grad()
- Post hooks (and pre hooks) can now observe the modifications to gradient by the tensor pre hook

#### Retains grad hooks
- retains grad hooks always execute last, even if there are other tensor pre-hooks registered

#### Unchanged:
- pre_hooks registered to grad_fn aren't expected to execute if they are the inputs= to .grad()

Follow ups:
- simplify retains_grad field to not be a vector, since it always holds a single hook
- potentially merge capture hooks with tensor pre-hooks; this would involve some additional refactoring
- python hooks registered to tensor behavior on in-place is still wrong
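
A small sketch of the resulting ordering described above (hook bodies are illustrative):

```python
import torch

order = []
x = torch.randn(3, requires_grad=True)
y = x * 2

y.retain_grad()                                           # retains_grad hook: runs last
y.register_hook(lambda g: order.append("tensor_pre"))     # tensor pre-hook: runs first
y.grad_fn.register_prehook(lambda g: order.append("grad_fn_pre"))

y.sum().backward()
print(order)  # ["tensor_pre", "grad_fn_pre"]; y.grad is populated after both
```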

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85849
Approved by: https://github.com/albanD
2023-01-17 16:23:21 +00:00
Richard Zou
81cc9bba5e [autograd.Function] Kill the extension feature flag (#92026)
This PR removes the autograd.Function extension feature flag. This was
previously used for development of the functorch <> autograd.Function
interaction.

It's been in master for long enough with the feature flag defaulting to
True, so it's time to remove it.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92026
Approved by: https://github.com/soulitzer
2023-01-17 13:36:42 +00:00
Richard Zou
7aaad0b832 Rename flag that enables/disables _SingleLevelFunction for functorch (#92025)
functorch used to have a switch that enables/disables autograd.Function.
That switch now enables/disables torch.autograd.function._SingleLevelFunction, so
I've renamed it accordingly.

We could just delete the switch because users should not be directly
working with torch.autograd.function._SingleLevelFunction. However,
it was useful for debugging when something went wrong when I was
implementing the autograd.Function <> functorch interaction, so I want
to keep it around as a debugging tool for a while since the code is
already there.

Test Plan:
- updated tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92025
Approved by: https://github.com/soulitzer
2023-01-17 13:36:41 +00:00
PyTorch MergeBot
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00