Commit Graph

83 Commits

Jason Ansel
5a7fd20aa1 [dynamo] Support autograd.FunctionCtx.needs_input_grad (#123700)
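
A minimal sketch (mine, not taken from the PR) of the pattern this makes traceable: an autograd.Function whose backward consults `ctx.needs_input_grad` under `torch.compile`.

```python
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(w)
        return x * w

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # needs_input_grad reports which forward inputs require gradients,
        # so work for unneeded gradients can be skipped.
        grad_x = grad_out * w if ctx.needs_input_grad[0] else None
        grad_w = None  # w is treated as a constant in this sketch
        return grad_x, grad_w

@torch.compile
def f(x, w):
    return Scale.apply(x, w)

f(torch.randn(4, requires_grad=True), torch.randn(4)).sum().backward()
```
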
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123700
Approved by: https://github.com/anijain2305
2024-04-11 19:30:55 +00:00
William Wen
5c7e2fd270 [dynamo, 3.12] use pymalloc allocator instead of malloc/free for frames (#123299)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123299
Approved by: https://github.com/jansel
ghstack dependencies: #123216
2024-04-04 20:00:54 +00:00
William Wen
d59c5d7353 [dynamo, 3.12] enable dynamo on 3.12, enable most dynamo unittests on 3.12 (#123216)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123216
Approved by: https://github.com/jansel, https://github.com/malfet
2024-04-04 20:00:54 +00:00
Jason Ansel
3c706bf483 [dynamo] Optimize BuiltinVariable (#122055)
Improves `benchmarks/dynamo/microbenchmarks/dynamo_microbenchmarks.py`
from 5.1s to 4.2s (compared to 2 PRs ago).

This works by precomputing (and caching) the parts of `BuiltinVariable.call_function` that don't depend on the values of args/kwargs.
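
A hedged sketch of the general caching pattern (illustrative names only, not the actual Dynamo internals): the type-driven part of builtin dispatch is memoized so it runs once per signature instead of on every call.

```python
import functools

@functools.lru_cache(maxsize=None)
def handler_for(fn, arg_types):
    # Expensive lookup that depends only on the builtin and the argument
    # *types*, so it can be computed once and reused across calls.
    if fn is len and arg_types == (list,):
        return lambda args, kwargs: len(args[0])
    if fn is min and all(t is int for t in arg_types):
        return lambda args, kwargs: min(args)
    return None  # fall back to the slow, value-dependent path

def call_builtin(fn, *args, **kwargs):
    handler = handler_for(fn, tuple(type(a) for a in args))
    if handler is not None:
        return handler(args, kwargs)
    return fn(*args, **kwargs)  # slow path

assert call_builtin(len, [1, 2, 3]) == 3
```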

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122055
Approved by: https://github.com/oulgen, https://github.com/anijain2305
ghstack dependencies: #122039, #122043
2024-03-19 04:23:20 +00:00
albanD
6791b0c09e Change default torch_function behavior to be disabled when torch_dispatch is defined (take 2) (#120632)
This does not introduce a new test but is tested by checking that all the classes we already have still behave as before now that they don't explicitly disable torch_function.
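
A hedged sketch of the user-facing effect: a subclass that defines `__torch_dispatch__` no longer has to disable `__torch_function__` by hand.

```python
import torch

class PassthroughTensor(torch.Tensor):
    # Before this change, subclasses defining __torch_dispatch__ typically
    # also disabled __torch_function__ explicitly:
    #     __torch_function__ = torch._C._disabled_torch_function_impl
    # With this change, that is the default once __torch_dispatch__ is defined.
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        return func(*args, **kwargs)
```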

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120632
Approved by: https://github.com/ezyang
2024-03-09 01:08:37 +00:00
Brian Hirsh
67f6aca0d0 dynamo: respect autograd.Function + multiple save_for_backward calls (#117667)
Fixes https://github.com/pytorch/pytorch/issues/117652. Corner case that I hit debugging some Float8 issues.
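
A toy example of the corner case (mine, not the repro from the issue): `ctx.save_for_backward` called more than once, where eager keeps only the tensors from the last call.

```python
import torch

class Twice(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)          # overwritten by the next call
        ctx.save_for_backward(x, x + 1)   # only this set is kept in eager
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        x, x_plus_one = ctx.saved_tensors
        return grad_out * 2

@torch.compile
def f(x):
    return Twice.apply(x)

f(torch.randn(3, requires_grad=True)).sum().backward()
```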

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117667
Approved by: https://github.com/ezyang, https://github.com/zou3519
2024-02-16 21:16:07 +00:00
Yanbo Liang
03db96c248 [Dynamo] Enhance autograd.Function strict mode test (#119237)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119237
Approved by: https://github.com/zou3519
2024-02-06 02:54:19 +00:00
Yanbo Liang
cee16353db [Dynamo][autograd.Function] Should graph break on stride accesses in backward (#119137)
Fixes #118399
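
A hedged illustration (mine, not the example from #118399) of the kind of backward that should now graph break rather than be traced incorrectly:

```python
import torch

class StrideAware(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        # Querying strides of the incoming gradient inside a compiled
        # backward is the kind of access this commit turns into a graph break.
        if grad_out.stride(0) == 1:
            return grad_out
        return grad_out.contiguous()

@torch.compile
def f(x):
    return StrideAware.apply(x)

f(torch.randn(4, 4, requires_grad=True)).sum().backward()
```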

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119137
Approved by: https://github.com/oulgen
2024-02-04 09:08:45 +00:00
Alexander Grund
865945cc1f Convert requires_cuda to full decorator (#118281)
Don't require using it as `@requires_cuda()`; use `@requires_cuda` instead. There is no need for the partial function to be invoked many times.

Split out this change from the initial large refactoring in #117741 to hopefully get merged before conflicts arise
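
The shape of the change, sketched with a stand-in decorator (the real `requires_cuda` lives in PyTorch's test utilities):

```python
import unittest
import torch

# Old usage: requires_cuda was a factory, applied as @requires_cuda().
# New usage: it is the decorator itself, applied as @requires_cuda.
requires_cuda = unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")

class TestExample(unittest.TestCase):
    @requires_cuda
    def test_on_gpu(self):
        self.assertTrue(torch.randn(3, device="cuda").is_cuda)
```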

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118281
Approved by: https://github.com/ezyang
2024-01-25 15:50:21 +00:00
Oguz Ulgen
28bb31e4a5 [Dynamo] Trace autograd.function in dynamo when inputs require grad (#116358) (#116897)
For training graphs (when inputs require grad), previously, we would speculate the forward and backward graphs to determine if there are any graph breaks, side effects, etc., but would not actually use these speculated graphs. We would just insert a call function node in the graph and later rely on autograd's tracing.

This approach does not work for more generalized graphs, such as graphs that include user-defined triton kernels, because autograd is not able to do the higher-order function conversion.

This PR speculates the forward and backward functions and emits them in a higher-order function (HOF) that later gets used via a templating mechanism.

While working on this PR, I exposed some bugs in the current tracing due to trampoline functions losing source information, resulting in incorrect graphs being produced. I have fixed these source-information bugs and killed the trampolines.
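
A hedged sketch of the user-visible scenario (not code from the PR): inputs require grad, and the speculated forward/backward are now emitted into the compiled graph via a higher-order op instead of an opaque `.apply` call.

```python
import torch

class MySin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sin()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * x.cos()

@torch.compile(backend="aot_eager", fullgraph=True)
def f(x):
    return MySin.apply(x)

# Because x requires grad, the speculated forward and backward are both used.
y = f(torch.randn(4, requires_grad=True))
y.sum().backward()
```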

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116897
Approved by: https://github.com/Skylion007, https://github.com/jansel, https://github.com/voznesenskym
2024-01-16 03:57:13 +00:00
Oguz Ulgen
8894a97707 [Dynamo] Fix source for autograd.function default value (#116894)
Before this PR, the source guard would emit
```
globals()['Gradient'].__class__.forward.__defaults__[0]
```
which is incorrect
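
One way to see why the quoted source is wrong, reusing the `Gradient` name from the guard above: `__class__` on a class is its metaclass, which has no `forward`; the default lives on the class's own `forward`.

```python
import torch

class Gradient(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale=2.0):
        return x * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2.0, None

print(Gradient.forward.__defaults__[0])        # 2.0 -- the value to guard on
print(hasattr(Gradient.__class__, "forward"))  # False -- old source can't resolve
```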

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116894
Approved by: https://github.com/zou3519, https://github.com/yanboliang
2024-01-06 00:36:00 +00:00
PyTorch MergeBot
68105da229 Revert "[Dynamo] Trace autograd.function in dynamo when inputs require grad (#116358)"
This reverts commit 97891b184c.

Reverted https://github.com/pytorch/pytorch/pull/116358 on behalf of https://github.com/izaitsevfb due to Breaks internal accuracy test, see D52491095, pytorch/benchmark/fb/test_gpu:run_test_gpu - test_train_ig_feed_over_inductor_accuracy  ([comment](https://github.com/pytorch/pytorch/pull/116358#issuecomment-1875779697))
2024-01-03 18:20:51 +00:00
Aaron Gokaslan
bd10fea79a [BE]: Enable F821 and fix bugs (#116579)
Fixes #112371

I tried to fix as many of the bugs as I could; for a few, I could not figure out the proper fix, so I left them with noqas.
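
For context, F821 is the undefined-name rule; a minimal example of a violation and a `noqa` suppression:

```python
def total(items):
    # `itms` is undefined -- the linter reports F821 here unless suppressed.
    return sum(itms)  # noqa: F821
```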

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116579
Approved by: https://github.com/ezyang
2024-01-01 08:40:46 +00:00
Oguz Ulgen
97891b184c [Dynamo] Trace autograd.function in dynamo when inputs require grad (#116358)
For training graphs (when inputs require grad), previously, we would speculate the forward and backward graphs to determine if there are any graph breaks, side effects, etc., but would not actually use these speculated graphs. We would just insert a call function node in the graph and later rely on autograd's tracing.

This approach does not work for more generalized graphs, such as graphs that include user-defined triton kernels, because autograd is not able to do the higher-order function conversion.

This PR speculates the forward and backward functions and emits them in a higher-order function (HOF) that later gets used via a templating mechanism.

While working on this PR, I exposed some bugs in the current tracing due to trampoline functions losing source information, resulting in incorrect graphs being produced. I have fixed these source-information bugs and killed the trampolines.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116358
Approved by: https://github.com/jansel
2023-12-30 01:51:30 +00:00
rzou
67c8ad7285 Fix autograd.Function x enum input x torch.compile (#115206)
Fixes https://github.com/pytorch/pytorch/issues/114777. We treat Enums
like we do ConstantVariable.
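
A hedged example (mine, not the repro from #114777) of the now-supported combination:

```python
import enum
import torch

class Mode(enum.Enum):
    DOUBLE = 1
    TRIPLE = 2

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, mode):
        ctx.factor = 2.0 if mode is Mode.DOUBLE else 3.0
        return x * ctx.factor

    @staticmethod
    def backward(ctx, grad_out):
        # The enum behaves like a constant during tracing.
        return grad_out * ctx.factor, None

@torch.compile(fullgraph=True)
def f(x):
    return Scale.apply(x, Mode.DOUBLE)

f(torch.randn(3, requires_grad=True)).sum().backward()
```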

Test Plan:
New test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115206
Approved by: https://github.com/yanboliang
ghstack dependencies: #115185, #115186, #115187
2023-12-06 15:18:25 +00:00
Joel Schlosser
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now (a minimal runnable sketch follows this list):
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors
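
A minimal wrapper-subclass sketch following the signatures above (illustrative only; real subclasses such as `NestedTensor` and `DTensor` do considerably more):

```python
import torch
from torch.utils._pytree import tree_map_only

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, inner):
        return torch.Tensor._make_wrapper_subclass(
            cls, inner.shape, strides=inner.stride(), dtype=inner.dtype,
            device=inner.device, requires_grad=inner.requires_grad,
        )

    def __init__(self, inner):
        self._inner = inner

    def __tensor_flatten__(self):
        # attrs: names of inner-tensor attributes; ctx: metadata to guard on
        return ["_inner"], None

    @staticmethod
    def __tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride):
        # outer_size / outer_stride may be symbolic; the returned tensor is
        # expected to match them (asserted at the call site in PT2).
        return WrapperTensor(inner_tensors["_inner"])

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap to the inner tensors; outputs are not re-wrapped for brevity.
        args, kwargs = tree_map_only(WrapperTensor, lambda t: t._inner, (args, kwargs))
        return func(*args, **kwargs)
```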

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
Yanbo Liang
8ef44e6110 [autograd.Function] Fix torch.compile w/ once_differentiable leads to opaque graph break (#113625)
Fixes #106893
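
A hedged example (not the repro from #106893) of the pattern that previously hit an opaque graph break:

```python
import torch
from torch.autograd.function import once_differentiable

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x.exp()
        ctx.save_for_backward(y)
        return y

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        return grad_out * y

@torch.compile
def f(x):
    return Exp.apply(x)

f(torch.randn(4, requires_grad=True)).sum().backward()
```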

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113625
Approved by: https://github.com/zou3519
2023-12-04 21:37:06 +00:00
Yanbo Liang
6cba8b584d [Dynamo] Support torch.cuda.amp.custom_fwd/custom_bwd by inlining (#114891)
Fixes #114693
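
A hedged sketch (mine, not the repro from #114693) of the decorators now handled by inlining:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class MyMul(torch.autograd.Function):
    @staticmethod
    @custom_fwd
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        return x * w

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        return grad_out * w, grad_out * x

@torch.compile
def f(x, w):
    return MyMul.apply(x, w)

f(torch.randn(3, requires_grad=True), torch.randn(3, requires_grad=True))
```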

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114891
Approved by: https://github.com/zou3519
2023-12-01 01:23:51 +00:00
Yanbo Liang
bab41f44b8 [dynamo] Fix allow_in_graph decorator doesn't work on autograd.Function (#113510)
Fixes #111032
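
A hedged sketch of the decorator usage this fixes (illustrative function bodies):

```python
import torch
import torch._dynamo as torchdynamo

@torchdynamo.allow_in_graph
class MyFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * 2

@torch.compile
def f(x):
    # With the decorator respected, MyFn.apply is meant to be placed in the
    # graph as-is rather than traced through (or causing a graph break).
    return MyFn.apply(x)

f(torch.randn(3, requires_grad=True))
```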

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113510
Approved by: https://github.com/zou3519
2023-11-16 22:44:46 +00:00
Yanbo Liang
6f681ab5d9 [torch.compile] autograd.Function with multiple return values (#112475)
Fixes #106389
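
A hedged example (not the repro from #106389) of an autograd.Function returning multiple values under torch.compile:

```python
import torch

class SinCos(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sin(), x.cos()

    @staticmethod
    def backward(ctx, grad_sin, grad_cos):
        (x,) = ctx.saved_tensors
        # d(sin)/dx = cos, d(cos)/dx = -sin
        return grad_sin * x.cos() - grad_cos * x.sin()

@torch.compile(fullgraph=True)
def f(x):
    s, c = SinCos.apply(x)
    return s + c

f(torch.randn(3, requires_grad=True)).sum().backward()
```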

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112475
Approved by: https://github.com/zou3519
2023-11-02 04:43:49 +00:00
voznesenskym
303c54dbd9 [dynamo] share a subgraph tracer across fwd and bwd in autograd.Function (#111588)
Fixes https://github.com/pytorch/pytorch/issues/111031

The current design of autograd.Function tracing in dynamo is that we:

1) speculate fwd, and if it's fine,
2) speculate bwd, and if it's fine,
3) install the .apply in the graph alongside fwd guards

The mechanism for doing so involves creating HOPs for fwd, bwd, and apply. The speculations for fwd and bwd each create their own subtracer. This is fine until a proxy created in fwd is used in bwd.

For a simple example, consider:

```python
from torch.autograd import Function

class Foo(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.x0 = x.size(0)
        return x * 2

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.x0
```
The value stored at `x0` is a proxy, but it belongs to the fwd speculation subtracer. Rather than teaching the bwd subtracer about it, we choose to create a single subtracer that covers both fwd and bwd speculation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111588
Approved by: https://github.com/zou3519
2023-10-20 21:32:02 +00:00
Yanbo Liang
da662248fb [Dynamo] Fix autograd.Function tracing errors loudly involving saved tensors (#111277)
Fixes #104792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111277
Approved by: https://github.com/jansel, https://github.com/zou3519
2023-10-15 00:47:59 +00:00
Emil Laftchiev
f2639a2c37 Back out "Dynamo support for autograd.Function w/ once_differentiable (#108686)" (#109199)
Summary:
Original commit changeset: e11cddf1fecc

Original Phabricator Diff: D49064185

Test Plan:
Comparing PT1 and PT2 performance on the IG Feed Model with this diff backed out: N4274204

Comparing the PT1 and PT2 performance on IG Feed with this diff committed: N4271093

Reviewed By: zou3519

Differential Revision: D49230047

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109199
Approved by: https://github.com/zou3519, https://github.com/xw285cornell
2023-09-13 15:43:20 +00:00
Richard Zou
ef2bbe1ae1 Dynamo support for autograd.Function w/ once_differentiable (#108686)
Fixes #106893

There are two main changes:
- Before this PR, the function returned by once_differentiable was
included in skipfiles (because its .co_filename is
torch/autograd/function.py). This PR adds a mechanism to tell Dynamo
to inline a function, no matter if it is included in skipfiles.
- A bugfix: when we are introspecting the backward, we need to turn the
grad mode off. This is to accurately model the eager-mode semantics:
In eager-mode PyTorch, if second-order gradients were not requested, then
the grad mode is off. torch.compile does not work with higher-order
gradients and just assumes we do first-order gradients, so this is OK.

Test Plan:
- new test

Differential Revision: [D49064185](https://our.internmc.facebook.com/intern/diff/D49064185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108686
Approved by: https://github.com/voznesenskym
2023-09-08 16:10:32 +00:00
vasiliy
3702980717 dynamo: trace autograd.Function with tensor subclass input (#108093)
Summary:

Enables dynamo eager mode tracing for the following situation:
1. we have a torch.autograd.Function
2. the input to that function is a tensor subclass which is an intermediary

This is useful for float8 training UX.

Test Plan:

```
python test/dynamo/test_autograd_function.py -k intermediary_input
```

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108093
Approved by: https://github.com/bdhirsh, https://github.com/wanchaol
2023-09-01 02:12:38 +00:00
Richard Zou
5f56c4fb32 [torch.compile x autograd.Function] More test cases (#107467)
I pulled a bunch of autograd.Function from test_autograd.py and added a
smoke test for them. Ideally we would actually run test_autograd.py as a
part of the Dynamo test suite, but we have excluded it due to there
being too many errors and I don't have time to figure that out at the
moment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107467
Approved by: https://github.com/ydwu4
ghstack dependencies: #107459, #107461
2023-08-21 13:39:36 +00:00
Richard Zou
72de9b2ec2 [HigherOrderOp] stop erroring out on non-Tensor returns (#107461)
If the body of a map or an autograd.Function returns a non-Tensor,
the code currently just errors out. Instead of erroring out, we should graph
break by raising Unsupported so users aren't confused. The better thing
to do would be to actually support non-Tensor returns, but that requires more
work.

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107461
Approved by: https://github.com/ydwu4
ghstack dependencies: #107459
2023-08-21 13:39:36 +00:00
Justin Chu
8a688277a2 [BE] Enable ruff's UP rules and autoformat dynamo / functorch and refs (#105432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105432
Approved by: https://github.com/ezyang
2023-07-19 13:48:44 +00:00
Richard Zou
adf1405909 [HigherOrderOp] Simplify design by removing reliance on name match (#104350)
Previously:
- we were keeping a list of proxies seen by the current SubgraphTracer.
It turns out, fx.Proxy has a .tracer field that we should be able to use instead.
- we were using name matching to determine if a freevar was already
lifted to being the input of the parent SubgraphTracer. Voz and I have
previously expressed concerns about the robustness of name matching.

This PR introduces a simplified design with more invariants:
- When doing HigherOrderOp tracing, we may encounter Proxys
- Each Proxy object is associated with a SubgraphTracer.
- The new invariant is that a SubgraphTracer should only construct Nodes
using Proxys that come from that same SubgraphTracer. This helps us avoid
malformed graphs.
- If the Proxy object came from another SubgraphTracer, then this means
it is a free variable. We need to lift it to being an input of the
current SubgraphTracer, which will result in the construction of a new
Proxy in the current SubgraphTracer. This new Proxy should be used
whenever the old Proxy is seen by the current SubgraphTracer.
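
A hypothetical sketch of the invariant described above (illustrative names, not the actual Dynamo code): a tracer only builds nodes from proxies it owns, lifting foreign proxies to placeholders first.

```python
import torch.fx as fx

def ensure_local(tracer: fx.Tracer, proxy: fx.Proxy) -> fx.Proxy:
    # fx.Proxy carries a .tracer field, so ownership can be checked directly
    # instead of matching on names.
    if proxy.tracer is tracer:
        return proxy
    # Free variable from an enclosing tracer: lift it to an input
    # (placeholder) of the current subgraph and use the new proxy from now on.
    return tracer.create_proxy("placeholder", "lifted_freevar", (), {})
```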

Test Plan:
- existing tests + some new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104350
Approved by: https://github.com/ydwu4, https://github.com/voznesenskym
2023-07-06 13:32:33 +00:00
Yan Li
56ef8ca054 Fix recursive call error in lift_tracked_freevar_to_input (#104378)
Summary:
The test was failing in `lift_tracked_freevar_to_input`
https://www.internalfb.com/phabricator/paste/view/P776002064

Cause:
* line 1219 assumes that `lift_tracked_freevar_to_input` is never called by the root tracer
* However, when we see a bound free variable in a child tracer, line 1226 will invoke the parent tracer recursively.
* When it reaches the root tracer, the assumption will fail.

Fix:
* we relax the assumption: if `lift_tracked_freevar_to_input` is called on the root tracer, we validate the variable is bound free, to allow the case where `lift_tracked_freevar_to_input` is populated from child tracers.

Test Plan:
pytest ./generated/test_VainF_pytorch_msssim.py
  pytest caffe2/test/dynamo/test_autograd_function.py -k test_function_with_bound_free_variable

Reviewed By: yanboliang

Differential Revision: D47033011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104378
Approved by: https://github.com/Skylion007, https://github.com/yanboliang
2023-06-30 04:53:45 +00:00
Yan Li
3ca8542dff Fix _saved_tensors argument issue in test (#103594)
Summary:
fix broken test in

https://github.com/pytorch/pytorch/issues/103460

Test Plan: pytest ./generated/test_pabloppp_pytorch_tools.py -k test_015

Differential Revision: D46723640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103594
Approved by: https://github.com/yanboliang
2023-06-20 19:03:41 +00:00
Angela Yi
4a72708d2b [dynamo] Fix Autograd Function Classmethod bug (#103175)
Fixes https://github.com/pytorch/pytorch/issues/103139

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103175
Approved by: https://github.com/williamwen42, https://github.com/yanboliang
2023-06-08 18:15:27 +00:00
Michael Voznesensky
4c1bc91f42 Support autograd.Function w/ grad (#99483)
This PR adds support for tracing autograd.Function with grad.

A few important bullet points outlining our approach:

1) Our goal is to verify soundness in order to add a call_function node for the autograd.Function's `apply` to the graph.
2) We achieve (1) by verifying or rejecting soundness, i.e. by checking that both the forward and backward of the autograd.Function are sound.
3) For the forward, if we verify soundness, we install its guards into the graph.
4) For the backward, if we verify soundness, we throw it out. However, backward soundness verification is more onerous, and has a config-driven set of banned attrs and methods for tensors.

1-4 above are achieved by turning the forward and backward into UserDefinedFunctionVariables, and inlining through them, relying on dynamo's soundness detection. If we graph break in these, we raise and treat them as unsound. As noted above, backwards is stricter yet.

For the tracing, the safety comes from dynamo's HigherOrderOperator system. That system ensures that not only do we trace soundly, but that no new variables are lifted into inputs during the tracing, and that the forward and backwards are entirely self contained.

Whenever we reject a function as unsound, we restore back, as usual.

Due to some limitations in the lifting logic, we implemented an escape hatch for tensors that are known in forward but cross into backward through `save_for_backward` (save) / `saved_tensors` (load). We escape-hatch here to avoid having the known saved tensors coming from forward accidentally treated as lifted variables (and rejected). This is sound, but feels a little hacky.

Additionally, due to some limitations in fx node removal, combined with how we produce subgraphs for the traces installed from HigherOrderOperators, we had to improve our node removal logic. In the event of a restore, we remove the old nodes from the graph, as usual in dynamo. However, because references to these nodes may exist in subgraphs, we traverse each node's users and remove them first, if and only if they are in another graph. This is always sound, because removal should only be downstream of restoration at this point.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99483
Approved by: https://github.com/zou3519
2023-05-19 01:26:21 +00:00