Commit Graph

330 Commits

Author SHA1 Message Date
voznesenskym
76ced0df03 Consider storage_changed for assigning alias_of_input in aot_autograd when computing differentiable outputs that alias each other (#115315)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115315
Approved by: https://github.com/bdhirsh
2023-12-12 23:21:58 +00:00
Oguz Ulgen
c9c4cdf9a9 [AOTAutograd] Do not call ctx.mark_dirty on mutations hidden from autograd (#115324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115324
Approved by: https://github.com/bdhirsh
2023-12-09 02:23:13 +00:00
Bin Bao
0757e2ba84 [aotautograd] Fix an output shape error when inputs are aliased (#115279)
Summary: Fixes https://github.com/pytorch/pytorch/issues/97083. When an output
is marked as OutputType.is_input but a synthetic base is constructed
because of aliased inputs, we may need to update the output type to
OutputType.alias_of_input.
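The reclassification described above can be sketched as follows (a schematic toy with hypothetical names like `fixup_output_types` and `synthetic_base_of`; the real logic lives in AOTAutograd's metadata analysis):

```python
from dataclasses import dataclass
from enum import Enum, auto

class OutputType(Enum):
    is_input = auto()
    alias_of_input = auto()
    non_alias = auto()

@dataclass
class OutputInfo:
    output_type: OutputType
    base_idx: int  # which input (possibly a synthetic base) this output refers to

def fixup_output_types(outputs, synthetic_base_of):
    # synthetic_base_of maps original input index -> synthetic base index,
    # for inputs that were merged because they alias the same storage.
    # An output that *was* exactly an input may now only alias the merged base.
    for o in outputs:
        if o.output_type is OutputType.is_input and o.base_idx in synthetic_base_of:
            o.output_type = OutputType.alias_of_input
            o.base_idx = synthetic_base_of[o.base_idx]
    return outputs

# inputs 0 and 1 alias each other, so both map to synthetic base 0
outs = [OutputInfo(OutputType.is_input, base_idx=1)]
fixup_output_types(outs, synthetic_base_of={0: 0, 1: 0})
```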

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115279
Approved by: https://github.com/bdhirsh
2023-12-06 23:10:21 +00:00
Brian Hirsh
1102d37958 remove aot_config.keep_inference_input_mutations from assert_functional_graph (#115195)
We technically allow aot_autograd backends to pass a config saying "yes, I am OK with seeing input mutations in my graph".

With https://github.com/pytorch/pytorch/pull/112906 though, there can be input mutations that show up in the backward (which we need to handle for correctness) that are a large pain to keep out of the graph. The meta-point is that it's been ~a year since we added the config, and it almost always makes sense for backends to support input mutations for performance reasons (inductor does). So I just allow these input mutations in the graph in this rare backward situation, even if the backend didn't explicitly set the config.
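As a toy illustration of the two conventions being discussed (hypothetical names; not the aot_autograd implementation): a functionalized trace of `x.add_(1)` can either emit the write-back into the graph itself, or return the updated value and let a runtime wrapper replay the mutation.

```python
def traced_fn(x):
    # functionalized body: the in-place x.add_(1) becomes a pure op
    return [v + 1 for v in x]

def run_keeping_mutation_in_graph(x):
    # backend opted in (keep_inference_input_mutations): the compiled
    # region itself writes back into the input buffer
    x_updated = traced_fn(x)
    x[:] = x_updated  # the copy_ lives inside the graph

def run_with_runtime_epilogue(x):
    # default: graph stays functional and returns the updated value;
    # the caller-side wrapper applies the mutation afterwards
    return traced_fn(x)

x1, x2 = [1, 2], [1, 2]
run_keeping_mutation_in_graph(x1)
x2[:] = run_with_runtime_epilogue(x2)  # wrapper applies copy_ outside the graph
```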

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115195
Approved by: https://github.com/drisspg
2023-12-05 23:36:37 +00:00
Joel Schlosser
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
Tugsbayasgalan Manlaibaatar
f1c8c427da Fix https://github.com/pytorch/pytorch/issues/114892 (#115054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115054
Approved by: https://github.com/bdhirsh
2023-12-04 18:29:33 +00:00
voznesenskym
4cfe997490 [dynamo] handle setting .data on a tensor (#113080)
**Dynamo**

We don't want setattr in the graph. Setting data has interesting implications on both aliasing and on the autograd engine.

The safe recipe is:

1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.
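A toy model of that recipe (purely illustrative; real tensors track their version counter in C++, and `ToyTensor`/`set_data` are made-up names):

```python
class ToyTensor:
    """Minimal stand-in for a tensor with an autograd version counter."""
    def __init__(self, data):
        self.storage = list(data)
        self._version = 0             # bumped on every in-place mutation

    def set_(self, other):
        self.storage = other.storage  # swap out storage, like aten::set_
        self._version += 1
        return self

def set_data(t, new):
    # the "safe recipe": grad is assumed disabled here (step 1),
    # call set_() (step 2), then lower the version counter (step 3)
    # so the autograd engine never observes the mutation -- the same
    # net effect as assigning to .data
    saved = t._version
    t.set_(new)
    t._version = saved
    return t

t = ToyTensor([1.0] * 6)
set_data(t, ToyTensor([0.0, 0.0]))
```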

**aot_autograd**

For aot_autograd, there's another snag.

Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memoization to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake tensor is in the state it will be in at the *end* of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where `primals_2` is actually sized `[6]` on input.

The new plan here is to:
1) Record tensor fakification policy in dynamo
2) provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd
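The staleness problem can be modeled in a few lines (toy code with hypothetical names; real fake modes live in torch._subclasses):

```python
class ToyFakeMode:
    """Memoizes real-tensor -> fake-tensor conversion, like fake_mode.from_tensor."""
    def __init__(self):
        self.memo = {}

    def from_tensor(self, t):
        # returns the same fake object every time for a given real tensor
        if id(t) not in self.memo:
            self.memo[id(t)] = {"shape": tuple(t["shape"])}
        return self.memo[id(t)]

real = {"shape": (6,)}
dynamo_mode = ToyFakeMode()
fake = dynamo_mode.from_tensor(real)
fake["shape"] = (0,)                     # trace mutates the fake (e.g. a .data swap)

stale = dynamo_mode.from_tensor(real)    # memo hands back the end-of-trace state
fresh = ToyFakeMode().from_tensor(real)  # a fresh mode re-fakifies from scratch
```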

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113080
Approved by: https://github.com/bdhirsh
2023-12-02 00:35:44 +00:00
Brian Hirsh
c546ca9f80 AOTAutograd: support mutations on buffers that happen during the bw (#114953)
Re-land of https://github.com/pytorch/pytorch/pull/112906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114953
Approved by: https://github.com/zou3519, https://github.com/drisspg
2023-12-01 23:09:37 +00:00
Yang Chen
5c3f03e2dd [inductor] add a config to specify the shape attribute for the generated svg graphs (#114811)
We draw our fx graphs with the "record" shape attribute by default.
Sometimes, when the graph is very complex, we may hit dot errors like below:
  "flat edge between adjacent nodes one of which has a record shape -
   replace records with HTML-like labels"
and thus fail to generate a graph. So, let's give the user an option
to specify the shape attribute for the dot graph. For example, passing
INDUCTOR_DOT_GRAPH_SHAPE_SVG = "none" would let us generate HTML-like labels
to work around the above failure.
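Going by the message above, usage might look like the following (whether this is read as an environment variable or as an inductor config knob is an assumption on my part; the spelling comes from the commit text, and the script name is hypothetical):

```shell
# ask the dot renderer for HTML-like labels instead of "record" shapes
INDUCTOR_DOT_GRAPH_SHAPE_SVG=none python my_training_script.py
```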

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114811
Approved by: https://github.com/weifengpy
2023-11-30 06:10:37 +00:00
Jon Chuang
80ae00d11a [AOT Refactor] jit compile runtime wrappers (#114564)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Total reduction in lines: 5200 lines -> 1100 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114564
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561, #114562, #114563
2023-11-30 00:28:57 +00:00
Jon Chuang
741414b739 [AOT Refactor] dispatch compile graph (#114563)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114563
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561, #114562
2023-11-30 00:28:43 +00:00
Jon Chuang
abb84051a3 [AOT Refactor] alias runtime wrappers (#114562)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114562
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561
2023-11-30 00:24:43 +00:00
Jon Chuang
4d4093a5de [AOT Refactor] traced function transforms pt. 2 (#114561)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114561
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559
2023-11-30 00:24:05 +00:00
Jon Chuang
dab89d546c [AOT Refactor] traced function transforms pt. 1 (#114559)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Current progress: 5200 lines -> 2400 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114559
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558
2023-11-30 00:24:05 +00:00
Jon Chuang
0f41a0e99d [AOT Refactor] (missed) graph signature to i/o analysis (#114558)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114558
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557
2023-11-30 00:23:59 +00:00
Jon Chuang
5ab61c1ae1 [AOT Refactor] runtime wrappers (#114557)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114557
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556
2023-11-30 00:23:52 +00:00
Jon Chuang
7eafdee4d6 [AOT Refactor] input/output analysis (#114556)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Current progress: 5200 lines -> 3000 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114556
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555
2023-11-30 00:21:00 +00:00
Jon Chuang
7cb2e8387b [AOT Refactor] collect metadata analysis (#114555)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114555
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554
2023-11-30 00:21:00 +00:00
Jon Chuang
e9b03ac36d [AOT Refactor] subclass utils (#114554)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114554
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553
2023-11-30 00:17:57 +00:00
Jon Chuang
721d99181e [AOT Refactor] schemas (#114553)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Current progress: 5200 lines -> 4200 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114553
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552
2023-11-30 00:15:28 +00:00
Jon Chuang
1971eda1db [AOT Refactor] functional utils (#114552)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114552
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551
2023-11-30 00:12:41 +00:00
Jon Chuang
ec4b59305b [AOT Refactor] logging utils (#114551)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114551
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550
2023-11-30 00:06:34 +00:00
Jon Chuang
41c1090e48 [AOT Refactor] utils (#114550)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114550
Approved by: https://github.com/bdhirsh
2023-11-30 00:02:40 +00:00
Brian Hirsh
64ccdd4afb AOTAutograd: keep input mutations in the graph if they are under no_grad, even if they require_grad (#114646)
Quick recap of events:

(1) https://github.com/pytorch/pytorch/pull/111347, which fixed a perf regression in 2.1 compared to 2.0, introduced a correctness problem around input mutations on inputs that require grad that show up in an inference-only graph (the specific case where this can happen is rare and nobody reported the issue, but it was fixed a few weeks later)

(2) That fix happened here: https://github.com/pytorch/pytorch/pull/113584, which makes sure to keep input mutations outside of the graph, so the autograd engine can set metadata properly on them

(3) That in turn caused a slight regression compared to (1), which is what this PR attempts to fix. In particular, code like the below is safe to keep the mutations in the graph for:

```
@torch.compile
def f(x):
    x.mul_(2)

x = torch.ones(2, requires_grad=True).clone()
# x requires_grad, so the input mutation will change some autograd metadata, like the version counter
# However, the mutation is under no_grad, so we don't have to worry about e.g. aliases of x having their .grad_fn fields changed
with torch.no_grad():
    f(x)
```

This particular case is pretty important to the shampoo optimizer code, which is run under `torch.compile`, and mutates parameters (which require grad).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114646
Approved by: https://github.com/zou3519
2023-11-29 04:29:32 +00:00
PyTorch MergeBot
48820c928c Revert "[test] AOTAutograd: support mutations on buffers that happen during th bw (#112906)"
This reverts commit c8974d649d.

Reverted https://github.com/pytorch/pytorch/pull/112906 on behalf of https://github.com/huydhn due to There are lots of failure after this change c8974d649d, this is probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/112906#issuecomment-1831016362))
2023-11-29 00:49:57 +00:00
Brian Hirsh
c8974d649d [test] AOTAutograd: support mutations on buffers that happen during th bw (#112906)
I can hold off on reviews / landing until I talk to Driss and we confirm that we need this for FP8. This PR also needs testing and probably shouldn't land until Tugsuu's input mutation handling [PR](https://github.com/pytorch/pytorch/pull/111046) goes through.

What this PR tries to solve is when you have a model that tries to mutate some nn module state (a buffer), but during the **backward**. It appears that this might be necessary for FP8's delayed scaling.

Today, AOTAutograd will just not realize if you happened to mutate any graph inputs when running the backward pass, and functionalize them away but not realize that they were input mutations. This PR tries to:

(a) detect this situation (input mutations during the backward)

(b) put `copy_()`'s in the graph to properly handle the input mutation when we can. In cases where we can't keep the copy_() in the graph, we just error loudly (I imagine that these cases will be extremely rare, but we can fix them if they ever come up).

This is mostly a prototype for now, not ready for review.

I made this example locally to test out:
```
import torch

class MutatingAutogradFn(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x, buf):
        ctx.save_for_backward(buf)
        return x

    @staticmethod
    def backward(ctx, x_grad):
        buf = ctx.saved_tensors[0]
        buf.add_(x_grad)
        return x_grad * 3, None

class Mod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.buf = torch.ones(2)

    @torch._dynamo.allow_in_graph
    def backward_mutating_fn(self, x, buf):
        return MutatingAutogradFn.apply(x, buf)

    def forward(self, x):
        tmp = self.backward_mutating_fn(x, self.buf)
        return tmp + self.buf

m = Mod()

x = torch.ones(2, requires_grad=True)
out = m(x)
# After the fw, buf should not have been mutated
print(m.buf)
out.sum().backward()
# bw has run, so buf should now be mutated
print(m.buf)
print(x.grad)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112906
Approved by: https://github.com/ezyang
2023-11-28 23:59:21 +00:00
voznesenskym
ddf1cb7870 AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_metadata_mutation()`, which can more accurately distinguish between data mutation vs. `set_()` calls vs. metadata mutation

**This PR is still silently incorrect in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
ghstack dependencies: #113926
2023-11-28 19:33:35 +00:00
Xuehai Pan
89a1fe6966 [pytree] register pytree node type in both C++ pytree and Python pytree (#112111)
Changes:

1. Add `_private_register_pytree_node` API in both C++ and Python pytree. In C++ pytree, the API will only register pytree node for C++ pytree. In Python pytree, the API will only register pytree node for Python pytree.
2. Do not allow registering a type as pytree node twice in the Python pytree.
3. Add thread lock to the Python pytree node register API.
4. The old `_register_pytree_node` API will call the `_private_register_pytree_node` API and raise a deprecation warning.
5. Add a new `register_pytree_node` API to register node type in both C++ and Python implementations.
6. Add tests to ensure a warning will be raised when the old private function is called.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
2023-11-28 11:41:38 +00:00
voznesenskym
081c5b3adc Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926) (#114526)
Summary:

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, and on whether that is an anti-pattern (it is) that we want to paper over with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526
Approved by: https://github.com/Chillee, https://github.com/huydhn
2023-11-26 23:40:32 +00:00
Tugsbayasgalan Manlaibaatar
dad3cc4d02 Fix type for keep_inference_mutation flag (#114482)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114482
Approved by: https://github.com/Skylion007
ghstack dependencies: #114421, #114479, #114481
2023-11-24 00:04:31 +00:00
Tugsbayasgalan Manlaibaatar
fa71f5efdc [BE][aot_autograd] Remove unnecessary fields from ViewMutationData (#114481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114481
Approved by: https://github.com/zhxchen17
ghstack dependencies: #114421, #114479
2023-11-24 00:04:26 +00:00
Tugsbayasgalan Manlaibaatar
e6e650d5eb [BE][aot_autograd] Remove num_mutated_inputs (#114479)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114479
Approved by: https://github.com/zhxchen17
ghstack dependencies: #114421
2023-11-24 00:04:25 +00:00
Tugsbayasgalan Manlaibaatar
a378ae33e9 [BE][aot_autograd] Remove mutated_inp_indices (#114421)
We should use mutated_inp_runtime_indices moving forward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114421
Approved by: https://github.com/zhxchen17
2023-11-23 22:41:38 +00:00
PyTorch MergeBot
01366efcc9 Revert "[pytree] register pytree node type in both C++ pytree and Python pytree (#112111)"
This reverts commit 4e4a6ad6ec.

Reverted https://github.com/pytorch/pytorch/pull/112111 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/112111#issuecomment-1824099658))
2023-11-23 09:59:32 +00:00
PyTorch MergeBot
2f3beb715c Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)"
This reverts commit 2ca1119d53.

Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))
2023-11-22 12:52:33 +00:00
PyTorch MergeBot
3e1abde46d Revert "AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)"
This reverts commit a911b4db9d.

Reverted https://github.com/pytorch/pytorch/pull/111554 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack #113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/111554#issuecomment-1822472206))
2023-11-22 10:13:48 +00:00
Xuehai Pan
4e4a6ad6ec [pytree] register pytree node type in both C++ pytree and Python pytree (#112111)
Changes:

1. Add `_private_register_pytree_node` API in both C++ and Python pytree. In C++ pytree, the API will only register pytree node for C++ pytree. In Python pytree, the API will only register pytree node for Python pytree.
2. Do not allow registering a type as pytree node twice in the Python pytree.
3. Add thread lock to the Python pytree node register API.
4. The old `_register_pytree_node` API will call the `_private_register_pytree_node` API and raise a deprecation warning.
5. Add a new `register_pytree_node` API to register node type in both C++ and Python implementations.
6. Add tests to ensure a warning will be raised when the old private function is called.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
2023-11-21 19:53:13 +00:00
voznesenskym
a911b4db9d AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_metadata_mutation()`, which can more accurately distinguish between data mutation vs. `set_()` calls vs. metadata mutation

**This PR is still silently incorrect in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
ghstack dependencies: #113926
2023-11-21 01:52:46 +00:00
voznesenskym
2ca1119d53 Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)
The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, whether that is an anti-pattern (it is), and whether we wanted to hack around it with the symbol cache (we didn't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this)
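A minimal sketch of concession (1) — a source→symbol cache that lives outside the shape env, so re-fakification mints the same symbol for the same tensor source. All class and method names here are illustrative stand-ins, not the actual PyTorch internals:

```python
class ShapeEnv:
    """Stand-in for the real shape env: only knows how to mint fresh symbols."""
    def __init__(self):
        self._counter = 0

    def create_symbol(self):
        self._counter += 1
        return f"s{self._counter}"


class SymbolCache:
    """Lives outside the shape env (for lifecycle and layering reasons).

    Maps a tensor source (e.g. "L['x'].size()[0]") to the symbol minted for
    it, so a later fakification of the same source reuses the same symbol
    instead of creating a fresh one.
    """
    def __init__(self):
        self._cache = {}

    def symbol_for(self, source, shape_env):
        if source not in self._cache:
            self._cache[source] = shape_env.create_symbol()
        return self._cache[source]
```

With this in place, refakifying a tensor for a backend's fresh fake mode asks the cache first, so the symbolic sizes and strides line up with those dynamo already created.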

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-11-20 23:06:37 +00:00
Xuehai Pan
0c450f4504 [functorch] fix potential race condition while loading vmap decomposition library (#113520)
There is a potential race condition when loading the `vmap` decomposition library in multi-threaded programs.

This PR adds a thread lock to avoid the case of registering the kernel multiple times.

```python
import threading
from torch._functorch.vmap import lazy_load_decompositions

threads = []
for i in range(10000):
    thread = threading.Thread(target=lazy_load_decompositions)
    threads.append(thread)
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

```text
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
    VMAP_DECOMPOSITIONS_LIB.impl(decomp, decomposition_table[decomp])
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113520
Approved by: https://github.com/zou3519
2023-11-20 19:50:54 +00:00
Oguz Ulgen
a450c784da [AotAutograd] Move mutations hidden from autograd in graph (#113454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113454
Approved by: https://github.com/bdhirsh
2023-11-17 22:47:06 +00:00
Jon Chuang
00b67193ef [utils] move config_typing.pyi to torch.utils (#113929)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113929
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #111299, #111300, #113901, #113916
2023-11-17 18:51:57 +00:00
chilli
d4bb16f443 Change functorch import to proxy_tensor import (#113913)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113913
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-11-17 18:32:50 +00:00
Edward Z. Yang
97a62c715d [BE] Remove duplicate storage_offset equality test (#113790)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113790
Approved by: https://github.com/albanD
2023-11-15 22:25:07 +00:00
Brian Hirsh
cc11c0d11b aot_autograd: keep input mutations on requires_grad=True tensor out of the graph for inference (#113584)
The original behavior of torch.compile w.r.t. input mutations is that if an input to a graph was mutated **and** requires grad, we keep the input mutation outside of the graph and replay it at runtime.

This is important because, e.g., an input can have outstanding aliases, and mutating the input in eager mode will cause autograd to change the `grad_fn` of all outstanding aliases.

It looks like landing https://github.com/pytorch/pytorch/pull/111347 changed this behavior slightly:
* The linked PR makes it possible for AOTAutograd to go down the inference code path, even if some inputs require grad (because all of the outputs of the graph were seen to not require grad)
* AOTAutograd's logic in the inference code path today is to **always** keep input mutations in the graph.

This PR fixes that regression: regardless of inference vs. training, we should always keep input mutations outside of the graph if the input requires_grad.
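The restored policy and its runtime replay can be sketched in pure Python (a toy stand-in, not the real AOTAutograd wrapper):

```python
class Tensor:
    """Tiny stand-in for a tensor with requires_grad and in-place copy_."""
    def __init__(self, data, requires_grad=False):
        self.data = data
        self.requires_grad = requires_grad

    def copy_(self, other):
        self.data = list(other.data)


def keep_mutation_in_graph(inp):
    # The policy this PR restores, in BOTH the inference and training
    # codepaths: a mutation on a requires_grad input is never kept in the
    # graph; it is replayed outside the graph at runtime.
    return not inp.requires_grad


def runtime_wrapper(compiled_graph, inp):
    updated = compiled_graph(inp)   # the graph itself is functional w.r.t. `inp`
    if not keep_mutation_in_graph(inp):
        # Replaying the mutation in eager mode lets autograd update the
        # grad_fn of any outstanding aliases of `inp`.
        inp.copy_(updated)
    return updated
```

For a `requires_grad=False` input the mutation can instead stay in the graph, which is the performance-friendly path backends like inductor rely on.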

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113584
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #113267, #113416
2023-11-15 19:55:47 +00:00
Brian Hirsh
032e5a4528 handle cross-dtype views during AOTAutograd view-replay (#113416)
Fixes https://github.com/pytorch/pytorch/issues/109053

I think "partitioning views out of the graph" will be a more robust fix for the class of errors that we've seen around incorrectly regenerating views at runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113416
Approved by: https://github.com/ezyang
ghstack dependencies: #113267
2023-11-15 19:55:47 +00:00
Aaron Gokaslan
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108, which removes useless lambdas in Python. The rule is in preview, so it is not ready to be enabled by default just yet; these are the autofixes from the rule.
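The kind of rewrite PLW0108's autofix performs, sketched:

```python
import math

# Before: a lambda that only forwards its argument to another callable.
sqrt_wrapped = lambda x: math.sqrt(x)

# After the autofix: the callable is referenced directly.
sqrt_direct = math.sqrt
```

Both names are interchangeable callables; the lambda wrapper adds an extra frame per call and nothing else.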

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
Jez Ng
5b95715bc0 Make {Tracing,Compile}Context.get() return non-optional type (#113535)
They are used in many contexts that don't actually check whether the returned
value is `None`. I have also created `try_get()` for the cases where we
do actually want an Optional type returned.
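The pattern being introduced, sketched (simplified stand-in, not the real `CompileContext`):

```python
from typing import Optional


class CompileContext:
    """Sketch of the get()/try_get() split."""
    _current: Optional["CompileContext"] = None

    @classmethod
    def try_get(cls) -> Optional["CompileContext"]:
        # May return None; callers must check.
        return cls._current

    @classmethod
    def get(cls) -> "CompileContext":
        # Non-optional: asserts a context is active, so the many callers
        # that never checked for None get a clear failure instead of a
        # type that lies about being Optional.
        ctx = cls._current
        assert ctx is not None, "no CompileContext is active"
        return ctx
```

Callers that genuinely tolerate "no active context" switch to `try_get()`; everyone else keeps calling `get()` with an honest non-Optional return type.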

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535
Approved by: https://github.com/ezyang
ghstack dependencies: #113412
2023-11-14 04:31:12 +00:00
PyTorch MergeBot
0e6b6a2483 Revert "AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)"
This reverts commit 3afb4e5cf7.

Reverted https://github.com/pytorch/pytorch/pull/111554 on behalf of https://github.com/clee2000 due to the xla failure is real sorry, log classifier is showing the wrong line ([comment](https://github.com/pytorch/pytorch/pull/111554#issuecomment-1809177978))
2023-11-13 21:46:57 +00:00
Brian Hirsh
3afb4e5cf7 AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection of metadata mutations / `set_()` mutations smarter, so it detects no-ops (where the storage and metadata all end up unchanged).

I also killed `was_updated()` and `was_metadata_updated()`, replacing them with the existing `has_data_mutation()` and a new check for `set_()`-style storage swaps, which together more accurately distinguish between data mutations, `set_()` calls, and metadata mutations.
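A pure-Python stand-in (not the real AOTAutograd code) for the runtime distinction that point (3) describes — which fixup op the runtime wrapper must replay for each kind of metadata mutation:

```python
def metadata_fixup_op(mutates_storage_metadata, mutates_metadata):
    """Pick the op to replay on the real input at runtime.

    A set_()-style mutation swaps the storage itself, so as_strided_()
    cannot express it; a plain metadata mutation (sizes/strides/offset)
    only needs as_strided_(). Returns None when no fixup is required.
    """
    if mutates_storage_metadata:
        return "set_"
    if mutates_metadata:
        return "as_strided_"
    return None
```

This is why `InputAliasInfo.mutates_storage_metadata` has to be tracked separately from ordinary metadata mutations.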

**This PR is still silently correct in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.
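The detect-and-error idea could be sketched like this (illustrative only; not the real `FunctionalTensorWrapper`):

```python
class FunctionalWrapper:
    """Remember every storage this tensor has had, and error if a storage it
    abandoned via set_() is mutated later in the graph."""
    def __init__(self, storage_id):
        self.storage_id = storage_id
        self.previous_storages = set()

    def set_(self, new_storage_id):
        # set_() abandons the current storage; keep it around so we can
        # detect later mutations through outstanding aliases of it.
        self.previous_storages.add(self.storage_id)
        self.storage_id = new_storage_id

    def notify_storage_mutation(self, storage_id):
        if storage_id in self.previous_storages:
            raise RuntimeError(
                "input was mutated through an alias of a storage it abandoned "
                "via set_(); this combination is unsupported"
            )
```

Even just erroring requires this bookkeeping, which is why the message notes it is only "slightly easier" than solving the problem outright.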

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
2023-11-13 16:39:25 +00:00