pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
voznesenskym	76ced0df03	Consider storage_changed for assigning alias_of_input in aot_autograd when computing differentiable outputs that alias each other (#115315 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115315 Approved by: https://github.com/bdhirsh	2023-12-12 23:21:58 +00:00
Oguz Ulgen	c9c4cdf9a9	[AOTAutograd] Do not call ctx.mark_dirty on mutations hidden from autograd (#115324 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115324 Approved by: https://github.com/bdhirsh	2023-12-09 02:23:13 +00:00
Bin Bao	0757e2ba84	[aotautograd] Fix an output shape error when inputs are aliased (#115279 ) Summary: https://github.com/pytorch/pytorch/issues/97083, when an output is marked as OutputType.is_input but a synthetic base is constructed because of aliased inputs, we may need to update the output type to OutputType.alias_of_input if needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115279 Approved by: https://github.com/bdhirsh	2023-12-06 23:10:21 +00:00
Brian Hirsh	1102d37958	remove aot_config.keep_inference_input_mutations from assert_functional_graph (#115195 ) We technically allow backends to aot_autograd to pass a config saying "yes I am ok with seeing input mutations in my graph". With https://github.com/pytorch/pytorch/pull/112906 though, there can be input mutations that show up in the backward (that we need to handle for correctness), that are a large pain to keep out of the graph. The meta-point is that it's been ~a year since we added the config, and it almost always makes sense for backends to support input mutations for performance reasons (inductor does). So I just allow these input mutations in the graph in this rare backward situation, even if the backend didn't explicitly use the config. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115195 Approved by: https://github.com/drisspg	2023-12-05 23:36:37 +00:00
Joel Schlosser	22704426c3	Expand dynamic dims support for traceable subclasses (#114311 )

Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
* Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
* Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors of the same dim as outer when `mark_dynamic(outer, ...)` is called
  * Addresses this: `6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)`
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
  * Signatures now:
```python
# attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
# ctx is anything useful for rebuilding the class we want to guard on
attrs, ctx = x.__tensor_flatten__()
...
# inner_tensors is a dict of {attr -> tensor}
# ctx is taken unmodified from flattening and (eventually) guarded on
# outer_size is the expected size of the output; possibly symbolic
# outer_stride is the expected strides of the output; possibly symbolic
y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

# at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
# the assert simplifies symbols when there are relationships between outer and inner symbols
```
  * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
  * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
  * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311 Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh	2023-12-05 21:09:25 +00:00
Tugsbayasgalan Manlaibaatar	f1c8c427da	Fix https://github.com/pytorch/pytorch/issues/114892 (#115054 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115054 Approved by: https://github.com/bdhirsh	2023-12-04 18:29:33 +00:00
voznesenskym	4cfe997490	[dynamo] handle setting .data on a tensor (#113080 )

Dynamo

We don't want setattr in the graph. Setting data has interesting implications on both aliasing and on the autograd engine. The safe recipe is:
1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.

aot_autograd

For aot_autograd, there's another snag. Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.

The new plan here is to:
1) Record tensor fakification policy in dynamo
2) Provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113080 Approved by: https://github.com/bdhirsh	2023-12-02 00:35:44 +00:00
Brian Hirsh	c546ca9f80	AOTAutograd: support mutations on buffers that happen during the bw (#114953 ) Re-land of https://github.com/pytorch/pytorch/pull/112906 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114953 Approved by: https://github.com/zou3519, https://github.com/drisspg	2023-12-01 23:09:37 +00:00
Yang Chen	5c3f03e2dd	[inductor] add a config to specify the shape attribute for the generated svg graphs (#114811 ) We draw our fx graphs with the "record" shape attribute by default. Sometimes, when the graph is very complex, we may hit dot errors like below: "flat edge between adjacent nodes one of which has a record shape - replace records with HTML-like labels" and thus fail to generate a graph. So, let's give the user an option to specify the shape attribute for the dot graph. For example, passing INDUCTOR_DOT_GRAPH_SHAPE_SVG = "none" would let us generate HTML-like labels to work around the above failure. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114811 Approved by: https://github.com/weifengpy	2023-11-30 06:10:37 +00:00
Jon Chuang	80ae00d11a	[AOT Refactor] jit compile runtime wrappers (#114564 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Total reduction in lines: 5200 lines -> 1100 lines Pull Request resolved: https://github.com/pytorch/pytorch/pull/114564 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561, #114562, #114563	2023-11-30 00:28:57 +00:00
Jon Chuang	741414b739	[AOT Refactor] dispatch compile graph (#114563 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114563 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561, #114562	2023-11-30 00:28:43 +00:00
Jon Chuang	abb84051a3	[AOT Refactor] alias runtime wrappers (#114562 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114562 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561	2023-11-30 00:24:43 +00:00
Jon Chuang	4d4093a5de	[AOT Refactor] traced function transforms pt. 2 (#114561 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114561 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559	2023-11-30 00:24:05 +00:00
Jon Chuang	dab89d546c	[AOT Refactor] traced function transforms pt. 1 (#114559 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Current progress: 5200 lines -> 2400 lines Pull Request resolved: https://github.com/pytorch/pytorch/pull/114559 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558	2023-11-30 00:24:05 +00:00
Jon Chuang	0f41a0e99d	[AOT Refactor] (missed) graph signature to i/o analysis (#114558 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114558 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557	2023-11-30 00:23:59 +00:00
Jon Chuang	5ab61c1ae1	[AOT Refactor] runtime wrappers (#114557 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114557 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556	2023-11-30 00:23:52 +00:00
Jon Chuang	7eafdee4d6	[AOT Refactor] input/output analysis (#114556 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Current progress: 5200 lines -> 3000 lines Pull Request resolved: https://github.com/pytorch/pytorch/pull/114556 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555	2023-11-30 00:21:00 +00:00
Jon Chuang	7cb2e8387b	[AOT Refactor] collect metadata analysis (#114555 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114555 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553, #114554	2023-11-30 00:21:00 +00:00
Jon Chuang	e9b03ac36d	[AOT Refactor] subclass utils (#114554 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114554 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552, #114553	2023-11-30 00:17:57 +00:00
Jon Chuang	721d99181e	[AOT Refactor] schemas (#114553 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Current progress: 5200 lines -> 4200 lines Pull Request resolved: https://github.com/pytorch/pytorch/pull/114553 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551, #114552	2023-11-30 00:15:28 +00:00
Jon Chuang	1971eda1db	[AOT Refactor] functional utils (#114552 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114552 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550, #114551	2023-11-30 00:12:41 +00:00
Jon Chuang	ec4b59305b	[AOT Refactor] logging utils (#114551 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114551 Approved by: https://github.com/bdhirsh ghstack dependencies: #114550	2023-11-30 00:06:34 +00:00
Jon Chuang	41c1090e48	[AOT Refactor] utils (#114550 ) --- Part _ of https://github.com/pytorch/pytorch/issues/114548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114550 Approved by: https://github.com/bdhirsh	2023-11-30 00:02:40 +00:00
Brian Hirsh	64ccdd4afb	AOTAutograd: keep input mutations in the graph if they are under no_grad, even if they require_grad (#114646 )

Quick recap of events:

(1) https://github.com/pytorch/pytorch/pull/111347, which fixed a perf regression in 2.1 compared to 2.0, introduced a correctness problem around input mutations on inputs that require grad that show up in an inference-only graph (the specific case where this can happen is rare and nobody reported the issue, but it was fixed a few weeks later)

(2) That fix happened here: https://github.com/pytorch/pytorch/pull/113584, which makes sure to keep input mutations outside of the graph, so the autograd engine can set metadata properly on them

(3) That in turn caused a slight regression compared to (1), which is what this PR attempts to fix. In particular, code like the below is safe to keep the mutations in the graph for:

```
@torch.compile
def f(x):
    x.mul_(2)

x = torch.ones(2, requires_grad=True).clone()
# x requires_grad, so the input mutation will change some autograd metadata, like the version counter
# However, the mutation is under no_grad, so we don't have to worry about e.g. aliases of x having their .grad_fn fields changed
with torch.no_grad():
    f(x)
```

This particular case is pretty important to the shampoo optimizer code, which is run under `torch.compile`, and mutates parameters (which require grad).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114646 Approved by: https://github.com/zou3519	2023-11-29 04:29:32 +00:00
PyTorch MergeBot	48820c928c	Revert "[test] AOTAutograd: support mutations on buffers that happen during th bw (#112906 )" This reverts commit `c8974d649d`. Reverted https://github.com/pytorch/pytorch/pull/112906 on behalf of https://github.com/huydhn due to There are lots of failure after this change `c8974d649d`, this is probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/112906#issuecomment-1831016362))	2023-11-29 00:49:57 +00:00
Brian Hirsh	c8974d649d	[test] AOTAutograd: support mutations on buffers that happen during th bw (#112906 )

I can hold off on reviews / landing until I talk to Driss and we confirm that we need this for FP8. This PR also needs testing and probably shouldn't land until Tugsuu's input mutation handling [PR](https://github.com/pytorch/pytorch/pull/111046) goes through.

What this PR tries to solve is when you have a model that tries to mutate some nn module state (a buffer), but during the backward. It appears that this might be necessary for FP8's delayed scaling.

Today, AOTAutograd will just not realize if you happened to mutate any graph inputs when running the backward pass, and functionalize them away but not realize that they were input mutations. This PR tries to:

(a) detect this situation (input mutations during the backward)

(b) put `copy_()`'s in the graph to properly handle the input mutation when we can. In cases where we can't keep the copy_() in the graph, we just error loudly (I imagine that these cases will be extremely rare, but we can fix them if they ever come up).

This is mostly a prototype for now, not ready for review.

I made this example locally to test out:

```
import torch

class MutatingAutogradFn(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x, buf):
        ctx.save_for_backward(buf)
        return x

    @staticmethod
    def backward(ctx, x_grad):
        buf = ctx.saved_tensors[0]
        buf.add_(x_grad)
        return x_grad * 3, None

class Mod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.buf = torch.ones(2)

    @torch._dynamo.allow_in_graph
    def backward_mutating_fn(self, x, buf):
        return MutatingAutogradFn.apply(x, buf)

    def forward(self, x):
        tmp = self.backward_mutating_fn(x, self.buf)
        return tmp + self.buf

m = Mod()

x = torch.ones(2, requires_grad=True)
out = m(x)
# After the fw, buf should not have been mutated
print(m.buf)
out.sum().backward()
# bw has run, so buf should now be mutated
print(m.buf)
print(x.grad)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112906 Approved by: https://github.com/ezyang	2023-11-28 23:59:21 +00:00
voznesenskym	ddf1cb7870	AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554 )

This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_metadata_mutation()`, which can more accurately distinguish between data-mutation vs. `set_()` calls vs. metadata-mutation.

This PR is still silently incorrect in one case though, which I'd like to discuss more. In particular, this example:

```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation and a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:

```
def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output

def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have two graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554 Approved by: https://github.com/ezyang ghstack dependencies: #113926	2023-11-28 19:33:35 +00:00
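Items (3) and (4) of the commit above distinguish three outcomes for a mutated input: nothing changed, a "regular" metadata mutation that is fixed up with `as_strided_()`, and a storage swap that is fixed up with `set_()`. The following is only a toy sketch of that decision in plain Python. `TensorState` and `classify_metadata_mutation` are hypothetical names invented for illustration and are not part of AOTAutograd, which works on functional tensor wrappers rather than on snapshots like these.

```python
from typing import NamedTuple, Tuple


class TensorState(NamedTuple):
    """Hypothetical snapshot of the properties that describe an input."""
    storage_id: int            # stand-in for the identity of the storage
    size: Tuple[int, ...]
    stride: Tuple[int, ...]
    storage_offset: int


def classify_metadata_mutation(before: TensorState, after: TensorState) -> str:
    """Return which runtime fix-up the input would need (illustrative only)."""
    if before == after:
        # Storage and metadata are all the same: the mutations cancelled out.
        return "none"
    if before.storage_id != after.storage_id:
        # The storage itself was swapped, as a set_() call does.
        return "set_"
    # Same storage, different size/stride/offset: a regular metadata mutation.
    return "as_strided_"


original = TensorState(storage_id=1, size=(2, 3), stride=(3, 1), storage_offset=0)
transposed = TensorState(storage_id=1, size=(3, 2), stride=(1, 3), storage_offset=0)
swapped = TensorState(storage_id=2, size=(2,), stride=(1,), storage_offset=0)

print(classify_metadata_mutation(original, original))
print(classify_metadata_mutation(original, transposed))
print(classify_metadata_mutation(original, swapped))
```

Comparing a snapshot taken before tracing with one taken after is what lets a pair of mutations that undo each other be reported as a no-op in this sketch.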
Xuehai Pan	89a1fe6966	[pytree] register pytree node type in both C++ pytree and Python pytree (#112111 ) Changes: 1. Add `_private_register_pytree_node` API in both C++ and Python pytree. In C++ pytree, the API will only register pytree node for C++ pytree. In Python pytree, the API will only register pytree node for Python pytree. 2. Do not allow registering a type as pytree node twice in the Python pytree. 3. Add thread lock to the Python pytree node register API. 4. The old `_register_pytree_node` API will call the `_private_register_pytree_node` API and raise a deprecation warning. 5. Add a new `register_pytree_node` API to register node type in both C++ and Python implementations. 6. Add tests to ensure a warning will be raised when the old private function is called. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111 Approved by: https://github.com/zou3519	2023-11-28 11:41:38 +00:00
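The registration rules listed in the commit above (a per-type flatten/unflatten pair, no double registration, a lock around the registry) can be illustrated with a small stand-alone registry. This is a toy sketch in plain Python, not the `torch.utils._pytree` implementation: `register_node`, `tree_flatten`, `tree_unflatten`, `Point`, and the choice of `ValueError` for duplicates are all assumptions made for illustration.

```python
import threading
from typing import Any, Callable, Dict, List, NamedTuple, Tuple


class NodeDef(NamedTuple):
    flatten_fn: Callable[[Any], Tuple[List[Any], Any]]
    unflatten_fn: Callable[[List[Any], Any], Any]


_NODE_REGISTRY: Dict[type, NodeDef] = {}
_REGISTRY_LOCK = threading.Lock()


def register_node(cls, flatten_fn, unflatten_fn):
    """Register a container type; a second registration is rejected."""
    with _REGISTRY_LOCK:  # check and insert happen atomically
        if cls in _NODE_REGISTRY:
            raise ValueError(f"{cls.__name__} is already registered as a node")
        _NODE_REGISTRY[cls] = NodeDef(flatten_fn, unflatten_fn)


def tree_flatten(tree):
    """Return (leaves, spec); unregistered types are treated as leaves."""
    node = _NODE_REGISTRY.get(type(tree))
    if node is None:
        return [tree], None
    children, ctx = node.flatten_fn(tree)
    leaves, child_specs = [], []
    for child in children:
        child_leaves, child_spec = tree_flatten(child)
        leaves.extend(child_leaves)
        child_specs.append((len(child_leaves), child_spec))
    return leaves, (type(tree), ctx, child_specs)


def tree_unflatten(leaves, spec):
    """Rebuild the structure described by spec from a flat list of leaves."""
    if spec is None:
        return leaves[0]
    cls, ctx, child_specs = spec
    children, pos = [], 0
    for count, child_spec in child_specs:
        children.append(tree_unflatten(leaves[pos:pos + count], child_spec))
        pos += count
    return _NODE_REGISTRY[cls].unflatten_fn(children, ctx)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


register_node(Point, lambda p: ([p.x, p.y], None), lambda ch, ctx: Point(*ch))

leaves, spec = tree_flatten(Point(1, Point(2, 3)))
rebuilt = tree_unflatten(leaves, spec)
```

Holding the lock across both the membership check and the insert is what keeps two threads from registering the same type at once in this sketch.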
voznesenskym	081c5b3adc	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) (#114526 ) Summary: The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). 
We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this) cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng imported-using-ghimport Test Plan: Imported from OSS Reviewed By: huydhn, Chillee Differential Revision: D51566250 Pulled By: voznesenskym Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526 Approved by: https://github.com/Chillee, https://github.com/huydhn	2023-11-26 23:40:32 +00:00
Tugsbayasgalan Manlaibaatar	dad3cc4d02	Fix type for keep_inference_mutation flag (#114482 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114482 Approved by: https://github.com/Skylion007 ghstack dependencies: #114421, #114479, #114481	2023-11-24 00:04:31 +00:00
Tugsbayasgalan Manlaibaatar	fa71f5efdc	[BE][aot_autograd] Remove unnecessary fields from ViewMutationData (#114481 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114481 Approved by: https://github.com/zhxchen17 ghstack dependencies: #114421, #114479	2023-11-24 00:04:26 +00:00
Tugsbayasgalan Manlaibaatar	e6e650d5eb	[BE][aot_autograd] Remove num_mutated_inputs (#114479 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/114479 Approved by: https://github.com/zhxchen17 ghstack dependencies: #114421	2023-11-24 00:04:25 +00:00
Tugsbayasgalan Manlaibaatar	a378ae33e9	[BE][aot_autograd] Remove mutated_inp_indices (#114421 ) We should use mutated_inp_runtime_indices moving forward Pull Request resolved: https://github.com/pytorch/pytorch/pull/114421 Approved by: https://github.com/zhxchen17	2023-11-23 22:41:38 +00:00
PyTorch MergeBot	01366efcc9	Revert "[pytree] register pytree node type in both C++ pytree and Python pytree (#112111 )" This reverts commit `4e4a6ad6ec`. Reverted https://github.com/pytorch/pytorch/pull/112111 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/112111#issuecomment-1824099658))	2023-11-23 09:59:32 +00:00
PyTorch MergeBot	2f3beb715c	Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 )" This reverts commit `2ca1119d53`. Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))	2023-11-22 12:52:33 +00:00
PyTorch MergeBot	3e1abde46d	Revert "AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554 )" This reverts commit `a911b4db9d`. Reverted https://github.com/pytorch/pytorch/pull/111554 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack #113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/111554#issuecomment-1822472206))	2023-11-22 10:13:48 +00:00
Xuehai Pan	4e4a6ad6ec	[pytree] register pytree node type in both C++ pytree and Python pytree (#112111 ) Changes: 1. Add `_private_register_pytree_node` API in both C++ and Python pytree. In C++ pytree, the API will only register pytree node for C++ pytree. In Python pytree, the API will only register pytree node for Python pytree. 2. Do not allow registering a type as pytree node twice in the Python pytree. 3. Add thread lock to the Python pytree node register API. 4. The old `_register_pytree_node` API will call the `_private_register_pytree_node` API and raise a deprecation warning. 5. Add a new `register_pytree_node` API to register node type in both C++ and Python implementations. 6. Add tests to ensure a warning will be raised when the old private function is called. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111 Approved by: https://github.com/zou3519	2023-11-21 19:53:13 +00:00
voznesenskym	a911b4db9d	AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554 )

This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_metadata_mutation()`, which can more accurately distinguish between data-mutation vs. `set_()` calls vs. metadata-mutation.

This PR is still silently incorrect in one case though, which I'd like to discuss more. In particular, this example:

```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation and a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:

```
def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output

def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have two graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554 Approved by: https://github.com/ezyang ghstack dependencies: #113926	2023-11-21 01:52:46 +00:00
voznesenskym	2ca1119d53	Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926 ) The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors at the end of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor. This PR is the result of a lot of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same: 1) We cache source->symbol in shape_env 2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification 3) We create a new fake mode for backends (from https://github.com/pytorch/pytorch/pull/113605/files) This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't). 
We went back to the drawing board here, but with a few concessions: 1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons 2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926 Approved by: https://github.com/ezyang, https://github.com/eellison	2023-11-20 23:06:37 +00:00
Xuehai Pan	0c450f4504	[functorch] fix potential race condition while loading `vmap` decomposition library (#113520 ) There can be a potential race condition while loading the `vmap` decomposition library in multi-threaded programs. This PR adds a thread lock to avoid the case of registering the kernel multiple times.
```python
import threading
from torch._functorch.vmap import lazy_load_decompositions

threads = []
for i in range(10000):
    thread = threading.Thread(target=lazy_load_decompositions)
    threads.append(thread)
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```
```text
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
    VMAP_DECOMPOSITIONS_LIB.impl(decomp, decomposition_table[decomp])
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113520 Approved by: https://github.com/zou3519	2023-11-20 19:50:54 +00:00
Oguz Ulgen	a450c784da	[AotAutograd] Move mutations hidden from autograd in graph (#113454 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113454 Approved by: https://github.com/bdhirsh	2023-11-17 22:47:06 +00:00
Jon Chuang	00b67193ef	[utils] move `config_typing.pyi` to `torch.utils` (#113929 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113929 Approved by: https://github.com/ezyang, https://github.com/jansel ghstack dependencies: #111299, #111300, #113901, #113916	2023-11-17 18:51:57 +00:00
chilli	d4bb16f443	Change functorch import to proxy_tensor import (#113913 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/113913 Approved by: https://github.com/ezyang, https://github.com/zou3519	2023-11-17 18:32:50 +00:00
Edward Z. Yang	97a62c715d	[BE] Remove duplicate storage_offset equality test (#113790 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/113790 Approved by: https://github.com/albanD	2023-11-15 22:25:07 +00:00
Brian Hirsh	cc11c0d11b	aot_autograd: keep input mutations on requires_grad=True tensor out of the graph for inference (#113584 ) The original behavior of torch.compile w.r.t. input mutations maintains that if an input to a graph was mutated, and requires grad, we will keep the input mutation outside of the graph and replay it at runtime. This is important because, e.g., an input can have outstanding aliases, and mutating the input in eager mode will cause autograd to change the `grad_fn` of all outstanding aliases. It looks like landing https://github.com/pytorch/pytorch/pull/111347 changed this behavior slightly: * The linked PR makes it possible for AOTAutograd to go down the inference code path, even if some inputs require grad (because all of the outputs of the graph were seen to not require grad) * AOTAutograd's logic in the inference code path today is to always keep input mutations in the graph. This PR fixes that regression: regardless of inference vs. training, we should always keep input mutations outside of the graph if the input requires_grad. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113584 Approved by: https://github.com/tugsbayasgalan ghstack dependencies: #113267, #113416	2023-11-15 19:55:47 +00:00
Brian Hirsh	032e5a4528	handle cross-dtype views during AOTAutograd view-replay (#113416 ) Fixes https://github.com/pytorch/pytorch/issues/109053 I think "partitioning views out of the graph" will be a more robust fix for the class of errors that we've seen around incorrectly regenerating views at runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113416 Approved by: https://github.com/ezyang ghstack dependencies: #113267	2023-11-15 19:55:47 +00:00
Aaron Gokaslan	b7b2178204	[BE]: Remove useless lambdas (#113602 ) Applies PLW0108, which removes useless lambda calls in Python. The rule is in preview, so it is not ready to be enabled by default just yet. These are the autofixes from the rule. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602 Approved by: https://github.com/albanD	2023-11-14 20:06:48 +00:00
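For illustration, the kind of rewrite PLW0108 targets looks roughly like this. The helper `double` and the variable names are hypothetical and are not taken from the PR.

```python
def double(x):
    # Hypothetical helper used only for this illustration.
    return 2 * x


values = [1, 2, 3]

# Flagged pattern: the lambda only forwards its argument to `double`.
wrapped = list(map(lambda v: double(v), values))

# Equivalent call with the redundant wrapper removed.
direct = list(map(double, values))
```

Both calls produce the same list; the second simply passes the function object directly.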
Jez Ng	5b95715bc0	Make {Tracing,Compile}Context.get() return non-optional type (#113535 ) They are used in many contexts that don't actually check if the returned type is `None`. I have also created `try_get()` for the cases where we do actually want an Optional type returned. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535 Approved by: https://github.com/ezyang ghstack dependencies: #113412	2023-11-14 04:31:12 +00:00
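The `get()` / `try_get()` split described in #113535 can be sketched as below. This is a hedged illustration only: the `Context` class, its `_current` attribute, and the error raised on a missing context are assumptions of this sketch, not the actual `TracingContext` or `CompileContext` implementation.

```python
from typing import Optional


class Context:
    """Hypothetical stand-in for a tracing/compile context holder."""

    _current: Optional["Context"] = None

    @classmethod
    def try_get(cls) -> Optional["Context"]:
        # For callers that can handle a missing context.
        return cls._current

    @classmethod
    def get(cls) -> "Context":
        # For callers that require a context: the return type is
        # non-optional, and a missing context fails loudly here
        # (the exact failure mode is an assumption of this sketch).
        ctx = cls._current
        if ctx is None:
            raise RuntimeError("no active context")
        return ctx
```

With this shape, call sites that never checked for `None` get an accurate type from `get()`, while the few that do need the optional value call `try_get()` explicitly.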
PyTorch MergeBot	0e6b6a2483	Revert "AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554 )" This reverts commit `3afb4e5cf7`. Reverted https://github.com/pytorch/pytorch/pull/111554 on behalf of https://github.com/clee2000 due to the xla failure is real sorry, log classifier is showing the wrong line ([comment](https://github.com/pytorch/pytorch/pull/111554#issuecomment-1809177978))	2023-11-13 21:46:57 +00:00
Brian Hirsh	3afb4e5cf7	AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554 ) This should be enough to get @voznesenskym's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are: (1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it, but it isn't needed, and it seems safer to graph break) (2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call. (3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`. (4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same). I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_metadata_mutation()`, which can more accurately distinguish between data mutations, `set_()` calls, and metadata mutations. This PR is still silently incorrect in one case though, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```
If you have an input that experiences both a data-mutation and a `x_old.set_(x_new)` call, there are two cases: (a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input. (b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```
def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x_view.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output

def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```
There are two things that make this difficult to do though: (1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about them. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated. (2) AOTAutograd now needs to know that we might have two graph outputs that correspond to a single "mutated input", which is annoying. It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554 Approved by: https://github.com/ezyang	2023-11-13 16:39:25 +00:00

330 Commits