Summary: We can only avoid decomposing CompositeImplicit custom ops when they are functional. From the looks of the implementation, this op is functional, so the fix is simply correcting the schema.
Test Plan: CI
Differential Revision: D54019265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120332
Approved by: https://github.com/zhxchen17
This is a proof-of-concept implementation of how people can use a `mark_strict` marker to enable torchdynamo while exporting under non-strict mode. The main idea is that `mark_strict` turns into a HOO, which then uses dynamo to do correctness analysis in the same way torch.cond works today. There are some notable limitations (a hypothetical usage sketch follows the list):
1. This API is not meant for public use yet
2. Strict region can't work with arbitrary container inputs
3. We don't preserve `nn_module_stack` and other node metadata for the strict region.
4. The strict_mode HOO will show up in the final graph. This is undesirable in the long term, but for short-term experiments it should be good enough. This will be fixed in a follow-up PR.
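A hypothetical usage sketch of what this enables (`mark_strict` below is a no-op stand-in; the real marker's import path and exact form are intentionally not spelled out here):

```python
# Hypothetical sketch: `mark_strict` is a no-op stand-in for the real marker,
# which this PR turns into a strict_mode HOO that dynamo analyzes.
import torch

def mark_strict(fn):
    return fn  # stand-in only

class M(torch.nn.Module):
    @mark_strict
    def forward(self, x):
        # With the real marker, this region is traced by torchdynamo even
        # though the surrounding export runs in non-strict mode.
        return x.sin() + x.cos()

ep = torch.export.export(M(), (torch.randn(3),), strict=False)
print(ep.graph)  # with the real marker, a strict_mode HOO call would appear here
```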
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114658
Approved by: https://github.com/ydwu4
In this PR, we implement functionalization on the pre-dispatch graph. Today, every dispatch key except DispatchKey.Python has a dedicated mode stack in Python. PreDispatch tracing relies on this behaviour by pushing ProxyTorchDispatchMode onto the DispatchKey.PreDispatch mode stack and handling the dispatching logic in Python. To make pre-dispatch functionalization work, we now need to push FunctionalTensorMode onto the DispatchKey.PreDispatch mode stack and make sure it runs before ProxyTorchDispatchMode (this is very similar to how post-dispatch tracing works). Here are some design decisions we made for this flow to work:
1. FunctionalTensorMode internally calls into the C++ Functionalize key. Since C++ functionalization runs after PreDispatch, if we are not careful we will keep re-entering the PreDispatch key. We solve this by dispatching directly to the C++ Functionalize key.
2. We delete the mode_stack_per_key logic because the only realistic case where it is exercised is PreDispatch, and a plain list is generally unsafe: the ordering of FunctionalTensorMode and ProxyTorchDispatchMode matters and is hard to enforce on a plain list. Instead, we now have a private class that tracks the PreDispatch mode stack (see the sketch after this list).
3. We still run CompositeImplicitAutograd decomps in this PR, and will disable this logic later as a follow-up.
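A minimal illustrative sketch of what such a private holder could look like (the class and method names below are assumptions, not the actual ones introduced in this PR):

```python
# Illustrative only: names are made up; the real private class in this PR may differ.
from typing import Optional

class _PreDispatchModeStack:
    """Tracks the (at most two) modes on the PreDispatch stack and enforces that
    FunctionalTensorMode is always handed ops before ProxyTorchDispatchMode."""

    def __init__(self) -> None:
        self._functional_mode: Optional[object] = None  # FunctionalTensorMode
        self._proxy_mode: Optional[object] = None       # ProxyTorchDispatchMode

    def set_functional_mode(self, mode) -> None:
        assert self._functional_mode is None, "FunctionalTensorMode already set"
        self._functional_mode = mode

    def set_proxy_mode(self, mode) -> None:
        assert self._proxy_mode is None, "ProxyTorchDispatchMode already set"
        self._proxy_mode = mode

    def modes_in_dispatch_order(self):
        # FunctionalTensorMode first, then ProxyTorchDispatchMode.
        return [m for m in (self._functional_mode, self._proxy_mode) if m is not None]
```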
Some missing bits after this PR:
1. Preserving autograd ops in a functional form. Right now they still show up in the graph but in a "non-functional" way.
2. Turning off CompositeImplicitAutograd decomps
3. Functionalizing HOOs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113728
Approved by: https://github.com/bdhirsh
We can auto-functionalize operators that mutate their inputs as long as
the outputs of the operator do not alias their inputs. The user needs
to provide an abstract impl for the operator if it has non-trivial
returns.
- We update can_auto_functionalize(op) to include ops that return (but
do not alias) Tensors
- We update auto_functionalized(op, mutated_args_names, kwargs) to return (out, mutated_args), where `out = op(**kwargs)` and `mutated_args` are the new values of the inputs that would have been mutated (see the sketch below).
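An illustrative sketch of that contract (this is plain Python standing in for the real HOP; `my_mutating_op` is a made-up example op):

```python
# Not the real auto_functionalized implementation; just its intended semantics:
# clone the to-be-mutated inputs, run the op, return (out, mutated_args).
import torch

def auto_functionalized_semantics(op, mutated_args_names, kwargs):
    new_kwargs = dict(kwargs)
    for name in mutated_args_names:
        # Clone so the caller's inputs are left untouched.
        new_kwargs[name] = kwargs[name].clone()
    out = op(**new_kwargs)  # same value op(**kwargs) would return, minus the side effect
    mutated_args = tuple(new_kwargs[name] for name in mutated_args_names)
    return out, mutated_args

# Made-up mutating op with a non-trivial (non-aliasing) return.
def my_mutating_op(x, out_buf):
    out_buf.copy_(x * 2)
    return x + 1

x, buf = torch.ones(3), torch.zeros(3)
out, (new_buf,) = auto_functionalized_semantics(my_mutating_op, ["out_buf"], {"x": x, "out_buf": buf})
assert torch.equal(buf, torch.zeros(3))  # original input not mutated
assert torch.equal(new_buf, x * 2)       # its new value is returned instead
```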
Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115135
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114955, #114956, #115134
Users may wish to torch.compile custom ops that mutate their inputs
and return nothing (this is a common class of operators).
torch.compile will automatically support such ops without anyone needing
to provide a functionalization kernel for them. Here's how.
Let's say we have a hypothetical mylib::sin_(Tensor(a!) x) -> ()
op. First, when FakeTensor sees this op, it can just return None.
This is the case because custom ops are not allowed to mutate input
metadata, so the FakeTensor rule for one that returns nothing is trivial.
Next, when Python FunctionalTensor sees the op, it will functionalize
it by emitting a call to an auto_functionalize(op, ["x"], {"x": ...})
HOP and replacing the mutated inputs with the outputs of this HOP.
This HOP effectively runs the functional version of the op when
called: it clones inputs that will be mutated, runs the op, and
then returns Tensors with the new values.
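A minimal end-to-end sketch of the kind of op this covers (assumptions: the `torch.library.Library` registration API with a `CompositeExplicitAutograd` impl; the internals of what torch.compile emits are simplified in the comments):

```python
# Sketch of a custom op that mutates its input and returns nothing, plus the
# user-side torch.compile call that now works without a functionalization kernel.
import torch

lib = torch.library.Library("mylib", "FRAGMENT")
lib.define("sin_(Tensor(a!) x) -> ()")

def sin_impl(x):
    x.sin_()  # mutates its input in place, returns nothing

lib.impl("sin_", sin_impl, "CompositeExplicitAutograd")

@torch.compile(fullgraph=True)
def f(x):
    torch.ops.mylib.sin_(x)  # functionalized into an auto_functionalize HOP call
    return x + 1

print(f(torch.zeros(4)))  # tensor([1., 1., 1., 1.]) since sin(0) == 0
```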
In the future we can teach Inductor how to do re-inplacing when it sees
this HOP (like how triton kernels do it) but this isn't urgent (and is
more of a performance problem).
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114955
Approved by: https://github.com/bdhirsh
Fix: #111506
This PR skips aliasing correction on `lift_fresh` calls. The reasoning: although unlifted and lifted tensors are technically aliases, they live at different levels of abstraction (`FunctionalTensorWrapper` vs. `XLATensor`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112202
Approved by: https://github.com/bdhirsh
This is kind of hard to test, but I can try to add a test case if requested.
I noticed locally that we now end up logging to the ProxyTensorMode and FakeTensorMode `not_implemented` logs in very simple compile examples: https://github.com/pytorch/pytorch/blob/main/torch/fx/experimental/proxy_tensor.py#L269
It was because `_mirror_autograd_meta_to()` indirectly queries sizes, and since modes have higher priority than subclasses, `aten::sym_sizes()` was getting dispatched to our modes before going to `FunctionalTensor.__torch_dispatch__`.
This works out fine (they return NotImplemented and we eventually get to `FunctionalTensor`), but I figured we should avoid cluttering up the logs, so I wrapped the calls in `FunctionalTensorMode`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111040
Approved by: https://github.com/ezyang
The first reland broke things internally (failing diff: D49617462).
The major error looks like it's because there's an internal-only higher-order op that needs a new functionalization rule. I'm going to land an internal diff for that and confirm tests pass before relanding this PR.
Also confirmed that the issue from https://github.com/pytorch/pytorch/issues/110121 is fixed, and added a test.
This reverts commit 1b90f07f5a.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110079
Approved by: https://github.com/ezyang
I added some tests for Conj, Neg and ZeroTensor for both Python and C++ functionalization. This also fixes a nasty segfault when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`.
Changes:
(1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization mirrors all dispatch keys from the inner tensor onto the wrapper, except for the Python and functorch keys).
(2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization.
(3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to Python.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023
Approved by: https://github.com/zou3519
ghstack dependencies: #108654, #109662, #109632
We now have two types of functionalization: C++ functionalization (through the `Functionalize` dispatch key) and Python functionalization (through the `FunctionalTensorMode` torch_dispatch mode).
This means that all higher-order ops need custom functionalization rules for the Python variant too. I added them here, as well as a helper function `dispatch_functionalize()`, which is equivalent to `torch.func.functionalize()` except that it uses `FunctionalTensorMode`.
In theory we could have secretly switched `torch.func.functionalize` to use `FunctionalTensorMode`. This would be BC-breaking, though, since `FunctionalTensorMode` isn't composable with the other functorch transforms (the functorch layer-mode stack doesn't know how to re-order torch_dispatch modes arbitrarily).
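A rough sketch of the idea behind such a helper (not the actual `dispatch_functionalize()`; the wrap/unwrap entry points `FunctionalTensor.to_functional` and `torch._from_functional_tensor` are assumptions here):

```python
# Conceptual sketch only: wrap tensor inputs, run the function under
# FunctionalTensorMode, then unwrap the outputs back to plain tensors.
import torch
from torch.utils._pytree import tree_map_only
from torch._subclasses.functional_tensor import FunctionalTensor, FunctionalTensorMode

def functionalize_via_mode(fn):
    def wrapped(*args, **kwargs):
        with FunctionalTensorMode():
            # Wrap every tensor input in a Python FunctionalTensor ...
            f_args, f_kwargs = tree_map_only(
                torch.Tensor, FunctionalTensor.to_functional, (args, kwargs)
            )
            out = fn(*f_args, **f_kwargs)
            # ... and unwrap the outputs back into plain tensors.
            return tree_map_only(
                FunctionalTensor, lambda t: torch._from_functional_tensor(t.elem), out
            )

    return wrapped
```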
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108656
Approved by: https://github.com/zou3519
ghstack dependencies: #109024, #109248
Reland: the previous PR was reverted internally with this error:
```
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module>
from ._utils_internal import _functionalize_sync as _sync
ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal'
```
I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for *another* `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444))
Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518
Approved by: https://github.com/albanD
Added two new utils to help with turning Python functionalization on in AOTAutograd (next PR):
(1) Updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle Python `FunctionalTensor`s. In theory I could break BC and change this API (since it's private?), but I decided not to do that in this PR stack to minimize the chance of reverts. Instead of updating the API directly (which is in C++), I added a Python shim that first unwraps the Python `FunctionalTensor` if there is one, then calls the existing C++ logic (see the sketch after this list).
(2) `mirror_autograd_meta` is now a standalone API that mirrors the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`, but I now need it in a more standalone way: later in AOTAutograd, when we unwrap and re-wrap tensor subclasses, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.
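A minimal sketch of the shape of that shim (here `cpp_sync` is just a stand-in parameter for the existing C++ entry point; the real shim's name and location differ):

```python
# Sketch only: unwrap a Python FunctionalTensor before handing the tensor to
# the pre-existing C++ sync logic, which expects a FunctionalTensorWrapper.
import torch
from torch._subclasses.functional_tensor import FunctionalTensor

def sync_shim(t: torch.Tensor, cpp_sync) -> None:
    if isinstance(t, FunctionalTensor):
        t = t.elem  # underlying FunctionalTensorWrapper-backed tensor
    cpp_sync(t)
```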
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
This PR adds a new `FunctionalTensor` subclass and a `FunctionalTensorMode` torch_dispatch mode. Together, this class/mode pair is a lightweight wrapper around our existing C++ functionalization logic.
This idea came from Ed: later in the stack, I want to be able to run functionalization **underneath** torch_dispatch when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later.
This PR provides the basic new classes, and some light testing.
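A minimal usage sketch of the pair (the `FunctionalTensor.to_functional` constructor used here is an assumed entry point; the exact shipped semantics may differ):

```python
# Illustrative only: the point is that functionalization now runs as a
# torch_dispatch mode rather than only as a C++ dispatch key.
import torch
from torch._subclasses.functional_tensor import FunctionalTensor, FunctionalTensorMode

with FunctionalTensorMode():
    x = FunctionalTensor.to_functional(torch.zeros(4))
    x.add_(1)      # the in-place op is rewritten to its out-of-place form
    y = x.mul(2)   # y is also a FunctionalTensor wrapper
print(type(y).__name__)  # FunctionalTensor
```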
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404
Approved by: https://github.com/ezyang