pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
soulitzer	110382bacf	Make NestedTensor compilable with eager backend (#109171 ) In this PR: - Adds support for strides for jagged tensor (design doc for this coming soon) - NestedTensor skips automatic dynamic - Make use of @bdhirsh's subclass fakification logic by adding the __tensor_{un,}flatten__ functions. - Additional logic for fakification: since existing subclass fakification logic does not handle the case where the outer tensor has an additional dimension. We insert one-off logic to (1) insert an extra SingletonSymInt onto the fakified NestedTensor. (2) make sure we call track_symint on both the sizes on the inner and outer tensor during guard creation. Remaining things that are weird: - Still need to skip some logic in meta utils for some reason (I was going to write this up more, but decided not to since we're not able to do this anyway for an immediate reason: we cannot arbitrarily compare singleton ints. For now I'm just following Brian's advice from [here](https://github.com/pytorch/pytorch/pull/109171#discussion_r1328137070) ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109171 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-10-11 04:47:10 +00:00
drisspg	e0dbaa04d2	Fix the meta func for mem_eff_backward (#110893 ) Fixes #110832 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110893 Approved by: https://github.com/eellison	2023-10-11 02:58:54 +00:00
angelayi	096b14eae8	Fix numel test to be > 2 (#110731 ) This makes it consistent with the comment. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/110731 Approved by: https://github.com/angelayi	2023-10-07 19:18:59 +00:00
Oguz Ulgen	f04b1a0d27	[AOTInductor] Implement autograd eager backend for native triton kernels (#110403 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110403 Approved by: https://github.com/zou3519, https://github.com/bdhirsh	2023-10-04 17:56:56 +00:00
Brian Hirsh	b457e3f79a	Reland attempt 2 of "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406 )" (#109906 )" (#110079 ) The first reland broke internal (failing diff: D49617462). The major error looks like it's because there's an internal-only higher order op that needs a new functionalization rule. I'm going to land an internal diff for that and confirm tests pass before relanding this PR. Also confirmed that the issue from https://github.com/pytorch/pytorch/issues/110121 is fixed, and added a test. This reverts commit `1b90f07f5a`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110079 Approved by: https://github.com/ezyang	2023-10-03 18:50:25 +00:00
Edward Z. Yang	f7c9ef88f5	Add masked_select abstract impl (#110103 ) Fixes https://github.com/pytorch/pytorch/issues/109871 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/110103 Approved by: https://github.com/bdhirsh	2023-09-27 04:07:58 +00:00
rzou	f8fcc54f70	Add torch.library.impl_abstract (#109912 ) Changelog: - torch.library.impl_abstract optionally accepts a torch.library.Library object. If passed in, then the lifetime of the registration is tied to the Library object. - we've also changed torch.library.impl_abstract to work on all operators, including overloads. - we refactored the `torch._custom_ops.` and `torch._custom_op.` impl_abstract APIs and put them under torch._library. This is the final resting place for them. I will follow-up with deleting all the `torch._custom_ops.` stuff later. - There is a new "SimpleOperatorRegistry" where we actually collect the abstract_impl. We will expand this to also hold the other torch._custom_ops. APIs when we move those to torch.library NB: Previously we had designed `impl_abstract` assuming a very high-level Python-only custom op API. We've revisited that since; now, impl_abstract works for all custom ops, no matter python or C++, no matter the schema. The new refactored design reflects this better. Test Plan: - existing and new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/109912 Approved by: https://github.com/ezyang	2023-09-26 01:59:50 +00:00
Moritz Hennen	09c598745c	Rename `torch._C._TensorBase` to `TensorBase` (#109940 ) I have gone ahead and implemented the renaming of the type `torch._C._TensorBase` to a non-private class name `TensorBase`. The changes also include leaving `torch._C._TensorBase` as an alias to the new type: `70458768fb/torch/csrc/autograd/python_variable.cpp (L2196-L2197)` both in the c++ code and in the corresponding `__init__.pyi.in` file: `70458768fb/torch/_C/__init__.pyi.in (L1522)` Fixes #109438 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109940 Approved by: https://github.com/ezyang	2023-09-25 19:10:22 +00:00
Edward Z. Yang	b4ede53776	Use constrain_range_as_size for nonzero/repeat_interleave (#109857 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/109857 Approved by: https://github.com/tugsbayasgalan	2023-09-22 12:14:46 +00:00
Brian Hirsh	63526a63f5	Make FunctionalTensor subclass to be more like functorch (interaction with ZeroTensor + Conjugate key) (#109023 ) I added some tests for Conj, Neg and ZeroTensor for both python and C++ functionalization. This also fixes a nasty segfault when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`. Changes: (1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization will mirror all dispatch keys from the inner tensor to the wrapper, except for python and functorch keys). (2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization (3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to python Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023 Approved by: https://github.com/zou3519 ghstack dependencies: #108654, #109662, #109632	2023-09-22 07:09:04 +00:00
rzou	8124a6c40c	[TORCH_LIBRARY] Add impl_abstract_pystub (#109529 ) We want users to be able to define custom ops in C++ but put the abstract impl in Python (since it is easier to write them in Python and the abstract impl better models device semantics and data-dependent operators). `m.impl_abstract_pystub(opname, python_module, context)` declares the abstract_impl of the operator to exist in the given python module. When the abstract_impl needs to be accessed (either via FakeTensor or Meta), and it does not exist, the PyTorch Dispatcher will yell with a descriptive error message. Some details: - We construct a new global AbstractImplPyStub mapping in Dispatcher.cpp. Read/write to this map is protected by the Dispatcher lock. - We add a new Meta Tensor fallback kernel. The fallback errors out if there is no meta kernel, but also offers a nicer error message if we see that there is a pystub. - We create a `torch._utils_internal.throw_abstract_impl_not_imported_error` helper function to throw errors. This way, we can throw different error messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we added a PyInterpreter::throw_abstract_impl_not_imported_error. Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-09-22 04:55:36 +00:00
hauntsaninja	2cd0b94533	Hide __getattr__ from type checkers (#109683 ) Visibility of this causes type checkers to conservatively assume that all attributes are defined on torch module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109683 Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/malfet	2023-09-21 17:01:23 +00:00
Brian Hirsh	238fb66085	python functionalization: support higher order ops (#108656 ) We now have two types of functionalization, C++ Functionalization (through the `Functionalize` dispatch key), and python functionalization (through the `FunctionalTensorMode` torch_dispatch mode). This means that all higher order ops need custom functionalization rules for the python variant too. I added them here, as well as a helper function `dispatch_functionalize()` - equivalent to `torch.func.functionalize()`, except that it uses `FunctionalTensorMode`. In theory we could have secretly switched `torch.func.functionalize` to use `FunctionalTensorMode`. This would be BC-breaking, though, since `FunctionalTensorMode` isn't composable with the other functorch transforms (the functorch layer-mode stack doesn't know how to re-order torch_dispatch modes arbitrarily). Pull Request resolved: https://github.com/pytorch/pytorch/pull/108656 Approved by: https://github.com/zou3519 ghstack dependencies: #109024, #109248	2023-09-20 04:37:31 +00:00
Brian Hirsh	25e81f19f3	reland "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917 )" (#109518 ) Reland - the previous PR was reverted by internal with this error: ``` File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module> from ._utils_internal import _functionalize_sync as _sync ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal' ``` I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for another `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444)) Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518 Approved by: https://github.com/albanD	2023-09-19 13:25:24 +00:00
PyTorch MergeBot	49b18ae546	Revert "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917 )" This reverts commit `0ad595954a`. Reverted https://github.com/pytorch/pytorch/pull/107917 on behalf of https://github.com/clee2000 due to breaking internal builds D49346637 ([comment](https://github.com/pytorch/pytorch/pull/107917#issuecomment-1722566885))	2023-09-17 20:57:41 +00:00
Brian Hirsh	0ad595954a	python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917 ) Added two new utils to help with turning python functionalization on in AOTAutograd (next PR): (1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle python `FunctionalTensor`'s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a python shim that first tries to unwrap the python `FunctionalTensor` if there is one, then calls the existing C++ logic (2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd when we unwrap and re-wrap a tensor subclass, we need to manually mirror the autograd metadata from the original to the updated version of the subclass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917 Approved by: https://github.com/ezyang ghstack dependencies: #106404	2023-09-15 20:19:25 +00:00
Brian Hirsh	f22b303f65	Add TorchDispatch version of functionalization (#106404 ) This PR adds a new `FunctionalTensor` subclass, and `FunctionalTensorMode` torch dispatch mode. Together, this class/mode are a lightweight wrapper around our existing C++ functionalization logic. This idea came from Ed - later in the stack, I want to be able to run functionalization underneath torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later. This PR provides the basic new classes, and some light testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404 Approved by: https://github.com/ezyang	2023-09-15 20:19:25 +00:00
Edward Z. Yang	55f956f1d2	optests improvements based on torchvision usage on nms (#108929 ) - Update cross-ref FakeMode test to use ShapeEnv. Dynamic ops can now return an unbacked SymInt. We always accept this as equal to whatever the real value was. - Relax test so it works on all classes, not just unittest.TestCase - Properly wrap the original method, so things like pytest.mark.parametrize are carried over - Support dynamic shapes by default for make_fx `tracing_mode="fake"` without symbolifying everything else Fixes https://github.com/pytorch/pytorch/issues/108927 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108929 Approved by: https://github.com/zou3519	2023-09-13 13:26:15 +00:00
Edward Z. Yang	e5f300f085	Make mutation test work with quantized tensors (#108935 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108935 Approved by: https://github.com/zou3519	2023-09-13 00:54:01 +00:00
Edward Z. Yang	2b9ad3d5c4	Fix setitem with SymInt (#108873 ) Fixes https://github.com/pytorch/pytorch/issues/101939 Several fixes bundled together: 1. When we valueToTensor, we only handled non-symbolic inputs and not symbolic inputs. We support symbolic Scalar, so also handle symbolic values. 2. In the symbolic case, we MUST NOT lift_fresh, as you're not going to inline a constant into the graph, it's going to be from a `scalar_tensor` call (so no need to clone it to avoid mutations) 3. In indexing scalarToTensor, must not do the static, directly read out the scalar contents logic when the scalar is symbolic Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108873 Approved by: https://github.com/jansel	2023-09-10 06:44:22 +00:00
Edward Z. Yang	9b83402666	Add support for symbolic repeat_interleave (#108763 ) Fixes https://github.com/pytorch/pytorch/issues/108195 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/108763 Approved by: https://github.com/Chillee	2023-09-08 16:48:32 +00:00
vasiliy	3702980717	dynamo: trace autograd.Function with tensor subclass input (#108093 ) Summary: Enables dynamo eager mode tracing for the following situation: 1. we have a torch.autograd.Function 2. the input to that function is a tensor subclass which is an intermediary This is useful for float8 training UX. Test Plan: ``` python test/dynamo/test_autograd_function.py -k intermediary_input ``` Reviewers: Subscribers: Tasks: Tags: Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/108093 Approved by: https://github.com/bdhirsh, https://github.com/wanchaol	2023-09-01 02:12:38 +00:00
Brian Hirsh	4f34caf164	add return_and_correct_aliasing() util for wrapper subclasses (#107915 ) This PR adds a `return_and_correct_aliasing()` utility, that wrapper subclasses can use to get correct aliasing. I updated `TwoTensor` to use it, and added some testing that the aliasing of my `TwoTensor` subclass now matches the aliasing behavior of normal tensors. Right now my test just uses a few hand-picked opinfos (that have varying aliasing behavior). I thought all op infos might be overkill (does that take a while to run?), but I'm happy to add them all if people prefer. One more general question about this PR: eventually, proper aliasing will be a requirement in order for AOTAutograd to handle aliasing/mutations on subclasses properly during compilation. How can we make sure that wrapper subclasses use this API? A few options (from talking to Richard): (1) Yolo require subclasses to use the API and hope users do as well (what this PR does) (2) Yolo require subclasses to use the API, but add a kwarg to `_make_wrapper_subclass`, e.g. `manual_aliasing=True`, that torch.compile checks for before allowing the subclass to be used in compilation (3) Automatically run this API in our python fallback, for every tensor subclass that currently implements `__tensor_flatten__` (aka only the "traceable" subclasses) (4) Automatically run this API in our python fallback, for every tensor subclass. This would be a bit higher blast radius, since it would change the existing aliasing behavior of wrapper subclasses. Maybe.. this is the right thing to do though? Either way, my tentative plan is to do (1) to unblock, and revisit this later once we want to come up with public docs + a more general "tensor subclass in PT2 requirements" plan Pull Request resolved: https://github.com/pytorch/pytorch/pull/107915 Approved by: https://github.com/ezyang	2023-08-29 14:27:19 +00:00
Brian Hirsh	da54f3c519	reorder proxy / fake modes so they always run last (#104482 ) Update: Made refactor of the original PR. See the original description below, but here I'll describe the updates: (1) TLS changes in `TorchDispatchModeTLS.h/cpp`. I added a `TorchDispatchModeKey` enum, that (for now) just contains PROXY and FAKE. The ModeTLS used to just contain a `std::vector<std::shared_ptr<c10::SafePyObject>>` corresponding to the mode stack. It now also contains a separate array of "infra modes", indexed by mode key (PROXY and FAKE, with a new addition, FUNCTIONAL, coming later in the stack). `TorchDispatchModeTLS::push_onto_stack` and `TorchDispatchModeTLS::pop_stack` are now a bit more complicated. Pushing accepts an optional mode_key, which if set, tells us to add the given mode directly to our "infra_modes" array. Popping will first check the "user mode" stack, before trying to pop anything from the infra mode stack. It also optionally returns the mode key of the mode we popped if there was one - that way if we push that same mode back onto the TLS later, we know where it goes. `TorchDispatchModeTLS::dispatch_mode_enabled()` now accepts an optional `skip_infra_modes` param, so you can separately query if there are "any modes at all", or if there are "any user modes". `TorchDispatchModeTLS::get/set/unset_mode()` all take in a mode key, and get/set/unset the mode at that particular mode key (meaning they are only meant to be used for infra modes). There were also some mild codegen changes to support the new enum (2) `fake_tensor.py/proxy_tensor.py/_python_dispatch.py` The way I tell the infra that certain subclasses/modes are "infra" is through the enum: I gave `FakeTensor` and `FakeTensorMode` a `self._mode_key = torch._C.TorchDispatchModeKey.FAKE`. 
`TorchDispatchMode.__enter/exit__()` (in `_python_dispatch.py`) now check if the current mode has a mode key, and if so they plumb it into any `push_onto_stack()` calls (which eventually instructs `TorchDispatchModeTLS` where to put the mode). Same thing for `ProxyTorchDispatchMode`. I also had to change both of these modes' enter/exit, to handle the fact that there can no longer be multiple proxy/fake modes on the mode stack at once. I updated them both to have a `self.enter_stack: List[Optional[TorchDispatchMode]]` - whenever we push a given mode in `__enter__`, we remove the current ambient fake/proxy mode from the mode stack, and save it in `enter_stack`, so that on exit we can reset the state properly. (3) dispatching logic in `python_arg_parser.cpp` This is where the core dispatching logic changes are. I added two helpers, `dispatch_on_subclass()` and `dispatch_on_mode()`. The overall dispatching order is now: ``` (a) dispatch_on_mode() # try user modes first (where the mode stack automatically considers infra modes last) (b) dispatch_on_subclass() # try user subclasses next (skipping infra subclasses) (c) dispatch_on_subclass() # try infra subclasses next (skipping user subclasses) ``` Note that we still want "user subclasses" to run before "infra modes". As Ed helped me realize, this will work today: If proxy/fake modes run in step (a), they'll return NotImplemented if they see a user subclass, allowing us to redispatch to the user subclass. How do (b) and (c) distinguish between user and infra subclasses? Infra subclasses (FakeTensor, and later FunctionalTensor) are required to have a `_mode_key` hidden on the subclass - so we filter via arguments that do/don't have the `_mode_key`. (4) I also changed `DoubleTensor` to `TwoTensor` to minimize confusion (@albanD pointed out that DoubleTensor would be easily confused with `torch.FloatTensor` and friends). 
----- original description below ----- The main purpose of this PR is to fix the "ordering problem" between torch_dispatch modes, where we want to ensure that our Fake and Proxy dispatch modes always run after any dispatch modes created by the user, regardless of where they are in the stack. See this doc for more details: https://docs.google.com/document/d/1COQ291nOZvtFnzGTQMJqoYZ3sttEYFw_7HbfSyL8gcA/edit Full set of changes below. I ended up including a few semi-related changes in this PR that I documented - but if folks would rather I separate them out, happy to try to do that. (1) Add dedicated TLS slots for FakeTensorMode and ProxyTensorMode This is the main component of this PR. There are two new slots, `TorchDispatchModeTLS.fake_mode_` and `TorchDispatchModeTLS.proxy_mode_`, which correspond to a single "global" fake and proxy mode. There is now an invariant that `torchDispatchModeState.stack_` can never contain either of these modes. I also added a `TorchDispatchModeTLS::maybe_highest_mode()` helper that consults the `stack_` as well as both the proxy and fake slots, and returns the highest priority mode - this is because there are a few places in the codebase where we legitimately want to get the highest priority mode, including fake or proxy, if one is set. This also made the implementations of the existing `disable_proxy_modes_tracing()` and `get_innermost_proxy_mode()` marginally simpler. (2) Updated the dispatching logic in handle_torch_function_no_python_arg_parser() This is the function that actually figures out which torch_dispatch implementation to call, given the current mode stack and tensor subclass inputs. This function got marginally more complicated as part of the refactor: First we inspect the mode stack and any non-fake subclass inputs. Then we check for the proxy mode slot. Then we check for the Fake mode slot, before finally checking for any fake subclass inputs. 
(3) new python `_get_fake_tensor_mode()` and `_get_proxy_tensor_mode()` API's Before, if you wanted to see if proxy or fake modes were active in python, you would have to consult the mode stack. Since these two modes are no longer part of the actual mode stack, I added two new API's to directly check if either proxy or fake modes are active. (4) Allow traceable tensor subclasses to access storages from python This is convenient later in the stack, where AOTAutograd needs to detect aliasing of inputs and outputs, where those inputs and outputs might be tensor subclasses. Previously, `x.untyped_storage()` would raise an error if `x` was a subclass. In this PR, I tried to relax this constraint as little as possible: `THPVariable_storage()` will only try to return a storage to python if the tensor subclass that you are passing in is "traceable" (5) Fixed subclass fakeification @wanchaol recently added support to be able to fakeify tensor subclasses. That fakeification logic works in most cases, but there is one case it doesn't handle: autograd metadata. In particular, since autograd sees our tensor subclasses and not their desugared tensors, we need to make sure that our fakeified subclass has the same autograd metadata as the original subclass. I updated `meta_utils.py` to make sure that the autograd metadata is correct. (6) make tensor subclasses resizeable Previously we didn't allow tensor subclasses to be resizeable. I ran into an issue where fakeifying a tensor subclass occasionally requires swapping out its storage, which can involve resizing the tensor. Mechanically, this required updating `at::for_blob()` to expose a way to request that the tensor that you create has resizeable storage, and then using this new API in `_make_wrapper_tensor()`. (7) Added a basic DoubleTensor subclass for testing I use this subclass more later in this stack in my AOTAutograd tests - but it serves as a simple subclass example to test the dispatch ordering in this PR. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104482 Approved by: https://github.com/ezyang ghstack dependencies: #107415	2023-08-29 02:36:48 +00:00
Brian Hirsh	5efd63b1b8	better support for fakeifying and dynamoing through torch_dispatch subclasses (with dynamic shapes) (#107415 ) There is already some support for plumbing `__torch_dispatch__` tensor subclasses through dynamo, but this PR beefs it up a bit and adds a test. In particular: (1) Fakeifying tensor subclasses didn't properly set autograd metadata (requires_grad, is_leaf) on the newly fakeified wrapper subclass. I don't actually have a test for this in this PR, but it's tested pretty heavily later in my aot autograd tests (2) Fakeifying tensor subclasses didn't properly track source information for dynamic shapes on the inner tensors. I added a new `WrapperSubclassFieldSource` subclass, that represents a source coming from a tensor field on a wrapper subclass, which I use in the fakeifying logic, and again in symbolic_shapes.py to generate proper guards. (3) `_make_wrapper_subclass()` marginally updated this code to work better with dynamic shapes. One thing that's a bit weird about `_make_wrapper_subclass`: it has two overloads, and the first explicitly does not support dynamic shapes (and the second.. does not support kwargs). I think that later we probably want to consolidate / at least make the first overload work with dynamic shapes, but I didn't want to handle that in this PR (so these smaller changes seemed like a strict improvement). Pull Request resolved: https://github.com/pytorch/pytorch/pull/107415 Approved by: https://github.com/ezyang	2023-08-29 02:36:48 +00:00
leslie-fang-intel	9319dd1c7c	[Quant][Inductor] Enable the lowering of quantized maxpool2d (#105906 ) Summary Enable the `dq-maxpool2d-q` pattern match and lower into `torch.ops.quantized.max_pool2d`. Test Plan ``` python -m pytest test_mkldnn_pattern_matcher.py -k test_qmaxpool2d python -m pytest test_quantized_op.py -k test_max_pool2d_pt2e ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/105906 Approved by: https://github.com/jgong5, https://github.com/eellison ghstack dependencies: #104580, #104581, #104588, #104590, #105455, #105456, #105639	2023-08-26 08:36:47 +00:00
Vishwa Raj Singh	35de780aa6	Fix Inplace tensor update on transpose (#104689 ) Fixes https://github.com/pytorch/pytorch/issues/103650 - To align with HPU device backend architecture, ensure all non-view ops return contiguous fake tensor outputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104689 Approved by: https://github.com/ezyang	2023-08-24 16:58:50 +00:00
ydwu4	cbcd551045	Fix torch.compile FunctionalTensor inputs for higherOrderOps (#107604 ) Before this PR, for the added [test](https://github.com/pytorch/pytorch/pull/107604/files#diff-c618f2274b6b5ccc533c580549d2e552edbd9fc5ac0da1aa4b00338525c8f78dR224), which feeds FunctionalTensorWrapper inputs to higherOrderOperator, we have an assertion error in this line of [code](https://github.com/pytorch/pytorch/pull/107604/files#diff-9f0663783bcd93e948e0491ef61b48123bdc9977bcc632fd707da578df13bfa1R1284). The key difference of this PR is this [line](https://github.com/pytorch/pytorch/pull/107604/files#diff-9f0663783bcd93e948e0491ef61b48123bdc9977bcc632fd707da578df13bfa1L1263) of check: ```python elif ( isinstance(example_value, FakeTensor) and example_value.fake_mode is tx.fake_mode ): ``` The original intention of it seems to be dealing with the case where we want to wrap an fx proxy for an intermediate fake tensor that's produced by some tensor ops and an example value is provided (as is the case for higherOrderOps [here](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/variables/higher_order_ops.py#L85)). A fakified FunctionalTensorWrapper(FakeTensor) always fails this check. This PR changes it to checking whether it's already fakified by tx.fake_mode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107604 Approved by: https://github.com/zou3519 ghstack dependencies: #107569	2023-08-23 02:42:18 +00:00
ydwu4	a408920817	Reland fakify FunctionalTensor (#107569 ) Try to rebase and reland https://github.com/pytorch/pytorch/pull/107062 . One difference compared with previous is to make the DTensor logic same as previously in _clone_input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107569 Approved by: https://github.com/zou3519	2023-08-22 15:46:25 +00:00
Yukio Siraichi	bcede143bd	Do not mutate `SymNode` expression. (#107492 ) This PR stops `SymNode` from mutating (i.e. simplifying) its expression. Instead, the simplification (without mutation) is deferred to the `SymNode.maybe_as_int` method. ```python - FakeTensor(size=(s0,), ...) - FakeTensor(size=(s1, s2, s3), ...) - Eq(s0, s1 + s2 + s3) - FakeTensor(size=(s0,), ...) - FakeTensor(size=(s1, s2, s3), ...) ``` In summary, this PR: - Replaces `SymNode._expr` by `SymNode.expr`, removing the old property function - This makes it so `SymNode` instances never update their expression - Creates `SymNode.simplified_expr()` method for actually calling `ShapeEnv.replace` on its expression. Note that this doesn't update `SymNode.expr` - Changes how `tensor.size()` gets converted to its Python `torch.Size` type - Instead of calling `SymInt::maybe_as_int()` method, we create a new `SymInt::is_symbolic()` method for checking whether it is actually a symbolic value - This is needed so that when we call `tensor.size()` in the Python side, the returned sequence is faithful to the actual data, instead of possibly simplifying it and returning an integer - 2 files need this modification: - _torch/csrc/Size.cpp_: for handling `torch.Tensor.size` Python calls - _torch/csrc/utils/pybind.cpp_: for handling `symint.cast()` C++ calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/107492 Approved by: https://github.com/ezyang ghstack dependencies: #107523	2023-08-22 12:38:05 +00:00
PyTorch MergeBot	96c5be8bc4	Revert "Fakify leaf of FunctionalTensor (#107062 )" This reverts commit `3349725766`. Reverted https://github.com/pytorch/pytorch/pull/107062 on behalf of https://github.com/ydwu4 due to This appears to have broken the test TestDTensorCompile.test_dtensor_fullgraph. Probably a land race ([comment](https://github.com/pytorch/pytorch/pull/107062#issuecomment-1685447747))	2023-08-21 00:30:16 +00:00
ydwu4	3349725766	Fakify leaf of FunctionalTensor (#107062 ) This PR allows dynamo to fakify FunctionalTensorWrapper by unwrapping, replacing and wrapping again for FunctionalTensorWrapper so that FunctionalTensorWrapper can be passed in as input for dynamo.optimize and we can support something like this ```python ff = torch.func.functionalize(f) torch.compile(ff)(x) ``` This PR doesn't follow the \_\_tensor_flatten\_\_ and \_\_tensor_unflatten\_\_ protocol right now because we're not sure of the plan for doing that for FunctionalTensorWrapper (it's implemented in C++). Test Plan: Add a new test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107062 Approved by: https://github.com/zou3519 ghstack dependencies: #107042	2023-08-19 17:33:42 +00:00
PyTorch MergeBot	3c11184ca8	Revert "Fakify leaf of FunctionalTensor (#107062 )" This reverts commit `6cb0128c8a`. Reverted https://github.com/pytorch/pytorch/pull/107062 on behalf of https://github.com/ZainRizvi due to This appears to have broken the test TestDTensorCompile.test_dtensor_fullgraph. Probably a land race ([comment](https://github.com/pytorch/pytorch/pull/107062#issuecomment-1684124230))	2023-08-18 16:02:54 +00:00
ydwu4	6cb0128c8a	Fakify leaf of FunctionalTensor (#107062 ) This PR allows dynamo to fakify FunctionalTensorWrapper by unwrapping, replacing and wrapping again for FunctionalTensorWrapper so that FunctionalTensorWrapper can be passed in as input for dynamo.optimize and we can support something like this ```python ff = torch.func.functionalize(f) torch.compile(ff)(x) ``` This PR didn't follow the \_\_tensor_flatten\_\_ and \_\_tensor_unflatten\_\_ protocol right now because we're not sure the plan of doing that for FunctionalTensorWrapper (it's implemented in C++). Test Plan: Add a new test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107062 Approved by: https://github.com/zou3519 ghstack dependencies: #107042	2023-08-18 03:05:45 +00:00
eellison	3495f0c999	Generate mypy hints for torch.Tag, add a couple of pointwise ops (#106910 ) Replace https://github.com/pytorch/pytorch/pull/106739, since i had a bad CLA commit. - adds clone, and convert_element_dtype to pointwise - adds codegen for mypy hints of torch.Tag and removes existing ignores for them Pull Request resolved: https://github.com/pytorch/pytorch/pull/106910 Approved by: https://github.com/mlazos	2023-08-10 05:12:27 +00:00
Jason Lu	bc88028e8e	Back out "Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 )" (#106743 ) Summary: Original commit changeset: 81319beb97f3 Original Phabricator Diff: D47961182 Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822 Reviewed By: atuljangra Differential Revision: D48131623 @diff-train-skip-merge (D48131623 landed internally) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743 Approved by: https://github.com/malfet	2023-08-08 15:27:34 +00:00
Peter Bell	d4d090e2da	[FakeTensor] Workaround FFT ops with incorrect meta strides (#106319 ) Currently there are FFT operators which raise `UnsupportedOperatorException` because their meta implementations sometimes give incorrect strides. This works around the problem for static shapes by falling back to eager, though calls with dynamic shapes are still unsupported. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106319 Approved by: https://github.com/ezyang	2023-08-07 20:59:30 +00:00
Edward Z. Yang	91afefb55b	Fix some fake mode confusion between inner/outer fake mode in export (#106515 ) Fixes https://github.com/pytorch/pytorch/issues/106412 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/106515 Approved by: https://github.com/voznesenskym, https://github.com/BowenBao, https://github.com/thiagocrepaldi	2023-08-04 15:42:23 +00:00
Richard Zou	fd6e052a8a	Some minor improvements to FakeTensor testing (#106311 ) Summary: - PyTorch testing chokes sometimes when it sees an exception where the first argument is not a string. fake_tensor.UnsupportedOperatorException's first arg is an OpOverload. This PR fixes PyTorch testing to not choke. I'm not really sure how to reproduce this in OSS. - It turns out that if an operator does not have a meta kernel, the FakeTensor rule is really slow (30ms in OSS in debug mode, 3s on some internal config). The thing that is slow (aside from the previous diff) is waiting for the Dispatcher to report NotImplemented and then attempting to catch that. I'm not really sure why this is slow, but it's easy to work around, so I added a workaround. Test Plan: - existing tests Differential Revision: D47917554 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106311 Approved by: https://github.com/eellison	2023-08-03 01:44:15 +00:00
drisspg	f533791cd0	[SDPA] Mirror c++ implementation in FlashAttention meta func (#106477 ) # Summary Test edge case and update meta function to match the c++ implementation Pull Request resolved: https://github.com/pytorch/pytorch/pull/106477 Approved by: https://github.com/eellison	2023-08-03 00:28:27 +00:00
PyTorch MergeBot	fdd4b3aaa8	Revert "faketensor: prevent deepcopy from cloning FakeTensorMode (#104476 )" This reverts commit `c54afea6ee`. Reverted https://github.com/pytorch/pytorch/pull/104476 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal tests, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/104476#issuecomment-1661808343))	2023-08-02 08:56:33 +00:00
Mikayla Gawarecki	d8e5f2aa6d	Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224 Approved by: https://github.com/atalman, https://github.com/albanD	2023-07-31 17:18:56 +00:00
Brian Hirsh	c54afea6ee	faketensor: prevent deepcopy from cloning FakeTensorMode (#104476 ) fixes https://github.com/pytorch/pytorch/issues/104465 A more detailed repro is here, which uses `nn.TransformerLayer` (this breaks with AOTAutograd today, due to the presence of multiple FakeTensorMode objects lying around) https://github.com/pytorch/pytorch/issues/103505#issuecomment-1614817132 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104476 Approved by: https://github.com/ezyang	2023-07-31 15:49:08 +00:00
Jason Ansel	3ecd05d9f3	Fix FakeTensor issues with copy_ between devices (#106172 ) Used to fail with: ``` RuntimeError: Unhandled FakeTensor Device Propagation for aten.copy_.default, found two different devices cpu, cuda:0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/106172 Approved by: https://github.com/eellison	2023-07-29 15:55:32 +00:00
Richard Zou	f3d165bf61	[fake_tensor] Don't run fallback for fbgemm ops (#106210 ) Summary: This diff also adds more warning messages around allowing a namespace into the fallback. We need to grandfather in an operator to actually merge this diff. Test Plan: - existing tests Differential Revision: D47873841 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106210 Approved by: https://github.com/eellison	2023-07-28 22:31:54 +00:00
Edward Z. Yang	884cd53e49	Unconditionally record when FakeTensorMode is allocated and report it on inconsistency (#105927 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105927 Approved by: https://github.com/albanD	2023-07-26 03:38:42 +00:00
Edward Z. Yang	4af9a914ab	Improve FakeTensor to work with mixed meta-cpu embedding bag arguments (#105924 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105924 Approved by: https://github.com/mikaylagawarecki, https://github.com/eellison	2023-07-26 01:19:08 +00:00
Edward Z. Yang	5403c7770c	Provide a refined upper bound for nonzero when original numel is static (#105843 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105843 Approved by: https://github.com/lezcano	2023-07-25 00:51:35 +00:00
Andrey Talman	c6653b65d8	Back out "Make adding buffers more like adding parameters (#104069 )" (#105581 ) Summary: D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/ with `TypeError: register_buffer() takes 3 positional arguments but 4 were given` Original commit changeset: d4b4069fbd38 Original Phabricator Diff: D47537831 Test Plan: ``` buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform ``` Reviewed By: atalman Differential Revision: D47600140 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581 Approved by: https://github.com/mikaylagawarecki	2023-07-20 03:39:53 +00:00
Wanchao Liang	cb23373264	[dynamo] allow tensor subclass fakification in dynamo (#105308 ) This PR adds the necessary plumbing through torchdynamo to allow tensor subclasses with a certain contract (i.e. with `__tensor_flatten__` and `__tensor_unflatten__`) to go through the dynamo fakification pass by fakifying the tensor subclass's internal components. Some of the tensor subclass contract logic is mostly borrowed from https://github.com/pytorch/pytorch/pull/97540 Added some tests to verify that simply passing a tensor subclass (i.e. DTensor) through dynamo eager works as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105308 Approved by: https://github.com/ezyang	2023-07-18 17:28:04 +00:00

278 Commits