pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	1c3fae46ee	Revert "Support SingletonSymNode mul with coefficient (#110369 )" This reverts commit `eb8feb8ff8`. Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))	2023-10-05 23:51:28 +00:00
soulitzer	eb8feb8ff8	Support SingletonSymNode mul with coefficient (#110369 ) We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions. Constraints: - [B, x, D] (where x represents the "variable-length dim") can be strided in two ways: [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided. - When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below. Design: Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e., morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369 Approved by: https://github.com/ezyang ghstack dependencies: #110044	2023-10-04 22:56:15 +00:00
Brian Hirsh	63526a63f5	Make FunctionalTensor subclass to be more like functorch (interaction with ZeroTensor + Conjugate key) (#109023 ) I added some tests for Conj, Neg and ZeroTensor for both python and C++ functionalization. This also fixes a nasty segfault when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`. Changes: (1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization will mirror all dispatch keys from the inner tensor to the wrapper, except for python and functorch keys). (2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization. (3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to python. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023 Approved by: https://github.com/zou3519 ghstack dependencies: #108654, #109662, #109632	2023-09-22 07:09:04 +00:00
soulitzer	5252fcb133	Handle constant SymBool in unary and binary operations (#109169 ) In this PR: - When Constant SymNode are detected in unary/binary ops demote them to plain int/bool before proceeding. Sometimes this means doing a unary op with a Constant SymNode would result in a plain bool. - Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See later PR in the stack to see how this is used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169 Approved by: https://github.com/ezyang	2023-09-20 20:37:15 +00:00
Brian Hirsh	f22b303f65	Add TorchDispatch version of functionalization (#106404 ) This PR adds a new `FunctionalTensor` subclass, and `FunctionalTensorMode` torch dispatch mode. Together, this class/mode are a lightweight wrapper around our existing C++ functionalization logic. This idea came from Ed - later in the stack, I want to be able to run functionalization underneath torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later. This PR provides the basic new classes, and some light testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404 Approved by: https://github.com/ezyang	2023-09-15 20:19:25 +00:00
soulitzer	8d863560bd	Allow adding extra dispatch keys to wrapper tensor subclass (#108808 ) Updated version of https://github.com/pytorch/pytorch/pull/108313 which has more review comments Pull Request resolved: https://github.com/pytorch/pytorch/pull/108808 Approved by: https://github.com/bdhirsh	2023-09-08 18:46:09 +00:00
Brian Hirsh	da54f3c519	reorder proxy / fake modes so they always run last (#104482 ) Update: Made refactor of the original PR. See the original description below, but here I'll describe the updates: (1) TLS changes in `TorchDispatchModeTLS.h/cpp`. I added a `TorchDispatchModeKey` enum, that (for now) just contains PROXY and FAKE. The ModeTLS used to just contain a `std::vector<std::shared_ptr<c10::SafePyObject>>` corresponding to the mode stack. It now also contains a separate array of "infra modes", indexed by mode key (PROXY and FAKE, with a new addition, FUNCTIONAL, coming later in the stack). `TorchDispatchModeTLS::push_onto_stack` and `TorchDispatchModeTLS::pop_stack` are now a bit more complicated. Pushing accepts an optional mode_key, which if set, tells us to add the given mode directly to our "infra_modes" array. Popping will first check the "user mode" stack, before trying to pop anything from the infra mode stack. It also optionally returns the mode key of the mode we popped if there was one - that way if we push that same mode back onto the TLS later, we know where it goes. `TorchDispatchModeTLS::dispatch_mode_enabled()` now accepts an optional `skip_infra_modes` param, so you can separately query if there are "any modes at all", or if there are "any user modes". `TorchDispatchModeTLS::get/set/unset_mode()` all take in a mode key, and get/set/unset the mode at that particular mode key (meaning they are only meant to be used for infra modes). There were also some mild codegen changes to support the new enum (2) `fake_tensor.py/proxy_tensor.py/_python_dispatch.py` The way I tell the infra that certain subclasses/modes are "infra" is through the enum: I gave `FakeTensor` and `FakeTensorMode` a `self._mode_key = torch._C.TorchDispatchModeKey.FAKE`. 
`TorchDispatchMode.__enter/exit__()` (in `_python_dispatch.py`) now check if the current mode has a mode key, and if so they plumb it into any `push_onto_stack()` calls (which eventually instructs `TorchDispatchModeTLS` where to put the mode). Same thing for `ProxyTorchDispatchMode`. I also had to change both of these modes' enter/exit, to handle the fact that there can no longer be multiple proxy/fake modes on the mode stack at once. I updated them both to have a `self.enter_stack: List[Optional[TorchDispatchMode]]` - whenever we push a given mode in `__enter__`, we remove the current ambient fake/proxy mode from the mode stack, and save it in `enter_stack`, so that on exit we can reset the state properly. (3) dispatching logic in `python_arg_parser.cpp` This is where the core dispatching logic changes are. I added two helpers, `dispatch_on_subclass()` and `dispatch_on_mode()`. The overall dispatching order is now: ``` (a) dispatch_on_mode() # try user modes first (where the mode stack automatically considers infra modes last) (b) dispatch_on_subclass() # try user subclasses next (skipping infra subclasses) (c) dispatch_on_subclass() # try infra subclasses next (skipping user subclasses) ``` Note that we still want "user subclasses" to run before "infra modes". As Ed helped me realize, this will work today: If proxy/fake modes run in step (a), they'll return NotImplemented if they see a user subclass, allowing us to redispatch to the user subclass. How do (b) and (c) distinguish between user and infra subclasses? Infra subclasses (FakeTensor, and later FunctionalTensor) are required to have a `_mode_key` hidden on the subclass - so we filter via arguments that do/don't have the _mode_key. (4) I also changed `DoubleTensor` to `TwoTensor` to minimize confusion (@albanD pointed out that DoubleTensor would be easily confused with `torch.FloatTensor` and friends). 
----- original description below ----- The main purpose of this PR is to fix the "ordering problem" between torch_dispatch modes, where we want to ensure that our Fake and Proxy dispatch modes always run after any dispatch modes created by the user, regardless of where they are in the stack. See this doc for more details: https://docs.google.com/document/d/1COQ291nOZvtFnzGTQMJqoYZ3sttEYFw_7HbfSyL8gcA/edit Full set of changes below. I ended up including a few semi-related changes in this PR that I documented - but if folks would rather I separate them out, happy to try to do that. (1) Add dedicated TLS slots for FakeTensorMode and ProxyTensorMode This is the main component of this PR. There are two new slots, `TorchDispatchModeTLS.fake_mode_` and `TorchDispatchModeTLS.proxy_mode_`, which correspond to a single "global" fake and proxy mode. There is now an invariant that `torchDispatchModeState.stack_` can never contain either of these modes. I also added a `TorchDispatchModeTLS::maybe_highest_mode()` helper that consults the `stack_` as well as both the proxy and fake slots, and returns the highest priority mode - this is because there are a few places in the codebase where we legitimately want to get the highest priority mode, including fake or proxy, if one is set. This also made the implementations of the existing `disable_proxy_modes_tracing()` and `get_innermost_proxy_mode()` marginally simpler. (2) Updated the dispatching logic in handle_torch_function_no_python_arg_parser() This is the function that actually figures out which torch_dispatch implementation to call, given the current mode stack and tensor subclass inputs. This function got marginally more complicated as part of the refactor: First we inspect the mode stack and any non-fake subclass inputs. Then we check for the proxy mode slot. Then we check for the Fake mode slot, before finally checking for any fake subclass inputs. 
(3) new python `_get_fake_tensor_mode()` and `_get_proxy_tensor_mode()` API's Before, if you wanted to see if proxy or fake modes were active in python, you would have to consult the mode stack. Since these two modes are no longer part of the actual mode stack, I added two new API's to directly check if either proxy or fake modes are active. (4) Allow traceable tensor subclasses to access storages from python This is convenient later in the stack, where AOTAutograd needs to detect aliasing of inputs and outputs, where those inputs and outputs might be tensor subclasses. Previously, `x.untyped_storage()` would raise an error if `x` was a subclass. In this PR, I tried to relax this constraint as little as possible: `THPVariable_storage()` will only try to return a storage to python if the tensor subclass that you are passing in is "traceable" (5) Fixed subclass fakeification @wanchaol recently added support to be able to fakeify tensor subclasses. That fakeification logic works in most cases, but there is one case it doesn't handle: autograd metadata. In particular, since autograd sees our tensor subclasses and not their desugared tensors, we need to make sure that our fakeified subclass has the same autograd metadata as the original subclass. I updated `meta_utils.py` to make sure that the autograd metadata is correct. (6) make tensor subclasses resizeable Previously we didn't allow tensor subclasses to be resizeable. I ran into an issue where fakeifying a tensor subclass occasionally requires swapping out its storage, which can involve resizing the tensor. Mechanically, this required updating `at::for_blob()` to expose a way to request that the tensor that you create has resizeable storage, and then using this new API in `_make_wrapper_tensor()`. (7) Added a basic DoubleTensor subclass for testing I use this subclass more later in this stack in my AOTAutograd tests - but it serves as a simple subclass example to test the dispatch ordering in this PR. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104482 Approved by: https://github.com/ezyang ghstack dependencies: #107415	2023-08-29 02:36:48 +00:00
soulitzer	d7130e9704	Add SingletonSymIntNode (#107089 ) Adds `SingletonSymNodeImpl` (alternatively, `SkolemSymNodeImpl`). This is an int-like object that only allows the `eq` operation; any other operation produces an error. The main complexity is that we require operations that dispatch to SymNode must take and return SymNodes, but when performing operations involving `SingletonSymNodeImpl`, operations involving SymNode can return non-SymNode bools. For more discussion see [here](https://docs.google.com/document/d/18iqMdnHlUnvoTz4BveBbyWFi_tCRmFoqMFdBHKmCm_k/edit) - Introduce `ConstantSymNodeImpl`, a generalization of `LargeNegativeIntSymNodeImpl`, and replace usage of `LargeNegativeIntSymNodeImpl` in SymInt. - Also use ConstantSymNodeImpl to enable SymBool to store its data on a SymNode. Remove the assumption that if SymBool holds a non-null SymNode, it must be symbolic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107089 Approved by: https://github.com/ezyang ghstack dependencies: #107839	2023-08-24 21:38:47 +00:00
Mikayla Gawarecki	035124774a	Enable registering fallthroughs to (op, dk) from torch.library (#106086 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106086 Approved by: https://github.com/zou3519, https://github.com/albanD	2023-07-28 19:37:59 +00:00
cyy	646fa36875	Add const reference in opportunities detected by clang-tidy (#105931 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931 Approved by: https://github.com/Skylion007	2023-07-26 21:38:10 +00:00
Charlie West-Taylor	5eb7325bc7	Add autocast support for IPU (#103890 ) As part of this, a new `AutocastIPU` dispatch key has been added. There's an existing PR, #85043, to make `Autocast` a proper per-backend functionality key, but it ran into issues with layering with other functionality keys and went stale. This has been tested in the out-of-tree IPU PyTorch backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103890 Approved by: https://github.com/albanD	2023-06-22 15:38:45 +00:00
Brian Hirsh	c3c03e7cb8	Reland of https://github.com/pytorch/pytorch/pull/101818 (#103888 ) Original PR broke internal This reverts commit `5ed618132f`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103888 Approved by: https://github.com/albanD	2023-06-21 21:00:56 +00:00
PyTorch MergeBot	5ed618132f	Revert "change pre_autograd to pre_dispatch tracing (#101818 )" This reverts commit `b0392de2c3`. Reverted https://github.com/pytorch/pytorch/pull/101818 on behalf of https://github.com/izaitsevfb due to Breaks internal builds see D46629736 TypeError: wrap_key() got an unexpected keyword argument pre_autograd ([comment](https://github.com/pytorch/pytorch/pull/101818#issuecomment-1587837667))	2023-06-12 18:16:37 +00:00
Brian Hirsh	b0392de2c3	change pre_autograd to pre_dispatch tracing (#101818 ) We discussed in a composability meeting a few weeks ago that `pre_autograd` should probably be renamed to `pre_dispatch`. One question in this PR was: should I re-use a dispatch key? Or should I create a new dispatch key (that yet again corresponds to "top of the dispatcher")? ~~For now, I ended up sticking our proxy mode on the mode stack corresponding to `PythonTLSSnapshot`, because it was simple and it works. It looks like one of the functorch dispatch keys has higher priority though, so it's possible that functorch will end up running first. Open to options, but we can consider adding a new dispatch key later if that becomes a problem~~ Update: I added a dedicated dispatch key, `PreDispatch`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101818 Approved by: https://github.com/ezyang, https://github.com/Neilblaze, https://github.com/albanD, https://github.com/zou3519	2023-06-09 17:30:15 +00:00
Richard Zou	3897c479af	Add API to construct the functional variant of an op (#102293 ) `register_functional_op`: - constructs the functional variant of an op - registers a functionalization kernel to the op To get this to work: - `register_functional_op` makes assumptions that it checks about the op's schema. In particular, the op is not allowed to return anything it mutates. We can relax these constraints in the future. - We add a "boxed" python functionalization kernel that handles this case. I'm not actually sure (or convinced) this should be public API or how it should work. If we want this to be public, then it should probably be a torch.library API, but does that also mean we should give the same lifetime guarantees? If so, then it would be up to the user to construct a Library object to actually register the functional variant onto. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/102293 Approved by: https://github.com/bdhirsh	2023-06-02 13:36:50 +00:00
Richard Zou	fc31b3a106	Allow existing "Python RAII guards" to be used as context managers (#102579 ) This PR adds a `py_context_manager_DEPRECATED` that converts a C++ RAII guard to an object that may be either used as Python context manager or as a "Python RAII guard". We don't convert all of them to Python context manager only due to BC reasons; people in OSS and internally actually rely on these APIs and I don't want to break them. We are justified in breaking BC if we wanted to, but it seemed like too much work for not a lot of gain. The API is postfixed with "DEPRECATED" to indicate that people should really use `py_context_manager` (converts C++ RAII guard to Python context manager) instead. Test Plan: - this PR converts all PyTorch usages of _AutoDispatchBelowAutograd to context manager. I can do the rest in follow-ups. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102579 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2023-05-31 19:55:38 +00:00
Richard Zou	08fb648fe1	Add mechanism to turn any RAII guard into a Python Context Manager (#102037 ) This PR: - adds a mechanism to turn any RAII guard into a Python Context Manager - turns ExcludeDispatchKeyGuard into a context manager, and purges usages of the older torch._C.ExcludeDispatchKeyGuard from the codebase. The mechanism is that given a RAII guard, we construct a context manager object that holds an optional guard. When we enter the context manager we populate the guard, when we exit we reset it. We don't delete torch._C.ExcludeDispatchKeyGuard for BC reasons (people are using it in fbcode). If this code actually sticks (it is using C++17 and that worries me a bit), then I'll apply the change to other RAII guards we have, otherwise, we can write our own std::apply. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102037 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-05-24 14:20:52 +00:00
Richard Zou	6bc0f4a4ee	[reland][CustomOp] Add Dispatcher error callback (#101452 ) Reland of #101015, original stack reverted due to internal test flakiness. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101452 Approved by: https://github.com/soulitzer	2023-05-16 13:33:31 +00:00
PyTorch MergeBot	7912b34789	Revert "[CustomOp] Add Dispatcher error callback (#101015 )" This reverts commit `c0e5d7e7fe`. Reverted https://github.com/pytorch/pytorch/pull/101015 on behalf of https://github.com/huydhn due to Revert this as the earlier commits in the stack have been reverted ([comment](https://github.com/pytorch/pytorch/pull/101015#issuecomment-1548476583))	2023-05-15 19:49:53 +00:00
Richard Zou	c0e5d7e7fe	[CustomOp] Add Dispatcher error callback (#101015 ) The PyTorch Dispatcher's "no kernel found for DispatchKey" error message is a bit long-winded. This PR adds a way to add a custom error callback and changes the CustomOp API to use the custom error callback to deliver better error messages. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/101015 Approved by: https://github.com/ezyang	2023-05-12 13:49:20 +00:00
Brian Hirsh	62fad315c1	fix per-dispatchkey-mode caching bug (#98030 ) The bug was that: if you want to move a mode to the autograd key, we need to use the "functionality" key for it (AutogradFunctionality). But when we do that, we need to clear any PythonDispatcher caches for every op for every autograd key (since you could run autograd ops with both cpu and cuda tensors underneath the mode, which both may have been cached). I didn't add a test, since this ends up getting indirectly tested by export in the PR. If someone would prefer a direct test I can add one. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98030 Approved by: https://github.com/ezyang	2023-04-25 21:58:14 +00:00
cyy	dbc7e919b8	add Wmissing-prototypes to clang-tidy (#96805 ) This PR introduces -Wmissing-prototypes of clang-tidy to prevent further coding errors such as the one fixed by PR #96714. Generated by Copilot at fd2cf2a: This pull request makes several internal functions static to improve performance and avoid name clashes. It also fixes some typos, formatting, and missing includes in various files. It adds a new .clang-tidy check to warn about missing prototypes for non-static functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96805 Approved by: https://github.com/malfet, https://github.com/albanD	2023-04-25 18:20:36 +00:00
shibo	da322ea874	Enable torch.jit.load for custom device (#99535 ) Fixes #ISSUE_NUMBER 1. torch.jit.load for custom device ``` # custom device named `foo` ts_model = torch.jit.script(mode.to(device="foo")) ts_model.save("./ts.pt") # it is a script model on device `foo` # and then we want to load it and run it torch.jit.load("./ts.pt") ``` 2. Add some extra keys for custom device with `privateuse1` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99535 Approved by: https://github.com/albanD	2023-04-20 05:37:57 +00:00
Richard Zou	44b09bf673	Reland "Simple Custom Operator API, V0 (#98440 )" (#99416 ) See the original PR (#98440) for the description. It broke internal builds due to proxy_tensor.py not importing torch._dynamo, which is being fixed in the previous PR in the stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99416 Approved by: https://github.com/soulitzer, https://github.com/bdhirsh	2023-04-18 23:48:33 +00:00
PyTorch MergeBot	f497031df9	Revert "Simple Custom Operator API, V0 (#98440 )" This reverts commit `0157b2d722`. Reverted https://github.com/pytorch/pytorch/pull/98440 on behalf of https://github.com/DanilBaibak due to Break internal build	2023-04-18 13:04:27 +00:00
Richard Zou	0157b2d722	Simple Custom Operator API, V0 (#98440 ) This PR introduces CustomOp, a wrapper around a dispatcher operator that allows users to define custom operators. It adds the skeleton for CustomOp and some very simple behavior: as of this PR: - one can create a CustomOp for an operator that does not have inplace or aliasing - give it CPU/CUDA and Meta implementations - and trace it into a graph via make_fx. The design follows https://docs.google.com/document/d/19Uc5OUCA187q9BZggJb70RT2ZoSTDoG5QQkJkZwd25M/edit Concretely, we implement the following things mentioned in the doc in this PR: - Entrypoint 1 (CustomOp.define, creating a new custom operator) - impl (to define device-specific code) and impl_meta (to define meta formulas) The goal for the short term is to get the code to a state where it can be trialed by the export folks. On top of this PR, the blockers are: - adding Entrypoint 3 (CustomOp.from_existing) - adding a way to do data-dependent shape formulas These will come in future PRs since this one is getting long. Things that will come in the longer-near-term (before 2.1): - adding the other entrypoints mentioned in the doc (2 & 3) - more safety checks and better error messages - support for views and mutation - support for defining autograd formulas - support for functionalization - making this API public (it's private right now). Test Plan: - added a new test case, TestCustomOp. It mostly tests a bunch of error cases. - added OpInfos for custom operators and hooked these up to test_proxy_tensor to test that they work with make_fx. These custom operators were based off of the ones in the autograd_function_db. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98440 Approved by: https://github.com/ezyang	2023-04-17 12:17:32 +00:00
Richard Zou	f21a176c03	Python Dispatcher should respect FuncTorchBatchedDecomposition key (#98328 ) Fixes https://github.com/pytorch/pytorch/issues/97425. Python Dispatcher's resolve_key function should be equivalent to computeDispatchTableEntryWithDebug. We added a section to computeDispatchTableEntryWithDebug but forgot to add it to resolve_key. This PR fixes that discrepancy. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/98328 Approved by: https://github.com/Chillee, https://github.com/kshitij12345, https://github.com/Neilblaze	2023-04-05 20:32:53 +00:00
Brian Hirsh	af440c427b	[draft for discussion] add per-dispatch key modes (#97052 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97052 Approved by: https://github.com/ezyang, https://github.com/zou3519	2023-03-21 23:45:45 +00:00
Edward Z. Yang	6a675f7cac	Correctly resolve dispatch keys for PyOperator (#96306 ) Previously, we never actually used resolve_key, which meant that you had to register CPU/CUDA/etc all manually; none of the alias keys worked. Now they work. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96306 Approved by: https://github.com/Skylion007, https://github.com/zou3519	2023-03-09 22:16:31 +00:00
Edward Z. Yang	32ffd70644	Rewrite fallthrough to more closely match how C++ works (#96304 ) Fallthrough is modeled as a mask which we use to remove keys from the compute dispatch key set for eligibility. It's possible this addresses https://github.com/pytorch/pytorch/issues/89037 in a better way than https://github.com/pytorch/pytorch/pull/95891 but I cannot easily tell as the original repro no longer works and the new PR does not have a test. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/96304 Approved by: https://github.com/zou3519, https://github.com/albanD, https://github.com/zhxchen17	2023-03-08 23:00:26 +00:00
cyy	1a32db15e7	Some performance fixes (#94034 ) Applies some performance fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/94034 Approved by: https://github.com/Skylion007	2023-02-04 02:17:48 +00:00
Aaron Gokaslan	0247ed27cc	Apply Clang-Tidy readability-container-size-empty (#93236 ) Not only is this change usually shorter and more readable, it also can yield better performance. size() is not always a constant time operation (such as on LinkedLists), but empty() always is. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236 Approved by: https://github.com/malfet	2023-01-29 23:28:19 +00:00
Kurt Mohler	4d9920fa9c	Move PyInterpreter code in `python_variable.cpp` to its own files (#92647 ) Part of #91395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92647 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-01-24 23:08:23 +00:00
Kurt Mohler	3a0053abd6	Move `PyObject` code out of `TensorImpl` into new `PyObjectSlot` class (#92169 ) Redo of PR #92099 Part of #91395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92169 Approved by: https://github.com/albanD	2023-01-14 02:55:32 +00:00
vasiliy	d19791e4cd	add autocast keys to pybind11 DispatchKey object (#90821 ) Summary: This is useful for debugging what autocast is doing when it's running on top of torchdynamo; without this, the Python dispatch key for autocast prints as `???`. Test Plan: ``` import torch; dir(torch._C.DispatchKey)  # the autocast keys show up now ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/90821 Approved by: https://github.com/ezyang	2022-12-15 00:15:07 +00:00
Edward Z. Yang	5266953443	Add crossref debug mode for functionalization, catches stride errors (#89498 ) The idea is to add a custom handler to Functionalize key in Python dispatcher that runs the functionalized version along side a non functionalized version, and checks that their outputs agree in the end. (Technically, for metadata mutation we should also check the inputs, but for now we're relying on those functions returning self.) I turned this on for test_functionalize.py (new TestCrossRefFunctionalize) and found a bunch of failures that look legit. This probably doesn't interact that nicely if you're also tracing at the same time, probably need more special logic for that (directly, just disabling tracing for when we create the nested fake tensor mode, but IDK if there's a more principled way to organize this.) There are some misc fixups which I can split if people really want. - xfail_inherited_tests moved to test common_utils - Bindings for _dispatch_tls_set_dispatch_key_included, _dispatch_tls_is_dispatch_key_included and _functionalization_reapply_views_tls - Type stubs for _enable_functionalization, _disable_functionalization - all_known_overloads utility to let you iterate over all OpOverloads in all namespaces. Iterator support on all torch._ops objects to let you iterate over their members. - suspend_functionalization lets you temporarily disable functionalization mode in a context - check_metadata_matches for easily comparing outputs of functions and see if they match (TODO: there are a few copies of this logic, consolidate!) - _fmt for easily printing the metadata of a tensor without its data - _uncache_dispatch for removing a particular dispatch key from the cache, so that we force it to regenerate - check_significant_strides new kwarg only_cuda to let you also do stride test even when inputs are not CUDA - Functionalize in torch._C.DispatchKey Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89498 Approved by: https://github.com/malfet	2022-11-23 04:18:25 +00:00
Edward Z. Yang	57ed94804e	Bind DispatchKey.Functionalize in pybind11 (#89452 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89452 Approved by: https://github.com/albanD, https://github.com/bdhirsh	2022-11-22 00:32:30 +00:00
zhxchen17	c3938bb97a	[functorch] introduce an experimental map() op. (#88767 ) Summary: We want to introduce an experimental control flow op: map() to export some models as FX graphs correctly. Some clarification on basic requirements we have in mind: 1. This op can nest cond() and other control flow primitives internally. 2. We don't necessarily need loop carried dependencies for the models we've seen. 3. This map() op can handle dynamically shaped tensor as input and return dynamically shaped output based on input shapes. 4. We should be able to pass through additional arguments to the loop body as extra arguments. In this diff we introduce a new control flow op `map()` which has the following semantics: ``` def map(f: Callable, xs: Tensor, args): # one possible implementation: return torch.stack([f(x, args) for x in xs]) ``` Test Plan: pytest functorch/test_control_flow.py CI Differential Revision: D41165796 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88767 Approved by: https://github.com/zou3519	2022-11-19 00:19:50 +00:00
Richard Zou	3bc327993f	PyDispatcher integration with functorch (#88785 ) This PR teaches PyDispatcher and PyOperator about functorch transforms. It is important that PyDispatcher/PyOperator dispatch with functorch transforms, because this is our plan for higher-order operators (operators that accept functions as arguments). Examples of these include: - functorch transforms over the existing cond operator (control flow) - autograd.Function support for functorch (which I am working towards), - AOTDispatcher (should be a higher order operator) Concretely, the problem with teaching PyDispatcher/PyOperator about functorch is that the stack-based dispatching logic (DynamicLayerStack) is hidden inside the fallbacks for two dispatch keys (DynamicLayer{Front, Back}). PyDispatcher doesn't know about C++ boxed fallbacks, our plan on record for that is that we need to reimplement all of them in Python (but can call helper functions in C++ to make our lives easier). Instead of exposing all of what DynamicLayer{Front, Back} do to python, this PR takes the approach of re-implementing part of the stack-based dispatching in Python. The motivation is that this is more sane and follows what the "ideal" implementation of functorch would have been: - each transform should be a "mode" - there should be no TLS dispatch key set hackery. functorch needs to do this hackery today to re-use VariableType implementations. This PR: - exposes the DynamicLayerStack to Python - The DynamicLayerStack is a stack of Interpreters. These get exposed to Python as well. - Interpreters can run operations (Interpreter.process) or lower them to the next interpreter in the stack (Interpreter.lower) - To use a PyOperator with functorch transforms, a developer needs to register a rule for each transform (vmap, grad, jvp, ...). - The PyOperator API is NOT user-facing. Things like autograd.Function support for functorch will end up going through the autograd.Function API. Question for reviewers: - Does this design make sense? - I'm trying to split up the "functorch support for autograd.Function" work into logical pieces. Would it be better if I didn't? (the full thing is a bit long - 1000-2000 LOC). Test Plan: - new tests that construct PyOperator and compose them with functorch transforms Pull Request resolved: https://github.com/pytorch/pytorch/pull/88785 Approved by: https://github.com/samdow, https://github.com/soulitzer	2022-11-16 00:46:59 +00:00
Edward Z. Yang	f884e817d4	Make Python op registration work with torchdeploy/multipy (#87162 ) See strategy at PythonOpRegistrationTrampoline.cpp for the big picture. Along the way, I made OperatorHandle support == and hashing, and slightly changed the low level python_dispatch impl API to disallow empty strings for dispatch key, which had the knock on effect of requiring us to explicitly make sure we pass in CompositeImplicitAutograd if we would have passed in "" (I didn't apply this to the rest of the file because I'm lazy.) Test strategy is we delete the logic for preventing Python op registrations in torch from being skipped in a torchdeploy context and show CI still works. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162 Approved by: https://github.com/anjali411, https://github.com/bdhirsh	2022-11-03 12:56:44 +00:00
Sherlock Huang	ab901b4817	Python binding for dispatcher getAllOpNames (#87422 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87422 Approved by: https://github.com/bdhirsh	2022-10-21 06:55:10 +00:00
Richard Zou	cd32a86bf2	Stop monkeypatching Tensor.backward() on `import functorch` (#85152 ) Monkeypatching is bad, we should never be doing it. This PR removes functorch's monkeypatching on Tensor.backward() by adding it directly to the implementation of Tensor.backward(). As an alternative, we could have done an `import functorch` and used `functorch._C.are_transforms_active` directly in `torch/autograd/__init__.py`. The problem with that is that it runs into a bunch of circular imports. NB: https://github.com/pytorch/pytorch/issues/72179 is still on my mind. I didn't choose to do it right now because: - This PR doesn't make the situation worse than it already is (no monkeypatching is better than having the monkeypatch) - We don't have a design for #72179 yet. Test Plan: - tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/85152 Approved by: https://github.com/soulitzer	2022-09-19 17:06:15 +00:00
Michael Voznesensky	8ca1839d32	Python Dispatcher integration with C++ dispatcher (#85050 ) #84826 but without ghstack Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050 Approved by: https://github.com/malfet	2022-09-15 00:43:36 +00:00
PyTorch MergeBot	706b990306	Revert "Python Dispatcher integration with C++ dispatcher (#84826 )" This reverts commit `35f6a69191`. Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see `35f6a69191`	2022-09-14 14:07:58 +00:00
Michael Voznesensky	35f6a69191	Python Dispatcher integration with C++ dispatcher (#84826 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> From @ezyang's original PR: There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interposed over ordinary C++ dispatch. The ingredients: We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key set, and shunting to a Python implementation. The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch. I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful. I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826 Approved by: https://github.com/ezyang	2022-09-14 06:57:19 +00:00
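The fallback rule at the end of #84826 (calling at a dispatch key with no kernel for the operator consults the backend fallback table instead) amounts to a two-level lookup. A minimal sketch, assuming plain dictionaries for both tables; `resolve_kernel` and the table contents are hypothetical and do not correspond to any binding added by the PR.

```python
def resolve_kernel(op_kernels, backend_fallbacks, key):
    """Pick the callable to run for `key`.

    op_kernels: per-operator table, dispatch key name -> kernel
    backend_fallbacks: global table, dispatch key name -> fallback kernel
    """
    if key in op_kernels:
        return op_kernels[key]  # the operator has its own kernel here
    if key in backend_fallbacks:
        return backend_fallbacks[key]  # otherwise use the key's fallback
    raise NotImplementedError(f"no kernel or backend fallback for {key}")


# Toy tables; the key names are only labels in this sketch.
add_kernels = {"CPU": lambda a, b: a + b}
fallbacks = {"Logging": lambda *args: ("fallback", args)}

direct = resolve_kernel(add_kernels, fallbacks, "CPU")(1, 2)
via_fallback = resolve_kernel(add_kernels, fallbacks, "Logging")(1, 2)
```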
Michael Voznesensky	ced2ca8f86	Torch cond operator, python dispatch, pyoperator (#83154 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/83154 Approved by: https://github.com/ezyang	2022-08-25 20:11:53 +00:00
Brian Hirsh	1a51efd8bb	dispatch API for checking computed table, use it in prim decomps (#82358 ) Fixes https://github.com/pytorch/pytorch/issues/82331 Expose a `torch._C._dispatch_has_computed_kernel_for_dispatch_key` to check if an operator has a kernel registered to the given dispatch key in the computed table. Use it in the prim registration logic, making it more accurate and robust (so that it e.g. picks up `CompositeExplicitAutograd` kernels). It looks like before this change we'd register 134 prim ops to the meta key, and after we only register 62. So that's 72 ops that now use an existing C++ decomp to get meta working, instead of going directly through the prim decomp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82358 Approved by: https://github.com/ezyang	2022-08-10 23:42:02 +00:00
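Why checking the computed table changes the count in #82358 can be shown with a toy model. This sketch assumes a heavily simplified dispatcher in which a kernel registered to an alias key fills in every backend key; the real computed-table precedence has more cases. `has_direct_kernel`, `has_computed_kernel`, `ops_needing_prim_meta` and the operator names are hypothetical.

```python
# Simplifying assumption: these alias keys cover every backend key.
ALIAS_KEYS = ("CompositeExplicitAutograd", "CompositeImplicitAutograd")


def has_direct_kernel(registrations, key):
    # Only sees kernels registered to exactly this key.
    return key in registrations


def has_computed_kernel(registrations, key):
    # Also counts kernels that reach this key through an alias key.
    return key in registrations or any(a in registrations for a in ALIAS_KEYS)


def ops_needing_prim_meta(all_ops, check):
    # Operators with no Meta kernel according to `check` would get the
    # prim decomposition registered to the Meta key.
    return sorted(name for name, regs in all_ops.items() if not check(regs, "Meta"))


toy_ops = {
    "op_a": {"CPU"},
    "op_b": {"CompositeExplicitAutograd"},
    "op_c": {"Meta"},
}

before = ops_needing_prim_meta(toy_ops, has_direct_kernel)
after = ops_needing_prim_meta(toy_ops, has_computed_kernel)
```

In this toy, `op_b` drops out of the list once alias kernels are taken into account, which is the kind of reduction the commit reports (134 registrations down to 62).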
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )"" (#82599 ) This reverts commit `532b8a9e00`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )" This reverts commit `9465c0e0b5`. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552 ) We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
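The two linter features described in #82552 can be sketched as a small function. This is a hedged illustration of the described behaviour only: `grep_lint` and its parameters are hypothetical names, not the interface of the grep linter in the PyTorch repository, and the patterns are examples.

```python
import re


def grep_lint(lines, pattern, allowlist_pattern=None, match_first_only=False):
    """Return the 1-based numbers of lines that match `pattern`.

    allowlist_pattern: if any line matches it, the lint is turned off for
        the whole file (the required include is already present).
    match_first_only: report only the first offending line, so a fixer
        makes at most one edit per file.
    """
    if allowlist_pattern is not None and any(
        re.search(allowlist_pattern, line) for line in lines
    ):
        return []
    hits = []
    for number, line in enumerate(lines, start=1):
        if re.search(pattern, line):
            hits.append(number)
            if match_first_only:
                break
    return hits


PYBIND = r"#include <pybind11/"
UTIL = r"#include <torch/csrc/util/pybind\.h>"

bad_file = [
    "#include <pybind11/pybind11.h>",
    "#include <pybind11/stl.h>",
    "int x;",
]
good_file = ["#include <torch/csrc/util/pybind.h>"] + bad_file

all_hits = grep_lint(bad_file, PYBIND, allowlist_pattern=UTIL)
first_hit = grep_lint(bad_file, PYBIND, allowlist_pattern=UTIL, match_first_only=True)
no_hits = grep_lint(good_file, PYBIND, allowlist_pattern=UTIL)
```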


87 Commits