Summary:
FBGEMM uses `self.iter.is_cuda` to check if the tensor is for CUDA. This diff enables similar feature `self.iter.is_mtia` for tensors with MTIA device key.
Test Plan: See diff D48693225
Reviewed By: jackm321
Differential Revision: D48809191
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108310
Approved by: https://github.com/albanD
**Update:** Made refactor of the original PR. See the original description below, but here I'll describe the updates:
(1) TLS changes in `TorchDispatchModeTLS.h/cpp`.
I added a `TorchDispatchModeKey` enum, that (for now) just contains PROXY and FAKE. The ModeTLS used to just contain a `std::vector<std::shared_ptr<c10::SafePyObject>>` corresponding to the mode stack. It now **also** contains a separate array of "infra modes", indexed by mode key (PROXY and FAKE, with a new addition, FUNCTIONAL, coming later in the stack).
`TorchDispatchModeTLS::push_onto_stack` and `TorchDispatchModeTLS::pop_stack` are now a bit more complicated. Pushing accepts an optional mode_key, which if set, tells us to add the given mode directly to our "infra_modes" array. Popping will first check the "user mode" stack, before trying to pop anything from the infra mode stack. It also optionally returns the mode key of the mode we popped if there was one - that way if we push that same mode back onto the TLS later, we know where it goes.
`TorchDispatchModeTLS::dispatch_mode_enabled()` now accepts an optional `skip_infra_modes` param, so you can separately query if there are "any modes at all", or if there are "any user modes".
`TorchDispatchModeTLS::get/set/unset_mode()` all take in a mode key, and get/set/unset the mode at that particular mode key (meaning they are only meant to be used for infra modes).
There were also some mild codegen changes to support the new enum
(2) `fake_tensor.py/proxy_tensor.py/_python_dispatch.py`
The way I tell the infra that certain subclasses/modes are "infra" is through the enum: I gave `FakeTensor` and `FakeTensorMode` a `self._mode_key = torch._C.TorchDispatchModeKey.FAKE`. `TorchDispatchMode.__enter/exit__()` (in `_python_dispatch.py` now check if the current mode has a mode key, and if so they plumb it into any `push_onto_stack()` calls (which eventually instructs `TorchDispatchModeTLS` where to put the mode). Same thing for `ProxyTorchDispatchMode`.
I also had to change both of these mode's enter/exit, to handle the fact that there can no longer be multiple proxy/fake modes on the mode stack at once. I updated them both to have a `self.enter_stack: List[Optional[TorchDispatchMode]]` - whenever we push a given mode in `__enter__`, we remove the current ambient fake/proxy mode from the mode stack, and save it in `enter_stack`, so that on exit we can reset the state properly.
(2) dispatching logic in `python_arg_parser.cpp`
This is where the core dispatching logic changes are. I added two helpers, `dispatch_on_subclass()` and `dispatch_on_mode()`. The overall dispatching order is now:
```
(a) dispatch_on_mode() # try user modes first (where the mode stack automatically considers infra modes last)
(b) dispatch_on_subclass() # try user subclasses next (skipping infra subclasses)
(c) dispatch_on_subclass() # try infra subclasses next (skipping user subclasses)
```
Note that we still want "user subclasses" to run before "infra modes". As Ed helped me realize, this will work today: If proxy/fake modes in step 1, they'll return NotImplemented if they see a user subclass, allowing us to redispatch to the user subclass.
How do (b) and (c) distinguish between user and infra subclasses? Infra subclasses (FakeTensor, and later FunctionalTensor) are required to have a `_mode_key` hidden on the subclass - so we filter via arguments that do/don't have the _mode_key.
(3) I also changed `DoubleTensor` to `TwoTensor` to minimize confusion (@albanD pointed out that DoubleTensor would be easily confused with `torch.FloatTensor` and friends).
----- original description below -----
The main purpose of this PR is to fix the "ordering problem" between torch_dispatch modes, where we want to ensure that our Fake and Proxy dispatch modes always run **after** any dispatch modes created by the user, regardless of where they are in the stack. See this doc for more details: https://docs.google.com/document/d/1COQ291nOZvtFnzGTQMJqoYZ3sttEYFw_7HbfSyL8gcA/edit
Full set of changes below. I ended up including a few semi-related changes in this PR that I documented - but if folks would rather I separate them out, happy to try to do that.
**(1) Add dedicated TLS slots for FakeTensorMode and ProxyTensorMode**
This is the main component of this PR. There are two new slots, `TorchDispatchModeTLS.fake_mode_` and `TorchDispatchModeTLS.proxy_mode_`, which correspond to a single "global" fake and proxy mode. There is now an invariant that `torchDispatchModeState.stack_` can never contain either of these modes.
I also added a `TorchDispatchModeTLS::maybe_highest_mode()` helper that consults the `stack_` as well as both the proxy and fake slots, and returns the highest priority mode - this is because there are a few places in the codebase where we legitimately want to get the highest priority mode, *including* fake or proxy, if one is set.
This also made the implementations of the existing `disable_proxy_modes_tracing()` and `get_innermost_proxy_mode()` marginally simpler.
**(2) Updated the dispatching logic in handle_torch_function_no_python_arg_parser()**
This is the function that actually figures out which torch_dispatch implementation to call, given the current mode stack and tensor subclass inputs. This function got marginally more complicated as part of the refactor: First we inspect the mode stack and any non-fake subclass inputs. Then we check for the proxy mode slot. Then we check for the Fake mode slot, before finally checking for any fake subclass inputs.
**(3) new python `_get_fake_tensor_mode()` and `_get_proxy_tensor_mode()` API's**
Before, if you wanted to see if proxy or fake modes were active in python, you would have to consult the mode stack. Since these two modes are no longer part of the actual mode stack, I added two new API's to directly check if either proxy or fake modes are active.
**(4) Allow traceable tensor subclasses to access storages from python**
This is convenient later in the stack, where AOTAutograd needs to detect aliasing of inputs and outputs, where those inputs and outputs might be tensor subclasses. Previously, `x.untyped_storage()` would raise an error if `x` was a subclass. In this PR, I tried to relax this constraint as little as possible: `THPVariable_storage()` will only try to return a storage to python if the tensor subclass that you are passing in is "traceable"
**(5) Fixed subclass fakeification**
@wanchaol recently added support to be able to fakeify tensor subclasses. That fakeification logic works in most cases, but there is one case it doesn't handle: autograd metadata. In particular, since autograd sees our tensor subclasses and not their desugared tensors, we need to make sure that our fakeified subclass has the same autograd metadata as the original subclass. I updated `meta_utils.py` to make sure that the autograd metadata is correct.
**(6) make tensor subclasses resizeable**
Previously we didn't allow tensor subclasses to be resizeable. I ran into an issue where fakeifying a tensor subclass occasionally requires swapping out its storage, which can involve resizing the tensor. Mechanically, this required updating `at::for_blob()` to expose a way to request that the tensor that you create has resizeable storage, and then using this new API in `_make_wrapper_tensor()`.
**(7) Added a basic DoubleTensor subclass for testing**
I use this subclass more later in this stack in my AOTAutograd tests - but it serves as a simple subclass example to test the dispatch ordering in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104482
Approved by: https://github.com/ezyang
ghstack dependencies: #107415
Resolves https://github.com/pytorch/pytorch/issues/107097
After this PR, instead of
```python
torch.sparse_coo_tensor(indices, values, size)._coalesced_(is_coalesced)
```
(that does not work in the autograd context, see #107097), use
```python
torch.sparse_coo_tensor(indices, values, size, is_coalesced=is_coalesced)
```
All sparse coo factory functions that take indices as input support the `is_coalesced` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107638
Approved by: https://github.com/cpuhrsch
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings.
I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Summary:
As suggested in #105230, mypy checking is enabled in `torch/_inductor/lowering.py`.
23 errors fixed; 6 silenced with `# type: ignore[attr-defined]`.
Test Plan:
Before the fix:
```
$ mypy torch/_inductor/lowering.py
torch/_inductor/lowering.py:139:16: error: "Symbol" has no attribute "is_integer" [attr-defined]
torch/_inductor/lowering.py:263:20: error: Incompatible types in assignment (expression has type "Union[List[Any], Tuple[Any, ...]]", variable has type "List[Any]") [assignment]
torch/_inductor/lowering.py:427:49: error: "IRNode" has no attribute "get_size" [attr-defined]
torch/_inductor/lowering.py:439:37: error: "IRNode" has no attribute "get_dtype" [attr-defined]
torch/_inductor/lowering.py:456:34: error: "IRNode" has no attribute "get_device" [attr-defined]
torch/_inductor/lowering.py:645:44: error: Need type annotation for "b" [var-annotated]
torch/_inductor/lowering.py:1321:12: error: "FakeTensor" has no attribute "is_cpu" [attr-defined]
torch/_inductor/lowering.py:1542:24: error: Argument 3 to "FixedLayout" has incompatible type "List[int]"; expected "List[Expr]" [arg-type]
torch/_inductor/lowering.py:1542:81: error: Argument "offset" to "FixedLayout" has incompatible type "int"; expected "Expr" [arg-type]
torch/_inductor/lowering.py:1571:24: error: Argument 3 to "FixedLayout" has incompatible type "List[int]"; expected "List[Expr]" [arg-type]
torch/_inductor/lowering.py:1571:81: error: Argument "offset" to "FixedLayout" has incompatible type "int"; expected "Expr" [arg-type]
torch/_inductor/lowering.py:1654:12: error: Incompatible types in assignment (expression has type "List[Any]", variable has type "Tuple[Any, ...]") [assignment]
torch/_inductor/lowering.py:2009:9: error: Need type annotation for "ranges" (hint: "ranges: List[<type>] = ...") [var-annotated]
torch/_inductor/lowering.py:2151:16: error: Incompatible types in assignment (expression has type "List[Any]", variable has type "Tuple[Any, ...]") [assignment]
torch/_inductor/lowering.py:2198:43: error: Item "type" of "Union[List[Any], type]" has no attribute "__iter__" (not iterable) [union-attr]
torch/_inductor/lowering.py:2229:36: error: Argument 1 to "len" has incompatible type "Union[List[Any], type]"; expected "Sized" [arg-type]
torch/_inductor/lowering.py:2231:38: error: Item "type" of "Union[List[Any], type]" has no attribute "__iter__" (not iterable) [union-attr]
torch/_inductor/lowering.py:2233:35: error: Item "type" of "Union[List[Any], type]" has no attribute "__iter__" (not iterable) [union-attr]
torch/_inductor/lowering.py:2569:54: error: Incompatible default for argument "reduce" (default has type "None", argument has type "str") [assignment]
torch/_inductor/lowering.py:2569:54: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
torch/_inductor/lowering.py:2569:54: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
torch/_inductor/lowering.py:2586:59: error: Incompatible default for argument "reduce" (default has type "None", argument has type "str") [assignment]
torch/_inductor/lowering.py:2586:59: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
torch/_inductor/lowering.py:2586:59: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
torch/_inductor/lowering.py:2720:65: error: Incompatible default for argument "scales_x" (default has type "None", argument has type "Tuple[float]") [assignment]
torch/_inductor/lowering.py:2720:65: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
torch/_inductor/lowering.py:2720:65: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
torch/_inductor/lowering.py:2735:5: error: Name "scale" already defined on line 2731 [no-redef]
torch/_inductor/lowering.py:2758:47: error: Argument 3 to "upsample_nearestnd" has incompatible type "Tuple[Optional[float]]"; expected "Tuple[float]" [arg-type]
torch/_inductor/lowering.py:2765:47: error: Argument 3 to "upsample_nearestnd" has incompatible type "Tuple[Optional[float], Optional[float]]"; expected "Tuple[float]" [arg-type]
torch/_inductor/lowering.py:2776:47: error: Argument 3 to "upsample_nearestnd" has incompatible type "Tuple[Optional[float], Optional[float], Optional[float]]"; expected "Tuple[float]" [arg-type]
torch/_inductor/lowering.py:2949:13: error: No binding for nonlocal "grad" found [misc]
torch/_inductor/lowering.py:3063:49: error: Argument 2 to "range_mask_low" has incompatible type "int"; expected "Expr" [arg-type]
torch/_inductor/lowering.py:3271:48: error: "IRNode" has no attribute "data" [attr-defined]
torch/_inductor/lowering.py:3272:16: error: "IRNode" has no attribute "data" [attr-defined]
Found 29 errors in 1 file (checked 1 source file)
```
After the fix:
```
$ mypy torch/_inductor/lowering.py
Success: no issues found in 1 source file
```
Reviewers: @eellison
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105317
Approved by: https://github.com/eellison
Fixes#102768
- Provides proper function declarations in generated `torch/nn/functional.pyi`.
- Moves some functions from manually defined in `functional.pyi.in` to generated code, in order to single-source the signature.
- Includes some of the functions in `torch._C._nn` into its `.pyi.in`, but not exhaustive (only what's already there).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102918
Approved by: https://github.com/drisspg, https://github.com/malfet
Fixes#101862
No more type errors and improved return type value:
```python
import torch
from torch import nn
t = torch.tensor([1, 2, 3], dtype=torch.float32)
t2 = torch.Tensor._make_subclass( # OK
nn.Parameter,
t.data,
)
reveal_type(t2) # Type of "t2" is "Parameter"
t3 = t._make_subclass( # OK
nn.Parameter,
t.data,
)
reveal_type(t3) # Type of "t3" is "Parameter"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101961
Approved by: https://github.com/albanD
Implements a simple content-addressable store for storages (with tensors implemented as cheap references on top), enabling incremental serialization of tensors to disk, which I intend to use in the accuracy repro extractor. Check the comment at the top of torch/utils/_content_store.py for more details on the intended use case.
One major piece of this PR is implementing the content hash for tensors. For our prospective use case, we may need to repeatedly hash up to 80 GB of tensor data every time we snapshot (and we may snapshot multiple times). Using a conventional cryptographic hash and hashing each snapshot would likely take on order of minutes, which seemed too slow to me. So instead, I implemented a crappy hash function that can be run on GPU. It is at least somewhat theoretically grounded: using random parameters generated by Philox, we use the standard shift-multiply and xor sum universal hash family. The hash function is a bit dorky though; instead of properly doing 160-bit math, it just runs 32-bit hash five times and cats them together. By the way, this sets the first precedent for kernel in PyTorch library which MUST be torch.compile'd to be run (in fact, this kernel does not run in eager mode because of the use of xor_sum, which doesn't actually exist in ATen.)
I had to add a few more primitives to inductor, namely randint (over the entire int range) and xor_sum. Fortunately, these primitives are natively supported by Triton/C++, and so they were very easy to plumb through. xor_sum is exposed as a prim, while randint special cases on when low/high span the entire 32-bit signed integer range.
Thanks to Jeff Johnson for letting me bounce ideas of him on a Saturday morning lol.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99809
Approved by: https://github.com/voznesenskym
Changes:
1. Use class inheritance for `torch/return_types.pyi`:
Before:
```python
max = NamedTuple("max", [("values", Tensor), ("indices", Tensor)])
```
After:
```python
class max(NamedTuple):
values: Tensor
indices: Tensor
```
------
2. Add missing spaces in generated type annotations.
1. Always has a space after `,`.
2. If an argument is annotated, then there need spaces around `=` when it has a default value.
```diff
- def func(..., out: Optional[Tensor]=None, ...) -> Tensor:
+ def func(..., out: Optional[Tensor] = None, ...) -> Tensor:
```
3. If an argument is not annotated, then there should be no spaces around `=` when it has a default value.
```python
def contiguous(self, memory_format=torch.contiguous_format) -> Tensor: ...
```
------
3. ~Remove redundant import alias in `torch/nn/functional.pyi`:~ (Reverted)
UPDATE: `mypy` needs the alias to work.
Before:
```python
from .. import conv1d as conv1d
from .. import conv2d as conv2d
from .. import conv3d as conv3d
from .. import conv_transpose1d as conv_transpose1d
from .. import conv_transpose2d as conv_transpose2d
from .. import conv_transpose3d as conv_transpose3d
from .. import conv_tbc as conv_tbc
from .. import avg_pool1d as avg_pool1d
from .. import relu_ as relu_
from .. import selu_ as selu_
from .. import celu_ as celu_
from .. import rrelu_ as rrelu_
from .. import pixel_shuffle as pixel_shuffle
from .. import pixel_unshuffle as pixel_unshuffle
from .. import channel_shuffle as channel_shuffle
from .. import native_channel_shuffle as native_channel_shuffle
from .. import pdist as pdist
from .. import cosine_similarity as cosine_similarity
```
After:
```python
from .. import (
conv1d,
conv2d,
conv3d,
conv_transpose1d,
conv_transpose2d,
conv_transpose3d,
conv_tbc,
avg_pool1d,
relu_,
selu_,
celu_,
rrelu_,
pixel_shuffle,
pixel_unshuffle,
channel_shuffle,
native_channel_shuffle,
pdist,
cosine_similarity,
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95877
Approved by: https://github.com/ezyang
# Summary
- Adds type hinting support for SDPA
- Updates the documentation adding warnings and notes on the context manager
- Adds scaled_dot_product_attention to the non-linear activation function section of nn.functional docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94008
Approved by: https://github.com/cpuhrsch
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849 that merge was reverted.
The PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:
`torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.
`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
The PR fixes https://github.com/pytorch/pytorch/issues/90833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
This PR adds "check sparse tensor invariants" flag to Context that when enabled will trigger sparse tensor data invariants checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to UI:
- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
The PR also fixes https://github.com/pytorch/pytorch/issues/90833
# Main issue
*The following content is outdated after merging the PRs in this ghstack but kept for the record.*
The importance of this feature is that when enabling the invariants checks by default, say, via
<details>
```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:
# Populate magic methods on SymInt and SymFloat
import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```
</details>
a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:
```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"
RuntimeError: CUDA error: device-side assert triggered
RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.
RuntimeError: expected col_indices to be a strided and contiguous tensor
RuntimeError: expected row_indices to be a strided and contiguous tensor
RuntimeError: expected values to be a strided and contiguous tensor
RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered
RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
The idea is to add a custom handler to Functionalize key in Python
dispatcher that runs the functionalized version along side a non
functionalized version, and checks that their outputs agree in the
end. (Technically, for metadata mutation we should also check the
inputs, but for now we're relying on those functions returning self.)
I turned this on for test_functionalize.py (new TestCrossRefFunctionalize)
and found a bunch of failures that look legit.
This probably doesn't interact that nicely if you're also tracing at
the same time, probably need more special logic for that (directly,
just disabling tracing for when we create the nested fake tensor mode,
but IDK if there's a more principled way to organize this.)
There are some misc fixups which I can split if people really want.
- xfail_inherited_tests moved to test common_utils
- Bindings for _dispatch_tls_set_dispatch_key_included,
_dispatch_tls_is_dispatch_key_included and _functionalization_reapply_views_tls
- Type stubs for _enable_functionalization, _disable_functionalization
- all_known_overloads utility to let you iterate over all OpOverloads
in all namespaces. Iterator support on all torch._ops objects to let
you iterate over their members.
- suspend_functionalization lets you temporarily disable functionalization mode
in a context
- check_metadata_matches for easily comparing outputs of functions and see
if they match (TODO: there are a few copies of this logic, consolidate!)
- _fmt for easily printing the metadata of a tensor without its data
- _uncache_dispatch for removing a particular dispatch key from the cache,
so that we force it to regenerate
- check_significant_strides new kwarg only_cuda to let you also do stride
test even when inputs are not CUDA
- Functionalize in torch._C.DispatchKey
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89498
Approved by: https://github.com/malfet
This refactor was prompted by challenges handling mixed int/float
operations in C++. A previous version of this patch
added overloads for each permutation of int/float and was unwieldy
https://github.com/pytorch/pytorch/pull/87722/ This PR takes a different
approach.
The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode. This is type erased; we
no longer know statically at C++ if we have an int/float and have to test
it with the is_int()/is_float() virtual methods. This has a number of
knock on effects.
- We no longer have C++ classes to bind to Python. Instead, we take an
entirely new approach to our Python API, where we have a SymInt/SymFloat
class defined entirely in Python, which hold a SymNode (which corresponds
to the C++ SymNode). However, SymNode is not pybind11-bound; instead,
it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
when it goes into C++. This implies a userland rename.
In principle, it is also possible for the canonical implementation of SymNode
to be written in C++, and then bound to Python with pybind11 (we have
this code, although it is commented out.) However, I did not implement
this as we currently have no C++ implementations of SymNode.
Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
code needs to know how to find these classes. Currently, this is done
just by manually importing torch and getting the attributes.
- Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now
takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
__torch_dispatch__ works.
Some miscellaneous improvements:
- SymInt now has a constructor that takes SymNode. Note that this
constructor is ambiguous if you pass in a subclass of SymNode,
so an explicit downcast is necessary. This means toSymFloat/toSymInt
are no more. This is a mild optimization as it means rvalue reference
works automatically.
- We uniformly use the caster for c10::SymInt/SymFloat, rather than
going the long way via the SymIntNode/SymFloatNode.
- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
functions, pretty sure this doesn't do anything.
- guard_int is now a free function, since to guard on an int you cannot
assume the method exists. A function can handle both int and SymInt
inputs.
- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
plain methods; this is to help avoid confusion between the two types.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
In `torchvision` we started to use tensor subclasses. With the current annotations, this minimal example throws three errors when checking with `mypy`:
```py
from typing import Type, TypeVar, Any, Optional, Union
import torch
T = TypeVar("T", bound="TensorSubclass")
class TensorSubclass(torch.Tensor):
def __new__(
cls: Type[T],
data: Any,
*,
dtype: Optional[torch.dtype] = None,
device: Optional[Union[torch.device, str, int]] = None,
) -> T:
return torch.as_tensor(data, dtype=dtype, device=device).as_subclass(cls)
```
```
main.py:16:16: error: Incompatible return value type (got "Tensor", expected "T") [return-value]
main.py:16:58: error: Argument "device" to "as_tensor" has incompatible type "Union[device, str, int, None]"; expected "Optional[device]" [arg-type]
main.py:16:78: error: Argument 1 to "as_subclass" of "_TensorBase" has incompatible type "Type[T]"; expected "Tensor" [arg-type]
```
I'll explain inline why the old annotations are wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86105
Approved by: https://github.com/albanD
By upstreaming functorch's tensor printing logic into PyTorch. There's
no way of creating a custom print function for a TensorImpl subclass (as
opposed to a torch_dispatch or torch_function tensor subclass, which can
just override repr()) right now, so we need to directly interpose inside
regular Tensor printing in PyTorch.
Monkey patching is bad; users do not expect `import blah` to change
something about another library.
Fixes https://github.com/pytorch/functorch/issues/900
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85430
Approved by: https://github.com/ezyang
Instead of calling into the Python dispatcher for EVERY dispatcher
call, we now have a two step process. First, we
getattr(op: OpOverload, dispatch_key) to "load" the handler for the
function. This can either be a conventional function (in which
case we will call it, in the same way the old Python dispatcher
worked), or it can be a DispatchKey, in which case we will directly
call that DispatchKey in C++, bypassing marshalling between Python
and C++ entirely. OpOverload.__getattr__ is carefully written so
that it will cache the
A further optimization would be to define __slots__ on OpOverload,
and ensuring that the DispatchKey strings are interned.
The resulting Python dispatcher is less flexible: after the first
lookup, the handler is cached and we won't recompute it. Furthermore,
by default, dispatches will not go into Python, and so you won't
get stack frames for the Python dispatcher by default. But we get
a huge performance improvement: on the following microbenchmark
we go from 2.5s to 1.9s.
```
import time
import torch
from functorch import make_fx
def f(x):
for i in range(1000):
x = x * x
return x
begin = time.time()
res = make_fx(f, tracing_mode="symbolic")(torch.randn(10, 20))
print(time.time()-begin)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85133
Approved by: https://github.com/wconstab
Signed-off-by: Edward Z. Yang <ezyangfb.com>
From @ezyang's original PR:
There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interpose over ordinary C++ dispatch. The ingredients:
We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key, and shunting to a Python implementation
The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful.
I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826
Approved by: https://github.com/ezyang
From PR:
```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast
and autograd logic, we update the dispatch keys of FakeTensors
to reflect their fake device. This includes the BackendComponent
(DispatchKey::Meta -> DispatchKey::CUDA), and also the BackendComponent
related Autocast and Autograd keys. __torch__dispatch__ sits below
Autocast and Autograd, and is only invoked when we are at the
kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel
instead of the kernel of the BackendComponent for the fake device.
```
Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.
See: https://github.com/pytorch/pytorch/issues/81608
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82449
Approved by: https://github.com/ezyang
### Description
Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.
`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.
Documentation for storages is improved as well.
### Issue
Fixes#82436
### Testing
N/A
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
**Reopened** to help with merge issues. See #59790 for full context.
Fixes#20778. Helps #71688.
Finalizes @martinPasen's force argument for `Tensor.numpy()`. It is set to False by default. If it's set to True then we:
1. detatch the Tensor, if requires_grad == True
2. move to cpu, if not on cpu already
3. Uses .resolve_conj() if .is_conj() == True
4. Uses .resolve_neg() if .is_neg() == True
cc @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78564
Approved by: https://github.com/albanD