Ref #54789
A `bool` has only two valid values, 1 or 0; any in-memory value
outside of those leads to undefined behavior. So, instead of
`reinterpret_cast`-ing to `bool*`, I introduce `c10::load<scalar_t>`,
which reads the underlying byte as `unsigned char` and converts it to
a valid `bool`. This gets >90% of operators working; the remaining
operators, for which skips and xfails have been added, will require
individual attention.
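For illustration, a minimal sketch of the idea (not the actual
`c10::load` implementation or signature; the helper name and the
`const void*` parameter here are just for the sketch):
```
#include <cstring>

// Simplified sketch: read the raw byte and normalize it, instead of
// reinterpret_cast-ing the source pointer to bool* and dereferencing it.
template <typename T>
T load_sketch(const void* src) {
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}

// bool specialization: read as unsigned char, then convert, so a byte
// holding anything other than 0/1 still yields a valid bool.
template <>
bool load_sketch<bool>(const void* src) {
  unsigned char byte;
  std::memcpy(&byte, src, sizeof(byte));
  return byte != 0;  // any non-zero byte becomes true
}
```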
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77122
Approved by: https://github.com/mruberry
Added `operator-`, `DispatchKeySet::add`, and `DispatchKeySet::remove`.
I wanted to use these in functorch to make a constexpr DispatchKeySet.
This also adds C10_NODISCARD to DispatchKeySet::remove to make it
consistent with DispatchKeySet::add: the compiler will now warn if
someone calls remove without assigning the result to a variable.
remove does NOT mutate the set, and this is a pitfall that I run into
a lot.
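A hedged sketch of the intended usage (the specific DispatchKey values
below are only for illustration):
```
#include <c10/core/DispatchKeySet.h>

using c10::DispatchKey;
using c10::DispatchKeySet;

// remove() (like add()) returns a NEW DispatchKeySet and does not mutate
// the receiver, so discarding the result is almost always a bug. With
// C10_NODISCARD the first call below now triggers a compiler warning.
DispatchKeySet drop_autograd_keys(DispatchKeySet ks) {
  ks.remove(DispatchKey::AutogradCPU);                           // result discarded: warns
  DispatchKeySet trimmed = ks.remove(DispatchKey::AutogradCPU);  // intended usage
  // operator- computes a set difference in the same non-mutating style.
  return trimmed - DispatchKeySet(DispatchKey::AutogradCUDA);
}
```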
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78558
Approved by: https://github.com/bdhirsh
Seems like it should be one. This will make it possible to register
meta implementations even when there is a CompositeImplicitAutograd
registration already. It also paves the way for sparse meta, etc.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78469
Approved by: https://github.com/ngimel
Change our representation of sizes and strides to contain SymInts
instead of int64_t.
Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.
But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`.
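A toy sketch of the two guard styles described above; this is not the
real TensorImpl, just an illustration of the policy check on the hot
path versus a plain check elsewhere:
```
#include <cstdint>
#include <stdexcept>
#include <vector>

struct ToyTensorImpl {
  bool has_symbolic_sizes_strides_ = false;
  std::vector<int64_t> sizes_;

  const std::vector<int64_t>& sizes() const {
    // Hot path: in the real code this reuses the existing sizes policy check.
    if (has_symbolic_sizes_strides_) {
      throw std::runtime_error("sizes() not supported on symbolic-shape tensors");
    }
    return sizes_;
  }

  int64_t numel() const {
    // Cold path: plain check, analogous to TORCH_CHECK(!has_symbolic_sizes_strides_).
    if (has_symbolic_sizes_strides_) {
      throw std::runtime_error("numel() not supported on symbolic-shape tensors");
    }
    int64_t n = 1;
    for (auto s : sizes_) n *= s;
    return n;
  }
};
```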
This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78272
Approved by: https://github.com/Krovatkin, https://github.com/Chillee
Change our representation of sizes and strides to contain SymInts
instead of int64_t.
Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.
But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`.
This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77994
Approved by: https://github.com/Krovatkin
Since we plan to have a bunch of code that is sensitive to whether or
not a SymInt contains a symbolic shape, it seems like a bad idea to
have an implicit constructor.
For example, code like:
```
sizes_and_strides_.stride_at_unchecked(dim) = 0;
```
would sail through, and the `0` would get implicitly promoted to a
SymInt.
This is a tradeoff, though: it makes code that handles `SymInt`s
clunkier, since `int64_t`s and integer literals need to be explicitly
wrapped in `SymInt` before being used.
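A toy illustration of that tradeoff (a hypothetical stand-in, not the
real `c10::SymInt`):
```
#include <cstdint>

// With an explicit constructor, integer literals no longer silently
// become SymInts.
struct ToySymInt {
  explicit ToySymInt(int64_t v) : value_(v) {}
  int64_t value_;
};

void demo() {
  // ToySymInt s = 0;          // would no longer compile: constructor is explicit
  ToySymInt s{0};              // callers must wrap literals explicitly,
  ToySymInt t = ToySymInt(5);  // which is clunkier but keeps SymInt-sensitive
  (void)s; (void)t;            // code paths easy to audit
}
```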
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666
Approved by: https://github.com/ezyang
Double-header bug fix:
- As reported by jansel, dtypes are still showing up as integers
when the schema is an optional dtype. This is simple enough to
fix and I added a test for it. But while I was at it...
- I noticed that the THPMemoryFormat_new idiom with an "unused" name
doesn't actually work: the repr of the returned memory format
object is wrong, and this shows up when we try to log the args/kwargs.
So I fixed memory format to do it properly, along with everything
else.
Fixes https://github.com/pytorch/pytorch/issues/77135
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77543
Approved by: https://github.com/albanD, https://github.com/jansel
Summary: The new PrivateUse1 DeviceType is associated with the PrivateUse1 DispatchKey, which can be used for non-public devices without introducing a new device type. Note that the stringified name of the PrivateUse1 device is "privateuseone".
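A minimal sketch of what that looks like from the C++ side, assuming
the `c10::Device` API; the device index 0 is arbitrary:
```
#include <c10/core/Device.h>
#include <iostream>

int main() {
  // PrivateUse1 stringifies as "privateuseone", so a device constructed
  // with this type prints as "privateuseone:<index>".
  c10::Device d(c10::DeviceType::PrivateUse1, 0);
  std::cout << d.str() << std::endl;  // expected: privateuseone:0
  return 0;
}
```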
Test Plan: All CI should pass.
Differential Revision: D35859437
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77208
Approved by: https://github.com/bdhirsh
Prior to this PR, we had a mish-mash of ways of getting unconventional
sizes/strides behavior:
- In OSS (but not in fbcode), some methods are virtual and you
can override them directly
- There is an is_contiguous policy, a bitfield tag that, when set,
makes is_contiguous() either error or call the virtual method
is_contiguous_custom(). Ordinarily is_contiguous() is virtual and you
can just override it directly, but the policy works EVEN IF
is_contiguous() is non-virtual (e.g., in fbcode)
- There is also a sizes policy which is the same idea but for sizes
This PR unifies these mechanisms, and in doing so, eliminates the
maybe virtual/not-virtualness of the methods in question. The primary
downside of this change is that it is BC-breaking (but the BC break is
very easy to fix!)
The new scheme works like this: we have three levels of policy for
sizes/strides (order matters).
- The Default policy is a conventional dense tensor, where we use
all of the built-in fields to directly represent the
sizes/strides/numel/contiguity of the tensor, and it is possible
to bypass virtual calls entirely.
- The CustomStrides policy represents tensors which have a custom
notion of strides (most typically, that they don't support them),
shunting strides() and is_contiguous() to virtual methods
strides_custom() and is_contiguous_custom(). This INCLUDES handling
for contiguity, since they typically go hand-in-hand (although
the situation is murky with batched tensors). The default
implementations of these functions raise errors saying the tensor
doesn't support them.
- The CustomSizes policy represents tensors which have a custom
notion of sizes (the two notable examples are nested tensor, which
doesn't have a representation of sizes in the conventional form, and
XLA/LTC tensor, which synchronizes its sizes with an underlying
compiler backend). This shunts sizes(), numel() and dim() (along
with everything from strides) to _custom() variants.
There is no special policy for erroring; instead, we just do a vcall
and expect the virtual method to raise an exception (the performance
hit from the vcall doesn't matter because you're about to raise a C++
exception anyway). The default implementations of all overridable
functions are available as _default() variants, which is helpful in some
situations when you just want to do a "sync" and then run the
conventional semantics.
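A toy sketch of the three-level scheme; the names are simplified and
this is not the real TensorImpl:
```
#include <cstdint>
#include <stdexcept>
#include <vector>

// Default reads the built-in fields with no virtual call; CustomStrides
// shunts strides()/is_contiguous() to *_custom(); CustomSizes additionally
// shunts sizes()/numel()/dim().
enum class SizesStridesPolicy : uint8_t { Default = 0, CustomStrides = 1, CustomSizes = 2 };

struct ToyImpl {
  SizesStridesPolicy policy_ = SizesStridesPolicy::Default;
  std::vector<int64_t> sizes_, strides_;

  std::vector<int64_t> sizes() const {
    if (policy_ >= SizesStridesPolicy::CustomSizes) {
      return sizes_custom();  // vcall only for the unconventional case
    }
    return sizes_;            // fast, non-virtual default path
  }

  std::vector<int64_t> strides() const {
    if (policy_ >= SizesStridesPolicy::CustomStrides) {
      return strides_custom();
    }
    return strides_;
  }

  // Default implementations just raise; subclasses along the lines of
  // nested or XLA/LTC tensors would override these.
  virtual std::vector<int64_t> sizes_custom() const {
    throw std::runtime_error("this tensor does not support sizes()");
  }
  virtual std::vector<int64_t> strides_custom() const {
    throw std::runtime_error("this tensor does not support strides()");
  }
  virtual ~ToyImpl() = default;
};
```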
This PR could be extended further in two ways but I did not do them
due to time constraints:
- Ideally, all TENSORIMPL_MAYBE_VIRTUAL would be eliminated from
TensorImpl, by using the same policy trick.
- set_size and set_stride are still virtual; it's not entirely clear
the same trick should be used here though as these methods are
deprecated.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77036
Approved by: https://github.com/bdhirsh
This PR allows users to author CUDA kernels in Python.
```
import torch
from torch.cuda.jiterator import create_jit_fn
code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)
a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```
Limitations:
- Only supports elementwise kernels
- 1~8 tensor inputs (zero-input kernels, e.g. factory methods, are not supported)
- input tensors must live on a CUDA device
- CPU Scalars are not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar, i.e. one of float64, int64_t, bool (complex is not supported for now)
TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in the Python interface, e.g. <8 inputs, kwargs types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
I noticed that when `SymInt` was introduced, `jit_type_base.h` was
added as an include to the `Operator.h` template, which is supposed to
be kept extremely clean and only use forward declarations. I also
noticed that forward declarations for `OptionalArrayRef` were missing.
So, I've refactored the forward declarations into
`ATen/core/ATen_fwd.h` and cleaned up some of the `c10`
headers that were masking these missing declarations. I've also
re-generated the pre-compiled header so `SymInt` is included.
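A toy sketch of the forward-declaration-header pattern (hypothetical
`mylib` names, not the actual contents of `ATen/core/ATen_fwd.h`):
```
// mylib_fwd.h: forward declarations only, so headers that merely mention
// these types (like a generated Operator.h) stay lightweight.
namespace mylib {
class Tensor;
template <typename T>
class OptionalArrayRef;
}  // namespace mylib

// some_interface.h: can declare functions against the names above without
// pulling in the full type definitions.
namespace mylib {
void run(const Tensor& t, const OptionalArrayRef<long>& dims);
}  // namespace mylib
```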
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76576
Approved by: https://github.com/albanD