This refactor was prompted by challenges handling mixed int/float
operations in C++. A previous version of this patch
(https://github.com/pytorch/pytorch/pull/87722/) added overloads for each
permutation of int/float and was unwieldy. This PR takes a different
approach.
The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode. This type is erased: we
no longer know statically in C++ whether we have an int or a float, and have
to test it with the is_int()/is_float() virtual methods. This has a number of
knock-on effects.
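To make the type erasure concrete, here is a minimal sketch of what such an interface looks like (a simplified illustration with made-up names only; the real c10 interface has many more methods and different ownership details):
```C++
#include <memory>

// Simplified stand-in, not the actual c10 type.
struct SymNodeImplSketch {
  virtual ~SymNodeImplSketch() = default;
  // The int/float distinction is no longer known statically; it is queried
  // at runtime through virtual methods.
  virtual bool is_int() const = 0;
  virtual bool is_float() const = 0;
  // Operations return another type-erased node.
  virtual std::shared_ptr<SymNodeImplSketch> add(
      const std::shared_ptr<SymNodeImplSketch>& other) const = 0;
};

using SymNodeSketch = std::shared_ptr<SymNodeImplSketch>;
```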
- We no longer have C++ classes to bind to Python. Instead, we take an
entirely new approach to our Python API, where we have a SymInt/SymFloat
class defined entirely in Python, which holds a SymNode (which corresponds
to the C++ SymNode). However, SymNode is not pybind11-bound; instead,
it lives as-is in Python and is wrapped into a C++ SymNode using PythonSymNode
when it goes into C++. This implies a userland rename.
In principle, it is also possible for the canonical implementation of SymNode
to be written in C++, and then bound to Python with pybind11 (we have
this code, although it is commented out.) However, I did not implement
this as we currently have no C++ implementations of SymNode.
Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
code needs to know how to find these classes. Currently, this is done
just by manually importing torch and getting the attributes (see the sketch
after this list).
- Because SymInt/SymFloat are thin Python wrappers, __sym_dispatch__ now
takes SymInt/SymFloat rather than SymNode, bringing it in line with how
__torch_dispatch__ works.
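A minimal sketch of the "manually import torch and get the attribute" lookup mentioned in the first bullet above (a hypothetical helper, not the actual binding code; it assumes the interpreter is initialized and torch is importable):
```C++
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Look up the Python-defined torch.SymInt class from C++ by importing torch
// and reading the attribute off the module.
inline py::object get_symint_class() {
  return py::module_::import("torch").attr("SymInt");
}
```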
Some miscellaneous improvements:
- SymInt now has a constructor that takes SymNode. Note that this
constructor is ambiguous if you pass in a subclass of SymNode,
so an explicit downcast is necessary. This means toSymFloat/toSymInt
are no more. This is a mild optimization, as it means rvalue references
work automatically.
- We uniformly use the caster for c10::SymInt/SymFloat, rather than
going the long way via the SymIntNode/SymFloatNode.
- Removed some unnecessary toSymInt/toSymFloat calls in the normalize_*
functions; this almost certainly has no behavioral effect.
- guard_int is now a free function, since when guarding on a plain int you
cannot assume a guard method exists on it; a free function can handle both
int and SymInt inputs (see the sketch after this list).
- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
plain methods; this is to help avoid confusion between the two types.
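A minimal sketch of the free-function idea from the guard_int bullet above (made-up names and signatures; the real c10 ones differ):
```C++
#include <cstdint>

// Hypothetical, simplified stand-in for a symbolic int.
struct SymIntSketch {
  // Pretend this consults the underlying symbolic node, records a guard,
  // and returns a concrete value.
  int64_t guard_int() const { return concrete_hint_; }
  int64_t concrete_hint_ = 0;
};

// A free function can be overloaded for both plain ints and symbolic ints,
// which a member function on SymInt alone cannot cover.
inline int64_t guard_int_sketch(int64_t x) {
  return x;  // already concrete, nothing to guard
}

inline int64_t guard_int_sketch(const SymIntSketch& x) {
  return x.guard_int();
}
```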
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
This is by no means comprehensive, but adds initial support for SymInt as a Scalar.
Things that don't work yet but need to:
- for some reason `torch.add(tensor, sym_int)` got matched to the `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor` schema
- `x + sym_int` failed because we tried to turn `x` into a sym int:
```C++
"__radd__",
[](c10::SymIntNode a, py::object b) -> c10::SymIntNode {
auto snb = toSymIntNode(a, b);
return a->add(snb);
})
```
- Many more things I'm sure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84958
Approved by: https://github.com/ezyang
This PR allows users to author a CUDA kernel in Python.
```python
import torch
from torch.cuda.jiterator import create_jit_fn
code_string = "template <typename T> T my_kernel(T x, T y, T alpha) { return -x * y + x - y + alpha; }"
jitted_fn = create_jit_fn(code_string, alpha=0)
a = torch.rand(3, device='cuda')
b = torch.rand(3, device='cuda')
result = jitted_fn(a, b, alpha=1.0)
```
Limitations:
- Only supports elementwise kernels
- 1~8 tensor inputs (zero-input kernels, e.g. factory methods, are not supported)
- input tensors must live on a CUDA device
- CPU Scalars are not supported
- kwargs must be pre-declared when calling create_jit_fn
- kwargs must be convertible to at::Scalar, i.e. one of float64, int64_t, bool (complex is not supported for now)
TODOs:
- [x] consolidate union and c10::variant implementation
- [x] plug into existing op testing framework
- [ ] rename files, place files in the right folder
- [ ] place util functions in the right file
- [x] enforce assumptions in the python interface, e.g. <8 inputs, kwargs types
- [x] Add user-facing documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76394
Approved by: https://github.com/mruberry
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71356
Suppress remaining header-based warnings in `caffe2/c10` when building with `clang`
Test Plan: CI pass
Reviewed By: r-barnes
Differential Revision: D33600097
fbshipit-source-id: e1c0d84a0bad768eb03e047d62b5379cf28b48e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361
This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar`, or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&`; the flag for whether the instance contains anything is stored inside the `Scalar` itself, using a new tag.
For more information, see #55070.
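A minimal sketch of the idea behind `OptionalScalarRef` (made-up names, not the actual c10 implementation):
```C++
#include <cassert>

// Simplified stand-ins only; the real c10 types differ.
struct ScalarSketch {
  enum class Tag { HasValue, None };  // hypothetical extra tag for "empty"
  Tag tag;
  double payload;
};

// Holds only a const reference. The empty state is represented by referring
// to a statically allocated "none" Scalar whose tag says so, so neither the
// empty nor the non-empty case ever copies a Scalar.
class OptionalScalarRefSketch {
 public:
  OptionalScalarRefSketch() : ref_(none_sentinel()) {}
  explicit OptionalScalarRefSketch(const ScalarSketch& s) : ref_(s) {}

  bool has_value() const { return ref_.tag != ScalarSketch::Tag::None; }

  const ScalarSketch& get() const {
    assert(has_value());
    return ref_;
  }

 private:
  static const ScalarSketch& none_sentinel() {
    static const ScalarSketch none{ScalarSketch::Tag::None, 0.0};
    return none;
  }
  const ScalarSketch& ref_;
};
```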
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D29821533
Pulled By: SplitInfinity
fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99
Summary:
Currently, calling `scalar.to<std::complex<double>>()`, for example, compiles but throws an error at runtime. Instead, marking the non-specialized cases as `= delete` means the code fails to compile, so you catch the error sooner.
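A minimal sketch of the pattern, with made-up names (not the real `c10::Scalar`):
```C++
#include <cstdint>

// Deleting the primary template turns an unsupported `to<T>()` into a
// compile-time error; only the explicitly specialized conversions build.
struct ScalarDemo {
  double v = 0;

  template <typename T>
  T to() const = delete;
};

template <>
inline double ScalarDemo::to<double>() const {
  return v;
}

template <>
inline int64_t ScalarDemo::to<int64_t>() const {
  return static_cast<int64_t>(v);
}

// ScalarDemo{}.to<float>();  // would now fail to compile instead of at runtime
```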
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58726
Reviewed By: zou3519, seemethere
Differential Revision: D28646057
Pulled By: ezyang
fbshipit-source-id: 9e4e3d1b4586eeecbb73db61bba56560b2657351
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46596
1. Added a `conj` method for Scalar, similar to NumPy.
2. Updated backward formulas for add and sub to work correctly for R -> C cases and for the case when alpha is complex (see the sketch after this list).
3. Enabled complex backward for nonzero (no formula update needed).
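A small sketch of the rule item 2 refers to, assuming the convention of propagating conjugate Wirtinger derivatives (an illustration only, not the autograd code): for `out = self + alpha * other`, the gradient flowing to `other` is `grad * conj(alpha)`, and a real-valued `self` only receives the real part of a complex `grad`.
```C++
#include <complex>

using C = std::complex<double>;

// Gradient contribution to `other` in out = self + alpha * other, for a
// complex alpha.
C grad_other(const C& grad_out, const C& alpha) {
  return grad_out * std::conj(alpha);
}

// R -> C case: a real-valued input only receives the real part of a complex
// output gradient.
double grad_self_when_real(const C& grad_out) {
  return grad_out.real();
}
```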
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D24529227
Pulled By: anjali411
fbshipit-source-id: da871309a6decf5a4ab5c561d5ab35fc66b5273d
Summary:
`is_complex_t` is a bad name. For example, in std there is `std::is_same` but no `std::is_same_t`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39906
Reviewed By: mrshenli
Differential Revision: D22665013
Pulled By: anjali411
fbshipit-source-id: 4b71745f5e2ea2d8cf5845d95ada4556c87e040d
Summary:
**Main:**
- `c10::complex` is refactored: it no longer uses inheritance to specialize constructors, but uses SFINAE instead (see the sketch below). This implementation is cleaner and avoids some compiler bugs.
- `c10::Scalar` is cleaned up: it no longer needs to store complex as `double z[2]`; `c10::complex<double>` will work.
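A minimal sketch of the SFINAE approach mentioned above (made-up names; the real `c10::complex` constrains many more constructors and in different ways):
```C++
#include <type_traits>

// Simplified stand-in only; not the actual c10::complex.
template <typename T>
struct complex_sketch {
  T re{}, im{};

  constexpr complex_sketch() = default;
  constexpr complex_sketch(T re_, T im_) : re(re_), im(im_) {}

  // Widening conversion (float -> double) is implicit...
  template <typename U,
            typename std::enable_if<std::is_same<U, float>::value &&
                                        std::is_same<T, double>::value,
                                    int>::type = 0>
  constexpr complex_sketch(const complex_sketch<U>& other)
      : re(other.re), im(other.im) {}

  // ...while narrowing (double -> float) must be explicit. SFINAE selects
  // which constructor template exists for a given (T, U) pair, which
  // previously required dedicated base classes.
  template <typename U,
            typename std::enable_if<std::is_same<U, double>::value &&
                                        std::is_same<T, float>::value,
                                    int>::type = 0>
  constexpr explicit complex_sketch(const complex_sketch<U>& other)
      : re(static_cast<T>(other.re)), im(static_cast<T>(other.im)) {}
};
```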
**Other cleanups:**
- `numeric_limits` of `c10::complex` is moved to `complex_utils.h`
- the variable in `c10::complex` storing real and imag is changed from `storage[2]` to `real_` and `imag_`
- remove the `c10::` before `complex` when in `c10` namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38593
Differential Revision: D21769463
Pulled By: anjali411
fbshipit-source-id: 3cb5bcbb0ff304d137221e00fe481a08dba7bc12
Summary:
This PR basically makes `c10::ComplexHalf` a template specialization of `c10::complex`. Since `c10::ComplexHalf` is not used much, this does not involve much change.
Because `c10::Half` does not have many `constexpr` methods, it is impossible to keep the same API as the other specializations. Currently, we are just reusing the old implementation wholesale; only the name changes, from `c10::ComplexHalf` to `c10::complex<c10::Half>`. We can always change the implementation in the future when needed, but for now this is OK.
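A minimal sketch of the shape of this change (made-up names and members, not the real c10 definitions):
```C++
// Simplified stand-ins only.
struct HalfDemo {
  unsigned short bits = 0;  // 16-bit storage; no constexpr arithmetic here
};

template <typename T>
struct complex_demo;  // primary template, defined elsewhere for float/double

// The old standalone ComplexHalf-style struct lives on as an explicit
// specialization of the complex template: same layout and (mostly
// non-constexpr) API, only the name changes.
template <>
struct complex_demo<HalfDemo> {
  HalfDemo real_;
  HalfDemo imag_;

  complex_demo() = default;
  complex_demo(HalfDemo re, HalfDemo im) : real_(re), imag_(im) {}

  HalfDemo real() const { return real_; }
  HalfDemo imag() const { return imag_; }
};
```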
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37426
Differential Revision: D21300754
Pulled By: anjali411
fbshipit-source-id: fc0f65adccf97025a727735096780ce8078675a1
Summary:
# Overview
This PR changes the backing type of complex tensors in `ScalarType` from `std::complex` to `c10::complex`.
Since `c10::complex` and `std::complex` are reinterpret-castable, we can freely use `std::complex *` to access `c10::complex` data and vice versa. The implementation of `c10::complex` is not complete yet, so we are reinterpret-casting all complex data to `std::complex` during dispatch and doing all operations in `std::complex`.
# `std::complex` and `c10::complex` interoperability
To use `std::complex *` to access `c10::complex` data, the following specializations are added:
```C++
template <> inline std::complex<float>* Tensor::data_ptr();
template <> inline std::complex<double>* Tensor::data_ptr();
template <> inline std::complex<float> Tensor::item();
template <> inline std::complex<double> Tensor::item();
```
See [`aten/src/ATen/templates/TensorMethods.h`](https://github.com/pytorch/pytorch/pull/37274/files#diff-0e8bf6f5024b32c240a4c1f0b4d8fd71)
And
```C++
template <> inline std::complex<float> Scalar::to();
template <> inline std::complex<double> Scalar::to();
```
is added in [`c10/core/Scalar.h`](https://github.com/pytorch/pytorch/pull/37274/files#diff-aabe1c134055c8dcefad830c1c7ae957)
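A hedged illustration of what "reinterpret-castable" means in practice (a simplified stand-in type, not the real `c10::complex`; it relies on the two types having identical layout, as described above):
```C++
#include <complex>
#include <cstddef>

// Simplified stand-in laid out like std::complex<double>: two doubles.
struct c10_complex_double_demo {
  double real_;
  double imag_;
};

// While both implementations co-exist, a buffer of one type can be viewed as
// the other, so existing std::complex kernels keep working on the new data.
void scale_in_place(c10_complex_double_demo* data, std::size_t n) {
  auto* view = reinterpret_cast<std::complex<double>*>(data);
  for (std::size_t i = 0; i < n; ++i) {
    view[i] *= 2.0;
  }
}
```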
# Dispatch
Macros in [`Dispatch.h`](https://github.com/pytorch/pytorch/pull/37274/files#diff-737cfdab7707be924da409a98d46cb98) still use `std::complex` as their type. We will add macros such as `AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND3` as needed during the migration, not in this PR.
Note that `AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3` is only used in the CUDA copy kernel, and this PR already changes it to use `c10::complex`, because the CUDA copy kernel has to use its original dtype; otherwise there would be funny casting of dtypes, causing a CUDA unspecified launch failure.
When all of the migration is done, the c10 versions of the macros will be removed, and the default versions will use `c10::complex` instead of `std::complex`. This design allows us to incrementally migrate from `std::complex` to `c10::complex`.
# Note
Note that `std::complex` is not completely replaced by `c10::complex` in c10 yet; for example, `c10::Scalar` still uses `std::complex`. This will be fixed in later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37421
Differential Revision: D21282161
Pulled By: anjali411
fbshipit-source-id: 635e309e8c8a807c2217723ad250b5ab5a20ce45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33603
This function returns a ScalarType based on the Scalar's value. This is helpful
because it lets the code generated in aten_op.h avoid returning Scalars that
depend on the arg self to determine their type.
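A minimal sketch of such a function (made-up tags; the real `c10::Scalar` has more tags and returns `c10::ScalarType`):
```C++
enum class ScalarTypeDemo { Double, Long, Bool, ComplexDouble };

// Simplified tagged-value stand-in: the stored tag alone determines the
// reported type, so callers no longer need a tensor argument to decide
// what the Scalar is.
struct ScalarValueDemo {
  enum class Tag { HAS_d, HAS_i, HAS_b, HAS_z } tag = Tag::HAS_i;

  ScalarTypeDemo type() const {
    switch (tag) {
      case Tag::HAS_d: return ScalarTypeDemo::Double;
      case Tag::HAS_b: return ScalarTypeDemo::Bool;
      case Tag::HAS_z: return ScalarTypeDemo::ComplexDouble;
      case Tag::HAS_i:
      default:         return ScalarTypeDemo::Long;
    }
  }
};
```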
Test Plan: Imported from OSS
Differential Revision: D20100218
Pulled By: ezyang
fbshipit-source-id: 337729a7559e6abb3a16b2a563a2b92aa96c7016
Summary:
This code is implemented twice in different places by different people; we should merge the implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32730
Differential Revision: D19622023
Pulled By: ezyang
fbshipit-source-id: a9cbda31428b335bf28a7e4050f51f58e787b94f
Summary:
7zip and cmake are part of the base image, so there is no need to re-install them. Removing the install step can make build/test more stable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30897
Differential Revision: D19232961
Pulled By: mingbowan
fbshipit-source-id: fa3bbd1325839a2a977bf13fdbd97fda43793b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30918
This is a C++14 feature we can use now
ghstack-source-id: 95811482
Test Plan: waitforsandcastle
Differential Revision: D18869636
fbshipit-source-id: b5b3d78b61b6ceb2deda509131f8502e95b1d057
Summary:
Improve handling of mixed-type tensor operations.
This PR affects the arithmetic (add, sub, mul, and div) operators implemented via TensorIterator (so dense but not sparse tensor ops).
For these operators, we will now promote to reasonable types where possible, following the rules defined in https://github.com/pytorch/pytorch/issues/9515, and error in cases where the cast would require floating point -> integral or non-boolean to boolean downcasts.
The details of the promotion rules are described here:
https://github.com/nairbv/pytorch/blob/promote_types_strict/docs/source/tensor_attributes.rst
Some specific backwards incompatible examples:
* Now `int_tensor * float` results in a float tensor, whereas previously the floating-point operand was first cast to an int. Previously `torch.tensor(10) * 1.9` => `tensor(10)` because the 1.9 was downcast to `1`; now the result is the more intuitive `tensor(19)`.
* Now `int_tensor *= float` errors, since the floating-point result of this operation can't be cast back into the integral type of the in-place result.
See more examples/detail in the original issue (https://github.com/pytorch/pytorch/issues/9515), in the above linked tensor_attributes.rst doc, or in the test_type_promotion.py tests added in this PR:
https://github.com/nairbv/pytorch/blob/promote_types_strict/test/test_type_promotion.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22273
Reviewed By: gchanan
Differential Revision: D16582230
Pulled By: nairbv
fbshipit-source-id: 4029cca891908cdbf4253e4513c617bba7306cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25268
The AT_FORALL AND macros mistakenly already include Half, which differs from the Dispatch macros.
This change shouldn't have any effect.
Test Plan: Imported from OSS
Differential Revision: D17079747
Pulled By: gchanan
fbshipit-source-id: 635eb167722ce850d6c1949fac652de4dddf32ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230
Implementing a minimum QTensor API to unblock other workstreams in quantization.
Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added the following user-facing APIs:
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()
Reviewed By: dzhulgakov
Differential Revision: D14524641
fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
-------
This is a follow-up PR that resolves some additional feedback on one of the previous Bool Tensor PRs.
gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:
**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When I was adding this, I was testing by creating a tensor and then calling its .tolist(); it worked equally well for bool and uint8_t, so I left uint8_t, since I thought it made more sense as we are calling PyBool_FromLong. Changing it to bool.
**[ATen/Dispatch.h]** better name?
Fixed.
**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods, test_tensor_factories_empty. I added some bool cases above it and added a comment that once CUDA is done, I will unite them so the test iterates not just between CUDA and CPU but also over all types. Adding all bool cases now; will unite in the CUDA PR.
**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.
**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added
**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without bool for a single case which otherwise fails the build with errors:
_./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
return (*this) * insertConstant(rhs);_
Differential Revision: D14605105
fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316
This starts cleaning up the files in c10 according to the module structure we decided on.
Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h
Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp
i-am-not-moving-c2-to-c10
Reviewed By: dzhulgakov
Differential Revision: D13498493
fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63