Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104
This patch added a new test case for findDanglingImpls. The test case introduces a C++ extension which has a dangling impl such that findDanglingImpls can find it and output its information.
Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext
Imported from OSS
Reviewed By: ezyang
Differential Revision: D29512520
fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
Summary:
The function name and return type both are called `class_`, therefore they are ambiguous and this is UB and does not work on NVCC. See the tests for the failure case.
Thanks for the help of Thibaut Lutz from NVIDIA's compiler team.
cc: yueyericardo ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57962
Reviewed By: mruberry
Differential Revision: D28359400
Pulled By: ezyang
fbshipit-source-id: c64ec89203f99f656611aba34f7424eed7bc9e7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583
`Scalar` takes 32 bytes due to `c10::complex<double>`
requires aligning to 16 bytes. Passing Scalar by reference
shows about 1% improvements on instruction count.
All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```
# Codemode
## Main Step
For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```
As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameter).
In retrospect, excluding `thrid_party` and `torch/csrc/jit` would be a good idea. (I revert it manually later, see https://github.com/pytorch/pytorch/pull/53479 as an reference).
## Pre-Step
Prior to applying the main command, as some `Scalar` are presented as `at::Scalar` or `c10::Scalar`, so I codemod some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```
## Fixup
There are a couple of post codemod fixup. For example, `const Scalar` will be codemoded into `const const Scalar&`. `at:Scalar` will be codemoded into `at::const Scalar&` (if `Pre-step` is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```
## Supplementary
`cu` and `mm` files also need to be codemoded, for example:
```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```
Function pointers are not codemoded. Here is an incomplete list:
```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```
Some corner cases needs to be manually fixed.
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037
As remarked in #52277 it is easy to give an (inefficient, due to extra
redispatches) DefaultBackend implementation of foo and foo_ in terms of
foo_out. This patch enables code generation for DefaultBackend in these
cases by default for all structured kernels. You can see the payoff
in MSNPU extension: it only has to register a kernel for add.out, and it
gets add and add_ kernels automatically.
The actual code changes are very modest:
- When DefaultBackend, call the dispatched (not direct native::)
functions to allocate tensors, change device guard, etc
- Don't call impl() for DefaultBackend (as it doesn't exist); instead,
directly generate a call to at::foo_out to do the actual work.
- Do NOT generate DefaultBackend implementation for foo_out. Actually,
there is a case to be made for this being a good idea with more infra;
see comments inside.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26731225
Pulled By: ezyang
fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
Summary:
## Rationale
While most of the `torch.Generator` properties and methods are implemented as a thin wrapper of the corresponding `at::Generator` methods, `torch.Generator.get_state()` and `torch.Generator.set_state()` are implemented in legacy Torch code and are not dispatched through the `c10::GeneratorImpl` interface. This is not structured well and makes implementing generators for new backends (e.g. `XLAGeneratorImpl` for the XLA backend) inconvenient. As such, this pull request seeks to move these generator state APIs to c10 and ATen.
## What is being refactored?
* Interfaces
- Added `c10::GeneratorImpl::set_state` and `c10::GeneratorImpl::state` for getting and setting the internal state of a random number generator.
- `at::Generator::set_state` and `at::Generator::state` wraps the above-mentioned APIs, as it's basically a PIMPL.
- Added helper function `at::detail::check_rng_state` for checking the validity of new RNG state tensor.
* CPU Generator
- Renamed and moved `THTensor_(setRNGState)` and `THTensor_(getRNGState)` to `CPUGeneratorImpl::set_state` and `CPUGenerator::state`.
- Renamed and moved `THGeneratorState` and `THGeneratorStateNew` to `CPUGeneratorStateLegacy` and `CPUGeneratorState`.
* CUDA Generator
- Renamed and moved `THCRandom_setRNGState` and `THCRandom_getRNGState` to `CUDAGeneratorImpl::set_state` and `CUDAGeneratorImpl::state`.
* PyTorch Bindings
- `THPGenerator_setState` and `THPGenerator_getState` now simply forward to `at::Generator::set_state` and `at::Generator::state`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49589
Reviewed By: H-Huang
Differential Revision: D25785774
Pulled By: pbelevich
fbshipit-source-id: 8ed79209c4ffb1a0ae8b19952ac8871ac9e0255f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49220
Since all ops are c10-full, we can remove .impl_UNBOXED now.
This also removes the ability of KernelFunction or CppFunction to store unboxedOnly kernels.
ghstack-source-id: 119450489
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25490225
fbshipit-source-id: 32de9d591e6a842fe18abc82541580647e9cfdad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49145
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105
(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes the files inside the directory /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, /fbsource/fbcode/caffe2/test/.
(2) Get rid of old check `AT_CUDA_CHECK(cudaGetLastError())` when necessary.
Test Plan:
Test build:
```
buck build mode/dev-nosan //caffe2/modules/detectron:
buck test mode/dev-nosan //caffe2/modules/detectron:
buck build mode/dev-nosan //caffe2/torch/fb/:
buck test mode/dev-nosan //caffe2/torch/fb/:
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.
Reviewed By: r-barnes
Differential Revision: D25452852
fbshipit-source-id: d6657edab612c9e0fa99b29c68460be8b1a20064
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49105
(1) Add a safety check `C10_CUDA_KERNEL_LAUNCH_CHECK()` after each kernel launch. This diff only changes the files inside the directory /fbsource/fbcode/caffe2/modules/, /fbsource/fbcode/caffe2/fb/, /fbsource/fbcode/caffe2/test/.
(2) Get rid of old check `AT_CUDA_CHECK(cudaGetLastError())` when necessary.
Test Plan:
Test build:
```
buck build //caffe2/modules/detectron:
buck build //caffe2/torch/fb/:
```
To check for launches without checks:
```
python3 caffe2/torch/testing/check_kernel_launches.py
```
Make sure none of the updated files are in the returned list.
Reviewed By: r-barnes
Differential Revision: D25325039
fbshipit-source-id: 2043d6e63c7d029c35576d3101c18247ffe92f01
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.
Correspondingly, changes are made to cpp_extension.py to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.
The list of changes to cpp_extension.py is as follows:
1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715
Reviewed By: bdhirsh
Differential Revision: D25272824
Pulled By: ezyang
fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.
Correspondingly, changes are made to `cpp_extension.py` to match these improvements.
The list of improvements to hipify is as follows:
1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.
The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically
cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451
Reviewed By: ezyang
Differential Revision: D24924736
Pulled By: malfet
fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46092
Make empty c10-full without using hacky-wrapper, i.e. port the kernel to the new style signature.
This PR also changes the signature of some helpers called by empty to the new style.
ghstack-source-id: 116544203
(Note: this ignores all push blocking failures!)
Test Plan:
vs prev diff (outdated, before c10::optional fix): https://www.internalfb.com/intern/fblearner/details/224735103/
after c10::optional fix:
https://www.internalfb.com/intern/fblearner/details/231391773/
Also, after the c10::optional fix, the instruction counting benchmark shows a 2% regression for calling empty from Python. We decided this is acceptable and decided against landing D24425836 which would fix the regression.
Reviewed By: ezyang
Differential Revision: D24219944
fbshipit-source-id: e554096e90ce438c75b679131c3151ff8e5c5d50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181
`init_process_group` and `new_group` update a bunch of global
variables after initializing the actual process group. As a result, there is a
race that after initializing the process group on say rank 0, if we immediately
check the default process group on rank 1 (say via RPC), we might actually get
an error since rank 1 hasn't yet updated its _default_pg variable.
To resolve this issue, I've added barrier() at the end of both of these calls.
This ensures that once these calls return we are guaranteed about correct
initialization on all ranks.
Since these calls are usually done mostly during initialization, it should be
fine to add the overhead of a barrier() here.
#Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378
ghstack-source-id: 112923112
Test Plan:
Reproduced the failures in
https://github.com/pytorch/pytorch/issues/40434 and
https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes
the issue.
Reviewed By: mrshenli
Differential Revision: D23858025
fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41610
Previously, operators that have a `Tensor?` (i.e. optional tensor) in their schema implemented it using `Tensor` in C++ and filled in an undefined tensor for the None case.
The c10 operator library, however, expects `Tensor?` to be represented as `optional<Tensor>`, so those operators couldn't be c10-full yet and still had to use codegenerated unboxing instead of templated unboxing.
This PR changes that. It extends the `hacky_wrapper_for_legacy_signatures` to not only take case of TensorOptions, but now also map between signatures taking `Tensor` and `optional<Tensor>`.
For this, it requires an additional template parameter, the expected signature, and it uses that to go argument-by-argument and unwrap any optionals it finds.
ghstack-source-id: 108873701
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D22607879
fbshipit-source-id: 57b2fb01a294b804f82cd55cd70f0ef4a478e14f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40944
This stack adds Work-level timeout for blocking wait.
This PR just changes the API to accept a default wait arg for the wait function in each ProcessGroup backend. The ProcessGroup superclass correctly waits for the given timeout by changing the CV wait to wait_for.
Closes: https://github.com/pytorch/pytorch/issues/37571
ghstack-source-id: 107835735
Test Plan: Tests in 4th PR in this stack
Reviewed By: jiayisuse
Differential Revision: D22107135
fbshipit-source-id: b38c07cb5e79e6c86c205e580336e7918ed96501
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39459
Update to this PR: this code isn't going to fully solve https://github.com/pytorch/pytorch/issues/37010. The changes required for 37010 is more than this PR initially planned. Instead, this PR switches op registration of rng related tests to use the new API (similar to what was done in #36925)
Test Plan:
1) unit tests
Imported from OSS
Reviewed By: ezyang
Differential Revision: D22264889
fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39492
This PR adds use_c10_dispatcher: full to ops taking TensorOptions. To allow this, since the c10 operator library doesn't know about TensorOptions, we need to register the operator kernels as optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead, and also call them this way.
Changes:
Add use_c10_dispatcher: full to those ops
Write hacky_wrapper_for_legacy_signatures which takes an old-style kernel (i.e. one written to take TensorOptions) an creates a wrapper kernel for it that takes the scattered optional<ScalarType>, optional<Device>, optional<Layout>, optional<bool> instead.
Change codegen so that all op registrations are wrapped into hacky_wrapper_for_legacy_signatures. This is added to all ops but is a no-op if the op doesn't take TensorOptions. This allows us in the future to just change a kernel signature from TensorOptions to the scattered version and have it work without having to touch codegen.
Change codegen so that the frontend calls those operators with expanded arguments instead of with a TensorOptions object. This is required because now the kernels are written in this way.
This PR does not remove TensorOptions special cases from codegen, but instead it separates kernels from the codegen/frontend issues. After this, kernels can be worked on separately without having to touch codegen and codegen can be worked on without having to touch kernels.
Codegen diff: P133121032
ghstack-source-id: 106426630
Test Plan: waitforsandcastle
Differential Revision: D21581908
fbshipit-source-id: 6d4a9f526fd70fae40581bf26f3ccf794ce6a89e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
ghstack-source-id: 106194240
Test Plan: waitforsandcastle
Differential Revision: D22126701
fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
supersedes D17485438
ghstack-source-id: 106178820
Test Plan: waitforsandcastle
Differential Revision: D21534052
fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API
Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36631
Summary of changes
1. Moved random transformation functions to DistributionHelper.h (`uniform_int_from_to_distribution`, `uniform_int_full_range_distribution`, `uniform_int_distribution`) to avoid code duplication between default CPU, CUDA rngs and custom rng extensions
2. Made GeneratorImpl fields protected instead of private
3. Introduced `TORCH_CHECK_IF_NOT_ON_CUDA` that does the same as `TORCH_CHECK` if it is not CUDA/ROCm device
4. To test multiple rng extensions I had to move ops registration to the method `registerOps()`, expose it to python and call it `def setUp(self)`
Test Plan: Imported from OSS
Differential Revision: D21229202
Pulled By: pbelevich
fbshipit-source-id: 6aa3280f2fc3324cf3e748388b5087e3a1e49f23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
m.class_<Class>("Class")
.def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258
Previous we had a && chaining style API. There are some downsides to
this API:
- It's easy to forget the 'static' qualifier in front, leading to
subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
need multiple levels of chaining. So in practice people end
up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
definitions for a library in one place, and to give a custom
namespace for it. Similarly, the old API wasn't very DRY, because
you had to keep repeating the namespace/dispatch key you
were writing implementations for.
The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:
```
TORCH_LIBRARY(aten, m) {
m.def("aten::add(Tensor self, Tensor other) -> Tensor");
...
}
```
in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program. Specification of the namespace is mandatory,
and in later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.
If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend. Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.
This needs updates to the mobile code analyzer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929257
Pulled By: ezyang
fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36232
The purpose of this PR is to replace `at::Generator generator = nullptr` with `c10::optional<at::Generator> = c10::nullopt` all over the code
* #36230 Replace std::shared_ptr with c10::intrusive_ptr in at::Generator
Test Plan: Imported from OSS
Differential Revision: D20943603
Pulled By: pbelevich
fbshipit-source-id: 65d335990f01fcc706867d5344e73793fad68ae6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36223
Previously #35714
There are a lot of unboxed only defs. We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
To get some coverage of _UNBOXED API for code analysis, I switched
one of our unboxed tests to be an impl rather than a def. This
shouldn't materially affect coverage.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929259
Pulled By: ezyang
fbshipit-source-id: 72d2061b6c8a6afbcd392b47f53ade18de2f9184
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36222
Reland of #35706, with fixes to code analyzer.
It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case. I then delete most uses of torch::dispatch
dispatch_autograd call sites can't make use of this overload. So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).
I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.
Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
a leading underscore and made them private, just to make it more
clear what the public versus private API were (the private API
shouldn't be used by users because it doesn't come with && overloads)
Note that this means I needed to adjust the regex in the
code analyzer, because
- In a few places where I was touching lines already, I replaced
full DispatchKey typed out enums with shorter kFoo names, similar
to kAutograd but I didn't publish these globally.
- Code analyzer now prints a unified diff, and in the other order
(because I tend to think of the diff as reporting how the /new/ result
is different)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929256
Pulled By: ezyang
fbshipit-source-id: c69b803d2b3a1a8aff70e14da33d3adec5239f13
Summary:
The original behavior of pytorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch is used to extend the c10d capability to support dynamically
loading 3rd party communication libraries which are derived from ProcessGroup base class.
related RFC is in: https://github.com/pytorch/pytorch/issues/27955
Through this way, user just need specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load corresponding
c10d backend cpp extension automatically. as for how to develop a new 3rd party c10d backend
through cpp extension, pls refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068
Differential Revision: D19174838
Pulled By: agolynski
fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35714
There are a lot of unboxed only defs. We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20775782
Pulled By: ezyang
fbshipit-source-id: c5e804c69f5961c9d4862f6c5dbbe4c524cc32cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35706
It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case. I then delete most uses of torch::dispatch
dispatch_autograd call sites can't make use of this overload. So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).
I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.
Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
a leading underscore and made them private, just to make it more
clear what the public versus private API were (the private API
shouldn't be used by users because it doesn't come with && overloads)
- In a few places where I was touching lines already, I replaced
full DispatchKey typed out enums with shorter kFoo names, similar
to kAutograd but I didn't publish these globally.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20775783
Pulled By: ezyang
fbshipit-source-id: e45b289e5d1f86c180b24cf14c63cf4459ab5337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35398
This disables namespaced c10::import which is broken with custom
mobile op builds. This is to help prevent people from accidentally
breaking the custom mobile build in a mysterious way; if they use
the longform version it will work. Fixing the analyzer is tracked
in https://github.com/pytorch/pytorch/issues/35397
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20680519
Pulled By: ezyang
fbshipit-source-id: a18ac8df7e72bf399807870beedb828131273e48
Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061 ; removed
the get qualified type name magic from debug strings to work around
MSVC 2017 bug.
Main points of the new API:
- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
initializers run, you end up with the same end result.
op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface
How does this implementation proceed? The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.
- DispatchKeyExtractor has an uninitialized state where it doesn't look
for dispatch keys in any arguments of the stack. It can have a
schema (de)registered to itself post facto with
registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
the uninitialized state. It can have a schema (de)registered to itself
post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs and well as defs_and_impls.
defs_and_impls keeps track of the outstanding impl registrations; you
may have impl registrations but no defs. If there are no defs (no
schema), the operator is not returned by findSchema. A new
findOperatorByName fucntion unconditionally returns the OperatorHandle
even if there's no schema. OperatorHandle::hasSchema can be used
to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
'registerDef' to only return a RegistrationHandleRAII. This is marginally
less efficient (since we're doing two hash table lookups on a registration
now), but this won't matter in the long term, and probably doesn't
matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
a bunch of places where we're improperly directly interfacing with Dispatcher;
we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
API. This includes VariableType registrations (which previously
weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
rather than indirecting through the old API
- We deleted alias analysis kind merging entirely. As a nod to BC, it's
possible to define a full schema with alias analysis kind, and then
later do another full schema def with missing alias analysis kind, but
the opposite direction is not allowed. We can remove this entirely
following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
be able to immediately schema match at the point of an impl() (because
we don't have the schema yet). To do this, we store the inferred
function schema inside a KernelEntry, so we can check it when we get
the real schema.
- Registered kernel functions now store a debug string which
can be used to more easily identify them. Tests use this to
distinguish between multiple distinct registrations; regular
invocations get only very basic information.
Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.
The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
so that we can easily write a more complex testing harness
using expect tests.
- For series of registrations we want to test, exhaustively
test every possible permutation of registrations (and
deregistrations), and show that the intermediate states
agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
debugging method that prints the internal state of the
dispatcher. This method may be generally useful for people
who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
checks that the internal invariants of the dispatcher are
upheld (so we don't have to print internal implementation
details of the dispatcher)
The testing framework found a few bugs in development. For example,
here is a case where we registered schema too early, before checking
if it was valid:
```
Traceback (most recent call last):
File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
], raises=True)
File "test/test_dispatch.py", line 135, in commute
results=results, raises=raises)
File "test/test_dispatch.py", line 83, in run_permutation
.format(ctor_order[:i], op_ix))
File "test/test_dispatch.py", line 59, in check_invariants
.format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
: expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```
There are also C++ smoketests for the API. These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)
Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:
- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
is a dict. The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
We now do namespacing by post facto fixing up the OperatorName
embedded in FunctionSchema. This also means that you can
now do torch::import("ns1").def("ns2::blah") and have the ns2
override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
annotation kinds. This meant we had to template up some function
signatures which previously took const char*. There's now a nice
comment explaining this strategy.
- torch::import now takes std::string which means we can use
the namespacing from Python
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629
Differential Revision: D20724551
Pulled By: ezyang
fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061
Main points of the new API:
- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
initializers run, you end up with the same end result.
op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface
How does this implementation proceed? The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.
- DispatchKeyExtractor has an uninitialized state where it doesn't look
for dispatch keys in any arguments of the stack. It can have a
schema (de)registered to itself post facto with
registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
the uninitialized state. It can have a schema (de)registered to itself
post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs and well as defs_and_impls.
defs_and_impls keeps track of the outstanding impl registrations; you
may have impl registrations but no defs. If there are no defs (no
schema), the operator is not returned by findSchema. A new
findOperatorByName fucntion unconditionally returns the OperatorHandle
even if there's no schema. OperatorHandle::hasSchema can be used
to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
'registerDef' to only return a RegistrationHandleRAII. This is marginally
less efficient (since we're doing two hash table lookups on a registration
now), but this won't matter in the long term, and probably doesn't
matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
a bunch of places where we're improperly directly interfacing with Dispatcher;
we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
API. This includes VariableType registrations (which previously
weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
rather than indirecting through the old API
- We deleted alias analysis kind merging entirely. As a nod to BC, it's
possible to define a full schema with alias analysis kind, and then
later do another full schema def with missing alias analysis kind, but
the opposite direction is not allowed. We can remove this entirely
following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
be able to immediately schema match at the point of an impl() (because
we don't have the schema yet). To do this, we store the inferred
function schema inside a KernelEntry, so we can check it when we get
the real schema.
- Registered kernel functions now store a debug string which
can be used to more easily identify them. There's some best
effort stuff based on __FUNCSIG__ but this is only really
capable of reporting types and not function symbols. Tests
use this to distinguish between multiple distinct registrations.
Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.
The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
so that we can easily write a more complex testing harness
using expect tests.
- For series of registrations we want to test, exhaustively
test every possible permutation of registrations (and
deregistrations), and show that the intermediate states
agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
debugging method that prints the internal state of the
dispatcher. This method may be generally useful for people
who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
checks that the internal invariants of the dispatcher are
upheld (so we don't have to print internal implementation
details of the dispatcher)
The testing framework found a few bugs in development. For example,
here is a case where we registered schema too early, before checking
if it was valid:
```
Traceback (most recent call last):
File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
], raises=True)
File "test/test_dispatch.py", line 135, in commute
results=results, raises=raises)
File "test/test_dispatch.py", line 83, in run_permutation
.format(ctor_order[:i], op_ix))
File "test/test_dispatch.py", line 59, in check_invariants
.format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
: expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```
There are also C++ smoketests for the API. These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)
Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:
- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
is a dict. The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
We now do namespacing by post facto fixing up the OperatorName
embedded in FunctionSchema. This also means that you can
now do torch::import("ns1").def("ns2::blah") and have the ns2
override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
annotation kinds. This meant we had to template up some function
signatures which previously took const char*. There's now a nice
comment explaining this strategy.
- torch::import now takes std::string which means we can use
the namespacing from Python
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20680520
Pulled By: ezyang
fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34774
This PR provides pybind11's `type_caster<at::Generator>` that allows mapping `at::Generator` instance returned from user-defined method to python `torch::Generator`, defined as `THPGenerator ` c++ class.
This allows 1) defining custom RNG in c++ extension 2) using custom RNG in python code.
`TestRNGExtension.test_rng` shows how to use custom RNG defined in `rng_extension.cpp`
Test Plan: Imported from OSS
Differential Revision: D20549451
Pulled By: pbelevich
fbshipit-source-id: 312a6deccf8228f7f60695bbf95834620d52f5eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093
In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites. With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it
any of the other sites.
In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything). Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19837897
Pulled By: ezyang
fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension
cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669
Differential Revision: D20033893
Pulled By: zou3519
fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495
Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.
The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.
User-facing changes
------------------------------
I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations we cannot use ninja: Windows (NYI, I'll open a new issue
for this), if ninja cannot be found on the system.
Implementation Details
------------------------------
This PR makes this change in two steps. Please me know if it would be
easier to review this if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.
Change 1: refactor _write_ninja_file to seperate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build
Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_files_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
Test Plan
------------------------------
Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.
PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.
Test Plan: Imported from OSS
Differential Revision: D19730432
Pulled By: zou3519
fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32704
-Werror is too aggressive check for test cpp extensions because it fails even on deprecation warnings which is are included from core codebase.
Fixes#32136
Test Plan: Imported from OSS
Differential Revision: D19620190
Pulled By: pbelevich
fbshipit-source-id: 0e91566eb5de853559bb59e68a02b0bb15e7341b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28468
We don't need this anymore.
ghstack-source-id: 92595388
Test Plan: unit tests
Differential Revision: D18073339
fbshipit-source-id: d0ef1332c83e47117fe0a5eadc8faedb259cfba0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28208
Backend extensions should call torch::RegisterOperators, not globalATenDispatch().
If the op is still on globalATenDispatch, then torch::RegisterOperators will do the right thing and forward it to globalATenDispatch.
ghstack-source-id: 92436988
Test Plan: waitforsandcastle
Differential Revision: D17975369
fbshipit-source-id: 0d4bd5e4e5b86e6dcfba527a7d11c25508896ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17284083
Pulled By: ezyang
fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131
Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.
Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely with undefined tensor sometimes being undefined tensor and sometimes being None.
- fixed-size arrays like `int[3]` not supported in c10 yet
These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748
Test Plan: a diff stacked on top uses these registrations to call these ops from ATen
Differential Revision: D16603131
fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25252
Our model going forward for extensions will be that you will have to
get an allocation of an ID in our system. This is how things work
in practice today; we're just simplifying our underlying registration
since there is no need to have distributed registration.
There are some codemods in this diff:
```
codemod --extensions cpp,h,cc,cuh,py,in --exclude-paths=c10/core/TensorTypeId.h '([A-Za-z]+?)TensorId\(\)' 'TensorTypeId::\1TensorId'
codemod --extensions cpp,h,cc,cuh,py,in 'TensorTypeIds::undefined\(\)' 'TensorTypeId::UndefinedTensorId'
codemod --extensions cpp 'TensorType1\(\)' 'TensorTypeId::CPUTensorId'
codemod --extensions cpp 'TensorType2\(\)' 'TensorTypeId::CUDATensorId'
codemod --extensions cpp 'TensorType3\(\)' 'TensorTypeId::XLATensorId'
codemod --extensions cpp 'TensorType1' 'CPUTensorId'
codemod --extensions cpp 'TensorType2' 'CUDATensorId'
codemod --extensions cpp 'TensorType3' 'XLATensorId'
```
The main hand-written changes are in c10/core/TensorTypeId.h
Other manual fixes:
- aten/src/ATen/core/op_registration/op_registration.cpp - stop using
std::string operator+
- aten/src/ATen/function_wrapper.py - handle a hardcoded TypeId() that
wasn't caught by codemod
- torch/csrc/tensor/python_tensor.h - fix now incorrect forward declaration
of TensorTypeId
- aten/src/ATen/core/op_registration/ - remove out-of-line registration
Differential Revision: D17072001
Test Plan: ossci and sandcastle
Pulled By: ezyang
fbshipit-source-id: c641515fd0604c045c54fbb1d6b1b950f45e89d1
Summary:
This PR adds deprecation message for `tensor.data<T>()` (91d94e7d41), and changes all call sites of `tensor.data<T>()` to `tensor.data_ptr<T>()` in PyTorch core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24886
Differential Revision: D16924576
Pulled By: yf225
fbshipit-source-id: 0943d6be73245c7c549c78597b74c3b07fa24440
Summary:
We need this to be able to register them with the c10 dispatcher.
The overload names are based on one-letter-per-argument-type.
Script used to change native_functions.yaml and derivatives.yaml: P75630718
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23532
ghstack-source-id: 87539687
Differential Revision: D16553437
fbshipit-source-id: a1d0f10c42d284eba07e2a40641f71baa4f82ecf
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412
Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but seems to currently be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456
Differential Revision: D16097159
Pulled By: nairbv
fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```
[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088
Reviewed By: ailzhang
Differential Revision: D15944971
Pulled By: nairbv
fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove `Variable.data()` API
3. Add `Variable.variable_data()` that matches `tensor.data` in Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.
After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as its `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.
**Note that this PR is BC-breaking in the following use cases:**
**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.
**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
# Change gradient to a sparse tensor
params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))
grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad) # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957