Commit Graph

69 Commits

Author SHA1 Message Date
Yuanyuan Chen
e2dc32f4ba Replace decltype(auto) with auto (#166537)
This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates.
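As an illustration of why such a replacement is usually safe (a sketch, not code from the PR): `auto` return deduction strips references and cv-qualifiers, while `decltype(auto)` preserves the exact type of the returned expression, so the two only differ when a function intends to return a reference.

```
#include <vector>

std::vector<int> v{1, 2, 3};

auto get_copy() { return v[0]; }           // deduces int: a copy
decltype(auto) get_ref() { return v[0]; }  // deduces int&: a reference

// For prvalue returns the deduced types coincide, so swapping
// decltype(auto) for auto does not change behavior there:
auto sum(int a, int b) { return a + b; }   // int either way
```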

Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537
Approved by: https://github.com/Skylion007
2025-11-01 00:30:23 +00:00
Yuanyuan Chen
35153d0846 Simplify c10::guts::apply (#164566)
Only one call site of `c10::guts::apply` remains, and it can be replaced by `std::apply` except on ROCm. This PR therefore simplifies the implementation of `c10::guts::apply`.
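For reference, a minimal sketch of the standard facility that replaces it: `std::apply` (C++17) unpacks a tuple into a callable's argument list.

```
#include <cstdio>
#include <tuple>

int main() {
  auto add3 = [](int a, int b, int c) { return a + b + c; };
  std::tuple<int, int, int> args{1, 2, 3};
  std::printf("%d\n", std::apply(add3, args));  // prints 6
  return 0;
}
```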
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164566
Approved by: https://github.com/Aidyn-A, https://github.com/albanD
2025-10-22 00:47:43 +00:00
Richard Barnes
d428d81c7f Remove some pre-cpp17 stuff (#138410)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138410
Approved by: https://github.com/Skylion007
2024-10-23 00:38:03 +00:00
cyy
1b182ea0d2 Remove c10::guts::{conjunction,disjunction} (#127726)
They are not used in PyTorch OSS.
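Since C++17 the standard library provides these traits directly; a minimal sketch of the replacements:

```
#include <type_traits>

template <typename... Ts>
using all_integral = std::conjunction<std::is_integral<Ts>...>;  // logical AND over traits

template <typename... Ts>
using any_pointer = std::disjunction<std::is_pointer<Ts>...>;    // logical OR over traits

static_assert(all_integral<int, long, char>::value, "");
static_assert(any_pointer<int, float*>::value, "");
static_assert(std::negation<std::is_pointer<int>>::value, "");   // logical NOT
```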

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127726
Approved by: https://github.com/ezyang
2024-06-03 04:06:21 +00:00
cyy
0a9d73a814 Remove c10::guts::bool_constant and c10::guts::negation (#127300)
They are not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127300
Approved by: https://github.com/r-barnes
2024-05-28 19:55:20 +00:00
cyy
d6e3e89804 Remove c10::void_t (#127248)
The OSS version doesn't use it anymore.
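The standard replacement is `std::void_t` (C++17), typically used for SFINAE-based member detection; a minimal sketch:

```
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: assume no .size() member.
template <typename T, typename = void>
struct has_size : std::false_type {};

// Partial specialization selected only when T has a callable .size().
template <typename T>
struct has_size<T, std::void_t<decltype(std::declval<T>().size())>>
    : std::true_type {};

static_assert(has_size<std::vector<int>>::value, "");
static_assert(!has_size<int>::value, "");
```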

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127248
Approved by: https://github.com/ezyang
2024-05-28 06:59:20 +00:00
cyy
57000708fc Remove c10::invoke_result (#127160)
Following #124169, it can be safely removed from the OSS version.
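For reference, the C++17 standard trait that takes its place:

```
#include <type_traits>

int f(double);                           // a plain function
auto g = [](int x) { return x * 0.5; };  // a lambda

static_assert(std::is_same_v<std::invoke_result_t<decltype(f), double>, int>, "");
static_assert(std::is_same_v<std::invoke_result_t<decltype(g), int>, double>, "");
```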
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127160
Approved by: https://github.com/ezyang
2024-05-28 01:39:28 +00:00
Jeff Daily
c11bd724fe [ROCm] replace ROCmLoops.cuh with hipified CUDALoops.cuh (#120101)
The intent of this change was to minimize code differences between CUDA and ROCm while maintaining or improving performance.

Verified new performance using pytorch/benchmarks/operator_benchmark.

```
python -u -m pt.unary_test --tag-filter all --device cuda
python -u -m pt.binary_test --tag-filter all --device cuda
```

On MI200 this improved performance by 3% on average.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120101
Approved by: https://github.com/albanD
2024-02-22 21:57:36 +00:00
Nikita Shulga
3ad067fe2b [CPP] Update GCC minversion check to 9 or newer (#120126)
It's already a requirement for building PyTorch, but it should also be a requirement for linking extensions with it, as a mismatch can lead to runtime crashes: the `std::optional` template layout is incompatible between gcc-9 and older compilers.

Also, update the minimum supported clang version to 9.x (used to build Android), as clang-5 is clearly not C++17 compliant.

Fixes https://github.com/pytorch/pytorch/issues/120020

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120126
Approved by: https://github.com/Skylion007
2024-02-19 22:05:00 +00:00
cyy
a9953a5ef3 Remove unused c10/util/C++17.h inclusion and outdated checks (#120149)
This is continued work to clean up pre-C++17 code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120149
Approved by: https://github.com/ezyang
2024-02-17 14:28:17 +00:00
cyy
e61c8ef3aa Simplify c10::is_pod implementation and remove unneeded inclusion of C++17.h (#118212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118212
Approved by: https://github.com/albanD
2024-02-17 00:14:09 +00:00
cyy
c3780010a5 Remove calls of c10::guts::void_t (#117942)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117942
Approved by: https://github.com/Skylion007
2024-01-22 06:12:37 +00:00
cyy
3baade4425 Remove calls of c10::guts::conjunction, c10::guts::disjunction, c10::guts::negation (#117926)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117926
Approved by: https://github.com/Skylion007
2024-01-22 00:35:42 +00:00
Nikita Shulga
53e32d12c4 [c10] Use nested namespace in c10/cuda (#116464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116464
Approved by: https://github.com/Skylion007
2023-12-27 23:14:00 +00:00
cyy
99f222372b [5/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115354)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115354
Approved by: https://github.com/Skylion007
2023-12-09 17:16:04 +00:00
PyTorch MergeBot
1427b8149c Revert "Eliminate c10::guts::make_unique_base (#109429)"
This reverts commit 6b1a15d1bb.

Reverted https://github.com/pytorch/pytorch/pull/109429 on behalf of https://github.com/clee2000 due to Sorry it's me again, I'm getting that this caused an instruction count regression internally ([comment](https://github.com/pytorch/pytorch/pull/109429#issuecomment-1725923294))
2023-09-19 15:47:00 +00:00
cyy
6b1a15d1bb Eliminate c10::guts::make_unique_base (#109429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109429
Approved by: https://github.com/Skylion007
2023-09-17 00:04:09 +00:00
cyy
e4f3e5434f [Reland] Eliminates c10::guts::to_string (#108748)
Reland of PR #108480, after relanding another blocking PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108748
Approved by: https://github.com/huydhn
2023-09-07 13:35:17 +00:00
PyTorch MergeBot
8da04e023e Revert "Eliminate c10::guts::to_string (#108480)"
This reverts commit 4146be192e.

Reverted https://github.com/pytorch/pytorch/pull/108480 on behalf of https://github.com/huydhn due to Sorry for reverting this, but this is needed to keep trunk green after https://github.com/pytorch/pytorch/pull/108479 was reverted.  Both will need to be relanded ([comment](https://github.com/pytorch/pytorch/pull/108480#issuecomment-1707067595))
2023-09-05 18:04:53 +00:00
cyy
4146be192e Eliminate c10::guts::to_string (#108480)
This PR replaces `c10::guts::to_string` with `std::to_string`. The major part of the change is using `void*` as the optimizer state key, since the string was used only for serialization and pointers are more efficient hashing keys than strings.
Some other guts functions in the affected source files are also replaced.
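A hypothetical sketch of that design choice (type and variable names invented here, not the PR's code): key the per-parameter state by address and only produce strings when serializing.

```
#include <string>
#include <unordered_map>

struct ParamState { long step = 0; };  // stand-in for real optimizer state

// Before: keyed by a serialized name; every lookup hashes a string.
std::unordered_map<std::string, ParamState> state_by_name;

// After: keyed by the parameter's address; hashing a pointer is O(1),
// and the string form is only needed at (de)serialization time.
std::unordered_map<void*, ParamState> state_by_ptr;
```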
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480
Approved by: https://github.com/Skylion007
2023-09-04 08:12:53 +00:00
cyy
30e2764221 remove c10::guts::{max,min} (#102952)
Because we have enabled C++17, and std::{max,min} are required to be constexpr since C++14 according to [cppreference](https://en.cppreference.com/w/cpp/algorithm/max), we can safely remove them.
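A one-liner showing why the guts versions are redundant: the standard functions are usable in constant expressions since C++14.

```
#include <algorithm>

constexpr int kBlockSize = std::max(64, 128);  // constexpr since C++14
static_assert(kBlockSize == 128, "");
```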
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102952
Approved by: https://github.com/Skylion007
2023-06-06 05:40:30 +00:00
Scott Wolchok
99f68d56ee [PyTorch] Delete c10::guts::if_constexpr (#101991)
Now that we have C++17, we should not need this any more.
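A minimal sketch of the native replacement: C++17 `if constexpr` only instantiates the taken branch, so the two branches may even return different types.

```
#include <type_traits>

template <typename T>
auto describe(T value) {
  if constexpr (std::is_integral_v<T>) {
    return value * 2;    // instantiated only for integral T
  } else {
    return value + 0.5;  // instantiated only for everything else
  }
}
```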

Differential Revision: [D46078335](https://our.internmc.facebook.com/intern/diff/D46078335/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101991
Approved by: https://github.com/r-barnes, https://github.com/Skylion007
2023-05-23 23:19:35 +00:00
Richard Barnes
bcb4444cec PyTorch -> C++17 (#98209) (#100557)
🤖 Generated by Copilot at 4f0b524:

This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557
Approved by: https://github.com/malfet
2023-05-19 00:49:08 +00:00
PyTorch MergeBot
da02ccc60e Revert "PyTorch -> C++17 (#98209) (#100557)"
This reverts commit 083f88e126.

Reverted https://github.com/pytorch/pytorch/pull/100557 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100557#issuecomment-1543285863))
2023-05-11 03:43:11 +00:00
Richard Barnes
083f88e126 PyTorch -> C++17 (#98209) (#100557)
🤖 Generated by Copilot at 4f0b524:

This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557
Approved by: https://github.com/malfet
2023-05-10 04:47:35 +00:00
Catherine Lee
2ec6eb3d09 Revert "PyTorch -> C++17 (#98209)" (#100497)
This reverts commit 8f0c825d36.

See https://github.com/pytorch/pytorch/pull/98209#issuecomment-1532099965; cannot revert normally due to an unmerged linked diff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100497
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-05-02 21:22:31 +00:00
Richard Barnes
8f0c825d36 PyTorch -> C++17 (#98209)
This diff locks in C++17 as the minimum standard with which PyTorch can be compiled.

This makes it possible to use all C++17 features in PyTorch.

This breaks backward compatibility in the sense that users with older compilers may find that their compilers are no longer sufficient for the job.

Summary: #buildmore

Differential Revision: D44356879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98209
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/PaliC
2023-05-02 19:41:50 +00:00
PyTorch MergeBot
befe3b68de Revert "Clean up C++14 code (#92216)"
This reverts commit dfbdfb276e.

Reverted https://github.com/pytorch/pytorch/pull/92216 on behalf of https://github.com/atalman due to fails internal build
2023-01-18 21:24:23 +00:00
cyy
dfbdfb276e Clean up C++14 code (#92216)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92216
Approved by: https://github.com/ezyang
2023-01-18 08:14:54 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build-system-related changes; invasive functional ones are to follow.
Among many expected tweaks to the build system, here are few unexpected ones:
 - Force the onnx_proto project to be updated to C++17 to avoid `duplicate symbols` errors when compiled by gcc-7.5.0, as the storage rule for `constexpr` changed in C++17 but gcc does not seem to follow it (see the sketch after this list)
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move` -> `::std::move`, as VC++ for some reason claims that the `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.
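A sketch of the storage-rule change behind the first bullet (illustrative, not PyTorch code): since C++17, a `static constexpr` data member is implicitly `inline`, so its definition can live in a header included from many translation units.

```
struct Config {
  // C++17: implicitly inline; safe to define in a header.
  static constexpr int kVersion = 17;
};

// Pre-C++17, odr-using kVersion additionally required an out-of-line
// definition in exactly one .cpp file:
//   constexpr int Config::kVersion;
```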

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Lukas N Wirz
301d9c0556 Remove deprecated usage of is_pod/is_pod_v (#88918)
… as equivalent replacements for std::is_pod and std::is_pod_v because they are deprecated in C++20.

When consuming libtorch header files in a project that uses C++20, there are warnings about std::is_pod being deprecated.  This patch fixes that issue.
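The standard describes a POD type as one that is both trivial and standard-layout, so a sketch of the C++20-safe equivalent check looks like:

```
#include <string>
#include <type_traits>

template <typename T>
constexpr bool is_pod_v =
    std::is_trivial_v<T> && std::is_standard_layout_v<T>;

static_assert(is_pod_v<int>, "");
static_assert(!is_pod_v<std::string>, "");
```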

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88918
Approved by: https://github.com/ezyang
2022-12-05 16:50:00 +00:00
Vasu Agrawal
00a1065286 [pytorch] Inline std::forward definition (#85255)
Summary: Alternative (probably better) solution to the problem laid out in D39562394.

Test Plan: CI should be green.

Differential Revision: D39612710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85255
Approved by: https://github.com/ezyang
2022-09-20 17:15:59 +00:00
Lukas N Wirz
5af48581b5 In order to make pytorch headers consumable from cpp20 code bases, … (#79985)
… all instances of `std::result_of` and `std::result_of_t` are conditionally replaced by `std::invoke_result` and `std::invoke_result_t` if `__cpp_lib_is_invocable >= 201703L`. `std::invoke_result` was only introduced in C++17, so it should probably not be required yet.
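A sketch of the conditional pattern described here (not the exact PR code):

```
#include <type_traits>

#if defined(__cpp_lib_is_invocable) && __cpp_lib_is_invocable >= 201703L
// C++17 and later: std::result_of is deprecated and later removed.
template <typename F, typename... Args>
using invoke_result_t = std::invoke_result_t<F, Args...>;
#else
// Pre-C++17 fallback; note the F(Args...) spelling result_of requires.
template <typename F, typename... Args>
using invoke_result_t = typename std::result_of<F(Args...)>::type;
#endif
```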

Fixes #71657  and a small part of #69290

Tested on CentOS 7 / gcc11 plus a private project that requires C++20.

I think the main questions for a maintainer to check are:
- whether my choices of preprocessor blocks are appropriate
- whether there are any very subtle differences between std::result_of and std::invoke_result that I have missed
- whether in any of the replacements  the 'new' side can/should be simplified further

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79985
Approved by: https://github.com/ezyang
2022-07-04 20:14:36 +00:00
jason_w
f42202d26c 'typename Base' is checked repeatedly (#72842)
Summary:
'typename Base' is checked repeatedly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72842

Reviewed By: albanD

Differential Revision: D34481951

Pulled By: swolchok

fbshipit-source-id: bd07fb87540397fd2f1829a8d0dad167c6a3c6d0
(cherry picked from commit e63081c469b2073c458c3a4a9530bcc08025c3f7)
2022-03-01 20:34:14 +00:00
Pruthvi Madugundu
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION.

- In the next PR:
   - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
Scott Wolchok
44cc873fba [PyTorch] Autoformat c10 (#56830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830

Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.

Test Plan: CI

Reviewed By: zertosh

Differential Revision: D27979080

fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
2021-04-30 21:23:28 -07:00
skyline75489
cdac61ecd4 Prevent VS from emitting ambiguous symbol errors (third time) (#53490)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/53409

First: https://github.com/pytorch/pytorch/issues/15697
Second: https://github.com/pytorch/pytorch/issues/17863

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53490

Reviewed By: VitalyFedyunin

Differential Revision: D26946687

Pulled By: mrshenli

fbshipit-source-id: 27f85abecbb75456354cc0373529c8cadc8133bd
2021-03-11 13:51:41 -08:00
Chester Liu
8177f63c91 Reorganize and refine the Windows.h import in C++ files (#48009)
Summary:
This PR aims to reduce the import overhead and symbol noise from the `windows.h` headers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009

Reviewed By: gchanan

Differential Revision: D25045840

Pulled By: ezyang

fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90
2020-11-20 14:21:09 -08:00
Basil Hosmer
6b94830cdc faithful signature support in BoxedKernelWrapper (#47267)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47267

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24701488

Pulled By: bhosmer

fbshipit-source-id: dbce246319670f9590c5762ad20c26cb24575fe8
2020-11-10 13:58:36 -08:00
Sebastian Messmer
63c3b89c1c Simplify code with decltype(auto) (#30922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922

A new C++14 feature we can use now.
ghstack-source-id: 103767403

Test Plan: waitforsandcastle

Differential Revision: D18869644

fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc
2020-05-11 21:31:18 -07:00
Sebastian Messmer
77d8a44802 If we're building on C++17, use actual "if constexpr" (#38154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38154

This should give better error messages and shorter stack traces on C++17 builds (e.g. fbcode)
ghstack-source-id: 103775564

Test Plan: waitforsandcastle

Differential Revision: D21483327

fbshipit-source-id: 184d1f9c0543bf43dc9713fa97fcc5955e7be319
2020-05-11 12:22:19 -07:00
Sebastian Messmer
379e717a1b Back out "Revert D18927220: if_constexpr for C++14" (#37792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37792

Original commit changeset: a1b8755a2790
ghstack-source-id: 103609715

Test Plan: waitforsandcastle

Differential Revision: D21389755

fbshipit-source-id: 1a3c74295dbfbf07fe225be9bcd47d11e31a20fa
2020-05-07 15:20:55 -07:00
Mike Ruberry
b428f454e1 Revert D18927220: if_constexpr for C++14
Test Plan: revert-hammer

Differential Revision:
D18927220

Original commit changeset: 19a135e00af6

fbshipit-source-id: a1b8755a27903b98b742881b3ecce4f5e99543b2
2020-04-26 04:27:53 -07:00
Sebastian Messmer
f5e6f1f333 if_constexpr for C++14 (#31091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091

This implements a C++17 "if constexpr"-like feature for C++14.
This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition.
PRs stacked on top will use this to simplify some of our template metaprogramming.
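One common C++14 emulation of the idea (a sketch under my own assumptions, not necessarily the exact c10::guts implementation): tag-dispatch on the compile-time condition so only the selected callback runs. Branches that would not compile when not taken are typically written as generic lambdas so the untaken body is never instantiated.

```
#include <type_traits>
#include <utility>

template <typename ThenCb, typename ElseCb>
decltype(auto) if_constexpr_impl(std::true_type, ThenCb&& then_cb, ElseCb&&) {
  return std::forward<ThenCb>(then_cb)();
}

template <typename ThenCb, typename ElseCb>
decltype(auto) if_constexpr_impl(std::false_type, ThenCb&&, ElseCb&& else_cb) {
  return std::forward<ElseCb>(else_cb)();
}

template <bool Condition, typename ThenCb, typename ElseCb>
decltype(auto) if_constexpr(ThenCb&& then_cb, ElseCb&& else_cb) {
  return if_constexpr_impl(std::integral_constant<bool, Condition>{},
                           std::forward<ThenCb>(then_cb),
                           std::forward<ElseCb>(else_cb));
}

// Usage: only the first lambda is invoked.
int selected = if_constexpr<true>([] { return 1; }, [] { return 2; });
```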
ghstack-source-id: 102867141

Test Plan: unit tests

Differential Revision: D18927220

fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f
2020-04-25 11:31:51 -07:00
Sebastian Messmer
2fa51dde28 Remove unnecessary tensor copies (#33732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732

Move and forward instead of copying.
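A minimal illustration of the pattern (invented names, not the PR's code): forwarding through a wrapper preserves value categories, so lvalues pass by reference and rvalues move instead of being copied on every hop.

```
#include <utility>

struct Tensor { /* refcounted handle in the real code */ };

Tensor add_impl(const Tensor& a, const Tensor& b);

template <typename... Args>
Tensor call_add(Args&&... args) {
  // std::forward keeps each argument's value category intact, avoiding
  // the refcount bump a pass-by-value parameter would incur.
  return add_impl(std::forward<Args>(args)...);
}
```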

Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
ghstack-source-id: 99161486

Test Plan: benchmarks

Differential Revision: D20082642

fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847
2020-02-28 14:47:04 -08:00
Xiang Gao
f62f1b2ef0 Revert "Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to … (#33553)
Summary:
…have different argument types"

This reverts commit 05fb160048.

Please go to https://github.com/pytorch/pytorch/pull/33558 and check the CUDA9 on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33553

Differential Revision: D20017575

Pulled By: ngimel

fbshipit-source-id: a5fd78eea00c7b0925ab21fd90a7daeb66725f1a
2020-02-21 14:56:30 -08:00
Vitaly Fedyunin
05fb160048 Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to have different argument types
Test Plan: revert-hammer

Differential Revision:
D19964089

Original commit changeset: a1e8e62d1ebc

fbshipit-source-id: fee9423d5924714f0e92eea712cde2d2163b3cf0
2020-02-20 08:19:21 -08:00
Gao, Xiang
1fe635be3c Allow vectorized gpu loop to have different argument types (#33222)
Summary:
Although currently the only user of GPU loops that has args with different dtypes is `where`, it sounds strange to restrict the args to have the same dtype. Allowing args to have different dtypes also makes it possible for me to clean up legacy code by reusing current code to implement unrolled GPU loop for non-contiguous tensors.

The stack storage of `elementwise_kernel_helper` is changed from `arg_t args[nt][arity]` to `traits::ArgsTuple args[nt]`. Due to this change, we can no longer get elements by `operator[]`; instead we use `std::get`. As a result, we can no longer unroll the loop over arity with a pragma; we have to create a `static_unroll` that uses template metaprogramming to do the same job.

A good side effect of this change is that `invoke_with_array` is no longer needed and can be replaced with the already existing `c10::guts::apply`. We don't need the `namespace arg_type` workaround either. This makes the code less ugly.

The same approach might also work for ROCm loops, but I didn't change anything on ROCm in this PR, because I don't want potential compilation errors or perf regressions to delay it. After this gets merged, I will try it on ROCm and send a separate PR to make the code diverge less if the same approach trivially applies ("trivially applies" meaning a mindless copy-paste doesn't introduce unexpected compilation errors or perf regressions).
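A simplified sketch of the `static_unroll` technique described above (not the exact PyTorch code): recursive template instantiation stands in for `#pragma unroll`, handing each iteration a compile-time index that `std::get` can consume.

```
#include <cstdio>
#include <tuple>
#include <type_traits>
#include <utility>

template <int Current, int End>
struct static_unroll {
  template <typename Func>
  static void with_args(Func&& f) {
    f(std::integral_constant<int, Current>{});  // compile-time index
    static_unroll<Current + 1, End>::with_args(std::forward<Func>(f));
  }
};

template <int End>
struct static_unroll<End, End> {  // base case terminates the recursion
  template <typename Func>
  static void with_args(Func&&) {}
};

int main() {
  std::tuple<int, int, int> t{10, 20, 30};
  int sum = 0;
  static_unroll<0, 3>::with_args([&](auto idx) {
    sum += std::get<idx.value>(t);  // std::get needs a constant index
  });
  std::printf("%d\n", sum);  // prints 60
  return 0;
}
```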

Assembly (https://github.com/zasdfgbnm/things/blob/master/2020Q1/disassembly-elementwise-vec.ipynb#33222):
```
**Symbol:**
void at::native::modern::elementwise_kernel<4, 64, 4, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda(float, float)#1}, at::detail::Array<char*, 3> >(int, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda(float, float)#1}, at::detail::Array<char*, 3>)

**ASM:**

	.section	.text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,"ax",progbits
	.sectioninfo	@"SHI_REGISTERS=20"
	.align	128
        .global         _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_
        .type           _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,function
        .size           _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,(.L_40520 - _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_)
        .other          _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,@"STO_CUDA_ENTRY STV_DEFAULT"
_ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_:
.text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 253
        /*0000*/                   IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
        /*0010*/              @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ;
        /*0020*/                   S2R R9, SR_CTAID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 39
        /*0030*/                   S2R R0, SR_TID.X ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 253
        /*0040*/                   IMAD.SHL.U32 R9, R9, 0x100, RZ ;
        /*0050*/                   IADD3 R5, -R9, c[0x0][0x160], RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227
        /*0060*/                   SHF.R.S32.HI R17, RZ, 0x1f, R9 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 255
        /*0070*/                   ISETP.GE.AND P0, PT, R5, 0x100, PT ;
        /*0080*/              @!P0 BRA `(.L_2919) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227
        /*0090*/                   IMAD.SHL.U32 R12, R9.reuse, 0x4, RZ ;
        /*00a0*/                   SHF.L.U64.HI R17, R9, 0x2, R17 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 229
        /*00b0*/                   IADD3 R8, P0, R12.reuse, c[0x0][0x188], RZ ;
        /*00c0*/                   IADD3 R2, P1, R12, c[0x0][0x190], RZ ;
        /*00d0*/                   IADD3.X R9, R17.reuse, c[0x0][0x18c], RZ, P0, !PT ;
        /*00e0*/                   IADD3.X R3, R17, c[0x0][0x194], RZ, P1, !PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 82
        /*00f0*/                   IMAD.WIDE R8, R0, 0x10, R8 ;
        /*0100*/                   IMAD.WIDE R2, R0, 0x10, R2 ;
        /*0110*/                   LDG.E.128.SYS R8, [R8] ;
        /*0120*/                   LDG.E.128.SYS R4, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227
        /*0130*/                   IADD3 R12, P0, R12, c[0x0][0x180], RZ ;
        /*0140*/                   IADD3.X R13, R17, c[0x0][0x184], RZ, P0, !PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102
        /*0150*/                   IMAD.WIDE R12, R0, 0x10, R12 ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*0160*/                   FFMA R7, R7, c[0x0][0x168], R11 ;
        /*0170*/                   FFMA R6, R6, c[0x0][0x168], R10 ;
        /*0180*/                   FFMA R5, R5, c[0x0][0x168], R9 ;
        /*0190*/                   FFMA R4, R4, c[0x0][0x168], R8 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102
        /*01a0*/                   STG.E.128.SYS [R12], R4 ;
        /*01b0*/                   EXIT ;
.L_2919:
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*01c0*/                   ISETP.GE.AND P0, PT, R0, R5, PT ;
        /*01d0*/                   BMOV.32.CLEAR RZ, B0 ;
        /*01e0*/                   BSSY B0, `(.L_2920) ;
        /*01f0*/                   IMAD.MOV.U32 R4, RZ, RZ, RZ ;
        /*0200*/                   CS2R R6, SRZ ;
        /*0210*/                   IMAD.MOV.U32 R8, RZ, RZ, RZ ;
        /*0220*/                   IMAD.MOV.U32 R10, RZ, RZ, RZ ;
        /*0230*/               P0 BRA `(.L_2921) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*0240*/                   IADD3 R3, P1, R9, R0, RZ ;
        /*0250*/                   LEA.HI.X.SX32 R6, R0, R17, 0x1, P1 ;
        /*0260*/                   LEA R2, P1, R3, c[0x0][0x188], 0x2 ;
        /*0270*/                   LEA.HI.X R3, R3, c[0x0][0x18c], R6, 0x2, P1 ;
        /*0280*/                   LDG.E.SYS R10, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0290*/                   IADD3 R6, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*02a0*/                   ISETP.GE.AND P1, PT, R6, R5, PT ;
        /*02b0*/               P1 BRA `(.L_2922) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*02c0*/                   LDG.E.SYS R6, [R2+0x100] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*02d0*/                   IADD3 R8, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*02e0*/                   ISETP.GE.AND P1, PT, R8, R5, PT ;
        /*02f0*/               P1 BRA `(.L_2923) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0300*/                   IADD3 R8, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*0310*/                   ISETP.GE.AND P1, PT, R8, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*0320*/                   LDG.E.SYS R8, [R2+0x200] ;
        /*0330*/              @!P1 LDG.E.SYS R7, [R2+0x300] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102
        /*0340*/               P1 IMAD.MOV.U32 R7, RZ, RZ, RZ ;
        /*0350*/                   BRA `(.L_2921) ;
.L_2923:
        /*0360*/                   IMAD.MOV.U32 R7, RZ, RZ, RZ ;
        /*0370*/                   IMAD.MOV.U32 R8, RZ, RZ, RZ ;
        /*0380*/                   BRA `(.L_2921) ;
.L_2922:
        /*0390*/                   CS2R R6, SRZ ;
        /*03a0*/                   IMAD.MOV.U32 R8, RZ, RZ, RZ ;
.L_2921:
        /*03b0*/                   BSYNC B0 ;
.L_2920:
        /*03c0*/                   BMOV.32.CLEAR RZ, B0 ;
        /*03d0*/                   BSSY B0, `(.L_2924) ;
        /*03e0*/               P0 BRA `(.L_2925) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*03f0*/                   IADD3 R3, P1, R9, R0, RZ ;
        /*0400*/                   LEA.HI.X.SX32 R12, R0, R17, 0x1, P1 ;
        /*0410*/                   LEA R2, P1, R3, c[0x0][0x190], 0x2 ;
        /*0420*/                   LEA.HI.X R3, R3, c[0x0][0x194], R12, 0x2, P1 ;
        /*0430*/                   LDG.E.SYS R11, [R2] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0440*/                   IADD3 R12, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*0450*/                   ISETP.GE.AND P1, PT, R12, R5, PT ;
        /*0460*/               P1 BRA `(.L_2926) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*0470*/                   LDG.E.SYS R13, [R2+0x100] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*0480*/                   IADD3 R12, R0, 0x80, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*0490*/                   ISETP.GE.AND P1, PT, R12, R5, PT ;
        /*04a0*/               P1 BRA `(.L_2927) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*04b0*/                   LDG.E.SYS R15, [R2+0x200] ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46
        /*04c0*/                   IADD3 R12, R0, 0xc0, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42
        /*04d0*/                   ISETP.GE.AND P1, PT, R12, R5, PT ;
        /*04e0*/               P1 BRA `(.L_2928) ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45
        /*04f0*/                   LDG.E.SYS R4, [R2+0x300] ;
        /*0500*/                   BRA `(.L_2928) ;
.L_2927:
        /*0510*/                   IMAD.MOV.U32 R15, RZ, RZ, RZ ;
        /*0520*/                   BRA `(.L_2928) ;
.L_2926:
        /*0530*/                   IMAD.MOV.U32 R15, RZ, RZ, RZ ;
        /*0540*/                   IMAD.MOV.U32 R13, RZ, RZ, RZ ;
        /*0550*/                   BRA `(.L_2928) ;
.L_2925:
        /*0560*/                   IMAD.MOV.U32 R15, RZ, RZ, RZ ;
        /*0570*/                   IMAD.MOV.U32 R13, RZ, RZ, RZ ;
        /*0580*/                   IMAD.MOV.U32 R11, RZ, RZ, RZ ;
.L_2928:
        /*0590*/                   BSYNC B0 ;
.L_2924:
	//## File "/usr/include/c++/8/tuple", line 1315
        /*05a0*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*05b0*/                   IADD3 R9, P0, R9, R0, RZ ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*05c0*/                   FFMA R11, R11, c[0x0][0x168], R10 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59
        /*05d0*/                   IADD3 R14, R0, 0x40, RZ ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*05e0*/                   LEA.HI.X.SX32 R12, R0, R17, 0x1, P0 ;
        /*05f0*/                   LEA R2, P0, R9.reuse, c[0x0][0x180], 0x2 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0600*/                   ISETP.GE.AND P1, PT, R14, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*0610*/                   LEA.HI.X R3, R9, c[0x0][0x184], R12, 0x2, P0 ;
        /*0620*/                   STG.E.SYS [R2], R11 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0630*/               P1 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59
        /*0640*/                   IADD3 R10, R0, 0x80, RZ ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*0650*/                   FFMA R13, R13, c[0x0][0x168], R6 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0660*/                   ISETP.GE.AND P0, PT, R10, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*0670*/                   STG.E.SYS [R2+0x100], R13 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*0680*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59
        /*0690*/                   IADD3 R0, R0, 0xc0, RZ ;
	//## File "/usr/include/c++/8/tuple", line 1315
        /*06a0*/                   FFMA R15, R15, c[0x0][0x168], R8 ;
        /*06b0*/                   FFMA R7, R4, c[0x0][0x168], R7 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*06c0*/                   ISETP.GE.AND P0, PT, R0, R5, PT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*06d0*/                   STG.E.SYS [R2+0x200], R15 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55
        /*06e0*/               P0 EXIT ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58
        /*06f0*/                   STG.E.SYS [R2+0x300], R7 ;
	//## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 260
        /*0700*/                   EXIT ;
.L_2929:
        /*0710*/                   BRA `(.L_2929);
        /*0720*/                   NOP;
        /*0730*/                   NOP;
        /*0740*/                   NOP;
        /*0750*/                   NOP;
        /*0760*/                   NOP;
        /*0770*/                   NOP;
.L_40520:
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33222

Differential Revision: D19964089

Pulled By: ngimel

fbshipit-source-id: a1e8e62d1ebcc67fb49f00d87c02bcdd13194024
2020-02-19 18:41:27 -08:00
Michael Ranieri
e025f393f6 windows template specialization bug (#33076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33076

Attempt at fixing https://github.com/pytorch/pytorch/issues/30886

Test Plan: CircleCI with `call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=14.16` passes

Differential Revision: D19784550

fbshipit-source-id: 9fb42c3854d1d00d96cd7179bef9dd1aa2972ea6
2020-02-07 00:41:22 -08:00
Sebastian Messmer
ab60cca488 Make c10::util::get_fully_qualified_type_name() backwards compatible with clang 4 (#31351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351

Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.

Also, let's add an error message for people using clang 3 and earlier; we don't support those compilers anymore, but before this PR they got a crappy message.
ghstack-source-id: 96380163

Test Plan: testinprod

Differential Revision: D19135587

fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa
2020-01-07 17:07:54 -08:00