pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Yuanyuan Chen	e2dc32f4ba	Replace decltype(auto) with auto (#166537 ) This PR replaces `decltype(auto)` with `auto` for C++ return type deduction and simplifies some templates. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166537 Approved by: https://github.com/Skylion007	2025-11-01 00:30:23 +00:00
Yuanyuan Chen	35153d0846	Simplify c10::guts::apply (#164566 ) There is only one call site of `c10::guts::apply`, which can be replaced by `std::apply` except on ROCm. This PR therefore simplifies the implementation of `c10::guts::apply`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164566 Approved by: https://github.com/Aidyn-A, https://github.com/albanD	2025-10-22 00:47:43 +00:00
Richard Barnes	d428d81c7f	Remove some pre-cpp17 stuff (#138410 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138410 Approved by: https://github.com/Skylion007	2024-10-23 00:38:03 +00:00
cyy	1b182ea0d2	Remove c10::guts::{conjunction,disjunction} (#127726 ) They are not used in PyTorch OSS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127726 Approved by: https://github.com/ezyang	2024-06-03 04:06:21 +00:00
cyy	0a9d73a814	Remove c10::guts::bool_constant and c10::guts::negation (#127300 ) They are not used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127300 Approved by: https://github.com/r-barnes	2024-05-28 19:55:20 +00:00
cyy	d6e3e89804	Remove c10::void_t (#127248 ) OSS version doesn't use it anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127248 Approved by: https://github.com/ezyang	2024-05-28 06:59:20 +00:00
cyy	57000708fc	Remove c10::invoke_result (#127160 ) Following #124169, it can be safely removed from the OSS version. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127160 Approved by: https://github.com/ezyang	2024-05-28 01:39:28 +00:00
Jeff Daily	c11bd724fe	[ROCm] replace ROCmLoops.cuh with hipified CUDALoops.cuh (#120101 ) The intent of this change was to minimize code differences between CUDA and ROCm while maintaining or improving performance. Verified new performance using pytorch/benchmarks/operator_benchmark. ``` python -u -m pt.unary_test --tag-filter all --device cuda python -u -m pt.binary_test --tag-filter all --device cuda ``` On MI200 this improved performance on average 3%. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120101 Approved by: https://github.com/albanD	2024-02-22 21:57:36 +00:00
Nikita Shulga	3ad067fe2b	[CPP] Update GCC minversion check to 9 or newer (#120126 ) It's already a requirement for building PyTorch, but it should also be a requirement for linking extensions with it, as an older compiler can lead to runtime crashes: the `std::optional` template layout is incompatible between gcc-9 and older compilers. Also, update the minimum supported clang version to 9.x (used to build Android), as clang-5 is clearly not C++17 compliant. Fixes https://github.com/pytorch/pytorch/issues/120020 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120126 Approved by: https://github.com/Skylion007	2024-02-19 22:05:00 +00:00
cyy	a9953a5ef3	Remove unused c10/util/C++17.h inclusion and outdated checks (#120149 ) This is a continued work to clean up pre-C++17 code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120149 Approved by: https://github.com/ezyang	2024-02-17 14:28:17 +00:00
cyy	e61c8ef3aa	Simplify c10::is_pod implementation and remove unneeded inclusion of C++17.h (#118212 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118212 Approved by: https://github.com/albanD	2024-02-17 00:14:09 +00:00
cyy	c3780010a5	Remove calls of c10::guts::void_t (#117942 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117942 Approved by: https://github.com/Skylion007	2024-01-22 06:12:37 +00:00
cyy	3baade4425	Remove calls of c10::guts::conjunction,c10::guts::disjunction,c10::guts::negation (#117926 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/117926 Approved by: https://github.com/Skylion007	2024-01-22 00:35:42 +00:00
Nikita Shulga	53e32d12c4	[c10] Use nested namespace in c10/cuda (#116464 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/116464 Approved by: https://github.com/Skylion007	2023-12-27 23:14:00 +00:00
cyy	99f222372b	[5/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115354 ) This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util. Pull Request resolved: https://github.com/pytorch/pytorch/pull/115354 Approved by: https://github.com/Skylion007	2023-12-09 17:16:04 +00:00
PyTorch MergeBot	1427b8149c	Revert "Eliminate c10::guts::make_unique_base (#109429 )" This reverts commit `6b1a15d1bb`. Reverted https://github.com/pytorch/pytorch/pull/109429 on behalf of https://github.com/clee2000 due to Sorry its me again, I'm getting that this caused an instruction count regression internally ([comment](https://github.com/pytorch/pytorch/pull/109429#issuecomment-1725923294))	2023-09-19 15:47:00 +00:00
cyy	6b1a15d1bb	Eliminate c10::guts::make_unique_base (#109429 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109429 Approved by: https://github.com/Skylion007	2023-09-17 00:04:09 +00:00
cyy	e4f3e5434f	[Reland] Eliminates c10::guts::to_string (#108748 ) Reland of PR #108480, after relanding another blocking PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108748 Approved by: https://github.com/huydhn	2023-09-07 13:35:17 +00:00
PyTorch MergeBot	8da04e023e	Revert "Eliminate c10::guts::to_string (#108480 )" This reverts commit `4146be192e`. Reverted https://github.com/pytorch/pytorch/pull/108480 on behalf of https://github.com/huydhn due to Sorry for reverting this, but this is needed to keep trunk green after https://github.com/pytorch/pytorch/pull/108479 was reverted. Both will need to be relanded ([comment](https://github.com/pytorch/pytorch/pull/108480#issuecomment-1707067595))	2023-09-05 18:04:53 +00:00
cyy	4146be192e	Eliminate c10::guts::to_string (#108480 ) This PR replaces c10::guts::to_string with std::to_string. The major part of the changes is using void* as the optimizer state key, since the string is used only for serialization and using pointers as hashing keys is more efficient than using strings. Some other guts functions in the affected source files are also replaced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480 Approved by: https://github.com/Skylion007	2023-09-04 08:12:53 +00:00
cyy	30e2764221	remove c10::guts::{max,min} (#102952 ) Because we have enabled C++17, and std::{max,min} are required to be constexpr since C++14 according to [cppreference](https://en.cppreference.com/w/cpp/algorithm/max) we can safely remove them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102952 Approved by: https://github.com/Skylion007	2023-06-06 05:40:30 +00:00
Scott Wolchok	99f68d56ee	[PyTorch] Delete c10::guts::if_constexpr (#101991 ) Now that we have C++17, we should not need this any more. Differential Revision: [D46078335](https://our.internmc.facebook.com/intern/diff/D46078335/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101991 Approved by: https://github.com/r-barnes, https://github.com/Skylion007	2023-05-23 23:19:35 +00:00
Richard Barnes	bcb4444cec	PyTorch -> C++17 (#98209 ) (#100557 ) This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557 Approved by: https://github.com/malfet	2023-05-19 00:49:08 +00:00
PyTorch MergeBot	da02ccc60e	Revert "PyTorch -> C++17 (#98209 ) (#100557 )" This reverts commit `083f88e126`. Reverted https://github.com/pytorch/pytorch/pull/100557 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100557#issuecomment-1543285863))	2023-05-11 03:43:11 +00:00
Richard Barnes	083f88e126	PyTorch -> C++17 (#98209 ) (#100557 ) This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557 Approved by: https://github.com/malfet	2023-05-10 04:47:35 +00:00
Catherine Lee	2ec6eb3d09	Revert "PyTorch -> C++17 (#98209 )" (#100497 ) This reverts commit `8f0c825d36`. See https://github.com/pytorch/pytorch/pull/98209#issuecomment-1532099965; cannot revert normally due to an unmerged linked diff. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100497 Approved by: https://github.com/huydhn, https://github.com/malfet	2023-05-02 21:22:31 +00:00
Richard Barnes	8f0c825d36	PyTorch -> C++17 (#98209 ) This diff locks in C++17 as the minimum standard with which PyTorch can be compiled. This makes it possible to use all C++17 features in PyTorch. This breaks backward compatibility in the sense that users with older compilers may find their compilers no longer are sufficient for the job. Summary: #buildmore Differential Revision: D44356879 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98209 Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/PaliC	2023-05-02 19:41:50 +00:00
PyTorch MergeBot	befe3b68de	Revert "Clean up C++14 code (#92216 )" This reverts commit `dfbdfb276e`. Reverted https://github.com/pytorch/pytorch/pull/92216 on behalf of https://github.com/atalman due to fails internal build	2023-01-18 21:24:23 +00:00
cyy	dfbdfb276e	Clean up C++14 code (#92216 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92216 Approved by: https://github.com/ezyang	2023-01-18 08:14:54 +00:00
Nikita Shulga	36ac095ff8	Migrate PyTorch to C++17 (#85969 ) With CUDA-10.2 gone we can finally do it! This PR mostly contains build system related changes; invasive functional ones are to follow. Among many expected tweaks to the build system, here are a few unexpected ones: - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as the storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when the CUDA runtime picks the host rather than the device function when `std::apply` is invoked from CUDA code. - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that the `std` symbol is ambiguous - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it. Some prerequisites: - https://github.com/pytorch/pytorch/pull/89297 - https://github.com/pytorch/pytorch/pull/89605 - https://github.com/pytorch/pytorch/pull/90228 - https://github.com/pytorch/pytorch/pull/90389 - https://github.com/pytorch/pytorch/pull/90379 - https://github.com/pytorch/pytorch/pull/89570 - https://github.com/facebookincubator/gloo/pull/336 - https://github.com/facebookincubator/gloo/pull/343 - `919676fb32` Fixes https://github.com/pytorch/pytorch/issues/56055 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969 Approved by: https://github.com/ezyang, https://github.com/kulinseth	2022-12-08 02:27:48 +00:00
Lukas N Wirz	301d9c0556	Remove deprecated usage of is_pod/is_pod_v (#88918 ) … as equivalent replacements for std::is_pod and std::is_pod_v because they are deprecated in C++20. When consuming libtorch header files in a project that uses C++20, there are warnings about std::is_pod being deprecated. This patch fixes that issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88918 Approved by: https://github.com/ezyang	2022-12-05 16:50:00 +00:00
Vasu Agrawal	00a1065286	[pytorch] Inline std::forward definition (#85255 ) Summary: Alternative (probably better) solution to the problem laid out in D39562394. Test Plan: CI should be green. Differential Revision: D39612710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85255 Approved by: https://github.com/ezyang	2022-09-20 17:15:59 +00:00
Lukas N Wirz	5af48581b5	In order to make pytorch headers consumable from cpp20 code bases, … (#79985 ) … all instances of std::result_of and std::result_of_t are conditionally replaced by std::invoke_result and std::invoke_result_t if __cpp_lib_is_invocable >= 201703L. std::invoke_result was only introduced in C++17, so it should probably not be required yet. Fixes #71657 and a small part of #69290 Tested on CentOS 7 / gcc11 + a private project that requires cpp20. I think the main questions to check by a maintainer are, - whether my choices of preprocessor blocks are appropriate - whether there are any very subtle differences between std::result_of and std::invoke_result that I have missed - whether in any of the replacements the 'new' side can/should be simplified further Pull Request resolved: https://github.com/pytorch/pytorch/pull/79985 Approved by: https://github.com/ezyang	2022-07-04 20:14:36 +00:00
jason_w	f42202d26c	'typename Base' is checked repeatedly (#72842 ) Summary: 'typename Base' is checked repeatedly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72842 Reviewed By: albanD Differential Revision: D34481951 Pulled By: swolchok fbshipit-source-id: bd07fb87540397fd2f1829a8d0dad167c6a3c6d0 (cherry picked from commit e63081c469b2073c458c3a4a9530bcc08025c3f7)	2022-03-01 20:34:14 +00:00
Pruthvi Madugundu	085e2f7bdd	[ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610 - Replace HIP_PLATFORM_HCC with USE_ROCM - Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead. - The next PR will remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify. - HIP_PLATFORM_HCC is deprecated, so HIP_PLATFORM_AMD will be added to support HIP host code compilation on gcc. cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd Reviewed By: jbschlosser Differential Revision: D30909053 Pulled By: ezyang fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06	2021-09-29 09:55:43 -07:00
Scott Wolchok	44cc873fba	[PyTorch] Autoformat c10 (#56830 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830 Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase. Test Plan: CI Reviewed By: zertosh Differential Revision: D27979080 fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151	2021-04-30 21:23:28 -07:00
skyline75489	cdac61ecd4	Prevent VS from emitting ambiguous symbol errors (third time) (#53490 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/53409 First: https://github.com/pytorch/pytorch/issues/15697 Second: https://github.com/pytorch/pytorch/issues/17863 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53490 Reviewed By: VitalyFedyunin Differential Revision: D26946687 Pulled By: mrshenli fbshipit-source-id: 27f85abecbb75456354cc0373529c8cadc8133bd	2021-03-11 13:51:41 -08:00
Chester Liu	8177f63c91	Reorganize and refine the Windows.h import in C++ files (#48009 ) Summary: This PR aims to reduce the import overhead and symbol noise from the `windows.h` headers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009 Reviewed By: gchanan Differential Revision: D25045840 Pulled By: ezyang fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90	2020-11-20 14:21:09 -08:00
Basil Hosmer	6b94830cdc	faithful signature support in BoxedKernelWrapper (#47267 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47267 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D24701488 Pulled By: bhosmer fbshipit-source-id: dbce246319670f9590c5762ad20c26cb24575fe8	2020-11-10 13:58:36 -08:00
Sebastian Messmer	63c3b89c1c	Simplify code with decltype(auto) (#30922 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922 New c++14 feature we can use now ghstack-source-id: 103767403 Test Plan: waitforsandcastle Differential Revision: D18869644 fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc	2020-05-11 21:31:18 -07:00
Sebastian Messmer	77d8a44802	If we're building on C++17, use actual "if constexpr" (#38154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38154 This should give better error messages and shorter stack traces on C++17 builds (e.g. fbcode) ghstack-source-id: 103775564 Test Plan: waitforsandcastle Differential Revision: D21483327 fbshipit-source-id: 184d1f9c0543bf43dc9713fa97fcc5955e7be319	2020-05-11 12:22:19 -07:00
Sebastian Messmer	379e717a1b	Back out "Revert D18927220: if_constexpr for C++14" (#37792 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37792 Original commit changeset: a1b8755a2790 ghstack-source-id: 103609715 Test Plan: waitforsandcastle Differential Revision: D21389755 fbshipit-source-id: 1a3c74295dbfbf07fe225be9bcd47d11e31a20fa	2020-05-07 15:20:55 -07:00
Mike Ruberry	b428f454e1	Revert D18927220: if_constexpr for C++14 Test Plan: revert-hammer Differential Revision: D18927220 Original commit changeset: 19a135e00af6 fbshipit-source-id: a1b8755a27903b98b742881b3ecce4f5e99543b2	2020-04-26 04:27:53 -07:00
Sebastian Messmer	f5e6f1f333	if_constexpr for C++14 (#31091 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091 This implements a C++17 "if constexpr" like feature for C++14. This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition. PRs stacked on top will use this to simplify some of our template metaprogramming. ghstack-source-id: 102867141 Test Plan: unit tests Differential Revision: D18927220 fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f	2020-04-25 11:31:51 -07:00
Sebastian Messmer	2fa51dde28	Remove unnecessary tensor copies (#33732 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732 move and forward instead of copy Benchmarks: A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance. No visible change for a model like resnet that does more work in its kernels. ghstack-source-id: 99161486 Test Plan: benchmarks Differential Revision: D20082642 fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847	2020-02-28 14:47:04 -08:00
Xiang Gao	f62f1b2ef0	Revert "Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to … (#33553 ) Summary: …have different argument types" This reverts commit `05fb160048`. Please go to https://github.com/pytorch/pytorch/pull/33558 and check the CUDA9 on CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/33553 Differential Revision: D20017575 Pulled By: ngimel fbshipit-source-id: a5fd78eea00c7b0925ab21fd90a7daeb66725f1a	2020-02-21 14:56:30 -08:00
Vitaly Fedyunin	05fb160048	Revert D19964089: [pytorch][PR] Allow vectorized gpu loop to have different argument types Test Plan: revert-hammer Differential Revision: D19964089 Original commit changeset: a1e8e62d1ebc fbshipit-source-id: fee9423d5924714f0e92eea712cde2d2163b3cf0	2020-02-20 08:19:21 -08:00
Gao, Xiang	1fe635be3c	Allow vectorized gpu loop to have different argument types (#33222 ) Summary: Although currently the only user of GPU loops that has args with different dtypes is `where`, it sounds strange to restrict the args to have the same dtype. Allowing args to have different dtypes also makes it possible for me to clean up legacy code by reusing current code to implement unrolled GPU loop for non-contiguous tensors. The stack storage of `elementwise_kernel_helper` is changed from `arg_t args[nt][arity]` to `traits::ArgsTuple args[nt]`. Due to this change, we can no longer get an element by `operator[]`, but instead we should use `std::get`. As a result, we can no longer unroll the loop with respect to arity using a pragma, but we have to create a `static_unroll` that uses template meta-programming to do the same job. A good side effect of this change is that `invoke_with_array` is no longer needed and can be replaced with the already existing `c10::guts::apply`. And we don't need the `namespace arg_type` workaround either. This makes the code less ugly. The same approach might also work for ROCm loops, but I didn't change anything on ROCm in this PR, because I don't want a potential compilation error or perf regression to delay this PR. But after this gets merged, I will try it on ROCm and send a separate PR to make the code diverge less if the same approach trivially applies (trivially applies means a mindless copy-paste doesn't introduce an unexpected compilation error or perf regression). 
Assembly (https://github.com/zasdfgbnm/things/blob/master/2020Q1/disassembly-elementwise-vec.ipynb#33222): ``` Symbol: void at::native::modern::elementwise_kernel<4, 64, 4, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda(float, float)#1}, at::detail::Array<char*, 3> >(int, at::native::add_kernel_cuda(at::TensorIterator&, c10::Scalar)::{lambda()#1}::operator()() const::{lambda()#4}::operator()() const::{lambda(float, float)#1}, at::detail::Array<char*, 3>) ASM: .section .text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,"ax",progbits .sectioninfo @"SHI_REGISTERS=20" .align 128 .global _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_ .type _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,function .size _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,(.L_40520 - _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_) .other 
_ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_,@"STO_CUDA_ENTRY STV_DEFAULT" _ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_: .text._ZN2at6native6modern18elementwise_kernelILi4ELi64ELi4EZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE2_clEvEUlffE_NS_6detail5ArrayIPcLi3EEEEEviT2_T3_: //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 253 /0000/ IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ; /0010/ @!PT SHFL.IDX PT, RZ, RZ, RZ, RZ ; /0020/ S2R R9, SR_CTAID.X ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 39 /0030/ S2R R0, SR_TID.X ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 253 /0040/ IMAD.SHL.U32 R9, R9, 0x100, RZ ; /0050/ IADD3 R5, -R9, c[0x0][0x160], RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227 /0060/ SHF.R.S32.HI R17, RZ, 0x1f, R9 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 255 /0070/ ISETP.GE.AND P0, PT, R5, 0x100, PT ; /0080/ @!P0 BRA `(.L_2919) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227 /0090/ IMAD.SHL.U32 R12, R9.reuse, 0x4, RZ ; /00a0/ SHF.L.U64.HI R17, R9, 0x2, R17 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 229 /00b0/ IADD3 R8, P0, R12.reuse, c[0x0][0x188], RZ ; /00c0/ IADD3 R2, P1, R12, c[0x0][0x190], RZ ; /00d0/ IADD3.X R9, R17.reuse, c[0x0][0x18c], RZ, P0, !PT ; /00e0/ IADD3.X R3, R17, c[0x0][0x194], RZ, P1, !PT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 82 /00f0/ IMAD.WIDE R8, R0, 0x10, R8 ; /0100/ IMAD.WIDE R2, R0, 0x10, R2 ; /0110/ LDG.E.128.SYS R8, [R8] ; /0120/ LDG.E.128.SYS R4, [R2] 
; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 227 /0130/ IADD3 R12, P0, R12, c[0x0][0x180], RZ ; /0140/ IADD3.X R13, R17, c[0x0][0x184], RZ, P0, !PT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102 /0150/ IMAD.WIDE R12, R0, 0x10, R12 ; //## File "/usr/include/c++/8/tuple", line 1315 /0160/ FFMA R7, R7, c[0x0][0x168], R11 ; /0170/ FFMA R6, R6, c[0x0][0x168], R10 ; /0180/ FFMA R5, R5, c[0x0][0x168], R9 ; /0190/ FFMA R4, R4, c[0x0][0x168], R8 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102 /01a0/ STG.E.128.SYS [R12], R4 ; /01b0/ EXIT ; .L_2919: //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /01c0/ ISETP.GE.AND P0, PT, R0, R5, PT ; /01d0/ BMOV.32.CLEAR RZ, B0 ; /01e0/ BSSY B0, `(.L_2920) ; /01f0/ IMAD.MOV.U32 R4, RZ, RZ, RZ ; /0200/ CS2R R6, SRZ ; /0210/ IMAD.MOV.U32 R8, RZ, RZ, RZ ; /0220/ IMAD.MOV.U32 R10, RZ, RZ, RZ ; /0230/ P0 BRA `(.L_2921) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /0240/ IADD3 R3, P1, R9, R0, RZ ; /0250/ LEA.HI.X.SX32 R6, R0, R17, 0x1, P1 ; /0260/ LEA R2, P1, R3, c[0x0][0x188], 0x2 ; /0270/ LEA.HI.X R3, R3, c[0x0][0x18c], R6, 0x2, P1 ; /0280/ LDG.E.SYS R10, [R2] ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46 /0290/ IADD3 R6, R0, 0x40, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /02a0/ ISETP.GE.AND P1, PT, R6, R5, PT ; /02b0/ P1 BRA `(.L_2922) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /02c0/ LDG.E.SYS R6, [R2+0x100] ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46 /02d0/ IADD3 R8, R0, 0x80, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /02e0/ ISETP.GE.AND P1, PT, R8, R5, PT ; /02f0/ P1 BRA `(.L_2923) ; //## File 
"/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46 /0300/ IADD3 R8, R0, 0xc0, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /0310/ ISETP.GE.AND P1, PT, R8, R5, PT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /0320/ LDG.E.SYS R8, [R2+0x200] ; /0330/ @!P1 LDG.E.SYS R7, [R2+0x300] ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 102 /0340/ P1 IMAD.MOV.U32 R7, RZ, RZ, RZ ; /0350/ BRA `(.L_2921) ; .L_2923: /0360/ IMAD.MOV.U32 R7, RZ, RZ, RZ ; /0370/ IMAD.MOV.U32 R8, RZ, RZ, RZ ; /0380/ BRA `(.L_2921) ; .L_2922: /0390/ CS2R R6, SRZ ; /03a0/ IMAD.MOV.U32 R8, RZ, RZ, RZ ; .L_2921: /03b0/ BSYNC B0 ; .L_2920: /03c0/ BMOV.32.CLEAR RZ, B0 ; /03d0/ BSSY B0, `(.L_2924) ; /03e0/ P0 BRA `(.L_2925) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /03f0/ IADD3 R3, P1, R9, R0, RZ ; /0400/ LEA.HI.X.SX32 R12, R0, R17, 0x1, P1 ; /0410/ LEA R2, P1, R3, c[0x0][0x190], 0x2 ; /0420/ LEA.HI.X R3, R3, c[0x0][0x194], R12, 0x2, P1 ; /0430/ LDG.E.SYS R11, [R2] ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46 /0440/ IADD3 R12, R0, 0x40, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /0450/ ISETP.GE.AND P1, PT, R12, R5, PT ; /0460/ P1 BRA `(.L_2926) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /0470/ LDG.E.SYS R13, [R2+0x100] ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46 /0480/ IADD3 R12, R0, 0x80, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /0490/ ISETP.GE.AND P1, PT, R12, R5, PT ; /04a0/ P1 BRA `(.L_2927) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /04b0/ LDG.E.SYS R15, [R2+0x200] ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 46 /04c0/ IADD3 R12, R0, 
0xc0, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 42 /04d0/ ISETP.GE.AND P1, PT, R12, R5, PT ; /04e0/ P1 BRA `(.L_2928) ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 45 /04f0/ LDG.E.SYS R4, [R2+0x300] ; /0500/ BRA `(.L_2928) ; .L_2927: /0510/ IMAD.MOV.U32 R15, RZ, RZ, RZ ; /0520/ BRA `(.L_2928) ; .L_2926: /0530/ IMAD.MOV.U32 R15, RZ, RZ, RZ ; /0540/ IMAD.MOV.U32 R13, RZ, RZ, RZ ; /0550/ BRA `(.L_2928) ; .L_2925: /0560/ IMAD.MOV.U32 R15, RZ, RZ, RZ ; /0570/ IMAD.MOV.U32 R13, RZ, RZ, RZ ; /0580/ IMAD.MOV.U32 R11, RZ, RZ, RZ ; .L_2928: /0590/ BSYNC B0 ; .L_2924: //## File "/usr/include/c++/8/tuple", line 1315 /05a0/ P0 EXIT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58 /05b0/ IADD3 R9, P0, R9, R0, RZ ; //## File "/usr/include/c++/8/tuple", line 1315 /05c0/ FFMA R11, R11, c[0x0][0x168], R10 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59 /05d0/ IADD3 R14, R0, 0x40, RZ ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58 /05e0/ LEA.HI.X.SX32 R12, R0, R17, 0x1, P0 ; /05f0/ LEA R2, P0, R9.reuse, c[0x0][0x180], 0x2 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55 /0600/ ISETP.GE.AND P1, PT, R14, R5, PT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58 /0610/ LEA.HI.X R3, R9, c[0x0][0x184], R12, 0x2, P0 ; /0620/ STG.E.SYS [R2], R11 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55 /0630/ P1 EXIT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59 /0640/ IADD3 R10, R0, 0x80, RZ ; //## File "/usr/include/c++/8/tuple", line 1315 /0650/ FFMA R13, R13, c[0x0][0x168], R6 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55 /0660/ ISETP.GE.AND P0, PT, R10, R5, PT ; //## File 
"/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58 /0670/ STG.E.SYS [R2+0x100], R13 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55 /0680/ P0 EXIT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 59 /0690/ IADD3 R0, R0, 0xc0, RZ ; //## File "/usr/include/c++/8/tuple", line 1315 /06a0/ FFMA R15, R15, c[0x0][0x168], R8 ; /06b0/ FFMA R7, R4, c[0x0][0x168], R7 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55 /06c0/ ISETP.GE.AND P0, PT, R0, R5, PT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58 /06d0/ STG.E.SYS [R2+0x200], R15 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 55 /06e0/ P0 EXIT ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/MemoryAccess.cuh", line 58 /06f0/ STG.E.SYS [R2+0x300], R7 ; //## File "/home/xgao/pytorch/aten/src/ATen/native/cuda/CUDALoops.cuh", line 260 /0700/ EXIT ; .L_2929: /0710/ BRA `(.L_2929); /0720/ NOP; /0730/ NOP; /0740/ NOP; /0750/ NOP; /0760/ NOP; /0770/ NOP; .L_40520: ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/33222 Differential Revision: D19964089 Pulled By: ngimel fbshipit-source-id: a1e8e62d1ebcc67fb49f00d87c02bcdd13194024	2020-02-19 18:41:27 -08:00
Michael Ranieri	e025f393f6	windows template specialization bug (#33076 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33076 attempt at fixing https://github.com/pytorch/pytorch/issues/30886 Test Plan: circleCI with `call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=14.16` passes Differential Revision: D19784550 fbshipit-source-id: 9fb42c3854d1d00d96cd7179bef9dd1aa2972ea6	2020-02-07 00:41:22 -08:00
Sebastian Messmer	ab60cca488	Make c10::util::get_fully_qualified_type_name() backwards compatible with clang 4 (#31351 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351 Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly. Also, let's add an error message for people using clang 3 and earlier, we don't support those compilers anymore but before this PR, they got a crappy message. ghstack-source-id: 96380163 Test Plan: testinprod Differential Revision: D19135587 fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa	2020-01-07 17:07:54 -08:00

69 Commits