The intent of this change was to minimize code differences between CUDA and ROCm while maintaining or improving performance.
Verified new performance using pytorch/benchmarks/operator_benchmark.
```
python -u -m pt.unary_test --tag-filter all --device cuda
python -u -m pt.binary_test --tag-filter all --device cuda
```
On MI200, this improved performance by 3% on average.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120101
Approved by: https://github.com/albanD
It's already a requirement for building PyTorch, but it should also be a
requirement for linking extensions with it, because a mismatch can lead to
runtime crashes: the `std::optional` template layout is incompatible between
gcc-9 and older compilers.
Also, update the minimum supported clang version to 9.x (used to build Android), as clang-5 is clearly not C++17 compliant.
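A compile-time guard for minimum compiler versions could look like the sketch below. The thresholds (gcc 9, clang 9) come from the description above; the macro name `SKETCH_COMPILER_OK` and the message text are hypothetical and are not taken from the PyTorch sources.

```cpp
// Hypothetical sketch: reject compilers older than the stated minimums.
// __clang__ is checked first because clang also defines __GNUC__.
#if defined(__clang__)
#define SKETCH_COMPILER_OK (__clang_major__ >= 9)
#elif defined(__GNUC__)
#define SKETCH_COMPILER_OK (__GNUC__ >= 9)
#else
#define SKETCH_COMPILER_OK 1  // other compilers are not checked in this sketch
#endif

static_assert(
    SKETCH_COMPILER_OK,
    "This sketch assumes gcc 9 or newer, or clang 9 or newer");
```

Failing at compile time turns a potential runtime crash from an ABI mismatch into an immediate, readable build error.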
Fixes https://github.com/pytorch/pytorch/issues/120020
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120126
Approved by: https://github.com/Skylion007
This PR replaces c10::guts::to_string with std::to_string. The major part of the changes is using void* as the optimizer state key, since the string is used only for serialization and hashing a pointer key is more efficient than hashing a string.
Some other guts functions in the affected source files are also replaced.
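The idea of keying state by pointer and producing string keys only at serialization time can be sketched as follows. This is a minimal illustration under assumptions: `Param`, `ParamState`, `StateMap`, and `serialize_state` are hypothetical names, and `std::unordered_map` stands in for whatever hash map the optimizer actually uses.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a parameter and its per-parameter optimizer state.
struct Param {
  float value;
};
struct ParamState {
  int step = 0;
};

// In-memory state is keyed by the parameter's address: hashing a pointer is
// cheaper than hashing a string.
using StateMap = std::unordered_map<void*, ParamState>;

// String keys are produced only when serializing, here from the parameter's
// index in the parameter list via std::to_string.
std::unordered_map<std::string, ParamState> serialize_state(
    const std::vector<Param*>& params,
    const StateMap& state) {
  std::unordered_map<std::string, ParamState> out;
  for (std::size_t i = 0; i < params.size(); ++i) {
    auto it = state.find(params[i]);
    if (it != state.end()) {
      out.emplace(std::to_string(i), it->second);
    }
  }
  return out;
}
```

Lookups on the hot path touch only the pointer-keyed map; the string conversion cost is paid once per save.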
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480
Approved by: https://github.com/Skylion007
### <samp>🤖 Generated by Copilot at 4f0b524</samp>
This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557
Approved by: https://github.com/malfet
This diff locks in C++17 as the minimum standard with which PyTorch can be compiled.
This makes it possible to use all C++17 features in PyTorch.
This breaks backward compatibility in the sense that users with older compilers may find that their compilers are no longer sufficient.
Summary: #buildmore
Differential Revision: D44356879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98209
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/PaliC
… as equivalent replacements for std::is_pod and std::is_pod_v because they are deprecated in C++20.
When consuming libtorch header files in a project that uses C++20, there are warnings about std::is_pod being deprecated. This patch fixes that issue.
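For reference, the C++ standard defines a POD type as one that is both trivial and standard-layout, so a deprecation-free check can be built from those two traits. The sketch below assumes that this is the kind of replacement meant above; the names `is_pod_like`, `PlainPoint`, and `WithVirtual` are hypothetical.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: equivalent to std::is_pod<T> without using the trait
// that C++20 deprecates.
template <class T>
struct is_pod_like
    : std::integral_constant<
          bool,
          std::is_standard_layout<T>::value && std::is_trivial<T>::value> {};

// Example types for illustration only.
struct PlainPoint {
  int x;
  int y;
};
struct WithVirtual {
  virtual ~WithVirtual() = default;
  int x;
};

static_assert(is_pod_like<PlainPoint>::value, "plain aggregate is POD-like");
static_assert(!is_pod_like<WithVirtual>::value, "virtual members break POD");
static_assert(!is_pod_like<std::string>::value, "std::string is not POD");
```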
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88918
Approved by: https://github.com/ezyang
… all instances of std::result_of and std::result_of_t are conditionally replaced by std::invoke_result and std::invoke_result_t if __cpp_lib_is_invocable >= 201703L. std::invoke_result was only introduced in C++17, so it should probably not be required yet.
Fixes #71657 and a small part of #69290
Tested on CentOS 7 / gcc11 + a private project that requires C++20.
I think the main questions to check by a maintainer are,
- whether my choices of preprocessor blocks are appropriate
- whether there are any very subtle differences between std::result_of and std::invoke_result that I have missed
- whether in any of the replacements the 'new' side can/should be simplified further
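The conditional replacement described above can be sketched with a single alias that picks the trait based on the feature-test macro. The alias name `call_result_t` and the function `twice` are hypothetical and only serve this illustration.

```cpp
#include <type_traits>

// Use std::invoke_result when the library advertises it, otherwise fall back
// to std::result_of. Note the different spelling of the arguments:
// invoke_result<F, Args...> versus result_of<F(Args...)>.
#if defined(__cpp_lib_is_invocable) && __cpp_lib_is_invocable >= 201703L
template <class F, class... Args>
using call_result_t = std::invoke_result_t<F, Args...>;
#else
template <class F, class... Args>
using call_result_t = typename std::result_of<F(Args...)>::type;
#endif

// Hypothetical callable used to exercise the alias.
inline int twice(int x) {
  return 2 * x;
}

static_assert(
    std::is_same<call_result_t<decltype(&twice), int>, int>::value,
    "calling twice(int) yields int");
```

One difference worth keeping in mind when reviewing such replacements: `std::result_of<F(Args...)>` encodes the call as a function type, so `F` cannot itself be a function or array type and the argument types undergo function-parameter adjustments, whereas `std::invoke_result<F, Args...>` takes the types directly.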
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79985
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Remove the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
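The macro convention described in this list can be sketched as below: platform selection keys off `USE_ROCM`, and version checks use `ROCM_VERSION` on ROCm builds rather than a translated `CUDA_VERSION`. The function names are hypothetical, and the fallback branches exist only so that the sketch compiles without any GPU toolkit.

```cpp
#include <string>

// Hypothetical helper: report which GPU backend the build targets.
inline std::string backend_name() {
#if defined(USE_ROCM)
  return "ROCm";
#else
  return "CUDA";
#endif
}

// Hypothetical helper: report the toolkit version macro appropriate for the
// selected backend, or 0 when neither macro is defined.
inline int backend_version() {
#if defined(USE_ROCM) && defined(ROCM_VERSION)
  return ROCM_VERSION;
#elif !defined(USE_ROCM) && defined(CUDA_VERSION)
  return CUDA_VERSION;
#else
  return 0;
#endif
}
```

Built without `-DUSE_ROCM`, the helpers take the CUDA branch; defining `USE_ROCM` and `ROCM_VERSION` on the command line switches both.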
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
This PR aims to reduce the import overhead and symbol noise from the `windows.h` headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009
Reviewed By: gchanan
Differential Revision: D25045840
Pulled By: ezyang
fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922
New C++14 feature we can use now.
ghstack-source-id: 103767403
Test Plan: waitforsandcastle
Differential Revision: D18869644
fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38154
This should give better error messages and shorter stack traces on C++17 builds (e.g. fbcode)
ghstack-source-id: 103775564
Test Plan: waitforsandcastle
Differential Revision: D21483327
fbshipit-source-id: 184d1f9c0543bf43dc9713fa97fcc5955e7be319
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091
This implements a C++17 "if constexpr" like feature for C++14.
This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition.
PRs stacked on top will use this to simplify some of our template metaprogramming.
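One way such a C++14 emulation can work is tag dispatch over two generic lambdas, so that only the selected branch is ever instantiated. The sketch below is an illustration under assumptions, not the implementation from this PR: the `sketch` namespace, `Identity`, `if_constexpr`, and `describe` are hypothetical names.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {

// Passed to each branch so that expressions inside the generic lambdas depend
// on a template parameter and are not type-checked until the branch is called.
struct Identity {
  template <class T>
  constexpr T&& operator()(T&& v) const {
    return std::forward<T>(v);
  }
};

template <class ThenFn, class ElseFn>
decltype(auto) if_constexpr_impl(std::true_type, ThenFn&& then_fn, ElseFn&&) {
  return std::forward<ThenFn>(then_fn)(Identity{});
}

template <class ThenFn, class ElseFn>
decltype(auto) if_constexpr_impl(std::false_type, ThenFn&&, ElseFn&& else_fn) {
  return std::forward<ElseFn>(else_fn)(Identity{});
}

// Calls exactly one of the two callbacks, chosen at compile time.
template <bool Condition, class ThenFn, class ElseFn>
decltype(auto) if_constexpr(ThenFn&& then_fn, ElseFn&& else_fn) {
  return if_constexpr_impl(
      std::integral_constant<bool, Condition>{},
      std::forward<ThenFn>(then_fn),
      std::forward<ElseFn>(else_fn));
}

} // namespace sketch

// Replaces a pair of SFINAE overloads: integers go through std::to_string,
// everything else is assumed to be convertible to std::string.
template <class T>
std::string describe(const T& value) {
  return sketch::if_constexpr<std::is_integral<T>::value>(
      [&](auto id) { return std::to_string(id(value)); },
      [&](auto id) { return std::string(id(value)); });
}
```

For `describe(42)`, the second lambda would not compile if its body were instantiated, since `std::string` cannot be constructed from an `int`; because it is never called, its body is never instantiated.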
ghstack-source-id: 102867141
Test Plan: unit tests
Differential Revision: D18927220
fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732
Use std::move and std::forward instead of copying.
Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
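The difference between copying and forwarding an argument through a wrapper can be made visible with a type that counts its own copies. This is a generic sketch, not the dispatcher code touched by this PR; `Tracked`, `sink`, `call_by_copy`, and `call_forwarding` are hypothetical names.

```cpp
#include <utility>

// Counts how many copy and move constructions produced this object.
struct Tracked {
  int copies = 0;
  int moves = 0;
  Tracked() = default;
  Tracked(const Tracked& other)
      : copies(other.copies + 1), moves(other.moves) {}
  Tracked(Tracked&& other) noexcept
      : copies(other.copies), moves(other.moves + 1) {}
};

// Takes its argument by value and reports how many copies it took to arrive.
inline int sink(Tracked t) {
  return t.copies;
}

// Passes a named parameter along as an lvalue, which forces a copy.
template <class Arg>
int call_by_copy(Arg arg) {
  return sink(arg);
}

// Perfect forwarding preserves the value category, so rvalues are moved.
template <class Arg>
int call_forwarding(Arg&& arg) {
  return sink(std::forward<Arg>(arg));
}
```

Passing a temporary through `call_by_copy` costs one copy, while `call_forwarding` costs none. For cheap-to-move types, the saving only shows up when the surrounding work is small, which is consistent with the microbenchmark-only improvement reported above.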
ghstack-source-id: 99161486
Test Plan: benchmarks
Differential Revision: D20082642
fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351
Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.
Also, let's add an error message for people using clang 3 and earlier. We don't support those compilers anymore, but before this PR they got a crappy message.
ghstack-source-id: 96380163
Test Plan: testinprod
Differential Revision: D19135587
fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa