pytorch/cmake
Jerry Mannil 202f83dc4e [ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead of 1.f/x (#165589)
Replace the more exact calculation (1.f/x) with the hardware reciprocal approximation (__builtin_amdgcn_rcpf(x)).

Benefits:
Reduced code size.
Improved performance for certain scenarios.

Experiments show only a small reduction in precision.
Experiments show no significant performance regressions. bfloat16 and float16 calculations may benefit substantially from this change.

Co-author: @mhalk @amd-hhashemi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165589
Approved by: https://github.com/jeffdaily
2025-10-17 09:12:30 +00:00
External [ROCm][Windows] Enable AOTriton runtime compile on Windows (#165538) 2025-10-16 19:51:43 +00:00
Modules [Intel GPU] Upgrade OneDNN XPU Tag to v3.9.1 (#161932) 2025-09-04 11:05:10 +00:00
Modules_CUDA_fix [2/N] Remove FindPackageHandleStandardArgs.cmake (#156559) 2025-07-24 02:34:10 +00:00
public [CMake] Remove forcing of -O2 from torch_compile_options (#164894) 2025-10-10 04:43:53 +00:00
Allowlist.cmake
BLAS_ABI.cmake [submodule] Bump fbgemm to latest (#158210) 2025-08-11 13:48:02 +00:00
BuildVariables.cmake
Caffe2Config.cmake.in xpu: improve error handling and reporting in XPU cmake files (#149353) 2025-03-20 02:00:39 +00:00
cmake_uninstall.cmake.in
Codegen.cmake [ATen][CUDA] CUTLASS matmuls: add sm_103a flag (#162956) 2025-09-16 10:29:55 +00:00
DebugHelper.cmake [BE] fix typos in cmake/ (#156079) 2025-06-17 19:25:43 +00:00
Dependencies.cmake [ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead of 1.f/x (#165589) 2025-10-17 09:12:30 +00:00
FlatBuffers.cmake
IncludeSource.cpp.in
iOS.cmake [BE] fix typos in cmake/ (#156079) 2025-06-17 19:25:43 +00:00
Metal.cmake [Build] Allow metal shaders to include ATen headers (#156256) 2025-06-18 01:03:25 +00:00
MiscCheck.cmake [submodule] Bump fbgemm to latest (#158210) 2025-08-11 13:48:02 +00:00
prioritized_text.txt Revert "[BE] Remove HermeticPyObjectTLS and Simplify PythonOpRegistrationTrampoline (#163464)" 2025-09-30 18:20:20 +00:00
ProtoBuf.cmake [Reland] Use 3.27 as the minimum CMake version (#154783) 2025-06-14 16:37:51 +00:00
ProtoBufPatch.cmake
Summary.cmake [ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead of 1.f/x (#165589) 2025-10-17 09:12:30 +00:00
TorchConfig.cmake.in Revert "Simplify nvtx3 CMake handling, always use nvtx3 (#153784)" 2025-06-24 20:02:07 +00:00
TorchConfigVersion.cmake.in
VulkanCodegen.cmake
VulkanDependencies.cmake