One PR towards #89205.
The content is mostly from PR #38465, but the expression is slightly changed to make it faster.
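For reference, here is a short derivation (mine, not part of the original PR description) of the expression used in v0, writing z = x + iy and expanding |1+z|^2 = (x+1)^2 + y^2 = 1 + x(x+2) + y^2:
```latex
\[
  \log\lvert 1+z\rvert
    = \tfrac{1}{2}\log\bigl(1 + x(x+2) + y^2\bigr)
    = \tfrac{1}{2}\,\operatorname{log1p}\bigl(x(x+2) + y^2\bigr),
  \qquad
  \arg(1+z) = \operatorname{atan2}\bigl(y,\, x+1\bigr).
\]
```
This form avoids calling `std::abs` and `std::arg` on an intermediate `std::complex` (and the square root inside `std::abs`), which is presumably where v1 spends its extra time.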
Here is some benchmarking code:
```c++
// main.cc
#include <cmath>
#include <complex>
#include <iostream>
#include <chrono>

template <typename T>
inline std::complex<T> log1p_v0(const std::complex<T>& z) {
  // this PR
  T x = z.real();
  T y = z.imag();
  T theta = std::atan2(y, x + T(1));
  T r = x * (x + T(2)) + y * y;
  return {T(0.5) * std::log1p(r), theta};
}

template <typename T>
inline std::complex<T> log1p_v1(const std::complex<T>& z) {
  // PR #38465
  T x = z.real();
  T y = z.imag();
  std::complex<T> p1 = z + T(1);
  T r = std::abs(p1);
  T a = std::arg(p1);
  T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
  return {std::log1p(rm1), a};
}

template <typename T>
inline std::complex<T> log1p_v2(const std::complex<T>& z) {
  // naive, but numerically inaccurate
  return std::log(T(1) + z);
}

int main() {
  int n = 1000000;
  std::complex<float> res(0.0, 0.0);
  std::complex<float> input(0.5, 2.0);

  auto start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v0(input);
  }
  auto end = std::chrono::system_clock::now();
  auto elapsed = end - start;
  std::cout << "time for v0: " << elapsed.count() << '\n';

  start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v1(input);
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "time for v1: " << elapsed.count() << '\n';

  start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v2(input);
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "time for v2: " << elapsed.count() << '\n';

  std::cout << res << '\n';
}
```
Compiling the benchmark with `g++ main.cc` and running it produces the following results:
```
time for v0: 237812271
time for v1: 414524941
time for v2: 360585994
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214
Approved by: https://github.com/lezcano
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55177
This fixes `warning: '_GLIBCXX11_USE_C99_COMPLEX' is not defined, evaluates to 0`, which would be raised if https://github.com/pytorch/pytorch/pull/54820 were used with a libstdc++ compiled without USE_C99_COMPLEX support.
In `c++config.h`, `_GLIBCXX_USE_C99_COMPLEX` is aliased to either `_GLIBCXX98_USE_C99_COMPLEX` or `_GLIBCXX11_USE_C99_COMPLEX` depending on the `__cplusplus` macro, as shown here:
0cf4813202/libstdc%2B%2B-v3/include/bits/c%2B%2Bconfig (L641-L647)
The abovementioned config file is generated by autoconf, which leaves the macro undefined if the feature is not used, so a conditional like `defined(_GLIBCXX_USE_C99_COMPLEX) && _GLIBCXX_USE_C99_COMPLEX == 0` would trigger an undefined-macro preprocessor warning.
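For illustration, a minimal sketch of a -Wundef-clean check (my assumption of the general shape of such a fix, not necessarily this PR's exact change; it assumes C++11 or later, where the alias expands to `_GLIBCXX11_USE_C99_COMPLEX`):
```c++
// Guard the macro that actually appears in the expanded expression, not the
// _GLIBCXX_USE_C99_COMPLEX alias; the preprocessor short-circuits &&, so the
// comparison is not evaluated (and not warned about) when the macro is undefined.
#if defined(_GLIBCXX11_USE_C99_COMPLEX) && _GLIBCXX11_USE_C99_COMPLEX == 0
// ... fallback path for a libstdc++ built without C99 complex support ...
#endif
```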
Test Plan: CI
Reviewed By: Orvid
Differential Revision: D27517788
fbshipit-source-id: a6db98d21c9bd98205815641363b765a02399678
Summary: The bare CXX version forwards to this without checking whether it is defined, causing errors for builds with -Wundef enabled.
Test Plan: contbuilds
Differential Revision: D27443462
fbshipit-source-id: 554a3c653aae14d19e35038ba000cf5330e6d679
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54820
The template implementation of std::sqrt() in libstdc++ yields incorrect results for `std::complex(-std::abs(x), -0.0)`; see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
For example:
```
#include <iostream>
#include <complex>
int main() {
  std::cout << std::sqrt(std::complex<float>(-1.0f, -0.0f)) << std::endl;
}
```
prints `(0, -1)` (the correct result) if libstdc++ is compiled to use the C99 csqrt/csqrtf fallback, but `(0, 1)` if it is configured not to use it.
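For illustration only, a workaround sketch (my assumption, not necessarily the approach taken in this PR) that restores the C99 branch-cut behaviour on top of such a sqrt, using the identity sqrt(conj(z)) == conj(sqrt(z)):
```c++
#include <cmath>
#include <complex>
#include <iostream>

// Workaround sketch (not this PR's actual change): take the principal square
// root in the closed upper half-plane, then mirror the result if the input's
// imaginary part carries a negative sign (including -0.0, which std::signbit detects).
template <typename T>
std::complex<T> sqrt_signed_zero(const std::complex<T>& z) {
  std::complex<T> r = std::sqrt(std::complex<T>(z.real(), std::abs(z.imag())));
  return std::signbit(z.imag()) ? std::conj(r) : r;
}

int main() {
  std::cout << sqrt_signed_zero(std::complex<float>(-1.0f, -0.0f)) << std::endl;  // (0,-1)
}
```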
Test Plan: CI
Reviewed By: luciang
Differential Revision: D27379302
fbshipit-source-id: 03f614fdb7ff734139736a2a5f6872cee0173bee
Summary:
Use `std::acos` even when AVX2 is available.
Add a slow but accurate implementation of complex arc cosine based on W. Kahan's "Branch Cuts for Complex Elementary Functions" paper, where
cacos(z).re = 2*atan2(sqrt(1-z).re(), sqrt(1+z).re())
cacos(z).im = asinh((sqrt(conj(1+z))*sqrt(1-z)).im())
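For illustration, a direct transcription of those two formulas into standalone C++ (a sketch only; the PR's actual implementation operates on c10::complex inside vec256 and handles more edge cases):
```c++
#include <cmath>
#include <complex>
#include <iostream>

// Sketch: Kahan's real and imaginary parts of cacos(z), written exactly as quoted above.
template <typename T>
std::complex<T> kahan_acos(const std::complex<T>& z) {
  std::complex<T> sqrt_1m = std::sqrt(T(1) - z);  // sqrt(1-z)
  std::complex<T> sqrt_1p = std::sqrt(T(1) + z);  // sqrt(1+z)
  T re = T(2) * std::atan2(sqrt_1m.real(), sqrt_1p.real());
  T im = std::asinh((std::sqrt(std::conj(T(1) + z)) * sqrt_1m).imag());
  return {re, im};
}

int main() {
  // acos(0.5) ≈ 1.0472 with zero imaginary part.
  std::cout << kahan_acos(std::complex<double>(0.5, 0.0)) << '\n';
}
```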
Fixes https://github.com/pytorch/pytorch/issues/42952
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52287
Reviewed By: walterddr
Differential Revision: D26455027
Pulled By: malfet
fbshipit-source-id: a81ce1ba4953eff4d3c2a265ef9199896a67b240
Summary:
libc++ implements csqrt using the polar form of the number, which results in higher numerical error if `arg` is close to 0, pi/2, pi, 3pi/4.
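For illustration, a sketch of the polar-form computation in question (a simplified model, not libc++'s actual source): the result hinges on cos(arg/2) and sin(arg/2), and when either factor is tiny its rounding error dominates the corresponding component.
```c++
#include <cmath>
#include <complex>
#include <iostream>

// Simplified model of a polar-form square root (not libc++'s actual source):
// sqrt(z) = sqrt(|z|) * (cos(arg(z)/2) + i*sin(arg(z)/2)).
template <typename T>
std::complex<T> sqrt_polar(const std::complex<T>& z) {
  T root = std::sqrt(std::abs(z));
  T half = std::arg(z) / T(2);
  return {root * std::cos(half), root * std::sin(half)};
}

int main() {
  std::complex<float> z(-4.0f, 1e-6f);  // arg(z) is very close to pi
  std::cout << sqrt_polar(z) << " vs " << std::sqrt(z) << '\n';
}
```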
Fixes https://github.com/pytorch/pytorch/issues/47500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52018
Reviewed By: walterddr
Differential Revision: D26359947
Pulled By: malfet
fbshipit-source-id: 8c9f4dc45948cb29c43230dcee9b030c2642d981
Summary:
This file should have been renamed to `complex.h`, but unfortunately it was named `complex_type.h` due to a name clash with FBCode. Is this still the case, and is it easy to resolve the name clash? Maybe related to the comment at https://github.com/pytorch/pytorch/pull/39834#issuecomment-642950012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39885
Differential Revision: D22018575
Pulled By: ezyang
fbshipit-source-id: e237ccedbe2b30c31aca028a5b4c8c063087a30f
Summary:
Add a compilation error if these headers are included individually. Devs should
instead include `c10/util/complex_type.h` (which includes both files).
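For illustration, the general shape of such a guard (an excerpt-style sketch; the macro name below is hypothetical, not necessarily the one used in c10):
```c++
// In the umbrella header (c10/util/complex_type.h), define a private token
// around the nested includes (macro name is hypothetical):
#define C10_INTERNAL_INCLUDE_COMPLEX_REMAINING_H
#include <c10/util/complex_math.h>
#undef C10_INTERNAL_INCLUDE_COMPLEX_REMAINING_H

// In each sub-header, refuse to compile when included on its own:
#if !defined(C10_INTERNAL_INCLUDE_COMPLEX_REMAINING_H)
#error "This header is not meant to be included directly; include c10/util/complex_type.h instead."
#endif
```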
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39276
Differential Revision: D21924922
Pulled By: ezyang
fbshipit-source-id: ad1034be5d9d694b18cc5f03a44f540f10de568c
Summary:
I'm using CUDA 10.1 on Debian buster, but I still experience compilation issues:
```
/usr/include/thrust/detail/complex/complex.inl(64): error: no suitable conversion function from "const c10::complex<float>" to "float" exists
detected during:
instantiation of "thrust::complex<T>::complex(const R &) [with T=float, R=c10::complex<float>]"
/home/hong/xusrc/pytorch/c10/util/complex_type.h(503): here
instantiation of "T std::abs(const c10::complex<T> &) [with T=float]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(17): here
instantiation of "c10::complex<T> at::native::abs_wrapper(c10::complex<T>) [with T=float]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(29): here
/usr/include/thrust/detail/complex/complex.inl(64): error: no suitable conversion function from "const c10::complex<double>" to "double" exists
detected during:
instantiation of "thrust::complex<T>::complex(const R &) [with T=double, R=c10::complex<double>]"
/home/hong/xusrc/pytorch/c10/util/complex_type.h(503): here
instantiation of "T std::abs(const c10::complex<T> &) [with T=double]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(17): here
instantiation of "c10::complex<T> at::native::abs_wrapper(c10::complex<T>) [with T=double]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(29): here
2 errors detected in the compilation of "/tmp/hong/tmpxft_00005893_00000000-6_AbsKernel.cpp1.ii".
CMake Error at torch_cuda_generated_AbsKernel.cu.o.Debug.cmake:281 (message):
Error generating file
/home/hong/xusrc/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_AbsKernel.cu.o
```
`nvcc --version`:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
```
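For context, a sketch of one way to sidestep the failing instantiation (my assumption; the PR's actual change may differ): construct the thrust::complex explicitly from the real and imaginary parts instead of relying on thrust::complex's converting constructor template, which, as the log shows, tries to narrow a c10::complex<float> itself down to float.
```c++
#include <thrust/complex.h>
#include <c10/util/complex_type.h>

// Sketch (not necessarily this PR's change): explicit component-wise construction
// avoids thrust::complex<T>::complex(const R&), the single-argument constructor
// template that the error log shows being instantiated with R = c10::complex<T>.
template <typename T>
T abs_via_thrust(const c10::complex<T>& z) {
  return thrust::abs(thrust::complex<T>(z.real(), z.imag()));
}
```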
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38941
Differential Revision: D21818790
Pulled By: ezyang
fbshipit-source-id: a4bfcd8ae701f7c214bea0731c13a5f3587b7a98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37866
Make sure not to check `CUDA_VERSION` when it is not defined.
Test Plan: CI green
Reviewed By: anjali411
Differential Revision: D21408844
fbshipit-source-id: 5a9afe372b3f1fbaf08a7c43fa3e0e654a569d5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37689
It has to be this way; otherwise, we would not be able to use it in vec256, because the function pointers declared there take a const reference.
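For illustration, a standalone sketch of the constraint being described (names are hypothetical, not vec256's actual declarations): a function-pointer type whose parameter is a const reference can only bind functions with exactly that signature.
```c++
#include <complex>

// Hypothetical table-of-operations style function pointer, parameter by const reference.
template <typename T>
using complex_unary_fn = std::complex<T> (*)(const std::complex<T>&);

template <typename T>
std::complex<T> conj_by_cref(const std::complex<T>& z) { return std::conj(z); }

template <typename T>
std::complex<T> conj_by_value(std::complex<T> z) { return std::conj(z); }

int main() {
  complex_unary_fn<float> ok = &conj_by_cref<float>;      // signatures match
  // complex_unary_fn<float> bad = &conj_by_value<float>; // error: type mismatch
  (void)ok;
}
```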
Test Plan: Imported from OSS
Differential Revision: D21394603
Pulled By: anjali411
fbshipit-source-id: daa075b86daaa694489c883d79950a41d6e996ba
Summary:
Issue: https://github.com/pytorch/pytorch/issues/35284
~This depends on and contains https://github.com/pytorch/pytorch/pull/35524. Please review after the dependency gets merged and I will rebase to get a clean diff.~
The implementation of most functions follows the pattern:
```C++
template <typename T>
C10_HOST_DEVICE c10::complex<T> some_function(c10::complex<T> x) {
#if defined(__CUDACC__) || defined(__HIPCC__)
  return static_cast<c10::complex<T>>(thrust::some_function(static_cast<thrust::complex<T>>(x)));
#else
  return static_cast<c10::complex<T>>(std::some_function(static_cast<std::complex<T>>(x)));
#endif
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35725
Differential Revision: D21256854
Pulled By: ezyang
fbshipit-source-id: 2112ba6b79923450feafd7ebdc7184a3eaecadb6
Summary:
Step 0 of https://github.com/pytorch/pytorch/issues/35284
Reference: https://en.cppreference.com/w/cpp/numeric/complex
We are targeting C++20. The differences across C++ versions are mostly `constexpr` qualifiers; newer versions declare more functions as `constexpr`.
This PR adds the core of `c10::complex`. It includes:
- standard constructors as in `std::complex`
- explicit conversion constructors converting from `std/thrust::complex` to `c10::complex`
- standard assignment operators as in `std::complex`
- conversion assignment operators converting from `std/thrust::complex` to `c10::complex`
- other standard operators as in `std::complex`
- standard methods as in `std::complex`
- explicit casting operators to std/thrust
- basic non-member functions as in `std::complex`:
  - arithmetic operators
  - `==`, `!=`
  - `<<`, `>>`
  - `std::real`, `std::imag`, `std::abs`, `std::arg`, `std::norm`, `std::conj`, `std::proj`, `std::polar`
- Some of them are intentionally not completely implemented; these are marked as `TODO` and will be implemented in the future.
This PR does not include:
- overloads of the math functions, which will come in the next PR.
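For orientation, a minimal usage sketch of the pieces listed above (illustrative only; it assumes the header path from this PR, `c10/util/complex_type.h`, and only touches functionality the list says is implemented):
```c++
#include <complex>
#include <iostream>
#include <c10/util/complex_type.h>

int main() {
  c10::complex<float> a(1.0f, 2.0f);                        // standard constructor
  c10::complex<float> b = a;                                // copy, as in std::complex
  b += c10::complex<float>(0.5f, -1.0f);                    // arithmetic operators
  std::cout << b << ' ' << (a == b) << '\n';                // operator<< and ==
  std::cout << std::abs(a) << ' ' << std::conj(a) << '\n';  // non-member overloads
  auto s = static_cast<std::complex<float>>(a);             // explicit cast to std::complex
  std::cout << s << '\n';
}
```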
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35524
Differential Revision: D21021677
Pulled By: anjali411
fbshipit-source-id: 9e144e581fa4b2bee62d33adaf756ce5aadc0c71