When PyTorch is built with OpenBLAS support and libopenblas is ldrectly linked with libgomp.so the libtorch_cpu.so ends up getting multiple omp runtimes linked against it. This may result in unexpected runtime behaviour /regression. This patch fixes this by avoiding linking against libomp.so if OpenBLAS is linked against libgomp.so
Fixes#146603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147725
Approved by: https://github.com/albanD
When clang-cl parses its command line arguments, it expects MSVC-style arguments (beggining with `/` such as `/WX`, `/MD`, etc.) to be provided, and clang-style arguments to be preceded by `-Xclang`, otherwise, the clang-style parameters are ignored as they are interpreted unrecognized compiler options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148097
Approved by: https://github.com/jeffdaily
According to the [APL documentation](https://developer.arm.com/documentation/101004/2404/General-information/Arm-Performance-Libraries-example-programs), libraries ending with _mp are OpenMP multi-threaded libraries.
When a project is compiled with MSVC and the -openmp flag, the vcomp library (Visual C++ implementation of OpenMP) is used for runtime calls.
However, the current APL implementation uses the libomp.dll (LLVM) variant.
As a result, there are unexpected behaviors at runtime.
---
For Example:
```python
import torch
# Create a sparse tensor
# Input (Sparse Tensor):
# [[0, 1],
# [1, 0]]
indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([1, 1], dtype=torch.float32)
size = torch.Size([2, 2])
sparse_tensor = torch.sparse_coo_tensor(indices, values, size)
# Convert sparse tensor to dense tensor
dense_tensor = sparse_tensor.to_dense()
# Expected Output (Dense Tensor):
# [[0, 1],
# [1, 0]]
print("\nDense Tensor:")
print(dense_tensor)
```
However, it prints unexpected outputs such as:
```python
# [[0, 11],
# [10, 0]]
```
The issue arises because the following code does not function as expected at runtime:
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/ParallelOpenMP.h#L30
```c++
// returns 1 , however since OpenMP is enabled it should return total number of threads
int64_t num_threads = omp_get_num_threads();
```
---
In the runtime, loading multiple OpenMP libraries (in this case `libomp` and `vcomp`) is causing unexpected behaviours.
So, we've changed libraries from `_mp` to non `_mp` versions and we used `vcomp` for OpenMP calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145215
Approved by: https://github.com/ozanMSFT, https://github.com/malfet
Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>
Reland of PR #94924. The purpose of this PR is to deal with the complicated interactions between MKL and OpenMP.
There are two improvements:
1. It uses a flag to avoid infinite mutual recursion in calling find_package(MKL) and find_package(OpenMP) in some cases.
2. The logic of finding iomp5 is improved and now we can test MKLDNN under ASAN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104224
Approved by: https://github.com/malfet
This is reland of PR #94402 that tries to solve the additional link issues.
The PR #94402 failed because caffe2::mkl had been converted to private dependency while libtorch_cuda_linalg hadn't linked to it explicitly. This is fixed in commit 4373bf0ae3dee32afc178f9d51a4154d6c5904c6
We also replace more references of MKL_LIBRARIES by caffe2::mkl in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94924
Approved by: https://github.com/malfet
We greatly simplify the handing of OpenMP in CMake by using caffe2::openmp target thoroughly. We follow the old behavior by defaulting to MKL OMP library and detecting OMP flags otherwise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91576
Approved by: https://github.com/malfet