1141 Commits

Author SHA1 Message Date
cyy
e64ddd1ab9 Upgrade NVTX to NVTX3 (#90689)
Due to the recent upgrade to CUDA 11, we can upgrade NVTX to NVTX3 as well, which is a header-only library that simplifies the build system a lot.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90689
Approved by: https://github.com/soumith, https://github.com/malfet
2023-03-23 01:56:42 +00:00
wangxiyuan
4ab1588d99 Enhance error message for dependency check (#96642)
If the Python development library is missing when building PyTorch from source, CMake raises an error like:
```
CMake Error at cmake/Dependencies.cmake:1079 (if):
  if given arguments:

    "VERSION_LESS" "3"

  Unknown arguments specified
```

This message is quite misleading: users may think it is a syntax error or a CMake version problem.

This PR adds a check to ensure `PYTHONLIBS_VERSION_STRING` exists before it is used.

Related  #87993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96642
Approved by: https://github.com/kit1980
2023-03-22 08:42:48 +00:00
Nikita Shulga
a229e78544 [BE] Enforce sign-compare (#96723)
A number of OSS PRs were reverted because of new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
Not sure how those selective rules are applied, but this PR removes `-Wno-sign-compare` from the PyTorch codebase.

The only tricky part in this PR is making sure that non-ASCII character detection works for both signed and unsigned chars here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)
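
The signed/unsigned pitfall can be sketched in Python, assuming a byte that has been read as a C `char`. The helper name `is_non_ascii` is hypothetical and is not taken from `python_print.cpp`; the sketch only illustrates why a sign-agnostic check is needed.

```python
import ctypes

def is_non_ascii(char_value: int) -> bool:
    """Return True when a C char value falls outside 7-bit ASCII.

    Masking to 8 bits first makes the check independent of whether the
    platform's `char` is signed (-128..127) or unsigned (0..255).
    """
    return (char_value & 0xFF) >= 0x80

# The same byte, 0xC3, seen through an unsigned and a signed char.
as_unsigned = ctypes.c_ubyte(0xC3).value
as_signed = ctypes.c_byte(0xC3).value

# A naive `value > 127` comparison misses the signed representation.
naive_signed = as_signed > 127
```

With the mask, both representations of the byte are reported as non-ASCII, while the naive comparison silently fails when `char` is signed.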

Exclude several files from sign-compare if flash attention is used, due to the violation in cutlass, to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Do not try to fix sign compare violations in caffe2 codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
2023-03-15 06:04:20 +00:00
Nikita Shulga
62c1e33fc9 [BE] Remove fast_nvcc tool (#96665)
As of CUDA-11.4+ this functionality can be mimicked by passing
[`--threads`](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#threads-number-t) option to CUDA compiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96665
Approved by: https://github.com/atalman, https://github.com/PaliC
2023-03-14 03:17:31 +00:00
cyy
666efd8d5d Improve ASAN and TSAN handling in cmake (#93147)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93147
Approved by: https://github.com/malfet
2023-03-07 14:10:13 +00:00
mantaionut
3beafc91d1 USE_FAST_NVCC Windows (#95206)
USE_FAST_NVCC now works on Windows.

Fixes #67100

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95206
Approved by: https://github.com/ezyang
2023-03-06 15:04:24 +00:00
Eddie Yan
db8e91ef73 [CUDA] Split out compute capability 8.7 and 7.2 from others (#95803)
Follow up of #95008 to avoid building Jetson compute capabilities unnecessarily, also adds missing 7.2.

CC @ptrblck @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95803
Approved by: https://github.com/ezyang
2023-03-02 14:13:15 +00:00
cyy
6786a24fd2 fix some tiny code issues (#95757)
This PR tries to fix:
1. a misspelled NDEBUG preprocessing condition.
2. get rid of all writable-strings warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95757
Approved by: https://github.com/soulitzer
2023-03-01 23:27:32 +00:00
Peter Bell
c5f6092591 Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2023-03-01 17:26:36 +00:00
Nikita Shulga
e970dd9dcf [CI] Compile on M1 natively (#95719)
We have plenty of runners now, let's use them for compilation as well.
To achieve that, remove the `xcode-version: "13.3.1"` property and tweak the Metal framework detection logic to work with the command line tools (which are installed in `/Library/Developer/CommandLineTools`, with the SDK in `/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk`) rather than a full Xcode installation.

TODO: Fix/enable OpenMP accelerated native builds (which are currently broken with `OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.`), but this matches existing behavior as cross-builds are compiled  with OpenMP disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95719
Approved by: https://github.com/huydhn
2023-03-01 04:20:42 +00:00
PyTorch MergeBot
801b3f8fc7 Revert "Use FindCUDAToolkit to find cuda dependencies (#82695)"
This reverts commit 7289d22d67.

Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/peterbell10 due to Breaks torchaudio build
2023-02-28 02:29:09 +00:00
cyy
f27e09de04 Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927)
This PR does two things:
1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
2023-02-27 19:22:20 +00:00
eqy
cc39cd6938 [CUDA][CUBLAS] Explicitly link against cuBLASLt (#95094)
An issue surfaced recently that revealed that we were never explicitly linking against `cuBLASLt`, this fixes it by linking explicitly rather than depending on linker magic.

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95094
Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/atalman
2023-02-24 21:44:32 +00:00
Peter Bell
7289d22d67 Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2023-02-21 22:35:17 +00:00
Eddie Yan
13ebffe088 [CUDA] sm_87 / Jetson Orin support (#95008)
Surfaced from #94438 CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95008
Approved by: https://github.com/ezyang
2023-02-17 02:22:23 +00:00
dllehr-amd
98012e4a59 [ROCm] hipGraph support for pytorch mainline (#88202)
With the release of ROCm 5.3 hip now supports a hipGraph implementation.

All necessary backend work and hipification is done to support the same functionality as cudaGraph.

Unit tests are modified to support a new TEST_GRAPH feature which allows us to create a single check for graph support instead of attempting to gather the CUDA level in annotations for every graph test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88202
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2023-02-14 22:18:56 +00:00
PyTorch MergeBot
e743d316e2 Revert "fix some MKL detection issues of CMake (#94402)"
This reverts commit 7ef46d40a1.

Reverted https://github.com/pytorch/pytorch/pull/94402 on behalf of https://github.com/malfet due to Broke binary builds, see https://github.com/pytorch/pytorch/issues/94751#issuecomment-1428562517
2023-02-13 22:09:40 +00:00
PyTorch MergeBot
36dfbb08f3 Revert "Update Cutlass to v2.11 (#94188)"
This reverts commit a0f9abdcb6.

Reverted https://github.com/pytorch/pytorch/pull/94188 on behalf of https://github.com/ezyang due to bouncing this to derisk branch cut
2023-02-13 19:03:36 +00:00
Aaron Gokaslan
a0f9abdcb6 Update Cutlass to v2.11 (#94188)
Now that we are on CUDA 11+ exclusively, we can update Nvidia's Cutlass to the next version. We also had to remove the cuda build flag : "-D__CUDA_NO_HALF_CONVERSIONS__" since Cutlass no longer builds without it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94188
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-02-12 20:45:03 +00:00
cyy
7ef46d40a1 fix some MKL detection issues of CMake (#94402)
This PR rewrites some logic of FindMKL.cmake and FindOpenMP.cmake to better detect the corresponding libraries and fix the infinite recursion between them. It also contains some other fixes without changing the CMake interface.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94402
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-12 19:19:10 +00:00
Jing Xu
8b37eff69f remove abi uncertainty and potential abi conflict (#94306)
Currently there is a potential conflict for `GLIBCXX_USE_CXX11_ABI` configuration if users don't explicitly set this variable.
In `caffe2/CMakeLists.txt`, if the variable is not set, an `abi checker` will be used to retrieve the ABI configuration from compiler.
https://github.com/pytorch/pytorch/blob/master/caffe2/CMakeLists.txt#L1165-L1183
However, in `torch/csrc/Module.cpp`, if the variable is not set, it will be set to `0`. The conflict happens when the default ABI of the compiler is `1`.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/Module.cpp#L1612

This PR eliminates this uncertainty and potential conflict.
The ABI will be checked and set in `CMakeLists.txt`, and the value is passed to `caffe2/CMakeLists.txt`. Meanwhile, in case `caffe2/CMakeLists.txt` is directly invoked from a `cmake` command, the original GLIBC check logic is kept in this file.
If users don't explicitly assign a value to `GLIBCXX_USE_CXX11_ABI`, the `abi checker` will be executed and will set the value accordingly. If the `abi checker` fails to compile or execute, the value will be set to `0`. If users explicitly assign a value, then the provided value will be used.

Moreover, if `GLIBCXX_USE_CXX11_ABI` is set to `0`, the `-DGLIBCXX_USE_CXX11_ABI=0` flag won't be appended to `CMAKE_CXX_FLAGS`. Thus, whether to use ABI=0 or ABI=1 fully depends on the compiler's default configuration. This could cause an issue where, even if users explicitly set `GLIBCXX_USE_CXX11_ABI` to `0`, the compiler still builds the binaries with ABI=1.
https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L44-L51
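
The resolution order for `GLIBCXX_USE_CXX11_ABI` described in this message can be sketched in Python as follows. This is only a model of the decision logic, not the CMake code itself: `resolve_cxx11_abi` and the `abi_checker` callable are hypothetical names, with `abi_checker` standing in for compiling and running the ABI probe.

```python
from typing import Callable, Optional

def resolve_cxx11_abi(user_value: Optional[str],
                      abi_checker: Callable[[], int]) -> int:
    """Pick the GLIBCXX_USE_CXX11_ABI value.

    `abi_checker` (hypothetical) returns 0 or 1, or raises if the
    probe fails to compile or execute.
    """
    if user_value is not None:
        # An explicitly assigned value always wins.
        return int(user_value)
    try:
        # Otherwise ask the compiler what its default ABI is.
        return abi_checker()
    except Exception:
        # The probe failed to compile or execute: fall back to 0.
        return 0
```

For example, `resolve_cxx11_abi(None, lambda: 1)` follows the compiler default, while `resolve_cxx11_abi("0", lambda: 1)` honors the user's choice.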
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94306
Approved by: https://github.com/malfet
2023-02-09 09:54:04 +00:00
cyy
5fa7120722 Simplify CMake CUDNN code (#91676)
1. Move CUDNN code to separate module.
2. Merge CUDNN public and private targets into a single private target. There is no need to expose CUDNN dependency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91676
Approved by: https://github.com/malfet
2023-02-08 01:06:10 +00:00
cyy
9291f9b9e2 Simplify cmake code (#91546)
We use various newer CMake features to simplify the build system:
1. Caffe2::threads is replaced by threads::threads.
2. Some unused MSVC flags are removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91546
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-08 01:05:19 +00:00
cyy
afd7b581aa Simplify OpenMP detection in CMake (#91576)
We greatly simplify the handling of OpenMP in CMake by using the caffe2::openmp target throughout. We follow the old behavior by defaulting to the MKL OMP library and detecting OMP flags otherwise.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91576
Approved by: https://github.com/malfet
2023-02-04 11:50:06 +00:00
Hansong Zhang
d996acfbc2 [XNNPACK] disable ARM_BF16 and ARM_FP16_VECTOR (#94020)
Summary: This is not used and will cause build failure

Test Plan: CI

Differential Revision: D42982023

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94020
Approved by: https://github.com/Skylion007, https://github.com/tiandiao123, https://github.com/digantdesai
2023-02-03 05:01:00 +00:00
Digant Desai
989722cd19 Use global PIC flag for XNNPACK (#93896)
Summary:
- XNNPACK object libraries need an explicit PIC flag when building a static, PIC libXNPACK.a
- Without this, the link process runs into relocation errors
- Using this global switch to avoid updating XNNPACK CMake

Test Plan: CI

Differential Revision: D42944764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93896
Approved by: https://github.com/Skylion007, https://github.com/Neilblaze, https://github.com/salilsdesai
2023-02-02 23:38:21 +00:00
jjsjann123
c11b301bcd [NVFUSER] refactor nvfuser build (#89621)
This PR is the first step towards refactoring the build for nvfuser in order to have the codegen be a standalone library.

Contents inside this PR:
1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp)
2. splits the build system so nvfuser is generating its own `.so` files. Currently there are:
    - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser
    - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser`
3. nvfuser cpp tests are currently being compiled into `nvfuser_tests`
4. cmake is refactored so that:
    - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`.
    - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more
    - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built.
    - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary`

Future work that's scoped in following PR:
- Currently since nvfuser codegen has dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet
- Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621
Approved by: https://github.com/davidberard98
2023-01-26 02:50:44 +00:00
PyTorch MergeBot
523d4f2562 Revert "[cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527)"
This reverts commit 4d07ad74f1.

Reverted https://github.com/pytorch/pytorch/pull/91527 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-16 13:28:09 +00:00
Eddie Yan
4d07ad74f1 [cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527)
We've been building with V8 (incl. V8 API) by default for a while now; this PR cleans up some guards for cuDNN < 8.0.

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91527
Approved by: https://github.com/ngimel
2023-01-13 18:55:37 +00:00
salilsdesai
ec94cbc66a [Vulkan] Remove GLSL Code Gen (#91912)
@bypass-github-export-checks

GLSL Code Gen is not used, so this diff removes
- GLSL parts of ShaderSource
- Anything enclosed by USE_VULKAN_SHADERC_RUNTIME, as well as the flag itself
- gen_vulkan_glsl script

Plus some additional refactoring

Differential Revision: [D41358861](https://our.internmc.facebook.com/intern/diff/D41358861/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D41358861/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91912
Approved by: https://github.com/mcr229
2023-01-10 20:29:47 +00:00
Eddie Yan
bac33ea8b6 [CUDA] Drop CUDA 10 support (#89582)
CC @ptrblck @ngimel @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89582
Approved by: https://github.com/malfet, https://github.com/ngimel
2023-01-05 05:11:53 +00:00
cyy
9710ac6531 Some CMake and CUDA cleanup given recent update to C++17 (#90599)
The main changes are:
1. Remove outdated checks for old compiler versions because they can't support C++17.
2. Remove outdated CMake checks because it now requires 3.18.
3. Remove outdated CUDA checks because we are moving to CUDA 11.

Almost all changes are in CMake files for easy auditing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599
Approved by: https://github.com/soumith
2022-12-30 11:19:26 +00:00
atalman
3bd37ff2d5 Removing invalid git option when updating submodules (#91132)
Same as this: https://github.com/pytorch/builder/pull/1246
Related to following git commit: 51243f9f0f
Which makes jobs = 0 invalid.

Nightlies for MacOS are failing because of this issue: https://github.com/pytorch/pytorch/actions/runs/3729522653/jobs/6325523414

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91132
Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/malfet, https://github.com/seemethere
2022-12-20 02:17:02 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build system related changes, invasive functional ones are to be followed.
Among many expected tweaks to the build system, here are a few unexpected ones:
 - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Michael Wootton
5351176caa Kineto activity fix (#89785)
Continuation of https://github.com/pytorch/pytorch/pull/88207

A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time. So operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) cuda times, in e.g. table().

Previously a cmake variable was not being propagated to a '-D', causing an issue on Windows, which uses cuda but not cupti.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89785
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-12-08 00:24:55 +00:00
Edward Z. Yang
c09929659c Also include MKL_THREAD_LIB in link libraries for caffe2::mkl (#89378)
Actually fixes https://github.com/pytorch/audio/issues/2784 for
real; in my previous testing I didn't check if I could import
torchaudio; now torchaudio successfully imports.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89378
Approved by: https://github.com/soumith
2022-11-20 19:47:25 +00:00
Edward Z. Yang
7b0d577c22 Set INTERFACE_LINK_DIRECTORIES on caffe2::mkl (#89359)
This ensures that subsequent link commands involving mkl libraries
know where to find the libraries if they are in a non-standard
location (which is the case if you installed mkl via conda, which
is what our standard instructions recommend.)

This is kind of a hack, because the MKL libraries are not actually
guaranteed to be in $MKL_ROOT/lib (they are for the conda install
though).  The real fix is to properly use the MKL targets from
FindMKL.cmake but that's its own can of fish.  See
https://github.com/pytorch/pytorch/issues/73008

This fixes https://github.com/pytorch/audio/issues/2784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89359
Approved by: https://github.com/soumith
2022-11-20 13:34:30 +00:00
Huy Do
ee2ce3fef6 Set make max load when building libtorch (#89237)
The nccl build is still OOM sometimes when using `$(MAKE)`:

```
virtual memory exhausted: Cannot allocate memory
Makefile:73: recipe for target '/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o] Error 1
make[5]: Leaving directory '/var/lib/jenkins/workspace/third_party/nccl/nccl/src/collectives/device'
```

* https://github.com/pytorch/pytorch/actions/runs/3476485191/jobs/5811758058
* https://github.com/pytorch/pytorch/actions/runs/3422228421/jobs/5702153639

So trying to set the same limit here as when building with ninja

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89237
Approved by: https://github.com/malfet
2022-11-18 18:55:33 +00:00
Dmytro Dzhulgakov
ae01615d75 Fix cupti search path in CMake (#88657)
Minor fix for when cuda is installed via conda. In this case the libraries are in `lib` and not `lib64`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88657
Approved by: https://github.com/kit1980, https://github.com/malfet
2022-11-10 23:44:52 +00:00
Eddie Yan
a7420d2ccb Hopper (sm90) support (#87736)
Essentially a followup of #87436

CC @xwang233 @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87736
Approved by: https://github.com/xwang233, https://github.com/malfet
2022-11-09 01:49:50 +00:00
Pruthvi Madugundu
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`
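
The three rules above can be modeled as a small decision function. This is a sketch of the described behavior only, not the CMake logic from the PR. The function name is hypothetical, and it assumes that `TORCH_DISABLE_GPU_ASSERTS` defaults to `OFF` on CUDA (implied by asserts being enabled there by default) and that the force-enable switch only matters on ROCm.

```python
from typing import Optional

def gpu_asserts_enabled(is_rocm: bool,
                        torch_disable_gpu_asserts: Optional[bool] = None,
                        rocm_force_enable_gpu_asserts: bool = False) -> bool:
    """Return whether device-side asserts end up enabled for a build."""
    if torch_disable_gpu_asserts is None:
        # Default: asserts disabled (ON) for ROCm, enabled (OFF) for CUDA.
        torch_disable_gpu_asserts = is_rocm
    if is_rocm and rocm_force_enable_gpu_asserts:
        # ROCM_FORCE_ENABLE_GPU_ASSERTS overrides the disable switch.
        return True
    return not torch_disable_gpu_asserts
```

Under these assumptions a default CUDA build keeps asserts on, a default ROCm build turns them off, and either `TORCH_DISABLE_GPU_ASSERTS=OFF` or the force flag turns them back on for ROCm.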

These are follow-up changes as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
PyTorch MergeBot
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
Pruthvi Madugundu
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

These are follow-up changes as per the comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00
Jithun Nair
2e48b478e0 [ROCm] Use -rpath-link to fix libtinfo conflict (#83552)
Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors.

cc @jeffdaily @sunway513 @ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552
Approved by: https://github.com/malfet, https://github.com/pruthvistony
2022-10-28 03:50:43 +00:00
PyTorch MergeBot
ac0c13f665 Revert "[ROCm] Use -rpath-link to fix libtinfo conflict (#83552)"
This reverts commit a10446c4d8.

Reverted https://github.com/pytorch/pytorch/pull/83552 on behalf of https://github.com/kit1980 due to Broke ios/macos builds https://github.com/pytorch/pytorch/actions/runs/3329991911/jobs/5507911292
2022-10-26 16:43:13 +00:00
Jithun Nair
a10446c4d8 [ROCm] Use -rpath-link to fix libtinfo conflict (#83552)
Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552
Approved by: https://github.com/malfet
2022-10-26 14:40:29 +00:00
min-jean-cho
7a6808c5f6 build: support DNNL_GRAPH_CPU_RUNTIME=TBB (#87512)
Force set cmake `DNNL_GRAPH_CPU_RUNTIME` as `MKLDNN_CPU_RUNTIME` to overwrite [`set(DNNL_GRAPH_CPU_RUNTIME "OMP")`](d19d0f795c/cmake/options.cmake (L65-L67)), enabling user-specified `MKLDNN_CPU_RUNTIME` values (`OMP` (default), `TBB`) for `DNNL_GRAPH_CPU_RUNTIME`.

Fixes https://github.com/pytorch/pytorch/issues/87511
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87512
Approved by: https://github.com/jgong5, https://github.com/ashokei, https://github.com/malfet
2022-10-25 19:24:38 +00:00
Greg Hogan
71fe069d98 ada lovelace (arch 8.9) support (#87436)
changes required to be able to compile https://github.com/pytorch/vision and https://github.com/nvidia/apex for `sm_89` architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87436
Approved by: https://github.com/ngimel
2022-10-24 21:25:36 +00:00
Vladimír Aubrecht
409efebab8 Added define to fix issue with compatibility with latest Windows SDK (#85408)
Fixes #83820.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85408
Approved by: https://github.com/ezyang
2022-10-12 15:44:28 +00:00
Huy Do
7f02f2ac0c [Experimentation] Add TSAN build and test (#85313)
Some parts of the PR are adopted from the previously abandoned https://github.com/pytorch/pytorch/pull/36694.  This PR is the first part to set up TSAN jobs in the CI.  The data race warnings from TSAN will need to be reviewed later in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85313
Approved by: https://github.com/osalpekar
2022-10-11 19:34:44 +00:00
PyTorch MergeBot
deb414a43f Revert "Use FindCUDAToolkit to find cuda dependencies (#82695)"
This reverts commit fb9b96593c.

Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/malfet due to Break cublas packaging into wheel
2022-10-11 02:50:47 +00:00
Peter Bell
fb9b96593c Use FindCUDAToolkit to find cuda dependencies (#82695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695
Approved by: https://github.com/malfet
2022-10-06 15:43:39 +00:00
Sahan Paliskara
936e93058b Delete torch::deploy from pytorch core (#85953)
As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there.

This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-10-06 07:20:16 +00:00
saltyJeff
b32020e937 make vulkan codegen windows-compatible (#85241)
Using `:` to join together paths works on *nix only. This process uses cmake's `list(APPEND ...)` to make vulkan codegen work on windows.
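
A short Python illustration of why a `:`-joined path list breaks on Windows, while a `;`-separated list (the form CMake uses for its lists) does not. The directory names are made up for the example, and the helper names are hypothetical; the actual fix lives in the CMake code.

```python
from typing import List

def split_colon_joined(joined: str) -> List[str]:
    """Split a path list that was joined with ':' (the *nix convention)."""
    return joined.split(":")

def to_cmake_list(paths: List[str]) -> str:
    """CMake stores lists as ';'-separated strings."""
    return ";".join(paths)

# Hypothetical shader directories.
unix_paths = ["/opt/glsl", "/usr/share/glsl"]
windows_paths = ["C:\\glsl", "D:\\more_glsl"]

# On *nix the round trip through ':' is lossless.
unix_round_trip = split_colon_joined(":".join(unix_paths))

# On Windows the drive-letter colons are mistaken for separators.
broken = split_colon_joined(":".join(windows_paths))

# A ';'-separated list keeps the Windows paths intact.
intact = to_cmake_list(windows_paths).split(";")
```

Here `broken` ends up with four fragments instead of two paths, which is the failure mode the commit avoids.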
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85241
Approved by: https://github.com/ezyang
2022-09-26 15:13:24 +00:00
Peter Bell
9a81da7ad1 Update NCCL to current master and remove patch step (#85367)
The patch from #84245 has been upstreamed into NCCL, so the patch step is no longer required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85367
Approved by: https://github.com/ezyang
2022-09-21 19:23:49 +00:00
Jithun Nair
90b64e231e Update hipification logic for all ROCm headers (#85320)
...to remove deprecation warnings. Remove component-specific include dirs from include path

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85320
Approved by: https://github.com/kit1980
2022-09-21 16:22:12 +00:00
Peter Bell
fa86874bbd Fix intermittent link errors in NCCL build (#84245)
Should fix #13362 and fix #83790

I think I've discovered the root cause of the intermittent nccl link
failures. If we look at the variable name in the redefinition error:
```
_02021d91_11_sendrecv_cu_0bc7b9c8_11152
```

this is the name of the file being compiled + some form of unique ID.
As part of NCCL's build process, the same file is compiled multiple
times with different macro definitions depending on which operator and
dtype are being compiled, e.g.
```
nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -dc sendrecv.cu -o sendrecv_sum_i8.o
```

Since the filename parts are the same, then if the unique IDs also
happen to collide then the entire identifier will collide and the link
fails. So the fix here is to generate a unique `.cu` file for each
object file. I've implemented this as a `.patch` file that gets
applied from our cmake code, but if we instead fork nccl that would be
cleaner.
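
The idea of giving every object file its own source file can be sketched as below. This is only an illustration of the naming scheme, not the contents of the `.patch` file: the function name is hypothetical, and the `prod` and `f32` variants in the usage note are made up for the example (only `sendrecv.cu` and `sendrecv_sum_i8.o` appear above).

```python
from itertools import product
from typing import Dict, List

def unique_sources(base: str, ops: List[str], dtypes: List[str]) -> Dict[str, str]:
    """Map each object file to its own uniquely named `.cu` source.

    Because the filename is part of the generated identifier, distinct
    source names keep identifiers distinct even if the unique IDs collide.
    """
    stem = base.rsplit(".", 1)[0]
    return {
        f"{stem}_{op}_{dtype}.o": f"{stem}_{op}_{dtype}.cu"
        for op, dtype in product(ops, dtypes)
    }
```

For instance, `unique_sources("sendrecv.cu", ["sum", "prod"], ["i8", "f32"])` yields four object files, each built from a differently named copy of the source.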
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84245
Approved by: https://github.com/janeyx99, https://github.com/malfet
2022-09-13 19:55:52 +00:00
Dhruv Matani
a06f2edab6 [Build] Replace message() in caffe2/CMakeLists.txt with message in cmake/Summary.cmake (#84814)
Summary: In [PR 84755](https://github.com/pytorch/pytorch/pull/84755), @cccclai noticed and mentioned the presence of `message(STATUS...)` logging in caffe2/CMakeLists.txt and suggested moving it to the file cmake/Summary.cmake. This PR addresses that comment/suggestion.

Test Plan: Ran the build as `USE_NUMPY=0 USE_DISTRIBUTED=0 USE_CUDA=0 TRACING_BASED=1 python setup.py develop`

and saw the following being printed:

```
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   :
--   TRACING_BASED         : 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84814
Approved by: https://github.com/cccclai
2022-09-12 16:32:32 +00:00
Driss Guessous
0fc02dbba4 flash_attention integration (#81434)
# Summary:
- I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This is defaulted to off, resulting in flash not being built anywhere. This is done on purpose since we don't have A100 machines to compile and test on.

- Only looked at CMake; did not attempt bazel or buck yet.

-  I included the mha_fwd from flash_attention that has been refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434
Approved by: https://github.com/cpuhrsch
2022-09-09 20:11:26 +00:00
John Detloff
e0229d6517 Remove caffe2 mobile (#84338)
We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so won't be affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338
Approved by: https://github.com/dreiss
2022-09-08 01:49:55 +00:00
Shen Li
56a37ea1a6 Set default value for nccl make MAX_JOBS if ProcessorCount returns 0 (#84231)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84231
Approved by: https://github.com/malfet, https://github.com/rohan-varma
2022-08-30 16:06:34 +00:00
Peter Bell
b429a17545 Enable -Wunused-local-typedefs (#83708)
I recently had a PR reverted because it triggered an
unused-local-typedefs warning, so disabling these in the CMake build
is counter-productive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83708
Approved by: https://github.com/albanD
2022-08-26 15:45:47 +00:00
Peter Bell
2000eba454 NCCL: Re-enable parallel builds (#83696)
Since #83173 was merged I have noticed some CI being slowed down by
the nccl building step. e.g. if there are no C++ changes then sccache
compiles everything else very quickly and nccl becomes the limiting
factor.

This re-enables parallel builds with some safeguards to protect
against oversubscription. When `make` is the parent build system, we
can use `$(MAKE)` and the `make` jobserver will coordinate job
allocation with the sub-process. For other build systems, this calls
`make` with the `-l` flag which should prevent it launching jobs when
the system load average is already too high.
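
The choice described above can be sketched in Python. This is only an illustration of the decision, not the CMake code from the PR: the function name, the `parent_generator` values and the use of `max_jobs` as the load limit are all assumptions.

```python
def nccl_make_command(parent_generator, max_jobs):
    """Build the argv used to invoke the NCCL sub-build.

    parent_generator: name of the outer build system (hypothetical values
    "make" or "ninja" in this sketch).
    max_jobs: job cap, reused here as the load-average limit (an assumption).
    """
    if parent_generator == "make":
        # $(MAKE) lets the outer make's jobserver hand out job slots
        # to the sub-make, so no explicit -j is needed.
        return ["$(MAKE)"]
    # Other build systems have no shared jobserver: cap the job count and
    # use -l so make stops starting jobs while the load average is high.
    return ["make", f"-j{max_jobs}", f"-l{max_jobs}"]
```

For example, `nccl_make_command("ninja", 8)` yields `["make", "-j8", "-l8"]`, while a `make` parent delegates scheduling entirely to the jobserver.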
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83696
Approved by: https://github.com/malfet
2022-08-25 05:16:01 +00:00
Jane Xu
37d3db7579 Deletes CCACHE_DISABLE and SCCACHE_DISABLE from nccl.cmake (#84007)
Looking through the code and online, it does not look like these variables actually change anything. Regardless, this change was instituted to fix https://github.com/pytorch/pytorch/issues/13362, but we are again running into similar issues even with the workaround: see https://github.com/pytorch/pytorch/issues/83790.

Thus, since
1. this change isn't preventing flakiness
2. these variables do not seem used anywhere in pytorch/pytorch nor mozilla/sccache
we should remove this confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84007
Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/ZainRizvi
2022-08-24 21:43:12 +00:00
Nikita Shulga
3a9ae518f2 Skip NCCL slimming for cxx11 libtorch builds (#83959)
Fixes https://github.com/pytorch/pytorch/issues/83887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83959
Approved by: https://github.com/atalman
2022-08-24 18:31:27 +00:00
Pruthvi Madugundu
8473e69684 [ROCm] Fixes the kernel asserts API declaration mismatch error (#81790)
This PR updates PR [#73040](https://github.com/pytorch/pytorch/pull/73040)

With these changes, PyTorch compiles successfully with ROCm when `NDEBUG` is enabled.

Solution:
For HIP we keep `__device__ __assert_fail()`
and for host side compilation we want to use the `__assert_fail()` from the glibc library.

Tested the code by compiling with below steps
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```

The UT test_fixed_cuda_assert_async is still skipped due to performance overhead.

cc @jithunnair-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790
Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet
2022-08-16 19:22:31 +00:00
Peter Bell
1c83ec8f61 Build nccl single-threaded (#83173)
Closes #82888

This is a tentative fix. make is called by ninja so should be run in
parallel with other jobs already.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83173
Approved by: https://github.com/malfet
2022-08-10 21:40:46 +00:00
Nikita Shulga
62c8d30f9f [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-10 14:32:26 +00:00
PyTorch MergeBot
d3a1f17fc7 Revert "[BE] Add append_cxx_flag_if_supported macro (#82883)"
This reverts commit d7e6aaa59b.

Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-10 10:27:59 +00:00
Xiang Gao
cda210e23b UCC PG build in CI (#81583)
- Modifies the current cmake build definitions to use `find_package` to find UCX and UCC installed in the system
- Install UCX and UCC in CUDA dockers
- Build PyTorch with `USE_UCC=1` in pipelines
- Currently, we are not running unit tests with the UCC PG. Those tests will be added in future PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81583
Approved by: https://github.com/vtlam, https://github.com/malfet
2022-08-10 00:23:47 +00:00
Nikita Shulga
d7e6aaa59b [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt`

Delete `-Wno-unknown-warning-option` to test that the conditions are indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-08 21:04:09 +00:00
Nikita Shulga
c08092fdf2 Update NCCL to v2.13.4-1 (#82775)
Also, update slimming script to include two instances of net.o that new library generates
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82775
Approved by: https://github.com/ngimel
2022-08-04 19:36:45 +00:00
Nikita Shulga
7c298b8244 Fix objcopy version detection (#82774)
By extending the regex to match any characters preceding the version number.

On Ubuntu, the version string looks as follows:
```
$ objcopy --version
GNU objcopy (GNU Binutils for Ubuntu) 2.30
```
And on some CentOS versions it looks like this:
```
$ objcopy --version
GNU objcopy (GNU Binutils) 2.37

```
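A minimal Python sketch of version extraction that tolerates both banner formats shown above. The pattern and the helper name are assumptions for illustration; the regex actually used in the CMake code is not reproduced here.

```python
import re

# Skip any text between "GNU objcopy" and the version, so a parenthesised
# distribution name of any length does not break the match.
OBJCOPY_VERSION_RE = re.compile(r"GNU objcopy .* (\d+)\.(\d+)")


def parse_objcopy_version(first_line):
    """Return (major, minor) from the first line of `objcopy --version`."""
    match = OBJCOPY_VERSION_RE.match(first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Both sample banners parse with this pattern, and an unrelated string yields `None` instead of a bogus version.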
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82774
Approved by: https://github.com/ngimel
2022-08-04 16:26:31 +00:00
Nikita Shulga
83086b7f45 Fix NCCL detection by Gloo (#82773)
Instruct Gloo to always use the bundled version of the library by passing `NCCL_EXTERNAL`.
Otherwise, it would link with the shared library if one could be found in the system
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82773
Approved by: https://github.com/ngimel
2022-08-04 16:26:30 +00:00
Johannes
2ffb23616d Fix false positive AVX, AVX2 and AVX512 detection with MSVC (#82554)
### Description

These changes ensure that the code testing the vector instruction set extensions not only compiles but also runs, so that detection works properly with MSVC:
- INCLUDE(CheckCSourceRuns) instead of INCLUDE(CheckCSourceCompiles)
- INCLUDE(CheckCXXSourceRuns) instead of INCLUDE(CheckCXXSourceCompiles)
- CHECK_C_SOURCE_RUNS instead of CHECK_C_SOURCE_COMPILES
- CHECK_CXX_SOURCE_RUNS instead of CHECK_CXX_SOURCE_COMPILES

### Issue
#82553

### Testing
I tried the [code changes](86246b3c58) on a copy of [FindAVX.cmake](https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindAVX.cmake) in my repository [convolution-benchmarks](https://github.com/JohT/convolution-benchmarks) and could verify that the detection works properly now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82554
Approved by: https://github.com/malfet
2022-08-01 23:52:49 +00:00
zhang, xiaobing
86b86202b5 fix torch.config can't respect USE_MKLDNN flag issue (#75001)
Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001
Approved by: https://github.com/malfet
2022-07-17 15:00:48 +00:00
Larry Liu
e345138591 [retake2][mobile] Fix lightweight dispatch OOM error by introducing selective build (#80791)
To fix #78540 I committed #78983, which was reverted due to an internal CI failure. Then I committed #79215, which only fixed the failure but didn't have the full feature set of #78983. This PR is another try.

This PR adds a script to dump all operators from test models and automatically write them into `lightweight_dispatch_ops.yaml`. This way we don't have to manually update the yaml file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80791
Approved by: https://github.com/raziel
2022-07-15 18:04:25 +00:00
Nikita Shulga
17fe7ce0e4 [BE] Delete Win specific case for CMake older than 3.1 (#81411)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81411
Approved by: https://github.com/janeyx99
2022-07-14 00:31:31 +00:00
Tongliang Liao
dff70a5e1a Make language std configurable. (#75519)
RocksDB 7 starts to use C++17 in header.
We should make this configurable, in case user needs higher std version.

The list of files to change was found by `git grep 'CMAKE_[^_]*_STANDARD'`.
Doc string is from CMake code.
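
To see what the grep pattern above selects, here is a small Python sketch that applies the same expression. The helper name and the sample lines in the usage note are made up for illustration.

```python
import re

# Same expression as in the git grep command: [^_]* stands for the
# language part of the variable name (C, CXX, CUDA, ...).
STANDARD_VAR_RE = re.compile(r"CMAKE_[^_]*_STANDARD")


def mentions_language_standard(line):
    """True if the line references a CMAKE_<LANG>_STANDARD-style variable."""
    return STANDARD_VAR_RE.search(line) is not None
```

A line such as `set(CMAKE_CXX_STANDARD 14)` matches, while `set(CMAKE_BUILD_TYPE Release)` does not. Because the search is a substring match, `CMAKE_CXX_STANDARD_REQUIRED` is selected as well.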
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519
Approved by: https://github.com/malfet
2022-07-13 14:21:27 +00:00
Jing Xu
3c7044728b Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler ([https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-07-13 13:50:15 +00:00
Terry Lam
54bdaf76d6 [PFC] Native UCC process group for Pytorch (#79918)
Summary:
This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library.
The environment and cmake variables are named to mirror those of the existing process groups such as NCCL and Gloo. Specifically,
- USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries.
- USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME.

Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party.
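
The interaction between the two flags can be sketched in Python as below. This is not the build logic from the diff: the helper names, the accepted spellings of a true value and the error text are assumptions for illustration.

```python
def parse_flag(env, name, default=False):
    """Interpret an ON/OFF style build flag from an environment mapping.

    The accepted truthy spellings are an assumption made for this sketch.
    """
    value = env.get(name)
    if value is None:
        return default
    return value.strip().upper() in {"1", "ON", "YES", "TRUE", "Y"}


def resolve_ucc_config(env):
    """Return the UCC settings implied by the environment."""
    if not parse_flag(env, "USE_UCC"):
        # Default is OFF, so builds without UCX/UCC are unaffected.
        return {"use_ucc": False}
    if not parse_flag(env, "USE_SYSTEM_UCC"):
        # Only externally provided libraries are supported for now.
        raise ValueError("USE_UCC currently requires USE_SYSTEM_UCC=ON")
    return {
        "use_ucc": True,
        "ucx_home": env.get("UCX_HOME"),
        "ucc_home": env.get("UCC_HOME"),
    }
```

With an empty environment the result is `{"use_ucc": False}`, which reflects the stated goal of not breaking existing builds.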

Test Plan:
Passed Torch-UCC tests that invoke UCC process group. For example:

$ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda
...
Test allreduce: succeeded

Differential Revision: D36973688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918
Approved by: https://github.com/kwen2501, https://github.com/kingchc
2022-07-12 14:45:44 +00:00
Dmitry Mikushin
e08026d4d4 Use miopen_LIBRARIES and rccl_LIBRARIES directly, when they are valid target (#80446)
As of [this RCCL PR](https://github.com/ROCmSoftwarePlatform/rccl/pull/570), `${rccl_LIBRARIES}` refers to the actual RCCL library target, not just a symbolic "rccl" string. So starting from the next release, no special treatment of it will be required in PyTorch anymore. This patch checks whether `${RCCL_LIBRARIES}` and `${MIOpen_LIBRARIES}` are already valid targets and, if they are, does not try to find them manually.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80446
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-07-06 23:39:59 +00:00
Michael Suo
b349d15907 [build] fix compiling with clang13 (#80916)
This check is incorrect; clang 13.1.0 doesn't exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80916
Approved by: https://github.com/malfet
2022-07-06 02:35:46 +00:00
Ronak Malik
d03f989d53 [ROCm] Load ROCm if Torch is used as a dependency (#80469)
Includes LoadHIP.cmake if pytorch is used as a dependency for another project and ROCm is enabled. This removes the need to explicitly link against ROCm libraries in extension projects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80469
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-07-05 21:04:07 +00:00
PyTorch MergeBot
1454515253 Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)"
This reverts commit f988aa2b3f.

Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see f988aa2b3f
2022-06-30 12:49:41 +00:00
Jing Xu
f988aa2b3f Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler ([https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-06-30 05:14:03 +00:00
Linbin Yu
b62d39eda0 Consolidate all python targets in the tools folder (#80408)
Summary:
All buck targets that point to the caffe2/tools folder are now moved to tools/BUCK.
This also eliminates all python library/binary import in pt_defs.bzl, which caused T124308913.

Test Plan: CI

Differential Revision: D37468313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80408
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-06-29 23:27:47 +00:00
Mo Zhou
799d71378c cmake: Fix variable typo for USE_SYSTEM_PYBIND11. (#80272)
The correct variable name should be USE_SYSTEM_PYBIND11, as defined in
the root CMakeLists.txt. In cmake/Dependencies.cmake, it is incorrectly
written as USE_SYSTEM_BIND11, but cmake will not complain about this.
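
Why the typo goes unnoticed can be illustrated with a small Python model of how CMake's `if(<variable>)` treats names. The function below is a simplified imitation written for this note, not CMake itself: an undefined variable simply evaluates to false, with no error or warning.

```python
# Values that CMake's if() treats as false (compared case-insensitively).
FALSE_CONSTANTS = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def cmake_if(variables, name):
    """Simplified model of `if(<variable>)`: undefined names are false."""
    value = variables.get(name)
    if value is None:
        # A misspelled (hence undefined) variable lands here silently.
        return False
    upper = value.upper()
    return upper not in FALSE_CONSTANTS and not upper.endswith("-NOTFOUND")


cache = {"USE_SYSTEM_PYBIND11": "ON"}
correct = cmake_if(cache, "USE_SYSTEM_PYBIND11")
misspelled = cmake_if(cache, "USE_SYSTEM_BIND11")
```

Here `correct` is true and `misspelled` is false, so the branch guarded by the misspelled name is skipped without any diagnostic.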

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80272
Approved by: https://github.com/suo
2022-06-27 02:08:07 +00:00
Toyohisa Kameyama
8adec19230 Specify "Generic" BLAS library name. (#74269)
When we use PyTorch with an unregistered BLAS, Spack sets BLAS=Generic.
In that case PyTorch searches only for libblas.
If the BLAS package's library is not named libblas, `spack install py-torch` fails.

This PR passes the BLAS library names through the GENERIC_BLAS_LIBRARIES environment variable so that py-torch can find the BLAS library.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74269
Approved by: https://github.com/kit1980
2022-06-20 18:44:54 +00:00
Anush Elangovan
e3135946b2 reorder cpuinfo and clog deps in TorchConfig.cmake (#79551)
cpuinfo has some symbols that need to be resolved with clog.

Static builds fail without this fix with this error:
```
api.c:(.text+0xc2): undefined reference to `clog_vlog_fatal'
init.c:(.text+0x19d1): undefined reference to `clog_vlog_error'
processors.c:(.text+0x551): undefined reference to `clog_vlog_error'
smallfile.c:(.text+0x172): undefined reference to `clog_vlog_error'
```
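The ordering constraint behind this fix can be sketched in Python: with static libraries, a library has to appear on the link line before the libraries that resolve its symbols. The function name and the dependency map are illustrative assumptions; `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter


def static_link_order(dependencies):
    """Order static libraries so each one precedes what it depends on.

    `dependencies` maps a library to the set of libraries whose symbols
    it needs, e.g. cpuinfo needs clog.
    """
    # static_order() yields dependencies first; a single-pass linker wants
    # the opposite, so the sequence is reversed.
    dependencies_first = list(TopologicalSorter(dependencies).static_order())
    return list(reversed(dependencies_first))


order = static_link_order({"cpuinfo": {"clog"}})
```

For the two libraries in this commit, `order` places `cpuinfo` ahead of `clog`, which matches the reordering described in the title.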
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79551
Approved by: https://github.com/malfet
2022-06-16 18:23:26 +00:00
Sergii Dymchenko
f1fb575b9e Remove -Wno-unused-but-set-variable for clang 13.0.0 (#79666)
Fixes #74805

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79666
Approved by: https://github.com/malfet
2022-06-16 04:26:39 +00:00
atalman
0e25a9490b Removing cublas static linking (#79280)
Removing cublas static linking

Test:  https://github.com/pytorch/pytorch/runs/6837323424?check_suite_focus=true

```
(base) atalman@atalman-dev-workstation-d4c889c8-2k8hl:~/whl_test/torch/lib$ ldd libtorch_cuda.so
	linux-vdso.so.1 (0x00007fffe8f6a000)
	libc10_cuda.so (0x00007f6539e6a000)
	libcudart-80664282.so.10.2 (0x00007f6539be9000)
	libnvToolsExt-3965bdd0.so.1 (0x00007f65399df000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f65397c0000)
	libc10.so (0x00007f653952f000)
	libtorch_cpu.so (0x00007f6520921000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6520583000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f652037f000)
	libcublas.so.10 (0x00007f651c0c5000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f651bebd000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f651bb34000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f651b91c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f651b52b000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f656aa13000)
	libgomp-a34b3233.so.1 (0x00007f651b301000)
	libcublasLt.so.10 (0x00007f651946c000)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79280
Approved by: https://github.com/seemethere
2022-06-13 13:10:16 +00:00
Mark Harfouche
221755cc71 Link BLAS privately (#78883)
We've had some users report that they are getting symbol collisions when linking to blas.

I don't see a need to re-export the blas library symbols.

I figured I would share here for other packagers to be able to benefit too.

xref: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/116
xref: https://github.com/conda-forge/openblas-feedstock/issues/134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78883
Approved by: https://github.com/ezyang
2022-06-09 17:02:06 +00:00
PyTorch MergeBot
c3e089a047 Revert "[mobile] Fix lightweight dispatch OOM error by introducing selective build"
This reverts commit 272bdb1442.

Reverted https://github.com/pytorch/pytorch/pull/78983 on behalf of https://github.com/osalpekar due to broke internal mobile tests
2022-06-09 05:16:42 +00:00
PyTorch MergeBot
272bdb1442 [mobile] Fix lightweight dispatch OOM error by introducing selective build
This PR introduces selective build to lightweight dispatch CI job. By doing so we can't run the `test_lite_intepreter_runtime` test suite anymore because it requires some other operators.

From now on, if we are adding a new unit test in `test_codegen_unboxing`, we will have to export the operators for the unit test model and add them into `lightweight_dispatch_ops.yaml`. This can be automated by introducing tracing based selective build, but that's for next PR to do.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78983

Approved by: https://github.com/kit1980
2022-06-08 04:29:35 +00:00
Michael Andreas Dagitses
501d0729cb move build_variables.bzl and ufunc_defs.bzl from pytorch-root/tools/ to the root
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78542

This makes importing easier in different build systems that have
different absolute names for the pytorch-root.

Differential Revision: [D36782582](https://our.internmc.facebook.com/intern/diff/D36782582/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36782582/)!

Approved by: https://github.com/malfet
2022-06-02 19:39:27 +00:00
Peter Bell
5cdf79fddc Bump minimum CMake version to 3.13
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76312

Approved by: https://github.com/malfet
2022-05-19 15:38:55 +00:00
drisspg
1f7d243e36 Add USE_MPS option to cmake summary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77782

Approved by: https://github.com/albanD
2022-05-18 20:16:03 +00:00
Nikita Shulga
4b4a6a0b19 Use TensorPipe libuv in Gloo (#77312)
Otherwise, it's possible to build TensorPipe with one version of libuv
and gloo with another.

Also, delete the strange `GLOO_INSTALL` logic, as none of the install artifacts are really packaged as part of PyTorch (it was probably used by Caffe2 builds)

This helps solve the problem of compiling PyTorch for M1, where `libuv` is not available in conda

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77312
Approved by: https://github.com/seemethere
2022-05-17 03:31:48 +00:00
Kulin Seth
f348b1b2b5 Add the Runtime components for MPS backend. (#76725)
The PR adds the runtime components and a few basic operations like copy and as_strided for the MPS backend.

Current list of identified TODOs are:

-  https://github.com/pytorch/pytorch/issues/77176
- Unify the logic with CUDACachingAllocator and remove redundant code.
-  https://github.com/pytorch/pytorch/issues/77170
- Look into using C++ smart pointers where possible with ObjC code
- Use empty_strided_generic() to implement the `empty_strided_mps` code
- https://github.com/pytorch/pytorch/issues/77144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725
Approved by: https://github.com/albanD
2022-05-11 17:19:45 +00:00