pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
cyy	3a70a02a81	Enable Wrange-loop-analysis (#110837 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/110837 Approved by: https://github.com/Skylion007	2023-10-09 11:19:03 +00:00
cyy	c3e4e4f6d2	[4/N] Add -Wdeprecated and related fixes (#110204 ) This PR enables -Wdeprecated on torch_cpu. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110204 Approved by: https://github.com/ezyang	2023-10-07 19:46:08 +00:00
cyy	9a492fc27f	Fix unknown c++ flag detection in CMake (#109000 ) Unknown -Wno-XXX flags are still appended to GCC via append_cxx_flag_if_supported because of the behavior described in the GCC documentation: ``` When an unrecognized warning option is requested (e.g., -Wunknown-warning), GCC emits a diagnostic stating that the option is not recognized. However, if the -Wno- form is used, the behavior is slightly different: no diagnostic is produced for -Wno-unknown-warning unless other diagnostics are being produced. This allows the use of new -Wno- options with old compilers, but if something goes wrong, the compiler warns that an unrecognized option is present. ``` This PR fixes this by checking support for the -WXXX form of the flag instead. Unfortunately, third_party/fbgemm/CMakeLists.txt redefines append_cxx_flag_if_supported, overwriting our version. As a result, we have to re-include utils.cmake to overwrite it again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109000 Approved by: https://github.com/malfet	2023-09-11 08:32:07 +00:00
cyy	0cc2f06aec	[Reland] Improve MKL related logic in FindOpenMP.cmake (#104224 ) Reland of PR #94924. The purpose of this PR is to deal with the complicated interactions between MKL and OpenMP. There are two improvements: 1. It uses a flag to avoid infinite mutual recursion in calling find_package(MKL) and find_package(OpenMP) in some cases. 2. The logic of finding iomp5 is improved and now we can test MKLDNN under ASAN. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104224 Approved by: https://github.com/malfet	2023-09-02 07:55:11 +00:00
Xia, Weiwen	97a291f6bd	[ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957 ) Summary: Update oneDNN from v2.7.3 to v3.1.1. It is BC-breaking because some APIs changed on the oneDNN side. Changes include: - PyTorch code where oneDNN is called directly - Submodule `third_party/ideep` to adapt to oneDNN's new API. - CMake files to fix build issues. Test plan: Build issues and correctness are covered by CI checks. For performance, we ran TorchBench models to ensure there is no regression. Below is the comparison before and after the oneDNN update. ![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e) Note: - Base commit of PyTorch: `da322ea` - CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-08-25 12:13:18 +00:00
PyTorch MergeBot	22cade56ba	Revert "[Reland] Upgrade NVTX to NVTX3 (#97582 )" This reverts commit `5bbfb96203`. Reverted https://github.com/pytorch/pytorch/pull/97582 on behalf of https://github.com/izaitsevfb due to Breaks meta RL builds ([comment](https://github.com/pytorch/pytorch/pull/97582#issuecomment-1679568525))	2023-08-15 20:55:12 +00:00
cyy	5bbfb96203	[Reland] Upgrade NVTX to NVTX3 (#97582 ) PR #90689 replaces NVTX with NVTX3. However, the torch::nvtoolsext target is created only when the third-party NVTX is used. This is clearly a logical error. We now move the creation code out of the branch to cover all cases. This should fix the issues reported in the comments of #90689. It would be better to move configurations of the failed FRL jobs to CI tests so that we can find such issues early before merging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97582 Approved by: https://github.com/peterbell10	2023-08-14 16:55:25 +00:00
Jesse Cai	f81f9093ec	[core][pruning][feature] cuSPARSELt build integration (#103700 ) Summary: This stack of PRs integrates cuSPARSELt into PyTorch. This PR adds cuSPARSELt support to the build process. It adds a new flag, USE_CUSPARSELT, that defaults to false. When USE_CUSPARSELT=1 is specified, the user can also specify CUSPARSELT_ROOT, which defines the path to the library. Compiling PyTorch with cuSPARSELt support can be done as follows: ``` USE_CUSPARSELT=1 CUSPARSELT_ROOT=/path/to/cusparselt python setup.py develop ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700 Approved by: https://github.com/albanD	2023-08-02 12:48:39 +00:00
Jeff Daily	5379b5f927	[ROCm] use hipblas instead of rocblas (#105881 ) - BatchLinearAlgebraLib.cpp is now split into one additional file - BatchLinearAlgebraLib.cpp uses only cusolver APIs - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file - cmake changes to link against hipblas instead of rocblas - hipify mappings changes to map cublas -> hipblas instead of rocblas Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881 Approved by: https://github.com/albanD	2023-07-31 20:42:55 +00:00
CURTLab	b5d3d58497	Fixed cmake mkl lib path in caffee2 public (#105525 ) This small change fixes an Intel MKL linking error in the distributed version of libtorch C++ when using CMake. Fixes #105215. Pull Request resolved: https://github.com/pytorch/pytorch/pull/105525 Approved by: https://github.com/albanD	2023-07-20 17:15:09 +00:00
Peter Bell	c500f1d13b	[CMake] Fix TORCH_CUDA_ARCH_LIST warning (#104680 ) The warning complains that `TORCH_CUDA_ARCH_LIST` is set in the environment instead of being defined as a build variable, which is fixed by the change to `tools/setup_helpers/cmake.py`. However, I still see the warning even with this fix because ```cmake if((NOT EXISTS ${TORCH_CUDA_ARCH_LIST}) ... ``` actually checks whether a file named "7.5" (or whatever arch is being requested) exists. Instead, we want to check whether the variable is defined. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104680 Approved by: https://github.com/albanD	2023-07-07 15:12:54 +00:00
Te	a73ad82c8f	conditional CMAKE_CUDA_STANDARD (#104240 ) Fixes #104237 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104240 Approved by: https://github.com/malfet	2023-06-27 18:41:25 +00:00
Andres Lugo-Reyes	eaffd98880	Enable hipSOLVER in ROCm builds (#97370 ) Enables the hipSOLVER backend for ROCm builds -------------------------------------------------------------------------- - Minimum ROCm version requirement - 5.3 - Introduces a new macro USE_LINALG_SOLVER that controls enablement of both cuSOLVER and hipSOLVER - Adds the hipSOLVER API to the hipification process - Combines hipSOLVER and hipSPARSE mappings into a single SPECIAL map that takes priority over normal mappings - Torch APIs moved to the hipSOLVER backend (as opposed to MAGMA) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr() - Will enable 100+ linalg unit tests for ROCm Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370 Approved by: https://github.com/malfet	2023-05-31 16:53:23 +00:00
cyy	c8877e6080	enable some cuda warnings (#95568 ) Some CUDA warnings were disabled because of old code-quality issues that have since been fixed, so it is time to remove the suppression. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95568 Approved by: https://github.com/albanD	2023-04-28 02:39:17 +00:00
PyTorch MergeBot	3226ad21cf	Revert "[Reland] fix some MKL detection issues of CMake (#94924 )" This reverts commit `dc2b7aa955`. Reverted https://github.com/pytorch/pytorch/pull/94924 on behalf of https://github.com/atalman due to conda nightly build failures	2023-03-31 18:41:11 +00:00
cyy	dc2b7aa955	[Reland] fix some MKL detection issues of CMake (#94924 ) This is a reland of PR #94402 that tries to solve the additional link issues. PR #94402 failed because caffe2::mkl had been converted to a private dependency while libtorch_cuda_linalg hadn't linked to it explicitly. This is fixed in commit 4373bf0ae3dee32afc178f9d51a4154d6c5904c6. We also replace more references to MKL_LIBRARIES with caffe2::mkl in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94924 Approved by: https://github.com/malfet	2023-03-31 02:01:52 +00:00
Pruthvi Madugundu	08f125bcac	[ROCm] Remove usage of deprecated ROCm component header includes (#97620 ) - clang parameter 'amdgpu-target' changed to 'offload-arch' - HIP and MIOpen includes path updated for extensions Pull Request resolved: https://github.com/pytorch/pytorch/pull/97620 Approved by: https://github.com/ezyang, https://github.com/jithunnair-amd	2023-03-28 19:28:38 +00:00
Nikita Shulga	96e3b3ac72	[BE] Cleanup CMake flag suppressions (#97584 ) Use `append_cxx_flag_if_supported` to determine whether or not `-Werror` is supported. Do not suppress deprecation warnings if glog is not used/installed; as the check is currently written, it suppresses deprecations even if `glog` is not installed. Similarly, do not suppress deprecations on MacOS simply because we are compiling with protobuf. Fix deprecation warnings in: - MPS by replacing `MTLResourceOptionCPUCacheModeDefault`->`MTLResourceCPUCacheModeDefaultCache` - GTests by replacing `TYPED_TEST_CASE`->`TYPED_TEST_SUITE` - `codegen/onednn/interface.cpp` by passing `Stack` by reference rather than by pointer. Do not guard calls to `append_cxx_flag_if_supported` with `if(CLANG)` or `if(GCC)`. Fix some deprecated calls in `Metal`; hide more complex exceptions under `C10_CLANG_DIAGNOSTIC_IGNORE`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97584 Approved by: https://github.com/kit1980	2023-03-27 18:46:09 +00:00
PyTorch MergeBot	5170995b2a	Revert "Upgrade NVTX to NVTX3 (#90689 )" This reverts commit `e64ddd1ab9`. Reverted https://github.com/pytorch/pytorch/pull/90689 on behalf of https://github.com/osalpekar due to Build Failures due to not being able to find one nvtx3 header in FRL jobs: [D42332540](https://www.internalfb.com/diff/D42332540)	2023-03-24 18:16:06 +00:00
cyy	e64ddd1ab9	Upgrade NVTX to NVTX3 (#90689 ) Due to the recent upgrade to CUDA 11, we can upgrade NVTX to NVTX3 as well. NVTX3 is a header-only library, which simplifies the build system a lot. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90689 Approved by: https://github.com/soumith, https://github.com/malfet	2023-03-23 01:56:42 +00:00
Nikita Shulga	a229e78544	[BE] Enforce sign-compare (#96723 ) A number of OSS PRs were reverted because of new signed-unsigned comparison warnings, which are treated as errors in some internal builds. Not sure how those selective rules are applied, but this PR removes `-Wno-sign-compare` from the PyTorch codebase. The only tricky part in this PR is making sure that non-ASCII character detection works for both signed and unsigned chars here: `6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)` Exclude several files from sign-compare if flash attention is used, due to a violation in cutlass, to be fixed by https://github.com/NVIDIA/cutlass/pull/869 Do not try to fix sign-compare violations in the caffe2 codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723 Approved by: https://github.com/albanD	2023-03-15 06:04:20 +00:00
cyy	6786a24fd2	fix some tiny code issues (#95757 ) This PR tries to: 1. fix a misspelled NDEBUG preprocessing condition. 2. get rid of all writable-strings warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95757 Approved by: https://github.com/soulitzer	2023-03-01 23:27:32 +00:00
Peter Bell	c5f6092591	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2023-03-01 17:26:36 +00:00
PyTorch MergeBot	801b3f8fc7	Revert "Use FindCUDAToolkit to find cuda dependencies (#82695 )" This reverts commit `7289d22d67`. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/peterbell10 due to Breaks torchaudio build	2023-02-28 02:29:09 +00:00
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927 ) This PR does two things: 1. It moves some Windows warning suppressions from various CMake files into the main CMakeLists.txt, following the conventions for gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes many DLL warnings by changing C10_API to TORCH_API or TORCH_PYTHON_API. Some DLL warnings remain because some TORCH_API functions are actually built as part of libtorch_python. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
Peter Bell	7289d22d67	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2023-02-21 22:35:17 +00:00
cyy	5fa7120722	Simplify CMake CUDNN code (#91676 ) 1. Move CUDNN code to a separate module. 2. Merge CUDNN public and private targets into a single private target. There is no need to expose the CUDNN dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91676 Approved by: https://github.com/malfet	2023-02-08 01:06:10 +00:00
cyy	9291f9b9e2	Simplify cmake code (#91546 ) We use various newer CMake features to simplify the build system: 1. Caffe2::threads is replaced by Threads::Threads. 2. Some unused MSVC flags are removed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91546 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-02-08 01:05:19 +00:00
cyy	afd7b581aa	Simplify OpenMP detection in CMake (#91576 ) We greatly simplify the handling of OpenMP in CMake by using the caffe2::openmp target throughout. We keep the old behavior of defaulting to the MKL OpenMP library and otherwise detecting OpenMP flags. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91576 Approved by: https://github.com/malfet	2023-02-04 11:50:06 +00:00
cyy	9710ac6531	Some CMake and CUDA cleanup given recent update to C++17 (#90599 ) The main changes are: 1. Remove outdated checks for old compiler versions, because they can't support C++17. 2. Remove outdated CMake checks, because CMake 3.18 is now required. 3. Remove outdated CUDA checks, because we are moving to CUDA 11. Almost all changes are in CMake files for easy auditing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90599 Approved by: https://github.com/soumith	2022-12-30 11:19:26 +00:00
Nikita Shulga	36ac095ff8	Migrate PyTorch to C++17 (#85969 ) With CUDA-10.2 gone we can finally do it! This PR mostly contains build system related changes; invasive functional ones will follow. Among many expected tweaks to the build system, here are a few unexpected ones: - Force the onnx_proto project to be updated to C++17 to avoid a `duplicate symbols` error when compiled by gcc-7.5.0, as the storage rule for `constexpr` changed in C++17 but gcc does not seem to follow it - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when the CUDA runtime picks the host rather than the device function when `std::apply` is invoked from CUDA code. - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that the `std` symbol is ambiguous - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it. Some prerequisites: - https://github.com/pytorch/pytorch/pull/89297 - https://github.com/pytorch/pytorch/pull/89605 - https://github.com/pytorch/pytorch/pull/90228 - https://github.com/pytorch/pytorch/pull/90389 - https://github.com/pytorch/pytorch/pull/90379 - https://github.com/pytorch/pytorch/pull/89570 - https://github.com/facebookincubator/gloo/pull/336 - https://github.com/facebookincubator/gloo/pull/343 - `919676fb32` Fixes https://github.com/pytorch/pytorch/issues/56055 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969 Approved by: https://github.com/ezyang, https://github.com/kulinseth	2022-12-08 02:27:48 +00:00
Edward Z. Yang	c09929659c	Also include MKL_THREAD_LIB in link libraries for caffe2::mkl (#89378 ) Actually fixes https://github.com/pytorch/audio/issues/2784 for real; in my previous testing I didn't check if I could import torchaudio; now torchaudio successfully imports. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89378 Approved by: https://github.com/soumith	2022-11-20 19:47:25 +00:00
Edward Z. Yang	7b0d577c22	Set INTERFACE_LINK_DIRECTORIES on caffe2::mkl (#89359 ) This ensures that subsequent link commands involving mkl libraries know where to find the libraries if they are in a non-standard location (which is the case if you installed mkl via conda, which is what our standard instructions recommend.) This is kind of a hack, because the MKL libraries are not actually guaranteed to be in $MKL_ROOT/lib (they are for the conda install though). The real fix is to properly use the MKL targets from FindMKL.cmake, but that's its own can of worms. See https://github.com/pytorch/pytorch/issues/73008 This fixes https://github.com/pytorch/audio/issues/2784 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89359 Approved by: https://github.com/soumith	2022-11-20 13:34:30 +00:00
Pruthvi Madugundu	fbd08fb358	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting the above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is a follow-up change per a comment in PR #81790 ([link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-04 04:43:05 +00:00
PyTorch MergeBot	0fa23663cc	Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 )" This reverts commit `1e2c4a6e0e`. Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev	2022-11-02 18:13:37 +00:00
Pruthvi Madugundu	1e2c4a6e0e	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting the above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is a follow-up change per a comment in PR #81790 ([link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-02 17:41:57 +00:00
PyTorch MergeBot	deb414a43f	Revert "Use FindCUDAToolkit to find cuda dependencies (#82695 )" This reverts commit `fb9b96593c`. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/malfet due to Break cublas packaging into wheel	2022-10-11 02:50:47 +00:00
Peter Bell	fb9b96593c	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2022-10-06 15:43:39 +00:00
John Detloff	e0229d6517	Remove caffe2 mobile (#84338 ) We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so won't be affected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338 Approved by: https://github.com/dreiss	2022-09-08 01:49:55 +00:00
Peter Bell	b429a17545	Enable -Wunused-local-typedefs (#83708 ) I recently had a PR reverted because it triggered an unused-local-typedefs warning, so disabling these in the CMake build is counter-productive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83708 Approved by: https://github.com/albanD	2022-08-26 15:45:47 +00:00
Pruthvi Madugundu	8473e69684	[ROCm] Fixes the kernel asserts API declaration mismatch error (#81790 ) This PR updates PR [#73040](https://github.com/pytorch/pytorch/pull/73040). With these changes, PyTorch compiles successfully with ROCm when `NDEBUG` is enabled. Solution: For HIP we keep `__device__ __assert_fail()`, and for host-side compilation we use `__assert_fail()` from the glibc library. Tested the code by compiling with the steps below ``` python3 tools/amd_build/build_amd.py python3 setup.py develop --cmake-only cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build cmake --build build ``` The UT test_fixed_cuda_assert_async is still skipped due to performance overhead. cc @jithunnair-amd Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790 Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet	2022-08-16 19:22:31 +00:00
Nikita Shulga	62c8d30f9f	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on. Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in the test's `CMakeLists.txt`. Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-10 14:32:26 +00:00
PyTorch MergeBot	d3a1f17fc7	Revert "[BE] Add `append_cxx_flag_if_supported` macro (#82883 )" This reverts commit `d7e6aaa59b`. Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-10 10:27:59 +00:00
Nikita Shulga	d7e6aaa59b	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on. Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in the test's `CMakeLists.txt`. Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-08 21:04:09 +00:00
Dmitry Mikushin	e08026d4d4	Use miopen_LIBRARIES and rccl_LIBRARIES directly, when they are valid target (#80446 ) As of [this RCCL PR](https://github.com/ROCmSoftwarePlatform/rccl/pull/570), `${rccl_LIBRARIES}` refers to the actual RCCL library target, not just a symbolic "rccl" string. So starting from the next release, no special treatment of it will be required in PyTorch anymore. This patch checks whether `${RCCL_LIBRARIES}` and `${MIOpen_LIBRARIES}` are already valid and, if they are, does not try to find them manually. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80446 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2022-07-06 23:39:59 +00:00
atalman	0e25a9490b	Removing cublas static linking (#79280 ) Removing cublas static linking Test: https://github.com/pytorch/pytorch/runs/6837323424?check_suite_focus=true ``` (base) atalman@atalman-dev-workstation-d4c889c8-2k8hl:~/whl_test/torch/lib$ ldd libtorch_cuda.so linux-vdso.so.1 (0x00007fffe8f6a000) libc10_cuda.so (0x00007f6539e6a000) libcudart-80664282.so.10.2 (0x00007f6539be9000) libnvToolsExt-3965bdd0.so.1 (0x00007f65399df000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f65397c0000) libc10.so (0x00007f653952f000) libtorch_cpu.so (0x00007f6520921000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6520583000) libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f652037f000) libcublas.so.10 (0x00007f651c0c5000) librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f651bebd000) libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f651bb34000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f651b91c000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f651b52b000) /lib64/ld-linux-x86-64.so.2 (0x00007f656aa13000) libgomp-a34b3233.so.1 (0x00007f651b301000) libcublasLt.so.10 (0x00007f651946c000) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/79280 Approved by: https://github.com/seemethere	2022-06-13 13:10:16 +00:00
Jeff Daily	64b543434d	[ROCm] update cmake package DIR paths (#77087 ) Fixes nightly libtorch builds. As of ROCm 5.1.x, all *.cmake files are under /opt/rocm/lib/cmake/package instead of /opt/rocm/package/lib/cmake. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77087 Approved by: https://github.com/seemethere	2022-05-10 19:06:51 +00:00
sanchitintel	4ee29d6033	[Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5) Re-landing #68111/#74596 ## Description v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of #50256, the below improvements are included: * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` `torch.jit.freeze` should be used after tracing (recommended) or scripting a model. ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: * SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) * SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) ** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is in: ``` caffe2/CMakeLists.txt cmake/public/mkldnn.cmake cmake/Modules/FindMKLDNN.cmake ``` ## Limitations * In this PR, we only support Pytorch-oneDNN-Graph integration on Linux platform. Support on Windows and MacOS will be enabled as a next step. * We have only optimized the inference use-case. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622 Approved by: https://github.com/eellison	2022-05-05 16:57:03 +00:00
PyTorch MergeBot	3dcd67a1b3	Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)" This reverts commit `8b11d81058`. Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99	2022-04-29 15:40:17 +00:00
chunyuan	8b11d81058	[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1) Re-landing https://github.com/pytorch/pytorch/pull/68111 ## Description Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444). On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included: - The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used - The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties. ### User API: The optimization pass is disabled by default. Users could enable it by: ``` torch.jit.enable_onednn_fusion(True) ``` ### Performance: [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance: - SkyLake 8180 (1 socket of 28 cores): ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png) - SkyLake 8180 (single thread): ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png) \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI) \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops ### Directory structure of the integration code Fuser-related code is placed under: ``` torch/csrc/jit/codegen/onednn/ ``` Optimization pass registration is done in: ``` torch/csrc/jit/passes/onednn_graph_fuser.h ``` CMake for the integration code is: ``` caffe2/CMakeLists.txt ``` ## Limitations - In this PR, we have only supported the optimization on the Linux platform. Support on Windows and MacOS will be enabled as the next step. - We have only optimized the inference use case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596 Approved by: https://github.com/malfet	2022-04-29 01:01:33 +00:00


245 Commits