pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Nikita Shulga	36ac095ff8	Migrate PyTorch to C++17 (#85969 ) With CUDA-10.2 gone we can finally do it! This PR mostly contains build system related changes; invasive functional ones will follow. Among many expected tweaks to the build system, here are a few unexpected ones: - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code. - `std::decay_t` -> `::std::decay_t` and `std::move`->`::std::move` as VC++ for some reason claims that `std` symbol is ambiguous - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it. Some prerequisites: - https://github.com/pytorch/pytorch/pull/89297 - https://github.com/pytorch/pytorch/pull/89605 - https://github.com/pytorch/pytorch/pull/90228 - https://github.com/pytorch/pytorch/pull/90389 - https://github.com/pytorch/pytorch/pull/90379 - https://github.com/pytorch/pytorch/pull/89570 - https://github.com/facebookincubator/gloo/pull/336 - https://github.com/facebookincubator/gloo/pull/343 - `919676fb32` Fixes https://github.com/pytorch/pytorch/issues/56055 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969 Approved by: https://github.com/ezyang, https://github.com/kulinseth	2022-12-08 02:27:48 +00:00
Michael Wootton	5351176caa	Kineto activity fix (#89785 ) Continuation of https://github.com/pytorch/pytorch/pull/88207 A compile time guard was preventing ActivityType::CUDA from being available on rocm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time. So operators were being charged gpu time for the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) cuda times, in e.g. table(). Previously a cmake variable was not being propagated to a '-D', causing an issue on Windows, which uses cuda but not cupti. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89785 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-12-08 00:24:55 +00:00
Edward Z. Yang	c09929659c	Also include MKL_THREAD_LIB in link libraries for caffe2::mkl (#89378 ) Actually fixes https://github.com/pytorch/audio/issues/2784 for real; in my previous testing I didn't check if I could import torchaudio; now torchaudio successfully imports. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89378 Approved by: https://github.com/soumith	2022-11-20 19:47:25 +00:00
Edward Z. Yang	7b0d577c22	Set INTERFACE_LINK_DIRECTORIES on caffe2::mkl (#89359 ) This ensures that subsequent link commands involving mkl libraries know where to find the libraries if they are in a non-standard location (which is the case if you installed mkl via conda, which is what our standard instructions recommend.) This is kind of a hack, because the MKL libraries are not actually guaranteed to be in $MKL_ROOT/lib (they are for the conda install though). The real fix is to properly use the MKL targets from FindMKL.cmake but that's its own can of fish. See https://github.com/pytorch/pytorch/issues/73008 This fixes https://github.com/pytorch/audio/issues/2784 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89359 Approved by: https://github.com/soumith	2022-11-20 13:34:30 +00:00
Huy Do	ee2ce3fef6	Set make max load when building libtorch (#89237 ) The nccl build still runs out of memory sometimes when using `$(MAKE)`: ``` virtual memory exhausted: Cannot allocate memory Makefile:73: recipe for target '/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o' failed make[5]: *** [/var/lib/jenkins/cpp-build/caffe2/build/nccl/obj/collectives/device/devlink.o] Error 1 make[5]: Leaving directory '/var/lib/jenkins/workspace/third_party/nccl/nccl/src/collectives/device' ``` * https://github.com/pytorch/pytorch/actions/runs/3476485191/jobs/5811758058 * https://github.com/pytorch/pytorch/actions/runs/3422228421/jobs/5702153639 So this sets the same limit here as when building with ninja. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89237 Approved by: https://github.com/malfet	2022-11-18 18:55:33 +00:00
Dmytro Dzhulgakov	ae01615d75	Fix cupti search path in CMake (#88657 ) Minor fix for when cuda is installed via conda. In this case the libraries are in `lib` and not `lib64`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88657 Approved by: https://github.com/kit1980, https://github.com/malfet	2022-11-10 23:44:52 +00:00
Eddie Yan	a7420d2ccb	Hopper (`sm90`) support (#87736 ) Essentially a followup of #87436 CC @xwang233 @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/87736 Approved by: https://github.com/xwang233, https://github.com/malfet	2022-11-09 01:49:50 +00:00
Pruthvi Madugundu	fbd08fb358	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting the above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is a follow-up change per a comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-04 04:43:05 +00:00
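The option rules listed in the commit above can be read as a small decision table. The following is a hypothetical Python model of that described behavior only, not the actual CMake logic; the function and parameter names are illustrative assumptions, and whether CUDA builds honor `TORCH_DISABLE_GPU_ASSERTS=ON` is also assumed here.

```python
from typing import Optional


def gpu_asserts_enabled(
    use_rocm: bool,
    torch_disable_gpu_asserts: Optional[bool] = None,
    rocm_force_enable_gpu_asserts: bool = False,
) -> bool:
    """Hypothetical model of the TORCH_DISABLE_GPU_ASSERTS rules described above.

    torch_disable_gpu_asserts=None means the user did not set the variable.
    """
    # ROCM_FORCE_ENABLE_GPU_ASSERTS=ON forces asserts on for ROCm builds.
    if use_rocm and rocm_force_enable_gpu_asserts:
        return True
    # Unset variable: defaults to ON (asserts disabled) for ROCm and
    # OFF (asserts enabled) for CUDA.
    if torch_disable_gpu_asserts is None:
        torch_disable_gpu_asserts = use_rocm
    # Assumption: an explicit setting is honored on both backends.
    return not torch_disable_gpu_asserts
```

For example, a ROCm build with no options set yields `False`, while passing `torch_disable_gpu_asserts=False` (the `OFF` setting) turns asserts back on.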
PyTorch MergeBot	0fa23663cc	Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 )" This reverts commit `1e2c4a6e0e`. Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev	2022-11-02 18:13:37 +00:00
Pruthvi Madugundu	1e2c4a6e0e	Introduce TORCH_DISABLE_GPU_ASSERTS (#84190 ) - Asserts for CUDA are enabled by default - Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON` - Can be enabled for ROCm by setting the above variable to `OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON` This is a follow-up change per a comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-11-02 17:41:57 +00:00
Jithun Nair	2e48b478e0	[ROCm] Use -rpath-link to fix libtinfo conflict (#83552 ) Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors. cc @jeffdaily @sunway513 @ROCmSupport Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552 Approved by: https://github.com/malfet, https://github.com/pruthvistony	2022-10-28 03:50:43 +00:00
PyTorch MergeBot	ac0c13f665	Revert "[ROCm] Use -rpath-link to fix libtinfo conflict (#83552 )" This reverts commit `a10446c4d8`. Reverted https://github.com/pytorch/pytorch/pull/83552 on behalf of https://github.com/kit1980 due to Broke ios/macos builds https://github.com/pytorch/pytorch/actions/runs/3329991911/jobs/5507911292	2022-10-26 16:43:13 +00:00
Jithun Nair	a10446c4d8	[ROCm] Use -rpath-link to fix libtinfo conflict (#83552 ) Fixes issue building PyTorch for ROCm5.3 and above on Ubuntu20.04 because libtinfo6 from conda conflicts with the one from the distro causing symbol not found errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83552 Approved by: https://github.com/malfet	2022-10-26 14:40:29 +00:00
min-jean-cho	7a6808c5f6	build: support DNNL_GRAPH_CPU_RUNTIME=TBB (#87512 ) Force set cmake `DNNL_GRAPH_CPU_RUNTIME` as `MKLDNN_CPU_RUNTIME` to overwrite [`set(DNNL_GRAPH_CPU_RUNTIME "OMP")`](`d19d0f795c/cmake/options.cmake (L65-L67)`), enabling user-specified `MKLDNN_CPU_RUNTIME` values (`OMP` (default), `TBB`) for `DNNL_GRAPH_CPU_RUNTIME`. Fixes https://github.com/pytorch/pytorch/issues/87511 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87512 Approved by: https://github.com/jgong5, https://github.com/ashokei, https://github.com/malfet	2022-10-25 19:24:38 +00:00
Greg Hogan	71fe069d98	ada lovelace (arch 8.9) support (#87436 ) changes required to be able to compile https://github.com/pytorch/vision and https://github.com/nvidia/apex for `sm_89` architecture Pull Request resolved: https://github.com/pytorch/pytorch/pull/87436 Approved by: https://github.com/ngimel	2022-10-24 21:25:36 +00:00
Vladimír Aubrecht	409efebab8	Added define to fix issue with compatibility with latest Windows SDK (#85408 ) Fixes #83820. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85408 Approved by: https://github.com/ezyang	2022-10-12 15:44:28 +00:00
Huy Do	7f02f2ac0c	[Experimentation] Add TSAN build and test (#85313 ) Some parts of the PR are adopted from the previously abandoned https://github.com/pytorch/pytorch/pull/36694. This PR is the first part to setup TSAN jobs in the CI. The data race warnings from TSAN will need to be reviewed later in a separate PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85313 Approved by: https://github.com/osalpekar	2022-10-11 19:34:44 +00:00
PyTorch MergeBot	deb414a43f	Revert "Use FindCUDAToolkit to find cuda dependencies (#82695 )" This reverts commit `fb9b96593c`. Reverted https://github.com/pytorch/pytorch/pull/82695 on behalf of https://github.com/malfet due to Break cublas packaging into wheel	2022-10-11 02:50:47 +00:00
Peter Bell	fb9b96593c	Use FindCUDAToolkit to find cuda dependencies (#82695 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82695 Approved by: https://github.com/malfet	2022-10-06 15:43:39 +00:00
Sahan Paliskara	936e93058b	Delete torch::deploy from pytorch core (#85953 ) As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there. This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953 Approved by: https://github.com/seemethere, https://github.com/malfet	2022-10-06 07:20:16 +00:00
saltyJeff	b32020e937	make vulkan codegen windows-compatible (#85241 ) Using `:` to join together paths works on *nix only. This process uses cmake's `list(APPEND ...)` to make vulkan codegen work on windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85241 Approved by: https://github.com/ezyang	2022-09-26 15:13:24 +00:00
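A small Python illustration of why `:` as a path separator breaks on Windows: the drive letter itself contains `:`, so the joined string cannot be split back. A `;`-separated string, which is how CMake represents lists, round-trips. The directory names are hypothetical, and this does not reproduce the actual vulkan codegen scripts.

```python
# Two hypothetical Windows include directories.
win_dirs = [r"C:\vulkan\include", r"C:\build\generated"]

# Joining with ':' works for *nix paths such as /usr/include ...
joined = ":".join(win_dirs)
# ... but splitting it again also splits at the drive-letter colons.
broken = joined.split(":")

# A ';'-separated string (CMake's list representation) splits back cleanly.
cmake_list = ";".join(win_dirs)
restored = cmake_list.split(";")
```

Here `broken` holds four fragments instead of two paths, while `restored` equals the original list.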
Peter Bell	9a81da7ad1	Update NCCL to current master and remove patch step (#85367 ) The patch from #84245 has been upstreamed into NCCL, so the patch step is no longer required. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85367 Approved by: https://github.com/ezyang	2022-09-21 19:23:49 +00:00
Jithun Nair	90b64e231e	Update hipification logic for all ROCm headers (#85320 ) ...to remove deprecation warnings. Remove component-specific include dirs from include path Pull Request resolved: https://github.com/pytorch/pytorch/pull/85320 Approved by: https://github.com/kit1980	2022-09-21 16:22:12 +00:00
Peter Bell	fa86874bbd	Fix intermittent link errors in NCCL build (#84245 ) Should fix #13362 and fix #83790 I think I've discovered the root cause of the intermittent nccl link failures. If we look at the variable name in the redefinition error: ``` _02021d91_11_sendrecv_cu_0bc7b9c8_11152 ``` this is the name of the file being compiled + some form of unique ID. As part of NCCL's build process, the same file is compiled multiple times with different macro definitions depending on which operator and dtype are being compiled, e.g. ``` nvcc -DNCCL_OP=0 -DNCCL_TYPE=0 -dc sendrecv.cu -o sendrecv_sum_i8.o ``` Since the filename parts are the same, then if the unique IDs also happen to collide then the entire identifier will collide and the link fails. So the fix here is to generate a unique `.cu` file for each object file. I've implemented this as a `.patch` file that gets applied from our cmake code, but if we instead fork nccl that would be cleaner. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84245 Approved by: https://github.com/janeyx99, https://github.com/malfet	2022-09-13 19:55:52 +00:00
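The fix above compiles each object from a `.cu` file with a distinct name, so the nvcc-generated identifiers, which embed the file name, can no longer collide. A minimal Python sketch of that idea, assuming a wrapper file per (op, type) pair that sets the macros and includes the shared source; this is not the actual patch, and the wrapper layout and names are hypothetical.

```python
from itertools import product
from typing import Dict


def unique_sources(base: str, ops: Dict[str, int], types: Dict[str, int]) -> Dict[str, str]:
    """Return {filename: contents}, one uniquely named wrapper .cu per (op, type).

    Hypothetical sketch: each wrapper defines NCCL_OP / NCCL_TYPE and includes
    the shared source, so every object file comes from a distinct file name.
    """
    stem = base[:-3] if base.endswith(".cu") else base
    sources = {}
    for (op_name, op_id), (ty_name, ty_id) in product(ops.items(), types.items()):
        name = f"{stem}_{op_name}_{ty_name}.cu"
        sources[name] = (
            f"#define NCCL_OP {op_id}\n"
            f"#define NCCL_TYPE {ty_id}\n"
            f'#include "{base}"\n'
        )
    return sources
```

For example, `unique_sources("sendrecv.cu", {"sum": 0}, {"i8": 0, "u8": 1})` yields `sendrecv_sum_i8.cu` and `sendrecv_sum_u8.cu`, each of which would then be compiled with `nvcc -dc` into its own object file.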
Dhruv Matani	a06f2edab6	[Build] Replace message() in caffe2/CMakeLists.txt with message in cmake/Summary.cmake (#84814 ) Summary: In [PR 84755](https://github.com/pytorch/pytorch/pull/84755), @cccclai noticed and mentioned the presence of `message(STATUS...)` logging in caffe2/CMakeLists.txt and suggested moving it to the file cmake/Summary.cmake. This PR addresses that comment/suggestion. Test Plan: Ran the build as `USE_NUMPY=0 USE_DISTRIBUTED=0 USE_CUDA=0 TRACING_BASED=1 python setup.py develop` and saw the following being printed: ``` -- BUILD_MOBILE_AUTOGRAD : OFF -- BUILD_LITE_INTERPRETER: OFF -- INTERN_BUILD_MOBILE : -- TRACING_BASED : 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84814 Approved by: https://github.com/cccclai	2022-09-12 16:32:32 +00:00
Driss Guessous	0fc02dbba4	flash_attention integration (#81434 ) # Summary: - I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This defaults to off, so flash is not built anywhere. This is done on purpose since we don't have A100 machines to compile and test on. - Only looked at CMake; did not attempt bazel or buck yet. - I included the mha_fwd from flash_attention that has been refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434 Approved by: https://github.com/cpuhrsch	2022-09-09 20:11:26 +00:00
John Detloff	e0229d6517	Remove caffe2 mobile (#84338 ) We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our make files. Any lingering internal dependencies will use the buck build and so won't be affected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338 Approved by: https://github.com/dreiss	2022-09-08 01:49:55 +00:00
Shen Li	56a37ea1a6	Set default value for nccl make MAX_JOBS if ProcessorCount returns 0 (#84231 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84231 Approved by: https://github.com/malfet, https://github.com/rohan-varma	2022-08-30 16:06:34 +00:00
Peter Bell	b429a17545	Enable -Wunused-local-typedefs (#83708 ) I recently had a PR reverted because it triggered an unused-local-typedefs warning, so disabling these in the CMake build is counter-productive. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83708 Approved by: https://github.com/albanD	2022-08-26 15:45:47 +00:00
Peter Bell	2000eba454	NCCL: Re-enable parallel builds (#83696 ) Since #83173 was merged I have noticed some CI being slowed down by the nccl building step. e.g. if there are no C++ changes then sccache compiles everything else very quickly and nccl becomes the limiting factor. This re-enables parallel builds with some safeguards to protect against oversubscription. When `make` is the parent build system, we can use `$(MAKE)` and the `make` jobserver will coordinate job allocation with the sub-process. For other build systems, this calls `make` with the `-l` flag which should prevent it launching jobs when the system load average is already too high. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83696 Approved by: https://github.com/malfet	2022-08-25 05:16:01 +00:00
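The two strategies described above can be sketched as a command-selection function. This is a hypothetical Python illustration, not the actual `nccl.cmake` code; in particular, using the same value for `-j` and `-l` and falling back to 1 job when the CPU count is unknown are assumptions.

```python
import os
from typing import List, Optional


def nccl_make_command(parent_is_make: bool, jobs: Optional[int] = None) -> List[str]:
    """Hypothetical sketch of choosing how to invoke make for the NCCL sub-build."""
    if parent_is_make:
        # Under a parent make, $(MAKE) lets the jobserver share job slots
        # with the sub-process, so no explicit -j/-l is passed.
        return ["$(MAKE)"]
    if jobs is None:
        # os.cpu_count() may return None; fall back to a single job then.
        jobs = os.cpu_count() or 1
    # -j caps parallel jobs; -l tells make not to start new jobs while the
    # system load average is at least the given value.
    return ["make", f"-j{jobs}", f"-l{jobs}"]
```

Under ninja, for example, `nccl_make_command(False, 8)` produces `make -j8 -l8`, so an already busy machine throttles the NCCL build instead of oversubscribing it.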
Jane Xu	37d3db7579	Deletes CCACHE_DISABLE and SCCACHE_DISABLE from nccl.cmake (#84007 ) Looking through the code and online, it does not look like these variables actually change anything. Regardless, this change was instituted to fix https://github.com/pytorch/pytorch/issues/13362, but we are again running into similar issues even with the workaround: see https://github.com/pytorch/pytorch/issues/83790. Thus, since 1. this change isn't preventing flakiness 2. these variables do not seem used anywhere in pytorch/pytorch nor mozilla/sccache we should remove this confusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84007 Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/ZainRizvi	2022-08-24 21:43:12 +00:00
Nikita Shulga	3a9ae518f2	Skip NCCL slimming for cxx11 libtorch builds (#83959 ) Fixes https://github.com/pytorch/pytorch/issues/83887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83959 Approved by: https://github.com/atalman	2022-08-24 18:31:27 +00:00
Pruthvi Madugundu	8473e69684	[ROCm] Fixes the kernel asserts API declaration mismatch error (#81790 ) This PR updates PR [#73040](https://github.com/pytorch/pytorch/pull/73040). With these changes, PyTorch compiles successfully with ROCm when `NDEBUG` is enabled. Solution: For HIP we keep `__device__ __assert_fail()` and for host side compilation we want to use the `__assert_fail()` from the glibc library. Tested the code by compiling with below steps ``` python3 tools/amd_build/build_amd.py python3 setup.py develop --cmake-only cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build cmake --build build ``` The UT test_fixed_cuda_assert_async is still skipped due to performance overhead. cc @jithunnair-amd Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790 Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet	2022-08-16 19:22:31 +00:00
Peter Bell	1c83ec8f61	Build nccl single-threaded (#83173 ) Closes #82888 This is a tentative fix. make is called by ninja so should be run in parallel with other jobs already. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83173 Approved by: https://github.com/malfet	2022-08-10 21:40:46 +00:00
Nikita Shulga	62c8d30f9f	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt` Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-10 14:32:26 +00:00
PyTorch MergeBot	d3a1f17fc7	Revert "[BE] Add `append_cxx_flag_if_supported` macro (#82883 )" This reverts commit `d7e6aaa59b`. Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-08-10 10:27:59 +00:00
Xiang Gao	cda210e23b	UCC PG build in CI (#81583 ) - Modifies the current cmake build definitions to use `find_package` to find UCX and UCC installed in the system - Install UCX and UCC in CUDA dockers - Build PyTorch with `USE_UCC=1` in pipelines - Currently, we are not running unit tests with the UCC PG. Those tests will be added in future PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81583 Approved by: https://github.com/vtlam, https://github.com/malfet	2022-08-10 00:23:47 +00:00
Nikita Shulga	d7e6aaa59b	[BE] Add `append_cxx_flag_if_supported` macro (#82883 ) And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeLists.txt` Delete `-Wno-unknown-warning-option` to test that the conditions indeed work as expected Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883 Approved by: https://github.com/seemethere	2022-08-08 21:04:09 +00:00
Nikita Shulga	c08092fdf2	Update NCCL to v2.13.4-1 (#82775 ) Also, update slimming script to include the two instances of net.o that the new library generates Pull Request resolved: https://github.com/pytorch/pytorch/pull/82775 Approved by: https://github.com/ngimel	2022-08-04 19:36:45 +00:00
Nikita Shulga	7c298b8244	Fix objcopy version detection (#82774 ) By extending the regex to match any characters, not just the version. On Ubuntu the version string looks as follows: ``` $ objcopy --version GNU objcopy (GNU Binutils for Ubuntu) 2.30 ``` And on some CentOS versions it looks like ``` $ objcopy --version GNU objcopy (GNU Binutils) 2.37 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/82774 Approved by: https://github.com/ngimel	2022-08-04 16:26:31 +00:00
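The same idea expressed as a Python regex: allow arbitrary text between `GNU objcopy` and the first dotted version number, so both distro formats quoted above parse. This is an illustrative sketch, not the CMake regex used in the PR; the function name is hypothetical.

```python
import re
from typing import Optional

# Skip any text (e.g. "(GNU Binutils for Ubuntu)") lazily, then capture the
# first dotted version number such as "2.30" or "2.37".
OBJCOPY_VERSION_RE = re.compile(r"^GNU objcopy .*?(\d+(?:\.\d+)+)")


def objcopy_version(first_line: str) -> Optional[str]:
    """Return the version from the first line of `objcopy --version`, or None."""
    m = OBJCOPY_VERSION_RE.match(first_line.strip())
    return m.group(1) if m else None
```

For example, `objcopy_version("GNU objcopy (GNU Binutils) 2.37")` returns `"2.37"`.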
Nikita Shulga	83086b7f45	Fix NCCL detection by Gloo (#82773 ) Instruct Gloo to always use bundled version of the library by passing `NCCL_EXTERNAL` Otherwise, it would link with shared library if one could be found in the system Pull Request resolved: https://github.com/pytorch/pytorch/pull/82773 Approved by: https://github.com/ngimel	2022-08-04 16:26:30 +00:00
Johannes	2ffb23616d	Fix false positive AVX, AVX2 and AVX512 detection with MSVC (#82554 ) ### Description These changes were made to ensure that the code that tests the vector instruction set extensions not only compiles but also runs, so that MSVC detects them properly: - INCLUDE(CheckCSourceRuns) instead of INCLUDE(CheckCSourceCompiles) - INCLUDE(CheckCXXSourceRuns) instead of INCLUDE(CheckCXXSourceCompiles) - CHECK_C_SOURCE_RUNS instead of CHECK_C_SOURCE_COMPILES - CHECK_CXX_SOURCE_RUNS instead of CHECK_CXX_SOURCE_COMPILES ### Issue #82553 ### Testing I tried the [code changes](`86246b3c58`) on a copy of [FindAVX.cmake](https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindAVX.cmake) in my repository [convolution-benchmarks](https://github.com/JohT/convolution-benchmarks) and could verify that the detection works properly now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82554 Approved by: https://github.com/malfet	2022-08-01 23:52:49 +00:00
zhang, xiaobing	86b86202b5	fix torch.config can't respect USE_MKLDNN flag issue (#75001 ) Fixes https://github.com/pytorch/pytorch/issues/74949, which reports that torch.config can't respect USE_MKLDNN flag. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75001 Approved by: https://github.com/malfet	2022-07-17 15:00:48 +00:00
Larry Liu	e345138591	[retake2][mobile] Fix lightweight dispatch OOM error by introducing selective build (#80791 ) To fix #78540 I committed #78983 which was reverted due to internal CI failure. Then I committed #79215 which only fixed the failure but didn't have the full feature of #78983. This PR is another try. This PR adds a script to dump all operators from test models and automatically write them into `lightweight_dispatch_ops.yaml`. This way we don't have to manually update the yaml file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80791 Approved by: https://github.com/raziel	2022-07-15 18:04:25 +00:00
Nikita Shulga	17fe7ce0e4	[BE] Delete Win specific case for CMake older than 3.1 (#81411 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/81411 Approved by: https://github.com/janeyx99	2022-07-14 00:31:31 +00:00
Tongliang Liao	dff70a5e1a	Make language std configurable. (#75519 ) RocksDB 7 started using C++17 in its headers. We should make this configurable, in case the user needs a higher std version. The list of files to change was found by `git grep 'CMAKE_[^_]*_STANDARD'`. Doc string is from CMake code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75519 Approved by: https://github.com/malfet	2022-07-13 14:21:27 +00:00
Jing Xu	3c7044728b	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-07-13 13:50:15 +00:00
Terry Lam	54bdaf76d6	[PFC] Native UCC process group for Pytorch (#79918 ) Summary: This diff integrates UCC process group as a native component of Pytorch Distributed core. It is based on the existing torch-ucc (https://github.com/facebookresearch/torch_ucc) as the wrapper for UCC collective communication library. The environment and cmake variables are named in mirroring to the existing process groups such as NCCL and Gloo. Specifically, - USE_UCC: enables UCC PG. This defaults to OFF, so there is no breakage of existing builds that do not have UCX/UCC external libraries. - USE_SYSTEM_UCC: uses external UCX and UCC shared libraries that are set accordingly with UCX_HOME and UCC_HOME. Currently, this diff only supports USE_SYSTEM_UCC=ON, i.e., requiring users to specify external libraries for UCX and UCC. In subsequent diffs, we will add UCX and UCC repos as third-party dependencies in pytorch/third-party. Test Plan: Passed Torch-UCC tests that invoke UCC process group. For example: $ sh test/start_test.sh test/torch_allreduce_test.py --backend gloo --use-cuda ... Test allreduce: succeeded Differential Revision: D36973688 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79918 Approved by: https://github.com/kwen2501, https://github.com/kingchc	2022-07-12 14:45:44 +00:00
Dmitry Mikushin	e08026d4d4	Use miopen_LIBRARIES and rccl_LIBRARIES directly, when they are valid target (#80446 ) As of [this RCCL PR](https://github.com/ROCmSoftwarePlatform/rccl/pull/570), `${rccl_LIBRARIES}` refers to the actual RCCL library target, not just a symbolic "rccl" string. So starting from the next release, no special treatment of it would be required in PyTorch anymore. This patch checks whether `${RCCL_LIBRARIES}` and `${MIOpen_LIBRARIES}` are already valid, and if they are, does not try to find them manually. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80446 Approved by: https://github.com/pruthvistony, https://github.com/malfet	2022-07-06 23:39:59 +00:00
Michael Suo	b349d15907	[build] fix compiling with clang13 (#80916 ) This check is incorrect; clang 13.1.0 doesn't exist. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80916 Approved by: https://github.com/malfet	2022-07-06 02:35:46 +00:00

1058 Commits