I recently built PyTorch with clang, and we are apparently not warning-clean there. Since we don't have any contbuild that catches this situation, just get rid of it.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109796
Approved by: https://github.com/cpuhrsch
# Summary
## PR Dependencies
I don't use ghstack :( and this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985
### Description
This pull request updates `_scaled_dot_product_flash_attention` from FlashAttention version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao.
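For orientation, the kernel sits behind the public `scaled_dot_product_attention` entry point; below is a minimal C++ usage sketch (shapes illustrative) showing the call for which the dispatcher can pick the flash kernel for fp16/bf16 inputs on sm80+ hardware:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  if (!torch::cuda::is_available()) return 0;
  auto opts = torch::TensorOptions().device(torch::kCUDA).dtype(torch::kHalf);
  // (batch, num_heads, seq_len, head_dim) -- sizes here are illustrative
  auto q = torch::randn({2, 8, 1024, 64}, opts);
  auto k = torch::randn({2, 8, 1024, 64}, opts);
  auto v = torch::randn({2, 8, 1024, 64}, opts);
  // The SDPA dispatcher selects flash / mem-efficient / math under the hood;
  // on sm80+ with fp16/bf16 inputs the flash kernel is eligible.
  auto out = at::scaled_dot_product_attention(q, k, v);
  std::cout << out.sizes() << std::endl;
}
```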
### Changes Made
The majority of the changes in this pull request involve:
- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator that emits the different instantiations of the forward and backward flash templates.
- Adding conditional compilation (`#ifdef`) to prevent building the kernels when nvcc is invoked with gencode < sm80 (a sketch follows this list).
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.
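As a hedged illustration of the arch gate mentioned above (the exact macro and layout in the PR may differ): during each of nvcc's per-architecture device passes, `__CUDA_ARCH__` carries the target, so sub-sm80 passes can compile the flash sources down to nothing.

```cpp
// Illustrative arch gate (not the PR's exact code): __CUDA_ARCH__ is defined
// only during nvcc's device compilation passes and equals 800 for sm80.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 800
// FlashAttention-2 needs sm80+: emit no device code in this pass, so builds
// that also target older gencodes still succeed.
#else
// ... the flash forward/backward kernels are compiled here ...
#endif
```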
### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files changed are in the flash_attn/ folder. The only files of interest here, IMO, are:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py (this has been incorporated upstream into the flash-attention GitHub repo; a sketch of what it emits follows below)
There are also a number of files related to avoiding OOMs in CI/CD; these are typically shell scripts.
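For a sense of what the generator emits, here is a hedged sketch (the filename and template signature are illustrative, not the PR's exact code): it stamps out one translation unit per parameter combination, so the template-heavy kernels build in parallel instead of in one giant file.

```cpp
// Shared template declaration that every generated file instantiates:
template <typename Dtype, int kHeadDim, bool kIsCausal>
void run_flash_fwd(/* launch params */);

// A generated file such as flash_fwd_hdim64_fp16.cu (filename hypothetical)
// then contains a single explicit instantiation, e.g.:
//   template void run_flash_fwd<cutlass::half_t, 64, false>(/* launch params */);
```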
### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108
### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to the semantics of the math backend to call repeat_interleave, I think this should be done as a follow-up.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update tests to exercise the above codepath
- [x] Still need to disable backward for seq_len % 128 != 0 (Tri beat me to it: a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from the kernel to the top-level composite implicit function in order to make it purely functional (a sketch of the pattern follows this list)
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes
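As referenced in the work item above, here is a hedged sketch of the pad-then-slice pattern at the composite level. The helper name and alignment constant are illustrative, and the real wrapper calls the flash kernel directly; SDPA stands in here only to keep the sketch self-contained.

```cpp
#include <ATen/ATen.h>

// Hedged sketch of the "pad in the composite wrapper, slice after" pattern;
// names and the alignment constant are illustrative, not the PR's code.
at::Tensor sdpa_pad_and_slice(
    const at::Tensor& q, const at::Tensor& k, const at::Tensor& v) {
  const int64_t head_dim = q.size(-1);
  const int64_t padded = (head_dim + 7) / 8 * 8;  // round up to a supported size
  auto pad_last = [&](const at::Tensor& t) {
    // constant_pad_nd pads from the last dim backwards: {left, right}
    return at::constant_pad_nd(t, {0, padded - head_dim});
  };
  // Stand-in for the flash kernel, which only accepts the padded head_dim:
  auto out = at::scaled_dot_product_attention(pad_last(q), pad_last(k), pad_last(v));
  // Slice the padding back off so the composite op stays purely functional
  // and autograd sees input/output shapes that match.
  return out.slice(/*dim=*/-1, /*start=*/0, /*end=*/head_dim);
}
```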
### Results
#### Forward only
The TFLOPS reported here are from an A100 that is underclocked.

#### Forward+Backward
Ran a sweep; for large compute-bound sizes we see a ~2x performance increase for forward+backward.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
Unknown -Wno-XXX flags are still appended to GCC via append_cxx_flag_if_supported because of the behavior described in the GCC documentation:
```
When an unrecognized warning option is requested (e.g., -Wunknown-warning),
GCC emits a diagnostic stating that the option is not recognized.
However, if the -Wno- form is used, the behavior is slightly different:
no diagnostic is produced for -Wno-unknown-warning unless other diagnostics are being produced.
This allows the use of new -Wno- options with old compilers,
but if something goes wrong, the compiler warns that an unrecognized option is present.
```
This PR tries to fix this by testing the flag in its positive -WXXX form instead. Unfortunately, third_party/fbgemm/CMakeLists.txt redefines append_cxx_flag_if_supported and our version is overwritten. As a result, we have to re-include utils.cmake to overwrite it again.
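A minimal demonstration of the asymmetry (the warning name and commands are illustrative):

```cpp
// demo.cpp -- illustrates the quoted GCC behavior; the compile commands are
// in the comments, the file itself is deliberately trivial.
//
//   g++ -Wno-bogus-warning demo.cpp   // accepted silently: unknown -Wno- forms
//                                     // produce no diagnostic on their own
//   g++ -Wbogus-warning    demo.cpp   // "unrecognized command-line option"
//
// Hence a feature test that probes the -Wno- spelling always "succeeds" on
// GCC; probing the positive -W spelling gives the real answer.
int main() { return 0; }
```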
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109000
Approved by: https://github.com/malfet
Summary:
This stack of PRs integrates cuSPARSELt into PyTorch.
This PR adds support for cuSPARSELt into the build process.
It adds a new flag, USE_CUSPARSELT, that defaults to false.
When USE_CUSPARSELT=1 is specified, the user can also specify
CUSPARSELT_ROOT, which defines the path to the library.
Compiling PyTorch with cuSPARSELt support can be done as follows:
```
USE_CUSPARSELT=1 \
CUSPARSELT_ROOT=/path/to/cusparselt \
python setup.py develop
```
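As a sketch of how such a build flag typically surfaces at compile time (the macro and function names below are hypothetical, not PyTorch's actual API):

```cpp
// Hypothetical illustration only: a build-flag-gated capability probe.
bool cusparselt_enabled() {
#ifdef USE_CUSPARSELT
  return true;   // built against cuSPARSELt; 2:4 sparse kernels can dispatch to it
#else
  return false;  // default: fall back to dense paths
#endif
}
```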
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103700
Approved by: https://github.com/albanD
This PR enables `-Winconsistent-missing-destructor-override` and `-Winconsistent-missing-override`
and fixes violations.
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 47e904e</samp>
This pull request updates the code of various classes and operators in the `caffe2` and `aten` subdirectories to use the `override` specifier instead of the `virtual` keyword for destructors and other virtual functions that override a base class function. This improves the code readability, quality, and consistency with C++ best practices. It also modifies the `./CMakeLists.txt` file to enable warnings for these specifiers, but disable errors.
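A minimal before/after sketch of the pattern these flags warn about (illustrative, not code taken from this PR):

```cpp
struct Base {
  virtual ~Base() = default;
  virtual void run() {}
};

struct Worker : Base {
  // Before: `virtual ~Worker();` and `virtual void run();` -- clang warns that
  // `override` is missing or used inconsistently across the class.
  ~Worker() override = default;  // after: destructor explicitly overrides
  void run() override {}         // after: `override` instead of re-stating `virtual`
};
```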
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104032
Approved by: https://github.com/malfet
To avoid nvcc segfaults, compile without the `--source-in-ptx` option on CUDA 12.1+.
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 984e4b2</samp>
> _Sing, O Muse, of the daring deeds of PyTorch, the swift and fiery_
> _framework that harnesses the power of CUDA, the blazing tool of Nvidia._
> _How they faced a mighty challenge when CUDA, the ever-shifting,_
> _released a new version, twelve point one, that broke their code and caused them grief._
Fixes https://github.com/pytorch/pytorch/issues/102372
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102756
Approved by: https://github.com/atalman
Use `append_cxx_flag_if_supported` to determine whether or not `-Werror` is supported
Do not suppress deprecation warnings if glog is not used/installed; as the check is written right now, it suppresses deprecations even if `glog` is not installed.
Similarly, do not suppress deprecations on macOS simply because we are compiling with protobuf.
Fix deprecation warnings in:
- MPS by replacing `MTLResourceOptionCPUCacheModeDefault`->`MTLResourceCPUCacheModeDefaultCache`
- In GTests by replacing `TYPED_TEST_CASE`->`TYPED_TEST_SUITE`
- In `codegen/onednn/interface.cpp`, by passing `Stack` by reference rather than by pointer (see the sketch below).
Do not guard calls to `append_cxx_flag_if_supported` with `if(CLANG)` or `if(GCC)`.
Fix some deprecated calls in `Metal`; hide more complex exceptions under `C10_CLANG_DIAGNOSTIC_IGNORE`.
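Here is the sketch of the `Stack`-by-reference change referenced above (types simplified; this is an illustration, not the PR's code):

```cpp
#include <vector>

// In PyTorch, torch::jit::Stack is a vector of IValues and the pointer-taking
// overloads were the deprecated ones; an int vector stands in here.
using Stack = std::vector<int>;

// Deprecated shape: pointer parameter, nullability implied but never wanted.
void run_op(Stack* stack) { stack->push_back(42); }

// Preferred shape: reference parameter, non-null by construction.
void run_op(Stack& stack) { stack.push_back(42); }
```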
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97584
Approved by: https://github.com/kit1980
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at b07152e</samp>
This pull request refactors the CMake configuration to enable the `USE_FLASH_ATTENTION` feature for the `torch_cuda` target only, using a target-specific macro. This avoids conflicts with other libraries that also use this feature, such as fairseq.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97579
Approved by: https://github.com/kit1980
remove unused CAFFE2_VERSION macros
Summary:
Nothing reads these and they are completely subsumed by TORCH_VERSION.
Getting rid of these will be helpful for build unification, since they
are also not used internally.
Test Plan: Rely on CI.
Reviewers: sahanp
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97337
Approved by: https://github.com/malfet
This PR does two things:
1. It moves some Windows warning suppressions from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of DLL warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some DLL warnings because some TORCH_API functions are actually built as part of libtorch_python.
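For context, a hedged sketch of the DLL export/import macro pattern behind those warnings (macro names here are simplified stand-ins; the real definitions live in c10's export headers):

```cpp
// Illustrative sketch of the Windows DLL export/import pattern.
#ifdef _WIN32
#  ifdef BUILDING_MYLIB
#    define MYLIB_API __declspec(dllexport)  // while building the DLL itself
#  else
#    define MYLIB_API __declspec(dllimport)  // while consuming the DLL
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

// Tagging a symbol with the wrong library's macro (e.g. C10_API on a function
// actually compiled into libtorch) produces dllimport/dllexport mismatch
// warnings on Windows; retagging with the owning library's macro fixes them.
MYLIB_API void do_work();
```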
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
Currently there is a potential conflict for `GLIBCXX_USE_CXX11_ABI` configuration if users don't explicitly set this variable.
In `caffe2/CMakeLists.txt`, if the variable is not set, an `abi checker` will be used to retrieve the ABI configuration from the compiler.
https://github.com/pytorch/pytorch/blob/master/caffe2/CMakeLists.txt#L1165-L1183
However, in `torch/csrc/Module.cpp`, if the variable is not set, it will be set to `0`. The conflict happens when the default ABI of the compiler is `1`.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/Module.cpp#L1612
This PR eliminates this uncertainty and potential conflict.
The ABI will be checked and set in the top-level `CMakeLists.txt`, and the value passed down to `caffe2/CMakeLists.txt`. Meanwhile, in case `caffe2/CMakeLists.txt` is invoked directly from a `cmake` command, the original GLIBC check logic is kept in that file.
If users don't explicitly assign a value to `GLIBCXX_USE_CXX11_ABI`, the `abi checker` will be executed and the value set accordingly. If the `abi checker` fails to compile or execute, the value will be set to `0`. If users explicitly assign a value, the provided value will be used.
Moreover, if `GLIBCXX_USE_CXX11_ABI` is set to `0`, the `-D_GLIBCXX_USE_CXX11_ABI=0` flag is not appended to `CMAKE_CXX_FLAGS`, so whether ABI=0 or ABI=1 is used depends entirely on the compiler's default configuration. This can cause an issue where, even though users explicitly set `GLIBCXX_USE_CXX11_ABI` to `0`, the compiler still builds the binaries with ABI=1.
https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L44-L51
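As a hedged illustration, here is a probe in the spirit of the `abi checker` described above (the real checker is a small program that CMake compiles and runs; this sketch is not that exact code):

```cpp
#include <cstdio>
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

int main() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  std::printf("%d\n", _GLIBCXX_USE_CXX11_ABI);  // 1 = new ABI, 0 = old ABI
#else
  std::printf("0\n");  // not libstdc++: the macro does not exist
#endif
  return 0;
}
```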
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94306
Approved by: https://github.com/malfet