pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Wang, Chuanqi	292454942e	[CD] Introduce windows.12xlarge runners for CD Windows build (#165287 ) Follows https://github.com/pytorch/test-infra/pull/7174. Windows CD build time comparison by runner: windows.4xlarge: cpu 1.5h, cuda 4.0h, xpu 5.5h; windows.12xlarge: cpu 0.5h, cuda 1.5h, xpu 2.5h. Fixes #162962 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165287 Approved by: https://github.com/zxiiro, https://github.com/malfet, https://github.com/seemethere	2025-10-21 18:28:23 +00:00
atalman	81dbeb06f4	CUDA aarch64 12.6 and 12.8 builds fix triton constraints (#165013 ) Since we have introduced CUDA aarch64 builds for all cuda versions we need to remove this constraint. This was missed by https://github.com/pytorch/pytorch/pull/162364 Proper constraint on triton should be: ``` Requires-Dist: triton==3.5.0; platform_system == "Linux" ``` not: ``` Requires-Dist: triton==3.5.0; platform_system == "Linux" and platform_machine == "x86_64" ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165013 Approved by: https://github.com/Camyll, https://github.com/nWEIdia, https://github.com/tinglvv	2025-10-09 00:49:28 +00:00
Wei Wang	ef8aabd424	[CD][CUDA13][ARM] aarch64 binary seems to be missing Triton dependency (#161833 ) Requires: filelock, fsspec, jinja2, networkx, setuptools, sympy, typing-extensions Seems to be missing Triton. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161833 Approved by: https://github.com/tinglvv, https://github.com/Skylion007, https://github.com/atalman	2025-09-02 19:31:14 +00:00
Nikita Shulga	d0226719a9	[BE][EZ] Delete remains of split-build logic (#159990 ) Hopefully last piece of https://github.com/pytorch/pytorch/issues/138750 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159990 Approved by: https://github.com/atalman ghstack dependencies: #159986	2025-08-07 01:59:30 +00:00
Andrey Talman	7275f28045	Fix cuda 12.9 aarch64 GPU builds. Update CUDA_STABLE variable. (#157630 ) This contains 2 fixes that are required in main and will need to be cherry-picked to the Release 2.8 branch: 1. The PR https://github.com/pytorch/pytorch/pull/155819 did not include the triton change. 2. The CUDA_STABLE variable needs to be set to 12.8. Updating CUDA stable updates the full static build Pull Request resolved: https://github.com/pytorch/pytorch/pull/157630 Approved by: https://github.com/Skylion007, https://github.com/jeanschmidt	2025-07-04 18:08:31 +00:00
Wang, Chuanqi	5116293f7e	[XPU] Split triton version as 2 files to decouple triton version bump (#155313 ) Triton XPU shares its version file with the community one. When the community updates the Triton version, it temporarily breaks the XPU CI/CD because they use different repositories and commits. To decouple Triton version bumps between the community and XPU, we propose splitting the version into two separate files. Refer to the latest community Triton version bump, PR #153117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155313 Approved by: https://github.com/etaf, https://github.com/EikanWang, https://github.com/atalman	2025-06-10 08:49:03 +00:00
Andrey Talman	9b1127437e	Add triton as dependency to CUDA aarch64 build (#149584 ) Aarch64 Triton build was added by: https://github.com/pytorch/pytorch/pull/148705 Hence add the proper constraint to the CUDA 12.8 Aarch64 build. Please note we still want to use: ```platform_system == 'Linux' and platform_machine == 'x86_64'``` for all other builds, since these are prototype binaries used only by the CUDA 12.8 Linux aarch64 build, which we would like to serve from download.pytorch.org Pull Request resolved: https://github.com/pytorch/pytorch/pull/149584 Approved by: https://github.com/nWEIdia, https://github.com/tinglvv, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-03-20 15:39:45 +00:00
Wang, Chuanqi	4fdd076907	[CD] Add triton xpu as dependency of torch xpu windows whl (#148755 ) Depends on PR #147637 landing Pull Request resolved: https://github.com/pytorch/pytorch/pull/148755 Approved by: https://github.com/atalman	2025-03-10 14:04:30 +00:00
atalman	1db3c58fab	Remove manylinux 2014 artifacts (#148135 ) 1. Switch Magma build to Manylinux 2.28 base 2. Use manylinux 2.28 as default in populate_binary_env.sh 3. Remove manylinux 2014 docker builds Pull Request resolved: https://github.com/pytorch/pytorch/pull/148135 Approved by: https://github.com/malfet	2025-02-28 13:43:14 +00:00
atalman	519269a415	[BE] - Remove conda test and upload scripts and env variables from Workflows Part 1 (#144870 ) Remove conda test and upload scripts and env variables from Workflows Related to: https://github.com/pytorch/pytorch/issues/138506 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144870 Approved by: https://github.com/malfet	2025-01-16 17:20:14 +00:00
chuanqiw	438698b20b	[CD] Remove redundant triton dependency for xpu wheels (#143839 ) Because XPU CD wheels enabled PyPI dependencies in https://github.com/pytorch/pytorch/pull/141135, PYTORCH_EXTRA_INSTALL_REQUIREMENTS already has a value for the XPU CD wheel build. Works for https://github.com/pytorch/pytorch/issues/139722 and https://github.com/pytorch/pytorch/issues/114850 Fixes #143838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143839 Approved by: https://github.com/huydhn	2024-12-30 13:39:06 +00:00
PyTorch MergeBot	0ebc6388cf	Revert "Exclude py 31.3t triton package from PyTorch 3.13t wheel (#143218 )" This reverts commit `3bfdf6f063`. Reverted https://github.com/pytorch/pytorch/pull/143218 on behalf of https://github.com/atalman due to this constraint being ignored, see https://github.com/pytorch/pytorch/issues/143654 ([comment](https://github.com/pytorch/pytorch/pull/143218#issuecomment-2560208992))	2024-12-23 19:37:35 +00:00
atalman	3bfdf6f063	Exclude py 31.3t triton package from PyTorch 3.13t wheel (#143218 ) Follow up after https://github.com/pytorch/pytorch/pull/143162 Include triton only for 3.13 packages not 3.13t Pull Request resolved: https://github.com/pytorch/pytorch/pull/143218 Approved by: https://github.com/kit1980	2024-12-14 00:12:45 +00:00
Andrey Talman	04bb82f097	Linux Wheels: Remove triton dependency python < 3.13 constraint (#143162 ) We do build pytorch-triton package for python 3.13 : https://github.com/pytorch/pytorch/actions/runs/12304476674/job/34344764271 Hence constraint is no longer needed. This stack enabled torch.compile for Python 3.13 : https://github.com/pytorch/pytorch/pull/141264 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143162 Approved by: https://github.com/kit1980	2024-12-13 15:08:44 +00:00
Huy Do	bae9510307	Fix pytorch-triton nightly checksum shorthash (#141410 ) Binary build is failing in trunk after https://github.com/pytorch/pytorch/pull/139206 lands, for example, https://github.com/pytorch/pytorch/actions/runs/11981181986/job/33410250461#step:17:539. It's a bit tricky to spot the issue but the difference is between `3.2.0+35c6c7c628` set by PyTorch and `3.2.0+git35c6c7c6` from triton (look closely one has the length of 10, the other of 8 characters) Triton now has its own nightly build logic in https://github.com/triton-lang/triton/pull/4812 that takes only 8 characters by default while the original logic from PT took 10. So, PT nightly couldn't find the dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141410 Approved by: https://github.com/seemethere, https://github.com/malfet	2024-11-23 04:56:40 +00:00
PyTorch MergeBot	66f2550328	Revert "Fix pytorch-triton nightly checksum shorthand (#141410 )" This reverts commit `9f8a19172d`. Reverted https://github.com/pytorch/pytorch/pull/141410 on behalf of https://github.com/huydhn due to There is still a small tweak that I need to do 35c6c7c628 is now git35c6c7c6 so a prefix is needed, going to revert and reland this ([comment](https://github.com/pytorch/pytorch/pull/141410#issuecomment-2495291851))	2024-11-23 04:16:39 +00:00
Huy Do	9f8a19172d	Fix pytorch-triton nightly checksum shorthand (#141410 ) Binary build is failing in trunk after https://github.com/pytorch/pytorch/pull/139206 lands, for example, https://github.com/pytorch/pytorch/actions/runs/11981181986/job/33410250461#step:17:539. It's a bit tricky to spot the issue but the difference is between `3.2.0+35c6c7c628` set by PyTorch and `3.2.0+git35c6c7c6` from triton (look closely one has the length of 10, the other of 8 characters) Triton now has its own nightly build logic in https://github.com/triton-lang/triton/pull/4812 that takes only 8 characters by default while the original logic from PT took 10. So, PT nightly couldn't find the dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141410 Approved by: https://github.com/seemethere, https://github.com/malfet	2024-11-23 03:25:52 +00:00
atalman	078dca1ce8	Aarch64 binary builds - fix passing env_file to Docker (#138588 ) Aarch64 builds skipped the logic of sourcing the binary env file. As a result, the PYTORCH_EXTRA_INSTALL_REQUIREMENTS passed to Aarch64 builds did not include the triton dependency constraint. This PR makes sure Aarch64 builds follow the same path as our regular manywheel builds. To work around this issue we had to inject triton in aarch64 builds for release 2.5, which is not ideal: https://github.com/pytorch/builder/pull/2011 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138588 Approved by: https://github.com/jeanschmidt, https://github.com/malfet	2024-10-22 19:04:19 +00:00
Jack Taylor	034717a029	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2024-09-05 20:36:45 +00:00
PyTorch MergeBot	a1ba8e61d1	Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 )" This reverts commit `5e8bf29148`. Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/ZainRizvi due to This still breaks linux binary builds. Added the appropriate labels to ensure tests can pass. See [GH job link](https://github.com/pytorch/pytorch/actions/runs/10626427003/job/29460479554) [HUD commit link](`5e8bf29148`) ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2322246198))	2024-08-30 20:00:41 +00:00
Jack Taylor	5e8bf29148	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2024-08-30 03:38:35 +00:00
PyTorch MergeBot	4648848696	Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 )" This reverts commit `f71c3d265a`. Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/jeanschmidt due to seems to have introduced breakages in linux binary builds ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2308787310))	2024-08-25 11:20:30 +00:00
Jack Taylor	f71c3d265a	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2024-08-24 18:26:49 +00:00
chuanqiw	6590f4fb0e	[CD] Enable python 3.13 for xpu nightly build (#133670 ) Enable Python 3.13 for the XPU nightly build; this depends on https://github.com/pytorch/pytorch/pull/133454 landing. Also update the XPU nightly wheel test env. Works for https://github.com/pytorch/pytorch/issues/114850 Fixes #130543 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133670 Approved by: https://github.com/atalman, https://github.com/malfet	2024-08-20 15:05:20 +00:00
chuanqiw	ca023f77bc	[CD] Add pytorch xpu wheel build in nightly (#129560 ) Add pytorch xpu wheel build in nightly after the xpu build image enabling PR https://github.com/pytorch/builder/pull/1879 merged Pull Request resolved: https://github.com/pytorch/pytorch/pull/129560 Approved by: https://github.com/atalman	2024-07-11 15:49:04 +00:00
PaliC	b57fa8d9c0	[BE] Remove JNI from libtorch builds (#124995 ) Removes jni files from the libtorch build as we do not plan to distribute them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124995 Approved by: https://github.com/malfet	2024-06-22 07:41:54 +00:00
PaliC	b0044e2e18	[Split Build] Support nightly release (#129011 ) This PR adds the split build to our binaries workflow. Validation for the workflow is done using the PR above in conjunction with https://github.com/pytorch/builder/pull/1876. Test Workflow: Check CI in the workflow above Pull Request resolved: https://github.com/pytorch/pytorch/pull/129011 Approved by: https://github.com/atalman	2024-06-22 05:45:14 +00:00
PaliC	fc5b0ff2d7	[BE][Hackaday] deprecate legacy cuda docker image (#128859 ) Fixes https://github.com/pytorch/builder/issues/1795 from the pytorch side specifically for the cuda image Pull Request resolved: https://github.com/pytorch/pytorch/pull/128859 Approved by: https://github.com/atalman	2024-06-20 16:30:49 +00:00
Jithun Nair	a6ac6447b5	Re-enable py3.12 nightly wheel builds and add triton dependency for ROCm (#128525 ) The llnl-hatchet developers have published the py3.12 binaries on [PyPI](https://pypi.org/project/llnl-hatchet/#files). In fact, looking [here](https://download.pytorch.org/whl/nightly/llnl-hatchet), it seems we already have the py3.12 wheels mirrored. This should allow us to re-enable py3.12 binaries for ROCm. This PR reverts commit `9d849d4312`. It also adds the pytorch-triton-rocm dependency for torch wheels on ROCm, since pytorch-triton-rocm py3.12 wheels are available now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128525 Approved by: https://github.com/malfet	2024-06-19 21:56:54 +00:00
PyTorch MergeBot	ee140a198f	Revert "[Port][Quant][Inductor] Bug fix: mutation nodes not handled correctly for QLinearPointwiseBinaryPT2E (#128591 )" This reverts commit `03e8a4cf45`. Reverted https://github.com/pytorch/pytorch/pull/128591 on behalf of https://github.com/atalman due to Contains release only changes should not be landed ([comment](https://github.com/pytorch/pytorch/pull/128591#issuecomment-2168308233))	2024-06-14 15:51:00 +00:00
Xia, Weiwen	03e8a4cf45	[Port][Quant][Inductor] Bug fix: mutation nodes not handled correctly for QLinearPointwiseBinaryPT2E (#128591 ) Port #127592 from main to release/2.4 ------ Fixes #127402 - Revert some changes to `ir.MutationOutput` and inductor/test_flex_attention.py - Add checks of mutation for QLinearPointwiseBinaryPT2E Pull Request resolved: https://github.com/pytorch/pytorch/pull/127592 Approved by: https://github.com/leslie-fang-intel, https://github.com/Chillee Pull Request resolved: https://github.com/pytorch/pytorch/pull/128591 Approved by: https://github.com/jgong5, https://github.com/Chillee	2024-06-14 09:31:38 +00:00
atalman	af5ed05416	Include triton in py3.12 binaries (#127547 ) Additional Builder PR: https://github.com/pytorch/builder/pull/1846/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/127547 Approved by: https://github.com/williamwen42	2024-05-31 00:30:10 +00:00
atalman	5db5049b34	Move TRITON_CONSTRAINT setting to common binary_populate_env.sh, BE - Cleanup unused build scripts (#120744 ) 1. This moves TRITON_CONSTRAINT to the common binary_populate_env.sh so that it is set for all wheels. Tested in CI via the ``ciflow/binaries`` label. Please note we only set this constraint when PYTORCH_EXTRA_INSTALL_REQUIREMENTS is set, and this variable is set for all the wheels that get uploaded to PyPI. Hence triton needs to be set in the same place. This is done for regular wheels and ROCm wheels separately, since ROCm wheels use a different triton package. 3. Clean up legacy unused code. Test: `` git grep setup_linux_system_environment.sh `` Needs: https://github.com/pytorch/builder/pull/1712 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120744 Approved by: https://github.com/huydhn	2024-02-29 14:25:34 +00:00
Nikita Shulga	5e615f5f3a	[BE] Use `version.txt` to determine version of nightly builds (#115794 ) Fixes TODO from https://github.com/pytorch/pytorch/pull/33326 Test plan: check version generated by CI: - https://github.com/pytorch/pytorch/actions/runs/7202798334/job/19621620744?pr=115794#step:9:64 - https://github.com/pytorch/pytorch/actions/runs/7202798329/job/19621639791?pr=115794#step:11:104 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115794 Approved by: https://github.com/atalman	2023-12-14 01:09:51 +00:00
atalman	995fae6060	Move small pypi build as default for linux cuda 12.1 (#114281 ) This is first PR to resolve: https://github.com/pytorch/pytorch/issues/113972 Move our small wheel build as default Test: ``` pip3 install --no-cache-dir --pre torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl --index-url https://download.pytorch.org/whl/nightly/cu121 Looking in indexes: https://download.pytorch.org/whl/nightly/cu121 Processing ./torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl Collecting filelock (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/filelock-3.9.0-py3-none-any.whl (9.7 kB) Collecting typing-extensions>=4.8.0 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/typing_extensions-4.8.0-py3-none-any.whl (31 kB) Collecting sympy (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/sympy-1.11.1-py3-none-any.whl (6.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 253.4 MB/s eta 0:00:00 Collecting networkx (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/networkx-3.0rc1-py3-none-any.whl (2.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 387.1 MB/s eta 0:00:00 Collecting jinja2 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/Jinja2-3.1.2-py3-none-any.whl (133 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 365.3 MB/s eta 0:00:00 Collecting fsspec (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/fsspec-2023.4.0-py3-none-any.whl (153 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.0/154.0 kB 370.6 MB/s eta 0:00:00 Collecting pytorch-triton==2.1.0+6e4932cda8 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-2.1.0%2B6e4932cda8-cp310-cp310-linux_x86_64.whl (125.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 
125.4/125.4 MB 384.1 MB/s eta 0:00:00 Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 404.9 MB/s eta 0:00:00 Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 402.5 MB/s eta 0:00:00 Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 383.9 MB/s eta 0:00:00 Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 406.9 MB/s eta 0:00:00 Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 388.2 MB/s eta 0:00:00 Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 410.5 MB/s eta 0:00:00 Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB) 
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 272.9 MB/s eta 0:00:00 Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 381.5 MB/s eta 0:00:00 Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 394.6 MB/s eta 0:00:00 Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 384.7 MB/s eta 0:00:00 Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 281.8 MB/s eta 0:00:00 Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvjitlink_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (19.8 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.8/19.8 MB 367.3 MB/s eta 0:00:00 Collecting MarkupSafe>=2.0 (from jinja2->torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB) Collecting mpmath>=0.19 (from sympy->torch==2.2.0.dev20231121+cu121) Downloading https://download.pytorch.org/whl/nightly/mpmath-1.2.1-py3-none-any.whl (532 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 532.6/532.6 kB 391.3 MB/s eta 0:00:00 Installing 
collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, pytorch-triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/114281 Approved by: https://github.com/malfet, https://github.com/huydhn	2023-11-22 00:10:03 +00:00
atalman	f9053877b4	Add pypi required metadata to all wheels except linux (#111042 ) Will fix package after publishing https://github.com/pytorch/pytorch/issues/100974 Poetry install requires all wheels on pypi to have same metadata. Hence including linux dependencies in all non-linux wheels Pull Request resolved: https://github.com/pytorch/pytorch/pull/111042 Approved by: https://github.com/malfet	2023-10-12 17:40:13 +00:00
drisspg	ad90ab31f2	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts. 
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` | https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update tests to exercise the above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops reported here are on an A100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-09-13 13:59:05 +00:00
Huy Do	a9c663c269	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 07:43:04 +00:00
PyTorch MergeBot	e45b290127	Revert "Revert "Flash Attention v2 (#105602 )" (#108827 )" This reverts commit `24e9bbe22a`. Reverted https://github.com/pytorch/pytorch/pull/108827 on behalf of https://github.com/huydhn due to I need to land this revert properly as there are new failures showing up on trunk ([comment](https://github.com/pytorch/pytorch/pull/108827#issuecomment-1711020924))	2023-09-08 03:25:45 +00:00
Huy Do	24e9bbe22a	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 02:54:20 +00:00
atalman	6a1a893f8f	Bump version 2.1.0 -> 2.2.0 (#108156 ) Same as: https://github.com/pytorch/pytorch/pull/95790 <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 50063bb</samp> > _`PyTorch` version up_ > _Nightly and release builds change_ > _Autumn of progress_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/108156 Approved by: https://github.com/osalpekar, https://github.com/albanD	2023-09-05 15:56:23 +00:00
drisspg	add45aea1c	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts. 
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` | https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update tests to exercise the above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops reported here are on an A100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-09-01 22:14:44 +00:00
PyTorch MergeBot	d569e506ab	Revert "Flash Attention v2 (#105602 )" This reverts commit `9df3d882c8`. Reverted https://github.com/pytorch/pytorch/pull/105602 on behalf of https://github.com/huydhn due to I think we miss a case here for sm80 build on inductor workflow as it is now OOM on trunk https://github.com/pytorch/pytorch/actions/runs/6042843139 ([comment](https://github.com/pytorch/pytorch/pull/105602#issuecomment-1701974862))	2023-09-01 01:15:01 +00:00
drisspg	9df3d882c8	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py (this has been incorporated upstream into the flash-attention GitHub repository) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` \| https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake - [x] Let multi_query/attention pass through and test \| UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update test to exercise the above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward (Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFLOPs reported here are from an underclocked A100. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute-bound sizes we do see a ~2x performance increase for forward+backward. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-08-31 16:02:20 +00:00
atalman	d1ec9a51e9	Bump version 2.0.0 -> 2.1.0 (#95790 ) Same as: https://github.com/pytorch/pytorch/pull/90491 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95790 Approved by: https://github.com/albanD, https://github.com/malfet	2023-03-02 00:38:46 +00:00
Cristian Panaite	d3049378be	Repair the path to jni.h for libtorch windows build (#93057 ) Fixes #86536 It seems the file is not found when the environment is populated, so the BUILD_JNI flag is false. To make it true, I had to add a `/pytorch/` prefix when adding paths in `POSSIBLE_JAVA_HOMES`. With this change, the file appears to be found and the flag is true. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93057 Approved by: https://github.com/malfet, https://github.com/Blackhex	2023-01-27 15:20:30 +00:00
Nikita Shulga	6fb79b7004	Bump version: 1.14.0->2.0.0 (#90491 ) Except for the usual location, had to update the version in one of ONNX expect patterns, namely here: `43660051d8/test/onnx/expect/TestOperators.test_avg_pool2d.expect (L3)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/90491 Approved by: https://github.com/jansel, https://github.com/albanD	2022-12-09 01:08:08 +00:00
atalman	3af0eafea6	Release 1.13: Bump nightly version 1.13->1.14 (#86296 ) Release 1.13: Bump nightly version 1.13->1.14 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86296 Approved by: https://github.com/seemethere, https://github.com/malfet	2022-10-06 23:26:58 +00:00
atalman	eb94df28c7	Use pip install cu117 (#85097 ) Creates new wheel workflow specific to CUDA 11.7 that does not bundle the cudnn and cublas. Workflow: https://github.com/pytorch/pytorch/actions/runs/3094622781 New Package: manywheel-py3_10-cuda11_7-with-pypi-cudnn \| 843 MB Old Package: manywheel-py3_10-cuda11_7 \| 1.65 GB Testing workflow: [manywheel-py3_7-cuda11_7-with-pypi-cudnn-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000867662#logs): ``` Bundling without cudnn and cublas. + DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libgomp.so.1") ..... pytorch_extra_install_requirements: nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cublas-cu11 ``` [manywheel-py3_7-cuda11_7-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000863250#logs) ``` Bundling with cudnn and cublas. 
+ DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "/usr/local/cuda/lib64/libcudnn_adv_infer.so.8" "/usr/local/cuda/lib64/libcudnn_adv_train.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_infer.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_train.so.8" "/usr/local/cuda/lib64/libcudnn_ops_infer.so.8" "/usr/local/cuda/lib64/libcudnn_ops_train.so.8" "/usr/local/cuda/lib64/libcudnn.so.8" "/usr/local/cuda/lib64/libcublas.so.11" "/usr/local/cuda/lib64/libcublasLt.so.11" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libcudnn_adv_infer.so.8" "libcudnn_adv_train.so.8" "libcudnn_cnn_infer.so.8" "libcudnn_cnn_train.so.8" "libcudnn_ops_infer.so.8" "libcudnn_ops_train.so.8" "libcudnn.so.8" "libcublas.so.11" "libcublasLt.so.11" "libgomp.so.1") ``` cc: @malfet @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85097 Approved by: https://github.com/malfet	2022-09-21 16:30:25 +00:00
Michael Suo	0117fb7600	[ci] remove IS_GHA env var This is unnecessary, GitHub automatically populates a `GITHUB_ACTION` env var: https://docs.github.com/en/actions/learn-github-actions/environment-variables#default-environment-variables For docker, this env var is automatically propagated through our use of `--env-file`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79219 Approved by: https://github.com/seemethere	2022-06-10 15:29:20 +00:00

104 Commits