pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Huy Do	4400c5d31e	Continue to build nightly CUDA 12.9 for internal (#163029 ) Revert part of https://github.com/pytorch/pytorch/pull/161916 to continue building CUDA 12.9 nightly Pull Request resolved: https://github.com/pytorch/pytorch/pull/163029 Approved by: https://github.com/malfet	2025-10-11 08:26:47 +00:00
Jithun Nair	0ec0120b19	Move aws OIDC credentials steps into setup-rocm.yml (#164769 ) The AWS ECR login step needs `id-token: write` permissions. We move the steps to get OIDC-based credentials from `_rocm-test.yml` to `setup-rocm.yml`. This lays the groundwork to enable access to AWS ECR in workflows in other repos such as torchtitan that use [linux_job_v2.yml](https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job_v2.yml), which also uses [setup-rocm.yml](`335f4f80a0/.github/workflows/linux_job_v2.yml (L168)`). Any caller workflows that eventually execute `setup-rocm` action will thus need to provide the `id-token: write` permission. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164769 Approved by: https://github.com/huydhn	2025-10-10 21:24:29 +00:00
Jeff Daily	f1260c9b9a	[ROCm][CI/CD] upgrade nightly wheels to ROCm 7.0 (#163937 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163937 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-26 21:42:09 +00:00
Jithun Nair	0ec946a052	[ROCm] Increase binary build timeout to 5 hours (300 minutes) (#163776 ) Despite narrowing down the [FBGEMM_GENAI build to gfx942](https://github.com/pytorch/pytorch/pull/162648), the nightly builds still timed out because they [didn't get enough time to finish the post-PyTorch-build steps](https://github.com/pytorch/pytorch/actions/runs/17969771026/job/51109432897). This PR increases timeout for ROCm builds for both [libtorch](https://github.com/pytorch/pytorch/actions/runs/17969771026) and [manywheel](https://github.com/pytorch/pytorch/actions/runs/17969771041), because both of those are close to the 4hr mark currently. This PR is a more ROCm-targeted version of https://github.com/pytorch/pytorch/pull/162880 (which is for release/2.9 branch). Pull Request resolved: https://github.com/pytorch/pytorch/pull/163776 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-24 23:02:08 +00:00
atalman	bffc7dd1f3	[CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds (#161916 ) Related to https://github.com/pytorch/pytorch/issues/159779 Adding CUDA 13.0 libtorch builds, followup after https://github.com/pytorch/pytorch/pull/160956 Removing CUDA 12.9 builds, See https://github.com/pytorch/pytorch/issues/159980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161916 Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007 Co-authored-by: Ting Lu <tingl@nvidia.com>	2025-09-05 07:47:54 +00:00
PyTorch MergeBot	6b8b3ac440	Revert "[ROCm] Use MI325 (gfx942) runners for binary smoke testing (#162044 )" This reverts commit `cd529b686d`. Reverted https://github.com/pytorch/pytorch/pull/162044 on behalf of https://github.com/jeffdaily due to mi200 backlog is purged, and mi300 runners are failing in GHA download ([comment](https://github.com/pytorch/pytorch/pull/162044#issuecomment-3254427869))	2025-09-04 16:06:30 +00:00
Jithun Nair	cd529b686d	[ROCm] Use MI325 (gfx942) runners for binary smoke testing (#162044 ) ### Motivation * MI250 Cirrascale runners are currently having network timeout leading to huge queueing of binary smoke test jobs. * MI210 Hollywood runners (with runner names such as `pytorch-rocm-hw-`) are not suitable for these jobs, because they seem to take much longer to download artifacts: https://github.com/pytorch/pytorch/pull/153287#issuecomment-2918420345 (this is why these jobs were specifically targeting Cirrascale runners). However, it doesn't seem like Cirrascale runners are necessarily doing much better either e.g. [this recent build](https://github.com/pytorch/pytorch/actions/runs/17332256791/job/49231006755). Moving to MI325 runners should address the stability part at least, while also reducing load on limited MI2xx runner capacity. * However, I'm not sure if the MI325 runners will do any better on the artifact download part (this may need to be investigated more) cc @amdfaa * Also removing `ciflow/binaries` and `ciflow/binaries_wheel` label/tag triggers for `generated-linux-binary-manywheel-rocm-main.yml` because we already trigger ROCm binary build/test jobs via these labels/tags in `generated-linux-binary-manywheel-nightly.yml`. And for developers who want to trigger ROCm binary build/test jobs on their PRs, they can use the `ciflow/rocm-mi300` label/tag as per this PR. ### TODOs (cc @amdfaa): * Check that the workflow runs successfully on the MI325 runners in this PR. Note how long the test jobs take esp. the "Download Build Artifacts" step * Once this PR is merged, clear the queue of jobs targeting `linux.rocm.gpu.mi250` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162044 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-03 18:34:07 +00:00
Ting Lu	49ff884b1e	Add CUDA 13.0 x86 builds (#160956 ) https://github.com/pytorch/pytorch/issues/159779 CUDA 13.0.0, NVSHMEM 3.3.20, CUDNN 9.12.0.46. Adding x86 linux builds for CUDA 13. Adding libtorch docker. Package naming changed for CUDA 13 (removed postfix -cu13 for some packages). Preparation checklist: 1. Update index https://download.pytorch.org/whl/nightly/cu130 with pypi packages 2. Update packaging name based on https://pypi.org/project/cuda-toolkit/ metadata Pull Request resolved: https://github.com/pytorch/pytorch/pull/160956 Approved by: https://github.com/atalman Co-authored-by: atalman <atalman@fb.com>	2025-08-22 11:31:09 +00:00
Ting Lu	0504480f37	Add CUDA 12.9 libtorch nightly (#155895 ) https://github.com/pytorch/pytorch/issues/155196 with libtorch docker added, we can add the build script Pull Request resolved: https://github.com/pytorch/pytorch/pull/155895 Approved by: https://github.com/atalman	2025-06-19 13:15:42 +00:00
Jithun Nair	794ef6c9b8	Enable manywheel build and smoke test on main branch for ROCm (#153287 ) Fixes issue of not discovering breakage of ROCm wheel builds until the nightly job runs e.g. https://github.com/pytorch/pytorch/pull/153253 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153287 Approved by: https://github.com/jeffdaily	2025-06-14 19:14:31 +00:00
PyTorch MergeBot	d7e3c9ce82	Revert "Enable manywheel build and smoke test on main branch for ROCm (#153287 )" This reverts commit `3b6569b1ef`. Reverted https://github.com/pytorch/pytorch/pull/153287 on behalf of https://github.com/clee2000 due to broke lint [GH job link](https://github.com/pytorch/pytorch/actions/runs/15646152483/job/44083912145) [HUD commit link](`3b6569b1ef`) ([comment](https://github.com/pytorch/pytorch/pull/153287#issuecomment-2972088294))	2025-06-14 01:32:27 +00:00
Jithun Nair	3b6569b1ef	Enable manywheel build and smoke test on main branch for ROCm (#153287 ) Fixes issue of not discovering breakage of ROCm wheel builds until the nightly job runs e.g. https://github.com/pytorch/pytorch/pull/153253 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153287 Approved by: https://github.com/jeffdaily	2025-06-14 00:05:57 +00:00
Ting Lu	4c3da611c2	Add CUDA 12.9.1 x86 nightly binaries (#154980 ) Adding CUDA 12.9.1 to nightly binaries matrix for linux (x86) builds. Add sbsa and libtorch build docker images, builds addition will be follow-up PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154980 Approved by: https://github.com/eqy, https://github.com/atalman	2025-06-11 13:43:17 +00:00
atalman	8153340d10	[CI/CD] Remove CUDA 11.8 builds (#155509 ) This removes CUDA 11.8 from CI/CD. Please see: https://github.com/pytorch/pytorch/issues/147383 TODO: Will follow up by cleaning CUDA 11.8 config from scripts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155509 Approved by: https://github.com/cyyever, https://github.com/huydhn, https://github.com/malfet	2025-06-10 05:16:41 +00:00
Catherine Lee	e05ac9b794	Use folder tagged docker images for binary builds (#151706 ) Should be the last part of https://github.com/pytorch/pytorch/pull/150558, except for maybe s390x stuff, which I'm still not sure what's going on there. For binary builds, do what we do in CI, where we tag each image with a hash of the .ci/docker folder to ensure a docker image built from that commit gets used. Previously it would use imagename:arch-main, which could be a version of the image based on an older commit. After this, changing a docker image and then tagging with ciflow/binaries on the same PR should use the new docker images. Release and main builds should still pull from docker io. Cons: * if someone rebuilds the image from main or a PR where the hash is the same (ex folder is unchanged, but retrigger docker build for some reason), the release would use that image instead of one built on the release branch * spin wait for docker build to finish Pull Request resolved: https://github.com/pytorch/pytorch/pull/151706 Approved by: https://github.com/atalman	2025-04-22 21:50:10 +00:00
Jithun Nair	b4550541ea	[ROCm] upgrade nightly wheels to rocm6.4 (#151355 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151355 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-04-17 17:29:07 +00:00
Eli Uriegas	1d9401befc	ci: Remove mentions and usages of DESIRED_DEVTOOLSET and cxx11 (#149443 ) This is a remnant of our migration to manylinux2_28; we should remove these since all of our binary builds are now built with cxx11_abi. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/149443 Approved by: https://github.com/izaitsevfb, https://github.com/atalman	2025-03-20 16:49:46 +00:00

17 Commits