The biggest difference between the conda and Homebrew CPython builds and the one from python.org is that the latter is a universal binary, which always tries to build universal extensions.
Work around these universal-binary build attempts by explicitly specifying `_PYTHON_PLATFORM` and `--plat-name`, as well as `ARCH_FLAGS`.
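As an illustration, a workaround of this shape could look like the following workflow step; this is a minimal sketch assuming the variable names above are picked up by the wheel build, and all platform/arch values are illustrative:
```yaml
# Hedged sketch: force a single-arch wheel build against a universal
# python.org CPython. Variable names are as referenced above; the
# platform and arch values are illustrative, not the exact ones used.
- name: Build arm64-only wheel
  env:
    _PYTHON_PLATFORM: macosx-11.0-arm64
    ARCH_FLAGS: -arch arm64
  run: python setup.py bdist_wheel --plat-name macosx-11.0-arm64
```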
Suppressed the actionlint warning on the use of the `freethreaded` flag, which is documented in https://github.com/actions/setup-python/tree/v5
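For reference, the documented usage that actionlint flags looks roughly like this (the Python version shown is illustrative):
```yaml
- uses: actions/setup-python@v5
  with:
    python-version: "3.14"
    freethreaded: true  # documented input, but unknown to actionlint's schema
```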
TODO: Remove lots of temporary workarounds when `3.14` is out in October 2025
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162136
Approved by: https://github.com/atalman, https://github.com/huydhn
ghstack dependencies: #162297, #162265
### Motivation
* MI250 Cirrascale runners are currently hitting network timeouts, leading to a huge queue of binary smoke-test jobs:
<img width="483" height="133" alt="image" src="https://github.com/user-attachments/assets/17293002-78ad-4fc9-954f-ddd518bf0a43" />
* MI210 Hollywood runners (with runner names such as `pytorch-rocm-hw-*`) are not suitable for these jobs because they seem to take much longer to download artifacts: https://github.com/pytorch/pytorch/pull/153287#issuecomment-2918420345 (this is why these jobs were specifically targeting Cirrascale runners). However, the Cirrascale runners don't necessarily seem to be doing much better either, e.g. [this recent build](https://github.com/pytorch/pytorch/actions/runs/17332256791/job/49231006755).
* Moving to MI325 runners should at least address the stability issue, while also reducing load on the limited MI2xx runner capacity.
* However, I'm not sure whether the MI325 runners will do any better on the artifact-download part (this may need more investigation) cc @amdfaa
* Also removing the `ciflow/binaries` and `ciflow/binaries_wheel` label/tag triggers from `generated-linux-binary-manywheel-rocm-main.yml`, because we already trigger ROCm binary build/test jobs via these labels/tags in `generated-linux-binary-manywheel-nightly.yml`. Developers who want to trigger ROCm binary build/test jobs on their PRs can use the `ciflow/rocm-mi300` label/tag added in this PR (see the sketch below).
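For context, the generated workflows map `ciflow/*` labels to tag pushes; a sketch of the trigger shape (not the exact generated YAML):
```yaml
# Sketch of a ciflow tag trigger; applying the label pushes a matching tag.
on:
  push:
    tags:
      - ciflow/rocm-mi300/*
```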
### TODOs (cc @amdfaa):
* Check that the workflow runs successfully on the MI325 runners in this PR. Note how long the test jobs take, especially the "Download Build Artifacts" step
* Once this PR is merged, clear the queue of jobs targeting `linux.rocm.gpu.mi250`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162044
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
This includes the following changes:
- Add XPU support package 2025.2 build and test in CI for both Linux and Windows
- Keep the XPU support package 2025.1 build in CI to guard against breakage until the PyTorch 2.9 release
- Upgrade the XPU support package from 2025.1 to 2025.2 in CD for both Linux and Windows
- Rename the Linux CI job and image names to n and n-1
- Update the XPU runtime PyPI package dependencies of the CD wheels
- Remove the Docker image build for the deprecated support package version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158733
Approved by: https://github.com/EikanWang, https://github.com/atalman
This should be the last part of https://github.com/pytorch/pytorch/pull/150558, except maybe for the s390x stuff, where I'm still not sure what's going on.
For binary builds, do what we already do in CI: tag each image with a hash of the `.ci/docker` folder to ensure that a Docker image built from that commit gets used. Previously it would use `imagename:arch-main`, which could be a version of the image based on an older commit.
After this, changing a Docker image and then tagging the same PR with `ciflow/binaries` should use the new Docker images.
Release and main builds should still pull from docker.io.
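A minimal sketch of the hash-based tagging idea, assuming the tag is derived from the git tree hash of `.ci/docker` (step and image names are illustrative):
```yaml
- name: Compute Docker image tag from .ci/docker contents
  run: |
    # The tree hash changes exactly when the folder's contents change,
    # so images built from the same .ci/docker state share a tag.
    DOCKER_TAG=$(git rev-parse "HEAD:.ci/docker")
    echo "DOCKER_IMAGE=manylinux-builder:cpu-${DOCKER_TAG}" >> "${GITHUB_ENV}"
```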
Cons:
* if someone rebuilds the image from main or from a PR where the hash is the same (e.g. the folder is unchanged, but the Docker build is retriggered for some reason), the release would use that image instead of one built on the release branch
* jobs have to spin-wait for the Docker build to finish
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151706
Approved by: https://github.com/atalman
Try removing sm50 and sm60 to shrink the binary size and resolve the `ld --relink` error.
"Architecture support for Maxwell, Pascal, and Volta is considered feature-complete and will be frozen in an upcoming release." from the CUDA 12.8 release notes.
Also updating the runner for the CUDA 12.8 tests to g4dn (T4, sm75) due to the drop in sm50/60 support (see the hedged sketch below).
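As a hedged sketch of what the change means for the build configuration (the exact architecture list in this PR may differ):
```yaml
env:
  # before (illustrative): "5.0;6.0;7.0;7.5;8.0;9.0"
  # sm50 (Maxwell) and sm60 (Pascal) dropped:
  TORCH_CUDA_ARCH_LIST: "7.0;7.5;8.0;9.0"
```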
https://github.com/pytorch/pytorch/issues/145570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146265
Approved by: https://github.com/atalman
Added c7564f31f7/wheel/build_wheel.sh to the `.ci/wheel/` folder.
Commented out the call to 39532891a0/run_tests.sh because, since 2018, this script has just checked that the test folder is there and exited, as there is no way to run all PyTorch tests in a single shard; see this logic:
```bash
#!/bin/bash
set -eux -o pipefail
# Essentially runs pytorch/test/run_test.py, but keeps track of which tests to
# skip in a centralized place.
#
# TODO Except for a few tests, this entire file is a giant TODO. Why are these
# tests failing?
# TODO deal with Windows
# This script expects to be in the pytorch root folder
if [[ ! -d 'test' || ! -f 'test/run_test.py' ]]; then
echo "builder/test.sh expects to be run from the Pytorch root directory " \
"but I'm actually in $(pwd)"
exit 2
fi
# Allow master skip of all tests
if [[ -n "${SKIP_ALL_TESTS:-}" ]]; then
exit 0
fi
```
https://github.com/pytorch/pytorch/pull/123390 was a misreading of the above-mentioned logic: run_tests will be skipped if `${SKIP_ALL_TESTS}` is a non-empty string.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142277
Approved by: https://github.com/huydhn, https://github.com/atalman
ghstack dependencies: #142276
Minor; adds a linter that ensures that all jobs triggered on `pull_request`, `schedule`, `push`, etc. have an `if: github.repository_owner == 'pytorch'` check (sketched below) or depend on a job that has that check.
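The guard the linter looks for, sketched on a standalone job (job and step contents are illustrative):
```yaml
jobs:
  build:
    # Skip this job on forks; only run in the pytorch org.
    if: github.repository_owner == 'pytorch'
    runs-on: ubuntu-latest
    steps:
      - run: echo "running in the pytorch org"
```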
There is also a setting in GitHub repos that can disable all workflows for that repo.
A lot of these checks are unnecessary because many jobs use reusable workflows that already include the check. However, this is a one-time change, so I'm not that bothered.
Unfortunately I can't put this at the workflow level, which would make this better
Lots of weird string parsing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138849
Approved by: https://github.com/malfet
Updates all references to the runner-determinator workflow (`_runner-determinator.yml`) from the currently cloned version to the version on main.
This enables the team to push updates to this workflow, like bug fixes or improvements, and have them immediately reflected on all open PRs, avoiding potentially breaking situations and enabling fast iteration and simple recovery in case of bugs.
From:
```
jobs:
get-label-type:
uses: ./.github/workflows/_runner-determinator.yml
```
To:
```
jobs:
get-label-type:
uses: pytorch/pytorch/.github/workflows/_runner-determinator.yml@main
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137791
Approved by: https://github.com/malfet, https://github.com/huydhn, https://github.com/zxiiro