https://github.com/pytorch/pytorch/pull/128038#issuecomment-2150802030
In the above, pending unstable jobs get put into the ok_failed_checks list, and because there are a lot of unstable jobs, it exceeds the threshold and merge fails.
I don't think unstable jobs should be considered in the ok failed checks threshold, only flaky and broken trunk jobs should be considered there.
Change looks big, but main thing is that unstable jobs don't get included in the check for how many flaky failures there are. The other changes are mostly renames so things are clearer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128080
Approved by: https://github.com/huydhn
This PR addresses the issue identified in #121920. The existing problem is that all tests are deemed mandatory if none are selected as required. This behavior is particularly noticeable during a force merge operation.
In the context of a force merge, it may not be necessary to execute any tests which are not required (imo). However, this proposed change could be seen as controversial, hence it has been separated from the main update for further discussion and review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121921
Approved by: https://github.com/huydhn
ghstack dependencies: #121920
This PR addresses an issue with the trymerge function for executorch, which currently uses Facebook CLA instead of Easy CLA. This bug has been patched in #121921. However, the patch is potentially controversial, and we still want to verify Facebook CLA if it exists. Therefore, this PR includes Facebook CLA in our set of mandatory checks.
Additionally, this PR removes Facebook CLA from one of the mocks. This change is necessary because the specific PR used for testing fails due to the presence of Facebook CLA in the mock.
## Testing:
We run `find_matching_merge_rule(pr = GitHubPR("pytorch", "executorch", 2326), skip_mandatory_checks=True, skip_internal_checks=True)` to check if things work
https://pastebin.com/HHSFp2Gw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121920
Approved by: https://github.com/huydhn
Example https://github.com/pytorch/pytorch/actions/runs/7562281351/job/20592425611?pr=117079 (The code to delete branches isn't being run, it's just listing the branches it wants to delete)
Internal code: https://fburl.com/code/hdvvbfkj
Threshold for branch with PR is 30 days regardless of whether or not the PR is merged or not (compared to 3 days if merged and 30 days if closed). Threshold for branch without PR is 1.5 years (same internally).
Threshold of ~400 queries to github so it doesn't hit token usage limits. Currently this leads to about 350 branches deleted per run.
Only query for the last 90 days of updated PRs to reduce token usage, so if a branch has a PR but it was updated 90+ days ago, it will think it doesn't have a PR and will wait for the 1.5 years branch update check instead, regardless of whether the PR is open or closed.
I tested that it could delete my own branch and it worked.
labeled with test-config/crossref because I just want the smallest test config possible to reduce CI usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117079
Approved by: https://github.com/malfet
Instead rely on `GitHubPR.default_branch()` which is the name of the repo's default branch.
Do not pass branch name `merge_changes` is called, as it is set to default branch inside the function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118530
Approved by: https://github.com/clee2000
Mention co-authors in PR body
Modify `CommitAuthors` to include query first two commit `authors`, which makes sure that authors from suggested commits are recognized.
Test plan: CI + check `get_authors()` on a few PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118347
Approved by: https://github.com/kit1980
# Changes
* introduce `--check-mergeability` trymerge flag that attempts to merge PR locally, using the same merge logic as the mergebot, but requires just a read-only `GITHUB_TOKEN` and git repo.
* change mergeability workflow to utilize the new --check-mergeability logic
# Alternatives considered
1.
> Rewrite `https://github.com/pytorch/test-infra/actions/workflows/pr-dependencies-check.yml` to correctly support partially merged ghstacks.
That would be a slightly better approach, but ROI is lower, as it requires reimplementing trymerge logic and additional effort to consolidate the codebase (trymerge lives in pytorch repo).
`pr-dependencies-check.yml` still produces human-readable results for partially merged ghstack prs (even if it falsely reports them as non-mergeable).
2.
> Instead of introducing new trymerge flag, use existing flags, including `--dry-run`.
That didn't work, as no combination of existing flags skips the rule checks and ROCKSET lookups.
# Testing
1. Manual testing `trymerge.py --check-mergeability` on the regular and ghstack PRs:
```
export GITHUB_TOKEN=
export GIT_REPO_DIR=`pwd`
export GITHUB_REPOSITORY=pytorch/pytorch
export GIT_REMOTE_URL=https://github.com/pytorch/pytorch
# Test 1 (2 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 117862
Skipping 1 of 2 PR (#117859) as its already been merged
echo $?
0
# Test 2 (3 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 118125
Skipping 1 of 3 PR (#117859) as its already been merged
echo $?
0
# Test 3 (3 prs, intentional conflicts introduced into `main`):
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability 118125
Skipping 1 of 3 PR (#117859) as its already been merged
stdout:
Auto-merging torch/_inductor/ir.py
Auto-merging torch/_inductor/lowering.py
CONFLICT (content): Merge conflict in torch/_inductor/lowering.py
error: could not apply 66ba5b8792f... Realize inputs to DynamicScalar before unwrapping
...
RuntimeError: Command `git -C /Users/ivanzaitsev/pytorch2 cherry-pick -x 66ba5b8792fa076c4e512d920651e5b6b7e466f4` returned non-zero exit code 1
```
2. Workflow run:
https://github.com/pytorch/pytorch/actions/runs/7660736172/job/20878651852?pr=118258
<img width="516" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/28fbf0d2-ac2a-4518-b41d-b32b41373747">
<img width="621" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/ddbf8566-a417-43ec-9d0e-f623f4a71313">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118258
Approved by: https://github.com/PaliC, https://github.com/huydhn
Dry run open for labels so we can run trymerge locally with dryrun without actually affected the PR
Make Dr.CI results easier to read (previously a massive json dump, now just the job names + ids, in a nicer format)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118240
Approved by: https://github.com/huydhn
Where base stack targets default branch, rather than base. But as
default branch is likely to advance, since PR was made, search for
mergebase before determining whether `base`..`head` are in sync with `orig` branch
Also, rather than hardcode default branch name, fetch it from `GitHubPR.default_branch()`
Test Plan: https://github.com/malfet/deleteme/pull/77
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116873
Approved by: https://github.com/ezyang
By adding `get_ghstack_dependent_prs` that using `git branch --contains`
finds all PRs containing stacked branch, selecting longest one (in
terms of distance between origin and default branch) and skipping all
open PRs
Please note, that reverts should be applied in a reversed order with the
one how PRs were landed originally.
Use a bit of a defensive programming, i.e. revert single PR if attempt to fetch dependencies fails for some reason.
Test plan:
- Lint
- ```
>>> from trymerge import GitRepo, GitHubPR, get_ghstack_prs, get_ghstack_dependent_prs
>>> pr=GitHubPR("pytorch", "pytorch", 115188)
>>> pr1=GitHubPR("pytorch", "pytorch", 115210)
>>> repo=GitRepo("/Users/nshulga/git/pytorch/pytorch")
>>> get_ghstack_dependent_prs(repo, pr1)
[('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x100f32f40>)]
>>> get_ghstack_dependent_prs(repo, pr)
[('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x10102eaf0>), ('76b1d44d576c20be79295810904c589241ca1bd2', <trymerge.GitHubPR object at 0x10102eb50>)]
>>> rc=get_ghstack_dependent_prs(repo, pr)
rc[0]>>> rc[0][1].pr_num
115210
>>> rc[1][1].pr_num
115188
```
- see: https://github.com/malfet/deleteme/pull/59#issuecomment-1869904714 and https://github.com/malfet/deleteme/pull/74#issuecomment-1870542702
Fixes https://github.com/pytorch/test-infra/issues/4845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116447
Approved by: https://github.com/huydhn
ghstack dependencies: #116446
Prep change for allowing stacked reverts
This is a no-op that factors out some helper function that would be
useful later:
- `get_pr_commit_sha` finds a committed sha for a given PR
- `_revlist_to_prs` converts a revlist to GitHubPRs conditionally
filtering some out
- `do_revert_prs` reverts multiple PRs in a batch, but so far is
invoked with only one PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116446
Approved by: https://github.com/huydhn, https://github.com/seemethere
In https://github.com/pytorch/pytorch/pull/110362, the failure was flaky but merge bot treated it as an actual failure. This is a regression after https://github.com/pytorch/test-infra/pull/4604 where the name returned by Dr.CI now includes workflow name. For example, the name is `trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)` in the JSON response:
```
{"FAILED": [], "FLAKY": [{"workflowId": 6372581477, "id": 17297638807, "name": "trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "jobName": "macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "conclusion": "failure", "completed_at": "2023-10-01T22:18:28Z", "html_url": "https://github.com/pytorch/pytorch/actions/runs/6372581477/job/17297638807", "head_branch": "ciflow/trunk/110362", "pr_number": 110362, "head_sha": "03f51e36dedf234931006d1db61677b229c9a119", "failure_captures": ["Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of"], "failure_line": "Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of 6291456KB for macOS", "time": "2023-10-01T22:17:53.847751Z"}], "BROKEN_TRUNK": [], "UNSTABLE": []}
```
I update merge bot to handle this better by considering both workflow name, job name, and the combination full name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110661
Approved by: https://github.com/clee2000
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there. This change queries Dr.CI API endpoint and check if the failure is a flaky one using `is_flaky` function.
Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:
* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason
### Testing
* Create a new `drci_mocks.json` file to catch the JSON response from Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
* `pytest -v test_trymerge.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there. This change queries Dr.CI API endpoint and check if the failure is a flaky one using `is_flaky` function.
Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:
* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason
### Testing
* Create a new `drci_mocks.json` file to catch the JSON response from Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
* `pytest -v test_trymerge.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
I notice a curious case on https://github.com/pytorch/pytorch/pull/107508 where there was one broken trunk failure and the PR was merged with `merge -ic`. Because the failure had been classified as unrelated, I expected to see a no-op force merge here. However, it showed up as a force merge with failure.

The record on Rockset reveals https://github.com/pytorch/pytorch/pull/107508 has:
* 0 broken trunk check (unexpected, this should be 1 as Dr. CI clearly say so)
* 1 ignore current check (unexpected, this should be 0 and the failure should be counted as broken trunk instead)
* 3 unstable ROCm jobs (expected)
It turns out that ignore current takes precedence over flaky and broken trunk classification. This might have been the expectation in the past but I think that's not the case now. The bot should be consistent with what is shown on Dr. CI. The change here is to make flaky, unstable, and broken trunk classification to take precedence over ignore current. Basically, we only need to ignore new or unrecognized failures that have yet been classified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107761
Approved by: https://github.com/clee2000
At the moment, we only record the list of pending and failed check on Rockset merge records. This is enough to compute the force merge KPI(s), but isn't enough for more in-depth analysis on what happened at the time of the merge:
* If the number of `ok_failed_checks` is less than `ok_failed_checks_threshold`, the list of `failed_checks` would be empty (expectedly). So Rockset would only record an empty list.
* We support retry in PR, so the classifications on Dr.CI could be different than what dev observed at the time of the merge if retry completed successfully
### Testing
`python .github/scripts/trymerge.py --comment-id 1654010315 106095 --dry-run` (need to comment out some of the code to actually write a test record to Rockset), then manually verify it with
```
SELECT
*
FROM
commons.merges
WHERE
pr_num = 106095
```
to see that `ignore_current_checks`, `broken_trunk_checks`, `flaky_checks`, and `unstable_checks` shows up correctly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106162
Approved by: https://github.com/clee2000
No need to wait if the job classification is unstable as it would be ignored anyway. This is useful to not need to wait for scarce resources like ROCm, which is also frequently in unstable mode (There is a ROCm queue atm)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106095
Approved by: https://github.com/clee2000