Commit Graph

212 Commits

Author SHA1 Message Date
PyTorch MergeBot
99f2491af9 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 45411d1fc9.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))
2025-01-04 14:17:20 +00:00
Xuehai Pan
45411d1fc9 Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2025-01-03 20:03:40 +00:00
PyTorch MergeBot
cc4e70b7c3 Revert "Use absolute path path.resolve() -> path.absolute() (#129409)"
This reverts commit 135c7db99d.

Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))
2024-12-26 17:26:06 +00:00
Xuehai Pan
135c7db99d Use absolute path path.resolve() -> path.absolute() (#129409)
Changes:

1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()`
2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409
Approved by: https://github.com/albanD
2024-12-24 08:33:08 +00:00
Tom Ritchford
498a7808ff Fix unused Python variables outside torch/ and test/ (#136359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359
Approved by: https://github.com/albanD
2024-12-11 17:10:23 +00:00
Catherine Lee
ea7d1826a2 [ez] Make merge blocking sevs be based on label instead of string (#140636)
sev issues are now merge blocking if they are labeled merge blocking, instead of simply having the merge blocking string in the body.  This makes it easier to default to non merge blocking when creating a sev

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140636
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-11-14 19:02:27 +00:00
Catherine Lee
24c9683355 [mergebot] Add ci-no-td label on revert (#139218)
Just in case?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139218
Approved by: https://github.com/wdvr
2024-10-30 21:36:09 +00:00
Xuehai Pan
267f82b860 [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577
Approved by: https://github.com/malfet
2024-10-11 18:30:26 +00:00
Catherine Lee
f54e142c58 Remove references to Rockset in trymerge (#137207)
For the migration to ClickHouse

But also Rockset is not used in trymerge anymore
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137207
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-10-05 12:53:22 +00:00
Catherine Lee
bc0f330169 [trymerge] Manually close merged PR when Github fails (#135890)
Manually close merged PR when Github fails to do it.

Consequences of current design:
Sleeping for 1 min uses up the machine, might result in race conditions, results in merging label to removed a bit later, pr still left open if this api fails too (ie no async clean up job)

Tested in https://github.com/malfet/deleteme/pull/92 by removing the part of the commit message that has "resolved #pr num"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135890
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-09-13 17:29:24 +00:00
Nikita Shulga
7666ef9d9b [GHF] Fix co-authors attribution (#133372)
Acording to https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors Co-authors must be mentioned at the very end of commit message and separated by 2 newlines

Test plan:
```python
from trymerge import GitHubPR
pr = GitHubPR("pytorch", "pytorch", 133189)
print(pr.gen_commit_message())
```

Fixes https://github.com/pytorch/pytorch/issues/133310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133372
Approved by: https://github.com/kit1980
2024-08-14 00:48:24 +00:00
PaliC
544f950d14 [BE] Improve error message when there are internal changes (#131547)
Fixes https://github.com/pytorch/test-infra/issues/4988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131547
Approved by: https://github.com/xuzhao9, https://github.com/malfet, https://github.com/atalman
2024-07-24 20:38:08 +00:00
Xuehai Pan
747b38c131 [BE][Easy][2/19] enforce style for empty lines in import segments in .ci/ and .github/ (#129753)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129753
Approved by: https://github.com/malfet
ghstack dependencies: #129752
2024-07-16 09:40:00 +00:00
Catherine Lee
90f82426b9 RS migration - trymerge to upload merge records to s3 (#129503)
Uploads merge records to to ossci-raw-job-status (public) bucket instead of directly to rockset

The runner used by trymerge is a GH runner, so it doesn't have access to s3.  Instead, I save the record as a json and upload the json to s3 in a different step that runs after the aws credentials are configured.

The role is defined [here](https://togithub.com/pytorch-labs/pytorch-gha-infra/pull/421)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129503
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi, https://github.com/malfet
2024-06-26 19:06:52 +00:00
Huy Do
cda4d4887d Skip signals from older runs of the same workflows (#129291)
I discovered this bug in trymerge when debugging https://github.com/pytorch/pytorch/pull/129013 in which Dr.CI reported no relevant failures while mergebot complained about some unrelated ROCm failures https://github.com/pytorch/pytorch/pull/129013#issuecomment-2183009217.

It turns out that mergebot took into account stale signals from older runs of the same workflow here.  For example,
* https://github.com/pytorch/pytorch/actions/runs/9604985361 was the first run where it had a ROCm failure
* While https://github.com/pytorch/pytorch/actions/runs/9608926565 was the second attempt and it was all green

Notice that both runs came from the same push to commit [be69191](be69191f2d) with [ciflow/rocm/129013](https://github.com/pytorch/pytorch/tree/ciflow/rocm/129013).  So, we just need to check the signals from the newer run.

Note that Dr.CI handles this part correctly using the logic in https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/drci/drci.ts#L1079-L1088.  So, the fix in this PR is to bring the same logic to trymerge.

### Testing

`pytest -v test_trymerge.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129291
Approved by: https://github.com/ZainRizvi
2024-06-26 03:49:09 +00:00
Nikita Shulga
b94c52dd29 [GHF] Refuse merge to non-default branch (#128710)
Unless PR is ghstack one

Test plan:
```
% GITHUB_TOKEN=$(gh auth token)  python3 -c "from trymerge import GitHubPR; pr=GitHubPR('pytorch', 'pytorch', 128591); print(pr.base_ref(), pr.default_branch())"
release/2.4 main
```
Fixes: https://github.com/pytorch/test-infra/issues/5339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128710
Approved by: https://github.com/seemethere, https://github.com/atalman
2024-06-14 18:23:25 +00:00
Catherine Lee
936225d7b2 [mergebot] Fix pending unstable jobs being viewed as failed (#128080)
https://github.com/pytorch/pytorch/pull/128038#issuecomment-2150802030

In the above, pending unstable jobs get put into the ok_failed_checks list, and because there are a lot of unstable jobs, it exceeds the threshold and merge fails.

I don't think unstable jobs should be considered in the ok failed checks threshold, only flaky and broken trunk jobs should be considered there.

Change looks big, but main thing is that unstable jobs don't get included in the check for how many flaky failures there are.  The other changes are mostly renames so things are clearer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128080
Approved by: https://github.com/huydhn
2024-06-06 18:22:20 +00:00
Huy Do
e323c681ad Update trymerge to honor the list of unstable failures from Dr.CI (#124965)
After https://github.com/pytorch/test-infra/pull/5131, we want to have trymerge to honor the list of unstable failures from Dr.CI because having the unstable keyword is the job name now doesn't cover all unstable jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124965
Approved by: https://github.com/clee2000
2024-04-26 05:10:50 +00:00
Huy Do
f00ece024b Handle wrong workflow name from GitHub (#123301)
Fixes https://github.com/pytorch/pytorch/issues/122422.  From my testing, the problem is that GitHub didn't return the correct workflow name in some cases and used the path to the workflow instead.

Take https://github.com/pytorch/pytorch/pull/123104 as an example, the returning name from GH graphql was `.github/workflows/generated-linux-binary-conda-nightly.yml` while the name we had on Rockset was `linux-binary-conda`.  The latter was correct, but the mismatch caused mergebot to miss the flaky failures.

This is a weird issue because retrying the graphql query eventually returns the correct name.

First query:
![Screenshot 2024-04-03 at 15 28 37](https://github.com/pytorch/pytorch/assets/475357/81a8ada4-c241-4e6b-b45d-7a6de1c3a151)

After several retries:
![Screenshot 2024-04-03 at 15 31 53](https://github.com/pytorch/pytorch/assets/475357/402c2e8c-f963-45f6-8c10-e1d2f49c5479)

Then I could never get the result like the first query again.

The fix here is to keep track of the job ID so that we can compare it instead of the `workflow / job` name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123301
Approved by: https://github.com/clee2000
2024-04-04 07:00:40 +00:00
PaliC
7bc91d5dc2 [mergebot][BE] If we don't have any required checks, don't run required checks (#121921)
This PR addresses the issue identified in #121920. The existing problem is that all tests are deemed mandatory if none are selected as required. This behavior is particularly noticeable during a force merge operation.

In the context of a force merge, it may not be necessary to execute any tests which are not required (imo). However, this proposed change could be seen as controversial, hence it has been separated from the main update for further discussion and review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121921
Approved by: https://github.com/huydhn
ghstack dependencies: #121920
2024-03-16 01:35:21 +00:00
PaliC
efbeefbb84 [executorch] Make trymerge force merges actually work with executorch (#121920)
This PR addresses an issue with the trymerge function for executorch, which currently uses Facebook CLA instead of Easy CLA. This bug has been patched in #121921. However, the patch is potentially controversial, and we still want to verify Facebook CLA if it exists. Therefore, this PR includes Facebook CLA in our set of mandatory checks.

Additionally, this PR removes Facebook CLA from one of the mocks. This change is necessary because the specific PR used for testing fails due to the presence of Facebook CLA in the mock.

## Testing:
We run `find_matching_merge_rule(pr = GitHubPR("pytorch", "executorch", 2326), skip_mandatory_checks=True, skip_internal_checks=True)` to check if things work

https://pastebin.com/HHSFp2Gw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121920
Approved by: https://github.com/huydhn
2024-03-15 03:21:44 +00:00
Catherine Lee
34638c82a6 [mergebot] No unique behavior for facebook bot re pending jobs (#119735)
if fb bot says merge without -f, do normal behavior and wait for pending checks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119735
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn
2024-02-13 20:07:24 +00:00
Catherine Lee
200108c6e6 Delete old branches (#117079)
Example https://github.com/pytorch/pytorch/actions/runs/7562281351/job/20592425611?pr=117079 (The code to delete branches isn't being run, it's just listing the branches it wants to delete)

Internal code: https://fburl.com/code/hdvvbfkj

Threshold for branch with PR is 30 days regardless of whether or not the PR is merged or not (compared to 3 days if merged and 30 days if closed).  Threshold for branch without PR is 1.5 years (same internally).

Threshold of ~400 queries to github so it doesn't hit token usage limits.  Currently this leads to about 350 branches deleted per run.

Only query for the last 90 days of updated PRs to reduce token usage, so if a branch has a PR but it was updated 90+ days ago, it will think it doesn't have a PR and will wait for the 1.5 years branch update check instead, regardless of whether the PR is open or closed.

I tested that it could delete my own branch and it worked.

labeled with test-config/crossref because I just want the smallest test config possible to reduce CI usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117079
Approved by: https://github.com/malfet
2024-02-05 20:50:05 +00:00
Nikita Shulga
3011a4406f [BE][GHF] Do not hardcode default branch name (#118530)
Instead rely on `GitHubPR.default_branch()` which is the name of the repo's default branch.

Do not pass branch name `merge_changes` is called, as it is set to default branch inside the function

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118530
Approved by: https://github.com/clee2000
2024-01-29 17:18:23 +00:00
Nikita Shulga
7cc7bf9dda [GHF] Add co-authors to PR (#118347)
Mention co-authors in PR body

Modify `CommitAuthors` to include query first two commit `authors`, which makes sure that authors from suggested commits are recognized.

Test plan: CI + check `get_authors()` on a few PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118347
Approved by: https://github.com/kit1980
2024-01-27 01:02:49 +00:00
Ivan Zaitsev
b599f5608c Fix mergeability check for ghstack PRs (#118258)
# Changes
* introduce `--check-mergeability` trymerge flag that attempts to merge PR locally, using the same merge logic as the mergebot, but requires just a read-only `GITHUB_TOKEN` and git repo.
* change mergeability workflow to utilize the new --check-mergeability logic

# Alternatives considered

1.
> Rewrite `https://github.com/pytorch/test-infra/actions/workflows/pr-dependencies-check.yml` to correctly support partially merged ghstacks.

That would be a slightly better approach, but ROI is lower, as it requires reimplementing trymerge logic and additional effort to consolidate the codebase (trymerge lives in pytorch repo).

`pr-dependencies-check.yml` still produces human-readable results for partially merged ghstack prs (even if it falsely reports them as non-mergeable).

2.

> Instead of introducing new trymerge flag, use existing flags, including `--dry-run`.

That didn't work, as no combination of existing flags skips the rule checks and ROCKSET lookups.

# Testing

1. Manual testing  `trymerge.py --check-mergeability`  on the regular and ghstack PRs:

```
export GITHUB_TOKEN=
export GIT_REPO_DIR=`pwd`
export GITHUB_REPOSITORY=pytorch/pytorch
export GIT_REMOTE_URL=https://github.com/pytorch/pytorch

# Test 1 (2 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability  117862
Skipping 1 of 2 PR (#117859) as its already been merged

echo $?
0

# Test 2 (3 prs, 1 is closed)
python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability  118125
Skipping 1 of 3 PR (#117859) as its already been merged

echo $?
0

# Test 3 (3 prs, intentional conflicts introduced into `main`):

python3 ../pytorch/.github/scripts/trymerge.py --check-mergeability  118125
Skipping 1 of 3 PR (#117859) as its already been merged
stdout:
Auto-merging torch/_inductor/ir.py
Auto-merging torch/_inductor/lowering.py
CONFLICT (content): Merge conflict in torch/_inductor/lowering.py
error: could not apply 66ba5b8792f... Realize inputs to DynamicScalar before unwrapping
...
RuntimeError: Command `git -C /Users/ivanzaitsev/pytorch2 cherry-pick -x 66ba5b8792fa076c4e512d920651e5b6b7e466f4` returned non-zero exit code 1
```

2.  Workflow run:
https://github.com/pytorch/pytorch/actions/runs/7660736172/job/20878651852?pr=118258

<img width="516" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/28fbf0d2-ac2a-4518-b41d-b32b41373747">
<img width="621" alt="image" src="https://github.com/pytorch/pytorch/assets/108101595/ddbf8566-a417-43ec-9d0e-f623f4a71313">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118258
Approved by: https://github.com/PaliC, https://github.com/huydhn
2024-01-26 03:15:56 +00:00
Catherine Lee
02a411d4a6 [mergebot] Dry run for labels + easier to read Dr CI result (#118240)
Dry run open for labels so we can run trymerge locally with dryrun without actually affected the PR

Make Dr.CI results easier to read (previously a massive json dump, now just the job names + ids, in a nicer format)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118240
Approved by: https://github.com/huydhn
2024-01-25 23:06:43 +00:00
Nikita Shulga
0f0020d76f [GHF] Add support for new style stacks (#116873)
Where base stack targets default branch, rather than base. But as
default branch is likely to advance, since PR was made, search for
mergebase before determining whether `base`..`head` are in sync with `orig` branch
Also, rather than hardcode default branch name, fetch it from `GitHubPR.default_branch()`

Test Plan: https://github.com/malfet/deleteme/pull/77

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116873
Approved by: https://github.com/ezyang
2024-01-05 20:32:24 +00:00
Nikita Shulga
93b86bf531 [GHF] Implement stacked revert (#116447)
By adding `get_ghstack_dependent_prs` that using `git branch --contains`
finds all PRs containing stacked branch, selecting longest one (in
terms of distance between origin and default branch) and skipping all
open PRs

Please note, that reverts should be applied in a reversed order with the
one how PRs were landed originally.

Use a bit of a defensive programming, i.e. revert single PR if attempt to fetch dependencies fails for some reason.

Test plan:
 - Lint
 -  ```
    >>> from trymerge import GitRepo, GitHubPR, get_ghstack_prs, get_ghstack_dependent_prs
    >>> pr=GitHubPR("pytorch", "pytorch", 115188)
    >>> pr1=GitHubPR("pytorch", "pytorch", 115210)
    >>> repo=GitRepo("/Users/nshulga/git/pytorch/pytorch")
    >>> get_ghstack_dependent_prs(repo, pr1)
    [('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x100f32f40>)]
    >>> get_ghstack_dependent_prs(repo, pr)
    [('22742d93a5357c9b5b45a74f91a6dc5599c9c266', <trymerge.GitHubPR object at 0x10102eaf0>), ('76b1d44d576c20be79295810904c589241ca1bd2', <trymerge.GitHubPR object at 0x10102eb50>)]
    >>> rc=get_ghstack_dependent_prs(repo, pr)
    rc[0]>>> rc[0][1].pr_num
    115210
    >>> rc[1][1].pr_num
    115188
    ```
 - see: https://github.com/malfet/deleteme/pull/59#issuecomment-1869904714 and https://github.com/malfet/deleteme/pull/74#issuecomment-1870542702

Fixes https://github.com/pytorch/test-infra/issues/4845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116447
Approved by: https://github.com/huydhn
ghstack dependencies: #116446
2023-12-27 23:01:16 +00:00
Nikita Shulga
5fcc2519f5 [GHF] Refactors (#116446)
Prep change for allowing stacked reverts

This is a no-op that factors out some helper function that would be
useful later:
 - `get_pr_commit_sha` finds a committed sha for a given PR
 - `_revlist_to_prs` converts a revlist to GitHubPRs conditionally
   filtering some out
 - `do_revert_prs` reverts multiple PRs in a batch, but so far is
   invoked with only one PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116446
Approved by: https://github.com/huydhn, https://github.com/seemethere
2023-12-27 23:01:16 +00:00
Huy Do
d6de2df6b6 Improve the error message when a PR lacks the necessary approvals (#116161)
The error message from https://github.com/pytorch/pytorch/pull/115329#issuecomment-1857135047 is pretty confusing because it lists some random `pytorch/metamates` folks from `superuser` merge rule.  My attempt here is to make the error message clearer by pointing out:

* All the matching merge rules and
* Their list of approvers

The message will now become:

```
Approvers from one of the follow rules are needed:
- Core Reviewers (1, 2, 3, 4, 5, ...)
- Core Maintainers (1, 2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116161
Approved by: https://github.com/malfet, https://github.com/PaliC, https://github.com/atalman, https://github.com/ZainRizvi
2023-12-22 00:22:43 +00:00
Nikita Shulga
7ed2bc7c67 [GHF] Do not block reverts with internal changes (#115903)
As check is more often than not is unreliable, so better just post a
warning and let the revert proceed.

Fixes https://github.com/pytorch/test-infra/issues/4797

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115903
Approved by: https://github.com/clee2000, https://github.com/atalman
2023-12-15 17:00:07 +00:00
Eli Uriegas
84ee7453ad ci: Add clickable PR link to trymerge (#113712)
Adds a link to trymerge so that you can quickly click through the job to
the pull request for debugging.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113712
Approved by: https://github.com/clee2000, https://github.com/malfet
2023-11-15 01:55:33 +00:00
Nikita Shulga
54c7d0d99d [GHF] Bot should reopen PR after revert (#112614)
Fixes https://github.com/pytorch/test-infra/issues/4692
Test plan, see https://github.com/malfet/deleteme/pull/58#issuecomment-1789365259 / https://github.com/malfet/deleteme/actions/runs/6723011476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112614
Approved by: https://github.com/seemethere, https://github.com/ezyang
ghstack dependencies: #112613
2023-11-01 18:03:32 +00:00
Huy Do
9132734a35 Use Dr.CI GitHub checkrun summary when querying its API fails (#111628)
This will allow internal SandCastle job to access Dr.CI classification results via GitHub checkrun summary and correctly ignore unrelated failures.

### Testing

Adding `TestBypassFailuresOnSandCastle` where Dr.CI API returns nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111628
Approved by: https://github.com/clee2000
2023-10-24 01:32:30 +00:00
Huy Do
4ec777e9a5 [BE] Clean up trymerge code handling broken trunk failures (#111520)
This is the final part of https://github.com/pytorch/pytorch/pull/110054.  The broken trunk classification has been done on Dr.CI, so we can just check for that in trymerge for consistency when ghstack is used.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] https://github.com/pytorch/pytorch/pull/110133
* [x] This PR to clean up the broken trunk logic.

One important change is that `get_classifications` doesn't need to query the jobs from Rockset for the head and merge base SHA anymore, saving a query there.  The function looks a lot simpler now.

### Testing

https://github.com/pytorch/pytorch/pull/111253 had 1 broken trunk failure as detected by Dr.CI from the base commit 3eb5cae3af (valid) while trymerge didn't detect that because ghstack base commit be8e517174 didn't have the same failure (miss).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111520
Approved by: https://github.com/clee2000
2023-10-19 02:30:56 +00:00
Huy Do
f952551963 Handle invalid cancellation signals in trymerge (#110690)
This change is needed after https://github.com/pytorch/test-infra/pull/4579 and https://github.com/pytorch/test-infra/pull/4610.  All invalid cancelled signals have been removed from Dr.CI and HUD.  So trymerge should ignore them accordingly for a consistent experience.

### Testing

https://github.com/pytorch/pytorch/pull/110367#issuecomment-1750099960 is the PR where a bunch of invalid cancelled signals showed up and blocked merges

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110690
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
2023-10-06 22:43:33 +00:00
Huy Do
26bfb0fc21 Check for both workflow and job names from Dr.CI (#110661)
In https://github.com/pytorch/pytorch/pull/110362, the failure was flaky but merge bot treated it as an actual failure. This is a regression after https://github.com/pytorch/test-infra/pull/4604 where the name returned by Dr.CI now includes workflow name.  For example, the name is `trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)` in the JSON response:

```
{"FAILED": [], "FLAKY": [{"workflowId": 6372581477, "id": 17297638807, "name": "trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "jobName": "macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "conclusion": "failure", "completed_at": "2023-10-01T22:18:28Z", "html_url": "https://github.com/pytorch/pytorch/actions/runs/6372581477/job/17297638807", "head_branch": "ciflow/trunk/110362", "pr_number": 110362, "head_sha": "03f51e36dedf234931006d1db61677b229c9a119", "failure_captures": ["Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of"], "failure_line": "Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of 6291456KB for macOS", "time": "2023-10-01T22:17:53.847751Z"}], "BROKEN_TRUNK": [], "UNSTABLE": []}
```

I update merge bot to handle this better by considering both workflow name, job name, and the combination full name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110661
Approved by: https://github.com/clee2000
2023-10-06 04:36:52 +00:00
Huy Do
81a74457ca [BE] Clean up trymerge code handling flaky failures (#110133)
This is the 2nd part of https://github.com/pytorch/pytorch/pull/110054.  The flaky classification has been done on Dr.CI.  There is no need to download flaky rule files and do the check anymore.  Some tests are also updated with new examples because we mocked the list of flaky rules there.  Similar tests have been done on Dr.CI.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110133
Approved by: https://github.com/clee2000
2023-09-30 08:01:00 +00:00
Huy Do
955298bc40 Use Dr.CI results to classify flaky failures in trymerge (#110054)
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there.  This change queries Dr.CI API endpoint and check if the failure is a flaky one using `is_flaky` function.

Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:

* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

### Testing

* Create a new `drci_mocks.json` file to catch the JSON response from Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
*  `pytest -v test_trymerge.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
2023-09-27 21:21:29 +00:00
PyTorch MergeBot
063d2572da Revert "Use Dr.CI results to classify flaky failures in trymerge (#110054)"
This reverts commit d0f82cd082.

Reverted https://github.com/pytorch/pytorch/pull/110054 on behalf of https://github.com/huydhn due to The mock gql_mocks.json file is not bigger than the file size limit on fbcode ([comment](https://github.com/pytorch/pytorch/pull/110054#issuecomment-1737727552))
2023-09-27 16:33:10 +00:00
Huy Do
d0f82cd082 Use Dr.CI results to classify flaky failures in trymerge (#110054)
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there.  This change queries Dr.CI API endpoint and check if the failure is a flaky one using `is_flaky` function.

Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:

* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

### Testing

* Create a new `drci_mocks.json` file to catch the JSON response from Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
*  `pytest -v test_trymerge.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
2023-09-26 21:24:21 +00:00
Huy Do
d7f943ec82 [mergebot] Flaky and broken trunk should take precedence over ic (#107761)
I notice a curious case on https://github.com/pytorch/pytorch/pull/107508 where there was one broken trunk failure and the PR was merged with `merge -ic`.  Because the failure had been classified as unrelated, I expected to see a no-op force merge here.  However, it showed up as a force merge with failure.

![Screenshot 2023-08-22 at 20 01 10](https://github.com/pytorch/pytorch/assets/475357/b9c93e24-8da8-4fc6-9b3d-61b6bd0a8937)

The record on Rockset reveals https://github.com/pytorch/pytorch/pull/107508 has:

* 0 broken trunk check (unexpected, this should be 1 as Dr. CI clearly say so)
* 1 ignore current check (unexpected, this should be 0 and the failure should be counted as broken trunk instead)
* 3 unstable ROCm jobs (expected)

It turns out that ignore current takes precedence over flaky and broken trunk classification.  This might have been the expectation in the past but I think that's not the case now.  The bot should be consistent with what is shown on Dr. CI.  The change here is to make flaky, unstable, and broken trunk classification to take precedence over ignore current.  Basically, we only need to ignore new or unrecognized failures that have yet been classified.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107761
Approved by: https://github.com/clee2000
2023-08-23 21:22:56 +00:00
Zain Rizvi
b9c86c521d Make mergebot work with review comments (#107390)
Fixes https://github.com/pytorch/pytorch/issues/100406

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107390
Approved by: https://github.com/clee2000
ghstack dependencies: #107385
2023-08-17 21:31:41 +00:00
Zain Rizvi
4874b02379 [BE] Remove deprecated github gql param and disable inconsistent test (#107385)
Two fixes:
- Stop querying `pushDate`, which [has been deprecated ](https://docs.github.com/en/graphql/reference/objects) and now always returns null
- Disables the test `test_merge_ghstack_into` which was recently added in https://github.com/pytorch/pytorch/pull/105251. This test used the results of another person's ghstack PR for testing, but as the dev submitted chunks of their PR this test's assumptions have been broken. cc @izaitsevfb for a long term fix here

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107385
Approved by: https://github.com/clee2000
2023-08-17 21:31:41 +00:00
Huy Do
4979a1b8f9 Fix trymerge broken trunk detection when the merge base job was retried (successfully) (#107333)
This fixes a discrepancy bug between Dr.CI and trymerge when detecting broken trunk failures.
 Take https://github.com/pytorch/pytorch/pull/107160 as an example:

* Dr.CI correctly identifies the broken trunk failure
* while trymerge records it as a new failure

The issue is that the merge base [failure](https://github.com/pytorch/pytorch/actions/runs/5833057579/job/15820504498) was flaky.  It was retried successfully and its conclusion went from a failure to a success.  The Rockset query returns all run attempts and while Dr.CI correctly records the failure, trymerge overwrites it with the successful retry.   Thus, the latter saw a new failure.

This change makes trymerge keep the merge base failure similar to what Dr.CI does https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/drci/drci.ts#L158-L168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107333
Approved by: https://github.com/clee2000
2023-08-17 02:09:31 +00:00
Ivan Zaitsev
d2aa3f5fa9 [GHF][mergebot] record ghstack dependencies in the commit message (#105251)
Currently all information about the dependencies of ghstack PRs (e.g. #105010) is stripped away:
c984885809/.github/scripts/trymerge.py (L1077-L1078)

This PR adds this information back in a more compact form. All dependencies (PR numbers) of each PR in ghstack are recorded.

The resulting commit message will look like this (the last line is new):

> Mock title (#123)
>
> Mock body text
> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123
> Approved by: https://github.com/Approver1, https://github.com/Approver2
> ghstack dependencies: #1, #2

---

### Testing

Unit tests.

---

### Note Re: `# type: ignore[assignment]` in unit tests.

I did my due diligence to find alternatives. Unfortunately mypy [doesn't](https://github.com/python/mypy/issues/6713) support this [way of patching methods](https://docs.python.org/3/library/unittest.mock-examples.html#mock-patching-methods), and the alternatives are either extremely verbose or don't work for this case. I decided it's not worth the effort (since the problem is limited only to the unit test).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105251
Approved by: https://github.com/huydhn
2023-07-29 20:32:10 +00:00
Huy Do
4fe407ad73 Add details about ic, broken, flaky, and unstable checks to merge records (#106162)
At the moment, we only record the list of pending and failed check on Rockset merge records. This is enough to compute the force merge KPI(s), but isn't enough for more in-depth analysis on what happened at the time of the merge:

* If the number of `ok_failed_checks` is less than `ok_failed_checks_threshold`, the list of `failed_checks` would be empty (expectedly).  So Rockset would only record an empty list.
* We support retry in PR, so the classifications on Dr.CI could be different than what dev observed at the time of the merge if retry completed successfully

### Testing

`python .github/scripts/trymerge.py --comment-id 1654010315 106095 --dry-run` (need to comment out some of the code to actually write a test record to Rockset), then manually verify it with

```
SELECT
    *
FROM
    commons.merges
WHERE
    pr_num = 106095
```

to see that `ignore_current_checks`, `broken_trunk_checks`, `flaky_checks`, and `unstable_checks` shows up correctly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106162
Approved by: https://github.com/clee2000
2023-07-28 09:41:02 +00:00
Huy Do
32175d794a No need to wait for pending unstable jobs when merging (#106095)
No need to wait if the job classification is unstable as it would be ignored anyway. This is useful to not need to wait for scarce resources like ROCm, which is also frequently in unstable mode (There is a ROCm queue atm)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106095
Approved by: https://github.com/clee2000
2023-07-28 07:08:23 +00:00
Huy Do
3b5fb7c0d4 Support regex flaky rules in trymerge (#106103)
This goes together with https://github.com/pytorch/test-infra/pull/4423 to support regex flaky rules in `trymerge`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106103
Approved by: https://github.com/clee2000
2023-07-28 07:05:12 +00:00