* Add pytorchbot to list of approvers for file
* Add labels to the auto-created PR
The auto-generated PR is currently not merging due to some failing tests on the slow workflow that were supposed to be moved back to normal.
Not sure if this has much value; clearly we've been managing without the update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135390
Approved by: https://github.com/ZainRizvi
Move the slow test json into the pytorch/pytorch repo and add a job that updates it weekly. The job uses the same environment as the commit hash update and similar code, but the hash update contains a lot of code specific to itself, so I picked out only the relevant parts.
Remove references to the old file and set up testing to read from the new file instead.
The old update cadence was every day; the new one is every week.
The auto slow test infra plus the lack of pinning between pytorch and test-infra make it really hard to tell whether a test started failing because of a change or because the slow test json changed. While this can have benefits, like disable test issues taking effect everywhere immediately, it can also be very confusing, especially since we don't have the same insight into slow tests as we do for disable issues.
Example PR made: https://github.com/pytorch/pytorch/pull/132383 (with all the changes from this PR because it was working on top of this)
We should just get rid of this at some point in favor of the slowTest decorator, but there are some tests that take 5+ minutes to run and I don't want to track them down right now
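For reference, a minimal sketch of how tests might read the in-repo slow test file; the `test/slow_tests.json` path and the name-to-runtime schema are assumptions, not necessarily the exact format this PR uses.
```python
# Minimal sketch, assuming the file lives next to this module as slow_tests.json
# and maps test names to runtimes in seconds (path and schema are assumptions).
import json
from pathlib import Path

SLOW_TESTS_FILE = Path(__file__).absolute().parent / "slow_tests.json"

def load_slow_tests() -> dict:
    try:
        with open(SLOW_TESTS_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        # Treat nothing as slow if the weekly job hasn't produced the file yet.
        return {}

def is_slow(test_name: str) -> bool:
    return test_name in load_slow_tests()
```
By contrast, the `slowTest` decorator in `torch.testing._internal.common_utils` marks a test as slow explicitly in source, which is the direction the note above suggests moving toward.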
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132379
Approved by: https://github.com/huydhn
test_public_bindings should be run on anything that changes the public API. We still need to figure out what exactly counts as the public API; currently I'm treating it as anything in torch/.
flex_attention should be run on anything involving autograd.
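A hypothetical sketch of these two heuristics; the function name and the exact test identifiers are illustrative, not the actual target determination code.
```python
# Hypothetical sketch: map changed files to extra test files to prioritize.
def extra_tests_for(changed_files: list[str]) -> set[str]:
    extra = set()
    # Anything under torch/ is treated as potentially touching the public API.
    if any(f.startswith("torch/") for f in changed_files):
        extra.add("test_public_bindings")
    # Anything that looks autograd-related should also exercise flex_attention.
    if any("autograd" in f for f in changed_files):
        extra.add("inductor/test_flex_attention")
    return extra
```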
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130397
Approved by: https://github.com/malfet
Changes by apply order (a small sketch illustrating the net effect of 1-4 follows the list):
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
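A small sketch of the net effect of rewrites 1-4 (file locations are illustrative):
```python
from pathlib import Path

# Rewrites 1-2: prefer Path over ".." / os.pardir / nested os.path.dirname().
# Before: os.path.join(os.path.dirname(__file__), "..", "..")
repo_root = str(Path(__file__).parent.parent)

# Rewrite 3: make the path absolute before walking up; .parent on a bare
# relative path like Path("setup.py") is just ".", which is not what we want.
# Before: Path(__file__).parent.parent.absolute()
two_up = Path(__file__).absolute().parent.parent

# Rewrite 4: collapse long .parent chains into .parents[N - 1].
# Before: Path(__file__).parent.parent.parent.parent
four_up = Path(__file__).absolute().parents[3]
```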
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
yolo
IIRC the a10g/sm86 runners have ~21 GB of GPU memory, so we can increase parallelism on them to 3. This results in about 6 GB of CUDA memory per proc; the previous calculation with 2 procs resulted in about 8 GB.
Also fixes the calc for per-proc memory, assuming that the CUDA context plus anything else takes a little under 1 GB (the previous calc was .11 of about 7.5-8 GB, i.e. <= .9 GB).
Times on main are about 1.9-2.5hr per shard
This commit is around 1.6-2hr per shard
Risks: increase in flaky tests due to OOM
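Roughly, the new split works out as below; the numbers are illustrative, not the actual CI script.
```python
# Illustrative arithmetic only; the real run_test.py calculation may differ.
total_gpu_mem_gb = 21   # approximate memory on an a10g / sm86 runner
reserved_gb = 1         # CUDA context + everything else, a bit under 1 GB per proc
num_procs = 3           # raised from 2 in this change

per_proc_gb = total_gpu_mem_gb / num_procs - reserved_gb
print(f"{per_proc_gb:.0f} GB per proc")  # ~6 GB; the old setup landed around 8 GB
```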
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125598
Approved by: https://github.com/huydhn
yolo
Also
* Ensure that at least 1 test always gets run (`//` truncates, which results in 0 if too few tests are discovered); see the sketch below
* Don't run test removal on slow tests - I'm not touching that yet
I am avoiding everything other than the pull + trunk workflows, so not doing this on Windows CUDA, which runs on periodic.
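Sketch of the floor on the number of kept tests (the names and reduction factor are illustrative):
```python
# `//` truncates toward zero, so a small discovered-test count can shard to 0.
num_discovered = 3
keep_every_nth = 4   # illustrative reduction factor

naive = num_discovered // keep_every_nth           # 0 -> nothing would run
to_run = max(1, num_discovered // keep_every_nth)  # always run at least one test
```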
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125049
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
A better query for the base commit of a PR.
Some ghstack PRs are not connected to main, so git merge-base doesn't work. Instead, use the GitHub API to query for the base of the PR, which should be more accurate.
Sanity checked on one of Ed's ghstack PRs
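A minimal sketch of one way to ask GitHub for the PR's base instead of relying on `git merge-base`; the REST endpoint is real, but the actual script may use a different query and would need auth and retry handling.
```python
import json
import urllib.request

def pr_base_sha(repo: str, pr_number: int) -> str:
    """Return the sha of the PR's base as reported by the GitHub API."""
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["base"]["sha"]

# e.g. pr_base_sha("pytorch/pytorch", 122214)
```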
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122214
Approved by: https://github.com/seemethere
Test the generic torch.Stream/Event with the fake device guard and hooks. Since we added a fake device backend, it is mutually exclusive with other backends; tests will be skipped if TEST_CUDA or TEST_ROCM is true.
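A hedged sketch of that skip condition; the class name is hypothetical and the TEST_CUDA/TEST_ROCM flags are re-derived locally here rather than imported from the actual test utilities.
```python
import unittest
import torch

# Local stand-ins for the flags mentioned above (assumptions for this sketch).
TEST_CUDA = torch.cuda.is_available() and torch.version.cuda is not None
TEST_ROCM = torch.cuda.is_available() and torch.version.hip is not None

@unittest.skipIf(TEST_CUDA or TEST_ROCM,
                 "fake device backend is mutually exclusive with real backends")
class TestGenericStreamEvent(unittest.TestCase):  # hypothetical name
    def test_stream_and_event(self):
        ...
```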
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
Fixes round robin sharding when there are no test times and sort_by_time=False.
Adds more tests to test_test_selections for sort_by_time=False.
Adds more checks to test_split_shards_random for serial/parallel ordering + ordering of tests.
Refactors duplicated code.
Tested locally by running `python test/run_test.py --shard 3 5` with no test times downloaded and checked that the resulting shard wasn't an empty list.
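For illustration, a minimal round-robin sharder for the no-timing-data case; this is not the actual run_test.py helper.
```python
def round_robin_shard(tests: list[str], which_shard: int, num_shards: int) -> list[str]:
    """Deal tests out like cards; which_shard is 1-indexed to match `--shard 3 5`."""
    return [t for i, t in enumerate(tests) if i % num_shards == which_shard - 1]

# With 12 discovered tests and `--shard 3 5`, shard 3 still gets a non-empty slice:
print(round_robin_shard([f"test_{i}" for i in range(12)], which_shard=3, num_shards=5))
```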
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121022
Approved by: https://github.com/huydhn, https://github.com/osalpekar
Moves tests that are mentioned in the PR body or commit message to the front. Also attempts to find any issues/PRs mentioned in the PR body and searches those too (e.g. if you link a disable issue and that issue contains the test file that was failing).
looking for: dynamo/test_export_mutations
Also removes some printed information in TD
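A hypothetical sketch of pulling test-file mentions out of a PR body; the regex and the `looking for:` log line mirror the example above but are not the actual TD code.
```python
import re

# Matches paths like "dynamo/test_export_mutations" or "test/dynamo/test_export_mutations.py".
TEST_MENTION = re.compile(r"\b(?:test/)?((?:\w+/)*test_\w+)(?:\.py)?\b")

def mentioned_tests(pr_body: str) -> set[str]:
    found = set(TEST_MENTION.findall(pr_body))
    for test_file in sorted(found):
        print(f"looking for: {test_file}")
    return found

# mentioned_tests("failing in dynamo/test_export_mutations")
# -> {"dynamo/test_export_mutations"}
```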
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120621
Approved by: https://github.com/osalpekar