tbh at this point it might be easier to make a new workflow and copy the relevant jobs...
Changes:
* Disable the CUDA memory leak check except on scheduled workflows
* Make pull and trunk also run on a schedule, which will run the memory leak check
* Periodic will always run the memory leak check -> periodic no longer uses parallelization
* Concurrency check changed to be slightly more generous
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88373
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn
Sometimes you want to query the smallest element of a collection and reach for `sorted(elements)[0]` without a second thought. However, this is not optimal, since the entire list must be sorted first (`O(n log n)`). It is better to use the built-in `min(elements)`, which is provided for exactly this purpose (`O(n)`).
Similarly, `sorted(elements)[::-1]` is not very efficient; `sorted(elements, reverse=True)` gives the same result and saves the extra slice operation.
**TLDR: using `sorted(elements)[0]` is slow and can be replaced with `min(elements)`.**
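A minimal illustration of both patterns (the `elements` list here is just a made-up example):
```python
import random

elements = [random.randint(0, 1000) for _ in range(10_000)]

# O(n log n): sorts the whole list just to read a single element
smallest_slow = sorted(elements)[0]

# O(n): a single pass, no sorting needed
smallest_fast = min(elements)
assert smallest_slow == smallest_fast

# Reversing after sorting costs an extra slice...
descending_slow = sorted(elements)[::-1]
# ...while sorted() can produce descending order directly
descending_fast = sorted(elements, reverse=True)
assert descending_slow == descending_fast
```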
I stumbled across these code snippets while playing around with CodeQL (see https://lgtm.com/query/4148064474379348546/).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86995
Approved by: https://github.com/jansel
Run tests in parallel at the test file granularity.
Runs 3 test files in parallel using a multiprocessing pool; each file's output goes to a log file, which is then printed when the test finishes. Some tests cannot be run in parallel (usually due to memory constraints), so we run those afterwards. Sharding is changed to attempt to mask large files with other large files by running them on the same shard.
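A rough sketch of the mechanism, not the actual run_test.py implementation; the test file list and helper below are hypothetical:
```python
import multiprocessing
import subprocess
import tempfile

# Hypothetical subset of test files that are safe to run concurrently
PARALLEL_TESTS = ["test_torch.py", "test_nn.py", "test_autograd.py"]

def run_test_file(test_file):
    """Run one test file, capturing its output in a temporary log file."""
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".log") as log:
        ret = subprocess.run(
            ["python", test_file], stdout=log, stderr=subprocess.STDOUT
        ).returncode
        log.seek(0)
        return test_file, ret, log.read()

if __name__ == "__main__":
    # Run 3 files at a time; print each file's captured output as it finishes
    with multiprocessing.Pool(3) as pool:
        for name, ret, output in pool.imap_unordered(run_test_file, PARALLEL_TESTS):
            print(f"===== {name} (exit code {ret}) =====")
            print(output)
```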
test_ops* gets a custom handler because it is simply too big (2 hours on Windows) and linalg_cholesky fails otherwise (I would really like a solution to this if possible, but until then we use the custom handler).
This reduces CUDA test time by a lot and total Windows test time by ~1 hour.
Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR makes the change in every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor.
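For illustration, this is the kind of rewrite involved (the variable names are made up):
```python
# Before: flagged by pylint's use-dict-literal check
config = dict()
options = dict(verbose=True, retries=3)

# After: dict literals, which avoid the extra constructor call
config = {}
options = {"verbose": True, "retries": 3}
```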
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718
Approved by: https://github.com/albanD
With ufmt in place https://github.com/pytorch/pytorch/pull/81157, we can now use it to gradually format all files. I'm breaking this down into multiple smaller batches to avoid too many merge conflicts later on.
This batch (as copied from the current BLACK linter config):
* `tools/**/*.py`
Upcoming batches:
* `torchgen/**/*.py`
* `torch/package/**/*.py`
* `torch/onnx/**/*.py`
* `torch/_refs/**/*.py`
* `torch/_prims/**/*.py`
* `torch/_meta_registrations.py`
* `torch/_decomp/**/*.py`
* `test/onnx/**/*.py`
Once they are all formatted, the BLACK linter will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81285
Approved by: https://github.com/suo
`JOB_BASE_NAME` was a holdover from Jenkins compatibility. Eventually,
it morphed into always being set to the build environment plus `-test` or
`-build`, and we used it to detect whether we were in a build or a test job.
That's sort of pointless, so this removes it and fixes up the few remaining
use cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80046
Approved by: https://github.com/malfet, https://github.com/janeyx99
In the case of target determination, this is just removing comments that
refer to non-existent code.
In the case of the test specification code, this removes (what I believe
to be) an unused feature. If we're using this somehow, let me know and I
can revise the PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79372
Approved by: https://github.com/janeyx99
Sharding for linux-bionic-py3.7-clang9 previously included slow test times in the calculation for how long a test takes, causing the sharding to be uneven:
| Duration | Count | Name |
| --- | --- | --- |
| 11.2m | 221 | linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) |
| 1.1h | 218 | linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) |
Numbers taken from https://hud.pytorch.org/metrics from 04/10/2022 12:20 PM to 04/17/2022 12:20 PM.
The duration of these jobs on this PR are 39m and 38m.
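A minimal sketch of duration-based sharding with slow-test time excluded; the test names and times below are made up for illustration:
```python
# Hypothetical (file, total_seconds, slow_test_seconds) records
test_times = [
    ("test_ops", 3600.0, 2400.0),
    ("test_nn", 1800.0, 0.0),
    ("test_torch", 1200.0, 300.0),
    ("test_autograd", 900.0, 0.0),
]

def shard_tests(times, num_shards=2):
    # Greedy bin packing: longest job first, onto the currently lightest shard,
    # using only the non-slow portion of each file's duration for balancing.
    shards = [{"total": 0.0, "tests": []} for _ in range(num_shards)]
    for name, total, slow in sorted(times, key=lambda t: t[1] - t[2], reverse=True):
        lightest = min(shards, key=lambda s: s["total"])
        lightest["tests"].append(name)
        lightest["total"] += total - slow
    return shards

for i, shard in enumerate(shard_tests(test_times), start=1):
    print(f"shard {i}: {shard['tests']} (~{shard['total'] / 60:.1f} min)")
```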
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75918
Approved by: https://github.com/seemethere, https://github.com/janeyx99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71178
This should no longer be needed as we now set a lifecycle policy on ECR
and we also don't generate lots of temporary containers anymore.
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D33537851
Pulled By: suo
fbshipit-source-id: b97b7525be6f62ec8771dfb6a7ee13b22b78ac5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64721
There are a couple of PRs where distributed CI did not run and we expected
it to. Examples:
https://github.com/pytorch/pytorch/pull/64513/checks?check_run_id=3539190960,
https://github.com/pytorch/pytorch/pull/64113. All distributed tests should've
been run on these PRs, but we can see they were not:
```
Determination is skipping distributed/test_c10d_common
Determination is skipping distributed/test_c10d_gloo
Determination is skipping distributed/test_c10d_nccl
Determination is skipping distributed/test_c10d_spawn_gloo
Determination is skipping distributed/test_c10d_spawn_nccl
Running distributed/test_data_parallel without determination
Determination is skipping distributed/test_distributed_spawn
Determination is skipping distributed/test_jit_c10d
```
Since it is important to run distributed tests on PRs that touch distributed,
exclude distributed from target_det_list for now.
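As an illustration only (the real list lives in the target determination code, and these entries are hypothetical), dropping the distributed files from the list means the determinator can never skip them:
```python
# Tests on this list are eligible to be skipped by target determination
TARGET_DET_LIST = [
    "test_nn",
    "test_torch",
    "distributed/test_c10d_common",        # removed by this change
    "distributed/test_distributed_spawn",  # removed by this change
]

# Excluding the distributed entries means they run on every PR
TARGET_DET_LIST = [t for t in TARGET_DET_LIST if not t.startswith("distributed/")]
```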
ghstack-source-id: 137654015
Test Plan: CI
Reviewed By: driazati, mrshenli
Differential Revision: D30830455
fbshipit-source-id: 8b0fdf5b57c2c647b0d82c48e2bb8e2bdbe4d307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64598
This adds a filter option rather than an all-or-nothing switch, so it's easier to iterate on a specific job.
```bash
python tools/testing/explicit_ci_jobs.py --filter-gha '*generated-linux-*gcc5.4*'
```
See #64600 for an example usage
NB: If you regenerate the workflows you will need to re-run that command to re-delete everything.
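The glob-style filtering behaves roughly like this (a simplified sketch, not the script's actual code; the job names are examples):
```python
import fnmatch

# Hypothetical list of generated GitHub Actions job names
all_jobs = [
    "generated-linux-xenial-py3.6-gcc5.4",
    "generated-linux-bionic-py3.8-gcc9",
    "generated-win-vs2019-cuda11.3-py3",
]

pattern = "*generated-linux-*gcc5.4*"
kept = [job for job in all_jobs if fnmatch.fnmatch(job, pattern)]
print(kept)  # ['generated-linux-xenial-py3.6-gcc5.4']
```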
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D30788850
Pulled By: driazati
fbshipit-source-id: a32c266bbd876c396665bceef9a0a961b4586564
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400
This is the first step to break up test_autograd.py for #63205.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30541499
Pulled By: dagitses
fbshipit-source-id: 8d9d32007938b9eade0e88f95a6a3190e7e2ef01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64102
We sort this list so that we may add comments to indicate the absence
of a file right where that file would need to be put. This makes it
difficult to wrongly add such a file.
The sorting itself was done programmatically to ensure that no entries
were inadvertently removed.
I printed the sorted list with:
```
for p in sorted(TARGET_DET_LIST):
    print(f' "{p}",')
```
Then copied it back into the file.
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D30625076
Pulled By: dagitses
fbshipit-source-id: cf36fcb3e53e274b76d1f4aae83da1f53c03f9ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809
This moves the modulefinder determinator out to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D30497438
Pulled By: driazati
fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
Summary:
and into tools/ folder
Currently run_test.py invokes tools/test_selections.py, which (sketched below):
1. downloads and analyzes which test files to run
2. downloads and parses S3 stats and passes the info to local files
3. common_utils.py then uses the downloaded S3 stats to determine which test cases to run
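A purely hypothetical sketch of this kind of selection logic; the stats file name, format, and helpers are assumptions for illustration, not the real implementation:
```python
import json
from pathlib import Path

def load_test_times(stats_file=".pytorch-test-times.json"):
    """Load previously downloaded per-file test durations (in seconds)."""
    path = Path(stats_file)
    return json.loads(path.read_text()) if path.exists() else {}

def reorder_tests(test_files, times):
    """Run historically slow files first so a straggler doesn't finish last."""
    return sorted(test_files, key=lambda f: times.get(f, 0.0), reverse=True)

times = load_test_times()
print(reorder_tests(["test_torch", "test_ops", "test_nn"], times))
```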
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479
Reviewed By: janeyx99
Differential Revision: D29661986
Pulled By: walterddr
fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
Summary:
run_test.py currently does lots of downloading and test file/suite/case parsing, and it doesn't work well outside of the CI environment.
Restructured run_test.py, created tools/test/test_selections.py, and moved all test selection logic there (reordering, categorizing slow tests, creating shards).
Follow-up PRs should:
- refactor the file read/write logic entangled inside test_selections.py into the stats/ folder
- restructure and add network-independent test logic to test_test_selections.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124
Test Plan:
- tools/test
- CI
Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373
Reviewed By: malfet
Differential Revision: D29558981
Pulled By: walterddr
fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60711
We already build the docs on each PR; this adds a step to push the relevant folder of the docs. (We build the entire website for pytorch.github.io, which clocks in at around 500 MB, but we really only need the "master" docs, not every version. The master docs by themselves are around 50 MB, which is more reasonable.) It uses the same S3 bucket as the artifacts but places the items at the `pytorch/pytorch/pr-previews/<pr number>` prefix. The bucket has a rule to expire resources in that prefix after 1 month.
On the AWS side the bucket has static hosting enabled with CloudFront directing to the docs preview prefix, so you can see the output at `https://d28slxzaq48q8t.cloudfront.net/<pr number>/`, e.g. https://d28slxzaq48q8t.cloudfront.net/60711/. For advertising, we could link this on the HUD PR page as well as in the Dr. CI comment. We could add a CNAME on CloudFront to make this be `pr-preview.pytorch.org/<pr number>` or something, but having random PRs be able to host content on the pytorch.org domain seems sketchy.
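A minimal sketch of what the upload step could look like with boto3; the bucket name, docs directory, and helper are assumptions, not the actual workflow code:
```python
import mimetypes
from pathlib import Path

import boto3

def upload_docs_preview(docs_dir, bucket, pr_number):
    """Upload the built master docs under the pr-previews/<pr number> prefix."""
    s3 = boto3.client("s3")
    prefix = f"pytorch/pytorch/pr-previews/{pr_number}"
    for path in Path(docs_dir).rglob("*"):
        if not path.is_file():
            continue
        key = f"{prefix}/{path.relative_to(docs_dir)}"
        content_type = mimetypes.guess_type(str(path))[0] or "binary/octet-stream"
        s3.upload_file(str(path), bucket, key, ExtraArgs={"ContentType": content_type})

# e.g. upload_docs_preview("docs/build/html", "hypothetical-artifacts-bucket", 60711)
```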
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D29398818
Pulled By: driazati
fbshipit-source-id: 24032854d83815853b3650d8e54f60b684707f76
Summary:
Changes include:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo
Next step
- introduce `build/` folder for build scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473
Test Plan:
- CI (this is important b/c pytorch/test-infra also relies on some of these script references)
- tools/tests/
Reviewed By: albanD
Differential Revision: D29352716
Pulled By: walterddr
fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd