pytorch/tools/stats
Catherine Lee 06b52dd103 TD outside of test job (#118250)
Give TD (target determination) its own job so that each shard can get the results from this one job's artifact. The shards will then always be in sync with each other, and we no longer need to worry about consistency issues.

* Move test discovery to its own file that is not dependent on torch so it can be run without building torch
  * Cannot do cpp test discovery before building pytorch
* Move TD calculation to own file that will create a json file with the final results
* TD is now job/build env agnostic
* TD will rank all tests, including those that test jobs may not want to run (e.g., it will rank distributed tests along with default tests, even though these tests are never run together on the same machine)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118250
Approved by: https://github.com/huydhn
2024-03-01 23:08:10 +00:00
__init__.py
check_disabled_tests.py Handle the list of skipped messages when uploading disabled test stats (#104803) 2023-07-08 07:23:46 +00:00
export_test_times.py TD outside of test job (#118250) 2024-03-01 23:08:10 +00:00
import_test_stats.py [td] Consistent pytest cache (#113804) 2023-11-17 23:45:47 +00:00
monitor.py Bump black version to 23.1.0 (#96578) 2023-03-15 06:27:59 +00:00
README.md
upload_artifacts.py
upload_dynamo_perf_stats.py Specify the head branch when upload perf stats to Rockset (#97643) 2023-03-27 17:17:52 +00:00
upload_external_contrib_stats.py Update list of bots in upload_external_contrib_stats.py (#102786) 2023-06-02 18:34:22 +00:00
upload_metrics.py TD outside of test job (#118250) 2024-03-01 23:08:10 +00:00
upload_sccache_stats.py
upload_stats_lib.py Make emit_metrics importable without having boto3 installed (#107070) 2023-08-21 21:13:01 +00:00
upload_test_stat_aggregates.py Access ROCKSET_API_KEY from ephemeral runners (#107652) 2023-08-22 17:02:44 +00:00
upload_test_stats.py [ez] Remove unused code in upload_test_stats (#111504) 2023-10-19 16:09:15 +00:00

PyTorch CI Stats

We track various stats about each CI job.

  1. Jobs upload their artifacts to an intermediate data store (either GitHub Actions artifacts or S3, depending on what permissions the job has). Example: a9f6a35a33/.github/workflows/_linux-build.yml (L144-L151)
  2. When a workflow completes, a workflow_run event triggers upload-test-stats.yml.
  3. upload-test-stats downloads the raw stats from the intermediate data store and uploads them as JSON to Rockset, our metrics backend.
graph LR
    J1[Job with AWS creds<br>e.g. linux, win] --raw stats--> S3[(AWS S3)]
    J2[Job w/o AWS creds<br>e.g. mac] --raw stats--> GHA[(GH artifacts)]

    S3 --> uts[upload-test-stats.yml]
    GHA --> uts

    uts --json--> R[(Rockset)]
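
The flow above can be sketched in a few lines of Python. This is only an illustrative model, not the code that CI actually runs: the in-memory dictionaries stand in for S3, GitHub Actions artifacts, and the metrics backend, and every name here (`upload_raw_stats`, `upload_test_stats`, `has_aws_creds`, the `tests`/`failures` fields) is hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical in-memory stand-ins for the two intermediate stores
# and for the metrics backend. No real service is contacted.
S3: Dict[str, str] = {}
GHA_ARTIFACTS: Dict[str, str] = {}
METRICS_BACKEND: List[dict] = []


def upload_raw_stats(job_name: str, raw_stats: dict, has_aws_creds: bool) -> str:
    """Step 1: a CI job writes raw stats to whichever store it may use.

    Returns the name of the store that was chosen.
    """
    store_name = "s3" if has_aws_creds else "gha"
    store = S3 if has_aws_creds else GHA_ARTIFACTS
    store[job_name] = json.dumps(raw_stats)
    return store_name


def upload_test_stats() -> int:
    """Steps 2-3: once the workflow has completed, read the raw stats from
    both stores and append them to the backend as JSON-like documents.

    Returns the number of documents uploaded.
    """
    uploaded = 0
    for store_name, store in (("s3", S3), ("gha", GHA_ARTIFACTS)):
        for job_name, payload in store.items():
            doc = {"job": job_name, "source": store_name, **json.loads(payload)}
            METRICS_BACKEND.append(doc)
            uploaded += 1
    return uploaded


# A job with AWS credentials (e.g. linux) and one without (e.g. mac).
upload_raw_stats("linux", {"tests": 120, "failures": 1}, has_aws_creds=True)
upload_raw_stats("mac", {"tests": 80, "failures": 0}, has_aws_creds=False)
count = upload_test_stats()
```

Note that only `upload_test_stats` touches the backend in this sketch; the per-job function writes to an intermediate store and nothing else.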

Why this weird indirection? Because writing to Rockset requires special permissions which, for security reasons, we do not want to give to pull request CI. Instead, we implemented GitHub's recommended pattern for cases like this.

For more details about which stats we export, see upload-test-stats.yml.