pytorch/tools/stats
Yang Wang f76f4abf3f Track monitor (#156907)
Track GPU memory allocation. We were previously tracking GPU memory bandwidth, but memory allocation is the metric that reflects whether the GPU is OOM or not. A UI fix is upcoming.

UI fix: https://github.com/pytorch/test-infra/pull/6878/files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156907
Approved by: https://github.com/huydhn
2025-07-18 22:54:13 +00:00
upload_utilization_stats [Monitoring] enable local logs and add mac test monitoring (#153454) 2025-05-20 17:14:40 +00:00
__init__.py
check_disabled_tests.py [BE] fix typos in tools/ (#156082) 2025-06-17 19:25:50 +00:00
export_test_times.py Revert "Use absolute path path.resolve() -> path.absolute() (#129409)" 2025-01-04 14:17:20 +00:00
import_test_stats.py PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
monitor.py Track monitor (#156907) 2025-07-18 22:54:13 +00:00
README.md
sccache_stats_to_benchmark_format.py PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
test_dashboard.py
upload_artifacts.py Revert "Use absolute path path.resolve() -> path.absolute() (#129409)" 2025-01-04 14:17:20 +00:00
upload_dynamo_perf_stats.py Enable ruff rule S324 (#147665) 2025-02-25 18:27:34 +00:00
upload_external_contrib_stats.py Fix broken URLs (#152237) 2025-04-27 09:56:42 +00:00
upload_metrics.py
upload_sccache_stats.py
upload_stats_lib.py [CI] test upload: better check for if job is rerun disabled tests (#148027) 2025-02-28 00:04:33 +00:00
upload_test_stats_intermediate.py
upload_test_stats_running_jobs.py PEP585 update - benchmarks tools torchgen (#145101) 2025-01-18 05:05:07 +00:00
upload_test_stats.py Fix flaky "Upload test stats" job (#143991) 2024-12-30 21:40:01 +00:00
utilization_stats_lib.py Track monitor (#156907) 2025-07-18 22:54:13 +00:00

PyTorch CI Stats

We track various stats about each CI job.

  1. Jobs upload their artifacts to an intermediate data store (either GitHub Actions artifacts or S3, depending on what permissions the job has). Example: a9f6a35a33/.github/workflows/_linux-build.yml (L144-L151)
  2. When a workflow completes, a workflow_run event triggers upload-test-stats.yml.
  3. upload-test-stats downloads the raw stats from the intermediate data store and uploads them as JSON to S3, from which they are then loaded into our database backend.
```mermaid
graph LR
    J1[Job with AWS creds<br>e.g. linux, win] --raw stats--> S3[(AWS S3)]
    J2[Job w/o AWS creds<br>e.g. mac] --raw stats--> GHA[(GH artifacts)]

    S3 --> uts[upload-test-stats.yml]
    GHA --> uts

    uts --json--> s3[(s3)]
    s3 --> DB[(database)]
```
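
To make step 3 more concrete, here is a minimal sketch of the "raw stats in, JSON out" transformation. It is not the code in upload_test_stats.py. It assumes the raw stats are JUnit-style XML test reports and that the JSON is written as gzipped, newline-delimited records. The function names (`summarize_junit_xml`, `to_gzipped_json_lines`), the record fields, and `SAMPLE_REPORT` are all hypothetical and exist only for illustration. The actual upload to S3 is left out so the sketch runs without credentials.

```python
from __future__ import annotations

import gzip
import json
import xml.etree.ElementTree as ET

# Hypothetical raw report, standing in for a file downloaded from the
# intermediate data store (GitHub Actions artifacts or S3).
SAMPLE_REPORT = """
<testsuite name="test_example">
  <testcase classname="TestFoo" name="test_ok" time="0.5"/>
  <testcase classname="TestFoo" name="test_bad" time="1.25">
    <failure message="boom"/>
  </testcase>
  <testcase classname="TestFoo" name="test_skip" time="0">
    <skipped/>
  </testcase>
</testsuite>
"""


def summarize_junit_xml(xml_text: str, workflow_run_id: int) -> list[dict]:
    """Flatten one JUnit-style XML report into JSON-ready records.

    The record schema here is an assumption made for this sketch.
    """
    root = ET.fromstring(xml_text)
    records = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "failed"
        elif case.find("skipped") is not None:
            status = "skipped"
        else:
            status = "passed"
        records.append(
            {
                "workflow_run_id": workflow_run_id,
                "classname": case.get("classname", ""),
                "name": case.get("name", ""),
                "time": float(case.get("time", "0")),
                "status": status,
            }
        )
    return records


def to_gzipped_json_lines(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON and gzip the result."""
    body = "\n".join(json.dumps(record, sort_keys=True) for record in records)
    return gzip.compress(body.encode("utf-8"))


if __name__ == "__main__":
    docs = summarize_junit_xml(SAMPLE_REPORT, workflow_run_id=123)
    payload = to_gzipped_json_lines(docs)
    print(f"{len(docs)} records, {len(payload)} bytes compressed")
```

In a real pipeline the resulting bytes would be written to the S3 bucket by the privileged workflow, and only that workflow would hold the credentials needed to do so.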

Why this weird indirection? Because writing to the database requires special permissions which, for security reasons, we do not want to give to pull request CI. Instead, we implemented GitHub's recommended pattern for cases like this.

For more details about which stats we export, see upload-test-stats.yml.