I guess this is more of an RFC
Goal:
Enable `keep-going` so that we get information about failures immediately. We want to be aware of failures as soon as possible, especially on the main branch, so that reverts can happen quickly.
Proposal:
A job with `keep-going` will continue through errors in `python run_test.py`. If a test fails, then before the next test runs, the job will upload a fake log containing enough information that viewing it tells you what failed along with any stack traces/error output, and that the log classifier can parse it to pick out a line.
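A minimal sketch of the keep-going loop, just to illustrate the ordering (fail, upload partial log, continue); the test list, the upload helper, and the flag handling are placeholders, not run_test.py's actual structure:

```python
# Sketch of the keep-going flow; TESTS and upload_partial_log are placeholders.
import subprocess
import sys

KEEP_GOING = "--keep-going" in sys.argv
TESTS = ["test_ops", "test_nn", "test_autograd"]  # placeholder test list


def upload_partial_log(failed_so_far):
    # Placeholder: in the proposal this would concatenate the pytest output
    # under test/test-reports and push it as the job's "fake" log.
    print(f"[keep-going] uploading partial log, failures so far: {failed_so_far}")


failures = []
for test in TESTS:
    ret = subprocess.run([sys.executable, "-m", "pytest", f"test/{test}.py"]).returncode
    if ret != 0:
        failures.append(test)
        if not KEEP_GOING:
            break
        # Upload now so HUD / the log classifier can surface the failure
        # before the rest of the suite finishes.
        upload_partial_log(failures)

sys.exit(1 if failures else 0)
```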
I am building the log by concatenating the test logs in test/test-reports, which hold all the text output by pytest (unless someone runs with the `ci-verbose-test-logs` label). There are obviously many things this won't catch, e.g. output outside of run_test.py and some output inside of run_test.py, but it should be enough.
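A sketch of the concatenation step, assuming the pytest output sits in `*.log` files under test/test-reports (the file extension and output path are assumptions):

```python
# Build the "fake" log by concatenating whatever pytest wrote under
# test/test-reports into a single file that can be uploaded.
from pathlib import Path

REPORTS_DIR = Path("test/test-reports")
OUTPUT_LOG = Path("partial-job-log.txt")


def build_fake_log() -> Path:
    with OUTPUT_LOG.open("w", encoding="utf-8") as out:
        # Sort for a stable ordering so repeated uploads look the same.
        for log_file in sorted(REPORTS_DIR.rglob("*.log")):
            out.write(f"===== {log_file} =====\n")
            out.write(log_file.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")
    return OUTPUT_LOG


if __name__ == "__main__":
    print(f"wrote {build_fake_log()}")
```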
After a job finishes, its raw log is eventually uploaded to the ossci-raw-job-status s3 bucket and the log classifier will read it to do classification. This means we will have to change the log classifier to read from this bucket as well.
I'm thinking we just add an input parameter to the log classifier, like https://github.com/pytorch/test-infra/pull/6723/files
Also, upload the temporary classification results to a temp attribute instead of the real one.
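To show the idea (not the real classifier, which lives in pytorch/test-infra), here is a hedged Python sketch where the bucket is an input parameter and results go under a temporary attribute; `classify`, `temp_classification`, and the error regex are all assumptions for illustration:

```python
# Sketch: classifier that takes the bucket as a parameter and writes its
# result under a temp attribute name instead of the real one.
import json
import re

import boto3

s3 = boto3.client("s3")
ERROR_RE = re.compile(r"(FAILED|Error:|AssertionError)")


def classify(bucket: str, key: str, use_temp_attribute: bool = True) -> dict:
    log = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8", "replace")
    # Pick the first matching line as the "classification", as a stand-in
    # for the real rule-based classifier.
    line = next((l for l in log.splitlines() if ERROR_RE.search(l)), None)
    attribute = "temp_classification" if use_temp_attribute else "classification"
    return {attribute: line}


if __name__ == "__main__":
    # Example: point the classifier at the fake-log bucket from the proposal.
    result = classify("ossci-raw-job-status", "fake-logs/12345.txt")
    print(json.dumps(result, indent=2))
```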
To overwrite the conclusion on HUD, I'm thinking of a lambda with an S3 put trigger that fires when the fake log lands in s3, and that does something similar to the log classifier where it just mutates the entry.
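A hedged sketch of that lambda, assuming the fake log is keyed by job id and that the job entry lives in a DynamoDB table; the table name, key layout, and `conclusion` field are assumptions for illustration, not HUD's actual backing store:

```python
# S3-put-triggered lambda that flips the job's conclusion so HUD shows the
# failure before the keep-going job actually finishes.
import boto3

dynamodb = boto3.resource("dynamodb")
jobs_table = dynamodb.Table("torchci-workflow-job")  # assumed table name


def lambda_handler(event, context):
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        # Assume the fake log is uploaded under a key like "fake-logs/<job_id>.txt".
        job_id = key.rsplit("/", 1)[-1].removesuffix(".txt")
        jobs_table.update_item(
            Key={"dynamoKey": f"pytorch/pytorch/{job_id}"},  # assumed key format
            UpdateExpression="SET conclusion = :c",
            ExpressionAttributeValues={":c": "failure"},
        )
    return {"updated": len(event["Records"])}
```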