pytorch/tools/stats/export_slow_tests.py
Andrey Talman 2e15e16f8f Excluding ASAN and periodic jobs from slow job calculation (#74253)
Summary:
Mitigates https://github.com/pytorch/pytorch/issues/72368

As per the discussion in https://github.com/pytorch/pytorch/issues/72368,
some tests take much longer under ASAN than the same tests run without ASAN:

```
test_fn_gradgrad_pca_lowrank_cpu_float64 (__main__.TestGradientsCPU) ... ok (60.780s)
test_fn_gradgrad_svd_cpu_complex128 (__main__.TestGradientsCPU) ... ok (69.131s)
test_inplace_gradgrad_cumprod_cpu_complex128 (__main__.TestGradientsCPU) ... ok (211.554s)
test_variant_consistency_jit_diff_cpu_complex64 (__main__.TestJitCPU) ... ok (67.640s)
2022-03-12T21:46:25.3026906Z   test_variant_consistency_jit_linalg_solve_triangular_cpu_float32 (__main__.TestJitCPU) ... ok (125.208s)
2022-03-12T21:48:58.0469092Z   test_variant_consistency_jit_linalg_svd_cpu_complex64 (__main__.TestJitCPU) ... ok (152.744s)
2022-03-12T21:50:11.6688335Z   test_variant_consistency_jit_linalg_svd_cpu_float32 (__main__.TestJitCPU) ... ok (73.622s)
2022-03-12T21:54:44.5263321Z   test_variant_consistency_jit_lu_solve_cpu_complex64 (__main__.TestJitCPU) ... ok (102.051s)
2022-03-12T21:55:35.6167891Z   test_variant_consistency_jit_lu_solve_cpu_float32 (__main__.TestJitCPU) ... ok (51.090s)
2022-03-12T22:00:58.4220662Z   test_variant_consistency_jit_nanquantile_cpu_float32 (__main__.TestJitCPU) ... ok (47.142s)
2022-03-12T22:12:25.8979944Z   test_variant_consistency_jit_nn_functional_max_pool1d_cpu_float32 (__main__.TestJitCPU) ... ok (494.579s)
2022-03-12T22:32:45.9750642Z   test_variant_consistency_jit_nn_functional_max_pool2d_cpu_float32 (__main__.TestJitCPU) ... ok (1220.077s)
2022-03-12T22:40:31.3121960Z   test_variant_consistency_jit_nn_functional_max_pool3d_cpu_float32 (__main__.TestJitCPU) ... ok (465.337s)
2022-03-12T22:41:56.5711967Z   test_variant_consistency_jit_nn_functional_pad_circular_cpu_complex64 (__main__.TestJitCPU) ... ok (58.542s)
2022-03-12T22:45:48.7048047Z   test_variant_consistency_jit_nn_functional_pad_constant_cpu_complex64 (__main__.TestJitCPU) ... ok (232.128s)
2022-03-12T22:47:49.1422719Z   test_variant_consistency_jit_nn_functional_pad_constant_cpu_float32 (__main__.TestJitCPU) ... ok (120.437s)
2022-03-12T22:48:48.9686822Z   test_variant_consistency_jit_nn_functional_pad_reflect_cpu_complex64 (__main__.TestJitCPU) ... ok (59.826s)
2022-03-12T22:49:49.2502012Z   test_variant_consistency_jit_nn_functional_pad_replicate_cpu_complex64 (__main__.TestJitCPU) ... ok (60.272s)
2022-03-12T22:51:02.5255728Z   test_variant_consistency_jit_nn_functional_poisson_nll_loss_cpu_float32 (__main__.TestJitCPU) ... ok (73.208s)
2022-03-12T22:54:23.8291811Z   test_variant_consistency_jit_norm_cpu_complex64 (__main__.TestJitCPU) ... ok (136.107s)
2022-03-12T22:55:33.0800761Z   test_variant_consistency_jit_norm_cpu_float32 (__main__.TestJitCPU) ... ok (69.251s)
2022-03-12T22:57:50.4699741Z   test_variant_consistency_jit_ormqr_cpu_complex64 (__main__.TestJitCPU) ... ok (105.720s)
2022-03-12T22:58:46.2191192Z   test_variant_consistency_jit_ormqr_cpu_float32 (__main__.TestJitCPU) ... ok (55.749s)
2022-03-12T23:01:40.5424782Z   test_variant_consistency_jit_prod_cpu_complex64 (__main__.TestJitCPU) ... ok (89.440s)
2022-03-12T23:03:20.2300845Z   test_variant_consistency_jit_put_cpu_complex64 (__main__.TestJitCPU) ... ok (55.004s)
2022-03-12T23:05:34.2481242Z   test_variant_consistency_jit_qr_cpu_complex64 (__main__.TestJitCPU) ... ok (106.490s)
2022-03-12T23:06:26.1268335Z   test_variant_consistency_jit_qr_cpu_float32 (__main__.TestJitCPU) ... ok (51.879s)
2022-03-12T23:07:51.5261184Z   test_variant_consistency_jit_quantile_cpu_float32 (__main__.TestJitCPU) ... ok (85.399s)
test_variant_consistency_jit_sort_cpu_float32 (__main__.TestJitCPU) ... ok (84.378s)
2022-03-12T23:23:48.6314435Z   test_variant_consistency_jit_sum_cpu_complex64 (__main__.TestJitCPU) ... ok (55.706s)
2022-03-12T23:24:19.2219967Z   test_variant_consistency_jit_sum_cpu_float32 (__main__.TestJitCPU) ... ok (30.590s)
2022-03-12T23:36:45.9809917Z   test_variant_consistency_jit_svd_cpu_complex64 (__main__.TestJitCPU) ... ok (746.744s)
2022-03-12T23:42:42.7827088Z   test_variant_consistency_jit_svd_cpu_float32 (__main__.TestJitCPU) ... ok (356.802s)
2022-03-12T23:47:12.7248896Z   test_variant_consistency_jit_tile_cpu_complex64 (__main__.TestJitCPU) ... ok (85.721s)
```
-------------

This PR excludes ASAN and periodic jobs from the slow job calculation.

Tested by printing the matched job names instead of skipping them with `continue`:

```
python export_slow_test.py
Overwriting existent file: .pytorch-slow-tests.json
linux-xenial-py3.7-clang7-asan-test
linux-xenial-py3.7-clang7-asan-test
linux-xenial-py3.7-clang7-asan-test
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-test
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck-test
```
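
To illustrate why the exclusion matters, here is a minimal sketch (not the code from this PR) of how a single ASAN timing can push a test's mean above the 60 second threshold. The `is_ignored` and `mean_times` helpers, the non-ASAN job name, and the 20.0s timing are hypothetical; the ASAN job name and the 211.554s timing are taken from the logs above.

```python
import statistics
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

IGNORED_JOBS = ["asan", "periodic"]  # same substrings this PR adds
SLOW_TEST_CASE_THRESHOLD_SEC = 60.0


def is_ignored(build_job: str) -> bool:
    """Hypothetical helper: True if any ignored substring occurs in the job name."""
    return any(job_name in build_job for job_name in IGNORED_JOBS)


def mean_times(samples: List[Tuple[str, str, float]], skip_ignored: bool) -> Dict[str, float]:
    """Hypothetical helper: average the timings per test, optionally skipping ignored jobs."""
    times: DefaultDict[str, List[float]] = defaultdict(list)
    for build_job, test_name, seconds in samples:
        if skip_ignored and is_ignored(build_job):
            continue
        times[test_name].append(seconds)
    return {name: statistics.mean(values) for name, values in times.items()}


TEST = "test_inplace_gradgrad_cumprod_cpu_complex128 (__main__.TestGradientsCPU)"
SAMPLES = [
    # ASAN job name and timing as seen in the logs above
    ("linux-xenial-py3.7-clang7-asan-test", TEST, 211.554),
    # hypothetical non-ASAN job name and timing, for illustration only
    ("linux-xenial-py3.7-gcc5.4-test", TEST, 20.0),
]

with_asan = mean_times(SAMPLES, skip_ignored=False)
without_asan = mean_times(SAMPLES, skip_ignored=True)

print(with_asan[TEST] >= SLOW_TEST_CASE_THRESHOLD_SEC)     # True: marked slow
print(without_asan[TEST] >= SLOW_TEST_CASE_THRESHOLD_SEC)  # False: not marked slow
```

Note that the match is a plain substring test on the job name, so any job whose name contains `asan` or `periodic` is skipped.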

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74253

Reviewed By: janeyx99

Differential Revision: D34906097

Pulled By: atalman

fbshipit-source-id: 48103f970e0c4e8683aebd8f78b7d7546fdb35d4
(cherry picked from commit 233e5942363e56c60a6f3d68aa09865158374acc)
2022-03-16 15:16:27 +00:00


#!/usr/bin/env python3

import argparse
import json
import os
import statistics
from collections import defaultdict
from tools.stats.s3_stat_parser import get_previous_reports_for_branch, Report, Version2Report
from typing import cast, DefaultDict, Dict, List, Any
from urllib.request import urlopen

SLOW_TESTS_FILE = '.pytorch-slow-tests.json'
SLOW_TEST_CASE_THRESHOLD_SEC = 60.0
RELATIVE_DIFFERENCE_THRESHOLD = 0.1
IGNORED_JOBS = ["asan", "periodic"]


def get_test_case_times() -> Dict[str, float]:
    reports: List[Report] = get_previous_reports_for_branch('origin/viable/strict', "")
    # an entry will be like ("test_doc_examples (__main__.TestTypeHints)" -> [values]))
    test_names_to_times: DefaultDict[str, List[float]] = defaultdict(list)
    for report in reports:
        if report.get('format_version', 1) != 2:  # type: ignore[misc]
            raise RuntimeError("S3 format currently handled is version 2 only")
        v2report = cast(Version2Report, report)

        if any(job_name in str(report['build_job']) for job_name in IGNORED_JOBS):
            continue

        for test_file in v2report['files'].values():
            for suitename, test_suite in test_file['suites'].items():
                for casename, test_case in test_suite['cases'].items():
                    # The below attaches a __main__ as that matches the format of test.__class__ in
                    # common_utils.py (where this data will be used), and also matches what the output
                    # of a running test would look like.
                    name = f'{casename} (__main__.{suitename})'
                    succeeded: bool = test_case['status'] is None
                    if succeeded:
                        test_names_to_times[name].append(test_case['seconds'])
    return {test_case: statistics.mean(times) for test_case, times in test_names_to_times.items()}


def filter_slow_tests(test_cases_dict: Dict[str, float]) -> Dict[str, float]:
    return {test_case: time for test_case, time in test_cases_dict.items() if time >= SLOW_TEST_CASE_THRESHOLD_SEC}


def get_test_infra_slow_tests() -> Dict[str, float]:
    url = "https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json"
    contents = urlopen(url, timeout=1).read().decode('utf-8')
    return cast(Dict[str, float], json.loads(contents))


def too_similar(calculated_times: Dict[str, float], other_times: Dict[str, float], threshold: float) -> bool:
    # check that their keys are the same
    if calculated_times.keys() != other_times.keys():
        return False

    for test_case, test_time in calculated_times.items():
        other_test_time = other_times[test_case]
        relative_difference = abs((other_test_time - test_time) / max(other_test_time, test_time))
        if relative_difference > threshold:
            return False
    return True


def export_slow_tests(options: Any) -> None:
    filename = options.filename
    if os.path.exists(filename):
        print(f'Overwriting existent file: {filename}')
    with open(filename, 'w+') as file:
        slow_test_times: Dict[str, float] = filter_slow_tests(get_test_case_times())
        if options.ignore_small_diffs:
            test_infra_slow_tests_dict = get_test_infra_slow_tests()
            if too_similar(slow_test_times, test_infra_slow_tests_dict, options.ignore_small_diffs):
                slow_test_times = test_infra_slow_tests_dict
        json.dump(slow_test_times, file, indent=' ', separators=(',', ': '), sort_keys=True)
        file.write('\n')


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description='Export a JSON of slow test cases in PyTorch unit test suite')
    parser.add_argument(
        '-f',
        '--filename',
        nargs='?',
        type=str,
        default=SLOW_TESTS_FILE,
        const=SLOW_TESTS_FILE,
        help='Specify a file path to dump slow test times from previous S3 stats. Default file path: .pytorch-slow-tests.json',
    )
    parser.add_argument(
        '--ignore-small-diffs',
        nargs='?',
        type=float,
        const=RELATIVE_DIFFERENCE_THRESHOLD,
        help='Compares generated results with stats/slow-tests.json in pytorch/test-infra. If the relative differences '
             'between test times for each test are smaller than the threshold and the set of test cases have not '
             'changed, we will export the stats already in stats/slow-tests.json. Else, we will export the calculated '
             'results. The default threshold is 10%.',
    )
    return parser.parse_args()


def main() -> None:
    options = parse_args()
    export_slow_tests(options)


if __name__ == '__main__':
    main()