1) Always generate expected_results.csv up to the accuracy of the first three significant digits.
ex: 112313212312 --> 112000000000
2) Regenerate all records in expected_results.csv, not just the failed ones. Why? Because if a result changes by 1.3% while the noise threshold is 1.5%, the check still passes, but we want the new baseline to reflect that drift (see the regeneration sketch after the transcript below).
3) Add the message "please update all results that changed significantly, and not only the failed ones" to the output.
```
(myenv) [lsakka@devgpu005.nha1 ~/pytorch/benchmarks/dynamo/pr_time_benchmarks (check_result_ehancements)]$ python check_results.py test_check_result/expected_test.csv test_check_result/result_test.csv out
WIN: benchmark ('a', 'instruction count') failed, actual result 9011111111 is -18.16% lower than expected 11011111111 ±1.00% please update the expected results.
please update all results that changed significantly, and not only the failed ones
REGRESSION: benchmark ('b', 'memory') failed, actual result 20011111111 is 99.89% higher than expected 10011111111 ±+10.00% if this is an expected regression, please update the expected results.
please update all results that changed significantly, and not only the failed ones
REGRESSION: benchmark ('c', 'something') failed, actual result 107111111111 is 969.92% higher than expected 10011111111 ±+10.00% if this is an expected regression, please update the expected results.
please update all results that changed significantly, and not only the failed ones
MISSING REGRESSION TEST: benchmark ('d', 'missing-test') does not have a regression test enabled for it.
new expected results file content if needed:
a,instruction count,9011000000,0.01
b,memory,20010000000,0.1
c,something,107100000000,0.1
There was some failures you can use the new reference expected result stored at path:out and printed above
```
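To illustrate point 2, a hedged sketch of regenerating the reference file: every record is rewritten from the fresh measurements, not only the failing ones, so a 1.3% drift under a 1.5% noise threshold still lands in the new baseline. The CSV layout follows the output above; the function name is an assumption, and truncate_significant is the sketch from earlier:

```
import csv

def write_new_expected(results, noise, out_path):
    # results: {(benchmark, metric): measured value}
    # noise:   {(benchmark, metric): noise threshold as a fraction, e.g. 0.01}
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for (benchmark, metric), actual in results.items():
            # Write every record, passing or failing, so sub-threshold
            # drift is captured in the new baseline.
            writer.writerow(
                [benchmark, metric, truncate_significant(actual),
                 noise[(benchmark, metric)]]
            )
```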
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137925
Approved by: https://github.com/aorenste
update_hint_regression has been behaving well, so I am setting a 2% noise threshold for it, and 1.5% for sum_floordiv_regression.
I have one concern with the way we do the regression detection: small changes below the threshold will accumulate and eventually trigger a failure. To avoid that, we would have to keep an eye on the dashboard and periodically refresh the expected results file even when there are no failures.
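A toy illustration of that concern, assuming a stale baseline and a steady +1% drift per run against a 2% threshold:

```
expected = 100.0     # baseline that is never refreshed
threshold_pct = 2.0
value = expected
for run in range(1, 4):
    value *= 1.01    # each run drifts +1%, below the per-run threshold
    change = (value - expected) / expected * 100
    print(f"run {run}: {value:.2f} ({change:+.2f}%)",
          "FAIL" if change > threshold_pct else "pass")
# run 1: 101.00 (+1.00%) pass
# run 2: 102.01 (+2.01%) FAIL  <- no single change exceeded 2%, yet the check fails
```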
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137548
Approved by: https://github.com/aorenste
This change also adds a signpost log for the benchmarks that pass (a hedged sketch follows the results below).
To test this, I ran:
python check_results.py test_check_result/expected_test.csv test_check_result/result_test.csv out.csv
Results:
```
WIN: benchmark ('a', 'instruction count') failed, actual result 90 is -18.18% lower than expected 110 ±1.00% please update the expected results.
REGRESSION: benchmark ('b', 'memory') failed, actual result 200 is 100.00% higher than expected 100 ±+10.00% if this is an expected regression, please update the expected results.
PASS: benchmark ('c', 'something') pass, actual result 107 +7.00% is within expected 100 ±10.00%
MISSING REGRESSION TEST: benchmark ('d', 'missing-test') does not have a regression test enabled for it.
You can use the new reference expected result stored at path: out.csv.
a,instruction count,90,0.01
b,memory,200,0.1
c,something,100,0.1
```
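A hedged sketch of the pass signpost; the signpost_event hook and its category/name strings are assumptions for illustration (torch._utils_internal.signpost_event is a no-op outside Meta-internal builds), not necessarily what check_results.py calls:

```
def log_pass_signpost(benchmark, metric, actual, expected):
    try:
        from torch._utils_internal import signpost_event  # assumed hook
    except ImportError:
        return
    signpost_event(
        "pr_time_benchmarks",   # category: assumed name
        "pass",                 # event name: assumed
        {"benchmark": benchmark, "metric": metric,
         "actual": actual, "expected": expected},
    )
```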
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137551
Approved by: https://github.com/aorenste
1. Example of a failing diff: https://github.com/pytorch/pytorch/pull/136740
2. Test this by running:
python check_results.py test_check_result/expected_test.csv test_check_result/result_test.csv
Results:
```
WIN: benchmark ('a', ' instruction count') failed, actual result 90 is 18.18% lower than expected 110 ±1.00% please update the expected results.
REGRESSION: benchmark ('b', ' memory') failed, actual result 200 is 100.00% higher than expected 100 ±10.00% if this is an expected regression, please update the expected results.
MISSING REGRESSION TEST: benchmark ('d', ' missing-test') does not have a regression test enabled for it
```
MISSING REGRESSION TEST does not fail the check, but it is logged.
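A minimal sketch of the comparison rule these messages reflect (names assumed; noise is stored as a fraction, e.g. 0.01 for ±1%, as in the CSVs above):

```
def check(key, actual, expected_rows):
    if key not in expected_rows:
        # Logged but does not fail the run.
        return f"MISSING REGRESSION TEST: benchmark {key} has no regression test"
    expected, noise = expected_rows[key]
    change = (actual - expected) / expected * 100
    if change < -noise * 100:
        return f"WIN: {key} is {change:.2f}% lower than expected"
    if change > noise * 100:
        return f"REGRESSION: {key} is {change:.2f}% higher than expected"
    return f"PASS: {key} {change:+.2f}% is within expected ±{noise * 100:.2f}%"

# check(("a", "instruction count"), 90, {("a", "instruction count"): (110, 0.01)})
# -> "WIN: ('a', 'instruction count') is -18.18% lower than expected"
```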
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136551
Approved by: https://github.com/ezyang
ghstack dependencies: #136383