Commit Graph

10 Commits

Author SHA1 Message Date
Sam Estep
21ef248fb8 [reland] Report test time regressions (#50171)
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.

**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.

- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`

Notes on the approach:

- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
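
As a rough illustration of the second note above, here is a minimal sketch of estimating a test's mean and stdev from historical commits while skipping commits where the test is missing or has a different status. The `Report` shape and the `estimate_time` name are hypothetical, and including the base commit's own time as a sample is an assumption; none of this is taken from `print_test_stats.py`.

```python
import statistics
from typing import Dict, List, Optional, Tuple

# Hypothetical report shape: test name -> (status, seconds).
Report = Dict[str, Tuple[str, float]]


def estimate_time(
    name: str, base: Report, history: List[Report]
) -> Optional[Tuple[float, float]]:
    """Return (mean, stdev) of a test's time over comparable commits."""
    if name not in base:
        return None
    base_status, base_seconds = base[name]
    # Assumption: the base commit's own time counts as one sample.
    samples = [base_seconds]
    for report in history:
        entry = report.get(name)
        # Skip commits where the test is missing or its status differs.
        if entry is not None and entry[0] == base_status:
            samples.append(entry[1])
    mean = statistics.mean(samples)
    # statistics.stdev needs at least two samples; fall back to 0.0.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return mean, stdev
```

Filtering by status keeps, for example, a commit where the test was skipped (and so took almost no time) from dragging the estimate down.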

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171

Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```

To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test

To test locally, use the following steps.

First run an arbitrary test suite (you need some XML reports for `torch/testing/_internal/print_test_stats.py` to run, but we'll ignore them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `torch/testing/_internal/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:

- https://pastebin.com/Jj1svhAn
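
For readers who prefer to inspect a downloaded report from Python rather than through the `bzip2 -kdc` pipeline above, the following is a minimal sketch using only the standard library. The `load_report` name is hypothetical, and the sketch assumes nothing about the JSON schema of the report.

```python
import bz2
import json
from typing import Any


def load_report(path: str) -> Any:
    """Decompress a .json.bz2 file and parse its contents as JSON."""
    # Text mode ("rt") makes bz2.open yield decoded str for json.load.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```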

Reviewed By: malfet, izdeby

Differential Revision: D26317769

Pulled By: samestep

fbshipit-source-id: 1ba06cec0fafac77f9e7341d57079543052d73db
2021-02-08 15:35:21 -08:00
Sam Estep
21dccbca62 Revert D26232345: [pytorch][PR] Report test time regressions
Test Plan: revert-hammer

Differential Revision:
D26232345 (7467f90b13)

Original commit changeset: b687b1737519

fbshipit-source-id: 10a031c5500b083f7c82f2ae2743b671c5a07bff
2021-02-08 10:15:07 -08:00
Sam Estep
7467f90b13 Report test time regressions (#50171)
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.

**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.

- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`

Notes on the approach:

- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
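
The removed/added/modified classification described in the notes above can be sketched as follows. This is only an illustration under stated assumptions: the `Statuses` mapping and the `diff_statuses` name are hypothetical, and the real script also tracks time estimates and suites.

```python
from typing import Dict, List

# Hypothetical shape: test name -> status string (e.g. "passed", "skipped").
Statuses = Dict[str, str]


def diff_statuses(base: Statuses, head: Statuses) -> Dict[str, List[str]]:
    """Classify tests as removed, added, or modified (status changed)."""
    removed = sorted(base.keys() - head.keys())
    added = sorted(head.keys() - base.keys())
    # A test whose status is unchanged is deliberately not reported,
    # even if its running time changed.
    modified = sorted(
        name for name in base.keys() & head.keys() if base[name] != head[name]
    )
    return {"removed": removed, "added": added, "modified": modified}
```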

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171

Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```

To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test

To test locally, use the following steps.

First run an arbitrary test suite (you need some XML reports for `torch/testing/_internal/print_test_stats.py` to run, but we'll ignore them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `torch/testing/_internal/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:

- https://pastebin.com/Jj1svhAn

Reviewed By: walterddr

Differential Revision: D26232345

Pulled By: samestep

fbshipit-source-id: b687b1737519d2eed68fbd591a667e4e029de509
2021-02-08 07:54:34 -08:00
Yujun Zhao
f3a79b881f add lcov to oss for beautiful html report (#44568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44568

With `lcov`, we can generate a nicer HTML report than the current file-level and line-level reports. Therefore, for gcc in OSS, remove the `export` code and the `file/line level report` code, and use only the HTML report.

But for clang, since no such tool is available, we will still use the file report and line report that we generate ourselves.

Test Plan:
Test on an Ubuntu Docker machine.
## Measurement
1. After running `atest`, it takes about 15 mins to collect code coverage and generate the report.
```
# gcc code coverage
python oss_coverage.py --run-only=atest
```

## Presentation
**The html result looks like:**

*Top Level:*

{F328330856}

*File Level:*

{F328336709}

Reviewed By: malfet

Differential Revision: D23550784

fbshipit-source-id: 1fff050e7f7d1cc8e86a6a200fd8db04b47f5f3e
2020-09-11 15:29:24 -07:00
Yujun Zhao
c2b40b056a Filter default tests for clang coverage in oss
Summary: Some tests like `test_dataloader.py` cannot run under `clang` in OSS, because they generate intermediate files that are too large (~40G) for `llvm` to merge. Skip them when the user doesn't specify the `--run-only` option.
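
The filtering rule can be sketched roughly as below. The names `CLANG_SKIPPED_BY_DEFAULT` and `select_tests` are hypothetical and not taken from the coverage tool; the blocklist contains only the one test the summary names as an example.

```python
from typing import List, Optional

# Hypothetical blocklist; the summary names only `test_dataloader.py`.
CLANG_SKIPPED_BY_DEFAULT = {"test_dataloader.py"}


def select_tests(all_tests: List[str], run_only: Optional[List[str]]) -> List[str]:
    """Pick the tests to run for clang coverage."""
    if run_only:
        # An explicit --run-only selection is honored as given.
        return [t for t in all_tests if t in run_only]
    # Default mode: drop tests known to produce oversized artifacts.
    return [t for t in all_tests if t not in CLANG_SKIPPED_BY_DEFAULT]
```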

Test Plan: Test locally. Still, users are not recommended to run `clang` coverage in default mode, because it takes too much space.

Reviewed By: malfet

Differential Revision: D23549829

fbshipit-source-id: 0737e6e9dcbe3f38de00580ee6007906e743e52f
2020-09-11 15:28:15 -07:00
Elias Ellison
f9146b4598 fix lint (#44346)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44346

Reviewed By: jamesr66a

Differential Revision: D23589324

Pulled By: eellison

fbshipit-source-id: a4e22b69196909ec200ac3e262f04d2aaf78e9cf
2020-09-08 18:29:44 -07:00
Yujun Zhao
49e979bfde Set default compiler differently according to platform (#43890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43890

1. auto-detect the default `CXX` compiler type in oss, and use `clang` as the default compiler type in fbcode (because auto-detection would report `gcc` as the default compiler on devserver).
2. change `compiler type` from str `"CLANG" "GCC"` to enum type
3. rename function `get_cov_type` to `detect_compiler_type`
4. auto-set the default pytorch folder for users in oss
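
To illustrate items 1 to 3, here is a minimal sketch of an enum-based compiler type together with a detection function. Only the name `detect_compiler_type` comes from this commit message; the `CompilerType` enum, the signature (taking the text printed by `$CXX --version`), and the matching rules are assumptions made for illustration.

```python
from enum import Enum


class CompilerType(Enum):
    CLANG = "clang"
    GCC = "gcc"


def detect_compiler_type(version_output: str) -> CompilerType:
    """Classify a compiler from the text printed by `$CXX --version`."""
    text = version_output.lower()
    # Check for clang first so a banner naming both is treated as clang.
    if "clang" in text:
        return CompilerType.CLANG
    if "gcc" in text or "g++" in text or "free software foundation" in text:
        return CompilerType.GCC
    raise ValueError(f"unknown compiler: {version_output!r}")
```

Compared with passing the strings `"CLANG"` and `"GCC"` around, an enum turns a misspelled compiler name into an immediate error instead of a silently unmatched comparison.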

Test Plan:
on devserver:
```
buck run :coverage //caffe2/c10:
```

on oss:
```
python oss_coverage.py --run-only=atest
```

Reviewed By: malfet

Differential Revision: D23420034

fbshipit-source-id: c0ea88188578bb1343a286f2090eb8a74cdf3982
2020-09-08 14:57:35 -07:00
yujunzhao@devvm1621.atn0.facebook.com
db6bd9d60b rename input argument interested-folder to interest-only -- be consistent with other arguments (#43889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43889

1. rename input argument `interested-folder` to `interest-only` -- be consistent with `run-only` and `coverage-only`, and be shorter

Test Plan: Test on devserver and linux docker.

Reviewed By: malfet

Differential Revision: D23417338

fbshipit-source-id: ce9711e75ca3a1c30801ad6bd1a620f3b06819c5
2020-09-01 11:46:23 -07:00
yujunzhao@devvm1621.atn0.facebook.com
e941a462a3 Enable gcc coverage in OSS (#43883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43883

Check that the result of GCC coverage in OSS is reasonable and ready to ship.

The number of executable lines differs between `gcc` and `clang` for the following reasons:
* The following lines are counted in `clang` but not in `gcc`:
1. empty lines, or lines with only “{” or “}”
2. some comments
3. `#define ...` -- not supported by gcc according to official documentation

* Besides, a statement that spans more than one line is counted as only one executable line in gcc, but as several lines in clang

## Advantage of `gcc` coverage
1. Much faster
- code coverage tool runtime is only **4 min** (*ammazzzing!!*) with `gcc`, compared to **3 hours!!** with `clang`, to analyze all the tests' artifacts
2. Uses less disk
- `Clang`'s artifacts take up as much as 170G, while `GCC`'s take 980M

Besides, also update `README.md`.

Test Plan:
Compare the result in OSS `clang` and OSS `gcc` with the same command:
```
python oss_coverage.py --run-only atest test_nn.py --interested-folder=aten
```

----

## GCC
**Summary**
> time: 0:15:45
summary percentage: 44.85%

**Report and Log**
[File Coverage Report](P140825162)
[Line Coverage Report](P140825196)
[Log](P140825385)

------

## CLANG

**Summary**
> time: 0:21:35
summary percentage: 44.08%

**Report and Log**
[File Coverage Report](P140825845)
[Line Coverage Report](P140825923)
[Log](P140825950)

----------

# Run all tests
```
# run all tests and get coverage over Pytorch
python oss_coverage.py
```
**Summary**
> time: 1:27:20 (time to run tests: 1:23:33)
summary percentage: 56.62%

**Report and Log**
[File Coverage Report](P140837175)
[Log](P140837121)

Reviewed By: malfet

Differential Revision: D23416772

fbshipit-source-id: a6810fa4d8199690f10bd0a4f58a42ab2a22182b
2020-08-31 16:11:33 -07:00
yujunzhao@devvm229.ftw0.facebook.com
0564d7a652 Land code coverage tool for OSS (#43778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43778

Move code_coverage_tool from experimental folder to caffe2/tools folder.

Delete `TODO` and fb-related code.

Test Plan: Test locally

Reviewed By: malfet

Differential Revision: D23399983

fbshipit-source-id: 92316fd3cc88409d087d2dc6ed0be674155b3762
2020-08-28 13:56:15 -07:00