pytorch/caffe2/python/serialized_test
Sam Estep 5bcbbf5373 Lint trailing newlines (#54737)
2021-03-30 13:09:52 -07:00
data/operator_test
__init__.py remediation of S205607 2020-07-17 17:19:47 -07:00
coverage.py Remove __future__ imports for legacy Python2 supports (#45033) 2020-09-23 17:57:02 -07:00
README.md Forbid trailing whitespace (#53406) 2021-03-05 17:22:55 -08:00
serialized_test_util.py Remove __future__ imports for legacy Python2 supports (#45033) 2020-09-23 17:57:02 -07:00
SerializedTestCoverage.md Lint trailing newlines (#54737) 2021-03-30 13:09:52 -07:00

Serialized operator test framework

Most of the functionality lives in serialized_test_util.py.

How to use

  1. Make your test case class extend SerializedTestCase.
  2. Change the @given decorator to @serialized_test_util.given. In addition to the unseeded Hypothesis tests that normally run, this runs a seeded Hypothesis test instance, which generates serialized outputs when requested.
  3. [Optional] In __main__, add a call to testWithArgs (or replace the existing call to unittest.main() with it). This lets you generate outputs with python caffe2/python/operator_test/my_test.py -G.
  4. Run python -m pytest caffe2/python/operator_test/my_test.py -G to generate serialized outputs. They are written to caffe2/python/serialized_test/data/operator_test, one zip file per test function. Each zip file contains an inout.npz file with the inputs, outputs, and metadata (such as device type), an op.pb file with the operator, and grad_#.pb files with the gradients, if there are any. Use -O to change the output directory. Generating outputs also produces a markdown document summarizing the coverage of serialized tests; pass the -C flag to disable it.
  5. Thereafter, runs of the test without the -G flag load the serialized outputs and gradient operators and compare them against the seeded run. The comparison happens only if the test calls assertReferenceChecks. If the seeded run's inputs differ from the serialized ones (this can happen with different Hypothesis versions or different setups), the serialized inputs are run through the serialized operator to produce a runtime output for comparison.
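The archive layout described in step 4 can be inspected with the standard library alone. The sketch below is only an illustration, not part of serialized_test_util.py: the helper names describe_archive and make_placeholder_archive are hypothetical, the demo archive holds placeholder bytes rather than real serialized data, and numbering the gradient files from 0 is an assumption.

```python
import io
import re
import zipfile

# Member names come from the README; the grouping logic is illustrative only.
GRAD_PATTERN = re.compile(r"^grad_(\d+)\.pb$")


def describe_archive(zip_bytes):
    """Group the members of a serialized-test zip by the role the README gives them."""
    layout = {"inout": None, "op": None, "grads": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name == "inout.npz":
                layout["inout"] = name      # inputs, outputs, and metadata
            elif name == "op.pb":
                layout["op"] = name         # the serialized operator
            elif GRAD_PATTERN.match(name):
                layout["grads"].append(name)  # serialized gradient operators
            else:
                layout["other"].append(name)
    # Sort gradient files by their numeric index, not lexicographically.
    layout["grads"].sort(key=lambda n: int(GRAD_PATTERN.match(n).group(1)))
    return layout


def make_placeholder_archive(num_grads):
    """Build an in-memory zip with the documented member names (hypothetical demo data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("inout.npz", b"placeholder")
        archive.writestr("op.pb", b"placeholder")
        for i in range(num_grads):  # assumption: indices start at 0
            archive.writestr("grad_{}.pb".format(i), b"placeholder")
    return buffer.getvalue()


demo_layout = describe_archive(make_placeholder_archive(num_grads=2))
print(demo_layout)
```

To look at a real archive, the same function could be given the bytes of one of the zip files under caffe2/python/serialized_test/data/operator_test.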

Coverage report

SerializedTestCoverage.md contains statistics about the coverage of serialized tests. It is regenerated every time someone regenerates a serialized test (i.e. runs an operator test with the -G option). If you run into merge conflicts in this file, please rebase and regenerate it. To skip generating this file when generating a serialized test, run with -G -C. The logic for generating this file lives in coverage.py.

Additional notes

To extend the test framework beyond operator tests, create a new subfolder for the new kind of test inside caffe2/python/serialized_test/data.

Note that we currently don't support stacking other Hypothesis decorators on top of @serialized_test_util.given. Hypothesis explicitly checks that @given is at the bottom of the decorator stack.

If a test function calls assertReferenceChecks more than once, only the last call is serialized and written. If such a test function is later refactored, the input that actually gets checked may differ from the serialized one. The serialized test should still pass, because in that case the serialized input is used to generate a runtime output for comparison.
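The fallback behaviour described above can be sketched in plain Python. This is a simplified model under stated assumptions, not the framework's code: pick_comparison and run_operator are hypothetical names, plain lists stand in for the serialized arrays, and a doubling function stands in for the serialized operator.

```python
def pick_comparison(seeded_inputs, seeded_outputs,
                    serialized_inputs, serialized_outputs, run_operator):
    """Return (source, expected, actual) for the check against serialized data.

    Hypothetical helper: models the decision only, not the real comparison.
    """
    if seeded_inputs == serialized_inputs:
        # Inputs match: the seeded run's outputs are compared directly
        # against the serialized outputs.
        return "seeded", serialized_outputs, seeded_outputs
    # Inputs differ (for example after a refactor or a Hypothesis upgrade):
    # run the serialized inputs through the serialized operator instead.
    return "rerun", serialized_outputs, run_operator(serialized_inputs)


def run_operator(inputs):
    """Stand-in for the serialized operator: doubles every element."""
    return [2 * x for x in inputs]


stored_inputs = [1.0, 2.0]
stored_outputs = [2.0, 4.0]

# Case 1: the seeded run reproduces the stored inputs.
same = pick_comparison([1.0, 2.0], [2.0, 4.0],
                       stored_inputs, stored_outputs, run_operator)

# Case 2: the seeded run produced different inputs, so the stored inputs
# are rerun and the stored outputs are still the reference.
changed = pick_comparison([3.0, 5.0], [6.0, 10.0],
                          stored_inputs, stored_outputs, run_operator)

print(same)
print(changed)
```

In both cases the expected and actual values agree, which is why a refactored test can keep passing even though its seeded inputs no longer match the serialized ones.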