Splitting into a seperate PR in case of bike shedding. We can't use
the normal fluent syntax `SampleInput(x).name("foo")` because `.name`
is already how the metadata is accessed. So instead, this adds a
single function where you pass keyword arguments to fill in the
metadata, e.g.
```
SampleInput(x).with_metadata(
name="foo", output_process_fn_grad=out_fn)
```
An alternative closer to the normal fluent style would be to adding a
prefix to the property's name, e.g.
```
(SampleInput(x)
.with_name("foo")
.with_output_process_fn_grad(out_fn))
```
However, I have a slight preference for the `with_metadata` style
because you don't need to add extra parenthesis to break lines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85890
Approved by: https://github.com/mruberry
Most SampleInput objects currently have no additional metadata,
meaning they have a 1:1 mapping with a normal function call. This adds
var arg forms of the `SampleInput` constructor such that you can just
call the `SampleInput` constructor as you would call the operator.
So, for example
```python
SampleInput(make_arg(shape), args=(2, 3), kwargs=dict(alpha=4))
```
becomes
```python
SampleInput(make_arg(shape), 2, 3, alpha=4)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85723
Approved by: https://github.com/mruberry
Ref #82518
Starting small to minimize merge conflicts, this moves the top-level
class definitions and some helper functions into the `opinfos` folder.
It also brings `common_methods_invocations.py` to just below 1MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82540
Approved by: https://github.com/albanD
Lightning callback that enables post-training sparsity.
This callback aims to sparsify the model inside lightning module after training.
**Note that the model is copied and then sparsified, so the existing model is not modified**
The sparsified model can be used for comparison and can be accessed using <callback_obj>.sparsified
Test Plan
```python torch/ao/sparsity/_experimental/data_sparsifier/lightning/tests/test_callbacks.py TestPostTrainingCallback```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80370
Approved by: https://github.com/z-a-f
This PR makes the following improvements:
- moves the custom skip list for test_normalize_operator_exhaustive in test_fx_experimental to use the typical OpInfo skip architecture. The skips were updated to xfails, and that identified some operators which were no longer failing the test
- redundant tests with OpInfo-based testing in test_jit.py were removed
- test_dtypes was improved so its error messages are clear and it makes test_nondifferentiable redundant; the latter test has been removed
- OpInfo.supports_complex_autograd() is removed in favor of a more accurate and general test for whether the particular dtype is in the backward dtypes of the operator
- gradchecks have been improved to verify that an operator doesn't support grad if it claims not to
- gradchecks have been improved to test the gradient of all input tensors that require gradient
- the concept of "default test dtypes" has been removed
- excessive and mostly redundant out testing for elementwise unary operators has been removed
- metadata for whether an op supports nuanced "safe casting" to out behavior has been removed from OpInfos
- numerous skips have been converted to xfails
- numerous OpInfos have had their metadata fixed based on the new checks
- jit-specific utilities in common_methods_invocations.py have been moved to jit_programming_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75951
Approved by: https://github.com/ngimel
This PR extends our OpInfo test architecture with "reference inputs," an optional expansion of typical sample inputs that allows for more thorough testing. Currently only the elementwise binary operations implement an extended set of reference inputs. This PR also cleans up some smaller OpInfo-related issues, including several bugs, and it identified https://github.com/pytorch/pytorch/issues/74279.
A reference inputs function can be specified for an OpInfo by filling in its "reference_inputs_func" metadata. If this is done it's recommended that the reference inputs function first call the sample inputs function, then produce additional sample inputs. See `reference_inputs_elementwise_binary` for an example of this pattern.
In addition to implementing reference inputs for the elementwise binary operations, this PR improves their consistency and simplifies how their metadata is represented. The great majority now use a generic sample input function, and those that want extensions start by calling the generic sample input function and then adding additional samples. This removes many older sample input functions. The BinaryUfuncInfo subclass also now allows specifying scalar support more precisely, and reference inputs and error inputs are generated based on this metadata to ensure it's correct.
cc @kshitij12345 @pmeier @zou3519 @Chillee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74280
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70304
Without this patch `TensorLikePair` will try to instantiate everything although it should only do so for tensor-likes. This is problematic if it is used before a different pair that would be able to handle the inputs but never gets to do so, because `TensorLikePair` bails out before.
```python
from torch.testing._comparison import assert_equal, TensorLikePair, ObjectPair
assert_equal("a", "a", pair_types=(TensorLikePair, ObjectPair))
```
```
ValueError: Constructing a tensor from <class 'str'> failed with
new(): invalid data type 'str'.
```
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33542995
Pulled By: mruberry
fbshipit-source-id: 77a5cc0abad44356c3ec64c7ec46e84d166ab2dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68728
This removes the ability for `assert_close` to `.coalesce()` the tensors internally. Additionally, we now also check `.sparse_dim()`. Sparse team: please make sure that is the behavior you want for all sparse COO comparisons in the future. #67796 will temporarily keep BC by always coalescing, but in the future `TestCase.assertEqual` will no longer do that.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33542996
Pulled By: mruberry
fbshipit-source-id: a8d2322c6ee1ca424e3efb14ab21787328cf28fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68802
Without this patch, the error message of comparing meta tensors looks like this after #68722 was merged:
```python
>>> t = torch.empty((), device="meta")
>>> assert_close(t, t)
NotImplementedError: Could not run 'aten::abs.out' with arguments from the 'Meta' backend. [...]
[...]
The above exception was the direct cause of the following exception:
[...]
RuntimeError: Comparing
TensorLikePair(
id=(),
actual=tensor(..., device='meta', size=()),
expected=tensor(..., device='meta', size=()),
rtol=1.3e-06,
atol=1e-05,
equal_nan=False,
check_device=True,
check_dtype=True,
check_layout=True,
check_stride=False,
check_is_coalesced=True,
)
resulted in the unexpected exception above. If you are a user and see this message during normal operation please file an issue at https://github.com/pytorch/pytorch/issues. If you are a developer and working on the comparison functions, please except the previous error and raise an expressive `ErrorMeta` instead.
```
Thus, we follow our own advice and turn it into an expected exception until #68592 is resolved:
```python
>>> t = torch.empty((), device="meta")
>>> assert_close(t, t)
ValueError: Comparing meta tensors is currently not supported
```
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33542999
Pulled By: mruberry
fbshipit-source-id: 0fe1ddee15b5decdbd4c5dd84f03804ca7eac95b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68977
Follow-up to #68722 to address the review comments that were left open before merge.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33542998
Pulled By: mruberry
fbshipit-source-id: 23c567cd328f83ae4df561ac8ee6c40c259408c9
Summary:
This PR absolves `_TestParametrizer`s (e.g. `ops`, `modules`, `parametrize`) of the responsibility of adding device type (e.g. `'cpu'`, `'cuda'`, etc.) / dtype (e.g. 'float32') to generated test names. This fixes repeated instances of the device string being added to generated test names (e.g. `test_batch_norm_training_True_cuda_track_running_stats_True_cuda_affine_True_cuda`).
The responsibility for placing device / dtype suffixes is now handled by `instantiate_device_type_tests()` instead so it is added a single time. It will place `<device>_<dtype>` at the end of the test name unconditionally, maintaining the current naming convention.
As part of this work, I also tightened the semantics through some additional error case handling:
* Composing multiple decorators that each try to handle the same parameter will error out with a nice message. This includes the case to trying to compose `modules` + `ops`, as they each try to handle `dtype`. Similarly, `ops` + `dtypes` is forbidden when both try to handle `dtype`. This required changes in the following test files:
* `test/test_unary_ufuncs.py`
* `test/test_foreach.py`
* The `modules` / `ops` decorators will now error out with a nice message if used with `instantiate_parametrized_tests()` instead of `instantiate_device_type_tests()`, since they're not (currently) written to work outside of a device-specific context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65217
Reviewed By: mruberry
Differential Revision: D32627303
Pulled By: jbschlosser
fbshipit-source-id: c2957228353ed46a0b7da8fa1a34c67598779312
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67794
This change is needed to conveniently use the same comparison mechanism for our internal testsuite (see #67796). The reworked version is on par with the previous version except for the ability to pass a custom message as callable. Before we converted everything to a tensor so it was fairly easy to provide consistent mismatch diagnostics to the callable. Now, with arbitrary `Pair`'s that are used for comparison that is no longer viable.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D32532206
Pulled By: mruberry
fbshipit-source-id: dc847fba6a795c1766e01bc3e88b680a68287b1e
Summary:
After realizing that CUDA mem leaks were not rerun, I realized I forgot to pass the env var as a Docker variable.
What a noob mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486
Reviewed By: seemethere
Differential Revision: D32501718
Pulled By: janeyx99
fbshipit-source-id: 9918d626e90bea1562a3094c6eb12cb7d86dbf6a
Summary:
In case the inputs have a different layout, `assert_close(..., check_layout=False)` converts them to strided before comparison. This is helpful if you just want to compare the values of sparse COO / CSR tensor against a strided reference.
This keeps BC, since the default `check_layout=True` was the old, hard-coded behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65419
Reviewed By: H-Huang
Differential Revision: D31133629
Pulled By: mruberry
fbshipit-source-id: ca8918af81fb0e0ba263104836a4c2eeacdfc7e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572
Addresses #61906. Issue will be fixed later in the stack when `torch.testing.assert_close` got the same treatment.
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30633527
Pulled By: mruberry
fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c
Summary:
This utilizes the feature introduced in https://github.com/pytorch/pytorch/issues/60091 to modify the header of the error message.
Before:
```python
AssertionError: Tensor-likes are not equal!
Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1
The failure occurred for the values.
```
After:
```python
AssertionError: Sparse COO values of tensor-likes are not equal!
Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61583
Reviewed By: malfet
Differential Revision: D30014797
Pulled By: cpuhrsch
fbshipit-source-id: 66e30645e94de5c8c96510822082ff9aabef5329
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61412.
Large integers gave false positives, because the comparison always takes place in floating point dtypes. This happens, because their integer precision is lower than the range of an integer dtype with the same number of bits.
For non-extremal values, `isclose` is defined by [this equation]:
```python
abs(a - b) <= atol + rtol * abs(b)
```
For `rtol == 0 and atol==0`, this is equivalent to `a == b`. This PR goes for the low hanging fruit and adds a shortcut for this case that falls back to an actual equality check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61529
Reviewed By: gchanan
Differential Revision: D29707534
Pulled By: mruberry
fbshipit-source-id: 71b8c4901e9cd4f366442437e52032b0d3002b4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60638
Initial proposal in https://github.com/pytorch/pytorch/pull/58981#issuecomment-866690334. Opposed to the proposal, this PR only allows relaxing the type equality constraint to a common superclass constraint, for example `torch.Tensor` vs `torch.nn.Parameter`. Inputs that do not share a common superclass will still fail.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D29626811
Pulled By: mruberry
fbshipit-source-id: 1916c3b710d38889de7ce57eb0770c76cbbb8166
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60536
`torch.isclose` does not do this bool tensors, which results in a test failure since subtraction (`abs(actual - expected)`) is not supported for them (see #58981). Since the `dtype` is already checked at this point, we can safely move the upcasting before `torch.isclose` is invoked.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29556356
Pulled By: mruberry
fbshipit-source-id: 4c65fad4f06cf402d6aab9dde5b127235766d5e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60254
Before we only tested that the correct error message is returned if `msg` is passed as callable. This adds tests that make sure that
- the inputs passed to the callable are the same inputs passed to `torch.assert_close` and
- the `diagnostics` namespace has the same attributes and types as documented.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29556354
Pulled By: mruberry
fbshipit-source-id: 9793c6d86fda842b6329381fc03b945eee878464
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60163
Changes to the default error message in case of mismatching values need to be reflected in the examples given in the docstring. Normally this should be enforced by a [`doctest`](https://docs.python.org/3/library/doctest.html). mruberry do you know why we don't have such a check?
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29556353
Pulled By: mruberry
fbshipit-source-id: 8dbc3f566f429618811b542a059d9abde9a6530b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60091Closes#58383. (1) and (2) are implemented. (3) was rejected. No consensus was reached on (4) and (5).
Improvements:
- Instead of calling everything "Tensors" we now use "Scalars" and "Tensor-likes" depending on the shape. Plus, we now internally have the option to adapt this identifier for example to report "Imaginary components of complex tensor-likes", which is even more expressive.
- The reported conditions "not close" and "not equal" are now determined based on `rtol` and `atol`.
- The number of mismatched elements and the offending indices are only reported in case the inputs are not scalar
- The allowed `rtol` and `atol` is only reported if `> 0`
**Example 1**
```python
torch.testing.assert_close(1, 3, rtol=0, atol=1)
```
Before:
```
AssertionError: Tensors are not close!
Mismatched elements: 1 / 1 (100.0%)
Greatest absolute difference: 2 at 0 (up to 1 allowed)
Greatest relative difference: 0.6666666865348816 at 0 (up to 0 allowed)
```
After:
```
AssertionError: Scalars are not close!
Absolute difference: 2 (up to 1 allowed)
Relative difference: 0.6666666865348816
```
**Example 2**
```python
torch.manual_seed(0)
t = torch.rand((2, 2), dtype=torch.complex64)
torch.testing.assert_close(t, t + complex(0, 1))
```
Before:
```
AssertionError: Tensors are not close!
Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at (0, 1) (up to 1.3e-06 allowed)
The failure occurred for the imaginary part.
```
After:
```
AssertionError: Imaginary components of tensor-likes are not close!
Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at index (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at index (0, 1) (up to 1.3e-06 allowed)
```
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D29556357
Pulled By: mruberry
fbshipit-source-id: 559d4a19ad4fc069b2b4f8cb5fc2f6058621e33d
Summary:
This adds support for quantized tensors the same way torch.testing._internal.common_utils.TestCase.assertEqual does:
bf269fdc98/torch/testing/_internal/common_utils.py (L1314-L1341)
- `.qscheme()` is checked for equality
- `.q_scale` and `q_zero_point` are checked for equality (see comment below) for `.qscheme() == torch.per_tensor_affine`
- `.q_per_channel_scales`, `q_per_channel_zero_points`, and `q_per_channel_axis` are checked for equality (see comment below) for `.qscheme() == torch.per_tensor_affine`
- values are checked with the default checks after a `.int_repr().to(torch.int32)` call
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58926
Reviewed By: jerryzh168
Differential Revision: D29483532
Pulled By: mruberry
fbshipit-source-id: 003fde7e21cf844778a879c3de0a7c84d13877bd
Summary:
We need to resolve the conjugate bit for complex tensors, because otherwise we may not be able to access the imaginary component:
```python
>>> torch.tensor(complex(1, 1)).conj().imag
RuntimeError: view_as_real doesn't work on unresolved conjugated tensors. To resolve the conjugate tensor so you can view it as real, use self.resolve_conj(); however, be warned that the resulting tensor will NOT alias the original.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60522
Reviewed By: ngimel
Differential Revision: D29353095
Pulled By: mruberry
fbshipit-source-id: c36eaf883dd55041166f692f7b1d35cd2a34acfb
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo
Next step
- introduce `build/` folder for build scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473
Test Plan:
- CI (this is important b/c pytorch/test-infra also rely on some script reference.
- tools/tests/
Reviewed By: albanD
Differential Revision: D29352716
Pulled By: walterddr
fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
Summary:
This adds support for sparse tensors the same way `torch.testing._internal.common_utils.TestCase.assertEqual` does:
5c7dace309/torch/testing/_internal/common_utils.py (L1287-L1313)
- Tensors are coalesced before comparison.
- Indices and values are compared individually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58844
Reviewed By: zou3519
Differential Revision: D29160250
Pulled By: mruberry
fbshipit-source-id: b0955656c2c7ff3db37a1367427ca54ca14f2e87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58919
Instead of having one large TestAsserts test case, we split of tests for
self-contained functionality like container or complex checking into
separate test cases. That makes it a lot easier to keep an overview over
what is tested.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D29259407
Pulled By: mruberry
fbshipit-source-id: 9769cb6d56c1a3790280542db398cb247986b09a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58918
~Instead of a distinct `torch.testing.assert_close` and `torch.testing.assert_equal`, this makes `torch.testing.assert_equal` a special case of `torch.testing.assert_close` for `rtol=atol=0`. In this case the closeness definition `abs(actual - expected) <= atol + rtol * abs(expected)` boils down to `abs(actual - expected) <= 0`. Since `abs(x)` can never be `<0`, this is equivalent to `abs(a - b) == 0` and this again boils down to `a == b`.~
Following https://github.com/pytorch/pytorch/pull/58918#issuecomment-860642057 and some offline discussions, we opted to use `assert_equal` as an example how to `partial` it.
This makes maintaing the module a lot easier, because we don't need to keep two functions in sync.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D29259404
Pulled By: mruberry
fbshipit-source-id: fa1a1fa93672a7ed1c5f0e4beb0dcd45b5c14fce
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
In contrast to the initial opinion in https://github.com/pytorch/pytorch/issues/55385, there are legitimate use cases for nested containers. One such example is the [output of `LSTM`'s](https://pytorch.org/docs/stable/generated/torch.nn.LSTM):
```python
output: Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] = torch.nn.LSTM()(input)
assert_close(output, expected)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57270
Reviewed By: albanD
Differential Revision: D28249303
Pulled By: mruberry
fbshipit-source-id: 75caa4414cc184ff0ce4cfc0dd5aafddfad42bcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56365
Follow-up to https://github.com/pytorch/pytorch/pull/54784#discussion_r614156172. Instead of having one large testcase where most methods are decorated with `onlyCPU`, this factors out all tests that actually need another device into a separate test case.
Test Plan: Imported from OSS
Reviewed By: walterddr, albanD
Differential Revision: D28247529
Pulled By: mruberry
fbshipit-source-id: 946e7694b70e736941565f29b5dd459ed7fbca4e
Summary:
Redo of https://github.com/pytorch/pytorch/issues/57135 out of stack
---
Currently all values are used for the reported absolute and relative differences. This usually works fine, but breaks down for the extremals:
```python
torch.testing.assert_close(torch.tensor([1.0, 0.0]), torch.tensor([2.0, 0.0]))
```
```
[...]
Greatest absolute difference: 1.0 at 0 (up to 1e-05 allowed)
Greatest relative difference: nan at 1 (up to 1.3e-06 allowed)
```
Although the second element is matching it is listed as offender for the greatest relative difference. The `NaN` stems from the `0 / 0` division.
To overcome this, we should only use the values that were considered a mismatch for the reported stats.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57923
Reviewed By: ngimel
Differential Revision: D28317316
Pulled By: mruberry
fbshipit-source-id: 4c604493bbe13b37f41225ea9af9e839a7304161
Summary:
Redo of https://github.com/pytorch/pytorch/issues/56373 out of stack.
---
To reviewers: **please be nitpicky**. I've read this so often that I probably missed some typos and inconsistencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57247
Reviewed By: albanD
Differential Revision: D28247402
Pulled By: mruberry
fbshipit-source-id: 71142678ee5c82cc8c0ecc1dad6a0b2b9236d3e6
Summary:
Currently we require type equality for `torch.testing.assert_(equal|close)`:
3db45bcb91/torch/testing/_asserts.py (L509-L513)
That means `assert_equal(1, 1.0)` will correctly fail. Although the type of a scalar is similiar to a dtype of a tensor, `assert_equal(1, 1.0, check_dtype=False)` will also fail while `assert_equal(torch.as_tensor(1), torch.as_tensor(1.0), check_dtype=False)` will pass.
To make the interface more consistent, this PR relaxes the type equality constraint, by disabling it in case both inputs are scalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57532
Reviewed By: ngimel
Differential Revision: D28242428
Pulled By: mruberry
fbshipit-source-id: b643c77f48b64fc2c8a43925120d2b634ec336b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55890
Proof-of-concept for https://github.com/pytorch/pytorch/pull/55145#issuecomment-817297273
With this the user is able to pass a custom error message to `assert_(equal|close)` which will be used in case the values mismatch. Optionally, a callable can be passed which will be called with mismatch diagnostics and should return an error message:
```python
def make_msg(a, b, info):
return (
f"Argh, we found {info.total_mismatches} mismatches! "
f"That is {info.mismatch_ratio:.1%}!"
)
torch.testing.assert_equal(torch.tensor(1), torch.tensor(2), msg=make_msg)
```
If you imagine `a` and `b` as the outputs of binary ufuncs, the error message could look like this:
```python
def make_msg(input, torch_output, numpy_output, info):
return (
f"For input {input} torch.binary_op() and np.binary_op() do not match: "
f"{torch_output} != {numpy_output}"
)
torch.testing.assert_equal(
torch.binary_op(input),
numpy.binary_op(input),
msg=lambda a, b, info: make_msg(input, a, b, info),
)
```
This should make it much easier for developers to find out what is actually going wrong.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27903842
Pulled By: mruberry
fbshipit-source-id: 4c82e3d969e9a621789018018bec6399724cf388
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55786
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the mathcing functionality that is already in place for tensors, by casting the inputs. The approach can easily extended if we want to support other input types as long as they can be cast to a tensor.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27903814
Pulled By: mruberry
fbshipit-source-id: fe3d063d0c9513cbd8b3408a2023e94c490c817e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55385
This renames `assert_tensors_(equal|close)` to `_check_tensors_(equal|close)` and exposes two new functions: `assert_(equal|close)`. In addition to tensor pairs, the newly added functions also support the comparison of tensors in sequences or mappings. Otherwise their signature stays the same.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27903805
Pulled By: mruberry
fbshipit-source-id: 719d19a1d26de8d14cb25846e3d22a6ac828c80a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55145
Repeating the discussion from https://github.com/pytorch/pytorch/pull/54784#issuecomment-811792089
The error messages for mismatched values are directly adapted from the old `_compare_tensors_internal`:
50cb75edce/torch/testing/__init__.py (L104-L111)
A sample error message right now looks like this
```
With rtol=1.3e-06 and atol=1e-05, found 1 different element(s) out of 12 (8.3%). The greatest difference of 4.0 (5.0 vs. 9.0) occurred at index (2, 3)
```
Using the same data with `numpy.testing.assert_equal` gives the following output:
```
Not equal to tolerance rtol=1.3e-06, atol=1e-05
Mismatched elements: 1 / 12 (8.33%)
Max absolute difference: 4.
Max relative difference: 0.44444445
x: array([[5., 5., 5., 5.],
[5., 5., 5., 5.],
[5., 5., 5., 5.]], dtype=float32)
y: array([[5., 5., 5., 5.],
[5., 5., 5., 5.],
[5., 5., 5., 9.]], dtype=float32)
```
Pros:
- The info is presented in a list instead of a sentence. IMO this makes it more readable
- The maximum relative difference is reported, which is beneficial in case a comparison fails due to the `rtol`
Cons:
- The values of the inputs are reported (this can be disabled by passing `verbose=False`, but lets face it: most users will use the default setting). In case the inputs are large, the output gets truncated with `...`. Not only is it hard to visually find the mismatching values, they could also live within the truncated part, making the output completely useless.
- Even when visually find the offending values it is hard to parse this back to the index in the inputs.
This implements a mix of both to get a short but expressive message:
```
Tensors are not close according to rtol=1.3e-6 and atol=1e-05:
Mismatched elements: 1 / 12 (8.3%)
Max. rel. diff.: 4.44e-1 at (2, 3)
Max. abs. diff.: 4.0 at (2, 3)
```
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27877157
Pulled By: mruberry
fbshipit-source-id: 6898a995f116f127e3ae8ed0bcb1ada63eadc45a
Summary:
This PR
- moves `torch/testing/_internal/mypy_wrapper.py` (and its accompanying tests from `test/test_testing.py`) to `tools`,
- removes the now-unused `test_run_mypy` from `test/test_type_hints.py`, and
- replaces the hardcoded list of `mypy` configs (previously duplicated across `mypy_wrapper.py` and `.github/workflows/lint.yml`) with a simpler glob
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54268
Test Plan:
Should also be run in the "Test tools" GHA workflow in CI:
```
python tools/test/test_mypy_wrapper.py
```
Reviewed By: janeyx99
Differential Revision: D27168095
Pulled By: samestep
fbshipit-source-id: a8dc18407b5e4c103ace23a636b0a8534951905a
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)
This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210
Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.
1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
Reviewed By: walterddr
Differential Revision: D27136849
Pulled By: janeyx99
fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
Summary:
This is an initial attempt in refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can exist in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests a subfolder within tools.
**NOTE: this removes those tests from CI as the new `tools/test/test_stats.py` is not in the test/ directory as the other tests in TESTS in run_test.py.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755
Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.
Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it
Reviewed By: agolynski, samestep
Differential Revision: D27030285
Pulled By: janeyx99
fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
Summary:
This PR disables the bulk of the output for test time regression reporting, since it's obscuring more important signal (especially in cases where shards are shifting around).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54078
Test Plan:
```
python test/test_testing.py
```
Reviewed By: ezyang, walterddr
Differential Revision: D27088987
Pulled By: samestep
fbshipit-source-id: 06a4eeb75641552bad2ab4b9154a8c70c57b0d68
Summary:
This PR:
1. moves sharding algorithm from run_test.py to framework_utils.py (let me know if you have a better place for it)
2. adds tests for the algorithm in test_testing.py
3. fixes the algorithm so that it doesn't tack on the unknown jobs all to the shard with the minimum time, but instead distributes them around the shards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53942
Test Plan: python test/test_testing.py -k TestFrameworkUtils
Reviewed By: samestep
Differential Revision: D27047223
Pulled By: janeyx99
fbshipit-source-id: 824b20009c0bb707aa5361de445cdec795d5e3f1
Summary:
We want to store the file names that triggers each test suite so that we can use this data for categorizing those test files.
~~After considering several solutions, this one is the most backwards compatible, and the current test cases in test_testing.py for print test stats don't break.~~
The previous plan did not work, as there are multiple Python test jobs that spawn the same suites. Instead, the new S3 format will store test files (e.g., `test_nn` and `distributed/test_distributed_fork`) which will contain the suites they spawn, which will contain the test cases run within the suite. (Currently, there is no top layer of test files.)
Because of this major structural change, a lot of changes have now been made (thank you samestep!) to test_history.py and print_test_stats.py to make this new format backwards compatible.
Old test plan:
Make sure that the data is as expected in S3 after https://github.com/pytorch/pytorch/pull/52873 finishes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52869
Test Plan: Added tests to test_testing.py which pass, and CI.
Reviewed By: samestep
Differential Revision: D26672561
Pulled By: janeyx99
fbshipit-source-id: f46b91e16c1d9de5e0cb9bfa648b6448d979257e
Summary:
Take 2 of https://github.com/pytorch/pytorch/issues/50914
This change moves the early termination logic into common_utils.TestCase class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52126
Test Plan: CI with ci-all tag
Reviewed By: malfet
Differential Revision: D26391762
Pulled By: walterddr
fbshipit-source-id: a149ecc47ccda7f2795e107fb95915506ae060b4
Summary:
This fixes an issue (currently blocking https://github.com/pytorch/pytorch/issues/51905) where the test time regression reporting step will fail if none of the most recent `master` ancestors have any reports in S3 (e.g. if a new job is added).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52054
Test Plan:
```
python test/test_testing.py
```
Reviewed By: walterddr
Differential Revision: D26369507
Pulled By: samestep
fbshipit-source-id: 4c4e1e290cb943ce8fcdadacbf51d66b31c3262a
Summary:
This is a follow up on https://github.com/pytorch/pytorch/issues/49869.
Previously CUDA early termination only happens for generic test classes that extends from `DeviceTypeTestBase`. However, JIT test cases which extends from common_utils.TestCase cannot benefit from the early termination.
This change moves the early termination logic into common_utils.TestCase class.
- all tests extended from common_utils.TestCase now should early terminate if CUDA assert occurs.
- For TestCases that extends from common_device_type.DeviceTypeTestBase, still only do torch.cuda.synchronize() when RTE is thrown.
- For TestCases extends common_utils.TestCase, regardless of whether a test case uses GPU or not, it will always synchronize CUDA as long as `torch.cuda.is_initialize()` returns true.
- Disabling this on common_distributed.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50914
Reviewed By: malfet
Differential Revision: D26019289
Pulled By: walterddr
fbshipit-source-id: ddc7c1c0d00db4d073a6c8bc5b7733637a7e77d1
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.
**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.
- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`
Notes on the approach:
- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171
Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```
To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test
To test locally, use the following steps.
First run an arbitrary test suite (you need to have some XML reports so that `test/print_test_stats.py` runs, but we'll be ignoring them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `test/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:
- https://pastebin.com/Jj1svhAn
Reviewed By: malfet, izdeby
Differential Revision: D26317769
Pulled By: samestep
fbshipit-source-id: 1ba06cec0fafac77f9e7341d57079543052d73db
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.
**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.
- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`
Notes on the approach:
- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171
Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```
To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test
To test locally, use the following steps.
First run an arbitrary test suite (you need to have some XML reports so that `test/print_test_stats.py` runs, but we'll be ignoring them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `test/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:
- https://pastebin.com/Jj1svhAn
Reviewed By: walterddr
Differential Revision: D26232345
Pulled By: samestep
fbshipit-source-id: b687b1737519d2eed68fbd591a667e4e029de509