This PR makes the following changes...
Prims
- adds as_strided
- fixes errors in flatten meta
Testing
- enables view consistency checking (which can be opted out of, see issues below)
- adds reference inputs for view, reshape, and flatten
- adds error inputs for reshape
Refs
- adds as_strided, reshape, and view
- fixes an error in the flatten ref where it was not returning self on no-op (see the sketch after this list)
- fixes a bug in transpose where it was not returning a view when the transposed tensor has 1 or fewer dims
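A minimal sketch of the flatten no-op behavior referenced above, with simplified argument handling; this is illustrative only, not the actual torch._refs.flatten:
```
import torch

def flatten_sketch(a: torch.Tensor, start_dim: int = 0, end_dim: int = -1) -> torch.Tensor:
    if a.ndim == 0:
        return a.reshape(1)
    start = start_dim if start_dim >= 0 else start_dim + a.ndim
    end = end_dim if end_dim >= 0 else end_dim + a.ndim
    # No-op case: flattening a single dimension returns the input itself,
    # which is the behavior the fix above restores.
    if start == end:
        return a
    return a.reshape(*a.shape[:start], -1, *a.shape[end + 1:])
```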
Issues
- https://github.com/pytorch/pytorch/issues/77218
- https://github.com/pytorch/pytorch/issues/77216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77220
Approved by: https://github.com/ngimel
This PR ...
Makes the following testing changes:
- Updates stride testing in test_python_reference_consistency to only check strides of dimensions with length > 1 (see the sketch after this list)
- Creates reference inputs for reshape
- Creates reference inputs for chunk
- Extends the sample inputs for unsqueeze
- Extends the sample inputs for stack -- test_conj_view and test_neg_view are now xfailed
- https://github.com/pytorch/pytorch/issues/77046
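A minimal sketch of the stride rule from the first bullet above; the helper name is hypothetical and the real test checks more than this:
```
import torch

def strides_match_ignoring_len1_dims(a: torch.Tensor, b: torch.Tensor) -> bool:
    # Only dimensions with length > 1 constrain memory layout, so only their
    # strides are compared.
    assert a.shape == b.shape
    return all(
        sa == sb
        for size, sa, sb in zip(a.shape, a.stride(), b.stride())
        if size > 1
    )
```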
Makes the following architecture changes:
- Adds the refs.special (sub)module
- Adds the refs.nn.functional (sub)module
Adds the following prims:
- expand_dims
- view_of
- rev
- clone
Adds the following references:
- flatten
- squeeze
- unsqueeze
- special.i0e
- special.i1e
- logical_or
- logical_and
- isclose
- flip
- stack
- nn.functional.elu
- chunk (see the sketch after this list)
- clone
- narrow
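As a rough, hedged illustration of how such references can compose (e.g. chunk built on narrow); this sketch is not the actual torch._refs.chunk and omits validation:
```
import torch

def chunk_sketch(a: torch.Tensor, chunks: int, dim: int = 0):
    dim_size = a.shape[dim]
    step = (dim_size + chunks - 1) // chunks  # ceil(dim_size / chunks)
    pieces = []
    start = 0
    while start < dim_size:
        length = min(step, dim_size - start)
        pieces.append(torch.narrow(a, dim, start, length))
        start += length
    return tuple(pieces)
```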
Identifies the following bugs in PyTorch today:
- https://github.com/pytorch/pytorch/issues/77054
- https://github.com/pytorch/pytorch/issues/77055
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77043
Approved by: https://github.com/ngimel
This PR does the following...
Tests:
- fixes test_type_promotion in test_binary_ufuncs to correctly generate scalar cpu tensors
- fixes test_python_reference_consistency to use the Python Reference's reference inputs
- extends Python reference testing to test_conj_view, test_neg_view, and test_neg_conj_view
- adds a NaN propagation sample input for elementwise unary and binary operations
- fixes the UnaryUfuncInfo class to properly register its reference inputs
- updates the Python Reference OpInfos to skip error inputs when their behavior on scalar inputs is inconsistent with their reference operators
Code organization:
- moves elementwise type promotion functionality to prims.utils
Prims & Refs:
- fixes scalar CPU tensor handling by having such tensors pass through broadcasting and device and shape checks
- adds two decorators, `elementwise_type_promotion_wrapper` and `out_wrapper`; the former automates elementwise type promotion, and the latter automatically adds the out kwarg and handles it properly
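A minimal sketch of what an `out_wrapper`-style decorator could look like, assuming only the behavior described above; the real decorator likely performs additional dtype and device checks:
```
import functools
import torch

def out_wrapper_sketch(fn):
    @functools.wraps(fn)
    def wrapper(*args, out: torch.Tensor = None, **kwargs):
        result = fn(*args, **kwargs)
        if out is None:
            return result
        # Resize (if needed) and copy into the provided out tensor; the real
        # wrapper also validates the dtype and device of `out`.
        if out.shape != result.shape:
            out.resize_(result.shape)
        out.copy_(result)
        return out
    return wrapper
```
A reference can then simply be decorated to gain the out kwarg without hand-writing the handling each time.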
cc @ezyang who also had some thoughts on cpu scalar tensor handling
cc @chillee -- might want to use this new decorator as we converge decompositions and references
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76945
Approved by: https://github.com/ngimel
This PR makes the following changes:
Prims:
- igamma and igammac are now correctly listed as elementwise binary operations, not elementwise unary operations
- elementwise prims now must specify their type promotion kind (this is currently unused)
Refs:
- complex half is now handled by opmath-style type promotion
- adds references for: abs, acos, acosh, asin, atan, ceil, cos, cosh, digamma, erf, erfinv, erfc, exp, expm1, isfinite, isnan, lgamma, log, log1p, neg, reciprocal, sign, sin, sinh, sqrt, square, tan, igamma, igammac
- adds "complex to float" and "bool to long" type promotion kinds
- updates out behavior to warn when resizing a non-empty tensor, consistent with current ops (illustrated after this list)
- updates the elementwise unary reference template with type promotion
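Two of the behaviors above, illustrated against eager PyTorch (which the references are meant to match):
```
import torch

# "complex to float" promotion: abs on a complex input yields a real dtype.
assert torch.abs(torch.tensor(3 + 4j)).dtype == torch.float32

# out= on a non-empty tensor of the wrong shape resizes it and emits a
# UserWarning that the output was resized; the refs' out behavior now warns
# consistently with this.
a, b = torch.ones(4), torch.ones(4)
out = torch.empty(2)
torch.add(a, b, out=out)  # warns that the output was resized
```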
Tests:
- fixes torch.pow's OpInfo to correctly specify it only supports one scalar input, not two
- fixes elementwise binary reference inputs to not attempt generating certain tensors in complex half (for now, cc @kshitij12345)
- adds OpInfos for the following Python references: abs, acos, acosh, asin, atan, ceil, cos, cosh, digamma, erf, erfinv, erfc, exp, expm1, isfinite, isnan, lgamma, log, log1p, neg, reciprocal, round, sign, sin, sinh, sqrt, square, tan, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_xor, eq, float_power, ge, gt, igamma, igammac, le, lt, maximum, minimum, mul, ne, nextafter, pow, sub, true_divide
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76647
Approved by: https://github.com/ngimel
This adds prototype nvFuser integration for the following prims:
- broadcast_in_dim
- convert_element_type
- add
- div
- ge
- gt
- le
- lt
- mul
Adding it for additional prims supported by nvFuser's prototype Python frontend should be easy.
This also adds new syntactic sugar for running operations with the ATen or nvFuser trace executors. For example:
```
def foo(a, b):
    return torch.add(a, b)
traced_foo = make_traced(foo)
a = torch.randn((1, 2, 3, 4, 5), device='cuda')
b = torch.randn((1, 2, 3, 4, 5), device='cuda')
result = traced_foo(a, b, executor='nvfuser')
```
Currently only operations with tensor inputs and one tensor output are supported, and the operation must be composed exclusively of reference or prim operations.
Finally, this adds a new test, test_prims.py, that just tests the broadcast_in_dim prim for now. In the future we'll likely have OpInfos for each prim, but we'll need a reference implementation of broadcast_in_dim to make that interesting.
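For context, a plausible broadcast_in_dim reference (jax.lax-style semantics assumed) could look like the sketch below; it is not the implementation added here:
```
import torch

def broadcast_in_dim_sketch(a, shape, broadcast_dimensions):
    # Input dim i maps to output dim broadcast_dimensions[i]; every other
    # output dim is inserted as size 1 and then broadcast to the target size.
    v = a
    for idx in range(len(shape)):
        if idx not in broadcast_dimensions:
            v = v.unsqueeze(idx)
    return v.expand(shape)
```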
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76560
Approved by: https://github.com/ngimel
Adds a prototype tracer with no caching support and the `ElementwiseUnaryPythonRefInfo` class. A reference for `floor` is added to test the latter, and the elementwise binary reference inputs are extended to also return noncontiguous inputs. The SampleInput transform operation has been updated to return an actual SampleInput instead of a tuple to facilitate uniform handling of (transformed) SampleInputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76388
Approved by: https://github.com/ngimel
Summary:
This PR adds an initial set of experimental primitive operations and Python references that reimplement existing PyTorch operations using them. See https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-0/577 for additional context.
The following experimental primitives are added:
- Elementwise unary prims -- abs, acos, acosh, asin, atan, cos, cosh, bessel_i0e, bessel_i1e, cbrt, ceil, digamma, erf, erf_inv, erfc, exp, expm1, floor, igamma, igammac, is_finite, lgamma, log, log1p, neg, reciprocal, round, sign, sinh, sqrt, square, tan.
- Elementwise binary prims -- add, atan2, bitwise_and, bitwise_not, bitwise_or, bitwise_xor, div, eq, ge, gt, le, lt, max, min, mul, ne, nextafter, pow, rsqrt, shift_left, shift_right_arithmetic
- View prims -- broadcast_in_dim, collapse_view, split_dim, squeeze
- Shape prims -- collapse, concatenate, reshape
- Conditional prims -- select
- Data conversion & movement prims -- convert_element_type, device_put
- Inplace prims -- copy_to, resize
These primitives do not add any new functionality to PyTorch, but are intended to be the semantic building blocks for reference operators. We have tried to make them consistent with the operations in [jax.lax](https://jax.readthedocs.io/en/latest/jax.lax.html) where possible (because PyTorch prefers being consistent with other frameworks), although there are key differences between these prims and operations in jax.lax. Most notably, these prims model view semantics and in-place operations.
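As a hedged illustration only (module path and prim signatures assumed, not guaranteed), a reference built on these prims might handle broadcasting and dtype conversion itself before calling the "pure" prim:
```
import torch
import torch._prims as prims  # assumed module path for the prims above

def add_ref_sketch(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Promote both inputs to a common dtype, then broadcast to a common shape,
    # since the prim itself expects same-dtype, same-shape tensors.
    common_dtype = torch.result_type(a, b)
    a = prims.convert_element_type(a, common_dtype)
    b = prims.convert_element_type(b, common_dtype)
    shape = torch.broadcast_shapes(a.shape, b.shape)
    # Right-align the input dims against the broadcast shape.
    a = prims.broadcast_in_dim(a, shape, tuple(range(len(shape) - a.ndim, len(shape))))
    b = prims.broadcast_in_dim(b, shape, tuple(range(len(shape) - b.ndim, len(shape))))
    return prims.add(a, b)
```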
In addition to these primitives the following elementwise binary Python references are added:
- Elementwise binary Python references -- add, atan2, bitwise_and, bitwise_left_shift, bitwise_or, bitwise_right_shift, bitwise_xor, eq, float_power, ge, gt, le, lt, maximum, minimum, mul, ne, nextafter, pow, sub, true_divide
- Conditional Python references -- where
- Data conversion & movement references -- copy_to
A Python reference implements the same behavior as its corresponding PyTorch operator (excepting slight numerical differences, bug fixes, and in some cases additional features).
The start of an OpInfo-based test architecture for these references is also included in this PR. A new list, `python_ref_db`, is added to `common_methods_invocations.py`. This list introduces the new `ElementwiseBinaryPythonRefInfo`, which inherits input arguments from the original operators' OpInfo, allows them to be overridden, and then constructs the OpInfo for the Python reference using the (potentially modified) arguments. OpInfo-based tests can opt into testing references by including this new list in the Sequence passed to the `ops` decorator.
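A minimal sketch of opting a test into the references; imports and decorator usage follow the existing OpInfo test setup:
```
from torch.testing._internal.common_device_type import ops, instantiate_device_type_tests
from torch.testing._internal.common_methods_invocations import python_ref_db
from torch.testing._internal.common_utils import TestCase, run_tests

class TestPythonRefs(TestCase):
    # Passing python_ref_db (alone or alongside op_db) opts the test into the
    # Python reference OpInfos.
    @ops(python_ref_db)
    def test_runs(self, device, dtype, op):
        for sample in op.sample_inputs(device, dtype):
            op(sample.input, *sample.args, **sample.kwargs)

instantiate_device_type_tests(TestPythonRefs, globals())

if __name__ == "__main__":
    run_tests()
```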
cc ngimel csarofeen kevinstephano Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75095
Reviewed By: ngimel
Differential Revision: D35888004
Pulled By: mruberry
fbshipit-source-id: 21e77c4456c2a02113367d4bdae168a3a2f33f25
(cherry picked from commit 1d5bcfa99d4e8cf36f60642803a0bfca50e2ea4e)
Reference #74537
Support for jiterating with `c10::complex<Half>`. Note that computation will take place in `complex<float>` by allowing implicit casting in JITerated code (similar to Half and BFloat16 which upcast to float for computation).
We add `complex32` support for `sigmoid` and `sigmoid_backward` in this PR. This is tested with `test_ops.py::test_dtypes` and `test_ops.py::test_complex_half_reference_testing`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75656
Approved by: https://github.com/ngimel
This PR makes the following improvements:
- moves the custom skip list for test_normalize_operator_exhaustive in test_fx_experimental to use the typical OpInfo skip architecture. The skips were updated to xfails, and that identified some operators which were no longer failing the test
- redundant tests with OpInfo-based testing in test_jit.py were removed
- test_dtypes was improved so its error messages are clear and it makes test_nondifferentiable redundant; the latter test has been removed
- OpInfo.supports_complex_autograd() is removed in favor of a more accurate and general test for whether the particular dtype is in the backward dtypes of the operator
- gradchecks have been improved to verify that an operator doesn't support grad if it claims not to
- gradchecks have been improved to test the gradient of all input tensors that require gradient
- the concept of "default test dtypes" has been removed
- excessive and mostly redundant out testing for elementwise unary operators has been removed
- metadata for whether an op supports nuanced "safe casting" to out behavior has been removed from OpInfos
- numerous skips have been converted to xfails
- numerous OpInfos have had their metadata fixed based on the new checks
- jit-specific utilities in common_methods_invocations.py have been moved to jit_programming_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75951
Approved by: https://github.com/ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74646
The OpInfo-based test, given an operator and sample inputs,
checks all permutations of {inputs, grad_output} being either
{CompositeCompliantTensor, regular Tensor}, running them through a
forward pass and a backward pass.
Test Plan: - wait for tests
Reviewed By: albanD
Differential Revision: D35186860
Pulled By: zou3519
fbshipit-source-id: 8b2577dd6106c05db2ab583bbefd10545fdd8adf
(cherry picked from commit 3f5c3793715af9a8d4db06690c5faa7256a82645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74645
This PR adds tests for when only some inputs are Tensor Subclasses.
Why is this important to test?
==============================
Consider the following hypothetical out-of-place operation:
```
def my_add(x, y):
    result = x.clone()
    result.add_(y)
    return result
```
You may expect this to work the same as torch.add. If x is not a Tensor
subclass, but y is a Tensor subclass, then this returns a regular
Tensor, NOT a Tensor subclass!
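A minimal, self-contained demonstration of the hazard (the toy subclass here is purely illustrative):
```
import torch

class MyTensor(torch.Tensor):
    pass

x = torch.ones(3)                         # regular Tensor
y = torch.ones(3).as_subclass(MyTensor)   # Tensor subclass

result = x.clone()    # clone of a regular Tensor is a regular Tensor
result.add_(y)        # the in-place add does not change result's type
print(type(result))            # <class 'torch.Tensor'>, not MyTensor
print(type(torch.add(x, y)))   # MyTensor, via __torch_function__ -- what callers expect
```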
This is exactly the kind of in-place operation that causes `vmap` to
fail and will be problematic for certain Tensor subclasses in the
future, so we're adding tests to make sure composite PyTorch operations
don't do this.
What exactly does this PR do?
=============================
Composite compliance now takes a sample input and produces a test case
where some of the sample inputs are Tensor Subclasses. It then sends
this through the original operation, once with Python Mode and once
without.
(Why once with Python Mode? Because we want to use it to detect the
pattern of "create a Tensor and call resize_ on it")
Finally, it repeats this process for all possibilities where the inputs
are Tensor subclasses. For example, if the sample input is (x, y), then
we test all four of the following cases:
- Subclass(x), y
- x, Subclass(y)
- Subclass(x), Subclass(y)
- x, y
Test Plan
=========
- run tests
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D35186862
Pulled By: zou3519
fbshipit-source-id: 102477507b56583463668db7523a6586d92b357d
(cherry picked from commit bfcb087244b0598abb270f7c26d472482f00b5e2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74644
This is in preparation for me adding additional tests for:
1. composite compliance of autograd formulas
2. composite compliance of forward-mode AD formulas
This PR also changes these tests to run on both CPU and CUDA. Previously
they were just run on CPU, but it turns out there's a lot of branching
on the device in composite operations in PyTorch today :/
Test Plan: - wait for tests
Reviewed By: albanD
Differential Revision: D35186861
Pulled By: zou3519
fbshipit-source-id: d974592a7547f71ef26ff0740bf453f7d335d55a
(cherry picked from commit 773b43394c2406502a6e386a30eb003a73861f13)
Summary:
Following triage review discussion, it would be best for these tests to not be triaged high priority by automation, but by the triagers in the oncall.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74555
Reviewed By: albanD
Differential Revision: D35099202
Pulled By: janeyx99
fbshipit-source-id: 657a0317141de3a598476a6f601ec26cc26231b1
(cherry picked from commit 057519cb2494d0f9a0b169f359ac87ba9e89f088)
This PR extends our OpInfo test architecture with "reference inputs," an optional expansion of typical sample inputs that allows for more thorough testing. Currently only the elementwise binary operations implement an extended set of reference inputs. This PR also cleans up some smaller OpInfo-related issues, including several bugs, and it identified https://github.com/pytorch/pytorch/issues/74279.
A reference inputs function can be specified for an OpInfo by filling in its "reference_inputs_func" metadata. If this is done, it's recommended that the reference inputs function first call the sample inputs function and then produce additional sample inputs. See `reference_inputs_elementwise_binary` for an example of this pattern.
In addition to implementing reference inputs for the elementwise binary operations, this PR improves their consistency and simplifies how their metadata is represented. The great majority now use a generic sample input function, and those that want extensions start by calling the generic sample input function and then adding additional samples. This removes many older sample input functions. The BinaryUfuncInfo subclass also now allows specifying scalar support more precisely, and reference inputs and error inputs are generated based on this metadata to ensure it's correct.
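A hedged sketch of the pattern (function and argument names here are illustrative; see reference_inputs_elementwise_binary for the real example):
```
from torch.testing import make_tensor
from torch.testing._internal.common_methods_invocations import SampleInput

def reference_inputs_my_op(op_info, device, dtype, requires_grad, **kwargs):
    # First yield the regular sample inputs...
    yield from op_info.sample_inputs_func(op_info, device, dtype, requires_grad, **kwargs)
    # ...then add extra cases worth exercising only in the more thorough runs,
    # e.g. an empty tensor and a noncontiguous tensor.
    yield SampleInput(make_tensor((0,), device=device, dtype=dtype,
                                  requires_grad=requires_grad))
    yield SampleInput(make_tensor((5, 5), device=device, dtype=dtype,
                                  requires_grad=requires_grad, noncontiguous=True))
```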
cc @kshitij12345 @pmeier @zou3519 @Chillee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74280
Approved by: https://github.com/ngimel
Fixes #72368
As per the reference issue, test_ops in a single file takes around 3:30-4:00 hrs to execute on ASAN jobs:
Reference : pytorch_test_times.json
```
{
"commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
"JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
"job_times": {
"test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----
Hence, separating test_ops into the following parts:
1. TestGradients
2. TestJit
3. TestCommon and TestMathBits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
Summary:
A number of ROCm tests were skipped via the skipCUDAIfRocm flag.
A majority of the test cases are now supported on the ROCm platform. This fix enables all of the test_ops tests for ROCm and enables most operators in common_methods_invocations.py, minus the SpectralFuncInfo class, which still has some FFT issues.
Partially Fixes https://github.com/pytorch/pytorch/issues/51303
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH amathews-amd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67706
Reviewed By: seemethere, janeyx99
Differential Revision: D34153457
Pulled By: malfet
fbshipit-source-id: 95f4420f306ca7580cd438d3b5cc0b24efbfae99
(cherry picked from commit 0d178fffd3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465
These tests check that:
(a) the result after NNC fusion (of a single op) is the same as the
unfused op, and
(b) for certain ops where fusion is expected to occur, fusion does
actually occur
Test Plan: Imported from OSS
Reviewed By: wenleix
Differential Revision: D33595240
Pulled By: davidberard98
fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67996
This is necessary for most matrix decompositions in `linalg`.
cc mruberry
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D33774418
Pulled By: mruberry
fbshipit-source-id: 576f2dda9d484808b4acf0621514c0ffe26834e6
(cherry picked from commit fb07c50aa9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69909
This test detected a number of sampling methods that were not generating
the samples as expected, e.g. `index_put`, `cosine_embedding`, `stft`, but
perhaps most notably the generator for `BinOps`.
It also detected that `remainder` and `fmod` had not implemented the
backward formula for the second input. I added this in the previous PR.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D33774422
Pulled By: mruberry
fbshipit-source-id: 76cfc75b1fdfd72ee64aa524665f83a75fe52509
(cherry picked from commit 13ea7b436b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70253
I included a derivation of the formula in the complex case, as it is
particularly tricky. As far as I know, this is the first time this formula
has been derived in the literature.
I also implemented a more efficient and more accurate version of svd_backward.
More importantly, I also added a lax check in the complex case making sure the loss
function just depends on the subspaces spanned by the pairs of singular
vectors, and not their joint phase.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mikaylagawarecki
Differential Revision: D33751982
Pulled By: mruberry
fbshipit-source-id: c2a4a92a921a732357e99c01ccb563813b1af512
(cherry picked from commit 391319ed8f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69998
Fixes: https://github.com/pytorch/pytorch/issues/69855
The check for undefined grads for forward AD was not being run because `check_undefined_grads` was only passed as True by OpInfo for backward AD. This PR updates gradcheck to interpret `check_undefined_grads` as applying to forward AD as well as backward AD.
This PR also updates codegen to 1) not use ZeroTensor for `self` when the op is in-place, and 2) only create zeros (either through ZeroTensor or at::zeros) if the tensor itself is not undefined. Previously we would error in this case when calling `.options()` on the undefined tensor.
~TODO: undo the skips that are due to the original issue~
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D33235973
Pulled By: soulitzer
fbshipit-source-id: 5769b6d6ca123b2bed31dc2bc6bc8e4701581891
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68948
The case where both the negative and conjugate bits are set
isn't tested currently despite being handled explicitly by `copy`.
In theory this shouldn't matter because neg_bit is only used for real
values, but it does mean the code in copy is untested. So, this just
runs it with a single sample as a sanity check.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33064371
Pulled By: anjali411
fbshipit-source-id: e90c65e311507c4fc618ff74fecc4929599c4fa3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947
`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.
`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view` but this moves the handling into `_test_math_view`
to make it apply to all view op tests.
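For concreteness, the neg-view case mentioned above, using existing torch APIs:
```
import torch

x = torch.randn(3)
neg_view = torch._neg_view(-x)   # a view of -x with the negative bit set
assert neg_view.is_neg()
assert torch.equal(neg_view, x)  # represents the same values as the original x
```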
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33064327
Pulled By: anjali411
fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69558
Currently we skip batched forward grad checks completely for certain views that also have in-place variants. This PR allows us to decouple these checks.
Alternative: just skip the batched forward checks for inplace ops entirely. I'm okay with this because it was surprising to me these checks are being run in the first place.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33020599
Pulled By: soulitzer
fbshipit-source-id: f8012aadc0e775f80da0ab62b2c11f6645bb1f51
Summary:
This PR:
- creates the "jiterator" pattern, allowing elementwise unary and binary kernels that don't accept scalars to be jit compiled when called
- ports the gcd and i1 CUDA kernels to use the jiterator
- extends elementwise binary systematic testing to be comparable to elementwise unary systematic testing
- separates one test case from test_out in test_ops.py
- updates more OpInfos to use expected failures instead of skips
The jiterator currently does not support half, bfloat16 or complex dtypes. It also (as mentioned above) doesn't support scalar inputs. In the future we expect to add support for those datatypes and scalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69439
Reviewed By: ngimel
Differential Revision: D32874968
Pulled By: mruberry
fbshipit-source-id: d44bb9cde4f602703e75400ec5a0b209f085e9b3