Enables:
test_bmm_cuda_float64
test_bmm_deterministic_cuda_float64
test_csr_matvec_cuda_complex128
test_csr_matvec_cuda_complex64
test_csr_matvec_cuda_float32
test_csr_matvec_cuda_float64
To enable the above tests had to add some more hip mappings for the hipification process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78939
Approved by: https://github.com/pruthvistony, https://github.com/malfet
Make it so that it is valid to set metadata after detach calls, like `x.detach().resize_(...)`.
This technically lifts some restrictions around `.data`. This PR means that you can now technically call `x.data.resize_(...)`, which can now directly resize `x` instead of erroring.
My understanding: Before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`, and during the merge, this error was added to avoid silent confusing behavior changes.
It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang.
(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545) - I ended up making one to try to slog through test failures).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
Add support for sparse fake tensors.
- The testing strategy is to run a fake tensor cross ref test on `test_sparse.py`. This is necessary because OpInfo sparse coverage is completely nonexistent. We could have tried to turn on cross ref testing globally for all files, but that would be very time consuming and the tests I'm interested in are mostly in this file. There are some exclusions in testing for things that don't work.
- I make fake tensor converter raise a UnsupportedFakeTensorException if the meta converter fails to do a conversion (which can happen in a relatively large number of situations).
- I relax fake tensor invariants so that you can make a fake tensor from a meta tensor. This is useful because in the cross ref test sometimes we operate on meta tensors.
- Fake tensor wrapping is improved to handle the case when a function doesn't return any tensors
- Meta converter is taught how to convert sparse tensors to meta
There's still a little more cleanup that needs to be done, but this is good for review.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82172
Approved by: https://github.com/eellison
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)
Part of #70926
In other functions (ie (torch.diagonal)[https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal]) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.
Here the reference implementation from scipy is only considering matrix output, so even if we only support 2-d output at first. It may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output.
The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimensions() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.
This would need to be altered for the case where `len(shape)` > 2. One options is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here `offsets` and `diagonals` becomes lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2` also the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and shape is `2-d`. This option allows the rows of an input element `diagonals[i]` to have a different length which may be appropriate as the max length of a diagonal along different dimension pairs will be different.
Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets` and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension being fixed at `2` in this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply asigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len shape==2`.
In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default arguments logic, however I think the first is better choice as it offers more flexibility.
I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.
**Resolution**: Since no one has offered a request for N-D output support, I think is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)
Part of #70926
In other functions (ie (torch.diagonal)[https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal]) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.
Here the reference implementation from scipy is only considering matrix output, so even if we only support 2-d output at first. It may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output.
The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimensions() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.
This would need to be altered for the case where `len(shape)` > 2. One options is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here `offsets` and `diagonals` becomes lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2` also the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and shape is `2-d`. This option allows the rows of an input element `diagonals[i]` to have a different length which may be appropriate as the max length of a diagonal along different dimension pairs will be different.
Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:
```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets` and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension being fixed at `2` in this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply asigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len shape==2`.
In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default arguments logic, however I think the first is better choice as it offers more flexibility.
I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.
**Resolution**: Since no one has offered a request for N-D output support, I think is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963
This is a re-land of D35192346 (9872a06d77) and D35192317 (a9216cde6c), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.
The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.
**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (here cc23725e89/c10/core/DispatchKey.h (L382) and here cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)), which works as follows:
Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).
In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.
The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294).
The mobile-optimization currently does **not** extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.
**The Bug**
This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).
That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.
**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**
Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?
**The debugging experience was pretty difficult**
Debugging the Milan-specific failure was made difficult by the following:
(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.
(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)
(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542
Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.
Reviewed By: phding, albanD
Differential Revision: D35222806
fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402
The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes.
Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728
Test Plan:
Steps to test:
(1) connect to a mobile OD
(2) run `one_world android emulator android-29` in a terminal to start the android emulator
(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`
I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.
Reviewed By: albanD
Differential Revision: D34034848
fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
(cherry picked from commit 9ab9b12d35)
Summary: I think this diff stack broke all the related tasks below.
Test Plan:
For our failing tests:
buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled
For the ubn:
Not really sure what to do, trying to build the app and see if I can use an effect?
Reviewed By: shoumikhin
Differential Revision: D34018849
fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb2ea)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091
Fixes https://github.com/pytorch/pytorch/issues/65394
The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs).
Since masked sum uses `torch.sparse.sum` then, for the simplicity of masked reductions implementations, its reduction behavior ought to be defined by the behavior of the `torch.sum`. This PR implements the behavioral connection with respect to the directional summation of empty sparse tensors that correspond to all-zero strided tensors.
cc nikitaved pearu cpuhrsch
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D33651750
Pulled By: cpuhrsch
fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d
(cherry picked from commit 53f97e80f7)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887Closes#46988, closes#46987, closes#46761
By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
Most of these don't support autograd because the derivative formulas
use unsupported operators.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32734911
Pulled By: cpuhrsch
fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887Closes#46988, closes#46987, closes#46761
By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
Most of these don't support autograd because the derivative formulas
use unsupported operators.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32706197
Pulled By: cpuhrsch
fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67904.
- Create a sparse tensor when the sparse layout is given even if the input tensor is not sparse.
cc nikitaved pearu cpuhrsch IvanYashchuk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108
Reviewed By: anjali411
Differential Revision: D32316269
Pulled By: cpuhrsch
fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses were missing in the function `make_symmetric_matrices`.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.
`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980
Reviewed By: ngimel
Differential Revision: D30994115
Pulled By: cpuhrsch
fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548
`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.
Repro:
```
import torch
x = torch.tensor(0)
x // 1
```
Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:571.)
return torch.floor_divide(self, other)
```
After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
x // 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034
Reviewed By: mruberry
Differential Revision: D30658010
Pulled By: saketh-are
fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.
`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980
Reviewed By: ngimel
Differential Revision: D29699456
Pulled By: cpuhrsch
fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916
This fixes two problems with sparse multiplication
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>
Also added tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723
Reviewed By: ezyang
Differential Revision: D29962639
Pulled By: cpuhrsch
fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.
This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.
One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.
The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.
Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233
Reviewed By: iramazanli
Differential Revision: D29494995
Pulled By: jbschlosser
fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59553
Added a test for 0x0 sparse coo input for sparse_unary_ufuncs.
This test fails for `conj` on master.
Modified `unsupportedTypes` for test_sparse_consistency, complex dtypes
pass, but float16 doesn't pass for `conj` because `to_dense()` doesn't
work with float16.
Fixes https://github.com/pytorch/pytorch/issues/59549
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D28968215
Pulled By: anjali411
fbshipit-source-id: 44e99f0ce4aa45b760d79995a021e6139f064fea
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745
The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.
| Shape | Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2150 us | 551 us |
| 128,128,32 | 1250 us | 1020 us | 197 us |
| 64,128,32 | 581 us | 495 us | 99 us |
| 32,128,32 | 292 us | 255 us | 83 us |
| 16,128,32 | 147 us | 126 us | 75 us |
| 8,128,32 | 75 us | 65 us | 65 us |
| 4,128,32 | 39 us | 33 us | 33 us |
| 2,128,32 | 20 us | 18 us | 18 us |
| 1,128,32 | 11 us | 9 us | 9 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149
Reviewed By: mruberry
Differential Revision: D28817466
Pulled By: ngimel
fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
Summary:
Closes gh-24745
The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.
| Shape | Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2220 us | 496 us |
| 128,128,32 | 1250 us | 976 us | 175 us |
| 64,128,32 | 581 us | 486 us | 88 us |
| 32,128,32 | 292 us | 245 us | 80 us |
| 16,128,32 | 147 us | 120 us | 71 us |
| 8,128,32 | 75 us | 61 us | 61 us |
| 4,128,32 | 39 us | 32 us | 32 us |
| 2,128,32 | 20 us | 17 us | 17 us |
| 1,128,32 | 11 us | 9 us | 9 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811
Reviewed By: anjali411
Differential Revision: D28700259
Pulled By: ngimel
fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153
Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.
- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests
Note: This PR contains only complex support for torch.sparse_coo_tensor fordward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in ghstack should cover other sparse operations to have a more complex sparse support, specifically related with the use of specific APIs for accelerated linear algebra.
Note: Before using ghstack the original PR was #50984
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27765618
Pulled By: ezyang
fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253
In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets it's `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.
As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874
Reviewed By: bdhirsh
Differential Revision: D27246997
Pulled By: albanD
fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__
Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.
Note: JIT testing support for TensorList inputs will be added in a follow up PR.
Fixes https://github.com/pytorch/pytorch/issues/51996
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922
Reviewed By: H-Huang
Differential Revision: D27448952
Pulled By: heitorschueroff
fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034Fixes#53544
I had to touch a bunch of lines but the refactoring was fairly
mechanical. Here's how it works.
The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions. The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key(); it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time). See also #53124. But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.
So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
- Most top level functions take a DispatchKey as their argument. I
use the new function dispatchKeyToTensorOptions to convert it into
a TensorOptions
- typeIdWithDefault now produces a TensorOptions (probably could do
with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
- Previously, the function options() was typically used to convert the
DispatchKey into a TensorOptions. Now its replacement build_options
just takes a TensorOptions and sets some extra fields on it.
Irritatingly, I can't just replace
`build_options(options, scalar_type, device)` with
`options.dtype(scalar_type).device(device)` because the semantics
are slightly different: if device is nullopt, we should preserve
the usage of the device specified in options (what options.device()
does is overwrite the device unconditionally; e.g., if device is
nullopt, unset device from options)
- The other major sink for DispatchKey was `internal_new_from_data`,
but it turns out it only really extracts the device type from
the dispatch key. Now it just pulls out the device from
TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
introduce new functions dispatchKeyToLayout (replicating
layout_from_backend--there are still a few uses of this function
so I couldn't delete it) and dispatchKeyToDeviceType (replacing
computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
previously used, I now do `other.options().type_equal()`, which
is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
a tensor for values, and then read out the dispatch key from the
result to allocate the keys. As best as I can tell, this is totally
equivalent to just passing in the options to both values and indices
(the only difference is dtype, which is captured via a separate
argument)
This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all. I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.
Even with this limited refactor, the payoff is sweet. I can delete:
- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType
The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27109509
Pulled By: ezyang
fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
checks sizes of sparse tensors when comparing them in assertEqual.
Removes additional checks in safeCoalesce, safeCoalesce should not be a test for `.coalesce()` function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148
Reviewed By: mruberry
Differential Revision: D24823127
Pulled By: ngimel
fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43699
- Changed the order of `TORCH_CHECK` and `if (options.layout() == kSparse && self.is_sparse())`
inside `empty_like` method.
- [x] Added tests
EDIT:
More details on that and why we can not take zeros_like approach.
Python code :
```python
res = torch.zeros_like(input_coalesced, memory_format=torch.preserve_format)
```
is routed to
```c++
// TensorFactories.cpp
Tensor zeros_like(
const Tensor& self,
const TensorOptions& options,
c10::optional<c10::MemoryFormat> optional_memory_format) {
if (options.layout() == kSparse && self.is_sparse()) {
auto res = at::empty({0}, options); // to be resized
res.sparse_resize_and_clear_(
self.sizes(), self.sparse_dim(), self.dense_dim());
return res;
}
auto result = at::empty_like(self, options, optional_memory_format);
return result.zero_();
}
```
and passed to `if (options.layout() == kSparse && self.is_sparse())`
When we call in Python
```python
res = torch.empty_like(input_coalesced, memory_format=torch.preserve_format)
```
it is routed to
```c++
Tensor empty_like(
const Tensor& self,
const TensorOptions& options_,
c10::optional<c10::MemoryFormat> optional_memory_format) {
TORCH_CHECK(
!(options_.has_memory_format() && optional_memory_format.has_value()),
"Cannot set memory_format both in TensorOptions and explicit argument; please delete "
"the redundant setter.");
TensorOptions options =
self.options()
.merge_in(options_)
.merge_in(TensorOptions().memory_format(optional_memory_format));
TORCH_CHECK(
!(options.layout() != kStrided &&
optional_memory_format.has_value()),
"memory format option is only supported by strided tensors");
if (options.layout() == kSparse && self.is_sparse()) {
auto result = at::empty({0}, options); // to be resized
result.sparse_resize_and_clear_(
self.sizes(), self.sparse_dim(), self.dense_dim());
return result;
}
```
cc pearu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44058
Reviewed By: albanD
Differential Revision: D23672494
Pulled By: mruberry
fbshipit-source-id: af232274dd2b516dd6e875fc986e3090fa285658
Summary:
This PR:
- updates div to perform true division
- makes torch.true_divide an alias of torch.div
This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907
Reviewed By: ngimel
Differential Revision: D23622114
Pulled By: mruberry
fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
Summary:
Refactor comnon pattern of (torch.cuda.version and [int(x) for x in torch.cuda.version.split(".")] >= [a, b]) into `_get_torch_cuda_version()` function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42626
Reviewed By: seemethere
Differential Revision: D22956149
Pulled By: malfet
fbshipit-source-id: 897c55965e53b477cd20f69e8da15d90489035de
Summary:
**BC-Breaking Note:**
BC breaking changes in the case where keepdim=True. Before this change, when calling `torch.norm` with keepdim=True and p='fro' or p=number, leaving all other optional arguments as their default values, the keepdim argument would be ignored. Also, any time `torch.norm` was called with p='nuc', the result would have one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. After the change, for each of these cases, the result has the same number and order of dimensions as the input.
**PR Summary:**
* Fix keepdim behavior
* Throw descriptive errors for unsupported sparse norm args
* Increase unit test coverage for these cases and for complex inputs
These changes were taken from part of PR https://github.com/pytorch/pytorch/issues/40924. That PR is not going to be merged because it overrides `torch.norm`'s interface, which we want to avoid. But these improvements are still useful.
Issue https://github.com/pytorch/pytorch/issues/24802
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41956
Reviewed By: albanD
Differential Revision: D22837455
Pulled By: mruberry
fbshipit-source-id: 509ecabfa63b93737996f48a58c7188b005b7217
Summary:
**BC-Breaking Note**
This PR changes the behavior of the torch.tensor, torch.as_tensor, and sparse constructors. When given a tensor as input and a device is not explicitly specified, these constructors now always infer their device from the tensor. Historically, if the optional dtype kwarg was provided then these constructors would not infer their device from tensor inputs. Additionally, for the sparse ctor a runtime error is now thrown if the indices and values tensors are on different devices and the device kwarg is not specified.
**PR Summary**
This PR's functional change is a single line:
```
auto device = device_opt.has_value() ? *device_opt : (type_inference ? var.device() : at::Device(computeDeviceType(dispatch_key)));
```
=>
```
auto device = device_opt.has_value() ? *device_opt : var.device();
```
in `internal_new_from_data`. This line entangled whether the function was performing type inference with whether it inferred its device from an input tensor, and in practice meant that
```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t, dtype=torch.float64)
```
would return a tensor on the CPU, not the default CUDA device, while
```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t)
```
would return a tensor on the device of `t`!
This behavior is niche and odd, but came up while aocsa was fixing https://github.com/pytorch/pytorch/issues/40648.
An additional side affect of this change is that the indices and values tensors given to a sparse constructor must be on the same device, or the sparse ctor must specify the dtype kwarg. The tests in test_sparse.py have been updated to reflect this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41984
Reviewed By: ngimel
Differential Revision: D22721426
Pulled By: mruberry
fbshipit-source-id: 909645124837fcdf3d339d7db539367209eccd48
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
This PR implements softmax support for sparse tensors.
The sparse softmax is related to dense softmax when the values of unspecified sparse tensor entries are taken to be `-inf` that will have the effect of "zero entries ignored". This relation is used for testing the correctness of results here.
Resolves https://github.com/pytorch/pytorch/issues/23651 for CPU.
- [x] sparse softmax
- [x] CPU C++ implementation
- [x] unittests
- [x] update softmax documentation
- [x] autograd support
- [x] sparse log_softmax
- [x] CPU C++ implementation
- [x] unittests
- [x] update log_softmax documentation
- [x] autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36305
Differential Revision: D21566540
Pulled By: ezyang
fbshipit-source-id: a632ea69c38622f960721482e442efeb8d0a54fc
Summary:
This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427
1. `test_sparse_add_out_bfloat16` in test_sparse.py fixed to use updated `atol` argument instead of `prec` for `assertEqual`
2. The conversion of `flt_min` to `int64` is divergent on HIP compared to numpy. The change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616
Differential Revision: D21379876
Pulled By: ezyang
fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34964
Sparse cuda add was implemented by just concatenating the indices and values for the tensor. If called repeatedly in a tight loop this will let `nnz` grow unbounded. In the worst case of `x.add_(x)` it grows exponentially.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36030
Differential Revision: D20873504
Pulled By: zou3519
fbshipit-source-id: d90ed8dda0c89571fb89e358757b5dde299513df
Summary:
This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be one by one looked at and fixed at the earliest, but until then disabling them to unblock `test2`
The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427
Differential Revision: D21302909
Pulled By: ezyang
fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
Enables bfloat16 type for add_out of sparse tensors. Also enabled it for coalesce() which is used in unit test reference checking.
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35978
Differential Revision: D20874142
Pulled By: ezyang
fbshipit-source-id: af8d2f4bc5f5cc3bb7f8cb1e3c688669ba3d13b9
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.
In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.
New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794
Differential Revision: D20545507
Pulled By: mruberry
fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is international. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. while methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20509850
Pulled By: mruberry
fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is international. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. while methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20497453
Pulled By: mruberry
fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
Summary:
This PR fixed documentation for `torch.add` with alpha. It also fixed these deprecated python calls `torch.add` and `torch.addmm` in tests, which may affect performance in *test/test_sparse.py* and *test/test_nn.py*.
cc csarofeen ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33935
Differential Revision: D20313320
Pulled By: ngimel
fbshipit-source-id: fb08413d7e244865952e3fc0e1be7f1794ce4e9a
Summary:
Addresses https://github.com/pytorch/pytorch/issues/33300.
Calling .numpy() on a CUDA or non-strided (e.g. sparse) tensor segfaults in current PyTorch. This fixes the segfaults and throws the appropriate TypeError, as was intended.
Two tests, one in test_cuda.py and the other in test_sparse.py, are added to verify the behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33612
Differential Revision: D20038210
Pulled By: mruberry
fbshipit-source-id: 265531dacd37c392232fd3ec763489a62ef54795
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31490
When this happens, a dense tensor is constructed from a sparse constructor.
Fixes: https://github.com/pytorch/pytorch/issues/16154
Test Plan: Imported from OSS
Reviewed By: cpuhrsch, mrshenli
Differential Revision: D19196498
Pulled By: gchanan
fbshipit-source-id: 57a6324833e35f3e62318587ac74267077675b93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892
Fixes all outstanding lints and actually installs a properly configured
flake8
Test Plan: Imported from OSS
Differential Revision: D18862825
Pulled By: suo
fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85