Commit Graph

213 Commits

Author SHA1 Message Date
Yukio Siraichi
c829cb6840 Port min kernel to structured kernels. (#61450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61450

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29741713

Pulled By: bdhirsh

fbshipit-source-id: 2c107752a90fd39cfb55e08aaf3541bd484a5fc3
2021-09-28 14:03:54 -07:00
Ivan Yashchuk
1fec9cd76b [Fixed] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.

cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D30994115

Pulled By: cpuhrsch

fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24
2021-09-21 13:03:40 -07:00
Philip Meier
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
Richard Zou
92b31b59af Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA]
Test Plan: revert-hammer

Differential Revision:
D29699456 (ad4848565e)

Original commit changeset: 407ae53392ac

fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de
2021-09-01 07:32:24 -07:00
Saketh Are
83e28a7d28 Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548

`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.

Repro:
```
import torch
x = torch.tensor(0)
x // 1
```

Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:571.)
  return torch.floor_divide(self, other)
```

After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x // 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034

Reviewed By: mruberry

Differential Revision: D30658010

Pulled By: saketh-are

fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
2021-08-31 11:27:56 -07:00
Ivan Yashchuk
ad4848565e Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D29699456

Pulled By: cpuhrsch

fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
2021-08-30 15:06:25 -07:00
Kushashwa Ravi Shrimali
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
mattip
c8eda919a4 test, fix sparse * dense exceptions and corner case (#61723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916

This fixes two problems with sparse multiplication
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>

Also added tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723

Reviewed By: ezyang

Differential Revision: D29962639

Pulled By: cpuhrsch

fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
2021-08-05 11:27:12 -07:00
Kurt Mohler
87334c40a7 Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: mrshenli

Differential Revision: D29774486

Pulled By: albanD

fbshipit-source-id: bfc9119c478f0244d5be681bcf4954a3eb97e542
2021-07-20 10:55:43 -07:00
Anjali Chourdia
287603f51c Revert D29698486: [pytorch][PR] Remove torch._bmm and remove torch.bmm deterministic arg documentation
Test Plan: revert-hammer

Differential Revision:
D29698486 (328606699f)

Original commit changeset: 5af2d3803ab1

fbshipit-source-id: ce954c13196b1fb8277d61a686ac351d3bf13903
2021-07-16 11:02:09 -07:00
Kurt Mohler
328606699f Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: zou3519

Differential Revision: D29698486

Pulled By: albanD

fbshipit-source-id: 5af2d3803ab1eb093616bcfc7e074d8b57ef6958
2021-07-16 09:18:34 -07:00
Joel Schlosser
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
Ivan Yashchuk
90303157ab Enable complex dtypes for coo_sparse-coo_sparse matmul [CPU] (#59554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59554

This PR enables complex numbers supports for matrix-matrix
multiplication of COO sparse matrices.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28968309

Pulled By: anjali411

fbshipit-source-id: 4fd471e76a5584366aabc86c08b4564667ee54ca
2021-06-08 19:34:41 -07:00
Ivan Yashchuk
acc47357b5 Fix torch.conj for zero-dimensional sparse coo matrix (#59553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59553

Added a test for 0x0 sparse coo input for sparse_unary_ufuncs.
This test fails for `conj` on master.

Modified `unsupportedTypes` for test_sparse_consistency, complex dtypes
pass, but float16 doesn't pass for `conj` because `to_dense()` doesn't
work with float16.

Fixes https://github.com/pytorch/pytorch/issues/59549

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28968215

Pulled By: anjali411

fbshipit-source-id: 44e99f0ce4aa45b760d79995a021e6139f064fea
2021-06-08 15:46:49 -07:00
Peter Bell
99f2000a99 Migrate nonzero from TH to ATen (CPU) (#59149)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm.  As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.

|    Shape   |  Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2150 us |            551 us |
| 128,128,32 | 1250 us |          1020 us |            197 us |
|  64,128,32 |  581 us |           495 us |             99 us |
|  32,128,32 |  292 us |           255 us |             83 us |
|  16,128,32 |  147 us |           126 us |             75 us |
|  8,128,32  |   75 us |            65 us |             65 us |
|  4,128,32  |   39 us |            33 us |             33 us |
|  2,128,32  |   20 us |            18 us |             18 us |
|  1,128,32  |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149

Reviewed By: mruberry

Differential Revision: D28817466

Pulled By: ngimel

fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
2021-06-02 12:26:29 -07:00
Natalia Gimelshein
657b75d155 Revert D28700259: [pytorch][PR] Migrate nonzero from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D28700259 (95b1bc1009)

Original commit changeset: 9b279ca7c36d

fbshipit-source-id: 267afe63376be598d24c862e02e3b4b3ea75f77c
2021-05-27 20:07:30 -07:00
Peter Bell
95b1bc1009 Migrate nonzero from TH to ATen (CPU) (#58811)
Summary:
Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm.  As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.

|    Shape   |  Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2220 us |            496 us |
| 128,128,32 | 1250 us |           976 us |            175 us |
|  64,128,32 |  581 us |           486 us |             88 us |
|  32,128,32 |  292 us |           245 us |             80 us |
|  16,128,32 |  147 us |           120 us |             71 us |
|  8,128,32  |   75 us |            61 us |             61 us |
|  4,128,32  |   39 us |            32 us |             32 us |
|  2,128,32  |   20 us |            17 us |             17 us |
|  1,128,32  |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811

Reviewed By: anjali411

Differential Revision: D28700259

Pulled By: ngimel

fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
2021-05-27 10:06:54 -07:00
Pearu Peterson
be4ba29d49 Detect overflow in numel of sparse COO tensor (#57492)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57492

Reviewed By: albanD

Differential Revision: D28273649

Pulled By: mruberry

fbshipit-source-id: 08ba50509556df1981d7ede025d84a836d2e8e5e
2021-05-25 22:16:21 -07:00
Alexander
6f2c0cccdd New: sparse complex: add linear algebra, addmm (#57129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57129

Test Plan: Imported from OSS

Reviewed By: janeyx99, astaff

Differential Revision: D28112701

Pulled By: ezyang

fbshipit-source-id: 1b253453dc19e908fb18d0b1a83738243e0a8d59
2021-05-07 05:37:48 -07:00
Alexander
a911c4fc1c New: Initial support for sparse complex tensors constructors for CPU/CUDA (#57125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57125

I'm opening this PR, solving the last issued reported before merging PR #54153

https://github.com/pytorch/pytorch/pull/54153#issuecomment-827997616,

Solves gh-50690

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28112702

Pulled By: ezyang

fbshipit-source-id: 915681954edb14b7c19c3ffe641af2d2e6649576
2021-05-07 05:36:41 -07:00
Peter Bell
a5288a0244 Sparse support for division rounding_mode argument (#51989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51989

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28118114

Pulled By: mruberry

fbshipit-source-id: 2a76ee55c3845552e57e93d54628ce3c2fab3399
2021-05-01 17:37:25 -07:00
Mike Ruberry
7bcce2acb9 Revert D27765618: Initial support for sparse complex tensors constructors for CPU/CUDA
Test Plan: revert-hammer

Differential Revision:
D27765618 (daef60c3b7)

Original commit changeset: a9cdd31d5c7a

fbshipit-source-id: f700d5db7ff8930b9158460b5a77f68a35e212a4
2021-04-27 15:48:51 -07:00
Alexander
0d41122e61 Eliminate global usage of torch.set_default_dtype in sparse test (#56393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56393

Fixes for  gh-56369

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27913266

Pulled By: mruberry

fbshipit-source-id: 2c590d3a2188aae251184f08c1a6a2c4c570d150
2021-04-27 15:23:14 -07:00
Alexander
daef60c3b7 Initial support for sparse complex tensors constructors for CPU/CUDA (#54153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153

Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.

- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests

Note: This PR contains only complex support for torch.sparse_coo_tensor fordward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in ghstack should cover other sparse operations to have a more complex sparse support, specifically related with the use of specific APIs for accelerated linear algebra.

Note: Before using ghstack the original PR  was  #50984

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27765618

Pulled By: ezyang

fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
2021-04-27 14:39:13 -07:00
sorenrasmussenai
f27513e951 Fix bug in torch.sparse.addmm on CUDA when beta != 0 or 1 (#56160)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55917, which caused `torch.sparse.addmm` to fail on CUDA whenever `beta` was different from 0 or 1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56160

Reviewed By: ejguan

Differential Revision: D27825108

Pulled By: ngimel

fbshipit-source-id: 2ade5ea38c5322768dc4dffb40c65fcbb17ec201
2021-04-26 02:57:41 -07:00
Alexander
6ee333cdb5 modernize test_sparse (#54572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54572

Adding device generic tests to `test_sparse`.
Follow-up PR: #54153

I think is ready to review.
Looking forward your comments cc mruberry.

Thanks

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27562663

Pulled By: mruberry

fbshipit-source-id: c48973e707f779b529bc7f61b75103194b428987
2021-04-09 12:19:29 -07:00
Alban Desmaison
b91d48877d Reland Fix reference cycle in sparse coalesce graph (#55404)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/52874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55404

Reviewed By: bdhirsh

Differential Revision: D27600438

Pulled By: albanD

fbshipit-source-id: f5c286638b324ad59be65657a016028af5e2b303
2021-04-07 12:02:42 -07:00
Brian Hirsh
ec80981d28 Revert D27246997: [pytorch][PR] Fix reference cycle in sparse coalesce graph
Test Plan: revert-hammer

Differential Revision:
D27246997 (815bfad28c)

Original commit changeset: 0fe6c1104350

fbshipit-source-id: 4d345718589a642d3c65474b266342285205ccdf
2021-04-06 11:45:27 -07:00
Peter Bell
815bfad28c Fix reference cycle in sparse coalesce graph (#52874)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253

In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets it's `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.

As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874

Reviewed By: bdhirsh

Differential Revision: D27246997

Pulled By: albanD

fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
2021-04-06 08:32:19 -07:00
Heitor Schueroff
6d87b3667f Added support for TensorList inputs in OpInfo (#54922)
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__

Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.

Note: JIT testing support for TensorList inputs will be added in a follow up PR.

Fixes https://github.com/pytorch/pytorch/issues/51996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922

Reviewed By: H-Huang

Differential Revision: D27448952

Pulled By: heitorschueroff

fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
2021-03-31 04:42:10 -07:00
Edward Yang
e0aebe241d Refactor tensor_new.cpp to use TensorOptions instead of DispatchKey (#54034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27109509

Pulled By: ezyang

fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
2021-03-19 09:08:32 -07:00
mattip
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Rong Rong (AI Infra)
b52e2e6045 [BE] _get_torch_cuda_version should return tuple (#52409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52409

Reviewed By: jbschlosser, glaringlee

Differential Revision: D26513924

Pulled By: walterddr

fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734
2021-02-18 09:28:38 -08:00
Mike Ruberry
594a66d778 Warn about floor_divide performing incorrect rounding (#50281) (#50281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: mruberry

Differential Revision: D26257855

fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b
2021-02-10 03:13:34 -08:00
Jeffrey Wan
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where, gradcheck/gradgradcheck is now being implicitly invoked with `check_batched_grad` as True, but they were previously False. Cases fall into two basic categories:
1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead
3) the call site was already using globally imported function, but does not explicitly pass `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, I assumed that the the author is aware that omitting the flag means not applying check_batched_grad=True. (but maybe that is not the case?)

Overall this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were just probably not aware of this flag and did not mean to have this flag as False.

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

Possible TODO is to prevent new tests from invoking external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
Kyle Chen
d5e5c5455a [ROCm] re-enable test_sparse.py tests (#50557)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

cc: jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50557

Reviewed By: mruberry

Differential Revision: D25941432

Pulled By: ngimel

fbshipit-source-id: 534fc8a91a48fa8b3b397e63423cd8347b41bbe2
2021-01-18 23:36:39 -08:00
Nathan Howell
c517e15d79 Add support for converting sparse bool tensors to dense (#50019)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50019

Reviewed By: smessmer

Differential Revision: D25782045

Pulled By: ezyang

fbshipit-source-id: a8389cbecb7e79099292a423a6fd8ac28631905b
2021-01-06 07:38:14 -08:00
mattip
f96ce3305c prohibit assignment to a sparse tensor (#50040)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48225 by prohibiting assignment to a sparse Tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50040

Reviewed By: mrshenli

Differential Revision: D25757125

Pulled By: zou3519

fbshipit-source-id: 3db6f48932eb10bf6ca5e97a6091afcabb60e478
2021-01-04 14:38:35 -08:00
Himangshu
9552cc65d4 Creation of test framework for Sparse Operators (#48488)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48488

Reviewed By: ngimel

Differential Revision: D25696487

Pulled By: mruberry

fbshipit-source-id: dc4f57c6628f62b74dd321f3f6b0fff86f25b040
2020-12-23 15:42:26 -08:00
Alexander
44ce0b8883 Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.

The current implementation of `torch.sparse.mm` support this configuration,
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this could spend a lot of memory when sparse_matrix2's shape is large.

This implementation extends `torch.sparse.mm` function to support  `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`

Resolves  #[20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA.

- [x] sparse matmul
  - [x] CPU/CUDA C++ implementation
  - [x] unittests
  - [x] update torch.sparse.mm documentation
  - [x] autograd support

The CPU sparse-sparse matmul was implemented taking as a reference this work "Sparse Matrix Multiplication Package (SMMP)". The GPU sparse-sparse matmul is based on cuSparse, there is specific code for CUSPARSE when CUSPARSE_VERSION >= 11 and old version of CUSPARSE. Both CPU/CUDA  rely on the sparse-sparse matmul algorithm using the CSR indices format as it is one of the fastest algorithm.

Here it is the latest benchmark (script is here) results for torch.sparse.mm (CUDA) and torch.sparse.mm (CPU) and scipy, values are float32 scalars:

size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7

A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:

```
[------------------------- sparse.mm-backward -------------------------]
                            |   sparse.backward   |  dense.backward
 -----------------------------------------------------------------------
      (32, 10000) | 0.01    |            13.5          |         2.4
      (32, 10000) | 0.05    |            52.3          |         2.4
      (512, 10000) | 0.01   |          1016.8          |       491.5
      (512, 10000) | 0.05   |          1604.3          |       492.3
      (1024, 10000) | 0.01  |          2384.1          |      1963.7
      (1024, 10000) | 0.05  |          3965.8          |      1951.9
```

I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.

```
[---------------------------------- matmul ---------------------------------]
                        |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
          torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
          scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
  (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
          torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1

[----------------------------------- backward -----------------------------------]
                        |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -----------------------------------------------------------------------
  (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
          torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
  (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
          torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9

Times are in milliseconds (ms).
```

In summary, I can say that the new `sparse @ sparse` backward algorithm is better as it is more about saving space than performance. Moreover, it is better than other options tested before.

## **References**

1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526

Reviewed By: mruberry

Differential Revision: D25661239

Pulled By: ngimel

fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00
Xiang Gao
87636c07bb CUDA BF16 sparse (#48807)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48807

Reviewed By: mruberry

Differential Revision: D25526752

Pulled By: ngimel

fbshipit-source-id: 9ff8e637486cfd67d46daf0c05142bbe611e08ec
2020-12-14 09:55:52 -08:00
kshitij12345
25ab39acd0 [numpy] torch.asin : promote integer inputs to float (#48461)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48461

Reviewed By: ngimel

Differential Revision: D25192319

Pulled By: mruberry

fbshipit-source-id: fd5dffeca9cd98b86782bfa6a9ab367e425ee934
2020-11-27 15:26:58 -08:00
kshitij12345
e9efd8df1b [numpy] torch.log1p : promote integer inputs to float (#48002)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48002

Reviewed By: ngimel

Differential Revision: D25148911

Pulled By: mruberry

fbshipit-source-id: 902d0ddf699debd6edd1b3d55f5c73932ca45e83
2020-11-24 22:01:07 -08:00
Natalia Gimelshein
4a2fb34042 check sparse sizes (#47148)
Summary:
checks sizes of sparse tensors when comparing them in assertEqual.
Removes additional checks in safeCoalesce, safeCoalesce should not be a test for `.coalesce()` function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148

Reviewed By: mruberry

Differential Revision: D24823127

Pulled By: ngimel

fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a
2020-11-09 10:33:24 -08:00
vfdev-5
dc7cd97402 Fixes bug in sspaddmm (#45113) (#45963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45113

Description:
- Fixed bug in sspaddmm by calling contiguous on indices.
- Added tests

We have to make indices contiguous as we use `indices.data_ptr` in `_to_csr` which assumes row-contiguous storage:
be45c3401a/aten/src/ATen/native/sparse/SparseTensorMath.cpp (L1087-L1090)

> Part 1 of fixing this is probably to document sspaddmm. Part 2 may be to rewrite it using other ops. (https://github.com/pytorch/pytorch/issues/45113#issuecomment-700166809)

- Docs will be written here: https://github.com/pytorch/pytorch/pull/45400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45963

Reviewed By: malfet

Differential Revision: D24335599

Pulled By: ngimel

fbshipit-source-id: 8278c73a1b4cccc5e22c6f3818dd222588c46b45
2020-10-15 16:50:16 -07:00
Alexander
29dc3c5ec8 Sparse softmax support (CUDA) (#42307)
Summary:
This PR implements softmax support for sparse tensors.

Resolves gh-23651 for CUDA.

- [x]  sparse softmax
    - [x]  CUDA C++ implementation
    - [x]  unittests
    - [x]  update softmax documentation
    - [x]  autograd support
- [x]  sparse log_softmax
    - [x]  CUDA C++ implementation
    - [x]  unittests
    - [x]  update log_softmax documentation
    - [x]  autograd support

Here are some benchmark (script is [here](https://gist.github.com/aocsa/fbc1827b3e49901512a33ba96092cbc1)) results for `torch.sparse.softmax and torch.softmax`,  using CPU and GPU, values are float64 scalars, timing repeat is 1000:

| size         | density | sparse CUDA | sparse CPU |
|--------------|---------|-------------|------------|
|  (32, 10000) |   0.01  |    380.2    |    687.5   |
| (32, 10000)  | 0.05    | 404.3       | 2357.9     |
| (32, 10000)  | 0.1     | 405.9       | 3677.2     |
| (512, 10000) | 0.01    | 438.0       | 5443.4     |
| (512, 10000) | 0.05    | 888.1       | 24485.0    |
| (512, 10000) | 0.1     | 1921.3      | 45340.5    |

| size         | density | dense CUDA | dense CPU |
|--------------|---------|-------------|------------|
|  (32, 10000) |   0.01  |     23.6    |   1943.2   |
| (32, 10000)  | 0.05    | 23.6        | 1954.0     |
| (32, 10000)  | 0.1     | 23.5        | 1950.0     |
| (512, 10000) | 0.01    | 639.3       | 39797.9    |
| (512, 10000) | 0.05    | 640.3       | 39374.4    |
| (512, 10000) | 0.1     | 639.6       | 39192.3    |

Times are in microseconds (us).

Quick note:  I updated the performance test again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42307

Reviewed By: ngimel

Differential Revision: D23774427

Pulled By: mruberry

fbshipit-source-id: bfabf726075b39dde544c10249f27ae1871f82c7
2020-09-24 00:07:30 -07:00
vfdev-5
c947ab0bb9 Added sparse support for asin and neg functions, updated log1p (#44028)
Summary:
Description:

- [x] added C++ code for sparse `asin` and `neg` ops similarly to `log1p` op
- [x] added tests
  - [x] coalesced input CPU/CUDA
  - [x] uncoalesced input CPU/CUDA
- [x] added tests for `negative`  and `arcsin`

Backprop will be addressed in another PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44028

Reviewed By: agolynski

Differential Revision: D23793027

Pulled By: mruberry

fbshipit-source-id: 5fd642808da8e528cf6acd608ca0dcd720c4ccc3
2020-09-22 02:04:38 -07:00