197 Commits

Author SHA1 Message Date
Peter Bell
99f2000a99 Migrate nonzero from TH to ATen (CPU) (#59149)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue, but it breaks down with, for example, the channels-last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
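
To illustrate why iteration order matters here, the following is a minimal pure-Python sketch, not the `TensorIterator` code. It compares row-major (logical) index order with the order obtained by walking elements by memory offset, which is roughly what a stride-reordering iterator may do. The helper names `logical_order` and `memory_order` are hypothetical.

```python
from itertools import product


def logical_order(shape):
    """Indices in row-major (lexicographic) order, regardless of strides."""
    return list(product(*(range(n) for n in shape)))


def memory_order(shape, strides):
    """Indices sorted by memory offset, as a reordering iterator might visit them."""
    def offset(index):
        return sum(i * s for i, s in zip(index, strides))

    return sorted(logical_order(shape), key=offset)


# Contiguous (row-major) strides: both orders agree.
print(memory_order((2, 3), (3, 1)) == logical_order((2, 3)))

# Transposed strides: memory order no longer matches logical order, so an
# operation like nonzero would emit its indices in the wrong sequence.
print(memory_order((2, 3), (1, 2)))
```

For an operation whose output must list indices in logical order, only the first traversal is safe, which is presumably the guarantee a linear-iteration flag is meant to restore.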

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts, which are necessary to write the outputs in the right location.
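
The per-thread counting idea can be sketched in plain Python, as an illustration rather than the ATen code. The input is split into chunks, nonzeros are counted per chunk, an exclusive prefix sum turns the counts into output offsets, and each chunk then writes its indices into its own disjoint range. The names `chunked_nonzero` and `num_chunks` are hypothetical, and the chunks run sequentially here; each one stands in for a thread.

```python
from itertools import accumulate


def chunked_nonzero(values, num_chunks):
    """Return the indices of nonzero entries, computed chunk by chunk.

    Assumes num_chunks is a positive integer.
    """
    n = len(values)
    size = max(1, -(-n // num_chunks))  # ceiling division
    bounds = [(start, min(start + size, n)) for start in range(0, n, size)]

    # Pass 1: per-chunk nonzero counts.
    counts = [sum(1 for v in values[lo:hi] if v != 0) for lo, hi in bounds]

    # Exclusive prefix sum: the starting output slot of each chunk.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: every chunk writes into its own disjoint output range.
    out = [None] * sum(counts)
    for (lo, hi), pos in zip(bounds, offsets):
        for i in range(lo, hi):
            if values[i] != 0:
                out[pos] = i
                pos += 1
    return out


print(chunked_nonzero([0, 3, 0, 0, 5, 7, 0, 1, 0, 2], num_chunks=3))
```

Because the offsets are known before the second pass starts, the chunks never contend for output slots and the result stays in index order.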

|    Shape   |  Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2150 us |            551 us |
| 128,128,32 | 1250 us |          1020 us |            197 us |
|  64,128,32 |  581 us |           495 us |             99 us |
|  32,128,32 |  292 us |           255 us |             83 us |
|  16,128,32 |  147 us |           126 us |             75 us |
|  8,128,32  |   75 us |            65 us |             65 us |
|  4,128,32  |   39 us |            33 us |             33 us |
|  2,128,32  |   20 us |            18 us |             18 us |
|  1,128,32  |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149

Reviewed By: mruberry

Differential Revision: D28817466

Pulled By: ngimel

fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
2021-06-02 12:26:29 -07:00
Natalia Gimelshein
657b75d155 Revert D28700259: [pytorch][PR] Migrate nonzero from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D28700259 (95b1bc1009)

Original commit changeset: 9b279ca7c36d

fbshipit-source-id: 267afe63376be598d24c862e02e3b4b3ea75f77c
2021-05-27 20:07:30 -07:00
Peter Bell
95b1bc1009 Migrate nonzero from TH to ATen (CPU) (#58811)
Summary:
Closes gh-24745

The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue, but it breaks down with, for example, the channels-last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.

This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts, which are necessary to write the outputs in the right location.

|    Shape   |  Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us |          2220 us |            496 us |
| 128,128,32 | 1250 us |           976 us |            175 us |
|  64,128,32 |  581 us |           486 us |             88 us |
|  32,128,32 |  292 us |           245 us |             80 us |
|  16,128,32 |  147 us |           120 us |             71 us |
|  8,128,32  |   75 us |            61 us |             61 us |
|  4,128,32  |   39 us |            32 us |             32 us |
|  2,128,32  |   20 us |            17 us |             17 us |
|  1,128,32  |   11 us |             9 us |              9 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811

Reviewed By: anjali411

Differential Revision: D28700259

Pulled By: ngimel

fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
2021-05-27 10:06:54 -07:00
Pearu Peterson
be4ba29d49 Detect overflow in numel of sparse COO tensor (#57492)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57416

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57492

Reviewed By: albanD

Differential Revision: D28273649

Pulled By: mruberry

fbshipit-source-id: 08ba50509556df1981d7ede025d84a836d2e8e5e
2021-05-25 22:16:21 -07:00
Alexander
6f2c0cccdd New: sparse complex: add linear algebra, addmm (#57129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57129

Test Plan: Imported from OSS

Reviewed By: janeyx99, astaff

Differential Revision: D28112701

Pulled By: ezyang

fbshipit-source-id: 1b253453dc19e908fb18d0b1a83738243e0a8d59
2021-05-07 05:37:48 -07:00
Alexander
a911c4fc1c New: Initial support for sparse complex tensors constructors for CPU/CUDA (#57125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57125

I'm opening this PR to solve the last issue reported before merging PR #54153

https://github.com/pytorch/pytorch/pull/54153#issuecomment-827997616,

Solves gh-50690

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28112702

Pulled By: ezyang

fbshipit-source-id: 915681954edb14b7c19c3ffe641af2d2e6649576
2021-05-07 05:36:41 -07:00
Peter Bell
a5288a0244 Sparse support for division rounding_mode argument (#51989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51989

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D28118114

Pulled By: mruberry

fbshipit-source-id: 2a76ee55c3845552e57e93d54628ce3c2fab3399
2021-05-01 17:37:25 -07:00
Mike Ruberry
7bcce2acb9 Revert D27765618: Initial support for sparse complex tensors constructors for CPU/CUDA
Test Plan: revert-hammer

Differential Revision:
D27765618 (daef60c3b7)

Original commit changeset: a9cdd31d5c7a

fbshipit-source-id: f700d5db7ff8930b9158460b5a77f68a35e212a4
2021-04-27 15:48:51 -07:00
Alexander
0d41122e61 Eliminate global usage of torch.set_default_dtype in sparse test (#56393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56393

Fixes for  gh-56369

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27913266

Pulled By: mruberry

fbshipit-source-id: 2c590d3a2188aae251184f08c1a6a2c4c570d150
2021-04-27 15:23:14 -07:00
Alexander
daef60c3b7 Initial support for sparse complex tensors constructors for CPU/CUDA (#54153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153

Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.

- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests

Note: This PR contains only complex support for the torch.sparse_coo_tensor forward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in the ghstack should cover other sparse operations to give broader complex sparse support, specifically those related to the use of specific APIs for accelerated linear algebra.

Note: Before using ghstack, the original PR was #50984

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27765618

Pulled By: ezyang

fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
2021-04-27 14:39:13 -07:00
sorenrasmussenai
f27513e951 Fix bug in torch.sparse.addmm on CUDA when beta != 0 or 1 (#56160)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55917, which caused `torch.sparse.addmm` to fail on CUDA whenever `beta` was different from 0 or 1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56160

Reviewed By: ejguan

Differential Revision: D27825108

Pulled By: ngimel

fbshipit-source-id: 2ade5ea38c5322768dc4dffb40c65fcbb17ec201
2021-04-26 02:57:41 -07:00
Alexander
6ee333cdb5 modernize test_sparse (#54572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54572

Adding device generic tests to `test_sparse`.
Follow-up PR: #54153

I think this is ready for review.
Looking forward to your comments, cc mruberry.

Thanks

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27562663

Pulled By: mruberry

fbshipit-source-id: c48973e707f779b529bc7f61b75103194b428987
2021-04-09 12:19:29 -07:00
Alban Desmaison
b91d48877d Reland Fix reference cycle in sparse coalesce graph (#55404)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/52874

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55404

Reviewed By: bdhirsh

Differential Revision: D27600438

Pulled By: albanD

fbshipit-source-id: f5c286638b324ad59be65657a016028af5e2b303
2021-04-07 12:02:42 -07:00
Brian Hirsh
ec80981d28 Revert D27246997: [pytorch][PR] Fix reference cycle in sparse coalesce graph
Test Plan: revert-hammer

Differential Revision:
D27246997 (815bfad28c)

Original commit changeset: 0fe6c1104350

fbshipit-source-id: 4d345718589a642d3c65474b266342285205ccdf
2021-04-06 11:45:27 -07:00
Peter Bell
815bfad28c Fix reference cycle in sparse coalesce graph (#52874)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253

In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets its `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.
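
The shape of this cycle can be mimicked in pure Python. This is only an analogy with hypothetical stand-in classes (`FakeTensor`, `Node`), not the autograd implementation, and it relies on CPython's reference counting to show which objects are freed without the cycle collector.

```python
import gc
import weakref


class Node:
    """Stand-in for an autograd node that keeps its input alive."""

    def __init__(self, source):
        self.source = source


class FakeTensor:
    """Hypothetical stand-in for an already coalesced sparse tensor."""

    def __init__(self):
        self.grad_fn = None

    def coalesce_returning_self(self):
        # Old behaviour: the result is self, so self -> grad_fn -> self.
        self.grad_fn = Node(self)
        return self

    def coalesce_returning_clone(self):
        # Fixed behaviour: the result is a new object, so no cycle forms.
        result = FakeTensor()
        result.grad_fn = Node(self)
        return result


def survives_refcounting(method_name):
    """True if the input is still alive once every name for it is dropped."""
    gc.disable()  # rule out the cycle collector during the check
    try:
        tensor = FakeTensor()
        probe = weakref.ref(tensor)
        result = getattr(tensor, method_name)()
        del tensor, result
        alive = probe() is not None
    finally:
        gc.enable()
        gc.collect()
    return alive


print(survives_refcounting("coalesce_returning_self"))
print(survives_refcounting("coalesce_returning_clone"))
```

In this analogy Python's cycle collector can still reclaim the object later; the leak described above is worse because part of the cycle presumably lives outside the collector's reach.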

As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874

Reviewed By: bdhirsh

Differential Revision: D27246997

Pulled By: albanD

fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
2021-04-06 08:32:19 -07:00
Heitor Schueroff
6d87b3667f Added support for TensorList inputs in OpInfo (#54922)
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__

Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.

Note: JIT testing support for TensorList inputs will be added in a follow up PR.

Fixes https://github.com/pytorch/pytorch/issues/51996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922

Reviewed By: H-Huang

Differential Revision: D27448952

Pulled By: heitorschueroff

fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
2021-03-31 04:42:10 -07:00
Edward Yang
e0aebe241d Refactor tensor_new.cpp to use TensorOptions instead of DispatchKey (#54034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034

Fixes #53544

I had to touch a bunch of lines but the refactoring was fairly
mechanical.  Here's how it works.

The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions.  The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key();  it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time).  See also #53124.  But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.

So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
  - Most top level functions take a DispatchKey as their argument.  I
    use the new function dispatchKeyToTensorOptions to convert it into
    a TensorOptions
  - typeIdWithDefault now produces a TensorOptions (probably could do
    with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
  - Previously, the function options() was typically used to convert the
    DispatchKey into a TensorOptions.  Now its replacement build_options
    just takes a TensorOptions and sets some extra fields on it.
    Irritatingly, I can't just replace
    `build_options(options, scalar_type, device)` with
    `options.dtype(scalar_type).device(device)` because the semantics
    are slightly different: if device is nullopt, we should preserve
    the usage of the device specified in options (what options.device()
    does is overwrite the device unconditionally; e.g., if device is
    nullopt, unset device from options)
  - The other major sink for DispatchKey was `internal_new_from_data`,
    but it turns out it only really extracts the device type from
    the dispatch key.  Now it just pulls out the device from
    TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
  introduce new functions dispatchKeyToLayout (replicating
  layout_from_backend--there are still a few uses of this function
  so I couldn't delete it) and dispatchKeyToDeviceType (replacing
  computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
  I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
  previously used, I now do `other.options().type_equal()`, which
  is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
  a tensor for values, and then read out the dispatch key from the
  result to allocate the keys.  As best as I can tell, this is totally
  equivalent to just passing in the options to both values and indices
  (the only difference is dtype, which is captured via a separate
  argument)
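
The difference between `build_options(options, scalar_type, device)` and chaining `options.dtype(scalar_type).device(device)` can be sketched with a small Python model. This is an illustration of the semantics described above, not the C++ code; the `Options` dataclass and the `overwrite` helper are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for TensorOptions with only two fields."""

    dtype: Optional[str] = None
    device: Optional[str] = None


def overwrite(options, dtype, device):
    # Models the described behaviour of .dtype(...).device(...):
    # a missing device unsets whatever device options carried.
    return replace(options, dtype=dtype, device=device)


def build_options(options, dtype, device=None):
    # Models the described behaviour of build_options:
    # a missing device keeps the device already stored in options.
    kept = options.device if device is None else device
    return replace(options, dtype=dtype, device=kept)


base = Options(dtype="float32", device="cuda")
print(overwrite(base, "int64", None))
print(build_options(base, "int64"))
print(build_options(base, "int64", "cpu"))
```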

This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all.  I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.

Even with this limited refactor, the payoff is sweet.  I can delete:

- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType

The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27109509

Pulled By: ezyang

fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
2021-03-19 09:08:32 -07:00
mattip
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow-on to https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straightforward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a Python warning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
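
For readers without GNU sed, the substitution `s/ *$//` can be approximated in Python. This is a sketch of the per-line regex only, not a replacement for the `git grep | xargs` pipeline; the function name `strip_trailing_spaces` is hypothetical.

```python
import re


def strip_trailing_spaces(text):
    """Remove trailing space characters from every line (tabs are left alone)."""
    return re.sub(r" +$", "", text, flags=re.MULTILINE)


print(repr(strip_trailing_spaces("a  \nb\t \nc")))
```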

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Rong Rong (AI Infra)
b52e2e6045 [BE] _get_torch_cuda_version should return tuple (#52409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52409

Reviewed By: jbschlosser, glaringlee

Differential Revision: D26513924

Pulled By: walterddr

fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734
2021-02-18 09:28:38 -08:00
Mike Ruberry
594a66d778 Warn about floor_divide performing incorrect rounding (#50281) (#50281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: mruberry

Differential Revision: D26257855

fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b
2021-02-10 03:13:34 -08:00
Jeffrey Wan
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where gradcheck/gradgradcheck is now implicitly invoked with `check_batched_grad` as True, where it was previously False. Cases fall into two basic categories:
1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead
2) the call site was already using the globally imported function, but does not explicitly pass the `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, I assumed that the author is aware that omitting the flag means not applying check_batched_grad=True (but maybe that is not the case?).

Overall this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were just probably not aware of this flag and did not mean to have this flag as False.
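
The wrapper behaviour described above, where a flag defaults to True unless the caller explicitly passes it, can be sketched as follows. This is an illustrative pattern only: `make_wrapper` and `fake_gradcheck` are hypothetical names, and `fake_gradcheck` merely stands in for the real checker so the sketch runs without PyTorch.

```python
def make_wrapper(check_fn, **forced_defaults):
    """Wrap check_fn so the given flags default to new values."""

    def wrapper(*args, **kwargs):
        for name, value in forced_defaults.items():
            # Only fill in flags the caller did not pass explicitly.
            kwargs.setdefault(name, value)
        return check_fn(*args, **kwargs)

    return wrapper


def fake_gradcheck(fn, inputs, check_batched_grad=False):
    """Stand-in checker that just reports which flag value it received."""
    return check_batched_grad


gradcheck = make_wrapper(fake_gradcheck, check_batched_grad=True)

print(gradcheck(None, ()))                            # flag defaults to True
print(gradcheck(None, (), check_batched_grad=False))  # explicit opt-out wins
```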

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

Possible TODO is to prevent new tests from invoking external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
Kyle Chen
d5e5c5455a [ROCm] re-enable test_sparse.py tests (#50557)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

cc: jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50557

Reviewed By: mruberry

Differential Revision: D25941432

Pulled By: ngimel

fbshipit-source-id: 534fc8a91a48fa8b3b397e63423cd8347b41bbe2
2021-01-18 23:36:39 -08:00
Nathan Howell
c517e15d79 Add support for converting sparse bool tensors to dense (#50019)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49977

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50019

Reviewed By: smessmer

Differential Revision: D25782045

Pulled By: ezyang

fbshipit-source-id: a8389cbecb7e79099292a423a6fd8ac28631905b
2021-01-06 07:38:14 -08:00
mattip
f96ce3305c prohibit assignment to a sparse tensor (#50040)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48225 by prohibiting assignment to a sparse Tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50040

Reviewed By: mrshenli

Differential Revision: D25757125

Pulled By: zou3519

fbshipit-source-id: 3db6f48932eb10bf6ca5e97a6091afcabb60e478
2021-01-04 14:38:35 -08:00
Himangshu
9552cc65d4 Creation of test framework for Sparse Operators (#48488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48488

Reviewed By: ngimel

Differential Revision: D25696487

Pulled By: mruberry

fbshipit-source-id: dc4f57c6628f62b74dd321f3f6b0fff86f25b040
2020-12-23 15:42:26 -08:00
Alexander
44ce0b8883 Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.

The current implementation of `torch.sparse.mm` supports this configuration,
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can consume a lot of memory when sparse_matrix2's shape is large.

This implementation extends the `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`.

Resolves  #[20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA.

- [x] sparse matmul
  - [x] CPU/CUDA C++ implementation
  - [x] unittests
  - [x] update torch.sparse.mm documentation
  - [x] autograd support

The CPU sparse-sparse matmul was implemented using the work "Sparse Matrix Multiplication Package (SMMP)" as a reference. The GPU sparse-sparse matmul is based on cuSPARSE; there are separate code paths for CUSPARSE_VERSION >= 11 and for older versions of cuSPARSE. Both the CPU and CUDA paths rely on a sparse-sparse matmul algorithm using the CSR index format, as it is one of the fastest algorithms.
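
A row-by-row product over CSR indices can be sketched in pure Python. This is a simplified illustration of the general technique, not the SMMP or cuSPARSE code used in the PR; the helper names `coo_to_csr` and `csr_matmul` are hypothetical, and the COO inputs are assumed to be coalesced.

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert coalesced COO triplets to CSR (row pointers, columns, values)."""
    order = sorted(range(len(rows)), key=lambda k: (rows[k], cols[k]))
    crow = [0] * (n_rows + 1)
    for r in rows:
        crow[r + 1] += 1
    for r in range(n_rows):
        crow[r + 1] += crow[r]
    return crow, [cols[k] for k in order], [vals[k] for k in order]


def csr_matmul(a, b, n_rows):
    """Multiply two CSR matrices row by row, returning COO triplets."""
    a_crow, a_col, a_val = a
    b_crow, b_col, b_val = b
    out_rows, out_cols, out_vals = [], [], []
    for i in range(n_rows):
        acc = {}  # column -> accumulated value for output row i
        for p in range(a_crow[i], a_crow[i + 1]):
            k = a_col[p]
            for q in range(b_crow[k], b_crow[k + 1]):
                acc[b_col[q]] = acc.get(b_col[q], 0.0) + a_val[p] * b_val[q]
        for j in sorted(acc):
            out_rows.append(i)
            out_cols.append(j)
            out_vals.append(acc[j])
    return out_rows, out_cols, out_vals


# A = [[1, 0], [0, 2]] and B = [[0, 3], [4, 0]], both given as COO triplets.
a = coo_to_csr([0, 1], [0, 1], [1.0, 2.0], n_rows=2)
b = coo_to_csr([0, 1], [1, 0], [3.0, 4.0], n_rows=2)
print(csr_matmul(a, b, n_rows=2))
```

Only entries that are actually stored are touched, which is why this kind of product avoids materializing a dense operand.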

Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU), and scipy; values are float32 scalars:

size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7

A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:

```
[------------------------- sparse.mm-backward -------------------------]
           size | density | sparse.backward | dense.backward
 -----------------------------------------------------------------------
    (32, 10000) |    0.01 |            13.5 |            2.4
    (32, 10000) |    0.05 |            52.3 |            2.4
   (512, 10000) |    0.01 |          1016.8 |          491.5
   (512, 10000) |    0.05 |          1604.3 |          492.3
  (1024, 10000) |    0.01 |          2384.1 |         1963.7
  (1024, 10000) |    0.05 |          3965.8 |         1951.9
```

I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.

```
[---------------------------------- matmul ---------------------------------]
                        |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
          torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
          scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
  (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
          torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1

[----------------------------------- backward -----------------------------------]
                        |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -----------------------------------------------------------------------
  (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
          torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
  (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
          torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9

Times are in milliseconds (ms).
```

In summary, the new `sparse @ sparse` backward algorithm is the better choice because its advantage lies in saving memory rather than in speed. Moreover, it is better than the other options tested before.

## **References**

1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526

Reviewed By: mruberry

Differential Revision: D25661239

Pulled By: ngimel

fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00
Xiang Gao
87636c07bb CUDA BF16 sparse (#48807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48807

Reviewed By: mruberry

Differential Revision: D25526752

Pulled By: ngimel

fbshipit-source-id: 9ff8e637486cfd67d46daf0c05142bbe611e08ec
2020-12-14 09:55:52 -08:00
kshitij12345
25ab39acd0 [numpy] torch.asin : promote integer inputs to float (#48461)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48461

Reviewed By: ngimel

Differential Revision: D25192319

Pulled By: mruberry

fbshipit-source-id: fd5dffeca9cd98b86782bfa6a9ab367e425ee934
2020-11-27 15:26:58 -08:00
kshitij12345
e9efd8df1b [numpy] torch.log1p : promote integer inputs to float (#48002)
Summary:
Reference https://github.com/pytorch/pytorch/issues/42515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48002

Reviewed By: ngimel

Differential Revision: D25148911

Pulled By: mruberry

fbshipit-source-id: 902d0ddf699debd6edd1b3d55f5c73932ca45e83
2020-11-24 22:01:07 -08:00
Natalia Gimelshein
4a2fb34042 check sparse sizes (#47148)
Summary:
Checks sizes of sparse tensors when comparing them in assertEqual.
Removes additional checks in safeCoalesce; safeCoalesce should not be a test for the `.coalesce()` function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148

Reviewed By: mruberry

Differential Revision: D24823127

Pulled By: ngimel

fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a
2020-11-09 10:33:24 -08:00
vfdev-5
dc7cd97402 Fixes bug in sspaddmm (#45113) (#45963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45113

Description:
- Fixed bug in sspaddmm by calling contiguous on indices.
- Added tests

We have to make indices contiguous as we use `indices.data_ptr` in `_to_csr` which assumes row-contiguous storage:
be45c3401a/aten/src/ATen/native/sparse/SparseTensorMath.cpp (L1087-L1090)
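
Why raw-pointer access needs contiguous indices can be shown with a flat buffer and explicit strides. This is a pure-Python sketch with hypothetical helper names, not the `_to_csr` code; it models a 2 x nnz indices tensor whose memory is laid out as its transpose.

```python
def read_row_assuming_contiguous(buffer, row, n_cols):
    """Read one row via raw offsets, assuming row-major contiguous storage."""
    return [buffer[row * n_cols + j] for j in range(n_cols)]


def read_row_with_strides(buffer, row, n_cols, strides):
    """Read one row while honouring the view's strides."""
    return [buffer[row * strides[0] + j * strides[1]] for j in range(n_cols)]


def make_contiguous(buffer, shape, strides):
    """Copy a strided 2-D view into a fresh row-major buffer."""
    n_rows, n_cols = shape
    return [buffer[i * strides[0] + j * strides[1]]
            for i in range(n_rows) for j in range(n_cols)]


# Logical indices: rows [0, 1, 2] and cols [5, 6, 7], stored transposed.
buffer = [0, 5, 1, 6, 2, 7]
shape, strides = (2, 3), (1, 2)

print(read_row_with_strides(buffer, 0, 3, strides))   # correct row indices
print(read_row_assuming_contiguous(buffer, 0, 3))     # wrong: mixes rows and cols
fixed = make_contiguous(buffer, shape, strides)
print(read_row_assuming_contiguous(fixed, 0, 3))      # correct after the copy
```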

> Part 1 of fixing this is probably to document sspaddmm. Part 2 may be to rewrite it using other ops. (https://github.com/pytorch/pytorch/issues/45113#issuecomment-700166809)

- Docs will be written here: https://github.com/pytorch/pytorch/pull/45400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45963

Reviewed By: malfet

Differential Revision: D24335599

Pulled By: ngimel

fbshipit-source-id: 8278c73a1b4cccc5e22c6f3818dd222588c46b45
2020-10-15 16:50:16 -07:00
Alexander
29dc3c5ec8 Sparse softmax support (CUDA) (#42307)
Summary:
This PR implements softmax support for sparse tensors.

Resolves gh-23651 for CUDA.

- [x]  sparse softmax
    - [x]  CUDA C++ implementation
    - [x]  unittests
    - [x]  update softmax documentation
    - [x]  autograd support
- [x]  sparse log_softmax
    - [x]  CUDA C++ implementation
    - [x]  unittests
    - [x]  update log_softmax documentation
    - [x]  autograd support
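
As a rough illustration of what a softmax over sparse rows computes, the sketch below normalizes only the specified entries of each row, which corresponds to treating unspecified entries as negative infinity. That reading of the semantics is an assumption taken from the `torch.sparse` documentation rather than from this commit, and `sparse_softmax_rows` is a hypothetical helper, not the CUDA kernel.

```python
import math


def sparse_softmax_rows(rows, vals):
    """Softmax over the specified entries of each row of a COO matrix."""
    # Per-row maximum, subtracted for numerical stability.
    row_max = {}
    for r, v in zip(rows, vals):
        row_max[r] = max(v, row_max.get(r, -math.inf))

    exps = [math.exp(v - row_max[r]) for r, v in zip(rows, vals)]

    # Per-row normalizer over the specified entries only.
    row_sum = {}
    for r, e in zip(rows, exps):
        row_sum[r] = row_sum.get(r, 0.0) + e

    return [e / row_sum[r] for r, e in zip(rows, exps)]


# Row 0 has two equal entries, row 1 has a single entry.
print(sparse_softmax_rows([0, 0, 1], [1.0, 1.0, 5.0]))
```

The output keeps the sparsity pattern of the input, so only the values need to be recomputed.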

Here are some benchmark results (script is [here](https://gist.github.com/aocsa/fbc1827b3e49901512a33ba96092cbc1)) for `torch.sparse.softmax` and `torch.softmax`, using CPU and GPU; values are float64 scalars, timing repeat is 1000:

| size         | density | sparse CUDA | sparse CPU |
|--------------|---------|-------------|------------|
|  (32, 10000) |   0.01  |    380.2    |    687.5   |
| (32, 10000)  | 0.05    | 404.3       | 2357.9     |
| (32, 10000)  | 0.1     | 405.9       | 3677.2     |
| (512, 10000) | 0.01    | 438.0       | 5443.4     |
| (512, 10000) | 0.05    | 888.1       | 24485.0    |
| (512, 10000) | 0.1     | 1921.3      | 45340.5    |

| size         | density | dense CUDA | dense CPU |
|--------------|---------|-------------|------------|
|  (32, 10000) |   0.01  |     23.6    |   1943.2   |
| (32, 10000)  | 0.05    | 23.6        | 1954.0     |
| (32, 10000)  | 0.1     | 23.5        | 1950.0     |
| (512, 10000) | 0.01    | 639.3       | 39797.9    |
| (512, 10000) | 0.05    | 640.3       | 39374.4    |
| (512, 10000) | 0.1     | 639.6       | 39192.3    |

Times are in microseconds (us).

Quick note:  I updated the performance test again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42307

Reviewed By: ngimel

Differential Revision: D23774427

Pulled By: mruberry

fbshipit-source-id: bfabf726075b39dde544c10249f27ae1871f82c7
2020-09-24 00:07:30 -07:00
vfdev-5
c947ab0bb9 Added sparse support for asin and neg functions, updated log1p (#44028)
Summary:
Description:

- [x] added C++ code for sparse `asin` and `neg` ops similarly to `log1p` op
- [x] added tests
  - [x] coalesced input CPU/CUDA
  - [x] uncoalesced input CPU/CUDA
- [x] added tests for `negative`  and `arcsin`
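
Ops such as `asin`, `neg`, and `log1p` map zero to zero, so they can in principle be applied to the stored values alone without changing the sparsity pattern. The sketch below shows that idea in pure Python; it is not the C++ code from this PR, `apply_to_values` is a hypothetical helper, and it assumes coalesced input (for a nonlinear function, duplicate indices would have to be summed first).

```python
import math
import operator


def apply_to_values(indices, values, fn):
    """Apply a zero-preserving function to the stored values only."""
    if fn(0.0) != 0.0:
        raise ValueError("fn must map zero to zero to keep the sparsity pattern")
    return indices, [fn(v) for v in values]


indices = [(0, 1), (2, 2)]
values = [0.5, -1.0]

print(apply_to_values(indices, values, math.asin))
print(apply_to_values(indices, values, operator.neg))
```

A function like `cos` fails the check because it would turn every implicit zero into a one, making the result dense.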

Backprop will be addressed in another PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44028

Reviewed By: agolynski

Differential Revision: D23793027

Pulled By: mruberry

fbshipit-source-id: 5fd642808da8e528cf6acd608ca0dcd720c4ccc3
2020-09-22 02:04:38 -07:00
Xiao Wang
d75c402755 Add cusolver to build, rewrite MAGMA inverse with cusolver (#42403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42265

This PR adds cusolver to the pytorch build, and enables the use of cusolver/cublas library functions on GPU `torch.inverse` on certain tensor shapes.

Specifically, when

* the tensor is two dimensional (single batch), or
* has >2 dimensions (multiple batches) and `batch_size <= 2`, or
* magma is not linked,

cusolver/cublas will be used. In other conditions, the current implementation of MAGMA will still be used.
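
The dispatch rule above can be sketched in Python as follows. This is only an illustration: `choose_inverse_backend` and its parameters are hypothetical names, not the actual C++ code in `BatchLinearAlgebra.cu`, and the CUDA 9.2 and non-Nvidia exceptions described below are not modelled.

```python
def choose_inverse_backend(ndim: int, batch_size: int, magma_linked: bool) -> str:
    """Hypothetical helper mirroring the three conditions listed above."""
    if ndim == 2:
        # Single matrix (no batch dimension).
        return "cusolver/cublas"
    if ndim > 2 and batch_size <= 2:
        # Small batches are launched through cusolver/cublas.
        return "cusolver/cublas"
    if not magma_linked:
        # Without MAGMA there is no other GPU implementation to fall back on.
        return "cusolver/cublas"
    # Larger batches keep using the existing MAGMA path.
    return "magma"


print(choose_inverse_backend(ndim=2, batch_size=1, magma_linked=True))
print(choose_inverse_backend(ndim=3, batch_size=4, magma_linked=True))
print(choose_inverse_backend(ndim=3, batch_size=4, magma_linked=False))
```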

8c0949ae45/aten/src/ATen/native/cuda/BatchLinearAlgebra.cu (L742-L752)

The reason for this is that for tensors with large batch_size, `cublasXgetrfBatched` and `cublasXgetriBatched` don't perform very well. For `batch_size > 1`, we launch cusolver functions in multiple streams. This lets cusolver functions run in parallel, and can greatly increase the performance. When `batch_size > 2`, the parallel-launched cusolver functions are slightly slower than the current magma implementation, so we still use the current magma impl.

On CUDA 9.2, there were some numerical issues detected, so cusolver impl will not be used. The cusolver impl will also not be used on platforms other than Nvidia CUDA.

060769feaf/aten/src/ATen/native/cuda/BatchLinearAlgebraLib.h (L10-L13)

Note that there is a new heuristic used before cusolver/cublas calls here:

8c0949ae45/aten/src/ATen/native/cuda/MiscUtils.h (L113-L121)

where `use_loop_launch = true` means launch single batch cusolver functions in parallel, and `use_loop_launch = false` means use cublas_X_batched functions. When magma is enabled (only `batch_size <= 2` will be dispatched to cusolver/cublas), the heuristic will always return `true` and the cusolver calls are faster than small batch_size magma calls. When magma is disabled, this adds the functionality of `torch.inverse`, which was disabled before for all shapes (though large batch_size cublas performance may not be as good as magma's).

Checklist:
- [X] Add benchmark, cpu, gpu-before (magma), gpu-after (cusolver)
- [X] Rewrite single inverse (ndim == 2) with cusolver
- [X] Rewrite batched inverse (ndim > 2) with cublas
- [X] Add cusolver to build
- [x] Clean up functions related to `USE_MAGMA` define guard
- [x] Workaround for non-cuda platform
- [x] Workaround for cuda 9.2
- [x] Add zero size check
- [x] Add tests

Next step:

If cusolver doesn't cause any problems in the pytorch build, and there are no major performance regressions reported after this PR is merged, I will start porting other cusolver/cublas functions for linear algebra to improve the performance.

<details>
<summary> benchmark 73499c6 </summary>

benchmark code: https://github.com/xwang233/code-snippet/blob/master/torch.inverse/inverse-cusolver.ipynb

shape meaning:

* `[] 2 torch.float32 -> torch.randn(2, 2, dtype=torch.float32)`
* `[2] 4 torch.float32 -> torch.randn(2, 4, 4, dtype=torch.float32)`

| shape | cpu_time (ms) | gpu_time_before (magma) (ms) | gpu_time_after (ms) |
| --- | --- | --- | --- |
| [] 2 torch.float32 |  0.095 |  7.534 |  0.129  |
| [] 4 torch.float32 |  0.009 |  7.522 |  0.129  |
| [] 8 torch.float32 |  0.011 |  7.647 |  0.138  |
| [] 16 torch.float32 |  0.075 |  7.582 |  0.135  |
| [] 32 torch.float32 |  0.073 |  7.573 |  0.191  |
| [] 64 torch.float32 |  0.134 |  7.694 |  0.288  |
| [] 128 torch.float32 |  0.398 |  8.073 |  0.491  |
| [] 256 torch.float32 |  1.054 |  11.860 |  1.074  |
| [] 512 torch.float32 |  5.218 |  14.130 |  2.582  |
| [] 1024 torch.float32 |  19.010 |  18.780 |  6.936  |
| [1] 2 torch.float32 |  0.009 |  0.113 |  0.128 ***regressed |
| [1] 4 torch.float32 |  0.009 |  0.113 |  0.131 ***regressed |
| [1] 8 torch.float32 |  0.011 |  0.116 |  0.129 ***regressed |
| [1] 16 torch.float32 |  0.015 |  0.122 |  0.135 ***regressed |
| [1] 32 torch.float32 |  0.032 |  0.177 |  0.178 ***regressed |
| [1] 64 torch.float32 |  0.070 |  0.420 |  0.281  |
| [1] 128 torch.float32 |  0.328 |  0.816 |  0.490  |
| [1] 256 torch.float32 |  1.125 |  1.690 |  1.084  |
| [1] 512 torch.float32 |  4.344 |  4.305 |  2.576  |
| [1] 1024 torch.float32 |  16.510 |  16.340 |  6.928  |
| [2] 2 torch.float32 |  0.009 |  0.113 |  0.186 ***regressed |
| [2] 4 torch.float32 |  0.011 |  0.115 |  0.184 ***regressed |
| [2] 8 torch.float32 |  0.012 |  0.114 |  0.184 ***regressed |
| [2] 16 torch.float32 |  0.019 |  0.119 |  0.173 ***regressed |
| [2] 32 torch.float32 |  0.050 |  0.170 |  0.240 ***regressed |
| [2] 64 torch.float32 |  0.120 |  0.429 |  0.375  |
| [2] 128 torch.float32 |  0.576 |  0.830 |  0.675  |
| [2] 256 torch.float32 |  2.021 |  1.748 |  1.451  |
| [2] 512 torch.float32 |  9.070 |  4.749 |  3.539  |
| [2] 1024 torch.float32 |  33.655 |  18.240 |  12.220  |
| [4] 2 torch.float32 |  0.009 |  0.112 |  0.318 ***regressed |
| [4] 4 torch.float32 |  0.010 |  0.115 |  0.319 ***regressed |
| [4] 8 torch.float32 |  0.013 |  0.115 |  0.320 ***regressed |
| [4] 16 torch.float32 |  0.027 |  0.120 |  0.331 ***regressed |
| [4] 32 torch.float32 |  0.085 |  0.173 |  0.385 ***regressed |
| [4] 64 torch.float32 |  0.221 |  0.431 |  0.646 ***regressed |
| [4] 128 torch.float32 |  1.102 |  0.834 |  1.055 ***regressed |
| [4] 256 torch.float32 |  4.042 |  1.811 |  2.054 ***regressed |
| [4] 512 torch.float32 |  18.390 |  4.884 |  5.087 ***regressed |
| [4] 1024 torch.float32 |  69.025 |  19.840 |  20.000 ***regressed |

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42403

Reviewed By: ailzhang, mruberry

Differential Revision: D23717984

Pulled By: ngimel

fbshipit-source-id: 54cbd9ea72a97989cff4127089938e8a8e29a72b
2020-09-18 20:43:29 -07:00
vfdev
24df3b7373 torch.empty_like and torch.zeros_like raise error if any memory format is provided with sparse input (#43699) (#44058)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43699

- Changed the order of `TORCH_CHECK` and `if (options.layout() == kSparse && self.is_sparse())`
inside `empty_like` method.

- [x] Added tests

EDIT:

More details on that and why we cannot take the `zeros_like` approach.
Python code :
```python
res = torch.zeros_like(input_coalesced, memory_format=torch.preserve_format)
```
is routed to
```c++
// TensorFactories.cpp
Tensor zeros_like(
    const Tensor& self,
    const TensorOptions& options,
    c10::optional<c10::MemoryFormat> optional_memory_format) {
  if (options.layout() == kSparse && self.is_sparse()) {
    auto res = at::empty({0}, options); // to be resized
    res.sparse_resize_and_clear_(
        self.sizes(), self.sparse_dim(), self.dense_dim());
    return res;
  }
  auto result = at::empty_like(self, options, optional_memory_format);
  return result.zero_();
}
```
and takes the `if (options.layout() == kSparse && self.is_sparse())` branch.

When we call in Python
```python
res = torch.empty_like(input_coalesced, memory_format=torch.preserve_format)
```
it is routed to
```c++
Tensor empty_like(
    const Tensor& self,
    const TensorOptions& options_,
    c10::optional<c10::MemoryFormat> optional_memory_format) {
  TORCH_CHECK(
    !(options_.has_memory_format() && optional_memory_format.has_value()),
    "Cannot set memory_format both in TensorOptions and explicit argument; please delete "
    "the redundant setter.");
  TensorOptions options =
      self.options()
          .merge_in(options_)
          .merge_in(TensorOptions().memory_format(optional_memory_format));
  TORCH_CHECK(
      !(options.layout() != kStrided &&
          optional_memory_format.has_value()),
      "memory format option is only supported by strided tensors");
  if (options.layout() == kSparse && self.is_sparse()) {
    auto result = at::empty({0}, options); // to be resized
    result.sparse_resize_and_clear_(
        self.sizes(), self.sparse_dim(), self.dense_dim());
    return result;
  }
```

cc pearu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44058

Reviewed By: albanD

Differential Revision: D23672494

Pulled By: mruberry

fbshipit-source-id: af232274dd2b516dd6e875fc986e3090fa285658
2020-09-17 10:25:31 -07:00
Mike Ruberry
686e281bcf Updates div to perform true division (#42907)
Summary:
This PR:

- updates div to perform true division
- makes torch.true_divide an alias of torch.div

This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907

Reviewed By: ngimel

Differential Revision: D23622114

Pulled By: mruberry

fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
2020-09-14 15:50:38 -07:00
vfdev
9f88bcb5a2 Minor typo fix (#42731)
Summary:
Just fixed a typo in test/test_sparse.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42731

Reviewed By: ezyang

Differential Revision: D22999930

Pulled By: mrshenli

fbshipit-source-id: 1b5b21d7cb274bd172fb541b2761f727ba06302c
2020-08-07 11:17:51 -07:00
Nikita Shulga
aa4e91a6dc Fix TestSparse.test_bmm_windows_error when CUDA is not available (#42626)
Summary:
Refactor common pattern of (torch.cuda.version and [int(x) for x in torch.cuda.version.split(".")] >= [a, b]) into `_get_torch_cuda_version()` function
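
A minimal sketch of the kind of helper this refactor describes. The name `parse_cuda_version` is hypothetical, and it takes the version string as an argument so the example runs without PyTorch; returning `[0, 0]` when no CUDA version is available is an assumption made here so that comparisons stay well defined.

```python
from typing import List, Optional


def parse_cuda_version(version: Optional[str]) -> List[int]:
    """Hypothetical stand-in: turn a string such as "10.2" into [10, 2]."""
    if version is None:
        # Assumption: treat "no CUDA" as version 0.0 so that
        # `parse_cuda_version(v) >= [a, b]` is simply False.
        return [0, 0]
    return [int(part) for part in version.split(".")]


# Lists compare element by element, which gives the intended version ordering.
print(parse_cuda_version("10.2") >= [10, 1])
print(parse_cuda_version(None) >= [10, 1])
```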

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42626

Reviewed By: seemethere

Differential Revision: D22956149

Pulled By: malfet

fbshipit-source-id: 897c55965e53b477cd20f69e8da15d90489035de
2020-08-05 16:07:35 -07:00
peter
b08347fd7b Add CUDA 11 builds for Windows CI (#42420)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/42410.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42420

Reviewed By: seemethere

Differential Revision: D22917230

Pulled By: malfet

fbshipit-source-id: 6ad394f7f8c430c587e0b0d9c5a5e7b7bcd85bfe
2020-08-05 09:40:33 -07:00
Kurt Mohler
206db5c127 Improve torch.norm functionality, errors, and tests (#41956)
Summary:
**BC-Breaking Note:**
BC breaking changes in the case where keepdim=True. Before this change, when calling `torch.norm` with keepdim=True and p='fro' or p=number, leaving all other optional arguments as their default values, the keepdim argument would be ignored. Also, any time `torch.norm` was called with p='nuc', the result would have one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. After the change, for each of these cases, the result has the same number and order of dimensions as the input.

**PR Summary:**

* Fix keepdim behavior
* Throw descriptive errors for unsupported sparse norm args
* Increase unit test coverage for these cases and for complex inputs

These changes were taken from part of PR https://github.com/pytorch/pytorch/issues/40924. That PR is not going to be merged because it overrides `torch.norm`'s interface, which we want to avoid. But these improvements are still useful.

Issue https://github.com/pytorch/pytorch/issues/24802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41956

Reviewed By: albanD

Differential Revision: D22837455

Pulled By: mruberry

fbshipit-source-id: 509ecabfa63b93737996f48a58c7188b005b7217
2020-08-01 01:55:12 -07:00
Mike Ruberry
12cd083fd7 Updates torch.tensor, torch.as_tensor, and sparse ctors to use the device of inputs tensors they're given, by default (#41984)
Summary:
**BC-Breaking Note**

This PR changes the behavior of the torch.tensor, torch.as_tensor, and sparse constructors. When given a tensor as input and a device is not explicitly specified, these constructors now always infer their device from the tensor. Historically, if the optional dtype kwarg was provided then these constructors would not infer their device from tensor inputs. Additionally, for the sparse ctor a runtime error is now thrown if the indices and values tensors are on different devices and the device kwarg is not specified.

**PR Summary**
This PR's functional change is a single line:

```
auto device = device_opt.has_value() ? *device_opt : (type_inference ? var.device() : at::Device(computeDeviceType(dispatch_key)));
```
=>
```
auto device = device_opt.has_value() ? *device_opt : var.device();
```

in `internal_new_from_data`. This line entangled whether the function was performing type inference with whether it inferred its device from an input tensor, and in practice meant that

```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t, dtype=torch.float64)
```

would return a tensor on the CPU, not the default CUDA device, while

```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t)
```

would return a tensor on the device of `t`!
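
The two versions of that line can be modelled in plain Python to make the difference concrete. This is a sketch, not the actual `internal_new_from_data` code: the function names are hypothetical, devices are represented as strings, `default_device` stands in for `at::Device(computeDeviceType(dispatch_key))`, and treating `type_inference` as false when `dtype` is passed is my reading of the note above.

```python
from typing import Optional


def device_before(device_opt: Optional[str], type_inference: bool,
                  input_device: str, default_device: str) -> str:
    """Old rule: the input's device is used only when type inference is on."""
    if device_opt is not None:
        return device_opt
    return input_device if type_inference else default_device


def device_after(device_opt: Optional[str], input_device: str) -> str:
    """New rule: without an explicit device, always follow the input tensor."""
    return device_opt if device_opt is not None else input_device


# Input tensor lives on "cuda"; the caller passes dtype but no device.
print(device_before(None, type_inference=False,
                    input_device="cuda", default_device="cpu"))
print(device_after(None, input_device="cuda"))
```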

This behavior is niche and odd, but came up while aocsa was fixing https://github.com/pytorch/pytorch/issues/40648.

An additional side effect of this change is that the indices and values tensors given to a sparse constructor must be on the same device, or the sparse ctor must specify the device kwarg. The tests in test_sparse.py have been updated to reflect this behavior.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41984

Reviewed By: ngimel

Differential Revision: D22721426

Pulled By: mruberry

fbshipit-source-id: 909645124837fcdf3d339d7db539367209eccd48
2020-07-25 02:49:45 -07:00
Mike Ruberry
13120bf677 Updates assertEqual to require atol and rtol, removes positional atol (#38872)
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.

In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
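
The "both or neither" rule can be illustrated with a small standalone checker. This is a hypothetical sketch, not the test suite's `assertEqual`: the name `assert_close`, the default tolerances, and the exception types are all assumptions chosen for the example.

```python
from typing import Optional


def assert_close(actual: float, expected: float, *,
                 atol: Optional[float] = None,
                 rtol: Optional[float] = None,
                 msg: Optional[str] = None) -> None:
    """Hypothetical checker: atol and rtol are keyword-only and must come together."""
    if (atol is None) != (rtol is None):
        raise ValueError("specify both atol and rtol, or neither")
    if atol is None:
        # Illustrative defaults only; not the values used by the test suite.
        atol, rtol = 1e-8, 1e-5
    if abs(actual - expected) > atol + rtol * abs(expected):
        raise AssertionError(msg or f"{actual} != {expected}")


assert_close(1.0, 1.0 + 1e-9)                      # neither given: defaults apply
assert_close(1.0, 1.01, atol=0.0, rtol=0.05)       # both given: accepted
```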
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872

Differential Revision: D21740237

Pulled By: mruberry

fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
2020-05-27 06:31:07 -07:00
Rohan Varma
63e545e0fe Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol
Test Plan: revert-hammer

Differential Revision:
D21717199

Original commit changeset: 9feb856f94ee

fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259
2020-05-26 18:23:59 -07:00
Mike Ruberry
6ddca30b2d Updates assertEqual to require atol and rtol, removes positional atol (#38872)
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.

In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872

Differential Revision: D21717199

Pulled By: mruberry

fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
2020-05-26 08:30:23 -07:00
Pearu Peterson
48c0331e01 Sparse softmax support (CPU) (#36305)
Summary:
This PR implements softmax support for sparse tensors.

The sparse softmax is equivalent to the dense softmax when the unspecified entries of the sparse tensor are taken to be `-inf`, which has the effect of ignoring the "zero" entries. This relation is used here to test the correctness of the results.
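
The `-inf` relation can be checked on a single 1-D row in plain Python. This is a sketch under simplifying assumptions: the helper names are hypothetical, the "sparse tensor" is just a list of indices and a list of values, and none of this is the C++ implementation added by the PR.

```python
import math
from typing import List


def dense_softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a dense row; exp(-inf) evaluates to 0.0."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax(indices: List[int], values: List[float], size: int) -> List[float]:
    """Softmax over the specified values only; unspecified entries stay 0."""
    probs = dense_softmax(values)
    out = [0.0] * size
    for i, p in zip(indices, probs):
        out[i] = p
    return out


indices, values, size = [1, 3], [2.0, 0.5], 5

# Dense reference: fill unspecified entries with -inf.
dense_row = [-math.inf] * size
for i, v in zip(indices, values):
    dense_row[i] = v

reference = dense_softmax(dense_row)
result = sparse_softmax(indices, values, size)
print(all(abs(a - b) < 1e-12 for a, b in zip(reference, result)))
```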

Resolves https://github.com/pytorch/pytorch/issues/23651 for CPU.

- [x] sparse softmax
  - [x] CPU C++ implementation
  - [x] unittests
  - [x] update softmax documentation
  - [x] autograd support
- [x] sparse log_softmax
  - [x] CPU C++ implementation
  - [x] unittests
  - [x] update log_softmax documentation
  - [x] autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36305

Differential Revision: D21566540

Pulled By: ezyang

fbshipit-source-id: a632ea69c38622f960721482e442efeb8d0a54fc
2020-05-14 08:08:40 -07:00
Hong Xu
336e1ec592 Clean up error handling in is_nonzero and where in TensorCompare.cpp (#38150)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38150

Differential Revision: D21539736

Pulled By: ezyang

fbshipit-source-id: e390c12f5948192a552d66dcd1bb89b2cb45f170
2020-05-13 20:19:40 -07:00
ashishfarmer
bcdff7eb67 Fix for tests on ROCm (#37616)
Summary:
This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427
1. `test_sparse_add_out_bfloat16` in test_sparse.py fixed to use updated `atol` argument instead of `prec` for `assertEqual`
2. The conversion of `flt_min` to `int64` is divergent on HIP compared to numpy. The change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py

cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616

Differential Revision: D21379876

Pulled By: ezyang

fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc
2020-05-04 07:16:54 -07:00
Peter Bell
675b3fc834 Prevent unbounded growth of sparse tensor in add operation (#36030)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34964

Sparse cuda add was implemented by just concatenating the indices and values for the tensor. If called repeatedly in a tight loop this will let `nnz` grow unbounded. In the worst case of  `x.add_(x)` it grows exponentially.
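
The growth pattern can be reproduced with a toy model in which a sparse vector is a list of `(index, value)` pairs. This is a sketch only: `add_by_concat` and `coalesce` are hypothetical helpers, and coalescing is shown simply as one way to merge duplicate indices, not as a description of how this PR bounds `nnz`.

```python
from collections import defaultdict
from typing import List, Tuple

Entries = List[Tuple[int, float]]


def add_by_concat(a: Entries, b: Entries) -> Entries:
    """Add two sparse vectors by concatenating entries; duplicates are kept."""
    return a + b


def coalesce(entries: Entries) -> Entries:
    """Merge entries that share an index by summing their values."""
    acc = defaultdict(float)
    for idx, val in entries:
        acc[idx] += val
    return sorted(acc.items())


x: Entries = [(0, 1.0), (3, 2.0)]
for _ in range(5):
    x = add_by_concat(x, x)   # models x.add_(x): the entry count doubles each time

print(len(x))                 # number of stored entries after 5 additions
print(coalesce(x))            # the same vector with duplicates merged
```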
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36030

Differential Revision: D20873504

Pulled By: zou3519

fbshipit-source-id: d90ed8dda0c89571fb89e358757b5dde299513df
2020-05-01 12:05:15 -07:00
ashishfarmer
bbd2350c99 Disable tests failing on test2 in ROCm CI (#37427)
Summary:
This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be examined and fixed one by one as soon as possible; until then, they are disabled to unblock `test2`.
The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles

cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427

Differential Revision: D21302909

Pulled By: ezyang

fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616
2020-04-29 09:56:28 -07:00