Commit Graph

276 Commits

Author SHA1 Message Date
Nikita Vedeneev
d7ddae8e4f det_backward: correct, more robust and with complex support [clone] (#61905)
Summary:
Clone of https://github.com/pytorch/pytorch/pull/58195 to ease the import. Done by request from anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61905

Reviewed By: albanD

Differential Revision: D29937920

Pulled By: anjali411

fbshipit-source-id: 025892a8e6147790825b20458986730ad8c5bb0f
2021-07-27 10:08:26 -07:00
Ivan Yashchuk
3cd12448b4 Add forward mode differentiation for inverse and solve (#62160)
Summary:
This PR adds forward mode differentiation for `torch.linalg.inv`, `torch.linalg.inv_ex`, and `torch.linalg.solve` functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62160

Reviewed By: mruberry

Differential Revision: D29917213

Pulled By: albanD

fbshipit-source-id: b08bbc830f77f342cc7ca5b823d7ea4380f2aaa8
2021-07-27 07:51:22 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, as is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` are generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Mike Ruberry
1ce3281a6d Revert D29361872: [pytorch][PR] det_backward: more robust and with complex support
Test Plan: revert-hammer

Differential Revision:
D29361872 (fce85480b9)

Original commit changeset: b1f0fec7e3ac

fbshipit-source-id: feffa74ad65b0b294e0a9b0ee72d245393421f70
2021-07-15 15:26:00 -07:00
Nikita Vedeneev
fce85480b9 det_backward: more robust and with complex support (#58195)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58195

Reviewed By: albanD

Differential Revision: D29361872

Pulled By: anjali411

fbshipit-source-id: b1f0fec7e3ac52acd1481bcc878cc0c1d07c1852
2021-07-15 11:04:42 -07:00
Anjali Chourdia
30e48bbeae Add neg bit (#56058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058

User facing changes:
1. Adds a negative bit and corresponding new API (`is_neg()`,`resolve_neg()`)
2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag.

Non user facing changes:
1. Added a new Negative dispatch key and a backend fallback to handle it
2. Updated copy kernel to handle negative bit
3. Merged conjugate and negative bit fallback kernel
4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987)

Testing:
1. Added a new OpInfo based test `test_neg_view`. It verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, that autograd returns the same output for both neg view and actually negated tensors, and that autograd works when grad_out is a neg view.
2. Added a new test class containing `test_conj_view`, `test_neg_view`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29636403

fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63
2021-07-13 13:50:42 -07:00
albanD
056a8e0d5c Remove un-used parameter in _trilinear backward (#60673)
Summary:
This argument is only important for speed and memory usage. So it is ok to ignore it during the backward.
As discussed, we might want to change this to speed up backward in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60673

Reviewed By: soulitzer

Differential Revision: D29370125

Pulled By: albanD

fbshipit-source-id: ad50b3ea530aeb194f5a51845523b517a50f2c71
2021-06-25 17:47:10 -07:00
lezcano
dfc8247d33 Faster cumsum and cumprod backwards (#60642)
Summary:
Piggybacking on https://github.com/pytorch/pytorch/pull/58747, now we can implement the backwards of `cumsum` and `cumprod` without tricks. This minimises the number of kernels that are launched on the GPU, so we see a reasonable speed-up on GPU. We should also get better stability for ill-conditioned inputs, as we do not perform any numerical tricks to get the result.

Note that the benchmarks test forward + backward, so the true speed-up on the backward alone should be even larger. Even more so in `cumsum`, as it requires fewer operations than the backward of `cumprod`.
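
For reference, the backward of `cumsum` is mathematically a cumulative sum of the incoming gradient taken from the right. The plain-Python sketch below illustrates that identity only. The helper names are hypothetical, and this is not the kernel-level code from the PR.

```python
from itertools import accumulate

def cumsum(xs):
    """Forward pass: y[i] = x[0] + ... + x[i]."""
    return list(accumulate(xs))

def cumsum_backward(grad_out):
    """Gradient w.r.t. x: grad_in[j] = sum of grad_out[i] for i >= j.

    Each x[j] contributes to every y[i] with i >= j, so the gradient is a
    cumulative sum taken from the right (flip, cumsum, flip).
    """
    return list(accumulate(reversed(grad_out)))[::-1]

grad_out = [1.0, 2.0, 3.0]
grad_in = cumsum_backward(grad_out)  # [6.0, 5.0, 3.0]
```

A single reversed scan like this needs no division by the input, which is one way to see why no numerical tricks are required for `cumsum`.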

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(ndims, prod_dim, dim, num_threads, device):
    size = [500]*ndims
    size[dim] = prod_dim

    x = torch.rand(*size, device=device, requires_grad=True)
    # Make sure there are no zeros, as the backward formula
    # that we are testing assumes the input has no zeros
    with torch.no_grad():
        x.add_(1e-3)
    grad = torch.ones_like(x)

    timer = Timer(
        "torch.autograd.grad([x.cumprod(dim)], [x], grad_outputs=[grad])",
        globals={"x": x, "dim": dim, "grad": grad},
        label=f"Cumprod + Backwards {device}",
        description=f"dim: {dim}",
        sub_label=f"prod_dim: {prod_dim}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    ndims = 3
    dims = range(ndims)
    prod_dims = [10, 100, 500]
    for dim, prod_dim, device in product(dims, prod_dims, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        for num_threads in threads:
            yield ndims, prod_dim, dim, num_threads, device

compare = Compare([get_timer(*params) for params in get_params()])
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>
Benchmark PR
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     14   |     12
      prod_dim: 100  |    260   |    270   |    260
      prod_dim: 500  |   1400   |   1550   |   1360
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    166   |    167
      prod_dim: 500  |    902   |    950   |    858
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      3   |      3
      prod_dim: 100  |    110   |    108   |    106
      prod_dim: 500  |    576   |    590   |    547

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    562   |    566   |   1075
      prod_dim: 100  |   5388   |   5394   |   6697
      prod_dim: 500  |  28170   |  27580   |  30740

Times are in microseconds (us).
```

</details>

<details>
<summary>
Benchmark master
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     13   |     12
      prod_dim: 100  |    270   |    270   |    256
      prod_dim: 500  |   1500   |   1590   |   1300
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    170   |    164
      prod_dim: 500  |    911   |    940   |    840
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      4   |      4
      prod_dim: 100  |    111   |    109   |    105
      prod_dim: 500  |    570   |    590   |    536

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    616   |    597   |   1109
      prod_dim: 100  |   5976   |   5723   |   7017
      prod_dim: 500  |  31110   |  29160   |  32320

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60642

Reviewed By: ngimel

Differential Revision: D29366368

Pulled By: albanD

fbshipit-source-id: b0d692ce030352965c2f152e0f92fbb61fc5ebde
2021-06-25 12:44:12 -07:00
Richard Barnes
b162d95e46 Fix a number of lint perf and safety issues in torch (#59897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59897

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29037012

fbshipit-source-id: 7c16286d5fc2b67964fb65f8374dfff4d1a7aefb
2021-06-15 13:14:51 -07:00
albanD
a524ee00ca Forward AD formulas batch 3 (#59711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59711

This is the exact same PR as before.
It was reverted because the PR below it was faulty.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28995762

Pulled By: albanD

fbshipit-source-id: 65940ad93bced9b5f97106709d603d1cd7260812
2021-06-10 19:30:02 -07:00
Richard Barnes
e3d75b8475 irange for PyTorch sans jit (#59481)
Summary:
Switches most of the simple for loops outside of `jit` directories to use `c10::irange`.

Generated with D28874212.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28909681

fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85
2021-06-09 14:46:11 -07:00
Ivan Yashchuk
90303157ab Enable complex dtypes for coo_sparse-coo_sparse matmul [CPU] (#59554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59554

This PR enables complex number support for matrix-matrix
multiplication of COO sparse matrices.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28968309

Pulled By: anjali411

fbshipit-source-id: 4fd471e76a5584366aabc86c08b4564667ee54ca
2021-06-08 19:34:41 -07:00
Jane Xu
14f4c8d333 Revert D28387762: Forward AD formulas batch 3
Test Plan: revert-hammer

Differential Revision:
D28387762 (58348bea06)

Original commit changeset: fc395c92af7e

fbshipit-source-id: 608d704ff5bc560714790a576eaf9ed7f1f44e13
2021-06-08 15:19:26 -07:00
Natalia Gimelshein
9d533ef3ac Renorm fix (#59615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59584
albanD, soulitzer, `renorm` grad was completely busted. Fast gradcheck is definitely not doing its job.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59615

Reviewed By: jbschlosser

Differential Revision: D28964271

Pulled By: ngimel

fbshipit-source-id: b6878cd24db9189b64b67eb58bd2cd8956cda78a
2021-06-08 14:59:24 -07:00
Victor Quach
c268eefe96 Use TORCH_CHECK_NOT_IMPLEMENTED for AD not implemented (#59482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59482

Fixes #53398

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28933809

fbshipit-source-id: 53387ec9690fc235b0622b50800feced706ea1ee
2021-06-08 14:02:04 -07:00
albanD
58348bea06 Forward AD formulas batch 3 (#58094)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58094

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28387762

Pulled By: albanD

fbshipit-source-id: fc395c92af7ebb5ebae95c40f6c76273047f4097
2021-06-08 13:00:21 -07:00
Nikita Vedeneev
a30b359590 fix double backward for binary_cross_entropy loss function when reduction=sum. (#59479)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59477.

```python
In [1]: import torch

In [2]: x = torch.rand(3, 3, dtype=torch.double, requires_grad=True)

In [3]: y = torch.rand(3, 3, dtype=torch.double)

In [4]: torch.autograd.gradgradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='sum'), [x, y])
Out[4]: True

In [5]: torch.autograd.gradgradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='mean'), [x, y])
Out[5]: True

In [6]: torch.autograd.gradcheck(lambda x, y: torch.nn.functional.binary_cross_entropy(x, y, reduction='sum'), [x, y])
Out[6]: True

```

More comprehensive testing could be added in https://github.com/pytorch/pytorch/pull/59447 where explicit `gradcheck` and `gradgradcheck` tests are added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59479

Reviewed By: ejguan

Differential Revision: D28934354

Pulled By: albanD

fbshipit-source-id: 12ce68e3c5c499b2531f7cdba3c22548d67e07e9
2021-06-07 14:14:08 -07:00
Nikita Vedeneev
c51abf8fca Make binary_cross_entropy differentiable wrt target (#59447)
Summary:
As per title. Resolves https://github.com/pytorch/pytorch/issues/56683.
`gradgradcheck` will fail once `target.requires_grad() == True` because of the limitations of the current double backward implementation.
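
To illustrate what differentiating with respect to `target` means, here is a scalar sketch in plain Python. It assumes the standard elementwise formula `-(t * log(x) + (1 - t) * log(1 - x))` and uses hypothetical helper names. It is not the ATen implementation.

```python
import math

def bce(x, t):
    """Elementwise binary cross entropy for a prediction x in (0, 1)."""
    return -(t * math.log(x) + (1.0 - t) * math.log(1.0 - x))

def bce_grad_target(x, t):
    """Analytic derivative of bce with respect to the target t.

    The loss is linear in t, so the result does not depend on t itself.
    """
    return math.log(1.0 - x) - math.log(x)

# Central finite difference as a sanity check on the analytic formula.
x, t, eps = 0.3, 0.8, 1e-6
numeric = (bce(x, t + eps) - bce(x, t - eps)) / (2 * eps)
analytic = bce_grad_target(x, t)
```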

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59447

Reviewed By: agolynski

Differential Revision: D28910140

Pulled By: albanD

fbshipit-source-id: 20934880eb4d22bec34446a6d1be0a38ef95edc7
2021-06-07 09:20:17 -07:00
anjali411
3607478ecd Conjugate View (#54987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987

Based on the prototypes by ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702).

Here's a summary of the changes in this PR:
This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose).

1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API:
    a) `.conj()` -- now returning a view.
    b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
    c) `.conj_physical_()`, and `out=` variant
    d) `.resolve_conj()`  -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
    e) `.resolve_conj_()` in-place version of (d)
    f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors.
    g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
    a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
    b) This fallback is well equipped to handle the following cases:
        - functional operation e.g., `torch.sin(input)`
        - Mutable inputs and in-place operations e.g., `tensor.add_(2)`
        - `out=` operations e.g., `torch.sin(input, out=out)`
        - Tensorlist input args
        - NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
    a) `resolve_conj()` is an identity function w.r.t. autograd
    b) Everything else works as expected.
5. Testing:
    a) All method_tests run with conjugate view tensors.
    b) OpInfo tests that run with conjugate views
        - test_variant_consistency_eager/jit
        - gradcheck, gradgradcheck
        - test_conj_views (that only run for `torch.cfloat` dtype)

NOTE: functions like `empty_like`, `zeros_like`, `randn_like`, `clone` don't propagate the conjugate bit.
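
The conj-bit idea described above can be pictured with a toy plain-Python class. Everything here (`LazyConj` and its fields) is hypothetical and only mirrors the described semantics of `.conj()`, `.is_conj()` and `.resolve_conj()`. It says nothing about how the dispatch key or fallback is implemented.

```python
class LazyConj:
    """Toy stand-in for a complex tensor carrying a conj bit."""

    def __init__(self, data, conj_bit=False):
        self.data = data          # shared storage, never copied by conj()
        self.conj_bit = conj_bit

    def conj(self):
        # O(1): same storage, flipped bit (a "view").
        return LazyConj(self.data, not self.conj_bit)

    def is_conj(self):
        return self.conj_bit

    def resolve_conj(self):
        # Materialize only if the bit is set; otherwise return self.
        if not self.conj_bit:
            return self
        return LazyConj([z.conjugate() for z in self.data], False)

x = LazyConj([1 + 2j, 3 - 4j])
y = x.conj()              # no data copied, bit set
z = y.resolve_conj()      # new storage holding conjugated values, bit unset
```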

Follow up work:
1. conjugate view RFC
2. Add neg bit to re-enable view operation on conjugated tensors
3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28227315

Pulled By: anjali411

fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f
2021-06-04 14:12:41 -07:00
Peter Bell
6408cbd918 Migrate renorm to ATen (CPU and CUDA) (#59250)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/59108, closes https://github.com/pytorch/pytorch/issues/24754, closes https://github.com/pytorch/pytorch/issues/24616

This reuses `linalg_vector_norm` to calculate the norms. I just add a new kernel that turns the norm into a normalization factor, then multiply the original tensor using a normal broadcasted `mul` operator. The result is less code, and better performance to boot.
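
The two-step structure (norm to factor, then a plain multiply) can be sketched in pure Python for a list of rows. `renorm_rows` is a hypothetical helper, and the factor formula is a simplifying assumption; the real kernel may differ, for example by adding a small epsilon to the norm.

```python
def renorm_rows(rows, p, maxnorm):
    """Rescale each row whose p-norm exceeds maxnorm; leave others as is."""
    out = []
    for row in rows:
        norm = sum(abs(v) ** p for v in row) ** (1.0 / p)
        # Step 1: turn the norm into a multiplicative factor.
        factor = maxnorm / norm if norm > maxnorm else 1.0
        # Step 2: ordinary multiply of the original values by that factor.
        out.append([v * factor for v in row])
    return out

# First row has 2-norm 5 and is scaled down; second has 2-norm 0.5 and is kept.
result = renorm_rows([[3.0, 4.0], [0.3, 0.4]], p=2, maxnorm=1.0)
```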

#### Benchmarks (CPU):
|     Shape    | Dim |  Before | After (1 thread) | After (8 threads) |
|:------------:|:---:|--------:|-----------------:|------------------:|
| (10, 10, 10) | 0   | 11.6 us |           4.2 us |            4.2 us |
|              | 1   | 14.3 us |           5.2 us |            5.2 us |
|              | 2   | 12.7 us |           4.6 us |            4.6 us |
| (50, 50, 50) | 0   |  330 us |           120 us |           24.4 us |
|              | 1   |  350 us |           135 us |           28.2 us |
|              | 2   |  417 us |           130 us |           24.4 us |

#### Benchmarks (CUDA)
|     Shape    | Dim |  Before |   After |
|:------------:|:---:|--------:|--------:|
| (10, 10, 10) | 0   | 12.5 us | 12.1 us |
|              | 1   | 13.1 us | 12.2 us |
|              | 2   | 13.1 us | 11.8 us |
| (50, 50, 50) | 0   | 33.7 us | 11.6 us |
|              | 1   | 36.5 us | 15.8 us |
|              | 2   | 41.1 us |   15 us |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59250

Reviewed By: mruberry

Differential Revision: D28820359

Pulled By: ngimel

fbshipit-source-id: 572486adabac8135d52a9b8700f9d145c2a4ed45
2021-06-03 11:43:27 -07:00
albanD
d095ec75a1 Forward AD formulas batch 2 (#57863)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57863

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28387763

Pulled By: albanD

fbshipit-source-id: e1b60ab728bb05b9e3323ee0dc7e401aaf5b8817
2021-06-03 07:33:04 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
kshitij12345
5c18994674 [special] Add i1 and i1e (#56352)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

* [x] Check Docs https://12721710-65600975-gh.circle-artifacts.com/0/docs/special.html
* [x] Investigate fp32 failure on CI?! (Fails on clang. Reproduced locally with clang-11)
* [ ] Kernel vs Composite?
* [x] Autograd for `i0e` for zero?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56352

Reviewed By: anjali411

Differential Revision: D28700888

Pulled By: mruberry

fbshipit-source-id: 91a3cbb94f5b8a3b063589ec38179848c11def83
2021-05-29 20:55:23 -07:00
Natalia Gimelshein
355b24438c make vector_norm backward call norm_backward (#59135)
Summary:
Per title. Remove duplicated code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59135

Reviewed By: mruberry

Differential Revision: D28775716

Pulled By: ngimel

fbshipit-source-id: 50dc77590db15976453fc41c3657a77198749849
2021-05-29 12:14:46 -07:00
Adnios
09a8f22bf9 Add mish activation function (#58648)
Summary:
See issue: https://github.com/pytorch/pytorch/issues/58375

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648

Reviewed By: gchanan

Differential Revision: D28625390

Pulled By: jbschlosser

fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4
2021-05-25 10:36:21 -07:00
Kurt Mohler
fe8e5eb260 Change native functions to take c10::string_view args instead of std::string (#57680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57680

Reviewed By: malfet

Differential Revision: D28511799

Pulled By: ezyang

fbshipit-source-id: 43142f994d048b28b3279ccdb7a28cbaa3190973
2021-05-20 18:15:45 -07:00
lezcano
1f3807ce5d More stable and faster implementation of the gradient of torch.linalg.eigh (#55049)
Summary:
This PR:
- Renames symeig_backward to eigh_backward
- Improves the stability and speed of the gradient computation by doing `V(A + B)Vh` instead of `VAVh + VBVh` when both the gradients of the eigenvectors and eigenvalues are defined.
- Updates the comments of the function to make them arguably clearer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55049

Reviewed By: ngimel

Differential Revision: D28396823

Pulled By: mruberry

fbshipit-source-id: a144482bfb1054e281b58ae1fe3cf1015bab505d
2021-05-13 17:17:35 -07:00
lezcano
9e156b01e5 linalg.eig backwards and linalg.eigvals (#57276)
Summary:
This PR adds backwards support for `eig` and `eigvals`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57276

Reviewed By: ngimel

Differential Revision: D28405056

Pulled By: mruberry

fbshipit-source-id: 27ef03f139f44d75f4d319b0f3e77e99eea9bb01
2021-05-13 09:42:13 -07:00
lezcano
db13119fc4 Deprecate symeig (#57732)
Summary:
This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732

Reviewed By: bdhirsh

Differential Revision: D28328189

Pulled By: mruberry

fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474
2021-05-12 02:21:35 -07:00
Nikita Vedeneev
c790fd2bf8 ATen lu_unpack. Required for making torch.lu_solve differentiable. (#46913)
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`,
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python.

Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function.

~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913

Reviewed By: albanD

Differential Revision: D28355725

Pulled By: mruberry

fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
2021-05-11 22:53:21 -07:00
Ivan Yashchuk
aaca12bcc2 Deprecate in docs torch.svd and change svd -> linalg_svd (#57981)
Summary:
This PR adds a note to the documentation that torch.svd is deprecated together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with a new one from torch.linalg module, except for the `at::linalg_pinv` function, that fails the XLA CI build (https://github.com/pytorch/xla/issues/2755, see failure in draft PR https://github.com/pytorch/pytorch/pull/57772).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981

Reviewed By: ngimel

Differential Revision: D28345558

Pulled By: mruberry

fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
2021-05-11 18:04:10 -07:00
lezcano
415ae54c31 Deprecate torch.eig (#57727)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57727

Reviewed By: bdhirsh

Differential Revision: D28317984

Pulled By: mruberry

fbshipit-source-id: fa1aa1b78fd3611ac208bca93e2b745a1bac41f1
2021-05-10 23:31:02 -07:00
Mike Ruberry
3c87fe9b14 Revert D28117714: [pytorch][PR] ATen lu_unpack. Required for making torch.lu_solve differentiable.
Test Plan: revert-hammer

Differential Revision:
D28117714 (5c67d8dfd3)

Original commit changeset: befd33db12ec

fbshipit-source-id: 295b2134935542a903a73f90a7998239dfe6cc81
2021-05-09 23:20:06 -07:00
Nikita Vedeneev
5c67d8dfd3 ATen lu_unpack. Required for making torch.lu_solve differentiable. (#46913)
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`,
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python.

Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function.

~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913

Reviewed By: astaff

Differential Revision: D28117714

Pulled By: mruberry

fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4
2021-05-09 19:12:56 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add a `cppcoreguidelines-avoid-magic-numbers` exclusion to clang-tidy.
Remove existing NOLINT comments using the following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Peter Bell
2043093217 Add correction parameter to std/var (#50903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50903

First part of #50010. Also fixes #51127.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27911345

Pulled By: mruberry

fbshipit-source-id: 7138fddc935802918ab9ff19f4bc1b9f4d745d41
2021-05-07 14:40:28 -07:00
Alexander
6f2c0cccdd New: sparse complex: add linear algebra, addmm (#57129)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57129

Test Plan: Imported from OSS

Reviewed By: janeyx99, astaff

Differential Revision: D28112701

Pulled By: ezyang

fbshipit-source-id: 1b253453dc19e908fb18d0b1a83738243e0a8d59
2021-05-07 05:37:48 -07:00
Heitor Schueroff
1f1e2dab6b Remove optional type for ord parameter in vector_norm (#57662)
Summary:
As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215

Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor which cannot be easily captured by a value other than `None`

### BC Breaking Note
This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662

Reviewed By: albanD, mruberry

Differential Revision: D28228870

Pulled By: heitorschueroff

fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13
2021-05-06 17:53:25 -07:00
Peter Bell
33eea146ee torch.clamp with tensor min and max (#52695)
Summary:
Fixes gh-2793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52695

Reviewed By: mruberry

Differential Revision: D27395977

Pulled By: ezyang

fbshipit-source-id: f86aa240feb034d42e4c45447e72218f6a773c24
2021-05-03 12:56:16 -07:00
Kevin Rose
ec86f96e91 Fix for derivative of sinc(x) when x is positive but very very small (#56986)
Summary:
The problem arises for sinc'(x) where x != 0 but x ** 2 == 0, which happens for some very small floats.

I realized that my solution from https://github.com/pytorch/pytorch/issues/56763 was incomplete when I did a quick implementation using `torch.autograd.Function` and still got a `NaN` from my derivative.
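
The underflow can be reproduced in plain Python. The sketch below assumes the normalized definition `sinc(x) = sin(pi*x) / (pi*x)` and shows one possible guard, testing `x * x` rather than `x`. `sinc_grad` is a hypothetical helper and not necessarily the formula used in the PR.

```python
import math

def sinc_grad(x):
    """Derivative of sinc(x) = sin(pi*x) / (pi*x), guarded near zero."""
    # Guard on x*x rather than x: for tiny x the square underflows to 0.0
    # even though x itself is nonzero, and the division would be 0/0.
    if x * x == 0.0:
        return 0.0
    px = math.pi * x
    return (px * math.cos(px) - math.sin(px)) / (math.pi * x * x)

tiny = 1e-200  # nonzero, but tiny * tiny underflows to 0.0
grad_at_tiny = sinc_grad(tiny)
```

A guard of the form `x == 0` alone would let `tiny` through to the division, which is the failure mode described above.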

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56986

Reviewed By: gchanan

Differential Revision: D28093507

Pulled By: albanD

fbshipit-source-id: 2a30e1065b08c5c60de843a0778dedeb0fb295f4
2021-04-29 11:16:39 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Kevin Rose
5854e93bc9 Fix derivative of sinc at x=0 (#56763)
Summary:
Attempting to fix https://github.com/pytorch/pytorch/issues/56760

The derivative of `sinc(x)` at `x=0` should be special cased to 0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56763

Reviewed By: zhangguanheng66

Differential Revision: D27978135

Pulled By: albanD

fbshipit-source-id: ede5e734613cf60e720f6bcc7387c3cd9c6ec233
2021-04-26 09:43:42 -07:00
Xiao Wang
7b31ba4708 Fix cudnn ctc loss backward (#56639)
Summary:
Fix cudnn ctc loss backward

Fix https://github.com/pytorch/pytorch/issues/49046, which was working in pytorch 1.1

Originally modified in this PR in Oct 2019, https://github.com/pytorch/pytorch/pull/27039/files#diff-25ec2c1108ee03e2167622588ec31d167897ef1cccb12a4cfe77eb98777316daR2383-R2392

According to the original code

90ffab6e37/tools/autograd/derivatives.yaml (L1387-L1388)

and the code after PR

f461184505/tools/autograd/templates/Functions.cpp (L2456-L2465)

This `at::zeros({0}, raw_grad.options())` in line 2460 seems suspicious, and is causing an `infer_size` runtime error

```
RuntimeError: The size of tensor a (0) must match the size of tensor b (177) at non-singleton dimension 2
Exception raised from infer_size at ..\aten\src\ATen\ExpandUtils.cpp:24 (most recent call first):
```

I've modified that to `at::zeros_like(raw_grad)`, which looks more accurate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56639

Reviewed By: mruberry

Differential Revision: D27987860

Pulled By: ngimel

fbshipit-source-id: 5ad65e78d017c26894fb26318a5992b0878d04d5
2021-04-25 22:51:19 -07:00
Brian Hirsh
e8faf69739 fix torch.pow type promotion issue (#54085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54085

Fixes https://github.com/pytorch/pytorch/issues/50121.

This fixes two similar issues pointed out with the dtype that `torch.pow` performs its computation. Thanks ngimel for spotting the issues originally (comments [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594624355) and [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594719704))!

Before:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(131072)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([131072], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(131072, device='cuda:0')
```

After:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(0)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([0], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(0, device='cuda:0')
```

In all four cases above, `tensor(0, ...)` is the correct value because the computed "common dtype" among the inputs is expected to be `uint8`. Computing `2 ** 17` in uint8 will then overflow to zero. Finally, we cast the computed output to the output tensor's dtype, which is `int64`.
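
The wraparound can be checked with plain integer arithmetic. This sketch only emulates unsigned 8-bit overflow with a modulo (the helper `wrap_uint8` is hypothetical); it does not call `torch.pow`.

```python
def wrap_uint8(value: int) -> int:
    """Reduce an integer modulo 2**8, as unsigned 8-bit arithmetic does."""
    return value % 256

exact = 2 ** 17              # 131072, the value some cases returned before the fix
wrapped = wrap_uint8(exact)  # 131072 is a multiple of 256, so this is 0
```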

There were two separate issues fixed in this PR: one for cpu and one for cuda:
* For CPU, the `pow(Scalar, Tensor)` overload wasn't calling `set_wrapped_number(true)` after wrapping the scalar in a Tensor, which caused the "promoted" scalar to incorrectly participate in type promotion (see the documented behavior [here](aa8714dfed/c10/core/TensorImpl.h (L590)))
* For CUDA, the cuda kernels defined in `PowKernel.cu` were using the output's dtype to run the computation, instead of the common dtype.

As an aside: The CPU and CUDA kernels actually both use `iter.dtype()` instead of `iter.common_dtype()` to run the computation, which I fixed. The reason that only manifested here for CUDA is that TensorIterator has cpu-specific logic to create temporary outputs with the intermediate dtype (shown [here](aa8714dfed/aten/src/ATen/TensorIterator.cpp (L349))). I'm not sure what the end state is there; I can imagine that being something we're more okay doing for cpu than for cuda, but it also leads to hard-to-track-down inconsistencies between the two like in this case.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27096330

Pulled By: bdhirsh

fbshipit-source-id: a7e2909243851625cb3056d1e7abb2383bfe95f2
2021-04-15 08:55:53 -07:00
Richard Barnes
d690973295 irange on int64_t (#55148)
Summary:
Converts loops of the form:
```
for(int64_t VAR=0;VAR<LIMIT;VAR++)
```
to the form
```
for(const auto VAR : c10::irange(LIMIT))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55148

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27447811

fbshipit-source-id: 6311a094ec4a81a0b57383aaee0ba1b1dc2445c4
2021-04-05 16:14:00 -07:00
Peter Bell
2ee02b30b1 Replace rounding_mode="true" with rounding_mode=None (#51988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51988

* **#51988 Replace rounding_mode="true" with rounding_mode=None**

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27561817

Pulled By: mruberry

fbshipit-source-id: 60d1d9c389570f60d599fc1876518717367fb368
2021-04-05 14:53:43 -07:00
Antonio Cuni
980d6f2589 torch.linalg.det (#53119)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51652.
In particular:
- the main implementation is in `torch.linalg.det` now. `torch.det` is just a deprecated alias to it
- add a new `OpInfo` for `torch.linalg.det`
- remove the old-style tests for `torch.det` (this is similar to what we did for `torch.linalg.slogdet`, see https://github.com/pytorch/pytorch/issues/49194)
- added a `out=` argument to `torch.linalg.det`, but **not** to `torch.det`.

It is worth noting that I had to skip a few tests:
- `TestGradientsCuda::test_fn_gradgrad_linalg_det_cuda_float64`. This is not a regression: the functionality is broken also on master, but the test is not executed properly due to https://github.com/pytorch/pytorch/issues/53361.

And the following tests, which fail only on ROCm:
- `test_variant_consistency_jit_cuda_{float64,float32}`
- `test_fn_grad_cuda_float64`

I think that the ROCm tests fail because the current linalg.det backward is unstable if the matrix has repeated singular values, see https://github.com/pytorch/pytorch/issues/53364 .

(At the moment of writing some CI jobs are still running but I believe the build will be green, since the only difference wrt the last push is the skip of the ROCm tests)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53119

Reviewed By: H-Huang

Differential Revision: D27441999

Pulled By: mruberry

fbshipit-source-id: 5eab14c4f0a165e0cf9ec626c3f4bb23359f2a9e
2021-04-05 08:45:27 -07:00
Mike Ruberry
c0ac0fef4e Revert D27448156: irange for size_t
Test Plan: revert-hammer

Differential Revision:
D27448156 (041b4431b2)

Original commit changeset: 585da57d4de9

fbshipit-source-id: 8e047c29f391c0166e0a1a87c3fb2a0854377365
2021-04-03 19:14:00 -07:00
Richard Barnes
041b4431b2 irange for size_t (#55163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55163

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27448156

fbshipit-source-id: 585da57d4de91c692b6360d65f7b8a66deb0f8c1
2021-04-02 23:22:29 -07:00
Nikita Vedeneev
61b074581c torch.prod backward for complex types. (#48125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53511
torch.det depends on torch.prod, which in turn depends on several other functions that themselves depend on torch.prod. Because of this circular relationship, this PR enables complex backward support for several functions at once.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48125

Reviewed By: pbelevich

Differential Revision: D27188589

Pulled By: anjali411

fbshipit-source-id: bbb80f8ecb83a0c3bea2b917627d3cd3b84eb09a
2021-03-19 09:44:08 -07:00
albanD
09b4af2f0f Remove legacy from optional-related function names (#54101)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54101

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D27117839

Pulled By: albanD

fbshipit-source-id: 1f50b06ff9b0be8301f6ea9eca14f73a3a5fa137
2021-03-18 09:29:00 -07:00
albanD
cba8516b52 make internal forwardAD methods on at::Tensor internal (#54099)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54099

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D27117838

Pulled By: albanD

fbshipit-source-id: ede96529a4b099dea9cf885d0bf2cb352aa30fa5
2021-03-18 09:27:17 -07:00
Kurt Mohler
382a47b493 Add torch.linalg.vector_norm function (#51099)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50214

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51099

Reviewed By: agolynski

Differential Revision: D27147360

Pulled By: mruberry

fbshipit-source-id: 1056f840e7027ad81971c9d1a9f952ab9648f1b5
2021-03-18 06:41:39 -07:00
Ivan Yashchuk
564456ac44 Added autograd support for torch.orgqr (#52637)
Summary:
This PR adds autograd support for `torch.orgqr`.

Since `torch.orgqr` is one of few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name and `torch.orgqr` now is an alias for it.

The new proposed name is `householder_product`. For a matrix `input` and a vector `tau`, LAPACK's orgqr operation takes the columns of `input` (called Householder vectors or elementary reflectors) and the scalars of `tau`, which together represent Householder matrices, and computes the product of these matrices. See https://www.netlib.org/lapack/lug/node128.html.
Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now when it supports differentiation.
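
The operation described above can be sketched in pure Python for the real-valued case. This is an illustration under assumptions, not the PR's implementation: it follows the LAPACK storage convention in which each Householder vector has an implicit 1 on the diagonal and zeros above it, and the sample `a` and `tau` values are made up.

```python
def householder_product(a, tau):
    """Return the first n columns of H_0 H_1 ... H_{k-1}, H_i = I - tau_i v_i v_i^T."""
    m, n = len(a), len(a[0])
    q = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
    for i, t in enumerate(tau):
        # v_i: zeros above row i, an implicit 1 at row i, column i of `a` below it.
        v = [0.0] * i + [1.0] + [a[r][i] for r in range(i + 1, m)]
        # Right-multiply by H_i: q <- q - t * (q v) v^T.
        qv = [sum(q[r][c] * v[c] for c in range(m)) for r in range(m)]
        for r in range(m):
            for c in range(m):
                q[r][c] -= t * qv[r] * v[c]
    return [row[:n] for row in q]

a = [[0.0, 0.0], [0.5, 0.0], [0.5, 0.5]]  # only entries below the diagonal are read
tau = [2 / 1.5, 2 / 1.25]                 # tau_i = 2 / (v_i . v_i) makes each H_i orthogonal
q = householder_product(a, tau)
```

With `tau` chosen this way every `H_i` is an orthogonal reflection, so the columns of `q` come out orthonormal.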

Resolves https://github.com/pytorch/pytorch/issues/50104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637

Reviewed By: agolynski

Differential Revision: D27114246

Pulled By: mruberry

fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce
2021-03-18 05:42:18 -07:00
lezcano
1f5b9170aa Faster backwards for cumsum and cumprod (#53711)
Summary:
Provides a faster formula for `cumprod` in the case when the input has zeros. This formula is non-differentiable, so we leave the previous formula for the cases when `at::GradMode::is_enabled()`.

This new formula gives up to x10 and x30 speed-ups in CPU and GPU (see the benchmarks below).

The `cumsum` backward formula was rewritten so that no copies are necessary. We also removed a double negation in its formula. This gives a significant speed-up in CPU, while being almost as efficient as the formula with copies in GPU. We can see this speed-up when comparing the "No zeros" part of the benchmark.
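
For reference, the mathematical content of the `cumsum` backward can be sketched in pure Python. Output `j` of a cumulative sum depends on every input `i <= j`, so the gradient with respect to input `i` is the sum of the incoming gradients at positions `j >= i`, i.e. a cumulative sum taken from the right. The function name below is hypothetical and this is not the copy-free formula from the PR.

```python
from itertools import accumulate

def cumsum_backward(grad_out):
    """Gradient of a 1-D cumsum: a cumulative sum of grad_out taken from the right."""
    return list(accumulate(reversed(grad_out)))[::-1]

grad_in = cumsum_backward([1, 2, 3])
```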

Benchmarks:

nb. It is worth noting that the script tests the forward and the backward for `cumprod`, so the speed-ups should be even larger than those announced here.
<details>
<summary>Script</summary>

```python
from IPython import get_ipython
import torch
from itertools import product

torch.manual_seed(13)
torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def run_test(ndims, size, size_prod, zeros, device):
    print(f"ndims: {ndims}, tensor_size: {size}, size_prod: {size_prod}, zeros: {zeros}, device: {device}")

    for dim in range(ndims):
        sizes = ndims * [size]
        sizes[dim] = size_prod
        tensor = torch.rand(*sizes, device=device)
        with torch.no_grad():
            if zeros:
                # Set 0.1 of them to zero
                p_drop = 0.1
                mask = torch.full_like(tensor, 1.0 - p_drop)
                tensor = tensor * torch.bernoulli(mask)
            else:
                tensor = tensor + 1e-3
        tensor.requires_grad_()
        grad = torch.ones_like(tensor)
        # We test both forward + backward, meaning that the speed-up is actually greater than reported
        # That being said, this is more realistic than doing `retain_graph=True`
        command = "torch.autograd.grad([tensor.cumprod(dim)], [tensor], grad_outputs=[grad])"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
        ipython.magic(f"timeit {command}")
    print()

for device, zeros in product([cuda, cpu], [True, False]):
    run_test(3, 300, 10, zeros, device)
    run_test(3, 300, 100, zeros, device)
    if device == cuda:
        run_test(3, 300, 300, zeros, device)
```

</details>

<details>
<summary>CPU This PR  (Some regression small tensors, x4 speed-up large tensors)</summary>

```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu
28.2 ms ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.8 ms ± 78.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24.5 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu
414 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
428 ms ± 4.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
382 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu
3.11 ms ± 9.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.83 ms ± 3.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.08 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu
92.2 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
101 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
87 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
</details>

<details>
<summary>CUDA This PR (7-30x speed-up)</summary>

```

Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda
1.46 ms ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.48 ms ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.93 ms ± 8.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda
10.5 ms ± 914 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.6 ms ± 509 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.7 ms ± 864 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda
30.3 ms ± 5.16 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
30.6 ms ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
32.2 ms ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda
248 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
252 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
438 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda
2.1 ms ± 193 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.16 ms ± 380 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.59 ms ± 398 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda
6.3 ms ± 857 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.39 ms ± 288 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.15 ms ± 233 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

</details>

<details>
<summary>CPU master</summary>

```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu
8.27 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.8 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
28.2 ms ± 74.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu
1.53 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.95 s ± 4.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.86 s ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu
3.42 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.25 ms ± 3.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.34 ms ± 3.04 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu
104 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
117 ms ± 99.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
94.8 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

</details>

<details>
<summary>CUDA master</summary>

```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda
912 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.05 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.74 ms ± 381 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda
71.3 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
85.4 ms ± 9.82 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
119 ms ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda
646 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
776 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
917 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda
301 µs ± 893 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
308 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
592 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda
2.61 ms ± 375 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.68 ms ± 524 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.38 ms ± 736 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda
7.89 ms ± 848 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.03 ms ± 517 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.24 ms ± 405 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

</details>

cc nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53711

Reviewed By: jbschlosser

Differential Revision: D27059662

Pulled By: anjali411

fbshipit-source-id: be610d5590c0199b4412dff66fac47666faaff9d
2021-03-16 13:57:43 -07:00
Wenlei Xie
2ecb2c7931 Pass Scalar by reference (#53583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583

`Scalar` takes 32 bytes because `c10::complex<double>`
requires 16-byte alignment. Passing `Scalar` by reference
shows about a 1% improvement in instruction count.

All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```

# Codemod

## Main Step

For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```

As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fix-point (since one method signature might contain multiple `Scalar` parameters).
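
The fix-point iteration can be emulated with Python's `re` module. This is a sketch under assumptions: Python's regex engine stands in for fastmod's, `${1}` is written as `\g<1>`, and the helper name `codemod_to_fixpoint` and the sample signature are hypothetical.

```python
import re

PATTERN = re.compile(r'([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)')
REPLACEMENT = r'\g<1>const Scalar& \g<2>'

def codemod_to_fixpoint(text, max_passes=20):
    """Apply the substitution repeatedly; return the text and the number of changing passes."""
    for passes in range(max_passes):
        rewritten = PATTERN.sub(REPLACEMENT, text)
        if rewritten == text:
            return text, passes
        text = rewritten
    raise RuntimeError("no fix-point reached")

result, passes = codemod_to_fixpoint("void f(Scalar a, Scalar b)")
print(result, passes)
```

Because `[^)]*` is greedy, each pass rewrites only one parameter of the sample signature (the last unconverted one), which is why a single application is not enough.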

In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea. (I reverted those changes manually later; see https://github.com/pytorch/pytorch/pull/53479 for reference.)

## Pre-Step

Some `Scalar` parameters are spelled `at::Scalar` or `c10::Scalar`, so I codemodded some of them before applying the main command. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```

## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` will be codemodded into `const const Scalar&`, and `at::Scalar` will be codemodded into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```

## Supplementary

`cu` and `mm` files also need to be codemoded, for example:

```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```

Function pointers are not codemoded. Here is an incomplete list:

```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'

# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```

Some corner cases need to be fixed manually.

ghstack-source-id: 123970306

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D26904445

fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
2021-03-15 23:17:06 -07:00
Nikita Vedeneev
8f15a2f052 eig_backward: faster and with complex support (#52875)
Summary:
As per title. Compared to the previous version, it is lighter on the usage of `at::solve` and `at::matmul` methods.

Fixes https://github.com/pytorch/pytorch/issues/51621

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875

Reviewed By: mrshenli

Differential Revision: D26768653

Pulled By: anjali411

fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655
2021-03-10 11:33:30 -08:00
Joel Schlosser
e86476f736 Huber loss (#50553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48595.

## Background

This PR implements HuberLoss, which differs from SmoothL1Loss by a factor of beta. The current implementation does not share logic between the two. Feedback is welcome for the optimal way to minimize code duplication while remaining performant.

I've done some early [benchmarking](https://pytorch.org/tutorials/recipes/recipes/benchmark.html#collecting-instruction-counts-with-callgrind) with Huber calling in to the Smooth L1 kernel and scaling afterwards; for the simple test case I used, instruction counts are as follows:
```
Huber loss calls dedicated Huber kernel: 2,795,300
Huber loss calls Smooth L1 kernel and scales afterwards: 4,523,612
```
With these numbers, instruction counts are ~62% higher when using the pre-existing Smooth L1 kernel.
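
The relationship between the two losses, and the overhead figure quoted above, can be checked with a scalar sketch. The functions below are illustrative pure-Python definitions of the elementwise losses (no reduction) and are not the kernels from this PR.

```python
def smooth_l1(x: float, beta: float) -> float:
    """Elementwise Smooth L1 on the residual x."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def huber(x: float, delta: float) -> float:
    """Elementwise Huber loss on the residual x."""
    ax = abs(x)
    return 0.5 * ax * ax if ax <= delta else delta * (ax - 0.5 * delta)

# Huber equals Smooth L1 scaled by the threshold when beta == delta.
delta = 1.5
pairs = [(huber(x, delta), delta * smooth_l1(x, delta)) for x in (-3.0, -0.2, 0.7, 2.5)]

# Relative overhead of the reused Smooth L1 kernel, from the counts quoted above.
overhead = 4_523_612 / 2_795_300 - 1  # about 0.62
```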

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50553

Test Plan:
```
python test/test_nn.py TestNN.test_HuberLoss
python test/test_nn.py TestNN.test_HuberLoss_delta
python test/test_nn.py TestNN.test_huber_loss_invalid_delta
python test/test_nn.py TestNNDeviceTypeCPU.test_smooth_l1_loss_vs_huber_loss_cpu
python test/test_nn.py TestNNDeviceTypeCUDA.test_smooth_l1_loss_vs_huber_loss_cuda
python test/test_nn.py TestNNDeviceTypeCPU.test_invalid_reduction_strings_cpu
python test/test_nn.py TestNNDeviceTypeCUDA.test_invalid_reduction_strings_cuda
python test/test_nn.py TestNN.test_loss_equal_input_target_shape
python test/test_nn.py TestNN.test_pointwise_loss_broadcast
python test/test_overrides.py
python test/test_jit.py TestJitGeneratedFunctional.test_nn_huber_loss
python test/test_type_hints.py
python test/test_cpp_api_parity.py
build/bin/test_api
```

## Documentation
<img width="677" alt="Screen Shot 2021-01-14 at 4 25 08 PM" src="https://user-images.githubusercontent.com/75754324/104651224-5a445980-5685-11eb-884b-14ea517958c2.png">
<img width="677" alt="Screen Shot 2021-01-14 at 4 24 35 PM" src="https://user-images.githubusercontent.com/75754324/104651190-4e589780-5685-11eb-974d-8c63a89c050e.png">
<img width="661" alt="Screen Shot 2021-01-14 at 4 24 45 PM" src="https://user-images.githubusercontent.com/75754324/104651198-50225b00-5685-11eb-958e-136b36f6f8a8.png">
<img width="869" alt="Screen Shot 2021-01-14 at 4 25 27 PM" src="https://user-images.githubusercontent.com/75754324/104651208-53b5e200-5685-11eb-9fe4-5ff433aa13c5.png">
<img width="862" alt="Screen Shot 2021-01-14 at 4 25 48 PM" src="https://user-images.githubusercontent.com/75754324/104651209-53b5e200-5685-11eb-8051-b0cfddcb07d3.png">

Reviewed By: H-Huang

Differential Revision: D26734071

Pulled By: jbschlosser

fbshipit-source-id: c98c1b5f32a16f7a2a4e04bdce678080eceed5d5
2021-03-02 17:30:45 -08:00
kshitij12345
748285ccd7 [complex] add autograd support for torch.polar (#52488)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52488

Reviewed By: zou3519

Differential Revision: D26711841

Pulled By: anjali411

fbshipit-source-id: b8538fb8cb44456b832e4f993cf41954b3ddd2e8
2021-03-01 21:57:35 -08:00
Richard Barnes
fa325d7c9f Use sum_integers and multiply_integers (#51146)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51146

Test Plan: Sandcastle tests

Reviewed By: ngimel

Differential Revision: D25903430

fbshipit-source-id: 329c14018c9e5192864eed88a8ed0a5068ff1c69
2021-02-10 18:05:45 -08:00
Alexander
0c313564af Backward through sparse_coo_tensor (#50361)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49683

This PR fixes the backward-through-`sparse_coo_tensor` bug by implementing a `sparse_mask_helper` function for n-dimensional sparse tensors on CPU and CUDA, which is used to reimplement the `sparse_constructor_values_backward` function.

This `sparse_mask` function was previously implemented for the backward of sparse-sparse matmul. However, the algorithm is a little different because here it must apply not only to matrices but also to n-dimensional tensors. Fortunately, it was not hard to extend, and both now share the same code base.

Note that no new tests are required because the backward for sparse-sparse matmul now uses the new `sparse_mask_helper`.

ngimel, mruberry - kindly review this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50361

Reviewed By: zhangguanheng66

Differential Revision: D26270483

Pulled By: ngimel

fbshipit-source-id: ee4bda49ff86e769342674b64d3c4bc34eae38ef
2021-02-06 23:15:54 -08:00
Peter Bell
b150f150ba Add division overload with rounding_mode selection (#51706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50280

As mentioned in gh-43874, this adds a `rounding_mode={'true', 'trunc', 'floor'}`
argument so `torch.div` can be used as a replacement for `floor_divide` during
the transitional period.
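
The three modes differ only for quotients that are not whole numbers, most visibly for negative ones. The following is a scalar emulation in plain Python, using the mode names listed above; the function `div` here is a hypothetical stand-in and does not model `torch.div`'s dtype handling.

```python
import math

def div(a: float, b: float, rounding_mode: str = "true") -> float:
    """Scalar division with a selectable rounding mode."""
    q = a / b
    if rounding_mode == "true":
        return q
    if rounding_mode == "trunc":
        return float(math.trunc(q))  # round toward zero
    if rounding_mode == "floor":
        return float(math.floor(q))  # round toward negative infinity
    raise ValueError(f"unknown rounding_mode: {rounding_mode!r}")

results = {mode: div(-7.0, 2.0, mode) for mode in ("true", "trunc", "floor")}
```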

I've included dedicated kernels for truncated and floor division which
aren't strictly necessary for float, but do perform significantly better (~2x) than
doing true division followed by a separate rounding kernel.

Note: I introduce new overloads for `aten::div` instead of just adding a default
`rounding_mode` because various JIT passes rely on the exact operator schema.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26123271

Pulled By: mruberry

fbshipit-source-id: 51a83717602114597ec9c4d946e35a392eb01d46
2021-02-04 13:08:36 -08:00
anjali411
bd3ae117fc Fixes cat backward formula to return correct gradient values for R -> C case (#51681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51681

Fixes https://github.com/pytorch/pytorch/issues/51627

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D26238748

Pulled By: anjali411

fbshipit-source-id: 1dc47f8ddddbf3f2c176f21e5dcee917f84f4c93
2021-02-03 21:29:55 -08:00
XiaobingSuper
ec378055c3 add OneDNN linear backward (#49453)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49453

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26006889

Pulled By: VitalyFedyunin

fbshipit-source-id: 06e2a02b6e01d847395521a31fe84d844f2ee9ae
2021-02-02 12:18:59 -08:00
Ivan Yashchuk
ddf26816d3 Make torch.svd return V, not V.conj() for complex inputs (#51012)
Summary:
**BC-breaking note:**

torch.svd() added support for complex inputs in PyTorch 1.7, but was not documented as doing so. The complex "V" tensor returned was actually the complex conjugate of what's expected. This PR fixes the discrepancy.

This will silently break all users of torch.svd() with complex inputs.

**Original PR Summary:**

This PR resolves https://github.com/pytorch/pytorch/issues/45821.

The problem was that when introducing the support of complex inputs for `torch.svd` it was overlooked that LAPACK/MAGMA returns the conjugate transpose of V matrix, not just the transpose of V. So `torch.svd` was silently returning U, S, V.conj() instead of U, S, V.
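
The effect of the extra conjugate can be seen on a hand-built 2x2 example. This is a pure-Python sketch with made-up factors (`U`, `S`, `V` and the helper names are assumptions); it does not call `torch.svd`. Reconstructing with `V` recovers `A = U diag(S) V^H`, while reconstructing with `V.conj()` does not.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def conj(m):
    return [[z.conjugate() for z in row] for row in m]

def conj_transpose(m):
    return [list(col) for col in zip(*conj(m))]

def reconstruct(u, s, v):
    """Rebuild U diag(S) V^H from SVD factors."""
    diag = [[s[i] if i == j else 0.0 for j in range(len(s))] for i in range(len(s))]
    return matmul(matmul(u, diag), conj_transpose(v))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

r = 1 / math.sqrt(2)
U = [[1.0, 0.0], [0.0, 1.0]]
S = [2.0, 1.0]
V = [[r, r * 1j], [r * 1j, r]]          # unitary, with complex entries
A = [[2 * r, -2j * r], [-1j * r, r]]    # U diag(S) V^H written out by hand

with_v = close(reconstruct(U, S, V), A)
with_v_conj = close(reconstruct(U, S, conj(V)), A)
```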

Behavior of `torch.linalg.pinv`, `torch.pinverse` and `torch.linalg.svd` (they depend on `torch.svd`) is not changed in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51012

Reviewed By: bdhirsh

Differential Revision: D26047593

Pulled By: albanD

fbshipit-source-id: d1e08dbc3aab9ce1150a95806ef3b5da98b5d3ca
2021-01-25 14:06:41 -08:00
Tugsbayasgalan Manlaibaatar
1a38fa9930 Striding for lists Part 1 (#48719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48719

This is an attempt to break this PR (https://github.com/pytorch/pytorch/pull/33019) into two parts. As per our discussion with eellison, the first part makes sure our aten::slice operator takes optional parameters for begin/step/end. This will help with refactoring ir_emitter.cpp for generic handling of list and slice striding. Once this PR is merged, we will submit a second PR with the compiler change.
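
As an analogy only, Python's own slicing shows what optional begin/end/step parameters look like from the caller's side. The helper below is hypothetical and uses Python's default-filling rules; it makes no claim about the defaults `aten::slice` actually applies.

```python
from typing import List, Optional

def slice_list(values: List[int], start: Optional[int] = None,
               end: Optional[int] = None, step: Optional[int] = None) -> List[int]:
    """Slice a list where any of start/end/step may be omitted (None)."""
    # slice.indices fills in defaults for None and clamps to the list length.
    begin, stop, stride = slice(start, end, step).indices(len(values))
    return [values[i] for i in range(begin, stop, stride)]

data = list(range(6))
every_other = slice_list(data, step=2)
tail = slice_list(data, start=1)
backwards = slice_list(data, step=-1)
```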

Test Plan:
None for this PR, but new tests will be added for the second part.

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D25929902

fbshipit-source-id: 5385df04e6d61ded0699b09bbfec6691396b56c3
2021-01-19 09:30:01 -08:00
Richard Zou
1154a8594e Add instructional error message for cudnn RNN double backward workaround (#33884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33884

Mitigates https://github.com/pytorch/pytorch/issues/5261.

It's not possible for us to support cudnn RNN double backwards due to
limitations in the cudnn API. This PR makes it so that we raise an error
message if users try to get the double backward on a cudnn RNN; in the
error message we suggest using the non-cudnn RNN.

Test Plan: - added some tests to check the error message

Reviewed By: albanD

Differential Revision: D20143544

Pulled By: zou3519

fbshipit-source-id: c2e49b3d8bdb9b34b561f006150e4c7551a78fac
2021-01-19 09:05:36 -08:00
Ivan Yashchuk
f9a5ba7398 Added linalg.slogdet (#49194)
Summary:
This PR adds `torch.linalg.slogdet`.

Changes compared to the original torch.slogdet:

- Complex input now works as in NumPy
- Added out= variant (allocates temporary and makes a copy for now)
- Updated `slogdet_backward` to work with complex input
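
For complex input the sign returned by a NumPy-style `slogdet` is a complex number of unit modulus, so that `det = sign * exp(logabsdet)`. A 2x2 pure-Python sketch of that convention follows (the function name is hypothetical, and this is not the implementation added here):

```python
import math

def slogdet_2x2(m):
    """Return (sign, logabsdet) for a 2x2 matrix with real or complex entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        # Singular matrix: sign 0 and log|det| = -inf.
        return 0.0, -math.inf
    magnitude = abs(det)
    return det / magnitude, math.log(magnitude)

sign, logabsdet = slogdet_2x2([[1j, 0], [0, 2]])  # det = 2j
```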

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49194

Reviewed By: VitalyFedyunin

Differential Revision: D25916959

Pulled By: mruberry

fbshipit-source-id: cf9be8c5c044870200dcce38be48cd0d10e61a48
2021-01-19 07:28:12 -08:00
anjali411
227acc2e51 Complex autograd support for torch.{baddbmm, addbmm, addmm, addmv} (#50632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50632

I'll port the following method tests in follow-up PRs:
`'baddbmm', 'addbmm', 'addmv', 'addr'`
After the tests are ported to OpInfo based tests, it would also be much easier to add tests with complex alpha and beta values.
Edit: it seems like it's hard to port the broadcasting variant tests because one ends up skipping `test_inplace_grad` and `test_variant_consistency_eager` even for the case when inputs are not required to be broadcasted.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D25947471

Pulled By: anjali411

fbshipit-source-id: 9faa7f1fd55a1269bad282adac2b39d19bfa4591
2021-01-18 14:05:02 -08:00
Jeffrey Wan
6e3e57095c Add complex support for torch.nn.L1Loss (#49912)
Summary:
Building on top of the work of anjali411 (https://github.com/pytorch/pytorch/issues/46640)

Things added in this PR:
1. Modify backward and double-backward formulas
2. Add complex support for `new module tests` and criterion tests (and add complex tests for L1)
3. Modify some existing tests to support complex

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49912

Reviewed By: zhangguanheng66

Differential Revision: D25853036

Pulled By: soulitzer

fbshipit-source-id: df619f1b71c450ab2818eb17804e0c55990aa8ad
2021-01-15 15:53:15 -08:00
Howard Huang
ec51b67282 Fix elu backward operation for negative alpha (#49272)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47671

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49272

Test Plan:
```
import torch

x = torch.tensor([-2, -1, 0, 1, 2], dtype=torch.float32, requires_grad=True)
# In-place ELU with a negative alpha; backward through it raises the error shown below.
y = torch.nn.functional.elu_(x.clone(), alpha=-2)
grads = torch.ones_like(y)
y.backward(grads)
```

```
RuntimeError: In-place elu backward calculation is triggered with a negative slope which is not supported.
This is caused by calling in-place forward function with a negative slope, please call out-of-place
version instead.
```
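
A plausible reading of why the in-place case is rejected, sketched in plain Python: an in-place op only has its output available in backward, and with a negative `alpha` the sign of the output no longer tells you the sign of the input. The function names below are hypothetical and this is a simplification under that assumption, not the actual kernel.

```python
import math

def elu(x, alpha):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_grad_from_input(x, alpha):
    # Out-of-place case: the original input is still available.
    return 1.0 if x > 0 else alpha * math.exp(x)

def elu_grad_from_output(y, alpha):
    # In-place case: only the output y is available. For y <= 0 the identity
    # alpha * exp(x) == y + alpha holds, but choosing the branch from the
    # sign of y is only valid when alpha >= 0.
    return 1.0 if y > 0 else y + alpha

x = -1.0
y_neg = elu(x, alpha=-2.0)  # positive output from a negative input
y_pos = elu(x, alpha=2.0)   # negative output, as expected for alpha > 0
```

With `alpha=2.0` both gradient functions agree; with `alpha=-2.0` the output-based version picks the wrong branch, which is why an explicit error is the safer behavior.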

Reviewed By: albanD

Differential Revision: D25569839

Pulled By: H-Huang

fbshipit-source-id: e3c6c0c2c810261566c10c0cc184fd81b280c650
2021-01-11 12:52:52 -08:00
Nikita Vedeneev
eb87686511 svd_backward: more memory and computationally efficient. (#50109)
Summary:
As per title.

CC IvanYashchuk (unfortunately I cannot add you as a reviewer for some reason).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50109

Reviewed By: gchanan

Differential Revision: D25828536

Pulled By: albanD

fbshipit-source-id: 3791c3dd4f5c2a2917eac62e6527ecd1edcb400d
2021-01-11 05:28:43 -08:00
Antonio Cuni
b5ab0a7f78 Improve torch.linalg.qr (#50046)
Summary:
This is a follow up of PR https://github.com/pytorch/pytorch/issues/47764 to fix the remaining details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50046

Reviewed By: zou3519

Differential Revision: D25825557

Pulled By: mruberry

fbshipit-source-id: b8e335e02265e73484a99b0189e4cc042828e0a9
2021-01-08 09:52:31 -08:00
Sebastian Messmer
c7e9abb66a Making ops c10-full: list of optional tensors (#49138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138

See for details: https://fb.quip.com/QRtJAin66lPN

We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.

## Backwards Compatibility

- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`. That will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`. Another common call pattern was `tensor.index(indices_tensor)`, where previously the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`. Converting `Tensor -> optional<Tensor> -> List<optional<Tensor>>` implicitly would take two implicit conversions, and C++ doesn't allow chaining two user-defined implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.

ghstack-source-id: 119269131

Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
|  Op call                                                              |before   |after   |delta  |delta %|
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1                                                                |11566015 |11566015|0      |0.00% |
|x.index({0})                                                            |6807019  |6801019 |-6000  |-0.09%|
|x.index({0, 0})                                                         |13529019 |13557019|28000  |0.21% |
|x.index({0, 0, 0})                                                      |10677004 |10692004|15000  |0.14% |
|x.index({"..."})                                                        |5512015  |5506015 |-6000  |-0.11%|
|x.index({Slice(None, None, None)})                                      |6866016  |6936016 |70000  |1.02% |
|x.index({None})                                                         |8554015  |8548015 |-6000  |-0.07%|
|x.index({false})                                                        |22400000 |22744000|344000 |1.54% |
|x.index({true})                                                         |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|

### Autograd
#### Script
```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.

#### Results
|  Op call                                                               |before   |after    |delta   |delta % |
|------------------------------------------------------------------------|---------|---------|--------|--------|
|x.index({0})                                                            |14839019 |14833019 |-6000   |-0.04%  |
|x.index({0, 0})                                                         |28342019 |28370019 |28000   |0.10%   |
|x.index({0, 0, 0})                                                      |24434004 |24449004 |15000   |0.06%   |
|x.index({"..."})                                                        |12773015 |12767015 |-6000   |-0.05%  |
|x.index({Slice(None, None, None)})                                      |14837016 |14907016 |70000   |0.47%   |
|x.index({None})                                                         |15926015 |15920015 |-6000   |-0.04%  |
|x.index({false})                                                        |36958000 |37477000 |519000  |1.40%   |
|x.index({true})                                                         |41971408 |42426094 |454686  |1.08%   |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})     |168184392|164545682|-3638710|-2.16%  |
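
For reference, the delta columns in the tables above follow directly from the before/after instruction counts. A small sketch of that arithmetic; `instruction_delta` is a hypothetical helper, not part of `torch.utils.benchmark`:

```python
def instruction_delta(before, after):
    """Return (absolute delta, percent change rounded to two decimals)."""
    delta = after - before
    percent = 100.0 * delta / before
    return delta, round(percent, 2)

# Row `x.index({Slice(None, None, None)})` from the autograd table.
slice_row = instruction_delta(14837016, 14907016)
# Row `x.index({false})` from the forward table.
false_row = instruction_delta(22400000, 22744000)
```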

Reviewed By: bhosmer

Differential Revision: D25454632

fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
2021-01-04 05:04:02 -08:00
Jeffrey Wan
4677fc69a2 Fix inf norm grad (reland) (#48611)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/48122

Does this result in a regression? No significant regression observed.

Timer script:
```
import torch
from torch.utils.benchmark import Timer

setup="""
a = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2)
"""

stmt="""
torch.autograd.grad(torch.norm(a, dim=(0,), keepdim=False), a, gradient)
"""

timer = Timer(stmt, setup)

print(timer.timeit(10000))
print(timer.collect_callgrind(100))
```
Note: small matrix, keepdim is False, and dims is non-empty

Before change
```
Runtime   37.37 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15279045                   15141710
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

After change
```
Runtime 36.08 us
1 measurement, 10000 runs , 1 thread

                           All          Noisy symbols removed
    Instructions:     15296974                   15153534
    Baseline:             4257                       3851
100 runs per measurement, 1 thread
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48611

Reviewed By: albanD, mruberry

Differential Revision: D25309997

Pulled By: soulitzer

fbshipit-source-id: 5fb950dc9259234342985c0e84ada25a7e3814d6
2020-12-30 21:13:33 -08:00
anjali411
97c17b4772 Fix auto exponent issue for torch.pow (#49809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49809

Fixes https://github.com/pytorch/xla/issues/2688 #46936

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D25724176

Pulled By: anjali411

fbshipit-source-id: 16287a1f481e9475679b99d6fb45de840da225be
2020-12-29 17:02:56 -08:00
Antonio Cuni
361f5ed91d Implement torch.linalg.qr (#47764)
Summary:
I am opening this PR early to have a place to discuss design issues.
The biggest difference between `torch.qr` and `numpy.linalg.qr` is that the former `torch.qr` takes a boolean parameter `some=True`, while the latter takes a string parameter `mode='reduced'` which can be one of the following:

`reduced`
this is completely equivalent to `some=True`, and both are the default.

`complete`
this is completely equivalent to `some=False`.

`r`
this returns only `r` instead of a tuple `(q, r)`. We have already decided that we don't want different return types depending on the parameters, so I propose to return `(r, empty_tensor)` instead. I **think** that in this mode it will be impossible to implement the backward pass, so we should raise an appropriate error in that case.

`raw`
in this mode, it returns `(h, tau)` instead of `(q, r)`. Internally, `h` and `tau` are obtained by calling lapack's `dgeqrf` and are later used to compute the actual values of `(q, r)`. The numpy docs suggest that these might be useful to call other lapack functions, but at the moment none of them is exposed by numpy and I don't know how often it is used in the real world.
I suppose that implementing the backward pass needs some attention: the most straightforward solution is to use `(h, tau)` to compute `(q, r)` and then use the normal logic for `qr_backward`, but there might be faster alternatives.

`full`, `f`
alias for `reduced`, deprecated since numpy 1.8.0

`economic`, `e`
similar to `raw` but it returns only `h` instead of `(h, tau)`. Deprecated since numpy 1.8.0

To summarize:
  * `reduced`, `complete` and `r` are straightforward to implement.

  * `raw` needs a bit of extra care, but I don't know how high a priority it is: since it is rarely used, we might want to not support it right now and maybe implement it in the future?

  * I think we should just leave `full` and `economic` out, and possibly add a note to the docs explaining what you need to use instead
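
The mapping summarized above can be sketched as a small parsing helper. This is only an illustration of the proposal: `qr_mode_to_flags` and the flag names `some` and `compute_q` are assumptions made for this sketch, not the helper that ended up in PyTorch.

```python
def qr_mode_to_flags(mode="reduced"):
    """Map a NumPy-style `mode` string to a (some, compute_q) pair."""
    if mode == "reduced":
        return True, True    # equivalent to some=True
    if mode == "complete":
        return False, True   # equivalent to some=False
    if mode == "r":
        return True, False   # only `r` is needed, so q can be skipped
    # `raw`, `full` and `economic` are deliberately left unsupported here.
    raise ValueError(
        f"unsupported mode {mode!r}: expected 'reduced', 'complete' or 'r'"
    )
```

Rejecting the deprecated and `raw` modes with a clear error keeps the return type fixed regardless of the parameters, which is the constraint stated in the discussion.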

/cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47764

Reviewed By: ngimel

Differential Revision: D25708870

Pulled By: mruberry

fbshipit-source-id: c25c70a23a02ec4322430d636542041e766ebe1b
2020-12-28 17:28:17 -08:00
albanD
c23808d8e8 Reland: Add base forward grad logic (#49734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49734

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grad as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views
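
For readers unfamiliar with the dual-number idea behind forward grad, here is a minimal scalar sketch in plain Python. The `Dual` class and `dual_sin` are hypothetical names for illustration only; they are not the dual Tensor API added by this PR, which also has to handle levels and views.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    primal: float          # the value itself
    tangent: float = 0.0   # the forward-mode derivative carried alongside it

    def __add__(self, other):
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(
            self.primal * other.primal,
            self.tangent * other.primal + self.primal * other.tangent,
        )

def dual_sin(x):
    # Chain rule for sin: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)

# Seed the input with tangent 1.0 to get d/dx of f(x) = x * x + sin(x) at x = 2.
x = Dual(2.0, 1.0)
y = x * x + dual_sin(x)
```

After one forward evaluation, `y.tangent` holds `2 * x + cos(x)` evaluated at `x = 2`, with no backward pass involved.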

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view information shared by forward and backward. It also updates the differentiable view meta to use it, and updates the `as_view` function to handle both forward and backward views.
- New forward grad class that handles storing gradients and tracking them at each level: [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable saves forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D25678797

Pulled By: albanD

fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
2020-12-22 12:11:27 -08:00
Martin Yuan
590e7168ed [PyTorch] Remove direct reference to native symbols in sparse related non-native codes (#49721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49721

As part of the refactoring effort for per-app selective build, we are decoupling ATen/native from the rest of aten (D25413998).
All symbols of ATen/native can only be referenced through the dispatcher (https://github.com/pytorch/pytorch/issues/48684).

This diff decouples the native reference recently introduced for sparse tensors.
ghstack-source-id: 119028080

Test Plan: CI

Reviewed By: dhruvbird, ngimel

Differential Revision: D25675711

fbshipit-source-id: 381cbb3b361ee41b002055399d4996a9ca21377c
2020-12-21 22:16:20 -08:00
Walter Shen
f5178bf151 Revert D25607503: Add base forward grad logic
Test Plan: revert-hammer

Differential Revision:
D25607503 (fdf02eff3d)

Original commit changeset: f1396290de1d

fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
2020-12-21 19:56:28 -08:00
albanD
fdf02eff3d Add base forward grad logic (#49097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grad as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view information shared by forward and backward. It also updates the differentiable view meta to use it, and updates the `as_view` function to handle both forward and backward views.
- New forward grad class that handles storing gradients and tracking them at each level: [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable saves forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25607503

Pulled By: albanD

fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
2020-12-21 14:39:43 -08:00
Alexander
44ce0b8883 Sparse-sparse matrix multiplication (CPU/CUDA) (#39526)
Summary:
This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format.

The current implementation of `torch.sparse.mm` supports this configuration,
`torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this can consume a lot of memory when sparse_matrix2's shape is large.

This implementation extends `torch.sparse.mm` function to support  `torch.sparse.mm(sparse_matrix1, sparse_matrix2)`

Resolves  #[20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA.

- [x] sparse matmul
  - [x] CPU/CUDA C++ implementation
  - [x] unittests
  - [x] update torch.sparse.mm documentation
  - [x] autograd support

The CPU sparse-sparse matmul was implemented using "Sparse Matrix Multiplication Package (SMMP)" as a reference. The GPU sparse-sparse matmul is based on cuSPARSE, with separate code paths for CUSPARSE_VERSION >= 11 and for older versions of cuSPARSE. Both the CPU and CUDA implementations rely on a sparse-sparse matmul algorithm that uses the CSR index format, as it is one of the fastest algorithms.
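
To make the operation concrete, here is a minimal pure-Python sketch of a sparse-sparse product that never densifies either operand. It stores each matrix as a `{(row, col): value}` dict, which is an assumption made for brevity; it is not the SMMP or cuSPARSE algorithm used by this PR, and `coo_matmul` is a hypothetical name.

```python
from collections import defaultdict

def coo_matmul(a, b):
    """Multiply two sparse matrices stored as {(row, col): value} dicts."""
    # Group the nonzeros of b by row so each nonzero of a only visits
    # the entries of b it can actually pair with.
    b_rows = defaultdict(list)
    for (k, j), b_val in b.items():
        b_rows[k].append((j, b_val))

    out = defaultdict(float)
    for (i, k), a_val in a.items():
        for j, b_val in b_rows.get(k, ()):
            out[(i, j)] += a_val * b_val

    # Drop explicit zeros produced by cancellation.
    return {idx: v for idx, v in out.items() if v != 0.0}

a = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
b = {(0, 1): 4.0, (2, 1): 5.0, (1, 0): 6.0}
c = coo_matmul(a, b)
```

The work done is proportional to the number of matching nonzero pairs rather than to the dense shape, which is the motivation for avoiding `to_dense()` on large operands.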

Here are the latest benchmark results (script is here) for torch.sparse.mm (CUDA), torch.sparse.mm (CPU) and scipy; values are float32 scalars:

size | density | sparse.mm(CUDA) | sparse.mm(CPU) | scipy_coo_matmul
-- | -- | -- | -- | --
(32, 10000) | 0.01 | 822.7 | 79.4 | 704.1
(32, 10000) | 0.05 | 1741.1 | 402.6 | 1155.3
(32, 10000) | 0.1 | 2956.8 | 840.8 | 1885.4
(32, 10000) | 0.25 | 6417.7 | 2832.3 | 4665.2
(512, 10000) | 0.01 | 1010.2 | 3941.3 | 26937.7
(512, 10000) | 0.05 | 2216.2 | 26903.8 | 57343.7
(512, 10000) | 0.1 | 4868.4 | 87773.7 | 117477.0
(512, 10000) | 0.25 | 16639.3 | 608105.0 | 624290.4
(1024, 10000) | 0.01 | 1224.8 | 13088.1 | 110379.2
(1024, 10000) | 0.05 | 3897.5 | 94783.9 | 236541.8
(1024, 10000) | 0.1 | 10559.1 | 405312.5 | 525483.4
(1024, 10000) | 0.25 | 57456.3 | 2424337.5 | 2729318.7

A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking:

```
[------------------------- sparse.mm-backward -------------------------]
                            |   sparse.backward   |   dense.backward
 -----------------------------------------------------------------------
      (32, 10000) | 0.01    |              13.5   |           2.4
      (32, 10000) | 0.05    |              52.3   |           2.4
      (512, 10000) | 0.01   |            1016.8   |         491.5
      (512, 10000) | 0.05   |            1604.3   |         492.3
      (1024, 10000) | 0.01  |            2384.1   |        1963.7
      (1024, 10000) | 0.05  |            3965.8   |        1951.9
```

I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels.

```
[---------------------------------- matmul ---------------------------------]
                        |   0.5   |  0.7   |  0.8   |  0.9   |  0.95  |  0.98
1 threads: ------------------------------------------------------------------
  (cpu)   torch         |    5.4  |   5.4  |   5.2  |   5.3  |   5.3  |   5.4
          torch.sparse  |  122.2  |  51.9  |  27.5  |  11.4  |   4.9  |   1.8
          scipy         |  150.1  |  87.4  |  69.2  |  56.8  |  38.4  |  17.1
  (cuda)  torch         |    1.3  |   1.1  |   1.1  |   1.1  |   1.1  |   1.1
          torch.sparse  |   20.0  |   8.4  |   5.1  |   2.5  |   1.5  |   1.1

[----------------------------------- backward -----------------------------------]
                        |   0.5   |   0.7   |   0.8   |   0.9   |   0.95  |   0.98
1 threads: -----------------------------------------------------------------------
  (cpu)   torch         |   17.7  |   17.9  |   17.7  |   17.7  |   17.6  |   17.9
          torch.sparse  |  672.9  |  432.6  |  327.5  |  230.8  |  176.7  |  116.7
  (cuda)  torch         |    3.8  |    3.6  |    3.5  |    3.5  |    3.6  |    3.5
          torch.sparse  |   68.8  |   46.2  |   35.6  |   24.2  |   17.8  |   11.9

Times are in milliseconds (ms).
```

In summary, the new `sparse @ sparse` backward algorithm is the better choice because its main benefit is saving memory rather than speed. It also compares favorably with the other options tested earlier.

## **References**

1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. **Sparse GPU Kernels for Deep Learning.**  Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk)
2. Trevor Gale, Erich Elsen, Sara Hooker. **The State of Sparsity in Deep Neural Networks.** [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526

Reviewed By: mruberry

Differential Revision: D25661239

Pulled By: ngimel

fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938
2020-12-21 11:53:55 -08:00
Ivan Yashchuk
8be205ae13 Added linalg.solve (#48456)
Summary:
This PR adds `torch.linalg.solve`.

`linalg_solve_out` uses in-place operations on the provided result tensor.

I modified `apply_solve` to accept a tensor of Int instead of `std::vector`, so that we can write a function similar to `linalg_solve_out` but without the error checks and device memory synchronization.

In comparison to `torch.solve` this routine accepts 1-dimensional tensors and batches of 1-dim tensors for the right-hand-side term. `torch.solve` requires it to be at least 2-dimensional.
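
The shape difference described above can be illustrated with a minimal sketch, assuming a PyTorch build that ships `torch.linalg.solve`. The matrix and right-hand-side values are made up for illustration.

```python
import torch

# Illustrative 2x2 system; values chosen so the exact solution is [2, 3].
A = torch.tensor([[3.0, 1.0],
                  [1.0, 2.0]])
b = torch.tensor([9.0, 8.0])  # 1-dimensional right-hand side

# torch.linalg.solve takes the 1-D right-hand side directly
# and returns a 1-D solution of the same shape.
x = torch.linalg.solve(A, b)

print(x.shape)                   # same shape as b
print(torch.allclose(A @ x, b))  # residual check
```

With the older `torch.solve`, the same right-hand side would first have to be reshaped into a column, for example with `b.unsqueeze(-1)`.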

Ref. https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48456

Reviewed By: izdeby

Differential Revision: D25562222

Pulled By: mruberry

fbshipit-source-id: a9355c029e2442c2e448b6309511919631f9e43b
2020-12-21 10:11:12 -08:00
Ivan Yashchuk
f5ee619d2a Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: ngimel

Differential Revision: D25658897

Pulled By: mruberry

fbshipit-source-id: ba33ecbbea3f592238c01e62c7f193daf22a9d01
2020-12-20 14:39:31 -08:00
Mike Ruberry
f5b68e74d7 Revert D25574962: [pytorch][PR] Updated derivative rules for complex svd and pinverse
Test Plan: revert-hammer

Differential Revision:
D25574962 (9955355853)

Original commit changeset: 832b61303e88

fbshipit-source-id: d73f77f3e51b0f535dad6d21c5bebf8d41a6bfbd
2020-12-17 00:59:43 -08:00
Ryan Spring
65876d3f51 Change aten::native_layer_norm signature to match torch.layer_norm definition (#48971)
Summary:
This PR is to change the `aten::native_layer_norm` and `aten::native_layer_norm_backward` signature to match `torch.layer_norm` definition. The current definition doesn't provide enough information to the PyTorch JIT to fuse layer_norm during training.

`native_layer_norm(X, gamma, beta, M, N, eps)` =>
`native_layer_norm(input, normalized_shape, weight, bias, eps)`

`native_layer_norm_backward(dY, X, mean, rstd, gamma, M, N, grad_input_mask)` =>
`native_layer_norm_backward(dY, input, normalized_shape, mean, rstd, weight, bias, grad_input_mask)`
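
As a rough illustration of why the new signature carries more information, the sketch below recovers the old `M` and `N` arguments from `input` and `normalized_shape`. It assumes `N` is the number of normalized elements and `M` is the number of independent rows; the tensor sizes are arbitrary, and only the public `torch.nn.functional.layer_norm` is called.

```python
import math
import torch
import torch.nn.functional as F

x = torch.randn(4, 3, 5)
normalized_shape = (3, 5)

# Assumed relation to the old (M, N) arguments: N counts the elements
# that are normalized together, M counts the independent rows.
N = math.prod(normalized_shape)
M = x.numel() // N

eps = 1e-5
rows = x.reshape(M, N)
mean = rows.mean(dim=1, keepdim=True)
var = rows.var(dim=1, unbiased=False, keepdim=True)
manual = ((rows - mean) / torch.sqrt(var + eps)).reshape(x.shape)

# Public API using the (input, normalized_shape, weight, bias, eps) order.
reference = F.layer_norm(x, normalized_shape, weight=None, bias=None, eps=eps)
print(M, N, torch.allclose(manual, reference, atol=1e-5))
```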

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48971

Reviewed By: izdeby

Differential Revision: D25574070

Pulled By: ngimel

fbshipit-source-id: 23e2804295a95bda3f1ca6b41a1e4c5a3d4d31b4
2020-12-16 23:09:18 -08:00
Ivan Yashchuk
9955355853 Updated derivative rules for complex svd and pinverse (#47761)
Summary:
Updated `svd_backward` to work correctly for complex-valued inputs.
Updated `common_methods_invocations.py` to take dtype, device arguments for input construction.
Removed `test_pinverse` from `test_autograd.py`, it is replaced by entries to `common_methods_invocations.py`.
Added `svd` and `pinverse` to list of complex tests.

References for complex-valued SVD differentiation:

- https://giggleliu.github.io/2019/04/02/einsumbp.html
- https://arxiv.org/abs/1909.02659

The derived rules assume gauge invariance of loss functions, so the result would not be correct for loss functions that are not gauge invariant.
https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/

The same rule is implemented in Tensorflow and [BackwardsLinalg.jl](https://github.com/GiggleLiu/BackwardsLinalg.jl).

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47761

Reviewed By: izdeby

Differential Revision: D25574962

Pulled By: mruberry

fbshipit-source-id: 832b61303e883ad3a451b84850ccf0f36763a6f6
2020-12-16 12:32:22 -08:00
Richard Zou
f98d8c6237 Move inplace_is_vmap_compatible to BatchedTensorImpl.h (#49118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49118

I need this in the next stack up. It seems useful to have as a helper
function.

Test Plan: - run tests

Reviewed By: izdeby

Differential Revision: D25563546

Pulled By: zou3519

fbshipit-source-id: a4031fdc4b2373cc230ba3c66738d91dcade96e2
2020-12-16 11:30:13 -08:00
Peter Bell
94a3d4b083 Remove unused operator at::_fft_with_size (#48905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48905

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25480385

Pulled By: mruberry

fbshipit-source-id: 192d04a1b7e33b4e408cda8a82679c3ae3490a7d
2020-12-13 20:28:41 -08:00
Ivan Yashchuk
6c1b405a3b Updated derivative rules for complex QR decomposition (#48489)
Summary:
Updated `qr_backward` to work correctly for complex-valued inputs.
Added `torch.qr` to list of complex tests.

The previous implementation for real-valued differentiation used equation 42 from https://arxiv.org/abs/1001.1654
The current implementation is a bit simpler but the result for the real-valued input case is the same and all tests still pass.
Derivation of complex-valued QR differentiation https://giggleliu.github.io/2019/04/02/einsumbp.html

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48489

Reviewed By: bdhirsh

Differential Revision: D25272344

Pulled By: albanD

fbshipit-source-id: b53c1fca1683f4aee5f4d5ce3cab9e559170e7cf
2020-12-11 14:14:40 -08:00
Kurt Mohler
54f0556ee4 Add missing complex support for torch.norm and torch.linalg.norm (#48284)
Summary:
**BC-breaking note:**

Previously, when given a complex input, `torch.linalg.norm` and `torch.norm` would return a complex output. `torch.linalg.cond` would sometimes return a complex output and sometimes return a real output when given a complex input, depending on its `p` argument. This PR changes this behavior to match `numpy.linalg.norm` and `numpy.linalg.cond`, so that a complex input now produces an output of the corresponding real dtype.
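
A small sketch of the behavior after this change, assuming a PyTorch version that includes it. The input values are illustrative.

```python
import torch

# A complex64 vector whose 2-norm is exactly 5.
z = torch.tensor([3 + 4j, 0j], dtype=torch.complex64)
n = torch.linalg.norm(z)

print(n.dtype)   # a real dtype, not a complex one
print(n.item())
```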

**PR Summary:**

The following cases were previously unsupported for complex inputs, and this commit adds support:

- Frobenius norm
- Norm order 2 (vector and matrix)
- CUDA vector norm

Part of https://github.com/pytorch/pytorch/issues/47833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48284

Reviewed By: H-Huang

Differential Revision: D25420880

Pulled By: mruberry

fbshipit-source-id: 11f6a2f3cad57d66476d30921c3f6ab8f3cd4017
2020-12-10 10:23:45 -08:00
Peter Bell
fc0a3a1787 Improve torch.fft n-dimensional transforms (#46911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46911

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25420647

Pulled By: mruberry

fbshipit-source-id: bf7e6a2ec41f9f95ffb05c128ee0f3297e34aae2
2020-12-09 12:40:06 -08:00
Ivan Yashchuk
85121a7a0f Added CUDA support for complex input for torch.cholesky_solve (#47047)
Summary:
`torch.cholesky_solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs now.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47047

Reviewed By: ngimel

Differential Revision: D24730020

Pulled By: mruberry

fbshipit-source-id: 95402da5789c56e5a682019790985207fa28fa1f
2020-12-05 20:18:30 -08:00
Ivan Yashchuk
4ed7f36ed1 Added linalg.eigh, linalg.eigvalsh (#45526)
Summary:
This PR adds `torch.linalg.eigh`, and `torch.linalg.eigvalsh` for NumPy compatibility.
The current `torch.symeig` uses (on CPU) a different LAPACK routine than NumPy (`syev` vs `syevd`). Even though it shouldn't matter in practice, `torch.linalg.eigh` uses `syevd` (as NumPy does).
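
A minimal usage sketch of the new function, assuming a PyTorch build that provides `torch.linalg.eigh`. The symmetric matrix below is an arbitrary example with eigenvalues 1 and 3.

```python
import torch

A = torch.tensor([[2.0, 1.0],
                  [1.0, 2.0]], dtype=torch.float64)

# eigh returns the eigenvalues in ascending order together with
# the matching eigenvectors as columns.
eigenvalues, eigenvectors = torch.linalg.eigh(A)

# Rebuild A from its eigendecomposition as a sanity check.
reconstructed = eigenvectors @ torch.diag(eigenvalues) @ eigenvectors.T
print(eigenvalues)
print(torch.allclose(reconstructed, A))
```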

Ref https://github.com/pytorch/pytorch/issues/42666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45526

Reviewed By: gchanan

Differential Revision: D25022659

Pulled By: mruberry

fbshipit-source-id: 3676b77a121c4b5abdb712ad06702ac4944e900a
2020-11-22 04:57:28 -08:00
Nikita Shulga
6d0947c8cf Revert D25093315: [pytorch][PR] Fix inf norm grad
Test Plan: revert-hammer

Differential Revision:
D25093315 (ca880d77b8)

Original commit changeset: be1a7af32fe8

fbshipit-source-id: b383ec2a2c5884149b4fc7896f9d2856259794cd
2020-11-20 18:27:52 -08:00
Jeffrey Wan
ca880d77b8 Fix inf norm grad (#48122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41779

Also fixes an issue with inf norm returning small non-zero values due to usage of `numeric_limits::min`, which actually "returns the minimum positive normalized value" when applied to floating-point numbers. See https://en.cppreference.com/w/cpp/types/numeric_limits/min.
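
The pitfall can be reproduced outside C++. The sketch below is an analogy in pure Python, not the code from this PR: `sys.float_info.min` plays the role of `std::numeric_limits<double>::min()`, and `-sys.float_info.max` plays the role of `lowest()`. The helper `running_max` is hypothetical.

```python
import sys

# The smallest positive normalized double, not the most negative value.
print(sys.float_info.min > 0)

def running_max(values, init):
    """Hypothetical helper: fold max over values, starting from init."""
    result = init
    for v in values:
        result = max(result, v)
    return result

data = [0.0, -2.0, -3.0]

# Seeding with the positive "min" leaks a tiny non-zero value into the result.
wrong = running_max(data, sys.float_info.min)

# Seeding with the most negative finite value gives the true maximum.
right = running_max(data, -sys.float_info.max)

print(wrong, right)
```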

```
>>> import torch
>>> with torch.enable_grad():
...     a = torch.tensor([
...         [9., 2., 9.],
...         [-2., -3., -4.],
...         [7., 8., -9.],
...     ], requires_grad=True)
...     b = torch.norm(a, p=float('inf'))
...     b.backward()
...     print(a.grad)
...
tensor([[ 0.3333,  0.0000,  0.3333],
        [-0.0000, -0.0000, -0.0000],
        [ 0.0000,  0.0000, -0.3333]])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48122

Reviewed By: izdeby

Differential Revision: D25093315

Pulled By: soulitzer

fbshipit-source-id: be1a7af32fe8bac0df877971fd75089d33e4bd43
2020-11-20 10:22:11 -08:00
Richard Zou
370310bedb batched grad for binary_cross_entropy, symeig (#48057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48057

This PR fixes batched grad computation for:
- binary_cross_entropy (i.e., vmap through binary_cross_entropy_double_backward)
- symeig (i.e. vmap through symeig_backward)

It was previously impossible to vmap through those functions because
they use in-place operations in a vmap-incompatible way.

See note at
233192be73/aten/src/ATen/BatchedFallback.cpp (L117-L122)
for what it means for an in-place operation to be vmap-incompatible.

This PR adds a check: if the in-place operations in e.g. symeig are
vmap-incompatible and we are inside of a vmap, then we do the
out-of-place variant of the operation. Ditto for binary_cross_entropy.

This is to avoid code duplication: the alternative would be to register
the backward formula as an operator and change just those lines to be
out-of-place!

This PR also adds some general guidelines for what to do if an in-place
operation is vmap-incompatible.

General guidelines
------------------

If an in-place operation used in a backward formula is vmap-incompatible,
then as developers we have the following options:

- If the in-place operation directly followed the creation of a tensor with
  a factory function like at::zeros(...), we should replace the factory with a
  corresponding grad.new_zeros(...) call. The grad.new_zeros(...) call
  propagates the batch dims to the resulting tensor.
  For example:
    Before: at::zeros(input.sizes(), grad.options()).copy_(grad)
    After:  grad.new_zeros(input.sizes()).copy_(grad)

- If the in-place operation followed some sequence of operations, and
  we want to be able to vmap over the backward formula as-is (this is
  usually the case for simple (<15loc) backward formulas), then use
  inplace_is_vmap_compatible to guard the operation. For example:
            c = a * b
    Before: c.mul_(grad)
    After:  c = inplace_is_vmap_compatible(c, grad) ? c.mul_(grad) : c * grad

- If we don't want to vmap directly over the backward formula (e.g., if the
  backward formula is too complicated or has a lot of vmap-incompatible
  operations), then register the backward formula as an operator and eventually
  write a batching rule for it.

Test Plan
---------
New tests

Test Plan: Imported from OSS

Reviewed By: zhangguanheng66

Differential Revision: D25069525

Pulled By: zou3519

fbshipit-source-id: e0dfeb5a812f35b7579fc6ecf7252bf31ce0d790
2020-11-19 07:59:02 -08:00
Mike Ruberry
013e6a3d9d Revert D24698027: Fix auto exponent issue for torch.pow
Test Plan: revert-hammer

Differential Revision:
D24698027 (8ef7ccd669)

Original commit changeset: f23fdb65c925

fbshipit-source-id: 9a67a2c6310c9e4fdefbb421a8cd4fa41595bc9a
2020-11-15 03:58:44 -08:00
anjali411
8ef7ccd669 Fix auto exponent issue for torch.pow (#47024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47024

Fixes https://github.com/pytorch/pytorch/issues/46936

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#47024 Fix auto exponent issue for torch.pow**

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D24698027

Pulled By: anjali411

fbshipit-source-id: f23fdb65c925166243593036e08214c4f041a63d
2020-11-14 22:50:12 -08:00
Ivan Yashchuk
149190c014 Added CUDA support for complex input for torch.solve (#47045)
Summary:
`torch.solve` now works for complex inputs on GPU.
I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes.
Differentiation also works correctly with complex inputs.

Fixes https://github.com/pytorch/pytorch/issues/41084
Ref. https://github.com/pytorch/pytorch/issues/33152

anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045

Reviewed By: nikithamalgifb

Differential Revision: D24921503

Pulled By: anjali411

fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923
2020-11-12 12:22:59 -08:00
Richard Zou
57dcb04239 Batched gradient support for view+inplace operations (#47227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47227

Motivation
----------
We would like to compute batched gradients for view+inplace operations.
This most notably shows up in internal implementation of operations.
For example, many view backward functions (SelectBackward, DiagonalBackward)
are implemented with view+inplace, so to support vectorized hessian
computation for e.g. torch.select and torch.diagonal we would need a
way to handle or work around view+inplace.

Approach
--------
view+inplace creates a CopySlices node and transmutes view backward nodes
into an AsStrided node. For example,

```
leaf = torch.randn(4, 5, requires_grad=True)
base = leaf * leaf
view = base[0]
view.cos_()
```

base.grad_fn is CopySlices and view.grad_fn is AsStridedBackward.

To support vmap over CopySlices and AsStridedBackward:
- We use `new_empty_strided` instead of `empty_strided` in CopySlices
so that the batch dims get propagated
- We use `new_zeros` inside AsStridedBackward so that the batch dims get
propagated.

Test Plan
---------
- New tests. When we get closer to having most operations support batched
grad computation via vmap, I'd like to add it as an option to gradcheck
and turn it on for our tests.

Test Plan: Imported from OSS

Reviewed By: kwanmacher, glaringlee

Differential Revision: D24741687

Pulled By: zou3519

fbshipit-source-id: 8210064f782a0a7a193752029a4340e505ffb5d8
2020-11-10 07:38:02 -08:00
Erjia Guan
86bb413600 Optimize backward for torch.repeat (#46726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726

Fixes #43192

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D24739840

Pulled By: ejguan

fbshipit-source-id: ddf21fc52c4676de25ad7bfb0b5c1c23daa77ee6
2020-11-09 15:12:40 -08:00
Erjia Guan
bba5a31176 Revert D24481801: Optimize backward for torch.repeat
Test Plan: revert-hammer

Differential Revision:
D24481801 (4e6f2440d8)

Original commit changeset: 95c155e0de83

fbshipit-source-id: 0fb0afde760b0f5e17bd75df950a5d76aee5370b
2020-11-04 10:44:40 -08:00
Erjia Guan
f1ac63d324 Implement copysign (#46396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46396

Related #38349

[numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign)
- No in-place function
- No method
- Optional output
- Available: byte, char, bool, int, short, long, float, double, half
- Integral promoted to float
- Not available: float/double complex

`c = np.copysign(a, b)`
| a  | b  | c  | a.grad |
|----|----|----|--------|
| -1 | -1 | -1 |  1 |
| -0 | -1 | -0 |  0 |
|  0 | -1 | -0 |  0 |
|  1 | -1 | -1 | -1 |
| -1 | -0 | -1 |  1 |
| -0 | -0 |  0 |  0 |
|  0 | -0 |  0 |  0 |
|  1 | -0 | -1 | -1 |
| -1 |  0 |  1 | -1 |
| -0 |  0 |  0 |  0 |
|  0 |  0 |  0 |  0 |
|  1 |  0 |  1 |  1 |
| -1 |  1 |  1 | -1 |
| -0 |  1 |  0 |  0 |
|  0 |  1 |  0 |  0 |
|  1 |  1 |  1 |  1 |

This function becomes **non-differentiable** at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0.
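
The convention proposed here can be sketched in pure Python with `math.copysign`. This is an illustration of the rule, not the kernel added by the PR, and the helper name `copysign_grad_wrt_a` is hypothetical.

```python
import math

def copysign_grad_wrt_a(a: float, b: float) -> float:
    """Derivative of copysign(a, b) with respect to a, taking 0 at a == 0."""
    if a == 0.0:
        return 0.0  # chosen convention: gradient is 0 where a == 0
    # Away from zero, copysign(a, b) is either a or -a, so the slope is +1 or -1.
    return math.copysign(a, b) / a

# Note the float literals: -0.0 keeps its sign bit, the integer -0 would not.
for a, b in [(-1.0, -1.0), (1.0, -1.0), (-1.0, -0.0), (1.0, 0.0), (0.0, 1.0)]:
    print(a, b, math.copysign(a, b), copysign_grad_wrt_a(a, b))
```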

TODO:
- [x] test (cpu/gpu)
- [x] doc
- [x] ~kernel_vec~

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24401366

Pulled By: ejguan

fbshipit-source-id: 3621c5ff74b185376a3705589983bb5197ab896d
2020-11-04 08:08:57 -08:00
Erjia Guan
4e6f2440d8 Optimize backward for torch.repeat (#46726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726

Fixes #43192

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24481801

Pulled By: ejguan

fbshipit-source-id: 95c155e0de83b71f173c9135732ea84ba6399d69
2020-11-03 11:16:55 -08:00
kshitij12345
1d233d7d1f [fix] torch.nn.functional.embedding -> padding_idx behavior (#46714)
Summary:
Reference https://github.com/pytorch/pytorch/issues/46585

Fix for second snippet in the mentioned issue.
```python
predefined_weights = torch.rand(10, 3)
result = torch.nn.functional.embedding(torch.LongTensor([1,2,0]), predefined_weights, padding_idx=0)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46714

Reviewed By: VitalyFedyunin

Differential Revision: D24593352

Pulled By: albanD

fbshipit-source-id: 655b69d9ec57891871e26feeda2aa0dcff73beba
2020-10-29 13:29:00 -07:00
anjali411
d94bd998ec Update backward formulas (Re #44444) (#46275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46275

Re #44444

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24285785

Pulled By: anjali411

fbshipit-source-id: c60ecd4fe4f144132085f2c91d3b950e92b2a491
2020-10-25 19:40:59 -07:00
Richard Zou
74d81080a0 Use new_zeros in evenly_distribute_backward (#46674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46674

Summary
-------

This adds batched gradient support (i.e., vmap through the gradient
formulas) for Tensor.max(), Tensor.min(), Tensor.median()
that have evenly_distribute_backward as their backward formula.

Previously, the plan was to register incompatible gradient formulas as
backward operators (see #44052). However, it turns out that we can just use
`new_zeros` to get around some incompatible gradient formulas (see next
section for discussion).

Context: the vmap+inplace problem
---------------------------------

A lot of backwards functions are incompatible with BatchedTensor due to
using in-place operations. Sometimes we can allow the in-place
operations, but other times we can't. For example, consider select_backward:

```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes,
                       int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:

```
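import torch
from torch import vmap

B0 = 3  # illustrative batch size (any positive integer)
x = torch.randn(5, requires_grad=True)
def select_grad(v):
  # Return the gradient so that vmap can stack the per-example results.
  return torch.autograd.grad(x[0], x, v)[0]

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)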
```

For the batched gradient use case, grad is a BatchedTensor.
The physical version of grad has size (B0,).
However, select_backward creates a grad_input of shape (5), and
tries to copy grad to a slice of it.

Up until now, the proposal to handle this has been to register these
backward formulas as operators so that vmap doesn’t actually see the
`copy_` calls (see #44052). However, it turns out we can actually just
use `new_zeros` to construct a new Tensor that has the same
"batched-ness" as grad:
```
auto grad_input = grad.new_zeros(input_sizes);
grad_input.select(dim, index).copy_(grad);
```
We should use this for simple backward functions. For more complicated
backward functions where this solution doesn't work, we should register
those as operators.
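
For readers more comfortable in Python, the `new_zeros` variant above can be mirrored as a small sketch and compared against autograd on an unbatched input. The function name `select_backward_sketch` is hypothetical, and the check does not exercise vmap itself.

```python
import torch

def select_backward_sketch(grad, input_sizes, dim, index):
    # grad.new_zeros inherits dtype and device from grad (and, under vmap,
    # its batch dimensions), unlike a free-standing torch.zeros call.
    grad_input = grad.new_zeros(input_sizes)
    grad_input.select(dim, index).copy_(grad)
    return grad_input

x = torch.randn(5, dtype=torch.float64, requires_grad=True)
v = torch.tensor(2.0, dtype=torch.float64)

# Reference gradient of x[0] with respect to x, weighted by v.
(expected,) = torch.autograd.grad(x[0], x, v)
actual = select_backward_sketch(v, x.shape, 0, 0)

print(torch.equal(actual, expected))
```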

Alternatives
------------
Option 2: Register `evenly_distribute_backward` as an operator and have the
vmap fallback run it in a loop.
- This requires more LOC changes.
- Furthermore, we'd have to write an efficient batching rule for
`evenly_distribute_backward` in the future.
- If we use `new_zeros` instead, we don't need to write an efficient
batching rule for `evenly_distribute_backward` as long as the
constituents of `evenly_distribute_backward` have efficient batching rules.

Option 3: Have factory functions perform differently if they are called
inside vmap.
- For example, `at::zeros(3, 5)` could return a Tensor of shape
`(B0, B1, 3, 5)` if we are vmapping over two dimensions with size B0 and B1.
This requires maintaining some global and/or thread-local state about
the size of the dims being vmapped over which can be tricky.

And more...

Future
------
- I will undo some of the work I’ve done in the past to move backward
functions to being operators (#44052, #44408). The simpler backward
functions (like select backward) can just use Tensor.new_zeros.
I apologize for the thrashing.
- Include a NOTE about the vmap+inplace problem somewhere in the
codebase. I don't have a good idea of where to put it at the moment.

Test Plan
---------
- New tests

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D24456781

Pulled By: zou3519

fbshipit-source-id: 9c6c8ee2cb1a4e25afd779bdf0bdf5ab76b9bc20
2020-10-23 14:29:40 -07:00
Kurt Mohler
28f8372bf4 Avoid mat1 references in mm_mat1_backward (#45777)
Summary:
Avoiding references to `mat1` in `mm_mat1_backward` is a first step to solving issue https://github.com/pytorch/pytorch/issues/42371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45777

Reviewed By: malfet

Differential Revision: D24347967

Pulled By: albanD

fbshipit-source-id: f09a8149d9795481b5ed5b48fdd0e598ba027d0b
2020-10-16 13:52:44 -07:00
Ivan Yashchuk
528158af47 Updated derivatives for complex mm, mv, ger, bmm, triangular_solve (#45737)
Summary:
This PR updates derivatives for a few functions so that `gradgradcheck` for `torch.cholesky` is passed ([ref](https://github.com/pytorch/pytorch/pull/45267#discussion_r494439967)).

Some tests (that call `bmm_cuda`) fail with `RuntimeError: _th_bmm_out not supported on CUDAType for ComplexDouble`
until PR https://github.com/pytorch/pytorch/issues/42553 is merged.

Ref. https://github.com/pytorch/pytorch/issues/33152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45737

Reviewed By: bdhirsh

Differential Revision: D24279917

Pulled By: anjali411

fbshipit-source-id: 7b696d2cfc2ef714332c2e3e5d207e257be67744
2020-10-15 11:27:30 -07:00
Edward Yang
546aab66c1 Revert D24027761: Update backward definition for more operators and reenable tests in test_ops.py
Test Plan: revert-hammer

Differential Revision:
D24027761 (7d809f5d8e)

Original commit changeset: c1f707c2a039

fbshipit-source-id: 30750d2f08886036fb8b2cd0ae51c7732d3b7b19
2020-10-02 18:52:57 -07:00
anjali411
7d809f5d8e Update backward definition for more operators and reenable tests in test_ops.py (#44444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44444

This PR:
1. Fixes https://github.com/pytorch/pytorch/issues/41510. Updates backward formula for the following functions: `asin`, `acos`, `asinh`, `acosh`, `atan`, `atanh`, `div`, `log`, `log10`, `log2`, `log1p`, `pow`, `reciprocal`, `angle`.
2. Re-enables the tests in `test_ops.py`.
3. Adds dispatch for complex dtypes for `tanh_backward`.
4. Re-enables commented tests in `common_methods_invocation.py`.

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D24027761

Pulled By: anjali411

fbshipit-source-id: c1f707c2a039149a6e04bbde53ee120d9119d99a
2020-10-02 13:37:10 -07:00
Randall Hunt
ab5cf16b6c fix standard deviation gradient NaN behavior (#45468)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45468

Reviewed By: zhangguanheng66

Differential Revision: D23991064

Pulled By: albanD

fbshipit-source-id: d4274895f2dac8b2cdbd73e5276ce3df466fc341
2020-09-29 13:47:29 -07:00
anjali411
18876b5722 Update backward formula for torch.dot and add backward definition for torch.vdot (#45074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45074

TODO: Add R -> C tests in https://github.com/pytorch/pytorch/pull/44744 (blocked on some JIT changes)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23975361

Pulled By: anjali411

fbshipit-source-id: 3512bd2962b588a198bc317673bd18cc96ac823f
2020-09-29 12:52:03 -07:00
Ivan Yashchuk
f47fd0eb72 Updated cholesky_backward for complex inputs (#45267)
Summary:
Updated `cholesky_backward` to work correctly for complex input.
Note that the current implementation gives the conjugate of what JAX would return. anjali411 is that the correct thing to do?
Ref. https://github.com/pytorch/pytorch/issues/44895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45267

Reviewed By: bwasti

Differential Revision: D23975269

Pulled By: anjali411

fbshipit-source-id: 9908b0bb53c411e5ad24027ff570c4f0abd451e6
2020-09-29 11:07:32 -07:00
Heitor Schueroff de Souza
96f8755034 Fixed handling of nan for evenly_distribute_backward (#45280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280

Performance is unchanged on CPU and only 1-1.05x slower on CUDA. This change is necessary for the future nan ops, including nan(min|max|median).
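
To make the NaN issue concrete, here is a pure-Python sketch of the idea, not the ATen implementation. A plain equality mask misses NaN entries because `nan != nan`, so the mask also needs an explicit NaN match. The function name and the list-based interface are assumptions made for illustration.

```python
import math

def evenly_distribute_backward_sketch(grad, inputs, value):
    """Split grad evenly among the entries of inputs that match value."""
    def matches(x):
        # Equality alone never matches NaN, so NaN is handled explicitly.
        return x == value or (math.isnan(x) and math.isnan(value))

    mask = [matches(x) for x in inputs]
    count = sum(mask)  # assumed non-zero: value was selected from inputs
    return [grad / count if m else 0.0 for m in mask]

# Two entries tie for the max, so each receives half of the gradient.
print(evenly_distribute_backward_sketch(1.0, [1.0, 3.0, 3.0], 3.0))

# When the selected value is NaN, the NaN entries share the gradient.
nan = float("nan")
print(evenly_distribute_backward_sketch(1.0, [nan, 1.0, nan], nan))
```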

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D23908796

Pulled By: heitorschueroff

fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088
2020-09-28 15:57:02 -07:00
Brian Hirsh
439930c81b adding a beta parameter to the smooth_l1 loss fn (#44433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433

Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time

fixing some type errors, updated fn signature in a few more files

removing my usage of Scalar, making beta a double everywhere instead

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23636720

Pulled By: bdhirsh

fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
2020-09-25 16:36:28 -07:00
kshitij12345
00e704e757 [fix] torch.repeat : dim-0 backward (#45212)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45201

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45212

Reviewed By: mrshenli

Differential Revision: D23905545

Pulled By: albanD

fbshipit-source-id: c5bf9cf481c8cf3ccc1fdbfb364006b29f67dc9f
2020-09-25 07:53:00 -07:00
anjali411
58b6ab69e5 torch.sgn for complex tensors (#39955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955

resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
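
The definition above translates directly into a scalar sketch using Python's built-in `complex` type. This illustrates the formula only; it is not the tensor kernel.

```python
def sgn(z: complex) -> complex:
    """Scalar version of the stated rule: z / |z|, with sgn(0) defined as 0."""
    if z == 0:
        return 0j
    return z / abs(z)

print(sgn(3 + 4j))  # a point on the unit circle in the direction of z
print(sgn(0j))      # zero stays zero
```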

This PR doesn't test the correctness of the gradients. That will be done as part of auditing all the ops in the future, once we decide the autograd behavior (JAX vs TF) and add gradcheck.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D23460526

Pulled By: anjali411

fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
2020-09-22 08:24:53 -07:00
anjali411
9f67176b82 Complex gradcheck logic (#43208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208

This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf

More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for vector (v). Adds gradcheck logic for C -> C, C-> R, R -> C. For R -> C functions, only the real value of gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.
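
From the user's side, the changes listed above mean `torch.autograd.gradcheck` can be called on complex inputs. The sketch below assumes a PyTorch version with this support and uses only the public API. It does not reproduce the internal logic, and the choice of `torch.sin` and `torch.abs` as test functions is illustrative.

```python
import torch

torch.manual_seed(0)

# gradcheck compares analytical and numerical gradients, so use double precision.
z = torch.randn(3, dtype=torch.complex128, requires_grad=True)

# C -> C function: complex input, complex output.
ok_c_to_c = torch.autograd.gradcheck(torch.sin, (z,))

# C -> R function: complex input, real output (z is non-zero here).
ok_c_to_r = torch.autograd.gradcheck(torch.abs, (z,))

print(ok_c_to_c, ok_c_to_r)
```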

Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R->C test variants for functions, e.g., `torch.mul(complex_tensor, real_tensor)`.
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D23655088

Pulled By: anjali411

fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
2020-09-20 22:05:04 -07:00
Peter Bell
da7863f46b Add one dimensional FFTs to torch.fft namespace (#43011)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43011

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751850

Pulled By: mruberry

fbshipit-source-id: 8dc5fec75102d8809eeb85a3d347ba1b5de45b33
2020-09-19 23:32:22 -07:00
Richard Zou
69f6d94caa Register diag_backward, diagonal_backward, infinitely...gelu_backward as operators (#44422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44422

See #44052 for context.

Test Plan:
- `pytest test/test_autograd.py -v`
- `pytest test/test_nn.py -v`

Reviewed By: mrshenli

Differential Revision: D23607691

Pulled By: zou3519

fbshipit-source-id: 09fbcd66b877af4fa85fd9b2f851ed3912ce84d6
2020-09-10 18:43:18 -07:00
Richard Zou
7ff7e6cfc8 Register cummaxmin_backward, cumprod_backward as operators (#44410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410

See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605503

Pulled By: zou3519

fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
2020-09-10 18:43:15 -07:00
Richard Zou
08b431f54c Add trace_backward, masked_select_backward, and take_backward as ops (#44408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44408

See #44052 for context.

Test Plan: - `pytest test/test_autograd.py -v`

Reviewed By: mrshenli

Differential Revision: D23605504

Pulled By: zou3519

fbshipit-source-id: b9b1646d13caa6e536d08669c29bfc2ad8ff89a3
2020-09-10 18:41:07 -07:00
Richard Zou
9a5a732866 Register some backwards functions as operators (#44052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052

Summary
=======

This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but it is actually the backward function for max.dim, min.dim, etc.)

In the future, I'd like to register more backward functions as operators
so that we can write batching rules for them. Batching rules for
backward functions make it possible to compute batched gradients.
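
As a rough illustration of what "batched gradients" means here, the sketch below computes several vector-Jacobian products in one call. It assumes a recent PyTorch release where `torch.autograd.grad` accepts `is_grads_batched`; that argument is not part of this PR, and the tensors and sizes are made up for the example.

```python
import torch

# Illustrative input; the size 5 is arbitrary.
x = torch.randn(5, requires_grad=True)
y = x * 2

# Each row of `vs` is one grad_output; together they form a batch.
vs = torch.eye(5)

# Assumption: `is_grads_batched` is available (recent PyTorch releases).
# The first dimension of `vs` is treated as the batch dimension.
(jacobian,) = torch.autograd.grad(y, x, vs, is_grads_batched=True)

print(jacobian.shape)
```

With the identity matrix as the batch of grad_outputs, the result is the full Jacobian of `y` with respect to `x`.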

Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:

```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```

and consider the following code:
```
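import torch
from torch import vmap  # assumed to be available as torch.vmap

B0 = 3  # batch size (illustrative value)
x = torch.randn(5, requires_grad=True)
def select_grad(v):
    # gradient of x[0] w.r.t. x, weighted by the grad_output v
    return torch.autograd.grad(x[0], x, v)[0]

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)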
```

For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5,)` and
tries to copy `grad` into a slice of it, which does not work because
`grad_input` has no batch dimension.
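
The shape problem can be sketched on plain tensors. This is only an approximation: an ordinary tensor of size `(B0,)` stands in for the physical storage behind the BatchedTensor, `B0 = 3` is an arbitrary value, and the real failure under vmap may surface differently.

```python
import torch

B0 = 3  # illustrative batch size, not from the PR

# Stand-in for the physical tensor behind the batched `grad`.
grad = torch.randn(B0)

# What select_backward builds: a zero tensor with the input's (unbatched) shape.
grad_input = torch.zeros(5)

try:
    # select(0, 0) is a 0-dim view, so a (B0,)-shaped source cannot be copied in.
    grad_input.select(0, 0).copy_(grad)
    copy_failed = False
except RuntimeError:
    copy_failed = True

# A batch-aware result needs an extra leading dimension of size B0.
batched_grad_input = torch.zeros(B0, 5)
batched_grad_input.select(1, 0).copy_(grad)

print(copy_failed, batched_grad_input.shape)
```

The second half shows the layout a batching rule for select_backward would have to produce: one `(5,)` gradient per batch entry.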

Other approaches
================

I've considered the following:
- register select_backward as an operator (this PR)
- add a branch inside select_backward for the case where `grad` is batched.
    - this is OK, but what if more tensor extensions want to override this behavior?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
    - select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful

Test Plan
=========

- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticeable
performance overhead: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.

Test Plan: Imported from OSS

Reviewed By: ezyang, fbhuba

Differential Revision: D23481183

Pulled By: zou3519

fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
2020-09-04 08:30:39 -07:00
albanD
73f009a2aa refactor manual function definitions (#43711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43711

this makes them available in forward if needed

No change to the file content, just a copy-paste.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23454146

Pulled By: albanD

fbshipit-source-id: 6269a4aaf02ed53870fadf8b769ac960e49af195
2020-09-02 09:23:21 -07:00