Commit Graph

490 Commits

Author SHA1 Message Date
Harish Shankam
ad31aa652c Fixed the error in conv1d example (#57356)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57356

Reviewed By: albanD

Differential Revision: D28173174

Pulled By: malfet

fbshipit-source-id: 5e813306f2e2f7e0412ffaa5d147441134739e00
2021-05-06 07:02:37 -07:00
Joel Schlosser
7d2a9f2dc9 Fix instance norm input size validation + test (#56659)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45687

Fix changes the input size check for `InstanceNorm*d` to be more restrictive and correctly reject sizes with only a single spatial element, regardless of batch size, to avoid infinite variance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56659

Reviewed By: pbelevich

Differential Revision: D27948060

Pulled By: jbschlosser

fbshipit-source-id: 21cfea391a609c0774568b89fd241efea72516bb
2021-04-23 10:53:39 -07:00
M.L. Croci
1f0223d6bb Fix bug in gaussian_nll_loss (#56469)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53964. cc albanD almson

## Major changes:
- Overhauled the actual loss calculation so that the shapes are now correct (in functional.py)
- added the missing doc in nn.functional.rst

## Minor changes (in functional.py):
- I removed the previous check on whether input and target were the same shape. This is to allow for broadcasting, say when you have 10 predictions that all have the same target.
- I added some comments to explain each shape check in detail. Let me know if these should be shortened/cut.

Screenshots of updated docs attached.
Let me know what you think, thanks!

## Edit: Description of change of behaviour (affecting BC):
The backwards-compatibility is only affected for the `reduction='none'` mode. This was the source of the bug. For tensors with size (N, D), the old returned loss had size (N), as incorrect summation was happening. It will now have size (N, D) as expected.

### Example
Define input tensors, all with size (2, 3).
`input = torch.tensor([[0., 1., 3.], [2., 4., 0.]], requires_grad=True)`
`target = torch.tensor([[1., 4., 2.], [-1., 2., 3.]])`
`var = 2*torch.ones(size=(2, 3), requires_grad=True)`

Initialise loss with reduction mode 'none'. We expect the returned loss to have the same size as the input tensors, (2, 3).
`loss = torch.nn.GaussianNLLLoss(reduction='none')`

Old behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([3.7897, 6.5397], grad_fn=<MulBackward0>. This has size (2).`

New behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([[0.5966, 2.5966, 0.5966], [2.5966, 1.3466, 2.5966]], grad_fn=<MulBackward0>)`
`# This has the expected size, (2, 3).`

To recover the old behaviour, sum along all dimensions except for the 0th:
`print(loss(input, target, var).sum(dim=1))`
`# Gives tensor([3.7897, 6.5397], grad_fn=<SumBackward1>.`

![doc1](https://user-images.githubusercontent.com/26558092/115391089-f7f47b00-a1d6-11eb-8726-e4da9057aee0.png)
![doc2](https://user-images.githubusercontent.com/26558092/115391094-f925a800-a1d6-11eb-954b-afd187f42bc7.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56469

Reviewed By: jbschlosser, agolynski

Differential Revision: D27894170

Pulled By: albanD

fbshipit-source-id: 197890189c97c22109491c47f469336b5b03a23f
2021-04-22 07:43:48 -07:00
Nikita Shulga
6d7d36d255 s/“pad”/"pad"/ in files introduced by #56065 (#56618)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56618

Reviewed By: albanD

Differential Revision: D27919343

Pulled By: malfet

fbshipit-source-id: 2fac8ba5f399e050463141eba225da935c97a5ce
2021-04-21 17:40:29 -07:00
Joel Schlosser
8a81c4dc27 Update padding_idx docs for EmbeddingBag to better match Embedding's (#56065)
Summary:
Match updated `Embedding` docs from https://github.com/pytorch/pytorch/pull/54026 as closely as possible. Additionally, update the C++ side `Embedding` docs, since those were missed in the previous PR.

There are 6 (!) places for docs:
1. Python module form in `sparse.py` - includes an additional line about newly constructed `Embedding`s / `EmbeddingBag`s
2. Python `from_pretrained()` in `sparse.py` (refers back to module docs)
3. Python functional form in `functional.py`
4. C++ module options - includes an additional line about newly constructed `Embedding`s / `EmbeddingBag`s
5. C++ `from_pretrained()` options
6. C++ functional options

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56065

Reviewed By: malfet

Differential Revision: D27908383

Pulled By: jbschlosser

fbshipit-source-id: c5891fed1c9d33b4b8cd63500a14c1a77d92cc78
2021-04-21 12:10:37 -07:00
Sam Estep
e3900d2ba5 Add lint for unqualified noqa (#56272)
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.

Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27:            print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28:            print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:

- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
  ```
  test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
  test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
  ```

I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2365189927

Reviewed By: janeyx99

Differential Revision: D27830127

Pulled By: samestep

fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
2021-04-19 13:16:18 -07:00
Kurt Mohler
3fe4718d16 Add padding_idx argument to EmbeddingBag (#49237)
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.

This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.

Fixes https://github.com/pytorch/pytorch/issues/3194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237

Reviewed By: walterddr, VitalyFedyunin

Differential Revision: D26948258

Pulled By: jbschlosser

fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
2021-04-14 09:38:01 -07:00
Jeff Yang
263d8ef4ef docs: fix formatting for embedding_bag (#54666)
Summary:
fixes https://github.com/pytorch/pytorch/issues/43499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54666

Reviewed By: H-Huang

Differential Revision: D27411027

Pulled By: jbschlosser

fbshipit-source-id: a84cc174155bd725e108d8f953a21bb8de8d9d23
2021-04-07 06:32:16 -07:00
Hameer Abbasi
db3a9d7f8a Fix __torch_function__ tests. (#54492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54492

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27292567

Pulled By: ezyang

fbshipit-source-id: dc29daea967c6d8aaf63bdbcb4aff0bb13d7a5f7
2021-03-26 10:59:15 -07:00
Bel H
645119eaef Lowering NLLLoss/CrossEntropyLoss to ATen code (#53789)
Summary:
* Lowering NLLLoss/CrossEntropyLoss to ATen dispatch
* This allows the MLC device to override these ops
* Reduce code duplication between the Python and C++ APIs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53789

Reviewed By: ailzhang

Differential Revision: D27345793

Pulled By: albanD

fbshipit-source-id: 99c0d617ed5e7ee8f27f7a495a25ab4158d9aad6
2021-03-26 07:31:08 -07:00
Edward Yang
33b95c6bac Add __torch_function__ support for torch.nn.functional.embedding (#54478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54478

Fixes #54292

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D27264179

Pulled By: ezyang

fbshipit-source-id: cd267e2e668fdd8d7f958bf70a0b93e058ec7c23
2021-03-23 17:22:39 -07:00
Peter Bell
04e0cbf5a9 Add padding='same' mode to conv{1,2,3}d (#45667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667

First part of #3867 (Pooling operators still to do)

This adds a `padding='same'` mode to the interface of `conv{n}d`and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.

Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.

A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even  and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D27170744

Pulled By: jbschlosser

fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
2021-03-18 16:22:03 -07:00
kshitij12345
c1a39620b8 [nn] nn.Embedding : padding_idx doc update (#53809)
Summary:
Follow-up of https://github.com/pytorch/pytorch/pull/53447

Reference: https://github.com/pytorch/pytorch/pull/53447#discussion_r590521051

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53809

Reviewed By: bdhirsh

Differential Revision: D27049643

Pulled By: jbschlosser

fbshipit-source-id: 623a2a254783b86391dc2b0777b688506adb4c0e
2021-03-15 11:54:51 -07:00
kshitij12345
45ddf113c9 [fix] nn.Embedding: allow changing the padding vector (#53447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53368

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53447

Reviewed By: albanD

Differential Revision: D26946284

Pulled By: jbschlosser

fbshipit-source-id: 54e5eec7da86fa02b1b6e4a235d66976a80764fc
2021-03-10 09:53:27 -08:00
James Reed
215950e2be Convert type annotations in nn/functional.py to py3 syntax (#53656)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53656

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D26926018

Pulled By: jamesr66a

fbshipit-source-id: 2381583cf93c9c9d0c9eeaa6e41eddce3729942d
2021-03-09 22:26:22 -08:00
Evelyn Fitzgerald
b4395b046a Edit SiLU documentation (#53239)
Summary:
I edited the documentation for `nn.SiLU` and `F.silu` to:
- Explain that SiLU is also known as swish and that it stands for "Sigmoid Linear Unit."
- Ensure that "SiLU" is correctly capitalized.

I believe these changes will help users find the function they're looking for by adding relevant keywords to the docs.

Fixes: N/A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53239

Reviewed By: jbschlosser

Differential Revision: D26816998

Pulled By: albanD

fbshipit-source-id: b4e9976e6b7e88686e3fa7061c0e9b693bd6d198
2021-03-04 12:51:25 -08:00
Joel Schlosser
e86476f736 Huber loss (#50553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48595.

## Background

This PR implements HuberLoss, which differs from SmoothL1Loss by a factor of beta. The current implementation does not share logic between the two. Feedback is welcome for the optimal way to minimize code duplication while remaining performant.

I've done some early [benchmarking](https://pytorch.org/tutorials/recipes/recipes/benchmark.html#collecting-instruction-counts-with-callgrind) with Huber calling in to the Smooth L1 kernel and scaling afterwards; for the simple test case I used, instruction counts are as follows:
```
Huber loss calls dedicated Huber kernel: 2,795,300
Huber loss calls Smooth L1 kernel and scales afterwards: 4,523,612
```
With these numbers, instruction counts are ~62% higher when using the pre-existing Smooth L1 kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50553

Test Plan:
```
python test/test_nn.py TestNN.test_HuberLoss
python test/test_nn.py TestNN.test_HuberLoss_delta
python test/test_nn.py TestNN.test_huber_loss_invalid_delta
python test/test_nn.py TestNNDeviceTypeCPU.test_smooth_l1_loss_vs_huber_loss_cpu
python test/test_nn.py TestNNDeviceTypeCUDA.test_smooth_l1_loss_vs_huber_loss_cuda
python test/test_nn.py TestNNDeviceTypeCPU.test_invalid_reduction_strings_cpu
python test/test_nn.py TestNNDeviceTypeCUDA.test_invalid_reduction_strings_cuda
python test/test_nn.py TestNN.test_loss_equal_input_target_shape
python test/test_nn.py TestNN.test_pointwise_loss_broadcast
python test/test_overrides.py
python test/test_jit.py TestJitGeneratedFunctional.test_nn_huber_loss
python test/test_type_hints.py
python test/test_cpp_api_parity.py
build/bin/test_api
```

## Documentation
<img width="677" alt="Screen Shot 2021-01-14 at 4 25 08 PM" src="https://user-images.githubusercontent.com/75754324/104651224-5a445980-5685-11eb-884b-14ea517958c2.png">
<img width="677" alt="Screen Shot 2021-01-14 at 4 24 35 PM" src="https://user-images.githubusercontent.com/75754324/104651190-4e589780-5685-11eb-974d-8c63a89c050e.png">
<img width="661" alt="Screen Shot 2021-01-14 at 4 24 45 PM" src="https://user-images.githubusercontent.com/75754324/104651198-50225b00-5685-11eb-958e-136b36f6f8a8.png">
<img width="869" alt="Screen Shot 2021-01-14 at 4 25 27 PM" src="https://user-images.githubusercontent.com/75754324/104651208-53b5e200-5685-11eb-9fe4-5ff433aa13c5.png">
<img width="862" alt="Screen Shot 2021-01-14 at 4 25 48 PM" src="https://user-images.githubusercontent.com/75754324/104651209-53b5e200-5685-11eb-8051-b0cfddcb07d3.png">

Reviewed By: H-Huang

Differential Revision: D26734071

Pulled By: jbschlosser

fbshipit-source-id: c98c1b5f32a16f7a2a4e04bdce678080eceed5d5
2021-03-02 17:30:45 -08:00
Jeff Yang
316eabe9ba fix(docs): remove redundant hardsigmoid() in docstring to show up inplace parameter (#52559)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50016

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52559

Reviewed By: ailzhang

Differential Revision: D26636347

Pulled By: vkuzo

fbshipit-source-id: da615d0eb6372637a6441e53698e86252591f6d8
2021-02-25 09:09:32 -08:00
Bel H
99a428ab22 Lower ReLu6 to aten (#52723)
Summary:
-Lower Relu6 to ATen
-Change Python and C++ to reflect change
-adds an entry in native_functions.yaml for that new function
-this is needed as we would like to intercept ReLU6 at a higher level with an XLA-approach codegen.
-Should pass functional C++ tests pass. But please let me know if more tests are required.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52723

Reviewed By: ailzhang

Differential Revision: D26641414

Pulled By: albanD

fbshipit-source-id: dacfc70a236c4313f95901524f5f021503f6a60f
2021-02-25 08:38:11 -08:00
Jeff Yang
f111ec48c1 docs: add fractional_max_pool in nn.functional (#52557)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51708

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52557

Reviewed By: bdhirsh

Differential Revision: D26591388

Pulled By: jbschlosser

fbshipit-source-id: 42643864df92ea014e69a8ec5c29333735e98898
2021-02-22 20:45:07 -08:00
Joel Schlosser
a39b1c42c1 MHA: Fix regression and apply bias flag to both in/out proj (#52537)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52257

## Background
Reverts MHA behavior for `bias` flag to that of v1.5: flag enables or disables both in and out projection biases.

Updates type annotations for both in and out projections biases from `Tensor` to `Optional[Tensor]` for `torch.jit.script` usage.

Note: With this change, `_LinearWithBias` defined in `torch/nn/modules/linear.py` is no longer utilized. Completely removing it would require updates to quantization logic in the following files:
```
test/quantization/test_quantized_module.py
torch/nn/quantizable/modules/activation.py
torch/nn/quantized/dynamic/modules/linear.py
torch/nn/quantized/modules/linear.py
torch/quantization/quantization_mappings.py
```
This PR takes a conservative initial approach and leaves these files unchanged.

**Is it safe to fully remove `_LinearWithBias`?**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52537

Test Plan:
```
python test/test_nn.py TestNN.test_multihead_attn_no_bias
```

## BC-Breaking Note
In v1.6, the behavior of `MultiheadAttention`'s `bias` flag was incorrectly changed to affect only the in projection layer. That is, setting `bias=False` would fail to disable the bias for the out projection layer. This regression has been fixed, and the `bias` flag now correctly applies to both the in and out projection layers.

Reviewed By: bdhirsh

Differential Revision: D26583639

Pulled By: jbschlosser

fbshipit-source-id: b805f3a052628efb28b89377a41e06f71747ac5b
2021-02-22 14:47:12 -08:00
Mike Ruberry
594a66d778 Warn about floor_divide performing incorrect rounding (#50281) (#50281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745

Test Plan: Imported from OSS

Reviewed By: ngimel

Pulled By: mruberry

Differential Revision: D26257855

fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b
2021-02-10 03:13:34 -08:00
Brandon Lin
35b3e16091 [pytorch] Fix torch.nn.functional.normalize to be properly scriptable (#51909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51909

Several scenarios don't work when trying to script `F.normalize`, notably when you try to symbolically trace through it with using the default argument:

```
import torch.nn.functional as F
import torch
from torch.fx import symbolic_trace

def f(x):
    return F.normalize(x)

gm = symbolic_trace(f)
torch.jit.script(gm)
```
which leads to the error
```
RuntimeError:

normalize(Tensor input, float p=2., int dim=1, float eps=9.9999999999999998e-13, Tensor? out=None) -> (Tensor):
Expected a value of type 'float' for argument 'p' but instead found type 'int'.
:
def forward(self, x):
    normalize_1 = torch.nn.functional.normalize(x, p = 2, dim = 1, eps = 1e-12, out = None);  x = None
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return normalize_1

Reviewed By: jamesr66a

Differential Revision: D26324308

fbshipit-source-id: 30dd944a6011795d17164f2c746068daac570cea
2021-02-09 07:26:57 -08:00
jiej
4d703d040b Linear autodiff revert revert (#51613)
Summary:
patch PR https://github.com/pytorch/pytorch/issues/50856 and rollbak the revert D26105797 (e488e3c443)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51613

Reviewed By: mruberry

Differential Revision: D26253999

Pulled By: ngimel

fbshipit-source-id: a20b1591de06dd277e4cd95542e3291a2f5a252c
2021-02-04 16:32:05 -08:00
Natalia Gimelshein
26f9ac98e5 Revert D26105797: [pytorch][PR] Exposing linear layer to fuser
Test Plan: revert-hammer

Differential Revision:
D26105797 (e488e3c443)

Original commit changeset: 6f7cedb9f6e3

fbshipit-source-id: f0858cefed76d726e9dba61e51e1eaf2af4c99c5
2021-02-02 17:39:17 -08:00
jiej
e488e3c443 Exposing linear layer to fuser (#50856)
Summary:
1. enabling linear in autodiff;
2. remove control flow in python for linear;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50856

Reviewed By: pbelevich

Differential Revision: D26105797

Pulled By: eellison

fbshipit-source-id: 6f7cedb9f6e3e46daa24223d2a6080880498deb4
2021-02-02 15:39:01 -08:00
M.L. Croci
8eb90d4865 Add Gaussian NLL Loss (#50886)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48520.

cc albanD (This is a clean retry PR https://github.com/pytorch/pytorch/issues/49807)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50886

Reviewed By: ejguan

Differential Revision: D26007435

Pulled By: albanD

fbshipit-source-id: 88fe91b40dea6f72e093e6301f0f04fcc842d2f0
2021-01-22 06:56:49 -08:00
Taylor Robie
6a3fc0c21c Treat has_torch_function and object_has_torch_function as static False when scripting (#48966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48966

This PR lets us skip the `if not torch.jit.is_scripting():` guards on `functional` and `nn.functional` by directly registering `has_torch_function` and `object_has_torch_function` to the JIT as statically False.

**Benchmarks**

The benchmark script is kind of long. The reason is that it's testing all four PRs in the stack, plus threading and subprocessing so that the benchmark can utilize multiple cores while still collecting good numbers. Both wall times and instruction counts were collected. This stack changes dozens of operators / functions, but very mechanically such that there are only a handful of codepath changes. Each row is a slightly different code path (e.g. testing in Python, testing in the arg parser, different input types, etc.)

<details>

<summary> Test script </summary>

```
import argparse
import multiprocessing
import multiprocessing.dummy
import os
import pickle
import queue
import random
import sys
import subprocess
import tempfile
import time

import torch
from torch.utils.benchmark import Timer, Compare, Measurement

NUM_CORES = multiprocessing.cpu_count()
ENVS = {
    "ref": "HEAD (current)",
    "torch_fn_overhead_stack_0": "#48963",
    "torch_fn_overhead_stack_1": "#48964",
    "torch_fn_overhead_stack_2": "#48965",
    "torch_fn_overhead_stack_3": "#48966",
}

CALLGRIND_ENVS = tuple(ENVS.keys())

MIN_RUN_TIME = 3
REPLICATES = {
    "longer": 1_000,
    "long": 300,
    "short": 50,
}

CALLGRIND_NUMBER = {
    "overnight": 500_000,
    "long": 250_000,
    "short": 10_000,
}

CALLGRIND_TIMEOUT = {
    "overnight": 800,
    "long": 400,
    "short": 100,
}

SETUP = """
    x = torch.ones((1, 1))
    y = torch.ones((1, 1))
    w_tensor = torch.ones((1, 1), requires_grad=True)
    linear = torch.nn.Linear(1, 1, bias=False)
    linear_w = linear.weight
"""

TASKS = {
    "C++: unary                 `.t()`": "w_tensor.t()",
    "C++: unary  (Parameter)    `.t()`": "linear_w.t()",
    "C++: binary (Parameter)    `mul` ": "x + linear_w",
    "tensor.py: _wrap_type_error_to_not_implemented `__floordiv__`": "x // y",
    "tensor.py: method          `__hash__`": "hash(x)",
    "Python scalar              `__rsub__`": "1 - x",
    "functional.py: (unary)     `unique`": "torch.functional.unique(x)",
    "functional.py: (args)      `atleast_1d`": "torch.functional.atleast_1d((x, y))",
    "nn/functional.py: (unary)  `relu`": "torch.nn.functional.relu(x)",
    "nn/functional.py: (args)   `linear`": "torch.nn.functional.linear(x, w_tensor)",
    "nn/functional.py: (args)   `linear (Parameter)`": "torch.nn.functional.linear(x, linear_w)",
    "Linear(..., bias=False)": "linear(x)",
}

def _worker_main(argv, fn):
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_file", type=str)
    parser.add_argument("--single_task", type=int, default=None)
    parser.add_argument("--length", type=str)
    args = parser.parse_args(argv)
    single_task = args.single_task

    conda_prefix = os.getenv("CONDA_PREFIX")
    assert torch.__file__.startswith(conda_prefix)

    env = os.path.split(conda_prefix)[1]
    assert env in ENVS

    results = []
    for i, (k, stmt) in enumerate(TASKS.items()):
        if single_task is not None and single_task != i:
            continue

        timer = Timer(
            stmt=stmt,
            setup=SETUP,
            sub_label=k,
            description=ENVS[env],
        )
        results.append(fn(timer, args.length))

    with open(args.output_file, "wb") as f:
        pickle.dump(results, f)

def worker_main(argv):
    _worker_main(
        argv,
        lambda timer, _: timer.blocked_autorange(min_run_time=MIN_RUN_TIME)
    )

def callgrind_worker_main(argv):
    _worker_main(
        argv,
        lambda timer, length: timer.collect_callgrind(number=CALLGRIND_NUMBER[length], collect_baseline=False))

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--long", action="store_true")
    parser.add_argument("--longer", action="store_true")
    args = parser.parse_args(argv)

    if args.longer:
        length = "longer"
    elif args.long:
        length = "long"
    else:
        length = "short"
    replicates = REPLICATES[length]

    num_workers = int(NUM_CORES // 2)
    tasks = list(ENVS.keys()) * replicates
    random.shuffle(tasks)
    task_queue = queue.Queue()
    for _ in range(replicates):
        envs = list(ENVS.keys())
        random.shuffle(envs)
        for e in envs:
            task_queue.put((e, None))

    callgrind_task_queue = queue.Queue()
    for e in CALLGRIND_ENVS:
        for i, _ in enumerate(TASKS):
            callgrind_task_queue.put((e, i))

    results = []
    callgrind_results = []

    def map_fn(worker_id):
        # Adjacent cores often share cache and maxing out a machine can distort
        # timings so we space them out.
        callgrind_cores = f"{worker_id * 2}-{worker_id * 2 + 1}"
        time_cores = str(worker_id * 2)
        _, output_file = tempfile.mkstemp(suffix=".pkl")
        try:
            loop_tasks = (
                # Callgrind is long running, and then the workers can help with
                # timing after they finish collecting counts.
                (callgrind_task_queue, callgrind_results, "callgrind_worker", callgrind_cores, CALLGRIND_TIMEOUT[length]),
                (task_queue, results, "worker", time_cores, None))

            for queue_i, results_i, mode_i, cores, timeout in loop_tasks:
                while True:
                    try:
                        env, task_i = queue_i.get_nowait()
                    except queue.Empty:
                        break

                    remaining_attempts = 3
                    while True:
                        try:
                            subprocess.run(
                                " ".join([
                                    "source", "activate", env, "&&",
                                    "taskset", "--cpu-list", cores,
                                    "python", os.path.abspath(__file__),
                                    "--mode", mode_i,
                                    "--length", length,
                                    "--output_file", output_file
                                ] + ([] if task_i is None else ["--single_task", str(task_i)])),
                                shell=True,
                                check=True,
                                timeout=timeout,
                            )
                            break

                        except subprocess.TimeoutExpired:
                            # Sometimes Valgrind will hang if there are too many
                            # concurrent runs.
                            remaining_attempts -= 1
                            if not remaining_attempts:
                                print("Too many failed attempts.")
                                raise
                            print(f"Timeout after {timeout} sec. Retrying.")

                    # We don't need a lock, as the GIL is enough.
                    with open(output_file, "rb") as f:
                        results_i.extend(pickle.load(f))

        finally:
            os.remove(output_file)

    with multiprocessing.dummy.Pool(num_workers) as pool:
        st, st_estimate, eta, n_total = time.time(), None, "", len(tasks) * len(TASKS)
        map_job = pool.map_async(map_fn, range(num_workers))
        while not map_job.ready():
            n_complete = len(results)
            if n_complete and len(callgrind_results):
                if st_estimate is None:
                    st_estimate = time.time()
                else:
                    sec_per_element = (time.time() - st_estimate) / n_complete
                    n_remaining = n_total - n_complete
                    eta = f"ETA: {n_remaining * sec_per_element:.0f} sec"

            print(
                f"\r{n_complete} / {n_total}  "
                f"({len(callgrind_results)} / {len(CALLGRIND_ENVS) * len(TASKS)})   "
                f"{eta}".ljust(40), end="")
            sys.stdout.flush()
            time.sleep(2)
    total_time = int(time.time() - st)
    print(f"\nTotal time: {int(total_time // 60)} min, {total_time % 60} sec")

    desc_to_ind = {k: i for i, k in enumerate(ENVS.values())}
    results.sort(key=lambda r: desc_to_ind[r.description])

    # TODO: Compare should be richer and more modular.
    compare = Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)

    # Manually add master vs. overall relative delta t.
    merged_results = {
        (r.description, r.sub_label): r
        for r in Measurement.merge(results)
    }

    cmp_lines = str(compare).splitlines(False)
    print(cmp_lines[0][:-1] + "-" * 15 + "]")
    print(f"{cmp_lines[1]} |{'':>10}\u0394t")
    print(cmp_lines[2] + "-" * 15)
    for l, t in zip(cmp_lines[3:3 + len(TASKS)], TASKS.keys()):
        assert l.strip().startswith(t)
        t0 = merged_results[(ENVS["ref"], t)].median
        t1 = merged_results[(ENVS["torch_fn_overhead_stack_3"], t)].median
        print(f"{l} |{'':>5}{(t1 / t0 - 1) * 100:>6.1f}%")
    print("\n".join(cmp_lines[3 + len(TASKS):]))

    counts_dict = {
        (r.task_spec.description, r.task_spec.sub_label): r.counts(denoise=True)
        for r in callgrind_results
    }

    def rel_diff(x, x0):
        return f"{(x / x0 - 1) * 100:>6.1f}%"

    task_pad = max(len(t) for t in TASKS)
    print(f"\n\nInstruction % change (relative to `{CALLGRIND_ENVS[0]}`)")
    print(" " * (task_pad + 8)  + (" " * 7).join([ENVS[env] for env in CALLGRIND_ENVS[1:]]))
    for t in TASKS:
        values = [counts_dict[(ENVS[env], t)] for env in CALLGRIND_ENVS]

        print(t.ljust(task_pad + 3) + "  ".join([
            rel_diff(v, values[0]).rjust(len(ENVS[env]) + 5)
            for v, env in zip(values[1:], CALLGRIND_ENVS[1:])]))

        print("\033[4m" + "    Instructions per invocation".ljust(task_pad + 3) + "  ".join([
            f"{v // CALLGRIND_NUMBER[length]:.0f}".rjust(len(ENVS[env]) + 5)
            for v, env in zip(values[1:], CALLGRIND_ENVS[1:])]) + "\033[0m")
        print()

    import pdb
    pdb.set_trace()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str, choices=("main", "worker", "callgrind_worker"), default="main")
    args, remaining = parser.parse_known_args()

    if args.mode == "main":
        main(remaining)

    elif args.mode == "callgrind_worker":
        callgrind_worker_main(remaining)

    else:
        worker_main(remaining)

```

</details>

**Wall time**
<img width="1178" alt="Screen Shot 2020-12-12 at 12 28 13 PM" src="https://user-images.githubusercontent.com/13089297/101994419-284f6a00-3c77-11eb-8dc8-4f69a890302e.png">

<details>

<summary> Longer run (`python test.py --long`) is basically identical. </summary>

<img width="1184" alt="Screen Shot 2020-12-12 at 5 02 47 PM" src="https://user-images.githubusercontent.com/13089297/102000425-2350e180-3c9c-11eb-999e-a95b37e9ef54.png">

</details>

**Callgrind**
<img width="936" alt="Screen Shot 2020-12-12 at 12 28 54 PM" src="https://user-images.githubusercontent.com/13089297/101994421-2e454b00-3c77-11eb-9cd3-8cde550f536e.png">

Test Plan: existing unit tests.

Reviewed By: ezyang

Differential Revision: D25590731

Pulled By: robieta

fbshipit-source-id: fe05305ff22b0e34ced44b60f2e9f07907a099dd
2021-01-10 19:23:38 -08:00
Taylor Robie
d31a760be4 move has_torch_function to C++, and make a special case object_has_torch_function (#48965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48965

This PR pulls `__torch_function__` checking entirely into C++, and adds a special `object_has_torch_function` method for ops which only have one arg as this lets us skip tuple construction and unpacking. We can now also do away with the Python side fast bailout for `Tensor` (e.g. `if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors)`) because they're actually slower than checking with the Python C API.

Test Plan: Existing unit tests. Benchmarks are in #48966

Reviewed By: ezyang

Differential Revision: D25590732

Pulled By: robieta

fbshipit-source-id: 6bd74788f06cdd673f3a2db898143d18c577eb42
2021-01-10 19:23:35 -08:00
Richard Barnes
2bceee785f Clean up simple type annotations in nn/functional.py (#50106)
Summary:
Also reformats code to pass linters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50106

Test Plan: Sandcastle tests

Reviewed By: xush6528

Differential Revision: D25787566

fbshipit-source-id: 39c86b4021e279f92f8ccf30252a6cfae1063c3c
2021-01-07 15:33:40 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Joel Schlosser
68d438c9da Add PixelUnshuffle (#49334)
Summary:
Adds an implementation of `torch.nn.PixelUnshuffle` as the inverse operation of `torch.nn.PixelShuffle`. This addresses https://github.com/pytorch/pytorch/issues/2456

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49334

Test Plan:
```
# Unit tests.
python test/test_nn.py TestNN.test_pixel_shuffle_unshuffle

# Module test.
python test/test_nn.py TestNN.test_PixelUnshuffle

# C++ API tests.
build/bin/test_api

# C++ / python parity tests.
python test/test_cpp_api_parity.py

# JIT test.
python test/test_jit.py TestJitGeneratedFunctional.test_nn_pixel_unshuffle

# Override tests.
python test/test_overrides.py

# Type hint tests.
python test/test_type_hints.py
```

Screenshots of rendered docs:
<img width="876" alt="Screen Shot 2020-12-18 at 12 19 05 PM" src="https://user-images.githubusercontent.com/75754324/102642255-6b07bb00-412b-11eb-88fa-e53e7e8ba720.png">
<img width="984" alt="Screen Shot 2020-12-18 at 12 19 26 PM" src="https://user-images.githubusercontent.com/75754324/102642276-70fd9c00-412b-11eb-8548-445082a2db02.png">
<img width="932" alt="Screen Shot 2020-12-18 at 12 19 34 PM" src="https://user-images.githubusercontent.com/75754324/102642704-19abfb80-412c-11eb-9546-95bdd1c3cf22.png">
<img width="876" alt="Screen Shot 2020-12-22 at 12 51 36 PM" src="https://user-images.githubusercontent.com/75754324/102918259-986aa680-4454-11eb-99e7-a0b4c8b3e283.png">
<img width="869" alt="Screen Shot 2020-12-22 at 12 51 44 PM" src="https://user-images.githubusercontent.com/75754324/102918274-9ef91e00-4454-11eb-94bb-91b58aff47d3.png">

Reviewed By: mruberry

Differential Revision: D25401439

Pulled By: jbschlosser

fbshipit-source-id: 209d92ce7295e51699e83616d0c62170a7ce75c8
2020-12-22 20:14:55 -08:00
Guanheng Zhang
e2b4c63dd9 Enable the faster combined weight branch in MHA when query/key/value is same object with nan (#48126)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47979

For MHA module, it is preferred to use the combined weight branch as much as possible when query/key/value are same (in case of same values by `torch.equal` or exactly same object by `is` ops). This PR will enable the faster branch when a single object with `nan` is passed to MHA.

For the background knowledge
```
import torch
a = torch.tensor([float('NaN'), 1, float('NaN'), 2, 3])
print(a is a) # True
print(torch.equal(a, a)) # False
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48126

Reviewed By: gchanan

Differential Revision: D25042082

Pulled By: zhangguanheng66

fbshipit-source-id: 6bb17a520e176ddbb326ddf30ee091a84fcbbf27
2020-11-18 08:24:41 -08:00
Qi Zhou
0ec717c830 Support int32 indices and offsets in nn.EmbeddingBag (#46758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758

It's in general helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, here we enforce that these two must have the same type.

Test Plan: unit tests

Reviewed By: ngimel

Differential Revision: D24470808

fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
2020-11-03 23:33:50 -08:00
pomelyu
f41f3e3cd1 Implement bicubic grid sampler (#44780)
Summary:
Fix https://github.com/pytorch/pytorch/issues/44601

I added bicubic grid sampler in both cpu and cuda side, but haven't in AVX2

There is a [colab notebook](https://colab.research.google.com/drive/1mIh6TLLj5WWM_NcmKDRvY5Gltbb781oU?usp=sharing) show some test results. The notebook use bilinear for test, since I could only use distributed version of pytorch in it. You could just download it and modify the `mode_torch=bicubic` to show the results.

There are some duplicate code about getting and setting values, since the helper function used in bilinear at first clip the coordinate beyond boundary, and then get or set the value. However, in bicubic, there are more points should be consider. I could refactor that part after making sure the overall calculation are correct.

Thanks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44780

Reviewed By: mrshenli

Differential Revision: D24681114

Pulled By: mruberry

fbshipit-source-id: d39c8715e2093a5a5906cb0ef040d62bde578567
2020-11-03 15:34:59 -08:00
Ollin Boer Bohan
ac4ee0ef5d Fix typo in docs for interpolate (#46589)
Summary:
Removes a spurious backtick in [the docs for `torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=grid_sample#torch.nn.functional.interpolate)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46589

Reviewed By: zou3519

Differential Revision: D24422550

Pulled By: ezyang

fbshipit-source-id: c1e6b7de4584b2a3f68b458801a33b3fc71c1944
2020-10-21 11:31:53 -07:00
n-v-k
64b0686986 Expose ChannelShuffle (#46000)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45999
Also small fix for caffe2 counterpart

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46000

Reviewed By: mruberry

Differential Revision: D24185855

Pulled By: ngimel

fbshipit-source-id: c5d599bb8100b86b81c6901f1b8b8baefc12cb16
2020-10-08 16:00:01 -07:00
Natalia Gimelshein
52f2db752d unify reproducibility notes (#45748)
Summary:
Many of our functions contain same warnings about results reproducibility. Make them use common template.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45748

Reviewed By: colesbury

Differential Revision: D24089114

Pulled By: ngimel

fbshipit-source-id: e6aa4ce6082f6e0f4ce2713c2bf1864ee1c3712a
2020-10-08 02:14:57 -07:00
Ansley Ussery
7726754e70 Add function signature for pixel_shuffle (#45661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45661

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078627

Pulled By: ansleyadelaide

fbshipit-source-id: 44917ff5932e4d0adcc18ce24ecfc0b5686818e3
2020-10-02 11:46:35 -07:00
Guilherme Leobas
c1e6592964 Enable type-checking of torch.nn.quantized.* modules (#43110)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43029

I am not changing the following files in this PR:
* `torch/nn/quantized/dynamic/modules/rnn.py` due to https://github.com/pytorch/pytorch/issues/43072
* `torch/nn/quantized/modules/conv.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43110

Reviewed By: gchanan

Differential Revision: D23963258

Pulled By: ezyang

fbshipit-source-id: 0fb0fd13af283f6f7b3434e7bbf62165357d1f98
2020-09-29 18:14:29 -07:00
Brian Hirsh
439930c81b adding a beta parameter to the smooth_l1 loss fn (#44433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433

Not entirely sure why, but changing the type of beta from `float` to `double in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time

fixing some type errors, updated fn signature in a few more files

removing my usage of Scalar, making beta a double everywhere instead

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D23636720

Pulled By: bdhirsh

fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
2020-09-25 16:36:28 -07:00
Kurt Mohler
d1c68a7069 Clarify that 5-D 'bilinear' grid_sample is actually trilinear (#45090)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45090

Reviewed By: ailzhang

Differential Revision: D23841046

Pulled By: zou3519

fbshipit-source-id: 941770cd5b3e705608957739026e9113e5f0c616
2020-09-22 15:10:22 -07:00
Mike Ruberry
ef885c10d8 [pytorch] Add triplet margin loss with custom distance (#43680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43680

As discussed [here](https://github.com/pytorch/pytorch/issues/43342),
adding in a Python-only implementation of the triplet-margin loss that takes a
custom distance function.  Still discussing whether this is necessary to add to
PyTorch Core.

Test Plan:
python test/run_tests.py

Imported from OSS

Reviewed By: albanD

Differential Revision: D23363898

fbshipit-source-id: 1cafc05abecdbe7812b41deaa1e50ea11239d0cb
2020-09-22 11:35:52 -07:00
Xiang Gao
e48201c5cf Mention TF32 on related docs (#44690)
Summary:
cc: ptrblck

![image](https://user-images.githubusercontent.com/1032377/93168022-cbbfcb80-f6d6-11ea-8f6e-f2c8a15c5bea.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44690

Reviewed By: ngimel

Differential Revision: D23727921

Pulled By: mruberry

fbshipit-source-id: db7cc8e74cde09c13d6a57683129fd839863b914
2020-09-16 19:18:30 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Gregory Chanan
5579b53a7f Fix SmoothL1Loss when target.requires_grad is True. (#44486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486

SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the SmoothL1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23630699

Pulled By: gchanan

fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
2020-09-11 12:13:36 -07:00
David Reiss
7d78a6fcdd Update interpolate to use new upsample overloads (#43025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43025

- Use new overloads that better reflect the arguments to interpolate.
- More uniform interface for upsample ops allows simplifying the Python code.
- Also reorder overloads in native_functions.yaml to give them priority.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37177

ghstack-source-id: 106938111

Test Plan:
test_nn has pretty good coverage.

Relying on CI for ONNX, etc.

Didn't test FC because this change is *not* forward compatible.

To ensure backwards compatibility, I ran this code before this change

```python
def test_func(arg):
    interp = torch.nn.functional.interpolate
    with_size = interp(arg, size=(16,16))
    with_scale = interp(arg, scale_factor=[2.1, 2.2], recompute_scale_factor=False)
    with_compute = interp(arg, scale_factor=[2.1, 2.2])
    return (with_size, with_scale, with_compute)

traced_func = torch.jit.trace(test_func, torch.randn(1,1,1,1))

sample = torch.randn(1, 3, 7, 7)
output = traced_func(sample)

assert not torch.allclose(output[1], output[2])

torch.jit.save(traced_func, "model.pt")
torch.save((sample, output), "data.pt")
```

then this code after this change

```python
model = torch.jit.load("model.pt")
sample, golden = torch.load("data.pt")
result = model(sample)
for r, g in zip(result, golden):
    assert torch.allclose(r, g)
```

Reviewed By: AshkanAliabadi

Differential Revision: D21209991

fbshipit-source-id: 5b2ebb7c3ed76947361fe532d1dbdd6faa3544c8
2020-09-11 09:59:14 -07:00
Gregory Chanan
3de2c0b42f Fix L1Loss when target.requires_grad is True. (#44471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44471

L1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.

This PR does the following:

1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the L1Loss CriterionTests to verify that the target derivative is checked.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23626008

Pulled By: gchanan

fbshipit-source-id: 2828be16b56b8dabe114962223d71b0e9a85f0f5
2020-09-11 09:51:16 -07:00
Gregory Chanan
d07d25a8c5 Fix MSELoss when target.requires_grad is True. (#44437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437

MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.

This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kill the different (and incorrect) path for when target.requires_grad was True
3) modify the MSELoss CriterionTests to verify that the target derivative is checked.

TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D23612166

Pulled By: gchanan

fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
2020-09-11 08:51:28 -07:00
Chris Huynh
7b547f086f To fix extra memory allocation when using circular padding (#39273)
Summary:
For fixing https://github.com/pytorch/pytorch/issues/39256

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39273

Reviewed By: anjali411

Differential Revision: D23471811

Pulled By: mruberry

fbshipit-source-id: fb324b51baea765311715cdf14642b334f335733
2020-09-10 00:15:31 -07:00