Commit Graph

1420 Commits

Author SHA1 Message Date
CaoE
54c28c564f add Half support for BatchNorm on CPU (#102070)
Fixes #106543
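
For illustration, a minimal sketch of what this enables, under the assumption that both the module and its input run in Half precision on CPU (mirroring the fp16 columns in the tables below):

```python
import torch
from torch import nn

# Sketch (assumption): with this change, BatchNorm on CPU accepts Half tensors.
bn = nn.BatchNorm2d(4).half()
x = torch.randn(1, 4, 256, 256, dtype=torch.half, requires_grad=True)
y = bn(x)               # fp16 forward on CPU
y.sum().backward()      # fp16 backward on CPU
```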

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/mingfeima
2023-09-19 10:43:33 +00:00
lezcano
653c1564bf Fix broadcasting cosine_similarity (#109363)
Fixes https://github.com/pytorch/pytorch/issues/109333
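
For context, a small example of the broadcasting behavior this touches (shapes are illustrative, not taken from the issue):

```python
import torch
from torch.nn import functional as F

# Inputs broadcast across the leading dimensions before the similarity is taken.
a = torch.randn(3, 1, 5)
b = torch.randn(1, 4, 5)
out = F.cosine_similarity(a, b, dim=-1)
print(out.shape)  # torch.Size([3, 4])
```
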
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109363
Approved by: https://github.com/peterbell10
2023-09-15 17:12:35 +00:00
PyTorch MergeBot
b226373d16 Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit b6a1d3fb97.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to I'm very sorry but it looks like #106543 was not fixed, I still see it failing on main b6a1d3fb97 https://github.com/pytorch/pytorch/actions/runs/6185704949/job/16793975677 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1719747065))
2023-09-14 16:13:34 +00:00
CaoE
b6a1d3fb97 add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-14 12:23:59 +00:00
PyTorch MergeBot
04a765f95d Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit 6065e7a97c.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to sorry it looks like this is causing an unexpected success for `test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_nn_functional_batch_norm_cpu_float16` 6065e7a97c https://github.com/pytorch/pytorch/actions/runs/6178069462/job/16770849782 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1718402208))
2023-09-13 22:38:42 +00:00
CaoE
6065e7a97c add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-13 17:30:16 +00:00
Kurt Mohler
3f88e3105f Reland: Remove remaining global set_default_dtype calls from tests (#108088)
Fixes #68972

Relands #107246

To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that required changes for this issue.
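
A hedged sketch of that opt-in pattern as it might appear at the bottom of an affected test file (the attribute name comes from the description above; `run_tests` is the usual entry point in `torch.testing._internal.common_utils`):

```python
from torch.testing._internal.common_utils import TestCase, run_tests

# ... test classes deriving from TestCase ...

if __name__ == "__main__":
    # Opt in to the default-dtype assertion in setUp/tearDown for this file.
    TestCase._default_dtype_check_enabled = True
    run_tests()
```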

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
2023-09-07 03:04:34 +00:00
CaoE
8f02884569 add Half support for GroupNorm on CPU (#100234)
### Testing
Single socket (28 cores):

* Contiguous:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10, 128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112

* Channels Last:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10, 128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735

Single core:

* Contiguous:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10, 128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369

* Channels Last:

shape | fp32 forward / s | mixed fp32 fp16 forward / s | fp32 backward / s | mixed fp32 fp16 backward / s
-- | -- | -- | -- | --
[10, 128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10, 128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
Mikayla Gawarecki
3817de5d84 Fix layernorm cpu precision issues (#108089)
#108072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108089
Approved by: https://github.com/mingfeima, https://github.com/albanD
2023-08-30 23:55:10 +00:00
Xia, Weiwen
97a291f6bd [ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957)
**Summary**
Update onednn from v2.7.3 to v3.1.1.
It is BC-breaking as some APIs are changed on the oneDNN side. Changes include:
- PyTorch code where oneDNN is directly called.
- The `third_party/ideep` submodule, to adapt to oneDNN's new API.
- CMake files, to fix build issues.

**Test plan**
Building issues and correctness are covered by CI checks.
For performance, we have run TorchBench models to ensure there is no regression. Below is the comparison before and after the oneDNN update.
![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e)

Note:
- Base commit of PyTorch: da322ea
- CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-08-25 12:13:18 +00:00
Aaron Gokaslan
660e8060ad [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I am enabling it to keep things that way. :)
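
For reference, the pattern RUF017 targets looks roughly like this:

```python
# sum() with a list start value concatenates repeatedly and is quadratic in the
# total number of elements; a comprehension (or itertools.chain) is linear.
nested = [[1, 2], [3], [4, 5]]
flat_quadratic = sum(nested, [])                  # flagged by RUF017
flat_linear = [x for sub in nested for x in sub]  # preferred
```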

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-22 23:16:38 +00:00
PyTorch MergeBot
d59a6864fb Revert "[BE]: Update ruff to 0.285 (#107519)"
This reverts commit 88ab3e4322.

Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))
2023-08-22 19:53:32 +00:00
Aaron Gokaslan
88ab3e4322 [BE]: Update ruff to 0.285 (#107519)
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.

I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I am enabling it to keep things that way. :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
2023-08-20 01:36:18 +00:00
lcskrishna
bc662ffff9 [ROCm] Update ROCm skip decorators (#106138)
This PR adds a msg argument for skipIfRocm and skipCUDAIfRocm.
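
A hedged usage sketch of the new msg argument (decorator import path assumed from torch.testing._internal):

```python
from torch.testing._internal.common_utils import TestCase, skipIfRocm

class MyTests(TestCase):
    @skipIfRocm(msg="not yet supported on ROCm")  # the message becomes the skip reason
    def test_feature(self):
        ...
```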

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106138
Approved by: https://github.com/jataylo, https://github.com/jeffdaily, https://github.com/pruthvistony, https://github.com/albanD
2023-08-18 22:02:06 +00:00
Kurt Mohler
6af6b8f728 Reland: Remove set_default_dtype from nn tests (#107069)
Part of #68972
Relands #105775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107069
Approved by: https://github.com/ezyang
2023-08-14 17:01:57 +00:00
PyTorch MergeBot
ec0f3fda7d Revert "Remove set_default_dtype from nn tests (#105775)"
This reverts commit 4d6a891baf.

Reverted https://github.com/pytorch/pytorch/pull/105775 on behalf of https://github.com/huydhn due to Sorry for reverting you change, it is failing one of the slow test in trunk ([comment](https://github.com/pytorch/pytorch/pull/105775#issuecomment-1675460195))
2023-08-11 22:14:17 +00:00
Kurt Mohler
4d6a891baf Remove set_default_dtype from nn tests (#105775)
Part of #68972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105775
Approved by: https://github.com/ezyang
2023-08-10 14:56:13 +00:00
Jason Lu
bc88028e8e Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743)
Summary:
Original commit changeset: 81319beb97f3

Original Phabricator Diff: D47961182

Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822

Reviewed By: atuljangra

Differential Revision: D48131623

@diff-train-skip-merge
(D48131623 landed internally)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743
Approved by: https://github.com/malfet
2023-08-08 15:27:34 +00:00
Michael Gschwind
63d45275f4 is causal hints for transformer (#106143)
Summary:
make is_causal hint flags available for the top level transformer module.

It's debatable whether this is useful -- at present we autodetect causal masks for src and tgt masks in the transformer encoder and decoder, respectively. Making is_causal flags available would enable users to short-cut this check by asserting whether their mask is causal or not.

I am putting this diff up for discussion, not as a solution. Not doing anything may be the right solution unless there is strong (data-driven) user demand. -- It appears the consensus is to move ahead with this, as per the discussions below.
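
A hedged sketch of what that short-cut looks like from the caller's side (keyword names such as `src_is_causal`/`tgt_is_causal` are assumed from the summary):

```python
import torch
from torch import nn

model = nn.Transformer(d_model=16, nhead=4, batch_first=True)
src = torch.randn(2, 5, 16)
tgt = torch.randn(2, 5, 16)
src_mask = nn.Transformer.generate_square_subsequent_mask(5)
tgt_mask = nn.Transformer.generate_square_subsequent_mask(5)

# The caller asserts the masks are causal so the module can skip re-detecting it.
out = model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask,
            src_is_causal=True, tgt_is_causal=True)
```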

@cpuhrsch @mikaylagawarecki @jbschlosser @janEbert

Test Plan: sandcastle

Differential Revision: D47373260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106143
Approved by: https://github.com/mikaylagawarecki
2023-08-04 14:16:48 +00:00
CaoE
f82e6ff29e add channel last 3d support for batch_norm on CPU (#97774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97774
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-08-03 01:16:05 +00:00
Mikayla Gawarecki
c9be60cd0e Add error inputs to ModuleInfo (mirroring OpInfo) (#106325)
Add infra for error inputs to ModuleInfos, migrate first few error inputs tests from test_nn.py (more to come!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106325
Approved by: https://github.com/albanD
2023-08-01 12:49:56 +00:00
Mikayla Gawarecki
d8e5f2aa6d Reland "Make adding buffers more like adding parameters (#104069)" (#106224)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224
Approved by: https://github.com/atalman, https://github.com/albanD
2023-07-31 17:18:56 +00:00
Mikayla Gawarecki
ca7ece9b50 [easy] improve hint on error message in nn.Module.load_state_dict (#106042)
Fix #105963

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106042
Approved by: https://github.com/albanD
2023-07-27 19:56:02 +00:00
Nikita Karetnikov
eac9e1b35f [OpInfo] add reference and error inputs for multilabel_margin_loss (#105523)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105523
Approved by: https://github.com/ezyang
2023-07-23 02:16:29 +00:00
Aaron Gokaslan
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
Andrey Talman
c6653b65d8 Back out "Make adding buffers more like adding parameters (#104069)" (#105581)
Summary:
D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/

with `TypeError: register_buffer() takes 3 positional arguments but 4 were given`

Original commit changeset: d4b4069fbd38

Original Phabricator Diff: D47537831

Test Plan:
```
buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform
```

Reviewed By: atalman

Differential Revision: D47600140

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581
Approved by: https://github.com/mikaylagawarecki
2023-07-20 03:39:53 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
Michael Gschwind
11b753af01 Refactor causal mask generation and detection for nn.transformer (#105265)
Summary:
* Create a private global-scope function _generate_subsequent because static class attribute member functions are not supported by TorchScript, resulting in torchscripting errors.
* Make TransformerEncoder and TransformerDecoder consistent w.r.t. is_causal handling by calling _detect_causal_mask.
* Clarify documentation that is_causal is a hint.
* Move causal mask detection into a method _detect_causal_mask.
* Only accept an input-size-compatible causal mask as a causal mask.
* Update _generate_subsequent_causal_mask to include factory kwargs for dtype and device: avoid extra copies & conversions by passing them directly to torch.full (see the sketch after this list).
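
A minimal sketch of that generation step with factory kwargs, assuming the shape of the usual square-subsequent-mask helper (the real helper may differ):

```python
import torch

def _generate_square_subsequent_mask(sz, device=None, dtype=None):
    # Build the additive causal mask directly in the requested dtype/device,
    # avoiding extra copies and conversions.
    return torch.triu(
        torch.full((sz, sz), float("-inf"), dtype=dtype, device=device),
        diagonal=1,
    )

mask = _generate_square_subsequent_mask(4, device="cpu", dtype=torch.float32)
```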

Test Plan: sandcastle & github CICD
Continuation of #101487 (due to a tooling issue) which is a continuation-in-part of https://github.com/pytorch/pytorch/pull/98327 by @janEbert

Differential Revision: D47427117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105265
Approved by: https://github.com/mikaylagawarecki
2023-07-19 01:26:50 +00:00
Danni Li
1b78f23a1a Allow nn.ChannelShuffle to run without erroring on CUDA tensors (#105351)
Summary: Include GPU support for `nn.ChannelShuffle` & update test.
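
A minimal sketch of the newly allowed call (assumes a CUDA device is available):

```python
import torch
from torch import nn

if torch.cuda.is_available():
    shuffle = nn.ChannelShuffle(groups=2)
    x = torch.randn(2, 4, 8, 8, device="cuda")
    y = shuffle(x)  # previously raised an error for CUDA tensors
```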

Fix: #104603

Test Plan: Please see GitHub Actions.

Differential Revision: D47523764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105351
Approved by: https://github.com/mikaylagawarecki
2023-07-18 16:24:30 +00:00
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add semantics for creating a buffer object that mirror those for creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.
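
A hedged sketch of the intended usage, based on the description above (the `Buffer` name and `persistent` argument come from this PR, which was subsequently backed out per the entries higher in this log):

```python
import torch
from torch import nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Existing API: explicit registration.
        self.register_buffer("running_mean", torch.zeros(8), persistent=True)
        # New parameter-like style from this PR (sketch): assigning a Buffer
        # instance registers it, routing through register_buffer internally.
        self.running_var = nn.Buffer(torch.ones(8), persistent=True)
```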

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
Nikita Karetnikov
0c89596e4f [OpInfo] add reference and error inputs for multi_margin_loss (#104850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104850
Approved by: https://github.com/ezyang
2023-07-14 21:16:09 +00:00
yanbing-j
3fe2b73416 Update use_mkldnn in LSTM op to avoid input and parameter not in the same device (#102050)
This PR is to fix https://github.com/pytorch/pytorch/issues/101935.

LSTM goes into the oneDNN fast-path implementation only when the input, parameters, and hidden states are all on the CPU device. Otherwise, it falls back to the original implementation.

Note that if the input and parameters are indeed not on the same device, it will encounter the error `Input and parameter tensors are not at the same device, found input tensor......` in `check_attributes`. Therefore, the proper usage of LSTM is to call `input.to(device)` and `model.to(device)` together.
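
A minimal sketch of that usage pattern:

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

lstm = nn.LSTM(input_size=8, hidden_size=16).to(device)  # model.to(device)
x = torch.randn(5, 3, 8, device=device)                  # input on the same device
output, (h_n, c_n) = lstm(x)
```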

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102050
Approved by: https://github.com/XiaobingSuper, https://github.com/albanD
2023-07-13 01:13:59 +00:00
Masaki Kozuki
6929e9e947 Use int64_t accordingly in cunn_SoftMaxBackward to avoid int overflow (#104270)
Fixes #103501

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104270
Approved by: https://github.com/malfet, https://github.com/mikaylagawarecki
2023-06-30 21:39:46 +00:00
cyy
54cb61f7d9 enable ASAN on some tests (#103647)
Enable more tests on ASAN. Meanwhile, we disable float-divide-by-zero and float-cast-overflow; both are disabled because they are also disabled by default in the latest clang.
The following cited doc explains the reasons.
```
-fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types
which would overflow the destination. Because the range of representable values
for all floating-point types supported by Clang is [-inf, +inf], the only cases detected are
conversions from floating point to integer types.
-fsanitize=float-divide-by-zero: Floating point division by zero.
This is undefined per the C and C++ standards,
 but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing
either an infinity or NaN value,
so is not included in -fsanitize=undefined.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103647
Approved by: https://github.com/kit1980
2023-06-28 02:17:14 +00:00
Mikayla Gawarecki
b93ed8164e Add non-recursive module.to_empty option (#104197)
Fixes https://github.com/pytorch/pytorch/issues/97049, related to https://github.com/pytorch/pytorch/issues/104187
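
A hedged sketch of the new option (the `recurse` keyword name is an assumption based on the PR title):

```python
import torch
from torch import nn

class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(4, 4))
        self.child = nn.Linear(4, 4)

with torch.device("meta"):
    m = Outer()

# Materialize only this module's own parameters/buffers; the child stays on meta.
m.to_empty(device="cpu", recurse=False)
print(m.weight.device)        # cpu
print(m.child.weight.device)  # meta
```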

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104197
Approved by: https://github.com/albanD
2023-06-26 21:47:22 +00:00
Ryan Smith
6bda97e2c1 Raise type error message for interpolate if size contains non-integer elements (#99243)
Raise type error message for interpolate when output size is a tuple containing elements that are not `int`

Fixes #98287

The check is only performed if `size` is an instance of `list` or `tuple`.
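
A small sketch of the behavior change (the error type follows the title; the exact message is not shown here):

```python
import torch
from torch.nn import functional as F

x = torch.randn(1, 3, 8, 8)
F.interpolate(x, size=(16, 16), mode="nearest")  # fine: integer sizes

try:
    F.interpolate(x, size=(16.0, 16.0), mode="nearest")  # non-integer elements
except TypeError as e:
    print(e)  # now raised eagerly instead of failing later
```
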
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99243
Approved by: https://github.com/Skylion007, https://github.com/Neilblaze, https://github.com/MovsisyanM, https://github.com/albanD
2023-06-23 00:48:45 +00:00
Mikayla Gawarecki
d1cecd9c32 Add assign kwarg to module.load_state_dict (#102212)
Fixes #64601 and #98906

Adds an `assign` argument to `load_state_dict` that loads params/buffers by assignment instead of doing `param.copy_(param_from_state_dict)`.

Primarily intended to remove the need for the `.to_empty()` in

```
with torch.device('meta'):
    m = SomeModule()
m.to_empty()
state_dict = torch.load('...pth')
m.load_state_dict(state_dict)
```

so we can instead do

```
with torch.device('meta'):
    m = SomeModule()
state_dict = torch.load('...pth')
m.load_state_dict(state_dict, assign=True)
```

**A problem with this PR, for the case where the model is initialized on meta, is: what happens to non-persistent buffers/params corresponding to keys missing from the state dict?**
What happens when `load_state_dict(state_dict, strict=False, assign=True)` is called and the state_dict is missing some keys? The params missing from the `state_dict` and the non-persistent buffers would still be on `meta` and would need to be manually initialized. However, I don't think we offer an API that would initialize these.

One solution would be to make these empty tensors but it might not be semantically correct...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102212
Approved by: https://github.com/albanD
2023-06-15 18:41:00 +00:00
Nicolas Hug
3766c04736 Add uint8 support for CPU images in interpolate(mode='bicubic') (#103252)
CC @vfdev-5

Proposed strategy: be as close as possible to PIL when `antialias=True`; be as close as possible to the float path when `antialias=False`.

Ad-hoc tests:

<details>

```py
import random

import torch
import pytest
import numpy as np
from PIL import Image
from torch.nn.functional import interpolate

@pytest.mark.parametrize("C", (1, 3, 6))
@pytest.mark.parametrize("batch_size", (1, 4))
@pytest.mark.parametrize("memory_format", (torch.contiguous_format, torch.channels_last, "strided", "cropped"))
@pytest.mark.parametrize("antialias", (True, False))
# @pytest.mark.parametrize("mode", ("bilinear", "bicubic",))
@pytest.mark.parametrize("mode", ("bicubic",))
@pytest.mark.parametrize("seed", range(100))
def test_resize(C, batch_size, memory_format, antialias, mode, seed):

    torch.manual_seed(seed)
    random.seed(seed)

    Hi = 2**random.randint(3, 10) + random.randint(0, 30)
    Wi = 2**random.randint(3, 10) + random.randint(0, 30)
    Ho = 2**random.randint(3, 10) + random.randint(0, 30)
    Wo = 2**random.randint(3, 10) + random.randint(0, 30)
    # print(Hi, Wi, Ho, Wo)

    img = torch.randint(0, 256, size=(batch_size, C, Hi, Wi), dtype=torch.uint8)

    if memory_format in (torch.contiguous_format, torch.channels_last):
        img = img.to(memory_format=memory_format, copy=True)
    elif memory_format == "strided":
        img = img[:, :, ::2, ::2]
    elif memory_format == "cropped":
        a = random.randint(1, Hi // 2)
        b = random.randint(Hi // 2 + 1, Hi)
        c = random.randint(1, Wi // 2)
        d = random.randint(Wi // 2 + 1, Wi)
        img = img[:, :, a:b, c:d]
    else:
        raise ValueError("Uh?")

    margin = 0
    img = img.clip(margin, 255 - margin)
    out_uint8 = interpolate(img, size=[Ho, Wo], mode=mode, antialias=antialias)

    if antialias and C == 3:
        out_pil_tensor = resize_with_pil(img, Wo, Ho, mode=mode, antialias=antialias)
        atol = {"bicubic": 2, "bilinear": 1}[mode]  # TODO: is 2 expected when comparing with PIL bicubic? Why not 1 as for bilinear?
        torch.testing.assert_close(out_uint8, out_pil_tensor, rtol=0, atol=atol)

    out_float = interpolate(img.to(torch.float), size=[Ho, Wo], mode=mode, antialias=antialias).round().clip(0, 255).to(torch.uint8)
    if mode == "bicubic":
        diff = (out_float.float() - out_uint8.float()).abs()
        assert diff.max() < 30

        percent = .03 if antialias else .1
        assert (diff > 2).float().mean() < percent

        mae = .4 if antialias else .8
        assert diff.mean() < mae
    else:
        torch.testing.assert_close(out_uint8, out_float, rtol=0, atol=1)

def resize_with_pil(batch, Wo, Ho, mode, antialias):
    resample = {"bicubic": Image.BICUBIC, "bilinear": Image.BILINEAR}[mode]
    out_pil = [
        Image.fromarray(img.permute((1, 2, 0)).numpy()).resize((Wo, Ho), resample=resample)
        for img in batch
    ]
    out_pil_tensor = torch.cat(
        [
            torch.as_tensor(np.array(img, copy=True)).permute((2, 0, 1))[None]
            for img in out_pil
        ]
    )
    return out_pil_tensor
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103252
Approved by: https://github.com/vfdev-5, https://github.com/H-Huang, https://github.com/malfet, https://github.com/atalman
2023-06-12 18:25:33 +00:00
ecao
73fd7235ad add function specializations for the case of parameters in BFloat16 data type (#100233)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100233
Approved by: https://github.com/jgong5, https://github.com/ngimel
2023-05-31 02:01:07 +00:00
vfdev-5
7042e10215 Fixed issue with bicubic interpolation on uint8 input and antialising (#102296)
Description:

- Fixed an issue with bicubic interpolation on uint8 input and antialiasing, discovered by @NicolasHug
- Unified `_separable_upsample_generic_Nd_kernel_impl_single_dim` on the `antialias` arg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102296
Approved by: https://github.com/NicolasHug
2023-05-30 14:57:19 +00:00
ecao
af1d437654 Improve precision and performance for BFloat16 upsampling (#91169)
### Description
- Fix precision issue for BFloat16 upsampling: https://github.com/pytorch/pytorch/issues/89212
- Improve performance for BFloat16 upsampling.
### Testing
data type: BFloat16

- Single core

contiguous:
mode | scale_factor | shape  | before backward / ms |  after backward / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 14.47 | 8.34
linear | 2 | [3, 200, 200] | 3.69 | 2.74
bilinear | 2 | [3, 5, 200, 200] | 87.99 | 49.05
trilinear | 2 | [3, 3, 3, 100, 100]  | 171.02 | 72.53
bicubic | 2 | [3, 3, 200, 200 ] | 176.29 | 78

channels last:
mode | scale_factor | shape | before backward / ms |  after backward / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 17.70 | 10.30
linear | 2 | [3, 200, 200] | \ | \
bilinear | 2 | [3, 5, 200, 200] | 50.90 | 18.83
trilinear | 2 | [3, 3, 3, 100, 100] | 121.56 | 42.60
bicubic | 2 | [3, 3, 200, 200 ] | 179.40 | 80

- 20 cores

contiguous:
mode | scale_factor | shape | before backward / ms |  after backward / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] | 1.17 | 1.01
linear | 2 | [3, 200, 200] | 0.41 | 0.26
bilinear | 2 | [3, 5, 200, 200] | 7.19 | 4.07
trilinear | 2 | [3, 3, 3, 100, 100]  | 21.32 | 9.33
bicubic | 2 | [3, 3, 200, 200 ] | 178.67 | 10

channels last:
mode | scale_factor | shape | before backward / ms |  after backward / ms
-- | -- | -- | -- | --
nearest | 2 | [10, 3, 200, 200] |  2.25 | 1.55
linear | 2 | [3, 200, 200] | \ | \
bilinear | 2 | [3, 5, 200, 200] |  20.17 | 7.20
trilinear | 2 | [3, 3, 3, 100, 100] | 43.33 | 15.66
bicubic | 2 | [3, 3, 200, 200 ] | 176.76 | 10
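
For reference, a minimal sketch of the BFloat16 upsampling call benchmarked above (the contiguous bilinear case; the tables time the backward pass):

```python
import torch
from torch.nn import functional as F

x = torch.randn(3, 5, 200, 200, dtype=torch.bfloat16, requires_grad=True)
y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
y.sum().backward()  # the backward pass is what the tables time
```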

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91169
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/Skylion007
2023-05-29 01:35:57 +00:00
ecao
3f4fee735a add Half support for logsigmoid, threshold, elu, gelu, hardtanh, hardsigmoid, hardswish, hardshrink, softshrink, leakyrelu, softplus, glu, silu, mish, and prelu on CPU (#98745)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98745
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/ngimel
2023-05-27 16:20:21 +00:00
ts
563d8058f4 Fix inconsistent torch.nn.MaxPool1d output on cpu and gpu (#99843)
Fixes #99412, correctly raising an error when an output of invalid size is calculated.

Would be happy to iterate on this if there are any issues.
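
A hedged illustration of the kind of invalid-output-size case that now raises (parameters are illustrative):

```python
import torch
from torch import nn

x = torch.randn(1, 1, 2)
pool = nn.MaxPool1d(kernel_size=5)  # kernel larger than the input length

try:
    pool(x)  # the computed output size would be invalid; an error is raised
except RuntimeError as e:
    print(e)
```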

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99843
Approved by: https://github.com/mikaylagawarecki
2023-05-15 20:27:43 +00:00
vfdev
a8ea4178ab Fixed bug in interpolate when interpolation size is larger than max (#101403)
## Description

This is a bug fix for rare cases that can happen with a specific scale and antialias=False, where the output for a random line can be wrong. For example:
```
line 14
output uint8: [76, 78, 80, 81, 83, 85, 87, 88, 90]
expected float: [149, 152, 155, 158, 161, 164, 167, 170, 173]
diff: [-73, -74, -75, -77, -78, -79, -80, -82, -83]
opencv ref: [149 152 155 158 161 164 167 170 173]
```

It appears that for this line we have 3 weight coefficients instead of 2:
```
line 13 | 351, 2
k: 1130 15254
line 14 | 378, 3
k: 0 16384 -6780            <-------  We should have 2 weights and not 3
line 15 | 432, 2
k: 15254 1130
```
which comes from our `_compute_weights_aa` function that is specifically used for AA=False and uint8.
```
    xmin = std::max(
        static_cast<int64_t>(center - support + 0.5 + align_corners_delta), static_cast<int64_t>(0));
    xsize = std::min(
        static_cast<int64_t>(center + support + 0.5 + align_corners_delta), input_size) - xmin;
```
```
center - support + 0.5 + align_corners_delta: 14.999999999999998
static_cast<int64_t>(center - support + 0.5 + align_corners_delta): 14
xmin -> 14

center + support + 0.5 + align_corners_delta: 17.0
static_cast<int64_t>(center + support + 0.5 + align_corners_delta): 17
xsize -> 17 - 14 = 3  <------ 3 instead of 2
```

For the float dtype, the AA=False weights and indices are computed differently, because that path was historically implemented first.

In any case, `xsize` should not be larger than `max_interp_size`, so we decided to clip `xsize`.

Once fixed, the computed indices and weights are the same as for the float dtype code path:
```
# Option: xsize = min(xsize, max_interp_size)
Line Num | xmin, xsize

14 | 378, 2                 xmin=378 <---> xmin = i * stride = i * 3 * 9 => i = 14
k: 0 16384                  16384 = w * (1 << 14) => w = 1.0

=> i=14, w=0 and i=15, w=1
```
vs
```
Line Num | index0, index1
F32: 14 | 15, 16
F32: lambda0, lambda1: 0.999999, 9.53674e-07
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101403
Approved by: https://github.com/NicolasHug
2023-05-15 15:55:42 +00:00
vfdev-5
a3700571e1 Fixed a bug in interpolate uint8 AVX2 on non-contig input (#101136)
Description:
- Fixed a bug in interpolate uint8 AVX2 on non-contig input
- Added tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101136
Approved by: https://github.com/NicolasHug
2023-05-12 17:17:10 +00:00
yanbing-j
36d91b5513 Add differentiable mkldnn_rnn_layer_backward to support double backward of LSTM (#100627)
### Description

This PR is to fix #99413, which shows the limitation of double backward using oneDNN in LSTM.

This PR does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements mkldnn_rnn_layer_backward using differentiable operations, so that double backward can be done automatically.

During the backward process, we need the gates and hidden states between cells within one layer. However, these intermediate variables are stored in the `workspace`, and it is hard to extract them. Therefore, in backward, we need to re-calculate them first.

A corresponding UT has been added based on the failing case in #99413. The UT with gradcheck and gradgradcheck added in https://github.com/pytorch/pytorch/pull/26660 cannot test LSTM using oneDNN, because that UT only supports the `double` datatype, which oneDNN does not support.
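
A minimal sketch of the double-backward pattern this enables on the CPU (oneDNN) path:

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=4, hidden_size=4)
x = torch.randn(3, 2, 4, requires_grad=True)

out, _ = lstm(x)
# First-order gradient, kept in the graph so it can be differentiated again.
grad_x, = torch.autograd.grad(out.sum(), x, create_graph=True)
grad_x.sum().backward()  # second-order gradient through the LSTM backward
```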

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100627
Approved by: https://github.com/jgong5, https://github.com/soulitzer
2023-05-09 12:58:57 +00:00
vfdev-5
ff974cd962 Fixing interpolate on uint8 unsqueezed 3D CL tensor (#100258)
Description:

- Fixed a memory format bug:

When the input is a channels-last 4D tensor that was produced as follows:
```
t = torch.ones(1, 3, 32, 32).contiguous(memory_format=torch.channels_last)
t = t[0]
t = t[None, ...]
```
upsampling will produce output in channels-first memory format, but our AVX code does not take that into account.

Here is repro code showing that the nightly build is broken for this particular case:
```python
import torch

torch.manual_seed(0)

input = torch.randint(0, 256, size=(1, 3, 256, 256), dtype=torch.uint8).contiguous(memory_format=torch.channels_last)
input = input[0]
input = input[None, ...]

assert input.is_contiguous(memory_format=torch.channels_last)

output = torch.nn.functional.interpolate(input, (224, 224), mode="bilinear", antialias=True)
expected = torch.nn.functional.interpolate(input.float(), (224, 224), mode="bilinear", antialias=True)

assert output.is_contiguous()
assert expected.is_contiguous()

torch.testing.assert_close(expected, output.float(), atol=1, rtol=1)
# >
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/pytorch/torch/testing/_comparison.py", line 1511, in assert_close
#     raise error_metas[0].to_error(msg)
# AssertionError: Tensor-likes are not close!
#
# Mismatched elements: 14120 / 150528 (9.4%)
# Greatest absolute difference: 214.6112518310547 at index (0, 1, 152, 13) (up to 1 allowed)
# Greatest relative difference: 17.005144119262695 at index (0, 2, 26, 2) (up to 1 allowed)
```

- Also renamed needs_unpacking to skip_unpacking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100258
Approved by: https://github.com/NicolasHug
2023-05-04 13:28:33 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles Python functions containing NumPy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray (a wrapper of tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

The next PR will handle returning `np.ndarray` and add support for ndarray methods.
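
A hypothetical example of the kind of mixed torch/NumPy code this targets (the function body is illustrative only, not taken from the PR):

```python
import torch

@torch.compile  # dynamo traces ndarray attribute access via NumpyTensorVariable
def f(t):
    a = t.numpy()              # tensor -> ndarray
    scaled = a * a.itemsize    # ndarray attribute access
    return torch.from_numpy(scaled) + 1

print(f(torch.arange(4.0)))
```
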
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Yanli Zhao
9bc03db670 Move nn.module state dict pre hook (#98964)
Some modules, such as lazy modules, may override `_save_to_state_dict()`; in that case, the pre-state-dict hook would not be called. So move the pre-state-dict hook out of `_save_to_state_dict()` to make sure the pre hook is always called.
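
A hedged sketch of why the move matters: a pre-hook registered on a module should fire even if that module overrides `_save_to_state_dict()`. The registration method name and hook signature shown here are assumptions about the public nn.Module surface:

```python
import torch
from torch import nn

def pre_hook(module, prefix, keep_vars):
    # Runs just before the module contributes to the state dict.
    print(f"serializing {type(module).__name__} (prefix={prefix!r})")

m = nn.Linear(2, 2)
m.register_state_dict_pre_hook(pre_hook)  # registration API name assumed
sd = m.state_dict()
```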

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98964
Approved by: https://github.com/albanD
2023-04-26 16:51:13 +00:00
soulitzer
5ee5afb82c Update channel shuffle to return alias instead of self as-is (#99745)
Partially addresses https://github.com/pytorch/pytorch/issues/99655
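
A hedged illustration (assumption: the identity case, groups=1, is the path that previously returned the input object as-is):

```python
import torch

x = torch.randn(1, 4, 2, 2)
y = torch.nn.functional.channel_shuffle(x, groups=1)  # numerically a no-op

# Expected after this change (assumption): an alias is returned, i.e. a new
# Tensor object viewing the same storage, rather than literally the same object.
print(y is x)                        # expected: False
print(y.data_ptr() == x.data_ptr())  # expected: True
```
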
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99745
Approved by: https://github.com/albanD
2023-04-24 14:02:14 +00:00