Commit Graph

1134 Commits

Author SHA1 Message Date
mingfeima
84b1c9798c add BFloat16 support for AvgPool2d on CPU (#66927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66927

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33353198

Pulled By: VitalyFedyunin

fbshipit-source-id: 1aeaa4bb90ac99210b8f6051c09d6995d06ce3a1
2022-01-14 07:59:10 -08:00
mingfeima
910c01020e add BFloat16 support for AdaptiveMaxPool2d on CPU (#66929)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66929

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33353199

Pulled By: VitalyFedyunin

fbshipit-source-id: d402d5deb7ca766259ca42118ddc16625e134c4c
2022-01-13 20:00:42 -08:00
Jake Tae
eac3decf93 ModuleList concatenation (#70887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70441.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70887

Reviewed By: ejguan

Differential Revision: D33555431

Pulled By: albanD

fbshipit-source-id: ce42459ee46a611e98e89f02686acbac16b6b668
2022-01-13 15:31:07 -08:00
mingfeima
385773cb77 add BFloat16 support for MaxPool2d on CPU (#56903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56903

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D28836791

Pulled By: VitalyFedyunin

fbshipit-source-id: e03d55cc30dfa3628f096938fbad34b1031948af
2022-01-12 14:20:20 -08:00
Ilya Persky
a8612cd72a Skip failing tests in test_nn if compiled without LAPACK (#70913)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70912

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70913

Reviewed By: mruberry

Differential Revision: D33534840

Pulled By: albanD

fbshipit-source-id: 0facf5682140ecd7a78edb34b9cd997f9319e084
2022-01-11 12:21:18 -08:00
George Qi
d7db5fb462 ctc loss no batch dim support (#70092)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70092

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33280068

Pulled By: george-qi

fbshipit-source-id: 3278fb2d745a396fe27c00fb5f40df0e7f584f81
2022-01-07 14:33:22 -08:00
Bin Bao
f135438d3b Dispatch to at::convolution intead of at::_convolution in _convolution_double_backward (#70661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70661

Dispatching to at::convolution can make Lazy Tensor trace the right convolution op.

Test Plan: pytest test/test_nn.py -k test_conv_double_backward_strided_with_3D_input_and_weight

Reviewed By: wconstab, jbschlosser

Differential Revision: D33428780

Pulled By: desertfire

fbshipit-source-id: 899e4135588ea33fff23d16103c25d9bcd3f902c
2022-01-07 07:53:46 -08:00
Joel Schlosser
e6befbe85c Add flag to optionally average output attention weights across heads (#70055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47583

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055

Reviewed By: bhosmer

Differential Revision: D33457866

Pulled By: jbschlosser

fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5
2022-01-06 17:32:37 -08:00
Joel Schlosser
7b8f73dd32 No-batch-dim support for ConvNd (#70506)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70506

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33355034

Pulled By: jbschlosser

fbshipit-source-id: 5a42645299b1d82cee7d461826acca1c5b35a71c
2022-01-06 16:53:50 -08:00
Jake Tae
b7742b437a Allow RNN hidden_size to be 0 (#70556)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56767.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70556

Reviewed By: ngimel

Differential Revision: D33455156

Pulled By: jbschlosser

fbshipit-source-id: 5dc57b09d7beb6ae81dfabc318e87c109bb4e6ae
2022-01-06 14:18:36 -08:00
Jane Xu
c00d33033c Remove repeat test for types in test nn (#70872)
Summary:
Helps fix a part of https://github.com/pytorch/pytorch/issues/69865

The first commit just migrates everything as is.

The second commit uses the "device" variable instead of passing "cuda" everywhere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70872

Reviewed By: jbschlosser

Differential Revision: D33455941

Pulled By: janeyx99

fbshipit-source-id: 9d9ec8c95f1714c40d55800e652ccd69b0c314dc
2022-01-06 09:20:02 -08:00
soulitzer
3051aabd0e Add forward AD formulas for convolution and some others (#69956)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69956

Test Plan: Imported from OSS

Reviewed By: albanD, bdhirsh

Differential Revision: D33235974

Pulled By: soulitzer

fbshipit-source-id: ea60d687edc5d62d92f3fd3cb6640421d32c908c
2022-01-06 08:39:51 -08:00
Joel Schlosser
b60b1b100f Set cuDNN deterministic flag for test_conv_double_backward_cuda (#69941)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69941

Reviewed By: george-qi

Differential Revision: D33430727

Pulled By: jbschlosser

fbshipit-source-id: 4a250bd0e5460ee631730afe0ab68ba72f37d292
2022-01-05 10:05:56 -08:00
kshitij12345
7bfaa230be [nn] adaptive_avg_pool{1/2/3}d : Error on negative output_size (#70488)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70488

Reviewed By: H-Huang

Differential Revision: D33367289

Pulled By: jbschlosser

fbshipit-source-id: 6b7b89d72c4e1e049ad6a0addb22a261c28ddb4c
2021-12-30 14:42:11 -08:00
mingfeima
401a6b682b add BFloat16 support for AdaptiveAvgPool2d on CPU (#56902)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56902

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D28836789

Pulled By: VitalyFedyunin

fbshipit-source-id: caac5e5b15190b8010bbfbc6920aa44032208ee7
2021-12-30 11:58:37 -08:00
vfdev
d2abf3f981 Added antialias flag to interpolate (CPU only, bicubic) (#68819)
Summary:
Description:
- Added antialias flag to interpolate (CPU only)
  - forward and backward for bicubic mode
  - added tests

Previous PR for bilinear, https://github.com/pytorch/pytorch/pull/65142

### Benchmarks

<details>
<summary>
Forward pass, CPU. PTH interpolation vs PIL
</summary>

Cases:
- PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apples vs pears)
- PTH 1 Channel, float32 vs PIL 1 Channel Float

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,

Num threads: 1
[------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                4.5                |          5.2
      channels_last non-contiguous torch.float32  |                4.5                |          5.3

Times are in milliseconds (ms).

[------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                5.7                |          6.4
      channels_last non-contiguous torch.float32  |                5.7                |          6.4

Times are in milliseconds (ms).

[------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) --------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.0                |          4.0
      channels_last non-contiguous torch.float32  |                2.9                |          4.1

Times are in milliseconds (ms).

[------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                14.7               |          17.1
      channels_last non-contiguous torch.float32  |                14.8               |          17.2

Times are in milliseconds (ms).

[------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) -------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitb0bdf58
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.5                |          3.9
      channels_last non-contiguous torch.float32  |                3.5                |          3.9

Times are in milliseconds (ms).

[---------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               2.4               |          1.8

Times are in milliseconds (ms).

[---------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               3.1               |          2.2

Times are in milliseconds (ms).

[---------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) ----------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.6               |          1.4

Times are in milliseconds (ms).

[--------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               7.9               |          5.7

Times are in milliseconds (ms).

[--------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) ---------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitb0bdf58
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.7               |          1.3

Times are in milliseconds (ms).

```

</details>

Code is moved from torchvision: https://github.com/pytorch/vision/pull/3810 and https://github.com/pytorch/vision/pull/4208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68819

Reviewed By: mikaylagawarecki

Differential Revision: D33339117

Pulled By: jbschlosser

fbshipit-source-id: 6a0443bbba5439f52c7dbc1be819b75634cf67c4
2021-12-29 14:04:43 -08:00
George Qi
8af39b7668 AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200166

Pulled By: george-qi

fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7
2021-12-29 10:25:26 -08:00
soulitzer
3116d87024 Add forward AD formulas for {adaptive_,fractional_,}max_pool{2,3}d_{backward,} (#69884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69884

Also fixes: https://github.com/pytorch/pytorch/issues/69322, https://github.com/pytorch/pytorch/issues/69325

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33093039

Pulled By: soulitzer

fbshipit-source-id: b9a522a00f4e9e85974888de5058de07280f8f66
2021-12-23 15:51:09 -08:00
soulitzer
5651e1e3ad Add auto_linear formulas and some others (#69727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69727

Still need to test the backward ones. We would need to update gradgradcheck to check forward over backward.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031728

Pulled By: soulitzer

fbshipit-source-id: 86c59df5d2196b5c8dbbb1efed9321e02ab46d30
2021-12-20 12:15:25 -08:00
Albert Liang
0d06616c47 Add dict methods to ParameterDict (#69403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68476

We implemented all of the following `dict` methods for `ParameterDict`
- `get `
- `setdefault`
- `popitem`
- `fromkeys`
- `copy`
- `__or__`
- `__ior__`
- `__reversed__`
- `__ror__`

The behavior of these new methods matches the expected behavior of python `dict` as defined by the language itself: https://docs.python.org/3/library/stdtypes.html#typesmapping

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69403

Reviewed By: albanD

Differential Revision: D33187111

Pulled By: jbschlosser

fbshipit-source-id: ecaa493837dbc9d8566ddbb113b898997e2debcb
2021-12-17 10:15:47 -08:00
Rui Zhu
46ace4ac33 Add support for masked_softmax when softmax_elements > 1024 & corresponding unit tests (#69924)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69924

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32819181

fbshipit-source-id: 6838a11d3554ec8e1bd48f1c2c7b1ee3a4680995
2021-12-15 16:44:15 -08:00
kshitij12345
e8d5c7cf7f [nn] mha : no-batch-dim support (python) (#67176)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

* [x] Update docs
* [x] Tests for shape checking

Tests take roughly 20s on system that I use. Below is the timings for slowest 20 tests.

```
pytest test/test_modules.py -k _multih --durations=20
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 372 items / 336 deselected / 36 selected

test/test_modules.py ..............ssssssss..............                                                                                                                                                  [100%]

================================================================================================ warnings summary ================================================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
  /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================================================== slowest 20 durations ==============================================================================================
8.66s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_cuda_float64
2.02s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_MultiheadAttention_cpu_float64
1.89s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiheadAttention_cuda_float64
1.01s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
0.51s call     test/test_modules.py::TestModuleCPU::test_grad_nn_MultiheadAttention_cpu_float64
0.46s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float32
0.45s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float64
0.44s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float32
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float64
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float32
0.18s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float64
0.17s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float32
0.16s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float64
0.11s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float64
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float32
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float64
============================================================================================ short test summary info =============================================================================================
=========================================================================== 28 passed, 8 skipped, 336 deselected, 2 warnings in 19.71s ===========================================================================
```

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67176

Reviewed By: dagitses

Differential Revision: D33094285

Pulled By: jbschlosser

fbshipit-source-id: 0dd08261b8a457bf8bad5c7f3f6ded14b0beaf0d
2021-12-14 13:21:21 -08:00
Rui Zhu
1a299d8f1b Add support for transformer layout of masked_softmax (#69272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272

In transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), where input is a 4D tensor (B, H, D, D).
This mask could be simply broadcasted to a (B, H, D, D) like input, and then do a regular masked_softmax, however it will bring the problem of non-contiguous mask & consume more memory.
In this diff, we maintained mask's shape unchanged, while calc the corresponding mask for input in each cuda thread.

This new layout is not currently supported in CPU yet.

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32605557

fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
2021-12-14 10:51:58 -08:00
Joel Schlosser
fc37e5b3ed Hook up general convolution to convolution_backward (#69584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69584

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32936380

Pulled By: jbschlosser

fbshipit-source-id: c6fdd88db33bd1a9d0eabea47ae09a4d5b170e92
2021-12-12 17:30:01 -08:00
Joel Schlosser
f0e98dcbd3 General convolution_backward function (#69044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69044

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD, H-Huang

Differential Revision: D32708818

Pulled By: jbschlosser

fbshipit-source-id: e563baa3197811d8d51553fc83718ace2f8d1b7a
2021-12-12 15:53:38 -08:00
Rui Zhu
aab67c6dff Add native masked_softmax (#69268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69268

This diff enabled native masked softmax on CUDA, also expanded our current warp_softmax to accept masking.
The mask in this masked softmax has to be the same shape as input, and has to be contiguous.

In a following diff I will submit later, I will have encoder mask layout included, where input is BHDD and mask is BD.

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32338419

fbshipit-source-id: 48c3fde793ad4535725d9dae712db42e2bdb8a49
2021-12-09 23:29:45 -08:00
kshitij12345
7407e3d6fd [fix] cross_entropy : fix weight with ignore_index and label_smoothing (#69511)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69339

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69511

Reviewed By: mrshenli

Differential Revision: D32951935

Pulled By: jbschlosser

fbshipit-source-id: 482eae851861a32f96bd6231dd3448fb6d44a015
2021-12-08 12:08:33 -08:00
jjsjann123
3c1e2ff9eb fixing layer_norm cuda bug (#69210)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69210

Reviewed By: H-Huang

Differential Revision: D32764811

Pulled By: ngimel

fbshipit-source-id: fb4201fe5f2284fcb22e36bc1029eef4a21b09bf
2021-12-01 15:46:47 -08:00
Kurt Mohler
d507fd63f3 Check that block height and width are positive in nn.Fold (#69048)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68875

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69048

Reviewed By: samdow

Differential Revision: D32729307

Pulled By: jbschlosser

fbshipit-source-id: 162cafb005873012d900d86997d07640967038c0
2021-12-01 10:08:47 -08:00
Omkar Salpekar
8e343ba5db Revert D32611368: [pytorch][PR] Initial version of general convolution_backward
Test Plan: revert-hammer

Differential Revision:
D32611368 (445b31abff)

Original commit changeset: 26d759b7c908

fbshipit-source-id: e91f45f0f31150e60d657a3964b7e42027beff58
2021-11-23 13:39:36 -08:00
Joel Schlosser
445b31abff Initial version of general convolution_backward (#65219)
Summary:
Towards [convolution consolidation](https://fb.quip.com/tpDsAYtO15PO).

Introduces the general `convolution_backward` function that uses the factored-out backend routing logic from the forward function.

Some notes:
* `finput` is now recomputed in the backward pass for the slow 2d / 3d kernels instead of being saved from the forward pass. The logic for is based on the forward computation and is present in `compute_finput2d` / `compute_finput3d` functions in `ConvUtils.h`.
* Using structured kernels for `convolution_backward` requires extra copying since the backend-specific backward functions return tensors. Porting to structured is left as future work.
* The tests that check the routing logic have been renamed from `test_conv_backend_selection` -> `test_conv_backend` and now also include gradcheck validation using an `autograd.Function` hooking up `convolution` to `convolution_backward`. This was done to ensure that gradcheck passes for the same set of inputs / backends.

The forward pass routing is done as shown in this flowchart (probably need to download it for it to be readable since it's ridiculous):
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/137186002-5bca75ca-f911-4e61-8245-ec07af841506.png)

![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/139731619-9d0d436e-cce3-4bc3-8eaf-d469f667f0d7.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65219

Reviewed By: mruberry

Differential Revision: D32611368

Pulled By: jbschlosser

fbshipit-source-id: 26d759b7c908ab8f19ecce627acea7bd3d5f59ba
2021-11-23 08:19:45 -08:00
soulitzer
7bb401a4c9 Add forward AD support for miscellanous operators (#67820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67820

Original PR here: https://github.com/pytorch/pytorch/pull/67040

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32314423

Pulled By: soulitzer

fbshipit-source-id: ecd898dc903692cab084f6922a1d86986f957b1b
2021-11-19 14:31:06 -08:00
jiej
ca92111758 Add native_dropout (#63937)
Summary:
Adds native_dropout to have a reasonable target for torchscript in auto diff. native_dropout has scale and train as arguments in its signature, this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937

Reviewed By: mruberry

Differential Revision: D32477657

Pulled By: ngimel

fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
2021-11-18 19:41:10 -08:00
kshitij12345
d5d2096dab [testing] make @dtypes mandatory when using @dtypesIf (#68186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647

With this if a test forgets to add `dtypes` while using `dtypesIf`, following error is raised
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```

**Tested Locally**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186

Reviewed By: VitalyFedyunin

Differential Revision: D32468581

Pulled By: mruberry

fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
2021-11-18 08:29:31 -08:00
vfdev-5
3da2e09c9b Added antialias flag to interpolate (CPU only, bilinear) (#65142)
Summary:
Description:
- Added antialias flag to interpolate (CPU only)
  - forward and backward for bilinear mode
  - added tests

### Benchmarks

<details>
<summary>
Forward pass, CPU. PTH interpolation vs PIL
</summary>

Cases:
- PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apply vs pears)
- PTH 1 Channel, float32 vs PIL 1 Channel Float

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
# OMP_NUM_THREADS=1 python bench_interp_aa_vs_pillow.py

Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON,

Num threads: 1
[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (320, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.9                |          3.1
      channels_last non-contiguous torch.float32  |                2.6                |          3.6

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (460, 220) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.4                |          4.0
      channels_last non-contiguous torch.float32  |                3.4                |          4.8

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 96) -------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                1.6                |          1.8
      channels_last non-contiguous torch.float32  |                1.6                |          1.9

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                9.0                |          11.3
      channels_last non-contiguous torch.float32  |                8.9                |          12.5

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.1                |          1.8
      channels_last non-contiguous torch.float32  |                2.1                |          3.4

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (320, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.2               |          1.0

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (460, 220) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.4               |          1.3

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              719.9              |         599.9

Times are in microseconds (us).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               3.7               |          3.5

Times are in milliseconds (ms).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              834.4              |         605.7

Times are in microseconds (us).

```

</details>

Code is moved from torchvision: https://github.com/pytorch/vision/pull/4208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65142

Reviewed By: mrshenli

Differential Revision: D32432405

Pulled By: jbschlosser

fbshipit-source-id: b66c548347f257c522c36105868532e8bc1d4c6d
2021-11-17 09:10:15 -08:00
vfdev-5
6adbe044e3 Added nearest-exact interpolation mode (#64501)
Summary:
Added "nearest-exact" interpolation mode to fix the issues: https://github.com/pytorch/pytorch/issues/34808 and https://github.com/pytorch/pytorch/issues/62237.

Description:

As we can not fix "nearest" mode without large impact on already trained model [it was suggested](https://github.com/pytorch/pytorch/pull/64501#pullrequestreview-749771815) to introduce new mode instead of fixing exising "nearest" mode.

- New mode "nearest-exact" performs index computation for nearest interpolation to match scikit-image, pillow, TF2 and while "nearest" mode still match opencv INTER_NEAREST, which appears to be buggy, see https://ppwwyyxx.com/blog/2021/Where-are-Pixels/#Libraries.

"nearest":
```
input_index_f32 = output_index * scale
input_index = floor(input_index_f32)
```

"nearest-exact"
```
input_index_f32 = (output_index + 0.5) * scale - 0.5
input_index = round(input_index_f32)
```

Comparisions with other libs: https://gist.github.com/vfdev-5/a5bd5b1477b1c82a87a0f9e25c727664

PyTorch version | 1.9.0 "nearest" | this PR "nearest" | this PR "nearest-exact"
---|---|---|---
Resize option: | |
OpenCV INTER_NEAREST result mismatches | 0 | 0 | 10
OpenCV INTER_NEAREST_EXACT result mismatches | 9 | 9 | 9
Scikit-Image result mismatches | 10 | 10 | 0
Pillow result mismatches | 10 | 10 | 7
TensorFlow result mismatches | 10 | 10 | 0
Rescale option: | | |
size mismatches (https://github.com/pytorch/pytorch/issues/62396) | 10 | 10 | 10
OpenCV INTER_NEAREST result mismatches | 3 | 3| 5
OpenCV INTER_NEAREST_EXACT result mismatches | 3 | 3| 4
Scikit-Image result mismatches | 4 | 4 | 0
Scipy result mismatches | 4 | 4 | 0
TensorFlow: no such option | - |  -

Versions:
```
skimage: 0.19.0.dev0
opencv: 4.5.4-dev
scipy: 1.7.2
Pillow: 8.4.0
TensorFlow: 2.7.0
```

Implementations in other libs:

- Pillow:
  - ee079ae67e/src/libImaging/Geometry.c (L889-L899)
  - ee079ae67e/src/libImaging/Geometry.c (L11)
  - `a[2] == 0`

- Scikit-Image :
  - dev v0.19.0 uses scipy ndi.zoom:
    - 38fae50c3f/skimage/transform/_warps.py (L180-L188)
    - 47bb6febaa/scipy/ndimage/src/ni_interpolation.c (L775-L779)
    - 47bb6febaa/scipy/ndimage/src/ni_interpolation.c (L479)

Additionally:
- Updated upsampling tests

cc ezyang gchanan albanD mruberry jbschlosser walterddr fmassa heitorschueroff ppwwyyxx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64501

Reviewed By: anjali411

Differential Revision: D32361901

Pulled By: jbschlosser

fbshipit-source-id: df906f4d25a2b2180e1942ffbab2cc14600aeed2
2021-11-15 14:28:19 -08:00
yanbing-j
12026124cc Avoid the view for mkldnn case in 1D convolution (#68166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68166

Reviewed By: mrshenli

Differential Revision: D32432444

Pulled By: jbschlosser

fbshipit-source-id: fc4e626d497d9e4597628a18eb89b94518bb3b33
2021-11-15 11:56:45 -08:00
eqy
a1ace029e2 Add host-side memory requirement for test_softmax_64bit_indexing (#67922)
Summary:
https://github.com/pytorch/pytorch/issues/67910
The original `largeTensorTest` decorator didn't account for the additional host-side memory requirements.
Thanks crcrpar for raising the issue, CC ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67922

Reviewed By: malfet

Differential Revision: D32308602

Pulled By: mruberry

fbshipit-source-id: 97b7d2c39fe63c1a8269402f72186026a89f6b4c
2021-11-11 09:24:15 -08:00
Dani El-Ayyass
f171c78c04 add unpack_sequence and unpad_sequence functions (#66550)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66550

Reviewed By: malfet

Differential Revision: D32299193

Pulled By: jbschlosser

fbshipit-source-id: 96c92d73d3d40b7424778b2365e0c8bb1ae56cfb
2021-11-10 15:15:08 -08:00
Joel Schlosser
9a2db6f091 Factor backend routing logic out of convolution forward (#67790)
Summary:
This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.

The function and enum are exposed to python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).

A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet:
* nnpack (for mobile)
* xnnpack (for mobile)
* winograd 3x3 (for mobile)

Some flowcharts for reference:
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/140828957-1135b400-38c0-4c9f-87ef-4f33ceebeeae.png)
![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/140828977-ed223a4e-aa86-49f1-9925-c0f6b9ab36af.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67790

Reviewed By: zou3519

Differential Revision: D32280878

Pulled By: jbschlosser

fbshipit-source-id: 0ce55174f470f65c9b5345b9980cf12251f3abbb
2021-11-10 07:53:55 -08:00
Xiao Wang
f6a4c80a5a Refactor cuDNN Convolution memory format and Conv-Bias-Relu code (#65594)
Summary:
This PR makes several changes:

- Changed function `bool cudnn_conv_use_channels_last(...)` to `at::MemoryFormat cudnn_conv_suggest_memory_format(...)`
- Removed `resize_` in cudnn convolution code. Added a new overloading method `TensorDescriptor::set` that also passes the desired memory format of the tensor.
- Disabled the usage of double + channels_last on cuDNN Conv-Relu and Conv-Bias-Relu. Call `.contiguous(memory_format)` before passing data to cuDNN functions.
- Disabled the usage of cuDNN fused Conv-Bias-Relu in cuDNN < 8.0 version due to a CUDNN_STATUS_NOT_SUPPORTED error. Instead, use the native fallback path.
- Let Conv-Bias-Relu code respect the global `allow_tf32` flag.

From cuDNN document, double + NHWC is genenrally not supported.

Close https://github.com/pytorch/pytorch/pull/66968

Fix https://github.com/pytorch/pytorch/issues/55301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65594

Reviewed By: jbschlosser, malfet

Differential Revision: D32175766

Pulled By: ngimel

fbshipit-source-id: 7ba079c9f7c46fc56f8bfef05bad0854acf380d7
2021-11-05 11:50:55 -07:00
Alban Desmaison
bb8978f605 Revert D32175963: Converting hardswish to strucutred kernels with metatensor support
Test Plan: revert-hammer

Differential Revision:
D32175963 (57335a9ee3)

Original commit changeset: f4d749c6aeaf

fbshipit-source-id: 6d68a96cf872c2d7b518c061875b9336bca0043a
2021-11-05 07:04:40 -07:00
John Clow
57335a9ee3 Converting hardswish to strucutred kernels with metatensor support (#66899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66899

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175963

Pulled By: Gamrix

fbshipit-source-id: f4d749c6aeaf064084be72361607ea4f3f6bc91d
2021-11-04 19:02:00 -07:00
soulitzer
83e8612d11 Clean up test autograd (#67413)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066

This PR:
 - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
 - tests related to an operator are better colocated
 - see the tracker for details

What to think about when moving tests to their correct test suite:
 - naming, make sure its not too generic
 - how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
 - can this be merged with existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413

Reviewed By: jbschlosser, albanD

Differential Revision: D32031480

Pulled By: soulitzer

fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
2021-11-03 15:26:09 -07:00
Xiao Wang
31cf3d6aad Fix adaptive_max_pool2d for channels-last on CUDA (#67697)
Summary:
Fix https://github.com/pytorch/pytorch/issues/67239

The CUDA kernels for `adaptive_max_pool2d` (forward and backward) were written for contiguous output. If outputs are non-contiguous, first create a contiguous copy and let the kernel write output to the contiguous memory space. Then copy the output from contiguous memory space to the original non-contiguous memory space.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67697

Reviewed By: ejguan

Differential Revision: D32112443

Pulled By: ngimel

fbshipit-source-id: 0e3bf06d042200c651a79d13b75484526fde11fe
2021-11-03 09:47:29 -07:00
John Shen
234bd6dc56 [quantized] Add bilinear quantized grid_sample (#66879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66879

This adds a quantized implementation for bilinear gridsample. Bicubic interpolation cannot be supported as easily since we rely on the linearity of quantization to operate on the raw values, i.e.

f(q(a), q(b)) = q(f(a, b)) where f is the linear interpolation function.
ghstack-source-id: 141321116

Test Plan: test_quantization

Reviewed By: kimishpatel

Differential Revision: D31656893

fbshipit-source-id: d0bc31da8ce93daf031a142decebf4a155943f0f
2021-11-01 14:44:26 -07:00
kshitij12345
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUandCUDA` with `onlyNativeDeviceTypes` which includes `cpu, cuda and meta`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
Joel Schlosser
16d937b0df Fix strided _conv_double_backward() with 3D input / weight (#67283)
Summary:
Removes the 3D special case logic in `_convolution_double_backward()` that never worked.

The logic was never called previously since `convolution()` expands input / weight from 3D -> 4D before passing them to backends; backend-specific backward calls thus save the 4D version to pass to `_convolution_double_backward()`.

The new general `convolution_backward()` saves the original 3D input / weight, uncovering the bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67283

Reviewed By: anjali411

Differential Revision: D32021100

Pulled By: jbschlosser

fbshipit-source-id: 0916bcaa77ef49545848b344d6385b33bacf473d
2021-10-29 09:48:53 -07:00
Sameer Deshmukh
edd4d246c3 Accept 0-dim channel inputs in convolution layer (#66256)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56998 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66256

Reviewed By: mrshenli

Differential Revision: D31859428

Pulled By: jbschlosser

fbshipit-source-id: 034b6c1ce35aac50eabfa09bbcd8b1e3c8b171bd
2021-10-25 08:12:29 -07:00
Eddie Yan
d9c4b3feab Do rowwisemoments computation in float for half LayerNorm (#66920)
Summary:
https://github.com/pytorch/pytorch/issues/66707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66920

Reviewed By: mrshenli

Differential Revision: D31850612

Pulled By: ngimel

fbshipit-source-id: a95a33567285dcf9ee28d33f503cead3268960f9
2021-10-22 09:50:42 -07:00