Commit Graph

1243 Commits

Author SHA1 Message Date
Kurt Mohler
4cfd09d7bc Reland: Add index value checking to MaxUnpool2d and MaxUnpool3d (#78280)
Relanding #70545
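A hedged sketch of the rechecked behavior (illustrative code, not from the PR; the error type is assumed to be a RuntimeError from a TORCH_CHECK):

```python
import torch
import torch.nn as nn

# Out-of-range indices are now rejected instead of causing undefined behavior.
pool = nn.MaxPool2d(2, return_indices=True)
unpool = nn.MaxUnpool2d(2)
pooled, indices = pool(torch.randn(1, 1, 4, 4))
unpool(pooled, indices)                 # valid indices: works as before
bad = torch.full_like(indices, 10_000)  # far out of range for a 4x4 output
try:
    unpool(pooled, bad)
except RuntimeError as e:               # error type assumed
    print(e)
```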
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78280
Approved by: https://github.com/jbschlosser
2022-06-03 20:09:07 +00:00
samdow
b7cb4eae6b Fix embedding jvp support by making embedding_renorm ignore forward mode AD (#78560)
On functorch, we started seeing [embedding forward mode fail](https://github.com/pytorch/functorch/pull/816). Looking into it, we found that [forward-mode support for embedding was recently enabled](369d9f4137), and since [max_norm doesn't work with gradcheck](https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py#L8877-L8881), the forward-mode path with max_norm isn't checked.

What was happening is that `embedding_renorm` was setting `torch.no_grad()`, which only turns off backward-mode AD, so functorch's jvp tests were still using forward-mode AD during the `embedding_renorm` call. This change makes sure that forward mode is also disabled during the `embedding_renorm` call.
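A minimal sketch of the underlying behavior (illustrative code, not the PR's):

```python
import torch
import torch.autograd.forward_ad as fwAD

# torch.no_grad() disables only backward-mode AD, so forward-mode
# tangents keep propagating through ops run inside it.
with fwAD.dual_level():
    x = fwAD.make_dual(torch.randn(3), torch.ones(3))
    with torch.no_grad():
        y = x * 2  # no backward graph is recorded here...
    # ...but the tangent still propagated: tensor([2., 2., 2.])
    print(fwAD.unpack_dual(y).tangent)
```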
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78560
Approved by: https://github.com/soulitzer, https://github.com/albanD
2022-06-03 19:14:51 +00:00
Eddie Yan
14b0e9e75f [cuDNN] Don't enforce bitwise exact results in test_conv_transposed_large_cuda (#78147)
`test_conv_transposed_large` expects bitwise-identical fp16 results on CUDA, but cuDNN doesn't guarantee this (e.g., when FFT-based algorithms are selected).

This PR just changes the tolerance on the test to account for these cases.
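A sketch of the approach (illustrative tolerances, not the PR's test code):

```python
import torch

# Rather than requiring bitwise-equal fp16 outputs, compare against a
# reference with explicit tolerances; the atol/rtol values are illustrative.
expected = torch.randn(8, 16, 32, 32).half()
noise = (1e-4 * torch.randn(8, 16, 32, 32)).half()
actual = expected + noise  # stand-in for a cuDNN result
torch.testing.assert_close(actual, expected, atol=5e-3, rtol=5e-3)
```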

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78147
Approved by: https://github.com/ngimel
2022-06-03 19:03:24 +00:00
Eddie Yan
b740a99b9e [cuDNN][TF32] Threshold adjustments for TF32 on >=sm80 (#78437)
CC @ptrblck @mcarilli

The change to the transformer multilayer test could potentially be swapped for an rtol change instead (see also: #75612).
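For context, a sketch of the switches involved (not part of the PR): TF32 is what loosens fp32 accuracy on sm80+ and motivates the threshold adjustments.

```python
import torch

# Tests that need full fp32 precision can opt out of TF32 explicitly.
torch.backends.cudnn.allow_tf32 = False        # convolutions
torch.backends.cuda.matmul.allow_tf32 = False  # matmuls
```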
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78437
Approved by: https://github.com/ngimel
2022-06-03 01:02:56 +00:00
PyTorch MergeBot
d578197747 Revert "Fix embedding jvp support by making embedding_renorm ignore forward mode AD (#78560)"
This reverts commit ce7c7bb2a9.

Reverted https://github.com/pytorch/pytorch/pull/78560 on behalf of https://github.com/malfet because it broke XLA (on CI and trunk), see ce7c7bb2a9
2022-06-02 17:40:34 +00:00
samdow
ce7c7bb2a9 Fix embedding jvp support by making embedding_renorm ignore forward mode AD (#78560)
On functorch, we started seeing [embedding forward mode fail](https://github.com/pytorch/functorch/pull/816). Looking into it, we found that [forward-mode support for embedding was recently enabled](369d9f4137), and since [max_norm doesn't work with gradcheck](https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_methods_invocations.py#L8877-L8881), the forward-mode path with max_norm isn't checked.

What was happening is that `embedding_renorm` was setting `torch.no_grad()`, which only turns off backward-mode AD, so functorch's jvp tests were still using forward-mode AD during the `embedding_renorm` call. This change makes sure that forward mode is also disabled during the `embedding_renorm` call.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78560
Approved by: https://github.com/soulitzer, https://github.com/albanD
2022-06-02 13:40:21 +00:00
Edward Z. Yang
c20969c40c Fix ParameterList printing meta tensor
Fixes https://github.com/pytorch/pytorch/issues/78250

There are actually two bugs.  First, the crash is caused
by TensorOptions::backend incorrectly reporting noexcept when
it can fail.  Second, ParameterList is using torch.tensortype
for no good reason; we can just print the dtype instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78529

Approved by: https://github.com/albanD
2022-06-01 00:46:52 +00:00
mikeiovine
d6db5ea50d Back out "add mixed data type mode for LayerNorm forward path"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78298

Also back out "improve LayerNorm bfloat16 performance on CPU".

These layer norm changes seem fine, but they were preventing `LayerNorm` from using AVX2 instructions, which degraded performance on internal models. More investigation is needed to find the true root cause, but we should unland to mitigate the issue ASAP.

I left `mixed_data_type.h` around since there are some other files depending on it.

Differential Revision: [D36675352](https://our.internmc.facebook.com/intern/diff/D36675352/)

Approved by: https://github.com/tenpercent
2022-05-26 02:54:13 +00:00
PyTorch MergeBot
c50089712c Revert "Add index value checking to MaxUnpool2d and MaxUnpool3d (#70545)"
This reverts commit 53ef66bb59.

Reverted https://github.com/pytorch/pytorch/pull/70545 on behalf of https://github.com/malfet as it broke the cuda-10.2 test on trunk, see 53ef66bb59
2022-05-23 23:58:43 +00:00
Kurt Mohler
53ef66bb59 Add index value checking to MaxUnpool2d and MaxUnpool3d (#70545)
Fixes #68727

cc @mruberry @jbschlosser @walterddr @kshitij12345 @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70545
Approved by: https://github.com/ngimel
2022-05-23 21:08:25 +00:00
yuguo68
c186250d95 raise error when groups is not positive in Conv modules
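A hedged sketch of the new validation (the exact exception type is assumed to be ValueError):

```python
import torch.nn as nn

nn.Conv2d(4, 4, kernel_size=3, groups=2)      # fine: positive groups
try:
    nn.Conv2d(4, 4, kernel_size=3, groups=0)  # now rejected up front
except ValueError as e:                       # exception type assumed
    print(e)
```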
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77919

Approved by: https://github.com/jbschlosser
2022-05-23 20:35:00 +00:00
Jeff Daily
9aed30d3ad [ROCm] support benchmark flag for MIOpen (#77438)
Fixes #68172.  Generally, this fixes the flaky behavior of multiple convolution unit tests seen on ROCm.

The MIOpen integration has been forcing benchmark=True even when `torch._C._set_cudnn_benchmark(False)` is called (typically via `torch.backends.cudnn.set_flags(enabled=True, benchmark=False)`).  We now add support for MIOpen immediate mode to avoid benchmarking during MIOpen solution selection.
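A sketch of the user-facing flag on ROCm builds (illustrative):

```python
import torch

# With this change, benchmark=False is honored via MIOpen immediate mode
# instead of being silently forced back to benchmarking.
torch.backends.cudnn.benchmark = False  # MIOpen immediate mode: no autotuning
torch.backends.cudnn.benchmark = True   # opt back into algorithm benchmarking
```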
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77438
Approved by: https://github.com/ngimel, https://github.com/malfet
2022-05-23 17:10:24 +00:00
zrphercule
734a97a7c8 Revert "Revert "Switch to use nested tensor by-default in TransformerEncoder (#77217)"" (#77924)

This reverts commit 0d6fa91d1b.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77924
Approved by: https://github.com/atalman
2022-05-20 11:44:03 +00:00
George Qi
f9db8b72ac MHA forward pass bug fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77761

Approved by: https://github.com/jbschlosser
2022-05-19 01:21:24 +00:00
Joel Benjamin Schlosser
8881d7ac6c Support no-batch-dim for CrossEntropyLoss with prob target
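A hedged sketch of the supported call (shapes illustrative):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5)                        # (C,), no batch dimension
target = torch.softmax(torch.randn(5), dim=0)  # (C,) class probabilities
loss = F.cross_entropy(logits, target)
```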
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77653

Approved by: https://github.com/albanD
2022-05-18 19:51:09 +00:00
Nikita Vedeneev
a760dc2687 binary_cross_entropy: double backward wrt target (#77416)
As per title. An effort to make `binary_cross_entropy` all around differentiable.
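A hedged sketch of what this enables (illustrative values; double precision as gradcheck recommends):

```python
import torch
from torch.autograd import gradgradcheck

# Second-order gradcheck with a target that itself requires grad.
inp = torch.rand(4, dtype=torch.double).clamp(1e-2, 1 - 1e-2).requires_grad_()
tgt = torch.rand(4, dtype=torch.double).requires_grad_()
gradgradcheck(torch.nn.functional.binary_cross_entropy, (inp, tgt))
```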

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77416
Approved by: https://github.com/soulitzer
2022-05-18 10:29:27 +00:00
Rui Zhu
4e2f5507d0 Add support for TxT mask layout for masked_softmax in BetterTransformer (#77607)
Summary: Expand the mask to BxHxDxD when the mask has DxD layout
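A sketch of the expansion described in the summary (names illustrative):

```python
import torch

B, H, T = 2, 4, 8
mask = torch.rand(T, T) > 0.5       # a single TxT mask shared by all heads
expanded = mask.expand(B, H, T, T)  # broadcast view to BxHxTxT, no copy
```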

Test Plan: buck build mode/opt -c fbcode.platform=platform009 -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/opt/gen/caffe2/test/nn\#binary.par -r masked_softmax_DxD

Differential Revision: D36428170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77607
Approved by: https://github.com/cpuhrsch
2022-05-18 01:31:05 +00:00
PyTorch MergeBot
d8b80edade Revert "Use weakref.proxy when saving module to internal dictionaries to not increase refcount (#76435)"
This reverts commit 1aa3cbb83b.

Reverted https://github.com/pytorch/pytorch/pull/76435 on behalf of https://github.com/jbschlosser
2022-05-17 17:51:26 +00:00
mingfeima
c003494754 add channels last support for PixelShuffle and PixelUnshuffle
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50573

Approved by: https://github.com/VitalyFedyunin
2022-05-17 17:33:49 +00:00
Edward Z. Yang
b5bc954a71 Fix optional dtype/layout/memory_format pycall; fix memory format
Double-header bug fix:

- As reported by jansel, dtypes are still showing up as integers
  when the schema is an optional dtype.  This is simple enough to
  fix and I added a test for it.  But while I was at it...

- I noticed that the THPMemoryFormat_new idiom with the "unused" name
  doesn't actually work: the repr of the returned memory format
  object is wrong, and this shows up when we try to log the args/kwargs.
  So I fixed memory format to do it properly along with everything
  else.

Fixes https://github.com/pytorch/pytorch/issues/77135

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77543

Approved by: https://github.com/albanD, https://github.com/jansel
2022-05-16 16:46:08 +00:00
mingfeima
8c50414233 add BFloat16 support for BatchNorm on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77496

Approved by: https://github.com/frank-wei
2022-05-16 16:31:18 +00:00
mingfeima
6fa20bdfe8 add native kernel for weight_norm on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73845

Approved by: https://github.com/frank-wei
2022-05-16 06:36:24 +00:00
PyTorch MergeBot
93a969221d Revert "add BFloat16 support for BatchNorm on CPU"
This reverts commit 7c8911ca7a.

Reverted https://github.com/pytorch/pytorch/pull/74410 on behalf of https://github.com/albanD
2022-05-14 14:28:58 +00:00
mingfeima
7c8911ca7a add BFloat16 support for BatchNorm on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74410

Approved by: https://github.com/frank-wei
2022-05-14 07:49:00 +00:00
Rohan Varma
a275491c6f [Reland] load_state_dict post hook (#77392)
Reland of https://github.com/pytorch/pytorch/pull/76823 with fixes to call `__setstate__` for softmax/softmin/logsoftmax as per discussion with @albanD and @jbschlosser. Original description:

Implements `register_load_state_dict_post_hook` API as discussed in https://github.com/pytorch/pytorch/issues/75287.

Unit tests cover:
- Ensuring hooks are called with the correct module
- The hook is called with the `IncompatibleKeys` field
- If the hook modifies this, `load_state_dict` returns the modified result (see the sketch below)
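
A minimal sketch of the API (hook signature `(module, incompatible_keys)`, per the linked issue):

```python
import torch.nn as nn

def post_hook(module, incompatible_keys):
    # incompatible_keys carries .missing_keys and .unexpected_keys
    print("missing:", incompatible_keys.missing_keys)
    print("unexpected:", incompatible_keys.unexpected_keys)

m = nn.Linear(2, 2)
m.register_load_state_dict_post_hook(post_hook)
m.load_state_dict(m.state_dict())  # hook fires after loading completes
```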

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77392
Approved by: https://github.com/jbschlosser
2022-05-14 06:06:23 +00:00
mingfeima
59b56ba785 improve group_norm channels last performance on CPU
add channels_last_3d memory format support

add BFloat16 support on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69067

Approved by: https://github.com/VitalyFedyunin
2022-05-14 03:13:02 +00:00
Kulin Seth
e011a8e18b Enable PyTorch operations on MPS Backend. (#77343)
Add PyTorch operations to MPS backend.

- https://github.com/pytorch/pytorch/issues/77394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77343
Approved by: https://github.com/albanD
2022-05-13 18:28:53 +00:00
mingfeima
2b7943c47c fix torchvision failing case test_classification_model on slow_conv2d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77347

Approved by: https://github.com/datumbox, https://github.com/frank-wei
2022-05-13 08:04:08 +00:00
PyTorch MergeBot
d92b0a51aa Revert "Load state dict post hook"
This reverts commit 56bed0dcfe.

Reverted https://github.com/pytorch/pytorch/pull/76823 on behalf of https://github.com/rohan-varma
2022-05-12 21:00:49 +00:00
ecao
37c6017831 Add BFloat16 support for GLU and randperm operators on CPU (#61944)
add BFloat16 support for GLU and randperm operators on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61944
Approved by: https://github.com/frank-wei
2022-05-12 17:41:57 +00:00
yanbing-j
4f82f439d1 Enable BFloat16 ELU, SELU and CELU in CPU path (#62546)
Enable BFloat16 ELU, SELU and CELU in the CPU path. SELU and CELU will call the ELU implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546
Approved by: https://github.com/frank-wei
2022-05-12 16:56:57 +00:00
mingfeima
3b56efd4e1 add mixed data type mode for LayerNorm forward path
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73844

Approved by: https://github.com/frank-wei
2022-05-12 03:35:06 +00:00
otaj
1aa3cbb83b Use weakref.proxy when saving module to internal dictionaries to not increase refcount (#76435)
Fixes #76434
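A generic sketch of the technique named in the title (plain Python, not PyTorch internals; `Child` is a hypothetical stand-in):

```python
import weakref

class Child:  # hypothetical stand-in for a module
    pass

c = Child()
registry = {"child": weakref.proxy(c)}  # usable mapping, no strong reference
del c  # the proxy does not keep the object alive; access now raises ReferenceError
```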

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76435
Approved by: https://github.com/jbschlosser
2022-05-11 18:40:59 +00:00
mingfeima
3d0e6f169c add channels last support for slow_conv_dilated2d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70665

Approved by: https://github.com/VitalyFedyunin
2022-05-11 15:28:50 +00:00
Rui Zhu
533b44a280 Add _native nested_tensor_from_mask (#76942)
Summary: Lets users convert nested tensors more easily. Some implementation details might change based on user needs.

Test Plan: buck test mode/dev caffe2/test:nn -- test_nested_tensor_from_mask

Differential Revision: D36191182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76942
Approved by: https://github.com/jbschlosser
2022-05-11 05:19:36 +00:00
mingfeima
3d561ee926 add channels last support for thnn_conv2d (non-dilated)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68101

Approved by: https://github.com/VitalyFedyunin
2022-05-11 00:09:45 +00:00
neverix
87e543da9b Add load_state_dict error message for non-dicts (#77197)
Fixes #76886
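A hedged sketch of the new behavior (the exception type is assumed to be TypeError):

```python
import torch.nn as nn

try:
    nn.Linear(2, 2).load_state_dict(["not", "a", "state dict"])
except TypeError as e:  # exception type assumed
    print(e)  # clear error instead of an obscure attribute failure
```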
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77197
Approved by: https://github.com/jbschlosser
2022-05-10 22:11:51 +00:00
Aidyn-A
a127c584a0 Fix max pool forward nhwc (#76597)
Fixes issue #76432.

Added dilation to loops in CUDA kernel.

cc @ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76597
Approved by: https://github.com/ngimel
2022-05-10 17:39:48 +00:00
mingfeima
8d4e069e66 add BFloat16 support for UpSample on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76935

Approved by: https://github.com/frank-wei
2022-05-10 16:56:41 +00:00
Scott Wolchok
e5915a2216 [PyTorch] Don't enter MHA fast path when bias & query dtypes don't match
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76879

The fast path does not support this: transform_bias_rescale_qkv will try to grab bias.data_ptr() assuming the dtypes are the same. (Also, I have no idea how this happens.)

Differential Revision: [D36156872](https://our.internmc.facebook.com/intern/diff/D36156872/)

Approved by: https://github.com/cpuhrsch
2022-05-09 18:21:04 +00:00
Rohan Varma
56bed0dcfe Load state dict post hook
Implements `register_load_state_dict_post_hook` API as discussed in https://github.com/pytorch/pytorch/issues/75287.

Unit tests cover:
- Ensuring hooks are called with the correct module
- The hook is called with the `IncompatibleKeys` field
- If the hook modifies this, `load_state_dict` returns the modified result

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76823
Approved by: https://github.com/albanD
2022-05-05 19:27:05 +00:00
lkct
b8776e143f Fix false DeprecationWarning in Module.state_dict
Fixes #75404

TODO:
- [x] add tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75507
Approved by: https://github.com/jbschlosser
2022-05-04 20:08:23 +00:00
Nikita Shulga
b074bffa41 Revert D28836788: add BFloat16 support for UpSample on CPU
Test Plan: revert-hammer

Differential Revision:
D28836788 (1399d83bc0)

Original commit changeset: 63dc45e5bb91

Original Phabricator Diff: D28836788 (1399d83bc0)

fbshipit-source-id: 92733af87cba87aed800473ff44ca6d7af037da9
(cherry picked from commit 1c9fc492503b768a343723e4cf347b30bf5dcfc2)
2022-05-02 23:13:39 +00:00
mingfeima
1399d83bc0 add BFloat16 support for UpSample on CPU (#58297)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58297

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D28836788

Pulled By: VitalyFedyunin

fbshipit-source-id: 63dc45e5bb91964d5ff1110262228718289435d1
(cherry picked from commit 8a37d607d6a89ccb50364cf54a6f26ca8d05cab9)
2022-05-02 22:33:26 +00:00
Scott Wolchok
e816e17655 [PyTorch] Add native fast path for transformer encoder inference (#76333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76333

The current PyTorch multi-head attention and transformer
implementations are slow. This should speed them up for inference.
ghstack-source-id: 154737857

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: cpuhrsch

Differential Revision: D35239925

fbshipit-source-id: 5a7eb8ff79bc6afb4b7d45075ddb2a24a6e2df28
2022-04-26 12:58:03 -04:00
Jon Janzen
2387efd356 Revert "[PyTorch] Add native fast path for transformer encoder inference"
This reverts commit b369b89f23.

This has internal changes and should not have been landed via mergebot.

Ref: https://github.com/pytorch/pytorch/pull/75809#issuecomment-1108717166
2022-04-25 11:40:02 -04:00
Scott Wolchok
b369b89f23 [PyTorch] Add native fast path for transformer encoder inference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75809

The current PyTorch multi-head attention and transformer
implementations are slow. This should speed them up for inference.

Differential Revision: [D35239925](https://our.internmc.facebook.com/intern/diff/D35239925/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35239925/)!

Approved by: https://github.com/ezyang
2022-04-25 06:11:36 +00:00
Peter Bell
cb37e7a080 Remove F.pad python implementation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73433

Approved by: https://github.com/albanD, https://github.com/jbschlosser
2022-04-23 00:13:20 +00:00
Joel Benjamin Schlosser
041e6e750a Fix to support no-batch-dim inputs in ConvTransposeNd._output_padding
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76151

Approved by: https://github.com/albanD
2022-04-22 19:25:09 +00:00
Nikita Vedeneev
9e137ee583 more numerically stable cosine_similarity
**Previous behavior**: compute the inner product, then normalize.
**This patch**: normalize first, then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms.
By design, this ensures that the cosine similarity is within `[-1.0, +1.0]`, so it should fix [#29442](https://github.com/pytorch/pytorch/issues/29442).

P.S. I had to change the tests because this implementation handles division by 0 differently.
This PR computes cosine similarity as <x/max(eps, ||x||), y/max(eps, ||y||)>.
Let f(x,y) = <x,y>/(||x|| * ||y||); then
df/dx = y/(||x|| * ||y||) - (||y||/||x|| * <x,y> * x)/(||x|| * ||y||)^2.
The changed test checks division by zero in backward when x = 0 and y != 0.
In this case the non-zero part of the gradient is just y/(||x|| * ||y||).
The previous implementation evaluates y/(||x|| * ||y||) as y/eps, while this PR evaluates it as (1/eps) * y/||y||.
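A sketch of the computation described above (illustrative Python, not the actual ATen kernel):

```python
import torch

def cosine_similarity_sketch(x, y, dim=-1, eps=1e-8):
    # Clamp each norm by eps, normalize, then take the inner product;
    # the result is bounded to [-1, 1] by construction.
    x = x / x.norm(dim=dim, keepdim=True).clamp_min(eps)
    y = y / y.norm(dim=dim, keepdim=True).clamp_min(eps)
    return (x * y).sum(dim=dim)

print(cosine_similarity_sketch(torch.randn(4, 8), torch.randn(4, 8)))
```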
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31378
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-04-22 09:28:50 +00:00