271 Commits

Author SHA1 Message Date
mingfeima
054b90f0d6 add channels last support for ChannelShuffle (#50247)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50247

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26007052

Pulled By: VitalyFedyunin

fbshipit-source-id: 08f737d64a65791c8002ffd56b79b02cf14d6159
2022-01-14 11:55:21 -08:00
Rui Zhu
9267fd8d73 [WIP] [ATen] Add native_multi_attention_self_attention CPU + GPU implementation (#70649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70649

As described in https://fb.quip.com/oxpiA1uDBjgP

This implements the first parts of the RFC, and is a rough draft showing the approach. The idea is that for the first cut we can maintain very close (identical I believe in this diff) numerical equivalence to the existing nn.MHA implementation, which is what this diff attempts to do. In subsequent implementations, once we have a working and adopted native self-attention implementation, we could then explore alternative implementations, etc.

The current implementation is similar to existing dedicated implementations such as LightSeq/FasterTransformer/DeepSpeed, and for MHA on both CPUs and GPUs is between 1.2x and 2x faster depending on the setting. It makes some approximations/restrictions (doesn't handle masking in masked softmax, etc), but these shouldn't materially impact performance.

This does the first few items:

* Add native_multi_head_attention(...) and native_multi_head_attention_backward(...) to native_functions.yaml
* Implement native_multi_head_attention(..) on GPU, extracting bits and pieces out of LS/DS/FT as appropriate
* Implement native_multi_head_attention(..) on CPU

The backward implementation is still WIP, but the idea would be to:

* Hook these up in derivatives.yaml
* Implement native_multi_head_attention_backward(..) on GPU, extracting bits and pieces out of LS/DS (not FT since it’s inference only)
* Implement native_multi_head_attention_backward(..) on CPU
* In torch.nn.functional.multi_head_attention_forward 23321ba7a3/torch/nn/functional.py (L4953), add some conditionals to check if we are being called in a BERT/ViT-style encoder fashion, and invoke the native function directly.

Test Plan: TODO

Reviewed By: mikekgfb

Differential Revision: D31829981

fbshipit-source-id: c430344d91ba7a5fbee3138e50b3e62efbb33d96
2022-01-08 21:50:41 -08:00
lezcano
a35b4b49d2 Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function in line with
the documentation in the rest of `torch.linalg`.
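
As a rough illustration of what an LU factorization with partial pivoting computes, here is a minimal pure-Python sketch. It is not the PyTorch implementation: the name `lu_factor_sketch`, the list-of-rows input, and the 1-based pivot convention (borrowed from LAPACK) are assumptions made for this example. An empty input simply yields empty outputs, mirroring the zero-size behaviour described above.

```python
def lu_factor_sketch(a):
    """Return (lu, pivots) for a square matrix given as a list of rows.

    `lu` packs the unit-lower factor L (below the diagonal) and the upper
    factor U (on and above the diagonal) into one matrix.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    pivots = []
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        pivots.append(p + 1)  # 1-based row index (an assumed convention)
        lu[k], lu[p] = lu[p], lu[k]
        if lu[k][k] == 0.0:
            continue  # singular column: nothing to eliminate
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, pivots


lu, pivots = lu_factor_sketch([[4.0, 3.0], [6.0, 3.0]])
empty_lu, empty_pivots = lu_factor_sketch([])  # zero-size input
```

The `_ex` variant mentioned above is not modelled here; the sketch only shows the factorization itself.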

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32834069

Pulled By: mruberry

fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb
2022-01-05 20:32:12 -08:00
Heitor Schueroff
34c49d3d3b Document torch.quantile interpolation kwarg (#70637)
Summary:
clone of https://github.com/pytorch/pytorch/pull/59397

This PR documents the interpolation kwarg parameter added in https://github.com/pytorch/pytorch/issues/49267. Now that the forward compatibility period is over, we can expose this parameter.
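
To make the effect of the interpolation kwarg concrete, the following pure-Python sketch computes a quantile of a 1-D list under the five modes (`linear`, `lower`, `higher`, `midpoint`, `nearest`). The function name `quantile_sketch` is hypothetical and the code does not call PyTorch; in particular, how `nearest` resolves exact ties is Python's `round` behaviour here, which is an assumption rather than a statement about `torch.quantile`.

```python
import math


def quantile_sketch(values, q, interpolation="linear"):
    """Quantile of a non-empty 1-D list for 0 <= q <= 1."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)  # fractional index of the quantile
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "lower":
        return xs[lo]
    if interpolation == "higher":
        return xs[hi]
    if interpolation == "midpoint":
        return (xs[lo] + xs[hi]) / 2
    if interpolation == "nearest":
        return xs[round(pos)]  # tie handling is Python's, an assumption
    if interpolation == "linear":
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    raise ValueError(f"unknown interpolation mode: {interpolation!r}")


data = [1.0, 2.0, 3.0, 4.0]
median_linear = quantile_sketch(data, 0.5)
median_lower = quantile_sketch(data, 0.5, interpolation="lower")
```

The modes only differ when the quantile falls between two data points; when `pos` is an integer, all of them return the same element.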

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70637

Reviewed By: jbschlosser

Differential Revision: D33411707

Pulled By: anjali411

fbshipit-source-id: f5f2d0a6739b3a855bbdf58fc671ac2f0342ce69
2022-01-05 11:02:13 -08:00
Joel Schlosser
e6c3aa3880 Remove backward ops for mkldnn convolution (#70467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70467

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33342476

Pulled By: jbschlosser

fbshipit-source-id: 9811d02b16adea0dd1dd2500261f4b3b294d2dee
2021-12-30 14:29:22 -08:00
anjali411
3e6164449f Add efficient zero tensors (#64837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32834987

Pulled By: anjali411

fbshipit-source-id: 20ea08ade0db0044ca633d9c1a117a6a2e65d1fd
2021-12-08 10:37:39 -08:00
Mark Richardson
834bd3134e Back out "Add efficient zero tensors" (#69327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327

Original commit changeset: d44096d88265

Original Phabricator Diff: D32144240 (668574af4a)

Test Plan:
CI

original diff failed 175 builds in CI

Reviewed By: airboyang, anjali411

Differential Revision: D32809407

fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071
2021-12-02 19:11:41 -08:00
anjali411
668574af4a Add efficient zero tensors (#64837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32144240

Pulled By: anjali411

fbshipit-source-id: d44096d882657c7f9270a16636900e0b73cefa40
2021-12-02 08:47:45 -08:00
Mike Ruberry
6ae34ea6f8 Revert D32521980: Add linalg.lu_factor
Test Plan: revert-hammer

Differential Revision:
D32521980 (b10929a14a)

Original commit changeset: 26a49ebd87f8

fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82
2021-11-28 17:22:15 -08:00
lezcano
b10929a14a Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function in line with
the documentation in the rest of `torch.linalg`.

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32521980

Pulled By: mruberry

fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
2021-11-27 17:52:48 -08:00
lezcano
b46c89d950 Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
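
For readers unfamiliar with triangular solves, the sketch below shows the underlying substitution for the `left=True` case with a single right-hand side. It is a pure-Python illustration, not the PR's code: the name `solve_triangular_sketch` and the list-based inputs are assumptions, and batching, conjugation and strides are ignored.

```python
def solve_triangular_sketch(a, b, upper):
    """Solve a @ x = b for x, where `a` is a triangular list of rows.

    Back substitution is used when `upper` is true, forward substitution
    otherwise. Only the relevant triangle of `a` is read.
    """
    n = len(a)
    x = [0.0] * n
    order = range(n - 1, -1, -1) if upper else range(n)
    for i in order:
        # Unknowns already solved: to the right (upper) or left (lower) of i.
        cols = range(i + 1, n) if upper else range(i)
        partial = sum(a[i][j] * x[j] for j in cols)
        x[i] = (b[i] - partial) / a[i][i]
    return x


x_upper = solve_triangular_sketch([[2.0, 1.0], [0.0, 4.0]], [4.0, 8.0], upper=True)
x_lower = solve_triangular_sketch([[2.0, 0.0], [1.0, 4.0]], [2.0, 9.0], upper=False)
```

The `left=False` case (solving `x @ a = b`) reduces to this one by transposing: `x @ a = b` is equivalent to `aᵀ @ xᵀ = bᵀ`, and transposing flips upper and lower.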

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32588230

Pulled By: mruberry

fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
2021-11-22 12:41:06 -08:00
jiej
ca92111758 Add native_dropout (#63937)
Summary:
Adds native_dropout to give TorchScript a reasonable target in autodiff. native_dropout takes scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937

Reviewed By: mruberry

Differential Revision: D32477657

Pulled By: ngimel

fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
2021-11-18 19:41:10 -08:00
Jane Xu
9f4e004abd Revert D32283178: Add linalg.solve_triangular
Test Plan: revert-hammer

Differential Revision:
D32283178 (0706607abc)

Original commit changeset: deb672e6e52f

fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755
2021-11-18 14:46:10 -08:00
lezcano
0706607abc Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: zou3519, JacobSzwejbka

Differential Revision: D32283178

Pulled By: mruberry

fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8
2021-11-18 09:45:51 -08:00
Rok
952ca25daa Sparse CSR: add convert_indices_from_csr_to_coo (#66774)
Summary:
This PR adds conversion from CSR to COO.
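
As an illustration of the index conversion, the sketch below expands CSR compressed row pointers into one explicit row index per stored element, which is what the COO layout needs. It is a pure-Python sketch with a hypothetical name, `csr_rows_to_coo_rows`, and does not reproduce the signature of the operator added by the PR.

```python
def csr_rows_to_coo_rows(crow_indices):
    """Expand CSR row pointers into one row index per stored element.

    Row r owns the elements crow_indices[r] .. crow_indices[r + 1] - 1,
    so its index is repeated once for each of them.
    """
    rows = []
    for r in range(len(crow_indices) - 1):
        count = crow_indices[r + 1] - crow_indices[r]
        rows.extend([r] * count)
    return rows


# A 3-row matrix with 2 elements in row 0, none in row 1, 1 in row 2.
coo_rows = csr_rows_to_coo_rows([0, 2, 2, 3])
```

The column indices are shared between the two layouts, so only the row information has to be converted.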

Fixes https://github.com/pytorch/pytorch/issues/56959

cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774

Reviewed By: zou3519

Differential Revision: D32288415

Pulled By: cpuhrsch

fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968
2021-11-17 22:28:30 -08:00
rusty1s
9807787135 scatter_reduce (#68115)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63780

Basic functionality of a `scatter_reduce` algorithm with `reduce="sum"`:

* `scatter_reduce` is named `scatter_reduce2` due to compilation issues
* It currently re-uses functionality from `scatter_add`
* Tests are missing: WIP
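
To show what a scatter with `reduce="sum"` does, here is a minimal 1-D pure-Python sketch. The name `scatter_sum_sketch` is hypothetical; the optional `output_size` argument is modelled loosely on the `output_size` parameter visible in the signatures of the error log below, and defaulting it to `max(index) + 1` is an assumption of this example.

```python
def scatter_sum_sketch(src, index, output_size=None):
    """Sum src[i] into out[index[i]] for every i (1-D case only)."""
    if len(src) != len(index):
        raise ValueError("src and index must have the same length")
    if output_size is None:
        # Assumed default: just large enough to hold every target slot.
        output_size = max(index) + 1 if index else 0
    out = [0] * output_size
    for value, target in zip(src, index):
        out[target] += value
    return out


summed = scatter_sum_sketch([1, 2, 3, 4], [0, 1, 0, 1])
padded = scatter_sum_sketch([1, 2, 3, 4], [0, 1, 0, 1], output_size=3)
```

Slots that no index points to keep the initial value of zero, as in the `padded` example.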

The error when the `scatter_reduce` naming is used:
```
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13949:18: error: redefinition of ‘struct at::_ops::scatter_reduce’
13949 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13817:18: note: previous definition of ‘struct at::_ops::scatter_reduce’
13817 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13960:18: error: redefinition of ‘struct at::_ops::scatter_reduce_out’
13960 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13839:18: note: previous definition of ‘struct at::_ops::scatter_reduce_out’
13839 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
In file included from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/core/TensorBody.h: In member function ‘at::Tensor at::Tensor::scatter_reduce(int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>) const’:
aten/src/ATen/core/TensorBody.h:3976:83: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 3976 |     return at::_ops::scatter_reduce::call(const_cast<Tensor&>(*this), dim, index, reduce, output_size);
      |                                                                                   ^~~~~~
      |                                                                                   |
      |                                                                                   c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor at::scatter_reduce(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7119:61: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7119 |     return at::_ops::scatter_reduce::call(self, dim, index, reduce, output_size);
      |                                                             ^~~~~~
      |                                                             |
      |                                                             c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_out(at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7124:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7124 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_outf(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>, at::Tensor&)’:
aten/src/ATen/Functions.h:7129:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7129 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from aten/src/ATen/NativeFunctions.h:6,
                 from ../aten/src/ATen/TensorIndexing.h:12,
                 from ../aten/src/ATen/ATen.h:20,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/NativeMetaFunctions.h: At global scope:
aten/src/ATen/NativeMetaFunctions.h:496:18: error: redefinition of ‘struct at::meta::structured_scatter_reduce’
  496 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
aten/src/ATen/NativeMetaFunctions.h:481:18: note: previous definition of ‘struct at::meta::structured_scatter_reduce’
  481 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68115

Reviewed By: albanD

Differential Revision: D32488450

Pulled By: cpuhrsch

fbshipit-source-id: 65e79c6d0555c0d5715535bb52aade8d5fcd9722
2021-11-17 19:53:12 -08:00
vfdev-5
3da2e09c9b Added antialias flag to interpolate (CPU only, bilinear) (#65142)
Summary:
Description:
- Added antialias flag to interpolate (CPU only)
  - forward and backward for bilinear mode
  - added tests
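
To give an idea of what antialiasing changes, the following 1-D pure-Python sketch resizes a row with a triangle (bilinear) filter whose support grows with the downsampling factor, in the spirit of Pillow's resize. It is an illustration under stated assumptions, not the kernel from this PR: the name `resize_1d_antialias` and the exact rounding of the window bounds are choices made for this example.

```python
def resize_1d_antialias(row, out_size):
    """Resize a 1-D list with a triangle filter widened when downsampling."""
    in_size = len(row)
    scale = in_size / out_size
    filter_scale = max(scale, 1.0)  # widen the kernel only when downsampling
    support = 1.0 * filter_scale    # the triangle kernel has radius 1
    out = []
    for i in range(out_size):
        center = (i + 0.5) * scale  # output pixel centre in input coordinates
        lo = max(0, int(center - support + 0.5))
        hi = min(in_size, int(center + support + 0.5))
        weights = [
            max(0.0, 1.0 - abs((j + 0.5 - center) / filter_scale))
            for j in range(lo, hi)
        ]
        total = sum(weights)  # normalise so that constants are preserved
        out.append(sum(w * row[j] for w, j in zip(weights, range(lo, hi))) / total)
    return out


halved = resize_1d_antialias([0.0, 0.0, 10.0, 10.0], 2)
```

Because each output pixel averages over a window proportional to the scale factor, high-frequency content is smoothed out instead of being aliased when the image shrinks.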

### Benchmarks

<details>
<summary>
Forward pass, CPU. PTH interpolation vs PIL
</summary>

Cases:
- PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apples vs pears)
- PTH 1 Channel, float32 vs PIL 1 Channel Float

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
# OMP_NUM_THREADS=1 python bench_interp_aa_vs_pillow.py

Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON,

Num threads: 1
[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (320, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.9                |          3.1
      channels_last non-contiguous torch.float32  |                2.6                |          3.6

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (460, 220) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.4                |          4.0
      channels_last non-contiguous torch.float32  |                3.4                |          4.8

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 96) -------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                1.6                |          1.8
      channels_last non-contiguous torch.float32  |                1.6                |          1.9

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                9.0                |          11.3
      channels_last non-contiguous torch.float32  |                8.9                |          12.5

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.1                |          1.8
      channels_last non-contiguous torch.float32  |                2.1                |          3.4

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (320, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.2               |          1.0

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (460, 220) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.4               |          1.3

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              719.9              |         599.9

Times are in microseconds (us).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               3.7               |          3.5

Times are in milliseconds (ms).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              834.4              |         605.7

Times are in microseconds (us).

```

</details>

Code is moved from torchvision: https://github.com/pytorch/vision/pull/4208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65142

Reviewed By: mrshenli

Differential Revision: D32432405

Pulled By: jbschlosser

fbshipit-source-id: b66c548347f257c522c36105868532e8bc1d4c6d
2021-11-17 09:10:15 -08:00
Thomas Metcalfe
ba16b1eca7 [numpy] Alias arctan2 to atan2 (#67010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65906

Adds an alias `arctan2` to improve numpy compatibility

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67010

Reviewed By: anjali411

Differential Revision: D32378998

Pulled By: mruberry

fbshipit-source-id: 424c5c10c12b49c20ee83ccd109325c480b5b6cf
2021-11-16 09:41:09 -08:00
David Dang
f7366ca51b implemented quantize_per_tensor_dynamic and added a corresponding test script (#68004)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68004

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D32301792

Pulled By: dzdang

fbshipit-source-id: f680557ba4736d095efc33e8c92111265f25aee0
2021-11-13 06:34:36 -08:00
Anirudh Dagar
b07a11929d Array API: Add torch.linalg.cross (#63285)
Summary:
### Create `linalg.cross`

Fixes https://github.com/pytorch/pytorch/issues/62810

As discussed in the corresponding issue, this PR adds `cross` to the `linalg` namespace (**Note**: There is no method variant) which is slightly different in behaviour compared to `torch.cross`.

**Note**: this is NOT an alias, as suggested in mruberry's [comment](https://github.com/pytorch/pytorch/issues/62810#issuecomment-897504372) on https://github.com/pytorch/pytorch/issues/62810, quoted below:
> linalg.cross being consistent with the Python Array API (over NumPy) makes sense because NumPy has no linalg.cross. I also think we can implement linalg.cross without immediately deprecating torch.cross, although we should definitely refer users to linalg.cross. Deprecating torch.cross will require additional review. While it's not used often it is used, and it's unclear if users are relying on its unique behavior or not.

The current default implementation of `torch.cross` is extremely weird and confusing. This has also been reported multiple times previously. (See https://github.com/pytorch/pytorch/issues/17229, https://github.com/pytorch/pytorch/issues/39310, https://github.com/pytorch/pytorch/issues/41850, https://github.com/pytorch/pytorch/issues/50273)

- [x] Add `torch.linalg.cross` with default `dim=-1`
- [x] Add OpInfo and other tests for `torch.linalg.cross`
- [x] Add broadcasting support to `torch.cross` and `torch.linalg.cross`
- [x] Remove out skip from `torch.cross` OpInfo
- [x] Add docs for `torch.linalg.cross`. Improve docs for `torch.cross` mentioning `linalg.cross` and the difference between the two. Also adds a warning to `torch.cross`, that it may change in the future (we might want to deprecate it later)

 ---

### Additional Fixes to `torch.cross`
- [x] Fix Doc for Tensor.cross
- [x] Fix torch.cross in `torch/overrides.py`

While working on `linalg.cross` I noticed these small issues with `torch.cross` itself.

[Tensor.cross docs](https://pytorch.org/docs/stable/generated/torch.Tensor.cross.html) still mention a `dim=-1` default, which is wrong. It should be `dim=None`: the behaviour was updated in PR https://github.com/pytorch/pytorch/issues/17582, but the documentation for the `method` and `function` variants wasn’t updated. A later PR, https://github.com/pytorch/pytorch/issues/41850, updated the documentation for the `function` variant, i.e. `torch.cross`, and also added the following warning about the weird behaviour.
> If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected.

But still, the `Tensor.cross` docs were missed and remained outdated. I’m finally fixing that here. Also fixing `torch/overrides.py` for `torch.cross` as well now, with `dim=None`.

According to the docs, the default `dim=-1` should make the first call below raise. To verify that it does not, you can try the following.

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(3, 4)

# Works: with no `dim`, the implementation picks the first dimension of
# size 3, so the documented default of dim=-1 is not what actually happens.
b.cross(a)
# tensor([[ 0.7171, -1.1059,  0.4162,  1.3026],
#         [ 0.4320, -2.1591, -1.1423,  1.2314],
#         [-0.6034, -1.6592, -0.8016,  1.6467]])

# Raises as expected, since the last dimension does not have size 3.
b.cross(a, dim=-1)
# RuntimeError: dimension -1 does not have size 3
```
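The rule quoted above ("the first dimension found with the size 3") can be modeled in plain Python, without PyTorch. This is only a sketch of the selection logic: `resolve_cross_dim` is a hypothetical helper, not a PyTorch function, and the wording of the "no dimension" and out-of-range errors is illustrative.

```python
def resolve_cross_dim(shape, dim=None):
    """Hypothetical helper modeling how the cross-product dimension is chosen."""
    if dim is None:
        # Legacy default: pick the first dimension whose size is 3.
        for index, size in enumerate(shape):
            if size == 3:
                return index
        raise RuntimeError("no dimension of size 3 in input")  # illustrative wording
    # Explicit dim: normalize a negative index, then require size 3.
    normalized = dim + len(shape) if dim < 0 else dim
    if not 0 <= normalized < len(shape):
        raise IndexError(f"dimension {dim} is out of range for shape {shape}")
    if shape[normalized] != 3:
        raise RuntimeError(f"dimension {dim} does not have size 3")
    return normalized


print(resolve_cross_dim((3, 4)))  # 0: the first dimension has size 3
print(resolve_cross_dim((4, 3)))  # 1: falls through to the last dimension
```

With shape `(3, 4)` the `dim=None` path succeeds while `dim=-1` raises, which matches the two calls in the snippet above.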

Please take a closer look (particularly the autograd part, this is the first time I'm dealing with `derivatives.yaml`). If there is something missing, wrong or needs more explanation, please let me know. Looking forward to the feedback.

cc mruberry Lezcano IvanYashchuk rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63285

Reviewed By: gchanan

Differential Revision: D32313346

Pulled By: mruberry

fbshipit-source-id: e68c2687c57367274e8ddb7ef28ee92dcd4c9f2c
2021-11-11 12:49:41 -08:00
Kurt Mohler
db014b8529 Add set_deterministic_debug_mode and get_deterministic_debug_mode (#67778)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67386

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67778

Reviewed By: ngimel

Differential Revision: D32310661

Pulled By: mruberry

fbshipit-source-id: 300129e96ca51c22fa711182ce6a9f4d4d2ce57f
2021-11-11 12:48:29 -08:00
kshitij12345
510e3026a9 [numpy] add torch.argwhere (#64257)
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`

Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`.

From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.
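The relationship in the NumPy quote can be sketched in plain Python for the 2-D case. Both functions below are illustrative models written for this note (they are not NumPy or PyTorch code): the tuple form returns one index list per dimension, and `argwhere` is its transpose, one index pair per non-zero element.

```python
def nonzero_as_tuple(rows):
    """Model of np.nonzero for a 2-D nested list: one index list per dimension."""
    row_idx, col_idx = [], []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value != 0:
                row_idx.append(i)
                col_idx.append(j)
    return row_idx, col_idx


def argwhere(rows):
    """Model of np.argwhere: one [row, col] pair per non-zero element."""
    return [list(pair) for pair in zip(*nonzero_as_tuple(rows))]


data = [[0, 1],
        [2, 0]]
print(nonzero_as_tuple(data))  # ([0, 1], [1, 0])
print(argwhere(data))          # [[0, 1], [1, 0]]
```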

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257

Reviewed By: qihqi

Differential Revision: D32049884

Pulled By: saketh-are

fbshipit-source-id: 016e49884698daa53b83e384435c3f8f6b5bf6bb
2021-10-30 15:26:11 -07:00
Brian Hirsh
03f3a0331b add slice/select/diagonal_scatter variants as primitive ops (#64430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64430

The functionalization pass needs `{view}_scatter` versions of the slice/select/diagonal ops in order to correctly propagate mutations from a view to its base. On top of that, the implementations need to be primitive w.r.t. autograd, because they look something like `...slice().copy_()`, and the functionalization pass can't use views + mutations inside of its own alias-removal machinery!
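As a rough illustration of what a `{view}_scatter` op computes, here is a plain-Python model for the 1-D slice case. `slice_scatter_1d` is a hypothetical name used only for this sketch: it returns a fresh copy of `base` in which the region that `base[start:end:step]` would view is replaced by `src`, and it leaves `base` itself untouched.

```python
def slice_scatter_1d(base, src, start=None, end=None, step=1):
    """Out-of-place model of 'take a slice view of base, then copy src into it'."""
    # Positions the slice view would alias in the base.
    positions = range(*slice(start, end, step).indices(len(base)))
    if len(src) != len(positions):
        raise ValueError(
            f"src has {len(src)} elements but the slice selects {len(positions)}"
        )
    out = list(base)  # fresh copy: the base is never mutated
    for position, value in zip(positions, src):
        out[position] = value
    return out


base = [0] * 8
print(slice_scatter_1d(base, [1, 1], start=6))  # [0, 0, 0, 0, 0, 0, 1, 1]
print(base)                                     # [0, 0, 0, 0, 0, 0, 0, 0]
```

Because the result is a new value rather than a mutated view, a mutation of a view can be replayed on its base without any aliasing, which is the property the paragraph above asks for.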

I added some basic tests that I tried to base off of existing tests for views (particularly around testing the derivative formulas), but I'm wondering if I should add something more comprehensive.

Also, as_strided fits into this category - the functionalization pass will need an `as_strided_scatter` op that's primitive w.r.t. autograd. I didn't add it for now, because it'll involve duplicating a bunch of logic from the current `as_strided_backward()` function, and also writing a derivative formula that I wasn't sure how to write :)

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942092

Pulled By: bdhirsh

fbshipit-source-id: c702a57c2748a7c771c14e4bcc3e996b48fcc4c8
2021-10-28 10:51:12 -07:00
jjsjann123
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implements an autocast optimization pass that inserts casting ops per the AMP rules (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes the `torch.cuda.amp.autocast` context into account and only inserts casting ops within an enabled context manager, giving feature parity with eager AMP autocast.

We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`

The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed). The restrictions on user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md

This is a prototype. There are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs.

A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We do not currently propagate the updated scalar types, which can cause issues or wrong results for operations that rely on type promotion rules.

2. Backward for autodiff in JIT misses the casting of dgrad to the input scalar type, which autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output to be of the same scalar type as the input).

3. The `torch.autocast` API has an optional `dtype` argument, which is not currently supported in JIT autocast; we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
Saketh Are
33790c4e06 Implement histogramdd on CPU (#65318)
Summary:
Implements `torch.histogramdd` analogous to `numpy.histogramdd`.

Builds on https://github.com/pytorch/pytorch/pull/58780, generalizing the existing `torch.histogram` kernel to handle D-dimensional inputs.
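To make "D-dimensional" concrete, the binning can be modeled in plain Python. This is a sketch under stated assumptions, not the kernel from this PR: `histogramdd_counts` is a hypothetical name, bins are uniform over explicit per-dimension ranges, counts are returned as a dict keyed by bin-index tuple, and the rightmost edge is treated as inclusive, as `numpy.histogramdd` does.

```python
from itertools import product


def histogramdd_counts(points, bins, ranges):
    """Count D-dimensional points into a uniform grid (illustrative model)."""
    counts = {index: 0 for index in product(*(range(b) for b in bins))}
    for point in points:
        index = []
        for x, n_bins, (lo, hi) in zip(point, bins, ranges):
            if x < lo or x > hi:
                break  # outside the range in this dimension: drop the point
            # Scale to [0, n_bins]; clamp so x == hi lands in the last bin.
            index.append(min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1))
        else:
            counts[tuple(index)] += 1
    return counts


points = [(0.1, 0.1), (0.9, 0.9), (1.0, 0.0), (2.0, 0.5)]
counts = histogramdd_counts(points, bins=(2, 2), ranges=[(0.0, 1.0), (0.0, 1.0)])
print(counts)
```

The 1-D histogram is the special case where each point is a 1-tuple; the only change for D dimensions is that a bin is identified by a tuple of per-dimension indices.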

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65318

Reviewed By: soulitzer

Differential Revision: D31654555

Pulled By: saketh-are

fbshipit-source-id: 14b781fac0fd3698b052dbd6f0fda46e50d4c5f1
2021-10-21 16:09:31 -07:00
Natalia Gimelshein
f29e5220a6 Revert D31474901: [pytorch][PR] [numpy] add torch.argwhere
Test Plan: revert-hammer

Differential Revision:
D31474901

Original commit changeset: 335327a4986f

fbshipit-source-id: 534093e459762ff7a888c58d76e49e362015f2ba
2021-10-21 15:50:54 -07:00
kshitij12345
462f333c01 [numpy] add torch.argwhere (#64257)
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`

Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`.

From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257

Reviewed By: dagitses

Differential Revision: D31474901

Pulled By: saketh-are

fbshipit-source-id: 335327a4986fa327da74e1fb8624cc1e56959c70
2021-10-21 14:02:11 -07:00
lezcano
a2e94b80fa Create linalg.matrix_exp (#62715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62715

Fixes https://github.com/pytorch/pytorch/issues/61648

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31641698

Pulled By: mruberry

fbshipit-source-id: 2e2965d14807b6b4fada4b809d539066dd0ba277
2021-10-19 09:07:15 -07:00
Yukio Siraichi
8854817f44 Implement Python Array API asarray function. (#60627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60627

In this PR, the core of `frombuffer` and `fromDLPack` is refactored onto _tensor_new.cpp_. `asarray`
uses these refactored functions for interpreting the object as a tensor. We follow the
Python Array API standard found at:

https://data-apis.org/array-api/latest/API_specification/creation_functions.html?highlight=asarray

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31640510

Pulled By: mruberry

fbshipit-source-id: d0869e0d73cb50023d5866b001dac5d34ca30dfd
2021-10-16 21:11:31 -07:00
lezcano
82a216c45b Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179

This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478

Fixes https://github.com/pytorch/pytorch/issues/45063

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30730483

Pulled By: anjali411

fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2
2021-10-13 07:44:43 -07:00
Kurt Mohler
5883523c1d Remove dtype from torch.Storage and use only torch.ByteStorage (#62030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030

Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible

Fixes https://github.com/pytorch/pytorch/issues/47442

* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what the dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes.
* `Storage._new_shared` takes an `nbytes` kwarg and will reject previous positional-only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
 To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.
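The element-count versus byte-count confusion mentioned in the list above can be illustrated with plain Python buffers. This is only an analogy, not the Storage API: a `bytearray` stands in for an untyped byte storage, and `memoryview.cast` stands in for a tensor giving those bytes a dtype.

```python
import struct

raw = bytearray(16)                       # untyped storage: only a byte count is known
as_float32 = memoryview(raw).cast("f")    # same bytes, viewed as C floats
as_float64 = memoryview(raw).cast("d")    # same bytes, viewed as C doubles

# The byte count is a property of the storage; the element count depends on the view.
print(len(raw), as_float32.nbytes, as_float64.nbytes)
print(len(as_float32), len(as_float64))

# Writing through a typed view changes the underlying bytes.
as_float32[0] = 1.5
print(bytes(raw[:as_float32.itemsize]) == struct.pack("f", 1.5))  # True
```

The same 16 bytes hold a different number of elements depending on the element size, which is why a dtype-less storage can only meaningfully report `nbytes`.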

Original pull request: https://github.com/pytorch/pytorch/pull/59671

Reviewed By: soulitzer, ngimel

Differential Revision: D29466819

Pulled By: ezyang

fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
2021-10-05 13:50:34 -07:00
Supriya Rao
458a00bacb Back out "[quant] update fused_obs_fake_quant op to accept output_fake_quant argument" (#66063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66063

Original commit changeset: bffe776216d0

Test Plan: CI

Reviewed By: vkuzo

Differential Revision: D31347042

fbshipit-source-id: f56f628dc4690187bf284a8f2fda4c6aae10c1d6
2021-10-05 11:02:54 -07:00
kshitij12345
c1447f06a8 [special] special alias for softmax (#62251)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62251

Reviewed By: H-Huang

Differential Revision: D31141834

Pulled By: mruberry

fbshipit-source-id: aecaf62af248e9034ef589159ce0fb325c729493
2021-10-01 03:55:32 -07:00
Peter Bell
6285348f06 Implement n-dimensional hermitian FFTs (#63890)
Summary:
Closes https://github.com/pytorch/pytorch/issues/59127

cc mruberry peterbell10 walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63890

Reviewed By: ngimel

Differential Revision: D30761909

Pulled By: mruberry

fbshipit-source-id: 06e1e4dc65726f35c99a74f18b9fa36eb7d694a5
2021-09-30 16:02:28 -07:00
Supriya Rao
4666e3f192 [quant] update fused_obs_fake_quant op to accept output_fake_quant argument (#65621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65621

Add a new attribute to `FusedMovingAvgObsFakeQuantize` that controls whether the fake-quant operation is applied at the output of a particular layer. The motivation is to give users additional control over the numerics of the fake_quant operators during training. It defaults to always fake-quantizing the output (True).

Note: We will still observe the tensors as before (only the fake_quant operation is controlled by this flag)

For example
```
input model
x -> fc1 -> fc2 -> non_quantizable_op -> fc3

After fake_quant
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> fake_quant(fc2) -> non_quantizable_op -> fake_quant() -> fc3 -> fake_quantize(fc3)

With output_fake_quant disabled at the output of fc2 and fc3 (since their outputs are non-quantizable)
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> non_quantizable_op -> fake_quant() -> fc3
```
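The flag's effect can be sketched on scalars in plain Python. This is a simplified model, not the fused op: `fake_quant` is the usual quantize-then-dequantize round trip, `ObservedOutput` is a hypothetical class, and the scale is fixed here rather than derived from moving-average statistics as the real observer would do.

```python
def fake_quant(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Quantize to an integer grid, clamp, then dequantize back to float."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    return (q - zero_point) * scale


class ObservedOutput:
    """Hypothetical stand-in for a layer output with observer + optional fake quant."""

    def __init__(self, scale, output_fake_quant=True):
        self.scale = scale
        self.output_fake_quant = output_fake_quant
        self.min_seen = float("inf")
        self.max_seen = float("-inf")

    def __call__(self, x):
        # Observation always happens, regardless of the flag.
        self.min_seen = min(self.min_seen, x)
        self.max_seen = max(self.max_seen, x)
        # Only the fake-quant step is controlled by the flag.
        return fake_quant(x, self.scale) if self.output_fake_quant else x


quantized = ObservedOutput(scale=0.1)
passthrough = ObservedOutput(scale=0.1, output_fake_quant=False)
print(quantized(0.26), passthrough(0.26))
print(passthrough.min_seen, passthrough.max_seen)
```

With the flag disabled the value passes through unchanged while the observed range is still updated, which mirrors the note that only the fake_quant operation is controlled by the flag.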

Test Plan: ./buck-out/gen/caffe2/test/quantization_fx\#binary.par -r test_disable_output_fake_quant

Reviewed By: jerryzh168

Differential Revision: D31174526

fbshipit-source-id: bffe776216d041fb09133a6fb09bfc2c0bb46b89
2021-09-30 01:08:01 -07:00
Edward Yang
70a545b21e Add Tensor._make_wrapper_subclass (#65340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65340

I thought about a few possible ways of doing this.  The main hazard is
that if I create a CPU tensor that doesn't have any real storage, the
moment I actually try to access the data on the tensor I will segfault.
So I don't want to use _make_subclass on a "cpu meta tensor" because
the CPU meta tensor (with no subclass) is radioactive: printing it
will immediately cause a segfault.  So instead, I have to create
the CPU meta tensor AND subclass all in one go, and that means I need
another function for it.  One downside to doing it this way is
I need another overload for explicit strides, and in general it is
difficult to get the view relationships to all work out properly;
tracked at https://github.com/pytorch/pytorch/issues/65339

Fixes https://github.com/pytorch/pytorch/issues/62972
Fixes https://github.com/pytorch/pytorch/issues/62730

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31057231

Pulled By: ezyang

fbshipit-source-id: 73522769e093ae8a1bf0c7f7e594659bfb827b28
2021-09-22 11:10:47 -07:00
albanD
6eafe7f15e Actually deprecate __torch_function__ as plain methods (#64843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64843

Fix for https://github.com/pytorch/pytorch/issues/63767

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991425

Pulled By: albanD

fbshipit-source-id: 1214143b8aea87e6ff406c7fc13096bd15d1a768
2021-09-17 08:32:53 -07:00
albanD
473e55d5b2 Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991424

Pulled By: albanD

fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
2021-09-17 08:32:49 -07:00
Heitor Schueroff
b37503e452 Initial implementation of nanmean (#62671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62671

Very crude first implementation of `torch.nanmean`. The current reduction kernels do not have good support for implementing nan* variants. Rather than implementing new kernels for each nan* operator, I will work on new reduction kernels with support for a `nan_policy` flag and then I will port `nanmean` to use that.
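For reference, the quantity `nanmean` computes can be written in a few lines of plain Python: the mean over the non-NaN entries only. This is a model of the semantics, not the kernel in this PR, and returning NaN when every entry is NaN is an assumption chosen to mirror the 0/0 case.

```python
import math


def nanmean(values):
    """Mean of the non-NaN entries; NaN if there are none (assumed convention)."""
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return math.nan
    return sum(kept) / len(kept)


print(nanmean([1.0, math.nan, 3.0]))  # 2.0
```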

**TODO**

- [x] Fix autograd issue

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30515181

Pulled By: heitorschueroff

fbshipit-source-id: 303004ebd7ac9cf963dc4f8e2553eaded5f013f0
2021-09-13 05:53:58 -07:00
Emilio Castillo
1cb3507ed3 Adds DLPack support (#57110)
Summary:
Partially Fixes https://github.com/pytorch/pytorch/issues/55090
Depends on https://github.com/pytorch/pytorch/issues/55365

Inspired by https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973

Question: in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an [`ExternalStream`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream.html) object like the one we have in CuPy?

TODO: Add tests

Would like some feedback as this design needs quite a few iterations
rgommers leofang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57110

Reviewed By: saketh-are

Differential Revision: D30761481

Pulled By: mruberry

fbshipit-source-id: e85d78df3c1f8defc2a698878da89cd843cb1209
2021-09-12 19:47:15 -07:00
Edward Yang
d4b1016850 Filter out _disabled_torch_function_impl from handle_torch_function (#64689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64689

This brings it in line with the C++ implementation.

Fixes https://github.com/pytorch/pytorch/issues/64687

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30816215

Pulled By: ezyang

fbshipit-source-id: ed36af6c35467ae678d9548197efd97c36d38dec
2021-09-09 07:29:09 -07:00
leslie-fang-intel
768014b3e6 Allow disabling cache in autocast (automatic mixed precision) (#63552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63552

In this PR, we exclude the following 2 cases from `Autocast` weight cache usage:

- Using `torch.jit.trace` under `Autocast`
As reported in https://github.com/pytorch/pytorch/issues/50231 and several other discussions, when `torch.jit.trace` is used under `Autocast`, the trace process hits Autocast's weight cache and fails. So we should disable the weight cache during tracing.
- Using `Autocast` with `Grad mode`

  - Usually we use `Grad mode` for training. Since the weights change at every step of the training phase, we don't need to cache them.
  - For the recommended `Autocast` training case in the [doc](https://pytorch.org/docs/stable/amp.html), `Autocast` clears the cache at every step when leaving the context. We should disable the cache to save the clear operations.
    ```
    model = Net().cuda()
    optimizer = optim.SGD(model.parameters(), ...)

    for input, target in data:
        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    ```
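Why a per-context cache buys nothing in that training loop can be seen with a toy model in plain Python. `CastCache` is a hypothetical stand-in written for this note, not autocast's implementation: it caches the "cast" of each weight by name, clears the cache when the context exits, and counts how many casts were actually performed.

```python
class CastCache:
    """Toy stand-in for a weight-cast cache scoped to a context (not PyTorch code)."""

    def __init__(self, cache_enabled=True):
        self.cache_enabled = cache_enabled
        self._cache = {}
        self.casts_performed = 0

    def cast(self, name, weight):
        if self.cache_enabled and name in self._cache:
            return self._cache[name]  # reuse the earlier cast
        self.casts_performed += 1
        low_precision = [round(w, 3) for w in weight]  # stand-in for a dtype cast
        if self.cache_enabled:
            self._cache[name] = low_precision
        return low_precision

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._cache.clear()  # the cache never outlives the context
        return False


weight = [0.12345, 0.6789]
casts = 0
for step in range(3):              # three "training steps"
    with CastCache() as cache:
        cache.cast("fc1.weight", weight)
        casts += cache.casts_performed
    weight = [w * 0.9 for w in weight]  # weights change after every step
print(casts)  # 3
```

Each step still performs one cast, because the weight is used once per forward pass and the cache is cleared on exit. In this model the cache only helps when the same unchanged weight is cast several times inside one context.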

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30644913

Pulled By: ezyang

fbshipit-source-id: ad7bc87372e554e7aa1aa0795e9676871b3974e7
2021-09-08 07:47:18 -07:00
kshitij12345
2c351c76e0 [special] Alias igamma, igammac to special.gammainc, special.gammaincc (#61902)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Also added relevant OpInfo

TODO:
* [x] Check rendered docs gammainc : https://docs-preview.pytorch.org/61902/special.html#torch.special.gammainc
* [x] Check rendered docs gammaincc: https://docs-preview.pytorch.org/61902/special.html#torch.special.gammaincc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61902

Reviewed By: ngimel

Differential Revision: D30761428

Pulled By: mruberry

fbshipit-source-id: 06a16432873357958d53364f12a4e91c29779d26
2021-09-07 15:31:26 -07:00
Anirudh Dagar
337c71be05 Array API: Add torch.linalg.matmul alias to torch.matmul (#63227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62811

Add `torch.linalg.matmul` alias to `torch.matmul`. Note that the `linalg.matmul` doesn't have a `method` variant.

Also cleaning up `torch/_torch_docs.py` when formatting is not needed.

cc IvanYashchuk Lezcano mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63227

Reviewed By: mrshenli

Differential Revision: D30770235

Pulled By: mruberry

fbshipit-source-id: bfba77dfcbb61fcd44f22ba41bd8d84c21132403
2021-09-07 12:35:32 -07:00
Anirudh Dagar
1a1fb31cfa Support torch.concat alias, add cat OpInfo & remove OpInfo test_out skips {cat, stack, hstack, vstack, dstack} (#62560)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767

## Changes

- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
  - [x] `cat`/`concat`
  - [x] `stack`
  - [x] `hstack`
  - [x] `dstack`
  - [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`

~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.

**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.

Thanks to krshrimali for guidance on my first PR :))

cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560

Reviewed By: saketh-are

Differential Revision: D30762069

Pulled By: mruberry

fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
2021-09-06 23:57:18 -07:00
Thomas J. Fan
d3bcba5f85 ENH Adds label_smoothing to cross entropy loss (#63122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/7455

Partially resolves pytorch/vision#4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122

Reviewed By: iramazanli

Differential Revision: D30586076

Pulled By: jbschlosser

fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924
2021-08-29 23:33:04 -07:00
Aaron Bockover
c78ab28441 Add support for the ONNX Runtime Eager Mode backend (#58248)
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.

We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).

The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248

Reviewed By: astaff

Differential Revision: D30344992

Pulled By: albanD

fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
2021-08-20 11:17:13 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Rishi Puri
324673a537 rebase for autocast updates to include device_type and dtype flags (#61002)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61002

Reviewed By: malfet, mruberry

Differential Revision: D30016812

Pulled By: ngimel

fbshipit-source-id: 6e09a29f539d28e9aea5cd9489b1e633cc588033
2021-08-10 20:03:12 -07:00