Commit Graph

678 Commits

Author SHA1 Message Date
ppiskorski
bc38b278bf Extend TensorImpl with BackendMeta (#97429)
BackendMeta offers a binary interface for the backend to attach arbitrary data to TensorImpl. TensorImpl has exactly one "slot" for backend metadata, however backend is free to compose any structure that is opaque to the framework beyond iheriting standard BackendMeta base.

Change-Id: I670fcdd16dd1c2b00f7eaa1cbc5b5dfea59a6221

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97429
Approved by: https://github.com/ezyang
2023-04-04 03:01:14 +00:00
PyTorch MergeBot
45acfc8574 Revert "[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)"
This reverts commit 313db584f3.

Reverted https://github.com/pytorch/pytorch/pull/97212 on behalf of https://github.com/soulitzer due to Internally someone is rely on _wrap_outputs and we updated its signature
2023-03-30 22:03:07 +00:00
soulitzer
313db584f3 [BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)
Fixes https://github.com/pytorch/pytorch/issues/96887

We error out in BOTH the case when graph is created and when it is not created.

Still bc-breaking, but not as severe because we are limiting to the case where someone uses setup_context.

This makes setup_context and non-setup_context versions diverge in their behavior
- With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs.
- But now with the setup_context version, we produce an error for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
2023-03-29 17:54:00 +00:00
PyTorch MergeBot
2ef6ffdfa1 Revert "[BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)"
This reverts commit f3aca45a16.

Reverted https://github.com/pytorch/pytorch/pull/97212 on behalf of https://github.com/soulitzer due to TestAutogradFunctionCUDA.test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_True_cuda leaks
2023-03-28 18:30:51 +00:00
soulitzer
f3aca45a16 [BE][autograd Function] Raise an error if input is returned as-is and saved for forward or backward in setup_context (#97212)
Fixes https://github.com/pytorch/pytorch/issues/96887

We error out in BOTH the case when graph is created and when it is not created.

Still bc-breaking, but not as severe because we are limiting to the case where someone uses setup_context.

This makes setup_context and non-setup_context versions diverge in their behavior
- With the non-setup_context version, saved variables are assumed to have the grad_fn of the inputs.
- But now with the setup_context version, we produce an error for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97212
Approved by: https://github.com/zou3519
2023-03-28 03:14:32 +00:00
Rishub Tamirisa
f3b8638074 Adding nn.ZeroPad1d and nn.ZeroPad3d (#96295)
Fixes #95796

### Implementation
Adds python implementation for `nn.ZeroPad1d` and `nn.ZeroPad3d` in `torch/nn/modules/padding.py`.

Adds cpp implementation for `nn::ZeroPad1d` and `nn::ZeroPad3d` in the following 3 files, refactored with templates similarly to `nn::ConstantPad`'s implementation: <br>
- `torch/crsc/api/include/torch/nn/modules/padding.h`
- `torch/csrc/api/include/torch/nn/options/padding.h`
- `torch/csrc/api/src/nn/modules/padding.cpp`

Also added relevant definitions in `torch/nn/modules/__init__.py`.
### Testing
Adds the following tests:
-  cpp tests of similar length and structure as `ConstantPad` and the existing `ZeroPad2d` impl in `test/cpp/api/modules.cpp`
- cpp API parity tests in `torch/testing/_internal/common_nn.py`
- module init tests in `test/test_module_init.py`

Also added relevant definitions in `test/cpp_api_parity/parity-tracker.md`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96295
Approved by: https://github.com/soulitzer
2023-03-10 03:51:41 +00:00
kshitij12345
3b966a6ce3 [autograd] disable backward/grad for complex scalar output (#92753)
Fixes https://github.com/pytorch/pytorch/issues/92750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
Kshiteej K
68a98537d5 [fix] nn c++ : segfault in modulelist and moduledict (#93074)
Fixes https://github.com/pytorch/pytorch/issues/73565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93074
Approved by: https://github.com/albanD
2023-01-27 12:20:19 +00:00
Jane Xu
819bd5b77a [nn] add set_to_none flag for C++ optim endpoint (#92989)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92989
Approved by: https://github.com/ngimel, https://github.com/Skylion007
2023-01-26 04:16:52 +00:00
Jane Xu
b90496eef5 [nn] zero_grad() set_to_none default True (#92731)
Attempts to fix #92656

BC-breaking! This changes the default of zero_grad in optim and in nn to default set grads to None instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (will probably have to flesh out this note more).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
2023-01-26 01:04:28 +00:00
mikey dagitses
322e4b4c8a set -Wsuggest-override for builds (#89852)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852).
* __->__ #89852
* #89851

set -Wsuggest-override for builds

Summary: This was flagged by a Meta internal build.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852
Approved by: https://github.com/malfet
2022-12-19 22:08:47 +00:00
Everton Constantino
dd6beca854 Changing the use from ASSERT_EQ to ASSERT_FLOAT_EQ on nn_utils test. (#83693)
Changing the use from ASSERT_EQ to ASSERT_FLOAT_EQ on nn_utils.cpp:ClipGradNorm as this is the proper way to compare equality between floating point values. This avoids `test_api` ClipGradNorm failing for WoA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83693
Approved by: https://github.com/ngimel, https://github.com/kit1980
2022-11-15 04:10:52 +00:00
kshitij12345
d15a6b0c97 Error on ZeroTensor serialization (#88803)
Follow-up : https://github.com/pytorch/pytorch/pull/88182#issuecomment-1308628415

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88803
Approved by: https://github.com/anjali411
2022-11-11 08:51:29 +00:00
kshitij12345
eb9b156019 [fix] MathBits: serialization (#88182)
Fixes #81690

TODO:

* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)

Notes:
Quant Tensor don't support complex dtypes and for float they segfault with `_neg_view` : https://github.com/pytorch/pytorch/issues/88484

Sparse Tensor:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
2022-11-09 17:15:12 +00:00
Kurt Mohler
1dbc8ad3b7 Add Warning class and refactor C++ warnings to use it (#84101)
Also adds `TORCH_WARN_WITH` and `TORCH_WARN_DEPRECATION` macros

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84101
Approved by: https://github.com/albanD
2022-10-18 20:02:42 +00:00
Sahan Paliskara
936e93058b Delete torch::deploy from pytorch core (#85953)
As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there.

This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-10-06 07:20:16 +00:00
Edward Z. Yang
6cd9c447da Make test_api compile on DEBUG mode with some compiler versions (#86092)
The symbol seems to conflict under some compiler versions, giving
an error like "relocation refers to global symbol which is defined in a
discarded section".  Simple enough to put it in an anonymous namespace,
so why not.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86092
Approved by: https://github.com/Chillee
2022-10-03 13:52:32 +00:00
lezcano
787028cadb Implement col2im decomposition and fix im2col and add a few preconditions (#85541)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85541
Approved by: https://github.com/jansel
2022-09-30 09:31:53 +00:00
Mikayla Gawarecki
afaee00fec Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)
Remove `torch.nested_tensor` which has erroneous behavior wrt gradients (could be either leaf or not leaf). Introduce `torch.nested.nested_tensor` and `torch.nested.as_nested_tensor` in the vein of `torch.tensor` and `torch.as_tensor`. Done in nested `__init__.py` for now but can move to pybind in future (when we want to load from numpy/nested lists ).

Discussed offline with @cpuhrsch and pybind constructor (https://github.com/pytorch/pytorch/pull/85536) was more gnarly than expected, so we can move to that when we do need loading from numpy etc.

Differential Revision: [D39806622](https://our.internmc.facebook.com/intern/diff/D39806622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85593
Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
2022-09-28 20:15:02 +00:00
Thomas Viehmann
e41d758e26 Handle implicit real->complex casting for backward of stack (#84993)
Fixes: #75852

P.S.: Yay for the PyTorch foundation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84993
Approved by: https://github.com/soulitzer
2022-09-19 21:20:34 +00:00
Steven Krawczyk
89525cbd69 Add variable_list support to ExtractVariables struct (#84583)
This is required to unblock https://github.com/pytorch/xla/pull/3843, which lowers the einsum op for pytorch/xla.  Because one method input parameter is a TensorList, we need to support TensorLists here so that we can support einsum gradients.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84583
Approved by: https://github.com/soulitzer
2022-09-16 01:26:22 +00:00
Edward Z. Yang
ccade9410f Don't detach when making views; force caller to detach (#84893)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893
Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad
2022-09-14 22:32:45 +00:00
PyTorch MergeBot
8ca057eb71 Revert "Don't detach when making views; force caller to detach (#84893)"
This reverts commit 3bb8d6a93c.

Reverted https://github.com/pytorch/pytorch/pull/84893 on behalf of https://github.com/malfet due to Broke MPS, see 3bb8d6a93c
2022-09-14 01:09:04 +00:00
Edward Z. Yang
3bb8d6a93c Don't detach when making views; force caller to detach (#84893)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84893
Approved by: https://github.com/soulitzer, https://github.com/SherlockNoMad
2022-09-13 23:31:21 +00:00
vfdev-5
d951165bd8 [C++ API] Added missing antialiasing path in interpolation C++ api (#84599)
Description:

Following https://github.com/pytorch/pytorch/pull/69318#issuecomment-1238433540 adding missing bicubic path for anti-alias flag to C++ frontend.

- https://github.com/pytorch/pytorch/pull/70930

- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84599
Approved by: https://github.com/kit1980, https://github.com/malfet
2022-09-13 03:54:07 +00:00
Mikayla Gawarecki
e217b30b0f Add torch.nested namespace (#84102)
First step towards #83775
- only `to_padded_tensor` is moved to the nested namespace for now
- following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in
`torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`.

~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~

[generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested)

Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102
Approved by: https://github.com/drisspg
2022-09-12 16:31:05 +00:00
Peter Bell
b429a17545 Enable -Wunused-local-typedefs (#83708)
I recently had a PR reverted because it triggered an
unused-local-typedefs warning, so disabling these in the CMake build
is counter-productive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83708
Approved by: https://github.com/albanD
2022-08-26 15:45:47 +00:00
soulitzer
31fad3926a Add option to run anomaly mode without nan checking (#83481)
Fixes https://github.com/pytorch/pytorch/issues/83117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83481
Approved by: https://github.com/albanD
2022-08-16 22:56:23 +00:00
richard
382ef1fda7 Autograd graphtask trim unnecessary edges (#82544)
### Introduction
<!-- What did you change and why was it needed? -->

Removing unnecessary weight gradient calculation is very important for applications that need high-order derivatives during training. However, this is not supported by the current Autograd engine.

For more detail: The backward function of a `matmul` operator (e.g., `linear` `addmm` `mm`), has two matmuls, one for `input gradient` and another for `weight gradient`. For a typical neural network (nn) with a few linear layers and activation functions, if the user calls `torch.autograd.grad()` to calculate the derivative of the nn output `y` w.r.t the nn input `x`,  only the `input gradient` of the `matmul` operator is needed, and the `weight gradient` is discarded. However, the current PyTorch autograd engine will always calculate the `weight gradient` if `weight` requires gradient (the calculation of the high-order derivative is performed during training).

The figure attached shows the autograd graph of the following code snippet:
```py
y = torch.nn.functional.linear(x, weight, bias)
y = y.pow(2)
# first order derivative
y__x, = torch.autograd.grad(y, x, grad_outputs=grad_outputs, create_graph=True)
# first order derivative
y__x__x, = torch.autograd.grad(y__x, x, grad_outputs=grad_outputs, create_graph=True)
```
The path with  is not needed when calculating derivatives.

<img width="50%" alt="image" src="https://user-images.githubusercontent.com/9999318/182018117-719c5a23-bcc6-4a63-8e8d-1bca3ebda2e3.png">

### Issue
<!-- Link to Issue ticket or RFP -->
Related issue: https://github.com/pytorch/pytorch/issues/56500

### Method
When calling `torch.autograd.grad`, `exec_info_` is created for each GraphTask, which allows filtering paths on the graph that are not needed. However, when the GraphTask calls into the node, the node still does not know whether the edges are needed or not. In the case of matmul, `weight.requires_grad is True` so the weight gradient is always calculated.

Following https://github.com/pytorch/pytorch/issues/56500#issuecomment-825694656, this PR passes the graph task's thread_local `exec_info_` into the node, so it could trim unnecessary edges during `torch.autograd.grad` calls.

### Benchmark
Benchmark script: https://gist.github.com/yueyericardo/24158433a2021c51eeef9c3e2722df99

Benchmark result:
6 hidden layers, batch size 10000, on A100

FP32 result
| hessian benchmark             | FP32 (before) | FP32 (After)      | FP32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 55.658 ms     | 29.392 ms (1.90X) | 29.547 ms (1.90X)       |
| Linear + ReLU (with backward) | 81.173 ms     | 54.917 ms (1.47X) | 68.988 ms (1.18X)       |

TF32 result
| hessian benchmark             | TF32 (before) | TF32 (after)      | TF32 (Functorch v0.1.1) |
| ----------------------------- | ------------- | ----------------- | ----------------------- |
| Linear + ReLU (no backward)   | 19.801 ms     | 11.259 ms (1.76X) | 10.754 ms (1.84X)       |
| Linear + ReLU (with backward) | 29.167 ms     | 20.466 ms (1.42X) | 22.784 ms (1.28X)       |

For FP32 result, we could get 1.9X speed up for hessian calculation, and 1.47X speed up during training, which is even faster than functorch `vmap(jacfwd(jacrev` implementation. (functorch has performance regression on v0.2.0, https://github.com/pytorch/functorch/issues/989, so we are using v0.1.1 for benchmark)

@zou3519 does functorch also includes similar optimizations during hessian calculation? If not, what do we need to do so the functorch could also benefit from this PR?

### Testing
<!-- How did you test your change? -->

- [x] we need to figure out a way for unittest

### Thanks
Thanks for the great blog: [How Computational Graphs are Executed in PyTorch | PyTorch](https://pytorch.org/blog/how-computational-graphs-are-executed-in-pytorch/)

cc @zasdfgbnm @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82544
Approved by: https://github.com/soulitzer
2022-08-11 18:50:09 +00:00
Nikita Shulga
1b2a17b8f9 Build MacOS binaries with -Werror (#83049)
Should prevent proliferating MPS warnings
Fixes https://github.com/pytorch/pytorch/issues/82966
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83049
Approved by: https://github.com/albanD, https://github.com/ezyang
2022-08-10 17:29:44 +00:00
Nikita Shulga
62c8d30f9f [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt`

Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-10 14:32:26 +00:00
PyTorch MergeBot
d3a1f17fc7 Revert "[BE] Add append_cxx_flag_if_supported macro (#82883)"
This reverts commit d7e6aaa59b.

Reverted https://github.com/pytorch/pytorch/pull/82883 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-10 10:27:59 +00:00
Nikita Shulga
d7e6aaa59b [BE] Add append_cxx_flag_if_supported macro (#82883)
And use it throughout the CMakeLists and rectify `IF(APPLE)`/`IF(GNU_CXX_VERSION VERSION_GREATER A.B)` and so on

Also, add `target_compile_options_if_supported` and use it in `Dependencies.cmake` as well as in test's `CMakeListst.txt`

Delete `-Wno-unknown-warning-option` to test that conditions indeed working as expected
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82883
Approved by: https://github.com/seemethere
2022-08-08 21:04:09 +00:00
Peter Bell
8d0cbce069 Lower randint default dtype to the C++ API (#81410)
The default dtype for randint is currently handled with manual python
binding code, this moves it into the `native_functions.yaml` declaration
for API consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81410
Approved by: https://github.com/albanD
2022-07-21 16:42:49 +00:00
soulitzer
516f3198d6 Fix retains grad behavior after in-place (#79996)
See this doc: https://docs.google.com/document/d/1KiRdnoj6B4cI3yl017hTbCqcOGO1gWIpUf20sldipHM/edit#

Two issues (1) regarding hooks in general and (2) regarding retains grad hooks are fixed, Python hooks, which rely on a different mechanism are not discussed here:
- Hooks in cpp in general
  - (fixed) new hooks to registered to a newer version of the tensor no longer get applied to grad_fn
    associated with older version of the tensor when the first hook was ever registered
  - (unchanged) hooks registered to the older version of the tensor remain active on
- Retains grad hooks
  - (fixed) now get moved to the latest grad_fn. NB: To the user, retains_grad is not considered hooks
    or expected to behave like hooks (which we consider properties of the grad_fn) vs retains_gradness
    which is a property of the tensor.
- (not in this PR) Python hooks
  - (will fix) same issue as hooks in cpp where new hooks are being applied to grad_fn associated
    with the older version of the tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79996
Approved by: https://github.com/albanD
2022-07-08 19:13:28 +00:00
Sergei Vorobev
a8b0988596 Fix //:module_test Conversion_MultiCUDA (#79926)
Fixes #79871

Make `module.cpp` tests respect change that was made in #78436 (no int types in autograd).

Note that there still a gap in Cmake test -- it's unclear why it didn't fail CI before.

As far as I can tell it should be executed, because it's included here 79507d2a9d/test/cpp/api/CMakeLists.txt (L17):L17

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79926
Approved by: https://github.com/soulitzer
2022-06-21 23:32:18 +00:00
Michael Suo
30fb2c4aba [lint] autoformat test/cpp and torch/csrc
Let's have some fun.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828

Approved by: https://github.com/ezyang
2022-06-11 21:11:16 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Nikita Shulga
3255ddeec9 Make Wunused-local-typedef a hard error (#77918)
Only allow it for `libtorch_python` and tests
Helps prevent regression like https://github.com/pytorch/pytorch/pull/76547#issuecomment-1132208232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77918
Approved by: https://github.com/osalpekar, https://github.com/seemethere
2022-06-09 18:14:01 +00:00
yanbing-j
4f82f439d1 Enable BFloat16 ELU, SELU and CELU in CPU path (#62546)
Enable BFloat16 ELU, SELU and CELU in CPU path. SELU and CELU will call ELU implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546
Approved by: https://github.com/frank-wei
2022-05-12 16:56:57 +00:00
Pavithran Ramachandran
f984e50f39 Extend jit::load to work on flatbuffer file; Take 2 (#75256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75256

ghstack-source-id: 153138970

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35399581

fbshipit-source-id: dafe9d301009d3f70986ed92bfe06d160ab90ba0
(cherry picked from commit ccc860fd07946de5aae12bc179a0b8bbba83b997)
2022-04-06 17:54:01 +00:00
Lu Fang
32e58c73c4 Back out "Extend jit::load to work on flatbuffer file" (#75244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75244

Original commit changeset: d653a5af662a

Original Phabricator Diff: D35060736 (d9d34922a0)

Test Plan: Model loading test, verified that D35060736 (d9d34922a0) will cause the torch::save => torch::load failure.

Reviewed By: yinghai, jianyuh

Differential Revision: D35387009

fbshipit-source-id: 9d176992d402d57779e2af3d905b3c1538335298
(cherry picked from commit 6c8cc0d3b8a88b15e35702d70e18bbae8aa4628a)
2022-04-05 09:55:04 +00:00
Nikita Shulga
81d765ef1f Fix sign-compare violations in cpp tests
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080

Approved by: https://github.com/atalman
2022-04-04 23:05:31 +00:00
Pavithran Ramachandran
d9d34922a0 Extend jit::load to work on flatbuffer file (#75022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75022

Extending torch::jit::load to read flatbuffer file
ghstack-source-id: 152820697

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D35060736

fbshipit-source-id: d653a5af662a46107ff4fd70209fd2a0a4d40f20
(cherry picked from commit 109e14a54bd279011c8f9066e6c29e8e0b1fc4db)
2022-04-02 01:33:34 +00:00
Kurt Mohler
5375b2e994 Resolve int[]? arguments to new OptionalIntArrayRef class
This PR uses the `OptionalArrayRef` template class that was drafted in #64084.

Fixes #44409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70864
Approved by: https://github.com/ezyang
2022-03-26 01:45:50 +00:00
Yedidya Feldblum
7a5b0efc64 [caffe2] fix build failures in optimized builds under clang
Summary:
There are various possible approaches, but the approach chosen minimizes disruption to source control blame.

Addresses:
```
error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument]
```

Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional

Reviewed By: jamesr66a

Differential Revision: D34027291

fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006
(cherry picked from commit d43b5a7ed6)
2022-02-22 22:31:47 +00:00
Ryan Spring
4f8b986e28 Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: VitalyFedyunin

Differential Revision: D33894937

Pulled By: jbschlosser

fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851
(cherry picked from commit 6e986f91a9)
2022-02-14 03:40:32 +00:00
kshitij12345
02f6226bff [fix] Dropout2d-3d no-batch-dim (#69885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69801

TODO:
* [x] Update C++ API

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69885

Reviewed By: mruberry

Differential Revision: D33175470

Pulled By: jbschlosser

fbshipit-source-id: c9d7d9e0f59ba290a0157725c338a345f3d58b9f
(cherry picked from commit 7e4271a156)
2022-02-02 16:40:32 +00:00
Nikita Shulga
74c44ba9d6 Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33850228 (23d03025dc)

Original commit changeset: 3cc33fb298e4

Original Phabricator Diff: D33850228 (23d03025dc)

fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692
(cherry picked from commit c9efb58223)
2022-01-31 17:44:19 +00:00
Ryan Spring
23d03025dc Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: cpuhrsch

Differential Revision: D33850228

Pulled By: jbschlosser

fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33
(cherry picked from commit 3a53b3e94f)
2022-01-31 17:07:45 +00:00
Joel Schlosser
cb823d9f07 Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation
Test Plan: revert-hammer

Differential Revision:
D33744717 (f499ab9cef)

Original commit changeset: d64532a562ed

Original Phabricator Diff: D33744717 (f499ab9cef)

fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93
(cherry picked from commit e9fb2d1db1)
2022-01-28 18:35:01 +00:00
Ryan Spring
f499ab9cef Implement Tanh Gelu Approximation (#61439)
Summary:
1. Implements https://github.com/pytorch/pytorch/issues/39853
2. Adds approximate boolean flag to Gelu
3. Enables Tanh Gelu approximation
4. Adds double backward support for Gelu
5. Enable Tanh Gelu in NvFuser

```
def gelu(x, approximate : str = 'none'):
    if approximate == 'tanh':
        # sqrt(2/pi) = 0.7978845608028654
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0))))
    else:
        return x * normcdf(x)
```

Linking XLA PR - https://github.com/pytorch/xla/pull/3039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439

Reviewed By: mikaylagawarecki

Differential Revision: D33744717

Pulled By: jbschlosser

fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187
(cherry picked from commit 4713dd9cca)
2022-01-28 16:59:09 +00:00
Joel Schlosser
e6befbe85c Add flag to optionally average output attention weights across heads (#70055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47583

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70055

Reviewed By: bhosmer

Differential Revision: D33457866

Pulled By: jbschlosser

fbshipit-source-id: 17746b3668b0148c1e1ed8333227b7c42f1e3bf5
2022-01-06 17:32:37 -08:00
George Qi
8af39b7668 AdaptiveLogSoftmaxWithLoss no_batch_dim support (#69054)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69054

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200166

Pulled By: george-qi

fbshipit-source-id: 9d953744351a25f372418d2a64e8402356d1e9b7
2021-12-29 10:25:26 -08:00
vfdev-5
ce9a2f8ba9 [C++ API] Added missing nearest-exact mode and anti-alias flag (#69318)
Summary:
Description:

Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692 adding missing nearest-exact mode and anti-alias flag to C++ frontend.

- https://github.com/pytorch/pytorch/pull/65142
- https://github.com/pytorch/pytorch/pull/64501

- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318

Reviewed By: davidberard98

Differential Revision: D33278995

Pulled By: jbschlosser

fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7
2021-12-22 11:10:51 -08:00
George Qi
bb51519937 bug fix FractionalMaxPool2d (random_samples dimensions) (#70031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200618

Pulled By: george-qi

fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db
2021-12-21 12:21:54 -08:00
Richard Barnes
afb742382a use irange for loops 10 (#69394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837991

fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
2021-12-09 09:49:34 -08:00
Ramanpreet Nara
f587267dc7 Revert D31705359: use irange for loops 8
Test Plan: revert-hammer

Differential Revision:
D31705359 (17e5200441)

Original commit changeset: c9ea2fbc0f9c

fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14
2021-12-02 12:55:08 -08:00
Richard Barnes
17e5200441 use irange for loops 8 (#66743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705359

fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
2021-12-02 10:21:29 -08:00
Vinnam Kim
7b701ce2d4 Add set_to_none option to C++ API (#68801)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68167.

Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801

Reviewed By: mruberry

Differential Revision: D32625239

Pulled By: jbschlosser

fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82
2021-11-29 08:42:39 -08:00
Pavithran Ramachandran
1ce500f56f [easy][PyTorch] Use at::native::is_nonzero (#67195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195

Now that `is_nonzero` is part of `at::native` refer https://github.com/pytorch/pytorch/pull/66663, replacing `TensorCompare::is_nonzero` to `at::native::is_nonzero`

ghstack-source-id: 141514416

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D31704041

fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d
2021-10-26 12:40:32 -07:00
Richard Barnes
e0643fa3fc use irange for loops 5 (#66744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705358

fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
2021-10-18 21:59:50 -07:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
soulitzer
93d326c868 Add InplaceOrView boxed kernel (#63878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878

See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:

In this PR:
 - Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel
   - Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that an "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
   - This limitation makes it impossible to use this node for general use
   - view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor)
   - do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable
 - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature:
   - static initialization can be slowed down(? not measured) if there are many registrations, because each line translates to 2 library calls but the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
 - Adds testing:
    - for views: view relationship created
      -  performing in-place operation on the view, raises properly
      - trying to create two view relationships is not allowed,
      - single view relationship but not first input/first output should error
      - view relation created properly for tensor vector output
    - for in-place:
      - version count bump
      - triggers rebase_history
      - multiple mutations is okay and also updates version counter
 - TODO (follow up): Update tutorials for adding  third-party operators (and document the above limitations)
 - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code

EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30901714

Pulled By: soulitzer

fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
2021-10-12 18:55:50 -07:00
Nikita Shulga
4c4525fa5c Compile without -Wno-unused-variable (take 2) (#66041)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Do not delete `caffe2::OperatorBase::Output` calls as they have side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041

Reviewed By: ngimel

Differential Revision: D31360142

Pulled By: malfet

fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
2021-10-04 20:39:39 -07:00
Nikita Shulga
e4ee5ca698 Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable
Test Plan: revert-hammer

Differential Revision:
D31326599 (a6280ab653)

Original commit changeset: 924155f1257a

fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf
2021-10-01 20:40:47 -07:00
Nikita Shulga
a6280ab653 Compile without -Wno-unused-variable (#65954)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954

Reviewed By: ngimel

Differential Revision: D31326599

Pulled By: malfet

fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
2021-10-01 17:40:47 -07:00
Michael Suo
33c03cb61a [deploy][1/n] Make deploy code conform to PyTorch style. (#65861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861

First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D31291183

Pulled By: suo

fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
2021-09-30 22:59:47 -07:00
kshitij12345
a012216b96 [nn] Fold : no batch dim (#64909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64907
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909

Reviewed By: cpuhrsch, heitorschueroff

Differential Revision: D30991087

Pulled By: jbschlosser

fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3
2021-09-23 08:37:32 -07:00
Edward Yang
9601deb1b3 Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147

I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203

Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and sush'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`

Reviewed By: soulitzer

Differential Revision: D30992685

fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
2021-09-17 08:32:43 -07:00
Peter Bell
d701357d92 Factor out TensorBase that doesn't depend on native operators (#63612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612

This makes Tensor inherit from a new class TensorBase, that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h with thus not need to
be rebuilt every time someone changes an operator signature.

Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.

To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.

I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728580

Pulled By: ezyang

fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
2021-09-08 13:28:54 -07:00
Maksim Levental
81fe2c5e49 add out variant of linear (#61801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801

resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: desertfire

Differential Revision: D29812510

Pulled By: makslevental

fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6
2021-09-07 19:58:52 -07:00
Will Constable
85df73658c Make name() part of IMethod interface (#63995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995

JIT methods already have name() in their interface, and Py methods have names in their implementation.  I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.

Test Plan: add case to imethod API test

Reviewed By: suo

Differential Revision: D30559401

fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
2021-08-30 13:31:55 -07:00
Thomas J. Fan
d3bcba5f85 ENH Adds label_smoothing to cross entropy loss (#63122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/7455

Partially resolves pytorch/vision#4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122

Reviewed By: iramazanli

Differential Revision: D30586076

Pulled By: jbschlosser

fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924
2021-08-29 23:33:04 -07:00
soulitzer
90a6498a12 Add autograd not implemented boxed fallback (#63458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458

See description and discussion from https://github.com/pytorch/pytorch/pull/62450

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518572

Pulled By: soulitzer

fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882
2021-08-27 15:00:28 -07:00
Jiewen Tan
ed573a8e08 Enable test_api IMethodTest in OSS (#63345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.

Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*

Reviewed By: ngimel

Differential Revision: D30346257

Pulled By: alanwaketan

fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
2021-08-26 16:50:52 -07:00
yanbing-j
33a163d886 Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)
Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514

Reviewed By: ejguan

Differential Revision: D30257612

Pulled By: VitalyFedyunin

fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f
2021-08-24 08:34:56 -07:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
Will Constable
22e3cc21e5 Back out "Enable test_api IMethodTest in OSS" (#62893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893

Original commit changeset: 50eb3689cf84

Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS

Reviewed By: seemethere, alanwaketan

Differential Revision: D30159999

fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c
2021-08-06 16:43:50 -07:00
Jiewen Tan
4b68801c69 Enable test_api IMethodTest in OSS (#62521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.

Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*

To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.

Reviewed By: ezyang

Differential Revision: D30055372

Pulled By: alanwaketan

fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
2021-08-04 21:14:20 -07:00
yanbing-j
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in CPU path using Mkldnn implementation. User doesn't need to_mkldnn() explicitly. New Gelu fp32 performs better than original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
Joel Schlosser
ee482edf0a Callable activation function support for Transformer modules (C++) (#62342)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747

Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342

Reviewed By: malfet

Differential Revision: D30022592

Pulled By: jbschlosser

fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
2021-08-02 08:06:39 -07:00
Will Constable
bc787f2402 Fix setArgumentNames and make Script/Python consistent (#62442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442

For PythonMethodWrapper::setArgumentNames, make sure to use the correct method
specified by method_name_ rather than using the parent model_ obj which itself
_is_ callable, but that callable is not the right signature to extract.

For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only
list the argument names to the unbound arguments which is what we need in practice.

Test Plan: update unit test and it passes

Reviewed By: alanwaketan

Differential Revision: D29965283

fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4
2021-07-29 21:29:06 -07:00
Richard Barnes
ee44d73e59 Modernize override (#61744)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717320

fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
2021-07-23 23:04:46 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
imaginary-person
9e53c823b8 Add AVX512 support in ATen & remove AVX support (#61903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903

### Remaining Tasks

- [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP).

### Summary

1. This draft PR produces binaries with with 3 types of ATen kernels - default, AVX2, AVX512 . Using the environment variable `ATEN_AVX512_256=TRUE`  also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed.

2. `nansum` is not using AVX512 kernel right now, as it has poorer accuracy for Float16, than does AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415).
It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now.

3. On Windows , ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now.

4. One test is currently being skipped -
[test_lstm` in `quantization.bc](https://github.com/pytorch/pytorch/issues/59098) - It fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines.

The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d.

Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses.
Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code.
Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests.

### Testing
1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2.
Only one test had to be modified, as it was hardcoded for AVX2.
2.  `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support.

### Would the downclocking caused by AVX512 pose an issue?

I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance.

This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance.

Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) -

![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG)
![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG)

The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them.

### Is PyTorch always faster with AVX512?

No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with with small tensors that fit in caches or in kernels that are more compute heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512.

It seems that memory-bound computations, such as adding two 64 MB tensors can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed.

Original pull request: https://github.com/pytorch/pytorch/pull/56992

Reviewed By: soulitzer

Differential Revision: D29266289

Pulled By: ezyang

fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184
2021-07-22 08:51:49 -07:00
Jiewen Tan
31beef009d Fix IMethodTest.GetArgumentNames after D29648756 (#61985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61985

Fix IMethodTest.GetArgumentNames after D29648756 (641f6ef8a7).
ghstack-source-id: 134054637

Test Plan: buck test mode/dev caffe2/test/cpp/api:imethod -- IMethodTest.GetArgumentNames

Reviewed By: suo

Differential Revision: D29828807

fbshipit-source-id: b1411745b91e1b8c0ea0fd9e9666e22125dde333
2021-07-22 00:21:59 -07:00
Laurence Rouesnel
adb73d3dcf Removed overhead from reshape() call if tensor doesn't need to be changed (#61466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466

## Goal

Per #55126 the performance of `reshape` is worse than `alias` in cases where they are performing the same operation (i.e. where reshape is returning a view) because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`).

The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has.

### Proposed Implementation

Instead of using `view` we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to which skips the relevant checks. This is functionally equivalent to `as_strided` however it is a lot simpler because it's specialized to this use-case, and importantly the `backward` implementation is a lot faster.

Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`.

### Why not `as_strided`?

Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function then `view` has the same performance as `reshape`. If we delegate to `as_strided` it is about 56% slower (and this holds against our custom function).

This is also the reason we make an internal operator named `_reshape_alias` instead of exposing a new operator since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`.

## Benchmarks
In a micro-benchmark for `backward` running:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
// `reshape(-1)` replaced with a call to view(-1) for view baseline
x.pow(4).reshape(-1).mean().backward();
```

I also benchmarked simple operations without gradients using:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
x.reshape(-1) // replaced with a call to view(-1) for view baseline
```

Baselined to `view`:

* Original `reshape`: `+3.3%` (without gradients `+20.8%`)
* Using `as_strided`: `+55.1%` (without gradients `+1.0%`)
* Using custom `_reshape_view`: `-1.0%` (without gradients `+6.2%`)

In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline):

* Original `view`: `53.66 us` (without gradients `582.78 ns`)
* Original `reshape`: `55.46 us` (without gradients `704.24 ns`)
* Using `as_strided`: `83.24 us` (without gradients `576.49 ns`)
* Using custom `_reshape_view`: `53.13 us` (without gradients `536.01 ns`)

Note that these benchmarks perform a backwards operation as well. When compared without using gradient computation at all the performance differneces are more pronounced as this takes up more of the time.

### Original performance

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.66 us
  IQR:    2.70 us (52.54 to 55.24)
  884 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 55.46 us
  IQR:    2.61 us (54.39 to 57.01)
  889 measurements, 100 runs per measurement, 1 thread]

2276116
2286256

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 582.78 ns
  IQR:    33.80 ns (573.80 to 607.61)
  833 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 704.24 ns
  IQR:    24.42 ns (697.20 to 721.62)
  679 measurements, 10000 runs per measurement, 1 thread]

56896
67036

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

</details>

### Using `as_strided`

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.37 us
  IQR:    3.15 us (51.73 to 54.88)
  936 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 83.24 us
  IQR:    4.05 us (81.20 to 85.25)
  609 measurements, 100 runs per measurement, 1 thread]

2267916
2525061

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50>
   31930  ???:_int_free
   15940  ???:malloc
   11595  ???:_int_malloc
   10100  ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    9360  ???:__tls_get_addr
    8280  ???:free
    8100  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    4520  ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_()
    4080  ???:operator new(unsigned long)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2560  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 257145
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 570.55 ns
  IQR:    32.69 ns (552.87 to 585.56)
  874 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 576.49 ns
  IQR:    37.95 ns (559.51 to 597.46)
  861 measurements, 10000 runs per measurement, 1 thread]

56896
58556

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60>
    2140  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1940  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1880  ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1720  ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1400  ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 1660

```

</details>

### Using custom function (`_reshape_alias`)

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.50 us
  IQR:    2.64 us (52.32 to 54.96)
  906 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.13 us
  IQR:    3.40 us (51.72 to 55.13)
  914 measurements, 100 runs per measurement, 1 thread]

2269736
2273236

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10>
    5060  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1220  ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 3500
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 505.10 ns
  IQR:    20.04 ns (500.41 to 520.45)
  944 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 536.01 ns
  IQR:    17.81 ns (531.34 to 549.16)
  916 measurements, 10000 runs per measurement, 1 thread]

56896
60376

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10>
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1860  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 3480

```

</details>

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29792126

Pulled By: laurencer

fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd
2021-07-21 14:05:35 -07:00
Will Constable
a25e6370e5 Add IMethod interface
Summary:
Expose IMethod interface, which provides a unified interface to either script or python methods backed by torchscript or torchdeploy.

IMethod provides a way to depend on a torch method without depending on a particular runtime implementation such as torchscript or python/deploy.

Test Plan: add unit tests.

Reviewed By: suo

Differential Revision: D29463455

fbshipit-source-id: 903391d9af9fbdd8fcdb096c1a136ec6ac153b7c
2021-06-30 11:28:24 -07:00
Xiong Wei
7e3a694b23 supports non-leaf inputs for autograd.backward() function (#60521)
Summary:
Close https://github.com/pytorch/pytorch/issues/60268

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60521

Reviewed By: ngimel

Differential Revision: D29393586

Pulled By: albanD

fbshipit-source-id: 2dd2de427ecfecca8d544237bacf690e0b7c918c
2021-06-25 18:57:26 -07:00
Michael Dagitses
91451369ed require non-empty inputs to grad() calls in the API (#52016)
Summary:
The grad() function needs to return the updated values, and hence
needs a non-empty inputs to populate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016

Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.

Fixes https://github.com/pytorch/pytorch/issues/47061

Reviewed By: albanD

Differential Revision: D26406444

Pulled By: dagitses

fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
2021-06-22 10:10:58 -07:00
Thomas J. Fan
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
Brian Hirsh
27a3204982 generate C++ API for meta functions using at::meta:: (#58570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58570

**What the PR does**
Generate a fast-path `at::meta::{op}` API for calling meta functions without having to go through the dispatcher. This will be important for perf for external backends that want to use meta functions for shape checking (which seems likely to be what we end up doing for LazyTensorCore).

**Details**
In order to avoid naming collisions I had to make two small changes:
- rename `MetaFunctions.h` template -> `NativeMetaFunctions.h` (this is the file that declares the impl() function for every structured operator).
- rename the meta class: `at::meta::{op}::meta()` -> `at::meta::structured_{op}::meta()`

I also deleted a few unnecessary includes, since any file that includes NativeFunctions.h will automatically include NativeMetaFunctions.h.

**Why I made the change**
This change isn't actually immediately used anywhere; I already started writing it because I thought it would be useful for structured composite ops, but that isn't actually true (see [comment](https://github.com/pytorch/pytorch/pull/58266#issuecomment-843213147)). The change feels useful and unambiguous though so I think it's safe to add. I added explicit tests for C++ meta function calls just to ensure that I wrote it correctly - which is actually how I hit the internal linkage issue in the PR below this in the stack.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D28711299

Pulled By: bdhirsh

fbshipit-source-id: d410d17358c2b406f0191398093f17308b3c6b9e
2021-06-15 16:54:46 -07:00
Jeffrey Wan
f52e202840 Add warning when accessing Tensor::grad() in the C++ API (#59362)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35379

 - Adds  `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`.
   - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?).
 - Python API now uses `retain_grad` implementation from cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362

Reviewed By: jbschlosser

Differential Revision: D28969298

Pulled By: soulitzer

fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02
2021-06-08 19:43:21 -07:00
Jeffrey Wan
1733d10399 Warn when backward() is called with create_graph=True (#59412)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4661
- Add warnings in engine's `execute` function so it can be triggered through both cpp and python codepaths
- Adds an RAII guard version of `c10::Warning::set_warnAlways` and replaces all prior usages of the set_warnAlways with the new one

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59412

Reviewed By: jbschlosser

Differential Revision: D28969294

Pulled By: soulitzer

fbshipit-source-id: b03369c926a3be18ce1cf363b39edd82a14245f0
2021-06-08 17:19:04 -07:00
Jeffrey Wan
4ae5764d47 Add is_inference to native functions (#58729)
Summary:
Adds `is_inference` as a native function w/ manual cpp bindings.
Also changes instances of `is_inference_tensor` to `is_inference` to be consistent with other properties such as `is_complex`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58729

Reviewed By: mruberry

Differential Revision: D28874507

Pulled By: soulitzer

fbshipit-source-id: 0fa6bcdc72a4ae444705e2e0f3c416c1b28dadc7
2021-06-04 08:59:11 -07:00
Joel Schlosser
ef32a29c97 Back out "[pytorch][PR] ENH Adds dtype to nn.functional.one_hot" (#59080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59080

Original commit changeset: 3686579517cc

Test Plan: None; reverting diff

Reviewed By: albanD

Differential Revision: D28746799

fbshipit-source-id: 75a7885ab0bf3abadde9a42b56d479f71f57c89c
2021-05-27 15:40:52 -07:00
Adnios
09a8f22bf9 Add mish activation function (#58648)
Summary:
See issus: https://github.com/pytorch/pytorch/issues/58375

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648

Reviewed By: gchanan

Differential Revision: D28625390

Pulled By: jbschlosser

fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4
2021-05-25 10:36:21 -07:00