10 Commits

Author SHA1 Message Date
Shen Li
ac6e75a165 Revert D20195053: [pytorch][PR] Add API for listing functions overridable by __torch_function__
Test Plan: revert-hammer

Differential Revision:
D20195053

Original commit changeset: 1585f4e405f5

fbshipit-source-id: 3c1aab9c60e3138d40d200ae4238bda0cddf8896
2020-03-04 10:13:54 -08:00
Nathan Goldbaum
ad2825a2c9 Add API for listing functions overridable by __torch_function__ (#33791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33182

This adds private API functions that developers of types that implement `__torch_function__` can use to ensure full coverage of the subset of the PyTorch API that can be overridden.

I've refactored some of the code in the tests into a new `torch._overrides.get_overridable_functions` function. I've also changed `TENSOR_LIKE_TORCH_OVERRIDES` into `torch._overrides.get_testing_overrides` and `IGNORED_TORCH_FUNCTIONS` into `torch._overrides.get_ignored_functions`. Making these two static global variables in the tests into functions should allow rewriting their implementation to construct their return values instead of just statically defining the return value as is done here. Currently that is blocked on not being able to inspect function signatures of compiled kernels in PyTorch (see https://github.com/pytorch/pytorch/issues/28233). See the docs I've added for usage examples of these new functions. I also refactored the existing override tests to make use of these new functions, which should be a good forcing function to make sure they're kept up-to-date.
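A minimal sketch of the coverage check these helpers are meant to support. It uses a toy namespace rather than `torch`, and the helper names and signatures (`toy_overridable_functions`, `missing_overrides`) are hypothetical, not the functions added in this PR:

```python
import types


def toy_overridable_functions(namespace, ignored):
    """Return the public callables in `namespace` that are not ignored.

    Hypothetical stand-in for the idea of listing overridable functions.
    """
    return {
        name: obj
        for name, obj in vars(namespace).items()
        if callable(obj) and not name.startswith("_") and name not in ignored
    }


def missing_overrides(namespace, ignored, testing_overrides):
    """Names of overridable functions that have no testing override yet."""
    overridable = toy_overridable_functions(namespace, ignored)
    return sorted(
        name for name, func in overridable.items() if func not in testing_overrides
    )


# Toy namespace standing in for the `torch` module (assumption).
toy = types.SimpleNamespace(
    add=lambda a, b: a + b,
    mean=lambda xs: sum(xs) / len(xs),
    manual_seed=lambda seed: None,
)
ignored = {"manual_seed"}                        # functions that are not overridable
testing_overrides = {toy.add: lambda a, b: -1}   # function -> dummy override

print(missing_overrides(toy, ignored, testing_overrides))  # ['mean']
```

A test that asserts this list is empty fails as soon as a new function is added to the namespace without either an override or an entry in the ignored set.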

Finally, while working on this I discovered that `TestTorchFunctionOverrides.test_mean` and `TestTorchFunctionOverrides.test_mm` weren't ever being run because they were getting clobbered by the other dynamically generated override tests. I fixed that by renaming the tests and then fixing the actual test code. I've verified that the subclassing semantics are correct and that the updated test answers are correct. I'm happy to put the fixes to the existing tests in as a separate pull request if that would be easier to review.
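The clobbering described above can be reproduced in a few lines. This is an illustrative sketch with hypothetical names (`TestOverrides`, `make_generated_test`), not the actual test file:

```python
import unittest


class TestOverrides(unittest.TestCase):
    def test_mean(self):
        # Hand-written test that we expect to run.
        self.ran = "handwritten"


def make_generated_test(name):
    def generated(self):
        self.ran = "generated:" + name
    return generated


# Dynamically attached tests reuse the "test_<name>" pattern, so the
# generated test for "mean" silently replaces the hand-written one.
for name in ["mean", "mm"]:
    setattr(TestOverrides, "test_" + name, make_generated_test(name))

case = TestOverrides("test_mean")
case.test_mean()
print(case.ran)  # generated:mean
```

Giving the hand-written tests names that the generator never produces avoids the collision.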

ping cpuhrsch since the feature request originally came from them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33791

Differential Revision: D20195053

Pulled By: cpuhrsch

fbshipit-source-id: 1585f4e405f5223932b410eae03a288dc8eb627e
2020-03-03 12:40:34 -08:00
Nathan Goldbaum
fa80299bdf __torch_function__ overrides for torch.functional and torch.nn.functional (#32799)
Summary:
This adds `__torch_function__` support for all functions in `torch.functional` and `torch.nn.functional`.

The changes to C++ code and codegen scripts are to facilitate adding `__torch_function__` support for the native functions in `torch._C._nn`. Note that I moved the `handle_torch_function` C++ function to a header that both `python_torch_functions.cpp` and `python_nn_functions.cpp` include. The changes to `python_nn_functions.cpp` mirror the changes I made to `python_torch_functions.cpp` when `__torch_function__` support was first added in https://github.com/pytorch/pytorch/issues/27064. Due to the somewhat different way the `torch._C` and `torch._C._nn` namespaces are initialized, I needed to create a new static reference to the `torch._C._nn` namespace (`THPNNVariableFunctions`). I'm not sure if that is the best way to do this. In principle, I could import these namespaces in each kernel and avoid the global variable, but that would have a runtime cost.

I added `__torch_function__` support to the Python functions in `torch.nn.functional` following the approach in https://github.com/pytorch/pytorch/issues/32194.

I re-enabled the test that checks if all functions in the `torch` namespace are explicitly tested for `__torch_function__` support. I also generalized the check to work for `torch.functional` and `torch.nn.functional` as well. This test was explicitly disabled in https://github.com/pytorch/pytorch/issues/30730 and I'm happy to disable it again if you think that's appropriate. I figured now was as good a time as any to try to re-enable it.

Finally, I adjusted the existing torch API tests to suppress deprecation warnings and to add keyword arguments used by some of the code in `torch.nn.functional` that were missed when I originally added the tests in https://github.com/pytorch/pytorch/issues/27064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32799

Differential Revision: D19956809

Pulled By: ezyang

fbshipit-source-id: 40d34e0109cc4b9f3ef62f409d2d35a1d84e3d22
2020-02-21 08:38:37 -08:00
Pritam Damania
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
Nathan Goldbaum
bab87e4b60 reimplement __torch_function__ overrides for torch.functional using inline logic (#32194)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30831.

This improves the performance of operators in the `torch.functional` namespace that are overridable by `__torch_function__` implementations when supplied with `Tensor` operands.
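The message does not spell out the inline check itself. The following is a minimal pure-Python sketch of the general idea, contrasting a decorator-based wrapper with a check written at the top of the function body. `Plain`, `Wrapped`, `has_override`, `handle`, and both `split_*` functions are hypothetical stand-ins, not PyTorch code:

```python
import functools


class Plain:
    """Stand-in for a built-in tensor type; defines no override."""
    def __init__(self, data):
        self.data = list(data)


class Wrapped(Plain):
    """Stand-in for a user type that overrides API functions."""
    def __torch_function__(self, func, args=(), kwargs=None):
        return ("overridden", func.__name__)


def has_override(args):
    return any(hasattr(type(a), "__torch_function__") for a in args)


def handle(public_api, relevant, args, kwargs):
    for arg in relevant:
        if hasattr(type(arg), "__torch_function__"):
            return arg.__torch_function__(public_api, args, kwargs)
    raise TypeError("no override found")


def _chunks(data, n):
    return [data[i:i + n] for i in range(0, len(data), n)]


def overridable(dispatcher):
    """Decorator style: every call pays for the dispatcher and the wrapper."""
    def decorator(impl):
        @functools.wraps(impl)
        def wrapper(*args, **kwargs):
            relevant = dispatcher(*args, **kwargs)
            if has_override(relevant):
                return handle(wrapper, relevant, args, kwargs)
            return impl(*args, **kwargs)
        return wrapper
    return decorator


@overridable(lambda t, n: (t,))
def split_decorated(t, n):
    return _chunks(t.data, n)


def split_inline(t, n):
    # Inline style: the common case costs one exact type comparison.
    if type(t) is not Plain and has_override((t,)):
        return handle(split_inline, (t,), (t, n), {})
    return _chunks(t.data, n)


print(split_inline(Plain(range(4)), 2))    # [[0, 1], [2, 3]]
print(split_inline(Wrapped(range(4)), 2))  # ('overridden', 'split_inline')
```

Both styles give the same results; the inline version skips the extra function call and dispatcher invocation when the operand is the plain type.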

Running the split benchmark in various configurations produces the following timings:

<details>
<summary>Expand for timings on <code>master</code> </summary>

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 3.340

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cuda
# Input: M: 8, N: 8, parts: 2, device: cuda
Forward Execution Time (us) : 3.333

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.366

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cuda
# Input: M: 256, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 3.385

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.468

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cuda
# Input: M: 512, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 3.416
```
</details>

<details>
<summary>Expand for timings with this pull request applied</summary>

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 2.261

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cuda
# Input: M: 8, N: 8, parts: 2, device: cuda
Forward Execution Time (us) : 2.223

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.237

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cuda
# Input: M: 256, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.218

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.259

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cuda
# Input: M: 512, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.234
```

</details>

<details>
<summary>Expand for timings on <code>master</code> with <code>__torch_function__</code> dispatch disabled </summary>

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cpu
# Input: M: 8, N: 8, parts: 2, device: cpu
Forward Execution Time (us) : 2.180

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M8_N8_parts2_cuda
# Input: M: 8, N: 8, parts: 2, device: cuda
Forward Execution Time (us) : 2.172

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.171

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cuda
# Input: M: 256, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.146

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cpu
# Input: M: 512, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 2.175

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M512_N512_parts2_cuda
# Input: M: 512, N: 512, parts: 2, device: cuda
Forward Execution Time (us) : 2.152
```

</details>

So at least on the machine I'm testing on, this brings the overhead down to less than 100 ns. For comparison, the overhead for `__array_function__` in NumPy is about 850 ns on the same machine.
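The overhead figures can be checked from the timings quoted in this message. The sketch below averages the six `split` configurations per run and subtracts the run with dispatch disabled; averaging across configurations is an assumption about how to summarize the numbers, not necessarily how the author computed them:

```python
from statistics import mean

# Forward execution times in microseconds, copied from the timing listings.
master = [3.340, 3.333, 3.366, 3.385, 3.468, 3.416]
with_pr = [2.261, 2.223, 2.237, 2.218, 2.259, 2.234]
dispatch_disabled = [2.180, 2.172, 2.171, 2.146, 2.175, 2.152]


def overhead_ns(timings, baseline):
    """Mean dispatch overhead in nanoseconds relative to `baseline`."""
    return (mean(timings) - mean(baseline)) * 1000.0


print(f"master overhead:  {overhead_ns(master, dispatch_disabled):.0f} ns")   # about 1219 ns
print(f"with PR overhead: {overhead_ns(with_pr, dispatch_disabled):.0f} ns")  # about 73 ns

# NumPy: np.mean([1]) versus np.mean._implementation([1]), in microseconds.
numpy_overhead_ns = (8.89 - 8.04) * 1000.0
print(f"NumPy overhead:   {numpy_overhead_ns:.0f} ns")                        # about 850 ns
```

The largest per-configuration difference between the PR and the dispatch-disabled run is also below 100 ns (2.259 - 2.175 = 0.084 µs).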

<details>
<summary>Expand for timings for NumPy <code>__array_function__</code> dispatch </summary>

```
In [1]: import numpy as np

In [2]: %timeit np.mean([1])
8.89 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [3]: %timeit np.mean._implementation([1])
8.04 µs ± 28.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

See [the implementation in NumPy](https://github.com/numpy/numpy/blob/master/numpy/core/overrides.py#L195) for why this measures `__array_function__` overhead.

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32194

Differential Revision: D19410396

Pulled By: ezyang

fbshipit-source-id: ada788a4399c81cd7eb2d548aa04a2459e96634a
2020-01-16 07:10:38 -08:00
Karl Ostmo
227d1a43a4 Revert D18838848: disable __torch_function__ overides for operators in torch.functional
Test Plan: revert-hammer

Differential Revision:
D18838848

Original commit changeset: 22b8015d7b2f

fbshipit-source-id: fdaeffcd112990ed379782cf7216d3f1beeb2cb1
2020-01-07 15:03:15 -08:00
Nathan Goldbaum
ca72df06ae disable __torch_function__ overides for operators in torch.functional (#30839)
Summary:
For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable; however, this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the Python-level override mechanism and, failing that, re-implementing all of these operators in C++.

cc hl475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839

Differential Revision: D18838848

Pulled By: ezyang

fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8
2020-01-07 12:27:28 -08:00
Nathan Goldbaum
9d3402e4cb Add the __torch_function__ API override mechanism (#30730)
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). The original PR landed at the same time as other work that added new operators to the `torch` namespace, so the check that the `torch` namespace is exhaustively tested for overridability was triggering test failures.

I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730

Differential Revision: D18813270

Pulled By: ezyang

fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
2019-12-04 13:19:07 -08:00
Edward Yang
b8792c0438 Revert D18645954: add __torch_function__ API override mechanism
Test Plan: revert-hammer

Differential Revision:
D18645954

Original commit changeset: 54b5e4344d7a

fbshipit-source-id: 4a7aebb483e6b001130d6f384ccc53c5a808ab13
2019-12-04 07:41:47 -08:00
Prasun Anand
d12786b24f add __torch_function__ API override mechanism (#27064)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24015 (see description of that issue for more details).

For a toy example, see the `DiagonalTensor` and `SubDiagonalTensor` classes in test/test_overrides.py.

This PR currently contains:

* tests for `__torch_function__` behavior
* modifications to `gen_python_functions` and to the `parse` function signatures so that calls are dispatched to the correct overloaded argument.

This feature is inspired by and analogous to NumPy's `__array_function__` protocol ([see NumPy Enhancement Proposal 18](https://numpy.org/neps/nep-0018-array-function-protocol.html#trying-array-function-methods-until-the-right-one-works)).
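A minimal pure-Python sketch of the NEP 18 style dispatch rules (subclasses are tried before superclasses, and `NotImplemented` passes control to the next candidate). The classes `ToyDiagonal` and `ToySubDiagonal`, the functions `toy_mean` and `toy_add`, and the `__torch_function__(self, func, args, kwargs)` signature used here are assumptions for illustration; this is not the code in test/test_overrides.py or the C++ implementation:

```python
def overloaded_args(args):
    """Args whose type defines __torch_function__, one per type,
    ordered so that subclasses come before their superclasses."""
    found = []
    for arg in args:
        t = type(arg)
        if not hasattr(t, "__torch_function__"):
            continue
        if any(type(f) is t for f in found):
            continue
        index = len(found)
        for i, other in enumerate(found):
            if issubclass(t, type(other)):
                index = i  # place the subclass ahead of its superclass
                break
        found.insert(index, arg)
    return found


def dispatch(func, args, default):
    """Try each override in order; fall back to `default` if none exist."""
    candidates = overloaded_args(args)
    if not candidates:
        return default(*args)
    for arg in candidates:
        result = arg.__torch_function__(func, args, {})
        if result is not NotImplemented:
            return result
    raise TypeError(f"no implementation found for {func.__name__}")


def toy_mean(x):
    return dispatch(toy_mean, (x,), lambda v: sum(v) / len(v))


def toy_add(a, b):
    return dispatch(toy_add, (a, b), lambda p, q: p + q)


class ToyDiagonal:
    """n x n matrix with `value` on the diagonal and zeros elsewhere."""
    def __init__(self, n, value):
        self.n, self.value = n, value

    def __torch_function__(self, func, args=(), kwargs=None):
        if func is toy_mean:
            return self.value / self.n  # n * value spread over n * n entries
        return NotImplemented


class ToySubDiagonal(ToyDiagonal):
    def __torch_function__(self, func, args=(), kwargs=None):
        if func is toy_add:
            return "ToySubDiagonal handled add"
        return super().__torch_function__(func, args, kwargs)


print(toy_mean([1, 2, 3]))             # 2.0
print(toy_mean(ToyDiagonal(4, 2.0)))   # 0.5
print(toy_add(ToyDiagonal(2, 1.0), ToySubDiagonal(2, 1.0)))  # ToySubDiagonal handled add
```

Calling `toy_add` with two `ToyDiagonal` operands raises `TypeError`, because the only candidate returns `NotImplemented`.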

### Benchmarks:
See Nathan's comment below: https://github.com/pytorch/pytorch/pull/27064#issuecomment-554601189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27064

Differential Revision: D18645954

Pulled By: ezyang

fbshipit-source-id: 54b5e4344d7afdbcf996bb57191b0bdadc7b1767
2019-12-04 05:56:46 -08:00