Commit Graph

452 Commits

Author SHA1 Message Date
gmagogsfm
ddba7a5a55 Expose torch.export() API (#106904)
Other class definitions and utilities will be moved in subsequent PRs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106904
Approved by: https://github.com/avikchaudhuri
2023-08-16 10:47:26 +00:00
Tugsbayasgalan Manlaibaatar
20c5add133 [export] Refactor constrain_as_value and constrain_as_size (#106591)
Some notable changes:
1. `constrain_as_size` allows min value to be less than 2 as it will unconditionally assume min >= 2 for compiler purposes. Instead, we add additional check to make sure max value is always greater than 2.
2. Previously, we used to runtime assert on the unbacked symint's val range which would be always between [2, max]. I modified this logic to assert on [0, max] unless user explicitly specifies the min range.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106591
Approved by: https://github.com/gmagogsfm, https://github.com/ezyang
2023-08-15 05:41:43 +00:00
PyTorch MergeBot
745d29b0cc Revert "[export] Refactor constrain_as_value and constrain_as_size (#106591)"
This reverts commit 18989890bf.

Reverted https://github.com/pytorch/pytorch/pull/106591 on behalf of https://github.com/izaitsevfb due to Breaks inductor test on trunk ([comment](https://github.com/pytorch/pytorch/pull/106591#issuecomment-1675069091))
2023-08-11 16:37:47 +00:00
Tugsbayasgalan Manlaibaatar
18989890bf [export] Refactor constrain_as_value and constrain_as_size (#106591)
Some notable changes:
1. `constrain_as_size` allows min value to be less than 2 as it will unconditionally assume min >= 2 for compiler purposes. Instead, we add additional check to make sure max value is always greater than 2.
2. Previously, we used to runtime assert on the unbacked symint's val range which would be always between [2, max]. I modified this logic to assert on [0, max] unless user explicitly specifies the min range.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106591
Approved by: https://github.com/gmagogsfm, https://github.com/ezyang
2023-08-11 05:29:22 +00:00
Justin Chu
79c5e33349 [BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet, https://github.com/albanD
2023-07-21 07:38:46 +00:00
Nikita Vedeneev
437bc5b1b7 sparse_mask: backward support for sparse lhs (take 2) (#104341)
This is a copy of https://github.com/pytorch/pytorch/pull/95165 with some bug fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104341
Approved by: https://github.com/albanD, https://github.com/pearu, https://github.com/amjames
2023-07-03 14:12:44 +00:00
Jinku Cui
27eecf32bd Remove redundant dummy overrides (#103992)
# Tidy the code in [overrides.py](https://github.com/pytorch/pytorch/blob/main/torch/overrides.py)

## Duplicate APIs in the [get_testing_overrides()](https://github.com/pytorch/pytorch/blob/main/torch/overrides.py#L335) function:

| APIs  | Line number|
|-------|-------|
| torch.fft.fft| L544 L564 |
| torch.logsumexp | L670 L672
| torch.narrow_copy | L733 L1126 |
| torch.native_norm | L740 L741 L742 |
| torch.nn.init.constant_ | L885 L887 |
| torch.squeeze_copy | L1134 L1135 |
| torch.view_copy | L1148 L1149 |
| Tensor.\_coalesced\_ | L1236 L1261 |

## Testing script

```Python

import torch
import inspect
import functools
from typing import Dict, Set, Callable

"""
@functools.lru_cache(None)
def get_testing_overrides() -> Dict[Callable, Callable]:
    ...
    Tensor = torch.Tensor
    ret: Dict[Callable, Callable] = {
        # ...
        torch.fft.fft: lambda input, n=None, dim=-1, norm=None: -1,                         # L544
        torch.fft.fft: lambda input, n=None, dim=-1, norm=None: -1,                         # L564
        torch.logsumexp: lambda input, names, keepdim=False, out=None: -1,                  # L670
        torch.logsumexp: lambda input, names, keepdim=False, out=None: -1,                  # L672
        torch.narrow_copy: lambda input, dim, start, length: -1,                            # L733
        torch.narrow_copy: lambda self, dim, start, length: -1,                             # L1126
        torch.native_norm: lambda input, p=2: -1,                                           # L740
        torch.native_norm: lambda input, p=2: -1,                                           # L741
        torch.native_norm: lambda input, p=2, dim=None, keepdim=False, dtype=None: -1,      # L742
        torch.squeeze_copy: lambda self: -1,                                                # L1134
        torch.squeeze_copy: lambda self, dim: -1,                                           # L1135
        torch.view_copy: lambda self, size: -1,                                             # L1148
        torch.view_copy: lambda self, dtype: -1,                                            # L1149
        Tensor._coalesced_: lambda self: -1,                                                # L1236
        Tensor._coalesced_: lambda self, coalesced: -1,                                     # L1261
        # ...
    }
    ...
"""

if __name__ == "__main__":
    ret = torch.overrides.get_testing_overrides()

    Tensor = torch.Tensor
    dups = {"torch.fft.fft": torch.fft.fft,
            "torch.logsumexp": torch.logsumexp,
            "torch.narrow_copy": torch.narrow_copy,
            "torch.native_norm": torch.native_norm,
            "torch.squeeze_copy": torch.squeeze_copy,
            "torch.view_copy": torch.view_copy,
            "Tensor._coalesced_": Tensor._coalesced_}

    for k,v in dups.items():
        print(f"{k:18} {inspect.signature(ret[v])}")

```

## Testing output

```Shell
torch.fft.fft      (input, n=None, dim=-1, norm=None)
torch.logsumexp    (input, names, keepdim=False, out=None)
torch.narrow_copy  (self, dim, start, length)
torch.native_norm  (input, p=2, dim=None, keepdim=False, dtype=None)
torch.squeeze_copy (self, dim)
torch.view_copy    (self, dtype)
Tensor._coalesced_ (self, coalesced)

```

## Explanation:
The function `get_testing_overrides()` returns a `Dict[Callable, Callable]`. The later dummy overrides will cover the previous dummy overrides in the returned `Dict`. Therefore, removing the dummy overrides with homonym API names can tidy the code and increase the readability of the code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103992
Approved by: https://github.com/kit1980
2023-06-28 01:59:56 +00:00
Meghan
6ff4548b6e [AMP] Support XLA:TPU (#96370)
With https://github.com/pytorch/xla/pull/5148, https://github.com/pytorch/xla/pull/4740

With these changes
XLA:GPU users should use `torch.cuda.amp.autocast()` for AMP with float16
XLA:TPU users should use `torch.amp.autocast('xla')` for AMP with bfloat16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96370
Approved by: https://github.com/bdhirsh, https://github.com/malfet
2023-06-23 19:46:42 +00:00
PyTorch MergeBot
7274582390 Revert "sparse_mask: backward support for sparse lhs (#95165)"
This reverts commit f090fdf3b4.

Reverted https://github.com/pytorch/pytorch/pull/95165 on behalf of https://github.com/huydhn due to Sorry for reverting this. I think one of the tests test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128 is failing on slow gradcheck f090fdf3b4 ([comment](https://github.com/pytorch/pytorch/pull/95165#issuecomment-1604696109))
2023-06-23 18:40:15 +00:00
Nikita Vedeneev
f090fdf3b4 sparse_mask: backward support for sparse lhs (#95165)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95165
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-06-23 12:27:27 +00:00
xuanqi
a152b3e3b8 [RFC] Create functional aten assertion ops (#103751)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* #103887
* #103757
* __->__ #103751

Prep PR to create functional version of assertions. Concrete logic will be implemented in future PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103751
Approved by: https://github.com/tugsbayasgalan
2023-06-23 06:20:42 +00:00
Muralidhar Andoorveedu
4e204ff87b Added is_xla (#103100)
This change creates `is_xla` which is congruent with `is_cuda` and `is_cpu`. Useful in situations like: https://github.com/pytorch/pytorch/pull/102858

```
>>> x = torch.tensor([1], device=xm.xla_device())
>>> x.is_xla
True
>>> x.is_cpu
False
>>> x = torch.tensor([1])
>>> x.is_cpu
True
>>> x.is_xla
False
```

Attn: @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103100
Approved by: https://github.com/albanD
2023-06-22 23:31:04 +00:00
Charlie West-Taylor
5eb7325bc7 Add autocast support for IPU (#103890)
As part of this, a new `AutocastIPU` dispatch key has been added.

There's an existing PR, #85043, to make `Autocast` a proper per-backend functionality key, but it ran into issues with layering with other functionality keys and went stale.

This has been tested in the out-of-tree IPU PyTorch backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103890
Approved by: https://github.com/albanD
2023-06-22 15:38:45 +00:00
Aleksandar Samardžić
09fdea8564 Fix autograd issue with identity conversions (#92022)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92022
Approved by: https://github.com/pearu, https://github.com/mtaaooby, https://github.com/amjames, https://github.com/cpuhrsch
2023-06-21 21:23:03 +00:00
xuanqi
b27c3558a4 [RFC]: Create aten native op for constrain_range (#103346)
At high current implementation of constrains functions (constrain_as_**) will raise exception for the following code snippets:
```
def f(x):
    a = x.item()
    constrain_as_size(a, 4, 7)
    return torch.empty((a, 4))

inp = torch.tensor([5])
ep = torch._export.export(f, (inp,))
```

The reason is because current constrain logic is:
1) Purely python so it won't survive AOT export (the full node is gone after AOT export since AOT export only maintains aten level op).
2) Utilize side effect to add range constraints for traced symbol's shape env ([code](9591e52880/torch/fx/experimental/symbolic_shapes.py (L370-L372))).
3) If runtime assertion is turned on (by default). [`_AddRuntimeAssertionsForConstraintsPass`](9591e52880/torch/_export/passes/add_runtime_assertions_for_constraints_pass.py (L98-L100)) will try to append assertion node based on range constrains extracted from shape env of symbol during another interpretation round.
4). However, since 1), in the round of AOT export, range constraints logic won't run for symbols generated during this round. And later there is no range constrains information available for assertion round and caused issue.
5) As a result of above, it will failure at `torch.empty((a, 4))` (there is no constrains for `a` that it must be positive).

The fix here is just to implement range constrain logic as a native aten op (CPU implementation as no-op) to make it be able to survive AOT export.

**NOTE:**
[Logic](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (L350-L365C15)) within [`constrain_range`](2d745b95d7/torch/fx/experimental/symbolic_shapes.py (LL313C74-L313C74)) is split out as `constrain_range_int` to capture case when non `SymInt` is passed in and reused in the new `_constrain_range`. The reason is when non `SymInt` is provided:
* If it directly calls `sym_constrain_range`, the C++ version will be called which will be no-op.
* So in this case it calls `constrain_range_int` instead to be able to capture issue like user provides a input whose tensor's shape could be out of range during exporting, like the following for above code example:
```
...
inp = torch.tensor([10])
ep = torch._export.export(f, (inp,)) # immediately raise error
```

Differential Revision: [D46734204](https://our.internmc.facebook.com/intern/diff/D46734204)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103346
Approved by: https://github.com/tugsbayasgalan
2023-06-16 14:55:40 +00:00
Nikita Vedeneev
056d92e2a0 sparse.mm backward: performance improvements (#94991)
`torch.sparse.mm` - faster and without syncs in "most" cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94991
Approved by: https://github.com/Skylion007, https://github.com/pearu, https://github.com/cpuhrsch
2023-06-12 20:57:29 +00:00
Richard Zou
74f10b9ea5 Switch most Python RAII guard usages to context manager (#102642)
There are some I can't easily switch due to reasons like:
- Dynamo modelling the guard
- BC concerns (for torch.autograd.set_multithreading_enabled)

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102642
Approved by: https://github.com/albanD
2023-06-01 16:28:37 +00:00
leslie-fang-intel
488a4303a5 Enable quantized_max_pool3d (#101654)
**Summary**
Enable `quantized_max_pool3d` kernel to fix the issue https://github.com/pytorch/pytorch/issues/101386.

**Test Plan**
```
clear && python -u -m pytest -s -v test_quantized_op.py -k test_max_pool3d
clear && python -u -m pytest -s -v test_quantized_op.py -k test_max_pool3d_nhwc
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101654
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/mingfeima
2023-05-23 00:45:38 +00:00
Tugsbayasgalan Manlaibaatar
d4bf76c2a4 Persist torch.assert in aten graph (#100101)
This PR introduces a new operator called aten._assert_async.msg, which allows passing a tensor value and assertion message as inputs. As part of TorchDynamo, we're replacing the use of torch._assert with this new operator so that make_fx also knows how to handle assertions. This is subset of https://github.com/pytorch/pytorch/pull/98878, refer there for historic reviews.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100101
Approved by: https://github.com/jansel
2023-04-28 07:31:43 +00:00
Justin Chu
6e3cdcad08 Fix flake8 lint errors - part 2 - manual fixes (#99799)
<!--
copilot:all
-->
### <samp>🤖 Generated by Copilot at 8aef78f</samp>

### Summary
📝🚀🛠️

<!--
1.  📝 for modifying the logging format and style
2.  🚀 for improving performance and avoiding unnecessary string creation
3.  🛠️ for fixing flake8 issues
-->
This pull request updates some logging calls to use old-style string formatting with `%s` placeholders instead of f-strings in `torch/_dynamo/logging.py`, `torch/_functorch/compilers.py`, and `torch/fx/passes/pass_manager.py` as part of a logging standardization effort. It also adds a `# noqa: F404` comment to the `import __future__` statement in `torch/overrides.py` to fix a flake8 warning.

> _`log` uses old style_
> _formatting strings with `%s`_
> _logging is faster_

### Walkthrough
*  Standardize logging format and style to use old-style string formatting with `%s` placeholders instead of f-string syntax for performance and consistency ([link](https://github.com/pytorch/pytorch/pull/99799/files?diff=unified&w=0#diff-18807f7fd187b8bc8e69e93722566195b36d5bf269099b415a6f90b552228d6bL55-R55), [link](https://github.com/pytorch/pytorch/pull/99799/files?diff=unified&w=0#diff-fae8a66564055743ec031edb87eb22edeebf7fdebef9d21660d5e6a6252e5222L370-R373), [link](https://github.com/pytorch/pytorch/pull/99799/files?diff=unified&w=0#diff-5f3e37ded032f24e247dcf4a3be4b73ea0cf21382e342631742e5a04550202e1L72-R72))
*  Suppress flake8 warning for `import __future__` statement in `torch/overrides.py` with `# noqa: F404` comment ([link](https://github.com/pytorch/pytorch/pull/99799/files?diff=unified&w=0#diff-4f601fe7f31e875ee4354882c0bb490bc35e51d3d413d058cc5fda3be8ca9f15L23-R23))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99799
Approved by: https://github.com/Skylion007
2023-04-24 06:03:26 +00:00
Guang Yang
c377a8590b Add nonzero_static() op to pytorch to unblock export (#97417)
Summary: Add new experimental python op (`torch.nonzero_static`) for export. There is NO cuda impl included in this PR

Example:

Say input tensor is `x = torch.tensor([[1, 0], [3, 2]])`

call regular `nonzero()` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1])`
call `nonzero_static(x, size=4)` on x will give you a tensor `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value])` (padded)
call `nonzero_static(x, size=2)` on x will give you a tensor `tensor([[0, 0], [1, 0])` (truncated)

Test Plan:
**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of `GraphModule` in the exported graph
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
2023-04-11 05:13:36 +00:00
BJ Hargrave
555ab310dc Add itemsize and nbytes properties to Tensor (#98322)
Adds properties for itemsize and nbytes to Tensor matching the properties in NumPy.

Fixes https://github.com/pytorch/pytorch/issues/12728

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98322
Approved by: https://github.com/ezyang
2023-04-05 12:11:55 +00:00
Joel Schlosser
77e73b9b7a Refactor NT offsets metadata to be a Tensor (#96909)
It's tedious work, but somebody's gotta do it.

Benefits:
* Enable access to offsets metadata from Python via private API (for validation, etc.)
* Consistency with nested sizes / strides metadata
* Needed for SymInt-ifying offsets metadata
* more TBD

Bonus:
* Remove `_tensor` suffixes from metadata / getter names
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96909
Approved by: https://github.com/drisspg
2023-03-21 18:51:35 +00:00
Pearu Peterson
2abcafcfd8 Add masked_grad kw argument to to_dense (#96095)
As in the title.

The `masked_grad` kw argument is required for `to_dense` backward to distinguish the expected semantics of sparse tensors. `masked_grad=True` means that the `to_dense` backward will apply a mask to the returned gradient where the mask is defined by the input indices. The default semantics implies `masked_grad==True` for BC but see the [comment](https://github.com/pytorch/pytorch/pull/96095/files#diff-d4df180433a09071e891d552426911c227b30ae9b8a8e56da31046e7ecb1afbeR501-R513) in `to_dense_backward`.

As a consequence, existing code that is run through autograd engine must replace `.to_dense()` calls with `.to_dense(masked_grad=False)`. For example,
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense())
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense())
```
(recall, gradcheck has `masked=False` as default) must be updated to
```python
torch.autograd.gradcheck(lambda x: torch.sum(x, [0]).to_dense(masked_grad=False))
torch.autograd.gradcheck(lambda x: torch.sparse.sum(x, [0]).to_dense(masked_grad=True), masked=True)
```

Fixes https://github.com/pytorch/pytorch/issues/95550

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96095
Approved by: https://github.com/cpuhrsch
2023-03-16 21:38:11 +00:00
Edward Z. Yang
ce950b412f Reland "Add torch.empty_permuted (#95069)" (#95208)
This reverts commit 92e03cd583.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95208
Approved by: https://github.com/albanD
2023-02-21 18:02:48 +00:00
PyTorch MergeBot
92e03cd583 Revert "Add torch.empty_permuted (#95069)"
This reverts commit bedeb1f014.

Reverted https://github.com/pytorch/pytorch/pull/95069 on behalf of https://github.com/jeanschmidt due to Breaking internal builds. More in https://fburl.com/phabricator/ztrxrroq
2023-02-21 12:05:20 +00:00
Edward Z. Yang
bedeb1f014 Add torch.empty_permuted (#95069)
torch.empty_permuted is a generalized version of torch.empty(memory_format=...), where you can pass an arbitrary physical layout as a tuple of dims to allow you to setup dense, non-overlapping tensors with non-standard memory format. Check the docblock for a full description of semantics.

The initial motivation for this PR is with guard-less unbacked SymInts. Traditionally, the way we allocate dense tensors with arbitrary layout is with `empty_strided`. However, `empty_strided` does not know that the given strides are actually contiguous, and must test this manually to find out if it is the case. With `empty_permuted`, this is known statically to be the case and helps us skip some 0/1 guards.

However, I also think torch.empty_permuted is a useful API in its own right. It is technically possible to simulate this with an empty and a permute; however, there are some downsides:

* The manual incant is tricky to work out. To allocate an NHWC tensor, the invocation is `torch.empty(N, H, W, C).permute(0, 3, 1, 2)`; the permute call has to take NHWC to NCHW, and is the *inverse* of the permutation people are typically thinking of when they talk about NHWC (0, 2, 3, 1). Instead, torch.empty_permuted lets you say `torch.empty_permuted((N, C, H, W), (0, 2, 3, 1))`, letting you provide the intuitive permutation. It can be literally be read off as NHWC if you assign N=0, C=1, H=2, W=3.
* An empty(requires_grad=True).permute() is no longer a leaf tensor. You can force it to be a leaf with a detach(), but it is more straightforward and less error prone to allow directly allocating a tensor with the correct permutation.

It is also technically possible to simulate this with empty_strided. However, this requires the user to manually compute the contiguous output strides and is bad from a reduction of guards perspective. For what it's worth, this is one of the more common uses of as_strided in the wild, and it would be nice to get rid of it.

A nice enhancement of this feature would be to accept `physical_layout` anywhere `memory_format` is accepted. However, this would be a pretty involved change, so I'm doing the easy thing instead.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95069
Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/albanD, https://github.com/dagitses
2023-02-20 00:23:10 +00:00
Mikayla Gawarecki
c7c7238976 Fix bug in unsqueeze_nested stride calculation (#88688)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88688
Approved by: https://github.com/cpuhrsch
2023-02-10 17:00:04 +00:00
Aaron Gokaslan
1e2d82b8e4 [BE] Merge isinstance calls together (#94419)
Simplify and speeds up isinstance calls by checking for multiple types at the same time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94419
Approved by: https://github.com/ezyang
2023-02-09 00:47:26 +00:00
albanD
496c0a207b Make segment_reduce properly private. (#93166)
I am attempting not to change the aten function to reduce the amount of BC issues on the torchscript side.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93166
Approved by: https://github.com/ngimel
2023-02-06 18:32:23 +00:00
Ivan Yashchuk
fba13d94a1 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

- [x] XLA PR: https://github.com/pytorch/xla/pull/4498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet
2023-01-31 11:59:11 +00:00
PyTorch MergeBot
acdd462b1a Revert "Remove deprecated torch.symeig (#70988)"
This reverts commit d70ed68162.

Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful
2023-01-24 19:03:40 +00:00
Michael Gschwind
7265f60ad0 Regularize mask handling for attn_mask and key_padding_mask (#92733)
Summary:
Regularize mask handling for attn_mask and key_padding_mask
* Update documentation to remove reference to byte masks (which were deprecated long ago)
* Introduce check and warn about deprecation if attn_mask and key_padding_mask types mismatch
* Convert all masks to float before combining
* Combine by adding

Test Plan: sandcastle & github CI

Differential Revision: D42653215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92733
Approved by: https://github.com/ngimel, https://github.com/drisspg
2023-01-24 14:12:05 +00:00
Ivan Yashchuk
d70ed68162 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980
2023-01-23 22:51:40 +00:00
Driss Guessous
df14650f0b [SDPA] Update SDPA API and make function Public (#92189)
# Summary
In preparation for pt 2.0 launch this PR updates SDPA's API and makes the function a nn.funcitonal public function.

## Changes
### API
Previously the the function signature was:
`scaled_dot_product_attention(query, key, value, attn_mask=None, need_attn_weights=False, dropout_p=0.0, is_causal=False) -> (Tensor, Tensor)`
Updated signature:
`scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False) -> Tensor`

This PR removes the need_attn_weights optional boolean variable and updates the return type to a singular tensor.

#### Reasoning:
The main goal of this function is to provide an easy interface for users to call into fused attention kernels e.g.  (FlashAttention). The fused kernels do not currently support arbitrary attn_mask or dropout but there is a PR to mem-efficient attention to enable these. We want to have the API surface ready for when the backing kernels get updated.

The fused kernels save on memory usage by not materializing the weights and it is unlikely that a fast fused implementation will enable this feature so we are removing.

Discussed with folks at FAIR/Xformers and +1 this API change.

#### Make function Public
In preparation for the pt 2.0 launch we make the function public to start to generate user feedback

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92189
Approved by: https://github.com/cpuhrsch
2023-01-23 20:50:46 +00:00
kshitij12345
d2728bb6a7 [functorch] add is_any_true (#92686)
Adds `is_any_true` similar to `is_all_true` (https://github.com/pytorch/pytorch/pull/89097/files)

This would unblock https://github.com/pytorch/functorch/issues/1049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92686
Approved by: https://github.com/Chillee
2023-01-21 03:36:05 +00:00
Edward Z. Yang
5c6f5439b7 Implement SymBool (#92149)
We have known for a while that we should in principle support SymBool as a separate concept from SymInt and SymFloat ( in particular, every distinct numeric type should get its own API). However, recent work with unbacked SymInts in, e.g., https://github.com/pytorch/pytorch/pull/90985 have made this a priority to implement. The essential problem is that our logic for computing the contiguity of tensors performs branches on the passed in input sizes, and this causes us to require guards when constructing tensors from unbacked SymInts. Morally, this should not be a big deal because, we only really care about the regular (non-channels-last) contiguity of the tensor, which should be guaranteed since most people aren't calling `empty_strided` on the tensor, however, because we store a bool (not a SymBool, prior to this PR it doesn't exist) on TensorImpl, we are forced to *immediately* compute these values, even if the value ends up not being used at all. In particular, even when a user allocates a contiguous tensor, we still must compute channels-last contiguity (as some contiguous tensors are also channels-last contiguous, but others are not.)

This PR implements SymBool, and makes TensorImpl use SymBool to store the contiguity information in ExtraMeta. There are a number of knock on effects, which I now discuss below.

* I introduce a new C++ type SymBool, analogous to SymInt and SymFloat. This type supports logical and, logical or and logical negation. I support the bitwise operations on this class (but not the conventional logic operators) to make it clear that logical operations on SymBool are NOT short-circuiting. I also, for now, do NOT support implicit conversion of SymBool to bool (creating a guard in this case). This does matter too much in practice, as in this PR I did not modify the equality operations (e.g., `==` on SymInt) to return SymBool, so all preexisting implicit guards did not need to be changed. I also introduced symbolic comparison functions `sym_eq`, etc. on SymInt to make it possible to create SymBool. The current implementation of comparison functions makes it unfortunately easy to accidentally introduce guards when you do not mean to (as both `s0 == s1` and `s0.sym_eq(s1)` are valid spellings of equality operation); in the short term, I intend to prevent excess guarding in this situation by unit testing; in the long term making the equality operators return SymBool is probably the correct fix.
* ~~I modify TensorImpl to store SymBool for the `is_contiguous` fields and friends on `ExtraMeta`. In practice, this essentially meant reverting most of the changes from https://github.com/pytorch/pytorch/pull/85936 . In particular, the fields on ExtraMeta are no longer strongly typed; at the time I was particularly concerned about the giant lambda I was using as the setter getting a desynchronized argument order, but now that I have individual setters for each field the only "big list" of boolean arguments is in the constructor of ExtraMeta, which seems like an acceptable risk. The semantics of TensorImpl are now that we guard only when you actually attempt to access the contiguity of the tensor via, e.g., `is_contiguous`. By in large, the contiguity calculation in the implementations now needs to be duplicated (as the boolean version can short circuit, but the SymBool version cannot); you should carefully review the duplicate new implementations. I typically use the `identity` template to disambiguate which version of the function I need, and rely on overloading to allow for implementation sharing. The changes to the `compute_` functions are particularly interesting; for most of the functions, I preserved their original non-symbolic implementation, and then introduce a new symbolic implementation that is branch-less (making use of our new SymBool operations). However, `compute_non_overlapping_and_dense` is special, see next bullet.~~ This appears to cause performance problems, so I am leaving this to an update PR.
* (Update: the Python side pieces for this are still in this PR, but they are not wired up until later PRs.) While the contiguity calculations are relatively easy to write in a branch-free way, `compute_non_overlapping_and_dense` is not: it involves a sort on the strides. While in principle we can still make it go through by using a data oblivious sorting network, this seems like too much complication for a field that is likely never used (because typically, it will be obvious that a tensor is non overlapping and dense, because the tensor is contiguous.) So we take a different approach: instead of trying to trace through the logic computation of non-overlapping and dense, we instead introduce a new opaque operator IsNonOverlappingAndDenseIndicator which represents all of the compute that would have been done here. This function returns an integer 0 if `is_non_overlapping_and_dense` would have returned `False`, and an integer 1 otherwise, for technical reasons (Sympy does not easily allow defining custom functions that return booleans). The function itself only knows how to evaluate itself if all of its arguments are integers; otherwise it is left unevaluated. This means we can always guard on it (as `size_hint` will always be able to evaluate through it), but otherwise its insides are left a black box. We typically do NOT expect this custom function to show up in actual boolean expressions, because we will typically shortcut it due to the tensor being contiguous. It's possible we should apply this treatment to all of the other `compute_` operations, more investigation necessary. As a technical note, because this operator takes a pair of a list of SymInts, we need to support converting `ArrayRef<SymNode>` to Python, and I also unpack the pair of lists into a single list because I don't know if Sympy operations can actually validly take lists of Sympy expressions as inputs. See for example `_make_node_sizes_strides`
* On the Python side, we also introduce a SymBool class, and update SymNode to track bool as a valid pytype. There is some subtlety here: bool is a subclass of int, so one has to be careful about `isinstance` checks (in fact, in most cases I replaced `isinstance(x, int)` with `type(x) is int` for expressly this reason.) Additionally, unlike, C++, I do NOT define bitwise inverse on SymBool, because it does not do the correct thing when run on booleans, e.g., `~True` is `-2`. (For that matter, they don't do the right thing in C++ either, but at least in principle the compiler can warn you about it with `-Wbool-operation`, and so the rule is simple in C++; only use logical operations if the types are statically known to be SymBool). Alas, logical negation is not overrideable, so we have to introduce `sym_not` which must be used in place of `not` whenever a SymBool can turn up. To avoid confusion with `__not__` which may imply that `operators.__not__` might be acceptable to use (it isn't), our magic method is called `__sym_not__`. The other bitwise operators `&` and `|` do the right thing with booleans and are acceptable to use.
* There is some annoyance working with booleans in Sympy. Unlike int and float, booleans live in their own algebra and they support less operations than regular numbers. In particular, `sympy.expand` does not work on them. To get around this, I introduce `safe_expand` which only calls expand on operations which are known to be expandable.

TODO: this PR appears to greatly regress performance of symbolic reasoning. In particular, `python test/functorch/test_aotdispatch.py -k max_pool2d` performs really poorly with these changes. Need to investigate.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92149
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-01-21 02:21:56 +00:00
Kurt Mohler
647b8f8e3e Add TORCH_CHECK_TENSOR_ALL (#89097)
`TORCH_CHECK_TENSOR_ALL(cond, ...)` is a wrapper around `TORCH_CHECK` which allows the condition argument to be a tensor, batched or unbatched. `cond` can be a boolean tensor of any size. If any element is False, or if `cond.numel() == 0`, then `TORCH_CHECK_TENSOR_ALL` raises an error

Part of #72948
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89097
Approved by: https://github.com/zou3519
2023-01-19 21:04:09 +00:00
Edward Z. Yang
6420fecdc4 Introduce sym_min and sym_max (#92107)
It turns out our old max/min implementation didn't do anything, because `__max__` and `__min__` are not actually magic methods in Python. So I give 'em the `sym_` treatment, similar to the other non-overrideable builtins.

NB: I would like to use `sym_max` when computing contiguous strides but this appears to make `python test/functorch/test_aotdispatch.py -v -k test_aot_autograd_symbolic_exhaustive_nn_functional_max_pool2d_cpu_float32` run extremely slowly. Needs investigating.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92107
Approved by: https://github.com/albanD, https://github.com/voznesenskym, https://github.com/Skylion007
2023-01-18 20:57:27 +00:00
yanbing-j
94a7c01159 Enable oneDNN implementation in LSTM op (#91158)
### Description
This PR is to enable oneDNN implementation in LSTM op to improve the performance of it. Both FP32 and BF16 are supported.

### Performance improvement
In CPX 28C, with setting iomp and jemalloc.
We choose 8 LSTM input options (including input_size, hidden_size, num_layers, bidirectional, bias, batch_first, dropout, batch_size, seq_len), and the final option is a real input from train-clean-100 in LibriSpeech dataset. The performance improvements are shown in the following figures. We can see that LSTM with oneDNN implementation can perform better than the original.

In single socket:
![image](https://user-images.githubusercontent.com/61222868/211182994-833debec-518a-4b35-8504-6b0fadb17930.png)

![image](https://user-images.githubusercontent.com/61222868/211183012-31e1253f-2c60-4c92-a656-c239a971b453.png)

In single core:
![image](https://user-images.githubusercontent.com/61222868/211183017-186e5d47-cb9a-4c1e-914f-fa718e769f1c.png)

![image](https://user-images.githubusercontent.com/61222868/211183022-53266857-5a9e-4a95-b300-33fa34811d08.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91158
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-01-18 04:41:18 +00:00
Edward Z. Yang
333540a458 Reland "Add torch.utils.device_mode" (#91796)
Original PR https://github.com/pytorch/pytorch/pull/91525

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796
Approved by: https://github.com/albanD
2023-01-09 20:57:12 +00:00
PyTorch MergeBot
9b415240d4 Revert "Reland "Add torch.utils.device_mode" (#91796)"
This reverts commit 81b5eff3c3.

Reverted https://github.com/pytorch/pytorch/pull/91796 on behalf of https://github.com/huydhn due to This breaks trunk with the following failed test https://hud.pytorch.org/failure/test_jit_save%2CTestTracer
2023-01-09 04:45:47 +00:00
Edward Z. Yang
81b5eff3c3 Reland "Add torch.utils.device_mode" (#91796)
Original PR https://github.com/pytorch/pytorch/pull/91525

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796
Approved by: https://github.com/albanD
2023-01-08 03:44:56 +00:00
PyTorch MergeBot
f571ae4fdb Revert "Make torch.device usable as a context manager (#91525)"
This reverts commit 619d52a5d2.

Reverted https://github.com/pytorch/pytorch/pull/91525 on behalf of https://github.com/mehtanirav due to Internal breakages
2023-01-05 21:34:50 +00:00
Edward Z. Yang
619d52a5d2 Make torch.device usable as a context manager (#91525)
Fixes https://github.com/pytorch/pytorch/issues/82296
Fixes https://github.com/pytorch/pytorch/issues/27878
Fixes https://github.com/pytorch/pytorch/issues/260

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91525
Approved by: https://github.com/albanD
2023-01-04 01:32:00 +00:00
Kurt Mohler
08a47549af Rename Tensor._storage to Tensor.untyped_storage and update docs (#91414)
Fixes #89224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91414
Approved by: https://github.com/ezyang
2022-12-28 19:21:34 +00:00
Joel Schlosser
8b55b86dbd Move sym_int and sym_float alongside SymInt / SymFloat in base torch package (#91317)
This PR moves the definitions for:
* `sym_int`
* `sym_ceil` (used only for `sym_int`)
* `sym_floor` (used only for `sym_int`)
* `sym_float`

from `torch/fx/experimental/symbolic_shapes.py` to `torch/__init__.py`, where `SymInt` and `SymFloat` are already defined.

This removes the need for several in-line imports, and enables proper JIT script gating for #91318. I'm very open to doing this in a better way!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91317
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2022-12-28 16:08:16 +00:00
Richard Zou
fb2e1878cb [torch.func] alias torch.func.vmap as torch.vmap (#91026)
This PR also redirects torch.vmap to torch.func.vmap instead of the old
vmap prototype.

Test Plan:
- tests
- view docs preview
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91026
Approved by: https://github.com/albanD, https://github.com/samdow
2022-12-21 20:51:49 +00:00
Michael Gschwind
512ec181ec Introduce causal mask (#90508)
Summary: Introduce causal mask

This PR introduces a causal mask option _causal_mask (as well as causal mask detection if attn_mask is provided), since current custom kernels do not support arbitrary masks.

Test Plan: sandcastle & github ci/cd

Differential Revision: D41723137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90508
Approved by: https://github.com/albanD
2022-12-16 21:39:42 +00:00
Sergii Dymchenko
f51f6aa387 Fix non-existing parameters in docstrings (#90505)
Continuation after https://github.com/pytorch/pytorch/pull/90163.

Here is a script I used to find all the non-existing arguments in the docstrings (the script can give false positives in presence of *args/**kwargs or decorators):

_Edit:_
I've realized that the indentation is wrong for the last `break` in the script, so the script only gives output for a function if the first docstring argument is wrong. I'll create a separate PR if I find more issues with corrected script.

``` python
import ast
import os
import docstring_parser

for root, dirs, files in os.walk('.'):
    for name in files:
        if root.startswith("./.git/") or root.startswith("./third_party/"):
            continue
        if name.endswith(".py"):
            full_name = os.path.join(root, name)
            with open(full_name, "r") as source:
                tree = ast.parse(source.read())
                for node in ast.walk(tree):
                    if isinstance(node, ast.FunctionDef):
                        all_node_args = node.args.args
                        if node.args.vararg is not None:
                            all_node_args.append(node.args.vararg)
                        if node.args.kwarg is not None:
                            all_node_args.append(node.args.kwarg)
                        if node.args.posonlyargs is not None:
                            all_node_args.extend(node.args.posonlyargs)
                        if node.args.kwonlyargs is not None:
                            all_node_args.extend(node.args.kwonlyargs)
                        args = [a.arg for a in all_node_args]
                        docstring = docstring_parser.parse(ast.get_docstring(node))
                        doc_args = [a.arg_name for a in docstring.params]
                        clean_doc_args = []
                        for a in doc_args:
                            clean_a = ""
                            for c in a.split()[0]:
                                if c.isalnum() or c == '_':
                                    clean_a += c
                            if clean_a:
                                clean_doc_args.append(clean_a)
                        doc_args = clean_doc_args
                        for a in doc_args:
                            if a not in args:
                                print(full_name, node.lineno, args, doc_args)
                            break

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90505
Approved by: https://github.com/malfet, https://github.com/ZainRizvi
2022-12-09 21:43:09 +00:00