Commit Graph

423 Commits

Author SHA1 Message Date
cyy
df458be4e5 [4/N] Apply py39 ruff and pyupgrade fixes (#143257)
`torch/fx/passes/annotate_getitem_nodes.py` was changed to support the new type hinting annotations.
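
For illustration, a hedged sketch of the kind of rewrite pyupgrade applies under a py39 target (the function here is hypothetical, not taken from the changed file):

```python
from __future__ import annotations  # lets 3.9 accept `int | None` in annotations

from typing import List, Optional  # pre-py39 style, no longer needed below


# Before: typing generics
def get_names_old(limit: Optional[int] = None) -> List[str]:
    return ["a", "b"][:limit]


# After: builtin generics (PEP 585) and union syntax (PEP 604)
def get_names(limit: int | None = None) -> list[str]:
    return ["a", "b"][:limit]
```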

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257
Approved by: https://github.com/justinchuby, https://github.com/albanD
2025-01-04 10:47:51 +00:00
Wenqin Yang
8d9ff9c8a4 Fix a bug for wrong stride in fake tensor (#141427)
Fixes #141426

Please see details in the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141427
Approved by: https://github.com/jansel
2024-12-31 23:45:32 +00:00
Tom Ritchford
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
cyy
7c1d5db1f3 [2/N] Enable UBSAN tests (#141740)
Apply `c10::load` in more places. The function was introduced to cast a byte to a valid boolean value, thus fixing the UBSAN errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141740
Approved by: https://github.com/ezyang
2024-12-03 20:52:26 +00:00
George Wigley
44707b0667 Pass rounding_mode for div reference inputs through kwargs (#136308)
Previously, the reference inputs for `div` with a rounding mode did not supply the `rounding_mode` keyword argument, so they did not match the op's sample inputs.
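
As a hedged illustration of why the mismatch matters (values are illustrative, not the actual OpInfo inputs):

```python
import torch

a = torch.tensor([7.0, -7.0])
b = torch.tensor([2.0, 2.0])

# Sample inputs exercised the keyword argument...
print(torch.div(a, b, rounding_mode="floor"))  # tensor([ 3., -4.])
# ...but reference inputs omitted it, so they tested plain true division:
print(torch.div(a, b))                         # tensor([ 3.5000, -3.5000])
```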

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136308
Approved by: https://github.com/albanD

Co-authored-by: Xia, Weiwen <weiwen.xia@intel.com>
Co-authored-by: Bob Ren <bobren@meta.com>
Co-authored-by: Xilun Wu <12968408+XilunWu@users.noreply.github.com>
Co-authored-by: siahuat0727 <tansiahuat@gmail.com>
2024-11-29 21:28:24 +00:00
Jake Harmon
740d1eb030 Fix test_out when run on CPU with CUDA available (#137140)
Ever since #135140, this test will fail if run with CPU parameterization (e.g. `test_out__refs_logical_or_cpu_float32`) and CUDA available; as far as I can tell, the PyTorch CI isn't currently checking for this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137140
Approved by: https://github.com/ezyang
2024-11-21 23:10:07 +00:00
Yukio Siraichi
446ea2aea5 pow: fix meta function output argument dtype check. (#140287)
Tracking issue: #138399

This PR changes the `pow` C++ implementation, making its C++ meta kernel consistent with
its Python ref implementation. The following example shows the inconsistency between the
two:

```python
import torch

def run(device):
    S = (5,)
    a = torch.rand(S, device=device, dtype=torch.float32)
    b = 2
    out = torch.empty(S, device=device, dtype=torch.float64)
    return torch.pow(a, b, out=out)

>>> run("cpu")
Traceback (most recent call last):
  File "test.py", line 34, in run
    return torch.pow(a, b, out=out)
RuntimeError: Found dtype Double but expected Float

>>> run("meta")
tensor(..., device='meta', size=(5,), dtype=torch.float64)
```

**~Update:~**

~Note that this happens only for `pow.Tensor_Scalar` overloads. Therefore, this PR needed
further 2 modifications:~

- ~Split the `pow` ref implementation, making `pow.Tensor_Scalar` error on mismatching
output dtypes~
- ~Create a dispatch for `pow` when `_refs.pow()` is called~

**Update:**

Changing the `TensorIteratorConfig` for `pow.Tensor_Scalar` was easier and,
after the discussion below, more correct. The solution was to change the
`TensorIteratorBase::build_output_borrowing_argument_owning_unary_op` function,
setting:

- `cast_common_dtype_to_outputs`; and
- `enforce_safe_casting_to_output`.
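
A minimal sketch of the behavior after the fix (assuming a build that includes this change):

```python
import torch

a = torch.rand((5,), device="meta", dtype=torch.float32)
out = torch.empty((5,), device="meta", dtype=torch.float64)

try:
    torch.pow(a, 2, out=out)  # pow.Tensor_Scalar with a mismatching out dtype
except RuntimeError as e:
    print(e)  # the meta kernel now errors, matching the CPU run above
```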

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140287
Approved by: https://github.com/ezyang
2024-11-20 13:28:47 +00:00
Yukio Siraichi
48a276c5a0 log_softmax: fix meta function output argument dtype check. (#140289)
Tracking issue: #138399
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140289
Approved by: https://github.com/ezyang
ghstack dependencies: #140186, #140286, #140288
2024-11-18 23:05:29 +00:00
Yukio Siraichi
435286e985 Fix unary references' out dtype check. (#140288)
Tracking issue: #138399

This PR fixes a number of reference implementations (which are also used as meta
functions), making them consistent with the CPU device. More specifically, it fixes
operations that use the `_make_elementwise_unary_reference` decorator and did not error on
mismatching out argument dtypes, even though they do error on concrete devices (e.g. CPU);
a sketch of the now-consistent behavior follows the list.

The fixed operations are:

- `abs`
- `ceil`
- `floor`
- `frac`
- `isneginf`
- `isposinf`
- `sgn`
- `sign`
- `signbit`
- `trunc`
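
A hedged sketch of the now-consistent behavior, exercising the ref directly (assuming a build with this fix):

```python
import torch
import torch._refs as refs

x = torch.rand(3)
out = torch.empty(3, dtype=torch.int64)

try:
    refs.floor(x, out=out)  # mismatching out dtype
except RuntimeError as e:
    print(e)  # the ref now errors, just as torch.floor(x, out=out) does on CPU
```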
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140288
Approved by: https://github.com/ezyang
ghstack dependencies: #140186, #140286
2024-11-18 23:05:29 +00:00
Yukio Siraichi
216b6a952c triangular_solve: fix meta function output argument dtype check. (#140286)
Tracking issue: #138399
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140286
Approved by: https://github.com/ezyang
ghstack dependencies: #140186
2024-11-14 15:25:14 +00:00
zeshengzong
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, this replaces `clone().detach()` with `detach().clone()`.
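
A short sketch of why the order matters:

```python
import torch

x = torch.rand(3, requires_grad=True)

# clone().detach() records the clone in the autograd graph and then
# immediately detaches it, doing needless bookkeeping.
a = x.clone().detach()

# detach().clone() detaches first, so the clone runs outside autograd.
b = x.detach().clone()

assert not a.requires_grad and not b.requires_grad
```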

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
Yukio Siraichi
c182c7ccfc Fix triangular_solve meta function out parameter names. (#140186)
This PR replaces the parameter names specified in the `triangular_solve_meta`
function (specifically in its `@out_wrapper(...)` decorator) with those written in the
_native_functions.yaml_ file.

This name mismatch caused the operation to fail when using the meta device (see the error below):

```python
Traceback (most recent call last):
  File "examples/test.py", line 23, in <module>
    torch.triangular_solve(b.to("meta"), A.to("meta"), out=meta_out)
  File "torch/_decomp/__init__.py", line 100, in _fn
    return f(*args, **kwargs, out=None if is_none else out_kwargs)
  File "torch/_prims_common/wrappers.py", line 289, in _fn
    result = fn(*args, **kwargs)
TypeError: triangular_solve_meta() got an unexpected keyword argument 'X'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140186
Approved by: https://github.com/ezyang
2024-11-12 19:04:34 +00:00
Yukio Siraichi
fef5e94657 addmm: error on output dtype mismatch. (#138520)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138520
Approved by: https://github.com/ezyang
ghstack dependencies: #138515
2024-10-30 21:46:39 +00:00
Yukio Siraichi
6da3a043a8 Add test for consistency between meta and CPU devices. (#138515)
Reference: https://github.com/pytorch/pytorch/issues/138399

This PR introduces an `OpInfo` test that checks whether running each `out=` operation
using meta inputs is consistent with using concrete (e.g. CPU) inputs. More specifically,
it tests the case where the output tensors are not of the expected data type. According to
the `out=` specification, some operations should error.

I have added XFAIL to the set of operations that are currently failing.
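
A hedged sketch of the check such a test performs (the helper below is illustrative; the actual OpInfo test is more involved):

```python
import torch

def check_out_dtype_consistency(op, make_input):
    # Run op with a wrongly-typed out= on CPU and on meta;
    # both device types should agree on whether they raise.
    results = {}
    for device in ("cpu", "meta"):
        x = make_input(device)
        out = torch.empty_like(x, dtype=torch.float64)
        try:
            op(x, out=out)
            results[device] = "ok"
        except RuntimeError:
            results[device] = "error"
    assert results["cpu"] == results["meta"], results

check_out_dtype_consistency(torch.sin, lambda d: torch.rand(4, device=d))
```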
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138515
Approved by: https://github.com/ezyang
2024-10-30 21:46:39 +00:00
cyy
da1c1a9884 [4/N] Don't skip ASAN on some tests (#139189)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139189
Approved by: https://github.com/ezyang
2024-10-30 00:59:32 +00:00
PyTorch MergeBot
228963ad60 Revert "Add test for consistency between meta and CPU devices. (#138515)"
This reverts commit 006130d8ea.

Reverted https://github.com/pytorch/pytorch/pull/138515 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test is failing in trunk, maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/138515#issuecomment-2442357471))
2024-10-28 18:45:09 +00:00
Yukio Siraichi
006130d8ea Add test for consistency between meta and CPU devices. (#138515)
Reference: https://github.com/pytorch/pytorch/issues/138399

This PR introduces an `OpInfo` test that checks whether running each `out=` operation
using meta inputs is consistent with using concrete (e.g. CPU) inputs. More specifically,
it tests the case where the output tensors are not of the expected data type. According to
the `out=` specification, some operations should error.

I have added XFAIL to the set of operations that are currently failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138515
Approved by: https://github.com/ezyang
2024-10-28 16:58:48 +00:00
Benjamin Glass
f984b88718 Ensure noncontiguous tensor creation tests offsetting (#136396)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136396
Approved by: https://github.com/amjames, https://github.com/eellison
ghstack dependencies: #136055
2024-10-02 00:40:43 +00:00
Fuzzkatt
6300eb1dc7 tf32 off for test_noncontiguous_samples in test_ops.py (#136484)
Upstreaming a minor unit test fix from NVIDIA internal CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136484
Approved by: https://github.com/soulitzer
2024-09-24 14:26:47 +00:00
David Berard
289486d007 Move attention kernels back from fake_impls to meta_registrations (#134288)
See #121528 for additional context.

In #120682, we moved the attention kernels from meta_registrations to fake_impls with the intent of fixing the device handling for seed/offset: these are typically on CPU. We needed to put the registrations in fake_impls to do this because meta_registrations doesn't have a way to specify device, whereas fake_impls does. But when we tried to actually fix the device types (#120839), we had to revert the PR because it broke cudagraph handling (during which seed/offset _are_ on CUDA).

Now, we want to put the registrations back in meta_registrations so that we can call these kernels with meta tensors. The use case is later in this stack - we want to be able to use the flop counter with these kernels.

Also, I specifically skip the `compare_tensor_meta()` check in the test_fake / test_fake_autocast tests for the `_efficient_attention_forward` and `_flash_attention_forward` kernels, which fails because of the device mismatch from the seed/offset tensors. Then we can un-skip these opinfos. I verified that the efficient_attention_forward bug (#120842) is now caught by these opinfos if I revert the fix from this PR.

Differential Revision: [D61687369](https://our.internmc.facebook.com/intern/diff/D61687369)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134288
Approved by: https://github.com/drisspg
2024-08-27 21:10:36 +00:00
David Berard
d433a603af [BE] use torch.amp.autocast instead of torch.cuda.amp.autocast (#134291)
`torch.cuda.amp.autocast` / `torch.cpu.amp.autocast` are deprecated and spew a ton of warnings when these tests run. This PR updates the tests to just use `torch.amp.autocast(device)`.

Note: this uncovers a bug in the test: when `device` is CUDA, it actually shows up as `"cuda:0"` - so previously, this test was _always_ using `torch.cpu.amp.autocast`, even for the `cuda` device. This PR fixes that, and uncovers additional bugs in `pinverse` and `linalg.pinv`; `linalg.pinv` was already failing before on CPU, but now the test also catches failures on CUDA (and this PR adds these to the skipped-test list).
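
A hedged sketch of the spelling change (and of the `device` pitfall mentioned above):

```python
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Deprecated: torch.cuda.amp.autocast() / torch.cpu.amp.autocast()
# The device-generic replacement takes the device *type*; note that a
# parametrized "cuda:0" must be reduced to "cuda" first, which is
# exactly the comparison the old test code got wrong.
with torch.amp.autocast(torch.device(device).type):
    x = torch.randn(4, 4, device=device)
    y = x @ x
```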
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134291
Approved by: https://github.com/YuqingJ
2024-08-24 15:07:49 +00:00
Xuehai Pan
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
zengxian
d3e932dc10 [CI] Add inductor cpu accuracy test running on AVX2 runners (#128682)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128682
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-07-26 13:24:41 +00:00
ankurneog
ebc012ace6 Add hooks for execution on intel gaudi devices - 1 (#128584)
## Motivation
This is a follow-up to PR https://github.com/pytorch/pytorch/pull/126970 to support Gaudi devices for PyTorch UT execution.

## Changes
We are adding additional hooks to:
1. Add dtype exceptions for Gaudi/HPU
2. Extend the `onlyNativeDevices` decorator functionality to support additional devices

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128584
Approved by: https://github.com/albanD
2024-07-20 05:03:36 +00:00
Xuehai Pan
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
Xuehai Pan
4d7bf72d93 [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206
Approved by: https://github.com/malfet
2024-07-14 08:17:52 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` needs only one bytecode op, `BUILD_MAP`, while the factory call `dict()` needs three (`PUSH_NULL`, `LOAD_NAME`, `CALL`). Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the `OrderedDict` replacement example above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
a-gardner1
3c1cf03fde Add fake impl for aten.unique_dim (#126561)
Follow-up to #113118 and #124306.

Developed in coordination with the solution to https://github.com/microsoft/onnxscript/pull/1547

This PR adds the missing fake tensor implementation for `aten.unique_dim`, thus enabling tracing and compilation of `torch.unique` when `dim` is not None.

Local testing has proceeded with the following simple script (provided that one has checked out the changes in https://github.com/microsoft/onnxscript/pull/1547):

```python
import logging

import numpy as np
import onnx
import onnxruntime as ort
import torch

onnx_program = torch.onnx.dynamo_export(
    lambda x: torch.unique(x, dim=0, return_inverse=True),
    torch.arange(10),
    export_options=torch.onnx.ExportOptions(
        dynamic_shapes=True,
        diagnostic_options=torch.onnx.DiagnosticOptions(
            verbosity_level=logging.DEBUG)))
onnx_program.save("torch_unique.onnx")
onnx_inputs = onnx_program.adapt_torch_inputs_to_onnx(torch.arange(10))
onnx_outputs = onnx_program(*onnx_inputs)
loaded_onnx_program = onnx.load("torch_unique.onnx")
onnx.checker.check_model(loaded_onnx_program)
ort_session = ort.InferenceSession("torch_unique.onnx")
inputs = np.random.randint(0, 10, 10)
print(f"Inputs: {inputs}")
outputs = ort_session.run(None, {"l_x_": inputs})
print(f"Outputs: {outputs}")
print("Success")
```

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126561
Approved by: https://github.com/ezyang
2024-06-01 04:03:10 +00:00
Aaron Gokaslan
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1. This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes.

Below is a before-and-after table showing the execution time of ruff lint and ruff format in milliseconds, courtesy of https://astral.sh/blog/ruff-v0.4.0

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
Tugsbayasgalan Manlaibaatar
d23bf9cef0 Add fake impl for aten.unique2 (#124306)
Reapply of: https://github.com/pytorch/pytorch/pull/121571
Differential Revision: [D56258431](https://our.internmc.facebook.com/intern/diff/D56258431)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124306
Approved by: https://github.com/gmagogsfm
2024-04-17 22:55:27 +00:00
statelesshz
2216068559 Enable UFMT on test/test_ops* (#123935)
Part of https://github.com/pytorch/pytorch/issues/123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123935
Approved by: https://github.com/ezyang
2024-04-13 03:31:56 +00:00
Kurt Mohler
db895ace1d Only run backward part of COW test if results are strided (#123870)
Fixes #123792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123870
Approved by: https://github.com/ezyang
2024-04-12 04:43:02 +00:00
Kurt Mohler
3908ebca86 Test COW materialization in backward ops (#123593)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123593
Approved by: https://github.com/ezyang
2024-04-09 22:31:50 +00:00
Kurt Mohler
ca9606f809 Update COW OpInfo test to include kwargs and expected materialization (#122437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122437
Approved by: https://github.com/ezyang
2024-03-24 06:07:30 +00:00
PyTorch MergeBot
c80601f35a Revert "Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537)"
This reverts commit a2a88f39ee.

Reverted https://github.com/pytorch/pytorch/pull/121537 on behalf of https://github.com/kurtamohler due to flaky CI failures ([comment](https://github.com/pytorch/pytorch/pull/121537#issuecomment-2010937226))
2024-03-21 00:03:30 +00:00
Kurt Mohler
a2a88f39ee Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121537
Approved by: https://github.com/ezyang
2024-03-19 06:15:00 +00:00
Peter Bell
34a28f01dd [Autograd] Improve error for leaf tensors as out argument to fallback (#121089)
Closes  #120988

Currently operators that hit the autograd fallback call `check_inplace`
on all mutated inputs, including out arguments. This leads to a slightly
confusing error message:
```
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```

Compared to functions that don't fallback, which raise
```
RuntimeError: add(): functions with out=... arguments don't support automatic differentiation, but one of the arguments requires grad.
```

This changes the error message to make clear the issue is with the out argument,
but does not tighten the check to outright ban out arguments that require grad.
Instead, I use the same checks from `check_inplace` which allows non-leaf tensors
that require grad to pass without error.
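
A hedged repro of the contrasting (non-fallback) message quoted above:

```python
import torch

a = torch.rand(3)
leaf = torch.rand(3, requires_grad=True)  # a leaf that requires grad

try:
    torch.add(a, 1, out=leaf)
except RuntimeError as e:
    # "add(): functions with out=... arguments don't support automatic
    # differentiation, but one of the arguments requires grad."
    print(e)
```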
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121089
Approved by: https://github.com/lezcano, https://github.com/soulitzer
ghstack dependencies: #121142
2024-03-05 21:13:27 +00:00
Kurt Mohler
77aea289ae Add test to check that COW inputs are not materialized (#119507)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507
Approved by: https://github.com/ezyang
ghstack dependencies: #120455
2024-03-01 05:05:28 +00:00
Sergii Dymchenko
09aefe1502 Fix ouput typos (#120870)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120870
Approved by: https://github.com/clee2000
2024-02-29 08:29:14 +00:00
David Berard
df1e855313 [fake_impls] fix max_seqlen return values in efficient_attention_forward (#120842)
To match the actual implementation, we should return max_seqlen_q/k, not M and N, in the sparse case:

7e185277cd/aten/src/ATen/native/transformers/cuda/attention.cu (L981-L996)

Note that although the .cu file sets max_seqlen_k = 0 in the sparse case, it actually returns max_seqlen_k or N:

7e185277cd/aten/src/ATen/native/transformers/cuda/attention.cu (L1224-L1231)

Tests: added in the next PR (#102839, which also fixes other parts of the test_fake tests so that we can un-xfail them and actually run the tests).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120842
Approved by: https://github.com/YuqingJ
ghstack dependencies: #120682
2024-02-29 07:12:27 +00:00
PyTorch MergeBot
dbe0967a0a Revert "Add test to check that COW inputs are not materialized (#119507)"
This reverts commit 2ebf2c88ba.

Reverted https://github.com/pytorch/pytorch/pull/119507 on behalf of https://github.com/izaitsevfb due to breaks xla jobs ([comment](https://github.com/pytorch/pytorch/pull/119507#issuecomment-1970022840))
2024-02-28 22:26:59 +00:00
Kurt Mohler
2ebf2c88ba Add test to check that COW inputs are not materialized (#119507)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507
Approved by: https://github.com/ezyang
ghstack dependencies: #120455
2024-02-28 00:37:33 +00:00
Isuru Fernando
b7df3bba62 add decomposition for frexp (#119217)
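For context, a hedged sketch of the contract any frexp decomposition must satisfy:

```python
import torch

x = torch.tensor([1.0, 8.0, 0.375])
mantissa, exponent = torch.frexp(x)

# frexp splits x into mantissa * 2**exponent with |mantissa| in [0.5, 1).
assert torch.allclose(x, mantissa * torch.exp2(exponent.to(mantissa.dtype)))
```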
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119217
Approved by: https://github.com/peterbell10
ghstack dependencies: #119284, #120027
2024-02-23 21:52:42 +00:00
Joel Schlosser
9ec8dd2467 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contain impls for `ChainedViewFunc` and `ErroringViewFunc`, which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args for `symint_visitor_fn` / `tensor_visitor_fn`. If these are defined, they are expected to be python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification.

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping works correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
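
A hedged usage sketch of the replay API described above (assuming `_view_func` takes the new base plus the optional visitor callables):

```python
import torch

base = torch.rand(4, 4, requires_grad=True)
view = base.narrow(0, 1, 2)  # backed by a codegen'd NarrowViewFunc

# Replay the same view on a fresh base; symint_visitor_fn / tensor_visitor_fn
# could be passed here to hot-swap saved state during replay.
new_base = torch.rand(4, 4)
replayed = view._view_func(new_base)
assert replayed.shape == view.shape
```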
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-14 22:00:43 +00:00
blorange-amd
df9b44436a [ROCm] Enable float16/complex32 fft tests on ROCm (#117296)
This PR enables float16/complex32 FFT tests on ROCm.
Sample results are attached here:
[test_spectral_ops_results.log](https://github.com/pytorch/pytorch/files/13908533/test_spectral_ops_results.log)

test_decomp::TestDecompCUDA::test_comprehensive_fft*
test_decomp::TestDecompCUDA::test_quick_fft*
test_jit_fuser_te::TestNNCOpInfoCUDA::test_nnc_correctness_fft*
test_meta::TestMetaCUDA::test_dispatch_meta_inplace_fft*
test_meta::TestMetaCUDA::test_dispatch_meta_outplace_fft*
test_meta::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft*
test_meta::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft*
test_meta::TestMetaCUDA::test_meta_inplace_fft*
test_meta::TestMetaCUDA::test_meta_outplace_fft*
test_ops::TestCommonCUDA::test_complex_half_reference_testing_fft*
test_ops::TestCommonCUDA::test_python_ref__refs_fft*
test_ops::TestCommonCUDA::test_python_ref_executor__refs_fft*
test_ops::TestCommonCUDA::test_python_ref_meta__refs*
test_ops::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft*
test_schema_check::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft*
test_spectral_ops::TestFFTCUDA::test_empty_fft__refs_fft*
test_spectral_ops::TestFFTCUDA::test_empty_fft_fft*
test_spectral_ops::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft*
test_spectral_ops::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft*
test_spectral_ops::TestFFTCUDA::test_fft_round_trip_cuda*
test_spectral_ops::TestFFTCUDA::test_fft_type_promotion_cuda*
test_spectral_ops::TestFFTCUDA::test_fftn_round_trip_cuda*
test_spectral_ops::TestFFTCUDA::test_hfftn_cuda_float16
test_spectral_ops::TestFFTCUDA::test_ihfftn_cuda_float16
test_utils::TestDeviceUtilsCUDA::test_device_mode_ops_fft

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117296
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2024-02-13 22:35:32 +00:00
PyTorch MergeBot
24bdd03d23 Revert "Reify view_func() closures as ViewFuncs (#118404)"
This reverts commit d5a6762263.

Reverted https://github.com/pytorch/pytorch/pull/118404 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/118404#issuecomment-1938600260))
2024-02-12 12:38:51 +00:00
Joel Schlosser
d5a6762263 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contain impls for `ChainedViewFunc` and `ErroringViewFunc`, which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args for `symint_visitor_fn` / `tensor_visitor_fn`. If these are defined, they are expected to be python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification.

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping works correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-09 18:51:36 +00:00
Isuru Fernando
3e79ef6db8 Complete decomposition for aten.round (#118635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118635
Approved by: https://github.com/peterbell10
2024-02-01 17:14:44 +00:00
Isuru Fernando
2f7839e6db register decomposition for rsub in torch._refs (#118288)
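A hedged sketch of the semantics such a reference encodes (not the actual `torch._refs` code):

```python
import torch

def rsub_ref(a, b, *, alpha=1):
    # rsub computes b - alpha * a, i.e. sub with swapped operands.
    return b - alpha * a

a = torch.rand(3)
assert torch.allclose(torch.rsub(a, 1.0), rsub_ref(a, 1.0))
```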
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118288
Approved by: https://github.com/lezcano
ghstack dependencies: #118398
2024-01-30 22:18:15 +00:00
Alexander Grund
f1aef2c094 Don't check is_conj for _refs.linalg.svd (#117972)
The flag is not correctly set when PyTorch is compiled with GPU support, resulting in failures in
`test_ops.py::test_python_ref_meta__refs_linalg_svd_cpu_complex`.

Use a similar approach to test_meta and skip the check for this function.

Workaround for #105068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117972
Approved by: https://github.com/lezcano
2024-01-26 15:24:29 +00:00