Commit Graph

402 Commits

Author SHA1 Message Date
Xuehai Pan
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
zengxian
d3e932dc10 [CI] Add inductor cpu accuracy test running on AVX2 runners (#128682)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128682
Approved by: https://github.com/jgong5, https://github.com/desertfire
2024-07-26 13:24:41 +00:00
ankurneog
ebc012ace6 Add hooks for execution on intel gaudi devices - 1 (#128584)
## Motivation
This is a follow-up to PR https://github.com/pytorch/pytorch/pull/126970 to support Gaudi devices for PyTorch UT execution.

## Changes
We are adding additional hooks to:
1. Add dtype exceptions for Gaudi/HPU
2. Extend the `onlyNativeDevices` decorator functionality to cover additional devices

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128584
Approved by: https://github.com/albanD
2024-07-20 05:03:36 +00:00
Xuehai Pan
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
Xuehai Pan
4d7bf72d93 [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206
Approved by: https://github.com/malfet
2024-07-14 08:17:52 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes empty collection factory calls to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

Python literals are faster and safer. For example, compare the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` needs only one bytecode instruction, `BUILD_MAP`, while the factory call `dict()` needs three: `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing it with `OrderedDict` above).
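
A quick way to sanity-check the performance claim is an illustrative micro-benchmark (not part of the PR; absolute numbers vary by machine and Python version):

```python
import timeit

# The literal avoids the name lookup and call overhead entirely.
print(timeit.timeit("{}"))      # single BUILD_MAP instruction
print(timeit.timeit("dict()"))  # PUSH_NULL + LOAD_NAME + CALL
```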

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
a-gardner1
3c1cf03fde Add fake impl for aten.unique_dim (#126561)
Follow-up to #113118 and #124306.

Developed in coordination with the solution to https://github.com/microsoft/onnxscript/pull/1547

This PR adds the missing fake tensor implementation for `aten.unique_dim`, thus enabling tracing and compilation of `torch.unique` when `dim` is not None.

Local testing has proceeded with the following simple script (provided that one has checked out the changes in https://github.com/microsoft/onnxscript/pull/1547):

```python
import logging

import numpy as np
import onnx
import onnxruntime as ort
import torch  # needed for torch.onnx / torch.unique below

onnx_program = torch.onnx.dynamo_export(
    lambda x: torch.unique(x, dim=0, return_inverse=True),
    torch.arange(10),
    export_options=torch.onnx.ExportOptions(
        dynamic_shapes=True,
        diagnostic_options=torch.onnx.DiagnosticOptions(
            verbosity_level=logging.DEBUG,
        ),
    ),
)
onnx_program.save("torch_unique.onnx")
onnx_inputs = onnx_program.adapt_torch_inputs_to_onnx(torch.arange(10))
onnx_outputs = onnx_program(*onnx_inputs)
loaded_onnx_program = onnx.load("torch_unique.onnx")
onnx.checker.check_model(loaded_onnx_program)
ort_session = ort.InferenceSession("torch_unique.onnx")
inputs = np.random.randint(0, 10, 10)
print(f"Inputs: {inputs}")
outputs = ort_session.run(None, {"l_x_": inputs})
print(f"Outputs: {outputs}")
print("Success")
```
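
For context, a fake impl for an op whose output size is data-dependent generally allocates an unbacked symbolic size rather than a concrete integer. A minimal sketch of the pattern using the `torch.library` custom-op API (the op `demo::unique_dim0` is hypothetical; this is not the PR's code, which registers the fake impl for the existing aten op):

```python
import torch

@torch.library.custom_op("demo::unique_dim0", mutates_args=())
def unique_dim0(x: torch.Tensor) -> torch.Tensor:
    return torch.unique(x, dim=0)

@unique_dim0.register_fake
def _(x):
    # The number of unique rows is data-dependent, so create an
    # unbacked SymInt for it instead of guessing a concrete value.
    ctx = torch.library.get_ctx()
    n_unique = ctx.new_dynamic_size()
    return x.new_empty([n_unique, *x.shape[1:]])
```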

Co-authored-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126561
Approved by: https://github.com/ezyang
2024-06-01 04:03:10 +00:00
Aaron Gokaslan
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1. This version fixes a lot of false negatives/false positives, is 20-40% faster, and includes various other bug fixes.

Below is a before-and-after table showing the execution time of ruff's linter and formatter in milliseconds, courtesy of https://astral.sh/blog/ruff-v0.4.0:

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
Tugsbayasgalan Manlaibaatar
d23bf9cef0 Add fake impl for aten.unique2 (#124306)
Reapply of: https://github.com/pytorch/pytorch/pull/121571
Differential Revision: [D56258431](https://our.internmc.facebook.com/intern/diff/D56258431)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124306
Approved by: https://github.com/gmagogsfm
2024-04-17 22:55:27 +00:00
statelesshz
2216068559 Enable UFMT on test/test_ops* (#123935)
Part of https://github.com/pytorch/pytorch/issues/123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123935
Approved by: https://github.com/ezyang
2024-04-13 03:31:56 +00:00
Kurt Mohler
db895ace1d Only run backward part of COW test if results are strided (#123870)
Fixes #123792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123870
Approved by: https://github.com/ezyang
2024-04-12 04:43:02 +00:00
Kurt Mohler
3908ebca86 Test COW materialization in backward ops (#123593)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123593
Approved by: https://github.com/ezyang
2024-04-09 22:31:50 +00:00
Kurt Mohler
ca9606f809 Update COW OpInfo test to include kwargs and expected materialization (#122437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122437
Approved by: https://github.com/ezyang
2024-03-24 06:07:30 +00:00
PyTorch MergeBot
c80601f35a Revert "Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537)"
This reverts commit a2a88f39ee.

Reverted https://github.com/pytorch/pytorch/pull/121537 on behalf of https://github.com/kurtamohler due to flaky CI failures ([comment](https://github.com/pytorch/pytorch/pull/121537#issuecomment-2010937226))
2024-03-21 00:03:30 +00:00
Kurt Mohler
a2a88f39ee Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121537
Approved by: https://github.com/ezyang
2024-03-19 06:15:00 +00:00
Peter Bell
34a28f01dd [Autograd] Improve error for leaf tensors as out argument to fallback (#121089)
Closes  #120988

Currently, operators that hit the autograd fallback call `check_inplace`
on all mutated inputs, including out arguments. This leads to a slightly
confusing error message:
```
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```

Compare this to functions that don't fall back, which raise:
```
RuntimeError: add(): functions with out=... arguments don't support automatic differentiation, but one of the arguments requires grad.
```

This changes the error message to make it clear that the issue is with the out argument,
but does not tighten the check to outright ban out arguments that require grad.
Instead, I use the same checks from `check_inplace`, which allow non-leaf tensors
that require grad to pass without error.
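
For reference, the non-fallback message quoted above is easy to reproduce (a minimal illustrative snippet, run under grad mode):

```python
import torch

a = torch.ones(3, requires_grad=True)
torch.add(a, a, out=torch.empty(3))
# RuntimeError: add(): functions with out=... arguments don't support
# automatic differentiation, but one of the arguments requires grad.
```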
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121089
Approved by: https://github.com/lezcano, https://github.com/soulitzer
ghstack dependencies: #121142
2024-03-05 21:13:27 +00:00
Kurt Mohler
77aea289ae Add test to check that COW inputs are not materialized (#119507)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507
Approved by: https://github.com/ezyang
ghstack dependencies: #120455
2024-03-01 05:05:28 +00:00
Sergii Dymchenko
09aefe1502 Fix ouput typos (#120870)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120870
Approved by: https://github.com/clee2000
2024-02-29 08:29:14 +00:00
David Berard
df1e855313 [fake_impls] fix max_seqlen return values in efficient_attention_forward (#120842)
To match the actual implementation, we should return max_seqlen_q/k, not M/N, in the sparse case:

7e185277cd/aten/src/ATen/native/transformers/cuda/attention.cu (L981-L996)

Note that although the .cu file sets max_seqlen_k = 0 in the sparse case, it actually returns max_seqlen_k or N:

7e185277cd/aten/src/ATen/native/transformers/cuda/attention.cu (L1224-L1231)

Tests are added in the next PR (#102839, which also fixes other parts of the test_fake tests so that we can un-xfail them and actually run them).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120842
Approved by: https://github.com/YuqingJ
ghstack dependencies: #120682
2024-02-29 07:12:27 +00:00
PyTorch MergeBot
dbe0967a0a Revert "Add test to check that COW inputs are not materialized (#119507)"
This reverts commit 2ebf2c88ba.

Reverted https://github.com/pytorch/pytorch/pull/119507 on behalf of https://github.com/izaitsevfb due to breaks xla jobs ([comment](https://github.com/pytorch/pytorch/pull/119507#issuecomment-1970022840))
2024-02-28 22:26:59 +00:00
Kurt Mohler
2ebf2c88ba Add test to check that COW inputs are not materialized (#119507)
Part of #97856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507
Approved by: https://github.com/ezyang
ghstack dependencies: #120455
2024-02-28 00:37:33 +00:00
Isuru Fernando
b7df3bba62 add decomposition for frexp (#119217)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119217
Approved by: https://github.com/peterbell10
ghstack dependencies: #119284, #120027
2024-02-23 21:52:42 +00:00
Joel Schlosser
9ec8dd2467 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contain impls for `ChainedViewFunc` and `ErroringViewFunc`, which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args: `symint_visitor_fn` / `tensor_visitor_fn`. If defined, these are expected to be Python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification.

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping functions correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
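
To sketch what the reified view funcs enable, here is an illustrative replay of a view on a new base (these are internal, non-public APIs, shown only for intuition):

```python
import torch

# Force view_func recording even for ops replayable via as_strided.
with torch.autograd._force_original_view_tracking(True):
    base = torch.randn(4, 6, requires_grad=True)
    view = base.narrow(1, 1, 3)

new_base = torch.randn(4, 6)
replayed = view._view_func(new_base)  # re-applies narrow(1, 1, 3)
assert replayed.shape == view.shape
```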
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-14 22:00:43 +00:00
blorange-amd
df9b44436a [ROCm] Enable float16/complex32 fft tests on ROCm (#117296)
This PR is to enable float16/complex32 fft tests on ROCm.
Sample results are attached here:
[test_spectral_ops_results.log](https://github.com/pytorch/pytorch/files/13908533/test_spectral_ops_results.log)

test_decomp::TestDecompCUDA::test_comprehensive_fft*
test_decomp::TestDecompCUDA::test_quick_fft*
test_jit_fuser_te::TestNNCOpInfoCUDA::test_nnc_correctness_fft*
test_meta::TestMetaCUDA::test_dispatch_meta_inplace_fft*
test_meta::TestMetaCUDA::test_dispatch_meta_outplace_fft*
test_meta::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft*
test_meta::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft*
test_meta::TestMetaCUDA::test_meta_inplace_fft*
test_meta::TestMetaCUDA::test_meta_outplace_fft*
test_ops::TestCommonCUDA::test_complex_half_reference_testing_fft*
test_ops::TestCommonCUDA::test_python_ref__refs_fft*
test_ops::TestCommonCUDA::test_python_ref_executor__refs_fft*
test_ops::TestCommonCUDA::test_python_ref_meta__refs*
test_ops::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft*
test_schema_check::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft*
test_spectral_ops::TestFFTCUDA::test_empty_fft__refs_fft*
test_spectral_ops::TestFFTCUDA::test_empty_fft_fft*
test_spectral_ops::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error__refs_fft*
test_spectral_ops::TestFFTCUDA::test_fft_half_and_chalf_not_power_of_two_error_fft*
test_spectral_ops::TestFFTCUDA::test_fft_round_trip_cuda*
test_spectral_ops::TestFFTCUDA::test_fft_type_promotion_cuda*
test_spectral_ops::TestFFTCUDA::test_fftn_round_trip_cuda*
test_spectral_ops::TestFFTCUDA::test_hfftn_cuda_float16
test_spectral_ops::TestFFTCUDA::test_ihfftn_cuda_float16
test_utils::TestDeviceUtilsCUDA::test_device_mode_ops_fft

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117296
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2024-02-13 22:35:32 +00:00
PyTorch MergeBot
24bdd03d23 Revert "Reify view_func() closures as ViewFuncs (#118404)"
This reverts commit d5a6762263.

Reverted https://github.com/pytorch/pytorch/pull/118404 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/118404#issuecomment-1938600260))
2024-02-12 12:38:51 +00:00
Joel Schlosser
d5a6762263 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contain impls for `ChainedViewFunc` and `ErroringViewFunc`, which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args: `symint_visitor_fn` / `tensor_visitor_fn`. If defined, these are expected to be Python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification.

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping functions correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-09 18:51:36 +00:00
Isuru Fernando
3e79ef6db8 Complete decomposition for aten.round (#118635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118635
Approved by: https://github.com/peterbell10
2024-02-01 17:14:44 +00:00
Isuru Fernando
2f7839e6db register decomposition for rsub in torch._refs (#118288)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118288
Approved by: https://github.com/lezcano
ghstack dependencies: #118398
2024-01-30 22:18:15 +00:00
Alexander Grund
f1aef2c094 Don't check is_conj for _refs.linalg.svd (#117972)
The flag is not correctly set when PyTorch is compiled with GPU support, resulting in failures in
`test_ops.py::test_python_ref_meta__refs_linalg_svd_cpu_complex`.

Use a similar approach to test_meta and skip the check for this function.

Workaround for #105068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117972
Approved by: https://github.com/lezcano
2024-01-26 15:24:29 +00:00
Sam Larsen
208e64a9ba Initial implementation of FakeTensor caching (#113873)
Summary: Cache the result of FakeTensor dispatch and skip re-evaluation on cache hits.
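
Conceptually (a sketch of the idea, not the PR's implementation), the cache keys on the op plus the metadata of the fake inputs; since fake tensors carry no data, equal input metadata implies equal output metadata for cacheable ops:

```python
_fake_cache: dict = {}

def cached_fake_dispatch(op, *fake_args):
    # Real keys would also cover strides, memory format, aliasing, etc.
    key = (op, tuple((tuple(a.shape), a.dtype, a.device) for a in fake_args))
    if key not in _fake_cache:
        _fake_cache[key] = op(*fake_args)  # evaluate only on a cache miss
    return _fake_cache[key]
```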

Test Plan: New unit tests. Caching is enabled in this diff, so all existing tests exercise the cache as well.

Differential Revision: [D52841637](https://our.internmc.facebook.com/intern/diff/D52841637)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113873
Approved by: https://github.com/eellison
2024-01-17 20:38:54 +00:00
Joel Schlosser
3c21264c9b Introduce reverse view_funcs (#115894)
Part 2 of the implementation for general [subclass view fake-ification](https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw).

Details:
* Codegen `rev_view_func()` alongside `view_func()`
    * The reverse view_func gives you a "base" from a "view": `rev_view_func(new_view) -> new_base`, i.e., it plays the original view backwards
* Utilizes the functional inverses defined in `FunctionalInverses.cpp`, passing `InverseReturnMode::AlwaysView`
* Manually implements functional inverses for `narrow()` and `chunk()`
* **NB: Multi-output views now set view_func() / rev_view_func() for each of the output views!**
    * Due to this, the `as_view()` overload that operates on a list of views is scrapped in favor of iteration via codegen

Example codegen in `ADInplaceOrViewTypeN.cpp`:
```cpp
at::Tensor narrow(c10::DispatchKeySet ks, const at::Tensor & self, int64_t dim, c10::SymInt start, c10::SymInt length) {
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::_ops::narrow::redispatch(ks & c10::after_ADInplaceOrView_keyset, self, dim, start, length);
  })();
  std::function<at::Tensor(const at::Tensor&)> func=nullptr;
  std::function<at::Tensor(const at::Tensor&)> rev_func=nullptr;
  if (false || !self.unsafeGetTensorImpl()->support_as_strided() ||
      c10::AutogradState::get_tls_state().get_view_replay_enabled()) {
    func = [=](const at::Tensor& input_base) {
      return at::_ops::narrow::call(input_base, dim, start, length);
    };
    rev_func = [=](const at::Tensor& input_view) {
      // NB: args from narrow() signature are passed along to the inverse
      return at::functionalization::FunctionalInverses::narrow_copy_inverse(self, input_view, at::functionalization::InverseReturnMode::AlwaysView, dim, start, length);
    };
  }
  auto result = as_view(/* base */ self, /* output */ _tmp, /* is_bw_differentiable */ true, /* is_fw_differentiable */ true, /* view_func */ func, /* rev_view_func */ rev_func, /* creation_meta */ InferenceMode::is_enabled() ? CreationMeta::INFERENCE_MODE : (at::GradMode::is_enabled() ? CreationMeta::DEFAULT : CreationMeta::NO_GRAD_MODE));
  return result;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115894
Approved by: https://github.com/soulitzer
2024-01-05 16:48:12 +00:00
Aaron Gokaslan
3fe437b24b [BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, i.e.:
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):`->`if x > y or y < z:`
  - And `return('...')` -> `return '...'`

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-03 06:04:44 +00:00
kflu
c5dcb50c00 [easy] aten ops: support passing all args as kwargs, including self (#114920)
Summary:
This is important for writing aten-IR-based graph transformations.

```
In [4]: [x.name for x in torch.ops.aten.reshape.default._schema.arguments]
Out[4]: ['self', 'shape']

In [8]: torch.ops.aten.reshape.default(torch.rand(1,2), shape=[2])
Out[8]: tensor([0.7584, 0.4834])

# === CANNOT CALL `self` BY KWARGS ===

In [7]: torch.ops.aten.reshape.default(self=torch.rand(1,2), shape=[2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 torch.ops.aten.reshape.default(self=torch.rand(1,2), shape=[2])

TypeError: OpOverload.__call__() got multiple values for argument 'self'

```

# Where's the problem?

1. The aten ops' first arg is usually named `self` (aten/src/ATen/native/native_functions.yaml).
2. Unfortunately, in `torch._ops.{OpOverload, OpOverloadPacket}.__call__()`, the first arg is (by Python convention) named `self` too.

So when `self` is passed by kwargs, `OpOverloadPacket.__call__` receives:

```
OpOverloadPacket.__call__(self, {"self": ...})
```

Python does not allow an argument to receive two values, hence:

> TypeError: OpOverload.__call__() got multiple values for argument 'self'

# How to fix?

**Note that**, in the above, `self` is an instance of `OpOverloadPacket`, and the `"self"` kwarg is the input tensor to the aten op. To fix this, we only need to differentiate the two `self`s.

In Python, the first arg of a method does not need to be named `self`, so we change the `__call__` definition to:

```
def __call__(_self, ...):
```

Now the call becomes:

```
OpOverloadPacket.__call__(_self, {"self": ...})
```

where:
* `_self` is the instance to the `OpOverloadPacket`
* `"self"` is the input tensor to the aten op.

Test Plan:
```
In [4]: [x.name for x in torch.ops.aten.reshape.default._schema.arguments]
Out[4]: ['self', 'shape']

In [3]: torch.ops.aten.reshape.default(self=torch.rand(1,2), shape=[2])
Out[3]: tensor([0.5127, 0.3051])
```

Differential Revision: D51731996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114920
Approved by: https://github.com/houseroad
2023-12-16 18:32:58 +00:00
rzou
3477a2ee03 unMarkDynamoStrictTest on OpInfo-based tests (#115856)
These take too long to run under strict mode. We'll worry about them
later. Note that these decorators don't do anything yet (unless we flip
the default from non-strict to strict).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115856
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845, #115855
2023-12-15 01:22:31 +00:00
Isuru Fernando
505574c46a Add decomposition for torch.block_diag (#115096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115096
Approved by: https://github.com/peterbell10
2023-12-11 20:04:22 +00:00
Aaron Gokaslan
794545c11f [BE]: Enable RUF015 codebase wide (#115507)
Constant-time access of the first value in a collection, instead of converting the collection to a list and indexing the first item, which is linear. The rule is enabled, which autofixes existing violations and enforces the pattern going forward.
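
For example (illustrative of the autofix):

```python
data = {"a": 1, "b": 2}

first = list(data)[0]     # O(n): materializes every key first
first = next(iter(data))  # O(1): what RUF015 rewrites it to
```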

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
2023-12-11 15:51:01 +00:00
Isuru Fernando
e4a88d9581 Convert SymInts to SymFloats with SymPy (#113683)
Fixes #109365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113683
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-11-20 23:35:40 +00:00
Evgeni Burovski
237cbd5be6 BUG: trace frames with numpy scalar -> ndarray functions (#112959)
Fixes #112951

Make dynamo detect that `np.arange(3)` returns a FakeTensor, so the frame needs to be traced.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112959
Approved by: https://github.com/lezcano
2023-11-17 03:00:24 +00:00
Aryan Gupta
8cee0a25bd fix: Flake8-BugBear code B-026 for PyTorch (#111362)
Fixes #106571

I have fixed the B-026 error codes for the Flake8 tests on the codebase. Please review and tell me if there is anything else to do. Thanks, and excited about this first contribution to PyTorch.

Also, see the issue that introduced [B-026](https://github.com/PyCQA/flake8-bugbear/issues/286) in `flake8-bugbear`, which discusses the error code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111362
Approved by: https://github.com/Skylion007
2023-11-07 21:38:18 +00:00
Peter Bell
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
Peter Bell
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
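
For example (a sketch of the before/after):

```python
import torch.utils._pytree as pytree

tree = {"a": [1, 2], "b": (3,)}
# Before: leaves, _ = pytree.tree_flatten(tree)
leaves = pytree.tree_leaves(tree)  # [1, 2, 3]
```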

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
William Wen
a380bf3297 [dynamo, test] skip flaky dynamo-wrapped tests (#112310)
ghstack-source-id: 7a87e33e7513e7924e4513b6473284562989ed4c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112309

Skip flaky tests reported by
- https://github.com/pytorch/pytorch/issues/111825
- https://github.com/pytorch/pytorch/issues/111826
- https://github.com/pytorch/pytorch/issues/111909
- https://github.com/pytorch/pytorch/issues/112142
- https://github.com/pytorch/pytorch/issues/112220

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112310
Approved by: https://github.com/xmfan
2023-10-28 04:14:57 +00:00
Isuru Fernando
c120e5606e Use ops_and_refs in test_ops.py instead of _ops_and_refs (#112022)
`ops_and_refs` and `_ops_and_refs` have the same definition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112022
Approved by: https://github.com/lezcano
2023-10-27 18:37:05 +00:00
Isuru Fernando
fdbb73fa4e Check both ops and refs in test_strided_layout (#112160)
Trying #112023 again to see if the CLA issue is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112160
Approved by: https://github.com/lezcano, https://github.com/Neilblaze
2023-10-27 15:35:34 +00:00
alhridoy
0c64ac0d3a Add tests for strided layout in factory functions (#111463)
Fixes #111222
This pull request adds tests for factory functions that create tensors with a strided layout. The tests are added to the `test_ops.py` file and check the behavior of the `empty`, `zeros`, `ones`, and `rand` factory functions when used with the `layout=torch.strided` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111463
Approved by: https://github.com/lezcano
2023-10-24 17:05:44 +00:00
Philip Meier
973c87b320 raise instead of skip in test/test_meta.py (#110939)
Supersedes #109004.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110939
Approved by: https://github.com/lezcano, https://github.com/kurtamohler
2023-10-17 10:17:43 +00:00
Jez Ng
ddb0c26511 [inductor] Re-enable more fixed tests (#110798)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110798
Approved by: https://github.com/Skylion007
2023-10-09 04:36:51 +00:00
Jez Ng
dddf581da7 [dynamo] Add graph break on requires_grad_() (#110053)
Fixes #107861.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110053
Approved by: https://github.com/eellison
2023-10-04 06:22:16 +00:00
SS-JIA
5df8aca994 [core IR] Add a core decomposition for floor_divide (#110046)
## Context

Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.

This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition

```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```

but couldn't discern the reason why this is the case. cc: @lezcano
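
For intuition, a floor-division decomposition in terms of other aten ops can be sketched as follows (a hedged sketch, not the PR's exact decomposition, which also deals with type promotion and other edge cases):

```python
import torch

def floor_divide_decomp(a, b):
    # floor division == truncating division, corrected by one when the
    # remainder is nonzero and the operands' signs differ
    q = torch.div(a, b, rounding_mode="trunc")
    rem = a - q * b
    fix = ((rem != 0) & ((rem < 0) != (b < 0))).to(q.dtype)
    return q - fix
```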

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
SS-JIA
7de669f2f9 [core IR] Remove trunc decomp and add trunc to core (#109902)
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator.

Going forward, provide similar treatment for operators that map cleanly to hardware instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00