Commit Graph

1153 Commits

Author SHA1 Message Date
PyTorch MergeBot
520bc1080e Revert "[Profiler] Unify the device(CUDA, XPU, PrivateUse1) in torch profiler post processing (#123247)"
This reverts commit 768ce2cdda.

Reverted https://github.com/pytorch/pytorch/pull/123247 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123247#issuecomment-2066152611))
2024-04-19 09:09:03 +00:00
Chen, Zejun
768ce2cdda [Profiler] Unify the device(CUDA, XPU, PrivateUse1) in torch profiler post processing (#123247)
This PR unifies the CUDA, XPU and PrivateUse1 in the torch profiler. Now CUDA, XPU and PrivateUse1 can together use string object `use_device` to distinguish each other and share one device path for calculating kineto time durations and memory statistics for post processing.

#suppress-api-compatibility-check

Co-authored-by: Aaron Enye Shi <enye.shi@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123247
Approved by: https://github.com/aaronenyeshi, https://github.com/gujinghui
2024-04-19 03:31:13 +00:00
Yuanhao Ji
21f7cbdc1c Enable UFMT on test/test_autograd.py (#124141)
Part of: #123062

Ran lintrunner on:

- `test/test_autograd.py`

Detail:

```bash
$ lintrunner -a --take UFMT --all-files
ok No lint issues.
Successfully applied all patches.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124141
Approved by: https://github.com/soulitzer
2024-04-18 00:16:23 +00:00
rzou
3d2d7ba19d Delete torch.autograd.function.traceable APIs (#122817)
We deprecated them in 2.3 with plans to delete in 2.4. Very few OSS
repos use this flag at all and it also does nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122817
Approved by: https://github.com/albanD
2024-03-28 18:24:15 +00:00
Joel Schlosser
cd6bfc7965 Proper view support for jagged layout NestedTensor (#113279)
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This ops is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)

With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.

Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-22 02:12:36 +00:00
PyTorch MergeBot
224beecee6 Revert "Proper view support for jagged layout NestedTensor (#113279)"
This reverts commit 5855c490f0.

Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762))
2024-03-21 22:03:01 +00:00
Joel Schlosser
5855c490f0 Proper view support for jagged layout NestedTensor (#113279)
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
    * `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
    * This ops is implemented on the Python side using torch.library so we can return a subclass instance
    * `jagged_from_list()` now uses this instead of the old autograd.Function `NestedViewFromBuffer`
    * The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
    * `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
    * `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
    * Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)

With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.

Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922)
Co-authored-by: voznesenskym <voznesenskym@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-20 23:45:34 +00:00
albanD
6791b0c09e Change default torch_function behavior to be disabled when torch_dispatch is defined (take 2) (#120632)
This does not introduce a new test but is tested by checking that all the classes we already have still behave as before now that they don't explicitly disable torch_function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120632
Approved by: https://github.com/ezyang
2024-03-09 01:08:37 +00:00
rzou
b52e0bf131 Deprecate torch.autograd.function.traceable, is_traceable (#121413)
- There are no usages of this internally.
- There are very few usages of this in OSS (most of these are forks of old
repositories).
- This flag doesn't do anything.

We're deprecating it to prevent confusion. I will delete it immediately
after the branch cut.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121413
Approved by: https://github.com/albanD, https://github.com/soulitzer
2024-03-08 18:41:07 +00:00
Yu, Guangye
c2b2e57032 Intel GPU Runtime Upstreaming for Guard (#118523)
# Motivation
According to [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the 5th runtime component we would like to upstream is `Guard`. We will cover device guard and stream guard in this PR.

# Design
Device guard is used mainly for op dispatcher in PyTorch. Currently, PyTorch already has a device guard abstraction `c10::impl::DeviceGuardImplInterface`. In our design, we will introduce an `XPUGuardImpl` class inherits from `c10::impl::DeviceGuardImplInterface`. Register `XPUGuardImpl` to PyTorch after we implement the device switch management mechanism in `XPUGuardImpl`. Besides, we will introduce `XPUGuard`, `OptionalXPUGuard`, `XPUStreamGuard`, and `OptionalXPUStreamGuard`. They are all following the design of CUDA's counterpart. The corresponding C++ file should be placed in c10/xpu/ folder.

# Additional Context
It is unnecessary to add `Guard` code to PyTorch frontend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118523
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/jgong5, https://github.com/malfet
ghstack dependencies: #120315
2024-02-22 14:07:21 +00:00
soulitzer
450339ab2d Test for fatal signal in test_pynode_destruction_deadlock (#120279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120279
Approved by: https://github.com/albanD
2024-02-21 21:53:51 +00:00
Joel Schlosser
9ec8dd2467 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contains impls for `ChainedViewFunc` and `ErroringViewFunc` which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args for `symint_visitor_fn` / `tensor_visitor_fn`. If these are defined, they are expected to be python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification.

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping functions correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-14 22:00:43 +00:00
PyTorch MergeBot
24bdd03d23 Revert "Reify view_func() closures as ViewFuncs (#118404)"
This reverts commit d5a6762263.

Reverted https://github.com/pytorch/pytorch/pull/118404 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/118404#issuecomment-1938600260))
2024-02-12 12:38:51 +00:00
Joel Schlosser
d5a6762263 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contains impls for `ChainedViewFunc` and `ErroringViewFunc` which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args for `symint_visitor_fn` / `tensor_visitor_fn`. If these are defined, they are expected to be python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification.

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping functions correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-09 18:51:36 +00:00
Andrew Gu
d6b556bd98 Added "any" mode to register_multi_grad_hook (#117984)
This is a re-open of https://github.com/pytorch/pytorch/pull/115628/. This PR adds an `"any"` option to `register_multi_grad_hook` that runs the hook when the gradient of _any_ of the input tensors is computed. The existing functionality is folded under the default `"all"` mode.

The multi-threaded test case is based on the existing one for `register_multi_grad_hook`. I would appreciate a closer look on that. ~~I am not sure about the hook signature (i.e. why we see two gradients in the hook that runs instead of just one, as [`register_hook`](https://pytorch.org/docs/stable/generated/torch.Tensor.register_hook.html) docs suggest).~~ It was because I was iterating over the 2 elements in the single tensor 😢 .

I did not update the `notes/autograd.rst`, which currently has a [blurb](https://pytorch.org/docs/stable/notes/autograd.html#special-hooks) on `register_multi_grad_hook`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117984
Approved by: https://github.com/soulitzer
ghstack dependencies: #117994, #118186
2024-01-25 16:25:52 +00:00
soulitzer
67300a11cb Support custom autograd Function forward AD return non-Tensor in forward (#118234)
Fixes https://github.com/pytorch/pytorch/issues/117491

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118234
Approved by: https://github.com/albanD
ghstack dependencies: #117552
2024-01-25 03:24:29 +00:00
soulitzer
5b819d9ef0 Properly move retains_grad hook on in-place over view for base (#117552)
Fixes https://github.com/pytorch/pytorch/issues/117366
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117552
Approved by: https://github.com/albanD
2024-01-25 00:27:13 +00:00
rzou
db1a6eda9e [codemod] markDynamoStrictTest batch 22 (#117729)
[codemod] markDynamoStrictTest test_autograd
[codemod] markDynamoStrictTest test_ao_sparsity
[codemod] markDynamoStrictTest test_jit
[codemod] markDynamoStrictTest test_quantization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117729
Approved by: https://github.com/bdhirsh
2024-01-18 16:59:26 +00:00
soulitzer
5866284d4a Make not passing use_reentrant back to warning instead of erroring and clarify docs (#116710)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116710
Approved by: https://github.com/albanD
ghstack dependencies: #116523
2024-01-09 20:58:49 +00:00
Joel Schlosser
3c21264c9b Introduce reverse view_funcs (#115894)
Part 2 of implementation for general [subclass view fake-ification](https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw).

Details:
* Codegen `rev_view_func()` alongside `view_func()`
    * Reverse view_func gives you a "base" from a "view": `rev_view_func(new_view) -> new_base` AKA it plays the original view backwards
* Utilizes the functional inverses defined in `FunctionalInverses.cpp`, passing `InverseReturnMode::AlwaysView`
* Manually implements functional inverses for `narrow()` and `chunk()`
* **NB: Multi-output views now set view_func() / rev_view_func() for each of the output views!**
    * Due to this, the `as_view()` overload that operates on a list of views is scrapped in favor of iteration via codegen

Example codegen in `ADInplaceOrViewTypeN.cpp`:
```cpp
at::Tensor narrow(c10::DispatchKeySet ks, const at::Tensor & self, int64_t dim, c10::SymInt start, c10::SymInt length) {
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::_ops::narrow::redispatch(ks & c10::after_ADInplaceOrView_keyset, self, dim, start, length);
  })();
  std::function<at::Tensor(const at::Tensor&)> func=nullptr;
  std::function<at::Tensor(const at::Tensor&)> rev_func=nullptr;
  if (false || !self.unsafeGetTensorImpl()->support_as_strided() ||
      c10::AutogradState::get_tls_state().get_view_replay_enabled()) {
    func = [=](const at::Tensor& input_base) {
      return at::_ops::narrow::call(input_base, dim, start, length);
    };
    rev_func = [=](const at::Tensor& input_view) {
      // NB: args from narrow() signature are passed along to the inverse
      return at::functionalization::FunctionalInverses::narrow_copy_inverse(self, input_view, at::functionalization::InverseReturnMode::AlwaysView, dim, start, length);
    };
  }
  auto result = as_view(/* base */ self, /* output */ _tmp, /* is_bw_differentiable */ true, /* is_fw_differentiable */ true, /* view_func */ func, /* rev_view_func */ rev_func, /* creation_meta */ InferenceMode::is_enabled() ? CreationMeta::INFERENCE_MODE : (at::GradMode::is_enabled() ? CreationMeta::DEFAULT : CreationMeta::NO_GRAD_MODE));
  return result;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115894
Approved by: https://github.com/soulitzer
2024-01-05 16:48:12 +00:00
Aaron Gokaslan
3fe437b24b [BE]: Update flake8 to v6.1.0 and fix lints (#116591)
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parenthesis i.e.
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):`->`if x > y or y < z:`
  - And `return('...')` -> `return '...'`

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
2024-01-03 06:04:44 +00:00
soulitzer
4d6a1ad400 Activation checkpoint and checkpoint_sequential errors if use_reentrant not passed explicitly (#115868)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115868
Approved by: https://github.com/albanD
ghstack dependencies: #115438
2023-12-20 15:23:44 +00:00
soulitzer
cfb3cd11c1 Add basic autograd TORCH_LOGS support (#115438)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115438
Approved by: https://github.com/albanD
2023-12-20 15:23:44 +00:00
Aaron Gokaslan
794545c11f [BE]: Enable RUF015 codebase wide (#115507)
Constant time access of first value in collection. This is a constant time operation instead of converting the item to a list to get the first item which is linear. The rule is turned on which automatically autofixes and enforces this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
2023-12-11 15:51:01 +00:00
Xuehai Pan
55064a4ef9 [BE] add parentheses to kwargs unpacking func(*args, **(kwargs or {})) (#115026)
This PR adds parentheses to kwargs unpacking `func(*args, **(kwargs or {}))` for better code readability.

With/without the parentheses are semantic equivalent because they produce the same bytecode.

```console
$ echo "func(*args, **kwargs or {})" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

$ echo "func(*args, **(kwargs or {}))" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115026
Approved by: https://github.com/Skylion007
2023-12-03 20:03:26 +00:00
Tobias Ringwald
b6df841460 Fixed an issue where a user-specified default device clashed with the… (#114560)
… device placement of the RNG. This PR now ignores the user-specified default device, allocates the tensor on the CPU and then moves the tensor to the device of the input tensor. This was more or less already the standard procedure in case the default device wasn't set.

Fixes #114536.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114560
Approved by: https://github.com/soulitzer
2023-11-29 17:45:49 +00:00
Ying Liu
85b97605ab Enable set sequence nr (#114120)
Summary:
In some cases (especially those involving collective calls) - we would want to always kick off a collective call first before running going down another path.

For  example:

```
tbe lookup -> a2a ->
                     overarch
dense ------------->
```

if the forward code is written as
a2a_out = a2a
dense = dense_net
out = overarch(a2a_out, dense)
out.backward()

The current default is running backwards in the opposite order the forward is called. However, there is no data dependency between a2a and dense, so in reality either of them could be run first. We would like the a2a to run first because it provides optimal (on average) overlap.

Changing the seq_nr of a2a_out to something large enough would allow autograd engine to kick it off first.

Test Plan: Tests incoming

Differential Revision: D51445261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114120
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-11-21 19:47:28 +00:00
soulitzer
c1d9d4a2b5 checkpoint_sequential warns if use_reentrant not passed explicitly (#114158)
Use warning text for deprecation message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114158
Approved by: https://github.com/albanD
2023-11-20 23:08:44 +00:00
soulitzer
c435b8c10a Fix autograd engine callback error propagation from device thread (#113702)
The existing try-catch doesn't work because it doesn't call err.persist(). This is in contrast to the try-catch for evaluate_function which does work because it calls into python_engine's thread_on_exception which calls persist.

Calling persist on a python_error stashes the PyErr state from the thread-local PyThreadState onto the python_error object, so that when this error object is stored onto the future and passed back to the calling cpu thread, python_engine's execute try-catch can then err.restore() the error state. Finally, the python_engine's execute would re-raise so that this is re-caught by the HANDLE_TH_ERRORS macro.

Fixes https://github.com/pytorch/pytorch/issues/75750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113702
Approved by: https://github.com/albanD
2023-11-17 20:17:02 +00:00
soulitzer
3e3c6cc05e Do not error when printing view created in no-grad modified in-place in no-grad (#113716)
Fixes https://github.com/pytorch/pytorch/issues/99968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113716
Approved by: https://github.com/albanD
2023-11-16 18:57:56 +00:00
Jon Chuang
5ccd22502f [contextlib] Wrapping a function with set_grad_enabled will consume its global mutation (#113359)
Fixes https://github.com/pytorch/pytorch/issues/113298

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113359
Approved by: https://github.com/soulitzer, https://github.com/jansel
2023-11-09 19:16:20 +00:00
PyTorch MergeBot
b0087b4cf7 Revert "record_function: remove legacy internal operators (#72303)"
This reverts commit 0be84bb41e.

Reverted https://github.com/pytorch/pytorch/pull/72303 on behalf of https://github.com/izaitsevfb due to Apparently _record_function_enter is still used internally at Meta in several places and in lots of internal tests. ([comment](https://github.com/pytorch/pytorch/pull/72303#issuecomment-1777942975))
2023-10-24 20:01:14 +00:00
Peter Bell
0be84bb41e record_function: remove legacy internal operators (#72303)
These operators have not been used since #76420 but were preserved for TorchScript backward compatibility

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72303
Approved by: https://github.com/albanD
ghstack dependencies: #104535
2023-10-23 22:55:05 +00:00
albanD
5e8be63e99 Allow specifiying inputs as GradientEdge in autograd APIs (#110867)
This can be useful for advanced users (like AOTAutograd) who don't want to keep the corresponding Tensor alive (for memory reasons for example) or when inplace op will change the Tensor's grad_fn (but gradients wrt to the original value is needed).

I went minimal API change but open to suggestions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110867
Approved by: https://github.com/soulitzer
2023-10-12 04:08:44 +00:00
soulitzer
73f4c1a406 [reland2] Update custom Function preserve torch function when inputs … (#110895)
…returned as-is

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110895
Approved by: https://github.com/albanD
2023-10-11 21:37:19 +00:00
soulitzer
c9eb8d8d90 Add set_checkpoint_debug_enabled that overrides local setting (#110728)
People access activation checkpoint through many layers of config and it is not always guaranteed that all the layers of wrapping around checkpoint properly propagate all the kwargs, e.g. debug mode. This context manager offers an alternative way to enable debug mode that bypasses the need for all layers to propagate kwargs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110728
Approved by: https://github.com/albanD
ghstack dependencies: #110673, #110674, #110675, #110676
2023-10-11 02:12:31 +00:00
PyTorch MergeBot
d1c157c598 Revert "[reland] Update custom Function preserve torch function when inputs r… (#110679)"
This reverts commit 563728f61c.

Reverted https://github.com/pytorch/pytorch/pull/110679 on behalf of https://github.com/kit1980 due to The diff has Meta-internal changes, please land from Phabricator ([comment](https://github.com/pytorch/pytorch/pull/110679#issuecomment-1753523182))
2023-10-09 19:09:01 +00:00
soulitzer
563728f61c [reland] Update custom Function preserve torch function when inputs r… (#110679)
…eturned as-is

reland of https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749803837

Opening this without ghstack to do codev. In our PR, we changed the signature of `_wrap_outputs`. There is some internal code that calls `_wrap_outputs` directly, so we also need to update that callsite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110679
Approved by: https://github.com/albanD
2023-10-07 00:27:45 +00:00
PyTorch MergeBot
236afe73a2 Revert "Update custom Function preserve torch function when inputs returned as-is (#109825)"
This reverts commit 4e73eee93f.

Reverted https://github.com/pytorch/pytorch/pull/109825 on behalf of https://github.com/PaliC due to causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749802739))
2023-10-05 23:49:41 +00:00
soulitzer
4e73eee93f Update custom Function preserve torch function when inputs returned as-is (#109825)
Fixes https://github.com/pytorch/pytorch/issues/109805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109825
Approved by: https://github.com/albanD
2023-10-04 22:45:11 +00:00
FFFrog
70f2adaec3 Setup_context does not contain default values of forward() (#108561)
Fixes #108529

As the title shown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108561
Approved by: https://github.com/soulitzer
2023-09-19 16:23:52 +00:00
soulitzer
3efc1882e8 Update CopySlices to not internal assert when grad_output is undefined (#108353)
Fixes https://github.com/pytorch/pytorch/issues/107928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108353
Approved by: https://github.com/albanD
ghstack dependencies: #107296, #107349
2023-09-11 16:26:05 +00:00
Jane Xu
6e71ad0509 Add tensor post accumulate grad hook API (#107063)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107063
Approved by: https://github.com/albanD, https://github.com/soulitzer
2023-08-24 00:19:35 +00:00
PyTorch MergeBot
432fce4e0d Revert "Add tensor post accumulate grad hook API (#107063)"
This reverts commit 3f655277d4.

Reverted https://github.com/pytorch/pytorch/pull/107063 on behalf of https://github.com/ZainRizvi due to Diff train weirdness. Need to temporarily revert this PR and will right land it soon afterwards ([comment](https://github.com/pytorch/pytorch/pull/107063#issuecomment-1690799057))
2023-08-24 00:12:34 +00:00
Jane Xu
3f655277d4 Add tensor post accumulate grad hook API (#107063)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107063
Approved by: https://github.com/albanD, https://github.com/soulitzer
2023-08-22 15:15:57 +00:00
soulitzer
aa04b0536b Fix inference_mode decorator pass mode as kwarg (#107349)
Fixes https://fb.workplace.com/groups/1405155842844877/permalink/7330520550308347/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107349
Approved by: https://github.com/albanD
ghstack dependencies: #107296
2023-08-17 17:12:31 +00:00
andreasfloros
c9c90765c1 grad_mode decorators without paren (#107086)
This PR implements the feature described in #107036 for `no_grad`, `enable_grad` and `inference_mode`.

Users can still use the above as before but they can also use them without parentheses.

For example:

```python
import torch

a = torch.ones(1, requires_grad=True)

def do_something():
    print(2 * a)

with torch.no_grad():
    do_something()  # tensor([2.])

torch.no_grad()(do_something)()  # tensor([2.])

torch.no_grad(do_something)()  # tensor([2.])

do_something()  # tensor([2.], grad_fn=<MulBackward0>)
```

For `inference_mode`, decorating without parenthesis is equivalent to decorating with the default `mode=True`, similiar to how dataclasses behave (https://docs.python.org/3/library/dataclasses.html#module-contents)

Closes #107036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107086
Approved by: https://github.com/albanD
2023-08-15 05:25:33 +00:00
Richard Zou
b9ad7bc533 Don't run test/autograd/test_fallback.py in parallel (#106866)
Fixes https://github.com/pytorch/pytorch/issues/106754

This PR:
- moves test/autograd/test_fallback.py to test_autograd_fallback.py and
removes it from test_autograd.py (necessary for the next step)
- adds test_autograd_fallback.py to parallel test blocklist.
- lintrunner really wanted to make changes to the files, but other than
that, it is a move.

The problem is that we set a global option (the autograd fallback mode)
during these tests which may cause the tests to interfere with each
other.

Test Plan:
- python test/run_test.py -i test_autograd_fallback

NOTE to diff train oncall:
- You'll also need to modify the test/autograd/test_fallback.py TARGET in
caffe2/test/TARGETS since we renamed the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106866
Approved by: https://github.com/soulitzer
2023-08-10 00:26:23 +00:00
poseljacob
a25eee1d77 _force_original_view_tracking to work as both context manager and function (#106706)
Fix _force_original_view_tracking to work as a function as well as a context manager, as stated by documentation.

Applied similar fixes to PR: https://github.com/pytorch/pytorch/pull/105291
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106706
Approved by: https://github.com/albanD
2023-08-07 23:29:22 +00:00
Justin Chu
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
poseljacob
1aba399138 allow set_multithreading_enabled to act as function and context manager (#105291)
Fixes #104985

Implemented `set_multithreading_enabled` C++ function to directly alter state rather than using `MultithreadingEnabled` class, which was automatically resetting the state when the object was destroyed. This behavior more closely aligns with set_grad_enabled which does work as expected. This allows us to change python class `set_multithreading_enabled` to act as both a function and context manager.

I also added a getter: `torch._C.is_multithreading_enabled`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105291
Approved by: https://github.com/albanD
2023-07-18 16:55:40 +00:00
soulitzer
cf404a8ce4 Fix get_current_graph_task_execution_order accumulate_grads ordering (#105353)
Fixes https://github.com/pytorch/pytorch/issues/105293
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105353
Approved by: https://github.com/albanD
2023-07-18 00:59:25 +00:00
Richard Zou
f03a8f0589 [reland] Deprecate registering autograd kernels at not an autograd key (#105078)
Summary:
Context
-------
This PR adds a new fallback to the Autograd dispatch keys.

If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
`torch._C._set_autograd_fallback("nothing")`
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key,
like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8

It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).

Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at not an autograd
key, we add a fallback to the Autograd dispatch keys. This fallback
raises a warning if the user attempts to backprop through the operator
and is also configurable to either warn or not warn.

The goal of this PR is to
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong.
- be as performant as possible

There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then
we install an autograd hook that raises a warning. We are preserving BC
in that it is possible that the user has a torch::autograd::Function
registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad,
then we make them require_grad and install a WarnNotImplemented grad fn
that warns in the backward pass. This is mildy BC-breaking (see next
section).

Test Plan:
- bunch of new tests

BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects
custom operators that do not have a kernel registered to the Autograd
keys (e.g. AutogradCPU and AutogradCUDA).

If the previous behavior was that the custom operator would return
Tensors that do not require grad if the inputs do require grad, then
this PR changes it so that all floating-point and complex returns do
require grad. See the "Context" section above for how to get the old
behavior.

Differential Revision: D47408353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105078
Approved by: https://github.com/soulitzer
2023-07-14 15:03:07 +00:00
PyTorch MergeBot
24aa8b9b9a Revert "Deprecate registering autograd kernels at not an autograd key (#104481)"
This reverts commit ed13ab6664.

Reverted https://github.com/pytorch/pytorch/pull/104481 on behalf of https://github.com/atalman due to failed in periodic tests ([comment](https://github.com/pytorch/pytorch/pull/104481#issuecomment-1631552846))
2023-07-11 21:48:22 +00:00
Richard Zou
ed13ab6664 Deprecate registering autograd kernels at not an autograd key (#104481)
Context
-------
This PR adds a new fallback to the Autograd dispatch keys.

If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
`torch._C._set_autograd_fallback("nothing")`
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key,
like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8

It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).

Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at not an autograd
key, we add a fallback to the Autograd dispatch keys. This fallback
raises a warning if the user attempts to backprop through the operator
and is also configurable to either warn or not warn.

The goal of this PR is to
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong.
- be as performant as possible

There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then
we install an autograd hook that raises a warning. We are preserving BC
in that it is possible that the user has a torch::autograd::Function
registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad,
then we make them require_grad and install a WarnNotImplemented grad fn
that warns in the backward pass. This is mildy BC-breaking (see next
section).

Test Plan:
- bunch of new tests

BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects
custom operators that do not have a kernel registered to the Autograd
keys (e.g. AutogradCPU and AutogradCUDA).

If the previous behavior was that the custom operator would return
Tensors that do not require grad if the inputs do require grad, then
this PR changes it so that all floating-point and complex returns do
require grad. See the "Context" section above for how to get the old
behavior.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104481
Approved by: https://github.com/soulitzer
2023-07-11 16:48:39 +00:00
soulitzer
c85468a94c [autograd Function] Add private API to not materialize grads for non-differentiable outputs (#104291)
Fixes https://github.com/pytorch/pytorch/issues/104272

This PR adds a new private API `materialize_non_diff_grads` (default True) such that when set to False, grad outputs corresponding to outputs marked non-differentiable would receive None instead of a zero-filled tensor. This is overrides the setting of `materialize_grads`, i.e. grad outputs corresponding non-differentiable outputs would still be None even if `materialize_grads=True` (the default).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104291
Approved by: https://github.com/albanD
2023-07-08 14:53:54 +00:00
soulitzer
10ad74cbec Update SavedVariable to support saving non-input leafs (#104039)
Fixes https://github.com/pytorch/pytorch/issues/103726
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104039
Approved by: https://github.com/albanD
2023-06-22 21:52:35 +00:00
soulitzer
73c927f901 Improve debuggability of activation checkpoint (#103859)
This PR makes some improvements for debuggability of checkpointing:
- improved error messages that are more understandable
- errors are now `CheckpointError` which subclasses `RuntimeError` (only `CheckpointError` triggers debug message, see below)
- stricter error checking by default:
   - shapes, dtypes, and device are compared
   - we also now error when more tensors are being saved for backward during recompute
   - NOTE: checks are relaxed if it is detected that you are doing backward within forward
 - shapes, dtype, and device checking can be disabled by passing `determinism_check="none"`
 - new debug flag: more helpful error message when `debug=True`

Note:
- cpp stack trace is only included for x86 linux machines
- the error message if cpp stack trace is included can be quite long. For a function checkpointed with 8 operators, the log was around 1300 lines! (should this be hidden behind a flag?)

[Error message when debug='True' (python stack trace only)](https://gist.github.com/soulitzer/3d5e19c7cceae8e22f9bdd625ec39dd4)

[Error message when debug='True' (with python and cpp stacktrace)](https://gist.github.com/soulitzer/ff8fd8c3ccbb2c90dfe3df6d7713b167)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103859
Approved by: https://github.com/albanD
2023-06-22 03:57:36 +00:00
PyTorch MergeBot
2c313e7b99 Revert "Record view stacks if running anomaly mode (#103185)"
This reverts commit a02c573a89.

Reverted https://github.com/pytorch/pytorch/pull/103185 on behalf of https://github.com/izaitsevfb due to Breaks internal builds, see D46629734 ([comment](https://github.com/pytorch/pytorch/pull/103185#issuecomment-1588258206))
2023-06-12 23:52:10 +00:00
Nikita Shulga
4cfa06f706 [BE] Deprecate has_XYZ attributes (#103279)
Use [`__getattr__`](https://peps.python.org/pep-0562/) to raise warningwhen one tries to access `has_XYZ` methods and recommend appropriate `torch.backends.XYZ` methods

Make respective properties in `torch._C` private (by prefixing them with underscore), to exclude from `from torch._C import *`.

Added `warnings.simplefilter` to workaround Python-3.11 torch.compile lineinfo issue.

Fixes https://github.com/pytorch/pytorch/issues/102484

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103279
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2023-06-10 05:17:17 +00:00
Edward Z. Yang
a02c573a89 Record view stacks if running anomaly mode (#103185)
Now, when you do an inplace mutation and the view is naughty, you get this message:

```
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked). To find out where this view was allocated, run your entire forward region under anomaly mode (torch.autograd.detect_anomaly(check_nan=False)).
```

When you run under anomaly mode, you get:

```
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked). This view was allocated at:
  File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 4299, in arglebargle
  File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 4306, in test_anomaly_gives_view_stack
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 591, in run
  File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 2266, in _run_with_retry
  File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 2337, in run
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/case.py", line 650, in __call__
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 122, in run
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/suite.py", line 84, in __call__
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/runner.py", line 184, in run
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/main.py", line 271, in runTests
  File "/home/ezyang/local/c/pytorch-env/lib/python3.10/unittest/main.py", line 101, in __init__
  File "/data/users/ezyang/c/pytorch/torch/testing/_internal/common_utils.py", line 894, in run_tests
  File "/data/users/ezyang/c/pytorch/test/test_autograd.py", line 11209, in <module>
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103185
Approved by: https://github.com/zdevito
2023-06-09 16:56:28 +00:00
soulitzer
896d997dd0 Remove incorrect THP{Cpp,}Function_traverse PyObject traversals (#102860)
Fixes https://github.com/pytorch/pytorch/issues/102174

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102860
Approved by: https://github.com/albanD
2023-06-02 22:05:25 +00:00
soulitzer
98f6b815b7 [BE] Make some simplifications to torch.utils.checkpoint logic (#101193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101193
Approved by: https://github.com/albanD
2023-05-12 04:35:22 +00:00
soulitzer
e552b91286 torch.utils.checkpoint warns if user does not pass use_reentrant explicitly (#100551)
Now that we have updated all internal callsites, per https://fb.workplace.com/groups/pytorch.oss.dev/permalink/1635183750239493/ we should raise a warning when use_reentrant is not explicitly passed for 2.1

Deprecation note:
- Not passing in use_reentrant explicitly is now deprecated and will raise a warning. In the future the default value of use-reentrant will be False. To preserve the existing behavior you can pass in use_reentrant=True. It is recommended that you use use_reentrant=False.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100551
Approved by: https://github.com/Skylion007
2023-05-03 20:48:07 +00:00
Justin Chu
01abbfbaae [BE] Fix all B022 useless-contextlib-suppress (#100335)
No arguments passed to contextlib.suppress. No exceptions will be suppressed and therefore this context manager is redundant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100335
Approved by: https://github.com/Skylion007
2023-04-30 18:47:40 +00:00
Aaron Gokaslan
47dca20d80 [BE] Enable flake8-comprehension rule C417 (#97880)
Enables flake8-comprehension rule C417. Ruff autogenerated these fixes to the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97880
Approved by: https://github.com/ezyang, https://github.com/kit1980, https://github.com/albanD
2023-03-30 14:34:24 +00:00
Sergii Dymchenko
5ab50cf048 Fix shoud/shoudl typos (#97930)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97930
Approved by: https://github.com/clee2000
2023-03-30 08:27:16 +00:00
soulitzer
51c3fd39a5 Modify all calls to checkpoint pass use_reentrant explicitly (#97376)
Fixes #ISSUE_NUMBER

This is the first step toward making use_reentrant=False the default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97376
Approved by: https://github.com/albanD
2023-03-27 13:37:42 +00:00
soulitzer
7a8b691388 Make early stop the default for checkpoint and expose a way to disable (#96866)
Why did I choose context manager instead of per-call? Early stopping is not part of the model definition, and depending on how a particular model is used, e.g., with PT2 or not we may or may not want to disable early stopping.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96866
Approved by: https://github.com/albanD
2023-03-22 20:03:56 +00:00
Pearu Peterson
9d5ac03b9a Deprecate gradcheck check_sparse_nnz argument as duplicate of masked argument (#97187)
As in the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97187
Approved by: https://github.com/soulitzer
2023-03-22 14:11:03 +00:00
Qi Zhu
086ce765a5 Add new parameter materialize_grads to torch.autograd.grad() (#97015)
Fixes #44189
Adds a new parameter, zero_grad_unused, to the torch.autograd.grad() function. This parameter allows for the gradient to be set to 0 instead of None when a variable is unused, which can be helpful for higher-order partial differentials.

Here is an example of using this new parameter to solve d^3y/dx^3 given y = a * x:

```python
x = torch.tensor(0.5, dtype=torch.float32, requires_grad=True)
a = torch.tensor(1, dtype=torch.float32, requires_grad=True)
y = x * a
dydx = torch.autograd.grad(y, x, create_graph=True, allow_unused=True)
d2ydx2 = torch.autograd.grad(dydx, x, allow_unused=True, zero_grad_unused=True)
try:
    d3ydx3 = torch.autograd.grad(d2ydx2, x, allow_unused=True, zero_grad_unused=True)
except RuntimeError as e:
    assert False, "Should not raise error"
```

With `zero_grad_unused`, d2ydx2 could be 0 instead of None, enabling d3ydx3 to be calculated as defined in math without throwing an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97015
Approved by: https://github.com/soulitzer
2023-03-18 03:11:12 +00:00
albanD
985fc66b30 Bind increment_version to python (#96852)
Should be convenient when writing python-only kernels (with triton) that don't have access to the C++ APIs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96852
Approved by: https://github.com/soulitzer
2023-03-17 20:36:33 +00:00
soulitzer
f3db2a6341 Expose API to specify custom context manager for checkpoint (#96783)
Per [design](https://docs.google.com/document/d/1v-yqRqiWA6dIUOw5OpqFs2PqSQIbDEkwRPGk9FcYnxg/edit) we want (1) to allow the user to pass in a function that returns two context managers (2) a per-call API only for now, and (3) do not upstream selective checkpoint for the short term.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96783
Approved by: https://github.com/albanD
2023-03-15 20:37:33 +00:00
soulitzer
d30db9a251 Replace non-reentrant checkpoint with a rewrite that can be nested and contain grad (#90105)
Changes:
- bc-breaking change: The main difference between this and the old non-reentrant impl that it replaces is that we clear recomputed tensors on backward immediately upon unpack, even if retain_graph=True. This has the following additional implications:
   - Accessing _saved_tensors multiple times will silently recompute forward multiple times.
   - Accessing ctx.saved_tensor twice in the same backward will now raise an error.
- To avoid dealing with the potential consequences, early stopping has been hidden behind a global flag that is by default False, and can be enabled via a context manager. We can remove this in a follow up. Some features of nesting as a result do not work by default.

Before land:
- import to check for more bc-breakingness
- implement any workarounds for the bc-breaking-ness, if we decide on any
- update docs to reflect new lifetime of recomputed variables
- update docs to mention the early stop feature

Follow ups:
- enable early-stopping by default
- update docs/tutorial to feature nested use cases

Related docs:
  - code comment: https://github.com/pytorch/pytorch/pull/90105/files#diff-9dcd955620b52ce128e18e3567be88edbb238810460d1288a86fabc20e483b30R448
  - design doc: https://docs.google.com/document/d/1UDLhTNv6_kvuDTRlsjfj9WdqtNaQNr8ahrvdBIB6914/edit#
  - retains_grad <> checkpiont https://docs.google.com/document/d/1maiGmuFUxysQL0AdYUU88kngAaXh_L0XpDcLDh_5Ors/edit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90105
Approved by: https://github.com/albanD
2023-03-14 20:38:36 +00:00
Kshiteej K
1ec655565d [fix] resize_, resize_as_ : version bump in ADInplaceOrView (#96598)
Ref: https://github.com/pytorch/pytorch/pull/96403#discussion_r1132553277

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96598
Approved by: https://github.com/albanD
2023-03-14 16:15:34 +00:00
kshitij12345
987eade3f3 [fix] resize_ and resize_as_ : version bump (#96403)
Fixes https://github.com/pytorch/pytorch/issues/93776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96403
Approved by: https://github.com/ezyang
2023-03-10 06:46:30 +00:00
Pearu Peterson
b89fda51cd Implement sparse semantics support in gradcheck (2nd try) (#95405)
Replaces https://github.com/pytorch/pytorch/pull/94714 that was reverted due to https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442355648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95405
Approved by: https://github.com/albanD
2023-02-27 17:48:02 +00:00
Zain Rizvi
808879ec8b Revert "Implement sparse semantics support in gradcheck (#94714)" (#95386)
This reverts commit 7ac511c29a from https://github.com/pytorch/pytorch/pull/94714 since it breaks periodic.

Git thinks there's a merge conflict due to an unfortunately located newline deletion, so reverting this one manually

Details behind the failure in https://github.com/pytorch/pytorch/pull/94714#issuecomment-1442160593
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95386
Approved by: https://github.com/clee2000
2023-02-23 18:02:37 +00:00
Pearu Peterson
cece63f197 Add warn-once deprecation warning to legacy sparse constructors (#94850)
Addresses https://github.com/pytorch/pytorch/issues/68323#issuecomment-1425174341

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94850
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-02-23 15:05:12 +00:00
kshitij12345
3b966a6ce3 [autograd] disable backward/grad for complex scalar output (#92753)
Fixes https://github.com/pytorch/pytorch/issues/92750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92753
Approved by: https://github.com/ezyang
2023-02-23 11:38:27 +00:00
Pearu Peterson
7ac511c29a Implement sparse semantics support in gradcheck (#94714)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94714
Approved by: https://github.com/soulitzer, https://github.com/albanD
2023-02-22 20:03:25 +00:00
kshitij12345
311b20aae1 [fix] torch.pow handle real negative base and complex exponent (#95198)
Fixes https://github.com/pytorch/pytorch/issues/89903 https://github.com/pytorch/pytorch/issues/95111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95198
Approved by: https://github.com/albanD, https://github.com/ngimel
2023-02-21 18:36:20 +00:00
Masaki Kozuki
f54233e273 [foreach] bump tensor's version and define backward via torchgen (as possible) (#93901)
## summary
- increment tensor versions in inplace foreach functions
- add a logic to take care of `ArrayRef<Scalar>`

rel: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/89591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93901
Approved by: https://github.com/albanD
2023-02-20 23:18:07 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility library [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future) and `torch._six`. We only support Python 3.8+ now. It's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Xuehai Pan
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
Brian Hirsh
2b36d35b9c add torch.autograd._unsafe_set_version_counter API (#92924)
better description coming soon (but this is meant to fix https://github.com/pytorch/pytorch/issues/91093)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92924
Approved by: https://github.com/ezyang, https://github.com/alanwaketan, https://github.com/albanD
2023-02-11 21:07:08 +00:00
soulitzer
93d7d546ff Fix saved tensor hooks to propogate errors back to python as-is (#94456)
Mitigates the effect of https://github.com/pytorch/pytorch/issues/34172 for saved tensor hooks

BC Breaking message:
- Exceptions raised inside the pack and unpack hooks are no longer erroneously converted to RuntimeErrors. You should update your code to handle the original type of exception raised.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94456
Approved by: https://github.com/albanD
2023-02-09 23:52:06 +00:00
Brian Hirsh
83275d8cdf add torch.autograd._set_view_replay_enabled, use in aot autograd (#92588)
tldr; this should fix some minor perf regressions that were caused by adding more as_strided() calls in aot autograd.

This PR adds a new context manager, `torch.autograd._set_view_replay_enabled()`.

Context: AOT Autograd has special handling for "outputs that alias graph intermediates". E.g. given this function:

```
def f(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    return out
```

AOT Autograd will do the following:

```
def fn_to_compile(x):
    y = torch.mul(x, 2)
    out = y.view(-1)
    # return the graph intermediate
    return y, out

compiled_fn = compile(fn_to_compile)

def wrapper(x):
    y, out = compiled_fn(x)
    # regenerate the alias of the graph intermediate
    return out._view_func(y)
```

What's annoying is that `out._view_func()` will result in a `.as_strided` call, because `out` is an ordinary runtime tensor. This (likely?) caused a perf regression, because when running the backward, out `as_strided_backward()` is slower than our `view_backward()`.

In this PR, I added some TLS for instructing autograd to do view replay instead of as_strided, even when given a normal tensor. I'm definitely interested in thoughts from autograd folks (cc @albanD @soulitzer). A few points that I want to bring up:

(1) One reason that this API seems generally useful to me is because of the case where you `torch.compile()` a function, and you pass in two inputs that alias each other, and mutate one of the inputs. Autograd is forced to add a bunch of as_strided() calls into the graph when this happens, but this would give users an escape hatch for better compiled perf in this situation

(2) To be fair, AOT Autograd probably won't need this TLS in the long term. There's a better (more complicated) solution, where AOT Autograd manually precomputes the view chain off of graph intermediates during tracing, and re-applies them at runtime. This is kind of complicated though and feels lower priority to implement immediately.

(3) Given all of that I made the API private, but lmk what you all think.

This is a followup of https://github.com/pytorch/pytorch/pull/92255.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-08 01:48:32 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Ivan Yashchuk
fba13d94a1 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

- [x] XLA PR: https://github.com/pytorch/xla/pull/4498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet
2023-01-31 11:59:11 +00:00
Edward Z. Yang
434eb16deb Correctly restore pybind11 error_already_set (#93238)
We would handle py::error_already_set correctly from pybind11 bindings,
but not from our regular TH bindings, which meant that anything from
an inner pybind11 function call was getting unconditionally transformed
into a RuntimeError.  Not too many cases where we do this, but
PySymNodeImpl was one of them.

To test this, I need to raise a non-RuntimeError from a function which
is invoked from pybind11 and then propagated to a non-pybind11 call
site.  I introduce GuardOnDataDependentSymNode for expressly this
purpose (this is how I discovered the bug anyway.)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93238
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-01-30 16:43:01 +00:00
PyTorch MergeBot
acdd462b1a Revert "Remove deprecated torch.symeig (#70988)"
This reverts commit d70ed68162.

Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful
2023-01-24 19:03:40 +00:00
Elias Ellison
70f4b3551c Add Hook to store arbitrary python objects that are copied over in tls (#89169)
For the cudagraphs implementation, we would like to reuse objects that are defined in python across the forward and backward. The backward is run in a different thread, so to handle this we add an api for copying over arbitrary python objects in pytorch's thread local state, in the same way that C++ objects are copied over currently.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89169
Approved by: https://github.com/albanD
2023-01-24 05:24:57 +00:00
Ivan Yashchuk
d70ed68162 Remove deprecated torch.symeig (#70988)
The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988
Approved by: https://github.com/lezcano, https://github.com/kit1980
2023-01-23 22:51:40 +00:00
soulitzer
97342ae04b Fix python tensor hooks behavior on inplace (#92734)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92734
Approved by: https://github.com/albanD
2023-01-21 21:32:37 +00:00
soulitzer
1bc60c6b31 [reland] Improve hooks ordering behavior (#92559)
This reverts commit e525f433e1.

Original PR:  #85849
Fixes #ISSUE_NUMBER

In addition to reverting the revert, this PR:
- defines the virtual destructor of FunctionPreHook in the header. Why? Presumably the internal build imports the header from somewhere, but does not have function_hooks.cpp (where the virtual destructor was previously defined) in the same compilation unit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92559
Approved by: https://github.com/albanD
2023-01-19 08:17:32 +00:00
PyTorch MergeBot
e525f433e1 Revert "Improve hooks ordering behavior (#85849)"
This reverts commit 049838f249.

Reverted https://github.com/pytorch/pytorch/pull/85849 on behalf of https://github.com/albanD due to fails internal build
2023-01-18 15:27:22 +00:00
soulitzer
388b245d54 Expose autograd.graph.Node as an abstract base class (#91475)
This PR:
- registers all of the codegened Nodes to the torch._C._functions module, this is where special nodes like AccumulateGrad are already registered.
- creates a autograd.graph.Node abstract base class that all of the newly registered nodes subclass from. We make the subclassing happen by implementing the ``__subclasshook__`` method
- enables static type checking to work and also enables Sphinx to generate documentation for the Node and its methods
- handles both the custom Function and codegened cases

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91475
Approved by: https://github.com/albanD
2023-01-18 00:20:13 +00:00
soulitzer
049838f249 Improve hooks ordering behavior (#85849)
Addresses: https://github.com/pytorch/pytorch/issues/35802

Design doc: https://docs.google.com/document/d/19xSib7FFknRQ5f3ptGFUmiOt3BrgXSUlTQH2xMcZJYg/edit#

### Changes in this PR

#### Implementation
- We have now have 3 fields: pre_hooks, retains_grad_hooks, and tensor_pre_hooks so that we can more precisely define their ordering and when they are executed.
- Since retains grad uses an entirely new field, we cannot reuse the old retains grad, logic. We refactor retains grad to call directly into the variable.cpp logic. Other logic in variable.cpp that handle cpp hooks must also be updated.

#### Hooks ordering and execution:
- Defines pre-hooks registered on tensor to run before pre-hooks registered on grad_fn
- Updates pre-hooks registered on tensor to always run, even if they are the inputs= to .grad()
- Post hooks (and pre hooks) can now observe the modifications to gradient by the tensor pre hook

#### Retains grad hooks
- retains grad hooks always execute last, even if there are other tensor pre-hooks registered

#### Unchanged:
- pre_hooks registered to grad_fn aren't expected to execute if they are the inputs= to .grad()

Follow ups:
- simplify retains_grad field to not be a vector, since it always holds a single hook
- potentially merge capture hooks with tensor pre hooks, this would involve some additional refactoring since
- python hooks registered to tensor behavior on in-place is still wrong

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85849
Approved by: https://github.com/albanD
2023-01-17 16:23:21 +00:00
Richard Zou
81cc9bba5e [autograd.Function] Kill the extension feature flag (#92026)
This PR removes the autograd.Function extension feature flag. This was
previously used for development of the functorch <> autograd.Function
interaction.

It's been in master for long enough with the feature flag defaulting to
True, so it's time to remove it.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92026
Approved by: https://github.com/soulitzer
2023-01-17 13:36:42 +00:00
Richard Zou
2f9166ef89 [autograd.Function] Cleanup asymmetry in generate_vmap_rule and vmap (#91787)
This PR:
- changes generate_vmap_rule to either be True or False. Previously it
  could be True, False, or not set. This simplifies the implementation a
  bit.
- changes the vmap staticmethod to always be on the autograd.Function
  rather than sometimes defined.
  This is how the other staticmethod (forward, backward, jvp) are
  implemented and allows us to document it.

There are 4 possible states for the autograd.Function w.r.t. to the
above:
- generate_vmap_rule is True, vmap staticmethod overriden. This raises
  an error when used with vmap.
- generate_vmap_rule is False, vmap staticmethod overriden. This is
  valid.
- generate_vmap_rule is True, vmap staticmethod not overriden. This is
  valid.
- generate_vmap_rule is False, vmap staticmethod not overriden. This
  raises an error when used with vmap.

Future:
- setup_context needs the same treatment, but that's a bit tricker to
  implement.

Test Plan:
- new unittest
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91787
Approved by: https://github.com/soulitzer
2023-01-17 13:36:34 +00:00
Edward Z. Yang
333540a458 Reland "Add torch.utils.device_mode" (#91796)
Original PR https://github.com/pytorch/pytorch/pull/91525

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796
Approved by: https://github.com/albanD
2023-01-09 20:57:12 +00:00
PyTorch MergeBot
9b415240d4 Revert "Reland "Add torch.utils.device_mode" (#91796)"
This reverts commit 81b5eff3c3.

Reverted https://github.com/pytorch/pytorch/pull/91796 on behalf of https://github.com/huydhn due to This breaks trunk with the following failed test https://hud.pytorch.org/failure/test_jit_save%2CTestTracer
2023-01-09 04:45:47 +00:00
Edward Z. Yang
81b5eff3c3 Reland "Add torch.utils.device_mode" (#91796)
Original PR https://github.com/pytorch/pytorch/pull/91525

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91796
Approved by: https://github.com/albanD
2023-01-08 03:44:56 +00:00
Richard Zou
f012d0ea5b [autograd.Function] enable the extended Function feature flag by default (#91441)
The autograd.Function <> functorch interaction is in a mostly completed
state now. There are some minor action items remaining
(https://github.com/pytorch/pytorch/issues/90224), but I want to enable
the feature by default so that PyTorch CI / other parties / etc can
begin testing to see if there is any impact on the original
autograd.Function API (there shouldn't be).

The longer-term plan for the feature flag is:
- keep it around until at least the next release (so that people can
turn off the feature if it breaks something in existing code)
- delete the flag then (either before or after the release, I haven't
decided yet)

Test Plan:
- new test
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91441
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-12-28 21:00:27 +00:00
soulitzer
1b2ee4d0e1 Update functorch supported autograd.Function to allow mark_dirty (#91222)
Fixes https://github.com/pytorch/pytorch/issues/90225
Uses what was originally in 32a57bcdb6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91222
Approved by: https://github.com/zou3519
2022-12-28 03:53:47 +00:00
Huy Do
e40e4d36c9 Fix test_profiler_seq_nr flakiness (on macos) (#91019)
Fixes https://github.com/pytorch/pytorch/issues/66893

On MacOS, two `aten::sum` calls are reported sometimes where there should be only one.  This can be easily reproduced by running `pytest test_autograd.py -k test_profiler_seq_nr --verbose  --flake-finder` to see the flakiness.  The profile result when the test fails is as follows (sorted by CPU):

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                                            aten::randn        16.67%       3.000us        27.78%       5.000us       2.500us             2
                                              aten::sum        16.67%       3.000us        27.78%       5.000us       2.500us             2
                                          aten::normal_        11.11%       2.000us        11.11%       2.000us       1.000us             2
                                              aten::add        11.11%       2.000us        11.11%       2.000us       2.000us             1
autograd::engine::evaluate_function: torch::autograd...        11.11%       2.000us        27.78%       5.000us       2.500us             2
                        torch::autograd::AccumulateGrad        11.11%       2.000us        16.67%       3.000us       1.500us             2
                                        aten::ones_like         5.56%       1.000us         5.56%       1.000us       1.000us             1
      autograd::engine::evaluate_function: SumBackward0         5.56%       1.000us        11.11%       2.000us       2.000us             1
                                           aten::expand         5.56%       1.000us         5.56%       1.000us       1.000us             1
                                            aten::copy_         5.56%       1.000us         5.56%       1.000us       0.500us             2
                                            aten::empty         0.00%       0.000us         0.00%       0.000us       0.000us             2
                                       aten::as_strided         0.00%       0.000us         0.00%       0.000us       0.000us             2
                                            aten::fill_         0.00%       0.000us         0.00%       0.000us       0.000us             2
                                       aten::empty_like         0.00%       0.000us         0.00%       0.000us       0.000us             1
                                    aten::empty_strided         0.00%       0.000us         0.00%       0.000us       0.000us             3
                                           SumBackward0         0.00%       0.000us         5.56%       1.000us       1.000us             1
      autograd::engine::evaluate_function: AddBackward0         0.00%       0.000us         0.00%       0.000us       0.000us             1
                                           AddBackward0         0.00%       0.000us         0.00%       0.000us       0.000us             1
                                aten::new_empty_strided         0.00%       0.000us         0.00%       0.000us       0.000us             2
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.000us
```

When it happens, the two `aten::sum` calls have different inputs:

```
                                              aten::sum         4.35%       1.000us        13.04%       3.000us       3.000us             1                          [[10, 10], []]
                                              aten::sum         8.70%       2.000us         8.70%       2.000us       2.000us             1                  [[10, 10], [], [], []]
```

I'm not sure what is the internal difference between `z.sum()` and `z.sum(dim=None)` here on MacOS, I thought they are the same.

### Testing

`pytest test_autograd.py -k test_profiler_seq_nr --verbose  --flake-finder` to run the test 50 times, all pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91019
Approved by: https://github.com/malfet
2022-12-22 17:37:45 +00:00
soulitzer
d19988093d [autograd Function] Return input as-is if marked dirty even when requires_grad=False (#91214)
Fixes https://github.com/pytorch/pytorch/issues/90209

Somewhat related: https://github.com/pytorch/pytorch/issues/71119
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91214
Approved by: https://github.com/albanD
2022-12-21 21:20:56 +00:00
soulitzer
b66862ba87 [autograd Function] Don't materialize forward grad for non-differentiable types (#91183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91183
Approved by: https://github.com/zou3519
2022-12-21 05:05:44 +00:00
albanD
0eb45d546c Bind autograd current Node for debugging purposes (#90867)
This allows to know at any point during the backward pass what is running and where the Node currently running was created at:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode
from torch.autograd import detect_anomaly

class MyMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args, kwargs=None):
        node = torch._C._current_autograd_node()
        print(f"Running {func} from within {node}")
        if node is not None:
            print("The Node was created at:")
            print("\n  ".join(node.metadata["traceback_"]))
        return func(*args, **kwargs or {})

with MyMode(), detect_anomaly():
    print("FW")
    a = torch.rand(10, requires_grad=True)
    b = a.mul(2)
    b = b.div(3)
    b = b.sum()
    print("BW")
    b.backward()
```

Gives
```
$ python foo.py
foo.py:15: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
  with MyMode(), detect_anomaly():
FW
Running aten.rand.default from within None
Running aten.mul.Tensor from within None
Running aten.div.Tensor from within None
Running aten.sum.default from within None
BW
Running aten.ones_like.default from within None
Running aten.expand.default from within <SumBackward0 object at 0x7fa40c0c6dc0>
The Node was created at:
  File "foo.py", line 20, in <module>
    b = b.sum()

Running aten.isnan.default from within <SumBackward0 object at 0x7fa40c0c6500>
The Node was created at:
  File "foo.py", line 20, in <module>
    b = b.sum()

Running aten.any.default from within <SumBackward0 object at 0x7fa32b23a780>
The Node was created at:
  File "foo.py", line 20, in <module>
    b = b.sum()

Running aten._local_scalar_dense.default from within <SumBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 20, in <module>
    b = b.sum()

Running aten.div.Tensor from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 19, in <module>
    b = b.div(3)

Running aten.isnan.default from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 19, in <module>
    b = b.div(3)

Running aten.any.default from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 19, in <module>
    b = b.div(3)

Running aten._local_scalar_dense.default from within <DivBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 19, in <module>
    b = b.div(3)

Running aten.mul.Tensor from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 18, in <module>
    b = a.mul(2)

Running aten.isnan.default from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 18, in <module>
    b = a.mul(2)

Running aten.any.default from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 18, in <module>
    b = a.mul(2)

Running aten._local_scalar_dense.default from within <MulBackward0 object at 0x7fa40c0c9190>
The Node was created at:
  File "foo.py", line 18, in <module>
    b = a.mul(2)

Running aten.detach.default from within <AccumulateGrad object at 0x7fa40c0c9730>
The Node was created at:
  File "foo.py", line 18, in <module>
    b = a.mul(2)

Running aten.detach.default from within <AccumulateGrad object at 0x7fa40c0c94b0>
The Node was created at:
  File "foo.py", line 18, in <module>
    b = a.mul(2)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90867
Approved by: https://github.com/soulitzer
2022-12-20 13:41:43 +00:00
Nikita Vedeneev
3870a9e28d to_sparse_XXX: backward support (#90281)
As per title. Fixes https://github.com/pytorch/pytorch/issues/85226

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90281
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2022-12-14 09:05:17 +00:00
Pearu Peterson
f4099af1e9 Fix gradcheck for BSR and BSC inputs. (#90719)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90719
Approved by: https://github.com/soulitzer, https://github.com/cpuhrsch
2022-12-14 05:37:05 +00:00
soulitzer
6d425a7ce9 Fix forward AD custom Function non-differentiable outputs (#90787)
Fixes https://github.com/pytorch/pytorch/issues/90067

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90787
Approved by: https://github.com/albanD
2022-12-13 23:13:44 +00:00
Richard Zou
24c3ad7851 Move private forward grad mode helpers to torch.autograd.forward_ad (#90240)
Motivation
- These were previously defined in functorch. They are not
functorch-specific, so I'm moving them to torch.autograd.forward_ad and
the autograd python bindings.
- I need this to avoid some of my cyclic import problems.

Should these be public APIs? Probably. Though this needs discussion, so
punting it to the future.

Test Plan:
- moved the tests of these from test/functorch/test_eager_transforms.py
to test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90240
Approved by: https://github.com/soulitzer
2022-12-13 14:14:02 +00:00
Richard Zou
eb314f9b1a Add setup_context staticmethod to autograd.Function (#89859)
Adds a setup_context staticmethod to autograd.Function.
If it exists, then the user splits the ctx-specific logic from the
forward() and puts it in the setup_context staticmethod.

Docs will come later when we remove the feature flag.

Test Plan:
- some light tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89859
Approved by: https://github.com/soulitzer
2022-12-08 19:31:04 +00:00
Richard Zou
103be1f164 Add feature flag for the autograd.Function extension (#89858)
This PR adds a private runtime feature flag for the feature work we're going
to do with extending autograd.Function. The motivation of the feature flag
is:
- to guard the feature against unsuspecting users
- control the release of the feature to when we are ready to release it

We might not even need the feature flag (because we hope to have the
work done in the next month), but it is good practice and it does touch
currently public API (autograd.Function).

Concretely, "autograd.Function extension" refers to:
- adding an optional `setup_context` staticmethod to autograd.Function
- adding an optional `vmap` staticmethod to autograd.Function
- autograd.Function support for functorch

Test Plan:
- new test that the feature flag works
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89858
Approved by: https://github.com/soulitzer
2022-12-08 19:31:01 +00:00
Sergii Dymchenko
6a7659f304 Fix issue 38095 TODO in test_autograd.py (#90031)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90031
Approved by: https://github.com/clee2000
2022-12-07 19:09:43 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation to #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
PyTorch MergeBot
cba96366a2 Revert "remove torch.equal usages (#89527)"
This reverts commit 4095ef8b80.

Reverted https://github.com/pytorch/pytorch/pull/89527 on behalf of https://github.com/clee2000 due to broke periodic multigpu tests 4095ef8b80 https://github.com/pytorch/pytorch/actions/runs/3592806602/jobs/6049368502
2022-12-02 21:36:13 +00:00
Pearu Peterson
b87682f555 Fix gradcheck for CSR and CSC inputs. (#89786)
Partially fix-es https://github.com/pytorch/pytorch/issues/87085

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89786
Approved by: https://github.com/albanD
2022-12-02 12:35:20 +00:00
Philip Meier
4095ef8b80 remove torch.equal usages (#89527)
Preparation for the next PR in this stack: #89559.

I replaced

- `self.assertTrue(torch.equal(...))` with `self.assertEqual(..., rtol=0, atol=0, exact_device=True)`,
- the same for `self.assertFalse(...)` with `self.assertNotEqual(...)`, and
- `assert torch.equal(...)` with `torch.testing.assert_close(..., rtol=0, atol=0)` (note that we don't need to set `check_device=True` here since that is the default).

There were a few instances where the result of `torch.equal` is used directly. In that cases I've replaced with `(... == ...).all().item()` while sometimes also dropping the `.item()` depending on the context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89527
Approved by: https://github.com/mruberry
2022-12-01 11:22:52 +00:00
albanD
02e2eaa9c6 Fix CopySlices logic to ensure wrapped node runs properly. (#89812)
This should remove the failures seen by https://github.com/pytorch/pytorch/pull/89720 in functionalization
Locally verified that running the following on top of this PR does pass: `python benchmarks/dynamo/huggingface.py --accuracy --backend aot_eager --training --only MobileBertForMaskedLM`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89812
Approved by: https://github.com/soumith, https://github.com/voznesenskym, https://github.com/ezyang
2022-11-29 18:44:28 +00:00
albanD
c79489c8e6 Expose to python the backward AD view_func (#89586)
This will be useful for other systems (AOTAutograd) that want to replay autograd views.

FYI @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586
Approved by: https://github.com/soulitzer
2022-11-24 03:39:58 +00:00
albanD
347a7d97a5 Deprecate decorating classes with torch.no_grad and similar (#89522)
Fixes https://github.com/pytorch/pytorch/issues/89450

I would have completely removed it but I don't think this is particularly urgent and there are some use of it in the wild: https://github.com/search?q=%2Ftorch%5C.no_grad%5C%28%5C%29%5Cnclass%2F&type=code
So we might as well take one release to do it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89522
Approved by: https://github.com/lezcano, https://github.com/soulitzer, https://github.com/janeyx99
2022-11-23 16:51:42 +00:00
soulitzer
6b521bbf35 Prevent module full_backward_hook from erroring in double backward (#88357)
Also clarifies documentation to say "execute if and only if gradients wrt outputs are computed" (previously, "execute every time gradients wrt inputs are computed")

See https://docs.google.com/document/d/1tFZKYdsSzRBJ7Di7SWt8X8fSg-E3eiUPwomMF10UyhM/edit for more details regarding the question: 'should module full_backward_hooks be called every time the gradients wrt module inputs are called, or should module full_backward_hooks only be called when the "backward for the module" have been computed?'

Fixes https://github.com/pytorch/pytorch/issues/88312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88357
Approved by: https://github.com/albanD
2022-11-16 19:27:30 +00:00
soulitzer
27dc03e09b Turn internal assert when saved tensor is detached inplace into torch check (#88860)
Fixes https://github.com/pytorch/pytorch/issues/88809

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88860
Approved by: https://github.com/albanD
2022-11-12 18:33:18 +00:00
soulitzer
b92acee8f8 Add context manager to allow mutation on saved tensors (#79056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79056
Approved by: https://github.com/albanD
2022-11-11 15:18:28 +00:00
Fabio Rocha
652af5ec15 upsample_*.vec ops are now CompositeImplicit (#85638)
It was previously CompositeExplicit but it was not really necessary.
See discussion in https://github.com/pytorch/pytorch/issues/85405

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85638
Approved by: https://github.com/ezyang, https://github.com/lezcano, https://github.com/malfet, https://github.com/jansel
2022-11-09 09:58:04 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
soulitzer
84a302e534 Remove wrong internal assert in handle_view_on_rebase (#88243)
Fixes: https://github.com/pytorch/pytorch/issues/88205

The `CreationMeta::NO_GRAD_MODE` path in handle_view_on_rebase wrongly assumes that the tensor would be a leaf, because tensors created in no_grad are always leaf tensors. However, due to creation_meta propagation, a view of a view created in no_grad also has `CreationMeta::NO_GRAD_MODE`, but DOES have grad_fn.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88243
Approved by: https://github.com/albanD
2022-11-02 17:50:16 +00:00
Peter Bell
bc9caafc78 record_function: update to use custom_class API (#76420)
Re-submit of gh-72302

This still has a small performance hit, but it much smaller. On my
machine I see `_record_fucntion_exit._RecordFunction` takes 1.05 us
compared to the `Tensor` overload taking 0.79 us.

In an overall comparison, I see a 0.7 us slowdown from 6.0 us to
6.7 us for this timeit benchmark
```python
import torch

def foo():
  with torch.profiler.record_function("foo"):
    return torch.eye(3)

%timeit foo()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76420
Approved by: https://github.com/robieta
2022-11-02 00:39:28 +00:00
soulitzer
6ad3543a1b BE: Improve test_will_engine_execute_node unittest (#87806)
Adds the test from https://github.com/pytorch/pytorch/pull/86672

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87806
Approved by: https://github.com/albanD
2022-10-27 21:13:08 +00:00
soulitzer
adb76ef510 Expose API for backward execution order (#87507)
In this PR:
- graph_task stores graph roots on construction so that we can later traverse through the graph
- before the nodes are returned, they needed to be converted from raw_ptr to shared_ptr, and this should be OK because the graph is guaranteed to be alive

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87507
Approved by: https://github.com/albanD
2022-10-26 21:28:45 +00:00
lezcano
faf9c47abb Simplify a few diagonal-related functions (#87180)
`diag` was unnecessarily implemented as a kernel rather than as a composite
function, which made it unnecessarily difficult (explicit backward + all it entails).

We also change a few uses of `diag` on 2D tensors for `diagonal()`. The
latter returns a view rather than creating a new tensor.

We also upgrade its meta implementation to a fully-fledged
decomposition

I tried implementing the backwards of `diagonal()` via `diag_scatter` (or better `diag_scatter_` to keep the perf) but functionalisation was failing and I was not sure how to fix this, so I moved on. It may be possible to simplify that one as well if @soulitzer or someone knows how to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87180
Approved by: https://github.com/ngimel, https://github.com/albanD, https://github.com/mruberry
2022-10-24 06:11:53 +00:00
soulitzer
c18eead2df Update saved variable hooks to no longer trigger on wrapped numbers (#87316)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87316
Approved by: https://github.com/ezyang, https://github.com/albanD
2022-10-20 03:01:11 +00:00
Brian Hirsh
34c86adec4 symintify all of derivatives.yaml (#86610)
Big-bang PR to symintify **all** .sizes() calls in derivatives.yaml, which will be needed for symbolic tracing.

* with the exception of `split()`, which is tougher to land because it requires internal changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86610
Approved by: https://github.com/albanD
2022-10-14 20:15:48 +00:00
albanD
55663b7f81 Reland 3 of Symintify getitem and add the required helper functions (#86207) (#86487)
Note that this might not cover every use of the function (we know it doesn't)
But this is enough to get few models passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86487
Approved by: https://github.com/ezyang
2022-10-10 15:54:28 +00:00
soulitzer
ba3fde6aa0 Add multi-grad hooks (#86260)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86260
Approved by: https://github.com/albanD
2022-10-07 21:16:45 +00:00
albanD
97e56c176d Try to fix shutdown test in edge cases (#86464)
Fixes https://github.com/pytorch/pytorch/issues/85259
See the issue for debugging details.
tl;dr: when a worker thread is actually used, make sure it is initialized before exiting.
Yes, it is very unlikely it will take >10s to initialize but it is what seems to happen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86464
Approved by: https://github.com/soulitzer, https://github.com/ezyang
2022-10-07 21:09:40 +00:00
PyTorch MergeBot
5b69b87d5a Revert "Symintify getitem and add the required helper functions (#86207)"
This reverts commit fd5085c445.

Reverted https://github.com/pytorch/pytorch/pull/86207 on behalf of https://github.com/seemethere due to  Fails internal tests, see: https://www.internalfb.com/intern/sandcastle/job/22517998926071860/insights
2022-10-07 16:10:30 +00:00
Pearu Peterson
8f2c2167d4 Support autograd on sparse_mm in full. (#86301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86301
Approved by: https://github.com/cpuhrsch
2022-10-06 18:39:31 +00:00
albanD
fd5085c445 Symintify getitem and add the required helper functions (#86207)
Note that this might not cover every use of the function (we know it doesn't)
But this is enough to get few models passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86207
Approved by: https://github.com/ezyang, https://github.com/Chillee, https://github.com/bdhirsh
2022-10-06 04:46:19 +00:00
Elias Ellison
d04889323e Add Context Manager for Disabling Multithreading in Backwards, use in aot autograd (#86245)
We were running into a few issues with running multithreaded backwards in aot_autograd: such as https://github.com/pytorch/pytorch/issues/86136, and `FakeTensorMode` getting into a weird state as a result of not executing functions completely sequentially. The multithreaded backwards is lost in translation when we trace out the backwards anyway, and adds a lot of additional complexity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86245
Approved by: https://github.com/albanD, https://github.com/yf225
2022-10-06 03:27:42 +00:00
PyTorch MergeBot
168ba066e3 Revert "Symintify getitem and add the required helper functions (#86207)"
This reverts commit 17addb307e.

Reverted https://github.com/pytorch/pytorch/pull/86207 on behalf of https://github.com/malfet due to Broke lint, by double-registering `meta_index_put`, but no CI was run during the outage
2022-10-05 22:42:56 +00:00
albanD
17addb307e Symintify getitem and add the required helper functions (#86207)
Note that this might not cover every use of the function (we know it doesn't)
But this is enough to get few models passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86207
Approved by: https://github.com/ezyang
2022-10-05 21:19:00 +00:00
Jing Xu
f20e4eab7b Fix ITT unit-tests if PyTorch is compiled with USE_ITT=OFF (#86199)
Fixes https://github.com/pytorch/pytorch/pull/84848#discussion_r986329680

@malfet @slgong-fb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86199
Approved by: https://github.com/malfet
2022-10-04 21:57:05 +00:00
Richard Zou
a262ccea58 Change torch.autograd.graph.disable_saved_tensors_hooks to be public API (#85994)
Also addresses some comments from the review in
https://github.com/pytorch/pytorch/pull/85971
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85994
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-10-03 16:25:01 +00:00
Richard Zou
7c72bc48d8 Add mechanism to disable the "saved tensors hooks" feature (#85971)
The rationale for this is that functorch doesn't work with saved
variable hooks at the moment or checkpointing and we need some way to
disable it.

Concretely:
- there's a context manager that does the disabling
- this feature is disabled on a thread-local basis
- one can set an error message or use the default error message that
says the feature has been disabled

Since it is thread local I needed to update ATen/ThreadLocalState. To
make things nicer, this PR refactors all the "saved tensors hooks"
related TLS things into a single struct.

Test Plan:
- new test

Differential Revision: [D39970936](https://our.internmc.facebook.com/intern/diff/D39970936)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85971
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-09-30 20:03:58 +00:00
PyTorch MergeBot
801818f9e6 Revert "Add mechanism to disable the "saved tensors hooks" feature (#85553)"
This reverts commit 5aa183d2bc.

Reverted https://github.com/pytorch/pytorch/pull/85553 on behalf of https://github.com/atalman due to Reverting since failed build-fisp-diff-linux_platform010-opt
2022-09-30 14:31:09 +00:00
Richard Zou
5aa183d2bc Add mechanism to disable the "saved tensors hooks" feature (#85553)
The rationale for this is that functorch doesn't work with saved
variable hooks at the moment or checkpointing and we need some way to
disable it.

Concretely:
- there's a context manager that does the disabling
- this feature is disabled on a thread-local basis
- one can set an error message or use the default error message that
says the feature has been disabled

Since it is thread local I needed to update ATen/ThreadLocalState. To
make things nicer, this PR refactors all the "saved tensors hooks"
related TLS things into a single struct.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85553
Approved by: https://github.com/soulitzer
2022-09-28 22:49:28 +00:00