Commit Graph

83 Commits

Author SHA1 Message Date
soulitzer
71aefd5595 [reland] Allow setting grad_dtype on leaf tensors (#164751)
ghstack-source-id: e44b3941530be83a630ec93f1478eec741ffca2e
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/162815

Fixes #ISSUE_NUMBER

Relanding due to internal weirdness. Separate PR to codev w/o ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164751
Approved by: https://github.com/albanD
2025-10-08 20:23:13 +00:00
PyTorch MergeBot
3ddf2018d0 Revert "Support setting grad_dtype on leaf tensors (#162815)"
This reverts commit dca73982c5.

Reverted https://github.com/pytorch/pytorch/pull/162815 on behalf of https://github.com/yangw-dev due to break internal test D83850533, see more details below ([comment](https://github.com/pytorch/pytorch/pull/162815#issuecomment-3367498501))
2025-10-03 23:14:28 +00:00
soulitzer
dca73982c5 Support setting grad_dtype on leaf tensors (#162815)
`grad_dtype` is a new attribute on Tensor to control gradient dtype:
- Access/setting is leaf-only.
- grad_dtype is respected when (1) when assigning to .grad, and (2) in the engine after the previous node produces incoming gradients for AccumulateGrad. (See table below for details)
- Not setting grad_dtype preserves the current behavior. Accessing it returns `t.dtype`
- `grad_dtype` cannot be set when there is already a `.grad` present and the dtypes conflict.

| `grad_dtype` setting | Setting `.grad` manually | Incoming gradient from autograd engine |
|-----------------------|--------------------------|-----------------------------------------|
| **Default (tensor’s dtype)** | `.grad` must match tensor’s dtype | Engine casts incoming grad to tensor’s dtype |
| **Set to specific dtype** | `.grad` must match that dtype | Engine casts incoming grad to the specified dtype |
| **Set to `None`** | `.grad` may be any dtype | Engine does not cast; accepts incoming grad dtype as-is |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162815
Approved by: https://github.com/albanD
2025-10-02 23:09:07 +00:00
yewentao256
c44efc3755 [Refactor] Fix Compile Warning: possibly dangling reference to a temporary (#159517)
```bash
DEBUG pytorch/torch/csrc/dynamo/compiled_autograd.h:1388:25: warning: possibly dangling reference to a temporary [-Wdangling-reference]
DEBUG  1388 |     for (const at::IValue& elt : lst) {
DEBUG       |                         ^~~
DEBUG pytorch/torch/csrc/dynamo/compiled_autograd.h:1388:1: note: the temporary was destroyed at the end of the full expression ‘__for_begin .c10::impl::ListIterator<c10::IValue, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator*().c10::impl::ListElementReference<c10::IValue, __gnu_cxx::__normal_iterator<c10::IValue*, std::vector<c10::IValue> > >::operator std::conditional_t<true, const c10::IValue&, c10::IValue>()’
DEBUG  1388 |     for (const at::IValue& elt : lst) {
DEBUG       | ^
```

This PR fixes this warning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159517
Approved by: https://github.com/xmfan
2025-07-31 04:49:43 +00:00
Xu Han
e2b2685f84 [inductor] enable compiled autograd on CPU windows - v2 (#159185)
The first version: https://github.com/pytorch/pytorch/pull/158432
compiled autograd on windows is disabled in PR #144707 because cuda windows cannot compile this code.
However these code can be compiled on CPU. This PR enable these code on CPU windows.

But the first version changed ifdef block logical, and caused torch audio build fail: https://github.com/pytorch/audio/issues/3992

Here is the version two, which keep the original logical.

# Local test torch audio build pass:
<img width="874" height="1043" alt="image" src="https://github.com/user-attachments/assets/9657be86-04f7-4c66-b8c6-802ec2a7c5c8" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159185
Approved by: https://github.com/xmfan
2025-07-26 16:21:28 +00:00
PyTorch MergeBot
3fb78501f0 Revert "enable compiled autograd on CPU windows (#158432)"
This reverts commit a369350065.

Reverted https://github.com/pytorch/pytorch/pull/158432 on behalf of https://github.com/atalman due to Broke audio cuda windows builds see: https://github.com/pytorch/audio/issues/3992 ([comment](https://github.com/pytorch/pytorch/pull/158432#issuecomment-3119912177))
2025-07-25 18:29:16 +00:00
yuchengliu1
a369350065 enable compiled autograd on CPU windows (#158432)
compiled autograd on windows is disabled in PR #144707 because cuda windows cannot compile this code.
However these code can be compiled on CPU. This PR enable these code on CPU windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158432
Approved by: https://github.com/jansel, https://github.com/xmfan

Co-authored-by: Xu Han <xu.han@outlook.com>
2025-07-16 17:22:37 +00:00
Yuanyuan Chen
07bb097698 Fix clang-tidy bugprone* warnings (#148529)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148529
Approved by: https://github.com/ezyang
2025-06-23 23:09:56 +00:00
Xuehai Pan
5b210bb3a6 [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156319
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316, #156317
2025-06-23 02:57:50 +00:00
PyTorch MergeBot
1d3bca40ed Revert "[BE][9/16] fix typos in torch/ (torch/csrc/) (#156319)"
This reverts commit a23ccaa847.

Reverted https://github.com/pytorch/pytorch/pull/156319 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](c95f7fa874) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))
2025-06-22 12:31:56 +00:00
Xuehai Pan
a23ccaa847 [BE][9/16] fix typos in torch/ (torch/csrc/) (#156319)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156319
Approved by: https://github.com/albanD
ghstack dependencies: #156313, #156314, #156315, #156316, #156317
2025-06-22 08:43:49 +00:00
Simon Fan
5f2f343e1e [ca] suggest to disable compiled autograd for trace-time NotImplementedErrors (#156509)
Example:

```python
  File "/home/xmfan/core/a/pytorch/torch/autograd/graph.py", line 829, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: TorchDispatchMode not yet implemented for compiled autograd.
  You can disable compiled autograd for this operation by:
  1.  Relocating the unsupported autograd call outside the compiled region.
  2.  Wrapping the unsupported autograd call within a scope that disables compiled autograd.
  3.  Configuring the specific compilation unit to disable compiled autograd.
  4.  Globally disabling compiled autograd at the application's initialization.
```

No duplicate error messages for python side trace-time errors
```python
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xmfan/core/a/pytorch/torch/_dynamo/compiled_autograd.py", line 344, in begin_capture
    raise NotImplementedError(
NotImplementedError: Found tensor of type <class 'torch.nn.utils._expanded_weights.expanded_weights_impl.ExpandedWeight'>, which is not supported by FakeTensorMode. You can turn off compiled autograd by either:
1. Moving the unsupported autograd call outside of the torch.compile'd region.
2. Wrapping the unsupported autograd call in the torch._dynamo.compiled_autograd._disable() context manager.
3. Setting torch._dynamo.config.compiled_autograd=False for the torch.compile call containing the unsupported autograd call.
4. Setting torch._dynamo.config.compiled_autograd=False at the start of the program.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156509
Approved by: https://github.com/jansel
ghstack dependencies: #156374
2025-06-21 18:33:46 +00:00
Simon Fan
9ff9c28fe8 [ca] Functionalize AccumulateGrad (#155521)
This PR changes compiled autograd's handling of gradient accumulation, by proxying it as a `call_accumulate_grad`, which does the .grad mutation in python bytecode for dynamo to see. For eager, the only change is the leaf invariant check was moved up.

Before:
- Compiled Autograd Engine: proxies call to inductor accumulate_grad op
- Dynamo: polyfills the inductor accumulate_grad op (not respecting all of the accumulateGrad implementation e.g. sparse, gradient layout contract)
```python
        new_grad_strided: "f32[s21]" = torch.empty_like(getitem_1);  getitem_1 = None
        copy_: "f32[s21]" = new_grad_strided.copy_(aot3_tangents_1);  copy_ = None
```
- AOTAutograd: functionalizes the copy_

After:
- Compiled Autograd Engine: proxies call to `call_accumulate_grad`, which calls `torch._dynamo.compiled_autograd.ops.AccumulateGrad`/`AccumulateGrad_apply_functional_no_hooks_ivalue`, similar to other functional autograd implementations, but also sets .grad from python. Hooks are still handled separately from this call.
- Dynamo: `torch._dynamo.compiled_autograd.ops.AccumulateGrad` was allow_in_graph'd
- AOTAutograd: traces into the op, with FunctionalTensors.

While functionalizing the tensors, we insert an autograd Error node to ensure that we don't use the autograd meta from tracing. This clashes with the "leaf variable has been moved into the graph interior" error check, I could not find a way to identify a FunctionalTensor subclass from C++, so I bypass that for Error nodes in the compiled case.

In the CI PR, this fixes 19 tests relating to sparse tensors, and more are hidden by an earlier failure in dynamo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155521
Approved by: https://github.com/jansel
2025-06-16 18:45:02 +00:00
Simon Fan
748252378d [ca] introduce RuntimeState to support c++ hooks via graph breaks (#149987)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149987
Approved by: https://github.com/jansel
ghstack dependencies: #149647, #149709, #149651, #149897
2025-03-27 05:05:34 +00:00
Simon Fan
578160c875 [ca] don't inline accumulate grad op (#149014)
we use dummy tensors in our initial trace, so we should never inline. the subclass dispatch might not support the dummy tensor, e.g. DTensor accumulate grad will check that both param and grad are DTensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149014
Approved by: https://github.com/jansel
ghstack dependencies: #149064
2025-03-15 01:10:54 +00:00
Simon Fan
e6f560a262 [ca] support for dynamic shapes CopySlices (#148799)
i'm changing CA initial trace to always trace as dynamic, fixes these errors:
```python
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.2139s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_autograd_python_custom_function_inplace - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradWithCompiledAutograd.test_autograd_python_custom_function_inplace
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0057s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_copy_slices_graph_task_updates - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradWithCompiledAutograd.test_copy_slices_graph_task_updates
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.9662s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_inplace_on_view_weak_grad_fn - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradWithCompiledAutograd.test_inplace_on_view_weak_grad_fn
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0077s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_leaf_assignment - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradWithCompiledAutograd.test_leaf_assignment
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [5.0485s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_setitem_mask - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradWithCompiledAutograd.test_setitem_mask
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
FAILED [0.0102s] test/inductor/test_compiled_autograd.py::TestAutogradWithCompiledAutograd::test_tensor_hooks_inplace_over_view - RuntimeError: !has_symbolic_sizes_strides_ INTERNAL ASSERT FAILED at "/home/xmfan/core/a/pytorch/aten/src/ATen/TensorGeometry.h":63, please report a bug to PyTorch.
To execute this test, run the following from the base repo dir:
    python test/test_autograd.py TestAutogradWithCompiledAutograd.test_tensor_hooks_inplace_over_view
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148799
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-03-13 17:30:20 +00:00
Simon Fan
457ff9b7ae [reland][ca] side-effect free inital trace: compiled_args (#148376)
This reverts commit ea12fc8a9f.
Reland https://github.com/pytorch/pytorch/pull/147804, there was a bad import inserted by my linter.

Differential Revision: [D70582747](https://our.internmc.facebook.com/intern/diff/D70582747)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148376
Approved by: https://github.com/jansel
2025-03-11 01:57:36 +00:00
Aditya Tiwari
bb9c426024 Typo Errors fixed in multiple files (#148262)
# Fix typo errors across PyTorch codebase

This PR fixes various spelling errors throughout the PyTorch codebase to improve documentation quality and code readability.

## Changes Made

### Documentation Fixes
- Changed "seperate" to "separate" in multiple files:
  - `setup.py`: Build system documentation
  - `torch/_library/triton.py`: AOT compilation comments
  - `torch/csrc/dynamo/compiled_autograd.h`: Node compilation documentation
  - `torch/export/_unlift.py`: Pass population comments
  - `torch/export/exported_program.py`: Decomposition table notes

### Code Comments and Error Messages
- Changed "occured" to "occurred" in:
  - `test/mobile/test_lite_script_module.py`: Exception handling comments
  - `torch/export/_draft_export.py`: Error message text
  - `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebra.cpp`: MAGMA bug comment
  - `torch/csrc/utils/python_numbers.h`: Overflow handling comment
  - `torch/csrc/jit/OVERVIEW.md`: Graph compilation documentation
  - `torch/_dynamo/symbolic_convert.py`: Error explanation

### API Documentation
- Changed "fullfill" to "fulfill" in `torch/distributed/checkpoint/state_dict_loader.py`
- Changed "accross" to "across" in:
  - `torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp`
  - `torch/distributed/distributed_c10d.py`

## Motivation
These changes improve code readability and maintain consistent spelling throughout the codebase. No functional changes were made; this is purely a documentation and comment improvement PR.

## Test Plan
No testing required as these changes only affect comments and documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148262
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-09 12:21:40 +00:00
rzou
a907b6abae [compiled_autograd] workaround windows compilation issue (#148454)
torch.compile doesn't work on windows so we can ifdef-away the problem.
I do not know what the root cause actually is. Most notably, the pytorch
windows build is fine, but some third-party projects that use pytorch headers
on windows (e.g. torchaudio) have issues.

Test Plan:
- wait for CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148454
Approved by: https://github.com/atalman, https://github.com/xmfan
2025-03-05 00:18:20 +00:00
Wouter Devriendt
ea12fc8a9f Revert D70262395 (#148164)
Summary:

This reverts #147804 due to internal revert.

---
This diff reverts D70262395

Reviewed By: RossMcKenzie

Differential Revision: D70318024

@diff-train-skip-merge

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148164
Approved by: https://github.com/xmfan
2025-02-28 06:39:48 +00:00
Simon Fan
fd43c36aa9 [ca] side-effect free initial trace: RAII PyCompilerInterface (#147891)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147891
Approved by: https://github.com/jansel
ghstack dependencies: #147242, #147796, #147804
2025-02-27 07:17:30 +00:00
Simon Fan
fd1220e386 [ca] side-effect free inital trace: compiled_args (#147804)
const methods to prevent accidental mutation. changes mainly in Error nodes and PyNode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147804
Approved by: https://github.com/jansel
ghstack dependencies: #147242, #147796
2025-02-26 16:37:27 +00:00
Simon Fan
0a2da008f8 [ca] trace saved variable unpacking (#147242)
## Before

Previously, CA will always unpack all saved variables stored in the autograd graph before executing it. This meant that we can't capture unpack hooks as part of the CA graph, and they would fire out of order wrt to other backward hooks. For memory saving APIs built on top of saved tensor hooks like non-reentrant checkpointing and offloading, we couldn't achieve any savings because all activations would be recomputed/loaded and active at the same time, resulting in no-op.

## After

We add unpack hooks into the CA graph so that they can be executed progressively. The python hook and hook input themselves are wrapped by non-traceable code, so CA polyfills the wrapping as:
```python
# pseudocode
class SavedVariable:
  def unpack(self):
    if self.hook:
      return self.hook(self.packed_data)
    else:
      return self.packed_data

# This approach won't directly work when we add support for Forward AD or double-backward.
```

Directly executing the CA graph (without torch.compiling it) under checkpointing/offloading, memory profile is expected to stay the same as when using the eager autograd engine. If AOT backward is in the autograd graph, memory profile is expected to be better than the eager autograd engine, since we can now delay saved activations unpacking into the AOT backward's execution.

All tests pass when running the CA graph directly, the remaining issues are in Dynamo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147242
Approved by: https://github.com/jansel
2025-02-26 16:37:17 +00:00
PyTorch MergeBot
90e3a3d86d Revert "[ca] trace saved variable unpacking (#147242)"
This reverts commit 68ddca9449.

Reverted https://github.com/pytorch/pytorch/pull/147242 on behalf of https://github.com/wdvr due to failing tests in the slow workflow, see below ([comment](https://github.com/pytorch/pytorch/pull/147242#issuecomment-2683604547))
2025-02-26 00:40:16 +00:00
PyTorch MergeBot
143f0f0006 Revert "[ca] side-effect free inital trace: compiled_args (#147804)"
This reverts commit ec768d8dc0.

Reverted https://github.com/pytorch/pytorch/pull/147804 on behalf of https://github.com/wdvr due to failing tests in the slow workflow, see below ([comment](https://github.com/pytorch/pytorch/pull/147804#issuecomment-2683594740))
2025-02-26 00:31:40 +00:00
Simon Fan
ec768d8dc0 [ca] side-effect free inital trace: compiled_args (#147804)
const methods to prevent accidental mutation. changes mainly in Error nodes and PyNode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147804
Approved by: https://github.com/jansel
ghstack dependencies: #147242, #147796
2025-02-25 20:38:51 +00:00
Simon Fan
68ddca9449 [ca] trace saved variable unpacking (#147242)
## Before

Previously, CA will always unpack all saved variables stored in the autograd graph before executing it. This meant that we can't capture unpack hooks as part of the CA graph, and they would fire out of order wrt to other backward hooks. For memory saving APIs built on top of saved tensor hooks like non-reentrant checkpointing and offloading, we couldn't achieve any savings because all activations would be recomputed/loaded and active at the same time, resulting in no-op.

## After

We add unpack hooks into the CA graph so that they can be executed progressively. The python hook and hook input themselves are wrapped by non-traceable code, so CA polyfills the wrapping as:
```python
# pseudocode
class SavedVariable:
  def unpack(self):
    if self.hook:
      return self.hook(self.packed_data)
    else:
      return self.packed_data

# This approach won't directly work when we add support for Forward AD or double-backward.
```

Directly executing the CA graph (without torch.compiling it) under checkpointing/offloading, memory profile is expected to stay the same as when using the eager autograd engine. If AOT backward is in the autograd graph, memory profile is expected to be better than the eager autograd engine, since we can now delay saved activations unpacking into the AOT backward's execution.

All tests pass when running the CA graph directly, the remaining issues are in Dynamo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147242
Approved by: https://github.com/jansel
2025-02-25 20:38:51 +00:00
Simon Fan
e20b0c82d1 [ca] no longer require is_traceable annotations for c++ autograd functions (#146229)
This PR removes the CA compile-time error for C++ autograd functions, and supports them by having dynamo graph break on them (instead of allow_in_graph). The CppNode's collects are kept as is for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146229
Approved by: https://github.com/jansel, https://github.com/zou3519
2025-02-05 08:49:17 +00:00
rzou
ea141d8134 functional compiled autograd (#144707)
This PR squashes together the following commits:

https://github.com/pytorch/pytorch/pull/144115
https://github.com/pytorch/pytorch/pull/143417
https://github.com/pytorch/pytorch/pull/143405
https://github.com/pytorch/pytorch/pull/143387
https://github.com/pytorch/pytorch/pull/143304
https://github.com/pytorch/pytorch/pull/143296

This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses.

For more information, please read the commit messages for each PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144707
Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel
2025-01-27 05:20:56 +00:00
PyTorch MergeBot
6dd8283381 Revert "[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296)"
This reverts commit 5531fafffe.

Reverted https://github.com/pytorch/pytorch/pull/143296 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:13 +00:00
PyTorch MergeBot
9553301ade Revert "[compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387)"
This reverts commit 784bb2127c.

Reverted https://github.com/pytorch/pytorch/pull/143387 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:13 +00:00
PyTorch MergeBot
3f6cfd0156 Revert "[compiled autograd] stop specializing on metadata during initial trace (#143417)"
This reverts commit 99dd1bf1b9.

Reverted https://github.com/pytorch/pytorch/pull/143417 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:12 +00:00
rzou
99dd1bf1b9 [compiled autograd] stop specializing on metadata during initial trace (#143417)
The previous PRs built up to this. We change compiled autograd's initial
trace to stop baking in metadata.

While tracing, we allocate some weirdly shaped tensors that we can put
proxies on. The initial trace should not be accessing any metadata of
these tensors (it will likely error out if it does because of how weird
the shapes are).

This involved fixing some various sites where we do specialize on the
metadata, like:
- we change CopySlices's apply_with_saved to proxy some calls
  into the graph (this change is fairly hard to split out by itself).
- we stop calling InputBuffer::add
- we delete the weird metadata from the graph so that no graph passes
  can make use of it.

Test Plan:
- tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143417
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387, #143405
2025-01-22 21:51:07 +00:00
rzou
784bb2127c [compiled autograd] Proxy nodes for user-defined C++ torch::autograd::Function (#143387)
We define a functional version of a C++ torch::autograd::Function. The
functional version reconstructs the ctx object and then calls
backward with it.

Some more details:
- we define how to pack/unpack ctx.saved_data into an IValue. It's a
  Dict[str, IValue], so it wasn't difficult.
- every call to CppNode::apply_with_saved binds a new function to
  Python. This is because we're unable to reuse the a previously bound
  function for reasons (the schema may change depending on what the user
  actually puts into their Dict[str, IValue]).

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143387
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304
2025-01-22 21:50:47 +00:00
rzou
5531fafffe [compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296)
This PR is on the way to getting compiled autograd's initial capture to
stop specializing on Tensor metadata.

This PR changes compiled autograd's initial capture to proxy an opaque
(w.r.t. Dynamo) function into the graph for all built-in codegen'ed
autograd nodes and validate_outputs.

We changed each codegen'ed apply_with_saved (e.g.
MulBackward0::apply_with_saved) to call into Python to proxy a function
(compiled_autograd.ops.MulBackward0) into the graph. Then, we use the
node's InputMetadata to "guess" at the properties of the output Tensors
to create some new FakeTensors.

Some details:
- MulBackward0::apply_with_saved lives in libtorch_cpu, but needs to be
  call to Python via libtorch_python. There is an indirection
  (PyCompilerInterface) to do this.
- MulBackward0::apply_with_saved passes a C++ function to Python. To make
  our lives easier, every codegen'ed apply_with_saved passes a C++
  function with the same signature
  `(variable_list, ivalue_list) -> variable_list`.
- We define how to pack arbitrary C++ types into IValue via a helper
  IValuePacker struct and codegen functional variants of each builtin
  C++ autograd node (e.g. MulBackward0_apply_functional_ivalue).

MulBackward0 before this PR:
https://gist.github.com/zou3519/a80381d5fa38e970e413fcd91b0530de

MulBackward0 after this PR:
https://gist.github.com/zou3519/0c2eee8b3d8d96232b51ef430b53c5b0

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143296
Approved by: https://github.com/jansel
2025-01-22 21:50:29 +00:00
cyy
f7b9533c3f [4/N] Apply bugprone-unchecked-optional-access (#142832)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142832
Approved by: https://github.com/albanD
2024-12-12 04:33:32 +00:00
Simon Fan
db4e8a1d8a [ca] expose option to collect sizes as dynamic (#141153)
This is to address recompiles from eager nodes that saved dynamic activations

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141153
Approved by: https://github.com/jansel
ghstack dependencies: #141152
2024-11-22 19:26:27 +00:00
cyy
55f1959fc1 [12/N] Fix extra warnings brought by clang-tidy-17 (#140801)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140801
Approved by: https://github.com/Skylion007
2024-11-15 16:54:30 +00:00
cyy
1dd503c6fb [4/N] Fix Wextra-semi warning (#139256)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139256
Approved by: https://github.com/ezyang
2024-10-31 03:01:14 +00:00
Simon Fan
fd9f4e6770 Back out "[compiled autograd] tls access helpers (#138061)" and Back out "[compiled autograd] Compiled autograd configs in TLS (#137821)" (#139086)
Summary:
Original commit changeset: 9bf80c1492d7

Original Phabricator Diff: D64796226

Original commit changeset: aa1d9ef8f6e6

Original Phabricator Diff: D64796212

Differential Revision: D65072644

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139086
Approved by: https://github.com/malfet
2024-10-28 23:37:05 +00:00
cyy
f4f0f2995d Fix Wextra-semi warnings (#139000)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139000
Approved by: https://github.com/ezyang
2024-10-28 21:48:51 +00:00
Simon Fan
49fa437097 [compiled autograd] Compiled autograd configs in TLS (#137821)
Multithreaded doesn't work yet, this adds python side TLS only for the python side state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821
Approved by: https://github.com/jansel, https://github.com/yf225
ghstack dependencies: #137953
2024-10-22 08:03:52 +00:00
PyTorch MergeBot
361f42bc42 Revert "[compiled autograd] Compiled autograd configs in TLS (#137821)"
This reverts commit 9aba0b91c8.

Reverted https://github.com/pytorch/pytorch/pull/137821 on behalf of https://github.com/wdvr due to Reverting this for now, it is failing test_public_bindings in trunk ([comment](https://github.com/pytorch/pytorch/pull/137821#issuecomment-2417351788))
2024-10-16 16:38:29 +00:00
Simon Fan
9aba0b91c8 [compiled autograd] Compiled autograd configs in TLS (#137821)
Multithreaded doesn't work yet, this adds python side TLS only for the python side state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137821
Approved by: https://github.com/jansel, https://github.com/yf225
ghstack dependencies: #137953
2024-10-16 09:28:32 +00:00
Simon Fan
839d3568b0 [compiled autograd] fix -Wuninitialized (#137539)
https://github.com/pytorch/pytorch/pull/135663#discussion_r1792408353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137539
Approved by: https://github.com/isuruf, https://github.com/Skylion007
2024-10-09 21:16:26 +00:00
Simon Fan
40157db5a7 [compiled autograd] log placeholder origin in verbose (#135663)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135663
Approved by: https://github.com/jansel
2024-10-01 03:22:49 +00:00
Simon Fan
ff7d94c67e [compiled autograd] fix saved tensor hook firing count (#134361)
SavedVariable constructor calls the pack hooks, we don't want to call them for the proxy tensor since it is proxying a tensor that already had called the pack hook during forward.

Using the same fix as https://github.com/pytorch/pytorch/pull/123196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134361
Approved by: https://github.com/jansel
ghstack dependencies: #134186, #134200, #134205, #134286, #134290, #134162, #134163
2024-08-24 12:06:36 +00:00
Simon Fan
0d9208a398 [compiled autograd] match eager behavior for inplace detached activations (#134186)
Fixes `TestAutograd.test_saved_variable_saved_original_inplace_detach` when ran under compiled autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134186
Approved by: https://github.com/jansel
2024-08-24 12:06:36 +00:00
Simon Fan
b9922f7a5a [compiled autograd][cpp node] No recaptures from saved float scalars (#133048)
Partially addresses https://github.com/pytorch/pytorch/issues/130170 for float scalars saved from forward pass of a custom c++ autograd function. With this PR, compiled autograd no longer recaptures when the float value changes, but downstream support isn't there yet: 4bdb4bbd86/torch/_dynamo/config.py (L58-L61)

Currently, any non-tensors passed in ctx->saved_data is specialized on by compiled autograd. To stop specializing on float values, we lift the float. We also require user code to use IValue::toSymFloat instead of IValue::toDouble in order to swap the SymFloat to proxy during compiled autograd tracing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133048
Approved by: https://github.com/jansel
ghstack dependencies: #132771
2024-08-10 11:05:44 +00:00
Simon Fan
c860889a65 [compiled autograd][cpp node] No recompiles from saved int scalars (#132771)
Addresses https://github.com/pytorch/pytorch/issues/130170 for int scalars saved from forward pass of a custom c++ autograd function

Currently, any non-tensors passed in ctx->saved_data is specialized on by compiled autograd. To stop specializing on int values, we lift the ints. We also require user code to use IValue::toSymInt instead of IValue::toInt in order to swap the SymInt to proxy during compiled autograd tracing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132771
Approved by: https://github.com/jansel
2024-08-10 11:05:44 +00:00