Commit Graph

946 Commits

Author SHA1 Message Date
PyTorch MergeBot
f9b4856989 Revert "[pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)"
This reverts commit c95a6b416b.

Reverted https://github.com/pytorch/pytorch/pull/113257 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @zou3519 can you please help land this internally? See the sigmoid tests in D71198793 for details. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/113257#issuecomment-2725982539))
2025-03-14 23:13:34 +00:00
Xuehai Pan
c95a6b416b [pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)
Changes in this PR:

1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.

Resolves #75982. New tests are included in this PR.

- #75982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
2025-03-14 08:50:30 +00:00
PyTorch MergeBot
ebd087e4b5 Revert "[pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)"
This reverts commit f08146b67b.

Reverted https://github.com/pytorch/pytorch/pull/113257 on behalf of https://github.com/jovianjaison due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/113257#issuecomment-2711299830))
2025-03-10 17:19:21 +00:00
Xuehai Pan
f08146b67b [pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)
Changes in this PR:

1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.

Resolves #75982. New tests are included in this PR.

- #75982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
2025-03-06 18:59:02 +00:00
wdziurdz
edc3ca577e [Profiler] Add profiler activity for HPU devices (#148182)
Fixes #148181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148182
Approved by: https://github.com/sraikund16
2025-03-05 01:37:48 +00:00
Mwiza Kunda
b5873292c6 Add overload names to profiler trace (#143114)
Currently, recorded profiler events for aten ops do not store overload names. It would be useful to know which overloads are actually called to analyse performance.
For example, consider the following dispatch trace which occurs if there is a fallthrough kernel registered for aten::add:
```
             [call] op=[aten::add.Tensor], key=[AutogradCPU]
               [redispatch] op=[aten::add.Tensor], key=[Undefined]
                 [call] op=[aten::empty.memory_format], key=[BackendSelect]
                   [redispatch] op=[aten::empty.memory_format], key=[CPU]
                 [call] op=[aten::add.out], key=[CPU]
```

In this case, aten::add.out is a child of aten::add.Tensor, however the current profiler trace provides no way to differentiate aten op calls.

See the added unit test for a more detailed example.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143114
Approved by: https://github.com/sraikund16
2025-03-05 01:00:29 +00:00
Aaron Orenstein
db4ce78d46 PEP585: More UP006 fixes (#146392)
This should be the final PR before we can enable RUFF UP006.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392
Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007
2025-02-20 06:18:13 +00:00
Aaron Orenstein
1f8ff94d4f PEP585: Add noqa to necessary tests (#146391)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146391
Approved by: https://github.com/justinchuby, https://github.com/Skylion007
2025-02-12 15:29:50 +00:00
Simon Fan
387c993c3b [ca] remove private API: _compiled_autograd_should_lift (#146720)
Since the functional autograd + compiled autograd migration, we don't trace into nodes anymore, and everything is lifted. We can't support this flag which tries to inline make_fx style in CA initial pass. There's no more usage internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146720
Approved by: https://github.com/zou3519
2025-02-10 04:29:57 +00:00
Brian Hirsh
e68f5087d8 update _unsafe_set_version_counter to accept lists of tensors (#137921)
See the comment [here](https://github.com/pytorch/pytorch/issues/132014#issuecomment-2379547400) (cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @XilunWu @rec) - this PR updates `_unsafe_set_version_counter` to accept a list of tensors, for overhead-sensitive users (e.g. distributed) who need to hide VC bumps from autograd on a large list of tensors without wanting to suffer the overhead of going from python->C++ separately for every tensor in the list.

I left the binding in pybind, and used a `std::vector`. if we **really** need to optimize overhead even further, we could write a manual cpython binding.

I use this updated API in the next PR to fix FSDP2, so that it properly hides the VC of all `all_gather_buffer` tensors in its call to `split_with_sizes_copy.out(all_gather_buffers)`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137921
Approved by: https://github.com/awgu, https://github.com/albanD
2025-02-04 04:51:11 +00:00
rzou
ea141d8134 functional compiled autograd (#144707)
This PR squashes together the following commits:

https://github.com/pytorch/pytorch/pull/144115
https://github.com/pytorch/pytorch/pull/143417
https://github.com/pytorch/pytorch/pull/143405
https://github.com/pytorch/pytorch/pull/143387
https://github.com/pytorch/pytorch/pull/143304
https://github.com/pytorch/pytorch/pull/143296

This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses.

For more information, please read the commit messages for each PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144707
Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel
2025-01-27 05:20:56 +00:00
soulitzer
9e0ee152e5 Fix allow_mutation_on_saved_tensors for inplace foreach (#145520)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145520
Approved by: https://github.com/albanD
2025-01-25 00:58:03 +00:00
PyTorch MergeBot
16c4f8c395 Revert "[compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405)"
This reverts commit ec820fe57c.

Reverted https://github.com/pytorch/pytorch/pull/143405 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))
2025-01-23 23:34:13 +00:00
rzou
ec820fe57c [compiled autograd] Always proxy autograd.Function nodes; handle AOT backwards (#143405)
We will always proxy autograd.Function nodes in compiled autograd's
initial graph capture (previously there was an
option to proxy vs trace into the autograd.Function)

We have some requirements for the AOTBackward. Compiled Autograd runs
accumulate grad reordering passes on the AOTBackward graph directly
after the initial graph capture, so we can't just proxy a single node for it.

Instead, we:
- proxy the AOTBackward prologue function into the CA graph
- copy-paste the AOTBackward graph into the CA graph
- trace directly through the epilogue (the traced nodes go into the CA
  graph).

Tracing through the epilogue is safe (assuming no Tensor subclasses)
because the only thing the epilogue does is drop some outputs. The
Tensor subclass situation was already broken so this doesn't regress
anything but this PR sets it up to be fixed (in a followup, where we
will proxy "make_subclass" calls into the graph from the epilogue).

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143405
Approved by: https://github.com/jansel, https://github.com/xmfan
ghstack dependencies: #143296, #143304, #143387
2025-01-22 21:50:56 +00:00
Aaron Orenstein
f2cfe8b59f PEP585 update - mostly toplevels (#145178)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145178
Approved by: https://github.com/bobrenjc93
2025-01-22 02:21:14 +00:00
cyy
d87aad6877 [5/N] Apply Ruff fixes and pyupgrade to Python 3.9 (#144205)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144205
Approved by: https://github.com/albanD
2025-01-15 04:00:47 +00:00
soulitzer
c000214826 Allow GradientEdge as torch.autograd.backward outputs (#144744)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144744
Approved by: https://github.com/albanD
2025-01-14 21:31:44 +00:00
Jane Xu
2f6f13562f [BE] Actually suppress vmap warning from gradcheck (#144287)
This is the much safer change compared to https://github.com/pytorch/pytorch/pull/144283

Before:
```
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_optim.py -k TestDifferentiableOptimizer.test_sgd
/data/users/janeyx/pytorch/torch/autograd/gradcheck.py:1156: FutureWarning: Please use torch.vmap instead of torch._vmap_internals.vmap.
  result = vmap(vjp)(torch.stack(grad_outputs))
/data/users/janeyx/pytorch/torch/autograd/gradcheck.py:1156: FutureWarning: Please use torch.vmap instead of torch._vmap_internals.vmap.
  result = vmap(vjp)(torch.stack(grad_outputs))
.
----------------------------------------------------------------------
Ran 1 test in 0.028s
```

(the env vars aren't necessary)

After:
```
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_optim.py -k TestDifferentiableOptimizer.test_sgd
.
----------------------------------------------------------------------
Ran 1 test in 0.028s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144287
Approved by: https://github.com/cyyever, https://github.com/soulitzer
2025-01-07 01:11:41 +00:00
Xuehai Pan
b77406a9ec [BE][CI] bump ruff to 0.8.4 (#143753)
Changes:

1. Bump `ruff` from 0.7.4 to 0.8.4
2. Change `%`-formatted strings to f-string
3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753
Approved by: https://github.com/Skylion007
2024-12-24 12:24:10 +00:00
Tom Ritchford
dc23f1944a Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-12 17:39:14 +00:00
PyTorch MergeBot
5c97ac9721 Revert "Remove unused Python variables in torch/[_-a]* (#133492)"
This reverts commit fda975a7b3.

Reverted https://github.com/pytorch/pytorch/pull/133492 on behalf of https://github.com/clee2000 due to Sorry, I need to revert this in order to revert something else.  The only thing you need to do is rebase and remerge ([comment](https://github.com/pytorch/pytorch/pull/133492#issuecomment-2536635516))
2024-12-11 17:29:12 +00:00
Tom Ritchford
fda975a7b3 Remove unused Python variables in torch/[_-a]* (#133492)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133492
Approved by: https://github.com/albanD
2024-12-10 21:48:44 +00:00
Yoni Chechik
3e90c00a87 Missing space in torch.autograd.Function deprecation warning (#141562)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141562
Approved by: https://github.com/colesbury
2024-11-27 01:31:26 +00:00
Diego Sandoval
4fb4aa3e70 Updated docstrings referring to torch.expand to point to torch.Tensor.expand (#140045)
`torch.expand` was moved to `torch.Tensor.expand` but some docstrings still refer to `torch.expand`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140045
Approved by: https://github.com/mikaylagawarecki
2024-11-21 20:13:41 +00:00
Aaron Gokaslan
12e95aa4ee [BE]: Apply PERF401 autofixes from ruff (#140980)
* Automatically applies ruff rule 401. Turns loops into equivalent list comprehensions which are faster and do not leak the scope of the loop variables.
* list comprehensions not only often have better typing, but are 50+% faster than for loops on overhead. They also preserve length information etc and are better for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-11-20 17:52:07 +00:00
Shivam Raikundalia
ac7acfb894 [Profiler] Create Auto-Trace Frontend for Trace ID (#139310)
Summary:
This PR adds Auto-Trace implementation for Trace ID. By default, the python side will generate a uuid in the same format as the one set in the backend by kineto. Upon running an auto-trace, the python generated trace id will overwrite the one set in kineto using the Config variable. Since we don't expect users to generate on-demand traces after an auto-trace we can simply keep overwriting the backend trace id whenever autotrace is ran. If we one day want to eventually do something like this, we simply have to add a call in kineto on the backend to generate a new ID upon start of profiling.

We also implement a custom callback in the frontend such that users can generate their own trace ids if they wish to. This works similarly as the default, only difference being that they have to manually set this callback after a profiler is generated. We use a specific call to set this rather then putting it in the frontend initializer in case users want to change the trace_id for different repeats.

Test Plan: Tested both default and custom callbacks using the verbose prints added. Trace ids on the frontend and the prints on the backend for the manifold upload matched.

Differential Revision: D65178308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139310
Approved by: https://github.com/shengfukevin
2024-10-31 19:02:57 +00:00
Ramana Sundararaman
be4b7e8131 Param fixes in docstring (#136097)
Fixes wrong param names in docstrings. cc: @kit1980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136097
Approved by: https://github.com/ezyang
2024-09-21 18:56:34 +00:00
PyTorch MergeBot
37a08b33bb Revert "fix compiled_autograd deadlock throw (#135795)"
This reverts commit 00dc7d4356.

Reverted https://github.com/pytorch/pytorch/pull/135795 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/135795#issuecomment-2354233619))
2024-09-16 23:59:56 +00:00
Ma Jian
00dc7d4356 fix compiled_autograd deadlock throw (#135795)
Fixes #135298

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135795
Approved by: https://github.com/xmfan
2024-09-12 23:24:57 +00:00
Jane Xu
7f2d20e687 Run all autograd node post hooks (#134728)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134728
Approved by: https://github.com/albanD, https://github.com/soulitzer
2024-09-06 19:44:28 +00:00
Shivam Raikundalia
62b221d5cc Add Percentages to Function Events (#135155)
Summary: Users have recently asked that the profiler contains self/total CPU and device percentages to FunctionEvents so that teams can process the data procedurely. Some of it could be done mathematically via subroutines but since we already have the information in the _build_table, lets build it there.

Test Plan: Check that we have the same table as before but also check that the parameters we check also have the expected values

Differential Revision: D62210351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135155
Approved by: https://github.com/shanw-meta, https://github.com/kit1980
2024-09-06 00:39:11 +00:00
albanD
2588b5e51a Move module_tracker to logging for confused hierarchy (#134467)
Fixes https://github.com/pytorch/pytorch/issues/134242

Make sure to never raise an error when confused. Logs for confusion can be enabled with `TORCH_LOGS="torch.utils.module_tracker"` or the usual python systems.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134467
Approved by: https://github.com/malfet
2024-08-26 19:39:08 +00:00
Xuehai Pan
758a0a88a2 [BE][Easy] enable ruff rule PIE790: unnecessary pass statement (#133200)
This PR removes unnecessary `pass` statement. This is semanticly safe because the bytecode for the Python code does not change.

Note that if there is a docstring in the function, a empty function does not need a `pass` statement as placeholder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133200
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/kit1980
2024-08-15 15:50:19 +00:00
briancoutinho
61625a18ef [profiler] Only parse kineto requests and build tree when required (#132713)
To avoid high overheads of constructing datastructure in python when the user is simply saving trace to a file, we only process things lazily.

## Details
1. Delay function event parsing, add a flag to denote when needed.
1. Make profiler.function_events a computed property so code using `prof.function_events` does not need to change.
1. Fix coverage for `str(prof)` in profiler tests.

## Test run
Test program
```
import torch
from torch.profiler import profile, record_function, ProfilerActivity

def payload(use_cuda=False):
    x = torch.randn(10, 10)
    if use_cuda:
        x = x.cuda()
    y = torch.randn(10, 10)
    if use_cuda:
        y = y.cuda()
    z = torch.mm(x, y)
    z = z + y
    if use_cuda:
        z = z.cpu()

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        payload()

prof.export_chrome_trace("/tmp/test_trace.json")
#print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

The print "this is computing events" will happen lazily.

```
>]$ python3 profiler_test.py
Brian: this is computing function events
----------------------  ------------  ------------  ------------  ------------  ------------  ------------
                  Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
----------------------  ------------  ------------  ------------  ------------  ------------  ------------
       model_inference         6.77%     441.628us       100.00%       6.523ms       6.523ms             1
           aten::randn         1.86%     121.108us        46.93%       3.061ms       1.530ms             2
              aten::mm        45.36%       2.959ms        45.44%       2.964ms       2.964ms             1
         aten::normal_        44.72%       2.917ms        44.72%       2.917ms       1.458ms             2
             aten::add         0.87%      56.646us         0.87%      56.646us      56.646us             1
           aten::empty         0.35%      22.808us         0.35%      22.808us      11.404us             2
    aten::resolve_conj         0.08%       5.173us         0.08%       5.173us       1.724us             3
----------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 6.523ms

$> python3 profiler_test.py
(pytorch) [bcoutinho@devgpu038.ftw6 /data/users/bcoutinho/pytorch (profiler_optimize_parsing)]$
$>ls -a profiler_test.py
$> ls -l /tmp/test_trace.json
-rw-r--r-- 1 bcoutinho users 16471 Aug  5 16:10 /tmp/test_trace.json
```
## Unit test
Updates some tests and they all pass now.
`pytest test/profiler/test_profiler.py`

Also
`python test/test_autograd.py TestAutogradWithCompiledAutograd.test_record_function`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132713
Approved by: https://github.com/sraikund16
2024-08-13 18:58:20 +00:00
Shivam Raikundalia
d2ecdcb2f7 [Profiler] Add API for Dynamic Activity Toggling [2/n] (#133035)
Summary: During PT2 there are many GPU/CPU events that are unneccessary to profile in between a given step. To remedy this, we can add an API that takes in a list of activities and an arg to toggle said activies or not. For this diff we are adding the profiler API to propogate down to kineto (and in the future the collection.cpp logic). Subsequent diffs will be added for CPU toggling and e2e testing.

Test Plan: Tested by toggling backward gpu traces off and got following trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Jul_31_13_40_55.3251726.pt.trace.json.gz&bucket=gpu_traces

Reviewed By: aaronenyeshi

Differential Revision: D60541767

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133035
Approved by: https://github.com/aaronenyeshi
2024-08-09 21:54:54 +00:00
Shivam Raikundalia
78fa32a77b Turn off Function Event Accumulation by Default (#133095)
Summary: D56956245 added the ability to accumulate FunctionEvents across multiple cycles in order to perform statistical analysis on them all together. Although this can be useful, it uses too many CPU resources especially for long running jobs. For this reason, lets add a flag to the profiler to turn off this behavior by default, but still allow users to turn it on if they wish.

Test Plan: Changed function count test to have acc_events passed in and check the amount of function events based on if flag is true or not

Differential Revision: D61021490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133095
Approved by: https://github.com/briancoutinho, https://github.com/LucasLLC, https://github.com/aaronenyeshi
2024-08-09 20:47:20 +00:00
Brian Hirsh
e3394e5548 torch.autograd.graph.increment_version: accept List[Tensor], use in AOTDispatcher (#132652)
The regression from https://github.com/pytorch/pytorch/issues/132281 pinpoints e4ace1a396 as the cause. The main delta that commit introduces is that we now manually check `is_inference()` and call `increment_version()` (a pybind call) on every mutated input tensor to the graph.

This PR attempts to reduce overhead a bit by bundling up all of those checks into a single pybind call, by:

(1) updating `torch.autograd.graph.increment_version()` to accept a `Union[Tensor, List[Tensor]]`

(2) updating its semantics to no-op if you pass in a tensor with no version counter, instead of erroring

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132652
Approved by: https://github.com/albanD
2024-08-06 17:46:48 +00:00
Aaron Gokaslan
fd4b649e6c [BE]: Simplify some list comps to generators C419 (#132578)
Simplifies some list comprehensions to generator which is more efficient. Automatically applied diffs for the most part with ruff

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132578
Approved by: https://github.com/ezyang
2024-08-04 17:46:26 +00:00
Xuehai Pan
f3fce597e9 [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769
Approved by: https://github.com/ezyang
2024-08-04 10:24:09 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
albanD
4e966e8a1c Update inference_mode doc (#132321)
Fix https://github.com/pytorch/pytorch/issues/132288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132321
Approved by: https://github.com/awgu, https://github.com/soulitzer
2024-07-31 23:50:03 +00:00
soulitzer
884eadcd19 Fix multi grad hooks thread safety (#132055)
Thanks @awgu  for spotting this

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132055
Approved by: https://github.com/Skylion007, https://github.com/awgu, https://github.com/albanD
2024-07-29 22:32:59 +00:00
Andrew Gu
d8358a2d86 Made register_multi_grad_hook return type RemovableHandle (#132074)
`_MultiHandle` is private. Let us return `RemovableHandle`, which is public.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132074
Approved by: https://github.com/soulitzer
2024-07-29 21:29:34 +00:00
Florian
1614891946 [Profiler] exclude gpu_user_annotation when accumulating cuda time total (#130733)
Fixes #[130730](https://github.com/pytorch/pytorch/issues/130730)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130733
Approved by: https://github.com/aaronenyeshi
2024-07-22 04:35:21 +00:00
soulitzer
2eec02523b [autograd] Support GradientEdge as output for torch.autograd.grad (#127766)
This is useful for splitting grad to run in two parts while preserving intermediates:

<details>

<summary>
Click to see code
</summary>

```python
import collections
import weakref
from torch.autograd.graph import GradientEdge

def _get_grad_fn_or_grad_acc(t):
    if t.requires_grad and t.grad_fn is None:
        return t.view_as(t).grad_fn.next_functions[0][0]
    else:
        return t.grad_fn

def reverse_closure(roots, target_nodes):
    # Recurse until we reach a target node
    closure = set()
    actual_target_nodes = set()
    q: Deque = collections.deque()
    for node in roots:
        if node is not None and node not in closure:
            closure.add(node)
            q.append(node)
    while q:
        node = q.popleft()
        reverse_edges = node.metadata.get("reverse_edges", [])
        for holder_ref, idx in reverse_edges:
            ref = holder_ref()
            if ref is not None:
                raise RuntimeError("Reverse graph is no longer alive")
            fn = ref.node
            if fn in closure or fn is None:
                continue
            if fn in target_nodes:
                actual_target_nodes.add(fn)
                continue
            closure.add(fn)
            q.append(fn)
    return closure, actual_target_nodes

# Enable weak pointer
class Holder():
    def __init__(self, node):
        self.node = node

# TODO: use weak references to avoid reference cycle
def construct_reverse_graph(roots):
    q: Deque = collections.deque()
    root_seen = set()
    reverse_graph_refs = []
    for node in roots:
        if node is not None and node not in root_seen:
            q.append(node)
            root_seen.add(node)
    while q:
        node = q.popleft()
        for fn, idx in node.next_functions:
            if fn is not None:
                # Don't necessarily need to store on the graph
                reverse_edges = fn.metadata.get("reverse_edges", [])
                if len(reverse_edges) == 0:
                    q.append(fn)
                holder = Holder(node)
                holder_ref = weakref.ref(holder)
                reverse_graph_refs.append(holder)
                reverse_edges.append((holder_ref, idx))
                fn.metadata["reverse_edges"] = reverse_edges
    return reverse_graph_refs

def get_param_groups(inputs, params):
    inputs_closure, _ = reverse_closure(inputs, set())
    param_groups = dict()  # keyed on intermediates
    for i, param in enumerate(params):
        closure, intersected = reverse_closure([param], inputs_closure)
        param_group = {
            "params": set([param]),
            "intermediates": set(intersected),
        }
        for input_node in intersected:
            existing = param_groups.get(input_node, None)
            if existing is not None:
                existing["params"] = existing["params"].union(param_group["params"])
                existing["intermediates"] = existing["intermediates"].union(param_group["intermediates"])
                param_group = existing
            else:
                param_groups[input_node] = param_group

    # Sanity check: union of all param_groups params should be equal to all params
    union_params = set()
    seen_ids = set()
    unique_param_groups = []
    for param_group in param_groups.values():
        if id(param_group) not in seen_ids:
            seen_ids.add(id(param_group))
            unique_param_groups.append(param_group)
            union_params = union_params.union(param_group["params"])
    assert union_params == set(params)

    return unique_param_groups

def compute_grads_only_inputs2(roots, inps, weights):
    root_grad_fns = list(map(_get_grad_fn_or_grad_acc, roots))
    inp_grad_fns = list(map(_get_grad_fn_or_grad_acc, inps))
    weight_grad_fns = list(map(_get_grad_fn_or_grad_acc, weights))

    reverse_graph_refs = construct_reverse_graph(root_grad_fns)
    param_groups = get_param_groups(inp_grad_fns, weight_grad_fns)
    del reverse_graph_refs

    for param_group in param_groups:
        for i, intermediate in enumerate(param_group["intermediates"]):
            def get_hook(param_group, i):
                def hook(grad_inputs):
                    if param_group.get("grads", None) is None:
                        param_group["grads"] = [None] * len(param_group["intermediates"])
                    param_group["grads"][i] = grad_inputs
                return hook
            # These are always "split" nodes that we need to recompute, so
            # save their inputs.
            intermediate.register_prehook(get_hook(param_group, i))

    dinputs = torch.autograd.grad((out,), inputs=tuple(inps), grad_outputs=(torch.ones_like(out),), retain_graph=True)
    return dinputs, param_groups

def compute_grads_only_weights2(user_weights, param_groups):
    all_dweights = dict()
    for param_group in param_groups:
        # TODO: Handle case where intermediate can have multiple outputs
        intermediate_edges = tuple(GradientEdge(i, 0) for i in param_group["intermediates"])
        weights_edges = tuple(GradientEdge(w, 0) for w in param_group["params"])

        assert all(len(g) == 1 for g in param_group["grads"])
        # [NEW!] Able to pass a GradientEdge to autograd.grad as output
        # We do not need to retain_graph because... guarantee no overlap?
        print("trying to execute: ", intermediate_edges, weights_edges)
        dweights = torch.autograd.grad(intermediate_edges, weights_edges, grad_outputs=sum(param_group["grads"], tuple()))
        for w, dw in zip(param_group["params"], dweights):
            all_dweights[w] = dw
    # return grads in the original order weights were provided in
    out = []
    for w in user_weights:
        grad_acc = _get_grad_fn_or_grad_acc(w)
        out.append(all_dweights[grad_acc])
    return tuple(out)

```

</details>

```python
import torch.nn as nn

# Setup
mod1 = nn.Linear(10, 10)
mod2 = nn.Linear(10, 10)

a = torch.rand(10, requires_grad=True)

weights = tuple(mod1.parameters()) + tuple(mod2.parameters())
inps = (a,)

out = mod2(mod1(a))

class LoggingTensorMode(torch.utils._python_dispatch.TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        rs = func(*args, **kwargs)
        print(f"{func.__module__}.{func.__name__}")
        return rs

print(" -- SPLIT -- ")
# Compute gradients in two parts
with LoggingTensorMode():
    print("PART 1")
    dinputs, state = compute_grads_only_inputs2((out,), inps, weights)
    print("PART 2")
    dweights = compute_grads_only_weights2(weights, state)

out = mod2(mod1(a))

print(" -- REF -- ")

# Compare with reference
with LoggingTensorMode():
    ref_all_gradients = torch.autograd.grad(out, inputs=tuple(inps) + weights, grad_outputs=(torch.ones_like(out),))

for actual, ref in zip(dinputs + dweights, ref_all_gradients):
    print(torch.allclose(actual, ref))

```

<img width="598" alt="image" src="https://github.com/pytorch/pytorch/assets/13428986/3681b8a7-3ab4-4d1d-a836-abef6913e671">

```
PART 1
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.ones_like.default
V0603 10:17:21.590878 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x12a1ee160> with grad_outputs: [f32[10]]
torch._ops.aten.view.default
V0603 10:17:21.591204 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1ee0d0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
V0603 10:17:21.591578 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x100d7ae50> with grad_outputs: [f32[1, 10]]
torch._ops.aten.view.default
V0603 10:17:21.591747 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x12a1e4a60> with grad_outputs: [f32[10]]
torch._ops.aten.view.default
V0603 10:17:21.591834 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1e4bb0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
V0603 10:17:21.591922 8300067520 torch/autograd/graph.py:751] Executing: <ViewBackward0 object at 0x12a1e4a90> with grad_outputs: [f32[1, 10]]
torch._ops.aten.view.default
PART 2
trying to execute:  (GradientEdge(node=<AddmmBackward0 object at 0x12a1e4bb0>, output_nr=0),) (GradientEdge(node=<AccumulateGrad object at 0x12a21b130>, output_nr=0), GradientEdge(node=<AccumulateGrad object at 0x12a21b7c0>, output_nr=0))
V0603 10:17:21.592223 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1e4bb0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
torch._ops.aten.t.default
torch._ops.aten.sum.dim_IntList
torch._ops.aten.view.default
V0603 10:17:21.592421 8300067520 torch/autograd/graph.py:751] Executing: <TBackward0 object at 0x12a1cad60> with grad_outputs: [f32[10, 10]]
torch._ops.aten.t.default
trying to execute:  (GradientEdge(node=<AddmmBackward0 object at 0x12a1ee0d0>, output_nr=0),) (GradientEdge(node=<AccumulateGrad object at 0x12a1e41c0>, output_nr=0), GradientEdge(node=<AccumulateGrad object at 0x12a21b670>, output_nr=0))
V0603 10:17:21.593481 8300067520 torch/autograd/graph.py:751] Executing: <AddmmBackward0 object at 0x12a1ee0d0> with grad_outputs: [f32[1, 10]]
torch._ops.aten.t.default
torch._ops.aten.mm.default
torch._ops.aten.t.default
torch._ops.aten.sum.dim_IntList
torch._ops.aten.view.default
V0603 10:17:21.593750 8300067520 torch/autograd/graph.py:751] Executing: <TBackward0 object at 0x12a21b2b0> with grad_outputs: [f32[10, 10]]
torch._ops.aten.t.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default
torch._ops.aten.view.default

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127766
Approved by: https://github.com/albanD
2024-07-16 21:46:19 +00:00
Shivam Raikundalia
6f275ae4d0 Add kwinputs to Kineto Traces (#130373)
Summary: On the autograd side of things, we are currently saving the kwinputs but we aren't doing anything with them on the profiler side. This diff enables the use of the kwinputs for both FunctionEvents and Chrome Traces.

Test Plan: Added unit testing for both chrome traces and FunctionEvents. Used RecordFunctionFast to test kwinputs since test already had kwargs being passed in but not tested.

Differential Revision: D59472345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130373
Approved by: https://github.com/davidberard98
2024-07-14 00:40:59 +00:00
Feny Patel
7f2436014e add MTIA as valid device type for prof averages (#130340)
Summary: Add MTIA as valid device option for getting profile averages

Test Plan: Tested with auto-trace on MTIA

Differential Revision: D59486392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130340
Approved by: https://github.com/aaronenyeshi
2024-07-12 00:39:01 +00:00
Shivam Raikundalia
b5b91b418d [Easy] Update record_function Comment (#130561)
Summary: Users have been confused why user annotations on GPU tracks do not show when doing GPU only tracing. This comment should help users understand that to use this function they need to have CPU activies enabled.

Test Plan: N/A it is just updating a comment

Differential Revision: D59649390

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130561
Approved by: https://github.com/aaronenyeshi
2024-07-11 23:51:25 +00:00
Xuehai Pan
973037be6a [BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): list() / tuple() / dict() (#130199)
This PR changes the empty collection factory call to Python literals:

- `list()` -> `[]`
- `tuple()` -> `()`
- `dict()` -> `{}`

The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary:

```bash
$ python3 -m dis - <<EOS
import collections

d1 = {}
d2 = dict()

dict = collections.OrderedDict
d3 = dict()
EOS
```

```text
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (0)
              4 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (collections)
              8 STORE_NAME               0 (collections)

  3          10 BUILD_MAP                0
             12 STORE_NAME               1 (d1)

  4          14 PUSH_NULL
             16 LOAD_NAME                2 (dict)
             18 CALL                     0
             26 STORE_NAME               3 (d2)

  6          28 LOAD_NAME                0 (collections)
             30 LOAD_ATTR                8 (OrderedDict)
             50 STORE_NAME               2 (dict)

  7          52 PUSH_NULL
             54 LOAD_NAME                2 (dict)
             56 CALL                     0
             64 STORE_NAME               5 (d3)
             66 RETURN_CONST             1 (None)
```

The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199
Approved by: https://github.com/malfet
2024-07-11 17:30:28 +00:00
Xuehai Pan
3f50e197c4 [BE] annotate torch.autograd.graph (#129558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129558
Approved by: https://github.com/soulitzer
2024-07-06 18:14:16 +00:00