Commit Graph

3124 Commits

Author SHA1 Message Date
Edgar Romo Montiel
ca3cabd24a Convert to markdown: named_tensor.rst, nested.rst, nn.attention.bias.rst, nn.attention.experimental.rst, nn.attention.flex_attention.rst #155028 (#155696)
Fixes #155028

This pull request updates the documentation  by transitioning from .rst to .md format. It introduces new Markdown files for the documentation of named_tensor, nested, nn.attention.bias, nn.attention.experimental, and nn.attention.flex_attention

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155696
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-14 03:32:00 +00:00
dggaytan
3003c681ef Converting .rst files to .md files (#155377)
Fixes #155036
This pull request updates the documentation for several modules by transitioning from .rst to .md format, improving readability and usability. It introduces new Markdown files for the documentation of torch.ao.ns._numeric_suite, torch.ao.ns._numeric_suite_fx, AOTInductor, AOTInductor Minifier, and the torch.compiler API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155377
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-13 22:54:27 +00:00
ggsmith842
799443605b Convert to markdown: distributed.tensor.parallel.rst, distributed.tensor.rst, distributions.rst, dlpack.rst (#155297)
Fixes #155019

## Description
Convert to markdown: distributed.tensor.parallel.rst, distributed.tensor.rst, distributions.rst, dlpack.rst

## Checklist
- [X] dlpack.rst converted to dlpack.md --> [Preview](https://docs-preview.pytorch.org/pytorch/pytorch/155297/dlpack.html)
- [X] distributions.rst converted to distributions.md --> [Preview](https://docs-preview.pytorch.org/pytorch/pytorch/155297/distributions.html)
- [X] distributed.tensor.rst converted to distributed.tensor.md --> [Preview](https://docs-preview.pytorch.org/pytorch/pytorch/155297/distributed.tensor.html)
- [X] distributed.tensor.parallel.rst converted to distributed.tensor.parallel.md --> [Preview](https://docs-preview.pytorch.org/pytorch/pytorch/155297/distributed.tensor.parallel.html)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155297
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-13 22:08:37 +00:00
Runtian (Rachel) Li
049dc48d1e fix code chunk indentation for jit_language_reference_v2.md (#155937)
Fixes https://github.com/pytorch/pytorch/issues/155023
Related PR: #155781

Description:
As discussed, this PR is a follow-up update for `jit_language_reference_v2.md` by deleting the code chunk indentation.

Checklist:

- [x]  The issue being fixed is referenced above (Fixes https://github.com/pytorch/pytorch/issues/155023)
- [x]  Only one issue is addressed in this pull request
- [x]  Labels from the issue that this PR is fixing are added to this pull request
- [x]  No unnecessary issues are included into this pull request.

@pytorchbot label "topic: docs"
@pytorchbot label "topic: not user facing"
@pytorchbot label docathon-h1-2025
@pytorchbot label "module: docs"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155937
Approved by: https://github.com/jingsh, https://github.com/svekars
2025-06-13 21:05:23 +00:00
GdoongMathew
731351bb4a Convert rst to markdown - optim.rst #155031 (#155813)
Fixes #155031
![image](https://github.com/user-attachments/assets/36507ca1-eb1e-4358-9e66-ce25ec8a2be1)

@pytorchbot label "docathon-h1-2025" "module: docs" "topic: not user facing" "topic: docs"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155813
Approved by: https://github.com/AlannaBurke
2025-06-13 21:03:39 +00:00
windsonsea
7d1b3f599d [Docs] Convert to markdown cond.rst, config_mod.rst (#155653)
Related to #155014

Only included 2 files in this PR:

- cond.rst
- config_mod.rst

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155653
Approved by: https://github.com/svekars
2025-06-13 20:58:57 +00:00
Justin Silver
f3e6c8e834 Fix #155016 for Docathon - convert rst to markdown (#155198)
Used [rst2myst tool](https://rst-to-myst.readthedocs.io/en/latest/)

One note is that "Created On" and "Last Updated On" banner doesn't show in the markdown files... I'm not sure if that's just an artifact of my local build though.

Fixes #155016

Docs comparison (check out the 'new' whenever docs build)

1. cuda ([old](https://docs.pytorch.org/docs/main/cuda.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155198/cuda.html))
2. cuda.tunable ([old](https://docs.pytorch.org/docs/main/cuda.tunable.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155198/cuda.tunable.html))
3. leave cudnn_persistent_rnn.rst as is because it's reused in docstrings
4. cudnn_rnn_determinism.rst as is because it's reused in docstrings.
5. data ([old](https://docs.pytorch.org/docs/main/data.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155198/data.html))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155198
Approved by: https://github.com/albanD, https://github.com/svekars
2025-06-13 20:24:34 +00:00
Ankita George
bf798a2f01 Change _hfstorage to hfstorage (#155837)
Summary: Change HF classes to not have an underscore, there-by making them public, we will add documentation to them following this

Test Plan:
ensure existing tests pass

Rollback Plan:

Differential Revision: D76364024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155837
Approved by: https://github.com/saumishr
2025-06-13 20:19:51 +00:00
Dhia-naouali
c5d00e150a convert: rst to myst pr 1/2 (#155840)
Fixes #155038
parent [PR](https://github.com/pytorch/pytorch/pull/155375) (made two PRs to pass sanity check)
this PR converts the following two .rst files
- [torch.compiler_dynamo_overview](https://github.com/pytorch/pytorch/blob/main/docs/source/torch.compiler_dynamo_overview.rst)
- [torch.compiler_fake_tensor](https://github.com/pytorch/pytorch/blob/main/docs/source/torch.compiler_fake_tensor.rst)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155840
Approved by: https://github.com/sekyondaMeta
2025-06-13 18:02:28 +00:00
Runtian (Rachel) Li
093aaccae2 convert jit_language_reference_v2.rst to jit_language_reference_v2.md (#155781)
Fixes https://github.com/pytorch/pytorch/issues/155023

Description:
converted `jit_language_reference_v2.rst` to `jit_language_reference_v2.md`
**I indented the code blocks to minimize the file difference to pass the sanity check for no more than 2000 lines of change. I will submit another PR to fix the indentation after this PR is merged.**

Checklist:

- [x]  The issue being fixed is referenced above (Fixes https://github.com/pytorch/pytorch/issues/155023)
- [x]  Only one issue is addressed in this pull request
- [x]  Labels from the issue that this PR is fixing are added to this pull request
- [x]  No unnecessary issues are included into this pull request.

@pytorchbot label "topic: docs"
@pytorchbot label "topic: not user facing"
@pytorchbot label docathon-h1-2025
@pytorchbot label module: docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155781
Approved by: https://github.com/svekars
2025-06-13 17:33:10 +00:00
ZhaoqiongZ
3d595fd559 update get start xpu (#151886)
update link and product name
add print to print ```torch.xpu.is_available()``` result in code snippet for user not using command python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151886
Approved by: https://github.com/guangyey, https://github.com/AlannaBurke

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
2025-06-13 07:46:13 +00:00
Justin Silver
70bb34929a Convert to .md: draft_export.rst, export.ir_spec.rst, fft.rst (#155567)
Used [rst2myst tool](https://rst-to-myst.readthedocs.io/en/latest/)

Fixes #155020. This PR is split into 3 to pass sanity check.

Docs comparison (check out the 'new' whenever docs build)

1. draft_export ([old](https://docs.pytorch.org/docs/main/draft_export.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155567/draft_export.html))
2. export.ir_spec ([old](https://docs.pytorch.org/docs/main/export.ir_spec.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155567/export.ir_spec.html))
3. fft ([old](https://docs.pytorch.org/docs/main/fft.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155567/fft.html))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155567
Approved by: https://github.com/svekars
2025-06-13 05:19:43 +00:00
GdoongMathew
2ba930d4ce Convert rst to markdown - profiler.rst #155031 (#155559)
Fixes https://github.com/pytorch/pytorch/issues/155031

* [profiler.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/profiler.rst)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155559
Approved by: https://github.com/svekars
2025-06-13 05:02:54 +00:00
Runtian (Rachel) Li
e8b3dfa7c0 convert jit_language_reference.rst to jit_language_reference.md (#155633)
Part of changes https://github.com/pytorch/pytorch/issues/155023 (parent PR https://github.com/pytorch/pytorch/pull/155429)

- converted jit_language_reference.rst to jit_language_reference.md

@pytorchbot label "topic: docs"
@pytorchbot label "topic: not user facing"
@pytorchbot label docathon-h1-2025
@pytorchbot label module: docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155633
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-13 04:58:28 +00:00
Runtian (Rachel) Li
3f65e38b73 Convert hub.rst to hub.md (#155483)
Part of changes https://github.com/pytorch/pytorch/issues/155023 (parent PR https://github.com/pytorch/pytorch/pull/155429)

@pytorchbot label "topic: docs"
@pytorchbot label "topic: not user facing"
@pytorchbot label docathon-h1-2025
@pytorchbot label module: docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155483
Approved by: https://github.com/svekars
2025-06-13 04:39:55 +00:00
Justin Silver
e085012335 Fix #155020 - rst2markdown for export.rst (split PR) (#155753)
Used [rst2myst tool](https://rst-to-myst.readthedocs.io/en/latest/)

Fixes #155020. This PR is split into 3 to pass sanity check. This is the 3rd one.

Docs comparison (check out the 'new' whenever docs build)

1. export ([old](https://docs.pytorch.org/docs/main/export.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155567/export.html))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155753
Approved by: https://github.com/sekyondaMeta
2025-06-12 19:30:52 +00:00
Grant Smith
7986c0dba6 rename distributed.rst to md (#155767)
Fixes #155019

For sanity checks, split PR to have this one only include distributed.rst -> distributed.md

Preview -> [distributed.md](https://docs-preview.pytorch.org/pytorch/pytorch/155767/distributed.html)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155767
Approved by: https://github.com/sekyondaMeta
2025-06-12 18:42:15 +00:00
Runtian (Rachel) Li
9df2e8020f fix code indentation for fx.md (#155764)
Fixes https://github.com/pytorch/pytorch/issues/155023
Related PR: #155482

Description:
As discussed here https://github.com/pytorch/pytorch/pull/155482#pullrequestreview-2918032289, I removed indentation for python code blocks as a follow-up modification for fx.md

Checklist:

- [x] The issue being fixed is referenced above (Fixes https://github.com/pytorch/pytorch/issues/155023)
- [x] Only one issue is addressed in this pull request
- [x] Labels from the issue that this PR is fixing are added to this pull request
- [x] No unnecessary issues are included into this pull request.

@pytorchbot label "topic: docs"
@pytorchbot label "topic: not user facing"
@pytorchbot label docathon-h1-2025
@pytorchbot label module: docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155764
Approved by: https://github.com/svekars
2025-06-12 16:02:33 +00:00
Kazuaki Ishizaki
b00b641ff1 [Docs] Convert to markdown: accelerator.rst, amp.rst, autograd.rst, backends.rst, benchmark_utils.rst (#155762)
Fixes #155013

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155762
Approved by: https://github.com/svekars
2025-06-12 02:55:06 +00:00
Justin Silver
cf9878d7a2 Fix #155022 rst to markdown conversion (#155540)
Used [rst2myst tool](https://rst-to-myst.readthedocs.io/en/latest/)

Fixes #155022

Docs comparison (check out the 'new' whenever docs build)

1. func.ux_limitations ([old](https://docs.pytorch.org/docs/main/func.ux_limitations.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155540/func.ux_limitations.html))
2. func.whirlwind_tour ([old](https://docs.pytorch.org/docs/main/func.whirlwind_tour.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155540/func.whirlwind_tour.html))
3. future_mod ([old](https://docs.pytorch.org/docs/main/future_mod.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155540/future_mod.html))
4. futures ([old](https://docs.pytorch.org/docs/main/futures.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155540/futures.html))
5. fx.experimental ([old](https://docs.pytorch.org/docs/main/fx.experimental.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155540/fx.experimental.html))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155540
Approved by: https://github.com/AlannaBurke, https://github.com/svekars
2025-06-12 00:21:22 +00:00
jafraustro
1b032384b1 Convert rst files to md (#155369)
Fixes #155021
Fixes #155158

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155369
Approved by: https://github.com/svekars, https://github.com/malfet
2025-06-11 23:00:52 +00:00
loganthomas
458cc7213b DOC: Convert to markdown: mobile_optimizer.rst, model_zoo.rst, module_tracker.rst, monitor.rst, mps_environment_variables.rst (#155702)
Fixes #155026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155702
Approved by: https://github.com/sekyondaMeta, https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-11 22:16:04 +00:00
Justin Silver
b7a73a2cdb Convert to markdown: export.programming_model.rst (#155659)
Converts only export.programming_model.rst to markdown

Used [rst2myst tool](https://rst-to-myst.readthedocs.io/en/latest/)

Fixes #155020, but split into a second PR to pass sanity check

Docs comparison (check out the 'new' whenever docs build)

1. export.programming_model ([old](https://docs.pytorch.org/docs/main/export.programming_model.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155659/export.programming_model.html))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155659
Approved by: https://github.com/sekyondaMeta
2025-06-11 20:23:46 +00:00
Kazuaki Ishizaki
2002e3a311 [Docs] Convert to markdown: torch.compiler_transformations.rst, torch.compiler.config.rst (#155347)
Part of changes #155040 (parent PR #155120)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155347
Approved by: https://github.com/svekars
2025-06-11 18:55:30 +00:00
Runtian (Rachel) Li
925fbfca27 Convert fx.rst to fx.md (#155482)
Part of changes #155023 (parent PR #155429)

@pytorchbot label "topic: docs"
@pytorchbot label "topic: not user facing"
@pytorchbot label docathon-h1-2025
@pytorchbot label module: docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155482
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-11 18:46:35 +00:00
GdoongMathew
14f3639e09 Convert to .md: onnx_verification.rst, onnx.rst, package.rst, (#155556)
Fixes https://github.com/pytorch/pytorch/issues/155031

* [onnx_verification.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx_verification.rst)
* [onnx.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx.rst)

* [package.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/package.rst)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155556
Approved by: https://github.com/AlannaBurke, https://github.com/sekyondaMeta
2025-06-10 21:40:40 +00:00
nirajkamalk
ae0f1f8984 Convert to markdown onnx rst (#155228)
Fixes #155030

Converts the following files to MyST markdown and ensure that the doc tests are green:

- [x] [onnx_dynamo_onnxruntime_backend.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx_dynamo_onnxruntime_backend.rst)
- [x] [onnx_dynamo.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx_dynamo.rst)
- [x] [onnx_ops.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx_ops.rst)
- [onnx_torchscript_supported_aten_ops.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx_torchscript_supported_aten_ops.rst) - not changed as it is autogenerated
- [onnx_torchscript.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/onnx_torchscript.rst) - fixed in #155390

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155228
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-10 21:33:07 +00:00
Alberto A. Gallegos
8a396c5635 DOC: Convert to markdown: torch.compiler_best_practices_for_backends.rst, torch.compiler_cudagraph_trees.rst, torch.compiler_custom_backends.rst, torch.compiler_dynamic_shapes.rst, torch.compiler_dynamo_deepdive.rst (#155137)
Fixes #155037

[torch.compiler_best_practices_for_backends.rst](https://github.com/pytorch/pytorch/tree/main/docs/source/torch.compiler_best_practices_for_backends.rst) shows error 404

cc  @svekars @sekyondaMeta @AlannaBurke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155137
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-10 20:51:05 +00:00
jafraustro
01b8f5e685 Convert to markdown: testing.rst, threading_environment_variables.rst, torch_cuda_memory.rst, torch_environment_variables.rst, torch_nccl_environment_variables.rst (#155523)
Fixes #155035

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155523
Approved by: https://github.com/AlannaBurke, https://github.com/svekars
2025-06-10 20:38:36 +00:00
Kazuaki Ishizaki
08d15d3ec1 [Docs] Convert to markdown: torch.compiler_troubleshooting.rst (#155514)
Part of changes #155040 (parent PR #155120)

Follow-up of #155351. I split the changes of `torch.compiler_troubleshooting.rst ` into #155351 and this PR due to the 2000-line limit in one PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155514
Approved by: https://github.com/svekars
2025-06-10 15:41:31 +00:00
Vivek Nayak
75f258dd1f Fix spelling mistake (#155495)
Summary: Change "primtivies" to "primitives".

Test Plan:
n/a

Rollback Plan:

Differential Revision: D76229938

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155495
Approved by: https://github.com/angelayi, https://github.com/cyyever
2025-06-10 09:06:58 +00:00
Sahdev Zala
f34335bf33 Convert compiler rst files to markdown (#155335)
Convert following compiler rst files to md file.
torch.compiler_inductor_profiling.rst
torch.compiler_ir.rst
torch.compiler_nn_module.rst
torch.compiler_performance_dashboard.rst
torch.compiler_profiling_torch_compile.rst

Fixes #155039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155335
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-10 01:12:11 +00:00
Kazuaki Ishizaki
5df3bf13ec [Docs] Convert to markdown: torch.compiler_troubleshooting.rst (#155351)
Part of changes #155040 (parent PR #155120)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155351
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-09 23:18:31 +00:00
Qasim Khan
82e6475d92 Add doc for missing functions for torch.special module (#155074)
Fixes #132178

Added all the missing functions that had a docstring but were not present in the documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155074
Approved by: https://github.com/albanD
2025-06-09 22:28:26 +00:00
Parag Ekbote
2908c10259 Document the default garbage_collection_threshold value and improve the organization of cuda docs (#155341)
Fixes #150917

As mentioned in the issue, I've updated the documentation of `garbage_collection_threshold`and improved the organization.

Could you please review?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155341
Approved by: https://github.com/AlannaBurke, https://github.com/ngimel
2025-06-08 22:09:35 +00:00
Abhinav Tharamel
d41f62b7a0 Fix/issue #155027 (#155252)
Fixes #155027
Converted RST files to Markdown

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155252
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-08 21:17:31 +00:00
Yuki Kobayashi
11bc29856d Fix some incorrect reST markups in the document (#154831)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154831
Approved by: https://github.com/cyyever, https://github.com/Skylion007
2025-06-07 19:09:46 +00:00
Francisco R Castro Garcia
cd82096973 DOC: Convert to markdown: ddp_comm_hooks.rst, debugging_environment_variables.rst, deploy.rst, deterministic.rst, distributed.algorithms.join.rst (#155298)
Fixes #155017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155298
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-06 22:44:50 +00:00
Kazuaki Ishizaki
c95705dac2 [Docs] Convert to markdown: torch.compiler_troubleshooting_old.rst, torch.compiler.rst (#155348)
Part of changes #155040 (parent PR #155120)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155348
Approved by: https://github.com/svekars
2025-06-06 21:26:24 +00:00
loganthomas
4f5b34427b DOC: Convert to markdown: torch.overrides.rst, type_info.rst, utils.rst, xpu.rst (#155088)
Fixes #155041

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155088
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-06-06 20:16:13 +00:00
Joel Schlosser
5e93abe3c0 Address docs for clip_grad functions (#155125)
This PR takes the opinionated stance that `torch.nn.utils.<func>` should be the preferred API over `torch.nn.utils.clip_grad.<func>`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155125
Approved by: https://github.com/albanD, https://github.com/mikaylagawarecki, https://github.com/janeyx99
2025-06-05 19:22:09 +00:00
Jane Xu
2f3f8339ec [BE] Document device memory apis in correct module (#155126)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155126
Approved by: https://github.com/msaroufim, https://github.com/Skylion007
2025-06-05 15:16:48 +00:00
Justin Chu
3e57de1251 [ONNX] Create support for rotary embeddings (#154745)
This PR registers the RotaryEmbedding op in the `torch.ops.onnx` name spaces and allows the exporter to recognize and export onnx operators.

## Design

ONNX operators of their respective opset version is implemented in torch/onnx/ops/_impl.py, and are registered in the torch.ops.onnx namespace following the following rule:

`OpType-version => torch.ops.onnx.OpType.opset{version}`

For example, `RotaryEmbedding-23` becomes `torch.ops.onnx.RotaryEmbedding.opset23`

This name is parsed by the exporter to create an onnx node in the graph without having to go through translation.

When users use the ops in the model, we provide more convenient, unversioned functions under `torch.onnx.ops` that will dispatch to the implementations based on user input (type and provided attributes). For example, users can directly call `torch.onnx.ops.rotary_embedding()` to use the op natively in their pytorch models. I chose snake case naming to make the functions more pythonic and aligned with other torch apis.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154745
Approved by: https://github.com/titaiwangms
2025-06-04 03:07:43 +00:00
Alanna Burke
250e9af4da Removing per torch.compile audit. (#154572)
Removing https://pytorch.org/docs/stable/torch.compiler_best_practices_for_backends.html per torch.compile audit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154572
Approved by: https://github.com/williamwen42, https://github.com/svekars
2025-06-03 15:41:52 +00:00
bobrenjc93
33f2d0ff45 add reference to stances from dynamic shapes doc (#154823)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154823
Approved by: https://github.com/Skylion007, https://github.com/williamwen42
ghstack dependencies: #154802, #154826, #154822
2025-06-02 18:47:19 +00:00
bobrenjc93
d99e9568ec Add docs for how to mark as unbacked (#154822)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154822
Approved by: https://github.com/Skylion007
ghstack dependencies: #154802, #154826
2025-06-02 18:30:57 +00:00
bobrenjc93
9fe1b40d17 [ez] add dynamic sources docs (#154826)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154826
Approved by: https://github.com/Skylion007
ghstack dependencies: #154802
2025-06-02 17:53:30 +00:00
Nikita Shulga
0350c7e72c [BE] Introduce torch.AcceleratorError (#152023)
Which inherits from `RuntimeError` and contains `error_code`, which in case of CUDA should contain error returned by `cudaGetLastError`

`torch::detail::_new_accelerator_error_object(c10::AcceleratorError&)` follows the pattern of CPython's  [`PyErr_SetString`](cb8a72b301/Python/errors.c (L282)), namely
- Convert cstr into Python string with `PyUnicode_FromString`
- Create new exception object using `PyObject_CallOneArg` just like it's done in [`_PyErr_CreateException`](cb8a72b301/Python/errors.c (L32))
- Set `error_code` property using `PyObject_SetAttrString`
- decref all temporary references

Test that it works and captures CPP backtrace (in addition to CI) by running
```python
import os
os.environ['TORCH_SHOW_CPP_STACKTRACES'] = '1'

import torch

x = torch.rand(10, device="cuda")
y = torch.arange(20, device="cuda")
try:
    x[y] = 2
    print(x)
except torch.AcceleratorError as e:
    print("Exception was raised", e.args[0])
    print("Captured error code is ", e.error_code)
```

which produces following output
```
Exception was raised CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /home/ubuntu/pytorch/c10/cuda/CUDAException.cpp:41 (most recent call first):
C++ CapturedTraceback:
#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#6 c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) [clone .cold] from CUDAException.cpp:0
#7 void at::native::gpu_kernel_impl<at::native::AbsFunctor<float> >(at::TensorIteratorBase&, at::native::AbsFunctor<float> const&) [clone .isra.0] from tmpxft_000191fc_00000000-6_AbsKernel.cudafe1.cpp:0
#8 at::native::abs_kernel_cuda(at::TensorIteratorBase&) from ??:0
#9 at::Tensor& at::native::unary_op_impl_with_complex_to_float_out<at::native::abs_stub_DECLARE_DISPATCH_type>(at::Tensor&, at::Tensor const&, at::native::abs_stub_DECLARE_DISPATCH_type&, bool) [clone .constprop.0] from UnaryOps.cpp:0
#10 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA_out_abs_out(at::Tensor const&, at::Tensor&) from RegisterCUDA_0.cpp:0
#11 at::_ops::abs_out::call(at::Tensor const&, at::Tensor&) from ??:0
#12 at::native::abs(at::Tensor const&) from ??:0
#13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__abs>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&> >, at::Tensor (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) from RegisterCompositeExplicitAutograd_0.cpp:0
#14 at::_ops::abs::redispatch(c10::DispatchKeySet, at::Tensor const&) from ??:0
#15 torch::autograd::VariableType::(anonymous namespace)::abs(c10::DispatchKeySet, at::Tensor const&) from VariableType_1.cpp:0
#16 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::abs>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) from VariableType_1.cpp:0
#17 at::_ops::abs::call(at::Tensor const&) from ??:0
#18 at::native::isfinite(at::Tensor const&) from ??:0
#19 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__isfinite>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&> >, at::Tensor (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) from RegisterCompositeImplicitAutograd_0.cpp:0
#20 at::_ops::isfinite::call(at::Tensor const&) from ??:0
#21 torch::autograd::THPVariable_isfinite(_object*, _object*, _object*) from python_torch_functions_2.cpp:0
#22 PyObject_CallFunctionObjArgs from ??:0
#23 _PyObject_MakeTpCall from ??:0
#24 _PyEval_EvalFrameDefault from ??:0
#25 _PyObject_FastCallDictTstate from ??:0
#26 _PyStack_AsDict from ??:0
#27 _PyObject_MakeTpCall from ??:0
#28 _PyEval_EvalFrameDefault from ??:0
#29 _PyFunction_Vectorcall from ??:0
#30 _PyEval_EvalFrameDefault from ??:0
#31 _PyFunction_Vectorcall from ??:0
#32 _PyEval_EvalFrameDefault from ??:0
#33 _PyFunction_Vectorcall from ??:0
#34 _PyEval_EvalFrameDefault from ??:0
#35 PyFrame_GetCode from ??:0
#36 PyNumber_Xor from ??:0
#37 PyObject_Str from ??:0
#38 PyFile_WriteObject from ??:0
#39 _PyWideStringList_AsList from ??:0
#40 _PyDict_NewPresized from ??:0
#41 _PyEval_EvalFrameDefault from ??:0
#42 PyEval_EvalCode from ??:0
#43 PyEval_EvalCode from ??:0
#44 PyUnicode_Tailmatch from ??:0
#45 PyInit__collections from ??:0
#46 PyUnicode_Tailmatch from ??:0
#47 _PyRun_SimpleFileObject from ??:0
#48 _PyRun_AnyFileObject from ??:0
#49 Py_RunMain from ??:0
#50 Py_BytesMain from ??:0
#51 __libc_init_first from ??:0
#52 __libc_start_main from ??:0
#53 _start from ??:0

Captured error code is  710
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152023
Approved by: https://github.com/eqy, https://github.com/mradmila, https://github.com/ngimel
ghstack dependencies: #154436
2025-06-01 21:02:43 +00:00
Nikita Shulga
f7c09f864a [Docs] Reformat sparse example (#154785)
Not sure why, but rst fails to colorize multiline inputs, but works fine for single line commands
Test plan:
| [Before](https://docs.pytorch.org/docs/main/sparse.html#construction)  | [After](https://docs-preview.pytorch.org/pytorch/pytorch/154785/sparse.html#construction) |
| ------------- | ------------- |
| <img width="466" alt="image" src="https://github.com/user-attachments/assets/96a5c52a-1804-4d05-a5cf-c10221aaddf6" />  | <img width="477" alt="image" src="https://github.com/user-attachments/assets/99565288-5c0b-4e8e-bd60-f016ebc207b5" />  |

Fixes https://github.com/pytorch/pytorch/issues/154779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154785
Approved by: https://github.com/janeyx99, https://github.com/Skylion007
2025-06-01 20:56:14 +00:00
Natalia Gimelshein
f01e628e3b Resubmit Remove MemPoolContext (#154042) (#154746)
Summary: Per title

Test Plan: Added tests + existing tests

Differential Revision: D75695030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154746
Approved by: https://github.com/malfet
2025-05-31 01:21:54 +00:00
PyTorch MergeBot
d173ba5a75 Revert "Remove MemPoolContext (#154042)"
This reverts commit 3b38989b5f.

Reverted https://github.com/pytorch/pytorch/pull/154042 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/154042#issuecomment-2921401100))
2025-05-30 06:53:37 +00:00
bobrenjc93
9c06dff1ce [multigraph] use specializations in compile_and_call_fx_graph (#153449)
The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM which does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs.

There's really two parts of this work:

**The frontend changes:**
1) we introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc.

**The backend changes (this PR):**
1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py`
2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch in specialization specific axioms into the ShapeEnv before invoking the user compiler.
3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do the compiles lazily. The first time a specialization is invoked, we will do the compilation and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards.

I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions:

![495269647_576053105510082_9189856138964956774_n](https://github.com/user-attachments/assets/66030fed-d62e-4d87-940f-aa13c99b1a73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153449
Approved by: https://github.com/zou3519
ghstack dependencies: #153433
2025-05-30 03:19:49 +00:00
nirajkamalk
40abb2b403 Fix deprecated amp APIs in docs (#154553)
Update usage of deprecated amp APIs.

Fixes https://github.com/pytorch/tutorials/issues/3331

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154553
Approved by: https://github.com/Skylion007
2025-05-29 00:05:59 +00:00
Natalia Gimelshein
3b38989b5f Remove MemPoolContext (#154042)
Removes MemPoolContext from custom user mempools. The ground truth for which pool should be used is in graph_pools active pool, and MemPoolContext just introduced an opportunity for the pool pointed to by MemPoolContext and active pool in graph_pools to go out of sync (see all the asserts in the code to make sure that happens, and yet it still could happen in a multithread scenario, see my recent PRs (#153990).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154042
Approved by: https://github.com/albanD, https://github.com/syed-ahmed
2025-05-28 16:35:48 +00:00
Yuki Kobayashi
f55f2f42a7 Add missing docstring for sym_ite (#154201)
`sym_ite` is listed in [the reference page](https://docs.pytorch.org/docs/stable/torch.html) and has no document.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154201
Approved by: https://github.com/Skylion007
2025-05-26 15:59:21 +00:00
bobrenjc93
53ecb8159a Introduce statically_known_false (#154291)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154291
Approved by: https://github.com/mengluy0125
2025-05-24 14:23:55 +00:00
Svetlana Karslioglu
1ab2993345 Add a link to transformer_building_blocks tutorial (#154281)
Cross-link to https://docs.pytorch.org/tutorials/intermediate/transformer_building_blocks.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154281
Approved by: https://github.com/mikaylagawarecki
2025-05-24 02:50:24 +00:00
Svetlana Karslioglu
ec368a1903 Add sitemap (#154158)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154158
Approved by: https://github.com/albanD
2025-05-23 18:01:00 +00:00
Shangdi Yu
04a6fe7914 Update provenance tracking doc (#154062)
Summary: Update the doc to reflect the changes in https://github.com/pytorch/pytorch/pull/153584/files#diff-e0cdb58c0f84f56f20c5433339b6d83c470dcde47847e2328effea6bedd4cd27 and https://github.com/pytorch/tlparse/pull/110

Test Plan: CI

Differential Revision: D75155981

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154062
Approved by: https://github.com/svekars, https://github.com/desertfire
2025-05-23 17:09:52 +00:00
Anita Katahoire
996c4d803d Removing conda references from PyTorch Docs (#152702)
Addresses #148339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152702
Approved by: https://github.com/svekars, https://github.com/albanD, https://github.com/atalman
2025-05-20 20:33:28 +00:00
Svetlana Karslioglu
7c9d94e9bb Redirect mobile_optimizer.rst to executorch (#153664)
Redirect mobile_optimizer.rst to executorch

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153664
Approved by: https://github.com/byjlw, https://github.com/malfet
2025-05-20 18:13:45 +00:00
Mikayla Gawarecki
6383ddcfa4 Update serialization docs (#153631)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153631
Approved by: https://github.com/albanD
2025-05-19 20:22:07 +00:00
Angela Yi
b4fb801b2d [export] Move PT2 constants to torch::_export (#153206)
Test Plan:
`buck2 test //sigmoid/...`
https://www.internalfb.com/intern/testinfra/testrun/1970325119807758

Differential Revision: D74417085

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153206
Approved by: https://github.com/zhxchen17, https://github.com/dolpm
2025-05-17 08:21:59 +00:00
Anthony Shoumikhin
7d39e73c57 Fix more URLs (#153277)
Or ignore them.
Found by running the lint_urls.sh script locally with https://github.com/pytorch/pytorch/pull/153246

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153277
Approved by: https://github.com/malfet
2025-05-14 16:23:50 +00:00
angelayi
d51bc27378 [export] Make draft_export public (#153219)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153219
Approved by: https://github.com/pianpwk
2025-05-14 02:18:36 +00:00
Svetlana Karslioglu
f136046919 Clean up right nav (#153090)
- Move community and language binding links to the horizontal bar
- Add an intro to the community page.
- Fix the link in the ogp_image
- Fix the link in the version switcher
- Clean up unneeded links

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153090
Approved by: https://github.com/albanD
2025-05-12 21:00:45 +00:00
PyTorch MergeBot
fdc387ec7c Revert "refine fp32 precision api (#125888)"
This reverts commit 4c11b26158.

Reverted https://github.com/pytorch/pytorch/pull/125888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some failures on ROCm ([comment](https://github.com/pytorch/pytorch/pull/125888#issuecomment-2869274791))
2025-05-11 00:35:46 +00:00
haozhe.zhu
4c11b26158 refine fp32 precision api (#125888)
Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop the "highest, high, medium" to represent fp32  internal computation data types . Instead, we will directly use the algorithm to represent it.

### Design Choice: Directly use algorithms name like "TF32", "BF16".
#### Pros
 - The names are more informative. 'tf32' is more informative than a simple "high".
 - Easier to extend new algorithm like `tf32x3`
#### Cons
 - "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can have more documents to discuss them.

### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')
![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067)

### We provide 3 fp32 compute precision can be set:
 - **"ieee"**: Not allowed to use any other internal computation data types .
 - **"tf32"**: Allowed to use tf32 as internal computation data types.
 - **"bf16"**: Allowed to use bf16 as internal computation data types.
 - **"none"**:  Precision's are not set. Can be override by its father node.

### Overriding Precision Settings
Child node can be override by its father node if it is set to default.
For current default settings:
```
backend = generic, op = all, precision setting = none
    backend = cuda, op = all, precision setting = none
        backend = cuda, op = conv, precision setting = tf32
        backend = cuda, op = rnn, precision setting = tf32
        backend = cuda, op = matmul, precision setting = none
    backend = matmul, op = all, precision setting = none
        backend = matmul, op = conv, precision setting = none
        backend = matmul, op = rnn, precision setting = none
        backend = matmul, op = matmul, precision setting = none
```
 - If the user set `torch.backends.mkldnn.fp32_precision="bf16"`, his child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be override to "bf16".
 - If the user set `torch.backends.fp32_precision="bf16"`,  `torch.backends.mkldnn.fp32_precision` and his child nodes will also we override to "bf16".

### Backward Compatible
Since new API allow user to have more fine-grained control. There will be some conflict. For example, previous `torch.backends.cudnn.allow_tf32` are not enough to represent the status for `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goal for backward compatible is
 - If the user only uses previous APIs, it will work as previous expectations.
 - If the user use **new** API to change the status to an **un-representable** status for old API, and try to access the status by **old** API. We will raise Runtime Error and point the document for user.

### Test Plan
```
python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888
Approved by: https://github.com/jgong5, https://github.com/albanD

Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>
2025-05-10 11:13:04 +00:00
soulitzer
9d00f2b375 [autograd][docs] Add more details on why save_for_backward is important in extending autograd note (#153005)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153005
Approved by: https://github.com/albanD
2025-05-09 16:36:57 +00:00
Shangdi Yu
faff387bfd Mini tutorial for provenance tracking (#152211)
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152211
Approved by: https://github.com/svekars, https://github.com/eellison, https://github.com/desertfire
2025-05-09 01:41:04 +00:00
Wei Feng
5a8c9c3ab0 [FSDP2][Doc] add pointer to torchtitan (#153079)
<img width="838" alt="Screenshot 2025-05-08 at 10 51 05 AM" src="https://github.com/user-attachments/assets/4cf43a16-3801-424b-a74f-ede1d41ff052" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153079
Approved by: https://github.com/mori360
2025-05-08 22:22:07 +00:00
Yuxin Wu
2cf7fd0d2b Update docs of saved_tensors_hooks to avoid ref cycle (#153049)
Fixes #115255

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153049
Approved by: https://github.com/Skylion007, https://github.com/soulitzer
2025-05-07 18:54:56 +00:00
angelayi
60ecc560af [export] Add draft-export docs (#152637)
Sample page: https://docs-preview.pytorch.org/pytorch/pytorch/152637/draft_export.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152637
Approved by: https://github.com/zou3519, https://github.com/svekars
2025-05-07 01:12:45 +00:00
Ti-Tai Wang
5fa5017479 [ONNX] Suggest users setting dynamo=True when exporting (#152478)
Fixes #152025

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152478
Approved by: https://github.com/justinchuby
2025-05-06 23:18:11 +00:00
Laith Sakka
376529c78b consolidate guard_or_x and definitely_x (#152463)
definitely_true is almost same as guard_or_false, the potential differences are not meaningful to a degree that justify the
existence of both. same for definitely_false, it can be expressed with guard_or_true and guard_or_false.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152463
Approved by: https://github.com/bobrenjc93
2025-05-02 18:08:11 +00:00
Huy Do
3f10091d3c Clean up conda usage in benchmark scripts (#152552)
Fixes https://github.com/pytorch/pytorch/issues/152123.

* Switch `benchmarks/dynamo/Makefile` to use uv.  Note that these scripts are only used locally, so it's kind of ok to keep conda here IMO.  But switching to uv is probably nicer to most folks.
* Delete some files that are outdated and not used anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152552
Approved by: https://github.com/atalman, https://github.com/albanD
2025-04-30 21:27:29 +00:00
Svetlana Karslioglu
e58c73be44 Add latex settings (#152350)
- Fixes #147027
- Only lualatex can build our 3K pages PDF with reasonable quality, xelatex runs out of memory and pdflatex just fails.
- Move notes under the same toctree as python-api which is needed for the PDF but doesn't change how the HTML is generated.

This is the produced PDF:
[pytorch.pdf](https://github.com/user-attachments/files/19945450/pytorch.pdf)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152350
Approved by: https://github.com/albanD
2025-04-29 19:28:43 +00:00
Zizeng Meng
861945100e [Kineto] Enable OOM observer (#152160)
Summary:
# Context:
When memory leak happens, it usually trigger the OOM in the later iterations. The snapshot of full iteration will be huge and hard to interpret.
On CUDA side, they provide OOM observer which generates snapshot when OOM happens with latest 1,500,000 entries for debugging.

In this diff, we want to implement the feature on MTIA side

Test Plan:
Run this test with last diff in the stack.
```
buck run @//mode/opt  kineto/libkineto/fb/mtia/integration_tests:mtia_memory_auto_trace_test
```

As shown, the memory_snapshot is generated when oom happens
Log: P1794792326
Snapshot: https://fburl.com/pytorch_memory_visualizer/lx73y6s3 {F1977402355}

Differential Revision: D71993315

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152160
Approved by: https://github.com/sraikund16
2025-04-27 15:56:44 +00:00
Anthony Shoumikhin
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
Dan Johnson
d22c4cc353 Add option to use mempool on OOM (#151487)
MemPool is a separate pool of memory handled by the caching allocator. This PR adds the option let the caching allocator try to use this pool as a last resort instead of OOMing by associating a use_on_oom bool with each MemPool.

Usage:
Users can optionally specify a ``use_on_oom`` bool (which is False by default) during MemPool creation. If true, then the CUDACachingAllocator will be able to use memory in this pool as a last resort instead of OOMing.

```
pool = torch.cuda.MemPool(allocator, use_on_oom=True)
with torch.cuda.use_mem_pool(pool):
    a = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
del a
# at the memory limit, this will succeed by using pool's memory in order to avoid the oom
b = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
```

Testing:
```
python test/test_cuda.py -k test_mempool_limited_memory_with_allocator
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151487
Approved by: https://github.com/eqy, https://github.com/syed-ahmed, https://github.com/ngimel
2025-04-26 04:04:57 +00:00
Yu, Guangye
33c75cae0a Add torch.accelerator.device_index as accelerator's device switch context (#148864)
# Motivation
We propose adding support for the Python with statement on `torch.accelerator.device_index` to enable device switching functionality. This enhancement would simplify writing device-agnostic code and provide benefits across all accelerators. Its device-specific counterparts include [`torch.cuda.device`](00199acdb8/torch/cuda/__init__.py (L482)) and  [`torch.cuda._DeviceGuard`](00199acdb8/torch/cuda/__init__.py (L469)).

**Design Philosophy**
It accepts either an `Int` or `None` as input. When `None` is passed, no device switch is performed. Supporting `None` is important for compatibility, as it's possible to encounter `None` values from `torch.device.index`.

Therefore, with this PR, we can do like this

```python
src = 0
dst = 1
# Set src to current device
torch.accelerator.set_device_index(src)
with torch.accelerator.device_index(dst):
    # Inside with statement, we set dst to current device
    assert torch.accelerator.get_device_index() == dst
# Here the current device should be src
assert torch.accelerator.get_device_index() == src
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148864
Approved by: https://github.com/albanD
2025-04-25 09:45:25 +00:00
Jane Xu
8a9c66bb70 Improve stable library apis per Scott's feedback (#152040)
Following 3 suggestions:
1. inline at::Tensor arg
2. use uniq ptr of array vs std::vector
3. document the `std::optional<S>()` case

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152040
Approved by: https://github.com/swolchok, https://github.com/albanD
2025-04-24 20:51:03 +00:00
ILCSFNO
bd09d87fdb add Out Notes (#151306)
Fixes #150181
@albanD Could you please have a check?

Build locally without pytorch build:

![Developer-FAQ](https://github.com/user-attachments/assets/351a7e0b-588e-48ae-ad0a-03f427c86e89)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151306
Approved by: https://github.com/albanD
2025-04-24 20:25:09 +00:00
Pian Pawakapan
2ee8de54b1 [dynamic shapes] user-code friendly statically_known_true, has_static_value (#151601)
Fixes #151480

Allows `statically_known_true` in user code, as well as introducing `has_static_value`, returning True if the input has a static bool/float/int value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151601
Approved by: https://github.com/laithsakka, https://github.com/zou3519, https://github.com/jingsh
2025-04-24 02:53:59 +00:00
Kaiyu Shi
f39a1a43ee Fix typos in meta.rst (#151979)
### Fixes made:
- "allow you to the module" → corrected to "allows you to move the module"

- "allow" → changed to "allows" to agree with the singular subject "method"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151979
Approved by: https://github.com/colesbury
2025-04-24 01:25:09 +00:00
Syed Tousif Ahmed
334aab0dea Updates NCCLConfig with QOS variable (#151821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151821
Approved by: https://github.com/kwen2501
2025-04-23 00:03:49 +00:00
Scott Wolchok
2f74cffab2 Remove reinterpret_casts with undefined behavior from stable/library.h (#151595)
There is a list of valid uses of `reinterpret_cast` (see https://en.cppreference.com/w/cpp/language/reinterpret_cast), and the use here was not on the list, hence undefined behavior. Implement what we meant using memcpy, which is well-defined.

Differential Revision: [D73200791](https://our.internmc.facebook.com/intern/diff/D73200791/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151595
Approved by: https://github.com/janeyx99
2025-04-22 20:24:47 +00:00
Svetlana Karslioglu
2fb1326483 Add dates to pages (#151602)
re: #150873
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151602
Approved by: https://github.com/albanD
2025-04-21 19:53:55 +00:00
Will Constable
bedefa46a9 Document non-pytorch CUDA memory allocation and how to query it (#150880)
This PR documents the fact that PyTorch does not have visibility into how every CUDA memory allocation happend - it only knows about allocations that went through the pytorch CUDA allocator.

It also adds a code snippet showing how to use pynvml to query current GPU memory usage.

## Preview
Added a note at the top of "Understanding CUDA Memory Usage" doc:
<img width="732" alt="image" src="https://github.com/user-attachments/assets/69e28d2a-841a-4b1b-b886-e96fb5d76582" />

which links to a section below:
<img width="733" alt="image" src="https://github.com/user-attachments/assets/cab4f252-9ac2-4fc6-a45d-fdb958fc7dbc" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150880
Approved by: https://github.com/kwen2501, https://github.com/ngimel
2025-04-18 03:48:54 +00:00
Kashif Rasul
2ed2cb5805 add generalized pareto distribution (GPD) (#135968)
Add the GPD as a distribution class

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135968
Approved by: https://github.com/albanD

Co-authored-by: Alexander März <statmixedmlgit@gmail.com>
2025-04-17 18:51:02 +00:00
Svetlana Karslioglu
cd7bc60e11 Migrate to new theme (#149331)
- Migrate pytorch docs, cpp docs and functorch docs to the pytorch_sphinx_theme2
- Migrate index.rst to markdown and restructure to use high-level horizontal bar sections Python API, Developer Notes
- Added python-api.md which becomes the main container for the API docs. This file will be used to add all api references in the toctree. It would be great to have lint for this file: https://github.com/pytorch/pytorch/issues/150718
- Enabled mermaid sphinx extension and opengraph sphinx extension

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149331
Approved by: https://github.com/malfet, https://github.com/atalman, https://github.com/albanD
2025-04-16 21:35:19 +00:00
Pian Pawakapan
6dddd6520d [dynamic shapes] add sym_and, sym_or (#150456)
This has been pretty helpful for the size-oblivious rewrite. Wanted the variadic args version to avoid `sym_or(a, sym_or(b, sym_or(c, d)))` in favor of `sym_or(a, b, c, d)`. Happy to change this to ban the 1-arg version.

This is better than plain and/or because the whole symbolic expression gets preserved, and if we guard on it or defer as a runtime assert, we preserve all branches.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150456
Approved by: https://github.com/laithsakka
2025-04-14 18:18:06 +00:00
fzyzcjy
50abc1ecc4 Super tiny fix typo (#151212)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151212
Approved by: https://github.com/Skylion007
2025-04-14 16:47:40 +00:00
zeshengzong
5eebcb991a Add scripts to generate plots of LRSchedulers (#149189)
Fixes #92007

## Changes

- Add script to generate plots for `lr_scheduler`
- Add plots to `lr_scheduler` docs
- Add example section if it missing in `lr_scheduler` docs

## Test Result

### LambdaLR

![image](https://github.com/user-attachments/assets/37fc0894-e2ec-48f2-a2d6-3514e51e1ea2)

### MultiplicativeLR

![image](https://github.com/user-attachments/assets/2122b3a0-a4ce-42c7-bb45-559c1fc73e0f)

### StepLR

![image](https://github.com/user-attachments/assets/47bc9d96-4b60-4586-a000-f213583bbe8f)

### MultiStepLR

![image](https://github.com/user-attachments/assets/c822b849-d5be-4b94-aa7a-0017a2c9ff15)

### ConstantLR

![image](https://github.com/user-attachments/assets/83107cdd-7b00-44a6-b09d-e8ee849b4a12)

### LinearLR

![image](https://github.com/user-attachments/assets/60190105-691a-4101-8966-5b0c396093a4)

### ExponentialLR

![image](https://github.com/user-attachments/assets/dfcbcbca-89e5-4a2f-b1bd-33e25d2405ec)

### PolynomialLR

![image](https://github.com/user-attachments/assets/7c3d4fce-c846-40a0-b62e-f3e81c7e08bd)

### CosineAnnealingLR

![image](https://github.com/user-attachments/assets/26712769-dde9-4faa-b61b-e23c51daef50)

### ChainedScheduler

![image](https://github.com/user-attachments/assets/20734a8b-e939-424f-b45a-773f86f020b1)

### SequentialLR

![image](https://github.com/user-attachments/assets/2cd3ed67-2a0a-4c42-9ad2-e0be090d3751)

### ReduceLROnPlateau

![image](https://github.com/user-attachments/assets/b77f641e-4810-450d-b2cd-8b3f134ea188)

### CyclicLR

![image](https://github.com/user-attachments/assets/29b8666f-41b3-45e4-9159-6929074e6108)

### OneCycleLR

![image](https://github.com/user-attachments/assets/d5b683ef-41e8-4ca8-9fe8-0f1e6b433866)

### CosineAnnealingWarmRestarts

![image](https://github.com/user-attachments/assets/1d45ea80-dea8-494d-a8ab-e9cfc94c55d6)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149189
Approved by: https://github.com/janeyx99
2025-04-14 09:53:38 +00:00
Tristan Rice
df4e5294a6 Reapply "ProcessGroupGloo: support lazy_init (#150801)" (#151031)
This reverts commit 73f3d6d9aa.

Reapplies #150801

Test plan:

See #150801

submodule

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151031
Approved by: https://github.com/fduwjj
2025-04-11 01:58:35 +00:00
Will Constable
c9a35c2a6e [C10D] Document object collectives limitations (#150815)
Adds louder warning labels in the doc page and docstring for object
collectives in hopes of raising awareness of several footgun issues
including accidental creation of cuda contexts by serializing and
sending 'device-local' gpu tensors over the object-* apis.

Preview:
<img width="902" alt="image" src="https://github.com/user-attachments/assets/e0c08c70-d8e5-4e15-b3e2-5cd563714f71" />

addresses #150798

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150815
Approved by: https://github.com/kwen2501
2025-04-10 22:48:39 +00:00
PyTorch MergeBot
73f3d6d9aa Revert "ProcessGroupGloo: support lazy_init (#150801)"
This reverts commit f237ee54bf.

Reverted https://github.com/pytorch/pytorch/pull/150801 on behalf of https://github.com/atalman due to failing internally ([comment](https://github.com/pytorch/pytorch/pull/150801#issuecomment-2793161239))
2025-04-10 13:44:31 +00:00
Yu, Guangye
6972255dad Document poison fork note for accelerator APIs (#147507)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147507
Approved by: https://github.com/sraikund16, https://github.com/kwen2501, https://github.com/albanD
2025-04-10 02:37:37 +00:00
Tristan Rice
f237ee54bf ProcessGroupGloo: support lazy_init (#150801)
This adds lazy initialization support to ProcessGroupGloo via `TORCH_GLOO_LAZY_INIT` or via `create_device(..., lazy_init=True)`

This is still a draft PR as there's one race condition when doing coalesced operations that needs to be fixed upstream in Gloo first. Depends on https://github.com/facebookincubator/gloo/pull/427 landing first

This also updates the gloo submodule to include the required changes.

Test plan:

added lazy init test variants

```
pytest -v test/distributed/test_c10d_gloo.py -k Lazy
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150801
Approved by: https://github.com/fduwjj
2025-04-09 19:29:50 +00:00
Antoine Broyelle
886d9acb0d [docs] Add 32-bit complex to the list of dtypes (#144590)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144590
Approved by: https://github.com/janeyx99
2025-04-09 13:10:21 +00:00
zeshengzong
c9c0f8eae3 Add plot for torch.nn.Threshold and torch.nn.GLU (#150171)
Fixes #150170

## Changes

- Add plot for `torch.nn.Threshold` and `torch.nn.GLU`
- Add example output make them easier get result by users

## Test Result

![image](https://github.com/user-attachments/assets/f6c5bc46-f9b7-4db7-9797-e08d8423d1b3)

![image](https://github.com/user-attachments/assets/ad4e6c84-7b29-44f1-b7bd-9c81e4a92ef8)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150171
Approved by: https://github.com/albanD
2025-04-08 03:55:37 +00:00
ZhaoqiongZ
96f35f55e2 update get start xpu document for v2.7 (#150397)
update get start xpu document for v2.7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150397
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-04-03 18:17:08 +00:00
Avik Chaudhuri
b70d105c77 infer dynamic shapes through additional inputs (#150144)
Summary:
Instead of explicitly specifying dynamic shapes, it is possible to infer them from additional example inputs. Together with the example inputs provided to export, we can basically make any varying dim dynamic and keep any fixed dim static. This should be useful for prod scenarios that have access to tests and/or profiling data, yet are somewhat removed from the model authoring process.

However this alone is not satisfactory: the exported program by design has only one graph, representing one path through the model, and we cannot necessarily guarantee that this graph works for the additional example inputs because different guards might have been created if we had exported with them instead (corresponding to different traced paths). However, checking that the additional example inputs satisfy the guards created by the original export should be sufficient for generalization.

Now, while we don't preserve all guards in the exported program, we do check a subset of them as part of input matching. So we add a verification step at the end of export when such additional example inputs are provided. This should be enough for now.

Test Plan: added test (positive and negative cases)

Differential Revision: D72001771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150144
Approved by: https://github.com/bobrenjc93
2025-04-01 21:13:39 +00:00
Tianyu Liu
d2ad9aa2f2 [dtensor][tp] add a ParallelStyle PrepareModuleInputOutput (#150372)
Needed this class for because `parallelize_module` takes a dict, which doesn't allow `PrepareModuleInput` and `PrepareModuleOutput` to be applied at the same time.

The `PrepareModuleInputOutput` in this PR initializes two variables `prepare_module_input` and `prepare_module_output` and uses them to process module / inputs / outputs.

I had another implementation which put all code in `PrepareModuleInputOutput` and let `PrepareModuleInput` and `PrepareModuleOutput` inherit the monolithic `PrepareModuleInputOutput`. But it is
1. less cleaner
2. conceptually abusing inheritance because `PrepareModuleInput` shouldn't be able to access class methods of `PrepareModuleOutput` and vice versa

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150372
Approved by: https://github.com/wanchaol
2025-04-01 19:15:43 +00:00
Xia, Weiwen
3b0cd9b542 [Quant][PT2E] add a lowering pass for x86 backend (#149708)
**Summary**
This PR adds a lowering pass for x86 backend
- Patterns of `dequantize -> conv/linear (-> quantize)` are fused to corresponding quantized onednn ops.
- Weights are prepacked ahead of time.
- Post ops of conv/linear are fused if supported.
- The pass returns a `GraphModule` with the modifications mentioned above.

**Test plan**
```
pytest test/quantization/pt2e/test_x86inductor_quantizer.py -k test_lowering_to_x86
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149708
Approved by: https://github.com/jerryzh168, https://github.com/leslie-fang-intel
2025-04-01 17:32:41 +00:00
Pian Pawakapan
103bf64a3c [export] refactor _Dim into Dim (#149891)
Summary: forward fix T218515233

Test Plan: test_export

Differential Revision: D71769231

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149891
Approved by: https://github.com/jingsh, https://github.com/angelayi
2025-03-28 06:19:03 +00:00
Laith Sakka
6cbcdee944 Introduce guard_or_true, guard_or_false (#148430)
some context in this document:
https://docs.google.com/document/d/18nJsj-F2C_QXO7ClwzPcAUENQ-B440B43W7DdDnlDt4/edit?tab=t.0#heading=h.pgebnyi7pocj

But TLDR;
`guard_or_true`, `guard_or_false` are better than `guard_size_oblivious` due to :
- Easier to reason about what assumptions we are making while reading the code.
- Avoid size_oblivious complexity that is not needed.
- Avoid unsoundness that could make `guard_size_oblivious(a==1)` be true when its not true for some vaue `a` during runtime.
- Less data dependent errors for some cases: ex, when doing `guard_size_oblivious(a==1)` and we know `a` is a tensor size, if it's traced with `a=u1-u2` `guard_size_oblivious(a==1)` will throw a data dependent error but `guard_else_false` will just return `False`.

### How is it different from statically_known_true??
**`if(cond)`:** (normal guarding) will try to evaluate statically and guard on the condition, willing to restrict input space to evaluate cond. if it fails to evaluate due to data dependent error will throw an exception (that could be converted to graph break in some situations).

**`statically_known_true(cond)`:** would be used when you never want to add a guard (restrict your input space), but just want to do a best effort check to see if you can infer that something is true/false ONLY based on existing constraints.

**`guard_or_true(cond)`/`guard_or_false(cond)`:** Those would be used in situations you prefer to guard and know the result of the expression over not guarding, but in case you hit a data dependent error you are ok with just returning true or false.
Some reasons you might be ok with returning true/false instead could be:
1. It's an optimization I do not want to fail for not performing optimization.
2. I am willing to deviate from the normal semantics when I have unbacked for the benefit of not failing (See the doc above for more details).

**`definitely_true(cond)`**: same as `guard_or_false(cond)` except does not try to do static eval for unbacked (planning to deprecate it and replace uses with `guard_or_false` or make it alias to `guard_or_false`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148430
Approved by: https://github.com/bobrenjc93
2025-03-27 09:34:05 +00:00
Louie Tsai
7aacbab0b3 Update Doc for Intel XPU Profiling (#134515)
Updated below two pages for Intel XPU
https://pytorch.org/docs/stable/torch.compiler_profiling_torch_compile.html
https://pytorch.org/docs/stable/profiler.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134515
Approved by: https://github.com/dvrogozh, https://github.com/malfet
2025-03-27 09:15:35 +00:00
PyTorch MergeBot
e080bac533 Revert "Introduce guard_or_true, guard_or_false (#148430)"
This reverts commit d5593ea31c.

Reverted https://github.com/pytorch/pytorch/pull/148430 on behalf of https://github.com/laithsakka due to need to fix stuff ([comment](https://github.com/pytorch/pytorch/pull/148430#issuecomment-2756701436))
2025-03-27 05:10:20 +00:00
Laith Sakka
d5593ea31c Introduce guard_or_true, guard_or_false (#148430)
some context in this document:
https://docs.google.com/document/d/18nJsj-F2C_QXO7ClwzPcAUENQ-B440B43W7DdDnlDt4/edit?tab=t.0#heading=h.pgebnyi7pocj

But TLDR;
`guard_or_true`, `guard_or_false` are better than `guard_size_oblivious` due to :
- Easier to reason about what assumptions we are making while reading the code.
- Avoid size_oblivious complexity that is not needed.
- Avoid unsoundness that could make `guard_size_oblivious(a==1)` be true when its not true for some vaue `a` during runtime.
- Less data dependent errors for some cases: ex, when doing `guard_size_oblivious(a==1)` and we know `a` is a tensor size, if it's traced with `a=u1-u2` `guard_size_oblivious(a==1)` will throw a data dependent error but `guard_else_false` will just return `False`.

### How is it different from statically_known_true??
**`if(cond)`:** (normal guarding) will try to evaluate statically and guard on the condition, willing to restrict input space to evaluate cond. if it fails to evaluate due to data dependent error will throw an exception (that could be converted to graph break in some situations).

**`statically_known_true(cond)`:** would be used when you never want to add a guard (restrict your input space), but just want to do a best effort check to see if you can infer that something is true/false ONLY based on existing constraints.

**`guard_or_true(cond)`/`guard_or_false(cond)`:** Those would be used in situations you prefer to guard and know the result of the expression over not guarding, but in case you hit a data dependent error you are ok with just returning true or false.
Some reasons you might be ok with returning true/false instead could be:
1. It's an optimization I do not want to fail for not performing optimization.
2. I am willing to deviate from the normal semantics when I have unbacked for the benefit of not failing (See the doc above for more details).

**`definitely_true(cond)`**: same as `guard_or_false(cond)` except does not try to do static eval for unbacked (planning to deprecate it and replace uses with `guard_or_false` or make it alias to `guard_or_false`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148430
Approved by: https://github.com/bobrenjc93
2025-03-27 02:22:20 +00:00
Tristan Rice
159e97cbcf ProcessGroupGloo: support reduce_scatter + update support chart (#149869)
This adds a `reduce_scatter` implementation for ProcessGroupGloo. This is a pretty naive implementation as it does 1 allreduce per  rank but may be useful for testing in FSDP etc. There was an existing implementation of reduce_scatter_tensor/reduce_scatter_tensor_coalesed that has a very similar implementation but requires a fixed tensor size per rank.

If users find these functions to be too slow we can address them as issues arise.

Gloo now supports all major distributed operations. Quite a few of these were added by @rohan-varma and @yifuwang but they didn't update the support chart. We also have `CUDAWork` variants of most operations so those were also added to the chart.

Test plan:

```
pytest -v test/distributed/test_c10d_gloo.py -k reduce_scatter
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149869
Approved by: https://github.com/fduwjj
2025-03-25 01:16:12 +00:00
Justin Chu
2dccd70ef0 [ONNX] Clean up legacy dynamo export code (#149745)
Clean up code that is unused and obsolete. The public `torch.onnx.dynamo_export` is kept for now but the legacy implementation is removed.

Remove public option classes and OnnxRegistry that have been deprecated.

Users: use torch.onnx.export(…, dynamo=True).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149745
Approved by: https://github.com/titaiwangms, https://github.com/cyyever
2025-03-23 19:35:16 +00:00
Pradeep Fernando
1b08aaeafe Supporting non-tensor-data write_size in planner write items. (#149699)
Summary:
1\ The current write item structure does not contain the amount of data that needs to be written.
2\ the planner.item already has a size primitive 'tensor_storage_size'. https://fburl.com/code/7a0gsmw7 But only for tensors.
3\ Right now, the only way the writer layer get hold of this property (fro non tensor data)
first do a lookup in to the actual tensor/bytes
then calculate the nbytes.
This change introduce a way to capture non-tensor data size within a write-plan item.

Test Plan: Existing UT.

Differential Revision: D71599725

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149699
Approved by: https://github.com/MeetVadakkanchery
2025-03-21 18:09:14 +00:00
Jing Xu
4ea580568a update aotinductor doc for XPU support (#149299)
as title. Since the AOTInductor feature starting from 2.7 works on Intel GPU, add the related contents into its doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149299
Approved by: https://github.com/guangyey, https://github.com/desertfire
2025-03-21 04:40:31 +00:00
FFFrog
1dce65a82c Fix the invalid link for FX (#149289)
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149289
Approved by: https://github.com/zou3519
2025-03-19 14:03:18 +00:00
FFFrog
e8a35eb7da Add Missing Communication collectives (#147379)
----

- reduce_add_coalesced
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147379
Approved by: https://github.com/mikaylagawarecki
2025-03-19 06:59:04 +00:00
Justin Chu
010963032c [ONNX] Create onnx_symbolic (#148905)
In the old exporter we allow users to define a symbolic() method to bypass JIT tracing for a block of logic. We can allow users to do similar things by creating symbolic ops at export.

This PR implements `torch.onnx.ops.symbolic` and `torch.onnx.ops.symbolic_multi_out` to allow users to create onnx nodes symbolically with pt2 & fx. The custom pytorch ops were designed such that the attributes are encoded to be part of a valid fx op. Users provide shape and dtype for the meta function to produce the currect fake tensor during export.

An example is

![image](https://github.com/user-attachments/assets/c62f5f21-e038-456e-a71d-b9a5d0a7cd9d)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148905
Approved by: https://github.com/titaiwangms
2025-03-18 21:32:06 +00:00
Jane Xu
988827cdfb Use schema as source of truth + support ones_like/empty_like (#149052)
This change does 2 important things:
(a) Instead of relying on IValue type as source of truth, we use the schema as the source of truth, which is important as IValue types are overloaded and can ambiguously convert incorrectly. For example, a MemoryFormat will look like an int + get converted to an int64_t vs a MemoryFormat!

(b) This PR expands support for many more types to encompass way more schemas, e.g., Optional, Device, dtype, etc. The main win from this PR is the ability for aoti_torch_call_dispatcher to call TensorFactory ops like ones_like/empty_like!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149052
Approved by: https://github.com/albanD
2025-03-18 02:40:54 +00:00
Justin Chu
ebabd0efdd [ONNX] Expose verification utilities (#148603)
Expose verification utilities to public documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148603
Approved by: https://github.com/titaiwangms
2025-03-18 02:10:34 +00:00
Leo Wang
f4bffb7461 [docs] fix autograd description on convex function case (#148658)
The sub-gradient of minimum norm is the least steep descent direction.

```python
import torch

x = torch.tensor([-2, -1, 0, 1, 2.], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad) # tensor([0., 0., 0., 1., 1.])

y = torch.tensor([-2, -1, 0, 1, 2.], requires_grad=True)
torch.abs(y).sum().backward()
print(y.grad) # tensor([-1., -1.,  0.,  1.,  1.])
```

(How can I request a reviewer? I don't have the button on the right)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148658
Approved by: https://github.com/lezcano
2025-03-13 09:06:15 +00:00
Howard Huang
b98af95401 Fix DCP link (#148974)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148974
Approved by: https://github.com/svekars
2025-03-11 21:26:37 +00:00
Nikita Shulga
c18858d633 [MPS] Make torch.mps.compile_shader public (#148972)
It was a private method in 2.6, but nothin changes in its API for 2.7
and it will likely remain the same in 2.8, so time to remove underscore
from its name

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148972
Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/seemethere, https://github.com/albanD, https://github.com/dcci
2025-03-11 20:20:58 +00:00
Chien-Chin Huang
52acc1f955 [DSD] Update the document to mention the limitation of set_optimizer_state_dict (#148918)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/140898

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148918
Approved by: https://github.com/fduwjj, https://github.com/mori360
ghstack dependencies: #148825
2025-03-11 18:24:12 +00:00
albanD
68c12ecfe2 Move get accelerator to use build time flags when possible (#146098)
This PR does two main things (they are in a single PR to show how the newly added APIs are used).

- Add isBuilt and isAvailable APIs to the AcceleratorHook interface. See inline doc for their exact semantic
- Use the newly added isBuilt for accelerator check to ensure it does not poison fork

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146098
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/EikanWang, https://github.com/jeromean

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-10 13:17:58 +00:00
Nichols A. Romero
08baaa7d63 [Docs][TunableOp] TunableOp documentation update (#148384)
This PR aligns documentation to what is in the README file:
https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cuda/tunable/README.md

and removes the prototype NOTE.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148384
Approved by: https://github.com/jeffdaily, https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2025-03-07 21:02:49 +00:00
PyTorch MergeBot
b246cd7b82 Revert "Move get accelerator to use build time flags when possible (#146098)"
This reverts commit 17302b4bc8.

Reverted https://github.com/pytorch/pytorch/pull/146098 on behalf of https://github.com/albanD due to Still fails with cuda build on a non-gpu machine ([comment](https://github.com/pytorch/pytorch/pull/146098#issuecomment-2707191770))
2025-03-07 18:59:58 +00:00
albanD
17302b4bc8 Move get accelerator to use build time flags when possible (#146098)
This PR does two main things (they are in a single PR to show how the newly added APIs are used).

- Add isBuilt and isAvailable APIs to the AcceleratorHook interface. See inline doc for their exact semantic
- Use the newly added isBuilt for accelerator check to ensure it does not poison fork

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146098
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/EikanWang, https://github.com/jeromean

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-07 15:19:34 +00:00
Syed Tousif Ahmed
3960f97832 Documents torch.cuda.MemPool API (#148374)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148374
Approved by: https://github.com/eqy, https://github.com/ngimel
2025-03-06 23:18:43 +00:00
Mikayla Gawarecki
be0ceee1c3 Make record/storage alignment in torch.save configurable (#147788)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147788
Approved by: https://github.com/albanD
ghstack dependencies: #147786, #147787
2025-03-06 12:04:46 +00:00
ZhaoqiongZ
38479e495e Add note to get start xpu (#148168)
Installing PyTorch from binaries will automatically install the runtime packages of Intel® Deep Learning Essentials. In this case, if we activate oneAPI in a standalone installation of Intel® Deep Learning Essentials, there will be an environment issue. Therefore, add a note to remind users to avoid this situation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148168
Approved by: https://github.com/janeyx99

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-05 18:11:14 +00:00
Marko Radmilac
c65ee728f0 Initial implementation of host memory stats (#147660)
This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics.

This change tries very hard not to disrupt the initial design of the allocator, and it uses existing locking mechanism, whenever possible, to gather statistics "for free". Only deviation from that is on the "slow path" where we incur CUDA calls anyway, so taking a short lock is not going to hurt the performance much, especially in the steady state where most allocations will come from cache.

As mentioned before, this is the first PR, to introduce the concept and to see if it fits the right paradigm. We can always add more later.

Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in CUDA caching allocator, in order to maintain symmetry.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660
Approved by: https://github.com/ngimel
2025-03-05 16:13:19 +00:00
Meet Vadakkanchery
fdee60769a [DCP] Introduce process based async checkpointing (#147039)
Summary:
### Context
Background checkpoint upload thread interfering with trainer thread:

In [async save API](https://github.com/pytorch/pytorch/blob/main/torch/distributed/checkpoint/state_dict_saver.py#L239-L248), the background thread spends a considerable amount of time on CPU-bound tasks (pickling/unpickling several metada objects a.k.a SavePlans) on rank0 during the collective operation; this kind of asymmetric computation heavily contends for GIL with the trainer thread causing GPU util to suffer significantly for the E2E checkpoint duration.

### Solution:
Introduce async save via a checkpoint daemon process. This daemon process will be created once (during the first save attempt) and can serve async checkpoint requests for the remainder of training lifetime.

Test Plan: Added E2E UTs for process based async save.

Differential Revision: D69272583

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147039
Approved by: https://github.com/saumishr
2025-03-04 13:33:28 +00:00
Shangdi Yu
b17f5223a4 Generate AOTI input check by default (#148005)
Summary:
Generate AOTI size and stride input check by default. But the checks are only run if `AOT_INDUCTOR_DEBUG_COMPILE` env variable is set (to avoid slowing down the performance).

Example output:

```cpp
            bool _check_aoti_runtime_check_inputs_env() {
                const static char* env_var_value = getenv("AOTI_RUNTIME_CHECK_INPUTS");
                const static bool result = env_var_value != nullptr && env_var_value[0] != '\0';
                return result;
            }

            AOTI_NOINLINE static void __check_inputs_outputs(
                AtenTensorHandle* input_handles,
                AtenTensorHandle* output_handles) {
                if (!_check_aoti_runtime_check_inputs_env()){
                    return;
                }
//rest of the check
}

```

Test Plan: CI

Differential Revision: D70260490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148005
Approved by: https://github.com/hl475, https://github.com/desertfire, https://github.com/jingsh
2025-03-04 00:55:14 +00:00
PyTorch MergeBot
a983b2b11a Revert "Initial implementation of host memory stats (#147660)"
This reverts commit 945e359fc1.

Reverted https://github.com/pytorch/pytorch/pull/147660 on behalf of https://github.com/mradmila due to There is an issue with ambiguous definition of Stat structure when different C++ tools are used. Backing out for now. ([comment](https://github.com/pytorch/pytorch/pull/147660#issuecomment-2692346379))
2025-03-01 18:05:45 +00:00
Marko Radmilac
945e359fc1 Initial implementation of host memory stats (#147660)
This is an initial attempt to provide some statistics for the pinned host memory allocations flowing through CachingHostAllocator. Many times in the past we have had inexplicable slowdowns that would be much easier to diagnose if we had some host memory characteristics.

This change tries very hard not to disrupt the initial design of the allocator, and it uses existing locking mechanism, whenever possible, to gather statistics "for free". Only deviation from that is on the "slow path" where we incur CUDA calls anyway, so taking a short lock is not going to hurt the performance much, especially in the steady state where most allocations will come from cache.

As mentioned before, this is the first PR, to introduce the concept and to see if it fits the right paradigm. We can always add more later.

Metrics that would require more involved changes to the code base and locks, like requested memory, have been punted for now. I also tried to reuse the Stat structure used in CUDA caching allocator, in order to maintain symmetry.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147660
Approved by: https://github.com/ngimel
2025-02-28 18:36:44 +00:00
ZhaoqiongZ
20ce67cd06 Udpate hw requirement for FP64 on "Getting Started on Intel GPU" (#147802)
Fixes #147731

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147802
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-27 01:54:19 +00:00
PyTorch MergeBot
7e7d05bf85 Revert "[do not merge yet] update grammar (#147996)"
This reverts commit 6e129a697f.

Reverted https://github.com/pytorch/pytorch/pull/147996 on behalf of https://github.com/seemethere due to Need to revert ([comment](https://github.com/pytorch/pytorch/pull/147996#issuecomment-2686291282))
2025-02-26 22:01:12 +00:00
sokkaofthewatertribe
6e129a697f [do not merge yet] update grammar (#147996)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147996
Approved by: https://github.com/seemethere
2025-02-26 21:52:58 +00:00
PyTorch MergeBot
dc7556f1bd Revert "[do not merge yet] update grammar (#147996)"
This reverts commit a1ee2c3a08.

Reverted https://github.com/pytorch/pytorch/pull/147996 on behalf of https://github.com/seemethere due to Need to revert ([comment](https://github.com/pytorch/pytorch/pull/147996#issuecomment-2686266052))
2025-02-26 21:43:06 +00:00
sokkaofthewatertribe
a1ee2c3a08 [do not merge yet] update grammar (#147996)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147996
Approved by: https://github.com/seemethere
2025-02-26 21:39:08 +00:00
martin-kokos
8de6fe8c0b [docs] fix numpy docs reference (#147697)
Fix a link to numpy documentation that has moved and now 404's

I"ve checked other numpy doc links that point to docs.scipy.org (which then redirects to numpy.org) and they do work, so I am fixing just this 404.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147697
Approved by: https://github.com/soulitzer
2025-02-26 01:30:03 +00:00
Svetlana Karslioglu
14b9f7f7bc Remove link to search survey (#147751)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147751
Approved by: https://github.com/malfet
2025-02-25 19:26:59 +00:00
Xuehai Pan
754fb834db [BE][CI] bump ruff to 0.9.0: string quote styles (#144569)
Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting

- Change the outer quotes to double quotes for nested f-strings

```diff
- f'{", ".join(args)}'
+ f"{', '.join(args)}"
```

- Change the inner quotes to double quotes for triple f-strings

```diff
  string = """
-     {', '.join(args)}
+     {", ".join(args)}
  """
```

- Join implicitly concatenated strings

```diff
- string = "short string " "short string " f"{var}"
+ string = f"short string short string {var}"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569
Approved by: https://github.com/Skylion007
ghstack dependencies: #146509
2025-02-24 19:56:09 +00:00
Dmitry Rogozhkin
d27ecf85db xpu: support sycl with torch.utils.cpp_extension APIs (#132945)
This patch adds support for sycl kernels build via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and (new) `class SyclExtension` APIs. Files having `.sycl` extension are considered to have sycl kernels and are compiled with `icpx` (dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. API supports building sycl along with other file types into single extension.

Note that `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with compiler team to introduce such file extension in the compiler, but they are opposed to this. At the same time discussion around sycl file extension and adding sycl language support into such tools as cmake is ongoing. Eventually cmake also considers to introduce some file extension convention for sycl. I hope we can further influence cmake and compiler communities to broader adopt `.sycl` file extension.

By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled. At the moment `pvc,xe-lpg`. This behavior can be overridden by setting `TORCH_XPU_ARCH_LIST` environment variables to the comma separated list of desired devices to compile for.

Fixes: #132944

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132945
Approved by: https://github.com/albanD, https://github.com/guangyey, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-02-16 16:50:59 +00:00
PyTorch MergeBot
dd5d0ea6bb Revert "xpu: support sycl with torch.utils.cpp_extension APIs (#132945)"
This reverts commit 607379960b.

Reverted https://github.com/pytorch/pytorch/pull/132945 on behalf of https://github.com/malfet due to It just broke all the tests, see b16ae97ad0/1 ([comment](https://github.com/pytorch/pytorch/pull/132945#issuecomment-2661498747))
2025-02-16 16:03:42 +00:00
Dmitry Rogozhkin
607379960b xpu: support sycl with torch.utils.cpp_extension APIs (#132945)
This patch adds support for sycl kernels build via `torch.utils.cpp_extension.load`, `torch.utils.cpp_extension.load_inline` and (new) `class SyclExtension` APIs. Files having `.sycl` extension are considered to have sycl kernels and are compiled with `icpx` (dpc++ sycl compiler from Intel). Files with other extensions, `.cpp`, `.cu`, are handled as before. API supports building sycl along with other file types into single extension.

Note that `.sycl` file extension is a PyTorch convention for files containing sycl code which I propose to adopt. We did follow up with compiler team to introduce such file extension in the compiler, but they are opposed to this. At the same time discussion around sycl file extension and adding sycl language support into such tools as cmake is ongoing. Eventually cmake also considers to introduce some file extension convention for sycl. I hope we can further influence cmake and compiler communities to broader adopt `.sycl` file extension.

By default SYCL kernels are compiled for all Intel GPU devices for which pytorch native aten SYCL kernels are compiled. At the moment `pvc,xe-lpg`. This behavior can be overridden by setting `TORCH_XPU_ARCH_LIST` environment variables to the comma separated list of desired devices to compile for.

Fixes: #132944

CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132945
Approved by: https://github.com/albanD, https://github.com/guangyey
2025-02-16 10:16:09 +00:00
Mikayla Gawarecki
e8fbc86de0 Make torch.cuda.gds APIs public (#147120)
Follow up to https://github.com/pytorch/pytorch/pull/145748 that turned USE_CUFILE on for CUDA 12.6 and 12.8 binaries

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147120
Approved by: https://github.com/albanD
2025-02-14 17:06:50 +00:00
Aaron Gokaslan
6344ca1dd4 [BE][Ez]: Apply FURB188: use str remove(pre|suf)fix (#146997)
Since we are on 3.9, we can use this nice str builtin which is more readable and more efficient.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146997
Approved by: https://github.com/XuehaiPan, https://github.com/cyyever, https://github.com/jansel
2025-02-14 03:38:07 +00:00
PyTorch MergeBot
9a883007a2 Revert "Implement cuda graphs implementation of torch.cond and torch.while_loop (#140979)"
This reverts commit c7515da7b0.

Reverted https://github.com/pytorch/pytorch/pull/140979 on behalf of https://github.com/huydhn due to This change has been reported to break internal code ([comment](https://github.com/pytorch/pytorch/pull/140979#issuecomment-2657361940))
2025-02-13 18:04:26 +00:00
angelayi
67c4c39b4f [docs] Minor fixes to export and aoti docs (#144513)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144513
Approved by: https://github.com/yushangdi, https://github.com/desertfire
2025-02-13 15:19:35 +00:00