Commit Graph

5316 Commits

Author SHA1 Message Date
Yuanhao Ji
0a7eef140b Add torch.Tensor._make_wrapper_subclass to torch/_C/__init__.pyi (#154022)
Fixes #153790

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154022
Approved by: https://github.com/Skylion007
2025-05-27 14:10:00 +00:00
dependabot[bot]
ed27ee8355 Bump setuptools from 70.0.0 to 78.1.1 in /tools/build/bazel (#154075)
Bumps [setuptools](https://github.com/pypa/setuptools) from 70.0.0 to 78.1.1.
Changelog, sourced from [setuptools's changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst):

- v78.1.1, Bugfixes: More fully sanitized the filename in PackageIndex._download. ([#4946](https://redirect.github.com/pypa/setuptools/issues/4946))
- v78.1.0, Features: Restore access to _get_vc_env with a warning. ([#4874](https://redirect.github.com/pypa/setuptools/issues/4874))
- v78.0.2, Bugfixes: Postponed removals of deprecated dash-separated and uppercase fields in `setup.cfg`. All packages with deprecated configurations are advised to move before 2026. ([#4911](https://redirect.github.com/pypa/setuptools/issues/4911))
- v78.0.1, Misc: [#4909](https://redirect.github.com/pypa/setuptools/issues/4909)
- v78.0.0, Bugfixes: Reverted distutils changes that broke the monkey patching of command classes. ([#4902](https://redirect.github.com/pypa/setuptools/issues/4902))
- v78.0.0, Deprecations and Removals: Setuptools no longer accepts options containing uppercase or dash characters in `setup.cfg`.
- ... (truncated)

Commits:

- `8e4868a` Bump version: 78.1.0 → 78.1.1
- `100e9a6` Merge pull request [#4951](https://redirect.github.com/pypa/setuptools/issues/4951)
- `8faf1d7` Add news fragment.
- `2ca4a9f` Rely on re.sub to perform the decision in one expression.
- `e409e80` Extract _sanitize method for sanitizing the filename.
- `250a6d1` Add a check to ensure the name resolves relative to the tmpdir.
- `d8390fe` Extract _resolve_download_filename with test.
- `4e1e893` Merge https://github.com/jaraco/skeleton
- `3a3144f` Fix typo: `pyproject.license` -> `project.license` ([#4931](https://redirect.github.com/pypa/setuptools/issues/4931))
- `d751068` Fix typo: pyproject.license -> project.license
- Additional commits viewable in [compare view](https://github.com/pypa/setuptools/compare/v70.0.0...v78.1.1)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154075
Approved by: https://github.com/Skylion007

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-25 15:13:03 +00:00
Tom Ritchford
9a8c42ff94 Get rid of unused code in linters (#154043)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154043
Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007
2025-05-22 15:24:54 +00:00
Jane Xu
8817e5ac80 Render Example: and not Example:: in docs (#153978)
Everything here is a grep except the changes in tools/autograd/load_derivatives.py which I manually corrected.

The correct notation is:
```
Example::

    >>> ...
```

It is common and wrong to have:
```
Example::
    >>> ...
```

In the wrong example, we get these pesky double colons:
![image](https://github.com/user-attachments/assets/20ffd349-68bb-4552-966c-e23923350476)
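
The malformed pattern can also be detected mechanically. The following is a minimal sketch, not the grep used in this PR; the helper name `find_bad_example_blocks` is hypothetical. It flags every `Example::` marker that is not followed by a blank line.

```python
import re

# "Example::" at the end of a line whose next line is NOT blank.
_BAD_EXAMPLE = re.compile(r"Example::[ \t]*\n(?![ \t]*\n)")


def find_bad_example_blocks(text: str) -> list[int]:
    """Return 1-based line numbers of `Example::` markers lacking a blank line after them."""
    return [text.count("\n", 0, m.start()) + 1 for m in _BAD_EXAMPLE.finditer(text)]


good = "Example::\n\n    >>> torch.ones(2)\n"
bad = "Example::\n    >>> torch.ones(2)\n"

print(find_bad_example_blocks(good))
print(find_bad_example_blocks(bad))
```

The negative lookahead keeps the check to a single expression, so it stays close to what a grep with a multi-line pattern would do.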

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153978
Approved by: https://github.com/soulitzer, https://github.com/malfet
2025-05-21 01:03:26 +00:00
Jane Xu
fc33da410f Add torch/header_only_apis.txt and enforce they're tested (#153635)
This PR adds enforcement of testing header only APIs.

The benefit of torch/header_only_apis.txt is twofold:
1) this gives us a clear view of what we expect to be header only
2) this allows us to enforce testing

The enforcement added in this PR is very basic--we literally string match that a symbol in `torch/header_only_apis.txt` is in a cpp test. This is meant to be a first step in verifying our APIs are properly tested and can get fancier over time. For now, I've added myself as a codeowner to learn what to look out for in terms of proper tests. Over time, I anticipate we can automate more steps, but right now let's just get something out the door.
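
As a rough illustration of the string-match idea, here is a minimal sketch. It assumes the list file holds one symbol per line with optional `#` comments, which is a guess about the file format, and `untested_symbols` is a hypothetical helper rather than the function added in this PR.

```python
def untested_symbols(api_list_text: str, test_sources: list[str]) -> list[str]:
    """Return listed symbols that appear in none of the given test sources."""
    symbols = [
        line.strip()
        for line in api_list_text.splitlines()
        # Skip blank lines and comment lines (assumed format).
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # Plain substring match: a symbol counts as tested if any test file mentions it.
    return [s for s in symbols if not any(s in src for src in test_sources)]


api_list = "# header-only APIs\nfoo_api\nbar_api\n"
cpp_tests = ["TEST(Foo, Basic) { foo_api(1); }"]
print(untested_symbols(api_list, cpp_tests))
```

A substring match is deliberately crude: it cannot tell a real call from a mention in a comment, which matches the "first step" framing above.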

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153635
Approved by: https://github.com/albanD
ghstack dependencies: #153965
2025-05-20 23:42:24 +00:00
Jane Xu
8f943046f8 [BE] light cleanups to linter logic (#153965)
Some BE cleanup on other lint things I saw while doing the top of this stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153965
Approved by: https://github.com/soulitzer
2025-05-20 21:28:48 +00:00
Yang Wang
335c89c6f1 [Monitoring] enable local logs and add mac test monitoring (#153454)
Enable running the upload utilization logic from a local pointer instead of reading from S3; this could be useful for ROCm too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153454
Approved by: https://github.com/huydhn
2025-05-20 17:14:40 +00:00
Nikita Shulga
c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
Yang Wang
c54b9f2969 [Monitoring] Add util for linux build (#153456)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153456
Approved by: https://github.com/huydhn
2025-05-19 17:28:17 +00:00
Xuehai Pan
27f7b65a69 [BE] Ensure generated stub files by gen_pyi are properly formatted (#150730)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150730
Approved by: https://github.com/aorenste
2025-05-17 12:30:40 +00:00
PyTorch MergeBot
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
xinan.lin
a9adc9a9b6 [Linter] Add linter to detect device-bias hard code in test cases. (#152948)
Since XPU does not gate community pull requests, we’ve observed that contributors often hardcode "cuda" in functions decorated with @requires_gpu() when adding new test cases. This causes the tests to fail on XPU and breaks XPU CI.
This PR adds a linter to detect such issues automatically. An example is shown below.

```
  Error (TEST_DEVICE_BIAS) [device-bias]
    `@requires_gpu` function should not hardcode device='cuda'

        11670  |                .contiguous()
        11671  |            )
        11672  |
    >>> 11673  |        inp = torch.rand((64, 64), device="cuda") * 2 - 1
        11674  |        boundaries = torch.tensor([-0.9, -0.8, 0.1, 0.2, 0.5, 0.9])
        11675  |
        11676  |        self.common(fn, (inp, boundaries), check_lowp=False)

  Error (TEST_DEVICE_BIAS) [device-bias]
    `@requires_gpu` function should not hardcode .cuda() call

        11700  |            self.assertEqual(ref, res)
        11701  |
        11702  |            for offset2 in (0, 1, 2, 3, 4):
    >>> 11703  |                base2 = torch.randn(64 * 64 + 64, dtype=torch.float32).cuda()
        11704  |                inp2 = torch.as_strided(base2, (64, 64), (64, 1), offset2)
        11705  |                ref2 = fn(inp2)
        11706  |                res2 = fn_c(inp2)

  Error (TEST_DEVICE_BIAS) [device-bias]
    `@requires_gpu` function should not hardcode torch.device('cuda:0')

        11723  |            return x.sin() + x.cos()
        11724  |
        11725  |        base = torch.randn(
    >>> 11726  |            64 * 64 + 64, dtype=torch.float32, device=torch.device("cuda:0")
        11727  |        )
        11728  |
        11729  |        inp1 = torch.as_strided(base, (32, 32), (32, 1), 4)

  Error (TEST_DEVICE_BIAS) [device-bias]
    `@requires_gpu` function should not hardcode .to('cuda') call

        11771  |            torch.manual_seed(42)
        11772  |            base = torch.randn(64 * 64 + 64, dtype=torch.float32, device=self.device)
        11773  |            torch.manual_seed(42)
    >>> 11774  |            base_ref = torch.randn(64 * 64 + 64, dtype=torch.float32).to("cuda")
        11775  |
        11776  |            inp = torch.as_strided(base, size, stride, offset)
        11777  |            inp_ref = torch.as_strided(base_ref, size, stride, offset)
```
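
For readers curious how such a check might work, here is a minimal AST-based sketch. It is not the linter added in this PR: `find_hardcoded_cuda` is a hypothetical name, and it only covers the first reported pattern, a literal `device="cuda..."` keyword inside a function decorated with `requires_gpu`.

```python
import ast


def _is_requires_gpu(decorator: ast.expr) -> bool:
    # Accept both `@requires_gpu` and `@requires_gpu()`, bare or attribute-qualified.
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    name = target.attr if isinstance(target, ast.Attribute) else getattr(target, "id", None)
    return name == "requires_gpu"


def find_hardcoded_cuda(source: str) -> list[int]:
    """Return line numbers of `device="cuda..."` keywords inside @requires_gpu functions."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not any(_is_requires_gpu(d) for d in func.decorator_list):
            continue
        for node in ast.walk(func):
            if (
                isinstance(node, ast.keyword)
                and node.arg == "device"
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)
                and node.value.value.startswith("cuda")
            ):
                hits.append(node.value.lineno)
    return hits


sample = '''
@requires_gpu()
def test_bucketize(self):
    inp = torch.rand((64, 64), device="cuda")

def test_cpu_only(self):
    inp = torch.rand((64, 64), device="cuda")
'''
print(find_hardcoded_cuda(sample))
```

Only the decorated function is reported; the `.cuda()`, `.to("cuda")`, and `torch.device("cuda:0")` patterns from the output above would need additional node checks.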

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152948
Approved by: https://github.com/EikanWang, https://github.com/cyyever, https://github.com/malfet, https://github.com/jansel
2025-05-16 08:03:54 +00:00
Xuehai Pan
a4c828199e [BE] Add __all__ to torch/nn/functional.pyi and torch/return_types.pyi (#150729)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150729
Approved by: https://github.com/aorenste
2025-05-15 19:01:57 +00:00
Aaron Gokaslan
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed, and tried to fix a few errors based on whether we immediately reraised the exception or didn't print any exception information where it could be useful.
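
To illustrate what the rule is after, here is a small standalone sketch; the logger name and the `parse_port` helper are made up for this example. Inside an `except` block, `logger.exception` logs at ERROR level and appends the active traceback, which `logger.error` does not do by default.

```python
import io
import logging
from typing import Optional

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
logger = logging.getLogger("try400_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def parse_port(raw: str) -> Optional[int]:
    try:
        return int(raw)
    except ValueError:
        # logger.error(...) here would drop the traceback;
        # logger.exception(...) includes it automatically.
        logger.exception("could not parse port %r", raw)
        return None


parse_port("80a")
print(stream.getvalue())
```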

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
Xuehai Pan
f7a5aa1d8d [torchgen] Refactor and simplify gen_pyi.py to use Generic TypeAlias (PEP 585) and Union Type (PEP 604) (#150727)
https://github.com/pytorch/pytorch/pull/129001#discussion_r1645126801 is the motivation for the whole stack of PRs. In `torch/__init__.py`, `torch._C.Type` shadows `from typing import Type`, and there is no type stub for `torch._C.Type` in `torch/_C/__init__.pyi`. So we need to use `from typing import Type as _Type`. After enabling [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585) in the `.pyi` type stub files, we can use `type` instead of `typing.Type` or `from typing import Type as _Type`.

------

- [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585): e.g. `typing.List[T] -> list[T]`, `typing.Dict[KT, VT] -> dict[KT, VT]`, `typing.Type[T] -> type[T]`.
- [Union Type (PEP 604)](https://peps.python.org/pep-0604): e.g. `Union[X, Y] -> X | Y`, `Optional[X] -> X | None`, `Optional[Union[X, Y]] -> X | Y | None`.

Note that in `.pyi` stub files, we do not need `from __future__ import annotations`. So this PR does not violate issue #117449:

- #117449

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150727
Approved by: https://github.com/aorenste
ghstack dependencies: #150726
2025-05-15 09:36:42 +00:00
Xuehai Pan
014726d9d3 [torchgen] Refactor torchgen.utils.FileManager to accept pathlib.Path (#150726)
This PR allows `FileManager` to accept `pathlib.Path` as arguments while keeping the original `str` path support.

This allows us to simplify the code such as:

1. `os.path.join(..., ...)` with `Path.__truediv__(..., ...)` (the `/` operator).

95a5958db4/torchgen/utils.py (L155)

95a5958db4/torchgen/utils.py (L176)

2. `os.path.basename(...)` with `Path(...).name`.
 95a5958db4/torchgen/utils.py (L161)

3. Manual file extension split with `Path(...).with_stem(new_stem)`

95a5958db4/torchgen/utils.py (L241-L256)
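
The three simplifications can be sketched side by side. The paths below are invented for illustration and are not taken from `torchgen/utils.py`; `Path.with_stem` requires Python 3.9 or newer.

```python
import os
from pathlib import Path

# 1. Joining: os.path.join vs. the `/` operator (Path.__truediv__).
joined_old = os.path.join("build", "aten", "Functions.cpp")
joined_new = Path("build") / "aten" / "Functions.cpp"

# 2. Base name: os.path.basename vs. Path.name.
name_old = os.path.basename(joined_old)
name_new = joined_new.name

# 3. Changing the stem while keeping the extension:
#    manual splitext vs. Path.with_stem.
root, ext = os.path.splitext(joined_old)
renamed_old = f"{root}_0{ext}"
renamed_new = joined_new.with_stem(f"{joined_new.stem}_0")

print(joined_new, name_new, renamed_new)
```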

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150726
Approved by: https://github.com/aorenste
2025-05-15 02:52:24 +00:00
PyTorch MergeBot
f363a3f51a Revert "[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)"
This reverts commit 9386701b51.

Reverted https://github.com/pytorch/pytorch/pull/149282 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, see [D74729259](https://www.internalfb.com/diff/D74729259). @drisspg may you help out the author have their PR merged? ([comment](https://github.com/pytorch/pytorch/pull/149282#issuecomment-2881546951))
2025-05-14 20:53:49 +00:00
eqy
9386701b51 [cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for sm90, sm100 (#149282)
cleanup tuple/tensor boilerplate in cuDNN SDPA, preparation for nested/ragged tensor backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149282
Approved by: https://github.com/drisspg
2025-05-14 01:39:24 +00:00
Aaron Gokaslan
3555ebb63d [BE]: Update ruff to 0.11.8 (#153249)
Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere
2025-05-12 18:30:52 +00:00
Ke Wen
5bf0c3518c Detect NVSHMEM location (#153010)
### Changes
- Detect NVSHMEM install location via `sysconfig.get_path("purelib")`, which typically resolves to `<conda_env>/lib/python/site-packages`, and NVSHMEM include and lib live under `nvidia/nvshmem`
- Added link dir via `target_link_directories`
- Removed direct dependency on mlx5
- Added preload rule (following other NVIDIA libs)
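
The lookup described in the first bullet can be sketched in Python as follows. This only illustrates the path logic; the real detection is part of the build configuration and may differ. The helper names are hypothetical, and the `include` and `lib` subdirectory names are an assumption based on the bullet above.

```python
import sysconfig
import tempfile
from pathlib import Path
from typing import Optional


def nvshmem_candidate_dirs(purelib: Optional[str] = None) -> dict:
    """Build the expected NVSHMEM wheel layout under site-packages."""
    site_packages = Path(purelib or sysconfig.get_path("purelib"))
    root = site_packages / "nvidia" / "nvshmem"
    return {"root": root, "include": root / "include", "lib": root / "lib"}


def nvshmem_available(purelib: Optional[str] = None) -> bool:
    """True when both the include and lib directories exist."""
    dirs = nvshmem_candidate_dirs(purelib)
    return dirs["include"].is_dir() and dirs["lib"].is_dir()


# Demo against a temporary directory standing in for site-packages.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = nvshmem_available(tmp)
    (Path(tmp) / "nvidia" / "nvshmem" / "include").mkdir(parents=True)
    (Path(tmp) / "nvidia" / "nvshmem" / "lib").mkdir(parents=True)
    populated_result = nvshmem_available(tmp)

print(empty_result, populated_result)
```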

### Plan of Record
1. End user experience: link against NVSHMEM dynamically (NVSHMEM lib size is 100M, similar to NCCL, so we'd rather have users `pip install nvshmem` than have torch carry the bits)
2. Developer experience: at compile time, prefer the wheel dependency over a Git submodule
General rule: use a submodule for a small lib that torch can statically link with
If users pip install a lib, our CI build process should do the same, rather than building from a Git submodule (just for its header, for example)
3. Keep `USE_NVSHMEM` to gate non-Linux platforms, like Windows, Mac
4. At configuration time, we should be able to detect whether nvshmem is available, if not, we don't build `NVSHMEMSymmetricMemory` at all.

For now, we have symbol dependency on two particular libs from NVSHMEM:
- libnvshmem_host.so: contains host side APIs;
- libnvshmem_device.a: contains device-side global variables AND device function impls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153010
Approved by: https://github.com/ngimel, https://github.com/fduwjj, https://github.com/Skylion007
2025-05-07 23:35:04 +00:00
You Jiacheng
ee0cd1d8b5 Only do shallow clone when checkout nccl (#152826)
Note: `--depth` implies `--single-branch` since git 2.7.6

```sh
git clone https://github.com/NVIDIA/nccl.git
Cloning into 'nccl'...
remote: Enumerating objects: 4205, done.
remote: Counting objects: 100% (238/238), done.
remote: Compressing objects: 100% (122/122), done.
remote: Total 4205 (delta 144), reused 126 (delta 116), pack-reused 3967 (from 3)
Receiving objects: 100% (4205/4205), 4.22 MiB | 7.01 MiB/s, done.
Resolving deltas: 100% (2858/2858), done.
```
```sh
git clone --depth 1 --branch v2.25.1-1 https://github.com/NVIDIA/nccl.git
Cloning into 'nccl'...
remote: Enumerating objects: 249, done.
remote: Counting objects: 100% (249/249), done.
remote: Compressing objects: 100% (227/227), done.
remote: Total 249 (delta 31), reused 111 (delta 15), pack-reused 0 (from 0)
Receiving objects: 100% (249/249), 657.44 KiB | 2.14 MiB/s, done.
Resolving deltas: 100% (31/31), done.
Note: switching to '80f6bda4378b99d99e82b4d76a633791cc45fef0'.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152826
Approved by: https://github.com/albanD
2025-05-06 04:56:19 +00:00
albanD
22d1359bc6 Move warning from item to specific number conversions (#152709)
Follow up to https://github.com/pytorch/pytorch/pull/143261 to not warn when a plain .item() is done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152709
Approved by: https://github.com/malfet, https://github.com/ngimel
2025-05-05 20:46:05 +00:00
cyy
45efa1aaa8 [3/N] Use internal linkage in C++ files (#151297)
Follows #151070.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151297
Approved by: https://github.com/Skylion007
2025-05-05 17:48:39 +00:00
Tom Ritchford
2825a28bf1 Exempt overriding methods from docstring_linter (fix #151692) (#151906)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151906
Approved by: https://github.com/Skylion007
2025-05-05 12:39:42 +00:00
Michał Górny
5c0f474dac Do not check out nccl when not building it (#152533)
Add additional conditions to `build_pytorch_libs.py` to avoid fetching NCCL when `USE_CUDA` or `USE_NCCL` are disabled. While at it, adjust the existing condition for `USE_SYSTEM_NCCL` to use the utility function.
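
The resulting decision can be pictured with a short sketch. The flag names come from the description above, but `flag_enabled`, `should_checkout_nccl`, the set of falsy spellings, and the defaults are assumptions made for illustration, not the code in `build_pytorch_libs.py`.

```python
# Assumed set of values that count as "off"; the real utility may differ.
_FALSY = {"", "0", "OFF", "NO", "FALSE", "N"}


def flag_enabled(env: dict, name: str, default: bool) -> bool:
    """Interpret an environment-style flag, falling back to a default when unset."""
    value = env.get(name)
    if value is None:
        return default
    return value.strip().upper() not in _FALSY


def should_checkout_nccl(env: dict) -> bool:
    """Fetch NCCL only when CUDA and NCCL are on and no system NCCL is requested."""
    return (
        flag_enabled(env, "USE_CUDA", True)
        and flag_enabled(env, "USE_NCCL", True)
        and not flag_enabled(env, "USE_SYSTEM_NCCL", False)
    )


print(should_checkout_nccl({}))
print(should_checkout_nccl({"USE_CUDA": "0"}))
print(should_checkout_nccl({"USE_SYSTEM_NCCL": "1"}))
```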

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152533
Approved by: https://github.com/albanD
2025-05-02 16:31:03 +00:00
PyTorch MergeBot
1c04ea4e59 Revert "[torchgen] Refactor torchgen.utils.FileManager to accept pathlib.Path (#150726)"
This reverts commit 4b5b1adb21.

Reverted https://github.com/pytorch/pytorch/pull/150726 on behalf of https://github.com/malfet due to This breaks Windows builds, see a765e2ddda/1 ([comment](https://github.com/pytorch/pytorch/pull/150726#issuecomment-2845858846))
2025-05-01 21:52:35 +00:00
Xuehai Pan
4b5b1adb21 [torchgen] Refactor torchgen.utils.FileManager to accept pathlib.Path (#150726)
This PR allows `FileManager` to accept `pathlib.Path` as arguments while keeping the original `str` path support.

This allows us to simplify the code such as:

1. `os.path.join(..., ...)` with `Path.__truediv__(..., ...)` (the `/` operator).

95a5958db4/torchgen/utils.py (L155)

95a5958db4/torchgen/utils.py (L176)

2. `os.path.basename(...)` with `Path(...).name`.
 95a5958db4/torchgen/utils.py (L161)

3. Manual file extension split with `Path(...).with_stem(new_stem)`

95a5958db4/torchgen/utils.py (L241-L256)

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150726
Approved by: https://github.com/zou3519
2025-05-01 17:43:16 +00:00
Pian Pawakapan
632b89af43 [dynamic shapes] support SymInt inputs for kthvalue (#152151)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152151
Approved by: https://github.com/tugsbayasgalan, https://github.com/malfet
2025-05-01 03:47:23 +00:00
Camyll Harajli
b22fda9e1c Remove conda refs in tools (#152368)
Fixes #152126

Did not find references in the two .ipynb files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152368
Approved by: https://github.com/atalman
2025-04-29 02:45:47 +00:00
Anthony Shoumikhin
7cae7902a2 Add scripts to check xrefs and urls (#151844)
Traverses the docs and code to find any broken links
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151844
Approved by: https://github.com/huydhn
2025-04-28 09:30:07 +00:00
Anthony Shoumikhin
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
sumantro93
017a6bd593 add min/max_seqlen to non_differentiable (#151750)
Fixes #148988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151750
Approved by: https://github.com/soulitzer
2025-04-22 21:46:02 +00:00
Junjie Wang (PyTorch)
95abc0f515 [c10d][fr] Fix another bug when we should continue when the op list is empty (#151798)
Differential Revision: D73375318

We shouldn't check the op list when it is empty. Later, once it is empty and we pop it from the queue, we check for collective matching. Added a unit test for this case and also covered the case fixed in https://github.com/pytorch/pytorch/pull/151683 in the unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151798
Approved by: https://github.com/d4l3k, https://github.com/wconstab, https://github.com/fegin
2025-04-22 04:43:31 +00:00
Junjie Wang (PyTorch)
6e7b6e8d57 [c10d][fr] Fix a bug when first rank is not zero in the script (#151683)
Summary: Further testing the script, we found that we shouldn't always assume rank 0 is the first rank, so we need to check all entries and see if it is a P2P op for this coalesced group.

Test Plan: Directly test with corner case.

Differential Revision: D73266257

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151683
Approved by: https://github.com/fegin
2025-04-18 20:55:06 +00:00
Jithun Nair
b4550541ea [ROCm] upgrade nightly wheels to rocm6.4 (#151355)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151355
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-04-17 17:29:07 +00:00
fduwjj
6f9ffaa991 [c10d][fr] Fix script for uneven reduce scatter and update test cases (#151475)
Somehow the type string for reduce scatter is "REDUCE_SCATTER" not "REDUCESCATTER". This PR fixed it and added more test cases.

Differential Revision: [D73141245](https://our.internmc.facebook.com/intern/diff/D73141245)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151475
Approved by: https://github.com/fegin
2025-04-17 02:11:08 +00:00
fduwjj
ae648f047c [c10d][fr] Enable FR analysis script for rest of all coalesce op (#151247)
We revisited how coalesced collective is working in https://github.com/pytorch/pytorch/pull/151243 and we now want to enable the script to work for slow path. The change is indeed bc-breaking but this is needed to make it work and the API is an internal use API. It is not user facing. For slow path the individual has input-sizes and output sizes recorded but no state. The final one has the state ready. We check the correctness of each individual collective one by one but we don't check the state match for these collectives, we can only check the state match for the last one which is the work item with coalesced label.

Added more unit tests for slow path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151247
Approved by: https://github.com/d4l3k, https://github.com/XilunWu
2025-04-15 20:53:03 +00:00
fduwjj
48b4bc1640 [c10d][fr] Enable FR analysis script for all fast-path coalesce op (#151243)
This PR enables FR for all coalesced ops on the fast path. (Batch P2P is already enabled in the current script, so we mainly focus on non-P2P ops.) To explain what the fast path is, let's revisit how coalesced collectives work today:

For non-P2P coalesced ops, there are several ways to call them (for legacy reasons):

- Way one: Directly call a Python API like all_reduce_coalesced; this will be deprecated soon.
- Way two: Directly call an API inside PGNCCL like allreduce_coalesced. Way one eventually calls into this. This is not deprecated and will not be deprecated, IIUC.
- Way three: Use _coalescing_manager in Python, like:
```
with _coalescing_manager():
    for i in range(num_colls):
        dist.all_reduce(tensors[i])
```
This way has two paths:
   - Fast path: when users call all-reduce, all-gather-into-tensor or reduce-scatter, we launch only one big collective by calling the API from way one.
   - Slow path: we call startCoalescing() at the beginning, then a bunch of collectives (each one generates an FR entry), and then endCoalescing(). Inside startCoalescing(), groupStart() is called, and inside endCoalescing(), groupEnd() is called. So although this ends up as one collective, we call into PGNCCL for each coalesced collective in the slow path case.
   - Uneven all-gather (allgather_v) and reduce-scatter follow the pattern mentioned in the slow path. They directly call the C++ API inside PGNCCL.

This PR addresses the fast path because it is the easy case: we store the collectives' info on the Python side and call into PGNCCL only once, so there is only one work and one FR entry. We can just treat them as a regular coalesced collective.

We add some e2e unit tests for the build_db function so that the change to FR is more thoroughly tested.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151243
Approved by: https://github.com/d4l3k, https://github.com/wz337
2025-04-15 04:08:28 +00:00
fduwjj
48132de4af [c10d][fr] Fix the false positive in the dtype check in fr analysis script (#151063)
When checking dtype in the FR analysis script, we should only check it when the input or output numel is larger than zero. For gather or scatter, the output/input size will be an empty list on non-src or non-dst ranks, in which case we should just skip the check.
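
One plausible reading of this rule is sketched below. The function names, argument layout, and dtype strings are hypothetical and do not reflect the script's actual data structures; the point is only that an empty size list counts as zero elements and suppresses the comparison.

```python
import math


def total_numel(sizes: list[list[int]]) -> int:
    """Sum of element counts over all tensors; an empty list yields 0."""
    return sum(math.prod(shape) for shape in sizes)


def dtype_mismatch(sizes_a, dtype_a, sizes_b, dtype_b) -> bool:
    """Report a mismatch only when both sides actually carry data."""
    if total_numel(sizes_a) == 0 or total_numel(sizes_b) == 0:
        # e.g. a non-dst rank in gather: nothing to compare, so skip.
        return False
    return dtype_a != dtype_b


print(dtype_mismatch([[4, 4]], "Float", [[4, 4]], "Half"))
print(dtype_mismatch([[4, 4]], "Float", [], "Half"))
```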

Differential Revision: [D72826823](https://our.internmc.facebook.com/intern/diff/D72826823)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151063
Approved by: https://github.com/d4l3k, https://github.com/kwen2501
2025-04-11 02:11:58 +00:00
Tom Ritchford
596e44d26a [inductor] Enable docstring_linter on _inductor (#144622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144622
Approved by: https://github.com/eellison
ghstack dependencies: #144621
2025-04-10 14:32:26 +00:00
Tom Ritchford
ba35793226 [inductor] Add tests for new docstring_linter features (fix #142496) (#144621)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144621
Approved by: https://github.com/eellison
2025-04-10 14:32:26 +00:00
cyy
322f883c0c Remove unneeded CUDA logic from _create_build_env (#145822)
Because FindCUDAToolkit.cmake has that logic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145822
Approved by: https://github.com/albanD
2025-04-10 02:17:28 +00:00
Tom Ritchford
31fe258efc [inductor] Add features to docstring_linter (see #142496) (#145834)
## Improvements to `docstring_linter`

* Add a "grandfather list" of existing undocumented classes and functions (`--grandfather`, `--grandfather-tolerance`, `--no-grandfather`, `--write-grandfather`)
* In classes, now just one of the class itself or its `__init__()` method needs to be documented (`--lint-init` turns the old behavior back on)
* Now classes and functions defined local to other functions do not need to be documented (`--lint-local` turns the old behavior back on)
* New `--report` flag produces a compact report of long, undocumented classes or function definitions: see attached example run over all pytorch: [pytorch-docs.json](https://github.com/user-attachments/files/18455981/pytorch-docs.json)

## Help text

```
$ python tools/linter/adapters/docstring_linter.py --help
usage: docstring_linter.py [-h] [-l] [-v] [--grandfather GRANDFATHER] [--grandfather-tolerance GRANDFATHER_TOLERANCE] [--lint-init]
                           [--lint-local] [--lint-protected] [--max-class MAX_CLASS] [--max-def MAX_DEF]
                           [--min-docstring MIN_DOCSTRING] [--no-grandfather] [--report] [--write-grandfather]
                           [files ...]

`docstring_linter` reports on long functions, methods or classes without docstrings

positional arguments:
  files                 A list of files or directories to lint

optional arguments:
  -h, --help            show this help message and exit
  -l, --lintrunner      Run for lintrunner and print LintMessages which aren't edits
  -v, --verbose         Print more debug info
  --grandfather GRANDFATHER, -g GRANDFATHER
                        Set the grandfather list
  --grandfather-tolerance GRANDFATHER_TOLERANCE, -t GRANDFATHER_TOLERANCE
                        Tolerance for grandfather sizes, in percent
  --lint-init, -i       Lint __init__ and class separately
  --lint-local, -o      Lint definitions inside other functions
  --lint-protected, -p  Lint functions, methods and classes that start with _
  --max-class MAX_CLASS, -c MAX_CLASS
                        Maximum number of lines for an undocumented class
  --max-def MAX_DEF, -d MAX_DEF
                        Maximum number of lines for an undocumented function
  --min-docstring MIN_DOCSTRING, -s MIN_DOCSTRING
                        Minimum number of characters for a docstring
  --no-grandfather, -n  Disable the grandfather list
  --report, -r          Print a report on all classes and defs
  --write-grandfather, -w
                        Rewrite the grandfather list
```
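
The core idea, reporting long definitions that lack a docstring, can be approximated with the standard `ast` module. This is a simplified sketch and not the linter's implementation: `long_undocumented` is a hypothetical name, its parameters merely mirror the `--max-def` and `--max-class` flags, and it ignores the grandfather list and the `__init__`, local, and protected-name rules.

```python
import ast


def long_undocumented(source: str, max_def: int, max_class: int) -> list[tuple[str, int]]:
    """Return (name, line_count) for long functions/classes without a docstring."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            limit = max_def
        elif isinstance(node, ast.ClassDef):
            limit = max_class
        else:
            continue
        length = node.end_lineno - node.lineno + 1
        if length > limit and ast.get_docstring(node) is None:
            results.append((node.name, length))
    return results


sample = (
    "def documented():\n"
    '    """Has a docstring."""\n'
    "    a = 1\n"
    "    b = 2\n"
    "    return a + b\n"
    "\n"
    "def undocumented():\n"
    "    a = 1\n"
    "    b = 2\n"
    "    c = 3\n"
    "    return a + b + c\n"
)
print(long_undocumented(sample, max_def=3, max_class=3))
```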

---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145834
Approved by: https://github.com/amjames, https://github.com/eellison
2025-04-09 21:38:36 +00:00
fduwjj
8aaf296efc [c10d][fr] Refactor analysis script for modularization and reusing for coalesce collectives (#150881)
Trying to make the code of FR analysis more reusable and modularized. So we split core error analysis logic into separate functions.

This PR mostly shuffles the code around a bit.

Differential Revision: [D72690120](https://our.internmc.facebook.com/intern/diff/D72690120)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150881
Approved by: https://github.com/wz337
2025-04-09 16:10:19 +00:00
Natalia Gimelshein
55e62ff74a bf16 grouped gemm (#150374)
Enabled bf16 grouped gemm with an API similar to _scaled_group_gemm, except without scale and fast accum arguments. All transpose variants are enabled, unlike scaled gemm. Ideally we'd factor out a lot more code from scaled gemm, currently there's a lot of repetition between scaled and non-scaled versions. I factored out only a helper kernel that prepares arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150374
Approved by: https://github.com/drisspg
2025-04-06 04:53:24 +00:00
Xuehai Pan
ae74ef9d53 Set proper LD_LIBRARY_PATH on Linux in nightly venv in nightly pull tool (#143262)
Before this change:

```console
$ make setup-env-cuda PYTHON="${HOMEBREW_PREFIX}/bin/python3.12"
$ source venv/bin/activate
$ python3 -c 'import torch'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/PanXuehai/Projects/pytorch/torch/__init__.py", line 379, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: libcudnn.so.9: cannot open shared object file: No such file or directory
```

This PR adds `site-packages/nvidia/**/lib` to `LD_LIBRARY_PATH` in the `venv/bin/activate` script so that NVIDIA PyPI packages can be loaded correctly.
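
The path computation can be sketched in Python. The actual change edits the shell `activate` script, so this only illustrates the glob-and-join logic; both helper names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def nvidia_lib_dirs(site_packages: Path) -> list[str]:
    """Collect every `lib` directory found under site-packages/nvidia."""
    return sorted(str(p) for p in site_packages.glob("nvidia/**/lib") if p.is_dir())


def extended_ld_library_path(site_packages: Path, current: str = "") -> str:
    """Prepend the NVIDIA lib directories to an existing LD_LIBRARY_PATH value."""
    parts = nvidia_lib_dirs(site_packages)
    if current:
        parts.append(current)
    return os.pathsep.join(parts)


# Demo against a temporary directory standing in for site-packages.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "nvidia" / "cudnn" / "lib").mkdir(parents=True)
    (site / "nvidia" / "cublas" / "lib").mkdir(parents=True)
    demo_dirs = [Path(d).parent.name for d in nvidia_lib_dirs(site)]
    demo_path = extended_ld_library_path(site, "/usr/lib")

print(demo_dirs)
print(demo_path)
```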

See also:

- #141837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143262
Approved by: https://github.com/malfet
2025-04-01 16:51:02 +00:00
FFFrog
36f2d0aaba Add "xpu" to __all__ for torch/version.py (#149695)
As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149695
Approved by: https://github.com/desertfire, https://github.com/guangyey
2025-04-01 08:44:51 +00:00
Phillip Liu
31634b8c6a [fr] Added protection against missing stack frames in fr cont. (#150133)
Summary: Previously we had D70358287, which didn't fully resolve the issue.

Test Plan:
# FR
`buck2 run @//mode/opt //caffe2/fb/flight_recorder:fr_trace -- --mast_job_id f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0 --bucket tlcm_log_blob --world_size 128 --dump_file_name_offset 0 --allow-incomplete-ranks`
Confirm no error
# FR analyzer
`buck2 run @//mode/opt //investigations/dr_patternson/analyzers/ai_observability:ai_observability-all-analyzers-cli -- flight_recorder_analyzer --mast_job_name f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0`
Confirm no error

Differential Revision: D71998980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150133
Approved by: https://github.com/fduwjj
2025-04-01 03:07:59 +00:00
Daniël de Kok
fdc4394b16 Do not fetch NCCL when system NCCL is used (#149607)
We are compiling PyTorch in a sandbox without networking. Unconditionally fetching breaks the build and is not needed when a system NCCL is used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149607
Approved by: https://github.com/malfet
2025-03-28 05:06:49 +00:00
vasiliy
e33bc41958 add torch.float4_e2m1fn_x2 to PyTorch (#148791)
Summary:

Redo of https://github.com/pytorch/pytorch/pull/146578 to get around
rebase conflicts.

Test Plan:

```
pytest test/quantization/core/experimental/test_floatx.py -s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148791
Approved by: https://github.com/drisspg, https://github.com/eqy, https://github.com/jeffdaily
2025-03-27 17:32:20 +00:00