pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Yuanhao Ji	0a7eef140b	Add `torch.Tensor._make_wrapper_subclass` to `torch/_C/__init__.pyi` (#154022 ) Fixes #153790 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154022 Approved by: https://github.com/Skylion007	2025-05-27 14:10:00 +00:00
dependabot[bot]	ed27ee8355	Bump setuptools from 70.0.0 to 78.1.1 in /tools/build/bazel (#154075 ) Bumps [setuptools](https://github.com/pypa/setuptools) from 70.0.0 to 78.1.1. Changelog (sourced from [setuptools's changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst), truncated): v78.1.1 Bugfixes: More fully sanitized the filename in PackageIndex._download. ([#4946](https://redirect.github.com/pypa/setuptools/issues/4946)) v78.1.0 Features: Restore access to _get_vc_env with a warning. ([#4874](https://redirect.github.com/pypa/setuptools/issues/4874)) v78.0.2 Bugfixes: Postponed removals of deprecated dash-separated and uppercase fields in `setup.cfg`. All packages with deprecated configurations are advised to move before 2026. ([#4911](https://redirect.github.com/pypa/setuptools/issues/4911)) v78.0.1 Misc: [#4909](https://redirect.github.com/pypa/setuptools/issues/4909) v78.0.0 Bugfixes: Reverted distutils changes that broke the monkey patching of command classes. ([#4902](https://redirect.github.com/pypa/setuptools/issues/4902)) Deprecations and Removals: Setuptools no longer accepts options containing uppercase or dash characters in `setup.cfg`. Commits: - `8e4868a` Bump version: 78.1.0 → 78.1.1 - `100e9a6` Merge pull request [#4951](https://redirect.github.com/pypa/setuptools/issues/4951) - `8faf1d7` Add news fragment. - `2ca4a9f` Rely on re.sub to perform the decision in one expression. - `e409e80` Extract _sanitize method for sanitizing the filename. - `250a6d1` Add a check to ensure the name resolves relative to the tmpdir. - `d8390fe` Extract _resolve_download_filename with test. - `4e1e893` Merge https://github.com/jaraco/skeleton - `3a3144f` Fix typo: `pyproject.license` -> `project.license` ([#4931](https://redirect.github.com/pypa/setuptools/issues/4931)) - `d751068` Fix typo: pyproject.license -> project.license - Additional commits viewable in [compare view](https://github.com/pypa/setuptools/compare/v70.0.0...v78.1.1) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154075 Approved by: https://github.com/Skylion007 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-05-25 15:13:03 +00:00
Tom Ritchford	9a8c42ff94	Get rid of unused code in linters (#154043 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154043 Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007	2025-05-22 15:24:54 +00:00
Jane Xu	8817e5ac80	Render Example: and not Example:: in docs (#153978 ) Everything here is a grep except the changes in tools/autograd/load_derivatives.py, which I manually corrected. The correct notation has a blank line between `Example::` and the indented `>>>` line: ``` Example:: >>> ... ``` It is common and wrong to omit that blank line: ``` Example:: >>> ... ``` In the wrong example, we get these pesky double colons: ![image](https://github.com/user-attachments/assets/20ffd349-68bb-4552-966c-e23923350476) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153978 Approved by: https://github.com/soulitzer, https://github.com/malfet	2025-05-21 01:03:26 +00:00
Jane Xu	fc33da410f	Add torch/header_only_apis.txt and enforce they're tested (#153635 ) This PR adds enforcement of testing header only APIs. The benefit of torch/header_only_apis.txt is twofold: 1) this gives us a clear view of what we expect to be header only 2) this allows us to enforce testing The enforcement added in this PR is very basic--we literally string match that a symbol in `torch/header_only_apis.txt` is in a cpp test. This is meant to be a first step in verifying our APIs are properly tested and can get fancier over time. For now, I've added myself as a codeowner to learn what to look out for in terms of proper tests. Over time, I anticipate we can automate more steps, but right now let's just get something out the door. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153635 Approved by: https://github.com/albanD ghstack dependencies: #153965	2025-05-20 23:42:24 +00:00
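The string-match enforcement described in #153635 can be sketched roughly as follows. This is a minimal illustration, not the linter that landed: the function name `find_untested_symbols`, the one-symbol-per-line layout, and the `#` comment convention are all assumptions.

```python
def find_untested_symbols(api_list_text: str, test_sources: list[str]) -> list[str]:
    """Return symbols from the API list that appear in none of the test sources."""
    # Assumed file layout: one symbol per line, blank lines and '#' comments ignored.
    symbols = [
        line.strip()
        for line in api_list_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # Plain substring match, mirroring the "literally string match" wording.
    return [s for s in symbols if not any(s in src for src in test_sources)]


# Hypothetical symbol names and test source, for illustration only.
api_list = "# header-only symbols\nfoo_header_only\nbar_header_only\n"
tests = ["TEST(Foo, Works) { foo_header_only(1); }"]
untested = find_untested_symbols(api_list, tests)
```

Here `untested` lists every symbol that no test file mentions, which a lint step could turn into errors.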
Jane Xu	8f943046f8	[BE] light cleanups to linter logic (#153965 ) Some BE cleanup on other lint things I saw while working on the top of this stack Pull Request resolved: https://github.com/pytorch/pytorch/pull/153965 Approved by: https://github.com/soulitzer	2025-05-20 21:28:48 +00:00
Yang Wang	335c89c6f1	[Monitoring] enable local logs and add mac test monitoring (#153454 ) Enables running the upload-utilization logic from a local pointer instead of reading from S3; this could be useful for ROCm too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153454 Approved by: https://github.com/huydhn	2025-05-20 17:14:40 +00:00
Nikita Shulga	c4d1ff02f8	[Lint] Update clang-format to 19.1.4 (#153889 ) All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889 Approved by: https://github.com/cyyever, https://github.com/atalman	2025-05-20 14:12:46 +00:00
Yang Wang	c54b9f2969	[Monitoring] Add util for linux build (#153456 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153456 Approved by: https://github.com/huydhn	2025-05-19 17:28:17 +00:00
Xuehai Pan	27f7b65a69	[BE] Ensure generated stub files by `gen_pyi` are properly formatted (#150730 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150730 Approved by: https://github.com/aorenste	2025-05-17 12:30:40 +00:00
PyTorch MergeBot	3443627e07	Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473 )" This reverts commit `4f4ecc583e`. Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))	2025-05-16 08:29:26 +00:00
xinan.lin	a9adc9a9b6	[Linter] Add linter to detect device-bias hard code in test cases. (#152948 ) Since XPU does not gate community pull requests, we’ve observed that contributors often hardcode "cuda" in functions decorated with @requires_gpu() when adding new test cases. This causes the tests to fail on XPU and breaks XPU CI. This PR adds a linter to detect such issues automatically. An example is shown below. ``` Error (TEST_DEVICE_BIAS) [device-bias] `@requires_gpu` function should not hardcode device='cuda' 11670 \| .contiguous() 11671 \| ) 11672 \| >>> 11673 \| inp = torch.rand((64, 64), device="cuda") * 2 - 1 11674 \| boundaries = torch.tensor([-0.9, -0.8, 0.1, 0.2, 0.5, 0.9]) 11675 \| 11676 \| self.common(fn, (inp, boundaries), check_lowp=False) Error (TEST_DEVICE_BIAS) [device-bias] `@requires_gpu` function should not hardcode .cuda() call 11700 \| self.assertEqual(ref, res) 11701 \| 11702 \| for offset2 in (0, 1, 2, 3, 4): >>> 11703 \| base2 = torch.randn(64 * 64 + 64, dtype=torch.float32).cuda() 11704 \| inp2 = torch.as_strided(base2, (64, 64), (64, 1), offset2) 11705 \| ref2 = fn(inp2) 11706 \| res2 = fn_c(inp2) Error (TEST_DEVICE_BIAS) [device-bias] `@requires_gpu` function should not hardcode torch.device('cuda:0') 11723 \| return x.sin() + x.cos() 11724 \| 11725 \| base = torch.randn( >>> 11726 \| 64 * 64 + 64, dtype=torch.float32, device=torch.device("cuda:0") 11727 \| ) 11728 \| 11729 \| inp1 = torch.as_strided(base, (32, 32), (32, 1), 4) Error (TEST_DEVICE_BIAS) [device-bias] `@requires_gpu` function should not hardcode .to('cuda') call 11771 \| torch.manual_seed(42) 11772 \| base = torch.randn(64 * 64 + 64, dtype=torch.float32, device=self.device) 11773 \| torch.manual_seed(42) >>> 11774 \| base_ref = torch.randn(64 * 64 + 64, dtype=torch.float32).to("cuda") 11775 \| 11776 \| inp = torch.as_strided(base, size, stride, offset) 11777 \| inp_ref = torch.as_strided(base_ref, size, stride, offset) ``` Pull Request resolved: 
https://github.com/pytorch/pytorch/pull/152948 Approved by: https://github.com/EikanWang, https://github.com/cyyever, https://github.com/malfet, https://github.com/jansel	2025-05-16 08:03:54 +00:00
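A check of the kind #152948 describes could be approximated with Python's `ast` module. The sketch below is an assumption about the approach, not the linter from the PR; it covers only the `device="cuda"` keyword and `.cuda()` cases, and `find_device_bias` is a hypothetical name.

```python
import ast

SOURCE = '''
@requires_gpu()
def test_a(self):
    inp = torch.rand((64, 64), device="cuda")
    return inp.cuda()

def test_b(self):
    return torch.rand(4, device="cuda")
'''


def _is_requires_gpu(decorator: ast.expr) -> bool:
    # Accept both `@requires_gpu` and `@requires_gpu()`.
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return isinstance(target, ast.Name) and target.id == "requires_gpu"


def find_device_bias(source: str) -> list[tuple[int, str]]:
    """Return (line, reason) pairs for hardcoded CUDA use in @requires_gpu functions."""
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        if not any(_is_requires_gpu(d) for d in func.decorator_list):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            # Case 1: a `device="cuda..."` keyword argument.
            for kw in node.keywords:
                if (
                    kw.arg == "device"
                    and isinstance(kw.value, ast.Constant)
                    and isinstance(kw.value.value, str)
                    and kw.value.value.startswith("cuda")
                ):
                    findings.append((node.lineno, "device='cuda'"))
            # Case 2: a `.cuda()` method call.
            if isinstance(node.func, ast.Attribute) and node.func.attr == "cuda":
                findings.append((node.lineno, ".cuda() call"))
    return findings


findings = sorted(find_device_bias(SOURCE))
```

Only `test_a` is reported, because `test_b` carries no `@requires_gpu` decorator.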
Xuehai Pan	a4c828199e	[BE] Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi` (#150729 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150729 Approved by: https://github.com/aorenste	2025-05-15 19:01:57 +00:00
Aaron Gokaslan	4f4ecc583e	[BE]: Enable RUFF TRY400 rule - log.exception (#153473 ) Change logging.error to logging.exception to log additional information when relevant. A few logging.error calls have slipped into try/except blocks since I last did a cleanup here, and the rule is now stabilized, so I am enabling it codebase-wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed, and tried to fix a few errors based on whether we immediately reraised the exception or didn't print any exception handling where it could be useful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473 Approved by: https://github.com/albanD, https://github.com/cyyever	2025-05-15 13:36:59 +00:00
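The difference the TRY400 rule targets can be seen in a small standard-library example. The logger name and messages are made up for illustration.

```python
import io
import logging

# Capture log output in memory so the two calls can be compared.
stream = io.StringIO()
logger = logging.getLogger("try400_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False

try:
    int("not a number")
except ValueError:
    # logging.error records only the message.
    logger.error("parse failed (error)")
    without_trace = stream.getvalue()
    # logging.exception records the message plus the active traceback.
    logger.exception("parse failed (exception)")
    with_trace = stream.getvalue()
```

Inside an `except` block, `logger.exception` is equivalent to `logger.error(..., exc_info=True)`, which is why the rule prefers it there.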
Xuehai Pan	f7a5aa1d8d	[torchgen] Refactor and simplify `gen_pyi.py` to use Generic TypeAlias (PEP 585) and Union Type (PEP 604) (#150727 ) https://github.com/pytorch/pytorch/pull/129001#discussion_r1645126801 is the motivation for the whole stack of PRs. In `torch/__init__.py`, `torch._C.Type` shadows `from typing import Type`, and there is no type stub for `torch._C.Type` in `torch/_C/__init__.pyi`. So we need to use `from typing import Type as _Type`. After enabling [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585) in the `.pyi` type stub files, we can use `type` instead of `typing.Type` or `from typing import Type as _Type`. ------ - [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585): e.g. `typing.List[T] -> list[T]`, `typing.Dict[KT, VT] -> dict[KT, VT]`, `typing.Type[T] -> type[T]`. - [Union Type (PEP 604)](https://peps.python.org/pep-0604): e.g. `Union[X, Y] -> X \| Y`, `Optional[X] -> X \| None`, `Optional[Union[X, Y]] -> X \| Y \| None`. Note that in `.pyi` stub files, we do not need `from __future__ import annotations`. So this PR does not violate issue #117449: - #117449 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/150727 Approved by: https://github.com/aorenste ghstack dependencies: #150726	2025-05-15 09:36:42 +00:00
Xuehai Pan	014726d9d3	[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path` (#150726 ) This PR allows `FileManager` to accept `pathlib.Path` as arguments while keeping the original `str` path support. This allows us to simplify the code by replacing: 1. `os.path.join(..., ...)` with `Path.__truediv__(..., ...)` (the `/` operator). `95a5958db4/torchgen/utils.py (L155)` `95a5958db4/torchgen/utils.py (L176)` 2. `os.path.basename(...)` with `Path(...).name`. `95a5958db4/torchgen/utils.py (L161)` 3. Manual file extension split with `Path(...).with_stem(new_stem)` `95a5958db4/torchgen/utils.py (L241-L256)` ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/150726 Approved by: https://github.com/aorenste	2025-05-15 02:52:24 +00:00
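The three substitutions listed in #150726 can be illustrated with standard-library calls only. The directory and file names are hypothetical, and `with_stem` assumes Python 3.9 or newer.

```python
import os
from pathlib import Path

install_dir = "build/generated"  # hypothetical output directory
filename = "Functions.cpp"       # hypothetical generated file

# 1. os.path.join(...) versus the `/` operator.
joined_old = os.path.join(install_dir, filename)
joined_new = Path(install_dir) / filename

# 2. os.path.basename(...) versus Path(...).name.
base_old = os.path.basename(joined_old)
base_new = joined_new.name

# 3. Manual extension split versus Path(...).with_stem(new_stem).
root, ext = os.path.splitext(joined_old)
renamed_old = f"{root}_0{ext}"
renamed_new = joined_new.with_stem(joined_new.stem + "_0")
```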
PyTorch MergeBot	f363a3f51a	Revert "[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100` (#149282 )" This reverts commit `9386701b51`. Reverted https://github.com/pytorch/pytorch/pull/149282 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, see [D74729259](https://www.internalfb.com/diff/D74729259). @drisspg may you help out the author have their PR merged? ([comment](https://github.com/pytorch/pytorch/pull/149282#issuecomment-2881546951))	2025-05-14 20:53:49 +00:00
eqy	9386701b51	[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100` (#149282 ) cleanup tuple/tensor boilerplate in cuDNN SDPA, preparation for nested/ragged tensor backward Pull Request resolved: https://github.com/pytorch/pytorch/pull/149282 Approved by: https://github.com/drisspg	2025-05-14 01:39:24 +00:00
Aaron Gokaslan	3555ebb63d	[BE]: Update ruff to 0.11.8 (#153249 ) Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249 Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere	2025-05-12 18:30:52 +00:00
Ke Wen	5bf0c3518c	Detect NVSHMEM location (#153010 ) ### Changes - Detect NVSHMEM install location via `sysconfig.get_path("purelib")`, which typically resolves to `<conda_env>/lib/python/site-packages`; NVSHMEM include and lib live under `nvidia/nvshmem` - Added link dir via `target_link_directories` - Removed direct dependency on mlx5 - Added preload rule (following other NVIDIA libs) ### Plan of Record 1. End user experience: link against NVSHMEM dynamically (NVSHMEM lib size is 100M, similar to NCCL, thus we'd like users to `pip install nvshmem` rather than have torch carry the bits) 2. Developer experience: at compile time, prefer a wheel dependency over a Git submodule. General rule: use a submodule for a small lib that torch can statically link with. If users pip install a lib, our CI build process should do the same, rather than building from a Git submodule (just for its header, for example) 3. Keep `USE_NVSHMEM` to gate non-Linux platforms, like Windows, Mac 4. At configuration time, we should be able to detect whether nvshmem is available; if not, we don't build `NVSHMEMSymmetricMemory` at all. For now, we have symbol dependency on two particular libs from NVSHMEM: - libnvshmem_host.so: contains host side APIs; - libnvshmem_device.a: contains device-side global variables AND device function impls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153010 Approved by: https://github.com/ngimel, https://github.com/fduwjj, https://github.com/Skylion007	2025-05-07 23:35:04 +00:00
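The detection step in #153010 might look roughly like the following on the Python side. `find_nvshmem` is a hypothetical helper, and the `include` and `lib` subdirectory names under `nvidia/nvshmem` are assumptions based on the wording above.

```python
import sysconfig
from pathlib import Path


def find_nvshmem(site_packages=None):
    """Return (include_dir, lib_dir) if an NVSHMEM wheel layout is present, else None."""
    # sysconfig.get_path("purelib") is the site-packages directory of this interpreter.
    base = Path(site_packages or sysconfig.get_path("purelib"))
    root = base / "nvidia" / "nvshmem"
    include_dir, lib_dir = root / "include", root / "lib"
    if include_dir.is_dir() and lib_dir.is_dir():
        return include_dir, lib_dir
    return None
```

A build script could skip the NVSHMEM-dependent targets whenever this returns `None`, in line with item 4 of the plan of record.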
You Jiacheng	ee0cd1d8b5	Only do shallow clone when checkout nccl (#152826 ) Note: `--depth` implies `--single-branch` since git 2.7.6 ```sh git clone https://github.com/NVIDIA/nccl.git Cloning into 'nccl'... remote: Enumerating objects: 4205, done. remote: Counting objects: 100% (238/238), done. remote: Compressing objects: 100% (122/122), done. remote: Total 4205 (delta 144), reused 126 (delta 116), pack-reused 3967 (from 3) Receiving objects: 100% (4205/4205), 4.22 MiB \| 7.01 MiB/s, done. Resolving deltas: 100% (2858/2858), done. ``` ```sh git clone --depth 1 --branch v2.25.1-1 https://github.com/NVIDIA/nccl.git Cloning into 'nccl'... remote: Enumerating objects: 249, done. remote: Counting objects: 100% (249/249), done. remote: Compressing objects: 100% (227/227), done. remote: Total 249 (delta 31), reused 111 (delta 15), pack-reused 0 (from 0) Receiving objects: 100% (249/249), 657.44 KiB \| 2.14 MiB/s, done. Resolving deltas: 100% (31/31), done. Note: switching to '80f6bda4378b99d99e82b4d76a633791cc45fef0'. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/152826 Approved by: https://github.com/albanD	2025-05-06 04:56:19 +00:00
albanD	22d1359bc6	Move warning from item to specific number conversions (#152709 ) Follow up to https://github.com/pytorch/pytorch/pull/143261 to not warn when a plain .item() is done. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152709 Approved by: https://github.com/malfet, https://github.com/ngimel	2025-05-05 20:46:05 +00:00
cyy	45efa1aaa8	[3/N] Use internal linkage in C++ files (#151297 ) Follows #151070. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151297 Approved by: https://github.com/Skylion007	2025-05-05 17:48:39 +00:00
Tom Ritchford	2825a28bf1	Exempt overriding methods from docstring_linter (fix #151692 ) (#151906 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151906 Approved by: https://github.com/Skylion007	2025-05-05 12:39:42 +00:00
Michał Górny	5c0f474dac	Do not check out nccl when not building it (#152533 ) Add additional conditions to `build_pytorch_libs.py` to avoid fetching NCCL when `USE_CUDA` or `USE_NCCL` are disabled. While at it, adjust the existing condition for `USE_SYSTEM_NCCL` to use the utility function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152533 Approved by: https://github.com/albanD	2025-05-02 16:31:03 +00:00
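The gating described in #152533 and #149607 amounts to a small predicate. This is a sketch only: `should_checkout_nccl`, the helper `_flag`, the accepted truthy strings, and the defaults are assumptions, not the code in `build_pytorch_libs.py`.

```python
def _flag(env: dict, name: str, default: bool) -> bool:
    """Interpret an environment-style flag; unset values fall back to the default."""
    value = env.get(name)
    if value is None:
        return default
    return value.strip().upper() in {"1", "ON", "YES", "TRUE", "Y"}


def should_checkout_nccl(env: dict) -> bool:
    # Fetch NCCL only when CUDA and NCCL are enabled and no system NCCL is requested.
    return (
        _flag(env, "USE_CUDA", default=True)
        and _flag(env, "USE_NCCL", default=True)
        and not _flag(env, "USE_SYSTEM_NCCL", default=False)
    )
```

Passing `os.environ` as `env` would apply the same check to the real build environment.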
PyTorch MergeBot	1c04ea4e59	Revert "[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path` (#150726 )" This reverts commit `4b5b1adb21`. Reverted https://github.com/pytorch/pytorch/pull/150726 on behalf of https://github.com/malfet due to This breaks Windows builds, see `a765e2ddda/1` ([comment](https://github.com/pytorch/pytorch/pull/150726#issuecomment-2845858846))	2025-05-01 21:52:35 +00:00
Xuehai Pan	4b5b1adb21	[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path` (#150726 ) This PR allows `FileManager` to accept `pathlib.Path` as arguments while keeping the original `str` path support. This allows us to simplify the code by replacing: 1. `os.path.join(..., ...)` with `Path.__truediv__(..., ...)` (the `/` operator). `95a5958db4/torchgen/utils.py (L155)` `95a5958db4/torchgen/utils.py (L176)` 2. `os.path.basename(...)` with `Path(...).name`. `95a5958db4/torchgen/utils.py (L161)` 3. Manual file extension split with `Path(...).with_stem(new_stem)` `95a5958db4/torchgen/utils.py (L241-L256)` ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/150726 Approved by: https://github.com/zou3519	2025-05-01 17:43:16 +00:00
Pian Pawakapan	632b89af43	[dynamic shapes] support SymInt inputs for kthvalue (#152151 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152151 Approved by: https://github.com/tugsbayasgalan, https://github.com/malfet	2025-05-01 03:47:23 +00:00
Camyll Harajli	b22fda9e1c	Remove conda refs in tools (#152368 ) Fixes #152126 Did not find references in the two .ipynb files Pull Request resolved: https://github.com/pytorch/pytorch/pull/152368 Approved by: https://github.com/atalman	2025-04-29 02:45:47 +00:00
Anthony Shoumikhin	7cae7902a2	Add scripts to check xrefs and urls (#151844 ) Traverses the docs and code to find any broken links Pull Request resolved: https://github.com/pytorch/pytorch/pull/151844 Approved by: https://github.com/huydhn	2025-04-28 09:30:07 +00:00
Anthony Shoumikhin	e2f9759bd0	Fix broken URLs (#152237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet	2025-04-27 09:56:42 +00:00
sumantro93	017a6bd593	add min/max_seqlen to non_differentiable (#151750 ) Fixes #148988 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151750 Approved by: https://github.com/soulitzer	2025-04-22 21:46:02 +00:00
Junjie Wang (PyTorch)	95abc0f515	[c10d][fr] Fix another bug when we should continue when the op list is empty (#151798 ) Differential Revision: D73375318 We shouldn't check the op list when it is empty. Later, once the empty list is popped from the queue, we check for collective matching. Added a unit test for this case and also covered the case fixed in https://github.com/pytorch/pytorch/pull/151683 in the unit test as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151798 Approved by: https://github.com/d4l3k, https://github.com/wconstab, https://github.com/fegin	2025-04-22 04:43:31 +00:00
Junjie Wang (PyTorch)	6e7b6e8d57	[c10d][fr] Fix a bug when first rank is not zero in the script (#151683 ) Summary: While further testing the script, we found that we shouldn't always assume rank 0 is the first rank, so we need to check all entries and see if it is a P2P op for this coalesced group. Test Plan: Directly test with corner case. Differential Revision: D73266257 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151683 Approved by: https://github.com/fegin	2025-04-18 20:55:06 +00:00
Jithun Nair	b4550541ea	[ROCm] upgrade nightly wheels to rocm6.4 (#151355 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151355 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-04-17 17:29:07 +00:00
fduwjj	6f9ffaa991	[c10d][fr] Fix script for uneven reduce scatter and update test cases (#151475 ) Somehow the type string for reduce scatter is "REDUCE_SCATTER" not "REDUCESCATTER". This PR fixed it and added more test cases. Differential Revision: [D73141245](https://our.internmc.facebook.com/intern/diff/D73141245) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151475 Approved by: https://github.com/fegin	2025-04-17 02:11:08 +00:00
fduwjj	ae648f047c	[c10d][fr] Enable FR analysis script for rest of all coalesce op (#151247 ) We revisited how coalesced collectives work in https://github.com/pytorch/pytorch/pull/151243 and we now want to enable the script to work for the slow path. The change is indeed bc-breaking, but it is needed to make this work, and the API is for internal use only; it is not user facing. For the slow path, each individual collective has input sizes and output sizes recorded but no state. The final one has the state ready. We check the correctness of each individual collective one by one, but we don't check the state match for these collectives; we can only check the state match for the last one, which is the work item with the coalesced label. Added more unit tests for the slow path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151247 Approved by: https://github.com/d4l3k, https://github.com/XilunWu	2025-04-15 20:53:03 +00:00
fduwjj	48b4bc1640	[c10d][fr] Enable FR analysis script for all fast-path coalesce op (#151243 ) This PR enables FR for all coalesced ops on the fast path. (Batch P2P is already enabled in the current script, so we mainly focus on non-P2P ops.) To explain what the fast path is, let's revisit how coalesced collectives work today. For non-P2P coalesced ops, there are several ways to call them (for legacy reasons): - Way one: Directly call a Python API such as all_reduce_coalesced; this will be deprecated soon. - Way two: Directly call an API inside PGNCCL such as allreduce_coalesced. Way one will eventually call into this. This is not deprecated and will not be deprecated, IIUC. - Way three: Use _coalescing_manager in Python, like: ``` with _coalescing_manager(): for i in range(num_colls): dist.all_reduce(tensors[i]) ``` This way has two paths: - Fast path: when users call all-reduce, all-gather-into-tensor or reduce-scatter, we will only launch one big collective by calling the api from case 1. - Slow path: we call startCoalescing() at the beginning, then a bunch of collectives (each one generates an FR entry), and then endCoalescing(). Inside startCoalescing(), groupStart() is called, and inside endCoalescing(), groupEnd() is called. So although this is going to be one collective, we call into PGNCCL for each collective coalesced in the slow path case. - Uneven all-gather (allgather_v) and reduce-scatter follow the pattern mentioned in the slow path; they directly call the cpp api inside PGNCCL. This PR addresses the fast path because it is the easy case: we store the collectives' info on the Python side and only call into PGNCCL once, so there will only be one work and one FR entry. We can just treat them as regular coalesced collectives. We add some e2e unit tests for the build_db function so that the change to FR is more thoroughly tested. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151243 Approved by: https://github.com/d4l3k, https://github.com/wz337	2025-04-15 04:08:28 +00:00
fduwjj	48132de4af	[c10d][fr] Fix the false positive in the dtype check in fr analysis script (#151063 ) When checking dtype in the fr analysis script, we should only check it when the input or output numel is larger than zero. For gather or scatter, the output/input size will be an empty list on non-src or non-dst ranks, in which case we should just skip the check. Differential Revision: [D72826823](https://our.internmc.facebook.com/intern/diff/D72826823) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151063 Approved by: https://github.com/d4l3k, https://github.com/kwen2501	2025-04-11 02:11:58 +00:00
Tom Ritchford	596e44d26a	[inductor] Enable docstring_linter on _inductor (#144622 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144622 Approved by: https://github.com/eellison ghstack dependencies: #144621	2025-04-10 14:32:26 +00:00
Tom Ritchford	ba35793226	[inductor] Add tests for new docstring_linter features (fix #142496 ) (#144621 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144621 Approved by: https://github.com/eellison	2025-04-10 14:32:26 +00:00
cyy	322f883c0c	Remove unneeded CUDA logic from _create_build_env (#145822 ) Because FindCUDAToolkit.cmake has that logic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145822 Approved by: https://github.com/albanD	2025-04-10 02:17:28 +00:00
Tom Ritchford	31fe258efc	[inductor] Add features to docstring_linter (see #142496 ) (#145834 ) ## Improvements to `docstring_linter` * Add a "grandfather list" of existing undocumented classes and functions (`--grandfather`, `--grandfather-tolerance`, `--no-grandfather`, `--write-grandfather`) * In classes, now just one of the class itself or its `__init__()` method needs to be documented (`--lint-init` turns the old behavior back on) * Now classes and functions defined local to other functions do not need to be documented (`--lint-local` turns the old behavior back on) * New `--report` flag produces a compact report of long, undocumented classes or function definitions: see attached example run over all pytorch: [pytorch-docs.json](https://github.com/user-attachments/files/18455981/pytorch-docs.json) ## Help text ``` $ python tools/linter/adapters/docstring_linter.py --help usage: docstring_linter.py [-h] [-l] [-v] [--grandfather GRANDFATHER] [--grandfather-tolerance GRANDFATHER_TOLERANCE] [--lint-init] [--lint-local] [--lint-protected] [--max-class MAX_CLASS] [--max-def MAX_DEF] [--min-docstring MIN_DOCSTRING] [--no-grandfather] [--report] [--write-grandfather] [files ...] 
`docstring_linter` reports on long functions, methods or classes without docstrings positional arguments: files A list of files or directories to lint optional arguments: -h, --help show this help message and exit -l, --lintrunner Run for lintrunner and print LintMessages which aren't edits -v, --verbose Print more debug info --grandfather GRANDFATHER, -g GRANDFATHER Set the grandfather list --grandfather-tolerance GRANDFATHER_TOLERANCE, -t GRANDFATHER_TOLERANCE Tolerance for grandfather sizes, in percent --lint-init, -i Lint __init__ and class separately --lint-local, -o Lint definitions inside other functions --lint-protected, -p Lint functions, methods and classes that start with _ --max-class MAX_CLASS, -c MAX_CLASS Maximum number of lines for an undocumented class --max-def MAX_DEF, -d MAX_DEF Maximum number of lines for an undocumented function --min-docstring MIN_DOCSTRING, -s MIN_DOCSTRING Minimum number of characters for a docstring --no-grandfather, -n Disable the grandfather list --report, -r Print a report on all classes and defs --write-grandfather, -w Rewrite the grandfather list ``` --- Pull Request resolved: https://github.com/pytorch/pytorch/pull/145834 Approved by: https://github.com/amjames, https://github.com/eellison	2025-04-09 21:38:36 +00:00
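The core idea of `docstring_linter`, reporting long definitions that lack a docstring, can be sketched with the `ast` module. This is an illustration under assumptions, not the tool itself: `undocumented_long_defs` is a hypothetical name, it uses one line limit for both classes and functions, and it has no grandfather list or `--lint-*` behavior.

```python
import ast

SAMPLE = '''
def short():
    return 1

def long_undocumented():
    a = 1
    b = 2
    c = 3
    return a + b + c

def long_documented():
    """Add three numbers."""
    a = 1
    b = 2
    c = 3
    return a + b + c
'''


def undocumented_long_defs(source: str, max_lines: int = 3) -> list[str]:
    """Return names of classes and functions longer than max_lines with no docstring."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # end_lineno is available on Python 3.8 and newer.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines and ast.get_docstring(node) is None:
                names.append(node.name)
    return names


report = undocumented_long_defs(SAMPLE)
```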
fduwjj	8aaf296efc	[c10d][fr] Refactor analysis script for modularization and reusing for coalesce collectives (#150881 ) Trying to make the FR analysis code more reusable and modular, so we split the core error analysis logic into separate functions. This PR mostly shuffles the code around a bit. Differential Revision: [D72690120](https://our.internmc.facebook.com/intern/diff/D72690120) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150881 Approved by: https://github.com/wz337	2025-04-09 16:10:19 +00:00
Natalia Gimelshein	55e62ff74a	bf16 grouped gemm (#150374 ) Enabled bf16 grouped gemm with an API similar to _scaled_group_gemm, except without scale and fast accum arguments. All transpose variants are enabled, unlike scaled gemm. Ideally we'd factor out a lot more code from scaled gemm, currently there's a lot of repetition between scaled and non-scaled versions. I factored out only a helper kernel that prepares arguments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150374 Approved by: https://github.com/drisspg	2025-04-06 04:53:24 +00:00
Xuehai Pan	ae74ef9d53	Set proper `LD_LIBRARY_PATH` on Linux in nightly venv in nightly pull tool (#143262 ) Before this change: ```console $ make setup-env-cuda PYTHON="${HOMEBREW_PREFIX}/bin/python3.12" $ source venv/bin/activate $ python3 -c 'import torch' Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/PanXuehai/Projects/pytorch/torch/__init__.py", line 379, in <module> from torch._C import * # noqa: F403 ^^^^^^^^^^^^^^^^^^^^^^ ImportError: libcudnn.so.9: cannot open shared object file: No such file or directory ``` This PR adds `site-packages/nvidia/**/lib` to `LD_LIBRARY_PATH` in the `venv/bin/activate` script so that NVIDIA PyPI packages can be loaded correctly. See also: - #141837 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143262 Approved by: https://github.com/malfet	2025-04-01 16:51:02 +00:00
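Collecting the `site-packages/nvidia/**/lib` directories mentioned in #143262 could be done along these lines. Both helper names are hypothetical, and placing the NVIDIA directories ahead of the existing value is an assumption; the PR itself edits the `venv/bin/activate` script rather than running Python.

```python
import os
from pathlib import Path


def nvidia_lib_dirs(site_packages) -> list[str]:
    """Return every `nvidia/**/lib` directory under site-packages, sorted."""
    root = Path(site_packages)
    return sorted(str(p) for p in root.glob("nvidia/**/lib") if p.is_dir())


def extend_ld_library_path(site_packages, current: str = "") -> str:
    """Build an LD_LIBRARY_PATH value with the NVIDIA lib dirs placed first."""
    parts = nvidia_lib_dirs(site_packages)
    if current:
        parts.append(current)
    return os.pathsep.join(parts)
```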
FFFrog	36f2d0aaba	Add "xpu" to __all__ for torch/version.py (#149695 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149695 Approved by: https://github.com/desertfire, https://github.com/guangyey	2025-04-01 08:44:51 +00:00
Phillip Liu	31634b8c6a	[fr] Added protection against missing stack frames in fr cont. (#150133 ) Summary: Previously we had D70358287, which didn't fully resolve the issue. Test Plan: # FR `buck2 run @//mode/opt //caffe2/fb/flight_recorder:fr_trace -- --mast_job_id f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0 --bucket tlcm_log_blob --world_size 128 --dump_file_name_offset 0 --allow-incomplete-ranks` Confirm no error # FR analyzer `buck2 run @//mode/opt //investigations/dr_patternson/analyzers/ai_observability:ai_observability-all-analyzers-cli -- flight_recorder_analyzer --mast_job_name f710320638-TrainingApplication --mast_job_version 0 --mast_job_attempt 0` Confirm no error Differential Revision: D71998980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150133 Approved by: https://github.com/fduwjj	2025-04-01 03:07:59 +00:00
Daniël de Kok	fdc4394b16	Do not fetch NCCL when system NCCL is used (#149607 ) We are compiling PyTorch in a sandbox without networking. Unconditionally fetching breaks the build and is not needed when a system NCCL is used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149607 Approved by: https://github.com/malfet	2025-03-28 05:06:49 +00:00
vasiliy	e33bc41958	add `torch.float4_e2m1fn_x2` to PyTorch (#148791 ) Summary: Redo of https://github.com/pytorch/pytorch/pull/146578 to get around rebase conflicts. Test Plan: ``` pytest test/quantization/core/experimental/test_floatx.py -s ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/148791 Approved by: https://github.com/drisspg, https://github.com/eqy, https://github.com/jeffdaily	2025-03-27 17:32:20 +00:00


5316 Commits