pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
albanD	bdbf2792a8	Fix docs build (#155129 ) Not sure why the online doc build passes but it fails locally with these broken strings... ~Also pinning numpy version even though it is technically optional to ensure users have the right version as most users have numpy in their environment anyways.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/155129 Approved by: https://github.com/janeyx99, https://github.com/svekars	2025-06-09 22:25:20 +00:00
Justin Silver	2aade5ee9f	Fix weight tensor documentation #134896 (#155093 ) Fixes #134896 ## Description Remove line about 'weight' tensor needing to be of floating point type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155093 Approved by: https://github.com/AlannaBurke	2025-06-09 18:07:21 +00:00
PyTorch MergeBot	d3d64c6db0	Revert "Add pinned numpy and fix build (#155129 )" This reverts commit `a3098a74d4`. Reverted https://github.com/pytorch/pytorch/pull/155129 on behalf of https://github.com/malfet due to Broke test_spectral_op, looks like missing xfail, see `0db3e0cf29/1` ([comment](https://github.com/pytorch/pytorch/pull/155129#issuecomment-2947951632))	2025-06-06 03:14:47 +00:00
Joel Schlosser	5e93abe3c0	Address docs for clip_grad functions (#155125 ) This PR takes the opinionated stance that `torch.nn.utils.<func>` should be the preferred API over `torch.nn.utils.clip_grad.<func>`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155125 Approved by: https://github.com/albanD, https://github.com/mikaylagawarecki, https://github.com/janeyx99	2025-06-05 19:22:09 +00:00
albanD	a3098a74d4	Add pinned numpy and fix build (#155129 ) Not sure why the online doc build passes but it fails locally with these broken strings... Also pinning numpy version even though it is technically optional to ensure users have the right version as most users have numpy in their environment anyways. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155129 Approved by: https://github.com/janeyx99, https://github.com/svekars	2025-06-05 17:44:18 +00:00
Mikayla Gawarecki	671553bd23	Update documentation wording for transformer-related layers (#155123 ) <img width="947" alt="Screenshot 2025-06-04 at 1 33 53 PM" src="https://github.com/user-attachments/assets/4dbb66b3-43f4-4d04-afb5-dc80cec0f2cd" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/155123 Approved by: https://github.com/albanD, https://github.com/jbschlosser	2025-06-04 22:20:32 +00:00
zeshengzong	31d12b3955	Fix avg_pool2d param kernel_size description (#154353 ) Fixes part of #153149 ## Test Result ![image](https://github.com/user-attachments/assets/216ffd2b-dd2b-4cf6-9fca-aeed075be5e7) ![image](https://github.com/user-attachments/assets/820cd184-1f8e-4a7a-b64e-15dfb9c7dad2) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154353 Approved by: https://github.com/colesbury	2025-06-04 11:55:01 +00:00
Natalia Gimelshein	34e3930401	fix numpy compatibility for 2d small list indices (#154806 ) Will fix #119548 and linked issues once we switch from warning to the new behavior, but for now, given how much this syntax was used in our test suite, we suspect a silent change will be disruptive. We will change the behavior after 2.8 branch is cut. Numpy behavior was changed at least in numpy 1.24 (more than 2 years ago) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154806 Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/albanD	2025-06-04 01:58:52 +00:00
PyTorch MergeBot	50de6ae253	Revert "[BE][Ez]: Fully type nn.utils.clip_grad (#154801 )" This reverts commit `9ce2732b68`. Reverted https://github.com/pytorch/pytorch/pull/154801 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/154801#issuecomment-2937886337))	2025-06-04 00:41:27 +00:00
bubuss	31405a69fb	[typing] Add missing type annotations to torch.nn.init module (#154504 ) ## Summary Adds missing type annotations to `torch.nn.init` and removes `# mypy: allow-untyped-defs` since all functions are now properly typed. ## Changes - Added missing type annotations to initialization functions in the module. - Added missing typing imports: `Any`, `Callable`, `Union` - Removed `# mypy: allow-untyped-defs` comment - Create Literal types for kaiming initialization mode and nonlinearity. - Created `__all__` ## Why Better IDE support, catches type errors earlier, and brings the module up to PyTorch's typing standards. No runtime changes - purely additive typing improvements. Tested with existing test suite and lintrunner. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154504 Approved by: https://github.com/Skylion007	2025-06-03 17:33:32 +00:00
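As a rough illustration of the `Literal` typing mentioned in #154504: restricting a string parameter to a fixed set of values lets a type checker reject misspelled modes before runtime. The alias name `FanMode` and the helper `select_fan` are hypothetical and are not taken from `torch.nn.init`.

```python
from typing import Literal, get_args

# Hypothetical alias; the actual names used in torch.nn.init may differ.
FanMode = Literal["fan_in", "fan_out"]


def select_fan(fan_in: int, fan_out: int, mode: FanMode = "fan_in") -> int:
    """Return the fan value selected by ``mode``."""
    # The runtime check mirrors what a type checker enforces statically.
    if mode not in get_args(FanMode):
        raise ValueError(f"mode must be one of {get_args(FanMode)}, got {mode!r}")
    return fan_in if mode == "fan_in" else fan_out
```

With this annotation, a call such as `select_fan(3, 5, "fan_sideways")` is flagged by pyright or mypy without running the code.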
cora-codes	40142978d7	Add type annotation to orthogonal_ (#154927 ) Trivial change, but I want pyright to stop yelling at me Pull Request resolved: https://github.com/pytorch/pytorch/pull/154927 Approved by: https://github.com/cyyever, https://github.com/Skylion007	2025-06-03 17:00:02 +00:00
Aaron Gokaslan	9ce2732b68	[BE][Ez]: Fully type nn.utils.clip_grad (#154801 ) Fully types clip_grad and exposes typing annotations that were hidden by a bad decorator Pull Request resolved: https://github.com/pytorch/pytorch/pull/154801 Approved by: https://github.com/jansel	2025-05-31 23:06:45 +00:00
zeshengzong	1b569e5490	Fix load_state_dict description (#154599 ) Fixes #141364 Fix missing description in `assign` param ## Test Result ### Before ![image](https://github.com/user-attachments/assets/5928c691-4e31-463b-aa0a-86eb8bb452e5) ### After ![image](https://github.com/user-attachments/assets/036631a2-0f20-4a71-95c3-2c0fd732293e) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154599 Approved by: https://github.com/colesbury, https://github.com/mikaylagawarecki	2025-05-30 18:08:59 +00:00
Bob Ren	382b38ed1b	remove allow-untyped-defs from torch/nn/utils/_expanded_weights/conv_expanded_weights.py (#154623 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154623 Approved by: https://github.com/Skylion007	2025-05-30 07:32:57 +00:00
Aaron Gokaslan	316e7a9293	[BE][Ez]: Denote common types as TypeAlias (#154527 ) Denotes common_types as TypeAlias. This triggered a Ruff rule since we named our TypeAlias off standards so I added a file wide ruff suppression Pull Request resolved: https://github.com/pytorch/pytorch/pull/154527 Approved by: https://github.com/benjaminglass1, https://github.com/aorenste	2025-05-29 02:00:13 +00:00
Aaron Orenstein	946a4c2bdc	BE: Type previously untyped decorators (#154515 ) Summary: Cloned #153726 from Skylion007 and fixed internal typing issues. Test Plan: Unit tests pass Differential Revision: D75477355 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154515 Approved by: https://github.com/Skylion007	2025-05-29 00:36:34 +00:00
Xuehai Pan	7ae204c3b6	[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files (#150732 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150732 Approved by: https://github.com/malfet, https://github.com/cyyever, https://github.com/aorenste	2025-05-27 14:58:02 +00:00
Nikita Shulga	214e4cef9f	Fix RMSNorm doc rendering (#154205 ) By removing `::func::` decorator which adds unneeded parenthesis Test plan: Check https://docs-preview.pytorch.org/pytorch/pytorch/154205/generated/torch.nn.RMSNorm.html#rmsnorm that now renders as <img width="704" alt="image" src="https://github.com/user-attachments/assets/443f605d-75a6-41ef-8971-21e7dc8ef9f6" /> Fixes https://github.com/pytorch/pytorch/issues/154184 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154205 Approved by: https://github.com/mikaylagawarecki	2025-05-23 15:39:29 +00:00
PyTorch MergeBot	7d3dab6b90	Revert "[BE]: Type previously untyped decorators (#153726 )" This reverts commit `b7d08defe9`. Reverted https://github.com/pytorch/pytorch/pull/153726 on behalf of https://github.com/yangw-dev due to sorry, it seems like your pr failed typecheck error internally, [D75155486](https://www.internalfb.com/diff/D75155486) ([comment](https://github.com/pytorch/pytorch/pull/153726#issuecomment-2901911114))	2025-05-22 16:49:08 +00:00
zeshengzong	f12d8d60b1	Add hint message when parameters is empty in clip_grad_norm_ (#151529 ) Fixes #148259 ## Changes - Print a warning message when the `parameters` generator is exhausted ## Test Result ### print warning

```python
import torch
import torch.nn as nn
import torch.optim as optim


class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)


model = SimpleModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(16, 10)
targets = torch.randn(16, 1)

outputs = model(inputs)
loss = criterion(outputs, targets)
optimizer.zero_grad()
loss.backward()

params_to_clip = model.parameters()

# Iterating here exhausts the generator before it reaches clip_grad_norm_.
for p in params_to_clip:
    print(p.shape)

max_norm = 1.0
norm_type = 2.0
total_norm = nn.utils.clip_grad_norm_(params_to_clip, max_norm, norm_type)
print(f"total_norm: {total_norm}")
```

```bash
/home/zong/code/pytorch/torch/nn/utils/clip_grad.py:222: UserWarning: `parameters` is an empty generator, no gradient clipping will occur.
  warnings.warn(
total_norm: 0.0
```

### UT

```bash
pytest test/test_nn.py -k test_clip_grad_norm
```

![image](https://github.com/user-attachments/assets/0aa0f06c-e0a5-43cf-9a97-d7c2747c9180) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151529 Approved by: https://github.com/jbschlosser	2025-05-22 11:23:39 +00:00
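The situation #151529 warns about can be reproduced without PyTorch: a generator yields its items only once, so a helper that receives an already-iterated generator sees nothing. The sketch below uses plain floats in place of gradients; `total_norm_sketch` and `make_params` are hypothetical stand-ins, not the library's implementation.

```python
import warnings


def total_norm_sketch(parameters):
    """Hypothetical stand-in for a norm helper; works on plain floats."""
    grads = list(parameters)  # consumes the generator if one is passed
    if not grads:
        warnings.warn("`parameters` is empty, nothing to clip")
        return 0.0
    return sum(g * g for g in grads) ** 0.5


def make_params():
    # Plays the role of model.parameters(), which also returns a generator.
    yield from (3.0, 4.0)


fresh = total_norm_sketch(make_params())

gen = make_params()
for _ in gen:  # iterating once exhausts the generator
    pass

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    exhausted = total_norm_sketch(gen)
```

Materializing the parameters first, for example with `list(model.parameters())`, avoids the problem because a list can be iterated repeatedly.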
ishanjmukherjee	d82610c2af	docs: fix "should not to be" typo in `register_buffer` docstring (#153817 ) Corrects a small grammatical error in `register_buffer` docstring, from "... should not to be ..." to "... should not be ...". Docs-only change, so no runtime behavior, tests, or APIs are affected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153817 Approved by: https://github.com/mikaylagawarecki	2025-05-21 22:46:50 +00:00
yang	b967b7b11e	Update rnn.py, fix `torch.nn.RNN` document error (#153620 ) I found the same issue as #147490 (@jibril-b-coulibaly). There's an equivalent in the [doc-string](https://docs.pytorch.org/docs/stable/generated/torch.nn.RNN.html#rnn) of `torch.nn.RNN`:

```python
# Efficient implementation equivalent to the following with bidirectional=False
def forward(x, hx=None):
    if batch_first:
        x = x.transpose(0, 1)
    seq_len, batch_size, _ = x.size()
    if hx is None:
        hx = torch.zeros(num_layers, batch_size, hidden_size)
    h_t_minus_1 = hx
    h_t = hx
    output = []
    for t in range(seq_len):
        for layer in range(num_layers):
            h_t[layer] = torch.tanh(
                x[t] @ weight_ih[layer].T
                + bias_ih[layer]
                + h_t_minus_1[layer] @ weight_hh[layer].T
                + bias_hh[layer]
            )
        output.append(h_t[-1])
        h_t_minus_1 = h_t
    output = torch.stack(output)
    if batch_first:
        output = output.transpose(0, 1)
    return output, h_t
```

However, there's something wrong. 1. As mentioned in #147490, line 499 is wrong `fb55bac3de/torch/nn/modules/rnn.py (L499)` The input for RNNCell should be different for different layers. 2. The code contains several hidden reference-related issues that may result in unintended modifications to tensors. For example in line 504, this causes all elements in the final output list to point to the same tensor. `fb55bac3de/torch/nn/modules/rnn.py (L504)` 3. Some variables are not defined. Despite being a relatively minor issue in annotation, it can lead to significant confusion for those who are new to the concept. For example `weight_ih` in line 499 `fb55bac3de/torch/nn/modules/rnn.py (L499)` So I wrote a runnable version to make it clearer:

```python
# Efficient implementation equivalent to the following with bidirectional=False
rnn = nn.RNN(input_size, hidden_size, num_layers)
params = dict(rnn.named_parameters())
def forward(x, hx=None, batch_first=False):
    if batch_first:
        x = x.transpose(0, 1)
    seq_len, batch_size, _ = x.size()
    if hx is None:
        hx = torch.zeros(rnn.num_layers, batch_size, rnn.hidden_size)
    h_t_minus_1 = hx.clone()
    h_t = hx.clone()
    output = []
    for t in range(seq_len):
        for layer in range(rnn.num_layers):
            input_t = x[t] if layer == 0 else h_t[layer - 1]
            h_t[layer] = torch.tanh(
                input_t @ params[f"weight_ih_l{layer}"].T
                + h_t_minus_1[layer] @ params[f"weight_hh_l{layer}"].T
                + params[f"bias_hh_l{layer}"]
                + params[f"bias_ih_l{layer}"]
            )
        output.append(h_t[-1].clone())
        h_t_minus_1 = h_t.clone()
    output = torch.stack(output)
    if batch_first:
        output = output.transpose(0, 1)
    return output, h_t
```

This code can reproduce the computation of torch.nn.RNN. For example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, num_layers = 3, 5, 2
rnn = nn.RNN(input_size, hidden_size, num_layers)
params = dict(rnn.named_parameters())
x = torch.randn(10, 4, 3)

official_imp = rnn(x)
my_imp = forward(x)

assert torch.allclose(official_imp[0], my_imp[0])
assert torch.allclose(official_imp[1], my_imp[1])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153620 Approved by: https://github.com/mikaylagawarecki	2025-05-21 22:45:28 +00:00
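The reference-related problem described in point 2 of #153620 can be shown with plain Python lists as an analogy: appending an object that is later updated in place stores the same object each time, so every entry ends up showing the final value. In the sketch below, `list(state)` plays the role that `.clone()` plays for tensors; the variable names are illustrative only.

```python
state = [0, 0]  # stands in for a hidden state that is updated in place
aliased, copied = [], []

for t in range(3):
    state[-1] = t + 1            # in-place update of the shared object
    aliased.append(state)        # stores a reference to the same list
    copied.append(list(state))   # stores an independent snapshot

# Every entry of `aliased` is the same object as `state`.
all_same_object = all(entry is state for entry in aliased)
```

After the loop, `aliased` holds three references to one list, while `copied` keeps the value from each step.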
Aaron Gokaslan	b7d08defe9	[BE]: Type previously untyped decorators (#153726 ) This fixes decorator typing which unmasks a lot of typing issues in the codebase Pull Request resolved: https://github.com/pytorch/pytorch/pull/153726 Approved by: https://github.com/albanD	2025-05-21 15:56:19 +00:00
Aaron Gokaslan	ffd49d538e	[BE][Ez]: Improve typing in torch/modules/container.py (#153728 ) Adds some missing type annotations Pull Request resolved: https://github.com/pytorch/pytorch/pull/153728 Approved by: https://github.com/albanD	2025-05-21 07:15:00 +00:00
Aaron Gokaslan	f3daedb263	[BE]: Remove redundant copy (#153629 ) Add typing and remove redundant copy Pull Request resolved: https://github.com/pytorch/pytorch/pull/153629 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-05-19 08:25:20 +00:00
Daniel Vega-Myhre	7e16cb99b6	[FlexAttention] Enforce Q,K,V memory layouts for fp8 flex attention to avoid perf degradation (#153357 ) Fixes #147336 ## Context NCU analysis of the fp8 flex attention perf issue in #147336 showed an unexpected increase in shared memory access bank conflicts when loading the V tensor from HBM to SRAM. Bringing this to the attention of triton developer @davidberard98 he identified the memory layout of the tensor in HBM to be causing non-pipelined loads into SRAM, causing the slowdown. To summarize: In flex attention when performing the FP8 GEMM `softmax_scores @ V` the right operand V must be in column-major memory layout. However, the `tl.load` of V blocks from HBM to SRAM cannot be pipelined if the V tensor isn't column-major in HBM already, leading to substantial performance degradation. This is because triton does not perform async copies with the `cp.async` PTX instruction if the number of contiguous bytes is less than 4 (see [here](`81f93f2c8e/lib/Dialect/TritonGPU/Transforms/Pipeliner/PipeliningUtility.cpp (L403)`)). i.e., when loading 4 bytes of contiguous data from a tensor stored in row-major in HBM, we have to perform 4 separate non-contiguous writes to SRAM to place those bytes in their new location in the col-major layout in SRAM. Thus the load is not a candidate for pipelining w/ cp.async and just moves data to registers then performs a series of single byte stores. ## Fix summary - To fix this, we should enforce memory layouts for Q, K, V in FlexAttention when fp8 is being used, to ensure they each exist in HBM in the necessary memory layout to facilitate pipelined loads into SRAM ahead of the FP8 GEMMs ## Benchmarks Rerunning the repro we see fp8 runtime is reduced from 120% of bf16 to 76% of bf16 runtime. 
Before fix: ``` (flex) [danvm@devgpu007.eag6 ~/ml-perf-tools/flex_attention (main)]$ rm -rf /tmp/torchinductor_${USER}; python profile_flex.py --bf16 --fp8 2025-05-11 19:07:33,402 - flex_bench - INFO - Running benchmark: bf16 2025-05-11 19:07:35,885 - flex_bench - INFO - bf16: 424.87228804347734 us 2025-05-11 19:07:35,893 - flex_bench - INFO - Running benchmark: fp8e4m3 2025-05-11 19:07:37,319 - flex_bench - INFO - fp8e4m3: 515.714000000001 us ``` After fix: ``` (flex) [danvm@devgpu007.eag6 ~/ml-perf-tools/flex_attention (main)]$ rm -rf /tmp/torchinductor_${USER}; python profile_flex.py --bf16 --fp8 2025-05-11 17:34:38,223 - flex_bench - INFO - Running benchmark: bf16 2025-05-11 17:34:41,157 - flex_bench - INFO - bf16: 423.4662032967036 us 2025-05-11 17:34:41,167 - flex_bench - INFO - Running benchmark: fp8e4m3 2025-05-11 17:34:42,917 - flex_bench - INFO - fp8e4m3: 326.3694803493453 us ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/153357 Approved by: https://github.com/ngimel, https://github.com/davidberard98	2025-05-16 04:56:50 +00:00
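The relative runtimes quoted in #153357 can be checked against the logged timings. The short calculation below only divides the four microsecond values from the logs above; the variable names are illustrative.

```python
# Timings in microseconds, copied from the benchmark logs above.
bf16_before_us = 424.87228804347734
fp8_before_us = 515.714000000001
bf16_after_us = 423.4662032967036
fp8_after_us = 326.3694803493453

# fp8 runtime expressed as a fraction of bf16 runtime.
ratio_before = fp8_before_us / bf16_before_us
ratio_after = fp8_after_us / bf16_after_us

print(f"before fix: fp8 is {ratio_before:.0%} of bf16")
print(f"after fix:  fp8 is {ratio_after:.0%} of bf16")
```

The computed ratios land within about a percentage point of the 120% and 76% figures stated in the commit message.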
Simon Fan	d1f1ff8610	[ddp] propagate use_python_reducer to C++ reducer (#152735 ) C++ Reducer is silently incorrect under CA, its implementation is no-oping the collective. I'm guessing that it was no-op'd because in DDP + python reducer, the C++ reducer is still being initialized. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152735 Approved by: https://github.com/fegin ghstack dependencies: #153300, #152689	2025-05-16 01:38:03 +00:00
Xuehai Pan	a4c828199e	[BE] Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi` (#150729 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150729 Approved by: https://github.com/aorenste	2025-05-15 19:01:57 +00:00
Xuehai Pan	22b124335e	[BE] Update `.pyi` stub template to use Generic TypeAlias (PEP 585) and Union Type (PEP 604) (#150728 ) https://github.com/pytorch/pytorch/pull/129001#discussion_r1645126801 is the motivation for the whole stack of PRs. In `torch/__init__.py`, `torch._C.Type` shadows `from typing import Type`, and there is no type stub for `torch._C.Type` in `torch/_C/__init__.pyi`. So we need to use `from typing import Type as _Type`. After enabling [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585) in the `.pyi` type stub files, we can use `type` instead of `typing.Type` or `from typing import Type as _Type`. ------ - [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585): e.g. `typing.List[T] -> list[T]`, `typing.Dict[KT, VT] -> dict[KT, VT]`, `typing.Type[T] -> type[T]`. - [Union Type (PEP 604)](https://peps.python.org/pep-0604): e.g. `Union[X, Y] -> X \| Y`, `Optional[X] -> X \| None`, `Optional[Union[X, Y]] -> X \| Y \| None`. Note that in `.pyi` stub files, we do not need `from __future__ import annotations`. So this PR does not violate issue #117449: - #117449 ------ Pull Request resolved: https://github.com/pytorch/pytorch/pull/150728 Approved by: https://github.com/cyyever, https://github.com/aorenste ghstack dependencies: #150726, #150727	2025-05-15 09:36:42 +00:00
PyTorch MergeBot	71027b13b2	Revert "[FlexAttention] Enforce Q,K,V memory layouts for fp8 flex attention to avoid perf degradation (#153357 )" This reverts commit `881a598a1e`. Reverted https://github.com/pytorch/pytorch/pull/153357 on behalf of https://github.com/jeanschmidt due to Might have introduced regressions in rocm testing for main: https://github.com/pytorch/pytorch/actions/runs/15035410497/job/42257000513 feel free to re-merge if this was a mistake ([comment](https://github.com/pytorch/pytorch/pull/153357#issuecomment-2882915691))	2025-05-15 07:58:27 +00:00
Daniel Vega-Myhre	881a598a1e	[FlexAttention] Enforce Q,K,V memory layouts for fp8 flex attention to avoid perf degradation (#153357 ) Fixes #147336 ## Context NCU analysis of the fp8 flex attention perf issue in #147336 showed an unexpected increase in shared memory access bank conflicts when loading the V tensor from HBM to SRAM. Bringing this to the attention of triton developer @davidberard98 he identified the memory layout of the tensor in HBM to be causing non-pipelined loads into SRAM, causing the slowdown. To summarize: In flex attention when performing the FP8 GEMM `softmax_scores @ V` the right operand V must be in column-major memory layout. However, the `tl.load` of V blocks from HBM to SRAM cannot be pipelined if the V tensor isn't column-major in HBM already, leading to substantial performance degradation. This is because triton does not perform async copies with the `cp.async` PTX instruction if the number of contiguous bytes is less than 4 (see [here](`81f93f2c8e/lib/Dialect/TritonGPU/Transforms/Pipeliner/PipeliningUtility.cpp (L403)`)). i.e., when loading 4 bytes of contiguous data from a tensor stored in row-major in HBM, we have to perform 4 separate non-contiguous writes to SRAM to place those bytes in their new location in the col-major layout in SRAM. Thus the load is not a candidate for pipelining w/ cp.async and just moves data to registers then performs a series of single byte stores. ## Fix summary - To fix this, we should enforce memory layouts for Q, K, V in FlexAttention when fp8 is being used, to ensure they each exist in HBM in the necessary memory layout to facilitate pipelined loads into SRAM ahead of the FP8 GEMMs ## Benchmarks Rerunning the repro we see fp8 runtime is reduced from 120% of bf16 to 76% of bf16 runtime. 
Before fix: ``` (flex) [danvm@devgpu007.eag6 ~/ml-perf-tools/flex_attention (main)]$ rm -rf /tmp/torchinductor_${USER}; python profile_flex.py --bf16 --fp8 2025-05-11 19:07:33,402 - flex_bench - INFO - Running benchmark: bf16 2025-05-11 19:07:35,885 - flex_bench - INFO - bf16: 424.87228804347734 us 2025-05-11 19:07:35,893 - flex_bench - INFO - Running benchmark: fp8e4m3 2025-05-11 19:07:37,319 - flex_bench - INFO - fp8e4m3: 515.714000000001 us ``` After fix: ``` (flex) [danvm@devgpu007.eag6 ~/ml-perf-tools/flex_attention (main)]$ rm -rf /tmp/torchinductor_${USER}; python profile_flex.py --bf16 --fp8 2025-05-11 17:34:38,223 - flex_bench - INFO - Running benchmark: bf16 2025-05-11 17:34:41,157 - flex_bench - INFO - bf16: 423.4662032967036 us 2025-05-11 17:34:41,167 - flex_bench - INFO - Running benchmark: fp8e4m3 2025-05-11 17:34:42,917 - flex_bench - INFO - fp8e4m3: 326.3694803493453 us ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/153357 Approved by: https://github.com/ngimel, https://github.com/davidberard98	2025-05-15 02:41:38 +00:00
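For readers unfamiliar with the layout terms used in #153357, the sketch below computes element strides for a 2-D shape in row-major and column-major order. It only illustrates what the two layouts mean and is not how FlexAttention enforces them; both helper functions are hypothetical.

```python
def row_major_strides(shape):
    """Strides (in elements) when the last dimension is contiguous."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def col_major_strides(shape):
    """Strides (in elements) when the first dimension is contiguous."""
    strides, step = [], 1
    for dim in shape:
        strides.append(step)
        step *= dim
    return tuple(strides)


shape = (4, 8)
row_major = row_major_strides(shape)
col_major = col_major_strides(shape)
```

In the column-major case the stride along the first dimension is 1, so consecutive elements of a column sit next to each other in memory; in the row-major case they are a full row apart.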
Ryan Guo	6765df052c	[dynamo] Emit warning on global module hooks when calling using output of `torch.compile(module)` (#152740 ) When we do `torch.compile(module)`, we eventually end up returning a new `OptimizedModule` instance, whose `forward` method is the result of `torch.compile(mod.__call__)`, meaning it already captures all the extra logic (e.g., hook firing) for the compiled module. `OptimizedModule` also inherits `nn.module.__call__`, and thus has its own hook logic. This is useful for torchao, which injects module forward hooks to run in eager for quantization purposes. However, this might create unexpected behavior for global module hooks, because `torch.compile(module)` causes the hook to fire one extra time for `OptimizedModule`, when compared to eager. To preserve BC, we simply emit a warning for this behavior, and let users decide what to do. This is reasonable because the global module hooks are documented to be used for debugging/profiling purposes only. Fixes #149502 Differential Revision: [D74611716](https://our.internmc.facebook.com/intern/diff/D74611716) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152740 Approved by: https://github.com/anijain2305, https://github.com/zou3519	2025-05-14 17:03:59 +00:00
Aaron Gokaslan	3555ebb63d	[BE]: Update ruff to 0.11.8 (#153249 ) Fixes a ton of false negatives throughout the codebase. RUFF also properly validates NOQA comments now and most of the changes are fixing typos there or removing filewide flake8 suppressions that were also silencing ruff issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153249 Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/seemethere	2025-05-12 18:30:52 +00:00
zeshengzong	37c71820f3	Fix nn.LazyModuleMixin examples (#150596 ) Fixes #150404 ## Test Result ![image](https://github.com/user-attachments/assets/e546339f-c1cb-47db-ab0e-276a42c167b8) ![image](https://github.com/user-attachments/assets/298db7ad-6512-4b17-9453-170ff843c4fd) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150596 Approved by: https://github.com/mikaylagawarecki	2025-05-06 05:11:22 +00:00
ILCSFNO	a69da90a9f	Add pad limit of avg_poolnd and AvgPoolnd (#152680 ) Fixes #152156 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152680 Approved by: https://github.com/mikaylagawarecki	2025-05-04 17:25:22 +00:00
zeshengzong	d457b4492d	Optimize `Sequential` methods description (#147304 ) Fixes #146892 Add methods description and examples for [`Sequential` document](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) ## Test Result ### Before ![image](https://github.com/user-attachments/assets/3121a06f-02ed-4362-ad0a-f055bb43d469) ### After ![image](https://github.com/user-attachments/assets/66f6bb55-5298-4062-8f7f-7a7f4c1e16d9) ![image](https://github.com/user-attachments/assets/a5275a4c-4214-4518-b7a2-dff21954f368) ![image](https://github.com/user-attachments/assets/9c40d1fb-114a-4d14-a3c4-1143a131660e) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147304 Approved by: https://github.com/mikaylagawarecki	2025-05-02 19:18:58 +00:00
PyTorch MergeBot	4f9f1abd6d	Revert "Use swap_tensors path in nn.Module.to for all subclasses that override __torch_dispatch__ (#152539 )" This reverts commit `037343657e`. Reverted https://github.com/pytorch/pytorch/pull/152539 on behalf of https://github.com/wdvr due to failing internal tests - discussed with author ([comment](https://github.com/pytorch/pytorch/pull/152539#issuecomment-2846484924))	2025-05-02 06:43:35 +00:00
Mikayla Gawarecki	037343657e	Use swap_tensors path in nn.Module.to for all subclasses that override __torch_dispatch__ (#152539 ) Fixes https://github.com/pytorch/pytorch/issues/148977 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152539 Approved by: https://github.com/albanD	2025-05-01 18:04:33 +00:00
Yu, Guangye	ad81eeb7c7	Refactor to use torch.accelerator.device_index instead of torch.cuda.device for generic device context manager (#148880 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148880 Approved by: https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #148864	2025-04-25 09:45:25 +00:00
zeshengzong	834a017fe3	Optimize register_full_backward_hook description when all input no grad (#151785 ) Fixes #100528 ## Test Result ### Before ![image](https://github.com/user-attachments/assets/5dd2e1d3-3bb1-49d0-84bf-8a7a6b18fa4b) ### After ![image](https://github.com/user-attachments/assets/2e16d17b-1586-40d8-b0ef-35559fc064f4) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151785 Approved by: https://github.com/soulitzer	2025-04-22 17:57:31 +00:00
Aaron Gokaslan	cccfc146fe	[BE][Easy]: Simplify ModuleList reversed method (#151673 ) Removes unnecessary list calls now that we are in Python 3.9 and KeyViews implement reversed directly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151673 Approved by: https://github.com/albanD	2025-04-18 18:39:32 +00:00
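The simplification in #151673 relies on dict views supporting `reversed()` directly, which has been the case since Python 3.8. The sketch below uses a plain dict keyed by index strings as a stand-in for a container's internal module mapping; that structure is an assumption made for illustration.

```python
# Stand-in for an ordered mapping of index -> module.
modules = {"0": "conv", "1": "relu", "2": "linear"}

# Older pattern: materialize a list before reversing.
with_list_call = list(reversed(list(modules.values())))

# Direct pattern: reverse the view itself, no intermediate list.
without_list_call = list(reversed(modules.values()))
```

Both forms produce the same order; the direct form skips one temporary list.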
zeshengzong	1a48382a4c	[Easy] Optimize container.py typing (#151653 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/151653 Approved by: https://github.com/albanD	2025-04-18 17:33:43 +00:00
Yanli Zhao	d8bafd23ab	[DDP] add one option to allow skipping all reduce unused parameters (#151503 ) Summary: add one option to allow skipping all reduce unused parameters, this could help improve training throughput significantly when the number of unused parameters is large in the model. Test Plan: unit tests, CI Differential Revision: D72282069 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151503 Approved by: https://github.com/mrshenli	2025-04-17 23:30:19 +00:00
zeshengzong	7e2081fa93	Optimize `interpolate` saturate description (#151304 ) Fixes #108225 ## Test Result ### Before ![image](https://github.com/user-attachments/assets/bdbf8a5c-d5a4-44a5-b81e-2cbb5b8bfd02) ### After ![image](https://github.com/user-attachments/assets/1c21a27d-1700-4661-9988-dbb1cdc81fa2) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151304 Approved by: https://github.com/albanD Co-authored-by: albanD <desmaison.alban@gmail.com>	2025-04-17 18:34:29 +00:00
zeshengzong	fe90a5c140	[Easy] Optimize `clip_grad` param description (#151532 ) Fix missing optional description in `clip_grad_norm_` and `clip_grad_value_` ## Test Result ### Before ![image](https://github.com/user-attachments/assets/3393dd4b-a730-4dd4-8304-9b895ac669d4) ![image](https://github.com/user-attachments/assets/220c4738-a728-474b-b06d-b5be7660d150) ### After ![image](https://github.com/user-attachments/assets/5637bb68-3b6d-49a3-8ee1-3af636950aa0) ![image](https://github.com/user-attachments/assets/c0f1d966-a9ba-4fac-a874-9d4955f6e0d6) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151532 Approved by: https://github.com/Skylion007, https://github.com/albanD	2025-04-17 16:47:38 +00:00
Mateusz Nowak	4c4a5df73b	Allow to run flex_attention on HPU (#148656 ) HPU specific implementation details are to be located in out-of-tree HPU library. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148656 Approved by: https://github.com/drisspg	2025-04-16 19:49:15 +00:00
Olaf Lipinski	0a6e1d6b9b	Expand docs for `nn.functional`, and make the wording consistent (#148436 ) Expands the docs for the loss functions, and makes the wording consistent. Fixes #148353 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148436 Approved by: https://github.com/albanD	2025-04-14 19:37:12 +00:00
Zesheng Zong	5a64476ed6	[Easy] Add `output_size` in forward method of ConvTranspose2d (#150609 ) Fixes #74593 Add description for `forward` in [ConvTranspose2d](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html) doc ## Test Result ![image](https://github.com/user-attachments/assets/eebad7a2-f782-4219-9756-344e0f34fada) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150609 Approved by: https://github.com/mikaylagawarecki Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>	2025-04-14 09:53:22 +00:00
jPorterDosch	3e9f4f3f78	docs: allow empty targets tensor in ctc_loss (#151080 ) docs: allow empty targets tensor in ctc_loss when target_lengths are zero, as described in issue Fixes #150995 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151080 Approved by: https://github.com/albanD	2025-04-12 05:26:54 +00:00
zeshengzong	d94cc0e994	Optimize `ConvTranspose2d` stride description (#150819 ) Fixes #150775 ## Test Result ### Before ![image](https://github.com/user-attachments/assets/81cd932f-9447-4924-9553-a5cb88fc5d0e) ### After ![image](https://github.com/user-attachments/assets/6365c71c-7268-4226-b722-ee7446cb2467) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150819 Approved by: https://github.com/jbschlosser	2025-04-11 09:37:56 +00:00

3571 Commits