pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Jeff Daily	c3b71d5499	[ROCm][CI] remove relaxed tolerance for tf32 tests (#166478 ) Instead of relaxing tolerances for certain unit tests that exercise TF32 on MI300, skip the tests until hipblaslt accuracy is improved. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166478 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>	2025-10-31 16:15:42 +00:00
Yuanyuan Chen	0d50e5d8d4	[3/N] Fix unused loop variables (#166509 ) This PR removes unused loop variables in tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/166509 Approved by: https://github.com/Lucaskabela, https://github.com/Skylion007	2025-10-30 20:13:51 +00:00
linhaifeng	695cb0d342	[2/N][Fix] Fix typo in test folder (#166374 ) Fix typo in test folder. _typos.toml
```toml
[default.extend-words]
nd = "nd"
arange = "arange"
Nd = "Nd"
GLOBALs = "GLOBALs"
hte = "hte"
iy = "iy"
PN = "PN"
Dout = "Dout"
optin = "optin"
gam = "gam"
PTD = "PTD"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166374 Approved by: https://github.com/cyyever, https://github.com/ezyang	2025-10-29 03:02:07 +00:00
arkadip-maitra	84d8d06fc3	Fixes floating point exception in torch.nn.PixelShuffle (#163154 ) Fixes #162251 Previous Output: `Floating point exception (core dumped)` Now Output: `RuntimeError: upscale factor is too large, (upscale_factor}^2 overflowed: upscale_factor=545460846592` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163154 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-10-22 02:22:16 +00:00
Isalia20	d334c3649d	[CUDA] fix reflection padding for large batch size (#165942 ) Fixes [#165861](https://github.com/pytorch/pytorch/issues/165861) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165942 Approved by: https://github.com/eqy	2025-10-21 21:07:38 +00:00
Yuanyuan Chen	0e083942cc	Enable PLW0127 in ruff (#165851 ) This PR enables `PLW0127` in ruff, which checks self-assignment of variables with the form `var=var`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165851 Approved by: https://github.com/Lucaskabela	2025-10-21 03:30:57 +00:00
Yuanyuan Chen	e925dfcc6b	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhancing code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang, https://github.com/mlazos	2025-10-17 07:27:11 +00:00
Isalia20	d73c283c3a	[CUDA] Large tensor maxpool crash fix (#165374 ) Fixes #165297 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165374 Approved by: https://github.com/eqy, https://github.com/malfet	2025-10-16 07:59:46 +00:00
eqy	0d39ecb2ce	[cuDNN][RNN] cuDNN RNN supports BFloat16 inputs since 9.13 (#164411 ) seems to work Pull Request resolved: https://github.com/pytorch/pytorch/pull/164411 Approved by: https://github.com/Skylion007	2025-10-08 15:26:50 +00:00
PyTorch MergeBot	5d7360bb03	Revert "Enable all SIM rules except disabled ones (#164645 )" This reverts commit `321e602692`. Reverted https://github.com/pytorch/pytorch/pull/164645 on behalf of https://github.com/izaitsevfb due to causes lint failures ([comment](https://github.com/pytorch/pytorch/pull/164645#issuecomment-3369274351))	2025-10-05 19:32:21 +00:00
Yuanyuan Chen	321e602692	Enable all SIM rules except disabled ones (#164645 ) `SIM` rules are useful for simplifying boolean expressions and enhancing code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164645 Approved by: https://github.com/ezyang	2025-10-05 07:38:25 +00:00
blorange-amd	15d726005d	Enable several unit tests on ROCm (#163087 ) Code change enables: test_nn::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float16 test_nn::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float32 test_nn::TestNNDeviceTypeCUDA::test_transformerencoderlayer_cuda_float64 test_nn::TestNNDeviceTypeCUDA::test_transformerencoderlayer_gelu_cuda_float16 test_linalg::TestLinalgCUDA::test_eigh_svd_illcondition_matrix_input_should_not_crash_cuda_float32 test_linalg::TestLinalgCUDA::test_eigh_svd_illcondition_matrix_input_should_not_crash_cuda_float64 test_ops::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32 Fixes https://github.com/pytorch/pytorch/issues/134687 Fixes https://github.com/pytorch/pytorch/issues/78205 Closing github issues: inductor/test_gpu_cpp_wrapper unit tests: Fixes https://github.com/pytorch/pytorch/issues/157084 test_nn unit tests: Fixes https://github.com/pytorch/pytorch/issues/157167 Fixes https://github.com/pytorch/pytorch/issues/157119 Fixes https://github.com/pytorch/pytorch/issues/157118 Fixes https://github.com/pytorch/pytorch/issues/157115 Fixes https://github.com/pytorch/pytorch/issues/157081 Fixes https://github.com/pytorch/pytorch/issues/155216 Fixes https://github.com/pytorch/pytorch/issues/157259 Fixes https://github.com/pytorch/pytorch/issues/157166 Fixes https://github.com/pytorch/pytorch/issues/157165 Fixes https://github.com/pytorch/pytorch/issues/157164 Fixes https://github.com/pytorch/pytorch/issues/157117 Fixes https://github.com/pytorch/pytorch/issues/157116 Fixes https://github.com/pytorch/pytorch/issues/157114 Fixes https://github.com/pytorch/pytorch/issues/157113 Fixes https://github.com/pytorch/pytorch/issues/157082 Fixes https://github.com/pytorch/pytorch/issues/157080 Fixes https://github.com/pytorch/pytorch/issues/157079 Fixes https://github.com/pytorch/pytorch/issues/157078 test_linalg unit tests: Fixes https://github.com/pytorch/pytorch/issues/157427 
Fixes https://github.com/pytorch/pytorch/issues/157414 Fixes https://github.com/pytorch/pytorch/issues/157369 Fixes https://github.com/pytorch/pytorch/issues/157349 Fixes https://github.com/pytorch/pytorch/issues/157348 Fixes https://github.com/pytorch/pytorch/issues/157337 Fixes https://github.com/pytorch/pytorch/issues/157336 Fixes https://github.com/pytorch/pytorch/issues/157297 Fixes https://github.com/pytorch/pytorch/issues/157281 Fixes https://github.com/pytorch/pytorch/issues/157260 Fixes https://github.com/pytorch/pytorch/issues/157171 Fixes https://github.com/pytorch/pytorch/issues/157169 Fixes https://github.com/pytorch/pytorch/issues/157168 Fixes https://github.com/pytorch/pytorch/issues/157125 Fixes https://github.com/pytorch/pytorch/issues/157124 Fixes https://github.com/pytorch/pytorch/issues/157123 Fixes https://github.com/pytorch/pytorch/issues/157089 Fixes https://github.com/pytorch/pytorch/issues/157088 Fixes https://github.com/pytorch/pytorch/issues/157087 Fixes https://github.com/pytorch/pytorch/issues/157068 Fixes https://github.com/pytorch/pytorch/issues/157067 Fixes https://github.com/pytorch/pytorch/issues/157066 Fixes https://github.com/pytorch/pytorch/issues/157047 Fixes https://github.com/pytorch/pytorch/issues/157046 Fixes https://github.com/pytorch/pytorch/issues/157045 Fixes https://github.com/pytorch/pytorch/issues/157044 Fixes https://github.com/pytorch/pytorch/issues/156997 Fixes https://github.com/pytorch/pytorch/issues/156996 Fixes https://github.com/pytorch/pytorch/issues/156995 Fixes https://github.com/pytorch/pytorch/issues/156994 Fixes https://github.com/pytorch/pytorch/issues/156993 Fixes https://github.com/pytorch/pytorch/issues/156991 Fixes https://github.com/pytorch/pytorch/issues/156990 Fixes https://github.com/pytorch/pytorch/issues/156989 Fixes https://github.com/pytorch/pytorch/issues/105118 Fixes https://github.com/pytorch/pytorch/issues/157415 Fixes https://github.com/pytorch/pytorch/issues/157282 Fixes 
https://github.com/pytorch/pytorch/issues/157261 Fixes https://github.com/pytorch/pytorch/issues/157170 Fixes https://github.com/pytorch/pytorch/issues/157126 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163087 Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony	2025-10-03 19:30:59 +00:00
vishalgoyal316	ace89350fc	better error handling for rrelu when lower or upper range is infinite (#160965 ) … - issue#153281 Fixes #153281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160965 Approved by: https://github.com/janeyx99	2025-09-30 05:01:32 +00:00
Yuanyuan Chen	a293206bd5	Fix invalid f-strings (#164112 ) Fixes invalid f-strings detected by `ruff`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164112 Approved by: https://github.com/Skylion007, https://github.com/mlazos	2025-09-30 04:17:13 +00:00
Xinya Zhang	3cbfbbd691	[ROCm] Transformer/SDPA unit test parity (#163745 )
## Major Changes
* Efficient Attention on ROCm requires the last dimensions of input tensors to align with 16 bytes.
  - Unlike FA, ME does not pad input tensors in `scaled_dot_product_attention`, hence this requirement.
* Fix `atomic_counter` handling in the varlen FA API.
* Unskip a few unit tests.

Fixes #157120 Fixes #157121 Fixes #157122 Fixes #157167 Fixes #155217 Fixes #157043 Fixes #157060 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163745 Approved by: https://github.com/jeffdaily	2025-09-25 17:14:19 +00:00
arkadip-maitra	a0d2d84846	Handling overflow for long int overflow for the product of kernel_height and kernel_width that overflows to be exactly 0 (#155989 ) Fixes [#155981](https://github.com/pytorch/pytorch/issues/155981) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155989 Approved by: https://github.com/malfet	2025-09-19 18:15:01 +00:00
PyTorch MergeBot	468c1f9e9d	Revert "[nn] Assert parsed iterable arguments are an appropriate length (#162340 )" This reverts commit `b5e6e58050`. Reverted https://github.com/pytorch/pytorch/pull/162340 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break an MPS tests on ExecuTorch ([comment](https://github.com/pytorch/pytorch/pull/162340#issuecomment-3282676242))	2025-09-11 21:22:57 +00:00
Jeff Daily	d65ffdef3d	[ROCm] fix miopen batchnorm changing output format (#162112 ) It was found that the integration of miopen batchnorm was causing the output to always be in default contig memory format even when the input was channels last. This also unskips a number of related unit tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162112 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com> Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com> Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2025-09-11 19:37:48 +00:00
Benjamin Glass	b5e6e58050	[nn] Assert parsed iterable arguments are an appropriate length (#162340 ) Fixes #162327 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162340 Approved by: https://github.com/Skylion007	2025-09-10 15:15:49 +00:00
mansiag05	5927a70934	NLLLoss: validate target is 0D when input is 1D (#161412 ) Add a shape check in nll_loss_forward to error out when both input and target are 1D. Added a unit test to cover the incompatible 1D/1D case. Fixes #157420 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161412 Approved by: https://github.com/ngimel	2025-09-06 20:58:42 +00:00
Yu, Guangye	3a20a20e70	Fix largeTensorTest malfunction on XPU (#161988 ) # Motivation https://github.com/pytorch/pytorch/pull/143553/files#diff-6492991193449e118ff0c8d42ca544cc38a73604e505ff246a3c711aeab91748R1345 makes `largeTensorTest` malfunction on XPU. This PR aims to fix it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161988 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-09-04 16:10:03 +00:00
Jeff Daily	99f356fa58	[ROCm] revamp miopen integration (#161687 ) Update sources under ATen/miopen and ATen/native/miopen to align with best practices. Avoid reshape_ calls inside backward operations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161687 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-09-03 22:28:09 +00:00
Rohit Singh Rathaur	fca2601c9d	Improve error message for unsupported padding config (#160866 ) Fixes #160053 The previous error message `Only 2D, 3D, 4D, 5D padding with non-constant padding are supported for now` was not clear. Now we have:
```
python3
Python 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:27:50) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
... import torch.nn.functional as F
... a = torch.empty(2,2,2,2)
... F.pad(a, (1,1), mode="circular")
...
Traceback (most recent call last):
  File "<python-input-0>", line 4, in <module>
    F.pad(a, (1,1), mode="circular")
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rrathaur/Desktop/pytorch/torch/nn/functional.py", line 5294, in pad
    return torch._C._nn.pad(input, pad, mode, value)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Padding size 2 is not supported for 4D input tensor.
Supported combinations for non-constant padding:
  - 2D or 3D input: padding size = 2 (pads last dimension)
  - 3D or 4D input: padding size = 4 (pads last 2 dimensions)
  - 4D or 5D input: padding size = 6 (pads last 3 dimensions)
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160866 Approved by: https://github.com/mikaylagawarecki	2025-09-02 07:15:59 +00:00
Kurt Mohler	6382302990	[MPS] Add `grid_sampler_3d` for MPS (#160541 ) This PR adds support for `grid_sampler_3d` for MPS with "bilinear" interpolation. NOTE: "nearest" interpolation is not yet supported Fixes #159882 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160541 Approved by: https://github.com/malfet	2025-08-15 16:19:25 +00:00
Nikita Shulga	e06b110f73	[Testing] Add MPS to NATIVE_DEVICES (#153835 ) This would allow me to eventually enable more opinfo tests against the MPS device. It was supposed to be a very simple test, but actually required minor adjustments to lots of test files, namely:
- Introduce `all_mps_types_and`, which is very similar to `all_types_and` but skips `float64`
- Decorate lots of tests with `@dtypesIfMPS(*all_mps_types())`
- Skip `test_from_dlpack_noncontinguous` as it currently crashes (needs to be fixed)
- Add lots of `expectedFailureIfMPS`
- Delete all `@onlyNativeDeviceTypesAnd("mps")`

<sarcasm> I love how well documented this variable is </sarcasm> Pull Request resolved: https://github.com/pytorch/pytorch/pull/153835 Approved by: https://github.com/Skylion007	2025-08-05 18:57:35 +00:00
PyTorch MergeBot	356ac3103a	Revert "Stop parsing command line arguments every time common_utils is imported. (#156703 )" This reverts commit `310f901a71`. Reverted https://github.com/pytorch/pytorch/pull/156703 on behalf of https://github.com/izaitsevfb due to breaking tests internally with `assert common_utils.SEED is not None` ([comment](https://github.com/pytorch/pytorch/pull/156703#issuecomment-3152337518))	2025-08-04 20:37:39 +00:00
Anthony Barbier	310f901a71	Stop parsing command line arguments every time common_utils is imported. (#156703 ) Last PR in the series to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs: https://github.com/pytorch/pytorch/pull/154612 https://github.com/pytorch/pytorch/pull/154628 https://github.com/pytorch/pytorch/pull/154715 https://github.com/pytorch/pytorch/pull/154716 https://github.com/pytorch/pytorch/pull/154725 https://github.com/pytorch/pytorch/pull/154728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156703 Approved by: https://github.com/clee2000	2025-08-02 16:38:54 +00:00
linmin	2a286cbdf4	Allow register_buffer with Tensor-like object (#159455 ) As torch allows extending the tensor with `__torch_function__`, it would be desirable to allow registering it as a buffer. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159455 Approved by: https://github.com/mikaylagawarecki	2025-08-01 15:31:38 +00:00
AaronWang04	dc286aef61	Fused RMSNorm Housekeeping (#159317 ) Small PR to address comments that were made from the original fused rmsnorm PR that were not landed Changes: - Warning message when input.dtype doesn't match weight.dtype - Ensure default epsilon value is correct Comments: https://github.com/pytorch/pytorch/pull/153666#discussion_r2114735005 https://github.com/pytorch/pytorch/pull/153666#discussion_r2223518064 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159317 Approved by: https://github.com/ngimel, https://github.com/Skylion007, https://github.com/eqy	2025-07-29 22:39:18 +00:00
eqy	8573a2beda	[CUDA] Fix missing `__syncthreads` in MultiMarginLoss backward (#158994 ) Turns out issue in #158921 is detectable with a simple unit test and adding the missing sync fixes it Pull Request resolved: https://github.com/pytorch/pytorch/pull/158994 Approved by: https://github.com/malfet, https://github.com/Skylion007 Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-07-24 20:47:29 +00:00
Jiang, Yanbing	f4d8bc46c7	Enable TF32 as fp32 internal precision for matmul/linear/conv (#157520 ) ### Description This PR enables TF32 as the fp32 internal precision for matmul/linear/conv in the `mkldnn backend`. Since the fp32 precision API was refined in https://github.com/pytorch/pytorch/pull/125888, it can easily be extended to support TF32 for the `mkldnn backend`.
```
torch.backends.mkldnn.matmul.fp32_precision = 'tf32'
torch.backends.mkldnn.conv.fp32_precision = "tf32"
```
The related kernel and UT updates are done. The wrapper `bf32_on_and_off` is updated to `reduced_f32_on_and_off`, and it can run tests 3 times: once with reduced_f32 OFF and twice with reduced_f32 ON (`bf32 ON` and `tf32 ON`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/157520 Approved by: https://github.com/mingfeima, https://github.com/jansel	2025-07-17 08:57:34 +00:00
Xuehai Pan	fc0376e8b1	[BE][2/6] fix typos in test/ (test/test_*.py) (#157636 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157636 Approved by: https://github.com/yewentao256, https://github.com/mlazos ghstack dependencies: #156311, #156609	2025-07-09 11:02:23 +00:00
zeshengzong	f41d017aa6	Add device check in `mse_loss` (#155089 ) Fixes #154978
## Test Result
```python
>>> import torch
>>> import numpy as np
>>> import torch.nn as nn
>>> import torch.distributions.normal as norm
>>> device = torch.device(('cuda' if torch.cuda.is_available() else 'cpu'))
>>> print('Using {}'.format(device))
Using cuda
>>> m = nn.Sequential(nn.Linear(1, 128).cuda(), nn.Tanh(), nn.Linear(128, 128).cuda(), nn.Tanh(), nn.Linear(128, 128).cuda(), nn.Tanh())
>>> m.to(device, dtype=None, non_blocking=False)
Sequential(
  (0): Linear(in_features=1, out_features=128, bias=True)
  (1): Tanh()
  (2): Linear(in_features=128, out_features=128, bias=True)
  (3): Tanh()
  (4): Linear(in_features=128, out_features=128, bias=True)
  (5): Tanh()
)
>>> opt = torch.optim.Adam(m.parameters(), lr=0.001)
>>> print('Number of trainable parameters: ', sum((p.numel() for p in m.parameters() if p.requires_grad)))
Number of trainable parameters: 33280
>>> input_tensor = torch.tensor(77.0, device=device)
>>> target = torch.tensor(66.0)
>>> loss_function = nn.MSELoss()
>>> print('Loss Function: ', loss_function)
Loss Function: MSELoss()
>>> loss = loss_function(input_tensor, target)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zong/code/pytorch/torch/nn/modules/module.py", line 1767, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/modules/module.py", line 1778, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/modules/loss.py", line 610, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zong/code/pytorch/torch/nn/functional.py", line 3903, in mse_loss
    return torch._C._nn.mse_loss(
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155089 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-07-04 12:37:48 +00:00
Ahmad Sharif	36dd598bda	layernorm tests: Tweak test thresholds for comparing tensors (#156699 ) After I landed this PR: https://github.com/pytorch/pytorch/pull/156600, this test was failing internally on large tensors because the differences were greater than tolerances on some cuda devices. We now raise the tolerances for larger tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156699 Approved by: https://github.com/eqy, https://github.com/ngimel	2025-07-02 19:33:38 +00:00
Dmitry Nikolaev	fe1f1a38df	add test_batchnorm_2D and 3D tests (#156498 ) New set of batchnorm tests to verify NCHW 2D/3D BatchNorm. This test also allows adding and configuring different BatchNorm tests (dtypes, NCHW/NHWC, Mixed) in the future. Based on:
- Train [test_batchnorm_cudnn_nhwc](`1051b93192/test/test_nn.py (L4985)`)
- Inference [test_batchnorm_nhwc_cuda](`1051b93192/test/test_nn.py (L5130)`)
```
test_batchnorm_3D_inference_NCHW_vs_cpu_float32 (__main__.TestNN.test_batchnorm_3D_inference_NCHW_vs_cpu_float32) ... ok (0.113s)
test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_bfloat16 (__main__.TestNN.test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_bfloat16) ... ok (0.057s)
test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_float16 (__main__.TestNN.test_batchnorm_3D_inference_NCHW_vs_cpu_mixed_float16) ... ok (0.063s)
test_batchnorm_3D_inference_NCHW_vs_native_float32 (__main__.TestNN.test_batchnorm_3D_inference_NCHW_vs_native_float32) ... ok (0.059s)
test_batchnorm_3D_inference_NCHW_vs_native_mixed_bfloat16 (__main__.TestNN.test_batchnorm_3D_inference_NCHW_vs_native_mixed_bfloat16) ... ok (0.006s)
test_batchnorm_3D_inference_NCHW_vs_native_mixed_float16 (__main__.TestNN.test_batchnorm_3D_inference_NCHW_vs_native_mixed_float16) ... ok (0.006s)
test_batchnorm_3D_train_NCHW_vs_cpu_float32 (__main__.TestNN.test_batchnorm_3D_train_NCHW_vs_cpu_float32) ... ok (0.007s)
test_batchnorm_3D_train_NCHW_vs_cpu_mixed_bfloat16 (__main__.TestNN.test_batchnorm_3D_train_NCHW_vs_cpu_mixed_bfloat16) ... ok (0.005s)
test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16 (__main__.TestNN.test_batchnorm_3D_train_NCHW_vs_cpu_mixed_float16) ... ok (0.005s)
test_batchnorm_3D_train_NCHW_vs_native_float32 (__main__.TestNN.test_batchnorm_3D_train_NCHW_vs_native_float32) ... ok (0.003s)
test_batchnorm_3D_train_NCHW_vs_native_mixed_bfloat16 (__main__.TestNN.test_batchnorm_3D_train_NCHW_vs_native_mixed_bfloat16) ... skip: bfloat16 NCHW train failed due to native tolerance issue (0.001s)
test_batchnorm_3D_train_NCHW_vs_native_mixed_float16 (__main__.TestNN.test_batchnorm_3D_train_NCHW_vs_native_mixed_float16) ... skip: 3D float16 NCHW train failed on ROCm<7.0 (0.001s)
test_batchnorm_2D_inference_NCHW_vs_cpu_float32 (__main__.TestNN.test_batchnorm_2D_inference_NCHW_vs_cpu_float32) ... ok (0.016s)
test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_bfloat16 (__main__.TestNN.test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_bfloat16) ... ok (0.003s)
test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_float16 (__main__.TestNN.test_batchnorm_2D_inference_NCHW_vs_cpu_mixed_float16) ... ok (0.003s)
test_batchnorm_2D_inference_NCHW_vs_native_float32 (__main__.TestNN.test_batchnorm_2D_inference_NCHW_vs_native_float32) ... ok (0.054s)
test_batchnorm_2D_inference_NCHW_vs_native_mixed_bfloat16 (__main__.TestNN.test_batchnorm_2D_inference_NCHW_vs_native_mixed_bfloat16) ... ok (0.002s)
test_batchnorm_2D_inference_NCHW_vs_native_mixed_float16 (__main__.TestNN.test_batchnorm_2D_inference_NCHW_vs_native_mixed_float16) ... ok (0.001s)
test_batchnorm_2D_train_NCHW_vs_cpu_float32 (__main__.TestNN.test_batchnorm_2D_train_NCHW_vs_cpu_float32) ... ok (0.007s)
test_batchnorm_2D_train_NCHW_vs_cpu_mixed_bfloat16 (__main__.TestNN.test_batchnorm_2D_train_NCHW_vs_cpu_mixed_bfloat16) ... ok (0.004s)
test_batchnorm_2D_train_NCHW_vs_cpu_mixed_float16 (__main__.TestNN.test_batchnorm_2D_train_NCHW_vs_cpu_mixed_float16) ... ok (0.004s)
test_batchnorm_2D_train_NCHW_vs_native_float32 (__main__.TestNN.test_batchnorm_2D_train_NCHW_vs_native_float32) ... ok (0.003s)
test_batchnorm_2D_train_NCHW_vs_native_mixed_bfloat16 (__main__.TestNN.test_batchnorm_2D_train_NCHW_vs_native_mixed_bfloat16) ... skip: bfloat16 NCHW train failed due to native tolerance issue (0.001s)
test_batchnorm_2D_train_NCHW_vs_native_mixed_float16 (__main__.TestNN.test_batchnorm_2D_train_NCHW_vs_native_mixed_float16) ... ok (0.002s)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156498 Approved by: https://github.com/jeffdaily	2025-06-25 20:38:02 +00:00
Ahmad Sharif	899d3d3e9e	Don't call `sum()` on a tensor that is not summable in layer_norm (#156600 ) Don't call `sum()` on a tensor that is default constructed. Previously we could call `sum()` on a tensor that was default-constructed. That would lead to an error like this:
```
Traceback (most recent call last):
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/home/ahmads/.conda/envs/pt3/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
       ^^^^^^^^
  File "/home/ahmads/personal/pytorch/torch/testing/_internal/common_utils.py", line 3191, in wrapper
    method(*args, **kwargs)
  File "/home/ahmads/personal/pytorch/test/test_nn.py", line 7235, in test_layer_norm_backwards_eps
    ln_out_cuda.backward(grad_output_cuda)
  File "/home/ahmads/personal/pytorch/torch/_tensor.py", line 647, in backward
    torch.autograd.backward(
  File "/home/ahmads/personal/pytorch/torch/autograd/__init__.py", line 354, in backward
    _engine_run_backward(
  File "/home/ahmads/personal/pytorch/torch/autograd/graph.py", line 829, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: tensor does not have a device
Exception raised from device_default at /home/ahmads/personal/pytorch/c10/core/TensorImpl.h:1265 (most recent call first):
C++ CapturedTraceback:
#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#6 c10::detail::torchCheckFail(char const, char const, unsigned int, char const) from ??:0
#7 at::TensorBase::options() const from :0
#8 at::meta::resize_reduction(at::impl::MetaBase&, at::Tensor const&, c10::OptionalArrayRef<long>, bool, c10::ScalarType, bool) from :0
#9 at::meta::structured_sum_dim_IntList::meta(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from ??:0
#10 at::(anonymous namespace)::wrapper_CompositeExplicitAutogradNonFunctional_sum_dim_IntList(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from RegisterCompositeExplicitAutogradNonFunctional_0.cpp:0
#11 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>), &at::(anonymous namespace)::wrapper_CompositeExplicitAutogradNonFunctional_sum_dim_IntList>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType> > >, at::Tensor (at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from RegisterCompositeExplicitAutogradNonFunctional_0.cpp:0
#12 at::_ops::sum_dim_IntList::call(at::Tensor const&, c10::OptionalArrayRef<long>, bool, std::optional<c10::ScalarType>) from ??:0
#13 void at::native::(anonymous namespace)::LaunchGammaBetaBackwardCUDAKernel<float, float>(float const, float const, float const, float const, long, long, at::Tensor, at::Tensor, CUstream_st) from ??:0
#14 void at::native::(anonymous namespace)::LayerNormBackwardKernelImplInternal<float>(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long, long, at::Tensor, at::Tensor, at::Tensor) from ??:0
#15 at::native::(anonymous namespace)::LayerNormBackwardKernelImpl(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long, long, at::Tensor, at::Tensor, at::Tensor) from ??:0
#16 at::native::layer_norm_backward_cuda(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::array<bool, 3ul>) from ??:0
#17 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA__native_layer_norm_backward(at::Tensor const&, at::Tensor const&, c10::ArrayRef<c10::SymInt>, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::array<bool, 3ul>) from RegisterCUDA_0.cpp:0
```
Now we only call `sum(0)` on tensors that are defined and properly guard the `sum(0)` and assignment. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156600 Approved by: https://github.com/eqy, https://github.com/ngimel	2025-06-24 05:00:42 +00:00
Nikita Shulga	c28e74e457	[MPS] Add nearest_3d forward and backward (#156090 ) Introduce generalizable `UpsampleParams` structure in `UpSample.h`, which could be shared between CPU and MPS. Delete `upsample_nearest3d` MPS fallback and replace it with a proper shader. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156090 Approved by: https://github.com/kulinseth, https://github.com/dcci ghstack dependencies: #156256	2025-06-18 04:48:15 +00:00
atalman	c199a4d0fd	Move non inductor workflows cuda 12.6->cuda 12.8 (#155234 ) Move non inductor workflows cuda 12.6->cuda 12.8 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155234 Approved by: https://github.com/Skylion007, https://github.com/zxiiro, https://github.com/cyyever, https://github.com/malfet	2025-06-12 12:42:34 +00:00
PyTorch MergeBot	40fefe2871	Revert "[BE] Update cudnn to 9.10.1.4 (#155122 )" This reverts commit `73220d52fd`. Reverted https://github.com/pytorch/pytorch/pull/155122 on behalf of https://github.com/atalman due to wrong pr description ([comment](https://github.com/pytorch/pytorch/pull/155122#issuecomment-2960592004))	2025-06-10 21:13:18 +00:00
Eddie Yan	35e8f2593c	[CUDA] Fix missing bounds check in `Softmax.cu` (#154778 ) Uncovered by @ngimel, same as change in #144009 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154778 Approved by: https://github.com/ngimel, https://github.com/cyyever, https://github.com/malfet	2025-06-10 20:03:54 +00:00
Aaron Gokaslan	73220d52fd	[BE] Update cudnn to 9.10.1.4 (#155122 ) Follow up to #152782 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155122 Approved by: https://github.com/malfet, https://github.com/atalman, https://github.com/eqy	2025-06-10 16:59:00 +00:00
zeshengzong	f12d8d60b1	Add hint message when parameters is empty in clip_grad_norm_ (#151529 ) Fixes #148259
## Changes
- Print a warning message when the `parameters` generator is exhausted

## Test Result
### print warning
```python
import torch
import torch.nn as nn
import torch.optim as optim

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(16, 10)
targets = torch.randn(16, 1)

outputs = model(inputs)
loss = criterion(outputs, targets)
optimizer.zero_grad()
loss.backward()

params_to_clip = model.parameters()

# Iterating here exhausts the generator before it reaches clip_grad_norm_.
for p in params_to_clip:
    print(p.shape)

max_norm = 1.0
norm_type = 2.0
total_norm = nn.utils.clip_grad_norm_(params_to_clip, max_norm, norm_type)
print(f"total_norm: {total_norm}")
```
```bash
/home/zong/code/pytorch/torch/nn/utils/clip_grad.py:222: UserWarning: `parameters` is an empty generator, no gradient clipping will occur.
  warnings.warn(
total_norm: 0.0
```
### UT
```bash
pytest test/test_nn.py -k test_clip_grad_norm
```
![image](https://github.com/user-attachments/assets/0aa0f06c-e0a5-43cf-9a97-d7c2747c9180) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151529 Approved by: https://github.com/jbschlosser	2025-05-22 11:23:39 +00:00
Eddie Yan	cecfc7dc53	[CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss` (#152745 ) https://github.com/pytorch/pytorch/pull/128271 migrated to cuDNN V8 CTCLoss which expects input and target length tensors to be on `CUDA` rather than `CPU` without adding the logic to account for the edge case of them being on `CPU` see also #152421 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152745 Approved by: https://github.com/Skylion007	2025-05-07 22:01:18 +00:00
iupaikov-amd	730a077d48	[ROCm] Unskipped test_rnn_dropout_state for ROCm (#152339 ) Unskipping the test, should work fine now. Related PR: https://github.com/pytorch/pytorch/pull/144572 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152339 Approved by: https://github.com/jeffdaily	2025-05-02 22:02:30 +00:00
abcarlisle	a1a4fee3b8	Native channel shuffle floating point exception (#144010 ) Fixes #142453 Added TORCH_CHECKS to prevent the user from using the native_channel_shuffle function incorrectly and getting a "Floating point exception (core dumped)" Pull Request resolved: https://github.com/pytorch/pytorch/pull/144010 Approved by: https://github.com/albanD	2025-04-29 23:38:54 +00:00
Jagadish Krishnamoorthy	0d99b4e9e2	ROCm: Enable tf32 testing on test_nn (#148945 ) Add tf32 support for ROCm tests. test command: python test/test_nn.py -v Pull Request resolved: https://github.com/pytorch/pytorch/pull/148945 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-04-28 23:01:04 +00:00
eqy	34b0de50a3	[TF32][CUDA] account for TF32 in `test_linear_autograd` (#152216 ) Abate some more noise seen on blackwell Pull Request resolved: https://github.com/pytorch/pytorch/pull/152216 Approved by: https://github.com/Skylion007	2025-04-28 21:00:17 +00:00
Anthony Shoumikhin	e2f9759bd0	Fix broken URLs (#152237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet	2025-04-27 09:56:42 +00:00
eqy	6efc572221	[CUDA][CPU] Bump system memory requirement for `test_cross_entropy_large_tensor` (#151812 ) `/usr/bin/time` seems to show max resident pages at 119GiB Pull Request resolved: https://github.com/pytorch/pytorch/pull/151812 Approved by: https://github.com/colesbury	2025-04-24 19:25:29 +00:00
zeshengzong	01f226bfb8	Add check for ctc_loss targets param (#150981 ) Fixes #150835 ## Test Result ```python # cuda >>> import torch >>> import torch.nn.functional as F >>> device = "cuda" # "cpu" is fine >>> num_classes = 4 >>> log_probs = torch.rand(0, 0, num_classes, device=device) >>> targets = torch.tensor([], device=device, dtype=torch.long) >>> input_lengths = torch.tensor([], device=device, dtype=torch.long) >>> target_lengths = torch.tensor([], device=device, dtype=torch.long) >>> result = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, reduction='none') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/zong/code/pytorch/torch/nn/functional.py", line 3079, in ctc_loss return torch.ctc_loss( ^^^^^^^^^^^^^^^ RuntimeError: log_probs tensor must not be empty # cpu >>> device = "cpu" >>> num_classes = 4 >>> log_probs = torch.rand(0, 0, num_classes, device=device) >>> targets = torch.tensor([], device=device, dtype=torch.long) >>> input_lengths = torch.tensor([], device=device, dtype=torch.long) >>> target_lengths = torch.tensor([], device=device, dtype=torch.long) >>> result = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, reduction='none') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/zong/code/pytorch/torch/nn/functional.py", line 3079, in ctc_loss return torch.ctc_loss( ^^^^^^^^^^^^^^^ RuntimeError: log_probs tensor must not be empty ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150981 Approved by: https://github.com/eqy	2025-04-14 07:24:30 +00:00

1598 Commits