Fix `test_ck_blas_library` on non-ROCm builds:
```
root@e01-tw-ue5g2g3sap6:~/pytorch/test# python test_linalg.py TestLinalgCPU.test_ck_blas_library_cpu
E
======================================================================
ERROR: test_ck_blas_library_cpu (__main__.TestLinalgCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
  File "/root/pytorch/torch/testing/_internal/common_device_type.py", line 480, in instantiated_test
    raise rte
  File "/root/pytorch/torch/testing/_internal/common_device_type.py", line 460, in instantiated_test
    result = test(self, **param_kwargs)
  File "/root/pytorch/torch/testing/_internal/common_device_type.py", line 1242, in dep_fn
    return fn(slf, *args, **kwargs)
  File "/root/pytorch/torch/testing/_internal/common_utils.py", line 1981, in _fn
    fn(*args, **kwargs)
  File "/root/pytorch/test/test_linalg.py", line 8621, in test_ck_blas_library
    torch.backends.cuda.preferred_blas_library('ck')
  File "/root/pytorch/torch/backends/cuda/__init__.py", line 258, in preferred_blas_library
    torch._C._set_blas_preferred_backend(_BlasBackends[backend])
RuntimeError: Cannot set preferred backend to Ck if PyTorch has not been compiled for ROCm.
To execute this test, run the following from the base repo dir:
python test/test_linalg.py TestLinalgCPU.test_ck_blas_library_cpu
This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
----------------------------------------------------------------------
Ran 1 test in 0.346s
FAILED (errors=1)
```
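The failure shows that `preferred_blas_library('ck')` is only valid when PyTorch is built for ROCm, so the test must be kept from running on other builds. A minimal sketch of such a guard (hypothetical; the actual PR may use a different decorator):
```python
import unittest

import torch


class TestLinalg(unittest.TestCase):
    # Only run the CK BLAS test when PyTorch was compiled for ROCm;
    # torch.version.hip is None on non-ROCm builds.
    @unittest.skipUnless(torch.version.hip is not None, "CK requires a ROCm build")
    def test_ck_blas_library(self):
        torch.backends.cuda.preferred_blas_library('ck')
```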
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148316
Approved by: https://github.com/jeffdaily
Summary:
Optimize the decomposition of `aten.native_group_norm`. Reduce unnecessary repeated operations by reordering the operations on `mean`, `rstd`, `weight`, `bias`, and `input`, which improves performance when `flattened_inner_size` is large.
The original decomposition:
1. compute `mean` and `rstd`,
2. out = (x - mean) * rstd, computed over the range [N, C, *],
3. out = out * weight + bias, computed over the range [N, C, *].
The new decomposition:
1. compute `mean` and `rstd`,
2. new_weight = rstd * weight, new_bias = -mean * rstd * weight + bias, computed over the range [N, C],
3. out = x * new_weight + new_bias, computed over the range [N, C, *] (see the sketch below).
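A minimal numerical sketch of the two decompositions (illustrative only; the actual change lives in the Inductor decomposition of `aten.native_group_norm`). Here `mean`/`rstd` are assumed to already be expanded to per-channel shape `[N, C, 1]`:
```python
import torch

def decomp_old(x, mean, rstd, weight, bias):
    # Steps 2 and 3 both run over the full [N, C, *] range.
    out = (x - mean) * rstd           # [N, C, *]
    return out * weight + bias        # [N, C, *]

def decomp_new(x, mean, rstd, weight, bias):
    # Fold mean/rstd into the affine parameters over the small [N, C] range,
    # leaving a single fused multiply-add over [N, C, *].
    new_weight = rstd * weight                 # [N, C, 1]
    new_bias = -mean * rstd * weight + bias    # [N, C, 1]
    return x * new_weight + new_bias           # [N, C, *]

# Toy check with a single group (G=1), so group stats broadcast per channel.
N, C, L, eps = 2, 4, 1024, 1e-5
x = torch.randn(N, C, L)
mean = x.mean(dim=(1, 2), keepdim=True).expand(N, C, 1)
rstd = (x.var(dim=(1, 2), unbiased=False, keepdim=True) + eps).rsqrt().expand(N, C, 1)
weight, bias = torch.randn(1, C, 1), torch.randn(1, C, 1)
torch.testing.assert_close(decomp_old(x, mean, rstd, weight, bias),
                           decomp_new(x, mean, rstd, weight, bias))
```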
I tested the Inductor performance benchmarks with this PR on both CPU and A100. On CPU, two torchbench models (functorch_dp_cifar10 and opacus_cifar10) show about a 25% performance improvement, and two diffusion models (Stable Diffusion and Latent Consistency Model (LCM)) show about a 2% performance improvement. On A100, no performance gains or regressions were seen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144733
Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel
This is a new version of https://github.com/pytorch/pytorch/pull/148561, fixing the ROCm test failure.
Putting this up for a first-pass review, though I will likely make a bunch of changes before landing to add more features, etc.
This diff implements a first version of a static CUDA kernel launcher in `torch._C`. The goal is to take a cubin file and some metadata from a Triton `CompiledKernel` and launch the cubin file directly.
Background doc: https://docs.google.com/document/d/1rjRcHl6MfauHG30nCoQX-9UKvKyIs4WWMy_GsGyqb9g/edit?tab=t.0#heading=h.ut5lf39lzq66
Normally, using Triton's `CompiledKernel.make_launcher()`, we would pay the cost of codegenning C++ and compiling it at compile time. With this new approach, we can use one statically compiled library to launch the kernel.
The tradeoff is that this new kernel launcher cannot use codegen to deal with different numbers and types of arguments, so for now we use templating to handle up to 10 arguments. We also allocate 8 bytes on the stack per argument regardless of the argument type, which can take more memory than the codegenned launcher. On the other hand, we improve compile time on both cold and warm start by not having to call the C++ compiler at all.
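For intuition, here is a rough Python/ctypes sketch of the core idea — pack every argument into a fixed 8-byte slot and hand an array of pointers to the CUDA driver's `cuLaunchKernel`. This is purely illustrative; the actual launcher is C++ inside `torch._C`:
```python
import ctypes

libcuda = ctypes.CDLL("libcuda.so")  # CUDA driver API

def launch(function, grid, block, args, shared_mem=0, stream=None):
    # One fixed 8-byte slot per argument, regardless of its real type
    # (the simplification described above; real code handles types per slot).
    slots = (ctypes.c_uint64 * len(args))(*(int(a) for a in args))
    # cuLaunchKernel takes an array of *pointers* to the argument values.
    params = (ctypes.c_void_p * len(args))(
        *(ctypes.addressof(slots) + 8 * i for i in range(len(args)))
    )
    err = libcuda.cuLaunchKernel(
        ctypes.c_void_p(function),      # CUfunction from the loaded cubin
        *grid, *block,                  # gridX/Y/Z, blockX/Y/Z
        shared_mem,                     # dynamic shared memory bytes
        ctypes.c_void_p(stream or 0),   # CUstream (0 = default stream)
        params,                         # kernel parameter pointers
        None,                           # extra launch options
    )
    if err != 0:
        raise RuntimeError(f"cuLaunchKernel failed with CUDA error {err}")
```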
This diff does not add the launcher to torch, but introduces a basic test suite.
A list of TODOs that are not yet complete:
- Handle `nvTmaDesc` and `cuTensorMap`, which Triton handles
- Embed the grid logic instead of passing in gridX, gridY, gridZ
- Handle launch_enter and exit hooks? (Not sure if Inductor has these.)
- Benchmarking to see if there's runtime performance loss
- Probably lots of features of Triton's generated C++ code that aren't handled yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149238
Approved by: https://github.com/oulgen
Summary: as title.
See internal Diff summary for more context.
Test Plan: buck run @fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r config_not_generated
Differential Revision: D71241676
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149246
Approved by: https://github.com/houseroad
Co-authored-by: Huamin Li <huaminli@meta.com>
We use dummy tensors in our initial trace, so we should never inline: the subclass dispatch might not support the dummy tensor. For example, DTensor's accumulate-grad path checks that both the param and the grad are DTensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149014
Approved by: https://github.com/jansel
ghstack dependencies: #149064
And delete some duplicated glue code by relying on the stub.
After this change, `torch.arange(10, device='mps') // torch.arange(10., device='mps')` will return a tensor of floats, which is the common dtype for a float-integer operation, rather than a tensor of ints.
Verified by `test_div2` in Inductor testing.
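A quick repro of the behavior described above (requires an MPS device):
```python
import torch

a = torch.arange(10, device="mps")    # integer tensor
b = torch.arange(10., device="mps")   # float tensor
# After this change the result has the common float dtype, not an int dtype.
print((a // b).dtype)  # torch.float32
```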
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149233
Approved by: https://github.com/atalman
ghstack dependencies: #149216
Fixes #103425
## Changes
- Add doc description that size values `must be > 0`
- Add validation for the `in1_features` param
Currently only `in1_features` causes a runtime error; adding checks for `in2_features` and `out_features` as well might be BC-breaking.
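A minimal sketch of the kind of check this adds (hypothetical helper; the actual validation lives in `nn.Bilinear`'s constructor):
```python
# Hypothetical sketch: reject non-positive feature sizes up front.
def _check_positive(name: str, value: int) -> None:
    if value <= 0:
        raise ValueError(f"{name} must be > 0, but got {value}")

# e.g. inside Bilinear.__init__: _check_positive("in1_features", in1_features)
```
Repro from the issue: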
```python
import torch
from torch import nn

class lenet(nn.Module):
    def __init__(self):
        super(lenet, self).__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1)
        # Error; with `in1_features=1, in2_features=0, out_features=0` there is no error
        self.linear = nn.Bilinear(in1_features=0, in2_features=0, out_features=0)

    def forward(self, x):
        # 1st block
        x = self.conv(x)
        x = self.linear(x)
        return x

if __name__ == '__main__':
    net = lenet()
```
## Test Result
```bash
pytest test/test_nn.py -k test_bilinear -vv
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149018
Approved by: https://github.com/mikaylagawarecki
Putting this up for a first-pass review, though I will likely make a bunch of changes before landing to add more features, etc.
This diff implements a first version of a static CUDA kernel launcher in `torch._C`. The goal is to take a cubin file and some metadata from a Triton `CompiledKernel` and launch the cubin file directly.
Background doc: https://docs.google.com/document/d/1rjRcHl6MfauHG30nCoQX-9UKvKyIs4WWMy_GsGyqb9g/edit?tab=t.0#heading=h.ut5lf39lzq66
Normally, using Triton's `CompiledKernel.make_launcher()`, we would pay the cost of codegenning C++ and compiling it at compile time. With this new approach, we can use one statically compiled library to launch the kernel.
The tradeoff is that this new kernel launcher cannot use codegen to deal with different numbers and types of arguments, so for now we use templating to handle up to 10 arguments. We also allocate 8 bytes on the stack per argument regardless of the argument type, which can take more memory than the codegenned launcher. On the other hand, we improve compile time on both cold and warm start by not having to call the C++ compiler at all.
This diff does not add the launcher to torch, but introduces a basic test suite.
A list of TODOs that are not yet complete and will be done in separate diffs:
- Handle `nvTmaDesc` and `cuTensorMap`, which Triton handles
- Embed the grid logic instead of passing in gridX, gridY, gridZ. With https://github.com/pytorch/pytorch/pull/147583, we should be able to handle all of the grid logic directly in `_StaticCudaLauncher.launch_kernel` and get rid of the Python evaluation.
- Handle launch_enter and exit hooks? (Not sure if Inductor has these.)
- Benchmarking to see if there's runtime performance loss
- Hooking it up with a config to Inductor
- Testing harness to test against torch-generated Triton kernels
Differential Revision: [D69926783](https://our.internmc.facebook.com/intern/diff/D69926783/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148561
Approved by: https://github.com/aorenste, https://github.com/syed-ahmed
Changes in this PR:
1. Add `is_structseq` and `is_structseq_class` functions to determine whether an object or a class is a PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types, analogous to `namedtuple` for named tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple. Before this PR, only classes directly created by `collections.namedtuple` or `typing.NamedTuple` were considered namedtuple classes, while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple classes as well.
Resolves #75982. New tests are included in this PR.
- #75982
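A small illustration of the distinction (the helper names come from the change list above; the import path is an assumption):
```python
import collections
import time

Point = collections.namedtuple("Point", ["x", "y"])

class Point3(Point):
    """A subclass of a namedtuple; treated as a namedtuple after this PR."""

# time.struct_time is a PyStructSequence (a C-level tuple-like type),
# not a collections.namedtuple:
print(isinstance(time.localtime(), tuple))  # True, but not a namedtuple

# Hypothetical usage of the new helpers (import path is an assumption):
# from torch.utils._pytree import is_structseq, is_namedtuple
# is_structseq(time.struct_time)  # -> True
# is_namedtuple(Point)            # -> True
# is_namedtuple(Point3)           # -> True after this PR (False before)
```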
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
Summary: Diff D70471332 (https://www.internalfb.com/diff/D70471332) removed the "grid" input when calling Triton kernels. The PyTorch execution trace needs to make the corresponding change, which covers both capturing the ET and replaying the ET.
Test Plan:
buck2 run mode/opt caffe2/test:test_profiler_cuda -- profiler.test_execution_trace.TestExecutionTraceCUDA.test_execution_trace_with_pt2_cuda
buck2 run mode/opt param_bench/fb/integration_tests:test_et_replay
Differential Revision: D71152464
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149159
Approved by: https://github.com/sraikund16, https://github.com/jansel