Commit Graph

32678 Commits

Author SHA1 Message Date
PyTorch MergeBot
afa1eda901 Revert "[PGNCCL] Launch kernel on current stream & remove record_stream entirely (#148590)"
This reverts commit ef6296e7f2.

Reverted https://github.com/pytorch/pytorch/pull/148590 on behalf of https://github.com/izaitsevfb due to reverted internally, see D71292427 ([comment](https://github.com/pytorch/pytorch/pull/148590#issuecomment-2731114626))
2025-03-17 22:43:15 +00:00
Isuru Fernando
a3c6e3139a allow extra args for parameterization of tests in inductor (#149154)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149154
Approved by: https://github.com/amjames, https://github.com/eellison
2025-03-17 22:05:06 +00:00
Davide Italiano
e4f6e4ac84 [MPS] Add inductor support for modified_bessel_i0. (#149342)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149342
Approved by: https://github.com/malfet
2025-03-17 21:45:51 +00:00
Sampsa
5e9f792479 [ROCm] Unskip flex attention UTs after triton 3.3 bump (#148327)
Enable `test_flex_attention.py::TestLearnableBiases` unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148327
Approved by: https://github.com/jeffdaily
2025-03-17 20:15:14 +00:00
Shunting Zhang
6c7d8419e3 fix two accuracy regression (#149172)
There are 2 accuracy regression in 3/12 nightly perf run. I can not repro them locally thus there is no effective way to bisect. Raise the tolerance to make them pass the accuracy check.

- error log for HF MegatronBertForQuestionAnswering https://gist.github.com/shunting314/25322b66e15e98feed32e0d9a1e43316
- error log for TIMM gluon_inception_v3 https://gist.github.com/shunting314/df64ce22327df27a7057bbbd19ef5164

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149172
Approved by: https://github.com/jansel, https://github.com/eellison
2025-03-17 19:34:00 +00:00
angelayi
8d7c430e84 Symintify transpose_ (#149057)
Fixes https://github.com/pytorch/pytorch/issues/148702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149057
Approved by: https://github.com/yushangdi
2025-03-17 19:11:54 +00:00
Isuru Fernando
c41c2130be Fix printing INT64_MIN (#149148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149148
Approved by: https://github.com/anijain2305
2025-03-17 17:57:18 +00:00
Yichen Yan
8cdb9adc05 do not run test_ck_blas_library on cpu (#148316)
Fix on non-rocm:

```
root@e01-tw-ue5g2g3sap6:~/pytorch/test# python test_linalg.py TestLinalgCPU.test_ck_blas_library_cpu
E
======================================================================
ERROR: test_ck_blas_library_cpu (__main__.TestLinalgCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/pytorch/torch/testing/_internal/common_utils.py", line 3108, in wrapper
    method(*args, **kwargs)
  File "/root/pytorch/torch/testing/_internal/common_device_type.py", line 480, in instantiated_test
    raise rte
  File "/root/pytorch/torch/testing/_internal/common_device_type.py", line 460, in instantiated_test
    result = test(self, **param_kwargs)
  File "/root/pytorch/torch/testing/_internal/common_device_type.py", line 1242, in dep_fn
    return fn(slf, *args, **kwargs)
  File "/root/pytorch/torch/testing/_internal/common_utils.py", line 1981, in _fn
    fn(*args, **kwargs)
  File "/root/pytorch/test/test_linalg.py", line 8621, in test_ck_blas_library
    torch.backends.cuda.preferred_blas_library('ck')
  File "/root/pytorch/torch/backends/cuda/__init__.py", line 258, in preferred_blas_library
    torch._C._set_blas_preferred_backend(_BlasBackends[backend])
RuntimeError: Cannot set preferred backend to Ck if PyTorch has not been compiled for ROCm.

To execute this test, run the following from the base repo dir:
    python test/test_linalg.py TestLinalgCPU.test_ck_blas_library_cpu

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 1 test in 0.346s

FAILED (errors=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148316
Approved by: https://github.com/jeffdaily
2025-03-17 17:45:45 +00:00
Rachel Guo
aaa4c3d60b [mm_logs] make aten mm info readable (#148800)
Summary:
as title. make it into a table like

e.g. also see pic in test plan

| Name     | M   | N   | K   | Count |
| aten.mm | 16  | 6   |  16 |     1     |
...

Test Plan: {F1975907876}
<img width="1090" alt="Screenshot 2025-03-11 at 3 13 00 PM" src="https://github.com/user-attachments/assets/ffae8c56-e32c-49cc-bbfb-5b8d216b8657" />

Differential Revision: D70825664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148800
Approved by: https://github.com/henrylhtsang
2025-03-17 17:00:58 +00:00
PyTorch UpdateBot
790f93db3a Update slow tests (#149300)
This PR is auto-generated weekly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/weekly.yml).
Update the list of slow tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149300
Approved by: https://github.com/pytorchbot
2025-03-17 11:39:29 +00:00
Sun, Jiayi
b2862f1435 optimize the decomposition of aten.native_group_norm (#144733)
Summary:
Optimize the decomposition of aten.native_group_norm. Reduce unnecessary repeated operations by changing the order of operations for `mean`, `rstd`, `weight`, `bias `and `input`, which can improve performance when `flattened_inner_size `is large.

The original decomposition:
1. compute `mean `and `rstd`,
2. out = (x - mean) * rstd, compute in the range [N, C, *],
3. out = out * weight + bias, compute in the range [N, C, *],

The new decomposition:
1. compute `mean `and `rstd`,
2. new_weight = rstd * weight, new_bias = - mean * rstd * weight + bias, compute in the range [N, C],
3. out = out * new_weight + new_bias, compute in the range [N, C, *],

I tested the Inductor performance benchmark with this PR on both CPU and A100. On CPU, two torchbench models(functorch_dp_cifar10 and opacus_cifar10) have about 25% performance improvement, and two diffusion models(Stable Diffusion and Latent Consistency Model(LCM)) have about 2% performance improvement. On A100, no performance gains or regressions were seen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144733
Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel
2025-03-17 09:27:01 +00:00
soulitzer
916e8979d3 Skip some tests not using gradcheck on slowgradcheck (#149220)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149220
Approved by: https://github.com/seemethere
2025-03-17 00:34:52 +00:00
eqy
6048d88afe [ARM64][CUDA] skip string pattern matching in test_workspace_allocation_error (#149236)
`unwind()` on ARM64 seems to elide the strings of interest

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149236
Approved by: https://github.com/malfet, https://github.com/eellison, https://github.com/BoyuanFeng
2025-03-17 00:30:43 +00:00
Aaron Gokaslan
bfee141666 [BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149257
Approved by: https://github.com/jansel
2025-03-16 23:52:58 +00:00
Tugsbayasgalan Manlaibaatar
6b1b95ad2a Support subclass constructor capturing in export (#147014)
Notable TODOs:
1. Need to implement AutogradHOP to get rid of subclasses before serializing
2. Need to implement mechanism to figure out what subclasses will be used in export when they are not expressed in the inputs

Differential Revision: [D69640673](https://our.internmc.facebook.com/intern/diff/D69640673)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147014
Approved by: https://github.com/bdhirsh
2025-03-16 18:19:19 +00:00
Davide Italiano
9f33c6f0a0 [MPS] Add support for modified_bessel_i0 in eager. (#149264)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149264
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-03-16 04:45:49 +00:00
James Wu
a9c55277d7 [Reland] First version of statically compiled launcher for triton compiled CUDA kernels (#149238)
This is a new version of https://github.com/pytorch/pytorch/pull/148561 fixing the ROCM test failure

Putting this up for a first pass review, though I will likely make a bunch of changes before landing to add more features, etc.

This diff implements a first version of a static CUDA kernel launcher in `torch._C`. The goal here is to take a cubin file and some metadata from a CompiledKernel from `triton`, and launch the cubin file directly.

Background doc: https://docs.google.com/document/d/1rjRcHl6MfauHG30nCoQX-9UKvKyIs4WWMy_GsGyqb9g/edit?tab=t.0#heading=h.ut5lf39lzq66

Normally, using triton's CompiledKernel.make_launcher(), we would pay the cost of codegenning C++ and running it at compile time. With this new approach, we can use one statically compiled library to launch the kernel.

The tradeoff here is that this new kernel launcher will not be able to use codegen to deal with different lengths/types of arguments. So we use templating to handle up to 10 arguments for now. We also allocate 8 bytes on the stack per argument no matter the argument type, which can take more memory than codegenning. On the other hand, we improve compile time on cold and warm start by not having to call the C++ compiler at all.

This diff does not add the launcher to torch, but introduces a basic test suite.

A list of TODOs that are not yet complete:
- Handle `nvTmaDesc` and `cuTensorMap`, which triton handles
- Embed the grid logic instead of passing in gridX,Y,Z
- Handle launch_enter and exit hooks? (Not sure if inductor has these)
- Benchmarking to see if there's runtime performance loss
- Probably lots of features of the triton C++ generated code that I haven't handled yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149238
Approved by: https://github.com/oulgen
2025-03-15 15:06:46 +00:00
Huamin Li
e7e477c1f9 Not generate custom obj json when it's empty (#149246)
Summary: as title.

See internal Diff summary for more context.

Test Plan: buck run @fbcode//mode/dev-nosan //caffe2/test/inductor:torchbind -- -r config_not_generated

Differential Revision: D71241676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149246
Approved by: https://github.com/houseroad

Co-authored-by: Huamin Li <huaminli@meta.com>
2025-03-15 13:00:48 +00:00
Jane Xu
740ce0fa5f op should NOT be static in aoti_torch_call_dispatcher (#149208)
aoti_torch_call_dispatcher is meant to call different ops, so the op must not be static. Otherwise, every call to this API will call the first op that was ever called, which is not the intended behavior of any human being.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149208
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/malfet
2025-03-15 01:47:11 +00:00
Simon Fan
578160c875 [ca] don't inline accumulate grad op (#149014)
we use dummy tensors in our initial trace, so we should never inline. the subclass dispatch might not support the dummy tensor, e.g. DTensor accumulate grad will check that both param and grad are DTensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149014
Approved by: https://github.com/jansel
ghstack dependencies: #149064
2025-03-15 01:10:54 +00:00
Simon Fan
f4368d8872 [ca] clean up aot node deduping (#149064)
rename the AOT nodes as we copy paste them into the CA graph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149064
Approved by: https://github.com/jansel
2025-03-15 01:10:54 +00:00
Nikita Shulga
96795e9533 [BE] Parametrize TestMPS.test_binops_dtype_precedence (#149234)
No op change, just splits a longer tests into a series of a smaller ones
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149234
Approved by: https://github.com/atalman, https://github.com/dcci
ghstack dependencies: #149216, #149233
2025-03-15 00:37:11 +00:00
yifanmao
7537b19c73 [FSDP2] Update ignored_params docstring and add unit test (#149074)
Fixes https://github.com/pytorch/pytorch/issues/148242

ignored_params won't be moved to devices in full_shard(), update docstring.
Add unit test `test_move_states_to_device_ignored_param_device` to show that ignored_params won't be moved during full_shard(), but would be after `model.cuda()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149074
Approved by: https://github.com/awgu
2025-03-15 00:23:09 +00:00
Nikita Shulga
08af311fc2 [MPS] Fix type promotion for torch.floor_divide (#149233)
And delete some duplicating glue code by relying on the stub
After this change `torch.arange(10, device = 'mps') // torch.arange(10., device='mps')` will return tensor of floats, which is a common dtype for float + integral operation, rather than tensor of ints
Checked by `test_div2` inductor testing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149233
Approved by: https://github.com/atalman
ghstack dependencies: #149216
2025-03-15 00:00:42 +00:00
bobrenjc93
eb7bf4202d Make dynamism code robust to NotImplementedException (#148823)
In prod many models have `@property` methods that raise
NotImplementedError. This PR updates our dynamism code to be more robust
to these types of models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148823
Approved by: https://github.com/laithsakka
2025-03-14 23:38:19 +00:00
PyTorch MergeBot
f9b4856989 Revert "[pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)"
This reverts commit c95a6b416b.

Reverted https://github.com/pytorch/pytorch/pull/113257 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. @zou3519 can you please help land this internally? See the sigmoid tests in D71198793 for details. To validate the fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/113257#issuecomment-2725982539))
2025-03-14 23:13:34 +00:00
PyTorch MergeBot
643aaea133 Revert "[RFC] First version of statically compiled launcher for triton compiled CUDA kernels (#148561)"
This reverts commit 5a843f8973.

Reverted https://github.com/pytorch/pytorch/pull/148561 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/148561#issuecomment-2725969268))
2025-03-14 23:01:26 +00:00
cz2h
05f2cbfe19 Add meta function for out variants of ones,zeros,empty (#149098)
Open another PR to fix merge conflicts. Fixes https://github.com/pytorch/pytorch/issues/135832

For aten.ones, aten.zeros, followed this [link](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit?tab=t.0#heading=h.64r4npvq0w0) to register meta functions.

For aten.empty.out, followed this [part](https://docs.google.com/document/d/1GgvOe7C8_NVOMLOCwDaYV1mXXyHMXY7ExoewHqooxrs/edit?tab=t.0#heading=h.iy9lxhxhtl5v) to register a decomp for empty that handles the FakeTensor input.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149098
Approved by: https://github.com/williamwen42
2025-03-14 22:17:30 +00:00
Nikita Shulga
d7d9a71e19 [MPSInductor] Add support for atan2 (#149216)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149216
Approved by: https://github.com/dcci
2025-03-14 21:53:03 +00:00
Isalia20
dd6e9df3d0 [MPS] fix attention enable_gqa crash on mps (#149147)
Fixes #149132

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149147
Approved by: https://github.com/malfet
2025-03-14 21:25:54 +00:00
Davide Italiano
0bd863a62f [MPS] Add inductor support for i1e. (#149221)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149221
Approved by: https://github.com/malfet
2025-03-14 21:18:38 +00:00
zeshengzong
a7f8de2198 Add nn.Bilinear param validation (#149018)
Fixes #103425

## Changes

- Add doc description size value `must be > 0`
- Add validation for `in1_features` param

Currently, only `in1_features` will cause runtime error, if add checks for `in2_features` and `out_features` as well, might be kind of BC breaking.

```python
import torch
from torch import nn

class lenet(nn.Module):
    def __init__(self):
        super(lenet, self).__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1)

        # Error, `in1_features=1, in2_features=0, out_features=0` no error
        self.linear = nn.Bilinear(in1_features=0, in2_features=0, out_features=0)

    def forward(self, x):
        # 1st block
        x = self.conv(x)
        x = self.linear(x)

        return x

if __name__ == '__main__':
    net = lenet()

```

## Test Result

```bash
pytest test/test_nn.py -k test_bilinear -vv
```

![image](https://github.com/user-attachments/assets/20617ba9-bac5-4db2-aecc-1831dbc8eb43)

![image](https://github.com/user-attachments/assets/401e4e1f-051a-4e1c-952b-48e85de64b0b)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149018
Approved by: https://github.com/mikaylagawarecki
2025-03-14 19:26:12 +00:00
James Wu
5a843f8973 [RFC] First version of statically compiled launcher for triton compiled CUDA kernels (#148561)
Putting this up for a first pass review, though I will likely make a bunch of changes before landing to add more features, etc.

This diff implements a first version of a static CUDA kernel launcher in `torch._C`. The goal here is to take a cubin file and some metadata from a CompiledKernel from `triton`, and launch the cubin file directly.

Background doc: https://docs.google.com/document/d/1rjRcHl6MfauHG30nCoQX-9UKvKyIs4WWMy_GsGyqb9g/edit?tab=t.0#heading=h.ut5lf39lzq66

Normally, using triton's CompiledKernel.make_launcher(), we would pay the cost of codegenning C++ and running it at compile time. With this new approach, we can use one statically compiled library to launch the kernel.

The tradeoff here is that this new kernel launcher will not be able to use codegen to deal with different lengths/types of arguments. So we use templating to handle up to 10 arguments for now. We also allocate 8 bytes on the stack per argument no matter the argument type, which can take more memory than codegenning. On the other hand, we improve compile time on cold and warm start by not having to call the C++ compiler at all.

This diff does not add the launcher to torch, but introduces a basic test suite.

A list of TODOs that are not yet complete, will do in separate diff:
- Handle `nvTmaDesc` and `cuTensorMap`, which triton handles
- Embed the grid logic instead of passing in gridX,Y,Z. With https://github.com/pytorch/pytorch/pull/147583, we should be able to handle all of the grid logic directly in _StaticCudaLauncher.launch_kernel, and get rid of the python evaluation.
- Handle launch_enter and exit hooks? (Not sure if inductor has these)
- Benchmarking to see if there's runtime performance loss
- Hooking it up with a config to inductor
- Testing harness to test against torch generated triton kernels

Differential Revision: [D69926783](https://our.internmc.facebook.com/intern/diff/D69926783/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148561
Approved by: https://github.com/aorenste, https://github.com/syed-ahmed
2025-03-14 19:12:13 +00:00
zeshengzong
97272e4b49 Fix torch.nn.functional.hardswish gradients corner case (#148049)
Fixes #147801

## Changes

- Change hardswish gradient compute condition as [torch.nn.functional.hardswish](https://pytorch.org/docs/stable/generated/torch.nn.functional.hardswish.html)
- Enable cuda for test `test_hardswish_grad_corner`
- Add test case for value=-3

## Test Result

```bash
pytest test/test_nn.py -k test_hardswish
pytest test/test_unary_ufuncs.py -k test_hardswish
pytest test/inductor/test_torchinductor.py -k test_hardswish
```

![image](https://github.com/user-attachments/assets/000cb5c4-15f5-4bfd-ab45-f52bf810ff3d)
![image](https://github.com/user-attachments/assets/38b08cf8-ea84-47a2-8e37-0a213da3e0c8)
![image](https://github.com/user-attachments/assets/54bc57be-2c57-46cc-ab90-94ea6cbe1c34)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148049
Approved by: https://github.com/soulitzer
2025-03-14 18:53:10 +00:00
Nikita Shulga
f2221b2fce [MPS] Add support for i1e (#149203)
Followup after https://github.com/pytorch/pytorch/pull/149174
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149203
Approved by: https://github.com/dcci
2025-03-14 17:33:52 +00:00
Davide Italiano
f067eafabb [MPS] Modify a test to test the correct function. (#149204)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149204
Approved by: https://github.com/malfet
2025-03-14 17:27:47 +00:00
Nikita Shulga
42e468d9b0 [MPSInductor] Adjust check_bounds (#147205)
To make upper bound inclusive, which fixes `test_vectorized_ops_masked` and results in the following code
```python
mps_lib_0 = compile_mps_shader("""
    #include <c10/metal/random.h>
    #include <c10/metal/special_math.h>
    #include <c10/metal/utils.h>
    kernel void generated_kernel(
        device float* out_ptr0,
        constant float* in_ptr0,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = (xindex) % (64);
        int x1 = (xindex) / (64);
        auto tmp5 = in_ptr0[x0 + 63*x1];
        int x2 = xindex;
        auto tmp0 = x0;
        auto tmp1 = static_cast<long>(tmp0);
        auto tmp2 = 63;
        auto tmp3 = tmp1 < tmp2;
        if (x0 > 63) return;
        auto tmp6 = tmp3 ? tmp5 : 7;
        out_ptr0[x2] = static_cast<float>(tmp6);
    }
""")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147205
Approved by: https://github.com/jansel, https://github.com/dcci
ghstack dependencies: #147211
2025-03-14 17:26:00 +00:00
cyy
a9aae05a6b Remove test decorations on MacOS 12 (#148942)
MacOS 12 may reach EOL, as from https://endoflife.date/macos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148942
Approved by: https://github.com/malfet
2025-03-14 17:22:37 +00:00
Davide Italiano
f2ea77c099 [MPS] Add inductor support for i0e. (#149180)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149180
Approved by: https://github.com/malfet
2025-03-14 16:15:52 +00:00
PyTorch MergeBot
71795f159e Revert "[AOTInductor] [BE] Add swap_constant_buffer into pybind for tests. (#149167)"
This reverts commit bea181ff7e.

Reverted https://github.com/pytorch/pytorch/pull/149167 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. See D71177501 for the failure. To validate your fixes internally, you can follow the instructions here: https://fburl.com/fixing-ghfirst-reverts ([comment](https://github.com/pytorch/pytorch/pull/149167#issuecomment-2725001232))
2025-03-14 15:16:21 +00:00
Davide Italiano
706c22549c [MPS] Add support for i0e in eager. (#149174)
Add `special.i0e` to XFAIL_GRADLIST for now, as its backward op is not yet implemented
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149174
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-03-14 14:43:46 +00:00
Huamin Li
68bbe20db7 Add test coverage (#149182)
Summary: Follow up from D71160718

Differential Revision: D71177037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149182
Approved by: https://github.com/houseroad
2025-03-14 09:38:29 +00:00
Xuehai Pan
c95a6b416b [pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257)
Changes in this PR:

1. Add `is_structseq` and `is_structseq_class` functions to determine a object or a class is PyStructSequence.
2. Add a generic class `structseq` which can be used as the registration key for PyStructSequence types like `namedtuple` for Named Tuple types.
3. Change `is_namedtuple` to accept subclasses of namedtuple to be namedtuple. Before this PR, only namedtuple class directly created by `collections.namedtuple` or `typing.NamedTuple` were namedtuple classes while their subclasses were not. This PR makes `is_namedtuple` return true for subclasses of namedtuple class.

Resolves #75982. New tests are included in this PR.

- #75982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113257
Approved by: https://github.com/zou3519
2025-03-14 08:50:30 +00:00
Sheng Fu
05ac99042f Clean up grid in execution trace (#149159)
Summary: This DIFF https://www.internalfb.com/diff/D70471332 removed input "grid" when calling triton kernel. PyTorch execution trace need to make the appropriate change. It includes capturing ET and replay ET.

Test Plan:
buck2 run mode/opt caffe2/test:test_profiler_cuda  -- profiler.test_execution_trace.TestExecutionTraceCUDA.test_execution_trace_with_pt2_cuda

buck2 run mode/opt param_bench/fb/integration_tests:test_et_replay

Differential Revision: D71152464

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149159
Approved by: https://github.com/sraikund16, https://github.com/jansel
2025-03-14 07:12:16 +00:00
PyTorch MergeBot
be4e6c1c8e Revert "[MPS] Add support for i0e in eager. (#149174)"
This reverts commit b4745db904.

Reverted https://github.com/pytorch/pytorch/pull/149174 on behalf of https://github.com/malfet due to MPS are red on trunk ([comment](https://github.com/pytorch/pytorch/pull/149174#issuecomment-2723774600))
2025-03-14 06:35:01 +00:00
Nikita Shulga
e162758051 [MPSInductor] Add bessel_[jy][01] ops (#149179)
By simply calling corresponding special functions

Followup TODO: tweak bessel_y0 to match CPU implementation for `torch.half` dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149179
Approved by: https://github.com/dcci
ghstack dependencies: #149123
2025-03-14 06:33:30 +00:00
Nikita Shulga
db6d72213b [MPS] Add torch.special.bessel_[jy][01] implementations (#149123)
By copy-n-pasting functions from
f59064f2b7/aten/src/ATen/native/cuda/Math.cuh (L1463)

With an  ugly workaround for `bessel_y[01]` to avoid internal compiler exception on M1/M2 machines (see FB16863363 /  https://gist.github.com/malfet/e7785e4b572e7740887a83a2386ef769 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149123
Approved by: https://github.com/Skylion007, https://github.com/dcci
2025-03-14 05:13:55 +00:00
Isuru Fernando
9e6b2ca58d Fix sympy float priting (#147552)
Fixes https://github.com/pytorch/pytorch/pull/147261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147552
Approved by: https://github.com/bobrenjc93, https://github.com/cyyever
2025-03-14 05:07:06 +00:00
Mu-Chu Lee
bea181ff7e [AOTInductor] [BE] Add swap_constant_buffer into pybind for tests. (#149167)
Summary:
We add swap_constant_buffer in pybind to add tests.

Test Plan:
python test/inductor/test_aot_inductor.py -k test_update_inactive_constant_buffer

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149167
Approved by: https://github.com/chenyang78, https://github.com/jingsh
2025-03-14 04:12:48 +00:00
Mu-Chu Lee
e567900998 [AOTInductor] Activate CPU test for update_constant_buffer (#149162)
Summary:
Fixed by #145459

Test Plan:
Re-activating tests.

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149162
Approved by: https://github.com/chenyang78, https://github.com/jingsh
2025-03-14 04:09:57 +00:00