Commit Graph

5667 Commits

Author SHA1 Message Date
PyTorch MergeBot
1036f6d114 Revert "[ROCm] Bump AOTriton to 0.10b (#156290)"
This reverts commit 34d8e64ef6.

Reverted https://github.com/pytorch/pytorch/pull/156290 on behalf of https://github.com/atalman due to failing multiple internal tests ([comment](https://github.com/pytorch/pytorch/pull/156290#issuecomment-2992072727))
2025-06-20 15:35:25 +00:00
Hari Krishna Sai Kodali
e1f28fe17b add device generalisation support for distributed tests (#152471)
### MOTIVATION
To generalize Distributed test cases for non-CUDA devices

### CHANGES

- test/distributed/optim/test_zero_redundancy_optimizer.py
- test/distributed/test_c10d_logger.py
- test/distributed/test_compute_comm_reordering.py

Replaced hard-coded device names with `get_devtype` from `torch.testing._internal.common_fsdp`.
`DistributedTestBase` is used instead of `MultiProcessTestCase` to make use of its helper functions.

- torch/testing/_internal/common_distributed.py

extended common utility functions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152471
Approved by: https://github.com/d4l3k
2025-06-20 07:35:42 +00:00
Nikita Shulga
4cbbc8b458 [MPS] Implement backward pass for interpolate_trilinear (#156373)
The backward pass simply iterates over all 8 points the current point contributed to, and back-propagates the gradient to them with the respective weights

TODO: Benchmark the performance of a similar loop for the forward pass (the compiler should be able to do loop unrolling, so there is no point in unrolling it by hand)
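The 8-point accumulation described above can be sketched in plain Python for a single output point (a simplified scalar model, not the actual Metal kernel):

```python
def trilinear_weights(dx, dy, dz):
    # Forward weights of the 8 corner points for fractional offsets in [0, 1).
    return [
        ((1 - dx) if ix == 0 else dx)
        * ((1 - dy) if iy == 0 else dy)
        * ((1 - dz) if iz == 0 else dz)
        for ix in (0, 1)
        for iy in (0, 1)
        for iz in (0, 1)
    ]


def trilinear_backward(grad_out, dx, dy, dz):
    # Each corner that contributed to this output point receives the output
    # gradient scaled by the same weight it had in the forward pass.
    return [grad_out * w for w in trilinear_weights(dx, dy, dz)]
```

Because the 8 weights sum to 1, the corner gradients of a single output point always sum back to `grad_out`.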
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156373
Approved by: https://github.com/dcci
ghstack dependencies: #156375
2025-06-20 05:41:24 +00:00
Xinya Zhang
34d8e64ef6 [ROCm] Bump AOTriton to 0.10b (#156290)
Notable new features/optimizations for SDPA operators on AMD systems from AOTriton 0.10b:

* Official support of gfx950/gfx1201
* Experimental support of gfx1101/gfx1151/gfx1150/gfx1200
* Reduce libaotriton.so binary size by over 80%.
  + Without this optimization the binary size of `libaotriton.so` could be
    over 100MiB due to 2x more supported architectures compared with 0.9b.
    Now it is only about 11MiB.
* Support sliding window attention (SWA) in
  `_flash_attention_forward/backward`. Should fix #154582

See https://github.com/ROCm/aotriton/releases/tag/0.10b for full details,
including Known Problems.

Notable changes to SDPA backend:

* `std::optional<int64_t>` `window_size_left/right` are directly passed to
  ROCM's SDPA backend, because the default value `-1` is meaningful to
  AOTriton's backend and bottom-right aligned causal mask is implemented with
  negative `window_size_left/right`
* Some code clean up around `USE_CK_FLASH_ATTENTION`
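The window-size semantics above can be illustrated with a toy mask builder (a pure-Python sketch; treating a negative window size as "unbounded on that side" is an assumption following the `-1` default convention described above, not the AOTriton implementation):

```python
def sliding_window_mask(seq_len, window_left, window_right):
    # mask[i][j] is True when position i may attend to position j.
    # A negative window size means "no limit on that side".
    return [
        [
            (window_left < 0 or j >= i - window_left)
            and (window_right < 0 or j <= i + window_right)
            for j in range(seq_len)
        ]
        for i in range(seq_len)
    ]
```

With `window_right=0`, `sliding_window_mask(n, -1, 0)` reduces to an ordinary causal mask.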

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156290
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily
2025-06-19 21:13:58 +00:00
Nikita Shulga
c2f4cc59a7 [MPS] Fix bug in 3d coords calculation (#156375)
Which was not caught by CI beforehand, as all 3D examples right now are symmetric, so add an uneven shape to `sample_inputs_interpolate`

Though it's indirectly tested by `test_upsample_nearest3d` inductor test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156375
Approved by: https://github.com/atalman
2025-06-19 19:56:15 +00:00
Nikita Shulga
36f7a027b5 [MPS] Implement upsample_trilinear as Metal shader (#156263)
But only forward for now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156263
Approved by: https://github.com/dcci
ghstack dependencies: #156256, #156090
2025-06-18 16:10:02 +00:00
Nikita Shulga
c28e74e457 [MPS] Add nearest_3d forward and backward (#156090)
Introduce generalizable `UpsampleParams` structure in `UpSample.h`, which could be shared between CPU and MPS
Delete `upsample_nearest3d` MPS fallback and replace it with proper shader
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156090
Approved by: https://github.com/kulinseth, https://github.com/dcci
ghstack dependencies: #156256
2025-06-18 04:48:15 +00:00
Daniel Galvez
9ed0060225 Provide access to the cudaGraph_t underlying a CUDAGraph. (#155164)
There are a few considerations here:

1. A user might want to modify the cudaGraph_t either during the stream capture or after the stream capture (but before instantiation). This draft implements modification after stream capture only, though support could be added for modification during stream capture by applying
https://github.com/pytorch/pytorch/pull/140979/files#diff-d7302d133bb5e0890fc94de9aeea4d9d442555a3b40772c9db10edb5cf36a35cR391-R404

2. Previously, the cudaGraph_t would be destroyed before the end of capture_end() unless the user had previously called enable_debug_mode(). There is no way to implement this correctly without removing this restriction, or forcing the user to always call enable_debug_mode(). However, enable_debug_mode() is a confusing API (despite being an instance method, it would modify a static global variable; thus, putting one CUDAGraph object into debug mode puts all of them into debug mode, which is not acceptable in my opinion). Therefore, I made enable_debug_mode() into a no-op. This means that the CPU memory usage will increase after this change. I think this is likely to be fine.
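The objection to the old `enable_debug_mode()` semantics can be illustrated with a toy class (a hypothetical Python sketch of the behavior described above, not the actual binding):

```python
class Graph:
    # Class-level flag: shared by every instance, as described above.
    _debug_mode = False

    def enable_debug_mode(self):
        # Looks like it configures "this" graph, but actually flips the
        # flag for all graphs at once.
        type(self)._debug_mode = True

    @property
    def debug(self):
        return type(self)._debug_mode
```

Calling `g1.enable_debug_mode()` silently puts every other graph into debug mode as well, which is the surprising coupling the PR removes by making the method a no-op.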

3. No python bindings yet. These should be easy to add. It is probably worthwhile to take some time to make sure that the returned cudaGraph_t can be converted into the cuda-python cudaGraph_t in a reasonable, hopefully type-safe, manner (but without making cuda-python a dependency of pytorch), since I imagine most users will use the pip cuda-python package to make modifications.

4. There are two foot guns:

   a. The cudaGraph_t returned by raw_cuda_graph() is not owned by the user, so it will be destroyed once the owning CUDAGraph is destroyed (or calls reset()).

   b. The following sequence won't work as intended:

```
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    foo()
g.replay()
raw_graph = g.raw_cuda_graph()
modify(raw_graph)
g.replay()
```

This won't work because the user must call instantiate() again after modifying cudaGraph_t. You could add a "safety" mechanism by traversing the cudaGraph_t to create a hash and seeing if the hash changes between calls to replay(), but this is likely way too expensive.

I think these two foot guns are probably okay given that this a bit of an experts' API.

Fixes #155106

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155164
Approved by: https://github.com/ngimel
2025-06-18 03:39:28 +00:00
Manuel Candales
a4ea242edc [MPS] Implement scan metal kernels (#156100)
Implements metal kernels for scan operations:
- Migrates cumsum and cumprod from MPSGraph implementation to Metal.
- Fixes #154881
- Adds MPS backend support for cummin and cummax
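All four operations are instances of an inclusive scan over a binary operator; a reference model in plain Python (illustrative only, not the Metal kernel):

```python
import operator


def inclusive_scan(xs, op):
    # Running reduction: out[0] = xs[0], out[i] = op(out[i-1], xs[i]).
    out = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def cumsum(xs):
    return inclusive_scan(xs, operator.add)


def cummax(xs):
    return inclusive_scan(xs, max)
```

`cumprod` and `cummin` follow the same pattern with `operator.mul` and `min`.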

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156100
Approved by: https://github.com/malfet
2025-06-17 17:44:22 +00:00
Nikita Shulga
b1713c6655 [MPS][Testing][BE] Fix samples for full_like (#156026)
Now that device is known, one can avoid creating tensors of `torch.double` type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156026
Approved by: https://github.com/dcci
ghstack dependencies: #156121
2025-06-17 04:46:26 +00:00
PyTorch MergeBot
03488d820c Revert "[MPS][Testing][BE] Fix samples for full_like (#156026)"
This reverts commit 2d832c9587.

Reverted https://github.com/pytorch/pytorch/pull/156026 on behalf of https://github.com/atalman due to Sorry breaks MPS tests: test_ops.py::TestMathBitsCPU::test_neg_view_full_like_cpu_float64 [GH job link](https://github.com/pytorch/pytorch/actions/runs/15683608879/job/44182730620) [HUD commit link](2d832c9587) ([comment](https://github.com/pytorch/pytorch/pull/156026#issuecomment-2977903074))
2025-06-16 19:50:26 +00:00
Simon Fan
907d0931cc [ca] default on in CI, with fallback for tests in test/compiled_autograd_skips/ (#155480)
For every test that is run with PYTORCH_TEST_WITH_DYNAMO=1, turn on compiled autograd via config if it is not skipped
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155480
Approved by: https://github.com/jansel
ghstack dependencies: #155521
2025-06-16 18:45:03 +00:00
Nikita Shulga
2d832c9587 [MPS][Testing][BE] Fix samples for full_like (#156026)
Now that device is known, one can avoid creating tensors of `torch.double` type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156026
Approved by: https://github.com/dcci
2025-06-16 14:27:42 +00:00
Nikita Shulga
831c9010c7 [BE] Remove non-existing operator from unimplemented list (#156025)
Never heard of torch.login :)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156025
Approved by: https://github.com/dcci
2025-06-16 14:14:58 +00:00
David Berard
c83041cac2 [test][triton pin] add device-side TMA tests (AOTI + test_triton_kernels) (#155827)
Tests added:
```
python test/inductor/test_triton_kernels.py -k test_on_device_tma
python test/inductor/test_triton_kernels.py -k test_add_kernel_on_device_tma
python test/inductor/test_aot_inductor.py -k test_triton_kernel_on_device_tma
```

These pass on Triton 3.3 but not yet on Triton 3.4 (note: to support tests for both Triton versions, there's two triton kernels - one for old api and one for new api - and a given version of the test will only run if that version of the API is available).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155827
Approved by: https://github.com/FindHao
ghstack dependencies: #155777, #155814
2025-06-15 20:24:19 +00:00
Howard Huang
8e1471bdc9 Allow MultiProcContinuousTest to set world_size (#155920)
`MultiProcContinuousTest` will automatically set world_size to the number of devices. This change allows the attribute to be overridden by a derived test class
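The override pattern might look like this (a schematic sketch; names other than `world_size` are assumptions, not the actual test-harness API):

```python
class MultiProcTestBase:
    # Default: derive world_size from the number of visible devices.
    world_size = None

    @classmethod
    def effective_world_size(cls, device_count):
        # A derived test class can pin world_size by setting the class
        # attribute; otherwise fall back to the device count.
        return cls.world_size if cls.world_size is not None else device_count


class TwoRankTest(MultiProcTestBase):
    world_size = 2  # pinned regardless of how many devices are visible
```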

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155920
Approved by: https://github.com/fduwjj
2025-06-15 00:24:17 +00:00
Aaron Orenstein
e95e8eed0a mypy 1.16.0 (#155821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155821
Approved by: https://github.com/ezyang, https://github.com/zou3519
2025-06-14 18:18:43 +00:00
Marcin Pioch
ce79056471 Custom FX pass for inductor's backend registration (#154841)
This PR is related to RFC #153532. It extends Inductor's backend registration interface to allow the backend to register custom FX passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154841
Approved by: https://github.com/jansel

Co-authored-by: Jason Ansel <jansel@jansel.net>
2025-06-14 17:29:54 +00:00
Sean McGovern
297805fd8f Typo fixes for "overridden" in comments and function names (#155944)
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently. Facilitates searching the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
2025-06-14 03:37:38 +00:00
Nikita Shulga
fec571cfd4 [BE][CI] Remove hardshrink integer exclusions (#155965)
As they are not called anyway

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155965
Approved by: https://github.com/dcci
2025-06-14 00:32:57 +00:00
Prachi Gupta
f2b44424a1 [ROCm] Skip *_stress_cuda and test_ddp_apply_optim_in_backward* (#155724)
These tests are flaky on ROCm and have been skipped via GitHub issues, but the bot keeps closing the issues after not observing failures for these tests in the rerun_disabled_tests runs (not sure why they don't fail there), and we have to keep reopening them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155724
Approved by: https://github.com/jeffdaily

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
2025-06-12 21:18:04 +00:00
Manuel Candales
f8d93b3783 [MPS] Migrate hardswish (forward and backward) to Metal kernel (#155479)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155479
Approved by: https://github.com/kulinseth, https://github.com/malfet
ghstack dependencies: #155304, #155316, #155462
2025-06-11 20:58:46 +00:00
David Berard
9328a7fb58 [triton pin][tests] refactor test_triton_kernel.py tests to test new & old API (#155510)
This splits out the tests so we can independently test both the new and old API.

Note: the new API doesn't work yet - we still need to fix those tests.

Differential Revision: [D76318840](https://our.internmc.facebook.com/intern/diff/D76318840)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155510
Approved by: https://github.com/oulgen
2025-06-11 13:52:15 +00:00
Kurt Mohler
013cf1e330 [MPS] Move expm1 op to Metal (#155611)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155611
Approved by: https://github.com/malfet
2025-06-11 13:06:14 +00:00
Oguz Ulgen
0e2013a12d Add helion x pt2 test (#155513)
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774

In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155513
Approved by: https://github.com/zou3519, https://github.com/jansel
2025-06-11 07:08:06 +00:00
Siddharth Kotapati
2161be8497 Move unary trig ops to metal kernels (#154465)
Move inverse trig unary ops, sinh, & cosh to metal kernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154465
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-06-10 22:56:59 +00:00
atalman
7a03b0d2ca [BE] Remove CUDA 11 artifacts. Fix Check Binary workflow (#155555)
Please see: https://github.com/pytorch/pytorch/issues/147383

1. Remove CUDA 11 build and test artifacts. One place CUDA 12.4
2. Fix Check Binary workflow to use the stable CUDA version variable rather than a hardcoded one

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155555
Approved by: https://github.com/malfet, https://github.com/Skylion007
2025-06-10 21:32:08 +00:00
redwrasse
8a22551300 Fixes OpInfo gradient checks for ctc_loss (#154590)
Fixes #67462

Re-enables `OpInfo` gradient checks for the restricted scenarios where the current `ctc_loss` implementation is accurate and consistent.

The desired `ctc_loss` gradient behavior appears to be an ongoing discussion, see
https://github.com/pytorch/pytorch/issues/52241. The `OpInfo` gradient checks can be updated if/as the underlying implementation advances.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154590
Approved by: https://github.com/soulitzer
2025-06-10 19:56:39 +00:00
Manuel Candales
0f47e76937 [MPS] Implement hardshrink metal kernel (#155304)
Implements the forward and backward hardshrink operators as Metal kernels.
In order to support the lambda parameter, we extend the `exec_unary_kernel` and `exec_binary_kernel` methods. Now they take an optional Scalar and an optional ScalarType argument. When the optional ScalarType is provided, it overrides the type of the Scalar.
We add a new `REGISTER_UNARY_ALPHA_OP` macro, and modify the existing `REGISTER_BINARY_ALPHA_OP` to support the new feature.
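For reference, hardshrink and its derivative in scalar form (the mathematical definition, independent of the Metal implementation):

```python
def hardshrink(x, lambd=0.5):
    # Zero out values inside the [-lambd, lambd] band; pass others through.
    return x if abs(x) > lambd else 0.0


def hardshrink_backward(grad_out, x, lambd=0.5):
    # Gradient passes through unchanged outside the band, is zero inside.
    return grad_out if abs(x) > lambd else 0.0
```

The lambda (`lambd`) value is exactly the extra Scalar that the extended `exec_unary_kernel`/`exec_binary_kernel` plumbing carries into the kernel.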

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155304
Approved by: https://github.com/malfet
2025-06-10 18:20:27 +00:00
Kurt Mohler
e7698ff5cf [MPS] Move abs op to Metal (#155474)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155474
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-06-10 00:23:59 +00:00
Roy Hvaara
b95dadd717 [MPS] Enable RProp test for non-contiguous (#155439)
I believe this issue has already been fixed, but I don't know which PR fixed it. I'm relying on CI signals to verify it's fixed across macOS versions.

Fixes #118117

xref #115350

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155439
Approved by: https://github.com/Skylion007
2025-06-09 21:29:09 +00:00
Roy Hvaara
3490a4f906 [MPS] Enable optimizer tests affected by addcdiv (#155437)
Tracked in #118115. Fixed in #124442. This PR unskips the tests.

xref #115350

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155437
Approved by: https://github.com/Skylion007
2025-06-09 21:27:37 +00:00
Nikita Shulga
abbdf9f363 [BE][Testing] Unskip ones_like/zeros_like testing on MPS (#155476)
But skip `double` dtype from OpInfo variants for this test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155476
Approved by: https://github.com/Skylion007, https://github.com/dcci
2025-06-09 20:37:44 +00:00
PyTorch MergeBot
79bdafe5b6 Revert "Custom FX pass for inductor's backend registration (#154841)"
This reverts commit e694280d12.

Reverted https://github.com/pytorch/pytorch/pull/154841 on behalf of https://github.com/clee2000 due to failing some tests internally D76135706 ([comment](https://github.com/pytorch/pytorch/pull/154841#issuecomment-2956357711))
2025-06-09 16:56:45 +00:00
Nikita Shulga
f140fac8dc [MPS] Implement erfc (#155382)
And migrate `erf` to Metal kernel

Use `erf` approximations from https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/erf.h as the previous approximation did not match the CPU implementation

After that, `erfc(x) := 1.0 - erf(x)`
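The identity above is easy to check against the CPython math module (a numerical sanity check, not the Metal code):

```python
import math


def erfc_via_erf(x):
    # erfc(x) := 1 - erf(x); accurate as long as erf(x) is not too close to 1.
    return 1.0 - math.erf(x)
```

Note this form loses all precision once `erf(x)` rounds to 1.0 in the working dtype (e.g. `erfc_via_erf(10.0)` is 0.0 while `math.erfc(10.0)` is about 2.1e-45), which is the usual trade-off of deriving erfc from erf.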

Fixes https://github.com/pytorch/pytorch/issues/155337

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155382
Approved by: https://github.com/manuelcandales, https://github.com/dcci
2025-06-07 02:35:12 +00:00
Marcin Pioch
e694280d12 Custom FX pass for inductor's backend registration (#154841)
This PR is related to RFC #153532. It extends Inductor's backend registration interface to allow the backend to register custom FX passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154841
Approved by: https://github.com/jansel

Co-authored-by: Jason Ansel <jansel@jansel.net>
2025-06-06 06:49:44 +00:00
Narek Malkhasyan
7999735d23 [CUDA][MPS] Fix torch.arange bound validation for large float inputs (#154320)
Fixes #153133

Fixes an inconsistency in torch.arange on CUDA and MPS backends when using float32 and large input values. Previously, invalid ranges (e.g., start > end with a positive step) could silently return empty tensors due to precision loss in validation logic.

The fix introduces double precision validation for checking whether the step sign is consistent with the range direction.

This ensures torch.arange behaves consistently with CPU for large float32 inputs, and raises an appropriate error when the range is invalid.
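The failure mode can be reproduced in pure Python by rounding the validation inputs to float32 (the actual fix validates in double precision inside the kernel launch path; this is only a model of it):

```python
import struct


def to_f32(x):
    # Round a Python float (float64) to the nearest float32 value.
    return struct.unpack("f", struct.pack("f", x))[0]


def range_is_valid(start, end, step, cast=float):
    # The step's sign must be consistent with the range direction:
    # a positive step requires end >= start, and vice versa.
    start, end, step = cast(start), cast(end), cast(step)
    return (step > 0 and end >= start) or (step < 0 and end <= start)
```

With `start=16777217.0, end=16777216.0, step=1.0` the range is invalid (start > end with a positive step), but float32 rounds both bounds to 16777216.0, so the check wrongly passes; validated in float64 it correctly fails.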

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154320
Approved by: https://github.com/malfet
2025-06-05 14:51:25 +00:00
PyTorch MergeBot
20912673a6 Revert "Add __main__ guards to jit tests (#154725)"
This reverts commit 1a55fb0ee8.

Reverted https://github.com/pytorch/pytorch/pull/154725 on behalf of https://github.com/malfet due to This added 2nd copy of raise_on_run to common_utils.py which caused lint failures, see https://github.com/pytorch/pytorch/actions/runs/15445374980/job/43473457466 ([comment](https://github.com/pytorch/pytorch/pull/154725#issuecomment-2940503905))
2025-06-04 15:42:52 +00:00
Anthony Barbier
1a55fb0ee8 Add __main__ guards to jit tests (#154725)
This PR is part of a series attempting to re-submit https://github.com/pytorch/pytorch/pull/134592 as smaller PRs.

In jit tests:

- Add and use a common raise_on_run_directly method for when a user directly runs a test file that should not be run this way. Print the file which the user should have run.
- Raise a RuntimeError on tests which have been disabled (not run)
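A minimal sketch of such a helper (hypothetical signature; the PR defines the actual one in the shared test utilities):

```python
def raise_on_run_directly(correct_runner):
    # Called from a test file's __main__ guard when the file is not meant
    # to be executed directly; tells the user which file to run instead.
    raise RuntimeError(
        f"This test file is not meant to be run directly; "
        f"run {correct_runner} instead."
    )
```

A test file would then end with `if __name__ == "__main__": raise_on_run_directly("test/test_jit.py")` rather than `unittest.main()`.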

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154725
Approved by: https://github.com/Skylion007
2025-06-04 14:44:08 +00:00
Anthony Barbier
c8d44a2296 Add __main__ guards to fx tests (#154715)
This PR is part of a series attempting to re-submit #134592 as smaller PRs.

In fx tests:

- Add and use a common raise_on_run_directly method for when a user directly runs a test file that should not be run this way. Print the file which the user should have run.
- Raise a RuntimeError on tests which have been disabled (not run)
- Remove any remaining uses of "unittest.main()"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154715
Approved by: https://github.com/Skylion007
2025-06-04 14:38:50 +00:00
Nikita Shulga
9f39028629 [MPS][BE] Move sigmoid op to Metal (#155080)
Fixes https://github.com/pytorch/pytorch/issues/154895
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155080
Approved by: https://github.com/dcci, https://github.com/cyyever
ghstack dependencies: #154936, #155002, #155081
2025-06-04 03:28:11 +00:00
Natalia Gimelshein
34e3930401 fix numpy compatibility for 2d small list indices (#154806)
Will fix #119548 and linked issues once we switch from the warning to the new behavior,
but for now, given how much this syntax is used in our test suite, we suspect a silent change would be disruptive.
We will change the behavior after the 2.8 branch is cut.
NumPy's behavior was changed at least as far back as numpy 1.24 (more than 2 years ago)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154806
Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/albanD
2025-06-04 01:58:52 +00:00
Nikita Shulga
f714599c57 [MPS][BE] Extend torch.special. to integer dtypes (#155002)
By changing the functor to looks as follows
```metal
struct xlog1py_functor {
  template <typename T, enable_if_t<is_floating_point_v<T>, bool> = true>
  inline T operator()(const T a, const T b) {
    return static_cast<T>(c10::metal::xlog1py(a, b));
  }
  template <typename T, enable_if_t<is_integral_v<T>, bool> = true>
  inline float operator()(const T a, const T b) {
    return c10::metal::xlog1py(float(a), float(b));
  }
};
```

Repeat the same for `zeta`, `chebyshev_polynomial_[tuvw]_functor` and `hermite_polynomial_h[e]_functor`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155002
Approved by: https://github.com/Skylion007, https://github.com/dcci
ghstack dependencies: #154936
2025-06-03 17:52:41 +00:00
Nikita Shulga
e9266f807a [BE] Use vendored packaging for testing (#154946)
As the rest of torch uses it, tests should rely on it as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154946
Approved by: https://github.com/cyyever, https://github.com/Skylion007
2025-06-03 14:22:53 +00:00
Nikita Shulga
9cdce682a1 [MPS][BE] Reimplement log1p as Metal shader (#154936)
That should make it faster than the MPSGraph implementation, but it also
improves accuracy for small inputs, by using the algorithm described in [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1202), i.e. $\log(1+x) = \frac{x \cdot \log(1+x)}{(1 + x) - 1}$ if $1 + x \neq 1$, else just $x$

Also tried using the first 3 terms of the Taylor series in Horner's form, which also seems to work fine, i.e. $\log(1+x) \approx x \cdot (1 - x(\frac{1}{2} - \frac{x}{3}))$
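Both variants above can be checked in plain Python against `math.log1p` (a numerical sketch, not the Metal shader):

```python
import math


def log1p_goldberg(x):
    # log(1 + x) = x * log(1 + x) / ((1 + x) - 1) unless 1 + x rounds to 1,
    # in which case log1p(x) == x to working precision (Goldberg's trick).
    u = 1.0 + x
    if u == 1.0:
        return x
    return x * math.log(u) / (u - 1.0)


def log1p_taylor3(x):
    # First three Taylor terms in Horner form: x * (1 - x * (1/2 - x/3)).
    return x * (1.0 - x * (0.5 - x / 3.0))
```

For x = 1e-10 the naive `math.log(1.0 + x)` already loses several digits to the rounding of `1.0 + x`, while both variants stay at full precision relative to `math.log1p`.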

Replaced less accurate log1p implementation in `c10/metal/special_math.h` with generic one.

Parametrize and modify the regression test to check accuracy for small values

TODOs:
 - Do proper implementation for complex values as well, perhaps using 0408ba0a76/mlx/backend/metal/kernels/utils.h (L339)
 - May be implement it using Remez-like algorithm documented here 207f3b2b25/lib/msun/src/s_log1pf.c (L37)
 - Or use llvm's implementation from f393986b53/libclc/clc/lib/generic/math/clc_log1p.inc (L22)
 - Benchmark which algorithm is faster and delivers better accuracy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154936
Approved by: https://github.com/dcci, https://github.com/Skylion007
2025-06-03 14:10:13 +00:00
Guilherme Leobas
731e635c95 Add CPython math/cmath tests (#150794)
Tests:
* test_math.py
* test_cmath.py

Minor changes were made to each test to run them inside Dynamo

One can reproduce the changes by downloading the tests from CPython and applying the diff:

```bash
for f in "test_math" "test_cmath"; do
	wget -O "test/dynamo/cpython/3_13/${f}.py" "https://raw.githubusercontent.com/python/cpython/refs/heads/3.13/Lib/test/${f}.py"
	git apply "test/dynamo/cpython/3_13/${f}.diff"
done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150794
Approved by: https://github.com/zou3519
2025-06-02 20:49:44 +00:00
Joona Havukainen
981bdb39ca Enable ConvTranspose3D for FP32 and Complex64 (#154696)
Fixes #154615

Enables ConvTranspose3D, since support appears to exist on both macOS 14 and 15.

For the half dtypes, the discrepancy between the CPU and GPU implementations is too large to conclude whether there is a bug in the implementation without a more rigorous study of what bounds to expect on the error. So they are left unsupported for now, and an assert is added to notify the user if the op is called with fp16 or bf16 inputs.

Tests for ConvTranspose3D were enabled for the supported data types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154696
Approved by: https://github.com/malfet
2025-06-02 16:24:03 +00:00
Chen Lai
9371491529 [Reland][pytorch] Patch the _is_conv_node function (#154473)
Summary: Add the conv padding ops in pytorch; the corresponding PR in torchao is https://github.com/pytorch/ao/pull/2257

Test Plan:
```
buck test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_conv_padding_bn_relu (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
```

Differential Revision: D75494468

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154473
Approved by: https://github.com/Skylion007
2025-05-30 00:41:03 +00:00
Nikita Shulga
d6cb0fe576 [MPS] Extend index_copy support to complex dtypes (#154671)
Should have noticed it during the review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154671
Approved by: https://github.com/dcci
ghstack dependencies: #154670
2025-05-30 00:28:13 +00:00
Isalia20
41092cb86c [MPS] index copy impl (#154326)
Second most requested op according to #154052

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154326
Approved by: https://github.com/malfet
2025-05-29 16:57:43 +00:00