Adds several unary functions and `add`. Enables tests for unary functions in test_sparse, but does not enable other tests yet; more ops are needed before we can fully migrate to testing SparseMPS with `test_sparse.py`.
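A quick smoke test along these lines should now work (hedged sketch; the specific ops and values are illustrative, not an exhaustive list of what this change enables):
```python
import torch

# Hypothetical smoke test for the newly enabled SparseMPS ops (op choice is illustrative)
i = torch.tensor([[0, 1, 1], [2, 0, 2]], device="mps")
v = torch.tensor([3.0, -4.0, 5.0], device="mps")
s = torch.sparse_coo_tensor(i, v, (2, 3))
print(s.abs().cpu().to_dense())   # one of the unary ops
print((s + s).cpu().to_dense())   # add
```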
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160839
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
The implementation heavily borrows logic from `topk`.
As this method is non-deterministic, the logic for the CPU-op indices comparison was reduced to just an equality check, since by default the random numbers picked for the input tensor produce quite a lot of overlapping (duplicate) values.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161817
Approved by: https://github.com/dcci
And actually use the right function: [`torch.round`](https://docs.pytorch.org/docs/stable/generated/torch.round.html) doesn't use `std::round`, but rather `std::rint` (which rounds halfway cases to the nearest even value by default), as can easily be seen by running something like
```python
import torch
print(torch.arange(-3., 3., step=.5, device='mps').round())
print(torch.arange(-3., 3., step=.5, device='mps').cpu().round())
```
Before this change it printed
```
tensor([-3., -3., -2., -2., -1., -1., 0., 1., 1., 2., 2., 3.], device='mps:0')
tensor([-3., -2., -2., -2., -1., -0., 0., 0., 1., 2., 2., 2.])
```
But after this change results match
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161712
Approved by: https://github.com/dcci
By adding an `addmm` kernel, which is a logical continuation of the `mm` one. The only tricky part is how the alpha and beta constants are handled: they are passed as `optmath_t`, i.e. they could be int64, int32, or float.
Unified the instantiation of all MM flavors through `INSTANTIATE_MM_OPS` and verified that the `addmm` Metal kernel works as expected for floating types as well by running
```
PYTORCH_MPS_PREFER_METAL=1 python test/test_mps.py -v -k test_output_match_addmm_mps_
```
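For reference, a hedged sanity check along the same lines (shapes and alpha/beta values are illustrative; run with `PYTORCH_MPS_PREFER_METAL=1` to force the Metal kernel):
```python
import torch

bias = torch.randn(4, 6)
m1 = torch.randn(4, 8)
m2 = torch.randn(8, 6)
# alpha/beta exercise the constant-handling path described above
out_cpu = torch.addmm(bias, m1, m2, beta=0.5, alpha=2.0)
out_mps = torch.addmm(bias.to("mps"), m1.to("mps"), m2.to("mps"), beta=0.5, alpha=2.0)
torch.testing.assert_close(out_cpu, out_mps.cpu(), rtol=1e-4, atol=1e-4)
```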
Fixes https://github.com/pytorch/pytorch/issues/154901
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160270
Approved by: https://github.com/Skylion007, https://github.com/dcci
ghstack dependencies: #160228, #160234
Added a `torch.hash_tensor` reduction function with a `mode` argument that defaults to reduction with xor (see the usage sketch below).
- The hash is always uint64.
- Integers will be cast to uint64 before performing the xor_sum reduction.
- Floats will be upcast to double and then bitcast to uint64 before performing the xor_sum reduction.
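A minimal usage sketch based purely on the description above (the exact signature may differ, so treat this as illustrative):
```python
import torch

x = torch.arange(16)
h = torch.hash_tensor(x)   # default mode: xor reduction
print(h, h.dtype)          # a single uint64 hash value
```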
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154149
Approved by: https://github.com/albanD
This fixes `index_put(..., accumulate=True)` for all dtypes.
The int64 operation is not truly atomic, but it is eventually consistent from the `index_put_accumulate` kernel's point of view: by the end of the operation, the results in global memory are indeed the accumulation of the operands at the given indices.
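A hedged illustration of the now-working behavior (dtype and values are illustrative):
```python
import torch

t = torch.zeros(3, dtype=torch.int64, device="mps")
idx = torch.tensor([0, 0, 1], device="mps")
src = torch.ones(3, dtype=torch.int64, device="mps")
t.index_put_((idx,), src, accumulate=True)
print(t.cpu())  # tensor([2, 1, 0]) - both writes to index 0 are accumulated
```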
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158179
Approved by: https://github.com/dcci, https://github.com/Skylion007
ghstack dependencies: #158064, #158178
Move the `MetalShaderLibrary::bind_tensors` private method to OperatorUtils.h and extract an `iter_tensor_offset` method that returns the offset from the start of the storage associated with a given tensor inside the iterator.
Migrated `index` and `index_put[_accumulate][_serial]` to the new paradigm, which requires neither an additional tensor for indices nor special handling for 32-bit vs 64-bit offsets. This resulted in an almost 2x perf gain for a 2000x2000 tensor; see the results below. Before:
```
[------------------------------------------------------------ -----------------------------------------------------------]
| 11x50x50 | 11x100x100 | 11x500x500 | 11x1000x1000 | 11x2000x2000
1 threads: ----------------------------------------------------------------------------------------------------------------
__getitem__ (torch.int8, torch.int64) | 383.5 | 379.8 | 470.9 | 1232.9 | 4410.3
__getitem__ (torch.float16, torch.int64) | 379.6 | 354.5 | 533.2 | 1290.3 | 4442.2
__getitem__ (torch.float32, torch.int64) | 360.8 | 338.6 | 478.6 | 1348.9 | 4870.4
Times are in microseconds (us).
```
And after:
```
[------------------------------------------------------------ -----------------------------------------------------------]
| 11x50x50 | 11x100x100 | 11x500x500 | 11x1000x1000 | 11x2000x2000
1 threads: ----------------------------------------------------------------------------------------------------------------
__getitem__ (torch.int8, torch.int64) | 349.8 | 330.5 | 432.6 | 764.5 | 1961.2
__getitem__ (torch.float16, torch.int64) | 342.5 | 330.7 | 434.7 | 741.0 | 1969.4
__getitem__ (torch.float32, torch.int64) | 332.2 | 326.1 | 445.4 | 751.3 | 1972.6
Times are in microseconds (us).
```
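Numbers like the ones above can be gathered with something along these lines (a hedged sketch; the tensor sizes and indexing pattern are illustrative, not the exact benchmark script):
```python
import torch
from torch.utils import benchmark

x = torch.rand(11, 2000, 2000, device="mps")
idx = torch.randint(0, 2000, (500,), device="mps")
timer = benchmark.Timer(
    stmt="x[:, idx, idx]; torch.mps.synchronize()",
    globals={"x": x, "idx": idx, "torch": torch},
)
print(timer.blocked_autorange())
```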
While migrating, also fixed `index_put_accumulate` for boolean types by using a compare-and-exchange trick over uint.
Fixes https://github.com/pytorch/pytorch/issues/153560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158064
Approved by: https://github.com/dcci
Note on backward precision over fp16:
A float16 number has 10 bits of mantissa, 5 bits of exponent, and 1 bit for the sign. If the sign bit indicates a positive number, then with a mantissa $m$ and exponent $e$ expressed in base 10, the number that the float16 format represents is $(1 + m/1024) \cdot 2^e$. ([source](https://en.wikipedia.org/wiki/Half-precision_floating-point_format))
Consider adding two numbers $a$ and $b$ with arbitrary mantissas, and say their exponents are $e_a = 1$ (so $2 \le a < 4$) and $e_b = -3$ (so $0.125 \le b < 0.25$). Assume the result has the same exponent as $a$. Since the exponents differ by 4, we effectively need to truncate the 4 rightmost bits of $b$'s mantissa, which introduces a maximum error on the order of $(2^4/1024) \cdot 2^{-3} \approx 0.002$.
The error is nearly the same if $e_b = -2$ (so $0.25 \le b < 0.5$), where the 3 rightmost bits are truncated, giving a maximum error on the order of $(2^3/1024) \cdot 2^{-2} \approx 0.002$. The same holds for $e_b = -1$.
So if we add up nine different numbers that all have exponents -3, -2, or -1, and they sum to a number with exponent 1, we would expect a maximum error several times greater than 0.002. In my comments above, summing those particular nine numbers in different ways gave results ranging between 3.1816 and 3.1758, a difference of $0.0058 \approx 2.9 \times 0.002$.
That's within the acceptable bounds, and we can safely just increase the error tolerance used in test_output_grad_match for the case of max_pool3d_backward with float16.
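A hedged illustration of that order-dependence (the nine values below are made up, not the ones from the test, but they have exponents in the -3 to -2 range and sum to a number with exponent 1):
```python
import torch

x = torch.tensor([0.17, 0.23, 0.31, 0.37, 0.41, 0.43, 0.33, 0.47, 0.45],
                 dtype=torch.float16)
seq = torch.zeros((), dtype=torch.float16)
for v in x:                      # strictly sequential fp16 accumulation
    seq = seq + v
print(seq.item())                # one rounding path
print(x.double().sum().item())   # reference accumulated in float64
```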
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157498
Approved by: https://github.com/malfet
They might have been slow on CUDA-11.3, but that version of CUDA is long gone. The more fundamental underlying issue was the linear complexity of the recursive polynomial definitions for higher-order polynomials; for example, see this loop from the implementation of the Chebyshev polynomial of the first kind
7081b8233a/aten/src/ATen/native/Math.h (L2969-L2973)
which were tested by `test_compare_cpu` using the following values (as sample index 16)
7081b8233a/torch/testing/_internal/opinfo/core.py (L2079)
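To make the cost explicit, here is a hedged Python transcription of that recurrence (illustrative only, not the actual C++ code): evaluating it takes one loop iteration per degree, which is pathological for the huge degree sampled above.
```python
def chebyshev_t(x: float, n: int) -> float:
    # T_0 = 1, T_1 = x, T_k = 2*x*T_{k-1} - T_{k-2}; one iteration per degree,
    # so the cost grows linearly with n - very slow for n around 1e6.
    if n == 0:
        return 1.0
    if n == 1:
        return x
    p, q = 1.0, x
    for _ in range(2, n + 1):
        p, q = q, 2.0 * x * q - p
    return q
```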
Luckily, Chebyshev polynomials pretty quickly reach infinity for arguments with absolute value greater than 1, see below
```
python3 -c "import torch;print(torch.special.chebyshev_polynomial_v(torch.nextafter(torch.tensor(1.0), torch.tensor(2.0)), torch.tensor(1e6)))"
tensor(nan)
```
This is not the case for Laguerre polynomials, but it's probably fine to just limit it to 1e7
Before
```
$ PYTORCH_TEST_WITH_SLOW=1 python test_ops.py -k chebyshev_polynomial_
ssssssss..ssssss..ssssss..ssssssssssssssssssssss..ssssss/home/ubuntu/py3.10-nightly/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:131: UserWarning: This API is going to be deprecated, please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:78.)
return torch._C._get_cublas_allow_tf32()
....ssssssssssss..ssssss..ssssss............ssssssssssssssssssssssssssssssssssss..ssssssssssssss..ssssss..ssssssssssssssssssssssssssssss..ssssss....ssssssssssss..ssssss..ssssss............ssssssssssssssssssssssssssssssssssss..ssssss..ssssssssssssss..ssssss..ssssss..ssssssssssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssssssssssss
----------------------------------------------------------------------
Ran 432 tests in 8.575s
OK (skipped=344)
```
After
```
$ PYTORCH_TEST_WITH_SLOW=1 python test_ops.py -k chebyshev_polynomial_
ssssssss........................ssssssssssssssss......../home/ubuntu/pytorch/torch/backends/cuda/__init__.py:131: UserWarning: This API is going to be deprecated, please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /home/ubuntu/pytorch/aten/src/ATen/Context.cpp:78.)
return torch._C._get_cublas_allow_tf32()
........................................................................................xxxxxxxx................ssssssssssssssssssssssss........................................................................................................ssssssss........................ssssssss........................................................................................ssssssss
----------------------------------------------------------------------
Ran 432 tests in 45.580s
OK (skipped=72, expected failures=8)
```
Fixes https://github.com/pytorch/pytorch/issues/79528
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157464
Approved by: https://github.com/Skylion007, https://github.com/dcci
ghstack dependencies: #157488
They were slow on CUDA-11.3, which is long gone; let's see if they work now
Before
```
$ python test_ops.py -k chebyshev_polynomial_
ssssssss..ssssss..ssssss..ssssssssssssssssssssss..ssssss/home/ubuntu/py3.10-nightly/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:131: UserWarning: This API is going to be deprecated, please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:78.)
return torch._C._get_cublas_allow_tf32()
....ssssssssssss..ssssss..ssssss............ssssssssssssssssssssssssssssssssssss..ssssssssssssss..ssssss..ssssssssssssssssssssssssssssss..ssssss....ssssssssssss..ssssss..ssssss............ssssssssssssssssssssssssssssssssssss..ssssss..ssssssssssssss..ssssss..ssssss..ssssssssssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssss..ssssssssssssss
----------------------------------------------------------------------
Ran 432 tests in 8.575s
OK (skipped=344)
```
After
```
$ python test_ops.py -k chebyshev_polynomial_
ssssssss........................ssssssssssssssss......../home/ubuntu/py3.10-nightly/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:131: UserWarning: This API is going to be deprecated, please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:78.)
return torch._C._get_cublas_allow_tf32()
........................................................................................ssssssss................ssssssssssssssssssssssss........................................................................................................ssssssss........................ssssssss........................................................................................ssssssss
----------------------------------------------------------------------
Ran 432 tests in 42.379s
OK (skipped=80)
```
Fixes https://github.com/pytorch/pytorch/issues/79528
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157464
Approved by: https://github.com/Skylion007
The backward pass simply iterates over all 8 points the current point contributed to and back-propagates the gradient with the respective weights.
TODO: Benchmark the performance of a similar loop for the forward pass (the compiler should be able to do loop unrolling, so there is no point in unrolling it by hand).
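For illustration, a hedged Python sketch of that idea (not the Metal kernel; boundary clamping is omitted for brevity):
```python
import math
import torch

def backprop_one_point(grad_in, go, x, y, z):
    # Distribute the gradient `go` of one interpolated point at continuous source
    # coords (x, y, z) back to the 8 surrounding input voxels with trilinear weights.
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - x0, y - y0, z - z0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                w = ((fx if dx else 1 - fx) *
                     (fy if dy else 1 - fy) *
                     (fz if dz else 1 - fz))
                grad_in[z0 + dz, y0 + dy, x0 + dx] += w * go

grad_in = torch.zeros(4, 4, 4)
backprop_one_point(grad_in, go=1.0, x=1.25, y=2.5, z=0.75)
print(grad_in.sum())  # 1.0 - the 8 trilinear weights sum to one
```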
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156373
Approved by: https://github.com/dcci
ghstack dependencies: #156375
Implements the forward and backward hardshrink operators as Metal kernels.
In order to support the lambda parameter, we extend the `exec_unary_kernel` and `exec_binary_kernel` methods. Now they take an optional Scalar and an optional ScalarType argument. When the optional ScalarType is provided, it overrides the type of the Scalar.
We add a new `REGISTER_UNARY_ALPHA_OP` macro, and modify the existing `REGISTER_BINARY_ALPHA_OP` to support the new feature.
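A hedged usage example exercising both new kernels (values and lambda are illustrative):
```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, -0.3, 0.0, 0.2, 0.9], device="mps", requires_grad=True)
y = F.hardshrink(x, lambd=0.5)   # zeroes out entries with |x| <= lambda (forward kernel)
y.sum().backward()               # exercises the backward kernel
print(y.cpu(), x.grad.cpu())
```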
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155304
Approved by: https://github.com/malfet