Commit Graph

11 Commits

Author SHA1 Message Date
Nikita Shulga
975bbc63db [MPS][BE] Move fmod/remainder to Metal ops (#154280)
This accomplishes following:
 - Fixes correctness problem with large integer types (though probably makes it slower, but this could not be avoided if one wants to compute accurate answer)
 - Makes op faster for floating point types (as Metal kernel invocation is faster than creating MPSGraph)
 - Eliminates need for several correctness workarounds

Fixes https://github.com/pytorch/pytorch/issues/154171
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154280
Approved by: https://github.com/dcci
ghstack dependencies: #154275, #154290
2025-05-24 01:45:33 +00:00
Nikita Shulga
6fe5d9215f [EZ][MPS] Enable rsub op (#153786)
Nothing really to enable, just add it to native functions, TensorIterator abstraction takes care of the rest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153786
Approved by: https://github.com/cyyever, https://github.com/Skylion007, https://github.com/dcci
2025-05-19 02:01:48 +00:00
Nikita Shulga
62d8e3cb40 [BE][MPS] Cleanup log ops migration (#153727)
Introduced by https://github.com/pytorch/pytorch/pull/153398

Workaround internal compiler error on MacOS-13 by providing boolean specialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153727
Approved by: https://github.com/Skylion007
2025-05-16 19:32:17 +00:00
Siddharth Kotapati
327d1b6ef0 Move additional MPS Unary ops to Iterator (#152876)
Noticed some of these ops were contributing to a big chunk of the runtime for OpenLLama as well as a few other benchmarks

At the op level, moving to a TensorIterator-based Metal kernel gives a 20x speedup. Will migrate the inverse trigonometric functions & log ops in a follow-up PR, as this one is already a bit large
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152876
Approved by: https://github.com/malfet
2025-05-07 00:06:54 +00:00
Nikita Shulga
0ffd31dc8a [MPS] Migrate div roudning modes (#152758)
By implementing `div_floor` and `div_trunc` . Do not mark `div_trunc` as OPMATH, to align following output with CPU(if division is performed in fp32, than result will be truncated to 25
```
import torch
print(torch.tensor([[-7.4688, -3.1289]], dtype=torch.float16,device="cpu").div(torch.tensor([-0.2988, -0.8789], dtype=torch.bfloat16,device="cpu"), rounding_mode="trunc"))
tensor([[24.,  3.]])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152758
Approved by: https://github.com/dcci
ghstack dependencies: #152663, #152515, #152737, #152743
2025-05-05 03:02:29 +00:00
Nikita Shulga
e889937850 [MPS] Migrate div to Metal (#152743)
TODOs:
 - Verify accuracy of  `metal::dot` vs `x.x*x.x + y.y*y.y`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152743
Approved by: https://github.com/dcci, https://github.com/Skylion007
ghstack dependencies: #152663, #152515, #152737
2025-05-04 00:56:19 +00:00
Nikita Shulga
fcfa6e36c9 [MPS] Fix lerp for complex numbers (#152479)
As well as `.add`/`.sub` with complex alpha

Before this change `python3 -c "import torch;print(torch.rand(10, device='mps', dtype=torch.complex64).add(torch.rand(10, device='mps', dtype=torch.complex64), alpha=.5j))"` used to fail with
```
RuntimeError: value cannot be converted to type double without overflow
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152479
Approved by: https://github.com/dcci
ghstack dependencies: #152443, #152466
2025-04-30 04:46:19 +00:00
Nikita Shulga
3ef6d6924a [BE] Switch TestConsistency to MPS device (#147893)
Which will eventually allow move decorators away more `common_mps.py`

Adjust tolerances accordingly. XFAIL a bunch of tests on MacOS-13, which is going to be deprecated anyway

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147893
Approved by: https://github.com/atalman
ghstack dependencies: #152204
2025-04-26 01:19:21 +00:00
Nikita Shulga
56190d2577 [MPS] Fix ICE for entr bool instantiation on M1/M2 (#152204)
By instantiating it implicitly, otherwise attempts to run something like
```
% python3 -c "import torch; print(torch.special.entr(torch.testing.make_tensor(10, dtype=torch.bool, device='mps')))"
```
will fail with
```
Failed to created pipeline state object, error: Error Domain=AGXMetalG14X Code=3 "Compiler encountered an internal error"
```

Similar in spirit to https://github.com/pytorch/pytorch/pull/149123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152204
Approved by: https://github.com/dcci
2025-04-25 19:00:49 +00:00
Nikita Shulga
3aecf2dc52 [MPS] Extend index_put to half precision floats (#151869)
By reusing `c10/metal/atomic.h`
This also fixes `GPUTests.test_index_put_fallback[12]_mps` that is unrolled by inductor, so no need for dedicated atomic_add support

TODOs:
 - Get rid of indexing kernel and compute it directly when kernel is run
 - Simulate atomic_add for int64 types as series of int32 atomic-add-and-fetch
 - Setup tolerances correctly to pass float16/bfloat16 tests (as CPU always takes sequential strategy)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151869
Approved by: https://github.com/Skylion007, https://github.com/dcci
2025-04-22 22:00:08 +00:00
Li-Huai (Allan) Lin
fbd29527d8 [MPS] Move ops modifiers to testing utils so other tests can reuse (#151781)
Test collection check:
```
python -m pytest test/test_mps.py --collect-only
```
Before:
```
6390 tests collected in 8.34s
```

After:
```
6390 tests collected in 7.71s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151781
Approved by: https://github.com/malfet
2025-04-22 19:19:52 +00:00