Commit Graph

352 Commits

Author SHA1 Message Date
ekamiti
32d422f335 Make adding buffers more like adding parameters (#104069)
Add semantics for creating a buffer object analogous to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter of the `Buffer` type indicates whether a buffer object should be persistent or not. The other non-test changes make the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes verifying that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it simply leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so the changes are intended to be backwards compatible.
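The marker-class dispatch described above can be sketched without any framework; `MiniModule`, `Buffer`, and `state_dict` here are hypothetical stand-ins illustrating the idea, not PyTorch's actual `nn.Module` implementation:

```python
class Buffer:
    # Marker class for type disambiguation: wraps the data plus a
    # `persistent` flag, analogous to how Parameter marks parameters.
    def __init__(self, data, persistent=True):
        self.data = data
        self.persistent = persistent

class MiniModule:
    def __init__(self):
        object.__setattr__(self, "_buffers", {})
        object.__setattr__(self, "_non_persistent", set())

    def register_buffer(self, name, data, persistent=True):
        self._buffers[name] = data
        if not persistent:
            self._non_persistent.add(name)

    def __setattr__(self, name, value):
        # Assigning a Buffer behaves exactly like calling register_buffer().
        if isinstance(value, Buffer):
            self.register_buffer(name, value.data, value.persistent)
        else:
            object.__setattr__(self, name, value)

    def state_dict(self):
        # Only persistent buffers are serialized.
        return {k: v for k, v in self._buffers.items()
                if k not in self._non_persistent}

m = MiniModule()
m.running_mean = Buffer([0.0, 0.0])
m.tmp = Buffer([1.0], persistent=False)
print(sorted(m.state_dict()))  # ['running_mean']
```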

Fixes #35735

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
2023-07-17 17:59:05 +00:00
David Radley
17250976f3 correct empty tensor mps all operation (#105218)
Fixes #104694

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105218
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2023-07-14 17:42:54 +00:00
albanD
08cbfb2a58 Avoid tensor creation and use scalar overload (#104264)
I would expect this preserves the behavior but there might be weird edge cases?
@mruberry might know?

The aim is to fix https://github.com/pytorch/pytorch/pull/104254 (and make `1 ** t` capturable via cudagraph)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104264
Approved by: https://github.com/zou3519
2023-07-12 18:11:27 +00:00
Nikita Shulga
5e4ee15e85 [MPS] Fix unique flatten logic (#104938)
Tensor must be flattened if `dim` is None before checking whether or not the `dim` dimension is already None

Fixes https://github.com/pytorch/pytorch/issues/104879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104938
Approved by: https://github.com/albanD
2023-07-11 19:55:56 +00:00
soulitzer
91dcc3b272 Fix activation checkpoint for mps (#104787)
Fixes https://github.com/pytorch/pytorch/issues/104478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104787
Approved by: https://github.com/albanD
2023-07-08 14:57:05 +00:00
Jerry Zhang
611febf6cf [quant] Support integer implementations for max_pool2d (#104225)
Summary:
This is needed for representing quantized model in pt2 export quantization flow

Test Plan:
tested by opinfo, python test/test_ops.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104225
Approved by: https://github.com/kimishpatel
2023-07-05 23:54:07 +00:00
Nikita Shulga
01e6d64dd2 [MPS] Fix unary ops over sparse-mapped tensors (#100765)
If the input tensor is backed by a sparse view, create a dense copy before running the unary op; otherwise the op will be applied to the wrong elements.
Introduce `is_dense_in_storage`, which returns true if the tensor/view is mapped to a dense area in the tensor storage.
Add unit test to validate the fix.
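Such a density check can be sketched in a few lines of plain Python (a hypothetical helper, not the actual MPS backend code): a view is dense when, sorting dimensions by stride, each stride equals the product of the sizes of all smaller-stride dimensions.

```python
def is_dense_in_storage(sizes, strides):
    # Sort dimensions from smallest to largest stride; for a dense view,
    # each stride must equal the number of elements covered by all
    # smaller-stride dimensions (size-1 dims impose no constraint).
    dims = sorted(range(len(sizes)), key=lambda i: strides[i])
    expected = 1
    for i in dims:
        if sizes[i] == 1:
            continue
        if strides[i] != expected:
            return False  # gap in storage: the view is sparse
        expected *= sizes[i]
    return True

print(is_dense_in_storage((2, 3), (3, 1)))  # True: contiguous
print(is_dense_in_storage((2, 2), (3, 1)))  # False: sliced, has gaps
```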

Fixes https://github.com/pytorch/pytorch/issues/98074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100765
Approved by: https://github.com/albanD
2023-07-05 23:17:43 +00:00
Denis Vieriu
28720ad585 Fix argmax and argmin clamp value on MPS (#104374)
Replace the `LLONG_MAX` clamp value with the largest integer value that can be stored in a double. `constantWithScalar` takes a `double` value as input, and `LLONG_MAX` does not fit in a double, resulting in failures on x86.
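The underlying numeric limit is easy to demonstrate in pure Python (this illustrates the double-precision constraint itself, not the MPS code): a double has a 53-bit significand, so integers above 2**53 no longer round-trip exactly.

```python
LLONG_MAX = 2**63 - 1
# LLONG_MAX is not representable as a double: it rounds up to 2**63.
assert float(LLONG_MAX) == 2.0**63
SAFE = 2**53  # largest range of integers that survives the round trip
assert float(SAFE) == SAFE
assert float(SAFE + 1) == float(SAFE)  # 2**53 + 1 collapses onto 2**53
```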

Fixes https://github.com/pytorch/pytorch/issues/98191, https://github.com/pytorch/pytorch/issues/92311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104374
Approved by: https://github.com/razarmehr, https://github.com/kulinseth
2023-06-30 18:11:49 +00:00
cyy
54cb61f7d9 enable ASAN on some tests (#103647)
Enable more tests under ASAN; meanwhile, float-divide-by-zero and float-cast-overflow are disabled, as both are also disabled by default in recent Clang.
The following cited doc explains the reasons.
```
-fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types
which would overflow the destination. Because the range of representable values for
all floating-point types supported by Clang is [-inf, +inf], the only cases detected
are conversions from floating point to integer types.

-fsanitize=float-divide-by-zero: Floating point division by zero. This is undefined
per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 /
IEEE 754) as producing either an infinity or NaN value, so is not included in
-fsanitize=undefined.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103647
Approved by: https://github.com/kit1980
2023-06-28 02:17:14 +00:00
magic-akari
e56cdfd74b [MPS] Handle deserialization more permissively (#98834)
MPS deserialization should handle `mps:0`.
Such a device string is generated by code like the following:

```python
torch.rand(size=(3, 4)).to("mps")
```
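The permissive behavior can be illustrated with a tiny hypothetical parser (not the actual unpickler code in `unpickler.cpp`): accept both the bare backend name and the indexed `type:index` form.

```python
def parse_device(spec):
    # Accept "mps" as well as "mps:0" (and any "type:index" pair),
    # rather than only the bare backend name.
    if ":" in spec:
        dev_type, index = spec.split(":", 1)
        return dev_type, int(index)
    return spec, None

print(parse_device("mps:0"))  # ('mps', 0)
print(parse_device("mps"))    # ('mps', None)
```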
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98834
Approved by: https://github.com/kulinseth, https://github.com/kit1980, https://github.com/malfet
2023-06-15 15:51:03 +00:00
Pearu Peterson
45401ef745 Enable float16 and complex32 support for sparse CSR elementwise multiplication operation. (#100394)
As in the title. In addition, the PR adds float16 addcmul support for CPU device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100394
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-06-14 14:42:39 +00:00
Li-Huai (Allan) Lin
cce58a43c9 [MPS] Fix softplus with f16 input (#101948)
Fixes #101946
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101948
Approved by: https://github.com/malfet
2023-05-31 00:40:10 +00:00
ecao
3f4fee735a add Half support for logsigmoid, threshold, elu, gelu, hardtanh, hardsigmoid, hardswish, hardshrink, softshrink, leakyrelu, softplus, glu, silu, mish, and prelu on CPU (#98745)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98745
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/ngimel
2023-05-27 16:20:21 +00:00
Li-Huai (Allan) Lin
0db704d240 [OpInfo] Add multi_head_attention_forward (#100153)
### <samp>🤖 Generated by Copilot at 8f8d620</samp>

This pull request improves the testing of the `nn.functional.multi_head_attention_forward` function by adding it to the `OpInfo` framework, adjusting the tolerance and skipping criteria for some test cases, and restricting the dtype for the `MetaProgrammingSystem` tests. These changes aim to address the randomness and numerical precision issues of the function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100153
Approved by: https://github.com/drisspg
2023-05-26 01:58:17 +00:00
Denis Vieriu
de7ec2ddd7 [MPS] Allow saved models to be loaded directly to MPS through torch.jit.load (#102204)
### <samp>🤖 Generated by Copilot at 94eed69</samp>

This pull request adds support for serializing and deserializing tensors on the `mps` device using JIT. It includes a test case in `test/test_mps.py` and a device handling logic in `torch/csrc/jit/serialization/unpickler.cpp`.

Fixes https://github.com/pytorch/pytorch/issues/88820, https://github.com/pytorch/pytorch/issues/87504
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102204
Approved by: https://github.com/kulinseth, https://github.com/malfet
2023-05-25 23:32:29 +00:00
Li-Huai (Allan) Lin
02a7318a5b [MPS] Add aminmax op (#101691)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101691
Approved by: https://github.com/malfet
2023-05-23 18:01:34 +00:00
Li-Huai (Allan) Lin
330c907301 [MPS] Fix embedding cache key (#101857)
Fixes #101198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101857
Approved by: https://github.com/kulinseth
2023-05-21 06:11:25 +00:00
Aaron Gokaslan
3e2ea32dab [BE]: Enable ruff rule TRY302 and apply fixes (#101874)
Removes useless try statements and unreachable code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101874
Approved by: https://github.com/malfet
2023-05-19 17:30:52 +00:00
Khushi
1aaf0396eb [reland][opinfo] empty_strided (#101782)
Follows #100223

Previous PR: #100890

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101782
Approved by: https://github.com/ezyang
2023-05-19 03:06:29 +00:00
PyTorch MergeBot
dfac4364c4 Revert "[opinfo] empty_strided (#100890)"
This reverts commit 01c7106580.

Reverted https://github.com/pytorch/pytorch/pull/100890 on behalf of https://github.com/PaliC due to broke test_ops.py slow test ([comment](https://github.com/pytorch/pytorch/pull/100890#issuecomment-1551903975))
2023-05-17 19:00:15 +00:00
Li-Huai (Allan) Lin
bb3558961f [MPS] Add histogram ops (#96652)
Adds `torch.histc`, `torch.histogram`, `torch.histogramdd`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96652
Approved by: https://github.com/kulinseth, https://github.com/malfet
2023-05-17 01:25:43 +00:00
Khushi
01c7106580 [opinfo] empty_strided (#100890)
Follows: #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100890
Approved by: https://github.com/ezyang
2023-05-15 23:39:39 +00:00
Nikita Shulga
9e089db32e [MPS] Enable arange for int8 and uint8 dtypes (#101303)
Not sure why it was not enabled previously.
Sort types in `AT_DISPATCH_MPS_TYPES` by group (floats first then integers) and size.
Test implicitly in `test_bernoulli`.

### <samp>🤖 Generated by Copilot at 80c7ed7</samp>

> _`Char` and `Byte` types_
> _MPS can dispatch them now_
> _Winter of tensors_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101303
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/kulinseth
2023-05-13 01:19:08 +00:00
Ramin Azarmehr
0be53d83fc [MPS] Add support for MPSProfiler Python bindings (#101002)
- Added torch.mps.profiler.[start() and stop()] APIs with RST documentation
- Added test case in test_mps
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101002
Approved by: https://github.com/malfet
2023-05-12 21:55:34 +00:00
Sun, Jiayi
d56e1b2f67 add Half support for unary ops on CPU (#98493)
Add Half support for log_sigmoid and some unary ops on CPU, including sinc, acosh, asinh, atanh, digamma, trigamma, rsqrt, acos, asin, atan, ceil, cos, erf, erfc, erfinv, exp, expm1, floor, log, log10, log1p, log2, i0, round, sin, sqrt, tan, tanh, trunc, lgamma.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98493
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/ngimel
2023-05-12 04:52:34 +00:00
Nikita Shulga
b7bf953bbc [MPS] Fix bernoulli for int types (#100946)
### <samp>🤖 Generated by Copilot at 069fd23</samp>

This pull request enhances the MPS implementation of random operations in `Distributions.mm` and adds more dtype tests for the bernoulli distribution in `test_mps.py`. This improves the performance, correctness, and usability of the MPS backend for PyTorch.

Fixes https://github.com/pytorch/pytorch/issues/100717

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100946
Approved by: https://github.com/kulinseth
2023-05-11 23:52:38 +00:00
Nikita Shulga
87084643e5 [CI][MPS] Actually make grid_sampler_2d available (#101108)
In CI older MacOS SDK can be used to compile the binary, so add guard for availability of `MPSGraphResizeNearestRoundingModeRoundToEven` enum value.
MPS feature availability checks are deliberately done at runtime (by using `is_macos_13_or_newer` and forward-declaring methods in `MPSGraphVenturaOps.h`) rather than at compile time (by using `#ifdef`s).

Modify error message and XFAIL condition in `test_mps.py` to fail test due to missing conditional on macOS-13.2 or newer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101108
Approved by: https://github.com/kulinseth
2023-05-11 10:35:09 +00:00
Khushi
51fe53e619 [opinfo] item (#100313)
Follows #100223

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100313
Approved by: https://github.com/ezyang
2023-05-10 11:32:45 +00:00
Ramin Azarmehr
cecfcf1e17 [MPS] Handle MPS failures of test_modules.py in common_modules.py (#95334)
- Also cleaned up `test_modules.py` from skipMPS code.
- Added `skipMPS` for unsupported or failing tests on MPS backend in common_modules.py.
   (We'll remove `skipMPS` from those tests once a fix is available for them.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95334
Approved by: https://github.com/kulinseth, https://github.com/albanD
2023-05-09 03:55:16 +00:00
Li-Huai (Allan) Lin
3b6a7f4d51 [MPS] Fix index_put with deterministic algorithm enabled (#97660)
Prevent using parallel computing when deterministic algorithm is set.

Fixes #97574

Benchmark:
```
[--------------- index_put_ Deterministic Algorithm Enabled ---------------]
                                                              |  cpu  |  mps
1 threads: -----------------------------------------------------------------
      Dtype: torch.float32 Features: 1024; Num Indices: 512   |   37  |   49
      Dtype: torch.float32 Features: 1024; Num Indices: 1024  |   54  |   50
      Dtype: torch.float32 Features: 1024; Num Indices: 2048  |   86  |   50
      Dtype: torch.float32 Features: 1024; Num Indices: 4096  |  150  |   49

Times are in microseconds (us).

[-------------- index_put_ Deterministic Algorithm Disabled ---------------]
                                                              |  cpu  |  mps
1 threads: -----------------------------------------------------------------
      DType: torch.float32 Features: 1024; Num Indices: 512   |   37  |   49
      DType: torch.float32 Features: 1024; Num Indices: 1024  |   53  |   49
      DType: torch.float32 Features: 1024; Num Indices: 2048  |   86  |   49
      DType: torch.float32 Features: 1024; Num Indices: 4096  |  147  |   50

Times are in microseconds (us).
```

### <samp>🤖 Generated by Copilot at ebf2ff3</samp>

Added a deterministic version of `index_put` for MPS tensors that runs on a single thread and can be enabled by a global context flag. Refactored the existing `index_put` function and the kernel selection logic to support both parallel and serial modes. Added a test function to verify the deterministic behavior of `index_put` under different conditions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97660
Approved by: https://github.com/kulinseth
2023-05-08 00:57:29 +00:00
Kulin Seth
e20c94bda9 [MPS] Add the test for 5D in test_mps which is skipped. (#99271)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99271
Approved by: https://github.com/DenisVieriu97
2023-05-05 22:57:06 +00:00
Li-Huai (Allan) Lin
13da6585b6 [MPS] Skip all empty ops tests (#100368)
Fixes #100175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100368
Approved by: https://github.com/kulinseth
2023-05-02 00:43:58 +00:00
Li-Huai (Allan) Lin
a50fb50c51 [MPS] Fix exception regex not compared (#100367)
Previously when using `self.assertRaisesRegex` to test raised exception and its regex, the regex wasn't actually compared because mps was not in the `NATIVE_DEVICES`. This PR fixes that by enabling exception regex comparisons for mps device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100367
Approved by: https://github.com/albanD
2023-05-02 00:43:58 +00:00
Nikita Shulga
2442858f52 [MPS] Fix layer_norm_backward_mps key (#100295)
Followup after https://github.com/pytorch/pytorch/pull/98794
See report in https://github.com/pytorch/pytorch/issues/98602#issuecomment-1527312211 and reproducer in https://github.com/pytorch/pytorch/issues/98602#issuecomment-1528214175

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100295
Approved by: https://github.com/kit1980, https://github.com/izaitsevfb
2023-04-29 03:37:35 +00:00
Li-Huai (Allan) Lin
81978120ec [MPS] Fix trace exceptions not raised for error inputs (#99239)
Also rename `trace_mps_out` to `trace_mps` as it is not an out version.

Remove `index_add` from XFAILLIST as it seems working as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99239
Approved by: https://github.com/kulinseth
2023-04-26 14:41:50 +00:00
Li-Huai (Allan) Lin
f4a37c9a5d [MPS] Fix max_pool2d exceptions not raised for error inputs (#99238)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99238
Approved by: https://github.com/kulinseth
2023-04-26 14:41:50 +00:00
Li-Huai (Allan) Lin
f4cf744380 [MPS] Fix gelu exceptions not raised for error inputs (#99237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99237
Approved by: https://github.com/kulinseth
2023-04-26 14:41:46 +00:00
Li-Huai (Allan) Lin
1fcf40da63 [MPS] Add linear inputs check (#99228)
Fixes #98211

https://github.com/pytorch/pytorch/issues/98211#issuecomment-1496005668
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99228
Approved by: https://github.com/kit1980
2023-04-26 04:44:23 +00:00
Denis Vieriu
89baa1a74c [MPS] Add support for linalg.vector_norm (#99811)
Summary of changes:

- Add support for linalg.vector_norm
- Fix zero norm, correct formula is: sum(x != 0)
- Add additional tests in test_mps
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99811
Approved by: https://github.com/kulinseth
2023-04-26 01:34:29 +00:00
Justin Chu
79c9e82e27 Fix flake8 lint errors reported by ruff - take 2 (#99798)
Replaces #99784. This PR is pure autofix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-04-23 23:09:51 +00:00
BJ Hargrave
dc52ba2906 Fix test_mps for macos 13.3 (#98739)
The expected dtype is changed from torch.int64 to torch.int32 prior to
macOS 13.3.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98739
Approved by: https://github.com/kulinseth
2023-04-12 19:23:08 +00:00
Li-Huai (Allan) Lin
be8a4eb8e3 [MPS] Add index_fill op (#98694)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98694
Approved by: https://github.com/kulinseth
2023-04-12 18:13:33 +00:00
Li-Huai (Allan) Lin
71aea7f56e [MPS] Add error inputs check (#98167)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98167
Approved by: https://github.com/kulinseth
2023-04-12 17:19:13 +00:00
Nikita Shulga
583193e1d9 [MPS] Fix batch_norm_backwards key (#98794)
One needs different graphs for batch_norm_backwards depending on whether
or not gradients are required for some of the params

Fixes https://github.com/pytorch/pytorch/issues/98602

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98794
Approved by: https://github.com/kulinseth
2023-04-11 17:23:36 +00:00
Guang Yang
c377a8590b Add nonzero_static() op to pytorch to unblock export (#97417)
Summary: Add a new experimental Python op (`torch.nonzero_static`) for export. There is NO CUDA impl included in this PR

Example:

Say input tensor is `x = torch.tensor([[1, 0], [3, 2]])`

Calling regular `nonzero()` on x gives the tensor `tensor([[0, 0], [1, 0], [1, 1]])`.
Calling `nonzero_static(x, size=4)` gives `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded).
Calling `nonzero_static(x, size=2)` gives `tensor([[0, 0], [1, 0]])` (truncated).
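The pad/truncate semantics above can be sketched in plain Python (a hypothetical mimic using `-1` as the fill value; the real op takes tensors and returns a tensor):

```python
def nonzero_static(matrix, size, fill_value=-1):
    # Collect [row, col] indices of nonzero entries in row-major order.
    idx = [[r, c] for r, row in enumerate(matrix)
                  for c, v in enumerate(row) if v != 0]
    idx = idx[:size]                       # truncate if too many
    while len(idx) < size:                 # pad if too few
        idx.append([fill_value, fill_value])
    return idx

x = [[1, 0], [3, 2]]
print(nonzero_static(x, size=4))  # [[0, 0], [1, 0], [1, 1], [-1, -1]]
print(nonzero_static(x, size=2))  # [[0, 0], [1, 0]]
```

The static `size` is what makes the op exportable: the output shape no longer depends on the data.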

Test Plan:
**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of `GraphModule` in the exported graph
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
2023-04-11 05:13:36 +00:00
Nikita Shulga
29cde00701 [MPS] Add random_ overload (#98333)
That simply calls `torch.random_(from=0, to=None)`

Also, fix the optional upper-bound calculation for all `dtypes` but int64.
As one can see from https://pytorch.org/docs/stable/generated/torch.Tensor.random_.html,
the `from` boundary is inclusive but `to` is exclusive, i.e. if `to` is
omitted it should be set to `128` for the `torch.int8` dtype and to `2`
for `torch.bool`.
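The default-bound rule can be sketched as follows (a hypothetical helper illustrating the arithmetic, not the actual implementation): the exclusive `to` is one past the largest representable value of the dtype.

```python
def default_to(dtype_bits, signed, is_bool=False):
    # Exclusive upper bound when `to` is omitted.
    if is_bool:
        return 2                      # values are {0, 1}
    if signed:
        return 2 ** (dtype_bits - 1)  # e.g. int8: max is 127, so to=128
    return 2 ** dtype_bits            # e.g. uint8: max is 255, so to=256

print(default_to(8, signed=True))          # 128 (torch.int8)
print(default_to(8, signed=False))         # 256 (torch.uint8)
print(default_to(1, True, is_bool=True))   # 2   (torch.bool)
```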

Add test for `torch.random_`

Fixes https://github.com/pytorch/pytorch/issues/98118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98333
Approved by: https://github.com/kulinseth
2023-04-05 21:24:45 +00:00
Li-Huai (Allan) Lin
db8abde9b6 [MPS] Enable conditional indexing tests (#97871)
The tests seem to be working now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97871
Approved by: https://github.com/kulinseth
2023-04-01 16:15:08 +00:00
Li-Huai (Allan) Lin
7776653a0c Add linear gradgrad (#97151)
Fixes #92206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97151
Approved by: https://github.com/albanD
2023-03-30 07:25:02 +00:00
Philip Meier
2f6c18d1a2 improve memory footprint of torch.testing.assert_close (#96131)
Redo of #90172 out of stack.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96131
Approved by: https://github.com/pearu, https://github.com/mruberry
2023-03-29 23:49:56 +00:00
Li-Huai (Allan) Lin
4afef85dda [MPS] Fix index_select_scalar test (#97773)
#96408 introduced a check that prevents the index to scalar from being non-singleton.

Fixes #94162

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97773
Approved by: https://github.com/kulinseth
2023-03-28 19:23:59 +00:00
Li-Huai (Allan) Lin
100641aadf [MPS] Fix torch.eye unsupported bool constant on macOS 12 (#97027)
Fixes #91620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97027
Approved by: https://github.com/kulinseth
2023-03-20 18:08:36 +00:00
Ramin Azarmehr
50beab2978 [MPS] Fix the failure with ReplicatePad3D (#96988)
- Only ReflectPad needs the torch checks for input arguments, not ReplicatePad
- Added a test case
- The failure was originally found in test_modules with test `test_forward_nn_ReplicationPad3d_mps_float32`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96988
Approved by: https://github.com/DenisVieriu97
2023-03-17 01:41:12 +00:00
alexdremov
62eb7a2e97 [MPS] LSTM grad_y missing fix (#96601)
Fixes #96416
Added tests that do not use the LSTM output, similarly to the issue.

Seems like this fix once again introduces backward incompatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96601
Approved by: https://github.com/albanD, https://github.com/kulinseth
2023-03-16 15:53:56 +00:00
Li-Huai (Allan) Lin
c95bcb6694 [MPS] Fix flip where no dims need to be flipped (#96605)
Fixes #96558

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96605
Approved by: https://github.com/kulinseth
2023-03-14 00:34:30 +00:00
Li-Huai (Allan) Lin
a87f3f612e [MPS] Fall back multi-layer LSTM on macOS 12 (#90909)
The native implementation of LSTM has been fixed on macOS 13.

On macOS 12, the multi-layer LSTM still has a numerical correctness issue that cannot be resolved on OS's side.

Thus, we fall back the multi-layer LSTM on macOS 12 to LSTMCell iteration. It might have performance impact but will make LSTM on macOS 12 fully usable.

Fixes: #90421
Issues related: #80306, #83144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90909
Approved by: https://github.com/albanD, https://github.com/kulinseth
2023-03-10 03:10:49 +00:00
Nikita Shulga
075a49442d [MPS] Allow float16 input to float32 LayerNorm (#96430)
Only for forward pass

Subset of https://github.com/pytorch/pytorch/pull/96208

Create constant with scalar using `input_mps_dtype` and use
`reciprocalWithTensor` instead of `divisionWithPrimaryTensor:1.0
secondaryTensor:`

Fixes https://github.com/pytorch/pytorch/issues/96113

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96430
Approved by: https://github.com/kulinseth
2023-03-09 22:09:10 +00:00
Kulin Seth
2bb022e902 [MPS] Adding xfaillist with all categories of failures. (#96176)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96176
Approved by: https://github.com/malfet
2023-03-08 08:41:21 +00:00
Catherine Lee
eea0733045 Reduce pytest blocklist (#96016)
`TestCase = object` or variations of it get switched to `TestCase = NoTest`.

unittest collects tests based on subclassing unittest.TestCase, so setting TestCase = object removes a class from unittest test collection. pytest collects based on name (https://docs.pytest.org/en/7.1.x/reference/reference.html#confval-python_classes) but can be told to ignore a class (bottom of https://docs.pytest.org/en/7.1.x/example/pythoncollection.html#changing-naming-conventions)
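The collection trick can be demonstrated with the stdlib alone (plain unittest; the class names here are illustrative, not PyTorch's): rebinding the base class to `object` (which is what the `NoTest` pattern amounts to) hides a class from the loader.

```python
import types
import unittest

TestCase = object  # pretend the device is unavailable: disable collection

class SkippedOnThisDevice(TestCase):
    def test_something(self):
        pass

class CollectedNormally(unittest.TestCase):
    def test_something(self):
        pass

# Simulate a test module containing both classes; the loader only
# collects subclasses of unittest.TestCase.
mod = types.ModuleType("fake_test_module")
mod.SkippedOnThisDevice = SkippedOnThisDevice
mod.CollectedNormally = CollectedNormally
loader = unittest.defaultTestLoader
print(loader.loadTestsFromModule(mod).countTestCases())  # 1
```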
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96016
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn
2023-03-07 18:30:27 +00:00
Li-Huai (Allan) Lin
2f66b57a7a [MPS] Fix in-place add and sub with alpha == 0.0 (#96184)
Apart from fixing the below issue, this PR integrates the test for `sub` into the test for `add` as they are implemented using the same template.

Fixes #96065

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96184
Approved by: https://github.com/kulinseth
2023-03-07 17:17:53 +00:00
Nikita Shulga
769cc8a614 [MPS] Add type promotion to torch.addcmul (#96164)
Fixes crash while running something like `python -c "import torch;x=torch.rand(3, 3, dtype=torch.float16, device='mps');y=x.addcmul(torch.ones(3, device='mps'), torch.ones(3, device='mps'));print(y)"`

Modify `castMPSTensor` to become a no-op if cast is not needed

Define `common_dtype` as `c10::promoType` between self, tensor1 and
tensor2. Cast to any output type.

Add mixed-types test to `TestMPS.test_addcmul`, though it does not cover
all the permutations

Discovered while looking at https://github.com/pytorch/pytorch/issues/96113

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96164
Approved by: https://github.com/kulinseth
2023-03-07 04:19:30 +00:00
alexdremov
78da315afd [MPS] Fix bidirectional LSTM & small one-direction LSTM fix (#95563)
Fixes #94754

With this PR I hope to finish my breathtaking journey of fixing MPS LSTM.

Here, I enable `bidirectional` on MPS. Also, I've noticed that the cache key did not account for all parameters, so there could have been problems with a one-directional LSTM created first without bias or dropout and then with one of them.
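The cache-key problem can be sketched abstractly (names are illustrative, not the real MPS graph cache): the key must include every parameter that changes the compiled graph, or configurations differing only in bias, dropout, or direction would wrongly share a cached graph.

```python
def lstm_cache_key(input_size, hidden_size, num_layers,
                   bias, dropout, bidirectional):
    # Every graph-shaping parameter participates in the key.
    return (input_size, hidden_size, num_layers, bias, dropout, bidirectional)

k_bias = lstm_cache_key(8, 16, 1, bias=True, dropout=0.0, bidirectional=False)
k_nobias = lstm_cache_key(8, 16, 1, bias=False, dropout=0.0, bidirectional=False)
assert k_bias != k_nobias  # a key omitting `bias` would make these collide
```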

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95563
Approved by: https://github.com/jhavukainen, https://github.com/kulinseth, https://github.com/malfet
2023-03-05 00:19:54 +00:00
Nikita Shulga
436993d52b [MPS] Error on unsupported types (#95982)
I.e. attempt to create tensors of all possible types and make sure that
a structured error is raised for non-MPS types

Also, rename `test_resize_as_all_dtypes_and_devices` to `test_resize_as_mps_dtypes` and `test_resize_all_dtypes_and_devices` to `test_resize_mps_dtypes` and run both test for all MPS dtypes (rather than just bool, float16 and bfloat16 as they were running before)

Fixes https://github.com/pytorch/pytorch/issues/95976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95982
Approved by: https://github.com/kulinseth
2023-03-04 01:29:07 +00:00
Denis Vieriu
304a95435d [MPS] Disallow reshape in slice (#95905)
Disallow reshapes for arrayViews.
The current code allows a base shape of `[2, 4, 256]` to be sliced into `[4, 1, 256]` (the view's shape), which is not possible. Slicing a smaller dimension into a bigger one will always error out.

Fixes https://github.com/pytorch/pytorch/issues/95883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95905
Approved by: https://github.com/razarmehr, https://github.com/kulinseth
2023-03-03 08:08:34 +00:00
Denis Vieriu
d0dd898943 [MPS] Remove remaining casts from 13.3 (#95870)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95870
Approved by: https://github.com/kulinseth
2023-03-02 12:44:59 +00:00
Denis Vieriu
4d3352ed90 [MPS] Remove casts from reduction/cumsum/sort ops starting with macOS 13.3 (#95817)
MPS in macOS13.3 has added support for int64 in reduction ops / cumsum / sort / argsort. This change removes the hard-coded casts and error messages prior macOS 13.3, allowing the op to run natively with int64.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95817
Approved by: https://github.com/kulinseth
2023-03-02 00:26:24 +00:00
Kulin Seth
5d9d8c6154 [MPS] Add fixes for div with floor and raise error for div_trunc (#95769)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95769
Approved by: https://github.com/DenisVieriu97
2023-03-01 20:52:28 +00:00
Denis Vieriu
e5a959a2d4 [MPS] Fix views with 3 or more sliced dimensions (#95762)
Fixes https://github.com/pytorch/pytorch/issues/95482
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95762
Approved by: https://github.com/razarmehr
2023-03-01 16:16:49 +00:00
Denis Vieriu
ed1957dc19 [MPS] Add support for masked_scatter (#95743)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95743
Approved by: https://github.com/kulinseth
2023-03-01 01:36:36 +00:00
Li-Huai (Allan) Lin
f33180fb7f [MPS] Add pow.Scalar (#95201)
1. Adds `pow.Scalar`.
2. Modifies testing `atol` and `rtol` to get pow output match tests pass.
3. Xfails numerically incorrect dtypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95201
Approved by: https://github.com/kulinseth
2023-02-28 16:11:15 +00:00
Li-Huai (Allan) Lin
9e16f1281f [MPS] Add copysign op. (#95552)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95552
Approved by: https://github.com/kulinseth
2023-02-28 06:49:46 +00:00
Li-Huai (Allan) Lin
b7c2a65139 [MPS] Fix type casting copy with storage offset (#95573)
This PR handles the case where the `dst` tensor of type casting has a storage offset by creating a temporary buffer to store results and then copy them back to the dst with the offset added.

Fixes #95417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95573
Approved by: https://github.com/kulinseth
2023-02-28 05:24:31 +00:00
Li-Huai (Allan) Lin
4930ae7f82 [MPS] Add roll op (#95168)
Reuse the CPU implementation here, as currently there is no native roll implementation in the MPS API (if there is one, please let me know).

Compared to falling back to cpu using `PYTORCH_ENABLE_MPS_FALLBACK=1`, this way we keep tensors on MPS.

Did a small benchmark:

```python
for num in [10, 100, 1000, 10000]:
    for shft in [1, 5]:
        sz = num * num
        x = torch.arange(sz, device="cpu").view(num, num)
        s = time.time()
        r = torch.roll(x, shft)
        cpu_e = time.time() - s
        x = torch.arange(sz, device="mps").view(num, num)
        s = time.time()
        r = torch.roll(x, shft)
        mps_e = time.time() - s
        print(f"size: ({num}, {num}) shft: {shft} cpu: {cpu_e} mps: {mps_e}")
```

```
size: (10, 10) shft: 1 cpu: 0.00015163421630859375 mps: 0.003078937530517578
size: (10, 10) shft: 5 cpu: 6.794929504394531e-05 mps: 0.0014979839324951172
size: (100, 100) shft: 1 cpu: 0.0001621246337890625 mps: 0.0016200542449951172
size: (100, 100) shft: 5 cpu: 0.00016379356384277344 mps: 0.00154876708984375
size: (1000, 1000) shft: 1 cpu: 0.0022068023681640625 mps: 0.0017690658569335938
size: (1000, 1000) shft: 5 cpu: 0.009071111679077148 mps: 0.0020020008087158203
size: (10000, 10000) shft: 1 cpu: 0.16785407066345215 mps: 0.011695146560668945
size: (10000, 10000) shft: 5 cpu: 0.1160881519317627 mps: 0.011452913284301758
```
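For a flattened tensor, roll reduces to a split-and-concatenate; the helper below is an illustrative sketch of that composition on a plain list, not the CPU kernel itself:

```python
# roll(x, shift) over a flat sequence: take the last `shift` elements
# and move them to the front, keeping everything on one device when
# implemented with tensor slicing and concatenation.
def roll_flat(xs, shift):
    shift %= len(xs)
    if shift == 0:
        return xs[:]
    return xs[-shift:] + xs[:-shift]

print(roll_flat([1, 2, 3, 4, 5], 2))  # [4, 5, 1, 2, 3]
```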
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95168
Approved by: https://github.com/albanD
2023-02-27 18:31:17 +00:00
Nikita Shulga
fd8367a7b1 [MPS][BE] Introduce xfail (#95045)
Add a `mps_ops_modifier` function that adds `unittest.expectedFailure` decorators to the operators that are supposed to fail on MPS.

This lets one know whether or not an operation will fail, rather than skipping it.
For example:
```
% python test_mps.py -v -k test_output_match_dot
test_output_match_dot_cpu_float32 (__main__.TestConsistencyCPU) ... ok
test_output_match_dot_cpu_int16 (__main__.TestConsistencyCPU) ... ok
test_output_match_dot_cpu_int32 (__main__.TestConsistencyCPU) ... ok
test_output_match_dot_cpu_int64 (__main__.TestConsistencyCPU) ... expected failure
test_output_match_dot_cpu_uint8 (__main__.TestConsistencyCPU) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.175s

OK (expected failures=1)
```

Moved a few functions from the blocklist to xfail, and found that some of the functions in the list actually work, for example `torch.long`.

Also, allow `None` to be used in `ALLOWLIST` instead of specifying all types explicitly (which aligns with the `DecorateInfo` semantics)

Eventually, we should get rid of `ALLOWLIST` (i.e. all ops are allowed), keep a small `BLOCKLIST`, and move the rest to `XFAILLIST`
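The xfail mechanism can be sketched as a class decorator that wraps matching tests in `unittest.expectedFailure` (names and list contents below are illustrative, not PyTorch's actual `mps_ops_modifier`):

```python
import unittest

# Hypothetical xfail list: ops known to fail on the backend.
XFAILLIST = {"dot_int64"}

def mps_ops_modifier(cls):
    # Wrap any test whose op name is in XFAILLIST so it is reported as
    # an expected failure instead of being skipped outright.
    for name in list(vars(cls)):
        if name.startswith("test_") and name[len("test_"):] in XFAILLIST:
            setattr(cls, name, unittest.expectedFailure(getattr(cls, name)))
    return cls

@mps_ops_modifier
class TestConsistency(unittest.TestCase):
    def test_dot_int64(self):     # known-bad op: reported as expected failure
        raise AssertionError("int64 dot unsupported")

    def test_dot_float32(self):   # healthy op: runs normally
        self.assertTrue(True)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestConsistency))
print(result.wasSuccessful(), len(result.expectedFailures))  # True 1
```

Unlike a skip, the suite starts failing loudly (as an unexpected success) the moment the op gets fixed, so the list cannot silently go stale.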

Add step to print HW/SW info before running MPS tests.

Fix type promotion in `trace_mps_out`

Introduce `MACOS_12_X_XFAILLIST` and skip almost every function for `torch.uint8`, although some of those don't make much sense and feel like a regression from PyTorch-1.13

Re-enabled MPS testing on MacOS 12, as runners seem to be available again
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95045
Approved by: https://github.com/albanD
2023-02-27 15:01:01 +00:00
Li-Huai (Allan) Lin
4dca9bde05 [MPS] Add fmax fmin op (#95191)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95191
Approved by: https://github.com/kulinseth
2023-02-25 07:21:48 +00:00
Li-Huai (Allan) Lin
5cad542e43 [MPS] Add log_sigmoid op (#95280)
1. Add log_sigmoid.
2. Make log1p a common function. Operators that use log1p: mish, softplus, log_sigmoid (maybe more).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95280
Approved by: https://github.com/kulinseth
2023-02-24 01:38:30 +00:00
alexdremov
b9e95158d5 [MPS] Fix LSTM backward and forward pass (#95137)
Fixes #91694
Fixes #92615

Several transpositions were missing in the backward graph in the case of `batch_first=True`. Issue #91694 does not reproduce with `batch_first=False`.

After fixing the transpose issue, I thought I could finally use LSTM freely in my project. Then I got horrific results on training, seemingly related to #92615.

After that I decided to fix LSTM's backward step completely. I collected all my findings in this thread, and it seems I succeeded.

Funny enough, backward tests were completely disabled before and were not passing:
```python
    @unittest.skipIf(True, "Backward of lstm returns wrong result")
    def test_lstm_2(self, device="mps", dtype=torch.float32):
```

UPD: the forward pass of the multi-layer version was also wrong, due to incorrect `initState, initCell` slices. Tests were passing because the states were initialized with zeros. *Accidentally* fixed this too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95137
Approved by: https://github.com/jhavukainen, https://github.com/kulinseth, https://github.com/soulitzer
2023-02-23 17:32:42 +00:00
Denis Vieriu
86efa104f5 [MPS] Fix view op slicing for 2nd dim in case of 0 offset (#95381)
* Fix view op slicing for 2nd dim in case of 0 offset

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95381
Approved by: https://github.com/razarmehr
2023-02-23 17:26:10 +00:00
XiaobingSuper
5730cabdd0 using float type to do the computation of norm reduce for cpu half and bfloat16 dtype (#95166)
As the title says, we should use a higher-precision dtype to compute the norm reduction for the half and bfloat16 dtypes.
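The failure mode is easy to show in pure Python, using `struct`'s fp16 format `'e'` as a stand-in for half precision (an illustration only, not the kernel code):

```python
import struct

def to_half(x):
    # Round-trip a Python float through IEEE fp16 (struct format 'e').
    return struct.unpack('e', struct.pack('e', x))[0]

xs = [1.0] * 4096

# Accumulating the squared terms directly in half precision stalls at
# 2048: fp16 cannot represent 2049 (the spacing between representable
# values is 2 in that range), so every further add rounds back down.
acc = 0.0
for x in xs:
    acc = to_half(acc + to_half(x * x))
half_norm = acc ** 0.5  # ~45.25, badly wrong

# Accumulating in a wider float and casting only the final result
# gives the exact answer.
float_norm = to_half(sum(x * x for x in xs) ** 0.5)  # 64.0

print(half_norm, float_norm)
```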

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95166
Approved by: https://github.com/peterbell10, https://github.com/jgong5, https://github.com/ngimel, https://github.com/lezcano
2023-02-23 05:00:25 +00:00
Li-Huai (Allan) Lin
69c76ff05e [MPS] Add xlogy op (#95213)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95213
Approved by: https://github.com/kulinseth, https://github.com/soulitzer
2023-02-22 19:43:12 +00:00
Denis Vieriu
5e47571a13 [MPS] Convolution cleanup; remove unnecessary contiguous calls (#95078)
- Fixes convolution crashes in backward with weights
- Removes unnecessary contiguous calls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95078
Approved by: https://github.com/kulinseth
2023-02-22 18:04:12 +00:00
Kulin Seth
02a6d4334b [MPS] Handle broadcasting by expanding src tensor in Copy.mm (#95272)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95272
Approved by: https://github.com/DenisVieriu97
2023-02-22 18:02:42 +00:00
Denis Vieriu
8475af7761 [MPS] Cast int64 to int32 for reduction ops (#95231)
- give warnings of converting int64 for reduction ops
- use cast tensor for reduction sum on trace
- unblock trace from running
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95231
Approved by: https://github.com/razarmehr
2023-02-22 17:23:25 +00:00
Li-Huai (Allan) Lin
f70a3430aa [MPS] Add hypot op (#95196)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95196
Approved by: https://github.com/kulinseth
2023-02-21 22:40:20 +00:00
Li-Huai (Allan) Lin
e0a0329a67 [MPS] Add hardsigmoid op (#95164)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95164
Approved by: https://github.com/kulinseth
2023-02-21 07:06:37 +00:00
Li-Huai (Allan) Lin
d96aac8d2a [MPS] Add logit op (#95162)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95162
Approved by: https://github.com/kulinseth
2023-02-21 07:02:45 +00:00
alexdremov
a17a7ccc92 [MPS] LogSoftmax numerical stability (#95091)
Fixes #94043

Calculations are now consistent with the numerically stable formula and with the CPU:

$\mathrm{LogSoftmax}(X, \dim) = X - \max(X, \dim) - \log(\mathrm{sum}(\exp(X - \max(X, \dim)), \dim))$
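The stabilization can be illustrated with scalars in plain Python (a minimal sketch, not the MPS kernel):

```python
import math

def log_softmax_stable(xs):
    # X - max(X) - log(sum(exp(X - max(X)))): subtracting the max keeps
    # every exp() argument <= 0, so the sum can never overflow, while
    # the naive exp(x) already overflows around x ~ 710.
    m = max(xs)
    lse = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - lse for x in xs]

out = log_softmax_stable([1000.0, 1000.0])
print(out)  # both entries equal -ln(2), about -0.6931
```

For comparison, `math.exp(1000.0)` on its own raises `OverflowError`, which is exactly the failure mode the max-subtraction avoids.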

@malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95091
Approved by: https://github.com/malfet, https://github.com/kulinseth
2023-02-18 18:26:29 +00:00
Ramin Azarmehr
9511b9fad2 [MPS] Fix copy_cast_mps() on tensors with storage offset (#95093)
- The copy_cast path requires storage_offset to be applied before casting
- This should fix some correctness issues in transformer models

Fixes #94980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95093
Approved by: https://github.com/kulinseth
2023-02-18 16:29:01 +00:00
Li-Huai (Allan) Lin
25ee6dd335 [MPS] Fix fill_ where input tensor has a storage offset (#95113)
Fixes #94390

Apart from fixing the issue above, this PR also fixes a bug where, when an input tensor can be sliced, a sliced array view is created. This array view seems to be non-writable or to have a different storage from the original tensor, causing incorrect results with the in-place `fill`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95113
Approved by: https://github.com/kulinseth
2023-02-18 16:19:15 +00:00
Li-Huai (Allan) Lin
0a9c608461 [MPS] Fix tensor with non-zero storage offset graph gathering (#91071)
Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the numbers of dimensions of the base shape and the view shape are the same. This doesn't consider the situation where a view tensor could be the base tensor's sliced and then unsqueezed version, resulting in a different number of dims.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```python
import torch  # requires an MPS-capable build for device="mps"

t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed and contiguous tensor with a storage offset; tensors of this kind should be sliceable without cloning their storage.
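The sliceability argument can be sketched with a pure-Python model of a view as (offset, sizes, strides) over shared storage; the helper below is illustrative, not the `OperationUtils.mm` code:

```python
# Standard contiguity check: walking dims from the innermost, every
# non-size-1 dim must have the stride of a dense row-major layout.
def is_contiguous(sizes, strides):
    expected = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if size != 1:
            if stride != expected:
                return False
            expected *= size
    return True

storage = [1, 2, 3, 4]
offset, sizes, strides = 2, [2], [1]          # x = t[2:] bumps the offset
sizes, strides = sizes + [1], strides + [1]   # x.unsqueeze(-1) adds a size-1 dim

# The unsqueezed view has an offset AND a different number of dims than
# the base, yet it is still contiguous, hence sliceable without a copy.
print(offset, sizes, strides, is_contiguous(sizes, strides))
# 2 [2, 1] [1, 1] True
```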

Fixes #87856
Fixes #91065

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91071
Approved by: https://github.com/kulinseth
2023-02-17 18:44:20 +00:00
Denis Vieriu
a2afc657da [MPS] Fix upsample for NHWC output (#94963)
Fixes https://github.com/huggingface/diffusers/issues/941

**Before**:
<img width="1144" alt="Screenshot 2023-02-15 at 8 11 53 PM" src="https://user-images.githubusercontent.com/104024078/219266709-6a77636a-2fc0-4802-b130-85069b95953f.png">

**After**:
<img width="1144" alt="Screenshot 2023-02-15 at 8 12 02 PM" src="https://user-images.githubusercontent.com/104024078/219266694-ea743c02-fb55-44f1-b7d6-5946106527c3.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94963
Approved by: https://github.com/razarmehr
2023-02-17 05:07:22 +00:00
Denis Vieriu
5d1e9fd214 [MPS] Fix prelu backward pass (#94933)
Allocate the correct shape for the weights gradient
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94933
Approved by: https://github.com/razarmehr
2023-02-17 03:45:12 +00:00
Denis Vieriu
bc361fdfdf [MPS] Fix bilinear backward pass (#94892)
Fixes backward pass for bilinear.

Summary of changes:
- The bilinear op is able to produce **contiguous, non-view** tensors with a storage offset, e.g. shape=`[1, 1, 1, 1]`, `storage_offset=12`. This seems like a weird case, but it is valid, and for tensors of this type we wouldn't be able to gather/scatter since we look at the view flag (which is not set here). This change looks at `storage_offset` only, rather than the is_view flag, which is not being set.
- **Reduction sum** must return a zeroed-out output if passed an input with 0 elements (e.g. a shape of (0, 5)).
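The reduction-sum rule is just the additive identity, illustrated here with pure-Python lists standing in for a (0, 5)-shaped input:

```python
# A sum over zero elements must yield zeros, never garbage from an
# uninitialized output buffer.
rows, cols = 0, 5
data = [[1.0] * cols for _ in range(rows)]  # shape (0, 5): no rows at all
col_sums = [sum((row[j] for row in data), 0.0) for j in range(cols)]
print(col_sums)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```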
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94892
Approved by: https://github.com/kulinseth
2023-02-16 00:30:29 +00:00
Kulin Seth
54ebf255ab [MPS] Fixes for LSTM. (#94889)
- The backward pass has to be given an explicit bias tensor of zeros if none is passed to the op, or the bias gradient will not be calculated.
- Fixed the bias tensor mistakenly getting overwritten to zeros
- Fixes a crash when the lstm op is called with `has_biases` set to false. The change takes into account the changed shape of the input params TensorList depending on the bias flag.
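The shape change in the params TensorList can be sketched as follows (a hypothetical helper, not the actual kernel code): with biases each layer contributes 4 tensors (`w_ih`, `w_hh`, `b_ih`, `b_hh`), without them only 2.

```python
# Indices into the flat params list for a given layer; any code that
# slices this list must branch on the bias flag or it reads the wrong
# tensors once has_biases=False.
def layer_param_indices(layer, has_biases):
    per_layer = 4 if has_biases else 2
    start = layer * per_layer
    return list(range(start, start + per_layer))

print(layer_param_indices(1, True))   # [4, 5, 6, 7]
print(layer_param_indices(1, False))  # [2, 3]
```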

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94889
Approved by: https://github.com/DenisVieriu97
2023-02-15 16:10:40 +00:00
Denis Vieriu
71ec2617d2 [MPS] Block uint8 data type for unary and binary ops on macOS 12 (#94876)
Blocks uint8 data type for unary and binary ops on macOS 12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94876
Approved by: https://github.com/kulinseth
2023-02-15 06:09:56 +00:00
Kulin Seth
94f0808629 [MPS] Add fmod op. (#94722)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94722
Approved by: https://github.com/DenisVieriu97
2023-02-14 14:55:26 +00:00
Xuehai Pan
b005ec62b9 [BE] Remove dependency on six and future (#94709)
Remove the Python 2 and 3 compatibility library [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future) and `torch._six`. We only support Python 3.8+ now. It's time to retire them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-02-14 09:14:14 +00:00
Denis Vieriu
1f06a71797 [MPS] Error out for square int64 input (#94766)
- add checks for whether macOS is greater than 13.2
- remove square from the block list
- throw an error message if int64 pow is called before macOS 13.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94766
Approved by: https://github.com/kulinseth
2023-02-14 04:45:41 +00:00
Denis Vieriu
cedb7e3d77 [MPS] Fix remainder op for integral dtypes (#94757)
Map remainder op to the same template as div (integral dtypes will be cast to float)
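The float-cast idea for integral dtypes can be sketched in plain Python (`remainder_int_via_float` is an illustrative name, not a real API):

```python
import math

# Compute an integer remainder by doing the division in floating point
# and flooring, then casting the result back to int; the sign follows
# the divisor, matching torch.remainder's convention.
def remainder_int_via_float(a, b):
    q = math.floor(a / b)   # division done in float, then floored
    return int(a - q * b)

print(remainder_int_via_float(-7, 3))  # 2
print(remainder_int_via_float(7, -3))  # -2
```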
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94757
Approved by: https://github.com/kulinseth
2023-02-14 01:06:49 +00:00
Denis Vieriu
4acdc446b2 [MPS] Fix batch norm for NHWC (#94760)
Fixes `test_modules.py` batch norm NHWC testcases:
- `test_memory_format_nn_BatchNorm2d_eval_mode_mps_float32`
- `test_memory_format_nn_BatchNorm2d_eval_mode_mps_float32`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94760
Approved by: https://github.com/kulinseth
2023-02-13 23:31:10 +00:00
OwenPendrighElliott
840fb74ec8 86990 range mps support (#91075)
Fixes #86990

- Added range_mps_out to RangeFactories.mm
- Updated native_functions.yaml
- Added tests in test_mps.py

I did observe that, despite [the documentation for torch.range](https://pytorch.org/docs/stable/generated/torch.range.html), the existing implementations do not adjust their return type based on the arguments passed to them. The MPS implementation provided here behaves the same way as the existing CPU and CUDA implementations in this regard, hence the conversion to float32 in the test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91075
Approved by: https://github.com/kulinseth, https://github.com/DenisVieriu97
2023-02-13 23:19:10 +00:00