pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Denis Vieriu	28720ad585	Fix argmax and argmin clamp value on MPS (#104374 ) Replace the `LLONG_MAX` clamp value with the largest integer value that can be stored in a double. `constantWithScalar` takes a `double` as input, and `LLONG_MAX` does not fit exactly in a `double`, which resulted in failures on x86. Fixes https://github.com/pytorch/pytorch/issues/98191, https://github.com/pytorch/pytorch/issues/92311 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104374 Approved by: https://github.com/razarmehr, https://github.com/kulinseth	2023-06-30 18:11:49 +00:00
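The representability problem described above can be checked in plain Python, whose `float` is an IEEE 754 double. `LLONG_MAX` (2**63 - 1) rounds up to 2**63 on conversion, and that value no longer fits in a signed 64-bit integer. The largest double that still fits is 2**63 - 1024. This sketch only illustrates the issue. The exact constant the PR chose is not stated here. `math.nextafter` requires Python 3.9 or later.

```python
import math

LLONG_MAX = 2**63 - 1  # value of LLONG_MAX for a 64-bit long long

# Converting to double rounds to the nearest representable value, which is 2**63.
as_double = float(LLONG_MAX)
print(int(as_double) == 2**63)     # True: one past LLONG_MAX
print(int(as_double) > LLONG_MAX)  # True: no longer fits in int64

# Just below 2**63, consecutive doubles are 2**10 = 1024 apart, so the largest
# double that still fits in int64 is 2**63 - 1024.
largest_fitting = math.nextafter(2.0**63, 0.0)
print(int(largest_fitting) == 2**63 - 1024)  # True
print(int(largest_fitting) <= LLONG_MAX)     # True
```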
cyy	54cb61f7d9	enable ASAN on some tests (#103647 ) Enable more tests under ASAN. Meanwhile, disable float-divide-by-zero and float-cast-overflow, because both are also disabled by default in the latest clang. The following cited doc explains the reasons. ``` -fsanitize=float-cast-overflow: Conversion to, from, or between floating-point types which would overflow the destination. Because the range of representable values for all floating-point types supported by Clang is [-inf, +inf], the only cases detected are conversions from floating point to integer types. -fsanitize=float-divide-by-zero: Floating point division by zero. This is undefined per the C and C++ standards, but is defined by Clang (and by ISO/IEC/IEEE 60559 / IEEE 754) as producing either an infinity or NaN value, so is not included in -fsanitize=undefined. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/103647 Approved by: https://github.com/kit1980	2023-06-28 02:17:14 +00:00
magic-akari	e56cdfd74b	[MPS] Handle deserialization more permissively (#98834 ) MPS deserialization should handle `mps:0`, which is generated by code like the following ```python torch.rand(size=(3, 4)).to("mps") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/98834 Approved by: https://github.com/kulinseth, https://github.com/kit1980, https://github.com/malfet	2023-06-15 15:51:03 +00:00
Pearu Peterson	45401ef745	Enable float16 and complex32 support for sparse CSR elementwise multiplication operation. (#100394 ) As in the title. In addition, the PR adds float16 addcmul support for CPU device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100394 Approved by: https://github.com/amjames, https://github.com/cpuhrsch	2023-06-14 14:42:39 +00:00
Li-Huai (Allan) Lin	cce58a43c9	[MPS] Fix softplus with f16 input (#101948 ) Fixes #101946 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101948 Approved by: https://github.com/malfet	2023-05-31 00:40:10 +00:00
ecao	3f4fee735a	add Half support for logsigmoid, threshold, elu, gelu, hardtanh, hardsigmoid, hardswish, hardshrink, softshrink, leakyrelu, softplus, glu, silu, mish, and prelu on CPU (#98745 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98745 Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/ngimel	2023-05-27 16:20:21 +00:00
Li-Huai (Allan) Lin	0db704d240	[OpInfo] Add multi_head_attention_forward (#100153 ) Copilot summary (generated at 8f8d620): This pull request improves the testing of the `nn.functional.multi_head_attention_forward` function by adding it to the `OpInfo` framework, adjusting the tolerance and skipping criteria for some test cases, and restricting the dtype for the MPS (Metal Performance Shaders) tests. These changes aim to address the randomness and numerical precision issues of the function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100153 Approved by: https://github.com/drisspg	2023-05-26 01:58:17 +00:00
Denis Vieriu	de7ec2ddd7	[MPS] Allow saved models to be loaded directly to MPS through torch.jit.load (#102204 ) Copilot summary (generated at 94eed69): This pull request adds support for serializing and deserializing tensors on the `mps` device using JIT. It includes a test case in `test/test_mps.py` and device handling logic in `torch/csrc/jit/serialization/unpickler.cpp`. Fixes https://github.com/pytorch/pytorch/issues/88820, https://github.com/pytorch/pytorch/issues/87504 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102204 Approved by: https://github.com/kulinseth, https://github.com/malfet	2023-05-25 23:32:29 +00:00
Li-Huai (Allan) Lin	02a7318a5b	[MPS] Add aminmax op (#101691 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101691 Approved by: https://github.com/malfet	2023-05-23 18:01:34 +00:00
Li-Huai (Allan) Lin	330c907301	[MPS] Fix embedding cache key (#101857 ) Fixes #101198 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101857 Approved by: https://github.com/kulinseth	2023-05-21 06:11:25 +00:00
Aaron Gokaslan	3e2ea32dab	[BE]: Enable ruff rule TRY302 and apply fixes (#101874 ) Removes useless try statements and unreachable code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101874 Approved by: https://github.com/malfet	2023-05-19 17:30:52 +00:00
Khushi	1aaf0396eb	[reland][opinfo] empty_strided (#101782 ) Follows #100223 Previous PR: #100890 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101782 Approved by: https://github.com/ezyang	2023-05-19 03:06:29 +00:00
PyTorch MergeBot	dfac4364c4	Revert "[opinfo] empty_strided (#100890 )" This reverts commit `01c7106580`. Reverted https://github.com/pytorch/pytorch/pull/100890 on behalf of https://github.com/PaliC due to broke test_ops.py slow test ([comment](https://github.com/pytorch/pytorch/pull/100890#issuecomment-1551903975))	2023-05-17 19:00:15 +00:00
Li-Huai (Allan) Lin	bb3558961f	[MPS] Add histogram ops (#96652 ) Adds `torch.histc`, `torch.histogram`, `torch.histogramdd` Pull Request resolved: https://github.com/pytorch/pytorch/pull/96652 Approved by: https://github.com/kulinseth, https://github.com/malfet	2023-05-17 01:25:43 +00:00
Khushi	01c7106580	[opinfo] empty_strided (#100890 ) Follows: #100223 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100890 Approved by: https://github.com/ezyang	2023-05-15 23:39:39 +00:00
Nikita Shulga	9e089db32e	[MPS] Enable `arange` for `int8` and `uint8` dtypes (#101303 ) Not sure why it was not enabled previously. Sort types in `AT_DISPATCH_MPS_TYPES` by group (floats first, then integers) and size. Test implicitly in `test_bernoulli`. Copilot poem (generated at 80c7ed7): `Char` and `Byte` types / MPS can dispatch them now / Winter of tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/101303 Approved by: https://github.com/huydhn, https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/kulinseth	2023-05-13 01:19:08 +00:00
Ramin Azarmehr	0be53d83fc	[MPS] Add support for MPSProfiler Python bindings (#101002 ) - Added `torch.mps.profiler.start()` and `torch.mps.profiler.stop()` APIs with RST documentation - Added a test case in test_mps Pull Request resolved: https://github.com/pytorch/pytorch/pull/101002 Approved by: https://github.com/malfet	2023-05-12 21:55:34 +00:00
Sun, Jiayi	d56e1b2f67	add Half support for unary ops on CPU (#98493 ) Add Half support for log_sigmoid and some unary ops on CPU, including sinc, acosh, asinh, atanh, digamma, trigamma, rsqrt, acos, asin, atan, ceil, cos, erf, erfc, erfinv, exp, expm1, floor, log, log10, log1p, log2, i0, round, sin, sqrt, tan, tanh, trunc, lgamma. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98493 Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/ngimel	2023-05-12 04:52:34 +00:00
Nikita Shulga	b7bf953bbc	[MPS] Fix bernoulli for int types (#100946 ) Copilot summary (generated at 069fd23): This pull request enhances the MPS implementation of random operations in `Distributions.mm` and adds more dtype tests for the bernoulli distribution in `test_mps.py`. This improves the performance, correctness, and usability of the MPS backend for PyTorch. Fixes https://github.com/pytorch/pytorch/issues/100717 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100946 Approved by: https://github.com/kulinseth	2023-05-11 23:52:38 +00:00
Nikita Shulga	87084643e5	[CI][MPS] Actually make grid_sampler_2d available (#101108 ) In CI, an older macOS SDK can be used to compile the binary, so add a guard for the availability of the `MPSGraphResizeNearestRoundingModeRoundToEven` enum value. MPS feature availability checks are deliberately done at runtime (by using `is_macos_13_or_newer` and forward-declaring methods in `MPSGraphVenturaOps.h`) rather than at compile time (by using `#ifdef`s). Modify the error message and the XFAIL condition in `test_mps.py` to account for the macOS 13.2-or-newer requirement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101108 Approved by: https://github.com/kulinseth	2023-05-11 10:35:09 +00:00
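The commit checks MPS feature availability at runtime rather than with `#ifdef`s. The pattern can be sketched in Python. `is_macos_at_least`, `pick_grid_sampler`, and the returned path names are all hypothetical; the real checks, such as `is_macos_13_or_newer`, live in the MPS backend's native code. Because the decision happens at runtime, one binary built against an older SDK can still choose the newer path on a newer OS.

```python
import platform


def parse_version(text):
    """Turn a version string like '13.2.1' into a comparable tuple (13, 2, 1)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def is_macos_at_least(required, current=None):
    """Hypothetical runtime check; `current` defaults to the running macOS version."""
    if current is None:
        current = platform.mac_ver()[0]  # empty string when not on macOS
    if not current:
        return False
    return parse_version(current) >= parse_version(required)


def pick_grid_sampler(current=None):
    # Decided at runtime, not at compile time, so the binary does not depend
    # on which SDK it was built with.
    if is_macos_at_least("13.2", current):
        return "native_round_to_even"  # hypothetical name for the newer path
    return "fallback"                  # hypothetical name for the older path


print(pick_grid_sampler("13.2.1"))  # native_round_to_even
print(pick_grid_sampler("12.6"))    # fallback
```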
Khushi	51fe53e619	[opinfo] item (#100313 ) Follows #100223 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100313 Approved by: https://github.com/ezyang	2023-05-10 11:32:45 +00:00
Ramin Azarmehr	cecfcf1e17	[MPS] Handle MPS failures of test_modules.py in common_modules.py (#95334 ) - Also removed skipMPS code from `test_modules.py`. - Added `skipMPS` for unsupported or failing tests on the MPS backend in common_modules.py. (We'll remove `skipMPS` from those tests once a fix is available for them.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95334 Approved by: https://github.com/kulinseth, https://github.com/albanD	2023-05-09 03:55:16 +00:00
Li-Huai (Allan) Lin	3b6a7f4d51	[MPS] Fix index_put with deterministic algorithm enabled (#97660 ) Avoid parallel computation when deterministic algorithms are enabled. Fixes #97574 Benchmark: ``` [--------------- index_put_ Deterministic Algorithm Enabled ---------------] | cpu | mps 1 threads: ----------------------------------------------------------------- Dtype: torch.float32 Features: 1024; Num Indices: 512 | 37 | 49 Dtype: torch.float32 Features: 1024; Num Indices: 1024 | 54 | 50 Dtype: torch.float32 Features: 1024; Num Indices: 2048 | 86 | 50 Dtype: torch.float32 Features: 1024; Num Indices: 4096 | 150 | 49 Times are in microseconds (us). [-------------- index_put_ Deterministic Algorithm Disabled ---------------] | cpu | mps 1 threads: ----------------------------------------------------------------- DType: torch.float32 Features: 1024; Num Indices: 512 | 37 | 49 DType: torch.float32 Features: 1024; Num Indices: 1024 | 53 | 49 DType: torch.float32 Features: 1024; Num Indices: 2048 | 86 | 49 DType: torch.float32 Features: 1024; Num Indices: 4096 | 147 | 50 Times are in microseconds (us). ``` Copilot summary (generated at ebf2ff3): Added a deterministic version of `index_put` for MPS tensors that runs on a single thread and can be enabled by a global context flag. Refactored the existing `index_put` function and the kernel selection logic to support both parallel and serial modes. Added a test function to verify the deterministic behavior of `index_put` under different conditions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97660 Approved by: https://github.com/kulinseth	2023-05-08 00:57:29 +00:00
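A serial path matters for determinism. With accumulation and duplicate indices, several updates land on the same element. Floating-point addition is not associative, so a parallel kernel's result can depend on the order in which threads commit their updates. The pure-Python sketch below shows this with a hypothetical `index_put_accumulate` helper. It is not the MPS kernel.

```python
def index_put_accumulate(values, indices, updates):
    """Serially add updates[i] into values[indices[i]]; a fixed order gives a fixed result."""
    out = list(values)  # do not mutate the caller's list
    for idx, upd in zip(indices, updates):
        out[idx] += upd
    return out


# Floating-point addition is not associative, so the order of updates matters.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

base = [0.0, 0.0]
idx = [0, 0, 0]        # every update hits the same element
upd = [0.1, 0.2, 0.3]
forward = index_put_accumulate(base, idx, upd)
backward = index_put_accumulate(base, idx[::-1], upd[::-1])
print(forward[0], backward[0])  # 0.6000000000000001 0.6
```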
Kulin Seth	e20c94bda9	[MPS] Add the test for 5D in test_mps which is skipped. (#99271 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99271 Approved by: https://github.com/DenisVieriu97	2023-05-05 22:57:06 +00:00
Li-Huai (Allan) Lin	13da6585b6	[MPS] Skip all empty ops tests (#100368 ) Fixes #100175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100368 Approved by: https://github.com/kulinseth	2023-05-02 00:43:58 +00:00
Li-Huai (Allan) Lin	a50fb50c51	[MPS] Fix exception regex not compared (#100367 ) Previously when using `self.assertRaisesRegex` to test raised exception and its regex, the regex wasn't actually compared because mps was not in the `NATIVE_DEVICES`. This PR fixes that by enabling exception regex comparisons for mps device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100367 Approved by: https://github.com/albanD	2023-05-02 00:43:58 +00:00
Nikita Shulga	2442858f52	[MPS] Fix `layer_norm_backward_mps` key (#100295 ) Followup after https://github.com/pytorch/pytorch/pull/98794 See report in https://github.com/pytorch/pytorch/issues/98602#issuecomment-1527312211 and reproducer in https://github.com/pytorch/pytorch/issues/98602#issuecomment-1528214175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100295 Approved by: https://github.com/kit1980, https://github.com/izaitsevfb	2023-04-29 03:37:35 +00:00
Li-Huai (Allan) Lin	81978120ec	[MPS] Fix trace exceptions not raised for error inputs (#99239 ) Also rename `trace_mps_out` to `trace_mps` as it is not an out version. Remove `index_add` from XFAILLIST as it seems to work as expected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99239 Approved by: https://github.com/kulinseth	2023-04-26 14:41:50 +00:00
Li-Huai (Allan) Lin	f4a37c9a5d	[MPS] Fix max_pool2d exceptions not raised for error inputs (#99238 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99238 Approved by: https://github.com/kulinseth	2023-04-26 14:41:50 +00:00
Li-Huai (Allan) Lin	f4cf744380	[MPS] Fix gelu exceptions not raised for error inputs (#99237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/99237 Approved by: https://github.com/kulinseth	2023-04-26 14:41:46 +00:00
Li-Huai (Allan) Lin	1fcf40da63	[MPS] Add linear inputs check (#99228 ) Fixes #98211 https://github.com/pytorch/pytorch/issues/98211#issuecomment-1496005668 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99228 Approved by: https://github.com/kit1980	2023-04-26 04:44:23 +00:00
Denis Vieriu	89baa1a74c	[MPS] Add support for linalg.vector_norm (#99811 ) Summary of changes: - Add support for linalg.vector_norm - Fix the zero norm; the correct formula is sum(x != 0) - Add additional tests in test_mps Pull Request resolved: https://github.com/pytorch/pytorch/pull/99811 Approved by: https://github.com/kulinseth	2023-04-26 01:34:29 +00:00
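A plain-Python sketch of the orders `vector_norm` supports, including the order-0 case that the commit corrects to a count of nonzero entries. The `vector_norm` below is a stand-in that works on a flat list. It is not the PyTorch function. The parameter is named `ord` to mirror PyTorch's argument name.

```python
import math


def vector_norm(xs, ord=2.0):
    """Sketch of vector norms over a flat list of numbers."""
    if ord == 0:
        # Not a true norm: the number of nonzero entries, i.e. sum(x != 0).
        return float(sum(1 for x in xs if x != 0))
    if ord == math.inf:
        return max(abs(x) for x in xs)
    if ord == -math.inf:
        return min(abs(x) for x in xs)
    # General p-norm: (sum |x|^p)^(1/p)
    return sum(abs(x) ** ord for x in xs) ** (1.0 / ord)


v = [0.0, 3.0, -4.0, 0.0]
print(vector_norm(v, 0))         # 2.0
print(vector_norm(v, 2))         # 5.0
print(vector_norm(v, 1))         # 7.0
print(vector_norm(v, math.inf))  # 4.0
```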
Justin Chu	79c9e82e27	Fix flake8 lint errors reported by ruff - take 2 (#99798 ) Replaces #99784. This PR is pure autofix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798 Approved by: https://github.com/Skylion007, https://github.com/kit1980	2023-04-23 23:09:51 +00:00
BJ Hargrave	dc52ba2906	Fix test_mps for macos 13.3 (#98739 ) The expected dtype is changed from torch.int64 to torch.int32 prior to macOS 13.3. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98739 Approved by: https://github.com/kulinseth	2023-04-12 19:23:08 +00:00
Li-Huai (Allan) Lin	be8a4eb8e3	[MPS] Add index_fill op (#98694 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98694 Approved by: https://github.com/kulinseth	2023-04-12 18:13:33 +00:00
Li-Huai (Allan) Lin	71aea7f56e	[MPS] Add error inputs check (#98167 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98167 Approved by: https://github.com/kulinseth	2023-04-12 17:19:13 +00:00
Nikita Shulga	583193e1d9	[MPS] Fix batch_norm_backwards key (#98794 ) One needs different graphs for batch_norm_backwards depending on whether or not gradients are required for some of the params. Fixes https://github.com/pytorch/pytorch/issues/98602 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98794 Approved by: https://github.com/kulinseth	2023-04-11 17:23:36 +00:00
Guang Yang	c377a8590b	Add `nonzero_static()` op to pytorch to unblock export (#97417 ) Summary: Add new experimental python op (`torch.nonzero_static`) for export. There is NO CUDA implementation included in this PR. Example: Say the input tensor is `x = torch.tensor([[1, 0], [3, 2]])`. Calling regular `nonzero()` on x gives `tensor([[0, 0], [1, 0], [1, 1]])`; calling `nonzero_static(x, size=4)` on x gives `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded); calling `nonzero_static(x, size=2)` on x gives `tensor([[0, 0], [1, 0]])` (truncated). Test Plan: Unit Tests ``` buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static' ``` PT2 Export with `nonzero_static()` Example of `GraphModule` in the exported graph ``` def forward(self, x): arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec) nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4); arg0 = None return pytree.tree_unflatten([nonzero_static_default], self._out_spec) ``` Differential Revision: D44324808 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417 Approved by: https://github.com/ezyang	2023-04-11 05:13:36 +00:00
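The padding and truncation behavior described above can be sketched in plain Python on nested lists. The helper names are hypothetical. Using `fill_value=-1` as the default is an assumption of this sketch. The key property is that the result always has exactly `size` rows, which keeps the output shape static for export.

```python
def nonzero(matrix):
    """Row-major list of [row, col] indices of the nonzero entries of a 2-D list."""
    return [[r, c] for r, row in enumerate(matrix) for c, v in enumerate(row) if v != 0]


def nonzero_static(matrix, size, fill_value=-1):
    """Fixed-size variant: truncate to `size` rows, or pad with fill_value rows."""
    found = nonzero(matrix)[:size]
    ndim = 2  # each index row has one entry per input dimension
    padding = [[fill_value] * ndim for _ in range(size - len(found))]
    return found + padding


x = [[1, 0], [3, 2]]
print(nonzero(x))                 # [[0, 0], [1, 0], [1, 1]]
print(nonzero_static(x, size=4))  # [[0, 0], [1, 0], [1, 1], [-1, -1]]
print(nonzero_static(x, size=2))  # [[0, 0], [1, 0]]
```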
Nikita Shulga	29cde00701	[MPS] Add `random_` overload (#98333 ) That simply calls `torch.random_(from=0, to=None)` Also, fix optional upper bound calculation for all `dtypes` but int64: As one can see from https://pytorch.org/docs/stable/generated/torch.Tensor.random_.html `from` boundary is inclusive, but `to` is exclusive, i.e. if `to` is omitted for `torch.int8` dtype, it should be set to `128` and to `2` for torch.bool. Add test for `torch.random_` Fixes https://github.com/pytorch/pytorch/issues/98118 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98333 Approved by: https://github.com/kulinseth	2023-04-05 21:24:45 +00:00
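The bound rule from this commit message can be sketched in plain Python for a few integer dtypes. int64 is left out because the commit treats it separately. Floating-point dtypes follow a different rule that this sketch does not cover. `default_random_upper` and `INT_BITS` are hypothetical names for illustration.

```python
# Bit widths for a few integer dtypes; the string keys mirror torch dtype names.
INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "uint8": 8}


def default_random_upper(dtype: str) -> int:
    """Exclusive upper bound `to` assumed when random_() is called with to=None."""
    if dtype == "bool":
        return 2                # draws from {0, 1}
    bits = INT_BITS[dtype]
    if dtype.startswith("uint"):
        return 2 ** bits        # uint8 -> 256, values 0..255
    return 2 ** (bits - 1)      # int8 -> 128, values 0..127


for name in ("bool", "int8", "uint8", "int32"):
    upper = default_random_upper(name)
    # `from` is inclusive and `to` is exclusive, so the largest drawable value is upper - 1.
    print(name, upper, upper - 1)
```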
Li-Huai (Allan) Lin	db8abde9b6	[MPS] Enable conditional indexing tests (#97871 ) The tests seem to be working now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97871 Approved by: https://github.com/kulinseth	2023-04-01 16:15:08 +00:00
Li-Huai (Allan) Lin	7776653a0c	Add linear gradgrad (#97151 ) Fixes #92206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97151 Approved by: https://github.com/albanD	2023-03-30 07:25:02 +00:00
Philip Meier	2f6c18d1a2	improve memory footprint of torch.testing.assert_close (#96131 ) Redo of #90172 out of stack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96131 Approved by: https://github.com/pearu, https://github.com/mruberry	2023-03-29 23:49:56 +00:00
Li-Huai (Allan) Lin	4afef85dda	[MPS] Fix index_select_scalar test (#97773 ) #96408 introduced a check that prevents the index to scalar from being non-singleton. Fixes #94162 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97773 Approved by: https://github.com/kulinseth	2023-03-28 19:23:59 +00:00
Li-Huai (Allan) Lin	100641aadf	[MPS] Fix torch.eye unsupported bool constant on macOS 12 (#97027 ) Fixes #91620 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97027 Approved by: https://github.com/kulinseth	2023-03-20 18:08:36 +00:00
Ramin Azarmehr	50beab2978	[MPS] Fix the failure with ReplicatePad3D (#96988 ) - Only ReflectPad needs the torch checks for input arguments and not the ReplicatePad - Added a test case - The failure was originally found in test_modules with test `test_forward_nn_ReplicationPad3d_mps_float32` Pull Request resolved: https://github.com/pytorch/pytorch/pull/96988 Approved by: https://github.com/DenisVieriu97	2023-03-17 01:41:12 +00:00
alexdremov	62eb7a2e97	[MPS] LSTM grad_y missing fix (#96601 ) Fixes #96416 Added tests that do not use the LSTM output, similarly to the issue. It seems this fix once again introduces a backward incompatibility. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96601 Approved by: https://github.com/albanD, https://github.com/kulinseth	2023-03-16 15:53:56 +00:00
Li-Huai (Allan) Lin	c95bcb6694	[MPS] Fix flip where no dims need to be flipped (#96605 ) Fixes #96558 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96605 Approved by: https://github.com/kulinseth	2023-03-14 00:34:30 +00:00
Li-Huai (Allan) Lin	a87f3f612e	[MPS] Fall back multi-layer LSTM on macOS 12 (#90909 ) The native implementation of LSTM has been fixed on macOS 13. On macOS 12, the multi-layer LSTM still has a numerical correctness issue that cannot be resolved on the OS side. Thus, on macOS 12 we fall back to iterating over LSTMCell for the multi-layer LSTM. This might have a performance impact, but it makes LSTM on macOS 12 fully usable. Fixes: #90421 Issues related: #80306, #83144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90909 Approved by: https://github.com/albanD, https://github.com/kulinseth	2023-03-10 03:10:49 +00:00
Nikita Shulga	075a49442d	[MPS] Allow `float16` input to float32 `LayerNorm` (#96430 ) Only for forward pass Subset of https://github.com/pytorch/pytorch/pull/96208 Create constant with scalar using `input_mps_dtype` and use `reciprocalWithTensor` instead of `divisionWithPrimaryTensor:1.0 secondaryTensor:` Fixes https://github.com/pytorch/pytorch/issues/96113 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96430 Approved by: https://github.com/kulinseth	2023-03-09 22:09:10 +00:00
Kulin Seth	2bb022e902	[MPS] Adding xfaillist with all categories of failures. (#96176 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96176 Approved by: https://github.com/malfet	2023-03-08 08:41:21 +00:00
Catherine Lee	eea0733045	Reduce pytest blocklist (#96016 ) `TestCase = object` or variations of it get switched to `TestCase = NoTest`. unittest collects tests based on subclassing unittest.TestCase, so setting TestCase = object removes it from unittest test collection. pytest collects based on name (https://docs.pytest.org/en/7.1.x/reference/reference.html#confval-python_classes) but can be told to ignore a class (bottom of https://docs.pytest.org/en/7.1.x/example/pythoncollection.html#changing-naming-conventions). Pull Request resolved: https://github.com/pytorch/pytorch/pull/96016 Approved by: https://github.com/ZainRizvi, https://github.com/huydhn	2023-03-07 18:30:27 +00:00
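The difference between the two collectors can be checked with the standard library alone. unittest's loader only picks up `unittest.TestCase` subclasses. pytest skips a class whose `__test__` attribute is false. The `NoTest` class below is written for this sketch, and PyTorch's own helper of that name may differ. `count_collected` is a hypothetical helper.

```python
import types
import unittest


class NoTest:
    # pytest skips classes whose __test__ attribute is false.
    __test__ = False


def count_collected(base):
    """Count the tests unittest's loader finds when TestFoo derives from `base`."""
    module = types.ModuleType("fake_test_module")

    class TestFoo(base):
        def test_something(self):
            pass

    module.TestFoo = TestFoo
    return unittest.TestLoader().loadTestsFromModule(module).countTestCases()


print(count_collected(unittest.TestCase))  # 1: a TestCase subclass is collected
print(count_collected(NoTest))             # 0: unittest ignores non-TestCase classes
```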
Li-Huai (Allan) Lin	2f66b57a7a	[MPS] Fix in-place add and sub with alpha == 0.0 (#96184 ) Apart from fixing the below issue, this PR integrates the test for `sub` into the test for `add` as they are implemented using the same template. Fixes #96065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96184 Approved by: https://github.com/kulinseth	2023-03-07 17:17:53 +00:00
Nikita Shulga	769cc8a614	[MPS] Add type promotion to `torch.addcmul` (#96164 ) Fixes a crash while running something like `python -c "import torch;x=torch.rand(3, 3, dtype=torch.float16, device='mps');y=x.addcmul(torch.ones(3, device='mps'), torch.ones(3, device='mps'));print(y)"`. Modify `castMPSTensor` to become a no-op if the cast is not needed. Define `common_dtype` as `c10::promoteTypes` between self, tensor1 and tensor2. Cast to any output type. Add a mixed-types test to `TestMPS.test_addcmul`, though it does not cover all the permutations. Discovered while looking at https://github.com/pytorch/pytorch/issues/96113 Pull Request resolved: https://github.com/pytorch/pytorch/pull/96164 Approved by: https://github.com/kulinseth	2023-03-07 04:19:30 +00:00
alexdremov	78da315afd	[MPS] Fix bidirectional LSTM & small one-direction LSTM fix (#95563 ) Fixes #94754 With this PR I hope to finish my breathtaking journey of fixing MPS LSTM. Here, I enable `bidirectional` on MPS. Also, I've noticed that cache key did not account for all parameters, so there could have been problems with one-directional LSTM when created without bias or dropout and then with one of them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95563 Approved by: https://github.com/jhavukainen, https://github.com/kulinseth, https://github.com/malfet	2023-03-05 00:19:54 +00:00
Nikita Shulga	436993d52b	[MPS] Error on unsupported types (#95982 ) I.e. attempt to create a tensor of every possible type and make sure that it raises a structured error for non-MPS types. Also, rename `test_resize_as_all_dtypes_and_devices` to `test_resize_as_mps_dtypes` and `test_resize_all_dtypes_and_devices` to `test_resize_mps_dtypes` and run both tests for all MPS dtypes (rather than just bool, float16 and bfloat16 as they were running before). Fixes https://github.com/pytorch/pytorch/issues/95976 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95982 Approved by: https://github.com/kulinseth	2023-03-04 01:29:07 +00:00
Denis Vieriu	304a95435d	[MPS] Disallow reshape in slice (#95905 ) Disallow reshapes for arrayViews. Current code allows a base shape of `[2, 4, 256]` to be sliced into `[4, 1, 256]` (view's shape) - which is not possible. Slicing a smaller dimension into a bigger one will always error out. Fixes https://github.com/pytorch/pytorch/issues/95883 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95905 Approved by: https://github.com/razarmehr, https://github.com/kulinseth	2023-03-03 08:08:34 +00:00
Denis Vieriu	d0dd898943	[MPS] Remove remaining casts from 13.3 (#95870 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95870 Approved by: https://github.com/kulinseth	2023-03-02 12:44:59 +00:00
Denis Vieriu	4d3352ed90	[MPS] Remove casts from reduction/cumsum/sort ops starting with macOS 13.3 (#95817 ) MPS in macOS 13.3 has added support for int64 in reduction ops / cumsum / sort / argsort. On macOS 13.3 and later, this change removes the hard-coded casts and error messages needed on earlier versions, allowing the ops to run natively with int64. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95817 Approved by: https://github.com/kulinseth	2023-03-02 00:26:24 +00:00
Kulin Seth	5d9d8c6154	[MPS] Add fixes for div with floor and raise error for div_trunc (#95769 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95769 Approved by: https://github.com/DenisVieriu97	2023-03-01 20:52:28 +00:00
Denis Vieriu	e5a959a2d4	[MPS] Fix views with 3 or more sliced dimensions (#95762 ) Fixes https://github.com/pytorch/pytorch/issues/95482 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95762 Approved by: https://github.com/razarmehr	2023-03-01 16:16:49 +00:00
Denis Vieriu	ed1957dc19	[MPS] Add support for masked_scatter (#95743 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95743 Approved by: https://github.com/kulinseth	2023-03-01 01:36:36 +00:00
Li-Huai (Allan) Lin	f33180fb7f	[MPS] Add pow.Scalar (#95201 ) 1. Adds `pow.Scalar`. 2. Adjusts the testing `atol` and `rtol` so that the pow output-match tests pass. 3. Xfails numerically incorrect dtypes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95201 Approved by: https://github.com/kulinseth	2023-02-28 16:11:15 +00:00
Li-Huai (Allan) Lin	9e16f1281f	[MPS] Add copysign op. (#95552 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95552 Approved by: https://github.com/kulinseth	2023-02-28 06:49:46 +00:00
Li-Huai (Allan) Lin	b7c2a65139	[MPS] Fix type casting copy with storage offset (#95573 ) This PR handles the case where the `dst` tensor of type casting has a storage offset by creating a temporary buffer to store results and then copy them back to the dst with the offset added. Fixes #95417 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95573 Approved by: https://github.com/kulinseth	2023-02-28 05:24:31 +00:00
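The workaround described above can be modeled in plain Python. A flat list stands in for the destination's storage, `dst_offset` stands in for its storage offset, and `int` stands in for the dtype cast. The helper name is hypothetical, and this is not the MPS code path. A cast that ignored the offset would write into positions 0 to 2 instead.

```python
def cast_copy_with_offset(dst_storage, dst_offset, src, cast):
    """Cast `src` into a temporary buffer, then copy it into dst at dst_offset."""
    # 1. Write the cast results into a temporary buffer that has no offset.
    tmp = [cast(v) for v in src]
    # 2. Copy the buffer back into the destination, honoring its storage offset.
    for i, v in enumerate(tmp):
        dst_storage[dst_offset + i] = v
    return dst_storage


storage = [0, 0, 0, 0, 0]
# dst is a view that starts at element 2 of the storage, like storage[2:5].
cast_copy_with_offset(storage, 2, [1.7, 2.2, 3.9], int)
print(storage)  # [0, 0, 1, 2, 3]
```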
Li-Huai (Allan) Lin	4930ae7f82	[MPS] Add roll op (#95168 ) Reuse the cpu implementation here as currently there is no native roll implementation from the MPS api (if any, please let me know). Compared to falling back to cpu using `PYTORCH_ENABLE_MPS_FALLBACK=1`, this way we keep tensors on MPS. Did a small benchmark: ```python for num in [10, 100, 1000, 10000]: for shft in [1, 5]: sz = num * num x = torch.arange(sz, device="cpu").view(num, num) s = time.time() r = torch.roll(x, shft) cpu_e = time.time() - s x = torch.arange(sz, device="mps").view(num, num) s = time.time() r = torch.roll(x, shft) mps_e = time.time() - s print(f"size: ({num}, {num}) shft: {shft} cpu: {cpu_e} mps: {mps_e}") ``` ``` size: (10, 10) shft: 1 cpu: 0.00015163421630859375 mps: 0.003078937530517578 size: (10, 10) shft: 5 cpu: 6.794929504394531e-05 mps: 0.0014979839324951172 size: (100, 100) shft: 1 cpu: 0.0001621246337890625 mps: 0.0016200542449951172 size: (100, 100) shft: 5 cpu: 0.00016379356384277344 mps: 0.00154876708984375 size: (1000, 1000) shft: 1 cpu: 0.0022068023681640625 mps: 0.0017690658569335938 size: (1000, 1000) shft: 5 cpu: 0.009071111679077148 mps: 0.0020020008087158203 size: (10000, 10000) shft: 1 cpu: 0.16785407066345215 mps: 0.011695146560668945 size: (10000, 10000) shft: 5 cpu: 0.1160881519317627 mps: 0.011452913284301758 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/95168 Approved by: https://github.com/albanD	2023-02-27 18:31:17 +00:00
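For reference, here are the semantics being reused, sketched in plain Python on lists. A 1-D roll moves element `i` to position `(i + shift) % n`. `torch.roll` without `dims` flattens the input, rolls it, and then restores the original shape. The helper names are hypothetical.

```python
def roll_flat(xs, shift):
    """1-D roll: element i moves to position (i + shift) % n."""
    n = len(xs)
    if n == 0:
        return []
    shift %= n  # also normalizes negative shifts
    return xs[-shift:] + xs[:-shift] if shift else list(xs)


def roll_2d_no_dims(matrix, shift):
    """Like torch.roll without dims: flatten, roll, then restore the shape."""
    rows, cols = len(matrix), len(matrix[0])
    flat = [v for row in matrix for v in row]
    rolled = roll_flat(flat, shift)
    return [rolled[r * cols:(r + 1) * cols] for r in range(rows)]


print(roll_flat([0, 1, 2, 3, 4], 1))         # [4, 0, 1, 2, 3]
print(roll_2d_no_dims([[0, 1], [2, 3]], 1))  # [[3, 0], [1, 2]]
```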
Nikita Shulga	fd8367a7b1	[MPS][BE] Introduce xfail (#95045 ) Add `mps_ops_modifier` function that adds `unittest.expectedFailure` decorators to the operators that are supposed to fail on MPS. This allows one to know whether or not an operation will fail, rather than skip it. For example: ``` % python test_mps.py -v -k test_output_match_dot test_output_match_dot_cpu_float32 (__main__.TestConsistencyCPU) ... ok test_output_match_dot_cpu_int16 (__main__.TestConsistencyCPU) ... ok test_output_match_dot_cpu_int32 (__main__.TestConsistencyCPU) ... ok test_output_match_dot_cpu_int64 (__main__.TestConsistencyCPU) ... expected failure test_output_match_dot_cpu_uint8 (__main__.TestConsistencyCPU) ... ok ---------------------------------------------------------------------- Ran 5 tests in 0.175s OK (expected failures=1) ``` Moved a few functions from blocklist to xfail, and found that some of the functions in the list actually work, for example `torch.long`. Also, allow `None` to be used in `ALLOWLIST` instead of specifying all types explicitly (which aligns with `DecorateInfo` semantics). Eventually, we should get rid of `ALLOWLIST` (i.e. all ops are allowed), keep a small `BLOCKLIST` and move the rest to `XFAILLIST`. Add a step to print HW/SW info before running MPS tests. Fix type promotion in `trace_mps_out`. Introduce `MACOS_12_X_XFAILLIST` and skip almost every function for `torch.uint8`, although some of those don't make much sense and feel like a regression from PyTorch-1.13. Re-enabled MPS testing on macOS 12, as runners seem to be available again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95045 Approved by: https://github.com/albanD	2023-02-27 15:01:01 +00:00
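The mechanism can be shown in miniature. `unittest.expectedFailure` turns a failing test into an "expected failure", and an unexpectedly passing one makes the run unsuccessful, which lets an xfail list detect ops that start working. `mark_xfail`, `_dot_test`, and the test names below are hypothetical. The real `mps_ops_modifier` is not reproduced here.

```python
import unittest


def mark_xfail(test_fn, expected_to_fail):
    """Hypothetical stand-in: wrap known-bad cases in unittest.expectedFailure."""
    return unittest.expectedFailure(test_fn) if expected_to_fail else test_fn


def _dot_test(dtype):
    def test(self):
        # Pretend the backend cannot handle int64 for this op.
        self.assertNotEqual(dtype, "int64")
    return test


class TestConsistencySketch(unittest.TestCase):
    test_output_match_dot_float32 = mark_xfail(_dot_test("float32"), expected_to_fail=False)
    test_output_match_dot_int64 = mark_xfail(_dot_test("int64"), expected_to_fail=True)


suite = unittest.TestLoader().loadTestsFromTestCase(TestConsistencySketch)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(len(result.expectedFailures))  # 1
```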
Li-Huai (Allan) Lin	4dca9bde05	[MPS] Add fmax fmin op (#95191 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95191 Approved by: https://github.com/kulinseth	2023-02-25 07:21:48 +00:00
Li-Huai (Allan) Lin	5cad542e43	[MPS] Add log_sigmoid op (#95280 ) 1. Add log_sigmoid. 2. Make log1p a common function. Operators that use log1p: mish, softplus, log_sigmoid (maybe more). Pull Request resolved: https://github.com/pytorch/pytorch/pull/95280 Approved by: https://github.com/kulinseth	2023-02-24 01:38:30 +00:00
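The commit makes `log1p` a shared building block. The scalar sketch below uses plain Python and `math.log1p` to show why. Written with `log1p` and a `min`/`max(x, 0)` split, `log_sigmoid` and `softplus` never evaluate `exp` of a large positive number. These are textbook identities, not the MPS graph code. The `softplus` here also omits PyTorch's `beta` and `threshold` parameters.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def softplus(x):
    """softplus(x) = log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)); equals -log_sigmoid(-x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(log_sigmoid(0.0))      # -log(2)
print(log_sigmoid(-1000.0))  # -1000.0
# The naive form overflows: math.log(1 / (1 + math.exp(1000.0))) raises OverflowError.
```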
alexdremov	b9e95158d5	[MPS] Fix LSTM backward and forward pass (#95137 ) Fixes #91694 Fixes #92615 Several transpositions were missing for the backward graph in the case of `batch_first=True`. Issue #91694 is not reproduced with `batch_first=False`. After fixing the transpose issue, I finally thought that I could use LSTM freely in my project. And then I got horrific results in training. Seems related to #92615. After that I decided to fix LSTM's backward step completely. I collected all my findings in this thread, and it seems like I succeeded. Funnily enough, backward tests were completely disabled before and were not passing: ```python @unittest.skipIf(True, "Backward of lstm returns wrong result") def test_lstm_2(self, device="mps", dtype=torch.float32): ``` UPD: the forward pass of the multi-layer version was also wrong due to the incorrect `initState, initCell` slices. Tests were passing because states were initialized with zeros. Accidentally fixed this too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95137 Approved by: https://github.com/jhavukainen, https://github.com/kulinseth, https://github.com/soulitzer	2023-02-23 17:32:42 +00:00
Denis Vieriu	86efa104f5	[MPS] Fix view op slicing for 2nd dim in case of 0 offset (#95381 ) * Fix view op slicing for 2nd dim in case of 0 offset Pull Request resolved: https://github.com/pytorch/pytorch/pull/95381 Approved by: https://github.com/razarmehr	2023-02-23 17:26:10 +00:00
XiaobingSuper	5730cabdd0	using float type to do the computation of norm reduce for cpu half and bfloat16 dtype (#95166 ) As the title says, we should use a higher-precision dtype (float) to compute the norm reduction for half and bfloat16 dtypes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95166 Approved by: https://github.com/peterbell10, https://github.com/jgong5, https://github.com/ngimel, https://github.com/lezcano	2023-02-23 05:00:25 +00:00
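To see why the accumulator dtype matters, here is a plain-Python sketch that simulates float16 rounding with the `struct` module's `'e'` (half-precision) format. Python's own `float` (a double) stands in for the wider float32 accumulator, and the helper names are hypothetical; this is not the ATen CPU kernel.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def l2_norm_half_accumulator(values):
    """Accumulate the sum of squares in (simulated) float16."""
    acc = 0.0
    for v in values:
        acc = to_half(acc + to_half(v * v))
    return to_half(math.sqrt(acc))


def l2_norm_float_accumulator(values):
    """Accumulate in a wider type, rounding only the final result to float16."""
    acc = 0.0
    for v in values:
        acc += v * v
    return to_half(math.sqrt(acc))


values = [1.0] * 4096  # every element is exactly representable in float16
print(l2_norm_half_accumulator(values))   # stalls far below the true norm
print(l2_norm_float_accumulator(values))  # 64.0
```

In float16 the running sum stops growing once adding 1.0 falls below the spacing between representable values (from 2048 upward the spacing is 2, and the tie rounds back to 2048), so the half-precision result lands near sqrt(2048) instead of 64.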
Li-Huai (Allan) Lin	69c76ff05e	[MPS] Add xlogy op (#95213 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95213 Approved by: https://github.com/kulinseth, https://github.com/soulitzer	2023-02-22 19:43:12 +00:00
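`xlogy` computes `x * log(y)` but defines the result as 0 when `x == 0`, which keeps terms like `p * log(p)` finite at `p = 0`. A scalar sketch of the semantics documented for `torch.xlogy` (NaN if `y` is NaN, 0 if `x` is 0, otherwise `x * log(y)`), written in plain Python rather than against the MPS kernel:

```python
import math


def xlogy(x: float, y: float) -> float:
    """x * log(y), defined as 0 when x == 0 (unless y is NaN)."""
    if math.isnan(y):
        return math.nan
    if x == 0:
        return 0.0
    if y == 0:
        log_y = -math.inf  # math.log(0) would raise, so use the IEEE limit
    elif y < 0:
        log_y = math.nan  # log of a negative number is undefined
    else:
        log_y = math.log(y)
    return x * log_y


# Entropy-style sum where one probability is exactly zero.
probs = [0.5, 0.5, 0.0]
entropy = -sum(xlogy(p, p) for p in probs)
```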
Denis Vieriu	5e47571a13	[MPS] Convolution cleanup; remove unnecessary contiguous calls (#95078 ) - Fixes convolution crashes in backward with weights - Removes unnecessary contiguous calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/95078 Approved by: https://github.com/kulinseth	2023-02-22 18:04:12 +00:00
Kulin Seth	02a6d4334b	[MPS] Handle broadcasting by expanding src tensor in Copy.mm (#95272 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95272 Approved by: https://github.com/DenisVieriu97	2023-02-22 18:02:42 +00:00
Denis Vieriu	8475af7761	[MPS] Cast int64 to int32 for reduction ops (#95231 ) - emit a warning when converting int64 to int32 for reduction ops - use the cast tensor for the reduction sum in trace - unblock trace from running Pull Request resolved: https://github.com/pytorch/pytorch/pull/95231 Approved by: https://github.com/razarmehr	2023-02-22 17:23:25 +00:00
Li-Huai (Allan) Lin	f70a3430aa	[MPS] Add hypot op (#95196 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95196 Approved by: https://github.com/kulinseth	2023-02-21 22:40:20 +00:00
Li-Huai (Allan) Lin	e0a0329a67	[MPS] Add hardsigmoid op (#95164 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95164 Approved by: https://github.com/kulinseth	2023-02-21 07:06:37 +00:00
Li-Huai (Allan) Lin	d96aac8d2a	[MPS] Add logit op (#95162 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/95162 Approved by: https://github.com/kulinseth	2023-02-21 07:02:45 +00:00
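`logit` is the inverse of the sigmoid, `log(p / (1 - p))`. A scalar sketch of the behaviour documented for `torch.logit`, including its optional `eps` clamp to `[eps, 1 - eps]`; this is plain Python, not the MPS implementation.

```python
import math
from typing import Optional


def logit(p: float, eps: Optional[float] = None) -> float:
    """log(p / (1 - p)); if eps is given, p is first clamped to [eps, 1 - eps]."""
    if math.isnan(p):
        return math.nan
    if eps is not None:
        p = min(max(p, eps), 1.0 - eps)
    if p < 0.0 or p > 1.0:
        return math.nan  # outside the domain of the logit
    if p == 0.0:
        return -math.inf
    if p == 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))
```

Without `eps`, inputs of exactly 0 or 1 map to infinities; with a small `eps` the result stays finite, which is usually what a training loop wants.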
alexdremov	a17a7ccc92	[MPS] LogSoftmax numerical stability (#95091 ) Fixes #94043 Calculations are now consistent with the numerically stable formula and with CPU: $\mathrm{LogSoftmax}(X, \dim) = X - \max(X, \dim) - \log\left(\sum_{\dim} \exp\left(X - \max(X, \dim)\right)\right)$ @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/95091 Approved by: https://github.com/malfet, https://github.com/kulinseth	2023-02-18 18:26:29 +00:00
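The stable formula can be checked with a few lines of plain Python (a sketch over a 1-D list, not the MPS kernel): subtracting the maximum first keeps every `exp` argument at or below zero, so large logits no longer overflow.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax of a 1-D list of floats."""
    m = max(xs)
    shifted = [x - m for x in xs]  # every entry is <= 0
    log_sum = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_sum for s in shifted]


print(log_softmax([1000.0, 1000.0]))  # both entries equal -log(2)
```

The naive `x - log(sum(exp(x)))` would raise `OverflowError` in `math.exp(1000.0)` for that input.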
Ramin Azarmehr	9511b9fad2	[MPS] Fix copy_cast_mps() on tensors with storage offset (#95093 ) - The copy_cast path requires storage_offset to be applied before casting - This should fix some correctness issues in transformer models Fixes #94980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95093 Approved by: https://github.com/kulinseth	2023-02-18 16:29:01 +00:00
Li-Huai (Allan) Lin	25ee6dd335	[MPS] Fix fill_ where input tensor has a storage offset (#95113 ) Fixes #94390 Apart from fixing the issue above, this PR also fixes a bug where, when an input tensor can be sliced, a sliced array view is created. This array view appears to be either non-writable or backed by a different storage than the original tensor, causing incorrect results with the in-place `fill`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95113 Approved by: https://github.com/kulinseth	2023-02-18 16:19:15 +00:00
Li-Huai (Allan) Lin	0a9c608461	[MPS] Fix tensor with non-zero storage offset graph gathering (#91071 ) Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't consider the situation where a view tensor is a sliced and then unsqueezed version of the base tensor, resulting in a different number of dimensions. For example, if we want to stack `y_mps` and `x_mps` on the last dim: ``` t_mps = torch.tensor([1, 2, 3, 4], device="mps") x_mps = t_mps[2:] # [3, 4] y_mps = t_mps[:2] # [1, 2] res_mps = torch.stack((y_mps, x_mps), dim=-1) ``` the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to: ``` res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1) ``` `x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a storage offset; such tensors should be sliceable without cloning their storage. Fixes #87856 Fixes #91065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91071 Approved by: https://github.com/kulinseth	2023-02-17 18:44:20 +00:00
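The key observation is that a view can differ from its base in rank while its elements still occupy one contiguous run of storage starting at `storage_offset`. A pure-Python sketch of that bookkeeping; the `View` class and its methods are hypothetical stand-ins for PyTorch's size/stride/offset metadata, not the `Placeholder` logic in `OperationUtils.mm`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class View:
    """Hypothetical minimal strided view over a flat storage list."""
    storage: List[int]
    offset: int
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]

    def slice0(self, start: int, stop: int) -> "View":
        # Slicing dim 0 only moves the offset and shrinks the size.
        return View(self.storage, self.offset + start * self.strides[0],
                    (stop - start,) + self.sizes[1:], self.strides)

    def unsqueeze_last(self) -> "View":
        # A trailing size-1 dim with stride 1 keeps the layout contiguous.
        return View(self.storage, self.offset, self.sizes + (1,), self.strides + (1,))

    def is_contiguous(self) -> bool:
        expected = 1
        for size, stride in reversed(list(zip(self.sizes, self.strides))):
            if size != 1 and stride != expected:
                return False
            expected *= size
        return True

    def flat_elements(self) -> List[int]:
        # Valid for contiguous views: one run of storage starting at offset.
        n = 1
        for s in self.sizes:
            n *= s
        return self.storage[self.offset:self.offset + n]


t = View([1, 2, 3, 4], offset=0, sizes=(4,), strides=(1,))
xu = t.slice0(2, 4).unsqueeze_last()  # like t_mps[2:].unsqueeze(-1)
print(len(t.sizes), len(xu.sizes))    # 1 2: the ranks differ from the base
print(xu.offset, xu.is_contiguous(), xu.flat_elements())  # 2 True [3, 4]
```

A rank-equality check would reject `xu`, even though reading `sizes` elements from `offset` is all that is needed to gather it.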
Denis Vieriu	a2afc657da	[MPS] Fix upsample for NHWC output (#94963 ) Fixes https://github.com/huggingface/diffusers/issues/941 Before: ![Screenshot 2023-02-15 at 8 11 53 PM](https://user-images.githubusercontent.com/104024078/219266709-6a77636a-2fc0-4802-b130-85069b95953f.png) After: ![Screenshot 2023-02-15 at 8 12 02 PM](https://user-images.githubusercontent.com/104024078/219266694-ea743c02-fb55-44f1-b7d6-5946106527c3.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94963 Approved by: https://github.com/razarmehr	2023-02-17 05:07:22 +00:00
Denis Vieriu	5d1e9fd214	[MPS] Fix prelu backward pass (#94933 ) Allocate the correct shape for the weights gradient Pull Request resolved: https://github.com/pytorch/pytorch/pull/94933 Approved by: https://github.com/razarmehr	2023-02-17 03:45:12 +00:00
Denis Vieriu	bc361fdfdf	[MPS] Fix bilinear backward pass (#94892 ) Fixes backward pass for bilinear. Summary of changes: - bilinear op is able to produce contiguous, non-view tensors with a storage offset, such as: shape=`[1, 1, 1, 1]`, `storage_offset=12`. This seems a weird case, but it is valid, and for these types of tensors we wouldn't be able to gather/scatter since we look at the view flag (which is not set here). This change checks only `storage_offset`, rather than the `is_view` flag, which is not being set. - reduction sum must return a zeroed-out output if passed an input with 0 elements (e.g., a shape of (0, 5)). Pull Request resolved: https://github.com/pytorch/pytorch/pull/94892 Approved by: https://github.com/kulinseth	2023-02-16 00:30:29 +00:00
Kulin Seth	54ebf255ab	[MPS] Fixes for LSTM. (#94889 ) - Backward pass has to give explicit bias tensor of zeros if none is passed to the op or the bias gradient will not be calculated. - Fixed bias tensor mistakenly getting overwritten to zeros - Fixes crash when lstm op called with has_biases set to false. Change takes into account the changed shape of the input params TensorList depending on the bias flag. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/94889 Approved by: https://github.com/DenisVieriu97	2023-02-15 16:10:40 +00:00
Denis Vieriu	71ec2617d2	[MPS] Block uint8 data type for unary and binary ops on macOS 12 (#94876 ) Blocks uint8 data type for unary and binary ops on macOS 12 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94876 Approved by: https://github.com/kulinseth	2023-02-15 06:09:56 +00:00
Kulin Seth	94f0808629	[MPS] Add fmod op. (#94722 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/94722 Approved by: https://github.com/DenisVieriu97	2023-02-14 14:55:26 +00:00
Xuehai Pan	b005ec62b9	[BE] Remove dependency on `six` and `future` (#94709 ) Remove the Python 2 and 3 compatibility library [six](https://pypi.org/project/six) and [future](https://pypi.org/project/future) and `torch._six`. We only support Python 3.8+ now. It's time to retire them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94709 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-02-14 09:14:14 +00:00
Denis Vieriu	1f06a71797	[MPS] Error out for square int64 input (#94766 ) - add checks for whether macOS is 13.2 or newer - remove square from block list - throw an error if power with int64 input is called on macOS versions before 13.2 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94766 Approved by: https://github.com/kulinseth	2023-02-14 04:45:41 +00:00
Denis Vieriu	cedb7e3d77	[MPS] Fix remainder op for integral dtypes (#94757 ) Map remainder op to the same template as div (integral dtypes will be cast to float) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94757 Approved by: https://github.com/kulinseth	2023-02-14 01:06:49 +00:00
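`fmod` and `remainder` differ only in how the quotient is rounded: `fmod` truncates, so the result takes the sign of the dividend, while `remainder` floors, so it takes the sign of the divisor (like Python's `%`). A plain-Python sketch that, like the change above, computes the integral case through floating-point division; the function names are illustrative, not the MPS template.

```python
import math


def fmod(a: float, b: float) -> float:
    """C-style fmod: truncated division, result has the sign of the dividend a."""
    return math.fmod(float(a), float(b))


def remainder(a: float, b: float) -> float:
    """Floored-division remainder: result has the sign of the divisor b."""
    a, b = float(a), float(b)  # integral inputs are cast to float first
    return a - math.floor(a / b) * b


print(fmod(-7, 3), remainder(-7, 3))  # -1.0 2.0
```

Routing integers through a float is exact only while the operands fit in the float's significand (magnitudes up to 2**24 for float32), a caveat worth keeping in mind for large int64 inputs.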
Denis Vieriu	4acdc446b2	[MPS] Fix batch norm for NHWC (#94760 ) Fixes `test_modules.py` batch norm NHWC testcases: - `test_memory_format_nn_BatchNorm2d_eval_mode_mps_float32` - `test_memory_format_nn_BatchNorm2d_eval_mode_mps_float32` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94760 Approved by: https://github.com/kulinseth	2023-02-13 23:31:10 +00:00
OwenPendrighElliott	840fb74ec8	86990 range mps support (#91075 ) Fixes #86990 - Added range_mps_out to RangeFactories.mm - Updated native_functions.yaml - Added tests in test_mps.py I did observe that despite [the documentation for torch.range](https://pytorch.org/docs/stable/generated/torch.range.html), the existing implementations do not adjust their return type based on the arguments passed to them. The MPS implementation provided here behaves the same way as the existing CPU and CUDA implementations in this regard, hence the conversion to float32 in the test cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91075 Approved by: https://github.com/kulinseth, https://github.com/DenisVieriu97	2023-02-13 23:19:10 +00:00
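`torch.range` is documented as including `end`, producing floor((end - start) / step) + 1 values, whereas `torch.arange` excludes `end`. A plain-Python sketch of that element count; the function names are hypothetical, and the sketch always returns floats, ignoring the dtype question discussed in the commit.

```python
import math


def range_inclusive(start: float, end: float, step: float = 1.0):
    """start, start + step, ... up to and including end (torch.range-style)."""
    n = math.floor((end - start) / step) + 1
    return [start + i * step for i in range(n)]


def arange_exclusive(start: float, end: float, step: float = 1.0):
    """start, start + step, ... stopping before end (torch.arange-style)."""
    n = math.ceil((end - start) / step)
    return [start + i * step for i in range(n)]


print(range_inclusive(1, 4))   # [1.0, 2.0, 3.0, 4.0]
print(arange_exclusive(1, 4))  # [1.0, 2.0, 3.0]
```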
Ramin Azarmehr	b57e6fdb50	[MPS] Enable Memory Leak Detection for test_mps.py (#94646 ) - To check for Memory Leaks in `test_mps.py`, set the env-variable `PYTORCH_TEST_MPS_MEM_LEAK_CHECK=1` when running test_mps.py (used CUDA code as reference). - Added support for the following new python interfaces in MPS module: `torch.mps.[empty_cache(), set_per_process_memory_fraction(), current_allocated_memory(), driver_allocated_memory()]` - Renamed `_is_mps_on_macos_13_or_newer()` to `_mps_is_on_macos_13_or_newer()`, and `_is_mps_available()` to `_mps_is_available()` to be consistent in naming with prefix `_mps`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94646 Approved by: https://github.com/malfet	2023-02-13 17:56:24 +00:00
Kulin Seth	18587cb31f	[MPS] Add sort and argSort Op. (#94697 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/94697 Approved by: https://github.com/DenisVieriu97	2023-02-13 01:03:22 +00:00
Xuehai Pan	046e88a291	[BE] [3/3] Rewrite `super()` calls in test (#94592 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-12 22:20:53 +00:00
Ramin Azarmehr	bdd8f518d7	[MPS] Add Python Module Bindings for the MPS backend (#94417 ) - This PR is a prerequisite for the upcoming Memory Leak Detection PR. - Enable global manual seeding via `torch.manual_seed()` + test case - Add `torch.mps.synchronize()` to wait for MPS stream to finish + test case - Enable the following python interfaces for MPS: `torch.mps.[get_rng_state(), set_rng_state(), synchronize(), manual_seed(), seed()]` - Added some test cases in test_mps.py - Added `mps.rst` to document the `torch.mps` module. - Fixed the failure with `test_public_bindings.py` Description of new files added: - `torch/csrc/mps/Module.cpp`: implements `torch._C` module functions for `torch.mps` and `torch.backends.mps`. - `torch/mps/__init__.py`: implements Python bindings for `torch.mps` module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94417 Approved by: https://github.com/albanD	2023-02-12 21:22:30 +00:00
Henry Cheng	fe0c7fbcf8	[MPS] Add repeat_interleave to MPS (#88649 ) Fixes #87219 Implements a new ``repeat_interleave`` function in ``aten/src/ATen/native/mps/operations/Repeat.mm`` Adds it to ``aten/src/ATen/native/native_functions.yaml`` Adds a new test ``test_repeat_interleave`` to ``test/test_mps.py`` Pull Request resolved: https://github.com/pytorch/pytorch/pull/88649 Approved by: https://github.com/kulinseth	2023-02-12 08:43:55 +00:00
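`repeat_interleave` repeats each element in place (`[1, 2]` with 2 repeats becomes `[1, 1, 2, 2]`), unlike `repeat`, which tiles the whole sequence. A 1-D plain-Python sketch of those semantics with either a scalar or a per-element `repeats`; it is an illustration, not the MPS implementation in `Repeat.mm`.

```python
from typing import List, Sequence, Union


def repeat_interleave(xs: Sequence[int], repeats: Union[int, Sequence[int]]) -> List[int]:
    """Repeat xs[i] repeats[i] times, keeping the copies of each element adjacent."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(xs)
    if len(repeats) != len(xs):
        raise ValueError("repeats must be an int or have one entry per element")
    out: List[int] = []
    for x, r in zip(xs, repeats):
        if r < 0:
            raise ValueError("repeats must be non-negative")
        out.extend([x] * r)
    return out


print(repeat_interleave([1, 2, 3], 2))    # [1, 1, 2, 2, 3, 3]
print(repeat_interleave([1, 2], [1, 3]))  # [1, 2, 2, 2]
```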
Denis Vieriu	b794fd19c5	[MPS] Add scatter gather kernels (support up to 5 dimensions) (#94663 ) Add scatter gather kernels (support up to 5 dimensions) - Fixes int64 issues for `mH`, `mT`, `T`, `H` on Monterey Pull Request resolved: https://github.com/pytorch/pytorch/pull/94663 Approved by: https://github.com/kulinseth	2023-02-12 08:17:26 +00:00
Kulin Seth	54c0f37646	[MPS] Add support for TopK k>16 (#94639 ) Fixes: https://github.com/pytorch/pytorch/issues/78915 * Add the topk>16 support Pull Request resolved: https://github.com/pytorch/pytorch/pull/94639 Approved by: https://github.com/DenisVieriu97	2023-02-12 00:57:53 +00:00


345 Commits