pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Li-Huai (Allan) Lin	ac1e85161e	[MPS] Fix nll_loss with default ignore_index (#109574 ) `-100` should be a valid `ignore_index` as indicated in the linked issue. This PR also cleans up some unnecessary MPSTensor copies. Fixes #108148 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109574 Approved by: https://github.com/kulinseth ghstack dependencies: #109557	2023-09-26 04:13:09 +00:00
Li-Huai (Allan) Lin	0087118997	[MPS] Fix mps to cpu copy with storage offset (#109557 ) Fix #108978 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109557 Approved by: https://github.com/DenisVieriu97	2023-09-26 04:13:08 +00:00
CaoE	7c9052165a	add fp16 support for native conv and deconv on CPU (#99497 ) ### Testing Native conv vs. mkldnn conv on SPR (with avx512_fp16 support) Single core: Input \| Naïve impl / us \| oneDNN / us \| Speed up -- \| -- \| -- \| -- IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 \| 34676789 \| 524199.8 \| 66.15185 IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 \| 33454125 \| 349844.4 \| 95.62573 IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 \| 317650.1 \| 2317.677 \| 137.0554 IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 \| 15334.68 \| 167.264 \| 91.67952 56 cores: Input \| Naïve impl / us \| oneDNN / us \| Speed up -- \| -- \| -- \| -- IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 \| 1032064 \| 11073.58 \| 93.20061 IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 \| 1000097 \| 16371.19 \| 61.08883 IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 \| 981813.4 \| 9008.908 \| 108.9825 IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 \| 1082606 \| 10150.47 \| 106.6558 IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 \| 319980.6 \| 181.598 \| 1762.027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch	2023-09-25 01:31:26 +00:00
igm503	255d1a776a	[MPS] Add support for Mish to MPS backend (#109786 ) Fixes [#ISSUE_NUMBER](https://github.com/pytorch/pytorch/issues/77764#issuecomment-1712894444) Adds the mish activation function to the mps backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109786 Approved by: https://github.com/kulinseth	2023-09-21 21:01:20 +00:00
igm503	0317626df5	[MPS] adding weight_norm_interface support for mps (#108008 ) Fixes #104513 Adds support for aten::_weight_norm_interface to the mps backend. Also adds a consistency test for the output and the grad. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108008 Approved by: https://github.com/kulinseth	2023-09-20 02:18:28 +00:00
CaoE	54c28c564f	add Half support for BatchNorm on CPU (#102070 ) Fixes #106543 ### Testing Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.7116 \| 0.1427 \| 0.1744 \| 0.2638 \| 0.2002 \| 0.2556 (1, 32, 100, 100) \| 0.8579 \| 0.1725 \| 0.2077 \| 0.3023 \| 0.2399 \| 0.2995 (32, 16, 200, 200) \| 57.3466 \| 12.2179 \| 13.1320 \| 45.9524 \| 24.1526 \| 24.9882 28 cores: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.2571 \| 0.0713 \| 0.0846 \| 0.1140 \| 0.0883 \| 0.1043 (1, 32, 100, 100) \| 0.1077 \| 0.0510 \| 0.0548 \| 0.0700 \| 0.0645 \| 0.0713 (32, 16, 200, 200) \| 5.5060 \| 1.4195 \| 1.4663 \| 6.773 \| 3.0886 \| 3.1343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/mingfeima	2023-09-19 10:43:33 +00:00
PyTorch MergeBot	be9f73f031	Revert "Add meta and OpInfo for _embedding_bag_dense_backward (#109211 )" This reverts commit `fe14e43d14`. Reverted https://github.com/pytorch/pytorch/pull/109211 on behalf of https://github.com/clee2000 due to Sorry I think the test_ops.py::TestCommonCUDA::test_compare_cpu__embedding_bag_dense_backward_cuda_float32 is failing `492a93d185` https://github.com/pytorch/pytorch/actions/runs/6190707847/job/16808644559 not sure why this is run in slow when it looks to be a new test ([comment](https://github.com/pytorch/pytorch/pull/109211#issuecomment-1720235918))	2023-09-14 22:29:12 +00:00
Edward Z. Yang	fe14e43d14	Add meta and OpInfo for _embedding_bag_dense_backward (#109211 ) The sample inputs is a bit involved because there are a lot of shenanigans in the derivative formula. Check comments. This is exercised in vdd, internal test `buck2 run '@fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup'` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/109211 Approved by: https://github.com/albanD, https://github.com/zou3519	2023-09-14 18:49:32 +00:00
PyTorch MergeBot	b226373d16	Revert "add Half support for BatchNorm on CPU (#102070 )" This reverts commit `b6a1d3fb97`. Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to I'm very sorry but it looks like #106543 was not fixed, I still see it failing on main `b6a1d3fb97` https://github.com/pytorch/pytorch/actions/runs/6185704949/job/16793975677 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1719747065))	2023-09-14 16:13:34 +00:00
CaoE	b6a1d3fb97	add Half support for BatchNorm on CPU (#102070 ) Fixes #106543 ### Testing Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.7116 \| 0.1427 \| 0.1744 \| 0.2638 \| 0.2002 \| 0.2556 (1, 32, 100, 100) \| 0.8579 \| 0.1725 \| 0.2077 \| 0.3023 \| 0.2399 \| 0.2995 (32, 16, 200, 200) \| 57.3466 \| 12.2179 \| 13.1320 \| 45.9524 \| 24.1526 \| 24.9882 28 cores: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.2571 \| 0.0713 \| 0.0846 \| 0.1140 \| 0.0883 \| 0.1043 (1, 32, 100, 100) \| 0.1077 \| 0.0510 \| 0.0548 \| 0.0700 \| 0.0645 \| 0.0713 (32, 16, 200, 200) \| 5.5060 \| 1.4195 \| 1.4663 \| 6.773 \| 3.0886 \| 3.1343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-09-14 12:23:59 +00:00
PyTorch MergeBot	04a765f95d	Revert "add Half support for BatchNorm on CPU (#102070 )" This reverts commit `6065e7a97c`. Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to sorry it looks like this is causing an unexpected success for `test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_nn_functional_batch_norm_cpu_float16` `6065e7a97c` https://github.com/pytorch/pytorch/actions/runs/6178069462/job/16770849782 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1718402208))	2023-09-13 22:38:42 +00:00
Nikita Shulga	916183a012	[MPS] Fix crash if nonzero is called concurrently (#108996 ) Surrounds the `stream->synchronize()` call with `dispatch_sync(stream->queue(), ^{});`, which is a no-op for a single-threaded program but serializes calls to `synchronize` across threads that use the same stream. Prevents the non-recoverable `[IOGPUMetalCommandBuffer validate]:215: failed assertion 'commit an already committed command buffer'` exception, which is triggered every time PyCharm is used to inspect tensors on an MPS device. Fixes https://github.com/pytorch/pytorch/issues/100285 ### 🤖 Generated by Copilot at 1662ce2 > _Sing, O Muse, of the swift and skillful coders_ > _Who fixed the dreadful deadlock of the stream_ > _That crashed the mighty tensors of the MPS_ > _When they sought out the nonzero elements._ Pull Request resolved: https://github.com/pytorch/pytorch/pull/108996 Approved by: https://github.com/kulinseth	2023-09-13 19:28:47 +00:00
CaoE	6065e7a97c	add Half support for BatchNorm on CPU (#102070 ) Fixes #106543 ### Testing Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.7116 \| 0.1427 \| 0.1744 \| 0.2638 \| 0.2002 \| 0.2556 (1, 32, 100, 100) \| 0.8579 \| 0.1725 \| 0.2077 \| 0.3023 \| 0.2399 \| 0.2995 (32, 16, 200, 200) \| 57.3466 \| 12.2179 \| 13.1320 \| 45.9524 \| 24.1526 \| 24.9882 28 cores: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.2571 \| 0.0713 \| 0.0846 \| 0.1140 \| 0.0883 \| 0.1043 (1, 32, 100, 100) \| 0.1077 \| 0.0510 \| 0.0548 \| 0.0700 \| 0.0645 \| 0.0713 (32, 16, 200, 200) \| 5.5060 \| 1.4195 \| 1.4663 \| 6.773 \| 3.0886 \| 3.1343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-09-13 17:30:16 +00:00
igm503	1b9b3a2d15	[MPS] Adding lgamma, digamma, and polygamma implementations (#106292 ) Fixes issue mentioned in #77764 e.g. https://github.com/pytorch/pytorch/issues/77764#issuecomment-1654111744 Adds MPS support for the following ops: - lgamma - mvlgamma - digamma - polygamma The lgamma function does not yet have an MPS backend implementation. I've added one using a custom metal kernel (following John D. Cook's c++ implementation of the log gamma function: https://www.johndcook.com/blog/cpp_gamma/). For the backward pass op, I've added a digamma kernel that follows the cpu+cuda digamma implementation, and for the backward pass of the digamma op, I've added a polygamma + trigamma kernel following, again, the cpu+cuda implementations. NOTE: The cpu implementation of the polygamma function incorrectly (as far as I can tell) outputs a finite number for order = 1 and x in the negative integers. The mps implementation correctly outputs infinite. (see https://github.com/pytorch/pytorch/issues/106692) The polygamma tests currently don't pass because of the error in the cpu+cuda kernels, but also because there are smallish discrepancies near the negative integers between the cpu+cuda and the mps polygamma and trigamma kernels. I'm not sure exactly why this is, but let me know if the discrepancies are too big. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106292 Approved by: https://github.com/kulinseth	2023-09-12 16:43:37 +00:00
Li-Huai (Allan) Lin	293d3b89d8	Add Opinfos for the Tensor overload of linspace/logspace (#107958 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107958 Approved by: https://github.com/zou3519	2023-09-11 22:30:19 +00:00
Nikita Shulga	9b12a28d89	[MPS] Implement `mul` operation for complex types (#108395 ) Using existing BinaryKernel template Add `mul` as well as `kron` and `outer` to list of MPS ops that support complex types This should add all the missing ops mentioned in https://github.com/pytorch/pytorch/issues/105665 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108395 Approved by: https://github.com/albanD ghstack dependencies: #108393, #108394	2023-09-10 05:39:12 +00:00
Nikita Shulga	c7bb842d35	[MPS] Add complex `add`/`sub` (#108394 ) Uses `view_as_real` and runs elementwise ops on the resulting tensors. Adds `add` and `sub` to the list of complex ops that should work on MPS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108394 Approved by: https://github.com/albanD ghstack dependencies: #108393	2023-09-10 05:39:12 +00:00
Nikita Shulga	53a4ca4b58	[MPS][BE] Add `dispatch_sync_with_rethrow` (#108393 ) And enable testing for match_output for complex types. Most of them should throw an "unsupported XYZ" error, rather than crash. This fixed several crashes when linalg ops were invoked with complex inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108393 Approved by: https://github.com/kit1980, https://github.com/kulinseth	2023-09-10 02:07:12 +00:00
alexdremov	b60273b88a	[MPS] Pixel shuffle unshuffle support (#99306 ) Fixes #83196 Now, MPS implementation is blazingly fast. Though, I have several questions on improving this PR: 1. I copied code from `test_nn.py`. Is there a better way to test this? 2. I decided to use `usepixelshuffleorder:YES`. Am I right performance-wise? According to docs: ``` `usePixelShuffleOrder` can be used to control how the data within spatial blocks is ordered in the `depthAxis` dimension: with `usePixelShuffleOrder=YES` the values within the spatial blocks are stored contiguously within the `depthAxis` dimension whereas otherwise they are stored interleaved with existing values in the `depthAxis` dimension. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99306 Approved by: https://github.com/kulinseth, https://github.com/malfet	2023-09-06 09:11:39 +00:00
CaoE	42f94d7e9f	add Half support for maxpool on CPU (#98819 ) ### Testing Single socket (28 cores): shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig \| 4.12895 \| 6.9669 \| 5.30297 \| 0.55775 \| 1.98917 \| 0.72233 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL \| 0.85093 \| 1.88813 \| 1.38063 \| 5.5742 \| 36.5086 \| 10.58552 size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: contig \| 22.37212 \| 37.90383 \| 30.94482 \| 6.85868 \| 10.6116 \| 3.9993 size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: CL \| 5.41658 \| 4.71098 \| 4.66578 \| 6.69875 \| 14.7171 \| 5.1167 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig \| 10.69831 \| 18.0468 \| 13.71657 \| 2.61192 \| 4.96172 \| 1.68635 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL \| 2.52637 \| 2.0096 \| 2.0055 \| 2.60314 \| 7.2093 \| 2.49843 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig \| 0.47605 \| 0.88398 \| 0.65326 \| 0.06525 \| 0.115489 \| 0.0674 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d \| 0.10902 \| 0.25293 \| 0.157475 \| 0.11386 \| 0.53319 \| 0.17836 Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: contig \| 90.9809 \| 163.473 \| 126.1276 \| 6.57721 \| 41.40833 \| 11.82505 size: (1, 56, 264, 264), kernel: 3, stride: 1, mem_format: CL \| 9.88405 \| 38.39137 \| 29.62069 \| 7.10636 \| 36.97535 \| 11.0525 size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: contig \| 476.782 \| 855.4769 \| 648.2248 \| 46.6488 \| 219.2586 \| 67.10599 size: (32, 16, 200, 200), kernel: 3, stride: 1, mem_format: CL \| 80.29271 \| 
91.33854 \| 87.80345 \| 48.81692 \| 203.9974 \| 63.39004 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: contig \| 235.2113 \| 419.0799 \| 315.4284 \| 20.6049 \| 107.1524 \| 32.39169 size: (32, 32, 100, 100), kernel: 3, stride: 1, mem_format: CL \| 29.47653 \| 33.54905 \| 32.82823 \| 22.59674 \| 98.5586 \| 30.05763 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: contig \| 7.90684 \| 13.9208 \| 10.03272 \| 0.23725 \| 1.35269 \| 0.41728 size: (4, 19, 10, 16, 16), kernel: 3, stride: 1, mem_format: CL3d \| 2.33638 \| 3.36894 \| 2.64635 \| 0.26535 \| 1.244 \| 0.38895 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98819 Approved by: https://github.com/mingfeima, https://github.com/mikaylagawarecki	2023-09-05 18:23:41 +00:00
Nikita Shulga	bae409388c	[MPS] Fix `.item()` for multi-dim scalar (#107913 ) By refactoring `_local_scalar_dense_mps` to use `_empty_like` to allocate CPU tensor. Also, print a more reasonable error message when dst dim is less than src in mps_copy_ This fixes regression introduced by https://github.com/pytorch/pytorch/pull/105617 and adds regression test. ### 🤖 Generated by Copilot at abd06e6 > _Sing, O Muse, of the valiant deeds of the PyTorch developers_ > _Who strive to improve the performance and usability of tensors_ > _And who, with skill and wisdom, fixed a bug in the MPS backend_ > _That caused confusion and dismay to many a user of `item()`_ Fixes https://github.com/pytorch/pytorch/issues/107867 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107913 Approved by: https://github.com/albanD	2023-08-31 21:08:29 +00:00
vfdev	b7624fc91e	Cleaned up test_mps.py::test_output*_match (#108092 ) Description: - cleaned up test_mps.py::test_output_match and test_mps.py::test_output_grad_match tests - removed unused variables and useless brackets - simplified atol/rtol setup if/else code Pull Request resolved: https://github.com/pytorch/pytorch/pull/108092 Approved by: https://github.com/kulinseth	2023-08-29 10:46:02 +00:00
Nikita Shulga	6e85a68829	[MPS] Implement `polar` via metal shader (#107324 ) Use `view_as_real` to cast complex into a pair of floats and then it becomes just another binary operator. Enable `polar` and `view_as_complex` consistency tests, but skip `test_output_grad_match_polar_cpu` as the `mul` operator is not yet supported. Remove redundant `#ifdef __OBJC__` and capture and re-throw exceptions captured during `createCacheBlock` block. Fixes https://github.com/pytorch/pytorch/issues/78503 TODOs(in followup PRs): - Implement backwards (requires complex mul and sgn) - Measure the perf impact of computing the strides on the fly rather than ahead of time (unrelated to this PR) Partially addresses https://github.com/pytorch/pytorch/issues/105665 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107324 Approved by: https://github.com/albanD	2023-08-25 03:16:23 +00:00
Aaron Gokaslan	660e8060ad	[BE]: Update ruff to 0.285 (#107519 ) This updates ruff to 0.285 which is faster, better, and fixes a bunch of false negatives with regard to f-strings. I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519 Approved by: https://github.com/ezyang	2023-08-22 23:16:38 +00:00
PyTorch MergeBot	d59a6864fb	Revert "[BE]: Update ruff to 0.285 (#107519 )" This reverts commit `88ab3e4322`. Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please help them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))	2023-08-22 19:53:32 +00:00
Aaron Gokaslan	88ab3e4322	[BE]: Update ruff to 0.285 (#107519 ) This updates ruff to 0.285 which is faster, better, and fixes a bunch of false negatives with regard to f-strings. I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519 Approved by: https://github.com/ezyang	2023-08-20 01:36:18 +00:00
arunppsg	4bfc55ba8b	[MPS] Enable forward test for renorm (#106666 ) Enabled forward test for renorm Pull Request resolved: https://github.com/pytorch/pytorch/pull/106666 Approved by: https://github.com/kulinseth, https://github.com/albanD	2023-08-17 16:46:06 +00:00
Jason Lu	bc88028e8e	Back out "Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 )" (#106743 ) Summary: Original commit changeset: 81319beb97f3 Original Phabricator Diff: D47961182 Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822 Reviewed By: atuljangra Differential Revision: D48131623 @diff-train-skip-merge (D48131623 landed internally) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743 Approved by: https://github.com/malfet	2023-08-08 15:27:34 +00:00
Ramin Azarmehr	cdfd0ea162	[MPS] Introduce torch.mps.Event() APIs (#102121 ) - Implement `MPSEventPool` to recycle events. - Implement python bindings with `torch.mps.Event` class using the MPSEventPool backend. The current member functions of the Event class are `record()`, `wait()`, `synchronize()`, `query()`, and `elapsed_time()`. - Add API to measure elapsed time between two event recordings. - Added documentation for Event class to `mps.rst`. - Added test case to `test_mps.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102121 Approved by: https://github.com/albanD, https://github.com/kulinseth	2023-08-08 03:45:45 +00:00
Li-Huai (Allan) Lin	d4d086ce7b	[MPS] Fix Clamp with strided outputs/inputs (#97858 ) Fixes #94396 Fixes #87348 1. If output is strided, we don't gather input tensors. 2. If output is not strided but min_t or max_t is strided, we make min_t or max_t contiguous. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97858 Approved by: https://github.com/kulinseth	2023-08-04 09:32:12 +00:00
Peter Stefek	c9c2b14c53	Fix copy_ broadcast behavior on mps (#105617 ) Fixes #105277 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105617 Approved by: https://github.com/malfet	2023-08-03 04:03:32 +00:00
PyTorch MergeBot	d83b887f2a	Revert "Add error checking for padding modules (#106147 )" This reverts commit `0547b6279d`. Reverted https://github.com/pytorch/pytorch/pull/106147 on behalf of https://github.com/jeanschmidt due to sadly it is breaking internal builds, and I can't coordinate a FF due to timezone differences ([comment](https://github.com/pytorch/pytorch/pull/106147#issuecomment-1661870970))	2023-08-02 09:37:40 +00:00
Denis Vieriu	d1a2aa1909	[MPS] Fix MPS clamp issue with different dtypes between input and min/max tensors (#105747 ) - Fix the FP16 clamp issue (FP32 and FP16 are not broadcast compatible) - Fix clamp (cached graph nodes were previously replaced with the cast version) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105747 Approved by: https://github.com/kulinseth	2023-08-02 02:51:34 +00:00
Peter Stefek	97e5055a69	Add cumprod support for device mps (#104688 ) Related to #77764 Add support for the cumprod operation (which in turn allows its gradient). This also allows us to compute the gradient of prod since it was blocked behind cumprod in the case where exactly one element of the tensor was 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104688 Approved by: https://github.com/kulinseth	2023-08-01 21:51:20 +00:00
Mikayla Gawarecki	0547b6279d	Add error checking for padding modules (#106147 ) Fixes https://github.com/pytorch/pytorch/issues/105627 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106147 Approved by: https://github.com/albanD ghstack dependencies: #106325	2023-08-01 12:49:58 +00:00
Mikayla Gawarecki	d8e5f2aa6d	Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224 Approved by: https://github.com/atalman, https://github.com/albanD	2023-07-31 17:18:56 +00:00
cyy	b8eb827d93	use UBSAN on some tests (#103655 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/103655 Approved by: https://github.com/kshitij12345, https://github.com/zou3519	2023-07-24 14:24:49 +00:00
Peter Pham	bba06ad751	[MPS] aten::erfinv metal kernel ops (#101507 ) I've added the implementation of erfinv using the algorithm from `4154c8ea15/aten/src/ATen/native/Math.h (L152)` in order for the MPS based algorithm to match the CPU automatic test. This PR is using the new metal api calls from https://github.com/pytorch/pytorch/pull/100661 Testing shows MPS has a decent speed up (270x) compared to CPU on tensor size of 100 mil elements. ``` import torch x = torch.arange(-1, 1, 1e-8) # default cpu tensor #measure CPU compute time by calling torch.erfinv time = %timeit -o -q -r 5 torch.erfinv(x) cpu_time = time.average print("CPU torch.erfinv time: ", cpu_time) x = x.to("mps") # measure MPS compute time time = %timeit -o -q -r 5 torch.erfinv(x) mps_time = time.average print("MPS torch.erfinv time: ", mps_time) print(f"MPS torch.erfinv is {cpu_time/mps_time*100} percent faster than CPU torch.erfinv") # compute MSE between MPS and CPU torch.erfinv x = x.to("cpu") y_cpu = torch.erfinv(x) x = x.to("mps") y_mps = torch.erfinv(x) y_mps = y_mps.to("cpu") mask = torch.isfinite(y_cpu) & torch.isfinite(y_mps.to("cpu")) y_mps = y_mps[mask] y_cpu = y_cpu[mask] x = x[mask] print(f"length of y_mps: {len(y_mps)}, length of y_cpu: {len(y_cpu)}, length of x: {len(x)}") mse = torch.square(y_cpu - y_mps).mean() print("MSE between MPS and CPU torch.erfinv: ", mse) diff = torch.abs(y_cpu - y_mps) print("Largest difference") print(f"x: {x[torch.argmax(diff)]}, y_cpu: {y_cpu[torch.argmax(diff)]}, y_mps: {y_mps[torch.argmax(diff)]} , diff = {y_cpu[torch.argmax(diff)] - y_mps[torch.argmax(diff)]}") ``` CPU torch.erfinv time: 2.654937833400254 MPS torch.erfinv time: 0.009831255332002912 MPS torch.erfinv is 27005.07456822776 percent faster than CPU torch.erfinv length of y_mps: 199999992, length of y_cpu: 199999992, length of x: 199999992 MSE between MPS and CPU torch.erfinv: tensor(4.2339e-14) Largest difference x: -0.9999980330467224, y_cpu: -3.363569736480713, y_mps: -3.3635685443878174 
, diff = -1.1920928955078125e-06 Fixes #https://github.com/pytorch/pytorch/issues/86808 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101507 Approved by: https://github.com/kulinseth	2023-07-23 01:36:43 +00:00
Jane Xu	803d42e457	add lerp cpu support for half (#105607 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105607 Approved by: https://github.com/albanD	2023-07-21 20:29:05 +00:00
Andrey Talman	c6653b65d8	Back out "Make adding buffers more like adding parameters (#104069 )" (#105581 ) Summary: D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/ with `TypeError: register_buffer() takes 3 positional arguments but 4 were given` Original commit changeset: d4b4069fbd38 Original Phabricator Diff: D47537831 Test Plan: ``` buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform ``` Reviewed By: atalman Differential Revision: D47600140 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581 Approved by: https://github.com/mikaylagawarecki	2023-07-20 03:39:53 +00:00
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
Peter Stefek	d2c24eca8a	Fix mps unary op issue on non densely stored tensors (#105512 ) This pr fixes a bug where non densely stored tensors were not converted to the dense tensors of the correct scalar type in the mps `unary_op` helper function Fixes https://github.com/pytorch/pytorch/issues/105284 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105512 Approved by: https://github.com/malfet	2023-07-19 03:56:38 +00:00
Nikita Shulga	8cd94e1eab	[MPS] Add lerp implementation (#105470 ) lerp.Scalar fits very well into binary op template Add a very naive implementation for `lerp.Tensor` as `add_out(self, weights.mul(end.sub(self)))` Enable `lerp` testing in `test_mps` Fixes https://github.com/pytorch/pytorch/issues/105382 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105470 Approved by: https://github.com/albanD	2023-07-18 20:01:04 +00:00
ekamiti	32d422f335	Make adding buffers more like adding parameters (#104069 ) Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the `Buffer` type can be used as a drop in replacement for `register_buffer` as it just leads to `register_buffer` being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible. Fixes #35735 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069 Approved by: https://github.com/mikaylagawarecki	2023-07-17 17:59:05 +00:00
David Radley	17250976f3	correct empty tensor mps all operation (#105218 ) Fixes #104694 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105218 Approved by: https://github.com/ezyang, https://github.com/kulinseth	2023-07-14 17:42:54 +00:00
albanD	08cbfb2a58	Avoid tensor creation and use scalar overload (#104264 ) I would expect this preserves the behavior but there might be weird edge cases? @mruberry might know? The aim is to fix https://github.com/pytorch/pytorch/pull/104254 (and make `1 ** t` capturable via cudagraph) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104264 Approved by: https://github.com/zou3519	2023-07-12 18:11:27 +00:00
Nikita Shulga	5e4ee15e85	[MPS] Fix unique flatten logic (#104938 ) Tensor must be flattened if dim is None before checking whether or not dim dimension is already None Fixes https://github.com/pytorch/pytorch/issues/104879 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104938 Approved by: https://github.com/albanD	2023-07-11 19:55:56 +00:00
soulitzer	91dcc3b272	Fix activation checkpoint for mps (#104787 ) Fixes https://github.com/pytorch/pytorch/issues/104478 Pull Request resolved: https://github.com/pytorch/pytorch/pull/104787 Approved by: https://github.com/albanD	2023-07-08 14:57:05 +00:00
Jerry Zhang	611febf6cf	[quant] Support integer implementations for max_pool2d (#104225 ) Summary: This is needed for representing quantized model in pt2 export quantization flow Test Plan: tested by opinfo, python test/test_ops.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/104225 Approved by: https://github.com/kimishpatel	2023-07-05 23:54:07 +00:00
Nikita Shulga	01e6d64dd2	[MPS] Fix unary ops over sparse-mapped tensors (#100765 ) If input tensor is backed by a sparse view, create a dense copy before running unary op, otherwise op will be applied against the wrong elements. Introduce `is_dense_in_storage` that returns true if tensor/view are mapped to a dense area in the tensor storage. Add unit test to validate the fix. Fixes https://github.com/pytorch/pytorch/issues/98074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100765 Approved by: https://github.com/albanD	2023-07-05 23:17:43 +00:00
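
The last entry above (01e6d64dd2) describes `is_dense_in_storage` only as returning true when a tensor or view maps to a dense area of its storage. As a rough illustration of that idea, and not PyTorch's actual implementation, the pure-Python sketch below enumerates the storage offsets a strided view touches and checks that they form one gap-free, non-overlapping run. The names `storage_offsets` and `is_dense_in_storage_sketch` are hypothetical, and treating an empty view as dense is an assumption.

```python
from itertools import product


def storage_offsets(sizes, strides, storage_offset=0):
    """Return every storage offset a strided view touches (hypothetical helper)."""
    return [
        storage_offset + sum(i * s for i, s in zip(index, strides))
        for index in product(*(range(n) for n in sizes))
    ]


def is_dense_in_storage_sketch(sizes, strides, storage_offset=0):
    """True when the view's elements fill one gap-free, non-overlapping run.

    Illustrative reading of "dense area in the tensor storage"; the real
    check in PyTorch may use different criteria.
    """
    offsets = storage_offsets(sizes, strides, storage_offset)
    if not offsets:
        return True  # assumption: an empty view is trivially dense
    unique = set(offsets)
    no_overlap = len(unique) == len(offsets)
    no_gaps = max(unique) - min(unique) + 1 == len(unique)
    return no_overlap and no_gaps


# A contiguous 2x3 view is dense; a view taking every other element is not.
print(is_dense_in_storage_sketch((2, 3), (3, 1)))
print(is_dense_in_storage_sketch((3,), (2,)))
```

Under this reading, a view with gaps would need a dense copy before a unary op runs on it, which is the behaviour the commit message describes. The brute-force enumeration is only meant for small shapes; it trades speed for being easy to verify by hand.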


345 Commits