Summary:
Redirect `aten._unsafe_index` to `aten.index` through a decomposition.
Also add it to the list of core decompositions.
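For illustration, a minimal sketch of what such a decomposition amounts to (the real one is registered via `@register_decomposition` in `torch/_decomp/decompositions.py`; the standalone helper name below is made up):
```python
import torch

aten = torch.ops.aten

# Hypothetical standalone version of the decomposition: _unsafe_index is index
# without the bounds-checking guarantee, so it simply forwards to aten.index.
def _unsafe_index_decomp(x, indices):
    return aten.index(x, indices)

x = torch.arange(6.0).reshape(2, 3)
idx = (torch.tensor([1, 0]), torch.tensor([2, 1]))
torch.testing.assert_close(_unsafe_index_decomp(x, idx), x[idx])
```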
Test Plan: contbuild and OSS CI (similar to D40075277)
Differential Revision: D48163393
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106814
Approved by: https://github.com/SherlockNoMad
- Enabled LSTM weight prepack in inductor.
- Added an mkldnn decomposition for LSTM which won't change for different `seq_lens`. With the previous decomposition, in the dynamic-shapes use case where `seq_lens` changes, the graph would be different.
- Extended several inductor utility functions to support `List[Tensor]` as input. Previously those functions only supported `Tensor` input.
**Update 2023-07-26:**
- https://github.com/pytorch/pytorch/pull/103851 has moved CPU weight prepacking to after AOTAutograd. Updated the support in this PR to follow the same approach (mainly in 3b207f7f1c (diff-6dffed1ade0ba3e887f9a4eafa3bfcec267ab2365b8adcb91bd391f49b3fd2e3)).
LSTM is decomposed into `aten.mkldnn_rnn_layer` per layer and per direction. The weight prepack is done at the `mkldnn_rnn_layer` level.
- Added a fix in the rnn `__getstate__` function for the case where we need to recompile an `LSTM` module.
When compiling the module, the weight tensors, which are the `named_parameters` of the module, are converted to `functional_tensor` here:
76fb72e24a/torch/nn/utils/stateless.py (L125-L128)
The forward function of LSTM will be called:
76fb72e24a/torch/_functorch/aot_autograd.py (L3379-L3381)
In the forward function, the `_flat_weights` are updated to be the same as the weights, thus becoming `functional_tensor`:
76fb72e24a/torch/nn/modules/rnn.py (L775-L778)
The weight tensors are converted back to the original tensors (which are not `functional_tensor` anymore) before exiting the `_reparametrize_module` context here:
76fb72e24a/torch/nn/utils/stateless.py (L130-L142)
But since `_flat_weights` is not in the `named_parameters` of the module, it's still `functional_tensor` ([link of the parameters that will be converted to functional and reverted back](76fb72e24a/torch/_functorch/aot_autograd.py (L3695-L3698))).
At this moment, if we need to recompile the model, `deepcopy` will be called:
76fb72e24a/torch/_dynamo/utils.py (L915-L917)
And it will report `UnImplemented` since we still have a `functional_tensor` (`_flat_weights`), triggering a graph break, which is not what we expect:
76fb72e24a/torch/_subclasses/meta_utils.py (L514)
Added a fix in `__getstate__` to update `_flat_weights` whenever the weights have changed, to fix this issue. The fix is covered by the `test_lstm_packed` UT.
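The gist of the fix, as a self-contained sketch (not the real `torch/nn/modules/rnn.py` code; `TinyRNNLike` is a made-up stand-in): refresh the cached `_flat_weights` from the live parameters in `__getstate__`, so that `deepcopy` never sees stale tensors.
```python
import copy
import torch

class TinyRNNLike(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight_ih = torch.nn.Parameter(torch.randn(4, 4))
        self.weight_hh = torch.nn.Parameter(torch.randn(4, 4))
        self._flat_weights = [self.weight_ih, self.weight_hh]

    def __getstate__(self):
        state = self.__dict__.copy()
        # Rebuild the cached list from the current parameters before serializing,
        # so deepcopy/pickle never carries over outdated (e.g. functionalized) tensors.
        state["_flat_weights"] = [self.weight_ih, self.weight_hh]
        return state

m = TinyRNNLike()
m2 = copy.deepcopy(m)
assert all(isinstance(w, torch.nn.Parameter) for w in m2._flat_weights)
```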
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103071
Approved by: https://github.com/jgong5, https://github.com/jansel
The decomposition for unfold uses `as_strided` which forces the input to be
realized. Instead, this implements it as a `GenericView` with reindexing
which removes the need to realize, though it does call `mark_reuse` in case
the input computation is expensive and the windows overlap.
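A hedged sketch of the index arithmetic behind the reindexing (inductor's actual `GenericView` builds a lazy view rather than the explicit gather below): for `unfold(dim, size, step)`, window `i` at offset `j` reads input element `i * step + j` along `dim`.
```python
import torch

def unfold_via_reindex(x: torch.Tensor, dim: int, size: int, step: int) -> torch.Tensor:
    n_windows = (x.size(dim) - size) // step + 1
    # Gather index i * step + j for every (window i, offset j) pair.
    idx = torch.arange(n_windows).unsqueeze(1) * step + torch.arange(size)
    out = x.index_select(dim, idx.reshape(-1))
    # Split the gathered dim into (window, offset) and move the offset dim last,
    # matching Tensor.unfold's output layout.
    return out.unflatten(dim, (n_windows, size)).movedim(dim + 1, -1)

x = torch.randn(3, 10)
torch.testing.assert_close(unfold_via_reindex(x, 1, 4, 2), x.unfold(1, 4, 2))
```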
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105165
Approved by: https://github.com/lezcano, https://github.com/jansel
Summary:
Add a new path in `post_grad.py` for replacing addmm + ReLU / GELU activation with the corresponding `_addmm_activation` call (with `use_gelu=False` or `True`, respectively). The replacement is done only when `max_autotune_gemm=False` and the activation is fusible.
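For reference, a hedged equivalence check for the fused op the pattern targets (assuming `aten._addmm_activation` has a kernel for the device in use):
```python
import torch

bias = torch.randn(8)
mat1 = torch.randn(4, 16)
mat2 = torch.randn(16, 8)

# addmm followed by ReLU is what the post_grad pattern rewrites into a single
# _addmm_activation call (use_gelu=False selects the ReLU epilogue).
unfused = torch.relu(torch.addmm(bias, mat1, mat2))
fused = torch.ops.aten._addmm_activation(bias, mat1, mat2)
torch.testing.assert_close(fused, unfused)
```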
Test Plan:
```
$ python test/inductor/test_pattern_matcher.py -k test_addmm_activation -v
(__main__.TestPaternMatcher.test_addmm_activation) ... /data/users/aakhundov/pytorch/torch/_inductor/compile_fx.py:128: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
Using FallbackKernel: aten._addmm_activation.default
Using FallbackKernel: aten._addmm_activation.default
/data/users/aakhundov/pytorch/torch/_dynamo/eval_frame.py:373: UserWarning: changing options to `torch.compile()` may require calling `torch._dynamo.reset()` to take effect
warnings.warn(
frames [('total', 1), ('ok', 1)]
stats [('calls_captured', 2), ('unique_graphs', 1)]
aot_autograd [('total', 1), ('ok', 1)]
inductor []
ok
----------------------------------------------------------------------
Ran 1 test in 13.415s
OK
```
Reviewers: @eellison
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104132
Approved by: https://github.com/eellison, https://github.com/jansel
This adds an expect-test that finds the set of core ATen operators by
subtracting the operators whose decompositions are in core_aten_decompositions from the
set of all operators that have decompositions and could be decomposed.
This is useful because if you add a new decomposition but forget to add it to
the list of core decompositions, it will appear in the PR diff.
Also, by going through this list I have identified some operators where the
functional variant is decomposed, but not the in-place variant, which must be an
oversight.
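Roughly, the set difference the expect-test snapshots looks like the sketch below (assuming the registries exposed by `torch._decomp` keep their current names):
```python
from torch._decomp import core_aten_decompositions, decomposition_table

core = set(core_aten_decompositions().keys())
all_with_decomp = set(decomposition_table.keys())

# Operators that have a decomposition but are not in the core list: a newly
# added decomposition that is missing from core_aten_decompositions shows up
# here, and therefore in the expect-test's diff.
print("\n".join(sorted(str(op) for op in all_with_decomp - core)))
```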
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104262
Approved by: https://github.com/lezcano
torch.bucketize takes a tensor of values, and a "boundaries" tensor, which is a sorted list of values that represent buckets. It returns the bucket that each value lies in. E.g. if values = [1, 5, 3, 6] and boundaries=[0, 2, 4, 6, 8], the output will be [1, 3, 2, 4].
The current decomposition of this op doesn't work well with dynamic shapes. It performs a binary search, which bakes the number of binary-search iterations into the graph and requires recompiling (I don't completely understand why/where this happens). I can't think of a good way to write a decomposition for this op that will work with dynamic shapes.
Use case: this op is very similar to some operations needed by jagged tensors. As a first step, I want to add a lowering for aten.bucketize and make use of opinfos. #104007
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104396
Approved by: https://github.com/Chillee
Currently these are decomposed into `as_strided`, which forces a buffer to be
realized. Instead, this lowers them into a native inductor view node and so
doesn't require any buffers to be realized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103755
Approved by: https://github.com/jansel
Currently calling the fill.Tensor overload under `torch.compile` results in a
`DataDependentOutputException` due to the `.item()` call. This instead does a
device-to-device copy, which can then be inlined into subsequent inductor kernels as
you would expect, e.g.
```python
def fn(a):
    result = torch.deg2rad(a).sin()
    return torch.empty((128, 128), device=a.device).fill_(result)
```
generates the single kernel
```python
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16384
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0))
    tmp1 = tl.broadcast_to(tmp0, [XBLOCK])
    tmp2 = 0.017453292519943295
    tmp3 = tmp1 * tmp2
    tmp4 = tl.sin(tmp3)
    tl.store(out_ptr0 + (x0), tmp4, None)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103880
Approved by: https://github.com/Chillee
Fix https://github.com/pytorch/pytorch/issues/99686. In eager mode, if the given split sizes don't meet the requirements, an error is raised, but inductor can still run. We should align inductor's behavior with eager mode; after this PR the behavior is:
```
Traceback (most recent call last):
File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1267, in run_node
return node.target(*args, **kwargs)
File "/home/xiaobing/pytorch-offical/torch/functional.py", line 189, in split
return tensor.split(split_size_or_sections, dim)
File "/home/xiaobing/pytorch-offical/torch/_tensor.py", line 804, in split
return torch._VF.split_with_sizes(self, split_size, dim)
File "/home/xiaobing/pytorch-offical/torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "/home/xiaobing/pytorch-offical/torch/_subclasses/fake_tensor.py", line 1095, in __torch_dispatch__
return self.dispatch(func, types, args, kwargs)
File "/home/xiaobing/pytorch-offical/torch/_subclasses/fake_tensor.py", line 1259, in dispatch
return decomposition_table[func](*args, **kwargs)
File "/home/xiaobing/pytorch-offical/torch/_decomp/decompositions.py", line 1102, in split_with_sizes
raise ValueError(
ValueError: Split sizes don't add up to the tensor's size in the given dimension
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1215, in get_fake_value
return wrap_fake_exception(
File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 835, in wrap_fake_exception
return fn()
File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1216, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/home/xiaobing/pytorch-offical/torch/_dynamo/utils.py", line 1279, in run_node
raise RuntimeError(
RuntimeError: Failed running call_function <function split at 0x7f45b8402ee0>(*(FakeTensor(..., size=(1, 5)), [2, 1, 1]), **{'dim': 1}):
Split sizes don't add up to the tensor's size in the given dimension
(scroll up for backtrace)
The above exception was the direct cause of the following exception:
```
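A minimal repro sketch inferred from the traceback above (the split sizes sum to 4 while the tensor has 5 elements along `dim=1`):
```python
import torch

def fn(x):
    return torch.split(x, [2, 1, 1], dim=1)

x = torch.randn(1, 5)
# fn(x)                  # eager: raises because the split sizes don't add up
# torch.compile(fn)(x)   # after this PR: fails the same way instead of silently running
```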
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99702
Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/jansel
Fixes #99446
Remove the warning, as that annoyed end-users who don't know what to do about it.
Instead, try to hold the line by preventing any decomp from being added without making
the corresponding change to inductor's fallbacks.
Note: we probably still need to better document how to update inductor's decomps;
for now it's pretty much "go ask the inductor team for advice".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99473
Approved by: https://github.com/ezyang, https://github.com/ngimel, https://github.com/jansel
The meta implementation for these `*_like` functions is wrong whenever device != "meta" (it doesn't fill the memory!).
zeros_like is special due to sparse and is fixed directly by always filling the result with zeros.
Every other one has a CompositeExplicit implementation; I went with removing their meta registrations and tweaking code to avoid infinite recursion.
I could do the same as zeros_like (and add the proper filling for each), but that would duplicate the C++ logic and make the meta registrations non-trivial. I can do it if you prefer that to removal.
test_meta works fine with these fixes; relying on CI to see if other tests break as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98160
Approved by: https://github.com/ezyang
Fixes #94841
This fixes the error messages in the following files, the same as those referenced in the linked issue. I was not able to find any additional examples, but am happy to add commits for any that I may have missed!
```
aten/src/ATen/native/Blas.cpp: "size mismatch, got ", self.size(0), ", ", mat.size(0), "x", mat.size(1), ",", vec.size(0));
torch/_decomp/decompositions.py: lambda: f"size mismatch, got {self.size(0)}x{self.size(1)},{vec.size(0)}",
```
Example output for `Blas.cpp` before:
```
size mismatch, got 3, 3x4,1
```
The new error messages have the following format:
```
aten/src/ATen/native/Blas.cpp: "size mismatch, got bias (", self.size(0), "), matrix (", mat.size(0), "x", mat.size(1), "), vector (", vec.size(0), ")");
torch/_decomp/decompositions.py: lambda: f"size mismatch, got matrix ({self.size(0)}x{self.size(1)}), vector ({vec.size(0)})",
```
Example output for `Blas.cpp` after:
```
size mismatch, got bias (3), matrix (3x4), vector (1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96863
Approved by: https://github.com/albanD
Fix https://github.com/pytorch/pytorch/issues/96042
### before
```
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=True)
__main__:1: UserWarning: An output with one or more elements was resized since it had shape [], which does not match the required output shape [1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:24.)
torch.return_types.aminmax(
min=tensor([1]),
max=tensor([1]))
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=False)
torch.return_types.aminmax(
min=tensor(1),
max=tensor(1))
```
### after
```
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=True)
torch.return_types.aminmax(
min=tensor(1),
max=tensor(1))
>>> torch.aminmax(torch.tensor(1, device='cpu'), dim=0, keepdim=False)
torch.return_types.aminmax(
min=tensor(1),
max=tensor(1))
```
Marked the following test as expected_fail:
`test_vmap.py TestVmapOperatorsOpInfoCPU.test_op_has_batch_rule_aminmax_cpu_float32`
Given an input of shape (2,), the loop output has shape (2,), while the batched vmap output has shape (2, 1), which mismatches.
The loop path runs the op twice on a tensor of shape (): without this patch, each output has shape (1,) and they are stacked into (2, 1); with this patch, each output has shape () and they are stacked into (2,).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96171
Approved by: https://github.com/jgong5, https://github.com/ngimel, https://github.com/zou3519
Fixes#95794
This is a hotfix for the decomposition only (which is currently used by inductor); the reference still accesses invalid indices. Perhaps `_nll_loss_nd` and this decomp should be unified, cc @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95833
Approved by: https://github.com/lezcano, https://github.com/Chillee
Add an _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and that translates to Triton-based mm templates under max autotune. This is a very useful first step towards better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.
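A hedged usage sketch of the new primitive (assuming a CUDA device, since the kernel binds cuBLAS int8 gemm; the exact size/alignment constraints are not spelled out here):
```python
import torch

if torch.cuda.is_available():
    a = torch.randint(-128, 127, (17, 32), dtype=torch.int8, device="cuda")
    b = torch.randint(-128, 127, (32, 24), dtype=torch.int8, device="cuda")
    c = torch._int_mm(a, b)  # int8 @ int8 -> int32
    assert c.dtype == torch.int32 and c.shape == (17, 24)
```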
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
…g scalars
Fixes #93784, #93225
Ideally, the clamp decomp should live in refs or _decomp, but that would reverse our current decomposition flow of `clamp_min` -> `clamp` -> lowering, so to keep changes to a minimum, I'm leaving it in inductor for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94157
Approved by: https://github.com/ezyang
This allows unet to be compiled with symbolic shapes (but it still fails accuracy, lol).
Output sizes are always integers; there's no need to pretend they are ever float. Recomputing scale factors still used nominally float sizes converted to int, so we might as well do it from the start.
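A hedged sketch of the size arithmetic involved (assuming the usual `floor(input_size * scale_factor)` rule for upsample output sizes): the result is an integer from the start, so no float sizes need to flow through the decomposition.
```python
import math

def upsample_output_size(input_size: int, scale_factor: float) -> int:
    # The output spatial size is integral by construction.
    return int(math.floor(input_size * scale_factor))

assert upsample_output_size(17, 2.0) == 34
assert upsample_output_size(10, 1.5) == 15
```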
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94123
Approved by: https://github.com/ezyang
`isposinf` and `isneginf` currently fall back in inductor. Here, I
enable the existing decompositions to work with inductor.
`isinf` can also be written with aten functions, however I don't add
it to inductor's decompositions because `isinf` is lowered to
`tl.libdevice.isinf` in triton.
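A hedged sketch of what those decompositions boil down to (the actual refs live in `torch._refs`): both ops reduce to a comparison against +/- infinity.
```python
import torch

def isposinf_decomp(x: torch.Tensor) -> torch.Tensor:
    return x == float("inf")

def isneginf_decomp(x: torch.Tensor) -> torch.Tensor:
    return x == float("-inf")

x = torch.tensor([1.0, float("inf"), float("-inf"), float("nan")])
assert torch.equal(isposinf_decomp(x), torch.isposinf(x))
assert torch.equal(isneginf_decomp(x), torch.isneginf(x))
```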
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93951
Approved by: https://github.com/lezcano
This removes the now-redundant `_squeeze_multiple` helpers and instead decomposes into a single call to `aten::squeeze.dims` which also has the effect of reducing the lowered graph size in inductor.
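A small sketch of the simplification (assuming the `aten::squeeze.dims` overload is callable as below): a chain of single-dim squeezes collapses into one call.
```python
import torch

x = torch.randn(1, 3, 1, 4)
chained = x.squeeze(0).squeeze(1)                # roughly what _squeeze_multiple emitted
single = torch.ops.aten.squeeze.dims(x, [0, 2])  # one squeeze over both dims
torch.testing.assert_close(chained, single)
```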
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91602
Approved by: https://github.com/ngimel