pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Peter Bell	7ecbbc40c3	[HOP][inductor] Add higher order associative scan operator (#119430) Currently only supports single tensor scans, e.g. `cumsum`, `cumprod`, `logcumsumexp`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119430 Approved by: https://github.com/Chillee	2024-04-23 14:40:13 +00:00
Gao Tianlin	aaef246c74	remove log2 decomposition; add log2 lowering (#123112) Same reason as `log10`. `log2` is a core aten op, so we should not decompose it. As https://github.com/pytorch/pytorch/pull/110882 suggested, it often maps to a hardware intrinsic. Furthermore, decomposing it will negatively impact the numerical precision of the output. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123112 Approved by: https://github.com/peterbell10	2024-04-02 16:16:26 +00:00
Catherine Lee	f9b2ffa7c4	Forward fix lint after #119727 (#123137) After #119727 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123137 Approved by: https://github.com/albanD	2024-04-02 09:35:20 +00:00
Peter Bell	09c72eaa3f	[inductor] Remove identity from ops.scan (#119727) Currently scan has an `init` argument which must be the identity of the combine function. This isn't strictly necessary if we are more careful about keeping track of the first element and avoid combining it with anything. This does additionally require that there are no active load masks, since we can't do the `where_cond` any more. However, active load masks shouldn't be possible anyway, since scans are always realized and only fused via the scheduler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119727 Approved by: https://github.com/lezcano	2024-04-01 22:47:26 +00:00
Peter Bell	03439d4c1c	[inductor] Lower divide by constant as multiplication by reciprocal (#121924) Fixes #101039 This lowers division by a constant value to multiplication by its reciprocal. The same optimization is applied in eager mode on CUDA: `0636c11811/aten/src/ATen/native/cuda/BinaryDivTrueKernel.cu (L36-L38)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121924 Approved by: https://github.com/lezcano	2024-04-01 14:37:37 +00:00
Jiong Gong	367ec62ae3	[inductor][cpp] generalize vector mask for dtypes (#119654) Vectorized boolean values in CPU Inductor were modeled with `Vectorized<float>`, which cannot work for operations with other data types. This PR generalizes it with the new `VecMask` template class, which can work for masks on any vectorized data type. The intrinsics implementations in `cpp_prefix.h` for mask conversion, cast and masked load are now implemented as specializations for `VecMask` and moved to the corresponding header files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119654 Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel	2024-03-27 05:33:53 +00:00
Isuru Fernando	409b1a6081	Add lowering for cummax, cummin (#120429) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120429 Approved by: https://github.com/peterbell10	2024-03-15 19:04:38 +00:00
Isuru Fernando	b7df3bba62	add decomposition for frexp (#119217) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119217 Approved by: https://github.com/peterbell10 ghstack dependencies: #119284, #120027	2024-02-23 21:52:42 +00:00
Peter Bell	98fd23cccc	[EASY] Move OpsHandler and MockHandler to their own file (#119851) Pull Request resolved: https://github.com/pytorch/pytorch/pull/119851 Approved by: https://github.com/lezcano ghstack dependencies: #119728	2024-02-15 18:54:41 +00:00

9 Commits
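
The message of commit 09c72eaa3f describes dropping the `init` (identity) argument from scan by tracking the first element and never combining it with anything. The following is a minimal plain-Python sketch of that idea only, not the Inductor implementation: the function names `scan_with_init` and `scan_without_init` are hypothetical, and the real code operates on generated kernels rather than Python lists.

```python
import operator
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def scan_with_init(combine: Callable[[T, T], T], init: T, xs: Iterable[T]) -> List[T]:
    """Inclusive scan that seeds the accumulator with `init`.

    `init` must be the identity of `combine`, otherwise the results are wrong.
    (Hypothetical helper, for illustration only.)
    """
    out: List[T] = []
    acc = init
    for x in xs:
        acc = combine(acc, x)
        out.append(acc)
    return out


def scan_without_init(combine: Callable[[T, T], T], xs: Iterable[T]) -> List[T]:
    """Inclusive scan that needs no identity value.

    The first element is copied through unchanged; `combine` is only called
    from the second element onward. (Hypothetical helper, for illustration only.)
    """
    out: List[T] = []
    acc = None
    is_first = True
    for x in xs:
        if is_first:
            acc = x  # first element: nothing to combine with
            is_first = False
        else:
            acc = combine(acc, x)
        out.append(acc)
    return out


# Both variants agree whenever a correct identity is supplied.
data = [1, 2, 3, 4]
same = scan_with_init(operator.add, 0, data) == scan_without_init(operator.add, data)
```

The second variant also works for combine functions such as `max`, where a suitable identity depends on the data type, which is one plausible reason the identity-free formulation is convenient.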