Commit Graph

61 Commits

Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr>, so we don't need
to dynamically allocate it at all: it's cheap to pass by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
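A rough sketch of the idea, assuming `BufPtr`/`StmtPtr` are shared-pointer aliases as in the rest of this stack (the real class has more members):

```
#include <memory>

class Buf;   // forward declarations for the sketch
class Stmt;
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

// A value-type Tensor is just two shared-pointer-sized handles, so copying
// and passing it by value is cheap and no arena allocation is needed.
class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  BufPtr buf() const { return buf_; }     // the storage the tensor writes into
  StmtPtr stmt() const { return stmt_; }  // the statement that computes it
 private:
  BufPtr buf_;
  StmtPtr stmt_;
};
```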

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Mikhail Zolotukhin
dd96c26066 [TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.
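A minimal sketch of why the spelling change pays off; the alias name matches the PR title, but the staging shown here is illustrative:

```
#include <memory>

class Expr;

// Stage 1 (this PR): call sites say ExprPtr, but it is still a raw pointer.
using ExprPtr = Expr*;

// Stage 2 (the later memory-model switch): only this alias has to change,
// not every source file that mentions an expression pointer.
// using ExprPtr = std::shared_ptr<Expr>;
```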

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
2021-08-24 00:30:49 -07:00
Mikhail Zolotukhin
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`
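A rough sketch of helpers matching the renames above, assuming a shared-pointer node model; the actual NNC definitions may differ:

```
#include <memory>
#include <utility>

template <class T>
using NodePtr = std::shared_ptr<T>;

// Replaces `new T(...)`.
template <class T, class... Args>
NodePtr<T> alloc(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}

// Replaces `dynamic_cast<To*>`: returns null on mismatch, like dynamic_cast.
template <class To, class From>
NodePtr<To> to(const NodePtr<From>& p) {
  return std::dynamic_pointer_cast<To>(p);
}

// Replaces `static_cast<To*>`: an unchecked downcast.
template <class To, class From>
NodePtr<To> static_to(const NodePtr<From>& p) {
  return std::static_pointer_cast<To>(p);
}
```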

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
Bert Maher
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While it's true that they
let us observe whether fusion groups were created, we already have assertions
based on the shape of the graph, and I'm not sure that I trust those any less
than these counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
Raghavan Raman
30e24b2d2b [nnc] Modified vectorize API to return bool (#59422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59422

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D28886980

Pulled By: navahgar

fbshipit-source-id: 58cc3ecd86564a312a132f8260d836b096505095
2021-06-11 12:02:19 -07:00
Bert Maher
617b74aa35 [nnc] LLVMCodeGen for any target (#58713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58713

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28585722

Pulled By: bertmaher

fbshipit-source-id: 82885b9780dc1a8610660a90969d8d2baad97920
2021-05-27 09:25:15 -07:00
Raghavan Raman
dd7bbe1a63 [NNC] Make splitWithMask transform in-place (#58269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58269

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427227

Pulled By: navahgar

fbshipit-source-id: 4e38a436abcf4752fd7ef6ab3666876eec6ea5ba
2021-05-25 11:32:51 -07:00
Mikhail Zolotukhin
c751e53800 [TensorExpr] Implement 'call_raw' in IREval. (#57882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57882

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28306752

Pulled By: ZolotukhinM

fbshipit-source-id: 11d0034f9bfbadf8483de90c457f952a2161f10b
2021-05-12 14:08:18 -07:00
Mikhail Zolotukhin
0bf69278f7 Reland: [TensorExpr] Add CodeGen::call_raw method. (#57551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57551

The new method allows passing input and output arguments as `void*`
pointers instead of CallArgs, which helps to reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.

Relanding #55113 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.
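An illustrative mock of the two invocation paths (not the real CodeGen API; the type contents here are assumptions):

```
#include <vector>

// Each CallArg carries wrapping/bookkeeping that must be unpacked on every call.
struct CallArg {
  void* ptr;  // plus dtype/scalar handling in the real implementation
};

struct CodeGenLike {
  // CallArg path: unpacks each argument before jumping to the kernel.
  void call(const std::vector<CallArg>& args);
  // call_raw path: bare pointers go straight through, so per-call overhead is lower.
  void call_raw(const std::vector<void*>& args);
};
```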

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195049

Pulled By: ZolotukhinM

fbshipit-source-id: 035b77ae996dbbcd542b4b0e4c011b41e8d7828b
2021-05-05 09:10:25 -07:00
Mike Ruberry
05b255c543 Revert D27487549: [TensorExpr] Add CodeGen::call_raw method.
Test Plan: revert-hammer

Differential Revision:
D27487549 (c9ab384af7)

Original commit changeset: d8f3d92262cd

fbshipit-source-id: ea8e71dbe2d632bc0fb557362c8bd899eb6aa83a
2021-05-01 19:48:07 -07:00
Mikhail Zolotukhin
c9ab384af7 [TensorExpr] Add CodeGen::call_raw method. (#55113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55113

The new method allows passing input and output arguments as `void*`
pointers instead of CallArgs, which helps to reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.

Differential Revision: D27487549

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: d8f3d92262cde1c155beefb629454370d9af2f89
2021-04-30 15:24:37 -07:00
Mikhail Zolotukhin
7ab654afd7 [TensorExpr] Rename Tensor::call to Tensor::load to be consistent with Buf and Placeholder. (#55826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826

It's a mechanical change.

Differential Revision: D27717777

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51
2021-04-13 12:08:53 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask was never used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). This PR
removes it and cleans up all its traces from the tests.
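For illustration, the difference in IR shape; this is a hand-written sketch in the pseudocode style used elsewhere in these messages, not the output of any particular pass:

```
// With a mask field (removed): Store(A, {i}, value, mask)
// With an explicit construct instead, a partially-filled vector tail looks like:
for (int i = 0; i < 8; i++) {
  A[start + i] = IfThenElse(start + i < n, B[start + i], A[start + i]);
}
```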

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Mikhail Zolotukhin
b01a15d3d3 [TensorExpr] Redesign Rfactor loopnest transformation. (#55324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324

With this change `rfactor` only affects the given loop and its body,
never touching anything outside (that was the root cause of a bug in the
previous implementation). Also, there is no `insertion_point` parameter
anymore: its meaning was vague, and its effect should be achievable with
other transformations anyway.

The new `rfactor` semantics are as follows:

```
Requirements:
 * S is the reduction store
 * S is the only statement in the innermost loop
 * There are at least two reduction arguments in S
 * OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable
 used in the store and all other reduction variables are index variables of
 children loops of OUTER_REDUCTION_FOR
 * OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops
 corresponding to the other reduction variables and the store, nested into
 each other

What it does:
  * Introduce a new buffer with an extra dimension of a size equal to the
  span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via
  RFAC_BUF_PTR)
  * Insert an initialization store for the new buffer in
  OUTER_REDUCTION_FOR before its nested loop
  * Replace the reduction store to the original buffer with the reduction
  store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR
  from reduction arguments
  * Insert a final reduction store over the extra dimension of the new
  buffer to the original buffer
  * Returns TRUE if the transformation succeeded and FALSE otherwise

Example:
Original IR:
S1: for i        # normal axis
S2:   X[i] = 0
S3:   for j      # reduction axis
S4:     for k    # reduction axis
S5:       X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k})

After RFACTOR(S5, S3)
S1: for i               # normal axis
S2:   X[i] = 0
S3:   for j             # reduction axis for X, normal axis for X_rfac
        X_rfac[i,j] = 0
S4:     for k           # reduction axis
          X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k})
        X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j})
```

Differential Revision: D27694960

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c
2021-04-13 12:08:48 -07:00
Mikhail Zolotukhin
688e350725 [TensorExpr] Nuke DepTracker and findAllNeededTensors. (#54997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997

DepTracker was used to automatically pull in dependent computations from
output ones. While it seemed quite convenient, it led to several
architectural issues, which this stack fixes.

DepTracker worked on Tensors, each of which is a pair of a Buf and a
Stmt. However, the Stmt could become stale with no reliable way to
update the corresponding tensor. We now use Bufs and Stmts directly and
are moving away from Tensors to avoid these problems.

Removing DepTracker allowed us to unify Loads and FunctionCalls, which
were essentially duplicates of each other.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27446414

Pulled By: ZolotukhinM

fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
2021-04-01 19:46:26 -07:00
Xiaoqiang Zheng
9f86b656ba Resubmit: Adding parallel support for the LLVM backend. (#54122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54122

Test Plan:
* USE_TBB=1 ATEN_THREADING=TBB python setup.py develop --cmake
* USE_TBB=1 ATEN_THREADING=NATIVE python setup.py develop --cmake
* USE_TBB=1 ATEN_THREADING=OMP python setup.py develop --cmake
* cd build; ninja bin/tensorexpr_bench
* bin/test_tensorexpr --gtest_filter="*Parallel*"

Reviewed By: bertmaher

Differential Revision: D27109802

Pulled By: zheng-xq

fbshipit-source-id: db159466d0b46357bcf0fbefb36094bee312368c
2021-03-18 07:19:37 -07:00
Nikita Shulga
d57ae6c46d Revert D26906509: Adding parallel support for the LLVM backend.
Test Plan: revert-hammer

Differential Revision:
D26906509 (95d2318510)

Original commit changeset: 12c17f2f21af

fbshipit-source-id: cc86d0dfca0dd791b31bda23a0172fc1cfd89760
2021-03-11 17:54:47 -08:00
Xiaoqiang Zheng
95d2318510 Adding parallel support for the LLVM backend. (#53243)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53243

Test Plan: Imported from OSS

Reviewed By: bertmaher, Chillee

Differential Revision: D26906509

Pulled By: zheng-xq

fbshipit-source-id: 12c17f2f21af11e73fa4c5b5199043a7a15ecdec
2021-03-11 03:27:37 -08:00
Bert Maher
8ba7c4918a [nnc] Test for direct usage of ramp/broadcast
Summary:
I was attempting to experiment with "manual" vectorization, and boy
was it hard.  I finally came up with this, which I want to write down as a test
case.  Eventually the APIs should make this easier...

Test Plan: buck test

Reviewed By: navahgar

Differential Revision: D26631189

fbshipit-source-id: c28794b25d7852890ea843fdbcaf8751648258c0
2021-02-25 15:02:20 -08:00
Bert Maher
74082f0d6f [te][llvm] Generate arithmetic vs logical right shift as appropriate (#51749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51749

Following the C++ model, we probably want to distinguish when it's
appropriate to do an arithmetic vs. a logical right shift.

> For negative a, the value of a >> b is implementation-defined (in most
> implementations, this performs arithmetic right shift, so that the result
> remains negative).

If you look at what clang does, if `a` is unsigned, a logical shift is
generated; if signed, an arithmetic shift.  Let's do the same here.  This turns
out to be useful for, e.g., implementing transcendental function
approximations.
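A small self-contained illustration of the rule being matched (standard C++ behavior, not NNC code):

```
#include <cstdint>

// Signed operand: arithmetic shift, the sign bit is replicated.
int32_t sra(int32_t a, int b) { return a >> b; }    // sra(-8, 1) == -4

// Unsigned operand: logical shift, zeros are shifted in.
uint32_t srl(uint32_t a, int b) { return a >> b; }  // srl(0xFFFFFFF8u, 1) == 0x7FFFFFFCu
```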
ghstack-source-id: 121366317

Test Plan:
Added Byte (unsigned) and Char (signed) right-shift tests to
test_llvm.

Reviewed By: asuhan

Differential Revision: D26245856

fbshipit-source-id: 260ee9bf4b032b9ce216f89acbc273cde0ed688c
2021-02-10 02:05:39 -08:00
Mikhail Zolotukhin
c639513378 [TensorExpr] Resubmit: Introduce ExternalCall nodes to TE IR. (#51594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51594

ExternalCall nodes represent opaque calls to external functions to fill a
tensor (buffer) with values. They can be used to include nodes that are
otherwise not representable as TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

The reason the PR was previously reverted was that the LLVM generated
calls to bridge functions were breaking unwind tables. This is now fixed
by requiring bridge functions to never throw and setting the
corresponding attribute in the LLVM generated code.
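A minimal sketch of what such a bridge function could look like; the exact signature NNC uses may differ, and all names here are illustrative:

```
#include <cstdint>

// Bridge functions must never throw: the LLVM-generated call sites assume
// nounwind, which is what broke the unwind tables in the reverted version.
extern "C" void nnc_my_op_bridge(
    int64_t bufs_num,
    void** buf_data,     // buf_data[0] is the output, the rest are inputs
    int64_t* buf_ranks,  // number of dimensions for each buffer
    int64_t* buf_dims    // concatenated dimension sizes
    ) noexcept {
  // ... call the real external function and write its result into buf_data[0] ...
}
```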

Differential Revision: D26213882

Test Plan: Imported from OSS

Reviewed By: pbelevich, ngimel

Pulled By: ZolotukhinM

fbshipit-source-id: db954d8338e2d750c2bf0a41e88e38bd494f2945
2021-02-03 10:22:54 -08:00
Luca Wehrstedt
4f37150f40 Revert D26179083: [TensorExpr] Introduce ExternalCall nodes to TE IR.
Test Plan: revert-hammer

Differential Revision:
D26179083 (f4fc3e3920)

Original commit changeset: 9e44de098ae9

fbshipit-source-id: d15684e04c65c395b4102d4f98a4488482822d1b
2021-02-02 05:29:41 -08:00
Mikhail Zolotukhin
f4fc3e3920 [TensorExpr] Introduce ExternalCall nodes to TE IR. (#51475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51475

ExternalCall nodes represent opaque calls to external functions to fill a
tensor (buffer) with values. They can be used to include nodes that are
otherwise not representable as TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

Test Plan: Imported from OSS

Reviewed By: pbelevich, Chillee

Differential Revision: D26179083

Pulled By: ZolotukhinM

fbshipit-source-id: 9e44de098ae94d25772cf5e2659d539fa6f3f659
2021-02-02 00:50:46 -08:00
Mikhail Zolotukhin
e975169426 [TensorExpr] Redesign Tensor class. (#50995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995

This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and
merges it with the recently introduced 'CompoundTensor'. A statement for
the tensor is either passed directly to the Tensor constructor (akin to
'CompoundTensor') or is built immediately in the constructor.

LoopNest is no longer responsible for constructing statements from
tensors - it simply stitches together the already-constructed statements
contained in the Tensors. A side effect is that we can no longer
construct several loopnests from the same tensors - we need to
explicitly clone the statements if we want to do that. A special copy
constructor was added to LoopNest to make this more convenient (note:
this only affects tests; we don't usually create multiple loopnests
elsewhere).
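A hedged sketch of the resulting test-side idiom (identifiers follow the surrounding messages; the exact construction details may differ):

```
// Building two loopnests from the same tensors now requires cloning the
// underlying statements, which the new copy constructor does:
//
//   Tensor* c = Compute(/* ... */);
//   LoopNest l1({c});
//   LoopNest l2(l1);  // clones l1's statements instead of aliasing them
//
// Previously each LoopNest rebuilt statements from the tensors itself, so
// two independent loopnests could be constructed directly.
```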

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26038223

Pulled By: ZolotukhinM

fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17
2021-01-27 16:14:22 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Mikhail Zolotukhin
e1f73ced1e [TensorExpr] Change LoopNest::vectorize to accept For* instead of Stmt*. (#49696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49696

And make it static.

Test Plan: Imported from OSS

Reviewed By: navahgar, nickgg

Differential Revision: D25668695

Pulled By: ZolotukhinM

fbshipit-source-id: 8d7fb507d6f3beca70e868d9e0f4c46247311a99
2020-12-21 20:17:20 -08:00
Bram Wasti
1047957831 [te][reapply] Add fast log approximation based on sleef (#49575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49575

This is a fast log implementation.

benchmark:

```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat

Reviewed By: bertmaher

Differential Revision: D25627157

fbshipit-source-id: a4920f4f4005ce617d372b375e790ca966275cd9
2020-12-17 17:02:00 -08:00
Edward Yang
ea4ccc730e Revert D25445815: [te] Add fast log approximation based on sleef
Test Plan: revert-hammer

Differential Revision:
D25445815 (1329066b69)

Original commit changeset: 20696eacd12a

fbshipit-source-id: 38830a6abd16260d60e5dd9a5594e65736a9c782
2020-12-17 15:03:17 -08:00
Bram Wasti
1329066b69 [te] Add fast log approximation based on sleef
Summary:
This is a fast log implementation.

benchmark:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat

Reviewed By: bertmaher

Differential Revision: D25445815

fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888
2020-12-17 14:28:34 -08:00
Bram Wasti
6b78644623 [te] Add BitCast to the IR (#49184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49184

Adds BitCasting to NNC.  This will enable fast approximation algorithms implemented directly in TensorExpressions.

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr

Reviewed By: bertmaher

Differential Revision: D25466476

fbshipit-source-id: f063ab29ba7bab2dcce463e499f2d4a16bdc1f0e
2020-12-11 16:12:20 -08:00
Bram Wasti
195b92bfa6 Revert D25441716: [te] Add BitCast to the IR
Test Plan: revert-hammer

Differential Revision:
D25441716 (3384145418)

Original commit changeset: c97b871697bc

fbshipit-source-id: e6eff02e28e1ae8c826dd2cfed79f869839ed2ba
2020-12-10 09:31:35 -08:00
Bram Wasti
3384145418 [te] Add BitCast to the IR
Summary: Adds BitCasting to NNC.  This will enable fast approximation algorithms implemented directly in TensorExpressions.

Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr

Reviewed By: bertmaher

Differential Revision: D25441716

fbshipit-source-id: c97b871697bc5931d09cda4a9cb0a81bb420f4e2
2020-12-10 09:25:46 -08:00
Bert Maher
07657b6001 [tensorexpr] Switch cpp tests to pure gtest (#48160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160

We no longer use the custom C++ test infra anyway, so move to pure
gtest.
Fixes #45703
ghstack-source-id: 116977283

Test Plan: `buck test //caffe2/test/cpp/tensorexpr`

Reviewed By: navahgar, nickgg

Differential Revision: D25046618

fbshipit-source-id: da34183d87465f410379048148c28e1623618553
2020-11-18 12:23:34 -08:00
Nick Gibson
957e45a97c [NNC] Support vectorization of reductions (#47924)
Summary:
Add support for ReduceOp in the Vectorizer, which allows vectorization of reductions. Only non-reduce axes can be vectorized currently; making reduce axes work would require either automatically pulling out the RHS of reductions (better as a separate transform, I think) or special handling of vector reduce in the LLVM codegen (tricky, maybe not useful?).
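Roughly, in the pseudocode style used elsewhere in these messages (a hand-written sketch, not pass output), vectorizing a non-reduce axis looks like:

```
// Vectorize the non-reduce axis i by 8 lanes; the reduce axis j stays scalar:
for (int i = 0; i < n; i += 8) {
  S[i:i+8] = 0
  for (int j = 0; j < m; j++) {
    S[i:i+8] = ReduceOp(S[i:i+8] + A[i:i+8, j], reduce_axis={j})
  }
}
```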

There was a disabled LLVM test for this case which I reenabled with a bit of massaging, and added a few more.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47924

Reviewed By: bertmaher

Differential Revision: D24963464

Pulled By: nickgg

fbshipit-source-id: 91d91e9e2696555ab5690b154984b1ce48359d51
2020-11-16 10:43:53 -08:00
Cheng Chang
f730f2597e [NNC] Implement Cond in LLVM codegen (#47256)
Summary:
Generate LLVM IR for statements such as
```
if (...) {
   ....
} else {
   ....
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47256

Test Plan: added unit tests to test_llvm.cpp

Reviewed By: nickgg

Differential Revision: D24699080

Pulled By: cheng-chang

fbshipit-source-id: 83b0cebcd242828263eb6052483f0924b5f091ce
2020-11-03 14:46:30 -08:00
Mikhail Zolotukhin
4aca63d38a [TensorExpr] Change API for creating Load and Store expressions. (#45520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520

With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructors and `::make` functions and can only be built with a
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` methods for more convenient construction.
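A rough before/after sketch of the call sites (argument lists abbreviated and illustrative):

```
// Before: loads/stores could be built straight from a Placeholder:
//   Load::make(placeholder, {i}, mask);
// After: Load/Store take a Buf, and Placeholder offers convenience methods:
//   ExprHandle v = placeholder.load(i);
//   Stmt* s = placeholder.store({i}, value);
```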

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D23998789

Pulled By: ZolotukhinM

fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
2020-09-29 20:52:38 -07:00
Mikhail Zolotukhin
b86008ab75 [TensorExpr] Remove buf_ field from class Tensor. (#45390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390

Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from its function's,
but storing it in two places seems incorrect and dangerous.

Differential Revision: D23952865

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
2020-09-29 01:21:57 -07:00
Mikhail Zolotukhin
3c33695a6d [TensorExpr] Rename Buffer to Placeholder. (#45389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45389

Differential Revision: D23952866

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 17eedd3ac17897501403482ac1866c569d247c75
2020-09-29 01:21:54 -07:00
Mikhail Zolotukhin
92306b85d5 [TensorExpr] Consolidate {buffer,function,tensor}.{h.cpp} in tensor.{h,cpp}. (#45388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388

Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.

Differential Revision: D23952867

Test Plan: Imported from OSS

Reviewed By: nickgg

Pulled By: ZolotukhinM

fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
2020-09-29 01:17:10 -07:00
Alex Suhan
18b77d7d17 [TensorExpr] Add Mod support to the LLVM backend (#44823)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44823

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseMod_LLVM

Reviewed By: glaringlee

Differential Revision: D23761996

Pulled By: asuhan

fbshipit-source-id: c3c5b2fe0d989dec04f0152ce47c5cae35ed19c9
2020-09-17 15:25:42 -07:00
Alex Suhan
f5b92332c1 [TensorExpr] Fix order comparisons for unsigned types (#44857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44857

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMCompareSelectByte*_LLVM

Reviewed By: glaringlee

Differential Revision: D23762162

Pulled By: asuhan

fbshipit-source-id: 1553429bd2d5292ccda57910326b8c70e4e6ab88
2020-09-17 14:16:54 -07:00
Alex Suhan
5d57025206 [TensorExpr] Add log1p support to the LLVM backend (#44839)
Summary:
Also corrected the Sleef_log1p registrations; the float versions had a redundant 'f'.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44839

Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.LLVMElemwiseLog1pFloat_LLVM

Reviewed By: glaringlee

Differential Revision: D23762113

Pulled By: asuhan

fbshipit-source-id: b5cf003b5c0c1ad549c7f04470352231929ac459
2020-09-17 13:38:35 -07:00
Bert Maher
c14a3613a8 Fix NaN propagation in TE fuser's min/max implementation (#43609)
Summary:
Per the eager-mode source of truth, min/max must propagate NaNs.
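A small sketch of the semantics being matched (plain C++, not the fuser's actual codegen):

```
#include <cmath>
#include <limits>

// Unlike std::fmin, eager-mode min returns NaN if either input is NaN.
float nan_propagating_min(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return a < b ? a : b;
}
```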

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43609

Reviewed By: ZolotukhinM

Differential Revision: D23349184

Pulled By: bertmaher

fbshipit-source-id: 094eb8b89a02b27d5ecf3988d0f473c0f91e4afb
2020-09-01 02:10:13 -07:00
Nick Gibson
944ac133d0 [NNC] Remove VarBinding and go back to Let stmts (#42634)
Summary:
A while back, when commonizing the Let and LetStmt nodes, I ended up removing both and adding a separate VarBinding section to the Block. At the time I couldn't find a counterexample, but I found one today: dependencies between local Vars and Allocations may go in either direction, so we need to support interleaving those statements.

So, I've removed all the VarBinding logic and reimplemented Let statements. ZolotukhinM I think you get to say "I told you so". No new tests, existing tests should cover this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42634

Reviewed By: mruberry

Differential Revision: D22969771

Pulled By: nickgg

fbshipit-source-id: a46c5193357902d0f59bf30ab103fe123b1503f1
2020-08-07 10:50:38 -07:00
Nick Gibson
7ffdd765c8 [TensorExpr] more convenient outer Rfactor output (#40050)
Summary:
Auto-fuse the output loops of outer Rfactors, so the result is in a more convenient form for binding GPU axes.

An example:
```
  Tensor* c = Reduce("sum", {}, Sum(), b, {{m, "m"}, {n, "n"}, {k, "k"}});
  LoopNest loop({c});
  std::vector<For*> loops = loop.getLoopStmtsFor(c);
  auto v = loops.at(0)->var();
  loop.rfactor(c->body(), v);
```
Before:
```
{
  Allocate(tmp_buf, float, {m});
  sum[0] = 0.f;
  for (int m_1 = 0; m_1 < m; m_1++) {
    tmp_buf[m_1] = 0.f;
  }
  for (int m_1 = 0; m_1 < m; m_1++) {
    for (int n = 0; n < n_1; n++) {
      for (int k = 0; k < k_1; k++) {
        tmp_buf[m_1] = (tmp_buf[m_1]) + (b[((n_1 * m_1) * k_1 + k) + k_1 * n]);
      }
    }
  }
  for (int m_1 = 0; m_1 < m; m_1++) {
    sum[0] = (sum[0]) + (tmp_buf[m_1]);
  }
  Free(tmp_buf);
}
```

After:
```
{
  sum[0] = 0.f;
  for (int m = 0; m < m_1; m++) {
    Allocate(tmp_buf, float, {m_1});
    tmp_buf[m] = 0.f;
    for (int n = 0; n < n_1; n++) {
      for (int k = 0; k < k_1; k++) {
        tmp_buf[m] = (tmp_buf[m]) + (b[((n_1 * m) * k_1 + k) + k_1 * n]);
      }
    }
    sum[0] = (sum[0]) + (tmp_buf[m]);
    Free(tmp_buf);
  }
}
```

The existing Rfactor tests cover this case, although I did rename a few for clarity. This change broke the LLVMRFactorVectorizedReduction test because it now does what it's intended to do (vectorize a loop with a reduction in it) rather than nothing, and since that doesn't work, it correctly fails. I've disabled it for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40050

Reviewed By: ZolotukhinM

Differential Revision: D22605639

Pulled By: nickgg

fbshipit-source-id: e359be53ea62d9106901cfbbc42d55d0e300e8e0
2020-07-21 14:44:26 -07:00
Nick Gibson
33f4fca1a6 [TensorExpr] remove Let and LetStmt in favour of binding in Block (#37606)
Summary:
Implementation of the less popular proposal for eliminating overlap between LetStmt and Let: removing both and storing a mapping between Var and value Expr in the Block.

This complicates some tests but simplifies the IR by restricting where variable binding can occur.

I used the unit tests & python integration tests to verify this is correct but I'm unsure of coverage, particularly around the dependency checker in loopnest - ZolotukhinM your review would be useful there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37606

Differential Revision: D21467483

Pulled By: nickgg

fbshipit-source-id: b402d3fce4cacf35d75f300f0a7dca32a43b6688
2020-05-09 16:23:37 -07:00
Nick Gibson
4e2ea6e013 [TensorExpr] Remove the Tensor argument from loopnest.reorderAxis (#37873)
Summary:
Remove the requirement for the axes provided to reorderAxis to come from a Tensor. We were using that to determine the relevant loops, but we can determine them instead by traversing the parents of each provided For.
resistor does this work for you?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37873

Differential Revision: D21428016

Pulled By: nickgg

fbshipit-source-id: b16b2f41cb443dfc2c6548b7980731d1e7d89a35
2020-05-06 12:02:15 -07:00
Mikhail Zolotukhin
1c0bad25f3 [TensorExpr] Add dtype to class Buf. (#36611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36611

Buf represents the underlying storage, but until now it didn't have a
dtype. That resulted in dtypes being specified in different places, with
no mechanism to enforce their consistency: e.g. one could have created a
kFloat expression and used a kInt buffer to store its result. Now we're
centralizing the logic regarding the storage, and we can start enforcing
semantic rules.

Follow-ups: we can merge the Buffer and BufHandle classes, as the former
is now a mere wrapper over the latter.
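An illustrative sketch of the kind of mismatch that becomes checkable (identifiers follow the message's kFloat/kInt naming; the exact API calls are assumptions):

```
// With dtype stored on Buf, this mismatch can now be rejected in one place:
//
//   BufHandle a("a", {N}, kInt);          // buffer declared as kInt
//   ExprHandle v = FloatImm::make(1.0f);  // a kFloat expression
//   Store::make(a, {i}, v);               // dtype check: kFloat vs kInt
//
// Before, nothing tied the expression dtype to the buffer it was stored into.
```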

Test Plan: Imported from OSS

Differential Revision: D21027356

Pulled By: ZolotukhinM

fbshipit-source-id: c06aa2c4077fdcde3bb4ca622d324aece79b5a9c
2020-05-05 15:04:37 -07:00
Owen Anderson
564de515f5 Add an iterator to Block. (#37542)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37542

Differential Revision: D21314421

Pulled By: resistor

fbshipit-source-id: e54d7a8a5c9c1186be59f69b5b8af030fc054b32
2020-05-01 15:12:49 -07:00
Owen Anderson
20ba29d81c Add support for reductions on CPU in tensorexpr (#37333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37333

Differential Revision: D21290289

Pulled By: resistor

fbshipit-source-id: ebba11f7af9e22b48c47e2eefb9497fa77acd17d
2020-04-30 10:59:38 -07:00