Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44861
We were redefining macros like ASSERT_EQ to take a __VA_ARGS__ parameter, so compiling these files with gtest (instead of PyTorch's custom Python-based C++ test infra) fails.
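A minimal sketch of the kind of clash, using a hypothetical redefinition (not the actual macro from the old harness):
```
#include <gtest/gtest.h>

// Hypothetical variadic redefinition in the style of the old harness: it
// collides with gtest's own two-argument ASSERT_EQ as soon as both
// definitions meet in the same translation unit.
#define ASSERT_EQ(x, y, ...) ASSERT_TRUE((x) == (y))

TEST(TensorExprExample, Sketch) {
  ASSERT_EQ(2 + 2, 4);
}
```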
Test Plan: buck build //caffe2/test/cpp/tensorexpr
Reviewed By: asuhan
Differential Revision: D23711293
fbshipit-source-id: 8af14fa7c1f1e8169d14bb64515771f7bc3089e5
Summary:
Unifies a number of partial solutions for thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statement that has a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.
For example, it will transform the following:
```
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  for k in 0..5 // threadIdx.x
    do other thing(i, k);
```
Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
  do other thing(blockIdx.x, threadIdx.x);
}
```
It also handles the case where statements are not bound by any axis, e.g.:
```
do outer thing;
for i in 0..10 // blockIdx.x
  for j in 0..10 // threadIdx.x
    do thing(i, j);
  do other thing(i);
```
will become:
```
if (blockIdx.x < 1) {
  if (threadIdx.x < 1) {
    do outer thing;
  }
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
  do other thing(blockIdx.x);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733
Reviewed By: mruberry
Differential Revision: D23736878
Pulled By: nickgg
fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
Summary:
Fixes an issue where loops of different sizes are bound to the same Cuda dimension / metavar.
More info and tests coming soon...
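In the meantime, a rough sketch of the pattern being fixed, with hypothetical loop sizes: two loops with different trip counts bound to the same threadIdx.x axis mean the kernel must be launched with the larger extent and the smaller loop's body guarded.
```
// Hand-written sketch, not actual NNC output; sizes are made up.
__global__ void different_extents_sketch(float* A, float* B) {
  A[threadIdx.x] = 1.f;      // loop of size 20, bound to threadIdx.x
  if (threadIdx.x < 10) {    // loop of size 10, bound to the same axis,
    B[threadIdx.x] = 2.f;    // so its body is masked for the extra threads
  }
}
```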
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325
Reviewed By: colesbury
Differential Revision: D23628859
Pulled By: nickgg
fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.
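A rough sketch of the problem in generated-code terms, using placeholder buffers (the actual IR nodes are not shown in this summary):
```
// Placeholder buffers A/B and condition cond; hand-written sketch, not
// actual NNC output.
void before(const float* A, float* B, bool cond) {
  if (cond) {
    B[0] = A[0] + 1.f;   // the read of A[0] only happens when cond holds
  }
}

void after_unsafe_hoist(const float* A, float* B, bool cond) {
  float A_1 = A[0];      // cached read hoisted out of the conditional:
  if (cond) {            // A[0] is now read unconditionally, which is wrong
    B[0] = A_1 + 1.f;    // when cond was guarding the access
  }
}
```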
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223
Reviewed By: gchanan
Differential Revision: D23551247
Pulled By: nickgg
fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator. The fix inserts the CUDA-specific cast to float while handling the Cast node, rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast that immediately precedes a Load.
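A hand-written sketch of the intended shape of the generated device code (illustrative only, not actual NNC codegen output): half values are widened to float before the math and narrowed back to half on store.
```
#include <cuda_fp16.h>

// Illustrative sketch: the load is widened to float, the computation runs
// in float, and the result is narrowed back to half for the store.
__global__ void half_exp_sketch(const __half* in, __half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = __half2float(in[i]);    // cast-to-float inserted for the Load
    out[i] = __float2half(expf(v));   // compute in float, store as half
  }
}
```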
Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus the C++ tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209
Reviewed By: izdeby
Differential Revision: D23575577
Pulled By: nickgg
fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42567
Before this change we didn't expand arguments, so in an expression like
`sigmoid(sigmoid(x))` only the outer call was expanded.
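For illustration, a plain restatement of the fully expanded form, assuming the usual definition sigmoid(x) = 1 / (1 + exp(-x)):
```
#include <cmath>

// Before: only the outer call expanded -> 1 / (1 + exp(-sigmoid(x)))
// After:  the argument is expanded too -> 1 / (1 + exp(-(1 / (1 + exp(-x)))))
float sigmoid_of_sigmoid(float x) {
  float inner = 1.f / (1.f + std::exp(-x));
  return 1.f / (1.f + std::exp(-inner));
}
```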
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D22936177
Pulled By: ZolotukhinM
fbshipit-source-id: 9c05dc96561225bab9a90a407d7bcf9a89b078a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36611
Buf represents the underlying storage, but until now it didn't have a
dtype. That resulted in dtypes being specified in different places, with
no mechanism to enforce consistency: e.g. one could create a kFloat
expression and use a kInt buffer to store its result. Now the logic
regarding the storage is centralized and we can start enforcing semantic
rules.
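A conceptual sketch only (these are stand-ins, not the real tensorexpr classes): with the dtype living on the buffer itself, a mismatched store can be rejected in one place.
```
#include <stdexcept>
#include <string>
#include <vector>

// Conceptual stand-ins, not the real tensorexpr classes.
enum class Dtype { kInt, kFloat };

struct Buf {
  std::string name;
  std::vector<int> dims;
  Dtype dtype;  // the storage now carries its own element type
};

// With the dtype centralized on Buf, storing a kFloat expression into a
// kInt buffer can be flagged at a single checkpoint.
void check_store(const Buf& buf, Dtype value_dtype) {
  if (buf.dtype != value_dtype) {
    throw std::runtime_error("dtype mismatch when storing into " + buf.name);
  }
}
```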
Follow-ups: we can merge the Buffer and BufHandle classes, as the former is
now a mere wrapper over the latter.
Test Plan: Imported from OSS
Differential Revision: D21027356
Pulled By: ZolotukhinM
fbshipit-source-id: c06aa2c4077fdcde3bb4ca622d324aece79b5a9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35800
This PR includes the following changes:
* Introduce a new `Expr` type `Buf`: it plays a role similar to `Var`, but also has dimensions.
* Use the new `Buf` class in `Store` and `Load` instead of `Var` for specifying where to store to or load from. `Buf` contains the dimensions info of the buffer we're loading/storing to and hence we are able to keep N-d indexes without flattening them into a 1-d index ([x,y] vs [x+y*W]).
* Flattening of the indexes is now a separate pass that is executed in `LoopNest::prepareForCodegen`; backends still expect indexes to be flattened, and this PR preserves that (see the small sketch after this list).
* `Tensor` now contains a `Buf` instead of `Var`, and thus Tensor now has the dimensions info (previously it was a property of a `Function`, not a `Tensor`). This brings us closer to Tensor being a combination of Buffer + Function, where Buffer specifies iteration domain and the Function defines a computation.
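A small sketch of the flattening itself, with hypothetical names for a 2-d buffer of row stride W:
```
// N-d access kept in the IR:        buf[x, y]
// After flattening for codegen:     buf[x + y * W]
// The same arithmetic in plain code:
float load_flattened(const float* buf, int x, int y, int W) {
  return buf[x + y * W];  // equivalent to the 2-d access [x, y]
}
```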
TODOs:
* Consider merging `Buffer` with `Buf` or `BufHandle`. It seems that we don't need all of them.
* Harden the logic of how we create buffers in the fuser pass. Currently it seems that sometimes we don't set dimensions.
* Use `Buf` in `Allocate` and `Free`.
* Make it clearer that `Function` doesn't "own" dimensions info and that dimensions are a property of a Tensor, not a Function.
Differential Revision: D20789005
Test Plan: Imported from OSS
Reviewed By: zheng-xq
Pulled By: ZolotukhinM
fbshipit-source-id: e04188d1d297f195f1c46669c614557d6bb6cde4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34842
This PR (hopefully the last one of this kind) merges changes from a
side branch where the tensor-expressions-based fuser work has been done so
far. It is a squashed version of the changes in the side branch,
which is available here: https://github.com/bertmaher/pytorch
Differential Revision: D20478208
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 21556e009f1fd88099944732edba72ac40e9b9c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34227
This PR adds CUDA support to tensor expressions.
Differential Revision: D20251836
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: ab36a55834cceff30c8371fef6cca1054a32f017