Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477
This diff adds a new run method to `TensorExprKernel` that takes output
tensors as inputs and stores the results in those tensors.
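A minimal usage sketch (assuming the new entry point is named `runWithAllocatedOutputs` after the test below and follows the same `Stack`-based calling convention as the existing `run`; both the name and the argument order here are assumptions, not a verified API):
```
// Sketch only: run a compiled kernel into caller-owned output tensors.
// runWithAllocatedOutputs and the input/output ordering are assumptions.
#include <torch/csrc/jit/tensorexpr/kernel.h>
#include <vector>

void runIntoPreallocatedOutputs(
    torch::jit::tensorexpr::TensorExprKernel& kernel,
    const std::vector<at::Tensor>& inputs,
    std::vector<at::Tensor>& outputs) {
  torch::jit::Stack stack;
  for (const auto& t : inputs) {
    stack.emplace_back(t);  // regular inputs
  }
  for (const auto& t : outputs) {
    stack.emplace_back(t);  // pre-allocated outputs; results are written in place
  }
  kernel.runWithAllocatedOutputs(stack);  // hypothetical call, per Kernel.RunWithAllocatedOutputs
}
```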
ghstack-source-id: 146107009
Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'
Reviewed By: ZolotukhinM
Differential Revision: D32823890
fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858
When executing with ir_eval, check for out-of-bounds indices.
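A hedged sketch of the kind of guard this implies; the helper below is illustrative and not the actual ir_eval code:
```
// Illustrative only: bounds-check a flattened index before a buffer
// load/store, as an IR interpreter might do.
#include <cstdint>
#include <stdexcept>
#include <string>

inline void checkIndexBounds(int64_t index, int64_t buf_size, const std::string& buf_name) {
  if (index < 0 || index >= buf_size) {
    throw std::runtime_error(
        "Index out of bounds in buffer '" + buf_name + "': " + std::to_string(index) +
        " not in [0, " + std::to_string(buf_size) + ")");
  }
}
```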
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D32657881
Pulled By: davidberard98
fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861
Previously submitted as https://github.com/pytorch/pytorch/pull/67197.
This got reverted because its failures were hidden by the failures of
another PR.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D32178196
Pulled By: navahgar
fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734
The implementation of the `aten::cat` op in NNC has to ignore tensors that have a size of 0 in any dimension.
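A hedged sketch of the filtering this implies, written against plain ATen tensors rather than the actual NNC lowering:
```
// Illustrative only: drop cat inputs that are empty in any dimension.
#include <ATen/ATen.h>
#include <vector>

std::vector<at::Tensor> dropEmptyCatInputs(const std::vector<at::Tensor>& inputs) {
  std::vector<at::Tensor> non_empty;
  for (const auto& t : inputs) {
    bool has_zero_dim = false;
    for (auto s : t.sizes()) {
      if (s == 0) {
        has_zero_dim = true;
        break;
      }
    }
    if (!has_zero_dim) {
      non_empty.push_back(t);
    }
  }
  return non_empty;
}
```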
Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'`
Reviewed By: ZolotukhinM
Differential Revision: D32122171
fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
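For illustration, the rewrite looks like this (`c10::irange` is the existing utility from `<c10/util/irange.h>`; the loop bodies are placeholders):
```
#include <c10/util/irange.h>
#include <cstdint>

void example(int64_t xmax) {
  // Before:
  for (int64_t var = 0; var < xmax; var++) {
    (void)var;  // ... use var ...
  }
  // After:
  for (const auto var : c10::irange(xmax)) {
    (void)var;  // ... use var ...
  }
}
```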
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705358
fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717
This also exposed several bugs, which are fixed in this PR.
Differential Revision: D30826408
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627
This fixes the root cause of S242719
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30801686
Pulled By: navahgar
fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609
We've been using exceptions to indicate whether vectorization succeeded
or not, but that posed some problems (e.g., we spent too much time
symbolicating these exceptions). This change converts this mechanism to
a standard error return code.
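A generic sketch of the pattern (not the actual vectorizer code): the throw-on-failure API becomes a status return, so callers probing for vectorizability no longer pay for throwing and symbolicating exceptions:
```
// Illustrative only: replace exception-based failure signaling with a
// return code that callers can check cheaply on hot paths.
#include <stdexcept>

// Before: failure is signaled by throwing.
void vectorizeOrThrow(bool can_vectorize) {
  if (!can_vectorize) {
    throw std::runtime_error("Unable to vectorize loop");
  }
  // ... perform vectorization ...
}

// After: failure is an ordinary return value.
bool tryVectorize(bool can_vectorize) {
  if (!can_vectorize) {
    return false;  // caller falls back to the scalar loop
  }
  // ... perform vectorization ...
  return true;
}
```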
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30795342
Pulled By: ZolotukhinM
fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077
We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64-bit dimensions.
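A minimal illustration of why 32 bits is not enough; the fix amounts to computing sizes and flattened indices in 64-bit integers throughout (the example numbers are hypothetical):
```
// Illustrative only: an element count for a large tensor overflows int32_t
// but fits comfortably in int64_t.
#include <cstdint>
#include <vector>

int64_t numElements(const std::vector<int64_t>& sizes) {
  int64_t n = 1;
  for (auto s : sizes) {
    n *= s;  // e.g. {70000, 70000} -> ~4.9e9 elements, > INT32_MAX
  }
  return n;
}
```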
ghstack-source-id: 136933272
Test Plan: unit tests; new IR level test with huge sizes
Reviewed By: ZolotukhinM
Differential Revision: D30596689
fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923
The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized.
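A hedged sketch of what such sanitization could look like; the real NNC helper may differ in detail:
```
// Illustrative only: map characters that are not valid in a codegen
// identifier to '_' and avoid a leading digit.
#include <cctype>
#include <string>

std::string sanitizeName(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (char c : input) {
    const bool ok = std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    out.push_back(ok ? c : '_');
  }
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0]))) {
    out.insert(out.begin(), '_');
  }
  return out;
}
```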
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990
Reviewed By: ZolotukhinM
Differential Revision: D30558432
Pulled By: navahgar
fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision: D30484555
fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there are no classes using KernelArena for memory management, we
can remove it.
Differential Revision: D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
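Conceptually, the post-change Tensor is a small value type along these lines (a simplified sketch, not the actual class; `BufPtr`/`StmtPtr` stand for the pointer aliases introduced earlier in this stack):
```
// Simplified sketch only: Tensor as a cheap-to-copy pair of the buffer it
// defines and the statement that computes it.
#include <memory>
#include <utility>

class Buf {};
class Stmt {};
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt) : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  BufPtr buf() const { return buf_; }
  StmtPtr stmt() const { return stmt_; }

 private:
  BufPtr buf_;
  StmtPtr stmt_;
};
```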
Differential Revision: D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following:
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `new Block(std::vector<ExprPtr>())`
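A hedged before/after illustration of the mechanical rewrite (the alias and `alloc<>` helper below are stand-ins; the real aliases initially wrapped raw pointers and were only later flipped to smart pointers):
```
// Illustrative only: funnel allocation and pointer spelling through an alias
// and a helper so the underlying representation can change in one place.
#include <memory>
#include <utility>

class Expr {
 public:
  virtual ~Expr() = default;
};
class Add : public Expr {};

using AddPtr = std::shared_ptr<Add>;  // phase 1 used Add*; phase 2 flips the alias

template <typename T, typename... Args>
std::shared_ptr<T> alloc(Args&&... args) {
  return std::make_shared<T>(std::forward<Args>(args)...);
}

void example() {
  // Before: Add* a = new Add();
  // After:
  AddPtr a = alloc<Add>();
  (void)a;
}
```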
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292779
Pulled By: ZolotukhinM
fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
Summary:
The GoogleTest `TEST` macro is non-compliant with the `cppcoreguidelines-avoid-non-const-global-variables` check, as is `DEFINE_DISPATCH`.
All changes but the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804
The lowerings are stored as a map c10::Symbol -> std::function, and the
signature of those functions matches the signature of
`computeOperandValue`. Custom lowerings take priority over the standard
ones, i.e. we can redefine how a given op is lowered.
In general, this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it allows quickly adding
a custom lowering for a given op.
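A hedged sketch of such a registry; the types below are stand-ins (the real map is keyed by `c10::Symbol` and the callback signature follows `computeOperandValue`), so treat everything here as illustrative rather than the actual NNC API:
```
// Illustrative only: custom lowerings as a symbol -> callback map that is
// consulted before the built-in lowerings.
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct Tensor {};      // stand-in for tensorexpr::Tensor
struct ArgValue {};    // stand-in for the lowering's argument type
struct ExprHandle {};  // stand-in for shape expressions

using CustomLowering = std::function<Tensor(
    const std::vector<ArgValue>& inputs,
    const std::vector<ExprHandle>& output_shape)>;

// Keyed by op name here; the real map uses c10::Symbol.
std::unordered_map<std::string, CustomLowering> custom_lowerings;

void registerExample() {
  // Takes priority over the standard lowering for this op.
  custom_lowerings["aten::relu"] =
      [](const std::vector<ArgValue>&, const std::vector<ExprHandle>&) {
        return Tensor{};  // a real lowering would build the NNC tensor here
      };
}
```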
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D29409580
Pulled By: ZolotukhinM
fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59508
An assert that was triggering in a previous version is now relaxed to
take 0-dim tensors into account.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28918342
Pulled By: ZolotukhinM
fbshipit-source-id: c09b62c9725d1603b0ec11fcc051e7c932af06ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59279
There were some issues with how we handle 0-dim cases in lowerings and
also in how we generate reductions in that special case. This PR fixes
those issues and reenables a bunch of tests.
Differential Revision: D28819780
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: f3feff35a1ce11821ada2f8d04ae9d4be10dc736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57560
The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. This description includes whether a
given parameter is a scalar var or a buffer and, if it is a buffer,
allows getting the corresponding `Buf*` pointer, from which we can
obtain the expected sizes.
Relanding #57074 which was reverted because I forgot to guard a new
test with `ifdef LLVM`.
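A hedged sketch of how a caller might use such accessors; the method and field names here are placeholders, not the verified `CodeGen` API:
```
// Illustrative only: walk codegen parameter descriptions to tell scalar vars
// from buffers and to recover a buffer's expected sizes.
#include <cstdio>
#include <vector>

struct Buf {
  std::vector<long> dims;  // expected sizes of the buffer
};

struct BufferArg {
  bool is_var = false;       // scalar var vs. buffer parameter
  const Buf* buf = nullptr;  // valid only when !is_var
};

void describeParams(const std::vector<BufferArg>& buffer_args) {
  for (const auto& arg : buffer_args) {
    if (arg.is_var) {
      std::printf("scalar parameter\n");
    } else {
      std::printf("buffer parameter with %zu dims\n", arg.buf->dims.size());
    }
  }
}
```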
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28199048
Pulled By: ZolotukhinM
fbshipit-source-id: 636e838e7e242a3c63e97ec453b8fae9b6380231
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57552
This method uses `CodeGen::call_raw` instead of `CodeGen::call`.
Relanding #57328 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.
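A hedged sketch of the difference: `call_raw` takes untyped `void*` argument pointers instead of wrapped `CallArg`s, skipping per-call wrapping. The struct below is a stand-in, not the actual `CodeGen` class:
```
// Illustrative only: the raw-pointer fast path vs. the wrapped call path.
#include <vector>

struct CallArg {
  void* data = nullptr;  // the real CallArg wraps buffers and scalars
};

struct CodeGenLike {
  void call(const std::vector<CallArg>& /*args*/) { /* checked, wrapped path */ }
  void call_raw(const std::vector<void*>& /*args*/) { /* raw-pointer fast path */ }
};

void runFastSketch(CodeGenLike& cg, float* input, float* output) {
  std::vector<void*> args = {input, output};
  cg.call_raw(args);  // no per-invocation CallArg construction
}
```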
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28195047
Pulled By: ZolotukhinM
fbshipit-source-id: bcfd3cb5b4f33a149b7549515ffd705e2c4f208f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57074
The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. This description includes whether a
given parameter is a scalar var or a buffer and, if it is a buffer,
allows getting the corresponding `Buf*` pointer, from which we can
obtain the expected sizes.
Differential Revision: D28048289
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: 3867e862a0ec3593906820826c2344bd8a8f5c0a
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os
def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56324
Inlining is great if LLVM's CSE kicks in; but if a kernel has multiple outputs
(and thus multiple loops), CSE has no chance.
So, this pass "horizontally" fuses the output loops together so that CSE can go
to town. Essentially we want to turn
```
for (...) {
  output_1[] = some_complicated_expr...
}
for (...) {
  output_2[] = some_complicated_expr...
}
```
Into:
```
for (...) {
  output_1[] = complicated_expr
  output_2[] = complicated_expr // llvm cse should take care of this
}
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27841194
Pulled By: bertmaher
fbshipit-source-id: 54153bb59786be87183c636d64f05963c4b1624a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56319
With this change the TorchScript graph can have constant tensors in it
and we will still be able to lower it to TE. The constants are
registered (or bound) within the `TensorExprKernel` object, and when the
codegen is called, they are passed along with the usual inputs and outputs.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27838747
Pulled By: ZolotukhinM
fbshipit-source-id: 4a519d66fcc07fe5fa53f5cf9af28d25611f8437
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.
Here is the performance of some micro benchmarks with and without conditionals. There is up to 50% improvement in performance without conditionals for some of the shapes.
aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128
Reviewed By: bertmaher
Differential Revision: D26758613
Pulled By: navahgar
fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52786
Previously, NNC did not sanitize input names. I ran into this in the next PR, where making subgraph creation preserve debug names caused a number of NNC CUDA failures. I also previously ran into this with some masked_fill failures internally, which led me to disable the operator.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696699
Pulled By: eellison
fbshipit-source-id: 7c3af4d559d58762fb8332666784a4d5cd6a4167
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error will be reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because the tests run in CI do not have LLVM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264
Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.
Fix for https://github.com/pytorch/pytorch/issues/45604
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D25286213
Pulled By: eellison
fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47813
We have some code paths that seem to handle dynamic sizes at kernel invocation, but I'm not sure how well they work, because other parts of our code base assume that tensor shapes are always fully specified. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/kernel.cpp#L1572
As with some other PRs in the stack, I think it would be good to remove features that aren't on or actively being worked on while they are not used.
I initially did this PR to try to speed up perf. I couldn't observe much of a speed up, so we can decide to keep or drop this PR.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D25286212
Pulled By: eellison
fbshipit-source-id: 4ae66e0af88d649dd4e592bc78686538c2fdbaeb