Summary:
This PR adds an implementation of `aten::cat` in NNC that avoids conditionals. This version is not enabled by default.
Below are micro-benchmark results with and without conditionals. The conditional-free version improves performance by up to 50% for some shapes.
aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```
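For intuition, the two lowering strategies can be sketched in plain Python (a simplified illustration, not the NNC codegen itself), concatenating two 2-D inputs along dim 1: the conditional version branches on every output element to pick its source, while the conditional-free version emits one unconditional copy loop per input.

```python
def cat_with_conditionals(a, b):
    # One loop over the full output; an `if` on every element picks the source.
    rows, wa, wb = len(a), len(a[0]), len(b[0])
    out = [[0] * (wa + wb) for _ in range(rows)]
    for i in range(rows):
        for j in range(wa + wb):
            out[i][j] = a[i][j] if j < wa else b[i][j - wa]
    return out

def cat_without_conditionals(a, b):
    # One unconditional copy loop per input; no branch in any inner loop.
    rows, wa, wb = len(a), len(a[0]), len(b[0])
    out = [[0] * (wa + wb) for _ in range(rows)]
    for i in range(rows):
        for j in range(wa):
            out[i][j] = a[i][j]
        for j in range(wb):
            out[i][wa + j] = b[i][j]
    return out
```

Branch-free inner loops are much friendlier to vectorization, which is a plausible source of the speedups in the numbers above.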
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128
Reviewed By: bertmaher
Differential Revision: D26758613
Pulled By: navahgar
fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52786
Previously, NNC did not sanitize input names. I ran into this in the next PR in the stack: making subgraph creation preserve debug names caused a number of NNC CUDA failures. I had also run into it earlier with some masked_fill failures internally, which led me to disable that operator.
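The kind of sanitization involved can be sketched with a hypothetical helper (not NNC's actual implementation): debug names coming from the JIT graph may contain characters such as `.` that are not legal identifiers in generated CUDA/LLVM code.

```python
import re

def sanitize_name(name):
    # Replace any character that is not legal in a C identifier,
    # and make sure the result does not start with a digit.
    clean = re.sub(r"[^0-9a-zA-Z_]", "_", name)
    if clean and clean[0].isdigit():
        clean = "_" + clean
    return clean or "_"
```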
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696699
Pulled By: eellison
fbshipit-source-id: 7c3af4d559d58762fb8332666784a4d5cd6a4167
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled in a PyTorch build without LLVM support, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now an error is reported if CPU fusion is enabled without LLVM support, avoiding this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because the tests run in CI do not have LLVM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264
Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.
Fix for https://github.com/pytorch/pytorch/issues/45604
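The core idea can be sketched in plain Python (hypothetical helper names, not the kernel code): a logical index maps to a flat buffer offset through the strides, so the same logical tensor can be written out in whatever layout the output's strides demand at the end of the kernel.

```python
def flat_offset(index, strides):
    # Dot product of the logical index with the strides.
    return sum(i * s for i, s in zip(index, strides))

def write_strided(values, shape, strides):
    # Scatter a row-major list of values into a flat buffer laid out
    # according to `strides` (e.g. a transposed, non-contiguous layout).
    rows, cols = shape
    buf = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            buf[flat_offset((r, c), strides)] = values[r * cols + c]
    return buf
```

With contiguous strides the buffer matches the row-major input; with swapped strides the same logical tensor lands in column-major order.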
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D25286213
Pulled By: eellison
fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47813
We have some code paths that appear to handle dynamic sizes at kernel invocation, but I'm not sure how well they work, because other parts of our code base assume that tensor shapes are always fully specified. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/kernel.cpp#L1572
As with some other PRs in the stack, I think it is good to remove features that are not in use and not actively being worked on.
I initially did this PR to try to speed up perf. Since I couldn't observe much of a speedup, we can decide to drop this PR if we want.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D25286212
Pulled By: eellison
fbshipit-source-id: 4ae66e0af88d649dd4e592bc78686538c2fdbaeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160
We no longer use the custom C++ test infra anyway, so move to pure gtest.
Fixes #45703
ghstack-source-id: 116977283
Test Plan: `buck test //caffe2/test/cpp/tensorexpr`
Reviewed By: navahgar, nickgg
Differential Revision: D25046618
fbshipit-source-id: da34183d87465f410379048148c28e1623618553
Summary:
This diff adds support for `log_softmax` op in NNC.
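For reference, the usual numerically stable formulation that such a lowering computes can be sketched in plain Python (an illustration, not the NNC lowering itself):

```python
import math

def log_softmax(xs):
    # Subtract the max first so exp() never overflows:
    # log_softmax(x)_i = x_i - max(x) - log(sum_j exp(x_j - max(x)))
    m = max(xs)
    lse = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - lse for x in xs]
```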
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47409
Reviewed By: ejguan
Differential Revision: D24750203
Pulled By: navahgar
fbshipit-source-id: c4dacc7f62f9df65ae467f0d578ea03d3698273d
Summary:
This diff enables inlining for all non-output buffers, including the intermediate buffers that are created as part of an op. However, the buffers that correspond to reductions will not be inlined.
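The transformation can be sketched abstractly (a hypothetical example, not NNC's IR): an intermediate, non-output buffer's definition is substituted into its uses so the temp buffer disappears, while a reduction is kept materialized, since inlining it would recompute the whole sum at every use site.

```python
def fused_consumer(a):
    # Before inlining: b[i] = a[i] + 1 is stored to a temp buffer,
    # then c[i] = b[i] * 2 reads it back.
    # After inlining b into c, the temp buffer is gone:
    return [(x + 1) * 2 for x in a]

def reduction_kept(a):
    # The reduction is computed once and reused; inlining `s` into both
    # uses below would recompute the O(n) sum twice.
    s = sum(x + 1 for x in a)  # producer (x + 1) can be inlined into the reduction
    return [s, s * 2]
```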
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47258
Reviewed By: anjali411
Differential Revision: D24707015
Pulled By: navahgar
fbshipit-source-id: ad8b03e38497600cd69980424db6d586bf93db74
Summary:
This diff enables inlining producers into reductions. It also guards against inlining reductions themselves.
Prior to this diff, if there was a reduction anywhere in the loopnest, no inlining happened at all. After this change, we inline all non-output buffers that do not themselves correspond to a reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47020
Reviewed By: albanD
Differential Revision: D24644346
Pulled By: navahgar
fbshipit-source-id: ad234a6877b65be2457b734cbb7f3a1800baa6a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388
Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.
Differential Revision: D23952867
Test Plan: Imported from OSS
Reviewed By: nickgg
Pulled By: ZolotukhinM
fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41451
Since TE operates on a limited subset of ops with a well-defined
semantics, we can easily infer shapes of intermediate and output tensors
given shapes of the inputs.
There are a couple of ops not yet supported by shape inference; once we
add them, we can relax the shape-info requirements in the TE fuser:
currently it requires all values in the fusion group to have known
shapes, and we can change that to only the inputs.
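For elementwise ops, such inference amounts to standard broadcasting over the input shapes; a minimal sketch (not the fuser's actual code):

```python
from itertools import zip_longest

def broadcast_shape(lhs, rhs):
    # Align the two shapes from the right; each pair of dimensions must
    # match, or one of them must be 1.
    out = []
    for a, b in zip_longest(reversed(lhs), reversed(rhs), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"incompatible shapes {lhs} and {rhs}")
        out.append(max(a, b))
    return tuple(reversed(out))
```

Applied op by op from the graph inputs, this propagates shapes to every intermediate and output tensor.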
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D22543470
Pulled By: ZolotukhinM
fbshipit-source-id: 256bae921028cb6ec3af91977f12bb870c385f40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41507
These fields have always been a part of tensor types, this change just
makes them serializable through IR dumps.
Test Plan: Imported from OSS
Reviewed By: Krovatkin, ngimel
Differential Revision: D22563661
Pulled By: ZolotukhinM
fbshipit-source-id: f01aaa130b7e0005bf1ff21f65827fc24755b360
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37948
The input JIT graph has all the information we need to perform the
entire compilation at the construction time. We don't need to postpone
any steps until the execution time. Also, from the graph we always know
what device we will be executing on and thus we don't need to have a
CodeGen cache in TensorExprKernel - we always have one and only one
CodeGen.
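The design change amounts to moving from lazy, cached compilation at first execution to eager compilation in the constructor; a schematic sketch with hypothetical names:

```python
class EagerKernel:
    """Compile once at construction; run() only executes."""

    def __init__(self, graph, compile_fn):
        # The graph fully determines the device and shapes, so the single
        # CodeGen can be built right here -- no cache keyed at run time.
        self.codegen = compile_fn(graph)

    def run(self, *inputs):
        return self.codegen(*inputs)
```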
Test Plan: Imported from OSS
Reviewed By: protonu
Differential Revision: D21432145
Pulled By: ZolotukhinM
fbshipit-source-id: 8dc86b891713056b2c62f30170cd4a168912f027