Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36745
As we hold a mutex for each custom C++ Node, calling reentrant
backward from a custom C++ function means we concurrently hold many
mutexes, up to MAX_DEPTH of them. TSAN only allows 65 mutexes held at
once and complains otherwise. This PR lowers the limit accordingly.
TSAN Reference: https://github.com/google/sanitizers/issues/950
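For context, a minimal sketch (names hypothetical, not the autograd engine itself) of how reentrant backward accumulates held locks up to the recursion depth:
```
#include <mutex>
#include <vector>

// Hypothetical sketch: each level of a reentrant backward locks its
// Node's mutex and keeps it held across the recursive call, so `depth`
// mutexes are held simultaneously.
void reentrant_backward(int depth, std::vector<std::mutex>& node_locks) {
  if (depth == 0) return;
  std::lock_guard<std::mutex> guard(node_locks[depth - 1]);
  reentrant_backward(depth - 1, node_locks); // still holding our lock
}
// With MAX_DEPTH levels of recursion, TSAN sees MAX_DEPTH concurrently
// held mutexes and aborts once its per-thread limit is exceeded.
```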
Test Plan: Imported from OSS
Differential Revision: D21072604
Pulled By: wanchaol
fbshipit-source-id: 99cd1acab41a203d834fa4947f4e6f0ffd2e70f2
Summary:
Adds a capability for reordering axes in the LoopNest. This was fairly straightforward except for handling Reduction initializers, which required more changes. UPDATE: actually, the complicated bit was preserving the ordering of statements in the loopnest which should not be reordered.
Usage looks something like this:
```
Tensor* tensor = Compute(
    "f", {{2, "x"}, {3, "y"}}, [](const VarHandle& x, const VarHandle& y) {
      return ExprHandle(1.0f) + cast<float>(x) * x + cast<float>(y) * y;
    });
LoopNest l({tensor});
/* LoopNest looks like:
   for x in ...
     for y in ...
       f[x,y] = 1 + x * x + y * y;
*/
auto loops = l.getLoopStmtsFor(tensor);
l.reorderAxis(tensor, loops[0], loops[1]);
/* LoopNest now looks like:
   for y in ...
     for x in ...
       f[x,y] = 1 + x * x + y * y;
*/
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36540
Differential Revision: D21068143
Pulled By: nickgg
fbshipit-source-id: f02c29004376df4f5a9bedff366c075772726618
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36729
`setenv` is not available on Windows.
Test Plan: CI green in ovrsource
Reviewed By: stepancheg
Differential Revision: D21067835
fbshipit-source-id: ddbc3285ef88f123dc6a200b661c48cfafc6bf00
Summary:
Unrolling support has been added in a way that generates well-performing code on GPUs. Not sure how long this link will last, but an example of a generated unrolled kernel is:
https://godbolt.org/z/i0uAv3
What can be seen there is multiple "ld.global.f32" instructions without "st.global.f32" instructions in between them (and vice versa). This means that we are issuing multiple loads that can run in parallel, as well as multiple stores that can run in parallel. This can be a crucial optimization for memory-bound kernels. This was generally a point of concern in TVM: an attempt at a similar kernel in TVM produces https://godbolt.org/z/Vu97vG, which surrounds load-store pairs in conditional branches, preventing the benefits of unrolling.
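As a plain C++ analogue of the unrolled pattern (illustrative only, not the generated kernel), grouping all loads before any store lets the memory operations overlap:
```
// Illustrative only: unrolling by 4 and hoisting the loads mirrors the
// back-to-back ld.global.f32 / st.global.f32 groups in the linked dump.
void scale_unrolled(const float* in, float* out, float a, int n) {
  for (int i = 0; i + 4 <= n; i += 4) {
    // Four independent loads, issued before any store.
    float r0 = in[i], r1 = in[i + 1], r2 = in[i + 2], r3 = in[i + 3];
    // Four independent stores.
    out[i] = a * r0;
    out[i + 1] = a * r1;
    out[i + 2] = a * r2;
    out[i + 3] = a * r3;
  }
}
```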
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36435
Reviewed By: ZolotukhinM
Differential Revision: D21024011
Pulled By: soumith
fbshipit-source-id: e852e282fa7a304aba962e1926f756098c011fe0
Summary:
Simplifies loops which can be collapsed down into a single block or removed entirely. E.g.
```
For 0..1 {
  Statements...
}
```
is now just `Block({Statements...})`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36348
Differential Revision: D21057959
Pulled By: nickgg
fbshipit-source-id: 2f95a19a965c4a6e023680e2cea9ea846e82d62e
Summary:
With https://github.com/pytorch/pytorch/pull/35562, we are running peephole optimization on inlining to reduce the number of nodes that are copied.
The tracer encodes the sizes in the graph like:
```
graph(%0 : Double(7)):
%1 : Function = prim::Constant[name="tensor_size"]()
%2 : Tensor = prim::CallFunction(%1, %0)
return (%2)
```
However, people would like to reuse the graph with different shapes, so running the size optimizations would invalidate that reuse. Long term it might be better for the tracer to not include shape information, but there are downstream users of it.
This separates out FuseAddMM from peephole so that there is now a single `disable_size_optimizations` parameter, and ONNX explicitly invokes FuseAddMM.
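A hedged sketch of the resulting call site; the exact signature is an assumption based on the parameter named above, not verbatim from the patch:
```
#include <torch/csrc/jit/passes/peephole.h>

// Assumed shape of the API after this PR (sketch only): size
// optimizations can be disabled so a traced graph stays reusable
// across input shapes, while ONNX export runs FuseAddMM separately.
void run_peephole(const std::shared_ptr<torch::jit::Graph>& graph) {
  torch::jit::PeepholeOptimize(graph, /*disable_size_optimizations=*/true);
}
```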
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36404
Differential Revision: D20968974
Pulled By: eellison
fbshipit-source-id: 56f8f1699e3b0adeeccdfd5a67bb975fd41a2913
Summary:
LLVM Codegen assumes that the kernel contains real statements, but that is not guaranteed, especially after IR Simplification. This PR adds a catch for the case where no value is generated after recursing the LLVMCodegen visitor through the kernel.
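A minimal sketch of such a catch (member names like `value_` and `context_` are hypothetical, not the verbatim patch):
```
// Hypothetical sketch: after the visitor has recursed through the
// kernel, fall back to a constant if no statement produced a value.
if (value_ == nullptr) {
  value_ = llvm::ConstantInt::get(llvm::Type::getInt32Ty(context_), 0);
}
```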
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36660
Differential Revision: D21044066
Pulled By: nickgg
fbshipit-source-id: e521c766286b1ff4e26befcec7ff4959db8181a4
Summary:
Second attempt at the reduction frontend for the TensorExpr compiler. It has two APIs: a simple version for common reduction types, and a customizable Reducer frontend which allows specifying the initializer, the reduction interaction, and the body (the latter two via lambdas).
Simple API looks like so:
```
Buffer b(BufHandle("b", {10}), kInt);
Tensor* c = Reduce("sum", {}, Sum(b), {{10, "m"}});
```
An example of specializing a Sum to do Matmul:
```
Buffer tA(BufHandle("tA", {M, K}), kFloat);
Buffer tB(BufHandle("tB", {K, N}), kFloat);
Sum matmul([&](ParameterList& v) {
  ExprHandle m = v[0];
  ExprHandle n = v[1];
  ExprHandle k = v[2];
  return tA(m, k) * tB(k, n);
});
Tensor* mm = Reduce("mm", {{M, "m"}, {N, "n"}}, matmul, {{K, "k"}});
```
A fully specialized Reduction:
```
VarHandle searchValue("searchValue", kInt);
Buffer b(BufHandle("b", {4, 10}), kInt);
Reducer anyEqSV(
    ExprHandle(0),
    [](ExprHandle a, ExprHandle b) {
      return CompareSelect::make(a, 1, 1, b, kEQ);
    },
    [&](ParameterList& v) {
      return CompareSelect::make(b.call(v), searchValue, kEQ);
    });
Tensor* any = Reduce("anyEqual", {{4, "i"}}, anyEqSV, {{10, "j"}});
```
---
Until lowering, Reductions are held in a compound form for easier optimization:
```
VarHandle m("m", kInt);
Buffer b(BufHandle("b", {2, 3, m}), kFloat);
Tensor* c = Reduce("sum", {{2, "l"}, {3, "n"}}, Sum(b), {{m, "m"}});
LoopNest loop({c});
std::cout << *loop.root_stmt() << "\n";
```
```
for (int l = 0; l < 2; l++) {
  for (int n = 0; n < 3; n++) {
    for (int m = 0; m < m_1; m++) {
      sum[l, n] = ReduceOp(sum[l, n] = float(0);, (sum[l, n]) + (b[l, n, m]), {m});
    }
  }
}
```
```
loop.prepareForCodegen();
std::cout << *loop.root_stmt() << "\n";
```
```
for (int l = 0; l < 2; l++) {
  for (int n = 0; n < 3; n++) {
    sum[(0 + l * (1 * 3)) + n * 1] = float(0);
    for (int m = 0; m < m_1; m++) {
      sum[(0 + l * (1 * 3)) + n * 1] = (sum[(0 + l * (1 * 3)) + n * 1]) + (b[((0 + l * ((1 * m_1) * 3)) + n * (1 * m_1)) + m * 1]);
    }
  }
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35866
Differential Revision: D20965577
Pulled By: nickgg
fbshipit-source-id: afe506c90db794447180056417013bcaf0e2c049
Summary:
Adds handling of constant branches to the TensorExpr IR Simplifier. This covers both IfThenElse and Cond when the condition expression is a known constant (e.g. `IfThenElse(1, X, Y) => X`), or when both arms of the branch are the same (e.g. `IfThenElse(Y, X, X) => X`).
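A minimal sketch of exercising this, with usage modeled on the tensorexpr tests (`IRSimplifier::simplify` and `IfThenElse::make`):
```
#include <torch/csrc/jit/tensorexpr/ir.h>
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

VarHandle x("x", kInt), y("y", kInt);
// Constant condition: IfThenElse(1, x, y) simplifies to x.
ExprHandle byCond =
    IRSimplifier::simplify(IfThenElse::make(ExprHandle(1), x, y));
// Identical arms: IfThenElse(y, x, x) simplifies to x.
ExprHandle byArms = IRSimplifier::simplify(IfThenElse::make(y, x, x));
```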
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36257
Differential Revision: D20947777
Pulled By: nickgg
fbshipit-source-id: 974379e42a6d65ce3e7178622afb62d36ad4e380
Summary:
This PR completely refactors the code lowering process from our IR to CUDA. Before, we had one giant step that went from a relatively high-level IR straight to CUDA; now we first lower into concepts like ForLoop, IfThenElse, TensorIndex, and Allocate. This lowering will allow more complex code lowering like reductions and unrolling. Unrolling will quickly follow this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36199
Reviewed By: dzhulgakov
Differential Revision: D20925220
Pulled By: soumith
fbshipit-source-id: 8f621c694c68a1aad8653e625d7287fe2d8b35dc
Summary:
In the IR Simplifier we were not treating multiplication by zero specially, which meant some constant expressions were stored in forms that were not constant.
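A small hedged example of the kind of expression this affects (usage modeled on the tensorexpr tests):
```
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

VarHandle x("x", kInt);
// Treating multiplication by zero specially lets the whole expression
// fold down to the constant 5 rather than keeping a non-constant form.
ExprHandle e = IRSimplifier::simplify(x * ExprHandle(0) + ExprHandle(5));
```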
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36287
Differential Revision: D20937497
Pulled By: nickgg
fbshipit-source-id: 528e430313ea048524d7a4a0256eef4a0297438b
Summary:
Add support for the TensorExpr IR Simplifier to factorize common terms on either side of a Div node. e.g. `(8 * x) / (4 * y) => (2 * x) / y`.
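A hedged sketch of the example above, with usage modeled on the tensorexpr tests:
```
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

VarHandle x("x", kInt), y("y", kInt);
// Common factors on both sides of the Div are cancelled:
// (8 * x) / (4 * y) => (2 * x) / y.
ExprHandle e =
    IRSimplifier::simplify((ExprHandle(8) * x) / (ExprHandle(4) * y));
```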
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36154
Differential Revision: D20910580
Pulled By: nickgg
fbshipit-source-id: ee071d93bc4711b1e710be312de599d18ab506f3
Summary:
This supersedes https://github.com/pytorch/pytorch/pull/35698.
`abs` is a C-style function that takes only an integral argument.
`std::abs` is polymorphic and can be applied to both integral and floating-point types.
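A quick illustration of the difference (a sketch; which overloads the unqualified call resolves to depends on the headers in scope):
```
#include <cmath>   // std::abs floating-point overloads
#include <cstdlib> // C-style int abs(int)

double keepsFraction = std::abs(-2.5); // 2.5
// If only the C-style abs(int) is visible, a double argument is
// implicitly converted to int first, silently truncating the value:
// abs(-2.5) then yields 2 instead of 2.5.
```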
This PR also increases `kBatchSize` in `test_optimizer_xor` function in `test/cpp/api/optim.cpp` to fix `OptimTest.XORConvergence_LBFGS` failure under ASAN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35974
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D20853570
Pulled By: yf225
fbshipit-source-id: 6135588df2426c5b974e4e097b416955d1907bd4
Summary:
Just run `./tools/clang_format.py --verbose` and `git commit --all`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35969
Test Plan: CI
Differential Revision: D20845626
Pulled By: malfet
fbshipit-source-id: 0ae9a91dfa33417a021e7e9d233baba4188daf81
Summary:
This enables the serialization part of this change (the deserialization part already landed in #33255).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35741
Pulled By: driazati
Differential Revision: D20758124
fbshipit-source-id: e2cdefa99c3bec991491e5e967e7f1661ca7ffd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35800
This PR includes the following changes:
* Introduce a new `Expr` type `Buf`: it plays a role similar to `Var`, but also has dimensions.
* Use the new `Buf` class in `Store` and `Load` instead of `Var` for specifying where to store to or load from. `Buf` contains the dimensions info of the buffer we're loading from or storing to, and hence we are able to keep N-d indexes without flattening them into a 1-d index ([x,y] vs [x+y*W]).
* Flattening of the indexes is now a separate pass that is executed in `LoopNest::prepareForCodegen` (see the sketch after this list): backends still expect indexes to be flattened, and this PR preserves that.
* `Tensor` now contains a `Buf` instead of `Var`, and thus Tensor now has the dimensions info (previously it was a property of a `Function`, not a `Tensor`). This brings us closer to Tensor being a combination of Buffer + Function, where Buffer specifies iteration domain and the Function defines a computation.
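As a plain illustration of the index flattening mentioned above (not the pass itself), using the [x,y] vs [x+y*W] form from the list:
```
// Illustration only: a 2-D access [x, y] into a buffer whose x
// dimension has extent W flattens to the 1-D index x + y * W, which is
// what backends expect after LoopNest::prepareForCodegen.
int flat_index(int x, int y, int W) {
  return x + y * W;
}
```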
TODOs:
* Consider merging `Buffer` with `Buf` or `BufHandle`. It seems that we don't need all of them.
* Harden the logic of how we create buffers in the fuser pass. Currently it seems that sometimes we don't set dimensions.
* Use `Buf` in `Allocate` and `Free`.
* Make it clearer that `Function` doesn't "own" dimensions info and that dimensions are a property of a Tensor, not a Function.
Differential Revision: D20789005
Test Plan: Imported from OSS
Reviewed By: zheng-xq
Pulled By: ZolotukhinM
fbshipit-source-id: e04188d1d297f195f1c46669c614557d6bb6cde4
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles as TensorExpressions and Halide, but the implementation is built from the ground up. The fusion pass itself is similar to the default CUDA fuser; however, it has undergone some refactoring and is using the new code generation infrastructure. For those who are interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_.
One of the largest differences between our approach and that of TVM/Halide is the concept of "TensorView". At a high level, a TensorView should be thought of similarly to how we think of working with Tensors in PyTorch: it is an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at in TVM; they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
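As a plain C++ illustration (not the fuser API) of what a split of the iteration domain does, conceptually:
```
// Before: a single loop over the iteration domain.
void before(float* out, const float* in, int N) {
  for (int i = 0; i < N; i++) out[i] = in[i] + 1.f;
}

// After a conceptual split(i, 4): an outer/inner loop pair over the
// same elements, with a guard for the tail when N is not a multiple
// of 4.
void after_split(float* out, const float* in, int N) {
  for (int io = 0; io < (N + 3) / 4; io++) {
    for (int ii = 0; ii < 4; ii++) {
      int i = io * 4 + ii;
      if (i < N) out[i] = in[i] + 1.f;
    }
  }
}
```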
**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.
**Short term goals:**
Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcasted tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout
**Mid-term goals:**
- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785
Reviewed By: ZolotukhinM
Differential Revision: D20650977
Pulled By: soumith
fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
Summary:
Adds capabilities to the TensorExpr IR Simplifier to simplify down Round + Mod patterns (e.g. `(x/y)*y + x%y => x`) via means of lifting integer rounding into a temporary `RoundOff` node.
This integrates with existing simplification mechanisms (folding, factorization, reordering, etc.) to allow simplification of compound expressions, e.g. `20 * (x / (16 / 2)) * 2 + (11 % 6) * (x % (7+1)) => 5 * x`.
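A hedged sketch of the basic pattern, with usage modeled on the tensorexpr tests:
```
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

VarHandle x("x", kInt), y("y", kInt);
// The integer rounding in (x / y) * y is lifted into a RoundOff term,
// which then cancels against x % y: the whole expression folds to x.
ExprHandle e = IRSimplifier::simplify((x / y) * y + x % y);
```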
Tests: ran the tensorexpr C++ and Python tests, ran an HPC benchmark and verified that results and timing didn't regress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35683
Differential Revision: D20811316
Pulled By: nickgg
fbshipit-source-id: 0cd6a517fb9548b3bc689768304b97375df5ac58
Summary: This diff fixes issues with the current handling of debug information passed along during execution of the model. (For example, it is possible for multiple calls to the debug guard to override each other.)
Test Plan: CI test/cpp/jit
Reviewed By: dzhulgakov
Differential Revision: D20602775
fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
https://github.com/pytorch/pytorch/pull/35127 was landed and reverted because I missed a test failure (oops). I have found and fixed the issue, which was due to zero terms being introduced after the point that filters them out (this usually requires NaN/INF, e.g. `x / INF => 0`).
See https://github.com/pytorch/pytorch/pull/35127 for more info.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35415
Reviewed By: ZolotukhinM
Differential Revision: D20702957
Pulled By: nickgg
fbshipit-source-id: 119eb41e9fa676bd78e3d1df99297a47ae312185
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix violations of the space-between-a-function-and-its-arguments rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function
3. Added a BC-compatibility serialization test for LBFGS
4. Removed mentions of `parameters_` in optimizer.cpp, de-virtualized all functions
5. Made `defaults_` an optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20678162
Pulled By: yf225
fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function
3. Added a BC-compatibility serialization test for LBFGS
4. Removed mentions of `parameters_` in optimizer.cpp, de-virtualized all functions
5. Made `defaults_` an optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Differential Revision: D20645945
Pulled By: yf225
fbshipit-source-id: 383588065bf1859b38f0ad0a25d93d41e153c96e