Summary:
Slightly modified Adam, following the Python implementation, and the `ProducesPyTorchValues` tests pass. I had a problem with another test, though (see commit c1a6241676ab84fc531c1c3a10f964aa5704092e): it seems that optimizing for two steps with the same optimizer and optimizing for two steps using freshly initialized objects produce the same output.
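As a hedged illustration of what that test is getting at (a minimal sketch using the public C++ frontend optimizer API, not the test from this PR): Adam carries per-parameter state (moment estimates and a step count), so taking two steps with one optimizer and taking two steps with freshly constructed optimizers would normally not leave the parameter in the same place.
```
#include <torch/torch.h>

// Minimal sketch, not the actual test: two Adam steps on one parameter,
// either reusing the same optimizer or building a fresh one for the
// second step (which restarts from zeroed moment estimates).
torch::Tensor two_steps(bool reuse_optimizer) {
  torch::manual_seed(0);
  auto param = torch::randn({4}, torch::requires_grad());
  auto take_step = [&](torch::optim::Adam& opt) {
    opt.zero_grad();
    auto loss = (param * param).sum();
    loss.backward();
    opt.step();
  };
  torch::optim::Adam opt({param}, torch::optim::AdamOptions(0.1));
  take_step(opt);
  if (reuse_optimizer) {
    take_step(opt);
  } else {
    torch::optim::Adam fresh({param}, torch::optim::AdamOptions(0.1));
    take_step(fresh);
  }
  return param.detach().clone();
}
```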
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40009
Differential Revision: D22096053
Pulled By: glaringlee
fbshipit-source-id: a31a8f5488cb37c53752ddf15436efabdba67dc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39950
Per the comment in the code, constValue() should only be used
when the future has completed and the value is not an error.
Add an assert to enforce this.
Also, add a hasValue() accessor for completeness.
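As a hedged sketch of the contract being asserted (usage only, not the code from this change):
```
#include <ATen/core/ivalue.h>

// constValue() is only legal once the future has completed without error;
// hasValue() reports exactly that "completed, no error" state.
void example() {
  auto fut = c10::make_intrusive<c10::ivalue::Future>(c10::IntType::get());
  fut->markCompleted(c10::IValue(42));
  if (fut->completed() && !fut->hasError()) {
    const c10::IValue& v = fut->constValue();  // asserts in any other state
    TORCH_CHECK(v.toInt() == 42);
  }
}
```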
ghstack-source-id: 105815597
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit:
Differential Revision: D22021776
fbshipit-source-id: b59b6c775eab344068a76f4cd8c3a9dc1f2a174e
Summary:
- Adds a `torch.experimental.deterministic` flag to enforce deterministic algorithms across all of PyTorch.
- Adds `torch.experimental.deterministic_error_level` to let users choose between error/warning/silent when determinism for an operation is not available.
- Adds `torch.experimental.alert_not_deterministic()`, which should be called within operations that are not deterministic.
- Offers both Python and ATen interfaces.
Issue https://github.com/pytorch/pytorch/issues/15359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38683
Differential Revision: D21998093
Pulled By: ezyang
fbshipit-source-id: 23aabbddd20f6199d846f97764ff24d728163737
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39867
Support a list of filters in the subgraph rewriter; the rewrite executes only
when the match passes all filter checks. This is useful for different matches
to share the same filter.
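A rough usage sketch under stated assumptions (the filter callback is assumed to receive the match and the value map; the filter bodies and pattern strings are placeholders):
```
#include <torch/csrc/jit/passes/subgraph_rewrite.h>

// Placeholder filters: each returns true to accept a match, false to veto it.
bool filter_a(
    const torch::jit::Match& match,
    const std::unordered_map<std::string, torch::jit::Value*>& vmap) {
  return true;  // e.g. check that a matched value is a constant
}

bool filter_b(
    const torch::jit::Match& match,
    const std::unordered_map<std::string, torch::jit::Value*>& vmap) {
  return true;  // e.g. check that a matched value has a single use
}

void run_rewrite(std::shared_ptr<torch::jit::Graph>& graph,
                 const std::string& pattern,
                 const std::string& replacement) {
  torch::jit::SubgraphRewriter rewriter;
  rewriter.RegisterRewritePattern(pattern, replacement);
  // The rewrite fires only when a match passes *all* filters; the same
  // filters can be shared across different registered patterns.
  rewriter.runOnGraph(graph, {filter_a, filter_b});
}
```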
Test Plan: Imported from OSS
Differential Revision: D22009855
fbshipit-source-id: 67aab8d6326b2011a9061397699dc62ee9ad4e2d
Summary:
We've got quite a few things going on, preparing a push back to upstream so we don't get too desynced.
- Major refactor of transform replay. It is now far more robust and fixes bugs discovered in reductions. This prepares for extension to explicit broadcast ops, which will be the last major memory pattern needed for op coverage. Broadcast ops will allow us to express up to and potentially beyond norms and gemms.
- Initial runtime expression evaluator. This allows us to evaluate expressions at runtime. It will be useful for determining our grid/block layout at runtime, so we don't have to manually compute it according to the code we're trying to generate.
- Moving to int64 and double for scalar representations to match PyTorch JIT.
- Improvements in the codegen interface, where we return a Tensor-like object instead of the parent class Val.
- Add `addcmul` and `lerp` ops.
- General updates, fixes, test additions, and test improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39579
Differential Revision: D21974001
Pulled By: soumith
fbshipit-source-id: 7f7ccc91593466e948f3ce90f8f9b7fbc5c28de2
Summary:
Fix another simplification edge case: a Cond statement where one branch is nullptr and the other is a zero-statement block. This happens mostly with an if that has no else branch, where all statements inside the if have been removed (e.g. via inlining or simplification). A common case is SplitWithMask -> ComputeInline.
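A hedged sketch of the pattern, with tensorexpr names recalled from memory (treat the exact signatures as assumptions):
```
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

// An if with no else branch whose body has become an empty block
// (e.g. after inlining) should simplify away entirely.
void example() {
  VarHandle x("x", kInt);
  Stmt* empty_body = Block::make({});
  Stmt* cond = Cond::make(x > 0, empty_body, nullptr);
  Stmt* simplified = IRSimplifier::simplify(cond);
  // After this fix, `simplified` no longer contains the Cond.
  (void)simplified;
}
```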
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39754
Differential Revision: D21962987
Pulled By: nickgg
fbshipit-source-id: 2461415466fbbab88d2329061f90fcfdfa85e243
Summary:
Clearly expressing that a type is inferred by PyTorch rather than explicitly annotated by the user makes many error messages more user-friendly.
Currently Type has two string conversion methods: str() for IR printing and python_str() for serialization and error message generation. If we want to include more information in type printing while maintaining serialization/deserialization correctness, we need to split python_str() into annotation_str() and repr_str().
annotation_str() is solely responsible for serialization and strictly matches the format of Python type annotations. repr_str() is responsible for generating human-readable error messages that include information like "this type is inferred, not explicitly annotated".
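A hedged example of the intended split (not taken from the diff):
```
#include <ATen/core/jit_type.h>

// annotation_str() is the strict Python-annotation form used for
// serialization; repr_str() is free to add human-oriented context
// (e.g. that a type was inferred) for error messages.
void example() {
  auto t = c10::ListType::ofTensors();
  std::string for_serialization = t->annotation_str();  // "List[Tensor]"
  std::string for_errors = t->repr_str();               // may carry extra info
}
```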
Closes https://github.com/pytorch/pytorch/issues/39449
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39544
Differential Revision: D21978759
Pulled By: gmagogsfm
fbshipit-source-id: 733566f5a62e748b5ca4bb3c5943ebb6d5b664d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39497
Previously, we didn't consider side effects at all when moving nodes in alias analysis. It is never valid to reorder a node with a side effect. This has led to bugs when used with Bailouts.
Unfortunately this might cause regressions, but the prior behavior wasn't correct :/
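For illustration, a hedged sketch of the behavior this enforces (AliasDb names recalled from memory):
```
#include <torch/csrc/jit/ir/alias_analysis.h>

// A node with a side effect (e.g. prim::Print) must never be reordered,
// so topological-move queries should refuse to move it.
bool can_move(
    torch::jit::Node* side_effecting,
    torch::jit::Node* move_point,
    std::shared_ptr<torch::jit::Graph> graph) {
  torch::jit::AliasDb db(std::move(graph));
  // After this change this returns false whenever the node being moved
  // (or a node it would cross) has a side effect.
  return db.moveAfterTopologicallyValid(side_effecting, move_point);
}
```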
Test Plan: Imported from OSS
Differential Revision: D21963774
Pulled By: eellison
fbshipit-source-id: 656995d1b82534eca65437ed4e397b2bf08a4dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39597
To complement collectAll(), this change adds collectAny() and adds
relevant unit test coverage.
We also remove the vector-based helper version of collectAll(), added
in a previous change, which was of debatable usefulness.
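A hedged usage sketch (the exact way to construct the futures list and the function signatures are assumptions recalled from memory):
```
#include <ATen/core/ivalue.h>

// collectAll() yields a future that completes once every input future is
// complete; collectAny() completes as soon as the first one does.
void example(
    c10::intrusive_ptr<c10::ivalue::Future> f1,
    c10::intrusive_ptr<c10::ivalue::Future> f2) {
  c10::List<c10::intrusive_ptr<c10::ivalue::Future>> futs;
  futs.push_back(f1);
  futs.push_back(f2);
  auto all = c10::collectAll(futs);  // done when f1 and f2 are both done
  auto any = c10::collectAny(futs);  // done when either f1 or f2 is done
}
```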
ghstack-source-id: 105527180
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...
Differential Revision: D21910311
fbshipit-source-id: dbb3ca404672a3d751b1b3cf016e6084a9ff8040
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39119
Add some base C++ unit test coverage for ivalue::Future, and in
the process, add a basic collectAll() primitive, per 38937.
While doing so, I realized that List<Future> is effectively
impossible to construct (since the Future's type is not templated
but rather passed in, getTypePtr_<T>::call() isn't defined),
so I added a workaround in List to make it possible.
ghstack-source-id: 105309650
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...
Differential Revision: D21756884
fbshipit-source-id: 5d40c8d1c55098de5497655c7b887f4f56508a37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39607
Add an overload name for the strcmp macro to prevent duplicated op names in the lite interpreter.
Also reformatted some other files.
Test Plan:
Verified that these op schemas are changed:
```
-aten::eq(str a, str b) -> (bool)
+aten::eq.str(str a, str b) -> (bool)
-aten::ne(str a, str b) -> (bool)
+aten::ne.str(str a, str b) -> (bool)
-aten::lt(str a, str b) -> (bool)
+aten::lt.str(str a, str b) -> (bool)
-aten::gt(str a, str b) -> (bool)
+aten::gt.str(str a, str b) -> (bool)
-aten::le(str a, str b) -> (bool)
+aten::le.str(str a, str b) -> (bool)
-aten::ge(str a, str b) -> (bool)
+aten::ge.str(str a, str b) -> (bool)
```
Reviewed By: iseeyuan
Differential Revision: D21913049
fbshipit-source-id: 518db068c8c5b0efd19223f0bd94fc3351335dc4
Summary:
Mainly, fix a bug in the HashProvider where it would not include LoopOptions in the hash, meaning two loops would be seen as identical even if they were bound to different thread/block axes. Also added symbolic names for the different axis options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39408
Differential Revision: D21864494
Pulled By: nickgg
fbshipit-source-id: 9c28729984e7a3375e026c78294c9f75b9015123
Summary:
The two bugs were:
* Non-reduction axes were not added when inserting the new ReduceOp, meaning that if a reduction with non-reduce axes was rfactored we'd produce bad outputs. There were no tests of Rfactor with non-reduce axes, so I modified a test to cover this.
* The new statements were always prepended to the block, meaning writes to a buffer could be reordered after the usage of that buffer. This mostly happened in the case where we rfactor a previously rfactored reduction. There was a test of this, but since it only tested rfactoring the outer reduction axis, there were never any other statements at the insertion point (the tests of the insertion-point argument also do this). I added a new test which covers various rfactor-axis cases.
Also cleaned up tests, removed some helper code we don't need, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39268
Differential Revision: D21864489
Pulled By: nickgg
fbshipit-source-id: d314d20997a8472ec96b72f7a9068d6da6d2399c
Summary:
If the size of a temporary buffer is reduced to zero via binding of a dynamic variable, we still run the alloc, even though it is a no-op. It's easy to strip these out during simplification, so the expr:
```
{
  Allocate(x, int, {0});
  // Stuff...
  Free(x);
}
```
becomes
```
{
  // Stuff...
}
```
I am assuming here that if the allocation size is zero then any usage of the buffer is also eliminated, since there's no safe way to refer to a zero-size buffer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38794
Differential Revision: D21723656
Pulled By: nickgg
fbshipit-source-id: 3eaa8bd8974a13b0a351be04abe2348498b31b02
Summary:
Fixes a bug in reorder axis where we appended the new reordered loops to the enclosing block even if there were statements after them, e.g. with 3 Computes:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m2 ...
  for (int n2 ...
    for (int k2 ...
      Body 2
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
```
If we reordered loops m2 and k2, we would also end up reordering the bodies relative to each other, like this:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
for (int k2 ...
  for (int n2 ...
    for (int m2 ...
      Body 2
```
This is because we always append the new loops to their parent. This PR fixes the logic to replace the old loop root with the new loop, which keeps things consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38841
Differential Revision: D21723670
Pulled By: nickgg
fbshipit-source-id: 1dee8bb153182fcaa2cabd948197577e8e80acd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39265
In this PR we set the id of RecordFunction only when callbacks need it and when
there's at least one active callback.
Test Plan:
testRecordFunction unit test in test_misc.cpp
buck test mode/dev caffe2/test/cpp/jit:jit
https://our.intern.facebook.com/intern/testinfra/testrun/8725724291116413
Reviewed By: dzhulgakov
Differential Revision: D21790421
fbshipit-source-id: 016623d7f1a2a271921a71c0483061e232b40321
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39010
The initial version of the serialization for the TensorPipe RPC agent (i.e., the conversion from rpc::Message to tensorpipe::Message) worked around a limitation of TensorPipe of only allowing one payload per message by pickling each tensor separately and storing the pickles as metadata (which is a less efficient way of sending data over, as it goes through more copies). Having now lifted that limitation, we can improve the way we serialize. We now put the type and the id as their own payloads, we do a single pickling pass for all the tensors of the message (which allows us to deduplicate them), and we store the pickle as a payload. My impression is that pickling is a somewhat costly operation, so reducing the number of times we do it should be beneficial for performance. For this same reason, another change I've done here is to separate the allocation of the buffers from the deserialization. This will allow us (in the future) to perform the allocation on the I/O event loop but perform the unpickling in the worker thread, thus keeping the event loop more responsive.
ghstack-source-id: 104810740
Test Plan: RPC tests
Differential Revision: D21716067
fbshipit-source-id: c1475cc78afdcf0820a485ffd98c91abb35796c7
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/39020 by requiring users to type-hint default arguments of a TorchScript function when using the C++ frontend (the Python frontend inserts those automatically).
Since this is a bit of a niche use case, I opted for the simpler solution of making type-hints mandatory for default arguments, as opposed to trying to type-infer them. I left a comment in the code justifying this choice.
Test is included.
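A hedged sketch of what that looks like from the user side (API spelling assumed from the C++ frontend docs):
```
#include <torch/torch.h>

// When compiling TorchScript source from C++, the default argument must
// carry an explicit type hint; the Python frontend would infer `float` here.
void example() {
  auto cu = torch::jit::compile(R"JIT(
def scale(x, alpha: float = 2.0):
    return x * alpha
)JIT");
  auto out = cu->run_method("scale", torch::ones({2, 2}));
}
```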
/cc t-vi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39021
Differential Revision: D21755317
Pulled By: suo
fbshipit-source-id: e007650d3bfb3a4c58c25ad2c3a17759898f303b
Summary:
In `LoopNest::rfactor` we assume that there is only a single reduction below the insertion point, and when replacing the reduction we recursively replace all reductions below that point. This is not a safe assumption, as a number of transformations can introduce additional ReduceOps - most directly a `splitWithTail` on the innermost reduce axis.
This PR fixes that bug, and adds some unit tests covering the case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38733
Differential Revision: D21723634
Pulled By: nickgg
fbshipit-source-id: 3ed6ffcdc2c15aef7504f9b2b91e8d827e0b5d88
Summary:
We do try to eliminate empty For loops, but missed a case where the body Block exists but is empty. In that case we can eliminate the loop as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38883
Differential Revision: D21723680
Pulled By: nickgg
fbshipit-source-id: 49610b0524af5b9ec30ef3b4cc0c8461838259c3
Summary:
Adds reduction support for the code generator. Reductions are fully supported with split/merge/reorder/rfactor/computeAt/unroll operators. There is also cross thread (intra-block) reduction support.
The two remaining pieces missing for reduction support are:
- Safety: if cross-thread reduction was used, child operators shouldn't be able to bind that thread dim anymore.
- Cross-block reduction: we will want inter-block reduction support to match parity with TensorIterator.
The PR also provides FP16 support for fusions: we insert casts from FP16 inputs to FP32, and casts back to FP16 on FP16 outputs.
Also working towards reductions and shape inference for reductions in the fusion pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38627
Reviewed By: albanD
Differential Revision: D21663196
Pulled By: soumith
fbshipit-source-id: 3ff2df563f86c39cd5821ab9c1148149e5172a9e
Summary:
This PR removes the deferred initializer field from ReduceOp in favour of eagerly initializing buffers when they are created (either in the constructor of `LoopNest`, or in `rfactor()`). This allows a pretty good simplification of reduction logic, removing almost all of the reduction expander and the ReduceInitCleaner & unpopular NoOp node added in the last fix.
Eager initialization is better for us anyway because it allows more opportunities to transform the initialization loop.
Added a few more tests; testReduceOverSplitWithTail failed before this change due to a bug in splitWithTail which can no longer happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38585
Differential Revision: D21621551
Pulled By: nickgg
fbshipit-source-id: 378137e5723b4a6d6e390239efb12adce22a8215
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38592
I'm not sure that using couldMoveAfter was incorrect, but using
couldMoveBefore is more consistent with other subgraph-extraction
passes (old fuser, create autodiff graphs, etc.), so it would make it
easier to unify their implementations after this change.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D21607856
Pulled By: ZolotukhinM
fbshipit-source-id: 970583af7859889d48aacf620ae028258e37a75f
Summary:
Fixes a bug in the following code:
```
Tensor* c = Reduce("sum", {{10, "m"}}, Sum(), b, {{10, "n"}, {10, "k"}});
// split N loop with tail:
loop.splitWithTail(loop.getLoopStmtsFor(c)[1], 8, &outer, &inner, &tail);
```
When this is expanded there are two ReduceOps:
```
for (int m = 0; m < 10; m++) {
  for (int n_outer = 0; n_outer < (10 - 0) / 8; n_outer++) {
    for (int n_inner = 0; n_inner < 8; n_inner++) {
      for (int k = 0; k < 10; k++) {
        sum[m] = ReduceOp(sum, float(0), (sum[m]) + (b[m, n_outer * 8 + n_inner, k]), out_args={m}, reduce_args={n_inner, n_outer, k});
      }
    }
  }
  for (int n_tail = 0; n_tail < (10 - 0) % 8; n_tail++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = ReduceOp(sum, float(0), (sum[m]) + (b[m, n_tail + ((10 - 0) / 8) * 8, k]), out_args={m}, reduce_args={n_tail, k});
    }
  }
}
```
But each ReduceOp will expand its initializer, which in this case will overwrite the sum of the split loop:
```
for (int m = 0; m < 10; m++) {
  sum[m] = 0.f;
  for (int n_inner = 0; n_inner < 8; n_inner++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = (sum[m]) + (b[(100 * m + k) + 10 * n_inner]);
    }
  }
  sum[m] = 0.f; <------- *HERE*
  for (int n_tail = 0; n_tail < 2; n_tail++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = (sum[m]) + (b[((100 * m + k) + 10 * n_tail) + 80]);
    }
  }
}
```
The simplest fix is to remove the initializer from the tail loop, which requires adding support for Reductions without an initializer (I did this via adding a NoOp Expr rather than handling nullptr). Also moved the ReductionExpander from loopnest.cpp to reduction.h, as loopnest is getting a bit heavy.
Added tests for all kinds of splits on a simple 3D reduction to verify no more problems of this type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38420
Differential Revision: D21587583
Pulled By: nickgg
fbshipit-source-id: e0766934481917007119612eb60cc76c3242e44a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37474
Previously we would segfault
Test Plan: Imported from OSS
Differential Revision: D21297542
Pulled By: suo
fbshipit-source-id: c7e2f828a250c490ec23fb51c6a4a642d3370e52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37948
The input JIT graph has all the information we need to perform the
entire compilation at construction time. We don't need to postpone
any steps until execution time. Also, from the graph we always know
what device we will be executing on, and thus we don't need to have a
CodeGen cache in TensorExprKernel - we always have one and only one
CodeGen.
Test Plan: Imported from OSS
Reviewed By: protonu
Differential Revision: D21432145
Pulled By: ZolotukhinM
fbshipit-source-id: 8dc86b891713056b2c62f30170cd4a168912f027
Summary:
Implementation of the less popular proposal for eliminating overlap between LetStmt and Let: removing both and storing a mapping between Var and value Expr in the Block.
This complicates some tests but simplifies the IR by restricting where variable binding can occur.
I used the unit tests & Python integration tests to verify this is correct, but I'm unsure of coverage, particularly around the dependency checker in loopnest - ZolotukhinM, your review would be useful there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37606
Differential Revision: D21467483
Pulled By: nickgg
fbshipit-source-id: b402d3fce4cacf35d75f300f0a7dca32a43b6688
Summary:
In the IR Simplifier, when doing partial factorization of Round+Mod patterns we divide by the lower number, which could be zero. Add in a quick check against zero to avoid the crash.
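A hedged illustration of the degenerate input (tensorexpr spelling assumed):
```
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

// A Round+Mod pattern whose constant is zero: factorization divides by the
// smaller constant, so without the check this could divide by zero.
void example() {
  VarHandle x("x", kInt);
  ExprHandle e = (x / 0) * 0 + x % 0;
  ExprHandle simplified = IRSimplifier::simplify(e);  // must not crash
  (void)simplified;
}
```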
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38055
Differential Revision: D21478486
Pulled By: nickgg
fbshipit-source-id: c5083f672e91662b7d1271d817cade7fa6c39967
Summary:
The IR Simplifier exits early when working with dtypes that are not safe to reorder. There are some cases where we still want to simplify ops in these dtypes: x + 0, x - 0, x * 0, and x * 1. It's safe to eliminate the op here, and it reduces clutter in the expr.
Also added a quick simplification of casts which do nothing (their target type is the same as the underlying type).
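A hedged sketch of the newly allowed simplifications (tensorexpr spelling assumed):
```
#include <torch/csrc/jit/tensorexpr/ir_simplifier.h>

using namespace torch::jit::tensorexpr;

// Float is not safe to reorder, but eliminating these identity ops is still
// safe and removes clutter from the expression.
void example() {
  VarHandle x("x", kFloat);
  ExprHandle e = (x + 0.f) * 1.f;
  ExprHandle simplified = IRSimplifier::simplify(e);  // reduces to x
  // A cast whose target dtype equals the source dtype is likewise dropped.
  ExprHandle same_type_cast = Cast::make(kFloat, x);
  ExprHandle simplified_cast = IRSimplifier::simplify(same_type_cast);  // x
  (void)simplified;
  (void)simplified_cast;
}
```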
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37960
Differential Revision: D21457736
Pulled By: nickgg
fbshipit-source-id: 40e20a3b55fc1afb2ec50071812238a08bded2ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36291
Move profiler state to be a thread-local property and
reuse the existing thread-local propagation mechanism to ensure
correct profiling of async tasks. This also makes
push/pop callbacks thread safe and easier to use in e.g. the
distributed profiler.
Test Plan:
USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
python test/test_autograd.py
python test/test_jit.py
Differential Revision: D20938501
Pulled By: ilia-cher
fbshipit-source-id: c0c6c3eddcfea8fc7c14229534b7246a0ad25845
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37745
This PR makes it possible to set TLS callbacks and use
them transparently not only in the main thread but also
in any async tasks
Test Plan: Imported from OSS
Differential Revision: D21374873
Pulled By: ilia-cher
fbshipit-source-id: 3be2e121673b32d7694e17e794f3b474826dffe9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa