Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684
This PR attempts to address #42560 by capturing the appropriate
exception_ptr in the autograd engine and passing it over to the Future.
As part of this change, there is a significant change to the Future API: setError now only accepts an exception_ptr.
For the example in #42560, the exception trace would now look like:
```
> Traceback (most recent call last):
> File "test_autograd.py", line 6914, in test_preserve_backtrace
> Foo.apply(t).sum().backward()
> File "torch/tensor.py", line 214, in backward
> torch.autograd.backward(self, gradient, retain_graph, create_graph)
> File "torch/autograd/__init__.py", line 127, in backward
> allow_unreachable=True) # allow_unreachable flag
> File "torch/autograd/function.py", line 87, in apply
> return self._forward_cls.backward(self, *args)
> File "test_autograd.py", line 6910, in backward
> raise ValueError("something")
> ValueError: something
```
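A minimal reproduction of the test in the trace, reconstructed from the backtrace above (the exact test body is an assumption):
```
import torch

class Foo(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Identity-style forward; the interesting part is the failing backward.
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        raise ValueError("something")

t = torch.rand(10, requires_grad=True)
# The ValueError raised inside backward() now surfaces with the Python frames
# of the custom backward preserved, as shown in the trace above.
Foo.apply(t).sum().backward()
```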
ghstack-source-id: 111109637
Test Plan: waitforbuildbot
Reviewed By: albanD
Differential Revision: D23365408
fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43635
Intern the symbol; no functional changes. Aliasing needs to be looked at, but that should be done in a separate PR; this PR only changes the symbol.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23358806
Pulled By: eellison
fbshipit-source-id: f18bcd142a0daf514136f019ae607e4c3f45d9f8
Summary:
Insert the registerizer into the Cuda Codegen pass list, to enable scalar replacement and close the gap in simple reduction performance.
First up the good stuff, benchmark before:
```
Column sum Caffe2 NNC Simple Better
(10, 100) 5.7917 9.7037 6.9386 6.0448
(100, 100) 5.9338 14.972 7.1139 6.3254
(100, 10000) 21.453 741.54 145.74 12.555
(1000, 1000) 8.0678 122.75 22.833 9.0778
Row sum Caffe2 NNC Simple Better
(10, 100) 5.4502 7.9661 6.1469 5.5587
(100, 100) 5.7613 13.897 21.49 5.5808
(100, 10000) 21.702 82.398 75.462 22.793
(1000, 1000) 22.527 129 176.51 22.517
```
After:
```
Column sum Caffe2 NNC Simple Better
(10, 100) 6.0458 9.4966 7.1094 6.056
(100, 100) 5.9299 9.1482 7.1693 6.593
(100, 10000) 21.739 121.97 162.63 14.376
(1000, 1000) 9.2374 29.01 26.883 10.127
Row sum Caffe2 NNC Simple Better
(10, 100) 5.9773 8.1792 7.2307 5.8941
(100, 100) 6.1456 9.3155 24.563 5.8163
(100, 10000) 25.384 30.212 88.531 27.185
(1000, 1000) 26.517 32.702 209.31 26.537
```
Speedup is about 3-8x depending on the size of the data (increasing with bigger inputs).
The gap between NNC and Simple is closed or nearly eliminated; the remaining issue appears to be kernel launch overhead. Next up is getting us closer to the _Better_ kernel.
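For reference, a minimal sketch of how a column-sum timing like the one in the tables could be collected in eager mode; the actual benchmark harness and the Caffe2 / NNC / Simple / Better kernels it drives are not part of this diff, so this only illustrates the measurement, not the kernels being compared:
```
import torch

def time_colsum(rows, cols, iters=100):
    # Time a plain eager column sum on the GPU; the real benchmark drives the
    # Caffe2 / NNC / Simple / Better kernels, which are not shown here.
    x = torch.randn(rows, cols, device="cuda")
    for _ in range(10):          # warm-up so launch/compile overhead is excluded
        x.sum(dim=0)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        x.sum(dim=0)             # column sum
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) * 1000.0 / iters  # microseconds per call

for shape in [(10, 100), (100, 100), (100, 10000), (1000, 1000)]:
    print(shape, time_colsum(*shape))
```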
Integrating the registerizer required a lot of refactoring and bug fixes along the way:
* Refactored flattening of parallelized loops out of the CudaPrinter and into its own stage, so we can transform the graph in the stage between flattening and printing (where registerization occurs).
* Made AtomicAddFuser less pessimistic: it now recognizes that if an Add to a buffer depends on all used Block and Thread vars, it has no overlap and does not need to be atomic. This allows registerization to apply to these stores.
* Fixed PrioritizeLoad mutator so that it does not attempt to separate the Store and Load to the same buffer (i.e. reduction case).
* Moved CudaAnalysis earlier in the process, allowing later stages to use the analyzed bufs.
* Fixed a bug in the Registerizer where when adding a default initializer statement it would use the dtype of the underlying var (which is always kHandle) instead of the dtype of the Buf.
* Fixed a bug in the IRMutator where the logic for Allocate statements was inverted, so they were replaced only if they did not change.
* Added simplification of simple Division patterns to the IRSimplifier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42878
Reviewed By: glaringlee
Differential Revision: D23382499
Pulled By: nickgg
fbshipit-source-id: 3640a98fd843723abad9f54e67070d48c96fe949
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes, as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274
Reviewed By: malfet
Differential Revision: D23406961
Pulled By: Krovatkin
fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43732.
Requires importing the fft namespace in the C++ API, just like the Python API does, to avoid clobbering the torch::fft function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43749
Reviewed By: glaringlee
Differential Revision: D23391544
Pulled By: mruberry
fbshipit-source-id: d477d0b6d9a689d5c154ad6c31213a7d96fdf271
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43456
Introduce the OperatorGenerator template, which returns an optional Operator. It's null if the templated bool value is false.
RegisterOperators() is updated to take the optional Operator. A null Operator will not be registered.
With this update, selective operator registration can be done at compile time. Tests are added to show that an operator is registered if it's in a whitelist and is not registered if it's not.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D23283563
Pulled By: iseeyuan
fbshipit-source-id: 456e0c72b2f335256be800aeabb797bd83bcf0b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43173
With this change the fuser starts to generate typechecks for the inputs of
a fusion group. For each fusion group we generate a typecheck and an if
node: the true block contains the fused subgraph, and the false block
contains the unoptimized original subgraph.
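As a rough way to see the resulting guard structure from Python, one can script a small pointwise function, run it a few times so the profiling executor specializes it, and dump the optimized graph. The internal toggles and the exact shape of the output (a type check guarding an if-node with a fallback branch) are assumptions for this sketch, not something this diff documents:
```
import torch

torch._C._jit_override_can_fuse_on_cpu(True)  # let the fuser run on CPU for this sketch

@torch.jit.script
def fused(a, b):
    # A simple pointwise expression the tensor-expression fuser can turn into a fusion group.
    return (a * b + b).relu()

a, b = torch.randn(4, 256), torch.randn(4, 256)
for _ in range(5):   # run enough times for the profiling executor to specialize
    fused(a, b)

# The optimized graph should show the fusion group guarded by a type check and an
# if-node whose false branch holds the unoptimized fallback subgraph.
print(torch.jit.last_executed_optimized_graph())
```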
Differential Revision: D23178230
Test Plan: Imported from OSS
Reviewed By: eellison
Pulled By: ZolotukhinM
fbshipit-source-id: f56e9529613263fb3e6575869fdb49973c7a520b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43235
This functionality is needed when we do not want to lose track of
nodes/values as we merge and unmerge them into other nodes. For
instance, if we have a side data structure with some meta information
about values or nodes, this new functionality allows us to keep that
metadata up to date after merging and unmerging nodes.
Differential Revision: D23202648
Test Plan: Imported from OSS
Reviewed By: eellison
Pulled By: ZolotukhinM
fbshipit-source-id: 350d21a5d462454166f8a61b51d833551c49fcc9
Summary:
This diff normalizes for-loops that have non-zero loop starts to always start from 0. Given a for-loop, this normalization changes the loop start to be 0 and adjusts the loop end and all accesses to the index variable within the loop body appropriately.
This diff also adds tests for several cases of normalization and also tests normalization in conjunction with `splitwithTail` transformation.
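A plain-Python illustration of the transformation (not the NNC API itself): the loop start becomes 0 and every use of the index is shifted by the old start; the array and bounds below are made up for the example:
```
# Original loop: index runs over [5, 15).
A = [0] * 20
for x in range(5, 15):
    A[x] = x * 2

# Normalized loop: index runs over [0, 10); every use of the index is shifted by the old start.
B = [0] * 20
for x in range(10):
    B[x + 5] = (x + 5) * 2

assert A == B
```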
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43179
Reviewed By: nickgg
Differential Revision: D23220534
Pulled By: navahgar
fbshipit-source-id: 64be0c72e4dbc76906084f7089dea81ae07d6020
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43341
This removes the empty pretty_print(), since it overrides the implementation in the Module base class, which is not the intended behavior here.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D23244616
Pulled By: glaringlee
fbshipit-source-id: 94b8dfd3697dfc450f53b3b4eee6e9c13cafba7b
Summary:
Had a bunch of merged commits that shouldn't have been there; reverted them to prevent conflicts. Lots of new features; highlights are listed below.
**Overall:**
- Enables pointwise fusion; single (but N-D) broadcast -> pointwise fusion; and single (but N-D) broadcast -> pointwise -> single (but N-D) reduction fusion.
**Integration:**
- Separate "magic scheduler" logic that takes a fusion and generates code generator schedule
- Reduction fusion scheduling with heuristics closely matching eagermode (unrolling supported, but no vectorize support)
- 2-Stage caching mechanism, one on contiguity, device, type, and operations, the other one is input size->reduction heuristic
**Code Generation:**
- More generic support in code generation for computeAt
- Full rework of loop nest generation and Indexing to more generically handle broadcast operations
- Code generator has automatic kernel launch configuration (including automatic allocation of grid reduction buffers)
- Symbolic (runtime) tiling on grid/block dimensions is supported
- Simplified index generation based on user-defined input contiguity
- Automatic broadcast support (similar to numpy/pytorch semantics)
- Support for compile time constant shared memory buffers
- Parallelized broadcast support (i.e. block reduction -> block broadcast support)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43129
Reviewed By: mrshenli
Differential Revision: D23162207
Pulled By: soumith
fbshipit-source-id: 16deee4074c64de877eed7c271d6a359927111b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43069
The transformer C++ impl needs to put TransformerEncoderLayer/DecoderLayer and TransformerEncoder/TransformerDecoder in different headers, since TransformerEncoder/Decoder's options class needs TransformerEncoderLayer/DecoderLayer as an input parameter. The header files are split to avoid cyclic inclusion.
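The dependency forcing the split mirrors the Python API, where the encoder is constructed from an already-built layer; a small sketch of the standard torch.nn usage (nothing new in this diff):
```
import torch
from torch import nn

# TransformerEncoder is configured with an already-built TransformerEncoderLayer,
# mirroring the C++ options class that takes the layer as an input parameter.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(10, 32, 64)  # (sequence, batch, feature)
print(encoder(src).shape)      # torch.Size([10, 32, 64])
```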
Test Plan: Imported from OSS
Reviewed By: yf225
Differential Revision: D23139437
Pulled By: glaringlee
fbshipit-source-id: 3c752ed7702ba18a9742e4d47d049e62d2813de0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42880
Enable switching between and checking for training and eval mode for torch::jit::mobile::Module using train(), eval(), and is_training(), as already exists for torch::jit::Module.
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D23063006
Pulled By: ann-ss
fbshipit-source-id: b79002148c46146b6e961cbef8aaf738bbd53cb2
Summary:
This changes profiled types from being represented as:
`%23 : Float(4:256, 256:1, requires_grad=0, device=cpu) = prim::profile(%0)`
->
`%23 : Tensor = prim::profile[profiled_type=Float(4:256, 256:1, requires_grad=0, device=cpu)](%0)`
Previously, representing the profiled type directly in the IR made it very easy for optimizations to accidentally use profiled types without inserting the proper guards that ensure the specialized type is actually seen.
It would be a nice follow-up to extend this to prim::Guard as well; however, we have short-term plans to get rid of prim::Guard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43035
Reviewed By: ZolotukhinM
Differential Revision: D23120226
Pulled By: eellison
fbshipit-source-id: c78d7904edf314dd65d1a343f2c3a947cb721b32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42637
This commit enables sending non-CPU tensors through RPC using the
TensorPipe backend. Users can configure device mappings by calling
set_map_location on `TensorPipeRpcBackendOptions`. Internally,
the `init_rpc` API verifies the correctness of the device mappings. It
will shut down RPC if the check fails, or proceed and pass the global
mappings to `TensorPipeAgent` if the check succeeds. For serde,
we added a device indices field to the TensorPipe read and write buffers,
which should be either empty (all tensors must be on CPU) or match
the tensors of the RPC message in order and number. This commit
does not yet achieve zero-copy: the tensor is always moved to CPU
on the sender and then moved to the specified device on the receiver.
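A sketch of what configuring the mapping might look like on the caller side, going by the description above; the `set_map_location` name and its signature are taken from this summary and may differ from the final API, and the worker names and mapping are hypothetical:
```
import torch.distributed.rpc as rpc

opts = rpc.TensorPipeRpcBackendOptions()
# Map this worker's cuda:0 to cuda:1 on "worker1" (hypothetical worker name and mapping).
opts.set_map_location("worker1", {0: 1})

rpc.init_rpc(
    "worker0",
    rank=0,
    world_size=2,
    rpc_backend_options=opts,
)
```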
Test Plan: Imported from OSS
Reviewed By: izdeby
Differential Revision: D23011572
Pulled By: mrshenli
fbshipit-source-id: 62b617eed91237d4e9926bc8551db78b822a1187
Summary:
test_e2e_tensorpipe depends on ProcessGroupGloo and therefore cannot be tested with Gloo disabled;
otherwise, it re-introduces https://github.com/pytorch/pytorch/issues/42776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43041
Reviewed By: lw
Differential Revision: D23122101
Pulled By: malfet
fbshipit-source-id: a8a088b6522a3bc888238ede5c2d589b83c6ea94
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42133
Test Plan:
We save a module with module debugging information as follows.
```
import torch
m = torch.jit.load('./detect.pt')
# Save module without debug info
m._save_for_lite_interpreter('./detect.bc')
# Save module with debug info
m._save_for_lite_interpreter('./detect.bc', _save_debug_info_in_bytecode=True)
```
Size of the file without module debugging information: 4.508 MB
Size of the file with module debugging information: 4.512 MB
Reviewed By: kimishpatel
Differential Revision: D22803740
Pulled By: taivu1998
fbshipit-source-id: c82ea62498fde36a1cfc5b073e2cea510d3b7edb
Summary:
When working on the Cuda Codegen, I found that running the IRSimplifier before generating code led to test failures. This was due to a bug in Round+Mod simplification (e.g. (x / y * y) + (x % y) => x) related to the order in which the terms appeared. After fixing it and writing a few tests around those cases, I found another bug in simplification of the same pattern and fixed it as well (with some more test coverage).
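The identity being simplified can be sanity-checked in plain Python, using floor division to stand in for the IR's integer division; this is only an illustration of the pattern, not the simplifier's code:
```
# (x / y) * y + (x % y) == x for integer division, in either term order.
for x in range(-20, 20):
    for y in range(1, 7):
        assert (x // y) * y + (x % y) == x
        assert (x % y) + (x // y) * y == x  # same identity with the terms swapped
```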
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42934
Reviewed By: zhangguanheng66
Differential Revision: D23085548
Pulled By: nickgg
fbshipit-source-id: e780967dcaa7a5fda9f6d7d19a6b7e7b4e94374b
Summary:
A simple differentiable abstraction to allow testing of full training graphs.
Included in this 1st PR is an example of trivial differentiation.
If approved, I can add a full MLP and demonstrate convergence using purely NNC (for performance testing) in the next PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42548
Reviewed By: ZolotukhinM
Differential Revision: D23057920
Pulled By: bwasti
fbshipit-source-id: 4a239852c5479bf6bd20094c6c35f066a81a832e
Summary:
Adds a new optimization pass, the Registerizer, which looks for common Stores and Loads to a single item in a buffer and replaces them with a local temporary scalar, which is cheaper to write to.
For example it can replace:
```
A[0] = 0;
for (int x = 0; x < 10; x++) {
A[0] = (A[0]) + x;
}
```
with:
```
int A_ = 0;
for (int x = 0; x < 10; x++) {
A_ = x + A_;
}
A[0] = A_;
```
This is particularly useful on GPUs when parallelizing, since after replacing loops with metavars we have a lot of accesses like this. Early tests of simple reductions on a V100 indicate this can speed them up by ~5x.
This diff got a bit unwieldy with the integration code, so that will come in a follow-up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42606
Reviewed By: bertmaher
Differential Revision: D22970969
Pulled By: nickgg
fbshipit-source-id: 831fd213f486968624b9a4899a331ea9aeb40180
Summary:
Added a new option in AutogradContext to tell autograd not to materialize output grad tensors, that is, not to expand undefined/None tensors into tensors full of zeros before passing them as input to the backward function.
This PR is the second part that closes https://github.com/pytorch/pytorch/issues/41359. The first PR is https://github.com/pytorch/pytorch/pull/41490.
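On the Python side the corresponding switch is `ctx.set_materialize_grads(False)` from the first PR; a minimal sketch of a custom function opting out of materialization (the function body is made up for illustration):
```
import torch

class MulByTwo(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.set_materialize_grads(False)
        return x * 2, x + 0   # the second output goes unused downstream

    @staticmethod
    def backward(ctx, grad_out, grad_unused):
        # With materialization disabled, the grad for the unused output arrives
        # as None instead of a zero-filled tensor.
        assert grad_unused is None
        return grad_out * 2

x = torch.randn(3, requires_grad=True)
y, _ = MulByTwo.apply(x)
y.sum().backward()
print(x.grad)  # tensor of twos
```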
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41821
Reviewed By: albanD
Differential Revision: D22693163
Pulled By: heitorschueroff
fbshipit-source-id: a8d060405a17ab1280a8506a06a2bbd85cb86461
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42714
Change two unit tests for the lite trainer to register two instances/objects of the same submodule type instead of the same submodule object twice.
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D22990736
Pulled By: ann-ss
fbshipit-source-id: 2bf56b5cc438b5a5fc3db90d3f30c5c431d3ae77
Summary:
A while back, when commonizing the Let and LetStmt nodes, I ended up removing both and adding a separate VarBinding section to the Block. At the time I couldn't find a counterexample, but I found one today: dependencies between local Vars and Allocations may go in either direction, so we need to support interleaving of those statements.
So, I've removed all the VarBinding logic and reimplemented Let statements. ZolotukhinM, I think you get to say "I told you so". No new tests; existing tests should cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42634
Reviewed By: mruberry
Differential Revision: D22969771
Pulled By: nickgg
fbshipit-source-id: a46c5193357902d0f59bf30ab103fe123b1503f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522
Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.
There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which doesn't contain the auto-generated header we now added. We fix that by linking those targets against the `tensorpipe` CMake target, so that they pick up the include directories defined by TensorPipe, which do contain that auto-generated header.
I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.
Test Plan: CI
Reviewed By: malfet
Differential Revision: D22959472
fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570
ProfiledType doesn't do anything and is not used at the moment; removing it.
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D22938664
Pulled By: ilia-cher
fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0
Summary:
This PR creates a new namespace, torch.fft (torch::fft), and puts a single function, fft, in it. This function is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function.
Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python:
```
import torch.fft
t = torch.randn(128, dtype=torch.cdouble)
torch.fft.fft(t)
```
See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911
Reviewed By: glaringlee
Differential Revision: D22941894
Pulled By: mruberry
fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d
Summary:
Unroll a loop with constant bounds, replacing it with multiple
instances of the loop body. For example:
```
for x in 0..3:
A[x] = x*2
```
becomes:
```
A[0] = 0
A[1] = 2
A[2] = 4
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42465
Test Plan: `test_tensorexpr` unit tests.
Reviewed By: agolynski
Differential Revision: D22914418
Pulled By: asuhan
fbshipit-source-id: 72ca10d7c0b1ac7f9a3688ac872bd94a1c53dc51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42567
Before this change we didn't expand arguments, and thus in an expression like
`sigmoid(sigmoid(x))` only the outer call was expanded.
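For reference, the kind of expression in question as it would appear in a scripted function handed to the TE kernel; plain TorchScript, nothing new in this diff:
```
import torch

@torch.jit.script
def nested(x):
    # Before this change, only the outer sigmoid call was expanded by the
    # tensor-expression lowering; the inner one was not.
    return torch.sigmoid(torch.sigmoid(x))

print(nested(torch.randn(4)))
```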
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D22936177
Pulled By: ZolotukhinM
fbshipit-source-id: 9c05dc96561225bab9a90a407d7bcf9a89b078a1