Commit Graph

1611 Commits

Author SHA1 Message Date
David Berard
e86d8323cb [JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554

In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_analysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
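
A minimal sketch of the idea behind the special case (illustrative only; the input indices follow the batch_norm/instance_norm schemas, but the function itself is an assumption, not the actual `alias_analysis.cpp` code):
```cpp
#include <torch/csrc/jit/ir/constants.h>
#include <torch/csrc/jit/ir/ir.h>

// Input 5 is `training` for aten::batch_norm and `use_input_stats` for
// aten::instance_norm; inputs 3 and 4 are running_mean / running_var.
// Only when the flag is a constant `false` can alias analysis safely report
// that the running stats are not written to.
bool mayWriteToRunningStats(torch::jit::Node* node) {
  auto flag = torch::jit::constant_as<bool>(node->input(5));
  return !(flag.has_value() && !flag.value());
}
```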

Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```

Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```

Imported from OSS

Reviewed By: eellison

Differential Revision: D31612339

fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
2021-10-20 10:22:10 -07:00
Michael Suo
1bf0e1acb4 Revert D31732414: Add Initial NNC Dynamic Shapes Flow
Test Plan: revert-hammer

Differential Revision:
D31732414 (de4fe7a38c)

Original commit changeset: 290a94a667c2

fbshipit-source-id: 3021a1d7a8661967e37d4f9cfc86ed47cc4a7f3d
2021-10-19 20:05:29 -07:00
Elias Ellison
de4fe7a38c Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from somewhere else in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 will be preserved; otherwise,
dimensions with the same value will be bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, runs symbolic shape inference on the graph and creates a
versioning if in the graph, with prim::TensorExprDynamicGuard checking whether
the inputs at runtime match the generalized symbolic shapes that are inputs
to the TE kernel. The computation of all symbolic dimensions is inlined
into the if block with the TE kernel. All symbolic-dim Value* are
appended to the end of the TE kernel graph/node inputs, and the node is
augmented with an integer list attr `symbolic_shape_inputs` that gives the
mapping from Value* -> symbolic shape int64_t value. For more lengthy IR
examples and a walkthrough, look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis`. Returns true on success and false on failure; it can
fail if shape propagation fails to propagate the number of dims or if
complete shapes on the inputs are not set.
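
A rough, standalone illustration of the dimension-bucketing rule above (not the actual implementation; the negative IDs mirror the `SS(-N)` notation in the examples below):
```cpp
#include <cstdint>
#include <map>
#include <vector>

// Size-1 dims are preserved; equal sizes share the same symbolic id.
std::vector<std::vector<int64_t>> generalizeShapes(
    const std::vector<std::vector<int64_t>>& concrete_shapes) {
  std::map<int64_t, int64_t> size_to_sym;  // concrete size -> symbolic id
  int64_t next_sym = -1;                   // SS(-1), SS(-2), ...
  std::vector<std::vector<int64_t>> out;
  for (const auto& shape : concrete_shapes) {
    std::vector<int64_t> sym_shape;
    for (int64_t dim : shape) {
      if (dim == 1) {
        sym_shape.push_back(1);            // dims of value 1 are preserved
        continue;
      }
      auto it = size_to_sym.find(dim);
      if (it == size_to_sym.end()) {
        it = size_to_sym.emplace(dim, next_sym--).first;
      }
      sym_shape.push_back(it->second);     // same size -> same symbolic shape
    }
    out.push_back(std::move(sym_shape));
  }
  return out;
}
// e.g. {{5, 3}, {3, 1}} -> {{SS(-1), SS(-2)}, {SS(-2), 1}}
```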

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732414

Pulled By: eellison

fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2
2021-10-19 16:41:49 -07:00
gmagogsfm
147f7559b1 Add SourceView which doesn't own source text as base class of Source (#65309)
Summary:
This would save the cost of copying text from the stack to the heap in some cases
(like parsing function schemas during the loading phase of libtorch.so).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309

Reviewed By: swolchok

Differential Revision: D31060315

Pulled By: gmagogsfm

fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
2021-10-18 23:17:22 -07:00
Richard Barnes
e0643fa3fc use irange for loops 5 (#66744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable warning suppressions.
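
For reference, a minimal before/after of the conversion using `c10::irange` (illustrative; the loop body is a stand-in):
```cpp
#include <c10/util/irange.h>
#include <cstdio>

void print_indices(int64_t n) {
  // Old style:
  for (int64_t i = 0; i < n; i++) {
    std::printf("%lld\n", static_cast<long long>(i));
  }
  // New style produced by the upgrader script; `i` is const and its type is
  // deduced from the bound.
  for (const auto i : c10::irange(n)) {
    std::printf("%lld\n", static_cast<long long>(i));
  }
}
```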

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705358

fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
2021-10-18 21:59:50 -07:00
Will Constable
d05c1ec007 Add lazy Node base and associated infra (#66601)
Summary:
- Adds Node base class and unit tests
- Also adds metadata utils to enable source code annotation and scope tracking

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601

Test Plan: Add new unit tests

Reviewed By: desertfire

Differential Revision: D31634044

fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5
2021-10-18 19:09:42 -07:00
Ivan Yashchuk
0d203a16fe Add relative and absolute tolerances for matrix_rank, pinv (#63102)
Summary:
This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`.

Currently, only the tensor overload has default values for `atol` and `rtol`; the float overload requires both arguments to be specified.

FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509

Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618.
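
For intuition, the tolerance semantics the new keywords express can be sketched directly from singular values (a hypothetical helper, not the new overloads themselves):
```cpp
#include <torch/torch.h>
#include <algorithm>

// Hypothetical illustration: rank = number of singular values strictly above
// max(atol, rtol * sigma_max).
int64_t rank_with_tolerances(const torch::Tensor& A, double atol, double rtol) {
  auto S = std::get<1>(torch::svd(A));   // singular values
  double sigma_max = S.max().item<double>();
  double tol = std::max(atol, rtol * sigma_max);
  return (S > tol).sum().item<int64_t>();
}
```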

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102

Reviewed By: H-Huang

Differential Revision: D31641456

Pulled By: mruberry

fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c
2021-10-17 22:15:42 -07:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable warning suppressions.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Scott Wolchok
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395
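
A simplified sketch of the small-buffer idea (names and the exact layout are illustrative, not the actual `TupleElements` implementation):
```cpp
#include <ATen/core/ivalue.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: keep up to 3 elements inline; spill to a heap vector otherwise,
// so small tuples need no second heap allocation beyond the Tuple itself.
class SmallTupleElements {
  c10::IValue inline_[3];
  std::vector<c10::IValue> heap_;  // used only when size_ > 3
  size_t size_ = 0;

 public:
  explicit SmallTupleElements(std::vector<c10::IValue> elems)
      : size_(elems.size()) {
    if (size_ <= 3) {
      std::move(elems.begin(), elems.end(), inline_);
    } else {
      heap_ = std::move(elems);
    }
  }
  c10::IValue& operator[](size_t i) { return size_ <= 3 ? inline_[i] : heap_[i]; }
  size_t size() const { return size_; }
};
```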

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
Rohan Varma
06fa6c15c0 Back out "Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"" (#66393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393

Third try!

Fixes:
- test_nccl_timeout can be flaky because of its 1s timeout; bump up the timeout to resolve the flakiness. In general, though, we should not have been relying on time.sleep for this test; filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run tests due to a bug that caused multigpu tests not to be run. This has since been fixed.
ghstack-source-id: 140560113

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31534735

fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
2021-10-14 22:23:22 -07:00
soulitzer
93d326c868 Add InplaceOrView boxed kernel (#63878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878

See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:

In this PR:
 - Add a boxed kernel that replicates `gen_inplace_or_view`'s logic and is ONLY for use with the Autograd not-implemented kernel
   - Unlike `gen_inplace_or_view`, we always pass a view_func to as_view in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
   - This limitation makes the kernel unsuitable for general use
   - the view relationship must be between the first input (must be a tensor) and the first output (may be a tensor or a vec of tensors)
   - non-differentiable views (_values, _indices, view.dtype) are not supported - the view relationship is always fw and bw differentiable
 - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` as the interface for this feature (see the registration sketch after this list):
   - static initialization can be slowed down (not measured) if there are many registrations, because each line translates to 2 library calls; the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
 - Adds testing:
    - for views: view relationship created
      - performing an in-place operation on the view raises properly
      - trying to create two view relationships is not allowed
      - a single view relationship that is not between the first input and first output should error
      - view relation created properly for tensor-vector output
    - for in-place:
      - version count bump
      - triggers rebase_history
      - multiple mutations are okay and also update the version counter
 - TODO (follow up): Update tutorials for adding third-party operators (and document the above limitations)
 - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code
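
A hedged registration sketch for the manual workaround named above (the helper names come from this commit's text; their declarations, return type, and the `myops::my_op` schema are assumptions, not the actual API):
```cpp
#include <torch/library.h>

// Assumed declarations for the two helpers mentioned in this commit; in a real
// build they would come from the autograd fallback headers.
torch::CppFunction AutogradNotImplementedFallback();
torch::CppFunction ADInplaceOrViewFallback();

// Roughly what REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(myops, my_op) would
// expand to: one registration per dispatch key, i.e. two library calls per op.
TORCH_LIBRARY_IMPL(myops, Autograd, m) {
  m.impl("my_op", AutogradNotImplementedFallback());
}
TORCH_LIBRARY_IMPL(myops, ADInplaceOrView, m) {
  m.impl("my_op", ADInplaceOrViewFallback());
}
```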

EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30901714

Pulled By: soulitzer

fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
2021-10-12 18:55:50 -07:00
Kimish Patel
c6216b2a43 Back out "Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source" (#66421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421

Original commit changeset: ab6bb8fe4e83

Plus this includes the BUILD.bazel changes, which were the reason for the revert.

Test Plan: See original diff

Reviewed By: gdankel

Differential Revision: D31542513

fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
2021-10-12 10:55:29 -07:00
Animesh Jain
cc24e4e5d0 [NNC] Normalize loops in SplitWithTail (#66242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242

While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that vectorization calls SplitWithTail, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it.
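
Conceptually, normalization shifts the loop to start at zero before splitting; a plain C++ illustration (not the NNC API):
```cpp
#include <cstdio>

void body(int i) { std::printf("%d\n", i); }  // stand-in for the loop's statements

// Loop with a non-zero start, which splitting-with-tail handled incorrectly:
void original(int start, int stop) {
  for (int i = start; i < stop; i++) { body(i); }
}

// Normalized equivalent: iterate from zero and recover the original index,
// so the split/tail arithmetic can assume a zero lower bound.
void normalized(int start, int stop) {
  for (int j = 0; j < stop - start; j++) { body(j + start); }
}
```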

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31506853

Pulled By: anijain2305

fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1
2021-10-11 13:44:05 -07:00
Nikita Shulga
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake, which is just a
bunch of scripts in `FindCUDA.cmake`. Newer CMake versions support CUDA natively
as a language, just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
Jane Xu
0a48f56318 Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"
Test Plan: revert-hammer

Differential Revision:
D31299350 (f1f3bd8c36)

Original commit changeset: 9ad5c8fa17f7

fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317
2021-10-08 17:55:28 -07:00
Jane Xu
c62ed96496 Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source
Test Plan: revert-hammer

Differential Revision:
D30710710 (c1343ff706)

Original commit changeset: 51399f9b0b64

fbshipit-source-id: ab6bb8fe4e83ed1052e621e427259192a4f0f540
2021-10-08 17:46:18 -07:00
Rohan Varma
f1f3bd8c36 Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883

Original commit changeset: d8e962b8aab6
ghstack-source-id: 139836954

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D31299350

fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130
2021-10-08 16:04:20 -07:00
Kimish Patel
c1343ff706 [Pytorch Edge] Support profiling kineto events from external source (#64397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397

This diff exposes a way to add events to the kineto profiler from an external
source.
This can be a backend that executes a subgraph and wants to record this
execution in the kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.

Test Plan:
test_lite_interpreter

Imported from OSS

Reviewed By: raziel

Differential Revision: D30710710

fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
2021-10-08 15:59:42 -07:00
Raghavan Raman
92ce188510 Revert D31445799: [nnc] Use given kernel function name while emitting code
Test Plan: revert-hammer

Differential Revision:
D31445799 (c30dc52739)

Original commit changeset: 8d1642098313

fbshipit-source-id: 6b9d8c816437e9fcba8eb19cc683bc0a46a04cf5
2021-10-08 12:39:01 -07:00
Raghavan Raman
2e6fa0261f Revert D31445797: [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination
Test Plan: revert-hammer

Differential Revision:
D31445797 (7e5ef5e517)

Original commit changeset: 4e1450100928

fbshipit-source-id: fc13b34dbb66c7a22816eb46cf6d98ae9f332d39
2021-10-08 12:38:59 -07:00
Scott Wolchok
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
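
The general shape of the change, independent of the JIT type hierarchy (a generic sketch, not the actual signatures):
```cpp
#include <memory>

struct Type {
  virtual ~Type() = default;

  // Old style: callers must produce a shared_ptr<Type>, which means a copy and
  // a refcount bump when converting from shared_ptr<FooType>.
  bool isSubtypeOfOld(const std::shared_ptr<Type>& /*other*/) const { return false; }

  // New style: a const reference is enough to inspect the other type, so a
  // `const FooType&` returned by FooType::get() can be passed with no copies.
  bool isSubtypeOfNew(const Type& /*other*/) const { return false; }
};
```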

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
Raghavan Raman
7e5ef5e517 [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31445797

Pulled By: navahgar

fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed
2021-10-07 13:17:11 -07:00
Raghavan Raman
c30dc52739 [nnc] Use given kernel function name while emitting code (#66216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216

Test Plan: Imported from OSS

Reviewed By: dagitses, priyaramani

Differential Revision: D31445799

Pulled By: navahgar

fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281
2021-10-07 13:15:46 -07:00
Will Constable
a8c0b362ce [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core" (#66181)
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181

Original commit changeset: 3d0d5377d71e

Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files

Reviewed By: suo

Differential Revision: D31416438

fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
2021-10-07 10:05:26 -07:00
Chen Lai
a5895f85be [PyTorch Edge][type] Add type check in compatibility api (#63129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129

1. Add an api to get `supported_types` from the runtime, exposed in C++ only.
2. Add an api to get `contained_types` from the model, exposed in both C++ and Python.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing the Python string.
4. Expand the `is_compatible` api to check types. When checking types, it compares the contained type list from the model against the supported type list from the runtime.
5. Expand the unittest for compatibility to cover types.
6. Add a unit test in Python to check the type list.
ghstack-source-id: 139826944

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'

buck test //caffe2/test:mobile
```

Reviewed By: iseeyuan

Differential Revision: D30231419

fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
2021-10-06 02:23:44 -07:00
Michael Suo
f062def486 Revert D31260343: [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core
Test Plan: revert-hammer

Differential Revision:
D31260343 (e94fea08d0)

Original commit changeset: 8bb1194188e3

fbshipit-source-id: 3d0d5377d71ed928015bcb2105801be368e38cd8
2021-10-05 17:15:50 -07:00
Will Constable
e94fea08d0 Add hash and int128 utils for Lazy Tensor Core (#65635)
Summary:
These utils are prerequisites for Lazy Node base class.

- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635

Reviewed By: alanwaketan

Differential Revision: D31260343

Pulled By: wconstab

fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e
2021-10-05 16:43:55 -07:00
Nikita Shulga
4c4525fa5c Compile without -Wno-unused-variable (take 2) (#66041)
Summary:
Delete `-Wno-unused-variable` from the top-level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused-variable warnings in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
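
The suppression patterns described above look roughly like this (illustrative snippets only):
```cpp
#include <c10/macros/Macros.h>
#include <c10/util/irange.h>

// Range loop where only the iteration count matters:
void repeat(int n) {
  for (const auto i : c10::irange(n)) {
    (void)i;  // suppress the unused-variable warning for the loop index
  }
}

// Global constructor kept only for its side effects:
C10_UNUSED static bool registered = []() { return true; }();

// Global constant: constexpr instead of static lets the compiler fold it away
// without an unused-variable warning.
constexpr int kMaxThreads = 16;
```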

Do not delete `caffe2::OperatorBase::Output` calls as they have side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041

Reviewed By: ngimel

Differential Revision: D31360142

Pulled By: malfet

fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
2021-10-04 20:39:39 -07:00
Don Jang
7941590a51 [JIT] Selectively enable precise alias analysis for TupleConstruct (#66025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025

This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)), limiting its exposure to only `StaticRuntime` as of now.

Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).

Reviewed By: eellison

Differential Revision: D31350285

fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
2021-10-01 20:42:22 -07:00
Nikita Shulga
e4ee5ca698 Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable
Test Plan: revert-hammer

Differential Revision:
D31326599 (a6280ab653)

Original commit changeset: 924155f1257a

fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf
2021-10-01 20:40:47 -07:00
Nikita Shulga
a6280ab653 Compile without -Wno-unused-variable (#65954)
Summary:
Delete `-Wno-unused-variable` from the top-level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused-variable warnings in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954

Reviewed By: ngimel

Differential Revision: D31326599

Pulled By: malfet

fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
2021-10-01 17:40:47 -07:00
Hariom Narang
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from the Python
  API for now. We can add file-path passing later, maybe.
- Put the class `JitLoggingConfig` in the cpp file, as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
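
A small usage sketch of the C++ API above (the header path and the exact parameter type are assumptions):
```cpp
#include <sstream>
#include <torch/csrc/jit/jit_log.h>  // assumed home of the logging stream API

void capture_jit_log() {
  // Redirect JIT logging (GRAPH_DUMP / GRAPH_DEBUG output) into a string
  // stream instead of the default stream.
  std::ostringstream captured;
  torch::jit::set_jit_logging_output_stream(captured);
  // ... run passes; inspect captured.str() afterwards ...
}
```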

Testing:
- Tested python API locally.
- Unit test for the C++ API is written

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
Michael Suo
33c03cb61a [deploy][1/n] Make deploy code conform to PyTorch style. (#65861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861

First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D31291183

Pulled By: suo

fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
2021-09-30 22:59:47 -07:00
Mikhail Zolotukhin
3a0165da49 [TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551

Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.

Why? This helps us with at least two things:
1) With this approach we can easily check whether we know how to handle a
given node in advance - i.e. we can inspect the entire graph and tell
whether it's possible to compile it or not without actually trying to do
that and dying in the middle. This would allow us to, say, provide
user-friendly error messages in the AOT workflow.
2) We can switch to using the schema instead of the op kind to determine the
correct lowering. Unlike the op schema, the op kind might be ambiguous
(see e.g. #64963), and using it instead of the schema can lead to bugs.
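
A generic sketch of the switch-to-registry change (the key and function types below are simplifications, not the actual NNC lowering signatures):
```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Simplified stand-in for the value a lowering produces.
struct LoweredExpr {};
using LoweringFn = std::function<LoweredExpr()>;

// Registry keyed by operator schema string instead of a big switch on op kind.
std::unordered_map<std::string, LoweringFn>& loweringRegistry() {
  static std::unordered_map<std::string, LoweringFn> registry;
  return registry;
}

// "Can we compile this node?" becomes a lookup, so a whole graph can be
// vetted up front instead of failing partway through compilation.
bool hasLoweringFor(const std::string& schema) {
  return loweringRegistry().count(schema) != 0;
}
```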

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148926

Pulled By: ZolotukhinM

fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
2021-09-30 22:56:18 -07:00
Don Jang
cd458fe092 [JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879

This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.

The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` if it does not let its elements escape. However, this change focuses on only the simplest, but frequently used, use case: tuples constructed solely to be returned from a graph.

Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`

to cover the newly added code.

Reviewed By: eellison

Differential Revision: D30437737

fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
2021-09-29 21:56:31 -07:00
Mike Ruberry
91f8755b0e Revert D31005792: [NCCL] Init dummy NCCL comms in constructor
Test Plan: revert-hammer

Differential Revision:
D31005792 (2b22a5dde2)

Original commit changeset: c2c582dee25a

fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27
2021-09-29 20:46:38 -07:00
Rohan Varma
2b22a5dde2 [NCCL] Init dummy NCCL comms in constructor (#65173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173

Initializes dummy NCCL communicators in the constructor as a basic health
check that communicators can be initialized prior to launching the first
collective.

After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31005792

fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
2021-09-29 15:36:54 -07:00
Scott Wolchok
ece25c453f [PyTorch] Store Argument::alias_info_ on the heap (#64824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824

See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896
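
The space-vs-indirection tradeoff can be sketched as follows (the field layout is illustrative, not the real `c10::Argument`):
```cpp
#include <c10/util/Optional.h>
#include <memory>
#include <string>

struct AliasInfoSketch {  // stand-in for the real alias metadata
  bool isWrite = false;
  // ... before/after alias sets ...
};

// Before: the optional embeds a full AliasInfoSketch in every argument, even
// though most arguments carry no alias annotation.
struct ArgumentBefore {
  std::string name;
  c10::optional<AliasInfoSketch> alias_info;
};

// After: a pointer keeps the argument small and cheap to construct in the
// common case; the rare annotated arguments pay one heap allocation, which the
// compiler-ish paths that read alias info can tolerate.
struct ArgumentAfter {
  std::string name;
  std::unique_ptr<AliasInfoSketch> alias_info;  // nullptr == no alias annotation
};
```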

Test Plan: CI; profile schema parsing on startup and see far fewer page faults in createArgumentVector.

Reviewed By: suo

Differential Revision: D30860719

fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
2021-09-24 17:00:51 -07:00
Peter Bell
68e5935498 Remove fgrad_input from slow_conv2d (#64280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30830887

Pulled By: jbschlosser

fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4
2021-09-24 14:27:39 -07:00
XiaobingSuper
1682722152 keep output type after calling SubgraphRewriter (#65453)
Summary:
The JIT **SubgraphRewriter** doesn't keep the output type after overwriting the old graph. For example, in profiling mode the old graph has the old operator's shapes, but after replacing the old operator with a newer one via **SubgraphRewriter**, the tensor shape info is eliminated.

The motivation is that I want to replace the PyTorch convolution with a custom convolution. I first register **aten::_convolution** as a profiled node that records the input and output shapes, and then use graph rewriting to replace it with **aten::conv2d**, at which point the tensors' shape info is eliminated. I want to use the input sizes to do some pre-processing before replacing **aten::conv2d** with the custom convolution.

Before rewrite:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, %weight, %4, %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py:24:0
  return (%16)
```
After the rewrite using **aten::conv2d**:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Expected result after replacing **aten::_convolution** with **aten::conv2d**:

```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453

Reviewed By: zdevito

Differential Revision: D31162489

Pulled By: ZolotukhinM

fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49
2021-09-24 11:07:40 -07:00
kshitij12345
a012216b96 [nn] Fold : no batch dim (#64909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64907
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909

Reviewed By: cpuhrsch, heitorschueroff

Differential Revision: D30991087

Pulled By: jbschlosser

fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3
2021-09-23 08:37:32 -07:00
jiej
127c9402d0 Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137)
Summary:
This reverts commit 03389dc851.

Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745
Fixes the Windows build failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137

Reviewed By: seemethere, dzhulgakov, heitorschueroff

Differential Revision: D30994556

Pulled By: malfet

fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d
2021-09-22 04:54:51 -07:00
Chen Lai
880098a7e3 [PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. Add a unit test to cover the backport function.
5. Update the expected result from 4 to 3 in the unit test DefaultArgsWithOutArg to cover the number of specified arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651

ghstack-source-id: 138539912

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```

Reviewed By: raziel, tugsbayasgalan

Differential Revision: D30454080

fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
2021-09-20 22:50:30 -07:00
Mengwei Liu
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This is a follow-up to this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
Jane Xu
1ee66a5278 Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070

Reviewed By: malfet

Differential Revision: D30966464

Pulled By: janeyx99

fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
2021-09-17 12:28:23 -07:00
Raghavan Raman
bbe25af0df [nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30954655

Pulled By: navahgar

fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16
2021-09-17 11:28:48 -07:00
Raghavan Raman
03fc636d5c [nnc] Updated inliner to remove assertions and exception (#64719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30828583

Pulled By: navahgar

fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633
2021-09-17 11:28:46 -07:00
Edward Yang
9601deb1b3 Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147

I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203

Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/, sshed into the host, and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`

Reviewed By: soulitzer

Differential Revision: D30992685

fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
2021-09-17 08:32:43 -07:00
Eli Uriegas
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00