Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44000
This wasn't documented, so add a doc saying all ranks are used when
ranks=None
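For illustration, a minimal sketch of the documented behavior (assuming this refers to `torch.distributed.new_group`; the single-process gloo setup is only to make the snippet self-contained):
```python
import os
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
dist.init_process_group("gloo", rank=0, world_size=1)

# With ranks=None, the new group contains all ranks of the default group.
group = dist.new_group(ranks=None)
```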
ghstack-source-id: 111206308
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D23465034
fbshipit-source-id: 4c51f37ffcba3d58ffa5a0adcd5457e0c5676a5d
Summary:
* Implement tuple sort by traversing contained IValue types and generating a lambda function as the comparator for sort.
* Tuples and class objects can now nest arbitrarily within each other and still be sortable
Fixes https://github.com/pytorch/pytorch/issues/43219
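A minimal sketch of the newly sortable case:
```python
import torch
from typing import List, Tuple

@torch.jit.script
def sort_pairs(pairs: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    # Tuples are compared element-wise, as in Python; the generated
    # comparator traverses the contained IValue types.
    return sorted(pairs)

print(sort_pairs([(3, "c"), (1, "a"), (2, "b")]))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```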
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448
Reviewed By: eellison
Differential Revision: D23352273
Pulled By: gmagogsfm
fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773
The model is created and prepared using fx APIs and then scripted for training.
In order to test QAT on a scripted model, we need to be able to disable/enable
its fake_quant and observer modules.
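A hedged sketch of the toggling being tested (using the eager prepare_qat path rather than the fx API mentioned above, purely to keep the snippet self-contained):
```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        return self.conv(x)

m = M().train()
m.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
m = torch.quantization.prepare_qat(m)
scripted = torch.jit.script(m)

# Toggle fake-quant and observer submodules on the scripted model.
scripted.apply(torch.quantization.disable_fake_quant)
scripted.apply(torch.quantization.disable_observer)
scripted.apply(torch.quantization.enable_fake_quant)
scripted.apply(torch.quantization.enable_observer)
```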
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23741354
fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
Summary:
[test all]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44330
Part of relanding PR #41954, this refactor separates initialize_bucket_views and populate_bucket_views_out, as they do different things and are called by different callsites as well
ghstack-source-id: 112257271
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D23583347
fbshipit-source-id: a5f2041b2c4f2c2b5faba1af834c7143eaade938
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44393
torch.quantile now correctly propagates NaN, and torch.nanquantile is implemented, similar to numpy.nanquantile.
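A small illustration of the behavior described:
```python
import torch

t = torch.tensor([1., 2., float("nan"), 4.])
print(torch.quantile(t, 0.5))     # tensor(nan): NaN propagates
print(torch.nanquantile(t, 0.5))  # tensor(2.): NaN values are ignored
```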
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23649613
Pulled By: heitorschueroff
fbshipit-source-id: 5201d076745ae1237cedc7631c28cf446be99936
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33394 .
This PR does two things:
1. Implement CUDA scatter reductions with revamped GPU atomic operations.
2. Remove support for divide and subtract for CPU reduction, as was discussed with ngimel.
I've also updated the docs to reflect the existence of only multiply and add.
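For illustration, a minimal sketch of the remaining reductions (assuming the in-place `scatter_` overload with a `reduce` argument):
```python
import torch

base = torch.zeros(3)
index = torch.tensor([0, 1, 0])
src = torch.tensor([1., 2., 3.])

# Only "add" and "multiply" are supported; "divide" and "subtract" are gone.
base.scatter_(0, index, src, reduce="add")
print(base)  # tensor([4., 2., 0.])
```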
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41977
Reviewed By: mruberry
Differential Revision: D23748888
Pulled By: ngimel
fbshipit-source-id: ea643c0da03c9058e433de96db02b503514c4e9c
Summary:
Enabled type checking in common_distributed by using tensors of ints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44821
Test Plan: Run python test/test_type_hints.py; errors are no longer ignored by mypy.ini
Reviewed By: walterddr
Differential Revision: D23747466
Pulled By: alanadakotashine
fbshipit-source-id: 820fd502d7ff715728470fbef0be90ae7f128dd6
Summary:
Adds a new optimization to the IRSimplifier which changes this pattern:
```
for ...
if ...
do thing;
```
into:
```
if ...
for ...
do thing;
```
This should be almost strictly better.
There are many cases where this isn't safe to do, hence the tests; most obviously, when the condition depends on something modified within the loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44764
Reviewed By: mruberry
Differential Revision: D23734463
Pulled By: nickgg
fbshipit-source-id: 51617e837de96b354fb702d0090ac65ddc523d36
Summary:
PyObject_IsSubclass may set the Python live-exception bit if the given object is not a class. `IsNamedTuple` is currently using it incorrectly, which may trip up all subsequent Python operations in a debug-build Python. A normal release-build Python is not affected because `assert` is a no-op in release builds.
Fixes https://github.com/pytorch/pytorch/issues/43577
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44769
Reviewed By: jamesr66a
Differential Revision: D23725584
Pulled By: gmagogsfm
fbshipit-source-id: 2dabd4f8667a045d5bf75813500876c6fd81542b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44586
**Summary**
This commit disallows plain `Optional` type annotations without
any contained types both in type comments and in-line as
Python3-style type annotations.
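A small sketch of the accepted form (a bare `Optional` is now rejected at script time):
```python
import torch
from typing import Optional

@torch.jit.script
def f(x: Optional[int]) -> int:
    # `x: Optional` without a contained type would now be a compile error.
    return 0 if x is None else x

print(f(None), f(3))  # 0 3
```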
**Test Plan**
This commit adds a unit test for these two situations.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23721517
Pulled By: SplitInfinity
fbshipit-source-id: ead411e94aa0ccce227af74eb0341e2a5331370a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43796
This diff adds an option for the process group NCCL backend to pick high priority cuda streams.
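A hedged sketch of the intended usage (the `Options` field name is assumed from this diff's description; constructing the process group requires a real store, rank, and size on a CUDA machine):
```python
import torch.distributed as dist

opts = dist.ProcessGroupNCCL.Options()
opts.is_high_priority_stream = True  # assumed field: pick high-priority CUDA streams
# pg = dist.ProcessGroupNCCL(store, rank, world_size, opts)
```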
Test Plan: waitforsandcastle
Reviewed By: jiayisuse
Differential Revision: D23404286
fbshipit-source-id: b79ae097b7cd945a26e8ba1dd13ad3147ac790eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44577
I would like to move this to cmake so that I can depend on it
happening from other parts of the build.
This PR pulls out the logic for determining the version string and
writing the version file into its own module. `setup.py` still receives
the version string and uses it as before, but now the code for writing
out `torch/version.py` lives in a custom command in torch/CMakeLists.txt
I noticed a small inconsistency in how version info is populated.
`TORCH_BUILD_VERSION` is populated from `setup.py` at configuration
time, while `torch/version.py` is written at build time. So if, e.g., you
configured cmake on one git rev and then built on another, the
two versions would be inconsistent.
This does not appear to matter, so I opted to preserve the existing
behavior.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23734781
Pulled By: suo
fbshipit-source-id: 4002c9ec8058503dc0550f8eece2256bc98c03a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44585
**Summary**
This commit disallows plain `Tuple` type annotations without any
contained types both in type comments and in-line as Python3-style
type annotations.
**Test Plan**
This commit adds a unit test for these two situations.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23721515
Pulled By: SplitInfinity
fbshipit-source-id: e11c77a4fac0b81cd535c37a31b9f4129c276592
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44584
**Summary**
This commit extends the work done in #38130 and disallows plain
Python3-style `List` type annotations.
**Test Plan**
This commit extends `TestList.test_no_element_type_annotation` to the
Python3-style type annotation.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23721514
Pulled By: SplitInfinity
fbshipit-source-id: 48957868286f44ab6d5bf5e1bf97f0a4ebf955df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44334
**Summary**
This commit detects and prohibits the case in which `typing.Dict` is
used as an annotation without type arguments (i.e. `typing.Dict` rather than `typing.Dict[K, V]`).
At present, `typing.Dict` is always assumed to have two arguments, and
when it is used without them, `typing.Dict.__args__` is nonempty and
contains some `typing.TypeVar` instances, which have no JIT type equivalent.
Consequently, trying to convert `typing.Dict` to a JIT type results in
a `c10::DictType` with `nullptr` for its key and value types, which can cause
a segmentation fault.
This is fixed by returning a `DictType` from
`jit.annotations.try_ann_to_type` only if the key and value types are converted
successfully to a JIT type and returning `None` otherwise.
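A small sketch of the accepted form (a bare `Dict` annotation is now a compile-time error rather than a potential segfault):
```python
import torch
from typing import Dict

@torch.jit.script
def f(d: Dict[str, int]) -> int:
    # `d: Dict` without key/value types would now be rejected up front.
    return d["a"]

print(f({"a": 1}))  # 1
```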
**Test Plan**
This commit adds a unit test to `TestDict` that checks that plain `Dict`
annotations throw an error.
**Fixes**
This commit closes #43530.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23610766
Pulled By: SplitInfinity
fbshipit-source-id: 036b10eff6e3206e0da3131cfb4997d8189c4fec
Summary:
Unifies a number of partial solutions to the thread and block dimension extent masking, including the NoThreadIdxWriter and my last fix https://github.com/pytorch/pytorch/issues/44325. The NoThreadIdxWriter is gone in favour of tracking the current loop extents and masking any statements that have a lower rank than the launch parameters in any Block or Thread dimension, which handles both the "no" and "smaller" axis binding cases.
For example it will transform the following:
```
for i in 0..10 // blockIdx.x
for j in 0..10 // threadIdx.x
do thing(i, j);
for k in 0..5 // threadIdx.x
do other thing(i, k);
```
Into:
```
do thing(blockIdx.x, threadIdx.x);
if (threadIdx.x < 5) {
do other thing(blockIdx.x, threadIdx.x);
}
```
It also handles the case where statements are not bound by any axis, e.g.
```
do outer thing;
for i in 0..10 // blockIdx.x
for j in 0..10 // threadIdx.x
do thing(i, j);
do other thing(i);
```
will become:
```
if (blockIdx.x < 1) {
if (threadIdx.x < 1) {
do outer thing;
}
}
syncthreads();
do thing(blockIdx.x, threadIdx.x);
syncthreads();
if (threadIdx.x < 1) {
do other thing(blockIdx.x);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44733
Reviewed By: mruberry
Differential Revision: D23736878
Pulled By: nickgg
fbshipit-source-id: 52d08626ae8043d53eb937843466874d479a6768
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44703
The description of this public function should be in the header file.
Also fix some typos.
Test Plan: N/A.
Reviewed By: pritamdamania87
Differential Revision: D23703661
fbshipit-source-id: 24ae63de9498e321b31dfb2efadb44183c6370df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44663
The new API returns the type of the data object referenced by this
`RRef`. On the owner, this is the same as `type(rref.local_value())`.
On a user, this will trigger an RPC to fetch the `type` object from
the owner. After this function is run once, the `type` object is
cached by the `RRef`, and subsequent invocations no longer trigger
RPC.
closes #33210
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D23691990
Pulled By: mrshenli
fbshipit-source-id: a2d87cd601a691dd75164b6bcd7315245e9cf6bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44439
Adds a test to ddp_under_dist_autograd_test to ensure that the uneven
inputs join() API works properly when DDP + RPC are combined. We test that when
running in "outside DDP" mode (DDP applied to the whole hybrid module) we can
correctly process uneven inputs across different trainers.
ghstack-source-id: 112156980
Test Plan: CI
Reviewed By: albanD
Differential Revision: D23612409
fbshipit-source-id: f1e328c096822042daaba263aa8747a9c7e89de7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749
Ensure fx module is scriptable after calling prepare_qat on it
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23718380
fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
Summary:
The subclass sets "self.last_epoch" when this is set in the parent class's init function. Why would we need to set last_epoch twice? I think calling "super" resets last_epoch anyway, so I am not sure why we would want to include this in the subclass. Am I missing something?
For the record, I am just a Pytorch enthusiast. I hope my question isn't totally silly.
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44613
Reviewed By: albanD
Differential Revision: D23691770
Pulled By: mrshenli
fbshipit-source-id: 080d9acda86e1a2bfaafe2c6fcb8fc1544f8cf8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44566
The Delegate objects were confusing. They were supposed to be a way to
configure how tracing works, but in some cases they appeared necessary
for constructing graphs, which was not true. This makes the organization
clearer by removing Delegate and moving its functionality into a Tracer class,
similar to how pickle has a Pickler class.
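A hedged sketch of the resulting organization (customizing tracing by subclassing `Tracer`, analogous to subclassing `pickle.Pickler`; the leaf-module rule here is only an example):
```python
import torch
import torch.fx as fx

class MyTracer(fx.Tracer):
    def is_leaf_module(self, m, module_qualified_name):
        # Treat ReLU as an opaque leaf instead of tracing into it.
        return isinstance(m, torch.nn.ReLU) or super().is_leaf_module(m, module_qualified_name)

net = torch.nn.Sequential(torch.nn.Linear(2, 2), torch.nn.ReLU())
print(MyTracer().trace(net))
```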
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23683177
Pulled By: zdevito
fbshipit-source-id: 7605a34e65dfac9a487c0bada39a23ca1327ab00
Summary:
There's an annoying O(N^2) in the module export logic that makes saving some models (if they have many classes) take an eternity.
I'm not super familiar with this code to properly untangle the deps and make it a pure hash lookup. So I just added a side lookup table for raw pointers. It's still quadratic, but it's O(num_classes^2) instead of O(num_classes * num_references) which already gives huge savings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44589
Test Plan:
Tested with one of the offending models - just loading and saving a TorchScript file:
```
Before:
load 1.9239683151245117
save 165.74712467193604
After:
load 1.9409027099609375
save 1.4711427688598633
```
Reviewed By: suo
Differential Revision: D23675278
Pulled By: dzhulgakov
fbshipit-source-id: 8f3fa7730941085ea20d9255b49a149ac1bf64fe
Summary:
This is a reup of https://github.com/pytorch/pytorch/issues/43885 with an extra commit that should fix the bugs that caused it to be reverted. Read that PR for general context.
The issue here was that we were still using the side maps `tensor_to_stmt_` and `stmt_to_tensor_` which get invalidated by any transform of the IR (rather than just any transform that isn't computeInline). I added a comment about this but didn't actually address our usages of it.
I've removed these maps and changed the `getLoopBodyFor` and `getLoopStatementsFor` helpers to search the root stmt directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44231
Reviewed By: albanD
Differential Revision: D23689688
Pulled By: nickgg
fbshipit-source-id: 1c6009a880f8c0cebf2300fd06b5cc9322bffbf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44654
Previously we weren't creating a fallback graph as intended in specialize autograd zero, so if a Tensor failed one of our undefinedness checks we would run the backward normally without reprofiling & optimizing.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23691764
Pulled By: eellison
fbshipit-source-id: 10c6fa79518c84a6f5ef2bfbd9ea10843af751eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44326
Part of relanding PR #41954, this refactoring moves the rebuild_buckets call from the end of the first iteration to the beginning of the second iteration
ghstack-source-id: 112011490
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D23583017
fbshipit-source-id: ef67f79437a820d9b5699b651803622418499a83
Summary:
This PR adds dilation to the _ConvTransposeNd._output_padding method and tests it using a variety of input sizes.
Fixes https://github.com/pytorch/pytorch/issues/14272
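An illustration of the fixed path (requesting an explicit output size with `dilation != 1`, which previously mis-computed output_padding):
```python
import torch
import torch.nn as nn

conv = nn.ConvTranspose2d(4, 4, kernel_size=3, stride=2, dilation=2)
x = torch.randn(1, 4, 8, 8)
# output_padding is derived from output_size, now accounting for dilation.
y = conv(x, output_size=[20, 20])
print(y.shape)  # torch.Size([1, 4, 20, 20])
```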
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43793
Reviewed By: zou3519
Differential Revision: D23493313
Pulled By: ezyang
fbshipit-source-id: bca605c428cbf3a97d3d24316d8d7fde4bddb307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42390
**Summary**
This commit extends support for properties to include
ScriptModules.
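A minimal sketch of the newly supported pattern:
```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.base = 2

    @property
    def doubled(self) -> int:
        return 2 * self.base

    def forward(self, x: int) -> int:
        # Property access inside a scripted method now compiles.
        return x + self.doubled

m = torch.jit.script(M())
print(m(1))  # 5
```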
**Test Plan**
This commit adds a unit test that has a ScriptModule with
a user-defined property.
`python test/test_jit_py3.py TestScriptPy3.test_module_properties`
Test Plan: Imported from OSS
Reviewed By: eellison, mannatsingh
Differential Revision: D22880298
Pulled By: SplitInfinity
fbshipit-source-id: 74f6cb80f716084339e2151ca25092b6341a1560
Summary:
We were hitting an assert error when an empty `List[List[int]]` was passed in - this fixes that error by not recursing into 0-element tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44652
Reviewed By: ZolotukhinM
Differential Revision: D23688247
Pulled By: eellison
fbshipit-source-id: d48ea24893044fae96bc39f76c0f1f9726eaf4c7
Summary:
This PR:
- updates div to perform true division
- makes torch.true_divide an alias of torch.div
This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
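A small illustration of the new behavior:
```python
import torch

a = torch.tensor([5, 3])
b = torch.tensor([2, 2])
print(torch.div(a, b))          # tensor([2.5000, 1.5000]): true division
print(torch.true_divide(a, b))  # same result: true_divide is now an alias
```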
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907
Reviewed By: ngimel
Differential Revision: D23622114
Pulled By: mruberry
fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
Summary:
* Support sequence type (de)serialization, enables onnx shape inference on sequence nodes.
* Fix shape inference with block input/output: e.g. Loop and If nodes.
* Fix bugs in symbolic discovered by coverage of onnx shape inference.
* Improve debuggability: added more jit logs. For simplicity, the default log level, when jit logging is enabled, will not dump IR graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43929
Reviewed By: albanD
Differential Revision: D23674604
Pulled By: bzinodev
fbshipit-source-id: ab6aacb16d0e3b9a4708845bce27c6d65e567ba7
Summary:
When caller / callee pairs are inserted into the mapping, verify that
the arity of the buffer access is consistent with its declared rank.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44561
Test Plan: CI, test_tensorexpr --gtest_filter=TensorExprTest.DetectInlineRankMismatch
Reviewed By: albanD
Differential Revision: D23684342
Pulled By: asuhan
fbshipit-source-id: dd3a0cdd4c2492853fa68381468e0ec037136cab
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43389.
This PR replaces the old ELU formula in the docs, which yields wrong results for negative alphas, with a new one that fixes the issue and relies on cases notation, making the formula more straightforward.
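For reference, the formula in cases notation reads:
```latex
\mathrm{ELU}(x) =
\begin{cases}
x, & \text{if } x > 0 \\
\alpha \left( e^{x} - 1 \right), & \text{if } x \le 0
\end{cases}
```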
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43764
Reviewed By: ailzhang
Differential Revision: D23425532
Pulled By: albanD
fbshipit-source-id: d0931996e5667897d926ba4fc7a8cc66e8a66837
Summary:
Improve simplification of nested Min and Max patterns.
Specifically, handles the following pattern simplications:
* `Max(A, Max(A, Const)) => Max(A, Const)`
* `Max(Min(A, B), Min(A, C)) => Min(A, Max(B, C))`
* `Max(Const, Max(A, OtherConst)) => Max(A, Max(Const, OtherConst))`
- This case can have an arbitrarily long chain of Max ops. For example: `Max(5, Max(x, Max(y, Max(z, 8)))) => Max(Max(Max(x, 8), y), z)`
Similarly, for the case of Min as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44142
Reviewed By: albanD
Differential Revision: D23644486
Pulled By: navahgar
fbshipit-source-id: 42bd241e6c2af820566744c8494e5dee172107f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44562
Add a note that torch.median returns the smaller of the two middle elements for even-sized input, and refer users to torch.quantile for the mean of the middle values.
fixes https://github.com/pytorch/pytorch/issues/39520
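A small illustration of the documented behavior:
```python
import torch

t = torch.tensor([1., 2., 3., 4.])
print(torch.median(t))         # tensor(2.): smaller of the two middle values
print(torch.quantile(t, 0.5))  # tensor(2.5000): mean of the two middle values
```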
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23657208
Pulled By: heitorschueroff
fbshipit-source-id: 2747aa652d1e7f10229d9299b089295aeae092c2
Summary:
We run remove-profile-nodes and specialize-types before batch_mm, so we cannot run peepholes on the type information of tensors, since these properties have not been guarded and thus are not guaranteed to be correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44565
Reviewed By: albanD
Differential Revision: D23661538
Pulled By: eellison
fbshipit-source-id: 0dd23a65714f047f49b4db4ec582b21870925fe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44622
Remove an extra empty line in the warning comments.
Test Plan: N/A
Reviewed By: rohan-varma
Differential Revision: D23674070
fbshipit-source-id: 4ee570590c66a72fb808e9ee034fb773b833efcd
Summary:
This adds HIP version info to the `collect_env.py` output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44106
Reviewed By: VitalyFedyunin
Differential Revision: D23652341
Pulled By: zou3519
fbshipit-source-id: a1f5bce8da7ad27a1277a95885934293d0fd43c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442
I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms, as operators were being registered.
canonicalSchemaString() was using ostringstream, which isn't typically
particularly fast (partly because of C++ spec locale requirements).
If we replace it with regular C++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream), albeit with a bit
more codegen.
This cuts out 1.4 seconds spent under the
OperatorRegistry lock (as part of registerPendingOperators) in the
first couple of minutes of run time (mostly front-loaded) when running
sync SGD.
As an example, before:
registerPendingOperators 12688 usec for 2449 operators
After:
registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...
Reviewed By: ailzhang
Differential Revision: D23614515
fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44219
Rebasing https://github.com/pytorch/pytorch/pull/44288 and fixing the git history.
This allows users to benchmark code without having to specify how long to run the benchmark. It runs the benchmark until the variance (IQR / median) is low enough that we can be confident in the measurement.
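A hedged sketch of the intended usage (assuming the method added here is `Timer.adaptive_autorange`):
```python
from torch.utils.benchmark import Timer

t = Timer(stmt="torch.ones(128, 128).sum()", setup="import torch")
# Keeps measuring until IQR / median is small enough; no run count needed.
print(t.adaptive_autorange())
```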
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44607
Test Plan: There are unit tests, and we manually tested using Examples posted in git.
Reviewed By: robieta
Differential Revision: D23671208
Pulled By: bitfort
fbshipit-source-id: d63184290b88b26fb81c2452e1ae701c7d513d12
Summary:
This fixes a `katex` error I was getting trying to build the docs:
```
ParseError: KaTeX parse error: Undefined control sequence: \0 at position 55: …gin{cases}
```
This failure was introduced in https://github.com/pytorch/pytorch/issues/42523.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44481
Reviewed By: colesbury
Differential Revision: D23627700
Pulled By: mruberry
fbshipit-source-id: 9cc09c687a7d9349da79a0ac87d6c962c9cfbe2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337
Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match the full JIT.
ghstack-source-id: 111909068
Test Plan: Added new unit test to test_jit test suite
Reviewed By: linbinyu, ann-ss
Differential Revision: D23585763
fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44588
1) SOURCE_DUMP crashes when invoked on a backward graph since
`prim::GradOf` nodes can't be printed as sources (they don't have
schema).
2) Dumping graph each time we execute an optimized plan produces lots of
output in tests where we run the graph multiple times (e.g.
benchmarks). Outputting that at the lowest level of verbosity seems like overkill.
3) Duplicated log statement is removed.
Differential Revision: D23666812
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: b9a30e34fd39c85f3e13c3f1e3594e157e1c130f
Summary:
**BC-breaking note**
This change is BC-breaking for C++ callers of linspace and logspace if they were providing a steps argument that could not be converted to an optional.
**PR note**
This PR deprecates calling linspace and logspace without setting steps explicitly by:
- updating the documentation to warn that not setting steps is deprecated
- warning (once) when linspace and logspace are called without steps being specified
A test for this behavior is added to test_tensor_creation_ops. The warning only appears once per process, however, so the test would pass even if no warning were thrown. Ideally there would be a mechanism to force all warnings, including those from TORCH_WARN_ONCE, to trigger.
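A small illustration of the deprecation:
```python
import torch

print(torch.linspace(0, 1, steps=5))  # explicit steps: no warning
torch.linspace(0, 1)  # warns once (and errors in later releases): steps omitted
```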
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43860
Reviewed By: izdeby
Differential Revision: D23498980
Pulled By: mruberry
fbshipit-source-id: c48d7a58896714d184cb6ff2a48e964243fafc90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44340
Changed the constructor of GradBucket to pass the input by const
reference and hence avoided unnecessary explicit move semantics. Since
previously the declaration and definition were separated, passing the input
tensor vector by value looked quite bizarre.
Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest
Reviewed By: pritamdamania87
Differential Revision: D23569939
fbshipit-source-id: db761d42e76bf938089a0b38e98e76a05bcf4162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44339
Moved the inline implementations of GradBucket class to the header for
succinctness and readability. This coding style is also consistent with
reducer.h under the same directory.
Test Plan: buck test caffe2/torch/lib/c10d:ProcessGroupGlooTest
Reviewed By: pritamdamania87
Differential Revision: D23569701
fbshipit-source-id: 237d9e2c5f63a6bcac829d0fcb4a5ba3bede75e5
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/36404
Adding prim::device and prim::dtype to the list of skipped peepholes when we run inlining. In the long term, another fix may be to not encode shape / dtype info on the traced graph at all, because it is not guaranteed to be correct; this is currently blocked by ONNX.
Partial fix for https://github.com/pytorch/pytorch/issues/43134
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43363
Reviewed By: glaringlee
Differential Revision: D23383987
Pulled By: eellison
fbshipit-source-id: 2e9c5160d39d690046bd9904be979d58af8d3a20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44564
Before this change we sometimes inlined autodiff subgraph containing
fusion groups. This happened because we didn't look for 'unsupported'
nodes recursively (maybe we should), but fusion groups were inside
if-nodes.
The problem was detected by bertmaher in 'LearningToPaint' benchmark
investigation where this bug caused us to keep constantly hitting
fallback paths of the graph.
Test Plan: Imported from OSS
Reviewed By: bwasti
Differential Revision: D23657049
Pulled By: ZolotukhinM
fbshipit-source-id: 7c853424f6dce4b5c344d6cd9c467ee04a8f167e
Summary:
Fix an issue where loops of different sizes are bound to the same Cuda dimension / metavar.
More info and tests coming soon...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44325
Reviewed By: colesbury
Differential Revision: D23628859
Pulled By: nickgg
fbshipit-source-id: 3621850a4cc38a790b62ad168d32e7a0e2462fad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43043
This adds support for rpc_sync in TorchScript, in a way similar to
rpc_async.
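A hedged single-process sketch, modeled on the existing rpc_async TorchScript pattern (worker names and env setup are only for demonstration):
```python
import os
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

@torch.jit.script
def call_add(to: str, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # rpc_sync is now callable from TorchScript, mirroring rpc_async.
    return rpc.rpc_sync(to, add, (x, y))

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29501"
rpc.init_rpc("worker0", rank=0, world_size=1)
print(call_add("worker0", torch.ones(2), torch.ones(2)))  # tensor([2., 2.])
rpc.shutdown()
```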
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23252039
Pulled By: wanchaol
fbshipit-source-id: 8a05329cb8a24079b2863178b73087d47273914c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537
Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers. They had custom
state_dict save/load code to ensure their state was saved.
At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load its state dict.
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)
In practice, the case people would sometimes hit is:
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices
This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special case `PerChannelMinMaxObserver` and its
children to allow for loading buffers of different size, which is
normal.
There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.
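A small sketch of the fixed behavior (requires a CUDA machine):
```python
import torch
from torch.quantization import MinMaxObserver

obs_cpu = MinMaxObserver()
obs_cpu(torch.randn(4))
state = obs_cpu.state_dict()

obs_gpu = MinMaxObserver().cuda()
obs_gpu.load_state_dict(state)
# min_val/max_val now follow the destination module's device:
print(obs_gpu.min_val.device)  # cuda:0
```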
Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23644493
fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44486
SmoothL1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the SmoothL1Loss CriterionTests to verify that the target derivative is checked.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23630699
Pulled By: gchanan
fbshipit-source-id: 0f94d1a928002122d6b6875182867618e713a917
Summary:
Add new transforms `sliceHead` and `sliceTail` to `LoopNest`, for example:
Before transformation:
```
for x in 0..10:
A[x] = x*2
```
After `sliceHead(x, 4)`:
```
for x in 0..4:
A[x] = x*2
for x in 4..10:
A[x] = x*2
```
After `sliceTail(x, 1)`:
```
for x in 0..4:
A[x] = x*2
for x in 4..9:
A[x] = x*2
for x in 9..10:
A[x] = x*2
```
`sliceHead(x, 10)` and `sliceTail(x, 10)` are no-ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43854
Test Plan: Tests are added in `test_loopnest.cpp`, the tests cover the basic transformations, and also tests the combination with other transformations such as `splitWithTail`.
Reviewed By: nickgg
Differential Revision: D23417366
Pulled By: cheng-chang
fbshipit-source-id: 06c6348285f2bafb4be3286d1642bfbe1ea499bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44235
Removes nonvariadic run_method() from mobile Module entirely (to be later replaced by a variadic version). All use cases should have been migrated to use get_method() and Method::operator() in D23436351
ghstack-source-id: 111848220
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D23484577
fbshipit-source-id: 602fcde61e13047a34915b509da048b9550103b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202
In preparation for changing mobile run_method() to be variadic, this diff:
* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222
Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.
Reviewed By: iseeyuan
Differential Revision: D23436351
fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43025
- Use new overloads that better reflect the arguments to interpolate.
- More uniform interface for upsample ops allows simplifying the Python code.
- Also reorder overloads in native_functions.yaml to give them priority.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37177
ghstack-source-id: 106938111
Test Plan:
test_nn has pretty good coverage.
Relying on CI for ONNX, etc.
Didn't test FC because this change is *not* forward compatible.
To ensure backwards compatibility, I ran this code before this change
```python
def test_func(arg):
    interp = torch.nn.functional.interpolate
    with_size = interp(arg, size=(16,16))
    with_scale = interp(arg, scale_factor=[2.1, 2.2], recompute_scale_factor=False)
    with_compute = interp(arg, scale_factor=[2.1, 2.2])
    return (with_size, with_scale, with_compute)

traced_func = torch.jit.trace(test_func, torch.randn(1,1,1,1))
sample = torch.randn(1, 3, 7, 7)
output = traced_func(sample)
assert not torch.allclose(output[1], output[2])
torch.jit.save(traced_func, "model.pt")
torch.save((sample, output), "data.pt")
```
then this code after this change
```python
model = torch.jit.load("model.pt")
sample, golden = torch.load("data.pt")
result = model(sample)
for r, g in zip(result, golden):
    assert torch.allclose(r, g)
```
Reviewed By: AshkanAliabadi
Differential Revision: D21209991
fbshipit-source-id: 5b2ebb7c3ed76947361fe532d1dbdd6faa3544c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44471
L1Loss had a completely different (and incorrect, see #43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the L1Loss CriterionTests to verify that the target derivative is checked.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23626008
Pulled By: gchanan
fbshipit-source-id: 2828be16b56b8dabe114962223d71b0e9a85f0f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44500
Some user models are using those operators. Unblock them while keeping the ops selective.
Test Plan: CI
Reviewed By: linbinyu
Differential Revision: D23634769
fbshipit-source-id: 55841d1b07136b6a27b6a39342f321638dc508cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44525
Since `TEST_SKIPS` is a global multiprocessing.Manager dict, this was causing
issues when one test would fail and make the rest of the tests fail during
setup due to networking errors.
See the failed CI job: https://app.circleci.com/pipelines/github/pytorch/pytorch/212491/workflows/0450151d-ca09-4cf6-863d-272de6ed917f/jobs/7389065 for an example, where `test_ddp_backward` failed but then caused the rest of the tests to fail at the line `test_skips.update(TEST_SKIPS)`.
To fix this issue, at the end of every test we revert `TEST_SKIPS` back to a regular dict and redo the conversion to a `multiprocessing.Manager` dict in the next test, which prevents these errors.
ghstack-source-id: 111844724
Test Plan: CI
Reviewed By: malfet
Differential Revision: D23641618
fbshipit-source-id: 27ce823968ece9804bb4dda898ffac43ef732b89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44437
MSELoss had a completely different (and incorrect, see https://github.com/pytorch/pytorch/issues/43228) path when target.requires_grad was True.
This PR does the following:
1) adds derivative support for target via the normal derivatives.yaml route
2) kills the different (and incorrect) path for when target.requires_grad was True
3) modifies the MSELoss CriterionTests to verify that the target derivative is checked.
TODO:
1) do we still need check_criterion_jacobian when we run grad/gradgrad checks?
2) ensure the Module tests check when target.requires_grad
3) do we actually test when reduction='none' and reduction='mean'?
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23612166
Pulled By: gchanan
fbshipit-source-id: 4f74d38d8a81063c74e002e07fbb7837b2172a10
Summary:
Fixes a bug in the NNC registerizer for Cuda where it would hoist reads out of a conditional context when trying to cache them. As a quick fix, prevent scalar replacement if a usage is within a condition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44223
Reviewed By: gchanan
Differential Revision: D23551247
Pulled By: nickgg
fbshipit-source-id: 17a7bf2be4c8c3dd8a9ab7997dce9aea200c3685
Summary:
Previously we were not removing profiling nodes in graphs that required grad and contained diff graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44420
Reviewed By: bertmaher
Differential Revision: D23607482
Pulled By: eellison
fbshipit-source-id: af095f3ed8bb3c5d09610f38cc7d1481cbbd2613
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44493
This function allows to execute a graph exactly as it is, without going
through a graph executor which would run passes on the graph before
interpreting it. I found this feature extremely helpful when I worked on
a stress-testing script to shake out bugs from the TE fuser: I needed to
execute a very specific set of passes on a graph and nothing else, and
then execute exactly that graph.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23632505
Pulled By: ZolotukhinM
fbshipit-source-id: ea81fc838933743e2057312d3156b77284d832ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44411
This basically aborts errored NCCL communicators if either blocking
wait or async error handling is enabled. Otherwise we might abort NCCL
communicators where neither is enabled, and this may result in subsequent GPU
operations using corrupted data.
ghstack-source-id: 111839264
Test Plan: Successful Flow run: f217591683
Reviewed By: jiayisuse
Differential Revision: D23605382
fbshipit-source-id: 6c16f9626362be3b0ce2feaf0979b2dff97ce61b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44410
See #44052 for context. One of the cumprod_backward overloads was unused
so I just deleted it.
Test Plan: - `pytest test/test_autograd.py -v`
Reviewed By: mrshenli
Differential Revision: D23605503
Pulled By: zou3519
fbshipit-source-id: f9c5b595e62d2d6e71f26580ba96df15cc9de4f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44427
Closes https://github.com/pytorch/pytorch/issues/44425
DDP join API currently does not work properly with `model.no_sync()`, see https://github.com/pytorch/pytorch/issues/44425 for details. This PR fixes the problem via the approach mentioned in the issue, namely scheduling an allreduce that tells joined ranks whether to sync in the backwards pass or not. Tests are added for skipping gradient synchronization for various `sync_interval`s.
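A hedged single-process sketch of the pattern this fixes (`sync_interval` and the toy model are assumptions for illustration):
```python
import contextlib
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29502"
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(2, 2))
sync_interval = 2  # only synchronize gradients every other iteration

# join() handles uneven inputs across ranks; no_sync() skips gradient
# synchronization; after this fix the two compose correctly.
with model.join():
    for i in range(5):
        ctx = model.no_sync() if i % sync_interval else contextlib.nullcontext()
        with ctx:
            model(torch.randn(4, 2)).sum().backward()
```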
ghstack-source-id: 111786479
Reviewed By: pritamdamania87
Differential Revision: D23609070
fbshipit-source-id: e8716b7881f8eee95e3e3499283e716bd3d7fe76
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:
- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases
The functions moved are:
- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2
In a follow-up PR more or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277
Reviewed By: mrshenli, ngimel
Differential Revision: D23617361
Pulled By: mruberry
fbshipit-source-id: edb292947769967de9383f6a84eb327f027509e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44224
The purpose of this file is to help developers on PT distributed get
up to speed on the code structure and layout of PT Distributed.
ghstack-source-id: 111644842
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D23548377
fbshipit-source-id: 561d5b8e257642de172def8fdcc1311fae20690b
Summary:
To help with further typing, move dynamically added native contributions from `torch.autograd` to `torch._C._autograd`
Fix invalid error handling pattern in
89ac30afb8/torch/csrc/autograd/init.cpp (L13-L15)
`PyImport_ImportModule` already raises a Python exception, so nullptr should be returned to properly propagate the error to the Python runtime.
All native methods/types in `torch/autograd/__init__.py` are populated only after `torch._C._init_autograd()` has been called.
Use f-strings instead of `.format` in test_type_hints.py
Fixes https://github.com/pytorch/pytorch/issues/44450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44451
Reviewed By: ezyang
Differential Revision: D23618261
Pulled By: malfet
fbshipit-source-id: fa5f739d7cff8410641128b55b810318c5f636ae
Summary:
Previously the specialized types were copied over to the fallback function, even though the tensors in the fallback path were not of those types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44434
Reviewed By: SplitInfinity
Differential Revision: D23611943
Pulled By: eellison
fbshipit-source-id: 2ea88a97529409f6c5c4c1f59a14b623524933de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44347
Cloned from Pull Request resolved: https://github.com/pytorch/pytorch/pull/44097, because the original author Sinan has completed his internship and is now unable to submit this diff.
As johnsonpaul mentioned in D23277575 (7d517cf96f), it looks like all processes were allocating memory on GPU 0.
I was able to reproduce it by running the `test_ddp_comm_hook_allreduce_with_then_hook_nccl` unit test of `test_c10d.py` and running `nvidia-smi` while the test was running. The issue was reproduced as:
```
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3132563 C python 777MiB |
| 0 3132564 C python 775MiB |
| 4 3132564 C python 473MiB |
+-----------------------------------------------------------------------------+
```
I realized that, as we initialize ProcessGroupNCCL, both processes were initially allocating memory on GPU 0.
We later also realized that I had forgotten the `isHighPriority` input of `getStreamFromPool`, so `futureNCCLCallbackStreams_.push_back(std::make_shared<at::cuda::CUDAStream>(at::cuda::getStreamFromPool(device_index)));` was just creating a vector of GPU 0 streams. After I changed `at::cuda::getStreamFromPool(device_index)` to `at::cuda::getStreamFromPool(false, device_index)`, `nvidia-smi` looked like:
```
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 673925 C python 771MiB |
| 0 673926 C python 771MiB |
| 1 673925 C python 771MiB |
| 1 673926 C python 771MiB |
| 2 673925 C python 771MiB |
| 2 673926 C python 771MiB |
| 3 673925 C python 771MiB |
| 3 673926 C python 771MiB |
| 4 673925 C python 771MiB |
| 4 673926 C python 771MiB |
| 5 673925 C python 771MiB |
| 5 673926 C python 771MiB |
| 6 673925 C python 771MiB |
| 6 673926 C python 771MiB |
| 7 673925 C python 707MiB |
| 7 673926 C python 623MiB |
+-----------------------------------------------------------------------------+
```
This confirms that we were just getting GPU 0 streams for the callback. I think this does not explain the `fp16_compress` stability issue, because we were able to reproduce that even without any then callback and just calling copy from fp32 to fp16 before allreduce. However, this can explain other issues where `allreduce` was not on par with `no_hook`. I'll run some additional simulations with this diff.
I tried to replace `getStreamFromPool` with `getDefaultCUDAStream(deviceIndex)`, and it wasn't causing additional memory usage. In this diff, I temporarily solved the issue by just initializing null pointers for each device in the constructor and setting the callback stream for the corresponding devices inside `ProcessGroupNCCL::getNCCLComm`. After the fix, it looks like the memory issue was resolved:
```
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 2513142 C python 745MiB |
| 4 2513144 C python 747MiB |
+-----------------------------------------------------------------------------+
```
I could use a dictionary instead of a vector for `futureNCCLCallbackStreams_`, but since the number of devices is fixed, I don't think it is necessary. Please let me know what you think in the comments.
ghstack-source-id: 111485483
Test Plan:
`test_c10d.py` and some perf tests. Also check `nvidia-smi` while running tests to validate memory looks okay.
This diff also fixes the regression in HPC tests as we register a hook:
{F322730175}
See https://fb.quip.com/IGuaAbD8bnvy for details.
Reviewed By: pritamdamania87
Differential Revision: D23495436
fbshipit-source-id: ad08e1d94343252224595d7c8a279fe75e244822
Summary:
This PR fixes unexpected `SystemError` when warnings are emitted and warning filters are set.
## Current behavior
```
$ python -Werror
>>> import torch
>>> torch.range(1, 3)
UserWarning: torch.range is deprecated in favor of torch.arange and will be removed in 0.5. Note that arange generates values in [start; end), not [start; end].
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: <built-in method range of type object at 0x7f38c7703a60> returned a result with an error set
```
## Expected behavior
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
```
## Note
Python exception must be raised if `PyErr_WarnEx` returns `-1` ([python docs](https://docs.python.org/3/c-api/exceptions.html#issuing-warnings)). This PR fixes warnings raised in the following code:
```py
import torch
torch.range(1, 3)
torch.autograd.Variable().volatile
torch.autograd.Variable().volatile = True
torch.tensor(torch.tensor([]))
torch.tensor([]).new_tensor(torch.tensor([]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44371
Reviewed By: mrshenli
Differential Revision: D23598410
Pulled By: albanD
fbshipit-source-id: 2fbcb13fe4025dbebaf1fd837d4c8e0944e05010
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44398
These end up executing the same tests, so no reason to have them separate.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23600855
Pulled By: gchanan
fbshipit-source-id: 0952492771498bf813f1bf8e1d7c8dce574ec965
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43958
There is not any difference between these tests (I'm merging them), so let's merge them in the JIT as well.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23452337
Pulled By: gchanan
fbshipit-source-id: e6d13cdb164205eec3dbb7cdcd0052b02c961778
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44381
Perhaps this was necessary when the test was originally introduced, but it's difficult to figure out what is actually tested. And I don't think we actually use NotImplementedErrors.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23598646
Pulled By: gchanan
fbshipit-source-id: aa18154bfc4969cca22323e61683a301198823be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44226
**Summary**
At present, the `share_types` argument to `create_script_module` is used
to decide whether to reuse a previously created type for a top-level
module that has not yet been compiled. However, that setting does not apply
to the compilation of submodules of the top-level module; types are
still reused if possible.
This commit modifies `create_script_module` so that the `share_types`
flag is honoured during submodule compilation as well.
**Test Plan**
This commit adds a unit test to `TestTypeSharing` that checks that
submodule types are not shared or reused when `share_types` is set to
`False`.
**Fixes**
This commit fixes #43605.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23602371
Pulled By: SplitInfinity
fbshipit-source-id: b909b8b6abbe3b4cb9be8319ac263ade90e83bd3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44352
**Summary**
This commit adds support for `del` with class instances. If a class
implements `__delitem__`, then `del class_instance[key]` is syntactic
sugar for `class_instance.__delitem__(key)`.
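A minimal sketch of the new sugar:
```python
import torch
from typing import Dict

@torch.jit.script
class Bag(object):
    def __init__(self):
        self.items: Dict[str, int] = {"a": 1}

    def __delitem__(self, key: str):
        del self.items[key]

@torch.jit.script
def use() -> int:
    b = Bag()
    del b["a"]  # desugars to b.__delitem__("a")
    return len(b.items)

print(use())  # 0
```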
**Test Plan**
This commit adds a unit test to TestClassTypes to test this feature.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23603102
Pulled By: SplitInfinity
fbshipit-source-id: 28ad26ddc9a693a58a6c48a0e853a1c7cf5c9fd6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43384
Much like the FileStoreTests, the HashStoreTests were also run in a single blob and threw exceptions upon failure. This modularizes the test by separating each function into separate gtest test cases.
ghstack-source-id: 111690834
Test Plan: Confirmed that the tests pass on devvm.
Reviewed By: jiayisuse
Differential Revision: D23257579
fbshipit-source-id: 7e821f0e9ee74c8b815f06facddfdb7dc2724294
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43383
FileStore Test currently has a large blob of tests that throw
exceptions upon failure. This PR modularizes each test so they can run
independently, and migrates the framework to gtest.
ghstack-source-id: 111690831
Test Plan: Confirmed tests pass on devvm
Reviewed By: jiayisuse
Differential Revision: D22879473
fbshipit-source-id: 6fa5468e594a53c9a6b972757068dfc41645703e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43382
StoreTestCommon defines standard helper functions that are used by all of our Store tests. These helpers currently throw exceptions upon failure, this PR changes them to use gtest assertions instead.
ghstack-source-id: 111690833
Test Plan: Tested the 2 PR's above this on devvm
Reviewed By: jiayisuse
Differential Revision: D22828156
fbshipit-source-id: 9e116cf2904e05ac0342a441e483501e00aad3dd
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/41946/, to suggest enumerating the module as an alternative if a user tries indexing into a ModuleList/Sequential with a non-integer literal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43361
Reviewed By: mrshenli
Differential Revision: D23602388
Pulled By: eellison
fbshipit-source-id: 51fa28d5bc45720529b3d45e92d367ee6c9e3316
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44400
This diff does the same thing as D23549149 (398409f072), with a fix included for the OSS CI job pytorch_windows_vs2019_py36_cuda10.1_test1.
ghstack-source-id: 111679745
Test Plan:
- CI
- OSS CI
Reviewed By: xcheng16
Differential Revision: D23601050
fbshipit-source-id: 8ebdcd8fdc5865078889b54b0baeb397a90ddc40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44163
In this PR, we introduce a new environment variable
(NCCL_ASYNC_ERROR_HANDLING), which guards the asynchronous error handling
feature. We intend to eventually turn this feature on by default for all users,
but this is a temporary solution so that the change in behavior from hanging to
crashing does not become the default for users all of a sudden.
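Enabling the feature is just a matter of setting the environment variable before initializing the process group, e.g.:
```python
import os

os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"  # opt in to crash-on-error
```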
ghstack-source-id: 111637788
Test Plan:
CI/Sandcastle. We will turn on this env var by default in
torchelastic and HPC trainer soon.
Reviewed By: jiayisuse
Differential Revision: D23517895
fbshipit-source-id: e7cd244b2ddf2dc0800ff7df33c73a6f00b63dcc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41054
**This Commit:**
ProcessGroupNCCL destructor now blocks until all WorkNCCL objects have either been aborted or completed and removed from the work vector.
**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.
ghstack-source-id: 111614314
Test Plan:
1. **DDP Sanity Check**: First we have a sanity check based on the PyTorch DDP benchmark. This verifies that the baseline DDP training with NCCL for standard CU workloads works well (esp. with standard models like Resnet50 and BERT). Here is a sample Flow: f213293473
1. **HPC Performance Benchmarks**: This stack has undergone thorough testing and profiling on the Training Cluster with varying numbers of nodes. It introduces only a 1-1.5% QPS regression (~200-400 QPS for 8-64 GPUs).
1. **HPC Accuracy Benchmarks**: We've confirmed NE parity with the existing NCCL/DDP stack without this change.
1. **Kernel-Specific Benchmarks**: We have profiled other approaches for this system (such as cudaStreamAddCallback) and performed microbenchmarks to confirm the current solution is optimal.
1. **Sandcastle/CI**: Apart from the recently fixed ProcessGroupNCCL tests, we will also introduce a new test for desynchronization scenarios.
Reviewed By: jiayisuse
Differential Revision: D22054298
fbshipit-source-id: 2b95a4430a4c9e9348611fd9cbcb476096183c06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41053
**This Commit:**
Some minor refactoring - added helper to check if `WorkNCCL` objects have timed out. Adding a new finish function to ProcessGroupNCCL::WorkNCCL that avoids notifying CV and uses `lock_guard`. Also renaming the timeoutCVMutex mutex to be more descriptive.
**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.
ghstack-source-id: 111614315
Test Plan: See D22054298 for verification of correctness and performance
Reviewed By: jiayisuse
Differential Revision: D21943520
fbshipit-source-id: b27ee329f0da6465857204ee9d87953ed6072cbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41052
**This Commit:**
Watchdog Thread checks for error-ed or timed out `WorkNCCL` objects and aborts all associated NCCL Communicators. For now, we also process these aborted communicators as with the existing Watchdog logic (by adding them to abortedCommIds and writing aborted communicator ids to the store.)
**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.
ghstack-source-id: 111614313
Test Plan: See D22054298 for verification of correctness and performance
Reviewed By: jiayisuse
Differential Revision: D21943151
fbshipit-source-id: 337bfcb8af7542c451f1e4b3dcdfc5870bdec453
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41051
**This Commit:**
In the workCleanupThread, we process completion and exception handling for workNCCL objects corresponding to collective calls that have either completed GPU Execution, or have already thrown an exception. This way, we throw an exception from the workCleanupThread for failed GPU operations. This approach replaces the previous (and lower performance) approach of enqueuing a callback on the CUDA stream to process failures.
**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.
ghstack-source-id: 111614319
Test Plan: See D22054298 for verification of correctness and performance
Reviewed By: jiayisuse
Differential Revision: D21938498
fbshipit-source-id: df598365031ff210afba57e0c7be865e3323ca07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41050
**This Commit:**
We introduce a workVector to track live workNCCL objects corresponding to collective operations. Further, we introduce a workCleanupLoop, which busy-polls the vector of workNCCL objects and removes them upon completion.
**This Stack:**
The purpose of this stack is to fix the hanging behavior observed when using PyTorch DDP training with NCCL. In various situations (desynchronization, high GPU utilization, etc.), NCCL collectives may hang due to waiting on an unresponsive worker. This stack detects such hanging behavior and aborts timed-out collectives by throwing a user-visible exception, all with minimal perf regression. Training can then be restarted from a previous checkpoint with something like torchelastic.
Test Plan: See D22054298 for verification of correctness and performance
Reviewed By: jiayisuse
Differential Revision: D21916637
fbshipit-source-id: f8cadaab0071aaad1c4e31f9b089aa23cba0cfbe
Summary:
This should prevent torch_python from linking the entire cudnn library statically just to query its version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44402
Reviewed By: seemethere
Differential Revision: D23602720
Pulled By: malfet
fbshipit-source-id: 185b15b789bd48b1df178120801d140ea54ba569
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42488
Currently, ProcessGroupGloo tests do not emit logs if the test was
skipped due to CUDA not being available or there not being enough CUDA devices. This PR clarifies
the reason for skipping through these logs.
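For illustration (not from this PR), the general shape of skipping with an explicit reason so it surfaces in the logs; the test class and method names are hypothetical:
```
import unittest

import torch

class ProcessGroupGlooCudaTest(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(
        not torch.cuda.is_available(),
        "CUDA not available; skipping ProcessGroupGloo CUDA test",
    )
    def test_allreduce_cuda(self):
        ...
```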
ghstack-source-id: 111638111
Test Plan: tested on devvm and devgpu
Reviewed By: jiayisuse
Differential Revision: D22879396
fbshipit-source-id: d483ca46b5e22ed986521262c11a1c6dbfbe7efd
Summary:
This PR fixes three OpInfo-related bugs and moves some functions from TestTorchMathOps to be tested using the OpInfo pattern. The bugs are:
- A skip test path in test_ops.py incorrectly formatted its string argument
- Decorating the tests in common_device_type.py was incorrectly always applying decorators to the original test, not the op-specific variant of the test. This could cause the same decorator to be applied multiple times, overriding past applications.
- make_tensor was incorrectly constructing tensors in some cases
The functions moved are:
- asin
- asinh
- sinh
- acosh
- tan
- atan
- atanh
- tanh
- log
- log10
- log1p
- log2
In a follow-up PR, most or all of the remaining functions in TestTorchMathOps will be refactored as OpInfo-based tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44277
Reviewed By: ngimel
Differential Revision: D23568330
Pulled By: mruberry
fbshipit-source-id: 03e69fccdbfd560217c34ce4e9a5f20e10d05a5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44315
I find it more intuitive to dump the optimized graph if we have one;
when I first saw the unoptimized graph being dumped I thought we had failed to
apply any optimizations.
Test Plan: Observe output by hand
Reviewed By: Lilyjjo
Differential Revision: D23578813
Pulled By: bertmaher
fbshipit-source-id: e2161189fb0e1cd53aae980a153aea610871662a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44162
This diff exports the Node::isBefore/isAfter methods to the Python API.
Test Plan: Tested locally. Please let me know if there is a set of unit tests to be passed.
Reviewed By: soumith
Differential Revision: D23514448
fbshipit-source-id: 7ef709b036370217ffebef52fd93fbd68c464e89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41769
Currently the tests in `test_distributed` only work with `fork` mode multiprocessing; this PR introduces support for `spawn` mode multiprocessing as well (while keeping the `fork` mode intact).
Motivations for the change:
1) Spawn multiprocessing is the default on macOS, so it better emulates how macOS users would use distributed
2) With Python 3.8+, spawn is the default on Linux, so we should have test coverage for this
3) PT multiprocessing suggests using spawn/forkserver over fork for sharing CUDA tensors: https://pytorch.org/docs/stable/multiprocessing.html
4) Spawn is better supported with respect to certain sanitizers such as TSAN, so adding this sanitizer coverage may help us uncover issues.
How it is done:
1) Move `test_distributed` tests in `_DistTestBase` class to a shared file `distributed_test` (similar to how the RPC tests are structured)
2) For `Barrier`, refactor the setup of temp directories, as the current version did not work with spawn: each process would get a different randomly generated directory and thus would write to different barriers.
3) Add all the relevant builds to run internally and in OSS.
Running test_distributed with spawn mode in OSS can be done with:
`python test/run_test.py -i distributed/test_distributed_spawn -v`
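For illustration (not from this PR), the spawn entry point used by the new mode; unlike fork, each child re-imports the module, so shared state (e.g. the barrier temp directory) must be passed in explicitly:
```
import torch.multiprocessing as mp

def run_test(rank, world_size, tmp_dir):
    # per-process test body; tmp_dir is passed explicitly rather than inherited
    print(f"rank {rank} of {world_size} using {tmp_dir}")

if __name__ == "__main__":
    world_size = 2
    mp.spawn(run_test, args=(world_size, "/tmp/barrier"), nprocs=world_size)
```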
Reviewed By: izdeby
Differential Revision: D22408023
fbshipit-source-id: e206be16961fd80438f995e221f18139d7e6d2a9
Summary:
1) Ports nonzero from THC to ATen
2) Replaces most thrust uses with cub, to avoid synchronization and to improve performance. There is still one necessary synchronization point: communicating the number of nonzero elements from GPU to CPU.
3) Slightly changes the algorithm: we now first compute the number of nonzeros and then allocate a correctly sized output, instead of allocating a full-sized output (to account for possibly all elements being nonzero) as was done before.
4) Unfortunately, since the last transforms are still done with thrust, 2) is slightly beside the point; however, it is a step towards a future without thrust.
5) Hard-limits the number of elements in the input tensor to MAX_INT. The previous implementation allocated a Long tensor with size ndim*nelements, which would be at least 16 GB for a tensor with MAX_INT elements. It is reasonable to say that larger tensors could not be used anyway.
Benchmarking is done for tensors with approximately half non-zeros
<details><summary>Benchmarking script</summary>
<p>
```
import torch
from torch.utils._benchmark import Timer
from torch.utils._benchmark import Compare
import sys

device = "cuda"
results = []
for numel in (1024 * 128,):  # , 1024 * 1024, 1024 * 1024 * 128
    inp = torch.randint(2, (numel,), device="cuda", dtype=torch.float)
    for ndim in range(2, 3):  # (1, 4)
        if ndim == 1:
            shape = (numel,)
        elif ndim == 2:
            shape = (1024, numel // 1024)
        else:
            shape = (1024, 128, numel // 1024 // 128)
        inp = inp.reshape(shape)
        repeats = 3
        timer = Timer(stmt="torch.nonzero(inp, as_tuple=False)", label="Nonzero",
                      sub_label=f"number of elts {numel}",
                      description=f"ndim {ndim}", globals=globals())
        for i in range(repeats):
            results.append(timer.blocked_autorange())
        print(f"\rnumel {numel} ndim {ndim}", end="")
        sys.stdout.flush()
comparison = Compare(results)
comparison.print()
```
</p>
</details>
### Results
Before:
```
[--------------------------- Nonzero ---------------------------]
| ndim 1 | ndim 2 | ndim 3
1 threads: ------------------------------------------------------
number of elts 131072 | 55.2 | 71.7 | 90.5
number of elts 1048576 | 113.2 | 250.7 | 497.0
number of elts 134217728 | 8353.7 | 23809.2 | 54602.3
Times are in microseconds (us).
```
After:
```
[-------------------------- Nonzero --------------------------]
| ndim 1 | ndim 2 | ndim 3
1 threads: ----------------------------------------------------
number of elts 131072 | 48.6 | 79.1 | 90.2
number of elts 1048576 | 64.7 | 134.2 | 161.1
number of elts 134217728 | 3748.8 | 7881.3 | 9953.7
Times are in microseconds (us).
```
There's a real regression for smallish 2D tensors due to the added work of computing the number of nonzero elements; however, for other sizes there are significant gains, and there are drastically lower memory requirements. Perf gains would be even larger for tensors with fewer nonzeros.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44259
Reviewed By: izdeby
Differential Revision: D23581955
Pulled By: ngimel
fbshipit-source-id: 0b99a767fd60d674003d83f0848dc550d7a363dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44217
Move the tests to static ones as well
Test Plan:
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_bag_api
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23547386
fbshipit-source-id: 41f81c31e1613098ecf6a7eff601c7dcd4b09c76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44208
Add a quantized module in the static quantization namespace. Embedding
quantization requires only weights to be quantized, so it is static.
Internally it calls the embedding_bag_byte op with the offsets set to correspond to the
indices.
Future PR will move EmbeddingBag quantization from dynamic to static as well.
Test Plan:
python test/test_quantization.py test_embedding_api
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23547384
fbshipit-source-id: eddc6fb144b4a771060e7bab5853656ccb4443f0
Summary:
Fixes a bug where FP16 values could be incorrectly cast to a half type that doesn't have a cast operator, by inserting the CUDA-specific cast to float during handling of the Cast node rather than as a wrapper around printing Loads and Stores. Two main changes: the HalfChecker now inserts the casts to float explicitly in the IR, and the PrioritizeLoad mutator now consumes both Loads and a Cast which immediately precedes a Load.
Tested with test_jit_fuser_te.py and test_tensorexpr.py, plus the C++ tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44209
Reviewed By: izdeby
Differential Revision: D23575577
Pulled By: nickgg
fbshipit-source-id: 808605aeb2af812758f96f9fdc11b07e08053b46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44042
Missed one case last time
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23479345
fbshipit-source-id: 30e6713120c494e9fab5584de4df9b25bec83d32
Summary:
When backward ops execute via the autograd engine's evaluate_function(), fn.release_variables() is called to release the SavedVariables. For eager-mode ops, this releases the saved inputs that were required for the backward grad function. However, with TorchScript, we get a DifferentiableGraph, and DifferentiableGraphBackward() doesn't implement release_variables(). This causes the SavedVariables to stay alive longer than necessary. Implement release_variables() for DifferentiableGraphBackward to release these SavedVariables early.
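For context, a minimal eager-mode sketch (not from this PR) of the SavedVariable lifecycle that DifferentiableGraphBackward was not participating in:
```
import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # x is held as a SavedVariable
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.randn(1000, requires_grad=True)
loss = Square.apply(x).sum()
loss.backward()  # the eager engine calls release_variables(), freeing the saved x promptly
```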
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42994
Reviewed By: izdeby
Differential Revision: D23503172
Pulled By: albanD
fbshipit-source-id: d87127498cfa72883ae6bb31d0e6c7056c4c36d4
Summary:
This test is failing consistently on linux-bionic-rocm3.7-py3.6-test2. Relevant log snippet:
```
03:43:11 FAIL: test_addcmul_cuda_float16 (__main__.TestForeachCUDA)
03:43:11 ----------------------------------------------------------------------
03:43:11 Traceback (most recent call last):
03:43:11 File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 818, in wrapper
03:43:11 method(*args, **kwargs)
03:43:11 File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 258, in instantiated_test
03:43:11 result = test(self, *args)
03:43:11 File "test_foreach.py", line 83, in test_addcmul
03:43:11 self._test_pointwise_op(device, dtype, torch._foreach_addcmul, torch._foreach_addcmul_, torch.addcmul)
03:43:11 File "test_foreach.py", line 58, in _test_pointwise_op
03:43:11 self.assertEqual(tensors, expected)
03:43:11 File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1153, in assertEqual
03:43:11 exact_dtype=exact_dtype, exact_device=exact_device)
03:43:11 File "/var/lib/jenkins/.local/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1127, in assertEqual
03:43:11 self.assertTrue(result, msg=msg)
03:43:11 AssertionError: False is not true : Tensors failed to compare as equal! With rtol=0.001 and atol=1e-05, found 10 element(s) (out of 400) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.00048828125 (-0.46484375 vs. -0.46533203125), which occurred at index (11, 18).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44304
Reviewed By: malfet, izdeby
Differential Revision: D23578316
Pulled By: mruberry
fbshipit-source-id: 558eecf42677383e7deaa4961e12ef990ffbe28c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44233
**Summary**
By default, scripting tries to share concrete and JIT types across
compilations. However, this can lead to incorrect results if a module
extends `torch.jit.ScriptModule`, and injects instance variables into
methods defined using `define`.
This commit detects when this has happened and disables type sharing
for the compilation of the module that uses `define` in `__init__`.
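For illustration, a hypothetical reconstruction (not taken from the issue) of the failure mode: the method source differs per instance, so sharing one JIT type would be incorrect:
```
import torch

class M(torch.jit.ScriptModule):
    def __init__(self, bias):
        super().__init__()
        # the define()'d method bakes in a per-instance value
        self.define(f"def forward(self, x):\n    return x + {bias}\n")

a, b = M(1.0), M(2.0)  # after this commit, compiled with separate types
```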
**Test Plan**
This commit adds a test to TestTypeSharing that tests this scenario.
**Fixes**
This commit fixes #43580.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23553870
Pulled By: SplitInfinity
fbshipit-source-id: d756e87fcf239befa0012998ce29eeb25728d3e1
Summary:
When var and std are called without args (other than unbiased) they currently call into TH or THC. This PR:
- Removes the THC var_all and std_all functions and updates CUDA var and std to use the ATen reduction
- Fixes var's docs, which listed its arguments in the incorrect order
- Adds new tests comparing var and std with their NumPy counterparts
Performance appears to have improved as a result of this change. I ran experiments on 1D tensors, 1D tensors with every other element viewed ([::2]), 2D tensors and 2D transposed tensors. Some notable datapoints:
- torch.randn((8000, 8000))
- var measured 0.0022215843200683594s on CUDA before the change
- var measured 0.0020322799682617188s on CUDA after the change
- torch.randn((8000, 8000)).T
- var measured .015128850936889648 on CUDA before the change
- var measured 0.001912832260131836 on CUDA after the change
- torch.randn(8000 ** 2)
- std measured 0.11031460762023926 on CUDA before the change
- std measured 0.0017833709716796875 on CUDA after the change
Timings for var and std are, as expected, similar.
On the CPU, however, the performance change from making the analogous update was more complicated, and ngimel and I decided not to remove CPU var_all and std_all. ngimel wrote the following script that showcases how single-threaded CPU inference would suffer from this change:
```
import torch
import numpy as np
from torch.utils._benchmark import Timer
from torch.utils._benchmark import Compare
import sys

base = 8
multiplier = 1

def stdfn(a):
    meanv = a.mean()
    ac = a - meanv
    return torch.sqrt(((ac * ac).sum()) / a.numel())

results = []
num_threads = 1
for _ in range(7):
    size = base * multiplier
    input = torch.randn(size)
    tasks = [("torch.var(input)", "torch_var"),
             ("torch.var(input, dim=0)", "torch_var0"),
             ("stdfn(input)", "stdfn"),
             ("torch.sum(input, dim=0)", "torch_sum0")]
    timers = [Timer(stmt=stmt, num_threads=num_threads, label="Index", sub_label=f"{size}",
                    description=label, globals=globals()) for stmt, label in tasks]
    repeats = 3
    for i, timer in enumerate(timers * repeats):
        results.append(timer.blocked_autorange())
        print(f"\r{i + 1} / {len(timers) * repeats}", end="")
        sys.stdout.flush()
    multiplier *= 10
print()
comparison = Compare(results)
comparison.print()
```
The TH timings using this script on my devfair are:
```
[------------------------------ Index ------------------------------]
| torch_var | torch_var0 | stdfn | torch_sum0
1 threads: ----------------------------------------------------------
8 | 16.0 | 5.6 | 40.9 | 5.0
80 | 15.9 | 6.1 | 41.6 | 4.9
800 | 16.7 | 12.0 | 42.3 | 5.0
8000 | 27.2 | 72.7 | 51.5 | 6.2
80000 | 129.0 | 715.0 | 133.0 | 18.0
800000 | 1099.8 | 6961.2 | 842.0 | 112.6
8000000 | 11879.8 | 68948.5 | 20138.4 | 1750.3
```
and the ATen timings are:
```
[------------------------------ Index ------------------------------]
| torch_var | torch_var0 | stdfn | torch_sum0
1 threads: ----------------------------------------------------------
8 | 4.3 | 5.4 | 41.4 | 5.4
80 | 4.9 | 5.7 | 42.6 | 5.4
800 | 10.7 | 11.7 | 43.3 | 5.5
8000 | 69.3 | 72.2 | 52.8 | 6.6
80000 | 679.1 | 676.3 | 129.5 | 18.1
800000 | 6770.8 | 6728.8 | 819.8 | 109.7
8000000 | 65928.2 | 65538.7 | 19408.7 | 1699.4
```
which demonstrates that performance is analogous to calling the existing var and std with `dim=0` on a 1D tensor. This would be a significant performance hit. Another simple script shows the performance is mixed when using multiple threads, too:
```
import torch
import time

# Benchmarking var and std, 1D with varying sizes
base = 8
multiplier = 1
op = torch.var
reps = 1000
for _ in range(7):
    size = base * multiplier
    t = torch.randn(size)
    elapsed = 0
    for _ in range(reps):
        start = time.time()
        op(t)
        end = time.time()
        elapsed += end - start
    multiplier *= 10
    print("Size: ", size)
    print("Avg. elapsed time: ", elapsed / reps)
```
```
var cpu TH vs ATen timings
Size: 8
Avg. elapsed time: 1.7853736877441406e-05 vs 4.9788951873779295e-06 (ATen wins)
Size: 80
Avg. elapsed time: 1.7803430557250977e-05 vs 6.156444549560547e-06 (ATen wins)
Size: 800
Avg. elapsed time: 1.8569469451904296e-05 vs 1.2302875518798827e-05 (ATen wins)
Size: 8000
Avg. elapsed time: 2.8756141662597655e-05 vs. 6.97789192199707e-05 (TH wins)
Size: 80000
Avg. elapsed time: 0.00026622867584228516 vs. 0.0002447957992553711 (ATen wins)
Size: 800000
Avg. elapsed time: 0.0010556647777557374 vs 0.00030616092681884767 (ATen wins)
Size: 8000000
Avg. elapsed time: 0.009990205764770508 vs 0.002938544034957886 (ATen wins)
std cpu TH vs ATen timings
Size: 8
Avg. elapsed time: 1.6681909561157225e-05 vs. 4.659652709960938e-06 (ATen wins)
Size: 80
Avg. elapsed time: 1.699185371398926e-05 vs. 5.431413650512695e-06 (ATen wins)
Size: 800
Avg. elapsed time: 1.768803596496582e-05 vs. 1.1279821395874023e-05 (ATen wins)
Size: 8000
Avg. elapsed time: 2.7791500091552735e-05 vs 7.031106948852539e-05 (TH wins)
Size: 80000
Avg. elapsed time: 0.00018650460243225096 vs 0.00024368906021118164 (TH wins)
Size: 800000
Avg. elapsed time: 0.0010522041320800782 vs 0.0003039860725402832 (ATen wins)
Size: 8000000
Avg. elapsed time: 0.009976618766784668 vs. 0.0029211788177490234 (ATen wins)
```
These results show the TH solution still performs better than the ATen solution with default threading for some sizes.
It seems like removing CPU var_all and std_all will require an improvement in ATen reductions. https://github.com/pytorch/pytorch/issues/40570 has been updated with this information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43858
Reviewed By: zou3519
Differential Revision: D23498981
Pulled By: mruberry
fbshipit-source-id: 34bee046c4872d11c3f2ffa1b5beee8968b22050
Summary:
This PR adds the following aliases:
- not_equal for torch.ne
- greater for torch.gt
- greater_equal for torch.ge
- less for torch.lt
- less_equal for torch.le
These aliases are consistent with NumPy's naming for these functions.
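Usage is straightforward (a quick sketch):
```
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([3, 2, 1])

torch.not_equal(a, b)      # alias of torch.ne
torch.greater(a, b)        # alias of torch.gt
torch.greater_equal(a, b)  # alias of torch.ge
torch.less(a, b)           # alias of torch.lt
torch.less_equal(a, b)     # alias of torch.le
```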
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43870
Reviewed By: zou3519
Differential Revision: D23498975
Pulled By: mruberry
fbshipit-source-id: 78560df98c9f7747e804a420c1e53fd1dd225002
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43985
Added
```
def(detail::SelectiveStr<true>, ...)
impl(detail::SelectiveStr<true>, ...)
```
in torch/library, which can also be used for other templated selective registration.
Size saves for this diff:
fbios-pika: 78 KB
igios: 87 KB
Test Plan: Imported from OSS
Reviewed By: ljk53, smessmer
Differential Revision: D23459774
Pulled By: iseeyuan
fbshipit-source-id: 86d34cfe8e3f852602f203db06f23fa99af2c018
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44048
Inline the fork-wait calls to make sure we can see the ops to be quantized in the main graph.
Also fix the InlineForkWait JIT pass to account for the case where the aten::wait call isn't present in the main graph and we return a future tensor from the subgraph.
Example:
```
graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_6325.DperModuleWrapper,
      %argument_1.1 : Tensor,
      %argument_2.1 : Tensor):
  %3 : Future[Tensor[]] = prim::fork_0(%self.1, %argument_1.1, %argument_2.1) # :0:0
  return (%3)
with prim::fork_0 = graph(%self.1 : __torch__.dper3.core.interop.___torch_mangle_5396.DperModuleWrapper,
      %argument_1.1 : Tensor,
      %argument_2.1 : Tensor):
  %3 : __torch__.dper3.core.interop.___torch_mangle_6330.DperModuleWrapper = prim::GetAttr[name="x"](%self.1)
  %4 : __torch__.dper3.core.interop.___torch_mangle_5397.DperModuleWrapper = prim::GetAttr[name="y"](%self.1)
  %5 : __torch__.dper3.core.interop.___torch_mangle_6327.DperModuleWrapper = prim::GetAttr[name="z"](%4)
  %6 : Tensor = prim::CallMethod[name="forward"](%5, %argument_1.1, %argument_2.1) # :0:0
  %7 : None = prim::CallMethod[name="forward"](%3, %6) # :0:0
  %8 : Tensor[] = prim::ListConstruct(%6)
  return (%8)
```
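For reference (illustrative, not from this PR), the TorchScript-level pattern that produces such graphs, including the case where the future is returned without an aten::wait in the main graph:
```
import torch

@torch.jit.script
def child(x):
    return x * 2

@torch.jit.script
def parent(x):
    fut = torch.jit.fork(child, x)  # shows up as prim::fork in the graph
    return torch.jit.wait(fut)      # shows up as aten::wait

@torch.jit.script
def parent_no_wait(x):
    # future returned directly: no aten::wait in the main graph
    return torch.jit.fork(child, x)
```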
Test Plan:
python test/test_quantization.py test_interface_with_fork
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23481003
fbshipit-source-id: 2e756be73c248319da38e053f021888b40593032
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44008
embedding_bag requires only quantization of weights (no dynamic quantization of inputs),
so the type of quantization is essentially static (without calibration).
This will enable pyper to do fc and embedding_bag quantization using the same API call.
Test Plan:
python test/test_quantization.py test_embedding_bag
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23467019
fbshipit-source-id: 41a61a17ee34bcb737ba5b4e19fb7a576d4aeaf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43989
When we trace the model it produces an aten::embedding_bag node in the graph.
Add the necessary passes in graph mode to support quantizing it as well.
Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23460485
fbshipit-source-id: 328c5e1816cfebb10ba951113f657665b6d17575
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44137
We only insert guards on Tensor types, so we rely on the output
of a node being uniquely determined by its input types.
We bail if any non-Tensor input affects the output type
and cannot be reasoned about statically.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23543602
Pulled By: eellison
fbshipit-source-id: abd6fe0b1fd7fe6fc251694d4cd442b19c032dd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44125
In `Quantizer._prepare`, `observed` was used for two different variables
with different types. Making the names a bit cleaner and removing the
name conflict.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: dskhudia
Differential Revision: D23504109
fbshipit-source-id: 0f73eac3d6dd5f72ad5574a4d47d33808a70174a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44165
Allows convolutions to be quantized if the `torch.backends.cudnn.benchmark`
flag is set.
Not for land yet, just testing.
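For reference, the flag in question (one line):
```
import torch

torch.backends.cudnn.benchmark = True  # cuDNN autotuner; gates conv quantization in this pass
```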
Test Plan:
in the gist below, the resulting graph now has quantized convolutions
https://gist.github.com/vkuzo/622213cb12faa0996b6700b08d6ab2f0
Imported from OSS
Reviewed By: supriyar
Differential Revision: D23518775
fbshipit-source-id: 294f678c6afbd3feeb89b7a6655bc66ac9f8bfbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44227
As title
ghstack-source-id: 111490242
Test Plan: CI
Reviewed By: xcheng16
Differential Revision: D23549149
fbshipit-source-id: fad742a8d4e6f844f83495514cd60ff2bf0d5bcb
Summary:
Update the repeat op so that the inputs to the sizes argument can be a mixture of dynamic and constant inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43430
Reviewed By: houseroad
Differential Revision: D23494257
Pulled By: bzinodev
fbshipit-source-id: 90c5e90e4f73e98f3a9d5c8772850e72cecdf0d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43906
This method returns a list of RRefs of remote parameters that can be fed into the DistributedOptimizer.
Original PR issue: RemoteModule enhancements #40550
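A hedged usage sketch (constructor details and import path are assumptions based on the RemoteModule API of the time):
```
import torch
from torch.distributed.nn.api.remote_module import RemoteModule
from torch.distributed.optim import DistributedOptimizer

# assumes the RPC framework is already initialized and "worker1" exists
remote_linear = RemoteModule("worker1/cpu", torch.nn.Linear, args=(10, 10))
opt = DistributedOptimizer(
    torch.optim.SGD,
    remote_linear.remote_parameters(),  # list of RRefs to the remote weights
    lr=0.01,
)
```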
Test Plan: buck test caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
Reviewed By: rohan-varma
Differential Revision: D23399586
fbshipit-source-id: 4b0f1ccf2e47c8a9e4f79cb2c8668f3cdbdff820
Summary:
Duplicate of https://github.com/pytorch/pytorch/issues/41413
This PR initiates the process of updating the TorchScript backend interface used by the ONNX exporter:
- Replace the JIT lower-graph pass with the freeze-module pass.
- Enable ScriptModule tests for ONNX operator tests (ORT backend) and model tests by default.
- Replace the JIT remove_inplace_ops pass with remove_mutation, and consolidate all passes for handling in-place ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43791
Reviewed By: houseroad
Differential Revision: D23421872
Pulled By: bzinodev
fbshipit-source-id: a98710c45ee905748ec58385e2a232de2486331b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44092
Instead, submodules and weights are installed directly on the
graph_module by transferring the original modules. This makes it more
likely that scripting will succeed (since we no longer have submodules
that are not used in the trace). It also prevents layered transforms
from having to special case handling of the `root` module. GraphModules
can now be re-traced as part of the input to other transforms.
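For illustration (using the then-experimental torch.fx tracing entry point; names are assumptions):
```
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.lin(x).relu()

gm = torch.fx.symbolic_trace(M())  # submodules/weights now live on the GraphModule itself
gm2 = torch.fx.symbolic_trace(gm)  # GraphModules can be re-traced by later transforms
scripted = torch.jit.script(gm)    # no dangling unused submodules, so scripting is likelier to work
```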
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23504210
Pulled By: zdevito
fbshipit-source-id: f79e5c4cbfc52eb0ffb5d6ed89b37ce35a7dc467
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44052
Summary
=======
This PR registers the following backwards functions as operators:
- slice_backward
- select_backward
- gather_backward
- index_select_backward (the backward function for index_select)
- select_index_backward (previously known as index_select_backward, but is actually the backward function for max.dim, min.dim, etc)
In the future, I'd like to register more backward functions as operators
so that we can write batching rules for the backward functions. Batching
rules for backward functions makes it so that we can compute batched
gradients.
Motivation
==========
The rationale behind this PR is that a lot of backwards functions (27 in total)
are incompatible with BatchedTensor due to using in-place operations.
Sometimes we can allow the in-place operations, but other times we can't.
For example, consider select_backward:
```
Tensor select_backward(const Tensor& grad, IntArrayRef input_sizes, int64_t dim, int64_t index) {
  auto grad_input = at::zeros(input_sizes, grad.options());
  grad_input.select(dim, index).copy_(grad);
  return grad_input;
}
```
and consider the following code:
```
x = torch.randn(5, requires_grad=True)

def select_grad(v):
    torch.autograd.grad(x[0], x, v)

vs = torch.randn(B0)
batched_grads = vmap(select_grad)(vs)
```
For the batched gradient use case, `grad` is a BatchedTensor.
The physical version of `grad` has size `(B0,)`.
However, select_backward creates a `grad_input` of shape `(5)`, and
tries to copy `grad` to a slice of it.
Other approaches
================
I've considered the following:
- register select_backward as an operator (this PR)
- have a branch inside select_backward for if `grad` is batched.
- this is OK, but what if we have more tensor extensions that want to override this?
- modify select_backward to work with BatchedTensor, by creating a new operator for the "select + copy_ behavior".
- select + copy_ isn't used elsewhere in derivative formulas so this doesn't seem useful
Test Plan
=========
- `pytest test/test_autograd.py -v`
- Registering backward functions may impact performance. I benchmarked
select_backward to see if registering it as an operator led to any noticeable
performance overheads: https://gist.github.com/zou3519/56d6cb53775649047b0e66de6f0007dc.
The TL;DR is that the overhead is pretty minimal.
Test Plan: Imported from OSS
Reviewed By: ezyang, fbhuba
Differential Revision: D23481183
Pulled By: zou3519
fbshipit-source-id: 125af62eb95824626dc83d06bbc513262ee27350
Summary:
A rework of `computeInline` which makes it work a bit better, particularly when combined with other transformations. Previously we stored Functions that were inlined and then deferred the actual inlining of the function body until prepareForCodegen was called. This had an issue when transformations were applied to the LoopNest: the function body could differ from what appears in the root_stmt, resulting in inlining that a) fails, b) reverses other transformations, or c) does a weird, unpredictable combination of the two.
This PR changes that behaviour so that the inlining occurs in the root stmt immediately, which means it reflects any previous transformations, and any future transformations have a true view of the internal IR. It also has the benefit that inspecting the root statement gives an accurate view of it without needing to call prepareForCodegen. I also removed the difference between `computeInline` and `computeInlineWithRand`, and we handle calls to `rand()` in all branches.
This is a rework of https://github.com/pytorch/pytorch/issues/38696, with the agreed changes from ZolotukhinM and zheng-xq: we should only inline if the dimensions are trivial (ie. they are vars not exprs).
This PR is mostly tests, and I fixed a bunch of bugs I found along the way. Partial list:
* When inlining an expression involving rand, we would create random vars equal to the dimensionality of the enclosing Tensor, not the produced Tensor - meaning we'd use an incorrect value if the inlined tensor was smaller. E.g. `X[i] = rand(); A[i, j] = X[i]` would produce a tensor where `A[0, 0] != A[0, 1]`. This is fixed by inserting the Let binding of the random variable at the correct loop body.
* When inlining we'd replace all calls to `rand()` rather than just those present in the Tensor being inlined.
* `rand()` was treated symbolically by the simplifier and we would aggregate or cancel calls to `rand()`. Have fixed the hasher to hash all calls to `rand()` distinctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43885
Reviewed By: gmagogsfm
Differential Revision: D23503636
Pulled By: nickgg
fbshipit-source-id: cdbdc902b7a14d269911d978a74a1c11eab004fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44139
Also, make sure that we're checking that condition when we're starting a
new fusion group, not only when we merge a node into an existing fusion
group. Oh, and one more: add a test checking that we're rejecting graphs
with unspecified shapes.
Differential Revision: D23507510
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: 9c268825ac785671d7c90faf2aff2a3e5985ac5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44115
Fixes device affinity in the FX prepare pass for QAT. Before this PR, observers
were always created on CPU. After this PR, observers are created on the
same device as the rest of the model. This will enable QAT prepare to
work regardless of whether users move the model to cuda before or after
calling this pass.
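A hedged sketch of the flow this enables (using the later-stabilized prepare_qat_fx entry point; the FX API at the time of this commit was still experimental and may have differed):
```
import torch
from torch.quantization import get_default_qat_qconfig
from torch.quantization.quantize_fx import prepare_qat_fx

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 1)).cuda().train()
prepared = prepare_qat_fx(model, {"": get_default_qat_qconfig("fbgemm")})
# after this PR, the inserted observers live on cuda, matching the model
```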
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_qat_prepare_device_affinity
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D23502291
fbshipit-source-id: ec4ed20c21748a56a25e3395b35ab8640d71b5a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43298
IR emitter uses `ModuleValue` to represent ScriptModules and emit IR for
attribute access, submodule access, etc.
`ModuleValue` relies on two pieces of information, the JIT type of the
module, and the `ConcreteModuleType`, which encapsulates Python-only
information about the module.
ScriptModules loaded from a package used to create a dummy
ConcreteModuleType without any info in it. This led to divergences in
behavior during compilation.
This PR makes the two ways of constructing a ConcreteModuleType equivalent,
modulo any py-only information (which, by definition, is never present in
packaged files anyway).
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23228738
Pulled By: suo
fbshipit-source-id: f6a660f42272640ca1a1bb8c4ee7edfa2d1b07cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43284
The IR emitter looks for attributes on modules like:
1. Check the JIT type for the attribute
2. Check the originating Python class, in order to fulfill requests for, e.g. static methods or ignored methods.
In the case where you do:
```
inner_module = torch.jit.load("inner.pt")
wrapped = Wrapper(inner_module) # wrap the loaded ScriptModule in an nn.Module
torch.jit.script(wrapped)
```
The IR emitter may check for attributes on `inner_module`. There is no
originating Python class for `inner_module`, since it was directly
compiled from the serialized format.
Due to a bug in the code, we don't guard for this case and a segfault
results if the wrapper asks for an undefined attribute. The lookup in
this case looks like:
1. Check the JIT type for the attribute (not there!)
2. Check the originating Python class (this is a nullptr! segfault!)
This PR guards this case and properly just raises an attribute missing
compiler error instead of segfaulting.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23224337
Pulled By: suo
fbshipit-source-id: 0cf3060c427f2253286f76f646765ec37b9c4c49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44083
Match on the complete schema of a node instead of its node kind when deciding to fuse it. Previously we matched on node kind, which could fail with something like `aten::add(int, int)`: if a new overload was added to an op without corresponding NNC support, we would still fuse it.
Follow ups are:
- bail when an output tensor type isn't uniquely determined by the input types (e.g. aten::add, where the second input could be either a float or an int)
- remove NNC lowering for _tanh_backward & _sigmoid_backward
- validate that we support all of the overloads here. I optimistically added ops that included Tensors; it's possible that we do not support every overload here. This isn't a regression, and this PR is at least improving our failures in that regard.
I can do any of these as part of this PR if desired, but there are a number of failures people have run into that this PR fixes so I think it would be good to land this sooner than later.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23503704
Pulled By: eellison
fbshipit-source-id: 3ce971fb1bc3a7f1cbaa38f1ed853e2db3d67c18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965
As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D23330762
fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
Summary:
Polishes DDP join API docstrings and makes a few minor cosmetic changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43973
Reviewed By: zou3519
Differential Revision: D23467238
Pulled By: rohan-varma
fbshipit-source-id: faf0ee56585fca5cc16f6891ea88032336b3be56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44036
Running replaceAtenConvolution on older traced models won't work, as the
_convolution signature has changed and replaceAtenConvolution was
changed to account for that.
But we did not preserve the old behavior when making that change. This change
restores the old behavior while keeping the new one.
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23476775
fbshipit-source-id: 73a0c2b7387f2a8d82a8d26070d0059972126836
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44035
change
Also added a test so as to capture such cases in the future.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D23476773
fbshipit-source-id: a62c4429351c909245106a70b4c60b1bacffa817
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44060
Right now it skips grad checks as well.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23484018
Pulled By: gchanan
fbshipit-source-id: 24a8f1af41f9918aaa62bc3cd78b139b2f8de1e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44054
**Summary**
This commit improves the error message that is printed when an
`Optional` type annotation with an unsupported contained type is
encountered. At present, the `Optional` is printed as-is, and
`Optional[T]` is syntactic sugar for `Union[T, None]`, so that is what
shows up in the error message and can be confusing. This commit modifies
the error message so that it prints `T` instead of `Union[T, None]`.
**Test Plan**
Continuous integration.
Example of old message:
```
AssertionError: Unsupported annotation typing.Union[typing.List, NoneType] could not be resolved.
```
Example of new message:
```
AssertionError: Unsupported annotation typing.Union[typing.List, NoneType] could not be resolved because typing.List could not be resolved.
```
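For context, a minimal snippet (hedged; not from the commit) that hits this error path, since the bare `List` inside the `Optional` cannot be resolved:
```
from typing import List, Optional

import torch

@torch.jit.script
def f(x: Optional[List]) -> int:  # raises the error above
    return 0
```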
**Fixes**
This commit fixes #42859.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23490365
Pulled By: SplitInfinity
fbshipit-source-id: 2aa9233718e78cf1ba3501ae11f5c6f0089e29cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44078
When PyTorch mobile inference fails and throws an exception, if the caller catches it and does not crash the app, we are not able to track the inference failures.
So we are adding native soft error reporting to capture all the failures occurring during module loading and running, including both crashing and non-crashing failures. Since c10::Error has good error-message stack handling (D21202891 (a058e938f9)), we are utilizing it for the error handling and message printout.
ghstack-source-id: 111307080
Test Plan:
Verified that the soft error reporting is sent through module.cpp when an operator is missing, making sure a logview mid is generated with a stack trace: https://www.internalfb.com/intern/logview/details/facebook_android_softerrors/5dd347d1398c1a9a73c804b20f7c2179/?selected-logview-tab=latest.
Error message with context is logged below:
```
soft_error.cpp [PyTorchMobileInference] : Error occured during model running entry point: Could not run 'aten::embedding' with arguments from the 'CPU' backend. 'aten::embedding' is only available for these backends: [BackendSelect, Named, Autograd, Autocast, Batched, VmapMode].
BackendSelect: fallthrough registered at xplat/caffe2/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at xplat/caffe2/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Autograd: fallthrough registered at xplat/caffe2/aten/src/ATen/core/VariableFallbackKernel.cpp:31 [backend fallback]
Autocast: fallthrough registered at xplat/caffe2/aten/src/ATen/autocast_mode.cpp:253 [backend fallback]
Batched: registered at xplat/caffe2/aten/src/ATen/BatchingRegistrations.cpp:317 [backend fallback]
VmapMode: fallthrough registered at xplat/caffe2/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Exception raised from reportError at xplat/caffe2/aten/src/ATen/core/dispatch/OperatorEntry.cpp:261 (m
```
Reviewed By: iseeyuan
Differential Revision: D23428636
fbshipit-source-id: 82d5d9c054300dff18d144f264389402d0b55a8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43734
Following the additional GH comments on the original PR https://github.com/pytorch/pytorch/pull/43307.
ghstack-source-id: 111327130
Test Plan: Run `python test/distributed/test_c10d.py`
Reviewed By: smessmer
Differential Revision: D23380288
fbshipit-source-id: 4b8889341c57b3701f0efa4edbe1d7bbc2a82ced
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44055
There is no functional change here. Another patch will rename NewCriterionTest to CriterionTest.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23482572
Pulled By: gchanan
fbshipit-source-id: de364579067e2cc9de7df6767491f8fa3a685de2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44050
We don't actually turn on the CTCLoss tests since they fail, but this allows you to toggle check_forward_only and lets the code actually run.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23481091
Pulled By: gchanan
fbshipit-source-id: f2a3b0a2dee27341933c5d25f1e37a878b04b9f6
Summary:
This PR adds a new test suite, test_ops.py, designed for generic tests across all operators with OpInfos. It currently has two kinds of tests:
- it validates that the OpInfo has the correct supported dtypes by verifying that unsupported dtypes throw an error and supported dtypes do not
- it runs grad and gradgrad checks on each op and its variants (method and inplace) that has an OpInfo
This is a significant expansion and simplification of the current autogenerated autograd tests, which spend considerable time processing their inputs. As an alternative, this PR extends OpInfos with "SampleInputs" that are much easier to use. These sample inputs are analogous to the existing tuples in `method_tests()`.
Future PRs will extend OpInfo-based testing to other uses of `method_tests()`, like test_jit.py, to ensure that new operator tests can be implemented entirely using an OpInfo.
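For a flavor of the idea (an illustration of the concept, not the exact internal API), a sample input bundles everything needed for one call of the op under test:
```
import torch

class SampleInput:  # illustrative stand-in for the test-infra class
    def __init__(self, input, args=(), kwargs=None):
        self.input = input          # tensor passed as the first argument
        self.args = args            # remaining positional args
        self.kwargs = kwargs or {}

def sample_inputs_add(device, dtype):
    t = torch.randn(3, 3, device=device, dtype=dtype)
    u = torch.randn(3, 3, device=device, dtype=dtype)
    return [SampleInput(t, args=(u,))]
```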
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43451
Reviewed By: albanD
Differential Revision: D23481723
Pulled By: mruberry
fbshipit-source-id: 0c2cdeacc1fdaaf8c69bcd060d623fa3db3d6459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44073
We don't have proper support for it yet on the NNC side or in the JIT IR->NNC lowering.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23487905
Pulled By: ZolotukhinM
fbshipit-source-id: da0da7478fc8ce7b455176c95d8fd610c94352c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43961
Currently we're removing prim::profile nodes and embedding the type info
directly in the IR right before the fuser, because it is difficult to
fuse in a presence of prim::profile nodes. It turns out that BatchMM has
a similar problem: it doesn't work when there are prim::profile nodes in
the graph. These two passes run next to each other, so we could simply
remove prim::profile nodes slightly earlier: before the BatchMM pass.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23453266
Pulled By: ZolotukhinM
fbshipit-source-id: 92cb50863962109b3c0e0112e56c1f2cb7467ff1
Summary:
To avoid conflicts, this PR does not remove all imports. More are coming in further PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43808
Reviewed By: wanchaol
Differential Revision: D23436675
Pulled By: ailzhang
fbshipit-source-id: ccc21a1955c244f0804277e9e47e54bfd23455cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43972
It is useful when debugging to disable the NNC backend to see whether
the bug is there or in the fuser logic.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D23455624
Pulled By: ZolotukhinM
fbshipit-source-id: f7c0452a29b860afc806e2d58acf35aa89afc060
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43270
`torch.conj` is a very commonly used operator for complex tensors, but it's mathematically a no-op for real tensors. Switching to tensorflow gradients for complex tensors (as discussed in #41857) would involve adding `torch.conj()` to the backward definitions for a lot of operators. In order to preserve autograd performance for real tensors and maintain numpy compatibility for `torch.conj`, this PR updates `torch.conj()`, which behaves the same for complex tensors but performs a view / returns the `self` tensor for tensors of non-complex dtypes. The documentation states that the returned tensor for a real input shouldn't be mutated. We could perhaps return an immutable tensor for this case in the future when that functionality is available (zdevito ezyang).
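A quick sketch of the new behavior (hedged; aliasing semantics are as described above):
```
import torch

c = torch.tensor([1 + 2j, 3 - 4j])
torch.conj(c)        # negates the imaginary part, as before

r = torch.randn(3)
out = torch.conj(r)  # after this PR: no copy for real dtypes
# per the docs, `out` should not be mutated for real inputs
```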
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23460493
Pulled By: anjali411
fbshipit-source-id: 3b3bf0af55423b77ff2d0e29f5d2c160291ae3d9