Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45766
As per subj, making KeyError message more verbose.
Test Plan:
Verified that breakage can be successfully investigated with verbose error message
unit tests
Reviewed By: esqu1
Differential Revision: D24080362
fbshipit-source-id: f4e22a78809e5cff65a69780d5cbbc1e8b11b2e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45642
Prior to https://github.com/pytorch/pytorch/pull/45181, initializing a
NCCL process group would work even if no GPUs were present. However, now that
`init_process_group` calls `barrier()`, this fails.
In general the problem was that we could initialize ProcessGroupNCCL without
GPUs and then if we called a method like `barrier()` the process would crash
since we do % numGPUs resulting in division by zero.
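For illustration, a minimal sketch of the failure mode and the guard (illustrative only; the modulo device assignment mirrors what ProcessGroupNCCL does internally):
```
import torch

num_gpus = torch.cuda.device_count()  # 0 on a CPU-only host
rank = 0
if num_gpus == 0:
    raise RuntimeError("ProcessGroupNCCL requires at least one GPU")
device_idx = rank % num_gpus  # ZeroDivisionError without the guard above
```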
ghstack-source-id: 113490343
Test Plan: waitforbuildbot
Reviewed By: osalpekar
Differential Revision: D24038839
fbshipit-source-id: a1f1db52cabcfb83e06c1a11ae9744afbf03f8dc
Summary:
Rename the jobs for testing GraphExecutor configurations to something a little more sensible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45715
Reviewed By: ezyang, anjali411
Differential Revision: D24114344
Pulled By: Krovatkin
fbshipit-source-id: 89e5f54aaebd88f8c5878e060e983c6f1f41b9bb
Summary:
The torchbind tests didn't work because we somehow missed the rename of caffe2_gpu to torch_... (hip for us) in https://github.com/pytorch/pytorch/issues/20774 (merged 2019-06-13, oops) and still tried to link against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45426
Reviewed By: VitalyFedyunin
Differential Revision: D24112439
Pulled By: walterddr
fbshipit-source-id: a66a574e63714728183399c543d2dafbd6c028f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45776
Splitting out backend and custom class registration into their own library is
not currently implemented in fbcode, so detect that we are running tests in
fbcode and disable those tests.
Test Plan: buck test mode/no-gpu mode/dev caffe2/test:jit
Reviewed By: smessmer
Differential Revision: D24085871
fbshipit-source-id: 1fcc0547880bc4be59428e2810b6a7f6e50ef798
Summary:
* Add a pass at the end of runCleanupPasses to annotate `aten::warn` so that each has a unique id
* Enhanced the interpreter so that it tracks which `aten::warn` has been executed before and skips it
* Improved insertInstruction so that it correctly checks for overflow
Fixes https://github.com/pytorch/pytorch/issues/45108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45382
Reviewed By: mrshenli
Differential Revision: D24060677
Pulled By: gmagogsfm
fbshipit-source-id: 9221bc55b9ce36b374bdf614da3fe47496b481c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45726
FB has an old internal platform that uses some random llvm version
that looks sort of like llvm 7. I've guarded that with the appropriate
LLVM_VERSION_PATCH.
I've also swapped out some of our uses of ThreadSafeModule/ThreadSafeContext
for the variants without ThreadSafe in the name. As far as I can tell we
weren't using the bundled locks anyways, but I'm like 85% sure this is OK since
we compile under the Torch JIT lock anyways.
Test Plan: unit tests
Reviewed By: ZolotukhinM, asuhan
Differential Revision: D24072697
fbshipit-source-id: 7f56b9f3cbe5e6d54416acdf73876338df69ddb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44220
Closes https://github.com/pytorch/pytorch/issues/44009
Currently, if a dataloader returns objects created with a
`collections.namedtuple`, they will incorrectly be cast to plain tuples. As a result, if we have data of these types, there can be runtime errors during the forward pass if the module is expecting a named tuple.
Fix this in
`scatter_gather.py` to resolve the issue reported in
https://github.com/pytorch/pytorch/issues/44009
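A minimal sketch of the idea behind the fix (helper names are hypothetical; the actual change lives in `scatter_gather.py`):
```
def is_namedtuple(obj):
    # namedtuples are tuples with a _fields attribute and a _make classmethod
    return isinstance(obj, tuple) and hasattr(obj, "_fields") and hasattr(obj, "_make")

def rebuild(obj, mapped_children):
    # rebuild with the original type so namedtuples survive scatter/gather
    if is_namedtuple(obj):
        return type(obj)(*mapped_children)
    return type(obj)(mapped_children)
```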
ghstack-source-id: 113423287
Test Plan: CI
Reviewed By: colesbury
Differential Revision: D23536752
fbshipit-source-id: 3838e60162f29ebe424e83e474c4350ae838180b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45543
This PR adds documentation for the c10d Store to the public docs. Previously these docs were missing although we exposed a lightly-used (but potentially useful) Python API for our distributed key-value store.
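For reference, a small usage sketch of the key-value store API being documented (single-process FileStore for brevity; TCPStore exposes the same set/get surface):
```
import torch.distributed as dist

store = dist.FileStore("/tmp/example_store", 1)  # world_size 1, single process
store.set("key", "value")
print(store.get("key"))  # b'value'
```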
ghstack-source-id: 113409195
Test Plan: Will verify screenshots by building the docs.
Reviewed By: pritamdamania87
Differential Revision: D24005598
fbshipit-source-id: 45c3600e7c3f220710e99a0483a9ce921d75d044
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45464
Usage of Symbols to find arguments requires one to generate a nonsense symbol for inputs which don't already have one. The intention of Symbols appears to be something of an interned string, but the namespace component doesn't apply to an argument. In order to access the arguments by name without adding new symbols, versions of those functions taking std::string input were added. These can be proved valid based on the existing codepath. Additionally, a hasNamedInput convenience function was added to remove the necessity of a try/catch block in user code.
The primary motivation is to be able to easily handle the variable number of arguments in glow, so that the arange op may be implemented.
Reviewed By: eellison
Differential Revision: D23972315
fbshipit-source-id: 3e0b41910cf07e916186f1506281fb221725a91b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44678
This is a prototype PR that introduces 4 bit qtensors. The new dtype added for this is c10::quint4x2
The underlying storage for this is still uint8_t, so we pack 2 4-bit values in a byte while quantizing it.
This change uses most of the existing scaffolding for qtensor storage. We allocate storage
based on the dtype before creating a new qtensor.
It also adds a dispatch mechanism for this dtype so we can use this to get the bitwidth, qmin and qmax info
while quantizing and packing the qtensor (when we add 2-bit qtensor)
Kernels that use this dtype should be aware of the packing format.
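A minimal sketch of the packing scheme (illustrative only; the real packing happens inside the quantization kernels):
```
def pack_quint4x2(vals):
    # vals: ints in [0, 15]; two 4-bit values are packed into one uint8 byte
    packed = []
    for i in range(0, len(vals), 2):
        lo = vals[i] & 0x0F
        hi = (vals[i + 1] & 0x0F) if i + 1 < len(vals) else 0
        packed.append(lo | (hi << 4))
    return packed

assert pack_quint4x2([3, 10]) == [0xA3]  # 10 -> high nibble, 3 -> low nibble
```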
Test Plan:
Locally tested
```
import os
import torch

x = torch.ones((100, 100), dtype=torch.float)
qx_8bit = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint8)
qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint4x2)
torch.save(x, "temp.p")
print('Size float (B):', os.path.getsize("temp.p"))
os.remove('temp.p')
torch.save(qx_8bit, "temp.p")
print('Size quantized 8bit(B):', os.path.getsize("temp.p"))
os.remove('temp.p')
torch.save(qx, "temp.p")
print('Size quantized 4bit(B):', os.path.getsize("temp.p"))
os.remove('temp.p')
```
```
Size float (B): 40760
Size quantized 8bit(B): 10808
Size quantized 4bit(B): 5816
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23993134
fbshipit-source-id: 073bf262f9680416150ba78ed2d932032275946d
Summary:
This modifies the default bailout depth to 20, which gives us reasonable performance in the benchmarks we considered (fastrnns, maskrcnn, hub/benchmark, etc.).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45710
Reviewed By: robieta
Differential Revision: D24071861
Pulled By: Krovatkin
fbshipit-source-id: 472aacc136f37297b21f577750c1d60683a6c81e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45693
**Summary**
This commit updates the docstring for
`torch.distributions.NegativeBinomial` to better match actual behaviour.
In particular, the parameter currently documented as probability of
success is actually probability of failure.
**Test Plan**
1) Ran the code from the issue to make sure this is still an issue (it
is)
2) `make html` and viewed the docs in a browser.
*Before*
<img width="879" alt="Screenshot 2020-10-01 at 1:35:28 PM" src="https://user-images.githubusercontent.com/4392003/94864456-db3a5680-03f0-11eb-977e-3bab0fb9c206.png">
*After*
<img width="877" alt="Screenshot 2020-10-01 at 2:12:24 PM" src="https://user-images.githubusercontent.com/4392003/94864478-e42b2800-03f0-11eb-965a-51493ca27c80.png">
**Fixes**
This commit closes #42449.
Test Plan: Imported from OSS
Reviewed By: robieta
Differential Revision: D24071048
Pulled By: SplitInfinity
fbshipit-source-id: d345b4de721475dbe26233e368af62eb57a47970
Summary:
We are trying to build libtorch statically (BUILD_SHARED_LIBS=OFF) then link it into a DLL. Our setup hits the infinite loop mentioned [here](54c05fa34e/torch/csrc/autograd/engine.cpp (L228)) because we build with `BUILD_SHARED_LIBS=OFF` but still link it all into a DLL at the end of the day.
This PR fixes the issue by changing the condition to guard on which Windows runtime the build links against, using the `CAFFE2_USE_MSVC_STATIC_RUNTIME` flag. `CAFFE2_USE_MSVC_STATIC_RUNTIME` defaults to ON when `BUILD_SHARED_LIBS=OFF`, so backwards compatibility is maintained.
I'm not entirely confident I understand the subtleties of the Windows runtime versus linking setup, but this setup works for us and should not affect the existing builds.
Fixes https://github.com/pytorch/pytorch/issues/44470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43532
Reviewed By: mrshenli
Differential Revision: D24053767
Pulled By: albanD
fbshipit-source-id: 1127fefe5104d302a4fc083106d4e9f48e50add8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45672
This PR merges all quantization mode and will only expose the following top level functions:
```
prepare_fx
prepare_qat_fx
convert_fx
```
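A hedged usage sketch of these functions (import paths and the qconfig plumbing reflect the code base around this PR and should be treated as assumptions):
```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(model, qconfig_dict)  # insert observers
prepared(torch.randn(1, 4))                 # calibrate
quantized = convert_fx(prepared)            # produce the quantized model
```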
Test Plan:
Imported from OSS
Reviewed By: z-a-f
Differential Revision: D24053439
fbshipit-source-id: 03d545e26a36bc22a73349061b751eeb35171e64
Summary:
WIP: This PR is a work in progress for the partitioning of FX graph modules. _class partitioner_ generates partitions for the graph module. _class partition_ is a partition node in the partitions.
_Partitioner()_ : create a partitioner
_partition_graph(self, fx_module: GraphModule, devices: List[str]) -> None_:
use fx graph module and devices as the input and create partition_ids for each node inside the graph module
_dump_partition_DAG(self) -> None_:
print out the information about each partition, including its id, its backend type (what type of device this partition uses), all the nodes included in this partition, its parent partitions, children partitions, input nodes, and output nodes.
So far, only a single partition is considered, which means there is only one device with unlimited memory.
A unit test called _test_find_single_partition()_ is added to test whether all nodes in the graph are marked for the only partition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45429
Reviewed By: izdeby
Differential Revision: D24026268
Pulled By: scottxu0730
fbshipit-source-id: 119d506f33049a59b54ad993670f4ba5d8e15b0b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42855. Previously, back quotes weren't rendering correctly in
equations. This is because we were quoting things like `'mean'`. In
order to backquote properly in LaTeX in text-mode, the back-quote needs
to be written as a back-tick.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45662
Test Plan:
- built docs locally and viewed the changes.
For NLLLoss (which is not the original module mentioned in the issue, but it has the same problem), we can see how the back quotes now render properly:

Reviewed By: glaringlee
Differential Revision: D24049880
Pulled By: zou3519
fbshipit-source-id: 61a1257994144549eb8f29f19d639aea962dfec0
Summary:
This PR adds support for complex-valued input for `torch.symeig`.
TODO:
- [ ] complex cuda tests raise `RuntimeError: _th_bmm_out not supported on CUDAType for ComplexFloat`
Update: Added xfailing tests for complex dtypes on CUDA. Once support for complex `bmm` is added these tests will work.
Fixes https://github.com/pytorch/pytorch/issues/45061.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45121
Reviewed By: mrshenli
Differential Revision: D24049649
Pulled By: anjali411
fbshipit-source-id: 2cd11f0e47d37c6ad96ec786762f2da57f25dac5
Summary:
Amp gradient unscaling is a great use case for multi tensor apply (in fact it's the first case I wrote it for). This PR adds an MTA unscale+infcheck functor. Really excited to have it for `torch.cuda.amp`. izdeby your interface was clean and straightforward to use, great work!
Labeled as bc-breaking because the native_functions.yaml exposure of unscale+infcheck changes from [`_amp_non_finite_check_and_unscale_` to `_amp_foreach_non_finite_check_and_unscale_`]( https://github.com/pytorch/pytorch/pull/44778/files#diff-f1e4b2c15de770d978d0eb77b53a4077L6289-L6293).
The PR also modifies Unary/Binary/Pointwise Functors to
- do ops' internal math in FP32 for FP16 or bfloat16 inputs, which improves precision ([and throughput, on some architectures!](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions)) and has no downside for the ops we care about.
- accept an instantiated op functor rather than an op functor template (`template<class> class Op`). This allows calling code to pass lambdas.
Open question: As written now, the PR has MTA Functors take care of pre- and post-casting FP16/bfloat16 inputs to FP32 before running the ops. However, alternatively, the pre- and post-math casting could be deferred/written into the ops themselves, which gives them a bit more control. I can easily rewrite it that way if you prefer.
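For orientation, a hedged sketch of calling the renamed op directly (normally it is driven by `torch.cuda.amp.GradScaler`; requires a CUDA device):
```
import torch

grads = [torch.randn(8, device="cuda") for _ in range(3)]
found_inf = torch.zeros((1,), device="cuda")
inv_scale = torch.full((1,), 1.0 / 65536.0, device="cuda")

# Unscales every grad in-place and sets found_inf to 1.0 if any
# non-finite value is encountered, using multi tensor apply.
torch._amp_foreach_non_finite_check_and_unscale_(grads, found_inf, inv_scale)
```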
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44778
Reviewed By: gchanan
Differential Revision: D23944102
Pulled By: izdeby
fbshipit-source-id: 22b25ccad5f69b413c77afe8733fa9cacc8e766d
Summary:
* Support propagating `dim_param` in ONNX by encoding it as `ShapeSymbol` in the `SymbolicShape` of outputs. If export is called with `dynamic_axes` provided, shape inference will start with these axes set as dynamic (see the sketch after this list).
* Add new test file `test_pytorch_onnx_shape_inference.py`, reusing all test cases from `test_pytorch_onnx_onnxruntime.py`, but focusing on validating shapes for all nodes in the graph. Currently this is not enabled in the CI, since there are still quite a few existing issues and corner cases to fix. The test defaults to running only at opset 12.
* Bug fixes, such as div, _len, and peephole.cpp passes for PackPadded, and LogSoftmaxCrossEntropy.
* This PR depends on existing PR such as 44332.
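A small sketch of the `dynamic_axes` entry point mentioned in the first bullet (model and axis names are placeholders):
```
import torch

model = torch.nn.Linear(3, 2)
dummy = torch.randn(4, 3)

# Axes declared dynamic here are seeded as ShapeSymbols for shape inference.
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["x"], output_names=["y"],
    dynamic_axes={"x": {0: "batch"}, "y": {0: "batch"}},
)
```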
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44920
Reviewed By: eellison
Differential Revision: D23958398
Pulled By: bzinodev
fbshipit-source-id: 00479d9bd19c867d526769a15ba97ec16d56e51d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45292
This PR merges all quantization mode and will only expose the following top level functions:
```
prepare_fx
prepare_qat_fx
convert_fx
```
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23913105
fbshipit-source-id: 4e335286d6de225839daf51d1df54322d52d68e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45343
The current default dynamic quant observer is not correct, since we don't
accumulate min/max and we don't need to calculate qparams.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D23933995
fbshipit-source-id: 3ff497c9f5f74c687e8e343ab9948d05ccbba09b
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45586
Test Plan: The unit test has been softened to be less platform sensitive.
Reviewed By: mruberry
Differential Revision: D24025415
Pulled By: robieta
fbshipit-source-id: ee986933b984e736cf1525e1297de6b21ac1f0cf
Summary:
This is an attempt at refactoring the `torch.distributed` implementation. The goal is to push the Python layer's global state (like `_default_pg`) into the C++ layer so that `torch.distributed` becomes more TorchScript friendly.
This PR adds the skeleton of the C++ implementation; at the moment it is not included in any build (and won't be until the method implementations are filled in). If you see any related test failures, feel free to revert.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45547
Reviewed By: izdeby
Differential Revision: D24024213
Pulled By: gmagogsfm
fbshipit-source-id: 2762767f63ebef43bf58e17f9447d53cf119f05f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45585
I discovered this bug when I was trying to print the graph to a file. Turns out I had to close the file, but flushing should be a good safeguard in case other users forget.
Test Plan:
Tested with and without flushing.
with P144064292
without P144064767
Reviewed By: mortzur
Differential Revision: D24023819
fbshipit-source-id: 39574b3615feb28e5b5939664c04ddfb1257706a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45474
When batchnorm's affine is set to false, its weight and bias are set to None, which is not supported in this case. Added a fix to set the weight to 1 and the bias to 0 if they are not set.
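A minimal sketch of the idea (illustrative; the actual change is in the fusion code):
```
import torch

def bn_weight_bias(bn):
    # With affine=False, bn.weight and bn.bias are None; substitute
    # identity values so the fusion math still works.
    weight = bn.weight if bn.weight is not None else torch.ones(bn.num_features)
    bias = bn.bias if bn.bias is not None else torch.zeros(bn.num_features)
    return weight, bias
```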
Test Plan: Added a unit test for fusing conv and batchnorm where batchnorm is in affine=False mode.
Reviewed By: z-a-f
Differential Revision: D23977080
fbshipit-source-id: 2782be626dc67553f3d27d8f8b1ddc7dea022c2a
Summary:
Export of embedding bag with dynamic list of offsets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44693
Reviewed By: malfet
Differential Revision: D23831980
Pulled By: bzinodev
fbshipit-source-id: 3eaff1a0f20d1bcfb8039e518d78c491be381e1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45377
This PR adds a C++ implementation of the TripletMarginWithDistanceLoss, for which the Python implementation was introduced in PR #43680. It's based on PR #44072, but I'm resubmitting this to unlink it from Phabricator.
Test Plan: Imported from OSS
Reviewed By: izdeby
Differential Revision: D24003973
fbshipit-source-id: 2d9ada7260a6f27425ff2fdbbf623dad0fb79405
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44826
As described in https://github.com/pytorch/pytorch/issues/43690, there
is a need for DDP to be able to ignore certain parameters in the module (not
install allreduce hooks) for certain use cases. `find_unused_parameters` is
sufficient from a correctness perspective, but we can get better performance
with this upfront list if users know which params are unused, since we won't
have to traverse the autograd graph every iteration.
To enable this, we add a field `parameters_to_ignore` to DDP init and don't
pass in that parameter to reducer if that parameter is in the given list.
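A minimal sketch of the reducer-side filtering described above (names are illustrative, not the actual DDP internals):
```
import torch

model = torch.nn.Sequential(torch.nn.Embedding(10, 4), torch.nn.Linear(4, 2))
parameters_to_ignore = {"0.weight"}  # e.g. a frozen embedding table

# Params on the ignore list are never handed to the reducer, so no
# allreduce hook is installed for them.
reducer_params = [p for name, p in model.named_parameters()
                  if name not in parameters_to_ignore]
```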
ghstack-source-id: 113210109
Test Plan: Added unittest
Reviewed By: xw285cornell, mrshenli
Differential Revision: D23740639
fbshipit-source-id: a0411712a8b0b809b9c9e6da04bef2b955ba5314
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45461
This PR disables autograd for all C -> C, R -> C functions which are not included in the whitelist `GRADIENT_IMPLEMENTED_FOR_COMPLEX`. In practice, there will be a RuntimeError during forward computation when the outputs are differentiable:
```
>>> x=torch.randn(4, 4, requires_grad=True, dtype=torch.cdouble)
>>> x.pow(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: pow does not support automatic differentiation for outputs with complex dtype.
```
The implicit assumption here is that all the C -> R functions have correct backward definitions. So before merging this PR, the following functions must be tested and verified to have correct backward definitions:
`torch.abs` (updated in #39955 ), `torch.angle`, `torch.norm`, `torch.irfft`, `torch.istft`.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D23998156
Pulled By: anjali411
fbshipit-source-id: 370eb07fe56ac84dd8e2233ef7bf3a3eb8aeb179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45532
- updated documentation
- explicitly not supporting negative values for beta (previously the
result was incorrect)
- Removing default value for beta in the backwards function, since it's
only used internally by autograd (as per convention)
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D24002415
Pulled By: bdhirsh
fbshipit-source-id: 980c141019ec2d437b771ee11fc1cec4b1fcfb48
Summary:
This PR allows Timer to collect deterministic instruction counts for (some) snippets. Because of the intrusive nature of Valgrind (effectively replacing the CPU with an emulated one) we have to perform our measurements in a separate process. This PR writes a `.py` file containing the Timer's `setup` and `stmt`, and executes it within a `valgrind` subprocess along with a plethora of checks and error handling. There is still a bit of jitter around the edges due to the Python glue that I'm using, but the PyTorch signal is quite good and thus this provides a low friction way of getting signal. I considered using JIT as an alternative, but:
A) Python specific overheads (e.g. parsing) are important
B) JIT might do rewrites which would complicate measurement.
Consider the following bit of code, related to https://github.com/pytorch/pytorch/issues/44484:
```
from torch.utils._benchmark import Timer
counts = Timer(
"x.backward()",
setup="x = torch.ones((1,)) + torch.ones((1,), requires_grad=True)"
).collect_callgrind()
for c, fn in counts[:20]:
print(f"{c:>12} {fn}")
```
```
812800 ???:_dl_update_slotinfo
355600 ???:update_get_addr
308300 work/Python/ceval.c:_PyEval_EvalFrameDefault'2
304800 ???:__tls_get_addr
196059 ???:_int_free
152400 ???:__tls_get_addr_slow
138400 build/../c10/core/ScalarType.h:c10::typeMetaToScalarType(caffe2::TypeMeta)
126526 work/Objects/dictobject.c:_PyDict_LoadGlobal
114268 ???:malloc
101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV
85900 work/Python/ceval.c:_PyEval_EvalFrameDefault
79946 work/Objects/typeobject.c:_PyType_Lookup
72000 build/../c10/core/Device.h:c10::Device::validate()
70000 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector()
66400 work/Objects/object.c:_PyObject_GenericGetAttrWithDict
63000 ???:pthread_mutex_lock
61200 work/Objects/dictobject.c:PyDict_GetItem
59800 ???:free
58400 work/Objects/tupleobject.c:tupledealloc
56707 work/Objects/dictobject.c:lookdict_unicode_nodummy
```
Moreover, if we backport this PR to 1.6 (just copy the `_benchmarks` folder) and load those counts as `counts_1_6`, then we can easily diff them:
```
print(f"Head instructions: {sum(c for c, _ in counts)}")
print(f"1.6 instructions: {sum(c for c, _ in counts_1_6)}")
count_dict = {fn: c for c, fn in counts}
for c, fn in counts_1_6:
_ = count_dict.setdefault(fn, 0)
count_dict[fn] -= c
count_diffs = sorted([(c, fn) for fn, c in count_dict.items()], reverse=True)
for c, fn in count_diffs[:15] + [["", "..."]] + count_diffs[-15:]:
print(f"{c:>8} {fn}")
```
```
Head instructions: 7609547
1.6 instructions: 6059648
169600 ???:_dl_update_slotinfo
101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV
74200 ???:update_get_addr
63600 ???:__tls_get_addr
46800 work/Python/ceval.c:_PyEval_EvalFrameDefault
33512 work/Objects/dictobject.c:_PyDict_LoadGlobal
31800 ???:__tls_get_addr_slow
31700 build/../aten/src/ATen/record_function.cpp:at::RecordFunction::RecordFunction(at::RecordScope)
28300 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object*, _object*, _object*, _object**, bool)
27800 work/Objects/object.c:_PyObject_GenericGetAttrWithDict
27401 work/Objects/dictobject.c:lookdict_unicode_nodummy
24115 work/Objects/typeobject.c:_PyType_Lookup
24080 ???:_int_free
21700 work/Objects/dictobject.c:PyDict_GetItemWithError
20700 work/Objects/dictobject.c:PyDict_GetItem
...
-3200 build/../c10/util/SmallVector.h:at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
-3400 build/../aten/src/ATen/native/TensorIterator.cpp:at::TensorIterator::resize_outputs(at::TensorIteratorConfig const&)
-3500 /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:std::unique_lock<std::mutex>::unlock()
-3700 build/../torch/csrc/utils/python_arg_parser.cpp:torch::PythonArgParser::raw_parse(_object*, _object*, _object**)
-4207 work/Objects/obmalloc.c:PyMem_Calloc
-4500 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector()
-4800 build/../torch/csrc/autograd/generated/VariableType_2.cpp:torch::autograd::VariableType::add__Tensor(at::Tensor&, at::Tensor const&, c10::Scalar)
-5000 build/../c10/core/impl/LocalDispatchKeySet.cpp:c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKey)
-5300 work/Objects/listobject.c:PyList_New
-5400 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionParameter::check(_object*, std::vector<pybind11::handle, std::allocator<pybind11::handle> >&)
-5600 /usr/include/c++/8/bits/std_mutex.h:std::unique_lock<std::mutex>::unlock()
-6231 work/Objects/obmalloc.c:PyMem_Free
-6300 work/Objects/listobject.c:list_repeat
-11200 work/Objects/listobject.c:list_dealloc
-28900 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object*, _object*, _object**, bool)
```
Remaining TODOs:
* Include a timer in the generated script for cuda sync.
* Add valgrind to CircleCI machines and add a unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44717
Reviewed By: soumith
Differential Revision: D24010742
Pulled By: robieta
fbshipit-source-id: df6bc765f8efce7193893edba186cd62b4b23623
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45482
Working on some models that need these ops on lite interpreter.
Test Plan: Locally built and loaded/ran the TS model without problems.
Reviewed By: iseeyuan
Differential Revision: D23906581
fbshipit-source-id: 01b9de2af2046296165892b837bc14a7e5d59b4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45520
With this change `Load`s and `Store`s no longer accept `Placeholder`s in
their constructor and `::make` functions and can only be built with
`Buf`.
`Placeholder` gets its own `store`, `load`, `storeWithMask`, and
`loadWithMask` method for more convenient construction.
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D23998789
Pulled By: ZolotukhinM
fbshipit-source-id: 3fe018e00c1529a563553b2b215f403b34aea912
Summary:
This might be an alternative to reverting https://github.com/pytorch/pytorch/issues/45396 .
The obvious rough edge is that I'm not really seeing the work group limits that TensorExpr produces.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45506
Reviewed By: zhangguanheng66
Differential Revision: D23991410
Pulled By: Krovatkin
fbshipit-source-id: 11d3fc4600e4bffb1d1192c6b8dd2fe22c1e064e
Summary:
This PR adds a new GraphManipulation library for operating on the GraphModule nodes.
It also adds an implementation of replace_target_nodes_with, which replaces all nodes in the GraphModule or a specific op/target with a new specified op/target. An example use of this function would be replacing a generic operator with an optimized operator for specific sizes and shapes.
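A minimal standalone sketch of what such a replacement pass does (hypothetical helper; the PR's implementation lives in the GraphManipulation library):
```
import operator
import torch
import torch.fx

def add_fn(x, y):
    return x + y

gm = torch.fx.symbolic_trace(add_fn)

# Swap every call to operator.add for operator.mul, then regenerate forward().
for node in gm.graph.nodes:
    if node.op == "call_function" and node.target == operator.add:
        node.target = operator.mul
gm.recompile()

assert gm(torch.tensor(2.0), torch.tensor(3.0)).item() == 6.0  # x + y -> x * y
```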
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44775
Reviewed By: jamesr66a
Differential Revision: D23874561
Pulled By: gcatron
fbshipit-source-id: e1497cd11e0bbbf1fabdf137d65c746248998e0b
Summary:
Per feedback in the recent design review. Also tweaks the documentation to clarify what "deterministic" means and adds a test for the behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45410
Reviewed By: ngimel
Differential Revision: D23974988
Pulled By: mruberry
fbshipit-source-id: e48307da9c90418fc6834fbd67b963ba2fe0ba9d
Summary:
Updated `cholesky_backward` to work correctly for complex input.
Note that the current implementation gives the conjugate of what JAX would return. anjali411, is that the correct thing to do?
Ref. https://github.com/pytorch/pytorch/issues/44895
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45267
Reviewed By: bwasti
Differential Revision: D23975269
Pulled By: anjali411
fbshipit-source-id: 9908b0bb53c411e5ad24027ff570c4f0abd451e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45488
model_name logging was broken. The issue comes from the recent change of assigning the method name to the module name; this diff fixes it.
ghstack-source-id: 113103942
Test Plan:
Made sure that the model_name is now logged from module_->name().
Verified with one model which does not contain the model metadata; the model_name field is logged as below:
09-28 21:59:30.065 11530 12034 W module.cpp: TESTINGTESTING run() module = __torch__.Model
09-28 21:59:30.065 11530 12034 W module.cpp: TESTINGTESTING metadata does not have model_name assigning to __torch__.Model
09-28 21:59:30.066 11530 12034 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod log model_name = __torch__.Model
09-28 21:59:30.066 11530 12034 W MobileModuleQPLObserver.cpp: TESTINGTESTING onEnterRunMethod log method_name = labels
09-28 21:59:30.068 11530 12034 W MobileModuleQPLObserver.cpp: TESTINGTESTING onExitRunMethod()
Reviewed By: linbinyu
Differential Revision: D23984165
fbshipit-source-id: 5b00f50ea82106b695c2cee14029cb3b2e02e2c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45261
**Summary**
This commit enables the `unused` syntax for ignoring
properties. Ignoring properties is more intuitive with this feature enabled.
`ignore` is not supported because class type properties cannot be
executed in Python (they exist only as TorchScript types) the way an
`ignored` function can, and module properties that cannot be scripted
are not added to the `ScriptModule` wrapper, so they may execute in
Python.
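A hedged example of the enabled syntax (decorator placement is inferred from the description and may differ in detail):
```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

    @property
    @torch.jit.unused
    def debug_state(self):
        # only callable from Python; TorchScript skips compiling it
        return {"python_only": object()}

scripted = torch.jit.script(M())
```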
**Test Plan**
This commit updates the existing unit tests for class type and module
properties to test properties ignored using `unused`.
Test Plan: Imported from OSS
Reviewed By: navahgar, Krovatkin, mannatsingh
Differential Revision: D23971881
Pulled By: SplitInfinity
fbshipit-source-id: 8d3cc1bbede7753d6b6f416619e4660c56311d33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45479
Add a top-level boolean attribute to the model, called mobile_optimized, that is set to true if it is optimized.
Test Plan: buck test //caffe2/test:mobile passes
Reviewed By: kimishpatel
Differential Revision: D23956728
fbshipit-source-id: 79c5931702208b871454319ca2ab8633596b1eb8
Summary:
Fix `torch._C._autocast_*_nesting` declarations in __init__.pyi
Fix iterable constructor logic: not every iterable can be constructed using the `type(val)(val)` trick; for example, it would not work for `val=range(10)` although `isinstance(val, Iterable)` is True (see the sketch below)
Change optional resolution logic to meet mypy expectations
Fixes https://github.com/pytorch/pytorch/issues/45436
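A quick illustration of the broken trick:
```
from collections.abc import Iterable

val = range(10)
print(isinstance(val, Iterable))  # True
type(val)(val)  # TypeError: 'range' object cannot be interpreted as an integer
```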
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45480
Reviewed By: walterddr
Differential Revision: D23982822
Pulled By: malfet
fbshipit-source-id: 6418a28d04ece1b2427dcde4b71effb67856a872
Summary:
This PR makes the deprecation warnings for existing fft functions more prominent and makes the torch.stft deprecation warning consistent with our current deprecation planning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45409
Reviewed By: ngimel
Differential Revision: D23974975
Pulled By: mruberry
fbshipit-source-id: b90d8276095122ac3542ab625cb49b991379c1f8
Summary:
PR opened just to run the CI tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44465
Reviewed By: ngimel
Differential Revision: D23907565
Pulled By: mruberry
fbshipit-source-id: 620661667877f1e9a2bab17d19988e2dc986fc0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44846
The save function traverses the model state dict to pick out the observer stats.
The load function traverses the module hierarchy to load the state dict into module attributes, depending on the observer type.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23746821
fbshipit-source-id: 05c571b62949a2833602d736a81924d77e7ade55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45390
Tensor objects should always refer to their Function's bufs. Currently
we never create a Tensor with a buffer different from that of its function,
but having it in two places seems incorrect and dangerous.
Differential Revision: D23952865
Test Plan: Imported from OSS
Reviewed By: nickgg
Pulled By: ZolotukhinM
fbshipit-source-id: e63fc26d7078427514649d9ce973b74ea635a94a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45388
Classes defined in these files are closely related, so it is reasonable
to have them all in one file. The change is purely a code move.
Differential Revision: D23952867
Test Plan: Imported from OSS
Reviewed By: nickgg
Pulled By: ZolotukhinM
fbshipit-source-id: 12cfaa968bdfc4dff00509e34310a497c7b59155
Summary:
In the profiler, CUDA ops did not report self time, so for composite functions there was no way to determine which function was really taking time. In addition, the reported "total CUDA time" was frequently more than the total wallclock time. This PR adds "self CUDA time" to the profiler and computes total CUDA time based on self CUDA time, similar to how it's done for CPU. Also, slight formatting changes to make the table more compact. Before:
```
-------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg CUDA total % CUDA total CUDA time avg Number of Calls
-------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
aten::matmul 0.17% 890.805us 99.05% 523.401ms 5.234ms 49.91% 791.184ms 7.912ms 100
aten::mm 98.09% 518.336ms 98.88% 522.511ms 5.225ms 49.89% 790.885ms 7.909ms 100
aten::t 0.29% 1.530ms 0.49% 2.588ms 25.882us 0.07% 1.058ms 10.576us 100
aten::view 0.46% 2.448ms 0.46% 2.448ms 12.238us 0.06% 918.936us 4.595us 200
aten::transpose 0.13% 707.204us 0.20% 1.058ms 10.581us 0.03% 457.802us 4.578us 100
aten::empty 0.14% 716.056us 0.14% 716.056us 7.161us 0.01% 185.694us 1.857us 100
aten::as_strided 0.07% 350.935us 0.07% 350.935us 3.509us 0.01% 156.380us 1.564us 100
aten::stride 0.65% 3.458ms 0.65% 3.458ms 11.527us 0.03% 441.258us 1.471us 300
-------------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 528.437ms
CUDA time total: 1.585s
Recorded timeit time: 789.0814 ms
```
Note that the recorded timeit time (with proper CUDA syncs) is 2x smaller than the "CUDA time total" reported by the profiler.
After
```
-------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Name Self CPU % Self CPU CPU total % CPU total CPU time avg Self CUDA Self CUDA % CUDA total CUDA time avg # of Calls
-------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
aten::matmul 0.15% 802.716us 99.06% 523.548ms 5.235ms 302.451us 0.04% 791.151ms 7.912ms 100
aten::mm 98.20% 519.007ms 98.91% 522.745ms 5.227ms 790.225ms 99.63% 790.848ms 7.908ms 100
aten::t 0.27% 1.406ms 0.49% 2.578ms 25.783us 604.964us 0.08% 1.066ms 10.662us 100
aten::view 0.45% 2.371ms 0.45% 2.371ms 11.856us 926.281us 0.12% 926.281us 4.631us 200
aten::transpose 0.15% 783.462us 0.22% 1.173ms 11.727us 310.016us 0.04% 461.282us 4.613us 100
aten::empty 0.11% 591.603us 0.11% 591.603us 5.916us 176.566us 0.02% 176.566us 1.766us 100
aten::as_strided 0.07% 389.270us 0.07% 389.270us 3.893us 151.266us 0.02% 151.266us 1.513us 100
aten::stride 0.60% 3.147ms 0.60% 3.147ms 10.489us 446.451us 0.06% 446.451us 1.488us 300
-------------------- ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------ ------------
Self CPU time total: 528.498ms
CUDA time total: 793.143ms
Recorded timeit time: 788.9832 ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45209
Reviewed By: zou3519
Differential Revision: D23925491
Pulled By: ngimel
fbshipit-source-id: 7f9c49238d116bfd2db9db3e8943355c953a77d0
Summary:
Inline pytorch into wrapper, which is especially helpful in combination
with dead code elimination to reduce IR size and compilation times when
a lot of parameters are unused.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45445
Test Plan: CI
Reviewed By: ZolotukhinM
Differential Revision: D23969009
Pulled By: asuhan
fbshipit-source-id: a21509d07e4c130b6aa6eae5236bb64db2748a3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43612
**Summary**
This commit modifies the `torch._C._jit_to_backend` function so that it
accepts `ScriptModules` as inputs. It already returns `ScriptModules`
(as opposed to C++ modules), so this makes sense and makes the API more
intuitive.
**Test Plan**
Continuous integration, which includes unit tests and out-of-tree tests
for custom backends.
**Fixes**
This commit fixes#41432.
Test Plan: Imported from OSS
Reviewed By: suo, jamesr66a
Differential Revision: D23339854
Pulled By: SplitInfinity
fbshipit-source-id: 08ecef729c4e1e6bddf3f483276947fc3559ea88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45280
Performance is the same on CPU, and on CUDA it is only 1-1.05x slower. This change is necessary for the future nan ops, including nan(min|max|median).
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D23908796
Pulled By: heitorschueroff
fbshipit-source-id: c2b57acbe924cfa59fbd85216811f29f4af05088
Summary:
Stumbled upon a little gem in the audio conversion for `SummaryWriter.add_audio()`: two Python `for` loops to convert a float array to little-endian int16 samples. On my machine, this took 35 seconds for a 30-second 22.05 kHz excerpt. The same can be done directly in numpy in 1.65 milliseconds. (No offense, I'm glad that the functionality was there!)
Would also be ready to extend this to support stereo waveforms, or should this become a separate PR?
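A hedged sketch of the vectorized conversion (what the two Python loops boil down to, up to details):
```
import numpy as np

def to_int16_le(waveform):
    # float samples in [-1.0, 1.0] -> little-endian int16, no Python loops
    clipped = np.clip(waveform, -1.0, 1.0)
    return (clipped * (2 ** 15 - 1)).astype("<i2")
```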
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44201
Reviewed By: J0Nreynolds
Differential Revision: D23831002
Pulled By: edward-io
fbshipit-source-id: 5c8f1ac7823d1ed41b53c4f97ab9a7bac33ea94b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45214
When in verbose mode the package exporter will produce an html visualization
of dependencies of a module to make it easier to trim out unneeded code,
or debug inclusion of things that cannot be exported.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D23873525
Pulled By: zdevito
fbshipit-source-id: 6801991573d8dd5ab8c284e09572b36a35e1e5a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45401
Added a DeleteKey API for the TCP Store
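A hedged usage sketch (constructor arguments are placeholders, and the Python method names `delete_key`/`num_keys` are assumptions about the bindings):
```
import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 1234, 1, True)  # single-process master
store.set("key0", "value0")
store.delete_key("key0")  # True if the key existed
print(store.num_keys())   # number of keys currently held by the store
```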
ghstack-source-id: 112997162
Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values
Reviewed By: mrshenli
Differential Revision: D23955730
fbshipit-source-id: 5c9f82be34ff4521c59f56f8d9c1abf775c67f9f
Summary:
Recent changes to the seq_num correlation behavior in the profiler (PR https://github.com/pytorch/pytorch/issues/42565) changed the behavior of emit_nvtx(record_shapes=True), which no longer prints the name of the operator properly.
Created this PR to dump out the name in roctx traces irrespective of the assigned sequence number, for ROCm only.
cc: jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45229
Reviewed By: zou3519
Differential Revision: D23932902
Pulled By: albanD
fbshipit-source-id: c782667ff002b70b51f1cc921afd1b1ac533b39d
Summary:
This PR cleans up some of the rough edges around `Timer` and `Compare` (a usage sketch follows the list below)
* Moves `Measurement` to be dataclass based
* Adds a bunch of type annotations. MyPy is now happy.
* Allows missing entries in `Compare`. This is one of the biggest usability issues with `Compare` right now, both from an API perspective and because the current failure mode is really unpleasant.
* Greatly expands the testing of `Compare`
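A hedged usage sketch of the two classes (treat the import path and method names as assumptions; the module later moved to `torch.utils.benchmark`):
```
from torch.utils._benchmark import Timer, Compare

results = [
    Timer(stmt="x * 2",
          setup=f"import torch; x = torch.ones({n})",
          description=f"n={n}").timeit(100)
    for n in (64, 1024)
]
Compare(results).print()
```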
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45361
Test Plan: Changes to Timer are covered under existing tests, changes to `Compare` are covered by the expanded `test_compare` method.
Reviewed By: bwasti
Differential Revision: D23966816
Pulled By: robieta
fbshipit-source-id: 826969f73b42f72fa35f4de3c64d0988b61474cd
Summary:
Export of the view op with a dynamic input shape is broken when using tensors with a 0-dim.
This fix removes the symbolic's use of the static input size to fix the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43558
Reviewed By: ailzhang
Differential Revision: D23965090
Pulled By: bzinodev
fbshipit-source-id: 628e9d7ee5d53375f25052340ca6feabf7ba7c53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45291
It's not necessary, you can just check if the dtype is integral.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23911963
Pulled By: gchanan
fbshipit-source-id: 230139e1651eb76226f4095e31068dded30e03e8
Summary:
As per the title. Fixes [#38948](https://github.com/pytorch/pytorch/issues/38948). Therein you can find some blueprints for the algorithm used in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43002
Reviewed By: zou3519
Differential Revision: D23931326
Pulled By: albanD
fbshipit-source-id: e6994af70d94145f974ef87aa5cea166d6deff1e
Summary:
Changes the deprecation of norm to a docs deprecation, since PyTorch components still rely on norm and some behavior, like automatically flattening tensors, may need to be ported to torch.linalg.norm. The documentation is also updated to clarify that torch.norm and torch.linalg.norm are distinct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45415
Reviewed By: ngimel
Differential Revision: D23958252
Pulled By: mruberry
fbshipit-source-id: fd54e807c59a2655453a6bcd9f4073cb2c12e8ac
Summary:
Fix a couple of issues with scripting inplace indexing in prepare_inplace_ops_for_onnx pass.
1- Tracing index copy (cases like `x[1:3] = data`) already applies broadcasting on the rhs if needed. The broadcasting node (aten::expand) was missing in scripting cases.
2- Inplace indexing with ellipsis (aten::copy_) is replaced with aten::index_put and then handled with slice+select in this pass.
Support for negative indices for this op is added.
Shape inference is also enabled for scripting tests using new JIT API.
A few more tests are enabled for scripting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44351
Reviewed By: ezyang
Differential Revision: D23880267
Pulled By: bzinodev
fbshipit-source-id: 78b33444633eb7ae0fbabc7415e3b16001f5207f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45143
This PR prevents freezing cleaning up a submodule when user requests to
preserve a submodule.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23844969
Pulled By: bzinodev
fbshipit-source-id: 80e6db3fc12460d62e634ea0336ae2a3551c2151
Summary:
In the ONNX NegativeLogLikelihoodLoss specification, ignore_index is optional and has no default value.
Therefore, when converting the nll op to ONNX, we need to set the ignore_index attribute even if it is not specified (e.g. ignore_index=-100).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44816
Reviewed By: ezyang
Differential Revision: D23880354
Pulled By: bzinodev
fbshipit-source-id: d0bdd58d0a4507ed9ce37133e68533fe6d1bdf2b
Summary:
Optimize export_onnx api to reduce string and model proto exchange in export.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44332
Reviewed By: bwasti, eellison
Differential Revision: D23880129
Pulled By: bzinodev
fbshipit-source-id: 1d216d8f710f356cbba2334fb21ea15a89dd16fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44419
Closes https://github.com/pytorch/pytorch/issues/39969
This PR adds support for propagation of input shapes over the wire when the profiler is invoked with `record_shapes=True` over RPC. Previously, we did not respect this argument.
This is done by saving the shapes as an ivalue list and recovering it as the type expected (`std::vector<std::vector<int>>` on the client). Test is added to ensure that remote ops have the same `input_shapes` as if the op were run locally.
ghstack-source-id: 112977899
Reviewed By: pritamdamania87
Differential Revision: D23591274
fbshipit-source-id: 7cf3b2e8df26935ead9d70e534fc2c872ccd6958
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44967
When enabling the profiler on the server, if it is a different machine it may
not have CUDA while the caller does. In this case we would previously crash, but now we
fall back to CPU profiling and log a warning.
ghstack-source-id: 112977906
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D23790729
fbshipit-source-id: dc6eba172b7e666842d54553f52a6b9d5f0a5362
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43963
Added a DeleteKey API for the TCP Store
ghstack-source-id: 112939762
Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values
Reviewed By: jiayisuse
Differential Revision: D23009117
fbshipit-source-id: 1a0d95b43d79e665a69b2befbaa059b2b50a1f66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43962
TCPStore needs a getNumKeys API for our logging needs.
ghstack-source-id: 112939761
Test Plan: Adding tests to C++ Store Tests
Reviewed By: pritamdamania87
Differential Revision: D22985085
fbshipit-source-id: 8a0d286fbd6fd314dcc997bae3aad0e62b51af83
Summary:
This PR adds the get_all_users_of function. The function returns all the users of a specific node. A unit test is also added.
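A minimal standalone sketch of what "all users of a node" means in FX (hypothetical helper; the PR's version lives alongside the graph manipulation utilities):
```
def get_users(gm, target_node):
    # a "user" is any node that consumes target_node as a direct input
    return [n for n in gm.graph.nodes
            if target_node in n.args or target_node in n.kwargs.values()]
```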
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45216
Reviewed By: ezyang
Differential Revision: D23883572
Pulled By: scottxu0730
fbshipit-source-id: 3eb68a411c3c6db39ed2506c9cb7bb7337520ee4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45221
This PR introduces a distributed functional optimizer, so that
distributed optimizer can reuse the functional optimizer APIs and
maintain their own states. This could enable the torchscript compatible
functional optimizer when using distributed optimizer, helps getting rid
of GIL and improve overall performance of training, especially distributed
model parallel training
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D23935256
Pulled By: wanchaol
fbshipit-source-id: 59b6d77ff4693ab24a6e1cbb6740bcf614cc624a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44715
We have provided a nice and intuitive API in Python. But in the context of large scale distributed training (e.g. Distributed Model Parallel), users often want to use multithreaded training instead of multiprocess training as it provides better resource utilization and efficiency.
This PR introduces the functional optimizer concept (similar to the concept of `nn.functional`): we split the optimizer into two parts: 1. optimizer state management, 2. optimizer computation. We expose the computation part as a separate functional API that is available to internal and OSS developers; the caller of the functional API maintains their own state in order to directly call the functional API. While keeping the end user API the same, the functional API is TorchScript friendly, and could be used by the distributed optimizer to speed up training without the GIL.
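A minimal sketch of the split described above (names are hypothetical): the caller owns the state and passes it explicitly to a pure computation function:
```
import torch

def functional_sgd_step(params, grads, *, lr):
    # computation only: no internal state, which keeps it TorchScript friendly
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=-lr)

# caller-managed "state": the params plus the hyperparameters
params = [torch.randn(3, requires_grad=True)]
grads = [torch.randn(3)]
functional_sgd_step(params, grads, lr=0.1)
```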
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D23935258
Pulled By: wanchaol
fbshipit-source-id: d2a5228439edb3bc64f7771af2bb9e891847136a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45353
Temporarily removing this feature, will add this back after branch cut.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D23939865
Pulled By: mrshenli
fbshipit-source-id: 7dceaffea6b9a16512b5ba6036da73e7f8f83a8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44433
Not entirely sure why, but changing the type of beta from `float` to `double` in autocast_mode.cpp and FunctionsManual.h fixes my compiler errors, failing instead at link time.
Fixed some type errors and updated the fn signature in a few more files.
Removed my usage of Scalar, making beta a double everywhere instead.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D23636720
Pulled By: bdhirsh
fbshipit-source-id: caea2a1f8dd72b3b5fd1d72dd886b2fcd690af6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45181
`init_process_group` and `new_group` update a bunch of global
variables after initializing the actual process group. As a result, there is a
race that after initializing the process group on say rank 0, if we immediately
check the default process group on rank 1 (say via RPC), we might actually get
an error since rank 1 hasn't yet updated its _default_pg variable.
To resolve this issue, I've added barrier() at the end of both of these calls.
This ensures that once these calls return we are guaranteed about correct
initialization on all ranks.
Since these calls are usually done mostly during initialization, it should be
fine to add the overhead of a barrier() here.
Closes: https://github.com/pytorch/pytorch/issues/40434, https://github.com/pytorch/pytorch/issues/40378
ghstack-source-id: 112923112
Test Plan:
Reproduced the failures in
https://github.com/pytorch/pytorch/issues/40434 and
https://github.com/pytorch/pytorch/issues/40378 and verified that this PR fixes
the issue.
Reviewed By: mrshenli
Differential Revision: D23858025
fbshipit-source-id: c4d5e46c2157981caf3ba1525dec5310dcbc1830
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45188
This is a symbolically traceable alternative to Python's `assert`.
It should be useful to allow people who want to use FX to also
be able to assert things.
A bunch of TODO(before land) comments are inline; would love thoughts
on where the best place for this code to live is, and what this
function should be called (since `assert` is reserved).
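A minimal sketch of the idea (the final name was still open at the time; `fx_assert` is a placeholder):
```
import torch

def fx_assert(condition, message):
    # Eager behavior: a plain assert. Under symbolic tracing the call is
    # recorded as a call_function node instead of being executed, since a
    # Python `assert` on a Proxy cannot be traced.
    assert condition, message

x = torch.randn(4)
fx_assert(x.shape[0] == 4, "unexpected batch size")
```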
Test Plan:
```
python test/test_fx.py TestFX.test_symbolic_trace_assert
```
Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D23861567
fbshipit-source-id: d9d6b9556140faccc0290eba1fabea401d7850de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44923
This ensures that RPC profiling works in single-threaded server
scenarios and that we won't make the assumption that we'll have multiple
threads when working on this code. For example, this assumption resulted in a
bug in the previous diff (which was fixed)
ghstack-source-id: 112868469
Test Plan: CI
Reviewed By: lw
Differential Revision: D23691304
fbshipit-source-id: b17d34ade823794cbe949b70a5ab35723d974203
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44664
Closes https://github.com/pytorch/pytorch/issues/39971. This PR adds support for functions decorated with `rpc.functions.async_execution` to be profiled over RPC as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run.
To enable this, the PR below this enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call as was done previously). Since when the future completes we've kicked off the async function and the future corresponding to it has completed, we are able to capture any RPCs the function would have called and the actual work done on the other node.
For example, if the following async function is ran on a server over RPC:
```
def slow_add(x, y):
    time.sleep(1)
    return torch.add(x, y)

@rpc.functions.async_execution
def slow_async_add(to, x, y):
    return rpc.rpc_async(to, slow_add, args=(x, y))
```
we expect to see the original RPC profiled, the nested RPC profiled, and the actual torch.add() work. All of these events should be recorded with the correct node id. Here is an example profiling output:
```
Name                                                                                                                    Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Node ID
----------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
rpc_async#slow_async_add(worker1 -> worker2)                                                                              0.00%             0.000us         0            1.012s     1.012s        1                1
aten::empty                                                                                                               7.02%             11.519us        7.02%        11.519us   11.519us      1                1
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)                            0.00%             0.000us         0            1.006s     1.006s        1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty                                                       7.21%             11.843us        7.21%        11.843us   11.843us      1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add       71.94%            118.107us       85.77%       140.802us  140.802us     1                3
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty     13.82%            22.695us        13.82%       22.695us   22.695us      1                3
----------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Self CPU time total: 164.164us
```
This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code.
ghstack-source-id: 112868470
Test Plan:
```
rvarm1@devbig978:fbcode (52dd34f6)$ buck test mode/no-gpu mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_rpc_profiling_async_function --print-passing-details --stress-runs 1
```
Reviewed By: mrshenli
Differential Revision: D23638387
fbshipit-source-id: eedb6d48173a4ecd41d70a9c64048920bd4807c4
Summary:
The Cuda HalfChecker casts up all loads and stores of Half to Float, so we do math in Float on the device. It didn't cast up HalfImmediate (i.e. constants), so these could introduce mixed-size ops. The fix is to cast them up as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45213
Reviewed By: ezyang
Differential Revision: D23885287
Pulled By: nickgg
fbshipit-source-id: 912991d85cc06ebb282625cfa5080d7525c8eba9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45257
Currently we inline fork-wait calls when we insert observers for quantization.
In the case where fork and wait are in different subgraphs, inlining the fork-wait calls
only gets rid of the fork. This leaves the aten::wait call in the graph with a torch.Tensor as input,
which is currently not supported.
To avoid this, we check in the cleanup phase that the input to all wait calls in the graph is of type Future[Tensor].
Test Plan:
python test/test_quantization.py TestQuantizeJitPasses.test_quantize_fork_wait
Imported from OSS
Reviewed By: qizzzh
Differential Revision: D23895412
fbshipit-source-id: 3c58c6be7d7e7904eb6684085832ac21f827a399
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44071
Previously, tracing re-gathered ScalarType, Layout, Device, bool into a TensorOptions object and called `tracer::addInput()` on the gathered TensorOptions argument. `tracer::addInput()` then scattered them again and added the individual scattered arguments to the traced graph. This PR avoids the extraneous gathering and re-scattering step and calls `tracer::addInput()` on the individual arguments directly. This avoid the perf hit for an unnecessary gathering step.
This applies to both c10-full and non-c10-full ops. In the case of c10-full ops, the tracing kernels takes scattered arguments and we can directly pass them to `tracer::addInput()`. In the case of non-c10-full ops, the kernel takes a `TensorOptions` argument but we still call `tracer::addInput()` on the scattered arguments.
ghstack-source-id: 112825793
Test Plan:
waitforsandcastle
vs master: https://www.internalfb.com/intern/fblearner/details/216129483/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170069/
Reviewed By: ezyang
Differential Revision: D23486638
fbshipit-source-id: e0b53e6673cef8d7f94158e718301eee261e5d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step. Calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
A lot of changes are in this update, some highlights:
- Added Doxygen config file
- Split the fusion IR (higher level TE like IR) from kernel IR (lower level CUDA like IR)
- Improved latency with dynamic shape handling for the fusion logic
- Prevent recompilation for pointwise + reduction fusions when not needed
- Improvements to inner dimension reduction performance
- Added input -> kernel + kernel launch parameters cache, added eviction policy
- Added reduction fusions with multiple outputs (still single reduction stage)
- Fixed code generation bugs for symbolic tiled GEMM example
- Added thread predicates to prevent shared memory from being loaded multiple times
- Improved syncthreads placement with shared memory and removed a read-before-write race
- Fixed FP16 reduction fusions where the output would come back as FP32
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218
Reviewed By: ezyang
Differential Revision: D23905183
Pulled By: soumith
fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45317
Eager mode quantization depends on the presence of the `qconfig`
model attribute. Currently, converting a model to use `SyncBatchNorm`
removes the qconfig; this PR fixes that. This is important if a BN is not
fused to anything during quantization convert.
Test Plan:
```
python test/test_quantization.py TestDistributed.test_syncbn_preserves_qconfig
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23922072
fbshipit-source-id: cc1bc25c8e5243abb924c6889f78cf65a81be158
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44430
Log metadata even when model loading fails.
Test Plan: {F331550976}
Reviewed By: husthyc
Differential Revision: D23577711
fbshipit-source-id: 0504e75625f377269f1e5df0f1ebe34b8e564c4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45162
This test was flaky because it was not able to validate that the
overall record_function's CPU times are greater than the sum of its children.
It turns out that this is a general bug in the profiler that can be reproduced
without RPC, see https://github.com/pytorch/pytorch/issues/45160. Hence,
removing this from the test and replacing it by just validating the expected
children.
Ran the test 1000 times and they all passed.
ghstack-source-id: 112632327
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23851854
fbshipit-source-id: 5d9023acd17800a6668ba4849659d8cc902b8d6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44972
Previously, our fusion strategy would be:
- start at the end of the block, find a fusable node
- iteratively try to merge inputs into the fusion group, sorted topologically
This strategy works pretty well, but has the possibility of missing fusion groups. See my attached test case for an example where we wouldn't find all possible fusion groups. bertmaher found an example of a missed fusion group in one of our rnn examples (jit_premul) that caused a regression from the legacy fuser.
Here, I'm updating our fusion strategy to be the same as our other fusion passes - create_autodiff_subgraphs, and graph_fuser.cpp.
The basic strategy is:
- iterate until you find a fusible node
- try to merge the node's inputs; whenever a successful merge occurs, restart at the beginning of the node's inputs
- after you've exhausted a node, continue searching the block for fusion opportunities from that node
- continue doing this on the block until we go through an iteration without any successful merges
Since we create the fusion groups once, and only re-specialize within the fusion groups, we should be running this very infrequently (it only re-triggers when we fail undefinedness specializations). Also, because it's the same algorithm as the existing fuser, it is unlikely to cause a regression.
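A rough pseudocode sketch of this strategy; `is_fusible`, `try_merge`, and `block` are illustrative names, not real APIs:
```python
def fuse_block(block):
    changed = True
    while changed:  # keep passing over the block until no merge succeeds
        changed = False
        for node in block.nodes():
            if not is_fusible(node):
                continue
            merged = True
            while merged:  # after a successful merge, restart over the inputs
                merged = False
                for inp in node.inputs():
                    if try_merge(node, inp.node()):
                        merged = changed = True
                        break
```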
Test Plan: Imported from OSS
Reviewed By: Krovatkin, robieta
Differential Revision: D23821581
Pulled By: eellison
fbshipit-source-id: e513d1ef719120dadb0bfafc7a14f4254cd806ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44238
Refactor create_autodiff_subgraphs to use the same updating of output aliasing properties logic as tensorexpr fuser, and factor that out to a common function in subgraph utils.
Test Plan: Imported from OSS
Reviewed By: Krovatkin, robieta
Differential Revision: D23871565
Pulled By: eellison
fbshipit-source-id: 72df253b16baf8e4aabf3d68b103b29e6a54d44c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44643
This method is not used anywhere else.
Also formatted the file.
Test Plan: buck test caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks
Reviewed By: pritamdamania87
Differential Revision: D23675945
fbshipit-source-id: 2d04f94589a20913e46b8d71e6a39b70940c1461
Summary:
- The thresholds of some tests are bumped up. Depending on the random generator, these tests sometimes fail with things like "0.0059 is not smaller than 0.005". I ran `test_nn.py` and `test_torch.py` 10+ times to check these are no longer flaky.
- Add `tf32_on_and_off` to new `matrix_exp` tests.
- Disable TF32 on test suites other than `test_nn.py` and `test_torch.py`
cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44240
Reviewed By: mruberry
Differential Revision: D23882498
Pulled By: ngimel
fbshipit-source-id: 44a9ec08802c93a2efaf4e01d7487222478b6df8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45238
Adds a warning when there is a much higher than expected discrepancy in the
number of inputs across different processes when running with uneven
inputs. This is because a skew in the thousands can reduce performance by a
nontrivial amount, as shown in benchmarks, and it was proposed to add this
warning as a result. Tested by running the tests so that the threshold is hit and
observing the output.
ghstack-source-id: 112773552
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23719270
fbshipit-source-id: 306264f62c1de65e733696a912bdb6e9376d5622
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45144
Moves prim ops from C10 back to JIT.
These were originally moved to C10 from JIT in D19237648 (f362cd510d)
ghstack-source-id: 112775781
Test Plan:
buck test //caffe2/test/cpp/jit:jit
https://pxl.cl/1l22N
buck test adsatlas/gavel/lib/ata_processor/tests:ata_processor_test
https://pxl.cl/1lBxD
Reviewed By: iseeyuan
Differential Revision: D23697598
fbshipit-source-id: 36d1eb8c346e9b161ba6af537a218440a9bafd27
Summary:
I noticed that the recently introduced adaptive_autorange tests occasionally time out in CI, and I've been meaning to improve the Timer tests for a while. This PR allows unit tests to swap the measurement portion of `Timer` with a deterministic mock so we can thoroughly test behavior without having to worry about flaky CI measurements. It also means that the tests can be much more detailed and still finish very quickly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45173
Test Plan: You're lookin' at it.
Reviewed By: ezyang
Differential Revision: D23873548
Pulled By: robieta
fbshipit-source-id: 26113e5cea0cbf46909b9bf5e90c878c29e87e88
Summary:
In this PR:
1) Added binary operations with ScalarLists.
2) Fixed _foreach_div(...) bug in native_functions
3) Covered all possible cases with scalars and scalar lists in tests
4) [minor] fixed bug in native_functions by adding "use_c10_dispatcher: full" to all _foreach functions
tested via unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44743
Reviewed By: bwasti, malfet
Differential Revision: D23753711
Pulled By: izdeby
fbshipit-source-id: bf3e8c54bc07867e8f6e82b5d3d35ff8e99b5a0a
Summary:
For integral types, isnan is meaningless. Provide specializations for
maximum and minimum which don't call it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44984
Test Plan: python test/test_jit_fuser_te.py -k TestTEFuser.test_minmax_int_ops
Reviewed By: ezyang
Differential Revision: D23885259
Pulled By: asuhan
fbshipit-source-id: 2e6da2c43c0ed18f0b648a2383d510894c574437
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44550
Part of the `torch.fft` work (gh-42175).
This adds n-dimensional transforms: `fftn`, `ifftn`, `rfftn` and `irfftn`.
This is aiming for correctness first, with the implementation on top of the existing `_fft_with_size` restrictions. I plan to follow up later with a more efficient rewrite that makes `_fft_with_size` work with arbitrary numbers of dimensions.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D23846032
Pulled By: mruberry
fbshipit-source-id: e6950aa8be438ec5cb95fb10bd7b8bc9ffb7d824
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149
The choose_qparams_optimized calculates the optimized qparams.
It uses a greedy approach to nudge the min and max, calculating the L2 norm
and trying to minimize the quantization error `torch.norm(x - fake_quant(x, s, z))`.
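A simplified sketch of the greedy idea (not the actual implementation; `fake_quant` here is a toy affine quantize/dequantize):
```python
import torch

def fake_quant(x, lo, hi, n_levels=256):
    # toy affine quantize/dequantize into n_levels between lo and hi
    scale = (hi - lo) / (n_levels - 1)
    zp = torch.round(-lo / scale)
    xq = torch.clamp(torch.round(x / scale) + zp, 0, n_levels - 1)
    return (xq - zp) * scale

def greedy_qparams(x, steps=100):
    lo, hi = x.min(), x.max()
    step = (hi - lo) / steps
    best = torch.norm(x - fake_quant(x, lo, hi))
    improved = True
    while improved:  # nudge min/max inward while the L2 error keeps shrinking
        improved = False
        for cand in ((lo + step, hi), (lo, hi - step)):
            err = torch.norm(x - fake_quant(x, *cand))
            if err < best:
                (lo, hi), best, improved = cand, err, True
    return lo, hi
```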
Test Plan: Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23848060
fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee
Summary:
We need to check if dtypes differ in scalar type or lanes to decide between
Cast and Broadcast.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45179
Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyBroadcastTermExpander
Reviewed By: bwasti
Differential Revision: D23873316
Pulled By: asuhan
fbshipit-source-id: ca141be67e10c2b6c5f2ff9c11e42dcfc62ac620
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44835
This is for feature parity with fx graph mode quantization
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D23745086
fbshipit-source-id: ae2fc86129f9896d5a9039b73006a4da15821307
Summary:
Arithmetic operations on Bool aren't fully supported in the evaluator. Moreover,
such semantics can be implemented by the client code through insertion of
explicit casts to widen and narrow to the desired types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44677
Test Plan:
test_tensorexpr --gtest_filter=TensorExprTest.ExprDisallowBoolArithmetic
python test/test_jit_fuser_te.py
Reviewed By: agolynski
Differential Revision: D23801412
Pulled By: asuhan
fbshipit-source-id: fff5284e3a216655dbf5a9a64d1cb1efda271a36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43872
This PR allows recursive scripting to use a separate
submodule_stubs_fn to create submodules according to specific user-provided
rules.
Fixes https://github.com/pytorch/pytorch/issues/43729
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D23430176
Pulled By: wanchaol
fbshipit-source-id: 20530d7891ac3345b36f1ed813dc9c650b28d27a
Summary:
When doing a splitWithMask we only mask if the loop extent is not cleanly divided by the split factor. However, the logic does not simplify the extents, so any nontrivial loop extent will always cause a mask to be added, e.g. if the loop had been previously split. Unlike splitWithTail, the masks added by splitWithMask are always overhead, and we don't have the analysis to optimize them out if they are unnecessary, so it's good to avoid inserting them when we can.
The fix is simply to simplify the loop extents before doing the extent calculation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45141
Reviewed By: ezyang
Differential Revision: D23869170
Pulled By: nickgg
fbshipit-source-id: 44686fd7b802965ca4f5097b0172a41cf837a1f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44856
Support the following format of qconfig_dict
```python
qconfig_dict = {
    # optional, global config
    "": qconfig?,
    # optional, used for module and function types
    # could also be split into module_types and function_types if we prefer
    "object_type": [
        (nn.Conv2d, qconfig?),
        (F.add, qconfig?),
        ...,
    ],
    # optional, used for module names
    "module_name": [
        ("foo.bar", qconfig?),
        ...,
    ],
    # optional, matched in order, first match takes precedence
    "module_name_regex": [
        ("foo.*bar.*conv[0-9]+", qconfig?),
        ...,
    ],
    # priority (in increasing order): global, object_type, module_name_regex, module_name
    # qconfig == None means fusion and quantization should be skipped for anything
    # matching the rule
}
```
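For illustration, such a dict might be assembled as follows; a hedged sketch where `get_default_qconfig` supplies a concrete qconfig and `"foo.bar"` and the regex are placeholder names:
```python
import torch.nn as nn
from torch.quantization import get_default_qconfig

qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {
    "": qconfig,                                   # global default
    "object_type": [(nn.Conv2d, qconfig)],         # per module/function type
    "module_name": [("foo.bar", None)],            # skip quantization for foo.bar
    "module_name_regex": [("foo.*conv[0-9]+", qconfig)],
}
```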
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D23751304
fbshipit-source-id: 5b98f4f823502b12ae2150c93019c7b229c49c50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44684
The ad-hoc quantization benchmarking script in D23689062 recently highlighted that quantized ops were surprisingly slow after the introduction of support for custom ops in torch.fx in D23203204 (f15e27265f).
Using strobelight, it's immediately clear that up to 66% of samples were seen in `c10::get_backtrace`, which descends from `torch::is_tensor_and_append_overloaded -> torch::check_has_torch_function -> torch::PyTorch_LookupSpecial -> PyObject_HasAttrString -> PyObject_GetAttrString`.
I'm no expert by any means so please correct any/all misinterpretation, but it appears that:
- `check_has_torch_function` only needs to return a bool
- `PyTorch_LookupSpecial` should return `NULL` if a matching method is not found on the object
- in the impl of `PyTorch_LookupSpecial` the return value from `PyObject_HasAttrString` only serves as a bool to return early, but ultimately ends up invoking `PyObject_GetAttrString`, which raises, spawning the generation of a backtrace
- `PyObject_FastGetAttrString` returns `NULL` (stolen ref to an empty py::object if the if/else if isn't hit) if the method is not found, anyway, so it could be used singularly instead of invoking both `GetAttrString` and `FastGetAttrString`
- D23203204 (f15e27265f) compounded (but maybe not directly caused) the problem by increasing the number of invocations
so, removing it in this diff and seeing how many things break :)
before:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
  (0): Quantize(scale=tensor([0.0241]), zero_point=tensor([60]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.017489388585090637, zero_point=68, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.010896682739257812
q 0.11908197402954102
```
after:
strobelight: see internal section
output from D23689062 script:
```
$ ./buck-out/gen/scripts/v/test_pt_quant_perf.par
Sequential(
  (0): Quantize(scale=tensor([0.0247]), zero_point=tensor([46]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.012683945707976818, zero_point=41, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.011141300201416016
q 0.022639036178588867
```
which roughly restores original performance seen in P142370729
UPDATE: 9/22 mode/opt benchmarks
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
  (0): Quantize(scale=tensor([0.0263]), zero_point=tensor([82]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.021224206313490868, zero_point=50, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.002968311309814453
q 0.5138928890228271
```
with patch:
```
buck run //scripts/x:test_pt_quant_perf mode/opt
Sequential(
  (0): Quantize(scale=tensor([0.0323]), zero_point=tensor([70]), dtype=torch.quint8)
  (1): QuantizedLinear(in_features=4, out_features=4, scale=0.017184294760227203, zero_point=61, qscheme=torch.per_tensor_affine)
  (2): DeQuantize()
)
fp 0.0026655197143554688
q 0.0064449310302734375
```
Reviewed By: ezyang
Differential Revision: D23697334
fbshipit-source-id: f756d744688615e01c94bf5c48c425747458fb33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43790
Interface calls were not handled properly when they are used in a fork
subgraph. This PR fixes that issue.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D23402039
Pulled By: bzinodev
fbshipit-source-id: 41adc5ee7d942250e732e243ab30e356d78d9bf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45159
By default, pybind11 binds void* to be capsules. After a lot of
Googling, I have concluded that this is not actually useful:
you can't actually create a capsule from Python land, and our
data_ptr() function returns an int, which means that the
function is effectively unusable. It didn't help that we had no
tests exercising it.
I've replaced the void* with uintptr_t, so that we now accept int
(and you can pass data_ptr() in directly). I'm not sure if we
should make these functions accept ctypes types; unfortunately,
pybind11 doesn't seem to have any easy way to do this.
Fixes #43006
Also added cudaHostUnregister which was requested.
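A usage sketch of the resulting bindings (requires a CUDA build; the tensor and size are arbitrary):
```python
import torch

t = torch.empty(1024)                        # plain pageable CPU tensor
cudart = torch.cuda.cudart()
nbytes = t.numel() * t.element_size()
# data_ptr() returns an int, which now binds directly to the uintptr_t parameter
cudart.cudaHostRegister(t.data_ptr(), nbytes, 0)
# ... use the now page-locked memory ...
cudart.cudaHostUnregister(t.data_ptr())
```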
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D23849731
Pulled By: ezyang
fbshipit-source-id: 8a79986f3aa9546abbd2a6a5828329ae90fd298f
Summary:
This is a small developer quality of life improvement. I commonly try to run some snippet of python as I'm working on a PR and forget that I've cd-d into the local clone to run some git commands, resulting in annoying failures like:
`ImportError: cannot import name 'default_generator' from 'torch._C' (unknown location)`
This actually took a non-trivial amount of time to figure out the first time I hit it, and even now it's annoying because it happens just infrequently enough to not sit high in the mental cache.
This PR adds a check to `torch/__init__.py` and warns if `import torch` is likely resolving to the wrong thing:
```
WARNING:root:You appear to be importing PyTorch from a clone of the git repo:
/data/users/taylorrobie/repos/pytorch
This will prevent `import torch` from resolving to the PyTorch install
(instead it will try to load /data/users/taylorrobie/repos/pytorch/torch/__init__.py)
and will generally lead to other failures such as a failure to load C extensions.
```
so that the soon to follow internal import failure makes some sense. I elected to make this a warning rather than an exception because I'm not 100% sure that it's **always** wrong. (e.g. weird `PYTHONPATH` or `importlib` corner cases.)
EDIT: There are now separate cases for `cwd` vs. `PYTHONPATH`, and failure is an `ImportError`.
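An illustrative sketch of this kind of check, not the exact code that landed; in particular, using the generated `torch/version.py` as the "install vs. source tree" heuristic is an assumption:
```python
import os
import warnings

def _warn_if_importing_from_source_tree():
    cwd = os.path.realpath(os.getcwd())
    # a fresh source checkout has torch/__init__.py but no generated version.py
    if (os.path.exists(os.path.join(cwd, "torch", "__init__.py"))
            and not os.path.exists(os.path.join(cwd, "torch", "version.py"))):
        warnings.warn(
            "You appear to be importing PyTorch from a clone of the git repo: "
            + cwd + "\nThis will prevent `import torch` from resolving to the "
            "PyTorch install and may fail to load C extensions.")
```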
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39995
Reviewed By: malfet
Differential Revision: D23817209
Pulled By: robieta
fbshipit-source-id: d9ac567acb22d9c8c567a8565a7af65ac624dbf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44983
`_all_gather` was converted from `_wait_all_workers` and inherited its
5 seconds fixed timeout. As `_all_gather` meant to support a broader
set of use cases, the timeout configuration should be more flexible.
This PR makes `rpc._all_gather` use the global default RPC timeout.
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D23794383
Pulled By: mrshenli
fbshipit-source-id: 382f52c375f0f25c032c5abfc910f72baf4c5ad9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44960
Since we have templated selective build, it should be safe to move the operators to prim so that they can be selectively built on mobile.
Test Plan: CI
Reviewed By: linbinyu
Differential Revision: D23772025
fbshipit-source-id: 52cebae76e4df5a6b2b51f2cd82f06f75e2e45d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45065
To preserve backwards compatibility with applications that were passing in some ProcessGroupRpcBackendOptions but were not explicitly setting backend=BackendType.PROCESS_GROUP, we now infer the backend type from the options if only the options are passed. If neither is passed, we default to TensorPipe, as before this change.
ghstack-source-id: 112586258
Test Plan: Added new unit tests.
Reviewed By: pritamdamania87
Differential Revision: D23814289
fbshipit-source-id: f4be7919e0817a4f539a50ab12216dc3178cb752
Summary:
combineMultilane used the wrong order when ramp was on the left hand side,
which matters for subtract.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45157
Test Plan: test_tensorexpr --gtest_filter=TensorExprTest.SimplifyRampSubBroadcast
Reviewed By: ailzhang
Differential Revision: D23851751
Pulled By: asuhan
fbshipit-source-id: 864d1611e88769fb43327ef226bb3310017bf858
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45015
torch.package allows you to write packages of code, pickled python data, and
arbitrary binary and text resources into a self-contained package.
torch.package.PackageExporter writes the packages and
torch.package.PackageImporter reads them.
The importers can load this code in a hermetic way, such that code is loaded
from the package rather than the normal python import system. This allows
for the packaging of PyTorch model code and data so that it can be run
on a server or used in the future for transfer learning.
The code contained in packages is copied file-by-file from the original
source when it is created, and the file format is a specially organized
zip file. Future users of the package can unzip the package, and edit the code
in order to perform custom modifications to it.
The importer for packages ensures that code in the module can only be loaded from
within the package, except for modules explicitly listed as external using :method:`extern_module`.
The file `extern_modules` in the zip archive lists all the modules that a package externally depends on.
This prevents "implicit" dependencies where the package runs locally because it is importing
a locally-installed package, but then fails when the package is copied to another machine.
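A usage sketch based on the description above; the file name, the `"numpy"` extern, and `model` are placeholders, and exact method names may differ slightly between versions:
```python
from torch.package import PackageExporter, PackageImporter

# Write a package: code is copied file-by-file, data is pickled into the zip.
with PackageExporter("my_package.pt") as exporter:
    exporter.extern("numpy")                           # declare an external dependency
    exporter.save_pickle("model", "model.pkl", model)  # model: any picklable object

# Load hermetically: modules resolve from inside the package, not sys.path.
importer = PackageImporter("my_package.pt")
loaded = importer.load_pickle("model", "model.pkl")
```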
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23824337
Pulled By: zdevito
fbshipit-source-id: 1247c34ba9b656f9db68a83e31f2a0fbe3bea6bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44655
Since `toHere()` does not execute operations over RPC and simply
transfers the value to the local node, we don't need to enable the profiler
remotely for this message; doing so would only add unnecessary overhead.
Since `toHere` is a blocking call, we already profile the call on the local node using `RECORD_USER_SCOPE`, so this does not change the expected profiler results (validated by ensuring all remote profiling tests pass).
ghstack-source-id: 112605610
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23641466
fbshipit-source-id: 109d9eb10bd7fe76122b2026aaf1c7893ad10588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653
This changes the profiler per a discussion with ilia-cher offline that enables `disableProfiler()` event consolidation logic to be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387 where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling.
This is done by introducing 2 flags `cleanupTLSState` and `consolidate` which control whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.
Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620
Reviewed By: mrshenli
Differential Revision: D23638499
fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44646
Per a discussion with ilia-cher, this is not needed anymore and
removing it would make some future changes to support async RPC profiling
easier. Tested by ensuring profiling tests in `test_autograd.py` still pass.
ghstack-source-id: 112605618
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D23683998
fbshipit-source-id: 4e49a439509884fe04d922553890ae353e3331ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45098
**Summary**
This commit adds support for default arguments in methods of class
types. Similar to how default arguments are supported for regular
script functions and methods on scripted modules, default values are
retrieved from the definition of a TorchScript class in Python as Python
objects, converted to IValues, and then attached to the schemas of
already compiled class methods.
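A minimal sketch of what this enables (the class is illustrative):
```python
import torch

@torch.jit.script
class Accumulator(object):
    def __init__(self, start: int = 0):
        self.total = start

    def add(self, amount: int = 1) -> int:
        self.total += amount
        return self.total

@torch.jit.script
def use_accumulator() -> int:
    acc = Accumulator()   # uses default start=0
    acc.add()             # uses default amount=1
    return acc.add(5)     # returns 6
```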
**Test Plan**
This commit adds a set of new tests to TestClassType to test default
arguments.
**Fixes**
This commit fixes #42562.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23844769
Pulled By: SplitInfinity
fbshipit-source-id: ceedff7703bf9ede8bd07b3abcb44a0f654936bd
Summary:
This flag simply allows users to get fusion groups that will *eventually* have shapes (such that `getOperation` is a valid call).
This is useful for doing early analysis and compiling just in time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44401
Reviewed By: ZolotukhinM
Differential Revision: D23656140
Pulled By: bwasti
fbshipit-source-id: 9a26c202752399d1932ad7d69f21c88081ffc1e5
Summary:
NVIDIA GPUs are binary compatible within a major compute capability revision.
This prevents messages like "GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation." from appearing, since CUDA 11.0 does not support code generation for sm_86.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45130
Reviewed By: ngimel
Differential Revision: D23841556
Pulled By: malfet
fbshipit-source-id: bcfc9e8da63dfe62cdec06909b6c049aaed6a18a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44766
There might be modules that are not symbolically traceable, e.g. LSTM (since it has
input dependent control flows), to support quantization in these cases, user will provide
the corresponding observed and quantized version of the custom module, the observed
custom module with observers already inserted in the module and the quantized version will
have the corresponding ops quantized. And use
```
from torch.quantization import register_observed_custom_module_mapping
from torch.quantization import register_quantized_custom_module_mapping
register_observed_custom_module_mapping(CustomModule, ObservedCustomModule)
register_quantized_custom_module_mapping(CustomModule, QuantizedCustomModule)
```
to register the custom module mappings, we'll also need to define a custom delegate class
for symbolic trace in order to prevent the custom module from being traced:
```python
class CustomDelegate(DefaultDelegate):
def is_leaf_module(self, m):
return (m.__module__.startswith('torch.nn') and
not isinstance(m, torch.nn.Sequential)) or \
isinstance(m, CustomModule)
m = symbolic_trace(original_m, delegate_class=CustomDelegate)
```
Test Plan: Imported from OSS
Reviewed By: z-a-f
Differential Revision: D23723455
fbshipit-source-id: 50d666e29b94cbcbea5fb6bcc73b00cff87eb77a
Summary:
This is a sub-task of addressing https://github.com/pytorch/pytorch/issues/42969. We re-enable the type check for `autocast_test_lists`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45107
Test Plan:
`python test/test_type_hints.py` passed:
```
(pytorch) bash-5.0$ with-proxy python test/test_type_hints.py
....
----------------------------------------------------------------------
Ran 4 tests in 103.871s
OK
```
Reviewed By: walterddr
Differential Revision: D23842884
Pulled By: Hangjun
fbshipit-source-id: a39f3810e3abebc6b4c1cb996b06312f6d42ffd6
Summary:
Fixes a subtask of https://github.com/pytorch/pytorch/issues/42969
Tested the following and no warnings were seen.
python test/test_type_hints.py
....
----------------------------------------------------------------------
Ran 4 tests in 180.759s
OK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44971
Reviewed By: walterddr
Differential Revision: D23822274
Pulled By: visweshfb
fbshipit-source-id: e3485021e348ee0a8508a9d128f04bad721795ef
Summary:
Previously, `prim::EnumValue` was serialized to `ops.prim.EnumValue`, which doesn't have the right implementation to refine the return type. This diff correctly serializes it to `enum.value`, thus fixing the issue.
Fixes https://github.com/pytorch/pytorch/issues/44892
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44891
Reviewed By: malfet
Differential Revision: D23818962
Pulled By: gmagogsfm
fbshipit-source-id: 6edfdf9c4b932176b08abc69284a916cab10081b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43680
As discussed [here](https://github.com/pytorch/pytorch/issues/43342),
adding in a Python-only implementation of the triplet-margin loss that takes a
custom distance function. Still discussing whether this is necessary to add to
PyTorch Core.
Test Plan:
python test/run_tests.py
Imported from OSS
Reviewed By: albanD
Differential Revision: D23363898
fbshipit-source-id: 1cafc05abecdbe7812b41deaa1e50ea11239d0cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111
In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not too hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- it limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.
The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.
In an example like:
```
def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```
we will consider x to be written to. Any write to any wildcard element (an element that enters a tuple, an element that is taken from a list) will mark x as written to. This can limit our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D23828003
Pulled By: eellison
fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
Summary:
Change from `self` to `self.__class__()` in _DecoratorManager to ensure a new object is created every time a function is called recursively
Fixes https://github.com/pytorch/pytorch/issues/44531
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44633
Reviewed By: agolynski
Differential Revision: D23783601
Pulled By: albanD
fbshipit-source-id: a818664dee7bdb061a40ede27ef99e9546fc80bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39955
resolves https://github.com/pytorch/pytorch/issues/36323 by adding `torch.sgn` for complex tensors.
`torch.sgn` returns `x/abs(x)` for `x != 0` and returns `0 + 0j` for `x==0`
This PR doesn't test the correctness of the gradients. That will be done as part of auditing all the ops in the future, once we decide the autograd behavior (JAX vs TF) and add gradcheck.
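Example of the new behavior (values chosen so the result is easy to verify by hand):
```python
import torch

z = torch.tensor([3 + 4j, 0j, -2 + 0j])
torch.sgn(z)
# tensor([ 0.6000+0.8000j,  0.0000+0.0000j, -1.0000+0.0000j])
```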
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D23460526
Pulled By: anjali411
fbshipit-source-id: 70fc4e14e4d66196e27cf188e0422a335fc42f92
Summary:
We currently fetch an allreduced tensor from Python in C++ and store the resulting tensor in a struct's field. This PR removes the extra tensor parameter from the function signature and fetches the tensor from a single place.
Fixes https://github.com/pytorch/pytorch/issues/43960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44914
Reviewed By: rohan-varma
Differential Revision: D23798888
Pulled By: bugra
fbshipit-source-id: ad1b8c31c15e3758a57b17218bbb9dc1f61f1577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45088
Fixes #45082
Found a few problems while working on #44983
1. We deliberately swallow RPC timeouts during shutdown, as we haven't
found a good way to handle those. When we convert `_wait_all_workers`
into `_all_gather`, the same logic was inherited. However, as
`_all_gather` meant to be used in more general scenarios, we should
no longer keep silent about errors. This commit let the error throw
in `_all_gather` and also let `shutdown()` to catch them and log.
2. After fixing (1), I found that `UnpickledPythonCall` needs to
acquire the GIL on destruction, and this can lead to deadlock when used
in conjunction with `ProcessGroup`, because the `ProcessGroup` ctor is a
synchronization point which holds the GIL. In `init_rpc`, followers
(`rank != 0`) can exit before the leader (`rank == 0`). If the two
happen together, we can get the following: on a follower, `init_rpc`
returns after running `_broadcast_to_followers` and before reaching the
dtor of `UnpickledPythonCall`. The follower then runs the ctor of
`ProcessGroup`, which holds the GIL and waits for the leader to join.
However, the leader is waiting for the response from
`_broadcast_to_followers`, which is blocked by the dtor of
`UnpickledPythonCall`. Hence the deadlock. This commit drops the GIL in
the `ProcessGroup` ctor.
3. After fixing (2), I found that `TensorPipe` backend
nondeterministically fails with `test_local_shutdown`, due to a
similar reason as (2), but this time it is that `shutdown()` on a
follower runs before the leader finishes `init_rpc`. This commit
adds a join for `TensorPipe` backend `init_rpc` after `_all_gather`.
The 3rd one should be able to solve the 2nd one as well. But since
I didn't see a reason to hold GIL during `ProcessGroup` ctor, I
made that change too.
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D23825592
Pulled By: mrshenli
fbshipit-source-id: 94920f2ad357746a6b8e4ffaa380dd56a7310976
Summary:
This would force jit.script to raise an error if someone tries to mutate a tuple:
```
Tuple[int, int] does not support subscripted assignment:
  File "/home/nshulga/test/tupleassignment.py", line 9
    @torch.jit.script
    def foo(x: Tuple[int, int]) -> int:
        x[-1] = x[0] + 1
        ~~~~~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44929
Reviewed By: suo
Differential Revision: D23777668
Pulled By: malfet
fbshipit-source-id: 8efaa4167354ffb4930ccb3e702736a3209151b6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43622
- Moves the model loading part of `torch.hub.load()` into a new `torch.hub.load_local()` function that takes in a path to a local directory that contains a `hubconf.py` instead of a repo name.
- Refactors `torch.hub.load()` so that it now calls `torch.hub.load_local()` after downloading and extracting the repo.
- Updates `torch.hub` docs to include the new function + minor fixes.
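A usage sketch of the new entry point as described above; the local path and model name are placeholders:
```python
import torch

# e.g. after `git clone https://github.com/pytorch/vision /tmp/vision`
model = torch.hub.load_local("/tmp/vision", "resnet18", pretrained=True)
```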
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44204
Reviewed By: malfet
Differential Revision: D23817429
Pulled By: ailzhang
fbshipit-source-id: 788fd83c87a94f487b558715b2809d346ead02b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45014
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/219
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/212
+ Introduce buffer.h defining the buffer struct(s). The `CpuBuffer`
struct is always defined, while the `CudaBuffer` struct is defined
only when `TENSORPIPE_SUPPORTS_CUDA` is true.
+ Update all channels to take a `CpuBuffer` or `CudaBuffer` for
`send`/`recv` rather than a raw pointer and a length.
+ Make the base `Channel`/`Context` classes templated on `TBuffer`,
effectively creating two channel hierarchies (one for CPU channels,
one for CUDA channels).
+ Update the Pipe and the generic channel tests to use the new API. So
far, generic channel tests are CPU only, and tests for the CUDA IPC
channel are (temporarily) disabled. A subsequent PR will take care of
refactoring tests so that generic tests work for CUDA channels. An
other PR will add support for CUDA tensors in the Pipe.
Differential Revision: D23598033
Test Plan: Imported from OSS
Reviewed By: lw
Pulled By: beauby
fbshipit-source-id: 1d6c3f91e288420858835cd5e7962e8da051b44b
Summary:
A previous fix for masking Cuda dimensions (https://github.com/pytorch/pytorch/issues/44733) changed the behaviour of inserting thread synchronization barriers in the Cuda CodeGen, causing the CudaSharedMemReduce_1 to be flaky and ultimately disabled.
The issue is working out where these barriers must be inserted - solving this optimally is very hard, and I think not possible without dependency analysis we don't have, so I've changed our logic to be quite pessimistic. We'll insert barriers before and after any blocks that have thread dimensions masked (even between blocks that have no data dependencies). This should be correct, but it's an area we could improve performance. To address this somewhat I've added a simplifier pass that removes obviously unnecessary syncThreads.
To avoid this test being flaky again, I've added a check against the generated code to ensure there is a syncThread in the right place.
Also fixed a couple of non-functional but clarity issues in the generated code: fixed the missing newline after Stores in the CudaPrinter, and prevented the PrioritizeLoad mutator from pulling out loads contained within simple Let statements (such as those produced by the Registerizer).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44909
Reviewed By: agolynski
Differential Revision: D23800565
Pulled By: nickgg
fbshipit-source-id: bddef1f40d8d461da965685f01d00b468d8a2c2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44894
Looks like we added double backwards support but only turned on the ModuleTests.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23762544
Pulled By: gchanan
fbshipit-source-id: b5cef579608dd71f3de245c4ba92e49216ce8a5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43208
This PR adds gradcheck for complex. The logic used for complex gradcheck is described in Section 3.5.3 here: https://arxiv.org/pdf/1701.00392.pdf
More concretely, this PR introduces the following changes:
1. Updates get_numerical_jacobian to take as input a scalar value for the vector (v). Adds gradcheck logic for C -> C, C -> R, R -> C. For R -> C functions, only the real part of the gradient is propagated.
2. Adds backward definition for `torch.complex` and also adds a test to verify the definition added.
3. Updates backward for `mul`, `sin`, `cos`, `sinh`, `cosh`.
4. Adds tests for all `torch.real`, `torch.imag`, `torch.view_as_real`, `torch.view_as_complex`, `torch.conj`.
Follow up tasks:
1. Add more thorough tests for R -> C cases. Specifically, add R->C test variants for functions. for e.g., `torch.mul(complex_tensor, real_tensor)`
2. Add back commented test in `common_methods_invocation.py`.
3. Add more special case checking for complex gradcheck to make debugging easier.
4. Update complex autograd note.
5. disable complex autograd for operators not tested for complex.
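A small example of complex gradcheck usage (double-precision complex inputs, as gradcheck requires):
```python
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.cdouble, requires_grad=True)
assert gradcheck(torch.sin, (x,))        # C -> C

def magnitude(z):
    return z.abs()                       # C -> R

assert gradcheck(magnitude, (x,))
```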
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D23655088
Pulled By: anjali411
fbshipit-source-id: caa75e09864b5f6ead0f988f6368dce64cf15deb
Summary:
These aliases are consistent with NumPy. Note that C++'s naming would be different (std::multiplies and std::divides), and that PyTorch's existing names (mul and div) are consistent with Python's dunders.
This also improves the instructions for adding an alias to clarify that dispatch keys should be removed when copying native_function.yaml entries to create the alias entries.
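Quick check that the aliases match the existing names:
```python
import torch

a = torch.tensor([2.0, 4.0])
b = torch.tensor([4.0, 5.0])
assert torch.equal(torch.multiply(a, b), torch.mul(a, b))
assert torch.equal(torch.divide(a, b), torch.div(a, b))
```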
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44463
Reviewed By: ngimel
Differential Revision: D23670782
Pulled By: mruberry
fbshipit-source-id: 9f1bdf8ff447abc624ff9e9be7ac600f98340ac4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44956
Makes the buffers of HistogramObserver have the same shapes
in the uninitialized and initialized states.
This is useful because the detectron2 checkpointer assumes
that these states will stay the same, so it removes the
need for manual hacks around the shapes changing.
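A sketch of the invariant this change enforces, assuming the observer's buffers all appear in `state_dict()`:
```python
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()
shapes_before = {k: tuple(v.shape) for k, v in obs.state_dict().items()}
obs(torch.randn(128))   # calibrate with some data
shapes_after = {k: tuple(v.shape) for k, v in obs.state_dict().items()}
assert shapes_before == shapes_after
```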
Test Plan:
```
python test/test_quantization.py TestObserver.test_histogram_observer_consistent_buffer_shape
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23785382
fbshipit-source-id: 1a83fd4f39b244b00747c368d5d305a07d877c92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795
Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.
This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme where you have to
write a test then include it in `tests.h`).
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.
So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.
There are tests which require that we define custom classes and
operators. In these cases, I've built them into separate `.so`s that we
call `torch.ops.load_library()` on.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity, ZolotukhinM
Differential Revision: D23735520
Pulled By: suo
fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
Summary:
Adds a pass to the IR Simplifier which fuses together the bodies of Cond statements which have identical conditions. e.g.
```
if (i < 10) {
  do_thing_1;
} else {
  do_thing_2;
}
if (i < 10) {
  do_thing_3;
}
```
is transformed into:
```
if (i < 10) {
  do_thing_1;
  do_thing_3;
} else {
  do_thing_2;
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44886
Reviewed By: glaringlee
Differential Revision: D23768565
Pulled By: nickgg
fbshipit-source-id: 3fe40d91e82bdfff8dcb8c56a02a4fd579c070df
Summary:
Moved the description of the tool and changed the function name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44124
Reviewed By: albanD
Differential Revision: D23674618
Pulled By: bzinodev
fbshipit-source-id: 5db0bb14fc106fc96358b1e0590f08e975388c6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44254
Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.
Original PR issue: RemoteModule enhancements #40550
Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule
Reviewed By: pritamdamania87
Differential Revision: D23483803
fbshipit-source-id: 4918583c15c6a38a255ccbf12c9168660ab7f6db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44786
This predates gradcheck and gradcheck does the same and more.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D23731902
Pulled By: gchanan
fbshipit-source-id: 425fd30e943194f63a663708bada8960265b8f05
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175, fixes https://github.com/pytorch/pytorch/issues/34797
This adds complex support to `torch.stft` and `torch.istft`. Note that there are really two issues with complex here: complex signals, and returning complex tensors.
## Complex signals and windows
`stft` currently assumes all signals are real and uses `rfft` with `onesided=True` by default. Similarly, `istft` always takes a complex Fourier series and uses `irfft` to return real signals.
For `stft`, I now allow complex inputs and windows by calling the full `fft` if either are complex. If the user gives `onesided=True` and the signal is complex, then this doesn't work and raises an error instead. For `istft`, there's no way to automatically know what to do when `onesided=False` because that could either be a redundant representation of a real signal or a complex signal. So there, the user needs to pass the argument `return_complex=True` in order to use `ifft` and get a complex result back.
## stft returning complex tensors
The other issue is that `stft` returns a complex result, represented as a `(... X 2)` real tensor. I think ideally we want this to return proper complex tensors, but to preserve BC I've had to add a `return_complex` argument to manage this transition. `return_complex` defaults to False for real inputs to preserve BC, but defaults to True for complex inputs, where there is no BC to consider.
In order to `return_complex` by default everywhere without a sudden BC-breaking change, a simple transition plan could be:
1. introduce `return_complex`, defaulted to false when BC is an issue but giving a warning. (this PR)
2. raise an error in cases where `return_complex` defaults to false, making it a required argument.
3. change `return_complex` default to true in all cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43886
Reviewed By: glaringlee
Differential Revision: D23760174
Pulled By: mruberry
fbshipit-source-id: 2fec4404f5d980ddd6bdd941a63852a555eb9147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44345
As part of enhancing profiler support for RPC, when executing TorchScript functions over RPC, we would like to be able to support user-defined profiling scopes created by `with record_function(...)`.
Since after https://github.com/pytorch/pytorch/pull/34705, we support `with` statements in TorchScript, this PR adds support for `with torch.autograd.profiler.record_function` to be used within TorchScript.
This can be accomplished via the following without this PR:
```
torch.ops.profiler._record_function_enter(...)
# Script code, such as forward pass
torch.ops.profiler._record_function_exit(...)
```
This is a bit hacky and it would be much cleaner to use the context manager now that we support `with` statements. Also, `_record_function_`-type operators are internal and subject to change; this change will help avoid BC issues in the future.
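With this PR, the context-manager form works directly in TorchScript; a minimal sketch:
```python
import torch
from torch.autograd.profiler import record_function

@torch.jit.script
def scripted_forward(x: torch.Tensor) -> torch.Tensor:
    with record_function("my_scope"):   # user-defined profiling scope
        return x + 1
```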
Tested with `python test/test_jit.py TestWith.test_with_record_function -v`
ghstack-source-id: 112320645
Test Plan:
Repro instructions:
1) Change `def script_add_ones_return_any(x) -> Any` to `def script_add_ones_return_any(x) -> Tensor` in `jit/rpc_test.py`
2) `buck test mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_record_function_on_caller_rpc_async --print-passing-details`
3) The function which ideally should accept `Future[Any]` is `def _call_end_callbacks_on_future` in `autograd/profiler.py`.
python test/test_jit.py TestWith.test_with_foo -v
Reviewed By: pritamdamania87
Differential Revision: D23332074
fbshipit-source-id: 61b0078578e8b23bfad5eeec3b0b146b6b35a870
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44798
[test all]
Update for relanding: in ddp.join(), moved _rebuild_buckets from end of backward to beginning of forward as well.
Part of relanding PR #41954, this refactoring is to move rebuild_buckets call from end of first iteration to beginning of second iteration
ghstack-source-id: 112279261
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D23735185
fbshipit-source-id: c26e0efeecb3511640120faa1122a2c856cd694e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44000
This wasn't documented, so add a doc saying all ranks are used when
ranks=None
ghstack-source-id: 111206308
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D23465034
fbshipit-source-id: 4c51f37ffcba3d58ffa5a0adcd5457e0c5676a5d
Summary:
* Implement tuple sort by traversing contained IValue types and generating a lambda function as the comparator for sort.
* Tuple, class objects can now arbitrarily nest within each other and still be sortable
Fixes https://github.com/pytorch/pytorch/issues/43219
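A small example of what now works (the values are illustrative):
```python
import torch
from typing import List, Tuple

@torch.jit.script
def sort_pairs(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    pairs.sort()   # tuples compare element-wise, so the list can be sorted
    return pairs

print(sort_pairs([(2, 0), (1, 5), (1, 3)]))   # [(1, 3), (1, 5), (2, 0)]
```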
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43448
Reviewed By: eellison
Differential Revision: D23352273
Pulled By: gmagogsfm
fbshipit-source-id: b6efa8d00e112178de8256da3deebdba7d06c0e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773
The model is created and prepared using fx APIs and then scripted for training.
In order to test QAT on the scripted model we need to be able to disable/enable fake_quant
and observer modules on it.
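Roughly, the toggling this enables looks as follows; `scripted_model` is assumed to be the prepared-then-scripted QAT model:
```python
import torch

# scripted_model: assumed to be a prepared, scripted QAT model
scripted_model.apply(torch.quantization.disable_fake_quant)
scripted_model.apply(torch.quantization.enable_fake_quant)
scripted_model.apply(torch.quantization.disable_observer)
scripted_model.apply(torch.quantization.enable_observer)
```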
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23741354
fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532