pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Rohan Varma	f8248543a1	Pass in smaller timeout into init_process_group for distributed_test (#47896 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47896 Per title ghstack-source-id: 116710141 Test Plan: CI Reviewed By: osalpekar Differential Revision: D24943323 fbshipit-source-id: 7bf33ce3a021b9750b65e0c08f602c465cd81d28	2020-11-14 13:38:20 -08:00
Vasiliy Kuznetsov	4779553921	Revert "[quant] Remove nn.quantized.ReLU module and nn.quantized.functional.relu (#47415 )" (#47949 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47949 This reverts commit `1478e5ec2a`. Test Plan: Imported from OSS Reviewed By: supriyar Differential Revision: D24966363 Pulled By: vkuzo fbshipit-source-id: ca1126f699eef84027a15df35962728296c8a790	2020-11-14 08:40:30 -08:00
Jagadish Krishnamoorthy	1606899dbe	distributed_test: Map rank to GPU accordingly (#47898 ) Summary: If world_size is lesser than or equal to number of GPU's available then the rank can be directly mapped to corresponding GPU. This fixes the issue referenced in https://github.com/pytorch/pytorch/issues/45435 and https://github.com/pytorch/pytorch/issues/47629 For world_size = 3 and number of GPU's = 8, the rank to GPU mapping will be 0,2,4. This is due to the introduction of barrier, (refer PR https://github.com/pytorch/pytorch/issues/45181) the tensors in barrier is mapped to cuda0,1,2 and the tensors in the actual test cases are mapped to cuda0,2,4 resulting in different streams and leading to timeout. This issue is specific to default process group. Issue is not observed in new process group since the streams are created again after the initial barrier call. This patch maps the rank to corresponding GPU's when the world_size is less than or equal to the number of GPU's, in this case 0,1,2 Note: The barrier function in distributed_c10d.py should include new parameter to specify the tensor or rank to GPU mapping. In that case, this patch will be redundant but harmless since the tests can specify the tensors with appropriate GPU rankings. Fixes https://github.com/pytorch/pytorch/issues/47629 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47898 Reviewed By: smessmer Differential Revision: D24956021 Pulled By: rohan-varma fbshipit-source-id: a88257f22a7991ba36566329766c106d3360bb4e	2020-11-13 23:59:42 -08:00
Chester Liu	17a6bc7c1b	Cleanup unused code for Python < 3.6 (#47822 ) Summary: I think these can be safely removed since the min version of supported Python is now 3.6 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822 Reviewed By: smessmer Differential Revision: D24954936 Pulled By: ezyang fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b	2020-11-13 21:37:01 -08:00
Ivan Yashchuk	260daf088d	Added linalg.cholesky (#46083 ) Summary: This PR adds `torch.linalg.cholesky` function that matches `numpy.linalg.cholesky`. Fixed `lda` argument to `lapackCholesky` calls. Added `random_hermitian_pd_matrix` helper function for tests. Ref https://github.com/pytorch/pytorch/issues/42666. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46083 Reviewed By: ailzhang Differential Revision: D24861752 Pulled By: mruberry fbshipit-source-id: 214dbceb4e8a2c589df209493efd843962d25593	2020-11-13 16:50:40 -08:00
Gao, Xiang	0652d755d3	Fix some flaky tests in test_torch.py and test_nn.py (#46941 ) Summary: Fixed test: - `test_is_nonzero`, this is asserting exact match, which is flaky when `TORCH_SHOW_CPP_STACKTRACES=1`, I changed this to non-exact assert - `test_pinverse` TF32 - `test_symeig` TF32 - `test_triangular_solve_batched_many_batches_cpu_float64` precision on CPU BLAS - `test_qr` TF32, as well as the tensor factory forgets a `dtype=dtype` - `test_lu` TF32 - `ConvTranspose2d` TF32 - `Conv3d_1x1x1_no_bias` TF32 - `Transformer*` TF32 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46941 Reviewed By: heitorschueroff Differential Revision: D24852725 Pulled By: mruberry fbshipit-source-id: ccd4740cc643476178d81059d1c78da34e5082ed	2020-11-12 22:35:42 -08:00
Natalia Gimelshein	eb8331e759	Revert D24524219: Remove `balance` and `devices` parameter from Pipe. Test Plan: revert-hammer Differential Revision: D24524219 (`8da7576303`) Original commit changeset: 9973172c2bb7 fbshipit-source-id: b187c80270adb2a412e3882863a2d7de2a52ed56	2020-11-12 19:31:19 -08:00
kshitij12345	3649a2c170	[numpy] `torch.sqrt` : promote integer inputs to float (#47293 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47293 Reviewed By: malfet Differential Revision: D24855994 Pulled By: mruberry fbshipit-source-id: 1e6752f2eeba6d638dea0bdea0c650cf722718c9	2020-11-12 16:16:09 -08:00
Pritam Damania	8da7576303	Remove `balance` and `devices` parameter from Pipe. (#46804 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46804 As per our design in https://github.com/pytorch/pytorch/issues/44827, changign the API such that the user places modules on appropriate devices instead of having a `balance` and `devices` parameter that decides this. This design allows us to use RemoteModule in the future. ghstack-source-id: 116479842 Test Plan: waitforbuildbot Reviewed By: mrshenli Differential Revision: D24524219 fbshipit-source-id: 9973172c2bb7636572cdc37ce06bf8368638a463	2020-11-12 14:20:23 -08:00
Ivan Yashchuk	149190c014	Added CUDA support for complex input for torch.solve (#47045 ) Summary: `torch.solve` now works for complex inputs on GPU. I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes. Differentiation also works correctly with complex inputs. Fixes https://github.com/pytorch/pytorch/issues/41084 Ref. https://github.com/pytorch/pytorch/issues/33152 anjali411 I hope you don't mind that I took over https://github.com/pytorch/pytorch/pull/42737 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47045 Reviewed By: nikithamalgifb Differential Revision: D24921503 Pulled By: anjali411 fbshipit-source-id: 4c3fc4f193a84b6e28c43c08672d480715000923	2020-11-12 12:22:59 -08:00
Elias Ellison	664d2f48cf	[NNC] Enable unary op cpu testing (#47374 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47374 A few small fixes needed to enable unary op cpu testing. If reviewers would prefer I split them up let me know. Test Plan: Imported from OSS Reviewed By: ansley Differential Revision: D24805248 Pulled By: eellison fbshipit-source-id: c2cfe2e3319a633e64da3366e68f5bf21d390cb7	2020-11-12 11:14:03 -08:00
Jerry Zhang	1478e5ec2a	[quant] Remove nn.quantized.ReLU module and nn.quantized.functional.relu (#47415 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47415 nn.ReLU works for both float and quantized input, we don't want to define an nn.quantized.ReLU that does the same thing as nn.ReLU, similarly for nn.quantized.functional.relu this also removes the numerical inconsistency for models quantizes nn.ReLU independently in qat mode Test Plan: Imported from OSS Reviewed By: z-a-f Differential Revision: D24747035 fbshipit-source-id: b8fdf13e513a0d5f0c4c6c9835635bdf9fdc2769	2020-11-12 10:56:30 -08:00
Mingzhe Li	66f9b1de1b	[NCCL] enable p2p tests (#47797 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47797 NCCL p2p tests had hang issues before, the reason is that there were some unexpected context switches. For example, process 1 which is supposed to only use GPU1 could use GPU0 as a result of missing explicitly setting device. ghstack-source-id: 116461969 Test Plan: waitforsandcastle Reviewed By: jiayisuse Differential Revision: D24863808 fbshipit-source-id: 92bd3a4874be8334210c7c8ee6363648893c963e	2020-11-12 10:44:50 -08:00
Kyle Chen	859e054314	skip test_all_reduce_sum_cuda_async test case for ROCM (#47630 ) Summary: Skip the following test case for rocm (When PYTORCH_TEST_WITH_ROCM=1): - test_all_reduce_sum_cuda_async (__main__.TestDistBackendWithFork) jeffdaily pruthvistony Pull Request resolved: https://github.com/pytorch/pytorch/pull/47630 Reviewed By: seemethere, heitorschueroff Differential Revision: D24849755 Pulled By: walterddr fbshipit-source-id: b952c81677df2dfd35d459b94ce0f7a5b12c0d5c	2020-11-12 07:19:32 -08:00
Rohan Varma	c9f6e70c09	Refactor DDP uneven inputs control flags (#47394 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47394 This is a preliminary refactor for the next diff that will add an additional flag to control whether we throw a StopIteration or not. We basically move the flags for ddp uneven inputs to a simple class. ghstack-source-id: 116428177 Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D24739509 fbshipit-source-id: 96bf41bd1c02dd27e68f6f37d08e22f33129b319	2020-11-11 16:51:56 -08:00
Ivan Yashchuk	52ec8b9340	Added CUDA support for complex input for torch.triangular_solve (#46916 ) Summary: `torch.triangular_solve` now works for complex inputs on GPU. I moved the existing tests to `test_linalg.py` and modified them to test complex and float32 dtypes. Ref. https://github.com/pytorch/pytorch/issues/33152 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46916 Reviewed By: navahgar, agolynski Differential Revision: D24706647 Pulled By: anjali411 fbshipit-source-id: fe780eac93d2ae1b2549539bb385e5fac25213b3	2020-11-11 16:08:11 -08:00
Ivan Yashchuk	a1db5b0f2b	Added CUDA support for complex input for torch.inverse #2 (#47595 ) Summary: `torch.inverse` now works for complex inputs on GPU. Opening a new PR here. The previous PR was merged and reverted due to a bug in tests marked with `slowTest`. Previous PR https://github.com/pytorch/pytorch/pull/45034 Ref. https://github.com/pytorch/pytorch/issues/33152 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47595 Reviewed By: navahgar Differential Revision: D24840955 Pulled By: anjali411 fbshipit-source-id: ec49fffdc4b3cb4ae7507270fa24e127be14f59b	2020-11-11 11:06:08 -08:00
Heitor Schueroff	bf6a156f64	Fix kthvalue error for scalar input (#47600 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47600 fixes https://github.com/pytorch/pytorch/issues/30818 Note that the median case was already fixed by https://github.com/pytorch/pytorch/pull/45847 Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D24860337 Pulled By: heitorschueroff fbshipit-source-id: 69ccbbb6c7c86671e5712b1c2056c012d898b4f2	2020-11-10 17:21:52 -08:00
Rong Rong	febc76a5c6	fix assert_allclose doesnt check shape (#47580 ) Summary: fix assert_allclose doesnt check shape should fix https://github.com/pytorch/pytorch/issues/47449. Pull Request resolved: https://github.com/pytorch/pytorch/pull/47580 Reviewed By: samestep Differential Revision: D24836399 Pulled By: walterddr fbshipit-source-id: 943f8c83864bc01e1a782048c234e9592d2f1a25	2020-11-10 15:03:25 -08:00
Rohan Varma	0a7ebf00f8	[Reland] Add tests for DDP control flow models. (#47470 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47470 Reland of https://github.com/pytorch/pytorch/pull/47206, which was reverted due to failing multigpu tests. The fix to make multigpu tests work is to compare against `torch.tensor([world_size, 0])`, not hardcode `torch.tensor([2, 0]` which assumes a world size of 2. Original commit description: As discussed offline with pritamdamania87, add testing to ensure per-iteration and rank-dependent control flow works as expected in DDP with find_unused_parameters=True. ghstack-source-id: 115993934 ghstack-source-id: 115993934 Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D24767893 fbshipit-source-id: 7d7a2449270eb3e72b5061694e897166e16f9bbc	2020-11-10 12:22:59 -08:00
Richard Zou	22d21414d7	Revert D24574649: [pytorch][PR] Utility that loads a DP/DDP model state dict into a non-DDP model with the same architecture. Test Plan: revert-hammer Differential Revision: D24574649 (`b631c872c9`) Original commit changeset: 17d29ab16ae2 fbshipit-source-id: 6766c6b21b82c9463143da0370192d9c68dbce6c	2020-11-10 06:55:45 -08:00
Pradeep Ganesan	b631c872c9	Utility that loads a DP/DDP model state dict into a non-DDP model with the same architecture. (#45643 ) Summary: Added a convenience function that allows users to load models without DP/DDP from a DP/DDP state dict. Pull Request resolved: https://github.com/pytorch/pytorch/pull/45643 Reviewed By: rohan-varma Differential Revision: D24574649 fbshipit-source-id: 17d29ab16ae24a30890168fa84da6c63650e61e9	2020-11-09 20:49:29 -08:00
Rong Rong	2f617c5104	skip GPU test on sandcastle if sanitizer is enabled (#47626 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47626 `caffe2/test:cuda` was safeguarded by a GPU availability check however most of the mixed CPU/GPU tests aren't. Use `TEST_WITH_*SAN` flags to safeguard test discovery for CUDA tests. Test Plan: sandcastle Reviewed By: janeyx99 Differential Revision: D24842333 fbshipit-source-id: 5e264344a0b7b98cd229e5bf73c17433751598ad	2020-11-09 16:06:58 -08:00
Erjia Guan	86bb413600	Optimize backward for torch.repeat (#46726 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46726 Fixes #43192 Test Plan: Imported from OSS Reviewed By: heitorschueroff Differential Revision: D24739840 Pulled By: ejguan fbshipit-source-id: ddf21fc52c4676de25ad7bfb0b5c1c23daa77ee6	2020-11-09 15:12:40 -08:00
Natalia Gimelshein	4a2fb34042	check sparse sizes (#47148 ) Summary: checks sizes of sparse tensors when comparing them in assertEqual. Removes additional checks in safeCoalesce, safeCoalesce should not be a test for `.coalesce()` function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148 Reviewed By: mruberry Differential Revision: D24823127 Pulled By: ngimel fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a	2020-11-09 10:33:24 -08:00
Pritam Damania	781e0ed835	Support RRef.backward() for Owner RRefs. (#46641 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46641 Second part of https://github.com/pytorch/pytorch/pull/46568, allows RRef.backward() to work for owner RRefs. ghstack-source-id: 115440252 Test Plan: waitforbuildbot Reviewed By: mrshenli Differential Revision: D24441300 fbshipit-source-id: 64af28e6b6ae47ea27e611a148f217bc344a4c5b	2020-11-07 21:25:32 -08:00
Mehdi Mirzazadeh	160db3db4f	Adding profiling capability to c++ ddp collective functions (#46471 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46471 ghstack-source-id: 116018837 Test Plan: Added unit tests: buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork Reviewed By: rohan-varma Differential Revision: D23948397 fbshipit-source-id: 6d93a370aff26bf96c39e5d78a2492c5142a9156	2020-11-06 10:29:58 -08:00
Edward Yang	1aeefcdaa6	Revert D24730264: [pytorch][PR] Added CUDA support for complex input for torch.inverse Test Plan: revert-hammer Differential Revision: D24730264 (`33acbedace`) Original commit changeset: b9c94ec46301 fbshipit-source-id: beb9263700e9bc92685f74c37c46aa33f3b595b9	2020-11-06 07:28:14 -08:00
Xu Zhao	eaa993a2e0	Add type annotations to torch._C._distributed_rpc module. (#46624 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46624 Test Plan: Imported from OSS Reviewed By: glaringlee Differential Revision: D24761656 Pulled By: xuzhao9 fbshipit-source-id: b55aee5dd2b97f573a50e5bbfddde7d984943fec	2020-11-06 01:28:51 -08:00
Ivan Yashchuk	33acbedace	Added CUDA support for complex input for torch.inverse (#45034 ) Summary: `torch.inverse` now works for complex inputs on GPU. Test cases with complex matrices are xfailed for now. For example, batched matmul does not work with complex yet. Ref. https://github.com/pytorch/pytorch/issues/33152 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45034 Reviewed By: zou3519 Differential Revision: D24730264 Pulled By: anjali411 fbshipit-source-id: b9c94ec463012913c117278a884adeee96ea02aa	2020-11-05 16:30:11 -08:00
Richard Zou	9c8078cdfb	Revert D24659901: Add tests for DDP control flow models. Test Plan: revert-hammer Differential Revision: D24659901 (`31c9d2efcd`) Original commit changeset: 17fc2b3ebba9 fbshipit-source-id: 26b0bdbe83cba54da4f363cfa7fc85c503aa05ab	2020-11-05 08:08:59 -08:00
Pritam Damania	c8872051e6	Validate number of GPUs in distributed_test. (#47259 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47259 As described in https://github.com/pytorch/pytorch/issues/47257, not using enough number of GPUs would result in an error. As a result, before we call `init_process_group` in distributed_test, we validate we have enough GPUs. #Closes: https://github.com/pytorch/pytorch/issues/47257 ghstack-source-id: 115790475 Test Plan: waitforbuildbot Reviewed By: rohan-varma Differential Revision: D24699122 fbshipit-source-id: 59c78d191881d1e063c43623dcf4d7eb75a2e94e	2020-11-04 17:55:34 -08:00
Rohan Varma	31c9d2efcd	Add tests for DDP control flow models. (#47206 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47206 As discussed offline with pritamdamania87, add testing to ensure per-iteration and rank-dependent control flow works as expected in DDP with `find_unused_parameters=True`. ghstack-source-id: 115854944 Test Plan: CI Reviewed By: pritamdamania87 Differential Revision: D24659901 fbshipit-source-id: 17fc2b3ebba9cef2dd01d2877bad5702174b9767	2020-11-04 15:40:57 -08:00
Erjia Guan	f1ac63d324	Implement copysign (#46396 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46396 Related #38349 [numpy](https://numpy.org/doc/stable/reference/generated/numpy.copysign.html?highlight=copysign#numpy.copysign) - No in-place function - No method - Optional output - Available: byte, char, bool, int, short, long, float, double, half - Integral promoted to float - Not available: float/double complex `c = np.copysign(a, b)` \| a \| b \| c \| a.grad \| \| -1 \| -1 \| -1 \| 1 \| \| -0 \| -1 \| -0 \| 0 \| \| 0 \| -1 \| -0 \| 0 \| \| 1 \| -1 \| -1 \| -1 \| \| -1 \| -0 \| -1 \| 1 \| \| -0 \| -0 \| 0 \| 0 \| \| 0 \| -0 \| 0 \| 0 \| \| 1 \| -0 \| -1 \| -1 \| \| -1 \| 0 \| 1 \| -1 \| \| -0 \| 0 \| 0 \| 0 \| \| 0 \| 0 \| 0 \| 0 \| \| 1 \| 0 \| 1 \| 1 \| \| -1 \| 1 \| 1 \| -1 \| \| -0 \| 1 \| 0 \| 0 \| \| 0 \| 1 \| 0 \| 0 \| \| 1 \| 1 \| 1 \| 1 \| This function becomes non-differentiable at `a=0` for any `b`. So, in my opinion, we may set the gradient for `a=0` to 0. TODO: - [x] test (cpu/gpu) - [x] doc - [x] ~kernel_vec~ Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D24401366 Pulled By: ejguan fbshipit-source-id: 3621c5ff74b185376a3705589983bb5197ab896d	2020-11-04 08:08:57 -08:00
Kshiteej K	63978556fd	[numpy] `torch.a{cosh, sinh}` : promote integer inputs to float (#47152 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47152 Reviewed By: mrshenli Differential Revision: D24681083 Pulled By: mruberry fbshipit-source-id: 246e2272536cf912a2575bfaaa831c3eceec034c	2020-11-03 15:26:13 -08:00
kshitij12345	c424d9389e	[numpy] `torch.a{cos, tan}` : promote integer inputs to float (#47005 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47005 Reviewed By: mrshenli Differential Revision: D24681097 Pulled By: mruberry fbshipit-source-id: 2f29655a5f3871ee96c2bfd35c93f4d721730e37	2020-11-03 13:00:24 -08:00
kshitij12345	0d00724e36	[numpy] `torch.{a}tanh` : promote integer inputs to float (#47064 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/47064 Reviewed By: mrshenli Differential Revision: D24681107 Pulled By: mruberry fbshipit-source-id: 1818206c854dbce7074363bf6f1949daa7bf6052	2020-11-03 12:56:58 -08:00
Ivan Yashchuk	f276ab55cd	Added Kronecker product of tensors (torch.kron) (#45358 ) Summary: This PR adds a function for calculating the Kronecker product of tensors. The implementation is based on `at::tensordot` with permutations and reshape. Tests pass. TODO: - [x] Add more test cases - [x] Write documentation - [x] Add entry `common_methods_invokations.py` Ref. https://github.com/pytorch/pytorch/issues/42666 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45358 Reviewed By: mrshenli Differential Revision: D24680755 Pulled By: mruberry fbshipit-source-id: b1f8694589349986c3abfda3dc1971584932b3fa	2020-11-03 12:41:41 -08:00
Xiao Wang	774b638eb6	Change largeCUDATensorTest to largeTensorTest+onlyCUDA; add a buffer to large cuda tensor test (#45332 ) Summary: Effectively, `largeCUDATensorTest` = `largeTensorTest` + `onlyCUDA`. There was this problem where a user got OOM for a `largeCUDATensorTest('16GB')` on a 16GB V100. This decorator was checking total memory for a GPU device, however in most cases, we can't allocate all of the memory that a GPU has. So, it would be beneficial that we have a buffer on this `largeTensorTest` check for CUDA. I added a 10% buffer to it. Definition of `largeTensorTest` `d22dd80128/torch/testing/_internal/common_device_type.py (L560-L578)` `_has_sufficient_memory` `d22dd80128/torch/testing/_internal/common_device_type.py (L535-L557)` `largeCUDATensorTest` `d22dd80128/torch/testing/_internal/common_device_type.py (L526-L532)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/45332 Reviewed By: ngimel Differential Revision: D24698690 Pulled By: mruberry fbshipit-source-id: a77544478e45ce271f6639ea04e87700574ae307	2020-11-03 11:43:49 -08:00
anjali411	cedeee2cd4	Add scalar.conj() and update backward formulas for add and sub (#46596 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46596 1. Added `conj` method for scalar similar to numpy. 2. Updates backward formulas for add and sub to work correctly for R -> C cases and for the case when alpha is complex. 3. Enabled complex backward for nonzero (no formula update needed). Test Plan: Imported from OSS Reviewed By: glaringlee Differential Revision: D24529227 Pulled By: anjali411 fbshipit-source-id: da871309a6decf5a4ab5c561d5ab35fc66b5273d	2020-11-02 16:17:00 -08:00
Rong Rong	96b23f7db1	add sandcastle device type test base discovery (#47119 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47119 Test Plan: tests 1. test_cuda still works: `buck test --no-cache -c test.external_runner=tpx mode/dev-nosan //caffe2/test:cuda -- --use-remote-execution --force-tpx` 2. test_torch is blocked on D24623962 `buck test --no-cache -c test.external_runner=tpx mode/dev-nosan //caffe2/test:torch -- --use-remote-execution --force-tpx` Reviewed By: mruberry Differential Revision: D24649868 fbshipit-source-id: 97cb41996ea0c37a66a4bf2154e254d2d2912a17	2020-11-02 12:22:30 -08:00
Jeff Daily	6906701bde	[ROCm] enable stream priorities (#47136 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47136 Reviewed By: mruberry Differential Revision: D24672457 Pulled By: ngimel fbshipit-source-id: 54f60c32df87cbd40fccd7fb1ecf0437905f01a3	2020-11-02 11:25:44 -08:00
James Reed	2e2dc5874b	Fix lint (#47095 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47095 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D24639056 Pulled By: jamesr66a fbshipit-source-id: e4f7842eb0438675723d1cac78e20d13b96e802c	2020-10-29 18:09:23 -07:00
James Reed	9bc8f071a3	[WIP] Move torch.fx into its own target (#46658 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46658 ghstack-source-id: 115213192 Test Plan: waitforsadcastle Reviewed By: zdevito, vkuzo Differential Revision: D24374723 fbshipit-source-id: 2b5708001f5df2ffb21ea5e586e26030653ccdcf	2020-10-29 17:03:08 -07:00
Rohan Varma	d850b5c98c	Fix DDP issue where parameters share same grad_accumulator (#46755 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46755 As reported in https://github.com/pytorch/pytorch/issues/41324, there is a bug in DDP when `find_unused_parameters=True` and 2 or more parameters share the same gradient accumulator. In the reducer, we currently keep a mapping of grad accumulator to index and populate it with map[accumulator] = index, but this overwrites indices when the accumulator is the same. To fix this, switch the mapping values to a vector of indices to hold all such indices that share the same accumulator. ghstack-source-id: 115453567 Test Plan: Added UT Reviewed By: pritamdamania87 Differential Revision: D24497388 fbshipit-source-id: d32dfa9c5cd0b7a8df13c7873d5d28917b766640	2020-10-29 12:23:06 -07:00
Yi Wang	cab32d9cdf	[RPC Framework] Support remote device format "<workername>/<device>" (#46773 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46773 Changed the constructor of RemoteModule to accept a `remote_device` arg in the following format: "<workername>/<device>" (e.g., "trainer0/cpu", "ps0/cuda:0") This arg merges the original `on` and `device` arg. Original PR issue: RemoteDevice Format #46554 ghstack-source-id: 115448051 Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule Reviewed By: pritamdamania87 Differential Revision: D24482562 fbshipit-source-id: 5acfc73772576a4b674df27625bf560b8f8e67c1	2020-10-29 00:14:56 -07:00
Kshiteej K	5c8aad1141	[numpy] `torch.cos`, `torch.tan` : promote integer inputs to float (#46706 ) Summary: References https://github.com/pytorch/pytorch/issues/42515 cc: mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/46706 Reviewed By: izdeby Differential Revision: D24537262 Pulled By: mruberry fbshipit-source-id: e57377a625814a3f34a765ce6bfd63a33c02a5d9	2020-10-28 22:02:52 -07:00
Jerry Zhang	c2a3951352	[quant][graphmode][fx] Remove inplace option for convert_fx (#46955 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46955 Initially we were thinking of adding a `invalidate_quantized_float_parameters` option to free the memory of quantized floating parameters, but it turns out we will do module swap just like in eager mode for the modules that are quantized, so the old floating point module will not be referenced after quantization. therefore this feature is only needed for functionals, since most people are using quantization with modules we may not need this. we'll revisit after we find there is a need for this. Test Plan: Imported from OSS Reviewed By: supriyar Differential Revision: D24579400 fbshipit-source-id: fbb0e567405dc0604a2089fc001573affdade986	2020-10-28 21:07:19 -07:00
Rohan Varma	c7183c9878	Fix object-based collectives API to use torch.cuda.current_device instead of (#46897 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46897 These APIs implicitly assumed that gpu for rank == rank index, but that is not necessarily true. For example, the first GPU could be used for a different purpose and rank 0 could use GPU 1, rank 1 uses GPU 2, etc. Thus, we mandate that the user specify the device to use via `torch.cuda.set_device()` before making calls to this API. This expectation should be okay since we clearly document it, and we expect the user to set this for DistributedDataParallel as well. Also adds/tidies up some documentation. ghstack-source-id: 115359633 Test Plan: Modified unittests Reviewed By: divchenko Differential Revision: D24556177 fbshipit-source-id: 7e826007241eba0fde3019180066ed56faf3c0ca	2020-10-28 18:12:50 -07:00
Richard Barnes	353e7f940f	Ensure kernel launches are checked (#46474 ) Summary: Caffe2 and Torch currently does not have a consistent mechanism for determining if a kernel has launched successfully. The result is difficult-to-detect or silent errors. This diff provides functionality to fix that. Subsequent diffs on the stack fix the identified issues. Kernel launch errors may arise if invalid launch parameters (number of blocks, number of threads, shared memory, or stream id) are specified incorrectly for the hardware or for other reasons. Interestingly, unless these launch errors are specifically checked for CUDA will silently fail and return garbage answers which can affect downstream computation. Therefore, catching launch errors is important. Launches are currently checked by placing ``` AT_CUDA_CHECK(cudaGetLastError()); ``` somewhere below the kernel launch. This is bad for two reasons. 1. The check may be performed at a site distant to the kernel launch, making debugging difficult. 2. The separation of the launch from the check means that it is difficult for humans and static analyzers to determine whether the check has taken place. This diff defines a macro: ``` #define TORCH_CUDA_KERNEL_LAUNCH_CHECK() AT_CUDA_CHECK(cudaGetLastError()) ``` which clearly indicates the check. This diff also introduces a new test which analyzes code to identify kernel launches and determines whether the line immediately following the launch contains `TORCH_CUDA_KERNEL_LAUNCH_CHECK();`. A search of the Caffe2 codebase identifies 104 instances of `AT_CUDA_CHECK(cudaGetLastError());` while the foregoing test identifies 1,467 launches which are not paired with a check. Visual inspection indicates that few of these are false positives, highlighting the need for some sort of static analysis system. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46474 Test Plan: The new test is run with: ``` buck test //caffe2/test:kernel_launch_checks -- --print-passing-details ``` And should be launched automatically with the other land tests. (TODO: Is it?) The test is currently set up only to provide warnings but can later be adjusted to require checks. Otherwise, I rely on the existing test frameworks to ensure that changes resulting from reorganizing existing launch checks don't cause regressions. Reviewed By: ngimel Differential Revision: D24309971 Pulled By: r-barnes fbshipit-source-id: 0dc97984a408138ad06ff2bca86ad17ef2fdf0b6	2020-10-28 09:27:48 -07:00

1 2 3 4 5 ...

663 Commits