Summary:
This updates assertEqual and assertEqual-like functions to require that either both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it to be passed as a keyword should be clear, and we can easily update the signature to make "msg" an optional positional argument later.
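For illustration, a minimal sketch of the call pattern this enforces, assuming the internal TestCase from torch.testing._internal.common_utils (the test name and values here are made up):
```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class TolerancesExample(TestCase):
    def test_close_enough(self):
        a = torch.tensor([1.0])
        b = torch.tensor([1.0 + 1e-6])
        # atol and rtol must now be given together (or both omitted),
        # and the message goes through the kwarg-only `msg` argument.
        self.assertEqual(a, b, atol=1e-5, rtol=0, msg="tensors unexpectedly far apart")

if __name__ == "__main__":
    run_tests()
```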
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37098
### **Cherry-picked from another stack:**
Some code review already occurred here: https://github.com/pytorch/pytorch/pull/32582
### Summary:
Fixes: https://github.com/pytorch/pytorch/issues/32436
The issue caused incorrect handling of dtypes for scalar ** tensor.
e.g. before this change:
```
>>> 5.5 ** torch.ones(5, dtype=torch.int32)
tensor([5, 5, 5, 5, 5], dtype=torch.int32)
```
It should return a float tensor instead.
Also fixes a number of incorrect cases:
* integer tensors raised to negative powers gave incorrect results (returning 1 instead
of 0 or raising an error)
* Behavior wasn't consistent between CUDA and CPU
* large_value ** 1 in some cases gave a result not equal
to large_value because of truncation in the conversion to double and back.
BC-breaking:
Previously incorrect behavior (in 1.4):
```
>>> a
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
>>> a.pow_(.5)
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
```
After this change:
`RuntimeError: result type Float can't be cast to the desired output type Int`
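And the out-of-place scalar ** tensor case now promotes as expected; a small sketch reflecting the behavior described above:
```python
import torch

# scalar ** integer tensor now promotes to the default floating-point dtype
# instead of silently truncating to the tensor's integer dtype.
print(5.5 ** torch.ones(5, dtype=torch.int32))
# tensor([5.5000, 5.5000, 5.5000, 5.5000, 5.5000])
```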
Test Plan: Imported from OSS
Differential Revision: D21686207
Pulled By: nairbv
fbshipit-source-id: e797e7b195d224fa46404f668bb714e312ea78ac
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/36900
Since I feel this PR is already large enough, I didn't migrate max in this PR. Legacy code is not cleaned up either. All this remaining work will be done in later PRs after this one is merged.
Benchmark on an extreme case:
```python
import torch
print(torch.__version__)
t = torch.randn(100000, 2, device='cuda')
warmup = torch.arange(100000000)
torch.cuda.synchronize()
%timeit t.min(dim=0); torch.cuda.synchronize()
```
Before: 4ms; After: 24.5us.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38440
Differential Revision: D21560691
Pulled By: ngimel
Summary:
This PR fixes the tolerance values for some of the bfloat16 div tests that were enabled on ROCm with incorrect tolerances in https://github.com/pytorch/pytorch/pull/38621
Also disabled (to unblock CI) `test_addcdiv*`, for which the error is large when the absolute values in the tensor are high. This will have to be investigated further.
ezyang jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38823
Differential Revision: D21686290
Pulled By: ezyang
fbshipit-source-id: 85472680e1886bdc7c227ed2656e0b4fd5328e46
Summary:
This PR ports `masked_select` from TH to ATen and optimizes its performance on CPU with TensorIterator (a short usage sketch follows the speedup numbers below).
https://github.com/pytorch/pytorch/issues/33053
1. single socket run: up to **5.4x** speedup;
2. single core run: up to **1.16x** speedup.
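For context, a minimal usage sketch of the op being ported; the public API is unchanged by this PR, only the backend implementation moves:
```python
import torch

x = torch.randn(3, 4)
mask = x > 0
# masked_select returns a 1-D tensor holding the elements of x where mask is True.
out = torch.masked_select(x, mask)
print(out.shape)
```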
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33269
Differential Revision: D20922288
Pulled By: ngimel
fbshipit-source-id: 38e183a4e3599bba29bbbebe36264026abe1c50e
Summary:
Updates our tests, in preparation for integer division with torch.div and torch.addcdiv throwing a runtime error, by avoiding integer division with torch.div. This creates a brief period where integer division using torch.div is untested, but that should be OK (since it will soon throw a runtime error).
These callsites were identified using https://github.com/pytorch/pytorch/issues/36897.
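As a rough illustration (not taken from the PR diff), this is the kind of rewrite such callsites typically need; whether floor or true division is the right replacement depends on what the test intended:
```python
import torch

a = torch.arange(10)

# Instead of torch.div(a, 2) on integer tensors, which will soon raise:
floor_result = torch.floor_divide(a, 2)  # keeps integer semantics
true_result = torch.true_divide(a, 2)    # promotes to a floating-point result
```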
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38621
Differential Revision: D21612823
Pulled By: mruberry
fbshipit-source-id: 749c03a69feae02590b4395335163d9bf047e162
Summary:
floordiv was missing a couple of dunder registrations, which caused `__ifloordiv__` not to be called when it should be. This adds the appropriate registrations and adds a test verifying that the in-place dunders actually operate in place.
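A minimal sketch of the kind of check the new test performs (this version just illustrates the idea, it is not the test itself):
```python
import torch

t = torch.arange(10)
storage_ptr = t.data_ptr()
t //= 3  # should now dispatch to __ifloordiv__ rather than an out-of-place op
# If the operation really happened in place, the tensor still uses the same storage.
assert t.data_ptr() == storage_ptr
```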
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38695
Differential Revision: D21633980
Pulled By: mruberry
fbshipit-source-id: a423f5ec327cdc062fd6d9d56abd36fe44ac8198
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37984
- `NumericUtils.h`
CUDA distribution kernels had two variants of the transformation lambdas (`uniform`/`normal` -> `lognormal`/`exponential`/`cauchy`/`geometric`...): one for double precision and one optimized for CUDA single precision. This was done by using `::log`/`__logf`, `::exp`/`__expf` and `::tan`/`__tanf`. I moved them to `NumericUtils.h` and called them `at::exp`, `at::log` and `at::tan`. This allowed unifying the CPU/CUDA transformation templates in `TransformationHelper.h`.
- `DistributionsHelper.h`
Made `normal_distribution`, `geometric_distribution`, `exponential_distribution`, `cauchy_distribution`, `lognormal_distribution` C10_HOST_DEVICE compatible to reuse them in CPU/CUDA distribution kernels.
Replaced explicit math with transformations from `TransformationHelper.h`
- `TransformationHelper.h`
Renamed `*_transformation` to `transformation::*`
Added clear unified host/device transformations templates `normal`, `cauchy`, `exponential`, `geometric`, `log_normal` which are used by both CPU and CUDA distribution kernels and custom PRNG distribution kernels.
- `cpu/DistributionTemplates.h`
Unified `normal_kernel`, `cauchy_kernel`, `log_normal_kernel`, `geometric_kernel`, `exponential_kernel`.
- `cuda/DistributionTemplates.h`
Extracted `UNIFORM_AND_TRANSFORM` and `NORMAL_AND_TRANSFORM` macros to reuse code between distribution kernel templates.
Unified the transformation lambdas (`uniform`/`normal` -> `lognormal`/`exponential`/`cauchy`/`geometric`...)
- `test_torch.py`
Added `scipy.stats.kstest` [Kolmogorov–Smirnov](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) tests for the `uniform`/`normal`/`lognormal`/`exponential`/`cauchy` distributions and a [Chi-squared](https://en.wikipedia.org/wiki/Chi-squared_test) test for the `geometric` one, to make sure that our distributions are correct (a small sketch follows this list).
- `cpu_rng_test.cpp`, `rng_test.h`
Fixed `random_()`'s `from` and `to` bounds issue for floating-point types; fixed cast/overflow warnings
- `THTensorRandom.h`, `THVector.h`
Moved unnecessary includes to `THTensorRandom.cpp`
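A small sketch of the Kolmogorov–Smirnov check mentioned for `test_torch.py` above, assuming scipy is available (the real tests cover more distributions and parameters):
```python
import torch
from scipy import stats

# Draw samples from torch's exponential_ and compare against scipy's reference CDF.
samples = torch.empty(10000).exponential_(lambd=1.0).numpy()
statistic, p_value = stats.kstest(samples, 'expon')
# For a correct implementation, the p-value should not be consistently tiny.
print(statistic, p_value)
```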
Test Plan: Imported from OSS
Differential Revision: D21477955
Pulled By: pbelevich
fbshipit-source-id: 7b793d1761a7a921c4b4a4a7d21d5d6c48f03e72
Summary:
Edit: this has been updated to reflect the PR's current status, which has changed after review.
This PR updates the behavior of assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and with torch.isclose. It also corrects several bugs in the current implementations and adds extensive testing and comments.
These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework.
The detailed changelist is:
- New test framework functions for comparing tensors and scalars
- Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently (sketched after this list)
- Scalars are compared using the same algorithm
- assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior
- assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors
- Extensive testing of the comparison behavior and debug messages
- Small updates:
- assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests
- assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose
- assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose
- Bug fixes:
- the exact_dtype attribute has been removed (no longer needed after https://github.com/pytorch/pytorch/pull/38103)
- message arguments passed to assertEqual are now handled correctly
- bool x other dtype comparisons are now supported
- uint8 and int8 tensor comparisons now function properly
- rtol for integer comparisons is now supported (default is zero)
- rtol and atol for scalar comparisons are now supported
- complex scalar comparisons are now supported, analogous to complex tensor comparisons
- assertNotEqual is now equivalent to the logical negation of assertEqual
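A rough sketch of the complex-comparison idea from the list above (comparing real and imaginary parts independently); this illustrates the approach, it is not the framework code itself:
```python
import torch

a = torch.tensor([1.0 + 1e-9j])
b = torch.tensor([1.0 + 2e-9j])

# Compare the components separately with the usual tolerances.
real_close = torch.isclose(a.real, b.real, rtol=1.3e-6, atol=1e-5)
imag_close = torch.isclose(a.imag, b.imag, rtol=1.3e-6, atol=1e-5)
print(bool((real_close & imag_close).all()))  # True
```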
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294
Differential Revision: D21596830
Pulled By: mruberry
fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38505
This takes the testing of https://github.com/pytorch/pytorch/pull/38275, but doesn't include the kernel changes which are still being worked out.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D21580574
Pulled By: gchanan
fbshipit-source-id: f12317259cb7373989f6c9ad345b19aaac524851
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38400
* #38399 Added autograd tests, disabled JIT autograd tests for complex, and added a separate list of tests for the complex dtype only
Test Plan: Imported from OSS
Differential Revision: D21572209
Pulled By: anjali411
fbshipit-source-id: 7036029e9f8336139f5d54e0dfff9759f3bf8376
Summary:
Together with https://github.com/pytorch/pytorch/issues/37758, this fixes https://github.com/pytorch/pytorch/issues/37743 and fixes https://github.com/pytorch/pytorch/issues/24861.
This follows the CUDA fix in https://github.com/pytorch/pytorch/issues/37758, vectorised using a `blendv` to replace the if conditionals.
Most of the complication comes from `remainder` supporting `at::Half` where `fmod` doesn't. I've now got `fmod` working on `Vec256<at::Half>`, as well as enabled half dispatch for `fmod` so it matches `remainder`.
I also added `fmod` support to `Vec256<at::BFloat16>` before realising that `remainder` doesn't support `BFloat16` anyway. I could also enable `BFloat16` if that's desirable; if not, I don't think `Vec256<BFloat16>` should be missing `fmod` anyway.
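For context, a small sketch of the semantic difference between the two ops touched here: `remainder` follows the divisor's sign like Python's `%`, while `fmod` follows the dividend's sign like C's `fmod`:
```python
import torch

a = torch.tensor([-3.0, 3.0])
b = torch.tensor([2.0, -2.0])
print(torch.remainder(a, b))  # tensor([ 1., -1.])
print(torch.fmod(a, b))       # tensor([-1.,  1.])
```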
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38293
Differential Revision: D21539801
Pulled By: ezyang
fbshipit-source-id: abac6a3ed2076932adc459174cd3d8d510f3e1d5
Summary:
Closes https://github.com/pytorch/pytorch/issues/24561
Benchmark with the same build settings on the same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.exp(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.exp(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.exp(a) a.numel() == 10000 for 20000 times torch.half
0.3001665159999902
torch.exp(a) a.numel() == 10000 for 20000 times torch.float
0.28265794499998265
torch.exp(a) a.numel() == 10000 for 20000 times torch.double
0.3432170909998149
torch.exp(a) a.numel() == 100000 for 20000 times torch.half
0.32273333800003456
torch.exp(a) a.numel() == 100000 for 20000 times torch.float
0.31498759600003723
torch.exp(a) a.numel() == 100000 for 20000 times torch.double
1.079708754999956
```
After:
```
torch.exp(a) a.numel() == 10000 for 20000 times torch.half
0.27996097300092515
torch.exp(a) a.numel() == 10000 for 20000 times torch.float
0.2774473429999489
torch.exp(a) a.numel() == 10000 for 20000 times torch.double
0.33066844799941464
torch.exp(a) a.numel() == 100000 for 20000 times torch.half
0.27641824200145493
torch.exp(a) a.numel() == 100000 for 20000 times torch.float
0.27805968599932385
torch.exp(a) a.numel() == 100000 for 20000 times torch.double
1.0644143180015817
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36652
Differential Revision: D21164653
Pulled By: VitalyFedyunin
fbshipit-source-id: 42c7b24b0d85ff1d390231f1457968a8869b8db3
Summary:
Before, the multinomial kernels did not advance the random states enough, which led to the same sequence being generated over and over with a shift of 4. This PR fixes that.
Fixes https://github.com/pytorch/pytorch/issues/37403
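A rough sketch of the symptom described above (assumes a CUDA device and that the repetition shows up within a single draw, as in the linked issue):
```python
import torch

if torch.cuda.is_available():
    probs = torch.ones(1000, device='cuda')
    idx = torch.multinomial(probs, 64, replacement=True).view(-1, 4)
    # With the bug, consecutive length-4 blocks of indices coincided;
    # after the fix they should generally differ.
    print((idx == idx[0]).all(dim=1))
```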
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38046
Differential Revision: D21516542
Pulled By: ngimel
fbshipit-source-id: 23248a8c3a5c44316c4c35cd71a8c3b5f76c90f2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38018
When calling `eq_with_nan(v, kValue)` with `v` and `kValue` both NaN, it returns `false` when it should return `true`.
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/SortingKthValue.cu#L76
The implementation uses intrinsics such as `__double_as_longlong` and compares the bit representations, but the bits obtained for the two NaNs differ:
`9221120237041090560` for `v`
`9223372036854775807` for `kValue`
Two different NaNs can have different bit representations, so we have to do additional comparisons to fix this.
I changed this comparison and it seems to be working now.
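A minimal Python sketch of why the bitwise comparison misses this case, using `struct` to mimic `__double_as_longlong` (the bit patterns are the two values quoted above):
```python
import math
import struct

def double_from_bits(u):
    return struct.unpack('<d', struct.pack('<Q', u))[0]

def bits_from_double(x):
    # Rough analogue of __double_as_longlong on the CUDA side.
    return struct.unpack('<q', struct.pack('<d', x))[0]

v = double_from_bits(9221120237041090560)        # one NaN encoding
k_value = double_from_bits(9223372036854775807)  # a different NaN encoding

print(math.isnan(v), math.isnan(k_value))                # True True
print(bits_from_double(v) == bits_from_double(k_value))  # False: bitwise equality fails
print(math.isnan(v) and math.isnan(k_value))             # True: an explicit NaN check works
```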
However, when compared to the CPU implementation, the returned indices for the values seem arbitrary but valid.
This is probably an effect of the comparison order in the CUDA version.
I am not sure if this is OK, since all the indices point to valid elements.
For the snippet in the issue I get the following:
```
# CUDA Values
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
device='cuda:0', dtype=torch.float64)
# CUDA indices
tensor([304, 400, 400, 528, 304, 304, 528, 336, 304, 432, 400, 280, 280, 336,
304, 336, 400, 304, 336, 560], device='cuda:0')
```
```
# CPU values
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
dtype=torch.float64)
# CPU indices
tensor([515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515,
515, 515, 515, 515, 515, 515])
```
Also, maybe it's better to change the `eq_with_nan` implementations to address this instead?
I am not sure if that would break code in other places, though ...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38216
Differential Revision: D21517617
Pulled By: ngimel
fbshipit-source-id: deeb7bb0ac519a03aa0c5f365005a9150e6404e6
Summary:
Reland of https://github.com/pytorch/pytorch/issues/38140. It got reverted since it broke slow tests, which are only run on the master branch (thanks mruberry!). Enabling all CI tests in this PR to make sure they pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38288
Reviewed By: mruberry
Differential Revision: D21524923
Pulled By: ailzhang
fbshipit-source-id: 3a9ecc7461781066499c677249112434b08d2783
Summary:
I'm mostly done with cleaning up the test/ folder. There are a bunch of remaining callsites, but they're "valid" in that they test `type()` functionality. We cannot remove them until it's fully deprecated.
The next PR will mainly focus on moving some callsites to an internal API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38140
Differential Revision: D21483808
Pulled By: ailzhang
fbshipit-source-id: 12f5de6151bae59374cfa0372e827651de7e1c0f
Summary:
`is_tensor` doesn't really have a reason to exist anymore (other than
backwards compatibility) and is worse for typechecking with mypy (see
gh-32824). Given that it may not be obvious what the fix is once mypy
gives an error, make the change in a number of places at once, and add
a note on this to the `is_tensor` docstring.
Recommending an isinstance check instead has been done for quite a
while, e.g. https://github.com/pytorch/pytorch/pull/7769#discussion_r190458971
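For reference, the recommended pattern versus the legacy one; the isinstance form is what mypy can use to narrow the type:
```python
import torch

x = torch.zeros(3)

# Legacy check (kept for backwards compatibility):
if torch.is_tensor(x):
    pass

# Preferred check; mypy narrows x to torch.Tensor inside this branch:
if isinstance(x, torch.Tensor):
    print(x.shape)
```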
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38062
Differential Revision: D21470963
Pulled By: ezyang
fbshipit-source-id: 98dd60d32ca0650abd2de21910b541d32b0eea41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38033
Pickles require class names to be actually accessible from the module
in question. _VariableFunction was not! This fixes it.
Fixes https://github.com/pytorch/pytorch/issues/37703
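A small sketch of the kind of round-trip this is meant to enable (assuming, as described above, that the class is now reachable from its module):
```python
import pickle
import torch

# Before the fix, pickling these bound functions failed because their class
# could not be looked up from the module it claimed to live in.
fn = pickle.loads(pickle.dumps(torch.tanh))
print(fn(torch.tensor([0.0, 1.0])))
```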
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21458068
Pulled By: ezyang
fbshipit-source-id: 2a5ac41f9d1972e300724981b9b4b84364ddc18c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37157 on my machine.
This was annoying to track down. The essence is that cublas expects column major inputs and Pytorch tensors are usually row major. Cublas lets you request that it act on transposed data, and the erroring `gemv` calls in https://github.com/pytorch/pytorch/issues/37157 make that request. The problem is, [cublasSgemv and cublasDgemv](https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemv) (called by [`gemv<float>`](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L318)) and `gemv<double>`) regard their `m, n` arguments values as _pre_-transpose sizes, while [cublasGemmEx](https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx) (called by `gemv<at::Half>`, see [here](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L342)) and [here](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L229))) regards its `m, k` argument values as _post_-transpose sizes. This is inconsistent. It turns out the `gemv<float>/<double>` calls are configured correctly and the `gemv<at::Half>` calls aren't.
Strikethrough text below is no longer accurate; ngimel suggested a better way to handle gemv->gemm forwarding. [Comments in code](https://github.com/pytorch/pytorch/pull/37569/files#diff-686aa86335f96b4ecb9b37f562feed12R323-R348) provide an up-to-date explanation.
Keeping the out-of-date strikethrough text because I don't have the heart to delete it all and because it captures an intermediate state of my brain that will help orient me if I ever have to fix this again.
~~To convince myself this PR keeps `at::cuda::blas::gemv`'s external API consistent across dtypes, I need to think through what happens when a pytorch tensor input of size `(a,b)` multiples a vector of size `(b,)` for 4 cases:~~
### ~~1. input is row-major (needs cublas internal transpose)~~
#### ~~1a. input is float or double~~
~~`gemv<float>/<double>` call `cublasS/Dgemv`, forwarding `trans`,** `m`, and `n` directly.~~
~~`cublasS/Dgemv` expects "a m × n matrix stored in column-major format" (so m is the input's fast dim). Input has size `(a, b)` in row-major format. We can reinterpret it as a column-major matrix with size `(b, a)` without any memory movement. So the gemv call should supply `m=b`, `n=a`. However, we're not trying to multiply a matrix `(b, a)` x a vector `(b,)`, we're trying to sum across `b` for matrix and vector. So we also request that cublas transpose the matrix internally by supplying `trans='t'` to `blas::gemv`, which becomes `trans=CUBLAS_OP_T` to the `cublasS/Dgemv`.~~
~~As long as the code calling `blas::gemv` thinks carefully and passes `trans='t'`, `m=b`, `n=a`, cublas carries out `(a, b) x (b,)` and all is well.~~
#### ~~1b. input is half or bfloat16~~
~~`blas::gemv<at::Half>` takes a different code path, calling `gemm<at::Half>` which calls `cublasGemmEx`. The job of this PR is to make sure the exterior `blas::gemv` caller's carefully thought-out argument choices (`trans='t'`, `m=b`, `n=a`) remain correct.~~
~~`cublasGemmEx` takes args `transa, transb, m, n, k, ....others we don't care about` and carries out~~
```
C = α op(A) op(B) + β C
where α and β are scalars, and A, B and C are matrices stored in column-major format with
dimensions op(A) m × k, op(B) k × n and C m × n. Also, for matrix A:
op(A) = A    if transa == CUBLAS_OP_N
        A^T  if transa == CUBLAS_OP_T ...
```
~~`gemv<at::Half>` hacks a gemv by calling gemm such that the raw gemm's `m` is the output dim, `k` is the summed dim, and `n=1`. Reasonable, as long as we get the values right, given that we also need to transpose the input.~~
~~To conform with cublas docs we interpret input as column-major with size `(b, a)`. As for the `<float>/<double>` gemv we want cublas to carry out input (interpreted as column major), internally transposed, times vector of size `(b,)`. In other words we want cublas to apply `op(A) x B`, where op is transpose and `A` is input interpreted as column major. Docs define `m` and `k` by saying `op(A)` has dims `m x k` **(`m` and `k` are _post_-`op` sizes)**. `A` was `(b, a)`, `op(A)` is `(a, b)`, so the correct thing is to supply `m=a`, `k=b` to the underlying gemm. **For the `<float>/<double>` gemv, we passed `m=b`, not `m=a`, to the raw `cublasS/Dgemv`.**~~
~~The exterior `blas::gemv` must have been called with `trans='t'`, `m=b`, `n=a` (as required by the `<float>/<double>` versions). So when gemv is about to call gemm, **we [swap](https://github.com/pytorch/pytorch/pull/37569/files#diff-686aa86335f96b4ecb9b37f562feed12R330) the local values of `m` and `n` so that `m=a`, `n=b`,** then put `m (=a)` in the gemm's `m` spot, 1 in the gemm's `n` spot, and `n (=b)` in the gemm's `k` spot. All is well (we made the right gemm call after ingesting the same arg values as `blas::gemv<float>/<double>`).~~
### ~~2. input is column-major (doesn't need cublas transpose)~~
#### ~~2a. input is float or double~~
~~input is `(a,b)`, already column-major with strides `(1,a)`. Code calling `blas::gemv` supplies `trans='n'` (which becomes `CUBLAS_OP_N`, no internal transpose), `m=a`, `n=b`.~~
#### ~~2b. input is half or bfloat16~~
~~`blas::gemv` should pass `transa='n'`, `m=a`, `n=1`, `k=b` to the underlying gemm. The exterior `blas::gemv` must have been called with `trans='t'`, `m=a`, `n=b` (as required by the `<float>/<double>` versions). So **in this case we _don't_ swap `blas::gemv`'s local values of `m` and `n`.** We directly put `m (=a)` in the gemm's `m` spot, 1 in the gemm's `n` spot, and `n (=b)` in the gemm's `k` spot. All is well (we made the right gemm call after ingesting the same arg values as `blas::gemv<float>/<double>`).~~
~~** `trans` is a string `t` or `n` in the `at::cuda::blas::gemv` API, which gets [converted](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L314)) to a corresponding cublas enum value `CUBLAS_OP_T` (do transpose internally) or `CUBLAS_OP_N` (don't transpose internally) just before the raw cublas call.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37569
Differential Revision: D21405955
Pulled By: ngimel
fbshipit-source-id: e831414bbf54860fb7a4dd8d5666ef8081acd3ee
Summary:
Closes https://github.com/pytorch/pytorch/issues/24558
Benchmark with the same build settings on the same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.erf(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.erf(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.erf(a) a.numel() == 10000 for 20000 times torch.half
0.29057903600187274
torch.erf(a) a.numel() == 10000 for 20000 times torch.float
0.2836507789979805
torch.erf(a) a.numel() == 10000 for 20000 times torch.double
0.44974555500084534
torch.erf(a) a.numel() == 100000 for 20000 times torch.half
0.31807255600142526
torch.erf(a) a.numel() == 100000 for 20000 times torch.float
0.3216503109979385
torch.erf(a) a.numel() == 100000 for 20000 times torch.double
2.0413486910001666
```
After:
```
torch.erf(a) a.numel() == 10000 for 20000 times torch.half
0.2867302739996376
torch.erf(a) a.numel() == 10000 for 20000 times torch.float
0.28851128199858067
torch.erf(a) a.numel() == 10000 for 20000 times torch.double
0.4592030350013374
torch.erf(a) a.numel() == 100000 for 20000 times torch.half
0.28704102400115517
torch.erf(a) a.numel() == 100000 for 20000 times torch.float
0.29036039400125446
torch.erf(a) a.numel() == 100000 for 20000 times torch.double
2.04035638699861
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36724
Differential Revision: D21164626
Pulled By: VitalyFedyunin
fbshipit-source-id: e6f3390b2bbb6e8d21e18ffe15f5d49a170fae83