Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27889
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
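A minimal sketch of how these rules play out, using `torch.empty_like` (which also accepts `memory_format`) purely as an illustration; the specific operator changed by this PR is not restated here:
```python
import torch

# Channels-last input: it is dense and non-overlapping, so its strides are
# preserved and the result stays channels-last.
x = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
y = torch.empty_like(x, memory_format=torch.preserve_format)
print(y.is_contiguous(memory_format=torch.channels_last))  # True

# Transposed input: still dense and non-overlapping, so rule (1) keeps the
# exact (non-contiguous) strides.
z = torch.randn(4, 5).t()
w = torch.empty_like(z, memory_format=torch.preserve_format)
print(w.stride() == z.stride())  # True
```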
Test Plan: Imported from OSS
Differential Revision: D17980307
Pulled By: VitalyFedyunin
fbshipit-source-id: f1766c2bcb015ef870bfb92c16b4cd363b3cbc14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27562
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17980313
Pulled By: VitalyFedyunin
fbshipit-source-id: 9ca8453dc1a554ceea93c6949e01263cc576384b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27561
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17980316
Pulled By: VitalyFedyunin
fbshipit-source-id: 2a1d47571268673de0c6f5ae1b6d4f9110962ab0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27270
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17980312
Pulled By: VitalyFedyunin
fbshipit-source-id: 5da9530f6b239306dbb66d1dfeefe88237f13bbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27262
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17980309
Pulled By: VitalyFedyunin
fbshipit-source-id: 1761a9939aa7c5ab23e927b897e25e225089a8e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27244
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D17980310
Pulled By: VitalyFedyunin
fbshipit-source-id: 00a39b40daa4b8ee63c32e60d920222f8be2d6a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28462
Unfold is implemented in TH (as _th_unfold) and uses the standard scalar checks. That means that even though torch.tensor(5).unfold(dim=0, size=1, step=1) should produce
torch.tensor([5]), it actually produces torch.tensor(5), because scalar_check infers that the result is a scalar.
We can fix this by simply turning off the scalar_check.
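A quick sanity check of the intended behavior (a sketch; arguments passed positionally as dimension, size, step):
```python
import torch

t = torch.tensor(5)      # 0-dimensional (scalar) tensor
u = t.unfold(0, 1, 1)    # dimension=0, size=1, step=1
print(u)                 # expected: tensor([5])
print(u.dim())           # expected: 1
```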
Test Plan: Imported from OSS
Differential Revision: D18074671
Pulled By: gchanan
fbshipit-source-id: 5db09d614692830d66d6e6d8aba799ebe8144cf5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28422
The TH implementation had two differences:
1) It explicitly checked for null storages; this isn't supported anymore so can be removed.
2) It collapsed all empty tensors to the same shape for the purpose of checking. This was added to keep backward compatibility when we introduced N-dimensional empty tensors,
but since we've had N-dimensional empty tensors for quite a long time now and the CUDA implementation didn't support this behavior, we should get rid of it.
Test Plan: Imported from OSS
Differential Revision: D18061916
Pulled By: gchanan
fbshipit-source-id: 1a54cf9ea4fcb35b358a9ab57f84eff059ff1e7b
Summary:
Changelog:
- Changes the behavior to return a zero tensor when eigenvectors=False, matching the behavior of torch.eig
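A hedged sketch of the resulting behavior (assuming, per the changelog, that the eigenvector output is now a zero tensor when eigenvectors=False):
```python
import torch

a = torch.randn(3, 3)
a = a + a.t()                              # make the input symmetric
e, v = torch.symeig(a, eigenvectors=False)
print(e.shape)                             # eigenvalues, shape (3,)
print(v)                                   # expected: a zero tensor, as with torch.eig
```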
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28338
Test Plan: - test_symeig has been modified appropriately for this change
Differential Revision: D18085280
Pulled By: ezyang
fbshipit-source-id: 43129a96dd01743997157974100e5a7270742b46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27228
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17980315
Pulled By: VitalyFedyunin
fbshipit-source-id: fd5615621bc4968aa4ef2a26430c492c552ed671
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27228
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17980128
Pulled By: VitalyFedyunin
fbshipit-source-id: b2646bab72c4475b7a82bb271d204a9d96d28bd4
Summary:
Adds the overridePrecision decorator, which allows device-generic tests to specify per-dtype precision overrides.
Precision is overridden on the test class instance itself, and so is thread-local (so that running multiple tests in parallel will not conflict). It can be accessed directly from a test with self.precision, as before.
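A hypothetical usage sketch (the decorator name is taken from the description above; the exact mapping form and surrounding test-class plumbing are assumptions):
```python
# Hypothetical sketch of a per-dtype precision override in a device-generic test.
class TestExample(TestCase):
    @overridePrecision({torch.half: 1e-2, torch.float: 1e-4})
    @dtypes(torch.half, torch.float, torch.double)
    def test_add_commutes(self, device, dtype):
        a = torch.randn(10, device=device, dtype=dtype)
        b = torch.randn(10, device=device, dtype=dtype)
        # self.precision reflects the override for the current dtype
        self.assertEqual(a + b, b + a)
```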
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28131
Differential Revision: D17969774
Pulled By: mruberry
fbshipit-source-id: c4e0b71afac6bdc7cbf4e799f3054922de764820
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27107
Adds a memory_format keyword argument (positional in C++).
'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
Test Plan: Imported from OSS
Differential Revision: D17931062
Pulled By: VitalyFedyunin
fbshipit-source-id: 2c5dd3dd05bf58a9a29f25562cd45190b009c3f9
Summary:
f362a5a04b reverted
5ca612b55e due to build-time concerns (also
see https://github.com/pytorch/pytorch/issues/25254). Now we come back to this by reusing the underlying code of the
comparison operators: logical operators on non-bool variables are
essentially comparison operators that semantically output bool
values. Compared with the previous implementation, we compromise by
always applying XOR on the same input type, while the output can be either
the input type or the bool type.
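A brief example of the resulting semantics (a sketch; values chosen for illustration):
```python
import torch

a = torch.tensor([0, 1, 0], dtype=torch.int32)
b = torch.tensor([0, 0, 3], dtype=torch.int32)

# Default output is bool:
print(torch.logical_xor(a, b))             # tensor([False,  True,  True])

# Output in the input dtype via an explicit out= tensor:
out = torch.empty(3, dtype=torch.int32)
torch.logical_xor(a, b, out=out)
print(out)                                 # tensor([0, 1, 1], dtype=torch.int32)
```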
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27248
Differential Revision: D17929356
Pulled By: ezyang
fbshipit-source-id: dbac08c7614b36f05d24c69104fee9df9ca523d5
Summary:
Per title. Also testing putting test_advancedindex back on the default stream.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27688
Differential Revision: D17888351
Pulled By: mruberry
fbshipit-source-id: af8adeca89f575fc276921b39049b07135ed9776
Summary:
Per title. Also makes a few test_torch tests generic.
This PR removes ~half the floating_dtype decorators. Follow-up will remove the rest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27599
Differential Revision: D17840056
Pulled By: mruberry
fbshipit-source-id: 428bb5498c452083e3608325e0b548b1d75baf2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173
`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.
`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.
Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.
Test Plan: - built and reviewed locally with `cd docs/ && make html`.
Differential Revision: D17763046
Pulled By: zou3519
fbshipit-source-id: c7872184fc4b189d405b18dad77cad6899ae1522
Summary:
This PR stops common_utils.py from setting the default tensor type when it's imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers.
Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:
- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py
This is still a significant improvement over today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard for updating those tests, too. In particular, as tests are made generic they can be moved away from relying on this global setting.
Notable technical changes in this PR are:
- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils, a couple versions of this operator were defined in test files previously.
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
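A minimal sketch of the kind of explicit, test-local default-dtype handling this change moves toward (hedged; the actual default_floating_dtype decorator lives in common_utils):
```python
import torch

# Explicitly set the default floating dtype around a block of test code and
# restore it afterwards, instead of relying on a global import-time side effect.
saved = torch.get_default_dtype()
torch.set_default_dtype(torch.double)
try:
    assert torch.empty(3).dtype is torch.double
finally:
    torch.set_default_dtype(saved)
```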
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444
Differential Revision: D17795235
Pulled By: mruberry
fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
Summary:
- The tensor op tests generated in test_cuda.py are now generic and appear in test_torch.py
- Data previously held in auxiliary data structures and files, like test_cuda_ignores.txt, is inlined
Previously the tensor op tests used several auxiliary data structures, a file, and exception handling to filter the test suite. If a function wasn't implemented, for example, that exception would be caught. This let functions like trigamma, which isn't callable, appear to be tested. See https://github.com/pytorch/pytorch/issues/27230. Filtering from additional data stores is error prone, too. It requires developers understand what data stores are used and how they're used. The existing sources are also sometimes incorrect. The txt file claims that dist_ doesn't work on half tensors, for example, but the updated tests verify it does.
In addition to making these tests generic, this PR removes those auxiliary data structures and does not catch any exceptions. Exceptions are errors. (This also means that if something implemented breaks it will now report as an error. Previously the test suite would have reported a pass.) The test infrastructure was also simplified to not perform computations with CPU half tensors since they do not support many operations. This introduces a float<->half conversion quirk but eliminates awkward functions that would first convert cpu tensors to float, perform an operation, and convert them back.
With this change test_cuda.py is almost entirely CUDA-specific.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27210
Differential Revision: D17757907
Pulled By: mruberry
fbshipit-source-id: b3c191c379667b1a7d5361087bdf82f397f77f65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27106
Adds a memory_format option to the `clone` operator.
Introduces new `clone` behavior when called as `input_t.clone(memory_format=torch.preserve_format)`:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not apply and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) In all other cases, the output tensor will be contiguous.
---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own distinct memory location.
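A short illustration of the preserve-format behavior on `clone` (a sketch; exact stride values depend on the input):
```python
import torch

x = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)
print(y.is_contiguous(memory_format=torch.channels_last))  # True: layout preserved

z = torch.randn(4, 5).t()   # dense, non-overlapping, non-contiguous
print(z.clone(memory_format=torch.preserve_format).stride() == z.stride())  # True
```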
Test Plan: Imported from OSS
Differential Revision: D17699357
Pulled By: VitalyFedyunin
fbshipit-source-id: 5ae1537c2aca1abf0bf1eec4416846129c156f66
Summary:
- Makes more of test_cuda generic, including some serialization tests
- Updates some tests in test_torch to use latest extensibility points and patterns
Most remaining tests in test_cuda.py are either generated (to be moved in a follow-up PR) or deal with CUDA-specific features like streams, events, and querying CUDA devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27135
Differential Revision: D17696478
Pulled By: mruberry
fbshipit-source-id: 51ae424c8a72e725556a2f2bc92ad9a87244b3c0
Summary:
- Lets device generic classes be instantiated for all available device types EXCEPT those specified
- Creates TestDevicePrecision in test_torch.py, letting devices compare their results to the CPU's
- Moves 4 functions from test_cuda.py to TestDevicePrecision
- polygamma and digamma functions were cleaned up
The polygamma and digamma tests always ran with double tensors and will fail when using float tensors, despite former comments and code to the contrary. Notes were added to each function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26762
Differential Revision: D17677859
Pulled By: mruberry
fbshipit-source-id: 7cbe7d05ee0bc9b622c9127be36ced02f9c4506a
Summary:
- Adds slowTest to four tests
On my devfair, running test_torch.py takes ~200 seconds with slow tests enabled. Running with the current slowTest annotations takes ~145s. Running with these four additional annotations takes ~64s.
test_sum_dim, for example, takes 30s but was not marked as slow.
test_det_logdet_slogdet takes 17s on CPU and 22s on CUDA for a total of 39s!
test_einsum takes 7s.
test_triu_tril takes 5 seconds on CPU and 9s on CUDA for a total of 14s.
Several of the current slowTests are faster than this. test_cholesky_solve_batched_many_batches, for example, takes ~3 seconds on CPU and ~4.5s on CUDA, for a total of ~7.5s across both devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26789
Differential Revision: D17574282
Pulled By: mruberry
fbshipit-source-id: 3e5e505244c09b0ae23bd8c0145828119326719b
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/26038
Somewhere between v1.1 and master, `nonzero` became `abstract` and was marked as differentiable by mistake; we need to put it into the TH section of `tools/autograd/derivatives.yaml` to fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26980
Differential Revision: D17632276
Pulled By: VitalyFedyunin
fbshipit-source-id: d6cabcc53348af6148cea5a1bd1af2ef12547373
Summary:
https://github.com/pytorch/pytorch/issues/24593 and https://github.com/pytorch/pytorch/issues/24727
**torch.lt(Tensor a, Tensor b)**
will compute the common (highest) dtype based on the inputs and then compare values. The result will be a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes.
**torch.lt(Tensor a, Tensor b, out=c)**
will compute the common (highest) dtype based on the inputs and then compare values. The result can be written only to a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> z = torch.empty([1], dtype=torch.bool)
>>> torch.lt(x, y, out=z)
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes. Also, previously the result dtype could be Bool or Byte (deprecated); now only a Bool result is accepted.
**a.lt_(Tensor b)**
Expects that a and b have the same dtype; otherwise it's possible to get an overflow (example: if 'a' is uint8 and 'b' is float32, 'a' would be promoted to float32, the result would also be float32, and casting it back to uint8 could overflow). Does not compute a common dtype. The result will have the dtype of a.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Works similarly to the previous implementation.
**torch.lt(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a, then compute the common dtype and compare.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> x < 0.5
tensor([True])
>>> x = torch.tensor([0], dtype=torch.int)
>>> x < 0.5
tensor([True])
```
Fix https://github.com/pytorch/pytorch/issues/22301.
**torch.lt(Tensor a, Scalar b, out=c)**
will check that there is no overflow when converting b to the same type as a, then compute the common dtype and compare. The result can be written only to a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> torch.lt(x, 0.5, out=z)
tensor([True])
```
Previously the result dtype could be Bool or Byte (deprecated); now only a Bool result is accepted. The rest works similarly to the previous implementation.
**torch.lt_(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a, then compute the common dtype and compare. The result will have the dtype of a.
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1)
tensor([1], dtype=torch.int32)
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1.0)
tensor([1], dtype=torch.int32)
```
Works similarly to the previous implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25998
Differential Revision: D17431853
Pulled By: ifedan
fbshipit-source-id: b5effc6a5d9b32da379395b32abc628b604faaf7
Summary:
Currently when a Vec256<T> (base) object contains -0.0, Vec256<T>::abs()
would not produce 0.0, but -0.0 instead. This commit fixes this issue.
This bug will mostly affect CPUs without AVX support, such as ARM,
PowerPC, and older Intel models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26422
Differential Revision: D17607346
fbshipit-source-id: e8d4595f0e88ad93018a61f89b9e3dcada485358
Summary:
The current Bernoulli distribution sampler is slightly off in that it returns true slightly too often. This is most obvious at very low p values, like p = 0, although it theoretically occurs at every probability. See https://github.com/pytorch/pytorch/issues/26807.
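A quick sanity check of the edge case described above (hedged; the affected code path may be device-specific):
```python
import torch

# With p = 0 the sampler should never return 1; before the fix it occasionally did.
p = torch.zeros(1_000_000)
print(torch.bernoulli(p).sum().item())   # expected: 0.0
```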
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26864
Differential Revision: D17610459
Pulled By: ezyang
fbshipit-source-id: 28215ff820a6046822513f284793e7b850d38438
Summary:
Change the default encoding used by torch.load to 'utf-8'.
This commit provides changes for cases where user tries to torch.load
a pickled module with non-ASCII characters in the docstring as
discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii'
to 'utf-8'. Documentation for `torch.load` was updated and two tests
(loading py2 unicode module with unicode in it; error throwing when
user explicitly sets wrong encoding) were written.
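A short usage sketch (the file name is hypothetical; `encoding` is forwarded to the underlying unpickler):
```python
import torch

# 'utf-8' is now the default, so the explicit argument is only needed to
# override it (e.g. back to 'ascii' or to 'latin1').
state = torch.load('legacy_py2_checkpoint.pt', encoding='utf-8')  # hypothetical file
```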
~~This commit provides changes for better error handling in cases
where user tries to `torch.load` a pickled module with non-ASCII
characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~
Ping ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421
Differential Revision: D17581633
Pulled By: yf225
fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620
Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable
TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality.
Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26594
Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.
Differential Revision: D17568910
Pulled By: mruberry
fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128
Summary:
Currently, integer scalar exponents are always cast to double. This commit avoids the cast if the tensor is also
integral and the scalar is positive, to speed things up.
Benchmark (Debian Buster, g++ 8, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, Debug build, Turbo turned off):
```python
import timeit
for n, t in [(1000, 13000),
             (10_000, 1300)]:
    for e in (2, 3, 4):
        for dtype in ('torch.int16', 'torch.int32', 'torch.int64'):
            print(f'a.pow({e}) (a.numel() == {n}) for {t} times')
            print(f'dtype {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a.pow({e})',
                                setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
                                number=t))
```
Before:
```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.6958350749996498
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 0.7989626339999631
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 0.7973162800003593
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.8660746679997828
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 0.8101709959996697
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 0.8135280149999744
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 5.010833072999958
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 4.801007671999741
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 3.963344578000033
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 1.6216251330001796
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 0.5672429639998882
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.5544572270000572
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 1.656308512999658
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 1.502670819999821
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.5757876879997639
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 4.775718216999849
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 4.754745475000163
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 3.737249878000057
```
After:
```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.1006453190002503
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 1.0849009019998448
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 1.093259106000005
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.0859826279997833
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 1.1076840900000207
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 1.0755480369998622
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times 1.918211066999902
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times 1.9183043200000611
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times 1.930021430999659
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 0.7271483560002707
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 0.7289002070001516
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.7267536800000016
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 0.7301799359997858
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 0.7289195180001116
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 0.7270008230002531
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times 1.5354506029998447
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times 1.528263066999898
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times 1.5369428439998956
```
---
Best viewed with whitespace changes turned off
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26020
Differential Revision: D17485400
Pulled By: VitalyFedyunin
fbshipit-source-id: 3a16b074825a5aab0f7e7af3d8100f9e4b7011a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620
This change updates torch.backend.quantized.engine to accept a string ("fbgemm"/"qnnpack"/"none" for now).
set_qengine and get_qengine return an int which represents the at::QEngine enum.
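A brief usage sketch, assuming the attribute is exposed under torch.backends (plural) as `torch.backends.quantized.engine`:
```python
import torch

# Inspect the engines available in this build, then select one by name.
print(torch.backends.quantized.supported_engines)   # e.g. ['none', 'fbgemm']
torch.backends.quantized.engine = 'fbgemm'
print(torch.backends.quantized.engine)              # 'fbgemm'
```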
Test Plan:
python test/test_torch.py
Imported from OSS
Differential Revision: D17533582
fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26548
This makes the naming more consistent with PyTorch's API. The original
concern was that `tensor.rename` might make the operation seem like it
is in-place. However, we have many "verb" APIs: `tensor.add(other)`, for
example, doesn't add other to tensor in-place, but `tensor.add_(other)`
does.
`tensor.rename_` does exactly the same thing as `tensor.rename`, but
in-place.
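A small example of the distinction (a sketch using the keyword-rename form):
```python
import torch

x = torch.zeros(2, 3, names=('N', 'C'))

y = x.rename(C='channels')   # out-of-place: x is unchanged
print(x.names)               # ('N', 'C')
print(y.names)               # ('N', 'channels')

x.rename_(C='channels')      # in-place: x itself is renamed
print(x.names)               # ('N', 'channels')
```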
Test Plan: - [namedtensor ci]
Differential Revision: D17502021
Pulled By: zou3519
fbshipit-source-id: 6a5b93136a820075013cd1e30fb8fc6b9d77d7d9
Summary:
2 ^ 31 is 29, which is not a big number. Corrected to 2 ** 31.
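For reference, since `^` is bitwise XOR in Python:
```
>>> 2 ^ 31   # bitwise XOR
29
>>> 2 ** 31  # exponentiation
2147483648
```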
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26491
Differential Revision: D17494296
fbshipit-source-id: 83d320e8fb6d1b7df41e4474933a98107c8e4129
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26501
Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
XLA companion patch at https://github.com/pytorch/xla/pull/1031
Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'. I think this may be duplicated with some logic somewhere else but I have to double check.
The new generated code looks like this:
```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
    static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
    return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```
The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.
After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.
* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
Benchmark:
Apply the following patch to the base commit and this commit:
```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+ return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
dispatch:
CPU: im2col_backward_cpu
CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+ variants: function
+ dispatch:
+ CPU: _const5
```
Comparisons with timeit:
One-argument, representative case:
Before:
```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):
Before:
```
In [1]: import torch
In [2]: x = torch.zeros(1)
In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D17499154
Pulled By: ezyang
fbshipit-source-id: 8ea237c2e935134b0f4f8d6cfd89c6a93037c02c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26498
We should allocate an empty tensor as a result tensor when performing
binary ops. Currently some ops use `empty_like(self)` as the initial
result tensor before passing it into TensorIterator. This is not very
efficient because TensorIterator may resize the tensor due to
broadcasting, causing more memory allocation. By using an empty tensor
as the result tensor, we only need to allocate/resize memory once as
opposed to twice.
Also fixes https://github.com/pytorch/pytorch/issues/26495. The bug
there is that the implementation of `pow` is missing a resize in one
case.
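A small user-visible sketch of the broadcasting/resize behavior this relies on (hedged; the `pow` fix itself concerns one missing internal resize):
```python
import torch

a = torch.rand(3, 1)
b = torch.rand(1, 4)
out = torch.empty(0)          # TensorIterator resizes this to the broadcast shape
torch.pow(a, b, out=out)
print(out.shape)              # torch.Size([3, 4])
```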
Test Plan:
- new test
- run tests
Differential Revision: D17500025
Pulled By: zou3519
fbshipit-source-id: bff4949af5e75541c04669b961bcf2e1ec456faf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26468
Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
XLA companion patch at https://github.com/pytorch/xla/pull/1031
Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'. I think this may be duplicated with some logic somewhere else but I have to double check.
The new generated code looks like this:
```
inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const {
    static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)");
    return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking);
}
```
The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together.
After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.
* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
Benchmark:
Apply the following patch to the base commit and this commit:
```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+ return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
dispatch:
CPU: im2col_backward_cpu
CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+ variants: function
+ dispatch:
+ CPU: _const5
```
Comparisons with timeit:
One-argument, representative case:
Before:
```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):
Before:
```
In [1]: import torch
In [2]: x = torch.zeros(1)
In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bddppq
Differential Revision: D17481256
Pulled By: ezyang
fbshipit-source-id: b3206936b4ca8938d45ea90fd71422e0d80b5f96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25653
Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id.
Billing of changes:
* ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there *should* have been something registered at some key, but there wasn't.)
* Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments.
* The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into '*this'. I think this may be duplicated with some logic somewhere else but I have to double check.
After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse.
* Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++.
* One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it.
* A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message)
* `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch.
* `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity.
* c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely.
Benchmark:
Apply the following patch to the base commit and this commit:
```
diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp
new file mode 100644
index 0000000000..b66f4d3ece
--- /dev/null
+++ b/aten/src/ATen/native/Const.cpp
@@ -0,0 +1,10 @@
+#include <ATen/ATen.h>
+
+namespace at {
+namespace native {
+
+Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) {
+ return self;
+}
+
+}} // namespace at::native
diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml
index b494ed7950..fddae638bb 100644
--- a/aten/src/ATen/native/native_functions.yaml
+++ b/aten/src/ATen/native/native_functions.yaml
@@ -5878,3 +5878,9 @@
dispatch:
CPU: im2col_backward_cpu
CUDA: im2col_backward_cuda
+
+# For benchmarking
+- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor
+ variants: function
+ dispatch:
+ CPU: _const5
```
Comparisons with timeit:
One-argument, representative case:
Before:
```
In [6]: %timeit x.reshape(1, 1)
1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [7]: %timeit x.reshape(1, 1)
1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: %timeit x.reshape(1, 1)
1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit x.reshape(1, 1)
1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit x.reshape(1, 1)
1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit x.reshape(1, 1)
1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments):
Before:
```
In [1]: import torch
In [2]: x = torch.zeros(1)
In [3]: %timeit torch._const5(x, x, x, x, x)
949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
After:
```
In [3]: %timeit torch._const5(x, x, x, x, x)
985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit torch._const5(x, x, x, x, x)
984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit torch._const5(x, x, x, x, x)
988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D17265918
Pulled By: ezyang
fbshipit-source-id: 221efe4e86a40f36abc81e2ebceaa7e251c90b3d
Summary:
- Moves all ROCm-requiring test_torch tests to TestTorchDeviceType
- Moves test_stft and test_lu from test_cuda
- Moves many CUDA-only test_torch tests to TestTorchDeviceType
- Combines several test_torch CPU tests with their CUDA variants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26435
Differential Revision: D17470469
Pulled By: mruberry
fbshipit-source-id: 90bb7fc09465c53eb2ab8da52eb2c2509775c16f
Summary:
- Adds dtypes, dtypesIfCPU, and dtypesIfCUDA decorators.
- Eliminates the need for nontest members to be defined in an inherited base.
- Updates one test to use the decorators and updates TestTorchDeviceType with helpers.
This PR appears to be hanging the ROCm build, which is not entirely surprising. See https://github.com/pytorch/pytorch/issues/26394, which demonstrates that the ROCm build can be hung by commenting out a Python test that was never run on ROCm.
gchanan - what type list, if any, do you want to expose? I imagine most test suites will define their own lists like today. SCALAR_TYPES, QUANTIZED_TYPES, and ALL_TYPES seem reasonable to me. DOCUMENTED_TENSOR_TYPES will be removed, of course.
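A hypothetical sketch of how the decorators listed above compose in a device-generic test (decorator names per the description; the test-class plumbing is assumed):
```python
# Hypothetical sketch; the decorators live in the device-generic test framework.
class TestExample(TestCase):
    @dtypes(torch.float, torch.double)       # default dtypes for every device type
    @dtypesIfCPU(torch.float)                # override when running on CPU
    @dtypesIfCUDA(torch.half, torch.float)   # override when running on CUDA
    def test_add(self, device, dtype):
        a = torch.ones(4, device=device, dtype=dtype)
        self.assertEqual((a + a).sum().item(), 8)
```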
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26375
Test Plan: Edit is to tests themselves.
Differential Revision: D17462294
Pulled By: mruberry
fbshipit-source-id: f8259ec66709749b1bf8077efc737676af901436
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25658
This unflattens `dim` according to the shape specified in `namedshape`.
`namedshape` may be either an OrderedDict or an iterable of (name, size)
tuples.
Future:
- It is possible to make it take a dict in Python >= 3.6 because those are
ordered by default, but I'll leave that task for the future.
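A short example of the described API (a sketch; dimension names and sizes chosen for illustration):
```python
import torch

x = torch.randn(2, 6, names=('N', 'C'))
y = x.unflatten('C', (('C1', 2), ('C2', 3)))   # iterable of (name, size) tuples
print(y.names)    # ('N', 'C1', 'C2')
print(y.shape)    # torch.Size([2, 2, 3])
```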
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17192655
Pulled By: zou3519
fbshipit-source-id: fd9bd2f462c23a4df1c23d66f2aa95076ff1b160
Summary:
Changelog:
- Modify existing implementation of pinverse to support batching on inputs
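A minimal illustration of the batched behavior (a sketch; double precision used for a tight tolerance):
```python
import torch

A = torch.randn(3, 5, 4, dtype=torch.float64)   # a batch of three 5x4 matrices
P = torch.pinverse(A)
print(P.shape)                                   # torch.Size([3, 4, 5])
# Each batch element should satisfy A @ P @ A ~= A
print(torch.allclose(A @ P @ A, A, atol=1e-8))   # True
```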
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26095
Test Plan: - Added tests in test_pinverse to test batched implementation
Differential Revision: D17408092
Pulled By: soumith
fbshipit-source-id: bba95eb193ce33a94ecfaf74da270d34b435e4af
Summary:
- Adds new decorators for skipping on ROCm, skipping on MKL, running only on the CPU and running only on CUDA
- Makes decorator skip semantics consistent
- Adds CUDA default stream requirement to MAGMA decorator
- Creates TestAutogradDeviceType
Note this PR originally moved test_cdist, but moving it caused failures in CI. There may be an undiagnosed issue with cdist or the test. The issue does not reproduce locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26248
Test Plan: Change is to tests themselves.
Differential Revision: D17410386
Pulled By: mruberry
fbshipit-source-id: 8459df44f2a00f0e71680fbe713587a01d4b0300
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26252
Original commit changeset: 1375774f24c2
Testing to see if this is somehow the source of hangs on ROCm builds.
Test Plan: Change is to tests themselves. This diff is for testing the ROCm hang, however.
Differential Revision: D17390575
fbshipit-source-id: a6ffd5eb1df3971b99b6d42271a8d3d501ac79c6
Summary:
- Adds SkipCUDAIfRocm and skipCPUIfNoMkl decorators, ports corresponding tests
- Changes "SkipIf" input semantics for consistency
- Removes torchtest, which has been replaced with this new generic framework
- Refactors some common parts out of CUDA tests to TestTorchDeviceType
- Ensures all MAGMA tests run on default stream by putting the skipCUDANonDefaultStreamIf in the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26244
Differential Revision: D17389060
Pulled By: mruberry
fbshipit-source-id: 1375774f24c2266049e6d4b899e7300ddf32eac8
Summary:
This PR moves many tests in test_torch.py to the generic device type framework. This means that many CUDA tests now run in test_torch.py and there is greater consistency in how tests for many device types are written.
One change is that all MAGMA tests are run on the default stream due to intermittent instability running MAGMA on the non-default stream. This is a known issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26232
Test Plan:
While this PR edits the tests itself, it was validated using two independent methods:
(1) The code was reviewed and it was verified that all deleted functions were actually moved.
(2) The output of the TestTorch CI was reviewed and test outputs were matched before and after this PR.
Differential Revision: D17386370
Pulled By: mruberry
fbshipit-source-id: 843d14911bbd52e8aac6861c0d9bc3d0d9418219
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/24851 by...
1. lets device types easily register themselves for testing
2. lets tests be written to run on multiple devices and with multiple dtypes
3. provides a mechanism to instantiate those tests so they are discoverable and filterable by unittest and pytest
It refactors three tests from test_torch.py to demonstrate how to use it.
`test_diagonal` is the simplest example. Most tests just need to be modified to accept 'device' as an argument. The framework will then instantiate `test_diagonal_cpu` and `test_diagonal_cuda` (when CUDA is available) which call `test_diagonal` with the appropriate 'device' argument.
`test_neg` also has dtype variants. It accepts both 'device' and 'dtype' as arguments, and the dtypes it runs with are specified with the 'dtypes' decorator. Dtypes can be specified for all device types and particular device types. The framework instantiates tests like `test_neg_cpu_torch.float`.
`test_inverse` has device-specific dependencies. These dependencies are expressed with the sugary 'skipCUDAIfNoMagma' and 'skipCPUIfNoLapack' decorators. These decorators are device-specific, so CPU testing is not skipped if Magma is not installed, and their conditions may be checked before or after the test case has been initialized. This means that skipCUDAIfNoMagma does not initialize CUDA. In fact, CUDA is only initialized if a CUDA test is run.
These instantiated tests may be run as usual and with pytest filtering it's easy to run one test on all device types, run all the tests for a particular device type, or run a device type and dtype combination.
See the note "Generic Device-Type Testing" for more detail.
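For illustration, a minimal sketch of what a generic device-type test might look like under this framework (import paths and helper names are assumed from the description above and have moved around over time):
```python
import torch
from common_utils import TestCase, run_tests                           # assumed path
from common_device_type import instantiate_device_type_tests, dtypes   # assumed path

class TestTorchDeviceType(TestCase):
    # Instantiated as test_diagonal_cpu and, when CUDA is available, test_diagonal_cuda.
    def test_diagonal(self, device):
        x = torch.randn(5, 5, device=device)
        self.assertEqual(x.diagonal().shape, torch.Size([5]))

    # Instantiated per device and per dtype, e.g. test_neg_cpu_torch.float.
    @dtypes(torch.float, torch.double)
    def test_neg(self, device, dtype):
        x = torch.arange(-5, 5, device=device, dtype=dtype)
        self.assertEqual(x.neg(), -x)

# Generates the per-device (and per-dtype) tests so unittest/pytest can discover and filter them.
instantiate_device_type_tests(TestTorchDeviceType, globals())

if __name__ == '__main__':
    run_tests()
```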
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25967
Differential Revision: D17381987
Pulled By: mruberry
fbshipit-source-id: 4a639641130f0a59d22da0efe0951b24b5bc4bfb
Summary:
Because of 'return NotImplemented', `__contains__` returns True when the element is not a number, since bool(NotImplemented) == True.
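For illustration, a small self-contained sketch of the underlying pitfall (plain Python, not tied to the PyTorch internals):
```python
class Box:
    def __contains__(self, item):
        if isinstance(item, int):
            return item == 42
        # Bug: `in` coerces the return value to bool, and bool(NotImplemented) is True
        # (newer Python versions emit a DeprecationWarning for this).
        return NotImplemented

print("not a number" in Box())  # True, even though no meaningful check happened
```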
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24156
Differential Revision: D16829895
Pulled By: zou3519
fbshipit-source-id: 9d3d58025b2b78b33a26fdfcfa6029d0d049f11f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25843
`tensor.align_to(*names)` permutes the dimensions of `tensor` and adds
additional 1-sized dimensions such that the output tensor has dimensions
in the same order as `names`. All dimensions of `tensor` must be
present in `names`; in addition, this function requires that all dims of
`tensor` be named.
`tensor.align_as(other)` is equivalent to
`tensor.align_to(*other.names)`.
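For illustration, a minimal sketch of the two methods (experimental named tensor API assumed):
```python
import torch

x = torch.randn(2, 3, names=('C', 'N'))
y = x.align_to('N', 'C', 'H')    # permute to (N, C) and append a new size-1 'H' dim
print(y.names, y.shape)          # ('N', 'C', 'H') torch.Size([3, 2, 1])

template = torch.empty(0, 0, 0, names=('N', 'C', 'H'))
z = x.align_as(template)         # same as x.align_to(*template.names)
print(z.names, z.shape)          # ('N', 'C', 'H') torch.Size([3, 2, 1])
```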
I'm planning on changing `torch.align_tensors(*tensors)` to align closer
to these semantics because there didn't seem to be a clear use case for the old
semantics that preserve unnamed dimensions. That will come in a future
change.
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17255549
Pulled By: zou3519
fbshipit-source-id: 1e437ad81e9359b4d5bd0e7e64c3a1be441fc3e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25842
`tensor.refine_names(*names)` takes `tensor` and attempts to name its
dimensions `names` out-of-place. If a dimension `i` already had a name,
then it cannot be changed (so tensor.names[i] must equal names[i]);
if the original dimension did not have a name, then the new name
(names[i]) can be anything.
`tensor.refine_names(*names)` also accepts a glob '*' that greedily selects
names from `tensor`. Here are some examples:
- `Tensor[None].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('D') -> Error!`
- `Tensor[N].refine_names(None) -> Error!`
- `Tensor[None, None].refine_names('*', 'D') -> Tensor[None, D]`
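For illustration, a minimal runnable sketch of the basic (non-glob) behavior:
```python
import torch

x = torch.randn(3, 4)             # both dims unnamed
y = x.refine_names('N', 'C')      # unnamed dims may take any new name
z = y.refine_names('N', 'C')      # already-named dims must keep the same name
print(z.names)                    # ('N', 'C')
```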
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17255548
Pulled By: zou3519
fbshipit-source-id: fdbdb3a12f24fbe37ce1e53ed09dc8a42589d928
Summary:
Changelog:
- De-duplicate the code in tests for torch.solve, torch.cholesky_solve, torch.triangular_solve
- Skip tests explicitly if requirements aren't met, e.g., if NumPy / SciPy aren't available in the environment
- Add generic helpers for these tests in test/common_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25733
Test Plan:
- All tests should pass to confirm that the change is not erroneous
Clears one point specified in the discussion in https://github.com/pytorch/pytorch/issues/24333.
Differential Revision: D17315330
Pulled By: zou3519
fbshipit-source-id: c72a793e89af7e2cdb163521816d56747fd70a0e
Summary:
Using the double-precision erfinv() best preserves accuracy for double inputs, while erfinvf() should be used for half and float.
This is also consistent with the implementation before the migration: https://github.com/pytorch/pytorch/issues/24943
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25337
Differential Revision: D17102333
Pulled By: zou3519
fbshipit-source-id: 5178cff534cf5f10d86ab04d4b6c1779ffedf49e
Summary:
Currently we have different checks for multinomial method on CPU and CUDA. This PR will make them consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25595
Differential Revision: D17236163
Pulled By: ifedan
fbshipit-source-id: 7718173bdaf216e8eb636c2a5b9c5939b975325b
Summary:
Changelog:
- Simplify the generation of singular matrices by constructing a constant matrix instead of building a random singular matrix with random_square_matrix_of_rank, which is susceptible to numerical issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25773
Test Plan:
- test_det_logdet_slogdet_batched should pass
Fixes https://github.com/pytorch/pytorch/issues/25172
cc: branfosj hartb
Apologies for the delay.
Differential Revision: D17261059
Pulled By: soumith
fbshipit-source-id: 8f991e2cb8c0e9dccad363d4785075213088e58a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25711
This function renames the dimensions of a tensor out-of-place. Because
of that, I think `tensor.renamed(...)` is a clearer name: `view_names`
has the connotation that we can use names to `view` our tensors with a
"different shape", but what this function really does is let us rename a
tensor no matter the previous names.
`tensor.names_`, the in-place version of this, is unchanged for now.
However, we might delete this or not advertise it if it has no use case
and also because its naming is a little inconsistent with `tensor.renamed`.
Test Plan: - [namedtensor ci]
Differential Revision: D17206515
Pulled By: zou3519
fbshipit-source-id: 67053951fcc8130c84566b5ebbdce35ef619c90d
Summary:
Improve handling of mixed-type tensor operations.
This PR affects the arithmetic (add, sub, mul, and div) operators implemented via TensorIterator (so dense but not sparse tensor ops).
For these operators, we will now promote to reasonable types where possible, following the rules defined in https://github.com/pytorch/pytorch/issues/9515, and error in cases where the cast would require floating point -> integral or non-boolean to boolean downcasts.
The details of the promotion rules are described here:
https://github.com/nairbv/pytorch/blob/promote_types_strict/docs/source/tensor_attributes.rst
Some specific backwards incompatible examples:
* now `int_tensor * float` will result in a float tensor, whereas previously the floating point operand was first cast to an int. Previously `torch.tensor(10) * 1.9` => `tensor(10)` because the 1.9 was downcast to `1`. Now the result will be the more intuitive `tensor(19.)`
* Now `int_tensor *= float` will error, since the floating point result of this operation can't be cast into the in-place integral type result.
See more examples/detail in the original issue (https://github.com/pytorch/pytorch/issues/9515), in the above linked tensor_attributes.rst doc, or in the test_type_promotion.py tests added in this PR:
https://github.com/nairbv/pytorch/blob/promote_types_strict/test/test_type_promotion.py
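For illustration, a small sketch of the new behavior described above (exact error text is an assumption and may differ between releases):
```python
import torch

int_tensor = torch.tensor(10)      # int64 tensor
print((int_tensor * 1.9).dtype)    # a floating point dtype now, instead of int64
print(int_tensor * 1.9)            # tensor(19.) rather than the old tensor(10)

try:
    int_tensor *= 1.9              # in-place: the float result can't be cast back to int64
except RuntimeError as err:
    print("refused:", err)
```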
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22273
Reviewed By: gchanan
Differential Revision: D16582230
Pulled By: nairbv
fbshipit-source-id: 4029cca891908cdbf4253e4513c617bba7306cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602
Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.
Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.
Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.
Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.
Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864
Reviewed By: xw285cornell
Differential Revision: D16940768
Pulled By: bddppq
fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
Summary:
Changelog:
- Iterate over mini batches of 262140 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24438
Test Plan:
- Added slow tests to test the behavior in test_torch and test_cuda
Fixes https://github.com/pytorch/pytorch/issues/24403
Differential Revision: D17175603
Pulled By: soumith
fbshipit-source-id: 1abb0a1e92494cf43ef4ba9efb54a919cd18bfef
Summary:
Changelog:
- Enable broadcasting of the RHS and LHS tensors for lu_solve. This means that you can now have, for instance, an RHS of size `3 x 2` and an LHS of size `4 x 3 x 3` (see the sketch after this list)
- Remove deprecated behavior of having 2D tensors for RHS. Now all tensors have to have a last dimension which equals the number of right hand sides
- Modified docs
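For illustration, a rough sketch of the broadcasting described in the first item, assuming the torch.lu / torch.lu_solve API available at the time (newer releases expose the same functionality through torch.linalg.lu_factor / torch.linalg.lu_solve):
```python
import torch

A = torch.randn(4, 3, 3)            # batch of four 3 x 3 LHS matrices
b = torch.randn(3, 2)               # a single RHS holding two right hand sides
LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)   # b is broadcast against the batch of factorizations
print(x.shape)                      # torch.Size([4, 3, 2])
```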
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24333
Test Plan: - Add tests for new behavior in test_torch.py with a port to test_cuda.py
Differential Revision: D17165463
Pulled By: zou3519
fbshipit-source-id: cda5d5496ddb29ed0182bab250b5d90f8f454aa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25405
This PR adds schemas to native_functions.yaml, core/Tensor.h, and
core/TensorMethods.h for Dimname/DimnameList overloads for the following
functions:
- min, max, max_values, min_values
- mean, median
- logsumexp, std, var, norm
The actual implementations will come in a later PR. I am accumulating
all the additional schemas and changes to core/{Tensor|TensorMethods}.h
in this PR so that there is only one point of failure for potential
merge conflicts.
Test Plan: - Check that all pytorch builds still build. [namedtensor ci]
Differential Revision: D17116333
Pulled By: zou3519
fbshipit-source-id: fd666d60109a311767169261afbec0fd85cc00c8
Summary:
Fixing https://github.com/pytorch/pytorch/issues/24750
```
DEBUG = 0
OMP_NUM_THREADS = 1
import torch
base = torch.randn(1000000)
exp = torch.randn(1000000)
out = torch.empty_like(base)
timeit base.pow(0) +30x
old 6.26 ms ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 213 µs ± 3.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(1/3) +6x
old 56 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.41 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(-1/3) +6x
old 57 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.49 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(1/2) +6x
old 4.04 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 620 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-1/2) +5x
old 6.56 ms ± 43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 1.24 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(1) no diff
old 322 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 331 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-1) +3.5x
old 2.48 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 717 µs ± 130 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(2) no diff
old 328 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 324 µs ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-2) +3.5x
old 2.45 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 662 µs ± 3.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(3) +7x
old 2.39 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 334 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-3) +9x
old 93.7 ms ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.3 ms ± 666 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(123456.789) +5x
old 46.5 ms ± 418 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.68 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(-123456.789) +5x
old 46.5 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(exp) +6x
old 60.6 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(0, exp) no diff
old 18.3 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 21.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit torch.pow(1, exp) +30x
old 6.01 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit torch.pow(-1, exp) +3x
old 30.8 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.67 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(42, exp) +8x
old 80.1 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.51 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(-42, exp) +2x
old 21.8 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.5 ms ± 89.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(0, exp, out=out) no diff
old 20.2 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 22.1 ms ± 648 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit torch.pow(1, exp, out=out) +30x
old 6.7 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit torch.pow(-1, exp, out=out) +3x
old 32.5 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.4 ms ± 99.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(42, exp, out=out) +10x
old 91 ms ± 7.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.64 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(-42, exp, out=out) +2.5x
old 25.9 ms ± 5.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.1 ms ± 698 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
BC: enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.
BC: enforce stronger integer tensor base power integer exponent requirement on CPU and CUDA: `Integers to negative integer powers are not allowed.`
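For illustration, a small sketch of the second BC note (the exact error text is assumed):
```python
import torch

t = torch.tensor([2, 3])   # integer tensor
print(t.pow(2))            # tensor([4, 9])

try:
    t.pow(-1)              # integer base, negative integer exponent
except RuntimeError as err:
    print("refused:", err)  # Integers to negative integer powers are not allowed.
```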
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23492
Differential Revision: D16731583
Pulled By: pbelevich
fbshipit-source-id: 4e5bf689357fe82a19371e42d48abbb7b4c1c3ca
Summary:
Adds documentation for `nn.functional.bilinear`, as requested in https://github.com/pytorch/pytorch/issues/9886.
The format follows that of `nn.functional.linear`, and borrows from `nn.bilinear` in its description of `Tensor` shapes.
I am happy to add more extensive documentation (e.g. "Args," "Example(s)"). From what I gather, the format of comments is inconsistent across functions in `nn.functional.py` and between modules (e.g. `nn.functional` and `nn`). It's my first PR, so guidance for contributing documentation and other code would be greatly appreciated!
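For illustration, a minimal usage sketch of the documented function (shapes follow the description in the new docs):
```python
import torch
import torch.nn.functional as F

x1 = torch.randn(8, 5)                   # (N, in1_features)
x2 = torch.randn(8, 6)                   # (N, in2_features)
weight = torch.randn(3, 5, 6)            # (out_features, in1_features, in2_features)
bias = torch.randn(3)                    # (out_features,)
out = F.bilinear(x1, x2, weight, bias)   # out[n, k] = x1[n] @ weight[k] @ x2[n] + bias[k]
print(out.shape)                         # torch.Size([8, 3])
```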
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24951
Differential Revision: D17091261
Pulled By: soumith
fbshipit-source-id: efe2ad764700dfd6f30eedc03de4e1cd0d10ac72
Summary:
This PR, linked to https://github.com/pytorch/pytorch/issues/22806, moves the sign function to ATen.
sign(x) supports bool and uses vectorized operations on CPU.
sign(NaN) is defined to return 0.
sign(bool) is a no-op; the resulting tensor holds the same values as the input one.
- [x] CPU Backend
- [x] CUDA Backend
- [x] Bring support for bool dtype
- [x] Bring support for Half dtype
- [x] Add test for NaN
- [x] Add test for bool dtype
- [x] Delete legacy implementation in THTensorMoreMath.cpp
Performances:
```python
timeit -s 'import torch; x = torch.randn((1000, 1000))' -n 1000 'torch.sign(x)'
timeit -s 'import torch; x = torch.randn((1000, 1000), device="cuda")' -n 1000 'torch.sign(x); torch.cuda.synchronize()'
```
| device | before | after |
| :-------------: | :-------------: | :-----: |
| CPU | 1.24 msec | 33.9 usec |
| GPU | 680 usec | 7.13 usec |
| CPU (1 thread) | 0.82 msec | 0.73 msec |
| GPU (1 thread) | 16.1 usec | 15.9 usec |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22861
Differential Revision: D16503452
Pulled By: VitalyFedyunin
fbshipit-source-id: a87ce7fff139642ef4ed791f15873074ad0d53af
Summary:
As in https://github.com/pytorch/pytorch/issues/23439, some descriptions of arguments in `_torch_docs.py` have been replaced by `common_args`, it would be helpful to check if any descriptions can be replaced for new docs in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24161
Differential Revision: D16889293
Pulled By: ezyang
fbshipit-source-id: bf6f581494482d6eb32e634f73e84a4586766230
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8212
This fix is based on the idea that in-place ops (e.g. add_(...)) and out ops (e.g. tensor.add(..., out=...)) must check that the output tensor does not partially overlap with any of its input tensors. Otherwise the result of such an op is unexpected to the user. Since TensorIterator is a common backend for such ops and it is already used to check output self-overlapping, this fix is implemented in the same place.
MemOverlapStatus enum class is introduced to model two tensors overlapped state:
- TOO_HARD if at least one of them is not contiguous
- FULL if both are contiguous and share exactly the same memory array [data(), data() + numel() * itemsize()]
- PARTIAL if both are contiguous but the underlying memory is shared partially, in other words the memory arrays overlap but are not identical.
- NO if both are contiguous but have independent, non-overlapping memory arrays
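For illustration, a small sketch of the PARTIAL case being rejected (the exact error text is assumed):
```python
import torch

base = torch.arange(10.)
a = base[:6]       # contiguous view over elements 0..5
b = base[4:]       # contiguous view over elements 4..9 -> partial overlap with `a`

try:
    a.add_(b)      # the output partially overlaps one of the inputs
except RuntimeError as err:
    print("refused:", err)
```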
Performance test of clone/addcmul_/addcdiv_ with check_mem_overlaps:
```
a = torch.empty(10000000, device='cpu')
b = torch.randn(10000000, device='cpu')
timeit a.copy_(b)
master: 10.3 ms ± 429 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
branch: 10.2 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
a = torch.empty(10000000, device='cuda')
b = torch.randn(10000000, device='cuda')
timeit a.copy_(b)
master: 373 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 373 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcmul_(b, c)
master: 2.02 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 2.11 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcmul_(b, c)
master: 72.6 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 72.4 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcdiv_(b, c)
master: 2.19 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 1.97 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcdiv_(b, c)
master: 71.3 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 71.7 µs ± 3.96 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.empty(100, device='cpu')
b = torch.randn(100, device='cpu')
timeit a.copy_(b)
master: 12.1 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
branch: 11.1 µs ± 61.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
a = torch.empty(100, device='cuda')
b = torch.randn(100, device='cuda')
timeit a.copy_(b)
master: 20.9 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 22.8 µs ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcmul_(b, c)
master: 24.1 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 24 µs ± 91.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcmul_(b, c)
master: 34.5 µs ± 4.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 29.8 µs ± 496 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcdiv_(b, c)
master: 21.3 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 23.8 µs ± 403 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcdiv_(b, c)
master: 30.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 31.8 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24058
Differential Revision: D16767892
Pulled By: pbelevich
fbshipit-source-id: 0cdaaa471d003a2886b1736f8985842226b8493a
Summary:
Changelog:
- Enable torch.eye for bool and float16 dtypes
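Both newly enabled dtypes can be exercised directly, e.g.:
```python
import torch

print(torch.eye(3, dtype=torch.bool))
print(torch.eye(3, dtype=torch.float16))
```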
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24148
Test Plan:
- Tests added in test_torch.py for all available devices and dtypes (except torch.bfloat16)
Fixes https://github.com/pytorch/pytorch/issues/24088
Differential Revision: D16891048
Pulled By: ezyang
fbshipit-source-id: 3e86fe271bd434300c396e63f82c1a1f3adac2b4
Summary:
This patch writes documentation for `Tensor.record_stream()`, which is not a documented API currently. I've discussed publishing it with colesbury in https://github.com/pytorch/pytorch/issues/23729.
The documentation is based on [the introduction at `CUDACachingAllocator.cpp`](25d1496d58/c10/cuda/CUDACachingAllocator.cpp (L47-L50)). ~~I didn't explain full details of the life cycle of memory blocks or stream awareness of the allocator for the consistent level of details with other documentations.~~ I explained about the stream awareness in a note block.
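For illustration, a minimal sketch of the documented pattern (assumes a CUDA device is available):
```python
import torch

if torch.cuda.is_available():
    side_stream = torch.cuda.Stream()
    x = torch.empty(1024, device='cuda')   # allocated by the caching allocator on the current stream
    with torch.cuda.stream(side_stream):
        y = x * 2                          # x is consumed on a different stream
    # Tell the allocator x is in use on side_stream, so its memory is not
    # handed out to another tensor until that stream's work completes.
    x.record_stream(side_stream)
```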
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24078
Differential Revision: D16743526
Pulled By: zou3519
fbshipit-source-id: 05819c3cc96733e2ba93c0a7c0ca06933acb22f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24183
-----------
Fix: Enabled masked select/scatter/fill for BFloat16 on CPU
Test: via unit tests
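For illustration, a small sketch exercising two of the enabled ops:
```python
import torch

x = torch.randn(4).to(torch.bfloat16)
mask = torch.tensor([True, False, True, False])
print(x.masked_select(mask))     # BFloat16 masked select on CPU
print(x.masked_fill(mask, 0.5))  # BFloat16 masked fill on CPU
```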
Test Plan: Imported from OSS
Differential Revision: D16763461
Pulled By: izdeby
fbshipit-source-id: fe733635a2064e5a088a108ff77c2a1a1487a27c
Summary:
Assert that there are no multiple writes to a single memory location, which previously caused corrupted output.
Fixed the batched matrix triu/tril logic, which relied on the previous copy behavior to support tensors with stride 0 in the leading dimension.
This fixes the issue proposed at: https://github.com/pytorch/pytorch/issues/23063
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23574
Differential Revision: D16600717
Pulled By: ezyang
fbshipit-source-id: e41e14f03eccf97398b64ba43647110beb1529e6
Summary:
Variables such as `device` and `sparse` in for loops should be used in tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24075
Differential Revision: D16763073
Pulled By: ezyang
fbshipit-source-id: 8735cbc8d9ed695db8489cfc949c895180a7b826
Summary:
Rename the decorator to `for_all_device_types`, as a `test_`-prefixed name is recognized as a test in some environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24337
Differential Revision: D16806807
Pulled By: VitalyFedyunin
fbshipit-source-id: 3132366046e183329ba5838a4bc29441fdb5bd4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23973
Without loss of generality, I describe the API for `tensor.view_names`.
`tensor.names_` has an analogous API.
`tensor.view_names(*names)` returns a view on tensor with named dims `names`.
`names` must be of length `tensor.dim()` unless '*' (known as the "glob") is in `names`; in that case the glob is expanded greedily to cover the corresponding names from `tensor.names`.
For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names('*', 'height', 'width').names
('N', 'C', 'height', 'width')
>>> x.view_names('batch', '*', 'width').names
('batch', 'C', 'H', 'width')
```
tensor.view_names(**rename_map) returns a view on tensor that has
renamed dims as specified in the mapping `rename_map`.
For example,
```
>>> x = torch.empty(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> x.view_names(W='width', H='height').names
('N', 'C', 'height', 'width')
```
These are different(!!!) from the C++ API, which only allows the
following:
- tensor.view_names(optional<DimnameList>)
C++ API parity for named tensors is not important right now; I am
punting that to the future.
Test Plan: - [namedtensor ci]
Differential Revision: D16710916
Pulled By: zou3519
fbshipit-source-id: 7cb8056c0fb4c97b04c3a2d1dd0f737e0a67ce34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23962
This change should make the semantics clearer.
`tensor.names_(names)` sets tensor.names to be `names`.
`tensor.view_names(names)` returns a view of the tensor with names
`names`.
Test Plan
- [namedtensor ci]
Test Plan: Imported from OSS
Differential Revision: D16710915
Pulled By: zou3519
fbshipit-source-id: c82fa9812624d03c86f7be84b0a460e3c047aaa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23804
`output = tensor.align_to(names)` returns a view of `tensor` such that
`output.names = names`. Dimensions with the same names in `tensor` and
`output` have the same sizes; dimensions with new names have size 1.
The following must be true for this operation to succeed:
1) tensor.names must be a subsequence (not necessarily contiguous) of `names`
2) Aligning tensor.names to names must not change the absolute position from the
right of any unnamed dimension.
In practice, these constraints mean that aligning cannot transpose
names.
Some examples:
- Tensor[C].align_to(C) -> Tensor[C]
- Tensor[N].align_to([N, C]) -> Tensor[N, C]
- Tensor[H, W].align_to([N, H, W, C]) -> Tensor[N, H, W, C]
- Tensor[None].align_to([N, None]) -> Tensor[N, None]
- Tensor[N].align_to([N, None, None]) -> Tensor[N, None, None]
Examples of error cases:
- Tensor[W, H].align_to([N, H, W, C]) -> Error (not a subsequence)
- Tensor[None, H].align_to([None, H, W]) -> Error (would change the
absolute position from the right of a None dimension)
`torch.align_tensors(*tensors)` aligns the named dimensions of each
tensor according to the alignment rules so that they can be used in an
operation. More concretely, it aligns each tensor to the
longest names among the names of the tensors in `tensors`.
This allows users to emulate "broadcasting by names", which is one of
the things named tensors tries to enable. Here is an example:
```
imgs: Tensor[N, C, H, W]
scale: Tensor[N]
// Doesn't work because we do broadcasting by alignment by default
imgs * scale
// Does work
imgs, scale = torch.align_tensors(imgs, scale)
imgs * scale
```
Future:
- Consider allowing broadcasting by names by default.
Test Plan:
- The diff looks pretty large but more than half of it is testing.
- new tests [namedtensor ci]
Differential Revision: D16657927
Pulled By: zou3519
fbshipit-source-id: e2f958bf5146c8ee3b694aba57d21b08e928a4e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24202
tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
Test Plan: - run tests [namedtensor ci]
Differential Revision: D16773014
Pulled By: zou3519
fbshipit-source-id: 61024303c1a34db631cc4cb2c53757345e40d72c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24182
-----
Fix: Enabled comparison operations for BFloat16 on CPU
Test: via unit tests
Test Plan: Imported from OSS
Differential Revision: D16763460
Pulled By: izdeby
fbshipit-source-id: 885ff9006d3bd60bb945147c3b86f97cd0d26f7b
Summary:
This PR introduces the `pytorchtest.test_all_device_types()` decorator, which helps to write CPU and CUDA tests faster by iterating a single test over all available devices.
Simple `test_var_mean_some_dims` becomes
```
test_var_mean_some_dims (__main__.TestTorch) ... ok
test_var_mean_some_dims_cpu (__main__.TestTorch) ... ok
test_var_mean_some_dims_cuda (__main__.TestTorch) ... ok
```
```python
class pytorchtest():
"""Allows to generate and run per-device unittests.
This decorator class allows to generate and run per-device unittest.
Example:
class _TestTorchMixin(pytorchtest):
@pytorchtest.test_all_device_types()
def test_zeros_like(self, device):
expected = torch.zeros((100, 100,), device=device)
Will execute:
test_zeros_like (__main__.TestTorch) ... skipped 'Look at test_zeros_like_cpu, test_zeros_like_cuda results.'
test_zeros_like_cpu (__main__.TestTorch) ... ok
test_zeros_like_cuda (__main__.TestTorch) ... ok
To work properly, test class should be inherited from the `pytorchtest`.
test_all_device_types decorator does not guarantee proper functionality in
combination with other decorators.
Please do not extend this decorator to support other cases (such as dtype,
layouts, etc) without consulting with bigger group. Devices is the special
case as build flags control additions/removals (see
https://github.com/pytorch/pytorch/pull/23824 for the reference).
"""
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23824
Differential Revision: D16716959
Pulled By: VitalyFedyunin
fbshipit-source-id: ba39af0f9bce2c4a64da421bbc24d6a1c1d9139d
Summary:
Improve error messages by showing the relevant function call that failed.
Before:
```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #2 'other'
```
After:
```
>>> torch.ones(1, dtype=torch.float) < torch.ones(1, dtype=torch.double)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #2 'other' in call to _th_lt
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24187
Differential Revision: D16769167
Pulled By: nairbv
fbshipit-source-id: 4992eb4e86bdac2ab8805cc5356f7f92c63e1255
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24105
tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
Test Plan: - run tests [namedtensor ci]
Differential Revision: D16763388
Pulled By: zou3519
fbshipit-source-id: 4b2fb3acc0514515e7ca805dbc5c3d4a9bd96317
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23624
tensor.set_names(names) is the out-of-place variant of
tensor.set_names_(names). This naming is probably confusing so I am
taking any and all suggestions.
Test Plan:
- run tests [namedtensor ci]
gh-metadata: pytorch pytorch 23624 gh/zou3519/86/head
Differential Revision: D16621830
Pulled By: zou3519
fbshipit-source-id: f8a3837d3a370b41210e938369348dcbb4aee53a
Summary:
CPU and CUDA testing code are largely the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23526
Reviewed By: ezyang
Differential Revision: D16586271
Pulled By: VitalyFedyunin
fbshipit-source-id: 91c70c05789120fde4718ce955de243087a8c993
Summary:
Enable add, sub, mul, and div on CPU for the bfloat16 type.
Tested via unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22851
Differential Revision: D16256757
Pulled By: izdeby
fbshipit-source-id: 8b62f7581fc0ca0d2cff48ab40d877a9fcf70a5b