Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25658
This unflattens `dim` according to the shape specified in `namedshape`.
`namedshape` may be either an OrderedDict or an iterable of (name, size)
tuples.
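A minimal usage sketch of the behavior described above; the dimension names and sizes here are illustrative:
```python
from collections import OrderedDict
import torch

t = torch.randn(2, 12, names=('N', 'features'))
# Unflatten the 'features' dim into three named dims of sizes 3, 2, and 2.
out = t.unflatten('features', OrderedDict([('C', 3), ('H', 2), ('W', 2)]))
# An iterable of (name, size) tuples works as well:
out = t.unflatten('features', (('C', 3), ('H', 2), ('W', 2)))
print(out.names, out.shape)  # ('N', 'C', 'H', 'W') torch.Size([2, 3, 2, 2])
```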
Future:
- It is possible to make it take a dict in Python >= 3.6 because those are
ordered by default, but I'll leave that task for the future.
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17192655
Pulled By: zou3519
fbshipit-source-id: fd9bd2f462c23a4df1c23d66f2aa95076ff1b160
Summary:
Changelog:
- Modify existing implementation of pinverse to support batching on inputs
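A minimal sketch of the batched behavior described above:
```python
import torch

A = torch.randn(4, 3, 5)     # a batch of four 3 x 5 matrices
A_pinv = torch.pinverse(A)   # one pseudo-inverse per matrix in the batch
print(A_pinv.shape)          # torch.Size([4, 5, 3])
```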
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26095
Test Plan: - Added tests in test_pinverse to test batched implementation
Differential Revision: D17408092
Pulled By: soumith
fbshipit-source-id: bba95eb193ce33a94ecfaf74da270d34b435e4af
Summary:
- Adds new decorators for skipping on ROCm, skipping when MKL is unavailable, running only on the CPU, and running only on CUDA (see the sketch below)
- Makes decorator skip semantics consistent
- Adds CUDA default stream requirement to MAGMA decorator
- Creates TestAutogradDeviceType
Note: this PR originally moved test_cdist, but moving it caused failures in CI. There may be an undiagnosed issue with cdist or the test; the issue does not reproduce locally.
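A hedged sketch of how the new decorators compose with a device-generic test; the import paths (`common_device_type`, `common_utils`) and the exact decorator names used here (`onlyCUDA`, `skipCUDAIfRocm`, `skipCPUIfNoMkl`) are assumptions based on the description above.
```python
import torch
from common_utils import TestCase, run_tests                      # assumed import path
from common_device_type import (instantiate_device_type_tests,    # assumed import path
                                onlyCUDA, skipCUDAIfRocm, skipCPUIfNoMkl)


class TestExample(TestCase):
    @skipCPUIfNoMkl              # the CPU instantiation is skipped when MKL is unavailable
    def test_needs_mkl(self, device):
        x = torch.randn(8, 8, device=device)
        self.assertEqual(x.shape, torch.Size([8, 8]))

    @onlyCUDA                    # instantiated only for the CUDA device type
    @skipCUDAIfRocm              # and skipped on ROCm builds
    def test_cuda_only(self, device):
        self.assertTrue(torch.tensor(1., device=device).is_cuda)


instantiate_device_type_tests(TestExample, globals())

if __name__ == '__main__':
    run_tests()
```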
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26248
Test Plan: Change is to tests themselves.
Differential Revision: D17410386
Pulled By: mruberry
fbshipit-source-id: 8459df44f2a00f0e71680fbe713587a01d4b0300
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26252
Original commit changeset: 1375774f24c2
Testing to see if this is somehow the source of hangs on ROCm builds.
Test Plan: The change is to the tests themselves; this diff is primarily for testing the ROCm hang.
Differential Revision: D17390575
fbshipit-source-id: a6ffd5eb1df3971b99b6d42271a8d3d501ac79c6
Summary:
- Adds skipCUDAIfRocm and skipCPUIfNoMkl decorators, ports corresponding tests
- Changes "skipIf" input semantics for consistency
- Removes torchtest, which has been replaced with this new generic framework
- Refactors some common parts out of the CUDA tests into TestTorchDeviceType
- Ensures all MAGMA tests run on the default stream by putting skipCUDANonDefaultStreamIf in the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26244
Differential Revision: D17389060
Pulled By: mruberry
fbshipit-source-id: 1375774f24c2266049e6d4b899e7300ddf32eac8
Summary:
This PR moves many tests in test_torch.py to the generic device type framework. This means that many CUDA tests now run in test_torch.py and there is greater consistency in how tests for many device types are written.
One change is that all MAGMA tests are run on the default stream due to intermittent instability running MAGMA on the non-default stream. This is a known issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26232
Test Plan:
While this PR edits the tests themselves, it was validated using two independent methods:
(1) The code was reviewed and it was verified that all deleted functions were actually moved.
(2) The output of the TestTorch CI was reviewed and test outputs were matched before and after this PR.
Differential Revision: D17386370
Pulled By: mruberry
fbshipit-source-id: 843d14911bbd52e8aac6861c0d9bc3d0d9418219
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/24851 by:
1. letting device types easily register themselves for testing
2. letting tests be written to run on multiple devices and with multiple dtypes
3. providing a mechanism to instantiate those tests so they are discoverable and filterable by unittest and pytest
It refactors three tests from test_torch.py to demonstrate how to use it.
`test_diagonal` is the simplest example. Most tests just need to be modified to accept 'device' as an argument. The framework will then instantiate `test_diagonal_cpu` and `test_diagonal_cuda` (when CUDA is available) which call `test_diagonal` with the appropriate 'device' argument.
`test_neg` also has dtype variants. It accepts both 'device' and 'dtype' as arguments, and the dtypes it runs with are specified with the 'dtypes' decorator. Dtypes can be specified for all device types and particular device types. The framework instantiates tests like `test_neg_cpu_torch.float`.
`test_inverse` has device-specific dependencies. These dependencies are expressed with the sugary 'skipCUDAIfNoMagma' and 'skipCPUIfNoLapack' decorators. These decorators are device-specific, so CPU testing is not skipped if MAGMA is not installed, and their conditions may be checked before or after the test case has been initialized. This means that skipCUDAIfNoMagma does not initialize CUDA. In fact, CUDA is only initialized if a CUDA test is run.
These instantiated tests may be run as usual and with pytest filtering it's easy to run one test on all device types, run all the tests for a particular device type, or run a device type and dtype combination.
See the note "Generic Device-Type Testing" for more detail.
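A sketch of the pattern, assuming the module name `common_device_type` and the helper `instantiate_device_type_tests`; the decorators (`dtypes`, `skipCUDAIfNoMagma`, `skipCPUIfNoLapack`) are the ones named above.
```python
import torch
from common_utils import TestCase, run_tests                     # assumed import path
from common_device_type import (instantiate_device_type_tests,   # assumed import path
                                dtypes, skipCUDAIfNoMagma, skipCPUIfNoLapack)


class TestTorchDeviceType(TestCase):
    # Simplest case: the test only takes 'device'. The framework generates
    # test_diagonal_cpu and, when CUDA is available, test_diagonal_cuda.
    def test_diagonal(self, device):
        x = torch.arange(9, device=device).reshape(3, 3)
        self.assertEqual(x.diagonal().tolist(), [0, 4, 8])

    # Dtype variants: the test takes 'device' and 'dtype'; instantiated names
    # look like test_neg_cpu_torch.float.
    @dtypes(torch.float, torch.double)
    def test_neg(self, device, dtype):
        x = torch.ones(3, device=device, dtype=dtype)
        self.assertEqual(x.neg(), -x)

    # Device-specific dependencies: CPU runs are not skipped when MAGMA is
    # missing, and CUDA is only initialized if a CUDA test actually runs.
    @skipCUDAIfNoMagma
    @skipCPUIfNoLapack
    def test_inverse(self, device):
        x = torch.eye(4, device=device)
        self.assertEqual(x.inverse(), x)


instantiate_device_type_tests(TestTorchDeviceType, globals())

if __name__ == '__main__':
    run_tests()
```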
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25967
Differential Revision: D17381987
Pulled By: mruberry
fbshipit-source-id: 4a639641130f0a59d22da0efe0951b24b5bc4bfb
Summary:
Because `__contains__` returned `NotImplemented` for unsupported element types, `in` checks evaluated to True when the element is not a number, since `bool(NotImplemented) == True`.
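A short illustration of the underlying Python behavior (on the interpreter versions of the time; newer Python versions warn or error on truth-testing `NotImplemented`):
```python
print(bool(NotImplemented))  # True
# So a __contains__ that does `return NotImplemented` for unsupported element
# types makes `element in tensor` silently evaluate to True instead of erroring.
```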
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24156
Differential Revision: D16829895
Pulled By: zou3519
fbshipit-source-id: 9d3d58025b2b78b33a26fdfcfa6029d0d049f11f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25843
`tensor.align_to(*names)` permutes the dimensions of `tensor` and adds
additional 1-sized dimensions such that the output tensor has dimensions
in the same order as `names`. All dimensions of `tensor` must be
present in `names`; in addition, this function requires that all dims of
`tensor` be named.
`tensor.align_as(other)` is equivalent to
`tensor.align_to(*other.names)`.
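A small sketch of these semantics (the names here are illustrative):
```python
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = x.align_to('N', 'H', 'W', 'C')       # permutes dims and inserts 1-sized dims
print(y.shape, y.names)                  # torch.Size([2, 1, 1, 3]) ('N', 'H', 'W', 'C')

other = torch.randn(2, 5, 5, 3, names=('N', 'H', 'W', 'C'))
z = x.align_as(other)                    # same as x.align_to(*other.names)
print(z.names)                           # ('N', 'H', 'W', 'C')
```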
I'm planning on changing `torch.align_tensors(*tensors)` to align closer
to these semantics because there didn't seem to be a clear use case for the old
semantics that preserve unnamed dimensions. That will come in a future
change.
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17255549
Pulled By: zou3519
fbshipit-source-id: 1e437ad81e9359b4d5bd0e7e64c3a1be441fc3e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25842
`tensor.refine_names(*names)` takes `tensor` and attempts to name its
dimensions `names` out-of-place. If a dimension `i` already had a name,
then it cannot be changed (so tensor.names[i] must equal names[i]);
if the original dimension did not have a name, then the new name
(names[i]) can be anything.
`tensor.refine_names(*names)` also accepts a glob '*' that greedily selects
names from `tensor`. Here are some examples:
- `Tensor[None].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('D') -> Error!`
- `Tensor[N].refine_names(None) -> Error!`
- `Tensor[None, None].refine_names('*', 'D') -> Tensor[None, D]`
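The examples above as a runnable sketch (the error case is left as a comment):
```python
import torch

x = torch.randn(2, 3)                      # names: (None, None)
print(x.refine_names('N', 'C').names)      # ('N', 'C'): unnamed dims accept any name
print(x.refine_names('*', 'D').names)      # (None, 'D'): '*' greedily keeps the leading dims

y = torch.randn(2, 3, names=('N', None))
print(y.refine_names('N', 'C').names)      # ('N', 'C'): existing name 'N' must match
# y.refine_names('D', 'C')                 # error: dim 0 is already named 'N', not 'D'
```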
Test Plan: - new tests [namedtensor ci]
Differential Revision: D17255548
Pulled By: zou3519
fbshipit-source-id: fdbdb3a12f24fbe37ce1e53ed09dc8a42589d928
Summary:
Changelog:
- De-duplicate the code in tests for torch.solve, torch.cholesky_solve, torch.triangular_solve
- Skip tests explicitly if requirements aren't met, e.g. if NumPy / SciPy aren't available in the environment
- Add generic helpers for these tests in test/common_utils.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25733
Test Plan:
- All tests should pass to confirm that the change is not erroneous
Clears one point specified in the discussion in https://github.com/pytorch/pytorch/issues/24333.
Differential Revision: D17315330
Pulled By: zou3519
fbshipit-source-id: c72a793e89af7e2cdb163521816d56747fd70a0e
Summary:
Using the double-precision `erfinv()` for double inputs best preserves accuracy, while `erfinvf()` should be used for half and float.
This is also consistent with the implementation before the migration: https://github.com/pytorch/pytorch/issues/24943
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25337
Differential Revision: D17102333
Pulled By: zou3519
fbshipit-source-id: 5178cff534cf5f10d86ab04d4b6c1779ffedf49e
Summary:
Currently we have different checks for the multinomial method on CPU and CUDA. This PR makes them consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25595
Differential Revision: D17236163
Pulled By: ifedan
fbshipit-source-id: 7718173bdaf216e8eb636c2a5b9c5939b975325b
Summary:
Changelog:
- Simplify the generation of singular matrices by constructing a constant matrix, instead of building a random singular matrix with random_square_matrix_of_rank, which is susceptible to numerical issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25773
Test Plan:
- test_det_logdet_slogdet_batched should pass
Fixes https://github.com/pytorch/pytorch/issues/25172
cc: branfosj hartb
Apologies for the delay.
Differential Revision: D17261059
Pulled By: soumith
fbshipit-source-id: 8f991e2cb8c0e9dccad363d4785075213088e58a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25711
This function renames the dimensions of a tensor out-of-place. Because
of that, I think `tensor.renamed(...)` is a clearer name: `view_names`
has the connotation that we can use names to `view` our tensors with a
"different shape", but what this function really does is let us rename a
tensor regardless of its previous names.
`tensor.names_`, the in-place version of this, is unchanged for now.
However, we might delete it or stop advertising it, both because it may have
no real use case and because its naming is a little inconsistent with `tensor.renamed`.
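A hedged sketch of the out-of-place behavior, assuming the positional `renamed(*names)` form described above:
```python
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = x.renamed('batch', 'channels')   # out-of-place: x keeps its original names
print(y.names)                       # ('batch', 'channels')
print(x.names)                       # ('N', 'C')
```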
Test Plan: - [namedtensor ci]
Differential Revision: D17206515
Pulled By: zou3519
fbshipit-source-id: 67053951fcc8130c84566b5ebbdce35ef619c90d
Summary:
Improve handling of mixed-type tensor operations.
This PR affects the arithmetic (add, sub, mul, and div) operators implemented via TensorIterator (so dense but not sparse tensor ops).
For these operators, we will now promote to reasonable types where possible, following the rules defined in https://github.com/pytorch/pytorch/issues/9515, and error in cases where the cast would require floating point -> integral or non-boolean to boolean downcasts.
The details of the promotion rules are described here:
https://github.com/nairbv/pytorch/blob/promote_types_strict/docs/source/tensor_attributes.rst
Some specific backwards incompatible examples:
* now `int_tensor * float` will result in a float tensor, whereas previously the floating point operand was first cast to an int. Previously `torch.tensor(10) * 1.9` => `tensor(10)` because the 1.9 was downcast to `1`. Now the result is the more intuitive `tensor(19.)` (a float tensor).
* Now `int_tensor *= float` will error, since the floating point result of this operation can't be cast into the in-place integral type result.
See more examples/detail in the original issue (https://github.com/pytorch/pytorch/issues/9515), in the above linked tensor_attributes.rst doc, or in the test_type_promotion.py tests added in this PR:
https://github.com/nairbv/pytorch/blob/promote_types_strict/test/test_type_promotion.py
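The backwards-incompatible examples above, as a short runnable sketch:
```python
import torch

t = torch.tensor(10)     # an integral (int64) tensor
print(t * 1.9)           # promoted: tensor(19.) instead of the old tensor(10)

try:
    t *= 1.9             # in-place: the float result cannot be cast back to int64
except RuntimeError as e:
    print(e)
```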
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22273
Reviewed By: gchanan
Differential Revision: D16582230
Pulled By: nairbv
fbshipit-source-id: 4029cca891908cdbf4253e4513c617bba7306cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602
Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the Thrust and CUB APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option; as of 2.6 they will be the only available option.
Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.
Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.
Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.
Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864
Reviewed By: xw285cornell
Differential Revision: D16940768
Pulled By: bddppq
fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
Summary:
Changelog:
- Iterate over mini batches of 262140 matrices (maximum)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24438
Test Plan:
- Added slow tests to test the behavior in test_torch and test_cuda
Fixes https://github.com/pytorch/pytorch/issues/24403
Differential Revision: D17175603
Pulled By: soumith
fbshipit-source-id: 1abb0a1e92494cf43ef4ba9efb54a919cd18bfef
Summary:
Changelog:
- Enable broadcasting of RHS and LHS tensors for lu_solve. This means that you can now have an RHS with size `3 x 2` and an LHS with size `4 x 3 x 3`, for instance (see the sketch after this list)
- Remove deprecated behavior of having 2D tensors for RHS. Now all tensors have to have a last dimension which equals the number of right hand sides
- Modified docs
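A sketch of the broadcasting described in the first changelog item, using the factorization API of that release (`torch.lu`; newer releases use `torch.linalg.lu_factor`):
```python
import torch

A = torch.randn(4, 3, 3)            # LHS: a batch of four 3 x 3 matrices
b = torch.randn(3, 2)               # RHS: two right-hand sides, no batch dims
LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)   # b is broadcast against the batch of factors
print(x.shape)                      # torch.Size([4, 3, 2])
```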
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24333
Test Plan: - Add tests for new behavior in test_torch.py with a port to test_cuda.py
Differential Revision: D17165463
Pulled By: zou3519
fbshipit-source-id: cda5d5496ddb29ed0182bab250b5d90f8f454aa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25405
This PR adds schemas to native_functions.yaml, core/Tensor.h, and
core/TensorMethods.h for Dimname/DimnameList overloads for the following
functions:
- min, max, max_values, min_values
- mean, median
- logsumexp, std, var, norm
The actual implementations will come in a later PR. I am accumulating
all the additional schemas and changes to core/{Tensor|TensorMethods}.h
in this PR so that there is only one point of failure for potential
merge conflicts.
Test Plan: - Check that all pytorch builds still build. [namedtensor ci]
Differential Revision: D17116333
Pulled By: zou3519
fbshipit-source-id: fd666d60109a311767169261afbec0fd85cc00c8
Summary:
Fixing https://github.com/pytorch/pytorch/issues/24750
```
DEBUG = 0
OMP_NUM_THREADS = 1
import torch
base = torch.randn(1000000)
exp = torch.randn(1000000)
out = torch.empty_like(base)
timeit base.pow(0) +30x
old 6.26 ms ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 213 µs ± 3.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(1/3) +6x
old 56 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.41 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(-1/3) +6x
old 57 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.49 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(1/2) +6x
old 4.04 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 620 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-1/2) +5x
old 6.56 ms ± 43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 1.24 ms ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(1) no diff
old 322 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 331 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-1) +3.5x
old 2.48 ms ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 717 µs ± 130 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(2) no diff
old 328 µs ± 7.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
new 324 µs ± 4.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-2) +3.5x
old 2.45 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 662 µs ± 3.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(3) +7x
old 2.39 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 334 µs ± 7.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit base.pow(-3) +9x
old 93.7 ms ± 5.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.3 ms ± 666 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(123456.789) +5x
old 46.5 ms ± 418 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.68 ms ± 325 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(-123456.789) +5x
old 46.5 ms ± 784 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit base.pow(exp) +6x
old 60.6 ms ± 4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(0, exp) no diff
old 18.3 ms ± 859 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 21.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit torch.pow(1, exp) +30x
old 6.01 ms ± 81.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit torch.pow(-1, exp) +3x
old 30.8 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.67 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(42, exp) +8x
old 80.1 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.51 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(-42, exp) +2x
old 21.8 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.5 ms ± 89.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(0, exp, out=out) no diff
old 20.2 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 22.1 ms ± 648 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
timeit torch.pow(1, exp, out=out) +30x
old 6.7 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
new 203 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
timeit torch.pow(-1, exp, out=out) +3x
old 32.5 ms ± 3.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.4 ms ± 99.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(42, exp, out=out) +10x
old 91 ms ± 7.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 9.64 ms ± 291 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
timeit torch.pow(-42, exp, out=out) +2.5x
old 25.9 ms ± 5.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
new 10.1 ms ± 698 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
BC: enforce stronger shape requirements on the output tensor (out= keyword argument) and do not allow output tensor to be resized if it is also used as one of the inputs.
BC: enforce a stronger requirement on CPU and CUDA for integer tensor bases raised to integer exponents: `Integers to negative integer powers are not allowed.`
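A small sketch of the second BC change:
```python
import torch

t = torch.arange(1, 5)        # int64 tensor
print(t.pow(2))               # fine: tensor([ 1,  4,  9, 16])
try:
    t.pow(-1)                 # integral base, negative integral exponent
except RuntimeError as e:
    print(e)                  # Integers to negative integer powers are not allowed.
print(t.float().pow(-1))      # cast to a floating type first for reciprocal powers
```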
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23492
Differential Revision: D16731583
Pulled By: pbelevich
fbshipit-source-id: 4e5bf689357fe82a19371e42d48abbb7b4c1c3ca
Summary:
Adds documentation for `nn.functional.bilinear`, as requested in https://github.com/pytorch/pytorch/issues/9886.
The format follows that of `nn.functional.linear`, and borrows from `nn.bilinear` in its description of `Tensor` shapes.
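A brief usage sketch of the function being documented (shapes follow the `nn.Bilinear` conventions):
```python
import torch
import torch.nn.functional as F

x1 = torch.randn(8, 10)           # (batch, in1_features)
x2 = torch.randn(8, 20)           # (batch, in2_features)
weight = torch.randn(5, 10, 20)   # (out_features, in1_features, in2_features)
bias = torch.randn(5)

out = F.bilinear(x1, x2, weight, bias)
print(out.shape)                  # torch.Size([8, 5])
```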
I am happy to add more extensive documentation (e.g. "Args," "Example(s)"). From what I gather, the format of comments is inconsistent across functions in `nn.functional.py` and between modules (e.g. `nn.functional` and `nn`). It's my first PR, so guidance for contributing documentation and other code would be greatly appreciated!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24951
Differential Revision: D17091261
Pulled By: soumith
fbshipit-source-id: efe2ad764700dfd6f30eedc03de4e1cd0d10ac72
Summary:
This PR, linked to https://github.com/pytorch/pytorch/issues/22806, moves the sign function to ATen.
sign(x) supports bool and uses vectorized operations on CPU.
sign(NaN) is defined to return 0.
sign(bool) is a no-op; the resulting tensor holds the same values as the input one.
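A quick sketch of the behavior described above:
```python
import torch

x = torch.tensor([float('nan'), -2.0, 0.0, 3.0])
print(torch.sign(x))                   # tensor([ 0., -1.,  0.,  1.]): sign(NaN) -> 0 per this change

b = torch.tensor([True, False, True])
print(torch.sign(b))                   # a no-op on bool per this change: tensor([ True, False,  True])
```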
- [x] CPU Backend
- [x] CUDA Backend
- [x] Bring support for bool dtype
- [x] Bring support for Half dtype
- [x] Add test for NaN
- [x] Add test for bool dtype
- [x] Delete legacy implementation in THTensorMoreMath.cpp
Performance:
```python
timeit -s 'import torch; x = torch.randn((1000, 1000))' -n 1000 'torch.sign(x)'
timeit -s 'import torch; x = torch.randn((1000, 1000), device="cuda")' -n 1000 'torch.sign(x); torch.cuda.synchronize()'
```
| device | before | after |
| :-------------: | :-------------: | :-----: |
| CPU | 1.24 msec | 33.9 usec |
| GPU | 680 usec | 7.13 usec |
| CPU (1 thread) | 0.82 msec | 0.73 msec |
| GPU (1 thread) | 16.1 usec | 15.9 usec |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22861
Differential Revision: D16503452
Pulled By: VitalyFedyunin
fbshipit-source-id: a87ce7fff139642ef4ed791f15873074ad0d53af
Summary:
As in https://github.com/pytorch/pytorch/issues/23439, some descriptions of arguments in `_torch_docs.py` have been replaced by `common_args`; it would be helpful to check whether any descriptions can be replaced for new docs in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24161
Differential Revision: D16889293
Pulled By: ezyang
fbshipit-source-id: bf6f581494482d6eb32e634f73e84a4586766230
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8212
This fix is based on the idea that in-place ops (e.g. add_(...)) and out ops (e.g. tensor.add(..., out=...)) must check that the output tensor does not partially overlap with any of its input tensors. Otherwise the result of such an op is unexpected to the user. Since TensorIterator is a common backend for such ops and is already used to check output self-overlapping, this fix is implemented in the same place.
A MemOverlapStatus enum class is introduced to model the overlap state of two tensors:
- TOO_HARD if at least one of them is not contiguous
- FULL if both are contiguous and share exactly the same memory array [data(), data() + numel() * itemsize()]
- PARTIAL if both are contiguous but the underlying memory is shared only partially; in other words, the memory arrays overlap but are not identical
- NO if both are contiguous but have independent, non-overlapping memory arrays
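A minimal sketch of the check from the user's perspective (the exact error message is PyTorch's and is not reproduced here):
```python
import torch

x = torch.randn(10)
src = x[:6]
dst = x[3:9]                          # partially overlaps src in the same storage

try:
    torch.add(src, 1, out=dst)        # out= partially overlaps an input: rejected
except RuntimeError as e:
    print(e)

torch.add(src.clone(), 1, out=dst)    # cloning the input removes the overlap
```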
Performance test of clone/addcmul_/addcdiv_ with check_mem_overlaps:
```
a = torch.empty(10000000, device='cpu')
b = torch.randn(10000000, device='cpu')
timeit a.copy_(b)
master: 10.3 ms ± 429 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
branch: 10.2 ms ± 946 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
a = torch.empty(10000000, device='cuda')
b = torch.randn(10000000, device='cuda')
timeit a.copy_(b)
master: 373 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 373 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcmul_(b, c)
master: 2.02 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
branch: 2.11 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcmul_(b, c)
master: 72.6 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 72.4 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(1000000, device='cpu')
b = torch.randn(1000000, device='cpu')
c = torch.randn(1000000, device='cpu')
timeit a.addcdiv_(b, c)
master: 2.19 ms ± 583 µs per loop (mean ± std. dev. of 7 runs, 1000 loop each)
branch: 1.97 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
a = torch.randn(1000000, device='cuda')
b = torch.randn(1000000, device='cuda')
c = torch.randn(1000000, device='cuda')
timeit a.addcdiv_(b, c)
master: 71.3 µs ± 1.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 71.7 µs ± 3.96 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.empty(100, device='cpu')
b = torch.randn(100, device='cpu')
timeit a.copy_(b)
master: 12.1 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
branch: 11.1 µs ± 61.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
a = torch.empty(100, device='cuda')
b = torch.randn(100, device='cuda')
timeit a.copy_(b)
master: 20.9 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 22.8 µs ± 2.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcmul_(b, c)
master: 24.1 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 24 µs ± 91.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcmul_(b, c)
master: 34.5 µs ± 4.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 29.8 µs ± 496 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cpu')
b = torch.randn(100, device='cpu')
c = torch.randn(100, device='cpu')
timeit a.addcdiv_(b, c)
master: 21.3 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 23.8 µs ± 403 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
a = torch.randn(100, device='cuda')
b = torch.randn(100, device='cuda')
c = torch.randn(100, device='cuda')
timeit a.addcdiv_(b, c)
master: 30.3 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
branch: 31.8 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24058
Differential Revision: D16767892
Pulled By: pbelevich
fbshipit-source-id: 0cdaaa471d003a2886b1736f8985842226b8493a
Summary:
Changelog:
- Enable torch.eye for bool and float16 dtypes
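A minimal sketch of the newly supported dtypes:
```python
import torch

print(torch.eye(3, dtype=torch.bool))      # boolean identity matrix
print(torch.eye(3, dtype=torch.float16))   # half-precision identity matrix
```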
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24148
Test Plan:
- Tests added in test_torch.py for all available devices and dtypes (except torch.bfloat16)
Fixes https://github.com/pytorch/pytorch/issues/24088
Differential Revision: D16891048
Pulled By: ezyang
fbshipit-source-id: 3e86fe271bd434300c396e63f82c1a1f3adac2b4
Summary:
This patch writes documentation for `Tensor.record_stream()`, which is not a documented API currently. I've discussed publishing it with colesbury in https://github.com/pytorch/pytorch/issues/23729.
The documentation is based on [the introduction at `CUDACachingAllocator.cpp`](25d1496d58/c10/cuda/CUDACachingAllocator.cpp (L47-L50)). ~~I didn't explain the full details of the life cycle of memory blocks or the stream awareness of the allocator, to keep the level of detail consistent with other documentation.~~ I explained the stream awareness in a note block.
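A short usage sketch of the documented pattern (a tensor allocated on one stream and consumed on another):
```python
import torch

if torch.cuda.is_available():
    side_stream = torch.cuda.Stream()
    x = torch.empty(1024, device='cuda')
    with torch.cuda.stream(side_stream):
        y = x * 2
    # Tell the caching allocator that x is in use on side_stream, so its memory
    # is not handed out for reuse until the work queued on that stream finishes.
    x.record_stream(side_stream)
```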
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24078
Differential Revision: D16743526
Pulled By: zou3519
fbshipit-source-id: 05819c3cc96733e2ba93c0a7c0ca06933acb22f3