Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
Moved the vectorized CPU copy to ATen. Notable changes are mainly in `_copy_same_type_`.
Reviewed By: ezyang
Differential Revision: D12936031
fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.
Bonus: fix incomplete language in the `torch.sum` documentation, which didn't take multiple dimensions into account when describing `unsqueeze` (while introducing similar language in `torch.mean`).
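A minimal sketch of the equivalence (not the PR's actual kernel; shapes and dims are illustrative):
```python
import torch

# Multi-dim mean is just multi-dim sum divided by the number of reduced elements.
x = torch.randn(2, 3, 4)
dims = (0, 2)
n = x.size(0) * x.size(2)  # number of elements reduced over
assert torch.allclose(x.mean(dim=dims), x.sum(dim=dims) / n)
```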
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252
Differential Revision: D13161157
Pulled By: umanwizard
fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863
The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.
Some quick runtime analysis:
Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)
In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)
In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: b = a.cuda()
In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418
Reviewed By: weiyangfb
Differential Revision: D13226950
Pulled By: soumith
fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23
Summary:
They didn't turn up in my tests because I use pytest, which doesn't
print debug statements if the tests pass.
Differential Revision: D13115227
Pulled By: soumith
fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b
Summary:
Implements batching for the Cholesky decomposition.
Performance could be improved with dedicated batched `tril` and `triu` ops; their absence also impedes autograd operations.
Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
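A hedged usage sketch of the batched decomposition this PR enables (the entry point at the time was `torch.cholesky`; exact call details are assumptions):
```python
import torch

# Build a batch of symmetric positive-definite matrices and factor them at once.
a = torch.randn(4, 3, 3)
spd = a @ a.transpose(-1, -2) + 1e-3 * torch.eye(3)
l = torch.cholesky(spd)   # batched lower-triangular factors, shape (4, 3, 3)
assert torch.allclose(l @ l.transpose(-1, -2), spd, atol=1e-5)
```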
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017
Differential Revision: D13087945
Pulled By: ezyang
fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass thanks to fixes in the HIP/HCC/library versions shipped in white rabbit.
For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166
Differential Revision: D12814759
Pulled By: bddppq
fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
Summary:
- This is a straightforward PR, building on the batch inverse PR, except for one change:
- The `GENERATE_LINALG_HELPER_n_ARGS` macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.
Billing of changes:
- Add batching for `potrs`
- Add relevant tests
- Modify doc string
Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
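A hedged sketch of the batched `potrs` workflow described above (in current releases the op is exposed as `torch.cholesky_solve`; names and tolerances here are assumptions):
```python
import torch

# Solve A x = b for a batch of SPD matrices A, given their Cholesky factors.
a = torch.randn(4, 3, 3)
spd = a @ a.transpose(-1, -2) + 1e-3 * torch.eye(3)
b = torch.randn(4, 3, 2)
u = torch.cholesky(spd)         # batched lower-triangular factors
x = torch.cholesky_solve(b, u)  # batched solve, shape (4, 3, 2)
assert torch.allclose(spd @ x, b, atol=1e-4)
```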
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453
Reviewed By: soumith
Differential Revision: D12942039
Pulled By: zou3519
fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
Summary:
Update `roll` to behave like `numpy.roll` when the dimension to roll is not specified.
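A hedged illustration of the numpy-compatible behavior: with no dims given, the tensor is flattened, rolled, and restored to its original shape.
```python
import torch

x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
print(torch.roll(x, 1))            # flattened roll: [[5, 0, 1], [2, 3, 4]]
print(torch.roll(x, 1, dims=0))    # roll along dim 0: [[3, 4, 5], [0, 1, 2]]
```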
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588
Differential Revision: D12964295
Pulled By: nairbv
fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1
Summary:
- a workaround for #13292; a complete fix requires investigating the root cause of the issue with advanced indexing
- this PR ports the `flip()` CUDA implementation to the CPU kernel
- with this change:
```
>>> t = torch.randn(1, 3, 4, 5)
>>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
```
- performance:
```
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)
====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13344
Differential Revision: D12968003
Pulled By: weiyangfb
fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed
Differential Revision: D12918456
Pulled By: soumith
fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
Summary:
Fixes #13326
Also now you can use `run_test.py` with `pytest`. E.g.,
```
python run_test.py -vci distributed -pt
```
Yes it works with `distributed` and `cpp_extension`.
cc zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13416
Differential Revision: D12895622
Pulled By: SsnL
fbshipit-source-id: 2d18106f3a118d642a666bfb1318f41c859c3df7
Summary:
This PR renames `potrf`, the function responsible for the Cholesky
decomposition of positive definite matrices, to `cholesky`, matching NumPy and TF.
Billing of changes
- make potrf cname for cholesky in Declarations.cwrap
- modify the function names in ATen/core
- modify the function names in Python frontend
- issue warnings when potrf is called to notify users of the change
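A hedged sketch of the renamed API (the deprecation path is per this PR; the exact warning behavior is an assumption):
```python
import torch

a = torch.randn(3, 3)
p = a @ a.t() + 3 * torch.eye(3)  # positive definite
l = torch.cholesky(p)             # preferred name, matching NumPy/TF
# torch.potrf(p)                  # deprecated alias; emits a warning per this PR
```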
Reviewed By: soumith
Differential Revision: D10528361
Pulled By: zou3519
fbshipit-source-id: 19d9bcf8ffb38def698ae5acf30743884dda0d88
Summary:
Currently, `a = 1 - torch.tensor([1]).to('cuda:1')` puts `a` in `cuda:1` but reports `a.device` as `cuda:0` which is incorrect, and it causes illegal memory access error when trying to access `a`'s memory (e.g. when printing). This PR fixes the error.
Fixes https://github.com/pytorch/pytorch/issues/10850.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12956
Differential Revision: D12835992
Pulled By: yf225
fbshipit-source-id: 5737703d2012b14fd00a71dafeedebd8230a0b04
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146
Reviewed By: ezyang
Differential Revision: D12823809
Pulled By: SsnL
fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.
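A minimal sketch of the refactor's structure (per the description above; the actual test bodies live in `test/test_torch.py`):
```python
import unittest

class TestTorchMixin(object):
    def test_something(self):
        self.assertTrue(True)

# Only instantiate the runnable TestCase when this file is the entry point,
# so importing the module elsewhere doesn't pick up its tests.
if __name__ == '__main__':
    class TestTorch(unittest.TestCase, TestTorchMixin):
        pass
    unittest.main()
```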
cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250
Differential Revision: D12823734
Pulled By: SsnL
fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
Summary:
Fixes #12578, #9395.
* Fix and simplify print logic
* Follow numpy print rule eb2bd11870/numpy/core/arrayprint.py (L859)
> scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3
I hope I didn't break anything since there seem to be a lot of edge cases here... Here are some easy sanity checks.
```
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy
In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy
In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy
In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy
In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy
In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy
In [12]: torch.tensor([1, 1000])
Out[12]: tensor([ 1, 1000])
Out[9]: array([ 1, 1000]) # numpy
In [13]: torch.tensor([1, 1010])
Out[13]: tensor([ 1, 1010])
Out[10]: array([ 1, 1010]) # numpy
```
For floating points, we use scientific when `max/min > 1000 || max > 1e8 || min < 1e-4`
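A minimal sketch of that rule (the helper name is hypothetical; the real logic lives in `torch/_tensor_str.py`):
```python
# abs_min/abs_max are the smallest/largest nonzero absolute values in the tensor.
def use_scientific(abs_min, abs_max):
    return abs_max / abs_min > 1000. or abs_max > 1e8 or abs_min < 1e-4
```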
Lines marked "old" show previous behaviors that either had precision issues or were not aligned with numpy.
```
In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy
In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy
In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy
In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old
In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy
In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old
In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([ 0.01, 10. ]) # numpy
In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([ 1.00000000e-02, 1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old
```
When printing floating-point numbers in int mode, we still need to respect the rules for switching to scientific mode first.
```
In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([ 1., 1000.])
Out[20]: array([ 1., 1000.]) # numpy
In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([ 1.00000000e+00, 1.01000000e+03]) # numpy
Out[9]: tensor([ 1., 1010.]) # old
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12746
Differential Revision: D10443800
Pulled By: ailzhang
fbshipit-source-id: f5e4e3fe9bf0b44af2c64c93a9ed42b73fa613f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794
common.py is used as a base module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflicts.
Reviewed By: orionr
Differential Revision: D10438204
fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
Summary:
I found a bug in `norm()` and fixed it (and added tests to make sure it's fixed).
Here is how to reproduce it:
```python
import torch
x = torch.FloatTensor([[10, 12, 13], [4, 0, 12]])
print(torch.norm(x, -40, dim=0, keepdim=True))  # output is tensor([[ 4.0000, 0.0000, 11.9853]])
print(torch.norm(x, float('-inf'), dim=0, keepdim=True))  # output is tensor([[1., 1., 1.]]), which is wrong!
from numpy.linalg import norm as np_norm
x = x.numpy()
print(np_norm(x, ord=-40, axis=0))  # output is array([ 4., 0., 11.985261])
print(np_norm(x, ord=float('-inf'), axis=0))  # output is array([ 4., 0., 12.])
```
it's related to [#6817](https://github.com/pytorch/pytorch/issues/6817) and [#6969](https://github.com/pytorch/pytorch/pull/6969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12722
Differential Revision: D10427687
Pulled By: soumith
fbshipit-source-id: 936a7491d1e2625410513ee9c39f8c910e8e6803
Summary:
`torch.isfinite()` used to crash on int inputs.
```
>>> import torch
>>> a = torch.tensor([1, 2])
>>> torch.isfinite(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/scratch/pytorch/torch/functional.py", line 262, in isfinite
return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: value cannot be converted to type int64_t without overflow: inf
```
But this is an easy special case, and numpy also supports it.
```
>>> import numpy as np
>>> a = np.array([1, 2])
>>> a.dtype
dtype('int64')
>>> np.isfinite(a)
array([ True, True], dtype=bool)
```
So added a hacky line to handle non-floating-point input. Since PyTorch raises an exception on overflow, we can safely assume all valid int tensors contain only finite numbers.
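A hedged sketch of the special case (mirroring the `torch.functional.isfinite` shown in the traceback; the exact guard is an assumption):
```python
import torch

def isfinite(tensor):
    # Integral tensors cannot hold inf/nan, so every element is finite.
    if not tensor.is_floating_point():
        return torch.ones_like(tensor, dtype=torch.uint8)
    return (tensor == tensor) & (tensor.abs() != float('inf'))
```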
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12750
Differential Revision: D10428204
Pulled By: ailzhang
fbshipit-source-id: f39b2d0975762c91cdea23c766ff1e21d85d57a5
Summary:
The mapping protocol stipulates that when `__delitem__` is called, this is passed to `__setitem__` [(well, the same function in the C extension interface)](https://docs.python.org/3/c-api/typeobj.html#c.PyMappingMethods.mp_ass_subscript) with NULL data.
PyTorch master crashes in this situation; with this patch, it no longer does.
Test code (careful, segfaults your interpreter):
```python
import torch
a = torch.randn(5)
del a[2]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12726
Differential Revision: D10414244
Pulled By: colesbury
fbshipit-source-id: c49716e1a0a3d9a117ce88fc394858f1df36ed79
Summary:
- This was one of the few functions left out of the list of functions in
NumPy's `linalg` module
- `multi_mm` is particularly useful for DL research, for quick analysis of
deep linear networks
- Added tests and doc string
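A hedged usage sketch; in released PyTorch this op is exposed as `torch.chain_matmul` (the name `multi_mm` appears only in the summary above, so the exact spelling is an assumption):
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)
c = torch.randn(5, 2)
out = torch.chain_matmul(a, b, c)  # multiplies the chain in the cheapest order
assert torch.allclose(out, a @ b @ c)
```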
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12380
Differential Revision: D10357136
Pulled By: SsnL
fbshipit-source-id: 52b44fa18d6409bdeb76cbbb164fe4e88224458e
Summary:
* switches docker files over to the white rabbit release - removes custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes the first changes to the infrastructure to support the upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577
Differential Revision: D10350165
Pulled By: bddppq
fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
Summary:
Fixes #12260, #2896
```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior is that we return `0` after we run out of positive categories. Now we raise an error, based on the discussion in the issue thread.
- Add test cases for the CPU & CUDA paths; in the CUDA case `n_samples=1` is a simple special case, so we test against `n_samples=2` instead.
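A hedged illustration of the new behavior (the exact error type is an assumption):
```python
import torch

probs = torch.tensor([0., 1., 0., 0.])
torch.multinomial(probs, 1, replacement=False)      # fine: one positive category
try:
    torch.multinomial(probs, 3, replacement=False)  # more samples than positive categories
except RuntimeError as err:
    print(err)  # now raises instead of padding the result with index 0
```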
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12490
Differential Revision: D10278794
Pulled By: ailzhang
fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
Summary:
* Topk part 1: fix intrinsics for the 64-wide wavefront (#224)
64 lanes in a wavefront - intrinsics change.
* Disable in-place sorting on ROCm. (#237)
It is known to hang - use the Thrust fallback
Skip one test - fails with the fallback.
* Topk fixes (#239)
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 (bfe) and 9.7.1.20 (bfi) requires pos and len to be limited to 0...255
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 requires extracted bits to be in LSBs
* Correct logic for getLaneMaskLe. Previous logic would return 0x0 instead of 0xffffffffffffffff for lane 63
* Round up blockDim.x to prevent negative index for smem
bddppq ezyang
Note the one additional skipped test resulting from using the thrust sort fallback for all sizes. We are working on getting bitonic to work properly (and always). Until then, this needs to be skipped on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12337
Differential Revision: D10259481
Pulled By: ezyang
fbshipit-source-id: 5c8dc6596d7a3103ba7b4b550cba895f38c8148e
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:
CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```
CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```
CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```
CUDA with weight.numel() = channels
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop
>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```
The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.
ezyang SsnL zou3519 soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758
Differential Revision: D9995799
Pulled By: weiyangfb
fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
Summary:
- fix PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removing warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from the source tensor, with requires_grad set based on the input args
- `torch.as_tensor` retains its behavior as documented
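A hedged illustration of the documented behavior (variable names are illustrative):
```python
import torch

src = torch.randn(3, requires_grad=True)
t = torch.tensor(src)    # detached copy of the source tensor
u = src.new_tensor(src)  # also a detached copy
assert t.grad_fn is None and not t.requires_grad
v = torch.tensor(src, requires_grad=True)  # requires_grad honored from the args
```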
gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815
Differential Revision: D9932713
Pulled By: weiyangfb
fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
fixed by moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
fixed by moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
fixed by adding CUDA asserts
In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at each time for each of the `N` tensors.
The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may be fewer than `step` valid values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call looks like:
```cpp
// The template argument `4` below indicates that we want to operate on four
// elements at a time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
ret, p,
[seeds] __device__(
int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
curandStatePhilox4_32_10_t state;
curand_init(
seeds.first,
blockIdx.x * blockDim.x + threadIdx.x,
seeds.second,
&state);
float4 rand = curand_uniform4(&state);
switch (n) {
case 4: {
assert(0 <= p4 && p4 <= 1);
v4 = static_cast<scalar_t>(rand.w <= p4);
}
case 3: {
assert(0 <= p3 && p3 <= 1);
v3 = static_cast<scalar_t>(rand.z <= p3);
}
case 2: {
assert(0 <= p2 && p2 <= 1);
v2 = static_cast<scalar_t>(rand.y <= p2);
}
case 1: {
assert(0 <= p1 && p1 <= 1);
v1 = static_cast<scalar_t>(rand.x <= p1);
}
}
}
);
```
Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:
post-patch
```
➜ ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05
```
pre-patch
```
➜ ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273
Differential Revision: D9831294
Pulled By: SsnL
fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
Summary:
Adds vararg support for meshgrid and adds checks for all the tensor arguments to have the same dtype and device.
Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446
The earlier pull request was closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.
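A hedged illustration of the vararg form and the new consistency check (the exact error behavior is an assumption):
```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])
gx, gy = torch.meshgrid(x, y)    # varargs, no list/tuple wrapping needed
print(gx.shape, gy.shape)        # torch.Size([3, 2]) torch.Size([3, 2])
# torch.meshgrid(x, y.double())  # would raise: tensors must share dtype/device
```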
Differential Revision: D9892876
Pulled By: ezyang
fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
Summary:
Fix a test-name typo: tset_potri -> test_potri, even though it has been like this for a long time.
More a curiosity than a grave functionality issue...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11770
Reviewed By: ezyang
Differential Revision: D9884767
Pulled By: soumith
fbshipit-source-id: 9bedde2e94ade281ab1ecc2293ca3cb1a0107387