pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
vishwakftw	bcb5fd8f06	Port symeig to ATen and enable batching of inputs (#21858 ) Summary: Changelog: - Port `symeig` from TH/THC to ATen - Enable batching of matrix inputs for `symeig` - Modify derivative computation based on batching - Update docs to reflect the change Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858 Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work. Differential Revision: D15981789 Pulled By: soumith fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae	2019-06-25 12:13:27 -07:00
Michael Suo	e016a424ef	Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter Differential Revision: D15944971 Original commit changeset: 53473c370813 fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10	2019-06-24 09:41:33 -07:00
Brian Vaughan	142361a7e4	merge interfaces that have an optional scalartype parameter (#21088 ) Summary: This change is backwards incompatible in C++ only on mean(), sum(), and prod() interfaces that accepted either of: ``` Tensor sum(IntArrayRef dim, bool keepdim=false) const; Tensor sum(IntArrayRef dim, ScalarType dtype) const; ``` but now to specify both the dim and dtype will require the keepdim parameter: ``` Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const; ``` [xla ci] Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088 Reviewed By: ailzhang Differential Revision: D15944971 Pulled By: nairbv fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74	2019-06-24 07:17:58 -07:00
Jerry Zhang	88921feafd	change return type for q_scale and q_zero_point (#21709 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709 Change the return type from Scalar to double/int64_t so we don't need to do conversion when we call other quantize related aten functions Differential Revision: D15793003 fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04	2019-06-20 20:30:39 -07:00
Jerry Zhang	fa5263af2c	Add set_quantizer_ for QTensor (#21852 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852 To enable change of q_scale and q_zero_point in `copy_` Differential Revision: D15793427 fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be	2019-06-18 19:50:12 -07:00
Jerry Zhang	94f903654c	Add qscheme() method (#20608 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608 Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc. Reviewed By: zafartahirov Differential Revision: D15364354 fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7	2019-06-14 16:29:29 -07:00
Richard Zou	0d6eb209e6	Expose torch.empty(sizes, *, names, ...) to Python (#21648 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648 ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694 Differential Revision: D15804482 Pulled By: zou3519 fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83	2019-06-14 11:52:47 -07:00
Richard Zou	5c0e058950	Implement at::empty(IntArrayRef, DimnameList?, TensorOptions) in aten (#21647 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21647 ghimport-source-id: 1db4ec31f047f7854a39c28e2b38918dc6b44f42 Differential Revision: D15804425 Pulled By: zou3519 fbshipit-source-id: 575cc3de09287efe75e7052df129626748208d0d	2019-06-13 20:38:19 -07:00
Brennan Vincent	f4f32cecfd	numpy like nonzero (called nonzero_tuple) (#20293 ) Summary: No performance degradation compared to Numpy when indexing: ``` In [15]: x=torch.randn((1000,1000)) In [16]: %timeit x[x.nonzero_tuple()] 4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [17]: y=x.numpy() In [18]: %timeit y[y.nonzero()] 14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [20]: x=x.t() In [22]: %timeit x[x.nonzero_tuple()] 9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [24]: y=x.numpy() In [25]: %timeit y[y.nonzero()] 16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293 Differential Revision: D15358754 Pulled By: umanwizard fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1	2019-06-06 12:50:59 -07:00
Brennan Vincent	e268fc97c3	Re-add Tensor.T (#21175 ) Summary: Something flaky is going on with `test_inplace_view_saved_output` on Windows. With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted. Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output... I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing. Test Plan: I will wait to make sure the Windows CI suite passes before landing this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175 Differential Revision: D15566970 Pulled By: umanwizard fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3	2019-06-04 17:38:25 -07:00
Edward Yang	0544a491d5	Revert D15499749: [pytorch][PR] Add `Tensor.T` attribute to reverse dimensions Differential Revision: D15499749 Original commit changeset: f3306b496667 fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878	2019-05-29 04:29:48 -07:00
Roy Li	3038cf8eee	Remove THSTensor and SparseTensorRef (#20877 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20877 ghimport-source-id: a07f53ca158f9a3dce7a25ef5a169871e98ea3ea Differential Revision: D15480353 Pulled By: li-roy fbshipit-source-id: 1152dbc4df827ded3be1a57f007a6b7de12f567f	2019-05-29 01:37:03 -07:00
vishwakftw	f6ec464890	Enable batched QR decomposition and add a `some` option (#20689 ) Summary: This PR covers two important points with respect to the QR decomposition: - batching of input matrices (#7500) - adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538) Changelog: - Enable batching for inputs to `torch.qr` - Move QR decomposition implementation to ATen (CPU and CUDA) - Remove existing implementations in TH/THC - Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition - Modify doc strings Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689 Differential Revision: D15529230 Pulled By: soumith fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb	2019-05-28 17:52:37 -07:00
Brennan Vincent	9294de8c9f	Add `Tensor.T` attribute to reverse dimensions (#20598 ) Summary: For compatibility with numpy Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598 Differential Revision: D15499749 Pulled By: umanwizard fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc	2019-05-28 16:59:06 -07:00
Junjie Bai	c9f380df02	Add aten mkldnn linear operator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210 Reviewed By: dzhulgakov Differential Revision: D14901641 fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11	2019-04-26 13:41:57 -07:00
Roy Li	a6811e17c0	Restore copy_ overload with async arg (#19641 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641 ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379 Differential Revision: D15056755 Pulled By: li-roy fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3	2019-04-24 17:51:50 -07:00
Vitaly Fedyunin	d14abe3aff	Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688 ) Summary: Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688 Differential Revision: D15012644 Pulled By: VitalyFedyunin fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd	2019-04-24 15:38:56 -07:00
Wanchao Liang	e9c8f372c4	dispatch max_pools with no indices, expose max_pools to torch namespace (#19449 ) Summary: in functional interfaces we do boolean dispatch, but all to max_pool\d_with_indices. This change it to emit max_pool\d op instead when it's not necessary to expose with_indices ops to different backends (for jit). It also bind max_pool\d to the torch namespace, which is the same behavior with avg_pool\d Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449 Differential Revision: D15016839 Pulled By: wanchaol fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c	2019-04-23 11:20:05 -07:00
Vitaly Fedyunin	1c5073fb4b	Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952 ) Summary: Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary. Supported functions: ```python torch.rand_like(t, pin_memory=True) torch.randn_like(t, pin_memory=True) torch.empty_like(t, pin_memory=True) torch.full_like(t, 4, pin_memory=True) torch.zeros_like(t, pin_memory=True) torch.ones_like(t, pin_memory=True) torch.tensor([10,11], pin_memory=True) torch.randn(3, 5, pin_memory=True) torch.rand(3, pin_memory=True) torch.zeros(3, pin_memory=True) torch.randperm(3, pin_memory=True) torch.empty(6, pin_memory=True) torch.ones(6, pin_memory=True) torch.eye(6, pin_memory=True) torch.arange(3, 5, pin_memory=True) ``` Part of the bigger: `Remove Storage` plan. Now compatible with both torch scripts: ` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)` and ` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))` Same checked for all similar functions `rand_like`, `empty_like` and others It is fixed version of #18455 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952 Differential Revision: D14801792 Pulled By: VitalyFedyunin fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba	2019-04-16 11:06:15 -07:00
Xiang Gao	ea2405c7dc	Add torch.unique_consecutive (#19060 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/19045 Please review: VitalyFedyunin ngimel This is independent on the #18649 series. This will cause merge conflicts in #18649 series, but please merge this first, and I will resolve the merge conflicts there. The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`. But not at `torch.unique` yet. I will take care of the API after #18649 series get merged completely. Benchmark on a tensor of shape `torch.Size([15320, 2])`: ```python print(torch.__version__) print() a = tensor.sort().values.to('cpu') print('cpu, sorted_input=False:') %timeit torch._unique2_temporary_will_remove_soon(a) %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True) %timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True) %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True) print() print('cpu, sorted_input=True:') %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True) %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True) %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True) %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True) print() a = a.to('cuda') print('cuda, sorted_input=False:') %timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize() print() print('cuda, sorted_input=True:') %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize() ``` ``` 1.1.0a0+2addccc cpu, sorted_input=False: 340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) cpu, sorted_input=True: 32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cuda, sorted_input=False: 213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cuda, sorted_input=True: 45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` ```python print(torch.__version__) print() a1, a2 = tensor.unbind(1) indices = (a1 * tensor.max() + a2).sort().indices a = tensor.index_select(0, indices).to('cpu') print('cpu, sorted_input=False:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True) print() print('cpu, sorted_input=True:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True) print() a = a.to('cuda') print('cuda, sorted_input=False:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize() print() print('cuda, sorted_input=True:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize() ``` ``` cpu, sorted_input=False: 55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) cpu, sorted_input=True: 54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) cuda, sorted_input=False: 171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cuda, sorted_input=True: 59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060 Differential Revision: D14866909 Pulled By: ezyang fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f	2019-04-10 07:36:08 -07:00
Vitaly Fedyunin	b7c830b916	Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854 ) Summary: This reverts commit `c484cf43a0`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854 Differential Revision: D14778393 Pulled By: VitalyFedyunin fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452	2019-04-05 06:25:33 -07:00
Jerry Zhang	dfcd7b0185	QTensor (#18230 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230 Implementing minimum qtensor API to unblock other workstreams in quantization Changes: - Added Quantizer which represents different quantization schemes - Added qint8 as a data type for QTensor - Added a new ScalarType QInt8 - Added QTensorImpl for QTensor - Added following user facing APIs - quantize_linear(scale, zero_point) - dequantize() - q_scale() - q_zero_point() Reviewed By: dzhulgakov Differential Revision: D14524641 fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c	2019-04-03 13:17:11 -07:00
Vitaly Fedyunin	c484cf43a0	Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455 ) Summary: Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary. Supported functions: ```python torch.rand_like(t, pin_memory=True) torch.randn_like(t, pin_memory=True) torch.empty_like(t, pin_memory=True) torch.full_like(t, 4, pin_memory=True) torch.zeros_like(t, pin_memory=True) torch.ones_like(t, pin_memory=True) torch.tensor([10,11], pin_memory=True) torch.randn(3, 5, pin_memory=True) torch.rand(3, pin_memory=True) torch.zeros(3, pin_memory=True) torch.randperm(3, pin_memory=True) torch.empty(6, pin_memory=True) torch.ones(6, pin_memory=True) torch.eye(6, pin_memory=True) torch.arange(3, 5, pin_memory=True) ``` Part of the bigger: `Remove Storage` plan. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455 Reviewed By: ezyang Differential Revision: D14672084 Pulled By: VitalyFedyunin fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124	2019-04-02 08:48:19 -07:00
Vishwak Srinivasan	d859031ebf	Rename `btrifact*` to `lu` (#18435 ) Summary: Changelog: - Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`). - Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple. - Rename all tests, fix callsites - Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage. - Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435 Differential Revision: D14680352 Pulled By: soumith fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8	2019-03-29 00:34:30 -07:00
vishwakftw	291746f110	Rename trtrs to triangular_solve (#18213 ) Summary: Changelog: - Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`. - Rename all tests, fix callsites - Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage. - Move `isnan` to _torch_docs.py - Remove unnecessary imports Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213 Differential Revision: D14566902 Pulled By: ezyang fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f	2019-03-21 14:27:21 -07:00
Gao, Xiang	7e6220393f	Cleanup arg{min, max} (#17103 ) Summary: Why do we need this workaround? `PythonArgParser` handles these two cases well. The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was: > Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two function with same name but different C++ signature. Will keep an eye on the CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103 Differential Revision: D14523503 Pulled By: VitalyFedyunin fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80	2019-03-20 16:28:27 -07:00
Vishwak Srinivasan	421b508d55	Rename gesv to solve (#18060 ) Summary: Changelog: - Renames `gesv` to `solve` to remain consistent with `cholesky_solve`. - Rename all tests, fix callsites - Create a tentative alias for `solve` under the name `gesv`, and add a deprecated warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060 Differential Revision: D14503117 Pulled By: zou3519 fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5	2019-03-18 16:04:24 -07:00
vishwakftw	f268370b42	torch.btrifact for tensors with greater than 3 dimensions (#14964 ) Summary: Motivation: - Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check: > AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes()); What is in this PR?: - Move `btrifact` to ATen - Remove relation to TH/THC. - Handle tensors with more than three dimensions - Tests - Docs modifications: added a note about the non-pivoting variant. [blocked due to old magma-cuda binaries] Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964 Differential Revision: D14405106 Pulled By: soumith fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184	2019-03-12 01:46:07 -07:00
Xiang Gao	2e5a8cee82	Customize the printing of namedtuple return (#17136 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/17112 ```python print("good", torch.randn(5,5,5).max(1)) print("terrible", torch.randn(5,5,10).max(1)) print("not as good", torch.randn(5,5,500).max(1)) print ("old behaviour = gold standard") print(tuple(torch.randn(5,5,5).max(1))) print(tuple(torch.randn(5,5,10).max(1))) print(tuple(torch.randn(5,5,500).max(1))) ``` now gives ``` >>> import torch >>> print("good", torch.randn(5,5,5).max(1)) good torch.return_types.max( values=tensor([[ 1.2821, 1.8063, 1.8075, 1.3082, -0.1267], [ 0.3437, 0.7353, 1.2619, 0.7557, 1.6662], [ 0.8583, 1.8906, 1.0246, 1.7598, 1.1184], [ 1.7821, 0.0230, 0.9452, 1.0318, 1.0823], [ 0.4116, -0.0379, -0.1843, 1.4129, 1.8796]]), indices=tensor([[4, 4, 3, 2, 1], [1, 2, 4, 1, 1], [2, 4, 0, 2, 1], [0, 2, 0, 3, 1], [0, 4, 4, 4, 4]])) >>> print("terrible", torch.randn(5,5,10).max(1)) terrible torch.return_types.max( values=tensor([[ 2.1272, 1.3664, 2.2067, 1.3974, -0.0883, 1.2505, 1.0074, 1.1217, 0.3849, 0.6936], [ 0.6288, -0.4560, 1.2748, 1.5482, 1.2777, 1.6874, 0.7151, 0.6041, 1.3572, 1.6232], [ 1.6703, 1.0075, 1.6480, 2.2839, 1.3390, 0.4938, 1.6449, 1.7628, 0.8141, 2.5714], [ 0.7079, 1.8677, 3.2478, 1.5591, 2.4870, 0.8635, -0.1450, 1.6923, 1.4924, 1.6298], [ 2.4056, 0.8002, 0.9317, 0.7455, 0.7866, 2.1191, 0.3492, 1.2095, 1.8637, 1.7470]]), indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4], [4, 2, 2, 1, 2, 2, 3, 1, 1, 3], [0, 3, 3, 0, 2, 1, 4, 1, 0, 1], [4, 1, 3, 0, 3, 2, 0, 1, 4, 3], [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]])) >>> print("not as good", torch.randn(5,5,500).max(1)) not as good torch.return_types.max( values=tensor([[ 0.3877, 0.7873, 1.8701, ..., 0.5971, 1.6103, -0.3435], [ 1.1300, 2.2418, 1.4239, ..., 1.3943, 0.3872, 1.6475], [ 2.0656, 1.3136, 0.9896, ..., 2.3918, 0.8226, 1.0517], [ 1.1054, 0.9945, 1.0561, ..., 2.1039, 1.1524, 3.0304], [ 1.5041, 2.2809, 1.0883, ..., 0.8504, 2.4774, 1.1041]]), indices=tensor([[4, 3, 1, ..., 1, 4, 0], [4, 4, 4, ..., 3, 0, 3], [3, 0, 1, ..., 2, 2, 4], [0, 1, 1, ..., 4, 2, 2], [1, 0, 4, ..., 2, 0, 2]])) >>> print ("old behaviour = gold standard") old behaviour = gold standard >>> print(tuple(torch.randn(5,5,5).max(1))) (tensor([[ 1.1908, 1.1807, 1.3151, 1.7184, 0.3556], [ 0.3798, 0.9213, 0.3001, 1.3087, 2.2419], [ 1.4233, 1.4814, 1.9900, 1.7744, 1.3059], [ 1.0026, -0.0330, 1.3061, 1.8730, 2.0685], [ 1.3041, 1.6458, 1.3449, 1.8948, 3.6206]]), tensor([[0, 4, 3, 4, 0], [1, 1, 4, 0, 4], [4, 1, 0, 3, 3], [1, 2, 1, 4, 0], [3, 3, 0, 3, 3]])) >>> print(tuple(torch.randn(5,5,10).max(1))) (tensor([[-0.1232, 0.8275, 0.6732, 1.1223, 0.8247, 1.2851, 1.6009, 1.9979, 1.9109, 0.7313], [ 0.2260, 0.5922, 1.6928, 0.6024, 2.1158, 3.0619, 0.5653, 0.7426, 0.8316, 0.6346], [ 0.4319, 0.2231, 0.5255, 1.7620, 1.1657, 0.8875, 0.5782, 0.6506, 0.5032, 1.7097], [ 0.4137, 1.7265, 1.4260, 2.0301, 1.2244, 0.7128, 2.6345, 0.7230, 1.3553, 1.6508], [ 1.0684, 1.7195, 1.4068, 0.7076, -0.0242, 0.8474, 0.8754, 1.7108, 0.2188, 1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0], [1, 4, 0, 0, 3, 2, 0, 0, 3, 3], [2, 3, 1, 1, 4, 0, 1, 4, 4, 4], [0, 4, 1, 3, 2, 0, 2, 0, 3, 1], [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]])) >>> print(tuple(torch.randn(5,5,500).max(1))) (tensor([[0.9395, 1.5572, 1.8797, ..., 2.0494, 0.8202, 0.9623], [1.7937, 0.7225, 1.8836, ..., 0.7927, 1.4976, 1.1813], [0.8558, 1.6943, 1.4192, ..., 0.8327, 1.9661, 0.4197], [1.2993, 1.4995, 0.9357, ..., 0.7810, 1.3030, 2.6216], [1.4206, 1.8315, 1.0338, ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3, ..., 3, 0, 2], [0, 1, 0, ..., 0, 4, 3], [3, 4, 3, ..., 3, 0, 0], [3, 2, 3, ..., 1, 2, 1], [1, 2, 4, ..., 3, 1, 3]])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136 Differential Revision: D14250021 Pulled By: VitalyFedyunin fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93	2019-02-28 13:07:26 -08:00
Christian Puhrsch	e47aeede32	Use name for output variables instead of out in JIT (#17386 ) Summary: This adds 88 matches. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17386 Differential Revision: D14179139 Pulled By: cpuhrsch fbshipit-source-id: 2c3263b8e4d084db84791e53290e8c8b1b7aecd5	2019-02-27 14:03:33 -08:00
Adam Paszke	7157be8622	Add special ops for BatchNorm symbolic differentiation (#15403 ) Summary: The main problem there is with differentiating batch norm statically is that we make a lot of complex run-time decisions about the backend we choose. Then, the autograd derivatives are implemented for every backend separately, which makes sense, because they might be saving buffers containing different values. To resolve the issue, the forward op returns an index of the chosen backend, and the backward function takes it as an argument, such that it knows how to interpret the buffers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403 Differential Revision: D14098815 Pulled By: ailzhang fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad	2019-02-15 15:40:28 -08:00
Edward Yang	4404762d7d	Rename IntList to IntArrayRef. (#16751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751 This was made more complicated by the fact that ivalue::IntList is a thing. So I had to fix all of the sites where we referring to IValue post facto. The following codemods were run, in this order: ``` codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>' codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>' ``` Some manual fixups were done afterwards; they can be reviewed separately at https://github.com/pytorch/pytorch/pull/16752 Reviewed By: dzhulgakov Differential Revision: D13954363 fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64	2019-02-05 14:54:34 -08:00
Roy Li	4c803f4ebd	Expose backend extensions to python Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16582 Reviewed By: gchanan Differential Revision: D13887539 fbshipit-source-id: 8755babf2e3e849af974655f2f3a91740efe977e	2019-02-01 11:00:18 -08:00
Lu Fang	b1b00f329e	Fix the flake8 linter Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549 Reviewed By: bddppq Differential Revision: D13877435 Pulled By: houseroad fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540	2019-01-30 09:36:00 -08:00
Thomas Viehmann	6a6983ed7f	create type hint stub files for module torch (#12500 ) Summary: We have: - This is an initial stab at creating a type stub `torch/__init__.pyi` . - This is only tested on Python 3, since that's the only Python version mypy works on. - So far, we only aim at doing this for torch functions and torch.Tensor. - Quite a few methods and functions have to be typed manually. These are done in `torch/__init__.pyi.in` For me, PyCharm (the non-paid one) didn't seem to indicate errors in the .pyi when opening and seemed to be able to get the type hint for the few functions I tried, but I don't use PyCharm for my usual PyTorch activities, so I didn't extensively try this out. An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1). Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500 Differential Revision: D13695553 Pulled By: ezyang fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21	2019-01-29 12:14:17 -08:00
Wanchao Liang	c6503a4205	Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable Differential Revision: D13540278 Original commit changeset: 3768c76a90b0 fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7	2019-01-22 12:22:40 -08:00
Xiang Gao	c5e1b469be	Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429 ) Summary: Partially fixes: https://github.com/pytorch/pytorch/issues/394 Implementation detail: Codegen is modified to generate codes that looks like below: ```C++ static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs) { HANDLE_TH_ERRORS static PythonArgParser parser({ "svd(Tensor input, bool some=True, bool compute_uv=True, , TensorList[3] out=None)", }, /traceable=*/true); ParsedArgs<6> parsed_args; auto r = parser.parse(args, kwargs, parsed_args); static PyStructSequence_Field fields0[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} }; static PyStructSequence_Desc desc0 = { "torch.return_types.svd_out", nullptr, fields0, 3 }; static PyTypeObject type0; static bool namedtuple_type_initialized0 = false; if (!namedtuple_type_initialized0) { PyStructSequence_InitType(&type0, &desc0); namedtuple_type_initialized0 = true; } static PyStructSequence_Field fields1[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} }; static PyStructSequence_Desc desc1 = { "torch.return_types.svd", nullptr, fields1, 3 }; static PyTypeObject type1; static bool namedtuple_type_initialized1 = false; if (!namedtuple_type_initialized1) { PyStructSequence_InitType(&type1, &desc1); namedtuple_type_initialized1 = true; } if (r.idx == 0) { if (r.isNone(3)) { return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2))); } else { auto results = r.tensorlist_n<3>(3); return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2])); } } Py_RETURN_NONE; END_HANDLE_TH_ERRORS } ``` Types are defined as static member of `THPVariable_${op_name}` functions, and initialized at the first time the function is called. When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when see things like `-> (Tensor t1, ...)`. These field names will be the field names of namedtuple. The class of namedtuples will be named `torch.return_types.${op_name}`. In some python 2, `PyStructSequence` is not a subtype of tuple, so we have to create some functions to check if an object is a tuple or namedtuple for compatibility issue. Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuple. Tests are added for these two operators to see if the return value works as expected. Docs for these two ops are also updated to explicitly mention the return value is a namedtuple. More ops will be added in later PRs. There is some issue with Windows build of linker unable to resolve `PyStructSequence_UnnamedField`, and some workaround is added to deal with this case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429 Differential Revision: D13709678 Pulled By: ezyang fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf	2019-01-22 11:12:18 -08:00
Xiang Gao	bed7db7772	Unhide unique from C++, make unique partially scriptable (#15256 ) Summary: This PR does three things: ~~Allow `int64_t?` in function schema, which provide an elegant way of implementing null-able int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~ ~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~ ~~Example:~~ ```yaml - func: myop(Tensor self, int64_t? dim=None) -> Tensor variants: function ``` ~~cc: zou3519~~ Edit: implemented in https://github.com/pytorch/pytorch/pull/15234 Previously tried in https://github.com/pytorch/pytorch/pull/12064. There was a problem that C++ does not have kwarg support, which makes it confusing to know whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`. Now I think I have a better idea on how to implement this: there are two ATen operators: `unique` and `unique_dim`. `unique` has the same signature as in python, and exported to both python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)`, and only exported to C++, which could be used more naturally for a C++ user. Differential Revision: D13540278 Pulled By: wanchaol fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd	2019-01-21 12:31:37 -08:00
James Reed	acbd9c49b0	Direct FBGEMM integraton into ATen (#13777 ) Summary: This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation: 1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel. 2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value. 3) Biases are unquantized 4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops must be run with FBGEMM The API can be seen in the added test case. Highlights are: 1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above. 2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777 Differential Revision: D13383276 Pulled By: jamesr66a fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344	2018-12-21 10:35:51 -08:00
Wanchao Liang	b89b46abfb	Remove python_default_init from ATen and use Optional (#15234 ) Summary: Optional clean up. This PR remove python_default_init from the yaml files, and the code-gen, and utilize optional type to do the work. This also fix the bug in the #13149 to correctly adopt as_strided backward. Fixes #9941 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15234 Differential Revision: D13502044 Pulled By: wanchaol fbshipit-source-id: 774b61fc4414482cf11d56e22bd0275aefb352a4	2018-12-19 21:38:50 -08:00
Tugrul Ates	560530aeec	Optional ScalarType support for native functions & JIT (#15154 ) Summary: For #6593 and #9515 This completes the support for optional<ScalarType> in native, JIT and autograd. Note: Mostly following the existing implementation for optional<Scalar> that was added in https://github.com/pytorch/pytorch/pull/12582. This PR introduces a way to make functions accept an optional dtype and it will unblock #9515 by allowing the `dtype` param for type promotion interface: ``` func: name(inputs, , ScalarType? dtype=None, Casting casting=same_kind) ``` An alternative approach could have been using `ScalarType::Undefined` for the same purpose but without optional, though it would have been a bit hacky. ``` func: name(inputs, , ScalarType dtype=Undefined, Casting casting=same_kind) ``` Here's an example use of this in action: `971f69eac6` There are already a bunch of native functions that were getting optional `dtype` through function overloading. https://github.com/pytorch/pytorch/pull/15133 is the attempt to migrate all of those. I will send those changes separately after this since some functions (e.g. sum) need quite a bit of change in the codebase. See the commits over there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15154 Differential Revision: D13457760 Pulled By: tugrulates fbshipit-source-id: 706134f0bd578683edd416b96329b49a1ba8ab48	2018-12-19 10:45:35 -08:00
Shen Li	90f9e8103c	Implement torch.tril_indices and torch.triu_indices (#12653 ) (#14904 ) Summary: This is an optimized implementation that does the following: 1. created an empty Tensor of correct size. 2. fill the Tensor with correct values. The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors. 1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations. 2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration. 3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it. <img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png"> NOTE: This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following: ```python x = torch.ones(3, 3) i = torch.tril_indices(3, 3) x[i] # need to first convert the 2D tensor into a tuple of two 1D tensors. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904 Reviewed By: zou3519 Differential Revision: D13433027 Pulled By: mrshenli fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a	2018-12-12 15:40:14 -08:00
Peter Goldsborough	875be849e9	Rename _local_scalar to item() (#13676 ) Summary: Make `at::_local_scalar` more "official" by renaming it to `item()`. gchanan Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676 Differential Revision: D13003020 Pulled By: goldsborough fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4	2018-12-04 13:19:26 -08:00
Wei Yang	12558019a8	backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345 ) Summary: - introduce `sparse.addmm()` with backward for sparse matrix input for https://github.com/pytorch/pytorch/issues/12308 Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345 Differential Revision: D13094070 Pulled By: weiyangfb fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461	2018-11-26 17:47:48 -08:00
Gregory Chanan	b6edd7bbb4	Support 'python_module' of 'nn' in native functions. (#14126 ) Summary: Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126 Reviewed By: ezyang Differential Revision: D13109975 Pulled By: gchanan fbshipit-source-id: 0b29dc8cf222d25db14da7532d8dc096a988a0ec	2018-11-19 14:13:25 -08:00
vishwakftw	a30ade1139	Batched cholesky decomposition (#14017 ) Summary: Implements batching for the Cholesky decomposition. Performance could be improved with a dedicated batched `tril` and `triu` op, which is also impeding autograd operations. Changes made: - batching code - tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`. - doc string modification - autograd modification - removal of `_batch_potrf` in `MultivariateNormal`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017 Differential Revision: D13087945 Pulled By: ezyang fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e	2018-11-17 10:49:15 -08:00
Gregory Chanan	ce6192a21f	Don't python bind _thnn_ functions. (#14101 ) Summary: This is needed for moving nn functions to native functions, but since some functions are already named this way, I'm going to stop binding pre-emptively so we can check if there are any current dependencies. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14101 Differential Revision: D13102219 Pulled By: gchanan fbshipit-source-id: 6bbcca33a03ab1bf648f1b73cadfe84339fa3050	2018-11-16 17:18:08 -08:00
Vishwak Srinivasan	7b2fb012a8	Make potrs batched (#13453 ) Summary: - This is a straightforward PR, building up on the batch inverse PR, except for one change: - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty. Billing of changes: - Add batching for `potrs` - Add relevant tests - Modify doc string Minor changes: - Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`. - Add test for CUDA `potrs` (2D Tensor op) - Move the batched shape checking to `LinearAlgebraUtils.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453 Reviewed By: soumith Differential Revision: D12942039 Pulled By: zou3519 fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35	2018-11-09 15:16:26 -08:00
vishwakftw	1fe8278559	Batched Inverse (#9949 ) Summary: Complete billing of changes: Related to Batch Inverse: - [x] Add batched inverse (CPU) - [x] Add batched inverse (CUDA) - [x] Modify autograd entry - [x] Add tests - [x] test_autograd - [x] test_cuda - [x] test_torch - [x] Modify docs - [x] Remove `_batch_inverse` in `MultivariateNormal`. - [x] Allow batch matrices as inputs for negative powers in `matrix_power` Miscellaneous modifications: - [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops. - [x] Add a RAII structure for MAGMA queue management. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949 Differential Revision: D10559089 Pulled By: zou3519 fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4	2018-10-27 23:42:46 -07:00
Richard Zou	4870b1b68f	Speed up tensor.resize_(sizes) when tensor has correct size (#12824 ) Summary: While using gbenchmark, I found `tensor.resize_({0})` would take 300ns if tensor already has the correct size. This is important for `at::empty({0})` perf because `at::empty` always calls `resize_`, which in turn is a important for JIT perf: the fusion compiler creates empty tensors and then `resize_`s them to computed sizes. Most of the 300ns is due to DeviceGuard (200ns) Summary of findings: - `at::empty({0}, cuda)`: 851ns - `empty_tensor.resize({0})`: 308ns - `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this next because it impacts `resize_` perf). - vdispatch overhead (`tensor.resize_()` vs `at::native::resize__cuda(tensor)`): ~10ns This PR rips out the TH `resize_` implementation and adds it to ATen with the following modifications: - DeviceGuard used only after the same-size check. - Same-size check rewritten for simplicity. The new check doesn't affect perf. - empty_cpu / empty_cuda avoid the dispatch overhead to tensor.resize_. Timing with this PR: - `at::empty({0}, cuda)`: 363ns - `empty_tensor.resize_({0})`: 17ns Future: - Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes` - Should tell resize_as_ to use the new resize_ implementation... (because resize_as_ is in TH, it is calling the old TH resize_) Pull Request resolved: https://github.com/pytorch/pytorch/pull/12824 Differential Revision: D10449209 Pulled By: zou3519 fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6	2018-10-25 21:09:41 -07:00

1 2 3

128 Commits