pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Edward Yang	173f224570	Turn on F401: Unused import warning. (#18598 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598 ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a Stack from [ghstack](https://github.com/ezyang/ghstack): * #18598 Turn on F401: Unused import warning. This was requested by someone at Facebook; this lint is turned on for Facebook by default. "Sure, why not." I had to noqa a number of imports in __init__. Hypothetically we're supposed to use __all__ in this case, but I was too lazy to fix it. Left for future work. Be careful! flake8-2 and flake8-3 behave differently with respect to import resolution for # type: comments. flake8-3 will report an import unused; flake8-2 will not. For now, I just noqa'd all these sites. All the changes were done by hand. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14687478 fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3	2019-03-30 09:01:17 -07:00
Vishwak Srinivasan	e73be58ff7	Rename `btriunpack` to `lu_unpack` (#18529 ) Summary: Changelog: - Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface. - Rename all relevant tests, fix callsites - Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529 Differential Revision: D14683161 Pulled By: soumith fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1	2019-03-29 13:01:30 -07:00
Vishwak Srinivasan	d859031ebf	Rename `btrifact*` to `lu` (#18435 ) Summary: Changelog: - Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`). - Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes an infos tensor in the tuple. - Rename all tests, fix callsites - Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage. - Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435 Differential Revision: D14680352 Pulled By: soumith fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8	2019-03-29 00:34:30 -07:00
Edward Yang	81e030d9a6	Upgrade flake8-bugbear to master, fix the new lints. (#18507 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507 ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18507 Upgrade flake8-bugbear to master, fix the new lints. It turns out Facebook is internally using the unreleased master flake8-bugbear, so upgrading it grabs a few more lints that Phabricator was complaining about but we didn't get in open source. A few of the getattr sites that I fixed look very suspicious (they're written as if Python were a lazy language), but I didn't look more closely into the matter. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14633682 fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8	2019-03-27 08:07:41 -07:00
Xiang Gao	2ba41c5550	Add some missing docs for tensor methods and attributes, new unittest to enforce tensors.rst no longer miss anything (#16057 ) Summary: This depends on https://github.com/pytorch/pytorch/pull/16039 This prevents people (reviewers, PR authors) from forgetting to add things to `tensors.rst`. When something new is added to `_tensor_doc.py` or `tensor.py` but is intentionally left out of `tensors.rst`, people should manually whitelist it in `test_docs_coverage.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057 Differential Revision: D14619550 Pulled By: ezyang fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc	2019-03-26 18:05:56 -07:00
Soumith Chintala	66628f78b7	Revert D14605905: [pytorch][PR] Add return_counts to torch.unique Differential Revision: D14605905 Original commit changeset: 555f5a12a8e2 fbshipit-source-id: c7874f5987893e956c022180a37763d88bba38db	2019-03-26 17:18:01 -07:00
Tongzhou Wang	5292685d2f	Improve numerical precision of (s)logdet (#18449 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449 Differential Revision: D14611638 Pulled By: soumith fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf	2019-03-26 15:32:14 -07:00
Soumith Chintala	436723122e	fix arange shape issue inconsistency across cpu and cuda (#18462 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/18363 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18462 Differential Revision: D14620263 Pulled By: soumith fbshipit-source-id: 223524cdda2f5d55c2ca8d4cdcf6f7a05a6c15eb	2019-03-26 15:27:24 -07:00
Xiang Gao	5bff395a82	Namedtuple return for solve, slogdet, sort, topk (#17093 ) Summary: More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~ cc: ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093 Differential Revision: D14620068 Pulled By: ezyang fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237	2019-03-26 12:39:08 -07:00
Iurii Zdebskyi	1a742075ee	Resolving comments from Bool Tensor for CPU PR (#18165 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165 ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18166 Bool Tensor for CUDA * #18165 Resolved comments from Bool Tensor for CPU PR This is a follow-up PR that resolves some additional feedback on one of the previous Bool Tensor PRs. gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies: [utils/python_scalars.h] why is this converting from uint8_t and not bool? (comment?) When I was adding this, I was testing by creating a tensor and then calling its .tolist(). It worked equally well for bool and uint8_t, so I left uint8_t because I thought it made more sense given that we are calling PyBool_FromLong. Changing it to bool. [ATen/Dispatch.h] better name? Fixed. [test/test_torch.py] what about other factories, such as full? (and more). There is a test that goes through the factory methods - test_tensor_factories_empty. I added some bool cases above it and added a comment that once CUDA is done, I will unite them so that the test iterates not just between CUDA and CPU but also over all types. Adding all bool cases now. Will unite in CUDA PR. [generic/THTensorMath.h] any changes in this file actually needed? Bad merge. Fixed. [TH/THTensor.h] this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool? Added [c10/core/ScalarType.h] I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without for a single case which fails the build with errors: _./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value’) return (this) insertConstant(rhs);_ Differential Revision: D14605105 fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d	2019-03-26 09:59:34 -07:00
vishwakftw	5e462a3ed6	Introduce SobolEngine (#10505 ) Summary: `SobolEngine` is a quasi-random sampler used to sample points evenly between [0,1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111. Documentation has been added, tests have been added based on Balandat's references. The implementation is an optimized / tensor-ized implementation of Balandat's implementation in Cython as provided in #9332. This closes #9332. cc: soumith Balandat Pull Request resolved: https://github.com/pytorch/pytorch/pull/10505 Reviewed By: zou3519 Differential Revision: D9330179 Pulled By: ezyang fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d	2019-03-26 07:53:07 -07:00
Xiang Gao	e2730ddb21	Add return_counts to torch.unique (#18391 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/12598 This PR was originally authored by ptrblck at https://github.com/pytorch/pytorch/pull/15495, but since there was no update for months after the change request, I cloned that branch and resolved the code reviews here. Hope everything is good now. In particular, the implementation of count is changed from ptrblck's original algorithm to the one ngimel suggested, i.e. using `unique_by_key` and `adjacent_difference`. The current implementation of `_unique_dim` is VERY slow for computing inverse index and counts, see https://github.com/pytorch/pytorch/issues/18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is. cc: ptrblck ezyang ngimel colesbury Pull Request resolved: https://github.com/pytorch/pytorch/pull/18391 Reviewed By: soumith Differential Revision: D14605905 Pulled By: VitalyFedyunin fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce	2019-03-25 20:38:17 -07:00
Edward Yang	50df3e5e2e	Add ability to query if built with CUDA and MKL-DNN. (#18362 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362 ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f Stack from [ghstack](https://github.com/ezyang/ghstack): * #18242 Test running a CUDA build on CPU machine. * #18362 Add ability to query if built with CUDA and MKL-DNN. Fixes #18108. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14584430 fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473	2019-03-25 10:39:09 -07:00
Edward Yang	e3da16a99e	Add test for #17271 (torch.exp incorrect for 2**31 size tensor) (#18292 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18292 ghimport-source-id: a3e96584db0eef7b6202a1211808f9f6e59dd529 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor) * #18291 Correctly call superclass setUp in TestCase subclasses. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14567642 fbshipit-source-id: c60ee7597a86f5d2c5c0b72cb106f17815950427	2019-03-22 07:50:38 -07:00
vishwakftw	291746f110	Rename trtrs to triangular_solve (#18213 ) Summary: Changelog: - Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`. - Rename all tests, fix callsites - Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage. - Move `isnan` to _torch_docs.py - Remove unnecessary imports Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213 Differential Revision: D14566902 Pulled By: ezyang fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f	2019-03-21 14:27:21 -07:00
Edward Yang	549c4da917	Add a decorator for marking slow tests. (#18231 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18231 ghimport-source-id: 78c230f60c41877fe91b89c8c979b160f36f856b Stack from [ghstack](https://github.com/ezyang/ghstack): * #18231 Add a decorator for marking slow tests. The general strategy: - It's a normal skip decorator, which triggers a skip if PYTORCH_TEST_WITH_SLOW is not set. - It also annotates the method in question that says it's slow. We use this to implement a catch-all skipper in setUp that skips all non-slow tests when PYTORCH_TEST_SKIP_FAST is set. I added a little smoketest to test_torch and showed that I get: ``` Ran 432 tests in 0.017s OK (skipped=431) ``` when running with PYTORCH_TEST_WITH_SLOW=1 and PYTORCH_TEST_SKIP_FAST=1 CI integration coming in later patch, as well as nontrivial uses of this decorator. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14544441 fbshipit-source-id: 54435ce4ec827193e019887178c09ebeae3ae2c9	2019-03-21 11:17:34 -07:00
Edward Yang	ba81074c40	Fix B902 lint error: invalid first argument. (#18181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181 ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315 Stack from [ghstack](https://github.com/ezyang/ghstack): * #18184 Fix B903 lint: save memory for data classes with slots/namedtuple * #18181 Fix B902 lint error: invalid first argument. * #18178 Fix B006 lint errors: using mutable structure in default argument. * #18177 Fix lstrip bug revealed by B005 lint A variety of sins were committed: - Some code was dead - Some code was actually a staticmethod - Some code just named it the wrong way - Some code was purposely testing the omitted case Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14530876 fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e	2019-03-21 09:10:28 -07:00
Gao, Xiang	7e6220393f	Cleanup arg{min, max} (#17103 ) Summary: Why do we need this workaround? `PythonArgParser` handles these two cases well. The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was: > Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy. Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two functions with the same name but different C++ signatures. Will keep an eye on the CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103 Differential Revision: D14523503 Pulled By: VitalyFedyunin fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80	2019-03-20 16:28:27 -07:00
Vishwak Srinivasan	a519217ee7	Add batched version of trtrs (#18025 ) Summary: - Remove single batch TH/THC implementations - Remove `_batch_trtrs_lower` from `multivariate_normal` - Add tests for batched behavior - Modify trtrs_backward to accommodate for batched case - Modify docs In a future PR, this will be renamed to `triangular_solve`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18025 Differential Revision: D14523004 Pulled By: ifedan fbshipit-source-id: 11c6a967d107f969b60e5a5c73ce6bb8099ebbe1	2019-03-20 11:11:32 -07:00
vishwakftw	234bb8719a	Add backend checks to solve methods (gesv, cholesky_solve) (#18116 ) Summary: Changelog: - Incorporate a simple backend check in the linearSolveCheckInputs function in LinearAlgebraUtils.h Pull Request resolved: https://github.com/pytorch/pytorch/pull/18116 Differential Revision: D14504469 Pulled By: soumith fbshipit-source-id: 7402b6dbaa8d73048946613b806d54f68bcbd8f4	2019-03-19 10:44:45 -07:00
Vishwak Srinivasan	421b508d55	Rename gesv to solve (#18060 ) Summary: Changelog: - Renames `gesv` to `solve` to remain consistent with `cholesky_solve`. - Rename all tests, fix callsites - Create a tentative alias for `solve` under the name `gesv`, and add a deprecated warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060 Differential Revision: D14503117 Pulled By: zou3519 fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5	2019-03-18 16:04:24 -07:00
Richard Zou	3c977fb7ce	Error out on in-place (unary) ops on tensors that have internal overlap (#17927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17927 ghimport-source-id: 626d321e430b6b5c0ea3aa1eb9df8c1e2d058bf8 Stack: * #17926 Implement at::has_internal_overlap helper function * #17927 Error out on in-place (unary) ops on tensors that have internal overlap On the way to #17935. Works for CPU and CUDA on the following ops: - abs_, acos_, asin_, atan_, ceil_, cos_, erf_, erfc_, exp_, expm1_ - floor_, log_, log10_, log1p_, log2_, round_, rsqrt_, - sin_, sqrt_, tan_, tanh_, trunc_ This PR adds a check to see if the out/result tensor has internal overlap. If it does, then we error out because the result may be incorrect. This is overly conservative; there are some cases where if the result is the same as the input, the inplace operation is OK (such as floor_, round_, and trunc_). However, the current code isn't organized in such a way that this is easy to check, so enabling those will come in the future. Reviewed By: ezyang Differential Revision: D14438871 fbshipit-source-id: 15e12bf1fdb2ab7f74bb806e22bc74840bd6abd1	2019-03-15 07:50:19 -07:00
Richard Zou	a4123decf7	Implement at::has_internal_overlap helper function (#17926 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17926 ghimport-source-id: 9f7572b5d43e474492363fa17dcb86a6c27ca13c Stack: * #17926 Implement at::has_internal_overlap helper function * #17927 Error out on in-place (unary) ops on tensors that have internal overlap On the way to #17935. Checks if a tensor's sizes/strides indicate that multiple elements share the same memory location. This problem in general is hard so at::has_internal_overlap implements two heuristics and avoids solving the general problem: - if a tensor is contiguous, it cannot have internal overlap - if a tensor has any zero strides, it does have internal overlap - otherwise, return MemOverlap::kTooHard to indicate that there might be overlap, but we don't know. Reviewed By: ezyang Differential Revision: D14438858 fbshipit-source-id: 607ab31771315921ab6165b2a1f072ac3e75925a	2019-03-15 07:50:17 -07:00
J M Dieterich	1ba1ca0acb	Update to ROCm2.2 (#18007 ) Summary: ROCm 2.2 was released today. If we respin the CI docker images with the attached, PyTorch/Caffe2 will support ROCm 2.2. Changes necessary: * for the Ubuntu target, HIP PR 934 needs to be applied to fix the forceinline definition. ROCm 2.3 will contain this. * two unit tests prove flaky on different platforms, disable them defensively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18007 Differential Revision: D14473903 Pulled By: bddppq fbshipit-source-id: b1939f11d1c765a3bf71bb244b15f6ceb0e816d3	2019-03-14 18:47:22 -07:00
Lu Fang	f827f1052a	Fix the CI Summary: https://github.com/pytorch/pytorch/pull/17995 's CI has verified it should fix the CI. Reviewed By: bddppq Differential Revision: D14447674 fbshipit-source-id: 50085db9ae7421b5be216ed0a2216234babfdf6c	2019-03-13 17:28:50 -07:00
Guanheng Zhang	26a4c2ada6	Speed up gemm by reordering the for loops (#17730 ) Summary: Optimize the order of the "for" loops. Note: For "transa = true" cases, the order of the "for" loops has been optimized in the original code. Therefore, no significant improvement is observed in those cases (i.e. "transa && transb" and "transa && !transb") mode/opt (i.e. static library) ////////////////////////////////////////////////////////////////////////////// transa && transb after: loops: 2229 x: 128 y: 128 z: 128 time: 2243ns => acceleration multiplier: 0.90 loops: 124 x: 128 y: 1024 z: 128 time: 40381ns => acceleration multiplier: 0.97 loops: 121 x: 1024 y: 128 z: 128 time: 41651ns => acceleration multiplier: 0.96 loops: 15 x: 1024 y: 1024 z: 128 time: 333771ns => acceleration multiplier: 0.98 loops: 4610 x: 128 y: 128 z: 64 time: 1084ns => acceleration multiplier: 0.95 loops: 252 x: 128 y: 1024 z: 64 time: 19860ns => acceleration multiplier: 0.98 loops: 248 x: 1024 y: 128 z: 64 time: 20232ns => acceleration multiplier: 0.98 loops: 30 x: 1024 y: 1024 z: 64 time: 167338ns => acceleration multiplier: 0.99 before: loops: 2468 x: 128 y: 128 z: 128 time: 2026ns loops: 128 x: 128 y: 1024 z: 128 time: 39338ns loops: 126 x: 1024 y: 128 z: 128 time: 39930ns loops: 16 x: 1024 y: 1024 z: 128 time: 327549ns loops: 4840 x: 128 y: 128 z: 64 time: 1033ns loops: 258 x: 128 y: 1024 z: 64 time: 19441ns loops: 252 x: 1024 y: 128 z: 64 time: 19854ns loops: 31 x: 1024 y: 1024 z: 64 time: 166254ns ////////////////////////////////////////////////////////////////////////////// transa && !transb after: loops: 4880 x: 128 y: 128 z: 128 time: 1024ns => acceleration multiplier: 0.98 loops: 638 x: 128 y: 1024 z: 128 time: 7839ns => acceleration multiplier: 1.04 loops: 605 x: 1024 y: 128 z: 128 time: 8276ns => acceleration multiplier: 1.01 loops: 77 x: 1024 y: 1024 z: 128 time: 65713ns => acceleration multiplier: 1.00 loops: 9935 x: 128 y: 128 z: 64 time: 503ns => acceleration multiplier: 1.00 loops: 
1252 x: 128 y: 1024 z: 64 time: 3994ns => acceleration multiplier: 1.00 loops: 1183 x: 1024 y: 128 z: 64 time: 4226ns => acceleration multiplier: 0.98 loops: 153 x: 1024 y: 1024 z: 64 time: 32766ns => acceleration multiplier: 0.99 before: loops: 4985 x: 128 y: 128 z: 128 time: 1003ns loops: 615 x: 128 y: 1024 z: 128 time: 8140ns loops: 599 x: 1024 y: 128 z: 128 time: 8357ns loops: 76 x: 1024 y: 1024 z: 128 time: 65934ns loops: 9897 x: 128 y: 128 z: 64 time: 505ns loops: 1248 x: 128 y: 1024 z: 64 time: 4008ns loops: 1203 x: 1024 y: 128 z: 64 time: 4159ns loops: 154 x: 1024 y: 1024 z: 64 time: 32499ns ////////////////////////////////////////////////////////////////////////////// !transa && transb after: loops: 3919 x: 128 y: 128 z: 128 time: 1276ns => acceleration multiplier: 2.97 loops: 497 x: 128 y: 1024 z: 128 time: 10069ns => acceleration multiplier: 7.85 loops: 449 x: 1024 y: 128 z: 128 time: 11145ns => acceleration multiplier: 4.77 loops: 57 x: 1024 y: 1024 z: 128 time: 88595ns => acceleration multiplier: 7.12 loops: 7575 x: 128 y: 128 z: 64 time: 660ns => acceleration multiplier: 3.00 loops: 967 x: 128 y: 1024 z: 64 time: 5173ns => acceleration multiplier: 7.66 loops: 877 x: 1024 y: 128 z: 64 time: 5702ns => acceleration multiplier: 4.76 loops: 111 x: 1024 y: 1024 z: 64 time: 45232ns => acceleration multiplier: 7.03 before: loops: 1320 x: 128 y: 128 z: 128 time: 3789ns loops: 64 x: 128 y: 1024 z: 128 time: 79061ns loops: 95 x: 1024 y: 128 z: 128 time: 53107ns loops: 8 x: 1024 y: 1024 z: 128 time: 631161ns loops: 2521 x: 128 y: 128 z: 64 time: 1983ns loops: 127 x: 128 y: 1024 z: 64 time: 39604ns loops: 185 x: 1024 y: 128 z: 64 time: 27128ns loops: 16 x: 1024 y: 1024 z: 64 time: 318155ns ////////////////////////////////////////////////////////////////////////////// !transa && !transb after: loops: 3895 x: 128 y: 128 z: 128 time: 1283ns => acceleration multiplier: 1.73 loops: 393 x: 128 y: 1024 z: 128 time: 12746ns => acceleration multiplier: 3.36 loops: 411 x: 
1024 y: 128 z: 128 time: 12170ns => acceleration multiplier: 1.93 loops: 46 x: 1024 y: 1024 z: 128 time: 110116ns => acceleration multiplier: 3.17 loops: 7404 x: 128 y: 128 z: 64 time: 675ns => acceleration multiplier: 1.58 loops: 636 x: 128 y: 1024 z: 64 time: 7872ns => acceleration multiplier: 2.70 loops: 724 x: 1024 y: 128 z: 64 time: 6911ns => acceleration multiplier: 1.32 loops: 73 x: 1024 y: 1024 z: 64 time: 68502ns => acceleration multiplier: 2.49 before: loops: 2253 x: 128 y: 128 z: 128 time: 2219ns loops: 117 x: 128 y: 1024 z: 128 time: 42788ns loops: 214 x: 1024 y: 128 z: 128 time: 23465ns loops: 15 x: 1024 y: 1024 z: 128 time: 349076ns loops: 4694 x: 128 y: 128 z: 64 time: 1065ns loops: 236 x: 128 y: 1024 z: 64 time: 21251ns loops: 549 x: 1024 y: 128 z: 64 time: 9108ns loops: 30 x: 1024 y: 1024 z: 64 time: 170799ns Pull Request resolved: https://github.com/pytorch/pytorch/pull/17730 Differential Revision: D14325149 Pulled By: zhangguanheng66 fbshipit-source-id: a7a5a83890fdf99fee6eb87a3a5060b7b6bd862f	2019-03-13 08:57:26 -07:00
Edward Yang	6466ddbd86	Fix lint in test_torch.py (#17807 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17807 Lint also detected a bug in test_linspace where we weren't actually testing the CUDA case. Differential Revision: D14388241 fbshipit-source-id: e219e46400f4952c6b384bca3baa0724ef94acde	2019-03-12 13:48:28 -07:00
Thomas Viehmann	aba9051a65	kthvalue consistency with sort in the presence of NaN (#17824 ) Summary: This PR causes kthvalue to be consistent with sort (i.e. treat NaN as larger than any number), so that `a.kthvalue(n) == a.sort()[n - 1]`. One drawback is that median with a NaN argument does not return NaN, which is a deviation from NumPy. Thank you, ngimel, for raising this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17824 Differential Revision: D14410092 Pulled By: ezyang fbshipit-source-id: bdec2d8272dc4c65bcf2f9b8995e237774c44c02	2019-03-12 08:49:19 -07:00
vishwakftw	f268370b42	torch.btrifact for tensors with greater than 3 dimensions (#14964 ) Summary: Motivation: - Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check: > AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes()); What is in this PR?: - Move `btrifact` to ATen - Remove relation to TH/THC. - Handle tensors with more than three dimensions - Tests - Docs modifications: added a note about the non-pivoting variant. [blocked due to old magma-cuda binaries] Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964 Differential Revision: D14405106 Pulled By: soumith fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184	2019-03-12 01:46:07 -07:00
Iurii Zdebskyi	4aa22833cf	Bool tensor creation (cpu) (#17376 ) Summary: This PR enables bool tensor creation and some basic operations for the CPU backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this: 1. Storage Implementation [Done] 2. Tensor Creation. a) CPU (this PR) b) CUDA 3. Tensor Conversions. 4. Tensor Indexing. 5. Tensor Operations. 6. Back compatibility related changes. Change: Enable CPU tensors and these operations: - torch.zeros - torch.tensor - torch.ones - torch.randint - torch.full - torch.full_like - torch.empty - torch.empty_like Tested via: 1) unit tests 2) torch.zeros(2,2, dtype=torch.bool) torch.tensor([True, False], dtype=torch.bool) torch.tensor([-1, -1.1, 0, 1, 1.1, 2], dtype=torch.bool) torch.ones([1,2], dtype=torch.bool) torch.randint(10, (2, 2), dtype=torch.bool) torch.full((2, 3), True, dtype=torch.bool) torch.empty(4, dtype=torch.bool) a = torch.tensor([0,0,1]) b = torch.full_like(a, True) Pull Request resolved: https://github.com/pytorch/pytorch/pull/17376 Reviewed By: ezyang Differential Revision: D14375995 Pulled By: izdeby fbshipit-source-id: a65490b5360ee0e6e3accc54ce7e32e49ad2d2a8	2019-03-11 17:03:40 -07:00
Gao, Xiang	11c89dde55	Allow structseq to be input of operators where tuple is expected (#17208 ) Summary: Currently the following code gives an error on python 2 because `ret` is a structseq which is not a tuple ```python ret = a.max(dim=0) ret1 = torch.max(a, dim=0, out=ret) ``` This PR modifies the tuple check in the python arg parser to allow structseq to be input of operators where tuple is expected, which would make the above code work. Depends on: https://github.com/pytorch/pytorch/pull/17136 Partially fixes: https://github.com/pytorch/pytorch/issues/16813 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17208 Differential Revision: D14280198 Pulled By: VitalyFedyunin fbshipit-source-id: beffebfd3951c4f5c7c8fe99a5847616a89491f3	2019-03-11 11:33:35 -07:00
bhushan	b57fe3cc66	Introducing array-like sequence methods __contains__ (#17733 ) Summary: for tensor Fixes: #17000 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17733 Differential Revision: D14401952 Pulled By: soumith fbshipit-source-id: c841b128c5a1fceda1094323ed4ef1d0cf494909	2019-03-11 09:00:16 -07:00
bhushan	6bcff88d3e	Fix log_softmax and softmax if any dimension is 0-d (#17651 ) Summary: - Test added - test_dim_function_empty: softmax and log_softmax on last dimension fixes: #17262 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17651 Differential Revision: D14349009 Pulled By: gchanan fbshipit-source-id: b6f728f5c6be8ae7615749e3f0c201886632923e	2019-03-10 15:25:58 -07:00
vishwakftw	9d70e199f4	Move lerp to ATen, add functionality for tensor weights (#17348 ) Summary: Changelog: - Remove TH/THC bindings - Add tensor weights for `lerp` - Modify derivatives appropriately Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348 Differential Revision: D14355845 Pulled By: soumith fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf	2019-03-07 14:04:58 -08:00
bhushan	886e482776	index operation support for torch.HalfTensor (#17645 ) Summary: - Test cases added 1. indexing for half tensor 2. setting for half tensor fixes #17161 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17645 Differential Revision: D14302069 Pulled By: ezyang fbshipit-source-id: 100f141c07046f200c904e27c5882a9417bccda0	2019-03-06 10:32:35 -08:00
Edward Yang	2ed99fee0d	Revert D13935403: Call c10 cuda op from test_torch Differential Revision: D13935403 Original commit changeset: b2915ec8a366 fbshipit-source-id: 0f3409d5c102d719bc1f0483695aee93e7d613c9	2019-03-01 14:18:26 -08:00
Sebastian Messmer	0a7b2af13b	Call c10 cuda op from test_torch Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16692 Reviewed By: ezyang Differential Revision: D13935403 fbshipit-source-id: b2915ec8a3664bb6e918ed357908cc33d8f9449a	2019-03-01 10:59:19 -08:00
bhushan	a6170573c8	Adding support for 0-d tensor for transpose (.t()) (#17535 ) Summary: - Test updates 1. test_torch: added 0-d test case and t_() test cases 2. test_jit: updated error message for TestAsync.test_async_script_error - Updating documentation for torch.t() Adding information regarding new support of 0-D and 1-D tensors Fixes #17520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17535 Differential Revision: D14269984 Pulled By: gchanan fbshipit-source-id: 38b723f31484be939261c88edb33575d242eca65	2019-03-01 08:45:01 -08:00
Xiang Gao	2e5a8cee82	Customize the printing of namedtuple return (#17136 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/17112 ```python print("good", torch.randn(5,5,5).max(1)) print("terrible", torch.randn(5,5,10).max(1)) print("not as good", torch.randn(5,5,500).max(1)) print ("old behaviour = gold standard") print(tuple(torch.randn(5,5,5).max(1))) print(tuple(torch.randn(5,5,10).max(1))) print(tuple(torch.randn(5,5,500).max(1))) ``` now gives ``` >>> import torch >>> print("good", torch.randn(5,5,5).max(1)) good torch.return_types.max( values=tensor([[ 1.2821, 1.8063, 1.8075, 1.3082, -0.1267], [ 0.3437, 0.7353, 1.2619, 0.7557, 1.6662], [ 0.8583, 1.8906, 1.0246, 1.7598, 1.1184], [ 1.7821, 0.0230, 0.9452, 1.0318, 1.0823], [ 0.4116, -0.0379, -0.1843, 1.4129, 1.8796]]), indices=tensor([[4, 4, 3, 2, 1], [1, 2, 4, 1, 1], [2, 4, 0, 2, 1], [0, 2, 0, 3, 1], [0, 4, 4, 4, 4]])) >>> print("terrible", torch.randn(5,5,10).max(1)) terrible torch.return_types.max( values=tensor([[ 2.1272, 1.3664, 2.2067, 1.3974, -0.0883, 1.2505, 1.0074, 1.1217, 0.3849, 0.6936], [ 0.6288, -0.4560, 1.2748, 1.5482, 1.2777, 1.6874, 0.7151, 0.6041, 1.3572, 1.6232], [ 1.6703, 1.0075, 1.6480, 2.2839, 1.3390, 0.4938, 1.6449, 1.7628, 0.8141, 2.5714], [ 0.7079, 1.8677, 3.2478, 1.5591, 2.4870, 0.8635, -0.1450, 1.6923, 1.4924, 1.6298], [ 2.4056, 0.8002, 0.9317, 0.7455, 0.7866, 2.1191, 0.3492, 1.2095, 1.8637, 1.7470]]), indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4], [4, 2, 2, 1, 2, 2, 3, 1, 1, 3], [0, 3, 3, 0, 2, 1, 4, 1, 0, 1], [4, 1, 3, 0, 3, 2, 0, 1, 4, 3], [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]])) >>> print("not as good", torch.randn(5,5,500).max(1)) not as good torch.return_types.max( values=tensor([[ 0.3877, 0.7873, 1.8701, ..., 0.5971, 1.6103, -0.3435], [ 1.1300, 2.2418, 1.4239, ..., 1.3943, 0.3872, 1.6475], [ 2.0656, 1.3136, 0.9896, ..., 2.3918, 0.8226, 1.0517], [ 1.1054, 0.9945, 1.0561, ..., 2.1039, 1.1524, 3.0304], [ 1.5041, 2.2809, 1.0883, ..., 0.8504, 2.4774, 1.1041]]), indices=tensor([[4, 
3, 1, ..., 1, 4, 0], [4, 4, 4, ..., 3, 0, 3], [3, 0, 1, ..., 2, 2, 4], [0, 1, 1, ..., 4, 2, 2], [1, 0, 4, ..., 2, 0, 2]])) >>> print ("old behaviour = gold standard") old behaviour = gold standard >>> print(tuple(torch.randn(5,5,5).max(1))) (tensor([[ 1.1908, 1.1807, 1.3151, 1.7184, 0.3556], [ 0.3798, 0.9213, 0.3001, 1.3087, 2.2419], [ 1.4233, 1.4814, 1.9900, 1.7744, 1.3059], [ 1.0026, -0.0330, 1.3061, 1.8730, 2.0685], [ 1.3041, 1.6458, 1.3449, 1.8948, 3.6206]]), tensor([[0, 4, 3, 4, 0], [1, 1, 4, 0, 4], [4, 1, 0, 3, 3], [1, 2, 1, 4, 0], [3, 3, 0, 3, 3]])) >>> print(tuple(torch.randn(5,5,10).max(1))) (tensor([[-0.1232, 0.8275, 0.6732, 1.1223, 0.8247, 1.2851, 1.6009, 1.9979, 1.9109, 0.7313], [ 0.2260, 0.5922, 1.6928, 0.6024, 2.1158, 3.0619, 0.5653, 0.7426, 0.8316, 0.6346], [ 0.4319, 0.2231, 0.5255, 1.7620, 1.1657, 0.8875, 0.5782, 0.6506, 0.5032, 1.7097], [ 0.4137, 1.7265, 1.4260, 2.0301, 1.2244, 0.7128, 2.6345, 0.7230, 1.3553, 1.6508], [ 1.0684, 1.7195, 1.4068, 0.7076, -0.0242, 0.8474, 0.8754, 1.7108, 0.2188, 1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0], [1, 4, 0, 0, 3, 2, 0, 0, 3, 3], [2, 3, 1, 1, 4, 0, 1, 4, 4, 4], [0, 4, 1, 3, 2, 0, 2, 0, 3, 1], [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]])) >>> print(tuple(torch.randn(5,5,500).max(1))) (tensor([[0.9395, 1.5572, 1.8797, ..., 2.0494, 0.8202, 0.9623], [1.7937, 0.7225, 1.8836, ..., 0.7927, 1.4976, 1.1813], [0.8558, 1.6943, 1.4192, ..., 0.8327, 1.9661, 0.4197], [1.2993, 1.4995, 0.9357, ..., 0.7810, 1.3030, 2.6216], [1.4206, 1.8315, 1.0338, ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3, ..., 3, 0, 2], [0, 1, 0, ..., 0, 4, 3], [3, 4, 3, ..., 3, 0, 0], [3, 2, 3, ..., 1, 2, 1], [1, 2, 4, ..., 3, 1, 3]])) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136 Differential Revision: D14250021 Pulled By: VitalyFedyunin fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93	2019-02-28 13:07:26 -08:00
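The output above shows each field of the returned namedtuple on its own line under a `torch.return_types.max(` header. A minimal stdlib-only sketch of that idea follows. The class `MaxResult` and its `__repr__` format are hypothetical stand-ins for illustration, not the PyTorch implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for torch.return_types.max; plain lists replace tensors.
_Base = namedtuple("max", ["values", "indices"])


class MaxResult(_Base):
    """Namedtuple whose repr puts one field per line."""

    __slots__ = ()

    def __repr__(self):
        fields = ",\n".join(
            f"{name}={value!r}" for name, value in zip(self._fields, self)
        )
        return f"return_types.max(\n{fields})"


result = MaxResult(values=[1.5, 2.0], indices=[0, 1])

# Tuple unpacking still works, so the "old behaviour" remains available.
values, indices = result
print(result)
print(tuple(result))
```

Converting with `tuple(...)` drops the custom repr and prints like an ordinary tuple, as the `tuple(...)` calls in the commit message do.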
bhushan	4ca1a54526	Make transpose consistent with numpy's behavior (#17462 ) Summary: PyTorch's tensor.t() is now equivalent to NumPy's ndarray.T for 1D tensors, i.e. tensor.t() == tensor. Test case added: - test_t fixes #9687 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17462 Differential Revision: D14214838 Pulled By: soumith fbshipit-source-id: c5df1ecc8837be22478e3a82ce4854ccabb35765	2019-02-26 14:23:19 -08:00
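The rule described in this commit (a 1D input is returned unchanged, a 2D input has its axes swapped) can be sketched in plain Python. The helper `t` below works on nested lists and is an assumption made for illustration. It is not the ATen code.

```python
def t(x):
    """Transpose a nested-list 'tensor' with at most two dimensions.

    A 1-D (or empty) input is returned unchanged, mirroring ndarray.T.
    """
    if not x or not isinstance(x[0], list):
        return x
    # 2-D case: swap rows and columns.
    return [list(column) for column in zip(*x)]


print(t([1, 2, 3]))
print(t([[1, 2], [3, 4]]))
```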
Stefan Krah	e4e9b738d3	Followup to #17049 : change more instances of RuntimeError to IndexError Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17114 Differential Revision: D14150890 Pulled By: gchanan fbshipit-source-id: 579ca71665166c6a904b894598a0b334f0d8acc7	2019-02-25 15:34:22 -08:00
Gregory Chanan	15a55b86ed	Fix nonzero for scalars on cuda, to_sparse for scalars on cpu/cuda. (#17406 ) Summary: I originally set out to fix to_sparse for scalars, which had some overly restrictive checking (sparse_dim > 0, which is impossible for a scalar). This fix uncovered an issue with nonzero: it didn't properly return a size (z, 0) tensor for an input scalar, where z is the number of nonzero elements (i.e. 0 or 1). Pull Request resolved: https://github.com/pytorch/pytorch/pull/17406 Differential Revision: D14185393 Pulled By: gchanan fbshipit-source-id: f37a6e1e3773fd9cbf69eeca7fdebb3caa192a19	2019-02-25 08:23:40 -08:00
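The `(z, 0)` result shape for a scalar input follows from treating a scalar as having zero dimensions: each nonzero element contributes one row of indices, and that row has one entry per dimension. A plain-Python sketch on nested lists, where `shape_of` and `nonzero` are hypothetical helpers rather than PyTorch functions:

```python
from itertools import product


def shape_of(x):
    """Shape of a rectangular nested list; a bare number has shape ()."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def nonzero(x):
    """Return one index tuple per nonzero element of x."""
    rows = []
    # For a scalar the shape is (), so product() yields a single empty index.
    for index in product(*(range(n) for n in shape_of(x))):
        item = x
        for i in index:
            item = item[i]
        if item != 0:
            rows.append(index)
    return rows


print(nonzero(5))             # one row with zero columns
print(nonzero(0))             # zero rows
print(nonzero([0, 3, 0, 7]))  # one column per row
```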
Xiang Gao	b2dde4386a	Namedtuple return for symeig, eig, pstrf, qr, geqrf (#16950 ) Summary: More ops for https://github.com/pytorch/pytorch/issues/394 Differential Revision: D14118645 Pulled By: ezyang fbshipit-source-id: a98646c3ddcbe4e34452aa044951286dcf9df778	2019-02-20 14:01:19 -08:00
SsnL	79f898263b	Improve error message w/ size inference on empty tensors Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17255 Differential Revision: D14143094 Pulled By: soumith fbshipit-source-id: f96fa7f8eb6eaac72887d3e837546cbfa505f101	2019-02-20 09:12:26 -08:00
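Size inference with a `-1` placeholder divides the element count by the product of the known dimensions. When that product is 0 the division is undefined, which is the case that needs a clear error message. The sketch below is a stdlib-only illustration. The function name `infer_size` and the message wording are assumptions, not the text PyTorch emits.

```python
from math import prod


def infer_size(shape, numel):
    """Resolve a single -1 entry in shape so the sizes multiply to numel."""
    shape = list(shape)
    unknown = [i for i, s in enumerate(shape) if s == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(s for s in shape if s != -1)
    if unknown:
        if known == 0:
            # Any value for -1 gives 0 elements, so there is no unique answer.
            raise ValueError(
                f"cannot infer -1 in {shape}: the other dimensions multiply to 0"
            )
        if numel % known != 0:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        shape[unknown[0]] = numel // known
    elif known != numel:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    return tuple(shape)


print(infer_size((-1, 3), 6))
print(infer_size((-1, 2), 0))
```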
Will Feng	c88798dbc1	Make tril_ and triu_ actually in-place (#17031 ) Summary: Currently, when the input tensor `self` is not contiguous, `tril_` and `triu_` call `self = self.contiguous()`, which allocates a new contiguous tensor and assigns it to `self`. This effectively changes the input tensor `self`'s pointer and will break downstream code after the Variable/Tensor merge. This PR fixes it so that `tril_` and `triu_` always update the input tensor in-place and preserve the input tensor's TensorImpl. Pull Request resolved: https://github.com/pytorch/pytorch/pull/17031 Differential Revision: D14069592 Pulled By: yf225 fbshipit-source-id: d188218f426446a44ccc1d33fc28ac3f828c6a05	2019-02-19 14:47:17 -08:00
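Rebinding a local name to a fresh copy and mutating the object the caller passed in are different things, and only the second counts as in-place. A plain-Python analogy follows. The nested lists and both function names are hypothetical, and the copy models only the non-contiguous case described in the commit.

```python
def tril_rebinding(rows):
    # Analogous to `self = self.contiguous()`: the name now refers to a new object.
    rows = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j in range(i + 1, len(row)):
            row[j] = 0
    return rows


def tril_inplace(rows):
    # Writes go to the caller's object, so its identity is preserved.
    for i, row in enumerate(rows):
        for j in range(i + 1, len(row)):
            row[j] = 0
    return rows


a = [[1, 2], [3, 4]]
out_a = tril_rebinding(a)
print(a, out_a is a)   # the caller's data is untouched, and a new object comes back

b = [[1, 2], [3, 4]]
out_b = tril_inplace(b)
print(b, out_b is b)   # the caller's data is updated, and the same object comes back
```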
Iurii Zdebskyi	444039c47b	Bool tensor. Part 0: Boolean storage implementation (#16810 ) Summary: This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this: 0. Storage Implementation (this change) 1. Tensor Creation. 2. Tensor Conversions. 3. Tensor Indexing. 4. Tensor Operations. 5. Back compatibility related changes. This feature was requested by the community: https://github.com/pytorch/pytorch/issues/4764 https://github.com/pytorch/pytorch/issues/4219 https://github.com/pytorch/pytorch/issues/4288 Change: Added boolean type to the Storage class for CPU and CUDA backends. Tested via: 1. unit tests 2. running this: -> import torch -> torch.BoolStorage <class 'torch.BoolStorage'> -> torch.cuda.BoolStorage <class 'torch.cuda.BoolStorage'> Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810 Reviewed By: gchanan Differential Revision: D14087246 Pulled By: izdeby fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e	2019-02-19 08:22:13 -08:00
Gao, Xiang	b6b99fd7d3	Add namedtuple return for min, median, mode, kthvalue, add test for namedtuple return API (#16186 ) Summary: This partially fixes https://github.com/pytorch/pytorch/issues/394 and depends on https://github.com/pytorch/pytorch/pull/15429. I suggest reviewing this only after https://github.com/pytorch/pytorch/pull/15429 has landed, otherwise the diff might be too large to review. The test only allows explicitly whitelisted operators to have named returns. Differential Revision: D14070735 Pulled By: ezyang fbshipit-source-id: ace2a672998b4e4a8094f52cbda5aa1cea6e3b42	2019-02-16 00:01:33 -08:00
Xiang Gao	4fcab92d6c	Move outplace ops to ATen (#16788 ) Summary: Based on https://github.com/pytorch/pytorch/pull/12413, with the following additional changes: - Inside `native_functions.yaml`, move those outplace operators right next to their corresponding inplace operators, so that it is easy to check that they match when reviewing - `matches_jit_signature: True` for them - Add missing `scatter` with Scalar source - Add missing `masked_fill` and `index_fill` with Tensor source. - Add missing test for `scatter` with Scalar source - Add missing test for `masked_fill` and `index_fill` with Tensor source by checking the gradient w.r.t source - Add missing docs to `tensor.rst` Differential Revision: D14069925 Pulled By: ezyang fbshipit-source-id: bb3f0cb51cf6b756788dc4955667fead6e8796e5	2019-02-15 15:58:10 -08:00
Stefan Krah	a5e7b1d032	Use IndexError instead of RuntimeError in ATen CPU kernels Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17049 Reviewed By: ezyang Differential Revision: D14064700 Pulled By: fmassa fbshipit-source-id: 3575db103bba5a7d82f574cbb082beca419151ec	2019-02-13 10:19:28 -08:00
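The commit message does not say why the change was made. One plausible reason is that Python's own protocols treat `IndexError` specially. The sketch below uses a hypothetical `Vec` class, not PyTorch code, to show two effects: callers can catch the out-of-range case specifically, and the legacy `__getitem__` iteration protocol stops on `IndexError`.

```python
class Vec:
    """Tiny bounds-checked sequence used only for illustration."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, i):
        if not -len(self._data) <= i < len(self._data):
            raise IndexError(
                f"index {i} is out of range for size {len(self._data)}"
            )
        return self._data[i]


def get_or_default(seq, i, default=None):
    """Return seq[i], or default when the index is out of range."""
    try:
        return seq[i]
    except IndexError:
        return default


v = Vec([10, 20, 30])
print(get_or_default(v, 1))
print(get_or_default(v, 7, default="missing"))

# Iteration without __iter__ calls __getitem__ with 0, 1, 2, ... until
# IndexError is raised; a RuntimeError here would propagate instead.
print(list(v))
```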
vishwakftw	0d95028bee	Dispatch the correct legacy function for geqrf_out and ormqr_out (#16964 ) Summary: This fixes the segfault. Changelog: - Modify the function calls in LegacyDefinitions for `geqrf_out` and `ormqr_out` Pull Request resolved: https://github.com/pytorch/pytorch/pull/16964 Differential Revision: D14025985 Pulled By: gchanan fbshipit-source-id: aa50e2c1694cbf3642273ee14b09ba12625c7d33	2019-02-12 13:48:51 -08:00

583 Commits