Commit Graph

583 Commits

Author SHA1 Message Date
Edward Yang
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
Vishwak Srinivasan
e73be58ff7 Rename btriunpack to lu_unpack (#18529)
Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529

Differential Revision: D14683161

Pulled By: soumith

fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1
2019-03-29 13:01:30 -07:00
Vishwak Srinivasan
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
Edward Yang
81e030d9a6 Upgrade flake8-bugbear to master, fix the new lints. (#18507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18507
ghimport-source-id: 1c3642befad2da78a7e5f39d6d58732b85c76267

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18507 Upgrade flake8-bugbear to master, fix the new lints.**

It turns out Facebobok is internally using the unreleased master
flake8-bugbear, so upgrading it grabs a few more lints that Phabricator
was complaining about but we didn't get in open source.

A few of the getattr sites that I fixed look very suspicious (they're
written as if Python were a lazy language), but I didn't look more
closely into the matter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14633682

fbshipit-source-id: fc3f97c87dca40bbda943a1d1061953490dbacf8
2019-03-27 08:07:41 -07:00
Xiang Gao
2ba41c5550 Add some missing docs for tensor methods and attributes, new unittest to enforce tensors.rst no longer miss anything (#16057)
Summary:
This depend on https://github.com/pytorch/pytorch/pull/16039

This prevent people (reviewer, PR author) from forgetting adding things to `tensors.rst`.

When something new is added to `_tensor_doc.py` or `tensor.py` but intentionally not in `tensors.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16057

Differential Revision: D14619550

Pulled By: ezyang

fbshipit-source-id: e1c6dd6761142e2e48ec499e118df399e3949fcc
2019-03-26 18:05:56 -07:00
Soumith Chintala
66628f78b7 Revert D14605905: [pytorch][PR] Add return_counts to torch.unique
Differential Revision:
D14605905

Original commit changeset: 555f5a12a8e2

fbshipit-source-id: c7874f5987893e956c022180a37763d88bba38db
2019-03-26 17:18:01 -07:00
Tongzhou Wang
5292685d2f Improve numerical precision of (s)logdet (#18449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18448 and https://github.com/pytorch/pytorch/issues/18450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18449

Differential Revision: D14611638

Pulled By: soumith

fbshipit-source-id: 4f1f27ab5316a92d2783e734169f599afed743cf
2019-03-26 15:32:14 -07:00
Soumith Chintala
436723122e fix arange shape issue inconsistency across cpu and cuda (#18462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18462

Differential Revision: D14620263

Pulled By: soumith

fbshipit-source-id: 223524cdda2f5d55c2ca8d4cdcf6f7a05a6c15eb
2019-03-26 15:27:24 -07:00
Xiang Gao
5bff395a82 Namedtuple return for solve, slogdet, sort, topk (#17093)
Summary:
More ops for https://github.com/pytorch/pytorch/issues/394. ~~Also need to rebase after landing #16186, because we need to update the whitelist of the new unit test added in #16186.~~

cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17093

Differential Revision: D14620068

Pulled By: ezyang

fbshipit-source-id: deec5ffc9bf7624e0350c85392ee59789bad4237
2019-03-26 12:39:08 -07:00
Iurii Zdebskyi
1a742075ee Resolving comments from Bool Tensor for CPU PR (#18165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18165
ghimport-source-id: 55cb3fb63a25c2faab1725b4ec14c688bf45bd38

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18166 Bool Tensor for CUDA
* **#18165 Resolved comments from Bool Tensor for CPU PR**
-------
------------
This is a follow up PR that resolves some additional feedback on one the of previous Bool Tensor PRs.

gchanan, here is a list of almost all the comments from the original PR with respective fixes and replies:

**[utils/python_scalars.h]** why is this converting from uint8_t and not bool? (comment?)
When i was adding this, i was testing by creating a tensor and then calling its .tolist(). it worked for bool and uint8_t equally good so i left uint8_t as thought it makes more sense as we are calling PyBool_FromLong. �Changing it to bool.

**[ATen/Dispatch.h]**better name?.
fixed.

**[test/test_torch.py]** what about other factories, such as full? (and more).
There is a test that goes through the factory methods - test_tensor_factories_empty. i added some bool cases above it and added a comment that once CUDA will be done, i will unite them and it will iterate not just between CUDA and CPU but also all types. ��Adding all bool cases now. Will unite in CUDA PR.

**[generic/THTensorMath.h]** any changes in this file actually needed?
Bad merge. Fixed.

**[TH/THTensor.h]** this generates code for random, clampedRandom, and cappedRandom -- do we have tests for all of these with bool?
Added

**[c10/core/ScalarType.h]** I'm not very confident about the lack of Bool here -- can you look at the call sites and see what makes sense to do here?
Added bool to the macro and created a similar one without for a single case which fails the build with errors:

_./torch/csrc/jit/symbolic_variable.h:79:20: error: ambiguous overload for ‘operator*’ (operand types are ‘const torch::jit::SymbolicVariable’ and ‘torch::jit::Value*’)
return (*this) * insertConstant(rhs);_

Differential Revision: D14605105

fbshipit-source-id: abf82d50e8f8c50b386545ac068268651b28496d
2019-03-26 09:59:34 -07:00
vishwakftw
5e462a3ed6 Introduce SobolEngine (#10505)
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly between [0,1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, tests have been added based on Balandat 's references. The implementation is an optimized / tensor-ized implementation of Balandat 's implementation in Cython as provided in #9332.

This closes #9332 .

cc: soumith Balandat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10505

Reviewed By: zou3519

Differential Revision: D9330179

Pulled By: ezyang

fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
2019-03-26 07:53:07 -07:00
Xiang Gao
e2730ddb21 Add return_counts to torch.unique (#18391)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/12598

This PR was originally authorized by ptrblck at https://github.com/pytorch/pytorch/pull/15495, but since there was no update for months after the request change, I clone that branch and resolve the code reviews here. Hope everything is good now. Especially, the implementation of count is changed from ptrblck's original algorithm to the one ngimel suggest, i.e. using `unique_by_key` and `adjacent_difference`.

The currently implementation of `_unique_dim` is VERY slow for computing inverse index and counts, see https://github.com/pytorch/pytorch/issues/18405. I will refactor `_unique_dim` in a later PR. For this PR, please allow me to keep the implementation as is.

cc: ptrblck ezyang ngimel colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18391

Reviewed By: soumith

Differential Revision: D14605905

Pulled By: VitalyFedyunin

fbshipit-source-id: 555f5a12a8e28c38b10dfccf1b6bb16c030bfdce
2019-03-25 20:38:17 -07:00
Edward Yang
50df3e5e2e Add ability to query if built with CUDA and MKL-DNN. (#18362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362
ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18242 Test running a CUDA build on CPU machine.
* **#18362 Add ability to query if built with CUDA and MKL-DNN.**

Fixes #18108.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14584430

fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473
2019-03-25 10:39:09 -07:00
Edward Yang
e3da16a99e Add test for #17271 (torch.exp incorrect for 2**31 size tensor) (#18292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18292
ghimport-source-id: a3e96584db0eef7b6202a1211808f9f6e59dd529

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18292 Add test for #17271 (torch.exp incorrect for 2**31 size tensor)**
* #18291 Correctly call superclass setUp in TestCase subclasses.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14567642

fbshipit-source-id: c60ee7597a86f5d2c5c0b72cb106f17815950427
2019-03-22 07:50:38 -07:00
vishwakftw
291746f110 Rename trtrs to triangular_solve (#18213)
Summary:
Changelog:
- Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage.
- Move `isnan` to _torch_docs.py
- Remove unnecessary imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213

Differential Revision: D14566902

Pulled By: ezyang

fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f
2019-03-21 14:27:21 -07:00
Edward Yang
549c4da917 Add a decorator for marking slow tests. (#18231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18231
ghimport-source-id: 78c230f60c41877fe91b89c8c979b160f36f856b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18231 Add a decorator for marking slow tests.**

The general strategy:
- It's a normal skip decorator, which triggers a skip if
  PYTORCH_TEST_WITH_SLOW is not set.
- It also annotates the method in question that says it's
  slow.  We use this to implement a catch-all skipper in
  setUp that skips all non-slow tests when
  PYTORCH_TEST_SKIP_FAST is set.

I added a little smoketest to test_torch and showed that I get:

```
Ran 432 tests in 0.017s
OK (skipped=431)
```

when running with PYTORCH_TEST_WITH_SLOW=1 and PYTORCH_TEST_SKIP_FAST=1

CI integration coming in later patch, as well as nontrivial uses of
this decorator.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14544441

fbshipit-source-id: 54435ce4ec827193e019887178c09ebeae3ae2c9
2019-03-21 11:17:34 -07:00
Edward Yang
ba81074c40 Fix B902 lint error: invalid first argument. (#18181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18181
ghimport-source-id: 9c23551584a1a1b0b7ac246367f3a7ae1c50b315

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18184 Fix B903 lint: save memory for data classes with slots/namedtuple
* **#18181 Fix B902 lint error: invalid first argument.**
* #18178 Fix B006 lint errors: using mutable structure in default argument.
* #18177 Fix lstrip bug revealed by B005 lint

A variety of sins were committed:
- Some code was dead
- Some code was actually a staticmethod
- Some code just named it the wrong way
- Some code was purposely testing the omitted case

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14530876

fbshipit-source-id: 292a371d9a76ddc7bfcfd38b6f0da9165290a58e
2019-03-21 09:10:28 -07:00
Gao, Xiang
7e6220393f Cleanup arg{min, max} (#17103)
Summary:
Why do we need this workaround? `PythonArgParser` handles these two cases well.

The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was:

> Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy

Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two function with same name but different C++ signature.

Will keep an eye on the CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103

Differential Revision: D14523503

Pulled By: VitalyFedyunin

fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80
2019-03-20 16:28:27 -07:00
Vishwak Srinivasan
a519217ee7 Add batched version of trtrs (#18025)
Summary:
- Remove single batch TH/THC implementations
- Remove `_batch_trtrs_lower` from `multivariate_normal`
- Add tests for batched behavior
- Modify trtrs_backward to accommodate for batched case
- Modify docs

In a future PR, this will be renamed to `triangular_solve`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18025

Differential Revision: D14523004

Pulled By: ifedan

fbshipit-source-id: 11c6a967d107f969b60e5a5c73ce6bb8099ebbe1
2019-03-20 11:11:32 -07:00
vishwakftw
234bb8719a Add backend checks to solve methods (gesv, cholesky_solve) (#18116)
Summary:
Changelog:
- Incorporate a simple backend check in the linearSolveCheckInputs function in LinearAlgebraUtils.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18116

Differential Revision: D14504469

Pulled By: soumith

fbshipit-source-id: 7402b6dbaa8d73048946613b806d54f68bcbd8f4
2019-03-19 10:44:45 -07:00
Vishwak Srinivasan
421b508d55 Rename gesv to solve (#18060)
Summary:
Changelog:

- Renames `gesv` to `solve` to remain consistent with `cholesky_solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `solve` under the name `gesv`, and add a deprecated warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060

Differential Revision: D14503117

Pulled By: zou3519

fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5
2019-03-18 16:04:24 -07:00
Richard Zou
3c977fb7ce Error out on in-place (unary) ops on tensors that have internal overlap (#17927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17927
ghimport-source-id: 626d321e430b6b5c0ea3aa1eb9df8c1e2d058bf8

Stack:
* #17926 Implement at::has_internal_overlap helper function
* **#17927 Error out on in-place (unary) ops on tensors that have internal overlap**

On the way to #17935.

Works for CPU and CUDA on the following ops:
- abs_, acos_, asin_, atan_, ceil_, cos_, erf_, erfc_, exp_, expm1_
- floor_, log_, log10_, log1p_, log2_, round_, rsqrt_,
- sin_, sqrt_, tan_, tanh_, trunc_

This PR adds a check to see if the out/result tensor has internal
overlap. If it does, then we error out because the result **may** be
incorrect.

This is overly conservative; there are some cases where if the result is
the same as the input, the inplace operation is OK (such as floor_,
round_, and trunc_). However, the current code isn't organized in such a
way that this is easy to check, so enabling those will come in the future.

Reviewed By: ezyang

Differential Revision: D14438871

fbshipit-source-id: 15e12bf1fdb2ab7f74bb806e22bc74840bd6abd1
2019-03-15 07:50:19 -07:00
Richard Zou
a4123decf7 Implement at::has_internal_overlap helper function (#17926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17926
ghimport-source-id: 9f7572b5d43e474492363fa17dcb86a6c27ca13c

Stack:
* **#17926 Implement at::has_internal_overlap helper function**
* #17927 Error out on in-place (unary) ops on tensors that have internal overlap

On the way to #17935.

Checks if a tensor's sizes/strides indicate that multiple elements share
the same memory location. This problem in general is hard so
at::has_internal_overlap implements two heuristics and avoids solving
the general problem:

if a tensor is contiguous, it cannot have internal overlap
if a tensor has any zero strides, it does have internal overlap
otherwise, return MemOverlap::kTooHard to indicate that there might be
overlap, but we don't know.

Reviewed By: ezyang

Differential Revision: D14438858

fbshipit-source-id: 607ab31771315921ab6165b2a1f072ac3e75925a
2019-03-15 07:50:17 -07:00
J M Dieterich
1ba1ca0acb Update to ROCm2.2 (#18007)
Summary:
ROCm 2.2 was released today, if we respin the CI docker images with the attached, PyTorch/Caffe2 will support ROCm 2.2

Changes necessary:
* for the Ubuntu target, HIP PR 934 needs to be applied to fix the forceinline definition. ROCm 2.3 will contain this.
* two unit tests proof flaky on different platforms, disable them defensively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18007

Differential Revision: D14473903

Pulled By: bddppq

fbshipit-source-id: b1939f11d1c765a3bf71bb244b15f6ceb0e816d3
2019-03-14 18:47:22 -07:00
Lu Fang
f827f1052a Fix the CI
Summary: https://github.com/pytorch/pytorch/pull/17995 's CI has verified it should fix the CI.

Reviewed By: bddppq

Differential Revision: D14447674

fbshipit-source-id: 50085db9ae7421b5be216ed0a2216234babfdf6c
2019-03-13 17:28:50 -07:00
Guanheng Zhang
26a4c2ada6 Speed up gemm by reordering the for loops (#17730)
Summary:
Optimize the order of the "for" loops.

Note: For "transa = true" cases, the order of the "for" loops has been optimzied in the original code. Therefore, no significant improvement is observed in those case (i.e. "transa && transb" and "transa && !transb")

mode/opt (i.e. static libary)
//////////////////////////////////////////////////////////////////////////////
transa && transb
after:
loops:  2229     x:     128      y:     128      z:     128      time:  2243ns      =>  acceleration multiplier:  0.90
loops:  124      x:     128      y:     1024     z:     128      time:  40381ns      =>  acceleration multiplier:  0.97
loops:  121      x:     1024     y:     128      z:     128      time:  41651ns      =>  acceleration multiplier:  0.96
loops:  15       x:     1024     y:     1024     z:     128      time:  333771ns       =>  acceleration multiplier:  0.98
loops:  4610     x:     128      y:     128      z:     64       time:  1084ns       =>  acceleration multiplier:  0.95
loops:  252      x:     128      y:     1024     z:     64       time:  19860ns      =>  acceleration multiplier:  0.98
loops:  248      x:     1024     y:     128      z:     64       time:  20232ns      =>  acceleration multiplier:  0.98
loops:  30       x:     1024     y:     1024     z:     64       time:  167338ns      =>  acceleration multiplier:  0.99

before:
loops:  2468     x:     128      y:     128      z:     128      time:  2026ns
loops:  128      x:     128      y:     1024     z:     128      time:  39338ns
loops:  126      x:     1024     y:     128      z:     128      time:  39930ns
loops:  16       x:     1024     y:     1024     z:     128      time:  327549ns
loops:  4840     x:     128      y:     128      z:     64       time:  1033ns
loops:  258      x:     128      y:     1024     z:     64       time:  19441ns
loops:  252      x:     1024     y:     128      z:     64       time:  19854ns
loops:  31       x:     1024     y:     1024     z:     64       time:  166254ns

//////////////////////////////////////////////////////////////////////////////
transa && !transb
after:
loops:  4880     x:     128      y:     128      z:     128      time:  1024ns      =>  acceleration multiplier:  0.98
loops:  638      x:     128      y:     1024     z:     128      time:  7839ns      =>  acceleration multiplier:  1.04
loops:  605      x:     1024     y:     128      z:     128      time:  8276ns      =>  acceleration multiplier:  1.01
loops:  77       x:     1024     y:     1024     z:     128      time:  65713ns      =>  acceleration multiplier:  1.00
loops:  9935     x:     128      y:     128      z:     64       time:  503ns      =>  acceleration multiplier:  1.00
loops:  1252     x:     128      y:     1024     z:     64       time:  3994ns      =>  acceleration multiplier:  1.00
loops:  1183     x:     1024     y:     128      z:     64       time:  4226ns      =>  acceleration multiplier:  0.98
loops:  153      x:     1024     y:     1024     z:     64       time:  32766ns      =>  acceleration multiplier:  0.99

before:
loops:  4985     x:     128      y:     128      z:     128      time:  1003ns
loops:  615      x:     128      y:     1024     z:     128      time:  8140ns
loops:  599      x:     1024     y:     128      z:     128      time:  8357ns
loops:  76       x:     1024     y:     1024     z:     128      time:  65934ns
loops:  9897     x:     128      y:     128      z:     64       time:  505ns
loops:  1248     x:     128      y:     1024     z:     64       time:  4008ns
loops:  1203     x:     1024     y:     128      z:     64       time:  4159ns
loops:  154      x:     1024     y:     1024     z:     64       time:  32499ns

//////////////////////////////////////////////////////////////////////////////
!transa && transb
after:
loops:  3919     x:     128      y:     128      z:     128      time:  1276ns      =>  acceleration multiplier:  2.97
loops:  497      x:     128      y:     1024     z:     128      time:  10069ns      =>  acceleration multiplier:  7.85
loops:  449      x:     1024     y:     128      z:     128      time:  11145ns      =>  acceleration multiplier:  4.77
loops:  57       x:     1024     y:     1024     z:     128      time:  88595ns      =>  acceleration multiplier:  7.12
loops:  7575     x:     128      y:     128      z:     64       time:  660ns      =>  acceleration multiplier:  3.00
loops:  967      x:     128      y:     1024     z:     64       time:  5173ns      =>  acceleration multiplier:  7.66
loops:  877      x:     1024     y:     128      z:     64       time:  5702ns      =>  acceleration multiplier:  4.76
loops:  111      x:     1024     y:     1024     z:     64       time:  45232ns      =>  acceleration multiplier:  7.03

before:
loops:  1320     x:     128      y:     128      z:     128      time:  3789ns
loops:  64       x:     128      y:     1024     z:     128      time:  79061ns
loops:  95       x:     1024     y:     128      z:     128      time:  53107ns
loops:  8        x:     1024     y:     1024     z:     128      time:  631161ns
loops:  2521     x:     128      y:     128      z:     64       time:  1983ns
loops:  127      x:     128      y:     1024     z:     64       time:  39604ns
loops:  185      x:     1024     y:     128      z:     64       time:  27128ns
loops:  16       x:     1024     y:     1024     z:     64       time:  318155ns

//////////////////////////////////////////////////////////////////////////////
!transa && !transb
after:
loops:  3895     x:     128      y:     128      z:     128      time:  1283ns      =>  acceleration multiplier:  1.73
loops:  393      x:     128      y:     1024     z:     128      time:  12746ns      =>  acceleration multiplier:  3.36
loops:  411      x:     1024     y:     128      z:     128      time:  12170ns      =>  acceleration multiplier:  1.93
loops:  46       x:     1024     y:     1024     z:     128      time:  110116ns      =>  acceleration multiplier:  3.17
loops:  7404     x:     128      y:     128      z:     64       time:  675ns      =>  acceleration multiplier:  1.58
loops:  636      x:     128      y:     1024     z:     64       time:  7872ns      =>  acceleration multiplier:  2.70
loops:  724      x:     1024     y:     128      z:     64       time:  6911ns      =>  acceleration multiplier:  1.32
loops:  73       x:     1024     y:     1024     z:     64       time:  68502ns      =>  acceleration multiplier:  2.49

before:
loops:  2253     x:     128      y:     128      z:     128      time:  2219ns
loops:  117      x:     128      y:     1024     z:     128      time:  42788ns
loops:  214      x:     1024     y:     128      z:     128      time:  23465ns
loops:  15       x:     1024     y:     1024     z:     128      time:  349076ns
loops:  4694     x:     128      y:     128      z:     64       time:  1065ns
loops:  236      x:     128      y:     1024     z:     64       time:  21251ns
loops:  549      x:     1024     y:     128      z:     64       time:  9108ns
loops:  30       x:     1024     y:     1024     z:     64       time:  170799ns
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17730

Differential Revision: D14325149

Pulled By: zhangguanheng66

fbshipit-source-id: a7a5a83890fdf99fee6eb87a3a5060b7b6bd862f
2019-03-13 08:57:26 -07:00
Edward Yang
6466ddbd86 Fix lint in test_torch.py (#17807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17807

Lint also detected a bug in test_linspace where we weren't
actually testing the CUDA case.

Differential Revision: D14388241

fbshipit-source-id: e219e46400f4952c6b384bca3baa0724ef94acde
2019-03-12 13:48:28 -07:00
Thomas Viehmann
aba9051a65 kthvalue consistency with sort in the presence of NaN (#17824)
Summary:
This PR causes kthvalue to be consistent with sort
(i.e. treat NaN as larger than any number), so that
`a.kthvalue(n) == a.sort()[n - 1]`.

One drawback is that median with a NaN argument does not return NaN,
which is a deviation from NumPy.

Thank you, ngimel, for raising this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17824

Differential Revision: D14410092

Pulled By: ezyang

fbshipit-source-id: bdec2d8272dc4c65bcf2f9b8995e237774c44c02
2019-03-12 08:49:19 -07:00
vishwakftw
f268370b42 torch.btrifact for tensors with greater than 3 dimensions (#14964)
Summary:
Motivation:
- Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check:
>   AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes());

What is in this PR?:
- Move `btrifact` to ATen
- Remove relation to TH/THC.
- Handle tensors with more than three dimensions
- Tests
- Docs modifications: added a note about the non-pivoting variant.

[blocked due to old magma-cuda binaries]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964

Differential Revision: D14405106

Pulled By: soumith

fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184
2019-03-12 01:46:07 -07:00
Iurii Zdebskyi
4aa22833cf Bool tensor creation (cpu) (#17376)
Summary:
This PR enables bool tensor creation and some basic operations for the CPU backend. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
    1. Storage Implementation [Done]
    2. Tensor Creation.
        a) CPU (this PR)
        b) CUDA
    3. Tensor Conversions.
    4. Tensor Indexing.
    5. Tensor Operations.
    6. Back compatibility related changes.

**Change**:
Enable CPU tensors and these operations:
- torch.zeros
- torch.tensor
- torch.ones
- torch.randint
- torch.full
- torch.full_like
- torch.empty
- torch.empty_like

**Tested via**:
1) unit tests

2)
torch.zeros(2,2, dtype=torch.bool)
torch.tensor([True, False], dtype=torch.bool)
torch.tensor([-1, -1.1, 0, 1, 1.1, 2], dtype=torch.bool)
torch.ones([1,2], dtype=torch.bool)
torch.randint(10, (2, 2), dtype=torch.bool)
torch.full((2, 3), True, dtype=torch.bool)
torch.empty(4, dtype=torch.bool)

a = torch.tensor([0,0,1])
b = torch.full_like(a, True)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17376

Reviewed By: ezyang

Differential Revision: D14375995

Pulled By: izdeby

fbshipit-source-id: a65490b5360ee0e6e3accc54ce7e32e49ad2d2a8
2019-03-11 17:03:40 -07:00
Gao, Xiang
11c89dde55 Allow structseq to be input of operators where tuple is expected (#17208)
Summary:
Currently the following code gives an error on python 2 because `ret` is a structseq which is not a tuple
```python
ret = a.max(dim=0)
ret1 = torch.max(a, dim=0, out=ret)
```

This PR modify tuple check in python arg parser to allow structseq to be input of operators where tuple is expected, which would make the above code work.

Depend on: https://github.com/pytorch/pytorch/pull/17136
Partially fixes: https://github.com/pytorch/pytorch/issues/16813
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17208

Differential Revision: D14280198

Pulled By: VitalyFedyunin

fbshipit-source-id: beffebfd3951c4f5c7c8fe99a5847616a89491f3
2019-03-11 11:33:35 -07:00
bhushan
b57fe3cc66 Introducing array-like sequence methods __contains__ (#17733)
Summary:
for tensor

Fixes: #17000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17733

Differential Revision: D14401952

Pulled By: soumith

fbshipit-source-id: c841b128c5a1fceda1094323ed4ef1d0cf494909
2019-03-11 09:00:16 -07:00
bhushan
6bcff88d3e Fix log_softmax and softmax if any dimension is 0-d (#17651)
Summary:
- Test added
- test_dim_function_empty: softmax and log_softmax on last dimension

fixes: #17262
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17651

Differential Revision: D14349009

Pulled By: gchanan

fbshipit-source-id: b6f728f5c6be8ae7615749e3f0c201886632923e
2019-03-10 15:25:58 -07:00
vishwakftw
9d70e199f4 Move lerp to ATen, add functionality for tensor weights (#17348)
Summary:
Changelog:
- Remove TH/THC bindings
- Add tensor weights for `lerp`
- Modify derivatives appropriately
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348

Differential Revision: D14355845

Pulled By: soumith

fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf
2019-03-07 14:04:58 -08:00
bhushan
886e482776 index operation support for torch.HalfTensor (#17645)
Summary:
- Test cases added
1. indexing for half tensor
2. setting for half tensor

fixes #17161
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17645

Differential Revision: D14302069

Pulled By: ezyang

fbshipit-source-id: 100f141c07046f200c904e27c5882a9417bccda0
2019-03-06 10:32:35 -08:00
Edward Yang
2ed99fee0d Revert D13935403: Call c10 cuda op from test_torch
Differential Revision:
D13935403

Original commit changeset: b2915ec8a366

fbshipit-source-id: 0f3409d5c102d719bc1f0483695aee93e7d613c9
2019-03-01 14:18:26 -08:00
Sebastian Messmer
0a7b2af13b Call c10 cuda op from test_torch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16692

Reviewed By: ezyang

Differential Revision: D13935403

fbshipit-source-id: b2915ec8a3664bb6e918ed357908cc33d8f9449a
2019-03-01 10:59:19 -08:00
bhushan
a6170573c8 Adding support for 0-d tensor for transpose (.t()) (#17535)
Summary:
- Test updates
1. test_torch: added 0-d test case and t_() test cases
2. test_jit  : updated error message for TestAsync.test_async_script_error

- Updating documentation for torch.t()
Adding information regarding new support of 0-D and 1-D tenso

Fixes #17520
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17535

Differential Revision: D14269984

Pulled By: gchanan

fbshipit-source-id: 38b723f31484be939261c88edb33575d242eca65
2019-03-01 08:45:01 -08:00
Xiang Gao
2e5a8cee82 Customize the printing of namedtuple return (#17136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17112
```python
print("good", torch.randn(5,5,5).max(1))
print("terrible", torch.randn(5,5,10).max(1))
print("not as good", torch.randn(5,5,500).max(1))
print ("old behaviour = gold standard")
print(tuple(torch.randn(5,5,5).max(1)))
print(tuple(torch.randn(5,5,10).max(1)))
print(tuple(torch.randn(5,5,500).max(1)))
```
now gives
```
>>> import torch
>>> print("good", torch.randn(5,5,5).max(1))
good torch.return_types.max(
values=tensor([[ 1.2821,  1.8063,  1.8075,  1.3082, -0.1267],
        [ 0.3437,  0.7353,  1.2619,  0.7557,  1.6662],
        [ 0.8583,  1.8906,  1.0246,  1.7598,  1.1184],
        [ 1.7821,  0.0230,  0.9452,  1.0318,  1.0823],
        [ 0.4116, -0.0379, -0.1843,  1.4129,  1.8796]]),
indices=tensor([[4, 4, 3, 2, 1],
        [1, 2, 4, 1, 1],
        [2, 4, 0, 2, 1],
        [0, 2, 0, 3, 1],
        [0, 4, 4, 4, 4]]))
>>> print("terrible", torch.randn(5,5,10).max(1))
terrible torch.return_types.max(
values=tensor([[ 2.1272,  1.3664,  2.2067,  1.3974, -0.0883,  1.2505,  1.0074,  1.1217,
          0.3849,  0.6936],
        [ 0.6288, -0.4560,  1.2748,  1.5482,  1.2777,  1.6874,  0.7151,  0.6041,
          1.3572,  1.6232],
        [ 1.6703,  1.0075,  1.6480,  2.2839,  1.3390,  0.4938,  1.6449,  1.7628,
          0.8141,  2.5714],
        [ 0.7079,  1.8677,  3.2478,  1.5591,  2.4870,  0.8635, -0.1450,  1.6923,
          1.4924,  1.6298],
        [ 2.4056,  0.8002,  0.9317,  0.7455,  0.7866,  2.1191,  0.3492,  1.2095,
          1.8637,  1.7470]]),
indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4],
        [4, 2, 2, 1, 2, 2, 3, 1, 1, 3],
        [0, 3, 3, 0, 2, 1, 4, 1, 0, 1],
        [4, 1, 3, 0, 3, 2, 0, 1, 4, 3],
        [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]]))
>>> print("not as good", torch.randn(5,5,500).max(1))
not as good torch.return_types.max(
values=tensor([[ 0.3877,  0.7873,  1.8701,  ...,  0.5971,  1.6103, -0.3435],
        [ 1.1300,  2.2418,  1.4239,  ...,  1.3943,  0.3872,  1.6475],
        [ 2.0656,  1.3136,  0.9896,  ...,  2.3918,  0.8226,  1.0517],
        [ 1.1054,  0.9945,  1.0561,  ...,  2.1039,  1.1524,  3.0304],
        [ 1.5041,  2.2809,  1.0883,  ...,  0.8504,  2.4774,  1.1041]]),
indices=tensor([[4, 3, 1,  ..., 1, 4, 0],
        [4, 4, 4,  ..., 3, 0, 3],
        [3, 0, 1,  ..., 2, 2, 4],
        [0, 1, 1,  ..., 4, 2, 2],
        [1, 0, 4,  ..., 2, 0, 2]]))
>>> print ("old behaviour = gold standard")
old behaviour = gold standard
>>> print(tuple(torch.randn(5,5,5).max(1)))
(tensor([[ 1.1908,  1.1807,  1.3151,  1.7184,  0.3556],
        [ 0.3798,  0.9213,  0.3001,  1.3087,  2.2419],
        [ 1.4233,  1.4814,  1.9900,  1.7744,  1.3059],
        [ 1.0026, -0.0330,  1.3061,  1.8730,  2.0685],
        [ 1.3041,  1.6458,  1.3449,  1.8948,  3.6206]]), tensor([[0, 4, 3, 4, 0],
        [1, 1, 4, 0, 4],
        [4, 1, 0, 3, 3],
        [1, 2, 1, 4, 0],
        [3, 3, 0, 3, 3]]))
>>> print(tuple(torch.randn(5,5,10).max(1)))
(tensor([[-0.1232,  0.8275,  0.6732,  1.1223,  0.8247,  1.2851,  1.6009,  1.9979,
          1.9109,  0.7313],
        [ 0.2260,  0.5922,  1.6928,  0.6024,  2.1158,  3.0619,  0.5653,  0.7426,
          0.8316,  0.6346],
        [ 0.4319,  0.2231,  0.5255,  1.7620,  1.1657,  0.8875,  0.5782,  0.6506,
          0.5032,  1.7097],
        [ 0.4137,  1.7265,  1.4260,  2.0301,  1.2244,  0.7128,  2.6345,  0.7230,
          1.3553,  1.6508],
        [ 1.0684,  1.7195,  1.4068,  0.7076, -0.0242,  0.8474,  0.8754,  1.7108,
          0.2188,  1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0],
        [1, 4, 0, 0, 3, 2, 0, 0, 3, 3],
        [2, 3, 1, 1, 4, 0, 1, 4, 4, 4],
        [0, 4, 1, 3, 2, 0, 2, 0, 3, 1],
        [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]]))
>>> print(tuple(torch.randn(5,5,500).max(1)))
(tensor([[0.9395, 1.5572, 1.8797,  ..., 2.0494, 0.8202, 0.9623],
        [1.7937, 0.7225, 1.8836,  ..., 0.7927, 1.4976, 1.1813],
        [0.8558, 1.6943, 1.4192,  ..., 0.8327, 1.9661, 0.4197],
        [1.2993, 1.4995, 0.9357,  ..., 0.7810, 1.3030, 2.6216],
        [1.4206, 1.8315, 1.0338,  ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3,  ..., 3, 0, 2],
        [0, 1, 0,  ..., 0, 4, 3],
        [3, 4, 3,  ..., 3, 0, 0],
        [3, 2, 3,  ..., 1, 2, 1],
        [1, 2, 4,  ..., 3, 1, 3]]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136

Differential Revision: D14250021

Pulled By: VitalyFedyunin

fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93
2019-02-28 13:07:26 -08:00
bhushan
4ca1a54526 Make transpose consistent with numpy's behavior (#17462)
Summary:
Pytorch's tensor.t() is now equivalent with Numpy's ndarray.T for 1D tensor
i.e. tensor.t() == tensor

Test case added:
- test_t

fixes #9687
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17462

Differential Revision: D14214838

Pulled By: soumith

fbshipit-source-id: c5df1ecc8837be22478e3a82ce4854ccabb35765
2019-02-26 14:23:19 -08:00
Stefan Krah
e4e9b738d3 Followup to #17049: change more instances of RuntimeError to IndexError
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17114

Differential Revision: D14150890

Pulled By: gchanan

fbshipit-source-id: 579ca71665166c6a904b894598a0b334f0d8acc7
2019-02-25 15:34:22 -08:00
Gregory Chanan
15a55b86ed Fix nonzero for scalars on cuda, to_sparse for scalars on cpu/cuda. (#17406)
Summary:
I originally set out to fix to_sparse for scalars, which had some overly restrictive checking (sparse_dim > 0, which is impossible for a scalar).

This fix uncovered an issue with nonzero: it didn't properly return a size (z, 0) tensor for an input scalar, where z is the number of nonzero elements (i.e. 0 or 1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17406

Differential Revision: D14185393

Pulled By: gchanan

fbshipit-source-id: f37a6e1e3773fd9cbf69eeca7fdebb3caa192a19
2019-02-25 08:23:40 -08:00
Xiang Gao
b2dde4386a Namedtuple return for symeig, eig, pstrf, qr, geqrf (#16950)
Summary: More ops for https://github.com/pytorch/pytorch/issues/394

Differential Revision: D14118645

Pulled By: ezyang

fbshipit-source-id: a98646c3ddcbe4e34452aa044951286dcf9df778
2019-02-20 14:01:19 -08:00
SsnL
79f898263b Improve error message w/ size inference on empty tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17255

Differential Revision: D14143094

Pulled By: soumith

fbshipit-source-id: f96fa7f8eb6eaac72887d3e837546cbfa505f101
2019-02-20 09:12:26 -08:00
Will Feng
c88798dbc1 Make tril_ and triu_ actually in-place (#17031)
Summary:
Currently, when the input tensor `self` is not contiguous, `tril_` and `triu_` calls `self = self.contiguous()`, which allocates a new contiguous tensor and assign it to `self`. This effectively changes the input tensor `self`'s pointer and will break downstream code after Variable/Tensor merge.

This PR fixes it so that `tril_` and `triu_` always update the input tensor in-place and preserve the input tensor's TensorImpl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17031

Differential Revision: D14069592

Pulled By: yf225

fbshipit-source-id: d188218f426446a44ccc1d33fc28ac3f828c6a05
2019-02-19 14:47:17 -08:00
Iurii Zdebskyi
444039c47b Bool tensor. Part 0: Boolean storage implementation (#16810)
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764
https://github.com/pytorch/pytorch/issues/4219
https://github.com/pytorch/pytorch/issues/4288

**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.

**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810

Reviewed By: gchanan

Differential Revision: D14087246

Pulled By: izdeby

fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
2019-02-19 08:22:13 -08:00
Gao, Xiang
b6b99fd7d3 Add namedtuple return for min, median, mode, kthvalue, add test for namedtuple return API (#16186)
Summary:
This partially fixes https://github.com/pytorch/pytorch/issues/394 and depend on https://github.com/pytorch/pytorch/pull/15429. I suggest to review this only after https://github.com/pytorch/pytorch/pull/15429 get landed, otherwise the diff might be large to review.

The test only allows explicitly whitelisted operators to have named return.

Differential Revision: D14070735

Pulled By: ezyang

fbshipit-source-id: ace2a672998b4e4a8094f52cbda5aa1cea6e3b42
2019-02-16 00:01:33 -08:00
Xiang Gao
4fcab92d6c Move outplace ops to ATen (#16788)
Summary:
Based on https://github.com/pytorch/pytorch/pull/12413, with the following additional changes:

-  Inside `native_functions.yml` move those outplace operators right next to everyone's corresponding inplace operators for convenience of checking if they match when reviewing
- `matches_jit_signature: True` for them
- Add missing `scatter` with Scalar source
- Add missing `masked_fill` and `index_fill` with Tensor source.
- Add missing test for `scatter` with Scalar source
- Add missing test for `masked_fill` and `index_fill` with Tensor source by checking the gradient w.r.t source
- Add missing docs to `tensor.rst`

Differential Revision: D14069925

Pulled By: ezyang

fbshipit-source-id: bb3f0cb51cf6b756788dc4955667fead6e8796e5
2019-02-15 15:58:10 -08:00
Stefan Krah
a5e7b1d032 Use IndexError instead of RuntimeError in ATen CPU kernels
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17049

Reviewed By: ezyang

Differential Revision: D14064700

Pulled By: fmassa

fbshipit-source-id: 3575db103bba5a7d82f574cbb082beca419151ec
2019-02-13 10:19:28 -08:00
vishwakftw
0d95028bee Dispatch the correct legacy function for geqrf_out and ormqr_out (#16964)
Summary:
This fixes the segfault.

Changelog:
- Modify the function calls in LegacyDefinitions for `geqrf_out` and `ormqr_out`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16964

Differential Revision: D14025985

Pulled By: gchanan

fbshipit-source-id: aa50e2c1694cbf3642273ee14b09ba12625c7d33
2019-02-12 13:48:51 -08:00