Commit Graph

153 Commits

Author SHA1 Message Date
Vitaly Fedyunin
9a9bb448ee Revert cudnn changes #23861 (#29329)
Summary:
Broken case:

```python
x = torch.randn(192,16,50).cuda()
x = x.permute(0,2,1).contiguous().permute(0,2,1)
m = torch.nn.Conv1d(
       in_channels=16,
       out_channels=32,
       kernel_size=2,
       bias=True,
  ).cuda()

m(x)
```

This reverts commit 8160f390cf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29329

Differential Revision: D18357674

Pulled By: VitalyFedyunin

fbshipit-source-id: cdd7e77e8dcbfc5f2ab3df54eb53ccfbf703b245
2019-11-06 17:38:46 -08:00
Jie
8160f390cf (#23861)
Summary:
Added nhwc support for:
1. cudnn_batch_norm & cudnn_batch_norm_backward
2. cudnn_convolution_forward & cudnn_convolution_backward
3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward

patching suggest_memory_format for convolution

suggest_memory_format has ambiguous meaning for two cases:
1. tensor with NCHW where C = 1.
   we could use stride of C as a hint to tell the intended memory format.
2. tensor with NCHW where H == W == 1.
   there's no way to identify the intended memory format from strides.

Currently we fallback to NCHW whenever we see contiguous tensor. Hence avoiding
ambiguity for some of the special cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23861

Differential Revision: D18263434

Pulled By: VitalyFedyunin

fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9
2019-11-04 09:11:50 -08:00
Peter Bell
f33813d589 Return NotImplemented from all binary math ops (#27423)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26333

Fixes the operators missed in https://github.com/pytorch/pytorch/issues/26507 and includes a test for all operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27423

Differential Revision: D17835390

Pulled By: ezyang

fbshipit-source-id: 7a1351c7ccc8ad11454dbaa00d3701dcee4f06a8
2019-10-28 14:28:33 -07:00
Pavel Belevich
46f96d1538 C++ API parity: at::Tensor::requires_grad_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26332

Test Plan: Imported from OSS

Differential Revision: D17427575

Pulled By: pbelevich

fbshipit-source-id: 5500169a4fa0ef9cc2a7272e13b6e2d89df09260
2019-10-24 13:24:18 -07:00
Brian Vaughan
002c250139 Expose a torch.result_type and simplify tensor iterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26012

Test Plan: Imported from OSS

Differential Revision: D17556197

Pulled By: nairbv

fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c
2019-09-25 06:52:23 -07:00
Dmytro Dzhulgakov
9aad4d7b5f Fix _empty_per_channel_affine_quantized to be less hacky (#26243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26243

This is an attempt to fix _empty_per_channel_affine_quantized to be more sane. It's a factory function that nevertheless receives a Tensor argument and it throws the codegen off course.

Before people did a hacky workaround of appending _like to the function name to trick codegen, it also required non-natural argument order.

This PR explicitly allows to override the 'category' of the function to make codegen do the right thing. Now name and the argument order (in C++) make more sense.

Test Plan: Imported from OSS

Differential Revision: D17443221

Pulled By: dzhulgakov

fbshipit-source-id: c98c1c74473d8cbf637f511d26ceb949d8ae2a1a
2019-09-23 22:28:58 -07:00
Pavel Belevich
d117842e56 C++ API parity: at::Tensor::version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26561

Test Plan: Imported from OSS

Differential Revision: D17507167

Pulled By: pbelevich

fbshipit-source-id: 167890c7b745acc9cb9ce4185f1d8c1745aaecc2
2019-09-21 08:37:46 -07:00
Edward Yang
a5bcde97af Revert D17427577: C++ API parity: at::Tensor::version
Test Plan: revert-hammer

Differential Revision:
D17427577

Original commit changeset: e9b3e76ca44d

fbshipit-source-id: a5bbae208ba33a31f90ab5c9b199f232de0c6d1b
2019-09-20 11:19:43 -07:00
Pavel Belevich
198521978b C++ API parity: at::Tensor::version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26217

Test Plan: Imported from OSS

Differential Revision: D17427577

Pulled By: pbelevich

fbshipit-source-id: e9b3e76ca44df883e3038b688dd7b930752d93a2
2019-09-20 11:02:41 -07:00
Dmytro Dzhulgakov
8c1354c31b Implement more support for per-channel quantization (#26240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26240

In particular adds support for empty/empty_like which is needed for memory layouts to work.

Test Plan: Imported from OSS

Differential Revision: D17443220

Pulled By: dzhulgakov

fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
2019-09-19 13:39:17 -07:00
Pavel Belevich
fc3e1a22da C++ API parity: at::Tensor::output_nr
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26216

Test Plan: Imported from OSS

Differential Revision: D17427576

Pulled By: pbelevich

fbshipit-source-id: 351c834c6c44a2a2f915e48a1e8aa8ad7f4274b3
2019-09-19 09:11:40 -07:00
Pavel Belevich
44ffbc43de C++ API parity: at::Tensor::is_leaf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26186

Test Plan: Imported from OSS

Differential Revision: D17427580

Pulled By: pbelevich

fbshipit-source-id: c01362a3b1fdb0bd1dfc158dbf6fe1cf1d928761
2019-09-18 17:56:13 -07:00
Ralf Gommers
1b4951d3a5 Fix remaining invalid function cast warnings that show up with GCC 8/9 (#26104)
Summary:
Follow-up to gh-25483, more of the same fixes for warnings like:

```
../torch/csrc/autograd/python_variable.cpp:503:31: warning: cast between incompatible function types from ‘PyObject* (*)(THPVariable*)’ {aka ‘_object* (*)(THPVariable*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
  503 |   {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This takes the build log output for a full rebuild with GCC 9.1 from ~10,000 to ~7,000 lines.

`clang-tidy` is going to complain, no way around that - see discussion at the end of gh-25483.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26104

Differential Revision: D17396831

Pulled By: ezyang

fbshipit-source-id: d71696bfe4dbe25519e4bcb7753151c118bd39f7
2019-09-17 07:43:37 -07:00
Gregory Chanan
5aff3dbaf6 Kill 'default_init', which isn't needed anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26281

Test Plan: Imported from OSS

Differential Revision: D17397097

Pulled By: gchanan

fbshipit-source-id: fb53e90637a3dfb2300fca78f414abe2d82832f3
2019-09-16 16:20:49 -07:00
Pavel Belevich
33221b19ac C++ API parity: at::Tensor::data
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26008

Test Plan: Imported from OSS

Differential Revision: D17343488

Pulled By: pbelevich

fbshipit-source-id: b9ba5e26cad621a428a14292446d7fb5a6e5535d
2019-09-12 23:33:34 -07:00
Edward Yang
3d9c419648 Port new_empty to ATen. (#25475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25475

I got sucked into this rabbit hole when I was trying to understand
what I should do with TensorTypeId occurrences in
torch/csrc/utils/tensor_new.cpp.  I eventually concluded that all of my problems
were because Tensor.new_empty was hand implemented and not actually a native
function.  So I made it a native function.

There are a bunch of other new_* functions which should get this
treatment, but I'm sending out this PR just to show how it can
be done.

The general recipe:
1. Implement a concept of TensorOptions merging (TensorOptions::merge_in).
   This represents the notion of taking a tensor, but "overriding" some
   of its values with specific overrides.  One subtlety here is how
   devices get merged; see the comments for what our existing behavior is,
   and how I preserve it.
2. Implement new_empty as a native function, using options merging.
3. Add another special case to Python binding generation to treat new_*
   similar to *_like (i.e., handle TensorOptions correctly).  The logic
   here is probably wrong, actually; we should codegen TensorOptions
   correctly no matter what happens, but new_empty follows the same
   pattern as empty_like so I opted not to touch this code too much.
4. Delete the now defunct manual binding code.
5. Delete manual type annotations that are no longer necessary since
   we're going through native.

I didn't handle memory format correctly here.  I don't know if this function
should accept memory format; prior memory format patches didn't add support
for memory format to new_like.  If we had put memory format in TensorOptions
this wouldn't have been a question.
ghstack-source-id: 89294185

Test Plan: sandcastle & ossci

Differential Revision: D17133000

fbshipit-source-id: 00f4e98bd5174f6fd54e8aba2910ea91824771d9
2019-09-04 14:34:39 -07:00
Ailing Zhang
858493d168 generic overrideable convolution for backends (#23562)
Summary:
One possible solution based on our discussion yesterday: ezyang gchanan zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23562

Differential Revision: D16998161

Pulled By: ailzhang

fbshipit-source-id: 07fe3a335f43b4205a421b3521aeb5fa4dc80279
2019-08-27 18:33:21 -07:00
Edward Yang
d125b5ffa2 Fix C412 lint from flake8-comprehensions update. (#24184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24184

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16764168

Pulled By: ezyang

fbshipit-source-id: cc252a860fd7e4b7fb2b95c5d9fcdbf6935ffeb6
2019-08-12 14:34:45 -07:00
Richard Zou
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
vishwakftw
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd`
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
Brian Vaughan
97a604ef57 Rereapply optional ScalarType interface changes that were reverted in D16079809 (#22456)
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412

Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but seems to currently be incompatible with jit serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456

Differential Revision: D16097159

Pulled By: nairbv

fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
2019-07-03 20:03:25 -07:00
Wanchao Liang
dff2c07183 Manual revert of D16012838
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22412

Reviewed By: nairbv, houseroad

Differential Revision: D16079809

fbshipit-source-id: ee0d805ff7a2bc5f98bcc65f90b8199751c840f6
2019-07-01 19:58:21 -07:00
Roy Li
6c454ff14c Stop using Type in Python bindings (#21963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21963
ghimport-source-id: 4d9d66ba2c8587503d892b67f535cc2a62e2d19e

Test Plan: Imported from OSS

Differential Revision: D15897423

Pulled By: li-roy

fbshipit-source-id: 2dd55ceb80971df7c86545b7bfff733387f13572
2019-06-30 04:11:32 -07:00
Brian Vaughan
7707dee761 Re apply optional ScalarType changes (#22237)
Summary:
This is (mostly) the re-application of:
https://github.com/pytorch/pytorch/pull/21088

which was reverted due to an issue conflicting with changes in:
https://github.com/pytorch/pytorch/pull/22104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22237

Differential Revision: D16012838

Pulled By: nairbv

fbshipit-source-id: 35f4a73c97ab68b4e2648aca96b2176f07b5a883
2019-06-26 13:36:25 -07:00
Vitaly Fedyunin
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing BC breaking change, empty_like returns contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
vishwakftw
bcb5fd8f06 Port symeig to ATen and enable batching of inputs (#21858)
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858

Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.

Differential Revision: D15981789

Pulled By: soumith

fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
2019-06-25 12:13:27 -07:00
Michael Suo
e016a424ef Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter
Differential Revision:
D15944971

Original commit changeset: 53473c370813

fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10
2019-06-24 09:41:33 -07:00
Brian Vaughan
142361a7e4 merge interfaces that have an optional scalartype parameter (#21088)
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```

[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088

Reviewed By: ailzhang

Differential Revision: D15944971

Pulled By: nairbv

fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
2019-06-24 07:17:58 -07:00
Jerry Zhang
88921feafd change return type for q_scale and q_zero_point (#21709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709

Change the return type from Scalar to double/int64_t so we don't need to do conversion when we call other quantize related aten functions

Differential Revision: D15793003

fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
2019-06-20 20:30:39 -07:00
Jerry Zhang
fa5263af2c Add set_quantizer_ for QTensor (#21852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852

To enable change of q_scale and q_zero_point in `copy_`

Differential Revision: D15793427

fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
2019-06-18 19:50:12 -07:00
Jerry Zhang
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
Richard Zou
0d6eb209e6 Expose torch.empty(sizes, *, names, ...) to Python (#21648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648
ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694

Differential Revision: D15804482

Pulled By: zou3519

fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83
2019-06-14 11:52:47 -07:00
Richard Zou
5c0e058950 Implement at::empty(IntArrayRef, DimnameList?, TensorOptions) in aten (#21647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21647
ghimport-source-id: 1db4ec31f047f7854a39c28e2b38918dc6b44f42

Differential Revision: D15804425

Pulled By: zou3519

fbshipit-source-id: 575cc3de09287efe75e7052df129626748208d0d
2019-06-13 20:38:19 -07:00
Brennan Vincent
f4f32cecfd numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to Numpy when indexing:

```
In [15]: x=torch.randn((1000,1000))

In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: y=x.numpy()

In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: x=x.t()

In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: y=x.numpy()

In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293

Differential Revision: D15358754

Pulled By: umanwizard

fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1
2019-06-06 12:50:59 -07:00
Brennan Vincent
e268fc97c3 Re-add Tensor.T (#21175)
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.

With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.

Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...

I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.

**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175

Differential Revision: D15566970

Pulled By: umanwizard

fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
2019-06-04 17:38:25 -07:00
Edward Yang
0544a491d5 Revert D15499749: [pytorch][PR] Add Tensor.T attribute to reverse dimensions
Differential Revision:
D15499749

Original commit changeset: f3306b496667

fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878
2019-05-29 04:29:48 -07:00
Roy Li
3038cf8eee Remove THSTensor and SparseTensorRef (#20877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20877
ghimport-source-id: a07f53ca158f9a3dce7a25ef5a169871e98ea3ea

Differential Revision: D15480353

Pulled By: li-roy

fbshipit-source-id: 1152dbc4df827ded3be1a57f007a6b7de12f567f
2019-05-29 01:37:03 -07:00
vishwakftw
f6ec464890 Enable batched QR decomposition and add a some option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
Brennan Vincent
9294de8c9f Add Tensor.T attribute to reverse dimensions (#20598)
Summary:
For compatibility with numpy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598

Differential Revision: D15499749

Pulled By: umanwizard

fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc
2019-05-28 16:59:06 -07:00
Junjie Bai
c9f380df02 Add aten mkldnn linear operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210

Reviewed By: dzhulgakov

Differential Revision: D14901641

fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11
2019-04-26 13:41:57 -07:00
Roy Li
a6811e17c0 Restore copy_ overload with async arg (#19641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641
ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379

Differential Revision: D15056755

Pulled By: li-roy

fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3
2019-04-24 17:51:50 -07:00
Vitaly Fedyunin
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
Wanchao Liang
e9c8f372c4 dispatch max_pools with no indices, expose max_pools to torch namespace (#19449)
Summary:
in functional interfaces we do boolean dispatch, but all to max_pool\*d_with_indices. This change it to emit max_pool\*d op instead when it's not necessary to expose with_indices ops to different backends (for jit).

It also bind max_pool\*d to the torch namespace, which is the same behavior with avg_pool\*d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449

Differential Revision: D15016839

Pulled By: wanchaol

fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c
2019-04-23 11:20:05 -07:00
Vitaly Fedyunin
1c5073fb4b Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.

Now compatible with both torch scripts:
 `  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
`  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`

Same checked for all similar functions `rand_like`, `empty_like` and others

It is fixed version of #18455
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952

Differential Revision: D14801792

Pulled By: VitalyFedyunin

fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
2019-04-16 11:06:15 -07:00
Xiang Gao
ea2405c7dc Add torch.unique_consecutive (#19060)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/19045

Please review: VitalyFedyunin ngimel

This is independent on the #18649 series. This will cause merge conflicts in #18649 series, but please merge this first, and I will resolve the merge conflicts there.

The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`. But not at `torch.unique` yet. I will take care of the API after #18649 series get merged completely.

Benchmark on a tensor of shape `torch.Size([15320, 2])`:

```python
print(torch.__version__)
print()
a = tensor.sort().values.to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+2addccc

cpu, sorted_input=False:
340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cuda, sorted_input=False:
213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

```python
print(torch.__version__)
print()
a1, a2 = tensor.unbind(1)
indices = (a1 * tensor.max() + a2).sort().indices
a = tensor.index_select(0, indices).to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
cpu, sorted_input=False:
55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cuda, sorted_input=False:
171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060

Differential Revision: D14866909

Pulled By: ezyang

fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f
2019-04-10 07:36:08 -07:00
Vitaly Fedyunin
b7c830b916 Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)
Summary:
This reverts commit c484cf43a0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854

Differential Revision: D14778393

Pulled By: VitalyFedyunin

fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452
2019-04-05 06:25:33 -07:00
Jerry Zhang
dfcd7b0185 QTensor (#18230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230

Implementing minimum qtensor API to unblock other workstreams in quantization

Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added following user facing APIs
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()

Reviewed By: dzhulgakov

Differential Revision: D14524641

fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
2019-04-03 13:17:11 -07:00
Vitaly Fedyunin
c484cf43a0 Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455

Reviewed By: ezyang

Differential Revision: D14672084

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
2019-04-02 08:48:19 -07:00
Vishwak Srinivasan
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
vishwakftw
291746f110 Rename trtrs to triangular_solve (#18213)
Summary:
Changelog:
- Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage.
- Move `isnan` to _torch_docs.py
- Remove unnecessary imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213

Differential Revision: D14566902

Pulled By: ezyang

fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f
2019-03-21 14:27:21 -07:00