Commit Graph

1219 Commits

Author SHA1 Message Date
Michael Suo
a25b79531c use fully qualified name for ScriptClasses (#19239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19239
ghimport-source-id: 830aad6dc11d2a7247760a9c7c9fc8556f70a706

Differential Revision: D14928293

Reviewed By: eellison

Pulled By: suo

fbshipit-source-id: d2efa5d7f7397526083278d6650b9cee8d967b1a
2019-04-26 19:17:21 -07:00
Junjie Bai
c9f380df02 Add aten mkldnn linear operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210

Reviewed By: dzhulgakov

Differential Revision: D14901641

fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11
2019-04-26 13:41:57 -07:00
Karl Ostmo
8f0603b128 C++ changes toward libtorch and libcaffe2 unification (#19554)
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
  caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554

Differential Revision: D15082727

Pulled By: kostmo

fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
2019-04-26 01:38:10 -07:00
Thomas Viehmann
556c8a300b Fall back to asking nvcc for detecting cuda version if no *cudart* is found (#19741)
Summary:
This happens on Debian/Ubuntu with distribution-provided cuda repackaging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19741

Differential Revision: D15082550

Pulled By: soumith

fbshipit-source-id: 2ca39c6cdc9305896529b6fd537270116223cd6c
2019-04-25 10:54:20 -07:00
Roy Li
a6811e17c0 Restore copy_ overload with async arg (#19641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641
ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379

Differential Revision: D15056755

Pulled By: li-roy

fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3
2019-04-24 17:51:50 -07:00
Vitaly Fedyunin
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Ports the `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`.
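
A minimal usage sketch of the new function (the scratch filename and sizes below are illustrative, not from the PR):

```python
import torch

# Write 16 int32 values to a scratch file, then map the file directly into a
# tensor without constructing a Storage first.
src = torch.arange(16, dtype=torch.int32)
src.numpy().tofile('scratch.bin')            # 'scratch.bin' is a placeholder path
t = torch.from_file('scratch.bin', shared=False, size=16, dtype=torch.int32)
print(torch.equal(t, src))                   # True
```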
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
Dmytro Dzhulgakov
d247912dbf Add no-gpu build mode for all of PyTorch and Caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19687

Differential Revision: D15023347

fbshipit-source-id: 5bed0d72e8ff337e066c142ca5c8e2c2bae93746
2019-04-24 13:27:59 -07:00
Dmytro Dzhulgakov
8b798f43e3 Commit explicit libtorch_python sources (#19607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607

Explicit is better than implicit - it's pretty hard to debug where a particular file is if it's not greppable.

As a follow-up step, we should look at whether we can just include build_variables.py in CMake directly to share the setup of the two build systems.

Reviewed By: ezyang

Differential Revision: D15023348

fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df
2019-04-23 19:49:42 -07:00
James Reed
80020b3d2d Guard {set,rebase}_history on grad_fn check (#19623)
Summary:
We would previously have statements like

```
set_history(flatten_tensor_args( result ), grad_fn);
```

Internally, {set,rebase}_history would check grad_fn and short-circuit if it is nullptr. However, this means that we are executing the expression `flatten_tensor_args( result )` and immediately throwing away the results. This was causing unnecessary allocations and overhead.
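
A minimal Python sketch of the guarding idea (the real change lives in the C++ autograd code generation; the helper names below are hypothetical stand-ins, not the actual API):

```python
# Stand-in for the allocation-heavy flattening step.
def flatten_tensor_args(result):
    return list(result) if isinstance(result, (list, tuple)) else [result]

# Stand-in for recording autograd history on each output.
def set_history(flat_results, grad_fn):
    for r in flat_results:
        pass  # the real code attaches grad_fn to each output here

def set_history_guarded(result, grad_fn):
    # Check the cheap condition first so the expensive flatten is skipped
    # entirely when there is nothing to record (e.g. under torch.no_grad()).
    if grad_fn is None:
        return
    set_history(flatten_tensor_args(result), grad_fn)
```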

My JIT overhead benchmark script (with custom benchmark method):

```
import torch, time

@torch.jit.script
def add(x, y):
    return x + y

a = torch.rand([])
b = torch.rand([])

niter = 1000000

with torch.no_grad():
    s = time.time()
    add.__getattr__('forward').benchmark(niter, a, b)
    e = time.time() - s
    print('overhead per call (us)', e / niter * 1e6)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19623

Differential Revision: D15053399

Pulled By: jamesr66a

fbshipit-source-id: 8777e1a2b5c5a5bbd3a035b7247c8154c5fc4aa6
2019-04-23 15:40:11 -07:00
Wanchao Liang
e9c8f372c4 dispatch max_pools with no indices, expose max_pools to torch namespace (#19449)
Summary:
In the functional interfaces we do boolean dispatch, but always to max_pool\*d_with_indices. This changes it to emit the max_pool\*d op instead when it's not necessary to expose the with_indices ops to different backends (for the JIT).

It also binds max_pool\*d to the torch namespace, which matches the behavior of avg_pool\*d.
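
A small sketch of the user-visible behavior (input shape is arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Values only: with this change the emitted op can be the plain max_pool2d
# rather than max_pool2d_with_indices.
y = F.max_pool2d(x, kernel_size=2)

# Values and indices: still dispatches to the with_indices variant.
y, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)

# max_pool*d is now also bound to the torch namespace, mirroring avg_pool*d.
y2 = torch.max_pool2d(x, kernel_size=2)
```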
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449

Differential Revision: D15016839

Pulled By: wanchaol

fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c
2019-04-23 11:20:05 -07:00
vishwakftw
c30224ad21 Rename potri to cholesky_inverse (#19498)
Summary:
Changelog:
- Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage
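
A hedged usage sketch of the renamed API (matrix size is arbitrary):

```python
import torch

a = torch.randn(4, 4)
spd = a @ a.t() + 4 * torch.eye(4)    # symmetric positive-definite input
u = torch.cholesky(spd)               # lower-triangular factor (the default)
inv = torch.cholesky_inverse(u)       # formerly torch.potri
print(torch.allclose(inv, spd.inverse(), atol=1e-5))
```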
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498

Differential Revision: D15029901

Pulled By: ezyang

fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a
2019-04-22 08:18:39 -07:00
Roy Li
689dd800ed Generate only one Type class per backend (#19295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19295
ghimport-source-id: 9345110f91f044a449804ddd5116cc9179444a00

Differential Revision: D14948581

Pulled By: li-roy

fbshipit-source-id: a317b03d58d621e8df162918038f7543bfb13ba2
2019-04-21 21:16:14 -07:00
Roy Li
ab78449e8c Add ScalarType argument to Type::options() (#19270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19270
ghimport-source-id: a5ade6131f3260066c5750ea1fa9ed5c998bb791

Differential Revision: D14938707

Pulled By: li-roy

fbshipit-source-id: 018fb3f01706531a06515d6d861e5683a455a705
2019-04-21 21:16:07 -07:00
Gregory Chanan
9eb48e1b03 Make one_hot non-differentiable. (#19524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19524
ghimport-source-id: ceda3ad43471242ebbd272a21de11731c7d8bef6

Differential Revision: D15021417

Pulled By: gchanan

fbshipit-source-id: 65d1f17a32f81f47dba5e58e343d0b7b828e1d51
2019-04-21 14:14:37 -07:00
Gregory Chanan
6733037416 Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19523
ghimport-source-id: 618a15c2d1d9af9f87b46e32f10ff77111c2e3b7

Differential Revision: D15021420

Pulled By: gchanan

fbshipit-source-id: 048af8da3128de10bdee5827b6fbc169c3ad25a8
2019-04-21 14:14:34 -07:00
Gregory Chanan
3944601588 Have _embedding_bag_dense_backward match JIT signature. (#19522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19522
ghimport-source-id: ad645d87396de645a1aff5fd9d9939cb79cf6558

Differential Revision: D15021419

Pulled By: gchanan

fbshipit-source-id: bd7017edadb4ec9d43cefddf0aee8c52c5cca6a4
2019-04-21 14:14:30 -07:00
Gregory Chanan
e3523979ae Have embedding_dense_backward match JIT signature. (#19521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19521
ghimport-source-id: 817d3defb5f4ee98bae1f0488f99cb0e9a5226a2

Differential Revision: D15021376

Pulled By: gchanan

fbshipit-source-id: 2e29f1d3913f94fab3347dc48676303510d7da46
2019-04-21 14:14:27 -07:00
Gregory Chanan
83373e7755 Hook up non_differentiability in derivatives.yaml when no autograd function is generated. (#19520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19520
ghimport-source-id: a1272aa0b23692fb189974c4daba7b2e4e0dad50

Differential Revision: D15021380

Pulled By: gchanan

fbshipit-source-id: ec83efd4bb6d17714c060f13a0527a33a10452db
2019-04-21 13:48:55 -07:00
Gregory Chanan
8868a4f20b Move non_differentiable_arg_names from autograd functions to differentiability_info. (#19519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19519
ghimport-source-id: 74e603688b2e4ed33f6c46c7da9d009336140e74

Differential Revision: D15021378

Pulled By: gchanan

fbshipit-source-id: e366a914c67a90ba0552b67d0bf5b347edbaf189
2019-04-21 11:09:39 -07:00
James Reed
d17c22d024 Improve embedding_bag add kernel (#19329)
Summary:
This was actually getting pretty poor throughput with respect to memory bandwidth. I used this test to measure the memory bandwidth specifically for the AXPY call: https://gist.github.com/jamesr66a/b27ff9ecbe036eed5ec310c0a3cc53c5

And I got ~8 GB/s before this change, but ~14 GB/s after this change.

This seems to speed up the operator overall by around 1.3x (benchmark: https://gist.github.com/jamesr66a/c533817c334d0be432720ef5e54a4166):

== Before ==

time_per_iter 0.0001298875093460083
GB/s 3.082544287868467

== After ==

time_per_iter 0.00010104801654815674
GB/s 3.9623142905451076

The large difference between the local BW increase and the full-op BW increase likely indicates significant time is being spent elsewhere in the op, so I will investigate that.
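
For context, a small sketch of the op whose sum/add reduction kernel is being optimized (sizes here are arbitrary, just to exercise the op):

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10000, embedding_dim=128, mode='sum')
indices = torch.randint(0, 10000, (512,))
offsets = torch.arange(0, 512, 8)    # 64 bags of 8 indices each
out = bag(indices, offsets)          # shape: (64, 128); each row is a summed bag
```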

EDIT: I updated this PR to include a call into caffe2/perfkernels. This is the progression:

Before

time_per_iter 8.983819484710693e-05
GB/s 4.456723564864611

After no axpy
time_per_iter 7.19951868057251e-05
GB/s 5.56126065872172

After perfkernels
time_per_iter 5.6699180603027346e-05
GB/s 7.061548257694262

After perfkernels no grad
time_per_iter 4.388842582702637e-05
GB/s 9.122769670026413
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19329

Reviewed By: dzhulgakov

Differential Revision: D14969630

Pulled By: jamesr66a

fbshipit-source-id: 42d1015772c87bedd119e33c0aa2c8105160a738
2019-04-19 19:16:24 -07:00
Mikhail Zolotukhin
9818c7cb63 Add minimalistic implementation of subgraph matcher. (#19322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19322
ghimport-source-id: 93c713f829d1b2a9aa5d104cb1f30148dd37c967

Differential Revision: D14962182

Pulled By: ZolotukhinM

fbshipit-source-id: 3989fba06502011bed9c24f12648d0baa2a4480c
2019-04-19 16:35:16 -07:00
Gregory Chanan
1898e9368b Revert D15003385: Have embedding_dense_backward match JIT signature.
Differential Revision:
D15003385

Original commit changeset: 53cbe18aa454

fbshipit-source-id: be904ee2212aa9e402715c436a84d95f6cde326f
2019-04-19 11:27:16 -07:00
Gregory Chanan
e3470ae4bd Revert D15003379: Have _embedding_bag_dense_backward match JIT signature.
Differential Revision:
D15003379

Original commit changeset: f8e82800171f

fbshipit-source-id: 55f83557998d166aeb41d00d7a590acdc76fcf22
2019-04-19 11:27:13 -07:00
Gregory Chanan
79bfc3931a Revert D15003387: Remove 'BoolTensor', 'IndexTensor' from frontend specifications.
Differential Revision:
D15003387

Original commit changeset: e518e8ce3228

fbshipit-source-id: af5b107239446ea8d6f229a427d5b157fcafd224
2019-04-19 11:27:10 -07:00
Gregory Chanan
013926cfcf Revert D15003382: Make one_hot non-differentiable.
Differential Revision:
D15003382

Original commit changeset: e9244c7a5f0a

fbshipit-source-id: 84789cf4c46c77cce655e70c2a8ff425f32f48bd
2019-04-19 11:27:08 -07:00
Gregory Chanan
c3755eeeee Make one_hot non-differentiable. (#19430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19430
ghimport-source-id: 6787473873fdc21400138a4322e17fee8db62607

Differential Revision: D15003382

Pulled By: gchanan

fbshipit-source-id: e9244c7a5f0ad7cd2f79635944a8b37f910231c9
2019-04-19 11:03:14 -07:00
Gregory Chanan
622cf1fec9 Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19429
ghimport-source-id: 6116682b84210a34babb8b87a92e7050433e5d59

Differential Revision: D15003387

Pulled By: gchanan

fbshipit-source-id: e518e8ce322810e06175bb4e6672d4ea1eb18efd
2019-04-19 11:03:12 -07:00
Gregory Chanan
b0812d3d4c Have embedding_dense_backward match JIT signature. (#19427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19427
ghimport-source-id: 93438cd495129a1e41118c62e6339909783035fd

Differential Revision: D15003385

Pulled By: gchanan

fbshipit-source-id: 53cbe18aa4541a2501f496abfee526e40093c0ff
2019-04-19 11:03:09 -07:00
Gregory Chanan
a6ab443e32 Have _embedding_bag_dense_backward match JIT signature. (#19428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19428
ghimport-source-id: 037efa3df95efc1fbff631826351d1698a3c49ec

Differential Revision: D15003379

Pulled By: gchanan

fbshipit-source-id: f8e82800171f632e28535e416283d858156068ec
2019-04-19 11:03:06 -07:00
Gregory Chanan
30b2953b8b Stop generating autograd functions for derivatives.yaml entries that only specify output differentiability. (#19424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19424
ghimport-source-id: e9d1b86742607f5cbe39fb278fa7f378739cd6ef

Differential Revision: D15003380

Pulled By: gchanan

fbshipit-source-id: 8efb94fbc0b843863021bf25deab57c492086237
2019-04-19 10:56:20 -07:00
Gregory Chanan
ea6c738c8a Rename 'not_differentiable' to 'non_differentiable'. (#19272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19272
ghimport-source-id: 755e91efa68c5a1c4377a6853f21b3eee3f8cab5

Differential Revision: D15003381

Pulled By: gchanan

fbshipit-source-id: 54db27c5c5e65acf65821543db3217de9dd9bdb5
2019-04-19 07:07:55 -07:00
Sebastian Messmer
41dc54e291 Move function schema parser to ATen/core build target (#19282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19282

This is largely a hack because we need to use the function schema parser from ATen/core
but aren't clear yet on how the final software architecture should look.

- Add function schema parser files from jit to ATen/core build target.
- Also move ATen/core build target one directory up to allow this.

We only change the build targets and don't move the files yet because this is likely
not the final build setup and we want to avoid repeated interruptions
for other developers. cc zdevito

Reviewed By: dzhulgakov

Differential Revision: D14931922

fbshipit-source-id: 26462e2e7aec9e0964706138edd3d87a83b964e3
2019-04-18 01:03:37 -07:00
Roy Li
fbf505cba7 Remove copy and copy_ special case on Type (#18972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18972
ghimport-source-id: b5d3012b00530145fa24ab0cab693a7e80cb5989

Differential Revision: D14816530

Pulled By: li-roy

fbshipit-source-id: 9c7a166abb22d2cd1f81f352e44d9df1541b1774
2019-04-18 00:21:43 -07:00
Sebastian Messmer
c7b1fdb767 Fixing function schema parser for Android (#19281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281

String<->Number conversions aren't available in the STL used in our Android environment.
This diff adds workarounds for that so that the function schema parser can be compiled for Android.

Reviewed By: dzhulgakov

Differential Revision: D14931649

fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad
2019-04-17 23:50:17 -07:00
Sebastian Messmer
094678c04b Split function schema parser from operator (#19280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280

We want to use the function schema parser from ATen/core, but with as few dependencies as possible.
This diff moves the function schema parser into its own file and removes some of its dependencies.
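
For context, a hedged sketch of the kind of schema string this parser consumes; `torch._C.parse_schema` is the Python binding exercised in PyTorch's tests and is assumed here to be backed by this parser:

```python
import torch

schema = torch._C.parse_schema(
    "aten::add(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"
)
print(schema.name)                              # aten::add
print([arg.name for arg in schema.arguments])   # ['self', 'other', 'alpha']
```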

Reviewed By: dzhulgakov

Differential Revision: D14931651

fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea
2019-04-17 23:50:15 -07:00
Eric Faust
48859e3ad3 Allow for single-line deletions in clang_tidy.py (#19082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19082

When you have just one line of deletions, just as with additions, there is no count printed.
Without this fix, we ignore all hunks with single-line deletions when selecting which lines were changed.
When all the changes in the file were single-line, this meant no line-filtering at all!
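
A hedged sketch of the parsing detail involved (not the actual regex in clang_tidy.py): unified-diff hunk headers omit the line count when it equals 1, on the deletion side just as on the addition side, so both count groups must be optional.

```python
import re

# Optional ',count' groups cover the single-line case for both sides.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

for header in ("@@ -10,3 +12,4 @@", "@@ -10 +12,2 @@", "@@ -10,2 +12 @@"):
    old_start, old_count, new_start, new_count = HUNK_RE.match(header).groups()
    print(new_start, new_count or "1")   # a missing count means 1
```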

Differential Revision: D14860426

fbshipit-source-id: c60e9d84f9520871fc0c08fa8c772c227d06fa27
2019-04-17 17:02:30 -07:00
Nikolay Korovaiko
58d4414c33 Profiling pipeline part1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18772

Differential Revision: D14952781

Pulled By: Krovatkin

fbshipit-source-id: 1e99fc9053c377291167f0b04b0f0829b452dbc4
2019-04-16 21:21:08 -07:00
Vitaly Fedyunin
1c5073fb4b Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952)
Summary:
Make it possible to construct a pinned-memory tensor without creating a storage first and without calling the pin_memory() function. It is also faster, as the copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```
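
A short follow-up sketch: the tensor is pinned at construction, so it can go straight into an asynchronous host-to-device copy (the copy is only attempted when a GPU is present):

```python
import torch

t = torch.ones(6, pin_memory=True)
print(t.is_pinned())                          # True, no separate pin_memory() copy needed
if torch.cuda.is_available():
    gpu_t = t.to('cuda', non_blocking=True)   # async transfer from pinned host memory
```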

Part of the bigger: `Remove Storage` plan.

Now compatible with both TorchScript forms:
 `  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)`
and
`  _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))`

The same is verified for all similar functions (`rand_like`, `empty_like`, and others).

This is a fixed version of #18455.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952

Differential Revision: D14801792

Pulled By: VitalyFedyunin

fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba
2019-04-16 11:06:15 -07:00
Ilia Cherniavskii
f1c8e01524 Add input information in RecordFunction calls (#18717)
Summary:
Add input information into generated RecordFunction calls in
VariableType wrappers, JIT operators and a few more locations
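
A hedged sketch of how the recorded inputs surface to users, assuming the profiler's record_shapes/group_by_input_shape options are available:

```python
import torch

x = torch.randn(8, 8)
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    torch.mm(x, x)
# Input shapes recorded via RecordFunction show up grouped in the report.
print(prof.key_averages(group_by_input_shape=True).table())
```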
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18717

Differential Revision: D14729156

Pulled By: ilia-cher

fbshipit-source-id: 811ac4cbfd85af5c389ef030a7e82ef454afadec
2019-04-15 20:28:08 -07:00
vishwakftw
3403cb857b Modify Cholesky derivative (#19116)
Summary:
The derivative of the Cholesky decomposition was previously a triangular matrix.

Changelog:
- Modify the derivative of Cholesky from a triangular matrix to symmetric matrix
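
A hedged check of the new behavior (double precision, arbitrary size):

```python
import torch

a = torch.randn(3, 3, dtype=torch.double)
spd = (a @ a.t() + 3 * torch.eye(3, dtype=torch.double)).requires_grad_()
l = torch.cholesky(spd)
l.sum().backward()
# The gradient w.r.t. the symmetric input is now itself symmetric.
print(torch.allclose(spd.grad, spd.grad.t()))
```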
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19116

Differential Revision: D14935470

Pulled By: ezyang

fbshipit-source-id: 1c1c76b478c6b99e4e16624682842cb632e8e8b9
2019-04-15 12:16:55 -07:00
Bram Wasti
b1539412db Add pass registration mechanism (#18587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18587
ghimport-source-id: 80d753f7046a2a719e0c076684f44fa2059a0921

Differential Revision: D14901227

Pulled By: bwasti

fbshipit-source-id: 56511d0313419b63945a36b80e9ea51abdef2bd4
2019-04-12 15:32:00 -07:00
Zachary DeVito
ef406ee925 First class modules in the compiler, round 2 (#19167)
Summary:
This PR propagates the use of first-class module objects into the compiler. This creates a transitional state where:

* compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr`
* GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`.
* Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for them.

* This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound.  Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`.
* This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions. Classes have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class --has a-> Module ...

Details:
* In this transitional state, we maintain two copies of a Graph: the first-class module graph and the lowered graph. The first-class one has a self argument whose type is the module's class type. The lowered one uses the initial_ivalues inputs.
* When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class.
* The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) the traces, (2) the Python bindings, and (3) the ONNX export pass, and would make this PR way too large.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167

Differential Revision: D14891966

Pulled By: zdevito

fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea
2019-04-11 13:55:48 -07:00
Zachary DeVito
f5165ade5b Revert D14842057: Compiler uses first-class modules**
Differential Revision:
D14842057

Original commit changeset: ca6e7b5a4380

fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3
2019-04-11 06:17:01 -07:00
Zachary DeVito
5e1f0b2a07 Compiler uses first-class modules** (#19043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043
ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf

Differential Revision: D14842057

Pulled By: zdevito

fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0
2019-04-11 00:00:48 -07:00
Xiang Gao
ea2405c7dc Add torch.unique_consecutive (#19060)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/19045

Please review: VitalyFedyunin ngimel

This is independent of the #18649 series. It will cause merge conflicts in the #18649 series, but please merge this first, and I will resolve the merge conflicts there.

The new feature is exposed via `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, but not through `torch.unique` yet. I will take care of the API after the #18649 series gets merged completely.
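
For reference, a hedged sketch of the semantics being added, shown through the public torch.unique_consecutive API that eventually replaced the temporary names:

```python
import torch

x = torch.tensor([1, 1, 2, 2, 3, 1, 1, 2])
# Only *consecutive* duplicates are collapsed, so no sort is required.
values, counts = torch.unique_consecutive(x, return_counts=True)
print(values)   # tensor([1, 2, 3, 1, 2])
print(counts)   # tensor([2, 2, 1, 2, 1])
```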

Benchmark on a tensor of shape `torch.Size([15320, 2])`:

```python
print(torch.__version__)
print()
a = tensor.sort().values.to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True)
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
1.1.0a0+2addccc

cpu, sorted_input=False:
340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cuda, sorted_input=False:
213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
143 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

```python
print(torch.__version__)
print()
a1, a2 = tensor.unbind(1)
indices = (a1 * tensor.max() + a2).sort().indices
a = tensor.index_select(0, indices).to('cpu')
print('cpu, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True)
print()
print('cpu, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True)
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True)
print()
a = a.to('cuda')
print('cuda, sorted_input=False:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize()
print()
print('cuda, sorted_input=True:')
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```

```
cpu, sorted_input=False:
55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cpu, sorted_input=True:
54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

cuda, sorted_input=False:
171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

cuda, sorted_input=True:
59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060

Differential Revision: D14866909

Pulled By: ezyang

fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f
2019-04-10 07:36:08 -07:00
Richard Zou
447d74a074 EmbeddingBag w/ differentiable per_sample_weights (#18957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18957
ghimport-source-id: 7396ca08b137ea40f04285764a9d9a6d4f19227e

Reviewed By: cpuhrsch

Differential Revision: D14856526

Pulled By: zou3519

fbshipit-source-id: 949faea219c7c02ad981b1db610a477194d3f5c9
2019-04-09 18:13:06 -07:00
Richard Zou
2a2007e5ac EmbeddingBag CPU forward with per_sample_weights. (#18735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735
ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920

Reviewed By: cpuhrsch

Differential Revision: D14851415

Pulled By: zou3519

fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325
2019-04-09 18:12:55 -07:00
Vishwak Srinivasan
487388d8ad Rename btrisolve to lu_solve (#18726)
Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage
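
A hedged usage sketch of the renamed solver, assuming the companion torch.lu factorization (the renamed btrifact) is available:

```python
import torch

a = torch.randn(3, 3)
b = torch.randn(3, 2)
lu, pivots = torch.lu(a)             # factor once
x = torch.lu_solve(b, lu, pivots)    # formerly torch.btrisolve
print(torch.allclose(a @ x, b, atol=1e-4))
```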
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b
2019-04-09 15:21:24 -07:00
Xiang Gao
89145e602b Namedtuple return for gels, triangular_solve, and test refactor (#17195)
Summary:
Partial fix of: https://github.com/pytorch/pytorch/issues/394
- `gels` and `triangular_solve` now return namedtuples
- refactor test for namedtuple API for better coverage and maintainability
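
A hedged sketch of the new named fields, using triangular_solve (gels gets the same treatment):

```python
import torch

a = torch.randn(3, 3).triu() + 3 * torch.eye(3)   # well-conditioned upper-triangular
b = torch.randn(3, 2)
res = torch.triangular_solve(b, a)     # fields: solution, cloned_coefficient
print(res.solution.shape)              # same tensor as res[0]
```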
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195

Differential Revision: D14851875

Pulled By: ezyang

fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c
2019-04-09 09:13:26 -07:00
Edward Yang
48a35135fb Convert all tabs to spaces, add CI. (#18959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0
2019-04-09 08:12:26 -07:00