pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Michael Suo	a25b79531c	use fully qualified name for ScriptClasses (#19239 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19239 ghimport-source-id: 830aad6dc11d2a7247760a9c7c9fc8556f70a706 Differential Revision: D14928293 Reviewed By: eellison Pulled By: suo fbshipit-source-id: d2efa5d7f7397526083278d6650b9cee8d967b1a	2019-04-26 19:17:21 -07:00
Junjie Bai	c9f380df02	Add aten mkldnn linear operator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210 Reviewed By: dzhulgakov Differential Revision: D14901641 fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11	2019-04-26 13:41:57 -07:00
Karl Ostmo	8f0603b128	C++ changes toward libtorch and libcaffe2 unification (#19554 ) Summary: * adds TORCH_API and AT_CUDA_API in places * refactor code generation Python logic to separate caffe2/torch outputs * fix hip and asan * remove profiler_cuda from hip * fix gcc warnings for enums * Fix PythonOp::Kind Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554 Differential Revision: D15082727 Pulled By: kostmo fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4	2019-04-26 01:38:10 -07:00
Thomas Viehmann	556c8a300b	Fall back to asking nvcc for detecting cuda version if no cudaart is found (#19741 ) Summary: This happens on Debian/Ubuntu with distribution-provided cuda repackaging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19741 Differential Revision: D15082550 Pulled By: soumith fbshipit-source-id: 2ca39c6cdc9305896529b6fd537270116223cd6c	2019-04-25 10:54:20 -07:00
Roy Li	a6811e17c0	Restore copy_ overload with async arg (#19641 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641 ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379 Differential Revision: D15056755 Pulled By: li-roy fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3	2019-04-24 17:51:50 -07:00
Vitaly Fedyunin	d14abe3aff	Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688 ) Summary: Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688 Differential Revision: D15012644 Pulled By: VitalyFedyunin fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd	2019-04-24 15:38:56 -07:00
Dmytro Dzhulgakov	d247912dbf	Add no-gpu build mode for all of PyTorch and Caffe2 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19687 Differential Revision: D15023347 fbshipit-source-id: 5bed0d72e8ff337e066c142ca5c8e2c2bae93746	2019-04-24 13:27:59 -07:00
Dmytro Dzhulgakov	8b798f43e3	Commit explicit libtorch_python sources (#19607 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19607 Explicit is better than implicit - it's pretty hard to debug where particular file is if it's not greppable. As a follow up step - we should look whether we can just include build_variables.py in CMake directly to share setups of two build systems Reviewed By: ezyang Differential Revision: D15023348 fbshipit-source-id: 600ef2d1871bc28530c6a02681b284f7499904df	2019-04-23 19:49:42 -07:00
James Reed	80020b3d2d	Guard {set,rebase}_history on grad_fn check (#19623 ) Summary: We would previously have statements like ``` set_history(flatten_tensor_args( result ), grad_fn); ``` Internally, {set,rebase}_history would check grad_fn and short circuit if it is nullptr. However, this means that we are executing the expression `flatten_tensor_args( result )` and immediately throwing away the results. This was causing unnecessary allocations + overhead. My JIT overhead benchmark script (with custom benchmark method): ``` import torch, time @torch.jit.script def add(x, y): return x + y a = torch.rand([]) b = torch.rand([]) niter = 1000000 with torch.no_grad(): s = time.time() add.__getattr__('forward').benchmark(niter, a, b) e = time.time() - s print('overhead per call (us)', e / niter * 1e6) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/19623 Differential Revision: D15053399 Pulled By: jamesr66a fbshipit-source-id: 8777e1a2b5c5a5bbd3a035b7247c8154c5fc4aa6	2019-04-23 15:40:11 -07:00
Wanchao Liang	e9c8f372c4	dispatch max_pools with no indices, expose max_pools to torch namespace (#19449 ) Summary: In the functional interfaces we do boolean dispatch, but always to max_pool\d_with_indices. This changes it to emit the max_pool\d op instead when it's not necessary to expose with_indices ops to different backends (for jit). It also binds max_pool\d to the torch namespace, which matches the behavior of avg_pool\d Pull Request resolved: https://github.com/pytorch/pytorch/pull/19449 Differential Revision: D15016839 Pulled By: wanchaol fbshipit-source-id: f77cd5f0bcd6d8534c1296d89b061023a8288a2c	2019-04-23 11:20:05 -07:00
vishwakftw	c30224ad21	Rename potri to cholesky_inverse (#19498 ) Summary: Changelog: - Rename `potri` to `cholesky_inverse` to remain consistent with names of `cholesky` methods (`cholesky`, `cholesky_solve`) - Fix all callsites - Rename all tests - Create a tentative alias for `cholesky_inverse` under the name `potri` and add a deprecation warning to not promote usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/19498 Differential Revision: D15029901 Pulled By: ezyang fbshipit-source-id: 2074286dc93d8744cdc9a45d54644fe57df3a57a	2019-04-22 08:18:39 -07:00
Roy Li	689dd800ed	Generate only one Type class per backend (#19295 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19295 ghimport-source-id: 9345110f91f044a449804ddd5116cc9179444a00 Differential Revision: D14948581 Pulled By: li-roy fbshipit-source-id: a317b03d58d621e8df162918038f7543bfb13ba2	2019-04-21 21:16:14 -07:00
Roy Li	ab78449e8c	Add ScalarType argument to Type::options() (#19270 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19270 ghimport-source-id: a5ade6131f3260066c5750ea1fa9ed5c998bb791 Differential Revision: D14938707 Pulled By: li-roy fbshipit-source-id: 018fb3f01706531a06515d6d861e5683a455a705	2019-04-21 21:16:07 -07:00
Gregory Chanan	9eb48e1b03	Make one_hot non-differentiable. (#19524 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19524 ghimport-source-id: ceda3ad43471242ebbd272a21de11731c7d8bef6 Differential Revision: D15021417 Pulled By: gchanan fbshipit-source-id: 65d1f17a32f81f47dba5e58e343d0b7b828e1d51	2019-04-21 14:14:37 -07:00
Gregory Chanan	6733037416	Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19523 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19523 ghimport-source-id: 618a15c2d1d9af9f87b46e32f10ff77111c2e3b7 Differential Revision: D15021420 Pulled By: gchanan fbshipit-source-id: 048af8da3128de10bdee5827b6fbc169c3ad25a8	2019-04-21 14:14:34 -07:00
Gregory Chanan	3944601588	Have _embedding_bag_dense_backward match JIT signature. (#19522 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19522 ghimport-source-id: ad645d87396de645a1aff5fd9d9939cb79cf6558 Differential Revision: D15021419 Pulled By: gchanan fbshipit-source-id: bd7017edadb4ec9d43cefddf0aee8c52c5cca6a4	2019-04-21 14:14:30 -07:00
Gregory Chanan	e3523979ae	Have embedding_dense_backward match JIT signature. (#19521 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19521 ghimport-source-id: 817d3defb5f4ee98bae1f0488f99cb0e9a5226a2 Differential Revision: D15021376 Pulled By: gchanan fbshipit-source-id: 2e29f1d3913f94fab3347dc48676303510d7da46	2019-04-21 14:14:27 -07:00
Gregory Chanan	83373e7755	Hook up non_differentiability in derivatives.yaml when no autograd function is generated. (#19520 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19520 ghimport-source-id: a1272aa0b23692fb189974c4daba7b2e4e0dad50 Differential Revision: D15021380 Pulled By: gchanan fbshipit-source-id: ec83efd4bb6d17714c060f13a0527a33a10452db	2019-04-21 13:48:55 -07:00
Gregory Chanan	8868a4f20b	Move non_differentiable_arg_names from autograd functions to differentiability_info. (#19519 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19519 ghimport-source-id: 74e603688b2e4ed33f6c46c7da9d009336140e74 Differential Revision: D15021378 Pulled By: gchanan fbshipit-source-id: e366a914c67a90ba0552b67d0bf5b347edbaf189	2019-04-21 11:09:39 -07:00
James Reed	d17c22d024	Improve embedding_bag add kernel (#19329 ) Summary: This was actually getting pretty poor throughput with respect to memory bandwidth. I used this test to measure the memory bandwidth specifically for the AXPY call: https://gist.github.com/jamesr66a/b27ff9ecbe036eed5ec310c0a3cc53c5 And I got ~8 GB/s before this change, but ~14 GB/s after this change. This seems to speed up the operator overall by around 1.3x (benchmark: https://gist.github.com/jamesr66a/c533817c334d0be432720ef5e54a4166): == Before == time_per_iter 0.0001298875093460083 GB/s 3.082544287868467 == After == time_per_iter 0.00010104801654815674 GB/s 3.9623142905451076 The large difference between the local BW increase and the full-op BW increase likely indicates significant time is being spent elsewhere in the op, so I will investigate that. EDIT: I updated this PR to include a call into caffe2/perfkernels. This is the progression: before time_per_iter 8.983819484710693e-05 GB/s 4.456723564864611 After no axpy time_per_iter 7.19951868057251e-05 GB/s 5.56126065872172 AFter perfkernels time_per_iter 5.6699180603027346e-05 GB/s 7.061548257694262 After perfkernels no grad time_per_iter 4.388842582702637e-05 GB/s 9.122769670026413 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19329 Reviewed By: dzhulgakov Differential Revision: D14969630 Pulled By: jamesr66a fbshipit-source-id: 42d1015772c87bedd119e33c0aa2c8105160a738	2019-04-19 19:16:24 -07:00
Mikhail Zolotukhin	9818c7cb63	Add minimalistic implementation of subgraph matcher. (#19322 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19322 ghimport-source-id: 93c713f829d1b2a9aa5d104cb1f30148dd37c967 Differential Revision: D14962182 Pulled By: ZolotukhinM fbshipit-source-id: 3989fba06502011bed9c24f12648d0baa2a4480c	2019-04-19 16:35:16 -07:00
Gregory Chanan	1898e9368b	Revert D15003385: Have embedding_dense_backward match JIT signature. Differential Revision: D15003385 Original commit changeset: 53cbe18aa454 fbshipit-source-id: be904ee2212aa9e402715c436a84d95f6cde326f	2019-04-19 11:27:16 -07:00
Gregory Chanan	e3470ae4bd	Revert D15003379: Have _embedding_bag_dense_backward match JIT signature. Differential Revision: D15003379 Original commit changeset: f8e82800171f fbshipit-source-id: 55f83557998d166aeb41d00d7a590acdc76fcf22	2019-04-19 11:27:13 -07:00
Gregory Chanan	79bfc3931a	Revert D15003387: Remove 'BoolTensor', 'IndexTensor' from frontend specifications. Differential Revision: D15003387 Original commit changeset: e518e8ce3228 fbshipit-source-id: af5b107239446ea8d6f229a427d5b157fcafd224	2019-04-19 11:27:10 -07:00
Gregory Chanan	013926cfcf	Revert D15003382: Make one_hot non-differentiable. Differential Revision: D15003382 Original commit changeset: e9244c7a5f0a fbshipit-source-id: 84789cf4c46c77cce655e70c2a8ff425f32f48bd	2019-04-19 11:27:08 -07:00
Gregory Chanan	c3755eeeee	Make one_hot non-differentiable. (#19430 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19430 ghimport-source-id: 6787473873fdc21400138a4322e17fee8db62607 Differential Revision: D15003382 Pulled By: gchanan fbshipit-source-id: e9244c7a5f0ad7cd2f79635944a8b37f910231c9	2019-04-19 11:03:14 -07:00
Gregory Chanan	622cf1fec9	Remove 'BoolTensor', 'IndexTensor' from frontend specifications. (#19429 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19429 ghimport-source-id: 6116682b84210a34babb8b87a92e7050433e5d59 Differential Revision: D15003387 Pulled By: gchanan fbshipit-source-id: e518e8ce322810e06175bb4e6672d4ea1eb18efd	2019-04-19 11:03:12 -07:00
Gregory Chanan	b0812d3d4c	Have embedding_dense_backward match JIT signature. (#19427 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19427 ghimport-source-id: 93438cd495129a1e41118c62e6339909783035fd Differential Revision: D15003385 Pulled By: gchanan fbshipit-source-id: 53cbe18aa4541a2501f496abfee526e40093c0ff	2019-04-19 11:03:09 -07:00
Gregory Chanan	a6ab443e32	Have _embedding_bag_dense_backward match JIT signature. (#19428 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19428 ghimport-source-id: 037efa3df95efc1fbff631826351d1698a3c49ec Differential Revision: D15003379 Pulled By: gchanan fbshipit-source-id: f8e82800171f632e28535e416283d858156068ec	2019-04-19 11:03:06 -07:00
Gregory Chanan	30b2953b8b	Stop generating autograd functions for derivatives.yaml entries that only specify output differentiability. (#19424 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19424 ghimport-source-id: e9d1b86742607f5cbe39fb278fa7f378739cd6ef Differential Revision: D15003380 Pulled By: gchanan fbshipit-source-id: 8efb94fbc0b843863021bf25deab57c492086237	2019-04-19 10:56:20 -07:00
Gregory Chanan	ea6c738c8a	Rename 'not_differentiable' to 'non_differentiable'. (#19272 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19272 ghimport-source-id: 755e91efa68c5a1c4377a6853f21b3eee3f8cab5 Differential Revision: D15003381 Pulled By: gchanan fbshipit-source-id: 54db27c5c5e65acf65821543db3217de9dd9bdb5	2019-04-19 07:07:55 -07:00
Sebastian Messmer	41dc54e291	Move function schema parser to ATen/core build target (#19282 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19282 This is largely a hack because we need to use the function schema parser from ATen/core but aren't clear yet on what the final software architecture should look like. - Add function schema parser files from jit to ATen/core build target. - Also move ATen/core build target one directory up to allow this. We only change the build targets and don't move the files yet because this is likely not the final build setup and we want to avoid repeated interruptions for other developers. cc zdevito Reviewed By: dzhulgakov Differential Revision: D14931922 fbshipit-source-id: 26462e2e7aec9e0964706138edd3d87a83b964e3	2019-04-18 01:03:37 -07:00
Roy Li	fbf505cba7	Remove copy and copy_ special case on Type (#18972 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18972 ghimport-source-id: b5d3012b00530145fa24ab0cab693a7e80cb5989 Differential Revision: D14816530 Pulled By: li-roy fbshipit-source-id: 9c7a166abb22d2cd1f81f352e44d9df1541b1774	2019-04-18 00:21:43 -07:00
Sebastian Messmer	c7b1fdb767	Fixing function schema parser for Android (#19281 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281 String<->Number conversions aren't available in the STL used in our Android environment. This diff adds workarounds for that so that the function schema parser can be compiled for android Reviewed By: dzhulgakov Differential Revision: D14931649 fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad	2019-04-17 23:50:17 -07:00
Sebastian Messmer	094678c04b	Split function schema parser from operator (#19280 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280 We want to use the function schema parser from ATen/core, but with as little dependencies as possible. This diff moves the function schema parser into its own file and removes some of its dependencies. Reviewed By: dzhulgakov Differential Revision: D14931651 fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea	2019-04-17 23:50:15 -07:00
Eric Faust	48859e3ad3	Allow for single-line deletions in clang_tidy.py (#19082 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19082 When you have just one line of deletions, just as with additions, there is no count printed. Without this fix, we ignore all globs with single-line deletions when selecting which lines were changed. When all the changes in the file were single-line, this meant no line-filtering at all! Differential Revision: D14860426 fbshipit-source-id: c60e9d84f9520871fc0c08fa8c772c227d06fa27	2019-04-17 17:02:30 -07:00
Nikolay Korovaiko	58d4414c33	Profiling pipeline part1 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18772 Differential Revision: D14952781 Pulled By: Krovatkin fbshipit-source-id: 1e99fc9053c377291167f0b04b0f0829b452dbc4	2019-04-16 21:21:08 -07:00
Vitaly Fedyunin	1c5073fb4b	Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors (#18952 ) Summary: Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary. Supported functions: ```python torch.rand_like(t, pin_memory=True) torch.randn_like(t, pin_memory=True) torch.empty_like(t, pin_memory=True) torch.full_like(t, 4, pin_memory=True) torch.zeros_like(t, pin_memory=True) torch.ones_like(t, pin_memory=True) torch.tensor([10,11], pin_memory=True) torch.randn(3, 5, pin_memory=True) torch.rand(3, pin_memory=True) torch.zeros(3, pin_memory=True) torch.randperm(3, pin_memory=True) torch.empty(6, pin_memory=True) torch.ones(6, pin_memory=True) torch.eye(6, pin_memory=True) torch.arange(3, 5, pin_memory=True) ``` Part of the bigger: `Remove Storage` plan. Now compatible with both torch scripts: ` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)` and ` _1 = torch.zeros([10], dtype=6, layout=0, device=torch.device("cpu"))` Same checked for all similar functions `rand_like`, `empty_like` and others It is fixed version of #18455 Pull Request resolved: https://github.com/pytorch/pytorch/pull/18952 Differential Revision: D14801792 Pulled By: VitalyFedyunin fbshipit-source-id: 8dbc61078ff7a637d0ecdb95d4e98f704d5450ba	2019-04-16 11:06:15 -07:00
Ilia Cherniavskii	f1c8e01524	Add input information in RecordFunction calls (#18717 ) Summary: Add input information into generated RecordFunction calls in VariableType wrappers, JIT operators and a few more locations Pull Request resolved: https://github.com/pytorch/pytorch/pull/18717 Differential Revision: D14729156 Pulled By: ilia-cher fbshipit-source-id: 811ac4cbfd85af5c389ef030a7e82ef454afadec	2019-04-15 20:28:08 -07:00
vishwakftw	3403cb857b	Modify Cholesky derivative (#19116 ) Summary: The derivative of the Cholesky decomposition was previously a triangular matrix. Changelog: - Modify the derivative of Cholesky from a triangular matrix to symmetric matrix Pull Request resolved: https://github.com/pytorch/pytorch/pull/19116 Differential Revision: D14935470 Pulled By: ezyang fbshipit-source-id: 1c1c76b478c6b99e4e16624682842cb632e8e8b9	2019-04-15 12:16:55 -07:00
Bram Wasti	b1539412db	Add pass registration mechanism (#18587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18587 ghimport-source-id: 80d753f7046a2a719e0c076684f44fa2059a0921 Differential Revision: D14901227 Pulled By: bwasti fbshipit-source-id: 56511d0313419b63945a36b80e9ea51abdef2bd4	2019-04-12 15:32:00 -07:00
Zachary DeVito	ef406ee925	First class modules in the compiler, round 2 (#19167 ) Summary: This PR propagates where we use first-class module objects into the compiler. This creates a transitionary state where: * compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr` * GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`. * Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for things. * This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound. Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`. * This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions. Classes have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ... Details: * In this transitionary state, we maintain two copies of a Graph, first-class module and lowered. The first-class one has a self argument that is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs. * When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class. * The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) the traces, (2) the python bindings, and (3) the onnx export pass and would make this PR way too large. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167 Differential Revision: D14891966 Pulled By: zdevito fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea	2019-04-11 13:55:48 -07:00
Zachary DeVito	f5165ade5b	Revert D14842057: Compiler uses first-class modules** Differential Revision: D14842057 Original commit changeset: ca6e7b5a4380 fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3	2019-04-11 06:17:01 -07:00
Zachary DeVito	5e1f0b2a07	Compiler uses first-class modules** (#19043 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043 ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf Differential Revision: D14842057 Pulled By: zdevito fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0	2019-04-11 00:00:48 -07:00
Xiang Gao	ea2405c7dc	Add torch.unique_consecutive (#19060 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/19045 Please review: VitalyFedyunin ngimel This is independent on the #18649 series. This will cause merge conflicts in #18649 series, but please merge this first, and I will resolve the merge conflicts there. The new feature is exposed in `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`. But not at `torch.unique` yet. I will take care of the API after #18649 series get merged completely. Benchmark on a tensor of shape `torch.Size([15320, 2])`: ```python print(torch.__version__) print() a = tensor.sort().values.to('cpu') print('cpu, sorted_input=False:') %timeit torch._unique2_temporary_will_remove_soon(a) %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True) %timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True) %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True) print() print('cpu, sorted_input=True:') %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True) %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True) %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True) %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True) print() a = a.to('cuda') print('cuda, sorted_input=False:') %timeit torch._unique2_temporary_will_remove_soon(a); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, return_counts=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, return_inverse=True, return_counts=True); torch.cuda.synchronize() print() print('cuda, sorted_input=True:') %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True); 
torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_counts=True); torch.cuda.synchronize() %timeit torch._unique2_temporary_will_remove_soon(a, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize() ``` ``` 1.1.0a0+2addccc cpu, sorted_input=False: 340 µs ± 5.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 717 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 52.3 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 52.3 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) cpu, sorted_input=True: 32.8 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 49.9 µs ± 557 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 51.6 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 78 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cuda, sorted_input=False: 213 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 291 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 250 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 321 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cuda, sorted_input=True: 45.6 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 110 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 82 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 143 µs ± 409 ns per loop (mean ± std. dev. 
of 7 runs, 10000 loops each) ``` ```python print(torch.__version__) print() a1, a2 = tensor.unbind(1) indices = (a1 * tensor.max() + a2).sort().indices a = tensor.index_select(0, indices).to('cpu') print('cpu, sorted_input=False:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True) print() print('cpu, sorted_input=True:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True) %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True, return_counts=True) print() a = a.to('cuda') print('cuda, sorted_input=False:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_counts=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, return_inverse=True, return_counts=True); torch.cuda.synchronize() print() print('cuda, sorted_input=True:') %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_inverse=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted_input=True, return_counts=True); torch.cuda.synchronize() %timeit torch._unique_dim2_temporary_will_remove_soon(a, 
dim=0, sorted_input=True, return_inverse=True, return_counts=True); torch.cuda.synchronize() ``` ``` cpu, sorted_input=False: 55.4 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.8 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.2 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.1 ms ± 725 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) cpu, sorted_input=True: 54.7 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 55.2 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 54.5 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 54.9 ms ± 577 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) cuda, sorted_input=False: 171 µs ± 783 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 220 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 203 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 251 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cuda, sorted_input=True: 59.6 µs ± 757 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 113 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) 93.2 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 147 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) ``` The CPU implementation of `unique_dim` is super slow, see https://github.com/pytorch/pytorch/issues/18987, but this PR will not worry about this issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19060 Differential Revision: D14866909 Pulled By: ezyang fbshipit-source-id: d20012cec68c37b05cf770a6f4d6524f910b950f	2019-04-10 07:36:08 -07:00
Richard Zou	447d74a074	EmbeddingBag w/ differentiable per_sample_weights (#18957 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18957 ghimport-source-id: 7396ca08b137ea40f04285764a9d9a6d4f19227e Reviewed By: cpuhrsch Differential Revision: D14856526 Pulled By: zou3519 fbshipit-source-id: 949faea219c7c02ad981b1db610a477194d3f5c9	2019-04-09 18:13:06 -07:00
Richard Zou	2a2007e5ac	EmbeddingBag CPU forward with per_sample_weights. (#18735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735 ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920 Reviewed By: cpuhrsch Differential Revision: D14851415 Pulled By: zou3519 fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325	2019-04-09 18:12:55 -07:00
Vishwak Srinivasan	487388d8ad	Rename btrisolve to lu_solve (#18726 ) Summary: Changelog: - Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`) - Fix all callsites - Rename all tests - Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726 Differential Revision: D14726237 Pulled By: zou3519 fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b	2019-04-09 15:21:24 -07:00
Xiang Gao	89145e602b	Namedtuple return for gels, triangular_solve, and test refactor (#17195 ) Summary: Partial fix of: https://github.com/pytorch/pytorch/issues/394 - `gels` and `triangular_solve` now returns namedtuple - refactor test for namedtuple API for better coverage and maintainability Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195 Differential Revision: D14851875 Pulled By: ezyang fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c	2019-04-09 09:13:26 -07:00
Edward Yang	48a35135fb	Convert all tabs to spaces, add CI. (#18959 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959 ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156 Differential Revision: D14831246 Pulled By: ezyang fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0	2019-04-09 08:12:26 -07:00


1219 Commits
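
Note on 48859e3ad3 (single-line deletions in clang_tidy.py): the fix hinges on a detail of unified-diff hunk headers, where the `,count` part is omitted whenever a side of the hunk covers exactly one line. The sketch below is an illustration only, not the code in clang_tidy.py; the names `HUNK_RE`, `parse_hunk_header`, and `changed_line_ranges` are hypothetical. It shows a header pattern that treats the count as optional on both the deletion and the addition side.

```python
import re

# Unified-diff hunk header: "@@ -start[,count] +start[,count] @@".
# The ",count" part is omitted when the count is exactly 1, on either side.
# (Hypothetical pattern for illustration; not taken from clang_tidy.py.)
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count), or None."""
    match = HUNK_RE.match(line)
    if match is None:
        return None
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        1 if old_count is None else int(old_count),  # missing count means 1
        int(new_start),
        1 if new_count is None else int(new_count),  # missing count means 1
    )


def changed_line_ranges(diff_text):
    """Collect [first, last] line ranges on the new side of each hunk."""
    ranges = []
    for line in diff_text.splitlines():
        parsed = parse_hunk_header(line)
        if parsed is None:
            continue
        _, _, new_start, new_count = parsed
        if new_count > 0:  # a pure deletion contributes no new-side lines
            ranges.append([new_start, new_start + new_count - 1])
    return ranges
```

A pattern that makes only the addition count optional would fail to match a header such as `@@ -10 +10,2 @@`, so that hunk would be skipped when collecting changed lines, which is the symptom the commit message describes.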