pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xiang Gao	1065e7cd24	Add `itertools.{prod, combinations, combinations_with_replacement}` like op to pytorch (#9393 ) Summary: closes https://github.com/pytorch/pytorch/issues/7580 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9393 Differential Revision: D13659628 Pulled By: zou3519 fbshipit-source-id: 3a233befa785709395a793ba8833413be394a6fd	2019-01-15 08:31:22 -08:00
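For reference, the three standard-library functions this commit title names behave as sketched below. This only shows the Python `itertools` semantics the new op is modeled on; it says nothing about the names or signatures of the tensor ops themselves.

```python
import itertools

items = [1, 2, 3]

# Cartesian product of two sequences.
pairs = list(itertools.product([1, 2], [3, 4]))

# All length-2 subsets, without repetition, in lexicographic order.
combos = list(itertools.combinations(items, 2))

# Length-2 selections where an element may be chosen more than once.
combos_rep = list(itertools.combinations_with_replacement([1, 2], 2))

print(pairs)
print(combos)
print(combos_rep)
```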
Brennan Vincent	bc233fe405	`var` for multiple dimensions (#15892 ) Summary: Timings are the same as for `std` . Pull Request resolved: https://github.com/pytorch/pytorch/pull/15892 Differential Revision: D13651173 Pulled By: umanwizard fbshipit-source-id: a26bf1021dd972aa9e3e60fb901cd4983bfa190f	2019-01-14 20:17:42 -08:00
Christian Puhrsch	d33159a426	Undo norm optimizations and add more documentation for parallel.h (#15885 ) Summary: See https://github.com/pytorch/pytorch/issues/15602 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15885 Differential Revision: D13614841 Pulled By: cpuhrsch fbshipit-source-id: 5d3e45f499d36ac287dbbc2e45798aa51eb5bfdf	2019-01-11 13:32:35 -08:00
Brennan Vincent	70dd44f6a8	Match NumPy by considering NaNs to be larger than any number when sorting (#15886 ) Summary: Fixes #15764 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15886 Differential Revision: D13612971 Pulled By: umanwizard fbshipit-source-id: 91f552a25d1fd108f2f0b10e09a0ce0364f8c21e	2019-01-11 08:14:11 -08:00
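The ordering rule in this commit (NaN compares as larger than any number) can be sketched in plain Python. The helper name `sort_nan_last` is hypothetical, and the code only illustrates the ordering, not the tensor kernel that the PR changed.

```python
import math

def sort_nan_last(values):
    """Sort floats ascending, treating NaN as larger than any number."""
    # Sort on (is_nan, value). NaN entries get a placeholder value of 0.0,
    # so the comparison never involves NaN itself.
    return sorted(values, key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v))

result = sort_nan_last([3.0, float("nan"), -1.0, float("inf"), 2.0])
print(result)
```

With this key, positive infinity still sorts before NaN, which matches the NumPy convention the commit title refers to.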
Gregory Chanan	b7cdeb3fc3	Port empty_strided to ATen. (#15948 ) Summary: Turns out this has basically been implemented already in Resize.h / Resize.cuh. Also added some testing, basically just to check that empty_strided behaves equivalently to as_strided. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15948 Differential Revision: D13631098 Pulled By: gchanan fbshipit-source-id: eb0e04eead45e4cff393ebde340f9d265779e185	2019-01-11 07:58:05 -08:00
vishwakftw	b4c3268b23	Batched upper triangular, lower triangular (#15257 ) Summary: Changelog: - Implements `triu` and `tril` for batches of 2D tensors. - Remove TH/THC binding for `tril` - Fix CUDA implementation - Update docstrings for tril and triu. - Remove mask-based `triu` and `tril` in cholesky forward and backward. - Remove batched tril in torch.distributions.utils Pull Request resolved: https://github.com/pytorch/pytorch/pull/15257 Differential Revision: D13613888 Pulled By: mrshenli fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4	2019-01-09 19:46:39 -08:00
zou3519	f0c2a9a7b6	Add torch.bincount() test case on sliced tensor (#15835 ) Summary: This was causing a problem in #15735 but appears to have been fixed. Adding this test to prevent regressions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15835 Differential Revision: D13600282 Pulled By: zou3519 fbshipit-source-id: d9939e74d372be71c50122a5f6a615fbd7fa4df6	2019-01-09 07:31:19 -08:00
vishwakftw	95febdfacc	Add is_floating_point to docs (#15704 ) Summary: Fixes #15700 . Changelog: - Expose torch.*.is_floating_point to docs Differential Revision: D13580734 Pulled By: zou3519 fbshipit-source-id: 76edb4af666c08237091a2cebf53d9ba5e6c8909	2019-01-07 10:43:22 -08:00
mruberry	b6a8c45f57	Removes print statements from test_torch.py (#15747 ) Summary: These print statements do not affect the test, and tests (generally) shouldn't print. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15747 Differential Revision: D13587289 Pulled By: soumith fbshipit-source-id: c758793c9e35faf02bacba6c7c6d072f7c40453f	2019-01-05 09:07:27 -08:00
Shen Li	efc3d6b65d	Fix vec256 inversion (#15659 ) Summary: soumith zou3519 I was browsing the code, and think `vec256_int.h` might need a minor revision, but not 100% sure. 1. It currently inverts the result by `XOR` with 0. Should it `XOR` with 1 instead? ~2. AVX2 logical operations would set all bits in a byte/word/... to `1` if the condition holds. So functions, such as `_mm256_cmpeq_epi64 ` would return `0/-1` instead of `0/1`. Should it be masked with `1` to make sure it returns 0/1?~ ~Would I be correct if I assume that the code revised below is not yet activated, but will be after we port legacy code to ATen?~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/15659 Differential Revision: D13565929 Pulled By: mrshenli fbshipit-source-id: 8ae3daf256c3d915dd855a2215c95275e899ea8c	2019-01-02 21:32:44 -08:00
surgan12	b52420742d	clamp fixes (#15479 ) Summary: fix to #15338 . Differential Revision: D13564343 Pulled By: soumith fbshipit-source-id: be64b572945533e10ae6f627d335b47f093720a3	2019-01-01 23:12:17 -08:00
vishwakftw	7bb41e3953	Make btriunpack work for high dimensional batches and faster than before (#15286 ) Summary: Changelog: - Optimize btriunpack by using `torch.where` instead of indexing, inplace operations instead of out place operations and avoiding costly permutations by computing the final permutation over a list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15286 Differential Revision: D13562038 Pulled By: soumith fbshipit-source-id: e2c94cfab5322bf1d24bf56d7b056619f553acc6	2018-12-30 12:42:07 -08:00
Vishwak Srinivasan	9c8d8eab9d	Remove TH/THC link for gesv (#15510 ) Summary: This PR removes the TH/THC binding for gesv. Changelog: - Remove TH/THC binding - Port single matrix case to ATen - Enable test_gesv for CUDA as well Pull Request resolved: https://github.com/pytorch/pytorch/pull/15510 Differential Revision: D13559990 Pulled By: soumith fbshipit-source-id: 9da2825e94d3103627e719709e6b1f8b521a07fb	2018-12-28 16:54:27 -08:00
Will Feng	7b87ecae37	Move autograd metadata from VariableImpl to TensorImpl (#13827 ) Summary: Changes originally in this PR: 1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct 2. Change Variable::Impl functions to use data members in `AutogradMeta` struct 3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl 4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()` Changes moved from https://github.com/pytorch/pytorch/pull/13645: 1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python. 2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated. This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827 Differential Revision: D13507173 Pulled By: yf225 fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a	2018-12-26 16:34:24 -08:00
Frank Zhang	d4712ee218	Added correct isinf handling for Integral tensors (#15489 ) Summary: Currently torch.isinf on an integral tensor raises RuntimeError: value cannot be converted to type int16_t without overflow: inf. This PR suppresses the error and returns false (0) for all integral tensors. The behavior is also consistent with np.isinf Pull Request resolved: https://github.com/pytorch/pytorch/pull/15489 Reviewed By: zou3519 Differential Revision: D13540786 Pulled By: flashhack fbshipit-source-id: e730dea849da6a59f3752d347bcfbadfd12c6483	2018-12-26 06:36:09 -08:00
SsnL	521894c490	Allow converting char tensor to numpy; add [fi]info.min (#15046 ) Summary: https://github.com/pytorch/pytorch/pull/14710 with test fixed. Also added `finfo.min` and `iinfo.min` to get castable tensors. cc soumith Pull Request resolved: https://github.com/pytorch/pytorch/pull/15046 Reviewed By: soumith Differential Revision: D13429388 Pulled By: SsnL fbshipit-source-id: 9a08004419c83bc5ef51d03b6df3961a9f5dbf47	2018-12-24 09:11:24 -08:00
Gao, Xiang	a47749cb28	Add at::one_hot (#15208 ) Summary: Closes: https://github.com/pytorch/pytorch/issues/15060 Differential Revision: D13528014 Pulled By: ezyang fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293	2018-12-20 14:24:58 -08:00
Shen Li	06a7cb5901	Implementing cuda kernel for tril_indices and triu_indices (#15203 ) Summary: Followup PR of #14904, and the stretch goal of #12653. Directly calculate coordinates in the original tensor using column index in the result tensor. Every GPU thread takes care of a column (two numbers) in the output tensor. The implementation detects and handles precision loss during calculating the square root of a `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` numbers. Algorithm details are described in [comments of TensorFactories.cu](`23ddb6f58a/aten/src/ATen/native/cuda/TensorFactories.cu (L109-L255)`). zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203 Reviewed By: zou3519 Differential Revision: D13517695 Pulled By: mrshenli fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea	2018-12-20 10:23:38 -08:00
Erik Brinkman	8db44eda01	Add support for batched pdist (#12302 ) Summary: This updates pdist to work for batched inputs, and updates the documentation to reflect issues raised. closes #9406 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302 Reviewed By: ezyang Differential Revision: D13528485 Pulled By: erikbrinkman fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de	2018-12-20 09:41:08 -08:00
Brennan Vincent	7a764fe270	multi-dim standard deviation for CUDA. (#14990 ) Summary: This is the CUDA version of #14535 . It refactors Reduce.cuh to allow more general classes of reductions to be performed -- we no longer assume that the temporary data returned during reduction is just one scalar, and instead allow an arbitrary accumulate type. We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases when we can, we continue to split the tensors until they can be addressed with 32-bits, as before). As an initial use-case, we implement `std` in multiple dimensions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990 Differential Revision: D13405097 Pulled By: umanwizard fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb	2018-12-20 08:56:32 -08:00
vishwakftw	41e7e1bc40	Rename potrs to cholesky_solve (#15334 ) Summary: Changelog: - Renames `potrs` to `cholesky_solve` to remain consistent with Tensorflow and Scipy (not really, they call their function chol_solve) - Default argument for upper in cholesky_solve is False. This will allow a seamless interface between `cholesky` and `cholesky_solve`, since the `upper` argument in both functions is the same. - Rename all tests - Create a tentative alias for `cholesky_solve` under the name `potrs`, and add a deprecation warning to not promote usage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15334 Differential Revision: D13507724 Pulled By: soumith fbshipit-source-id: b826996541e49d2e2bcd061b72a38c39450c76d0	2018-12-19 12:31:24 -08:00
Gregory Chanan	2469f7e02e	Port torch.linspace to ATen and parallelize it on CPU. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15320 Reviewed By: ezyang Differential Revision: D13498995 Pulled By: gchanan fbshipit-source-id: fba655d51d978fffaa53a5e4cae4a99ebfb0eddc	2018-12-18 15:01:49 -08:00
vishwakftw	214f46faf5	Fix bincount for non-contiguous inputs on CPU (#15109 ) Summary: Fixes #15058. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15109 Differential Revision: D13447448 Pulled By: soumith fbshipit-source-id: 56e8d42934538fb00465105a2c5ccfeb7c18a651	2018-12-13 09:44:20 -08:00
Tyler Moncur	895cb8fcea	Fix resize for edge case tensors (#14874 ) Summary: Certain tensor shapes failed when being resized. This pull request addresses the bug found in #13404. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14874 Differential Revision: D13429788 Pulled By: soumith fbshipit-source-id: 8aa6451dbadce46d6d1c47a01cb26e6559bcfc8c	2018-12-12 19:56:23 -08:00
Shen Li	90f9e8103c	Implement torch.tril_indices and torch.triu_indices (#12653 ) (#14904 ) Summary: This is an optimized implementation that does the following: 1. created an empty Tensor of correct size. 2. fill the Tensor with correct values. The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors. 1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations. 2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration. 3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it. <img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png"> NOTE: This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following: ```python x = torch.ones(3, 3) i = torch.tril_indices(3, 3) x[i] # need to first convert the 2D tensor into a tuple of two 1D tensors. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904 Reviewed By: zou3519 Differential Revision: D13433027 Pulled By: mrshenli fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a	2018-12-12 15:40:14 -08:00
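The NOTE in this commit says the 2D index result has to be split into two 1D index sequences before it can index the matrix. A minimal pure-Python sketch of that step follows. The `tril_indices` function here is a hypothetical stand-in that fills row and column coordinates one by one over nested lists, not the ATen implementation.

```python
def tril_indices(rows, cols):
    """Return [row_coords, col_coords] for the lower triangle, row by row."""
    row_idx, col_idx = [], []
    for r in range(rows):
        # In row r, the lower triangle covers columns 0..r (capped at cols).
        for c in range(min(r + 1, cols)):
            row_idx.append(r)
            col_idx.append(c)
    return [row_idx, col_idx]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
i = tril_indices(3, 3)

# Split the 2 x N result into two 1D sequences, then index element-wise.
row_coords, col_coords = i[0], i[1]
lower = [x[r][c] for r, c in zip(row_coords, col_coords)]
print(i)
print(lower)
```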
Brennan Vincent	f36a84b71b	fix some tests that I accidentally disabled (#15077 ) Summary: While moving these scenarios into `_test_dim_ops` I accidentally left an empty loop in the actual tests, causing them to do nothing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15077 Differential Revision: D13428759 Pulled By: umanwizard fbshipit-source-id: 08f53068981d9192c1408878b168e9053f4dc92e	2018-12-12 09:25:34 -08:00
Edward Yang	d30b6bf3b6	Revert D13306052: [pytorch][PR] Allow converting CharTensor to np arrays Differential Revision: D13306052 Original commit changeset: 202d038f139c fbshipit-source-id: 11f6bdd687f8ea5ce2e5f28f48d19449a5c403eb	2018-12-10 10:36:17 -08:00
SsnL	54d5c53826	Support torch.load with encoding (#14743 ) Summary: Addresses a common compatibility issue regarding bytes when loading Py2 checkpoints in Py3. E.g., [1] https://github.com/pytorch/pytorch/issues/5994, [2] https://github.com/CSAILVision/places365/issues/25, [3] https://discuss.pytorch.org/t/how-to-load-a-saved-model-trained-on-pytorch-0-3-1-python-2-7-on-pyorch-1-0-python-3-7/31212 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14743 Reviewed By: weiyangfb Differential Revision: D13350888 Pulled By: soumith fbshipit-source-id: 2df4e828a8b70509118a355307ca3ebe51e108f6	2018-12-10 08:07:36 -08:00
SsnL	9b2bd284b3	Convert int8 numpy array to CharTensor (#14700 ) Summary: When rewriting `default_collate`, I noticed that `from_numpy` and `as_tensor` and `tensor` all do not work on `np.int8` arrays. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14700 Reviewed By: weiyangfb Differential Revision: D13305297 Pulled By: soumith fbshipit-source-id: 2937110f65ed714ee830d50098db292238e9b2a9	2018-12-10 07:39:06 -08:00
SsnL	e1b5dbf699	Allow converting CharTensor to np arrays (#14710 ) Summary: The other direction of #14700 cc soumith Pull Request resolved: https://github.com/pytorch/pytorch/pull/14710 Reviewed By: weiyangfb Differential Revision: D13306052 Pulled By: soumith fbshipit-source-id: 202d038f139cf05e01069ff8d05268c66354c983	2018-12-10 07:35:28 -08:00
vishwakftw	fc30e2782c	Remove deprecated info argument in btrifact (#14935 ) Summary: As specified in title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14935 Differential Revision: D13394449 Pulled By: soumith fbshipit-source-id: 569d59414f3a1a43ea641bded4b5433eb53e3490	2018-12-09 15:59:30 -08:00
Brennan Vincent	25110d61fb	Implement `std` for multiple dimensions on CPU devices. (#14535 ) Summary: Tested on a tensor with 1 billion elements and 3 dimensions on a powerful, highly multi-core Linux machine. parallelized: All operations (e.g., `t.std(1)`) that could be done in the old code are now several times faster. All new operations (e.g., `t.std((0,2))`) are significantly faster than the NumPy equivalents. `t.std((0, 1, 2))`, a new operation, is logically equivalent to the old `t.std()`, but faster. serial: The above comment about old operations now being faster still holds, but `t.std((t1, ..., tn))` is now a few times slower than `t.std()`. If this turns out to be important, we can special-case that to use the old algorithm. The approach is to create a new method, `TensorIterator::foreach_reduced_elt`, valid for `TensorIterator`s that represent a dimension reduction. This method calls a supplied function for each element in the output, supplying it with the input elements that correspond to that output. Given that primitive, we can implement reductions like the following pseudocode: If there is more than one output element: ``` PARALLEL FOR EACH element IN output: accumulator = identity SERIAL FOR EACH data_point IN element.corresponding_input: accumulator.update(data_point) element = accumulator.to_output() ``` If there is only one output element, we still want to parallelize, so we do so along the input instead: ``` accumulators[n_threads] PARALLEL FOR EACH input_chunk IN input.chunks(): accumulators[thread_num()] = identity SERIAL FOR EACH data_point IN input_chunk: accumulators[thread_num()].update_with_data(data_point) accumulator = identity SERIAL FOR EACH acc in accumulators: accumulator.update_with_other_accumulator(acc) output_element = accumulator.to_output() ``` Note that accumulators and data points do not have to be the same type in general, since it might be necessary to track arbitrary amounts of data at intermediate stages. 
For example, for `std`, we use a parallel version of Welford's algorithm, which requires us to track the mean, second moment, and number of elements, so the accumulator type for `std` contains three pieces of data. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14535 Differential Revision: D13283887 Pulled By: umanwizard fbshipit-source-id: 8586b7bf00bf9f663c55d6f8323301e257f5ec3f	2018-12-07 20:16:04 -08:00
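The single-output pseudocode in this commit can be sketched in plain Python. This is an illustration under stated assumptions, not the ATen code: the class name `WelfordAccumulator`, the method names, and the chunk size are hypothetical, the "parallel" loop runs serially here, and the result uses Bessel's correction (dividing by `count - 1`). Each accumulator holds the three pieces of data the message names: count, mean, and second moment.

```python
import math
import statistics

class WelfordAccumulator:
    """Running count, mean, and sum of squared deviations (second moment)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        # Welford's single-pass update with one data point.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination of two partial accumulators.
        if other.count == 0:
            return
        total = self.count + other.count
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.count * other.count / total
        self.mean += delta * other.count / total
        self.count = total

    def to_output(self):
        # Sample standard deviation (assumes at least two data points).
        return math.sqrt(self.m2 / (self.count - 1))

data = [float(v) for v in range(1, 11)]
chunks = [data[i:i + 3] for i in range(0, len(data), 3)]

# One accumulator per chunk; in the real design each chunk would run on its own thread.
partials = []
for chunk in chunks:
    acc = WelfordAccumulator()
    for point in chunk:
        acc.update(point)
    partials.append(acc)

# Serial merge of the per-chunk accumulators into the final result.
final = WelfordAccumulator()
for acc in partials:
    final.merge(acc)

result = final.to_output()
print(result)
```

The merged result agrees with `statistics.stdev(data)` up to floating-point rounding, which is what makes the chunked formulation safe to parallelize.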
Johannes M Dieterich	52942e1f09	Enable unit tests known to work on ROCm (#14011 ) Summary: * Enable unit tests known to work on ROCm. * Disable a few that are known to be flaky for the time being. * Use std::abs for Half * No more special casing for ROCm in TensorMathReduce * Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce ezyang bddppq for awareness Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011 Differential Revision: D13387679 Pulled By: bddppq fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71	2018-12-07 18:57:32 -08:00
Jan Schlüter	1c8d41a08d	Allow linspace and logspace with steps=1 and start != end like numpy (#14748 ) Summary: `torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine. Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?" I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `start != end`, there are two possibilities for the result: Either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. Numpy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow numpy. This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`. If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748 Differential Revision: D13356136 Pulled By: ezyang fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82	2018-12-06 09:30:55 -08:00
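The behavior this commit argues for (the first value always equals `start`, so `steps=1` yields just `start`) can be sketched in plain Python. The function below is an illustrative stand-in over Python lists, not the `torch.linspace` implementation; returning an empty list for `steps == 0` is an assumption that follows NumPy's behavior.

```python
def linspace(start, end, steps):
    """Evenly spaced values from start to end; the first value is always start."""
    if steps == 0:
        return []
    if steps == 1:
        # With a single point there is no spacing to compute: return start.
        return [float(start)]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]

single = linspace(0, 1, 1)
several = linspace(0, 1, 5)
print(single)
print(several)
```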
Junjie Bai	ba0ebe33c1	Unify device argument parsing between torch and c10 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786 Differential Revision: D13334501 Pulled By: bddppq fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c	2018-12-05 18:37:32 -08:00
Richard Zou	1921816f85	Fix clamp when min/max are both None (#14716 ) Summary: Before this PR, tensor.clamp() would return an empty tensor if min and max were not specified. This is a regression from 0.4.1, which would throw an error. This PR restores that error message. Fixes #14470 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14716 Differential Revision: D13311031 Pulled By: zou3519 fbshipit-source-id: 87894db582d5749eaccfc22ba06aac4e10983880	2018-12-04 07:07:09 -08:00
Roy Li	0786dfee7c	Move THTensor_(copy) to aten (#13603 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603 P Moved vectorized CPU copy to aten. Notable changes mainly in _copy_same_type_. Reviewed By: ezyang Differential Revision: D12936031 fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1	2018-11-30 11:12:54 -08:00
Brennan Vincent	c638f379b3	Make `mean` function work across multiple dimensions. (#14252 ) Summary: Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it. Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252 Differential Revision: D13161157 Pulled By: umanwizard fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c	2018-11-28 06:53:09 -08:00
Francisco Massa	68251fb931	Fix half tensor printing plus speedup large tensor printing (#14418 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863 The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions. Some quick runtime analysis: Before this PR: ```python In [1]: import torch; a = torch.rand(1, 1700, 34, 50) In [2]: %timeit str(a) 13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` After this PR ```python In [1]: import torch; a = torch.rand(1, 1700, 34, 50) In [2]: %timeit str(a) 2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [3]: b = a.cuda() In [4]: %timeit str(b) 8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418 Reviewed By: weiyangfb Differential Revision: D13226950 Pulled By: soumith fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23	2018-11-28 06:13:06 -08:00
Brian Vaughan	a0def0b57e	check for invalid ranges in torch.arange Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13915 Differential Revision: D13222110 Pulled By: nairbv fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b	2018-11-27 20:38:56 -08:00
Brian Vaughan	b08a186153	roll along multiple dimensions Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874 Differential Revision: D13223669 Pulled By: nairbv fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04	2018-11-27 20:32:30 -08:00
Ailing Zhang	e387d945c2	allow empty index for scatter_* methods (#14077 ) Summary: Fixes #2027 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077 Differential Revision: D13095788 Pulled By: ailzhang fbshipit-source-id: ad2c8bbf83d36e07940782b9206fbdcde8905fd3	2018-11-19 09:50:21 -08:00
vishwakftw	a5891e6124	Remove debugging code in test_cholesky_batched (#14156 ) Summary: They didn't turn up in my tests because I use pytest which doesn't print debug statements if the tests pass Differential Revision: D13115227 Pulled By: soumith fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b	2018-11-17 22:28:21 -08:00
vishwakftw	a30ade1139	Batched cholesky decomposition (#14017 ) Summary: Implements batching for the Cholesky decomposition. Performance could be improved with a dedicated batched `tril` and `triu` op, which is also impeding autograd operations. Changes made: - batching code - tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`. - doc string modification - autograd modification - removal of `_batch_potrf` in `MultivariateNormal`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017 Differential Revision: D13087945 Pulled By: ezyang fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e	2018-11-17 10:49:15 -08:00
Johannes M Dieterich	ce48958606	enable more unit tests (#13166 ) Summary: This enables the distributions and utils test sets for ROCm. Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit. For attention: bddppq ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166 Differential Revision: D12814759 Pulled By: bddppq fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e	2018-11-12 18:49:52 -08:00
Vishwak Srinivasan	7b2fb012a8	Make potrs batched (#13453 ) Summary: - This is a straightforward PR, building up on the batch inverse PR, except for one change: - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty. Billing of changes: - Add batching for `potrs` - Add relevant tests - Modify doc string Minor changes: - Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`. - Add test for CUDA `potrs` (2D Tensor op) - Move the batched shape checking to `LinearAlgebraUtils.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453 Reviewed By: soumith Differential Revision: D12942039 Pulled By: zou3519 fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35	2018-11-09 15:16:26 -08:00
Brian Vaughan	4fadf571fd	handle flat rolling (no dim specified) T36264909 (#13588 ) Summary: update roll to behave as in numpy.roll when dimension to roll not specified. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588 Differential Revision: D12964295 Pulled By: nairbv fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1	2018-11-08 12:39:35 -08:00
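The `numpy.roll` convention this commit adopts (with no dimension given, flatten, roll, then restore the shape) can be sketched over nested lists. The helper name `roll_flat` is hypothetical and the code assumes a non-empty rectangular 2D input.

```python
def roll_flat(matrix, shift):
    """Roll a 2D nested list as if it were flattened, then restore its shape."""
    cols = len(matrix[0])
    flat = [value for row in matrix for value in row]
    shift %= len(flat)
    # Move the last `shift` elements to the front; shift == 0 is a no-op.
    rolled = flat[-shift:] + flat[:-shift] if shift else flat
    return [rolled[i:i + cols] for i in range(0, len(rolled), cols)]

rolled = roll_flat([[1, 2], [3, 4]], 1)
print(rolled)
```

Elements shifted past the end of one row wrap into the next row, which is the difference from rolling along a single dimension.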
vishwakftw	0a090fe60a	Fix torch.dist for infinity, zero and minus infinity norms (#13713 ) Summary: Fixes #13559 Differential Revision: D12981556 Pulled By: zou3519 fbshipit-source-id: 99e86abab3ca045257374a9212ca24e7ca59fe9d	2018-11-08 12:03:07 -08:00
Wei Yang	5dd153b1c2	speed up torch.sparse_mask() cpu kernel (#13290 ) Summary: - `sparse_mask(D, S)` is useful to implement backward for `sparse_addmm()` - previous `sparse_mask(D, S)` cpu kernel is not parallelized - this PR speeds up the cpu kernel for two separate cases: - `D.dim == S.sparse_dim`: simply parallelize the kernel - `D.dim > S.sparse_dim`: simply use CUDA kernel implementation - performance: `D.dim == S.sparse_dim` ``` >>> nnz = 100000 >>> dims = [1000, 1000] >>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)), torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz) >>> V = torch.randn(nnz) >>> size = torch.Size(dims) >>> S = torch.sparse_coo_tensor(I, V, size).coalesce() >>> D = torch.randn(dims) >>> %timeit D.sparse_mask(S) ======= before change ======= 6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ======= after change ======= 333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` `D.dim > S.sparse_dim` ``` >>> nnz = 100000 >>> dims = [1000, 1000, 2, 2] >>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)), torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz) >>> V = torch.randn(nnz, dims[2], dims[3]) >>> size = torch.Size(dims) >>> S = torch.sparse_coo_tensor(I, V, size).coalesce() >>> D = torch.randn(dims) >>> %timeit D.sparse_mask(S) ======= before change ======= 495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ======= after change ======= 594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290 Differential Revision: D12878336 Pulled By: weiyangfb fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37	2018-11-07 20:02:17 -08:00
Wei Yang	6bfce16873	fix flip() shape bug in CPU (#13344 ) Summary: - a workaround for #13292, a complete fix requires investigation on the root cause when using advanced indexing - this PR brings in `flip()` CUDA implementation for CPU kernel - with this change: ``` >>> t = torch.randn(1, 3, 4, 5) >>> t.flip(1, 3).shape torch.Size([1, 3, 4, 5]) ``` - performance: ``` ====== with this PR ====== >>> a = torch.randn(1000, 1000) >>> %timeit -r 100 a.flip(0, 1) 1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each) ====== Perf at previous PR #7873 ====== 100 loops, best of 3: 11 ms per loop ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/13344 Differential Revision: D12968003 Pulled By: weiyangfb fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d	2018-11-07 19:53:49 -08:00

516 Commits