pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Andrew M. James	cfb2034b65	Add spdiags sparse matrix initialization (#78439) Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags) Part of #70926 In other functions (e.g. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to. Here the reference implementation from scipy only considers matrix output, so even if we only support 2-D output at first, it may be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output. The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output: ``` torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor ``` Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`. This would need to be altered for the case where `len(shape)` > 2. One option is: ``` torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor ``` Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; the same restrictions as in the 2-D case above also apply to the elements of `diagonals` and `offsets` pairwise (that is `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). 
This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2-D case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and `shape` is 2-D. This option allows the rows of each input element `diagonals[i]` to have a different length, which may be appropriate as the max length of a diagonal along different dimension pairs will be different. Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like: ``` torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor ``` Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets`, and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension being fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`. In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default arguments logic; however, I think the first is the better choice as it offers more flexibility. I think some discussion is required about: - [x] Should the N-D output case be implemented from the outset - [x] If not, should the future addition of the N-D output case be considered when designing the interface. 
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case. Resolution: Since no one has offered a request for N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439 Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu	2022-06-30 19:54:47 +00:00
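The 2-D placement rule discussed in the spdiags message can be sketched in plain Python. This is only an illustration under a stated assumption: it follows the convention documented for `scipy.sparse.spdiags`, where the entry in column `j` of a diagonal is taken from position `j` of the corresponding row of `diagonals`. The helper name `spdiags_reference` is hypothetical, and it builds a dense list of lists rather than a sparse tensor.

```python
def spdiags_reference(diagonals, offsets, shape):
    """Dense reference for the 2-D case (hypothetical helper, not a torch API).

    Assumes the scipy.sparse.spdiags convention: diagonals[k][j] is written to
    column j of the diagonal whose offset is offsets[k]. Entries that would
    fall outside the output matrix are dropped.
    """
    n_rows, n_cols = shape
    if len(diagonals) != len(offsets):
        raise ValueError("offsets needs one entry per row of diagonals")
    out = [[0] * n_cols for _ in range(n_rows)]
    for diag_values, offset in zip(diagonals, offsets):
        for col, value in enumerate(diag_values):
            row = col - offset  # offset > 0 is above the main diagonal
            if 0 <= row < n_rows and col < n_cols:
                out[row][col] = value
    return out


if __name__ == "__main__":
    data = [[1, 2, 3, 4]] * 3
    for line in spdiags_reference(data, [0, -1, 2], (4, 4)):
        print(line)
```

The checks in the message (`diagonals` 2-D, `offsets` 1-D, matching first dimension, `len(shape) == 2`) correspond to the length check and the two-element `shape` unpacking above.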
Christian Puhrsch	5da776dd08	[Resubmission] fix mul_out CUDA config for COO tensors (#80254 ) Fixes https://github.com/pytorch/pytorch/issues/79914 Duplicate of https://github.com/pytorch/pytorch/pull/79937 . I wasn't able to push changes to the existing PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/80254 Approved by: https://github.com/eellison	2022-06-28 00:47:03 +00:00
Nikita Vedeneev	417677bf62	`permute` for COO sparse tensors (#79707) As per title. Partial implementation of https://github.com/pytorch/pytorch/issues/78422. View semantics cannot be satisfied once the permutation operates over sparse dims. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79707 Approved by: https://github.com/cpuhrsch	2022-06-25 08:49:58 +00:00
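Why permuting sparse dims cannot return a view can be seen from a small sketch. It assumes a COO tensor whose dimensions are all sparse, stored as one index list per dimension plus a value list. The function `permute_coo` is hypothetical and is not the implementation from the PR.

```python
def permute_coo(indices, values, shape, dims):
    """Permute the dimensions of an all-sparse COO tensor (illustrative only).

    indices: one list of coordinates per dimension, each of length nnz.
    dims:    the desired ordering of the original dimensions.
    """
    if sorted(dims) != list(range(len(shape))):
        raise ValueError("dims must be a permutation of range(len(shape))")
    # The coordinate rows are reordered into a new structure, so the result
    # does not share its index layout with the input.
    new_indices = [list(indices[d]) for d in dims]
    new_shape = tuple(shape[d] for d in dims)
    return new_indices, list(values), new_shape


if __name__ == "__main__":
    # A 2 x 3 tensor with entries (0, 2) -> 10 and (1, 0) -> 20, transposed.
    print(permute_coo([[0, 1], [2, 0]], [10, 20], (2, 3), (1, 0)))
```

After the reordering, the entries are in general no longer sorted lexicographically by their new coordinates, which is one more reason the result cannot simply alias the input.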
Nikita Vedeneev	03cf01bdc0	`index_select` for COO CUDA tensors. (#77551 ) Brings a native CUDA implementation for `index_select`. Master silently converts CUDA tensors to CPU for CUDA support. Case `nnz >> size` could be optimized similar to how https://github.com/pytorch/pytorch/pull/72710 is doing that. Some benchmarks: <details> <summary>PR/torch_sparse/master</summary> ``` [------------------------------- cuda coo.index_select -------------------------------] \| PR \| torch_sparse \| master 32 threads: --------------------------------------------------------------------------- n=10000, nnz=100, index_len=100, dim=0 \| 96 \| 327 \| 70 n=10000, nnz=100, index_len=100, dim=1 \| 120 \| 505 \| 74 n=10000, nnz=100, index_len=1000, dim=0 \| 90 \| 333 \| 93 n=10000, nnz=100, index_len=1000, dim=1 \| 120 \| 499 \| 98 n=10000, nnz=100, index_len=10000, dim=0 \| 92 \| 331 \| 350 n=10000, nnz=100, index_len=10000, dim=1 \| 100 \| 506 \| 352 n=100000, nnz=1000, index_len=100, dim=0 \| 53 \| 274 \| 60 n=100000, nnz=1000, index_len=100, dim=1 \| 90 \| 368 \| 71 n=100000, nnz=1000, index_len=1000, dim=0 \| 93 \| 332 \| 100 n=100000, nnz=1000, index_len=1000, dim=1 \| 130 \| 501 \| 140 n=100000, nnz=1000, index_len=10000, dim=0 \| 100 \| 341 \| 522 n=100000, nnz=1000, index_len=10000, dim=1 \| 130 \| 530 \| 549 n=1000000, nnz=10000, index_len=100, dim=0 \| 90 \| 429 \| 110 n=1000000, nnz=10000, index_len=100, dim=1 \| 296 \| 810 \| 355 n=1000000, nnz=10000, index_len=1000, dim=0 \| 100 \| 435 \| 170 n=1000000, nnz=10000, index_len=1000, dim=1 \| 309 \| 830 \| 548 n=1000000, nnz=10000, index_len=10000, dim=0 \| 110 \| 446 \| 750 n=1000000, nnz=10000, index_len=10000, dim=1 \| 310 \| 830 \| 1000 n=10, nnz=100, index_len=100, dim=0 \| 90 \| 333 \| 74 n=10, nnz=100, index_len=100, dim=1 \| 100 \| 497 \| 78 n=10, nnz=100, index_len=1000, dim=0 \| 90 \| 329 \| 140 n=10, nnz=100, index_len=1000, dim=1 \| 100 \| 800 \| 100 n=10, nnz=100, index_len=10000, dim=0 \| 93 \| 340 \| 900 n=10, 
nnz=100, index_len=10000, dim=1 \| 120 \| 800 \| 489 n=10, nnz=1000, index_len=100, dim=0 \| 90 \| 321 \| 140 n=10, nnz=1000, index_len=100, dim=1 \| 100 \| 680 \| 140 n=10, nnz=1000, index_len=1000, dim=0 \| 110 \| 349 \| 670 n=10, nnz=1000, index_len=1000, dim=1 \| 130 \| 740 \| 800 n=10, nnz=1000, index_len=10000, dim=0 \| 302 \| 503 \| 4882 n=10, nnz=1000, index_len=10000, dim=1 \| 325 \| 2257 \| 5262 n=10, nnz=10000, index_len=100, dim=0 \| 229 \| 349 \| 810 n=10, nnz=10000, index_len=100, dim=1 \| 433 \| 870 \| 700 n=10, nnz=10000, index_len=1000, dim=0 \| 666 \| 502 \| 5581 n=10, nnz=10000, index_len=1000, dim=1 \| 826 \| 2379 \| 4820 n=10, nnz=10000, index_len=10000, dim=0 \| 2534 \| 2700 \| 80000 n=10, nnz=10000, index_len=10000, dim=1 \| 2723 \| 18540 \| 80000 n=100, nnz=1000, index_len=100, dim=0 \| 94 \| 324 \| 110 n=100, nnz=1000, index_len=100, dim=1 \| 100 \| 499 \| 110 n=100, nnz=1000, index_len=1000, dim=0 \| 96 \| 337 \| 150 n=100, nnz=1000, index_len=1000, dim=1 \| 130 \| 800 \| 140 n=100, nnz=1000, index_len=10000, dim=0 \| 100 \| 346 \| 900 n=100, nnz=1000, index_len=10000, dim=1 \| 130 \| 760 \| 900 n=100, nnz=10000, index_len=100, dim=0 \| 90 \| 323 \| 190 n=100, nnz=10000, index_len=100, dim=1 \| 279 \| 800 \| 180 n=100, nnz=10000, index_len=1000, dim=0 \| 110 \| 339 \| 781 n=100, nnz=10000, index_len=1000, dim=1 \| 294 \| 870 \| 800 n=100, nnz=10000, index_len=10000, dim=0 \| 315 \| 505 \| 6264 n=100, nnz=10000, index_len=10000, dim=1 \| 497 \| 2398 \| 5404 n=1000, nnz=10000, index_len=100, dim=0 \| 90 \| 333 \| 160 n=1000, nnz=10000, index_len=100, dim=1 \| 279 \| 635 \| 150 n=1000, nnz=10000, index_len=1000, dim=0 \| 100 \| 328 \| 215 n=1000, nnz=10000, index_len=1000, dim=1 \| 287 \| 810 \| 207 n=1000, nnz=10000, index_len=10000, dim=0 \| 100 \| 339 \| 900 n=1000, nnz=10000, index_len=10000, dim=1 \| 291 \| 880 \| 1000 n=1000, nnz=100000, index_len=100, dim=0 \| 92 \| 358 \| 435 n=1000, nnz=100000, index_len=100, dim=1 \| 302 \| 900 \| 
530 n=1000, nnz=100000, index_len=1000, dim=0 \| 130 \| 360 \| 1000 n=1000, nnz=100000, index_len=1000, dim=1 \| 329 \| 930 \| 1200 n=1000, nnz=100000, index_len=10000, dim=0 \| 343 \| 530 \| 7000 n=1000, nnz=100000, index_len=10000, dim=1 \| 545 \| 2446 \| 6100 n=1000, nnz=1000000, index_len=100, dim=0 \| 355 \| 394 \| 2210 n=1000, nnz=1000000, index_len=100, dim=1 \| 1660 \| 2276 \| 2674 n=1000, nnz=1000000, index_len=1000, dim=0 \| 877 \| 574 \| 6700 n=1000, nnz=1000000, index_len=1000, dim=1 \| 2449 \| 3782 \| 9000 n=1000, nnz=1000000, index_len=10000, dim=0 \| 3112 \| 2931 \| 57000 n=1000, nnz=1000000, index_len=10000, dim=1 \| 7340 \| 20220 \| 65700 Times are in microseconds (us). ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/77551 Approved by: https://github.com/cpuhrsch	2022-06-01 17:39:03 +00:00
Mike Ruberry	089203f8bc	Updates floor_divide to perform floor division (#78411 ) Fixes https://github.com/pytorch/pytorch/issues/43874 This PR changes floor_divide to perform floor division instead of truncation division. This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases this behavior would change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411 Approved by: https://github.com/ngimel	2022-05-29 21:28:45 +00:00
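The difference between the two rounding modes named in the floor_divide message only shows up when the quotient is negative. The sketch below uses Python's `math` module on scalars rather than torch tensors, and the two helper names are chosen for illustration.

```python
import math


def truncation_divide(a, b):
    """Round the quotient toward zero (the old behavior described in the message)."""
    return math.trunc(a / b)


def floor_division(a, b):
    """Round the quotient toward negative infinity (the new behavior)."""
    return math.floor(a / b)


if __name__ == "__main__":
    for a, b in [(7, 2), (-7, 2)]:
        print(a, b, truncation_divide(a, b), floor_division(a, b))
```

For `7 / 2` both modes give `3`; for `-7 / 2` truncation gives `-3` while floor division gives `-4`, which is why the change is BC-breaking.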
Nikita Vedeneev	00a1fb64bb	Faster `index_select` for sparse COO tensors on CPU. (#72710 ) Fixes https://github.com/pytorch/pytorch/issues/72212. This PR improves the previous algorithm in complexity. It also utilizes the structure of the problem and parallelizes computations when possible. Benchmark results. <details> <summary>Testing script</summary> ```python import torch import math from IPython import get_ipython from itertools import product import pickle from torch.utils.benchmark import Timer, Compare torch.manual_seed(13) #torch.set_num_threads(1) ipython = get_ipython() index_sizes = (100, 1000, 10000) # specifies (n, nnz) problem_dims = ( # n > nnz (10000, 100), (100000, 1000), (1000000, 10000), # n < nnz (10, 100), (10, 1000), (10, 10000), (100, 1000), (100, 10000), (1000, 10000), (1000, 100000), (1000, 1000000), #(1000000, 1000000000), ) def f(t, d, index): s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t) ss = s.index_select(d, index) return ss.coo() name = "PR" results = [] for (n, nnz), m in product(problem_dims, index_sizes): for d in (0, 1): if nnz < n: shape = (n, n) else: shape = (n, nnz // n) if d == 0 else (nnz // n, n) nrows, ncols = shape rowidx = torch.randint(low=0, high=nrows, size=(nnz,)) colidx = torch.randint(low=0, high=ncols, size=(nnz,)) itemidx = torch.vstack((rowidx, colidx)) xvalues = torch.randn(nnz) index = torch.randint(low=0, high=n, size=(m,)) SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce() smtp = "SparseX.index_select(d, index)" timer = Timer(smtp, globals=globals(), label="coo.index_select", description=f"{name}: coo.index_select", sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}", num_threads=torch.get_num_threads()) results.append(timer.blocked_autorange()) compare = Compare(results) compare.trim_significant_figures() compare.print() with open(f"{name}_index_select.pickle", 'wb') as f: pickle.dump(results, f) ``` </details> <details> <summary>Gather results</summary> ```python 
import pickle from torch.utils.benchmark import Timer, Compare files = [ "PR", "torch_sparse", "master" ] timers = [] for name in files: with open("{}_index_select.pickle".format(name), 'rb') as f: timers += pickle.load(f) compare = Compare(timers) compare.trim_significant_figures() compare.print() ``` </details> <details> <summary>PR/torch_sparse/master runtime comparison</summary> ``` [----------------------------------- coo.index_select ----------------------------------] \| PR \| torch_sparse \| master 32 threads: ----------------------------------------------------------------------------- n=10000, nnz=100, index_len=100, dim=0 \| 14 \| 140 \| 10 n=10000, nnz=100, index_len=100, dim=1 \| 14 \| 200 \| 10 n=10000, nnz=100, index_len=1000, dim=0 \| 30 \| 180 \| 38 n=10000, nnz=100, index_len=1000, dim=1 \| 34 \| 240 \| 38 n=10000, nnz=100, index_len=10000, dim=0 \| 278 \| 460 \| 330 n=10000, nnz=100, index_len=10000, dim=1 \| 275 \| 516 \| 330 n=100000, nnz=1000, index_len=100, dim=0 \| 16 \| 290 \| 31 n=100000, nnz=1000, index_len=100, dim=1 \| 26 \| 390 \| 31 n=100000, nnz=1000, index_len=1000, dim=0 \| 45 \| 405 \| 263 n=100000, nnz=1000, index_len=1000, dim=1 \| 73 \| 500 \| 261 n=100000, nnz=1000, index_len=10000, dim=0 \| 444 \| 783 \| 2570 n=100000, nnz=1000, index_len=10000, dim=1 \| 470 \| 890 \| 2590 n=1000000, nnz=10000, index_len=100, dim=0 \| 25 \| 2400 \| 270 n=1000000, nnz=10000, index_len=100, dim=1 \| 270 \| 4000 \| 269 n=1000000, nnz=10000, index_len=1000, dim=0 \| 74 \| 2600 \| 2620 n=1000000, nnz=10000, index_len=1000, dim=1 \| 464 \| 3600 \| 2640 n=1000000, nnz=10000, index_len=10000, dim=0 \| 635 \| 3300 \| 26400 n=1000000, nnz=10000, index_len=10000, dim=1 \| 1000 \| 3960 \| 26400 n=10, nnz=100, index_len=100, dim=0 \| 16 \| 137 \| 16 n=10, nnz=100, index_len=100, dim=1 \| 16 \| 220 \| 16 n=10, nnz=100, index_len=1000, dim=0 \| 63 \| 238 \| 81 n=10, nnz=100, index_len=1000, dim=1 \| 60 \| 698 \| 78 n=10, nnz=100, index_len=10000, dim=0 \| 
480 \| 940 \| 862 n=10, nnz=100, index_len=10000, dim=1 \| 330 \| 4930 \| 1070 n=10, nnz=1000, index_len=100, dim=0 \| 60 \| 200 \| 73 n=10, nnz=1000, index_len=100, dim=1 \| 56 \| 683 \| 70 n=10, nnz=1000, index_len=1000, dim=0 \| 480 \| 530 \| 1050 n=10, nnz=1000, index_len=1000, dim=1 \| 330 \| 4550 \| 1368 n=10, nnz=1000, index_len=10000, dim=0 \| 3100 \| 2900 \| 9300 n=10, nnz=1000, index_len=10000, dim=1 \| 3400 \| 46000 \| 9100 n=10, nnz=10000, index_len=100, dim=0 \| 400 \| 453 \| 857 n=10, nnz=10000, index_len=100, dim=1 \| 400 \| 4070 \| 1730 n=10, nnz=10000, index_len=1000, dim=0 \| 2840 \| 2600 \| 13900 n=10, nnz=10000, index_len=1000, dim=1 \| 3700 \| 40600 \| 16000 n=10, nnz=10000, index_len=10000, dim=0 \| 83200 \| 67400 \| 160000 n=10, nnz=10000, index_len=10000, dim=1 \| 68000 \| 528000 \| 190000 n=100, nnz=1000, index_len=100, dim=0 \| 46 \| 148 \| 31 n=100, nnz=1000, index_len=100, dim=1 \| 45 \| 242 \| 37 n=100, nnz=1000, index_len=1000, dim=0 \| 68 \| 248 \| 240 n=100, nnz=1000, index_len=1000, dim=1 \| 66 \| 755 \| 290 n=100, nnz=1000, index_len=10000, dim=0 \| 370 \| 802 \| 2250 n=100, nnz=1000, index_len=10000, dim=1 \| 372 \| 5430 \| 2770 n=100, nnz=10000, index_len=100, dim=0 \| 82 \| 210 \| 224 n=100, nnz=10000, index_len=100, dim=1 \| 74 \| 986 \| 270 n=100, nnz=10000, index_len=1000, dim=0 \| 350 \| 618 \| 2600 n=100, nnz=10000, index_len=1000, dim=1 \| 370 \| 4660 \| 4560 n=100, nnz=10000, index_len=10000, dim=0 \| 3000 \| 3400 \| 41680 n=100, nnz=10000, index_len=10000, dim=1 \| 5000 \| 47500 \| 30400 n=1000, nnz=10000, index_len=100, dim=0 \| 71 \| 160 \| 185 n=1000, nnz=10000, index_len=100, dim=1 \| 64 \| 516 \| 190 n=1000, nnz=10000, index_len=1000, dim=0 \| 100 \| 249 \| 1740 n=1000, nnz=10000, index_len=1000, dim=1 \| 98 \| 1030 \| 1770 n=1000, nnz=10000, index_len=10000, dim=0 \| 600 \| 808 \| 18300 n=1000, nnz=10000, index_len=10000, dim=1 \| 663 \| 5300 \| 18500 n=1000, nnz=100000, index_len=100, dim=0 \| 160 \| 258 \| 1890 
n=1000, nnz=100000, index_len=100, dim=1 \| 200 \| 3620 \| 2050 n=1000, nnz=100000, index_len=1000, dim=0 \| 500 \| 580 \| 18700 n=1000, nnz=100000, index_len=1000, dim=1 \| 640 \| 7550 \| 30000 n=1000, nnz=100000, index_len=10000, dim=0 \| 3400 \| 3260 \| 186000 n=1000, nnz=100000, index_len=10000, dim=1 \| 3600 \| 49600 \| 194000 n=1000, nnz=1000000, index_len=100, dim=0 \| 517 \| 957 \| 18700 n=1000, nnz=1000000, index_len=100, dim=1 \| 680 \| 39600 \| 37600 n=1000, nnz=1000000, index_len=1000, dim=0 \| 3600 \| 4500 \| 186000 n=1000, nnz=1000000, index_len=1000, dim=1 \| 5800 \| 76400 \| 190000 n=1000, nnz=1000000, index_len=10000, dim=0 \| 50000 \| 67900 \| 1800000 n=1000, nnz=1000000, index_len=10000, dim=1 \| 45000 \| 570000 \| 1900000 Times are in microseconds (us). ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-05-10 16:33:13 +00:00
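For readers unfamiliar with the operation being benchmarked, here is a minimal pure-Python sketch of `index_select` on a COO tensor. It is not the algorithm from the PR. It assumes all dimensions are sparse and groups the nonzeros by their coordinate along `dim` so that each requested index is resolved with a single lookup. The name `coo_index_select` is hypothetical.

```python
from collections import defaultdict


def coo_index_select(indices, values, shape, dim, index):
    """Select slices along `dim` of an all-sparse COO tensor (illustrative only)."""
    # Group the positions of the nonzeros by their coordinate along `dim`.
    buckets = defaultdict(list)
    for k, coord in enumerate(indices[dim]):
        buckets[coord].append(k)

    out_indices = [[] for _ in shape]
    out_values = []
    for new_coord, wanted in enumerate(index):
        for k in buckets.get(wanted, ()):
            for d in range(len(shape)):
                out_indices[d].append(new_coord if d == dim else indices[d][k])
            out_values.append(values[k])

    out_shape = tuple(len(index) if d == dim else s for d, s in enumerate(shape))
    return out_indices, out_values, out_shape


if __name__ == "__main__":
    # 2 x 3 tensor with entries (0, 2) -> 5, (1, 0) -> 6, (1, 1) -> 7.
    print(coo_index_select([[0, 1, 1], [2, 0, 1]], [5, 6, 7], (2, 3), 0, [1, 1, 0]))
```

Repeated entries in `index` duplicate the matching nonzeros, so the output can have more nonzeros than the input.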
PyTorch MergeBot	8d67972b14	Revert "Faster `index_select` for sparse COO tensors on CPU. (#72710 )" This reverts commit `ce3857e73c`. Reverted https://github.com/pytorch/pytorch/pull/72710 on behalf of https://github.com/malfet	2022-05-10 14:43:05 +00:00
Nikita Vedeneev	ce3857e73c	Faster `index_select` for sparse COO tensors on CPU. (#72710 ) Fixes https://github.com/pytorch/pytorch/issues/72212. This PR improves the previous algorithm in complexity. It also utilizes the structure of the problem and parallelizes computations when possible. Benchmark results. <details> <summary>Testing script</summary> ```python import torch import math from IPython import get_ipython from itertools import product import pickle from torch.utils.benchmark import Timer, Compare torch.manual_seed(13) #torch.set_num_threads(1) ipython = get_ipython() index_sizes = (100, 1000, 10000) # specifies (n, nnz) problem_dims = ( # n > nnz (10000, 100), (100000, 1000), (1000000, 10000), # n < nnz (10, 100), (10, 1000), (10, 10000), (100, 1000), (100, 10000), (1000, 10000), (1000, 100000), (1000, 1000000), #(1000000, 1000000000), ) def f(t, d, index): s = torch_sparse.SparseTensor.from_torch_sparse_coo_tensor(t) ss = s.index_select(d, index) return ss.coo() name = "PR" results = [] for (n, nnz), m in product(problem_dims, index_sizes): for d in (0, 1): if nnz < n: shape = (n, n) else: shape = (n, nnz // n) if d == 0 else (nnz // n, n) nrows, ncols = shape rowidx = torch.randint(low=0, high=nrows, size=(nnz,)) colidx = torch.randint(low=0, high=ncols, size=(nnz,)) itemidx = torch.vstack((rowidx, colidx)) xvalues = torch.randn(nnz) index = torch.randint(low=0, high=n, size=(m,)) SparseX = torch.sparse_coo_tensor(itemidx, xvalues, size=shape).coalesce() smtp = "SparseX.index_select(d, index)" timer = Timer(smtp, globals=globals(), label="coo.index_select", description=f"{name}: coo.index_select", sub_label=f"n={n}, nnz={nnz}, index_len={m}, dim={d}", num_threads=torch.get_num_threads()) results.append(timer.blocked_autorange()) compare = Compare(results) compare.trim_significant_figures() compare.print() with open(f"{name}_index_select.pickle", 'wb') as f: pickle.dump(results, f) ``` </details> <details> <summary>Gather results</summary> ```python 
import pickle from torch.utils.benchmark import Timer, Compare files = [ "PR", "torch_sparse", "master" ] timers = [] for name in files: with open("{}_index_select.pickle".format(name), 'rb') as f: timers += pickle.load(f) compare = Compare(timers) compare.trim_significant_figures() compare.print() ``` </details> <details> <summary>PR/torch_sparse/master runtime comparison</summary> ``` [----------------------------------- coo.index_select ----------------------------------] \| PR \| torch_sparse \| master 32 threads: ----------------------------------------------------------------------------- n=10000, nnz=100, index_len=100, dim=0 \| 14 \| 140 \| 10 n=10000, nnz=100, index_len=100, dim=1 \| 14 \| 200 \| 10 n=10000, nnz=100, index_len=1000, dim=0 \| 30 \| 180 \| 38 n=10000, nnz=100, index_len=1000, dim=1 \| 34 \| 240 \| 38 n=10000, nnz=100, index_len=10000, dim=0 \| 278 \| 460 \| 330 n=10000, nnz=100, index_len=10000, dim=1 \| 275 \| 516 \| 330 n=100000, nnz=1000, index_len=100, dim=0 \| 16 \| 290 \| 31 n=100000, nnz=1000, index_len=100, dim=1 \| 26 \| 390 \| 31 n=100000, nnz=1000, index_len=1000, dim=0 \| 45 \| 405 \| 263 n=100000, nnz=1000, index_len=1000, dim=1 \| 73 \| 500 \| 261 n=100000, nnz=1000, index_len=10000, dim=0 \| 444 \| 783 \| 2570 n=100000, nnz=1000, index_len=10000, dim=1 \| 470 \| 890 \| 2590 n=1000000, nnz=10000, index_len=100, dim=0 \| 25 \| 2400 \| 270 n=1000000, nnz=10000, index_len=100, dim=1 \| 270 \| 4000 \| 269 n=1000000, nnz=10000, index_len=1000, dim=0 \| 74 \| 2600 \| 2620 n=1000000, nnz=10000, index_len=1000, dim=1 \| 464 \| 3600 \| 2640 n=1000000, nnz=10000, index_len=10000, dim=0 \| 635 \| 3300 \| 26400 n=1000000, nnz=10000, index_len=10000, dim=1 \| 1000 \| 3960 \| 26400 n=10, nnz=100, index_len=100, dim=0 \| 16 \| 137 \| 16 n=10, nnz=100, index_len=100, dim=1 \| 16 \| 220 \| 16 n=10, nnz=100, index_len=1000, dim=0 \| 63 \| 238 \| 81 n=10, nnz=100, index_len=1000, dim=1 \| 60 \| 698 \| 78 n=10, nnz=100, index_len=10000, dim=0 \| 
480 \| 940 \| 862 n=10, nnz=100, index_len=10000, dim=1 \| 330 \| 4930 \| 1070 n=10, nnz=1000, index_len=100, dim=0 \| 60 \| 200 \| 73 n=10, nnz=1000, index_len=100, dim=1 \| 56 \| 683 \| 70 n=10, nnz=1000, index_len=1000, dim=0 \| 480 \| 530 \| 1050 n=10, nnz=1000, index_len=1000, dim=1 \| 330 \| 4550 \| 1368 n=10, nnz=1000, index_len=10000, dim=0 \| 3100 \| 2900 \| 9300 n=10, nnz=1000, index_len=10000, dim=1 \| 3400 \| 46000 \| 9100 n=10, nnz=10000, index_len=100, dim=0 \| 400 \| 453 \| 857 n=10, nnz=10000, index_len=100, dim=1 \| 400 \| 4070 \| 1730 n=10, nnz=10000, index_len=1000, dim=0 \| 2840 \| 2600 \| 13900 n=10, nnz=10000, index_len=1000, dim=1 \| 3700 \| 40600 \| 16000 n=10, nnz=10000, index_len=10000, dim=0 \| 83200 \| 67400 \| 160000 n=10, nnz=10000, index_len=10000, dim=1 \| 68000 \| 528000 \| 190000 n=100, nnz=1000, index_len=100, dim=0 \| 46 \| 148 \| 31 n=100, nnz=1000, index_len=100, dim=1 \| 45 \| 242 \| 37 n=100, nnz=1000, index_len=1000, dim=0 \| 68 \| 248 \| 240 n=100, nnz=1000, index_len=1000, dim=1 \| 66 \| 755 \| 290 n=100, nnz=1000, index_len=10000, dim=0 \| 370 \| 802 \| 2250 n=100, nnz=1000, index_len=10000, dim=1 \| 372 \| 5430 \| 2770 n=100, nnz=10000, index_len=100, dim=0 \| 82 \| 210 \| 224 n=100, nnz=10000, index_len=100, dim=1 \| 74 \| 986 \| 270 n=100, nnz=10000, index_len=1000, dim=0 \| 350 \| 618 \| 2600 n=100, nnz=10000, index_len=1000, dim=1 \| 370 \| 4660 \| 4560 n=100, nnz=10000, index_len=10000, dim=0 \| 3000 \| 3400 \| 41680 n=100, nnz=10000, index_len=10000, dim=1 \| 5000 \| 47500 \| 30400 n=1000, nnz=10000, index_len=100, dim=0 \| 71 \| 160 \| 185 n=1000, nnz=10000, index_len=100, dim=1 \| 64 \| 516 \| 190 n=1000, nnz=10000, index_len=1000, dim=0 \| 100 \| 249 \| 1740 n=1000, nnz=10000, index_len=1000, dim=1 \| 98 \| 1030 \| 1770 n=1000, nnz=10000, index_len=10000, dim=0 \| 600 \| 808 \| 18300 n=1000, nnz=10000, index_len=10000, dim=1 \| 663 \| 5300 \| 18500 n=1000, nnz=100000, index_len=100, dim=0 \| 160 \| 258 \| 1890 
n=1000, nnz=100000, index_len=100, dim=1 \| 200 \| 3620 \| 2050 n=1000, nnz=100000, index_len=1000, dim=0 \| 500 \| 580 \| 18700 n=1000, nnz=100000, index_len=1000, dim=1 \| 640 \| 7550 \| 30000 n=1000, nnz=100000, index_len=10000, dim=0 \| 3400 \| 3260 \| 186000 n=1000, nnz=100000, index_len=10000, dim=1 \| 3600 \| 49600 \| 194000 n=1000, nnz=1000000, index_len=100, dim=0 \| 517 \| 957 \| 18700 n=1000, nnz=1000000, index_len=100, dim=1 \| 680 \| 39600 \| 37600 n=1000, nnz=1000000, index_len=1000, dim=0 \| 3600 \| 4500 \| 186000 n=1000, nnz=1000000, index_len=1000, dim=1 \| 5800 \| 76400 \| 190000 n=1000, nnz=1000000, index_len=10000, dim=0 \| 50000 \| 67900 \| 1800000 n=1000, nnz=1000000, index_len=10000, dim=1 \| 45000 \| 570000 \| 1900000 Times are in microseconds (us). ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/72710 Approved by: https://github.com/pearu, https://github.com/cpuhrsch	2022-05-09 19:59:39 +00:00
Jane Xu	6d9dbd3391	Manually skip test_sparse_addmm as disable code is not working for now (#77076) Related to https://github.com/pytorch/pytorch/issues/73145 It was previously skipped for Linux and Windows, but macOS has become a problem as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77076 Approved by: https://github.com/ezyang	2022-05-09 13:54:29 +00:00
Mikayla Gawarecki	0adf070574	Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454 Approved by: https://github.com/cpuhrsch	2022-05-06 15:40:22 +00:00
PyTorch MergeBot	381e08309f	Revert "Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax)" This reverts commit `fc2a2e8b72`. Reverted https://github.com/pytorch/pytorch/pull/75454 on behalf of https://github.com/b0noI	2022-05-04 22:31:31 +00:00
Mikayla Gawarecki	fc2a2e8b72	Use scatter_reduce to support masked reductions on sparse COO tensors (sum, prod, amin, amax) Pull Request resolved: https://github.com/pytorch/pytorch/pull/75454 Approved by: https://github.com/cpuhrsch	2022-05-03 23:17:07 +00:00
arindamroy-eng	7478ce187a	ROCM:Unskip more tests for ROCM5.0 Re-enabling more tests which are working on ROCM5.0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75353 Approved by: https://github.com/ezyang	2022-04-19 19:45:55 +00:00
Pearu Peterson	a98b4666e0	Enable test_sparse_mask for Windows Pull Request resolved: https://github.com/pytorch/pytorch/pull/75189 Approved by: https://github.com/cpuhrsch	2022-04-11 17:21:29 +00:00
Brian Hirsh	1b7d7d9327	Reland: "free up dispatch key space (in C++)" (#74963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963 This is a re-land of D35192346 (`9872a06d77`) and D35192317 (`a9216cde6c`), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633. The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`. Background: Existing Mobile Optimization Pytorch mobile builds have an existing optimization (here `cc23725e89/c10/core/DispatchKey.h (L382)` and here `cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)`), which works as follows: Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc). In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys. The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: `cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294)`. The mobile-optimization currently does not extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64. 
The Bug This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294). That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`. Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan? Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this? The debugging experience was pretty difficult Debugging the Milan-specific failure was made difficult by the following: (1) lack of CI - the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging. (2) It's difficult to get a repro. 
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space) - There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/) (3) Lack of stack-traces. - Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash. ghstack-source-id: 152688542 Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing. Reviewed By: phding, albanD Differential Revision: D35222806 fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30 (cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)	2022-03-31 21:52:38 +00:00
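The failure mode described in the dispatch key message (fallback kernels registered for every dispatch key, written into an array that was shrunk to the mobile size) can be modeled with a toy example. Everything here is hypothetical: the class, the constants and the key indices are stand-ins chosen to match the sizes 64 and 8 quoted in the message. Python raises an `IndexError` where the C++ code wrote past the end of `backendFallbackKernels_`.

```python
FULL_NUM_KEYS = 64    # table size in regular builds, per the message
MOBILE_NUM_KEYS = 8   # shrunk table size in mobile builds, per the message


class ToyDispatcher:
    """Toy model of a dispatcher holding one global array of fallback kernels."""

    def __init__(self, fallback_slots):
        self.backend_fallback_kernels = [None] * fallback_slots

    def register_fallback(self, key_index, kernel):
        # An explicit bounds check; the original bug was the absence of any
        # failure at this point, so the write corrupted adjacent memory.
        if not 0 <= key_index < len(self.backend_fallback_kernels):
            raise IndexError(f"dispatch key {key_index} does not fit in the fallback array")
        self.backend_fallback_kernels[key_index] = kernel


if __name__ == "__main__":
    dispatcher = ToyDispatcher(MOBILE_NUM_KEYS)
    for key_index in range(FULL_NUM_KEYS):  # startup registers every fallback
        try:
            dispatcher.register_fallback(key_index, f"fallback_{key_index}")
        except IndexError as err:
            print(err)
            break
```

In this model the first failing registration is for key index 8, the first slot that no longer exists in the shrunk array.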
Nikita Shulga	bfac65dfe5	[testing] Update dispatch macros (#74977 ) This PR is reland of #74289 Co-authored-by: Khushi Agrawal <khushiagrawal411@gmail.com>	2022-03-30 14:13:21 -07:00
PyTorch MergeBot	2e4152b118	Revert "[testing] Update dispatch macros" This reverts commit `eed19a0f38`. Reverted https://github.com/pytorch/pytorch/pull/74289 on behalf of https://github.com/malfet	2022-03-30 19:52:37 +00:00
Khushi Agrawal	eed19a0f38	[testing] Update dispatch macros Hi, This PR is the follow-up PR of #71561. (the previous PR had a couple of merge conflicts and was reverted, this PR resolves that). Please take a look. Thanks! cc: @pmeier @mruberry @kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74289 Approved by: https://github.com/pmeier, https://github.com/mruberry	2022-03-30 16:10:16 +00:00
Brian Hirsh	9872a06d77	Back out "free up dispatch key space (in C++)" (#74859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74859 Original commit changeset: 6d1dd0fd8144 Original Phabricator Diff: D34227616 (`2cbddc0e9b`) ghstack-source-id: 152381077 (Note: this ignores all push blocking failures!) Test Plan: Test on Milan with "get weather utterance" buck build fbsourcefbandroid/mode/opt fbsourcefbandroid/mode/milan_build_rdk //fbandroid/apps/wearable/system/speechservice:speechservice_target30_xhdpi_armv7_release_debug_keystore -c pt.has_backtaces=1 Reviewed By: phding Differential Revision: D35192346 fbshipit-source-id: b962de5d5effaf23f9aa8afd3ef36f8c6383de5b (cherry picked from commit 913e3027a11457aaa2d97a9d89ebc6133b14213c)	2022-03-29 15:39:17 +00:00
Christian Puhrsch	e55b73d65a	Add strided layout support for to_dense Fixes #59958 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74486 Approved by: https://github.com/pearu, https://github.com/suo	2022-03-29 00:12:48 +00:00
Pearu Peterson	ebeea9e2ea	Support masked sum on sparse COO tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71239 Approved by: https://github.com/cpuhrsch	2022-03-25 18:26:39 +00:00
Brian Hirsh	2cbddc0e9b	free up dispatch key space (in C++) (#72827 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72827 Reland of D34034848 (`6690256021`) ghstack-source-id: 152161452 Test Plan: Confirm that Milan tests are passing Reviewed By: ezyang Differential Revision: D34227616 fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968 (cherry picked from commit e5c1b29fedd5c2a0bad810cedc94aa784136b6aa)	2022-03-25 17:04:51 +00:00
Nikita Shulga	ef066f0832	Revert D34856571: [pytorch][PR] Replace `get_all_` type macros with the ATen dispatch macros. Test Plan: revert-hammer Differential Revision: D34856571 (`3ded7b1da3`) Original commit changeset: 0dca038bcad5 Original Phabricator Diff: D34856571 (`3ded7b1da3`) fbshipit-source-id: 594553fa0b710d78beba59d5d2b646f1f1270386 (cherry picked from commit 8090eb9b12dcf452a9e7dc01792a66fb91b563b6)	2022-03-15 22:07:11 +00:00
Khushi Agrawal	3ded7b1da3	Replace `get_all_` type macros with the ATen dispatch macros. (#71561 ) Summary: Hi, Team! The PR is motivated from https://github.com/pytorch/pytorch/pull/71153#discussion_r782446738. It aims to replace `get_all` type macros with the ATen dispatch macros. The files it iterates over are: (Thanks, Lezcano, for the idea!!) <details> <summary> `test/test_autograd.py`</summary> <p> ```python 43:from torch.testing._internal.common_dtype import get_all_dtypes 8506: floating_dt = [dt for dt in get_all_dtypes() if dt.is_floating_point] ``` </p> </details> <details> <summary> `test/test_binary_ufuncs.py`</summary> <p> ```python 26: all_types_and_complex_and, integral_types_and, get_all_dtypes, get_all_int_dtypes, get_all_math_dtypes, 27: get_all_complex_dtypes, get_all_fp_dtypes, 935: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1035: dtypes(get_all_dtypes( 1488: dtypes((get_all_dtypes(include_bool=False, include_bfloat16=False))) 1879: dtypes(product(get_all_dtypes(include_complex=False), get_all_dtypes(include_complex=False))) 1887: dtypes((get_all_int_dtypes() + [torch.bool])) 1913: dtypes((get_all_fp_dtypes())) 1941: dtypes((get_all_fp_dtypes())) 1977: dtypes(product(get_all_complex_dtypes(), get_all_dtypes())) 2019: dtypes(product(get_all_fp_dtypes(), get_all_fp_dtypes())) 2048: dtypes(get_all_dtypes()) 2110: dtypes(product(get_all_dtypes(include_complex=False), 2111: get_all_dtypes(include_complex=False))) 2128: types = [torch.bool, torch.bfloat16] + get_all_int_dtypes() 2173: if dtypes[1] in get_all_fp_dtypes(): 2178: dtypes(product(get_all_fp_dtypes(), 2179: get_all_fp_dtypes())) 2260: dtypesIfCUDA(set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128}) 2261: dtypes(set(get_all_math_dtypes('cpu')) - {torch.complex64, torch.complex128}) 2273: dtypesIfCUDA(set(get_all_math_dtypes('cuda')) - {torch.complex64, torch.complex128}) 2274: dtypes(set(get_all_math_dtypes('cpu')) - {torch.complex64, 
torch.complex128}) 2307: dtypes(get_all_math_dtypes('cpu')) 2319: dtypes(get_all_fp_dtypes(include_bfloat16=False)) 2331: dtypes(get_all_int_dtypes()) 2356: dtypes(get_all_dtypes(include_bfloat16=False, include_bool=False, include_complex=False)) 2393: if dtype in get_all_int_dtypes(): 2614: dtypes(get_all_dtypes()) 2624: dtypes(tuple(itertools.combinations_with_replacement(get_all_dtypes(), 2))) 2806: dtypes(list(product(get_all_dtypes(include_complex=False), 2807: get_all_dtypes(include_complex=False)))) 2866: dtypes(list(product(get_all_complex_dtypes(), 2867: get_all_complex_dtypes()))) 2902: dtypes(product(get_all_dtypes(), get_all_dtypes())) 2906: dtypes(product(get_all_dtypes(), get_all_dtypes())) 2910: dtypes(product(get_all_dtypes(), get_all_dtypes())) 3019: dtypes = [torch.float, torch.double] + get_all_complex_dtypes() 3221: dtypes(get_all_dtypes(include_complex=False)) 3407: dtypes(list(product(get_all_dtypes(include_bool=False), 3408: get_all_dtypes(include_bool=False)))) 3504: dtypes(product(get_all_dtypes(include_complex=False, include_bfloat16=False), 3505: get_all_dtypes(include_complex=False, include_bfloat16=False))) 3516: if x.dtype in get_all_int_dtypes() + [torch.bool]: 3643: dtypes(product(get_all_dtypes(include_complex=False, 3645: get_all_dtypes(include_complex=False, ``` </p> </details> <details> <summary> `test/test_complex.py`</summary> <p> ```python 6:from torch.testing._internal.common_dtype import get_all_complex_dtypes 11: dtypes(get_all_complex_dtypes()) ``` </p> </details> <details> <summary> `test/test_foreach.py`</summary> <p> ```python 18: get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, 142: if dtype in get_all_int_dtypes(): 179: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 201: disable_fastpath = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 205: disable_fastpath \|= dtype in get_all_int_dtypes() + [torch.bool] 211: disable_fastpath 
\|= dtype not in get_all_complex_dtypes() 241: bool_int_div = op.ref == torch.div and dtype in get_all_int_dtypes() + [torch.bool] 246: disable_fastpath \|= dtype in get_all_int_dtypes() + [torch.bool] 248: disable_fastpath \|= dtype not in get_all_complex_dtypes() 250: disable_fastpath \|= True and dtype not in get_all_complex_dtypes() 307: disable_fastpath = dtype in get_all_int_dtypes() + [torch.bool] 365: if opinfo.name == "_foreach_abs" and dtype in get_all_complex_dtypes(): 376: ops(foreach_unary_op_db, dtypes=get_all_dtypes()) 393: dtypes=get_all_dtypes(include_half=True, include_bfloat16=True, include_complex=False)) 401: ops(foreach_minmax_op_db, dtypes=get_all_fp_dtypes(include_bfloat16=True, include_half=True)) 426: if ord in (1, 2) and dtype in torch.testing.get_all_fp_dtypes(): 439: dtypes(get_all_dtypes()) 449: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 481: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 536: if dtype in get_all_int_dtypes() + [torch.bool] and foreach_op == torch._foreach_div: 545: ops(foreach_binary_op_db, dtypes=get_all_dtypes()) 637: ops(foreach_pointwise_op_db, allowed_dtypes=get_all_fp_dtypes(include_half=False, include_bfloat16=False)) ``` </p> </details> <details> <summary> `test/test_linalg.py`</summary> <p> ```python 29: all_types, floating_types, floating_and_complex_types, get_all_dtypes, get_all_int_dtypes, get_all_complex_dtypes, 30: get_all_fp_dtypes, 111: dtypes((get_all_dtypes())) 794: float_and_complex_dtypes = get_all_fp_dtypes() + get_all_complex_dtypes() 807: dtypes((get_all_int_dtypes())) 828: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 841: if dtype in get_all_complex_dtypes(): 844: dtypes(itertools.product(get_all_dtypes(), 845: get_all_dtypes())) 855: for dtypes0, dtypes1, dtypes2 in product(get_all_dtypes(), repeat=3): 5607: get_all_fp_dtypes(include_half=not CUDA9, include_bfloat16=(CUDA11OrLater and SM53OrLater))) 5608: dtypes((set(get_all_dtypes()) - {torch.half, torch.bool})) 5644: 
dtypes((get_all_complex_dtypes() + get_all_fp_dtypes())) 6255: dtypesIfCUDA(get_all_complex_dtypes(), 6256: get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)), 6292: dtypesIfCUDA(get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) 6323: dtypesIfCUDA(get_all_complex_dtypes(), 6324: get_all_fp_dtypes(include_bfloat16=(TEST_WITH_ROCM or (CUDA11OrLater and SM53OrLater)))) 6325: dtypes(get_all_complex_dtypes(), get_all_fp_dtypes()) 6358: dtypesIfCUDA(([torch.float, torch.double] + get_all_complex_dtypes())) 6556: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) 6668: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) 6741: dtypes(get_all_fp_dtypes(), get_all_complex_dtypes()) ``` </p> </details> <details> <summary> `test/test_nn.py`</summary> <p> ```python 37:from torch.testing._internal.common_dtype import integral_types, get_all_fp_dtypes, get_all_math_dtypes 50: onlyNativeDeviceTypes, deviceCountAtLeast, largeTensorTest, expectedFailureMeta, skipMeta, get_all_device_types, \ 8862: for device in get_all_device_types(): 9629: for dt1 in get_all_math_dtypes(device): 9630: for dt2 in get_all_math_dtypes(device): 9631: for dt3 in get_all_math_dtypes(device): 9648: for input_dtype in get_all_math_dtypes(device): 9664: for input_dtype in get_all_math_dtypes(device): 13015: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 13034: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 13159: dtypes(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 17400: dtypesIfCUDA(get_all_fp_dtypes(include_bfloat16=AMPERE_OR_ROCM)) 17768: dtypesIfCUDA(get_all_fp_dtypes()) 17773: dtypesIfCUDA(get_all_fp_dtypes()) 17778: dtypesIfCUDA(get_all_fp_dtypes()) 17783: dtypesIfCUDA(get_all_fp_dtypes()) 17788: dtypesIfCUDA(get_all_fp_dtypes()) 17793: dtypesIfCUDA(get_all_fp_dtypes()) 17798: dtypesIfCUDA(get_all_fp_dtypes()) 17963: dtypesIfCUDA(get_all_fp_dtypes()) 17977: dtypesIfCUDA(get_all_fp_dtypes()) 
18684: def test_cross_entropy_loss_prob_target_all_reductions(self, device): ``` </p> </details> <details> <summary> `test/test_numpy_interop.py`</summary> <p> ```python 12:from torch.testing._internal.common_dtype import get_all_dtypes 399: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_ops.py`</summary> <p> ```python 12:from torch.testing._internal.common_dtype import floating_and_complex_types_and, get_all_dtypes 86: for dtype in get_all_dtypes(): ``` </p> </details> <details> <summary> `test/test_reductions.py`</summary> <p> ```python 16: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_complex_dtypes, get_all_fp_dtypes, 360: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 366: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 394: allowed_dtypes=get_all_dtypes(include_bfloat16=False)) 750: for dtype in [dtype for dtype in get_all_math_dtypes('cpu') if dtype != torch.float16]: 1404: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1457: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1458: get_all_complex_dtypes())) 1465: return dtype in get_all_int_dtypes() 1494: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1501: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1507: dtypes((get_all_complex_dtypes())) 1514: dtypes = list(get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False)) 1523: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False))) 1531: if dtype in get_all_fp_dtypes(): 1608: dtypes((get_all_dtypes(include_half=True, include_bfloat16=False, 1837: dtypes(get_all_dtypes(include_bool=False, include_complex=False)) 1855: dtypes((set(get_all_dtypes(include_bool=False, include_complex=False)) - {torch.uint8})) 3219: for dtype in get_all_dtypes(include_half=True, include_bfloat16=False, ``` </p> </details> <details> <summary> `test/test_serialization.py`</summary> <p> ```python 
26:from torch.testing._internal.common_dtype import get_all_dtypes 586: for device, dtype in product(devices, get_all_dtypes()): 589: for other_dtype in get_all_dtypes(): ``` </p> </details> <details> <summary> `test/test_shape_ops.py`</summary> <p> ```python 18:from torch.testing._internal.common_dtype import get_all_dtypes 230: dtypes(get_all_dtypes(include_complex=False, include_bool=False, include_half=False, 232: dtypesIfCUDA(get_all_dtypes(include_complex=False, include_bool=False, include_bfloat16=False)) 344: dtypes(get_all_dtypes()) 443: dtypes(get_all_dtypes()) 461: dtypes(get_all_dtypes()) 570: dtypes(get_all_dtypes(include_complex=False)) ``` </p> </details> <details> <summary> `test/test_sort_and_select.py`</summary> <p> ```python 12: all_types, all_types_and, floating_types_and, get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, 136: dtypes(set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) 231: dtypes(set(get_all_dtypes()) - {torch.bool, torch.complex64, torch.complex128}) 296: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 647: dtypesIfCUDA(get_all_fp_dtypes()) 678: dtypesIfCUDA((get_all_dtypes(include_complex=False, 682: dtypes((get_all_dtypes(include_complex=False, include_bool=False, include_half=False, include_bfloat16=False))) 739: dtypesIfCPU(set(get_all_dtypes()) - {torch.complex64, torch.complex128}) 740: dtypes(set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) 799: dtypesIfCPU(set(get_all_dtypes()) - {torch.complex64, torch.complex128}) 800: dtypes(set(get_all_dtypes()) - {torch.bfloat16, torch.complex64, torch.complex128}) ``` </p> </details> <details> <summary> `test/test_sparse.py`</summary> <p> ```python 20:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes 29: floating_and_complex_types, floating_and_complex_types_and, get_all_dtypes, get_all_int_dtypes, 1963: return dtype in get_all_int_dtypes() 1994: dtypes(get_all_dtypes(include_bool=False, 
include_half=False, 2103: return dtype in get_all_int_dtypes() 2138: dtypes(get_all_dtypes(include_bool=False, include_half=False, 2626: all_sparse_dtypes = get_all_dtypes(include_complex=True) 2633: all_sparse_dtypes = get_all_dtypes(include_complex=True) 3230: dtypes(get_all_complex_dtypes(), 3231: get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 3234: get_all_fp_dtypes( ``` </p> </details> <details> <summary> `test/test_sparse_csr.py`</summary> <p> ```python 7:from torch.testing import get_all_complex_dtypes, get_all_fp_dtypes, floating_and_complex_types, make_tensor 17:from torch.testing._internal.common_dtype import floating_types, get_all_dtypes 120: dtypes(get_all_dtypes()) 133: dtypes(get_all_dtypes()) 150: dtypes(get_all_dtypes()) 180: dtypes(get_all_dtypes()) 201: dtypes(get_all_dtypes()) 210: dtypes(get_all_dtypes()) 225: dtypes(get_all_dtypes()) 244: dtypes(get_all_dtypes()) 263: dtypes(get_all_dtypes()) 285: dtypes(get_all_dtypes()) 411: dtypes(get_all_dtypes()) 482: dtypes(get_all_dtypes()) 502: dtypes(get_all_dtypes()) 562: dtypes(get_all_dtypes()) 588: dtypesIfCUDA(get_all_complex_dtypes(), 589: get_all_fp_dtypes(include_half=SM53OrLater, include_bfloat16=SM80OrLater)) 745: dtypesIfCUDA(get_all_complex_dtypes(), 746: get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, 765: dtypesIfCUDA(get_all_complex_dtypes(), 766: get_all_fp_dtypes(include_half=SM53OrLater and TEST_CUSPARSE_GENERIC, 801: torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, 841: torch.testing.get_all_fp_dtypes(include_bfloat16=SM80OrLater, 1182: dtypes(get_all_dtypes()) 1276: dtypes(get_all_dtypes(include_bool=False, include_half=False, include_bfloat16=False)) 1286: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_tensor_creation_ops.py`</summary> <p> ```python 21: onlyCUDA, skipCPUIf, dtypesIfCUDA, skipMeta, get_all_device_types) 23: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes, 
get_all_complex_dtypes 150: for dt in get_all_dtypes(): 160: for dt in get_all_dtypes(): 314: dtypes = [dtype for dtype in get_all_dtypes() if dtype != torch.bfloat16] 1012: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1013: get_all_complex_dtypes())) 1032: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1033: get_all_complex_dtypes())) 1050: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1051: get_all_complex_dtypes())) 1745: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1779: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1868: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1926: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 1954: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device) 1956: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, None) 1957: do_test_empty_full(self, get_all_math_dtypes('cpu'), torch.strided, torch_device) 2538: for device in get_all_device_types(): 2645: for dtype in get_all_dtypes(): 2678: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False) + 2679: get_all_complex_dtypes())) 2716: dtypes(get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 2827: for dt in get_all_dtypes(): 2913: dtypes(get_all_dtypes(include_bool=False, include_half=False)) 2914: dtypesIfCUDA(get_all_dtypes(include_bool=False, include_half=True)) 3028: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 3033: dtypes((get_all_fp_dtypes() + get_all_complex_dtypes())) 3074: dtypes(get_all_dtypes(include_bool=False, include_half=False, include_complex=False)) 3075: dtypesIfCUDA(((get_all_int_dtypes() + [torch.float32, torch.float16, torch.bfloat16]) 3077: else get_all_dtypes(include_bool=False, include_half=True, include_complex=False))) 3873: dtypes(get_all_dtypes()) 3884: dtypes(get_all_dtypes(include_bool=False)) 3916: for other in get_all_dtypes(): 3922: dtypes(get_all_dtypes()) 
3932: dtypes(get_all_dtypes(include_bool=False)) 3955: dtypes(get_all_dtypes(include_bool=False)) 3961: dtypes(get_all_dtypes(include_bool=False)) 3965: dtypes(get_all_dtypes()) ``` </p> </details> <details> <summary> `test/test_testing.py`</summary> <p> ```python 25:from torch.testing._internal.common_dtype import get_all_dtypes 31: dtypes((get_all_dtypes(include_half=True, include_bfloat16=False, ``` </p> </details> <details> <summary> `test/test_torch.py`</summary> <p> ```python 51: expectedAlertNondeterministic, get_all_device_types, skipXLA) 57: get_all_fp_dtypes, get_all_int_dtypes, get_all_math_dtypes, get_all_dtypes, get_all_complex_dtypes 296: for d in get_all_device_types(): 323: for device in get_all_device_types(): 324: for dt1 in get_all_dtypes(): 325: for dt2 in get_all_dtypes(): 343: all_dtypes = get_all_dtypes() 350: all_dtypes = get_all_dtypes() 781: for dtype in get_all_dtypes(): 986: for device in get_all_device_types(): 1017: for device in get_all_device_types(): 1018: for dtype in get_all_math_dtypes(device): 2792: for device in get_all_device_types(): 3186: dtypes(get_all_dtypes()) 3195: for error_dtype in get_all_dtypes(): 3203: dtypes(get_all_dtypes()) 3212: for error_dtype in get_all_dtypes(): 4539: dtypes(get_all_fp_dtypes()) 4545: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 4577: dtypes(get_all_fp_dtypes(include_half=False, include_bfloat16=False)) 4578: dtypesIfCPU((get_all_fp_dtypes(include_half=False, include_bfloat16=True))) 4579: dtypesIfCUDA((get_all_fp_dtypes(include_bfloat16=False))) 4599: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False))) 4600: dtypesIfCPU((get_all_dtypes(include_half=False, include_bfloat16=False, include_complex=False))) 4601: dtypesIfCUDA((get_all_dtypes(include_bfloat16=False, include_complex=False))) 4613: for p_dtype in get_all_fp_dtypes(include_half=device.startswith('cuda'), include_bfloat16=False): 4628: dtypes((get_all_fp_dtypes(include_half=False, include_bfloat16=False))) 
4629: dtypesIfCUDA((get_all_fp_dtypes(include_bfloat16=False))) 4640: dtypes(get_all_fp_dtypes()) 4723: dtypes(get_all_fp_dtypes()) 4735: dtypes(get_all_fp_dtypes(include_bfloat16=False)) 4736: dtypesIfCUDA(get_all_fp_dtypes()) 4747: dtypes(get_all_fp_dtypes()) 4761: dtypes(get_all_fp_dtypes()) 4771: dtypes(get_all_fp_dtypes()) 4792: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 5302: dtypes(get_all_dtypes(include_bfloat16=False)) 5322: dtypes(get_all_dtypes(include_half=False, include_bfloat16=False)) 5323: dtypesIfCPU(get_all_dtypes(include_bfloat16=False)) 5324: dtypesIfCUDA(get_all_dtypes(include_bfloat16=False)) 5591: for dt in get_all_dtypes(): 5611: for dt in get_all_dtypes(): 5678: for dt in get_all_dtypes(): 5696: dtypesIfCUDA(set(get_all_math_dtypes('cuda'))) 5697: dtypes(set(get_all_math_dtypes('cpu'))) 5746: dtypes(get_all_dtypes()) 5780: dtypes(get_all_dtypes()) 5885: dtypes(get_all_dtypes()) 5902: dtypes(get_all_dtypes()) 5945: dtypes(get_all_dtypes()) 5979: dtypes(get_all_dtypes(include_bool=False)) 6049: dtypes(get_all_dtypes(include_bool=False)) 6092: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6093: get_all_complex_dtypes())) 6094: dtypesIfCPU(get_all_dtypes()) 6095: dtypesIfCUDA(get_all_dtypes()) 6122: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6123: get_all_complex_dtypes())) 6124: dtypesIfCPU(get_all_dtypes()) 6125: dtypesIfCUDA(get_all_dtypes()) 6163: dtypes((get_all_fp_dtypes(include_bfloat16=False, include_half=False) + 6164: get_all_complex_dtypes())) 6165: dtypesIfCPU(get_all_dtypes()) 6166: dtypesIfCUDA(get_all_dtypes()) 6190: dtypes((get_all_complex_dtypes() + 6191: get_all_int_dtypes())) 6238: dtypes(get_all_dtypes()) 6323: dtypes(get_all_dtypes()) 6389: dtypes(product(get_all_dtypes(), (torch.uint8, torch.bool))) 6699: dtypesIfCUDA(set(get_all_math_dtypes('cuda'))) 6700: dtypes(set(get_all_math_dtypes('cpu'))) 7452: dtypes(get_all_dtypes(include_bool=False)) 7461: 
dtypes(get_all_dtypes(include_bool=False)) 7477: dtypes(get_all_dtypes(include_bool=False)) 7496: dtypes(get_all_dtypes(include_bool=False)) 7538: dtypes(get_all_dtypes(include_bool=False)) 8162: dtypes((get_all_int_dtypes() + get_all_fp_dtypes() + 8163: get_all_complex_dtypes())) 8175: dtypes((get_all_int_dtypes() + get_all_fp_dtypes() + 8176: get_all_complex_dtypes())) ``` </p> </details> <details> <summary> `test/test_type_promotion.py`</summary> <p> ```python 14: get_all_dtypes, get_all_math_dtypes, get_all_int_dtypes, get_all_fp_dtypes 187: for dtype in get_all_dtypes(): 262: dtypes1 = get_all_math_dtypes('cuda') 263: dtypes2 = get_all_math_dtypes(device) 339: dtypes(itertools.product(get_all_dtypes(), get_all_dtypes())) 468: for dt1 in get_all_math_dtypes(device): 469: for dt2 in get_all_math_dtypes(device): 519: for dt1 in get_all_math_dtypes(device): 520: for dt2 in get_all_math_dtypes(device): 528: for dt in get_all_math_dtypes(device): 561: for dtype in get_all_dtypes(): 766: dtypes=get_all_math_dtypes(device)) 771: dtypes=get_all_math_dtypes(device)) 782: dtypes=get_all_math_dtypes(device)) 879: dtypes = get_all_dtypes(include_bfloat16=False) 898: dtypes = get_all_dtypes(include_bfloat16=False, include_bool=False) 965: dtypesIfCUDA(itertools.product(get_all_dtypes(include_bfloat16=False, include_complex=False), 966: get_all_dtypes(include_bfloat16=False, include_complex=False))) 967: dtypes(itertools.product(get_all_dtypes(include_half=False, include_bfloat16=False, 969: get_all_dtypes(include_half=False, include_bfloat16=False, 976: return dtype in get_all_int_dtypes() + [torch.bool] 979: return dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False) ``` </p> </details> <details> <summary> `test/test_unary_ufuncs.py`</summary> <p> ```python 24: floating_types_and, all_types_and_complex_and, floating_and_complex_types_and, get_all_dtypes, get_all_math_dtypes, 25: get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 517: 
dtypes((get_all_int_dtypes() + [torch.bool] + 518: get_all_fp_dtypes(include_bfloat16=False))) 596: dtypes(get_all_fp_dtypes(include_half=True, include_bfloat16=False)) 611: invalid_input_dtypes = get_all_int_dtypes() + \ 612: get_all_complex_dtypes() + \ 619: for dtype in get_all_fp_dtypes(include_half=True, include_bfloat16=False): 1048: dtypes(get_all_math_dtypes('cpu')) 1182: dtypesIfCUDA(get_all_fp_dtypes()) 1190: dtypesIfCUDA(get_all_fp_dtypes()) 1205: dtypesIfCUDA(get_all_fp_dtypes()) 1215: dtypesIfCUDA(get_all_fp_dtypes()) 1307: dtypes((get_all_dtypes(include_bool=False))) 1349: dtypes((get_all_fp_dtypes(include_half=False) + 1350: get_all_complex_dtypes())) 1351: dtypesIfCUDA((get_all_fp_dtypes(include_half=True) + 1352: get_all_complex_dtypes())) ``` </p> </details> <details> <summary> `test/test_view_ops.py`</summary> <p> ```python 19: get_all_dtypes, get_all_int_dtypes, get_all_fp_dtypes, get_all_complex_dtypes 124: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 131: dtypes(get_all_dtypes(include_bfloat16=False)) 213: for view_dtype in [get_all_fp_dtypes(), get_all_complex_dtypes()]: 220: dtypes(get_all_dtypes()) 224: for view_dtype in get_all_dtypes(): 305: dtypes(get_all_complex_dtypes(include_complex32=True)) 343: dtypes(get_all_dtypes()) 354: dtypes(get_all_dtypes()) 364: dtypes(get_all_dtypes()) 374: dtypes(get_all_dtypes()) 384: dtypes((get_all_int_dtypes() + get_all_fp_dtypes())) 395: dtypes(get_all_complex_dtypes()) 426: dtypes(get_all_complex_dtypes()) 451: dtypes(product(get_all_complex_dtypes(), get_all_dtypes())) 1263: dtypes((torch.testing.get_all_dtypes())) 1279: dtypes((torch.testing.get_all_dtypes())) 1405: dtypes((get_all_int_dtypes() + get_all_fp_dtypes(include_bfloat16=False) + 1406: get_all_complex_dtypes())) 1471: dtypes(get_all_dtypes(include_bfloat16=False)) 1574: dtypes(get_all_dtypes()) 1601: dtypes(get_all_dtypes(include_bfloat16=False)) 1632: dtypes(*get_all_dtypes(include_bfloat16=False)) 1711: for dt in 
get_all_dtypes(): 1717: for dt in get_all_dtypes(): 1724: for dt in get_all_dtypes(): ``` </p> </details> I'm looking forward to your viewpoints. Thanks :) cc: mruberry kshitij12345 anjali411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71561 Reviewed By: samdow Differential Revision: D34856571 Pulled By: mruberry fbshipit-source-id: 0dca038bcad5cf69906245c496d2e61ac3876335 (cherry picked from commit b058f67b4313143efa714ab105f36e74083131b9)	2022-03-15 20:31:41 +00:00
Pearu Peterson	a5dcc0c378	Enable test_coalesce_cuda_bfloat16 (#73158 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73158 Fixes #72893 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D34515679 Pulled By: cpuhrsch fbshipit-source-id: 049f8ddf53023b78e1b48e15bbd3cdc58b6bf692 (cherry picked from commit 28a44ca56f66bfaaf14a049856b7d89fec8cd838)	2022-02-28 19:34:20 +00:00
Pearu Peterson	3c932c345b	Fix test_Sparse_to_Sparse_copy__cuda_bfloat16 failure (#73157 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73157 Fixes #72892 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34398986 Pulled By: cpuhrsch fbshipit-source-id: 20214be1859354fb18a306e8d1de9852a898c485 (cherry picked from commit c1816ef0cf8834149bebcc11f4402f0eedfae6f7)	2022-02-28 05:33:50 +00:00
Pearu Peterson	16cd6853e1	Fix test_sparse_addmm_...float16 and test_sparse_matmul_...float16 test failures (#73155 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73155 Fixes #73145 Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D34398935 Pulled By: cpuhrsch fbshipit-source-id: b1e852f25b0888b37d9c9c1418ddf344ac8f0a04 (cherry picked from commit d63c977fb39c7dcb3f3d083edc4b25cd2d6c2ec4)	2022-02-26 05:30:36 +00:00
Pearu Peterson	4c522643e7	Fix CUDA error when multiplying sparse hybrid tensors with zero dense dimensions (#73428 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73428 Fixes https://github.com/pytorch/pytorch/issues/73363 Test Plan: Imported from OSS Reviewed By: george-qi Differential Revision: D34478521 Pulled By: cpuhrsch fbshipit-source-id: cbc83f223a14c92ed8b284e5e2a8aab390e2bc5c (cherry picked from commit 9d7ecc848228f9a5b1761f9d3653d3cca49e0244)	2022-02-26 01:08:45 +00:00
Philip Meier	0973c5a1cc	align signature of make_tensor with other creation ops (#72702 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72702 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D34457729 Pulled By: mruberry fbshipit-source-id: 83d580c4201eef946dc9cf4b9e28a3d36be55609 (cherry picked from commit aa4cf20fbeb4b795595729b8ac2e6ba7707d8283)	2022-02-25 06:30:31 +00:00
Rohan Varma	c3d79ac422	Manual skip sparse tests Manual skip because the tests are not properly disabled by automation. Differential Revision: [D34456851](https://our.internmc.facebook.com/intern/diff/D34456851/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/73374	2022-02-24 20:26:02 +00:00
Alban Desmaison	49444bb501	Revert D34400588: [pytorch][PR] super setUp call missing in TestSparse Test Plan: revert-hammer Differential Revision: D34400588 (`555b215a90`) Original commit changeset: 40ac1c56918d Original Phabricator Diff: D34400588 (`555b215a90`) fbshipit-source-id: 0375279d06cc7a9d612bd70cc4c042cb3319a5fc (cherry picked from commit 7cd3d2da907e6f0882f56c8843d50586756a2fe6)	2022-02-24 14:34:01 +00:00
Jane Xu	555b215a90	super setUp call missing in TestSparse (#73217 ) Summary: Should fix the issue that Sparse tests are not properly disabled: https://github.com/pytorch/pytorch/issues/73145#issuecomment-1046952585 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73217 Reviewed By: atalman Differential Revision: D34400588 Pulled By: janeyx99 fbshipit-source-id: 40ac1c56918d5c47debf962a2bd218a325626ad8 (cherry picked from commit e63dae284ba9056567fcaffc54d1aa38151c0a12)	2022-02-23 19:36:50 +00:00
Nikita Shulga	5dad19fef0	Back out "[pytorch][PR] add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad…" Summary: Original commit changeset: f1274125234a Original Phabricator Diff: D34343016 (`c6f56599bb`) Test Plan: The above-mentioned PR regressed OSS CI Reviewed By: atalman Differential Revision: D34379703 fbshipit-source-id: bc624cfd86249dde2fac635d9b66f08f86b4aed9 (cherry picked from commit `e52827f1ae`)	2022-02-21 18:31:51 +00:00
Jiayi Sun	c6f56599bb	add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad… (#72846 ) Summary: …d_out, addmm Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/72846 Reviewed By: mikaylagawarecki Differential Revision: D34343016 Pulled By: cpuhrsch fbshipit-source-id: f1274125234a3bacbb7a38fc642fbf5c9786d435 (cherry picked from commit `c819456abf`)	2022-02-19 01:33:51 +00:00
Pearu Peterson	e785c0a1ab	Enable Half/BFloat16 support for to_dense and coalesce methods. (#72397 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72397 Test Plan: Imported from OSS Reviewed By: jbschlosser, zou3519 Differential Revision: D34286114 Pulled By: cpuhrsch fbshipit-source-id: a4f7e2abc3b2d37437cbd09d693c1b409bb011b9 (cherry picked from commit `74f94447fc`)	2022-02-17 02:54:23 +00:00
Philip Meier	b5f2574f36	no longer coalesce sparse COO tensors before comparison (#69751 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69751 cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D34262453 Pulled By: ezyang fbshipit-source-id: e2e62d2aa03fc569d2951c880960b256f5dc4aaa (cherry picked from commit `cb6b0ef719`)	2022-02-17 02:33:08 +00:00
Brian Hirsh	22ccf448e8	Revert D34034848: free up dispatch key space (in C++) Test Plan: revert-hammer Differential Revision: D34034848 (`6690256021`) Original commit changeset: 9677ee2c0a1a Original Phabricator Diff: D34034848 (`6690256021`) fbshipit-source-id: fd50943d915ef813bb9f9ab278fb582429eea3b1 (cherry picked from commit `3acefee1cd`)	2022-02-14 23:29:00 +00:00
Brian Hirsh	6690256021	free up dispatch key space (in C++) (#72402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402 The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes. Reland of D33255193 (`20b8653dfa`) ghstack-source-id: 148830728 Test Plan: Steps to test: (1) connect to a mobile OD (2) run `one_world android emulator android-29` in a terminal to start the android emulator (3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled` I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR. Reviewed By: albanD Differential Revision: D34034848 fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5 (cherry picked from commit `9ab9b12d35`)	2022-02-14 16:02:29 +00:00
Jacob Szwejbka	791e7df7d9	Back out "free up dispatch key space (in C++)" Summary: I think this diff stack broke all the related tasks below. Test Plan: For our failing tests: buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled For the ubn: Not really sure what to do, trying to build the app and see if I can use an effect? Reviewed By: shoumikhin Differential Revision: D34018849 fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65 (cherry picked from commit `3cc63cb2ea`)	2022-02-05 01:25:42 +00:00
Brian Hirsh	20b8653dfa	free up dispatch key space (in C++) (#69633 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69633 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33255193 Pulled By: bdhirsh fbshipit-source-id: 79773e9c15bf4f2f27675121a49ff5ffd1375238 (cherry picked from commit `eac0b13005`)	2022-02-04 17:57:38 +00:00
Pearu Peterson	214f4bf2ff	Support sparse.sum on empty sparse tensor (#71091 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091 Fixes https://github.com/pytorch/pytorch/issues/65394 The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs). Since masked sum uses `torch.sparse.sum` then, for the simplicity of masked reductions implementations, its reduction behavior ought to be defined by the behavior of the `torch.sum`. This PR implements the behavioral connection with respect to the directional summation of empty sparse tensors that correspond to all-zero strided tensors. cc nikitaved pearu cpuhrsch Test Plan: Imported from OSS Reviewed By: davidberard98 Differential Revision: D33651750 Pulled By: cpuhrsch fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d (cherry picked from commit `53f97e80f7`)	2022-01-19 18:58:08 +00:00
Pearu Peterson	677fab6d1d	Support broadcast_to on sparse COO tensors (#71073 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71073 cc nikitaved pearu cpuhrsch Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D33645744 Pulled By: cpuhrsch fbshipit-source-id: 4775c9636c4e868022a8c1bbfec93e351d1cf885 (cherry picked from commit `640f21e09a`)	2022-01-19 04:33:41 +00:00
Pearu Peterson	e7602a1e30	Fix multiplication of 0-D sparse tensors (#70749 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70749 Fixes https://github.com/pytorch/pytorch/issues/65396 and a clang-tidy error. cc nikitaved pearu cpuhrsch Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D33439136 Pulled By: cpuhrsch fbshipit-source-id: 45ec58de7c18db183f891431d4a26e98fd0e924a	2022-01-06 13:36:46 -08:00
Peter Bell	6de9f0fc94	OpInfo: Allow sample_inputs_func to be any iterable (#69256 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69256 Closes #52486 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32942008 Pulled By: mruberry fbshipit-source-id: f5b01b0298c0160b0bec6e86e2b6db8cfe746206	2021-12-09 08:37:26 -08:00
Peter Bell	1da1707568	Sparse: Implement simple unary ufuncs operators (#68887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887 Closes #46988, closes #46987, closes #46761 By "simple" I mean operators that map 0->0 so we can implement it by just re-dispatching on the values tensor. That does mean we have `sin` but not `cos` for example, but without fill value support this is the best that can be done. Most of these don't support autograd because the derivative formulas use unsupported operators. cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32734911 Pulled By: cpuhrsch fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a	2021-12-01 05:43:19 -08:00
Eli Uriegas	251686fc4c	Revert D32706197: Sparse: Implement simple unary ufuncs operators Test Plan: revert-hammer Differential Revision: D32706197 (`fbaa19a6fa`) Original commit changeset: 65e1acb36457 fbshipit-source-id: 45c4b486f9eee200d5a1f6d46d267617124f8a5e	2021-11-30 10:50:12 -08:00
Peter Bell	fbaa19a6fa	Sparse: Implement simple unary ufuncs operators (#68887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887 Closes #46988, closes #46987, closes #46761 By "simple" I mean operators that map 0->0 so we can implement it by just re-dispatching on the values tensor. That does mean we have `sin` but not `cos` for example, but without fill value support this is the best that can be done. Most of these don't support autograd because the derivative formulas use unsupported operators. cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32706197 Pulled By: cpuhrsch fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b	2021-11-30 00:30:30 -08:00
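The "map 0->0" restriction described in the commit message above can be illustrated without PyTorch. The following is a minimal sketch in plain Python, not the ATen implementation: the list-based COO representation and the helper names `apply_to_values` and `to_dense` are hypothetical and exist only for this illustration. It shows why applying a function to the stored values alone agrees with the dense result for `sin` but not for `cos`.

```python
import math

# Hypothetical, simplified 1-D COO layout: only explicitly stored entries
# are kept; every other position is an implicit zero.
def apply_to_values(fn, indices, values):
    """Apply fn to the stored values only; implicit zeros are never visited."""
    return indices, [fn(v) for v in values]

def to_dense(indices, values, size):
    """Materialize the sparse entries into a dense list of floats."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] += v
    return dense

indices, values, size = [1, 3], [0.5, 2.0], 5
dense = to_dense(indices, values, size)

# sin(0) == 0, so the values-only shortcut matches the dense computation.
sin_sparse = to_dense(*apply_to_values(math.sin, indices, values), size)
sin_dense = [math.sin(x) for x in dense]
sin_matches = sin_sparse == sin_dense

# cos(0) == 1, so the shortcut leaves implicit zeros at 0 where the dense
# result has 1; without a fill value the sparse result is wrong.
cos_sparse = to_dense(*apply_to_values(math.cos, indices, values), size)
cos_dense = [math.cos(x) for x in dense]
cos_matches = cos_sparse == cos_dense
```

Here `sin_matches` is true and `cos_matches` is false, which is the reason the commit provides `sin` but not `cos`.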
Peter Bell	f5fa91ba2e	Sparse: Add additional opinfo tests (#68886 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68886 cc nikitaved pearu cpuhrsch IvanYashchuk Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32697933 Pulled By: cpuhrsch fbshipit-source-id: fffdd1bc663cc1bc49abe8cf3680982d1cb497bc	2021-11-29 12:49:20 -08:00
Vinnam Kim	f89572f417	Add feature: zeros_like() from a dense tensor to a sparse tensor (#68108 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67904. - Create a sparse tensor when the sparse layout is given even if the input tensor is not sparse. cc nikitaved pearu cpuhrsch IvanYashchuk Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108 Reviewed By: anjali411 Differential Revision: D32316269 Pulled By: cpuhrsch fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886	2021-11-11 08:54:15 -08:00
Jane Xu	793f366e34	[skip ci] Set test owners for sparse tests (#66863 ) Summary: Action following https://github.com/pytorch/pytorch/issues/66232 cc nikitaved pearu cpuhrsch IvanYashchuk Pull Request resolved: https://github.com/pytorch/pytorch/pull/66863 Reviewed By: anjali411 Differential Revision: D31771126 Pulled By: janeyx99 fbshipit-source-id: 6cb5ca0557e8555f6a09b3e607ff8888e505486e	2021-10-20 10:12:13 -07:00
lezcano	0974215c4d	Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181 This PR replaces all the calls to: - `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python - `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python. It also simplifies two pieces of code, and fixes one bug where a pair of parentheses were missing in the function `make_symmetric_matrices`. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31692896 Pulled By: anjali411 fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a	2021-10-18 13:02:25 -07:00
Yukio Siraichi	c829cb6840	Port `min` kernel to structured kernels. (#61450 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61450 Tracking issue: #55070 Test Plan: Imported from OSS Reviewed By: saketh-are Differential Revision: D29741713 Pulled By: bdhirsh fbshipit-source-id: 2c107752a90fd39cfb55e08aaf3541bd484a5fc3	2021-09-28 14:03:54 -07:00
Ivan Yashchuk	1fec9cd76b	[Fixed] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980 ) Summary: This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices. The change is applied only to CUDA 11+ builds. `cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR. cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980 Reviewed By: ngimel Differential Revision: D30994115 Pulled By: cpuhrsch fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24	2021-09-21 13:03:40 -07:00
Philip Meier	26b7ff5aea	deprecate dtype getters from `torch.testing` namespace (#63554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554 Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold: 1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example, `torch.testing.floating_types()` will only give you `float32` and `float64`, skipping `float16` and `bfloat16`. 2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries. We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D30662206 Pulled By: mruberry fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56	2021-09-07 08:58:51 -07:00
Richard Zou	92b31b59af	Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] Test Plan: revert-hammer Differential Revision: D29699456 (`ad4848565e`) Original commit changeset: 407ae53392ac fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de	2021-09-01 07:32:24 -07:00
Saketh Are	83e28a7d28	Use stacklevel for floordiv deprecation warnings (#64034 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/60548 `Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback. Repro: ``` import torch x = torch.tensor(0) x // 1 ``` Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide: ``` UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:571.) return torch.floor_divide(self, other) ``` After this change, the warning instead cites the user's offending line of Python code: ``` UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). x // 1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034 Reviewed By: mruberry Differential Revision: D30658010 Pulled By: saketh-are fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a	2021-08-31 11:27:56 -07:00
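The commit above relies on two ideas: a warning that cites the caller's line instead of the library's, and the difference between truncating and flooring division for negative operands. The sketch below shows both using only the Python standard library. It is an illustration under assumptions, not the PyTorch change itself: `old_floordiv` and `user_code` are hypothetical names, and `math.trunc` / `math.floor` stand in for `rounding_mode='trunc'` and `rounding_mode='floor'`.

```python
import inspect
import math
import warnings

def old_floordiv(a, b):
    """Hypothetical deprecated helper that rounds toward zero."""
    # stacklevel=2 attributes the warning to the caller's line,
    # not to this line inside the library function.
    warnings.warn("old_floordiv is deprecated", DeprecationWarning, stacklevel=2)
    return math.trunc(a / b)

def user_code():
    here = inspect.currentframe().f_lineno
    value = old_floordiv(-3, 2)  # one line below `here`; the warning should cite it
    return here + 1, value

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    expected_line, trunc_result = user_code()

reported_line = caught[0].lineno

# Rounding toward zero and rounding down differ for negative quotients.
floor_result = math.floor(-3 / 2)
```

With `stacklevel=1` (the default) the recorded line would instead be the `warnings.warn` call inside `old_floordiv`, which is the less helpful behavior the commit message describes for the C++-triggered warning.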
Ivan Yashchuk	ad4848565e	Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980 ) Summary: This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices. The change is applied only to CUDA 11+ builds. `cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980 Reviewed By: ngimel Differential Revision: D29699456 Pulled By: cpuhrsch fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b	2021-08-30 15:06:25 -07:00
Kushashwa Ravi Shrimali	d37636901e	[Doc] `make_tensor` to `torch.testing` module (#63925 ) Summary: This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs. TODOs: * [x] Add examples cc: pmeier mruberry brianjo Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925 Reviewed By: ngimel Differential Revision: D30633487 Pulled By: mruberry fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af	2021-08-30 12:25:40 -07:00
Shen Li	1022443168	Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: revert-hammer Differential Revision: D30279364 (`b004307252`) Original commit changeset: c1ed77dfe43a fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e	2021-08-12 11:45:01 -07:00
Zsolt Dollenstein	b004307252	[codemod][lint][fbcode/c*] Enable BLACK by default Test Plan: manual inspection & sandcastle Reviewed By: zertosh Differential Revision: D30279364 fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a	2021-08-12 10:58:35 -07:00
mattip	c8eda919a4	test, fix sparse * dense exceptions and corner case (#61723 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59916 This fixes two problems with sparse multiplication - 0d-dense * sparse was creating a non-sparse output and failing. - dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message <details> <summary> unhelpful error message </summary> Traceback (most recent call last): File "<stdin>", line 1, in <module> NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode]. 
SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel] SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel] SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel] SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel] BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback] Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback] Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback] Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback] ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback] AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel] Tracer: registered at 
../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel] UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback] Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback] Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback] VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback] </details> Also added tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723 Reviewed By: ezyang Differential Revision: D29962639 Pulled By: cpuhrsch fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06	2021-08-05 11:27:12 -07:00
Kurt Mohler	87334c40a7	Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629 Reviewed By: mrshenli Differential Revision: D29774486 Pulled By: albanD fbshipit-source-id: bfc9119c478f0244d5be681bcf4954a3eb97e542	2021-07-20 10:55:43 -07:00
Anjali Chourdia	287603f51c	Revert D29698486: [pytorch][PR] Remove torch._bmm and remove torch.bmm deterministic arg documentation Test Plan: revert-hammer Differential Revision: D29698486 (`328606699f`) Original commit changeset: 5af2d3803ab1 fbshipit-source-id: ce954c13196b1fb8277d61a686ac351d3bf13903	2021-07-16 11:02:09 -07:00
Kurt Mohler	328606699f	Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/61571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629 Reviewed By: zou3519 Differential Revision: D29698486 Pulled By: albanD fbshipit-source-id: 5af2d3803ab1eb093616bcfc7e074d8b57ef6958	2021-07-16 09:18:34 -07:00
Joel Schlosser	03b5a225a7	Test parametrization for instantiated device-specific tests (#60233 ) Summary: The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`. This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic. One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism. The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability. Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. 
The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233 Reviewed By: iramazanli Differential Revision: D29494995 Pulled By: jbschlosser fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc	2021-06-30 18:50:22 -07:00
Ivan Yashchuk	90303157ab	Enable complex dtypes for coo_sparse-coo_sparse matmul [CPU] (#59554 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59554 This PR enables complex number support for matrix-matrix multiplication of COO sparse matrices. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D28968309 Pulled By: anjali411 fbshipit-source-id: 4fd471e76a5584366aabc86c08b4564667ee54ca	2021-06-08 19:34:41 -07:00
Ivan Yashchuk	acc47357b5	Fix torch.conj for zero-dimensional sparse coo matrix (#59553 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59553 Added a test for 0x0 sparse coo input for sparse_unary_ufuncs. This test fails for `conj` on master. Modified `unsupportedTypes` for test_sparse_consistency, complex dtypes pass, but float16 doesn't pass for `conj` because `to_dense()` doesn't work with float16. Fixes https://github.com/pytorch/pytorch/issues/59549 Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D28968215 Pulled By: anjali411 fbshipit-source-id: 44e99f0ce4aa45b760d79995a021e6139f064fea	2021-06-08 15:46:49 -07:00
Peter Bell	99f2000a99	Migrate nonzero from TH to ATen (CPU) (#59149 ) Summary: Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745 The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. \| Shape \| Before \| After (1 thread) \| After (8 threads) \| \|:----------:\|--------:\|-----------------:\|------------------:\| \| 256,128,32 \| 2610 us \| 2150 us \| 551 us \| \| 128,128,32 \| 1250 us \| 1020 us \| 197 us \| \| 64,128,32 \| 581 us \| 495 us \| 99 us \| \| 32,128,32 \| 292 us \| 255 us \| 83 us \| \| 16,128,32 \| 147 us \| 126 us \| 75 us \| \| 8,128,32 \| 75 us \| 65 us \| 65 us \| \| 4,128,32 \| 39 us \| 33 us \| 33 us \| \| 2,128,32 \| 20 us \| 18 us \| 18 us \| \| 1,128,32 \| 11 us \| 9 us \| 9 us \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149 Reviewed By: mruberry Differential Revision: D28817466 Pulled By: ngimel fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732	2021-06-02 12:26:29 -07:00
Natalia Gimelshein	657b75d155	Revert D28700259: [pytorch][PR] Migrate nonzero from TH to ATen (CPU) Test Plan: revert-hammer Differential Revision: D28700259 (`95b1bc1009`) Original commit changeset: 9b279ca7c36d fbshipit-source-id: 267afe63376be598d24c862e02e3b4b3ea75f77c	2021-05-27 20:07:30 -07:00
Peter Bell	95b1bc1009	Migrate nonzero from TH to ATen (CPU) (#58811 ) Summary: Closes gh-24745 The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works. This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location. \| Shape \| Before \| After (1 thread) \| After (8 threads) \| \|:----------:\|--------:\|-----------------:\|------------------:\| \| 256,128,32 \| 2610 us \| 2220 us \| 496 us \| \| 128,128,32 \| 1250 us \| 976 us \| 175 us \| \| 64,128,32 \| 581 us \| 486 us \| 88 us \| \| 32,128,32 \| 292 us \| 245 us \| 80 us \| \| 16,128,32 \| 147 us \| 120 us \| 71 us \| \| 8,128,32 \| 75 us \| 61 us \| 61 us \| \| 4,128,32 \| 39 us \| 32 us \| 32 us \| \| 2,128,32 \| 20 us \| 17 us \| 17 us \| \| 1,128,32 \| 11 us \| 9 us \| 9 us \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811 Reviewed By: anjali411 Differential Revision: D28700259 Pulled By: ngimel fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159	2021-05-27 10:06:54 -07:00
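The nonzero migration above mentions per-thread counts that let each thread write its outputs in the right location. A common way to arrange this is a two-pass scheme: count per chunk, take an exclusive prefix sum to get each chunk's write offset, then fill. The sketch below simulates that idea sequentially in plain Python. It is an assumption about the general technique, not the ATen kernel; `chunked_nonzero` and its parameters are hypothetical names.

```python
from itertools import accumulate

def chunked_nonzero(data, num_chunks):
    """Return indices of nonzero elements using a count-then-fill scheme."""
    n = len(data)
    step = -(-n // num_chunks) if n else 1  # ceiling division; chunk length
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]

    # Pass 1: each chunk (standing in for a thread) counts its own nonzeros.
    counts = [sum(1 for x in data[lo:hi] if x != 0) for lo, hi in bounds]

    # Exclusive prefix sum: where each chunk starts writing in the output.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: chunks write into disjoint output ranges, so the order in
    # which chunks run does not affect the result.
    out = [None] * sum(counts)
    for (lo, hi), off in zip(bounds, offsets):
        for i in range(lo, hi):
            if data[i] != 0:
                out[off] = i
                off += 1
    return out

sample = [0, 3, 0, 0, 5, 7, 0, 1]
result = chunked_nonzero(sample, num_chunks=3)
```

Because every chunk knows its offset before writing, the output stays in linear index order regardless of how the work is divided.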
Pearu Peterson	be4ba29d49	Detect overflow in numel of sparse COO tensor (#57492 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/57416 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57492 Reviewed By: albanD Differential Revision: D28273649 Pulled By: mruberry fbshipit-source-id: 08ba50509556df1981d7ede025d84a836d2e8e5e	2021-05-25 22:16:21 -07:00
Alexander	6f2c0cccdd	New: sparse complex: add linear algebra, addmm (#57129 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57129 Test Plan: Imported from OSS Reviewed By: janeyx99, astaff Differential Revision: D28112701 Pulled By: ezyang fbshipit-source-id: 1b253453dc19e908fb18d0b1a83738243e0a8d59	2021-05-07 05:37:48 -07:00
Alexander	a911c4fc1c	New: Initial support for sparse complex tensors constructors for CPU/CUDA (#57125 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57125 I'm opening this PR, solving the last issue reported before merging PR #54153 https://github.com/pytorch/pytorch/pull/54153#issuecomment-827997616, Solves gh-50690 Test Plan: Imported from OSS Reviewed By: astaff Differential Revision: D28112702 Pulled By: ezyang fbshipit-source-id: 915681954edb14b7c19c3ffe641af2d2e6649576	2021-05-07 05:36:41 -07:00
Peter Bell	a5288a0244	Sparse support for division rounding_mode argument (#51989 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51989 Test Plan: Imported from OSS Reviewed By: astaff Differential Revision: D28118114 Pulled By: mruberry fbshipit-source-id: 2a76ee55c3845552e57e93d54628ce3c2fab3399	2021-05-01 17:37:25 -07:00
Mike Ruberry	7bcce2acb9	Revert D27765618: Initial support for sparse complex tensors constructors for CPU/CUDA Test Plan: revert-hammer Differential Revision: D27765618 (`daef60c3b7`) Original commit changeset: a9cdd31d5c7a fbshipit-source-id: f700d5db7ff8930b9158460b5a77f68a35e212a4	2021-04-27 15:48:51 -07:00
Alexander	0d41122e61	Eliminate global usage of torch.set_default_dtype in sparse test (#56393 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56393 Fixes for gh-56369 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27913266 Pulled By: mruberry fbshipit-source-id: 2c590d3a2188aae251184f08c1a6a2c4c570d150	2021-04-27 15:23:14 -07:00
Alexander	daef60c3b7	Initial support for sparse complex tensors constructors for CPU/CUDA (#54153 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153 Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA. - [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors - [x] add complex support to coalesce function - [x] add complex support to to_dense function - [x] add complex support to to_sparse function - [x] add complex support to sparse_add function - [x] add unit tests Note: This PR contains only complex support for torch.sparse_coo_tensor forward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in ghstack should cover other sparse operations to have a more complex sparse support, specifically related with the use of specific APIs for accelerated linear algebra. Note: Before using ghstack the original PR was #50984 Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D27765618 Pulled By: ezyang fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89	2021-04-27 14:39:13 -07:00
sorenrasmussenai	f27513e951	Fix bug in torch.sparse.addmm on CUDA when beta != 0 or 1 (#56160 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/55917, which caused `torch.sparse.addmm` to fail on CUDA whenever `beta` was different from 0 or 1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/56160 Reviewed By: ejguan Differential Revision: D27825108 Pulled By: ngimel fbshipit-source-id: 2ade5ea38c5322768dc4dffb40c65fcbb17ec201	2021-04-26 02:57:41 -07:00
Alexander	6ee333cdb5	modernize test_sparse (#54572 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54572 Adding device generic tests to `test_sparse`. Follow-up PR: #54153 I think is ready to review. Looking forward your comments cc mruberry. Thanks Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27562663 Pulled By: mruberry fbshipit-source-id: c48973e707f779b529bc7f61b75103194b428987	2021-04-09 12:19:29 -07:00
Alban Desmaison	b91d48877d	Reland Fix reference cycle in sparse coalesce graph (#55404 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/52874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55404 Reviewed By: bdhirsh Differential Revision: D27600438 Pulled By: albanD fbshipit-source-id: f5c286638b324ad59be65657a016028af5e2b303	2021-04-07 12:02:42 -07:00
Brian Hirsh	ec80981d28	Revert D27246997: [pytorch][PR] Fix reference cycle in sparse coalesce graph Test Plan: revert-hammer Differential Revision: D27246997 (`815bfad28c`) Original commit changeset: 0fe6c1104350 fbshipit-source-id: 4d345718589a642d3c65474b266342285205ccdf	2021-04-06 11:45:27 -07:00
Peter Bell	815bfad28c	Fix reference cycle in sparse coalesce graph (#52874 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/52253 In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets its `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor. As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874 Reviewed By: bdhirsh Differential Revision: D27246997 Pulled By: albanD fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3	2021-04-06 08:32:19 -07:00
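The cycle described in 815bfad28c can be reproduced without PyTorch. The following is a minimal sketch, not the autograd implementation: `Toy`, `Node`, and both `coalesce_*` methods are hypothetical stand-ins, and the check relies on CPython reference counting with the cycle collector switched off.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an autograd node that keeps its input alive."""

    def __init__(self, inp):
        self.inp = inp


class Toy:
    """Hypothetical stand-in for a tensor that carries a grad_fn."""

    def __init__(self):
        self.grad_fn = None

    def coalesce_returning_self(self):
        # Old behaviour: the result is the input, so attaching a node that
        # references the input creates the cycle self -> grad_fn -> self.
        out = self
        out.grad_fn = Node(self)
        return out

    def coalesce_returning_copy(self):
        # Fixed behaviour: the result is a new object, so the references
        # form a chain (out -> grad_fn -> input) rather than a cycle.
        out = Toy()
        out.grad_fn = Node(self)
        return out


def survives_without_gc(method_name):
    """Return True if the input is still alive after all names are dropped."""
    gc.disable()  # rely on reference counting only
    try:
        t = Toy()
        ref = weakref.ref(t)
        out = getattr(t, method_name)()
        del t, out
        return ref() is not None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle created by the first variant
```

Under these assumptions, `survives_without_gc("coalesce_returning_self")` reports that the object outlives its last name, while the copying variant is freed immediately, which mirrors why cloning removes the leak.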
Heitor Schueroff	6d87b3667f	Added support for TensorList inputs in OpInfo (#54922 ) Summary: Stack: * https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs * __#54922 Added support for TensorList inputs in OpInfo__ Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck. Note: JIT testing support for TensorList inputs will be added in a follow up PR. Fixes https://github.com/pytorch/pytorch/issues/51996 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922 Reviewed By: H-Huang Differential Revision: D27448952 Pulled By: heitorschueroff fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278	2021-03-31 04:42:10 -07:00
Edward Yang	e0aebe241d	Refactor tensor_new.cpp to use TensorOptions instead of DispatchKey (#54034 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034 Fixes #53544 I had to touch a bunch of lines but the refactoring was fairly mechanical. Here's how it works. The basic concept behind this PR is that tensor_new.cpp was previously abusing DispatchKey when it actually meant TensorOptions. The provided DispatchKey argument to most of the constructor functions typically comes from torch::tensors::get_default_dispatch_key(); it doesn't really make sense for people to set the default dispatch key, but this got grandfathered in due to the old API set_default_tensor_type (where the "Type" concept got refactored into "DispatchKey" concept over time). See also #53124. But the upshot is that, semantically, what we refer to as the default dispatch key really is more like torch.set_default_tensor_type(torch.Tensor) versus torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user wants to do something about construction of the tensor, and TensorOptions captures that exactly. So, how exactly to translate from one to the other? - Sources (things that used to PRODUCE DispatchKey) - Most top level functions take a DispatchKey as their argument. I use the new function dispatchKeyToTensorOptions to convert it into a TensorOptions - typeIdWithDefault now produces a TensorOptions (probably could do with a rename, though I didn't) - Sinks (things that used to CONSUME DispatchKey) - Previously, the function options() was typically used to convert the DispatchKey into a TensorOptions. Now its replacement build_options just takes a TensorOptions and sets some extra fields on it. 
Irritatingly, I can't just replace `build_options(options, scalar_type, device)` with `options.dtype(scalar_type).device(device)` because the semantics are slightly different: if device is nullopt, we should preserve the usage of the device specified in options (what options.device() does is overwrite the device unconditionally; e.g., if device is nullopt, unset device from options) - The other major sink for DispatchKey was `internal_new_from_data`, but it turns out it only really extracts the device type from the dispatch key. Now it just pulls out the device from TensorOptions. - To actually do the translation of DispatchKey to TensorOptions, I introduce new functions dispatchKeyToLayout (replicating layout_from_backend--there are still a few uses of this function so I couldn't delete it) and dispatchKeyToDeviceType (replacing computeDeviceType) - In all internal functions, whenever DispatchKey is taken as an argument, I instead take TensorOptions as an argument, and pass it along. - Anywhere `legacyExtractDispatchKey(other.key_set())` equality was previously used, I now do `other.options().type_equal()`, which is the intended BC for doing "backend to backend" comparisons - There are a few places in the sparse constructors where we allocated a tensor for values, and then read out the dispatch key from the result to allocate the keys. As best as I can tell, this is totally equivalent to just passing in the options to both values and indices (the only difference is dtype, which is captured via a separate argument) This refactor doesn't really go far enough: for example, there are now functions that take both TensorOptions and ScalarType, when really the TensorOptions can capture this all. I kept it solely just s/DispatchKey/TensorOptions/ to reduce the number of possible bugs; also, a lot of this will be mooted by a proper fix to #53124. Even with this limited refactor, the payoff is sweet. 
I can delete: - backendToCPU - backendToXPU - backendToCUDA - backendToHIP - backendToBackendOfDeviceType The reason I can do this is because I can simply overwrite layout in TensorOptions to do the conversion, rather than having to type out each backend case explicitly. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27109509 Pulled By: ezyang fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9	2021-03-19 09:08:32 -07:00
mattip	54a2498919	Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387 ) Summary: Related to https://github.com/pytorch/pytorch/issues/50006 Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387 Reviewed By: albanD Differential Revision: D26773387 Pulled By: mruberry fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd	2021-03-08 03:32:14 -08:00
Sam Estep	8c798e0622	Forbid trailing whitespace (#53406 ) Summary: Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857 These are the only hand-written parts of this diff: - the addition to `.github/workflows/lint.yml` - the file endings changed in these four files (to appease FB-internal land-blocking lints): - `GLOSSARY.md` - `aten/src/ATen/core/op_registration/README.md` - `scripts/README.md` - `torch/csrc/jit/codegen/fuser/README.md` The rest was generated by running this command (on macOS): ``` git grep -I -l ' $' -- . ':(exclude)/contrib/' ':(exclude)third_party' \| xargs gsed -i 's/ *$//' ``` I looked over the auto-generated changes and didn't see anything that looked problematic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406 Test Plan: This run (after adding the lint but before removing existing trailing spaces) failed: - https://github.com/pytorch/pytorch/runs/2043032377 This run (on the tip of this PR) succeeded: - https://github.com/pytorch/pytorch/runs/2043296348 Reviewed By: walterddr, seemethere Differential Revision: D26856620 Pulled By: samestep fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97	2021-03-05 17:22:55 -08:00
Rong Rong (AI Infra)	b52e2e6045	[BE] _get_torch_cuda_version should return tuple (#52409 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52409 Reviewed By: jbschlosser, glaringlee Differential Revision: D26513924 Pulled By: walterddr fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734	2021-02-18 09:28:38 -08:00
Mike Ruberry	594a66d778	Warn about floor_divide performing incorrect rounding (#50281 ) (#50281 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50281 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51745 Test Plan: Imported from OSS Reviewed By: ngimel Pulled By: mruberry Differential Revision: D26257855 fbshipit-source-id: e5d497cf07b0c746838ed081c5d0e82fb4cb701b	2021-02-10 03:13:34 -08:00
Jeffrey Wan	c0966914bc	Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49409 There are many call sites where gradcheck/gradgradcheck is now being implicitly invoked with `check_batched_grad` as True, where it was previously False. Cases fall into two basic categories: 1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead 2) the call site was already using the globally imported function, but does not explicitly pass the `check_batched_grad` flag Only in the _assertGradAndGradgradChecks cases, which are infrequent, I assumed that the author is aware that omitting the flag means not applying check_batched_grad=True. (but maybe that is not the case?) Overall this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were probably just not aware of this flag and did not mean to have this flag as False. 
So far exceptions to the above (as discovered by CI) include: - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103) - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236) - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235) - test_data_parallel (test_data_parallel_buffers_requiring_grad) - SIGSEGV (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697) - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315) Possible TODO is to prevent new tests from invoking external gradcheck. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133 Reviewed By: ezyang Differential Revision: D26147919 Pulled By: soulitzer fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432	2021-01-29 09:13:37 -08:00
Kyle Chen	d5e5c5455a	[ROCm] re-enable test_sparse.py tests (#50557 ) Summary: Signed-off-by: Kyle Chen <kylechen@amd.com> cc: jeffdaily Pull Request resolved: https://github.com/pytorch/pytorch/pull/50557 Reviewed By: mruberry Differential Revision: D25941432 Pulled By: ngimel fbshipit-source-id: 534fc8a91a48fa8b3b397e63423cd8347b41bbe2	2021-01-18 23:36:39 -08:00
Nathan Howell	c517e15d79	Add support for converting sparse bool tensors to dense (#50019 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49977 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50019 Reviewed By: smessmer Differential Revision: D25782045 Pulled By: ezyang fbshipit-source-id: a8389cbecb7e79099292a423a6fd8ac28631905b	2021-01-06 07:38:14 -08:00
mattip	f96ce3305c	prohibit assignment to a sparse tensor (#50040 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/48225 by prohibiting assignment to a sparse Tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/50040 Reviewed By: mrshenli Differential Revision: D25757125 Pulled By: zou3519 fbshipit-source-id: 3db6f48932eb10bf6ca5e97a6091afcabb60e478	2021-01-04 14:38:35 -08:00
Himangshu	9552cc65d4	Creation of test framework for Sparse Operators (#48488 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/48488 Reviewed By: ngimel Differential Revision: D25696487 Pulled By: mruberry fbshipit-source-id: dc4f57c6628f62b74dd321f3f6b0fff86f25b040	2020-12-23 15:42:26 -08:00
Alexander	44ce0b8883	Sparse-sparse matrix multiplication (CPU/CUDA) (#39526 ) Summary: This PR implements matrix multiplication support for 2-d sparse tensors using the COO sparse format. The current implementation of `torch.sparse.mm` supports this configuration, `torch.sparse.mm(sparse_matrix1, sparse_matrix2.to_dense())`, but this could consume a lot of memory when sparse_matrix2's shape is large. This implementation extends the `torch.sparse.mm` function to support `torch.sparse.mm(sparse_matrix1, sparse_matrix2)` Resolves #[20988](https://github.com/pytorch/pytorch/issues/20988) for CPU/CUDA. - [x] sparse matmul - [x] CPU/CUDA C++ implementation - [x] unittests - [x] update torch.sparse.mm documentation - [x] autograd support The CPU sparse-sparse matmul was implemented taking as a reference this work "Sparse Matrix Multiplication Package (SMMP)". The GPU sparse-sparse matmul is based on cuSparse; there is specific code for CUSPARSE when CUSPARSE_VERSION >= 11 and for older versions of CUSPARSE. Both CPU/CUDA rely on the sparse-sparse matmul algorithm using the CSR indices format, as it is one of the fastest algorithms. 
Here it is the latest benchmark (script is here) results for torch.sparse.mm (CUDA) and torch.sparse.mm (CPU) and scipy, values are float32 scalars: size \| density \| sparse.mm(CUDA) \| sparse.mm(CPU) \| scipy_coo_matmul -- \| -- \| -- \| -- \| -- (32, 10000) \| 0.01 \| 822.7 \| 79.4 \| 704.1 (32, 10000) \| 0.05 \| 1741.1 \| 402.6 \| 1155.3 (32, 10000) \| 0.1 \| 2956.8 \| 840.8 \| 1885.4 (32, 10000) \| 0.25 \| 6417.7 \| 2832.3 \| 4665.2 (512, 10000) \| 0.01 \| 1010.2 \| 3941.3 \| 26937.7 (512, 10000) \| 0.05 \| 2216.2 \| 26903.8 \| 57343.7 (512, 10000) \| 0.1 \| 4868.4 \| 87773.7 \| 117477.0 (512, 10000) \| 0.25 \| 16639.3 \| 608105.0 \| 624290.4 (1024, 10000) \| 0.01 \| 1224.8 \| 13088.1 \| 110379.2 (1024, 10000) \| 0.05 \| 3897.5 \| 94783.9 \| 236541.8 (1024, 10000) \| 0.1 \| 10559.1 \| 405312.5 \| 525483.4 (1024, 10000) \| 0.25 \| 57456.3 \| 2424337.5 \| 2729318.7 A new backward algorithm was implemented using only `sparse @ sparse` and `sparse_mask` operations. Here is some benchmarking: ``` [------------------------- sparse.mm-backward -------------------------] \| sparse.backward \| dense.backward ----------------------------------------------------------------------- (32, 10000) \| 0.01 \| 13.5 \| 2.4 (32, 10000) \| 0.05 \| 52.3 \| 2.4 (512, 10000) \| 0.01 \| 1016.8 \| 491.5 (512, 10000) \| 0.05 \| 1604.3 \| 492.3 (1024, 10000) \| 0.01 \| 2384.1 \| 1963.7 (1024, 10000) \| 0.05 \| 3965.8 \| 1951.9 ``` I added new benchmark tests. Now I am using a real dataset used in recent studies [1, 2] with different sparsity levels. 
``` [---------------------------------- matmul ---------------------------------] \| 0.5 \| 0.7 \| 0.8 \| 0.9 \| 0.95 \| 0.98 1 threads: ------------------------------------------------------------------ (cpu) torch \| 5.4 \| 5.4 \| 5.2 \| 5.3 \| 5.3 \| 5.4 torch.sparse \| 122.2 \| 51.9 \| 27.5 \| 11.4 \| 4.9 \| 1.8 scipy \| 150.1 \| 87.4 \| 69.2 \| 56.8 \| 38.4 \| 17.1 (cuda) torch \| 1.3 \| 1.1 \| 1.1 \| 1.1 \| 1.1 \| 1.1 torch.sparse \| 20.0 \| 8.4 \| 5.1 \| 2.5 \| 1.5 \| 1.1 [----------------------------------- backward -----------------------------------] \| 0.5 \| 0.7 \| 0.8 \| 0.9 \| 0.95 \| 0.98 1 threads: ----------------------------------------------------------------------- (cpu) torch \| 17.7 \| 17.9 \| 17.7 \| 17.7 \| 17.6 \| 17.9 torch.sparse \| 672.9 \| 432.6 \| 327.5 \| 230.8 \| 176.7 \| 116.7 (cuda) torch \| 3.8 \| 3.6 \| 3.5 \| 3.5 \| 3.6 \| 3.5 torch.sparse \| 68.8 \| 46.2 \| 35.6 \| 24.2 \| 17.8 \| 11.9 Times are in milliseconds (ms). ``` In summary, I can say that the new `sparse @ sparse` backward algorithm is better as it is more about saving space than performance. Moreover, it is better than other options tested before. ## References 1. Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen. Sparse GPU Kernels for Deep Learning. Proceedings of the International Conference for High Performance Computing, 2020. [https://github.com/google-research/google-research/tree/master/sgk](https://github.com/google-research/google-research/tree/master/sgk) 2. Trevor Gale, Erich Elsen, Sara Hooker. The State of Sparsity in Deep Neural Networks. [https://github.com/google-research/google-research/tree/master/state_of_sparsity](https://github.com/google-research/google-research/tree/master/state_of_sparsity) Pull Request resolved: https://github.com/pytorch/pytorch/pull/39526 Reviewed By: mruberry Differential Revision: D25661239 Pulled By: ngimel fbshipit-source-id: b515ecd66d25f347d637e159d51aa45fb43b6938	2020-12-21 11:53:55 -08:00
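To illustrate the row-oriented idea behind a CSR-style sparse-sparse product mentioned in 44ce0b8883, here is a minimal pure-Python sketch. It is an assumption-laden illustration, not the SMMP or cuSparse code path: `coo_to_rows` and `sparse_matmul` are hypothetical helpers, and matrices are given as lists of `(row, col, value)` triples.

```python
from collections import defaultdict


def coo_to_rows(entries):
    """Group COO triples (row, col, value) by row, summing duplicate entries."""
    rows = defaultdict(dict)
    for r, c, v in entries:
        rows[r][c] = rows[r].get(c, 0.0) + v
    return rows


def sparse_matmul(a_entries, b_entries):
    """Multiply two sparse matrices given as COO triples, row by row.

    For each nonzero a[i, k], row k of b is scaled and accumulated into
    row i of the result, so only stored entries are ever visited.
    """
    a_rows = coo_to_rows(a_entries)
    b_rows = coo_to_rows(b_entries)
    out = []
    for i in sorted(a_rows):
        acc = defaultdict(float)
        for k, a_ik in a_rows[i].items():
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] += a_ik * b_kj
        out.extend((i, j, acc[j]) for j in sorted(acc))
    return out


# A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]]
a = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)]
b = [(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)]
product = sparse_matmul(a, b)
```

Neither operand is densified here, which is the memory argument the commit makes against calling `to_dense()` on the second operand.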
Xiang Gao	87636c07bb	CUDA BF16 sparse (#48807 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/48807 Reviewed By: mruberry Differential Revision: D25526752 Pulled By: ngimel fbshipit-source-id: 9ff8e637486cfd67d46daf0c05142bbe611e08ec	2020-12-14 09:55:52 -08:00
kshitij12345	25ab39acd0	[numpy] `torch.asin` : promote integer inputs to float (#48461 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48461 Reviewed By: ngimel Differential Revision: D25192319 Pulled By: mruberry fbshipit-source-id: fd5dffeca9cd98b86782bfa6a9ab367e425ee934	2020-11-27 15:26:58 -08:00
kshitij12345	e9efd8df1b	[numpy] `torch.log1p` : promote integer inputs to float (#48002 ) Summary: Reference https://github.com/pytorch/pytorch/issues/42515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/48002 Reviewed By: ngimel Differential Revision: D25148911 Pulled By: mruberry fbshipit-source-id: 902d0ddf699debd6edd1b3d55f5c73932ca45e83	2020-11-24 22:01:07 -08:00
Natalia Gimelshein	4a2fb34042	check sparse sizes (#47148 ) Summary: checks sizes of sparse tensors when comparing them in assertEqual. Removes additional checks in safeCoalesce, safeCoalesce should not be a test for `.coalesce()` function. Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148 Reviewed By: mruberry Differential Revision: D24823127 Pulled By: ngimel fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a	2020-11-09 10:33:24 -08:00
vfdev-5	dc7cd97402	Fixes bug in sspaddmm (#45113 ) (#45963 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/45113 Description: - Fixed bug in sspaddmm by calling contiguous on indices. - Added tests We have to make indices contiguous as we use `indices.data_ptr` in `_to_csr` which assumes row-contiguous storage: `be45c3401a/aten/src/ATen/native/sparse/SparseTensorMath.cpp (L1087-L1090)` > Part 1 of fixing this is probably to document sspaddmm. Part 2 may be to rewrite it using other ops. (https://github.com/pytorch/pytorch/issues/45113#issuecomment-700166809) - Docs will be written here: https://github.com/pytorch/pytorch/pull/45400 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45963 Reviewed By: malfet Differential Revision: D24335599 Pulled By: ngimel fbshipit-source-id: 8278c73a1b4cccc5e22c6f3818dd222588c46b45	2020-10-15 16:50:16 -07:00
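Why dc7cd97402 needs `contiguous` before reading indices through a raw pointer can be shown with flat buffers and strides. This is a minimal sketch under stated assumptions: the helper names are hypothetical, a Python list stands in for the memory behind `indices.data_ptr`, and strides are counted in elements.

```python
def read_row_assuming_contiguous(buffer, nnz, row):
    """Read one row of a 2 x nnz index matrix as if it were stored row-major."""
    return buffer[row * nnz:(row + 1) * nnz]


def read_row_with_strides(buffer, nnz, row, strides):
    """Read the same logical row while honouring (row_stride, col_stride)."""
    row_stride, col_stride = strides
    return [buffer[row * row_stride + j * col_stride] for j in range(nnz)]


def make_contiguous(buffer, nnz, num_rows, strides):
    """Re-lay the data row-major, which is what a contiguous copy provides."""
    out = []
    for row in range(num_rows):
        out.extend(read_row_with_strides(buffer, nnz, row, strides))
    return out


# Logical indices: [[0, 0, 1], [2, 0, 1]] (2 rows, nnz = 3).
# Stored column-major, e.g. as the transpose of a 3 x 2 row-major matrix.
column_major = [0, 2, 0, 0, 1, 1]
strides = (1, 2)

wrong = read_row_assuming_contiguous(column_major, 3, 0)
right = read_row_with_strides(column_major, 3, 0, strides)
fixed = read_row_assuming_contiguous(make_contiguous(column_major, 3, 2, strides), 3, 0)
```

The raw read on the non-contiguous buffer mixes row and column indices, while the read after the contiguous copy matches the stride-aware one.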
Alexander	29dc3c5ec8	Sparse softmax support (CUDA) (#42307 ) Summary: This PR implements softmax support for sparse tensors. Resolves gh-23651 for CUDA. - [x] sparse softmax - [x] CUDA C++ implementation - [x] unittests - [x] update softmax documentation - [x] autograd support - [x] sparse log_softmax - [x] CUDA C++ implementation - [x] unittests - [x] update log_softmax documentation - [x] autograd support Here are some benchmark (script is [here](https://gist.github.com/aocsa/fbc1827b3e49901512a33ba96092cbc1)) results for `torch.sparse.softmax and torch.softmax`, using CPU and GPU, values are float64 scalars, timing repeat is 1000: \| size \| density \| sparse CUDA \| sparse CPU \| \|--------------\|---------\|-------------\|------------\| \| (32, 10000) \| 0.01 \| 380.2 \| 687.5 \| \| (32, 10000) \| 0.05 \| 404.3 \| 2357.9 \| \| (32, 10000) \| 0.1 \| 405.9 \| 3677.2 \| \| (512, 10000) \| 0.01 \| 438.0 \| 5443.4 \| \| (512, 10000) \| 0.05 \| 888.1 \| 24485.0 \| \| (512, 10000) \| 0.1 \| 1921.3 \| 45340.5 \| \| size \| density \| dense CUDA \| dense CPU \| \|--------------\|---------\|-------------\|------------\| \| (32, 10000) \| 0.01 \| 23.6 \| 1943.2 \| \| (32, 10000) \| 0.05 \| 23.6 \| 1954.0 \| \| (32, 10000) \| 0.1 \| 23.5 \| 1950.0 \| \| (512, 10000) \| 0.01 \| 639.3 \| 39797.9 \| \| (512, 10000) \| 0.05 \| 640.3 \| 39374.4 \| \| (512, 10000) \| 0.1 \| 639.6 \| 39192.3 \| Times are in microseconds (us). Quick note: I updated the performance test again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42307 Reviewed By: ngimel Differential Revision: D23774427 Pulled By: mruberry fbshipit-source-id: bfabf726075b39dde544c10249f27ae1871f82c7	2020-09-24 00:07:30 -07:00
vfdev-5	c947ab0bb9	Added sparse support for asin and neg functions, updated log1p (#44028 ) Summary: Description: - [x] added C++ code for sparse `asin` and `neg` ops similarly to `log1p` op - [x] added tests - [x] coalesced input CPU/CUDA - [x] uncoalesced input CPU/CUDA - [x] added tests for `negative` and `arcsin` Backprop will be addressed in another PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44028 Reviewed By: agolynski Differential Revision: D23793027 Pulled By: mruberry fbshipit-source-id: 5fd642808da8e528cf6acd608ca0dcd720c4ccc3	2020-09-22 02:04:38 -07:00
Xiao Wang	d75c402755	Add cusolver to build, rewrite MAGMA inverse with cusolver (#42403 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42265 This PR adds cusolver to the pytorch build, and enables the use of cusolver/cublas library functions for GPU `torch.inverse` on certain tensor shapes. Specifically, when * the tensor is two dimensional (single batch), or * has >2 dimensions (multiple batches) and `batch_size <= 2`, or * magma is not linked, cusolver/cublas will be used. In other conditions, the current implementation of MAGMA will still be used. `8c0949ae45/aten/src/ATen/native/cuda/BatchLinearAlgebra.cu (L742-L752)` The reason for this is that for tensors with large batch_size, `cublasXgetrfBatched` and `cublasXgetriBatched` do not perform very well. For `batch_size > 1`, we launch cusolver functions in multiple streams. This lets cusolver functions run in parallel, and can greatly increase the performance. When `batch_size > 2`, the parallel launched cusolver functions are slightly slower than the current magma implementation, so we still use the current magma impl. On CUDA 9.2, there were some numerical issues detected, so the cusolver impl will not be used. The cusolver impl will also not be used on platforms other than Nvidia CUDA. `060769feaf/aten/src/ATen/native/cuda/BatchLinearAlgebraLib.h (L10-L13)` Note that there is a new heuristic used before cusolver/cublas calls here: `8c0949ae45/aten/src/ATen/native/cuda/MiscUtils.h (L113-L121)` where `use_loop_launch = true` means launch single batch cusolver functions in parallel, and `use_loop_launch = false` means use cublas_X_batched functions. When magma is enabled (only `batch_size <= 2` will be dispatched to cusolver/cublas), the heuristic will always return `true` and the cusolver calls are faster than small batch_size magma calls. 
When magma is disabled, this adds the functionality of `torch.inverse`, which was disabled before for all shapes (though large batch_size cublas performance may not be as well as magma). Checklist: - [X] Add benchmark, cpu, gpu-before (magma), gpu-after (cusolver) - [X] Rewrite single inverse (ndim == 2) with cusolver - [X] Rewrite batched inverse (ndim > 2) with cublas - [X] Add cusolver to build - [x] Clean up functions related to `USE_MAGMA` define guard - [x] Workaround for non-cuda platform - [x] Workaround for cuda 9.2 - [x] Add zero size check - [x] Add tests Next step: If cusolver doesn't cause any problem in pytorch build, and there are no major performance regressions reported after this PR being merged, I will start porting other cusolver/cublas functions for linear algebra to improve the performance. <details> <summary> benchmark 73499c6 </summary> benchmark code: https://github.com/xwang233/code-snippet/blob/master/torch.inverse/inverse-cusolver.ipynb shape meaning: * `[] 2 torch.float32 -> torch.randn(2, 2, dtype=torch.float32)` * `[2] 4 torch.float32 -> torch.randn(2, 4, 4, dtype=torch.float32)` \| shape \| cpu_time (ms) \| gpu_time_before (magma) (ms) \| gpu_time_after (ms) \| \| --- \| --- \| --- \| --- \| \| [] 2 torch.float32 \| 0.095 \| 7.534 \| 0.129 \| \| [] 4 torch.float32 \| 0.009 \| 7.522 \| 0.129 \| \| [] 8 torch.float32 \| 0.011 \| 7.647 \| 0.138 \| \| [] 16 torch.float32 \| 0.075 \| 7.582 \| 0.135 \| \| [] 32 torch.float32 \| 0.073 \| 7.573 \| 0.191 \| \| [] 64 torch.float32 \| 0.134 \| 7.694 \| 0.288 \| \| [] 128 torch.float32 \| 0.398 \| 8.073 \| 0.491 \| \| [] 256 torch.float32 \| 1.054 \| 11.860 \| 1.074 \| \| [] 512 torch.float32 \| 5.218 \| 14.130 \| 2.582 \| \| [] 1024 torch.float32 \| 19.010 \| 18.780 \| 6.936 \| \| [1] 2 torch.float32 \| 0.009 \| 0.113 \| 0.128 *regressed \| \| [1] 4 torch.float32 \| 0.009 \| 0.113 \| 0.131 regressed \| \| [1] 8 torch.float32 \| 0.011 \| 0.116 \| 0.129 regressed \| \| [1] 16 torch.float32 \| 
0.015 \| 0.122 \| 0.135 regressed \| \| [1] 32 torch.float32 \| 0.032 \| 0.177 \| 0.178 regressed \| \| [1] 64 torch.float32 \| 0.070 \| 0.420 \| 0.281 \| \| [1] 128 torch.float32 \| 0.328 \| 0.816 \| 0.490 \| \| [1] 256 torch.float32 \| 1.125 \| 1.690 \| 1.084 \| \| [1] 512 torch.float32 \| 4.344 \| 4.305 \| 2.576 \| \| [1] 1024 torch.float32 \| 16.510 \| 16.340 \| 6.928 \| \| [2] 2 torch.float32 \| 0.009 \| 0.113 \| 0.186 regressed \| \| [2] 4 torch.float32 \| 0.011 \| 0.115 \| 0.184 regressed \| \| [2] 8 torch.float32 \| 0.012 \| 0.114 \| 0.184 regressed \| \| [2] 16 torch.float32 \| 0.019 \| 0.119 \| 0.173 regressed \| \| [2] 32 torch.float32 \| 0.050 \| 0.170 \| 0.240 regressed \| \| [2] 64 torch.float32 \| 0.120 \| 0.429 \| 0.375 \| \| [2] 128 torch.float32 \| 0.576 \| 0.830 \| 0.675 \| \| [2] 256 torch.float32 \| 2.021 \| 1.748 \| 1.451 \| \| [2] 512 torch.float32 \| 9.070 \| 4.749 \| 3.539 \| \| [2] 1024 torch.float32 \| 33.655 \| 18.240 \| 12.220 \| \| [4] 2 torch.float32 \| 0.009 \| 0.112 \| 0.318 regressed \| \| [4] 4 torch.float32 \| 0.010 \| 0.115 \| 0.319 regressed \| \| [4] 8 torch.float32 \| 0.013 \| 0.115 \| 0.320 regressed \| \| [4] 16 torch.float32 \| 0.027 \| 0.120 \| 0.331 regressed \| \| [4] 32 torch.float32 \| 0.085 \| 0.173 \| 0.385 regressed \| \| [4] 64 torch.float32 \| 0.221 \| 0.431 \| 0.646 regressed \| \| [4] 128 torch.float32 \| 1.102 \| 0.834 \| 1.055 regressed \| \| [4] 256 torch.float32 \| 4.042 \| 1.811 \| 2.054 regressed \| \| [4] 512 torch.float32 \| 18.390 \| 4.884 \| 5.087 regressed \| \| [4] 1024 torch.float32 \| 69.025 \| 19.840 \| 20.000 *regressed \| </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/42403 Reviewed By: ailzhang, mruberry Differential Revision: D23717984 Pulled By: ngimel fbshipit-source-id: 54cbd9ea72a97989cff4127089938e8a8e29a72b	2020-09-18 20:43:29 -07:00
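The backend selection rule stated in d75c402755 can be written as a single predicate. This is a minimal sketch of the three listed conditions only; `uses_cusolver` and its parameters are hypothetical names, and the CUDA 9.2 and non-Nvidia exclusions mentioned in the commit are deliberately left out.

```python
def uses_cusolver(ndim, batch_size, magma_linked):
    """Return True when the commit's stated conditions select cusolver/cublas.

    ndim: number of tensor dimensions (2 means a single matrix).
    batch_size: number of matrices in the batch when ndim > 2.
    magma_linked: whether MAGMA is available as the fallback backend.
    """
    if not magma_linked:
        return True  # no MAGMA: cusolver/cublas is the only option
    if ndim == 2:
        return True  # single matrix
    return batch_size <= 2  # small batches only; larger ones stay on MAGMA
```

Under this reading, `uses_cusolver(3, 4, True)` is false, which is consistent with the `[4]` rows of the benchmark table being marked as regressed and left to MAGMA.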
vfdev	24df3b7373	torch.empty_like and torch.zeros_like raise error if any memory format is provided with sparse input (#43699 ) (#44058 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/43699 - Changed the order of `TORCH_CHECK` and `if (options.layout() == kSparse && self.is_sparse())` inside `empty_like` method. - [x] Added tests EDIT: More details on that and why we can not take zeros_like approach. Python code : ```python res = torch.zeros_like(input_coalesced, memory_format=torch.preserve_format) ``` is routed to ```c++ // TensorFactories.cpp Tensor zeros_like( const Tensor& self, const TensorOptions& options, c10::optional<c10::MemoryFormat> optional_memory_format) { if (options.layout() == kSparse && self.is_sparse()) { auto res = at::empty({0}, options); // to be resized res.sparse_resize_and_clear_( self.sizes(), self.sparse_dim(), self.dense_dim()); return res; } auto result = at::empty_like(self, options, optional_memory_format); return result.zero_(); } ``` and passed to `if (options.layout() == kSparse && self.is_sparse())` When we call in Python ```python res = torch.empty_like(input_coalesced, memory_format=torch.preserve_format) ``` it is routed to ```c++ Tensor empty_like( const Tensor& self, const TensorOptions& options_, c10::optional<c10::MemoryFormat> optional_memory_format) { TORCH_CHECK( !(options_.has_memory_format() && optional_memory_format.has_value()), "Cannot set memory_format both in TensorOptions and explicit argument; please delete " "the redundant setter."); TensorOptions options = self.options() .merge_in(options_) .merge_in(TensorOptions().memory_format(optional_memory_format)); TORCH_CHECK( !(options.layout() != kStrided && optional_memory_format.has_value()), "memory format option is only supported by strided tensors"); if (options.layout() == kSparse && self.is_sparse()) { auto result = at::empty({0}, options); // to be resized result.sparse_resize_and_clear_( self.sizes(), self.sparse_dim(), self.dense_dim()); return 
result; } ``` cc pearu Pull Request resolved: https://github.com/pytorch/pytorch/pull/44058 Reviewed By: albanD Differential Revision: D23672494 Pulled By: mruberry fbshipit-source-id: af232274dd2b516dd6e875fc986e3090fa285658	2020-09-17 10:25:31 -07:00
Mike Ruberry	686e281bcf	Updates div to perform true division (#42907 ) Summary: This PR: - updates div to perform true division - makes torch.true_divide an alias of torch.div This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907 Reviewed By: ngimel Differential Revision: D23622114 Pulled By: mruberry fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927	2020-09-14 15:50:38 -07:00
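The difference between the division modes named in 686e281bcf only shows up for non-integer quotients, and the rounding variants only disagree for negative ones. A minimal sketch in plain Python, with hypothetical helper names and scalars standing in for tensor elements:

```python
import math


def true_divide(a, b):
    """True division: keeps the fractional part."""
    return a / b


def floor_divide(a, b):
    """Rounds the quotient toward negative infinity."""
    return math.floor(a / b)


def trunc_divide(a, b):
    """Rounds the quotient toward zero (C-style integer division)."""
    return math.trunc(a / b)


results = {
    "true": true_divide(-7, 2),
    "floor": floor_divide(-7, 2),
    "trunc": trunc_divide(-7, 2),
}
```

For positive operands such as `7` and `2`, the floor and truncation variants agree, so code that relied on the old integer behaviour needs an explicit rounding choice after this change.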
vfdev	9f88bcb5a2	Minor typo fix (#42731 ) Summary: Just fixed a typo in test/test_sparse.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/42731 Reviewed By: ezyang Differential Revision: D22999930 Pulled By: mrshenli fbshipit-source-id: 1b5b21d7cb274bd172fb541b2761f727ba06302c	2020-08-07 11:17:51 -07:00
Nikita Shulga	aa4e91a6dc	Fix `TestSparse.test_bmm_windows_error` when CUDA is not available (#42626 ) Summary: Refactor common pattern of (torch.cuda.version and [int(x) for x in torch.cuda.version.split(".")] >= [a, b]) into `_get_torch_cuda_version()` function Pull Request resolved: https://github.com/pytorch/pytorch/pull/42626 Reviewed By: seemethere Differential Revision: D22956149 Pulled By: malfet fbshipit-source-id: 897c55965e53b477cd20f69e8da15d90489035de	2020-08-05 16:07:35 -07:00
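The version-check pattern that aa4e91a6dc factors out can be sketched as a small parser. This is an illustration only: `cuda_version_tuple` is a hypothetical name, not the body of `_get_torch_cuda_version()`, and returning `(0, 0)` when no version string is available is an assumption made here so comparisons stay well-defined.

```python
def cuda_version_tuple(version_string):
    """Parse a dotted version such as "11.0" into a tuple of ints.

    Returns (0, 0) when no version string is available (assumed convention).
    """
    if version_string is None:
        return (0, 0)
    return tuple(int(part) for part in version_string.split("."))


# Comparing parsed tuples orders versions numerically ...
new_enough = cuda_version_tuple("11.0") >= (10, 1)
too_old = cuda_version_tuple("9.2") >= (10, 1)
# ... whereas comparing the raw strings orders them lexicographically.
string_pitfall = "9.2" >= "10.1"
```

Parsing once into a tuple avoids repeating the split-and-convert expression at every call site and sidesteps the string-comparison pitfall shown on the last line.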
peter	b08347fd7b	Add CUDA 11 builds for Windows CI (#42420 ) Summary: Stacked on https://github.com/pytorch/pytorch/pull/42410. Pull Request resolved: https://github.com/pytorch/pytorch/pull/42420 Reviewed By: seemethere Differential Revision: D22917230 Pulled By: malfet fbshipit-source-id: 6ad394f7f8c430c587e0b0d9c5a5e7b7bcd85bfe	2020-08-05 09:40:33 -07:00
Kurt Mohler	206db5c127	Improve `torch.norm` functionality, errors, and tests (#41956 ) Summary: BC-Breaking Note: BC breaking changes in the case where keepdim=True. Before this change, when calling `torch.norm` with keepdim=True and p='fro' or p=number, leaving all other optional arguments as their default values, the keepdim argument would be ignored. Also, any time `torch.norm` was called with p='nuc', the result would have one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. After the change, for each of these cases, the result has the same number and order of dimensions as the input. PR Summary: * Fix keepdim behavior * Throw descriptive errors for unsupported sparse norm args * Increase unit test coverage for these cases and for complex inputs These changes were taken from part of PR https://github.com/pytorch/pytorch/issues/40924. That PR is not going to be merged because it overrides `torch.norm`'s interface, which we want to avoid. But these improvements are still useful. Issue https://github.com/pytorch/pytorch/issues/24802 Pull Request resolved: https://github.com/pytorch/pytorch/pull/41956 Reviewed By: albanD Differential Revision: D22837455 Pulled By: mruberry fbshipit-source-id: 509ecabfa63b93737996f48a58c7188b005b7217	2020-08-01 01:55:12 -07:00
Mike Ruberry	12cd083fd7	Updates torch.tensor, torch.as_tensor, and sparse ctors to use the device of input tensors they're given, by default (#41984 ) Summary: BC-Breaking Note This PR changes the behavior of the torch.tensor, torch.as_tensor, and sparse constructors. When given a tensor as input and a device is not explicitly specified, these constructors now always infer their device from the tensor. Historically, if the optional dtype kwarg was provided then these constructors would not infer their device from tensor inputs. Additionally, for the sparse ctor a runtime error is now thrown if the indices and values tensors are on different devices and the device kwarg is not specified. PR Summary This PR's functional change is a single line: ``` auto device = device_opt.has_value() ? device_opt : (type_inference ? var.device() : at::Device(computeDeviceType(dispatch_key))); ``` => ``` auto device = device_opt.has_value() ? device_opt : var.device(); ``` in `internal_new_from_data`. This line entangled whether the function was performing type inference with whether it inferred its device from an input tensor, and in practice meant that ``` t = torch.tensor((1, 2, 3), device='cuda') torch.tensor(t, dtype=torch.float64) ``` would return a tensor on the CPU, not the default CUDA device, while ``` t = torch.tensor((1, 2, 3), device='cuda') torch.tensor(t) ``` would return a tensor on the device of `t`! This behavior is niche and odd, but came up while aocsa was fixing https://github.com/pytorch/pytorch/issues/40648. An additional side effect of this change is that the indices and values tensors given to a sparse constructor must be on the same device, or the sparse ctor must specify the device kwarg. The tests in test_sparse.py have been updated to reflect this behavior. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41984 Reviewed By: ngimel Differential Revision: D22721426 Pulled By: mruberry fbshipit-source-id: 909645124837fcdf3d339d7db539367209eccd48	2020-07-25 02:49:45 -07:00
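The one-line change quoted in the entry above can be modeled in plain Python to make the old coupling visible. This is only a sketch: the function names are hypothetical, devices are represented as strings, and `default_device` stands in for `at::Device(computeDeviceType(dispatch_key))`. It assumes, as the message implies, that type inference is off when a dtype is passed.

```python
def infer_device_old(device_opt, type_inference, var_device, default_device="cpu"):
    # Old rule: the input tensor's device is used only while type
    # inference is on; otherwise the dispatch-key default wins.
    if device_opt is not None:
        return device_opt
    return var_device if type_inference else default_device


def infer_device_new(device_opt, var_device):
    # New rule: without an explicit device, always follow the input tensor.
    return device_opt if device_opt is not None else var_device


# Models torch.tensor(t, dtype=torch.float64) with t on 'cuda:0':
old_result = infer_device_old(None, False, "cuda:0")
new_result = infer_device_new(None, "cuda:0")
```

In this model `old_result` is the default device while `new_result` follows the input tensor, which mirrors the CPU-versus-CUDA surprise the message describes.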
Mike Ruberry	13120bf677	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21740237 Pulled By: mruberry fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042	2020-05-27 06:31:07 -07:00
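The "both or neither" rule for `atol` and `rtol` can be sketched as a standalone scalar check. The function name `assert_close`, the default tolerances, and the error types below are assumptions for illustration and are not taken from the test suite.

```python
def assert_close(actual, expected, *, atol=None, rtol=None, msg=None):
    """Hypothetical scalar comparison with keyword-only tolerances."""
    # Reject the ambiguous case where only one tolerance is given.
    if (atol is None) != (rtol is None):
        raise ValueError("specify both atol and rtol, or neither")
    if atol is None:
        atol, rtol = 1e-7, 1e-7  # assumed defaults, for illustration only
    if abs(actual - expected) > atol + rtol * abs(expected):
        raise AssertionError(msg or "%r != %r" % (actual, expected))


assert_close(1.0, 1.0 + 1e-9)                     # neither: defaults apply
assert_close(1.0, 1.05, atol=0.1, rtol=0.0)       # both: explicit tolerances
```

Making the tolerances and `msg` keyword-only means a stray third positional argument can no longer be silently read as `atol`.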
Rohan Varma	63e545e0fe	Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol Test Plan: revert-hammer Differential Revision: D21717199 Original commit changeset: 9feb856f94ee fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259	2020-05-26 18:23:59 -07:00
Mike Ruberry	6ddca30b2d	Updates assertEqual to require atol and rtol, removes positional atol (#38872 ) Summary: This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument. In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872 Differential Revision: D21717199 Pulled By: mruberry fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a	2020-05-26 08:30:23 -07:00
Pearu Peterson	48c0331e01	Sparse softmax support (CPU) (#36305 ) Summary: This PR implements softmax support for sparse tensors. The sparse softmax is related to dense softmax when the values of unspecified sparse tensor entries are taken to be `-inf` that will have the effect of "zero entries ignored". This relation is used for testing the correctness of results here. Resolves https://github.com/pytorch/pytorch/issues/23651 for CPU. - [x] sparse softmax - [x] CPU C++ implementation - [x] unittests - [x] update softmax documentation - [x] autograd support - [x] sparse log_softmax - [x] CPU C++ implementation - [x] unittests - [x] update log_softmax documentation - [x] autograd support Pull Request resolved: https://github.com/pytorch/pytorch/pull/36305 Differential Revision: D21566540 Pulled By: ezyang fbshipit-source-id: a632ea69c38622f960721482e442efeb8d0a54fc	2020-05-14 08:08:40 -07:00
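The relation between sparse and dense softmax described in the entry above can be checked on a single row in plain Python. This is a sketch of the stated relation only, not of the PR's C++ implementation; representing a sparse row as a dict from column index to value is an assumption made for brevity.

```python
import math


def dense_softmax(row):
    # Standard max-shifted softmax; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_softmax(entries):
    # Softmax over the specified entries only; unspecified ones stay absent.
    m = max(entries.values())
    exps = {i: math.exp(v - m) for i, v in entries.items()}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


entries = {0: 1.0, 3: 2.0}          # a sparse row of length 5
size = 5
dense_row = [entries.get(i, -math.inf) for i in range(size)]
dense_result = dense_softmax(dense_row)
sparse_result = sparse_softmax(entries)
```

Positions filled with `-inf` contribute zero to the normalizer, so the dense result matches the sparse one at specified positions and is zero elsewhere.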
Hong Xu	336e1ec592	Clean up error handling in is_nonzero and where in TensorCompare.cpp (#38150 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38150 Differential Revision: D21539736 Pulled By: ezyang fbshipit-source-id: e390c12f5948192a552d66dcd1bb89b2cb45f170	2020-05-13 20:19:40 -07:00
ashishfarmer	bcdff7eb67	Fix for tests on ROCm (#37616 ) Summary: This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427 1. `test_sparse_add_out_bfloat16` in test_sparse.py fixed to use updated `atol` argument instead of `prec` for `assertEqual` 2. The conversion of `flt_min` to `int64` is divergent on HIP compared to numpy. The change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py cc: ezyang jeffdaily Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616 Differential Revision: D21379876 Pulled By: ezyang fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc	2020-05-04 07:16:54 -07:00
Peter Bell	675b3fc834	Prevent unbounded growth of sparse tensor in add operation (#36030 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/34964 Sparse cuda add was implemented by just concatenating the indices and values for the tensor. If called repeatedly in a tight loop this will let `nnz` grow unbounded. In the worst case of `x.add_(x)` it grows exponentially. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36030 Differential Revision: D20873504 Pulled By: zou3519 fbshipit-source-id: d90ed8dda0c89571fb89e358757b5dde299513df	2020-05-01 12:05:15 -07:00
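The growth described above is easy to reproduce with a toy COO model in plain Python. This is an illustration of the problem and of one way to bound it (merging duplicate indices after each add); the source does not say which strategy the PR actually uses, and the helper names are hypothetical.

```python
def add_by_concat(a, b):
    # Toy COO add: just concatenate the (index, value) lists.
    return a + b


def coalesce(entries):
    # Merge entries that share an index by summing their values.
    merged = {}
    for idx, val in entries:
        merged[idx] = merged.get(idx, 0.0) + val
    return sorted(merged.items())


x = [((0, 1), 1.0), ((2, 0), 3.0)]
uncoalesced = list(x)
bounded = list(x)
for _ in range(10):
    uncoalesced = add_by_concat(uncoalesced, uncoalesced)  # models x.add_(x)
    bounded = coalesce(add_by_concat(bounded, bounded))

nnz_uncoalesced = len(uncoalesced)
nnz_bounded = len(bounded)
```

After ten self-additions the concatenated form holds 2 * 2**10 entries while the coalesced form still holds 2, and both represent the same tensor.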
ashishfarmer	bbd2350c99	Disable tests failing on test2 in ROCm CI (#37427 ) Summary: This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be examined and fixed one by one as soon as possible; until then they are disabled to unblock `test2`. The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles cc: ezyang jeffdaily Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427 Differential Revision: D21302909 Pulled By: ezyang fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616	2020-04-29 09:56:28 -07:00
David Reiss	e75fb4356b	Remove (most) Python 2 support from Python code (#35615 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615 Python 2 has reached end-of-life and is no longer supported by PyTorch. Now we can clean up a lot of cruft that we put in place to support it. These changes were all done manually, and I skipped anything that seemed like it would take more than a few seconds, so I think it makes sense to review it manually as well (though using side-by-side view and ignoring whitespace change might be helpful). Test Plan: CI Differential Revision: D20842886 Pulled By: dreiss fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed	2020-04-22 09:23:14 -07:00
Kurt Mohler	c7cf4c1bd6	Bmm sparse dense (#33430 ) Summary: Add sparse-dense BMM operation for CUDA and CPU. Closes https://github.com/pytorch/pytorch/issues/5672 Pull Request resolved: https://github.com/pytorch/pytorch/pull/33430 Differential Revision: D21017828 Pulled By: ezyang fbshipit-source-id: 5bf60efcb16d05c08c7a284accc04d8968f98752	2020-04-20 09:35:16 -07:00
rohithkrn	3e402a5940	[ROCm] Enable BFloat16 type for add_out_sparse (#35978 ) Summary: Enables bfloat16 type for add_out of sparse tensors. Also enabled it for coalesce() which is used in unit test reference checking. iotamudelta ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/35978 Differential Revision: D20874142 Pulled By: ezyang fbshipit-source-id: af8d2f4bc5f5cc3bb7f8cb1e3c688669ba3d13b9	2020-04-06 14:07:17 -07:00
shihongzhi	74ef0adf60	add mv operator to SparseTensor (#21782 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/21266 add mv operator to SparseTensor Pull Request resolved: https://github.com/pytorch/pytorch/pull/21782 Differential Revision: D20794372 Pulled By: ezyang fbshipit-source-id: 6b396357d512f7a5860da83e7976c33bf92cf974	2020-04-01 12:21:50 -07:00
Mike Ruberry	7c1ea736ba	Extends true_divide to be a method (#34794 ) Summary: Per title. See related https://github.com/pytorch/pytorch/pull/34570. In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases. New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794 Differential Revision: D20545507 Pulled By: mruberry fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5	2020-03-23 23:12:23 -07:00
Mike Ruberry	3b7e1cd2cc	Makes floor_divide a method, adds sparse floor division (#34552 ) Summary: (Updated per review feedback) `torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to: - have an out variant: `floor_divide(x, y, out=z)` - be a method on a tensor: `x.floor_divide(y)` - have an in-place variant: `x.floor_divide_(y)` - work with sparse tensors Tests are added to test_sparse.py and test_torch.py for these new behaviors. In addition, this PR: - cleans up the existing sparse division and true_division code and improves their error message - adds testing of sparse true_division to test_sparse.py - extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y). The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors. There are two potential follow-up issues suggested by this PR: - the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes - the test framework might benefit from a universal function test class. 
While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552 Differential Revision: D20509850 Pulled By: mruberry fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8	2020-03-18 15:00:53 -07:00
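The backwards-compatibility break in the entry above comes purely from renaming the first parameter, which plain Python functions can demonstrate. The two functions below are hypothetical stand-ins operating on integers; they are not the torch implementations and say nothing about torch's rounding behavior.

```python
def floor_divide_old(input, other):
    # Stand-in for the old signature, first parameter named `input`.
    return input // other


def floor_divide_new(self, other):
    # Stand-in for the new signature, first parameter named `self`.
    return self // other


# A keyword call written against the old name fails after the rename.
try:
    floor_divide_new(input=7, other=2)
    keyword_call_ok = True
except TypeError:
    keyword_call_ok = False

# Positional calls are unaffected by the rename.
positional_result = floor_divide_new(7, 2)
```

This is why the message recommends the positional form `torch.floor_divide(x, y)`: it works under both signatures.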
Mike Ruberry	a1eaaea288	Revert D20497453: [pytorch][PR] Makes floor_divide a method, adds sparse floor division Test Plan: revert-hammer Differential Revision: D20497453 Original commit changeset: ac326f2007d8 fbshipit-source-id: b94b89b1a25521506e3d0a6b072d3d4d8c55e63d	2020-03-18 01:48:50 -07:00
Mike Ruberry	b7129050e7	Makes floor_divide a method, adds sparse floor division (#34552 ) Summary: (Updated per review feedback) `torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to: - have an out variant: `floor_divide(x, y, out=z)` - be a method on a tensor: `x.floor_divide(y)` - have an in-place variant: `x.floor_divide_(y)` - work with sparse tensors Tests are added to test_sparse.py and test_torch.py for these new behaviors. In addition, this PR: - cleans up the existing sparse division and true_division code and improves their error message - adds testing of sparse true_division to test_sparse.py - extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y). The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors. There are two potential follow-up issues suggested by this PR: - the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes - the test framework might benefit from a universal function test class. 
While methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552 Differential Revision: D20497453 Pulled By: mruberry fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d	2020-03-18 00:01:45 -07:00
Terence Feng	2cf344be4c	Turn on exact_dtype by default on test_sparse.py (#34489 ) (#34542 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34542 Turn on exact_dtype by default on test_sparse.py (#34489) Pull Request resolved: #34489 Test Plan: ``` python test/test_sparse.py ``` Imported from OSS Differential Revision: D20369764 fbshipit-source-id: ade2434f77af8ae419bda653b4c46616c052a8b2	2020-03-10 12:52:09 -07:00
Xiao Wang	ccf6fab65e	Fix doc and type hints for "torch.add"; fix deprecated python calls in tests (#33935 ) Summary: This PR fixed documentation for `torch.add` with alpha. It also fixed these deprecated python calls `torch.add` and `torch.addmm` in tests, which may affect performance in test/test_sparse.py and test/test_nn.py. cc csarofeen ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/33935 Differential Revision: D20313320 Pulled By: ngimel fbshipit-source-id: fb08413d7e244865952e3fc0e1be7f1794ce4e9a	2020-03-06 15:53:58 -08:00
Mike Ruberry	8291e06f8f	Fixes cuda->numpy and non-strided->numpy segfaults (#33612 ) Summary: Addresses https://github.com/pytorch/pytorch/issues/33300. Calling .numpy() on a CUDA or non-strided (e.g. sparse) tensor segfaults in current PyTorch. This fixes the segfaults and throws the appropriate TypeError, as was intended. Two tests, one in test_cuda.py and the other in test_sparse.py, are added to verify the behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/33612 Differential Revision: D20038210 Pulled By: mruberry fbshipit-source-id: 265531dacd37c392232fd3ec763489a62ef54795	2020-02-21 22:23:08 -08:00
Hong Xu	a6a72ac68f	Fix all occurrences of C416. (#33429 ) Summary: C416: Unnecessary (list/set) comprehension - rewrite using list/set(). See https://pypi.org/project/flake8-comprehensions/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/33429 Differential Revision: D19972858 Pulled By: ezyang fbshipit-source-id: faac042a94c59d737bd5ae983121a0a029346e23	2020-02-21 08:32:22 -08:00
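For readers unfamiliar with C416, the rule flags comprehensions that only copy their iterable. A short illustration follows; the variable names are made up for the example.

```python
pairs = {"a": 1, "b": 2}

# Flagged by C416: the comprehension does nothing but copy the iterable.
flagged_list = [key for key in pairs]
flagged_set = {key for key in pairs}

# Equivalent rewrites using the constructors directly.
rewritten_list = list(pairs)
rewritten_set = set(pairs)

# Not flagged: the comprehension transforms or filters its elements.
doubled = [value * 2 for value in pairs.values()]
```

The rewritten forms produce the same containers, so the change is behavior-preserving.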
Pritam Damania	f050b16dd9	Move pytorch distributed tests to separate folder for contbuild. (#30445 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445 Create distributed and rpc directories under caffe/test for better management of unit tests. Differential Revision: D18702786 fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606	2020-01-22 21:16:59 -08:00
Gregory Chanan	866c1b1fcc	Ensure legacy sparse constructor/new doesn't interpret python data as tensor data. (#31490 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31490 When this happens, a dense tensor is constructed from a sparse constructor. Fixes: https://github.com/pytorch/pytorch/issues/16154 Test Plan: Imported from OSS Reviewed By: cpuhrsch, mrshenli Differential Revision: D19196498 Pulled By: gchanan fbshipit-source-id: 57a6324833e35f3e62318587ac74267077675b93	2019-12-26 10:46:18 -08:00
Michael Suo	62b10721fb	Actually make flake8 do something (#30892 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892 Fixes all outstanding lints and actually installs a properly configured flake8 Test Plan: Imported from OSS Differential Revision: D18862825 Pulled By: suo fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85	2019-12-06 17:50:50 -08:00
Prasun Anand	3cf8382984	detect_anomaly() for SparseTensors (#29803 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/28649 1. Modified detect_anomaly() to use isnan() 2. isnan() for SparseTensors returns a bool Tensor of _values. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29803 Differential Revision: D18594299 Pulled By: ezyang fbshipit-source-id: 3f4190c569f53219be330584fc604ca43c4a6c7a	2019-12-03 15:42:51 -08:00
Brian Vaughan	a5272cb643	Error instead of assertion failure for div by sparse (#30260 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30260 fixes: https://github.com/pytorch/pytorch/issues/30044 Without this PR, ``` >>> torch.tensor(1.) / torch.tensor(1.).to_sparse() Traceback (most recent call last): File "<stdin>", line 1, in <module> RuntimeError: r.is_sparse() INTERNAL ASSERT FAILED at /Users/distiller/project/conda/conda-bld/pytorch_1570710797334/work/aten/src/ATen/native/sparse/SparseTensorMath.cpp:168, please report a bug to PyTorch. ``` Test Plan: Ran the same code with this change: ``` In [1]: import torch In [2]: torch.tensor(1).to_sparse() / torch.tensor(1).to_sparse() --------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) <ipython-input-2-7177f54f30bb> in <module> ----> 1 torch.tensor(1).to_sparse() / torch.tensor(1).to_sparse() RuntimeError: Unsupported tensor layout ``` Differential Revision: D18657387 Pulled By: nairbv fbshipit-source-id: cd23570d46f5b26fd84049e5e63b61b19835603d	2019-11-22 11:31:26 -08:00
Mike Ruberry	f6bda1e07b	Removes @default_floating_dtype decorator (#27628 ) Summary: One fewer legacy decorator cluttering the test suite. Functions relying on this decorator were updated or, in the case of test_sparse, the test suite was put back on double by default. Note: this PR is blocked on https://github.com/pytorch/pytorch/issues/27599. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27628 Differential Revision: D17896254 Pulled By: mruberry fbshipit-source-id: 13d460301f50ef4af7a660372432108164c0de1f	2019-10-12 12:39:34 -07:00
Mike Ruberry	7f183a978f	Stops common_utils.py from setting the default tensor type (to torch.DoubleTensor) (#27444 ) Summary: This PR stops common_utils.py from setting the default tensor type when it's imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers. Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are: - test_autograd.py - test_distributions.py - test_jit.py - test_nn.py This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved away from relying on this global setting. Notable technical changes in this PR are: - Significant updates to test_torch.py to make it pass without setting the default floating dtype globally. - The default_floating_dtype decorator is now defined in common_utils, a couple versions of this decorator were defined in test files previously. - test_torch-specific parts of common_utils were refactored into test_torch. - tensor creation methods in common_utils were updated to accept an optional dtype and device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444 Differential Revision: D17795235 Pulled By: mruberry fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1	2019-10-08 09:52:44 -07:00
Pearu Peterson	b7fb2b8862	Implement pickle support for sparse tensors and torch.layout instances (#27062 ) Summary: Resolves issue https://github.com/pytorch/pytorch/issues/16667 and https://github.com/OpenMined/PySyft/issues/2326 Pull Request resolved: https://github.com/pytorch/pytorch/pull/27062 Differential Revision: D17762932 Pulled By: ezyang fbshipit-source-id: dd99c1f4ac8eb2286eb55aa20ce973f60ce7b7e1	2019-10-04 08:09:32 -07:00
Edward Yang	9b7011c5c2	Implement multiple dispatch (#26468 ) (#26501 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26501 Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id. XLA companion patch at https://github.com/pytorch/xla/pull/1031 Billing of changes: * ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there should have been something registered at some key, but there wasn't.) * Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments. * The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into 'this'. I think this may be duplicated with some logic somewhere else but I have to double check. 
The new generated code looks like this: ``` inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const { static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)"); return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(this, src))(const_cast<Tensor&>(this), src, non_blocking); } ``` The key difference is that previously we wrote `type_set()` as argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together. After turning on multi-dispatch, I had to refactor existing code which previously dispatched one place, but now dispatches somewhere else. The primary component affected by this is sparse. Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++. * One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. 
The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it. * A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with multiple dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message) * `addmm` is even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch. * `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity. * c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely. 
Benchmark: Apply the following patch to the base commit and this commit: ``` diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp new file mode 100644 index 0000000000..b66f4d3ece --- /dev/null +++ b/aten/src/ATen/native/Const.cpp @@ -0,0 +1,10 @@ +#include <ATen/ATen.h> + +namespace at { +namespace native { + +Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) { + return self; +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml index b494ed7950..fddae638bb 100644 --- a/aten/src/ATen/native/native_functions.yaml +++ b/aten/src/ATen/native/native_functions.yaml @@ -5878,3 +5878,9 @@ dispatch: CPU: im2col_backward_cpu CUDA: im2col_backward_cuda + +# For benchmarking +- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor + variants: function + dispatch: + CPU: _const5 ``` Comparisons with timeit: One-argument, representative case: Before: ``` In [6]: %timeit x.reshape(1, 1) 1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [7]: %timeit x.reshape(1, 1) 1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [8]: %timeit x.reshape(1, 1) 1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` After: ``` In [3]: %timeit x.reshape(1, 1) 1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit x.reshape(1, 1) 1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit x.reshape(1, 1) 1.42 µs ± 0.982 ns per loop (mean ± std. dev. 
of 7 runs, 1000000 loops each) ``` Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments): Before: ``` In [1]: import torch In [2]: x = torch.zeros(1) In [3]: %timeit torch._const5(x, x, x, x, x) 949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit torch._const5(x, x, x, x, x) 954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit torch._const5(x, x, x, x, x) 947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` After: ``` In [3]: %timeit torch._const5(x, x, x, x, x) 985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit torch._const5(x, x, x, x, x) 984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit torch._const5(x, x, x, x, x) 988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D17499154 Pulled By: ezyang fbshipit-source-id: 8ea237c2e935134b0f4f8d6cfd89c6a93037c02c	2019-09-20 10:12:04 -07:00
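The core idea of the entry above, taking the union of every tensor argument's type set before choosing a kernel, can be modeled in a few lines of plain Python. This is a toy sketch: the key names, the priority order (sparse ahead of dense), and the function names are assumptions for illustration and are not taken from the real dispatch tables.

```python
# Hypothetical priority order: earlier entries win.
PRIORITY = ["SparseCPU", "CPU"]


def highest_priority(type_set):
    for key in PRIORITY:
        if key in type_set:
            return key
    raise LookupError("no handler registered for %r" % (type_set,))


def first_arg_dispatch(*arg_type_sets):
    # Old behavior in this model: only the first argument is consulted.
    return highest_priority(arg_type_sets[0])


def multi_dispatch(*arg_type_sets):
    # New behavior in this model: union all arguments' type sets first.
    union = set().union(*arg_type_sets)
    return highest_priority(union)


dense, sparse = {"CPU"}, {"SparseCPU"}
old_choice = first_arg_dispatch(dense, sparse)   # models add(dense, sparse)
new_choice = multi_dispatch(dense, sparse)
```

In this model `add(dense, sparse)` moves from the dense kernel to the sparse one, which matches the message's note that sparse kernels must now handle a dense first argument. The union step touches every argument, consistent with the `O(n)` scaling discussed in the benchmark.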
Michael Suo	5304358859	Revert D17481256: Implement multiple dispatch Test Plan: revert-hammer Differential Revision: D17481256 Original commit changeset: b3206936b4ca fbshipit-source-id: a162c42168c17e24b5eaff83a7aae48beef3d2c2	2019-09-19 14:53:40 -07:00
Edward Yang	0705f759a3	Implement multiple dispatch (#26468 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26468 Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id. XLA companion patch at https://github.com/pytorch/xla/pull/1031 Billing of changes: * ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there should have been something registered at some key, but there wasn't.) * Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments. * The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into 'this'. I think this may be duplicated with some logic somewhere else but I have to double check. 
The new generated code looks like this: ``` inline Tensor & Tensor::copy_(const Tensor & src, bool non_blocking) const { static auto table = globalATenDispatch().getOpTable("aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)"); return table->getOp<Tensor & (Tensor &, const Tensor &, bool)>(at::detail::multi_dispatch_tensor_type_set(*this, src))(const_cast<Tensor&>(*this), src, non_blocking); } ``` The key difference is that previously we wrote `type_set()` as the argument to getOp; now it is a call to `multi_dispatch_tensor_type_set` which collects the type ids together. After turning on multi-dispatch, I had to refactor existing code which previously dispatched to one place, but now dispatches somewhere else. The primary component affected by this is sparse. Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++. * One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. 
The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it. * A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message) * `addmm` is a little bit even more special: the bridge code also handled broadcasting. I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch. * `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity. * c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely. 
Benchmark: Apply the following patch to the base commit and this commit: ``` diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp new file mode 100644 index 0000000000..b66f4d3ece --- /dev/null +++ b/aten/src/ATen/native/Const.cpp @@ -0,0 +1,10 @@ +#include <ATen/ATen.h> + +namespace at { +namespace native { + +Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) { + return self; +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml index b494ed7950..fddae638bb 100644 --- a/aten/src/ATen/native/native_functions.yaml +++ b/aten/src/ATen/native/native_functions.yaml @@ -5878,3 +5878,9 @@ dispatch: CPU: im2col_backward_cpu CUDA: im2col_backward_cuda + +# For benchmarking +- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor + variants: function + dispatch: + CPU: _const5 ``` Comparisons with timeit: One-argument, representative case: Before: ``` In [6]: %timeit x.reshape(1, 1) 1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [7]: %timeit x.reshape(1, 1) 1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [8]: %timeit x.reshape(1, 1) 1.52 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` After: ``` In [3]: %timeit x.reshape(1, 1) 1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit x.reshape(1, 1) 1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit x.reshape(1, 1) 1.42 µs ± 0.982 ns per loop (mean ± std. dev. 
of 7 runs, 1000000 loops each) ``` Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments): Before: ``` In [1]: import torch In [2]: x = torch.zeros(1) In [3]: %timeit torch._const5(x, x, x, x, x) 949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit torch._const5(x, x, x, x, x) 954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit torch._const5(x, x, x, x, x) 947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` After: ``` In [3]: %timeit torch._const5(x, x, x, x, x) 985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit torch._const5(x, x, x, x, x) 984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit torch._const5(x, x, x, x, x) 988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bddppq Differential Revision: D17481256 Pulled By: ezyang fbshipit-source-id: b3206936b4ca8938d45ea90fd71422e0d80b5f96	2019-09-19 14:29:38 -07:00
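The dispatch change described in this commit (union the type sets of every Tensor and TensorList argument, then compute one dispatch key) can be modelled in a few lines of plain Python. This is only a toy sketch and not the ATen implementation: `TypeId`, its priority ordering, the dict-based "tensors", `multi_dispatch_type_set`, and `dispatch` are all hypothetical names invented for illustration, and the "highest value wins" rule is an assumption of the sketch.

```python
from enum import IntEnum


class TypeId(IntEnum):
    # Hypothetical keys; in this sketch a larger value means higher priority.
    CPU = 0
    CUDA = 1
    SPARSE_CPU = 2
    SPARSE_CUDA = 3


def make_tensor(type_id):
    # A toy "tensor" is just a dict carrying its type id.
    return {"type_id": type_id}


def multi_dispatch_type_set(*args):
    """Union the type ids of all tensor and tensor-list arguments."""
    result = set()
    for arg in args:
        if isinstance(arg, dict):             # a single toy tensor
            result.add(arg["type_id"])
        elif isinstance(arg, (list, tuple)):  # a toy tensor list
            result.update(t["type_id"] for t in arg)
        # Non-tensor arguments (scalars, flags) do not take part in dispatch.
    return result


def dispatch(table, *args):
    """Pick the kernel registered for the highest-priority key in the union."""
    keys = multi_dispatch_type_set(*args)
    if not keys:
        raise TypeError("no tensor arguments to dispatch on")
    key = max(keys)
    if key not in table:
        registered = ", ".join(k.name for k in table)
        raise NotImplementedError(
            f"no kernel for {key.name}; registered keys: {registered}"
        )
    return table[key](*args)


# Toy dispatch table for a binary op: each kernel just reports which one ran.
ADD_TABLE = {
    TypeId.CPU: lambda a, b: "dense kernel",
    TypeId.SPARSE_CPU: lambda a, b: "sparse kernel",
}

dense = make_tensor(TypeId.CPU)
sparse = make_tensor(TypeId.SPARSE_CPU)
print(dispatch(ADD_TABLE, dense, sparse))  # sparse kernel runs even though arg 0 is dense
print(dispatch(ADD_TABLE, dense, dense))
```

Under first-argument-only dispatch, `add(dense, sparse)` would have landed in the dense kernel; with the union, the sparse kernel is selected, which is why the sparse kernels now need explicit checks for a dense first argument. The loop over arguments also makes the `O(n)` cost in the number of tensor arguments visible.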
Junjie Bai	07bd76988e	Revert D17265918: Implement multiple dispatch Test Plan: revert-hammer Differential Revision: D17265918 Original commit changeset: 221efe4e86a4 fbshipit-source-id: f0ab90fa1201080e0d62fd140faf0fcdfd56601b	2019-09-19 09:50:17 -07:00
Edward Yang	ece14ff473	Implement multiple dispatch (#25653 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25653 Instead of considering only the TensorTypeSet of the first argument, we collect all Tensor and TensorList arguments and union them together before computing the dispatch type id. Billing of changes: * ATenDispatch fallback code (i.e., what gets run if there is no entry for a function in the table) now lives out-of-line in a function `getFallbackOp`. This gave me an opportunity to write a more detailed error message, providing information about what registrations were available. There is a TODO in the fallback code, suggesting that we could automatically redispatch in the event that there is no handler for the key. But this is a bit of a design question, because it's not clear if automatic redispatch would cover up errors in the dispatch table (i.e., there should have been something registered at some key, but there wasn't.) * Collection of Tensor/TensorList arguments is done using the trusty old IterArgs helper class. A minor bit of refactoring I had to do to get here was move the IterArgs functionality in torch/csrc/utils/variadic.h into ATen/core. There's some refactoring due on that file too (it has copies of some C++ helper pieces which already live in c10--you can't actually move the whole thing because it is literally incompatible with other code in the codebase). So instead of calling `type_set()` to get the type set of the dispatch argument, now we just call `at::detail::multi_dispatch_tensor_type_set` on all of the tensor/tensor list arguments. * The code generator is adjusted to codegen collection of arguments as needed. There is a little bit of a hack in the code generator to turn 'self' arguments into 'this'. I think this may be duplicated with some logic somewhere else but I have to double check. 
After turning on multi-dispatch, I had to refactor existing code which previously dispatched to one place, but now dispatches somewhere else. The primary component affected by this is sparse. Binary operations (add/sub/mul/div/addmm) now dispatch to sparse kernels even if you did add(dense, sparse). So I delete all the sparse handling code from dense kernels, and bulk up the sparse error handling to handle when the first argument is dense. In the case of addmm, I can eliminate the bridge code entirely (well, not quite: more on this below). I also updated the dispatch on sparse to actually point at sparse kernels. Pay special attention to the handling of `div_` by scalar: previously this logic lived in the "dense" `div_` implementation, but there is actually not any sparse kernel we dispatch to. I solved this particular problem by making a redispatch, but another valid approach would have been to add specific dispatches for sparse div on scalar. This codepath is poorly tested because it is only exercised from C++. * One minor annoyance is that because I now want separate dispatch for dense and sparse, I also need to replicate the `add`, `add_`, `add_out` trifecta on the sparse side. I opted for a compromise here: I wrote a new `add_sparse` trifecta, but reused the implementation between CPU and CUDA. This means that I have to do another dispatch once I get to `add_out`. The alternative would have been to do twice as many copies for CPU and CUDA (thereby eliminating the extra dispatch) but that seemed distinctly not worth it. * A lot of kernels in sparse assumed that the dispatch argument must be sparse. This is no longer true with dispatch, so I converted the asserts into plain error checking. This also means that we've perturbed the error message in the case of TestSparseOneOff.test_cuda_sparse_cpu_dense_add (I just updated the saved error message) * `addmm` is a little bit even more special: the bridge code also handled broadcasting. 
I replicated the broadcasting logic between CPU and CUDA implementations to avoid an extra dispatch. * `_sparse_addmm` gave me a bit of trouble, because I had forgotten why we had `torch.sparse.addmm` in the first place. But in the end, its changes followed along with the structural changes I made in addmm. I opted for an extra dispatch here for simplicity. * c10d has some Variable-Tensor confusion in its sparse code. I've worked around it by judiciously inserting "no variable type" guards, but a more correct fix would be to just solve the confusion entirely. Benchmark: Apply the following patch to the base commit and this commit: ``` diff --git a/aten/src/ATen/native/Const.cpp b/aten/src/ATen/native/Const.cpp new file mode 100644 index 0000000000..b66f4d3ece --- /dev/null +++ b/aten/src/ATen/native/Const.cpp @@ -0,0 +1,10 @@ +#include <ATen/ATen.h> + +namespace at { +namespace native { + +Tensor _const5(const Tensor& self, const Tensor& second, const Tensor& third, const Tensor& fourth, const Tensor& fifth) { + return self; +} + +}} // namespace at::native diff --git a/aten/src/ATen/native/native_functions.yaml b/aten/src/ATen/native/native_functions.yaml index b494ed7950..fddae638bb 100644 --- a/aten/src/ATen/native/native_functions.yaml +++ b/aten/src/ATen/native/native_functions.yaml @@ -5878,3 +5878,9 @@ dispatch: CPU: im2col_backward_cpu CUDA: im2col_backward_cuda + +# For benchmarking +- func: _const5(Tensor self, Tensor second, Tensor third, Tensor fourth, Tensor fifth) -> Tensor + variants: function + dispatch: + CPU: _const5 ``` Comparisons with timeit: One-argument, representative case: Before: ``` In [6]: %timeit x.reshape(1, 1) 1.46 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [7]: %timeit x.reshape(1, 1) 1.48 µs ± 29.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [8]: %timeit x.reshape(1, 1) 1.52 µs ± 61.9 ns per loop (mean ± std. dev. 
of 7 runs, 1000000 loops each) ``` After: ``` In [3]: %timeit x.reshape(1, 1) 1.42 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit x.reshape(1, 1) 1.43 µs ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit x.reshape(1, 1) 1.42 µs ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` Five-argument, synthetic case (we expect, with enough Tensor arguments, for there to be a slowdown, as we scale `O(n)` with number of arguments, compared to old dispatcher which is `O(1)` with number of arguments): Before: ``` In [1]: import torch In [2]: x = torch.zeros(1) In [3]: %timeit torch._const5(x, x, x, x, x) 949 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit torch._const5(x, x, x, x, x) 954 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit torch._const5(x, x, x, x, x) 947 ns ± 0.601 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` After: ``` In [3]: %timeit torch._const5(x, x, x, x, x) 985 ns ± 9.11 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit torch._const5(x, x, x, x, x) 984 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [5]: %timeit torch._const5(x, x, x, x, x) 988 ns ± 0.555 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17265918 Pulled By: ezyang fbshipit-source-id: 221efe4e86a40f36abc81e2ebceaa7e251c90b3d	2019-09-19 09:30:40 -07:00
iotamudelta	4fe857187c	switch to rocThrust for thrust/cub APIs (#25620 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620 Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602 Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option. Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header. Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust. Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable. Skip four tests that fail with the new rocThrust for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864 Reviewed By: xw285cornell Differential Revision: D16940768 Pulled By: bddppq fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5	2019-09-03 22:16:30 -07:00
Pearu Peterson	f793a7c57e	Implement indexing methods for sparse tensors (#24937 ) Summary: Resolves https://github.com/pytorch/pytorch/issues/7416 . This PR implements the following indexing methods for sparse tensors: - [x] `select` - [x] `index_select` Note that this PR also modifies [gen.py](https://github.com/pytorch/pytorch/pull/24937/files#diff-76aa8cb3d0fad99c5f761d08cbcb4d19) that is not directly required to resolve the original issue but to work around a CI build issue reported in issue https://github.com/pytorch/pytorch/issues/24931 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/24937 Differential Revision: D17163796 Pulled By: ezyang fbshipit-source-id: 06613301ec456d9ed3491b9ce48e804048600f09	2019-09-03 09:31:03 -07:00
Will Feng	7b081e5d1e	Improve error message for changing tensor metadata after .data or .detach() (#23504 ) Summary: When a user tries to change metadata of a tensor created from `.data` or `.detach()`, we currently show the error message "<function_name> is not allowed on Tensor created from .data or .detach()". However, this error message doesn't suggest what the right fix should look like. This PR improves the error message. Closes https://github.com/pytorch/pytorch/issues/23393. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23504 Differential Revision: D16547415 Pulled By: yf225 fbshipit-source-id: 37f4a0385442e2b0966386fb14d3d938ecf4230c	2019-07-29 22:25:14 -07:00
Will Feng	e4c7f59fbc	Shallow-copy indices and values in sparse tensor ctor (#20614 ) Summary: (Reopens https://github.com/pytorch/pytorch/pull/20330 and fixes test error.) After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor. Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example: ```python # Calling resize_ on non-requires-grad value tensor i2 = torch.zeros([1, 1]) v2 = torch.ones([1, 2, 3]) t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3])) v2.resize_(4, 5) t2.coalesce().values().size() # On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor. # After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/20614 Differential Revision: D15385811 Pulled By: yf225 fbshipit-source-id: e963fcf5e4097f8c881b56145f408565d97cf5c1	2019-05-16 18:35:05 -07:00
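The invariant this commit protects can be illustrated without PyTorch. The sketch below is a toy model only: `ToyTensor`, `ToySparse`, and `shallow_copy` are hypothetical stand-ins, and the assumption is that a shallow copy shares the data buffer while giving the copy its own size metadata. It mirrors the `resize_` scenario from the commit message (one index, values of shape `[1, 2, 3]`).

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor: a shared buffer plus own metadata."""

    def __init__(self, storage, shape):
        self.storage = storage     # data buffer, shared between shallow copies
        self.shape = tuple(shape)  # size metadata, owned by this object

    def shallow_copy(self):
        # New object, same buffer, independent metadata.
        return ToyTensor(self.storage, self.shape)

    def resize_(self, *shape):
        # In-place metadata change; affects only this object.
        self.shape = tuple(shape)
        return self


class ToySparse:
    """Hypothetical sparse container that shallow-copies in its constructor."""

    def __init__(self, indices, values):
        if indices.shape[1] != values.shape[0]:
            raise ValueError("indices and values must have same nnz")
        # Without these copies, later resize_ calls on the caller's tensors
        # would silently break the nnz invariant checked above.
        self.indices_ = indices.shallow_copy()
        self.values_ = values.shallow_copy()

    def nnz(self):
        return self.values_.shape[0]


i2 = ToyTensor([0], (1, 1))
v2 = ToyTensor([1.0] * 6, (1, 2, 3))
t2 = ToySparse(i2, v2)

v2.resize_(4, 5)         # caller mutates its own tensor's metadata
print(t2.values_.shape)  # still the shape seen at construction time
print(t2.nnz())
```

The trade-off is the BC break the message calls out: code that relied on metadata changes propagating into the sparse tensor no longer sees them.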
Will Feng	2ddf126b96	Revert D15373683: [pytorch][PR] [BC-breaking] Shallow-copy indices and values in sparse tensor ctor Differential Revision: D15373683 Original commit changeset: 32e7275d7121 fbshipit-source-id: ed1786ee9ffa11f7c14c9cd10be6db48285dc57a	2019-05-16 15:22:48 -07:00
Will Feng	4f02321a9a	Shallow-copy indices and values in sparse tensor ctor (#20330 ) Summary: After the Variable/Tensor merge, there is no guarantee that `indices` and `values` passed into the sparse tensor constructor don't contain AutogradMeta. However, we want to maintain the existing invariant that `indices_` and `values_` of a sparse tensor don't contain AutogradMeta, and to achieve this we need to do a shallow copy in the sparse tensor constructor. Note that this is BC-breaking for code that changes the sizes / strides of the indices or values tensor after it's used to create a sparse tensor. In current master, such changes will be reflected in the sparse tensor and break sparse tensor invariants. After this PR, those changes will not be reflected in the sparse tensor, and thus the sparse tensor invariants are always preserved. Specifically, running in-place size/stride-changing ops such as `resize_` / `resize_as_` / `as_strided_` / `set_` / `transpose_` on the original values tensor will not update the sparse tensor's `values_`. For example: ```python # Calling resize_ on non-requires-grad value tensor i2 = torch.zeros([1, 1]) v2 = torch.ones([1, 2, 3]) t2 = torch.sparse_coo_tensor(i2, v2, torch.Size([2, 2, 3])) v2.resize_(4, 5) t2.coalesce().values().size() # On current master, this throws "indices and values must have same nnz, but got nnz from indices: 1, nnz from values: 4", because resizing the original value tensor affects `values_` of the sparse tensor. # After this PR, this prints "torch.Size([1, 2, 3])", which means resizing the original value tensor doesn't affect `values_` of the sparse tensor. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/20330 Differential Revision: D15373683 Pulled By: yf225 fbshipit-source-id: 32e7275d7121e17937c7cc258e8a60bb0848ff25	2019-05-16 15:04:23 -07:00
Brian Vaughan	d68802ba47	Sparse half embeddings on cuda (#19695 ) Summary: ``` import torch a = torch.nn.Embedding(3, 4, sparse=True).half().cuda() a(torch.LongTensor([1, 0]).cuda()).sum().backward() ``` gave: `RuntimeError: torch.cuda.sparse.HalfTensor is not enabled` This PR enables sparse.HalfTensor on cuda. Still won't work for CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19695 Differential Revision: D15281162 Pulled By: nairbv fbshipit-source-id: 0d83d946a059393bd53d8b8102e2daa9b4c02588	2019-05-10 08:00:55 -07:00
Johannes M Dieterich	5241e6ec5c	Fix sparse mm for ROCm (#18985 ) Summary: * Annotate also two pass reduction with launch bounds * ifdef some shortcomings of ROCm w.r.t. short-circuit returns - internal tickets filed * while there, plug memory leak by destroying matrix descriptor after the sparse call (applicable to cuSPARSE) * while there, fix types for cusparseXcoo2csr as per cuSPARSE documentation * enable test_dsmm in test_sparse which now passes Pull Request resolved: https://github.com/pytorch/pytorch/pull/18985 Differential Revision: D14822009 Pulled By: bddppq fbshipit-source-id: 757267a47a63ee56ef396c33059f7eca099f4833	2019-04-07 18:16:16 -07:00


364 Commits