pytorch/torch/sparse
Pearu Peterson b76d1b79e6 Add scaling arguments to bsr_dense_addmm (#136104)
As in the title.

Tackles https://github.com/pytorch/ao/pull/821/files#r1759821413

The PR assumes that the existing tuning parameters remain good when the scaling arguments are used; verifying this is left as a follow-up task.
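
A minimal usage sketch (not taken from the PR), assuming the new scaling arguments follow the `torch.addmm` convention, i.e. `out = beta * input + alpha * (bsr @ dense)`, and that the kernel is reachable as `torch.sparse._triton_ops.bsr_dense_addmm`. Requires a CUDA device with triton available:

```python
import torch

if torch.cuda.is_available():
    from torch.sparse._triton_ops import bsr_dense_addmm

    m = k = n = 64
    blocksize = (16, 16)

    # BSR weight, dense activation, and an addend to be scaled by beta.
    bsr = torch.randn(m, k, device="cuda").to_sparse_bsr(blocksize)
    dense = torch.randn(k, n, device="cuda")
    inp = torch.randn(m, n, device="cuda")

    # Hypothetical call with the new scaling arguments.
    out = bsr_dense_addmm(inp, bsr, dense, alpha=0.5, beta=2.0)

    # Cross-check against the dense reference computation.
    ref = torch.addmm(inp, bsr.to_dense(), dense, alpha=0.5, beta=2.0)
    torch.testing.assert_close(out, ref, rtol=1e-3, atol=1e-3)
```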

Also, this PR redefines triton-contiguous tensors: the tensor's strides must not be larger than 1 (rather than being required to equal 1). This now allows zero strides, which previously triggered a `contiguous` call even though the underlying memory buffer was already contiguous; see the sketch below.
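
An illustration (not code from the PR) of how a zero-stride view can sit on top of a perfectly contiguous buffer, which is the case the relaxed definition is meant to admit:

```python
import torch

row = torch.arange(4.0)            # contiguous buffer, stride (1,)
t = row.unsqueeze(0).expand(3, 4)  # broadcast view, strides (0, 1)

print(t.stride())                  # (0, 1): stride 0 in the broadcast dimension
print(t.is_contiguous())           # False, although the backing storage is contiguous

# Under the old definition (strides required to equal 1), t.contiguous() was
# called and materialized a copy; the relaxed definition (strides <= 1)
# accepts t as-is, with no copy.
```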

Re: "a considerable slow-down occurs because tensor data is copied element-wise rather than chunk-wise" - this note should refer to a code (torch or triton?) that implements the element/chunk-wise copy so that we could verify that allowing zero strides indeed would not trigger element-wise copies. Atm, the performance increase in ViT-H benchmarks (that involve using 0 strides) is an evidence that allowing zero strides does not lead to slow-downs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136104
Approved by: https://github.com/cpuhrsch
2024-09-16 20:26:54 +00:00
__init__.py SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) 2024-08-22 07:57:30 +00:00
_semi_structured_conversions.py Enable UFMT on all of torch/sparse (#130545) 2024-07-15 22:35:52 +00:00
_semi_structured_ops.py [BE][Easy][19/19] enforce style for empty lines in import segments in torch/[o-z]*/ (#129771) 2024-08-01 17:07:14 +00:00
_triton_ops_meta.py Add scaling arguments to bsr_dense_addmm (#136104) 2024-09-16 20:26:54 +00:00
_triton_ops.py Add scaling arguments to bsr_dense_addmm (#136104) 2024-09-16 20:26:54 +00:00
semi_structured.py [BE]: Update mypy to 1.11.2 (#133816) 2024-09-16 19:44:11 +00:00