Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00
As in the title. Tackles https://github.com/pytorch/ao/pull/821/files#r1759821413

The PR assumes that the existing tuning parameters are also good when scaling arguments are used. This needs to be verified as a follow-up task.

This PR also redefines triton-contiguous tensors: the tensor must have strides not larger than 1. Zero strides, which previously triggered a `contiguous` call even though the underlying memory buffer was contiguous, are now allowed.

Re: "a considerable slow-down occurs because tensor data is copied element-wise rather than chunk-wise": this note should point to the code (in torch or triton?) that implements the element-wise/chunk-wise copy, so that we can verify that allowing zero strides would indeed not trigger element-wise copies. For now, the performance increase in the ViT-H benchmarks (which involve zero strides) is evidence that allowing zero strides does not lead to slow-downs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136104
Approved by: https://github.com/cpuhrsch
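To make the stride discussion concrete, here is a minimal, hypothetical sketch (not the actual PyTorch/Triton implementation; the function names `contiguous_strides` and `is_triton_contiguous` are invented for illustration) of a contiguity predicate that accepts zero strides. A zero stride arises from broadcasting a dimension: the memory buffer itself is still contiguous, so forcing a `contiguous()` copy for such tensors is unnecessary.

```python
def contiguous_strides(shape):
    """Strides of a C-contiguous tensor with the given shape."""
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return list(reversed(strides))

def is_triton_contiguous(shape, strides):
    """Treat a tensor as triton-contiguous if, after ignoring
    zero-stride (broadcast) dimensions, the remaining strides
    describe a C-contiguous layout of the underlying buffer."""
    shape_nz = [d for d, s in zip(shape, strides) if s != 0]
    strides_nz = [s for s in strides if s != 0]
    return strides_nz == contiguous_strides(shape_nz)

# A plain contiguous (4, 3) tensor: strides (3, 1).
print(is_triton_contiguous((4, 3), (3, 1)))   # True
# A row broadcast along dim 0 (e.g. via expand): stride 0 in dim 0.
# Previously this would have triggered a copy; now it passes.
print(is_triton_contiguous((4, 3), (0, 1)))   # True
# A transposed view is genuinely non-contiguous and still fails.
print(is_triton_contiguous((3, 4), (1, 3)))   # False
```

Under this reading, the redefinition only widens the accepted set by the zero-stride cases; every layout that was contiguous before remains so.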
Files:

- __init__.py
- _semi_structured_conversions.py
- _semi_structured_ops.py
- _triton_ops_meta.py
- _triton_ops.py
- semi_structured.py