pytorch/test/expect
Jesse Cai 4471fe6c39 [sparse][semi-structured] add alg_id to _cslt_sparse_mm and _cslt_sparse_mm_search (#115178)
Summary:

cuSPARSELt supports several algorithm IDs (alg_id), which are set via
`cusparseLtMatmulAlgSetAttribute`. In total there are 4 different
alg_ids, 0 - 3.

Previously we were just using the default alg_id, as our initial
experiments showed that for most shapes the default alg_id is the
fastest and that the choice of alg_id affects only performance, not
numerical correctness. In those experiments the fastest alg_id seemed
to differ only on small matmul shapes.

danthe3rd found a performance regression in cuSPARSELt v0.5.0 relative
to v0.4.0 on LLM shapes, which match these characteristics
(activations are small, weights are large).

However, it's likely that this is due to the alg_id ordering changing,
as mentioned in the release notes for v0.5.0:
```
cusparseLtMatmulAlgSelectionInit() does not ensure the same ordering of
algorithm id alg as in v0.4.0.
```

This PR adds in the following:
- support for passing in alg_id to _cslt_sparse_mm
- a new op, _cslt_sparse_mm_search, which returns the optimal alg_id for
  a given matmul

_cslt_sparse_mm_search has the same function signature as
_cslt_sparse_mm, minus the alg_id parameter.
We are able to achieve v0.4.0 performance with alg_id=1 on the shapes
that daniel provided.
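
As a rough illustration of what searching for the fastest alg_id involves, here is a minimal pure-Python sketch. It is not the implementation of _cslt_sparse_mm_search; the helper names `time_call` and `pick_best_alg_id` are hypothetical, and the only detail taken from the description above is that alg_ids range over 0 - 3.

```python
import time
from typing import Callable, Iterable


def time_call(run: Callable[[int], object], alg_id: int, repeats: int = 5) -> float:
    """Best-of-N wall-clock time (seconds) for run(alg_id), after one warm-up call.

    `run` is a hypothetical callable standing in for a matmul executed
    with the given alg_id.
    """
    run(alg_id)  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(alg_id)
        best = min(best, time.perf_counter() - start)
    return best


def pick_best_alg_id(
    measure: Callable[[int], float],
    alg_ids: Iterable[int] = range(4),  # assumption: alg_ids 0 - 3, as stated above
) -> int:
    """Return the alg_id with the lowest measured cost (first one wins on ties)."""
    return min(alg_ids, key=measure)
```

One way to combine the two would be `pick_best_alg_id(lambda i: time_call(run, i))`, where `run` wraps the matmul for a fixed pair of operands. On a GPU the timing would also need a device synchronization before each clock read, which this sketch leaves out.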

We will address autoselecting the best alg_id in a future PR, possibly
with torch.compile.

Test Plan:
```
python test/test_sparse_semi_structured.py -k cslt
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115178
Approved by: https://github.com/cpuhrsch
2023-12-11 23:08:51 +00:00
..
__init__.py
HasDecompTest.test_aten_core_operators.expect [inductor] Added decomposition for upsample_nearest_exact Nd (#113749) 2023-11-21 13:03:47 +00:00
HasDecompTest.test_has_decomposition.expect [sparse][semi-structured] add alg_id to _cslt_sparse_mm and _cslt_sparse_mm_search (#115178) 2023-12-11 23:08:51 +00:00
TestAutograd.test_function-x_grad_desc.expect
TestAutograd.test_function-y_grad_desc.expect
TestFXAPIBackwardCompatibility.test_class_member_back_compat-fx_backcompat_class_members.expect [fx] Add a faster method for inserting positional argument. (#111974) 2023-10-26 02:30:42 +00:00
TestFXAPIBackwardCompatibility.test_function_back_compat-fx_backcompat_function_signatures.expect [export] make UnflattenedModule not inherit from GraphModule (#115408) 2023-12-11 22:15:21 +00:00
TestJit.test_cu_escaped_number.expect
TestJit.test_import_method.expect
TestJit.test_non_ascii_string.expect
TestJit.test_pretty_printer-empty_float_list_test.expect
TestJit.test_pretty_printer-empty_int_list_test.expect
TestJit.test_pretty_printer-if_one.expect
TestJit.test_pretty_printer-if_test.expect
TestJit.test_pretty_printer-loop_use_test.expect
TestJit.test_pretty_printer-print_weird_test.expect
TestJit.test_pretty_printer-python_op_name_test.expect
TestJit.test_pretty_printer-while_if_test.expect
TestJit.test_pretty_printer-while_test.expect
TestPytorchExportModes.test_aten_fallback.expect
TestPytorchExportModes.test_onnx_aten.expect
TestScript.test_annot_ast_mypy_fn.expect
TestScript.test_annot_ast_mypy_method.expect
TestScript.test_annot_ast_py3_fn.expect
TestScript.test_annot_ast_py3_method.expect
TestScript.test_annot_string_mypy_fn.expect
TestScript.test_annot_string_mypy_method.expect
TestScript.test_annot_string_py3_fn.expect
TestScript.test_annot_string_py3_method.expect
TestScript.test_annotated_script_fn.expect
TestScript.test_annotated_script_method.expect
TestScript.test_format-stdout.expect
TestScript.test_listconstruct_erasure.expect
TestScript.test_parser_type_annotations_comment.expect
TestScript.test_parser_type_annotations.expect
TestScript.test_print-stdout.expect Remove set_default_dtype calls from jit and ops tests (#105072) 2023-07-15 03:18:33 +00:00
TestScript.test_python_frontend_py2.expect
TestScript.test_python_frontend_py3.expect
TestScript.test_python_frontend.expect
TestScript.test_string_print-stdout.expect
TestScript.test_torch_dot_tensor_annotation.expect
TestSparseCompressedCPU.test_print_SparseBSC_cpu.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCPU.test_print_SparseBSR_cpu.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCPU.test_print_SparseCSC_cpu.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCPU.test_print_SparseCSR_cpu.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCUDA.test_print_SparseBSC_cuda.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCUDA.test_print_SparseBSR_cuda.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCUDA.test_print_SparseCSC_cuda.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCompressedCUDA.test_print_SparseCSR_cuda.expect Generator of tensor inputs with variable layout and structure (batch/non-batch, hybrid/non-hybrid, block/non-block) (#88914) 2022-11-30 02:13:33 +00:00
TestSparseCPU.test_print_coalesced_cpu_float64.expect
TestSparseCPU.test_print_uncoalesced_cpu_float64.expect
TestSparseCUDA.test_print_coalesced_cuda_float64.expect
TestSparseCUDA.test_print_uncoalesced_cuda_float64.expect
TestTensorBoard.test_audio.expect
TestTensorBoard.test_caffe2_simple_cnnmodel.expect
TestTensorBoard.test_caffe2_simple_model.expect
TestTensorBoard.test_histogram_auto.expect
TestTensorBoard.test_histogram_doane.expect
TestTensorBoard.test_histogram_fd.expect
TestTensorBoard.test_hparams_bool.expect
TestTensorBoard.test_hparams_number.expect
TestTensorBoard.test_hparams_string.expect
TestTensorBoard.test_image_with_3_channel_batched.expect Avoid overflow in tensorboard image summary (#90423) 2022-12-08 08:31:52 +00:00
TestTensorBoard.test_image_with_boxes.expect Avoid overflow in tensorboard image summary (#90423) 2022-12-08 08:31:52 +00:00
TestTensorBoard.test_image_with_one_channel_batched.expect Avoid overflow in tensorboard image summary (#90423) 2022-12-08 08:31:52 +00:00
TestTensorBoard.test_image_with_one_channel.expect Avoid overflow in tensorboard image summary (#90423) 2022-12-08 08:31:52 +00:00
TestTensorBoard.test_image_without_channel.expect Avoid overflow in tensorboard image summary (#90423) 2022-12-08 08:31:52 +00:00
TestTensorBoard.test_mesh.expect
TestTensorBoard.test_nested_nn_squential.expect
TestTensorBoard.test_pr_curve_raw.expect
TestTensorBoard.test_pr_curve.expect
TestTensorBoard.test_pytorch_graph.expect
TestTensorBoard.test_scalar_new_style.expect
TestTensorBoard.test_text.expect
TestTensorBoard.test_video.expect
TestTorch.test_is_nonzero-empty.expect
TestTorch.test_is_nonzero-multiple.expect
TestTorch.test_print-non_contiguous.expect