pytorch/torch/testing/_internal
Jeff Daily 6ede882c0b preferred blas library; cublaslt gemm implementation (#122106)
Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred BLAS library selector: cublas or cublaslt for CUDA, and hipblas or hipblaslt for ROCm, the latter via normal hipification of sources.

The default BLAS implementation remains cublas (hipblas on ROCm). To enable cublaslt or hipblaslt, set the environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (TORCH_BLAS_PREFER_HIPBLASLT=1 is accepted as an alias) or call `torch.backends.cuda.preferred_blas_library(backend="cublaslt")`; `backend="hipblaslt"` is accepted as an alias.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106
Approved by: https://github.com/lezcano
2024-04-22 15:38:22 +00:00
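
The selection rule described in the commit message can be sketched in plain Python. This is only an illustration of the stated behavior, not the PR's implementation: the helper name `resolve_preferred_blas` is hypothetical, and treating only the exact value `1` as "enabled" is an assumption taken from the examples in the message.

```python
import os
from typing import Mapping, Optional

# Environment variables named in the commit message; the second is an alias.
_PREFER_LT_VARS = ("TORCH_BLAS_PREFER_CUBLASLT", "TORCH_BLAS_PREFER_HIPBLASLT")


def resolve_preferred_blas(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend name implied by the environment (hypothetical helper).

    Assumption: only the literal value "1" enables the Lt backend.
    """
    env = os.environ if env is None else env
    for name in _PREFER_LT_VARS:
        if env.get(name) == "1":
            # "hipblaslt" is described as an alias, so one name covers both.
            return "cublaslt"
    # The default stays cublas (hipblas on ROCm).
    return "cublas"


if __name__ == "__main__":
    print(resolve_preferred_blas())
```

On a build that includes this change, the returned name could then be passed to the call quoted in the commit message, `torch.backends.cuda.preferred_blas_library(backend=...)`, to apply the same preference at runtime instead of through the environment.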
codegen
data
distributed [Profiler] Unify the device(CUDA, XPU, PrivateUse1) in torch profiler post processing (#123247) 2024-04-22 01:26:55 +00:00
generated
opinfo Test COW materialization in backward ops (#123593) 2024-04-09 22:31:50 +00:00
optests [opcheck] Stop doing test_aot_dispatch_static by default (#124495) 2024-04-19 21:57:22 +00:00
test_module
__init__.py
autocast_test_lists.py
autograd_function_db.py
check_kernel_launches.py
common_cuda.py skip various unit tests for Jetson (#122531) 2024-04-16 01:26:26 +00:00
common_device_type.py [BE]: TRY002 - Ban raising vanilla exceptions (#124570) 2024-04-21 22:26:40 +00:00
common_dist_composable.py
common_distributed.py [BE]: Update ruff to 0.4.1 (#124549) 2024-04-21 14:06:23 +00:00
common_dtype.py
common_fsdp.py [FSDP2] Added pre/post-all-gather extensions (subclass) (#122908) 2024-04-15 21:35:51 +00:00
common_jit.py
common_methods_invocations.py [ATen] Add CPU fp16 support for nll_loss and cross_entropy_loss (#123256) 2024-04-18 11:44:38 +00:00
common_mkldnn.py
common_modules.py [ATen] Add CPU fp16 support for nll_loss and cross_entropy_loss (#123256) 2024-04-18 11:44:38 +00:00
common_nn.py [BE]: Update ruff to 0.4.1 (#124549) 2024-04-21 14:06:23 +00:00
common_optimizers.py Enable dynamo rosenbrock sparse tests (#124542) 2024-04-20 05:54:41 +00:00
common_pruning.py
common_quantization.py [BE]: Update ruff to 0.4.1 (#124549) 2024-04-21 14:06:23 +00:00
common_quantized.py
common_subclass.py
common_utils.py preferred blas library; cublaslt gemm implementation (#122106) 2024-04-22 15:38:22 +00:00
composite_compliance.py
custom_op_db.py Change register_autograd to reflect ordering of setup_context and backward (#124403) 2024-04-19 17:56:30 +00:00
dist_utils.py [BE]: Update ruff to 0.4.1 (#124549) 2024-04-21 14:06:23 +00:00
dynamo_test_failures.py
hop_db.py [while_loop] add a simiple op_info test (#123814) 2024-04-11 19:59:04 +00:00
hypothesis_utils.py
inductor_utils.py [Inductor Intel GPU backend Upstream] Add Inductor Intel GPU backend. (#121895) 2024-04-05 09:05:11 +00:00
jit_metaprogramming_utils.py
jit_utils.py
logging_tensor.py
logging_utils.py
quantization_torch_package_models.py
static_module.py
torchbind_impls.py Support torchbind op dispatch in python (#123367) 2024-04-19 17:17:27 +00:00
triton_utils.py [Inductor Intel GPU backend Upstream] Generalize device-bias code in (#124249) 2024-04-18 03:54:31 +00:00
two_tensor.py