pytorch/cmake
Daniel Vega-Myhre ac99fc7e57 Updates to build rowwise scaled mm kernel on SM10.0a (#148274)
## Summary
Update cmake files and RowwiseScaledMM.cu to build on SM10.0a arch.

**NOTE**: performance optimization will be done in separate follow up PRs

## Steps to verify build
1. Access devgpu/machine with B200 GPUs, verify B200s are visible w/ `nvidia-smi`
2. Install CUDA tookit 12.8
    - e.g. see [Nvidia docs](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Rocky&target_version=9&target_type=rpm_local)
3. Verify CUDA toolkit installation
    - e.g. `nvcc --version` should have `... Cuda compilation tools, release 12.8 ... ` in output
4. Set env var `TORCH_CUDA_ARCH_LIST=10.0a`
4. Build pytorch from source with this PR ([steps](https://github.com/pytorch/pytorch#from-source))
5. Uninstall `pytorch-triton` with `pip uninstall pytorch-triton`
6. Build and install triton from source: https://github.com/triton-lang/triton?tab=readme-ov-file#install-from-source
7. Run tests shown in test plan below

**NOTE**: performance optimization will be done in a separate PR. The goal of this PR is just to ensure it builds correctly.

## Test plan
- `python test/distributed/tensor/test_matrix_ops.py  -k scaled_mm`: OK
- `python test/test_matmul_cuda.py -k rowwise`: OK
- `python test/test_flop_counter.py -k scaled_mm`: OK
- `python test/inductor/test_aot_inductor.py -k fp8`: OK
- `python test/inductor/test_fp8.py`: OK

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148274
Approved by: https://github.com/drisspg
2025-03-04 05:23:41 +00:00
..
External Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073) 2025-02-19 03:52:26 +00:00
Modules [Intel GPU] Decompule Intel GPU oneDNN from other backends (#147926) 2025-02-28 07:42:06 +00:00
Modules_CUDA_fix [NVIDIA] Full Family Blackwell Support codegen (#145436) 2025-01-24 04:36:00 +00:00
public Add cmake hints to USE_SYSTEM_NVTX for nvtx3 include dir (#147418) 2025-02-26 20:52:28 +00:00
Allowlist.cmake
BuildVariables.cmake
Caffe2Config.cmake.in
CheckAbi.cmake
cmake_uninstall.cmake.in
Codegen.cmake Updates to build rowwise scaled mm kernel on SM10.0a (#148274) 2025-03-04 05:23:41 +00:00
DebugHelper.cmake
Dependencies.cmake [ROCm] OCP FP8 Support for new GPUs (#146632) 2025-02-24 22:47:52 +00:00
FlatBuffers.cmake
GoogleTestPatch.cmake
IncludeSource.cpp.in
iOS.cmake
Metal.cmake [MPS] Support includes in metal objects (#145087) 2025-01-18 05:35:22 +00:00
MiscCheck.cmake Add SVE implementation of embedding_lookup_idx (#133995) 2024-10-15 18:52:44 +00:00
prioritized_text.txt
ProtoBuf.cmake
ProtoBufPatch.cmake
Summary.cmake Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
TorchConfig.cmake.in Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
TorchConfigVersion.cmake.in
VulkanCodegen.cmake
VulkanDependencies.cmake