pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Daniel Vega-Myhre ac99fc7e57 Updates to build rowwise scaled mm kernel on SM10.0a (#148274)	2025-03-04 05:23:41 +00:00

## Summary

Update cmake files and RowwiseScaledMM.cu to build on the SM10.0a arch.

NOTE: performance optimization will be done in separate follow-up PRs. The goal of this PR is only to ensure the kernel builds correctly.

## Steps to verify build

1. Access a devgpu/machine with B200 GPUs and verify the B200s are visible with `nvidia-smi`.
2. Install CUDA toolkit 12.8 - e.g. see [Nvidia docs](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Rocky&target_version=9&target_type=rpm_local).
3. Verify the CUDA toolkit installation - e.g. `nvcc --version` should have `... Cuda compilation tools, release 12.8 ... ` in its output.
4. Set env var `TORCH_CUDA_ARCH_LIST=10.0a`.
5. Build pytorch from source with this PR ([steps](https://github.com/pytorch/pytorch#from-source)).
6. Uninstall `pytorch-triton` with `pip uninstall pytorch-triton`.
7. Build and install triton from source: https://github.com/triton-lang/triton?tab=readme-ov-file#install-from-source
8. Run the tests shown in the test plan below.

## Test plan

- `python test/distributed/tensor/test_matrix_ops.py -k scaled_mm`: OK
- `python test/test_matmul_cuda.py -k rowwise`: OK
- `python test/test_flop_counter.py -k scaled_mm`: OK
- `python test/inductor/test_aot_inductor.py -k fp8`: OK
- `python test/inductor/test_fp8.py`: OK

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148274
Approved by: https://github.com/drisspg
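
The toolkit and environment checks from the verification steps (`nvcc --version` reporting release 12.8, and `TORCH_CUDA_ARCH_LIST` containing `10.0a`) could be automated before starting a long source build. The following is a minimal sketch, not part of the PR: the helper names `parse_nvcc_release`, `arch_list_has`, and `check_environment` are hypothetical, and treating 12.8 as a minimum version (rather than an exact match) is an assumption.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(version_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def arch_list_has(arch_list, wanted="10.0a"):
    """Check whether an arch list string contains the wanted entry.

    Assumes entries are separated by semicolons and/or whitespace.
    """
    return wanted in re.split(r"[;\s]+", arch_list.strip())


def check_environment(min_release=(12, 8)):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []

    nvcc = shutil.which("nvcc")
    if nvcc is None:
        problems.append("nvcc not found on PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        release = parse_nvcc_release(result.stdout)
        # Assumption: any release >= 12.8 is acceptable.
        if release is None or release < min_release:
            problems.append(f"unexpected CUDA release: {release}")

    if not arch_list_has(os.environ.get("TORCH_CUDA_ARCH_LIST", "")):
        problems.append("TORCH_CUDA_ARCH_LIST does not include 10.0a")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Keeping the parsing in pure functions means the checks can be exercised without a GPU or a CUDA installation; only `check_environment` touches the local machine.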
External	Nccl update to 2.25.1 for cuda 12.4-12.8 (#146073)	2025-02-19 03:52:26 +00:00
Modules	[Intel GPU] Decouple Intel GPU oneDNN from other backends (#147926)	2025-02-28 07:42:06 +00:00
Modules_CUDA_fix	[NVIDIA] Full Family Blackwell Support codegen (#145436)	2025-01-24 04:36:00 +00:00
public	Add cmake hints to USE_SYSTEM_NVTX for nvtx3 include dir (#147418)	2025-02-26 20:52:28 +00:00
Allowlist.cmake
BuildVariables.cmake
Caffe2Config.cmake.in
CheckAbi.cmake
cmake_uninstall.cmake.in
Codegen.cmake	Updates to build rowwise scaled mm kernel on SM10.0a (#148274)	2025-03-04 05:23:41 +00:00
DebugHelper.cmake
Dependencies.cmake	[ROCm] OCP FP8 Support for new GPUs (#146632)	2025-02-24 22:47:52 +00:00
FlatBuffers.cmake
GoogleTestPatch.cmake
IncludeSource.cpp.in
iOS.cmake
Metal.cmake	[MPS] Support includes in metal objects (#145087)	2025-01-18 05:35:22 +00:00
MiscCheck.cmake	Add SVE implementation of embedding_lookup_idx (#133995)	2024-10-15 18:52:44 +00:00
prioritized_text.txt
ProtoBuf.cmake
ProtoBufPatch.cmake
Summary.cmake	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505)	2025-01-23 18:50:59 +00:00
TorchConfig.cmake.in	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505)	2025-01-23 18:50:59 +00:00
TorchConfigVersion.cmake.in
VulkanCodegen.cmake
VulkanDependencies.cmake