mirror of
https://github.com/zebrajr/pytorch.git
synced 2025-12-06 12:20:52 +01:00
## Summary
Update the CMake files and `RowwiseScaledMM.cu` so that they build for the SM10.0a architecture.
**NOTE**: performance optimization will be done in separate follow-up PRs.
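For orientation, an architecture name such as `10.0a` corresponds to nvcc's `sm_100a` / `compute_100a` targets. The sketch below illustrates that naming convention only. It is not the logic in PyTorch's CMake files, and the helper names `arch_to_gencode` and `arch_list_to_flags` are hypothetical.

```python
import re


def arch_to_gencode(arch: str) -> str:
    """Turn an arch name like '10.0a' into an nvcc -gencode flag (illustrative only)."""
    match = re.fullmatch(r"(\d+)\.(\d)([a-z]?)", arch.strip())
    if match is None:
        raise ValueError(f"unrecognized architecture name: {arch!r}")
    major, minor, suffix = match.groups()
    sm = f"{major}{minor}{suffix}"  # '10.0a' -> '100a'
    return f"-gencode arch=compute_{sm},code=sm_{sm}"


def arch_list_to_flags(arch_list: str) -> list[str]:
    """Split a semicolon- or space-separated arch list and convert each entry."""
    entries = [a for a in re.split(r"[;\s]+", arch_list.strip()) if a]
    return [arch_to_gencode(a) for a in entries]


print(arch_to_gencode("10.0a"))
```

Suffixes such as `+PTX` are deliberately not handled here, to keep the sketch short.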
## Steps to verify build
1. Access a devgpu/machine with B200 GPUs and verify that the B200s are visible with `nvidia-smi`
2. Install CUDA toolkit 12.8
   - e.g. see [NVIDIA docs](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Rocky&target_version=9&target_type=rpm_local)
3. Verify the CUDA toolkit installation
   - e.g. `nvcc --version` should include `... Cuda compilation tools, release 12.8 ... ` in its output
4. Set env var `TORCH_CUDA_ARCH_LIST=10.0a`
5. Build PyTorch from source with this PR ([steps](https://github.com/pytorch/pytorch#from-source))
6. Uninstall `pytorch-triton` with `pip uninstall pytorch-triton`
7. Build and install Triton from source: https://github.com/triton-lang/triton?tab=readme-ov-file#install-from-source
8. Run the tests shown in the test plan below
**NOTE**: performance optimization will be done in a separate PR. The goal of this PR is just to ensure it builds correctly.
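The toolkit and environment checks in the steps above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper names `parse_nvcc_release` and `check_environment` are hypothetical, and treating 12.8 as a minimum version (rather than an exact match) is an assumption.

```python
import os
import re
from typing import Mapping, Optional, Tuple


def parse_nvcc_release(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_environment(
    version_output: str,
    env: Optional[Mapping[str, str]] = None,
    required: Tuple[int, int] = (12, 8),  # assumption: 12.8 is treated as a minimum
    arch: str = "10.0a",
) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []

    release = parse_nvcc_release(version_output)
    if release is None:
        problems.append("no 'release X.Y' found in nvcc output")
    elif release < required:
        problems.append(f"CUDA toolkit {release} is older than {required}")

    arch_list = re.split(r"[;\s]+", env.get("TORCH_CUDA_ARCH_LIST", "").strip())
    if arch not in arch_list:
        problems.append(f"TORCH_CUDA_ARCH_LIST does not contain {arch}")
    return problems


# Typical use (requires nvcc on PATH), shown as a comment so the sketch stays side-effect free:
#   import subprocess
#   out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
#   print(check_environment(out) or "environment looks OK")
```

Returning a list of problems instead of raising on the first one makes it easier to see everything that needs fixing before starting a long source build.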
## Test plan
- `python test/distributed/tensor/test_matrix_ops.py -k scaled_mm`: OK
- `python test/test_matmul_cuda.py -k rowwise`: OK
- `python test/test_flop_counter.py -k scaled_mm`: OK
- `python test/inductor/test_aot_inductor.py -k fp8`: OK
- `python test/inductor/test_fp8.py`: OK
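To rerun the whole test plan in one go, a small driver along these lines may help. It is a sketch under assumptions: it must be run from the root of a PyTorch checkout, and `build_commands` and `run_all` are hypothetical helper names, not existing PyTorch tooling.

```python
import subprocess
import sys

# (test file, optional -k filter) pairs taken from the test plan above.
TEST_PLAN = [
    ("test/distributed/tensor/test_matrix_ops.py", "scaled_mm"),
    ("test/test_matmul_cuda.py", "rowwise"),
    ("test/test_flop_counter.py", "scaled_mm"),
    ("test/inductor/test_aot_inductor.py", "fp8"),
    ("test/inductor/test_fp8.py", None),
]


def build_commands(python=sys.executable):
    """Build one argv list per test-plan entry."""
    commands = []
    for path, keyword in TEST_PLAN:
        cmd = [python, path]
        if keyword is not None:
            cmd += ["-k", keyword]
        commands.append(cmd)
    return commands


def run_all(commands, runner=subprocess.run):
    """Run each command and map 'file [-k filter]' to True when it exits with code 0."""
    results = {}
    for cmd in commands:
        completed = runner(cmd)
        results[" ".join(cmd[1:])] = completed.returncode == 0
    return results
```

Calling `run_all(build_commands())` executes the five commands sequentially. The `runner` parameter exists only so the driver can be exercised without a GPU by passing a stub.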
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148274
Approved by: https://github.com/drisspg
| Name |
|---|
| External |
| Modules |
| Modules_CUDA_fix |
| public |
| Allowlist.cmake |
| BuildVariables.cmake |
| Caffe2Config.cmake.in |
| CheckAbi.cmake |
| cmake_uninstall.cmake.in |
| Codegen.cmake |
| DebugHelper.cmake |
| Dependencies.cmake |
| FlatBuffers.cmake |
| GoogleTestPatch.cmake |
| IncludeSource.cpp.in |
| iOS.cmake |
| Metal.cmake |
| MiscCheck.cmake |
| prioritized_text.txt |
| ProtoBuf.cmake |
| ProtoBufPatch.cmake |
| Summary.cmake |
| TorchConfig.cmake.in |
| TorchConfigVersion.cmake.in |
| VulkanCodegen.cmake |
| VulkanDependencies.cmake |