[ROCm] Enable inductor GEMM lowering for gfx11 (#141687)

The SM-count check in is_big_gpu does not make sense for some AMD GPUs. They have enough CUs and perform adequately, but on RDNA multi_processor_count returns the number of WGPs rather than CUs, so the reported count can fall below the threshold. Many tests fail on modern architectures because this check makes them default to not using the GEMM backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141687
Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
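
For illustration only, the decision this change introduces can be sketched as a pure function that needs no GPU. The name is_big_gpu_sketch and its parameters are hypothetical and not part of PyTorch: is_hip stands in for torch.version.hip being set, and major and sm_count stand in for the major and multi_processor_count fields of the device properties. The sketch assumes that on ROCm the major value follows the gfx generation (for example 11 for gfx11), and that the CUDA path, whose tail is not visible in the hunk, returns False below the threshold and True otherwise.

```python
def is_big_gpu_sketch(is_hip: bool, major: int, sm_count: int, min_sms: int = 68) -> bool:
    """Hypothetical standalone mirror of the gating logic in this commit."""
    if is_hip:
        # ROCm path: the SM count is ignored. Architectures with major < 9,
        # and major == 10, are skipped (an arbitrary cut per the diff comment).
        return not (major < 9 or major == 10)
    # CUDA path: keep the SM-count threshold (68, annotated "3080" in the source).
    # Assumption: the part of the function not shown returns True otherwise.
    return sm_count >= min_sms


if __name__ == "__main__":
    # A ROCm device with major == 11 passes even with a low reported count.
    print(is_big_gpu_sketch(is_hip=True, major=11, sm_count=48))
    # A ROCm device with major == 10 is rejected regardless of the count.
    print(is_big_gpu_sketch(is_hip=True, major=10, sm_count=80))
    # A CUDA device below the threshold is rejected.
    print(is_big_gpu_sketch(is_hip=False, major=8, sm_count=46))
```

Taking the device fields as plain arguments keeps the sketch testable on machines without a GPU. The real function reads them from torch.cuda.get_device_properties(index) and logs a warning before returning False.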
2025-12-06 12:20:52 +01:00 · 2024-12-02 22:13:31 +00:00 · 2024-12-02 22:13:31 +00:00 · 5c2584a14c
commit 5c2584a14c
parent 1f3d8896bc
1 changed file with 11 additions and 1 deletion
--- a/torch/_inductor/utils.py
+++ b/torch/_inductor/utils.py
@@ -1114,8 +1114,18 @@ class DelayReplaceLine(DeferredLineBase):

 @functools.lru_cache(None)
 def is_big_gpu(index) -> bool:
+    prop = torch.cuda.get_device_properties(index)
+
+    # SM logic is not relevant to ROCm gpus
+    # Arbitrarily skipping the older models
+    if torch.version.hip:
+        if prop.major < 9 or prop.major == 10:
+            log.warning("GPU arch does not support max_autotune_gemm mode usage")
+            return False
+        return True
+
     min_sms = 68  # 3080
-    avail_sms = torch.cuda.get_device_properties(index).multi_processor_count
+    avail_sms = prop.multi_processor_count
     if avail_sms < min_sms:
         log.warning(
             "Not enough SMs to use max_autotune_gemm mode",