[Intel GPU] skip a cuda api call in amp to save some host overhead on xpu (#151111)

This can save ~0.2ms of host overhead on non-CUDA devices by skipping the call to `amp_definitely_not_available()`. It can improve small TorchBench models such as lennard_jones by ~10% on XPU, in both eager and inductor modes of the dynamo benchmarks.
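For illustration, a minimal sketch of the short-circuit reordering this change relies on; `expensive_cuda_probe` is a hypothetical stand-in for `amp_definitely_not_available()`, not a PyTorch API:

    import time

    def expensive_cuda_probe() -> bool:
        # Hypothetical stand-in for amp_definitely_not_available():
        # a check that touches the CUDA runtime and costs host time.
        time.sleep(0.0002)  # ~0.2 ms, mirroring the overhead cited above
        return True

    device = "xpu"

    # Before: the expensive probe runs even though device != "cuda".
    slow = expensive_cuda_probe() and device == "cuda"

    # After: `and` short-circuits, so on non-CUDA devices the cheap
    # string comparison fails first and the probe is never called.
    fast = device == "cuda" and expensive_cuda_probe()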

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151111
Approved by: https://github.com/soulitzer
commit b59f3d3ae0
parent 1c5619ef9c
Author: Zhang, Jianyi
Date: 2025-04-13 06:37:07 +00:00
Committed by: PyTorch MergeBot


@@ -260,8 +260,8 @@ class autocast:
         self._cache_enabled = torch.is_autocast_cache_enabled()
         if (
             enabled
-            and torch.cuda.amp.common.amp_definitely_not_available()
-            and self.device == "cuda"
+            and self.device == "cuda"
+            and torch.cuda.amp.common.amp_definitely_not_available()
         ):
             warnings.warn(
                 "User provided device_type of 'cuda', but CUDA is not available. Disabling"