[Intel GPU] skip a cuda api call in amp to save some host overhead on xpu (#151111)

This can save ~0.2ms of host overhead on non-CUDA devices by skipping the call to `amp_definitely_not_available()`. It can improve small TorchBench models such as lennard_jones by ~10% on XPU, in both eager and inductor modes of the dynamo benchmarks.
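For illustration, a minimal sketch of the short-circuit reordering this change relies on; `expensive_cuda_probe` is a hypothetical stand-in for `amp_definitely_not_available()`, not a PyTorch API:

    import time

    def expensive_cuda_probe() -> bool:
        # Hypothetical stand-in for amp_definitely_not_available():
        # a check that touches the CUDA runtime and costs host time.
        time.sleep(0.0002)  # ~0.2 ms, mirroring the overhead cited above
        return True

    device = "xpu"

    # Before: the expensive probe runs even though device != "cuda".
    slow = expensive_cuda_probe() and device == "cuda"

    # After: `and` short-circuits, so on non-CUDA devices the cheap
    # string comparison fails first and the probe is never called.
    fast = device == "cuda" and expensive_cuda_probe()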

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151111
Approved by: https://github.com/soulitzer
commit b59f3d3ae0
parent 1c5619ef9c
Author: Zhang, Jianyi
Date: 2025-04-13 06:37:07 +00:00
Committed by: PyTorch MergeBot


@@ -260,8 +260,8 @@ class autocast:
         self._cache_enabled = torch.is_autocast_cache_enabled()
         if (
             enabled
-            and torch.cuda.amp.common.amp_definitely_not_available()
-            and self.device == "cuda"
+            and self.device == "cuda"
+            and torch.cuda.amp.common.amp_definitely_not_available()
         ):
             warnings.warn(
                 "User provided device_type of 'cuda', but CUDA is not available. Disabling"