Avoid casting low precision inputs to high precision for XPU Tensor in torch.linalg.vector_norm (#141954)

Fixes https://github.com/pytorch/pytorch/issues/141953

For mixed-precision cases, inputs on CPU are cast to `out_dtype` before the reduction, while inputs on CUDA devices are not, for computational efficiency: the kernel reads the low-precision input directly and only accumulates in the higher precision. Low-precision inputs on Intel XPU should likewise not be converted to high precision (same as CUDA).
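As a rough repro sketch (not from the PR; assumes a PyTorch build with CUDA or XPU support), this is the kind of call that exercises the affected path:

```python
import torch

# Hypothetical repro sketch: a low-precision input reduced into a float32
# result. On CPU the input is upcast to float32 before the reduction; on
# CUDA (and, with this change, XPU) the kernel reads the half input directly
# and only accumulates in float32.
device = "xpu" if torch.xpu.is_available() else "cuda"  # assumes a GPU build
x = torch.randn(1024, device=device, dtype=torch.half)
out = torch.linalg.vector_norm(x, ord=2, dtype=torch.float32)
print(out.dtype)  # torch.float32 on every device; only input handling differs
```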

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141954
Approved by: https://github.com/guangyey, https://github.com/ezyang
commit c0e1fc4919 (parent 75d57b04ec)
Author: chunhuanMeng
Date: 2024-12-04 06:44:17 +00:00
Committed by: PyTorch MergeBot


@@ -219,7 +219,7 @@ inline TensorIterator make_reduction(
   // not generalize this to common mismatched input/output types to avoid cross
   // product of templated kernel launches.
   const bool gpu_lowp_to_f32 = (
-      self.is_cuda() && (self.scalar_type() == kHalf || self.scalar_type() == kBFloat16) && out_dtype == kFloat);
+      (self.is_cuda() || self.is_xpu()) && (self.scalar_type() == kHalf || self.scalar_type() == kBFloat16) && out_dtype == kFloat);
   auto in_dtype = gpu_lowp_to_f32 ? self.scalar_type()
                                   : self.is_complex() ? c10::toComplexType(out_dtype)
                                                       : out_dtype;
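For readers less familiar with the ATen ternary chain, a minimal Python sketch of the same dtype-selection logic (helper names and the complex mapping are illustrative assumptions, not PyTorch source):

```python
import torch

# Illustrative rendering of the in_dtype selection in make_reduction.
_TO_COMPLEX = {torch.float32: torch.complex64, torch.float64: torch.complex128}

def pick_in_dtype(self: torch.Tensor, out_dtype: torch.dtype) -> torch.dtype:
    gpu_lowp_to_f32 = (
        self.device.type in ("cuda", "xpu")            # was: is_cuda() only
        and self.dtype in (torch.half, torch.bfloat16)
        and out_dtype == torch.float32
    )
    if gpu_lowp_to_f32:
        return self.dtype                # keep the half/bfloat16 input dtype
    if self.is_complex():
        return _TO_COMPLEX[out_dtype]    # mirrors c10::toComplexType(out_dtype)
    return out_dtype

# CPU inputs still follow the upcast path:
print(pick_in_dtype(torch.zeros(4, dtype=torch.half), torch.float32))  # torch.float32
```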