Avoid casting low precision inputs to high precision for XPU Tensor in torch.linalg.vector_norm (#141954)

Fixes https://github.com/pytorch/pytorch/issues/141953

For mixed-precision cases, inputs on CPU are cast to `out_dtype` before the reduction, while inputs on CUDA devices are not, for computational efficiency: the kernel reads the low-precision input directly and only accumulates in the higher precision. Low-precision inputs on Intel XPU should likewise not be converted to high precision (same as CUDA).
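As a rough repro sketch (not from the PR; assumes a PyTorch build with CUDA or XPU support), this is the kind of call that exercises the affected path:

```python
import torch

# Hypothetical repro sketch: a low-precision input reduced into a float32
# result. On CPU the input is upcast to float32 before the reduction; on
# CUDA (and, with this change, XPU) the kernel reads the half input directly
# and only accumulates in float32.
device = "xpu" if torch.xpu.is_available() else "cuda"  # assumes a GPU build
x = torch.randn(1024, device=device, dtype=torch.half)
out = torch.linalg.vector_norm(x, ord=2, dtype=torch.float32)
print(out.dtype)  # torch.float32 on every device; only input handling differs
```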

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141954
Approved by: https://github.com/guangyey, https://github.com/ezyang
commit c0e1fc4919 (parent 75d57b04ec)
Author: chunhuanMeng
Date: 2024-12-04 06:44:17 +00:00
Committed by: PyTorch MergeBot


@@ -219,7 +219,7 @@ inline TensorIterator make_reduction(
   // not generalize this to common mismatched input/output types to avoid cross
   // product of templated kernel launches.
   const bool gpu_lowp_to_f32 = (
-      self.is_cuda() && (self.scalar_type() == kHalf || self.scalar_type() == kBFloat16) && out_dtype == kFloat);
+      (self.is_cuda() || self.is_xpu()) && (self.scalar_type() == kHalf || self.scalar_type() == kBFloat16) && out_dtype == kFloat);
   auto in_dtype = gpu_lowp_to_f32 ? self.scalar_type()
                                   : self.is_complex() ? c10::toComplexType(out_dtype)
                                                       : out_dtype;
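For readers less familiar with the ATen ternary chain, a minimal Python sketch of the same dtype-selection logic (helper names and the complex mapping are illustrative assumptions, not PyTorch source):

```python
import torch

# Illustrative rendering of the in_dtype selection in make_reduction.
_TO_COMPLEX = {torch.float32: torch.complex64, torch.float64: torch.complex128}

def pick_in_dtype(self: torch.Tensor, out_dtype: torch.dtype) -> torch.dtype:
    gpu_lowp_to_f32 = (
        self.device.type in ("cuda", "xpu")            # was: is_cuda() only
        and self.dtype in (torch.half, torch.bfloat16)
        and out_dtype == torch.float32
    )
    if gpu_lowp_to_f32:
        return self.dtype                # keep the half/bfloat16 input dtype
    if self.is_complex():
        return _TO_COMPLEX[out_dtype]    # mirrors c10::toComplexType(out_dtype)
    return out_dtype

# CPU inputs still follow the upcast path:
print(pick_in_dtype(torch.zeros(4, dtype=torch.half), torch.float32))  # torch.float32
```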