mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-06 12:20:52 +01:00)
Avoid casting low precision inputs to high precision for XPU Tensor in torch.linalg.vector_norm (#141954)
Fixes https://github.com/pytorch/pytorch/issues/141953

In mixed-precision cases, tensors on the cpu device are cast to `out_dtype`, while tensors on cuda devices are not, for computational efficiency. For Intel xpu tensors, low-precision inputs should likewise not be converted to high precision (same as cuda).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141954
Approved by: https://github.com/guangyey, https://github.com/ezyang
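The diff below changes only the device check, but the surrounding dtype-selection logic is what makes it matter. A minimal Python sketch of that selection (an assumption: `choose_in_dtype` and its string dtype/device names are hypothetical simplifications, not the actual PyTorch source, which is C++ inside `make_reduction`):

```python
def choose_in_dtype(device: str, scalar_type: str, out_dtype: str,
                    is_complex: bool = False) -> str:
    """Simplified model of the gpu_lowp_to_f32 selection in make_reduction.

    On cuda or (after this PR) xpu, half/bfloat16 inputs reducing to float
    keep their low-precision input dtype and the kernel accumulates in float;
    everywhere else the input is cast to out_dtype before the reduction.
    """
    gpu_lowp_to_f32 = (
        device in ("cuda", "xpu")
        and scalar_type in ("float16", "bfloat16")
        and out_dtype == "float32"
    )
    if gpu_lowp_to_f32:
        return scalar_type   # read low precision, accumulate in float
    if is_complex:
        # mirrors c10::toComplexType(out_dtype) for the common case
        return "complex64" if out_dtype == "float32" else "complex128"
    return out_dtype         # e.g. cpu: cast inputs up front

# cpu still casts; cuda keeps low precision; xpu now matches cuda
assert choose_in_dtype("cpu", "float16", "float32") == "float32"
assert choose_in_dtype("cuda", "float16", "float32") == "float16"
assert choose_in_dtype("xpu", "bfloat16", "float32") == "bfloat16"
```

Before this PR, the xpu case fell through to the cast-to-`out_dtype` branch, i.e. it behaved like cpu rather than cuda.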
This commit is contained in:
parent
75d57b04ec
commit
c0e1fc4919
@@ -219,7 +219,7 @@ inline TensorIterator make_reduction(
   // not generalize this to common mismatched input/output types to avoid cross
   // product of templated kernel launches.
   const bool gpu_lowp_to_f32 = (
-      self.is_cuda() && (self.scalar_type() == kHalf || self.scalar_type() == kBFloat16) && out_dtype == kFloat);
+      (self.is_cuda() || self.is_xpu()) && (self.scalar_type() == kHalf || self.scalar_type() == kBFloat16) && out_dtype == kFloat);
   auto in_dtype = gpu_lowp_to_f32 ? self.scalar_type()
       : self.is_complex() ? c10::toComplexType(out_dtype)
                           : out_dtype;