pytorch/aten/src/ATen
fengqing.lu de92296bbb [Intel GPU] undo broadcast on zero stride tensor for SDPA (#151976)
Fix https://github.com/pytorch/pytorch/issues/152290.

The **hubert** model uses aten::expand to build its attention mask by broadcasting. PyTorch represents a broadcast dimension with strides[d]=0, which oneDNN does not support. This PR handles that case.
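
The stride convention described above can be illustrated with a small pure-Python sketch. It is a model of the idea only, not the code from this PR: the helper names `expand_strides` and `undo_broadcast` are hypothetical, and the choice to give a collapsed dimension a placeholder stride of 1 is an assumption made for the example.

```python
from typing import List, Sequence, Tuple


def expand_strides(sizes: Sequence[int], strides: Sequence[int],
                   target: Sequence[int]) -> List[int]:
    """Strides of a view broadcast to `target` (same rank only in this sketch).

    A dimension that grows from size 1 to size n gets stride 0, so every
    index along it maps to the same element in memory.
    """
    if not (len(sizes) == len(strides) == len(target)):
        raise ValueError("this sketch requires equal rank")
    out = []
    for size, stride, want in zip(sizes, strides, target):
        if size == want:
            out.append(stride)   # dimension unchanged
        elif size == 1:
            out.append(0)        # broadcast dimension
        else:
            raise ValueError(f"cannot expand size {size} to {want}")
    return out


def undo_broadcast(sizes: Sequence[int],
                   strides: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Collapse every stride-0 dimension back to size 1.

    A size-1 dimension is never stepped over, so its stride does not affect
    addressing; 1 is used here as a placeholder (an assumption of this sketch).
    """
    new_sizes, new_strides = [], []
    for size, stride in zip(sizes, strides):
        if stride == 0 and size > 1:
            new_sizes.append(1)
            new_strides.append(1)
        else:
            new_sizes.append(size)
            new_strides.append(stride)
    return new_sizes, new_strides


# A (1, 1, 1, 4) mask broadcast to (2, 8, 16, 4), then collapsed again.
mask_strides = expand_strides((1, 1, 1, 4), (4, 4, 4, 1), (2, 8, 16, 4))
collapsed = undo_broadcast((2, 8, 16, 4), mask_strides)
print(mask_strides)  # [0, 0, 0, 1]
print(collapsed)     # ([1, 1, 1, 4], [1, 1, 1, 1])
```

A backend that rejects zero strides could then be handed the collapsed shape and left to apply its own broadcasting over the size-1 dimensions.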

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151976
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/drisspg
2025-05-14 16:09:03 +00:00
benchmarks
core Revert "Make device check error message more descriptive (#150750)" 2025-05-06 06:42:08 +00:00
cpu Format all headers under ATen/cpu/vec, not just top-level (#152364) 2025-05-09 18:46:07 +00:00
cuda Revert "refine fp32 precision api (#125888)" 2025-05-11 00:35:46 +00:00
cudnn Fix xrefs (#151888) 2025-04-25 21:27:27 +00:00
detail Enable -Wunused on torch targets (#150077) 2025-05-02 07:14:19 +00:00
functorch [dynamo] Guard serialization for FUNCTORCH_STACK_MATCH (#152616) 2025-05-05 18:05:56 +00:00
hip/impl Fix torch.accelerator api abort when passing invaild device (#143550) 2024-12-23 03:44:22 +00:00
metal Remove deprecated alias macro(1/3) (#137556) 2024-10-21 17:32:32 +00:00
miopen [ROCm] Implemented dropout usage for RNN with MIOpen backend (#144572) 2025-04-25 21:06:45 +00:00
mkl Fix xrefs (#151888) 2025-04-25 21:27:27 +00:00
mps [MPS] col2im kernel implementation (#152282) 2025-04-28 03:48:41 +00:00
native [Intel GPU] undo broadcast on zero stride tensor for SDPA (#151976) 2025-05-14 16:09:03 +00:00
nnapi Fix cppcoreguidelines-init-variables ignorance (#141795) 2025-01-28 17:11:37 +00:00
ops
quantized Enable modernize-use-default-member-init (#149046) 2025-04-09 11:57:24 +00:00
templates Remove unnecessary __STDC_FORMAT_MACROS macro (#152513) 2025-05-02 05:06:44 +00:00
test Set CMake 3.5 as minimum version in pytorch_android (#152769) 2025-05-04 16:57:22 +00:00
vulkan Remove deprecated alias macro(1/3) (#137556) 2024-10-21 17:32:32 +00:00
xpu Deprecate host allocator legacy APIs (#151437) 2025-04-22 03:13:24 +00:00
.gitignore
AccumulateType.cpp Get accumulate dtype for Intel GPU (#134465) 2024-08-29 05:27:57 +00:00
AccumulateType.h [3/N] Fix Wextra-semi warnings (#139165) 2024-10-30 02:08:13 +00:00
ArrayRef.h
ATen.h
ATenConfig.cmake.in
autocast_mode.cpp [MAIA] [Autocast] Enable autocast on MAIA device (#148511) 2025-03-18 03:46:22 +00:00
autocast_mode.h [BE] Replace std::runtime_error with TORCH_CHECK [1/N] (#151880) 2025-04-23 11:14:35 +00:00
Backend.h
Backtrace.h
BlasBackend.h [ROCm] change preferred blas lib defaults (#150212) 2025-03-29 03:33:07 +00:00
CachedTensorUtils.cpp [18/N] Fix extra warnings brought by clang-tidy-17 (#144014) 2025-01-08 17:21:55 +00:00
CachedTensorUtils.h
ceil_div.h
CMakeLists.txt [ROCm][Windows] Include AOTriton dependent sources in Windows build (#150521) 2025-04-08 16:18:15 +00:00
code_template.h [3/N] Fix cppcoreguidelines-special-member-functions warnings (#138796) 2024-10-28 10:53:11 +00:00
CollapseDims.h
Config.h.in Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) 2025-01-23 18:50:59 +00:00
ConjugateFallback.cpp
Context.cpp Revert "refine fp32 precision api (#125888)" 2025-05-11 00:35:46 +00:00
Context.h Revert "refine fp32 precision api (#125888)" 2025-05-11 00:35:46 +00:00
cpp_custom_type_hack.h
CPUApplyUtils.h [9/N] Fix extra warnings brought by clang-tidy-17 (#139286) 2024-10-31 05:20:31 +00:00
CPUFixedAllocator.h [9/N] Fix extra warnings brought by clang-tidy-17 (#139286) 2024-10-31 05:20:31 +00:00
CPUGeneratorImpl.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
CPUGeneratorImpl.h
Device.h
DeviceAccelerator.cpp Add torch.accelerator.device_index as accelerator's device switch context (#148864) 2025-04-25 09:45:25 +00:00
DeviceAccelerator.h Add torch.accelerator.device_index as accelerator's device switch context (#148864) 2025-04-25 09:45:25 +00:00
DeviceGuard.h Remove unneeded std::make_optional (#141567) 2024-11-28 00:05:21 +00:00
Dimname.h
DimVector.h
Dispatch_v2.h add the torch.float8_e8m0fnu dtype to PyTorch (#147466) 2025-02-20 13:55:42 +00:00
Dispatch.cpp
Dispatch.h Use std::string_view (#145906) 2025-01-30 03:14:27 +00:00
div_rtn.h
DLConvertor.cpp add torch.float4_e2m1fn_x2 to PyTorch (#148791) 2025-03-27 17:32:20 +00:00
DLConvertor.h Remove const fromDLPack overload (#139156) 2024-11-01 04:12:46 +00:00
dlpack.h [5/N] Fix extra warnings brought by clang-tidy-17 (#138403) 2024-10-21 02:59:54 +00:00
DynamicLibrary.cpp
DynamicLibrary.h Enable cppcoreguidelines-special-member-functions (#139132) 2024-11-06 13:42:20 +00:00
EmptyTensor.cpp [dynamic shapes] guard_or_false for computeStorageNbytes (#150483) 2025-05-09 19:31:19 +00:00
EmptyTensor.h
ExpandBase.h
ExpandUtils.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
ExpandUtils.h Remove deprecated alias macro(1/3) (#137556) 2024-10-21 17:32:32 +00:00
Formatting.h
FunctionalInverses.cpp [9/N] Fix extra warnings brought by clang-tidy-17 (#139286) 2024-10-31 05:20:31 +00:00
FunctionalizeFallbackKernel.cpp Revert "Make functionalization ViewMeta serializable with pickle. (#143712)" 2025-01-17 00:52:50 +00:00
FunctionalStorageImpl.cpp Revert "Make functionalization ViewMeta serializable with pickle. (#143712)" 2025-01-17 00:52:50 +00:00
FunctionalStorageImpl.h Revert "Make functionalization ViewMeta serializable with pickle. (#143712)" 2025-01-17 00:52:50 +00:00
FunctionalTensorWrapper.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
FunctionalTensorWrapper.h Revert "Make functionalization ViewMeta serializable with pickle. (#143712)" 2025-01-17 00:52:50 +00:00
FuncTorchTLS.cpp
FuncTorchTLS.h
Generator.h
InferSize.h convert guard_size_oblivious to runtime check in infer_size_impl (#148872) 2025-05-13 00:32:28 +00:00
InitialTensorOptions.h
jit_macros.h
jiterator_macros.h
Layout.h
LegacyBatchedFallback.cpp [9/N] Fix extra warnings brought by clang-tidy-17 (#139286) 2024-10-31 05:20:31 +00:00
LegacyBatchedFallback.h
LegacyBatchedTensorImpl.cpp [3/N] Avoid copy in std::get (#141843) 2024-12-06 20:13:36 +00:00
LegacyBatchedTensorImpl.h [3/N] Avoid copy in std::get (#141843) 2024-12-06 20:13:36 +00:00
LegacyBatchingRegistrations.cpp c10::string_view -> std::string_view in aten (#141903) 2024-12-07 23:23:52 +00:00
LegacyVmapMode.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
LegacyVmapMode.h
LegacyVmapTransforms.cpp [9/N] Fix extra warnings brought by clang-tidy-17 (#139286) 2024-10-31 05:20:31 +00:00
LegacyVmapTransforms.h
LinalgBackend.h
MapAllocator.cpp Bump Clang-tidy to 19.1.4 (#148648) 2025-03-10 17:32:30 +00:00
MapAllocator.h Use std::string_view (#145906) 2025-01-30 03:14:27 +00:00
MatrixRef.h [17/N] Fix extra warnings brought by clang-tidy-17 (#143804) 2024-12-25 19:54:42 +00:00
MemoryOverlap.cpp Bail on checking internal overlap when dealing with unbacked symints (#145385) 2025-01-23 22:31:31 +00:00
MemoryOverlap.h
NamedTensor.h
NamedTensorUtils.cpp [2/N] Apply bugprone-unchecked-optional-access (#141091) 2024-12-09 19:30:19 +00:00
NamedTensorUtils.h [12/N] Fix clang-tidy warnings in aten/src/ATen (#133425) 2024-08-18 11:03:55 +00:00
NestedTensorImpl.cpp [HPU] Add HPU as a supported device for NestedTensor (#148659) 2025-04-14 03:42:34 +00:00
NestedTensorImpl.h
NumericUtils.h
OpaqueTensorImpl.h Allow OpaqueTensorImpl to be used for views (#151028) 2025-04-11 20:07:47 +00:00
OpMathType.h
PadNd.h Use std::string_view (#145906) 2025-01-30 03:14:27 +00:00
Parallel-inl.h
Parallel.h [7/N] Fix extra warnings brought by clang-tidy-17 (#138972) 2024-10-26 19:09:47 +00:00
ParallelCommon.cpp [BE] Only print MKL version on x86 platforms (#143763) 2024-12-24 02:04:26 +00:00
ParallelFuture.h [7/N] Fix extra warnings brought by clang-tidy-17 (#138972) 2024-10-26 19:09:47 +00:00
ParallelNative.cpp [19/N] Fix extra warnings brought by clang-tidy-17 (#144448) 2025-01-09 15:58:05 +00:00
ParallelNative.h [12/N] Fix clang-tidy warnings in aten/src/ATen (#133425) 2024-08-18 11:03:55 +00:00
ParallelOpenMP.cpp [2/N] Use internal linkage in aten C++ files (#151070) 2025-04-14 16:07:17 +00:00
ParallelOpenMP.h
ParallelThreadPoolNative.cpp [5/N] Fix Wextra-semi warning (#139465) 2024-11-03 20:40:50 +00:00
PTThreadPool.h
PythonTorchFunctionTLS.cpp Add C API to return all torch function disablement status (#133136) 2024-08-20 07:15:04 +00:00
PythonTorchFunctionTLS.h Add C API to return all torch function disablement status (#133136) 2024-08-20 07:15:04 +00:00
record_function.cpp Add overload names to profiler trace (#143114) 2025-03-05 01:00:29 +00:00
record_function.h Add overload names to profiler trace (#143114) 2025-03-05 01:00:29 +00:00
ROCmFABackend.h [ROCm] CK Flash Attention Backend (#143695) 2025-01-03 22:01:36 +00:00
SavedTensorHooks.cpp [1/N] Apply bugprone-unchecked-optional-access (#140679) 2024-11-20 04:04:41 +00:00
SavedTensorHooks.h Update SavedTensorHooks TLS stack to use SafePyObject (#131700) 2024-08-02 16:27:16 +00:00
Scalar.h
ScalarOps.cpp [redo] Fp8 support for item() with cuda, index_select, and fill_ cpu (#137341) 2024-10-07 00:58:51 +00:00
ScalarOps.h
ScalarType.h
SDPBackend.h [16/N] Fix extra warnings brought by clang-tidy-17 (#143714) 2024-12-24 03:29:38 +00:00
SequenceNumber.cpp
SequenceNumber.h
SmallVector.h
SparseCsrTensorImpl.cpp [Intel GPU] Support SparseCsrXPU codegen (#144722) 2025-02-16 03:16:12 +00:00
SparseCsrTensorImpl.h
SparseCsrTensorUtils.h Enable cppcoreguidelines-special-member-functions (#139132) 2024-11-06 13:42:20 +00:00
SparseTensorImpl.cpp Remove deprecated alias macro(1/3) (#137556) 2024-10-21 17:32:32 +00:00
SparseTensorImpl.h
Storage.h
StorageUtils.cpp Enable more readability-redundant checks (#143963) 2024-12-30 14:49:33 +00:00
StorageUtils.h
Tensor.h
TensorAccessor.h
TensorGeometry.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
TensorGeometry.h functional compiled autograd (#144707) 2025-01-27 05:20:56 +00:00
TensorIndexing.cpp
TensorIndexing.h Support C++ statically_known_true (#151346) 2025-04-18 06:42:12 +00:00
TensorIterator.cpp [2/N] Apply bugprone-unchecked-optional-access (#141091) 2024-12-09 19:30:19 +00:00
TensorIterator.h [2/N] Apply bugprone-unchecked-optional-access (#141091) 2024-12-09 19:30:19 +00:00
TensorIteratorInternal.h
TensorMeta.cpp
TensorMeta.h
TensorNames.cpp
TensorNames.h Fix Wextra-semi warnings (#139000) 2024-10-28 21:48:51 +00:00
TensorOperators.h
TensorOptions.h
TensorSubclassLikeUtils.h
TensorUtils.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
TensorUtils.h
ThreadLocalPythonObjects.cpp
ThreadLocalPythonObjects.h
ThreadLocalState.cpp [9/N] Fix extra warnings brought by clang-tidy-17 (#139286) 2024-10-31 05:20:31 +00:00
ThreadLocalState.h Enable cppcoreguidelines-special-member-functions (#139132) 2024-11-06 13:42:20 +00:00
TracerMode.h
TypeDefault.h Remove ConstQuantizerPtr in torchgen (#142375) 2024-12-10 02:37:01 +00:00
Utils.cpp [12/N] Fix clang-tidy warnings in aten/src/ATen (#133425) 2024-08-18 11:03:55 +00:00
Utils.h Remove deprecated alias macro(1/3) (#137556) 2024-10-21 17:32:32 +00:00
Version.cpp Extend vec backend with BF16 SVE intrinsics (#143666) 2025-04-28 18:25:44 +00:00
Version.h
VmapModeRegistrations.cpp Enable misc-use-internal-linkage check and apply fixes (#148948) 2025-03-12 14:22:56 +00:00
WrapDimUtils.h [10/N] Fix extra warnings brought by clang-tidy-17 (#139385) 2024-11-04 00:47:19 +00:00
WrapDimUtilsMulti.h
ZeroTensorFallback.cpp Revert "Avoid some dangling reference warnings (#132535)" 2024-10-17 16:23:36 +00:00