pytorch/test
Jiang, Yanbing f2f25a5444 Upgrade submodule oneDNN to v3.7.1 (#148293)
This PR upgrades the oneDNN submodule to v3.7.1.

## Improvements

- Improved performance of convolution and matmul primitives on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Improved performance of int8 and fp32 forward convolution primitive on processors with Intel AVX2 instruction set support.
- Improved performance of fp8 matmul primitives with bf16 and fp16 bias data type on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids).
- Introduced initial optimizations for Intel GPUs based on Xe3 architecture.
- Added bfloat16 support for SDPA and implemented fp16 and bf16 gemm kernels in SDPA.
- Fixed the following issues: f16 matmul accuracy, SDPA failing to dispatch to ukernel, bf16/fp16/fp32 conv performance, an INT8 kernel triggering a page fault, deconvolution precision on complex128 and fp64, and gemm correctness in float16.
- Improved bf16 matmul performance with fp32 destination with Arm Compute Library (ACL).
- Improved bf16 to fp32 reorder performance.
- Improved bf16 reorder performance.
- Improved bf16 convolution with ACL.

Fixes https://github.com/pytorch/pytorch/issues/136348.

## Validation results on CPU

1. NLP models accuracy/inference/training
![image](https://github.com/user-attachments/assets/859279b8-1631-4268-b226-7de9ac5870d8)

![image](https://github.com/user-attachments/assets/30ec7151-41ca-482a-9d2d-0c4850e75bab)

2. TorchBench CPU userbenchmark inference & training

![image](https://github.com/user-attachments/assets/71c9807c-caf9-4385-9990-d2ab637031cd)

3. Inductor quantization

![image](https://github.com/user-attachments/assets/3d2a3bd3-82fa-4566-8050-7ea5d6b61675)

4. Dynamo benchmarks
![image](https://github.com/user-attachments/assets/554ecce3-c85c-4a0e-88f1-2e73983c5dcd)
![image](https://github.com/user-attachments/assets/148c88f8-4367-4428-bb54-ce8a4deefd1b)
![image](https://github.com/user-attachments/assets/f2e744f4-d710-4699-acf4-1f130ecfadf1)
![image](https://github.com/user-attachments/assets/97128b80-4d0e-495a-aeda-dde3e70c96fd)
![image](https://github.com/user-attachments/assets/a9afce37-684c-45c0-b938-6dd7e0383805)
![image](https://github.com/user-attachments/assets/b8714236-9681-4fbe-8d98-be93deedab88)
![image](https://github.com/user-attachments/assets/4423061f-d133-45ba-98bd-d2f739e50431)
![image](https://github.com/user-attachments/assets/7955da10-3d23-493e-99fa-658f7f40035b)

## Validation results on XPU
Accuracy is the same as the baseline. Performance is shown below.
![image](https://github.com/user-attachments/assets/7645304d-5b1d-43f9-b840-9f846ed380a0)

## Validation results on ARM
![image](https://github.com/user-attachments/assets/080f7c02-0238-436f-ad20-5a9e3f6aafbb)
![image](https://github.com/user-attachments/assets/443742aa-ca61-41de-ae80-5d4c65cd0c87)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148293
Approved by: https://github.com/mingfeima, https://github.com/atalman
2025-03-04 13:56:45 +00:00
ao/sparsity [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
autograd Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
backends/xeon
benchmark_utils PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
bottleneck_test Fix unused Python variables in test/[a-d]* (#134665) 2024-12-13 22:13:12 +00:00
cpp Introduce cache clearing APIs for the lazy graph executor (#144489) 2025-01-29 17:38:01 +00:00
cpp_api_parity
cpp_extensions xpu: support sycl with torch.utils.cpp_extension APIs (#132945) 2025-02-16 16:50:59 +00:00
custom_backend
custom_operator [dynamo] make some more graph break messages readable in English [2/N] (#147385) 2025-02-26 09:20:28 +00:00
distributed [DCP] Introduce process based async checkpointing (#147039) 2025-03-04 13:33:28 +00:00
distributions Temp disable MKL in DistributionKernels.cpp (#146174) 2025-02-01 18:53:11 +00:00
dynamo Introduce delayed compile via eager_then_compile stance (#147983) 2025-03-04 07:46:31 +00:00
dynamo_expected_failures [Dynamo] support isinstance(...) check for type tuple (#146984) 2025-02-16 10:41:49 +00:00
dynamo_skips Move Dynamo test to skip from expected_failures (#145390) 2025-01-22 19:06:39 +00:00
edge Use std::string_view in tests (#146120) 2025-02-04 09:51:36 +00:00
error_messages
expect [Inductor] Avoid tensor slice overflow for large step (#147433) 2025-03-02 16:07:15 +00:00
export Consistently use load_torchbind_test_lib in tests (#148082) 2025-03-03 19:37:28 +00:00
forward_backward_compatibility Clean up op BC check list (#146577) 2025-02-07 22:40:49 +00:00
functorch Make Tensor.set_ validate storage_offset when sizes/strides are unchanged (#147354) 2025-02-27 15:48:58 +00:00
fx Subprocess compile (#146134) 2025-03-03 21:10:12 +00:00
higher_order_ops [invoke_subgraph] Run joint passes on the hop graphs (#139325) 2025-03-03 23:38:14 +00:00
inductor Upgrade submodule oneDNN to v3.7.1 (#148293) 2025-03-04 13:56:45 +00:00
inductor_expected_failures Set enable_faithful_generator_behavior flag to True (#142513) 2025-02-08 22:42:12 +00:00
inductor_skips [BE] Remove test_ops from FIXME_inductor_dont_reset_dynamo (#145307) 2025-01-27 18:12:39 +00:00
jit Consistently use load_torchbind_test_lib in tests (#148082) 2025-03-03 19:37:28 +00:00
jit_hooks
lazy [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
mobile PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
nn [ROCm] miopen benchmark behavior now better aligns with cudnn (#145294) 2025-02-05 17:19:53 +00:00
onnx [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
optim Adding support for differentiable lr, weight_decay, and betas in Adam/AdamW (#143726) 2024-12-30 01:11:57 +00:00
package Remove outdated test skipif conditions for Python3.9 (#146144) 2025-01-31 19:01:04 +00:00
profiler Enable CUPTI on Windows (#141454) 2025-02-06 15:58:20 +00:00
quantization add the torch.float8_e8m0fnu dtype to PyTorch (#147466) 2025-02-20 13:55:42 +00:00
scripts
strobelight/examples Enable strobelight profiling specific compile frame ids using COMPILE_STROBELIGHT_FRAME_FILTER (#147549) 2025-02-22 03:44:53 +00:00
test_img
torch_np [BE]: Enable ruff SLOT checks (#146276) 2025-02-04 19:18:23 +00:00
typing Revert "Fix non-bitwise type annotations for Tensor operators (see #145838) (#146845)" 2025-02-18 19:01:27 +00:00
xpu [Intel GPU] Enable fp64 GEMM (#140677) 2025-02-17 08:15:55 +00:00
_test_bazel.py
allowlist_for_publicAPI.json Improve typing in torch/types.py (#145237) 2025-01-28 05:29:12 +00:00
conftest.py Apply ruff fixes to tests (#146140) 2025-02-04 05:41:01 +00:00
create_dummy_torchscript_model.py
delete.py
hi.py
HowToWriteTestsUsingFileCheck.md
linear.py
load_torchscript_model.py
minioptest_failures_dict.json
mkl_verbose.py
mkldnn_verbose.py
pytest_shard_custom.py
run_doctests.sh
run_test.py [Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging (#147019) 2025-03-04 12:16:38 +00:00
simulate_nccl_errors.py
slow_tests.json Update slow tests (#147728) 2025-02-24 11:48:19 +00:00
test_accelerator.py Generalize pin memory logic for accelerator when non blocking copy happened (#143783) 2025-01-23 03:43:05 +00:00
test_ao_sparsity.py
test_appending_byte_serializer.py Add AppendingByteSerializer class (#148226) 2025-03-02 08:20:58 +00:00
test_autocast.py Enable TemporaryFileName tests on Windows (#146311) 2025-02-07 06:06:18 +00:00
test_autograd_fallback.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_autograd.py [BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408) 2025-02-04 19:07:04 +00:00
test_autoload.py
test_binary_ufuncs.py Fix lerp weight type promotion (#141117) 2025-01-24 01:18:20 +00:00
test_bundled_images.py
test_bundled_inputs.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_ci_sanity_check_fail.py
test_comparison_utils.py
test_compile_benchmark_util.py
test_complex.py
test_content_store.py torch.utils._content_store: fix error in hash_storage on XPU (#147785) 2025-02-26 23:57:59 +00:00
test_cpp_api_parity.py Enable C++ API parity tests on AArch64 (#145370) 2025-01-30 22:42:49 +00:00
test_cpp_extensions_aot.py xpu: support sycl with torch.utils.cpp_extension APIs (#132945) 2025-02-16 16:50:59 +00:00
test_cpp_extensions_jit.py xpu: support sycl with torch.utils.cpp_extension APIs (#132945) 2025-02-16 16:50:59 +00:00
test_cpp_extensions_mtia_backend.py
test_cpp_extensions_open_device_registration.py Update pin memory related APIs to not pass 'device' argument (#131858) 2025-01-15 17:23:35 +00:00
test_cpp_extensions_stream_and_event.py
test_cuda_expandable_segments.py Revert "Use absolute path path.resolve() -> path.absolute() (#129409)" 2025-01-04 14:17:20 +00:00
test_cuda_multigpu.py Add get_stream_from_external API for CUDA backend (#143799) 2024-12-31 11:15:59 +00:00
test_cuda_nvml_based_avail.py
test_cuda_primary_ctx.py
test_cuda_sanitizer.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_cuda_trace.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_cuda.py Remove outdated CUDA version check (#148142) 2025-03-04 03:33:44 +00:00
test_custom_ops.py [dynamo] improved graph break messages for some common graph break sites [1/N] (#146525) 2025-02-20 00:08:13 +00:00
test_dataloader.py Remove NO_MULTIPROCESSING_SPAWN checks (#146705) 2025-02-28 05:53:19 +00:00
test_datapipe.py Remove unactivated test (#146233) 2025-02-04 05:26:04 +00:00
test_decomp.py Update ruff linter for PEP585 (#147540) 2025-02-22 04:45:17 +00:00
test_deploy.py
test_determination.py
test_dispatch.py [BE][CI] bump ruff to 0.9.0: string quote styles (#144569) 2025-02-24 19:56:09 +00:00
test_dlpack.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_dynamic_shapes.py realize stride symbols in estimate_runtime (#146752) 2025-02-19 06:02:49 +00:00
test_expanded_weights.py No actual change, just remove variable contain Tensors from global scope (#143225) 2024-12-17 16:14:25 +00:00
test_extension_utils.py Move privateuse1 test out of test_utils and make them serial (#145380) 2025-01-23 00:31:39 +00:00
test_fake_tensor.py support meta_tensor.to(device='cpu') under fake_mode (#146729) 2025-02-12 20:57:10 +00:00
test_file_check.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_flop_counter.py Build RowwiseScaledMM.cu for SM89 (#145676) 2025-02-01 11:44:58 +00:00
test_foreach.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
test_function_schema.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_functional_autograd_benchmark.py Enable Windows tests (#146666) 2025-02-08 00:55:20 +00:00
test_functional_optim.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_functionalization_of_rng_ops.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_functionalization.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_futures.py
test_fx_experimental.py PEP585: Add noqa to necessary tests (#146391) 2025-02-12 15:29:50 +00:00
test_fx_passes.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_fx_reinplace_pass.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_fx.py [FX] Refactor immutable collections implementation (#144640) 2025-02-24 09:14:08 +00:00
test_hop_infra.py Support torch.compile rng selective activation checkpointing with cudagraph (#146878) 2025-02-28 00:47:03 +00:00
test_hub.py
test_import_stats.py
test_indexing.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_itt.py
test_jit_autocast.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_jit_disabled.py
test_jit_fuser_legacy.py
test_jit_fuser_te.py [BE][CI] bump ruff to 0.9.0: string quote styles (#144569) 2025-02-24 19:56:09 +00:00
test_jit_fuser.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_jit_legacy.py
test_jit_llga_fuser.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_jit_profiling.py
test_jit_simple.py
test_jit_string.py PEP585 update - test (#145176) 2025-01-22 04:48:28 +00:00
test_jit.py scriptfunction: Make sure we have valid __name__ and __qualname__ (#147906) 2025-02-28 23:25:47 +00:00
test_jiterator.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_kernel_launch_checks.py
test_legacy_vmap.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_license.py
test_linalg.py [ROCm] [TunableOp] Unit tests for scaled GEMM and GEMM with bias (#147890) 2025-02-26 22:41:24 +00:00
test_logging.py
test_masked.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_maskedtensor.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_matmul_cuda.py torch._scaled_mm with MXFP8 (#147548) 2025-02-27 02:44:39 +00:00
test_meta.py [inductor] fix index.Tensor fallback (#144736) 2025-01-16 09:38:29 +00:00
test_metal.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_mkl_verbose.py
test_mkldnn_fusion.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_mkldnn_verbose.py
test_mkldnn.py [BE][CI] bump ruff to 0.8.4 (#143753) 2024-12-24 12:24:10 +00:00
test_mobile_optimizer.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_model_exports_to_core_aten.py
test_module_tracker.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_modules.py Disable slow gradcheck for nn.Transformer ModuleInfo (#145531) 2025-01-25 00:58:03 +00:00
test_monitor.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_mps.py [MPS] Fix sqrt and other for torch.chalf (#148285) 2025-03-03 16:03:54 +00:00
test_multiprocessing_spawn.py Remove NO_MULTIPROCESSING_SPAWN checks (#146705) 2025-02-28 05:53:19 +00:00
test_multiprocessing.py Remove NO_MULTIPROCESSING_SPAWN checks (#146705) 2025-02-28 05:53:19 +00:00
test_namedtensor.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_namedtuple_return_api.py
test_native_functions.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_native_mha.py
test_nestedtensor.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
test_nn.py Use float data type for Half sum in fallback implementation of batchnorm backward on CPU (#147353) 2025-02-21 01:33:33 +00:00
test_nnapi.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_numba_integration.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_numpy_interop.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_openmp.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_ops_fwd_gradients.py
test_ops_gradients.py
test_ops_jit.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_ops.py [ARM] Fix bug in _ref_test_helper in test_ops and fix failing test on Aarch64 (#146597) 2025-02-25 14:15:10 +00:00
test_optim.py [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546) 2025-02-27 20:46:16 +00:00
test_out_dtype_op.py [Codemod][AddExplicitStrictExportArg] caffe2/test (#143688) 2024-12-27 07:58:44 +00:00
test_overrides.py Use std::string_view in tests (#146120) 2025-02-04 09:51:36 +00:00
test_package.py
test_per_overload_api.py
test_prims.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_proxy_tensor.py Fix register constant to be usable in exportz (#147533) 2025-02-25 21:10:47 +00:00
test_pruning_op.py
test_public_bindings.py Remove public_allowlist from TestPublicBindings.test_correct_module_names and ensure private_allowlist-ed things are actually private (#145620) 2025-01-27 17:30:02 +00:00
test_python_dispatch.py Delete torch._library.register_functional_op (#145110) 2025-01-18 00:58:25 +00:00
test_pytree.py [pytree][Easy] preserve dict keys in insertion order in CXX pytree (#130140) 2025-02-12 16:41:49 +00:00
test_quantization.py Add support for prototype affine quantization in pt2e flow (#141421) 2024-12-24 04:22:18 +00:00
test_reductions.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_scatter_gather_ops.py
test_schema_check.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_segment_reductions.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_serialization.py Add sparse tensors constructed via legacy constructor to _sparse_tensors_to_validate (#147759) 2025-02-25 23:51:12 +00:00
test_set_default_mobile_cpu_allocator.py
test_shape_ops.py [Quant] flip: throw runtime error for QUInt4x2 and QUInt2x4 input (#147430) 2025-02-25 03:47:40 +00:00
test_show_pickle.py
test_sort_and_select.py Fix linter F821 error (#146665) 2025-02-08 07:19:37 +00:00
test_sparse_csr.py [CUDA][SDPA] Compute reference in test_triton_scaled_dot_product_attention_block_size_16_cuda_float32 in float64 (#146461) 2025-02-06 23:28:56 +00:00
test_sparse_semi_structured.py [TEST][Sparse] Force CUTLASS backend in TestSparseSemiStructuredCUTLASS (#146398) 2025-02-04 22:07:12 +00:00
test_sparse.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_spectral_ops.py Re-add stft option to align window for center = false (#146379) 2025-02-06 14:07:13 +00:00
test_stateless.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_static_runtime.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_subclass.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_sympy_utils.py [Inductor] Expand Identity ops prior to block pattern matching (#146000) 2025-02-08 18:11:53 +00:00
test_tensor_creation_ops.py [Inductor] Add input value checking to randint meta function (#147191) 2025-02-25 02:18:16 +00:00
test_tensorboard.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_tensorexpr_pybind.py
test_tensorexpr.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_testing.py Enable some tests on Windows (#146243) 2025-02-05 03:54:28 +00:00
test_throughput_benchmark.py Fix Throughputbenchmark issue (#144669) 2025-01-26 03:37:20 +00:00
test_torch.py Remove outdated CUDA version check (#148142) 2025-03-04 03:33:44 +00:00
test_transformers_privateuse1.py Split test_transformers.py (#147441) 2025-02-26 11:54:24 +00:00
test_transformers.py [Intel GPU] Enable SDPA on XPU (#147614) 2025-03-04 01:40:45 +00:00
test_type_hints.py Revert "Use absolute path path.resolve() -> path.absolute() (#129409)" 2025-01-04 14:17:20 +00:00
test_type_info.py
test_type_promotion.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_typing.py [4/N] Apply py39 ruff and pyupgrade fixes (#143257) 2025-01-04 10:47:51 +00:00
test_unary_ufuncs.py Enable some tests on Windows (#146243) 2025-02-05 03:54:28 +00:00
test_utils_config_module.py Add check that envvar configs are boolean (#145454) 2025-02-05 19:40:10 +00:00
test_utils_filelock.py filelock: Make waitcounter variant to use (#139816) 2024-12-12 01:18:34 +00:00
test_utils.py [utils] add try_import method for importing optional modules (#145528) 2025-01-25 00:14:07 +00:00
test_view_ops.py Fix overflow in checkInBoundsForStorage (#147352) 2025-02-27 15:48:50 +00:00
test_vulkan.py Fix unused Python variables in test/[e-z]* (#136964) 2024-12-18 23:02:30 +00:00
test_weak.py Consistently use load_torchbind_test_lib in tests (#148082) 2025-03-03 19:37:28 +00:00
test_xnnpack_integration.py [BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408) 2025-02-04 19:07:04 +00:00
test_xpu.py Fix test_device_memory_allocated (#147311) 2025-02-17 19:00:53 +00:00