pytorch/test/inductor
Jiong Gong 0b44e1a74c [inductor][cpp][gemm] optimize arbitrary N in packed gemm template (#130690)
Currently we require `n % register_block_n == 0`, which typically brings good performance when `n` is a multiple of 8, 16, 32, etc., but falls back to the reference micro gemm otherwise (where `register_block_n == 1`). This PR optimizes the latter case by padding `n` to a multiple of `register_block_n` (8, 16, 32, etc.) for the packed weight, so the micro-gemm can work as is on the padded `n`. When the weight is padded, we use the local accumulation buffer to receive the result from the micro-gemm and then unpad (slice) it before storing back to the output buffer.
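
The pad-then-slice idea described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the C++ template from the PR: `round_up`, `pad_columns`, `micro_gemm` and `gemm_padded_n` are hypothetical names, matrices are lists of lists, and the accumulation buffer is always used rather than only when padding is needed.

```python
def round_up(n, block):
    """Smallest multiple of `block` that is >= n."""
    return (n + block - 1) // block * block


def pad_columns(w, padded_n):
    """Zero-pad each row of a K x N weight (list of lists) to padded_n columns."""
    return [row + [0.0] * (padded_n - len(row)) for row in w]


def micro_gemm(a, w, n_start, block_n):
    """Hypothetical stand-in for a micro-kernel.

    Computes one full block_n-wide column block of A @ W. It never handles
    a partial block, so W must have a multiple of block_n columns.
    """
    k = len(w)
    return [
        [sum(a_row[p] * w[p][n_start + j] for p in range(k)) for j in range(block_n)]
        for a_row in a
    ]


def gemm_padded_n(a, w, block_n):
    """Compute A @ W for arbitrary N by padding N up to a multiple of block_n."""
    n = len(w[0])
    padded_n = round_up(n, block_n)
    w_padded = pad_columns(w, padded_n)
    # Local accumulation buffer sized for the padded width.
    acc = [[0.0] * padded_n for _ in a]
    for n_start in range(0, padded_n, block_n):
        block = micro_gemm(a, w_padded, n_start, block_n)
        for i, row in enumerate(block):
            acc[i][n_start:n_start + block_n] = row
    # Slice off the padded columns before writing to the output.
    return [row[:n] for row in acc]


a = [[1.0, 2.0], [3.0, 4.0]]            # M=2, K=2
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]  # K=2, N=3 (not a multiple of 8)
print(gemm_padded_n(a, w, block_n=8))   # [[1.0, 2.0, 8.0], [3.0, 4.0, 18.0]]
```

With a block width of 32, for example, `round_up(3073, 32)` gives 3104, so the `n = 3073` case from the benchmark below would run on full blocks with 31 padded columns sliced away at the end.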

Performance numbers measured on "Intel (R) Xeon (R) CPU Max 9480", single core, bf16.

Before
AUTOTUNE linear_unary(512x768, 3073x768, 3073)
  _linear_pointwise 2.3563 ms 100.0%
  cpp_packed_gemm_0 710.5902 ms 0.3%

After
AUTOTUNE linear_unary(512x768, 3073x768, 3073)
  cpp_packed_gemm_0 1.8909 ms 100.0%
  _linear_pointwise 2.1016 ms 90.0%

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130690
Approved by: https://github.com/leslie-fang-intel, https://github.com/jansel
ghstack dependencies: #130675
2024-07-20 06:30:15 +00:00
..
cpp
extension_backends [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901) 2024-05-24 00:26:15 +00:00
__init__.py
indirect_assert_helper.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
minifier_smoke.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
opinfo_harness.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_aot_inductor_package.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_aot_inductor_utils.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_aot_inductor.py [inductor] Separate Buffer and Operation into two concepts (#130831) 2024-07-20 02:05:07 +00:00
test_autoheuristic.py Autoheuristic: Do not store choices as metadata (#130304) 2024-07-18 21:39:42 +00:00
test_b2b_gemm.py [Inductor] B2B-GEMM performance tuning with test (#130778) 2024-07-19 22:53:57 +00:00
test_benchmark_fusion.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_binary_folding.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_ck_backend.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_codecache.py [Inductor UT] Generalize device-bias code in case TestFxGraphCache.test_inductor_counters. (#131006) 2024-07-19 01:14:22 +00:00
test_codegen_triton.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_compile_worker.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_compiled_autograd.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_compiled_optimizers.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_config.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_control_flow.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_coordinate_descent_tuner.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_cpp_wrapper_hipify.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_cpu_cpp_wrapper.py Add test to xfail_list only for abi_compatible (#128506) 2024-06-21 07:19:28 +00:00
test_cpu_repro.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_cpu_select_algorithm.py [inductor][cpp][gemm] optimize arbitrary N in packed gemm template (#130690) 2024-07-20 06:30:15 +00:00
test_cuda_cpp_wrapper.py add is_big_gpu(0) check to test_select_algorithm tests in tests/inductor/test_cuda_cpp_wrapper.py (#128652) 2024-06-18 02:00:04 +00:00
test_cuda_repro.py [inductor] Separate Buffer and Operation into two concepts (#130831) 2024-07-20 02:05:07 +00:00
test_cudacodecache.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_cudagraph_trees_expandable_segments.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_cudagraph_trees.py Revert "Invalidate StorageImpl instances when tensor is overwritten with cudagraphs (#125264)" 2024-07-19 22:58:51 +00:00
test_custom_lowering.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_custom_post_grad_passes.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_cutlass_backend.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_debug_trace.py [inductor] Separate Buffer and Operation into two concepts (#130831) 2024-07-20 02:05:07 +00:00
test_decompose_mem_bound_mm.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_dependencies.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_distributed_patterns.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_efficient_conv_bn_eval.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_extension_backend.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_flex_attention.py [inductor] Use multiple outputs for flex-attention (#130833) 2024-07-20 02:05:10 +00:00
test_flex_decoding.py Removing some cruff and updating signatures for consistency (#130871) 2024-07-18 13:32:11 +00:00
test_foreach.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_fp8.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_fused_attention.py Add Efficient Attention support on ROCM (#124885) 2024-06-08 22:41:05 +00:00
test_fx_fusion.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_graph_transform_observer.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_group_batch_fusion.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_halide.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_indexing.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_inductor_freezing.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_inductor_utils.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_inplacing_pass.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_kernel_benchmark.py Retry of D58015187 Move AsyncCompile to a different file (#127691) 2024-06-03 15:29:41 +00:00
test_layout_optim.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_loop_ordering.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_max_autotune.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_memory_planning.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_metrics.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_minifier_isolate.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_minifier.py [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 2) (#124147) 2024-06-16 08:07:05 +00:00
test_mkldnn_pattern_matcher.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_mmdecomp.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_move_constructors_to_cuda.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_multi_kernel.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_pad_mm.py Fix mm pad regresion - more conservative estimation of plannable inputs (#128909) 2024-07-18 16:42:30 +00:00
test_padding.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_pattern_matcher.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_perf.py Fix mm pad regresion - more conservative estimation of plannable inputs (#128909) 2024-07-18 16:42:30 +00:00
test_profiler.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_scatter_optimization.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_select_algorithm.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_smoke.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_snode_runtime.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_split_cat_fx_passes.py [Inductor Intel GPU backend Upstream] Reuse inductor test for Intel GPU (PART 2) (#124147) 2024-06-16 08:07:05 +00:00
test_standalone_compile.py
test_torchbind.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_torchinductor_codegen_dynamic_shapes.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_torchinductor_dynamic_shapes.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_torchinductor_opinfo.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_torchinductor_strided_blocks.py [Inductor][Intel GPU] Support reduction split. (#129120) 2024-06-21 15:11:59 +00:00
test_torchinductor.py [inductor] Avoid fallback case for custom scan op lowering (#130936) 2024-07-18 19:53:47 +00:00
test_triton_extension_backend.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_triton_heuristics.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_triton_kernels.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00
test_triton_wrapper.py Retry of D58015187 Move AsyncCompile to a different file (#127691) 2024-06-03 15:29:41 +00:00
test_unbacked_symints.py [inductor] support unbacked symint divisors in vars_and_sizes (#130595) 2024-07-16 16:21:38 +00:00
test_utils.py [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126) 2024-05-27 14:49:57 +00:00
test_xpu_basic.py [BE][Easy][12/19] enforce style for empty lines in import segments in test/i*/ (#129763) 2024-07-18 07:49:19 +00:00