pytorch/torch/_inductor/codegen
haozhe.zhu 57790fd088 [inductor] share cse cache during vectorized indirect load (#124597)
Fix https://github.com/pytorch/pytorch/issues/123502

`swap_buffer` is not needed in vectorized indirect load; remove it so the CSE cache is shared. Without sharing, the same store of `tmp7` is generated twice:
```
auto tmp8 =
[&]
{
    __at_align__ std::array<int64_t, 16> tmpbuf;
    tmp7.store(tmpbuf.data());
    return tmpbuf;
}
()
;
//
// other code
//
// tmp7 is stored again here, producing a redundant tmp16
auto tmp16 =
[&]
{
    __at_align__ std::array<int64_t, 16> tmpbuf;
    tmp7.store(tmpbuf.data());
    return tmpbuf;
}
()
;
```
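
The effect described above can be illustrated with a toy model. This is only a sketch: `ToyCSE`, `scoped_cache`, `emit_two_loads`, and the expression string `store_to_array(tmp7)` are hypothetical names made up for illustration and are not Inductor's classes or APIs. It assumes that a scoped buffer swap discards cache entries created inside the scope, which is why the second identical expression is emitted again.

```python
import contextlib


class ToyCSE:
    """Hypothetical, simplified CSE cache: expression text -> variable name."""

    def __init__(self):
        self.cache = {}
        self.counter = 0
        self.lines = []

    def generate(self, expr):
        # Reuse the existing variable if this expression was already emitted.
        if expr in self.cache:
            return self.cache[expr]
        name = f"tmp{self.counter}"
        self.counter += 1
        self.cache[expr] = name
        self.lines.append(f"auto {name} = {expr};")
        return name


@contextlib.contextmanager
def scoped_cache(cse):
    # Assumption for this sketch: entries added inside the scope are
    # dropped on exit, so later code cannot reuse them.
    saved = dict(cse.cache)
    try:
        yield
    finally:
        cse.cache = saved


def emit_two_loads(share_cache):
    """Emit the same expression twice, with or without a shared cache."""
    cse = ToyCSE()
    expr = "store_to_array(tmp7)"  # hypothetical expression text
    names = []
    for _ in range(2):
        if share_cache:
            names.append(cse.generate(expr))
        else:
            with scoped_cache(cse):
                names.append(cse.generate(expr))
    return names, cse.lines


if __name__ == "__main__":
    print(emit_two_loads(share_cache=False))
    print(emit_two_loads(share_cache=True))
```

With `share_cache=False` the toy emits two variables for the same expression, mirroring the redundant `tmp16`; with `share_cache=True` the second request reuses the first variable and only one line is emitted.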

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124597
Approved by: https://github.com/jgong5, https://github.com/jansel
2024-04-28 01:02:48 +00:00
..
aoti_runtime [Inductor] Enable ABI-compatible mode for cpp-wrapper JIT (#121309) 2024-03-07 14:22:06 +00:00
cuda [Inductor Cutlass backend] Improved GEMM template (#124577) 2024-04-26 20:03:20 +00:00
xpu [Inductor Intel GPU backend Upstream] Add Inductor Intel GPU backend. (#121895) 2024-04-05 09:05:11 +00:00
__init__.py
aoti_hipify_utils.py [5/x][AMD][Lowering Enablement] Hipifying aoti code_wrapper (#124241) 2024-04-19 18:57:38 +00:00
codegen_device_driver.py [5/x][AMD][Lowering Enablement] Hipifying aoti code_wrapper (#124241) 2024-04-19 18:57:38 +00:00
common.py Do not propogate (#124769) 2024-04-24 02:18:18 +00:00
cpp_prefix.h [AOTI] Add more ABI-compatiblity unit test (#123900) 2024-04-23 16:06:40 +00:00
cpp_wrapper_cpu.py Revert "fix Invalid call to aoti_torch_tensor_copy_ #123039 (#124037)" 2024-04-26 15:07:09 +00:00
cpp_wrapper_cuda.py [inductor] Refactor runtime files into torch._inductor.runtime (part 1) (#124552) 2024-04-22 18:41:12 +00:00
cpp.py [inductor] share cse cache during vectorized indirect load (#124597) 2024-04-28 01:02:48 +00:00
cuda_combined_scheduling.py [Inductor Cutlass backend] Disable epilogue fusions (#124107) 2024-04-24 13:56:44 +00:00
memory_planning.py Fix global flake8 issues (#124771) 2024-04-26 15:35:53 +00:00
multi_kernel.py [inductor] Refactor runtime files into torch._inductor.runtime (part 3) (#124557) 2024-04-22 18:46:24 +00:00
triton_foreach.py Revert "[inductor] Remove usage of device_interface from _inductor.runtime (#124592)" 2024-04-25 11:28:23 +00:00
triton_split_scan.py [inductor] Remove config check for 3D tiling (#124569) 2024-04-22 18:46:40 +00:00
triton_utils.py [inductor] Specialize on unguarded alignment of example inputs (#123319) 2024-04-25 22:28:15 +00:00
triton.py Add support for capturing tensors with score_mod (#124444) 2024-04-26 01:02:28 +00:00
wrapper.py Improved unbacked SymInt input support in Inductor (#124739) 2024-04-25 13:29:53 +00:00