Mirror of https://github.com/zebrajr/pytorch.git, synced 2025-12-07 12:21:27 +01:00
Fix https://github.com/pytorch/pytorch/issues/123502

`swap_buffer` is not needed in the vectorized indirect load; remove it so that the CSE buffer is shared.

```
auto tmp8 = [&] {
    __at_align__ std::array<int64_t, 16> tmpbuf;
    tmp7.store(tmpbuf.data());
    return tmpbuf;
}();
//
// other code
//
// tmp7 is also stored here (redundant tmp16)
auto tmp16 = [&] {
    __at_align__ std::array<int64_t, 16> tmpbuf;
    tmp7.store(tmpbuf.data());
    return tmpbuf;
}();
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124597
Approved by: https://github.com/jgong5, https://github.com/jansel
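The effect of sharing a CSE buffer can be illustrated with a small Python sketch. It is not Inductor's actual implementation: `CSECache`, `spill_to_buffer`, and `SPILL_TEMPLATE` are hypothetical names introduced here, and the starting counter of 8 is chosen only so that the names line up with the snippet in the commit message. The idea shown is that when every spill of the same vector goes through one cache keyed by the generated expression, the second request returns the existing variable instead of emitting a redundant `tmp16`.

```python
class CSECache:
    """Hypothetical stand-in for a codegen CSE cache (not Inductor's real class)."""

    def __init__(self, prefix="tmp", start=0):
        self._cache = {}      # expression text -> variable name already emitted
        self._lines = []      # generated C++ lines, in emission order
        self._counter = start
        self._prefix = prefix

    def generate(self, expr):
        # Reuse the variable if this exact expression was emitted before.
        if expr in self._cache:
            return self._cache[expr]
        name = f"{self._prefix}{self._counter}"
        self._counter += 1
        self._cache[expr] = name
        self._lines.append(f"auto {name} = {expr};")
        return name

    @property
    def lines(self):
        return list(self._lines)


# C++ text that spills a vector variable into an aligned array,
# mirroring the lambda in the commit message. Doubled braces are
# literal braces for str.format.
SPILL_TEMPLATE = (
    "[&] {{ __at_align__ std::array<int64_t, 16> tmpbuf; "
    "{var}.store(tmpbuf.data()); return tmpbuf; }}()"
)


def spill_to_buffer(cse, var):
    """Emit (or reuse) the array spill for `var` through the shared cache."""
    return cse.generate(SPILL_TEMPLATE.format(var=var))


cse = CSECache(start=8)                # assumption: numbering starts at tmp8
first = spill_to_buffer(cse, "tmp7")   # emits the spill once
second = spill_to_buffer(cse, "tmp7")  # cache hit, nothing new is emitted
```

In this sketch both calls return the same name and only one line is generated. If each call used its own separate cache, the second call would emit a duplicate spill, which is the redundancy the commit message describes.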
| Name |
|---|
| aoti_runtime |
| cuda |
| xpu |
| __init__.py |
| aoti_hipify_utils.py |
| codegen_device_driver.py |
| common.py |
| cpp_prefix.h |
| cpp_wrapper_cpu.py |
| cpp_wrapper_cuda.py |
| cpp.py |
| cuda_combined_scheduling.py |
| memory_planning.py |
| multi_kernel.py |
| triton_foreach.py |
| triton_split_scan.py |
| triton_utils.py |
| triton.py |
| wrapper.py |