pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00


haozhe.zhu 57790fd088 [inductor] share cse cache during vectorized indirect load (#124597)	2024-04-28 01:02:48 +00:00

Fix https://github.com/pytorch/pytorch/issues/123502

`swap_buffer` is not needed in vectorized indirect load; remove it to share the cse buffer.

```
auto tmp8 =
    [&]
    {
        __at_align__ std::array<int64_t, 16> tmpbuf;
        tmp7.store(tmpbuf.data());
        return tmpbuf;
    }
    ()
    ;
//
// other codes
//
// also store tmp7 here (redundant tmp16)
auto tmp16 =
    [&]
    {
        __at_align__ std::array<int64_t, 16> tmpbuf;
        tmp7.store(tmpbuf.data());
        return tmpbuf;
    }
    ()
    ;
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124597
Approved by: https://github.com/jgong5, https://github.com/jansel
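The effect described in the commit message can be illustrated with a toy model. The sketch below is not Inductor's implementation: `ToyCSE`, `emit_two_indirect_loads`, and the `store_to_array(tmp7)` string are hypothetical names invented for illustration. It only shows why routing both indirect loads through one shared common-subexpression cache emits the array spill of `tmp7` once, while giving the second load a fresh cache emits it twice.

```python
class ToyCSE:
    """Toy common-subexpression cache: one variable per distinct expression."""

    def __init__(self, prefix="tmp"):
        self.prefix = prefix
        self.cache = {}    # expression text -> variable name
        self.lines = []    # emitted code lines, in order
        self.counter = 0   # next free variable index

    def generate(self, expr):
        # Reuse the variable if this exact expression was already emitted.
        if expr in self.cache:
            return self.cache[expr]
        var = f"{self.prefix}{self.counter}"
        self.counter += 1
        self.cache[expr] = var
        self.lines.append(f"auto {var} = {expr};")
        return var


def emit_two_indirect_loads(share_cache):
    """Request the array spill of tmp7 twice, with or without a shared cache."""
    # Hypothetical stand-in for the lambda shown in the commit message.
    spill = "store_to_array(tmp7)"
    cse = ToyCSE()
    first = cse.generate(spill)
    if share_cache:
        # Same cache: the second request is a cache hit, nothing new is emitted.
        second = cse.generate(spill)
        return cse.lines, (first, second)
    # Assumption: a swapped-in buffer behaves like a fresh, empty cache.
    fresh = ToyCSE()
    fresh.counter = cse.counter  # keep variable names unique
    second = fresh.generate(spill)
    return cse.lines + fresh.lines, (first, second)


if __name__ == "__main__":
    for share in (False, True):
        lines, names = emit_two_indirect_loads(share)
        print(f"share_cache={share}: {len(lines)} store(s), variables {names}")
```

With a separate cache the toy emits two identical stores under different names, analogous to `tmp8` and the redundant `tmp16` above; with a shared cache both loads refer to the same variable.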
aoti_runtime	[Inductor] Enable ABI-compatible mode for cpp-wrapper JIT (#121309)	2024-03-07 14:22:06 +00:00
cuda	[Inductor Cutlass backend] Improved GEMM template (#124577)	2024-04-26 20:03:20 +00:00
xpu	[Inductor Intel GPU backend Upstream] Add Inductor Intel GPU backend. (#121895)	2024-04-05 09:05:11 +00:00
__init__.py
aoti_hipify_utils.py	[5/x][AMD][Lowering Enablement] Hipifying aoti code_wrapper (#124241)	2024-04-19 18:57:38 +00:00
codegen_device_driver.py	[5/x][AMD][Lowering Enablement] Hipifying aoti code_wrapper (#124241)	2024-04-19 18:57:38 +00:00
common.py	Do not propogate (#124769)	2024-04-24 02:18:18 +00:00
cpp_prefix.h	[AOTI] Add more ABI-compatiblity unit test (#123900)	2024-04-23 16:06:40 +00:00
cpp_wrapper_cpu.py	Revert "fix Invalid call to aoti_torch_tensor_copy_ #123039 (#124037)"	2024-04-26 15:07:09 +00:00
cpp_wrapper_cuda.py	[inductor] Refactor runtime files into torch._inductor.runtime (part 1) (#124552)	2024-04-22 18:41:12 +00:00
cpp.py	[inductor] share cse cache during vectorized indirect load (#124597)	2024-04-28 01:02:48 +00:00
cuda_combined_scheduling.py	[Inductor Cutlass backend] Disable epilogue fusions (#124107)	2024-04-24 13:56:44 +00:00
memory_planning.py	Fix global flake8 issues (#124771)	2024-04-26 15:35:53 +00:00
multi_kernel.py	[inductor] Refactor runtime files into torch._inductor.runtime (part 3) (#124557)	2024-04-22 18:46:24 +00:00
triton_foreach.py	Revert "[inductor] Remove usage of device_interface from _inductor.runtime (#124592)"	2024-04-25 11:28:23 +00:00
triton_split_scan.py	[inductor] Remove config check for 3D tiling (#124569)	2024-04-22 18:46:40 +00:00
triton_utils.py	[inductor] Specialize on unguarded alignment of example inputs (#123319)	2024-04-25 22:28:15 +00:00
triton.py	Add support for capturing tensors with score_mod (#124444)	2024-04-26 01:02:28 +00:00
wrapper.py	Improved unbacked SymInt input support in Inductor (#124739)	2024-04-25 13:29:53 +00:00