| Name | Last commit message | Last commit date |
| --- | --- | --- |
| aoti_runtime | [AOTI] Fix bfloat16 in CPU (#132150) | 2024-08-01 22:26:30 +00:00 |
| cuda | [Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. (#135312) | 2024-09-11 23:59:54 +00:00 |
| rocm | [Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. (#135312) | 2024-09-11 23:59:54 +00:00 |
| xpu | [Inductor] Generalize device guard codegen for cpp_wrapper mode. (#134761) | 2024-09-10 10:11:52 +00:00 |
| __init__.py | | |
| aoti_hipify_utils.py | [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768) | 2024-07-20 16:20:58 +00:00 |
| common.py | [aoti] Fix workspace generation for triton (#135552) | 2024-09-12 23:53:09 +00:00 |
| cpp_gemm_template.py | [inductor][cpp][gemm] fix perf regression xcit_large_24_p8_224 (#134686) (#135438) | 2024-09-09 05:16:02 +00:00 |
| cpp_micro_gemm.py | Revert "[AOTI] Fix assert_function call in cpu autotune template (#135086)" | 2024-09-10 19:51:16 +00:00 |
| cpp_prefix.h | [inductor][cpp][gemm] fix perf regression xcit_large_24_p8_224 (#134686) (#135438) | 2024-09-09 05:16:02 +00:00 |
| cpp_template_kernel.py | [Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. (#135312) | 2024-09-11 23:59:54 +00:00 |
| cpp_template.py | Revert "[AOTI] Fix assert_function call in cpu autotune template (#135086)" | 2024-09-10 19:51:16 +00:00 |
| cpp_utils.py | [inductor] Move LoopBody to its own file (#135257) | 2024-09-07 16:29:15 +00:00 |
| cpp_wrapper_cpu.py | [AOTI][Tooling] Support debug printing for inductor level extern kernel call such as externkernel.addmm, bmm, etc. (#135731) | 2024-09-12 17:31:10 +00:00 |
| cpp_wrapper_gpu.py | [aoti] Fix workspace generation for triton (#135552) | 2024-09-12 23:53:09 +00:00 |
| cpp.py | [Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. (#135312) | 2024-09-11 23:59:54 +00:00 |
| cuda_combined_scheduling.py | Reland "[2/2] PT2 Inductor ComboKernels - automatic horizontal fusing (#131675)" (#133291) | 2024-08-13 18:18:12 +00:00 |
| debug_utils.py | [AOTI][Tooling] Support debug printing for inductor level extern kernel call such as externkernel.addmm, bmm, etc. (#135731) | 2024-09-12 17:31:10 +00:00 |
| halide.py | [Inductor] Generalize cuda cpp wrapper as common triton based GPU cpp wrapper, will be reused by xpu in next PR. (#135312) | 2024-09-11 23:59:54 +00:00 |
| memory_planning.py | [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768) | 2024-07-20 16:20:58 +00:00 |
| multi_kernel.py | [inductor] Remove dead code in multi_kernel.py (#134194) | 2024-08-24 00:35:57 +00:00 |
| simd.py | [inductor] Split reduction loops when there is no shared reads (#134307) | 2024-09-12 09:45:08 +00:00 |
| triton_combo_kernel.py | [inductor] More fixes on the keys of constants and signature dictionaries (#135406) | 2024-09-13 04:10:41 +00:00 |
| triton_split_scan.py | [aoti] Fix workspace generation for triton (#135552) | 2024-09-12 23:53:09 +00:00 |
| triton_utils.py | [inductor] More fixes on the keys of constants and signature dictionaries (#135406) | 2024-09-13 04:10:41 +00:00 |
| triton.py | [inductor] More fixes on the keys of constants and signature dictionaries (#135406) | 2024-09-13 04:10:41 +00:00 |
| wrapper.py | [inductor] More fixes on the keys of constants and signature dictionaries (#135406) | 2024-09-13 04:10:41 +00:00 |