pytorch/torch/_inductor/codegen
Ying Zhang 097fd43f8c [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931)
This is step 4 of adding CUTLASS as an alternative Inductor backend.
Full tests can be found in the last PR in the stack.

Feature request: https://github.com/pytorch/pytorch/issues/106991.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107931
Approved by: https://github.com/aakhundov, https://github.com/jansel, https://github.com/kadeng
ghstack dependencies: #107802, #107847, #107901
2023-09-12 17:44:38 +00:00
..
aot_runtime [inductor] Move AOTInductor runtime headers (#108564) 2023-09-06 11:50:41 +00:00
cuda [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931) 2023-09-12 17:44:38 +00:00
__init__.py
common.py [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931) 2023-09-12 17:44:38 +00:00
cpp_prefix.h inductor: support masked load for cpu path (#107670) 2023-08-25 21:11:09 +00:00
cpp.py [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931) 2023-09-12 17:44:38 +00:00
triton_foreach.py [inductor] Add CPU-side profiler event names for templates and foreach kernels (#108449) 2023-09-09 02:11:13 +00:00
triton_utils.py [inductor] Enable Mypy Checking for torch/_inductor/codegen/triton_utils.py (#108951) 2023-09-10 19:18:51 +00:00
triton.py [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931) 2023-09-12 17:44:38 +00:00
wrapper.py [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931) 2023-09-12 17:44:38 +00:00