pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Kai Londenberg 5d81ade484 [Inductor max autotune] Multithreaded Precompilation (#119386) When using the Cutlass backend, the compilation of CUDA source files can totally dominate the runtime required for the benchmarking done as part of autotuning. This change adds a multithreaded precompilation phase, which serves to pre-populate the compilation cache (both in-memory and a possible on-disk sccache). It also ensures that no unnecessary compilation and benchmarking steps are performed, which was previously the case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119386 Approved by: https://github.com/aakhundov		2024-02-09 16:11:30 +00:00
cutlass_lib_extensions	Delete a bunch of type-ignores (#113990)	2023-11-18 02:48:38 +00:00
__init__.py
cuda_cpp_scheduling.py	Delete a bunch of type-ignores (#113990)	2023-11-18 02:48:38 +00:00
cuda_env.py	Enable local_partial_types (#118467)	2024-01-28 13:38:22 +00:00
cuda_kernel.py	[Inductor max autotune] Multithreaded Precompilation (#119386)	2024-02-09 16:11:30 +00:00
cuda_template.py	[Inductor CUTLASS backend] Epilogue fusion codegen (Step 1) (#110890)	2023-11-06 19:42:10 +00:00
cutlass_epilogue_gen.py	Enable possibly-undefined error code (#118533)	2024-01-30 21:07:01 +00:00
cutlass_utils.py	Enable local_partial_types (#118467)	2024-01-28 13:38:22 +00:00
device_op_overrides.py	[Inductor Intel GPU backend Upstream] Step 1/3: Generalize device-bias code in code generation. (#116020)	2023-12-22 08:42:51 +00:00
gemm_template.py	[BE]: Enable F821 and fix bugs (#116579)	2024-01-01 08:40:46 +00:00