# PyTorch Fuser
The fuser accepts subgraphs wrapped in "fusion nodes" and tries to execute them by just-in-time (JIT) compiling kernels that run all the graph operations.
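To illustrate what fusion buys, here is a minimal host-side C++ sketch (not the fuser's actual code): two separate pointwise ops each make a full pass over memory and materialize an intermediate, while the fused version does both in a single loop with no intermediate buffer. The emitted CUDA/CPU kernels follow the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: each op is a full pass that materializes an intermediate tensor.
std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

std::vector<float> mul2(const std::vector<float>& a) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * 2.0f;
    return out;
}

// Fused: one pass, no intermediate buffer. This is the kind of kernel the
// fuser JIT-compiles for a subgraph of pointwise operations.
std::vector<float> add_then_mul2_fused(const std::vector<float>& a,
                                       const std::vector<float>& b) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = (a[i] + b[i]) * 2.0f;
    return out;
}
```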
## Code Organization
The fuser is designed hierarchically, with device-independent logic eventually deferring to device-specific logic and implementation. The device-specific code is (mostly) found in each device's subdirectory. The device-independent logic has six components:
- The Interface (interface.h/cpp) has functions to register and run fusions, interrogate fusion functionality, and perform debugging.
- The Compiler (compiler.h/cpp) performs "upfront" and "runtime" compilation. When fusions are registered, upfront compilation produces fallback code and performs some shape inference. When a fusion is run, runtime compilation invokes code generation and the device-specific compilation logic.
- The Code Generator (codegen.h/cpp) produces the string to be compiled on the device.
- The Executor (executor.h/cpp) runs requested fusions. It performs shape inference, expands tensors as necessary, determines the device to run on, acquires a cached compiled kernel or requests that the Compiler produce a new one, invokes device-specific code to launch the kernel, and updates the stack.
- The Fallback (fallback.h/cpp) runs subgraphs that can't be fused, either because shape inference didn't determine a common tensor size or because the device the tensors are on doesn't support fusion.
- The Kernel Specification Cache (kernel_cache.h/cpp) is a thread-safe cache holding the device-independent specifications produced during upfront compilation. These specifications each have their own thread-safe stores of compiled kernels that the Executor checks before requesting runtime compilation.
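The thread-safe kernel store described above can be sketched with standard C++ primitives. This is an illustrative stand-in, not the actual `kernel_cache.h/cpp` interface: the names `KernelCache`, `CompiledKernel`, `find`, and `store` are assumptions for the example.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled kernel artifact.
struct CompiledKernel {
    std::string source;
};

// Minimal sketch of a thread-safe kernel cache in the spirit of
// kernel_cache.h/cpp: a mutex-guarded map keyed by a specification string.
class KernelCache {
public:
    // Returns the cached kernel, or nullptr if runtime compilation is needed.
    std::shared_ptr<CompiledKernel> find(const std::string& key) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = cache_.find(key);
        return it == cache_.end() ? nullptr : it->second;
    }

    void store(const std::string& key, std::shared_ptr<CompiledKernel> kernel) {
        std::lock_guard<std::mutex> guard(mutex_);
        cache_.emplace(key, std::move(kernel));
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::string, std::shared_ptr<CompiledKernel>> cache_;
};
```

Returning a `shared_ptr` lets a caller keep using a kernel even if the cache entry is later evicted by another thread.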
The device-specific components have logic for compiling and running code in FusedKernelCPU (cpu/fused_kernel.h/cpp) and FusedKernelCUDA (cuda/fused_kernel.h/cpp).
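Since the Code Generator's output is just a source string handed to the device-specific compiler (e.g. NVRTC for CUDA), the overall flow can be illustrated with plain string assembly. The function name and signature below are hypothetical, not the actual `codegen.cpp` API:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: assemble a CUDA kernel source string for a fused
// pointwise expression. The real codegen.cpp builds the string from
// resource strings plus the operations in the fused subgraph.
std::string generateKernel(const std::string& name, const std::string& expr) {
    std::string src;
    src += "extern \"C\" __global__ void " + name +
           "(const float* a, const float* b, float* out, int n) {\n";
    src += "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n";
    src += "  if (i < n) out[i] = " + expr + ";\n";
    src += "}\n";
    return src;
}
```

The returned string would then be compiled and launched by the device-specific FusedKernel implementation.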