ollama/llm
Jesse Gross 94ab428e3f ggml: Separate tensor load from backend creation
Currently, tensors are loaded at the same time the backend is created,
which is a slow operation. This splits the process into two steps:
 - Create the backend, including enumerating tensors and allocating memory
 - Load the tensor data

This allows more flexibility in managing model loading.
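
The two-step split described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this commit: `TensorInfo`, `Backend`, `NewBackend`, `Load`, and `MemoryRequired` are hypothetical names chosen for the example, and tensors are modeled as plain byte slices.

```go
package main

import (
	"errors"
	"fmt"
)

// TensorInfo is a hypothetical description of one tensor in a model file.
type TensorInfo struct {
	Name string
	Size int // size in bytes
}

// Backend is a hypothetical stand-in for a compute backend.
type Backend struct {
	tensors map[string][]byte // allocated buffers, keyed by tensor name
	loaded  bool
}

// NewBackend performs step 1: it enumerates the tensors and allocates
// memory for them, but reads no tensor data.
func NewBackend(infos []TensorInfo) (*Backend, error) {
	b := &Backend{tensors: make(map[string][]byte, len(infos))}
	for _, ti := range infos {
		if ti.Size < 0 {
			return nil, fmt.Errorf("tensor %q has negative size", ti.Name)
		}
		if _, dup := b.tensors[ti.Name]; dup {
			return nil, fmt.Errorf("duplicate tensor %q", ti.Name)
		}
		b.tensors[ti.Name] = make([]byte, ti.Size)
	}
	return b, nil
}

// MemoryRequired reports the total bytes allocated in step 1. It is
// available before any tensor data has been loaded.
func (b *Backend) MemoryRequired() int {
	total := 0
	for _, buf := range b.tensors {
		total += len(buf)
	}
	return total
}

// Load performs step 2: it fills each allocated buffer using the
// caller-supplied read function (a hypothetical callback).
func (b *Backend) Load(read func(name string, dst []byte) error) error {
	if b.loaded {
		return errors.New("tensors already loaded")
	}
	for name, buf := range b.tensors {
		if err := read(name, buf); err != nil {
			return fmt.Errorf("load %q: %w", name, err)
		}
	}
	b.loaded = true
	return nil
}

func main() {
	// Step 1: cheap. The caller can inspect memory needs right away.
	b, err := NewBackend([]TensorInfo{{Name: "a", Size: 4}, {Name: "b", Size: 8}})
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes allocated before load:", b.MemoryRequired())

	// Step 2: slow in practice. It can be deferred or skipped entirely.
	err = b.Load(func(name string, dst []byte) error {
		for i := range dst {
			dst[i] = 1 // placeholder for reading real tensor data
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("tensors loaded")
}
```

Keeping the slow step behind its own call lets a caller create the backend, check the allocation, and only then decide whether and when to load the data.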
2025-05-19 09:54:22 -07:00
..
llm_darwin.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_linux.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_windows.go win: lint fix (#10571) 2025-05-05 11:08:12 -07:00
memory_test.go Move quantization to new backend (#10363) 2025-05-06 11:20:48 -07:00
memory.go ggml: Separate tensor load from backend creation 2025-05-19 09:54:22 -07:00
server_test.go lint: enable usetesting, disable tenv (#10594) 2025-05-08 11:42:14 -07:00
server.go ggml: Separate tensor load from backend creation 2025-05-19 09:54:22 -07:00
status.go Improve crash reporting (#7728) 2024-11-19 16:26:57 -08:00