ollama

mirror of https://github.com/zebrajr/ollama.git synced 2025-12-06 00:19:51 +01:00

Jesse Gross 94ab428e3f ggml: Seperate tensor load from backend creation	2025-05-19 09:54:22 -07:00

Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates them to be two steps:
- Create backend, including enumerating tensors and memory allocation
- Loading tensor data

This allows more flexibility in managing model loading.
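
The two-step split described in the commit message can be illustrated with a minimal Go sketch. This is not the code from the commit: `TensorInfo`, `Backend`, `NewBackend`, `Load`, and `Allocated` are hypothetical names chosen for illustration, and the in-memory byte buffers stand in for whatever allocation the real backend performs.

```go
package main

import (
	"errors"
	"fmt"
)

// TensorInfo is a hypothetical description of one tensor: name and size in bytes.
type TensorInfo struct {
	Name string
	Size int
}

// Backend is a hypothetical stand-in for a model backend.
type Backend struct {
	buffers map[string][]byte
	loaded  bool
}

// NewBackend is step one: enumerate the tensors and allocate memory for them.
// No tensor data is read here, so creation stays cheap.
func NewBackend(tensors []TensorInfo) *Backend {
	b := &Backend{buffers: make(map[string][]byte, len(tensors))}
	for _, t := range tensors {
		b.buffers[t.Name] = make([]byte, t.Size)
	}
	return b
}

// Allocated reports the total number of bytes reserved at creation time.
func (b *Backend) Allocated() int {
	total := 0
	for _, buf := range b.buffers {
		total += len(buf)
	}
	return total
}

// Load is step two: fill the already-allocated buffers with tensor data.
// The read callback is an assumption standing in for the real data source.
func (b *Backend) Load(read func(name string, dst []byte) error) error {
	if b.loaded {
		return errors.New("tensors already loaded")
	}
	for name, buf := range b.buffers {
		if err := read(name, buf); err != nil {
			return fmt.Errorf("loading %q: %w", name, err)
		}
	}
	b.loaded = true
	return nil
}

func main() {
	// Step one: the caller can inspect memory needs before any slow I/O happens.
	b := NewBackend([]TensorInfo{{Name: "tok_embd", Size: 8}, {Name: "output", Size: 4}})
	fmt.Println("allocated bytes:", b.Allocated())

	// Step two: the slow load runs later, when the caller decides to.
	err := b.Load(func(name string, dst []byte) error {
		for i := range dst {
			dst[i] = 1 // placeholder data
		}
		return nil
	})
	fmt.Println("load error:", err)
}
```

Keeping creation and loading separate in this way lets a caller check the memory layout, or abandon the load entirely, before paying for the slow data transfer.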
llm_darwin.go	Optimize container images for startup (#6547)	2024-09-12 12:10:30 -07:00
llm_linux.go	Optimize container images for startup (#6547)	2024-09-12 12:10:30 -07:00
llm_windows.go	win: lint fix (#10571)	2025-05-05 11:08:12 -07:00
memory_test.go	Move quantization to new backend (#10363)	2025-05-06 11:20:48 -07:00
memory.go	ggml: Seperate tensor load from backend creation	2025-05-19 09:54:22 -07:00
server_test.go	lint: enable usetesting, disable tenv (#10594)	2025-05-08 11:42:14 -07:00
server.go	ggml: Seperate tensor load from backend creation	2025-05-19 09:54:22 -07:00
status.go	Improve crash reporting (#7728)	2024-11-19 16:26:57 -08:00