ollama

mirror of https://github.com/zebrajr/ollama.git synced 2025-12-06 00:19:51 +01:00

Jesse Gross d5a0d8d904 llm: New memory management	2025-08-14 15:24:01 -07:00

This changes the memory allocation strategy from upfront estimation to tracking the actual allocations made by the engine and reacting to them. The goal is to avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs). It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and continues to use the existing estimates.

cache_test.go	ollamarunner: Separate text and multimodal graphs	2025-05-15 13:46:20 -07:00
cache.go	ggml: Support closing backends	2025-08-08 14:57:13 -07:00
multimodal.go	ml: Panic rather than return error on tensor allocation failure	2025-05-22 14:38:09 -07:00
runner.go	llm: New memory management	2025-08-14 15:24:01 -07:00