ollama

mirror of https://github.com/zebrajr/ollama.git synced 2025-12-06 00:19:51 +01:00

Jesse Gross d5a0d8d904 llm: New memory management. This changes the memory allocation strategy from upfront estimation to tracking the actual allocations made by the engine and reacting to them. The goal is to avoid issues caused by both under-estimation (crashing) and over-estimation (low performance due to under-utilized GPUs). It is currently opt-in and can be enabled for models running on the Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other cases is unchanged and will continue to use the existing estimates.		2025-08-14 15:24:01 -07:00
internal	chore: fix some inconsistent function name in comment	2025-08-13 09:50:27 -07:00
auth.go	fix nil deref in auth.go	2024-07-26 14:14:48 -07:00
create_test.go	server: validate local path on safetensor create (#9379)	2025-02-28 16:10:43 -08:00
create.go	remove support for multiple ggufs in a single file (#10722)	2025-05-21 13:55:31 -07:00
download.go	server: abort download on empty digest	2025-05-27 11:28:48 -07:00
fixblobs_test.go	server: replace blob prefix separator from ':' to '-' (#3146)	2024-03-14 20:18:06 -07:00
fixblobs.go	server: replace blob prefix separator from ':' to '-' (#3146)	2024-03-14 20:18:06 -07:00
harmonyparser_test.go	gpt-oss (#11672)	2025-08-05 12:21:16 -07:00
harmonyparser.go	update vendored llama.cpp and ggml (#11823)	2025-08-14 14:42:58 -07:00
images_test.go	Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)	2025-06-20 11:11:40 -07:00
images.go	update vendored llama.cpp and ggml (#11823)	2025-08-14 14:42:58 -07:00
layer.go	One corrupt manifest should not wedge model operations (#7515)	2024-11-05 14:21:45 -08:00
manifest_test.go	One corrupt manifest should not wedge model operations (#7515)	2024-11-05 14:21:45 -08:00
manifest.go	One corrupt manifest should not wedge model operations (#7515)	2024-11-05 14:21:45 -08:00
model.go	tools: refactor tool call parsing and enable streaming (#10415)	2025-05-23 14:19:31 -07:00
modelpath_test.go	lint: enable usetesting, disable tenv (#10594)	2025-05-08 11:42:14 -07:00
modelpath.go	server: add hint to the error message when model path access fails (#10843)	2025-05-24 13:17:04 -07:00
prompt_test.go	gpt-oss (#11672)	2025-08-05 12:21:16 -07:00
prompt.go	fix(openai): handle reasoning_effort (#11868)	2025-08-12 11:02:01 -07:00
quantization_test.go	Reapply "feat: incremental gguf parser (#10822)" (#11114) (#11119)	2025-06-20 11:11:40 -07:00
quantization.go	skip quantizing per_layer_token_embd (#11207)	2025-06-26 21:49:35 -07:00
routes_create_test.go	Move quantization to new backend (#10363)	2025-05-06 11:20:48 -07:00
routes_delete_test.go	Update the /api/create endpoint to use JSON (#7935)	2024-12-31 18:02:30 -08:00
routes_generate_test.go	llm: New memory management	2025-08-14 15:24:01 -07:00
routes_harmony_streaming_test.go	llm: New memory management	2025-08-14 15:24:01 -07:00
routes_list_test.go	Update the /api/create endpoint to use JSON (#7935)	2024-12-31 18:02:30 -08:00
routes_test.go	server: use slices.Equal to simplify code (#11502)	2025-07-23 14:25:39 -07:00
routes.go	llm: New memory management	2025-08-14 15:24:01 -07:00
sched_test.go	llm: New memory management	2025-08-14 15:24:01 -07:00
sched.go	llm: New memory management	2025-08-14 15:24:01 -07:00
sparse_common.go	Don't hard fail on sparse setup error	2024-08-09 12:16:19 -07:00
sparse_windows.go	Don't hard fail on sparse setup error	2024-08-09 12:16:19 -07:00
upload.go	server: always print upload/download part info (#8832)	2025-02-04 19:30:49 -08:00