ollama/llm
Jesse Gross e119783e66 llm: Clamp batch size to context size
The context must always be able to store the current batch, so if the user requests a small context then we should also shrink the batch to match. This also fixes the TestLongInputContext test on the new engine. (The old engine already has this behavior.)
2025-09-08 20:40:11 -07:00
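A minimal Go sketch of the clamp the commit describes, under stated assumptions: `clampBatch`, `numCtx`, and `numBatch` are illustrative names, not the actual code in server.go (though NumCtx and NumBatch do appear as Ollama option names).

```go
package main

import "fmt"

// clampBatch illustrates the fix: the context must always be able to
// hold the current batch, so if the requested context (numCtx) is
// smaller than the requested batch (numBatch), shrink the batch to
// match. Hypothetical helper, not the implementation in server.go.
func clampBatch(numCtx, numBatch int) int {
	if numBatch > numCtx {
		return numCtx
	}
	return numBatch
}

func main() {
	// A user requesting a 128-token context keeps the batch at 128
	// instead of a larger default such as 512.
	fmt.Println(clampBatch(128, 512))  // 128
	fmt.Println(clampBatch(4096, 512)) // 512
}
```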
llm_darwin.go    Optimize container images for startup (#6547)       2024-09-12 12:10:30 -07:00
llm_linux.go     Optimize container images for startup (#6547)       2024-09-12 12:10:30 -07:00
llm_windows.go   win: lint fix (#10571)                              2025-05-05 11:08:12 -07:00
memory_test.go   llm: New memory management                          2025-08-14 15:24:01 -07:00
memory.go        gptoss: enable flash attention by default (#11996)  2025-08-26 13:34:45 -07:00
server_test.go   llm: New memory management                          2025-08-14 15:24:01 -07:00
server.go        llm: Clamp batch size to context size               2025-09-08 20:40:11 -07:00
status.go        Improve crash reporting (#7728)                     2024-11-19 16:26:57 -08:00