ollama/llm
Jesse Gross e119783e66 llm: Clamp batch size to context size
The context must always be able to store the current batch, so if the user requests a small context then we should also shrink the batch to match. This also fixes the TestLongInputContext test on the new engine. (The old engine already has this behavior.)
2025-09-08 20:40:11 -07:00
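A minimal Go sketch of the clamp the commit describes, under stated assumptions: `clampBatch`, `numCtx`, and `numBatch` are illustrative names, not the actual code in server.go (though NumCtx and NumBatch do appear as Ollama option names).

```go
package main

import "fmt"

// clampBatch illustrates the fix: the context must always be able to
// hold the current batch, so if the requested context (numCtx) is
// smaller than the requested batch (numBatch), shrink the batch to
// match. Hypothetical helper, not the implementation in server.go.
func clampBatch(numCtx, numBatch int) int {
	if numBatch > numCtx {
		return numCtx
	}
	return numBatch
}

func main() {
	// A user requesting a 128-token context keeps the batch at 128
	// instead of a larger default such as 512.
	fmt.Println(clampBatch(128, 512))  // 128
	fmt.Println(clampBatch(4096, 512)) // 512
}
```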
llm_darwin.go    Optimize container images for startup (#6547)       2024-09-12 12:10:30 -07:00
llm_linux.go     Optimize container images for startup (#6547)       2024-09-12 12:10:30 -07:00
llm_windows.go   win: lint fix (#10571)                              2025-05-05 11:08:12 -07:00
memory_test.go   llm: New memory management                          2025-08-14 15:24:01 -07:00
memory.go        gptoss: enable flash attention by default (#11996)  2025-08-26 13:34:45 -07:00
server_test.go   llm: New memory management                          2025-08-14 15:24:01 -07:00
server.go        llm: Clamp batch size to context size               2025-09-08 20:40:11 -07:00
status.go        Improve crash reporting (#7728)                     2024-11-19 16:26:57 -08:00