ollama

mirror of https://github.com/zebrajr/ollama.git synced 2025-12-06 12:19:56 +01:00

Jesse Gross	4100ed7bdd	ml: Add support for quantized KV cache	2025-03-07 18:43:39 -08:00
Similar to the llama engine, quantizing the KV cache requires flash attention to be enabled through the Ollama server.
backend	ml: Add support for quantized KV cache	2025-03-07 18:43:39 -08:00
nn	attention: Remove unnecessary contiguous operations	2025-03-01 20:53:23 -08:00
backend.go	ml: Add support for quantized KV cache	2025-03-07 18:43:39 -08:00