ollama/llama/patches

Latest commit: Jesse Gross, 392a270261 (2025-10-31 15:23:28 -07:00)
ggml: Avoid cudaMemsetAsync during memory fitting

We pass invalid pointers when we check the size of the required compute
graph before fitting. Some CUDA APIs validate these pointers, but we can
safely skip them during this phase. cudaMemsetAsync is one call we were not
skipping; previously we never took the code path that used it, but now that
op_offload is enabled we can hit it in memory-pressured situations.
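The guard pattern the commit message describes can be sketched roughly as follows. This is a minimal, self-contained C illustration, not the actual ggml CUDA backend code: the names `tensor_t`, `no_alloc`, `op_set_zero`, and `fake_memset_async` are hypothetical stand-ins, and `fake_memset_async` merely mimics a CUDA call that validates its pointer argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real ggml backend types. During the
 * memory-fitting ("no-alloc") pass, tensor data pointers are placeholders,
 * so any device call that validates its arguments must be skipped. */
typedef struct {
    bool  no_alloc; /* true while sizing the compute graph, before real allocation */
    void *data;     /* invalid placeholder pointer during the fitting pass */
} tensor_t;

/* Mimics cudaMemsetAsync: the real call would reject an invalid device
 * pointer, which is exactly what the fitting pass must avoid triggering. */
static int fake_memset_async(void *dst, int value, size_t n) {
    if (dst == NULL) {
        return -1; /* pointer validation would fail here */
    }
    memset(dst, value, n);
    return 0;
}

/* The fix: bail out before touching the device when only measuring memory. */
static int op_set_zero(tensor_t *t, size_t n) {
    if (t->no_alloc) {
        return 0; /* fitting pass: skip the pointer-validating device call */
    }
    return fake_memset_async(t->data, 0, n);
}
```

Under this sketch, the fitting pass calls `op_set_zero` with a placeholder pointer and succeeds without ever reaching the device API; only the real execution pass performs the memset.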
Name  Last commit message  Last commit date
.gitignore update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0002-pretokenizer.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0003-clip-unicode.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0004-solar-pro.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0005-fix-deepseek-deseret-regex.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0006-maintain-ordering-for-rules-for-grammar.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0007-sort-devices-by-score.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0008-add-phony-target-ggml-cpu-for-all-cpu-variants.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0009-remove-amx.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0010-fix-string-arr-kv-loading.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0011-ollama-debug-tensor.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0012-add-ollama-vocab-for-grammar-support.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0013-add-argsort-and-cuda-copy-for-i32.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0014-graph-memory-reporting-on-failure.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0015-ggml-Export-GPU-UUIDs.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0016-add-C-API-for-mtmd_input_text.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0017-no-power-throttling-win32-with-gnuc.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0018-BF16-macos-version-guard.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0019-ggml-Add-batch-size-hint.patch ggml: Enable op_offload to improve partial offload performance 2025-10-30 13:53:10 -07:00
0020-Disable-ggml-blas-on-macos-v13-and-older.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0021-fix-mtmd-audio.cpp-build-on-windows.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0022-ggml-No-alloc-mode.patch ggml: Avoid cudaMemsetAsync during memory fitting 2025-10-31 15:23:28 -07:00
0023-decode-disable-output_all.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0024-ggml-Enable-resetting-backend-devices.patch logs: fix bogus "0 MiB free" log line (#12590) 2025-10-14 11:26:28 -07:00
0025-harden-uncaught-exception-registration.patch harden uncaught exception registration (#12120) 2025-09-02 09:43:55 -07:00
0026-GPU-discovery-enhancements.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0027-NVML-fallback-for-unified-memory-GPUs.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0028-CUDA-Changing-the-CUDA-scheduling-strategy-to-spin-1.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0029-report-LoadLibrary-failures.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0032-interleave-multi-rope.patch interleaved mrope (#12807) 2025-10-30 11:29:00 -07:00