ollama/llama/patches

Latest commit: Jesse Gross, 392a270261 (2025-10-31 15:23:28 -07:00)
ggml: Avoid cudaMemsetAsync during memory fitting

We pass invalid pointers when we check the size of the required compute
graph before fitting. Some CUDA APIs validate these pointers, but we can
safely skip them during this phase. cudaMemsetAsync is one call we were not
skipping; previously we never took the code path that used it, but now that
op_offload is enabled we can hit it in memory-pressured situations.
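The guard pattern the commit message describes can be sketched roughly as follows. This is a minimal, self-contained C illustration, not the actual ggml CUDA backend code: the names `tensor_t`, `no_alloc`, `op_set_zero`, and `fake_memset_async` are hypothetical stand-ins, and `fake_memset_async` merely mimics a CUDA call that validates its pointer argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real ggml backend types. During the
 * memory-fitting ("no-alloc") pass, tensor data pointers are placeholders,
 * so any device call that validates its arguments must be skipped. */
typedef struct {
    bool  no_alloc; /* true while sizing the compute graph, before real allocation */
    void *data;     /* invalid placeholder pointer during the fitting pass */
} tensor_t;

/* Mimics cudaMemsetAsync: the real call would reject an invalid device
 * pointer, which is exactly what the fitting pass must avoid triggering. */
static int fake_memset_async(void *dst, int value, size_t n) {
    if (dst == NULL) {
        return -1; /* pointer validation would fail here */
    }
    memset(dst, value, n);
    return 0;
}

/* The fix: bail out before touching the device when only measuring memory. */
static int op_set_zero(tensor_t *t, size_t n) {
    if (t->no_alloc) {
        return 0; /* fitting pass: skip the pointer-validating device call */
    }
    return fake_memset_async(t->data, 0, n);
}
```

Under this sketch, the fitting pass calls `op_set_zero` with a placeholder pointer and succeeds without ever reaching the device API; only the real execution pass performs the memset.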
Name  Last commit message  Last commit date
.gitignore update vendored llama.cpp and ggml (#11823) 2025-08-14 14:42:58 -07:00
0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0002-pretokenizer.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0003-clip-unicode.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0004-solar-pro.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0005-fix-deepseek-deseret-regex.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0006-maintain-ordering-for-rules-for-grammar.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0007-sort-devices-by-score.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0008-add-phony-target-ggml-cpu-for-all-cpu-variants.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0009-remove-amx.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0010-fix-string-arr-kv-loading.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0011-ollama-debug-tensor.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0012-add-ollama-vocab-for-grammar-support.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0013-add-argsort-and-cuda-copy-for-i32.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0014-graph-memory-reporting-on-failure.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0015-ggml-Export-GPU-UUIDs.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0016-add-C-API-for-mtmd_input_text.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0017-no-power-throttling-win32-with-gnuc.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0018-BF16-macos-version-guard.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0019-ggml-Add-batch-size-hint.patch ggml: Enable op_offload to improve partial offload performance 2025-10-30 13:53:10 -07:00
0020-Disable-ggml-blas-on-macos-v13-and-older.patch Update GGML to b6646 (#12245) 2025-10-02 14:47:10 -07:00
0021-fix-mtmd-audio.cpp-build-on-windows.patch llm: New memory management 2025-08-14 15:24:01 -07:00
0022-ggml-No-alloc-mode.patch ggml: Avoid cudaMemsetAsync during memory fitting 2025-10-31 15:23:28 -07:00
0023-decode-disable-output_all.patch Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) 2025-10-13 15:26:18 -07:00
0024-ggml-Enable-resetting-backend-devices.patch logs: fix bogus "0 MiB free" log line (#12590) 2025-10-14 11:26:28 -07:00
0025-harden-uncaught-exception-registration.patch harden uncaught exception registration (#12120) 2025-09-02 09:43:55 -07:00
0026-GPU-discovery-enhancements.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0027-NVML-fallback-for-unified-memory-GPUs.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0028-CUDA-Changing-the-CUDA-scheduling-strategy-to-spin-1.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0029-report-LoadLibrary-failures.patch Fix vulkan PCI ID and ID handling (#12775) 2025-10-28 15:15:35 -07:00
0032-interleave-multi-rope.patch interleaved mrope (#12807) 2025-10-30 11:29:00 -07:00