Commit Graph

  • 392a270261
    ggml: Avoid cudaMemsetAsync during memory fitting main Jesse Gross 2025-10-31 14:16:20 -0700
  • 3bee3af6ed
    cpu: always ensure LibOllamaPath included (#12890) Daniel Hiltgen 2025-10-31 14:37:29 -0700
  • 83537993d7
    logs: catch rocm errors (#12888) Daniel Hiltgen 2025-10-31 09:54:25 -0700
  • 7dd4862a89
    embeddings: removed redundant TestAPIEmbeddings test (#12863) nicole pardal 2025-10-30 17:12:33 -0700
  • db973c8fc2
    win: avoid ID mixups on refresh (#12869) Daniel Hiltgen 2025-10-30 15:12:14 -0700
  • afaf7ce8c3
    ggml: Enable op_offload to improve partial offload performance Jesse Gross 2025-10-27 16:32:05 -0700
  • 26465fb85f
    ollamarunner: Worst case batch for token generation Jesse Gross 2025-10-27 16:31:58 -0700
  • 88236bc05f
    win: use copy for subprocess logs (#12864) Daniel Hiltgen 2025-10-30 13:22:00 -0700
  • 76eb7d0fff
    testing: test more models with tool calling (#12867) Patrick Devine 2025-10-30 13:19:21 -0700
  • f67a6df110
    interleaved mrope (#12807) Michael Yang 2025-10-30 11:29:00 -0700
  • 75e75d9afe
    qwen3vl: enable flash attention by default (#12862) Michael Yang 2025-10-30 10:51:37 -0700
  • ed78e127d0
    fix(cmd): unload model before removal (#12832) Michael Yang 2025-10-30 10:41:49 -0700
  • d432ade714
    fix: qwen2.5vl, qwen3vl composite image (#12841) Michael Yang 2025-10-30 10:33:19 -0700
  • 06b3422d5f
    tests: add tests and docs for commonly used ops (#12844) Michael Yang 2025-10-30 10:32:45 -0700
  • cbe1cf06c4
    Update README.md (#12822) Athiban Sharon 2025-10-30 17:14:39 +0000
  • 0a2d92081b
    Removing whitespace between Thinking and Content in Qwen3VL (#12838) Grace 2025-10-29 15:14:28 -0700
  • c88647104d
    int: harden server lifecycle (#12835) Daniel Hiltgen 2025-10-29 11:50:56 -0700
  • 05aff4a4f1
    tests: fix embeddinggemma integration test (#12830) Patrick Devine 2025-10-29 11:07:28 -0700
  • 0d140bd1af
    fix: conv2d bias (#12834) Michael Yang 2025-10-29 11:03:43 -0700
  • 93e45f0f0d
    docs: temporarily restore api.md and cleanup docs paths (#12818) Jeffrey Morgan 2025-10-28 23:25:48 -0700
  • a342160803
    docs: fix root api documentation page (#12813) Jeffrey Morgan 2025-10-28 19:17:54 -0700
  • f6c29409dc
    docs: add new cloud model + fix openai redirect (#12812) Jeffrey Morgan 2025-10-28 19:09:07 -0700
  • 7d25b9e194
    feat(model): add qwen3vl (#12665) Michael Yang 2025-10-28 17:39:47 -0700
  • 36d64fb531
    embed: add distance correlation test for library embed models (#12796) Patrick Devine 2025-10-28 16:57:27 -0700
  • d828517e78
    docs: update readme and links (#12809) Parth Sareen 2025-10-28 16:20:02 -0700
  • 14977a9350
    Fix vulkan PCI ID and ID handling (#12775) Daniel Hiltgen 2025-10-28 15:15:35 -0700
  • 29f63f37c8
    Revert "server: Consolidate embedding truncation in runner (#12730)" (#12810) Patrick Devine 2025-10-28 14:49:14 -0700
  • 3d99d9779a
    docs: add docs for docs.ollama.com (#12805) Parth Sareen 2025-10-28 13:18:48 -0700
  • 6d02a43a75
    docs: rename to mdx to setup docs site (#12804) Parth Sareen 2025-10-28 13:04:31 -0700
  • 5483497d7a
    Revert "docs: add reference to docs.ollama.com (#12800)" (#12803) Parth Sareen 2025-10-28 12:52:49 -0700
  • 934dd9e196
    docs: add reference to docs.ollama.com (#12800) Parth Sareen 2025-10-28 12:44:02 -0700
  • 1188f408dd
    s/From*Slice/From*s/ (#12255) Michael Yang 2025-10-28 12:08:49 -0700
  • 15c7d30d9a
    embedding tests: added check against exact base64 string (#12790) nicole pardal 2025-10-28 10:37:20 -0700
  • 9862317174
    Merge pull request #12793 from ollama/drifkin/12792_renderer-parser-from Devon Rifkin 2025-10-28 00:15:46 -0700
  • ec9eb28f4c
    gemma3: make embedding non-causal (#12297) Michael Yang 2025-10-27 19:54:08 -0700
  • 1bdd816910
    create: inherit FROM model's renderer/parser Devon Rifkin 2025-10-27 15:14:19 -0700
  • 5d347f6d6f
    server: Consolidate embedding truncation in runner (#12730) nicole pardal 2025-10-27 11:59:12 -0700
  • b97eb2b858
    cloud: set the proxy content-type to the same as local models (#12759) Patrick Devine 2025-10-25 10:57:10 -0700
  • ad6f6a1d29
    llm: Change memory allocation backoff from exponential to incremental Jesse Gross 2025-10-23 11:31:25 -0700
  • 6723a40be6
    readme: add VT Code project to terminal community integrations (#12749) Vinh Nguyen 2025-10-24 02:29:50 +0700
  • 3258a89b6e
    DRY out the runner lifecycle code (#12540) Daniel Hiltgen 2025-10-23 11:20:02 -0700
  • 1c093e97af
    kvcache: Remove special case for reservation mask Jesse Gross 2025-10-22 16:00:43 -0700
  • a8d9c2648e
    llamarunner: Record the time for all batches during prompt processing Jesse Gross 2025-10-16 16:27:45 -0700
  • 0334e67ffd
    tools: parse tool calls that don't conform to {"name": name, "arguments": args} (#12738) frob 2025-10-22 20:34:27 +0200
  • e0ead1adee
    embeddings: base64 encoding fix (#12715) nicole pardal 2025-10-22 11:27:44 -0700
  • d515aed6c3
    cloud: don't error sending empty messages (#12724) Patrick Devine 2025-10-21 18:12:14 -0700
  • 5fe7ba1b9b
    runner: always truncate embeddings requests (#12714) Jeffrey Morgan 2025-10-20 16:47:05 -0700
  • d2b63c19b3
    fs(ggml): fill in arch prefix if necessary (#12646) Michael Yang 2025-10-20 16:42:18 -0700
  • 94f110b35a
    model/parsers: remove warning for missing <think> tag for qwen3-vl (#12713) Jeffrey Morgan 2025-10-20 16:03:43 -0700
  • 5d22953ba7
    cuda: get driver version after props (#12707) Daniel Hiltgen 2025-10-20 10:57:27 -0700
  • d245dffed8
    rocm: give it more time to bootstrap (#12681) Daniel Hiltgen 2025-10-20 09:43:05 -0700
  • bc1a818fdc
    contiguous input per layer (#12686) Daniel Hiltgen 2025-10-17 18:39:18 -0700
  • ba2253dc30
    win: more verbose load failures (#12683) Daniel Hiltgen 2025-10-17 17:13:16 -0700
  • 68e04c7ff8
    test: harden scheduler tests (#12662) Daniel Hiltgen 2025-10-17 08:56:44 -0700
  • 270679932f
    cuda: tidy up CC settings (#12668) Daniel Hiltgen 2025-10-16 16:39:30 -0700
  • 65fb3ff49d
    renderers: add global flag for setting [img] tags (#12669) Jeffrey Morgan 2025-10-16 16:37:32 -0700
  • e2a0b24435
    Grace/qwen3 thinking (#12647) Grace 2025-10-16 15:29:41 -0700
  • 1813ff85a0
    cuda: bring back CC 5.2 (#12666) Daniel Hiltgen 2025-10-16 13:07:41 -0700
  • b531777a66
    test: add a few missing embedding models (#12661) Daniel Hiltgen 2025-10-16 09:36:25 -0700
  • fe3ec8dbf0
    Revert "Workaround broken NVIDIA iGPU free VRAM data (#12490)" (#12642) Daniel Hiltgen 2025-10-16 09:09:48 -0700
  • c744134287
    vulkan: Get FilterID from Backend for Vulkan (#12655) Thomas Stocker 2025-10-16 18:07:35 +0200
  • 4be41d2d45
    readme: add achatbot-go to community integrations (#12629) weedge 2025-10-16 12:54:15 +0800
  • de670570c9
    fs/ggml: fix function name in comment (#12630) zhetaicheleba 2025-10-16 13:53:38 +0900
  • 201d93716e
    Merge pull request #12651 from ollama/drifkin/oai-conversion Devon Rifkin 2025-10-15 21:10:30 -0700
  • 160cecc8e2
    openai: make tool call conversion fns public Devon Rifkin 2025-10-15 20:54:58 -0700
  • 8b6e5baee7
    CI: Set up temporary opt-out Vulkan support (#12614) Daniel Hiltgen 2025-10-15 14:18:01 -0700
  • 75d17fc6c2
    perf: backport cuda iGPU sched spin (#12641) Daniel Hiltgen 2025-10-15 11:52:14 -0700
  • 8fafc8af77
    ml/backend/ggml: NVML fallback for unified memory GPUs (#12619) Santosh Bhavani 2025-10-15 13:40:06 -0500
  • c3c85aa06c
    llm: Enable flash attention by default for gemma3 Jesse Gross 2025-10-15 10:22:03 -0700
  • 0d713051a2
    envconfig: default to port 443 when connecting to ollama.com (#12617) Jeffrey Morgan 2025-10-14 23:38:24 -0700
  • c4c5a4a01e
    types: send index for tool calls (#12625) Parth Sareen 2025-10-14 19:35:15 -0700
  • 3dcfd5f69e
    llm: Perform eviction when num_gpu is set with new estimates Jesse Gross 2025-10-14 17:21:16 -0700
  • 53a969d509
    Merge pull request #12621 from ollama/drifkin/any-of Devon Rifkin 2025-10-14 15:51:24 -0700
  • 08fbb60bb2
    qwen3-coder: support anyOf when parsing tool calls Devon Rifkin 2025-10-14 15:33:05 -0700
  • 850da848c5
    logs: fix bogus "0 MiB free" log line (#12590) Daniel Hiltgen 2025-10-14 11:26:28 -0700
  • 2aba569a2a
    Vulkan based on #9650 (#11835) Thomas Stocker 2025-10-14 19:59:58 +0200
  • fd8aa947f3
    Merge pull request #12562 from ollama/drifkin/registries Devon Rifkin 2025-10-14 02:01:53 -0700
  • ddaca643d0
    add registries for parsers/renderers Devon Rifkin 2025-10-14 01:13:54 -0700
  • 05982a95cb
    Qwen3VL Cloud Parser and Renderer (#12526) Grace 2025-10-13 16:52:33 -0700
  • 4987f13d34
    Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) Gabe Goodhart 2025-10-13 16:26:18 -0600
  • e638f2acb6
    runner: fix shifting on llama runner (#12604) Jeffrey Morgan 2025-10-13 13:46:33 -0700
  • 18087f2ec7
    Revert "use llama runner for qwen3 (#12556)" Michael Yang 2025-10-13 13:21:06 -0700
  • 6c833d5f8d
    fix(qwen3): deepseek distill Michael Yang 2025-10-13 12:09:53 -0700
  • 6544e14735
    Reapply "add truncate and shift parameters" (#12582) Jeffrey Morgan 2025-10-11 16:06:14 -0700
  • 5db8a818a1
    Merge pull request #12581 from ollama/drifkin/renderer-api-generate Devon Rifkin 2025-10-11 14:10:23 -0700
  • 6db8da9958
    routes: fix built-in renderers for api/generate Devon Rifkin 2025-10-11 13:57:43 -0700
  • 0c68ec8d6a
    discover: fix typo (#12565) frob 2025-10-11 21:06:02 +0200
  • 70d9e363e1
    doc: remove AMD EOL GPUs (#12567) Daniel Hiltgen 2025-10-10 17:16:29 -0700
  • 1a2feb2a97
    ollamarunner: fix deadlock Michael Yang 2025-10-10 16:38:12 -0700
  • aab2190420
    implement nvml for linux (#12517) Daniel Hiltgen 2025-10-10 15:15:56 -0700
  • 629db9dc43
    comment split Michael Yang 2025-10-09 16:13:03 -0700
  • e0cd511661
    fix test Michael Yang 2025-10-07 16:46:37 -0700
  • 207332078f
    fix lint Michael Yang 2025-10-07 16:39:14 -0700
  • 93085127f4
    convert: slice gate_up weight Michael Yang 2025-10-06 16:05:38 -0700
  • c00fa9cc2b
    convert: split gate_up bias Michael Yang 2025-10-06 14:55:55 -0700
  • df411c4b02
    refactor: using testing.B.Loop yajianggroup 2025-09-23 16:05:59 +0800
  • 3d32249c74
    use llama runner for qwen3 (#12556) Jeffrey Morgan 2025-10-09 19:08:21 -0700
  • d681cd7c29
    thinking: allow "think": false for non-thinking models (#12555) Patrick Devine 2025-10-09 18:46:00 -0700
  • 47298fce39
    refactor: use builtin max and min shengxinjing 2025-09-28 23:06:33 +0100
  • 4a48937ef1
    refactor: use builtin max and min shengxinjing 2025-09-25 21:25:37 +0100