Commit Graph

  • 392a270261
    ggml: Avoid cudaMemsetAsync during memory fitting main Jesse Gross 2025-10-31 14:16:20 -0700
  • 3bee3af6ed
    cpu: always ensure LibOllamaPath included (#12890) Daniel Hiltgen 2025-10-31 14:37:29 -0700
  • 83537993d7
    logs: catch rocm errors (#12888) Daniel Hiltgen 2025-10-31 09:54:25 -0700
  • 7dd4862a89
    embeddings: removed redundant TestAPIEmbeddings test (#12863) nicole pardal 2025-10-30 17:12:33 -0700
  • db973c8fc2
    win: avoid ID mixups on refresh (#12869) Daniel Hiltgen 2025-10-30 15:12:14 -0700
  • afaf7ce8c3
    ggml: Enable op_offload to improve partial offload performance Jesse Gross 2025-10-27 16:32:05 -0700
  • 26465fb85f
    ollamarunner: Worst case batch for token generation Jesse Gross 2025-10-27 16:31:58 -0700
  • 88236bc05f
    win: use copy for subprocess logs (#12864) Daniel Hiltgen 2025-10-30 13:22:00 -0700
  • 76eb7d0fff
    testing: test more models with tool calling (#12867) Patrick Devine 2025-10-30 13:19:21 -0700
  • f67a6df110
    interleaved mrope (#12807) Michael Yang 2025-10-30 11:29:00 -0700
  • 75e75d9afe
    qwen3vl: enable flash attention by default (#12862) Michael Yang 2025-10-30 10:51:37 -0700
  • ed78e127d0
    fix(cmd): unload model before removal (#12832) Michael Yang 2025-10-30 10:41:49 -0700
  • d432ade714
    fix: qwen2.5vl, qwen3vl composite image (#12841) Michael Yang 2025-10-30 10:33:19 -0700
  • 06b3422d5f
    tests: add tests and docs for commonly used ops (#12844) Michael Yang 2025-10-30 10:32:45 -0700
  • cbe1cf06c4
    Update README.md (#12822) Athiban Sharon 2025-10-30 17:14:39 +0000
  • 0a2d92081b
    Removing whitespace between Thinking and Content in Qwen3VL (#12838) Grace 2025-10-29 15:14:28 -0700
  • c88647104d
    int: harden server lifecycle (#12835) Daniel Hiltgen 2025-10-29 11:50:56 -0700
  • 05aff4a4f1
    tests: fix embeddinggemma integration test (#12830) Patrick Devine 2025-10-29 11:07:28 -0700
  • 0d140bd1af
    fix: conv2d bias (#12834) Michael Yang 2025-10-29 11:03:43 -0700
  • 93e45f0f0d
    docs: temporarily restore api.md and cleanup docs paths (#12818) Jeffrey Morgan 2025-10-28 23:25:48 -0700
  • a342160803
    docs: fix root api documentation page (#12813) Jeffrey Morgan 2025-10-28 19:17:54 -0700
  • f6c29409dc
    docs: add new cloud model + fix openai redirect (#12812) Jeffrey Morgan 2025-10-28 19:09:07 -0700
  • 7d25b9e194
    feat(model): add qwen3vl (#12665) Michael Yang 2025-10-28 17:39:47 -0700
  • 36d64fb531
    embed: add distance correlation test for library embed models (#12796) Patrick Devine 2025-10-28 16:57:27 -0700
  • d828517e78
    docs: update readme and links (#12809) Parth Sareen 2025-10-28 16:20:02 -0700
  • 14977a9350
    Fix vulkan PCI ID and ID handling (#12775) Daniel Hiltgen 2025-10-28 15:15:35 -0700
  • 29f63f37c8
    Revert "server: Consolidate embedding truncation in runner (#12730)" (#12810) Patrick Devine 2025-10-28 14:49:14 -0700
  • 3d99d9779a
    docs: add docs for docs.ollama.com (#12805) Parth Sareen 2025-10-28 13:18:48 -0700
  • 6d02a43a75
    docs: rename to mdx to setup docs site (#12804) Parth Sareen 2025-10-28 13:04:31 -0700
  • 5483497d7a
    Revert "docs: add reference to docs.ollama.com (#12800)" (#12803) Parth Sareen 2025-10-28 12:52:49 -0700
  • 934dd9e196
    docs: add reference to docs.ollama.com (#12800) Parth Sareen 2025-10-28 12:44:02 -0700
  • 1188f408dd
    s/From*Slice/From*s/ (#12255) Michael Yang 2025-10-28 12:08:49 -0700
  • 15c7d30d9a
    embedding tests: added check against exact base64 string (#12790) nicole pardal 2025-10-28 10:37:20 -0700
  • 9862317174
    Merge pull request #12793 from ollama/drifkin/12792_renderer-parser-from Devon Rifkin 2025-10-28 00:15:46 -0700
  • ec9eb28f4c
    gemma3: make embedding non-causal (#12297) Michael Yang 2025-10-27 19:54:08 -0700
  • 1bdd816910
    create: inherit FROM model's renderer/parser Devon Rifkin 2025-10-27 15:14:19 -0700
  • 5d347f6d6f
    server: Consolidate embedding truncation in runner (#12730) nicole pardal 2025-10-27 11:59:12 -0700
  • b97eb2b858
    cloud: set the proxy content-type to the same as local models (#12759) Patrick Devine 2025-10-25 10:57:10 -0700
  • ad6f6a1d29
    llm: Change memory allocation backoff from exponential to incremental Jesse Gross 2025-10-23 11:31:25 -0700
  • 6723a40be6
    readme: add VT Code project to terminal community integrations (#12749) Vinh Nguyen 2025-10-24 02:29:50 +0700
  • 3258a89b6e
    DRY out the runner lifecycle code (#12540) Daniel Hiltgen 2025-10-23 11:20:02 -0700
  • 1c093e97af
    kvcache: Remove special case for reservation mask Jesse Gross 2025-10-22 16:00:43 -0700
  • a8d9c2648e
    llamarunner: Record the time for all batches during prompt processing Jesse Gross 2025-10-16 16:27:45 -0700
  • 0334e67ffd
    tools: parse tool calls that don't conform to {"name": name, "arguments": args} (#12738) frob 2025-10-22 20:34:27 +0200
  • e0ead1adee
    embeddings: base64 encoding fix (#12715) nicole pardal 2025-10-22 11:27:44 -0700
  • d515aed6c3
    cloud: don't error sending empty messages (#12724) Patrick Devine 2025-10-21 18:12:14 -0700
  • 5fe7ba1b9b
    runner: always truncate embeddings requests (#12714) Jeffrey Morgan 2025-10-20 16:47:05 -0700
  • d2b63c19b3
    fs(ggml): fill in arch prefix if necessary (#12646) Michael Yang 2025-10-20 16:42:18 -0700
  • 94f110b35a
    model/parsers: remove warning for missing <think> tag for qwen3-vl (#12713) Jeffrey Morgan 2025-10-20 16:03:43 -0700
  • 5d22953ba7
    cuda: get driver version after props (#12707) Daniel Hiltgen 2025-10-20 10:57:27 -0700
  • d245dffed8
    rocm: give it more time to bootstrap (#12681) Daniel Hiltgen 2025-10-20 09:43:05 -0700
  • bc1a818fdc
    contiguous input per layer (#12686) Daniel Hiltgen 2025-10-17 18:39:18 -0700
  • ba2253dc30
    win: more verbose load failures (#12683) Daniel Hiltgen 2025-10-17 17:13:16 -0700
  • 68e04c7ff8
    test: harden scheduler tests (#12662) Daniel Hiltgen 2025-10-17 08:56:44 -0700
  • 270679932f
    cuda: tidy up CC settings (#12668) Daniel Hiltgen 2025-10-16 16:39:30 -0700
  • 65fb3ff49d
    renderers: add global flag for setting [img] tags (#12669) Jeffrey Morgan 2025-10-16 16:37:32 -0700
  • e2a0b24435
    Grace/qwen3 thinking (#12647) Grace 2025-10-16 15:29:41 -0700
  • 1813ff85a0
    cuda: bring back CC 5.2 (#12666) Daniel Hiltgen 2025-10-16 13:07:41 -0700
  • b531777a66
    test: add a few missing embedding models (#12661) Daniel Hiltgen 2025-10-16 09:36:25 -0700
  • fe3ec8dbf0
    Revert "Workaround broken NVIDIA iGPU free VRAM data (#12490)" (#12642) Daniel Hiltgen 2025-10-16 09:09:48 -0700
  • c744134287
    vulkan: Get FilterID from Backend for Vulkan (#12655) Thomas Stocker 2025-10-16 18:07:35 +0200
  • 4be41d2d45
    readme: add achatbot-go to community integrations (#12629) weedge 2025-10-16 12:54:15 +0800
  • de670570c9
    fs/ggml: fix function name in comment (#12630) zhetaicheleba 2025-10-16 13:53:38 +0900
  • 201d93716e
    Merge pull request #12651 from ollama/drifkin/oai-conversion Devon Rifkin 2025-10-15 21:10:30 -0700
  • 160cecc8e2
    openai: make tool call conversion fns public Devon Rifkin 2025-10-15 20:54:58 -0700
  • 8b6e5baee7
    CI: Set up temporary opt-out Vulkan support (#12614) Daniel Hiltgen 2025-10-15 14:18:01 -0700
  • 75d17fc6c2
    perf: backport cuda iGPU sched spin (#12641) Daniel Hiltgen 2025-10-15 11:52:14 -0700
  • 8fafc8af77
    ml/backend/ggml: NVML fallback for unified memory GPUs (#12619) Santosh Bhavani 2025-10-15 13:40:06 -0500
  • c3c85aa06c
    llm: Enable flash attention by default for gemma3 Jesse Gross 2025-10-15 10:22:03 -0700
  • 0d713051a2
    envconfig: default to port 443 when connecting to ollama.com (#12617) Jeffrey Morgan 2025-10-14 23:38:24 -0700
  • c4c5a4a01e
    types: send index for tool calls (#12625) Parth Sareen 2025-10-14 19:35:15 -0700
  • 3dcfd5f69e
    llm: Perform eviction when num_gpu is set with new estimates Jesse Gross 2025-10-14 17:21:16 -0700
  • 53a969d509
    Merge pull request #12621 from ollama/drifkin/any-of Devon Rifkin 2025-10-14 15:51:24 -0700
  • 08fbb60bb2
    qwen3-coder: support anyOf when parsing tool calls Devon Rifkin 2025-10-14 15:33:05 -0700
  • 850da848c5
    logs: fix bogus "0 MiB free" log line (#12590) Daniel Hiltgen 2025-10-14 11:26:28 -0700
  • 2aba569a2a
    Vulkan based on #9650 (#11835) Thomas Stocker 2025-10-14 19:59:58 +0200
  • fd8aa947f3
    Merge pull request #12562 from ollama/drifkin/registries Devon Rifkin 2025-10-14 02:01:53 -0700
  • ddaca643d0
    add registries for parsers/renderers Devon Rifkin 2025-10-14 01:13:54 -0700
  • 05982a95cb
    Qwen3VL Cloud Parser and Renderer (#12526) Grace 2025-10-13 16:52:33 -0700
  • 4987f13d34
    Llama cpp bump (df1b612): granite docling / mamba2 optimizations / multimodal encoding fixes (#12552) Gabe Goodhart 2025-10-13 16:26:18 -0600
  • e638f2acb6
    runner: fix shifting on llama runner (#12604) Jeffrey Morgan 2025-10-13 13:46:33 -0700
  • 18087f2ec7
    Revert "use llama runner for qwen3 (#12556)" Michael Yang 2025-10-13 13:21:06 -0700
  • 6c833d5f8d
    fix(qwen3): deepseek distill Michael Yang 2025-10-13 12:09:53 -0700
  • 6544e14735
    Reapply "add truncate and shift parameters" (#12582) Jeffrey Morgan 2025-10-11 16:06:14 -0700
  • 5db8a818a1
    Merge pull request #12581 from ollama/drifkin/renderer-api-generate Devon Rifkin 2025-10-11 14:10:23 -0700
  • 6db8da9958
    routes: fix built-in renderers for api/generate Devon Rifkin 2025-10-11 13:57:43 -0700
  • 0c68ec8d6a
    discover: fix typo (#12565) frob 2025-10-11 21:06:02 +0200
  • 70d9e363e1
    doc: remove AMD EOL GPUs (#12567) Daniel Hiltgen 2025-10-10 17:16:29 -0700
  • 1a2feb2a97
    ollamarunner: fix deadlock Michael Yang 2025-10-10 16:38:12 -0700
  • aab2190420
    implement nvml for linux (#12517) Daniel Hiltgen 2025-10-10 15:15:56 -0700
  • 629db9dc43
    comment split Michael Yang 2025-10-09 16:13:03 -0700
  • e0cd511661
    fix test Michael Yang 2025-10-07 16:46:37 -0700
  • 207332078f
    fix lint Michael Yang 2025-10-07 16:39:14 -0700
  • 93085127f4
    convert: slice gate_up weight Michael Yang 2025-10-06 16:05:38 -0700
  • c00fa9cc2b
    convert: split gate_up bias Michael Yang 2025-10-06 14:55:55 -0700
  • df411c4b02
    refactor: using testing.B.Loop yajianggroup 2025-09-23 16:05:59 +0800
  • 3d32249c74
    use llama runner for qwen3 (#12556) Jeffrey Morgan 2025-10-09 19:08:21 -0700
  • d681cd7c29
    thinking: allow "think": false for non-thinking models (#12555) Patrick Devine 2025-10-09 18:46:00 -0700
  • 47298fce39
    refactor: use builtin max and min shengxinjing 2025-09-28 23:06:33 +0100
  • 4a48937ef1
    refactor: use builtin max and min shengxinjing 2025-09-25 21:25:37 +0100