Commit Graph

489 Commits

Author SHA1 Message Date
Jeffrey Morgan
93e45f0f0d
docs: temporarily restore api.md and cleanup docs paths (#12818) 2025-10-28 23:25:48 -07:00
Jeffrey Morgan
a342160803
docs: fix root api documentation page (#12813) 2025-10-28 19:17:54 -07:00
Jeffrey Morgan
f6c29409dc
docs: add new cloud model + fix openai redirect (#12812) 2025-10-28 19:09:07 -07:00
Parth Sareen
d828517e78
docs: update readme and links (#12809) 2025-10-28 16:20:02 -07:00
Parth Sareen
3d99d9779a
docs: add docs for docs.ollama.com (#12805) 2025-10-28 13:18:48 -07:00
Parth Sareen
6d02a43a75
docs: rename to mdx to setup docs site (#12804) 2025-10-28 13:04:31 -07:00
Parth Sareen
5483497d7a
Revert "docs: add reference to docs.ollama.com (#12800)" (#12803)
This reverts commit 934dd9e196.
2025-10-28 12:52:49 -07:00
Parth Sareen
934dd9e196
docs: add reference to docs.ollama.com (#12800) 2025-10-28 12:44:02 -07:00
Daniel Hiltgen
270679932f
cuda: tidy up CC settings (#12668)
8.7 is Jetpack only, so no need on x86 builds
10.3 covers [G]B300
2025-10-16 16:39:30 -07:00
Daniel Hiltgen
70d9e363e1
doc: remove AMD EOL GPUs (#12567) 2025-10-10 17:16:29 -07:00
Daniel Hiltgen
303be9304c
docs: improve accuracy of LLM library docs (#12530) 2025-10-07 16:21:07 -07:00
Daniel Hiltgen
bd15eba4e4
Bring back escape valve for llm libraries and fix Jetpack6 crash (#12529)
* Bring back escape valve for llm libraries

If the new discovery logic picks the wrong library, this gives users the
ability to force a specific one using the same pattern as before. This
can also potentially speed up bootstrap discovery if one of the libraries
takes a long time to load and ultimately bind to no devices.  For example
unsupported AMD iGPUS can sometimes take a while to discover and rule out.

* Bypass extra discovery on jetpack systems

On at least Jetpack6, cuda_v12 appears to expose the iGPU, but crashes later on in
cublasInit so if we detect a Jetpack, short-circuit and use that variant.
2025-10-07 16:06:14 -07:00
Daniel Hiltgen
c68f367ef6
Update GGML to b6646 (#12245)
Notable EOLs with this change:
- MacOS v12 and v13 are no longer supported (v14+ required)
- AMD gfx900 and gfx906 are no longer supported
2025-10-02 14:47:10 -07:00
Daniel Hiltgen
bc8909fb38
Use runners for GPU discovery (#12090)
This revamps how we discover GPUs in the system by leveraging the Ollama
runner.  This should eliminate inconsistency between our GPU discovery and the
runners capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs.  Now the runner does that implicitly based on the actual
device list.  In some cases free VRAM reporting can be unreliable which can
leaad to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.
2025-10-01 15:12:32 -07:00
jmorganca
af060eb250 docs: update cloud.md for cloud models 2025-09-22 13:09:17 -03:00
jmorganca
ae5c33008e docs: move turbo.md to cloud.md 2025-09-22 13:09:17 -03:00
Daniel Hiltgen
93c64ea1b1
doc: show how to clear the cgo cache (#12298) 2025-09-15 15:45:35 -07:00
Michael Yang
feb18cd710
feat: add dimensions field to embed requests (#12242)
* feat: add field to truncate embeddings

* add openai embeddings for dimensions
2025-09-11 10:36:10 -07:00
Daniel Hiltgen
17a023f34b
Add v12 + v13 cuda support (#12000)
* Add support for upcoming NVIDIA Jetsons

The latest Jetsons with JetPack 7 are moving to an SBSA compatible model and
will not require building a JetPack specific variant.

* cuda: bring back dual versions

This adds back dual CUDA versions for our releases,
with v11 and v13 to cover a broad set of GPUs and
driver versions.

* win: break up native builds in build_windows.ps1

* v11 build working on windows and linux

* switch to cuda v12.8 not JIT

* Set CUDA compression to size

* enhance manual install linux docs
2025-09-10 12:05:18 -07:00
Daniel Hiltgen
950d33aa30
docs: show how to debug nvidia init failures (#12216)
This debug setting can help troubleshoot obscure initialization failures.
2025-09-08 11:39:00 -07:00
Thomas Pelster
883d031268
docs: added missing comma in 'Ollama's Javascript library'' (#11915) 2025-08-15 14:45:01 -07:00
Daniel Hiltgen
7ccfd97a93
doc: clarify both rocm and main bundle necessary (#11900)
Some users expect the rocm bundles to be self-sufficient, but are designed to be additive.
2025-08-14 12:54:55 -07:00
Patrick Devine
44bc36d063
docs: update the faq (#11760) 2025-08-06 16:55:57 -07:00
Gao feng
8a75e9ee15
Update downloading to pulling in api.md (#11170)
update api.md to make it consist with code.
https://github.com/ollama/ollama/blob/main/server/download.go#L447
2025-08-06 11:33:09 -07:00
Parth Sareen
4742e12c23
docs: update turbo model name (#11707) 2025-08-05 17:29:08 -07:00
Jeffrey Morgan
ee92ca3e1d
docs: add docs for Ollama Turbo (#11687) 2025-08-05 13:09:10 -07:00
Yoshi
3515cc377c
docs: fix typos and remove trailing whitespaces (#11554) 2025-07-28 11:19:13 -07:00
ycomiti
4151ef8cf7
Update linux.md (#11462) 2025-07-22 11:17:31 -07:00
frob
802ad16ce4
docs: add the no-Modelfile function of ollama create (#9077) 2025-07-16 22:16:10 -07:00
Marcelo Fornet
2e3fd86d48
docs: fix typo in macos.md (#11425) 2025-07-16 10:50:46 -07:00
先知
4261a3b0b2
docs: update modelfile.md to reflect current default num_ctx (#11189)
As in the commit 44b466eeb2, the default context length has been increased to 4096.
2025-07-11 15:15:00 -07:00
Daniel Hiltgen
66fb8575ce
doc: add MacOS docs (#11334)
also removes stale model dir instructions for windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen
20c3266e94
Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the paralellism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-07-08 12:08:37 -07:00
Parth Sareen
43107b15b9
add tool_name to api.md (#11326) 2025-07-07 16:53:13 -07:00
Parth Sareen
1f91cb0c8c
template: add tool result compatibility (#11294) 2025-07-07 15:53:42 -07:00
Daniel Hiltgen
9d60bb44cf
doc: add NVIDIA blackwell to supported list (#11307) 2025-07-05 16:06:30 -07:00
Daniel Hiltgen
1c6669e64c
Re-remove cuda v11 (#10694)
* Re-remove cuda v11

Revert the revert - drop v11 support requiring drivers newer than Feb 23

This reverts commit c6bcdc4223.

* Simplify layout

With only one version of the GPU libraries, we can simplify things down somewhat.  (Jetsons still require special handling)

* distinct sbsa variant for linux arm64

This avoids accidentally trying to load the sbsa cuda libraries on
a jetson system which results in crashes.

* temporary prevent rocm+cuda mixed loading
2025-06-23 14:07:00 -07:00
Jeffrey Morgan
8bcb3125c1
benchmark: remove unused benchmark test (#11120)
Removes a test under benchmark/ that is unused
2025-06-18 12:58:50 -07:00
Krzysztof Jeziorny
fc0309615e
docs: update link to AMD drivers in linux.md (#10973) 2025-06-06 23:30:04 -04:00
Jeffrey Morgan
09d308d6b6
Revert "server: add model capabilities to the list endpoint (#10174)" (#11004)
This reverts commit 0943001193.
2025-06-06 23:29:14 -04:00
Hunter Wittenborn
c6a6d7294d
docs: fix typo in development.md (#10998) 2025-06-06 12:07:29 -04:00
JasonHonKL
0943001193
server: add model capabilities to the list endpoint (#10174) 2025-06-04 11:39:48 -07:00
Devon Rifkin
5f57b0ef42
add thinking support to the api and cli (#10584)
- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
2025-05-28 19:38:52 -07:00
frob
6623898198
docs: remove unsupported quantizations (#10842) 2025-05-24 13:17:26 -07:00
Daniel Hiltgen
c6bcdc4223
Revert "remove cuda v11 (#10569)" (#10692)
Bring back v11 until we can better warn users that their driver
is too old.

This reverts commit fa393554b9.
2025-05-13 13:12:54 -07:00
Daniel Hiltgen
9d6df90805
Follow up to #10363 (#10647)
The quantization PR didn't block all unsupported file types,
which this PR fixes.  It also updates the API docs to reflect
the now reduced set of supported types.
2025-05-12 15:23:31 -07:00
Jeffrey Morgan
fa9973cd7f
api: remove unused sampling parameters (#10581) 2025-05-08 08:31:08 -07:00
Daniel Hiltgen
fa393554b9
remove cuda v11 (#10569)
This reduces the size of our Windows installer payloads by ~256M by dropping
support for nvidia drivers older than Feb 2023.  Hardware support is unchanged.

Linux default bundle sizes are reduced by ~600M to 1G.
2025-05-06 17:33:19 -07:00
Jeffrey Morgan
3b2d2c8326
api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-05-05 14:54:40 -07:00
Devon Rifkin
44b466eeb2 config: update default context length to 4096 2025-04-28 17:03:27 -07:00