Michael Yang
5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
...
fix: environ lookup
2024-07-31 10:18:05 -07:00
royjhan
1b44d873e7
Add Metrics to api\embed response ( #5709 )
...
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* update tests
* test name
* list metrics
2024-07-30 13:12:21 -07:00
Daniel Hiltgen
345420998e
Prevent partial loading on mixed GPU brands
...
In mult-brand GPU setups, if we couldn't fully load the model we
would fall through the scheduler and mistakenly try to load across
a mix of brands. This makes sure we find the set of GPU(s) that
best fit for the partial load.
2024-07-30 11:00:55 -07:00
Michael Yang
079b2c3b03
Merge pull request #5999 from ollama/mxyng/fix-push
...
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany
750c1c55f7
server: fix race conditions during download ( #5994 )
...
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.
This commit is mostly a hot-fix and will be replaced by a new client one
day soon.
Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang
a622c47bd3
fix nil deref in auth.go
2024-07-26 14:14:48 -07:00
Michael Yang
ec4c35fe99
Merge pull request #5512 from ollama/mxyng/detect-stop
...
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
Michael Yang
15af558423
include modelfile messages
2024-07-26 11:40:11 -07:00
Blake Mizerany
c8af3c2d96
server: reuse original download URL for images ( #5962 )
...
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Josh
db0968f30c
fix dupe err message ( #5857 )
2024-07-22 15:48:15 -07:00
Michael Yang
85d9d73a72
comments
2024-07-22 11:49:03 -07:00
Michael Yang
1954ec5917
uint64
2024-07-22 11:49:02 -07:00
Michael Yang
0f1910129f
int
2024-07-22 11:30:07 -07:00
Michael Yang
8570c1c0ef
keepalive
2024-07-22 11:27:22 -07:00
Michael Yang
55cd3ddcca
bool
2024-07-22 11:27:21 -07:00
Michael Yang
66fe77f084
models
2024-07-22 11:26:12 -07:00
Michael Yang
d1a5227cad
origins
2024-07-22 11:25:30 -07:00
Michael Yang
35b89b2eab
rfc: dynamic environ lookup
2024-07-22 11:25:30 -07:00
Jeffrey Morgan
b3e5491e41
server: collect nested tool call objects when parsing ( #5824 )
2024-07-22 12:38:03 -04:00
Jeffrey Morgan
80ee9b5e47
Remove out of space test temporarily ( #5825 )
2024-07-21 00:22:11 -04:00
Daniel Hiltgen
06e5d74e34
Merge pull request #5506 from dhiltgen/sched_tests
...
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Jeffrey Morgan
69a2d4ccff
Fix generate test flakyness ( #5804 )
2024-07-19 19:11:25 -07:00
Josh
e8b954c646
server: validate template ( #5734 )
...
add template validation to modelfile
2024-07-19 15:24:29 -07:00
Michael Yang
43606d6d6a
fix parsing tool calls
2024-07-18 12:08:11 -07:00
Jeffrey Morgan
70b1010fa5
server: check for empty tools array too ( #5779 )
2024-07-18 11:44:57 -07:00
Jeffrey Morgan
319fb1ce03
server: only parse tool calls if tools are provided ( #5771 )
...
* server: only parse tool calls if tools are provided
* still set `resp.Message.Content`
2024-07-18 08:50:23 -07:00
Michael Yang
b255445557
marshal json automatically for some template values ( #5758 )
2024-07-17 15:35:11 -07:00
Michael Yang
5fd6988126
parse tool call as individual objects
2024-07-17 11:19:04 -07:00
Michael Yang
c279f96371
remove ToolCall from GenerateResponse
2024-07-16 15:22:49 -07:00
Michael Yang
499e87c9ba
Merge pull request #5730 from ollama/mxyng/cleanup
...
remove unneeded tool calls
2024-07-16 14:42:13 -07:00
Michael Yang
d290e87513
add suffix support to generate endpoint
...
this change is triggered by the presence of "suffix", particularly
useful for code completion tasks
2024-07-16 14:31:35 -07:00
Michael Yang
5a83f79afd
remove unneeded tool calls
2024-07-16 13:48:45 -07:00
royjhan
987dbab0b0
OpenAI: /v1/embeddings compatibility ( #5285 )
...
* OpenAI v1 models
* Empty List Testing
* Add back envconfig
* v1/models docs
* Remove Docs
* OpenAI batch embed compatibility
* merge conflicts
* integrate with api/embed
* ep
* merge conflicts
* request tests
* rm resp test
* merge conflict
* merge conflict
* test fixes
* test fn renaming
* input validation for empty string
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-16 13:36:08 -07:00
Michael Yang
a8388beb94
Merge pull request #5726 from ollama/mxyng/tools-templates
...
fix unmarshal type errors
2024-07-16 12:12:10 -07:00
Michael Yang
5afbb60fc4
fix unmarshal type errors
2024-07-16 11:39:34 -07:00
Jeffrey Morgan
4cb5d7decc
server: omit model system prompt if empty ( #5717 )
2024-07-16 11:09:00 -07:00
Michael Yang
4a565cbf94
add chat and generate tests with mock runner
2024-07-16 09:39:31 -07:00
Michael Yang
64039df6d7
Merge pull request #5284 from ollama/mxyng/tools
...
tools
2024-07-15 18:03:37 -07:00
Jeffrey Morgan
7ac6d462ec
server: return empty slice on empty /api/embed request ( #5713 )
...
* server: return empty slice on empty `/api/embed` request
* fix tests
2024-07-15 17:39:44 -07:00
Michael Yang
ef5136a745
tools test
2024-07-15 17:18:21 -07:00
Michael Yang
d02bbebb11
tools
2024-07-15 15:26:16 -07:00
royjhan
b9f5e16c80
Introduce /api/embed endpoint supporting batch embedding ( #5127 )
...
* Initial Batch Embedding
* Revert "Initial Batch Embedding"
This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"
This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
2024-07-15 12:14:24 -07:00
Patrick Devine
057d31861e
remove template ( #5655 )
2024-07-13 20:56:24 -07:00
jmorganca
f7ee012300
server: prepend system message in chat handler
2024-07-13 15:08:00 -07:00
Jeffrey Morgan
1ed0aa8fea
server: fix context, load_duration and total_duration fields ( #5676 )
...
* server: fix `contet`, `load_duration` and `total_duration` fields
* Update server/routes.go
2024-07-13 09:25:31 -07:00
Michael Yang
22c5451fc2
fix system prompt ( #5662 )
...
* fix system prompt
* execute template when hitting previous roles
* fix tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
Michael Yang
ebc529cbb3
autodetect stop parameters from template
2024-07-12 16:01:23 -07:00
Michael Yang
57ec6901eb
revert embedded templates to use prompt/response
...
This reverts commit 19753c18c0 .
for compat. messages will be added at a later date
2024-07-11 14:49:35 -07:00
Jeffrey Morgan
791650ddef
sched: only error when over-allocating system memory ( #5626 )
2024-07-11 00:53:12 -07:00
Michael Yang
41be28096a
add system prompt to first legacy template
2024-07-10 17:03:08 -07:00
Daniel Hiltgen
f4408219e9
Refine scheduler unit tests for reliability
...
This breaks up some of the test scenarios to create a
more reliable set of tests, as well as adding a little more
coverage.
2024-07-09 16:00:08 -07:00
Michael Yang
6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
...
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
...
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan
e4ff73297d
server: fix model reloads when setting OLLAMA_NUM_PARALLEL ( #5560 )
...
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`
* remove whitespace change
* undo some changes
2024-07-08 22:32:15 -07:00
Jeffrey Morgan
0ee87615c7
sched: don't error if paging to disk on Windows and macOS ( #5523 )
2024-07-06 22:01:52 -04:00
Michael Yang
fb6cbc02fb
update named templates
2024-07-05 16:29:32 -07:00
Michael Yang
ac7a842e55
fix model reloading
...
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Michael Yang
2c3fe1fd97
comments
2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2
update message processing
2024-07-05 13:16:58 -07:00
Daniel Hiltgen
af28b94533
Merge pull request #5469 from dhiltgen/prevent_system_oom
...
Prevent loading models larger than total memory
2024-07-05 08:22:20 -07:00
Anatoli Babenia
0d16eb310e
fix: use envconfig.ModelsDir directly ( #4821 )
...
* Co-authored-by: Anatoli Babenia <anatoli@rainforce.org>
Co-authored-by: Maas Lalani <maas@lalani.dev>
2024-07-03 15:36:11 -07:00
Daniel Hiltgen
955f2a4e03
Only set default keep_alive on initial model load
...
This change fixes the handling of keep_alive so that if client
request omits the setting, we only set this on initial load. Once
the model is loaded, if new requests leave this unset, we'll keep
whatever keep_alive was there.
2024-07-03 15:29:56 -07:00
Daniel Hiltgen
3c75113e37
Prevent loading models larger than total memory
...
Users may not realize the siny new model they're trying to load
fits on their disk, but can't load into system+GPU memory. Today
we crash, but with this fix, we'll give them a better error message
before even trying to load it.
2024-07-03 14:47:42 -07:00
Michael Yang
65a5040e09
fix generate template
2024-07-02 16:42:17 -07:00
royjhan
d626b99b54
OpenAI: v1/completions compatibility ( #5209 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Add Middleware for Chat and List
* Completions Endpoint
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* Rename function
* float types
* type cleanup
* cleaning
* more cleaning
* Extra test cases
* merge conflicts
* merge conflicts
* merge conflicts
* merge conflicts
* cleaning
* cleaning
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 16:01:45 -07:00
Michael Yang
dddb58a38b
Merge pull request #5051 from ollama/mxyng/capabilities
...
add model capabilities
2024-07-02 14:26:07 -07:00
Michael Yang
400056e154
Merge pull request #5420 from ollama/mxyng/insecure-path
...
err on insecure path
2024-07-02 14:03:23 -07:00
royjhan
996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility ( #5007 )
...
* OpenAI v1 models
* Refactor Writers
* Add Test
Co-Authored-By: Attila Kerekes
* Credit Co-Author
Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>
* Empty List Testing
* Use Namespace for Ownedby
* Update Test
* Add back envconfig
* v1/models docs
* Use ModelName Parser
* Test Names
* Remove Docs
* Clean Up
* Test name
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Add Middleware for Chat and List
* Testing Cleanup
* Test with Fatal
* Add functionality to chat test
* OpenAI: /v1/models/{model} compatibility (#5028 )
* Retrieve Model
* OpenAI Delete Model
* Retrieve Middleware
* Remove Delete from Branch
* Update Test
* Middleware Test File
* Function name
* Cleanup
* Test Update
* Test Update
---------
Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
Michael Yang
88bcd79bb9
err on insecure path
2024-07-01 15:55:59 -07:00
Michael Yang
da8e2a0447
use kvs to detect embedding models
2024-07-01 10:47:43 -07:00
Michael Yang
a30915bde1
add capabilities
2024-07-01 10:47:43 -07:00
Michael Yang
58e3fff311
rename templates to template
2024-07-01 10:40:54 -07:00
Michael Yang
3f0b309ad4
remove ManifestV2
2024-07-01 10:40:54 -07:00
Daniel Hiltgen
cff3f44f4a
Fix case for NumCtx
2024-07-01 09:43:59 -07:00
Daniel Hiltgen
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
...
Enable concurrency by default
2024-07-01 08:32:29 -07:00
Michael Yang
123a722a6f
zip: prevent extracting files into parent dirs ( #5314 )
2024-06-26 21:38:21 -07:00
Blake Mizerany
cb42e607c5
llm: speed up gguf decoding by a lot ( #5246 )
...
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a
not-okay amount of syscalls/disk I/O.
The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.
This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Daniel Hiltgen
642cee1342
Sort the ps output
...
Provide consistent ordering for the ps command - longest duration listed first
2024-06-21 15:59:41 -07:00
Daniel Hiltgen
9929751cc8
Disable concurrency for AMD + Windows
...
Until ROCm v6.2 ships, we wont be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Daniel Hiltgen
17b7186cd7
Enable concurrency by default
...
This adjusts our default settings to enable multiple models and parallel
requests to a single model. Users can still override these by the same
env var settings as before. Parallel has a direct impact on
num_ctx, which in turn can have a significant impact on small VRAM GPUs
so this change also refines the algorithm so that when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s). As before, multiple models will only load
concurrently if they fully fit in VRAM.
2024-06-21 15:45:05 -07:00
Michael Yang
e835ef1836
fix: quantization with template
2024-06-21 13:39:25 -07:00
royjhan
fedf71635e
Extend api/show and ollama show to return more model info ( #4881 )
...
* API Show Extended
* Initial Draft of Information
Co-Authored-By: Patrick Devine <pdevine@sonic.net>
* Clean Up
* Descriptive arg error messages and other fixes
* Second Draft of Show with Projectors Included
* Remove Chat Template
* Touches
* Prevent wrapping from files
* Verbose functionality
* Docs
* Address Feedback
* Lint
* Resolve Conflicts
* Function Name
* Tests for api/show model info
* Show Test File
* Add Projector Test
* Clean routes
* Projector Check
* Move Show Test
* Touches
* Doc update
---------
Co-authored-by: Patrick Devine <pdevine@sonic.net>
2024-06-19 14:19:02 -07:00
royjhan
89c79bec8c
Add ModifiedAt Field to /api/show ( #5033 )
...
* Add Mod Time to Show
* Error Handling
2024-06-15 20:53:56 -07:00
Daniel Hiltgen
45cacbaf05
Merge pull request #4517 from dhiltgen/gpu_incremental
...
Enhanced GPU discovery and multi-gpu support with concurrency
2024-06-14 15:35:00 -07:00
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
ff4f0cbd1d
Prevent multiple concurrent loads on the same gpus
...
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5
Reintroduce nvidia nvml library for windows
...
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
48702dd149
Harden unload for empty runners
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
5e8ff556cb
Support forced spreading for multi GPU
...
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one. This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
...
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Jeffrey Morgan
dd7c9ebeaf
server: longer timeout in TestRequests ( #5046 )
2024-06-14 09:48:25 -07:00
Patrick Devine
94618b2365
add OLLAMA_MODELS to envconfig ( #5029 )
2024-06-13 12:52:03 -07:00
Jeffrey Morgan
1fd236d177
server: remove jwt decoding error ( #5027 )
2024-06-13 11:21:15 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
...
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
515f497e6d
fix: skip removing layers that no longer exist
2024-06-10 11:32:19 -07:00
Michael Yang
b27268aaef
add test
2024-06-10 11:32:15 -07:00
Michael Yang
030e765e76
fix create model when template detection errors
2024-06-07 10:51:35 -07:00
Michael Yang
9b6c2e6eb6
detect chat template from KV
2024-06-06 16:03:47 -07:00
royjhan
1a29e9a879
API app/browser access ( #4879 )
...
* API app/browser access
* Add tauri (resolves #2291 , #4791 , #3799 , #4388 )
2024-06-06 15:19:03 -07:00
royjhan
4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps ( #4842 )
...
* Remove false time fields
* Struct Separation for List and Process
* Remove Marshaler
2024-06-06 10:11:45 -07:00
Blake Mizerany
de5beb06b3
server: skip blob verification for already verified blobs
2024-06-05 16:39:11 -07:00
Michael Yang
d61ef8b954
update create handler to use model.Name
2024-06-04 13:28:25 -07:00
Michael Yang
6297f85606
gofmt, goimports
2024-06-04 13:20:24 -07:00
Michael Yang
8ce4032e72
more lint
2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f
some gocritic
2024-06-04 11:13:30 -07:00
Michael Yang
8ffb51749f
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
04f3c12bb7
replace x/exp/slices with slices
2024-06-04 11:13:30 -07:00
Michael Yang
96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
...
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang
bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
...
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Michael Yang
6adca97f37
Merge pull request #4619 from noxer/patch-1
...
Fix download retry issue
2024-05-24 17:21:57 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1
Fix download retry issue
2024-05-24 20:30:42 +02:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now ( #4580 )
...
* put flash attention behind flag for now
* add test
* remove print
* up timeout for sheduler tests
2024-05-22 21:52:09 -07:00
Sang Park
4434d7f447
Correct typo in error message ( #4535 )
...
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
...
return on part done
2024-05-15 17:16:24 -07:00
Patrick Devine
d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models ( #4461 )
2024-05-15 15:43:16 -07:00
Patrick Devine
f2cf97d6f1
fix typo in modelfile generation ( #4439 )
2024-05-14 15:34:29 -07:00
Michael Yang
85a57006d1
check if name exists before create/pull/copy
2024-05-14 14:58:58 -07:00
Michael Yang
c5e892cb3e
update tests
2024-05-14 14:56:31 -07:00
Michael Yang
81fb06f530
more resilient Manifests
2024-05-14 14:08:24 -07:00
Michael Yang
a385382ff5
filepath.Join
2024-05-14 14:08:24 -07:00
Michael Yang
b8772a353f
remove DeleteModel
2024-05-14 14:08:24 -07:00
Michael Yang
c2714fcbfd
routes: use Manifests for ListHandler
2024-05-14 14:08:24 -07:00
Michael Yang
a2fc933fed
update delete handler to use model.Name
2024-05-14 14:08:24 -07:00
Michael Yang
ac145f75ca
return on part done
2024-05-14 13:04:30 -07:00
Ryo Machida
798b107f19
Fixed the API endpoint /api/tags when the model list is empty. ( #4424 )
...
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Daniel Hiltgen
ec231a7923
Remove VRAM convergence check for windows
...
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine
7ca71a6b0f
don't abort when an invalid model name is used in /save ( #4416 )
2024-05-13 18:48:28 -07:00
Patrick Devine
6845988807
Ollama ps command for showing currently loaded models ( #4327 )
2024-05-13 17:17:36 -07:00
jmorganca
4ec7445a6f
Revert "use post token"
...
This reverts commit 0fec3525ad .
2024-05-11 22:19:14 -07:00
Michael Yang
0fec3525ad
use post token
2024-05-11 19:13:16 -07:00
Daniel Hiltgen
824ee5446f
Fix envconfig unit test
2024-05-10 16:49:48 -07:00
Daniel Hiltgen
4142c3ef7c
Always use the sorted list of GPUs
...
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
Jeffrey Morgan
6602e793c0
Use --quantize flag and quantize api parameter ( #4321 )
...
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Jeffrey Morgan
bb6fd02298
Don't clamp ctx size in PredictServerFit ( #4317 )
...
* dont clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
2024-05-10 10:17:12 -07:00
Michael Yang
e03637176d
fix(routes): skip bad manifests
2024-05-10 08:46:11 -07:00
Jeffrey Morgan
302d7fdbf3
prune partial downloads ( #4272 )
2024-05-09 16:35:20 -07:00
Daniel Hiltgen
3ae2f441e0
Fix race in shutdown logic
...
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Daniel Hiltgen
354ad9254e
Wait for GPU free memory reporting to converge
...
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Bruce MacDonald
cfa84b8470
add done_reason to the api ( #4235 )
2024-05-09 13:30:14 -07:00
Michael Yang
a7ee84fc31
routes: skip invalid filepaths
2024-05-09 11:23:22 -07:00
Jeffrey Morgan
d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale ( #1983 )
2024-05-09 09:06:13 -07:00
Bruce MacDonald
cef45feaa4
Add preflight OPTIONS handling and update CORS config ( #4086 )
...
* Add preflight OPTIONS handling and update CORS config
- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang
b25976aeb8
routes: fix show llava models
2024-05-08 12:43:36 -07:00
Michael Yang
88cf154483
Merge pull request #4244 from ollama/mxyng/skip-if-same
...
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald
8cbd3e7510
skip hidden files in list models handler ( #4247 )
2024-05-07 19:01:45 -07:00
Michael Yang
eeb695261f
skip if same quantization
2024-05-07 17:44:19 -07:00
Bruce MacDonald
dc9b1111e0
fix invalid destination error message
2024-05-07 17:35:52 -07:00
Michael Yang
ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
...
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang
1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
...
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Michael Yang
548a7df014
update list handler to use model.Name
2024-05-07 09:38:45 -07:00
Jeffrey Morgan
39d9d22ca3
close server on receiving signal ( #4213 )
2024-05-06 16:01:37 -07:00
Michael Yang
b2f00aa977
close zip files
2024-05-06 15:27:19 -07:00
Michael Yang
f5e8b207fb
s/DisplayLongest/String/
2024-05-06 15:24:01 -07:00
Michael Yang
d245460362
only quantize language models
2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383
no iterator
2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d
rebase
2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a
comments
2024-05-06 15:24:01 -07:00
Michael Yang
a7248f6ea8
update tests
2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
...
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Daniel Hiltgen
0963c65027
Merge pull request #4208 from dhiltgen/fix_sched_test
...
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan
c9f98622b1
Skip scheduling cancelled requests, always reload unloaded runners ( #4189 )
2024-05-06 14:22:24 -07:00
Daniel Hiltgen
0a954e5066
Fix stale test logic
...
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Jeffrey Morgan
dfa2f32ca0
unload in critical section ( #4187 )
2024-05-05 17:18:27 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
...
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
Jeffrey Morgan
942c979232
allocate a large enough kv cache for all parallel requests ( #4162 )
2024-05-05 15:59:32 -07:00
Patrick Devine
2a21363bb7
validate the format of the digest when getting the model path ( #4175 )
2024-05-05 11:46:12 -07:00
Daniel Hiltgen
20f6c06569
Make maximum pending request configurable
...
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Michael Yang
b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
...
rename parser to model/file
2024-05-03 13:01:22 -07:00
Daniel Hiltgen
9a32c514cb
Soften timeouts on sched unit tests
...
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang
e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
...
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
5b806d8d24
Merge pull request #4089 from ollama/mxyng/target-invalid
...
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang
45b6a12e45
server: target invalid
2024-05-01 12:40:45 -07:00
Mark Ward
63c763685f
log when the waiting for the process to stop to help debug when other tasks execute during this wait.
...
expire timer clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
2024-05-01 18:51:10 +00:00
Mark Ward
f4a73d57a4
fix runner expire during active use. Clearing the expire timer as it is used. Allowing the finish to assign an expire timer so that the runner will expire after no use.
2024-05-01 18:51:10 +00:00
Michael Yang
119589fcb3
rename parser to model/file
2024-05-01 09:53:50 -07:00
Michael Yang
9cf0f2e973
use parser.Format instead of templating modelfile
2024-05-01 09:52:54 -07:00
Michael Yang
c0a00f68ae
refactor modelfile parser
2024-05-01 09:52:54 -07:00
Bruce MacDonald
0a7fdbe533
prompt to display and add local ollama keys to account ( #3717 )
...
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
2024-04-30 11:02:08 -07:00
Jeffrey Morgan
586672f490
fix copying model to itself ( #4019 )
2024-04-28 23:47:49 -04:00
Daniel Hiltgen
d6e3b64582
Fix concurrency for CPU mode
...
Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads. This adds that back, along with test coverage.
This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.
2024-04-28 13:42:39 -07:00
Jeffrey Morgan
bb31def011
return code 499 when user cancels request while a model is loading ( #3955 )
2024-04-26 17:38:29 -04:00
Blake Mizerany
37f9c8ad99
types/model: overhaul Name and Digest types ( #3924 )
2024-04-26 13:08:32 -07:00
Daniel Hiltgen
9b5a3c5991
Merge pull request #3914 from dhiltgen/mac_perf
...
Improve mac parallel performance
2024-04-25 16:28:31 -07:00
Jeffrey Morgan
00b0699c75
Reload model if num_gpu changes ( #3920 )
...
* reload model if `num_gpu` changes
* dont reload on -1
* fix tests
2024-04-25 19:02:40 -04:00
Daniel Hiltgen
b123be5b71
Adjust context size for parallelism
2024-04-25 13:58:54 -07:00
Daniel Hiltgen
f503a848c2
Merge pull request #3895 from brycereitano/shiftloading
...
Move ggml loading to when attempting to fit
2024-04-25 09:24:08 -07:00
Bryce Reitano
36a6daccab
Restructure loading conditional chain
2024-04-24 17:37:03 -06:00
Bryce Reitano
ceb0e26e5e
Provide variable ggml for TestLoad
2024-04-24 17:19:55 -06:00
Bryce Reitano
284e02bed0
Move ggml loading to when we attempt fitting
2024-04-24 17:17:24 -06:00
Michael Yang
592dae31c8
update copy to use model.Name
2024-04-24 15:54:54 -07:00
Daniel Hiltgen
d8851cb7a0
Harden sched TestLoad
...
Give the go routine a moment to deliver the expired event
2024-04-23 16:14:47 -07:00
Daniel Hiltgen
34b9db5afc
Request and model concurrency
...
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Cheng
62be2050dd
chore: use errors.New to replace fmt.Errorf will much better ( #3789 )
2024-04-20 22:11:06 -04:00
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create ( #3607 )
2024-04-15 11:26:42 -07:00
Jeffrey Morgan
a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading ( #3653 )
...
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading
* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Blake Mizerany
a7b431e743
server: provide helpful workaround hint when stalling on pull ( #3584 )
...
This is a quick fix to help users who are stuck on the "pull" step at
99%.
In the near future we're introducing a new registry client that
should/will hopefully be smarter. In the meantime, this should unblock
the users hitting issue #1736 .
2024-04-10 16:24:37 -07:00
Michael Yang
9502e5661f
cgo quantize
2024-04-08 15:31:08 -07:00
Michael Yang
e1c9a2a00f
no blob create if already exists
2024-04-08 15:09:48 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion ( #3422 )
2024-04-01 16:14:53 -07:00
Michael Yang
91b3e4d282
update memory calcualtions
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion ( #3250 )
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Michael Yang
af8a8a6b59
fix: trim quotes on OLLAMA_ORIGINS
2024-03-27 15:24:29 -07:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama ( #3347 )
2024-03-26 13:04:17 -07:00
Daniel Hiltgen
949b6c01e0
Revamp go based integration tests
...
This uplevels the integration tests to run the server which can allow
testing an existing server, or a remote server.
2024-03-23 14:24:18 +01:00
Blake Mizerany
703684a82a
server: replace blob prefix separator from ':' to '-' ( #3146 )
...
This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
Patrick Devine
47cfe58af5
Default Keep Alive environment variable ( #3094 )
...
---------
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Daniel Hiltgen
4a5c9b8035
Finish unwinding idempotent payload logic
...
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan
5b3fad9636
separate out isLocalIP
2024-03-09 00:22:08 -08:00
Jeffrey Morgan
bfec2c6e10
simplify host checks
2024-03-08 23:29:53 -08:00
Jeffrey Morgan
5c143af726
add additional allowed hosts
2024-03-08 23:23:59 -08:00
Jeffrey Morgan
fc8c044584
add allowed host middleware and remove workDir middleware ( #3018 )
2024-03-08 22:23:47 -08:00
Michael Yang
76bdebbadf
decode ggla
2024-03-08 15:46:25 -08:00
Bruce MacDonald
0cebc79cba
fix: allow importing a model from name reference ( #3005 )
2024-03-08 12:27:47 -05:00
Jeffrey Morgan
fc06205971
Revert "adjust download and upload concurrency based on available bandwidth" ( #2995 )
2024-03-07 18:10:16 -08:00
Daniel Hiltgen
6c5ccb11f9
Revamp ROCm support
...
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCms tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on windows. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
Michael Yang
2e20110e50
Merge pull request #2221 from ollama/mxyng/up-down-ccy
...
adjust download and upload concurrency based on available bandwidth
2024-03-07 09:27:33 -08:00
Patrick Devine
2c017ca441
Convert Safetensors to an Ollama model ( #2824 )
2024-03-06 21:01:51 -08:00
Jeffrey Morgan
3b4bab3dc5
Fix embeddings load model behavior ( #2848 )
2024-02-29 17:40:56 -08:00
Michael Yang
0e19476b56
prepend image tags ( #2789 )
...
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
Michael Yang
084d846621
refactor
2024-02-21 13:42:48 -08:00
Michael Yang
6a4b994433
lint
2024-02-21 13:42:48 -08:00
Michael Yang
bea007deb7
use LimitGroup for uploads
2024-02-21 13:42:48 -08:00
Michael Yang
074934be03
adjust group limit based on download speed
2024-02-21 13:42:48 -08:00
Michael Yang
0de12368a0
add new LimitGroup for dynamic concurrency
2024-02-21 13:42:48 -08:00
Michael Yang
917bd61084
refactor download run
2024-02-21 13:42:46 -08:00
Jeffrey Morgan
287ba11500
better error message when calling /api/generate or /api/chat with embedding models
2024-02-20 21:53:45 -05:00
Jeffrey Morgan
63861f58cc
Support for bert and nomic-bert embedding models
2024-02-20 21:37:29 -05:00
Michael Yang
210b65268e
replace strings buffer with hasher ( #2437 )
...
the buffered value is going into the hasher eventually so write directly
to the hasher instead
2024-02-20 19:07:50 -05:00
Michael Yang
897b213468
use http.DefaultClient ( #2530 )
...
default client already handles proxy
2024-02-20 18:34:47 -05:00
Bruce MacDonald
88622847c6
fix: chat system prompting overrides ( #2542 )
2024-02-16 14:42:43 -05:00
Michael Yang
e43648afe5
rerefactor
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
f397e0e988
Move hub auth out to new package
2024-02-15 05:56:45 +00:00
Jeffrey Morgan
48a273f80b
Fix issues with templating prompt in chat mode ( #2460 )
2024-02-12 15:06:57 -08:00
Jeffrey Morgan
1f9078d6ae
Check image filetype in api handlers ( #2467 )
2024-02-12 11:16:20 -08:00
Jeffrey Morgan
a0a199b108
Fix hanging issue when sending empty content ( #2399 )
2024-02-07 19:30:33 -05:00
Jeffrey Morgan
453f572f83
Initial OpenAI /v1/chat/completions API compatibility ( #2376 )
2024-02-07 17:24:29 -05:00
Michael Yang
e805ac1d59
fix response on token error
2024-02-07 11:05:49 -08:00
Michael Yang
bfbf2f7cf7
Merge pull request #2296 from ollama/mxyng/img-tags
...
append image tags to user content
2024-02-01 13:16:59 -08:00
Michael Yang
3d6f48507a
structured debug prompt
2024-02-01 11:56:28 -08:00
Michael Yang
f3761405c8
use image id
2024-02-01 11:52:42 -08:00
Michael Yang
e49dc9f3d8
fix tests
2024-02-01 11:48:11 -08:00
Michael Yang
d125510b4b
remove image tags
2024-02-01 11:32:51 -08:00
Michael Yang
fb56988014
account for image projection in token count
2024-02-01 09:50:48 -08:00
Michael Yang
d046bee790
use llm.ImageData for chat
2024-01-31 19:18:25 -08:00
Jeffrey Morgan
f11bf0740b
use llm.ImageData
2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6
trim images
2024-01-31 19:13:47 -08:00
Michael Yang
b4e11be8ef
append image tags to user content
2024-01-31 19:13:10 -08:00
Bruce MacDonald
a896079705
preserve last system message from modelfile ( #2289 )
2024-01-31 21:45:01 -05:00
Michael Yang
8ac08a0eec
update slog handler options
...
- consistent format by using text handler for debug and non-debug
- truncate source file to just the file name
2024-01-31 15:15:00 -08:00
Michael Yang
c8b1f2369e
remove unnecessary parse raw
2024-01-30 17:00:53 -08:00
Bruce MacDonald
0632dff3f8
trim chat prompt based on llm context size ( #1963 )
2024-01-30 15:59:29 -05:00
Jeffrey Morgan
f2245c7c77
print prompt with OLLAMA_DEBUG=1 ( #2245 )
2024-01-28 15:22:35 -08:00
Jeffrey Morgan
e4b9b72f2a
Do not repeat system prompt for chat templating ( #2241 )
2024-01-28 14:15:56 -08:00
Patrick Devine
b5cf31b460
add keep_alive to generate/chat/embedding api endpoints ( #2146 )
2024-01-26 14:28:02 -08:00
Michael Yang
9d3dcfd0ec
fix logging
2024-01-26 11:04:27 -08:00
Michael Yang
6e0ea5ecc8
Merge pull request #1916 from ollama/mxyng/inactivity-monitor
...
download: add inactivity monitor
2024-01-26 10:56:00 -08:00
Patrick Devine
7c40a67841
Save and load sessions ( #2063 )
2024-01-25 12:12:36 -08:00
Michael Yang
c08dfaa23d
fix: remove overwritten model layers
...
if create overrides a manifest, first add the older manifest's layers to
the delete map so they can be cleaned up
2024-01-19 14:58:37 -08:00
Michael Yang
aac9ab4db7
fix show handler
2024-01-18 15:36:50 -08:00
Michael Yang
745b5934fa
add model to ModelResponse
2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828
api: add model for all requests
...
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Daniel Hiltgen
fedd705aea
Mechanical switch from log to slog
...
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Michael Yang
96cfb62641
fix: normalize name path before splitting
2024-01-16 16:48:29 -08:00
Patrick Devine
eef50accb4
Fix show parameters ( #2017 )
2024-01-16 10:34:44 -08:00
Michael Yang
27331ae3a8
download: add inactivity monitor
...
if a download part is inactive for some time, restart it
2024-01-12 15:23:15 -08:00
Michael Yang
cf29bd2d72
fix: request retry with error
...
this fixes a subtle bug with makeRequestWithRetry where an HTTP status
error on a retried request will potentially not return the right err
2024-01-12 13:32:27 -08:00
Michael Yang
2b9892a808
fix(windows): modelpath and list
2024-01-09 09:36:58 -08:00
Michael Yang
2bb2bdd5d4
fix lint
2024-01-09 09:36:58 -08:00
Michael Yang
acfc376efd
add .golangci.yaml
2024-01-09 09:36:58 -08:00
Bruce MacDonald
7e8f7c8358
remove ggml automatic re-pull ( #1856 )
2024-01-08 14:41:01 -05:00
Michael Yang
0101e76dbe
Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05
...
fix: allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 17:20:09 -08:00
Patrick Devine
22e93efa41
add show info command and fix the modelfile
2024-01-05 12:20:05 -08:00
Nicholas Dudfield
8baaaa39c0
Allow extension origins (still needs explicit listing), fixes #1686
2024-01-05 09:06:47 +07:00
Bruce MacDonald
4ad6c9b11f
fix: pull either original model or from model on create ( #1774 )
2024-01-04 01:34:38 -05:00
Bruce MacDonald
0b3118e0af
fix: relay request opts to loaded llm prediction ( #1761 )
2024-01-03 12:01:42 -05:00
Daniel Hiltgen
697bea6939
Guard integration tests with a tag
...
This should help CI avoid running the integration test logic in a
container where it's not currently possible.
2023-12-22 16:33:27 -08:00
Bruce MacDonald
db356c8519
post-response templating ( #1427 )
2023-12-22 17:07:05 -05:00
Daniel Hiltgen
96fb441abd
Merge pull request #1146 from dhiltgen/ext_server_cgo
...
Add cgo implementation for llama.cpp
2023-12-22 08:16:31 -08:00
Michael Yang
63aac0edc5
fix(test): use real version string for comparison
2023-12-19 15:03:02 -08:00
Daniel Hiltgen
51082535e1
Add automated test for multimodal
...
A simple test case that verifies llava:7b can read text in an image
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
35934b2e05
Adapted rocm support to cgo based llama.cpp
2023-12-19 09:05:46 -08:00
Daniel Hiltgen
d4cd695759
Add cgo implementation for llama.cpp
...
Run the server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
Bruce MacDonald
5e7fd6906f
Update images.go
2023-12-19 09:05:46 -08:00
Bruce MacDonald
811b1f03c8
deprecate ggml
...
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-19 09:05:46 -08:00
Bruce MacDonald
d99fa6ce0a
send empty messages on last chat response ( #1530 )
2023-12-18 14:23:38 -05:00
Patrick Devine
3948c6ea06
add magic header for unit tests ( #1558 )
2023-12-18 10:41:02 -08:00
Patrick Devine
86b0dd4b16
add API create/copy handlers ( #1541 )
2023-12-15 11:59:18 -08:00
Patrick Devine
0174665d0e
add API tests for list handler ( #1535 )
2023-12-14 18:18:25 -08:00