Commit Graph

658 Commits

likelovewant
63a5f509ed remove official support arches to reduce size 2024-08-02 13:30:46 +08:00
likelovewant
ca4c0c1a8f Merge branch 'ollama:main' into main 2024-08-02 09:28:09 +08:00
Michael Yang
0ff42e84b0 Merge pull request #4756 from ollama/mxyng/convert2
refactor convert
2024-08-01 14:16:30 -07:00
likelovewant
0d4292b4b1 Merge branch 'ollama:main' into main 2024-08-01 18:30:28 +08:00
Michael Yang
df993fa37b comments 2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang
0f3271db88 patches: phi3 default sliding window attention 2024-07-31 14:58:34 -07:00
Michael Yang
6b252918fb update convert test to check result data 2024-07-31 10:59:38 -07:00
Michael Yang
5c1912769e Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
likelovewant
1eb1dc32d2 Merge branch 'ollama:main' into main 2024-07-31 14:52:26 +08:00
likelovewant
ad5ad895fb fix 2024-07-31 13:37:19 +08:00
jmorganca
afa8d6e9d5 patch gemma support 2024-07-30 18:07:29 -07:00
royjhan
1b44d873e7 Add Metrics to api/embed response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
likelovewant
fc296fd744 Remove llm/llama.cpp from Git index 2024-07-30 22:37:32 +08:00
likelovewant
e628246970 Restore llama.cpp from commit 6eeaeba 2024-07-30 20:43:59 +08:00
likelovewant
776aa9ceb2 resolve merge conflicts 2024-07-30 18:53:59 +08:00
Jeffrey Morgan
68ee42f995 update llama.cpp submodule to 6eeaeba1 (#6039) 2024-07-29 13:20:26 -07:00
Tibor Schmidt
f3d7a481b7 feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
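
min_p sampling, referenced in the commit above, keeps only tokens whose probability is at least min_p times that of the most likely token, so the cutoff adapts to the model's confidence. A minimal sketch of the idea in Go (hypothetical helper, not the actual sampler code):

```go
package main

import "fmt"

// minPFilter returns the indices of tokens whose probability is at least
// minP times the probability of the most likely token. Sketch of the
// min_p idea only; not ollama's sampler implementation.
func minPFilter(probs []float64, minP float64) []int {
	maxProb := 0.0
	for _, p := range probs {
		if p > maxProb {
			maxProb = p
		}
	}
	threshold := minP * maxProb
	var kept []int
	for i, p := range probs {
		if p >= threshold {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	probs := []float64{0.5, 0.3, 0.15, 0.04, 0.01}
	// With min_p = 0.1 the cutoff is 0.05, keeping the first three tokens.
	fmt.Println(minPFilter(probs, 0.1)) // [0 1 2]
}
```
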
likelovewant
91ba40fc45 Merge branch 'ollama:main' into main 2024-07-27 12:18:55 +08:00
Jeffrey Morgan
f2a96c7d77 llm: keep patch for llama 3 rope factors (#5987) 2024-07-26 15:20:52 -07:00
likelovewant
86a1575ee3 fix api 2024-07-23 14:57:33 +08:00
likelovewant
fbfc13b6ca Merge branch 'ollama:main' into main 2024-07-23 14:49:32 +08:00
Daniel Hiltgen
e12fff8810 Enable windows error dialog for subprocess startup
Make sure that if something goes wrong spawning the process, the user
gets enough info to be able to self-correct, or at least file a bug with
details so we can fix it. Once the process starts, we immediately change
back to the recommended setting to prevent the blocking dialog. This
ensures that if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr of
the subprocess for the reason to report via the API.
2024-07-22 14:07:27 -07:00
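
A sketch of the approach this commit describes, assuming the standard kernel32 SetErrorMode API (an illustration, not ollama's actual code): clear the process error mode so a failed spawn surfaces the system dialog, then restore the previous quiet mode once the child is running.

```go
//go:build windows

package main

import (
	"os/exec"
	"syscall"
)

var (
	kernel32     = syscall.NewLazyDLL("kernel32.dll")
	setErrorMode = kernel32.NewProc("SetErrorMode")
)

// spawnWithDialog temporarily allows the Windows error dialog while the
// subprocess starts (children inherit the parent's error mode at spawn),
// then restores the quiet mode so a later crash exits fast and its
// output can be scanned for the failure reason. Hypothetical sketch.
func spawnWithDialog(cmd *exec.Cmd) error {
	prev, _, _ := setErrorMode.Call(0) // mode 0: show error dialogs
	err := cmd.Start()
	setErrorMode.Call(prev) // restore the recommended quiet mode
	return err
}

func main() {
	cmd := exec.Command("cmd", "/c", "echo", "runner started")
	if err := spawnWithDialog(cmd); err != nil {
		panic(err)
	}
	cmd.Wait()
}
```
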
Michael Yang
e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
Michael Yang
55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang
35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
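
The "dynamic environ lookup" idea above replaces values cached at package init with functions that read the environment on every call, so changes take effect at runtime. A sketch under that assumption (hypothetical helper, not ollama's envconfig package):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Bool returns a function that reads the environment variable on each
// call instead of caching it at startup. Hypothetical sketch of the
// dynamic-lookup pattern.
func Bool(key string) func() bool {
	return func() bool {
		v := strings.ToLower(os.Getenv(key))
		return v == "1" || v == "true"
	}
}

func main() {
	debug := Bool("OLLAMA_DEBUG")
	fmt.Println(debug()) // false
	os.Setenv("OLLAMA_DEBUG", "1")
	fmt.Println(debug()) // true: looked up dynamically, not cached
}
```
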
Daniel Hiltgen
5784c05397 Merge pull request #5854 from dhiltgen/win_exit_status
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Jeffrey Morgan
f8fedbda20 Update llama.cpp submodule commit to d94c6e0c (#5805) 2024-07-22 12:42:00 -04:00
Daniel Hiltgen
a3c20e3f18 Refine error reporting for subprocess crash
On Windows, the exit status winds up being the search term many users
search for, and they end up piling onto unrelated issues that share it.
This refines the reporting so that if we have a more detailed message
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
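
The reporting rule this commit describes could look like the following sketch (hypothetical names, not the actual code): prefer the captured detail and fall back to the bare exit status only when nothing better is available.

```go
package main

import "fmt"

// crashMessage returns the detailed failure message captured from the
// subprocess when one exists, suppressing the generic exit status that
// users would otherwise search for. Illustrative sketch only.
func crashMessage(detail string, exitCode uint32) string {
	if detail != "" {
		return detail
	}
	return fmt.Sprintf("runner process has terminated: exit status %#x", exitCode)
}

func main() {
	fmt.Println(crashMessage("", 0xc0000005))                 // no detail: report status
	fmt.Println(crashMessage("CUDA error: out of memory", 1)) // detail wins
}
```
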
likelovewant
c44ff579a3 fix mismatch 2024-07-22 19:47:58 +08:00
likelovewant
04325ba40a fix typo 2024-07-22 19:35:43 +08:00
likelovewant
3f03ae5808 update gen_windows.ps1, keep in sync with upstream 2024-07-22 19:00:40 +08:00
likelovewant
24641ae3a5 update gen_windows.ps1, keep in sync with upstream 2024-07-22 18:48:21 +08:00
likelovewant
5cae567ee8 merge upstream update and resolve the conflicts 2024-07-22 17:00:43 +08:00
likelovewant
a8890fd2c6 fix conflicts 2024-07-22 08:10:12 +08:00
Jeffrey Morgan
5534f2cc6a llm: consider head_dim in llama arch (#5817) 2024-07-20 21:48:12 -04:00
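
Considering head_dim means preferring an explicitly configured head dimension over the usual hidden_size / num_heads derivation, since some llama-architecture models set it directly. A sketch of that fallback (field names hypothetical):

```go
package main

import "fmt"

// headDim prefers an explicitly configured head dimension and only falls
// back to the conventional derivation when it is absent. Sketch of the
// idea; not the actual conversion code.
func headDim(configured, hiddenSize, numHeads int) int {
	if configured > 0 {
		return configured
	}
	return hiddenSize / numHeads
}

func main() {
	fmt.Println(headDim(0, 4096, 32))   // derived: 128
	fmt.Println(headDim(256, 4096, 32)) // explicit value wins: 256
}
```
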
Daniel Hiltgen
283948c83b Adjust windows ROCm discovery
The v5 HIP library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery. The
gfx906 cards are no longer supported, so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
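
Aligning discovery with the compiled targets amounts to filtering the GPUs the HIP library reports down to the gfx types the build actually supports. A sketch with a hypothetical target list (the real supported set depends on the ROCm build):

```go
package main

import "fmt"

// supportedGfx lists the gfx targets this hypothetical build was
// compiled for; gfx906 is intentionally absent per the commit above.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
}

// filterGPUs keeps only GPUs the runner will be able to enumerate at
// inference time, so discovery and runtime stay aligned. Sketch only.
func filterGPUs(found []string) []string {
	var usable []string
	for _, gfx := range found {
		if supportedGfx[gfx] {
			usable = append(usable, gfx)
		} else {
			fmt.Printf("skipping unsupported GPU type %s\n", gfx)
		}
	}
	return usable
}

func main() {
	fmt.Println(filterGPUs([]string{"gfx906", "gfx1030"})) // [gfx1030]
}
```
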
Jeffrey Morgan
1475eab95f add patch for tekken (#5807) 2024-07-20 13:41:21 -04:00
likelovewant
5cfa607627 Merge branch 'ollama:main' into main 2024-07-17 22:29:55 +08:00
Michael Yang
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
royjhan
b9f5e16c80 Introduce /api/embed endpoint supporting batch embedding (#5127)
* Initial Batch Embedding

* Revert "Initial Batch Embedding"

This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.

* Initial Draft

* mock up notes

* api/embed draft

* add server function

* check normalization

* clean up

* normalization

* playing around with truncate stuff

* Truncation

* Truncation

* move normalization to go

* Integration Test Template

* Truncation Integration Tests

* Clean up

* use float32

* move normalize

* move normalize test

* refactoring

* integration float32

* input handling and handler testing

* Refactoring of legacy and new

* clear comments

* merge conflicts

* touches

* embedding type 64

* merge conflicts

* fix hanging on single string

* refactoring

* test values

* set context length

* clean up

* testing clean up

* testing clean up

* remove function closure

* Revert "remove function closure"

This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.

* remove function closure

* remove redundant error check

* clean up

* more clean up

* clean up
2024-07-15 12:14:24 -07:00
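
The normalization these messages describe moving into Go is L2 (unit-norm) scaling: divide each component by the vector's Euclidean norm. A sketch, assuming float32 vectors as the later messages suggest (not the actual server code):

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales a vector to unit L2 norm, accumulating the squared
// sum in float64 for accuracy before converting back. Sketch of the
// post-processing step, not ollama's implementation.
func normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	norm := math.Sqrt(sum)
	if norm == 0 {
		return v // avoid dividing a zero vector by zero
	}
	out := make([]float32, len(v))
	for i, x := range v {
		out[i] = float32(float64(x) / norm)
	}
	return out
}

func main() {
	fmt.Println(normalize([]float32{3, 4})) // [0.6 0.8]
}
```
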
likelovewant
8c0f922c48 Merge branch 'ollama:main' into main 2024-07-14 00:23:59 +08:00
Jeffrey Morgan
ef98803d63 llm: looser checks for minimum memory (#5677) 2024-07-13 09:20:05 -07:00
likelovewant
5505a018b2 Resolved merge conflicts 2024-07-12 20:44:04 +08:00
Josh
10e768826c fix: quant err message (#5616) 2024-07-11 17:24:29 -07:00
Jeffrey Morgan
c4cf8ad559 llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On Linux and Windows, expose how much swap space is available
so we can take that into consideration when scheduling models

* use `systemSwapFreeMemory` in check

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
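
The check this commit describes compares the model's estimated requirement against free system memory plus free swap (the systemSwapFreeMemory value mentioned above) and errors out up front rather than letting the load fail midway. A sketch with hypothetical names:

```go
package main

import "fmt"

// canLoad refuses a model whose estimated memory requirement exceeds
// free system memory plus free swap, so the scheduler errors cleanly
// instead of the runner being OOM-killed mid-load. Illustrative sketch.
func canLoad(estimate, systemFreeMemory, systemSwapFreeMemory uint64) error {
	available := systemFreeMemory + systemSwapFreeMemory
	if estimate > available {
		return fmt.Errorf("model requires more system memory (%d bytes) than is available (%d bytes)", estimate, available)
	}
	return nil
}

func main() {
	// Needs 8 GiB but only 4 GiB free + 2 GiB swap is available.
	fmt.Println(canLoad(8<<30, 4<<30, 2<<30))
}
```
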
Jeffrey Morgan
791650ddef sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81 llm: dont link cuda with compat libs (#5621) 2024-07-10 20:01:52 -07:00
Michael Yang
37a570f962 Merge pull request #5612 from ollama/mxyng/mem
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb chatglm graph 2024-07-10 13:43:47 -07:00