Commit Graph

722 Commits

Author SHA1 Message Date
likelovewant
b7d38e2ccd Merge branch 'ollama:main' into main 2024-08-13 11:27:09 +08:00
Michael Yang
6ffb5cb017 add conversion for microsoft phi 3 mini/medium 4k, 128 2024-08-12 15:13:29 -07:00
Jeffrey Morgan
15c2d8fe14 server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler, since it builds on the existing parallel request handling already used for text completion.
2024-08-11 11:57:10 -07:00
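The fan-out described in the commit above can be sketched as follows — a minimal, hypothetical handler-side helper (`embedOne` is a stub standing in for the call into the runner; this is not ollama's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// embedOne stands in for one embedding call into the subprocess
// runner; the real handler would forward the input to the model.
func embedOne(input string) []float32 {
	return []float32{float32(len(input))}
}

// embedAll parallelizes per-input embedding requests in the handler,
// one goroutine per input, collecting results in their original order.
func embedAll(inputs []string) [][]float32 {
	results := make([][]float32, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			results[i] = embedOne(in)
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(embedAll([]string{"a", "bb", "ccc"})) // [[1] [2] [3]]
}
```

Writing into distinct indices of a preallocated slice keeps the goroutines race-free without a mutex, and the output order matches the request order.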
Daniel Hiltgen
25906d72d1 llm: prevent loading too large models on windows (#5926)
Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory, and disk paging). This check was already applied on Linux and should be applied on Windows as well.
2024-08-11 11:30:20 -07:00
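The gist of the check above can be sketched as a simple capacity comparison — hypothetical byte counts, not ollama's real memory accounting:

```go
package main

import "fmt"

// fitsInMemory sketches the load-time guard: refuse models whose
// estimated footprint exceeds available VRAM plus system memory,
// so a load can't push the machine into heavy disk paging.
func fitsInMemory(modelBytes, freeVRAM, freeRAM uint64) bool {
	return modelBytes <= freeVRAM+freeRAM
}

func main() {
	fmt.Println(fitsInMemory(8<<30, 6<<30, 4<<30))  // 8 GiB into 6+4 GiB: true
	fmt.Println(fitsInMemory(16<<30, 6<<30, 4<<30)) // 16 GiB into 6+4 GiB: false
}
```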
Daniel Hiltgen
2473bdba5e Merge pull request #6182 from dhiltgen/more_patterns
Catch one more error log
2024-08-08 12:33:17 -07:00
Michael Yang
2003d60159 llama3.1 memory 2024-08-08 11:18:13 -07:00
likelovewant
ca312b344f Merge branch 'ollama:main' into main 2024-08-07 17:20:55 +08:00
Jeffrey Morgan
de4fc29773 llm: reserve required number of slots for embeddings (#6219) 2024-08-06 23:20:49 -04:00
Jeffrey Morgan
e04c7012c2 update llama.cpp submodule to 1e6f6554 (#6208) 2024-08-06 15:11:45 -04:00
royjhan
86b907f82a sort batch results (#6189) 2024-08-05 16:55:34 -07:00
Daniel Hiltgen
f457d63400 Implement linux NUMA detection
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp.
If numactl is detected in the path, use it; otherwise fall back to the basic "distribute" mode.
2024-08-05 12:56:20 -07:00
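The detection logic above can be sketched like this — counting NUMA nodes via Linux sysfs and probing PATH for numactl (a sketch of the decision, not the actual implementation):

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// numaNodes counts the NUMA nodes Linux exposes under sysfs.
func numaNodes() int {
	nodes, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")
	return len(nodes)
}

// numaMode applies the policy described above: on multi-node systems
// prefer numactl when it is on PATH, else llama.cpp's "distribute" mode.
func numaMode(nodes int, haveNumactl bool) string {
	switch {
	case nodes <= 1:
		return "disabled"
	case haveNumactl:
		return "numactl"
	default:
		return "distribute"
	}
}

func main() {
	_, err := exec.LookPath("numactl")
	fmt.Println("numa:", numaMode(numaNodes(), err == nil))
}
```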
Daniel Hiltgen
04210aa6dd Catch one more error log 2024-08-05 09:28:07 -07:00
Michael Yang
6a07344786 line feed 2024-08-04 17:25:41 -07:00
likelovewant
63a5f509ed remove officially supported arches to reduce size 2024-08-02 13:30:46 +08:00
likelovewant
ca4c0c1a8f Merge branch 'ollama:main' into main 2024-08-02 09:28:09 +08:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang
0ff42e84b0 Merge pull request #4756 from ollama/mxyng/convert2
refactor convert
2024-08-01 14:16:30 -07:00
likelovewant
0d4292b4b1 Merge branch 'ollama:main' into main 2024-08-01 18:30:28 +08:00
Michael Yang
df993fa37b comments 2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang
0f3271db88 patches: phi3 default sliding window attention 2024-07-31 14:58:34 -07:00
Michael Yang
6b252918fb update convert test to check result data 2024-07-31 10:59:38 -07:00
Michael Yang
5c1912769e Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
likelovewant
1eb1dc32d2 Merge branch 'ollama:main' into main 2024-07-31 14:52:26 +08:00
likelovewant
ad5ad895fb fix 2024-07-31 13:37:19 +08:00
jmorganca
afa8d6e9d5 patch gemma support 2024-07-30 18:07:29 -07:00
royjhan
1b44d873e7 Add Metrics to api/embed response (#5709)
* add prompt tokens to embed response

* rm slog

* metrics

* types

* prompt n

* clean up

* reset submodule

* update tests

* test name

* list metrics
2024-07-30 13:12:21 -07:00
likelovewant
fc296fd744 Remove llm/llama.cpp from Git index 2024-07-30 22:37:32 +08:00
likelovewant
e628246970 Restore llama.cpp from commit 6eeaeba 2024-07-30 20:43:59 +08:00
likelovewant
776aa9ceb2 resolve merge conflicts 2024-07-30 18:53:59 +08:00
Jeffrey Morgan
68ee42f995 update llama.cpp submodule to 6eeaeba1 (#6039) 2024-07-29 13:20:26 -07:00
Tibor Schmidt
f3d7a481b7 feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
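min_p sampling, added in the commit above, keeps only tokens whose probability is at least `min_p` times the most likely token's probability. A minimal sketch of that filter (not the llama.cpp implementation):

```go
package main

import "fmt"

// minPFilter returns the indices of tokens whose probability is at
// least minP times the top token's probability — the core of min_p.
func minPFilter(probs []float64, minP float64) []int {
	maxP := 0.0
	for _, p := range probs {
		if p > maxP {
			maxP = p
		}
	}
	keep := []int{}
	for i, p := range probs {
		if p >= minP*maxP {
			keep = append(keep, i)
		}
	}
	return keep
}

func main() {
	// with min_p=0.2 and a top probability of 0.5, the cutoff is 0.1
	fmt.Println(minPFilter([]float64{0.5, 0.3, 0.05, 0.15}, 0.2))
}
```

Unlike top_p, the cutoff scales with the model's confidence: when the top token dominates, low-probability tails are pruned more aggressively.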
likelovewant
91ba40fc45 Merge branch 'ollama:main' into main 2024-07-27 12:18:55 +08:00
Jeffrey Morgan
f2a96c7d77 llm: keep patch for llama 3 rope factors (#5987) 2024-07-26 15:20:52 -07:00
likelovewant
86a1575ee3 fix api 2024-07-23 14:57:33 +08:00
likelovewant
fbfc13b6ca Merge branch 'ollama:main' into main 2024-07-23 14:49:32 +08:00
Daniel Hiltgen
e12fff8810 Enable windows error dialog for subprocess startup
Make sure that if something goes wrong spawning the process, the user gets enough info to self-correct, or at least file a bug with details so we can fix it. Once the process starts, we immediately switch back to the recommended setting to prevent the blocking dialog. This ensures that if the model fails to load (OOM, unsupported model type, etc.) the process will exit quickly and we can scan the subprocess's stdout/stderr for the reason to report via the API.
2024-07-22 14:07:27 -07:00
Michael Yang
e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
Michael Yang
55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang
35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
Daniel Hiltgen
5784c05397 Merge pull request #5854 from dhiltgen/win_exit_status
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Jeffrey Morgan
f8fedbda20 Update llama.cpp submodule commit to d94c6e0c (#5805) 2024-07-22 12:42:00 -04:00
Daniel Hiltgen
a3c20e3f18 Refine error reporting for subprocess crash
On Windows, the exit status winds up being the term many users search for, leading them to pile onto unrelated issues. This refines the reporting so that when we have a more detailed message available, we suppress the exit-status portion of the message.
2024-07-22 08:52:16 -07:00
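The reporting policy above can be sketched as a simple preference for detail over the raw status — a hypothetical helper, not the actual ollama function:

```go
package main

import "fmt"

// crashMessage prefers a detailed runner error over the bare exit
// status, so generic exit codes don't become the search term that
// funnels users onto unrelated issues.
func crashMessage(detail string, exitStatus int) string {
	if detail != "" {
		return detail
	}
	return fmt.Sprintf("runner process crashed with exit status %d", exitStatus)
}

func main() {
	fmt.Println(crashMessage("CUDA error: out of memory", 1))
	fmt.Println(crashMessage("", 1))
}
```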
likelovewant
c44ff579a3 fix mismatch 2024-07-22 19:47:58 +08:00
likelovewant
04325ba40a fix typo 2024-07-22 19:35:43 +08:00
likelovewant
3f03ae5808 update gen_windows.ps1, keep in sync with upstream 2024-07-22 19:00:40 +08:00
likelovewant
24641ae3a5 update gen_windows.ps1, keep in sync with upstream 2024-07-22 18:48:21 +08:00
likelovewant
5cae567ee8 merge upstream update and resolve the conflicts 2024-07-22 17:00:43 +08:00
likelovewant
a8890fd2c6 fix conflicts 2024-07-22 08:10:12 +08:00
Jeffrey Morgan
5534f2cc6a llm: consider head_dim in llama arch (#5817) 2024-07-20 21:48:12 -04:00
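"Consider head_dim" means honoring an explicit per-head dimension from model metadata instead of always deriving it as embedding_length / head_count. A sketch of that fallback (values here are illustrative, not tied to a specific model):

```go
package main

import "fmt"

// headDim returns the per-head dimension, using an explicit head_dim
// from the model metadata when present and deriving it otherwise.
func headDim(explicit, embeddingLength, headCount int) int {
	if explicit > 0 {
		return explicit
	}
	return embeddingLength / headCount
}

func main() {
	fmt.Println(headDim(0, 4096, 32))  // derived: 128
	fmt.Println(headDim(96, 4096, 32)) // explicit value wins: 96
}
```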