likelovewant
b7d38e2ccd
Merge branch 'ollama:main' into main
2024-08-13 11:27:09 +08:00
Michael Yang
6ffb5cb017
add conversion for microsoft phi 3 mini/medium 4k, 128k
2024-08-12 15:13:29 -07:00
Jeffrey Morgan
15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
...
For simplicity, parallelize embedding requests in the API handler instead of offloading the work to the subprocess runner. This keeps the scheduling story simpler, since it builds on the existing parallel request handling, similar to the existing text completion functionality. (A sketch of this pattern follows this entry.)
2024-08-11 11:57:10 -07:00
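A minimal Go sketch of the fan-out described above: the handler issues one embedding request per input concurrently and collects results in input order. The helper names (embed, embedAll) are hypothetical, not ollama's actual code.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// embed is a stand-in for a single-input embedding call to the runner.
func embed(ctx context.Context, input string) ([]float32, error) {
	return []float32{float32(len(input))}, nil // placeholder result
}

// embedAll issues one request per input concurrently from the handler,
// relying on the scheduler's existing parallel-request support.
func embedAll(ctx context.Context, inputs []string) ([][]float32, error) {
	results := make([][]float32, len(inputs))
	g, ctx := errgroup.WithContext(ctx)
	for i, in := range inputs {
		i, in := i, in // capture loop variables (pre-Go 1.22 semantics)
		g.Go(func() error {
			e, err := embed(ctx, in)
			if err != nil {
				return err
			}
			results[i] = e // indexed writes keep outputs in input order
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	out, _ := embedAll(context.Background(), []string{"a", "bb", "ccc"})
	fmt.Println(len(out)) // 3
}
```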
Daniel Hiltgen
25906d72d1
llm: prevent loading too-large models on Windows (#5926)
...
Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory, and disk paging). This check was already applied on Linux and should be applied on Windows as well. (A sketch of such a check follows this entry.)
2024-08-11 11:30:20 -07:00
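An illustrative Go sketch of the kind of pre-load check described above; the struct fields and numbers are assumptions, not ollama's actual memory accounting.

```go
package main

import (
	"errors"
	"fmt"
)

type memInfo struct {
	FreeVRAM   uint64 // bytes available across GPUs
	FreeSystem uint64 // bytes of free system RAM
}

// checkModelFits rejects loads whose estimated footprint exceeds what
// VRAM plus system memory can hold, i.e. loads that would force the OS
// into disk paging.
func checkModelFits(required uint64, m memInfo) error {
	if required > m.FreeVRAM+m.FreeSystem {
		return errors.New("model requires more memory than is available")
	}
	return nil
}

func main() {
	m := memInfo{FreeVRAM: 8 << 30, FreeSystem: 16 << 30}
	fmt.Println(checkModelFits(30<<30, m)) // 30 GiB model: rejected
	fmt.Println(checkModelFits(4<<30, m))  // 4 GiB model: nil (fits)
}
```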
Daniel Hiltgen
2473bdba5e
Merge pull request #6182 from dhiltgen/more_patterns
...
Catch one more error log
2024-08-08 12:33:17 -07:00
Michael Yang
2003d60159
llama3.1 memory
2024-08-08 11:18:13 -07:00
likelovewant
ca312b344f
Merge branch 'ollama:main' into main
2024-08-07 17:20:55 +08:00
Jeffrey Morgan
de4fc29773
llm: reserve required number of slots for embeddings (#6219)
2024-08-06 23:20:49 -04:00
Jeffrey Morgan
e04c7012c2
update llama.cpp submodule to 1e6f6554 (#6208)
2024-08-06 15:11:45 -04:00
royjhan
86b907f82a
sort batch results (#6189)
2024-08-05 16:55:34 -07:00
Daniel Hiltgen
f457d63400
Implement Linux NUMA detection
...
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp.
If numactl is detected in the path, use it; otherwise use the basic "distribute" mode. (Sketched after this entry.)
2024-08-05 12:56:20 -07:00
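A minimal Go sketch of the detection logic this entry describes, assuming the usual sysfs layout for NUMA nodes. The mode names match llama.cpp's --numa options, but the helper itself is illustrative.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// numaNodes counts entries like /sys/devices/system/node/node0.
func numaNodes() int {
	nodes, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")
	return len(nodes)
}

// numaMode picks a llama.cpp NUMA strategy: nothing on single-node
// systems, numactl when available, else the basic "distribute" mode.
func numaMode() string {
	if numaNodes() <= 1 {
		return "" // single node: leave NUMA support off
	}
	if _, err := exec.LookPath("numactl"); err == nil {
		return "numactl" // delegate thread/memory placement to numactl
	}
	return "distribute" // built-in spreading across nodes
}

func main() { fmt.Println(numaMode()) }
```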
Daniel Hiltgen
04210aa6dd
Catch one more error log
2024-08-05 09:28:07 -07:00
Michael Yang
6a07344786
line feed
2024-08-04 17:25:41 -07:00
likelovewant
63a5f509ed
remove officially supported arches to reduce size
2024-08-02 13:30:46 +08:00
likelovewant
ca4c0c1a8f
Merge branch 'ollama:main' into main
2024-08-02 09:28:09 +08:00
Michael Yang
b732beba6a
lint
2024-08-01 17:06:06 -07:00
Michael Yang
0ff42e84b0
Merge pull request #4756 from ollama/mxyng/convert2
...
refactor convert
2024-08-01 14:16:30 -07:00
likelovewant
0d4292b4b1
Merge branch 'ollama:main' into main
2024-08-01 18:30:28 +08:00
Michael Yang
df993fa37b
comments
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
0f3271db88
patches: phi3 default sliding window attention
2024-07-31 14:58:34 -07:00
Michael Yang
6b252918fb
update convert test to check result data
2024-07-31 10:59:38 -07:00
Michael Yang
5c1912769e
Merge pull request #5473 from ollama/mxyng/environ
...
fix: environ lookup
2024-07-31 10:18:05 -07:00
likelovewant
1eb1dc32d2
Merge branch 'ollama:main' into main
2024-07-31 14:52:26 +08:00
likelovewant
ad5ad895fb
fix
2024-07-31 13:37:19 +08:00
jmorganca
afa8d6e9d5
patch gemma support
2024-07-30 18:07:29 -07:00
royjhan
1b44d873e7
Add metrics to api/embed response (#5709)
...
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* update tests
* test name
* list metrics
2024-07-30 13:12:21 -07:00
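The entry above threads metrics (prompt token counts, durations) through the embed response. A hedged sketch of what such a response shape might look like; the field names are assumptions, not necessarily ollama's exact API.

```go
package main

import (
	"encoding/json"
	"os"
)

type EmbedResponse struct {
	Model           string      `json:"model"`
	Embeddings      [][]float32 `json:"embeddings"`
	TotalDuration   int64       `json:"total_duration,omitempty"`    // nanoseconds end to end
	LoadDuration    int64       `json:"load_duration,omitempty"`     // nanoseconds spent loading
	PromptEvalCount int         `json:"prompt_eval_count,omitempty"` // prompt tokens consumed
}

func main() {
	resp := EmbedResponse{
		Model:           "all-minilm", // hypothetical model name
		Embeddings:      [][]float32{{0.1, 0.2, 0.3}},
		PromptEvalCount: 4,
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	enc.Encode(resp) // prints the JSON shape a client would receive
}
```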
likelovewant
fc296fd744
Remove llm/llama.cpp from Git index
2024-07-30 22:37:32 +08:00
likelovewant
e628246970
Restore llama.cpp from commit 6eeaeba
2024-07-30 20:43:59 +08:00
likelovewant
776aa9ceb2
resolve merge conflicts
2024-07-30 18:53:59 +08:00
Jeffrey Morgan
68ee42f995
update llama.cpp submodule to 6eeaeba1 (#6039)
2024-07-29 13:20:26 -07:00
Tibor Schmidt
f3d7a481b7
feat: add support for min_p (resolve #1142) (#1825)
2024-07-27 14:37:40 -07:00
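The min_p entry above refers to min-p sampling: a token is kept only if its probability is at least min_p times the top token's probability, so the cutoff adapts to how confident the model is. A small Go sketch of the general technique, not ollama's implementation:

```go
package main

import "fmt"

// minPFilter returns the indices of tokens whose probability clears
// the adaptive threshold minP * max(probs).
func minPFilter(probs []float64, minP float64) []int {
	pMax := 0.0
	for _, p := range probs {
		if p > pMax {
			pMax = p
		}
	}
	keep := []int{}
	for i, p := range probs {
		if p >= minP*pMax { // threshold scales with the best candidate
			keep = append(keep, i)
		}
	}
	return keep
}

func main() {
	// With min_p=0.1 and a 0.6 top probability, the cutoff is 0.06.
	fmt.Println(minPFilter([]float64{0.6, 0.25, 0.1, 0.05}, 0.1))
	// -> [0 1 2]: the 0.05 tail token is filtered out.
}
```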
likelovewant
91ba40fc45
Merge branch 'ollama:main' into main
2024-07-27 12:18:55 +08:00
Jeffrey Morgan
f2a96c7d77
llm: keep patch for llama 3 rope factors (#5987)
2024-07-26 15:20:52 -07:00
likelovewant
86a1575ee3
fix api
2024-07-23 14:57:33 +08:00
likelovewant
fbfc13b6ca
Merge branch 'ollama:main' into main
2024-07-23 14:49:32 +08:00
Daniel Hiltgen
e12fff8810
Enable windows error dialog for subprocess startup
...
Make sure that if something goes wrong spawning the process, the user gets
enough info to self-correct, or at least file a bug with details so we can
fix it. Once the process starts, we immediately switch back to the
recommended setting to prevent the blocking dialog. This ensures that if the
model fails to load (OOM, unsupported model type, etc.) the process exits
quickly and we can scan the subprocess's stdout/stderr for the reason to
report via the API. (The toggle is sketched after this entry.)
2024-07-22 14:07:27 -07:00
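A hedged Go sketch of the toggle this entry describes, assuming the Win32 SetErrorMode call exposed by golang.org/x/sys/windows; startRunner is a hypothetical helper, not ollama's actual code.

```go
//go:build windows

package runner

import (
	"os/exec"

	"golang.org/x/sys/windows"
)

func startRunner(path string, args ...string) (*exec.Cmd, error) {
	// Clear the error mode so Windows shows a dialog with details if
	// something goes wrong while spawning the subprocess.
	prev := windows.SetErrorMode(0)
	// Once Start returns, restore the previous (quiet) mode so a later
	// failure (OOM, unsupported model, ...) exits without a blocking
	// dialog and we can scan stdout/stderr for the reason.
	defer windows.SetErrorMode(prev)

	cmd := exec.Command(path, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}
```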
Michael Yang
e2c3f6b3e2
string
2024-07-22 11:27:52 -07:00
Michael Yang
55cd3ddcca
bool
2024-07-22 11:27:21 -07:00
Michael Yang
35b89b2eab
rfc: dynamic environ lookup
2024-07-22 11:25:30 -07:00
Daniel Hiltgen
5784c05397
Merge pull request #5854 from dhiltgen/win_exit_status
...
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Jeffrey Morgan
f8fedbda20
Update llama.cpp submodule commit to d94c6e0c (#5805)
2024-07-22 12:42:00 -04:00
Daniel Hiltgen
a3c20e3f18
Refine error reporting for subprocess crash
...
On Windows, the exit status winds up being the term many users search for,
and they end up piling onto unrelated issues. This refines the reporting so
that when we have a more detailed message we suppress the exit-status
portion. (Sketched after this entry.)
2024-07-22 08:52:16 -07:00
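A small sketch of that reporting rule: prefer a captured, human-readable failure message and fall back to the raw exit status only when nothing better is available. The helper is hypothetical.

```go
package main

import "fmt"

// runnerError builds the error surfaced to the user. detail is whatever
// specific message was scanned from the subprocess output, if any.
func runnerError(exitCode int, detail string) error {
	if detail != "" {
		// A specific message is more searchable than "exit status N"
		// and avoids unrelated reports piling up under one status code.
		return fmt.Errorf("llama runner terminated: %s", detail)
	}
	return fmt.Errorf("llama runner exited with status %d", exitCode)
}

func main() {
	fmt.Println(runnerError(0xc0000005, "CUDA error: out of memory"))
	fmt.Println(runnerError(1, "")) // no detail: fall back to status
}
```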
likelovewant
c44ff579a3
fix mismatch
2024-07-22 19:47:58 +08:00
likelovewant
04325ba40a
fix typo
2024-07-22 19:35:43 +08:00
likelovewant
3f03ae5808
update gen_windows.ps1, keep in sync with upstream
2024-07-22 19:00:40 +08:00
likelovewant
24641ae3a5
update gen_windows.ps1, keep in sync with upstream
2024-07-22 18:48:21 +08:00
likelovewant
5cae567ee8
merge upstream update and resolve the conflicts
2024-07-22 17:00:43 +08:00
likelovewant
a8890fd2c6
fix conflicts
2024-07-22 08:10:12 +08:00
Jeffrey Morgan
5534f2cc6a
llm: consider head_dim in llama arch (#5817)
2024-07-20 21:48:12 -04:00
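The entry above concerns head_dim in llama-family model configs. A hedged sketch of the usual convention: when a checkpoint specifies head_dim explicitly, that value should win; otherwise it is derived as hidden_size / num_heads. Field names are illustrative.

```go
package main

import "fmt"

type llamaConfig struct {
	HiddenSize int
	NumHeads   int
	HeadDim    int // 0 when the checkpoint does not specify it
}

// headDim honors an explicit head_dim and otherwise falls back to the
// conventional hidden_size / num_heads derivation.
func headDim(c llamaConfig) int {
	if c.HeadDim > 0 {
		return c.HeadDim // explicit value takes precedence
	}
	return c.HiddenSize / c.NumHeads
}

func main() {
	fmt.Println(headDim(llamaConfig{HiddenSize: 4096, NumHeads: 32}))              // 128 (derived)
	fmt.Println(headDim(llamaConfig{HiddenSize: 4096, NumHeads: 32, HeadDim: 96})) // 96 (explicit)
}
```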