3289 Commits

Author SHA1 Message Date
likelovewant
91ba40fc45 Merge branch 'ollama:main' into main 2024-07-27 12:18:55 +08:00
Jeffrey Morgan
f2a96c7d77 llm: keep patch for llama 3 rope factors (#5987) 2024-07-26 15:20:52 -07:00
Daniel Hiltgen
e8a66680d1 Merge pull request #5705 from dhiltgen/win_errormode
Enable windows error dialog for subprocess
2024-07-26 14:49:34 -07:00
Michael Yang
079b2c3b03 Merge pull request #5999 from ollama/mxyng/fix-push
fix nil deref in auth.go
2024-07-26 14:28:34 -07:00
Blake Mizerany
750c1c55f7 server: fix race conditions during download (#5994)
This fixes various data races scattered throughout the download/pull
client where the client was accessing the download state concurrently.

This commit is mostly a hot-fix and will be replaced by a new client one
day soon.

Also, remove the unnecessary opts argument from downloadChunk.
2024-07-26 14:24:24 -07:00
Michael Yang
a622c47bd3 fix nil deref in auth.go 2024-07-26 14:14:48 -07:00
Michael Yang
ec4c35fe99 Merge pull request #5512 from ollama/mxyng/detect-stop
autodetect stop parameters from template
2024-07-26 13:48:23 -07:00
likelovewant
0da8b2bc85 Merge branch 'ollama:main' into main (tag: v0.3.0) 2024-07-26 11:55:46 +08:00
Jeffrey Morgan
f5e3939220 Update api.md (#5968) 2024-07-25 23:10:18 -04:00
Jeffrey Morgan
ae27d9dcfd Update openai.md 2024-07-25 20:27:33 -04:00
Michael Yang
37096790a7 Merge pull request #5552 from ollama/mxyng/messages-docs
docs
2024-07-25 16:26:19 -07:00
Michael Yang
997c903884 Update docs/template.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-25 16:23:40 -07:00
Blake Mizerany
c8af3c2d96 server: reuse original download URL for images (#5962)
This changes the registry client to reuse the original download URL
it gets on the first redirect response for all subsequent requests,
preventing thundering herd issues when hot new LLMs are released.
2024-07-25 15:58:30 -07:00
Jeffrey Morgan
455e61170d Update openai.md 2024-07-25 18:34:47 -04:00
royjhan
4de1370a9d openai tools doc (#5617) 2024-07-25 18:34:06 -04:00
Jeffrey Morgan
bbf8f102ee Revert "llm(llama): pass rope factors (#5924)" (#5963)
This reverts commit bb46bbcf5e.
2024-07-25 18:24:55 -04:00
Michael Yang
bb46bbcf5e llm(llama): pass rope factors (#5924) 2024-07-24 16:05:59 -04:00
royjhan
ac33aa7d37 Fix Embed Test Flakes (#5893)
* float cmp

* increase tolerance
2024-07-24 11:15:46 -07:00
Ajay Chintala
a6cd8f6169 Update README.md to add LLMStack integration (#5799) 2024-07-23 14:40:23 -04:00
likelovewant
86a1575ee3 fix api v0.2.8 2024-07-23 14:57:33 +08:00
likelovewant
fbfc13b6ca Merge branch 'ollama:main' into main 2024-07-23 14:49:32 +08:00
Daniel Hiltgen
c78089263a Merge pull request #5864 from dhiltgen/bump_go
Bump Go patch version
2024-07-22 16:34:18 -07:00
Daniel Hiltgen
3e5ea035d5 Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities
bump go version to 1.22.5 to fix security vulnerabilities in docker
2024-07-22 16:32:43 -07:00
Daniel Hiltgen
5d604eec5b Bump Go patch version 2024-07-22 16:16:28 -07:00
Josh
db0968f30c fix dupe err message (#5857) 2024-07-22 15:48:15 -07:00
Daniel Hiltgen
e12fff8810 Enable windows error dialog for subprocess startup
Make sure if something goes wrong spawning the process, the user gets
enough info to be able to try to self correct, or at least file a bug
with details so we can fix it.  Once the process starts, we immediately
change back to the recommended setting to prevent the blocking dialog.
This ensures if the model fails to load (OOM, unsupported model type,
etc.) the process will exit quickly and we can scan the stdout/stderr
of the subprocess for the reason to report via API.
2024-07-22 14:07:27 -07:00
Michael Yang
9b60a038e5 update api.md 2024-07-22 13:49:51 -07:00
Michael Yang
83a0cb8d88 docs 2024-07-22 13:38:09 -07:00
royjhan
c0648233f2 api embed docs (#5282) 2024-07-22 13:37:08 -07:00
Jeffrey Morgan
d835368eb8 convert: capture head_dim for mistral (#5818) 2024-07-22 16:16:22 -04:00
Daniel Hiltgen
5784c05397 Merge pull request #5854 from dhiltgen/win_exit_status
Refine error reporting for subprocess crash
2024-07-22 10:40:22 -07:00
Daniel Hiltgen
f14aa5435d Merge pull request #5855 from dhiltgen/remove_max_vram
Remove no longer supported max vram var
2024-07-22 10:35:29 -07:00
Jeffrey Morgan
f8fedbda20 Update llama.cpp submodule commit to d94c6e0c (#5805) 2024-07-22 12:42:00 -04:00
Jeffrey Morgan
b3e5491e41 server: collect nested tool call objects when parsing (#5824) 2024-07-22 12:38:03 -04:00
Daniel Hiltgen
cc269ba094 Remove no longer supported max vram var
The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
scenarios.  With Concurrency this was no longer wired up, and the simplistic
value doesn't map to multi-GPU setups.  Users can still set `num_gpu`
to limit memory usage to avoid OOM if we get our predictions wrong.
2024-07-22 09:08:11 -07:00
Daniel Hiltgen
a3c20e3f18 Refine error reporting for subprocess crash
On windows, the exit status winds up being the search term many
users search for and end up piling in on issues that are unrelated.
This refines the reporting so that if we have a more detailed message
we'll suppress the exit status portion of the message.
2024-07-22 08:52:16 -07:00
likelovewant
c44ff579a3 fix mismatch 2024-07-22 19:47:58 +08:00
likelovewant
04325ba40a fix typo 2024-07-22 19:35:43 +08:00
likelovewant
3f03ae5808 update gen_windows.ps1, keep track with upstream 2024-07-22 19:00:40 +08:00
likelovewant
24641ae3a5 update gen_windows.ps1, keep track with upstream 2024-07-22 18:48:21 +08:00
likelovewant
381e89da2e remove unnecessary files 2024-07-22 17:25:22 +08:00
likelovewant
5cae567ee8 merge upstream update and resolve the conflicts 2024-07-22 17:00:43 +08:00
likelovewant
8ebfa2b4ec fix links 2024-07-22 08:18:55 +08:00
likelovewant
a8890fd2c6 fix conflicts 2024-07-22 08:10:12 +08:00
Jeffrey Morgan
80ee9b5e47 Remove out of space test temporarily (#5825) 2024-07-21 00:22:11 -04:00
Jeffrey Morgan
5534f2cc6a llm: consider head_dim in llama arch (#5817) 2024-07-20 21:48:12 -04:00
Daniel Hiltgen
d321297d8a Merge pull request #5815 from dhiltgen/win_rocm_gfx_features
Adjust windows ROCm discovery
2024-07-20 16:02:55 -07:00
Daniel Hiltgen
06e5d74e34 Merge pull request #5506 from dhiltgen/sched_tests
Refine scheduler unit tests for reliability
2024-07-20 15:48:39 -07:00
Daniel Hiltgen
5d707e6fd5 Merge pull request #5583 from dhiltgen/integration_improvements
Fix context exhaustion integration test for small gpus
2024-07-20 15:48:21 -07:00
Daniel Hiltgen
283948c83b Adjust windows ROCm discovery
The v5 hip library returns unsupported GPUs which won't enumerate at
inference time in the runner so this makes sure we align discovery.  The
gfx906 cards are no longer supported so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00