Mirror of https://github.com/likelovewant/ollama-for-amd.git (synced 2025-12-22 23:03:55 +00:00)

Compare commits: 157 commits, v0.1.33-al ... v0.1.37-al
Commit SHA1 hashes:

9a36dc537d, 9b3b3f6a14, 4ec7445a6f, 0372c51f82, 0fec3525ad, 41ba3017fd, 8080fbce35, ec14f6ceda, c60a086635, dfbeca78af,
92ca2cca95, 1e1634daca, 33d0209023, 824ee5446f, 879e2caf8c, c4014e73a2, be9efdb981, 074dc3b9d8, 86f9b582d5, 4142c3ef7c,
6602e793c0, ea0fdaed28, 1eb382da5a, bb6fd02298, 7e2bceceee, 30a7d7096c, 200a18820e, e03637176d, c02db93243, ffa4d5134a,
302d7fdbf3, cf442cd57e, 0e1ba65855, 6aad333c63, 4fcc84e67a, 3ae2f441e0, 2abb3f6424, ce3b212d12, 83d6d46e29, 354ad9254e,
58876091f7, dc18eee39d, 8727a9c140, d0425f26cf, cfa84b8470, 1580ed4c06, a7ee84fc31, 84ac7ce139, 788b092c49, 5cde17a096,
c3837eb08c, 8cc0ee2efe, d5eec16d23, a3906a6173, daa1a032f7, 6042e8bc57, 3952ceb6a6, 920a4b0794, ee49844d09, 8a516ac862,
bee2f4a3b0, cef45feaa4, 2687f02c96, b25976aeb8, 001f167aad, 486a2c1d94, bb22295d43, 88cf154483, 8cbd3e7510, eeb695261f,
dc9b1111e0, 06ac829e70, 72700279e2, 5d3f7fff26, d77c1c5f9d, 2a5302a1cf, ffbd3d173f, 1e0a669f75, 527e9be058, 34bea2e272,
fe44ae3371, adeb40eaf2, d7d33e5255, 63bc884e25, ef4e095d24, 4d4f75a8a8, 3f71ba406a, 88a67127d8, f7dc7dcc64, 04f971c84b,
548a7df014, 70edb9bc4d, 3f0ed03856, 4bd9b97d7b, 0d107ca31c, 4736391bfb, 7c5330413b, 39d9d22ca3, af47413dba, b2f00aa977,
6694be5e50, f5e8b207fb, d245460362, 4d0d0fa383, 7ffe45734d, 01811c176a, a7248f6ea8, 9685c34509, d091fe3c21, ee02f548c8,
b08870aff3, 3ecae420ac, 4cbbf0e13b, 380378cc80, 0963c65027, ed740a2504, c9f98622b1, 0a954e5066, aa93423fbf, 01c9386267,
af9eb36f9f, 06093fd396, 86b7fcac32, fb8ddc564e, 242efe6611, 8d64603d1a, 1b0e6c9c0e, dfa2f32ca0, 840424a2c4, f56aa20014,
6707768ebd, c78bb76a12, 942c979232, 06164911dd, 2a21363bb7, 026869915f, 45d61aaaa3, 20f6c06569, 371f5e52aa, e006480e49,
aed545872d, 44869c59d6, 52663284cf, 42fa9d7f0a, b7a87a22b6, e8aaea030e, b1ad3a43cb, 267e25a750, 9a32c514cb, e592e8fccb,
8acb233668, 119589fcb3, 089daaeabc, c496967e56, c942e4a07b, bd54b08261, b99c291f47
README.md (48 changes)

````diff
@@ -14,7 +14,25 @@ Get up and running with large language models locally.

 ### Windows preview

-[Download](https://ollama.com/download/OllamaSetup.exe)
+[Download](https://github.com/likelovewant/ollama-for-amd/releases)
+
+For AMD use or build , please follow the guide on [wiki](https://github.com/likelovewant/ollama-for-amd/wiki)
+
+official support list
+
+```
+"gfx900" "gfx906:xnack-" "gfx908:xnack-" "gfx90a:xnack+" "gfx90a:xnack-" "gfx940" "gfx941" "gfx942" "gfx1010""gfx1012" "gfx1030" "gfx1100""gfx1101" "gfx1102"
+```
+
+Please download from ollama [official](https://ollama.com/download/OllamaSetup.exe)
+
+Example extra list add on this repo.
+
+```
+"gfx803" "gfx902" "gfx904""gfx940" "gfx941" "gfx942" "gfx1010" "gfx1011" "gfx1012" "gfx1031" "gfx1032""gfx1034" "gfx1035" "gfx1036" "gfx1103"
+```
+
+Please follow the [wiki](https://github.com/likelovewant/ollama-for-amd/wiki) guide to build or use the pre-release version.
+
+Note: `gfx803, gfx1010` reported not working by the wiki method ,expected a future support

 ### Linux

@@ -258,6 +276,7 @@ See the [API documentation](./docs/api.md) for all endpoints.

 - [Open WebUI](https://github.com/open-webui/open-webui)
 - [Enchanted (macOS native)](https://github.com/AugustDev/enchanted)
+- [Hollama](https://github.com/fmaclen/hollama)
 - [Lollms-Webui](https://github.com/ParisNeo/lollms-webui)
 - [LibreChat](https://github.com/danny-avila/LibreChat)
 - [Bionic GPT](https://github.com/bionic-gpt/bionic-gpt)

@@ -284,17 +303,20 @@ See the [API documentation](./docs/api.md) for all endpoints.

 - [OllamaGUI](https://github.com/enoch1118/ollamaGUI)
 - [OpenAOE](https://github.com/InternLM/OpenAOE)
 - [Odin Runes](https://github.com/leonid20000/OdinRunes)
-- [LLM-X: Progressive Web App](https://github.com/mrdjohnson/llm-x)
+- [LLM-X](https://github.com/mrdjohnson/llm-x) (Progressive Web App)
 - [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm)
 - [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat)
 - [Ollama-chats RPG](https://github.com/drazdra/ollama-chats)
-- [QA-Pilot: Chat with Code Repository](https://github.com/reid41/QA-Pilot)
-- [ChatOllama: Open Source Chatbot based on Ollama with Knowledge Bases](https://github.com/sugarforever/chat-ollama)
-- [CRAG Ollama Chat: Simple Web Search with Corrective RAG](https://github.com/Nagi-ovo/CRAG-Ollama-Chat)
-- [RAGFlow: Open-source Retrieval-Augmented Generation engine based on deep document understanding](https://github.com/infiniflow/ragflow)
-- [chat: chat web app for teams](https://github.com/swuecho/chat)
+- [QA-Pilot](https://github.com/reid41/QA-Pilot) (Chat with Code Repository)
+- [ChatOllama](https://github.com/sugarforever/chat-ollama) (Open Source Chatbot based on Ollama with Knowledge Bases)
+- [CRAG Ollama Chat](https://github.com/Nagi-ovo/CRAG-Ollama-Chat) (Simple Web Search with Corrective RAG)
+- [RAGFlow](https://github.com/infiniflow/ragflow) (Open-source Retrieval-Augmented Generation engine based on deep document understanding)
+- [StreamDeploy](https://github.com/StreamDeploy-DevRel/streamdeploy-llm-app-scaffold) (LLM Application Scaffold)
+- [chat](https://github.com/swuecho/chat) (chat web app for teams)
 - [Lobe Chat](https://github.com/lobehub/lobe-chat) with [Integrating Doc](https://lobehub.com/docs/self-hosting/examples/ollama)
-- [Ollama RAG Chatbot: Local Chat with multiple PDFs using Ollama and RAG.](https://github.com/datvodinh/rag-chatbot.git)
+- [Ollama RAG Chatbot](https://github.com/datvodinh/rag-chatbot.git) (Local Chat with multiple PDFs using Ollama and RAG)
+- [BrainSoup](https://www.nurgo-software.com/products/brainsoup) (Flexible native client with RAG & multi-agent automation)
+- [macai](https://github.com/Renset/macai) (macOS client for Ollama, ChatGPT, and other compatible API back-ends)

 ### Terminal

@@ -327,6 +349,7 @@ See the [API documentation](./docs/api.md) for all endpoints.

 - [Pacman](https://archlinux.org/packages/extra/x86_64/ollama/)
 - [Helm Chart](https://artifacthub.io/packages/helm/ollama-helm/ollama)
+- [Guix channel](https://codeberg.org/tusharhero/ollama-guix)

 ### Libraries

@@ -348,10 +371,13 @@ See the [API documentation](./docs/api.md) for all endpoints.

 - [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md)
 - [Elixir LangChain](https://github.com/brainlid/langchain)
 - [Ollama for R - rollama](https://github.com/JBGruber/rollama)
+- [Ollama for R - ollama-r](https://github.com/hauselin/ollama-r)
 - [Ollama-ex for Elixir](https://github.com/lebrunel/ollama-ex)
 - [Ollama Connector for SAP ABAP](https://github.com/b-tocs/abap_btocs_ollama)
 - [Testcontainers](https://testcontainers.com/modules/ollama/)
+- [Portkey](https://portkey.ai/docs/welcome/integration-guides/ollama)
+- [PromptingTools.jl](https://github.com/svilupp/PromptingTools.jl) with an [example](https://svilupp.github.io/PromptingTools.jl/dev/examples/working_with_ollama)
+- [LlamaScript](https://github.com/WolfTheDeveloper/llamascript)

 ### Mobile

 - [Enchanted](https://github.com/AugustDev/enchanted)

@@ -370,12 +396,13 @@ See the [API documentation](./docs/api.md) for all endpoints.

 - [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram)
 - [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation)
 - [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama)
-- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
 - [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot)
 - [Cliobot](https://github.com/herval/cliobot) (Telegram bot with Ollama support)
 - [Copilot for Obsidian plugin](https://github.com/logancyang/obsidian-copilot)
 - [Obsidian Local GPT plugin](https://github.com/pfrankov/obsidian-local-gpt)
 - [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama)
+- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
+- [Ollama Copilot](https://github.com/bernardo-bruning/ollama-copilot) (Proxy that allows you to use ollama as a copilot like Github copilot)
 - [twinny](https://github.com/rjmacarthy/twinny) (Copilot and Copilot chat alternative using Ollama)
 - [Wingman-AI](https://github.com/RussellCanfield/wingman-ai) (Copilot code and chat alternative using Ollama and HuggingFace)
 - [Page Assist](https://github.com/n4ze3m/page-assist) (Chrome Extension)

@@ -385,3 +412,4 @@ See the [API documentation](./docs/api.md) for all endpoints.

 ### Supported backends
 - [llama.cpp](https://github.com/ggerganov/llama.cpp) project founded by Georgi Gerganov.
````
api/client.go

```diff
@@ -1,9 +1,16 @@
 // Package api implements the client-side API for code wishing to interact
 // with the ollama service. The methods of the [Client] type correspond to
-// the ollama REST API as described in https://github.com/ollama/ollama/blob/main/docs/api.md
-//
+// the ollama REST API as described in [the API documentation].
 // The ollama command-line client itself uses this package to interact with
 // the backend service.
+//
+// # Examples
+//
+// Several examples of using this package are available [in the GitHub
+// repository].
+//
+// [the API documentation]: https://github.com/ollama/ollama/blob/main/docs/api.md
+// [in the GitHub repository]: https://github.com/ollama/ollama/tree/main/examples
 package api

 import (

@@ -299,8 +306,14 @@ func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc
     })
 }

+// PushProgressFunc is a function that [Client.Push] invokes when progress is
+// made.
+// It's similar to other progress function types like [PullProgressFunc].
 type PushProgressFunc func(ProgressResponse) error

+// Push uploads a model to the model library; requires registering for ollama.ai
+// and adding a public key first. fn is called each time progress is made on
+// the request and can be used to display a progress bar, etc.
 func (c *Client) Push(ctx context.Context, req *PushRequest, fn PushProgressFunc) error {
     return c.stream(ctx, http.MethodPost, "/api/push", req, func(bts []byte) error {
         var resp ProgressResponse

@@ -312,8 +325,15 @@ func (c *Client) Push(ctx context.Context, req *PushRequest, fn PushProgressFunc
     })
 }

+// CreateProgressFunc is a function that [Client.Create] invokes when progress
+// is made.
+// It's similar to other progress function types like [PullProgressFunc].
 type CreateProgressFunc func(ProgressResponse) error

+// Create creates a model from a [Modelfile]. fn is a progress function that
+// behaves similarly to other methods (see [Client.Pull]).
+//
+// [Modelfile]: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
 func (c *Client) Create(ctx context.Context, req *CreateRequest, fn CreateProgressFunc) error {
     return c.stream(ctx, http.MethodPost, "/api/create", req, func(bts []byte) error {
         var resp ProgressResponse

@@ -325,6 +345,7 @@ func (c *Client) Create(ctx context.Context, req *CreateRequest, fn CreateProgre
     })
 }

+// List lists models that are available locally.
 func (c *Client) List(ctx context.Context) (*ListResponse, error) {
     var lr ListResponse
     if err := c.do(ctx, http.MethodGet, "/api/tags", nil, &lr); err != nil {

@@ -333,6 +354,8 @@ func (c *Client) List(ctx context.Context) (*ListResponse, error) {
     return &lr, nil
 }

+// Copy copies a model - creating a model with another name from an existing
+// model.
 func (c *Client) Copy(ctx context.Context, req *CopyRequest) error {
     if err := c.do(ctx, http.MethodPost, "/api/copy", req, nil); err != nil {
         return err

@@ -340,6 +363,7 @@ func (c *Client) Copy(ctx context.Context, req *CopyRequest) error {
     return nil
 }

+// Delete deletes a model and its data.
 func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
     if err := c.do(ctx, http.MethodDelete, "/api/delete", req, nil); err != nil {
         return err

@@ -347,6 +371,7 @@ func (c *Client) Delete(ctx context.Context, req *DeleteRequest) error {
     return nil
 }

+// Show obtains model information, including details, modelfile, license etc.
 func (c *Client) Show(ctx context.Context, req *ShowRequest) (*ShowResponse, error) {
     var resp ShowResponse
     if err := c.do(ctx, http.MethodPost, "/api/show", req, &resp); err != nil {

@@ -355,12 +380,16 @@ func (c *Client) Show(ctx context.Context, req *ShowRequest) (*ShowResponse, err
     return &resp, nil
 }

+// Hearbeat checks if the server has started and is responsive; if yes, it
+// returns nil, otherwise an error.
 func (c *Client) Heartbeat(ctx context.Context) error {
     if err := c.do(ctx, http.MethodHead, "/", nil, nil); err != nil {
         return err
     }
     return nil
 }

+// Embeddings generates embeddings from a model.
 func (c *Client) Embeddings(ctx context.Context, req *EmbeddingRequest) (*EmbeddingResponse, error) {
     var resp EmbeddingResponse
     if err := c.do(ctx, http.MethodPost, "/api/embeddings", req, &resp); err != nil {

@@ -369,10 +398,13 @@ func (c *Client) Embeddings(ctx context.Context, req *EmbeddingRequest) (*Embedd
     return &resp, nil
 }

+// CreateBlob creates a blob from a file on the server. digest is the
+// expected SHA256 digest of the file, and r represents the file.
 func (c *Client) CreateBlob(ctx context.Context, digest string, r io.Reader) error {
     return c.do(ctx, http.MethodPost, fmt.Sprintf("/api/blobs/%s", digest), r, nil)
 }

+// Version returns the Ollama server version as a string.
 func (c *Client) Version(ctx context.Context) (string, error) {
     var version struct {
         Version string `json:"version"`
```
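The new package doc comment points readers to the examples in the GitHub repository. For orientation, here is a minimal sketch of using this client package against a locally running server; it assumes the default `OLLAMA_HOST` and is illustrative, not part of the diff:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	// ClientFromEnvironment honors OLLAMA_HOST (default http://127.0.0.1:11434).
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// Heartbeat verifies the server is up before making real requests.
	if err := client.Heartbeat(ctx); err != nil {
		log.Fatal(err)
	}

	// List the locally available models (GET /api/tags).
	resp, err := client.List(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range resp.Models {
		fmt.Println(m.Name)
	}
}
```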
api/types.go (242 changes)

```diff
@@ -4,6 +4,7 @@ import (
     "encoding/json"
     "errors"
     "fmt"
+    "log/slog"
     "math"
     "os"
     "reflect"

@@ -12,6 +13,7 @@ import (
     "time"
 )

+// StatusError is an error with and HTTP status code.
 type StatusError struct {
     StatusCode int
     Status     string

@@ -32,6 +34,7 @@ func (e StatusError) Error() string {
     }
 }

+// ImageData represents the raw binary data of an image file.
 type ImageData []byte

 // GenerateRequest describes a request sent by [Client.Generate]. While you

@@ -77,26 +80,44 @@ type GenerateRequest struct {
     Options map[string]interface{} `json:"options"`
 }

+// ChatRequest describes a request sent by [Client.Chat].
 type ChatRequest struct {
-    Model     string    `json:"model"`
-    Messages  []Message `json:"messages"`
-    Stream    *bool     `json:"stream,omitempty"`
-    Format    string    `json:"format"`
+    // Model is the model name, as in [GenerateRequest].
+    Model string `json:"model"`
+
+    // Messages is the messages of the chat - can be used to keep a chat memory.
+    Messages []Message `json:"messages"`
+
+    // Stream enable streaming of returned response; true by default.
+    Stream *bool `json:"stream,omitempty"`
+
+    // Format is the format to return the response in (e.g. "json").
+    Format string `json:"format"`
+
+    // KeepAlive controls how long the model will stay loaded into memory
+    // followin the request.
     KeepAlive *Duration `json:"keep_alive,omitempty"`
+
+    // Options lists model-specific options.
     Options map[string]interface{} `json:"options"`
 }

+// Message is a single message in a chat sequence. The message contains the
+// role ("system", "user", or "assistant"), the content and an optional list
+// of images.
 type Message struct {
-    Role    string      `json:"role"` // one of ["system", "user", "assistant"]
+    Role    string      `json:"role"`
     Content string      `json:"content"`
     Images  []ImageData `json:"images,omitempty"`
 }

+// ChatResponse is the response returned by [Client.Chat]. Its fields are
+// similar to [GenerateResponse].
 type ChatResponse struct {
     Model      string    `json:"model"`
     CreatedAt  time.Time `json:"created_at"`
     Message    Message   `json:"message"`
+    DoneReason string    `json:"done_reason,omitempty"`

     Done bool `json:"done"`
```
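The newly documented `ChatRequest` fields can be read off a small construction example. This is a hypothetical sketch (the model name and message content are placeholders), showing in particular that `Stream` is a `*bool` defaulting to true and that `KeepAlive` takes a `Duration`:

```go
package main

import (
	"fmt"
	"time"

	"github.com/ollama/ollama/api"
)

func main() {
	stream := false // Stream is true by default when the pointer is nil
	req := &api.ChatRequest{
		Model: "llama3", // placeholder model name
		Messages: []api.Message{
			{Role: "user", Content: "Why is the sky blue?"},
		},
		Stream: &stream,
		// keep the model loaded for five minutes after this request
		KeepAlive: &api.Duration{Duration: 5 * time.Minute},
	}
	fmt.Printf("%+v\n", req)
}
```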
```diff
@@ -112,7 +133,8 @@ type Metrics struct {
     EvalDuration time.Duration `json:"eval_duration,omitempty"`
 }

-// Options specified in GenerateRequest, if you add a new option here add it to the API docs also
+// Options specified in [GenerateRequest], if you add a new option here add it
+// to the API docs also.
 type Options struct {
     Runner

@@ -141,7 +163,6 @@ type Runner struct {
     UseNUMA  bool `json:"numa,omitempty"`
     NumCtx   int  `json:"num_ctx,omitempty"`
     NumBatch int  `json:"num_batch,omitempty"`
-    NumGQA   int  `json:"num_gqa,omitempty"`
     NumGPU   int  `json:"num_gpu,omitempty"`
     MainGPU  int  `json:"main_gpu,omitempty"`
     LowVRAM  bool `json:"low_vram,omitempty"`

@@ -151,36 +172,45 @@ type Runner struct {
     UseMMap   bool `json:"use_mmap,omitempty"`
     UseMLock  bool `json:"use_mlock,omitempty"`
     NumThread int  `json:"num_thread,omitempty"`
-
-    // Unused: RopeFrequencyBase is ignored. Instead the value in the model will be used
-    RopeFrequencyBase float32 `json:"rope_frequency_base,omitempty"`
-    // Unused: RopeFrequencyScale is ignored. Instead the value in the model will be used
-    RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
 }

+// EmbeddingRequest is the request passed to [Client.Embeddings].
 type EmbeddingRequest struct {
-    Model     string    `json:"model"`
-    Prompt    string    `json:"prompt"`
+    // Model is the model name.
+    Model string `json:"model"`
+
+    // Prompt is the textual prompt to embed.
+    Prompt string `json:"prompt"`
+
+    // KeepAlive controls how long the model will stay loaded in memory following
+    // this request.
     KeepAlive *Duration `json:"keep_alive,omitempty"`
+
+    // Options lists model-specific options.
     Options map[string]interface{} `json:"options"`
 }

+// EmbeddingResponse is the response from [Client.Embeddings].
 type EmbeddingResponse struct {
     Embedding []float64 `json:"embedding"`
 }

+// CreateRequest is the request passed to [Client.Create].
 type CreateRequest struct {
     Model     string `json:"model"`
     Path      string `json:"path"`
     Modelfile string `json:"modelfile"`
     Stream    *bool  `json:"stream,omitempty"`
-    Quantization string `json:"quantization,omitempty"`
+    Quantize  string `json:"quantize,omitempty"`

     // Name is deprecated, see Model
     Name string `json:"name"`
+
+    // Quantization is deprecated, see Quantize
+    Quantization string `json:"quantization,omitempty"`
 }

+// DeleteRequest is the request passed to [Client.Delete].
 type DeleteRequest struct {
     Model string `json:"model"`

@@ -188,6 +218,7 @@ type DeleteRequest struct {
     Name string `json:"name"`
 }

+// ShowRequest is the request passed to [Client.Show].
 type ShowRequest struct {
     Model  string `json:"model"`
     System string `json:"system"`

@@ -199,6 +230,7 @@ type ShowRequest struct {
     Name string `json:"name"`
 }

+// ShowResponse is the response returned from [Client.Show].
 type ShowResponse struct {
     License   string `json:"license,omitempty"`
     Modelfile string `json:"modelfile,omitempty"`

@@ -209,11 +241,13 @@ type ShowResponse struct {
     Messages []Message `json:"messages,omitempty"`
 }

+// CopyRequest is the request passed to [Client.Copy].
 type CopyRequest struct {
     Source      string `json:"source"`
     Destination string `json:"destination"`
 }

+// PullRequest is the request passed to [Client.Pull].
 type PullRequest struct {
     Model    string `json:"model"`
     Insecure bool   `json:"insecure,omitempty"`

@@ -225,6 +259,8 @@ type PullRequest struct {
     Name string `json:"name"`
 }

+// ProgressResponse is the response passed to progress functions like
+// [PullProgressFunc] and [PushProgressFunc].
 type ProgressResponse struct {
     Status string `json:"status"`
     Digest string `json:"digest,omitempty"`

@@ -232,6 +268,7 @@ type ProgressResponse struct {
     Completed int64 `json:"completed,omitempty"`
 }

+// PushRequest is the request passed to [Client.Push].
 type PushRequest struct {
     Model    string `json:"model"`
     Insecure bool   `json:"insecure,omitempty"`

@@ -243,10 +280,12 @@ type PushRequest struct {
     Name string `json:"name"`
 }

+// ListResponse is the response from [Client.List].
 type ListResponse struct {
     Models []ModelResponse `json:"models"`
 }

+// ModelResponse is a single model description in [ListResponse].
 type ModelResponse struct {
     Name  string `json:"name"`
     Model string `json:"model"`

@@ -260,17 +299,31 @@ type TokenResponse struct {
     Token string `json:"token"`
 }

+// GenerateResponse is the response passed into [GenerateResponseFunc].
 type GenerateResponse struct {
-    Model     string    `json:"model"`
-    CreatedAt time.Time `json:"created_at"`
-    Response  string    `json:"response"`
+    // Model is the model name that generated the response.
+    Model string `json:"model"`

-    Done bool `json:"done"`
+    //CreatedAt is the timestamp of the response.
+    CreatedAt time.Time `json:"created_at"`
+
+    // Response is the textual response itself.
+    Response string `json:"response"`
+
+    // Done specifies if the response is complete.
+    Done bool `json:"done"`
+
+    // DoneReason is the reason the model stopped generating text.
+    DoneReason string `json:"done_reason,omitempty"`
+
+    // Context is an encoding of the conversation used in this response; this
+    // can be sent in the next request to keep a conversational memory.
     Context []int `json:"context,omitempty"`

     Metrics
 }

+// ModelDetails provides details about a model.
 type ModelDetails struct {
     ParentModel string `json:"parent_model"`
     Format      string `json:"format"`

@@ -308,7 +361,6 @@ func (m *Metrics) Summary() {
     }
 }

-var ErrInvalidOpts = errors.New("invalid options")
 var ErrInvalidHostPort = errors.New("invalid port specified in OLLAMA_HOST")

 func (opts *Options) FromMap(m map[string]interface{}) error {
```
```diff
@@ -324,76 +376,76 @@ func (opts *Options) FromMap(m map[string]interface{}) error {
     }
 }

-    invalidOpts := []string{}
     for key, val := range m {
-        if opt, ok := jsonOpts[key]; ok {
-            field := valueOpts.FieldByName(opt.Name)
-            if field.IsValid() && field.CanSet() {
-                if val == nil {
-                    continue
-                }
-
-                switch field.Kind() {
-                case reflect.Int:
-                    switch t := val.(type) {
-                    case int64:
-                        field.SetInt(t)
-                    case float64:
-                        // when JSON unmarshals numbers, it uses float64, not int
-                        field.SetInt(int64(t))
-                    default:
-                        return fmt.Errorf("option %q must be of type integer", key)
-                    }
-                case reflect.Bool:
-                    val, ok := val.(bool)
-                    if !ok {
-                        return fmt.Errorf("option %q must be of type boolean", key)
-                    }
-                    field.SetBool(val)
-                case reflect.Float32:
-                    // JSON unmarshals to float64
-                    val, ok := val.(float64)
-                    if !ok {
-                        return fmt.Errorf("option %q must be of type float32", key)
-                    }
-                    field.SetFloat(val)
-                case reflect.String:
-                    val, ok := val.(string)
-                    if !ok {
-                        return fmt.Errorf("option %q must be of type string", key)
-                    }
-                    field.SetString(val)
-                case reflect.Slice:
-                    // JSON unmarshals to []interface{}, not []string
-                    val, ok := val.([]interface{})
-                    if !ok {
-                        return fmt.Errorf("option %q must be of type array", key)
-                    }
-                    // convert []interface{} to []string
-                    slice := make([]string, len(val))
-                    for i, item := range val {
-                        str, ok := item.(string)
-                        if !ok {
-                            return fmt.Errorf("option %q must be of an array of strings", key)
-                        }
-                        slice[i] = str
-                    }
-                    field.Set(reflect.ValueOf(slice))
-                default:
-                    return fmt.Errorf("unknown type loading config params: %v", field.Kind())
-                }
-            }
-        } else {
-            invalidOpts = append(invalidOpts, key)
+        opt, ok := jsonOpts[key]
+        if !ok {
+            slog.Warn("invalid option provided", "option", opt.Name)
+            continue
+        }
+
+        field := valueOpts.FieldByName(opt.Name)
+        if field.IsValid() && field.CanSet() {
+            if val == nil {
+                continue
+            }
+
+            switch field.Kind() {
+            case reflect.Int:
+                switch t := val.(type) {
+                case int64:
+                    field.SetInt(t)
+                case float64:
+                    // when JSON unmarshals numbers, it uses float64, not int
+                    field.SetInt(int64(t))
+                default:
+                    return fmt.Errorf("option %q must be of type integer", key)
+                }
+            case reflect.Bool:
+                val, ok := val.(bool)
+                if !ok {
+                    return fmt.Errorf("option %q must be of type boolean", key)
+                }
+                field.SetBool(val)
+            case reflect.Float32:
+                // JSON unmarshals to float64
+                val, ok := val.(float64)
+                if !ok {
+                    return fmt.Errorf("option %q must be of type float32", key)
+                }
+                field.SetFloat(val)
+            case reflect.String:
+                val, ok := val.(string)
+                if !ok {
+                    return fmt.Errorf("option %q must be of type string", key)
+                }
+                field.SetString(val)
+            case reflect.Slice:
+                // JSON unmarshals to []interface{}, not []string
+                val, ok := val.([]interface{})
+                if !ok {
+                    return fmt.Errorf("option %q must be of type array", key)
+                }
+                // convert []interface{} to []string
+                slice := make([]string, len(val))
+                for i, item := range val {
+                    str, ok := item.(string)
+                    if !ok {
+                        return fmt.Errorf("option %q must be of an array of strings", key)
+                    }
+                    slice[i] = str
+                }
+                field.Set(reflect.ValueOf(slice))
+            default:
+                return fmt.Errorf("unknown type loading config params: %v", field.Kind())
+            }
         }
     }
-
-    if len(invalidOpts) > 0 {
-        return fmt.Errorf("%w: %v", ErrInvalidOpts, strings.Join(invalidOpts, ", "))
-    }
     return nil
 }

+// DefaultOptions is the default set of options for [GenerateRequest]; these
+// values are used unless the user specifies other values explicitly.
 func DefaultOptions() Options {
     return Options{
         // options set on request to runner
```
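The rewritten loop keeps the same type coercions as before; unknown options now log a warning instead of accumulating into an error. The `float64` branch under `reflect.Int` exists because `encoding/json` decodes every JSON number into `float64` when the destination is `interface{}`. A standalone sketch of that behavior:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	var opts map[string]interface{}
	if err := json.Unmarshal([]byte(`{"num_ctx": 4096, "temperature": 0.7}`), &opts); err != nil {
		log.Fatal(err)
	}

	// Both values decode as float64, even the integer-looking num_ctx,
	// which is why FromMap must accept float64 for reflect.Int fields.
	for k, v := range opts {
		fmt.Printf("%s: %T\n", k, v)
	}
}
```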
```diff
@@ -421,8 +473,7 @@ func DefaultOptions() Options {
     NumCtx:    2048,
     NumBatch:  512,
     NumGPU:    -1, // -1 here indicates that NumGPU should be set dynamically
-    NumGQA:    1,
     NumThread: 0, // let the runtime decide
     LowVRAM:   false,
     F16KV:     true,
     UseMLock:  false,

@@ -436,6 +487,13 @@ type Duration struct {
     time.Duration
 }

+func (d Duration) MarshalJSON() ([]byte, error) {
+    if d.Duration < 0 {
+        return []byte("-1"), nil
+    }
+    return []byte("\"" + d.Duration.String() + "\""), nil
+}
+
 func (d *Duration) UnmarshalJSON(b []byte) (err error) {
     var v any
     if err := json.Unmarshal(b, &v); err != nil {

@@ -449,7 +507,7 @@ func (d *Duration) UnmarshalJSON(b []byte) (err error) {
     if t < 0 {
         d.Duration = time.Duration(math.MaxInt64)
     } else {
-        d.Duration = time.Duration(t * float64(time.Second))
+        d.Duration = time.Duration(int(t) * int(time.Second))
     }
 case string:
     d.Duration, err = time.ParseDuration(t)

@@ -459,6 +517,8 @@ func (d *Duration) UnmarshalJSON(b []byte) (err error) {
     if d.Duration < 0 {
         d.Duration = time.Duration(math.MaxInt64)
     }
+default:
+    return fmt.Errorf("Unsupported type: '%s'", reflect.TypeOf(v))
 }

 return nil
```
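With `MarshalJSON` added, `Duration` now round-trips through JSON: positive values marshal to a duration string such as `"42s"`, while negative values marshal to `-1`, which unmarshals back to `math.MaxInt64` (keep the model loaded indefinitely). The new test below verifies exactly this; a quick standalone sketch of the same round trip:

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"time"

	"github.com/ollama/ollama/api"
)

func main() {
	// A positive duration marshals to a duration string.
	b, _ := json.Marshal(api.Duration{Duration: 42 * time.Second})
	fmt.Println(string(b)) // "42s"

	// A negative duration marshals to -1 ...
	b, _ = json.Marshal(api.Duration{Duration: -1})
	fmt.Println(string(b)) // -1

	// ... which unmarshals to the maximum duration.
	var d api.Duration
	_ = json.Unmarshal(b, &d)
	fmt.Println(d.Duration == math.MaxInt64) // true
}
```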
api/types_test.go

```diff
@@ -21,6 +21,11 @@ func TestKeepAliveParsingFromJSON(t *testing.T) {
     req: `{ "keep_alive": 42 }`,
     exp: &Duration{42 * time.Second},
 },
+{
+    name: "Positive Float",
+    req:  `{ "keep_alive": 42.5 }`,
+    exp:  &Duration{42 * time.Second},
+},
 {
     name: "Positive Integer String",
     req:  `{ "keep_alive": "42m" }`,

@@ -31,6 +36,11 @@ func TestKeepAliveParsingFromJSON(t *testing.T) {
     req: `{ "keep_alive": -1 }`,
     exp: &Duration{math.MaxInt64},
 },
+{
+    name: "Negative Float",
+    req:  `{ "keep_alive": -3.14 }`,
+    exp:  &Duration{math.MaxInt64},
+},
 {
     name: "Negative Integer String",
     req:  `{ "keep_alive": "-1m" }`,

@@ -48,3 +58,50 @@ func TestKeepAliveParsingFromJSON(t *testing.T) {
         })
     }
 }
+
+func TestDurationMarshalUnmarshal(t *testing.T) {
+    tests := []struct {
+        name     string
+        input    time.Duration
+        expected time.Duration
+    }{
+        {
+            "negative duration",
+            time.Duration(-1),
+            time.Duration(math.MaxInt64),
+        },
+        {
+            "positive duration",
+            time.Duration(42 * time.Second),
+            time.Duration(42 * time.Second),
+        },
+        {
+            "another positive duration",
+            time.Duration(42 * time.Minute),
+            time.Duration(42 * time.Minute),
+        },
+        {
+            "zero duration",
+            time.Duration(0),
+            time.Duration(0),
+        },
+        {
+            "max duration",
+            time.Duration(math.MaxInt64),
+            time.Duration(math.MaxInt64),
+        },
+    }
+
+    for _, test := range tests {
+        t.Run(test.name, func(t *testing.T) {
+            b, err := json.Marshal(Duration{test.input})
+            require.NoError(t, err)
+
+            var d Duration
+            err = json.Unmarshal(b, &d)
+            require.NoError(t, err)
+
+            assert.Equal(t, test.expected, d.Duration, "input %v, marshalled %v, got %v", test.input, string(b), d.Duration)
+        })
+    }
+}
```
app/lifecycle/logging.go

```diff
@@ -5,12 +5,14 @@ import (
     "log/slog"
     "os"
     "path/filepath"
+
+    "github.com/ollama/ollama/server/envconfig"
 )

 func InitLogging() {
     level := slog.LevelInfo

-    if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
+    if envconfig.Debug {
         level = slog.LevelDebug
     }
```
app/lifecycle/updater_windows.go

```diff
@@ -31,16 +31,13 @@ func DoUpgrade(cancel context.CancelFunc, done chan int) error {
     "/LOG=" + filepath.Base(UpgradeLogFile), // Only relative seems reliable, so set pwd
     "/FORCECLOSEAPPLICATIONS",               // Force close the tray app - might be needed
 }
-// When we're not in debug mode, make the upgrade as quiet as possible (no GUI, no prompts)
-// TODO - temporarily disable since we're pinning in debug mode for the preview
-// if debug := os.Getenv("OLLAMA_DEBUG"); debug == "" {
+// make the upgrade as quiet as possible (no GUI, no prompts)
 installArgs = append(installArgs,
     "/SP", // Skip the "This will install... Do you wish to continue" prompt
     "/SUPPRESSMSGBOXES",
     "/SILENT",
     "/VERYSILENT",
 )
-// }

 // Safeguard in case we have requests in flight that need to drain...
 slog.Info("Waiting for server to shutdown")
```
cmd/cmd.go (32 changes)

```diff
@@ -34,7 +34,6 @@ import (
     "github.com/ollama/ollama/api"
     "github.com/ollama/ollama/auth"
     "github.com/ollama/ollama/format"
-    "github.com/ollama/ollama/parser"
     "github.com/ollama/ollama/progress"
     "github.com/ollama/ollama/server"
     "github.com/ollama/ollama/types/errtypes"

@@ -57,13 +56,13 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
     p := progress.NewProgress(os.Stderr)
     defer p.Stop()

-    modelfile, err := os.Open(filename)
+    f, err := os.Open(filename)
     if err != nil {
         return err
     }
-    defer modelfile.Close()
+    defer f.Close()

-    commands, err := parser.Parse(modelfile)
+    modelfile, err := model.ParseFile(f)
     if err != nil {
         return err
     }

@@ -77,10 +76,10 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
     spinner := progress.NewSpinner(status)
     p.Add(status, spinner)

-    for i := range commands {
-        switch commands[i].Name {
+    for i := range modelfile.Commands {
+        switch modelfile.Commands[i].Name {
         case "model", "adapter":
-            path := commands[i].Args
+            path := modelfile.Commands[i].Args
             if path == "~" {
                 path = home
             } else if strings.HasPrefix(path, "~/") {

@@ -92,7 +91,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
             }

             fi, err := os.Stat(path)
-            if errors.Is(err, os.ErrNotExist) && commands[i].Name == "model" {
+            if errors.Is(err, os.ErrNotExist) && modelfile.Commands[i].Name == "model" {
                 continue
             } else if err != nil {
                 return err

@@ -115,7 +114,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
                 return err
             }

-            commands[i].Args = "@"+digest
+            modelfile.Commands[i].Args = "@" + digest
         }
     }

@@ -143,9 +142,9 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
         return nil
     }

-    quantization, _ := cmd.Flags().GetString("quantization")
+    quantize, _ := cmd.Flags().GetString("quantize")

-    request := api.CreateRequest{Name: args[0], Modelfile: parser.Format(commands), Quantization: quantization}
+    request := api.CreateRequest{Name: args[0], Modelfile: modelfile.String(), Quantize: quantize}
     if err := client.Create(cmd.Context(), &request, fn); err != nil {
         return err
     }

@@ -899,7 +898,12 @@ func RunServer(cmd *cobra.Command, _ []string) error {
         return err
     }

-    return server.Serve(ln)
+    err = server.Serve(ln)
+    if errors.Is(err, http.ErrServerClosed) {
+        return nil
+    }
+
+    return err
 }

 func initializeKeypair() error {

@@ -1046,8 +1050,8 @@ func NewCLI() *cobra.Command {
         RunE:    CreateHandler,
     }

-    createCmd.Flags().StringP("file", "f", "Modelfile", "Name of the Modelfile (default \"Modelfile\")")
-    createCmd.Flags().StringP("quantization", "q", "", "Quantization level.")
+    createCmd.Flags().StringP("file", "f", "Modelfile", "Name of the Modelfile")
+    createCmd.Flags().StringP("quantize", "q", "", "Quantize model to this level (e.g. q4_0)")

     showCmd := &cobra.Command{
         Use: "show MODEL",
```
cmd/interactive.go

```diff
@@ -162,7 +162,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
     fmt.Fprintln(os.Stderr, "  /set parameter repeat_penalty <float>   How strongly to penalize repetitions")
     fmt.Fprintln(os.Stderr, "  /set parameter repeat_last_n <int>      Set how far back to look for repetitions")
     fmt.Fprintln(os.Stderr, "  /set parameter num_gpu <int>            The number of layers to send to the GPU")
-    fmt.Fprintln(os.Stderr, "  /set parameter stop \"<string>\", ...   Set the stop parameters")
+    fmt.Fprintln(os.Stderr, "  /set parameter stop <string> <string> ...   Set the stop parameters")
     fmt.Fprintln(os.Stderr, "")
 }
```
convert/convert.go

```diff
@@ -5,6 +5,7 @@ import (
     "encoding/binary"
     "encoding/json"
     "fmt"
+    "io"
     "log/slog"
     "os"
     "path/filepath"

@@ -47,7 +48,7 @@ type ByteOrder interface {
 type ModelArch interface {
     GetTensors() error
     LoadVocab() error
-    WriteGGUF() (string, error)
+    WriteGGUF(io.WriteSeeker) error
 }

 type ModelFormat interface {
```
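With this signature change the `ModelArch` implementations no longer create their own temp files; the caller owns the destination and passes any `io.WriteSeeker` (an `*os.File` qualifies). A hypothetical helper sketching the calling side (`writeGGUFToTemp` is illustrative, not part of the diff):

```go
package convert

import "os"

// writeGGUFToTemp shows the inverted responsibility: the caller creates the
// output file and hands the ModelArch an io.WriteSeeker to encode into.
func writeGGUFToTemp(arch ModelArch) (string, error) {
	f, err := os.CreateTemp("", "ollama-gguf")
	if err != nil {
		return "", err
	}
	defer f.Close()

	if err := arch.WriteGGUF(f); err != nil {
		return "", err
	}

	// The caller, not the model, now knows the GGUF path.
	return f.Name(), nil
}
```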
convert/gemma.go

```diff
@@ -94,7 +94,7 @@ func (m *GemmaModel) LoadVocab() error {
     return nil
 }

-func (m *GemmaModel) WriteGGUF() (string, error) {
+func (m *GemmaModel) WriteGGUF(ws io.WriteSeeker) error {
     kv := llm.KV{
         "general.architecture": "gemma",
         "general.name":         m.Name,

@@ -122,16 +122,5 @@ func (m *GemmaModel) WriteGGUF() (string, error) {
         "tokenizer.ggml.add_eos_token": false,
     }

-    f, err := os.CreateTemp("", "ollama-gguf")
-    if err != nil {
-        return "", err
-    }
-    defer f.Close()
-
-    mod := llm.NewGGUFV3(m.Params.ByteOrder)
-    if err := mod.Encode(f, kv, m.Tensors); err != nil {
-        return "", err
-    }
-
-    return f.Name(), nil
+    return llm.NewGGUFV3(m.Params.ByteOrder).Encode(ws, kv, m.Tensors)
 }
```
convert/llama.go

```diff
@@ -5,7 +5,6 @@ import (
     "fmt"
     "io"
     "log/slog"
-    "os"
     "regexp"
     "strings"

@@ -132,7 +131,7 @@ func (m *LlamaModel) LoadVocab() error {
     return nil
 }

-func (m *LlamaModel) WriteGGUF() (string, error) {
+func (m *LlamaModel) WriteGGUF(ws io.WriteSeeker) error {
     kv := llm.KV{
         "general.architecture": "llama",
         "general.name":         m.Name,

@@ -159,18 +158,5 @@ func (m *LlamaModel) WriteGGUF() (string, error) {
         "tokenizer.ggml.add_eos_token": false,
     }

-    f, err := os.CreateTemp("", "ollama-gguf")
-    if err != nil {
-        return "", err
-    }
-    defer f.Close()
-
-    mod := llm.NewGGUFV3(m.Params.ByteOrder)
-    if err := mod.Encode(f, kv, m.Tensors); err != nil {
-        return "", err
-    }
-
-    slog.Debug(fmt.Sprintf("gguf file = %s", f.Name()))
-
-    return f.Name(), nil
+    return llm.NewGGUFV3(m.Params.ByteOrder).Encode(ws, kv, m.Tensors)
 }
```
convert/mistral.go

```diff
@@ -132,7 +132,7 @@ func (m *MistralModel) LoadVocab() error {
     return nil
 }

-func (m *MistralModel) WriteGGUF() (string, error) {
+func (m *MistralModel) WriteGGUF(ws io.WriteSeeker) error {
     kv := llm.KV{
         "general.architecture": "llama",
         "general.name":         m.Name,

@@ -158,16 +158,5 @@ func (m *MistralModel) WriteGGUF() (string, error) {
         "tokenizer.ggml.unknown_token_id": uint32(0),
     }

-    f, err := os.CreateTemp("", "ollama-gguf")
-    if err != nil {
-        return "", err
-    }
-    defer f.Close()
-
-    mod := llm.NewGGUFV3(m.Params.ByteOrder)
-    if err := mod.Encode(f, kv, m.Tensors); err != nil {
-        return "", err
-    }
-
-    return f.Name(), nil
+    return llm.NewGGUFV3(m.Params.ByteOrder).Encode(ws, kv, m.Tensors)
 }
```
convert/mixtral.go

```diff
@@ -1,7 +1,7 @@
 package convert

 import (
-    "os"
+    "io"
     "regexp"

     "github.com/ollama/ollama/llm"

@@ -47,7 +47,7 @@ func (m *MixtralModel) LoadVocab() error {
     return nil
 }

-func (m *MixtralModel) WriteGGUF() (string, error) {
+func (m *MixtralModel) WriteGGUF(ws io.WriteSeeker) error {
     kv := llm.KV{
         "general.architecture": "llama",
         "general.name":         m.Name,

@@ -81,16 +81,5 @@ func (m *MixtralModel) WriteGGUF() (string, error) {
         "tokenizer.ggml.add_eos_token": false,
     }

-    f, err := os.CreateTemp("", "ollama-gguf")
-    if err != nil {
-        return "", err
-    }
-    defer f.Close()
-
-    mod := llm.NewGGUFV3(m.Params.ByteOrder)
-    if err := mod.Encode(f, kv, m.Tensors); err != nil {
-        return "", err
-    }
-
-    return f.Name(), nil
+    return llm.NewGGUFV3(m.Params.ByteOrder).Encode(ws, kv, m.Tensors)
 }
```
@@ -53,7 +53,7 @@ func (m *SafetensorFormat) GetTensors(dirpath string, params *Params) ([]llm.Ten
|
|||||||
var err error
|
var err error
|
||||||
t, offset, err = m.readTensors(f, offset, params)
|
t, offset, err = m.readTensors(f, offset, params)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
slog.Error("%v", err)
|
slog.Error(err.Error())
|
||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
tensors = append(tensors, t...)
|
tensors = append(tensors, t...)
|
||||||
@@ -122,7 +122,7 @@ func (m *SafetensorFormat) readTensors(fn string, offset uint64, params *Params)
|
|||||||
|
|
||||||
ggufName, err := m.GetLayerName(k)
|
ggufName, err := m.GetLayerName(k)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
slog.Error("%v", err)
|
slog.Error(err.Error())
|
||||||
return nil, 0, err
|
return nil, 0, err
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@@ -74,7 +74,7 @@ func (tf *TorchFormat) GetTensors(dirpath string, params *Params) ([]llm.Tensor,

 	ggufName, err := tf.GetLayerName(k.(string))
 	if err != nil {
-		slog.Error("%v", err)
+		slog.Error(err.Error())
 		return nil, err
 	}
 	slog.Debug(fmt.Sprintf("finding name for '%s' -> '%s'", k.(string), ggufName))
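Both logging fixes here correct the same misuse: `log/slog` is structured, not printf-style, so its first argument is the message verbatim. A small standalone sketch of the behavior difference (standard library only):

```go
package main

import (
	"errors"
	"log/slog"
)

func main() {
	err := errors.New("tensor not found")

	// Before the fix: slog treats "%v" as the literal message, and the lone
	// err argument has no key, so it is logged as a !BADKEY attribute.
	slog.Error("%v", err)

	// After the fix: the error text itself becomes the message.
	slog.Error(err.Error())

	// Idiomatic alternative: a fixed message with the error as an attribute.
	slog.Error("failed to read tensors", "error", err)
}
```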
@@ -6,7 +6,7 @@
 * [Importing models](./import.md)
 * [Linux Documentation](./linux.md)
 * [Windows Documentation](./windows.md)
-* [Docker Documentation](https://hub.docker.com/r/ollama/ollama)
+* [Docker Documentation](./docker.md)

 ### Reference

docs/api.md (63 lines changed)

@@ -17,7 +17,7 @@

 ### Model names

-Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
+Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama3:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.

 ### Durations

@@ -66,7 +66,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur

 ```shell
 curl http://localhost:11434/api/generate -d '{
-  "model": "llama2",
+  "model": "llama3",
   "prompt": "Why is the sky blue?"
 }'
 ```
@@ -77,7 +77,7 @@ A stream of JSON objects is returned:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T08:52:19.385406455-07:00",
   "response": "The",
   "done": false
@@ -95,11 +95,11 @@ The final response in the stream also includes additional data about the generat
 - `context`: an encoding of the conversation used in this response, this can be sent in the next request to keep a conversational memory
 - `response`: empty if the response was streamed, if not streamed, this will contain the full response

-To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` / `eval_duration`.
+To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` / `eval_duration` * `10^9`.

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T19:22:45.499127Z",
   "response": "",
   "done": true,
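The corrected formula accounts for `eval_duration` being reported in nanoseconds, so the ratio has to be scaled by 10^9 to yield tokens per second. A quick sketch of the arithmetic in Go (the sample values are illustrative):

```go
package main

import "fmt"

func main() {
	// Values as they would appear in a final /api/generate response;
	// eval_duration is in nanoseconds.
	evalCount := 290                 // tokens generated
	evalDuration := int64(4709213000) // ~4.7 seconds

	tokensPerSecond := float64(evalCount) / float64(evalDuration) * 1e9
	fmt.Printf("%.1f tokens/s\n", tokensPerSecond) // prints ~61.6 tokens/s
}
```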
@@ -121,7 +121,7 @@ A response can be received in one reply when streaming is off.

 ```shell
 curl http://localhost:11434/api/generate -d '{
-  "model": "llama2",
+  "model": "llama3",
   "prompt": "Why is the sky blue?",
   "stream": false
 }'
@@ -133,7 +133,7 @@ If `stream` is set to `false`, the response will be a single JSON object:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T19:22:45.499127Z",
   "response": "The sky is blue because it is the color of the sky.",
   "done": true,
@@ -155,7 +155,7 @@ If `stream` is set to `false`, the response will be a single JSON object:

 ```shell
 curl http://localhost:11434/api/generate -d '{
-  "model": "llama2",
+  "model": "llama3",
   "prompt": "What color is the sky at different times of the day? Respond using JSON",
   "format": "json",
   "stream": false
@@ -166,7 +166,7 @@ curl http://localhost:11434/api/generate -d '{

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-11-09T21:07:55.186497Z",
   "response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
   "done": true,
@@ -289,7 +289,7 @@ If you want to set custom options for the model at runtime rather than in the Mo

 ```shell
 curl http://localhost:11434/api/generate -d '{
-  "model": "llama2",
+  "model": "llama3",
   "prompt": "Why is the sky blue?",
   "stream": false,
   "options": {
@@ -313,7 +313,6 @@ curl http://localhost:11434/api/generate -d '{
     "numa": false,
     "num_ctx": 1024,
     "num_batch": 2,
-    "num_gqa": 1,
     "num_gpu": 1,
     "main_gpu": 0,
     "low_vram": false,
@@ -321,8 +320,6 @@ curl http://localhost:11434/api/generate -d '{
     "vocab_only": false,
     "use_mmap": true,
     "use_mlock": false,
-    "rope_frequency_base": 1.1,
-    "rope_frequency_scale": 0.8,
     "num_thread": 8
   }
 }'
@@ -332,7 +329,7 @@ curl http://localhost:11434/api/generate -d '{

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T19:22:45.499127Z",
   "response": "The sky is blue because it is the color of the sky.",
   "done": true,
@@ -354,7 +351,7 @@ If an empty prompt is provided, the model will be loaded into memory.

 ```shell
 curl http://localhost:11434/api/generate -d '{
-  "model": "llama2"
+  "model": "llama3"
 }'
 ```

@@ -364,7 +361,7 @@ A single JSON object is returned:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-12-18T19:52:07.071755Z",
   "response": "",
   "done": true
@@ -407,7 +404,7 @@ Send a chat message with a streaming response.

 ```shell
 curl http://localhost:11434/api/chat -d '{
-  "model": "llama2",
+  "model": "llama3",
   "messages": [
     {
       "role": "user",
@@ -423,7 +420,7 @@ A stream of JSON objects is returned:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T08:52:19.385406455-07:00",
   "message": {
     "role": "assistant",
@@ -438,7 +435,7 @@ Final response:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T19:22:45.499127Z",
   "done": true,
   "total_duration": 4883583458,
@@ -456,7 +453,7 @@ Final response:

 ```shell
 curl http://localhost:11434/api/chat -d '{
-  "model": "llama2",
+  "model": "llama3",
   "messages": [
     {
       "role": "user",
@@ -471,7 +468,7 @@ curl http://localhost:11434/api/chat -d '{

 ```json
 {
-  "model": "registry.ollama.ai/library/llama2:latest",
+  "model": "registry.ollama.ai/library/llama3:latest",
   "created_at": "2023-12-12T14:13:43.416799Z",
   "message": {
     "role": "assistant",
@@ -495,7 +492,7 @@ Send a chat message with a conversation history. You can use this same approach

 ```shell
 curl http://localhost:11434/api/chat -d '{
-  "model": "llama2",
+  "model": "llama3",
   "messages": [
     {
       "role": "user",
@@ -519,7 +516,7 @@ A stream of JSON objects is returned:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T08:52:19.385406455-07:00",
   "message": {
     "role": "assistant",
@@ -533,7 +530,7 @@ Final response:

 ```json
 {
-  "model": "llama2",
+  "model": "llama3",
   "created_at": "2023-08-04T19:22:45.499127Z",
   "done": true,
   "total_duration": 8113331500,
@@ -591,7 +588,7 @@ curl http://localhost:11434/api/chat -d '{

 ```shell
 curl http://localhost:11434/api/chat -d '{
-  "model": "llama2",
+  "model": "llama3",
   "messages": [
     {
       "role": "user",
@@ -609,7 +606,7 @@ curl http://localhost:11434/api/chat -d '{

 ```json
 {
-  "model": "registry.ollama.ai/library/llama2:latest",
+  "model": "registry.ollama.ai/library/llama3:latest",
   "created_at": "2023-12-12T14:13:43.416799Z",
   "message": {
     "role": "assistant",
@@ -651,7 +648,7 @@ Create a new model from a `Modelfile`.

 ```shell
 curl http://localhost:11434/api/create -d '{
   "name": "mario",
-  "modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
+  "modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
 }'
 ```

@@ -758,7 +755,7 @@ A single JSON object will be returned.
       }
     },
     {
-      "name": "llama2:latest",
+      "name": "llama3:latest",
       "modified_at": "2023-12-07T09:32:18.757212583-08:00",
       "size": 3825819519,
       "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
@@ -792,7 +789,7 @@ Show information about a model including details, modelfile, template, parameter

 ```shell
 curl http://localhost:11434/api/show -d '{
-  "name": "llama2"
+  "name": "llama3"
 }'
 ```

@@ -827,8 +824,8 @@ Copy a model. Creates a model with another name from an existing model.

 ```shell
 curl http://localhost:11434/api/copy -d '{
-  "source": "llama2",
-  "destination": "llama2-backup"
+  "source": "llama3",
+  "destination": "llama3-backup"
 }'
 ```

@@ -882,7 +879,7 @@ Download a model from the ollama library. Cancelled pulls are resumed from where

 ```shell
 curl http://localhost:11434/api/pull -d '{
-  "name": "llama2"
+  "name": "llama3"
 }'
 ```

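The pull endpoint above streams its progress as a series of JSON status objects, one per line. A rough Go sketch of consuming that stream (error handling kept minimal; the endpoint and payload follow the docs above):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	body := []byte(`{"name": "llama3"}`)

	resp, err := http.Post("http://localhost:11434/api/pull", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Each line of the response body is one JSON status object.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
```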
docs/docker.md (new file, 71 lines)

@@ -0,0 +1,71 @@
+# Ollama Docker image
+
+### CPU only
+
+```bash
+docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```
+
+### Nvidia GPU
+Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation).
+
+#### Install with Apt
+1. Configure the repository
+
+   ```bash
+   curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
+     | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+   curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
+     | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
+     | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+   sudo apt-get update
+   ```
+
+2. Install the NVIDIA Container Toolkit packages
+
+   ```bash
+   sudo apt-get install -y nvidia-container-toolkit
+   ```
+
+#### Install with Yum or Dnf
+1. Configure the repository
+
+   ```bash
+   curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
+     | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
+   ```
+
+2. Install the NVIDIA Container Toolkit packages
+
+   ```bash
+   sudo yum install -y nvidia-container-toolkit
+   ```
+
+#### Configure Docker to use Nvidia driver
+
+```
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```
+
+#### Start the container
+
+```bash
+docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+```
+
+### AMD GPU
+
+To run Ollama using Docker with AMD GPUs, use the `rocm` tag and the following command:
+
+```
+docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
+```
+
+### Run model locally
+
+Now you can run a model:
+
+```
+docker exec -it ollama ollama run llama3
+```
+
+### Try different models
+
+More models can be found on the [Ollama library](https://ollama.com/library).
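Once any of the containers above is running, the API is published on host port 11434. A small Go smoke test against the containerized server (assumes a model such as `llama3` has already been pulled, for example via the `docker exec` command above):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}`)

	// 11434 is the host port mapped by the docker run commands above.
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```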
docs/faq.md (14 lines changed)

@@ -32,7 +32,7 @@ When using the API, specify the `num_ctx` parameter:

 ```
 curl http://localhost:11434/api/generate -d '{
-  "model": "llama2",
+  "model": "llama3",
   "prompt": "Why is the sky blue?",
   "options": {
     "num_ctx": 4096
@@ -140,7 +140,7 @@ Refer to the section [above](#how-do-i-configure-ollama-server) for how to set e

 - macOS: `~/.ollama/models`
 - Linux: `/usr/share/ollama/.ollama/models`
-- Windows: `C:\Users\<username>\.ollama\models`
+- Windows: `C:\Users\%username%\.ollama\models`

 ### How do I set them to a different location?

@@ -221,14 +221,20 @@ The `keep_alive` parameter can be set to:

 For example, to preload a model and leave it in memory use:
 ```shell
-curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
+curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'
 ```

 To unload the model and free up memory use:
 ```shell
-curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
+curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
 ```

 Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable uses the same parameter types as the `keep_alive` parameter types mentioned above. Refer to the section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.

 If you wish to override the `OLLAMA_KEEP_ALIVE` setting, use the `keep_alive` API parameter with the `/api/generate` or `/api/chat` API endpoints.
+
+## How do I manage the maximum number of requests the server can queue?
+
+If too many requests are sent to the server, it will respond with a 503 error
+indicating the server is overloaded. You can adjust how many requests may be
+queued by setting `OLLAMA_MAX_QUEUE`.
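Because a full queue surfaces to clients as an ordinary HTTP 503, callers can treat it as retryable. A rough client-side sketch in Go (the retry count and backoff are arbitrary choices, not Ollama defaults):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

func main() {
	body := []byte(`{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}`)

	for attempt := 1; attempt <= 5; attempt++ {
		resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		if resp.StatusCode != http.StatusServiceUnavailable {
			fmt.Println("status:", resp.Status)
			resp.Body.Close()
			return
		}
		// 503 means the server queue (OLLAMA_MAX_QUEUE) is full; back off and retry.
		resp.Body.Close()
		time.Sleep(time.Duration(attempt) * time.Second)
	}
	fmt.Println("server still overloaded after retries")
}
```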
@@ -125,7 +125,7 @@ Publishing models is in early alpha. If you'd like to publish your model to shar

 1. Create [an account](https://ollama.com/signup)
 2. Copy your Ollama public key:
-   - macOS: `cat ~/.ollama/id_ed25519.pub`
+   - macOS: `cat ~/.ollama/id_ed25519.pub | pbcopy`
    - Windows: `type %USERPROFILE%\.ollama\id_ed25519.pub`
    - Linux: `cat /usr/share/ollama/.ollama/id_ed25519.pub`
 3. Add your public key to your [Ollama account](https://ollama.com/settings/keys)
@@ -136,6 +136,8 @@ Next, copy your model to your username's namespace:
 ollama cp example <your username>/example
 ```

+> Note: model names may only contain lowercase letters, digits, and the characters `.`, `-`, and `_`.
+
 Then push the model:

 ```
@@ -105,7 +105,7 @@ sudo chmod +x /usr/bin/ollama
 To view logs of Ollama running as a startup service, run:

 ```bash
-journalctl -u ollama
+journalctl -e -u ollama
 ```

 ## Uninstall

@@ -10,7 +10,7 @@ A model file is the blueprint to create and share models with Ollama.

 - [Examples](#examples)
 - [Instructions](#instructions)
   - [FROM (Required)](#from-required)
-    - [Build from llama2](#build-from-llama2)
+    - [Build from llama3](#build-from-llama3)
     - [Build from a bin file](#build-from-a-bin-file)
   - [PARAMETER](#parameter)
     - [Valid Parameters and Values](#valid-parameters-and-values)
@@ -48,7 +48,7 @@ INSTRUCTION arguments
 An example of a `Modelfile` creating a mario blueprint:

 ```modelfile
-FROM llama2
+FROM llama3
 # sets the temperature to 1 [higher is more creative, lower is more coherent]
 PARAMETER temperature 1
 # sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
@@ -67,33 +67,25 @@ To use this:

 More examples are available in the [examples directory](../examples).

-### `Modelfile`s in [ollama.com/library][1]
-
-There are two ways to view `Modelfile`s underlying the models in [ollama.com/library][1]:
-
-- Option 1: view a details page from a model's tags page:
-  1. Go to a particular model's tags (e.g. https://ollama.com/library/llama2/tags)
-  2. Click on a tag (e.g. https://ollama.com/library/llama2:13b)
-  3. Scroll down to "Layers"
-     - Note: if the [`FROM` instruction](#from-required) is not present,
-       it means the model was created from a local file
-- Option 2: use `ollama show` to print the `Modelfile` for any local models like so:
+To view the Modelfile of a given model, use the `ollama show --modelfile` command.

 ```bash
-> ollama show --modelfile llama2:13b
+> ollama show --modelfile llama3
 # Modelfile generated by "ollama show"
 # To build a new Modelfile based on this one, replace the FROM line with:
-# FROM llama2:13b
-
-FROM /root/.ollama/models/blobs/sha256:123abc
-TEMPLATE """[INST] {{ if .System }}<<SYS>>{{ .System }}<</SYS>>
-
-{{ end }}{{ .Prompt }} [/INST] """
-SYSTEM """"""
-PARAMETER stop [INST]
-PARAMETER stop [/INST]
-PARAMETER stop <<SYS>>
-PARAMETER stop <</SYS>>
+# FROM llama3:latest
+FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
+TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
+
+{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
+
+{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
+
+{{ .Response }}<|eot_id|>"""
+PARAMETER stop "<|start_header_id|>"
+PARAMETER stop "<|end_header_id|>"
+PARAMETER stop "<|eot_id|>"
+PARAMETER stop "<|reserved_special_token"
 ```

 ## Instructions

@@ -106,10 +98,10 @@ The `FROM` instruction defines the base model to use when creating a model.
 FROM <model name>:<tag>
 ```

-#### Build from llama2
+#### Build from llama3

 ```modelfile
-FROM llama2
+FROM llama3
 ```

 A list of available base models:
@@ -25,7 +25,7 @@ chat_completion = client.chat.completions.create(
             'content': 'Say this is a test',
         }
     ],
-    model='llama2',
+    model='llama3',
 )
 ```

@@ -43,7 +43,7 @@ const openai = new OpenAI({

 const chatCompletion = await openai.chat.completions.create({
     messages: [{ role: 'user', content: 'Say this is a test' }],
-    model: 'llama2',
+    model: 'llama3',
 })
 ```

@@ -53,7 +53,7 @@ const chatCompletion = await openai.chat.completions.create({
 curl http://localhost:11434/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
-        "model": "llama2",
+        "model": "llama3",
         "messages": [
             {
                 "role": "system",
@@ -113,7 +113,7 @@ curl http://localhost:11434/v1/chat/completions \
 Before using a model, pull it locally with `ollama pull`:

 ```shell
-ollama pull llama2
+ollama pull llama3
 ```

 ### Default model names
@@ -121,7 +121,7 @@ ollama pull llama2
 For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:

 ```
-ollama cp llama2 gpt-3.5-turbo
+ollama cp llama3 gpt-3.5-turbo
 ```

 Afterwards, this new model name can be specified in the `model` field:

@@ -83,3 +83,22 @@ If your system is configured with the "noexec" flag where Ollama stores its
 temporary executable files, you can specify an alternate location by setting
 OLLAMA_TMPDIR to a location writable by the user ollama runs as. For example
 OLLAMA_TMPDIR=/usr/share/ollama/
+
+## Container fails to run on NVIDIA GPU
+
+Make sure you've set up the container runtime first as described in [docker.md](./docker.md)
+
+Sometimes the container runtime can have difficulties initializing the GPU.
+When you check the server logs, this can show up as various error codes, such
+as "3" (not initialized), "46" (device unavailable), "100" (no device), "999"
+(unknown), or others. The following troubleshooting techniques may help resolve
+the problem:
+
+- Is the uvm driver not loaded? `sudo nvidia-modprobe -u`
+- Try reloading the nvidia_uvm driver - `sudo rmmod nvidia_uvm` then `sudo modprobe nvidia_uvm`
+- Try rebooting
+- Make sure you're running the latest nvidia drivers
+
+If none of those resolve the problem, gather additional information and file an issue:
+- Set `CUDA_ERROR_LEVEL=50` and try again to get more diagnostic logs
+- Check dmesg for any errors: `sudo dmesg | grep -i nvrm` and `sudo dmesg | grep -i nvidia`
@@ -5,17 +5,17 @@ In this tutorial, we are going to use JavaScript with LangChain and Ollama to le
 To get started, let's just use **LangChain** to ask a simple question to a model. To do this with JavaScript, we need to install **LangChain**:

 ```bash
-npm install langchain
+npm install @langchain/community
 ```

 Now we can start building out our JavaScript:

 ```javascript
-import { Ollama } from "langchain/llms/ollama";
+import { Ollama } from "@langchain/community/llms/ollama";

 const ollama = new Ollama({
   baseUrl: "http://localhost:11434",
-  model: "llama2",
+  model: "llama3",
 });

 const answer = await ollama.invoke(`why is the sky blue?`);
@@ -23,7 +23,7 @@ const answer = await ollama.invoke(`why is the sky blue?`);
 console.log(answer);
 ```

-That will get us the same thing as if we ran `ollama run llama2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
+That will get us the same thing as if we ran `ollama run llama3 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.

 ```bash
 npm install cheerio
@@ -12,7 +12,7 @@ So let's figure out how we can use **LangChain** with Ollama to ask our question

 Let's start by asking a simple question that we can get an answer to from the **Llama2** model using **Ollama**. First, we need to install the **LangChain** package:

-`pip install langchain`
+`pip install langchain_community`

 Then we can create a model and ask the question:

@@ -27,7 +27,7 @@ Logs will often be helpful in diagnosing the problem (see

 Here's a quick example showing API access from `powershell`:
 ```powershell
-(Invoke-WebRequest -method POST -Body '{"model":"llama2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
+(Invoke-WebRequest -method POST -Body '{"model":"llama3", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
 ```

 ## Troubleshooting

@@ -45,3 +45,17 @@ the explorer window by hitting `<cmd>+R` and type in:
 - `explorer %LOCALAPPDATA%\Programs\Ollama` contains the binaries (The installer adds this to your user PATH)
 - `explorer %HOMEPATH%\.ollama` contains models and configuration
 - `explorer %TEMP%` contains temporary executable files in one or more `ollama*` directories
+
+## Standalone CLI
+
+The easiest way to install Ollama on Windows is to use the `OllamaSetup.exe`
+installer. It installs in your account without requiring Administrator rights.
+We update Ollama regularly to support the latest models, and this installer will
+help you keep up to date.
+
+If you'd like to install or integrate Ollama as a service, a standalone
+`ollama-windows-amd64.zip` zip file is available containing only the Ollama CLI
+and GPU library dependencies for Nvidia and AMD. This allows for embedding
+Ollama in existing applications, or running it as a system service via `ollama
+serve` with tools such as [NSSM](https://nssm.cc/).
@@ -1,10 +0,0 @@
-# Bash Shell examples
-
-When calling `ollama`, you can pass it a file to run all the prompts in the file, one after the other:
-
-`ollama run llama2 < sourcequestions.txt`
-
-This concept is used in the following example.
-
-## Compare Models
-`comparemodels.sh` is a script that runs all the questions in `sourcequestions.txt` using any 4 models you choose that you have already pulled from the Ollama library or have created locally.
@@ -1,64 +0,0 @@
-#! /usr/bin/env bash
-# Compare multiple models by running them with the same questions
-
-NUMBEROFCHOICES=4
-SELECTIONS=()
-declare -a SUMS=()
-
-# Get the list of models
-CHOICES=$(ollama list | awk '{print $1}')
-
-# Select which models to run as a comparison
-echo "Select $NUMBEROFCHOICES models to compare:"
-select ITEM in $CHOICES; do
-  if [[ -n $ITEM ]]; then
-    echo "You have selected $ITEM"
-    SELECTIONS+=("$ITEM")
-    ((COUNT++))
-    if [[ $COUNT -eq $NUMBEROFCHOICES ]]; then
-      break
-    fi
-  else
-    echo "Invalid selection"
-  fi
-done
-
-# Loop through each of the selected models
-for ITEM in "${SELECTIONS[@]}"; do
-  echo "--------------------------------------------------------------"
-  echo "Loading the model $ITEM into memory"
-  ollama run "$ITEM" ""
-  echo "--------------------------------------------------------------"
-  echo "Running the questions through the model $ITEM"
-  COMMAND_OUTPUT=$(ollama run "$ITEM" --verbose < sourcequestions.txt 2>&1| tee /dev/stderr)
-
-  # eval duration is sometimes listed in seconds and sometimes in milliseconds.
-  # Add up the values for each model
-  SUM=$(echo "$COMMAND_OUTPUT" | awk '
-  /eval duration:/ {
-    value = $3
-    if (index(value, "ms") > 0) {
-      gsub("ms", "", value)
-      value /= 1000
-    } else {
-      gsub("s", "", value)
-    }
-    sum += value
-  }
-  END { print sum }')
-
-  SUMS+=("All questions for $ITEM completed in $SUM seconds")
-done
-
-echo ""
-echo "--------------------------------------------------------------"
-echo -e "Sums of eval durations for each run:"
-for val in "${SUMS[@]}"; do
-  echo "$val"
-done
-echo "--------------------------------------------------------------"
-echo "Comparison complete. Now you can decide"
-echo "which model is best."
-echo "--------------------------------------------------------------"
@@ -1,7 +0,0 @@
-Why is the sky blue
-What is a black hole
-Explain the big bang theory like I am 5?
-What is the quickest way to win a game of Monopoly with 3 others?
-Why does a vacuum bottle keep my coffee hot and my milkshake cold?
-What is the difference between a meteor, a meteorite, and a meteoroid?
-Create an array with 5 items and print to the console. Do this in Python, C#, Typescript, and Rust.
examples/flyio/.gitignore (new file, vendored)

@@ -0,0 +1 @@
+fly.toml
examples/flyio/README.md (new file, 67 lines)

@@ -0,0 +1,67 @@
+# Deploy Ollama to Fly.io
+
+> Note: this example exposes a public endpoint and does not configure authentication. Use with care.
+
+## Prerequisites
+
+- Ollama: https://ollama.com/download
+- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up
+
+## Steps
+
+1. Login to Fly.io
+
+   ```bash
+   fly auth login
+   ```
+
+1. Create a new Fly app
+
+   ```bash
+   fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
+   ```
+
+1. Pull and run `orca-mini:3b`
+
+   ```bash
+   OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
+   ```
+
+`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type or attach a GPU for hardware acceleration (see below).
+
+## (Optional) Persistent Volume
+
+By default Fly Machines use ephemeral storage, which is problematic if you want to use the same model across restarts without pulling it again. Create and attach a persistent volume to store the downloaded models:
+
+1. Create the Fly Volume
+
+   ```bash
+   fly volume create ollama
+   ```
+
+1. Update `fly.toml` and add `[mounts]`
+
+   ```toml
+   [mounts]
+     source = "ollama"
+     destination = "/mnt/ollama/models"
+   ```
+
+1. Update `fly.toml` and add `[env]`
+
+   ```toml
+   [env]
+     OLLAMA_MODELS = "/mnt/ollama/models"
+   ```
+
+1. Deploy your app
+
+   ```bash
+   fly deploy
+   ```
+
+## (Optional) Hardware Acceleration
+
+Fly.io GPU is currently in waitlist. Sign up for the waitlist: https://fly.io/gpu
+
+Once you've been accepted, create the app with the additional flags `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.
@@ -35,7 +35,7 @@ func main() {

 	ctx := context.Background()
 	req := &api.ChatRequest{
-		Model:    "llama2",
+		Model:    "llama3",
 		Messages: messages,
 	}

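For context, the fragment above comes from a small chat client built on the `github.com/ollama/ollama/api` package. A condensed sketch of how the request is wired up and streamed (the prompt text is illustrative):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	// Reads OLLAMA_HOST if set, otherwise defaults to the local server.
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	messages := []api.Message{
		{Role: "user", Content: "why is the sky blue?"},
	}

	req := &api.ChatRequest{
		Model:    "llama3",
		Messages: messages,
	}

	// The response arrives as a stream of partial messages.
	respFunc := func(resp api.ChatResponse) error {
		fmt.Print(resp.Message.Content)
		return nil
	}

	if err := client.Chat(context.Background(), req, respFunc); err != nil {
		log.Fatal(err)
	}
}
```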
@@ -7,12 +7,24 @@

 ## Steps

-1. Create the Ollama namespace, daemon set, and service
+1. Create the Ollama namespace, deployment, and service

    ```bash
    kubectl apply -f cpu.yaml
    ```

+## (Optional) Hardware Acceleration
+
+Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin), which is deployed in Kubernetes in the form of a daemonset. Follow the link for more details.
+
+Once configured, create a GPU enabled Ollama deployment.
+
+```bash
+kubectl apply -f gpu.yaml
+```
+
+## Test
+
 1. Port forward the Ollama service to connect and use it locally

    ```bash
@@ -24,13 +36,3 @@
    ```bash
    ollama run orca-mini:3b
    ```
-
-## (Optional) Hardware Acceleration
-
-Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin). Follow the link for more details.
-
-Once configured, create a GPU enabled Ollama deployment.
-
-```bash
-kubectl apply -f gpu.yaml
-```
@@ -51,7 +51,7 @@ while True:
         template=template,
     )

-    llm = Ollama(model="llama2:13b", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
+    llm = Ollama(model="llama3:8b", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
     qa_chain = RetrievalQA.from_chain_type(
         llm,
         retriever=vectorstore.as_retriever(),
@@ -1,12 +1,12 @@
-from langchain.llms import Ollama
-from langchain.document_loaders import WebBaseLoader
+from langchain_community.llms import Ollama
+from langchain_community.document_loaders import WebBaseLoader
 from langchain.chains.summarize import load_summarize_chain

 loader = WebBaseLoader("https://ollama.com/blog/run-llama2-uncensored-locally")
 docs = loader.load()

-llm = Ollama(model="llama2")
+llm = Ollama(model="llama3")
 chain = load_summarize_chain(llm, chain_type="stuff")

-result = chain.run(docs)
+result = chain.invoke(docs)
 print(result)
@@ -4,10 +4,10 @@ This example is a basic "hello world" of using LangChain with Ollama.

 ## Running the Example

-1. Ensure you have the `llama2` model installed:
+1. Ensure you have the `llama3` model installed:

    ```bash
-   ollama pull llama2
+   ollama pull llama3
    ```

 2. Install the Python Requirements.
@@ -21,4 +21,3 @@ This example is a basic "hello world" of using LangChain with Ollama.
    ```bash
    python main.py
    ```
-
@@ -1,6 +1,6 @@
 from langchain.llms import Ollama

 input = input("What is your question?")
-llm = Ollama(model="llama2")
+llm = Ollama(model="llama3")
 res = llm.predict(input)
 print (res)
@@ -1,4 +1,4 @@
-FROM llama2
+FROM llama3
 PARAMETER temperature 1
 SYSTEM """
 You are Mario from super mario bros, acting as an assistant.
@@ -2,12 +2,12 @@

 # Example character: Mario

-This example shows how to create a basic character using Llama2 as the base model.
+This example shows how to create a basic character using Llama3 as the base model.

 To run this example:

 1. Download the Modelfile
-2. `ollama pull llama2` to get the base model used in the model file.
+2. `ollama pull llama3` to get the base model used in the model file.
 3. `ollama create NAME -f ./Modelfile`
 4. `ollama run NAME`

@@ -18,7 +18,7 @@ Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
 What the model file looks like:

 ```
-FROM llama2
+FROM llama3
 PARAMETER temperature 1
 SYSTEM """
 You are Mario from Super Mario Bros, acting as an assistant.
@@ -2,7 +2,7 @@ import requests
 import json
 import random

-model = "llama2"
+model = "llama3"
 template = {
     "firstName": "",
     "lastName": "",
@@ -12,7 +12,7 @@ countries = [
     "France",
 ]
 country = random.choice(countries)
-model = "llama2"
+model = "llama3"

 prompt = f"generate one realistically believable sample data set of a persons first name, last name, address in {country}, and phone number. Do not use common names. Respond using JSON. Key names should have no backslashes, values should use plain ascii with no special characters."

@@ -6,10 +6,10 @@ There are two python scripts in this example. `randomaddresses.py` generates ran

 ## Running the Example

-1. Ensure you have the `llama2` model installed:
+1. Ensure you have the `llama3` model installed:

    ```bash
-   ollama pull llama2
+   ollama pull llama3
    ```

 2. Install the Python Requirements.
@@ -2,7 +2,7 @@ import json
 import requests

 # NOTE: ollama must be running for this to work, start the ollama app or run `ollama serve`
-model = "llama2"  # TODO: update this for whatever model you wish to use
+model = "llama3"  # TODO: update this for whatever model you wish to use


 def chat(messages):
@@ -4,10 +4,10 @@ The **chat** endpoint is one of two ways to generate text from an LLM with Ollam

 ## Running the Example

-1. Ensure you have the `llama2` model installed:
+1. Ensure you have the `llama3` model installed:

    ```bash
-   ollama pull llama2
+   ollama pull llama3
    ```

 2. Install the Python Requirements.
@@ -4,10 +4,10 @@ This example demonstrates how one would create a set of 'mentors' you can have a

 ## Usage

-1. Add llama2 to have the mentors ask your questions:
+1. Add llama3 to have the mentors ask your questions:

    ```bash
-   ollama pull llama2
+   ollama pull llama3
    ```

 2. Install prerequisites:
@@ -15,7 +15,7 @@ async function characterGenerator() {
   ollama.setModel("stablebeluga2:70b-q4_K_M");
   const bio = await ollama.generate(`create a bio of ${character} in a single long paragraph. Instead of saying '${character} is...' or '${character} was...' use language like 'You are...' or 'You were...'. Then create a paragraph describing the speaking mannerisms and style of ${character}. Don't include anything about how ${character} looked or what they sounded like, just focus on the words they said. Instead of saying '${character} would say...' use language like 'You should say...'. If you use quotes, always use single quotes instead of double quotes. If there are any specific words or phrases you used a lot, show how you used them. `);

-  const thecontents = `FROM llama2\nSYSTEM """\n${bio.response.replace(/(\r\n|\n|\r)/gm, " ").replace('would', 'should')} All answers to questions should be related back to what you are most known for.\n"""`;
+  const thecontents = `FROM llama3\nSYSTEM """\n${bio.response.replace(/(\r\n|\n|\r)/gm, " ").replace('would', 'should')} All answers to questions should be related back to what you are most known for.\n"""`;

   fs.writeFile(path.join(directory, 'Modelfile'), thecontents, (err: any) => {
     if (err) throw err;
@@ -1,6 +1,6 @@
 import * as readline from "readline";

-const model = "llama2";
+const model = "llama3";
 type Message = {
   role: "assistant" | "user" | "system";
   content: string;
@@ -53,6 +53,8 @@ func HumanBytes(b int64) string {

 func HumanBytes2(b uint64) string {
 	switch {
+	case b >= GibiByte:
+		return fmt.Sprintf("%.1f GiB", float64(b)/GibiByte)
 	case b >= MebiByte:
 		return fmt.Sprintf("%.1f MiB", float64(b)/MebiByte)
 	case b >= KibiByte:
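With the added branch, sizes of a gibibyte or more now format as GiB instead of a large MiB figure. A sketch of the resulting behavior, assuming the usual binary constants; the KiB and default branches below are reconstructed from context and are not shown in the diff:

```go
package main

import "fmt"

const (
	KibiByte = 1024
	MebiByte = 1024 * KibiByte
	GibiByte = 1024 * MebiByte
)

// humanBytes2 mirrors the patched HumanBytes2 for illustration.
func humanBytes2(b uint64) string {
	switch {
	case b >= GibiByte:
		return fmt.Sprintf("%.1f GiB", float64(b)/GibiByte)
	case b >= MebiByte:
		return fmt.Sprintf("%.1f MiB", float64(b)/MebiByte)
	case b >= KibiByte:
		return fmt.Sprintf("%.1f KiB", float64(b)/KibiByte)
	default:
		return fmt.Sprintf("%d B", b)
	}
}

func main() {
	fmt.Println(humanBytes2(5 * GibiByte)) // "5.0 GiB" (previously "5120.0 MiB")
}
```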
@@ -13,12 +13,20 @@ const (

 func HumanNumber(b uint64) string {
 	switch {
-	case b > Billion:
-		return fmt.Sprintf("%.0fB", math.Round(float64(b)/Billion))
-	case b > Million:
-		return fmt.Sprintf("%.0fM", math.Round(float64(b)/Million))
-	case b > Thousand:
-		return fmt.Sprintf("%.0fK", math.Round(float64(b)/Thousand))
+	case b >= Billion:
+		number := float64(b) / Billion
+		if number == math.Floor(number) {
+			return fmt.Sprintf("%.0fB", number) // no decimals if whole number
+		}
+		return fmt.Sprintf("%.1fB", number) // one decimal if not a whole number
+	case b >= Million:
+		number := float64(b) / Million
+		if number == math.Floor(number) {
+			return fmt.Sprintf("%.0fM", number) // no decimals if whole number
+		}
+		return fmt.Sprintf("%.2fM", number) // two decimals if not a whole number
+	case b >= Thousand:
+		return fmt.Sprintf("%.0fK", float64(b)/Thousand)
 	default:
 		return fmt.Sprintf("%d", b)
 	}
 }
format/format_test.go (new file, 34 lines)

@@ -0,0 +1,34 @@
+package format
+
+import (
+	"testing"
+)
+
+func TestHumanNumber(t *testing.T) {
+
+	type testCase struct {
+		input    uint64
+		expected string
+	}
+
+	testCases := []testCase{
+		{0, "0"},
+		{1000000, "1M"},
+		{125000000, "125M"},
+		{500500000, "500.50M"},
+		{500550000, "500.55M"},
+		{1000000000, "1B"},
+		{2800000000, "2.8B"},
+		{2850000000, "2.9B"},
+		{1000000000000, "1000B"},
+	}
+
+	for _, tc := range testCases {
+		t.Run(tc.expected, func(t *testing.T) {
+			result := HumanNumber(tc.input)
+			if result != tc.expected {
+				t.Errorf("Expected %s, got %s", tc.expected, result)
+			}
+		})
+	}
+}
62
go.mod
62
go.mod
@@ -1,36 +1,38 @@
|
|||||||
 module github.com/ollama/ollama

-go 1.22
+go 1.22.0

-toolchain go1.22.0
-
 require (
 	github.com/containerd/console v1.0.3
 	github.com/d4l3k/go-bfloat16 v0.0.0-20211005043715-690c3bdd05f1
 	github.com/emirpasic/gods v1.18.1
-	github.com/gin-gonic/gin v1.9.1
-	github.com/golang/protobuf v1.5.0 // indirect
-	github.com/google/uuid v1.0.0
+	github.com/gin-gonic/gin v1.10.0
+	github.com/golang/protobuf v1.5.4 // indirect
+	github.com/google/uuid v1.1.2
 	github.com/mitchellh/mapstructure v1.5.0
 	github.com/olekukonko/tablewriter v0.0.5
 	github.com/spf13/cobra v1.7.0
-	github.com/stretchr/testify v1.8.4
+	github.com/stretchr/testify v1.9.0
 	github.com/x448/float16 v0.8.4
 	golang.org/x/sync v0.3.0
 )

 require (
 	github.com/nlpodyssey/gopickle v0.3.0
-	github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9
+	github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
 )

 require (
-	github.com/apache/arrow/go/arrow v0.0.0-20201229220542-30ce2eb5d4dc // indirect
+	github.com/apache/arrow/go/arrow v0.0.0-20211112161151-bc219186db40 // indirect
+	github.com/bytedance/sonic/loader v0.1.1 // indirect
 	github.com/chewxy/hm v1.0.0 // indirect
-	github.com/chewxy/math32 v1.0.8 // indirect
+	github.com/chewxy/math32 v1.10.1 // indirect
+	github.com/cloudwego/base64x v0.1.4 // indirect
+	github.com/cloudwego/iasm v0.2.0 // indirect
 	github.com/davecgh/go-spew v1.1.1 // indirect
 	github.com/gogo/protobuf v1.3.2 // indirect
-	github.com/google/flatbuffers v1.12.0 // indirect
+	github.com/google/flatbuffers v24.3.25+incompatible // indirect
+	github.com/kr/text v0.2.0 // indirect
 	github.com/mattn/go-runewidth v0.0.14 // indirect
 	github.com/pkg/errors v0.9.1 // indirect
 	github.com/pmezard/go-difflib v1.0.0 // indirect
@@ -38,40 +40,38 @@ require (
 	github.com/xtgo/set v1.0.0 // indirect
 	go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6 // indirect
 	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
-	gonum.org/v1/gonum v0.8.2 // indirect
+	gonum.org/v1/gonum v0.15.0 // indirect
 	gorgonia.org/vecf32 v0.9.0 // indirect
 	gorgonia.org/vecf64 v0.9.0 // indirect
 )

 require (
-	github.com/bytedance/sonic v1.9.1 // indirect
-	github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 // indirect
-	github.com/gabriel-vasile/mimetype v1.4.2 // indirect
-	github.com/gin-contrib/cors v1.4.0
+	github.com/bytedance/sonic v1.11.6 // indirect
+	github.com/gabriel-vasile/mimetype v1.4.3 // indirect
+	github.com/gin-contrib/cors v1.7.2
 	github.com/gin-contrib/sse v0.1.0 // indirect
 	github.com/go-playground/locales v0.14.1 // indirect
 	github.com/go-playground/universal-translator v0.18.1 // indirect
-	github.com/go-playground/validator/v10 v10.14.0 // indirect
+	github.com/go-playground/validator/v10 v10.20.0 // indirect
 	github.com/goccy/go-json v0.10.2 // indirect
-	github.com/google/go-cmp v0.5.9 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/json-iterator/go v1.1.12 // indirect
-	github.com/klauspost/cpuid/v2 v2.2.4 // indirect
-	github.com/leodido/go-urn v1.2.4 // indirect
-	github.com/mattn/go-isatty v0.0.19 // indirect
+	github.com/klauspost/cpuid/v2 v2.2.7 // indirect
+	github.com/leodido/go-urn v1.4.0 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
-	github.com/pelletier/go-toml/v2 v2.0.8 // indirect
+	github.com/pelletier/go-toml/v2 v2.2.2 // indirect
 	github.com/spf13/pflag v1.0.5 // indirect
 	github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
-	github.com/ugorji/go/codec v1.2.11 // indirect
+	github.com/ugorji/go/codec v1.2.12 // indirect
-	golang.org/x/arch v0.3.0 // indirect
+	golang.org/x/arch v0.8.0 // indirect
-	golang.org/x/crypto v0.14.0
+	golang.org/x/crypto v0.23.0
-	golang.org/x/exp v0.0.0-20230817173708-d852ddb80c63
+	golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
-	golang.org/x/net v0.17.0 // indirect
+	golang.org/x/net v0.25.0 // indirect
-	golang.org/x/sys v0.13.0
+	golang.org/x/sys v0.20.0
-	golang.org/x/term v0.13.0
+	golang.org/x/term v0.20.0
-	golang.org/x/text v0.14.0 // indirect
+	golang.org/x/text v0.15.0 // indirect
-	google.golang.org/protobuf v1.30.0
+	google.golang.org/protobuf v1.34.1
 	gopkg.in/yaml.v3 v3.0.1 // indirect
 )
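The go.sum churn below follows mechanically from these requirement bumps: regenerating the file (typically via `go mod tidy`, assumed here rather than recorded in this compare view) rewrites the h1: and /go.mod checksum lines for every module that was added, removed, or re-versioned.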
go.sum (244 changed lines)
@@ -1,22 +1,32 @@
 cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
+cloud.google.com/go v0.34.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
+dmitri.shuralyov.com/gpu/mtl v0.0.0-20190408044501-666a987793e9/go.mod h1:H6x//7gZCb22OMCxBHrMx7a5I7Hp++hsVxbQ4BYO7hU=
+gioui.org v0.0.0-20210308172011-57750fc8a0a6/go.mod h1:RSH6KIUZ0p2xy5zHDxgAM4zumjgTw83q2ge/PI+yyw8=
 github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
+github.com/BurntSushi/xgb v0.0.0-20160522181843-27f122750802/go.mod h1:IVnqGOEym/WlBOVXweHU+Q+/VP0lqqI8lqeDx9IjBqo=
 github.com/ajstarks/svgo v0.0.0-20180226025133-644b8db467af/go.mod h1:K08gAheRH3/J6wwsYMMT4xOr94bZjxIelGM0+d/wbFw=
-github.com/apache/arrow/go/arrow v0.0.0-20201229220542-30ce2eb5d4dc h1:zvQ6w7KwtQWgMQiewOF9tFtundRMVZFSAksNV6ogzuY=
-github.com/apache/arrow/go/arrow v0.0.0-20201229220542-30ce2eb5d4dc/go.mod h1:c9sxoIT3YgLxH4UhLOCKaBlEojuMhVYpk4Ntv3opUTQ=
-github.com/bytedance/sonic v1.5.0/go.mod h1:ED5hyg4y6t3/9Ku1R6dU/4KyJ48DZ4jPhfY1O2AihPM=
-github.com/bytedance/sonic v1.9.1 h1:6iJ6NqdoxCDr6mbY8h18oSO+cShGSMRGCEo7F2h0x8s=
-github.com/bytedance/sonic v1.9.1/go.mod h1:i736AoUSYt75HyZLoJW9ERYxcy6eaN6h4BZXU064P/U=
+github.com/antihax/optional v1.0.0/go.mod h1:uupD/76wgC+ih3iEmQUL+0Ugr19nfwCT1kdvxnR2qWY=
+github.com/apache/arrow/go/arrow v0.0.0-20211112161151-bc219186db40 h1:q4dksr6ICHXqG5hm0ZW5IHyeEJXoIJSOZeBLmWPNeIQ=
+github.com/apache/arrow/go/arrow v0.0.0-20211112161151-bc219186db40/go.mod h1:Q7yQnSMnLvcXlZ8RV+jwz/6y1rQTqbX6C82SndT52Zs=
+github.com/boombuler/barcode v1.0.0/go.mod h1:paBWMcWSl3LHKBqUq+rly7CNSldXjb2rDl3JlRe0mD8=
+github.com/bytedance/sonic v1.11.6 h1:oUp34TzMlL+OY1OUWxHqsdkgC/Zfc85zGqw9siXjrc0=
+github.com/bytedance/sonic v1.11.6/go.mod h1:LysEHSvpvDySVdC2f87zGWf6CIKJcAvqab1ZaiQtds4=
+github.com/bytedance/sonic/loader v0.1.1 h1:c+e5Pt1k/cy5wMveRDyk2X4B9hF4g7an8N3zCYjJFNM=
+github.com/bytedance/sonic/loader v0.1.1/go.mod h1:ncP89zfokxS5LZrJxl5z0UJcsk4M4yY2JpfqGeCtNLU=
 github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU=
-github.com/chenzhuoyu/base64x v0.0.0-20211019084208-fb5309c8db06/go.mod h1:DH46F32mSOjUmXrMHnKwZdA8wcEefY7UVqBKYGjpdQY=
-github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 h1:qSGYFH7+jGhDF8vLC+iwCD4WpbV1EBDSzWkJODFLams=
-github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311/go.mod h1:b583jCggY9gE99b6G5LEC39OIiVsWj+R97kbl5odCEk=
 github.com/chewxy/hm v1.0.0 h1:zy/TSv3LV2nD3dwUEQL2VhXeoXbb9QkpmdRAVUFiA6k=
 github.com/chewxy/hm v1.0.0/go.mod h1:qg9YI4q6Fkj/whwHR1D+bOGeF7SniIP40VweVepLjg0=
 github.com/chewxy/math32 v1.0.0/go.mod h1:Miac6hA1ohdDUTagnvJy/q+aNnEk16qWUdb8ZVhvCN0=
-github.com/chewxy/math32 v1.0.8 h1:fU5E4Ec4Z+5RtRAi3TovSxUjQPkgRh+HbP7tKB2OFbM=
-github.com/chewxy/math32 v1.0.8/go.mod h1:dOB2rcuFrCn6UHrze36WSLVPKtzPMRAQvBvUwkSsLqs=
+github.com/chewxy/math32 v1.10.1 h1:LFpeY0SLJXeaiej/eIp2L40VYfscTvKh/FSEZ68uMkU=
+github.com/chewxy/math32 v1.10.1/go.mod h1:dOB2rcuFrCn6UHrze36WSLVPKtzPMRAQvBvUwkSsLqs=
 github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
+github.com/cloudwego/base64x v0.1.4 h1:jwCgWpFanWmN8xoIUHa2rtzmkd5J2plF/dnLS6Xd/0Y=
+github.com/cloudwego/base64x v0.1.4/go.mod h1:0zlkT4Wn5C6NdauXdJRhSKRlJvmclQ1hhJgA0rcu/8w=
+github.com/cloudwego/iasm v0.2.0 h1:1KNIy1I1H9hNNFEEH3DVnI4UujN+1zjpuk6gwHLTssg=
+github.com/cloudwego/iasm v0.2.0/go.mod h1:8rXZaNYT2n95jn+zTI1sDr+IgcD2GVs0nlbbQPiEFhY=
 github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
+github.com/cncf/udpa/go v0.0.0-20201120205902-5459f2c99403/go.mod h1:WmhPx2Nbnhtbo57+VJT5O0JRkEi1Wbu0z5j0R8u5Hbk=
+github.com/cncf/xds/go v0.0.0-20210312221358-fbca930ec8ed/go.mod h1:eXthEFrGJvWHgFFCl3hGmgk+/aYT6PnTQLykKQRLhEs=
 github.com/containerd/console v1.0.3 h1:lIr7SlA5PxZyMV30bDW0MGbiOPXwc63yRuCP0ARubLw=
 github.com/containerd/console v1.0.3/go.mod h1:7LqA/THxQ86k76b8c/EMSiaJ3h1eZkMkXar0TQ1gf3U=
 github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
@@ -31,30 +41,35 @@ github.com/emirpasic/gods v1.18.1/go.mod h1:8tpGGwCnJ5H4r6BWwaV6OrWmMoPhUl5jm/FM
 github.com/envoyproxy/go-control-plane v0.9.0/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
 github.com/envoyproxy/go-control-plane v0.9.1-0.20191026205805-5f8ba28d4473/go.mod h1:YTl/9mNaCwkRvm6d1a2C3ymFceY/DCBVvsKhRF0iEA4=
 github.com/envoyproxy/go-control-plane v0.9.4/go.mod h1:6rpuAdCZL397s3pYoYcLgu1mIlRU8Am5FuJP05cCM98=
+github.com/envoyproxy/go-control-plane v0.9.9-0.20201210154907-fd9021fe5dad/go.mod h1:cXg6YxExXjJnVBQHBLXeUAgxn2UodCpnH306RInaBQk=
+github.com/envoyproxy/go-control-plane v0.9.9-0.20210217033140-668b12f5399d/go.mod h1:cXg6YxExXjJnVBQHBLXeUAgxn2UodCpnH306RInaBQk=
+github.com/envoyproxy/go-control-plane v0.9.9-0.20210512163311-63b5d3c536b0/go.mod h1:hliV/p42l8fGbc6Y9bQ70uLwIvmJyVE5k4iMKlh8wCQ=
 github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c=
 github.com/fogleman/gg v1.2.1-0.20190220221249-0403632d5b90/go.mod h1:R/bRT+9gY/C5z7JzPU0zXsXHKM4/ayA+zqcVNZzPa1k=
-github.com/gabriel-vasile/mimetype v1.4.2 h1:w5qFW6JKBz9Y393Y4q372O9A7cUSequkh1Q7OhCmWKU=
-github.com/gabriel-vasile/mimetype v1.4.2/go.mod h1:zApsH/mKG4w07erKIaJPFiX0Tsq9BFQgN3qGY5GnNgA=
-github.com/gin-contrib/cors v1.4.0 h1:oJ6gwtUl3lqV0WEIwM/LxPF1QZ5qe2lGWdY2+bz7y0g=
-github.com/gin-contrib/cors v1.4.0/go.mod h1:bs9pNM0x/UsmHPBWT2xZz9ROh8xYjYkiURUfmBoMlcs=
+github.com/fogleman/gg v1.3.0/go.mod h1:R/bRT+9gY/C5z7JzPU0zXsXHKM4/ayA+zqcVNZzPa1k=
+github.com/gabriel-vasile/mimetype v1.4.3 h1:in2uUcidCuFcDKtdcBxlR0rJ1+fsokWf+uqxgUFjbI0=
+github.com/gabriel-vasile/mimetype v1.4.3/go.mod h1:d8uq/6HKRL6CGdk+aubisF/M5GcPfT7nKyLpA0lbSSk=
+github.com/ghodss/yaml v1.0.0/go.mod h1:4dBDuWmgqj2HViK6kFavaiC9ZROes6MMH2rRYeMEF04=
+github.com/gin-contrib/cors v1.7.2 h1:oLDHxdg8W/XDoN/8zamqk/Drgt4oVZDvaV0YmvVICQw=
+github.com/gin-contrib/cors v1.7.2/go.mod h1:SUJVARKgQ40dmrzgXEVxj2m7Ig1v1qIboQkPDTQ9t2E=
 github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
 github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI=
-github.com/gin-gonic/gin v1.8.1/go.mod h1:ji8BvRH1azfM+SYow9zQ6SZMvR8qOMZHmsCuWR9tTTk=
-github.com/gin-gonic/gin v1.9.1 h1:4idEAncQnU5cB7BeOkPtxjfCSye0AAm1R0RVIqJ+Jmg=
-github.com/gin-gonic/gin v1.9.1/go.mod h1:hPrL7YrpYKXt5YId3A/Tnip5kqbEAP+KLuI3SUcPTeU=
-github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
+github.com/gin-gonic/gin v1.10.0 h1:nTuyha1TYqgedzytsKYqna+DfLos46nTv2ygFy86HFU=
+github.com/gin-gonic/gin v1.10.0/go.mod h1:4PMNQiOhvDRa013RKVbsiNwoyezlm2rm0uX/T7kzp5Y=
+github.com/go-fonts/dejavu v0.1.0/go.mod h1:4Wt4I4OU2Nq9asgDCteaAaWZOV24E+0/Pwo0gppep4g=
+github.com/go-fonts/latin-modern v0.2.0/go.mod h1:rQVLdDMK+mK1xscDwsqM5J8U2jrRa3T0ecnM9pNujks=
+github.com/go-fonts/liberation v0.1.1/go.mod h1:K6qoJYypsmfVjWg8KOVDQhLc8UDgIK2HYqyqAO9z7GY=
+github.com/go-fonts/stix v0.1.0/go.mod h1:w/c1f0ldAUlJmLBvlbkvVXLAD+tAMqobIIQpmnUIzUY=
+github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1/go.mod h1:vR7hzQXu2zJy9AVAgeJqvqgH9Q5CA+iKCZ2gyEVpxRU=
+github.com/go-latex/latex v0.0.0-20210118124228-b3d85cf34e07/go.mod h1:CO1AlKB2CSIqUrmQPqA0gdRIlnLEY0gK5JGjh37zN5U=
 github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s=
 github.com/go-playground/assert/v2 v2.2.0/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
-github.com/go-playground/locales v0.14.0/go.mod h1:sawfccIbzZTqEDETgFXqTho0QybSa7l++s0DH+LDiLs=
 github.com/go-playground/locales v0.14.1 h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA=
 github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY=
-github.com/go-playground/universal-translator v0.18.0/go.mod h1:UvRDBj+xPUEGrFYl+lu/H90nyDXpg0fqeB/AQUGNTVA=
 github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY=
 github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY=
-github.com/go-playground/validator/v10 v10.10.0/go.mod h1:74x4gJWsvQexRdW8Pn3dXSGrTK4nAUsbPlLADvpJkos=
-github.com/go-playground/validator/v10 v10.14.0 h1:vgvQWe3XCz3gIeFDm/HnTIbj6UGmg/+t63MyGU2n5js=
-github.com/go-playground/validator/v10 v10.14.0/go.mod h1:9iXMNT7sEkjXb0I+enO7QXmzG6QCsPWY4zveKFVRSyU=
-github.com/goccy/go-json v0.9.7/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
+github.com/go-playground/validator/v10 v10.20.0 h1:K9ISHbSaI0lyB2eWMPJo+kOS/FBExVwjEviJTixqxL8=
+github.com/go-playground/validator/v10 v10.20.0/go.mod h1:dbuPbCMFw/DrkbEynArYaCwl3amGuJotoKCe95atGMM=
 github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU=
 github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
 github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
@@ -72,46 +87,51 @@ github.com/golang/protobuf v1.4.0-rc.4.0.20200313231945-b860323f09d0/go.mod h1:W
 github.com/golang/protobuf v1.4.0/go.mod h1:jodUvKwWbYaEsadDk5Fwe5c77LiNKVO9IDvqG2KuDX0=
 github.com/golang/protobuf v1.4.1/go.mod h1:U8fpvMrcmy5pZrNK1lt4xCsGvpyWQ/VVv6QDs8UjoX8=
 github.com/golang/protobuf v1.4.2/go.mod h1:oDoupMAO8OvCJWAcko0GGGIgR6R6ocIYbsSw735rRwI=
-github.com/golang/protobuf v1.5.0 h1:LUVKkCeviFUMKqHa4tXIIij/lbhnMbP7Fn5wKdKkRh4=
+github.com/golang/protobuf v1.4.3/go.mod h1:oDoupMAO8OvCJWAcko0GGGIgR6R6ocIYbsSw735rRwI=
 github.com/golang/protobuf v1.5.0/go.mod h1:FsONVRAS9T7sI+LIUmWTfcYkHO4aIWwzhcaSAoJOfIk=
-github.com/google/flatbuffers v1.11.0/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8=
-github.com/google/flatbuffers v1.12.0 h1:/PtAHvnBY4Kqnx/xCQ3OIV9uYcSFGScBsWI3Oogeh6w=
-github.com/google/flatbuffers v1.12.0/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8=
+github.com/golang/protobuf v1.5.2/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY=
+github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek=
+github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps=
+github.com/golang/snappy v0.0.3 h1:fHPg5GQYlCeLIPB9BZqMVR5nR9A+IM5zcgeTdjMYmLA=
+github.com/golang/snappy v0.0.3/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
+github.com/google/flatbuffers v2.0.0+incompatible/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8=
+github.com/google/flatbuffers v24.3.25+incompatible h1:CX395cjN9Kke9mmalRoL3d81AtFUxJM+yDthflgJGkI=
+github.com/google/flatbuffers v24.3.25+incompatible/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8=
 github.com/google/go-cmp v0.2.0/go.mod h1:oXzfMopK8JAjlY9xF4vHSVASa0yLyX7SntLO5aqRK0M=
 github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
 github.com/google/go-cmp v0.5.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
 github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
-github.com/google/go-cmp v0.5.9 h1:O2Tfq5qg4qc4AmwVlvv0oLiVAGB7enBSJ2x2DqQFi38=
-github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
+github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
+github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
+github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
-github.com/google/uuid v1.0.0 h1:b4Gk+7WdP/d3HZH8EJsZpvV7EtDOgaZLtnaNGIu1adA=
-github.com/google/uuid v1.0.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/google/uuid v1.1.2 h1:EVhdT+1Kseyi1/pUmXKaFxYsDNy9RQYkMWRH68J/W7Y=
+github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw=
 github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8=
 github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw=
 github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
 github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
+github.com/jung-kurt/gofpdf v1.0.0/go.mod h1:7Id9E/uU8ce6rXgefFLlgrJj/GYY22cpxn+r32jIOes=
 github.com/jung-kurt/gofpdf v1.0.3-0.20190309125859-24315acbbda5/go.mod h1:7Id9E/uU8ce6rXgefFLlgrJj/GYY22cpxn+r32jIOes=
 github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
 github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
+github.com/klauspost/compress v1.13.1 h1:wXr2uRxZTJXHLly6qhJabee5JqIhTRoLBhDOA74hDEQ=
+github.com/klauspost/compress v1.13.1/go.mod h1:8dP1Hq4DHOhN9w426knH3Rhby4rFm6D8eO+e+Dq5Gzg=
 github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
-github.com/klauspost/cpuid/v2 v2.2.4 h1:acbojRNwl3o09bUq+yDCtZFc1aiwaAAxtcn8YkZXnvk=
-github.com/klauspost/cpuid/v2 v2.2.4/go.mod h1:RVVoqg1df56z8g3pUjL/3lE5UfnlrJX8tyFgg4nqhuY=
+github.com/klauspost/cpuid/v2 v2.2.7 h1:ZWSB3igEs+d0qvnxR/ZBzXVmxkgt8DdzP6m9pfuVLDM=
+github.com/klauspost/cpuid/v2 v2.2.7/go.mod h1:Lcz8mBdAVJIBVzewtcLocK12l3Y+JytZYpaMropDUws=
-github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
-github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
+github.com/knz/go-libedit v1.10.1/go.mod h1:MZTVkCWyz0oBc7JOWP3wNAzd002ZbM/5hgShxwh4x8M=
 github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
 github.com/kr/pretty v0.3.0/go.mod h1:640gp4NfQd8pI5XOwp5fnNeVWj67G7CFk/SaSQn7NBk=
-github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
-github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
 github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
 github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
-github.com/leodido/go-urn v1.2.1/go.mod h1:zt4jvISO2HfUBqxjfIshjdMTYS56ZS/qv49ictyFfxY=
-github.com/leodido/go-urn v1.2.4 h1:XlAE/cm/ms7TE/VMVoduSpNBoyc2dOxHs5MZSwAN63Q=
-github.com/leodido/go-urn v1.2.4/go.mod h1:7ZrI8mTSeBSHl/UaRyKQW1qZeMgak41ANeCNaVckg+4=
-github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27kJ6hsGG94=
-github.com/mattn/go-isatty v0.0.19 h1:JITubQf0MOLdlGRuRq+jtsDlekdYPia9ZFsB8h/APPA=
-github.com/mattn/go-isatty v0.0.19/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
+github.com/leodido/go-urn v1.4.0 h1:WT9HwE9SGECu3lg4d/dIA+jxlljEa1/ffXKmRjqdmIQ=
+github.com/leodido/go-urn v1.4.0/go.mod h1:bvxc+MVxLKB4z00jd1z+Dvzr47oO32F/QSNjSBOlFxI=
+github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
+github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
 github.com/mattn/go-runewidth v0.0.9/go.mod h1:H031xJmbD/WCDINGzjvQ9THkh0rPKHF+m2gUSrubnMI=
 github.com/mattn/go-runewidth v0.0.14 h1:+xnbZSEeDbOIg5/mE6JF0w6n9duR1l3/WmbinWVwUuU=
 github.com/mattn/go-runewidth v0.0.14/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
@@ -126,12 +146,15 @@ github.com/nlpodyssey/gopickle v0.3.0 h1:BLUE5gxFLyyNOPzlXxt6GoHEMMxD0qhsE4p0CIQ
 github.com/nlpodyssey/gopickle v0.3.0/go.mod h1:f070HJ/yR+eLi5WmM1OXJEGaTpuJEUiib19olXgYha0=
 github.com/olekukonko/tablewriter v0.0.5 h1:P2Ga83D34wi1o9J6Wh1mRuqd4mF/x/lgBS7N7AbDhec=
 github.com/olekukonko/tablewriter v0.0.5/go.mod h1:hPp6KlRPjbx+hW8ykQs1w3UBbZlj6HuIJcUGPhkA7kY=
-github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9 h1:DV4iXjNn6fGeDl1AkZ1I0QB/0DBjrc7kPpxHrmuDzW4=
-github.com/pdevine/tensor v0.0.0-20240228013915-64ccaa8d9ca9/go.mod h1:nR7l3gM6ubiOm+mCkmmUyIBUcBAyiUmW6dQrDZhugFE=
-github.com/pelletier/go-toml/v2 v2.0.1/go.mod h1:r9LEWfGN8R5k0VXJ+0BkIe7MYkRdwZOjgMj2KwnJFUo=
-github.com/pelletier/go-toml/v2 v2.0.8 h1:0ctb6s9mE31h0/lhu+J6OPmVeDxJn+kYnJc2jZR9tGQ=
-github.com/pelletier/go-toml/v2 v2.0.8/go.mod h1:vuYfssBdrU2XDZ9bYydBu6t+6a6PYNcZljzZR9VXg+4=
-github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e/go.mod h1:pJLUxLENpZxwdsKMEsNbx1VGcRFpLqf3715MtcvvzbA=
+github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c h1:GwiUUjKefgvSNmv3NCvI/BL0kDebW6Xa+kcdpdc1mTY=
+github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c/go.mod h1:PSojXDXF7TbgQiD6kkd98IHOS0QqTyUEaWRiS8+BLu8=
+github.com/pelletier/go-toml/v2 v2.2.2 h1:aYUidT7k73Pcl9nb2gScu7NSrKCSHIDE89b3+6Wq+LM=
+github.com/pelletier/go-toml/v2 v2.2.2/go.mod h1:1t835xjRzz80PqgE6HHgN2JOsmgYu/h4qDAS4n929Rs=
+github.com/phpdave11/gofpdf v1.4.2/go.mod h1:zpO6xFn9yxo3YLyMvW8HcKWVdbNqgIfOOp2dXMnm1mY=
+github.com/phpdave11/gofpdi v1.0.12/go.mod h1:vBmVV0Do6hSBHC8uKUQ71JGW+ZGQq74llk/7bXwjDoI=
+github.com/pierrec/lz4/v4 v4.1.8 h1:ieHkV+i2BRzngO4Wd/3HGowuZStgq6QkPsD1eolNAO4=
+github.com/pierrec/lz4/v4 v4.1.8/go.mod h1:gZWDp/Ze/IJXGXf23ltt2EXimqmTUXEy0GFuRQyBid4=
+github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
 github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
 github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
 github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
@@ -139,10 +162,11 @@ github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZN
 github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4/go.mod h1:xMI15A0UPsDsEKsMN9yxemIoYk6Tm2C1GtYGdfGttqA=
 github.com/rivo/uniseg v0.2.0 h1:S1pD9weZBuJdFmowNwbpi7BJ8TNftyUImj/0WQi72jY=
 github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
-github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTEfhy4qGm1nDQc=
+github.com/rogpeppe/fastuuid v1.2.0/go.mod h1:jVj6XXZzXRy/MSR5jhDC/2q6DgLz+nrA6LYCDYWNEvQ=
 github.com/rogpeppe/go-internal v1.8.0 h1:FCbCCtXNOY3UtUuHUYaghJg4y7Fd14rXifAYUAtL9R8=
 github.com/rogpeppe/go-internal v1.8.0/go.mod h1:WmiCO8CzOY8rg0OYDC4/i/2WRWAB6poM+XZ2dLUbcbE=
 github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
+github.com/ruudk/golang-pdf417 v0.0.0-20181029194003-1af4ab5afa58/go.mod h1:6lfFZQK844Gfx8o5WFuvpxWRwnSoipWe/p622j1v06w=
 github.com/spf13/cobra v1.7.0 h1:hyqWnYt1ZQShIddO5kBpj3vu05/++x6tJ6dg8EC572I=
 github.com/spf13/cobra v1.7.0/go.mod h1:uLxZILRyS/50WlhOIKD7W6V5bgeIt+4sICxh6uRMrb0=
 github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
@@ -150,96 +174,119 @@ github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An
 github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
 github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
 github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
+github.com/stretchr/objx v0.5.2/go.mod h1:FRsXN1f5AsAjCGJKqEizvkpNtU+EGNCLh3NxZ/8L+MA=
 github.com/stretchr/testify v1.1.4/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
-github.com/stretchr/testify v1.2.0/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
+github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
 github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
-github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
+github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
 github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
 github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
 github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU=
 github.com/stretchr/testify v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
-github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
-github.com/stretchr/testify v1.8.3/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
-github.com/stretchr/testify v1.8.4 h1:CcVxjf3Q8PM0mHUKJCdn+eZZtm5yQwehR5yeSVQQcUk=
 github.com/stretchr/testify v1.8.4/go.mod h1:sz/lmYIOXD/1dqDmKjjqLyZ2RngseejIcXlSw2iwfAo=
+github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
+github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
 github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
 github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
-github.com/ugorji/go v1.2.7/go.mod h1:nF9osbDWLy6bDVv/Rtoh6QgnvNDpmCalQV5urGCCS6M=
-github.com/ugorji/go/codec v1.2.7/go.mod h1:WGN1fab3R1fzQlVQTkfxVtIBhWDRqOviHU95kRgeqEY=
-github.com/ugorji/go/codec v1.2.11 h1:BMaWp1Bb6fHwEtbplGBGJ498wD+LKlNSl25MjdZY4dU=
-github.com/ugorji/go/codec v1.2.11/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
+github.com/ugorji/go/codec v1.2.12 h1:9LC83zGrHhuUA9l16C9AHXAqEV/2wBQ4nkvumAE65EE=
+github.com/ugorji/go/codec v1.2.12/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
 github.com/x448/float16 v0.8.4 h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM=
 github.com/x448/float16 v0.8.4/go.mod h1:14CWIYCyZA/cWjXOioeEpHeN/83MdbZDRQHoFcYsOfg=
 github.com/xtgo/set v1.0.0 h1:6BCNBRv3ORNDQ7fyoJXRv+tstJz3m1JVFQErfeZz2pY=
 github.com/xtgo/set v1.0.0/go.mod h1:d3NHzGzSa0NmB2NhFyECA+QdRp29oEn2xbT+TpeFoM8=
 github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
 github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
+github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k=
+go.opentelemetry.io/proto/otlp v0.7.0/go.mod h1:PqfVotwruBrMGOCsRd/89rSnXhoiJIqeYNgFYFoEGnI=
 go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6 h1:lGdhQUN/cnWdSH3291CUuxSEqc+AsGTiDxPP3r2J0l4=
 go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6/go.mod h1:FftLjUGFEDu5k8lt0ddY+HcrH/qU/0qk+H8j9/nTl3E=
 golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
-golang.org/x/arch v0.3.0 h1:02VY4/ZcO/gBOH6PUaoiptASxtXU10jazRCP865E97k=
-golang.org/x/arch v0.3.0/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
+golang.org/x/arch v0.8.0 h1:3wRIsP3pM4yUptoR96otTUOXI367OS0+c9eeRi9doIc=
+golang.org/x/arch v0.8.0/go.mod h1:FEVrYAQjsQXMVJ1nsMoVVXPZg6p2JE2mx8psSWTDQys=
 golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
+golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
 golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
 golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
-golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
-golang.org/x/crypto v0.14.0 h1:wBqGXzWJW6m1XrIKlAH0Hs1JJ7+9KBwnIO8v66Q9cHc=
-golang.org/x/crypto v0.14.0/go.mod h1:MVFd36DqK4CsrnJYDkBA3VC4m2GkXAM0PvzMCn4JQf4=
+golang.org/x/crypto v0.23.0 h1:dIJU/v2J8Mdglj/8rJ6UUOM3Zc9zLZxVZwwxMooUSAI=
+golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
 golang.org/x/exp v0.0.0-20180321215751-8460e604b9de/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
 golang.org/x/exp v0.0.0-20180807140117-3d87b88a115f/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
 golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
 golang.org/x/exp v0.0.0-20190125153040-c74c464bbbf2/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
-golang.org/x/exp v0.0.0-20230817173708-d852ddb80c63 h1:m64FZMko/V45gv0bNmrNYoDEq8U5YUhetc9cBWKS1TQ=
-golang.org/x/exp v0.0.0-20230817173708-d852ddb80c63/go.mod h1:0v4NqG35kSWCMzLaMeX+IQrlSnVE/bqGSyC2cz/9Le8=
+golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
+golang.org/x/exp v0.0.0-20191002040644-a1355ae1e2c3/go.mod h1:NOZ3BPKG0ec/BKJQgnvsSFpcKLM5xXVWnvZS97DWHgE=
+golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa h1:FRnLl4eNAQl8hwxVVC17teOw8kdjVDVAiFMtgUdTSRQ=
+golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa/go.mod h1:zk2irFbV9DP96SEBUUAy67IdHUaZuSnrz1n472HUCLE=
 golang.org/x/image v0.0.0-20180708004352-c73c2afc3b81/go.mod h1:ux5Hcp/YLpHSI86hEcLt0YII63i6oz57MZXIpbrjZUs=
+golang.org/x/image v0.0.0-20190227222117-0694c2d4d067/go.mod h1:kZ7UVZpmo3dzQBMxlp+ypCbDeSB+sBbTgSJuh5dn5js=
+golang.org/x/image v0.0.0-20190802002840-cff245a6509b/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/image v0.0.0-20190910094157-69e4b8554b2a/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/image v0.0.0-20200119044424-58c23975cae1/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/image v0.0.0-20200430140353-33d19683fad8/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/image v0.0.0-20200618115811-c13761719519/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/image v0.0.0-20201208152932-35266b937fa6/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
+golang.org/x/image v0.0.0-20210216034530-4410531fe030/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
 golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
 golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
 golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
+golang.org/x/lint v0.0.0-20210508222113-6edffad5e616/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY=
+golang.org/x/mobile v0.0.0-20190719004257-d2bd2a29d028/go.mod h1:E/iHnbuqvinMTCcRqshq8CkpyQDoeVncDDYHnLhea+o=
+golang.org/x/mod v0.1.0/go.mod h1:0QHyrYULN0/3qlju5TqG8bIK38QM8yzMo5ekMj3DlcY=
+golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg=
 golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
 golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
+golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
 golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
 golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
+golang.org/x/net v0.0.0-20190108225652-1e06a53dbb7e/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
 golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
 golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
 golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
 golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
 golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
-golang.org/x/net v0.0.0-20200904194848-62affa334b73/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA=
+golang.org/x/net v0.0.0-20200822124328-c89045814202/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA=
 golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
-golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
-golang.org/x/net v0.17.0 h1:pVaXccu2ozPjCXewfr1S7xza/zcXTity9cCdXQYSjIM=
-golang.org/x/net v0.17.0/go.mod h1:NxSsAGuq816PNPmqtQdLE42eU2Fs7NoRIZrHJAlaCOE=
+golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM=
+golang.org/x/net v0.0.0-20210614182718-04defd469f4e/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
+golang.org/x/net v0.25.0 h1:d/OCCoBEUq33pjydKrGQhw7IlUPI2Oylr+8qLx49kac=
+golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
 golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
+golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
 golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
 golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
 golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
 golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
 golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
+golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
 golang.org/x/sync v0.3.0 h1:ftCYgMx6zT/asHUrPw8BLLscYtGznsLAnjq5RH9P66E=
 golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y=
 golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
 golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/sys v0.0.0-20190312061237-fead79001313/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
-golang.org/x/sys v0.0.0-20200909081042-eff7692f9009/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
 golang.org/x/sys v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
-golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.0.0-20210304124612-50617c2ba197/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20210510120138-977fb7262007/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.0.0-20210806184541-e5e7981a1069/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.0.0-20220704084225-05e143d24a9e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.13.0 h1:Af8nKPmuFypiUBjVoU9V20FiaFXOcuZI21p0ycVYYGE=
-golang.org/x/sys v0.13.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.20.0 h1:Od9JTbYCk261bKm4M/mw7AklTlFYIa0bIp9BgSm1S8Y=
+golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
 golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
-golang.org/x/term v0.13.0 h1:bb+I9cTfFazGW51MZqBVmZy7+JEJMouUHTUSKVQLBek=
-golang.org/x/term v0.13.0/go.mod h1:LTmsnFJwVN6bCy1rVCoS+qHT1HhALEFxKncY3WNNh4U=
+golang.org/x/term v0.20.0 h1:VnkxpohqXaOBYJtBmEppKUG6mXpi+4O6purfc2+sMhw=
+golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
 golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
 golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
+golang.org/x/text v0.3.5/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
 golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
-golang.org/x/text v0.14.0 h1:ScX5w1eTa3QqT8oi6+ziP7dTV1S2+ALU0bI+0zXKWiQ=
-golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
+golang.org/x/text v0.15.0 h1:h1V/4gjBv8v9cjcR6+AR5+/cIYK5N/WAgiv4xlsEtAk=
+golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
 golang.org/x/tools v0.0.0-20180525024113-a5b4c53f6e8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
@@ -247,34 +294,40 @@ golang.org/x/tools v0.0.0-20190206041539-40960b6deb8e/go.mod h1:n7NCudcB/nEzxVGm
 golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY=
 golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
 golang.org/x/tools v0.0.0-20190524140312-2c0ae7006135/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q=
+golang.org/x/tools v0.0.0-20190927191325-030b2cf1153e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
 golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
+golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=
 golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE=
 golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
+golang.org/x/tools v0.1.4/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk=
 golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
 golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
 golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
 golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE=
 golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
 gonum.org/v1/gonum v0.0.0-20180816165407-929014505bf4/go.mod h1:Y+Yx5eoAFn32cQvJDxZx5Dpnq+c3wtXuadVZAcxbbBo=
-gonum.org/v1/gonum v0.8.2 h1:CCXrcPKiGGotvnN6jfUsKk4rRqm7q09/YbKb5xCEvtM=
 gonum.org/v1/gonum v0.8.2/go.mod h1:oe/vMfY3deqTw+1EZJhuvEW2iwGF1bW9wwu7XCu0+v0=
-gonum.org/v1/netlib v0.0.0-20190313105609-8cb42192e0e0 h1:OE9mWmgKkjJyEmDAAtGMPjXu+YNeGvK9VTSHY6+Qihc=
+gonum.org/v1/gonum v0.9.3/go.mod h1:TZumC3NeyVQskjXqmyWt4S3bINhy7B4eYwW69EbyX+0=
+gonum.org/v1/gonum v0.15.0 h1:2lYxjRbTYyxkJxlhC+LvJIx3SsANPdRybu1tGj9/OrQ=
+gonum.org/v1/gonum v0.15.0/go.mod h1:xzZVBJBtS+Mz4q0Yl2LJTk+OxOg4jiXZ7qBoM0uISGo=
 gonum.org/v1/netlib v0.0.0-20190313105609-8cb42192e0e0/go.mod h1:wa6Ws7BG/ESfp6dHfk7C6KdzKA7wR7u/rKwOGE66zvw=
 gonum.org/v1/plot v0.0.0-20190515093506-e2840ee46a6b/go.mod h1:Wt8AAjI+ypCyYX3nZBvf6cAIx93T+c/OS2HFAYskSZc=
+gonum.org/v1/plot v0.9.0/go.mod h1:3Pcqqmp6RHvJI72kgb8fThyUnav364FOsdDo2aGW5lY=
 google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM=
 google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
 google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc=
 google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc=
+google.golang.org/genproto v0.0.0-20200513103714-09dca8ec2884/go.mod h1:55QSHmfGQM9UVYDPBsyGGes0y52j32PQ3BqQfXhyH3c=
|
||||||
google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo=
|
google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo=
|
||||||
google.golang.org/genproto v0.0.0-20200911024640-645f7a48b24f h1:Yv4xsIx7HZOoyUGSJ2ksDyWE2qIBXROsZKt2ny3hCGM=
|
google.golang.org/genproto v0.0.0-20210630183607-d20f26d13c79/go.mod h1:yiaVoXHpRzHGyxV3o4DktVWY4mSUErTKaeEOq6C3t3U=
|
||||||
google.golang.org/genproto v0.0.0-20200911024640-645f7a48b24f/go.mod h1:FWY/as6DDZQgahTzZj3fqbO1CbirC29ZNUFHwi0/+no=
|
|
||||||
google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
|
google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
|
||||||
google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg=
|
google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg=
|
||||||
google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY=
|
google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY=
|
||||||
google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
|
google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
|
||||||
google.golang.org/grpc v1.32.0 h1:zWTV+LMdc3kaiJMSTOFz2UgSBgx8RNQoTGiZu3fR9S0=
|
google.golang.org/grpc v1.33.1/go.mod h1:fr5YgcSWrqhRRxogOsw7RzIpsmvOZ6IcH4kBYTpR3n0=
|
||||||
google.golang.org/grpc v1.32.0/go.mod h1:N36X2cJ7JwdamYAgDz+s+rVMFjt3numwzf/HckM8pak=
|
google.golang.org/grpc v1.36.0/go.mod h1:qjiiYl8FncCW8feJPdyg3v6XW24KsRHe+dy9BAGRRjU=
|
||||||
google.golang.org/grpc/cmd/protoc-gen-go-grpc v0.0.0-20200910201057-6591123024b3/go.mod h1:6Kw0yEErY5E/yWrBtf03jp27GLLJujG4z/JK95pnjjw=
|
google.golang.org/grpc v1.38.0/go.mod h1:NREThFqKR1f3iQ6oBuvc5LadQuXVGo9rkm5ZGrQdJfM=
|
||||||
|
google.golang.org/grpc v1.39.0/go.mod h1:PImNr+rS9TWYb2O4/emRugxiyHZ5JyHW5F+RPnDzfrE=
|
||||||
google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8=
|
google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8=
|
||||||
google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0=
|
google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0=
|
||||||
google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60/go.mod h1:cfTl7dwQJ+fmap5saPgwCLgHXTUD7jkjRqWcaiX5VyM=
|
google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60/go.mod h1:cfTl7dwQJ+fmap5saPgwCLgHXTUD7jkjRqWcaiX5VyM=
|
||||||
@@ -283,20 +336,18 @@ google.golang.org/protobuf v1.21.0/go.mod h1:47Nbq4nVaFHyn7ilMalzfO3qCViNmqZ2kzi
|
|||||||
google.golang.org/protobuf v1.22.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU=
|
google.golang.org/protobuf v1.22.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU=
|
||||||
google.golang.org/protobuf v1.23.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU=
|
google.golang.org/protobuf v1.23.0/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU=
|
||||||
google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU=
|
google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpADcykh3NcUnDUJcl1+ZksZNG86OlYog2l/sGQquU=
|
||||||
google.golang.org/protobuf v1.24.0/go.mod h1:r/3tXBNzIEhYS9I1OUVjXDlt8tc493IdKGjtUeSXeh4=
|
|
||||||
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
|
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
|
||||||
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
|
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
|
||||||
google.golang.org/protobuf v1.28.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
|
google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
|
||||||
google.golang.org/protobuf v1.30.0 h1:kPPoIgf3TsEvrm0PFe15JQ+570QVxYzEvvHqChK+cng=
|
google.golang.org/protobuf v1.27.1/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
|
||||||
google.golang.org/protobuf v1.30.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
|
google.golang.org/protobuf v1.34.1 h1:9ddQBjfCyZPOHPUiPxpYESBLc+T8P3E+Vo4IbKZgFWg=
|
||||||
|
google.golang.org/protobuf v1.34.1/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
|
||||||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
||||||
gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
|
|
||||||
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
|
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
|
||||||
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
|
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
|
||||||
gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI=
|
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
|
||||||
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
|
gopkg.in/yaml.v2 v2.2.3/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
|
||||||
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||||
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
|
||||||
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
|
||||||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
|
||||||
gorgonia.org/vecf32 v0.9.0 h1:PClazic1r+JVJ1dEzRXgeiVl4g1/Hf/w+wUSqnco1Xg=
|
gorgonia.org/vecf32 v0.9.0 h1:PClazic1r+JVJ1dEzRXgeiVl4g1/Hf/w+wUSqnco1Xg=
|
||||||
@@ -305,4 +356,5 @@ gorgonia.org/vecf64 v0.9.0 h1:bgZDP5x0OzBF64PjMGC3EvTdOoMEcmfAh1VCUnZFm1A=
|
|||||||
gorgonia.org/vecf64 v0.9.0/go.mod h1:hp7IOWCnRiVQKON73kkC/AUMtEXyf9kGlVrtPQ9ccVA=
|
gorgonia.org/vecf64 v0.9.0/go.mod h1:hp7IOWCnRiVQKON73kkC/AUMtEXyf9kGlVrtPQ9ccVA=
|
||||||
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
||||||
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
||||||
|
nullprogram.com/x/optparse v1.0.0/go.mod h1:KdyPE+Igbe0jQUrVfMqDMeJQIJZEuyV7pjYmp6pbG50=
|
||||||
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
|
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
|
||||||
|
|||||||
@@ -81,8 +81,10 @@ func commonAMDValidateLibDir() (string, error) {
     }
 
     // Well known location(s)
-    if rocmLibUsable(RocmStandardLocation) {
-        return RocmStandardLocation, nil
+    for _, path := range RocmStandardLocations {
+        if rocmLibUsable(path) {
+            return path, nil
+        }
     }
 
     // Installer payload location if we're running the installed binary
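`rocmLibUsable` itself is outside this excerpt; the loop above just walks the new `RocmStandardLocations` list and returns the first directory that passes. A minimal sketch of what such a probe can look like, assuming it simply globs for the required library files (the `ROCmLibGlobs` patterns changed further down in this diff) — a hypothetical helper, not the repository's exact code:

```go
package gpu

import "path/filepath"

// rocmLibUsableSketch reports whether dir looks like a usable ROCm library
// directory: every required glob (e.g. "libhipblas.so.2*", "rocblas") must
// match at least one entry. Hypothetical helper for illustration only.
func rocmLibUsableSketch(dir string, globs []string) bool {
    for _, g := range globs {
        matches, err := filepath.Glob(filepath.Join(dir, g))
        if err != nil || len(matches) == 0 {
            return false
        }
    }
    return true
}
```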
@@ -3,7 +3,6 @@ package gpu
 import (
     "fmt"
     "log/slog"
-    "strconv"
     "syscall"
     "unsafe"
 
@@ -74,16 +73,22 @@ func (hl *HipLib) Release() {
     hl.dll = 0
 }
 
-func (hl *HipLib) AMDDriverVersion() (string, error) {
+func (hl *HipLib) AMDDriverVersion() (driverMajor, driverMinor int, err error) {
     if hl.dll == 0 {
-        return "", fmt.Errorf("dll has been unloaded")
+        return 0, 0, fmt.Errorf("dll has been unloaded")
     }
     var version int
     status, _, err := syscall.SyscallN(hl.hipDriverGetVersion, uintptr(unsafe.Pointer(&version)))
     if status != hipSuccess {
-        return "", fmt.Errorf("failed call to hipDriverGetVersion: %d %s", status, err)
+        return 0, 0, fmt.Errorf("failed call to hipDriverGetVersion: %d %s", status, err)
     }
-    return strconv.Itoa(version), nil
+
+    slog.Debug("hipDriverGetVersion", "version", version)
+    // TODO - this isn't actually right, but the docs claim hipDriverGetVersion isn't accurate anyway...
+    driverMajor = version / 1000
+    driverMinor = (version - (driverMajor * 1000)) / 10
+
+    return driverMajor, driverMinor, nil
 }
 
 func (hl *HipLib) HipGetDeviceCount() int {
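The rewritten `AMDDriverVersion` above turns the packed integer from `hipDriverGetVersion` into a major/minor pair using the same thousands-and-tens arithmetic the CUDA driver API uses; the in-code TODO notes that HIP's documentation does not promise this encoding, so treat it as best effort. A sketch of the arithmetic with an illustrative packed value (not taken from a real device):

```go
package main

import "fmt"

// decode splits a packed driver version the way the hunk above does:
// the thousands give the major version, the tens give the minor.
func decode(version int) (major, minor int) {
    major = version / 1000
    minor = (version - (major * 1000)) / 10
    return major, minor
}

func main() {
    fmt.Println(decode(12020)) // 12 2 -- 12020 decodes as 12.2 under this scheme
}
```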
@@ -8,6 +8,7 @@ import (
     "log/slog"
     "os"
     "path/filepath"
+    "regexp"
     "slices"
     "strconv"
     "strings"
@@ -25,12 +26,12 @@ const (
     // Prefix with the node dir
     GPUTotalMemoryFileGlob = "mem_banks/*/properties" // size_in_bytes line
     GPUUsedMemoryFileGlob  = "mem_banks/*/used_memory"
-    RocmStandardLocation   = "/opt/rocm/lib"
 )
 
 var (
     // Used to validate if the given ROCm lib is usable
     ROCmLibGlobs = []string{"libhipblas.so.2*", "rocblas"} // TODO - probably include more coverage of files here...
+    RocmStandardLocations = []string{"/opt/rocm/lib", "/usr/lib64"}
 )
 
 // Gather GPU information from the amdgpu driver if any supported GPUs are detected
@@ -41,10 +42,8 @@ func AMDGetGPUInfo() []GpuInfo {
     }
 
     // Opportunistic logging of driver version to aid in troubleshooting
-    ver, err := AMDDriverVersion()
-    if err == nil {
-        slog.Info("AMD Driver: " + ver)
-    } else {
+    driverMajor, driverMinor, err := AMDDriverVersion()
+    if err != nil {
         // TODO - if we see users crash and burn with the upstreamed kernel this can be adjusted to hard-fail rocm support and fallback to CPU
         slog.Warn("ollama recommends running the https://www.amd.com/en/support/linux-drivers", "error", err)
     }
@@ -91,6 +90,7 @@ func AMDGetGPUInfo() []GpuInfo {
         scanner := bufio.NewScanner(fp)
         isCPU := false
         var major, minor, patch uint64
+        var vendor, device uint64
         for scanner.Scan() {
             line := strings.TrimSpace(scanner.Text())
             // Note: we could also use "cpu_cores_count X" where X is greater than zero to detect CPUs
@@ -118,6 +118,26 @@ func AMDGetGPUInfo() []GpuInfo {
                     slog.Debug("malformed int " + line)
                     continue
                 }
+            } else if strings.HasPrefix(line, "vendor_id") {
+                ver := strings.Fields(line)
+                if len(ver) != 2 {
+                    slog.Debug("malformed vendor_id", "vendor_id", line)
+                    continue
+                }
+                vendor, err = strconv.ParseUint(ver[1], 10, 32)
+                if err != nil {
+                    slog.Debug("malformed vendor_id" + line)
+                }
+            } else if strings.HasPrefix(line, "device_id") {
+                ver := strings.Fields(line)
+                if len(ver) != 2 {
+                    slog.Debug("malformed device_id", "device_id", line)
+                    continue
+                }
+                device, err = strconv.ParseUint(ver[1], 10, 32)
+                if err != nil {
+                    slog.Debug("malformed device_id" + line)
+                }
             }
 
             // TODO - any other properties we want to extract and record?
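The new `vendor_id`/`device_id` branches read lines of the form `vendor_id 4098` from the kfd node properties file: split on whitespace, require exactly two fields, and parse the value as an unsigned integer. The same parsing in isolation (the input values below are illustrative):

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// parseIDLine handles a kfd properties line such as "vendor_id 4098",
// mirroring the branches above: split on whitespace, expect exactly two
// fields, parse the value in base 10.
func parseIDLine(line string) (uint64, error) {
    fields := strings.Fields(line)
    if len(fields) != 2 {
        return 0, fmt.Errorf("malformed line: %q", line)
    }
    return strconv.ParseUint(fields[1], 10, 32)
}

func main() {
    v, _ := parseIDLine("vendor_id 4098")  // 4098 == 0x1002, AMD's PCI vendor ID
    d, _ := parseIDLine("device_id 26287") // illustrative device value
    fmt.Printf("%04x:%04x\n", v, d)        // 1002:66af
}
```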
@@ -140,7 +160,7 @@ func AMDGetGPUInfo() []GpuInfo {
         }
 
         if int(major) < RocmComputeMin {
-            slog.Warn(fmt.Sprintf("amdgpu too old gfx%d%d%x", major, minor, patch), "gpu", gpuID)
+            slog.Warn(fmt.Sprintf("amdgpu too old gfx%d%x%x", major, minor, patch), "gpu", gpuID)
             continue
         }
 
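The one-character fix above (`gfx%d%d%x` to `gfx%d%x%x`) matters because AMD gfx target names render the minor and stepping components in hex:

```go
package main

import "fmt"

func main() {
    // Major prints in decimal, minor and stepping in hex.
    // The stepping of a gfx90a part is 10, so %x is what yields the "a".
    fmt.Println(fmt.Sprintf("gfx%d%x%x", 9, 0, 10)) // gfx90a
    fmt.Println(fmt.Sprintf("gfx%d%x%x", 10, 3, 0)) // gfx1030
}
```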
@@ -210,24 +230,29 @@ func AMDGetGPUInfo() []GpuInfo {
 
         // iGPU detection, remove this check once we can support an iGPU variant of the rocm library
         if totalMemory < IGPUMemLimit {
-            slog.Info("amdgpu appears to be an iGPU, skipping", "gpu", gpuID, "total", format.HumanBytes2(totalMemory))
+            slog.Info("unsupported Radeon iGPU detected skipping", "id", gpuID, "total", format.HumanBytes2(totalMemory))
             continue
         }
+        var name string
+        // TODO - PCI ID lookup
+        if vendor > 0 && device > 0 {
+            name = fmt.Sprintf("%04x:%04x", vendor, device)
+        }
 
-        slog.Info("amdgpu memory", "gpu", gpuID, "total", format.HumanBytes2(totalMemory))
-        slog.Info("amdgpu memory", "gpu", gpuID, "available", format.HumanBytes2(totalMemory-usedMemory))
+        slog.Debug("amdgpu memory", "gpu", gpuID, "total", format.HumanBytes2(totalMemory))
+        slog.Debug("amdgpu memory", "gpu", gpuID, "available", format.HumanBytes2(totalMemory-usedMemory))
         gpuInfo := GpuInfo{
             Library: "rocm",
             memInfo: memInfo{
                 TotalMemory: totalMemory,
                 FreeMemory:  (totalMemory - usedMemory),
             },
             ID: fmt.Sprintf("%d", gpuID),
-            // Name: not exposed in sysfs directly, would require pci device id lookup
-            Major: int(major),
-            Minor: int(minor),
-            Patch: int(patch),
+            Name:    name,
+            Compute: fmt.Sprintf("gfx%d%x%x", major, minor, patch),
             MinimumMemory: rocmMinimumMemory,
+            DriverMajor:   driverMajor,
+            DriverMinor:   driverMinor,
         }
 
         // If the user wants to filter to a subset of devices, filter out if we aren't a match
@@ -266,7 +291,7 @@ func AMDGetGPUInfo() []GpuInfo {
             }
             slog.Debug("rocm supported GPUs", "types", supported)
         }
-        gfx := fmt.Sprintf("gfx%d%d%x", gpuInfo.Major, gpuInfo.Minor, gpuInfo.Patch)
+        gfx := gpuInfo.Compute
         if !slices.Contains[[]string, string](supported, gfx) {
             slog.Warn("amdgpu is not supported", "gpu", gpuInfo.ID, "gpu_type", gfx, "library", libDir, "supported_types", supported)
             // TODO - consider discrete markdown just for ROCM troubleshooting?
@@ -276,7 +301,7 @@ func AMDGetGPUInfo() []GpuInfo {
             slog.Info("amdgpu is supported", "gpu", gpuInfo.ID, "gpu_type", gfx)
         }
     } else {
-        slog.Debug("skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=" + gfxOverride)
+        slog.Info("skipping rocm gfx compatibility check", "HSA_OVERRIDE_GFX_VERSION", gfxOverride)
     }
 
     // The GPU has passed all the verification steps and is supported
@@ -322,19 +347,34 @@ func AMDValidateLibDir() (string, error) {
     return "", fmt.Errorf("no suitable rocm found, falling back to CPU")
 }
 
-func AMDDriverVersion() (string, error) {
-    _, err := os.Stat(DriverVersionFile)
+func AMDDriverVersion() (driverMajor, driverMinor int, err error) {
+    _, err = os.Stat(DriverVersionFile)
     if err != nil {
-        return "", fmt.Errorf("amdgpu version file missing: %s %w", DriverVersionFile, err)
+        return 0, 0, fmt.Errorf("amdgpu version file missing: %s %w", DriverVersionFile, err)
     }
     fp, err := os.Open(DriverVersionFile)
     if err != nil {
-        return "", err
+        return 0, 0, err
     }
     defer fp.Close()
     verString, err := io.ReadAll(fp)
     if err != nil {
-        return "", err
+        return 0, 0, err
     }
-    return strings.TrimSpace(string(verString)), nil
+
+    pattern := `\A(\d+)\.(\d+).*`
+    regex := regexp.MustCompile(pattern)
+    match := regex.FindStringSubmatch(string(verString))
+    if len(match) < 2 {
+        return 0, 0, fmt.Errorf("malformed version string %s", string(verString))
+    }
+    driverMajor, err = strconv.Atoi(match[1])
+    if err != nil {
+        return 0, 0, err
+    }
+    driverMinor, err = strconv.Atoi(match[2])
+    if err != nil {
+        return 0, 0, err
+    }
+    return driverMajor, driverMinor, nil
 }
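The new `AMDDriverVersion` anchors a regex at the start of the version file contents and converts the first two capture groups. With two capture groups a successful match always yields three elements, so the `len(match) < 2` guard only really screens out a nil (no-match) result. A standalone sketch of the parsing — `DriverVersionFile` is defined outside this excerpt; the amdgpu module version path below is an assumption:

```go
package main

import (
    "fmt"
    "regexp"
    "strconv"
)

// parseDriverVersion applies the same anchored pattern the diff uses to a
// string like the contents of /sys/module/amdgpu/version (assumed path).
func parseDriverVersion(s string) (major, minor int, err error) {
    re := regexp.MustCompile(`\A(\d+)\.(\d+).*`)
    m := re.FindStringSubmatch(s)
    if len(m) < 3 { // slightly tightened versus the diff's < 2 guard
        return 0, 0, fmt.Errorf("malformed version string %q", s)
    }
    major, err = strconv.Atoi(m[1])
    if err != nil {
        return 0, 0, err
    }
    minor, err = strconv.Atoi(m[2])
    return major, minor, err
}

func main() {
    fmt.Println(parseDriverVersion("6.3.6\n")) // 6 3 <nil>
}
```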
@@ -7,14 +7,12 @@ import (
     "os"
     "path/filepath"
     "slices"
-    "strconv"
     "strings"
 
     "github.com/ollama/ollama/format"
 )
 
 const (
-    RocmStandardLocation = "C:\\Program Files\\AMD\\ROCm\\5.7\\bin" // TODO glob?
 
     // TODO We're lookinng for this exact name to detect iGPUs since hipGetDeviceProperties never reports integrated==true
     iGPUName = "AMD Radeon(TM) Graphics"
@@ -22,7 +20,8 @@ const (
 
 var (
     // Used to validate if the given ROCm lib is usable
     ROCmLibGlobs = []string{"hipblas.dll", "rocblas"} // TODO - probably include more coverage of files here...
+    RocmStandardLocations = []string{"C:\\Program Files\\AMD\\ROCm\\5.7\\bin"} // TODO glob?
 )
 
 func AMDGetGPUInfo() []GpuInfo {
@@ -34,13 +33,12 @@ func AMDGetGPUInfo() []GpuInfo {
     }
     defer hl.Release()
 
-    ver, err := hl.AMDDriverVersion()
-    if err == nil {
-        slog.Info("AMD Driver: " + ver)
-    } else {
-        // For now this is benign, but we may eventually need to fail compatibility checks
-        slog.Debug("error looking up amd driver version", "error", err)
-    }
+    // TODO - this reports incorrect version information, so omitting for now
+    // driverMajor, driverMinor, err := hl.AMDDriverVersion()
+    // if err != nil {
+    //     // For now this is benign, but we may eventually need to fail compatibility checks
+    //     slog.Debug("error looking up amd driver version", "error", err)
+    // }
 
     // Note: the HIP library automatically handles subsetting to any HIP_VISIBLE_DEVICES the user specified
     count := hl.HipGetDeviceCount()
@@ -62,10 +60,10 @@ func AMDGetGPUInfo() []GpuInfo {
             return nil
         }
     } else {
-        slog.Debug("skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=" + gfxOverride)
+        slog.Info("skipping rocm gfx compatibility check", "HSA_OVERRIDE_GFX_VERSION", gfxOverride)
     }
 
-    slog.Info("detected hip devices", "count", count)
+    slog.Debug("detected hip devices", "count", count)
     // TODO how to determine the underlying device ID when visible devices is causing this to subset?
     for i := 0; i < count; i++ {
         err = hl.HipSetDevice(i)
@@ -85,18 +83,11 @@ func AMDGetGPUInfo() []GpuInfo {
         // Can luid be used on windows for setting visible devices (and is it actually set?)
         n = bytes.IndexByte(props.GcnArchName[:], 0)
         gfx := string(props.GcnArchName[:n])
-        slog.Info("hip device", "id", i, "name", name, "gfx", gfx)
-        var major, minor, patch string
-        switch len(gfx) {
-        case 6:
-            major, minor, patch = gfx[3:4], gfx[4:5], gfx[5:]
-        case 7:
-            major, minor, patch = gfx[3:5], gfx[5:6], gfx[6:]
-        }
+        slog.Debug("hip device", "id", i, "name", name, "gfx", gfx)
         //slog.Info(fmt.Sprintf("[%d] Integrated: %d", i, props.iGPU)) // DOESN'T REPORT CORRECTLY! Always 0
         // TODO Why isn't props.iGPU accurate!?
         if strings.EqualFold(name, iGPUName) {
-            slog.Info("iGPU detected skipping", "id", i)
+            slog.Info("unsupported Radeon iGPU detected skipping", "id", i, "name", name, "gfx", gfx)
             continue
         }
         if gfxOverride == "" {
@@ -106,7 +97,7 @@ func AMDGetGPUInfo() []GpuInfo {
             slog.Warn("See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage")
             continue
         } else {
-            slog.Info("amdgpu is supported", "gpu", i, "gpu_type", gfx)
+            slog.Debug("amdgpu is supported", "gpu", i, "gpu_type", gfx)
         }
     }
 
@@ -124,8 +115,8 @@ func AMDGetGPUInfo() []GpuInfo {
 
         // TODO revisit this once ROCm v6 is available on windows.
         // v5.7 only reports VRAM used by this process, so it's completely wrong and unusable
-        slog.Info("amdgpu memory", "gpu", i, "total", format.HumanBytes2(totalMemory))
-        slog.Info("amdgpu memory", "gpu", i, "available", format.HumanBytes2(freeMemory))
+        slog.Debug("amdgpu memory", "gpu", i, "total", format.HumanBytes2(totalMemory))
+        slog.Debug("amdgpu memory", "gpu", i, "available", format.HumanBytes2(freeMemory))
         gpuInfo := GpuInfo{
             Library: "rocm",
             memInfo: memInfo{
@@ -135,31 +126,12 @@ func AMDGetGPUInfo() []GpuInfo {
             ID:             fmt.Sprintf("%d", i), // TODO this is probably wrong if we specify visible devices
             DependencyPath: libDir,
             MinimumMemory:  rocmMinimumMemory,
-        }
-        if major != "" {
-            gpuInfo.Major, err = strconv.Atoi(major)
-            if err != nil {
-                slog.Info("failed to parse version", "version", gfx, "error", err)
-            }
-        }
-        if minor != "" {
-            gpuInfo.Minor, err = strconv.Atoi(minor)
-            if err != nil {
-                slog.Info("failed to parse version", "version", gfx, "error", err)
-            }
-        }
-        if patch != "" {
-            // Patch rev is hex; e.g. gfx90a
-            p, err := strconv.ParseInt(patch, 16, 0)
-            if err != nil {
-                slog.Info("failed to parse version", "version", gfx, "error", err)
-            } else {
-                gpuInfo.Patch = int(p)
-            }
-        }
-        if gpuInfo.Major < RocmComputeMin {
-            slog.Warn(fmt.Sprintf("amdgpu [%s] too old gfx%d%d%x", gpuInfo.ID, gpuInfo.Major, gpuInfo.Minor, gpuInfo.Patch))
-            continue
+            Name:           name,
+            Compute:        gfx,
+
+            // TODO - this information isn't accurate on windows, so don't report it until we find the right way to retrieve
+            // DriverMajor: driverMajor,
+            // DriverMinor: driverMinor,
         }
 
         resp = append(resp, gpuInfo)
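On Windows the gfx target now comes straight from the `GcnArchName` field that `hipGetDeviceProperties` fills in, so the manual `major`/`minor`/`patch` string slicing above could be deleted outright. The field is a fixed-size, NUL-padded byte array; a sketch of the trim (the array size below is illustrative, not the real struct layout):

```go
package main

import (
    "bytes"
    "fmt"
)

func main() {
    // GcnArchName arrives as a fixed-size, NUL-padded byte array from the
    // HIP properties struct; bytes.IndexByte finds the terminator.
    var gcnArchName [256]byte
    copy(gcnArchName[:], "gfx1030")

    n := bytes.IndexByte(gcnArchName[:], 0)
    gfx := string(gcnArchName[:n])
    fmt.Println(gfx) // gfx1030
}
```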
@@ -12,6 +12,8 @@ import (
     "sync"
     "syscall"
     "time"
+
+    "github.com/ollama/ollama/server/envconfig"
 )
 
 var (
@@ -24,45 +26,8 @@ func PayloadsDir() (string, error) {
     defer lock.Unlock()
     var err error
     if payloadsDir == "" {
-        runnersDir := os.Getenv("OLLAMA_RUNNERS_DIR")
+        runnersDir := envconfig.RunnersDir
-        // On Windows we do not carry the payloads inside the main executable
-        if runtime.GOOS == "windows" && runnersDir == "" {
-            appExe, err := os.Executable()
-            if err != nil {
-                slog.Error("failed to lookup executable path", "error", err)
-                return "", err
-            }
-
-            cwd, err := os.Getwd()
-            if err != nil {
-                slog.Error("failed to lookup working directory", "error", err)
-                return "", err
-            }
-
-            var paths []string
-            for _, root := range []string{filepath.Dir(appExe), cwd} {
-                paths = append(paths,
-                    filepath.Join(root),
-                    filepath.Join(root, "windows-"+runtime.GOARCH),
-                    filepath.Join(root, "dist", "windows-"+runtime.GOARCH),
-                )
-            }
-
-            // Try a few variations to improve developer experience when building from source in the local tree
-            for _, p := range paths {
-                candidate := filepath.Join(p, "ollama_runners")
-                _, err := os.Stat(candidate)
-                if err == nil {
-                    runnersDir = candidate
-                    break
-                }
-            }
-            if runnersDir == "" {
-                err = fmt.Errorf("unable to locate llm runner directory. Set OLLAMA_RUNNERS_DIR to the location of 'ollama_runners'")
-                slog.Error("incomplete distribution", "error", err)
-                return "", err
-            }
-        }
         if runnersDir != "" {
             payloadsDir = runnersDir
             return payloadsDir, nil
@@ -70,7 +35,7 @@ func PayloadsDir() (string, error) {
 
     // The remainder only applies on non-windows where we still carry payloads in the main executable
     cleanupTmpDirs()
-    tmpDir := os.Getenv("OLLAMA_TMPDIR")
+    tmpDir := envconfig.TmpDir
     if tmpDir == "" {
         tmpDir, err = os.MkdirTemp("", "ollama")
         if err != nil {
@@ -133,7 +98,7 @@ func cleanupTmpDirs() {
 func Cleanup() {
     lock.Lock()
     defer lock.Unlock()
-    runnersDir := os.Getenv("OLLAMA_RUNNERS_DIR")
+    runnersDir := envconfig.RunnersDir
     if payloadsDir != "" && runnersDir == "" && runtime.GOOS != "windows" {
         // We want to fully clean up the tmpdir parent of the payloads dir
         tmpDir := filepath.Clean(filepath.Join(payloadsDir, ".."))
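Several call sites in this diff swap ad-hoc `os.Getenv` lookups for fields on a shared `server/envconfig` package. That package is not part of this excerpt; a minimal sketch of the shape the call sites imply (the field names come from the diff, the eager `init` loading is an assumption):

```go
// Package envconfig (sketch) -- the real ollama/server/envconfig package is
// not shown in this diff; this is just the shape the call sites imply.
package envconfig

import "os"

var (
    // RunnersDir overrides where the llm runner payloads live.
    RunnersDir string
    // TmpDir overrides where temporary payload directories are created.
    TmpDir string
    // Debug toggles verbose GPU discovery logging.
    Debug bool
)

func init() {
    RunnersDir = os.Getenv("OLLAMA_RUNNERS_DIR")
    TmpDir = os.Getenv("OLLAMA_TMPDIR")
    Debug = os.Getenv("OLLAMA_DEBUG") != ""
}
```

Centralizing the lookups means each variable is read and documented in one place instead of being re-fetched at every call site.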
@@ -8,14 +8,14 @@ import (
 
 func GetCPUVariant() string {
     if cpu.X86.HasAVX2 {
-        slog.Info("CPU has AVX2")
+        slog.Debug("CPU has AVX2")
         return "avx2"
     }
     if cpu.X86.HasAVX {
-        slog.Info("CPU has AVX")
+        slog.Debug("CPU has AVX")
         return "avx"
     }
-    slog.Info("CPU does not have vector extensions")
+    slog.Debug("CPU does not have vector extensions")
     // else LCD
     return ""
 }
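`GetCPUVariant` keys the runner selection off `golang.org/x/sys/cpu`'s CPUID-derived feature flags; the hunk above only lowers the log level. For reference, the flags read like plain booleans:

```go
package main

import (
    "fmt"

    "golang.org/x/sys/cpu"
)

func main() {
    // golang.org/x/sys/cpu exposes CPUID-derived feature bits as booleans;
    // on non-x86 builds the X86 struct fields are simply false.
    fmt.Println("AVX: ", cpu.X86.HasAVX)
    fmt.Println("AVX2:", cpu.X86.HasAVX2)
}
```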
gpu/gpu.go (95 changed lines)
@@ -21,11 +21,13 @@ import (
     "unsafe"
 
     "github.com/ollama/ollama/format"
+    "github.com/ollama/ollama/server/envconfig"
 )
 
 type handles struct {
     deviceCount int
     cudart      *C.cudart_handle_t
+    nvcuda      *C.nvcuda_handle_t
 }
 
 const (
@@ -62,6 +64,22 @@ var CudartWindowsGlobs = []string{
     "c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll",
 }
 
+var NvcudaLinuxGlobs = []string{
+    "/usr/local/cuda*/targets/*/lib/libcuda.so*",
+    "/usr/lib/*-linux-gnu/nvidia/current/libcuda.so*",
+    "/usr/lib/*-linux-gnu/libcuda.so*",
+    "/usr/lib/wsl/lib/libcuda.so*",
+    "/usr/lib/wsl/drivers/*/libcuda.so*",
+    "/opt/cuda/lib*/libcuda.so*",
+    "/usr/local/cuda/lib*/libcuda.so*",
+    "/usr/lib*/libcuda.so*",
+    "/usr/local/lib*/libcuda.so*",
+}
+
+var NvcudaWindowsGlobs = []string{
+    "c:\\windows\\system*\\nvcuda.dll",
+}
+
 // Jetson devices have JETSON_JETPACK="x.y.z" factory set to the Jetpack version installed.
 // Included to drive logic for reducing Ollama-allocated overhead on L4T/Jetson devices.
 var CudaTegra string = os.Getenv("JETSON_JETPACK")
@@ -74,6 +92,8 @@ func initGPUHandles() *handles {
     gpuHandles := &handles{}
     var cudartMgmtName string
     var cudartMgmtPatterns []string
+    var nvcudaMgmtName string
+    var nvcudaMgmtPatterns []string
 
     tmpDir, _ := PayloadsDir()
     switch runtime.GOOS {
@@ -82,6 +102,9 @@ func initGPUHandles() *handles {
         localAppData := os.Getenv("LOCALAPPDATA")
         cudartMgmtPatterns = []string{filepath.Join(localAppData, "Programs", "Ollama", cudartMgmtName)}
         cudartMgmtPatterns = append(cudartMgmtPatterns, CudartWindowsGlobs...)
+        // Aligned with driver, we can't carry as payloads
+        nvcudaMgmtName = "nvcuda.dll"
+        nvcudaMgmtPatterns = NvcudaWindowsGlobs
     case "linux":
         cudartMgmtName = "libcudart.so*"
         if tmpDir != "" {
@@ -89,16 +112,30 @@ func initGPUHandles() *handles {
             cudartMgmtPatterns = []string{filepath.Join(tmpDir, "cuda*", cudartMgmtName)}
         }
         cudartMgmtPatterns = append(cudartMgmtPatterns, CudartLinuxGlobs...)
+        // Aligned with driver, we can't carry as payloads
+        nvcudaMgmtName = "libcuda.so*"
+        nvcudaMgmtPatterns = NvcudaLinuxGlobs
     default:
         return gpuHandles
     }
 
-    slog.Info("Detecting GPUs")
+    slog.Debug("Detecting GPUs")
+    nvcudaLibPaths := FindGPULibs(nvcudaMgmtName, nvcudaMgmtPatterns)
+    if len(nvcudaLibPaths) > 0 {
+        deviceCount, nvcuda, libPath := LoadNVCUDAMgmt(nvcudaLibPaths)
+        if nvcuda != nil {
+            slog.Debug("detected GPUs", "count", deviceCount, "library", libPath)
+            gpuHandles.nvcuda = nvcuda
+            gpuHandles.deviceCount = deviceCount
+            return gpuHandles
+        }
+    }
+
     cudartLibPaths := FindGPULibs(cudartMgmtName, cudartMgmtPatterns)
     if len(cudartLibPaths) > 0 {
         deviceCount, cudart, libPath := LoadCUDARTMgmt(cudartLibPaths)
         if cudart != nil {
-            slog.Info("detected GPUs", "library", libPath, "count", deviceCount)
+            slog.Debug("detected GPUs", "library", libPath, "count", deviceCount)
             gpuHandles.cudart = cudart
             gpuHandles.deviceCount = deviceCount
             return gpuHandles
@@ -118,6 +155,9 @@ func GetGPUInfo() GpuInfoList {
         if gpuHandles.cudart != nil {
             C.cudart_release(*gpuHandles.cudart)
         }
+        if gpuHandles.nvcuda != nil {
+            C.nvcuda_release(*gpuHandles.nvcuda)
+        }
     }()
 
     // All our GPU builds on x86 have AVX enabled, so fallback to CPU if we don't detect at least AVX
@@ -126,6 +166,12 @@ func GetGPUInfo() GpuInfoList {
         slog.Warn("CPU does not have AVX or AVX2, disabling GPU support.")
     }
 
+    // On windows we bundle the nvidia library one level above the runner dir
+    depPath := ""
+    if runtime.GOOS == "windows" && envconfig.RunnersDir != "" {
+        depPath = filepath.Dir(envconfig.RunnersDir)
+    }
+
     var memInfo C.mem_info_t
     resp := []GpuInfo{}
 
@@ -138,7 +184,15 @@ func GetGPUInfo() GpuInfoList {
         gpuInfo := GpuInfo{
             Library: "cuda",
         }
-        C.cudart_check_vram(*gpuHandles.cudart, C.int(i), &memInfo)
+        var driverMajor int
+        var driverMinor int
+        if gpuHandles.cudart != nil {
+            C.cudart_check_vram(*gpuHandles.cudart, C.int(i), &memInfo)
+        } else {
+            C.nvcuda_check_vram(*gpuHandles.nvcuda, C.int(i), &memInfo)
+            driverMajor = int(gpuHandles.nvcuda.driver_major)
+            driverMinor = int(gpuHandles.nvcuda.driver_minor)
+        }
         if memInfo.err != nil {
             slog.Info("error looking up nvidia GPU memory", "error", C.GoString(memInfo.err))
             C.free(unsafe.Pointer(memInfo.err))
@@ -151,9 +205,12 @@ func GetGPUInfo() GpuInfoList {
         gpuInfo.TotalMemory = uint64(memInfo.total)
         gpuInfo.FreeMemory = uint64(memInfo.free)
         gpuInfo.ID = C.GoString(&memInfo.gpu_id[0])
-        gpuInfo.Major = int(memInfo.major)
-        gpuInfo.Minor = int(memInfo.minor)
+        gpuInfo.Compute = fmt.Sprintf("%d.%d", memInfo.major, memInfo.minor)
         gpuInfo.MinimumMemory = cudaMinimumMemory
+        gpuInfo.DependencyPath = depPath
+        gpuInfo.Name = C.GoString(&memInfo.gpu_name[0])
+        gpuInfo.DriverMajor = int(driverMajor)
+        gpuInfo.DriverMinor = int(driverMinor)
 
         // TODO potentially sort on our own algorithm instead of what the underlying GPU library does...
         resp = append(resp, gpuInfo)
@@ -196,9 +253,10 @@ func GetCPUMem() (memInfo, error) {
     return ret, nil
 }
 
-func FindGPULibs(baseLibName string, patterns []string) []string {
+func FindGPULibs(baseLibName string, defaultPatterns []string) []string {
     // Multiple GPU libraries may exist, and some may not work, so keep trying until we exhaust them
     var ldPaths []string
+    var patterns []string
     gpuLibPaths := []string{}
     slog.Debug("Searching for GPU library", "name", baseLibName)
 
@@ -218,8 +276,14 @@ func FindGPULibs(baseLibName string, defaultPatterns []string) []string {
         }
         patterns = append(patterns, filepath.Join(d, baseLibName+"*"))
     }
+    patterns = append(patterns, defaultPatterns...)
     slog.Debug("gpu library search", "globs", patterns)
     for _, pattern := range patterns {
+
+        // Nvidia PhysX known to return bogus results
+        if strings.Contains(pattern, "PhysX") {
+            slog.Debug("skipping PhysX cuda library path", "path", pattern)
+        }
         // Ignore glob discovery errors
         matches, _ := filepath.Glob(pattern)
         for _, match := range matches {
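`FindGPULibs` now builds its glob list from the library-path environment first and, via the new `defaultPatterns` parameter, appends the baked-in globs afterwards, so user-provided directories win. A simplified, Linux-only sketch of that ordering (the real function also handles Windows `PATH` semantics):

```go
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// buildPatterns mirrors the search order above: directories from
// LD_LIBRARY_PATH first, then the caller's default globs.
func buildPatterns(baseLibName string, defaults []string) []string {
    var patterns []string
    for _, d := range strings.Split(os.Getenv("LD_LIBRARY_PATH"), ":") {
        if d == "" {
            continue
        }
        patterns = append(patterns, filepath.Join(d, baseLibName+"*"))
    }
    return append(patterns, defaults...)
}

func main() {
    for _, p := range buildPatterns("libcuda.so", []string{"/usr/lib*/libcuda.so*"}) {
        matches, _ := filepath.Glob(p) // glob errors are ignored, as in the diff
        fmt.Println(p, "->", len(matches), "match(es)")
    }
}
```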
@@ -267,8 +331,25 @@ func LoadCUDARTMgmt(cudartLibPaths []string) (int, *C.cudart_handle_t, string) {
     return 0, nil, ""
 }
 
+func LoadNVCUDAMgmt(nvcudaLibPaths []string) (int, *C.nvcuda_handle_t, string) {
+    var resp C.nvcuda_init_resp_t
+    resp.ch.verbose = getVerboseState()
+    for _, libPath := range nvcudaLibPaths {
+        lib := C.CString(libPath)
+        defer C.free(unsafe.Pointer(lib))
+        C.nvcuda_init(lib, &resp)
+        if resp.err != nil {
+            slog.Debug("Unable to load nvcuda", "library", libPath, "error", C.GoString(resp.err))
+            C.free(unsafe.Pointer(resp.err))
+        } else {
+            return int(resp.num_devices), &resp.ch, libPath
+        }
+    }
+    return 0, nil, ""
+}
+
 func getVerboseState() C.uint16_t {
-    if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
+    if envconfig.Debug {
         return C.uint16_t(1)
     }
     return C.uint16_t(0)
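One cgo detail in `LoadNVCUDAMgmt` above: `C.CString` copies the Go string into C-heap memory with `malloc`, so each converted path is paired with a deferred `C.free`. In isolation:

```go
package main

/*
#include <stdlib.h>
*/
import "C"

import (
    "fmt"
    "unsafe"
)

func main() {
    // C.CString allocates with malloc; freeing it is the Go side's job,
    // hence the deferred C.free per candidate library path in the loop above.
    p := C.CString("/usr/lib/x86_64-linux-gnu/libcuda.so") // illustrative path
    defer C.free(unsafe.Pointer(p))
    fmt.Println("C copy lives at", uintptr(unsafe.Pointer(p)))
}
```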
@@ -39,16 +39,19 @@ extern "C" {
 #endif
 
 #define GPU_ID_LEN 64
+#define GPU_NAME_LEN 96
 
 typedef struct mem_info {
   char *err;  // If non-nill, caller responsible for freeing
   char gpu_id[GPU_ID_LEN];
+  char gpu_name[GPU_NAME_LEN];
   uint64_t total;
   uint64_t free;
 
   // Compute Capability
   int major;
   int minor;
+  int patch;
 } mem_info_t;
 
 void cpu_check_ram(mem_info_t *resp);
@@ -58,6 +61,7 @@ void cpu_check_ram(mem_info_t *resp);
 #endif
 
 #include "gpu_info_cudart.h"
+#include "gpu_info_nvcuda.h"
 
 #endif  // __GPU_INFO_H__
 #endif  // __APPLE__
@@ -10,8 +10,6 @@ void cpu_check_ram(mem_info_t *resp) {
   if (GlobalMemoryStatusEx(&info) != 0) {
     resp->total = info.ullTotalPhys;
     resp->free = info.ullAvailPhys;
-    resp->major = 0;
-    resp->minor = 0;
     snprintf(&resp->gpu_id[0], GPU_ID_LEN, "0");
   } else {
     resp->err = LOAD_ERR();
@@ -31,8 +29,6 @@ void cpu_check_ram(mem_info_t *resp) {
   } else {
     resp->total = info.totalram * info.mem_unit;
     resp->free = info.freeram * info.mem_unit;
-    resp->major = 0;
-    resp->minor = 0;
     snprintf(&resp->gpu_id[0], GPU_ID_LEN, "0");
   }
   return;
@@ -6,9 +6,9 @@
 // Just enough typedef's to dlopen/dlsym for memory information
 typedef enum cudartReturn_enum {
   CUDART_SUCCESS = 0,
-  CUDA_ERROR_INVALID_VALUE = 1,
-  CUDA_ERROR_MEMORY_ALLOCATION = 2,
-  CUDA_ERROR_INSUFFICIENT_DRIVER = 35,
+  CUDART_ERROR_INVALID_VALUE = 1,
+  CUDART_ERROR_MEMORY_ALLOCATION = 2,
+  CUDART_ERROR_INSUFFICIENT_DRIVER = 35,
   // Other values omitted for now...
 } cudartReturn_t;
 
gpu/gpu_info_nvcuda.c (new file, 207 lines)
@@ -0,0 +1,207 @@
+#ifndef __APPLE__  // TODO - maybe consider nvidia support on intel macs?
+
+#include <string.h>
+#include "gpu_info_nvcuda.h"
+
+void nvcuda_init(char *nvcuda_lib_path, nvcuda_init_resp_t *resp) {
+  CUresult ret;
+  resp->err = NULL;
+  resp->num_devices = 0;
+  const int buflen = 256;
+  char buf[buflen + 1];
+  int i;
+
+  struct lookup {
+    char *s;
+    void **p;
+  } l[] = {
+      {"cuInit", (void *)&resp->ch.cuInit},
+      {"cuDriverGetVersion", (void *)&resp->ch.cuDriverGetVersion},
+      {"cuDeviceGetCount", (void *)&resp->ch.cuDeviceGetCount},
+      {"cuDeviceGet", (void *)&resp->ch.cuDeviceGet},
+      {"cuDeviceGetAttribute", (void *)&resp->ch.cuDeviceGetAttribute},
+      {"cuDeviceGetUuid", (void *)&resp->ch.cuDeviceGetUuid},
+      {"cuDeviceGetName", (void *)&resp->ch.cuDeviceGetName},
+      {"cuCtxCreate_v3", (void *)&resp->ch.cuCtxCreate_v3},
+      {"cuMemGetInfo_v2", (void *)&resp->ch.cuMemGetInfo_v2},
+      {"cuCtxDestroy", (void *)&resp->ch.cuCtxDestroy},
+      {NULL, NULL},
+  };
+
+  resp->ch.handle = LOAD_LIBRARY(nvcuda_lib_path, RTLD_LAZY);
+  if (!resp->ch.handle) {
+    char *msg = LOAD_ERR();
+    LOG(resp->ch.verbose, "library %s load err: %s\n", nvcuda_lib_path, msg);
+    snprintf(buf, buflen,
+             "Unable to load %s library to query for Nvidia GPUs: %s",
+             nvcuda_lib_path, msg);
+    free(msg);
+    resp->err = strdup(buf);
+    return;
+  }
+
+  for (i = 0; l[i].s != NULL; i++) {
+    *l[i].p = LOAD_SYMBOL(resp->ch.handle, l[i].s);
+    if (!*l[i].p) {
+      char *msg = LOAD_ERR();
+      LOG(resp->ch.verbose, "dlerr: %s\n", msg);
+      UNLOAD_LIBRARY(resp->ch.handle);
+      resp->ch.handle = NULL;
+      snprintf(buf, buflen, "symbol lookup for %s failed: %s", l[i].s,
+               msg);
+      free(msg);
+      resp->err = strdup(buf);
+      return;
+    }
+  }
+
+  ret = (*resp->ch.cuInit)(0);
+  if (ret != CUDA_SUCCESS) {
+    LOG(resp->ch.verbose, "cuInit err: %d\n", ret);
+    UNLOAD_LIBRARY(resp->ch.handle);
+    resp->ch.handle = NULL;
+    if (ret == CUDA_ERROR_INSUFFICIENT_DRIVER) {
+      resp->err = strdup("your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama");
+      return;
+    }
+    snprintf(buf, buflen, "nvcuda init failure: %d", ret);
+    resp->err = strdup(buf);
+    return;
+  }
+
+  int version = 0;
+  resp->ch.driver_major = 0;
+  resp->ch.driver_minor = 0;
+
+  // Report driver version if we're in verbose mode, ignore errors
+  ret = (*resp->ch.cuDriverGetVersion)(&version);
+  if (ret != CUDA_SUCCESS) {
+    LOG(resp->ch.verbose, "cuDriverGetVersion failed: %d\n", ret);
+  } else {
+    resp->ch.driver_major = version / 1000;
+    resp->ch.driver_minor = (version - (resp->ch.driver_major * 1000)) / 10;
+    LOG(resp->ch.verbose, "CUDA driver version: %d.%d\n", resp->ch.driver_major, resp->ch.driver_minor);
+  }
+
+  ret = (*resp->ch.cuDeviceGetCount)(&resp->num_devices);
+  if (ret != CUDA_SUCCESS) {
+    LOG(resp->ch.verbose, "cuDeviceGetCount err: %d\n", ret);
+    UNLOAD_LIBRARY(resp->ch.handle);
+    resp->ch.handle = NULL;
+    snprintf(buf, buflen, "unable to get device count: %d", ret);
+    resp->err = strdup(buf);
+    return;
+  }
+}
+
+const int buflen = 256;
+void nvcuda_check_vram(nvcuda_handle_t h, int i, mem_info_t *resp) {
+  resp->err = NULL;
+  nvcudaMemory_t memInfo = {0,0};
+  CUresult ret;
+  CUdevice device = -1;
+  CUcontext ctx = NULL;
+  char buf[buflen + 1];
+  CUuuid uuid = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+
+  if (h.handle == NULL) {
+    resp->err = strdup("nvcuda handle isn't initialized");
+    return;
+  }
+
+  ret = (*h.cuDeviceGet)(&device, i);
+  if (ret != CUDA_SUCCESS) {
+    snprintf(buf, buflen, "nvcuda device failed to initialize");
+    resp->err = strdup(buf);
+    return;
+  }
+
+  int major = 0;
+  int minor = 0;
+  ret = (*h.cuDeviceGetAttribute)(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, device);
+  if (ret != CUDA_SUCCESS) {
+    LOG(h.verbose, "[%d] device major lookup failure: %d\n", i, ret);
+  } else {
+    ret = (*h.cuDeviceGetAttribute)(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, device);
+    if (ret != CUDA_SUCCESS) {
+      LOG(h.verbose, "[%d] device minor lookup failure: %d\n", i, ret);
+    } else {
+      resp->minor = minor;
+      resp->major = major;
+    }
+  }
+
+  ret = (*h.cuDeviceGetUuid)(&uuid, device);
+  if (ret != CUDA_SUCCESS) {
+    LOG(h.verbose, "[%d] device uuid lookup failure: %d\n", i, ret);
+    snprintf(&resp->gpu_id[0], GPU_ID_LEN, "%d", i);
+  } else {
+    // GPU-d110a105-ac29-1d54-7b49-9c90440f215b
+    snprintf(&resp->gpu_id[0], GPU_ID_LEN,
+        "GPU-%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
+        uuid.bytes[0],
+        uuid.bytes[1],
+        uuid.bytes[2],
+        uuid.bytes[3],
+        uuid.bytes[4],
+        uuid.bytes[5],
+        uuid.bytes[6],
+        uuid.bytes[7],
+        uuid.bytes[8],
+        uuid.bytes[9],
+        uuid.bytes[10],
+        uuid.bytes[11],
+        uuid.bytes[12],
+        uuid.bytes[13],
+        uuid.bytes[14],
+        uuid.bytes[15]
+    );
+  }
+
+  ret = (*h.cuDeviceGetName)(&resp->gpu_name[0], GPU_NAME_LEN, device);
+  if (ret != CUDA_SUCCESS) {
+    LOG(h.verbose, "[%d] device name lookup failure: %d\n", i, ret);
+    resp->gpu_name[0] = '\0';
+  }
+
+  // To get memory we have to set (and release) a context
+  ret = (*h.cuCtxCreate_v3)(&ctx, NULL, 0, 0, device);
+  if (ret != CUDA_SUCCESS) {
+    snprintf(buf, buflen, "nvcuda failed to get primary device context %d", ret);
+    resp->err = strdup(buf);
+    return;
+  }
+
+  ret = (*h.cuMemGetInfo_v2)(&memInfo.free, &memInfo.total);
+  if (ret != CUDA_SUCCESS) {
+    snprintf(buf, buflen, "nvcuda device memory info lookup failure %d", ret);
+    resp->err = strdup(buf);
+    // Best effort on failure...
+    (*h.cuCtxDestroy)(ctx);
+    return;
+  }
+
+  resp->total = memInfo.total;
+  resp->free = memInfo.free;
+
+  LOG(h.verbose, "[%s] CUDA totalMem %lu mb\n", resp->gpu_id, resp->total / 1024 / 1024);
+  LOG(h.verbose, "[%s] CUDA freeMem %lu mb\n", resp->gpu_id, resp->free / 1024 / 1024);
+  LOG(h.verbose, "[%s] Compute Capability %d.%d\n", resp->gpu_id, resp->major, resp->minor);
+
+  ret = (*h.cuCtxDestroy)(ctx);
+  if (ret != CUDA_SUCCESS) {
+    LOG(1, "nvcuda failed to release primary device context %d", ret);
+  }
+}
+
+void nvcuda_release(nvcuda_handle_t h) {
+  LOG(h.verbose, "releasing nvcuda library\n");
+  UNLOAD_LIBRARY(h.handle);
+  // TODO and other context release logic?
+  h.handle = NULL;
+}
+
+#endif  // __APPLE__
74	gpu/gpu_info_nvcuda.h	Normal file
@@ -0,0 +1,74 @@
```c
#ifndef __APPLE__
#ifndef __GPU_INFO_NVCUDA_H__
#define __GPU_INFO_NVCUDA_H__
#include "gpu_info.h"

// Just enough typedef's to dlopen/dlsym for memory information
typedef enum cudaError_enum {
  CUDA_SUCCESS = 0,
  CUDA_ERROR_INVALID_VALUE = 1,
  CUDA_ERROR_MEMORY_ALLOCATION = 2,
  CUDA_ERROR_NOT_INITIALIZED = 3,
  CUDA_ERROR_INSUFFICIENT_DRIVER = 35,
  // Other values omitted for now...
} CUresult;

typedef enum CUdevice_attribute_enum {
  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,

  // TODO - not yet wired up but may be useful for Jetson or other
  // integrated GPU scenarios with shared memory
  CU_DEVICE_ATTRIBUTE_INTEGRATED = 18

} CUdevice_attribute;

typedef void *nvcudaDevice_t;  // Opaque is sufficient
typedef struct nvcudaMemory_st {
  uint64_t total;
  uint64_t free;
} nvcudaMemory_t;

typedef struct nvcudaDriverVersion {
  int major;
  int minor;
} nvcudaDriverVersion_t;

typedef struct CUuuid_st {
  unsigned char bytes[16];
} CUuuid;

typedef int CUdevice;
typedef void* CUcontext;

typedef struct nvcuda_handle {
  void *handle;
  uint16_t verbose;
  int driver_major;
  int driver_minor;
  CUresult (*cuInit)(unsigned int Flags);
  CUresult (*cuDriverGetVersion)(int *driverVersion);
  CUresult (*cuDeviceGetCount)(int *);
  CUresult (*cuDeviceGet)(CUdevice* device, int ordinal);
  CUresult (*cuDeviceGetAttribute)(int* pi, CUdevice_attribute attrib, CUdevice dev);
  CUresult (*cuDeviceGetUuid)(CUuuid* uuid, CUdevice dev); // signature compatible with cuDeviceGetUuid_v2
  CUresult (*cuDeviceGetName)(char *name, int len, CUdevice dev);

  // Context specific aspects
  CUresult (*cuCtxCreate_v3)(CUcontext* pctx, void *params, int len, unsigned int flags, CUdevice dev);
  CUresult (*cuMemGetInfo_v2)(uint64_t* free, uint64_t* total);
  CUresult (*cuCtxDestroy)(CUcontext ctx);
} nvcuda_handle_t;

typedef struct nvcuda_init_resp {
  char *err;  // If err is non-null handle is invalid
  nvcuda_handle_t ch;
  int num_devices;
} nvcuda_init_resp_t;

void nvcuda_init(char *nvcuda_lib_path, nvcuda_init_resp_t *resp);
void nvcuda_check_vram(nvcuda_handle_t ch, int device_id, mem_info_t *resp);
void nvcuda_release(nvcuda_handle_t ch);

#endif  // __GPU_INFO_NVCUDA_H__
#endif  // __APPLE__
```
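For orientation, here is a hypothetical sketch of how the Go side of the gpu package could drive `nvcuda_check_vram` through cgo. The wrapper name `nvcudaVRAM` and the direct `mem_info_t` field access are assumptions for illustration only; the repository's actual binding may differ.

```go
package gpu

/*
#include <stdlib.h>
#include "gpu_info_nvcuda.h"
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// nvcudaVRAM (hypothetical) queries free/total memory for one device ordinal.
func nvcudaVRAM(h C.nvcuda_handle_t, device int) (free, total uint64, err error) {
	var mi C.mem_info_t
	C.nvcuda_check_vram(h, C.int(device), &mi)
	if mi.err != nil {
		// The C side strdup()s the message, so free it after copying.
		defer C.free(unsafe.Pointer(mi.err))
		return 0, 0, fmt.Errorf("nvcuda: %s", C.GoString(mi.err))
	}
	return uint64(mi.free), uint64(mi.total), nil
}
```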
34	gpu/types.go
@@ -1,5 +1,12 @@
```diff
 package gpu
 
+import (
+	"fmt"
+	"log/slog"
+
+	"github.com/ollama/ollama/format"
+)
+
 type memInfo struct {
 	TotalMemory uint64 `json:"total_memory,omitempty"`
 	FreeMemory  uint64 `json:"free_memory,omitempty"`
@@ -20,11 +27,13 @@ type GpuInfo struct {
 	DependencyPath string `json:"lib_path,omitempty"`
 
 	// GPU information
 	ID   string `json:"gpu_id"` // string to use for selection of this specific GPU
 	Name string `json:"name"`   // user friendly name if available
-	Major int `json:"major,omitempty"` // Major compatibility version (CC or gfx)
-	Minor int `json:"minor,omitempty"` // Minor compatibility version (CC or gfx)
-	Patch int `json:"patch,omitempty"` // Patch compatibility only matters on AMD
+	Compute string `json:"compute"` // Compute Capability or gfx
+
+	// Driver Information - TODO no need to put this on each GPU
+	DriverMajor int `json:"driver_major,omitempty"`
+	DriverMinor int `json:"driver_minor,omitempty"`
 
 	// TODO other performance capability info to help in scheduling decisions
 }
@@ -56,6 +65,21 @@ func (l GpuInfoList) ByLibrary() []GpuInfoList {
 	return resp
 }
 
+// Report the GPU information into the log an Info level
+func (l GpuInfoList) LogDetails() {
+	for _, g := range l {
+		slog.Info("inference compute",
+			"id", g.ID,
+			"library", g.Library,
+			"compute", g.Compute,
+			"driver", fmt.Sprintf("%d.%d", g.DriverMajor, g.DriverMinor),
+			"name", g.Name,
+			"total", format.HumanBytes2(g.TotalMemory),
+			"available", format.HumanBytes2(g.FreeMemory),
+		)
+	}
+}
+
 // Sort by Free Space
 type ByFreeMemory []GpuInfo
```
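As a usage sketch, the new `LogDetails` helper chains naturally onto discovery output. The `GetGPUInfo` entry point shown here is an assumption about the surrounding package:

```go
package main

import "github.com/ollama/ollama/gpu"

func main() {
	gpus := gpu.GetGPUInfo() // assumed discovery entry point returning gpu.GpuInfoList
	gpus.LogDetails()        // one "inference compute" line per GPU: id, library, compute, driver, name, total, available
}
```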
```diff
@@ -217,7 +217,7 @@ func TestMultiModelStress(t *testing.T) {
 			defer wg.Done()
 			for j := 0; j < 3; j++ {
 				slog.Info("Starting", "req", i, "iter", j, "model", req[i].Model)
-				DoGenerate(ctx, t, client, req[i], resp[i], 90*time.Second, 5*time.Second)
+				DoGenerate(ctx, t, client, req[i], resp[i], 120*time.Second, 5*time.Second)
 			}
 		}(i)
 	}
```
117	integration/max_queue_test.go	Normal file
@@ -0,0 +1,117 @@
```go
//go:build integration

package integration

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"os"
	"strconv"
	"strings"
	"sync"
	"testing"
	"time"

	"github.com/ollama/ollama/api"
	"github.com/stretchr/testify/require"
)

func TestMaxQueue(t *testing.T) {
	// Note: This test can be quite slow when running in CPU mode, so keep the threadCount low unless your on GPU
	// Also note that by default Darwin can't sustain > ~128 connections without adjusting limits
	threadCount := 32
	mq := os.Getenv("OLLAMA_MAX_QUEUE")
	if mq != "" {
		var err error
		threadCount, err = strconv.Atoi(mq)
		require.NoError(t, err)
	} else {
		os.Setenv("OLLAMA_MAX_QUEUE", fmt.Sprintf("%d", threadCount))
	}

	req := api.GenerateRequest{
		Model:  "orca-mini",
		Prompt: "write a long historical fiction story about christopher columbus. use at least 10 facts from his actual journey",
		Options: map[string]interface{}{
			"seed":        42,
			"temperature": 0.0,
		},
	}
	resp := []string{"explore", "discover", "ocean"}

	// CPU mode takes much longer at the limit with a large queue setting
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()
	client, _, cleanup := InitServerConnection(ctx, t)
	defer cleanup()

	require.NoError(t, PullIfMissing(ctx, client, req.Model))

	// Context for the worker threads so we can shut them down
	// embedCtx, embedCancel := context.WithCancel(ctx)
	embedCtx := ctx

	var genwg sync.WaitGroup
	go func() {
		genwg.Add(1)
		defer genwg.Done()
		slog.Info("Starting generate request")
		DoGenerate(ctx, t, client, req, resp, 45*time.Second, 5*time.Second)
		slog.Info("generate completed")
	}()

	// Give the generate a chance to get started before we start hammering on embed requests
	time.Sleep(5 * time.Millisecond)

	threadCount += 10 // Add a few extra to ensure we push the queue past its limit
	busyCount := 0
	resetByPeerCount := 0
	canceledCount := 0
	succesCount := 0
	counterMu := sync.Mutex{}
	var embedwg sync.WaitGroup
	for i := 0; i < threadCount; i++ {
		go func(i int) {
			embedwg.Add(1)
			defer embedwg.Done()
			slog.Info("embed started", "id", i)
			embedReq := api.EmbeddingRequest{
				Model:   req.Model,
				Prompt:  req.Prompt,
				Options: req.Options,
			}
			// Fresh client for every request
			client, _ = GetTestEndpoint()

			resp, genErr := client.Embeddings(embedCtx, &embedReq)
			counterMu.Lock()
			defer counterMu.Unlock()
			switch {
			case genErr == nil:
				succesCount++
				require.Greater(t, len(resp.Embedding), 5) // somewhat arbitrary, but sufficient to be reasonable
			case errors.Is(genErr, context.Canceled):
				canceledCount++
			case strings.Contains(genErr.Error(), "busy"):
				busyCount++
			case strings.Contains(genErr.Error(), "connection reset by peer"):
				resetByPeerCount++
			default:
				require.NoError(t, genErr, "%d request failed", i)
			}

			slog.Info("embed finished", "id", i)
		}(i)
	}
	genwg.Wait()
	slog.Info("generate done, waiting for embeds")
	embedwg.Wait()

	require.Equal(t, resetByPeerCount, 0, "Connections reset by peer, have you updated your fd and socket limits?")
	require.True(t, busyCount > 0, "no requests hit busy error but some should have")
	require.True(t, canceledCount == 0, "no requests should have been canceled due to timeout")

	slog.Info("embeds completed", "success", succesCount, "busy", busyCount, "reset", resetByPeerCount, "canceled", canceledCount)
}
```
```diff
@@ -85,7 +85,7 @@ func GetTestEndpoint() (*api.Client, string) {
 var serverMutex sync.Mutex
 var serverReady bool
 
-func startServer(ctx context.Context, ollamaHost string) error {
+func startServer(t *testing.T, ctx context.Context, ollamaHost string) error {
 	// Make sure the server has been built
 	CLIName, err := filepath.Abs("../ollama")
 	if err != nil {
@@ -107,7 +107,7 @@ func startServer(ctx context.Context, ollamaHost string) error {
 
 	if tmp := os.Getenv("OLLAMA_HOST"); tmp != ollamaHost {
 		slog.Info("setting env", "OLLAMA_HOST", ollamaHost)
-		os.Setenv("OLLAMA_HOST", ollamaHost)
+		t.Setenv("OLLAMA_HOST", ollamaHost)
 	}
 
 	slog.Info("starting server", "url", ollamaHost)
@@ -200,7 +200,7 @@ func InitServerConnection(ctx context.Context, t *testing.T) (*api.Client, strin
 		}
 		lifecycle.ServerLogFile = fp.Name()
 		fp.Close()
-		require.NoError(t, startServer(ctx, testEndpoint))
+		require.NoError(t, startServer(t, ctx, testEndpoint))
 	}
 
 	return client, testEndpoint, func() {
```
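The switch from `os.Setenv` to `t.Setenv` is the reason `*testing.T` now has to be threaded through `startServer`: `t.Setenv` registers a cleanup that restores the prior value when the test finishes. A minimal illustration (not repository code):

```go
package integration

import "testing"

// TestEnvRestoreSketch shows t.Setenv's automatic restore: the previous
// value of the variable is reinstated when the test returns, and the
// test is barred from calling t.Parallel.
func TestEnvRestoreSketch(t *testing.T) {
	t.Setenv("OLLAMA_HOST", "127.0.0.1:0")
	// OLLAMA_HOST is automatically reset after this test completes.
}
```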
88	llm/ext_server/server.cpp	vendored
```diff
@@ -66,7 +66,7 @@ struct server_params {
 };
 
 bool server_verbose = false;
-bool server_log_json = true;
+bool server_log_json = false;
 
 enum stop_type {
     STOP_FULL,
@@ -266,7 +266,7 @@ struct server_slot {
         sprintf(buffer, "prompt eval time = %10.2f ms / %5d tokens (%8.2f ms per token, %8.2f tokens per second)",
                 t_prompt_processing, n_prompt_tokens_processed,
                 t_token, n_tokens_second);
-        LOG_INFO(buffer, {
+        LOG_DEBUG(buffer, {
             {"slot_id", id},
             {"task_id", task_id},
             {"t_prompt_processing", t_prompt_processing},
@@ -280,7 +280,7 @@ struct server_slot {
         sprintf(buffer, "generation eval time = %10.2f ms / %5d runs (%8.2f ms per token, %8.2f tokens per second)",
                 t_token_generation, n_decoded,
                 t_token, n_tokens_second);
-        LOG_INFO(buffer, {
+        LOG_DEBUG(buffer, {
             {"slot_id", id},
             {"task_id", task_id},
             {"t_token_generation", t_token_generation},
@@ -290,7 +290,7 @@ struct server_slot {
         });
 
         sprintf(buffer, " total time = %10.2f ms", t_prompt_processing + t_token_generation);
-        LOG_INFO(buffer, {
+        LOG_DEBUG(buffer, {
             {"slot_id", id},
             {"task_id", task_id},
             {"t_prompt_processing", t_prompt_processing},
@@ -371,7 +371,7 @@ struct llama_server_context
     {
         if (clp_ctx)
         {
-            LOG_INFO("freeing clip model", {});
+            LOG_DEBUG("freeing clip model", {});
             clip_free(clp_ctx);
             clp_ctx = nullptr;
         }
@@ -392,7 +392,7 @@ struct llama_server_context
         params = params_;
         if (!params.mmproj.empty()) {
             multimodal = true;
-            LOG_INFO("Multi Modal Mode Enabled", {});
+            LOG_DEBUG("Multi Modal Mode Enabled", {});
             clp_ctx = clip_model_load(params.mmproj.c_str(), /*verbosity=*/ 1);
             if(clp_ctx == nullptr) {
                 LOG_ERROR("unable to load clip model", {{"model", params.mmproj}});
@@ -445,7 +445,7 @@ struct llama_server_context
 
         const int32_t n_ctx_slot = n_ctx / params.n_parallel;
 
-        LOG_INFO("initializing slots", {{"n_slots", params.n_parallel}});
+        LOG_DEBUG("initializing slots", {{"n_slots", params.n_parallel}});
         for (int i = 0; i < params.n_parallel; i++)
         {
             server_slot slot;
@@ -454,7 +454,7 @@ struct llama_server_context
             slot.n_ctx = n_ctx_slot;
             slot.n_predict = params.n_predict;
 
-            LOG_INFO("new slot", {
+            LOG_DEBUG("new slot", {
                 {"slot_id", slot.id},
                 {"n_ctx_slot", slot.n_ctx}
             });
@@ -468,7 +468,7 @@ struct llama_server_context
                 //GGML_ASSERT(n_ctx_train % ga_w == 0 && "n_ctx_train must be a multiple of ga_w"); // NOLINT
                 //GGML_ASSERT(n_ctx >= n_ctx_train * ga_n && "n_ctx must be at least n_ctx_train * ga_n"); // NOLINT
 
-                LOG_INFO("slot self-extend", {
+                LOG_DEBUG("slot self-extend", {
                     {"slot_id", slot.id},
                     {"ga_n", ga_n},
                     {"ga_w", ga_w}
@@ -827,7 +827,7 @@ struct llama_server_context
 
         all_slots_are_idle = false;
 
-        LOG_INFO("slot is processing task", {
+        LOG_DEBUG("slot is processing task", {
             {"slot_id", slot->id},
             {"task_id", slot->task_id},
         });
@@ -1186,8 +1186,6 @@ struct llama_server_context
             {"model", params.model_alias},
             {"tokens_predicted", slot.n_decoded},
             {"tokens_evaluated", slot.n_prompt_tokens},
-            {"generation_settings", get_formated_generation(slot)},
-            {"prompt", slot.prompt},
             {"truncated", slot.truncated},
             {"stopped_eos", slot.stopped_eos},
             {"stopped_word", slot.stopped_word},
@@ -1506,7 +1504,7 @@ struct llama_server_context
                 }
                 slots_data.push_back(slot_data);
             }
-            LOG_INFO("slot data", {
+            LOG_DEBUG("slot data", {
                 {"task_id", task.id},
                 {"n_idle_slots", n_idle_slots},
                 {"n_processing_slots", n_processing_slots}
@@ -1568,7 +1566,7 @@ struct llama_server_context
     bool update_slots() {
         if (system_need_update)
         {
-            LOG_INFO("updating system prompt", {});
+            LOG_DEBUG("updating system prompt", {});
             system_prompt_update();
         }
 
@@ -1578,7 +1576,7 @@ struct llama_server_context
         {
             if (system_prompt.empty() && clean_kv_cache)
             {
-                LOG_INFO("all slots are idle and system prompt is empty, clear the KV cache", {});
+                LOG_DEBUG("all slots are idle and system prompt is empty, clear the KV cache", {});
                 kv_cache_clear();
             }
             return true;
@@ -1601,7 +1599,7 @@ struct llama_server_context
                     const int n_left = (int) system_tokens.size() + slot.n_past - n_keep;
                     const int n_discard = n_left / 2;
 
-                    LOG_INFO("slot context shift", {
+                    LOG_DEBUG("slot context shift", {
                         {"slot_id", slot.id},
                         {"task_id", slot.task_id},
                         {"n_keep", n_keep},
@@ -1640,7 +1638,7 @@ struct llama_server_context
                 slot.command = NONE;
                 slot.t_last_used = ggml_time_us();
 
-                LOG_INFO("slot released", {
+                LOG_DEBUG("slot released", {
                     {"slot_id", slot.id},
                     {"task_id", slot.task_id},
                     {"n_ctx", n_ctx},
@@ -1809,7 +1807,7 @@ struct llama_server_context
                         slot.ga_i = ga_i;
                     }
 
-                    LOG_INFO("slot progression", {
+                    LOG_DEBUG("slot progression", {
                         { "slot_id", slot.id },
                         { "task_id", slot.task_id },
                         { "n_past", slot.n_past },
@@ -1824,7 +1822,7 @@ struct llama_server_context
                     if (slot.n_past == slot.n_prompt_tokens && slot.n_past > 0)
                     {
                         // we have to evaluate at least 1 token to generate logits.
-                        LOG_INFO("we have to evaluate at least 1 token to generate logits", {
+                        LOG_DEBUG("we have to evaluate at least 1 token to generate logits", {
                             { "slot_id", slot.id },
                             { "task_id", slot.task_id }
                         });
@@ -1836,7 +1834,7 @@ struct llama_server_context
                     }
 
                     int p0 = (int) system_tokens.size() + slot.n_past;
-                    LOG_INFO("kv cache rm [p0, end)", {
+                    LOG_DEBUG("kv cache rm [p0, end)", {
                         { "slot_id", slot.id },
                         { "task_id", slot.task_id },
                         { "p0", p0 }
@@ -2493,11 +2491,7 @@ static void server_params_parse(int argc, char **argv, server_params &sparams,
         }
         else if (arg == "-v" || arg == "--verbose")
         {
-#if SERVER_VERBOSE != 1
-            LOG_WARNING("server.cpp is not built with verbose logging.", {});
-#else
             server_verbose = true;
-#endif
         }
         else if (arg == "--mlock")
         {
@@ -2603,7 +2597,7 @@ static void server_params_parse(int argc, char **argv, server_params &sparams,
         else if (arg == "--log-disable")
         {
             log_set_target(stdout);
-            LOG_INFO("logging to file is disabled.", {});
+            LOG_DEBUG("logging to file is disabled.", {});
         }
         else if (arg == "--slots-endpoint-disable")
         {
@@ -2729,12 +2723,12 @@ static json format_detokenized_response(std::string content)
 static void log_server_request(const httplib::Request &req, const httplib::Response &res)
 {
     // skip GH copilot requests when using default port
-    if (req.path == "/v1/health" || req.path == "/v1/completions")
+    if (req.path == "/health" || req.path == "/v1/health" || req.path == "/v1/completions")
     {
         return;
     }
 
-    LOG_INFO("request", {
+    LOG_DEBUG("request", {
         {"remote_addr", req.remote_addr},
         {"remote_port", req.remote_port},
         {"status", res.status},
@@ -3056,6 +3050,26 @@ int main(int argc, char **argv) {
         log_data["api_key"] = "api_key: " + std::to_string(sparams.api_keys.size()) + " keys loaded";
     }
 
+    if (sparams.n_threads_http < 1) {
+        // +2 threads for monitoring endpoints
+        sparams.n_threads_http = std::max(params.n_parallel + 2, (int32_t) std::thread::hardware_concurrency() - 1);
+    }
+    log_data["n_threads_http"] = std::to_string(sparams.n_threads_http);
+    svr.new_task_queue = [&sparams] { return new httplib::ThreadPool(sparams.n_threads_http); };
+
+    LOG_INFO("HTTP server listening", log_data);
+    // run the HTTP server in a thread - see comment below
+    std::thread t([&]()
+            {
+                if (!svr.listen_after_bind())
+                {
+                    state.store(SERVER_STATE_ERROR);
+                    return 1;
+                }
+
+                return 0;
+            });
+
     // load the model
     if (!llama.load_model(params))
     {
@@ -3260,26 +3274,6 @@ int main(int argc, char **argv) {
     }*/
     //);
 
-    if (sparams.n_threads_http < 1) {
-        // +2 threads for monitoring endpoints
-        sparams.n_threads_http = std::max(params.n_parallel + 2, (int32_t) std::thread::hardware_concurrency() - 1);
-    }
-    log_data["n_threads_http"] = std::to_string(sparams.n_threads_http);
-    svr.new_task_queue = [&sparams] { return new httplib::ThreadPool(sparams.n_threads_http); };
-
-    LOG_INFO("HTTP server listening", log_data);
-    // run the HTTP server in a thread - see comment below
-    std::thread t([&]()
-            {
-                if (!svr.listen_after_bind())
-                {
-                    state.store(SERVER_STATE_ERROR);
-                    return 1;
-                }
-
-                return 0;
-            });
-
     llama.queue_tasks.on_new_task(std::bind(
         &llama_server_context::process_single_task, &llama, std::placeholders::_1));
     llama.queue_tasks.on_finish_multitask(std::bind(
```
13	llm/ext_server/utils.hpp	vendored
```diff
@@ -55,9 +55,10 @@ extern bool server_log_json;
 } while (0)
 #endif
 
-#define LOG_ERROR(  MSG, ...) server_log("ERR", __func__, __LINE__, MSG, __VA_ARGS__)
+#define LOG_ERROR(  MSG, ...) server_log("ERROR", __func__, __LINE__, MSG, __VA_ARGS__)
 #define LOG_WARNING(MSG, ...) server_log("WARN", __func__, __LINE__, MSG, __VA_ARGS__)
 #define LOG_INFO(   MSG, ...) server_log("INFO", __func__, __LINE__, MSG, __VA_ARGS__)
+#define LOG_DEBUG(  MSG, ...) server_log("DEBUG", __func__, __LINE__, MSG, __VA_ARGS__)
 
 enum server_state {
     SERVER_STATE_LOADING_MODEL, // Server is starting up, model not fully loaded yet
@@ -123,6 +124,10 @@ static inline void server_log(const char *level, const char *function, int line,
         {"timestamp", time(nullptr)},
     };
 
+    if (strncmp("DEBUG", level, strlen(level)) == 0 && !server_verbose) {
+        return;
+    }
+
     if (server_log_json) {
         log.merge_patch(
             {
@@ -137,14 +142,12 @@ static inline void server_log(const char *level, const char *function, int line,
 
         std::cout << log.dump(-1, ' ', false, json::error_handler_t::replace) << "\n" << std::flush;
     } else {
-        char buf[1024];
-        snprintf(buf, 1024, "%4s [%24s] %s", level, function, message);
-
         if (!extra.empty()) {
             log.merge_patch(extra);
         }
 
         std::stringstream ss;
-        ss << buf << " |";
+        ss << level << " [" << function << "] " << message << " |";
         for (const auto& el : log.items())
         {
             const std::string value = el.value().dump(-1, ' ', false, json::error_handler_t::replace);
```
140	llm/filetype.go	Normal file
@@ -0,0 +1,140 @@
```go
package llm

import "fmt"

type fileType uint32

const (
	fileTypeF32 fileType = iota
	fileTypeF16
	fileTypeQ4_0
	fileTypeQ4_1
	fileTypeQ4_1_F16
	fileTypeQ4_2 // unused
	fileTypeQ4_3 // unused
	fileTypeQ8_0
	fileTypeQ5_0
	fileTypeQ5_1
	fileTypeQ2_K
	fileTypeQ3_K_S
	fileTypeQ3_K_M
	fileTypeQ3_K_L
	fileTypeQ4_K_S
	fileTypeQ4_K_M
	fileTypeQ5_K_S
	fileTypeQ5_K_M
	fileTypeQ6_K
	fileTypeIQ2_XXS
	fileTypeIQ2_XS
	fileTypeQ2_K_S
	fileTypeQ3_K_XS
	fileTypeIQ3_XXS

	fileTypeUnknown
)

func ParseFileType(s string) (fileType, error) {
	switch s {
	case "F32":
		return fileTypeF32, nil
	case "F16":
		return fileTypeF16, nil
	case "Q4_0":
		return fileTypeQ4_0, nil
	case "Q4_1":
		return fileTypeQ4_1, nil
	case "Q4_1_F16":
		return fileTypeQ4_1_F16, nil
	case "Q8_0":
		return fileTypeQ8_0, nil
	case "Q5_0":
		return fileTypeQ5_0, nil
	case "Q5_1":
		return fileTypeQ5_1, nil
	case "Q2_K":
		return fileTypeQ2_K, nil
	case "Q3_K_S":
		return fileTypeQ3_K_S, nil
	case "Q3_K_M":
		return fileTypeQ3_K_M, nil
	case "Q3_K_L":
		return fileTypeQ3_K_L, nil
	case "Q4_K_S":
		return fileTypeQ4_K_S, nil
	case "Q4_K_M":
		return fileTypeQ4_K_M, nil
	case "Q5_K_S":
		return fileTypeQ5_K_S, nil
	case "Q5_K_M":
		return fileTypeQ5_K_M, nil
	case "Q6_K":
		return fileTypeQ6_K, nil
	case "IQ2_XXS":
		return fileTypeIQ2_XXS, nil
	case "IQ2_XS":
		return fileTypeIQ2_XS, nil
	case "Q2_K_S":
		return fileTypeQ2_K_S, nil
	case "Q3_K_XS":
		return fileTypeQ3_K_XS, nil
	case "IQ3_XXS":
		return fileTypeIQ3_XXS, nil
	default:
		return fileTypeUnknown, fmt.Errorf("unknown fileType: %s", s)
	}
}

func (t fileType) String() string {
	switch t {
	case fileTypeF32:
		return "F32"
	case fileTypeF16:
		return "F16"
	case fileTypeQ4_0:
		return "Q4_0"
	case fileTypeQ4_1:
		return "Q4_1"
	case fileTypeQ4_1_F16:
		return "Q4_1_F16"
	case fileTypeQ8_0:
		return "Q8_0"
	case fileTypeQ5_0:
		return "Q5_0"
	case fileTypeQ5_1:
		return "Q5_1"
	case fileTypeQ2_K:
		return "Q2_K"
	case fileTypeQ3_K_S:
		return "Q3_K_S"
	case fileTypeQ3_K_M:
		return "Q3_K_M"
	case fileTypeQ3_K_L:
		return "Q3_K_L"
	case fileTypeQ4_K_S:
		return "Q4_K_S"
	case fileTypeQ4_K_M:
		return "Q4_K_M"
	case fileTypeQ5_K_S:
		return "Q5_K_S"
	case fileTypeQ5_K_M:
		return "Q5_K_M"
	case fileTypeQ6_K:
		return "Q6_K"
	case fileTypeIQ2_XXS:
		return "IQ2_XXS"
	case fileTypeIQ2_XS:
		return "IQ2_XS"
	case fileTypeQ2_K_S:
		return "Q2_K_S"
	case fileTypeQ3_K_XS:
		return "Q3_K_XS"
	case fileTypeIQ3_XXS:
		return "IQ3_XXS"
	default:
		return "unknown"
	}
}

func (t fileType) Value() uint32 {
	return uint32(t)
}
```
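A quick round-trip sketch of the new helpers. Although `fileType` itself is unexported, callers can still hold the value returned by `ParseFileType` and use its exported methods:

```go
package main

import (
	"fmt"

	"github.com/ollama/ollama/llm"
)

func main() {
	// ParseFileType validates the quantization name up front; the returned
	// value round-trips through String() and exposes the raw enum via Value().
	ft, err := llm.ParseFileType("Q4_K_M")
	if err != nil {
		panic(err)
	}
	fmt.Println(ft.String(), ft.Value())
}
```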
102	llm/ggml.go
```diff
@@ -13,82 +13,6 @@ type GGML struct {
 	model
 }
 
-const (
-	fileTypeF32 uint32 = iota
-	fileTypeF16
-	fileTypeQ4_0
-	fileTypeQ4_1
-	fileTypeQ4_1_F16
-	fileTypeQ8_0 uint32 = iota + 2
-	fileTypeQ5_0
-	fileTypeQ5_1
-	fileTypeQ2_K
-	fileTypeQ3_K_S
-	fileTypeQ3_K_M
-	fileTypeQ3_K_L
-	fileTypeQ4_K_S
-	fileTypeQ4_K_M
-	fileTypeQ5_K_S
-	fileTypeQ5_K_M
-	fileTypeQ6_K
-	fileTypeIQ2_XXS
-	fileTypeIQ2_XS
-	fileTypeQ2_K_S
-	fileTypeQ3_K_XS
-	fileTypeIQ3_XXS
-)
-
-func fileType(fileType uint32) string {
-	switch fileType {
-	case fileTypeF32:
-		return "F32"
-	case fileTypeF16:
-		return "F16"
-	case fileTypeQ4_0:
-		return "Q4_0"
-	case fileTypeQ4_1:
-		return "Q4_1"
-	case fileTypeQ4_1_F16:
-		return "Q4_1_F16"
-	case fileTypeQ8_0:
-		return "Q8_0"
-	case fileTypeQ5_0:
-		return "Q5_0"
-	case fileTypeQ5_1:
-		return "Q5_1"
-	case fileTypeQ2_K:
-		return "Q2_K"
-	case fileTypeQ3_K_S:
-		return "Q3_K_S"
-	case fileTypeQ3_K_M:
-		return "Q3_K_M"
-	case fileTypeQ3_K_L:
-		return "Q3_K_L"
-	case fileTypeQ4_K_S:
-		return "Q4_K_S"
-	case fileTypeQ4_K_M:
-		return "Q4_K_M"
-	case fileTypeQ5_K_S:
-		return "Q5_K_S"
-	case fileTypeQ5_K_M:
-		return "Q5_K_M"
-	case fileTypeQ6_K:
-		return "Q6_K"
-	case fileTypeIQ2_XXS:
-		return "IQ2_XXS"
-	case fileTypeIQ2_XS:
-		return "IQ2_XS"
-	case fileTypeQ2_K_S:
-		return "Q2_K_S"
-	case fileTypeQ3_K_XS:
-		return "Q3_K_XS"
-	case fileTypeIQ3_XXS:
-		return "IQ3_XXS"
-	default:
-		return "unknown"
-	}
-}
-
 type model interface {
 	KV() KV
 	Tensors() Tensors
@@ -121,12 +45,12 @@ func (kv KV) ParameterCount() uint64 {
 	return kv.u64("general.parameter_count")
 }
 
-func (kv KV) FileType() string {
+func (kv KV) FileType() fileType {
 	if u64 := kv.u64("general.file_type"); u64 > 0 {
 		return fileType(uint32(u64))
 	}
 
-	return "unknown"
+	return fileTypeUnknown
 }
 
 func (kv KV) BlockCount() uint64 {
@@ -286,6 +210,23 @@ const (
 
 var ErrUnsupportedFormat = errors.New("unsupported model format")
 
+func DetectGGMLType(b []byte) string {
+	switch binary.LittleEndian.Uint32(b[:4]) {
+	case FILE_MAGIC_GGML:
+		return "ggml"
+	case FILE_MAGIC_GGMF:
+		return "ggmf"
+	case FILE_MAGIC_GGJT:
+		return "ggjt"
+	case FILE_MAGIC_GGLA:
+		return "ggla"
+	case FILE_MAGIC_GGUF_LE, FILE_MAGIC_GGUF_BE:
+		return "gguf"
+	default:
+		return ""
+	}
+}
+
 func DecodeGGML(rs io.ReadSeeker) (*GGML, int64, error) {
 	var magic uint32
 	if err := binary.Read(rs, binary.LittleEndian, &magic); err != nil {
@@ -388,7 +329,10 @@ func (llm GGML) GraphSize(context, batch uint64) (partialOffload, fullOffload ui
 			4*batch*(1+4*embedding+context+context*heads),
 		)
 
-		partialOffload = 4*batch*(2*embedding+vocab) + embedding*vocab*105/128
+		partialOffload = max(
+			4*batch*(2*embedding+vocab)+embedding*vocab*105/128,
+			4*batch*(2+3*embedding+context+context*heads),
+		)
 	case "stablelm":
 		fullOffload = 4 * batch * (context*(1+heads) + 3*embedding + 2)
 		partialOffload = max(
```
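The new `DetectGGMLType` sniffs a model file's container format from its 4-byte magic. A small sketch of how a caller might use it (the file name is illustrative):

```go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/ollama/ollama/llm"
)

func main() {
	f, err := os.Open("model.gguf") // illustrative path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	magic := make([]byte, 4)
	if _, err := io.ReadFull(f, magic); err != nil {
		panic(err)
	}
	// Prints "gguf", "ggml", "ggmf", "ggjt", "ggla", or "" if unrecognized.
	fmt.Println(llm.DetectGGMLType(magic))
}
```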
56	llm/llm.go
```diff
@@ -20,7 +20,7 @@ func SystemInfo() string {
 	return C.GoString(C.llama_print_system_info())
 }
 
-func Quantize(infile, outfile, filetype string) error {
+func Quantize(infile, outfile string, ftype fileType) error {
 	cinfile := C.CString(infile)
 	defer C.free(unsafe.Pointer(cinfile))
 
@@ -29,58 +29,10 @@ func Quantize(infile, outfile, filetype string) error {
 
 	params := C.llama_model_quantize_default_params()
 	params.nthread = -1
+	params.ftype = ftype.Value()
 
-	switch filetype {
-	case "F32":
-		params.ftype = fileTypeF32
-	case "F16":
-		params.ftype = fileTypeF16
-	case "Q4_0":
-		params.ftype = fileTypeQ4_0
-	case "Q4_1":
-		params.ftype = fileTypeQ4_1
-	case "Q4_1_F16":
-		params.ftype = fileTypeQ4_1_F16
-	case "Q8_0":
-		params.ftype = fileTypeQ8_0
-	case "Q5_0":
-		params.ftype = fileTypeQ5_0
-	case "Q5_1":
-		params.ftype = fileTypeQ5_1
-	case "Q2_K":
-		params.ftype = fileTypeQ2_K
-	case "Q3_K_S":
-		params.ftype = fileTypeQ3_K_S
-	case "Q3_K_M":
-		params.ftype = fileTypeQ3_K_M
-	case "Q3_K_L":
-		params.ftype = fileTypeQ3_K_L
-	case "Q4_K_S":
-		params.ftype = fileTypeQ4_K_S
-	case "Q4_K_M":
-		params.ftype = fileTypeQ4_K_M
-	case "Q5_K_S":
-		params.ftype = fileTypeQ5_K_S
-	case "Q5_K_M":
-		params.ftype = fileTypeQ5_K_M
-	case "Q6_K":
-		params.ftype = fileTypeQ6_K
-	case "IQ2_XXS":
-		params.ftype = fileTypeIQ2_XXS
-	case "IQ2_XS":
-		params.ftype = fileTypeIQ2_XS
-	case "Q2_K_S":
-		params.ftype = fileTypeQ2_K_S
-	case "Q3_K_XS":
-		params.ftype = fileTypeQ3_K_XS
-	case "IQ3_XXS":
-		params.ftype = fileTypeIQ3_XXS
-	default:
-		return fmt.Errorf("unknown filetype: %s", filetype)
-	}
-
-	if retval := C.llama_model_quantize(cinfile, coutfile, &params); retval != 0 {
-		return fmt.Errorf("llama_model_quantize: %d", retval)
+	if rc := C.llama_model_quantize(cinfile, coutfile, &params); rc != 0 {
+		return fmt.Errorf("llama_model_quantize: %d", rc)
 	}
 
 	return nil
```
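The call-site pattern implied by the new `Quantize` signature, sketched here with illustrative file names:

```go
package example

import "github.com/ollama/ollama/llm"

// quantizeModel sketches the new call sequence: parse the requested
// quantization once, then hand the typed value to Quantize.
func quantizeModel() error {
	ft, err := llm.ParseFileType("Q4_K_M")
	if err != nil {
		return err // unknown names now fail before any cgo work starts
	}
	return llm.Quantize("model-f16.gguf", "model-q4_k_m.gguf", ft)
}
```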
```diff
@@ -3,30 +3,20 @@ package llm
 import (
 	"fmt"
 	"log/slog"
-	"os"
-	"strconv"
 
 	"github.com/ollama/ollama/api"
 	"github.com/ollama/ollama/format"
 	"github.com/ollama/ollama/gpu"
+	"github.com/ollama/ollama/server/envconfig"
 )
 
 // This algorithm looks for a complete fit to determine if we need to unload other models
 func PredictServerFit(allGpus gpu.GpuInfoList, ggml *GGML, adapters, projectors []string, opts api.Options) (bool, uint64) {
-	var estimatedVRAM uint64
-	if opts.NumCtx > int(ggml.KV().ContextLength()) {
-		slog.Warn("requested context length is greater than model max context length", "requested", opts.NumCtx, "model", ggml.KV().ContextLength())
-		opts.NumCtx = int(ggml.KV().ContextLength())
-	}
-
-	if opts.NumCtx < 4 {
-		opts.NumCtx = 4
-	}
-
 	// Split up the GPUs by type and try them
+	var estimatedVRAM uint64
 	for _, gpus := range allGpus.ByLibrary() {
 		var layerCount int
-		layerCount, estimatedVRAM = EstimateGPULayers(gpus, ggml, projectors, opts)
+		layerCount, estimatedVRAM, _ = EstimateGPULayers(gpus, ggml, projectors, opts)
 		if opts.NumGPU < 0 {
 			if layerCount > 0 && layerCount >= int(ggml.KV().BlockCount()+1) {
 				return true, estimatedVRAM
@@ -40,25 +30,15 @@ func PredictServerFit(allGpus gpu.GpuInfoList, ggml *GGML, adapters, projectors
 	return false, estimatedVRAM
 }
 
-// Given a model and one or more GPU targets, predict how many layers and bytes we can load
+// Given a model and one or more GPU targets, predict how many layers and bytes we can load, and the total size
 // The GPUs provided must all be the same Library
-func EstimateGPULayers(gpus []gpu.GpuInfo, ggml *GGML, projectors []string, opts api.Options) (int, uint64) {
-	if gpus[0].Library == "cpu" {
-		return 0, 0
-	}
+func EstimateGPULayers(gpus []gpu.GpuInfo, ggml *GGML, projectors []string, opts api.Options) (int, uint64, uint64) {
 	var memoryAvailable uint64
 	for _, info := range gpus {
 		memoryAvailable += info.FreeMemory
 	}
-	userLimit := os.Getenv("OLLAMA_MAX_VRAM")
-	if userLimit != "" {
-		avail, err := strconv.ParseUint(userLimit, 10, 64)
-		if err != nil {
-			slog.Error("invalid setting, ignoring", "OLLAMA_MAX_VRAM", userLimit, "error", err)
-		} else {
-			slog.Info("user override memory limit", "OLLAMA_MAX_VRAM", avail, "actual", memoryAvailable)
-			memoryAvailable = avail
-		}
+	if envconfig.MaxVRAM > 0 {
+		memoryAvailable = envconfig.MaxVRAM
 	}
 
 	slog.Debug("evaluating", "library", gpus[0].Library, "gpu_count", len(gpus), "available", format.HumanBytes2(memoryAvailable))
@@ -93,18 +73,13 @@ func EstimateGPULayers(gpus []gpu.GpuInfo, ggml *GGML, projectors []string, opts
 		graphPartialOffload = graphFullOffload
 	}
 
+	layers := ggml.Tensors().Layers()
+
 	// memoryRequiredTotal represents the memory required for full GPU offloading (all layers)
-	memoryRequiredTotal := memoryMinimum + graphFullOffload
+	memoryRequiredTotal := memoryMinimum + graphFullOffload + layers["blk.0"].size()
 
 	// memoryRequiredPartial represents the memory required for partial GPU offloading (n > 0, n < layers)
-	memoryRequiredPartial := memoryMinimum + graphPartialOffload
-
-	if memoryRequiredPartial > memoryAvailable {
-		slog.Debug("insufficient VRAM to load any model layers")
-		return 0, 0
-	}
-
-	layers := ggml.Tensors().Layers()
+	memoryRequiredPartial := memoryMinimum + graphPartialOffload + layers["blk.0"].size()
 
 	var memoryLayerOutput uint64
 	if layer, ok := layers["output_norm"]; ok {
@@ -189,5 +164,13 @@ func EstimateGPULayers(gpus []gpu.GpuInfo, ggml *GGML, projectors []string, opts
 			),
 		),
 	)
-	return layerCount, uint64(memoryRequiredPartial)
+	if gpus[0].Library == "cpu" {
+		return 0, 0, memoryRequiredTotal
+	}
+	if memoryRequiredPartial > memoryAvailable {
+		slog.Debug("insufficient VRAM to load any model layers")
+		return 0, 0, memoryRequiredTotal
+	}
+
+	return layerCount, memoryRequiredPartial, memoryRequiredTotal
```
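A sketch of how a caller might read the widened three-value return (layer count, VRAM for that split, and the full-offload total, which is now reported even when nothing fits). The function and variable names around the call are illustrative:

```go
package example

import (
	"github.com/ollama/ollama/api"
	"github.com/ollama/ollama/gpu"
	"github.com/ollama/ollama/llm"
)

// describeFit interprets EstimateGPULayers' three return values.
func describeFit(gpus []gpu.GpuInfo, ggml *llm.GGML, projectors []string, opts api.Options) string {
	layers, vramPartial, vramTotal := llm.EstimateGPULayers(gpus, ggml, projectors, opts)
	switch {
	case layers == 0:
		// CPU-only or nothing fits; vramTotal still sizes the whole model.
		_ = vramTotal
		return "cpu"
	case uint64(layers) >= ggml.KV().BlockCount()+1:
		return "full offload" // budget against vramTotal
	default:
		_ = vramPartial // partial offload: schedule against the partial figure
		return "partial offload"
	}
}
```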
24	llm/patches/05-clip-fix.diff	Normal file
@@ -0,0 +1,24 @@
```diff
diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index e3c9bcd4..b43f892d 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -573,14 +573,16 @@ static ggml_cgraph * clip_image_build_graph(clip_ctx * ctx, const clip_image_f32
     struct ggml_tensor * embeddings = inp;
     if (ctx->has_class_embedding) {
         embeddings = ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, hidden_size, num_positions, batch_size);
+    }
+    ggml_set_name(embeddings, "embeddings");
+    ggml_set_input(embeddings);
+
+    if (ctx->has_class_embedding) {
         embeddings = ggml_acc(ctx0, embeddings, model.class_embedding,
                 embeddings->nb[1], embeddings->nb[2], embeddings->nb[3], 0);
         embeddings = ggml_acc(ctx0, embeddings, inp,
                 embeddings->nb[1], embeddings->nb[2], embeddings->nb[3], model.class_embedding->nb[1]);
     }
-    ggml_set_name(embeddings, "embeddings");
-    ggml_set_input(embeddings);
-
 
     struct ggml_tensor * positions = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, num_positions);
     ggml_set_name(positions, "positions");
```
458	llm/server.go
```diff
@@ -26,6 +26,7 @@ import (
 	"github.com/ollama/ollama/api"
 	"github.com/ollama/ollama/format"
 	"github.com/ollama/ollama/gpu"
+	"github.com/ollama/ollama/server/envconfig"
 )
 
 type LlamaServer interface {
@@ -48,7 +49,11 @@ type llmServer struct {
 	options api.Options
 
 	// TODO - this should be broken down by GPU
 	estimatedVRAM uint64 // Estimated usage of VRAM by the loaded model
+	estimatedTotal uint64 // Total size of model
+	totalLayers uint64
+	gpuCount int
+	loadDuration time.Duration // Record how long it took the model to load
 
 	sem *semaphore.Weighted
 }
@@ -72,22 +77,17 @@ func LoadModel(model string) (*GGML, error) {
 // The gpu list must be a single family.
 func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, projectors []string, opts api.Options) (LlamaServer, error) {
 	var err error
-	if opts.NumCtx > int(ggml.KV().ContextLength()) {
-		slog.Warn("requested context length is greater than the model's training context window size", "requested", opts.NumCtx, "training size", ggml.KV().ContextLength())
-	}
-
-	if opts.NumCtx < 4 {
-		opts.NumCtx = 4
-	}
-
-	cpuRunner := ""
+	var cpuRunner string
 	var estimatedVRAM uint64
+	var estimatedTotal uint64
 	var systemMemory uint64
+	gpuCount := len(gpus)
 	if (len(gpus) == 1 && gpus[0].Library == "cpu") || opts.NumGPU == 0 {
 
 		// TODO evaluate system memory to see if we should block the load, or force an unload of another CPU runner
 
 		cpuRunner = serverForCpu()
+		gpuCount = 0
 	} else {
 		if gpus[0].Library == "metal" {
 			memInfo, err := gpu.GetCPUMem()
@@ -99,12 +99,16 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		}
 	}
 	var layers int
-	layers, estimatedVRAM = EstimateGPULayers(gpus, ggml, projectors, opts)
+	layers, estimatedVRAM, estimatedTotal = EstimateGPULayers(gpus, ggml, projectors, opts)
 
 	if gpus[0].Library == "metal" && estimatedVRAM > systemMemory {
 		// disable partial offloading when model is greater than total system memory as this
 		// can lead to locking up the system
 		opts.NumGPU = 0
+	} else if gpus[0].Library != "metal" && layers == 0 {
+		// Don't bother loading into the GPU if no layers can fit
+		cpuRunner = serverForCpu()
+		gpuCount = 0
 	} else if opts.NumGPU < 0 && layers > 0 && gpus[0].Library != "cpu" {
 		opts.NumGPU = layers
 	}
@@ -124,7 +128,7 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 	} else {
 		servers = serversForGpu(gpus[0]) // All GPUs in the list are matching Library and Variant
 	}
-	demandLib := strings.Trim(os.Getenv("OLLAMA_LLM_LIBRARY"), "\"' ")
+	demandLib := envconfig.LLMLibrary
 	if demandLib != "" {
 		serverPath := availableServers[demandLib]
 		if serverPath == "" {
@@ -132,6 +136,10 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		} else {
 			slog.Info("user override", "OLLAMA_LLM_LIBRARY", demandLib, "path", serverPath)
 			servers = []string{demandLib}
+			if strings.HasPrefix(demandLib, "cpu") {
+				// Omit the GPU flag to silence the warning
+				opts.NumGPU = -1
+			}
 		}
 	}
 
@@ -145,17 +153,14 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		"--batch-size", fmt.Sprintf("%d", opts.NumBatch),
 		"--embedding",
 	}
-	if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
-		params = append(params, "--log-format", "json")
-	} else {
-		params = append(params, "--log-disable")
-	}
+	params = append(params, "--log-disable")
 
 	if opts.NumGPU >= 0 {
 		params = append(params, "--n-gpu-layers", fmt.Sprintf("%d", opts.NumGPU))
 	}
 
-	if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
+	if envconfig.Debug {
 		params = append(params, "--verbose")
 	}
 
@@ -193,16 +198,15 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		params = append(params, "--numa")
 	}
 
-	// "--cont-batching", // TODO - doesn't seem to have any noticeable perf change for multiple requests
-	numParallel := 1
-	if onp := os.Getenv("OLLAMA_NUM_PARALLEL"); onp != "" {
-		numParallel, err = strconv.Atoi(onp)
-		if err != nil || numParallel <= 0 {
-			err = fmt.Errorf("invalid OLLAMA_NUM_PARALLEL=%s must be greater than zero - %w", onp, err)
-			slog.Error("misconfiguration", "error", err)
-			return nil, err
-		}
+	numParallel := envconfig.NumParallel
+
+	// TODO (jmorganca): multimodal models don't support parallel yet
+	// see https://github.com/ollama/ollama/issues/4165
+	if len(projectors) > 0 {
+		numParallel = 1
+		slog.Warn("multimodal models don't support parallel requests yet")
 	}
 
 	params = append(params, "--parallel", fmt.Sprintf("%d", numParallel))
 
 	for i := 0; i < len(servers); i++ {
@@ -210,10 +214,15 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		if dir == "" {
 			// Shouldn't happen
 			finalErr = fmt.Errorf("[%d] server %s not listed in available servers %v", i, servers[i], availableServers)
-			slog.Error("sever list inconsistent", "error", finalErr)
+			slog.Error("server list inconsistent", "error", finalErr)
 			continue
 		}
 
+		if strings.HasPrefix(servers[i], "cpu") {
+			// TODO if we tried a gpu runner first, and it failed, record the error and bubble that back up
+			gpuCount = 0
+		}
+
 		// Find an availableServers port, retry on each iterration in case the failure was a port conflict race
 		port := 0
 		if a, err := net.ResolveTCPAddr("tcp", "localhost:0"); err == nil {
@@ -233,13 +242,13 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		if runtime.GOOS == "windows" {
 			pathEnv = "PATH"
 		}
-		// append the server directory to LD_LIBRARY_PATH/PATH
+		// prepend the server directory to LD_LIBRARY_PATH/PATH
 		libraryPaths := []string{dir}
 
 		if libraryPath, ok := os.LookupEnv(pathEnv); ok {
 			// Append our runner directory to the path
 			// This will favor system libraries over our bundled library dependencies
-			libraryPaths = append(filepath.SplitList(libraryPath), libraryPaths...)
+			libraryPaths = append(libraryPaths, filepath.SplitList(libraryPath)...)
 		}
 
 		// Note: we always put the dependency path first
@@ -267,23 +276,43 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
 		}
 
 		s := &llmServer{
 			port:    port,
 			cmd:     exec.Command(server, finalParams...),
 			status:  NewStatusWriter(os.Stderr),
 			options: opts,
 			estimatedVRAM: estimatedVRAM,
-			sem: semaphore.NewWeighted(int64(numParallel)),
+			estimatedTotal: estimatedTotal,
+			sem: semaphore.NewWeighted(int64(numParallel)),
+			totalLayers: ggml.KV().BlockCount() + 1,
+			gpuCount: gpuCount,
+			done: make(chan error, 1),
 		}
 
-		libEnv := fmt.Sprintf("%s=%s", pathEnv, strings.Join(libraryPaths, string(filepath.ListSeparator)))
-		s.cmd.Env = append(os.Environ(), libEnv)
+		s.cmd.Env = os.Environ()
 		s.cmd.Stdout = os.Stdout
 		s.cmd.Stderr = s.status
 
-		// TODO - multiple GPU selection logic...
-		key, val := gpu.GpuInfoList(gpus).GetVisibleDevicesEnv()
-		if key != "" {
-			s.cmd.Env = append(s.cmd.Env, key+"="+val)
+		visibleDevicesEnv, visibleDevicesEnvVal := gpu.GpuInfoList(gpus).GetVisibleDevicesEnv()
+		pathEnvVal := strings.Join(libraryPaths, string(filepath.ListSeparator))
+
+		// Update or add the path and visible devices variable with our adjusted version
+		pathNeeded := true
+		devicesNeeded := visibleDevicesEnv != ""
+		for i := range s.cmd.Env {
+			cmp := strings.SplitN(s.cmd.Env[i], "=", 2)
+			if strings.EqualFold(cmp[0], pathEnv) {
+				s.cmd.Env[i] = pathEnv + "=" + pathEnvVal
+				pathNeeded = false
+			} else if devicesNeeded && strings.EqualFold(cmp[0], visibleDevicesEnv) {
+				s.cmd.Env[i] = visibleDevicesEnv + "=" + visibleDevicesEnvVal
+				devicesNeeded = false
+			}
+		}
+		if pathNeeded {
+			s.cmd.Env = append(s.cmd.Env, pathEnv+"="+pathEnvVal)
```
|
||||||
|
}
|
||||||
|
if devicesNeeded {
|
||||||
|
s.cmd.Env = append(s.cmd.Env, visibleDevicesEnv+"="+visibleDevicesEnvVal)
|
||||||
}
|
}
|
||||||
|
|
||||||
slog.Info("starting llama server", "cmd", s.cmd.String())
|
slog.Info("starting llama server", "cmd", s.cmd.String())
|
||||||
@@ -291,6 +320,11 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
|
|||||||
slog.Debug("subprocess", "environment", s.cmd.Env)
|
slog.Debug("subprocess", "environment", s.cmd.Env)
|
||||||
|
|
||||||
if err = s.cmd.Start(); err != nil {
|
if err = s.cmd.Start(); err != nil {
|
||||||
|
// Detect permission denied and augment them essage about noexec
|
||||||
|
if errors.Is(err, os.ErrPermission) {
|
||||||
|
finalErr = fmt.Errorf("unable to start server %w. %s may have noexec set. Set OLLAMA_TMPDIR for server to a writable executable directory", err, dir)
|
||||||
|
continue
|
||||||
|
}
|
||||||
msg := ""
|
msg := ""
|
||||||
if s.status != nil && s.status.LastErrMsg != "" {
|
if s.status != nil && s.status.LastErrMsg != "" {
|
||||||
msg = s.status.LastErrMsg
|
msg = s.status.LastErrMsg
|
||||||
@@ -300,13 +334,11 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
|
|||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
// TODO - make sure this is all wired up correctly
|
// reap subprocess when it exits
|
||||||
// if err = s.WaitUntilRunning(); err != nil {
|
go func() {
|
||||||
// slog.Error("error starting llama server", "server", servers[i], "error", err)
|
s.done <- s.cmd.Wait()
|
||||||
// s.Close()
|
}()
|
||||||
// finalErr = err
|
|
||||||
// continue
|
|
||||||
// }
|
|
||||||
return s, nil
|
return s, nil
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -338,7 +370,7 @@ type ServerStatus int
|
|||||||
|
|
||||||
const ( // iota is reset to 0
|
const ( // iota is reset to 0
|
||||||
ServerStatusReady ServerStatus = iota
|
ServerStatusReady ServerStatus = iota
|
||||||
ServerStatusNoSlotsAvaialble
|
ServerStatusNoSlotsAvailable
|
||||||
ServerStatusLoadingModel
|
ServerStatusLoadingModel
|
||||||
ServerStatusNotResponding
|
ServerStatusNotResponding
|
||||||
ServerStatusError
|
ServerStatusError
|
||||||
@@ -348,7 +380,7 @@ func (s ServerStatus) ToString() string {
|
|||||||
switch s {
|
switch s {
|
||||||
case ServerStatusReady:
|
case ServerStatusReady:
|
||||||
return "llm server ready"
|
return "llm server ready"
|
||||||
case ServerStatusNoSlotsAvaialble:
|
case ServerStatusNoSlotsAvailable:
|
||||||
return "llm busy - no slots available"
|
return "llm busy - no slots available"
|
||||||
case ServerStatusLoadingModel:
|
case ServerStatusLoadingModel:
|
||||||
return "llm server loading model"
|
return "llm server loading model"
|
||||||
@@ -373,6 +405,10 @@ func (s *llmServer) getServerStatus(ctx context.Context) (ServerStatus, error) {
|
|||||||
if s.status != nil && s.status.LastErrMsg != "" {
|
if s.status != nil && s.status.LastErrMsg != "" {
|
||||||
msg = s.status.LastErrMsg
|
msg = s.status.LastErrMsg
|
||||||
}
|
}
|
||||||
|
if s.cmd.ProcessState.ExitCode() == -1 {
|
||||||
|
// Most likely a signal killed it, log some more details to try to help troubleshoot
|
||||||
|
slog.Warn("llama runner process no longer running", "sys", s.cmd.ProcessState.Sys(), "string", s.cmd.ProcessState.String())
|
||||||
|
}
|
||||||
return ServerStatusError, fmt.Errorf("llama runner process no longer running: %d %s", s.cmd.ProcessState.ExitCode(), msg)
|
return ServerStatusError, fmt.Errorf("llama runner process no longer running: %d %s", s.cmd.ProcessState.ExitCode(), msg)
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -405,7 +441,7 @@ func (s *llmServer) getServerStatus(ctx context.Context) (ServerStatus, error) {
|
|||||||
case "ok":
|
case "ok":
|
||||||
return ServerStatusReady, nil
|
return ServerStatusReady, nil
|
||||||
case "no slot available":
|
case "no slot available":
|
||||||
return ServerStatusNoSlotsAvaialble, nil
|
return ServerStatusNoSlotsAvailable, nil
|
||||||
case "loading model":
|
case "loading model":
|
||||||
return ServerStatusLoadingModel, nil
|
return ServerStatusLoadingModel, nil
|
||||||
default:
|
default:
|
||||||
@@ -413,6 +449,29 @@ func (s *llmServer) getServerStatus(ctx context.Context) (ServerStatus, error) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// getServerStatusRetry will retry if ServerStatusNoSlotsAvailable is received
|
||||||
|
func (s *llmServer) getServerStatusRetry(ctx context.Context) (ServerStatus, error) {
|
||||||
|
var retries int
|
||||||
|
for {
|
||||||
|
status, err := s.getServerStatus(ctx)
|
||||||
|
if err != nil {
|
||||||
|
return status, err
|
||||||
|
}
|
||||||
|
|
||||||
|
if status == ServerStatusNoSlotsAvailable {
|
||||||
|
if retries >= 10 {
|
||||||
|
return status, fmt.Errorf("no slots available after %d retries", retries)
|
||||||
|
}
|
||||||
|
|
||||||
|
time.Sleep(5 * time.Millisecond)
|
||||||
|
retries++
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
|
return status, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func (s *llmServer) Ping(ctx context.Context) error {
|
func (s *llmServer) Ping(ctx context.Context) error {
|
||||||
_, err := s.getServerStatus(ctx)
|
_, err := s.getServerStatus(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
@@ -424,13 +483,11 @@ func (s *llmServer) Ping(ctx context.Context) error {
|
|||||||
|
|
||||||
func (s *llmServer) WaitUntilRunning(ctx context.Context) error {
|
func (s *llmServer) WaitUntilRunning(ctx context.Context) error {
|
||||||
start := time.Now()
|
start := time.Now()
|
||||||
// TODO we need to wire up a better way to detect hangs during model load and startup of the server
|
|
||||||
expiresAt := time.Now().Add(10 * time.Minute) // be generous with timeout, large models can take a while to load
|
expiresAt := time.Now().Add(10 * time.Minute) // be generous with timeout, large models can take a while to load
|
||||||
ticker := time.NewTicker(50 * time.Millisecond)
|
|
||||||
defer ticker.Stop()
|
|
||||||
|
|
||||||
slog.Info("waiting for llama runner to start responding")
|
slog.Info("waiting for llama runner to start responding")
|
||||||
var lastStatus ServerStatus = -1
|
var lastStatus ServerStatus = -1
|
||||||
|
|
||||||
for {
|
for {
|
||||||
select {
|
select {
|
||||||
case <-ctx.Done():
|
case <-ctx.Done():
|
||||||
@@ -442,41 +499,39 @@ func (s *llmServer) WaitUntilRunning(ctx context.Context) error {
|
|||||||
msg = s.status.LastErrMsg
|
msg = s.status.LastErrMsg
|
||||||
}
|
}
|
||||||
return fmt.Errorf("llama runner process has terminated: %v %s", err, msg)
|
return fmt.Errorf("llama runner process has terminated: %v %s", err, msg)
|
||||||
case <-ticker.C:
|
default:
|
||||||
if time.Now().After(expiresAt) {
|
}
|
||||||
// timeout
|
if time.Now().After(expiresAt) {
|
||||||
msg := ""
|
// timeout
|
||||||
if s.status != nil && s.status.LastErrMsg != "" {
|
msg := ""
|
||||||
msg = s.status.LastErrMsg
|
if s.status != nil && s.status.LastErrMsg != "" {
|
||||||
}
|
msg = s.status.LastErrMsg
|
||||||
return fmt.Errorf("timed out waiting for llama runner to start: %s", msg)
|
|
||||||
}
|
}
|
||||||
if s.cmd.ProcessState != nil {
|
return fmt.Errorf("timed out waiting for llama runner to start: %s", msg)
|
||||||
msg := ""
|
}
|
||||||
if s.status != nil && s.status.LastErrMsg != "" {
|
if s.cmd.ProcessState != nil {
|
||||||
msg = s.status.LastErrMsg
|
msg := ""
|
||||||
}
|
if s.status != nil && s.status.LastErrMsg != "" {
|
||||||
return fmt.Errorf("llama runner process no longer running: %d %s", s.cmd.ProcessState.ExitCode(), msg)
|
msg = s.status.LastErrMsg
|
||||||
}
|
|
||||||
|
|
||||||
c, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
|
|
||||||
defer cancel()
|
|
||||||
status, err := s.getServerStatus(c)
|
|
||||||
if err != nil && lastStatus != status {
|
|
||||||
slog.Debug("server not yet available", "error", err)
|
|
||||||
lastStatus = status
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
|
|
||||||
switch status {
|
|
||||||
case ServerStatusLoadingModel:
|
|
||||||
// TODO - this state never seems to happen with the current server.cpp code (bug?)
|
|
||||||
// it doesn't respond to the health endpoint until after the model is loaded
|
|
||||||
slog.Debug("loading model")
|
|
||||||
case ServerStatusReady:
|
|
||||||
slog.Debug(fmt.Sprintf("llama runner started in %f seconds", time.Since(start).Seconds()))
|
|
||||||
return nil
|
|
||||||
}
|
}
|
||||||
|
return fmt.Errorf("llama runner process no longer running: %d %s", s.cmd.ProcessState.ExitCode(), msg)
|
||||||
|
}
|
||||||
|
ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
|
||||||
|
defer cancel()
|
||||||
|
status, _ := s.getServerStatus(ctx)
|
||||||
|
if lastStatus != status && status != ServerStatusReady {
|
||||||
|
// Only log on status changes
|
||||||
|
slog.Info("waiting for server to become available", "status", status.ToString())
|
||||||
|
}
|
||||||
|
switch status {
|
||||||
|
case ServerStatusReady:
|
||||||
|
s.loadDuration = time.Since(start)
|
||||||
|
slog.Info(fmt.Sprintf("llama runner started in %0.2f seconds", s.loadDuration.Seconds()))
|
||||||
|
return nil
|
||||||
|
default:
|
||||||
|
lastStatus = status
|
||||||
|
time.Sleep(time.Millisecond * 250)
|
||||||
|
continue
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -510,7 +565,6 @@ ws ::= ([ \t\n] ws)?
|
|||||||
`
|
`
|
||||||
|
|
||||||
const maxBufferSize = 512 * format.KiloByte
|
const maxBufferSize = 512 * format.KiloByte
|
||||||
const maxRetries = 3
|
|
||||||
|
|
||||||
type ImageData struct {
|
type ImageData struct {
|
||||||
Data []byte `json:"data"`
|
Data []byte `json:"data"`
|
||||||
@@ -518,10 +572,11 @@ type ImageData struct {
|
|||||||
}
|
}
|
||||||
|
|
||||||
type completion struct {
|
type completion struct {
|
||||||
Content string `json:"content"`
|
Content string `json:"content"`
|
||||||
Model string `json:"model"`
|
Model string `json:"model"`
|
||||||
Prompt string `json:"prompt"`
|
Prompt string `json:"prompt"`
|
||||||
Stop bool `json:"stop"`
|
Stop bool `json:"stop"`
|
||||||
|
StoppedLimit bool `json:"stopped_limit"`
|
||||||
|
|
||||||
Timings struct {
|
Timings struct {
|
||||||
PredictedN int `json:"predicted_n"`
|
PredictedN int `json:"predicted_n"`
|
||||||
@@ -540,6 +595,7 @@ type CompletionRequest struct {
|
|||||||
|
|
||||||
type CompletionResponse struct {
|
type CompletionResponse struct {
|
||||||
Content string
|
Content string
|
||||||
|
DoneReason string
|
||||||
Done bool
|
Done bool
|
||||||
PromptEvalCount int
|
PromptEvalCount int
|
||||||
PromptEvalDuration time.Duration
|
PromptEvalDuration time.Duration
|
||||||
@@ -586,7 +642,7 @@ func (s *llmServer) Completion(ctx context.Context, req CompletionRequest, fn fu
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Make sure the server is ready
|
// Make sure the server is ready
|
||||||
status, err := s.getServerStatus(ctx)
|
status, err := s.getServerStatusRetry(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
} else if status != ServerStatusReady {
|
} else if status != ServerStatusReady {
|
||||||
@@ -600,133 +656,119 @@ func (s *llmServer) Completion(ctx context.Context, req CompletionRequest, fn fu
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
retryDelay := 100 * time.Microsecond
|
// Handling JSON marshaling with special characters unescaped.
|
||||||
for retries := 0; retries < maxRetries; retries++ {
|
buffer := &bytes.Buffer{}
|
||||||
if retries > 0 {
|
enc := json.NewEncoder(buffer)
|
||||||
time.Sleep(retryDelay) // wait before retrying
|
enc.SetEscapeHTML(false)
|
||||||
retryDelay *= 2 // exponential backoff
|
|
||||||
}
|
|
||||||
|
|
||||||
// Handling JSON marshaling with special characters unescaped.
|
if err := enc.Encode(request); err != nil {
|
||||||
buffer := &bytes.Buffer{}
|
return fmt.Errorf("failed to marshal data: %v", err)
|
||||||
enc := json.NewEncoder(buffer)
|
}
|
||||||
enc.SetEscapeHTML(false)
|
|
||||||
|
|
||||||
if err := enc.Encode(request); err != nil {
|
endpoint := fmt.Sprintf("http://127.0.0.1:%d/completion", s.port)
|
||||||
return fmt.Errorf("failed to marshal data: %v", err)
|
serverReq, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, buffer)
|
||||||
}
|
if err != nil {
|
||||||
|
return fmt.Errorf("error creating POST request: %v", err)
|
||||||
|
}
|
||||||
|
serverReq.Header.Set("Content-Type", "application/json")
|
||||||
|
|
||||||
endpoint := fmt.Sprintf("http://127.0.0.1:%d/completion", s.port)
|
res, err := http.DefaultClient.Do(serverReq)
|
||||||
req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, buffer)
|
if err != nil {
|
||||||
|
return fmt.Errorf("POST predict: %v", err)
|
||||||
|
}
|
||||||
|
defer res.Body.Close()
|
||||||
|
|
||||||
|
if res.StatusCode >= 400 {
|
||||||
|
bodyBytes, err := io.ReadAll(res.Body)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return fmt.Errorf("error creating POST request: %v", err)
|
return fmt.Errorf("failed reading llm error response: %w", err)
|
||||||
}
|
}
|
||||||
req.Header.Set("Content-Type", "application/json")
|
log.Printf("llm predict error: %s", bodyBytes)
|
||||||
|
return fmt.Errorf("%s", bodyBytes)
|
||||||
|
}
|
||||||
|
|
||||||
resp, err := http.DefaultClient.Do(req)
|
scanner := bufio.NewScanner(res.Body)
|
||||||
if err != nil {
|
buf := make([]byte, 0, maxBufferSize)
|
||||||
return fmt.Errorf("POST predict: %v", err)
|
scanner.Buffer(buf, maxBufferSize)
|
||||||
}
|
|
||||||
defer resp.Body.Close()
|
|
||||||
|
|
||||||
if resp.StatusCode >= 400 {
|
// keep track of the last token generated, this is used to abort if the model starts looping
|
||||||
bodyBytes, err := io.ReadAll(resp.Body)
|
var lastToken string
|
||||||
if err != nil {
|
var tokenRepeat int
|
||||||
return fmt.Errorf("failed reading llm error response: %w", err)
|
|
||||||
|
for scanner.Scan() {
|
||||||
|
select {
|
||||||
|
case <-ctx.Done():
|
||||||
|
// This handles the request cancellation
|
||||||
|
return ctx.Err()
|
||||||
|
default:
|
||||||
|
line := scanner.Bytes()
|
||||||
|
if len(line) == 0 {
|
||||||
|
continue
|
||||||
}
|
}
|
||||||
log.Printf("llm predict error: %s", bodyBytes)
|
|
||||||
return fmt.Errorf("%s", bodyBytes)
|
|
||||||
}
|
|
||||||
|
|
||||||
scanner := bufio.NewScanner(resp.Body)
|
evt, ok := bytes.CutPrefix(line, []byte("data: "))
|
||||||
buf := make([]byte, 0, maxBufferSize)
|
if !ok {
|
||||||
scanner.Buffer(buf, maxBufferSize)
|
return fmt.Errorf("error parsing llm response stream: %s", line)
|
||||||
|
}
|
||||||
|
|
||||||
retryNeeded := false
|
var c completion
|
||||||
// keep track of the last token generated, this is used to abort if the model starts looping
|
if err := json.Unmarshal(evt, &c); err != nil {
|
||||||
var lastToken string
|
return fmt.Errorf("error unmarshaling llm prediction response: %v", err)
|
||||||
var tokenRepeat int
|
}
|
||||||
|
|
||||||
for scanner.Scan() {
|
switch {
|
||||||
select {
|
case strings.TrimSpace(c.Content) == lastToken:
|
||||||
case <-ctx.Done():
|
tokenRepeat++
|
||||||
// This handles the request cancellation
|
|
||||||
return ctx.Err()
|
|
||||||
default:
|
default:
|
||||||
line := scanner.Bytes()
|
lastToken = strings.TrimSpace(c.Content)
|
||||||
if len(line) == 0 {
|
tokenRepeat = 0
|
||||||
continue
|
|
||||||
}
|
|
||||||
|
|
||||||
// try again on slot unavailable
|
|
||||||
if bytes.Contains(line, []byte("slot unavailable")) {
|
|
||||||
retryNeeded = true
|
|
||||||
break
|
|
||||||
}
|
|
||||||
|
|
||||||
evt, ok := bytes.CutPrefix(line, []byte("data: "))
|
|
||||||
if !ok {
|
|
||||||
return fmt.Errorf("error parsing llm response stream: %s", line)
|
|
||||||
}
|
|
||||||
|
|
||||||
var c completion
|
|
||||||
if err := json.Unmarshal(evt, &c); err != nil {
|
|
||||||
return fmt.Errorf("error unmarshaling llm prediction response: %v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
switch {
|
|
||||||
case strings.TrimSpace(c.Content) == lastToken:
|
|
||||||
tokenRepeat++
|
|
||||||
default:
|
|
||||||
lastToken = strings.TrimSpace(c.Content)
|
|
||||||
tokenRepeat = 0
|
|
||||||
}
|
|
||||||
|
|
||||||
// 30 picked as an arbitrary max token repeat limit, modify as needed
|
|
||||||
if tokenRepeat > 30 {
|
|
||||||
slog.Debug("prediction aborted, token repeat limit reached")
|
|
||||||
return ctx.Err()
|
|
||||||
}
|
|
||||||
|
|
||||||
if c.Content != "" {
|
|
||||||
fn(CompletionResponse{
|
|
||||||
Content: c.Content,
|
|
||||||
})
|
|
||||||
}
|
|
||||||
|
|
||||||
if c.Stop {
|
|
||||||
fn(CompletionResponse{
|
|
||||||
Done: true,
|
|
||||||
PromptEvalCount: c.Timings.PromptN,
|
|
||||||
PromptEvalDuration: parseDurationMs(c.Timings.PromptMS),
|
|
||||||
EvalCount: c.Timings.PredictedN,
|
|
||||||
EvalDuration: parseDurationMs(c.Timings.PredictedMS),
|
|
||||||
})
|
|
||||||
return nil
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
|
||||||
|
|
||||||
if err := scanner.Err(); err != nil {
|
// 30 picked as an arbitrary max token repeat limit, modify as needed
|
||||||
if strings.Contains(err.Error(), "unexpected EOF") {
|
if tokenRepeat > 30 {
|
||||||
s.Close()
|
slog.Debug("prediction aborted, token repeat limit reached")
|
||||||
msg := ""
|
return ctx.Err()
|
||||||
if s.status != nil && s.status.LastErrMsg != "" {
|
}
|
||||||
msg = s.status.LastErrMsg
|
|
||||||
|
if c.Content != "" {
|
||||||
|
fn(CompletionResponse{
|
||||||
|
Content: c.Content,
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
if c.Stop {
|
||||||
|
doneReason := "stop"
|
||||||
|
if c.StoppedLimit {
|
||||||
|
doneReason = "length"
|
||||||
}
|
}
|
||||||
|
|
||||||
return fmt.Errorf("an unknown error was encountered while running the model %s", msg)
|
fn(CompletionResponse{
|
||||||
|
Done: true,
|
||||||
|
DoneReason: doneReason,
|
||||||
|
PromptEvalCount: c.Timings.PromptN,
|
||||||
|
PromptEvalDuration: parseDurationMs(c.Timings.PromptMS),
|
||||||
|
EvalCount: c.Timings.PredictedN,
|
||||||
|
EvalDuration: parseDurationMs(c.Timings.PredictedMS),
|
||||||
|
})
|
||||||
|
return nil
|
||||||
}
|
}
|
||||||
return fmt.Errorf("error reading llm response: %v", err)
|
|
||||||
}
|
|
||||||
|
|
||||||
if !retryNeeded {
|
|
||||||
return nil // success
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// should never reach here ideally
|
if err := scanner.Err(); err != nil {
|
||||||
return fmt.Errorf("max retries exceeded")
|
if strings.Contains(err.Error(), "unexpected EOF") {
|
||||||
|
s.Close()
|
||||||
|
msg := ""
|
||||||
|
if s.status != nil && s.status.LastErrMsg != "" {
|
||||||
|
msg = s.status.LastErrMsg
|
||||||
|
}
|
||||||
|
return fmt.Errorf("an unknown error was encountered while running the model %s", msg)
|
||||||
|
}
|
||||||
|
|
||||||
|
return fmt.Errorf("error reading llm response: %v", err)
|
||||||
|
}
|
||||||
|
|
||||||
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
type EmbeddingRequest struct {
|
type EmbeddingRequest struct {
|
||||||
@@ -743,8 +785,9 @@ func (s *llmServer) Embedding(ctx context.Context, prompt string) ([]float64, er
|
|||||||
return nil, err
|
return nil, err
|
||||||
}
|
}
|
||||||
defer s.sem.Release(1)
|
defer s.sem.Release(1)
|
||||||
|
|
||||||
// Make sure the server is ready
|
// Make sure the server is ready
|
||||||
status, err := s.getServerStatus(ctx)
|
status, err := s.getServerStatusRetry(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
} else if status != ServerStatusReady {
|
} else if status != ServerStatusReady {
|
||||||
@@ -799,7 +842,7 @@ func (s *llmServer) Tokenize(ctx context.Context, content string) ([]int, error)
|
|||||||
status, err := s.getServerStatus(ctx)
|
status, err := s.getServerStatus(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return nil, err
|
return nil, err
|
||||||
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvaialble {
|
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvailable {
|
||||||
return nil, fmt.Errorf("unexpected server status: %s", status.ToString())
|
return nil, fmt.Errorf("unexpected server status: %s", status.ToString())
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -851,7 +894,7 @@ func (s *llmServer) Detokenize(ctx context.Context, tokens []int) (string, error
|
|||||||
status, err := s.getServerStatus(ctx)
|
status, err := s.getServerStatus(ctx)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return "", err
|
return "", err
|
||||||
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvaialble {
|
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvailable {
|
||||||
return "", fmt.Errorf("unexpected server status: %s", status.ToString())
|
return "", fmt.Errorf("unexpected server status: %s", status.ToString())
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -896,8 +939,11 @@ func (s *llmServer) Close() error {
|
|||||||
if err := s.cmd.Process.Kill(); err != nil {
|
if err := s.cmd.Process.Kill(); err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
// if ProcessState is already populated, Wait already completed, no need to wait again
|
||||||
_ = s.cmd.Wait()
|
if s.cmd.ProcessState == nil {
|
||||||
|
slog.Debug("waiting for llama server to exit")
|
||||||
|
<-s.done
|
||||||
|
}
|
||||||
|
|
||||||
slog.Debug("llama server stopped")
|
slog.Debug("llama server stopped")
|
||||||
}
|
}
|
||||||
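
The environment handling above replaces a blind `append(os.Environ(), ...)` with an in-place update, so an existing `PATH`/`LD_LIBRARY_PATH` entry is overridden rather than duplicated, matching keys case-insensitively as Windows requires. A minimal standalone sketch of that pattern (the helper name, key, and values here are stand-ins, not taken from the diff):

```go
package main

import (
	"fmt"
	"strings"
)

// setEnv overrides an existing KEY=value entry, matching KEY
// case-insensitively, or appends one if the key is absent.
func setEnv(env []string, key, val string) []string {
	for i := range env {
		k, _, _ := strings.Cut(env[i], "=")
		if strings.EqualFold(k, key) {
			env[i] = key + "=" + val
			return env
		}
	}
	return append(env, key+"="+val)
}

func main() {
	env := []string{`Path=C:\Windows`, "HOME=/root"}
	env = setEnv(env, "PATH", `C:\ollama_runners;C:\Windows`)
	fmt.Println(env) // [PATH=C:\ollama_runners;C:\Windows HOME=/root]
}
```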
@@ -109,13 +109,12 @@ func toChatCompletion(id string, r api.ChatResponse) ChatCompletion {
 		Choices: []Choice{{
 			Index:   0,
 			Message: Message{Role: r.Message.Role, Content: r.Message.Content},
-			FinishReason: func(done bool) *string {
-				if done {
-					reason := "stop"
+			FinishReason: func(reason string) *string {
+				if len(reason) > 0 {
 					return &reason
 				}
 				return nil
-			}(r.Done),
+			}(r.DoneReason),
 		}},
 		Usage: Usage{
 			// TODO: ollama returns 0 for prompt eval if the prompt was cached, but openai returns the actual count
@@ -133,19 +132,16 @@ func toChunk(id string, r api.ChatResponse) ChatCompletionChunk {
 		Created:           time.Now().Unix(),
 		Model:             r.Model,
 		SystemFingerprint: "fp_ollama",
-		Choices: []ChunkChoice{
-			{
-				Index: 0,
-				Delta: Message{Role: "assistant", Content: r.Message.Content},
-				FinishReason: func(done bool) *string {
-					if done {
-						reason := "stop"
-						return &reason
-					}
-					return nil
-				}(r.Done),
-			},
-		},
+		Choices: []ChunkChoice{{
+			Index: 0,
+			Delta: Message{Role: "assistant", Content: r.Message.Content},
+			FinishReason: func(reason string) *string {
+				if len(reason) > 0 {
+					return &reason
+				}
+				return nil
+			}(r.DoneReason),
+		}},
 	}
 }
 
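
Both OpenAI-compatibility helpers now derive `finish_reason` from the new `DoneReason` field ("stop" or "length") instead of hard-coding "stop" once `Done` flips. A small standalone sketch of the immediately-invoked-function idiom used for the nullable field (the inputs here are illustrative):

```go
package main

import "fmt"

// finishReason mirrors the mapping above: nil while streaming,
// a pointer to the reason string once the response is done.
func finishReason(reason string) *string {
	if len(reason) > 0 {
		return &reason // Go keeps the copied argument alive, so this is safe
	}
	return nil
}

func main() {
	fmt.Println(finishReason(""))        // <nil>: still generating
	fmt.Println(*finishReason("stop"))   // stop: natural end of generation
	fmt.Println(*finishReason("length")) // length: token limit reached
}
```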
@@ -218,7 +218,7 @@ func (i *Instance) Readline() (string, error) {
 		case CharCtrlZ:
 			fd := int(syscall.Stdin)
 			return handleCharCtrlZ(fd, i.Terminal.termios)
-		case CharEnter:
+		case CharEnter, CharCtrlJ:
 			output := buf.String()
 			if output != "" {
 				i.History.Add([]rune(output))
@@ -232,7 +232,7 @@ func (i *Instance) Readline() (string, error) {
 			metaDel = false
 			continue
 		}
-		if r >= CharSpace || r == CharEnter {
+		if r >= CharSpace || r == CharEnter || r == CharCtrlJ {
 			buf.Add(r)
 		}
 	}
@@ -166,8 +166,8 @@ fi
 
 if check_gpu lspci amdgpu || check_gpu lshw amdgpu; then
     # Look for pre-existing ROCm v6 before downloading the dependencies
-    for search in "${HIP_PATH:-''}" "${ROCM_PATH:-''}" "/opt/rocm"; do
-        if [ -n "${search}" ] && [ -e "${search}/lib/libhipblas.so.2" ]; then
+    for search in "${HIP_PATH:-''}" "${ROCM_PATH:-''}" "/opt/rocm" "/usr/lib64"; do
+        if [ -n "${search}" ] && [ -e "${search}/libhipblas.so.2" -o -e "${search}/lib/libhipblas.so.2" ]; then
             status "Compatible AMD GPU ROCm library detected at ${search}"
             install_success
             exit 0
server/envconfig/config.go (new file, 174 lines)
@@ -0,0 +1,174 @@
+package envconfig
+
+import (
+	"fmt"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"runtime"
+	"strconv"
+	"strings"
+)
+
+var (
+	// Set via OLLAMA_ORIGINS in the environment
+	AllowOrigins []string
+	// Set via OLLAMA_DEBUG in the environment
+	Debug bool
+	// Set via OLLAMA_LLM_LIBRARY in the environment
+	LLMLibrary string
+	// Set via OLLAMA_MAX_LOADED_MODELS in the environment
+	MaxRunners int
+	// Set via OLLAMA_MAX_QUEUE in the environment
+	MaxQueuedRequests int
+	// Set via OLLAMA_MAX_VRAM in the environment
+	MaxVRAM uint64
+	// Set via OLLAMA_NOPRUNE in the environment
+	NoPrune bool
+	// Set via OLLAMA_NUM_PARALLEL in the environment
+	NumParallel int
+	// Set via OLLAMA_RUNNERS_DIR in the environment
+	RunnersDir string
+	// Set via OLLAMA_TMPDIR in the environment
+	TmpDir string
+)
+
+func AsMap() map[string]string {
+	return map[string]string{
+		"OLLAMA_ORIGINS":           fmt.Sprintf("%v", AllowOrigins),
+		"OLLAMA_DEBUG":             fmt.Sprintf("%v", Debug),
+		"OLLAMA_LLM_LIBRARY":       fmt.Sprintf("%v", LLMLibrary),
+		"OLLAMA_MAX_LOADED_MODELS": fmt.Sprintf("%v", MaxRunners),
+		"OLLAMA_MAX_QUEUE":         fmt.Sprintf("%v", MaxQueuedRequests),
+		"OLLAMA_MAX_VRAM":          fmt.Sprintf("%v", MaxVRAM),
+		"OLLAMA_NOPRUNE":           fmt.Sprintf("%v", NoPrune),
+		"OLLAMA_NUM_PARALLEL":      fmt.Sprintf("%v", NumParallel),
+		"OLLAMA_RUNNERS_DIR":       fmt.Sprintf("%v", RunnersDir),
+		"OLLAMA_TMPDIR":            fmt.Sprintf("%v", TmpDir),
+	}
+}
+
+var defaultAllowOrigins = []string{
+	"localhost",
+	"127.0.0.1",
+	"0.0.0.0",
+}
+
+// Clean quotes and spaces from the value
+func clean(key string) string {
+	return strings.Trim(os.Getenv(key), "\"' ")
+}
+
+func init() {
+	// default values
+	NumParallel = 1
+	MaxRunners = 1
+	MaxQueuedRequests = 512
+
+	LoadConfig()
+}
+
+func LoadConfig() {
+	if debug := clean("OLLAMA_DEBUG"); debug != "" {
+		d, err := strconv.ParseBool(debug)
+		if err == nil {
+			Debug = d
+		} else {
+			Debug = true
+		}
+	}
+
+	RunnersDir = clean("OLLAMA_RUNNERS_DIR")
+	if runtime.GOOS == "windows" && RunnersDir == "" {
+		// On Windows we do not carry the payloads inside the main executable
+		appExe, err := os.Executable()
+		if err != nil {
+			slog.Error("failed to lookup executable path", "error", err)
+		}
+
+		cwd, err := os.Getwd()
+		if err != nil {
+			slog.Error("failed to lookup working directory", "error", err)
+		}
+
+		var paths []string
+		for _, root := range []string{filepath.Dir(appExe), cwd} {
+			paths = append(paths,
+				filepath.Join(root),
+				filepath.Join(root, "windows-"+runtime.GOARCH),
+				filepath.Join(root, "dist", "windows-"+runtime.GOARCH),
+			)
+		}
+
+		// Try a few variations to improve developer experience when building from source in the local tree
+		for _, p := range paths {
+			candidate := filepath.Join(p, "ollama_runners")
+			_, err := os.Stat(candidate)
+			if err == nil {
+				RunnersDir = candidate
+				break
+			}
+		}
+		if RunnersDir == "" {
+			slog.Error("unable to locate llm runner directory. Set OLLAMA_RUNNERS_DIR to the location of 'ollama_runners'")
+		}
+	}
+
+	TmpDir = clean("OLLAMA_TMPDIR")
+
+	userLimit := clean("OLLAMA_MAX_VRAM")
+	if userLimit != "" {
+		avail, err := strconv.ParseUint(userLimit, 10, 64)
+		if err != nil {
+			slog.Error("invalid setting, ignoring", "OLLAMA_MAX_VRAM", userLimit, "error", err)
+		} else {
+			MaxVRAM = avail
+		}
+	}
+
+	LLMLibrary = clean("OLLAMA_LLM_LIBRARY")
+
+	if onp := clean("OLLAMA_NUM_PARALLEL"); onp != "" {
+		val, err := strconv.Atoi(onp)
+		if err != nil || val <= 0 {
+			slog.Error("invalid setting must be greater than zero", "OLLAMA_NUM_PARALLEL", onp, "error", err)
+		} else {
+			NumParallel = val
+		}
+	}
+
+	if noprune := clean("OLLAMA_NOPRUNE"); noprune != "" {
+		NoPrune = true
+	}
+
+	if origins := clean("OLLAMA_ORIGINS"); origins != "" {
+		AllowOrigins = strings.Split(origins, ",")
+	}
+	for _, allowOrigin := range defaultAllowOrigins {
+		AllowOrigins = append(AllowOrigins,
+			fmt.Sprintf("http://%s", allowOrigin),
+			fmt.Sprintf("https://%s", allowOrigin),
+			fmt.Sprintf("http://%s:*", allowOrigin),
+			fmt.Sprintf("https://%s:*", allowOrigin),
+		)
+	}
+
+	maxRunners := clean("OLLAMA_MAX_LOADED_MODELS")
+	if maxRunners != "" {
+		m, err := strconv.Atoi(maxRunners)
+		if err != nil {
+			slog.Error("invalid setting", "OLLAMA_MAX_LOADED_MODELS", maxRunners, "error", err)
+		} else {
+			MaxRunners = m
+		}
+	}
+
+	if onp := os.Getenv("OLLAMA_MAX_QUEUE"); onp != "" {
+		p, err := strconv.Atoi(onp)
+		if err != nil || p <= 0 {
+			slog.Error("invalid setting", "OLLAMA_MAX_QUEUE", onp, "error", err)
+		} else {
+			MaxQueuedRequests = p
+		}
+	}
+}
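
The new `envconfig` package centralizes environment parsing that was previously scattered through the server; `init()` seeds defaults and `LoadConfig()` re-reads the process environment. A caller-side sketch, assuming only the exported names visible in the file above:

```go
package main

import (
	"fmt"

	"github.com/ollama/ollama/server/envconfig"
)

func main() {
	// Values are populated from OLLAMA_* variables by init()/LoadConfig().
	if envconfig.Debug {
		fmt.Println("debug logging enabled via OLLAMA_DEBUG")
	}
	fmt.Println("parallel slots:", envconfig.NumParallel)

	// AsMap is handy for dumping the effective configuration.
	for k, v := range envconfig.AsMap() {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```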
server/envconfig/config_test.go (new file, 20 lines)
@@ -0,0 +1,20 @@
+package envconfig
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/require"
+)
+
+func TestConfig(t *testing.T) {
+	Debug = false // Reset whatever was loaded in init()
+	t.Setenv("OLLAMA_DEBUG", "")
+	LoadConfig()
+	require.False(t, Debug)
+	t.Setenv("OLLAMA_DEBUG", "false")
+	LoadConfig()
+	require.False(t, Debug)
+	t.Setenv("OLLAMA_DEBUG", "1")
+	LoadConfig()
+	require.True(t, Debug)
+}
server/images.go
@@ -1,8 +1,8 @@
 package server
 
 import (
-	"archive/zip"
 	"bytes"
+	"cmp"
 	"context"
 	"crypto/sha256"
 	"encoding/base64"
@@ -11,7 +11,6 @@ import (
 	"errors"
 	"fmt"
 	"io"
-	"io/fs"
 	"log"
 	"log/slog"
 	"net/http"
@@ -26,10 +25,9 @@ import (
 
 	"github.com/ollama/ollama/api"
 	"github.com/ollama/ollama/auth"
-	"github.com/ollama/ollama/convert"
 	"github.com/ollama/ollama/format"
 	"github.com/ollama/ollama/llm"
-	"github.com/ollama/ollama/parser"
+	"github.com/ollama/ollama/server/envconfig"
 	"github.com/ollama/ollama/types/errtypes"
 	"github.com/ollama/ollama/types/model"
 	"github.com/ollama/ollama/version"
@@ -54,7 +52,6 @@ type Model struct {
 	System   string
 	License  []string
 	Digest   string
-	Size     int64
 	Options  map[string]interface{}
 	Messages []Message
 }
@@ -63,46 +60,74 @@ func (m *Model) IsEmbedding() bool {
 	return slices.Contains(m.Config.ModelFamilies, "bert") || slices.Contains(m.Config.ModelFamilies, "nomic-bert")
 }
 
-func (m *Model) Commands() (cmds []parser.Command) {
-	cmds = append(cmds, parser.Command{Name: "model", Args: m.ModelPath})
-
-	if m.Template != "" {
-		cmds = append(cmds, parser.Command{Name: "template", Args: m.Template})
-	}
-
-	if m.System != "" {
-		cmds = append(cmds, parser.Command{Name: "system", Args: m.System})
-	}
+func (m *Model) String() string {
+	var modelfile model.File
+
+	modelfile.Commands = append(modelfile.Commands, model.Command{
+		Name: "model",
+		Args: m.ModelPath,
+	})
 
 	for _, adapter := range m.AdapterPaths {
-		cmds = append(cmds, parser.Command{Name: "adapter", Args: adapter})
+		modelfile.Commands = append(modelfile.Commands, model.Command{
+			Name: "adapter",
+			Args: adapter,
+		})
 	}
 
 	for _, projector := range m.ProjectorPaths {
-		cmds = append(cmds, parser.Command{Name: "projector", Args: projector})
+		modelfile.Commands = append(modelfile.Commands, model.Command{
+			Name: "model",
+			Args: projector,
+		})
+	}
+
+	if m.Template != "" {
+		modelfile.Commands = append(modelfile.Commands, model.Command{
+			Name: "template",
+			Args: m.Template,
+		})
+	}
+
+	if m.System != "" {
+		modelfile.Commands = append(modelfile.Commands, model.Command{
+			Name: "system",
+			Args: m.System,
+		})
 	}
 
 	for k, v := range m.Options {
 		switch v := v.(type) {
 		case []any:
 			for _, s := range v {
-				cmds = append(cmds, parser.Command{Name: k, Args: fmt.Sprintf("%v", s)})
+				modelfile.Commands = append(modelfile.Commands, model.Command{
+					Name: k,
+					Args: fmt.Sprintf("%v", s),
+				})
 			}
 		default:
-			cmds = append(cmds, parser.Command{Name: k, Args: fmt.Sprintf("%v", v)})
+			modelfile.Commands = append(modelfile.Commands, model.Command{
+				Name: k,
+				Args: fmt.Sprintf("%v", v),
+			})
 		}
 	}
 
 	for _, license := range m.License {
-		cmds = append(cmds, parser.Command{Name: "license", Args: license})
+		modelfile.Commands = append(modelfile.Commands, model.Command{
+			Name: "license",
+			Args: license,
+		})
 	}
 
 	for _, msg := range m.Messages {
-		cmds = append(cmds, parser.Command{Name: "message", Args: fmt.Sprintf("%s %s", msg.Role, msg.Content)})
+		modelfile.Commands = append(modelfile.Commands, model.Command{
+			Name: "message",
+			Args: fmt.Sprintf("%s %s", msg.Role, msg.Content),
+		})
 	}
 
-	return cmds
+	return modelfile.String()
 }
 
 type Message struct {
@@ -130,50 +155,11 @@ type ConfigV2 struct {
 	RootFS RootFS `json:"rootfs"`
 }
 
-func (c *ConfigV2) SetModelFormat(format string) {
-	if c.ModelFormat == "" {
-		c.ModelFormat = format
-	}
-}
-
-func (c *ConfigV2) SetModelFamily(families ...string) {
-	for _, family := range families {
-		if c.ModelFamily == "" {
-			c.ModelFamily = family
-		}
-
-		if !slices.Contains(c.ModelFamilies, family) {
-			c.ModelFamilies = append(c.ModelFamilies, family)
-		}
-	}
-}
-
-func (c *ConfigV2) SetModelType(modelType string) {
-	if c.ModelType == "" {
-		c.ModelType = modelType
-	}
-}
-
-func (c *ConfigV2) SetFileType(fileType string) {
-	if c.FileType == "" {
-		c.FileType = fileType
-	}
-}
-
 type RootFS struct {
 	Type    string   `json:"type"`
 	DiffIDs []string `json:"diff_ids"`
 }
 
-func (m *ManifestV2) GetTotalSize() (total int64) {
-	for _, layer := range m.Layers {
-		total += layer.Size
-	}
-
-	total += m.Config.Size
-	return total
-}
-
 func GetManifest(mp ModelPath) (*ManifestV2, string, error) {
 	fp, err := mp.GetManifestPath()
 	if err != nil {
@@ -214,7 +200,6 @@ func GetModel(name string) (*Model, error) {
 		Digest:   digest,
 		Template: "{{ .Prompt }}",
 		License:  []string{},
-		Size:     manifest.GetTotalSize(),
 	}
 
 	filename, err := GetBlobsPath(manifest.Config.Digest)
@@ -304,7 +289,7 @@ func GetModel(name string) (*Model, error) {
 	return model, nil
 }
 
-func realpath(mfDir, from string) string {
+func realpath(rel, from string) string {
 	abspath, err := filepath.Abs(from)
 	if err != nil {
 		return from
@@ -321,22 +306,15 @@ func realpath(mfDir, from string) string {
 		return filepath.Join(home, from[2:])
 	}
 
-	if _, err := os.Stat(filepath.Join(mfDir, from)); err == nil {
+	if _, err := os.Stat(filepath.Join(rel, from)); err == nil {
 		// this is a file relative to the Modelfile
-		return filepath.Join(mfDir, from)
+		return filepath.Join(rel, from)
 	}
 
 	return abspath
 }
 
-func CreateModel(ctx context.Context, name, modelFileDir, quantization string, commands []parser.Command, fn func(resp api.ProgressResponse)) error {
-	deleteMap := make(map[string]struct{})
-	if manifest, _, err := GetManifest(ParseModelPath(name)); err == nil {
-		for _, layer := range append(manifest.Layers, manifest.Config) {
-			deleteMap[layer.Digest] = struct{}{}
-		}
-	}
-
+func CreateModel(ctx context.Context, name, modelFileDir, quantization string, modelfile *model.File, fn func(resp api.ProgressResponse)) (err error) {
 	config := ConfigV2{
 		OS:           "linux",
 		Architecture: "amd64",
@@ -345,250 +323,181 @@ func CreateModel(ctx context.Context, name, modelFileDir, quantization string, c
 		},
 	}
 
-	var layers Layers
-	messages := []string{}
-
-	params := make(map[string][]string)
-	fromParams := make(map[string]any)
-
-	for _, c := range commands {
+	var messages []*api.Message
+	parameters := make(map[string]any)
+
+	var layers []*Layer
+	for _, c := range modelfile.Commands {
 		mediatype := fmt.Sprintf("application/vnd.ollama.image.%s", c.Name)
 
 		switch c.Name {
-		case "model":
-			if strings.HasPrefix(c.Args, "@") {
-				blobPath, err := GetBlobsPath(strings.TrimPrefix(c.Args, "@"))
-				if err != nil {
-					return err
-				}
-
-				c.Args = blobPath
-			}
-
-			pathName := realpath(modelFileDir, c.Args)
-
-			ggufName, err := convertModel(name, pathName, fn)
-			if err != nil {
-				var pathErr *fs.PathError
-				switch {
-				case errors.Is(err, zip.ErrFormat):
-					// it's not a safetensor archive
-				case errors.As(err, &pathErr):
-					// it's not a file on disk, could be a model reference
-				default:
-					return err
-				}
-			}
-
-			if ggufName != "" {
-				pathName = ggufName
-				defer os.RemoveAll(ggufName)
-
-				if quantization != "" {
-					quantization = strings.ToUpper(quantization)
-					fn(api.ProgressResponse{Status: fmt.Sprintf("quantizing %s model to %s", "F16", quantization)})
-					tempfile, err := os.CreateTemp(filepath.Dir(ggufName), quantization)
-					if err != nil {
-						return err
-					}
-					defer os.RemoveAll(tempfile.Name())
-
-					if err := llm.Quantize(ggufName, tempfile.Name(), quantization); err != nil {
-						return err
-					}
-
-					if err := tempfile.Close(); err != nil {
-						return err
-					}
-
-					pathName = tempfile.Name()
-				}
-			}
-
-			bin, err := os.Open(pathName)
-			if err != nil {
-				// not a file on disk so must be a model reference
-				modelpath := ParseModelPath(c.Args)
-				manifest, _, err := GetManifest(modelpath)
-				switch {
-				case errors.Is(err, os.ErrNotExist):
-					fn(api.ProgressResponse{Status: "pulling model"})
-					if err := PullModel(ctx, c.Args, &registryOptions{}, fn); err != nil {
-						return err
-					}
-
-					manifest, _, err = GetManifest(modelpath)
-					if err != nil {
-						return err
-					}
-				case err != nil:
-					return err
-				}
-
-				fn(api.ProgressResponse{Status: "reading model metadata"})
-				fromConfigPath, err := GetBlobsPath(manifest.Config.Digest)
-				if err != nil {
-					return err
-				}
-
-				fromConfigFile, err := os.Open(fromConfigPath)
-				if err != nil {
-					return err
-				}
-				defer fromConfigFile.Close()
-
-				var fromConfig ConfigV2
-				if err := json.NewDecoder(fromConfigFile).Decode(&fromConfig); err != nil {
-					return err
-				}
-
-				// if the model is still not in gguf format, error out
-				if fromConfig.ModelFormat != "gguf" {
-					return fmt.Errorf("%s is not in gguf format, this base model is not compatible with this version of ollama", c.Args)
-				}
-
-				config.SetModelFormat(fromConfig.ModelFormat)
-				config.SetModelFamily(append(fromConfig.ModelFamilies, fromConfig.ModelFamily)...)
-				config.SetModelType(fromConfig.ModelType)
-				config.SetFileType(fromConfig.FileType)
-
-				for _, layer := range manifest.Layers {
-					deleteMap[layer.Digest] = struct{}{}
-					if layer.MediaType == "application/vnd.ollama.image.params" {
-						fromParamsPath, err := GetBlobsPath(layer.Digest)
-						if err != nil {
-							return err
-						}
-
-						fromParamsFile, err := os.Open(fromParamsPath)
-						if err != nil {
-							return err
-						}
-						defer fromParamsFile.Close()
-
-						if err := json.NewDecoder(fromParamsFile).Decode(&fromParams); err != nil {
-							return err
-						}
-					}
-
-					layer, err := NewLayerFromLayer(layer.Digest, layer.MediaType, modelpath.GetShortTagname())
-					if err != nil {
-						return err
-					}
-
-					layers.Add(layer)
-				}
-
-				deleteMap[manifest.Config.Digest] = struct{}{}
-				continue
-			}
-			defer bin.Close()
-
-			var offset int64
-			for {
-				fn(api.ProgressResponse{Status: "creating model layer"})
-				if _, err := bin.Seek(offset, io.SeekStart); err != nil {
-					return err
-				}
-
-				ggml, size, err := llm.DecodeGGML(bin)
-				if errors.Is(err, io.EOF) {
-					break
-				} else if errors.Is(err, llm.ErrUnsupportedFormat) {
-					return fmt.Errorf("model binary specified in FROM field is not a valid gguf format model, %w", err)
-				} else if err != nil {
-					return err
-				}
-
-				config.SetModelFormat(ggml.Name())
-				config.SetModelFamily(ggml.KV().Architecture())
-				config.SetModelType(format.HumanNumber(ggml.KV().ParameterCount()))
-				config.SetFileType(ggml.KV().FileType())
-
-				mediatype := mediatype
-				if ggml.KV().Architecture() == "clip" {
-					mediatype = "application/vnd.ollama.image.projector"
-				}
-
-				sr := io.NewSectionReader(bin, offset, size)
-				layer, err := NewLayer(sr, mediatype)
-				if err != nil {
-					return err
-				}
-
-				layers.Add(layer)
-
-				offset += size
-			}
-		case "adapter":
-			if strings.HasPrefix(c.Args, "@") {
-				blobPath, err := GetBlobsPath(strings.TrimPrefix(c.Args, "@"))
-				if err != nil {
-					return err
-				}
-
-				c.Args = blobPath
-			}
-
-			fn(api.ProgressResponse{Status: "creating adapter layer"})
-			bin, err := os.Open(realpath(modelFileDir, c.Args))
-			if err != nil {
-				return err
-			}
-			defer bin.Close()
-
-			_, size, err := llm.DecodeGGML(bin)
-			if err != nil {
-				return err
-			}
-
-			sr := io.NewSectionReader(bin, 0, size)
-			layer, err := NewLayer(sr, mediatype)
-			if err != nil {
-				return err
-			}
-
-			layers.Add(layer)
-		case "license":
-			fn(api.ProgressResponse{Status: "creating license layer"})
-
-			bin := strings.NewReader(c.Args)
-			layer, err := NewLayer(bin, mediatype)
-			if err != nil {
-				return err
-			}
-
-			layers.Add(layer)
-		case "template", "system":
-			fn(api.ProgressResponse{Status: fmt.Sprintf("creating %s layer", c.Name)})
-
-			bin := strings.NewReader(c.Args)
-			layer, err := NewLayer(bin, mediatype)
-			if err != nil {
-				return err
-			}
-
-			layers.Replace(layer)
+		case "model", "adapter":
+			var baseLayers []*layerWithGGML
+			if name := model.ParseName(c.Args); name.IsValid() {
+				baseLayers, err = parseFromModel(ctx, name, fn)
+				if err != nil {
+					return err
+				}
+			} else if strings.HasPrefix(c.Args, "@") {
+				blobpath, err := GetBlobsPath(strings.TrimPrefix(c.Args, "@"))
+				if err != nil {
+					return err
+				}
+
+				blob, err := os.Open(blobpath)
+				if err != nil {
+					return err
+				}
+				defer blob.Close()
+
+				baseLayers, err = parseFromFile(ctx, blob, fn)
+				if err != nil {
+					return err
+				}
+			} else if file, err := os.Open(realpath(modelFileDir, c.Args)); err == nil {
+				defer file.Close()
+
+				baseLayers, err = parseFromFile(ctx, file, fn)
+				if err != nil {
+					return err
+				}
+			} else {
+				return fmt.Errorf("invalid model reference: %s", c.Args)
+			}
+
+			for _, baseLayer := range baseLayers {
+				if quantization != "" &&
+					baseLayer.MediaType == "application/vnd.ollama.image.model" &&
+					baseLayer.GGML != nil &&
+					baseLayer.GGML.Name() == "gguf" {
+					want, err := llm.ParseFileType(quantization)
+					if err != nil {
+						return err
+					}
+
+					ft := baseLayer.GGML.KV().FileType()
+					if !slices.Contains([]string{"F16", "F32"}, ft.String()) {
+						return errors.New("quantization is only supported for F16 and F32 models")
+					} else if want != ft {
+						fn(api.ProgressResponse{Status: fmt.Sprintf("quantizing %s model to %s", ft, quantization)})
+
+						blob, err := GetBlobsPath(baseLayer.Digest)
+						if err != nil {
+							return err
+						}
+
+						temp, err := os.CreateTemp(filepath.Dir(blob), quantization)
+						if err != nil {
+							return err
+						}
+						defer temp.Close()
+						defer os.Remove(temp.Name())
+
+						if err := llm.Quantize(blob, temp.Name(), want); err != nil {
+							return err
+						}
+
+						baseLayer.Layer, err = NewLayer(temp, baseLayer.Layer.MediaType)
+						if err != nil {
+							return err
+						}
+					}
+				}
+
+				if baseLayer.GGML != nil {
+					config.ModelFormat = cmp.Or(config.ModelFormat, baseLayer.GGML.Name())
+					config.ModelFamily = cmp.Or(config.ModelFamily, baseLayer.GGML.KV().Architecture())
+					config.ModelType = cmp.Or(config.ModelType, format.HumanNumber(baseLayer.GGML.KV().ParameterCount()))
+					config.FileType = cmp.Or(config.FileType, baseLayer.GGML.KV().FileType().String())
+					config.ModelFamilies = append(config.ModelFamilies, baseLayer.GGML.KV().Architecture())
+				}
+
+				layers = append(layers, baseLayer.Layer)
+			}
+		case "license", "template", "system":
+			blob := strings.NewReader(c.Args)
+			layer, err := NewLayer(blob, mediatype)
+			if err != nil {
+				return err
+			}
+
+			if c.Name != "license" {
+				// replace
+				layers = slices.DeleteFunc(layers, func(layer *Layer) bool {
+					return layer.MediaType == mediatype
+				})
+			}
+
+			layers = append(layers, layer)
 		case "message":
-			messages = append(messages, c.Args)
+			role, content, ok := strings.Cut(c.Args, ": ")
+			if !ok {
+				return fmt.Errorf("invalid message: %s", c.Args)
+			}
+
+			messages = append(messages, &api.Message{Role: role, Content: content})
 		default:
-			params[c.Name] = append(params[c.Name], c.Args)
+			ps, err := api.FormatParams(map[string][]string{c.Name: {c.Args}})
+			if err != nil {
+				return err
+			}
+
+			for k, v := range ps {
+				if ks, ok := parameters[k].([]string); ok {
+					parameters[k] = append(ks, v.([]string)...)
+				} else if vs, ok := v.([]string); ok {
+					parameters[k] = vs
+				} else {
+					parameters[k] = v
+				}
+			}
 		}
 	}
 
-	if len(messages) > 0 {
-		fn(api.ProgressResponse{Status: "creating parameters layer"})
-
-		msgs := make([]api.Message, 0)
-
-		for _, m := range messages {
-			// todo: handle images
-			msg := strings.SplitN(m, ": ", 2)
-			msgs = append(msgs, api.Message{Role: msg[0], Content: msg[1]})
-		}
-
+	var err2 error
+	layers = slices.DeleteFunc(layers, func(layer *Layer) bool {
+		switch layer.MediaType {
+		case "application/vnd.ollama.image.message":
+			// if there are new messages, remove the inherited ones
+			if len(messages) > 0 {
+				return true
+			}
+
+			return false
+		case "application/vnd.ollama.image.params":
+			// merge inherited parameters with new ones
+			r, err := layer.Open()
+			if err != nil {
+				err2 = err
+				return false
+			}
+			defer r.Close()
+
+			var ps map[string]any
+			if err := json.NewDecoder(r).Decode(&ps); err != nil {
+				err2 = err
+				return false
+			}
+
+			for k, v := range ps {
+				if _, ok := parameters[k]; !ok {
+					parameters[k] = v
+				}
+			}
+
+			return true
+		default:
+			return false
+		}
+	})
+
+	if err2 != nil {
+		return err2
+	}
+
+	if len(messages) > 0 {
 		var b bytes.Buffer
-		if err := json.NewEncoder(&b).Encode(msgs); err != nil {
+		if err := json.NewEncoder(&b).Encode(messages); err != nil {
 			return err
 		}
 
@@ -597,39 +506,25 @@ func CreateModel(ctx context.Context, name, modelFileDir, quantization string, c
 			return err
 		}
 
-		layers.Replace(layer)
+		layers = append(layers, layer)
 	}
 
-	if len(params) > 0 {
-		fn(api.ProgressResponse{Status: "creating parameters layer"})
-
-		formattedParams, err := api.FormatParams(params)
-		if err != nil {
-			return err
-		}
-
-		for k, v := range fromParams {
-			if _, ok := formattedParams[k]; !ok {
-				formattedParams[k] = v
-			}
-		}
-
+	if len(parameters) > 0 {
 		var b bytes.Buffer
-		if err := json.NewEncoder(&b).Encode(formattedParams); err != nil {
+		if err := json.NewEncoder(&b).Encode(parameters); err != nil {
 			return err
 		}
 
-		fn(api.ProgressResponse{Status: "creating config layer"})
 		layer, err := NewLayer(&b, "application/vnd.ollama.image.params")
 		if err != nil {
 			return err
 		}
 
-		layers.Replace(layer)
+		layers = append(layers, layer)
 	}
 
-	digests := make([]string, len(layers.items))
-	for i, layer := range layers.items {
+	digests := make([]string, len(layers))
+	for i, layer := range layers {
 		digests[i] = layer.Digest
 	}
 
@@ -640,36 +535,37 @@ func CreateModel(ctx context.Context, name, modelFileDir, quantization string, c
 		return err
 	}
 
-	configLayer, err := NewLayer(&b, "application/vnd.docker.container.image.v1+json")
+	layer, err := NewLayer(&b, "application/vnd.docker.container.image.v1+json")
 	if err != nil {
 		return err
 	}
 
-	delete(deleteMap, configLayer.Digest)
-
-	for _, layer := range append(layers.items, configLayer) {
-		committed, err := layer.Commit()
-		if err != nil {
-			return err
+	for _, layer := range append(layers, layer) {
+		if layer.status != "" {
+			fn(api.ProgressResponse{Status: layer.status})
+		}
+	}
+
+	unref := make(map[string]struct{})
+	if manifest, _, err := GetManifest(ParseModelPath(name)); err == nil {
+		for _, layer := range manifest.Layers {
+			if !slices.Contains(digests, layer.Digest) {
+				unref[layer.Digest] = struct{}{}
+			}
 		}
 
-		status := "writing layer"
|
if manifest.Config.Digest != layer.Digest {
|
||||||
if !committed {
|
unref[manifest.Config.Digest] = struct{}{}
|
||||||
status = "using already created layer"
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fn(api.ProgressResponse{Status: fmt.Sprintf("%s %s", status, layer.Digest)})
|
|
||||||
|
|
||||||
delete(deleteMap, layer.Digest)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fn(api.ProgressResponse{Status: "writing manifest"})
|
fn(api.ProgressResponse{Status: "writing manifest"})
|
||||||
if err := WriteManifest(name, configLayer, layers.items); err != nil {
|
if err := WriteManifest(name, layer, layers); err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
if noprune := os.Getenv("OLLAMA_NOPRUNE"); noprune == "" {
|
if !envconfig.NoPrune {
|
||||||
if err := deleteUnusedLayers(nil, deleteMap, false); err != nil {
|
if err := deleteUnusedLayers(nil, unref); err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -678,74 +574,6 @@ func CreateModel(ctx context.Context, name, modelFileDir, quantization string, c
|
|||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func convertModel(name, path string, fn func(resp api.ProgressResponse)) (string, error) {
|
|
||||||
r, err := zip.OpenReader(path)
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
defer r.Close()
|
|
||||||
|
|
||||||
tempDir, err := os.MkdirTemp("", "ollama-convert")
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
defer os.RemoveAll(tempDir)
|
|
||||||
|
|
||||||
fn(api.ProgressResponse{Status: "unpacking model metadata"})
|
|
||||||
for _, f := range r.File {
|
|
||||||
fpath := filepath.Join(tempDir, f.Name)
|
|
||||||
outFile, err := os.OpenFile(fpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
rc, err := f.Open()
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
_, err = io.Copy(outFile, rc)
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
outFile.Close()
|
|
||||||
rc.Close()
|
|
||||||
}
|
|
||||||
|
|
||||||
mf, err := convert.GetModelFormat(tempDir)
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
params, err := mf.GetParams(tempDir)
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
mArch, err := mf.GetModelArch(name, tempDir, params)
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
fn(api.ProgressResponse{Status: "processing tensors"})
|
|
||||||
if err := mArch.GetTensors(); err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
if err := mArch.LoadVocab(); err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
fn(api.ProgressResponse{Status: "converting model"})
|
|
||||||
path, err = mArch.WriteGGUF()
|
|
||||||
if err != nil {
|
|
||||||
return "", err
|
|
||||||
}
|
|
||||||
|
|
||||||
return path, nil
|
|
||||||
}
|
|
||||||
|
|
||||||
func CopyModel(src, dst model.Name) error {
|
func CopyModel(src, dst model.Name) error {
|
||||||
if !dst.IsFullyQualified() {
|
if !dst.IsFullyQualified() {
|
||||||
return model.Unqualified(dst)
|
return model.Unqualified(dst)
|
||||||
@@ -785,7 +613,7 @@ func CopyModel(src, dst model.Name) error {
|
|||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|
||||||
func deleteUnusedLayers(skipModelPath *ModelPath, deleteMap map[string]struct{}, dryRun bool) error {
|
func deleteUnusedLayers(skipModelPath *ModelPath, deleteMap map[string]struct{}) error {
|
||||||
fp, err := GetManifestPath()
|
fp, err := GetManifestPath()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
@@ -832,13 +660,9 @@ func deleteUnusedLayers(skipModelPath *ModelPath, deleteMap map[string]struct{},
|
|||||||
slog.Info(fmt.Sprintf("couldn't get file path for '%s': %v", k, err))
|
slog.Info(fmt.Sprintf("couldn't get file path for '%s': %v", k, err))
|
||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
if !dryRun {
|
if err := os.Remove(fp); err != nil {
|
||||||
if err := os.Remove(fp); err != nil {
|
slog.Info(fmt.Sprintf("couldn't remove file '%s': %v", fp, err))
|
||||||
slog.Info(fmt.Sprintf("couldn't remove file '%s': %v", fp, err))
|
continue
|
||||||
continue
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
slog.Info(fmt.Sprintf("wanted to remove: %s", fp))
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -861,14 +685,25 @@ func PruneLayers() error {
|
|||||||
for _, blob := range blobs {
|
for _, blob := range blobs {
|
||||||
name := blob.Name()
|
name := blob.Name()
|
||||||
name = strings.ReplaceAll(name, "-", ":")
|
name = strings.ReplaceAll(name, "-", ":")
|
||||||
if strings.HasPrefix(name, "sha256:") {
|
|
||||||
deleteMap[name] = struct{}{}
|
_, err := GetBlobsPath(name)
|
||||||
|
if err != nil {
|
||||||
|
if errors.Is(err, ErrInvalidDigestFormat) {
|
||||||
|
// remove invalid blobs (e.g. partial downloads)
|
||||||
|
if err := os.Remove(filepath.Join(p, blob.Name())); err != nil {
|
||||||
|
slog.Error("couldn't remove blob", "blob", blob.Name(), "error", err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
|
deleteMap[name] = struct{}{}
|
||||||
}
|
}
|
||||||
|
|
||||||
slog.Info(fmt.Sprintf("total blobs: %d", len(deleteMap)))
|
slog.Info(fmt.Sprintf("total blobs: %d", len(deleteMap)))
|
||||||
|
|
||||||
err = deleteUnusedLayers(nil, deleteMap, false)
|
err = deleteUnusedLayers(nil, deleteMap)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -924,7 +759,7 @@ func DeleteModel(name string) error {
|
|||||||
}
|
}
|
||||||
deleteMap[manifest.Config.Digest] = struct{}{}
|
deleteMap[manifest.Config.Digest] = struct{}{}
|
||||||
|
|
||||||
err = deleteUnusedLayers(&mp, deleteMap, false)
|
err = deleteUnusedLayers(&mp, deleteMap)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
@@ -999,7 +834,7 @@ func PullModel(ctx context.Context, name string, regOpts *registryOptions, fn fu
|
|||||||
// build deleteMap to prune unused layers
|
// build deleteMap to prune unused layers
|
||||||
deleteMap := make(map[string]struct{})
|
deleteMap := make(map[string]struct{})
|
||||||
|
|
||||||
if noprune = os.Getenv("OLLAMA_NOPRUNE"); noprune == "" {
|
if !envconfig.NoPrune {
|
||||||
manifest, _, err = GetManifest(mp)
|
manifest, _, err = GetManifest(mp)
|
||||||
if err != nil && !errors.Is(err, os.ErrNotExist) {
|
if err != nil && !errors.Is(err, os.ErrNotExist) {
|
||||||
return err
|
return err
|
||||||
@@ -1084,7 +919,7 @@ func PullModel(ctx context.Context, name string, regOpts *registryOptions, fn fu
|
|||||||
|
|
||||||
if noprune == "" {
|
if noprune == "" {
|
||||||
fn(api.ProgressResponse{Status: "removing any unused layers"})
|
fn(api.ProgressResponse{Status: "removing any unused layers"})
|
||||||
err = deleteUnusedLayers(nil, deleteMap, false)
|
err = deleteUnusedLayers(nil, deleteMap)
|
||||||
if err != nil {
|
if err != nil {
|
||||||
return err
|
return err
|
||||||
}
|
}
|
||||||
|
|||||||
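Note on the hunks above: the old Layers.Add/Layers.Replace helpers give way to a plain []*Layer slice plus slices.DeleteFunc. A minimal, self-contained sketch of that replace-by-mediatype pattern follows; the trimmed Layer type and the replaceLayer helper are illustrative stand-ins, not the server's actual definitions.

package main

import (
    "fmt"
    "slices" // stdlib in Go 1.21+; the diff itself uses golang.org/x/exp/slices
)

// Layer is a stand-in for the server's Layer type, keeping only the
// fields this example needs.
type Layer struct {
    MediaType string
    Digest    string
}

// replaceLayer drops any existing layer with the same media type before
// appending the new one, mirroring the "replace" branch in the rewritten
// CreateModel above.
func replaceLayer(layers []*Layer, layer *Layer) []*Layer {
    layers = slices.DeleteFunc(layers, func(l *Layer) bool {
        return l.MediaType == layer.MediaType
    })
    return append(layers, layer)
}

func main() {
    layers := []*Layer{
        {MediaType: "application/vnd.ollama.image.system", Digest: "sha256:aaa"},
        {MediaType: "application/vnd.ollama.image.model", Digest: "sha256:bbb"},
    }
    layers = replaceLayer(layers, &Layer{MediaType: "application/vnd.ollama.image.system", Digest: "sha256:ccc"})
    for _, l := range layers {
        fmt.Println(l.MediaType, l.Digest) // the old system layer is gone
    }
}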
@@ -5,39 +5,14 @@ import (
 	"fmt"
 	"io"
 	"os"
-	"strings"
-
-	"golang.org/x/exp/slices"
 )

-type Layers struct {
-	items []*Layer
-}
-
-func (ls *Layers) Add(layer *Layer) {
-	if layer.Size > 0 {
-		ls.items = append(ls.items, layer)
-	}
-}
-
-func (ls *Layers) Replace(layer *Layer) {
-	if layer.Size > 0 {
-		mediatype := layer.MediaType
-		layers := slices.DeleteFunc(ls.items, func(l *Layer) bool {
-			return l.MediaType == mediatype
-		})
-
-		ls.items = append(layers, layer)
-	}
-}
-
 type Layer struct {
 	MediaType string `json:"mediaType"`
 	Digest    string `json:"digest"`
 	Size      int64  `json:"size"`
 	From      string `json:"from,omitempty"`
+	status string
-	tempFileName string
 }

 func NewLayer(r io.Reader, mediatype string) (*Layer, error) {
@@ -46,14 +21,12 @@ func NewLayer(r io.Reader, mediatype string) (*Layer, error) {
 		return nil, err
 	}

-	const delimiter = "-"
-
-	pattern := strings.Join([]string{"sha256", "*-partial"}, delimiter)
-	temp, err := os.CreateTemp(blobs, pattern)
+	temp, err := os.CreateTemp(blobs, "sha256-")
 	if err != nil {
 		return nil, err
 	}
 	defer temp.Close()
+	defer os.Remove(temp.Name())

 	sha256sum := sha256.New()
 	n, err := io.Copy(io.MultiWriter(temp, sha256sum), r)
@@ -61,11 +34,29 @@ func NewLayer(r io.Reader, mediatype string) (*Layer, error) {
 		return nil, err
 	}

+	if err := temp.Close(); err != nil {
+		return nil, err
+	}
+
+	digest := fmt.Sprintf("sha256:%x", sha256sum.Sum(nil))
+	blob, err := GetBlobsPath(digest)
+	if err != nil {
+		return nil, err
+	}
+
+	status := "using existing layer"
+	if _, err := os.Stat(blob); err != nil {
+		status = "creating new layer"
+		if err := os.Rename(temp.Name(), blob); err != nil {
+			return nil, err
+		}
+	}
+
 	return &Layer{
 		MediaType: mediatype,
-		Digest:    fmt.Sprintf("sha256:%x", sha256sum.Sum(nil)),
+		Digest:    digest,
 		Size:      n,
-		tempFileName: temp.Name(),
+		status:    fmt.Sprintf("%s %s", status, digest),
 	}, nil
 }

@@ -85,21 +76,15 @@ func NewLayerFromLayer(digest, mediatype, from string) (*Layer, error) {
 		Digest:    digest,
 		Size:      fi.Size(),
 		From:      from,
+		status:    fmt.Sprintf("using existing layer %s", digest),
 	}, nil
 }

-func (l *Layer) Commit() (bool, error) {
-	// always remove temp
-	defer os.Remove(l.tempFileName)
-
+func (l *Layer) Open() (io.ReadCloser, error) {
 	blob, err := GetBlobsPath(l.Digest)
 	if err != nil {
-		return false, err
+		return nil, err
 	}

-	if _, err := os.Stat(blob); err != nil {
-		return true, os.Rename(l.tempFileName, blob)
-	}
-
-	return false, nil
+	return os.Open(blob)
 }
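The rewritten NewLayer above streams the reader into a temp file while hashing, then renames the temp file into the content-addressed blobs directory only when the digest is new, so identical content is stored once. A condensed sketch of that flow under the same assumptions; writeBlob is a hypothetical helper, not part of the codebase.

package main

import (
    "crypto/sha256"
    "fmt"
    "io"
    "os"
    "path/filepath"
    "strings"
)

// writeBlob hashes while copying, then moves the temp file into place
// only if no blob with that digest exists yet.
func writeBlob(blobDir string, r io.Reader) (string, error) {
    temp, err := os.CreateTemp(blobDir, "sha256-")
    if err != nil {
        return "", err
    }
    defer temp.Close()
    // harmless after a successful rename: the temp path no longer exists
    defer os.Remove(temp.Name())

    h := sha256.New()
    if _, err := io.Copy(io.MultiWriter(temp, h), r); err != nil {
        return "", err
    }
    if err := temp.Close(); err != nil {
        return "", err
    }

    digest := fmt.Sprintf("sha256:%x", h.Sum(nil))
    blob := filepath.Join(blobDir, strings.ReplaceAll(digest, ":", "-"))
    if _, err := os.Stat(blob); err != nil {
        // new content: claim the digest by renaming the temp file into place
        if err := os.Rename(temp.Name(), blob); err != nil {
            return "", err
        }
    }
    // existing content: the deferred Remove discards the duplicate temp file
    return digest, nil
}

func main() {
    dir, _ := os.MkdirTemp("", "blobs")
    defer os.RemoveAll(dir)

    d, err := writeBlob(dir, strings.NewReader("hello"))
    fmt.Println(d, err)
}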
79 server/manifest.go Normal file
@@ -0,0 +1,79 @@
+package server
+
+import (
+	"bytes"
+	"crypto/sha256"
+	"encoding/json"
+	"fmt"
+	"io"
+	"os"
+	"path/filepath"
+
+	"github.com/ollama/ollama/types/model"
+)
+
+type Manifest struct {
+	ManifestV2
+	Digest string `json:"-"`
+}
+
+func (m *Manifest) Size() (size int64) {
+	for _, layer := range append(m.Layers, m.Config) {
+		size += layer.Size
+	}
+
+	return
+}
+
+func ParseNamedManifest(name model.Name) (*Manifest, error) {
+	if !name.IsFullyQualified() {
+		return nil, model.Unqualified(name)
+	}
+
+	manifests, err := GetManifestPath()
+	if err != nil {
+		return nil, err
+	}
+
+	var manifest ManifestV2
+	manifestfile, err := os.Open(filepath.Join(manifests, name.Filepath()))
+	if err != nil {
+		return nil, err
+	}
+
+	sha256sum := sha256.New()
+	if err := json.NewDecoder(io.TeeReader(manifestfile, sha256sum)).Decode(&manifest); err != nil {
+		return nil, err
+	}
+
+	return &Manifest{
+		ManifestV2: manifest,
+		Digest:     fmt.Sprintf("%x", sha256sum.Sum(nil)),
+	}, nil
+}
+
+func WriteManifest(name string, config *Layer, layers []*Layer) error {
+	manifest := ManifestV2{
+		SchemaVersion: 2,
+		MediaType:     "application/vnd.docker.distribution.manifest.v2+json",
+		Config:        config,
+		Layers:        layers,
+	}
+
+	var b bytes.Buffer
+	if err := json.NewEncoder(&b).Encode(manifest); err != nil {
+		return err
+	}
+
+	modelpath := ParseModelPath(name)
+	manifestPath, err := modelpath.GetManifestPath()
+	if err != nil {
+		return err
+	}
+
+	if err := os.MkdirAll(filepath.Dir(manifestPath), 0o755); err != nil {
+		return err
+	}
+
+	return os.WriteFile(manifestPath, b.Bytes(), 0o644)
+}
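ParseNamedManifest above derives the manifest digest while decoding it by wrapping the file in io.TeeReader, so a single pass yields both the parsed struct and its hash. A small sketch of the same trick; decodeWithDigest is illustrative, and note the digest covers the bytes the decoder actually read, which suits a file holding exactly one JSON document.

package main

import (
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "io"
    "strings"
)

// decodeWithDigest feeds every byte the JSON decoder consumes into the
// hash as a side effect of reading.
func decodeWithDigest(r io.Reader, v any) (string, error) {
    h := sha256.New()
    if err := json.NewDecoder(io.TeeReader(r, h)).Decode(v); err != nil {
        return "", err
    }
    return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
    var m map[string]any
    digest, err := decodeWithDigest(strings.NewReader(`{"schemaVersion":2}`), &m)
    fmt.Println(m, digest, err)
}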
@@ -1,34 +0,0 @@
-package server
-
-import (
-	"bytes"
-	"encoding/json"
-	"os"
-	"path/filepath"
-)
-
-func WriteManifest(name string, config *Layer, layers []*Layer) error {
-	manifest := ManifestV2{
-		SchemaVersion: 2,
-		MediaType:     "application/vnd.docker.distribution.manifest.v2+json",
-		Config:        config,
-		Layers:        layers,
-	}
-
-	var b bytes.Buffer
-	if err := json.NewEncoder(&b).Encode(manifest); err != nil {
-		return err
-	}
-
-	modelpath := ParseModelPath(name)
-	manifestPath, err := modelpath.GetManifestPath()
-	if err != nil {
-		return err
-	}
-
-	if err := os.MkdirAll(filepath.Dir(manifestPath), 0o755); err != nil {
-		return err
-	}
-
-	return os.WriteFile(manifestPath, b.Bytes(), 0o644)
-}
261 server/model.go Normal file
@@ -0,0 +1,261 @@
+package server
+
+import (
+	"archive/zip"
+	"bytes"
+	"context"
+	"errors"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"path/filepath"
+
+	"github.com/ollama/ollama/api"
+	"github.com/ollama/ollama/convert"
+	"github.com/ollama/ollama/llm"
+	"github.com/ollama/ollama/types/model"
+)
+
+type layerWithGGML struct {
+	*Layer
+	*llm.GGML
+}
+
+func parseFromModel(ctx context.Context, name model.Name, fn func(api.ProgressResponse)) (layers []*layerWithGGML, err error) {
+	modelpath := ParseModelPath(name.String())
+	manifest, _, err := GetManifest(modelpath)
+	switch {
+	case errors.Is(err, os.ErrNotExist):
+		if err := PullModel(ctx, name.String(), &registryOptions{}, fn); err != nil {
+			return nil, err
+		}
+
+		modelpath = ParseModelPath(name.String())
+		manifest, _, err = GetManifest(modelpath)
+		if err != nil {
+			return nil, err
+		}
+	case err != nil:
+		return nil, err
+	}
+
+	for _, layer := range manifest.Layers {
+		layer, err := NewLayerFromLayer(layer.Digest, layer.MediaType, modelpath.GetShortTagname())
+		if err != nil {
+			return nil, err
+		}
+
+		switch layer.MediaType {
+		case "application/vnd.ollama.image.model",
+			"application/vnd.ollama.image.projector",
+			"application/vnd.ollama.image.adapter":
+			blobpath, err := GetBlobsPath(layer.Digest)
+			if err != nil {
+				return nil, err
+			}
+
+			blob, err := os.Open(blobpath)
+			if err != nil {
+				return nil, err
+			}
+			defer blob.Close()
+
+			ggml, _, err := llm.DecodeGGML(blob)
+			if err != nil {
+				return nil, err
+			}
+
+			layers = append(layers, &layerWithGGML{layer, ggml})
+		default:
+			layers = append(layers, &layerWithGGML{layer, nil})
+		}
+
+	}
+
+	return layers, nil
+}
+
+func parseFromZipFile(_ context.Context, file *os.File, fn func(api.ProgressResponse)) (layers []*layerWithGGML, err error) {
+	stat, err := file.Stat()
+	if err != nil {
+		return nil, err
+	}
+
+	r, err := zip.NewReader(file, stat.Size())
+	if err != nil {
+		return nil, err
+	}
+
+	tempdir, err := os.MkdirTemp(filepath.Dir(file.Name()), "")
+	if err != nil {
+		return nil, err
+	}
+	defer os.RemoveAll(tempdir)
+
+	fn(api.ProgressResponse{Status: "unpacking model metadata"})
+	for _, f := range r.File {
+		// TODO(mxyng): this should not write out all files to disk
+		outfile, err := os.Create(filepath.Join(tempdir, f.Name))
+		if err != nil {
+			return nil, err
+		}
+		defer outfile.Close()
+
+		infile, err := f.Open()
+		if err != nil {
+			return nil, err
+		}
+		defer infile.Close()
+
+		if _, err = io.Copy(outfile, infile); err != nil {
+			return nil, err
+		}
+
+		if err := outfile.Close(); err != nil {
+			return nil, err
+		}
+
+		if err := infile.Close(); err != nil {
+			return nil, err
+		}
+	}
+
+	mf, err := convert.GetModelFormat(tempdir)
+	if err != nil {
+		return nil, err
+	}
+
+	params, err := mf.GetParams(tempdir)
+	if err != nil {
+		return nil, err
+	}
+
+	mArch, err := mf.GetModelArch("", tempdir, params)
+	if err != nil {
+		return nil, err
+	}
+
+	fn(api.ProgressResponse{Status: "processing tensors"})
+	if err := mArch.GetTensors(); err != nil {
+		return nil, err
+	}
+
+	if err := mArch.LoadVocab(); err != nil {
+		return nil, err
+	}
+
+	fn(api.ProgressResponse{Status: "converting model"})
+
+	// TODO(mxyng): this should write directly into a layer
+	// e.g. NewLayer(arch.Reader(), "application/vnd.ollama.image.model")
+	temp, err := os.CreateTemp(tempdir, "fp16")
+	if err != nil {
+		return nil, err
+	}
+	defer temp.Close()
+	defer os.Remove(temp.Name())
+
+	if err = mArch.WriteGGUF(temp); err != nil {
+		return nil, err
+	}
+
+	if _, err := temp.Seek(0, io.SeekStart); err != nil {
+		return nil, err
+	}
+
+	layer, err := NewLayer(temp, "application/vnd.ollama.image.model")
+	if err != nil {
+		return nil, fmt.Errorf("aaa: %w", err)
+	}
+
+	blobpath, err := GetBlobsPath(layer.Digest)
+	if err != nil {
+		return nil, err
+	}
+
+	bin, err := os.Open(blobpath)
+	if err != nil {
+		return nil, err
+	}
+	defer bin.Close()
+
+	ggml, _, err := llm.DecodeGGML(bin)
+	if err != nil {
+		return nil, err
+	}
+
+	layer, err = NewLayerFromLayer(layer.Digest, layer.MediaType, "")
+	if err != nil {
+		return nil, err
+	}
+
+	layers = append(layers, &layerWithGGML{layer, ggml})
+	return layers, nil
+}
+
+func parseFromFile(ctx context.Context, file *os.File, fn func(api.ProgressResponse)) (layers []*layerWithGGML, err error) {
+	sr := io.NewSectionReader(file, 0, 512)
+	contentType, err := detectContentType(sr)
+	if err != nil {
+		return nil, err
+	}
+
+	switch contentType {
+	case "gguf", "ggla":
+		// noop
+	case "application/zip":
+		return parseFromZipFile(ctx, file, fn)
+	default:
+		return nil, fmt.Errorf("unsupported content type: %s", contentType)
+	}
+
+	stat, err := file.Stat()
+	if err != nil {
+		return nil, err
+	}
+
+	var offset int64
+	for offset < stat.Size() {
+		ggml, n, err := llm.DecodeGGML(file)
+		if errors.Is(err, io.EOF) {
+			break
+		} else if err != nil {
+			return nil, err
+		}
+
+		mediatype := "application/vnd.ollama.image.model"
+		if ggml.Name() == "ggla" {
+			mediatype = "application/vnd.ollama.image.adapter"
+		} else if ggml.KV().Architecture() == "clip" {
+			mediatype = "application/vnd.ollama.image.projector"
+		}
+
+		layer, err := NewLayer(io.NewSectionReader(file, offset, n), mediatype)
+		if err != nil {
+			return nil, err
+		}
+
+		layers = append(layers, &layerWithGGML{layer, ggml})
+		offset = n
+	}
+
+	return layers, nil
+}
+
+func detectContentType(r io.Reader) (string, error) {
+	var b bytes.Buffer
+	if _, err := io.Copy(&b, r); err != nil {
+		return "", err
+	}
+
+	if contentType := llm.DetectGGMLType(b.Bytes()); contentType != "" {
+		return contentType, nil
+	}
+
+	if contentType := http.DetectContentType(b.Bytes()); contentType != "application/octet-stream" {
+		return contentType, nil
+	}
+
+	return "unknown", nil
+}
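parseFromFile above sniffs the first 512 bytes of the upload to route GGUF/GGLA payloads and zip archives differently. A rough sketch of that detection step; the literal check of the four-byte GGUF magic stands in for llm.DetectGGMLType and is an assumption about scope, not a drop-in replacement.

package main

import (
    "bytes"
    "fmt"
    "io"
    "net/http"
    "os"
)

// sniff reads a small prefix of the file and classifies it, mirroring
// the shape of detectContentType above.
func sniff(f *os.File) (string, error) {
    var b bytes.Buffer
    if _, err := io.Copy(&b, io.NewSectionReader(f, 0, 512)); err != nil {
        return "", err
    }

    // GGUF files begin with the ASCII bytes "GGUF"
    if bytes.HasPrefix(b.Bytes(), []byte("GGUF")) {
        return "gguf", nil
    }
    if ct := http.DetectContentType(b.Bytes()); ct != "application/octet-stream" {
        return ct, nil // e.g. "application/zip" for zipped model archives
    }
    return "unknown", nil
}

func main() {
    if len(os.Args) < 2 {
        fmt.Println("usage: sniff <file>")
        return
    }

    f, err := os.Open(os.Args[1])
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()

    ct, err := sniff(f)
    fmt.Println(ct, err)
}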
@@ -6,6 +6,7 @@ import (
 	"net/url"
 	"os"
 	"path/filepath"
+	"regexp"
 	"strings"
 )

@@ -25,9 +26,10 @@ const (
 )

 var (
 	ErrInvalidImageFormat = errors.New("invalid image format")
 	ErrInvalidProtocol    = errors.New("invalid protocol scheme")
 	ErrInsecureProtocol   = errors.New("insecure protocol http")
+	ErrInvalidDigestFormat = errors.New("invalid digest format")
 )

 func ParseModelPath(name string) ModelPath {
@@ -149,6 +151,14 @@ func GetBlobsPath(digest string) (string, error) {
 		return "", err
 	}

+	// only accept actual sha256 digests
+	pattern := "^sha256[:-][0-9a-fA-F]{64}$"
+	re := regexp.MustCompile(pattern)
+
+	if digest != "" && !re.MatchString(digest) {
+		return "", ErrInvalidDigestFormat
+	}
+
 	digest = strings.ReplaceAll(digest, ":", "-")
 	path := filepath.Join(dir, "blobs", digest)
 	dirPath := filepath.Dir(path)
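The new GetBlobsPath guard above rejects anything that is not literally "sha256" plus a separator and 64 hex characters before the digest is joined into a filesystem path, which is what blocks traversal inputs like the "../sha256-..." case in the tests below. The same check in isolation; the names here are local stand-ins for the package-level ones.

package main

import (
    "errors"
    "fmt"
    "regexp"
)

// errInvalidDigest is a local stand-in for ErrInvalidDigestFormat.
var errInvalidDigest = errors.New("invalid digest format")

// digestRE matches the pattern added to GetBlobsPath: "sha256" followed
// by ":" or "-" and exactly 64 hex characters. Anchoring with ^ and $
// is what rejects prefixes and suffixes such as path traversal.
var digestRE = regexp.MustCompile(`^sha256[:-][0-9a-fA-F]{64}$`)

func checkDigest(digest string) error {
    if digest != "" && !digestRE.MatchString(digest) {
        return errInvalidDigest
    }
    return nil
}

func main() {
    fmt.Println(checkDigest("sha256:456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9")) // <nil>
    fmt.Println(checkDigest("../sha256-45640291"))                                                       // invalid digest format
}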
@@ -1,6 +1,73 @@
 package server

-import "testing"
+import (
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestGetBlobsPath(t *testing.T) {
+	// GetBlobsPath expects an actual directory to exist
+	dir, err := os.MkdirTemp("", "ollama-test")
+	assert.Nil(t, err)
+	defer os.RemoveAll(dir)
+
+	tests := []struct {
+		name     string
+		digest   string
+		expected string
+		err      error
+	}{
+		{
+			"empty digest",
+			"",
+			filepath.Join(dir, "blobs"),
+			nil,
+		},
+		{
+			"valid with colon",
+			"sha256:456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9",
+			filepath.Join(dir, "blobs", "sha256-456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9"),
+			nil,
+		},
+		{
+			"valid with dash",
+			"sha256-456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9",
+			filepath.Join(dir, "blobs", "sha256-456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9"),
+			nil,
+		},
+		{
+			"digest too short",
+			"sha256-45640291",
+			"",
+			ErrInvalidDigestFormat,
+		},
+		{
+			"digest too long",
+			"sha256-456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9aaaaaaaaaa",
+			"",
+			ErrInvalidDigestFormat,
+		},
+		{
+			"digest invalid chars",
+			"../sha256-456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7a",
+			"",
+			ErrInvalidDigestFormat,
+		},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			t.Setenv("OLLAMA_MODELS", dir)
+
+			got, err := GetBlobsPath(tc.digest)
+
+			assert.ErrorIs(t, tc.err, err, tc.name)
+			assert.Equal(t, tc.expected, got, tc.name)
+		})
+	}
+}
+
 func TestParseModelPath(t *testing.T) {
 	tests := []struct {
277 server/routes.go
@@ -1,6 +1,7 @@
 package server

 import (
+	"cmp"
 	"context"
 	"encoding/json"
 	"errors"
@@ -28,7 +29,7 @@ import (
 	"github.com/ollama/ollama/gpu"
 	"github.com/ollama/ollama/llm"
 	"github.com/ollama/ollama/openai"
-	"github.com/ollama/ollama/parser"
+	"github.com/ollama/ollama/server/envconfig"
 	"github.com/ollama/ollama/types/model"
 	"github.com/ollama/ollama/version"
 )
@@ -126,10 +127,6 @@ func (s *Server) GenerateHandler(c *gin.Context) {

 	opts, err := modelOptions(model, req.Options)
 	if err != nil {
-		if errors.Is(err, api.ErrInvalidOpts) {
-			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
-			return
-		}
 		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
 		return
 	}
@@ -146,12 +143,7 @@ func (s *Server) GenerateHandler(c *gin.Context) {
 	select {
 	case runner = <-rCh:
 	case err = <-eCh:
-		if errors.Is(err, context.Canceled) {
-			c.JSON(499, gin.H{"error": "request canceled"})
-			return
-		}
-
-		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+		handleErrorResponse(c, err)
 		return
 	}

@@ -160,9 +152,10 @@ func (s *Server) GenerateHandler(c *gin.Context) {
 	// of `raw` mode so we need to check for it too
 	if req.Prompt == "" && req.Template == "" && req.System == "" {
 		c.JSON(http.StatusOK, api.GenerateResponse{
 			CreatedAt: time.Now().UTC(),
 			Model:     req.Model,
 			Done:      true,
+			DoneReason: "load",
 		})
 		return
 	}
@@ -230,10 +223,11 @@ func (s *Server) GenerateHandler(c *gin.Context) {
 	}

 	resp := api.GenerateResponse{
 		Model:     req.Model,
 		CreatedAt: time.Now().UTC(),
 		Done:      r.Done,
 		Response:  r.Content,
+		DoneReason: r.DoneReason,
 		Metrics: api.Metrics{
 			PromptEvalCount:    r.PromptEvalCount,
 			PromptEvalDuration: r.PromptEvalDuration,
@@ -374,10 +368,6 @@ func (s *Server) EmbeddingsHandler(c *gin.Context) {

 	opts, err := modelOptions(model, req.Options)
 	if err != nil {
-		if errors.Is(err, api.ErrInvalidOpts) {
-			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
-			return
-		}
 		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
 		return
 	}
@@ -394,12 +384,7 @@ func (s *Server) EmbeddingsHandler(c *gin.Context) {
 	select {
 	case runner = <-rCh:
 	case err = <-eCh:
-		if errors.Is(err, context.Canceled) {
-			c.JSON(499, gin.H{"error": "request canceled"})
-			return
-		}
-
-		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+		handleErrorResponse(c, err)
 		return
 	}

@@ -522,28 +507,17 @@ func (s *Server) PushModelHandler(c *gin.Context) {

 func (s *Server) CreateModelHandler(c *gin.Context) {
 	var req api.CreateRequest
-	err := c.ShouldBindJSON(&req)
-	switch {
-	case errors.Is(err, io.EOF):
+	if err := c.ShouldBindJSON(&req); errors.Is(err, io.EOF) {
 		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "missing request body"})
 		return
-	case err != nil:
+	} else if err != nil {
 		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 		return
 	}

-	var model string
-	if req.Model != "" {
-		model = req.Model
-	} else if req.Name != "" {
-		model = req.Name
-	} else {
-		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "model is required"})
-		return
-	}
-
-	if err := ParseModelPath(model).Validate(); err != nil {
-		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": err.Error()})
+	name := model.ParseName(cmp.Or(req.Model, req.Name))
+	if !name.IsValid() {
+		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "invalid model name"})
 		return
 	}
@@ -552,19 +526,19 @@ func (s *Server) CreateModelHandler(c *gin.Context) {
 		return
 	}

-	var modelfile io.Reader = strings.NewReader(req.Modelfile)
+	var r io.Reader = strings.NewReader(req.Modelfile)
 	if req.Path != "" && req.Modelfile == "" {
-		mf, err := os.Open(req.Path)
+		f, err := os.Open(req.Path)
 		if err != nil {
 			c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("error reading modelfile: %s", err)})
 			return
 		}
-		defer mf.Close()
+		defer f.Close()

-		modelfile = mf
+		r = f
 	}

-	commands, err := parser.Parse(modelfile)
+	modelfile, err := model.ParseFile(r)
 	if err != nil {
 		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": err.Error()})
 		return
@@ -580,7 +554,12 @@ func (s *Server) CreateModelHandler(c *gin.Context) {
 	ctx, cancel := context.WithCancel(c.Request.Context())
 	defer cancel()

-	if err := CreateModel(ctx, model, filepath.Dir(req.Path), req.Quantization, commands, fn); err != nil {
+	quantization := req.Quantization
+	if req.Quantize != "" {
+		quantization = req.Quantize
+	}
+
+	if err := CreateModel(ctx, name.String(), filepath.Dir(req.Path), strings.ToUpper(quantization), modelfile, fn); err != nil {
 		ch <- gin.H{"error": err.Error()}
 	}
 }()
@@ -732,69 +711,86 @@ func GetModelInfo(req api.ShowRequest) (*api.ShowResponse, error) {
 	fmt.Fprintln(&sb, "# Modelfile generate by \"ollama show\"")
 	fmt.Fprintln(&sb, "# To build a new Modelfile based on this, replace FROM with:")
 	fmt.Fprintf(&sb, "# FROM %s\n\n", model.ShortName)
-	fmt.Fprint(&sb, parser.Format(model.Commands()))
+	fmt.Fprint(&sb, model.String())
 	resp.Modelfile = sb.String()

 	return resp, nil
 }

 func (s *Server) ListModelsHandler(c *gin.Context) {
-	models := make([]api.ModelResponse, 0)
-	manifestsPath, err := GetManifestPath()
+	manifests, err := GetManifestPath()
 	if err != nil {
 		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
 		return
 	}

-	modelResponse := func(modelName string) (api.ModelResponse, error) {
-		model, err := GetModel(modelName)
-		if err != nil {
-			return api.ModelResponse{}, err
-		}
-
-		modelDetails := api.ModelDetails{
-			Format:            model.Config.ModelFormat,
-			Family:            model.Config.ModelFamily,
-			Families:          model.Config.ModelFamilies,
-			ParameterSize:     model.Config.ModelType,
-			QuantizationLevel: model.Config.FileType,
-		}
-
-		return api.ModelResponse{
-			Model:   model.ShortName,
-			Name:    model.ShortName,
-			Size:    model.Size,
-			Digest:  model.Digest,
-			Details: modelDetails,
-		}, nil
-	}
-
-	walkFunc := func(path string, info os.FileInfo, _ error) error {
+	var models []api.ModelResponse
+	if err := filepath.Walk(manifests, func(path string, info os.FileInfo, _ error) error {
 		if !info.IsDir() {
-			path, tag := filepath.Split(path)
-			model := strings.Trim(strings.TrimPrefix(path, manifestsPath), string(os.PathSeparator))
-			modelPath := strings.Join([]string{model, tag}, ":")
-			canonicalModelPath := strings.ReplaceAll(modelPath, string(os.PathSeparator), "/")
-
-			resp, err := modelResponse(canonicalModelPath)
+			rel, err := filepath.Rel(manifests, path)
 			if err != nil {
-				slog.Info(fmt.Sprintf("skipping file: %s", canonicalModelPath))
-				// nolint: nilerr
+				return err
+			}
+
+			if hidden, err := filepath.Match(".*", filepath.Base(rel)); err != nil {
+				return err
+			} else if hidden {
 				return nil
 			}

-			resp.ModifiedAt = info.ModTime()
-			models = append(models, resp)
+			n := model.ParseNameFromFilepath(rel)
+			if !n.IsValid() {
+				slog.Warn("bad manifest filepath", "path", rel)
+				return nil
+			}
+
+			m, err := ParseNamedManifest(n)
+			if err != nil {
+				slog.Warn("bad manifest", "name", n, "error", err)
+				return nil
+			}
+
+			f, err := m.Config.Open()
+			if err != nil {
+				slog.Warn("bad manifest config filepath", "name", n, "error", err)
+				return nil
+			}
+			defer f.Close()
+
+			var c ConfigV2
+			if err := json.NewDecoder(f).Decode(&c); err != nil {
+				slog.Warn("bad manifest config", "name", n, "error", err)
+				return nil
+			}
+
+			// tag should never be masked
+			models = append(models, api.ModelResponse{
+				Model:      n.DisplayShortest(),
+				Name:       n.DisplayShortest(),
+				Size:       m.Size(),
+				Digest:     m.Digest,
+				ModifiedAt: info.ModTime(),
+				Details: api.ModelDetails{
+					Format:            c.ModelFormat,
+					Family:            c.ModelFamily,
+					Families:          c.ModelFamilies,
+					ParameterSize:     c.ModelType,
+					QuantizationLevel: c.FileType,
+				},
+			})
 		}

 		return nil
-	}
-
-	if err := filepath.Walk(manifestsPath, walkFunc); err != nil {
+	}); err != nil {
 		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
 		return
 	}

+	slices.SortStableFunc(models, func(i, j api.ModelResponse) int {
+		// most recently modified first
+		return cmp.Compare(j.ModifiedAt.Unix(), i.ModifiedAt.Unix())
+	})
+
 	c.JSON(http.StatusOK, api.ListResponse{Models: models})
 }

@@ -816,7 +812,7 @@ func (s *Server) CopyModelHandler(c *gin.Context) {

 	dst := model.ParseName(r.Destination)
 	if !dst.IsValid() {
-		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("destination %q is invalid", r.Source)})
+		c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": fmt.Sprintf("destination %q is invalid", r.Destination)})
 		return
 	}

@@ -872,20 +868,9 @@ func (s *Server) CreateBlobHandler(c *gin.Context) {
 		return
 	}

-	if _, err := layer.Commit(); err != nil {
-		c.AbortWithStatusJSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
-		return
-	}
-
 	c.Status(http.StatusCreated)
 }

-var defaultAllowOrigins = []string{
-	"localhost",
-	"127.0.0.1",
-	"0.0.0.0",
-}
-
 func isLocalIP(ip netip.Addr) bool {
 	if interfaces, err := net.Interfaces(); err == nil {
 		for _, iface := range interfaces {
@@ -957,6 +942,11 @@ func allowedHostsMiddleware(addr net.Addr) gin.HandlerFunc {
 		}

 		if allowedHost(host) {
+			if c.Request.Method == "OPTIONS" {
+				c.AbortWithStatus(http.StatusNoContent)
+				return
+			}
+
 			c.Next()
 			return
 		}
@@ -969,19 +959,8 @@ func (s *Server) GenerateRoutes() http.Handler {
 	config := cors.DefaultConfig()
 	config.AllowWildcard = true
 	config.AllowBrowserExtensions = true
+	config.AllowHeaders = []string{"Authorization", "Content-Type", "User-Agent", "Accept", "X-Requested-With"}
-	if allowedOrigins := strings.Trim(os.Getenv("OLLAMA_ORIGINS"), "\"'"); allowedOrigins != "" {
-		config.AllowOrigins = strings.Split(allowedOrigins, ",")
-	}
-
-	for _, allowOrigin := range defaultAllowOrigins {
-		config.AllowOrigins = append(config.AllowOrigins,
-			fmt.Sprintf("http://%s", allowOrigin),
-			fmt.Sprintf("https://%s", allowOrigin),
-			fmt.Sprintf("http://%s:*", allowOrigin),
-			fmt.Sprintf("https://%s:*", allowOrigin),
-		)
-	}
+	config.AllowOrigins = envconfig.AllowOrigins

 	r := gin.Default()
 	r.Use(
@@ -1020,10 +999,11 @@ func (s *Server) GenerateRoutes() http.Handler {

 func Serve(ln net.Listener) error {
 	level := slog.LevelInfo
-	if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
+	if envconfig.Debug {
 		level = slog.LevelDebug
 	}

+	slog.Info("server config", "env", envconfig.AsMap())
 	handler := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
 		Level:     level,
 		AddSource: true,
@@ -1047,7 +1027,7 @@ func Serve(ln net.Listener) error {
 		return err
 	}

-	if noprune := os.Getenv("OLLAMA_NOPRUNE"); noprune == "" {
+	if !envconfig.NoPrune {
 		// clean up unused layers and manifests
 		if err := PruneLayers(); err != nil {
 			return err
@@ -1064,7 +1044,8 @@ func Serve(ln net.Listener) error {
 	}

 	ctx, done := context.WithCancel(context.Background())
-	sched := InitScheduler(ctx)
+	schedCtx, schedDone := context.WithCancel(ctx)
+	sched := InitScheduler(schedCtx)
 	s := &Server{addr: ln.Addr(), sched: sched}
 	r := s.GenerateRoutes()

@@ -1078,23 +1059,32 @@ func Serve(ln net.Listener) error {
 	signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
 	go func() {
 		<-signals
-		done()
+		srvr.Close()
+		schedDone()
 		sched.unloadAllRunners()
 		gpu.Cleanup()
-		os.Exit(0)
+		done()
 	}()

 	if err := llm.Init(); err != nil {
 		return fmt.Errorf("unable to initialize llm library %w", err)
 	}

-	s.sched.Run(ctx)
+	s.sched.Run(schedCtx)

 	// At startup we retrieve GPU information so we can get log messages before loading a model
 	// This will log warnings to the log in case we have problems with detected GPUs
-	_ = gpu.GetGPUInfo()
+	gpus := gpu.GetGPUInfo()
+	gpus.LogDetails()

-	return srvr.Serve(ln)
+	err = srvr.Serve(ln)
+	// If server is closed from the signal handler, wait for the ctx to be done
+	// otherwise error out quickly
+	if !errors.Is(err, http.ErrServerClosed) {
+		return err
+	}
+	<-ctx.Done()
+	return err
 }

 func waitForStream(c *gin.Context, ch chan interface{}) {
@@ -1203,10 +1193,6 @@ func (s *Server) ChatHandler(c *gin.Context) {

 	opts, err := modelOptions(model, req.Options)
 	if err != nil {
-		if errors.Is(err, api.ErrInvalidOpts) {
-			c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
-			return
-		}
 		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
 		return
 	}
@@ -1223,12 +1209,7 @@ func (s *Server) ChatHandler(c *gin.Context) {
 	select {
 	case runner = <-rCh:
 	case err = <-eCh:
-		if errors.Is(err, context.Canceled) {
-			c.JSON(499, gin.H{"error": "request canceled"})
-			return
-		}
-
-		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+		handleErrorResponse(c, err)
 		return
 	}

@@ -1253,10 +1234,11 @@ func (s *Server) ChatHandler(c *gin.Context) {
 	// an empty request loads the model
 	if len(req.Messages) == 0 || prompt == "" {
 		resp := api.ChatResponse{
 			CreatedAt: time.Now().UTC(),
 			Model:     req.Model,
 			Done:      true,
-			Message:   api.Message{Role: "assistant"},
+			DoneReason: "load",
+			Message:    api.Message{Role: "assistant"},
 		}
 		c.JSON(http.StatusOK, resp)
 		return
@@ -1289,10 +1271,11 @@ func (s *Server) ChatHandler(c *gin.Context) {
 	fn := func(r llm.CompletionResponse) {

 		resp := api.ChatResponse{
 			Model:     req.Model,
 			CreatedAt: time.Now().UTC(),
 			Message:   api.Message{Role: "assistant", Content: r.Content},
 			Done:      r.Done,
+			DoneReason: r.DoneReason,
 			Metrics: api.Metrics{
 				PromptEvalCount:    r.PromptEvalCount,
 				PromptEvalDuration: r.PromptEvalDuration,
@@ -1349,3 +1332,15 @@ func (s *Server) ChatHandler(c *gin.Context) {

 	streamResponse(c, ch)
 }
+
+func handleErrorResponse(c *gin.Context, err error) {
+	if errors.Is(err, context.Canceled) {
+		c.JSON(499, gin.H{"error": "request canceled"})
+		return
+	}
+	if errors.Is(err, ErrMaxQueue) {
+		c.JSON(http.StatusServiceUnavailable, gin.H{"error": err.Error()})
+		return
+	}
+	c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+}
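handleErrorResponse above centralizes the error-to-status mapping that GenerateHandler, EmbeddingsHandler, and ChatHandler each repeated. A runnable sketch of the same idea in a standalone gin app; errMaxQueue mimics the scheduler's ErrMaxQueue, and the 499 status follows the nginx convention for client-closed requests.

package main

import (
    "context"
    "errors"
    "net/http"

    "github.com/gin-gonic/gin"
)

// errMaxQueue is a local stand-in for the package-level ErrMaxQueue.
var errMaxQueue = errors.New("server busy, please try again. maximum pending requests exceeded")

// handleError maps well-known errors to HTTP statuses in one place
// instead of repeating the same switch in every handler.
func handleError(c *gin.Context, err error) {
    switch {
    case errors.Is(err, context.Canceled):
        c.JSON(499, gin.H{"error": "request canceled"})
    case errors.Is(err, errMaxQueue):
        c.JSON(http.StatusServiceUnavailable, gin.H{"error": err.Error()})
    default:
        c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
    }
}

func main() {
    r := gin.Default()
    r.GET("/demo", func(c *gin.Context) {
        handleError(c, errMaxQueue) // responds 503
    })
    _ = r.Run(":8080")
}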
@@ -17,7 +17,7 @@ import (
 	"github.com/stretchr/testify/assert"

 	"github.com/ollama/ollama/api"
-	"github.com/ollama/ollama/parser"
+	"github.com/ollama/ollama/types/model"
 	"github.com/ollama/ollama/version"
 )

@@ -55,13 +55,13 @@ func Test_Routes(t *testing.T) {
 	createTestModel := func(t *testing.T, name string) {
 		fname := createTestFile(t, "ollama-model")

-		modelfile := strings.NewReader(fmt.Sprintf("FROM %s\nPARAMETER seed 42\nPARAMETER top_p 0.9\nPARAMETER stop foo\nPARAMETER stop bar", fname))
-		commands, err := parser.Parse(modelfile)
+		r := strings.NewReader(fmt.Sprintf("FROM %s\nPARAMETER seed 42\nPARAMETER top_p 0.9\nPARAMETER stop foo\nPARAMETER stop bar", fname))
+		modelfile, err := model.ParseFile(r)
 		assert.Nil(t, err)
 		fn := func(resp api.ProgressResponse) {
 			t.Logf("Status: %s", resp.Status)
 		}
-		err = CreateModel(context.TODO(), name, "", "", commands, fn)
+		err = CreateModel(context.TODO(), name, "", "", modelfile, fn)
 		assert.Nil(t, err)
 	}

@@ -124,14 +124,12 @@ func Test_Routes(t *testing.T) {
 		Method: http.MethodPost,
 		Path:   "/api/create",
 		Setup: func(t *testing.T, req *http.Request) {
-			f, err := os.CreateTemp(t.TempDir(), "ollama-model")
-			assert.Nil(t, err)
-			defer f.Close()
+			fname := createTestFile(t, "ollama-model")

 			stream := false
 			createReq := api.CreateRequest{
 				Name:      "t-bone",
-				Modelfile: fmt.Sprintf("FROM %s", f.Name()),
+				Modelfile: fmt.Sprintf("FROM %s", fname),
 				Stream:    &stream,
 			}
 			jsonData, err := json.Marshal(createReq)
@@ -216,27 +214,25 @@ func Test_Routes(t *testing.T) {
 	httpSrv := httptest.NewServer(router)
 	t.Cleanup(httpSrv.Close)

-	workDir, err := os.MkdirTemp("", "ollama-test")
-	assert.Nil(t, err)
-	defer os.RemoveAll(workDir)
-	os.Setenv("OLLAMA_MODELS", workDir)
+	t.Setenv("OLLAMA_MODELS", t.TempDir())

 	for _, tc := range testCases {
-		t.Logf("Running Test: [%s]", tc.Name)
+		t.Run(tc.Name, func(t *testing.T) {
 		u := httpSrv.URL + tc.Path
 		req, err := http.NewRequestWithContext(context.TODO(), tc.Method, u, nil)
 		assert.Nil(t, err)

 		if tc.Setup != nil {
 			tc.Setup(t, req)
 		}

 		resp, err := httpSrv.Client().Do(req)
 		assert.Nil(t, err)
 		defer resp.Body.Close()

 		if tc.Expected != nil {
 			tc.Expected(t, resp)
 		}
+		})
 	}
 }
136 server/sched.go
@@ -5,10 +5,8 @@ import (
 	"errors"
 	"fmt"
 	"log/slog"
-	"os"
 	"reflect"
 	"sort"
-	"strconv"
 	"strings"
 	"sync"
 	"time"
@@ -17,6 +15,7 @@ import (
 	"github.com/ollama/ollama/format"
 	"github.com/ollama/ollama/gpu"
 	"github.com/ollama/ollama/llm"
+	"github.com/ollama/ollama/server/envconfig"
 	"golang.org/x/exp/slices"
 )

@@ -43,35 +42,14 @@ type Scheduler struct {
 	getGpuFn func() gpu.GpuInfoList
 }

-// TODO set this to zero after a release or two, to enable multiple models by default
-var loadedMax = 1 // Maximum runners; < 1 maps to as many as will fit in VRAM (unlimited for CPU runners)
-var maxQueuedRequests = 10 // TODO configurable
-var numParallel = 1
-
+var ErrMaxQueue = fmt.Errorf("server busy, please try again. maximum pending requests exceeded")

 func InitScheduler(ctx context.Context) *Scheduler {
-	maxRunners := os.Getenv("OLLAMA_MAX_LOADED_MODELS")
-	if maxRunners != "" {
-		m, err := strconv.Atoi(maxRunners)
-		if err != nil {
-			slog.Error("invalid setting", "OLLAMA_MAX_LOADED_MODELS", maxRunners, "error", err)
-		} else {
-			loadedMax = m
-		}
-	}
-	if onp := os.Getenv("OLLAMA_NUM_PARALLEL"); onp != "" {
-		p, err := strconv.Atoi(onp)
-		if err != nil || p <= 0 {
-			slog.Error("invalid parallel setting, must be greater than zero", "OLLAMA_NUM_PARALLEL", onp, "error", err)
-		} else {
-			numParallel = p
-		}
-	}
-
 	sched := &Scheduler{
-		pendingReqCh:  make(chan *LlmRequest, maxQueuedRequests),
-		finishedReqCh: make(chan *LlmRequest, maxQueuedRequests),
-		expiredCh:     make(chan *runnerRef, maxQueuedRequests),
-		unloadedCh:    make(chan interface{}, maxQueuedRequests),
+		pendingReqCh:  make(chan *LlmRequest, envconfig.MaxQueuedRequests),
+		finishedReqCh: make(chan *LlmRequest, envconfig.MaxQueuedRequests),
+		expiredCh:     make(chan *runnerRef, envconfig.MaxQueuedRequests),
+		unloadedCh:    make(chan interface{}, envconfig.MaxQueuedRequests),
 		loaded:        make(map[string]*runnerRef),
 		newServerFn:   llm.NewLlamaServer,
 		getGpuFn:      gpu.GetGPUInfo,
@@ -82,6 +60,13 @@ func InitScheduler(ctx context.Context) *Scheduler {

 // context must be canceled to decrement ref count and release the runner
 func (s *Scheduler) GetRunner(c context.Context, model *Model, opts api.Options, sessionDuration time.Duration) (chan *runnerRef, chan error) {
+	// allocate a large enough kv cache for all parallel requests
+	if opts.NumCtx < 4 {
+		opts.NumCtx = 4
+	}
+
+	opts.NumCtx = opts.NumCtx * envconfig.NumParallel
+
 	req := &LlmRequest{
|
||||||
ctx: c,
|
ctx: c,
|
||||||
model: model,
|
model: model,
|
||||||
@@ -90,12 +75,11 @@ func (s *Scheduler) GetRunner(c context.Context, model *Model, opts api.Options,
|
|||||||
successCh: make(chan *runnerRef),
|
successCh: make(chan *runnerRef),
|
||||||
errCh: make(chan error, 1),
|
errCh: make(chan error, 1),
|
||||||
}
|
}
|
||||||
// context split across parallel threads
|
|
||||||
opts.NumCtx = opts.NumCtx * numParallel
|
|
||||||
select {
|
select {
|
||||||
case s.pendingReqCh <- req:
|
case s.pendingReqCh <- req:
|
||||||
default:
|
default:
|
||||||
req.errCh <- fmt.Errorf("server busy, please try again. maximum pending requests exceeded")
|
req.errCh <- ErrMaxQueue
|
||||||
}
|
}
|
||||||
return req.successCh, req.errCh
|
return req.successCh, req.errCh
|
||||||
}
|
}
|
||||||
@@ -120,6 +104,12 @@ func (s *Scheduler) processPending(ctx context.Context) {
|
|||||||
return
|
return
|
||||||
case pending := <-s.pendingReqCh:
|
case pending := <-s.pendingReqCh:
|
||||||
// Block other requests until we get this pending request running
|
// Block other requests until we get this pending request running
|
||||||
|
|
||||||
|
if pending.ctx.Err() != nil {
|
||||||
|
slog.Debug("pending request cancelled or timed out, skipping scheduling")
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
|
||||||
for {
|
for {
|
||||||
var runnerToExpire *runnerRef
|
var runnerToExpire *runnerRef
|
||||||
s.loadedMu.Lock()
|
s.loadedMu.Lock()
|
||||||
@@ -134,11 +124,11 @@ func (s *Scheduler) processPending(ctx context.Context) {
|
|||||||
pending.useLoadedRunner(runner, s.finishedReqCh)
|
pending.useLoadedRunner(runner, s.finishedReqCh)
|
||||||
break
|
break
|
||||||
}
|
}
|
||||||
} else if loadedMax > 0 && loadedCount >= loadedMax {
|
} else if envconfig.MaxRunners > 0 && loadedCount >= envconfig.MaxRunners {
|
||||||
slog.Debug("max runners achieved, unloading one to make room", "runner_count", loadedCount)
|
slog.Debug("max runners achieved, unloading one to make room", "runner_count", loadedCount)
|
||||||
runnerToExpire = s.findRunnerToUnload(pending)
|
runnerToExpire = s.findRunnerToUnload()
|
||||||
} else {
|
} else {
|
||||||
// Either no models are loaded or below loadedMax
|
// Either no models are loaded or below envconfig.MaxRunners
|
||||||
// Get a refreshed GPU list
|
// Get a refreshed GPU list
|
||||||
gpus := s.getGpuFn()
|
gpus := s.getGpuFn()
|
||||||
|
|
||||||
@@ -149,7 +139,7 @@ func (s *Scheduler) processPending(ctx context.Context) {
|
|||||||
break
|
break
|
||||||
}
|
}
|
||||||
|
|
||||||
// If we're CPU only mode, just limit by loadedMax above
|
// If we're CPU only mode, just limit by envconfig.MaxRunners above
|
||||||
// TODO handle system memory exhaustion
|
// TODO handle system memory exhaustion
|
||||||
if (len(gpus) == 1 && gpus[0].Library == "cpu") || pending.opts.NumGPU == 0 {
|
if (len(gpus) == 1 && gpus[0].Library == "cpu") || pending.opts.NumGPU == 0 {
|
||||||
slog.Debug("cpu mode with existing models, loading")
|
slog.Debug("cpu mode with existing models, loading")
|
||||||
@@ -177,7 +167,7 @@ func (s *Scheduler) processPending(ctx context.Context) {
|
|||||||
s.loadFn(pending, ggml, gpus)
|
s.loadFn(pending, ggml, gpus)
|
||||||
break
|
break
|
||||||
}
|
}
|
||||||
runnerToExpire = s.findRunnerToUnload(pending)
|
runnerToExpire = s.findRunnerToUnload()
|
||||||
}
|
}
|
||||||
|
|
||||||
if runnerToExpire == nil {
|
if runnerToExpire == nil {
|
||||||
@@ -277,13 +267,16 @@ func (s *Scheduler) processCompleted(ctx context.Context) {
|
|||||||
continue
|
continue
|
||||||
}
|
}
|
||||||
|
|
||||||
slog.Debug("got lock to unload", "model", runner.model)
|
|
||||||
runner.unload()
|
|
||||||
s.loadedMu.Lock()
|
s.loadedMu.Lock()
|
||||||
|
slog.Debug("got lock to unload", "model", runner.model)
|
||||||
|
finished := runner.waitForVRAMRecovery()
|
||||||
|
runner.unload()
|
||||||
delete(s.loaded, runner.model)
|
delete(s.loaded, runner.model)
|
||||||
s.loadedMu.Unlock()
|
s.loadedMu.Unlock()
|
||||||
slog.Debug("runner released", "model", runner.model)
|
slog.Debug("runner released", "model", runner.model)
|
||||||
runner.refMu.Unlock()
|
runner.refMu.Unlock()
|
||||||
|
|
||||||
|
<-finished
|
||||||
slog.Debug("sending an unloaded event", "model", runner.model)
|
slog.Debug("sending an unloaded event", "model", runner.model)
|
||||||
s.unloadedCh <- struct{}{}
|
s.unloadedCh <- struct{}{}
|
||||||
}
|
}
|
||||||
@@ -455,6 +448,10 @@ func (runner *runnerRef) needsReload(ctx context.Context, req *LlmRequest) bool
|
|||||||
timeout = 2 * time.Minute // Initial load can take a long time for big models on slow systems...
|
timeout = 2 * time.Minute // Initial load can take a long time for big models on slow systems...
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if runner.Options == nil {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
|
||||||
// Don't reload runner if num_gpu=-1 was provided
|
// Don't reload runner if num_gpu=-1 was provided
|
||||||
optsExisting := runner.Options.Runner
|
optsExisting := runner.Options.Runner
|
||||||
optsNew := req.opts.Runner
|
optsNew := req.opts.Runner
|
||||||
@@ -475,6 +472,61 @@ func (runner *runnerRef) needsReload(ctx context.Context, req *LlmRequest) bool
|
|||||||
return false
|
return false
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Free memory reporting on GPUs can lag for a while even after the runner
|
||||||
|
// exits, so we have to keep checking until we see the available memory recover,
|
||||||
|
// otherwise subsequent model loads will get far less layers loaded or worse
|
||||||
|
// case, may completely fall back to CPU mode.
|
||||||
|
// This routine must be called before the runner unloads so it can establish
|
||||||
|
// a before and after GPU memory allocation. The returned channel
|
||||||
|
// will be notified when we're done waiting, or have timed out and should
|
||||||
|
// proceed anyway
|
||||||
|
func (runner *runnerRef) waitForVRAMRecovery() chan interface{} {
|
||||||
|
finished := make(chan interface{}, 1)
|
||||||
|
|
||||||
|
// CPU or Metal don't need checking, so no waiting required
|
||||||
|
if len(runner.gpus) == 1 && (runner.gpus[0].Library == "cpu" || runner.gpus[0].Library == "metal") {
|
||||||
|
finished <- struct{}{}
|
||||||
|
return finished
|
||||||
|
}
|
||||||
|
start := time.Now()
|
||||||
|
|
||||||
|
// Establish a baseline before we unload
|
||||||
|
gpusBefore := gpu.GetGPUInfo()
|
||||||
|
var totalMemoryBefore, freeMemoryBefore uint64
|
||||||
|
for _, gpu := range gpusBefore {
|
||||||
|
totalMemoryBefore += gpu.TotalMemory
|
||||||
|
freeMemoryBefore += gpu.FreeMemory
|
||||||
|
}
|
||||||
|
go func() {
|
||||||
|
expiresAt := start.Add(5 * time.Second) // typical convergence is 0.5-1.5s
|
||||||
|
ticker := time.NewTicker(250 * time.Millisecond)
|
||||||
|
defer ticker.Stop()
|
||||||
|
for {
|
||||||
|
<-ticker.C
|
||||||
|
if time.Now().After(expiresAt) {
|
||||||
|
slog.Warn("gpu VRAM usage didn't recover within timeout", "seconds", time.Since(start).Seconds())
|
||||||
|
finished <- struct{}{}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Query GPUs, look for free to go back up
|
||||||
|
gpusNow := gpu.GetGPUInfo()
|
||||||
|
var totalMemoryNow, freeMemoryNow uint64
|
||||||
|
for _, gpu := range gpusNow {
|
||||||
|
totalMemoryNow += gpu.TotalMemory
|
||||||
|
freeMemoryNow += gpu.FreeMemory
|
||||||
|
}
|
||||||
|
// If we're within ~80% of the estimated memory usage recovered, bail out
|
||||||
|
if float32(freeMemoryNow-freeMemoryBefore) > float32(runner.estimatedVRAM)*0.8 {
|
||||||
|
slog.Debug(fmt.Sprintf("gpu VRAM free memory converged after %0.2f seconds", time.Since(start).Seconds()))
|
||||||
|
finished <- struct{}{}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}()
|
||||||
|
return finished
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
type ByDuration []*runnerRef
|
type ByDuration []*runnerRef
|
||||||
|
|
||||||
func (a ByDuration) Len() int { return len(a) }
|
func (a ByDuration) Len() int { return len(a) }
|
||||||
@@ -515,16 +567,16 @@ func pickBestFitGPUs(req *LlmRequest, ggml *llm.GGML, gpus gpu.GpuInfoList) gpu.
|
|||||||
// - try subsets of GPUs instead of just falling back to 1 or all in a family
|
// - try subsets of GPUs instead of just falling back to 1 or all in a family
|
||||||
|
|
||||||
// Now try all the GPUs
|
// Now try all the GPUs
|
||||||
if ok, estimatedVRAM = llm.PredictServerFit(gl, ggml, req.model.AdapterPaths, req.model.ProjectorPaths, req.opts); ok {
|
if ok, estimatedVRAM = llm.PredictServerFit(sgl, ggml, req.model.AdapterPaths, req.model.ProjectorPaths, req.opts); ok {
|
||||||
slog.Debug("new model will fit in available VRAM, loading", "model", req.model.ModelPath, "library", gl[0].Library, "required", format.HumanBytes2(estimatedVRAM))
|
slog.Debug("new model will fit in available VRAM, loading", "model", req.model.ModelPath, "library", sgl[0].Library, "required", format.HumanBytes2(estimatedVRAM))
|
||||||
return gl
|
return sgl
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
// findRunnerToUnload finds a runner to unload to make room for a new model
|
// findRunnerToUnload finds a runner to unload to make room for a new model
|
||||||
func (s *Scheduler) findRunnerToUnload(req *LlmRequest) *runnerRef {
|
func (s *Scheduler) findRunnerToUnload() *runnerRef {
|
||||||
s.loadedMu.Lock()
|
s.loadedMu.Lock()
|
||||||
runnerList := make([]*runnerRef, 0, len(s.loaded))
|
runnerList := make([]*runnerRef, 0, len(s.loaded))
|
||||||
for _, r := range s.loaded {
|
for _, r := range s.loaded {
|
||||||
|
|||||||
@@ -15,6 +15,7 @@ import (
|
|||||||
"github.com/ollama/ollama/format"
|
"github.com/ollama/ollama/format"
|
||||||
"github.com/ollama/ollama/gpu"
|
"github.com/ollama/ollama/gpu"
|
||||||
"github.com/ollama/ollama/llm"
|
"github.com/ollama/ollama/llm"
|
||||||
|
"github.com/ollama/ollama/server/envconfig"
|
||||||
"github.com/stretchr/testify/assert"
|
"github.com/stretchr/testify/assert"
|
||||||
"github.com/stretchr/testify/require"
|
"github.com/stretchr/testify/require"
|
||||||
)
|
)
|
||||||
@@ -27,38 +28,14 @@ func init() {
|
|||||||
func TestInitScheduler(t *testing.T) {
|
func TestInitScheduler(t *testing.T) {
|
||||||
ctx, done := context.WithCancel(context.Background())
|
ctx, done := context.WithCancel(context.Background())
|
||||||
defer done()
|
defer done()
|
||||||
initialMax := loadedMax
|
|
||||||
initialParallel := numParallel
|
|
||||||
s := InitScheduler(ctx)
|
s := InitScheduler(ctx)
|
||||||
require.Equal(t, initialMax, loadedMax)
|
|
||||||
s.loadedMu.Lock()
|
s.loadedMu.Lock()
|
||||||
require.NotNil(t, s.loaded)
|
require.NotNil(t, s.loaded)
|
||||||
s.loadedMu.Unlock()
|
s.loadedMu.Unlock()
|
||||||
|
|
||||||
os.Setenv("OLLAMA_MAX_LOADED_MODELS", "blue")
|
|
||||||
s = InitScheduler(ctx)
|
|
||||||
require.Equal(t, initialMax, loadedMax)
|
|
||||||
s.loadedMu.Lock()
|
|
||||||
require.NotNil(t, s.loaded)
|
|
||||||
s.loadedMu.Unlock()
|
|
||||||
|
|
||||||
os.Setenv("OLLAMA_MAX_LOADED_MODELS", "0")
|
|
||||||
s = InitScheduler(ctx)
|
|
||||||
require.Equal(t, 0, loadedMax)
|
|
||||||
s.loadedMu.Lock()
|
|
||||||
require.NotNil(t, s.loaded)
|
|
||||||
s.loadedMu.Unlock()
|
|
||||||
|
|
||||||
os.Setenv("OLLAMA_NUM_PARALLEL", "blue")
|
|
||||||
_ = InitScheduler(ctx)
|
|
||||||
require.Equal(t, initialParallel, numParallel)
|
|
||||||
os.Setenv("OLLAMA_NUM_PARALLEL", "10")
|
|
||||||
_ = InitScheduler(ctx)
|
|
||||||
require.Equal(t, 10, numParallel)
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestLoad(t *testing.T) {
|
func TestLoad(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 5*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 20*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
s := InitScheduler(ctx)
|
s := InitScheduler(ctx)
|
||||||
var ggml *llm.GGML // value not used in tests
|
var ggml *llm.GGML // value not used in tests
|
||||||
@@ -174,7 +151,7 @@ func newScenario(t *testing.T, ctx context.Context, modelName string, estimatedV
|
|||||||
}
|
}
|
||||||
|
|
||||||
func TestRequests(t *testing.T) {
|
func TestRequests(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 500*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
|
|
||||||
// Same model, same request
|
// Same model, same request
|
||||||
@@ -249,7 +226,7 @@ func TestRequests(t *testing.T) {
|
|||||||
t.Errorf("timeout")
|
t.Errorf("timeout")
|
||||||
}
|
}
|
||||||
|
|
||||||
loadedMax = 1
|
envconfig.MaxRunners = 1
|
||||||
s.newServerFn = scenario3a.newServer
|
s.newServerFn = scenario3a.newServer
|
||||||
slog.Info("scenario3a")
|
slog.Info("scenario3a")
|
||||||
s.pendingReqCh <- scenario3a.req
|
s.pendingReqCh <- scenario3a.req
|
||||||
@@ -268,7 +245,7 @@ func TestRequests(t *testing.T) {
|
|||||||
require.Len(t, s.loaded, 1)
|
require.Len(t, s.loaded, 1)
|
||||||
s.loadedMu.Unlock()
|
s.loadedMu.Unlock()
|
||||||
|
|
||||||
loadedMax = 0
|
envconfig.MaxRunners = 0
|
||||||
s.newServerFn = scenario3b.newServer
|
s.newServerFn = scenario3b.newServer
|
||||||
slog.Info("scenario3b")
|
slog.Info("scenario3b")
|
||||||
s.pendingReqCh <- scenario3b.req
|
s.pendingReqCh <- scenario3b.req
|
||||||
@@ -329,7 +306,7 @@ func TestRequests(t *testing.T) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func TestGetRunner(t *testing.T) {
|
func TestGetRunner(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 20*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
|
|
||||||
// Same model, same request
|
// Same model, same request
|
||||||
@@ -339,7 +316,7 @@ func TestGetRunner(t *testing.T) {
|
|||||||
scenario1b.req.sessionDuration = 0
|
scenario1b.req.sessionDuration = 0
|
||||||
scenario1c := newScenario(t, ctx, "ollama-model-1c", 10)
|
scenario1c := newScenario(t, ctx, "ollama-model-1c", 10)
|
||||||
scenario1c.req.sessionDuration = 0
|
scenario1c.req.sessionDuration = 0
|
||||||
maxQueuedRequests = 1
|
envconfig.MaxQueuedRequests = 1
|
||||||
s := InitScheduler(ctx)
|
s := InitScheduler(ctx)
|
||||||
s.getGpuFn = func() gpu.GpuInfoList {
|
s.getGpuFn = func() gpu.GpuInfoList {
|
||||||
g := gpu.GpuInfo{Library: "metal"}
|
g := gpu.GpuInfo{Library: "metal"}
|
||||||
@@ -375,11 +352,9 @@ func TestGetRunner(t *testing.T) {
|
|||||||
scenario1c.req.model.ModelPath = "bad path"
|
scenario1c.req.model.ModelPath = "bad path"
|
||||||
slog.Info("scenario1c")
|
slog.Info("scenario1c")
|
||||||
successCh1c, errCh1c := s.GetRunner(scenario1c.ctx, scenario1c.req.model, scenario1c.req.opts, scenario1c.req.sessionDuration)
|
successCh1c, errCh1c := s.GetRunner(scenario1c.ctx, scenario1c.req.model, scenario1c.req.opts, scenario1c.req.sessionDuration)
|
||||||
require.Len(t, s.pendingReqCh, 0)
|
// Starts in pending channel, then should be quickly processed to return an error
|
||||||
require.Len(t, successCh1c, 0)
|
|
||||||
require.Len(t, errCh1c, 0)
|
|
||||||
|
|
||||||
time.Sleep(5 * time.Millisecond)
|
time.Sleep(5 * time.Millisecond)
|
||||||
|
require.Len(t, successCh1c, 0)
|
||||||
s.loadedMu.Lock()
|
s.loadedMu.Lock()
|
||||||
require.Len(t, s.loaded, 0)
|
require.Len(t, s.loaded, 0)
|
||||||
s.loadedMu.Unlock()
|
s.loadedMu.Unlock()
|
||||||
@@ -391,7 +366,7 @@ func TestGetRunner(t *testing.T) {
|
|||||||
|
|
||||||
// TODO - add one scenario that triggers the bogus finished event with positive ref count
|
// TODO - add one scenario that triggers the bogus finished event with positive ref count
|
||||||
func TestPrematureExpired(t *testing.T) {
|
func TestPrematureExpired(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 500*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
|
|
||||||
// Same model, same request
|
// Same model, same request
|
||||||
@@ -436,7 +411,7 @@ func TestPrematureExpired(t *testing.T) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func TestUseLoadedRunner(t *testing.T) {
|
func TestUseLoadedRunner(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 5*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
||||||
req := &LlmRequest{
|
req := &LlmRequest{
|
||||||
ctx: ctx,
|
ctx: ctx,
|
||||||
opts: api.DefaultOptions(),
|
opts: api.DefaultOptions(),
|
||||||
@@ -461,7 +436,7 @@ func TestUseLoadedRunner(t *testing.T) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func TestUpdateFreeSpace(t *testing.T) {
|
func TestUpdateFreeSpace(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 5*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
gpus := gpu.GpuInfoList{
|
gpus := gpu.GpuInfoList{
|
||||||
{
|
{
|
||||||
@@ -494,12 +469,9 @@ func TestUpdateFreeSpace(t *testing.T) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func TestFindRunnerToUnload(t *testing.T) {
|
func TestFindRunnerToUnload(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 5*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
req := &LlmRequest{
|
|
||||||
ctx: ctx,
|
|
||||||
opts: api.DefaultOptions(),
|
|
||||||
}
|
|
||||||
r1 := &runnerRef{refCount: 1, sessionDuration: 1}
|
r1 := &runnerRef{refCount: 1, sessionDuration: 1}
|
||||||
r2 := &runnerRef{sessionDuration: 2}
|
r2 := &runnerRef{sessionDuration: 2}
|
||||||
|
|
||||||
@@ -509,16 +481,16 @@ func TestFindRunnerToUnload(t *testing.T) {
|
|||||||
s.loaded["b"] = r2
|
s.loaded["b"] = r2
|
||||||
s.loadedMu.Unlock()
|
s.loadedMu.Unlock()
|
||||||
|
|
||||||
resp := s.findRunnerToUnload(req)
|
resp := s.findRunnerToUnload()
|
||||||
require.Equal(t, r2, resp)
|
require.Equal(t, r2, resp)
|
||||||
r2.refCount = 1
|
r2.refCount = 1
|
||||||
resp = s.findRunnerToUnload(req)
|
resp = s.findRunnerToUnload()
|
||||||
require.Equal(t, r1, resp)
|
require.Equal(t, r1, resp)
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestNeedsReload(t *testing.T) {
|
func TestNeedsReload(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 5*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
|
|
||||||
llm := &mockLlm{}
|
llm := &mockLlm{}
|
||||||
@@ -562,7 +534,7 @@ func TestNeedsReload(t *testing.T) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func TestUnloadAllRunners(t *testing.T) {
|
func TestUnloadAllRunners(t *testing.T) {
|
||||||
ctx, done := context.WithTimeout(context.Background(), 5*time.Millisecond)
|
ctx, done := context.WithTimeout(context.Background(), 100*time.Millisecond)
|
||||||
defer done()
|
defer done()
|
||||||
|
|
||||||
llm1 := &mockLlm{}
|
llm1 := &mockLlm{}
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
package parser
|
package model
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"bufio"
|
"bufio"
|
||||||
@@ -10,11 +10,41 @@ import (
|
|||||||
"strings"
|
"strings"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
type File struct {
|
||||||
|
Commands []Command
|
||||||
|
}
|
||||||
|
|
||||||
|
func (f File) String() string {
|
||||||
|
var sb strings.Builder
|
||||||
|
for _, cmd := range f.Commands {
|
||||||
|
fmt.Fprintln(&sb, cmd.String())
|
||||||
|
}
|
||||||
|
|
||||||
|
return sb.String()
|
||||||
|
}
|
||||||
|
|
||||||
type Command struct {
|
type Command struct {
|
||||||
Name string
|
Name string
|
||||||
Args string
|
Args string
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func (c Command) String() string {
|
||||||
|
var sb strings.Builder
|
||||||
|
switch c.Name {
|
||||||
|
case "model":
|
||||||
|
fmt.Fprintf(&sb, "FROM %s", c.Args)
|
||||||
|
case "license", "template", "system", "adapter":
|
||||||
|
fmt.Fprintf(&sb, "%s %s", strings.ToUpper(c.Name), quote(c.Args))
|
||||||
|
case "message":
|
||||||
|
role, message, _ := strings.Cut(c.Args, ": ")
|
||||||
|
fmt.Fprintf(&sb, "MESSAGE %s %s", role, quote(message))
|
||||||
|
default:
|
||||||
|
fmt.Fprintf(&sb, "PARAMETER %s %s", c.Name, quote(c.Args))
|
||||||
|
}
|
||||||
|
|
||||||
|
return sb.String()
|
||||||
|
}
|
||||||
|
|
||||||
type state int
|
type state int
|
||||||
|
|
||||||
const (
|
const (
|
||||||
@@ -32,38 +62,14 @@ var (
|
|||||||
errInvalidCommand = errors.New("command must be one of \"from\", \"license\", \"template\", \"system\", \"adapter\", \"parameter\", or \"message\"")
|
errInvalidCommand = errors.New("command must be one of \"from\", \"license\", \"template\", \"system\", \"adapter\", \"parameter\", or \"message\"")
|
||||||
)
|
)
|
||||||
|
|
||||||
func Format(cmds []Command) string {
|
func ParseFile(r io.Reader) (*File, error) {
|
||||||
var sb strings.Builder
|
|
||||||
for _, cmd := range cmds {
|
|
||||||
name := cmd.Name
|
|
||||||
args := cmd.Args
|
|
||||||
|
|
||||||
switch cmd.Name {
|
|
||||||
case "model":
|
|
||||||
name = "from"
|
|
||||||
args = cmd.Args
|
|
||||||
case "license", "template", "system", "adapter":
|
|
||||||
args = quote(args)
|
|
||||||
case "message":
|
|
||||||
role, message, _ := strings.Cut(cmd.Args, ": ")
|
|
||||||
args = role + " " + quote(message)
|
|
||||||
default:
|
|
||||||
name = "parameter"
|
|
||||||
args = cmd.Name + " " + quote(cmd.Args)
|
|
||||||
}
|
|
||||||
|
|
||||||
fmt.Fprintln(&sb, strings.ToUpper(name), args)
|
|
||||||
}
|
|
||||||
|
|
||||||
return sb.String()
|
|
||||||
}
|
|
||||||
|
|
||||||
func Parse(r io.Reader) (cmds []Command, err error) {
|
|
||||||
var cmd Command
|
var cmd Command
|
||||||
var curr state
|
var curr state
|
||||||
var b bytes.Buffer
|
var b bytes.Buffer
|
||||||
var role string
|
var role string
|
||||||
|
|
||||||
|
var f File
|
||||||
|
|
||||||
br := bufio.NewReader(r)
|
br := bufio.NewReader(r)
|
||||||
for {
|
for {
|
||||||
r, _, err := br.ReadRune()
|
r, _, err := br.ReadRune()
|
||||||
@@ -128,7 +134,7 @@ func Parse(r io.Reader) (cmds []Command, err error) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
cmd.Args = s
|
cmd.Args = s
|
||||||
cmds = append(cmds, cmd)
|
f.Commands = append(f.Commands, cmd)
|
||||||
}
|
}
|
||||||
|
|
||||||
b.Reset()
|
b.Reset()
|
||||||
@@ -157,14 +163,14 @@ func Parse(r io.Reader) (cmds []Command, err error) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
cmd.Args = s
|
cmd.Args = s
|
||||||
cmds = append(cmds, cmd)
|
f.Commands = append(f.Commands, cmd)
|
||||||
default:
|
default:
|
||||||
return nil, io.ErrUnexpectedEOF
|
return nil, io.ErrUnexpectedEOF
|
||||||
}
|
}
|
||||||
|
|
||||||
for _, cmd := range cmds {
|
for _, cmd := range f.Commands {
|
||||||
if cmd.Name == "model" {
|
if cmd.Name == "model" {
|
||||||
return cmds, nil
|
return &f, nil
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -243,10 +249,6 @@ func quote(s string) string {
|
|||||||
}
|
}
|
||||||
|
|
||||||
func unquote(s string) (string, bool) {
|
func unquote(s string) (string, bool) {
|
||||||
if len(s) == 0 {
|
|
||||||
return "", false
|
|
||||||
}
|
|
||||||
|
|
||||||
// TODO: single quotes
|
// TODO: single quotes
|
||||||
if len(s) >= 3 && s[:3] == `"""` {
|
if len(s) >= 3 && s[:3] == `"""` {
|
||||||
if len(s) >= 6 && s[len(s)-3:] == `"""` {
|
if len(s) >= 6 && s[len(s)-3:] == `"""` {
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
package parser
|
package model
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"bytes"
|
"bytes"
|
||||||
@@ -10,7 +10,7 @@ import (
|
|||||||
"github.com/stretchr/testify/assert"
|
"github.com/stretchr/testify/assert"
|
||||||
)
|
)
|
||||||
|
|
||||||
func TestParser(t *testing.T) {
|
func TestParseFileFile(t *testing.T) {
|
||||||
input := `
|
input := `
|
||||||
FROM model1
|
FROM model1
|
||||||
ADAPTER adapter1
|
ADAPTER adapter1
|
||||||
@@ -22,8 +22,8 @@ TEMPLATE template1
|
|||||||
|
|
||||||
reader := strings.NewReader(input)
|
reader := strings.NewReader(input)
|
||||||
|
|
||||||
commands, err := Parse(reader)
|
modelfile, err := ParseFile(reader)
|
||||||
assert.Nil(t, err)
|
assert.NoError(t, err)
|
||||||
|
|
||||||
expectedCommands := []Command{
|
expectedCommands := []Command{
|
||||||
{Name: "model", Args: "model1"},
|
{Name: "model", Args: "model1"},
|
||||||
@@ -34,10 +34,10 @@ TEMPLATE template1
|
|||||||
{Name: "template", Args: "template1"},
|
{Name: "template", Args: "template1"},
|
||||||
}
|
}
|
||||||
|
|
||||||
assert.Equal(t, expectedCommands, commands)
|
assert.Equal(t, expectedCommands, modelfile.Commands)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserFrom(t *testing.T) {
|
func TestParseFileFrom(t *testing.T) {
|
||||||
var cases = []struct {
|
var cases = []struct {
|
||||||
input string
|
input string
|
||||||
expected []Command
|
expected []Command
|
||||||
@@ -85,14 +85,16 @@ func TestParserFrom(t *testing.T) {
|
|||||||
|
|
||||||
for _, c := range cases {
|
for _, c := range cases {
|
||||||
t.Run("", func(t *testing.T) {
|
t.Run("", func(t *testing.T) {
|
||||||
commands, err := Parse(strings.NewReader(c.input))
|
modelfile, err := ParseFile(strings.NewReader(c.input))
|
||||||
assert.ErrorIs(t, err, c.err)
|
assert.ErrorIs(t, err, c.err)
|
||||||
assert.Equal(t, c.expected, commands)
|
if modelfile != nil {
|
||||||
|
assert.Equal(t, c.expected, modelfile.Commands)
|
||||||
|
}
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserParametersMissingValue(t *testing.T) {
|
func TestParseFileParametersMissingValue(t *testing.T) {
|
||||||
input := `
|
input := `
|
||||||
FROM foo
|
FROM foo
|
||||||
PARAMETER param1
|
PARAMETER param1
|
||||||
@@ -100,21 +102,21 @@ PARAMETER param1
|
|||||||
|
|
||||||
reader := strings.NewReader(input)
|
reader := strings.NewReader(input)
|
||||||
|
|
||||||
_, err := Parse(reader)
|
_, err := ParseFile(reader)
|
||||||
assert.ErrorIs(t, err, io.ErrUnexpectedEOF)
|
assert.ErrorIs(t, err, io.ErrUnexpectedEOF)
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserBadCommand(t *testing.T) {
|
func TestParseFileBadCommand(t *testing.T) {
|
||||||
input := `
|
input := `
|
||||||
FROM foo
|
FROM foo
|
||||||
BADCOMMAND param1 value1
|
BADCOMMAND param1 value1
|
||||||
`
|
`
|
||||||
_, err := Parse(strings.NewReader(input))
|
_, err := ParseFile(strings.NewReader(input))
|
||||||
assert.ErrorIs(t, err, errInvalidCommand)
|
assert.ErrorIs(t, err, errInvalidCommand)
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserMessages(t *testing.T) {
|
func TestParseFileMessages(t *testing.T) {
|
||||||
var cases = []struct {
|
var cases = []struct {
|
||||||
input string
|
input string
|
||||||
expected []Command
|
expected []Command
|
||||||
@@ -123,34 +125,34 @@ func TestParserMessages(t *testing.T) {
|
|||||||
{
|
{
|
||||||
`
|
`
|
||||||
FROM foo
|
FROM foo
|
||||||
MESSAGE system You are a Parser. Always Parse things.
|
MESSAGE system You are a file parser. Always parse things.
|
||||||
`,
|
`,
|
||||||
[]Command{
|
[]Command{
|
||||||
{Name: "model", Args: "foo"},
|
{Name: "model", Args: "foo"},
|
||||||
{Name: "message", Args: "system: You are a Parser. Always Parse things."},
|
{Name: "message", Args: "system: You are a file parser. Always parse things."},
|
||||||
},
|
},
|
||||||
nil,
|
nil,
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
`
|
`
|
||||||
FROM foo
|
FROM foo
|
||||||
MESSAGE system You are a Parser. Always Parse things.`,
|
MESSAGE system You are a file parser. Always parse things.`,
|
||||||
[]Command{
|
[]Command{
|
||||||
{Name: "model", Args: "foo"},
|
{Name: "model", Args: "foo"},
|
||||||
{Name: "message", Args: "system: You are a Parser. Always Parse things."},
|
{Name: "message", Args: "system: You are a file parser. Always parse things."},
|
||||||
},
|
},
|
||||||
nil,
|
nil,
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
`
|
`
|
||||||
FROM foo
|
FROM foo
|
||||||
MESSAGE system You are a Parser. Always Parse things.
|
MESSAGE system You are a file parser. Always parse things.
|
||||||
MESSAGE user Hey there!
|
MESSAGE user Hey there!
|
||||||
MESSAGE assistant Hello, I want to parse all the things!
|
MESSAGE assistant Hello, I want to parse all the things!
|
||||||
`,
|
`,
|
||||||
[]Command{
|
[]Command{
|
||||||
{Name: "model", Args: "foo"},
|
{Name: "model", Args: "foo"},
|
||||||
{Name: "message", Args: "system: You are a Parser. Always Parse things."},
|
{Name: "message", Args: "system: You are a file parser. Always parse things."},
|
||||||
{Name: "message", Args: "user: Hey there!"},
|
{Name: "message", Args: "user: Hey there!"},
|
||||||
{Name: "message", Args: "assistant: Hello, I want to parse all the things!"},
|
{Name: "message", Args: "assistant: Hello, I want to parse all the things!"},
|
||||||
},
|
},
|
||||||
@@ -160,12 +162,12 @@ MESSAGE assistant Hello, I want to parse all the things!
|
|||||||
`
|
`
|
||||||
FROM foo
|
FROM foo
|
||||||
MESSAGE system """
|
MESSAGE system """
|
||||||
You are a multiline Parser. Always Parse things.
|
You are a multiline file parser. Always parse things.
|
||||||
"""
|
"""
|
||||||
`,
|
`,
|
||||||
[]Command{
|
[]Command{
|
||||||
{Name: "model", Args: "foo"},
|
{Name: "model", Args: "foo"},
|
||||||
{Name: "message", Args: "system: \nYou are a multiline Parser. Always Parse things.\n"},
|
{Name: "message", Args: "system: \nYou are a multiline file parser. Always parse things.\n"},
|
||||||
},
|
},
|
||||||
nil,
|
nil,
|
||||||
},
|
},
|
||||||
@@ -196,14 +198,16 @@ MESSAGE system`,
|
|||||||
|
|
||||||
for _, c := range cases {
|
for _, c := range cases {
|
||||||
t.Run("", func(t *testing.T) {
|
t.Run("", func(t *testing.T) {
|
||||||
commands, err := Parse(strings.NewReader(c.input))
|
modelfile, err := ParseFile(strings.NewReader(c.input))
|
||||||
assert.ErrorIs(t, err, c.err)
|
assert.ErrorIs(t, err, c.err)
|
||||||
assert.Equal(t, c.expected, commands)
|
if modelfile != nil {
|
||||||
|
assert.Equal(t, c.expected, modelfile.Commands)
|
||||||
|
}
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserQuoted(t *testing.T) {
|
func TestParseFileQuoted(t *testing.T) {
|
||||||
var cases = []struct {
|
var cases = []struct {
|
||||||
multiline string
|
multiline string
|
||||||
expected []Command
|
expected []Command
|
||||||
@@ -348,14 +352,16 @@ TEMPLATE """
|
|||||||
|
|
||||||
for _, c := range cases {
|
for _, c := range cases {
|
||||||
t.Run("", func(t *testing.T) {
|
t.Run("", func(t *testing.T) {
|
||||||
commands, err := Parse(strings.NewReader(c.multiline))
|
modelfile, err := ParseFile(strings.NewReader(c.multiline))
|
||||||
assert.ErrorIs(t, err, c.err)
|
assert.ErrorIs(t, err, c.err)
|
||||||
assert.Equal(t, c.expected, commands)
|
if modelfile != nil {
|
||||||
|
assert.Equal(t, c.expected, modelfile.Commands)
|
||||||
|
}
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserParameters(t *testing.T) {
|
func TestParseFileParameters(t *testing.T) {
|
||||||
var cases = map[string]struct {
|
var cases = map[string]struct {
|
||||||
name, value string
|
name, value string
|
||||||
}{
|
}{
|
||||||
@@ -404,18 +410,18 @@ func TestParserParameters(t *testing.T) {
|
|||||||
var b bytes.Buffer
|
var b bytes.Buffer
|
||||||
fmt.Fprintln(&b, "FROM foo")
|
fmt.Fprintln(&b, "FROM foo")
|
||||||
fmt.Fprintln(&b, "PARAMETER", k)
|
fmt.Fprintln(&b, "PARAMETER", k)
|
||||||
commands, err := Parse(&b)
|
modelfile, err := ParseFile(&b)
|
||||||
assert.Nil(t, err)
|
assert.NoError(t, err)
|
||||||
|
|
||||||
assert.Equal(t, []Command{
|
assert.Equal(t, []Command{
|
||||||
{Name: "model", Args: "foo"},
|
{Name: "model", Args: "foo"},
|
||||||
{Name: v.name, Args: v.value},
|
{Name: v.name, Args: v.value},
|
||||||
}, commands)
|
}, modelfile.Commands)
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParserComments(t *testing.T) {
|
func TestParseFileComments(t *testing.T) {
|
||||||
var cases = []struct {
|
var cases = []struct {
|
||||||
input string
|
input string
|
||||||
expected []Command
|
expected []Command
|
||||||
@@ -433,14 +439,14 @@ FROM foo
|
|||||||
|
|
||||||
for _, c := range cases {
|
for _, c := range cases {
|
||||||
t.Run("", func(t *testing.T) {
|
t.Run("", func(t *testing.T) {
|
||||||
commands, err := Parse(strings.NewReader(c.input))
|
modelfile, err := ParseFile(strings.NewReader(c.input))
|
||||||
assert.Nil(t, err)
|
assert.NoError(t, err)
|
||||||
assert.Equal(t, c.expected, commands)
|
assert.Equal(t, c.expected, modelfile.Commands)
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func TestParseFormatParse(t *testing.T) {
|
func TestParseFileFormatParseFile(t *testing.T) {
|
||||||
var cases = []string{
|
var cases = []string{
|
||||||
`
|
`
|
||||||
FROM foo
|
FROM foo
|
||||||
@@ -449,7 +455,7 @@ LICENSE MIT
|
|||||||
PARAMETER param1 value1
|
PARAMETER param1 value1
|
||||||
PARAMETER param2 value2
|
PARAMETER param2 value2
|
||||||
TEMPLATE template1
|
TEMPLATE template1
|
||||||
MESSAGE system You are a Parser. Always Parse things.
|
MESSAGE system You are a file parser. Always parse things.
|
||||||
MESSAGE user Hey there!
|
MESSAGE user Hey there!
|
||||||
MESSAGE assistant Hello, I want to parse all the things!
|
MESSAGE assistant Hello, I want to parse all the things!
|
||||||
`,
|
`,
|
||||||
@@ -483,18 +489,22 @@ You are a store greeter. Always responsed with "Hello!".
|
|||||||
"""
|
"""
|
||||||
MESSAGE user Hey there!
|
MESSAGE user Hey there!
|
||||||
MESSAGE assistant Hello, I want to parse all the things!
|
MESSAGE assistant Hello, I want to parse all the things!
|
||||||
|
`,
|
||||||
|
`
|
||||||
|
FROM foo
|
||||||
|
SYSTEM ""
|
||||||
`,
|
`,
|
||||||
}
|
}
|
||||||
|
|
||||||
for _, c := range cases {
|
for _, c := range cases {
|
||||||
t.Run("", func(t *testing.T) {
|
t.Run("", func(t *testing.T) {
|
||||||
commands, err := Parse(strings.NewReader(c))
|
modelfile, err := ParseFile(strings.NewReader(c))
|
||||||
assert.NoError(t, err)
|
assert.NoError(t, err)
|
||||||
|
|
||||||
commands2, err := Parse(strings.NewReader(Format(commands)))
|
modelfile2, err := ParseFile(strings.NewReader(modelfile.String()))
|
||||||
assert.NoError(t, err)
|
assert.NoError(t, err)
|
||||||
|
|
||||||
assert.Equal(t, commands, commands2)
|
assert.Equal(t, modelfile, modelfile2)
|
||||||
})
|
})
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -35,6 +35,12 @@ func Unqualified(n Name) error {
|
|||||||
// spot in logs.
|
// spot in logs.
|
||||||
const MissingPart = "!MISSING!"
|
const MissingPart = "!MISSING!"
|
||||||
|
|
||||||
|
const (
|
||||||
|
defaultHost = "registry.ollama.ai"
|
||||||
|
defaultNamespace = "library"
|
||||||
|
defaultTag = "latest"
|
||||||
|
)
|
||||||
|
|
||||||
// DefaultName returns a name with the default values for the host, namespace,
|
// DefaultName returns a name with the default values for the host, namespace,
|
||||||
// and tag parts. The model and digest parts are empty.
|
// and tag parts. The model and digest parts are empty.
|
||||||
//
|
//
|
||||||
@@ -43,9 +49,9 @@ const MissingPart = "!MISSING!"
|
|||||||
// - The default tag is ("latest")
|
// - The default tag is ("latest")
|
||||||
func DefaultName() Name {
|
func DefaultName() Name {
|
||||||
return Name{
|
return Name{
|
||||||
Host: "registry.ollama.ai",
|
Host: defaultHost,
|
||||||
Namespace: "library",
|
Namespace: defaultNamespace,
|
||||||
Tag: "latest",
|
Tag: defaultTag,
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -161,7 +167,7 @@ func ParseNameBare(s string) Name {
|
|||||||
}
|
}
|
||||||
|
|
||||||
scheme, host, ok := strings.Cut(s, "://")
|
scheme, host, ok := strings.Cut(s, "://")
|
||||||
if ! ok {
|
if !ok {
|
||||||
host = scheme
|
host = scheme
|
||||||
}
|
}
|
||||||
n.Host = host
|
n.Host = host
|
||||||
@@ -169,6 +175,27 @@ func ParseNameBare(s string) Name {
|
|||||||
return n
|
return n
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// ParseNameFromFilepath parses a 4-part filepath as a Name. The parts are
|
||||||
|
// expected to be in the form:
|
||||||
|
//
|
||||||
|
// { host } "/" { namespace } "/" { model } "/" { tag }
|
||||||
|
func ParseNameFromFilepath(s string) (n Name) {
|
||||||
|
parts := strings.Split(s, string(filepath.Separator))
|
||||||
|
if len(parts) != 4 {
|
||||||
|
return Name{}
|
||||||
|
}
|
||||||
|
|
||||||
|
n.Host = parts[0]
|
||||||
|
n.Namespace = parts[1]
|
||||||
|
n.Model = parts[2]
|
||||||
|
n.Tag = parts[3]
|
||||||
|
if !n.IsFullyQualified() {
|
||||||
|
return Name{}
|
||||||
|
}
|
||||||
|
|
||||||
|
return n
|
||||||
|
}
|
||||||
|
|
||||||
// Merge merges the host, namespace, and tag parts of the two names,
|
// Merge merges the host, namespace, and tag parts of the two names,
|
||||||
// preferring the non-empty parts of a.
|
// preferring the non-empty parts of a.
|
||||||
func Merge(a, b Name) Name {
|
func Merge(a, b Name) Name {
|
||||||
@@ -203,6 +230,27 @@ func (n Name) String() string {
|
|||||||
return b.String()
|
return b.String()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// DisplayShort returns a short string version of the name.
|
||||||
|
func (n Name) DisplayShortest() string {
|
||||||
|
var sb strings.Builder
|
||||||
|
|
||||||
|
if n.Host != defaultHost {
|
||||||
|
sb.WriteString(n.Host)
|
||||||
|
sb.WriteByte('/')
|
||||||
|
sb.WriteString(n.Namespace)
|
||||||
|
sb.WriteByte('/')
|
||||||
|
} else if n.Namespace != defaultNamespace {
|
||||||
|
sb.WriteString(n.Namespace)
|
||||||
|
sb.WriteByte('/')
|
||||||
|
}
|
||||||
|
|
||||||
|
// always include model and tag
|
||||||
|
sb.WriteString(n.Model)
|
||||||
|
sb.WriteString(":")
|
||||||
|
sb.WriteString(n.Tag)
|
||||||
|
return sb.String()
|
||||||
|
}
|
||||||
|
|
||||||
// IsValid reports whether all parts of the name are present and valid. The
|
// IsValid reports whether all parts of the name are present and valid. The
|
||||||
// digest is a special case, and is checked for validity only if present.
|
// digest is a special case, and is checked for validity only if present.
|
||||||
func (n Name) IsValid() bool {
|
func (n Name) IsValid() bool {
|
||||||
@@ -242,12 +290,12 @@ func (n Name) Filepath() string {
|
|||||||
if !n.IsFullyQualified() {
|
if !n.IsFullyQualified() {
|
||||||
panic("illegal attempt to get filepath of invalid name")
|
panic("illegal attempt to get filepath of invalid name")
|
||||||
}
|
}
|
||||||
return strings.ToLower(filepath.Join(
|
return filepath.Join(
|
||||||
n.Host,
|
n.Host,
|
||||||
n.Namespace,
|
n.Namespace,
|
||||||
n.Model,
|
n.Model,
|
||||||
n.Tag,
|
n.Tag,
|
||||||
))
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
// LogValue returns a slog.Value that represents the name as a string.
|
// LogValue returns a slog.Value that represents the name as a string.
|
||||||
|
|||||||
@@ -19,6 +19,16 @@ func TestParseNameParts(t *testing.T) {
|
|||||||
wantFilepath string
|
wantFilepath string
|
||||||
wantValidDigest bool
|
wantValidDigest bool
|
||||||
}{
|
}{
|
||||||
|
{
|
||||||
|
in: "registry.ollama.ai/library/dolphin-mistral:7b-v2.6-dpo-laser-q6_K",
|
||||||
|
want: Name{
|
||||||
|
Host: "registry.ollama.ai",
|
||||||
|
Namespace: "library",
|
||||||
|
Model: "dolphin-mistral",
|
||||||
|
Tag: "7b-v2.6-dpo-laser-q6_K",
|
||||||
|
},
|
||||||
|
wantFilepath: filepath.Join("registry.ollama.ai", "library", "dolphin-mistral", "7b-v2.6-dpo-laser-q6_K"),
|
||||||
|
},
|
||||||
{
|
{
|
||||||
in: "scheme://host:port/namespace/model:tag",
|
in: "scheme://host:port/namespace/model:tag",
|
||||||
want: Name{
|
want: Name{
|
||||||
@@ -266,9 +276,9 @@ func TestFilepathAllocs(t *testing.T) {
|
|||||||
allocs := testing.AllocsPerRun(1000, func() {
|
allocs := testing.AllocsPerRun(1000, func() {
|
||||||
n.Filepath()
|
n.Filepath()
|
||||||
})
|
})
|
||||||
allowedAllocs := 2.0
|
var allowedAllocs float64 = 1
|
||||||
if runtime.GOOS == "windows" {
|
if runtime.GOOS == "windows" {
|
||||||
allowedAllocs = 4
|
allowedAllocs = 3
|
||||||
}
|
}
|
||||||
if allocs > allowedAllocs {
|
if allocs > allowedAllocs {
|
||||||
t.Errorf("allocs = %v; allowed %v", allocs, allowedAllocs)
|
t.Errorf("allocs = %v; allowed %v", allocs, allowedAllocs)
|
||||||
@@ -309,6 +319,49 @@ func TestParseDigest(t *testing.T) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
func TestParseNameFromFilepath(t *testing.T) {
|
||||||
|
cases := map[string]Name{
|
||||||
|
filepath.Join("host", "namespace", "model", "tag"): {Host: "host", Namespace: "namespace", Model: "model", Tag: "tag"},
|
||||||
|
filepath.Join("host:port", "namespace", "model", "tag"): {Host: "host:port", Namespace: "namespace", Model: "model", Tag: "tag"},
|
||||||
|
filepath.Join("namespace", "model", "tag"): {},
|
||||||
|
filepath.Join("model", "tag"): {},
|
||||||
|
filepath.Join("model"): {},
|
||||||
|
filepath.Join("..", "..", "model", "tag"): {},
|
||||||
|
filepath.Join("", "namespace", ".", "tag"): {},
|
||||||
|
filepath.Join(".", ".", ".", "."): {},
|
||||||
|
filepath.Join("/", "path", "to", "random", "file"): {},
|
||||||
|
}
|
||||||
|
|
||||||
|
for in, want := range cases {
|
||||||
|
t.Run(in, func(t *testing.T) {
|
||||||
|
got := ParseNameFromFilepath(in)
|
||||||
|
|
||||||
|
if !reflect.DeepEqual(got, want) {
|
||||||
|
t.Errorf("parseNameFromFilepath(%q) = %v; want %v", in, got, want)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func TestDisplayShortest(t *testing.T) {
|
||||||
|
cases := map[string]string{
|
||||||
|
"registry.ollama.ai/library/model:latest": "model:latest",
|
||||||
|
"registry.ollama.ai/library/model:tag": "model:tag",
|
||||||
|
"registry.ollama.ai/namespace/model:tag": "namespace/model:tag",
|
||||||
|
"host/namespace/model:tag": "host/namespace/model:tag",
|
||||||
|
"host/library/model:tag": "host/library/model:tag",
|
||||||
|
}
|
||||||
|
|
||||||
|
for in, want := range cases {
|
||||||
|
t.Run(in, func(t *testing.T) {
|
||||||
|
got := ParseNameBare(in).DisplayShortest()
|
||||||
|
if got != want {
|
||||||
|
t.Errorf("parseName(%q).DisplayShortest() = %q; want %q", in, got, want)
|
||||||
|
}
|
||||||
|
})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
func FuzzName(f *testing.F) {
|
func FuzzName(f *testing.F) {
|
||||||
for s := range testCases {
|
for s := range testCases {
|
||||||
f.Add(s)
|
f.Add(s)
|
||||||
|
|||||||
46
wiki for build on amd
Normal file
46
wiki for build on amd
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
### Download

Prepare your environment as described in the [development guide](https://github.com/ollama/ollama/blob/main/docs/development.md), then run:

$env:CGO_ENABLED="1"
go generate ./...

### Edit

Next, edit https://github.com/ollama/ollama/blob/main/llm/generate/gen_windows.ps1 and add your GPU to the GPU list (see the sketch after the target lists below).

Official support list: "gfx900" "gfx906:xnack-" "gfx908:xnack-" "gfx90a:xnack+" "gfx90a:xnack-" "gfx940" "gfx941" "gfx942" "gfx1010" "gfx1012" "gfx1030" "gfx1100" "gfx1101" "gfx1102"

Example extra targets added in this repo:

"gfx803" "gfx902" "gfx904" "gfx940" "gfx941" "gfx942" "gfx1010" "gfx1011" "gfx1012" "gfx1030" "gfx1031" "gfx1032" "gfx1034" "gfx1035" "gfx1036" "gfx1103"
### rocBLAS support

Make sure you have the HIP SDK installed and a rocBLAS build that supports your GPU. If not, build rocBLAS yourself or download a prebuilt copy from GitHub.

[Learn how to build rocBLAS](https://github.com/likelovewant/stable-diffusion-webui-forge-on-amd#extra-if-you-do-not-need-build-roclabs-or-already-have-library-please-skip-this-)

Prebuilt rocBLAS libraries for gfx803, gfx900, gfx1010, gfx1031, gfx1032, and gfx1103 are available at https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU-/tree/main. For other targets, build them yourself or look for other sources.

Place rocblas.dll into C:\Program Files\AMD\ROCm\5.7\bin (this folder appears after installing the HIP SDK), replacing the original, and replace the library folder inside rocblas\library. Also replace the rocblas.dll and library folder shipped in the Ollama program folder, e.g. C:\Users\usrname\AppData\Local\Programs\Ollama\rocm. This repo is updated irregularly and serves as an example only.
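As a concrete illustration of the placement step (the paths are assumptions for HIP SDK 5.7 and a default per-user Ollama install; adjust them for your setup):

```powershell
# Illustrative only: adjust paths for your HIP SDK version and install location.
Copy-Item .\rocblas.dll 'C:\Program Files\AMD\ROCm\5.7\bin\rocblas.dll' -Force
Copy-Item .\library\* 'C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library\' -Recurse -Force

# Replace the copies shipped with the installed Ollama app as well
# (the exact layout under the rocm folder may differ):
Copy-Item .\rocblas.dll "$env:LOCALAPPDATA\Programs\Ollama\rocm\rocblas.dll" -Force
Copy-Item .\library\* "$env:LOCALAPPDATA\Programs\Ollama\rocm\rocblas\library\" -Recurse -Force
```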
### Build

Once that is done, follow the guide [here](https://github.com/ollama/ollama/blob/main/app/README.md).

If you want to build the installer, you'll need to install Inno Setup:

https://jrsoftware.org/isinfo.php

In the top directory of this repo, run the following PowerShell script to build the ollama CLI, ollama app, and ollama installer:

powershell -ExecutionPolicy Bypass -File .\scripts\build_windows.ps1

Once it finishes, the installer will be available in the dist folder.

Run the installer; it behaves exactly like the official release.