subprocess llama.cpp server (#401)

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 14:26:30 +00:00

* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm

Bruce MacDonald, 2023-08-30 16:35:03 -04:00, committed by GitHub

parent f4432e1dba

commit 42998d797d

37 changed files with 958 additions and 43928 deletions

.gitignore vendored, 1 change

@@ -5,4 +5,3 @@
 .swp
 dist
 ollama
 /ggml-metal.metal
