mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 14:26:30 +00:00

Jesse Gross 73d6a82cce ollamarunner: Memory usage reporting

This provides granular information about the backend memory allocations
required by the runner:
 - Per backend
 - Per layer
 - Weights, cache and graph
 - Allocation status

This can be used for debugging and validating memory estimates.

2025-05-22 14:38:09 -07:00

common

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

llamarunner

ollamarunner: Base cached tokens on current prompt

2025-05-15 13:46:20 -07:00

ollamarunner

ollamarunner: Memory usage reporting

2025-05-22 14:38:09 -07:00

README.md

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

runner.go

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

README.md

`runner`

Note: this is a work in progress.

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding