ollama-for-amd

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 22:33:56 +00:00

Files

Michael Yang bbbc73d637 llamarunner: update metrics

this change updates how metrics are collected. until now, performance
metrics, specifically the initial input processing and subsequent
generation durations, were derived from three timestamps: sequence
creation, first token generation, and generation completion. the
processing duration was the first token timestamp minus the sequence
creation timestamp, and the generation duration was the completion
timestamp minus the first token timestamp.

while this approach is an accurate end-to-end metric of processing and
generation, it's not comparable to other tools which only measure the
active, i.e. decode, duration.

this change updates the metrics to only capture decode duration so they
can be more directly compared to other tools.

2025-10-09 15:44:04 -07:00

cache_test.go

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

cache.go

refactor: use the built-in max/min to simplify the code (#12280)

2025-09-16 17:14:21 -07:00

image_test.go

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

image.go

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

runner.go

llamarunner: update metrics

2025-10-09 15:44:04 -07:00