ollama-for-amd/fs
Jesse Gross 71cb86af3e llm: Remove unneeded warning with flash attention enabled
If flash attention is enabled without KV cache quantization, we
currently always get this warning:
level=WARN source=server.go:226 msg="kv cache type not supported by model" type=""
2025-09-10 16:40:45 -07:00
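A minimal sketch of the condition this commit addresses, assuming the fix works by skipping the warning when no cache type was requested. The helper `shouldWarnKVCacheType` and the `supported` map are hypothetical illustrations, not the actual ollama code in server.go:

```go
package main

import (
	"fmt"
	"log/slog"
)

// shouldWarnKVCacheType reports whether the "kv cache type not supported"
// warning should be emitted. Hypothetical helper: an empty type means the
// user enabled flash attention without requesting KV cache quantization,
// so the default cache is used and there is nothing to warn about.
func shouldWarnKVCacheType(kvCacheType string, supported map[string]bool) bool {
	if kvCacheType == "" {
		return false // no quantization requested: stay silent
	}
	return !supported[kvCacheType]
}

func main() {
	// Hypothetical set of cache types the loaded model supports.
	supported := map[string]bool{"q8_0": true, "q4_0": true}

	for _, t := range []string{"", "q8_0", "f16"} {
		if shouldWarnKVCacheType(t, supported) {
			slog.Warn("kv cache type not supported by model", "type", t)
		} else {
			fmt.Printf("type=%q ok\n", t)
		}
	}
}
```

With this guard, the empty-type case no longer logs the spurious `type=""` warning, while genuinely unsupported quantization types still do.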