If flash attention is enabled without KV cache quantization, we currently always get this warning: level=WARN source=server.go:226 msg="kv cache type not supported by model" type=""
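
The warning fires even though `type` is empty, i.e. no quantized KV cache was requested at all. A minimal Go sketch of the guard this implies is below; the function and parameter names (`validateKVCacheType`, `flashAttnEnabled`, `supported`) are illustrative assumptions, not the actual identifiers in server.go:

```go
package main

import "log/slog"

// validateKVCacheType is a hypothetical helper: it returns the KV cache type
// to use, warning and falling back to the default ("") when the requested
// type cannot be used.
func validateKVCacheType(kvCacheType string, flashAttnEnabled bool, supported func(string) bool) string {
	// An empty type means no KV cache quantization was requested,
	// so there is nothing to validate and no warning to emit.
	if kvCacheType == "" {
		return ""
	}
	if !flashAttnEnabled {
		slog.Warn("kv cache quantization requires flash attention", "type", kvCacheType)
		return ""
	}
	if !supported(kvCacheType) {
		slog.Warn("kv cache type not supported by model", "type", kvCacheType)
		return ""
	}
	return kvCacheType
}

func main() {
	isSupported := func(t string) bool { return t == "q8_0" || t == "q4_0" }

	// Flash attention on, no quantization requested: no warning expected.
	_ = validateKVCacheType("", true, isSupported)

	// Unsupported type requested: warning is expected here.
	_ = validateKVCacheType("q5_1", true, isSupported)
}
```

With the empty-type check first, enabling flash attention on its own no longer produces the spurious "kv cache type not supported by model" message; the warning only appears when a specific, unusable cache type was actually requested.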