As with the llama engine, quantizing the KV cache requires flash attention to be enabled on the Ollama server.
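
For example, a minimal sketch of starting the server with both settings, assuming the standard `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables:

```shell
# Enable flash attention (required for KV cache quantization) and
# select a quantized KV cache type (f16, q8_0, or q4_0).
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```

With `q8_0`, the KV cache uses roughly half the memory of the default `f16` at a small cost in precision; if flash attention is not enabled, the quantized cache type is not applied.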