ollama-for-amd/server/quantization.go at a4770107a6ea6b4f5adc235d37d08417dc3b9184

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 22:33:56 +00:00


Michael Yang d0b32def60 skip quantizing per_layer_token_embd (#11207)

this tensor isn't compatible with cuda when quantized to q4_K so skip it

2025-06-26 21:49:35 -07:00
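
The commit message above describes excluding one tensor from quantization. As a rough illustration only, the check might look like the sketch below. The function name `skipQuantization` and the substring match are assumptions made for this example; they are not taken from `quantization.go`, whose actual logic is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// skipQuantization reports whether a tensor should keep its original
// precision instead of being quantized.
// Hypothetical helper: the name and the substring match are assumptions
// for illustration, not the repository's actual implementation.
func skipQuantization(tensorName string) bool {
	// Per the commit message, per_layer_token_embd is not compatible
	// with CUDA when quantized to q4_K, so it is left unquantized.
	return strings.Contains(tensorName, "per_layer_token_embd")
}

func main() {
	// Example tensor names, chosen for illustration.
	names := []string{"per_layer_token_embd.weight", "blk.0.attn_q.weight"}
	for _, name := range names {
		fmt.Printf("%s skip=%v\n", name, skipQuantization(name))
	}
}
```

A name-based check like this keeps the exclusion in one place, so further incompatible tensors could be added without touching the rest of a quantization loop.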
