ollama-for-amd

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 14:26:30 +00:00

Files

Jesse Gross 4372d0bfef llamarunner: Respect device ordering for offloaded layers

We used to control the way that llama.cpp saw devices using
CUDA_VISIBLE_DEVICES or similar. This would ensure that the layers
offloaded to a device were actually the ones intended. This is
particularly important because we might reorder devices based on
free memory or performance.

When we started explicitly scheduling layers, this logic went
away but the llamarunner didn't have any way to set the correct
order of devices. This meant that the correct number of layers
would be assigned to a device but not necessarily the layers
that were expected. This change sets up the devices correctly
based on the offload information.

2025-11-11 13:11:08 -08:00

backend

Remove unnecessary MacOs 13 and lower Patches (#12656)

2025-11-06 15:52:56 -08:00

ggml update to b6840 (#12791)

2025-11-06 10:19:22 -08:00

backend.go

ggml: Enable op_offload to improve partial offload performance

2025-10-30 13:53:10 -07:00

device.go

llamarunner: Respect device ordering for offloaded layers

2025-11-11 13:11:08 -08:00

path.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00