llm: New memory management

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 14:26:30 +00:00

This changes the memory allocation strategy from upfront estimation to
tracking actual allocations done by the engine and reacting to that. The
goal is to avoid issues caused by both under-estimation (crashing) and
over-estimation (low performance due to under-utilized GPUs).
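As a rough illustration of "tracking actual allocations and reacting", the sketch below tries the most aggressive GPU layout first and backs off whenever a real allocation fails. It is not the code from this commit: `fitLayers`, `allocFunc`, and `errNoMemory` are hypothetical names, and the one-layer-at-a-time backoff is an assumed policy chosen for simplicity.

```go
package main

import "errors"

// errNoMemory is a hypothetical sentinel for a failed device allocation.
var errNoMemory = errors.New("out of device memory")

// allocFunc is a hypothetical hook that performs the real allocation for
// gpuLayers layers and reports how many bytes were actually used.
type allocFunc func(gpuLayers int) (used uint64, err error)

// fitLayers starts with every layer on the GPU and removes one layer each
// time the allocation fails with errNoMemory. Any other error is returned
// unchanged. No size estimate is consulted; only real results drive the loop.
func fitLayers(totalLayers int, try allocFunc) (int, uint64, error) {
	for n := totalLayers; n >= 0; n-- {
		used, err := try(n)
		if err == nil {
			return n, used, nil
		}
		if !errors.Is(err, errNoMemory) {
			return 0, 0, err
		}
	}
	return 0, 0, errNoMemory
}
```

With a fake allocator that has 1000 units free and charges 300 per layer, `fitLayers(10, try)` settles on 3 layers and 900 units used. Under-estimation cannot crash the load because failures are handled, and over-estimation cannot leave the GPU idle because the largest layout that actually fits is the one kept.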

It is currently opt-in and can be enabled for models running on the
Ollama engine by setting OLLAMA_NEW_ESTIMATES=1. Behavior in other
cases is unchanged and will continue to use the existing estimates.
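The opt-in can be pictured as a small gate like the one below. This is a sketch under assumptions, not the project's actual configuration code: `useNewEstimates` is a hypothetical helper, the lookup function is injected so the example stays testable, and parsing with `strconv.ParseBool` is an assumption about which values count as enabled (the commit message only documents `1`).

```go
package main

import "strconv"

// useNewEstimates reports whether the new memory management should be used.
// getenv is an injected lookup (for example os.Getenv); onOllamaEngine says
// whether the model runs on the Ollama engine. Both conditions must hold,
// otherwise the existing estimates remain in effect.
func useNewEstimates(getenv func(string) string, onOllamaEngine bool) bool {
	enabled, err := strconv.ParseBool(getenv("OLLAMA_NEW_ESTIMATES"))
	if err != nil {
		// Unset or unparsable values leave the feature off (assumed policy).
		return false
	}
	return enabled && onOllamaEngine
}
```

Calling `useNewEstimates(os.Getenv, true)` after exporting `OLLAMA_NEW_ESTIMATES=1` returns true; with the variable unset, or with a model on another engine, it returns false.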


Jesse Gross

2025-05-29 12:21:48 -07:00

committed by

Jesse Gross

parent ef7d26ba2c

commit d5a0d8d904

26 changed files with 1860 additions and 900 deletions

1049 llm/server.go

File diff suppressed because it is too large
