sched: fix race leading to orphaned runners (#10599)

If a model is loading and the request context is canceled during the load
(the client closed the connection), while another request arrives for the
same model with a different configuration (context size, etc.) that
requires a reload, two unload events can be in flight.  The first shuts
down the original model load, but the second causes the reference to the
new reloading runner to be lost, which leaks that runner.

The primary fix is detecting the duplicate unload and ignoring the second
instance.  The load routine is also hardened to ensure we detect
clobbering an already present runner and unload it with a warning.
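
As an illustration only, the two guards described above can be sketched as
follows.  This is a minimal sketch, not the code from this commit: the
`runner` and `scheduler` types, their fields, and the `load`/`unload`
methods are hypothetical stand-ins, and the real scheduler tracks far more
state.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for a loaded model runner.
type runner struct {
	id       int
	model    string
	unloaded bool
}

// scheduler is a hypothetical, heavily simplified scheduler that keeps
// at most one runner per model name.
type scheduler struct {
	mu     sync.Mutex
	loaded map[string]*runner
}

func newScheduler() *scheduler {
	return &scheduler{loaded: make(map[string]*runner)}
}

// load registers r. If a different runner is already registered for the
// same model, it is unloaded with a warning rather than silently
// overwritten, so it cannot be orphaned.
func (s *scheduler) load(r *runner) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.loaded[r.model]; ok && old != r {
		fmt.Printf("warning: runner %d already present for %q, unloading it\n", old.id, old.model)
		old.unloaded = true
	}
	s.loaded[r.model] = r
}

// unload handles one unload event for r. It reports false when the event
// is a duplicate or stale, i.e. r is no longer the registered runner, and
// in that case leaves the current registration untouched.
func (s *scheduler) unload(r *runner) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.loaded[r.model]
	if !ok || cur != r {
		return false // duplicate unload: ignore it
	}
	r.unloaded = true
	delete(s.loaded, r.model)
	return true
}

func main() {
	s := newScheduler()

	first := &runner{id: 1, model: "m"}
	s.load(first)
	fmt.Println("first unload handled:", s.unload(first))

	// A reload with a different configuration registers a new runner.
	second := &runner{id: 2, model: "m"}
	s.load(second)

	// The second unload event for the original runner must not drop the
	// reference to the new runner.
	fmt.Println("duplicate unload handled:", s.unload(first))
	fmt.Println("new runner still tracked:", s.loaded["m"] == second)
}
```

In this sketch the duplicate is detected by comparing runner identity under
a single lock; the actual change may detect it differently.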
Daniel Hiltgen
2025-05-07 09:38:17 -07:00
committed by GitHub
parent 392de84031
commit 5e380c3b42
2 changed files with 40 additions and 20 deletions

@@ -1010,17 +1010,17 @@ func (s *llmServer) Close() error {
 	s.llamaModelLock.Unlock()
 	if s.cmd != nil {
-		slog.Debug("stopping llama server")
+		slog.Debug("stopping llama server", "pid", s.Pid())
 		if err := s.cmd.Process.Kill(); err != nil {
 			return err
 		}
 		// if ProcessState is already populated, Wait already completed, no need to wait again
 		if s.cmd.ProcessState == nil {
-			slog.Debug("waiting for llama server to exit")
+			slog.Debug("waiting for llama server to exit", "pid", s.Pid())
 			<-s.done
 		}
-		slog.Debug("llama server stopped")
+		slog.Debug("llama server stopped", "pid", s.Pid())
 	}
 	return nil