Files
ollama-for-amd/server/sched.go
Daniel Hiltgen ff4f0cbd1d Prevent multiple concurrent loads on the same gpus
While models are loading, the VRAM metrics are dynamic, so try
to load on a GPU that doesn't have a model actively loading, or wait
to avoid races that lead to OOMs
2024-06-14 14:51:40 -07:00

24 KiB