Bring back escape valve for llm libraries and fix Jetpack6 crash (#12529)
* Bring back escape valve for llm libraries

  If the new discovery logic picks the wrong library, this gives users the ability to force a specific one using the same pattern as before. It can also speed up bootstrap discovery when one of the libraries takes a long time to load and ultimately binds to no devices; for example, unsupported AMD iGPUs can sometimes take a while to discover and rule out.

* Bypass extra discovery on Jetpack systems

  On at least Jetpack 6, cuda_v12 appears to expose the iGPU but crashes later in cuBLAS initialization, so if we detect a Jetpack system, short-circuit and use that variant.
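For context, here is a minimal Go sketch of the two behaviors the message describes. All names in it (`filterLibraries`, `isJetpack`, the `/etc/nv_tegra_release` check) are illustrative assumptions for this example, not identifiers from the Ollama codebase.

```go
// Illustrative sketch only: filterLibraries, isJetpack, and the Tegra
// release-file heuristic are assumptions, not Ollama's actual code.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// filterLibraries applies the OLLAMA_LLM_LIBRARY escape valve: when the
// variable is set, only library directories whose name contains the
// requested value are probed, skipping slow-to-load variants entirely.
func filterLibraries(libPaths []string) []string {
	requested := os.Getenv("OLLAMA_LLM_LIBRARY")
	if requested == "" {
		return libPaths // no override: normal autodetection
	}
	var filtered []string
	for _, p := range libPaths {
		if strings.Contains(filepath.Base(p), requested) {
			filtered = append(filtered, p)
		}
	}
	return filtered
}

// isJetpack reports whether this looks like an NVIDIA Jetson/Jetpack
// system by checking for the L4T release file (an assumed heuristic).
// On Jetpack 6, probing cuda_v12 exposes the iGPU but later crashes in
// cuBLAS initialization, so discovery should short-circuit to the
// Jetpack variant instead of probing further.
func isJetpack() bool {
	_, err := os.Stat("/etc/nv_tegra_release")
	return err == nil
}

func main() {
	paths := []string{
		"/usr/lib/ollama/cuda_v12",
		"/usr/lib/ollama/cuda_v13",
		"/usr/lib/ollama/rocm",
	}
	fmt.Println("libraries to probe:", filterLibraries(paths))
	fmt.Println("jetpack detected:", isJetpack())
}
```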
@@ -48,16 +48,10 @@ Dynamic LLM libraries [rocm_v6 cpu cpu_avx cpu_avx2 cuda_v12 rocm_v5]
 
 **Experimental LLM Library Override**
 
-You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to bypass autodetection, so for example, if you have a CUDA card, but want to force the CPU LLM library with AVX2 vector support, use:
+You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to limit autodetection, so for example, if you have both CUDA and AMD GPUs but want to force only the CUDA v13 library, use:
 
 ```shell
-OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve
-```
-
-You can see what features your CPU has with the following.
-
-```shell
-cat /proc/cpuinfo| grep flags | head -1
+OLLAMA_LLM_LIBRARY="cuda_v13" ollama serve
 ```
 
 ## Installing older or pre-release versions on Linux