* feat: Bump llama.cpp to the latest master (17f7f4b)
This brings in significant improvements to prefill performance for all
models using the SSM_CONV and SSM_SCAN ops (granite4, jamba, falcon-h,
nemotron-h, Qwen3 Next) on Apple Metal.
See https://github.com/ggml-org/llama.cpp/pull/17876
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 1-4
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Update patches 5-12
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 13-18
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patch 20
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Update patches 21-31
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Sync vendored code
Two files here were judgment calls: the swap from gemma3-iswa.cpp to
gemma3.cpp, which I included because I believe it's required, and
`ggml-zendnn.h`, which I chose to omit.
Branch: LlamaCPPMetalSSMImprovements
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* Revert "vulkan: temporary cary of vulkan fixes (#12971)"
This reverts commit 3a9e8e9fd4.
* ggml update to b7087
* fix argsort on metal
* update to b7108
* fix bakllava regression
This model lacks the metadata for the projector type.
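  Since the metadata is absent, the fix presumably falls back to a default projector type. A minimal sketch of that pattern in Go, where the key name `clip.projector_type` and the `mlp` default are illustrative assumptions, not confirmed details of the actual fix:

  ```go
  package main

  import "fmt"

  // projectorType returns the projector type recorded in the model's
  // key/value metadata, falling back to a default when the key is absent.
  // The key name and the default are illustrative assumptions.
  func projectorType(kv map[string]string) string {
  	if t, ok := kv["clip.projector_type"]; ok && t != "" {
  		return t
  	}
  	return "mlp" // assumed default for older conversions missing the key
  }

  func main() {
  	// Older BakLLaVA conversions omit the projector-type key entirely.
  	legacy := map[string]string{"general.architecture": "clip"}
  	fmt.Println(projectorType(legacy)) // mlp
  }
  ```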
* update to b7209
* fix TopK perf
* only build arm code on arm
* add build to .dockerignore
* test: only build one arch
* add build to .gitignore
* fix ccache path
* filter amdgpu targets
* only filter if autodetecting
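  Taken together, the two entries above say: filter the AMDGPU target list only when it was autodetected; an explicit user-provided list is honored verbatim. A hedged Go sketch of that decision, with the environment variable and function names invented for illustration:

  ```go
  package main

  import (
  	"fmt"
  	"os"
  	"strings"
  )

  // amdgpuTargets returns the AMDGPU targets to build. If the user set an
  // explicit list (here via a hypothetical AMDGPU_TARGETS variable), it is
  // used as-is; only autodetected targets are filtered to the supported set.
  func amdgpuTargets(detected, supported []string) []string {
  	if env := os.Getenv("AMDGPU_TARGETS"); env != "" {
  		return strings.Split(env, ";") // explicit list: no filtering
  	}
  	keep := make([]string, 0, len(detected))
  	for _, t := range detected {
  		for _, s := range supported {
  			if t == s {
  				keep = append(keep, t)
  				break
  			}
  		}
  	}
  	return keep
  }

  func main() {
  	fmt.Println(amdgpuTargets(
  		[]string{"gfx1030", "gfx90c"},  // detected on this machine
  		[]string{"gfx1030", "gfx1100"}, // targets the build supports
  	))
  }
  ```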
* Don't clobber gpu list for default runner
This ensures the GPU-specific environment variables are set properly
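  Preserving the discovered GPU list matters because the vendor visibility variables are derived from it. A minimal sketch of that idea, assuming a simplified GPU record; the field names and wiring are illustrative, not the actual scheduler code:

  ```go
  package main

  import (
  	"os"
  	"strings"
  )

  type gpuInfo struct {
  	Library string // e.g. "cuda" or "rocm"
  	ID      string // device index as reported by the driver
  }

  // setVisibleDevices exports the vendor visibility variable for the GPUs
  // the runner should use. It leaves any value the user already set alone,
  // which is why clobbering the GPU list upstream would break this.
  func setVisibleDevices(gpus []gpuInfo) {
  	byVar := map[string][]string{}
  	for _, g := range gpus {
  		switch g.Library {
  		case "cuda":
  			byVar["CUDA_VISIBLE_DEVICES"] = append(byVar["CUDA_VISIBLE_DEVICES"], g.ID)
  		case "rocm":
  			byVar["ROCR_VISIBLE_DEVICES"] = append(byVar["ROCR_VISIBLE_DEVICES"], g.ID)
  		}
  	}
  	for name, ids := range byVar {
  		if os.Getenv(name) == "" {
  			os.Setenv(name, strings.Join(ids, ","))
  		}
  	}
  }

  func main() {
  	setVisibleDevices([]gpuInfo{{Library: "cuda", ID: "0"}, {Library: "cuda", ID: "1"}})
  }
  ```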
* explicitly set CXX compiler for HIP
* Update build_windows.ps1
This isn't complete, but it is close. Dependencies are missing, and it only builds the "default" preset.
* build: add ollama subdir
* add .git to .dockerignore
* docs: update development.md
* update build_darwin.sh
* remove unused scripts
* llm: add cwd and build/lib/ollama to library paths
* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
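  The two entries above combine naturally: build the runner's library search path from the working directory and build/lib/ollama, then mirror it into DYLD_LIBRARY_PATH on macOS when that variable is unset. A sketch under those assumptions; the exact paths the runner consults may differ:

  ```go
  package main

  import (
  	"os"
  	"path/filepath"
  	"runtime"
  	"strings"
  )

  // libraryPaths assembles the search path for the runner's shared
  // libraries: the current working directory plus build/lib/ollama,
  // prepended to whatever LD_LIBRARY_PATH already holds.
  func libraryPaths() string {
  	cwd, _ := os.Getwd()
  	paths := []string{cwd, filepath.Join(cwd, "build", "lib", "ollama")}
  	if existing := os.Getenv("LD_LIBRARY_PATH"); existing != "" {
  		paths = append(paths, existing)
  	}
  	return strings.Join(paths, string(os.PathListSeparator))
  }

  func main() {
  	lp := libraryPaths()
  	os.Setenv("LD_LIBRARY_PATH", lp)
  	// On macOS the dynamic loader reads DYLD_LIBRARY_PATH instead, so
  	// default it to the same value when the user has not set one.
  	if runtime.GOOS == "darwin" && os.Getenv("DYLD_LIBRARY_PATH") == "" {
  		os.Setenv("DYLD_LIBRARY_PATH", lp)
  	}
  }
  ```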
* add additional cmake output vars for msvc
* interim edits to make server detection logic work with DLL directories like lib/ollama/cuda_v12
* remove unnecessary filepath.Dir, cleanup
* add hardware-specific directory to path
* use absolute server path
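  Supporting versioned subdirectories such as lib/ollama/cuda_v12 means detection has to enumerate children of the library root (resolved to an absolute path) rather than expect a flat layout. A hedged sketch of that scan; the directory names and layout are assumptions based on the example a few entries above:

  ```go
  package main

  import (
  	"fmt"
  	"os"
  	"path/filepath"
  	"strings"
  )

  // gpuVariantDirs returns absolute paths of per-backend subdirectories
  // (cuda_v12, rocm, ...) under the library root, so the server can probe
  // each one instead of assuming a flat lib/ollama directory.
  func gpuVariantDirs(libRoot string) ([]string, error) {
  	root, err := filepath.Abs(libRoot)
  	if err != nil {
  		return nil, err
  	}
  	entries, err := os.ReadDir(root)
  	if err != nil {
  		return nil, err
  	}
  	var dirs []string
  	for _, e := range entries {
  		if e.IsDir() && (strings.HasPrefix(e.Name(), "cuda") || strings.HasPrefix(e.Name(), "rocm")) {
  			dirs = append(dirs, filepath.Join(root, e.Name()))
  		}
  	}
  	return dirs, nil
  }

  func main() {
  	dirs, err := gpuVariantDirs(filepath.Join("lib", "ollama"))
  	if err != nil {
  		fmt.Println("no variant dirs:", err)
  		return
  	}
  	fmt.Println(dirs)
  }
  ```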
* build: linux arm
* cmake install targets
* remove unused files
* ml: visit each library path once
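  Visiting each library path once amounts to an order-preserving dedup before the walk. A minimal sketch:

  ```go
  package main

  import "fmt"

  // dedupePaths removes duplicate entries while preserving first-seen
  // order, so each library path is visited exactly once.
  func dedupePaths(paths []string) []string {
  	seen := make(map[string]struct{}, len(paths))
  	out := make([]string, 0, len(paths))
  	for _, p := range paths {
  		if _, ok := seen[p]; ok {
  			continue
  		}
  		seen[p] = struct{}{}
  		out = append(out, p)
  	}
  	return out
  }

  func main() {
  	fmt.Println(dedupePaths([]string{"/usr/lib", "build/lib/ollama", "/usr/lib"}))
  }
  ```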
* build: skip cpu variants on arm
* build: install cpu targets
* build: fix workflow
* shorter names
* fix rocblas install
* docs: clean up development.md
* consistent build dir removal in development.md
* silence -Wimplicit-function-declaration build warnings in ggml-cpu
* update readme
* update development readme
* llm: update library lookup logic now that there is one runner (#8587)
* tweak development.md
* update docs
* add windows cuda/rocm tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>