ollama-for-amd/llama/patches at 3d0b1734c006798960a56acb0ea23ea57e0dd1d9 - ollama-for-amd - Git.NotJustAnna.net

mirrors/ollama-for-amd

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 22:33:56 +00:00

Jesse Gross 3d0b1734c0 ggml: Preallocate CUDA pool memory

The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst case graph reservation and therefore not included in
scheduling. This means that as these buffers potentially grow
with context length, we could crash.

This extends the memory allocation system down a layer, from the GGML
graph to the CUDA layer, preallocating the worst-case memory there
as well.

Fixes #11753

2025-09-30 15:04:43 -07:00
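The commit message describes a two-phase idea. First, measure how much scratch memory the backend would need while reserving the worst-case graph. Then allocate that amount up front so that computation never has to grow the pool. The Python model below is a minimal sketch of that idea under stated assumptions. `PoolSketch`, `run_graph`, and the byte counts are hypothetical illustrations, not ggml's CUDA pool API, and real device memory, alignment, and fragmentation are ignored.

```python
class PoolSketch:
    """Hypothetical model of a backend scratch pool (not ggml's API).

    While measuring, alloc() only records the peak outstanding size.
    After reserve(), allocations are checked against that fixed worst case.
    """

    def __init__(self):
        self.measuring = True
        self.in_use = 0
        self.peak = 0
        self.capacity = 0

    def alloc(self, nbytes):
        needed = self.in_use + nbytes
        if self.measuring:
            # Worst-case reservation pass: track the high-water mark only.
            self.peak = max(self.peak, needed)
        elif needed > self.capacity:
            # Growing during compute is exactly what preallocation avoids.
            raise MemoryError(f"pool exhausted: need {needed} bytes, reserved {self.capacity}")
        self.in_use = needed

    def free(self, nbytes):
        self.in_use -= nbytes

    def reserve(self):
        # Size the pool to the measured worst case and leave measuring mode.
        self.capacity = self.peak
        self.measuring = False
        self.in_use = 0
        return self.capacity


def run_graph(pool, ctx_len):
    # Hypothetical graph whose intermediate buffers grow with context length.
    a = 4 * ctx_len
    b = 2 * ctx_len
    pool.alloc(a)
    pool.alloc(b)
    pool.free(b)
    pool.free(a)
```

In this model, running the graph once at the maximum context length before calling `reserve()` captures a peak of `6 * ctx_len` bytes. Any later run at or below that context length then fits, and a longer run fails loudly instead of growing memory that the scheduler never accounted for.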

.gitignore

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0001-ggml-backend-malloc-and-free-using-the-same-compiler.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0002-pretokenizer.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0003-clip-unicode.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0004-solar-pro.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0005-fix-deepseek-deseret-regex.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0006-maintain-ordering-for-rules-for-grammar.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0007-sort-devices-by-score.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0008-add-phony-target-ggml-cpu-for-all-cpu-variants.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0009-remove-amx.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0010-fix-string-arr-kv-loading.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0011-ollama-debug-tensor.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0012-add-ollama-vocab-for-grammar-support.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0013-add-argsort-and-cuda-copy-for-i32.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0014-graph-memory-reporting-on-failure.patch

ggml: Remove allocation status reporting

2025-09-30 15:04:43 -07:00

0015-ggml-Export-GPU-UUIDs.patch

update vendored llama.cpp and ggml (#11823)

2025-08-14 14:42:58 -07:00

0016-add-C-API-for-mtmd_input_text.patch

llm: New memory management

2025-08-14 15:24:01 -07:00

0017-no-power-throttling-win32-with-gnuc.patch

llm: New memory management

2025-08-14 15:24:01 -07:00

0018-BF16-macos-version-guard.patch

llm: New memory management

2025-08-14 15:24:01 -07:00

0019-Enable-CUDA-Graphs-for-gemma3n.patch

disable output_all (#11959)

2025-08-18 17:45:40 -07:00

0020-Disable-ggml-blas-on-macos-v13-and-older.patch

llm: New memory management

2025-08-14 15:24:01 -07:00

0021-fix-mtmd-audio.cpp-build-on-windows.patch

llm: New memory management

2025-08-14 15:24:01 -07:00

0022-ggml-No-alloc-mode.patch

ggml: Preallocate CUDA pool memory

2025-09-30 15:04:43 -07:00

0023-decode-disable-output_all.patch

disable output_all (#11959)

2025-08-18 17:45:40 -07:00

0024-ggml-Enable-resetting-backend-devices.patch

ggml: Avoid allocating CUDA primary context on unused GPUs

2025-08-27 16:24:18 -07:00

0025-harden-uncaught-exception-registration.patch

harden uncaught exception registration (#12120)

2025-09-02 09:43:55 -07:00

0026-ggml-Backport-scale-kernel-fixes.patch

ggml: Backport scale kernel fixes

2025-09-30 15:04:43 -07:00