Files
ollama-for-amd/ml
Jesse Gross efaee8c2d6 ggml: Backport scale kernel fixes
The GGML scale kernel uses signed 32-bit ints to represent
the number of elements in the tensor. For large images,
mistral-small3.2 overflows this, triggering CUDA errors due
to negative arguments.

Currently, this can happen when the user passes a large image
to mistral-small3.2. However, with upcoming changes to reserve
CUDA memory, it happens every time mistral-small is loaded as
we reserve using a worst case batch.

This patch is part of an upstream GGML commit and should be removed
after GGML is updated past 0a1b398 "ggml: add ops for WAN video model
(cuda && cpu) (#15669)".

Fixes #10388
2025-09-30 15:04:43 -07:00
..
2025-09-30 15:04:43 -07:00