The GGML scale kernel uses signed 32-bit ints to represent the number of elements in the tensor. For large images, mistral-small3.2 overflows this count, and the resulting negative arguments trigger CUDA errors.

Currently this only happens when the user passes a large image to mistral-small3.2. However, with the upcoming changes to reserve CUDA memory, it happens every time mistral-small is loaded, because we reserve using a worst-case batch.

This patch is part of an upstream GGML commit and should be removed after GGML is updated past 0a1b398 "ggml: add ops for WAN video model (cuda && cpu) (#15669)".

Fixes #10388