Commit Graph

12 Commits

Author SHA1 Message Date
Jeffrey Morgan
2dfb74410d model: fix rotary embeddings for ministral 3 (#13432) 2025-12-11 16:02:05 -08:00
Jeffrey Morgan
a838421ea3 model: conversion and hyperparameter fixes for ministral and devstral (#13424) 2025-12-11 13:04:00 -08:00
Michael Yang
603ceefaa6 refactor rope
Change to a flatter directory structure and group the RoPE options with the function that applies them.

Update models to call rope in one place (a sketch of the pattern follows this entry).
2025-12-08 14:42:22 -08:00
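
Grouping options with a function in Go is typically done with the functional-options pattern, so "call rope in one place" plausibly looks like the following. This is a minimal sketch with hypothetical names (Tensor, RoPE, WithBase, WithScale are illustrative, not ollama's actual API):

```go
package main

import "fmt"

// Tensor stands in for the backend tensor type; a placeholder so this
// sketch is self-contained.
type Tensor struct{ name string }

// RopeOptions groups every RoPE knob in one struct next to the function
// that consumes it, instead of a long positional parameter list.
type RopeOptions struct {
	Dim   int     // number of dimensions to rotate
	Base  float32 // frequency base (theta)
	Scale float32 // linear position scale
}

// RopeOption lets call sites override only the knobs they care about.
type RopeOption func(*RopeOptions)

func WithBase(base float32) RopeOption   { return func(o *RopeOptions) { o.Base = base } }
func WithScale(scale float32) RopeOption { return func(o *RopeOptions) { o.Scale = scale } }

// RoPE is the single entry point models call; defaults live here, so
// every model picks them up consistently.
func RoPE(t, positions Tensor, dim int, opts ...RopeOption) Tensor {
	o := RopeOptions{Dim: dim, Base: 10000, Scale: 1}
	for _, opt := range opts {
		opt(&o)
	}
	fmt.Printf("rope(%s): dim=%d base=%g scale=%g\n", t.name, o.Dim, o.Base, o.Scale)
	return t // the real implementation would hand o to the backend kernel
}

func main() {
	key, positions := Tensor{"key"}, Tensor{"positions"}
	RoPE(key, positions, 128, WithBase(1e6)) // one call site per model
}
```

The upside of this shape is that a model that needs a non-default base or scale names only that option, while every other model inherits the defaults from the single rope entry point.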
Patrick Devine
d3e0a0dee4 model: ministral w/ llama4 scaling (#13292)
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling (a sketch follows this entry)
* adds a new ministral parser for reasoning and tool calling

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2025-12-01 23:20:14 -08:00
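
As I understand it, the "llama4 scaling" referenced above is the wavelength-dependent RoPE frequency scaling introduced with llama3.1-style configs and reused by later llama-family models: long-wavelength (low-frequency) components are divided by a scale factor, short-wavelength components are left alone, and the band in between is interpolated. A hedged sketch of that computation (function and parameter names are illustrative, not ollama's):

```go
package main

import (
	"fmt"
	"math"
)

// scaleRopeFreqs applies llama3/llama4-style frequency scaling: low
// frequencies (long wavelengths) are divided by factor, high frequencies
// are kept as-is, and the band in between is smoothly interpolated.
func scaleRopeFreqs(invFreqs []float64, factor, lowFreqFactor, highFreqFactor, originalContextLen float64) []float64 {
	lowFreqWavelen := originalContextLen / lowFreqFactor
	highFreqWavelen := originalContextLen / highFreqFactor

	out := make([]float64, len(invFreqs))
	for i, f := range invFreqs {
		wavelen := 2 * math.Pi / f
		switch {
		case wavelen < highFreqWavelen: // high frequency: leave untouched
			out[i] = f
		case wavelen > lowFreqWavelen: // low frequency: scale down
			out[i] = f / factor
		default: // mid band: interpolate between the two regimes
			smooth := (originalContextLen/wavelen - lowFreqFactor) /
				(highFreqFactor - lowFreqFactor)
			out[i] = (1-smooth)*f/factor + smooth*f
		}
	}
	return out
}

func main() {
	// Toy inverse frequencies; real models derive these from theta and head dim.
	freqs := []float64{1, 0.1, 0.01, 0.001}
	fmt.Println(scaleRopeFreqs(freqs, 8, 1, 4, 8192))
}
```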
Michael Yang
564b558c92 fix(llama): other llama flavours (#12308)
* fix(llama): rope scale

* support SPM (SentencePiece) llama models

* skip moe models

* cleanup
2025-09-17 12:12:21 -07:00
Michael Yang
ad95d5b30b use split activations when possible (#12293)
* use ggml_*_split activations when possible

* forward qkv
2025-09-16 09:51:19 -07:00
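
The split activations mentioned above appear to be GLU-style ops that take the gate and up projections as two separate input tensors rather than one fused tensor, avoiding a concatenation upstream. A plain-Go sketch of the two equivalent SwiGLU paths (illustrative, not ggml's actual signatures):

```go
package main

import (
	"fmt"
	"math"
)

func silu(x float64) float64 { return x / (1 + math.Exp(-x)) }

// swiglu computes silu(gate) * up over a single fused tensor whose first
// half is the gate projection and second half is the up projection.
func swiglu(fused []float64) []float64 {
	n := len(fused) / 2
	return swigluSplit(fused[:n], fused[n:])
}

// swigluSplit is the "split" variant: gate and up arrive as separate
// tensors, so no concatenation (and no extra copy) is needed upstream.
func swigluSplit(gate, up []float64) []float64 {
	out := make([]float64, len(gate))
	for i := range gate {
		out[i] = silu(gate[i]) * up[i]
	}
	return out
}

func main() {
	fused := []float64{1, 2, 0.5, -0.5} // gate = {1, 2}, up = {0.5, -0.5}
	fmt.Println(swiglu(fused))
	fmt.Println(swigluSplit([]float64{1, 2}, []float64{0.5, -0.5}))
}
```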
Michael Yang
9ed8bf14cb ml: add more rope options (#10775) 2025-05-20 15:51:08 -07:00
Michael Yang
ff180c3466 fix llama and mistral3 models (#10774)
* fix llama model

* fix mistral3.1 model

do not set default vision layers
2025-05-19 15:06:35 -07:00
Jesse Gross
3c14461d5d ollamarunner: Separate text and multimodal graphs
For some multimodal models (such as gemma3), we create a single
graph that generates the image embedding and then use this in the
text model. The embedding tensor is completely opaque to the runner.

However, this doesn't work if we need to use the embedding in multiple
batches. This can arise if the embedding is larger than the batch size.
In these cases (as with llama4), we would like to create views that
are more appropriately sized. However, if we do this then the original
source tensor is used in multiple graphs, which isn't allowed. To
avoid that problem, models with this pattern compute the embedding
tensor on first use and recreate the individual views. There is no
longer a single vision and text graph.

This codifies the pattern of separating vision and text graphs. The
logic of computing tensors on demand is moved to the runner, so models
no longer have to worry about this. It also gives the runner visibility
into the multimodal tensors, which is important for memory management.
2025-05-15 13:46:20 -07:00
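
A minimal sketch of the compute-on-first-use pattern described above, using invented placeholder types (Tensor, computeVision, and view are illustrative, not the runner's real API):

```go
package main

import "fmt"

// Tensor is a stand-in for the backend tensor type.
type Tensor struct{ data []float32 }

// multimodalEmbedding holds a vision embedding that is computed lazily,
// on the first batch that actually consumes it.
type multimodalEmbedding struct {
	pixels   []float32
	computed *Tensor
}

// computeVision stands in for running the vision graph.
func computeVision(pixels []float32) *Tensor {
	return &Tensor{data: pixels}
}

// view returns a batch-sized window into the embedding, so an embedding
// larger than the batch can be consumed across several batches without
// the source tensor appearing in more than one graph.
func (m *multimodalEmbedding) view(offset, n int) *Tensor {
	if m.computed == nil { // first use: run the vision graph once
		m.computed = computeVision(m.pixels)
	}
	return &Tensor{data: m.computed.data[offset : offset+n]}
}

func main() {
	emb := &multimodalEmbedding{pixels: make([]float32, 8)}
	batchSize := 4
	for off := 0; off < 8; off += batchSize {
		fmt.Println("batch view len:", len(emb.view(off, batchSize).data))
	}
}
```

Moving this logic into the runner, as the message says, means individual models no longer implement the lazy computation themselves, and the runner can account for the embedding tensors when managing memory.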
Michael Yang
526b2ed102 fix vocabulary (#10679) 2025-05-12 17:29:46 -07:00
Michael Yang
d26c18e25c fix token type 2025-04-25 16:59:01 -07:00
Bruce MacDonald
6bd0a983cd model: support for mistral-small in the ollama runner
Mistral is a popular research lab that releases open-source models. This change updates the forward pass of llama-architecture models to support both llama and mistral models by accounting for the additional metadata present in mistral models and finding the correct dimensions for the output projection (a sketch follows this entry).
2025-04-03 16:57:36 -07:00
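
A hedged sketch of the metadata handling this message alludes to: mistral-small checkpoints carry an explicit per-head dimension that differs from hidden_size / num_heads, so the attention output projection must be sized from the metadata rather than derived (field names here are illustrative, not the actual config keys):

```go
package main

import "fmt"

// modelParams mimics hyperparameters read from a checkpoint; HeadDim may
// be zero when the metadata does not set it explicitly.
type modelParams struct {
	HiddenSize int
	NumHeads   int
	HeadDim    int // explicit per-head dim, e.g. mistral-small sets 128
}

// headDim prefers the explicit metadata value and falls back to the
// classic hiddenSize/numHeads derivation used by plain llama models.
func (p modelParams) headDim() int {
	if p.HeadDim > 0 {
		return p.HeadDim
	}
	return p.HiddenSize / p.NumHeads
}

func main() {
	llama := modelParams{HiddenSize: 4096, NumHeads: 32}
	mistral := modelParams{HiddenSize: 5120, NumHeads: 32, HeadDim: 128}

	// The output projection maps numHeads*headDim back to hiddenSize.
	// For mistral-small, 32*128 = 4096 != 5120, so deriving headDim as
	// hiddenSize/numHeads would size the projection incorrectly.
	fmt.Println(llama.NumHeads*llama.headDim(), "->", llama.HiddenSize)
	fmt.Println(mistral.NumHeads*mistral.headDim(), "->", mistral.HiddenSize)
}
```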