docs: add docs for docs.ollama.com (#12805)

Parth Sareen
2025-10-28 13:18:48 -07:00
committed by GitHub
parent 6d02a43a75
commit 3d99d9779a
74 changed files with 4997 additions and 2175 deletions


@@ -1,9 +1,8 @@
---
title: Modelfile Reference
---
> [!NOTE]
> `Modelfile` syntax is in development
A Modelfile is the blueprint to create and share customized models using Ollama.
## Table of Contents
@@ -73,26 +72,23 @@ To view the Modelfile of a given model, use the `ollama show --modelfile` comman
ollama show --modelfile llama3.2
```
```
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3.2:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
```
## Instructions
@@ -110,10 +106,13 @@ FROM <model name>:<tag>
FROM llama3.2
```
<Card title="Base Models" href="https://github.com/ollama/ollama#model-library">
A list of available base models
</Card>
<Card title="Base Models" href="https://ollama.com/library">
Additional models can be found at
</Card>
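
For illustration, a minimal Modelfile that customizes an existing base model might look like the following (the model name and system prompt below are placeholders, not part of the reference):

```
FROM llama3.2

# make answers more deterministic
PARAMETER temperature 0.3

# give the model a persona
SYSTEM You are a concise assistant that answers in one or two sentences.
```

It can then be built and run with `ollama create concise -f ./Modelfile` followed by `ollama run concise`.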
#### Build from a Safetensors model
@@ -124,10 +123,11 @@ FROM <model directory>
The model directory should contain the Safetensors weights for a supported architecture.
Currently supported model architectures:
- Llama (including Llama 2, Llama 3, Llama 3.1, and Llama 3.2)
- Mistral (including Mistral 1, Mistral 2, and Mixtral)
- Gemma (including Gemma 1 and Gemma 2)
- Phi3
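
For example, if Safetensors weights for one of these architectures have been downloaded into a local directory (the path below is hypothetical), the Modelfile only needs a `FROM` line pointing at it:

```
FROM ./mistral-7b-instruct-safetensors
```

Running `ollama create my-mistral -f ./Modelfile` then imports the weights into Ollama.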
#### Build from a GGUF file
@@ -137,7 +137,6 @@ FROM ./ollama-model.gguf
The GGUF file location should be specified as an absolute path or relative to the `Modelfile` location.
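
For example, both of the following forms are valid, assuming the file exists at the given location (paths are illustrative):

```
# relative to the Modelfile
FROM ./ollama-model.gguf

# absolute path
FROM /models/ollama-model.gguf
```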
### PARAMETER
The `PARAMETER` instruction defines a parameter that can be set when the model is run.
@@ -148,18 +147,21 @@ PARAMETER <parameter> <parametervalue>
#### Valid Parameters and Values
| Parameter | Description | Value Type | Example Usage |
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | -------------------- |
| mirostat | Enable Mirostat sampling for controlling perplexity. (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0) | int | mirostat 0 |
| mirostat_eta | Influences how quickly the algorithm responds to feedback from the generated text. A lower learning rate will result in slower adjustments, while a higher learning rate will make the algorithm more responsive. (Default: 0.1) | float | mirostat_eta 0.1 |
| mirostat_tau | Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0) | float | mirostat_tau 5.0 |
| num_ctx | Sets the size of the context window used to generate the next token. (Default: 2048) | int | num_ctx 4096 |
| repeat_last_n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate `stop` parameters in a modelfile. | string | stop "AI assistant:" |
| num_predict | Maximum number of tokens to predict when generating text. (Default: -1, infinite generation) | int | num_predict 42 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |
| top_p | Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9) | float | top_p 0.9 |
| min_p          | Alternative to top_p, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with *p*=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out. (Default: 0.0) | float | min_p 0.05 |
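
As a sketch of how these fit together, the Modelfile below sets several parameters at once (the values are illustrative, not tuned recommendations):

```
FROM llama3.2

# larger context window
PARAMETER num_ctx 8192

# more focused, less random output
PARAMETER temperature 0.2
PARAMETER top_p 0.8

# stop generating when a new turn begins
PARAMETER stop "User:"
PARAMETER stop "Assistant:"
```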
### TEMPLATE
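
Templates use Go template syntax. As a minimal sketch (a real template should match the prompt format the model was trained with), a chat template using the variables from the `ollama show` output above could look like:

```
TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}User: {{ .Prompt }}
Assistant: {{ .Response }}"""
```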
@@ -201,9 +203,10 @@ ADAPTER <path to safetensor adapter>
```
Currently supported Safetensor adapters:
- Llama (including Llama 2, Llama 3, and Llama 3.1)
- Mistral (including Mistral 1, Mistral 2, and Mixtral)
- Gemma (including Gemma 1 and Gemma 2)
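
A sketch of applying a Safetensors LoRA adapter (the adapter path is hypothetical; the adapter should have been tuned from the same base model named in `FROM`):

```
FROM llama3.2
ADAPTER ./my-lora-adapter
```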
#### GGUF adapter
@@ -237,7 +240,6 @@ MESSAGE <role> <message>
| user | An example message of what the user could have asked. |
| assistant | An example message of how the model should respond. |
#### Example conversation
```
@@ -249,7 +251,6 @@ MESSAGE user Is Ontario in Canada?
MESSAGE assistant yes
```
## Notes
- The **`Modelfile` is not case sensitive**. In the examples, uppercase instructions are used to make it easier to distinguish them from arguments.