---
title: OpenAI compatibility
---

Ollama provides compatibility with parts of the [OpenAI API](https://platform.openai.com/docs/api-reference) to help connect existing applications to Ollama.

## Usage

### Simple `v1/chat/completions` example

<CodeGroup dropdown>

```python basic.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='gpt-oss:20b',
)
print(chat_completion.choices[0].message.content)
```

```javascript basic.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "gpt-oss:20b",
});

console.log(chatCompletion.choices[0].message.content);
```

```shell basic.sh
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{ "role": "user", "content": "Say this is a test" }]
  }'
```

</CodeGroup>

### Simple `v1/responses` example

<CodeGroup dropdown>

```python responses.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

responses_result = client.responses.create(
    model='qwen3:8b',
    input='Write a short poem about the color blue',
)
print(responses_result.output_text)
```

```javascript responses.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const responsesResult = await openai.responses.create({
  model: "qwen3:8b",
  input: "Write a short poem about the color blue",
});

console.log(responsesResult.output_text);
```

```shell responses.sh
curl -X POST http://localhost:11434/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "input": "Write a short poem about the color blue"
  }'
```

</CodeGroup>

### `v1/chat/completions` with vision example

<CodeGroup dropdown>

```python vision.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

response = client.chat.completions.create(
    model='qwen3-vl:8b',
    messages=[
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': "What's in this image?"},
                {
                    'type': 'image_url',
                    # Set this to a base64-encoded data URL of your image,
                    # e.g. "data:image/png;base64,..." (remote image URLs are not supported).
                    'image_url': '',
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```

```javascript vision.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const response = await openai.chat.completions.create({
  model: "qwen3-vl:8b",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          // Set this to a base64-encoded data URL of your image,
          // e.g. "data:image/png;base64,..." (remote image URLs are not supported).
          image_url: "",
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
```

```shell vision.sh
# Set "image_url" to a base64-encoded data URL of your image
# (remote image URLs are not supported).
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl:8b",
    "messages": [{ "role": "user", "content": [{"type": "text", "text": "What is this an image of?"}, {"type": "image_url", "image_url": ""}]}]
  }'
```

</CodeGroup>

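The vision examples need the image inlined as base64, because remote image URLs are not supported. A minimal sketch of turning a local file into a data URL follows. The helper name `image_to_data_url` is hypothetical, and the set of accepted image types (JPEG, PNG, WebP) is an assumption to check against your Ollama version.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: the image types the server accepts inside a data URL.
ACCEPTED_TYPES = {"image/jpeg", "image/png", "image/webp"}


def image_to_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL (hypothetical helper)."""
    # Infer the MIME type from the file extension.
    mime, _ = mimetypes.guess_type(path)
    if mime not in ACCEPTED_TYPES:
        raise ValueError(f"unsupported image type for {path!r}: {mime}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed directly as the `image_url` value in the vision request.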
## Endpoints

### `/v1/chat/completions`

#### Supported features

- [x] Chat completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [x] Vision
- [x] Tools
- [ ] Logprobs

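Streamed responses from `/v1/chat/completions` follow OpenAI's server-sent events format: each event is a `data:` line holding a JSON chunk, and the stream ends with `data: [DONE]`. If you read the stream without an SDK (for example, the raw output of `curl` with `"stream": true`), the text can be reassembled as sketched below. This assumes each chunk carries its text in `choices[0].delta.content`; the function name is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the text deltas from an OpenAI-style SSE stream (hypothetical helper)."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The final chunk, or a usage-only chunk, may carry no content.
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

A usage-only chunk (sent when `stream_options.include_usage` is set) has an empty `choices` list, so the loop simply skips it.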
#### Supported request fields

- [x] `model`
- [x] `messages`
  - [x] Text `content`
  - [x] Image `content`
    - [x] Base64 encoded image
    - [ ] Image URL
  - [x] Array of `content` parts
- [x] `frequency_penalty`
- [x] `presence_penalty`
- [x] `response_format`
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
  - [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
- [x] `tools`
- [ ] `tool_choice`
- [ ] `logit_bias`
- [ ] `user`
- [ ] `n`

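When a request includes `tools`, the reply may contain `tool_calls` instead of text. In the OpenAI response format, each call's `function.arguments` is a JSON-encoded string, so it must be decoded before use. The sketch below assumes that format and works on the parsed JSON response body; `extract_tool_calls` and the `get_weather` tool are hypothetical.

```python
import json


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (function name, decoded arguments) pairs from a chat completion (hypothetical helper)."""
    message = response["choices"][0]["message"]
    calls = []
    # `tool_calls` is absent or null when the model answered with plain text.
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


# A request body advertising one hypothetical tool to the model.
request = {
    "model": "qwen3:8b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```

In the OpenAI format, the tool's result is then sent back in a follow-up request as a message whose `role` is `"tool"`.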
### `/v1/completions`

#### Supported features

- [x] Completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [ ] Logprobs

#### Supported request fields

- [x] `model`
- [x] `prompt`
- [x] `frequency_penalty`
- [x] `presence_penalty`
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
  - [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
- [x] `suffix`
- [ ] `best_of`
- [ ] `echo`
- [ ] `logit_bias`
- [ ] `user`
- [ ] `n`

#### Notes

- `prompt` currently only accepts a string

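Because `prompt` must be a single string, a client that has several prompts sends one request per prompt. A minimal sketch of building `/v1/completions` bodies under that constraint follows, including the supported `suffix` field for fill-in-the-middle. The helper name is hypothetical, and `suffix` is assumed to require a model that supports insertion.

```python
from typing import Optional


def completion_body(model: str, prompt: str, suffix: Optional[str] = None, **options) -> dict:
    """Build a /v1/completions request body (hypothetical helper)."""
    # Arrays of prompts are not accepted, so reject them before sending.
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a single string; send one request per prompt")
    body = {"model": model, "prompt": prompt, **options}
    if suffix is not None:
        # Text that should come after the generated completion (fill-in-the-middle).
        body["suffix"] = suffix
    return body
```

Extra keyword arguments such as `max_tokens` or `temperature` are passed through unchanged.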
### `/v1/models`

#### Notes

- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the Ollama username, defaulting to `"library"`

### `/v1/models/{model}`

#### Notes

- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the Ollama username, defaulting to `"library"`

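Since `created` reflects the last modification time of the local model, it can be used to sort models by recency. The sketch below assumes the OpenAI list format (`{"object": "list", "data": [...]}`) with `created` as a Unix timestamp in seconds; the helper name and the sample data in the test are hypothetical.

```python
from datetime import datetime, timezone


def models_by_recency(listing: dict) -> list[tuple[str, str, datetime]]:
    """Return (id, owned_by, last modified) for each model, newest first (hypothetical helper)."""
    rows = [
        # Convert the Unix timestamp to an aware UTC datetime.
        (m["id"], m["owned_by"], datetime.fromtimestamp(m["created"], tz=timezone.utc))
        for m in listing["data"]
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```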
### `/v1/embeddings`

#### Supported request fields

- [x] `model`
- [x] `input`
  - [x] string
  - [x] array of strings
  - [ ] array of tokens
  - [ ] array of token arrays
- [x] `encoding_format`
- [x] `dimensions`
- [ ] `user`

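With `encoding_format` set to `"base64"`, OpenAI-style servers return each embedding as a base64 string instead of a JSON array of numbers. A decoding sketch follows. It assumes OpenAI's convention of packed little-endian 32-bit floats, so verify it against your server's output before relying on it; the model name in the request body is only an example.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding of little-endian float32 values (assumed layout)."""
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Request body for two inputs at once (an array of strings is supported).
request = {
    "model": "all-minilm",  # example model name
    "input": ["first sentence", "second sentence"],
    "encoding_format": "base64",
}
```

Each item in the response's `data` list would then be decoded with `decode_embedding(item["embedding"])`.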
### `/v1/responses`

Ollama supports the [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses). Only the stateless flavor is supported: there is no `previous_response_id` or `conversation` support.

#### Supported features

- [x] Streaming
- [x] Tools (function calling)
- [x] Reasoning summaries (for thinking models)
- [ ] Stateful requests

#### Supported request fields

- [x] `model`
- [x] `input`
- [x] `instructions`
- [x] `tools`
- [x] `stream`
- [x] `temperature`
- [x] `top_p`
- [x] `max_output_tokens`
- [ ] `previous_response_id` (stateful `v1/responses` not supported)
- [ ] `conversation` (stateful `v1/responses` not supported)
- [ ] `truncation`

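Because requests are stateless, a multi-turn conversation has to be replayed in full: instead of passing `previous_response_id`, the client sends the earlier turns as a list of messages in `input`. A minimal sketch of keeping that history on the client follows, assuming the Responses API's message-style input items (`{"role": ..., "content": ...}`). The `Conversation` class is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    """Client-side history for the stateless /v1/responses endpoint (hypothetical)."""

    model: str
    instructions: Optional[str] = None
    items: list[dict] = field(default_factory=list)

    def request_body(self, user_text: str) -> dict:
        """Record a user turn and build a request containing the whole history."""
        self.items.append({"role": "user", "content": user_text})
        # Copy the list so later turns do not mutate bodies already built.
        body = {"model": self.model, "input": list(self.items)}
        if self.instructions:
            body["instructions"] = self.instructions
        return body

    def record_reply(self, output_text: str) -> None:
        """Store the model's reply so the next request includes it."""
        self.items.append({"role": "assistant", "content": output_text})
```

With the Python SDK, the body could be sent as `client.responses.create(**body)`, and `response.output_text` passed to `record_reply`.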
## Models

Before using a model, pull it locally with `ollama pull`:

```shell
ollama pull llama3.2
```

### Default model names

For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model to the expected name:

```shell
ollama cp llama3.2 gpt-3.5-turbo
```

Afterwards, specify the new model name in the `model` field:

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```

### Setting the context size

The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a `Modelfile` like this:

```
FROM <some model>
PARAMETER num_ctx <context size>
```

Use the `ollama create mymodel` command to create a new model with the updated context size. Call the API with the updated model name:

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mymodel",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```

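If you script this step, a small helper can write the `Modelfile` and pass it to `ollama create` with the `-f` flag. This is a sketch: the function names are hypothetical, and it assumes the `ollama` binary is on your `PATH`.

```python
import subprocess
import tempfile
from pathlib import Path


def modelfile_text(base_model: str, num_ctx: int) -> str:
    """Render a Modelfile that overrides the context size."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be positive")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def create_with_context(name: str, base_model: str, num_ctx: int) -> None:
    """Write the Modelfile to a temporary directory and run `ollama create`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Modelfile"
        path.write_text(modelfile_text(base_model, num_ctx))
        # check=True raises CalledProcessError if the CLI reports a failure.
        subprocess.run(["ollama", "create", name, "-f", str(path)], check=True)
```

For example, `create_with_context("mymodel", "llama3.2", 8192)` produces a model that can be called by the name `mymodel`.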