ollama-for-amd

mirrors/ollama-for-amd

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-21 22:33:56 +00:00

Commit Graph

7d965258ce Revert "add truncate and shift parameters (#12519)" (#12545) Jeffrey Morgan 2025-10-08 17:57:57 -07:00
6a62b894c7 add truncate and shift parameters (#12519) Jeffrey Morgan 2025-10-08 17:05:05 -07:00
90d429f5a8 thinking: turn on thinking mode for all reasoning models (#12533) Patrick Devine 2025-10-08 16:50:13 -07:00
1fc35f1260 kvcache: Clean up sliding window state with independent batches Jesse Gross 2025-10-06 16:04:53 -07:00
aa45f7ce27 discover: Disable flash attention for Jetson Xavier (CC 7.2) Jesse Gross 2025-10-07 11:37:58 -07:00
4e5d862ec4 Integration test tuning (#12492) Daniel Hiltgen 2025-10-08 09:51:25 -07:00
303be9304c docs: improve accuracy of LLM library docs (#12530) Daniel Hiltgen 2025-10-07 16:21:07 -07:00
bd15eba4e4 Bring back escape valve for llm libraries and fix Jetpack6 crash (#12529) Daniel Hiltgen 2025-10-07 16:06:14 -07:00
bc71278670 Merge pull request #12509 from ollama/drifkin/oai-compat-refactor Devon Rifkin 2025-10-06 16:22:08 -07:00
918231931c win: fix build script (#12513) Daniel Hiltgen 2025-10-06 14:46:45 -07:00
04c1849878 discovery: prevent dup OLLAMA_LIBRARY_PATH (#12514) Daniel Hiltgen 2025-10-06 14:36:44 -07:00
2c2f4deaa9 openai: refactor to split compat layer and middleware Devon Rifkin 2025-10-05 14:18:56 -07:00
292767afb4 CI: fix win arm build (#12502) Daniel Hiltgen 2025-10-04 11:46:45 -07:00
ae5e0f0889 CI: replace clang compiler for windows (#12495) Daniel Hiltgen 2025-10-04 09:18:42 -07:00
19e6796eac llm: Support KV cache quantization with gpt-oss Jesse Gross 2025-10-03 13:50:02 -07:00
33801c1597 Fixed Deepseek2 adding nil tensor error Grace 2025-10-03 14:20:06 -07:00
e4340667e3 Workaround broken NVIDIA iGPU free VRAM data (#12490) Daniel Hiltgen 2025-10-03 12:17:21 -07:00
2fa1e92a99 test: add template error test (#12489) Patrick Devine 2025-10-03 12:05:34 -07:00
07e36761c3 ci: place rocm windows in correct runner dir (#12487) Daniel Hiltgen 2025-10-03 07:28:40 -07:00
c29fb007c0 CI: temporarily disable clang install (#12486) Daniel Hiltgen 2025-10-02 20:31:18 -07:00
730ed6e9e1 ci: fix windows build (#12485) Daniel Hiltgen 2025-10-02 19:16:01 -07:00
dc06601677 ci: fix windows build (#12484) Daniel Hiltgen 2025-10-02 18:59:26 -07:00
1ed2881ef0 templates: fix crash in improperly defined templates (#12483) Patrick Devine 2025-10-02 17:25:55 -07:00
0bda72892c llm: Enable flash attention by default for qwen3 and qwen3moe Jesse Gross 2025-10-02 16:51:51 -07:00
55ca827267 AMD: block running on unsupported gfx900/gfx906 (#12481) Daniel Hiltgen 2025-10-02 16:53:05 -07:00
c68f367ef6 Update GGML to b6646 (#12245) Daniel Hiltgen 2025-10-02 14:47:10 -07:00
fdb109469f llm: Allow overriding flash attention setting Jesse Gross 2025-10-01 14:38:09 -07:00
05a43e078a fix panic on bootstrapDevices (#12475) Daniel Hiltgen 2025-10-01 17:39:29 -07:00
bc8909fb38 Use runners for GPU discovery (#12090) Daniel Hiltgen 2025-10-01 15:12:32 -07:00
6b50f2b9cd Merge pull request #12461 from ollama/drifkin/qwen3-coder-tweaks Devon Rifkin 2025-09-30 19:47:44 -07:00
35ac4eb12c fix keep alive Michael Yang 2025-09-30 17:12:37 -07:00
3d0b1734c0 ggml: Preallocate CUDA pool memory Jesse Gross 2025-09-09 16:17:31 -07:00
efaee8c2d6 ggml: Backport scale kernel fixes Jesse Gross 2025-09-23 12:13:39 -07:00
734b57da0e ggml: Remove allocation status reporting Jesse Gross 2025-09-22 17:27:03 -07:00
83021fcf0f qwen3-coder: fix tool definition type rendering Devon Rifkin 2025-09-30 15:03:15 -07:00
0469861d9d build: call find_package to instantiate library paths Michael Yang 2025-09-30 12:58:31 -07:00
04431b50fa fix v0.12.3 likelovewant 2025-09-28 12:37:28 +08:00
c47154c08d fix: correct condition for AMDGPU_TARGETS filtering logic (#12412) 羊撅撅 2025-09-27 02:38:47 +08:00
b04e46da3e bugfix: restore the current runOptions if loading fails in the CLI (#12402) Patrick Devine 2025-09-25 18:30:45 -07:00
34efbbd3f0 Merge pull request #12417 from ollama/drifkin/qwen3-coder-unicode Devon Rifkin 2025-09-25 15:56:34 -07:00
05ba4ca1f4 parsers: fix unicode handling for qwen3-coder Devon Rifkin 2025-09-25 15:47:46 -07:00
5a56ff3cf0 cli: add device signin flow when doing ollama push (#12405) Patrick Devine 2025-09-25 15:04:43 -07:00
2fba04b5fb tools: handle the case where a tool call sends "arguments" or "parameters" as a serialized json string (#12413) Gabe Goodhart 2025-09-25 15:37:39 -06:00
fbd82ba5bb Grace/deepseek v3 migration (#12385) Grace 2025-09-24 15:19:47 -07:00
2e742544bf prefer ollama engine for qwen3moe (#12374) Michael Yang 2025-09-24 11:21:32 -07:00
bbb195a6ff Merge pull request #12393 from ollama/drifkin/fix-built-ins Devon Rifkin 2025-09-23 23:45:31 -07:00
fd88cd7cb0 harmony: don't sanitize built-ins Devon Rifkin 2025-09-23 23:34:55 -07:00
e1979c571a fix: leaf alt name (#12390) Michael Yang 2025-09-23 17:50:53 -07:00
bf78ed6ee9 add pre:, suf: to tags (#12274) Michael Yang 2025-09-23 16:08:57 -07:00
a40d427bce multi-regexp pretokenizer (#12325) Michael Yang 2025-09-23 13:21:47 -07:00
64883e3c4c auth: fix problems with the ollama keypairs (#12373) Patrick Devine 2025-09-22 23:20:20 -07:00
41efdd4048 Merge pull request #12339 from ollama/drifkin/harmony-refactor-to-builtin Devon Rifkin 2025-09-22 13:13:40 -07:00
c23e6f4cae tests: add single threaded history test (#12295) Daniel Hiltgen 2025-09-22 11:23:14 -07:00
af060eb250 docs: update cloud.md for cloud models jmorganca 2025-09-19 15:50:41 -07:00
ae5c33008e docs: move turbo.md to cloud.md jmorganca 2025-09-19 15:49:56 -07:00
000a3ec8b9 Merge branch 'ollama:main' into main likelovewant 2025-09-21 10:33:39 +08:00
3677842ff1 Merge pull request #12358 from ollama/drifkin/qwen3-coder-ampersands Devon Rifkin 2025-09-20 12:40:33 -07:00
242df70a75 parsers: fix &s in qwen3coder parameter values Devon Rifkin 2025-09-20 12:10:58 -07:00
dba39b2eee gemma: fix rope scaling for qat models (#12348) Patrick Devine 2025-09-19 15:04:40 -07:00
9f3a37fd36 fix: model load for unsupported embedding models (#12311) Michael Yang 2025-09-18 16:11:08 -07:00
7460259eb3 feat: qwen3 embed (#12301) Michael Yang 2025-09-18 15:50:32 -07:00
22ccdd74c2 server: add unauthorized error to remote chat handler (#12338) Jeffrey Morgan 2025-09-18 19:40:31 -03:00
0c3d0e7533 build: avoid unbounded parallel builds (#12319) Daniel Hiltgen 2025-09-18 14:57:01 -07:00
e7f56ef3d8 harmony: remove special casing in routes.go Devon Rifkin 2025-09-18 14:55:59 -07:00
eb0a5d4459 auth: check the permissions on the private key to see if it's readable (#12336) Patrick Devine 2025-09-18 14:34:34 -07:00
ceac416ec2 fix(integration): check truncated length (#12337) Michael Yang 2025-09-18 14:00:21 -07:00
2717dce6fe convert: convert bf16 vision weights to fp16 (#12324) Patrick Devine 2025-09-17 17:43:17 -07:00
9b8187b487 server: skip parsing initial <think> if provided in the prompt for /api/generate (#12289) frob 2025-09-18 01:39:04 +02:00
8b894933a7 engine: add remote proxy (#12307) Patrick Devine 2025-09-17 14:40:53 -07:00
9c5bf342bc fix: multi-cuda version skew (#12318) Daniel Hiltgen 2025-09-17 13:05:09 -07:00
564b558c92 fix(llama): other llama flavours (#12308) Michael Yang 2025-09-17 12:12:21 -07:00
a417ac97ee prefer ollama engine for qwen3 (#12310) Michael Yang 2025-09-17 09:48:21 -07:00
05d53457af refactor: use the built-in max/min to simplify the code (#12280) russcoss 2025-09-16 20:14:21 -04:00
b225508c9b logutil: fix source field (#12279) Michael Yang 2025-09-16 16:18:07 -07:00
fa1c987a29 Merge pull request #12248 from ollama/drifkin/qwen3-coder-parsing Devon Rifkin 2025-09-16 10:21:43 -07:00
ad95d5b30b use split activations when possible (#12293) Michael Yang 2025-09-16 09:51:19 -07:00
c253433d68 embed: cleanup (#12299) Michael Yang 2025-09-16 09:48:42 -07:00
a1cff89b30 fix: fix CUDA detection for older GPUs (#12300) Beshoy Girgis 2025-09-16 09:47:06 -05:00
93c64ea1b1 doc: show how to clear the cgo cache (#12298) Daniel Hiltgen 2025-09-15 15:45:35 -07:00
3f6642f6fc model: implement bert in ollama engine (#9080) Michael Yang 2025-09-15 15:35:59 -07:00
6f7117145f batch: use tensors for outputs (#12185) Michael Yang 2025-09-15 14:33:06 -07:00
472feec2ff address comments Devon Rifkin 2025-09-15 11:46:25 -07:00
47991940d4 add qwen3-coder tool support Devon Rifkin 2025-09-11 13:40:35 -07:00
9f3f80891d Merge branch 'ollama:main' into main likelovewant 2025-09-13 10:45:51 +08:00
92b96d54ef Revert "runner: move harmony to runner (#12052)" jmorganca 2025-09-12 13:32:30 -07:00
9d56e63dbf Revert "runner: simplify parser entrypoints in runner (#12233)" jmorganca 2025-09-12 13:32:02 -07:00
053092185e Fix image cannot be seen with slice image on llama engine tc-mb 2025-09-13 07:25:12 +08:00
44a6792873 tests: tighten up a few flaky tests (#12271) Daniel Hiltgen 2025-09-12 13:59:34 -07:00
e4ce68311a cuda: remove compression for better compatibility (#12259) Daniel Hiltgen 2025-09-12 07:59:14 -07:00
26214125e8 ollamarunner: Suppress stack trace during memory allocation Jesse Gross 2025-09-11 13:48:51 -07:00
61fb912ca4 CI: fix windows cuda build (#12246) Daniel Hiltgen 2025-09-11 12:25:26 -07:00
aba1575315 llm: Don't try to load split vision models in the Ollama engine Jesse Gross 2025-09-10 11:03:06 -07:00
eb10390de9 llm: Enable new memory estimates by default Jesse Gross 2025-09-11 10:30:18 -07:00
feb18cd710 feat: add dimensions field to embed requests (#12242) Michael Yang 2025-09-11 10:36:10 -07:00
8a7e2055d2 cmd: use slices.Contains to simplify code (#12249) fengyuchuanshen 2025-09-12 00:57:31 +08:00
29ddfc2cab ggml: Disable flash attention for gemma2 Jesse Gross 2025-09-09 10:48:34 -07:00
71cb86af3e llm: Remove unneeded warning with flash attention enabled Jesse Gross 2025-09-09 10:37:28 -07:00
5198956372 docs: add ollama-co2 to community integrations (#12230) CarbonatedWater.org 2025-09-10 16:37:10 -07:00
17a023f34b Add v12 + v13 cuda support (#12000) Daniel Hiltgen 2025-09-10 12:05:18 -07:00
8d6fffaead runner: simplify parser entrypoints in runner (#12233) Parth Sareen 2025-09-10 11:24:42 -07:00
