Commit Graph

  • 7d965258ce Revert "add truncate and shift parameters (#12519)" (#12545) Jeffrey Morgan 2025-10-08 17:57:57 -07:00
  • 6a62b894c7 add truncate and shift parameters (#12519) Jeffrey Morgan 2025-10-08 17:05:05 -07:00
  • 90d429f5a8 thinking: turn on thinking mode for all reasoning models (#12533) Patrick Devine 2025-10-08 16:50:13 -07:00
  • 1fc35f1260 kvcache: Clean up sliding window state with independent batches Jesse Gross 2025-10-06 16:04:53 -07:00
  • aa45f7ce27 discover: Disable flash attention for Jetson Xavier (CC 7.2) Jesse Gross 2025-10-07 11:37:58 -07:00
  • 4e5d862ec4 Integration test tuning (#12492) Daniel Hiltgen 2025-10-08 09:51:25 -07:00
  • 303be9304c docs: improve accuracy of LLM library docs (#12530) Daniel Hiltgen 2025-10-07 16:21:07 -07:00
  • bd15eba4e4 Bring back escape valve for llm libraries and fix Jetpack6 crash (#12529) Daniel Hiltgen 2025-10-07 16:06:14 -07:00
  • bc71278670 Merge pull request #12509 from ollama/drifkin/oai-compat-refactor Devon Rifkin 2025-10-06 16:22:08 -07:00
  • 918231931c win: fix build script (#12513) Daniel Hiltgen 2025-10-06 14:46:45 -07:00
  • 04c1849878 discovery: prevent dup OLLAMA_LIBRARY_PATH (#12514) Daniel Hiltgen 2025-10-06 14:36:44 -07:00
  • 2c2f4deaa9 openai: refactor to split compat layer and middleware Devon Rifkin 2025-10-05 14:18:56 -07:00
  • 292767afb4 CI: fix win arm build (#12502) Daniel Hiltgen 2025-10-04 11:46:45 -07:00
  • ae5e0f0889 CI: replace clang compiler for windows (#12495) Daniel Hiltgen 2025-10-04 09:18:42 -07:00
  • 19e6796eac llm: Support KV cache quantization with gpt-oss Jesse Gross 2025-10-03 13:50:02 -07:00
  • 33801c1597 Fixed Deepseek2 adding nil tensor error Grace 2025-10-03 14:20:06 -07:00
  • e4340667e3 Workaround broken NVIDIA iGPU free VRAM data (#12490) Daniel Hiltgen 2025-10-03 12:17:21 -07:00
  • 2fa1e92a99 test: add template error test (#12489) Patrick Devine 2025-10-03 12:05:34 -07:00
  • 07e36761c3 ci: place rocm windows in correct runner dir (#12487) Daniel Hiltgen 2025-10-03 07:28:40 -07:00
  • c29fb007c0 CI: temporarily disable clang install (#12486) Daniel Hiltgen 2025-10-02 20:31:18 -07:00
  • 730ed6e9e1 ci: fix windows build (#12485) Daniel Hiltgen 2025-10-02 19:16:01 -07:00
  • dc06601677 ci: fix windows build (#12484) Daniel Hiltgen 2025-10-02 18:59:26 -07:00
  • 1ed2881ef0 templates: fix crash in improperly defined templates (#12483) Patrick Devine 2025-10-02 17:25:55 -07:00
  • 0bda72892c llm: Enable flash attention by default for qwen3 and qwen3moe Jesse Gross 2025-10-02 16:51:51 -07:00
  • 55ca827267 AMD: block running on unsupported gfx900/gfx906 (#12481) Daniel Hiltgen 2025-10-02 16:53:05 -07:00
  • c68f367ef6 Update GGML to b6646 (#12245) Daniel Hiltgen 2025-10-02 14:47:10 -07:00
  • fdb109469f llm: Allow overriding flash attention setting Jesse Gross 2025-10-01 14:38:09 -07:00
  • 05a43e078a fix panic on bootstrapDevices (#12475) Daniel Hiltgen 2025-10-01 17:39:29 -07:00
  • bc8909fb38 Use runners for GPU discovery (#12090) Daniel Hiltgen 2025-10-01 15:12:32 -07:00
  • 6b50f2b9cd Merge pull request #12461 from ollama/drifkin/qwen3-coder-tweaks Devon Rifkin 2025-09-30 19:47:44 -07:00
  • 35ac4eb12c fix keep alive Michael Yang 2025-09-30 17:12:37 -07:00
  • 3d0b1734c0 ggml: Preallocate CUDA pool memory Jesse Gross 2025-09-09 16:17:31 -07:00
  • efaee8c2d6 ggml: Backport scale kernel fixes Jesse Gross 2025-09-23 12:13:39 -07:00
  • 734b57da0e ggml: Remove allocation status reporting Jesse Gross 2025-09-22 17:27:03 -07:00
  • 83021fcf0f qwen3-coder: fix tool definition type rendering Devon Rifkin 2025-09-30 15:03:15 -07:00
  • 0469861d9d build: call find_package to instantiate library paths Michael Yang 2025-09-30 12:58:31 -07:00
  • 04431b50fa fix v0.12.3 likelovewant 2025-09-28 12:37:28 +08:00
  • c47154c08d fix: correct condition for AMDGPU_TARGETS filtering logic (#12412) 羊撅撅 2025-09-27 02:38:47 +08:00
  • b04e46da3e bugfix: restore the current runOptions if loading fails in the CLI (#12402) Patrick Devine 2025-09-25 18:30:45 -07:00
  • 34efbbd3f0 Merge pull request #12417 from ollama/drifkin/qwen3-coder-unicode Devon Rifkin 2025-09-25 15:56:34 -07:00
  • 05ba4ca1f4 parsers: fix unicode handling for qwen3-coder Devon Rifkin 2025-09-25 15:47:46 -07:00
  • 5a56ff3cf0 cli: add device signin flow when doing ollama push (#12405) Patrick Devine 2025-09-25 15:04:43 -07:00
  • 2fba04b5fb tools: handle the case where a tool call sends "arguments" or "parameters" as a serialized json string (#12413) Gabe Goodhart 2025-09-25 15:37:39 -06:00
  • fbd82ba5bb Grace/deepseek v3 migration (#12385) Grace 2025-09-24 15:19:47 -07:00
  • 2e742544bf prefer ollama engine for qwen3moe (#12374) Michael Yang 2025-09-24 11:21:32 -07:00
  • bbb195a6ff Merge pull request #12393 from ollama/drifkin/fix-built-ins Devon Rifkin 2025-09-23 23:45:31 -07:00
  • fd88cd7cb0 harmony: don't sanitize built-ins Devon Rifkin 2025-09-23 23:34:55 -07:00
  • e1979c571a fix: leaf alt name (#12390) Michael Yang 2025-09-23 17:50:53 -07:00
  • bf78ed6ee9 add pre:, suf: to tags (#12274) Michael Yang 2025-09-23 16:08:57 -07:00
  • a40d427bce multi-regexp pretokenizer (#12325) Michael Yang 2025-09-23 13:21:47 -07:00
  • 64883e3c4c auth: fix problems with the ollama keypairs (#12373) Patrick Devine 2025-09-22 23:20:20 -07:00
  • 41efdd4048 Merge pull request #12339 from ollama/drifkin/harmony-refactor-to-builtin Devon Rifkin 2025-09-22 13:13:40 -07:00
  • c23e6f4cae tests: add single threaded history test (#12295) Daniel Hiltgen 2025-09-22 11:23:14 -07:00
  • af060eb250 docs: update cloud.md for cloud models jmorganca 2025-09-19 15:50:41 -07:00
  • ae5c33008e docs: move turbo.md to cloud.md jmorganca 2025-09-19 15:49:56 -07:00
  • 000a3ec8b9 Merge branch 'ollama:main' into main likelovewant 2025-09-21 10:33:39 +08:00
  • 3677842ff1 Merge pull request #12358 from ollama/drifkin/qwen3-coder-ampersands Devon Rifkin 2025-09-20 12:40:33 -07:00
  • 242df70a75 parsers: fix &s in qwen3coder parameter values Devon Rifkin 2025-09-20 12:10:58 -07:00
  • dba39b2eee gemma: fix rope scaling for qat models (#12348) Patrick Devine 2025-09-19 15:04:40 -07:00
  • 9f3a37fd36 fix: model load for unsupported embedding models (#12311) Michael Yang 2025-09-18 16:11:08 -07:00
  • 7460259eb3 feat: qwen3 embed (#12301) Michael Yang 2025-09-18 15:50:32 -07:00
  • 22ccdd74c2 server: add unauthorized error to remote chat handler (#12338) Jeffrey Morgan 2025-09-18 19:40:31 -03:00
  • 0c3d0e7533 build: avoid unbounded parallel builds (#12319) Daniel Hiltgen 2025-09-18 14:57:01 -07:00
  • e7f56ef3d8 harmony: remove special casing in routes.go Devon Rifkin 2025-09-18 14:55:59 -07:00
  • eb0a5d4459 auth: check the permissions on the private key to see if it's readable (#12336) Patrick Devine 2025-09-18 14:34:34 -07:00
  • ceac416ec2 fix(integration): check truncated length (#12337) Michael Yang 2025-09-18 14:00:21 -07:00
  • 2717dce6fe convert: convert bf16 vision weights to fp16 (#12324) Patrick Devine 2025-09-17 17:43:17 -07:00
  • 9b8187b487 server: skip parsing initial <think> if provided in the prompt for /api/generate (#12289) frob 2025-09-18 01:39:04 +02:00
  • 8b894933a7 engine: add remote proxy (#12307) Patrick Devine 2025-09-17 14:40:53 -07:00
  • 9c5bf342bc fix: multi-cuda version skew (#12318) Daniel Hiltgen 2025-09-17 13:05:09 -07:00
  • 564b558c92 fix(llama): other llama flavours (#12308) Michael Yang 2025-09-17 12:12:21 -07:00
  • a417ac97ee prefer ollama engine for qwen3 (#12310) Michael Yang 2025-09-17 09:48:21 -07:00
  • 05d53457af refactor: use the built-in max/min to simplify the code (#12280) russcoss 2025-09-16 20:14:21 -04:00
  • b225508c9b logutil: fix source field (#12279) Michael Yang 2025-09-16 16:18:07 -07:00
  • fa1c987a29 Merge pull request #12248 from ollama/drifkin/qwen3-coder-parsing Devon Rifkin 2025-09-16 10:21:43 -07:00
  • ad95d5b30b use split activations when possible (#12293) Michael Yang 2025-09-16 09:51:19 -07:00
  • c253433d68 embed: cleanup (#12299) Michael Yang 2025-09-16 09:48:42 -07:00
  • a1cff89b30 fix: fix CUDA detection for older GPUs (#12300) Beshoy Girgis 2025-09-16 09:47:06 -05:00
  • 93c64ea1b1 doc: show how to clear the cgo cache (#12298) Daniel Hiltgen 2025-09-15 15:45:35 -07:00
  • 3f6642f6fc model: implement bert in ollama engine (#9080) Michael Yang 2025-09-15 15:35:59 -07:00
  • 6f7117145f batch: use tensors for outputs (#12185) Michael Yang 2025-09-15 14:33:06 -07:00
  • 472feec2ff address comments Devon Rifkin 2025-09-15 11:46:25 -07:00
  • 47991940d4 add qwen3-coder tool support Devon Rifkin 2025-09-11 13:40:35 -07:00
  • 9f3f80891d Merge branch 'ollama:main' into main likelovewant 2025-09-13 10:45:51 +08:00
  • 92b96d54ef Revert "runner: move harmony to runner (#12052)" jmorganca 2025-09-12 13:32:30 -07:00
  • 9d56e63dbf Revert "runner: simplify parser entrypoints in runner (#12233)" jmorganca 2025-09-12 13:32:02 -07:00
  • 053092185e Fix image cannot be seen with slice image on llama engine tc-mb 2025-09-13 07:25:12 +08:00
  • 44a6792873 tests: tighten up a few flaky tests (#12271) Daniel Hiltgen 2025-09-12 13:59:34 -07:00
  • e4ce68311a cuda: remove compression for better compatibility (#12259) Daniel Hiltgen 2025-09-12 07:59:14 -07:00
  • 26214125e8 ollamarunner: Suppress stack trace during memory allocation Jesse Gross 2025-09-11 13:48:51 -07:00
  • 61fb912ca4 CI: fix windows cuda build (#12246) Daniel Hiltgen 2025-09-11 12:25:26 -07:00
  • aba1575315 llm: Don't try to load split vision models in the Ollama engine Jesse Gross 2025-09-10 11:03:06 -07:00
  • eb10390de9 llm: Enable new memory estimates by default Jesse Gross 2025-09-11 10:30:18 -07:00
  • feb18cd710 feat: add dimensions field to embed requests (#12242) Michael Yang 2025-09-11 10:36:10 -07:00
  • 8a7e2055d2 cmd: use slices.Contains to simplify code (#12249) fengyuchuanshen 2025-09-12 00:57:31 +08:00
  • 29ddfc2cab ggml: Disable flash attention for gemma2 Jesse Gross 2025-09-09 10:48:34 -07:00
  • 71cb86af3e llm: Remove unneeded warning with flash attention enabled Jesse Gross 2025-09-09 10:37:28 -07:00
  • 5198956372 docs: add ollama-co2 to community integrations (#12230) CarbonatedWater.org 2025-09-10 16:37:10 -07:00
  • 17a023f34b Add v12 + v13 cuda support (#12000) Daniel Hiltgen 2025-09-10 12:05:18 -07:00
  • 8d6fffaead runner: simplify parser entrypoints in runner (#12233) Parth Sareen 2025-09-10 11:24:42 -07:00