186 Commits

Author SHA1 Message Date
likelovewant
06db1f2cf5 Merge branch 'ollama:main' into main 2024-09-08 11:44:55 +08:00
Daniel Hiltgen
f29b167e1a Use cuda v11 for driver 525 and older (#6620)
It looks like driver 525 (aka, cuda driver 12.0) has problems with the cuda v12 library
we compile against, so run v11 on those older drivers if detected.
2024-09-03 17:15:31 -07:00
likelovewant
76feb6c569 Merge branch 'ollama:main' into main 2024-08-28 12:02:21 +08:00
Daniel Hiltgen
93ea9240ae Move ollama executable out of bin dir (#6535) 2024-08-27 16:19:00 -07:00
Daniel Hiltgen
69be940bf6 gpu: Group GPU Library sets by variant (#6483)
The recent cuda variant changes uncovered a bug in ByLibrary
which failed to group by common variant for GPU types.
2024-08-23 15:11:56 -07:00
Daniel Hiltgen
7a1e1c1caf gpu: Ensure driver version set before variant (#6480)
During rebasing, the ordering was inverted, which broke the cuda version
selection logic: the driver version was evaluated as zero, incorrectly
causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
likelovewant
f9e1f572c2 Merge branch 'ollama:main' into main 2024-08-21 10:45:57 +08:00
Daniel Hiltgen
f9e31da946 Review comments 2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328 Adjust layout to bin+lib/ollama 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa Add cuda v12 variant and selection logic
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
2024-08-19 09:38:53 -07:00
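
Note on 4fe3a556fa: the message says the v12 or v11 variant is picked from compute capability and driver version, but does not give the thresholds. The Go sketch below is one way such a selection could look. The `gpuInfo` type, its field names, and the compute capability threshold are hypothetical. The driver 12.0 cutoff follows the description in f29b167e1a elsewhere in this log, and the zero-driver case mirrors the bug described in 7a1e1c1caf.

```go
package main

import "fmt"

// gpuInfo is a hypothetical stand-in for whatever the discovery code records
// about a CUDA device; the field names are illustrative only.
type gpuInfo struct {
	ComputeMajor int // major compute capability of the device
	DriverMajor  int // major CUDA driver version
	DriverMinor  int // minor CUDA driver version
}

// minComputeMajorForV12 is an assumed threshold, not taken from the commit.
const minComputeMajorForV12 = 6

// cudaVariant returns "v12" only when both the device and the driver look new
// enough, and falls back to "v11" otherwise.
func cudaVariant(g gpuInfo) string {
	if g.ComputeMajor < minComputeMajorForV12 {
		return "v11"
	}
	// Assumed cutoff: drivers older than 12.x, and 12.0 itself, stay on v11.
	if g.DriverMajor < 12 || (g.DriverMajor == 12 && g.DriverMinor == 0) {
		return "v11"
	}
	return "v12"
}

func main() {
	fmt.Println(cudaVariant(gpuInfo{ComputeMajor: 8, DriverMajor: 12, DriverMinor: 4}))
	fmt.Println(cudaVariant(gpuInfo{ComputeMajor: 8, DriverMajor: 12, DriverMinor: 0}))
	fmt.Println(cudaVariant(gpuInfo{ComputeMajor: 5, DriverMajor: 12, DriverMinor: 4}))
	// An unset driver version reads as 0.0 and therefore selects v11.
	fmt.Println(cudaVariant(gpuInfo{ComputeMajor: 8}))
}
```

With this shape, a driver version that has not been populated yet always selects v11, which is why the driver version has to be set before the variant is computed.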
Daniel Hiltgen
fc3b4cda89 Report GPU variant in log 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b Add Jetson cuda variants for arm
This adds new variants for arm64 specific to Jetson platforms
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
74d45f0102 Refactor linux packaging
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
likelovewant
4574e557ee update to hip sdk 6.1.2 2024-08-16 15:25:43 +08:00
likelovewant
b7d38e2ccd Merge branch 'ollama:main' into main 2024-08-13 11:27:09 +08:00
Michael Yang
160d9d4900 Merge pull request #6171 from ollama/mxyng/remove-temp
removeall to remove non-empty temp dirs
2024-08-09 15:47:13 -07:00
Daniel Hiltgen
5bca2e60a7 Harden intel bootstrap for nil pointers 2024-08-09 11:31:38 -07:00
likelovewant
ca312b344f Merge branch 'ollama:main' into main 2024-08-07 17:20:55 +08:00
Daniel Hiltgen
f457d63400 Implement linux NUMA detection
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp.
If we detect numactl in the path, use that, else use the basic "distribute" mode.
2024-08-05 12:56:20 -07:00
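
Note on f457d63400: the decision described above (multiple nodes, then numactl if present, else "distribute") can be sketched in Go as follows. This is a minimal illustration, not the code from the commit. The sysfs path, the function names, and the "numactl" mode string are assumptions; only the "distribute" mode name comes from the message. How the chosen mode is handed to llama.cpp is not shown.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// numaMode maps the detected state to a mode string. An empty result means
// NUMA handling stays off.
func numaMode(nodeCount int, haveNumactl bool) string {
	if nodeCount <= 1 {
		return ""
	}
	if haveNumactl {
		return "numactl"
	}
	return "distribute"
}

// countNumaNodes counts node directories under sysfs. The path is an
// assumption about a typical Linux layout; elsewhere the glob matches nothing.
func countNumaNodes() int {
	matches, err := filepath.Glob("/sys/devices/system/node/node[0-9]*")
	if err != nil {
		return 0
	}
	return len(matches)
}

func main() {
	// LookPath reports an error when numactl is not found in PATH.
	_, err := exec.LookPath("numactl")
	mode := numaMode(countNumaNodes(), err == nil)
	if mode == "" {
		fmt.Println("NUMA support not enabled")
		return
	}
	fmt.Println("NUMA mode:", mode)
}
```

Keeping the decision in a pure function (`numaMode`) separates it from the filesystem and PATH probing, so it can be checked without a multi-node machine.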
Michael Yang
43f9d92008 close pid file 2024-08-05 00:41:16 -07:00
Michael Yang
ed6c8bfe57 removeall to remove non-empty temp dirs 2024-08-05 00:41:16 -07:00
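
Note on ed6c8bfe57: the commit title refers to a general property of Go's standard library: `os.Remove` returns an error for a directory that still has entries, while `os.RemoveAll` deletes the directory and everything under it. The sketch below demonstrates that difference on a throwaway temp directory. The function name and file names are made up for illustration and are not from the commit.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeNonEmptyDir creates a temp directory containing one file, then tries
// os.Remove followed by os.RemoveAll and returns both errors.
func removeNonEmptyDir() (removeErr, removeAllErr error) {
	dir, err := os.MkdirTemp("", "removeall-demo")
	if err != nil {
		return err, err
	}
	if err := os.WriteFile(filepath.Join(dir, "payload"), []byte("x"), 0o600); err != nil {
		os.RemoveAll(dir)
		return err, err
	}
	removeErr = os.Remove(dir)       // fails: the directory still has an entry
	removeAllErr = os.RemoveAll(dir) // removes the directory and its contents
	return removeErr, removeAllErr
}

func main() {
	removeErr, removeAllErr := removeNonEmptyDir()
	fmt.Println("os.Remove error:", removeErr)
	fmt.Println("os.RemoveAll error:", removeAllErr)
}
```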
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
likelovewant
0d4292b4b1 Merge branch 'ollama:main' into main 2024-08-01 18:30:28 +08:00
Michael Yang
5c1912769e Merge pull request #5473 from ollama/mxyng/environ
fix: environ lookup
2024-07-31 10:18:05 -07:00
likelovewant
776aa9ceb2 resolve merge conflicts 2024-07-30 18:53:59 +08:00
Daniel Hiltgen
7c2a157ca4 Ensure amd gpu nodes are numerically sorted
For systems that enumerate over 10 CPUs the default lexicographical
sort order interleaves CPUs and GPUs.
2024-07-24 13:43:26 -07:00
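
Note on 7c2a157ca4: a lexicographical sort places "10" before "2", so numbered node entries come out of order once there are ten or more of them. A minimal Go sketch of a numeric sort on the last path element follows. The function name and the example paths are hypothetical and do not come from the commit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strconv"
)

// sortNodesNumerically orders node directory paths by the integer value of
// their final path element. Entries whose final element is not an integer
// sort after all numeric ones, keeping their original relative order.
func sortNodesNumerically(paths []string) {
	sort.SliceStable(paths, func(i, j int) bool {
		a, errA := strconv.Atoi(filepath.Base(paths[i]))
		b, errB := strconv.Atoi(filepath.Base(paths[j]))
		if errA != nil || errB != nil {
			// Only a numeric entry compared with a non-numeric one is "less".
			return errA == nil && errB != nil
		}
		return a < b
	})
}

func main() {
	// This is the order a plain string sort would produce.
	paths := []string{"nodes/0", "nodes/1", "nodes/10", "nodes/11", "nodes/2"}
	sortNodesNumerically(paths)
	fmt.Println(paths) // [nodes/0 nodes/1 nodes/2 nodes/10 nodes/11]
}
```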
Michael Yang
e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
Michael Yang
55cd3ddcca bool 2024-07-22 11:27:21 -07:00
Michael Yang
35b89b2eab rfc: dynamic environ lookup 2024-07-22 11:25:30 -07:00
likelovewant
5cae567ee8 merge upstream update and resolve the conflicts 2024-07-22 17:00:43 +08:00
Daniel Hiltgen
283948c83b Adjust windows ROCm discovery
The v5 hip library returns unsupported GPUs which won't enumerate at
inference time in the runner so this makes sure we align discovery. The
gfx906 cards are no longer supported so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
likelovewant
d63280cf56 change back to 5.7 2024-07-14 01:02:00 +08:00
likelovewant
5505a018b2 Resolved merge conflicts 2024-07-12 20:44:04 +08:00
Jeffrey Morgan
c4cf8ad559 llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models

* use `systemSwapFreeMemory` in check

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
likelovewant
514e9186d3 update the igpu support 2024-07-11 23:28:08 +08:00
Daniel Hiltgen
4cfcbc328f Merge pull request #5124 from dhiltgen/amd_windows
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
8ea500441d Merge pull request #5580 from dhiltgen/cuda_overhead
Detect CUDA OS overhead
2024-07-10 12:47:31 -07:00
Daniel Hiltgen
1f50356e8e Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
likelovewant
00beadf67e update 2024-07-10 23:40:16 +08:00
likelovewant
b0a43b1700 Update amd_windows.go 2024-07-10 21:43:21 +08:00
Daniel Hiltgen
f6f759fc5f Detect CUDA OS Overhead
This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
2024-07-09 12:21:50 -07:00
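
Note on f6f759fc5f: the bookkeeping described above can be sketched as two small steps: record the skew between the two free-memory readings once, then subtract it from later management library readings. The Go code below is an illustration under stated assumptions. The function names are hypothetical, and the direction of the skew (the management library reporting more free VRAM than the driver) is an assumption, not something the message states.

```go
package main

import "fmt"

// osOverhead returns how much more free VRAM the management library reports
// than the driver does. The direction of the skew is an assumption.
func osOverhead(driverFree, mgmtFree uint64) uint64 {
	if mgmtFree <= driverFree {
		return 0
	}
	return mgmtFree - driverFree
}

// adjustedFree subtracts the recorded overhead from a later management
// library reading, clamping at zero so the unsigned value cannot wrap.
func adjustedFree(mgmtFree, overhead uint64) uint64 {
	if overhead >= mgmtFree {
		return 0
	}
	return mgmtFree - overhead
}

func main() {
	const MiB = 1 << 20
	// Made-up readings: the driver sees 7400 MiB free, the management library 7900 MiB.
	overhead := osOverhead(7400*MiB, 7900*MiB)
	fmt.Println(overhead / MiB) // 500
	// A later management library reading of 6000 MiB is reduced by the overhead.
	fmt.Println(adjustedFree(6000*MiB, overhead) / MiB) // 5500
}
```

Clamping matters because the values are unsigned: an unguarded subtraction would wrap around to a very large number and make the GPU look nearly empty.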
likelovewant
72bcdc1d4e Merge branch 'ollama:main' into main 2024-07-08 16:02:24 +08:00
Jeffrey Morgan
f8241bfba3 gpu: report system free memory instead of 0 (#5521) 2024-07-06 19:35:04 -04:00
likelovewant
dc1d1a121b Merge branch 'ollama:main' into main 2024-07-05 21:48:45 +08:00
Daniel Hiltgen
ef757da2c9 Better nvidia GPU discovery logging
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
2024-07-03 10:50:40 -07:00
likelovewant
b8fdb0387c remove igpu limits 2024-07-02 11:06:26 +08:00
likelovewant
d772472225 Merge branch 'ollama:main' into main 2024-07-02 01:17:34 +08:00
likelovewant
1c648e512e remove code to support igpu 2024-06-29 22:32:45 +08:00
likelovewant
0e42bf50ca Merge upstream/main and resolve conflicts 2024-06-25 00:54:58 +08:00
Daniel Hiltgen
9929751cc8 Disable concurrency for AMD + Windows
Until ROCm v6.2 ships, we won't be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00