Commit Graph

39 Commits

Author SHA1 Message Date
likelovewant
76feb6c569 Merge branch 'ollama:main' into main 2024-08-28 12:02:21 +08:00
Daniel Hiltgen
93ea9240ae Move ollama executable out of bin dir (#6535) 2024-08-27 16:19:00 -07:00
likelovewant
f9e1f572c2 Merge branch 'ollama:main' into main 2024-08-21 10:45:57 +08:00
Daniel Hiltgen
88bb9e3328 Adjust layout to bin+lib/ollama 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
74d45f0102 Refactor linux packaging
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to carry the primary executable and dependent libraries. Runners are
still carried as payloads inside the main binary.

Darwin retains the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
likelovewant
4574e557ee update to hip sdk 6.1.2 2024-08-16 15:25:43 +08:00
likelovewant
ca312b344f Merge branch 'ollama:main' into main 2024-08-07 17:20:55 +08:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
likelovewant
0d4292b4b1 Merge branch 'ollama:main' into main 2024-08-01 18:30:28 +08:00
Michael Yang
e2c3f6b3e2 string 2024-07-22 11:27:52 -07:00
likelovewant
5cae567ee8 merge upstream updates and resolve the conflicts 2024-07-22 17:00:43 +08:00
Daniel Hiltgen
283948c83b Adjust windows ROCm discovery
The v5 hip library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery.  The
gfx906 cards are no longer supported, so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
likelovewant
d63280cf56 change back to 5.7 2024-07-14 01:02:00 +08:00
likelovewant
5505a018b2 Resolved merge conflicts 2024-07-12 20:44:04 +08:00
likelovewant
514e9186d3 update the igpu support 2024-07-11 23:28:08 +08:00
Daniel Hiltgen
4cfcbc328f Merge pull request #5124 from dhiltgen/amd_windows
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
1f50356e8e Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
likelovewant
00beadf67e update 2024-07-10 23:40:16 +08:00
likelovewant
b0a43b1700 Update amd_windows.go 2024-07-10 21:43:21 +08:00
likelovewant
b8fdb0387c remove igpu limits 2024-07-02 11:06:26 +08:00
likelovewant
d772472225 Merge branch 'ollama:main' into main 2024-07-02 01:17:34 +08:00
likelovewant
1c648e512e remove code to support igpu 2024-06-29 22:32:45 +08:00
likelovewant
0e42bf50ca Merge upstream/main and resolve conflicts 2024-06-25 00:54:58 +08:00
Daniel Hiltgen
9929751cc8 Disable concurrency for AMD + Windows
Until ROCm v6.2 ships, we won't be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt in, but will need to pay attention to model sizes, otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Daniel Hiltgen
784bf88b0d Wire up windows AMD driver reporting
This seems to be the ROCm version, not actually the driver version, but
it may be useful for toggling VRAM-reporting logic in the future.
2024-06-18 16:22:47 -07:00
Daniel Hiltgen
6be309e1bd Centralize GPU configuration vars
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
6f351bf586 review comments and coverage 2024-06-14 14:55:50 -07:00
Daniel Hiltgen
43ed358f9a Refine GPU discovery to bootstrap once
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
likelovewant
a6390a8992 Merge branch 'ollama:main' into main 2024-06-07 17:25:53 +08:00
Michael Yang
e919f6811f lint windows 2024-06-04 11:13:30 -07:00
likelovewant
a4a435bf8f Update amd_windows.go 2024-06-03 14:55:48 +08:00
likelovewant
73c49d57e8 Update amd_windows.go
removing this would break the installer build
2024-05-24 20:06:28 +08:00
likelovewant
0e5b263a60 Update amd_windows.go
add iGPU support; remove those, otherwise gfx1035 reports as not working
2024-05-24 15:26:24 +08:00
Daniel Hiltgen
8727a9c140 Record more GPU information
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Daniel Hiltgen
e592e8fccb Support Fedora's standard ROCm location 2024-05-01 15:47:12 -07:00
Daniel Hiltgen
0d6687f84c AMD gfx patch rev is hex
Correctly handle gfx90a discovery
2024-04-24 09:43:52 -07:00
Daniel Hiltgen
34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
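For reference, the two variables named in the commit above can be set before starting the server; the values here are illustrative, not recommendations from the commit.

```shell
# Allow up to 4 parallel requests per loaded model and 2 models
# resident at once (the defaults at the time of this commit were 1 and 1).
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_LOADED_MODELS=2
ollama serve
```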
Daniel Hiltgen
4a5c9b8035 Finish unwinding idempotent payload logic
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent.  This finishes switching to always extract
the payloads, and now that idempotency is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Daniel Hiltgen
6c5ccb11f9 Revamp ROCm support
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, which defaults to `~/.ollama`.  The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed.  It also cleans up after itself.

We now build only a single ROCm version (latest major) on both windows
and linux.  Given the large size of ROCm's tensor files, we split the
dependency out.  It's bundled into the installer on windows, and is a
separate download on linux.  The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present; if not, it downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us.  For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
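The discovery idea in the commit above — check each GPU found via sysfs against the set of gfx targets the bundled ROCm build supports, and degrade to CPU instead of letting llama.cpp+rocm assert — can be sketched roughly as below. The function name and the supported-target set are illustrative assumptions, not Ollama's actual code.

```go
package main

import "fmt"

// rocmSupports reports whether a discovered gfx target is in the
// build's supported set. Unsupported GPUs fall back to the CPU runner
// rather than crashing at inference time.
func rocmSupports(gfx string, supported map[string]bool) bool {
	return supported[gfx]
}

func main() {
	// Hypothetical supported set for a ROCm v6 build (gfx906 dropped,
	// per the "Adjust windows ROCm discovery" commit above).
	supported := map[string]bool{
		"gfx90a":  true,
		"gfx1030": true,
		"gfx1100": true,
	}
	for _, gpu := range []string{"gfx90a", "gfx906"} {
		if rocmSupports(gpu, supported) {
			fmt.Printf("%s: use ROCm runner\n", gpu)
		} else {
			fmt.Printf("%s: unsupported, degrading to CPU\n", gpu)
		}
	}
}
```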