ollama-for-amd

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-24 07:28:27 +00:00

Author	SHA1	Message	Date
likelovewant	06db1f2cf5	Merge branch 'ollama:main' into main	2024-09-08 11:44:55 +08:00
Daniel Hiltgen	f29b167e1a	Use cuda v11 for driver 525 and older (#6620) It looks like driver 525 (aka, cuda driver 12.0) has problems with the cuda v12 library we compile against, so run v11 on those older drivers if detected.	2024-09-03 17:15:31 -07:00
likelovewant	76feb6c569	Merge branch 'ollama:main' into main	2024-08-28 12:02:21 +08:00
Daniel Hiltgen	93ea9240ae	Move ollama executable out of bin dir (#6535)	2024-08-27 16:19:00 -07:00
Daniel Hiltgen	69be940bf6	gpu: Group GPU Library sets by variant (#6483) The recent cuda variant changes uncovered a bug in ByLibrary which failed to group by common variant for GPU types.	2024-08-23 15:11:56 -07:00
Daniel Hiltgen	7a1e1c1caf	gpu: Ensure driver version set before variant (#6480) During rebasing, the ordering was inverted causing the cuda version selection logic to break, with driver version being evaluated as zero incorrectly causing a downgrade to v11.	2024-08-23 11:21:12 -07:00
likelovewant	f9e1f572c2	Merge branch 'ollama:main' into main	2024-08-21 10:45:57 +08:00
Daniel Hiltgen	f9e31da946	Review comments	2024-08-19 10:36:15 -07:00
Daniel Hiltgen	88bb9e3328	Adjust layout to bin+lib/ollama	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	4fe3a556fa	Add cuda v12 variant and selection logic Based on compute capability and driver version, pick v12 or v11 cuda variants.	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	fc3b4cda89	Report GPU variant in log	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	d470ebe78b	Add Jetson cuda variants for arm This adds new variants for arm64 specific to Jetson platforms	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	74d45f0102	Refactor linux packaging This adjusts linux to follow a similar model to windows with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the go binary is fully self contained.	2024-08-19 09:38:53 -07:00
likelovewant	4574e557ee	update to hip sdk 6.1.2	2024-08-16 15:25:43 +08:00
likelovewant	b7d38e2ccd	Merge branch 'ollama:main' into main	2024-08-13 11:27:09 +08:00
Michael Yang	160d9d4900	Merge pull request #6171 from ollama/mxyng/remove-temp removeall to remove non-empty temp dirs	2024-08-09 15:47:13 -07:00
Daniel Hiltgen	5bca2e60a7	Harden intel bootstrap for nil pointers	2024-08-09 11:31:38 -07:00
likelovewant	ca312b344f	Merge branch 'ollama:main' into main	2024-08-07 17:20:55 +08:00
Daniel Hiltgen	f457d63400	Implement linux NUMA detection If the system has multiple numa nodes, enable numa support in llama.cpp. If we detect numactl in the path, use that, else use the basic "distribute" mode.	2024-08-05 12:56:20 -07:00
Michael Yang	43f9d92008	close pid file	2024-08-05 00:41:16 -07:00
Michael Yang	ed6c8bfe57	removeall to remove non-empty temp dirs	2024-08-05 00:41:16 -07:00
Michael Yang	b732beba6a	lint	2024-08-01 17:06:06 -07:00
likelovewant	0d4292b4b1	Merge branch 'ollama:main' into main	2024-08-01 18:30:28 +08:00
Michael Yang	5c1912769e	Merge pull request #5473 from ollama/mxyng/environ fix: environ lookup	2024-07-31 10:18:05 -07:00
likelovewant	776aa9ceb2	resolve merge conflicts	2024-07-30 18:53:59 +08:00
Daniel Hiltgen	7c2a157ca4	Ensure amd gpu nodes are numerically sorted For systems that enumerate over 10 CPUs the default lexicographical sort order interleaves CPUs and GPUs.	2024-07-24 13:43:26 -07:00
Michael Yang	e2c3f6b3e2	string	2024-07-22 11:27:52 -07:00
Michael Yang	55cd3ddcca	bool	2024-07-22 11:27:21 -07:00
Michael Yang	35b89b2eab	rfc: dynamic environ lookup	2024-07-22 11:25:30 -07:00
likelovewant	5cae567ee8	merge upstream update and resolve the conflicts	2024-07-22 17:00:43 +08:00
Daniel Hiltgen	283948c83b	Adjust windows ROCm discovery The v5 hip library returns unsupported GPUs which won't enumerate at inference time in the runner so this makes sure we align discovery. The gfx906 cards are no longer supported so we shouldn't compile with that GPU type as it won't enumerate at runtime.	2024-07-20 15:17:50 -07:00
likelovewant	d63280cf56	change back to 5.7	2024-07-14 01:02:00 +08:00
likelovewant	5505a018b2	Resolved merge conflicts	2024-07-12 20:44:04 +08:00
Jeffrey Morgan	c4cf8ad559	llm: avoid loading model if system memory is too small (#5637) * llm: avoid loading model if system memory is too small * update log * Instrument swap free space On linux and windows, expose how much swap space is available so we can take that into consideration when scheduling models * use `systemSwapFreeMemory` in check --------- Co-authored-by: Daniel Hiltgen <daniel@ollama.com>	2024-07-11 16:42:57 -07:00
likelovewant	514e9186d3	update the igpu support	2024-07-11 23:28:08 +08:00
Daniel Hiltgen	4cfcbc328f	Merge pull request #5124 from dhiltgen/amd_windows Wire up windows AMD driver reporting	2024-07-10 12:50:23 -07:00
Daniel Hiltgen	8ea500441d	Merge pull request #5580 from dhiltgen/cuda_overhead Detect CUDA OS overhead	2024-07-10 12:47:31 -07:00
Daniel Hiltgen	1f50356e8e	Bump ROCm on windows to 6.1.2 This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly so we can't yet enable concurrency by default.	2024-07-10 11:01:22 -07:00
likelovewant	00beadf67e	update	2024-07-10 23:40:16 +08:00
likelovewant	b0a43b1700	Update amd_windows.go	2024-07-10 21:43:21 +08:00
Daniel Hiltgen	f6f759fc5f	Detect CUDA OS Overhead This adds logic to detect skew between the driver and management library which can be attributed to OS overhead and records that so we can adjust subsequent management library free VRAM updates and avoid OOM scenarios.	2024-07-09 12:21:50 -07:00
likelovewant	72bcdc1d4e	Merge branch 'ollama:main' into main	2024-07-08 16:02:24 +08:00
Jeffrey Morgan	f8241bfba3	gpu: report system free memory instead of 0 (#5521)	2024-07-06 19:35:04 -04:00
likelovewant	dc1d1a121b	Merge branch 'ollama:main' into main	2024-07-05 21:48:45 +08:00
Daniel Hiltgen	ef757da2c9	Better nvidia GPU discovery logging Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.	2024-07-03 10:50:40 -07:00
likelovewant	b8fdb0387c	remove igpu limits	2024-07-02 11:06:26 +08:00
likelovewant	d772472225	Merge branch 'ollama:main' into main	2024-07-02 01:17:34 +08:00
likelovewant	1c648e512e	remove code to support igpu	2024-06-29 22:32:45 +08:00
likelovewant	0e42bf50ca	Merge upstream/main and resolve conflicts	2024-06-25 00:54:58 +08:00
Daniel Hiltgen	9929751cc8	Disable concurrency for AMD + Windows Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on windows, which makes automatic concurrency too risky. Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.	2024-06-21 15:45:05 -07:00


186 Commits