For simplicity, parallelize embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simple, since it builds on the existing parallel request handling already used for text completion.
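A minimal sketch of that fan-out in the handler, assuming a hypothetical `embedOne` call into the already-loaded runner (the real request plumbing and names differ):

```go
package server

import (
	"context"
	"fmt"
	"sync"
)

// embedOne stands in for the call that sends a single input to the
// already-loaded runner; the real request path is not shown here.
func embedOne(ctx context.Context, input string) ([]float32, error) {
	return []float32{0}, nil
}

// embedAll fans the inputs out in the API handler, bounding the number
// of in-flight runner requests with a simple semaphore.
func embedAll(ctx context.Context, inputs []string, parallel int) ([][]float32, error) {
	if parallel < 1 {
		parallel = 1
	}
	results := make([][]float32, len(inputs))
	errs := make([]error, len(inputs))
	sem := make(chan struct{}, parallel)
	var wg sync.WaitGroup

	for i, in := range inputs {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int, in string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i], errs[i] = embedOne(ctx, in)
		}(i, in)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("embedding request failed: %w", err)
		}
	}
	return results, nil
}
```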
Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory, and disk paging). This check was already applied on Linux and is now applied on Windows as well.
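A sketch of the kind of pre-load check described above; `checkMemory` and the byte counts are placeholders for whatever the scheduler estimates and probes, not the project's actual API:

```go
package llm

import "fmt"

// checkMemory refuses a load that would exhaust memory rather than
// letting the system fall back to heavy disk paging.
func checkMemory(required, freeVRAM, freeSystem uint64) error {
	if required > freeVRAM+freeSystem {
		return fmt.Errorf("model requires %d MiB but only %d MiB of memory is available",
			required/(1<<20), (freeVRAM+freeSystem)/(1<<20))
	}
	return nil
}
```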
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If numactl is detected in the PATH, use it; otherwise fall back to the basic "distribute" mode.
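A sketch of that mode selection, assuming the runner accepts llama.cpp-style `--numa numactl` / `--numa distribute` arguments and that the node count can be read from Linux sysfs; the exact wiring in the real code may differ:

```go
package llm

import (
	"os"
	"os/exec"
	"strings"
)

// numaArgs picks extra runner arguments for multi-node systems: prefer
// numactl when it is on the PATH, otherwise fall back to the basic
// "distribute" mode. Single-node systems get no extra arguments.
func numaArgs() []string {
	entries, err := os.ReadDir("/sys/devices/system/node") // Linux sysfs: node0, node1, ...
	if err != nil {
		return nil
	}
	nodes := 0
	for _, e := range entries {
		if strings.HasPrefix(e.Name(), "node") {
			nodes++
		}
	}
	if nodes < 2 {
		return nil
	}
	if _, err := exec.LookPath("numactl"); err == nil {
		return []string{"--numa", "numactl"}
	}
	return []string{"--numa", "distribute"}
}
```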
Make sure that if something goes wrong spawning the process, the user gets enough information to try to self-correct, or at least to file a bug with details so we can fix it. Once the process starts, we immediately change back to the recommended setting to prevent the blocking dialog. This ensures that if the model fails to load (OOM, unsupported model type, etc.), the process exits quickly and we can scan the subprocess's stdout/stderr for the reason to report via the API.
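A sketch of the spawn-and-scan pattern using only the standard library; the function names are illustrative, and the Windows error-mode handling mentioned above is omitted:

```go
package llm

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// startRunner launches the subprocess with its output captured so a
// later failure can be explained to the user.
func startRunner(bin string, args ...string) (*exec.Cmd, *bytes.Buffer, error) {
	var out bytes.Buffer
	cmd := exec.Command(bin, args...)
	cmd.Stdout = &out
	cmd.Stderr = &out
	if err := cmd.Start(); err != nil {
		// Spawn failure: include the binary and arguments so the user
		// can self-correct or file an actionable bug report.
		return nil, nil, fmt.Errorf("failed to start %s %s: %w", bin, strings.Join(args, " "), err)
	}
	return cmd, &out, nil
}

// loadError scans captured output for a human-readable reason after an
// early exit (OOM, unsupported model type, etc.).
func loadError(out *bytes.Buffer) string {
	lines := strings.Split(strings.TrimSpace(out.String()), "\n")
	for i := len(lines) - 1; i >= 0; i-- {
		l := strings.TrimSpace(lines[i])
		if strings.Contains(strings.ToLower(l), "error") {
			return l
		}
	}
	return ""
}
```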
On Windows, the exit status ends up being what many users search for, and they pile onto unrelated issues as a result. This refines the reporting so that when we have a more detailed message, we suppress the exit-status portion of the error.
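A sketch of that refinement; the names are illustrative:

```go
package llm

import "fmt"

// loadFailure reports the most useful error: when a detailed reason was
// scraped from the runner's output, omit the generic exit status so it
// doesn't become the string users search for.
func loadFailure(exitErr error, detail string) error {
	if detail != "" {
		return fmt.Errorf("error loading model: %s", detail)
	}
	return fmt.Errorf("error loading model: %v", exitErr)
}
```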
The v5 HIP library returns unsupported GPUs which won't enumerate at inference time in the runner, so this makes sure discovery is aligned with what the runner will actually see. The gfx906 cards are no longer supported, so we shouldn't compile with that GPU type as it won't enumerate at runtime.
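A sketch of filtering discovery down to devices the runner can use; the support list shown is illustrative only, not the project's actual list:

```go
package discover

import "strings"

// supportedGFX is an example list of compiled-in GPU targets; note
// that gfx906 is intentionally absent.
var supportedGFX = []string{"gfx90a", "gfx1030", "gfx1100"}

func isSupported(gfx string) bool {
	for _, s := range supportedGFX {
		if strings.HasPrefix(gfx, s) {
			return true
		}
	}
	return false
}

// filterGPUs drops devices the HIP library reports but the runner
// can't actually use at inference time.
func filterGPUs(gfxVersions []string) []string {
	var usable []string
	for _, v := range gfxVersions {
		if isSupported(v) {
			usable = append(usable, v)
		}
	}
	return usable
}
```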