ollama-for-amd

mirror of https://github.com/likelovewant/ollama-for-amd.git synced 2025-12-25 16:08:01 +00:00

Author	SHA1	Message	Date
likelovewant	b8fdb0387c	remove igpu limits	2024-07-02 11:06:26 +08:00
likelovewant	50463011dd	Merge branch 'ollama:main' into main	2024-07-02 10:56:16 +08:00
Josh	2425281317	Merge pull request #5336 from ollama/jyan/from-errors fix: trim spaces for FROM argument, don't trim inside of quotes	2024-07-01 16:32:46 -07:00
Josh	0403e9860e	Merge pull request #5421 from ollama/jyan/ver fix: add unsupported architecture message for linux/windows	2024-07-01 16:32:14 -07:00
Josh Yan	33a65e3ba3	error	2024-07-01 16:04:13 -07:00
Josh Yan	7e571f95f0	trimspace test case	2024-07-01 11:07:48 -07:00
likelovewant	d772472225	Merge branch 'ollama:main' into main	2024-07-02 01:17:34 +08:00
Daniel Hiltgen	e70610ef06	Merge pull request #5410 from dhiltgen/ctx_cleanup Fix case for NumCtx	2024-07-01 09:54:20 -07:00
Daniel Hiltgen	dfded7e075	Merge pull request #5364 from dhiltgen/concurrency_docs Document concurrent behavior and settings	2024-07-01 09:49:48 -07:00
Daniel Hiltgen	173b550438	Remove default auto from help message This may confuse users thinking "auto" is an acceptable string - it must be numeric	2024-07-01 09:48:05 -07:00
Daniel Hiltgen	cff3f44f4a	Fix case for NumCtx	2024-07-01 09:43:59 -07:00
Josh Yan	26e4e66faf	updated parsefile test	2024-07-01 09:43:49 -07:00
Daniel Hiltgen	3518aaef33	Merge pull request #4218 from dhiltgen/auto_parallel Enable concurrency by default	2024-07-01 08:32:29 -07:00
RAPID ARCHITECT	1963c00201	Update README.md (#5214) * Update README.md Added Mesop example to web & desktop * Update README.md --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-06-30 22:00:57 -04:00
Eduard	27402cb7a2	Update gpu.md (#5382) Runs fine on a NVIDIA GeForce GTX 1050 Ti	2024-06-30 21:48:51 -04:00
Jeffrey Morgan	c1218199cf	Update api.md	2024-06-29 16:22:49 -07:00
likelovewant	c03afb5bc4	Remove .vs/ directory files	2024-06-29 23:09:15 +08:00
likelovewant	6b5b3a2542	Add .vs/ to .gitignore	2024-06-29 22:59:09 +08:00
likelovewant	1c648e512e	remove code to support igpu v0.1.48-alpha	2024-06-29 22:32:45 +08:00
likelovewant	159dcaa93b	Merge branch 'ollama:main' into main	2024-06-29 20:59:45 +08:00
Jeffrey Morgan	717f7229eb	Do not shift context for sliding window models (#5368) * Do not shift context for sliding window models * truncate prompt > 2/3 tokens * only target gemma2	2024-06-28 19:39:31 -07:00
Daniel Hiltgen	aae56abb7c	Document concurrent behavior and settings	2024-06-28 13:15:57 -07:00
royjhan	5f034f5b63	Include Show Info in Interactive (#5342)	2024-06-28 13:15:52 -07:00
royjhan	b910fa9010	Ollama Show: Check for Projector Type (#5307) * Check exists projtype * Maintain Ordering	2024-06-28 11:30:16 -07:00
royjhan	6d4219083c	Update docs (#5312)	2024-06-28 09:58:14 -07:00
Michael Yang	1ed4f521c4	Merge pull request #5340 from ollama/mxyng/mem gemma2 graph	2024-06-27 14:26:49 -07:00
Michael Yang	de2163dafd	gemma2 graph	2024-06-27 13:34:52 -07:00
Josh Yan	9bd00041fa	trim all params	2024-06-27 11:18:38 -07:00
Josh Yan	4e986a823c	unquote, trimp space	2024-06-27 10:59:15 -07:00
Michael	2cc7d05012	update readme for gemma 2 (#5333) * update readme for gemma 2	2024-06-27 12:45:16 -04:00
likelovewant	b5286d46dc	Update gen_windows.ps1 v0.1.46-alpha	2024-06-27 12:55:18 +08:00
likelovewant	d5fd3ae7ea	Merge branch 'ollama:main' into main	2024-06-27 12:44:25 +08:00
Michael Yang	123a722a6f	zip: prevent extracting files into parent dirs (#5314)	2024-06-26 21:38:21 -07:00
Jeffrey Morgan	4d311eb731	llm: architecture patch (#5316)	2024-06-26 21:38:12 -07:00
likelovewant	0fc2f9c5f2	Merge branch 'ollama:main' into main	2024-06-25 19:22:17 +08:00
likelovewant	7ef869f2dc	Update gen_windows.ps1	2024-06-25 19:21:02 +08:00
Blake Mizerany	cb42e607c5	llm: speed up gguf decoding by a lot (#5246) Previously, some costly things were causing the loading of GGUF files and their metadata and tensor information to be VERY slow: * Too many allocations when decoding strings * Hitting disk for each read of each key and value, resulting in a not-okay amount of syscalls/disk I/O. The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro m3. This commit also prevents collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null, and are encoded as such in JSON. Also, this fixes a broken test that was not encoding valid GGUF.	2024-06-24 21:47:52 -07:00
Blake Mizerany	2aa91a937b	cmd: defer stating model info until necessary (#5248) This commit changes the 'ollama run' command to defer fetching model information until it really needs it. That is, when in interactive mode. It also removes one such case where the model information is fetch in duplicate, just before calling generateInteractive and then again, first thing, in generateInteractive. This positively impacts the performance of the command: ; time ./before run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.168 total ; time ./before run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.220 total ; time ./before run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./before run llama3 'hi' 0.02s user 0.01s system 2% cpu 1.217 total ; time ./after run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./after run llama3 'hi' 0.02s user 0.01s system 4% cpu 0.652 total ; time ./after run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./after run llama3 'hi' 0.01s user 0.01s system 5% cpu 0.498 total ; time ./after run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with or would you like to chat? ./after run llama3 'hi' 0.01s user 0.01s system 3% cpu 0.479 total ; time ./after run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./after run llama3 'hi' 0.02s user 0.01s system 5% cpu 0.507 total ; time ./after run llama3 'hi' Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat? ./after run llama3 'hi' 0.02s user 0.01s system 5% cpu 0.507 total	2024-06-24 20:14:03 -07:00
likelovewant	0e42bf50ca	Merge upstream/main and resolve conflicts v0.1.45-alpha	2024-06-25 00:54:58 +08:00
likelovewant	c570d01dff	Remove .vs/ from version control	2024-06-25 00:49:47 +08:00
Daniel Hiltgen	ccef9431c8	Merge pull request #5205 from dhiltgen/modelfile_use_mmap Fix use_mmap parsing for modelfiles	2024-06-21 16:30:36 -07:00
Daniel Hiltgen	642cee1342	Sort the ps output Provide consistent ordering for the ps command - longest duration listed first	2024-06-21 15:59:41 -07:00
royjhan	9a9e7d83c4	Docs (#5149)	2024-06-21 15:52:09 -07:00
Daniel Hiltgen	9929751cc8	Disable concurrency for AMD + Windows Until ROCm v6.2 ships, we wont be able to get accurate free memory reporting on windows, which makes automatic concurrency too risky. Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.	2024-06-21 15:45:05 -07:00
Daniel Hiltgen	17b7186cd7	Enable concurrency by default This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these by the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small VRAM GPUs so this change also refines the algorithm so that when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.	2024-06-21 15:45:05 -07:00
Michael Yang	189a43caa2	Merge pull request #5206 from ollama/mxyng/quantize fix: quantization with template	2024-06-21 13:44:34 -07:00
Michael Yang	e835ef1836	fix: quantization with template	2024-06-21 13:39:25 -07:00
Daniel Hiltgen	7e7749224c	Fix use_mmap parsing for modelfiles Add the new tristate parsing logic for the code path for modelfiles, as well as a unit test.	2024-06-21 12:27:19 -07:00
Daniel Hiltgen	c7c2f3bc22	Merge pull request #5194 from dhiltgen/linux_mmap_auto Refine mmap default logic on linux	2024-06-20 11:44:08 -07:00
Daniel Hiltgen	54a79d6a8a	Merge pull request #5125 from dhiltgen/fedora39 Bump latest fedora cuda repo to 39	2024-06-20 11:27:24 -07:00

3063 Commits