mirror of
https://github.com/likelovewant/ollama-for-amd.git
synced 2025-12-21 22:33:56 +00:00
backend: Support graph computation that does not return an output
There are two cases where we may not have an output after computing:
- Prompt processing where the length of the input exceeds the batch size
- Internal memory management operations such as cache defrag and shift
@@ -49,7 +49,7 @@ type Context interface {
 	FromIntSlice(s []int32, shape ...int) (Tensor, error)
 
 	Forward(Tensor)
-	Compute(Tensor) Tensor
+	Compute(...Tensor)
 	Close()
 }
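The effect of the signature change can be illustrated with a minimal, self-contained sketch. Only the `Forward(Tensor)` and `Compute(...Tensor)` method signatures come from the diff above; the `toyTensor` and `toyContext` types and the `processPrompt` helper are hypothetical stand-ins invented for illustration and are not the backend's actual implementation. The sketch assumes that a caller requests an output only for the final batch of a prompt, and calls `Compute()` with no arguments otherwise.

```go
package main

import "fmt"

// Tensor is a simplified, hypothetical stand-in for the backend tensor type.
type Tensor interface {
	Name() string
}

// Context repeats only the two methods from the diff that matter here.
type Context interface {
	Forward(Tensor)
	Compute(...Tensor)
}

// toyTensor is a hypothetical tensor that remembers whether it was computed.
type toyTensor struct {
	name     string
	computed bool
}

func (t *toyTensor) Name() string { return t.name }

// toyContext is a hypothetical context that counts graph executions.
type toyContext struct {
	queued []Tensor
	runs   int
}

// Forward queues a tensor for the next graph execution.
func (c *toyContext) Forward(t Tensor) { c.queued = append(c.queued, t) }

// Compute runs the queued graph. Outputs are optional: any tensors passed in
// are marked as computed so the caller can read them afterwards.
func (c *toyContext) Compute(outputs ...Tensor) {
	c.runs++
	c.queued = c.queued[:0]
	for _, o := range outputs {
		if t, ok := o.(*toyTensor); ok {
			t.computed = true
		}
	}
}

// processPrompt splits nTokens into batches of at most batchSize tokens.
// Only the final batch asks Compute for an output; earlier batches do not.
func processPrompt(ctx Context, nTokens, batchSize int) *toyTensor {
	var last *toyTensor
	for start := 0; start < nTokens; start += batchSize {
		end := start + batchSize
		if end > nTokens {
			end = nTokens
		}
		t := &toyTensor{name: fmt.Sprintf("batch[%d:%d]", start, end)}
		ctx.Forward(t)
		if end < nTokens {
			ctx.Compute() // prompt longer than one batch: no output yet
		} else {
			ctx.Compute(t) // final batch: request the output
			last = t
		}
	}
	return last
}

func main() {
	ctx := &toyContext{}
	out := processPrompt(ctx, 10, 4)
	fmt.Printf("runs=%d output=%s computed=%t\n", ctx.runs, out.Name(), out.computed)
	// prints "runs=3 output=batch[8:10] computed=true"

	// A memory-management style operation (e.g. a cache shift) has no output.
	ctx.Forward(&toyTensor{name: "cache-shift"})
	ctx.Compute()
	fmt.Printf("runs=%d\n", ctx.runs) // prints "runs=4"
}
```

With the old `Compute(Tensor) Tensor` signature, the no-output calls in this sketch would have needed a placeholder tensor and a discarded return value. The variadic form lets the caller pass zero, one, or several outputs.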