Option to remove part of chat history when prompt limit is reached to prevent reprocessing every generation? #2356
Replies: 3 comments 1 reply
- Yes, after a while it slowly starts to say stranger and stranger things the longer the chat history gets. It goes back to normal after deleting the chat history, but then the AI also loses all context of the discussion...
- What about this? It seems like quite a good idea.
- In the Parameters tab, see if you have a "Truncate the prompt up to this length" option and whether it does what you want. If it does crop the top of the prompt, you might lose some initial instructions, and I don't know whether the system re-inserts them automatically somewhere. And yes, if that's what it does, it will essentially make the AI "forget" the earlier parts of the conversation.
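  For illustration only, here is a rough sketch of that idea done safely: rebuild the prompt inside a token budget while always re-inserting the initial instructions at the top, so cropping old history cannot drop them. This is not the webui's actual code; `build_prompt` and `count_tokens` are made-up names, and the whitespace tokenizer is just a stand-in for the model's real tokenizer.

  ```python
  # A rough sketch, not the webui's actual code: keep the instructions, crop old turns.

  def count_tokens(text: str) -> int:
      # Placeholder tokenizer: real code would call the model's own tokenizer.
      return len(text.split())

  def build_prompt(system_prompt: str, history: list[str], max_tokens: int) -> str:
      budget = max_tokens - count_tokens(system_prompt)
      kept: list[str] = []
      # Walk from the newest message backwards, keeping as much recent history as fits.
      for message in reversed(history):
          cost = count_tokens(message)
          if cost > budget:
              break
          kept.append(message)
          budget -= cost
      # The instructions are re-inserted here, so only old chat turns are forgotten.
      return "\n".join([system_prompt] + list(reversed(kept)))
  ```

  Built this way, only the earlier parts of the conversation are forgotten, never the instructions themselves.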
- When the token limit is reached, the prompt gets truncated on every generation, which takes forever because the cache can no longer be used and all tokens have to be reprocessed each time. I have very low-end hardware.
  Is it possible to automatically remove the beginning of the history down to a set number of tokens once the limit is reached, so llama.cpp's caching can work again?
  I hope this is understandable, as I'm not a programmer.
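  For what it's worth, a rough sketch of what is being asked for (not actual webui or llama.cpp code; `trim_history` and its parameters are hypothetical): once the limit is hit, drop the oldest messages in one large chunk rather than sliding the window a little every turn, so the remaining prefix stays identical across many generations and a prompt-prefix cache can keep being reused.

  ```python
  # A rough sketch of the idea, assuming a message-list history and some tokenizer.
  # Dropping old messages in one big chunk keeps the remaining prefix unchanged for
  # many generations, so cached prompt processing can keep being hit instead of
  # reprocessing every token each time.

  def trim_history(history: list[str], count_tokens, max_tokens: int,
                   keep_ratio: float = 0.5) -> list[str]:
      """Drop whole old messages until the history fits in keep_ratio * max_tokens."""
      total = sum(count_tokens(m) for m in history)
      if total <= max_tokens:
          return history                      # still under the limit: change nothing
      target = int(max_tokens * keep_ratio)   # free a large chunk in one go
      trimmed = list(history)
      while trimmed and total > target:
          total -= count_tokens(trimmed.pop(0))
      return trimmed
  ```

  Trimming down to roughly half the limit at once means the prompt prefix only changes occasionally instead of on every single turn, which is what would let the cache keep working between trims.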