Replies: 1 comment
Same for me here on an RTX 3070... speed is really slow :(
When using LLaMA 13B 4-bit I get around 2.5 tokens/s (tested in notebook mode, generating 200 tokens from a prompt of just one token).
That's strange, because I get better speed on CPU using llama.cpp with the same 13B 4-bit model. I also saw a Tom's Hardware benchmark where they got 19.5 tokens/s with the same repo and card. What results do you get?
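
For anyone who wants to compare numbers the same way, here is a rough sketch of how a tokens/s figure like the ones above can be measured. It is not how the web UI computes its own stats; `tokens_per_second` and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in that you would replace with your real generation call.

```python
import time


def tokens_per_second(generate, n_tokens=200):
    """Time one call to `generate` and return generated tokens per second.

    `generate` is a hypothetical callable (an assumption, not a real API):
    it takes the number of new tokens to produce and returns how many
    tokens it actually produced.
    """
    start = time.perf_counter()
    produced = generate(n_tokens)
    elapsed = time.perf_counter() - start
    return produced / elapsed


def fake_generate(n_tokens):
    # Stand-in for a real model call; it only sleeps briefly.
    time.sleep(0.01)
    return n_tokens


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, 200)
    print(f"{rate:.1f} tokens/s")
```

For scale: 200 tokens at 2.5 tokens/s takes 80 seconds, while at 19.5 tokens/s it would take a little over 10 seconds.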