What should be the generation speed on RTX 3060? #506

ray2022 · 2023-03-22T23:19:44Z

With LLaMA 13B 4-bit I get around 2.5 tokens/s (tested in notebook mode, generating 200 tokens from a one-token prompt).
That's strange, because I get better speed on CPU using llama.cpp with 13B 4-bit. I also saw a Tom's Hardware benchmark where they got 19.5 tokens/s with the same repo and card. What results are you getting?
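
For anyone comparing numbers, here is a minimal sketch of how a tokens/s figure like the ones above could be measured, so results are taken the same way. It is not the repo's own benchmark code: `tokens_per_second`, `seconds_for_tokens`, and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in for a real model call.

```python
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[int], List[int]],
                      max_new_tokens: int = 200) -> float:
    """Time one call to `generate` and return new tokens per second."""
    start = time.perf_counter()
    new_tokens = generate(max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(new_tokens) / elapsed


def seconds_for_tokens(num_tokens: int, rate: float) -> float:
    """Wall-clock seconds needed to produce `num_tokens` at `rate` tokens/s."""
    return num_tokens / rate


def fake_generate(max_new_tokens: int) -> List[int]:
    # Hypothetical stand-in for a real model call: sleeps briefly per token.
    tokens = []
    for i in range(max_new_tokens):
        time.sleep(0.001)
        tokens.append(i)
    return tokens


if __name__ == "__main__":
    # Replace fake_generate with a wrapper around the actual generation call.
    print(f"{tokens_per_second(fake_generate, 200):.1f} tokens/s")
    # Time for a 200-token run at the two rates quoted in the post.
    print(f"{seconds_for_tokens(200, 2.5):.1f} s at 2.5 tokens/s")
    print(f"{seconds_for_tokens(200, 19.5):.1f} s at 19.5 tokens/s")
```

At 2.5 tokens/s a 200-token run takes 80 seconds, while at 19.5 tokens/s it would take a little over 10 seconds, so the gap should be easy to spot even with a stopwatch. Counting only newly generated tokens and excluding model load time keeps runs comparable.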

mindcz · 2023-03-24T11:50:08Z

Same for me here on an RTX 3070... speed is really slow :(
