Hi there, I am also working on a performance comparison across different LLM inference frameworks. For batched prompts, llama.cpp performs considerably worse than I expected. I found that your method uses the `-b` option of `./llama-bench` to set the batch size. However, it is not clear to me whether this parameter means the same thing as the `batch_size` of other frameworks.

> n_batch (-b) don't affect how much of the context you can use, it is just a limit to how many tokens you can put in a single batch.

If I understand this correctly, setting `-b` to 32 means that at most 32 tokens are passed to a single `llama_decode()` call, rather than 32*1024 tokens being placed into the input batch at once.

Here are some other related links from llama.cpp: batch_prompt, batch-size and ubatch-size.
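For reference, here is roughly what that limit looks like in code. This is a minimal sketch of a prompt-eval loop, assuming the recent llama.cpp API where `llama_batch_get_one()` takes just a token pointer and a count (older versions also took a starting position and a sequence id), so treat it as illustrative rather than the exact code path `llama-bench` uses:

```cpp
// Sketch: a long prompt is split into chunks of at most n_batch tokens,
// and each chunk is submitted through its own llama_decode() call.
// n_batch is what -b controls; it does not shrink the prompt itself.
#include <algorithm>
#include <vector>

#include "llama.h"

static bool eval_prompt(llama_context * ctx,
                        std::vector<llama_token> & prompt_tokens,
                        int n_batch) {
    const int n_tokens = (int) prompt_tokens.size();
    for (int i = 0; i < n_tokens; i += n_batch) {
        const int n_eval = std::min(n_batch, n_tokens - i);
        // At most n_batch tokens go into each decode call; after the loop
        // the whole prompt is in the KV cache regardless of the -b value.
        llama_batch batch = llama_batch_get_one(prompt_tokens.data() + i, n_eval);
        if (llama_decode(ctx, batch) != 0) {
            return false;
        }
    }
    return true;
}
```

So with something like `llama-bench -p 1024 -b 32`, the full 1024-token prompt is still processed; it is just fed to the model in chunks of at most 32 tokens per decode call, which is why a small `-b` can make prompt processing look slow. If what you want to compare is how many prompts are served concurrently (what `batch_size` usually means in server-style frameworks), that is a different knob from `-b`.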