
[Badcase]: Inference with Qwen2.5-32B-Instruct-GPTQ-Int4 produces garbled text !!!!!!!!!!!!!!!!!! #945

zhanaali opened this issue Sep 23, 2024 · 18 comments
Labels: help wanted (Extra attention is needed)

@zhanaali

Model Series

Qwen2.5

What are the models used?

Qwen2.5-32B-Instruct-GPTQ-Int4

What is the scenario where the problem happened?

When running inference on Qwen2.5-32B-Instruct-GPTQ-Int4 with vLLM, the output is garbled text: !!!!!!!!!!!!!!!!!!

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

python==3.10
gpu: A100 80GB * 2
CUDA Version: 12.4
Driver Version: 550.54.15
PyTorch: 2.3.0+cu121
pip list

anaconda-anon-usage 0.4.4
archspec 0.2.3
boltons 23.0.0
Brotli 1.0.9
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
conda 24.7.1
conda-content-trust 0.2.0
conda-libmamba-solver 24.7.0
conda-package-handling 2.3.0
conda_package_streaming 0.10.0
cryptography 42.0.5
distro 1.9.0
frozendict 2.4.2
idna 3.7
jsonpatch 1.33
jsonpointer 2.1
libmambapy 1.5.8
menuinst 2.1.2
packaging 24.1
pip 24.2
platformdirs 3.10.0
pluggy 1.0.0
pycosat 0.6.6
pycparser 2.21
PySocks 1.7.1
requests 2.32.3
ruamel.yaml 0.17.21
setuptools 72.1.0
tqdm 4.66.4
truststore 0.8.0
urllib3 2.2.2
wheel 0.43.0
zstandard 0.22.0

Description

Steps to reproduce

This happens with Qwen2.5-32B-Instruct-GPTQ-Int4.
The badcase can be reproduced with the following steps:

  1. ...
  2. ...

The following example input & output can be used:

{
   "content": "你好",
   "role": "user"
}
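
For reference, a minimal way to send that message through the OpenAI-compatible API; the server address, API key, and served model name below are assumptions rather than values taken from this report:

from openai import OpenAI

# Assumed server address and served model name; adjust to the actual deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen2.5-32B-Instruct-GPTQ-Int4",
    messages=[{"role": "user", "content": "你好"}],
)
print(resp.choices[0].message.content)  # in this badcase the text is a run of "!"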

Expected results

{"model":"Qwen2-7B-Instruct","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","function_call":null},"finish_reason":"stop"}],"created":1727075660}

Attempts to fix

After switching to the Qwen2.5-72B-Instruct-GPTQ-Int4 model, the output is normal.

Anything else helpful for investigation

I find that this problem also happens with Qwen1.5-32B-Instruct-GPTQ-Int4.

@zhanaali (Author)

Inference file
openai_api_32b.txt

@zhanaali (Author)

The same script can run inference on the Qwen2.5-32B-Instruct-GPTQ-Int8 model and the output is normal. Is it a problem with the inference parameters?

@hzhwcmhf (Member)

Have you tried upgrading the vllm and autogptq packages?

@zhanaali (Author)

It's still the same after the upgrade.
@hzhwcmhf

@leavegee

I ran into the same problem. The model I used is the 32B GPTQ-quantized model.
Name: vllm
Version: 0.6.1.post2
Inference command:
vllm serve qwen25-32b --quantization gptq --host 0.0.0.0 --port 8080
Hoping for a solution.

@jklj077 (Collaborator) commented Sep 26, 2024

Hi, could you try installing the latest vllm in a fresh environment?

conda create -n vllm python=3.11
conda activate vllm
pip install vllm

This should install:

  • vllm 0.6.2
  • torch 2.4.0
  • CUDA 12.1 runtime

Tested with:

  • 2 or 8 NVIDIA A10
  • NVIDIA Driver 535.183.06

The result appears normal. [screenshots of the normal output omitted]

@linzhengtian

Both Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 and Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 show this problem; once the prompt exceeds 60 tokens the output returns to normal.

@jklj077 (Collaborator) commented Sep 26, 2024

@linzhengtian Please provide steps to reproduce. I cannot reproduce it with vLLM using the settings above. (The input sequence length is also about 30 tokens.)

@noanti commented Sep 26, 2024

Ran into the same problem.
vllm==0.6.1.post2
GPUs: 2x V100.
In the same environment, deploying qwen2.5-72b-gptq-int4 and qwen2.5-14b-gptq-int4 both works fine; only the 32b fails, and it only outputs exclamation marks.

@jklj077 (Collaborator) commented Sep 27, 2024

@noanti see this comment: #945 (comment)

@QwertyJack

@noanti see this comment: #945 (comment)

Tested on V100; it still fails with an infinite stream of !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

@QwertyJack

Both Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 and Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 show this problem; once the prompt exceeds 60 tokens the output returns to normal.

The same here.

@featherace

Inference file openai_api_32b.txt

Try setting quantization = "gptq_marlin" or quantization = None.
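
If the inference script uses vLLM's offline LLM API, a sketch of that change could look like the following; the model path, tensor-parallel size, and sampling settings are assumptions, not values taken from the script in this thread:

from vllm import LLM, SamplingParams

# Sketch only: pick the GPTQ Marlin kernel explicitly;
# leaving quantization unset/None lets vLLM choose a backend itself.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
    quantization="gptq_marlin",
    tensor_parallel_size=2,
)
outputs = llm.generate("你好", SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)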

@QwertyJack

Try to set quantization = "gptq_marlin" or quantization = None

Unfortunately, V100 is SM 7.0, so it does not support Marlin.

@noanti commented Oct 17, 2024

@noanti see this comment: #945 (comment)

I tried vllm 0.6.2 and 0.6.3 and the problem is still there. As mentioned earlier, increasing the prompt to more than 50 tokens makes the output normal, which is strange…

@shilei4260

@noanti My GPUs are 4x V100 16G. Deploying qwen2.5-72b-gptq-int4 gives an out-of-memory error. How did you deploy it? Could you share your command?

@github-actions (bot)

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

@wfnian commented Dec 24, 2024

On vllm 0.6.4.post1 with two P100 GPUs, Qwen2.5-32B-Instruct-GPTQ-Int4 also produced the infinite-exclamation-mark (!!!!!!) problem.
I then found in the ModelScope comment section the only workaround that has worked for me so far: before each question, add an extra round of dialogue right after the system prompt:

if len(messages) <= 1:
    # Pad the context with a dummy round right after the system prompt.
    messages.extend([
        {"role": "user", "content": "你好"},
        {"role": "assistant", "content": "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"},
    ])

This workaround is not elegant, but it does work for me. Admittedly, my system prompt is also over 50 tokens.
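
A self-contained version of that padding trick with the OpenAI client might look like this; the server address, model name, and system prompt are assumptions, and only the dummy user/assistant round comes from the comment above:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

messages = [{"role": "system", "content": "You are a helpful assistant."}]
if len(messages) <= 1:
    # Dummy round after the system prompt, so the prompt exceeds the ~50-token threshold reported above.
    messages.extend([
        {"role": "user", "content": "你好"},
        {"role": "assistant", "content": "!" * 80},
    ])
messages.append({"role": "user", "content": "你好"})

resp = client.chat.completions.create(
    model="Qwen2.5-32B-Instruct-GPTQ-Int4",
    messages=messages,
)
print(resp.choices[0].message.content)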

@github-actions github-actions bot removed the inactive label Dec 25, 2024