[Badcase]: Inference with Qwen2.5-32B-Instruct-GPTQ-Int4 produces garbled text !!!!!!!!!!!!!!!!!! #945
Comments
Inference file
The same script can run inference with the Qwen2.5-32B-Instruct-GPTQ-Int8 model and produce normal output. Could it be a problem with the inference parameters?
Have you tried upgrading the vllm and auto-gptq packages?
It's still the same after the upgrade.
I ran into the same problem. The model I used is the 32B GPTQ quantized model.
Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 and Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 both show this problem; output returns to normal once the prompt exceeds 60 tokens.
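A minimal sketch of the length workaround described above, assuming the Hugging Face transformers tokenizer for this model; the 60-token threshold and the filler string are empirical values from this thread, not documented limits:

```python
# Sketch of the workaround reported above: pad any prompt shorter than
# ~60 tokens before sending it to the model. Threshold and filler text
# are assumptions taken from this thread, not an official fix.
from transformers import AutoTokenizer

MODEL = "Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4"
MIN_PROMPT_TOKENS = 60  # empirical threshold reported in this thread

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def pad_prompt(prompt: str) -> str:
    """Append harmless filler until the prompt exceeds the empirical threshold."""
    while len(tokenizer.encode(prompt)) < MIN_PROMPT_TOKENS:
        prompt += " Please answer carefully."
    return prompt

print(pad_prompt("你好"))
```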
@linzhengtian Please provide steps to reproduce. I cannot reproduce it with vLLM using the settings above. (The input sequence length is also about 30 tokens.)
I ran into the same problem.
@noanti see this comment: #945 (comment) |
Tested on V100; it failed with an infinite run of !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
The same here. |
Try setting the quantization method to Marlin.
Unfortunately, V100 is SM70, so it does not support Marlin.
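For GPUs that do support Marlin (SM80+, e.g. the reporter's A100s), a hedged sketch of selecting the kernel explicitly via vLLM's offline API; the `quantization="gptq_marlin"` value matches vLLM 0.6.x and may differ in other versions:

```python
# Sketch: explicitly request the GPTQ-Marlin kernel in vLLM (offline API).
# Requires an SM80+ GPU; V100 (SM70) falls back to the plain GPTQ kernel.
# "gptq_marlin" is the vLLM 0.6.x value; check your version's docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
    quantization="gptq_marlin",   # force the Marlin kernel instead of plain GPTQ
    tensor_parallel_size=2,       # matches the reporter's 2x A100 setup
)
params = SamplingParams(temperature=0.7, max_tokens=128)
print(llm.generate(["Hello"], params)[0].outputs[0].text)
```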
I tried vllm 0.6.2 and 0.6.3, and the problem persists. As mentioned above, increasing the prompt to more than 50 tokens makes the output normal again. Very strange...
@noanti My cards are 4x V100 16G. Deploying qwen2.5-72b-gptq-int4 gives an out-of-memory error. How did you deploy it? Could you share your command?
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread. |
I also hit the infinite-exclamation-mark (!!!!!!) problem with vllm 0.6.4.post1, two P100 cards, and Qwen2.5-32B-Instruct-GPTQ-Int4. My fix:
if len(messages) <= 1:
    messages.extend([
        {"role": "user", "content": "你好"},
        {"role": "assistant", "content": "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"},
    ])
This workaround is not elegant, but it does work for me. Of course, my system prompt is also over 50 tokens.
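A slightly cleaner variant of the same idea, assuming an OpenAI-style messages list: inflate the system prompt past the empirical threshold instead of injecting a fake conversation turn. The ~60-token threshold and filler text are assumptions taken from this thread:

```python
# Alternative sketch of the same workaround: lengthen the system prompt
# instead of injecting a dummy user/assistant turn. The ~60-token threshold
# is the empirical value reported in this thread, not a documented limit.
FILLER = " You are a helpful assistant." * 20  # pushes the prompt well past 60 tokens

def ensure_long_prompt(messages: list[dict]) -> list[dict]:
    """Prepend a padded system message if the conversation is very short."""
    if len(messages) <= 1:
        return [{"role": "system", "content": "You are a helpful assistant." + FILLER}] + messages
    return messages

print(ensure_long_prompt([{"role": "user", "content": "你好"}]))
```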
Model Series
Qwen2.5
What are the models used?
Qwen2.5-32B-Instruct-GPTQ-Int4
What is the scenario where the problem happened?
Running inference on Qwen2.5-32B-Instruct-GPTQ-Int4 with vLLM produces garbled text (!!!!!!!!!!!!!!!!!!).
Is this badcase known and can it be solved using available techniques?
Information about environment
python==3.10
gpu: A100 80GB * 2
CUDA Version: 12.4
Driver Version: 550.54.15
PyTorch: 2.3.0+cu121
pip list
anaconda-anon-usage 0.4.4
archspec 0.2.3
boltons 23.0.0
Brotli 1.0.9
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
conda 24.7.1
conda-content-trust 0.2.0
conda-libmamba-solver 24.7.0
conda-package-handling 2.3.0
conda_package_streaming 0.10.0
cryptography 42.0.5
distro 1.9.0
frozendict 2.4.2
idna 3.7
jsonpatch 1.33
jsonpointer 2.1
libmambapy 1.5.8
menuinst 2.1.2
packaging 24.1
pip 24.2
platformdirs 3.10.0
pluggy 1.0.0
pycosat 0.6.6
pycparser 2.21
PySocks 1.7.1
requests 2.32.3
ruamel.yaml 0.17.21
setuptools 72.1.0
tqdm 4.66.4
truststore 0.8.0
urllib3 2.2.2
wheel 0.43.0
zstandard 0.22.0
Description
Steps to reproduce
This happens to Qwen2.5-32B-Instruct-GPTQ-Int4
The badcase can be reproduced with the following steps:
The following example input & output can be used:
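The report omits the actual commands, so here is a hedged reconstruction of a typical setup, assuming vLLM's OpenAI-compatible server and a short (under ~60 token) prompt, based on the chat-completion response format shown under Expected results below:

```python
# Hedged reconstruction (the issue omits the exact commands). Assumes a vLLM
# OpenAI-compatible server is already running, e.g.:
#   vllm serve Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 --tensor-parallel-size 2
# A short prompt (< ~60 tokens) reportedly triggers the garbled "!!!" output.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4",
    messages=[{"role": "user", "content": "你好"}],  # short prompt triggers the bug
    max_tokens=512,
)
print(resp.choices[0].message.content)  # observed: "!!!!!!..." instead of text
```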
Expected results
Normal text output was expected; the response actually received was:
{"model":"Qwen2-7B-Instruct","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","function_call":null},"finish_reason":"stop"}],"created":1727075660}
Attempts to fix
After switching to the Qwen2.5-72B-Instruct-GPTQ-Int4 model, the output is normal.
Anything else helpful for investigation
I find that this problem also occurs with Qwen1.5-32B-Instruct-GPTQ-Int4.