
8bits GPTQ quantization output #35460

Open

joshuaongg21 opened this issue Dec 30, 2024 · 4 comments

joshuaongg21 commented Dec 30, 2024

System Info

Hi, I noticed that with 8-bit quantization using GPTQConfig, model inference generally produces very poor results, with outputs that often don't make sense. Could this be an engineering issue with GPTQ, or is this typical behavior for GPTQ with 8-bit quantization? Thank you in advance!

Who can help?

@SunMarc @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below) - NQOpen

Reproduction

  1. By using the pre-quantized model Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8
  2. By quantizing meta-llama/Llama-3.1-8B-Instruct with GPTQConfig:
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "meta-llama/Llama-3.1-8B-Instruct"

    quantization = GPTQConfig(
        bits=8,
        group_size=128,
        dataset="c4",
        desc_act=False,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quantization,
        device_map="auto",
    )
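
A minimal generation call of the kind that exhibits the issue might look like this (the prompt and generation settings are illustrative assumptions, not the exact ones used in the report):

    # Illustrative only: the prompt and generation settings are assumptions.
    prompt = "Who wrote the novel Frankenstein?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))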

Expected behavior

  1. The model tends to generate unrelated outputs, such as !!!!!!!!!!!!!!!!!..... The generated output does not make sense at all.
@joshuaongg21 joshuaongg21 changed the title 8bits quantization 8bits GPTQ quantization output Dec 30, 2024
SunMarc (Member) commented Jan 2, 2025

Thanks for the report @joshuaongg21! This shouldn't be the case. @Qubitium, does this also happen with gptqmodel?

Qubitium (Contributor) commented Jan 2, 2025

@joshuaongg21 Please try to replicate this error (requantize) using gptqmodel.

This should not happen. Our previous tests show that 8-bit GPTQ quantization yields high-quality results almost effortlessly.
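
A rough sketch of what requantizing with GPTQModel could look like (the calibration-data handling, argument names, and output path below are assumptions rather than an exact recipe):

    # Sketch only: calibration handling, argument names, and paths are assumptions.
    from datasets import load_dataset
    from gptqmodel import GPTQModel, QuantizeConfig

    model_id = "meta-llama/Llama-3.1-8B-Instruct"

    # Small C4 slice as calibration data (the sample count is an arbitrary choice).
    calibration = load_dataset(
        "allenai/c4",
        data_files="en/c4-train.00000-of-01024.json.gz",
        split="train",
    ).select(range(512))["text"]

    quant_config = QuantizeConfig(bits=8, group_size=128)

    model = GPTQModel.load(model_id, quant_config)
    model.quantize(calibration, batch_size=2)
    model.save("Llama-3.1-8B-Instruct-gptqmodel-8bit")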

Usually when I see total corruption of the output, my first instinct is to double-check the tokenizer.
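
A quick tokenizer sanity check along those lines might look like this (the prompt is an arbitrary example):

    # Round-trip the tokenizer to rule out a tokenizer/vocab mismatch as the
    # source of garbage output. The prompt is an arbitrary example.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8")
    text = "The capital of France is"
    ids = tok(text)["input_ids"]
    print(ids)
    print(tok.decode(ids))  # should reproduce the original text

    # For chat models, also confirm the chat template renders as expected.
    print(tok.apply_chat_template(
        [{"role": "user", "content": text}],
        tokenize=False,
        add_generation_prompt=True,
    ))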

@SunMarc I will add 2-8 bit unit tests in gptqmodel to check for regressions just in case.

Qubitium (Contributor) commented Jan 3, 2025

@SunMarc @joshuaongg21 We have added 2-8 bit validation to GPTQModel using TinyLlama, evaluating the ARC-Challenge score post-quantization.

https://github.com/ModelCloud/GPTQModel/pull/995/files

What we see is that 8-bit has the highest score, followed by 4-bit, which is healthy and normal; there is no regression in 8-bit quants. We do, however, see a 3-bit regression versus 2-bit that needs further investigation, since no one I know of has actually produced a widely used 3-bit GPTQ quant. At this moment we are not sure why 3-bit scores worse than 2-bit. In short, 4-bit and 8-bit look good, 2-bit looks normal, and 3-bit is currently the outlier.

Acc scores for model: ModelCloud/TinyLlama-1.1B-Chat-v1.0

    TORCH_QLINEAR_QUANTIZED_MODEL_ARC_CHALLENGE_EXPECTS = {
        2: {'acc,none': 0.22610921501706485, 'acc_norm,none': 0.2909556313993174},
        3: {'acc,none': 0.21245733788395904, 'acc_norm,none': 0.24744027303754265},
        4: {'acc,none': 0.2738907849829352, 'acc_norm,none': 0.3122866894197952},
        8: {'acc,none': 0.2841296928327645, 'acc_norm,none': 0.302901023890785},
    }
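
As a rough illustration of how such a regression check can be expressed with lm-eval (the quantized-model path and the tolerance below are assumptions, not taken from the linked PR):

    # Sketch: compare measured ARC-Challenge scores against expected values.
    # The quantized-model path and tolerance are assumptions for illustration.
    import lm_eval

    EXPECTED_8BIT = {"acc,none": 0.2841296928327645, "acc_norm,none": 0.302901023890785}

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=./TinyLlama-1.1B-Chat-v1.0-gptq-8bit",
        tasks=["arc_challenge"],
    )["results"]["arc_challenge"]

    for metric, expected in EXPECTED_8BIT.items():
        assert abs(results[metric] - expected) < 0.01, f"regression on {metric}"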

SunMarc (Member) commented Jan 3, 2025

Thanks for the details @Qubitium!
