This document describes the evaluation of optimized checkpoints of the GPT2 transformer model for natural language text generation tasks.
Please install and set up AIMET (Torch GPU variant) before proceeding further.
NOTE
- All AIMET releases are available here: https://github.com/quic/aimet/releases
- This model has been tested using AIMET version 1.24.0 (i.e. set `release_tag="1.24.0"` in the above instructions).
- This model is compatible with the PyTorch GPU variant of AIMET (i.e. set `AIMET_VARIANT="torch_gpu"` in the above instructions).
Install the following additional dependencies:

```bash
pip install accelerate==0.9.0
pip install transformers==4.21.0
pip install datasets==2.4.0
```
- The original full-precision checkpoints without downstream training were downloaded through Hugging Face.
- [Full precision model with downstream training weight files] are downloaded automatically by the evaluation script.
- [Quantization optimized model weight files] are downloaded automatically by the evaluation script.
- For text generation tasks, we use the WikiText language modeling benchmark dataset for evaluation.
- Dataset downloading is handled by the evaluation script (see the sketch after this list).
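As a rough illustration only, the sketch below shows one way the WikiText data and a GPT2 full-precision baseline could be fetched with the pinned `datasets` and `transformers` versions. The dataset configuration name `wikitext-2-raw-v1` and the model card `gpt2` are assumptions made here for clarity; the evaluation script performs these downloads itself.

```python
# Hedged sketch: fetch the WikiText benchmark and a GPT2 FP32 baseline.
# "wikitext-2-raw-v1" and "gpt2" are illustrative assumptions; the evaluation
# script handles the actual model/dataset downloads.
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")  # train/validation/test splits
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")               # FP32 checkpoint without QAT

print(raw_datasets["test"].num_rows, "test examples loaded")
```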
To run evaluation with QuantSim for natural language text generation tasks in AIMET, use the following command:
```bash
python transformer_tg_quanteval.py \
    --model_eval_type <model evaluation type> \
    --per_device_eval_batch_size <batch size>
```
- example:

```bash
python transformer_tg_quanteval.py --model_eval_type fp32 --per_device_eval_batch_size 8
```

- supported keywords for `model_eval_type` are "fp32" and "int8"
The following configuration has been used for the above models for INT8 quantization (an illustrative QuantSim setup is sketched after this list):
- Weight quantization: 8 bits, symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- TF range learning was used as the quantization scheme
- Clamped initialization was adopted
- Quantization aware training (QAT) was used to obtain optimized quantized weights
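To make the configuration above concrete, here is a minimal, hedged sketch of how such a QuantizationSimModel could be set up with AIMET 1.24.0. The GPT2 model variable, dummy input shape, and calibration callback are assumptions for illustration; the released evaluation script builds the actual simulation, loads the optimized checkpoint, and runs QAT with range learning.

```python
# Hedged sketch of the INT8 QuantSim configuration described above. The GPT2
# model, dummy input shape, and calibration callback are illustrative
# assumptions; the evaluation script performs the real setup, checkpoint
# loading, and QAT.
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").cuda().eval()
dummy_input = torch.randint(0, 50257, (1, 512)).cuda()  # assumed batch=1, seq_len=512

sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.training_range_learning_with_tf_init,  # TF range learning
    default_param_bw=8,   # 8-bit weight quantization
    default_output_bw=8,  # 8-bit activation quantization
)

# Calibrate initial encodings before range learning / QAT.
sim.compute_encodings(lambda m, _: m(dummy_input), forward_pass_callback_args=None)
```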