From c008fc0e354106b5d7cb5feb1b011b316eeae72a Mon Sep 17 00:00:00 2001
From: Bharath Ramaswamy
Date: Wed, 14 Dec 2022 13:28:19 -0800
Subject: [PATCH] Added detailed results for models into their respective MD files, Fixed QuickSRNet docs to symmetric weight quant

Signed-off-by: Bharath Ramaswamy
---
 zoo_tensorflow/Docs/RetinaNet.md | 83 +++++++++++++++++++++++++++++
 zoo_tensorflow/Docs/SRGAN.md     | 28 ++++++++++
 zoo_torch/Docs/Bert.md           | 42 +++++++++++++++
 zoo_torch/Docs/DistilBert.md     | 42 +++++++++++++++
 zoo_torch/Docs/FFNet.md          | 36 +++++++++++++
 zoo_torch/Docs/MiniLM.md         | 42 +++++++++++++++
 zoo_torch/Docs/MobileBert.md     | 42 +++++++++++++++
 zoo_torch/Docs/Roberta.md        | 42 +++++++++++++++
 zoo_torch/Docs/SRGAN.md          | 22 ++++++++
 zoo_torch/Docs/SuperRes.md       | 90 +++++++++++++++++++++++++++++++-
 10 files changed, 468 insertions(+), 1 deletion(-)

diff --git a/zoo_tensorflow/Docs/RetinaNet.md b/zoo_tensorflow/Docs/RetinaNet.md
index 0d1758d..366aed3 100644
--- a/zoo_tensorflow/Docs/RetinaNet.md
+++ b/zoo_tensorflow/Docs/RetinaNet.md
@@ -55,3 +55,86 @@ python3 retinanet_quanteval.py \
 - Weight quantization: 8 bits, per tensor asymmetric quantization
 - Bias parameters are quantized
 - Activation quantization: 8 bits, asymmetric quantization
+
+## Results
+(COCO dataset)
+
+| Metric | IoU | Area | maxDets | FP32 | INT8 |
+|---|---|---|---|---|---|
+| Average Precision | 0.50:0.95 | all | 100 | 0.350 | 0.349 |
+| Average Precision | 0.50 | all | 100 | 0.537 | 0.536 |
+| Average Precision | 0.75 | all | 100 | 0.374 | 0.372 |
+| Average Precision | 0.50:0.95 | small | 100 | 0.191 | 0.187 |
+| Average Precision | 0.50:0.95 | medium | 100 | 0.383 | 0.381 |
+| Average Precision | 0.50:0.95 | large | 100 | 0.472 | 0.472 |
+| Average Recall | 0.50:0.95 | all | 1 | 0.306 | 0.305 |
+| Average Recall | 0.50:0.95 | all | 10 | 0.491 | 0.490 |
+| Average Recall | 0.50:0.95 | all | 100 | 0.533 | 0.532 |
+| Average Recall | 0.50:0.95 | small | 100 | 0.345 | 0.341 |
+| Average Recall | 0.50:0.95 | medium | 100 | 0.577 | 0.577 |
+| Average Recall | 0.50:0.95 | large | 100 | 0.681 | 0.679 |
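+
+The AP/AR breakdown above follows the standard COCO detection protocol (IoU thresholds, object-area ranges, and max-detection limits as listed in the first columns). As a rough illustration only, numbers in this form are typically produced with pycocotools; the annotation and detection file paths below are placeholders, not files shipped with this patch:
+
+```python
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval
+
+# Ground-truth annotations and model detections in COCO JSON format (placeholder paths).
+coco_gt = COCO("annotations/instances_val2017.json")
+coco_dt = coco_gt.loadRes("retinanet_int8_detections.json")
+
+coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
+coco_eval.evaluate()    # match detections to ground truth per image and IoU threshold
+coco_eval.accumulate()  # aggregate precision/recall across the dataset
+coco_eval.summarize()   # prints the twelve AP/AR lines tabulated above
+```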
diff --git a/zoo_tensorflow/Docs/SRGAN.md b/zoo_tensorflow/Docs/SRGAN.md
index 7227213..8c872de 100644
--- a/zoo_tensorflow/Docs/SRGAN.md
+++ b/zoo_tensorflow/Docs/SRGAN.md
@@ -45,3 +45,31 @@ pip install tensorflow-gpu==2.4.0
 - Activation quantization: 16 bits, asymmetric quantization
 - Model inputs are quantized
 - Bias Correction and Cross Layer Equalization have been applied
+
+## Results
+
+| Model | Dataset | PSNR | SSIM |
+|---|---|---|---|
+| FP32 | Set5 / Set14 / BSD100 | 29.17 / 26.17 / 25.45 | 0.853 / 0.719 / 0.668 |
+| INT8 / ACT8 | Set5 / Set14 / BSD100 | 28.31 / 25.55 / 24.78 | 0.821 / 0.684 / 0.628 |
+| INT8 / ACT16 | Set5 / Set14 / BSD100 | 29.12 / 26.15 / 25.41 | 0.851 / 0.719 / 0.666 |
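+
+PSNR and SSIM are computed per image against the high-resolution ground truth and averaged over each dataset. A minimal sketch using scikit-image is shown below; whether the metrics are computed on RGB or the Y channel, and how borders are cropped, vary between benchmarks and are not specified here:
+
+```python
+import numpy as np
+from skimage.metrics import peak_signal_noise_ratio, structural_similarity
+
+def psnr_ssim(sr: np.ndarray, hr: np.ndarray):
+    """PSNR/SSIM between a super-resolved image and its ground truth (uint8, HxWxC)."""
+    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
+    # Older scikit-image releases use multichannel=True instead of channel_axis.
+    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
+    return psnr, ssim
+```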
diff --git a/zoo_torch/Docs/Bert.md b/zoo_torch/Docs/Bert.md
index 0d6a9fa..7ecfab4 100644
--- a/zoo_torch/Docs/Bert.md
+++ b/zoo_torch/Docs/Bert.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantiza
 - TF range learning was used as quantization scheme
 - Mask values of -6 was applied in attention layers
 - Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
+
+## Results
+Below are the results of the PyTorch transformer model Bert on the GLUE benchmark:
+
+| Configuration | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
+|---|---|---|---|---|---|---|---|---|---|
+| FP32 | 58.76 | 93.12 | 89.93 | 88.84 | 90.94 | 85.19 | 91.63 | 66.43 | 83.11 |
+| W8A8 | 56.93 | 91.28 | 90.34 | 89.13 | 90.78 | 81.68 | 91.14 | 68.23 | 82.44 |
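+
+The per-task columns follow the usual GLUE conventions: Matthews correlation for CoLA, Pearson/Spearman correlation for STS-B, F1 for MRPC, accuracy for the remaining tasks, and the unweighted average of the task scores in the GLUE column. A rough sketch of the metric computation, assuming per-task `preds` and `labels` arrays (the exact correlation variant reported for STS-B is an assumption):
+
+```python
+import numpy as np
+from scipy.stats import pearsonr, spearmanr
+from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
+
+def glue_metric(task: str, preds: np.ndarray, labels: np.ndarray) -> float:
+    """Return the headline metric (in %) for one GLUE task."""
+    if task == "cola":   # Matthews correlation coefficient
+        return 100.0 * matthews_corrcoef(labels, preds)
+    if task == "stsb":   # mean of Pearson and Spearman correlations (assumed)
+        return 100.0 * (pearsonr(preds, labels)[0] + spearmanr(preds, labels)[0]) / 2
+    if task == "mrpc":   # F1 on the positive (paraphrase) class
+        return 100.0 * f1_score(labels, preds)
+    return 100.0 * accuracy_score(labels, preds)  # SST-2, QQP, MNLI, QNLI, RTE
+
+# The GLUE column is the unweighted mean of the eight per-task scores.
+```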
diff --git a/zoo_torch/Docs/DistilBert.md b/zoo_torch/Docs/DistilBert.md
index 5bb9630..eb0e635 100644
--- a/zoo_torch/Docs/DistilBert.md
+++ b/zoo_torch/Docs/DistilBert.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantiza
 - TF range learning was used as quantization scheme
 - Mask values of -6 was applied in attention layers
 - Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
+
+## Results
+Below are the results of the PyTorch transformer model DistilBert on the GLUE benchmark:
+
+| Configuration | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
+|---|---|---|---|---|---|---|---|---|---|
+| FP32 | 53.85 | 91.17 | 88.40 | 87.12 | 90.39 | 87.29 | 82.15 | 65.34 | 80.71 |
+| W8A8 | 52.99 | 90.48 | 89.34 | 86.76 | 89.77 | 86.88 | 83.35 | 65.54 | 80.26 |
diff --git a/zoo_torch/Docs/FFNet.md b/zoo_torch/Docs/FFNet.md
index 6d00a7c..4a7515c 100755
--- a/zoo_torch/Docs/FFNet.md
+++ b/zoo_torch/Docs/FFNet.md
@@ -54,3 +54,39 @@ python ffnet_quanteval.py \
 - TF-Enhanced was used as quantization scheme
 - Cross layer equalization (CLE) has been applied on optimized checkpoint
 - for low resolution models with pre_down suffix, the GaussianConv2D layer is disabled for quantization.
+
+## Results
+Below are the *mIoU* results of the PyTorch FFNet models on the Cityscapes dataset:
+
+| Model Configuration | FP32 (%) | INT8 (%) |
+|---|---|---|
+| segmentation_ffnet78S_dBBB_mobile | 81.3 | 80.7 |
+| segmentation_ffnet54S_dBBB_mobile | 80.8 | 80.1 |
+| segmentation_ffnet40S_dBBB_mobile | 79.2 | 78.9 |
+| segmentation_ffnet78S_BCC_mobile_pre_down | 80.6 | 80.4 |
+| segmentation_ffnet122NS_CCC_mobile_pre_down | 79.3 | 79.0 |
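+
+The mIoU values above are obtained in the usual way for Cityscapes: a class-wise confusion matrix is accumulated over the validation set and the per-class IoU is averaged. A minimal sketch, assuming integer label maps and the Cityscapes ignore index of 255:
+
+```python
+import numpy as np
+
+def update_confusion(conf, pred, label, num_classes, ignore_index=255):
+    """Accumulate a (num_classes x num_classes) confusion matrix for one image."""
+    mask = label != ignore_index
+    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
+    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
+    return conf
+
+def mean_iou(conf):
+    """mIoU = mean over classes of TP / (TP + FP + FN)."""
+    tp = np.diag(conf)
+    iou = tp / (conf.sum(axis=0) + conf.sum(axis=1) - tp)
+    return float(np.nanmean(iou))
+```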
diff --git a/zoo_torch/Docs/MiniLM.md b/zoo_torch/Docs/MiniLM.md
index 974032b..19159c4 100644
--- a/zoo_torch/Docs/MiniLM.md
+++ b/zoo_torch/Docs/MiniLM.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantiza
 - TF range learning was used as quantization scheme
 - Mask values of -6 was applied in attention layers
 - Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
+
+## Results
+Below are the results of the PyTorch transformer model MiniLM on the GLUE benchmark:
+
+| Configuration | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
+|---|---|---|---|---|---|---|---|---|---|
+| FP32 | 57.78 | 92.32 | 89.01 | 88.73 | 90.70 | 85.04 | 91.52 | 70.76 | 83.23 |
+| W8A8 | 55.58 | 92.20 | 88.21 | 88.68 | 90.62 | 84.59 | 90.72 | 70.40 | 82.63 |
diff --git a/zoo_torch/Docs/MobileBert.md b/zoo_torch/Docs/MobileBert.md
index 81f4ab4..ba02f9c 100644
--- a/zoo_torch/Docs/MobileBert.md
+++ b/zoo_torch/Docs/MobileBert.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantiza
 - TF range learning was used as quantization scheme
 - Mask values of -6 was applied in attention layers
 - Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
+
+## Results
+Below are the results of the PyTorch transformer model MobileBert on the GLUE benchmark:
+
+| Configuration | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
+|---|---|---|---|---|---|---|---|---|---|
+| FP32 | 50.41 | 90.83 | 85.47 | 88.75 | 90.26 | 83.36 | 90.81 | 70.04 | 81.24 |
+| W8A8 | 49.34 | 89.79 | 88.50 | 88.46 | 88.60 | 83.82 | 90.48 | 70.40 | 81.17 |
diff --git a/zoo_torch/Docs/Roberta.md b/zoo_torch/Docs/Roberta.md
index 3b76282..996b104 100644
--- a/zoo_torch/Docs/Roberta.md
+++ b/zoo_torch/Docs/Roberta.md
@@ -51,3 +51,45 @@ The following configuration has been used for the above models for INT8 quantiza
 - TF range learning was used as quantization scheme
 - Mask values of -6 was applied in attention layers
 - Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
+
+## Results
+Below are the results of the PyTorch transformer model Roberta on the GLUE benchmark:
+
+| Configuration | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
+|---|---|---|---|---|---|---|---|---|---|
+| FP32 | 60.36 | 94.72 | 91.84 | 90.54 | 91.24 | 87.29 | 92.33 | 72.56 | 85.11 |
+| W8A8 | 57.35 | 92.55 | 92.69 | 90.15 | 90.09 | 86.88 | 91.47 | 72.92 | 84.26 |
diff --git a/zoo_torch/Docs/SRGAN.md b/zoo_torch/Docs/SRGAN.md
index e1bb559..7eb2090 100644
--- a/zoo_torch/Docs/SRGAN.md
+++ b/zoo_torch/Docs/SRGAN.md
@@ -68,3 +68,25 @@ python srgan_quanteval.py \
 - Bias parameters are quantized
 - Activation quantization: 8 bits asymmetric quantization
 - Model inputs are not quantized
+
+## Results
+
+| Model | Dataset | PSNR | SSIM |
+|---|---|---|---|
+| FP32 | Set5 / Set14 / BSD100 | 29.93 / 26.58 / 25.51 | 0.851 / 0.709 / 0.653 |
+| INT8 | Set5 / Set14 / BSD100 | 29.86 / 26.59 / 25.55 | 0.845 / 0.705 / 0.648 |
diff --git a/zoo_torch/Docs/SuperRes.md b/zoo_torch/Docs/SuperRes.md
index 0f8c05e..1ee3539 100644
--- a/zoo_torch/Docs/SuperRes.md
+++ b/zoo_torch/Docs/SuperRes.md
@@ -66,8 +66,96 @@ Please note the following regarding the available checkpoints:
 
 ## Quantization Configuration
 In the evaluation notebook included, we have used the default config file, which configures the quantizer ops with the following assumptions:
-- Weight quantization: *8 bits, per tensor asymmetric quantization*
+- Weight quantization: *8 bits, per tensor symmetric quantization*
 - Bias parameters are not quantized
 - Activation quantization: *8 bits, asymmetric quantization*
 - Model inputs are quantized
 - *TF_enhanced* was used as the quantization scheme
+
+## Results
+**NOTE:** All results below used a *scaling factor (LR-to-HR upscaling) of 2x* and the *Set14 dataset*.
+
+| Model | Config [1] | Channels | PSNR (FP32) | PSNR (INT8) |
+|---|---|---|---|---|
+| ABPN | N/A | 28 | 32.71 | 32.64 |
+| ABPN | N/A | 32 | 32.75 | 32.69 |
+| XLSR | N/A | 32 | 32.57 | 32.30 |
+| SESR | M3 | 16 | 32.41 | 32.25 |
+| SESR | M5 | 16 | 32.57 | 32.50 |
+| SESR | M7 | 16 | 32.66 | 32.58 |
+| SESR | M11 | 16 | 32.73 | 32.59 |
+| SESR | XL | 32 | 33.03 | 32.92 |
+| QuickSRNet | Small | 32 | 32.52 | 32.49 |
+| QuickSRNet | Medium | 32 | 32.78 | 32.73 |
+| QuickSRNet | Large | 64 | 33.24 | 33.17 |
+
+*[1]* Config: denotes the model variant and corresponds to the number of residual blocks used. The M*x* models have 16 feature channels, whereas the XL model has 32 feature channels.
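+
+For reference, the quantization configuration described above (8-bit per-tensor symmetric weights, 8-bit asymmetric activations, *TF_enhanced* calibration, driven by a JSON config file) maps onto AIMET's `QuantizationSimModel`. The sketch below is illustrative only; the model, input shape, and calibration data are placeholders rather than the exact settings used in the evaluation notebook:
+
+```python
+import torch
+from aimet_common.defs import QuantScheme
+from aimet_torch.quantsim import QuantizationSimModel
+
+# Stand-ins for a trained FP32 super-resolution model and a small calibration set.
+fp32_model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3, padding=1))
+calibration_batches = [torch.randn(1, 3, 128, 256) for _ in range(4)]
+
+def calibrate(model, _):
+    """Run representative low-resolution inputs through the model to collect ranges."""
+    model.eval()
+    with torch.no_grad():
+        for batch in calibration_batches:
+            model(batch)
+
+sim = QuantizationSimModel(
+    model=fp32_model,
+    dummy_input=torch.randn(1, 3, 128, 256),
+    quant_scheme=QuantScheme.post_training_tf_enhanced,
+    default_param_bw=8,   # 8-bit weights (per-tensor symmetric comes from the config file)
+    default_output_bw=8,  # 8-bit asymmetric activations
+    # config_file="...",  # the JSON quantsim config enforcing symmetric weights would go here
+)
+sim.compute_encodings(forward_pass_callback=calibrate, forward_pass_callback_args=None)
+# sim.model now simulates INT8 inference and can be evaluated for PSNR as in the table above.
+```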