
Added detailed results for models into their respective MD files, Fixed QuickSRNet docs to symmetric weight quant

Signed-off-by: Bharath Ramaswamy <quic_bharathr@quicinc.com>
quic-bharathr committed Dec 14, 2022
1 parent 28b63fd commit c008fc0
Showing 10 changed files with 468 additions and 1 deletion.
83 changes: 83 additions & 0 deletions zoo_tensorflow/Docs/RetinaNet.md
@@ -55,3 +55,86 @@ python3 retinanet_quanteval.py \
- Weight quantization: 8 bits, per tensor asymmetric quantization
- Bias parameters are quantized
- Activation quantization: 8 bits, asymmetric quantization
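As a rough illustration of the per-tensor asymmetric scheme listed above (not the AIMET implementation used by the model zoo), the quantize/dequantize arithmetic looks like this:

```python
# Illustrative only: per-tensor asymmetric 8-bit quantization of a single tensor.
import numpy as np

def fake_quant_asym(x, num_bits=8):
    """Quantize and immediately dequantize x with a per-tensor asymmetric grid."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min = min(float(x.min()), 0.0)   # keep zero exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale    # dequantized ("fake-quant") tensor

w = np.random.randn(256, 3, 3, 3).astype(np.float32)   # hypothetical conv weight
print("max |w - Q(w)|:", np.abs(w - fake_quant_asym(w)).max())
```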

## Results
(COCO dataset)
<table style="width:50%">
<tr>
<th>Average Precision/Recall </th>
<th> @[ IoU | area | maxDets] </th>
<th>FP32 </th>
<th>INT8 </th>
</tr>
<tr>
<td>Average Precision</td>
<td> @[ 0.50:0.95 | all | 100 ] </td>
<td>0.350 </td>
<td>0.349</td>
</tr>
<tr>
<td>Average Precision</td>
<td> @[ 0.50 | all | 100 ] </td>
<td>0.537 </td>
<td>0.536</td>
</tr>
<tr>
<td>Average Precision</td>
<td> @[ 0.75 | all | 100 ] </td>
<td>0.374 </td>
<td> 0.372</td>
</tr>
<tr>
<td>Average Precision</td>
<td> @[ 0.50:0.95 | small | 100 ] </td>
<td>0.191 </td>
<td>0.187</td>
</tr>
<tr>
<td>Average Precision</td>
<td> @[ 0.50:0.95 | medium | 100 ] </td>
<td> 0.383 </td>
<td>0.381</td>
</tr>
<tr>
<td>Average Precision</td>
<td> @[ 0.50:0.95 | large | 100 ] </td>
<td>0.472 </td>
<td>0.472</td>
</tr>
<tr>
<td> Average Recall</td>
<td> @[ 0.50:0.95 | all | 1 ] </td>
<td>0.306 </td>
<td>0.305</td>
</tr>
<tr>
<td> Average Recall</td>
<td> @[ 0.50:0.95 | all | 10 ] </td>
<td>0.491 </td>
<td>0.490</td>
</tr>
<tr>
<td> Average Recall</td>
<td> @[ 0.50:0.95 | all | 100 ] </td>
<td>0.533 </td>
<td>0.532</td>
</tr>
<tr>
<td> Average Recall</td>
<td> @[ 0.50:0.95 | small | 100 ] </td>
<td>0.345</td>
<td>0.341</td>
</tr>
<tr>
<td> Average Recall</td>
<td> @[ 0.50:0.95 | medium | 100 ] </td>
<td>0.577</td>
<td>0.577</td>
</tr>
<tr>
<td> Average Recall</td>
<td> @[ 0.50:0.95 | large | 100 ] </td>
<td>0.681</td>
<td>0.679</td>
</tr>
</table>
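
The rows above follow the standard 12-metric COCO summary produced by pycocotools' COCOeval; a typical way to generate them looks like the sketch below (file paths are hypothetical, and the repository's retinanet_quanteval.py handles the actual evaluation):

```python
# Sketch only: producing the AP/AR @[ IoU | area | maxDets ] summary with pycocotools.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")    # COCO ground-truth annotations
coco_dt = coco_gt.loadRes("retinanet_detections.json")  # model detections (hypothetical file)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()   # prints the 12 summary rows shown in the table above
```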
28 changes: 28 additions & 0 deletions zoo_tensorflow/Docs/SRGAN.md
@@ -45,3 +45,31 @@ pip install tensorflow-gpu==2.4.0
- Activation quantization: 16 bits, asymmetric quantization
- Model inputs are quantized
- Bias Correction and Cross Layer Equalization have been applied
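A quick way to see why the 16-bit activation setting recovers most of the FP32 quality in the Results table below is to compare the quantization step size at 8 vs. 16 bits over the same activation range (illustrative numbers, not the repository's code):

```python
# Illustrative only: asymmetric-quantization step size over the same activation
# range at 8 vs. 16 bits. A smaller step means less rounding error per activation,
# which is why INT8 weights with 16-bit activations stay close to FP32 PSNR/SSIM.
act_min, act_max = 0.0, 6.0   # hypothetical activation range
for bits in (8, 16):
    scale = (act_max - act_min) / (2 ** bits - 1)
    print(f"{bits:2d}-bit activations: step = {scale:.6f}, worst-case rounding error = {scale / 2:.6f}")
```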

## Results
<table style="width:50%">
<tr>
<th>Configuration</th>
<th>Dataset</th>
<th>PSNR</th>
<th>SSIM</th>
</tr>
<tr>
<td>FP32</td>
<td>Set5 / Set14 / BSD100</td>
<td>29.17 / 26.17 / 25.45</td>
<td>0.853 / 0.719 / 0.668</td>
</tr>
<tr>
<td>INT8 / ACT8</td>
<td>Set5 / Set14 / BSD100</td>
<td>28.31 / 25.55 / 24.78</td>
<td>0.821 / 0.684 / 0.628</td>
</tr>
<tr>
<td>INT8 / ACT16</td>
<td>Set5 / Set14 / BSD100</td>
<td>29.12 / 26.15 / 25.41</td>
<td>0.851 / 0.719 / 0.666</td>
</tr>
</table>
42 changes: 42 additions & 0 deletions zoo_torch/Docs/Bert.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantization
- TF range learning was used as the quantization scheme
- A mask value of -6 was applied in the attention layers (see the sketch below)
- Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
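The mask-value bullet above refers to using a small negative constant (-6) for masked attention positions instead of the very large one (e.g. -10000) that transformer implementations typically use, which keeps the pre-softmax activation range narrow enough for 8-bit quantization. A minimal PyTorch sketch of the idea (hypothetical shapes, not the model zoo's code):

```python
# Illustrative only: additive attention mask with value -6 instead of a huge
# negative constant, so masked logits do not blow up the range the quantizer covers.
import torch
import torch.nn.functional as F

MASK_VALUE = -6.0                            # instead of the usual -10000.0

scores = torch.randn(2, 12, 16, 16)          # (batch, heads, query_len, key_len) attention logits
key_padding = torch.zeros(2, 1, 1, 16)       # 1.0 marks padded key positions
key_padding[..., 12:] = 1.0                  # pretend the last 4 tokens are padding

masked = scores + key_padding * MASK_VALUE   # small offset keeps the logit range quantizer-friendly
attn = F.softmax(masked, dim=-1)             # padded keys get ~e^-6 of the weight, not exactly zero
```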

## Results
Below are the results of the PyTorch transformer model Bert on the GLUE benchmark:

<table style="width:50%">
<tr>
<td> Configuration </td>
<td> CoLA (corr) </td>
<td> SST-2 (acc) </td>
<td> MRPC (f1) </td>
<td> STS-B (corr) </td>
<td> QQP (acc) </td>
<td> MNLI (acc) </td>
<td> QNLI (acc) </td>
<td> RTE (acc) </td>
<td> GLUE </td>
</tr>
<tr>
<td> FP32 </td>
<td> 58.76 </td>
<td> 93.12 </td>
<td> 89.93 </td>
<td> 88.84 </td>
<td> 90.94 </td>
<td> 85.19 </td>
<td> 91.63 </td>
<td> 66.43 </td>
<td> 83.11 </td>
</tr>
<tr>
<td> W8A8 </td>
<td> 56.93 </td>
<td> 91.28 </td>
<td> 90.34 </td>
<td> 89.13 </td>
<td> 90.78 </td>
<td> 81.68 </td>
<td> 91.14 </td>
<td> 68.23 </td>
<td> 82.44 </td>
</tr>
</table>
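
The GLUE column appears to be the unweighted mean of the eight per-task scores; averaging the FP32 row above gives ~83.1, matching the reported 83.11. A quick check under that assumption:

```python
# Assumption: the GLUE column is the unweighted mean of the eight task scores.
fp32 = {"CoLA": 58.76, "SST-2": 93.12, "MRPC": 89.93, "STS-B": 88.84,
        "QQP": 90.94, "MNLI": 85.19, "QNLI": 91.63, "RTE": 66.43}
print(sum(fp32.values()) / len(fp32))   # ~83.105, reported in the table as 83.11
```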
42 changes: 42 additions & 0 deletions zoo_torch/Docs/DistilBert.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantization
- TF range learning was used as the quantization scheme (see the sketch below)
- A mask value of -6 was applied in the attention layers
- Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
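As a rough outline of how a W8A8 setup with TF range learning and QAT is typically wired up with AIMET's PyTorch `QuantizationSimModel` (not the script used for these models, and argument names can differ between AIMET releases):

```python
# Outline only: W8A8 quantization simulation with range learning, followed by QAT.
# Not the model zoo's script; AIMET argument names may differ between releases.
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# Stand-in network; in the zoo this would be the task-fine-tuned transformer.
model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 2))
dummy_input = torch.randn(1, 128)

sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.training_range_learning_with_tf_init,  # "TF range learning"
    default_param_bw=8,    # 8-bit weights
    default_output_bw=8,   # 8-bit activations
)

# Seed the quantizer encodings with a calibration pass, then run an ordinary
# training loop over sim.model (QAT): gradients update weights and ranges together.
sim.compute_encodings(forward_pass_callback=lambda m, _: m(dummy_input),
                      forward_pass_callback_args=None)
```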

## Results
Below are the results of the PyTorch transformer model DistilBert on the GLUE benchmark:

<table style="width:50%">
<tr>
<td> Configuration </td>
<td> CoLA (corr) </td>
<td> SST-2 (acc) </td>
<td> MRPC (f1) </td>
<td> STS-B (corr) </td>
<td> QQP (acc) </td>
<td> MNLI (acc) </td>
<td> QNLI (acc) </td>
<td> RTE (acc) </td>
<td> GLUE </td>
</tr>
<tr>
<td> FP32 </td>
<td> 53.85 </td>
<td> 91.17 </td>
<td> 88.40 </td>
<td> 87.12 </td>
<td> 90.39 </td>
<td> 87.29 </td>
<td> 82.15 </td>
<td> 65.34 </td>
<td> 80.71 </td>
</tr>
<tr>
<td> W8A8 </td>
<td> 52.99 </td>
<td> 90.48 </td>
<td> 89.34 </td>
<td> 86.76 </td>
<td> 89.77 </td>
<td> 86.88 </td>
<td> 83.35 </td>
<td> 65.54 </td>
<td> 80.26 </td>
</tr>
</table>
36 changes: 36 additions & 0 deletions zoo_torch/Docs/FFNet.md
@@ -54,3 +54,39 @@ python ffnet_quanteval.py \
- TF-Enhanced was used as the quantization scheme
- Cross-layer equalization (CLE) has been applied to the optimized checkpoint (see the sketch below)
- For low-resolution models with the pre_down suffix, the GaussianConv2D layer is disabled for quantization.
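A minimal sketch of applying the CLE step mentioned above with AIMET's high-level PyTorch API, using a stand-in model and a hypothetical input shape rather than the repository's evaluation script (the exact signature can vary between AIMET versions):

```python
# Sketch only: Cross-Layer Equalization (CLE) before quantization simulation.
# Stand-in model and input shape; not the FFNet evaluation code.
import torch
from aimet_torch.cross_layer_equalization import equalize_model

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 3, padding=1), torch.nn.ReLU(),
).eval()

# CLE rescales the weights of consecutive conv layers so their per-tensor ranges
# are better balanced, which typically improves post-quantization accuracy.
equalize_model(model, (1, 3, 512, 1024))   # second argument: example input shape
```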

## Results
Below are the *mIoU* results of the PyTorch FFNet model for the Cityscapes dataset:

<table style="width:50%">
<tr>
<th>Model Configuration</th>
<th>FP32 (%)</th>
<th>INT8 (%)</th>
</tr>
<tr>
<td>segmentation_ffnet78S_dBBB_mobile</td>
<td>81.3</td>
<td>80.7</td>
</tr>
<tr>
<td>segmentation_ffnet54S_dBBB_mobile</td>
<td>80.8</td>
<td>80.1</td>
</tr>
<tr>
<td>segmentation_ffnet40S_dBBB_mobile</td>
<td>79.2</td>
<td>78.9</td>
</tr>
<tr>
<td>segmentation_ffnet78S_BCC_mobile_pre_down</td>
<td>80.6</td>
<td>80.4</td>
</tr>
<tr>
<td>segmentation_ffnet122NS_CCC_mobile_pre_down</td>
<td>79.3</td>
<td>79.0</td>
</tr>
</table>
42 changes: 42 additions & 0 deletions zoo_torch/Docs/MiniLM.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantization
- TF range learning was used as the quantization scheme
- A mask value of -6 was applied in the attention layers
- Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).

## Results
Below are the results of the PyTorch transformer model MiniLM on the GLUE benchmark:

<table style="width:50%">
<tr>
<td> Configuration </td>
<td> CoLA (corr) </td>
<td> SST-2 (acc) </td>
<td> MRPC (f1) </td>
<td> STS-B (corr) </td>
<td> QQP (acc) </td>
<td> MNLI (acc) </td>
<td> QNLI (acc) </td>
<td> RTE (acc) </td>
<td> GLUE </td>
</tr>
<tr>
<td> FP32 </td>
<td> 57.78 </td>
<td> 92.32 </td>
<td> 89.01 </td>
<td> 88.73 </td>
<td> 90.70 </td>
<td> 85.04 </td>
<td> 91.52 </td>
<td> 70.76 </td>
<td> 83.23 </td>
</tr>
<tr>
<td> W8A8 </td>
<td> 55.58 </td>
<td> 92.20 </td>
<td> 88.21 </td>
<td> 88.68 </td>
<td> 90.62 </td>
<td> 84.59 </td>
<td> 90.72 </td>
<td> 70.40 </td>
<td> 82.63 </td>
</tr>
</table>
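
For reference, the abbreviations in the column headers follow the usual GLUE conventions: Matthews correlation for CoLA, Pearson/Spearman correlation for STS-B, F1 for MRPC, and accuracy for the remaining tasks. A small sketch with standard library calls (toy data, not the model zoo's evaluation code):

```python
# Toy illustration of the metrics behind the column headers, using standard
# scikit-learn / scipy calls (not the model zoo's evaluation code).
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from scipy.stats import pearsonr

y_true, y_pred = [1, 0, 1, 1, 0], [1, 0, 1, 0, 0]            # toy classification labels
s_true, s_pred = [4.2, 1.0, 3.5, 2.8], [4.0, 1.3, 3.6, 2.5]  # toy STS-B similarity scores

print("CoLA  (corr): Matthews corr =", matthews_corrcoef(y_true, y_pred))
print("MRPC  (f1):   F1            =", f1_score(y_true, y_pred))
print("SST-2 (acc):  accuracy      =", accuracy_score(y_true, y_pred))
print("STS-B (corr): Pearson corr  =", pearsonr(s_true, s_pred)[0])
```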
42 changes: 42 additions & 0 deletions zoo_torch/Docs/MobileBert.md
@@ -71,3 +71,45 @@ The following configuration has been used for the above models for INT8 quantization
- TF range learning was used as the quantization scheme
- A mask value of -6 was applied in the attention layers
- Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).

## Results
Below are the results of the PyTorch transformer model MobileBert on the GLUE benchmark:

<table style="width:50%">
<tr>
<td> Configuration </td>
<td> CoLA (corr) </td>
<td> SST-2 (acc) </td>
<td> MRPC (f1) </td>
<td> STS-B (corr) </td>
<td> QQP (acc) </td>
<td> MNLI (acc) </td>
<td> QNLI (acc) </td>
<td> RTE (acc) </td>
<td> GLUE </td>
</tr>
<tr>
<td> FP32 </td>
<td> 50.41 </td>
<td> 90.83 </td>
<td> 85.47 </td>
<td> 88.75 </td>
<td> 90.26 </td>
<td> 83.36 </td>
<td> 90.81 </td>
<td> 70.04 </td>
<td> 81.24 </td>
</tr>
<tr>
<td> W8A8 </td>
<td> 49.34 </td>
<td> 89.79 </td>
<td> 88.50 </td>
<td> 88.46 </td>
<td> 88.60 </td>
<td> 83.82 </td>
<td> 90.48 </td>
<td> 70.40 </td>
<td> 81.17 </td>
</tr>
</table>
42 changes: 42 additions & 0 deletions zoo_torch/Docs/Roberta.md
@@ -51,3 +51,45 @@ The following configuration has been used for the above models for INT8 quantization
- TF range learning was used as the quantization scheme
- A mask value of -6 was applied in the attention layers
- Quantization aware training (QAT) was used to obtain optimized quantized weights, detailed hyperparameters listed in [Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021](https://arxiv.org/abs/2109.12948).
## Results
Below are the results of the PyTorch transformer model Roberta on the GLUE benchmark:
<table style="width:50%">
<tr>
<td> Configuration </td>
<td> CoLA (corr) </td>
<td> SST-2 (acc) </td>
<td> MRPC (f1) </td>
<td> STS-B (corr) </td>
<td> QQP (acc) </td>
<td> MNLI (acc) </td>
<td> QNLI (acc) </td>
<td> RTE (acc) </td>
<td> GLUE </td>
</tr>
<tr>
<td> FP32 </td>
<td> 60.36 </td>
<td> 94.72 </td>
<td> 91.84 </td>
<td> 90.54 </td>
<td> 91.24 </td>
<td> 87.29 </td>
<td> 92.33 </td>
<td> 72.56 </td>
<td> 85.11 </td>
</tr>
<tr>
<td> W8A8 </td>
<td> 57.35 </td>
<td> 92.55 </td>
<td> 92.69 </td>
<td> 90.15 </td>
<td> 90.09 </td>
<td> 86.88 </td>
<td> 91.47 </td>
<td> 72.92 </td>
<td> 84.26 </td>
</tr>
</table>
