diff --git a/README.md b/README.md
index ffcc846d5..e83541610 100644
--- a/README.md
+++ b/README.md
@@ -50,9 +50,9 @@ MCT supports various quantization methods as appears below.
 
 Quantization Method | Complexity | Computational Cost | API | Tutorial
 -------------------- | -----------|--------------------|---------|--------
-PTQ (Post Training Quantization) | Low | Low (~1-10 CPU minutes) | [Keras API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/keras_post_training_quantization.html), [PyTorch API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/pytorch_post_training_quantization.html) |
-GPTQ (parameters fine-tuning using gradients) | Moderate | Moderate (~1-3 GPU hours) | [Keras API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/keras_gradient_post_training_quantization.html), [PyTorch API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/pytorch_gradient_post_training_quantization.html) |
-QAT (Quantization Aware Training) | High | High (~12-36 GPU hours) | [Experimental](https://github.com/sony/model_optimization?tab=readme-ov-file#experimental-features) |
+PTQ (Post Training Quantization) | Low | Low (~1-10 CPU minutes) | [PyTorch API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/pytorch_post_training_quantization.html) / [Keras API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/keras_post_training_quantization.html) |
+GPTQ (parameters fine-tuning using gradients) | Moderate | Moderate (~1-3 GPU hours) | [PyTorch API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/pytorch_gradient_post_training_quantization.html) / [Keras API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/keras_gradient_post_training_quantization.html) |
+QAT (Quantization Aware Training) | High | High (~12-36 GPU hours) | [QAT API](https://sony.github.io/model_optimization/docs/api/api_docs/index.html#qat) |
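+
+For instance, PTQ for a PyTorch model comes down to a single call. The snippet below is a minimal sketch (the model choice and the random calibration batches are placeholders; see the PyTorch API docs above for the full signature and options):
+
+```python
+import torch
+import model_compression_toolkit as mct
+from torchvision.models import mobilenet_v2
+
+model = mobilenet_v2(pretrained=True)
+
+def representative_data_gen():
+    # Yield a few calibration batches; real images from your target domain
+    # are preferable to the random tensors used here for illustration.
+    for _ in range(10):
+        yield [torch.rand(1, 3, 224, 224)]
+
+# Returns the quantized model along with quantization metadata.
+quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
+    model, representative_data_gen)
+```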

@@ -79,24 +79,26 @@ MCT offers a range of powerful features to optimize models for efficient edge de
 🏆 **Mixed-precision search** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb). Assigning optimal quantization bit-width per layer (for weights/activations)
 
-📈 **Graph optimizations**. Transforming the model to be best fitted for quantization process.
+📈 **Graph optimizations**.
+Transforming the model to best fit the quantization process.
 
-🔎 **Quantization parameter search**. Minimizing expected quantization-noise during thresholds search using methods such as MSE, No-Clipping and MAE.
+🔎 **Quantization parameter search** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb). Minimizing the expected quantization noise during threshold search using methods such as MSE, No-Clipping and MAE.
 
-🧮 **Advanced quantization algorithms**. To prevent a performance degradation some algorithms are applied, such as Shift negative correction, Outliers filtering and clustering.
+🧮 **Advanced quantization algorithms** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb). Enhancing quantization performance in advanced cases by applying algorithms such as Shift negative correction, Outliers filtering and clustering.
 __________________________________________________________________________________________________________
 ### Hardware-aware optimization
-🎯 **TPC (Target Platform Capabilities)**. Describes the target hardware’s constrains, for which the model optimization is targeted. See [TPC Readme](https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/target_platform_capabilities/README.md) for more information.
+🎯 **TPC (Target Platform Capabilities)**. Describes the constraints of the target hardware for which the model optimization is targeted. See [TPC Readme](./model_compression_toolkit/target_platform_capabilities/README.md) for more information.
 __________________________________________________________________________________________________________
 ### Data-free quantization (Data Generation) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb)
-Generates synthetic images based on the statistics stored in a model's batch normalization layers, based on your specific needs, for when image data isn’t available. See [Data Generation Library](https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/data_generation/README.md) for more.
+Generates synthetic images based on the statistics stored in the model's batch normalization layers, tailored to your specific needs, for when image data isn’t available. See [Data Generation Library](https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/data_generation/README.md) for more.
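+
+As a rough sketch, generating synthetic calibration images and feeding them to PTQ may look like the following (function names follow the data generation tutorial; the parameter values, and the assumption that the call returns a ready-to-use image batch, are illustrative):
+
+```python
+import model_compression_toolkit as mct
+
+# Derive synthetic images from the BatchNorm statistics of a trained model.
+data_gen_config = mct.data_generation.get_pytorch_data_generation_config()
+generated_images = mct.data_generation.pytorch_data_generation_experimental(
+    model=model,                    # a trained PyTorch model with BatchNorm layers
+    n_images=64,                    # illustrative value
+    output_image_size=224,          # illustrative value
+    data_generation_config=data_gen_config)
+
+def representative_data_gen():
+    # Use the synthetic images in place of real calibration data.
+    yield [generated_images]
+```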
 __________________________________________________________________________________________________________
 ### Structured Pruning [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb)
-Reduces model size/complexity and ensures better channels utilization by removing redundant input channels from layers and reconstruction of layer weights. Read more (Pytorch/Keras).
+Reduces model size/complexity and improves channel utilization by removing redundant input channels from layers and reconstructing layer weights. Read more ([PyTorch API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/pytorch_pruning_experimental.html) / [Keras API](https://sony.github.io/model_optimization/docs/api/api_docs/methods/keras_pruning_experimental.html)).
 __________________________________________________________________________________________________________
 ### **Debugging and Visualization**
-**🎛️ Network Editor (Modify Quantization Configurations)** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb). Modify your model's quantization configuration for specific layers or apply a custom edit rule (e.g adjust layer's bit-width) using MCT’s network editor
+**🎛️ Network Editor (Modify Quantization Configurations)** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb).
+Modify your model's quantization configuration for specific layers or apply a custom edit rule (e.g., adjusting a layer's bit-width) using MCT’s network editor.
 
 **🖥️ Visualization**. Observe useful information for troubleshooting the quantized model's performance using TensorBoard. [Read more](https://sony.github.io/model_optimization/docs/guidelines/visualization.html).
 
@@ -104,8 +106,8 @@ ________________________________________________________________________________
 __________________________________________________________________________________________________________
 ### Enhanced Post-Training Quantization (EPTQ)
 As part of the GPTQ capability, we provide an advanced optimization algorithm called EPTQ.
-The specifications of the algorithm are detailed in the paper: _"**EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian**"_ [4].
-More details on the how to use EPTQ via MCT can be found in the [EPTQ guidelines](https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/gptq/README.md).
+The specifications of the algorithm are detailed in the paper: _"**EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization**"_ [4].
+More details on how to use EPTQ via MCT can be found in the [GPTQ guidelines](https://github.com/sony/model_optimization/blob/main/model_compression_toolkit/gptq/README.md).
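+
+In practice, EPTQ is reached through the GPTQ entry points. Below is a minimal PyTorch sketch, reusing the model and representative_data_gen from the PTQ example above (the epoch count is illustrative; see the GPTQ API docs and guidelines for the full set of options):
+
+```python
+import model_compression_toolkit as mct
+
+# EPTQ runs as part of GPTQ; its Hessian-guided weighting is
+# controlled through the GPTQ configuration.
+gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=50)
+
+quantized_model, quantization_info = mct.gptq.pytorch_gradient_post_training_quantization(
+    model, representative_data_gen, gptq_config=gptq_config)
+```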
 ## Resources
 * [User Guide](https://sony.github.io/model_optimization/docs/index.html) contains detailed information about MCT and guides you from installation through optimizing models for your edge AI applications.
 
@@ -207,4 +209,4 @@ MCT is licensed under Apache License Version 2.0. By contributing to the project
 [3] [TORCHVISION.MODELS](https://pytorch.org/vision/stable/models.html)
 
-[4] Gordon, O., Habi, H. V., & Netzer, A., 2023. [EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian. arXiv preprint](https://arxiv.org/abs/2309.11531)
+[4] Gordon, O., Cohen, E., Habi, H. V., & Netzer, A., 2024. [EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization. arXiv preprint](https://arxiv.org/abs/2309.11531)