
Update documentation for Layer Output Generation API
Signed-off-by: Raj Gite <quic_rgite@quicinc.com>
quic-rgite authored and quic-akhobare committed Nov 9, 2023
1 parent 34f7cca commit b98f200
Showing 10 changed files with 118 additions and 117 deletions.
8 changes: 4 additions & 4 deletions Docs/api_docs/keras_layer_output_generation.rst
@@ -6,7 +6,7 @@
AIMET Keras Layer Output Generation API
================================

- This API captures and saves intermediate layer-outputs of a model. The model can be original(FP32) or quantsim.
+ This API captures and saves intermediate layer-outputs of a model. The model can be original (FP32) or quantsim.
The layer-outputs are named to match the Keras model exported by the quantsim export API. This allows layer-output
comparison among the FP32 model, the quantization-simulated model, and the actual quantized model on the target device,
to debug accuracy-mismatch issues.
@@ -34,21 +34,21 @@ Code Example
:start-after: # Step 0. Import statements
:end-before: # End step 0

- **Obtain Original or QuantSim model session**
+ **Obtain Original or QuantSim model from AIMET Export Artifacts**

.. literalinclude:: ../keras_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 1. Obtain original or quantsim model
:end-before: # End step 1

- **Obtain pre-processed inputs**
+ **Obtain inputs for which we want to generate intermediate layer-outputs**

.. literalinclude:: ../keras_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 2. Obtain pre-processed inputs
:end-before: # End step 2

- **Generate Layer Outputs**
+ **Generate layer-outputs**

.. literalinclude:: ../keras_code_examples/layer_output_generation_code_example.py
:language: python
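
For context, the export artifacts that Step 1 of these examples loads are produced beforehand by the quantsim export API. A minimal sketch of generating them for Keras, assuming a user-supplied trained ``model`` and calibration callback (the exact ``export`` signature may vary across AIMET versions)::

    from aimet_tensorflow.keras.quantsim import QuantizationSimModel

    # Assumptions: `model` is a trained tf.keras.Model and
    # `forward_pass_callback` runs representative data through the sim
    # model; both are user-supplied, and the paths are placeholders.
    quantsim = QuantizationSimModel(model)
    quantsim.compute_encodings(forward_pass_callback, forward_pass_callback_args=None)

    # Writes the model file and the '<filename_prefix>.encodings' file
    # that Step 1 of the example later loads back.
    quantsim.export(path='path/to/aimet_export_artifacts', filename_prefix='model')
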
8 changes: 4 additions & 4 deletions Docs/api_docs/onnx_layer_output_generation.rst
@@ -6,7 +6,7 @@
AIMET ONNX Layer Output Generation API
================================

- This API captures and saves intermediate layer-outputs of a model. The model can be original(FP32) or quantsim.
+ This API captures and saves intermediate layer-outputs of a model. The model can be original (FP32) or quantsim.
The layer-outputs are named to match the ONNX model exported by the quantsim export API. This allows layer-output comparison
among the FP32 model, the quantization-simulated model, and the actual quantized model on the target device, to debug accuracy-mismatch issues.

@@ -34,21 +34,21 @@ Code Example
:start-after: # Step 0. Import statements
:end-before: # End step 0

- **Obtain Original or QuantSim model session**
+ **Obtain Original or QuantSim model from AIMET Export Artifacts**

.. literalinclude:: ../onnx_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 1. Obtain original or quantsim model
:end-before: # End step 1

- **Obtain pre-processed inputs**
+ **Obtain inputs for which we want to generate intermediate layer-outputs**

.. literalinclude:: ../onnx_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 2. Obtain pre-processed inputs
:end-before: # End step 2

- **Generate Layer Outputs**
+ **Generate layer-outputs**

.. literalinclude:: ../onnx_code_examples/layer_output_generation_code_example.py
:language: python
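
The debugging workflow described in the introduction — comparing FP32 and quantsim layer-outputs layer by layer — might look like the sketch below. It assumes both runs saved one ``.npy`` file per layer under matching names; check the actual directory layout produced by ``LayerOutputUtil`` before relying on it::

    import os
    import numpy as np

    def compare_layer_outputs(fp32_dir, quantsim_dir):
        """Print a per-layer signal-to-quantization-noise ratio (SQNR).

        Assumes each directory holds one .npy file per layer with identical
        file names -- an assumption about LayerOutputUtil's output layout.
        """
        for fname in sorted(os.listdir(fp32_dir)):
            if not fname.endswith('.npy'):
                continue
            fp32_out = np.load(os.path.join(fp32_dir, fname))
            quant_out = np.load(os.path.join(quantsim_dir, fname))
            noise = float(np.mean((fp32_out - quant_out) ** 2))
            signal = float(np.mean(fp32_out ** 2))
            sqnr_db = 10 * np.log10(signal / noise) if noise else float('inf')
            print(f'{fname}: SQNR = {sqnr_db:.2f} dB')

    compare_layer_outputs('./fp32_layer_outputs', './quantsim_layer_outputs')
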
8 changes: 4 additions & 4 deletions Docs/api_docs/tensorflow_layer_output_generation.rst
@@ -6,7 +6,7 @@
AIMET Tensorflow Layer Output Generation API
================================

- This API captures and saves intermediate layer-outputs of a model. The model can be original(FP32) or quantsim.
+ This API captures and saves intermediate layer-outputs of a model. The model can be original (FP32) or quantsim.
The layer-outputs are named to match the Tensorflow model exported by the quantsim export API. This allows layer-output comparison
among the FP32 model, the quantization-simulated model, and the actual quantized model on the target device, to debug accuracy-mismatch issues.

@@ -34,21 +34,21 @@ Code Example
:start-after: # Step 0. Import statements
:end-before: # End step 0

- **Obtain Original or QuantSim model session**
+ **Obtain Original or QuantSim model session from AIMET Export Artifacts**

.. literalinclude:: ../tf_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 1. Obtain original or quantsim model
:end-before: # End step 1

- **Obtain pre-processed inputs**
+ **Obtain inputs for which we want to generate intermediate layer-outputs**

.. literalinclude:: ../tf_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 2. Obtain pre-processed inputs
:end-before: # End step 2

- **Generate Layer Outputs**
+ **Generate layer-outputs**

.. literalinclude:: ../tf_code_examples/layer_output_generation_code_example.py
:language: python
8 changes: 4 additions & 4 deletions Docs/api_docs/torch_layer_output_generation.rst
@@ -6,7 +6,7 @@
AIMET PyTorch Layer Output Generation API
================================

- This API captures and saves intermediate layer-outputs of a model. The model can be original(FP32) or quantsim.
+ This API captures and saves intermediate layer-outputs of a model. The model can be original (FP32) or quantsim.
The layer-outputs are named to match the PyTorch/ONNX/TorchScript model exported by the quantsim export API.
This allows layer-output comparison among the FP32 model, the quantization-simulated model, and the actual quantized model
on the target device, to debug accuracy-mismatch issues.
@@ -44,21 +44,21 @@ Code Example
:start-after: # Step 0. Import statements
:end-before: # End step 0

- **Obtain Original or QuantSim model**
+ **Obtain Original or QuantSim model from AIMET Export Artifacts**

.. literalinclude:: ../torch_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 1. Obtain original or quantsim model
:end-before: # End step 1

- **Obtain pre-processed inputs**
+ **Obtain inputs for which we want to generate intermediate layer-outputs**

.. literalinclude:: ../torch_code_examples/layer_output_generation_code_example.py
:language: python
:start-after: # Step 2. Obtain pre-processed inputs
:end-before: # End step 2

- **Generate Layer Outputs**
+ **Generate layer-outputs**

.. literalinclude:: ../torch_code_examples/layer_output_generation_code_example.py
:language: python
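
The PyTorch code example file is among the changed files but is not expanded on this page. As an illustration only, the post-change flow mirrors the other variants; the ``LayerOutputUtil`` constructor arguments below are assumed from the ONNX variant, and the stand-in model and inputs are hypothetical, so defer to ``Docs/torch_code_examples/layer_output_generation_code_example.py`` for the authoritative interface::

    import torch
    from aimet_torch.layer_output_utils import LayerOutputUtil

    # Hypothetical stand-ins; in practice, load the model exported by quantsim.
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    input_batches = [torch.randn(1, 3, 16, 16) for _ in range(2)]

    # Constructor arguments assumed to mirror the ONNX variant (model + dir_path);
    # verify against aimet_torch.layer_output_utils before use.
    layer_output_util = LayerOutputUtil(model=model, dir_path='./fp32_layer_outputs')
    for input_batch in input_batches:
        layer_output_util.generate_layer_outputs(input_batch)
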
43 changes: 18 additions & 25 deletions Docs/keras_code_examples/layer_output_generation_code_example.py
@@ -38,47 +38,40 @@
""" Code example to generate intermediate layer outputs of a model """

# Step 0. Import statements
- import numpy as np
import tensorflow as tf

from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_tensorflow.keras.layer_output_utils import LayerOutputUtil
# End step 0

# Step 1. Obtain original or quantsim model
- def quantsim_forward_pass_callback(model, dummy_input):
-     _ = model.predict(dummy_input)
+ # Load the model.
+ model = tf.keras.models.load_model('path/to/aimet_export_artifacts/model.h5')

- # Load the baseline/original (FP32) model
- base_model = load_baseline_model()
+ # Use the same arguments that were used for the exported QuantSim model. For the sake of simplicity, only mandatory arguments are passed below.
+ quantsim = QuantizationSimModel(model)

- dummy_input = np.random.rand(1, 16, 16, 3)
+ # Load exported encodings into quantsim object.
+ quantsim.load_encodings_to_sim('path/to/aimet_export_artifacts/model.encodings')

- # Create QuantizationSim Object
- quantsim_obj = QuantizationSimModel(
-     model=base_model,
-     quant_scheme='tf_enhanced',
-     rounding_mode="nearest",
-     default_output_bw=8,
-     default_param_bw=8,
-     in_place=False,
-     config_file=None
- )
-
- # Compute encodings
- quantsim_obj.compute_encodings(quantsim_forward_pass_callback,
-                                forward_pass_callback_args=dummy_input)

+ # Check that the constructed original and quantsim models run properly before using the Layer Output Generation API.
+ _ = model.predict(dummy_input)
+ _ = quantsim.predict(dummy_input)
# End step 1

# Step 2. Obtain pre-processed inputs
- # Get the inputs that are pre-processed using the same manner while computing quantsim encodings
+ # Use the same input pre-processing pipeline as was used for computing the quantization encodings.
input_batches = get_pre_processed_inputs()
# End step 2

- # Step 3. Generate outputs
- layer_output_util = LayerOutputUtil(model=quantsim_obj.model, save_dir="./KerasLayerOutput")
+ # Step 3. Generate layer-outputs
+ # Use original model to get fp32 layer-outputs
+ fp32_layer_output_util = LayerOutputUtil(model=model, save_dir='fp32_layer_outputs')
+
+ # Use quantsim model to get quantsim layer-outputs
+ quantsim_layer_output_util = LayerOutputUtil(model=quantsim.model, save_dir='quantsim_layer_outputs')

for input_batch in input_batches:
-     layer_output_util.generate_layer_outputs(input_batch=input_batch)
+     fp32_layer_output_util.generate_layer_outputs(input_batch=input_batch)
+     quantsim_layer_output_util.generate_layer_outputs(input_batch=input_batch)
# End step 3
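
Two names in this example are user-supplied placeholders: ``dummy_input`` in Step 1 (a random array of the model's input shape suffices; the previous version used ``np.random.rand(1, 16, 16, 3)``) and ``get_pre_processed_inputs()`` in Step 2. One possible implementation of the latter, assuming an image model and a directory of images (paths, image size, and scaling are illustrative only)::

    import tensorflow as tf

    def get_pre_processed_inputs(image_dir='path/to/images', batch_size=32):
        """Illustrative placeholder: yield input batches pre-processed exactly
        as they were when the quantization encodings were computed."""
        dataset = tf.keras.utils.image_dataset_from_directory(
            image_dir, labels=None, image_size=(16, 16), batch_size=batch_size)
        # Same scaling as at encoding-computation time (assumed here to be /255).
        return dataset.map(lambda images: images / 255.0)
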
40 changes: 24 additions & 16 deletions Docs/onnx_code_examples/layer_output_generation_code_example.py
@@ -38,35 +38,43 @@
""" Code example to generate intermediate layer outputs of a model """

# Step 0. Import statements
- import numpy as np
- from aimet_onnx.quantsim import QuantizationSimModel
+ import onnx
+ from onnxruntime import InferenceSession
+
+ from aimet_onnx.quantsim import QuantizationSimModel, load_encodings_to_sim
from aimet_onnx.layer_output_utils import LayerOutputUtil
# End step 0

# Step 1. Obtain original or quantsim model
- # Obtain original model
- original_model = Model()
+ # Load the model.
+ model = onnx.load('path/to/aimet_export_artifacts/model.onnx')

- # Obtain quantsim model
- input_shape = (1, 3, 224, 224)
- dummy_data = np.random.randn(*input_shape).astype(np.float32)
- input_dict = {'input': dummy_data}
+ # Use the same arguments that were used for the exported QuantSim model. For the sake of simplicity, only mandatory arguments are passed below.
+ quantsim = QuantizationSimModel(model=model, dummy_input=dummy_input_dict, use_cuda=False)

- def forward_pass(session, input_dict):
-     session.run(None, input_dict)
+ # Load exported encodings into quantsim object.
+ load_encodings_to_sim(quantsim, 'path/to/aimet_export_artifacts/model.encodings')

- quantsim = QuantizationSimModel(model=original_model, dummy_input=input_dict, use_cuda=False)
- quantsim.compute_encodings(forward_pass, input_dict)
+ # Check that the constructed original and quantsim models run properly before using the Layer Output Generation API.
+ _ = InferenceSession(model.SerializeToString()).run(None, dummy_input_dict)
+ _ = quantsim.session.run(None, dummy_input_dict)
# End step 1

# Step 2. Obtain pre-processed inputs
- # Get the inputs that are pre-processed using the same manner while computing quantsim encodings in numpy ndarray
+ # Use the same input pre-processing pipeline as was used for computing the quantization encodings.
input_batches = get_pre_processed_inputs()
# End step 2

- # Step 3. Generate outputs
- layer_output_util = LayerOutputUtil(model=quantsim.model.model, dir_path='./layer_output_dump')
+ # Step 3. Generate layer-outputs
+ # Use original model to get fp32 layer-outputs
+ fp32_layer_output_util = LayerOutputUtil(model=model, dir_path='./fp32_layer_outputs')
+
+ # Use quantsim model to get quantsim layer-outputs
+ quantsim_layer_output_util = LayerOutputUtil(model=quantsim.model.model, dir_path='./quantsim_layer_outputs')

for input_batch in input_batches:
-     layer_output_util.generate_layer_outputs(input_batch)
+     fp32_layer_output_util.generate_layer_outputs(input_batch)
+     quantsim_layer_output_util.generate_layer_outputs(input_batch)

+ # Note: Generate layer-outputs for the fp32 model before creating the quantsim model, because creating the quantsim version modifies the fp32 model itself.
# End step 3
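
``dummy_input_dict`` in Step 1 is never defined in this example; it is a user-supplied mapping from ONNX input names to numpy arrays. A sketch using the input name and shape from the previous version of this example::

    import numpy as np

    # Must match the input names and shapes of the exported ONNX model;
    # 'input' and (1, 3, 224, 224) are taken from the pre-change example.
    dummy_input_dict = {'input': np.random.randn(1, 3, 224, 224).astype(np.float32)}
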
50 changes: 24 additions & 26 deletions Docs/tf_code_examples/layer_output_generation_code_example.py
@@ -38,48 +38,46 @@
""" Code example to generate intermediate layer outputs of a model """

# Step 0. Import statements
- import numpy as np
import tensorflow as tf

- from aimet_tensorflow.examples.test_models import keras_model
from aimet_tensorflow.quantsim import QuantizationSimModel
from aimet_tensorflow.layer_output_utils import LayerOutputUtil
# End step 0

- # Step 1. Obtain original or quantsim model
- # Load original model into session
- def cpu_session():
-     tf.compat.v1.reset_default_graph()
-     with tf.device('/cpu:0'):
-         model = keras_model()
-         init = tf.compat.v1.global_variables_initializer()
-     session = tf.compat.v1.Session()
-     session.run(init)
-     return session
-
- session = cpu_session()

+ # Step 1. Obtain original or quantsim model
+ # Load the model into session.
+ tf.compat.v1.reset_default_graph()
+ session = tf.compat.v1.Session()
+ saver = tf.compat.v1.train.import_meta_graph('path/to/aimet_export_artifacts/model.meta')
+ saver.restore(session, 'path/to/aimet_export_artifacts/model')

- # Obtain quantsim model session
- def quantsim_forward_pass_callback(session, dummy_input):
-     model_input = session.graph.get_tensor_by_name('conv2d_input:0')
-     model_output = session.graph.get_tensor_by_name('keras_model/Softmax_quantized:0')
-     return session.run(model_output, feed_dict={model_input: dummy_input})
+ # Use the same arguments that were used for the exported QuantSim model. For the sake of simplicity, only mandatory arguments are passed below.
+ quantsim = QuantizationSimModel(session, starting_op_names, output_op_names, use_cuda=False)

- dummy_input = np.random.randn(1, 16, 16, 3)
+ # Load exported encodings into quantsim object.
+ quantsim.load_encodings_to_sim('path/to/aimet_export_artifacts/model.encodings')

- quantsim = QuantizationSimModel(session, ['conv2d_input'], ['keras_model/Softmax'], use_cuda=False)
- quantsim.compute_encodings(quantsim_forward_pass_callback, dummy_input)
+ # Check that the constructed original and quantsim models run properly before using the Layer Output Generation API.
+ _ = session.run(None, feed_dict)
+ _ = quantsim.session.run(None, feed_dict)
# End step 1


# Step 2. Obtain pre-processed inputs
- # Get the inputs that are pre-processed using the same manner while computing quantsim encodings
+ # Use the same input pre-processing pipeline as was used for computing the quantization encodings.
input_batches = get_pre_processed_inputs()
# End step 2


- # Step 3. Generate outputs
- layer_output_util = LayerOutputUtil(session=quantsim.session, starting_op_names=['conv2d_input'],
-                                     output_op_names=['keras_model/Softmax'], dir_path='./layer_output_dump')
+ # Step 3. Generate layer-outputs
+ # Use original session to get fp32 layer-outputs
+ fp32_layer_output_util = LayerOutputUtil(session, starting_op_names, output_op_names, dir_path='./fp32_layer_outputs')
+
+ # Use quantsim session to get quantsim layer-outputs
+ quantsim_layer_output_util = LayerOutputUtil(quantsim.session, starting_op_names, output_op_names, dir_path='./quantsim_layer_outputs')

for input_batch in input_batches:
-     layer_output_util.generate_layer_outputs(input_batch)
+     fp32_layer_output_util.generate_layer_outputs(input_batch)
+     quantsim_layer_output_util.generate_layer_outputs(input_batch)
# End step 3
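
``starting_op_names``, ``output_op_names``, and ``feed_dict`` are likewise user-supplied and must match the exported graph. A sketch continuing from the restored ``session`` above, using the op names and input shape from the previous version of this example; note that ``tf.compat.v1.Session.run`` requires concrete fetches, so fetching the output tensor is safer than the ``None`` used in the sanity-check lines above::

    import numpy as np

    # Op names from the previous version of this example; substitute your own.
    starting_op_names = ['conv2d_input']
    output_op_names = ['keras_model/Softmax']

    # Build a feed_dict with the input shape used previously (1, 16, 16, 3).
    input_tensor = session.graph.get_tensor_by_name(starting_op_names[0] + ':0')
    feed_dict = {input_tensor: np.random.randn(1, 16, 16, 3)}

    # Fetch a concrete output tensor for the sanity check.
    output_tensor = session.graph.get_tensor_by_name(output_op_names[0] + ':0')
    _ = session.run(output_tensor, feed_dict)
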
(Diffs for the remaining 3 changed files are not expanded here.)