[OpenVINO][Examples] Add Quantization for the OpenVINO Stable Diffusion Example#17807
anzr299 wants to merge 20 commits into pytorch:main from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17807
daniil-lyakhov left a comment:
In general:
I think the "maybe" logic is not worth it here; it would be nicer to have separate quantize_unet and compress_model functions in each export function.
Right now the diamond structure makes export look more complicated than it really is.
```python
from executorch.exir.backend.backend_details import CompileSpec
from torch.export import export
from torchao.quantization.pt2e.quantizer.quantizer import Quantizer
from tqdm import tqdm  # type: ignore[import-untyped]
```
Why is the type ignore here?
It was suggested by the lintrunner
```python
# Configure OpenVINO compilation
compile_spec = [CompileSpec("device", device.encode())]
partitioner = OpenvinoPartitioner(compile_spec)

# Lower to edge dialect and apply OpenVINO backend
edge_manager = to_edge_transform_and_lower(
    exported_program, partitioner=[partitioner]
)
```
Ah yes. Great catch!
```python
if not is_quantization_enabled:
    return model
```
Please keep only the code that could raise an error inside the try block.
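The point above can also be expressed with a small context manager that keeps only the failure-prone work in the protected region and always restores the dtype. This is a hypothetical sketch: `set_pipeline_dtype` is an illustrative stand-in passed as a callable, not the PR's actual `_set_pipeline_dtype` API.

```python
from contextlib import contextmanager

@contextmanager
def temporary_dtype(pipeline, dtype, original_dtype, set_pipeline_dtype):
    """Temporarily switch the pipeline dtype and restore it even if the
    calibration run inside the `with` block raises."""
    set_pipeline_dtype(pipeline, dtype)
    try:
        yield pipeline
    finally:
        set_pipeline_dtype(pipeline, original_dtype)
```

With a helper like this, the caller's try/finally disappears entirely: the calibration call sits inside the `with` block and dtype restoration is guaranteed.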
```python
# Quantize activations for the Unet Model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
    # We need the models in FP32 to run inference for calibration data collection
    self._set_pipeline_dtype(pipeline, torch.float32)
    calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
    self._set_pipeline_dtype(pipeline, self.model_loader.dtype)

quantized_model = quantize_model(
    model,
    mode=QuantizationMode.INT8_TRANSFORMER,
    calibration_dataset=calibration_dataset,
    smooth_quant=True,
)
```
This if body is so big that it's worth splitting the function into two, e.g. quantize and compress.
I have removed the try-finally. Now it is just quantize and compress
What about separate functions quantize and compress?
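The split being discussed can be sketched roughly as below. All names and the trivial stand-in bodies are illustrative only; the real PR would call its NNCF-based quantization and compression helpers in place of the tuple returns.

```python
def quantize_unet_model(model, calibration_dataset):
    # PTQ path: needs activation statistics from a calibration run
    return ("ptq", model)

def compress_model(model):
    # Weights-only compression: no calibration data required
    return ("wc", model)

def export_component(component_name, model, calibration_dataset=None):
    # Thin dispatcher: only the UNet takes the PTQ path, everything
    # else gets weights-only compression
    if component_name == "unet":
        return quantize_unet_model(model, calibration_dataset)
    return compress_model(model)
```

The dispatcher stays small, and each function has a single responsibility, which is the reviewer's point about avoiding the "diamond" export structure.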
```python
def forward(self, *args, **kwargs):
    """
    obtain and pass each input individually to ensure the order is maintained
```
Suggested change:
```diff
-    obtain and pass each input individually to ensure the order is maintained
+    Obtain and pass each input individually to ensure the order is maintained
```
```python
dataset = datasets.load_dataset(
    "google-research-datasets/conceptual_captions",
    split="train",
    trust_remote_code=True,
).shuffle(seed=42)
```
Maybe expose the dataset name as a parameter, with this one as the default?
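One way to act on this suggestion, sketched with argparse; the flag name and default are illustrative, not what the PR actually ships:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--calibration-dataset",
    default="google-research-datasets/conceptual_captions",
    help="Hugging Face dataset used to collect UNet calibration prompts",
)
# No flag given: falls back to the currently hard-coded dataset
args = parser.parse_args([])
```

The dataset name would then be threaded through to `datasets.load_dataset(...)` instead of the string literal.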
```python
wrapped_unet = UNetWrapper(pipeline.unet, pipeline.unet.config)
pipeline.unet = wrapped_unet
# Run inference for data collection
pbar = tqdm(total=calibration_dataset_size)
```
Maybe executorch has some sort of progress bar already? The fewer dependencies the better.
tqdm is used in multiple places inside executorch examples too.
```python
if self.should_quantize_model(sd_model_component):
    # Quantize activations for the Unet Model. Other models are weights-only quantized.
    pipeline = self.model_loader.pipeline
    try:
        # We need the models in FP32 to run inference for calibration data collection
        self._set_pipeline_dtype(pipeline, torch.float32)
        calibration_dataset = self.get_unet_calibration_dataset(pipeline)
    finally:
        self._set_pipeline_dtype(pipeline, self.model_loader.dtype)
```
If the calibration dataset is only collected under this condition, I don't see the value of the should_quantize_model method.
```python
exported_program = self._export_and_maybe_quantize(
    vae_decoder,
    dummy_inputs[sd_model_component],
    sd_model_component,
    self.is_quantization_enabled,
)
```
```python
exported_program = self._export(...)
if quantize:
    exported_program = quantize(exported_program)
```
Ah okay, you mean separating the export and quantize logic?
Regarding #17807 (comment):
The quantization is already one separate function. Do you mean a function inside this file which collects the calibration dataset and quantizes both?
Regarding compression, sure, I will move it to a separate function. I thought I would show the changes first (removing the try-finally removes the bulk of the code there).
A PTQ function for the UNet and a WC (weight compression) function for the other parts; I don't see a reason why WC and PTQ should be united in a single function.
Oh I see. I made it into a single function which performs quantization or compression depending on the model.
It seemed more reasonable because, for the user, both are quantization (compression == weights-only quantization); "compression" is more of an NNCF term.
```python
    exported_program_module
)
# Re-export the transformed torch.fx.GraphModule to ExportedProgram
exported_program = export(exported_program_module, component_dummy_inputs)
```
You can move this into the quantization function.
```python
)

@staticmethod
def _compress_non_unet_model(
```
Suggested change:
```diff
-def _compress_non_unet_model(
+def compress_model(
```
```python
pipeline.unet = original_unet
return calibration_data

def _quantize_unet_model(
```
Suggested change:
```diff
-def _quantize_unet_model(
+def quantize_unet_model(
```
```python
prompt = batch[dataset_column]
if not isinstance(prompt, str):
    prompt = str(prompt)
```
Suggested change:
```diff
-prompt = batch[dataset_column]
-if not isinstance(prompt, str):
-    prompt = str(prompt)
+prompt = str(batch[dataset_column])
```
Should work.
```python
prompt = batch[dataset_column]
if not isinstance(prompt, str):
    prompt = str(prompt)
if len(prompt.split()) > pipeline.tokenizer.model_max_length:
```
Are you sure the number of tokens and the number of whitespace-separated words are always equal?
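The concern can be demonstrated with a toy subword tokenizer (a pure-Python stand-in for the real CLIP tokenizer, which this sketch does not call): word count and token count diverge as soon as words split into subword pieces.

```python
def word_count(prompt):
    # What the code above compares against model_max_length
    return len(prompt.split())

def toy_tokenize(text):
    # Toy subword tokenizer stand-in: chops every word into <=5-char pieces,
    # mimicking how real tokenizers split long words into multiple tokens
    pieces = []
    for word in text.split():
        while len(word) > 5:
            pieces.append(word[:5])
            word = word[5:]
        pieces.append(word)
    return pieces

prompt = "tokenization matters"
```

Here `word_count(prompt)` is 2 while `len(toy_tokenize(prompt))` is 5, so a prompt can exceed `model_max_length` tokens even when its word count looks safe; counting with the actual tokenizer output avoids the mismatch.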
Pull request overview
Extends the OpenVINO Stable Diffusion (LCM) example to support INT8 post-training quantization (PTQ) during export, and exposes the new dtype option in both export and inference CLIs.
Changes:
- Add `--dtype int8` path in `export_lcm.py` with UNet calibration + quantization and weights-only compression for other components.
- Introduce `StableDiffusionComponent` enum to avoid stringly-typed component keys for dummy inputs.
- Update example docs and dependencies to reflect quantization usage.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| examples/openvino/stable_diffusion/requirements.txt | Adds tqdm dependency for calibration progress reporting. |
| examples/openvino/stable_diffusion/openvino_lcm.py | Extends dtype CLI choices to include int8. |
| examples/openvino/stable_diffusion/export_lcm.py | Implements PTQ export flow, calibration dataset handling, and component-key refactor. |
| examples/openvino/stable_diffusion/README.md | Documents INT8 export and inference usage. |
| examples/models/stable_diffusion/model.py | Adds StableDiffusionComponent enum and switches dummy input dict keys to the enum. |
| examples/models/stable_diffusion/__init__.py | Re-exports the new StableDiffusionComponent. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Summary
Extend the Stable Diffusion example for the OpenVINO backend with quantization support.