[OpenVINO][Examples] Add Quantization for the OpenVINO Stable Diffusion Example#17807
anzr299 wants to merge 20 commits into pytorch:main from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17807
daniil-lyakhov left a comment:
In general:
I think the "maybe" logic is not worth it here; it would be nicer to have separate quantize_unet and compress_model functions in each export function.
Right now the diamond structure makes export look more complicated than it really is.
```python
from executorch.exir.backend.backend_details import CompileSpec
from torch.export import export
from torchao.quantization.pt2e.quantizer.quantizer import Quantizer
from tqdm import tqdm  # type: ignore[import-untyped]
```
Why is the type ignore here?
It was suggested by the lintrunner
```python
# Configure OpenVINO compilation
compile_spec = [CompileSpec("device", device.encode())]
partitioner = OpenvinoPartitioner(compile_spec)

# Lower to edge dialect and apply OpenVINO backend
edge_manager = to_edge_transform_and_lower(
    exported_program, partitioner=[partitioner]
)
```
Ah yes. Great catch!
```python
if not is_quantization_enabled:
    return model
```
Please keep only the code that could raise an error inside the try block.
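The point above can also be expressed with a small context manager that keeps only the failure-prone work in the protected region and always restores the dtype. This is a hypothetical sketch: `set_pipeline_dtype` is an illustrative stand-in passed as a callable, not the PR's actual `_set_pipeline_dtype` API.

```python
from contextlib import contextmanager

@contextmanager
def temporary_dtype(pipeline, dtype, original_dtype, set_pipeline_dtype):
    """Temporarily switch the pipeline dtype and restore it even if the
    calibration run inside the `with` block raises."""
    set_pipeline_dtype(pipeline, dtype)
    try:
        yield pipeline
    finally:
        set_pipeline_dtype(pipeline, original_dtype)
```

With a helper like this, the caller's try/finally disappears entirely: the calibration call sits inside the `with` block and dtype restoration is guaranteed.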
```python
# Quantize activations for the Unet Model. Other models are weights-only quantized.
pipeline = self.model_loader.pipeline
try:
    # We need the models in FP32 to run inference for calibration data collection
    self._set_pipeline_dtype(pipeline, torch.float32)
    calibration_dataset = self.get_unet_calibration_dataset(pipeline)
finally:
    self._set_pipeline_dtype(pipeline, self.model_loader.dtype)

quantized_model = quantize_model(
    model,
    mode=QuantizationMode.INT8_TRANSFORMER,
    calibration_dataset=calibration_dataset,
    smooth_quant=True,
)
```
This if body is so big that it's worth splitting the function into two, e.g. quantize and compress.
I have removed the try-finally. Now it is just quantize and compress
What about separate functions quantize and compress?
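The split being discussed can be sketched roughly as below. All names and the trivial stand-in bodies are illustrative only; the real PR would call its NNCF-based quantization and compression helpers in place of the tuple returns.

```python
def quantize_unet_model(model, calibration_dataset):
    # PTQ path: needs activation statistics from a calibration run
    return ("ptq", model)

def compress_model(model):
    # Weights-only compression: no calibration data required
    return ("wc", model)

def export_component(component_name, model, calibration_dataset=None):
    # Thin dispatcher: only the UNet takes the PTQ path, everything
    # else gets weights-only compression
    if component_name == "unet":
        return quantize_unet_model(model, calibration_dataset)
    return compress_model(model)
```

The dispatcher stays small, and each function has a single responsibility, which is the reviewer's point about avoiding the "diamond" export structure.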
```python
def forward(self, *args, **kwargs):
    """
    obtain and pass each input individually to ensure the order is maintained
```
Suggested change:
```diff
-    obtain and pass each input individually to ensure the order is maintained
+    Obtain and pass each input individually to ensure the order is maintained
```
```python
dataset = datasets.load_dataset(
    "google-research-datasets/conceptual_captions",
    split="train",
    trust_remote_code=True,
).shuffle(seed=42)
```
Maybe expose the dataset name as a parameter, with this one as the default?
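One way to act on this suggestion, sketched with argparse; the flag name and default are illustrative, not what the PR actually ships:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--calibration-dataset",
    default="google-research-datasets/conceptual_captions",
    help="Hugging Face dataset used to collect UNet calibration prompts",
)
# No flag given: falls back to the currently hard-coded dataset
args = parser.parse_args([])
```

The dataset name would then be threaded through to `datasets.load_dataset(...)` instead of the string literal.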
```python
wrapped_unet = UNetWrapper(pipeline.unet, pipeline.unet.config)
pipeline.unet = wrapped_unet
# Run inference for data collection
pbar = tqdm(total=calibration_dataset_size)
```
Maybe executorch has some sort of progress bar already? The fewer dependencies the better.
tqdm is used in multiple places inside executorch examples too.
```python
if self.should_quantize_model(sd_model_component):
    # Quantize activations for the Unet Model. Other models are weights-only quantized.
    pipeline = self.model_loader.pipeline
    try:
        # We need the models in FP32 to run inference for calibration data collection
        self._set_pipeline_dtype(pipeline, torch.float32)
        calibration_dataset = self.get_unet_calibration_dataset(pipeline)
    finally:
        self._set_pipeline_dtype(pipeline, self.model_loader.dtype)
```
If the calibration dataset is only collected under this condition, I don't see the value of the should_quantize_model method.
```python
exported_program = self._export_and_maybe_quantize(
    vae_decoder,
    dummy_inputs[sd_model_component],
    sd_model_component,
    self.is_quantization_enabled,
)
```
```python
exported_program = self._export(...)
if quantize:
    exported_program = quantize(exported_program)
```
Ah okay, you mean separating the export and quantize logic?
Regarding #17807 (comment):
The quantization is already one separate function. Do you mean a function inside this file which collects the calibration dataset and quantizes both?
Regarding compression, sure, I will move it to a separate function. I thought I would show the changes first (removing the try-finally removes the bulk of the code there).
A PTQ function for the UNet and a WC (weight compression) function for the other parts; I don't see a reason why WC and PTQ should be united in a single function.
Oh I see. I made it into a single function which performs quantization or compression depending on the model.
It seemed more reasonable because, for the user, both are quantization (compression == weights-only quantization); "compression" is more of an NNCF term.
```python
    exported_program_module
)
# Re-export the transformed torch.fx.GraphModule to ExportedProgram
exported_program = export(exported_program_module, component_dummy_inputs)
```
You can move this into the quantization function.
```python
)

@staticmethod
def _compress_non_unet_model(
```
Suggested change:
```diff
-def _compress_non_unet_model(
+def compress_model(
```
```python
pipeline.unet = original_unet
return calibration_data

def _quantize_unet_model(
```
Suggested change:
```diff
-def _quantize_unet_model(
+def quantize_unet_model(
```
```python
prompt = batch[dataset_column]
if not isinstance(prompt, str):
    prompt = str(prompt)
```
Suggested change:
```diff
-prompt = batch[dataset_column]
-if not isinstance(prompt, str):
-    prompt = str(prompt)
+prompt = str(batch[dataset_column])
```
Should work.
```python
prompt = batch[dataset_column]
if not isinstance(prompt, str):
    prompt = str(prompt)
if len(prompt.split()) > pipeline.tokenizer.model_max_length:
```
Are you sure the number of tokens and the number of whitespace-separated words are always equal?
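The concern can be demonstrated with a toy subword tokenizer (a pure-Python stand-in for the real CLIP tokenizer, which this sketch does not call): word count and token count diverge as soon as words split into subword pieces.

```python
def word_count(prompt):
    # What the code above compares against model_max_length
    return len(prompt.split())

def toy_tokenize(text):
    # Toy subword tokenizer stand-in: chops every word into <=5-char pieces,
    # mimicking how real tokenizers split long words into multiple tokens
    pieces = []
    for word in text.split():
        while len(word) > 5:
            pieces.append(word[:5])
            word = word[5:]
        pieces.append(word)
    return pieces

prompt = "tokenization matters"
```

Here `word_count(prompt)` is 2 while `len(toy_tokenize(prompt))` is 5, so a prompt can exceed `model_max_length` tokens even when its word count looks safe; counting with the actual tokenizer output avoids the mismatch.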
Pull request overview
Extends the OpenVINO Stable Diffusion (LCM) example to support INT8 post-training quantization (PTQ) during export, and exposes the new dtype option in both export and inference CLIs.
Changes:
- Add `--dtype int8` path in `export_lcm.py` with UNet calibration + quantization and weights-only compression for other components.
- Introduce `StableDiffusionComponent` enum to avoid stringly-typed component keys for dummy inputs.
- Update example docs and dependencies to reflect quantization usage.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| examples/openvino/stable_diffusion/requirements.txt | Adds tqdm dependency for calibration progress reporting. |
| examples/openvino/stable_diffusion/openvino_lcm.py | Extends dtype CLI choices to include int8. |
| examples/openvino/stable_diffusion/export_lcm.py | Implements PTQ export flow, calibration dataset handling, and component-key refactor. |
| examples/openvino/stable_diffusion/README.md | Documents INT8 export and inference usage. |
| examples/models/stable_diffusion/model.py | Adds StableDiffusionComponent enum and switches dummy input dict keys to the enum. |
| examples/models/stable_diffusion/__init__.py | Re-exports the new StableDiffusionComponent. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Summary
Extend the Stable Diffusion example for the OpenVINO backend with quantization support.