Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

¹Yandex Research        ²HSE University        ³Skoltech        ⁴MIPT        ⁵Neural Magic        ⁶IST Austria
*Indicates Equal Contribution

Overview of the proposed layer-wise calibration procedure before fine-tuning.
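The calibration step can be pictured as a per-layer regression: the quantized layer's remaining continuous parameters are tuned so that its outputs match those of the full-precision layer on cached calibration activations, before any end-to-end fine-tuning. The PyTorch sketch below is a minimal illustration under that assumption; it is not the authors' code, and the per-channel scale, layer sizes, and optimizer settings are our own choices.

import torch

def calibrate_layer(w_fp, w_q, X, steps=100, lr=1e-3):
    # Tune a per-output-channel scale so the quantized layer reproduces
    # the full-precision outputs on the calibration activations X.
    scale = torch.ones(w_fp.shape[0], 1, requires_grad=True)
    opt = torch.optim.Adam([scale], lr=lr)
    target = X @ w_fp.T                      # full-precision layer outputs
    for _ in range(steps):
        loss = ((X @ (scale * w_q).T - target) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return scale.detach() * w_q              # calibrated quantized weights

# Toy usage: a 64->32 linear layer and 512 cached calibration inputs.
w_fp = torch.randn(32, 64)                   # full-precision weights
w_q = torch.round(w_fp * 4) / 4              # stand-in for VQ-decoded weights
X = torch.randn(512, 64)
w_cal = calibrate_layer(w_fp, w_q, X)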

Abstract

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resource-limited environments. Post-training quantization (PTQ) tackles this issue by compressing the pretrained model weights into lower-bit representations. Recent diffusion quantization techniques primarily rely on uniform scalar quantization, providing decent performance for models compressed to 4 bits. This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models. Specifically, we tailor vector-based PTQ methods to recent billion-scale text-to-image models, such as SDXL and SDXL-Turbo, and show that diffusion models with 2B+ parameters compressed to around 3 bits using VQ exhibit image quality and textual alignment on par with previous 4-bit compression techniques, as confirmed by human evaluation.
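To make the contrast between the two weight-compression schemes concrete, the NumPy sketch below quantizes a weight matrix with (a) uniform scalar quantization and (b) a k-means vector quantizer over small weight groups, where the effective rate is log2(codebook size) / group size bits per weight. This is an illustrative toy, not the paper's method; all function names and hyperparameters are our own choices.

import numpy as np

def uniform_scalar_quantize(w, bits=4):
    # Round every weight independently to a uniform grid (scalar PTQ).
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((w - lo) / scale) * scale + lo

def vector_quantize(w, group=2, bits_per_weight=3, iters=10):
    # Split the weights into groups of `group` values and replace each
    # group with its nearest k-means codeword; the effective rate is
    # log2(codebook size) / group = bits_per_weight bits per weight.
    vecs = w.reshape(-1, group)
    k = 2 ** (bits_per_weight * group)       # 64 codewords for 3 bits/weight
    rng = np.random.default_rng(0)
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):                   # plain k-means refinement
        assign = ((vecs[:, None] - codebook[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            m = assign == j
            if m.any():
                codebook[j] = vecs[m].mean(0)
    assign = ((vecs[:, None] - codebook[None]) ** 2).sum(-1).argmin(1)
    return codebook[assign].reshape(w.shape)

w = np.random.randn(256, 64).astype(np.float32)
print("4-bit scalar MSE:", float(np.mean((w - uniform_scalar_quantize(w)) ** 2)))
print("~3-bit vector MSE:", float(np.mean((w - vector_quantize(w)) ** 2)))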

Qualitative Examples: SDXL and SDXL-Turbo

Human Evaluation


Left: VQDM vs. baselines. Right: quantized vs. full-precision models.



Automatic Metrics


Evaluation of quantized SDXL models at different bit-widths in terms of automatic metrics.
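For readers reproducing this kind of comparison, the snippet below shows one common way to compute automatic metrics such as FID and CLIP score with the torchmetrics library. It is a generic sketch, not the evaluation code behind the figure; the random tensors stand in for real and generated image batches, and the prompt list is a placeholder.

import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

# Placeholder batches: uint8 images of shape (N, 3, H, W); swap in real
# reference images and images generated by the quantized model.
real_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
prompts = ["a photo of a cat"] * 16          # prompts used for generation

fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)           # reference distribution
fid.update(fake_images, real=False)          # generated distribution

clip = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
clip.update(fake_images, prompts)            # text-image alignment

print("FID:", fid.compute().item())
print("CLIP score:", clip.compute().item())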

BibTeX

@article{egiazarian2024vqdm,
  title={Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization},
  author={Vage Egiazarian and Denis Kuznedelev and Anton Voronov and Ruslan Svirschevski and Michael Goin and Daniil Pavlov and Dan Alistarh and Dmitry Baranchuk},
  journal={arXiv preprint arXiv:2409.00492},
  year={2024},
  url={https://arxiv.org/abs/2409.00492}
}