Abstract
Large-scale diffusion models have achieved unprecedented results in (conditional) image synthesis; however, they generally require a large amount of GPU memory and are slow at inference time. To overcome these limitations, we propose to distill the knowledge of pre-trained (teacher) diffusion models into smaller student diffusion models via an approximate score matching objective. For classifier-free guided generation on CIFAR-10, our student model achieves an FID-5K of 8.03 using 273 GFLOPs. In comparison, the larger teacher model only achieves an FID-5K of 294 using 424 GFLOPs. We present initial experiments on distilling the knowledge of Stable Diffusion, a large-scale text-to-image diffusion model, and discuss several promising future directions.
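To illustrate the general idea of score-matching distillation described above, here is a minimal sketch (not the authors' code) of one training step in which a smaller student network is regressed onto a frozen teacher's noise prediction at a randomly sampled timestep. The setup is assumed: PyTorch, an epsilon-prediction parameterization, and hypothetical `student`, `teacher`, and `alphas_cumprod` objects.

```python
import torch

def distillation_loss(student, teacher, x0, alphas_cumprod):
    """Approximate score-matching distillation step (illustrative sketch).

    The student is trained to match the frozen teacher's predicted noise
    (equivalently, its score estimate) on a noised version of clean data x0.
    """
    b = x0.shape[0]
    # Sample a random diffusion timestep per example.
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    # Forward diffusion: corrupt x0 with Gaussian noise at level t.
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

    # Teacher provides the regression target; its weights stay frozen.
    with torch.no_grad():
        eps_teacher = teacher(x_t, t)

    eps_student = student(x_t, t)
    return torch.mean((eps_student - eps_teacher) ** 2)
```

In practice one would backpropagate this loss only through the student's parameters; guidance-conditioned variants (e.g. feeding the classifier-free guidance scale to both networks) follow the same pattern.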
Publication
CVPR Workshop on Generative Models for Computer Vision