In recent years, denoising diffusion models have taken center stage in generative AI. From creating stunningly realistic images to powering advanced text-to-image systems like GLIDE and DALL·E 2, these models turn raw noise into coherent, meaningful outputs. But what makes them tick? The answer lies in mathematics, specifically the stochastic processes and probability theory that govern their operation. In this post, we’ll break down the math behind denoising diffusion models in a way that’s accessible whether you’re an AI enthusiast or a budding machine learning engineer.
Understanding the Problem: Noise and Clarity
The primary goal of denoising diffusion models is to learn to reverse a process that adds random noise to data (such as images) step by step, so that, starting from pure noise, the model can produce samples that look like they came from the original data distribution. This is often formalized as a “generative process.” The idea is inspired by non-equilibrium thermodynamics, where systems gradually evolve from disorder (noise) to order (clarity).
Step 1: Adding Noise – The Forward Process
The forward process (also known as the diffusion process) introduces noise to the data in small increments over several steps. Mathematically, given an original data sample x0 (e.g., a clean image), the forward process creates a series of increasingly noisy samples x1, x2, …, xT, where T is the total number of noise steps. At each step, Gaussian noise is added:
x_t = \sqrt{1 - \beta_t} \cdot x_{t-1} + \sqrt{\beta_t} \cdot \epsilon
Where:
- βt is a time-dependent noise schedule, usually increasing as t grows.
- ε is drawn from a standard normal distribution (mean 0, variance 1).
This process can be thought of as a Markov chain, where every state only depends on the one before it. More on Markov chains can be found in this detailed overview by Wolfram.
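A useful consequence of each step being Gaussian is that the steps compose into a single Gaussian jump from x_0 directly to x_t:

x_t = \sqrt{\bar{\alpha}_t} \cdot x_0 + \sqrt{1 - \bar{\alpha}_t} \cdot \epsilon, \quad \text{where } \alpha_t = 1 - \beta_t \text{ and } \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

Here is a minimal sketch of the forward process in PyTorch. The linear β schedule and the tensor shapes are illustrative assumptions, not something the math above prescribes:

```python
import torch

# Illustrative linear noise schedule (cosine schedules are also common).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # beta_1 ... beta_T
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # alpha-bar_t = product of alphas up to t

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Noise x0 directly to step t using the closed-form Gaussian jump."""
    eps = torch.randn_like(x0)             # standard normal noise
    a_bar = alpha_bars[t]                  # 0-indexed: t = 0 is the first noise step
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

# Example: a fake 3x64x64 "image" noised most of the way looks like static.
x0 = torch.rand(3, 64, 64)
x900 = q_sample(x0, 900)
```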
Step 2: The Goal – The Reverse Process
The magic of denoising diffusion models lies in learning how to reverse the noise addition process. Here, we train a neural network to gradually “denoise” the sample, step by step, until we restore the original data—or generate a new variation that’s indistinguishable from reality:
p_\theta(x_{t-1} \mid x_t)
The model estimates this conditional distribution, usually as a Gaussian whose mean is computed from the network’s output. What’s really happening is that the network learns to predict the noise that was added at each step so it can be subtracted back out, moving through a sequence that transforms pure noise into clarity.
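To make this concrete, here is a sketch of a single DDPM-style reverse step, reusing the betas/alphas/alpha_bars schedule from the forward-process sketch above. The eps_model(x_t, t) network is a hypothetical stand-in for whatever noise-prediction architecture (typically a U-Net) you would train:

```python
@torch.no_grad()
def p_sample(eps_model, x_t: torch.Tensor, t: int) -> torch.Tensor:
    """One reverse step x_t -> x_{t-1}: predict the noise, remove its
    contribution from the mean, then add fresh noise (except at t == 0)."""
    beta_t, alpha_t, a_bar_t = betas[t], alphas[t], alpha_bars[t]

    eps_hat = eps_model(x_t, t)  # network's estimate of the injected noise
    # Posterior mean under the standard epsilon-prediction parameterization:
    mean = (x_t - beta_t / (1.0 - a_bar_t).sqrt() * eps_hat) / alpha_t.sqrt()
    if t == 0:
        return mean              # the final step is taken deterministically
    z = torch.randn_like(x_t)
    return mean + beta_t.sqrt() * z  # sigma_t^2 = beta_t is a common choice
```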
The Key Mathematical Insights
Denoising diffusion models rely on fundamental ideas from probability and variational inference:
- Variational Lower Bound (VLB): The model is trained to maximize the likelihood of the data under a complicated, intractable distribution. To get around this, it optimizes a surrogate known as the Evidence Lower Bound (ELBO), which for diffusion models reduces in practice to a simple noise-prediction loss (see the sketch after this list).
- Kullback-Leibler Divergence: Each reverse step is trained to minimize the KL divergence between the model’s denoising distribution and the true posterior of the forward process, which becomes tractable when conditioned on the original data x0.
- Stochastic Differential Equations (SDEs): In the limit of many small steps, the discrete chain approaches a continuous-time stochastic process described by SDEs, as explained in research from Carnegie Mellon University.
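Here is a sketch of what the ELBO’s practical form looks like as a training objective, continuing with the schedule defined in the forward-process sketch (the single-sample timestep and the eps_model name are again illustrative assumptions):

```python
import torch.nn.functional as F

def ddpm_loss(eps_model, x0: torch.Tensor) -> torch.Tensor:
    """The 'simple' DDPM objective: pick a random step t, noise x0 to x_t,
    and regress the network's noise prediction onto the true noise."""
    t = torch.randint(0, T, (1,)).item()   # uniform random timestep
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return F.mse_loss(eps_model(x_t, t), eps)
```

Minimizing this mean-squared error is, up to per-step weighting terms, equivalent to optimizing the ELBO above; the original DDPM paper found that the unweighted version works well in practice.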
Step-by-Step Example: Diffusing and Denoising an Image
- Start with an Image: Take a sample image x0.
- Diffusion (Add Noise): Gradually add noise over 1,000 steps using the formula above, ending with xT that looks like pure static.
- Denoising (Learned by the Model): Starting from x_T, use the trained neural network to predict and remove the noise at each step, producing x_{T-1}, then x_{T-2}, and so on, until returning to a clean image x_0.
- Sampling New Images: Instead of starting with a real image, begin at random noise and use the learned denoising process to create novel, realistic examples, as in the sampling loop sketched below. This is how models like Google’s Imagen generate new images.
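Putting the pieces together, a full sampling loop just applies the reverse step from earlier T times, starting from pure Gaussian noise (again a sketch under the same assumed schedule and hypothetical eps_model):

```python
@torch.no_grad()
def sample(eps_model, shape=(3, 64, 64)) -> torch.Tensor:
    """Generate a new image: start from pure Gaussian noise and apply the
    learned denoising step for t = T-1 down to t = 0."""
    x = torch.randn(shape)             # x_T: pure static
    for t in reversed(range(T)):
        x = p_sample(eps_model, x, t)  # one learned denoising step
    return x                           # x_0: a novel sample
```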
Applications and Further Reading
Denoising diffusion models have shown remarkable results in multiple domains, most visibly in image synthesis and the text-to-image systems mentioned above, such as GLIDE, DALL·E 2, and Imagen.
To dive deeper into the math and mechanisms, check out these resources:
- Lil’Log: Diffusion Models – A Comprehensive Guide
- Denoising Diffusion Probabilistic Models (Original DDPM Paper)
- UVA’s DDPM Tutorial Notebook
Conclusion
Denoising diffusion models represent a mathematically elegant approach to generative modeling, leveraging the power of stochastic processes and neural networks. By understanding the underlying math, we gain insight into why these models are so powerful—and how they’re able to generate clear, realistic data from pure chaos. The denoising diffusion revolution is just beginning, and its impact on AI is sure to grow in the coming years.