Generative Modeling Overview: A Probabilistic Perspective
Goals and Traits of Generative Models
- Density Estimation: pointwise evaluation of \(\pt(\vx)\) (a toy sketch of these operations follows this list).
- Data Generation: generating new samples \(\rvx\sim\pt(\vx)\).
- Conditional Generation: \(\rvx\sim\pt(\vx|\vc)\) for some condition \(\vc\).
- Imputation: \(\rvx_m\sim\pt(\vx_m|\vx_o)\), where \(\vx_o\) is the observed part of the data and \(\vx_m\) the missing part.
- Training Target: the gradient of the loss \(\nabla_\theta L(\theta)\) should be tractable.
- Corresponding Metrics: e.g., the (forward) KL divergence \(\KL(\pdata||\pt)\), whose minimization over \(\theta\) is equivalent to maximum-likelihood training.
- Latents: whether latent variables \(\vz\) are introduced, potentially enabling latent interpolation and arithmetic.
- Architecture: whether there are architectural restrictions on the neural network when modeling \(\pt(\vx)\).
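As a minimal sketch of the goals above, consider a toy Gaussian mixture standing in for a learned \(\pt(\vx)\); the class and method names here are illustrative, not a standard API. Conditional generation and imputation would condition the same two operations on \(\vc\) or \(\vx_o\).

```python
import numpy as np
from scipy.stats import multivariate_normal

class ToyMixture:
    """Toy p_theta(x): a Gaussian mixture standing in for a deep model."""
    def __init__(self, weights, means, covs):
        self.weights = np.asarray(weights)
        self.comps = [multivariate_normal(m, c) for m, c in zip(means, covs)]

    def log_prob(self, x):
        # Density estimation: pointwise evaluation of log p_theta(x).
        dens = np.stack([c.pdf(x) for c in self.comps], axis=-1)
        return np.log(dens @ self.weights)

    def sample(self, n, rng=None):
        # Data generation: x ~ p_theta(x) by ancestral sampling:
        # first draw a mixture component, then draw from that component.
        rng = rng or np.random.default_rng()
        ks = rng.choice(len(self.comps), size=n, p=self.weights)
        return np.stack([self.comps[k].rvs(random_state=rng) for k in ks])

model = ToyMixture(weights=[0.5, 0.5], means=[[-2, 0], [2, 0]],
                   covs=[np.eye(2), np.eye(2)])
xs = model.sample(5)
print(model.log_prob(xs))  # one log-density per sample
```

For this toy model both operations are cheap; the notes below explain why that stops being true for deep models.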
Note
- Supporting fast pointwise evaluation of \(\pt(\vx)\) does not necessarily allow fast sampling \(\rvx\sim\pt(\vx)\), and vice versa (the sketch after this list makes this concrete for an energy-based model).
- A tractable training objective \(L(\theta)\) does not necessarily allow fast pointwise evaluation of \(\pt(\vx)\) or fast sampling \(\rvx\sim\pt(\vx)\).
- Likelihood (the MLE objective) often correlates poorly with the perceptual quality of samples (images, sound, etc.).[^1]
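To make the first note concrete, here is a small one-dimensional sketch (the energy function and step sizes are illustrative): for an energy-based density \(p(x)\propto e^{-E(x)}\), one pointwise evaluation of the unnormalized density is a single function call, yet drawing even one sample requires an iterative procedure such as Langevin dynamics.

```python
import numpy as np

def energy(x):
    # E(x) for a double-well density p(x) ∝ exp(-E(x)).
    # One (unnormalized) density evaluation is a single call.
    return (x**2 - 1.0)**2

def grad_energy(x):
    # dE/dx = 4x(x^2 - 1)
    return 4.0 * x * (x**2 - 1.0)

def langevin_sample(n_steps=1000, step=1e-2, rng=None):
    # One sample costs many gradient steps plus injected noise:
    # x_{t+1} = x_t - step * dE/dx(x_t) + sqrt(2 * step) * eps_t
    rng = rng or np.random.default_rng()
    x = rng.standard_normal()
    for _ in range(n_steps):
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * rng.standard_normal()
    return x

# Samples concentrate near the two modes x = ±1.
print([round(langevin_sample(rng=np.random.default_rng(s)), 2) for s in range(4)])
```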
Discussion
How can we parameterize \(\pt(\vx)\) for high-dimensional inputs while still being able to sample from it?
Considerations (a concrete autoregressive example follows this list):
- Can we perform pointwise evaluation of \(\pt(\vx)\)?
- (Required) Can we sample from \(\pt(\vx)\)?
- (Required) Can we efficiently compute the gradient of the loss?
- Should we introduce a latent variable \(\vz\)?
- How can we design the model architecture to satisfy the restrictions?
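One concrete answer to the discussion question is the autoregressive factorization \(\pt(\vx)=\prod_d \pt(x_d|\vx_{<d})\). The sketch below uses Bernoulli conditionals over binary \(\vx\); `logits_fn` is a hypothetical network mapping a prefix \(\vx_{<d}\) to the logit of \(x_d\). Pointwise evaluation is exact (so the MLE loss and its gradient are tractable), but sampling needs one network call per dimension.

```python
import numpy as np

def log_prob(x, logits_fn):
    # Exact density via the chain rule: log p(x) = sum_d log p(x_d | x_{<d}).
    lp = 0.0
    for d in range(len(x)):
        p = 1.0 / (1.0 + np.exp(-logits_fn(x[:d])))  # P(x_d = 1 | x_{<d})
        lp += np.log(p if x[d] == 1 else 1.0 - p)
    return lp

def sample(dim, logits_fn, rng=None):
    # Sampling is inherently sequential: x_d depends on all earlier dims.
    rng = rng or np.random.default_rng()
    x = np.zeros(dim, dtype=int)
    for d in range(dim):
        p = 1.0 / (1.0 + np.exp(-logits_fn(x[:d])))
        x[d] = int(rng.random() < p)
    return x

# Tiny stand-in for a learned network: the logit depends on the prefix sum.
demo_logits = lambda prefix: 0.5 - float(np.sum(prefix))
x = sample(8, demo_logits)
print(x, log_prob(x, demo_logits))
```

This trade-off (exact, fast density vs. slow sequential sampling) is exactly the ARM row of the table below.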
Taxonomy of Generative Models
Overview of different types of generative models; see Fig. 1 of What are Diffusion Models? by Lilian Weng and Fig. 20.1 of Probabilistic Machine Learning: Advanced Topics.
Characteristics of common kinds of generative models (VAE: variational autoencoder; ARM: autoregressive model; EBM: energy-based model; DM: diffusion model; GAN: generative adversarial network), modified from Table 20.1 of Probabilistic Machine Learning: Advanced Topics.
| Model | Density | Sampling | Training | Latents | Architecture |
|---|---|---|---|---|---|
| VAE | LB, fast | Fast | MLE-LB | \(\R^L\) | Encoder-Decoder |
| ARM | Exact, fast | Slow | MLE | None | Sequential |
| Flows | Exact, slow/fast | Slow | MLE | \(\R^D\) | Invertible |
| EBM | Approx, slow | Slow | MLE-Approx | Optional | Discriminative |
| DM | LB | Slow | MLE-LB | \(\R^D\) | Encoder-Decoder |
| GAN | N/A | Fast | Min-max | \(\R^L\) | Generator-Discriminator |
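For reference, the "LB" entries denote training by maximizing a variational lower bound (the ELBO) on the log-likelihood rather than the likelihood itself. In the standard VAE form, with an encoder \(q_\phi(\vz|\vx)\) and prior \(p(\vz)\) (symbols not defined elsewhere in this section):

\[
\log \pt(\vx) \;\ge\; \mathbb{E}_{q_\phi(\vz|\vx)}\big[\log \pt(\vx|\vz)\big] - \KL(q_\phi(\vz|\vx)||p(\vz)).
\]

Diffusion models are trained with an analogous bound over a fixed hierarchy of latents of the same dimension as \(\vx\).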
Community Resources
- Probabilistic Machine Learning: Advanced Topics (a.k.a. probml-book2)
- Chapter 20: Generative models: an overview
[^1]: See Section 20.4.1.3, "Likelihood can be hard to compute", of Probabilistic Machine Learning: Advanced Topics.