Johnson's Summary

Maximum Likelihood Estimation (MLE)

Derivation

Let $X = \{x^{(1)}, \dots, x^{(m)}\}$ be a set of $m$ examples drawn independently from the true (but unknown) data distribution $p_{\text{data}}(x)$, and let $\hat{p}_{\text{data}}$ be the empirical distribution of the training set $X$.

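$$
\begin{aligned}
θ_{ML} &= \underset{θ}{\arg\max}\; p_{θ}(X) \\
&= \underset{θ}{\arg\max}\; p_{θ}(x^{(1)}, \dots, x^{(m)}) \\
&= \underset{θ}{\arg\max}\; \prod_{i = 1}^{m} p_{θ}(x^{(i)}) \\
&= \underset{θ}{\arg\max}\; \log \prod_{i = 1}^{m} p_{θ}(x^{(i)}) \\
&= \underset{θ}{\arg\max}\; \sum_{i = 1}^{m} \log p_{θ}(x^{(i)}) \\
&= \underset{θ}{\arg\max}\; \frac{1}{m} \sum_{i = 1}^{m} \log p_{θ}(x^{(i)}) \\
&= \underset{θ}{\arg\max}\; E_{x \sim \hat{p}_{\text{data}}(x)} [\log p_{θ}(x)] \\
&= \underset{θ}{\arg\min}\; E_{x \sim \hat{p}_{\text{data}}(x)} [- \log p_{θ}(x)]
\end{aligned}
$$

The joint probability factorizes into a product because the examples are independent. Taking the logarithm does not change the maximizer because $\log$ is strictly increasing, and neither does rescaling by $\frac{1}{m}$. The rescaled sum is an expectation under $\hat{p}_{\text{data}}$, which places probability mass $\frac{1}{m}$ on each training example.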
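The equivalence of the objectives in the derivation above can be checked numerically. The following is a minimal sketch, assuming a Bernoulli model $p_{θ}(x) = θ^{x} (1 - θ)^{1 - x}$ and a small made-up dataset of coin flips; the model, the data, and the grid search are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Hypothetical toy data: m = 10 coin flips, assumed i.i.d. Bernoulli(theta).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Candidate parameter values on a grid (endpoints excluded to avoid log(0)).
thetas = np.linspace(0.01, 0.99, 99)

# p_theta(x^(i)) for every (theta, example) pair; shape (99, 10) by broadcasting.
p = np.where(x == 1, thetas[:, None], 1.0 - thetas[:, None])

# The three objectives from the derivation, evaluated for every theta.
likelihood = p.prod(axis=1)            # prod_i p_theta(x^(i))
mean_log_lik = np.log(p).mean(axis=1)  # E_{x ~ p_hat_data}[log p_theta(x)]
nll = -mean_log_lik                    # E_{x ~ p_hat_data}[-log p_theta(x)]

theta_from_product = thetas[np.argmax(likelihood)]
theta_from_mean_log = thetas[np.argmax(mean_log_lik)]
theta_from_nll = thetas[np.argmin(nll)]

print(theta_from_product, theta_from_mean_log, theta_from_nll, x.mean())
```

All three searches select the same grid point, which for this data coincides with the sample mean of $x$, the known closed-form maximum likelihood estimate for a Bernoulli parameter.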

Optimization

The training objective $θ_{ML} = \underset{θ}{\arg\max}\; E_{x \sim \hat{p}_{\text{data}}(x)} [\log p_{θ}(x)]$

is equivalent to minimizing the Negative Log-Likelihood (NLL) loss function: $L(θ) = E_{x \sim \hat{p}_{\text{data}}(x)} [- \log p_{θ}(x)]$

which can be minimized with gradient-based methods using its gradient: $\nabla_{θ} L(θ) = E_{x \sim \hat{p}_{\text{data}}(x)} [- \nabla_{θ} \log p_{θ}(x)]$
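As an illustration of minimizing the NLL with its gradient, here is a minimal sketch of plain gradient descent. It assumes a Gaussian model with unknown mean $θ$ and unit variance, so that $-\log p_{θ}(x) = \frac{1}{2}(x - θ)^2 + \frac{1}{2}\log 2\pi$. The model, the synthetic data, the learning rate, and the helper names `nll` and `grad_nll` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: m = 1000 samples from N(2.0, 1.0).
x = rng.normal(loc=2.0, scale=1.0, size=1000)


def nll(theta, x):
    """L(theta) = E_{x ~ p_hat_data}[-log p_theta(x)] for p_theta = N(theta, 1)."""
    return np.mean(0.5 * (x - theta) ** 2 + 0.5 * np.log(2.0 * np.pi))


def grad_nll(theta, x):
    """Gradient of L: E_{x ~ p_hat_data}[-d/dtheta log p_theta(x)] = E[-(x - theta)]."""
    return np.mean(-(x - theta))


theta = 0.0          # arbitrary initial guess
learning_rate = 0.1  # assumed step size
losses = []

for step in range(200):
    losses.append(nll(theta, x))
    theta -= learning_rate * grad_nll(theta, x)

print(theta, x.mean())
```

For this model the minimizer is known in closed form (the sample mean), so the final `theta` can be compared against `x.mean()` as a sanity check. In practice the expectation over the full training set is often replaced by an average over a minibatch, which gives a stochastic estimate of the same gradient.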

Community Resources

Deep Learning
- Chapter 5.5 Maximum Likelihood Estimation
