Maximum Likelihood Estimation (MLE)

Derivation

$X = \{x^{(1)}, \dots, x^{(m)}\}$ is a set of $m$ examples drawn independently from the true (but unknown) data distribution $p_{\text{data}}(x)$. $\hat{p}_{\text{data}}$ is the empirical distribution of the training set $X$.
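For reference, the empirical distribution is commonly written as a uniform mixture of point masses on the training examples. The Dirac delta notation $\delta$ below is an assumption of this sketch (for discrete data, read it as an indicator function); the page itself does not spell out the definition.

$$
\hat{p}_{\text{data}}(x) = \frac{1}{m} \sum_{i=1}^{m} \delta\big(x - x^{(i)}\big)
\quad\Longrightarrow\quad
\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}\big[f(x)\big] = \frac{1}{m} \sum_{i=1}^{m} f\big(x^{(i)}\big)
$$

Under this definition, an expectation over the empirical distribution is simply the average over the training set.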

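$$
\begin{aligned}
\theta_{\text{ML}} &= \arg\max_\theta \, p_\theta(X) \\
&= \arg\max_\theta \, p_\theta(x^{(1)}, \dots, x^{(m)}) \\
&= \arg\max_\theta \, \prod_{i=1}^{m} p_\theta(x^{(i)}) \\
&= \arg\max_\theta \, \log \prod_{i=1}^{m} p_\theta(x^{(i)}) \\
&= \arg\max_\theta \, \sum_{i=1}^{m} \log p_\theta(x^{(i)}) \\
&= \arg\max_\theta \, \mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}[\log p_\theta(x)] \\
&= \arg\min_\theta \, -\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}[\log p_\theta(x)]
\end{aligned}
$$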
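The chain of equalities above can be checked numerically on a toy problem. The following is a minimal sketch, assuming a Bernoulli model $p_\theta(x) = \theta^x (1-\theta)^{1-x}$ and a made-up data set of ten coin flips; the data, the grid search, and all function names are illustrative choices, not part of the derivation itself. It confirms that maximizing the product of likelihoods, maximizing the sum of log-likelihoods, and minimizing the negative mean log-likelihood all select the same parameter.

```python
import math

# Hypothetical toy data set: m = 10 coin flips (1 = heads, 0 = tails).
X = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]


def p(theta, x):
    """Bernoulli pmf p_theta(x) for x in {0, 1} (assumed model)."""
    return theta if x == 1 else 1.0 - theta


def likelihood(theta):
    """Product of per-example likelihoods (examples are independent)."""
    return math.prod(p(theta, x) for x in X)


def sum_log_likelihood(theta):
    """Sum of per-example log-likelihoods."""
    return sum(math.log(p(theta, x)) for x in X)


def nll(theta):
    """Negative mean log-likelihood, i.e. minus the expectation
    under the empirical distribution of X."""
    return -sum_log_likelihood(theta) / len(X)


# Candidate parameters strictly inside (0, 1), so the logarithm is defined.
grid = [k / 100 for k in range(1, 100)]

theta_prod = max(grid, key=likelihood)
theta_log = max(grid, key=sum_log_likelihood)
theta_nll = min(grid, key=nll)

print(theta_prod, theta_log, theta_nll)
```

All three criteria pick the same grid point, which for this model coincides with the fraction of heads in `X`. In practice the log form is preferred because the raw product underflows quickly as $m$ grows.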

Optimization

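The training objective

$$\theta_{\text{ML}} = \arg\max_\theta \, \mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}[\log p_\theta(x)]$$

corresponds to minimizing the Negative Log-Likelihood (NLL) loss function

$$L(\theta) = -\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}[\log p_\theta(x)]$$

which can be optimized based on its gradients:

$$\nabla_\theta L(\theta) = -\mathbb{E}_{x \sim \hat{p}_{\text{data}}(x)}[\nabla_\theta \log p_\theta(x)]$$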
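To make the gradient-based optimization concrete, here is a minimal sketch of plain gradient descent on the NLL. It assumes a one-parameter Gaussian model with unknown mean $\theta$ and fixed unit variance, for which $\nabla_\theta \log p_\theta(x) = x - \theta$. The data, the learning rate, the step count, and the function names are all hypothetical choices made for illustration.

```python
import math

# Hypothetical training set; any list of real numbers would do.
X = [2.1, 1.9, 2.4, 1.6, 2.0]


def log_p(theta, x):
    """Log-density of N(theta, 1) at x (assumed model)."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi)


def grad_log_p(theta, x):
    """Derivative of log_p with respect to theta."""
    return x - theta


def nll(theta):
    """L(theta): negative mean log-likelihood over the training set."""
    return -sum(log_p(theta, x) for x in X) / len(X)


def grad_nll(theta):
    """Gradient of L(theta): negative mean of the per-example gradients."""
    return -sum(grad_log_p(theta, x) for x in X) / len(X)


def fit(theta=0.0, lr=0.1, steps=200):
    """Plain gradient descent on the NLL, starting from theta."""
    for _ in range(steps):
        theta -= lr * grad_nll(theta)
    return theta


theta_hat = fit()
sample_mean = sum(X) / len(X)
print(theta_hat, sample_mean)
```

For this particular model the maximum likelihood estimate has a closed form, the sample mean, so the iterate can be compared against it directly. For models without a closed form, the same loop applies with `grad_nll` typically computed by automatic differentiation over mini-batches.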
