Glossary Entry

Maximum Likelihood Estimation

Fitting a model by choosing the parameters under which the observed data would have been most probable.

Optimization Statistics

Also called: maximum likelihood, MLE, log-likelihood

Seed source: Stanford CS229 lecture notes

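Maximum likelihood estimation treats the model as a probability statement about the data and asks which parameter values make the observed outcomes least surprising. In practice you maximize the log of the likelihood rather than the likelihood itself: the logarithm is monotonic, so the maximizer is unchanged, and it turns the product over independent observations into a sum that is easier to differentiate and far less prone to numerical underflow.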
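As a minimal sketch of the idea, assume a coin-flip (Bernoulli) model with a single parameter p. The data, the grid search, and the helper name `bernoulli_log_likelihood` are illustrative assumptions, not taken from the seed source. The sketch shows the log-likelihood as a sum over observations, its maximizer landing on the sample mean, and the raw product underflowing where the sum does not.

```python
import math

def bernoulli_log_likelihood(p, outcomes):
    """Sum of log-probabilities of independent 0/1 outcomes when P(1) = p."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

# Hypothetical data: 7 successes in 10 independent trials.
outcomes = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# Crude grid search over candidate values of p strictly inside (0, 1).
# A real fit would use calculus or an optimizer; the grid keeps the sketch short.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, outcomes))

# For this model the maximizer is known in closed form: the sample mean.
sample_mean = sum(outcomes) / len(outcomes)

# Why the log matters numerically: with 2000 observations the raw likelihood
# (a product of 2000 factors of 0.5) underflows, while the log-likelihood
# (a sum of 2000 terms) stays finite.
many = [1, 0] * 1000
raw_likelihood = math.prod(0.5 for _ in many)
log_likelihood = bernoulli_log_likelihood(0.5, many)

print(p_hat, sample_mean)
print(raw_likelihood, log_likelihood)
```

The grid search stands in for whatever optimizer a real model would need; for the Bernoulli case the closed-form answer is available, which makes it a convenient check.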

Many familiar loss functions are maximum likelihood in disguise: minimizing squared error is MLE under Gaussian noise, and minimizing binary cross-entropy is MLE under a Bernoulli outcome. Recognizing the disguise explains where losses come from and when they are the wrong choice. Squared error on heavy-tailed data, for example, inherits the Gaussian assumption that large deviations are extremely rare, so a few outliers can dominate the fit.
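The squared-error case can be checked numerically. The sketch below assumes a one-parameter location model with fixed noise scale, and all data and function names are hypothetical. It compares the minimizer of the Gaussian negative log-likelihood with the minimizer of the sum of squared errors, then adds one outlier to show how the Gaussian fit (the mean) moves. The median is used for contrast: it is the maximum likelihood location estimate under Laplace noise, a heavier-tailed assumption chosen here purely for illustration.

```python
import math
import statistics

def gaussian_nll(mu, ys, sigma=1.0):
    """Negative log-likelihood of ys under independent N(mu, sigma^2) noise."""
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const + (y - mu) ** 2 / (2 * sigma ** 2) for y in ys)

def sum_squared_error(mu, ys):
    """Ordinary least-squares loss for a constant prediction mu."""
    return sum((y - mu) ** 2 for y in ys)

# Hypothetical, well-behaved measurements centered near 1.0.
ys = [1.2, 0.8, 1.1, 0.9, 1.0]

# Both objectives differ only by a constant and a positive scale factor,
# so they share the same minimizer over any set of candidates.
grid = [i / 100 for i in range(0, 201)]
mu_mle = min(grid, key=lambda m: gaussian_nll(m, ys))
mu_ls = min(grid, key=lambda m: sum_squared_error(m, ys))

# Add a single outlier to mimic heavy-tailed data.
ys_outlier = ys + [50.0]
gaussian_fit = statistics.mean(ys_outlier)    # minimizes squared error
laplace_fit = statistics.median(ys_outlier)   # minimizes absolute error

print(mu_mle, mu_ls)
print(gaussian_fit, laplace_fit)
```

With the outlier included, the mean is pulled far from the bulk of the data while the median barely moves, which is the practical sense in which squared error is the wrong loss when the Gaussian noise assumption fails.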