Maximum likelihood estimation treats the model as a probability statement about the data and asks which parameter values make the observed outcomes least surprising. In practice you maximize the log of the likelihood: the logarithm is monotonic, so the maximizer is unchanged, and it turns products over independent observations into sums, which are easier to differentiate and less prone to numerical underflow.
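To make the product-to-sum step concrete, here is a minimal sketch for coin-flip data. The data, the grid search, and the function name `bernoulli_log_likelihood` are illustrative assumptions, not part of any particular library; a real fit would normally use a closed form or a numerical optimizer rather than a grid.

```python
import math

def bernoulli_log_likelihood(p, outcomes):
    """Sum of per-observation log-probabilities for i.i.d. 0/1 outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

# Hypothetical data: 7 successes in 10 independent trials.
outcomes = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Coarse grid search over p in (0, 1); the endpoints are excluded because
# log(0) is undefined.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, outcomes))

# The likelihood is a product over observations; its log is the sum above.
likelihood = math.prod(p_hat if y == 1 else 1.0 - p_hat for y in outcomes)
log_likelihood = bernoulli_log_likelihood(p_hat, outcomes)

print(p_hat)                                # the maximizing grid point
print(math.log(likelihood), log_likelihood)  # the two agree up to rounding
```

On this data the maximizer lands on the sample proportion of successes, which is what the closed-form Bernoulli MLE predicts. With thousands of observations the raw product would underflow to zero, while the sum of logs stays well behaved.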
Many familiar loss functions are maximum likelihood in disguise: minimizing squared error is MLE under Gaussian noise with fixed variance, and minimizing binary cross-entropy is MLE under a Bernoulli outcome. Recognizing the disguise explains where losses come from and when they are the wrong choice, such as squared error on heavy-tailed data, where the Gaussian assumption fails and a few extreme observations can dominate the fit.
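The two equivalences can be checked numerically. The sketch below is a toy illustration under stated assumptions: the data are made up, the noise scale `sigma` is held fixed, the model is a single constant parameter, and all function names are hypothetical.

```python
import math

def gaussian_nll(mu, ys, sigma=1.0):
    """Negative log-likelihood of i.i.d. ys under N(mu, sigma^2)."""
    n = len(ys)
    const = 0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
    return const + sum((y - mu) ** 2 for y in ys) / (2.0 * sigma ** 2)

def sum_squared_error(mu, ys):
    return sum((y - mu) ** 2 for y in ys)

def bernoulli_nll(p, labels):
    """Negative log-likelihood of i.i.d. 0/1 labels with success probability p."""
    return -sum(math.log(p if y == 1 else 1.0 - p) for y in labels)

def binary_cross_entropy(p, labels):
    """Mean binary cross-entropy for a constant predicted probability p."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y in labels)
    return -total / len(labels)

# Hypothetical regression targets, fit with a single constant mu.
ys = [2.1, 1.9, 2.4, 2.0, 1.6]
grid = [i / 100 for i in range(0, 401)]
mu_nll = min(grid, key=lambda m: gaussian_nll(m, ys))
mu_sse = min(grid, key=lambda m: sum_squared_error(m, ys))

# Hypothetical binary labels and an arbitrary predicted probability.
labels = [1, 0, 1, 1, 0]
p = 0.6
nll = bernoulli_nll(p, labels)
bce = binary_cross_entropy(p, labels)

print(mu_nll, mu_sse)          # same minimizer for both objectives
print(nll, bce * len(labels))  # equal up to rounding
```

With `sigma` fixed, the Gaussian negative log-likelihood is the squared error scaled by a positive constant plus a term that does not depend on `mu`, so both objectives pick the same grid point. Binary cross-entropy is the Bernoulli negative log-likelihood divided by the number of observations, so the two also share a minimizer.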
