The score of a distribution at a point is the gradient of its log-density with respect to that point, that is, the direction in which moving the point most rapidly makes it more plausible under the distribution. A network that knows the score everywhere can generate samples via Langevin dynamics: starting from an arbitrary point, repeatedly take a small step uphill along the score while injecting a little Gaussian noise.
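The Langevin update can be sketched on a toy problem where the score is known in closed form, standing in for a trained network. This is a minimal illustration, not a reference implementation: the 1-D Gaussian target, the function names `gaussian_score` and `langevin_sample`, and the step size and iteration count are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_score(x, mean=2.0, std=0.5):
    """Score of N(mean, std^2): d/dx log p(x) = -(x - mean) / std^2.

    Hypothetical stand-in for a learned score network.
    """
    return -(x - mean) / std**2


def langevin_sample(score_fn, n_samples=5000, n_steps=1000, step_size=5e-3, seed=0):
    """Unadjusted Langevin dynamics: x <- x + eps * score(x) + sqrt(2 * eps) * z."""
    rng = np.random.default_rng(seed)
    # Arbitrary initialization, deliberately far from the target distribution.
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        z = rng.standard_normal(n_samples)
        # Step uphill along the score, then inject Gaussian noise.
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * z
    return x


samples = langevin_sample(gaussian_score)
print("sample mean:", samples.mean())
print("sample std: ", samples.std())
```

With a small step size the chains should settle near the target's mean and spread. The discretization leaves a small bias in the variance that shrinks as the step size does, and dropping the noise term would instead collapse every chain onto the mode.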
Scores sidestep the intractable normalizing constant of high-dimensional densities: the constant enters the log-density as an additive term that does not depend on the point, so it vanishes under the gradient. In diffusion models, predicting the added noise is equivalent to estimating the score of the noised data distribution up to a known rescaling (the score is the negative of the expected noise divided by the noise standard deviation), which is the bridge between the denoising and score-based views of the same model family.
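Both claims can be written out in a few lines. The notation here is an assumption introduced for illustration: an unnormalized density \tilde p with normalizer Z, and a Gaussian forward process x_t = \alpha_t x_0 + \sigma_t \epsilon with \epsilon drawn from a standard normal.

```latex
% Assumed notation (not from the text above): unnormalized density \tilde p,
% normalizer Z, forward process x_t = \alpha_t x_0 + \sigma_t \epsilon,
% with \epsilon \sim \mathcal{N}(0, I).
\begin{align*}
p(x) &= \frac{\tilde p(x)}{Z}
  \;\Longrightarrow\;
  \nabla_x \log p(x)
  = \nabla_x \log \tilde p(x) - \nabla_x \log Z
  = \nabla_x \log \tilde p(x), \\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\, \alpha_t x_0,\, \sigma_t^2 I\right)
  \;\Longrightarrow\;
  \nabla_{x_t} \log q(x_t \mid x_0)
  = -\frac{x_t - \alpha_t x_0}{\sigma_t^2}
  = -\frac{\epsilon}{\sigma_t}, \\
\nabla_{x_t} \log q_t(x_t)
  &= \mathbb{E}\!\left[\nabla_{x_t} \log q(x_t \mid x_0) \,\middle|\, x_t\right]
  = -\frac{\mathbb{E}[\epsilon \mid x_t]}{\sigma_t}.
\end{align*}
```

Under these assumptions, a noise predictor trained with squared error approximates the conditional mean of the noise given x_t, so dividing its output by -\sigma_t gives a score estimate at that noise level.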
