Gradient boosting builds an ensemble one weak learner at a time, usually shallow decision trees. At each round, the next tree is fitted to the gradient of the loss with respect to the current ensemble’s predictions, so it directly targets whatever the ensemble is still getting wrong. A learning rate shrinks each tree’s contribution, trading more rounds for better generalization.
Implementations like XGBoost, LightGBM, and CatBoost dominate on structured tabular data, where they remain the strongest general-purpose baseline even against deep learning approaches.
