Bayesian optimization is designed for functions that are expensive to evaluate and have no useful gradients, with hyperparameter tuning as the canonical example: each “evaluation” means training and scoring a whole model. It maintains a cheap probabilistic surrogate of the objective (a Gaussian process, or density estimates in TPE-style methods) and uses it to balance exploring uncertain regions against exploiting regions that already look good.
Compared to grid or random search, it typically finds better configurations in fewer trials, which matters when each trial costs minutes or hours. Libraries like Hyperopt and Optuna implement it via the Tree-structured Parzen Estimator (TPE).
