A Tutorial on Bayesian Optimization
Peter I. Frazier

July 10, 2018
Abstract
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of less than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
Introduction
Bayesian optimization (BayesOpt) is a class of machine-learning-based optimization methods focused on solving the problem
\[
\max_{x \in A} f(x), \tag{1}
\]
where the feasible set and objective function typically have the following properties:
• The input x is in $\mathbb{R}^d$ for a value of d that is not too large. Typically $d \le 20$ in most successful applications of BayesOpt.

• The feasible set A is a simple set, in which it is easy to assess membership. Typically A is a hyper-rectangle $\{x \in \mathbb{R}^d : a_i \le x_i \le b_i\}$ or the d-dimensional simplex $\{x \in \mathbb{R}^d : \sum_i x_i = 1\}$ (see the membership sketch after this list). Later (Section 5) we will relax this assumption.
• The objective function f is continuous. Continuity will typically be required to model f using Gaussian process regression.
• f is “expensive to evaluate” in the sense that the number of evaluations that may be performed is limited, typically to a few hundred. This limitation typically arises because each evaluation takes a substantial amount of time (typically hours), but may also occur because each evaluation bears a monetary cost (e.g., from purchasing cloud computing power, or buying laboratory materials) or an opportunity cost (e.g., if evaluating f requires asking questions of a human subject, who will tolerate only a limited number).
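To make the “simple set” requirement concrete, here is a minimal sketch of membership tests for the two examples above. The function names, bounds, and tolerance are illustrative assumptions, not from the original text; the nonnegativity check reflects the usual definition of the simplex.

```python
import numpy as np

def in_hyperrectangle(x, a, b):
    # Coordinate-wise bound check: a_i <= x_i <= b_i for all i.
    return bool(np.all((a <= x) & (x <= b)))

def in_simplex(x, tol=1e-9):
    # Nonnegative coordinates summing to one (up to numerical tolerance).
    return bool(np.all(x >= 0)) and abs(np.sum(x) - 1.0) <= tol

print(in_hyperrectangle(np.array([0.5, 0.2]), np.zeros(2), np.ones(2)))  # True
print(in_simplex(np.array([0.3, 0.7])))                                  # True
```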
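The properties above shape the overall algorithm: because evaluations are scarce, BayesOpt spends computation between evaluations deciding where to sample next. The following is a minimal, illustrative sketch of the loop described in the abstract (fit a Gaussian process surrogate, maximize an acquisition function, evaluate f, repeat) on a toy one-dimensional problem. The objective, the domain A = [-1, 2], the evaluation budget, and the use of scikit-learn's GaussianProcessRegressor with a grid search over A are all assumptions made for this example, not the paper's prescribed implementation; Gaussian process regression and expected improvement are developed later in the tutorial.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X, gp, f_best):
    # EI for maximization: E[max(f(x) - f_best, 0)] under the GP posterior.
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive std
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def f(x):
    # Toy objective standing in for an expensive, noise-free black box.
    return (-np.sin(3.0 * x) - x**2 + 0.7 * x).ravel()

rng = np.random.default_rng(0)
lo, hi = -1.0, 2.0                    # hyper-rectangle A = [-1, 2]
X = rng.uniform(lo, hi, size=(4, 1))  # small initial design
y = f(X)

for _ in range(20):                   # limited evaluation budget
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    grid = np.linspace(lo, hi, 1000).reshape(-1, 1)
    x_next = grid[np.argmax(expected_improvement(grid, gp, y.max()))]
    X = np.vstack([X, x_next])        # evaluate f where EI is largest
    y = np.append(y, f(x_next))

print("best x:", X[np.argmax(y)].item(), "best f(x):", y.max())
```

In practice the grid search over A would be replaced by a continuous optimizer of the acquisition function, and the surrogate's hyperparameters would be handled more carefully; the point here is only the structure of the loop.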