Overcoming this challenge with dimensionality, Wu and Frazier (2016) proposed¹ a substantially more efficient and scalable approach based on multi-start stochastic gradient ascent. Stochastic gradient ascent (Robbins and Monro, 1951; Blum, 1954) is an algorithm for finding local optima of a function using unbiased estimates of its gradient, and is widely used in machine learning (Bottou, 2012). Multi-start stochastic gradient ascent (Martí et al., 2016) runs multiple instances of stochastic gradient ascent from different starting points and selects the best local optimum found as an approximate global optimum.
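To make the core update concrete, here is a minimal sketch of stochastic gradient ascent driven by unbiased but noisy gradient estimates with a decreasing (Robbins–Monro) step size. The function and estimator below are toy illustrations, not the authors' implementation.

```python
import numpy as np

def stochastic_gradient_ascent(grad_estimator, x0, T=1000, a=1.0):
    """Ascend using unbiased but noisy gradient estimates and a decreasing
    (Robbins-Monro) step size alpha_t = a / t."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, T + 1):
        G = grad_estimator(x)      # random, but E[G] equals the true gradient at x
        x = x + (a / t) * G        # step in the direction of G, scaled by alpha_t
    return x

# Toy example: maximize f(x) = -||x||^2 given only noisy gradient estimates.
rng = np.random.default_rng(0)
noisy_grad = lambda x: -2.0 * x + rng.normal(scale=0.5, size=x.shape)
print(stochastic_gradient_ascent(noisy_grad, x0=[3.0, -2.0]))  # converges toward the optimum at 0
```

Even though each individual gradient estimate is noisy, the decreasing step sizes average out the noise and the iterates converge to a local optimum.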
We summarize this approach for maximizing the KG acquisition function in Algorithm 3. The algorithm iterates over starting points, indexed by $r$, and for each maintains a sequence of iterates $x^{(r)}_t$, indexed by $t$, that converges to a local optimum of the KG acquisition function. The inner loop over $t$ relies on a stochastic gradient $G$, which is a random variable whose expected value is equal to the gradient of the KG acquisition function with respect to where we sample, evaluated at the current iterate $x^{(r)}_{t-1}$. We obtain the next iterate by taking a step in the direction of the stochastic gradient $G$. The size of this step is determined by the magnitude of $G$ and a decreasing step size $\alpha_t$. Once stochastic gradient ascent has run for $T$ iterations for each start, Algorithm 3 uses simulation (Algorithm 2) to evaluate the KG acquisition function for the final point obtained from each starting point, and selects the best one.
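Putting these pieces together, the following sketch shows the structure of the multi-start procedure described above. It assumes we already have a routine `stoch_grad_kg` returning unbiased estimates of the KG gradient and a routine `simulate_kg` estimating the KG value by simulation (the analogue of Algorithm 2); these names and the step-size schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def maximize_kg(stoch_grad_kg, simulate_kg, sample_start, R=10, T=100, a=1.0):
    """Sketch of the multi-start scheme in Algorithm 3.

    For each of R starting points, run T steps of stochastic gradient ascent
    on the KG acquisition function, then estimate KG at each final iterate by
    simulation and return the best point found.
    """
    finals = []
    for r in range(R):
        x = sample_start()               # starting point x_0^{(r)}
        for t in range(1, T + 1):
            G = stoch_grad_kg(x)         # unbiased estimate of the KG gradient at x_{t-1}^{(r)}
            x = x + (a / t) * G          # step whose size depends on |G| and alpha_t = a / t
        finals.append(x)
    kg_estimates = [simulate_kg(x) for x in finals]  # simulation-based KG estimates (Algorithm 2 analogue)
    return finals[int(np.argmax(kg_estimates))]      # best final iterate across restarts
```

Restarting from several points guards against stochastic gradient ascent settling into a poor local optimum of the (typically multimodal) KG surface.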
¹This approach was proposed in the context of BayesOpt with parallel function evaluations, but it can also be used in the setting considered here, in which we perform one evaluation at a time.