Overcoming this challenge with dimensionality, Wu and Frazier (2016) proposed¹ a substantially more efficient and scalable approach based on multi-start stochastic gradient ascent. Stochastic gradient ascent (Robbins and Monro, 1951; Blum, 1954) is an algorithm for finding local optima of a function using unbiased estimates of its gradient, and is widely used in machine learning (Bottou, 2012). Multi-start stochastic gradient ascent (Martí et al., 2016) runs multiple instances of stochastic gradient ascent from different starting points and selects the best local optimum found as an approximate global optimum.
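To make the core update concrete, here is a minimal sketch of stochastic gradient ascent driven by unbiased but noisy gradient estimates with a decreasing (Robbins–Monro) step size. The function and estimator below are toy illustrations, not the authors' implementation.

```python
import numpy as np

def stochastic_gradient_ascent(grad_estimator, x0, T=1000, a=1.0):
    """Ascend using unbiased but noisy gradient estimates and a decreasing
    (Robbins-Monro) step size alpha_t = a / t."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, T + 1):
        G = grad_estimator(x)      # random, but E[G] equals the true gradient at x
        x = x + (a / t) * G        # step in the direction of G, scaled by alpha_t
    return x

# Toy example: maximize f(x) = -||x||^2 given only noisy gradient estimates.
rng = np.random.default_rng(0)
noisy_grad = lambda x: -2.0 * x + rng.normal(scale=0.5, size=x.shape)
print(stochastic_gradient_ascent(noisy_grad, x0=[3.0, -2.0]))  # converges toward the optimum at 0
```

Even though each individual gradient estimate is noisy, the decreasing step sizes average out the noise and the iterates converge to a local optimum.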
We summarize this approach for maximizing the KG acquisition function in Algorithm 3. The algorithm iterates over starting points, indexed by $r$, and for each maintains a sequence of iterates $x^{(r)}_t$, indexed by $t$, that converges to a local optimum of the KG acquisition function. The inner loop over $t$ relies on a stochastic gradient $G$, which is a random variable whose expected value is equal to the gradient of the KG acquisition function with respect to where we sample, evaluated at the current iterate $x^{(r)}_{t-1}$. We obtain the next iterate by taking a step in the direction of the stochastic gradient $G$. The size of this step is determined by the magnitude of $G$ and a decreasing step size $\alpha_t$. Once stochastic gradient ascent has run for $T$ iterations for each start, Algorithm 3 uses simulation (Algorithm 2) to evaluate the KG acquisition function for the final point obtained from each starting point, and selects the best one.
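Putting these pieces together, the following sketch shows the structure of the multi-start procedure described above. It assumes we already have a routine `stoch_grad_kg` returning unbiased estimates of the KG gradient and a routine `simulate_kg` estimating the KG value by simulation (the analogue of Algorithm 2); these names and the step-size schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def maximize_kg(stoch_grad_kg, simulate_kg, sample_start, R=10, T=100, a=1.0):
    """Sketch of the multi-start scheme in Algorithm 3.

    For each of R starting points, run T steps of stochastic gradient ascent
    on the KG acquisition function, then estimate KG at each final iterate by
    simulation and return the best point found.
    """
    finals = []
    for r in range(R):
        x = sample_start()               # starting point x_0^{(r)}
        for t in range(1, T + 1):
            G = stoch_grad_kg(x)         # unbiased estimate of the KG gradient at x_{t-1}^{(r)}
            x = x + (a / t) * G          # step whose size depends on |G| and alpha_t = a / t
        finals.append(x)
    kg_estimates = [simulate_kg(x) for x in finals]  # simulation-based KG estimates (Algorithm 2 analogue)
    return finals[int(np.argmax(kg_estimates))]      # best final iterate across restarts
```

Restarting from several points guards against stochastic gradient ascent settling into a poor local optimum of the (typically multimodal) KG surface.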
¹This approach was proposed in the context of BayesOpt with parallel function evaluations, but it can also be used in the setting considered here, in which we perform one evaluation at a time.