Having surveyed Gaussian processes, we return to Algorithm 1 and discuss the acquisition function used in that loop. We focus on the setting described in Section 1 with noise-free evaluations, which we call the “standard” problem, and then discuss noisy evaluations, parallel evaluations, derivative observations, and other “exotic” extensions in Section 5.
The most commonly used acquisition function is expected improvement, and we discuss it first, in Section 4.1. Expected improvement performs well and is easy to use. We then discuss the knowledge gradient (Section 4.2) and the entropy search and predictive entropy search (Section 4.3) acquisition functions. These alternative acquisition functions are most useful in exotic problems, where an assumption made by expected improvement, that the primary benefit of sampling occurs through an improvement at the point sampled, is no longer true.
4.1 Expected Improvement
The expected improvement acquisition function is derived via a thought experiment. Suppose we are using Algorithm 1 to solve (1), in which $x_n$ indicates the point sampled at iteration $n$ and $y_n$ indicates the observed value. Assume that we may only return a point that we have evaluated as our final solution to (1). Also suppose for the moment that we have no evaluations left to make, and must return a solution based on those we have already performed. Since we observe $f$ without noise, the optimal choice is the previously evaluated point with the largest observed value. Let $f_n^* = \max_{m \le n} f(x_m)$ be the value of this point, where $n$ is the number of times we have evaluated $f$ thus far.
Now suppose we in fact have one additional evaluation to perform, and we can perform it anywhere. If we evaluate at $x$, we will observe $f(x)$. After this new evaluation, the value of the best point we have observed will be either $f(x)$ (if $f(x) \ge f_n^*$) or $f_n^*$ (if $f(x) \le f_n^*$). The improvement in the value of the best observed point is then $f(x) - f_n^*$ if this quantity is positive, and $0$ otherwise. We can write this improvement more compactly as $[f(x) - f_n^*]^+$, where $a^+ = \max(a, 0)$ denotes the positive part.
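As a concrete illustration, the incumbent value $f_n^*$ and the realized improvement can be computed directly; the observed values and candidate evaluations below are hypothetical, chosen only for the sketch:

```python
import numpy as np

# Hypothetical noise-free observations y_m = f(x_m) from n = 3 evaluations
y = np.array([1.2, 3.4, 2.1])
f_star = y.max()  # f_n^* = max_{m <= n} f(x_m)

# Realized improvement [f(x) - f_n^*]^+ for some hypothetical new values f(x)
f_x = np.array([4.0, 3.4, 2.0])
improvement = np.maximum(f_x - f_star, 0.0)  # positive part: max(f(x) - f_n^*, 0)
```

Only the first candidate exceeds the incumbent, so only it yields a positive improvement; the other two contribute zero rather than a penalty.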
While we would like to choose $x$ so that this improvement is large, $f(x)$ is unknown until after the evaluation. What we can do, however, is take the expected value of this improvement and choose $x$ to maximize it. We define the expected improvement as
$$\mathrm{EI}_n(x) := E_n\bigl[\,[f(x) - f_n^*]^+\,\bigr] \qquad (7)$$
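Before turning to the closed form, the expectation in (7) can be approximated directly by Monte Carlo; a minimal sketch, assuming the posterior on $f(x)$ is normal (as it is under a Gaussian process) and using hypothetical values for the posterior mean and standard deviation:

```python
import numpy as np

def expected_improvement_mc(mu_n, sigma_n, f_star, n_samples=100_000, seed=0):
    """Monte Carlo estimate of EI_n(x) = E_n[[f(x) - f_n^*]^+].

    Assumes the posterior on f(x) is Normal(mu_n, sigma_n^2).
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu_n, sigma_n, size=n_samples)  # draws of f(x) under the posterior
    return np.maximum(samples - f_star, 0.0).mean()      # average positive improvement

# Hypothetical posterior at a candidate point x
ei = expected_improvement_mc(mu_n=0.0, sigma_n=1.0, f_star=0.0)
```

With $\mu_n(x) = f_n^*$ and $\sigma_n(x) = 1$, the estimate should be close to $1/\sqrt{2\pi} \approx 0.399$; in practice one uses the closed-form expression discussed next rather than sampling.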
Here, $E_n[\cdot] = E[\cdot \mid x_{1:n}, y_{1:n}]$ indicates the expectation taken under the posterior distribution given evaluations of $f$ at $x_1, \ldots, x_n$. This posterior distribution is given by (3): $f(x)$ given $x_{1:n}, y_{1:n}$ is normally distributed with mean $\mu_n(x)$ and variance $\sigma_n^2(x)$. The expected improvement can be evaluated in closed form using integration by parts, as described
in Jones et al. (1998) or Clark (1961). The resulting expression is