
Acquisition Functions


Having surveyed Gaussian processes, we return to Algorithm 1 and discuss the acquisition function used in that loop. We focus on the setting described in Section 1 with noise-free evaluations, which we call the “standard” problem, and then discuss noisy evaluations, parallel evaluations, derivative observations, and other “exotic” extensions in Section 5.
The most commonly used acquisition function is expected improvement, and we discuss it first, in Section 4.1. Expected improvement performs well and is easy to use. We then discuss the knowledge gradient (Section 4.2), entropy search, and predictive entropy search (Section 4.3) acquisition functions. These alternative acquisition functions are most useful in exotic problems, where an assumption made by expected improvement, that the primary benefit of sampling occurs through an improvement at the point sampled, no longer holds.
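Algorithm 1 is not reproduced in this excerpt; the sketch below shows the shape of the loop it describes. The toy objective, the grid of candidate points, scikit-learn's GaussianProcessRegressor for the posterior, and the simple mean-plus-standard-deviation stand-in for the acquisition function (expected improvement is only defined in Section 4.1) are all illustrative assumptions, not part of the tutorial.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    return -(x - 0.3) ** 2                 # toy noise-free objective on [0, 1]

grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)   # candidate points
X = np.array([[0.0], [1.0]])                       # initial evaluations
y = f(X).ravel()

for n in range(10):
    # Fit the GP posterior given the evaluations made so far.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                                  optimizer=None, alpha=1e-6).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    acq = mu + sigma                       # stand-in acquisition; EI comes in 4.1
    x_next = grid[[np.argmax(acq)]]        # approximately maximize the acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())    # evaluate f at the chosen point

i = int(np.argmax(y))
print("best evaluated point:", X[i, 0], "value:", y[i])
```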


4.1 Expected Improvement


The expected improvement acquisition function is derived by a thought experiment. Suppose we are using Algorithm 1 to solve (1), in which x_n indicates the point sampled at iteration n and y_n indicates the observed value. Assume that we may only return a point that we have evaluated as our final solution to (1). Also suppose for the moment that we have no evaluations left to make, and must return a solution based on those we have already performed. Since we observe f without noise, the optimal choice is the previously evaluated point with the largest observed value. Let $f_n^* = \max_{m \le n} f(x_m)$ be the value of this point, where n is the number of times we have evaluated f thus far.
Now suppose in fact we have one additional evaluation to perform, and we can perform it anywhere. If we evaluate at x, we will observe f(x). After this new evaluation, the value of the best point we have observed will be either f(x) (if f(x) ≥ f_n^*) or f_n^* (if f(x) ≤ f_n^*). The improvement in the value of the best observed point is then f(x) − f_n^* if this quantity is positive, and 0 otherwise. We can write this improvement more compactly as [f(x) − f_n^*]^+, where a^+ = max(a, 0) indicates the positive part.

While we would like to choose x so that this improvement is large, f(x) is unknown until after the evaluation. What we can do, however, is take the expected value of this improvement and choose x to maximize it. We define the expected improvement as
\[
\mathrm{EI}_n(x) := E_n\!\left[\,[f(x) - f_n^*]^+\,\right]. \tag{7}
\]

Here, E_n[·] = E[· | x_{1:n}, y_{1:n}] indicates the expectation taken under the posterior distribution given evaluations of f at x_1, . . . , x_n. This posterior distribution is given by (3): f(x) given x_{1:n}, y_{1:n} is normally distributed with mean µ_n(x) and variance σ_n^2(x).
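Definition (7) can be checked numerically: since the posterior on f(x) is normal, a Monte Carlo average of the positive part of the improvement approximates EI_n(x). The posterior mean, standard deviation, and incumbent value below are made-up numbers for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_n, sigma_n, f_star = 1.0, 0.5, 1.2    # hypothetical mu_n(x), sigma_n(x), f*_n

# Draw f(x) | x_{1:n}, y_{1:n} ~ Normal(mu_n(x), sigma_n(x)^2) and average
# the positive part of the improvement, [f(x) - f*_n]^+.
samples = rng.normal(mu_n, sigma_n, size=1_000_000)
improvement = np.maximum(samples - f_star, 0.0)
print("Monte Carlo EI_n(x):", improvement.mean())   # ~0.115
```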
The expected improvement can be evaluated in closed form using integration by parts, as described in Jones et al. (1998) or Clark (1961). The resulting expression is

\[
\mathrm{EI}_n(x) = \Delta_n(x)\,\Phi\!\left(\frac{\Delta_n(x)}{\sigma_n(x)}\right) + \sigma_n(x)\,\varphi\!\left(\frac{\Delta_n(x)}{\sigma_n(x)}\right), \tag{8}
\]

where ∆_n(x) := µ_n(x) − f_n^* is the expected difference in quality between the proposed point x and the best previously evaluated point, and φ and Φ are the standard normal density and cumulative distribution function. When σ_n(x) = 0, f(x) is known exactly and EI_n(x) = [∆_n(x)]^+.
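A direct implementation of this closed form follows, as a minimal sketch assuming scalar inputs; the function name and test values are illustrative, and scipy's standard normal supplies φ and Φ.

```python
from scipy.stats import norm

def expected_improvement(mu, sigma, f_star):
    """Closed-form EI_n(x), given posterior mean mu = mu_n(x), posterior
    standard deviation sigma = sigma_n(x), and incumbent value f_star = f*_n."""
    delta = mu - f_star                  # Delta_n(x)
    if sigma == 0.0:
        return max(delta, 0.0)           # no posterior uncertainty at x
    z = delta / sigma
    return delta * norm.cdf(z) + sigma * norm.pdf(z)

# Agrees with the Monte Carlo estimate of (7) above:
print(expected_improvement(1.0, 0.5, 1.2))   # ~0.1152
```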


