Exotic Bayesian Optimization




Above we described methodology for solving the “standard” Bayesian optimization problem described in Section 1. This problem assumed a feasible set in which membership is easy to evaluate, such as a hyperrectangle or simplex; a lack of derivative information; and noise-free evaluations.
While there are quite a few applied problems that meet all of the assumptions of the standard problem, there are even more where one or more of these assumptions are broken. We call these “exotic” problems. Here, we describe some prominent examples and give references for more detailed reading. (Although we discuss noisy evaluations in this section on exotic problems, they are substantially less exotic than the others considered, and are often considered to be part of the standard problem.)


Noisy Evaluations

GP regression can be extended naturally to observations with independent normally distributed noise of known variance (Rasmussen and Williams, 2006). This adds a diagonal term, with entries equal to the variance of the noise, to the covariance matrices in (3). In practice, this variance is not known, and so the most common approach is to assume that the noise has a common variance and to include this variance as a hyperparameter. It is also possible to perform inference assuming that the variance varies over the domain, by modeling the log of the variance with a second Gaussian process (Kersting et al., 2007).
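To make the diagonal noise term concrete, here is a minimal sketch (not from the text; the squared-exponential kernel, hyperparameter values, and toy data are illustrative assumptions) of GP posterior inference in which the noise variance is added to the diagonal of the kernel matrix over the observed points:

```python
# Minimal sketch: GP posterior with i.i.d. Gaussian observation noise,
# implemented by adding noise_var * I to the kernel matrix of observed points.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of A and rows of B."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=0.1):
    """Posterior mean and variance of f at X_test given noisy observations."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))  # diagonal noise term
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)                      # stable solves via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy usage: noisy observations of a 1-d function.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(10, 1))
y = np.sin(X).ravel() + 0.3 * rng.standard_normal(10)
X_grid = np.linspace(0.0, 5.0, 50)[:, None]
mu, sigma2 = gp_posterior(X, y, X_grid, noise_var=0.3**2)
```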
The KG, ES, and PES acquisition functions apply directly in the setting with noise, and they retain their one-step optimality properties. One simply uses the posterior mean of the Gaussian process that includes noise.
Direct use of the EI acquisition function presents conceptual challenges, however, since the “improvement” that results from a function value is no longer easily defined, and $f(x)$ in (7) is no longer observed. Authors have employed a variety of heuristic approaches, substituting different normal distributions for the distribution of $f(x)$ in (7), and typically using the maximum of the posterior mean at the previously evaluated points in place of $f^*_n$. Popular substitutes for the distribution of $f(x)$ include the distribution of $\mu_{n+1}(x)$, the distribution of $y_{n+1}$, and continuing to use the distribution of $f(x)$ even though it is not observed. Because of these approximations, KG can outperform EI substantially in problems with substantial noise (Wu and Frazier, 2016; Frazier et al., 2009).
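As a concrete illustration of one such heuristic, the sketch below (an assumption-laden example, not a canonical definition) computes an EI-style score at candidate points using the posterior distribution of f, with the incumbent taken as the maximum of the posterior mean at previously evaluated points; it reuses the hypothetical gp_posterior helper from the earlier sketch and assumes a maximization convention:

```python
# Minimal sketch of a noisy-EI heuristic: incumbent = best posterior mean at
# evaluated points; improvement measured under the posterior distribution of f.
import numpy as np
from scipy.stats import norm

def noisy_ei(X_train, y_train, X_cand, noise_var=0.1):
    # Incumbent: maximum of the posterior mean over already-evaluated points.
    mu_train, _ = gp_posterior(X_train, y_train, X_train, noise_var)
    incumbent = np.max(mu_train)
    # Posterior over f at the candidate points (maximization convention).
    mu, var = gp_posterior(X_train, y_train, X_cand, noise_var)
    sigma = np.sqrt(np.maximum(var, 1e-12))
    z = (mu - incumbent) / sigma
    return (mu - incumbent) * norm.cdf(z) + sigma * norm.pdf(z)
```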

As an alternative approach to applying EI when measurements are noisy, Scott et al. (2011) considers noisy evaluations under the restriction made in the derivation of EI: that the reported solution needs to be a previously evaluated point. It then finds the one-step optimal place to sample under this assumption. Its analysis is similar to that used to derive the KG policy, except that we restrict the reported solution $\hat{x}$ to those points that have been evaluated.
Indeed, if we were to report a final solution after n measurements, it would be the point among $x_{1:n}$ with the largest value of $\mu_n(x)$, and it would have conditional expected value $\mu_n^{**} = \max_{i=1,\ldots,n} \mu_n(x_i)$. If we were to take one more sample at $x_{n+1} = x$, the reported solution would have conditional expected value under the new posterior of $\mu_{n+1}^{**} = \max_{i=1,\ldots,n+1} \mu_{n+1}(x_i)$. Taking the expected value of the difference, the value of sampling at $x$ is

$$\mathbb{E}_n\left[\mu_{n+1}^{**} - \mu_n^{**} \mid x_{n+1} = x\right]. \qquad (13)$$
Unlike the case with noise-free evaluations, this sample may cause $\mu_{n+1}(x_i)$ to differ from $\mu_n(x_i)$ for $i \le n$, necessitating a more complex calculation than in the noise-free setting (but a simpler calculation than for the KG policy). A procedure for calculating this quantity and its derivative is given in Scott et al. (2011). While we can view this acquisition function as an approximation to the KG acquisition function, as Scott et al. (2011) does (they call it the KGCP acquisition function), we argue here that it is the most natural generalization of EI’s assumptions to the case with noisy measurements.
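One way to see what (13) measures is to estimate it by simulation: draw the noisy observation $y_{n+1}$ at the candidate $x$ from the posterior predictive, update the posterior, and average the gain in the best posterior mean over the evaluated points. The sketch below is such a Monte Carlo approximation, not the closed-form procedure of Scott et al. (2011); it reuses the hypothetical gp_posterior helper from the earlier sketch:

```python
# Monte Carlo estimate of E_n[ mu_{n+1}^{**} - mu_n^{**} | x_{n+1} = x ] in (13).
import numpy as np

def kgcp_value(X_train, y_train, x_new, noise_var=0.1, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    x_new = np.atleast_2d(x_new)
    # mu_n^{**}: best posterior mean over the previously evaluated points.
    mu_n, _ = gp_posterior(X_train, y_train, X_train, noise_var)
    best_n = np.max(mu_n)
    # Posterior predictive of the noisy observation y_{n+1} at x_new.
    mu_x, var_x = gp_posterior(X_train, y_train, x_new, noise_var)
    pred_sd = np.sqrt(var_x[0] + noise_var)
    X_aug = np.vstack([X_train, x_new])
    gains = []
    for _ in range(n_samples):
        y_sim = mu_x[0] + pred_sd * rng.standard_normal()
        y_aug = np.append(y_train, y_sim)
        # mu_{n+1}^{**}: best posterior mean over x_{1:n+1} after the simulated sample.
        mu_np1, _ = gp_posterior(X_aug, y_aug, X_aug, noise_var)
        gains.append(np.max(mu_np1) - best_n)
    return float(np.mean(gains))
```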


