Above we described methodology for solving the “standard” Bayesian optimization problem described in Section 1. This problem assumed a feasible set in which membership is easy to evaluate, such as a hyperrectangle or simplex; a lack of derivative information; and noise-free evaluations.
While there are quite a few applied problems that meet all of the assumptions of the standard problem, there are even more where one or more of these assumptions are broken. We call these “exotic” problems. Here, we describe some prominent examples and give references for more detailed reading. (Although we discuss noisy evaluations in this section on exotic problems, they are substantially less exotic than the others considered, and are often considered to be part of the standard problem.)
Noisy Evaluations. GP regression can be extended naturally to observations with independent normally distributed noise of known variance (Rasmussen and Williams, 2006). This adds a diagonal term, with entries equal to the variance of the noise, to the covariance matrices in (3). In practice, this variance is not known, and so the most common approach is to assume that the noise has a common variance and to include this variance as a hyperparameter. It is also possible to perform inference assuming that the variance varies over the domain, by modeling the log of the variance with a second Gaussian process (Kersting et al., 2007).
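As an illustrative sketch (not taken from the text), the following Python snippet shows where the noise variance enters the GP posterior: the covariance matrix over the observed points acquires a diagonal term equal to the noise variance, while the rest of the posterior-mean and posterior-variance formulas from (3) are unchanged. The kernel choice and the function names are assumptions made for illustration; in practice the noise variance would be estimated jointly with the kernel hyperparameters, e.g. by maximizing the marginal likelihood.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel; any positive-definite kernel could be used here."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X, y, X_star, noise_var=1e-2):
    """Posterior mean and variance of f at X_star given noisy observations y = f(X) + eps.

    The only change relative to the noise-free case is the `noise_var * I` term
    added to the covariance matrix of the observations.
    """
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))   # noisy observation covariance
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    mean = K_s.T @ np.linalg.solve(K, y)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)
```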
The KG, ES, and PES acquisition functions apply directly in the setting with noise, and they retain their one-step optimality properties. One simply uses the posterior mean of the Gaussian process that includes noise.
Direct use of the EI acquisition function presents conceptual challenges, however, since the "improvement" that results from a function value is no longer easily defined, and $f(x)$ in (7) is no longer observed. Authors have employed a variety of heuristic approaches, substituting different normal distributions for the distribution of $f(x)$ in (7), and typically using the maximum of the posterior mean at the previously evaluated points in place of $f_n^*$. Popular substitutes for the distribution of $f(x)$ include the distribution of $\mu_{n+1}(x)$, the distribution of $y_{n+1}$, and continuing to use the distribution of $f(x)$ even though it is not observed. Because of these approximations, KG can outperform EI substantially in problems with substantial noise (Wu and Frazier, 2016; Frazier et al., 2009).
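As a hedged illustration of one such heuristic (not a prescription from the text), the sketch below scores a candidate point with the usual closed-form EI expression, but replaces $f_n^*$ with the maximum posterior mean over the previously evaluated points and uses the latent GP posterior at the candidate as the distribution of $f(x)$. The helper `gp_posterior` is the hypothetical function from the previous snippet.

```python
import numpy as np
from scipy.stats import norm

def noisy_ei(x_cand, X, y, noise_var=1e-2):
    """Heuristic expected improvement under noise (one of several possible variants).

    Incumbent: max posterior mean over previously evaluated points (replaces f_n^*).
    Distribution of f(x): latent GP posterior at the candidate points.
    """
    mu_obs, _ = gp_posterior(X, y, X, noise_var)        # posterior mean at x_1:n
    incumbent = mu_obs.max()                            # stands in for f_n^*
    mu, var = gp_posterior(X, y, x_cand, noise_var)     # posterior on f at candidates
    sigma = np.sqrt(np.maximum(var, 1e-12))
    z = (mu - incumbent) / sigma
    return (mu - incumbent) * norm.cdf(z) + sigma * norm.pdf(z)
```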
As an alternative approach to applying EI when measurements are noisy, Scott et al. (2011) considers noisy evaluations under the restriction made in the derivation of EI: that the reported solution must be a previously evaluated point. It then finds the one-step optimal place to sample under this assumption. Its analysis is similar to that used to derive the KG policy, except that $x^*$ is restricted to those points that have been evaluated.
Indeed, if we were to report a final solution after $n$ measurements, it would be the point among $x_{1:n}$ with the largest value of $\mu_n(x)$, and it would have conditional expected value $\mu_n^{**} = \max_{i=1,\ldots,n} \mu_n(x_i)$. If we were to take one more sample at $x_{n+1} = x$, it would have conditional expected value under the new posterior of $\mu_{n+1}^{**} = \max_{i=1,\ldots,n+1} \mu_{n+1}(x_i)$. Taking the expected value of the difference, the value of sampling at $x$ is
$$\mathbb{E}_n\left[\mu_{n+1}^{**} - \mu_n^{**} \mid x_{n+1} = x\right]. \qquad (13)$$
Unlike the case with noise-free evaluations, this sample may cause $\mu_{n+1}(x_i)$ to differ from $\mu_n(x_i)$ for $i \le n$, necessitating a more complex calculation than in the noise-free setting (but a simpler calculation than for the KG policy). A procedure for calculating this quantity and its derivative is given in Scott et al. (2011). While we can view this acquisition function as an approximation to the KG acquisition function, as Scott et al. (2011) does (they call it the KGCP acquisition function), we argue here that it is the most natural generalization of EI's assumptions to the case with noisy measurements.
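To make (13) concrete, the sketch below estimates it by Monte Carlo for a single candidate point: it samples a hypothetical noisy observation $y_{n+1}$ from the posterior predictive at $x$, recomputes the posterior mean at all evaluated points (now including $x$), and averages the improvement of $\mu_{n+1}^{**}$ over $\mu_n^{**}$. This is an illustrative approximation built on the assumed `gp_posterior` helper above, not the analytical procedure of Scott et al. (2011), which computes the expectation and its gradient exactly.

```python
import numpy as np

def kgcp_mc(x_cand, X, y, noise_var=1e-2, n_samples=64, seed=None):
    """Monte Carlo estimate of (13) for one candidate point x_cand with shape (1, d)."""
    rng = np.random.default_rng(seed)
    mu_n, _ = gp_posterior(X, y, X, noise_var)
    mu_star_n = mu_n.max()                                   # mu_n^{**}
    mu_c, var_c = gp_posterior(X, y, x_cand, noise_var)      # latent posterior at x_cand
    X_aug = np.vstack([X, x_cand])
    improvements = []
    for _ in range(n_samples):
        # Sample a hypothetical noisy observation y_{n+1} at the candidate point.
        y_new = rng.normal(mu_c[0], np.sqrt(var_c[0] + noise_var))
        y_aug = np.append(y, y_new)
        # Updated posterior mean at all evaluated points, including x_cand.
        mu_np1, _ = gp_posterior(X_aug, y_aug, X_aug, noise_var)
        improvements.append(mu_np1.max() - mu_star_n)        # mu_{n+1}^{**} - mu_n^{**}
    return float(np.mean(improvements))
```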