Constraints In the problem posed in Section 1, we assumed that the feasible set was a simple one in which it was easy to assess membership. The literature has also considered the more general problem,
\[
\max_x \; f(x) \quad \text{subject to} \quad g_i(x) \ge 0, \; i = 1, \ldots, I,
\]
where the g_i are as expensive to evaluate as f. EI generalizes naturally to this setting when f and the g_i can be evaluated without noise: improvement results when the evaluated x is feasible (g_i(x) ≥ 0 for all i) and f(x) is better than the best previously evaluated feasible point. This was proposed in Section 4 of Schonlau et al. (1998) and studied independently by Gardner et al. (2014). PES has also been studied for this setting (Hernandez-Lobato et al., 2015).
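One common concrete form of this generalization, as in Gardner et al. (2014), multiplies EI (computed with respect to the best feasible point observed so far) by the posterior probability that all constraints are satisfied. The sketch below assumes independent Gaussian process posteriors on f and on each g_i, each summarized at the candidate x by a posterior mean and standard deviation; the function and argument names are illustrative, not from any particular library.

```python
from scipy.stats import norm

def constrained_ei(mu_f, sigma_f, best_feasible, constraint_moments):
    """Expected improvement times the probability of feasibility at one candidate x.

    mu_f, sigma_f:      posterior mean and standard deviation of f(x).
    best_feasible:      best objective value among feasible evaluated points.
    constraint_moments: list of (mu_g, sigma_g) posterior moments for each g_i(x).
    """
    # Standard expected improvement over the best previously observed feasible point.
    z = (mu_f - best_feasible) / sigma_f
    ei = sigma_f * (z * norm.cdf(z) + norm.pdf(z))

    # Probability that every constraint g_i(x) >= 0, assuming independent posteriors.
    prob_feasible = 1.0
    for mu_g, sigma_g in constraint_moments:
        prob_feasible *= norm.cdf(mu_g / sigma_g)

    return ei * prob_feasible
```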
Multi-Fidelity and Multi-Information Source Evaluations In multi-fidelity optimization, rather than a single objective f, we have a collection of information sources f(x, s) indexed by s. Here, s controls the "fidelity", with lower s giving higher fidelity, and f(x, 0) corresponding to the original objective. Increasing the fidelity (decreasing s) gives a more accurate estimate of f(x, 0), but at a higher cost c(s). For example, x might describe the design of an engineering system, and s the size of a mesh used in solving a partial differential equation that models the system. Or, s might describe the time horizon used in a steady-state simulation. Authors have also recently considered optimization of neural networks, where s indexes the number of iterations or the amount of data used in training a machine learning algorithm (Swersky et al., 2014; Klein et al., 2016). Accuracy is modeled by supposing that f(x, s) is equal to f(x, 0) observed with noise whose variance λ(s) increases with s, or by supposing that f(x, s) provides deterministic evaluations with f(x, s+1) − f(x, s) modeled by a mean-0 Gaussian process that varies with x. Both settings can be modeled via a Gaussian process on f, including both x and s as part of the modeled domain.
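As a minimal sketch of this last point, the snippet below fits a single Gaussian process over the joint (x, s) domain using scikit-learn, treating the fidelity index s as an extra input dimension; the two-dimensional design, the kernel, and its length scales are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# The GP is defined on (x, s) jointly; separate length scales for the x
# dimensions and for s let the model learn how strongly fidelities correlate.
kernel = RBF(length_scale=[1.0, 1.0, 5.0]) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

# Each row stacks a design point with its fidelity index; y holds observations.
X_s = np.array([[0.1, 0.4, 2.0],   # x = (0.1, 0.4) evaluated at fidelity s = 2
                [0.1, 0.4, 0.0],   # the same x at the highest fidelity s = 0
                [0.7, 0.2, 1.0]])
y = np.array([1.3, 1.1, 0.4])
gp.fit(X_s, y)

# Beliefs about the objective f(x, 0) come from querying the model at s = 0.
mu, sd = gp.predict(np.array([[0.5, 0.5, 0.0]]), return_std=True)
```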
The overarching goal is to solve max_x f(x, 0) by observing f(x, s) at a sequence of points and fidelities (x_n, s_n), n = 1, ..., N, with total cost Σ_{n=1}^{N} c(s_n) less than some budget B. Work on multi-fidelity optimization includes Huang et al. (2006); Sóbester et al. (2004); Forrester et al. (2007); McLeod et al. (2017); Kandasamy et al. (2016).
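The budget constraint is typically enforced by an outer loop that keeps evaluating until the next evaluation would exceed B. A minimal sketch of such a loop is below; the acquisition rule, the evaluation routine, and the cost model are abstract placeholders rather than any specific method from the papers above.

```python
def budgeted_multifidelity_loop(acquire, evaluate, cost, budget):
    """Generic outer loop: sample (x, s) pairs until the budget B would be exceeded.

    acquire(history) -> proposed (x, s); any multi-fidelity acquisition rule.
    evaluate(x, s)   -> observation of f(x, s).
    cost(s)          -> evaluation cost c(s).
    """
    history, spent = [], 0.0
    while True:
        x, s = acquire(history)
        if spent + cost(s) > budget:   # stop before exceeding the budget B
            break
        y = evaluate(x, s)
        spent += cost(s)
        history.append((x, s, y))
    return history
```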
In the more general problem of multi-information source optimization, we relax the assumption that the f(·, s) are ordered by s in terms of accuracy and cost. Instead, we simply have a function f taking a design input x and an information source input s, with f(x, 0) being the objective, and f(x, s) for s ≠ 0 being observable with different biases relative to the objective, different amounts of noise, and different costs.
For example, x might represent the design of an aircraft's wing, f(x, 0) the predicted performance of the wing under an accurate but slow simulator, and f(x, s) for s = 1, 2 the predicted performance under two inexpensive approximate simulators making different assumptions. It may be that f(x, 1) is accurate in some regions of the search space and substantially biased in others, with f(x, 2) being accurate in other regions. In this setting, the relative accuracy of f(x, 1) vs. f(x, 2) depends on x. Work on multi-information source optimization includes Lam et al. (2015); Poloczek et al. (2017).
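One natural way to capture such x-dependent biases, similar in spirit to the model used by Poloczek et al. (2017), is to write f(x, s) = f(x, 0) + δ_s(x) with independent Gaussian process priors on the objective f(·, 0) and on each bias δ_s. The implied covariance between evaluations at different information sources is sketched below; the kernel arguments are placeholders.

```python
def mis_covariance(kernel_obj, kernel_bias, x1, s1, x2, s2):
    """Covariance of f(x1, s1) and f(x2, s2) under f(x, s) = f(x, 0) + delta_s(x),
    with independent GP priors on the objective and on each x-dependent bias.

    kernel_obj(x1, x2):  kernel of the objective GP.
    kernel_bias(x1, x2): kernel of each bias GP (shared across sources for simplicity).
    """
    cov = kernel_obj(x1, x2)           # the objective term is shared by every source
    if s1 == s2 and s1 != 0:
        cov += kernel_bias(x1, x2)     # bias terms correlate only within one source
    return cov
```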
EI is difficult to apply directly in these problems because evaluating f(x, s) for s ≠ 0 never provides an improvement in the best objective function value seen, max{f(x_n, 0) : s_n = 0}. Thus, a direct translation of EI to this setting gives EI = 0 whenever s ≠ 0, leading to measurement only at the highest fidelity. For this reason, the EI-based method of Lam et al. (2015) uses EI to select x_n assuming that f(x, 0) will be observed (even if it will not), and uses a separate procedure to select s_n. KG, ES, and PES can be applied directly to these problems, as in Poloczek et al. (2017).
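To make the two-stage structure concrete, the sketch below selects x_n by maximizing EI as if the full-fidelity objective f(x, 0) were going to be observed, and then selects s_n by a simple cost-aware rule; the second stage is a placeholder for illustration only, not the actual procedure of Lam et al. (2015).

```python
def choose_point_and_fidelity(candidate_xs, ei_at_full_fidelity,
                              posterior_var, cost, fidelities):
    """Two-stage selection: EI picks x as if s = 0 would be observed; s is picked separately.

    ei_at_full_fidelity(x) -> EI value at x computed at fidelity s = 0.
    posterior_var(x, s)    -> posterior variance of f(x, s).
    cost(s)                -> evaluation cost c(s).
    """
    # Stage 1: choose x by EI, pretending the evaluation will be at s = 0.
    x_next = max(candidate_xs, key=ei_at_full_fidelity)

    # Stage 2 (illustrative placeholder): choose the fidelity offering the best
    # variance-reduction-per-cost trade-off at the chosen x.
    s_next = max(fidelities, key=lambda s: posterior_var(x_next, s) / cost(s))

    return x_next, s_next
```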
Random Environmental Conditions and Multi-Task Bayesian Optimization Closely related to multi-information source optimization is the pair of problems
\[
\max_x \int f(x, w)\, p(w)\, dw \qquad \text{and} \qquad \max_x \sum_w f(x, w)\, p(w),
\]
where f is expensive to evaluate. These problems appear in the literature with a variety of names: optimization with random environmental conditions (Chang et al., 2001) in statistics, multi-task Bayesian optimization (Swersky et al., 2013) in machine learning, along with optimization of integrated response functions (Williams et al., 2000) and optimization with expensive integrands (Toscano-Palmerin and Frazier, 2018).
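Under a Gaussian process prior on f, the sum version of this objective is a linear combination of jointly Gaussian quantities, so its posterior at any x is itself Gaussian: its mean is Σ_w p(w) E[f(x, w)] and its variance is Σ_{w,w'} p(w) p(w') Cov[f(x, w), f(x, w')]. A minimal sketch of this computation, assuming finitely many w and a model exposing the joint posterior mean vector and covariance matrix through a hypothetical gp_mean_cov interface, is below.

```python
import numpy as np

def objective_posterior(x, ws, p, gp_mean_cov):
    """Posterior mean and variance of g(x) = sum_w f(x, w) p(w) under a GP on f.

    ws:          sequence of environmental conditions w.
    p:           numpy array of weights p(w), summing to one.
    gp_mean_cov: hypothetical interface returning the joint posterior mean vector
                 and covariance matrix of f at the given (x, w) points.
    """
    points = [(x, w) for w in ws]
    mu, cov = gp_mean_cov(points)
    mean = p @ mu        # E[g(x)]   = sum_w p(w) E[f(x, w)]
    var = p @ cov @ p    # Var[g(x)] = sum_{w,w'} p(w) p(w') Cov[f(x, w), f(x, w')]
    return mean, var
```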
Rather than taking the objective ∫ f(x, w) p(w) dw as our unit of evaluation, a natural approach is to evaluate f(x, w) at a small number of w at an x of interest. This gives partial information about the objective at x. Based on this information, one can explore a different x, or resolve the current x with more precision. Moreover, by leveraging observations at w for a nearby x, one may already have substantial information about a particular f(x, w), reducing the need to evaluate it. Methods that act on this intuition can substantially outperform methods that simply evaluate the full objective in each evaluation via numerical quadrature or a full sum (Toscano-Palmerin and Frazier, 2018).
This pair of problems arises in the design of engineering systems and in biomedical problems, such as joint replacements (Chang et al., 2001) and cardiovascular bypass grafts (Xie et al., 2012), where f(x, w) is the performance of design x under environmental condition w as evaluated by some expensive-to-evaluate computational model, p(w) is a simple function (e.g., the normal density) describing the frequency with which condition w occurs, and our goal is to optimize average performance. It also arises in machine learning, in optimizing cross-validation performance. Here, we divide our data into chunks or "folds" indexed by w, and f(x, w) is the test performance on fold w of a machine learning model trained without data from that fold.
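To make the cross-validation instance concrete, the sketch below evaluates a single fold at a time with scikit-learn; the choice of model (a support vector classifier) and the hyperparameter being tuned are illustrative assumptions.

```python
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def fold_score(x, w, X, y, n_folds=5):
    """f(x, w): held-out accuracy on fold w of a model trained without that fold.

    x: hyperparameter being optimized (here, the SVM regularization constant C).
    w: fold index in {0, ..., n_folds - 1}.
    """
    folds = list(KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X))
    train_idx, test_idx = folds[w]
    model = SVC(C=x).fit(X[train_idx], y[train_idx])
    return model.score(X[test_idx], y[test_idx])

# The objective is the average over folds, sum_w f(x, w) p(w) with p(w) = 1/n_folds,
# but each call to fold_score evaluates only one fold, that is, one w at a time.
```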
Methods in this area include KG for noise-free (Xie et al., 2012) and general problems (Toscano-Palmerin and Frazier, 2018), PES (Swersky et al., 2013), and modifications of EI (Groot et al., 2010; Williams et al., 2000). As in multi-information source optimization, the unmodified EI acquisition function is inappropriate here because observing f(x, w) does not provide an observation of the objective (unless all w′ ≠ w have already been observed at that x), nor a strictly positive improvement. Thus, Groot et al. (2010) and Williams et al. (2000) use EI to choose x as if we did observe the objective, and then use a separate strategy for choosing w.