
  • f lacks known special structure like concavity or linearity that would make it easy to optimize using techniques that leverage such structure to improve efficiency. We summarize this by saying f is a “black box.”

  • When we evaluate f, we observe only f(x) and no first- or second-order derivatives. This prevents the application of first- and second-order methods like gradient descent, Newton’s method, or quasi-Newton methods. We refer to problems with this property as “derivative-free”.

  • Through most of the article, we will assume f(x) is observed without noise. Later (Section 5) we will allow f(x) to be obscured by stochastic noise. In almost all work on Bayesian optimization, noise is assumed independent across evaluations and Gaussian with constant variance.

    We summarize these problem characteristics by saying that BayesOpt is designed for black-box derivative-free global optimization.
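    To make this setting concrete, the following minimal Python sketch (ours, purely illustrative; the quadratic-plus-sine objective, the domain [0, 1]^d, and the evaluation budget are assumptions, not from the text) shows the interface a black-box derivative-free optimizer works against: the objective can only be queried pointwise, returns a scalar, and exposes no gradients.

```python
import numpy as np

def expensive_black_box(x):
    """Stand-in for an expensive objective (e.g., training a neural network
    and returning its validation accuracy). The optimizer sees only the
    returned scalar: no gradients, no known structure like concavity."""
    return -np.sum((x - 0.3) ** 2) + 0.1 * np.sin(5.0 * np.sum(x))

d = 2                                # a modest input dimension
budget = 20                          # evaluations are costly, so few are allowed
rng = np.random.default_rng(0)

# Naive baseline: random search over the feasible set [0, 1]^d.
X = rng.uniform(0.0, 1.0, size=(budget, d))
y = np.array([expensive_black_box(x) for x in X])
print("best observed:", y.max(), "at", X[np.argmax(y)])
```

    Bayesian optimization improves on such naive sampling by using all past evaluations to decide where to evaluate next, as described in the remainder of the article.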
    The ability to optimize expensive black-box derivative-free functions makes BayesOpt extremely versatile. Recently it has become extremely popular for tuning hyperparameters in machine learning algorithms, especially deep neural networks (Snoek et al., 2012). Over a longer period, since the 1960s, BayesOpt has been used extensively for designing engineering systems (Močkus, 1989; Jones et al., 1998; Forrester et al., 2008). BayesOpt has also been used to choose laboratory experiments in materials and drug design (Negoescu et al., 2011; Frazier and Wang, 2016; Packwood, 2017), in calibration of environmental models (Shoemaker et al., 2007), and in reinforcement learning (Brochu et al., 2009; Lizotte, 2008; Lizotte et al., 2007).
    BayesOpt originated with the work of Kushner (Kushner, 1964), Žilinskas (Žilinskas, 1975; Močkus et al., 1978), and Močkus (Močkus, 1975; Močkus, 1989), but received substantially more attention after that work was popularized by Jones et al. (1998) and their work on the Efficient Global Optimization (EGO) algorithm. Following Jones et al. (1998), innovations developed in that same literature include multi-fidelity optimization (Huang et al., 2006; Sóbester et al., 2004), multi-objective optimization (Keane, 2006; Knowles, 2006; Močkus and Močkus, 1991), and a study of convergence rates (Calvin, 1997; Calvin and Žilinskas, 2000; Calvin and Žilinskas, 2005; Calvin and Žilinskas, 1999). The observation made by Snoek et al. (2012) that BayesOpt is useful for training deep neural networks sparked a surge of interest within machine learning, with complementary innovations from that literature including multi-task optimization (Swersky et al., 2013; Toscano-Palmerin and Frazier, 2018), multi-fidelity optimization specifically aimed at training deep neural networks (Klein et al., 2016), and parallel methods (Ginsbourger et al., 2007, 2010; Wang et al., 2016a; Wu and Frazier, 2016). Gaussian process regression, its close cousin kriging, and BayesOpt have also been studied recently in the simulation literature (Kleijnen et al., 2008; Salemi et al., 2014; Mehdad and Kleijnen, 2018) for modeling and optimizing systems simulated using discrete event simulation.
    There are other techniques outside of BayesOpt that can be used to optimize expensive derivative-free black-box functions. While we do not review methods from this literature here in detail, many of them have a similar flavor to BayesOpt methods: they maintain a surrogate that models the objective function, which they use to choose where to evaluate (Booker et al., 1999; Regis and Shoemaker, 2007b,a, 2005). This more general class of methods is often called “surrogate methods.” Bayesian optimization distinguishes itself from other surrogate methods by using surrogates developed using Bayesian statistics, and in deciding where to evaluate the objective using a Bayesian interpretation of these surrogates.
    We first introduce the typical form that Bayesian optimization algorithms take in Section 2. This form involves two primary components: a method for statistical inference, typically Gaussian process (GP) regression; and an acquisition function for deciding where to sample, which is often expected improvement. We describe these two components in detail in Sections 3 and 4.1. We then describe three alternate acquisition functions: knowledge-gradient (Section 4.2), entropy search, and predictive entropy search (Section 4.3). These alternate acquisition functions are particularly useful in problems falling outside the strict set of assumptions above, which we call “exotic” Bayesian optimization problems and discuss in Section 5. These exotic problems include those with parallel evaluations, constraints, multi-fidelity evaluations, multiple information sources, random environmental conditions, multi-task objectives, and derivative observations. We then discuss Bayesian optimization and Gaussian process regression software in Section 6 and conclude with a discussion of future research directions in Section 7.
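    To preview this two-component structure before Section 2 develops it formally, here is a compact sketch of the generic BayesOpt loop: GP regression for statistical inference and expected improvement for deciding where to sample. This is our own illustrative code, not taken from the tutorial; it uses scikit-learn’s GP implementation, and the Matérn kernel, the candidate-set size, and the crude random-candidate maximization of the acquisition function are all simplifying assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, f_best):
    """EI(x) = E[max(f(x) - f_best, 0)] under the GP posterior (maximization)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)       # guard against zero posterior variance
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

def bayesopt(f, bounds, n_init=5, n_iter=20, n_cand=2000, seed=0):
    """Generic BayesOpt loop. `bounds` is a (d, 2) array of box constraints."""
    rng = np.random.default_rng(seed)
    d = bounds.shape[0]
    # Initial design: a handful of uniformly random points.
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, d))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                        # inference: fit GP to all data so far
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_cand, d))
        ei = expected_improvement(cand, gp, y.max())
        x_next = cand[np.argmax(ei)]        # acquisition: evaluate where EI is largest
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Example usage with the black-box objective sketched earlier:
#   bounds = np.array([[0.0, 1.0]] * 2)
#   x_best, y_best = bayesopt(expensive_black_box, bounds)
```

    A production implementation would typically maximize the acquisition function with a multistart gradient-based optimizer rather than over random candidates, and would estimate the GP’s kernel hyperparameters more carefully; the software discussed in Section 6 handles these details.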
    Other tutorials and surveys on Bayesian optimization include Shahriari et al. (2016); Brochu et al. (2009); Sasena (2002); Frazier and Wang (2016). This tutorial differs from these others in its coverage of non-standard or “exotic” Bayesian optimization problems. It also differs in its substantial emphasis on acquisition functions, with less emphasis on GP regression. Finally, it includes what we believe is a novel analysis of expected improvement for noisy measurements, and argues that the acquisition function previously proposed by Scott et al. (2011) is the most natural way to apply the expected improvement acquisition function when measurements are noisy.


