Acknowledgments
While writing this tutorial, the author was supported by the Air Force Office of Scientific Research and the National Science Foundation (AFOSR FA9550-15-1-0038 and NSF CMMI-1254298, CMMI-1536895, DMR-1719875, and DMR-1120296). The author would also like to thank Roman Garnett, whose suggestions helped shape the discussion of expected improvement with noise, and several anonymous reviewers.
References
Ahmed, M. O., Shahriari, B., and Schmidt, M. (2016). Do we need “harmless” Bayesian optimization and “first-order” Bayesian optimization? In Neural Information Processing Systems 2016 Workshop on Bayesian Optimization.
Berger, J. O. (2013). Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media.
Blum, J. R. (1954). Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics, pages 737–744.
Booker, A., Dennis, J., Frank, P., Serafini, D., Torczon, V., and Trosset, M. (1999). A rigorous framework for optimization of expensive functions by surrogates. Structural and Multidisciplinary Optimization, 17(1):1–13.
Bottou, L. (2012). Stochastic gradient descent tricks. In Montavon, G., Orr, G. B., and Müller, K. R., editors, Neural Networks: Tricks of the Trade, pages 421–436. Springer.
Brochu, E., Cora, V. M., and de Freitas, N. (2009). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report TR-2009-023, Department of Computer Science, University of British Columbia. arXiv:1012.2599.
Bull, A. D. (2011). Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12(Oct):2879–2904.
Calvin, J. (1997). Average performance of a class of adaptive algorithms for global optimization. The Annals of Applied Probability, 7(3):711–730.
Calvin, J. and Žilinskas, A. (1999). On the convergence of the P-algorithm for one-dimensional global optimization of smooth functions. Journal of Optimization Theory and Applications, 102(3):479–495.
Calvin, J. and Žilinskas, A. (2000). One-dimensional P-algorithm with convergence rate O(n^{-3+δ}) for smooth functions. Journal of Optimization Theory and Applications, 106(2):297–307.
Calvin, J. and Žilinskas, A. (2005). One-dimensional global optimization for observations with noise. Computers & Mathematics with Applications, 50(1-2):157–169.
Cashore, J. M., Kumarga, L., and Frazier, P. I. (2016). Multi-step Bayesian optimization for one-dimensional feasibility determination. arXiv preprint arXiv:1607.03195.
Chang, P. B., Williams, B. J., Bhalla, K. S. B., Belknap, T. W., Santner, T. J., Notz, W. I., and Bartel, D. L. (2001). Design and analysis of robust total joint replacements: finite element model experiments with environmental variables. Journal of Biomechanical Engineering, 123(3):239–246.
Chick, S. E. and Inoue, K. (2001). New two-stage and sequential procedures for selecting the best simulated system. Operations Research, 49(5):732–743.
Clark, C. E. (1961). The greatest of a finite set of random variables. Operations Research, 9(2):145–162.
Cover, T. M. and Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.
Dynkin, E. and Yushkevich, A. (1979). Controlled Markov Processes. Springer, New York.
Forrester, A., Sóbester, A., and Keane, A. (2008). Engineering Design via Surrogate Modelling: A Practical Guide. Wiley, West Sussex, UK.
Forrester, A. I., Sóbester, A., and Keane, A. J. (2007). Multi-fidelity optimization via surrogate modelling. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 463, pages 3251–3269. The Royal Society.
Frazier, P., Powell, W., and Dayanik, S. (2009). The knowledge-gradient policy for correlated normal beliefs. INFORMS Journal on Computing, 21(4):599–613.
Frazier, P. I. (2012). Tutorial: Optimization via simulation with Bayesian statistics and dynamic programming. In Laroque, C., Himmelspach, J., Pasupathy, R., Rose, O., and Uhrmacher, A. M., editors, Proceedings of the 2012 Winter Simulation Conference, pages 79–94, Piscataway, New Jersey. Institute of Electrical and Electronics Engineers, Inc.
Frazier, P. I., Powell, W. B., and Dayanik, S. (2008). A knowledge-gradient policy for sequential information collection. SIAM Journal on Control and Optimization, 47(5):2410–2439.
Frazier, P. I. and Wang, J. (2016). Bayesian optimization for materials design. In Lookman, T., Alexander, F. J., and Rajan, K., editors, Information Science for Materials Discovery and Design, pages 45–75. Springer.
Gardner, J. R., Kusner, M. J., Xu, Z. E., Weinberger, K. Q., and Cunningham, J. P. (2014). Bayesian optimization with inequality constraints. In ICML, pages 937–945.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis, volume 2. CRC Press, Boca Raton, FL.
Ginsbourger, D., Le Riche, R., and Carraro, L. (2007). A multi-points criterion for deterministic parallel global optimization based on kriging. In International Conference on Nonconvex Programming, NCP07, Rouen, France.
Ginsbourger, D., Le Riche, R., and Carraro, L. (2010). Kriging is well-suited to parallelize optimization. In Tenne, Y. and Goh, C. K., editors, Computational Intelligence in Expensive Optimization Problems, volume 2, pages 131–162. Springer.
Ginsbourger, D. and Le Riche, R. (2010). Towards Gaussian process-based optimization with finite time horizon. In Giovagnoli, A., Atkinson, A., Torsney, B., and May, C., editors, mODa 9–Advances in Model-Oriented Design and Analysis, pages 89–96. Springer.
González, J., Osborne, M., and Lawrence, N. (2016). GLASSES: Relieving the myopia of Bayesian optimisation. In Artificial Intelligence and Statistics, pages 790–799.
Groot, P., Birlutiu, A., and Heskes, T. (2010). Bayesian Monte Carlo for the global optimization of expensive functions. In ECAI, pages 249–254.
Hennig, P. and Schuler, C. J. (2012). Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13:1809–1837.
Hernández-Lobato, J. M., Gelbart, M. A., Hoffman, M. W., Adams, R. P., and Ghahramani, Z. (2015). Predictive entropy search for Bayesian optimization with unknown constraints. In Proceedings of the 32nd International Conference on Machine Learning, pages 1699–1707. JMLR.org.
Hernández-Lobato, J. M., Hoffman, M. W., and Ghahramani, Z. (2014). Predictive entropy search for efficient global optimization of black-box functions. In Advances in Neural Information Processing Systems, pages 918–926.
Ho, Y.-C., Cao, X., and Cassandras, C. (1983). Infinitesimal and finite perturbation analysis for queueing networks. Automatica, 19(4):439–445.
Huang, D., Allen, T., Notz, W., and Miller, R. (2006). Sequential kriging optimization using multiple-fidelity evaluations. Structural and Multidisciplinary Optimization, 32(5):369–382.
Jedynak, B., Frazier, P. I., and Sznitman, R. (2012). Twenty questions with noise: Bayes optimal policies for entropy loss. Journal of Applied Probability, 49(1):114–136.
Jones, D. R., Schonlau, M., and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492.
Ju, S., Shiga, T., Feng, L., Hou, Z., Tsuda, K., and Shiomi, J. (2017). Designing nanostructures for phonon transport via Bayesian optimization. Physical Review X, 7:021024.
Kaelbling, L. P., Littman, M. L., and Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285.
Kandasamy, K., Dasarathy, G., Oliva, J. B., Schneider, J., and Póczos, B. (2016). Gaussian process bandit optimisation with multi-fidelity evaluations. In Advances in Neural Information Processing Systems, pages 992–1000.
Kandasamy, K., Schneider, J., and Poczos, B. (2015). High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pages 295–304.
Keane, A. (2006). Statistical improvement criteria for use in multiobjective design optimization. AIAA Journal, 44(4):879–891.
Kersting, K., Plagemann, C., Pfaff, P., and Burgard, W. (2007). Most likely heteroscedastic Gaussian process regression. In Proceedings of the 24th International Conference on Machine Learning, pages 393–400. ACM.
Kleijnen, J. P. C. (2008). Design and Analysis of Simulation Experiments, volume 20. Springer.
Klein, A., Falkner, S., Bartels, S., Hennig, P., and Hutter, F. (2016). Fast Bayesian optimization of machine learning hyperparameters on large datasets. arXiv preprint arXiv:1605.07079.
Knowles, J. (2006). ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation, 10(1):50–66.
Kushner, H. J. (1964). A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86(1):97–106.
Lam, R., Allaire, D. L., and Willcox, K. E. (2015). Multifidelity optimization using statistical surrogate modeling for non-hierarchical information sources. In 56th AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, page 0143.
Lam, R., Willcox, K., and Wolpert, D. H. (2016). Bayesian optimization with a finite budget: An approximate dynamic programming approach. In Advances in Neural Information Processing Systems, pages 883–891.
Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528.