Abstract
Artificial intelligence in games often leads to the problem of parameter tuning. Heuristics may have coefficients, and these should be tuned to maximize the win rate of the program. A possible approach is to build local quadratic models of the win rate as a function of the program's parameters. Many local regression algorithms have already been proposed for this task, but they are usually not sufficiently robust to deal automatically and efficiently with very noisy outputs and non-negative Hessians. The CLOP principle, which stands for Confident Local OPtimization, is a new approach to local regression that overcomes all these problems in a straightforward and efficient way. CLOP discards samples whose estimated value is confidently inferior to the mean of all samples. Experiments demonstrate that, when the function to be optimized is smooth, this method outperforms all other tested algorithms.
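To make the principle concrete, here is a minimal, hypothetical sketch in Python of the fit-discard-refit idea described above: it fits a weighted quadratic logistic model of the win rate to simulated game results, zeroes the weight of samples whose upper confidence bound on the estimated value falls below the mean of the retained samples, and refits. The one-dimensional parameter, the `true_win_rate` curve, the Newton/IRLS fitting routine, the 2-sigma confidence rule, and the fixed number of rounds are all illustrative assumptions, not the paper's actual implementation.

```python
# A minimal 1-D sketch of the CLOP idea from the abstract: fit a quadratic
# logistic model of the win rate, discard samples whose estimated value is
# confidently below the mean of all (retained) samples, then refit.
# The confidence rule, true_win_rate, and all constants are illustrative
# assumptions, not the paper's exact formulation.
import numpy as np

rng = np.random.default_rng(0)

def true_win_rate(x):
    # Hypothetical smooth win-rate curve with a maximum near x = 0.3.
    return 1.0 / (1.0 + np.exp(-(0.6 - 2.0 * (x - 0.3) ** 2)))

def design(x):
    # Quadratic design matrix for the logit of the win rate: [1, x, x^2].
    return np.column_stack([np.ones_like(x), x, x * x])

def fit_logistic(X, y, w, iters=25):
    # Weighted quadratic logistic regression by Newton's method (IRLS).
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = w * p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
        g = X.T @ (w * (y - p))
        beta += np.linalg.solve(H, g)
    return beta, np.linalg.inv(H)  # coefficient estimate and its covariance

# Simulated game results at parameter values drawn around the current guess.
x = rng.uniform(-1.0, 1.5, size=400)
y = (rng.uniform(size=x.size) < true_win_rate(x)).astype(float)
X = design(x)
w = np.ones_like(y)

for _ in range(3):  # a few rounds of fit -> discard -> refit
    beta, cov = fit_logistic(X, y, w)
    logit = X @ beta
    se = np.sqrt(np.einsum("ij,jk,ik->i", X, cov, X))  # std. error of each sample's logit
    mean_logit = np.average(logit, weights=w)
    # Discard samples whose upper confidence bound is still below the mean value.
    w = np.where(logit + 2.0 * se < mean_logit, 0.0, 1.0)

# Maximizer of the fitted quadratic logit, valid when the curvature is negative.
a, b = beta[2], beta[1]
if a < 0:
    print("estimated optimum:", -b / (2.0 * a))
```

This toy loop only illustrates the flavor of the discarding rule; the paper develops it within a proper regression framework with its own confidence estimates and sampling strategy.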
Keywords
- Quadratic Regression
- Winter Simulation
- Noisy Observation
- Noisy Function
- Noisy Optimization
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Coulom, R. (2012). CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning. In: van den Herik, H.J., Plaat, A. (eds) Advances in Computer Games. ACG 2011. Lecture Notes in Computer Science, vol 7168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31866-5_13
DOI: https://doi.org/10.1007/978-3-642-31866-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31865-8
Online ISBN: 978-3-642-31866-5
