Universal parameter optimisation in games based on SPSA

Kocsis, Levente; Szepesvári, Csaba

doi:10.1007/s10994-006-6888-8

Published: 28 March 2006

Machine Learning, volume 63, pages 249–286 (2006)

Abstract

Most game programs have a large number of parameters that are crucial for their performance. While tuning these parameters by hand is rather difficult, efficient and easy-to-use generic automatic parameter optimisation algorithms are known only for special problems such as the adjustment of the parameters of an evaluation function. The SPSA algorithm (Simultaneous Perturbation Stochastic Approximation) is a generic stochastic gradient method for optimising an objective function when an analytic expression of the gradient is not available, a frequent case in game programs. Further, SPSA in its canonical form is very easy to implement. As such, it is an attractive choice for parameter optimisation in game programs, due to both its generality and its simplicity. The goal of this paper is twofold: (i) to introduce SPSA to the game programming community by putting it into a game-programming perspective, and (ii) to propose and discuss several methods that can be used to enhance the performance of SPSA. These methods include using common random numbers and antithetic variables, a combination of SPSA with RPROP, and the reuse of samples of previous performance evaluations. SPSA with the proposed enhancements was tested in some large-scale experiments on tuning the parameters of an opponent model, a policy and an evaluation function in our poker program, MCRAISE. Whilst SPSA with no enhancements failed to make progress using the allocated resources, SPSA with the enhancements proved to be competitive with other methods, including TD-learning, increasing the average payoff per game by as much as 0.19 times the small bet. From the experimental study, we conclude that the use of an appropriately enhanced variant of SPSA for the optimisation of game program parameters is a viable approach, especially if no good alternative exists for the types of parameters considered.
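To make the idea of a simultaneous-perturbation gradient estimate concrete, the following is a minimal sketch of canonical SPSA, combined with a simple form of common random numbers. It is an illustration under stated assumptions, not the authors' implementation: the function names (`spsa_minimise`, `noisy_quadratic`), the interface in which the objective receives its own random generator, the toy quadratic objective, and the gain constants are all hypothetical choices made for this example. The decay exponents 0.602 and 0.101 are values commonly recommended in the SPSA literature.

```python
import random


def spsa_minimise(loss, theta, iterations=200, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimise a noisy loss with canonical SPSA (two evaluations per step).

    `loss(params, rng)` is an assumed interface: it returns a noisy
    performance measure and draws all of its randomness from `rng`.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** alpha  # decaying step size
        c_k = c / k ** gamma  # decaying perturbation size
        # Perturb every coordinate at once with independent +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Common random numbers: both evaluations reuse the same seed,
        # so shared noise cancels in the difference.
        eval_seed = rng.getrandbits(32)
        diff = (loss(plus, random.Random(eval_seed))
                - loss(minus, random.Random(eval_seed)))
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


def noisy_quadratic(params, rng):
    """Hypothetical stand-in for a game objective: minimum at (1, ..., 1)."""
    return sum((p - 1.0) ** 2 for p in params) + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    print(spsa_minimise(noisy_quadratic, [0.0, 0.0, 0.0]))
```

Each iteration needs only two evaluations of the objective regardless of the number of parameters, which is what makes the method attractive when a single evaluation means playing many games. In a game program, the objective would be replaced by a routine that plays games with the given parameters and returns the negated average payoff.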

Author information

Authors and Affiliations

MTA SZTAKI, Kende u. 13–17, Budapest, Hungary, 1111
Levente Kocsis & Csaba Szepesvári

Corresponding author

Correspondence to Csaba Szepesvári.

Additional information

Editors: Michael Bowling · Johannes Fürnkranz · Thore Graepel · Ron Musick

About this article

Cite this article

Kocsis, L., Szepesvári, C. Universal parameter optimisation in games based on SPSA. Mach Learn 63, 249–286 (2006). https://doi.org/10.1007/s10994-006-6888-8

Received: 11 February 2005
Revised: 12 September 2005
Accepted: 29 December 2005
Published: 28 March 2006
Issue Date: June 2006
DOI: https://doi.org/10.1007/s10994-006-6888-8
