RSPSA: Enhanced Parameter Optimization in Games

  • Levente Kocsis
  • Csaba Szepesvári
  • Mark H. M. Winands
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4250)


Most game programs have a large number of parameters that are crucial for their performance. Tuning these parameters by hand is difficult, so automatic optimization of game-program parameters is an attractive research topic. However, successful applications are known only for parameters that belong to certain components (e.g., evaluation-function parameters). The SPSA (Simultaneous Perturbation Stochastic Approximation) algorithm is an attractive choice for optimizing any kind of parameter of a game program, both for its generality and for its simplicity. Its disadvantage is that it can be very slow.

In this article we propose several methods to speed up SPSA: combining it with RPROP, using common random numbers and antithetic variables, and averaging. We test the resulting algorithm on tuning various types of parameters in two domains, Poker and Lines of Action (LOA). From the experimental study, we conclude that SPSA is a viable approach for optimization in game programs, in particular if no good alternative exists for the types of parameters considered.
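To make two of the ingredients named above concrete, here is a minimal sketch of textbook two-sided SPSA with averaging over several perturbations and common random numbers. It is not the authors' RSPSA: the RPROP step-size rule and antithetic variables are left out, and the function names, the `f(theta, seed)` signature, the toy objective, and all constants are assumptions made for illustration.

```python
import random


def spsa_gradient(f, theta, c, rng, num_samples=1):
    """Average `num_samples` two-sided SPSA gradient estimates of f at theta.

    `f(theta, seed)` is a hypothetical noisy objective; passing the same
    seed to both evaluations is how common random numbers are used here.
    """
    d = len(theta)
    grad = [0.0] * d
    for _ in range(num_samples):
        # Perturb all coordinates at once with random signs (+1 or -1).
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        plus = [t + c * dl for t, dl in zip(theta, delta)]
        minus = [t - c * dl for t, dl in zip(theta, delta)]
        # Common random numbers: both sides see the same random seed.
        seed = rng.getrandbits(32)
        diff = f(plus, seed) - f(minus, seed)
        for i in range(d):
            grad[i] += diff / (2.0 * c * delta[i]) / num_samples
    return grad


def spsa_minimize(f, theta, a=0.1, c=0.1, iterations=200,
                  num_samples=4, seed=0):
    """Plain gradient descent on the SPSA estimate with a fixed step size a."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        g = spsa_gradient(f, theta, c, rng, num_samples)
        theta = [t - a * gi for t, gi in zip(theta, g)]
    return theta


def noisy_quadratic(theta, seed):
    """Toy objective (assumption): quadratic bowl at (1, ..., 1) plus noise."""
    noise = random.Random(seed).gauss(0.0, 0.1)
    return sum((t - 1.0) ** 2 for t in theta) + noise


if __name__ == "__main__":
    print(spsa_minimize(noisy_quadratic, [0.0, 3.0]))
```

Each gradient estimate costs two evaluations of the objective regardless of the number of parameters, which is what makes SPSA general. In this toy objective the noise is additive, so sharing the seed cancels it exactly in the difference; in a game program the cancellation would only be partial (for example, reusing the same card deals or opening positions for both parameter settings).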


Keywords: Stochastic Approximation; Perturbation Vector; Opponent Model; Realization Probability; Simultaneous Perturbation Stochastic Approximation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Levente Kocsis (1)
  • Csaba Szepesvári (1)
  • Mark H. M. Winands (2)
  1. MTA Sztaki, Budapest, Hungary
  2. Institute for Knowledge and Agent Technology, MICC, Universiteit Maastricht, Maastricht, The Netherlands
