The Evaluation of Rating Systems in Online Free-for-All Games

  • Conference paper
Advances in Data Science and Information Engineering

Abstract

Online competitive games have become increasingly popular. To ensure an exciting and competitive environment, these games routinely attempt to match players with similar skill levels. Matching players is often accomplished through a rating system. There has been an increasing amount of research on developing such rating systems. However, less attention has been given to the evaluation metrics of these systems. In this paper, we present an exhaustive analysis of six metrics for evaluating rating systems in online competitive games. We compare traditional metrics such as accuracy. We then introduce other metrics adapted from the field of information retrieval. We evaluate these metrics against several well-known rating systems on a large real-world dataset of over 100,000 free-for-all matches. Our results show stark differences in their utility. Some metrics do not consider deviations between two ranks. Others are inordinately impacted by new players. Many do not capture the importance of distinguishing between errors in higher ranks and lower ranks. Among all metrics studied, we recommend normalized discounted cumulative gain (NDCG) because not only does it resolve the issues faced by other metrics, but it also offers flexibility to adjust the evaluations based on the goals of the system.
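To make the recommended metric concrete, the following is a minimal sketch of NDCG applied to a single free-for-all match, using the standard information-retrieval formulation with a log2 position discount. The paper's exact gain and discount functions are not given in this preview, so the relevance mapping (n - rank + 1) and the names `dcg`, `ndcg`, `predicted_order`, and `observed_ranks` are illustrative assumptions, not the authors' definitions.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based position i is divided by log2(i + 1)."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def ndcg(predicted_order, observed_ranks):
    """NDCG of a predicted player ordering against the observed match result.

    predicted_order: player ids, best predicted player first.
    observed_ranks:  dict mapping player id -> observed rank (1 = winner).
    Assumption: relevance is n - rank + 1, so the winner carries the largest gain.
    """
    n = len(observed_ranks)
    gains = [n - observed_ranks[p] + 1 for p in predicted_order]
    ideal = sorted(gains, reverse=True)  # gains in the best possible order
    return dcg(gains) / dcg(ideal)


# Hypothetical four-player match: a finished first, d finished last.
observed = {"a": 1, "b": 2, "c": 3, "d": 4}

perfect = ndcg(["a", "b", "c", "d"], observed)
top_swap = ndcg(["b", "a", "c", "d"], observed)     # error among the top ranks
bottom_swap = ndcg(["a", "b", "d", "c"], observed)  # error among the bottom ranks
```

Under these assumptions a perfect prediction scores 1, and swapping the top two players lowers the score more than swapping the bottom two, which is the sensitivity to errors in higher ranks that the abstract highlights. Changing the gain or discount function is one way the metric could be tuned to the goals of a particular system.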

Author information

Corresponding author

Correspondence to Arman Dehpanah.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Dehpanah, A., Ghori, M.F., Gemmell, J., Mobasher, B. (2021). The Evaluation of Rating Systems in Online Free-for-All Games. In: Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, CY., Arabnia, H.R., Deligiannidis, L. (eds) Advances in Data Science and Information Engineering. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71704-9_9
