Abstract
Online competitive games have become increasingly popular. To ensure an exciting and competitive environment, these games routinely attempt to match players of similar skill, which is often accomplished through a rating system. A growing body of research has focused on developing such rating systems; however, less attention has been given to the metrics used to evaluate them. In this paper, we present an exhaustive analysis of six metrics for evaluating rating systems in online competitive games. We compare traditional metrics, such as accuracy, with metrics that we adapt from the field of information retrieval. We evaluate these metrics against several well-known rating systems on a large real-world dataset of over 100,000 free-for-all matches. Our results show stark differences in the utility of the metrics. Some metrics do not consider how far a predicted rank deviates from the observed rank. Others are disproportionately affected by new players. Many do not capture the importance of distinguishing between errors in higher ranks and errors in lower ranks. Among all the metrics studied, we recommend normalized discounted cumulative gain (NDCG) because it not only resolves the issues faced by the other metrics but also offers the flexibility to adjust evaluations to the goals of the system.
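The contrast drawn above, between metrics that ignore where a ranking error occurs and NDCG, can be illustrated with a small sketch. It assumes a single free-for-all match with no tied placements, a predicted ordering obtained by sorting players by their rating (best first), a linear relevance of n − placement + 1, and the common log2(position + 1) discount. Pairwise accuracy stands in here for a rank-insensitive metric. These choices and the helper names `dcg`, `ndcg`, and `pairwise_accuracy` are illustrative assumptions, not the paper's exact definitions.

```python
import math
from itertools import combinations


def dcg(relevances):
    # Discounted cumulative gain: linear gain, discounted by log2(position + 1).
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg(predicted_order, observed_rank):
    """predicted_order: player ids sorted by predicted skill, best first.
    observed_rank: dict mapping player id -> final placement (1 = winner).
    Assumption: no tied placements."""
    n = len(observed_rank)
    # Assumed relevance scheme: the winner gets n, last place gets 1.
    relevance = {p: n - r + 1 for p, r in observed_rank.items()}
    gains = [relevance[p] for p in predicted_order]
    ideal = sorted(relevance.values(), reverse=True)
    return dcg(gains) / dcg(ideal)


def pairwise_accuracy(predicted_order, observed_rank):
    # Fraction of player pairs whose relative order the prediction gets right.
    position = {p: i for i, p in enumerate(predicted_order)}
    pairs = list(combinations(observed_rank, 2))
    correct = sum(
        (position[a] < position[b]) == (observed_rank[a] < observed_rank[b])
        for a, b in pairs
    )
    return correct / len(pairs)


# Hypothetical five-player match: "a" won, "e" finished last.
observed = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
swap_top = ["b", "a", "c", "d", "e"]     # top two players swapped
swap_bottom = ["a", "b", "c", "e", "d"]  # bottom two players swapped

print(pairwise_accuracy(swap_top, observed), pairwise_accuracy(swap_bottom, observed))
# 0.9 0.9
print(round(ndcg(swap_top, observed), 3), round(ndcg(swap_bottom, observed), 3))
# 0.964 0.996
```

Both orderings misplace exactly one pair, so pairwise accuracy scores them identically, while NDCG penalizes the error among the top finishers more heavily. Changing the relevance scheme or the discount is one way to tune this emphasis to the goals of a particular system.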
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dehpanah, A., Ghori, M.F., Gemmell, J., Mobasher, B. (2021). The Evaluation of Rating Systems in Online Free-for-All Games. In: Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, CY., Arabnia, H.R., Deligiannidis, L. (eds) Advances in Data Science and Information Engineering. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71704-9_9
DOI: https://doi.org/10.1007/978-3-030-71704-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71703-2
Online ISBN: 978-3-030-71704-9
eBook Packages: Mathematics and Statistics (R0)