The Evaluation of Rating Systems in Online Free-for-All Games

Dehpanah, Arman; Ghori, Muheeb Faizan; Gemmell, Jonathan; Mobasher, Bamshad

doi:10.1007/978-3-030-71704-9_9

Part of the book series: Transactions on Computational Science and Computational Intelligence ((TRACOSCI))

Abstract

Online competitive games have become increasingly popular. To ensure an exciting and competitive environment, these games routinely attempt to match players with similar skill levels. Matching players is often accomplished through a rating system. There has been an increasing amount of research on developing such rating systems. However, less attention has been given to the evaluation metrics of these systems. In this paper, we present an exhaustive analysis of six metrics for evaluating rating systems in online competitive games. We compare traditional metrics such as accuracy. We then introduce other metrics adapted from the field of information retrieval. We evaluate these metrics against several well-known rating systems on a large real-world dataset of over 100,000 free-for-all matches. Our results show stark differences in their utility. Some metrics do not consider deviations between two ranks. Others are inordinately impacted by new players. Many do not capture the importance of distinguishing between errors in higher ranks and lower ranks. Among all metrics studied, we recommend normalized discounted cumulative gain (NDCG) because not only does it resolve the issues faced by other metrics, but it also offers flexibility to adjust the evaluations based on the goals of the system.
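
As a rough illustration of why the abstract singles out NDCG, the sketch below scores a predicted finishing order for one free-for-all match against the observed result. It is not the chapter's implementation: the function names `dcg` and `ndcg`, the linear gain (a player who finished in place r among n players receives n - r + 1), and the base-2 logarithmic discount are all assumptions chosen for this example.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: positions further down the list count less."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def ndcg(predicted_order, observed_rank):
    """NDCG of a predicted finishing order for one free-for-all match.

    predicted_order: player ids, best predicted finisher first.
    observed_rank:   dict mapping player id -> observed place (1 = winner).
    The gain n - rank + 1 is an assumed choice, not taken from the chapter.
    """
    n = len(predicted_order)
    gains = [n - observed_rank[p] + 1 for p in predicted_order]
    ideal = sorted(gains, reverse=True)  # gains in the best possible order
    return dcg(gains) / dcg(ideal)


# Hypothetical four-player match in which a won and d came last.
observed = {"a": 1, "b": 2, "c": 3, "d": 4}
perfect = ndcg(["a", "b", "c", "d"], observed)
top_swap = ndcg(["b", "a", "c", "d"], observed)     # error among the top two
bottom_swap = ndcg(["a", "b", "d", "c"], observed)  # error among the bottom two
print(perfect, top_swap, bottom_swap)
```

Under these assumptions a perfect prediction scores 1, and swapping the top two players lowers the score more than swapping the bottom two, whereas a metric that only counts misplaced players would treat both errors alike. Changing the gain or the discount is one way to adjust the evaluation to the goals of the system.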

Author information

Authors and Affiliations

School of Computing, DePaul University, Chicago, IL, USA
Arman Dehpanah, Muheeb Faizan Ghori, Jonathan Gemmell & Bamshad Mobasher

Corresponding author

Correspondence to Arman Dehpanah.

Editor information

Editors and Affiliations

HBS – Hamburg Business School, Institute of Information Systems, University of Hamburg, Hamburg, Hamburg, Germany
Robert Stahlbock
Department of Computer & Information Science, Fordham University, New York, NY, USA
Gary M. Weiss
College of Engineering & Computer Science, University of Michigan-Dearborn, Dearborn, MI, USA
Mahmoud Abou-Nasr
Department of Computer Science, University of Taipei, Taipei City, Taiwan
Cheng-Ying Yang
Department of Computer Science, University of Georgia, Athens, GA, USA
Hamid R. Arabnia
School of Computing and Data Sciences, Wentworth Institute of Technology, Boston, MA, USA
Leonidas Deligiannidis

Cite this paper

Dehpanah, A., Ghori, M.F., Gemmell, J., Mobasher, B. (2021). The Evaluation of Rating Systems in Online Free-for-All Games. In: Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, CY., Arabnia, H.R., Deligiannidis, L. (eds) Advances in Data Science and Information Engineering. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-71704-9_9

DOI: https://doi.org/10.1007/978-3-030-71704-9_9
Published: 30 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71703-2
Online ISBN: 978-3-030-71704-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
