Dolores: a model that predicts football match outcomes from all over the world
The paper describes Dolores, a model designed to predict football match outcomes in one country by observing football matches in multiple other countries. The model is a mixture of two methods: (a) dynamic ratings and (b) Hybrid Bayesian Networks. It was developed as part of the international special issue competition Machine Learning for Soccer. Unlike past academic literature, which tends to focus on a single league or tournament, Dolores is trained on a single dataset that incorporates match outcomes, with missing data (as part of the challenge), from 52 football leagues from all over the world. The challenge involved using a single model to predict 206 future match outcomes from 26 different leagues, played from March 31 to April 9, 2017. Dolores ranked 2nd in the competition, with a predictive error 0.94% higher than that of the top participant and 116.78% lower than that of the bottom participant. The paper also assesses the model's profitability against published market odds. Given that the training dataset incorporates a number of challenges as part of the competition, the results suggest that the model generalised well over multiple leagues, divisions, and seasons. Furthermore, while detailed historical performance for each team helps to maximise predictive accuracy, Dolores provides empirical proof that a model can make a good prediction for a match between teams x and y even when the prediction is derived from historical data on matches in which neither x nor y participated. While this agrees with past studies in football and other sports, this paper extends the empirical evidence to historical training data that is not limited to match results from a single competition but spans different leagues and divisions from 35 different countries.
This implies that we can still predict, for example, the outcome of English Premier League matches based on training data from Japan, New Zealand, Mexico, South Africa, Russia, and other countries, in addition to data from the English Premier League.
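The abstract reports a "predictive error" without naming the metric. As a minimal sketch, assuming the error is a ranked probability score (RPS), a common measure for ordered home/draw/away forecasts, the per-match computation could look like the following. The function name and the example probabilities are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """Ranked probability score for a single match (assumed metric).

    probs   -- forecast probabilities in the order (home win, draw, away win)
    outcome -- index of the observed result in that same order
    Lower is better: 0.0 is a perfect forecast, 1.0 the worst possible.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if not 0 <= outcome < len(probs):
        raise ValueError("outcome index out of range")

    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    total = 0.0
    # Compare cumulative forecast and cumulative observed distributions
    # over the first r - 1 ordered categories.
    for i in range(len(probs) - 1):
        cumulative_forecast += probs[i]
        cumulative_observed += 1.0 if i == outcome else 0.0
        total += (cumulative_forecast - cumulative_observed) ** 2
    return total / (len(probs) - 1)


# Hypothetical forecast: 50% home win, 30% draw, 20% away win; home team wins.
example_rps = ranked_probability_score([0.5, 0.3, 0.2], outcome=0)
```

Averaging this score over all predicted matches would give a single error figure per participant, which is the kind of quantity the percentage comparisons above refer to under this assumption.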
Keywords: Association football; Bayesian Networks; Dynamic ratings; Football betting; Soccer prediction; Time-series analysis
This study was partly supported by the European Research Council (ERC), Research Project ERC-2013-AdG339182-BAYES_KNOWLEDGE.