Dolores: a model that predicts football match outcomes from all over the world

  • Anthony C. Constantinou
Article
Part of the following topical collections:
  1. Special Issue on Machine Learning for Soccer

Abstract

The paper describes Dolores, a model designed to predict football match outcomes in one country by observing football matches in multiple other countries. The model is a mixture of two methods: (a) dynamic ratings and (b) Hybrid Bayesian Networks. It was developed as part of the international special issue competition Machine Learning for Soccer. Unlike past academic literature, which tends to focus on a single league or tournament, Dolores is trained with a single dataset that incorporates match outcomes, with missing data (as part of the challenge), from 52 football leagues from all over the world. The challenge involved using a single model to predict 206 future match outcomes from 26 different leagues, played from March 31 to April 9 in 2017. Dolores ranked 2nd in the competition, with a predictive error 0.94% higher than that of the top participant and 116.78% lower than that of the bottom participant. The paper extends the assessment of the model in terms of profitability against published market odds. Given that the training dataset incorporates a number of challenges as part of the competition, the results suggest that the model generalised well over multiple leagues, divisions, and seasons. Furthermore, while detailed historical performance for each team helps to maximise predictive accuracy, Dolores provides empirical evidence that a model can make a good prediction for a match outcome between teams x and y even when the prediction is derived from historical match data that neither x nor y participated in. While this agrees with past studies in football and other sports, this paper extends the empirical evidence to historical training data that does not just include match results from a single competition but contains results spanning different leagues and divisions from 35 different countries. This implies that we can still predict, for example, the outcome of English Premier League matches based on training data from Japan, New Zealand, Mexico, South Africa, Russia, and other countries, in addition to data from the English Premier League.
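
To make the notion of "predictive error" for a match forecast concrete, the following is a minimal sketch. It assumes the error is measured with the Rank Probability Score (RPS), which the abstract itself does not state; the function name `rank_probability_score` and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def rank_probability_score(probs: Sequence[float], outcome_index: int) -> float:
    """RPS for one match with ordered outcomes, e.g. (home win, draw, away win).

    Assumption: RPS serves here only as an illustrative error measure.
    Lower is better; 0.0 means a perfect, fully confident forecast.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if not 0 <= outcome_index < len(probs):
        raise ValueError("outcome_index out of range")

    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    # Sum squared differences between the cumulative forecast and the
    # cumulative observed outcome over the first r-1 categories
    # (the final cumulative difference is always zero).
    for i in range(len(probs) - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome_index else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (len(probs) - 1)


# Hypothetical forecast: 50% home win, 30% draw, 20% away win.
forecast = [0.5, 0.3, 0.2]
rps_home = rank_probability_score(forecast, 0)  # home win observed
rps_away = rank_probability_score(forecast, 2)  # away win observed
```

Because the score works on cumulative probabilities, the same forecast is penalised more when the observed result is further from the favoured outcome: here `rps_away` is larger than `rps_home`.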

Keywords

Association football, Bayesian Networks, Dynamic ratings, Football betting, Soccer prediction, Time-series analysis

Notes

Acknowledgements

This study was partly supported by the European Research Council (ERC), Research Project ERC-2013-AdG339182-BAYES_KNOWLEDGE.

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Risk and Information Management (RIM) Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
