Automatic attribute construction for basketball modelling

  • Petar Vračar
  • Erik Štrumbelj
  • Igor Kononenko
Regular Paper


We address the problem of automatic extraction of patterns in the sequence of events in basketball games and construction of statistical models for generating a plausible simulation of a match between two distinct teams. We present a method for automatic construction of an attribute space which requires very little expert knowledge. The attributes are defined as the ratio between the number of entries and exits from higher-level concepts that are identified as groups of similar in-game events. The similarity between events is determined by the similarity between probability distributions describing the preceding and the following events in the observed sequences of game progression. The methodology is general and is applicable to any sports game that can be modelled as a random walk through the state space. Experiments on basketball show that automatically generated attributes are as informative as those derived using expert knowledge. Furthermore, the obtained simulations are in line with empirical data.
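The event-similarity idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes games are given as lists of event labels, estimates for each event the empirical distributions of the immediately preceding and following events, and compares two events by averaging the Jensen-Shannon divergence of those two distributions. The choice of Jensen-Shannon divergence, the function names, and the event labels are all assumptions made for this example; the abstract does not specify the distance measure.

```python
from collections import Counter, defaultdict
from math import log2


def context_distributions(sequences):
    """Estimate, for every event, the distributions of preceding and following events."""
    prev_counts = defaultdict(Counter)
    next_counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            next_counts[a][b] += 1  # b follows a
            prev_counts[b][a] += 1  # a precedes b

    def normalise(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()} if total else {}

    events = {e for seq in sequences for e in seq}
    return {e: (normalise(prev_counts[e]), normalise(next_counts[e])) for e in events}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def event_distance(ctx, a, b):
    """Average the divergences of the 'before' and 'after' context distributions."""
    (prev_a, next_a), (prev_b, next_b) = ctx[a], ctx[b]
    return 0.5 * (js_divergence(prev_a, prev_b) + js_divergence(next_a, next_b))


# Hypothetical play-by-play fragments; the labels are illustrative only.
games = [
    ["rebound", "shot2", "made", "inbound"],
    ["rebound", "shot3", "made", "inbound"],
    ["rebound", "shot2", "miss", "rebound"],
    ["rebound", "shot3", "miss", "rebound"],
]
ctx = context_distributions(games)
d_similar = event_distance(ctx, "shot2", "shot3")
d_different = event_distance(ctx, "shot2", "made")
```

In this toy data, `shot2` and `shot3` occur in identical contexts, so their distance is zero and a clustering step would place them in the same higher-level concept, whereas `shot2` and `made` share no context and sit at the maximum distance.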


Keywords: Sports modelling, Markov process, Attribute construction, Match simulation, NBA




Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
