Data Mining and Knowledge Discovery, Volume 31, Issue 6, pp 1643–1677

Archetypoid analysis for sports analytics

  • G. Vinué
  • I. Epifanio
Part of the following topical collections:
  1. Sports Analytics


We aim to make sense of the growing amount of sports performance data by finding extreme data points, which makes human interpretation easier. In archetypoid analysis, each datum is expressed as a mixture of actual observations (archetypoids). It therefore allows us not only to identify extreme athletes and teams, but also to describe the composition of the other athletes (or teams) in terms of the archetypoid athletes, and to establish a ranking. The utility of archetypoids in sports is illustrated with basketball and soccer data in three scenarios. The first uses multivariate data, where archetypoids are compared with alternative methods and give the best results. Second, although functional data (time series or trajectories) are common in sports, functional data analysis has not been exploited until now because of the sparseness of the functions. In this second scenario, we extend archetypoid analysis to sparse functional data, which also shows the potential of functional data analysis in sports analytics. Finally, in the third scenario, features are not available, so we use proximities, and we extend archetypoid analysis to data with asymmetric relations. This study provides valuable knowledge about player, team and league performance that can be used to analyze athletes’ careers.
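The central idea, that each datum is a mixture of a few actual observations, can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which this abstract does not describe. It assumes a tiny made-up data set of two-statistic "players", finds the mixture weights by projected gradient descent onto the simplex, and selects the archetypoids by exhaustive search over subsets, which is feasible only for toy sizes. All function names and data values are hypothetical.

```python
from itertools import combinations


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    running_sum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        running_sum += u_j
        candidate = (running_sum - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def mix(weights, archetypoids):
    """Convex combination sum_j w_j * z_j of the archetypoid rows."""
    dim = len(archetypoids[0])
    return [sum(w * z[d] for w, z in zip(weights, archetypoids)) for d in range(dim)]


def mixture_weights(x, archetypoids, n_iter=500):
    """Weights on the simplex that minimise ||x - sum_j w_j * z_j||^2."""
    k = len(archetypoids)
    w = [1.0 / k] * k
    frob_sq = sum(c * c for z in archetypoids for c in z)
    if frob_sq == 0.0:
        return w
    # Conservative step size: 1 / (upper bound on the gradient's Lipschitz constant).
    step = 1.0 / (2.0 * frob_sq)
    for _ in range(n_iter):
        resid = [a - b for a, b in zip(mix(w, archetypoids), x)]
        grad = [2.0 * sum(r * c for r, c in zip(resid, z)) for z in archetypoids]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w


def rss(data, idx):
    """Residual sum of squares when the rows in idx act as archetypoids."""
    archetypoids = [data[i] for i in idx]
    total = 0.0
    for x in data:
        approx = mix(mixture_weights(x, archetypoids), archetypoids)
        total += sum((a - b) ** 2 for a, b in zip(approx, x))
    return total


def brute_force_archetypoids(data, k):
    """Exhaustive search over all k-subsets (illustration only, toy sizes)."""
    return min(combinations(range(len(data)), k), key=lambda idx: rss(data, idx))


# Hypothetical data: three extreme "players" followed by three mixed profiles.
toy_stats = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.2, 0.5), (0.3, 0.3), (0.5, 0.4)]
best = brute_force_archetypoids(toy_stats, 3)
weights = mixture_weights(toy_stats[3], [toy_stats[i] for i in best])
print(best, [round(w, 3) for w in weights])
```

In this toy example the three extreme rows are selected, and the weights of each remaining row describe its composition in terms of them. Ordering observations by the weight they place on a chosen archetypoid is one simple way to obtain a ranking of the kind the abstract mentions.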


Archetype analysis; Sports data mining; Functional data analysis; Extreme point; Multidimensional scaling; Performance analysis



The authors would like to thank the Editors and three reviewers for their very constructive suggestions, which have led to improvements in the manuscript.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Department of Statistics and O.R., University of Valencia, Burjassot, Spain
  2. Dept. Matemàtiques and Institut de Matemàtiques i Aplicacions de Castelló, Campus del Riu Sec, Universitat Jaume I, Castelló, Spain
