
An Assessment of Football Through the Lens of Data Science

Annals of Data Science

Abstract

The rise of Data Science and the related fields of Big Data, Machine Learning, and Deep Learning has transformed the industrial landscape, and sports and sports analytics are no exception. Although their influence may not be evident to the layman, these fields have changed, to varying degrees, the way many sports are played. In recent years, sports institutions and clubs have therefore placed increasing importance on research that can give them a competitive edge over their rivals. Incorporating this research into how they compete has had effects both on and off the playing field. These effects include not only physiological enhancements for athletes but also socio-political and economic impacts. Of the various sports adopting these techniques, we focus on the effects of Data Science on Football (“Soccer” in the USA). What follows is a detailed review of these concepts.


Availability of Data and Material

All relevant data and material are presented in the main paper.


Acknowledgements

The authors are grateful to Indus University and School of Technology, Pandit Deendayal Petroleum University for permission to publish this research.

Funding

None.

Author information


Contributions

All authors made substantial contributions to this manuscript. PT and MS participated in drafting the manuscript. PT wrote the main manuscript, and all authors discussed the results and their implications for the manuscript at all stages.

Corresponding author

Correspondence to Manan Shah.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical Approval

Not applicable.

Consent for Publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Thakkar, P., Shah, M. An Assessment of Football Through the Lens of Data Science. Ann. Data. Sci. 8, 823–836 (2021). https://doi.org/10.1007/s40745-021-00323-2



  • DOI: https://doi.org/10.1007/s40745-021-00323-2
