Abstract
There is growing interest in the role of predictive analytics in sport, where extensive data collection provides an exciting opportunity for the development and utilisation of prediction models for medical and performance purposes. Clinical prediction models have traditionally been developed using regression-based approaches, although newer machine learning methods are becoming increasingly popular. Machine learning models are considered 'black box' models. In parallel with the increase in machine learning, there has also been an emergence of proprietary prediction models, developed by researchers with the aim of becoming commercially available. Because proprietary systems are intended to be profitable, developers are often reluctant to report transparently (or make freely available) the development and validation of their prediction algorithms; the term 'black box' therefore also applies to these systems. The lack of transparency of 'black box' approaches, and the unavailability of their algorithms for implementation by others, is concerning because it prevents independent evaluation of model performance, interpretability, utility, and generalisability prior to implementation within a sports medicine and performance environment. Therefore, in this Current Opinion article, we: (1) critically examine the use of black box prediction methodology and discuss its limited applicability in sport, and (2) argue that black box methods may pose a threat to the delivery and development of effective athlete care and, instead, highlight why transparency and collaboration in prediction research and product development are essential to improve the integration of prediction models into sports medicine and performance.
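The point that an openly reported model can be independently evaluated, whereas a black box cannot, can be made concrete with a minimal Python sketch. It assumes a hypothetical logistic regression model whose intercept and coefficients are published in full; the predictor names, coefficient values and four-athlete validation set are invented for illustration and are not taken from this article or from any real model. Given the equation, an independent team can compute predicted risks for its own athletes and check discrimination (C-statistic) and calibration-in-the-large (the ratio of observed to expected events), two standard external validation measures.

```python
import math

# HYPOTHETICAL published model: intercept and coefficients are invented for
# illustration and do not come from any real sports medicine prediction model.
INTERCEPT = -2.0
COEFFICIENTS = {"age_years": 0.05, "previous_injury": 0.8}


def predicted_risk(athlete):
    """Apply the fully reported logistic regression equation to one athlete."""
    linear_predictor = INTERCEPT + sum(
        beta * athlete[name] for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def c_statistic(risks, outcomes):
    """Discrimination: share of event/non-event pairs ranked correctly (ties = 0.5)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Calibration-in-the-large: observed events divided by predicted events."""
    return sum(outcomes) / sum(risks)


# HYPOTHETICAL local validation data: (predictor values, injured 1 / not injured 0).
validation = [
    ({"age_years": 20, "previous_injury": 0}, 0),
    ({"age_years": 30, "previous_injury": 1}, 1),
    ({"age_years": 25, "previous_injury": 1}, 0),
    ({"age_years": 34, "previous_injury": 0}, 1),
]
risks = [predicted_risk(athlete) for athlete, _ in validation]
outcomes = [injured for _, injured in validation]

print(f"C-statistic: {c_statistic(risks, outcomes):.2f}")
print(f"O/E ratio: {observed_expected_ratio(risks, outcomes):.2f}")
```

With these invented numbers the sketch prints a C-statistic of 0.75 and an O/E ratio of 1.12. A real external validation would need a far larger sample and a fuller calibration assessment (for example, a calibration plot); the sketch only shows that none of these checks can be run when the model equation is withheld.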
Funding
GSC was supported by the NIHR Biomedical Research Centre, Oxford, and Cancer Research UK (programme grant: C49297/A27294). No other sources of funding were used to assist in the preparation of this article.
Author information
Contributions
GSB, TH, GSC and SK conceived the study idea. GSB, TH, GSC and SK were involved in design and planning. GSB, TH and SK wrote the first draft. GSB, TH, AHA, PW, GSC and SK critically appraised the manuscript. GSB, TH, AHA, PW, GSC and SK approved the final version of the manuscript.
Ethics declarations
Conflicts of interest/Competing interests
Garrett S. Bullock, Tom Hughes, Amelia H. Arundale, Patrick Ward, Gary S. Collins and Stefan Kluzek declare that they have no conflicts of interest relevant to the content of this article.
Availability of data and material
Not applicable.
Code availability
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bullock, G.S., Hughes, T., Arundale, A.H. et al. Black Box Prediction Methods in Sports Medicine Deserve a Red Card for Reckless Practice: A Change of Tactics is Needed to Advance Athlete Care. Sports Med 52, 1729–1735 (2022). https://doi.org/10.1007/s40279-022-01655-6
DOI: https://doi.org/10.1007/s40279-022-01655-6