Abstract
Data-based methods and statistical models have received particular attention in sports injury research as a means of gaining an in-depth understanding of injury risk factors and mechanisms. The objective of this work is to evaluate shared frailty Cox models for predicting the occurrence of sports injuries, and to compare their performance across different sets of variables selected by several regularized variable selection approaches. The study is motivated by characteristics commonly found in sports injury data: small sample sizes and even smaller numbers of injuries, coupled with a large number of potentially influential variables. We therefore conduct a simulation study to address these statistical challenges and to explore regularized Cox model strategies combined with shared frailty models in different controlled settings. We show that predictive performance improves substantially as more player observations become available. Methods that yield sparse models and favour interpretability, e.g. Best Subset Selection and Boosting, are preferred when the sample size is small. We also include a real case study of injuries among female football players at a Spanish football club.
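To make the recurrent time-to-event setting more concrete, the sketch below simulates injury data under a shared gamma frailty Cox model and stores it in counting-process (start, stop] format, one row per at-risk interval. It is similar in spirit to, but not a reproduction of, the simulation study described above. It is a minimal sketch under assumptions not taken from the article: a constant (exponential) baseline hazard, time-fixed standard normal covariates, a gamma frailty with mean 1 and variance theta, administrative censoring at the end of follow-up, and illustrative parameter values. The names `simulate_recurrent_injuries` and `Interval` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Interval:
    """One row of counting-process data: the interval (start, stop] for a player."""
    player: int
    start: float
    stop: float
    event: int  # 1 = injury at `stop`, 0 = censored at end of follow-up


def simulate_recurrent_injuries(n_players, beta, theta, base_rate, follow_up, seed=0):
    """Simulate recurrent injuries under a shared gamma frailty Cox model.

    Assumptions (illustrative only): constant baseline hazard `base_rate`,
    time-fixed standard normal covariates, and administrative censoring at
    `follow_up`. Player i then has hazard z_i * base_rate * exp(x_i . beta),
    so gap times between injuries are exponential with that rate.
    The frailty z_i ~ Gamma(shape=1/theta, scale=theta) has mean 1 and
    variance theta (theta > 0) and is shared by all injuries of player i.
    """
    rng = random.Random(seed)
    covariates, rows = [], []
    for i in range(n_players):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        z = rng.gammavariate(1.0 / theta, theta)  # player-level frailty
        rate = z * base_rate * math.exp(sum(b * xj for b, xj in zip(beta, x)))
        covariates.append(x)
        t = 0.0
        while True:
            # Guard against a frailty that underflows to zero (no events).
            gap = rng.expovariate(rate) if rate > 0 else math.inf
            if t + gap >= follow_up:
                rows.append(Interval(i, t, follow_up, 0))  # censored interval
                break
            rows.append(Interval(i, t, t + gap, 1))  # injury ends the interval
            t += gap
    return covariates, rows


# Small squad, many candidate variables, only three of which truly matter.
beta = [0.7, -0.5, 0.4] + [0.0] * 17
covariates, rows = simulate_recurrent_injuries(
    n_players=30, beta=beta, theta=0.5, base_rate=1.0, follow_up=1.0, seed=1
)
n_injuries = sum(r.event for r in rows)
injured = len({r.player for r in rows if r.event})
print(f"{n_injuries} injuries in {injured} of {len(covariates)} players")
```

Each player contributes one row per injury plus a final censored row, and all of a player's rows share the same frailty, which is what induces within-player correlation between repeated injuries. This (start, stop, event) layout is, for example, the form accepted by R's survival package through Surv(start, stop, event), where coxph can also include a frailty(player) term; regularized selection methods could then be applied to the 20 covariate columns.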
Acknowledgements
This research was supported by the Basque Government through the BERC Programme 2018–2021; by the Spanish Ministry of Science, Innovation and Universities MICINN and FEDER (BCAM Severo Ochoa excellence accreditation SEV-2017-0718, and project PID2020-115882RB-I00, acronym “S3M1P4R”, funded by AEI/FEDER, UE); and by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors take full responsibility for the content of this work. The authors are also grateful to the two anonymous reviewers for their valuable and constructive comments, which led to an improved manuscript.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zumeta-Olaskoaga, L., Weigert, M., Larruskain, J. et al. Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models. AStA Adv Stat Anal 107, 101–126 (2023). https://doi.org/10.1007/s10182-021-00428-2
DOI: https://doi.org/10.1007/s10182-021-00428-2
Keywords
- Shared frailty models
- Regularized Cox methods
- Sports injury prevention
- Survival analysis