Key points

  • Identified risk factors for sustaining a second ACL injury varied depending on follow-up time using two different analysis methods: Classification and regression tree (CART) and Cox regression analysis.

  • In future research on risk factors, the time athletes are followed up and the type of statistical approaches are important to discuss when interpreting the findings.

Background

A high proportion of athletes, especially in contact and pivoting sports, sustain not only a first anterior cruciate ligament (ACL) injury, but also a second ACL injury [1,2,3,4,5]. Therefore, a common and important goal in sport medicine is to reduce the risk of both primary and secondary ACL injury. In the clinical setting, screening tests to identify at-risk athletes or risk factors for new ACL injury after primary ACL reconstruction (ACLR) could be performed at various time points after ACLR. In research, screening tests are often performed at a study baseline, i.e. at a specific time point such as at the beginning of the season [6, 7], at the time of ACLR [8], or at the time of return to sport (RTS) [9, 10]. The athletes are then prospectively followed up to register new ACL injuries to analyse the association between baseline tests and risk for future ACL injury. The follow-up times presented in previous research investigating risk factors for ACL injury with different functional performance tests range from 6 months to 8 years [7, 10, 11].

Risk factors to sustain a second ACL injury include young age [4, 9, 12, 13], return to cutting and pivoting sports [4, 8, 14] and RTS before 9 months after ACLR [8, 9]. The importance of other proposed risk factors such as the impact of knee valgus motion in predicting a second ACL injury is debated. Studies with 1–2-year follow-up reported association between excessive knee valgus motion in a drop vertical jump (DVJ) for a primary or second ACL injury [15, 16], but studies with longer follow-up did not [7, 11]. Knee valgus motion pattern could potentially fluctuate over one or several seasons, and the studied cohorts will change over time regarding, e.g. activity level [2], which is an important factor for the risk to sustain a second ACL injury [4, 8, 14]. The problem is that functional performance is dynamic and changes over time, and thus also the association between performance and injury risk. The standard procedure in many studies on risk factors is to include a baseline functional test and then assume that this would be associated with new injury occurrence regardless of how long it has been between the functional performance test and the new injury. Thus, different follow-up times may be a reason for the sometimes contrasting results regarding the association between candidate risk factors and subsequent ACL injury. In addition, the statistical approach used can also contribute to different results related to injury risk [17]. Some studies have explored injury risk adding nonlinear analysis among the factors and make clinical use of these interactions, such as classification and regression tree (CART) analysis [6, 13], but generalized linear approaches such as Cox regression are more commonly used [8,9,10]. For example, using CART analysis, it was found that hop performance in a triple hop for distance and greater limb asymmetry identified high-risk young female patients for second ACL injury after ACLR and RTS [13]. However, using Cox regression, hop performance was not identified as a risk factor for a second ACL injury in other cohorts [8, 9].

We have previously used CART in a two-year follow-up [6] and reported that an interaction between functional performance, clinical assessment, and psychological factors could identify female football players at high risk for a second ACL injury. In the same cohort, we studied predetermined functional performance test cut-offs and their association with the risk of a second ACL injury where no association was found [18]. Therefore, using the same study cohort, variables and cut-offs, we wanted to study how different statistical methods (CART and Cox regression) and different follow-up periods influence the association between candidate risk factor variables and a second ACL injury occurrence. So, the aim of this study was to explore if different follow-up times and statistical approaches, CART analysis and Cox regression, would impact on the association between various candidate risk factors and ACL injury in female football players. The hypotheses were that: (1) identified risk factors will vary depending on the follow-up time; the association between baseline candidate risk factors and the outcome is likely time-dependent; (2) the association between baseline functional performance tests and psychological readiness for RTS and injury outcome will decrease with longer follow-up times because factors such as muscle strength and hop performance are likely to fluctuate over one or consecutive seasons [19], and psychological readiness to RTS will improve over time [20]; and (3) the association between baseline personality traits and anthropometrics and injury outcome will be more stable regardless of the follow-up time.

Methods

Ethics Statement

All players received written and oral information about the study and gave informed consent. The study was approved by the Swedish Ethical Review Authority (Dnr 2012/24–31, 2013/75–32, 2017/324–32, and 2020–01,093) and the Swedish National Knee Ligament Register (SNKLR) board. This research was performed in accordance with the Declaration of Helsinki and in accordance with all relevant guidelines and regulations.

Participants

Female football players playing at any level, aged 16 to 25 years, with primary unilateral ACLR performed 6 to 36 months earlier were included. Exclusion criteria were having an associated posterior cruciate ligament injury and/or surgically treated injuries to the medial or lateral collateral ligament. For a detailed description of the inclusion procedure, see previous publications [2, 6]. The cohort consisted of 112 active female football players with ACLR (mean ± SD, 18 ± 8 months after ACL reconstruction; mean age 20 ± 2 years; height 168 ± 5 cm; and weight 65 ± 8 kg). This was a sub-cohort of the previously described cohort of 117 players with ACLR in a 2-year follow-up study (Five players dropped out in the long-term follow-up) [2]. In total, 43 (38%) players sustained a second ACL injury (30 ipsilateral and 13 contralateral ruptures).

Procedure

This is a study using CART analysis and Cox regression with 19 independent variables [6], based on a prospective cohort study analysing risk factors for a second ACL injury. Four different follow-up times from baseline tests were used: 0–12, 0–24, 0–36, and 0–>36 months. At baseline, at the beginning of the football season (January to April), all players completed questionnaires, underwent a clinical assessment, and performed functional performance tests (Table 1). The procedure has been described in detail previously [2, 6, 21]. The independent variables included were the same as those used in previously published data [6]. Nineteen of the baseline variables were included (Table 2).

Table 1 Baseline questionnaires, anthropometric measurements, and functional performance tests
Table 2 Descriptive baseline demographics separated into those who did or did not incur a new ACL injury

Anthropometric measurements and functional performance tests were conducted as pre-season tests, supervised by the same experienced test leader (A.F.). The measurements were taken in a standard order as presented in Table 1.

For the single hop for distance and the side hop, a Limb Symmetry Index (LSI) was calculated (ACL − reconstructed limb/uninvolved limb × 100), and used as an independent variable. In the DVJ, knee motion in the frontal plane (medial/valgus or lateral/varus knee displacement) was measured with motion analysis software (Dartfish ProSuite; Dartfish Ltd, Fribourg, Switzerland) from video films (Panasonic HC-V500M). Knee motion during the DVJ was calculated in centimetres as the frontal plane displacement of the knee from the initial ground contact (when the feet just touched the ground, X1) to the end of the deceleration phase (deepest knee flexion position, X2) [31,32,33]. To simplify the measurement, the greater trochanter, the lateral knee joint line, the head of the fibula, lateral malleolus, patella tendon, and centre of the patella were marked on the players with a marker pen. The displacement was measured between the two vertical lines to the centre of the patella (X1–X2) to represent the amount of knee valgus motion in each leg.

Players were allowed three practice trials and three maximum attempts for the single hop for distance, 5-jump test, and DVJ. However if hop length increased in all three hops in the single hop for distance, additional hops were performed until no further increase occurred. For the tuck jump and side hops, a few test hops were performed to get familiarized with the tests.

Follow-Up of New ACL Injury

The players answered a web-based question regarding the occurrence of any new knee injury on three occasions annually (pre-season, in-season, and post-season) for the first 2 years after inclusion. If a player reported a new knee injury, she was contacted to inquire if it was a new ACL injury (re-rupture or contralateral rupture), and confirmation of the diagnosis was retrieved from medical records. Detailed descriptions of registered new knee injuries and other injuries over the first 2 years have been published previously for the cohort [2].

Approximately 5 years after baseline, the players again received a follow-up question about occurrence of any new ACL injury. All reported new ACL injuries were confirmed from the SNKLR or medical records.

Statistical Methods

Statistical analyses were performed with IBM SPSS Statistics for Windows (v 27.0; IBM) and OpenEpi software. Mean ± standard deviations (SD) or median and interquartile range (IQR)/range were calculated to characterize the sample. Between-group comparisons were made with the Student’s t test (continuous data), Mann–Whitney U test (not normally distributed data), or Fisher's exact test (nominal data).

The hypothesis from our first study [6] was that sustaining a new ACL injury is complex and involves nonlinear interactions among several factors. That was the major reason to use CART. A correlation matrix was performed before running the model to exclude highly correlated variables [6]. CART analysis identifies interactions between independent variables and provides a hierarchical association with new ACL injury. Binary recursive divisions of the initial set of data reveal the variables and their respective cut-off points regarding individuals in each category (with and without new ACL injury), until the sub-groups reach a minimum size or no improvement can be made. For each partition, CART analysis considers all variables to decide which one would be the best to split the parent node in two. The total sample (players with ACLR, n = 112; node 0) is considered at the beginning, and the final model represents, hierarchically, the strength of association with the outcome factor (new ACL injury).

The following criteria were used to produce the partitions and, consequently, CART growth: a minimum of eight participants in each node to make a division, a minimum of four participants to generate a node [36], and a Gini index of 0.0001 to maximize the node’s homogeneity. We used fivefold cross-validation, a resampling procedure, to estimate better accuracy and improve the level of fit of the CART model. This approach is used to evaluate machine learning models on a limited data sample [36] and as a measure to decrease overfitting. The classification cost was considered symmetric between categories, and the probability of new ACL injury was established as equal between the groups. A receiver operating characteristic (ROC) curve was created to verify the accuracy of the CART analysis. Finally, relative risks (RRs) were calculated for each terminal node of the CART model to investigate the strength of the associations. One separate CART analysis was conducted for each follow-up at 0–12, 0–24, 0–36, and 0–>36 months.

The Cox proportional hazards model is a method to investigate the association between time to event (new ACL injury) and one or more independent variables. The Cox model can be written as a multiple linear regression of the logarithm of the hazard on the independent variables. One fundamental assumption in the Cox model is that the hazards are proportional, which implies that the hazard ratio is constant across time. The proportional hazards assumption is tested by adding an interaction term of time and each independent variable to the model, where a non-significant result (P > 0.05) indicates that the assumption is met. In addition, we plotted the scaled Schoenfeld residuals against time, where the residuals should randomly vary around zero to meet the proportional hazards assumption. We also plotted the Martingale residuals against the continuous covariates to detect nonlinearity.

Univariable Cox regression was used to estimate associations between the 19 independent variables and the occurrence of new ACL injury at the four different follow-up times. The time variable used in the Cox regression was the number of days from baseline to an event or to the end of follow-up. Hazard ratios (HRs) were calculated with 95% confidence intervals (CIs). Significant variables from the Cox regression analyses were dichotomized using cut-offs identified from the CART analysis to calculate sensitivity, specificity, accuracy, and positive (PPV) and negative predictive values (NPV) to predict new ACL injury. The significance level was set at P < 0.05.

Results

Descriptive statistics for demographic characteristics and the 19 independent variables included in the analysis for risk factors for the study sample are presented in Table 2.

Identified risk factors associated with a new ACL injury varied depending on the follow-up time and with the statistical approach used. Here, we present results for CART and Cox regression separately with the different follow-up times.

CART

The accuracy of the different CART analyses varied between 86 and 93% with 77–96% sensitivity and 70–81% specificity. CART identified 12 of the 19 independent variables and selected between 5 and 6 of the variables in the four different follow-up times associated with second ACL injury. The first factors selected (level 1, strongest association) by the CART at each follow-up time were as follows: 0–12-month follow-up, ACL-RSI (cut-off 5.7); 0–24-month follow-up, SSP-stress susceptibility (cut-off 44.4); and 0–36-month and 0–>36-month follow-ups, knee extension (cut-off − 4 degrees) (Table 3, Fig. 1, and Additional file 1: Appendix).

Table 3 Classification and regression tree analysis with four different follow-up times for players with ACLR (n = 112)
Fig. 1
figure 1

Classification and regression tree (CART) with 0–12-month follow-up for second ACL injury. The bold text in each node (ACL– [no second ACL injury] or ACL + [second ACL injury]) corresponds to the predicted category. All bold boxes indicate terminal nodes. ACL-RSI ranges from 0 (worse) to 10 (best); SSP-impulsiveness, SSP-stress susceptibility, and IKDC range from 0 (lowest) to 100 (highest). ACL-RSI, Anterior Cruciate Ligament-Return to Sport after Injury; IKDC, International Knee Documentation Committee Subjective Knee Form; LSI, Limb Symmetry Index; SSP, the Swedish Universities Scales of Personality

An interpretation of the different follow-up analyses with CART can be given with an example from node 10 with the highest RR (8.16) in the 0–12-month follow-up; players with higher scores in ACL-RSI (cut-off 5.7), lower scores in SSP-impulsiveness (cut-off 59.7), higher scores in SSP-stress susceptibility (cut-off 55.6), and higher scores in IKDC (cut-off 85.6) had an eightfold increased risk of a second ACL injury (Fig. 1). All seven statistically significant profiles were on risk factors for new injury, and no profiles were on protective factors (Table 3 and Additional file 1: Appendix).

Univariable Cox Regression

Univariable Cox regression did not indicate any significant risk factors in the 0–12- and 0–24-month follow-ups. Univariable Cox regression showed that in the 0–36-month follow-up, only greater knee hyperextension in the ACLR leg was associated with sustaining a secondary ACL injury (HR, 0.935; 95% CI 0.882–0.991; P = 0.023, sensitivity 49%, specificity 71%, PPV 44%, NPV 75%, accuracy 64%). In the in 0–>36-month follow-up, greater knee hyperextension in the ACLR leg (HR, 0.936; 95% CI 0.889–0.987; P = 0.014, sensitivity 44%, specificity 71%, PPV 49%, NPV 67%, accuracy 61%) and shorter time between primary injury and surgery (HR, 0.898; 95% CI 0.812–0.992; P = 0.035 sensitivity 88%, specificity 32%, PPV 45%, NPV 81%, accuracy 54%) were associated with sustaining a secondary ACL injury (Table 4).

Table 4 Univariable Cox regression with four different follow-up times for players with ACLR (n = 112)

Discussion

The main finding was that the identified risk factors for a second ACL injury in female football players with a primary ACLR varied depending on the follow-up time using CART analysis and with Cox regression. Further, the statistical approaches yielded different results with regard to the identification of risk factors for a second ACL injury. In future research on risk factors for ACL injury, the follow-up time of athletes and the statistical approach are important to discuss in relation to the findings.

Our first hypothesis was confirmed regarding the association between risk factors and the follow-up time. In CART, the accuracy (given by the ROC curve) was the highest (93%) in the 0–12-month follow-up. The accuracy dropped to 86% in the 0–24-month follow-up, and then remained stable with longer follow-up (86–87%). One reason for this observed plateau is that prediction models improve with a higher number of injured players, i.e. as with longer follow-up times. We calculated the RR of all terminal nodes for the different CARTs to analyse the strength of each association, not to analyse the models in detail because it was not the purpose of this study. An interesting finding was that higher RRs in CART were found at 0–12- and 0–24-month follow-up (4.62–8.16) and lower RRs were found at 0–36 and 0–>36 months (2.11–3.73). This also suggests that prediction is influenced by the follow-up time.

Cox regression did not indicate any significant risk factors in the 0–12- and 0–24-month follow-ups, but two significant factors (knee extension and time between injury and ACLR) in the 0–36- and 0–>36-month follow-ups. Categories were created based on the cut-offs selected in the CART analyses for the significant values (knee extension, − 4 degrees; time between injury and ACLR, 7.1 months) to analyse the sensitivity, specificity, and accuracy. The sensitivity and specificity varied, but accuracy was consistently low (54–64%). Thus, knee extension and time between injury and ACLR as predisposing factors to ACL injury may be limited. In the current sample, and with the candidate risk factors available, the value of univariable Cox regression to predict who will sustain a new ACL injury is questionable. Previous studies using Cox regression have identified other risk factors to sustain a second ACL injury such as young age at first injury [9, 12], return to cutting and pivoting sports [8, 12], early RTS after primary ACLR [9, 10], and time between injury and surgery [12]. Our cohort already consisted of a high-risk group of young players who had returned to football after ACLR, which could explain the discrepancies. An important aspect to consider is that both CART and Cox analysis are sample-dependent. An example of the sample dependence is the differences from our previously published results with CART analysis with 117 players included [6] instead of 112 as in the present analysis, where the 5-jump test was the first predictor and now the second predictor in the 0–24-month follow-up.

Our second hypothesis that the results from functional performance tests would decrease in importance regarding prediction for longer follow-up times was rejected using CART. The interactions in CART included results from functional performance tests in all different follow-ups. However, this was not the case for psychological readiness to RTS. CART analysis with the 0–12-month follow-up showed that the factor ACL-RSI was selected on the first level, i.e. the strongest association with the outcome, whereas ACL-RSI was not selected with the longer follow-up times. Previous studies have shown an association with ACL-RSI and second ACL injury in follow-up periods of approximately 2 years after ACLR [37, 38], indicating that psychological readiness for RTS may be more important in the short-term follow-up. This confirms the hypothesis that the association between psychological readiness and injury will decrease with longer follow-up times. From the Cox regression analyses, the hypothesis could not be confirmed or rejected because of the non-significant associations of the functional performance tests, psychological readiness, and second ACL injury.

Our third hypothesis that the association between baseline personality traits, anthropometrics, and injury outcome will be more stable regardless of the follow-up time was to some extent confirmed. Personality traits were identified by CART in most of our risk profiles and at different follow-up times. There are some studies regarding personality traits (somatic anxiety, psychic trait anxiety, stress susceptibility, and irritability) and association with injuries in general in football players followed for 3 months [39, 40] and up to 1 football season [41]. Using CART, knee hyperextension was selected at the first level in the 0–36- and 0–>36-month follow-ups. Similarly, using Cox regression, knee hyperextension was also indicated as a risk factor for second ACL injury in the 0–36- and 0–>36-month follow-ups. The association between knee hyperextension and risk of a second ACL injury is not well-studied. However, in a previous study on patients who have undergone ACLR with a patellar tendon autograft, patients with ≥ 6° of hyperextension in the ACLR knee compared with patients with ≤ 3° hyperextension did not show any increased risk for re-ruptures at a mean of 4.1 ± 1.1-year follow-up after ACLR [42]. The evidence is also insufficient regarding generalized joint hypermobility and the risk of graft failure [43]. The importance of personality and anthropometrics in prediction models with longer follow-up need further investigation.

We should be aware that small changes in prediction models could have a great impact on the results. Beischer et al. [9] illuminates the problem in their study; young athletes who returned to sport before 9 months after ACLR had approximately a sevenfold increased rate of sustaining a second ACL injury compared with those who returned at 9 months or later. However, when the authors did further sensitivity analyses and more athletes were included, the rate was lowered to a threefold higher risk. Further analysis with exclusion of the 10% of events with the strongest influence on the analysis showed no relationship between time to RTS and new ACL injury [9]. Another important change in the cohorts depending on time is that players will decrease in activity level and quit football [2]. Return to pivoting sports is probably the most important risk factor to sustain a second ACL injury [14].

Many factors may affect the decision on how long follow-up time is needed in a risk factor study. If the aim is to study, e.g. age, sex, anatomical and surgical factors as potential risk factors, longer follow-up times could be chosen to capture even more secondary ACL injuries, and studies may have as long as 10–20 years of follow-up [44, 45]. Too short follow-up will mean that not all second ACL injuries are captured, even if most secondary ACL injuries occur within the first 2 years after ACLR [12] or the first year after RTS [10].

Traditionally, sports injury prediction has been analysed through linear methods, such as the generalized linear model Cox regression. These methods assume that relationships among explanatory variables and the outcome are linear. However, there has been a paradigm shift regarding sport injury aetiology in recent years, seen as a complex phenomenon, with recommendations to apply statistical modelling that account for complex interactions between explanatory variables [46]. CART provides other benefits rather than just being a nonparametric alternative to Cox. For instance, CART analysis identifies interactions between independent variables, and the method also decreases the risk of overfitting.

Finally, the purpose of this study was to highlight and provide insights for future studies over the importance of addressing follow-up times and statistical methods used in risk factor studies. Psychological and personality factors seem to be important in the short term and greater knee hyperextension in the ACLR leg in the long term. However, the purpose was not to put the findings of the different risk factors into a clinically usable context and to suggest how clinicians should address them in rehabilitation and secondary injury prevention. Previous methodological discussions in risk analysis have been directed towards the variables being examined, time point of the collected variables, the type of variables (continues vs categorical), the amount of data (observations and events), association vs prediction, considerations when modelling the risk of injury, such as the method of data transformation, model validation and performance assessment [47].

The strengths of the present study are the prospective design of a homogeneous cohort of female football players with ACLR, with verification of all new ACL injuries from medical charts. Another strength is the width of factors included in the CART and Cox regression analysis, which made it possible to analyse many different factors regarding prediction and follow-up time.

Some limitations should be acknowledged. This was a relatively small sample with a small number of ACL injuries. Even considering this limitation, we were able to develop accurate models using CART. The fivefold cross-validation approach decreased overfitting for the CART method. Usually, researchers strive to include not only univariable Cox regression, but also multivariable Cox regression. Unfortunately, we could not use traditional multivariable Cox proportional regression analysis with 19 independent variables; the sample was too small, and the model would be overfitting. Another reason was that too few significant factors were identified using univariable Cox regression, despite the risk of false positive values due to multiple univariable analysis, and it would be questionable to perform a multivariable Cox regression. It is common to choose the baseline at the time point of ACLR, but the time for RTS after ACLR vary widely. It is common to choose the baseline at the time point of ACLR, but the time for RTS after ACLR varies widely, e.g. 3–38 months [8, 9]. Therefore, we included players with a range of 6 to 36 months after ACLR. However, using time after ACLR means that the exposure to sport is very different for the included players, and fluctuations in functional performance are expected.

Conclusions

Identified risk factors associated with a new ACL injury varied depending on the follow-up time and with the statistical approach used. The nonlinear approach (CART) identified more risk factors than the traditional generalized linear approach (Cox regression). Future research on risk factors for ACL (sport) injury should interpret and discuss the influence of follow-up time of athletes and the type of statistical approaches when interpreting the findings.