Introduction

Hip fractures are an increasingly common problem in older adults. Although the age-standardized incidence is falling, the aging of the population leads to a worldwide increase in the number of hip fractures [1]. More than 18,000 patients aged 65 years or older with a hip fracture are admitted to a hospital in the Netherlands yearly. This number is expected to increase by 24% to more than 21,200 in 2040 [2]. Especially in older adults, hip fractures have a significant impact on quality of life: they result in poor long-term functional outcomes, increased dependency in mobility, and a decreased physical quality of life [3]. A significant proportion of patients die within the first 30 days after hip fracture surgery, with 30-day mortality rates varying from 4.1 to 13.3% [4,5,6,7].

Currently, surgical intervention is the standard treatment for hip fractures [8]. In 2020, 96.6% of hip fracture patients in the Netherlands underwent surgery [9]. The goal of surgery is for the patient to return to their prefracture functional level. Although surgery increases the functional level and can contribute to pain relief, it requires hospitalization with the risk of postoperative complications; many patients cannot return home after hospital admission. An alternative is a nonoperative treatment, which is associated with a poor prognosis regarding survival [10, 11]. The multicenter cohort FRAIL-HIP study recently compared nonoperative and operative treatment for frail institutionalized patients with limited life expectancy at 25 hospitals across the Netherlands [12]. In that study, nonoperative management was non-inferior to operative management regarding the quality of life. In the nonoperative management group, the early mortality rate was much higher; however, there were fewer adverse events, and the quality of dying was rated as good to almost perfect by 51% of the proxies and caregivers. Therefore, nonoperative treatment was viable, suggesting that surgery should not be a foregone conclusion for these patients. Nonoperatively treated patients avoid the stress of surgery and anesthesia and can stay at home with their relatives during this vulnerable phase of life. Especially in frail patients with a high risk of early mortality, one may question if surgery is always the best treatment [13]. Identifying patients at high risk for early mortality after surgery is essential because this knowledge could be used in future patients to personalize treatment decisions. Furthermore, this information could inform patients and families about the prognosis.

To identify hip fracture patients with a high risk of mortality after surgery, the Almelo Hip Fracture Score (AHFS) was developed in the Netherlands in 2016 for patients aged 70 years and older [14]. It has a good to excellent discriminative value, with an area under the receiver operating characteristic curve (AUC) ranging from 0.70 in an external validation study to 0.82 in the initial study. In contrast to other models, the AHFS uses preoperative risk factors only, which could support shared decision-making processes.

Despite these strengths, the AHFS has room for improvement. The clinical utility of the risk score is limited. The maximum risk of early mortality calculated by the AHFS is 68.4%, which is relatively low. A higher maximum risk of mortality would be more helpful for clinical decision-making. This limited range may partly be caused by the skewed distribution in survival and mortality, as the proportion of deceased patients in the study population of the AHFS was relatively low: 7.5% of the patients died within 30 days following hip fracture surgery. A better prediction might be possible in a study population with higher mortality rates. De Groot et al. (2020) showed that the oldest hip fracture patients (i.e., 90 years and older, the so-called nonagenarians) have a significantly higher 30-day mortality rate of 13.3% compared to their younger peers (4.3% and 8.5% in patients aged 70–79 years and 80–89 years, respectively) [15]. Developing a risk score in patients 90 or older could improve the prediction. Furthermore, nonoperative or operative treatment considerations are clinically relevant in this patient group.

This study aimed to develop and internally validate a preoperative risk score to predict early mortality in patients aged 90 years or older undergoing hip fracture surgery.

Patients and methods

Study population and setting

To improve the quality of care for patients with a hip fracture, the nationwide Dutch Hip Fracture Audit (DHFA) was established in 2016 [16]. Prospective collection of patient characteristics, treatment modalities, and outcomes is an essential part of this audit. A taskforce study group within this audit collects extra data for research purposes: the DHFA Taskforce Indicators (DHFA TFI). The DHFA TFI comprises six Dutch hospitals across various regions: St. Antonius Hospital, Nieuwegein; Bernhoven Hospital, Oss; Admiraal de Ruyter Hospital, Goes; Diakonessenhuis Hospital, Utrecht; Haaglanden Medical Center, Den Haag; and Ziekenhuisgroep Twente, Almelo. For this study, we used data from the DHFA TFI. We included patients aged ≥ 90 years admitted between January 2018 and December 2019 in one of the participating hospitals surgically treated for a proximal femur fracture. Patients who were scheduled for surgery but died before undergoing surgery were excluded.

Data collection

A selection of possible predictors for mortality was made within the available data of the DHFA TFI based on a literature review [17,18,19,20]. Perioperative and postoperative variables were excluded because the AHFS90 is intended to be used preoperatively.

The following variables were included:

  • Age (in years).

  • Gender (females/males).

  • Dementia: diagnosis known in the hospital or by the general practitioner (yes/no).

  • Living in a nursing home (yes/no).

  • Risk of malnutrition (no increased risk/moderately increased risk/increased risk of malnutrition). The risk of malnutrition was measured using the Short Nutritional Assessment Questionnaire (SNAQ) score or the Malnutrition Universal Screening Tool (MUST). Patients were considered not at risk of malnutrition if SNAQ = 0 or MUST = 0, moderately at risk if SNAQ > 0 and ≤ 2 or MUST = 1, and at increased risk if SNAQ > 2 or MUST > 1.

  • Fracture type (femoral neck non-displaced/femoral neck displaced/trochanteric AO-A1/trochanteric AO-A2/trochanteric AO-A3/subtrochanteric).

  • American Society of Anaesthesiologists physical status classification (ASA score 1–2/3/4).

  • Parker Mobility Score (PMS) (0–9); this is a composite measurement of a patient’s mobility indoors, outdoors, and during shopping. The PMS was selected instead of the Fracture Mobility Score because its reliability and validity have been better established in previous studies.

  • Katz-ADL score (0–6); measures the patients’ ability to independently perform activities of daily living.

  • Serum hemoglobin level at admission to the hospital (in mmol/l).

  • Polypharmacy (use of five or more different medications; yes/no), included as a proxy for comorbidities. This covers all medications the patient uses, both prescribed and non-prescribed (such as multivitamins and fiber supplements).
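The malnutrition categories in the list above can be expressed as a small lookup. The following is a minimal sketch, not the study's own code: the function name `malnutrition_risk` is hypothetical, and giving SNAQ precedence when both scores are available is an assumption the text does not address.

```python
from typing import Optional


def malnutrition_risk(snaq: Optional[int] = None, must: Optional[int] = None) -> str:
    """Map a SNAQ or MUST score onto the three categories used in the text.

    Hypothetical helper: if both scores are supplied, SNAQ is used
    (an assumption; the source does not say which takes precedence).
    """
    if snaq is not None:
        if snaq == 0:
            return "no increased risk"
        # SNAQ > 0 and <= 2 is moderate; SNAQ > 2 is increased
        return "moderately increased risk" if snaq <= 2 else "increased risk"
    if must is not None:
        if must == 0:
            return "no increased risk"
        # MUST = 1 is moderate; MUST > 1 is increased
        return "moderately increased risk" if must == 1 else "increased risk"
    raise ValueError("either a SNAQ or a MUST score is required")
```

For example, `malnutrition_risk(snaq=2)` falls in the moderate category, whereas `malnutrition_risk(must=2)` falls in the increased-risk category.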

The primary outcome was early mortality, defined as death within the first 30 days following hip fracture surgery. Survival was defined as being alive 30 days after hip fracture surgery.

Statistical analysis

Patient characteristics were analyzed using descriptive statistics. Categorical variables were expressed as numbers with corresponding percentages. Continuous variables were expressed as means with standard deviations or, in the case of skewed data, as median with interquartile range (IQR). The associations between categorical variables and mortality were tested using the chi-square test to assess univariate relationships between possible predictors and mortality. Associations between continuous variables and mortality were tested using the t-test. The following three steps were used to develop and validate a preoperative risk score for the prediction of early mortality in patients aged 90 years or older undergoing hip fracture surgery:

Step 1: Imputation

In the case of missing data, Multiple Imputation using Chained Equations (MICE) was used to create data imputations [21]. For dementia, polypharmacy, risk of malnutrition, PMS, and Katz-ADL score, 5.0% (n = 50) to 11.7% (n = 116) of the data were missing. MICE was used to create 20 imputed datasets [21]. To identify predictors for the imputations, chi-square tests between all possible combinations of variables were performed using a cut-off p-value of 0.20. Sum scores and aggregate variables were passively imputed. Convergence plots indicated convergence after ten iterations. Variables with small categories were recategorized after imputation.

Step 2: Multivariable logistic regression analysis

Logistic regression with backward variable selection based on the Akaike Information Criterion was performed to develop prediction models for early mortality after hip fracture surgery for all 20 imputed datasets. A final model was built by pooling the 20 models using Rubin’s rules. Pooled p-values of categorical variables with three or more categories were calculated using the method of Meng and Rubin [22]. Variables in at least half of the 20 models were selected for the final model [23].
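The pooling and selection rules in Step 2 can be illustrated with a short sketch. This is not the authors' R code; the function names `pool_rubin` and `majority_selected` are hypothetical, and the numbers in the usage lines are illustrative only. Rubin's rules combine the mean of the per-dataset estimates with a total variance made up of the within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])      # average sampling variance
    between = variance(estimates) if m > 1 else 0.0    # variance between imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


def majority_selected(models, threshold=0.5):
    """Return variables that appear in at least `threshold` of the models."""
    counts = Counter(v for model in models for v in set(model))
    return sorted(v for v, c in counts.items() if c >= threshold * len(models))


# Illustrative values only (not study data): one coefficient from two imputations.
pooled_beta, pooled_se = pool_rubin([0.5, 0.7], [0.1, 0.1])

# Twenty hypothetical selections: a core set in every model, PMS in five.
core = ["age", "gender", "dementia"]
selected = majority_selected([core + ["PMS"]] * 5 + [core] * 15)
```

In this toy arrangement, `selected` contains only the core variables, because PMS appears in fewer than half of the models.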

Step 3: Validation

The modeling procedure, including backward variable selection, was validated using 200 bootstrap replicates. An optimism-corrected pooled AUC was calculated for the modeling procedure with a corresponding 95% confidence interval (CI) [24]. The calibration of this model was assessed using a calibration plot and a table comparing observed and predicted risks of early mortality. An example calculation of the AHFS90 is given in a fictional scenario.
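The text does not spell out the optimism correction. A common formulation of bootstrap optimism correction, given here as an assumption about the general technique rather than as the authors' exact computation, subtracts the average optimism over the B = 200 bootstrap replicates from the apparent AUC:

$$\mathrm{AUC}_{\mathrm{corrected}}=\mathrm{AUC}_{\mathrm{apparent}}-\frac{1}{B}\sum_{b=1}^{B}\left(\mathrm{AUC}_{b}^{\mathrm{boot}}-\mathrm{AUC}_{b}^{\mathrm{orig}}\right)$$

Here the symbols are illustrative: AUC with superscript boot is the AUC of the model refitted on bootstrap replicate b and evaluated on that same replicate, and AUC with superscript orig is the AUC of that refitted model evaluated on the original data.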

To provide a form of external validation, geographic cross-validation was performed: the modeling procedure was repeated on data from five out of six hospitals, and the resulting model was validated on the remaining hospital. This was repeated for each combination of five hospitals, and a pooled AUC was calculated with a corresponding 95% confidence interval.
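The geographic cross-validation described above amounts to a leave-one-hospital-out split. The sketch below shows only the splitting logic under that reading; the function name `leave_one_hospital_out` and the hospital labels are hypothetical, and model fitting and AUC pooling are deliberately left out.

```python
from collections import defaultdict


def leave_one_hospital_out(hospital_ids):
    """Yield (held_out, train_indices, test_indices) for each hospital.

    `hospital_ids` holds one hospital label per patient, in row order.
    """
    by_hospital = defaultdict(list)
    for idx, hospital in enumerate(hospital_ids):
        by_hospital[hospital].append(idx)
    for held_out, test_idx in by_hospital.items():
        # Train on every patient who was not treated in the held-out hospital
        train_idx = [i for h, idxs in by_hospital.items() if h != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Hypothetical labels for six patients from three hospitals.
splits = list(leave_one_hospital_out(["A", "A", "B", "C", "C", "C"]))
```

Each split keeps all patients from one hospital out of model development, so the held-out performance reflects transfer to a hospital the model has not seen.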

A p < 0.05 was regarded as statistically significant. Statistical analyses were conducted using the R statistical package for Windows, Version 4.0.2 (R foundation, 2020, Vienna, Austria). This article was written according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guidelines [25].

Results

Patient characteristics

Baseline characteristics are presented in Table 1. The study population included 922 patients. Median age was 92.0 years (IQR 91.0–95.0), and 78.0% (n = 718) were female. Dementia was present in 32.0% (n = 277) of the patients. Before the hip fracture, 17.4% (n = 156) of the patients lived in a nursing home. Severe systemic disease without constant threat to life (ASA 3) was seen in 64.2% (n = 582). Seventy-four patients (8.2%) had severe systemic disease with a constant threat to life (ASA 4).

Table 1 Patient characteristics

One hundred and two patients (11.1%) died within 30 days following hip fracture surgery (early mortality group). Compared with the survival group, patients in the early mortality group were significantly older (median 93.0 years (IQR 91.0–95.0) versus 92.0 years (IQR 91.0–95.0), p = 0.013), less often female (64.7% (n = 66) versus 79.6% (n = 652), p < 0.001), more frequently living in a nursing home (30.4% (n = 31) versus 15.7% (n = 125), p < 0.001), and more often suffering from dementia (49.5% (n = 49) versus 29.8% (n = 228), p < 0.001). They were physically frailer, with more frequent severe systemic disease that was a constant threat to life (ASA score 4, 19.4% (n = 19) versus 6.8% (n = 55), p < 0.001) and lower functional scores (PMS and Katz-ADL, p < 0.001 and p = 0.005, respectively). Their hemoglobin levels at admission to the emergency department were lower than those of patients in the survival group (mean 7.3 mmol/l (SD 1.1) versus mean 7.5 mmol/l (SD 1.0), p = 0.018).
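As an illustration of the univariate chi-square comparison, the gender counts reported above can be placed in a 2 × 2 table. This is a minimal sketch and not the study's R analysis: the function name `chi_square_2x2` is hypothetical, no continuity correction is applied, and the male counts are derived by assuming that gender was recorded for all 102 deceased and 820 surviving patients, which may not hold exactly.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: early mortality, survival. Columns: female, male.
# Male counts are derived under the no-missing-gender assumption stated above.
chi2, p = chi_square_2x2(66, 102 - 66, 652, 820 - 652)
```

Under these assumptions the statistic is about 11.5 with a p-value below 0.001, in line with the reported p < 0.001 for gender.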

Development of the Almelo Hip Fracture Score 90 (AHFS90)

Multivariable logistic regression with backward variable selection resulted in 20 models that included age, gender, dementia, living in a nursing home, ASA score, and serum hemoglobin level as predictors. Five models included PMS, and three models included the risk of malnutrition as predictors. Because only variables that were present in at least half of the 20 models were selected for the final model, the final AHFS90 model included age, gender, dementia, living in a nursing home, ASA score, and serum hemoglobin level as predictors of early mortality after hip fracture surgery in patients aged 90 years or older (Table 2).

Table 2 Multivariable logistic regression analysis of the AHFS90

Validation

The modeling procedure was validated on 200 bootstrap replicates, which resulted in an AUC of 0.74 with a 95% CI ranging from 0.72 to 0.76.

For the geographic cross-validation, data from 778 of the 922 (84.4%) patients could be retrieved. For one hospital, only 13 patients were included, which was too few to test the model on. Therefore, the geographic validation was performed on the remaining five hospitals and 765 patients. The geographic cross-validation resulted in a pooled AUC of 0.72, with a 95% confidence interval ranging from 0.67 to 0.76.

Based on the coefficients and the constant factor (Table 2), the AHFS90 score can be calculated using the following formula:

$$\mathrm{AHFS}90=-8.872+\left(0.085\cdot\mathrm{Age}\right)-\left(0.906\cdot\mathrm{Gender}\right)+\left(0.586\cdot\mathrm{Dementia}\right)+\left(0.593\cdot\mathrm{Nursing\;Home}\right)+\left(0.836\cdot\mathrm{ASA}_{3}\right)+\left(1.704\cdot\mathrm{ASA}_{4}\right)-\left(0.219\cdot\mathrm{Hb}\right)$$

Instructions on how to use the formula:

  • Age is the patient’s age in years.

  • Gender has the value of 1 for females and 0 for males.

  • Dementia has a value of 1 for patients diagnosed with dementia and 0 for patients not diagnosed with dementia.

  • Nursing Home has the value of 1 for patients living in a nursing home and 0 for patients with other living situations.

  • ASA3 has the value of 1 for patients with an ASA score of 3 and 0 for patients with a different ASA score.

  • ASA4 has the value of 1 for patients with an ASA score of 4 and the value of 0 for patients with a different ASA score.

  • Hb is the hemoglobin level in mmol/l at admission to the hospital.

To predict the risk of early mortality, the AHFS90 score is entered in the following formula:

$$\mathrm{Risk}\;\mathrm{of}\;\mathrm{early}\;\mathrm{mortality}\left(\%\right)=\frac{100}{1+\mathrm e^{-\mathrm{AHFS}90}}$$

Box 1 gives an example of how to use the AHFS90 in clinical practice.

Box 1 Example of the use of the AHFS90

A 94-year-old man with dementia and severe systemic disease with a constant threat to life presented to the emergency department several hours after a slip and fall at the nursing home where he lives. He complained of pain in his left leg and was unable to stand on this leg. The leg was shortened and externally rotated. A plain radiograph of the hip revealed a trochanteric proximal femoral fracture. Laboratory tests revealed a hemoglobin level of 5.3 mmol/l.

His daughter inquired about treatment options. To inform the patient and his daughter about the prognosis and start the shared decision-making process, the emergency department physician calculated the risk of early mortality using the AHFS90.

Predicting the risk of early mortality with the AHFS90

  The patient’s scores on the AHFS90 were as follows:

  Age in years = 94

  Gender = 0

  Dementia = 1

  Nursing Home = 1

  ASA4 = 1

  Hb in mmol/l = 5.3

\(\mathrm{AHFS}90=-8.872+0.085\cdot 94-0.906\cdot 0+0.586\cdot 1+0.593\cdot 1+0.836\cdot 0+1.704\cdot 1-0.219\cdot 5.3\)

\(\mathrm{AHFS}90=0.840\)

\(\mathrm{Risk}\;\mathrm{of}\;\mathrm{early}\;\mathrm{mortality}\;\left(\%\right)=\frac{100}{1+\mathrm e^{-0.840}}=69.9\%\)  
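The two formulas and the Box 1 calculation can be reproduced in a few lines of code. The following is a minimal sketch using the published coefficients; the function names `ahfs90_score` and `early_mortality_risk` are hypothetical, and the sketch is meant as an illustration of the arithmetic, not as a validated clinical tool.

```python
from math import exp


def ahfs90_score(age, female, dementia, nursing_home, asa, hb_mmol_l):
    """Linear predictor of the AHFS90.

    `female`, `dementia` and `nursing_home` are coded 1 (yes) or 0 (no);
    `asa` is the ASA score (1-4); `hb_mmol_l` is hemoglobin at admission.
    """
    if asa not in (1, 2, 3, 4):
        raise ValueError("ASA score must be 1, 2, 3 or 4")
    score = (-8.872 + 0.085 * age - 0.906 * female + 0.586 * dementia
             + 0.593 * nursing_home - 0.219 * hb_mmol_l)
    if asa == 3:
        score += 0.836   # ASA3 indicator
    elif asa == 4:
        score += 1.704   # ASA4 indicator
    return score


def early_mortality_risk(score):
    """Convert the AHFS90 score into a predicted 30-day mortality risk in percent."""
    return 100 / (1 + exp(-score))


# The fictional patient from Box 1: 94-year-old man, dementia, nursing home, ASA 4, Hb 5.3.
box1_score = ahfs90_score(age=94, female=0, dementia=1, nursing_home=1, asa=4, hb_mmol_l=5.3)
box1_risk = early_mortality_risk(box1_score)
```

For the Box 1 patient, this gives a score of roughly 0.840 and a predicted risk of about 69.9% after rounding, matching the worked example.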

Figure 1 and Table 3 display the predicted risk of early mortality calculated with the AHFS90 versus the observed risk of early mortality in the study population. In the lower predicted risk categories (< 30.0%), the predicted risk corresponds with the observed risk of early mortality. In the predicted risk category 30.0–40.0%, the AHFS90 slightly overestimates the risk of early mortality. In the category 40.0–50.0%, the predicted risk corresponds reasonably well with the observed risk. Because no mortality was observed in the risk category 50.0–60.0%, a proper analysis of this category was impossible.

Fig. 1 Observed early mortality versus predicted early mortality calculated with the AHFS90 in the study population. Observed risk of early mortality (%) is plotted against predicted risk of early mortality (%). The dashed line indicates perfect agreement between the observed and the predicted risk of early mortality

Table 3 Predicted risk of early mortality calculated with the AHFS90 versus the observed risk of early mortality in the study population

Because no mortality was observed in the risk category 50.0–60.0%, no bar is shown for this category in Fig. 1. In the category 60.0–70.0%, the AHFS90 again overestimates the risk of early mortality. The maximum predicted risk of early mortality in this study population was 64.5%.
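A calibration table of this kind can be built by grouping patients into predicted-risk bins and comparing the mean predicted risk with the observed mortality in each bin. The sketch below assumes 10-percentage-point bins, as in the risk categories above; the function name `calibration_table` is hypothetical and the data in the usage line are toy values, not study data.

```python
def calibration_table(predicted_risks, died, bin_width=10.0):
    """Return (lower, upper, n, mean predicted %, observed %) per risk bin.

    `predicted_risks` are percentages (0-100); `died` holds 1 or 0 per patient.
    """
    bins = {}
    for risk, outcome in zip(predicted_risks, died):
        # A risk of exactly 100% is placed in the highest bin
        lower = min(int(risk // bin_width) * bin_width, 100.0 - bin_width)
        n, deaths, total = bins.get(lower, (0, 0, 0.0))
        bins[lower] = (n + 1, deaths + int(outcome), total + risk)
    return [
        (lower, lower + bin_width, n, total / n, 100 * deaths / n)
        for lower, (n, deaths, total) in sorted(bins.items())
    ]


# Toy values for illustration only.
rows = calibration_table([5, 8, 35, 38], [0, 0, 1, 0])
```

Bins without patients do not appear in the output, and bins with very few patients give unstable observed percentages, which is the limitation noted for the higher risk categories.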

Discussion

We developed and validated the AHFS90 to predict the risk of early mortality in hip fracture patients aged 90 years or older. The AHFS90 provides insight into the risk of early mortality after hip fracture in these patients; it incorporates age, gender, dementia, living in a nursing home, ASA score, and hemoglobin level as independent predictors for early mortality. Based on the AUC of 0.72 after geographic cross-validation, the accuracy of the risk model is good [26].

With the increasing number of hip fractures, the aging of the population with its associated comorbidities and costs, and limited workforce availability, there is a growing need for careful consideration of the right therapy for the right patient. Whereas almost every hip fracture patient was treated surgically in the past, it is now becoming increasingly important to consider whether surgery is the best treatment option. This approach is supported by the recently published FRAIL-HIP study, which showed that, regarding quality of life, nonoperative management is a viable option in hip fracture patients with a limited life expectancy. In the multidisciplinary shared decision-making process regarding operative vs. nonoperative treatment, goals of care and the patient’s condition are critical topics that should be discussed. The AHFS90 could add value to this process by providing information about the prognosis of the individual patient after hip fracture surgery.

In the past, several studies developed or investigated risk models for early mortality in hip fracture patients [5, 14, 27,28,29]. However, most risk models targeted patients aged 65 or 70 years and older. The AHFS90 includes only patients aged 90 years or older, who have relatively high mortality rates [15]. The assumption was made that a better prediction could be expected in a study population with a less skewed distribution in survival and mortality. The mortality rate in our study (11.1%) was comparable to the 30-day mortality of 13.3% in nonagenarians found in the Dutch study of de Groot et al. [15]. Overall, the patient population in this large multicenter prospective cohort study is likely to reflect the Dutch hip fracture population, as it includes patients from six different hospitals across various regions in the Netherlands.

Initially, a selection of possible predictors for mortality was made from the available data based on a literature review. Variable selection should be based on domain knowledge because selection based on statistical methods alone does not result in stable models [24]. Univariate analyses showed that not all selected predictors were significantly related to early mortality (Table 1). The omission of predictors based on their univariate p-values might result in the omission of potential confounders. By making a first selection based on domain knowledge, backward variable selection identified the model with the best combination of predictors, regardless of their univariate significance.

The proportion of missing data was less than 12.0% per variable. Most missing data were seen in the variables dementia, polypharmacy, risk of malnutrition, Katz-ADL score, and PMS. This finding could be caused by a lack of staffing capacity for data collection, as described by Voeten et al. [16]. MICE was used to create 20 imputed datasets [21]. Convergence plots showed that the imputation was stable after ten iterations, indicating good performance of the imputation procedure.

The independent predictors included in the AHFS90 are well-known risk factors [30,31,32,33]. All are included in the AHFS. In addition, the AHFS incorporated the PMS, malignancies, and number of comorbidities. Unfortunately, the DHFA TFI does not register data concerning comorbidities, such as the Charlson Comorbidity Index, or malignancy. In the literature, several meta-analyses have shown comorbidities and malignancy to be risk factors for mortality [7, 19, 32, 34]. Including these data in a final model might lead to better predictions. We included polypharmacy as a proxy for comorbidities. This covered all the medications the patient used, both prescribed and non-prescribed (such as multivitamins). Multivariable logistic regression with backward variable selection resulted in 20 models, of which five included PMS and none included polypharmacy. Because only variables present in at least half of the 20 models were selected for the final model, the final model excluded PMS and polypharmacy. These results could have been different if polypharmacy had included only prescribed medications, which would be a better proxy for comorbidities. Unfortunately, we were not able to obtain information about how many of the medications were prescribed.

In the Netherlands, most of the weighted risk factors of the AHFS90 are already gathered for audit purposes [16]. Except for hemoglobin level at admission to the hospital, all risk factors are variables mandatorily collected in the DHFA. Hemoglobin level is already present in the dataset of the DHFA TFI for research purposes. In the future, hemoglobin level is expected to become a variable in the DHFA to optimize benchmarking.

The AUC for the AHFS90 in this study was 0.74, which is lower than the AUC of 0.82 in the original AHFS study [14]. However, the AUC of the AHFS might be relatively high due to overfitting, as it was validated on the dataset used for its development. One of the strengths of the present study is that we internally validated the model appropriately by correcting for over-optimism with bootstrapping. In four risk categories of the AHFS90, the predicted risk corresponded with the observed risk of early mortality. The AHFS90 slightly overestimated the risk of early mortality in two risk categories. In one risk category, no mortality was observed, so a proper analysis was impossible. The AHFS90 predicted a risk of > 40.0% in only a few patients. Therefore, comparing the predicted risk with the observed risk in the risk categories 40.0–50.0%, 50.0–60.0%, and 60.0–70.0% could be inaccurate. Future external validation studies are recommended to investigate the accuracy of the AHFS90 in another study population and the performance of predicted risk categories of > 50.0%. A larger cohort of patients would be necessary to increase the number of patients in the higher predicted risk categories. The DHFA may provide these data in the future, when hemoglobin is included in the mandatory dataset.

Predicting a high risk of early mortality (> 80.0%) in hip fracture patients is challenging. In the past, several studies that developed risk scores for early mortality in hip fracture patients experienced a limitation in range and could not predict a risk of early mortality higher than 45.0% [29, 35]. The first AHFS calculated a maximum risk of early mortality of 68.4% [14]. With the AHFS90, we hoped to extend this range to be more supportive in clinical decision-making. In our study population, this goal was not achieved; the maximum risk of early mortality predicted by the AHFS90 was 64.5%. One of the reasons that the risk of early mortality calculated in this study population might be relatively low is that including only patients who underwent surgery may bias the study population. Patients who received nonsurgical treatment might have worse patient characteristics and a higher risk of early mortality, which often is the reason to refrain from surgery. By excluding these patients, we created a relatively healthier patient population. This study could not assess the precise impact of excluding these patients; however, 2% of all hip fracture patients registered in the DHFA received nonoperative treatment in 2017–2019 [36]. Future studies are recommended to observe the ability of the AHFS90 to predict the risk of early mortality in the general hip fracture population regardless of the type of treatment (e.g., by using data of the DHFA TFI). This way, the actual distribution and maximum predicted risk of early mortality can be observed in the Dutch hip fracture population.

In addition to bias in the study population, another possible explanation for the limited range in predicting early mortality is that we did not include all relevant potential predictors. As mentioned above, data regarding comorbidities, which are known risk factors for mortality after hip fracture surgery, were unavailable. Including these data (in the form of the number of comorbidities or proxies such as the Charlson Comorbidity Index) might lead to better prediction models. Furthermore, psycho-social factors that might accelerate the aging process, such as mindset or emotional loneliness, were not included [37, 38]. Another possible predictor not yet measured was overall physical reserve capacity. This parameter can be measured by, for example, fatigability in handgrip strength. Studies showed that lower handgrip strength was associated with higher mortality risk [39,40,41]. To explore the concept of fatigability and overall physical reserve capacity, our research group is currently researching a dynamic version of handgrip strength (so-called “grip work”). The device eforto® (Fatigability Outcomes to monitor Resilience Targets in Older Persons) is being tested to monitor muscle fatigability as a dynamic marker of an older person’s intrinsic capacity and resilience [42].

Machine learning could help optimize the extraction of predictors, possibly leading to an even more accurate prediction. This technique creates an algorithm that searches for patterns within data. It extracts knowledge through an inductive process: the inputs are the data and examples of the expected output (mortality within 30 days after hip fracture surgery). The machine then learns which algorithm to follow to obtain the same result by discovering patterns in large datasets. In contrast to logistic regression, there is no limitation on the number of input features. In 2021, Yenidogan et al. used multimodal machine learning to predict the 30-day postoperative mortality of older adults sustaining a hip fracture [43]. Using data containing patient characteristics, comorbidities, vital signs, physical examinations, electrocardiography, laboratory tests, and X-ray images, the authors achieved an optimal AUC of 0.79 in a multimodal model. Their primary takeaway message was clear: a multimodal machine learning model can substantially benefit from the additional data from other modalities. By comparison, classical statistical models work with smaller datasets; however, they are easier to interpret.

Future recommendations

A perfect risk score for predicting early mortality after hip fracture surgery is not yet available. Recommendations for the future are to strive to develop a risk score that supports treatment considerations. The prediction needs to be accurate and reach a predicted risk of early mortality of 80% or more, as lower predicted risks are less likely to support and adjust treatment considerations. In addition to the risk score, the clinical view of the health care professionals also plays an essential role. Furthermore, the decision on a nonoperative or operative treatment should be made in a shared decision-making process. Shared decision-making requires a holistic approach to patient care, in which knowledge of what matters most for the patient is essential. This final piece of information, with the prognosis based on the risk score, enables a careful consideration of the surgery’s benefits and possible adverse effects.

Conclusion

The AHFS90 predicts early mortality after hip fracture surgery in patients aged 90 years or older. Age, gender, dementia, living situation, ASA score, and hemoglobin level are independent risk factors included in the model. The accuracy of the AHFS90 is good, with an AUC of 0.74. Calibration showed that the predicted risk corresponds with the observed risk in most risk categories. In our study population, the AHFS90 yielded a maximum prediction of early mortality of 64.5%, comparable to the maximum risk of the AHFS.