Introduction

Clinicians, researchers, and federal regulators have often failed to agree on relevant and reproducible measures of fracture healing [8]. Although most surgeons believe fracture healing should be determined using clinical and radiographic information, substantial variability in the methods for assessing healing is observed in practice and the research environment [3, 7, 11, 15, 16]. This variability is problematic when trying to establish the validity of a study’s results or to compare results across studies. Because of the high incidence of hip fractures in the elderly and their associated morbidity [9, 12, 14], numerous clinical trials are aimed at interventions to improve femoral neck fracture healing outcomes; however, the assessment of femoral neck fracture healing remains highly subjective and causes disagreements among specialists [6].

A variety of descriptions for fracture nonunion have been proposed in recent clinical studies, including a past US Department of Health and Human Services definition of nonunion, “when fracture healing has ceased for 3 or more months” [10]. Although many authors argue that a temporal definition for fracture healing is too arbitrary and flawed [4], describing a minimum amount of cortical and trabecular bridging to distinguish between healed and ununited fractures is equally subjective. Many healed fractures may not achieve complete trabecular consolidation and cortical bridging despite the patient’s return to full activities. Clinically, surgeons use radiographic and patient assessments to determine if a fracture has united.

The Radiographic Union Score for Hip (RUSH) is a previously validated tool that improves fracture healing agreement between radiologists and orthopaedic surgeons by using a checklist-based scoring approach [1, 2, 5]. Using this tool to grade the healing of femoral neck fractures leads to better agreement with respect to radiographic healing and improved intra- and interobserver reliability [5]. We sought to determine if this systematic and standardized scoring approach could be used to determine fracture healing and predict clinically important outcomes. Therefore, the primary purpose of this study was to find a RUSH score that is highly specific for femoral neck nonunion at 6 months and, secondarily, to determine if this score would be associated with an increased risk for reoperation.

We therefore asked: (1) What RUSH score threshold yields at least 98% specificity to diagnose nonunion at 6 months postinjury? (2) Using the threshold identified, are patients below this threshold at greater risk of reoperation for nonunion repair and for other indications?

Materials and Methods

We used a convenience sample from the ongoing Fixation Using Alternative Implants for the Treatment of Femoral Neck Fractures (FAITH) trial [8]. FAITH is a prospective randomized controlled trial comparing multiple cancellous screws versus the sliding hip screw for the treatment of elderly patients with femoral neck fractures (ClinicalTrials.gov Identifier, NCT00761813). This trial is coordinated by the Centre for Evidence-Based Orthopaedics (McMaster University) and has been approved by the Hamilton Integrated Research Ethics Board (#06-402). All radiographs and measures of fracture healing were obtained prospectively as part of the trial’s protocol. Using the study radiographs, two authors (TF, GO) independently assigned a RUSH score to the 6-month postinjury radiographs of all included patients. The reviewers were blinded to study outcome events and all analyses were performed using fracture healing data obtained from the trial.

To be eligible for the current study, patients were required to have adequate AP and lateral hip radiographs at 6 months (followup window, 5–7 months) and to have all study outcome data reviewed by the independent Central Adjudication Committee (CAC). Adequate radiographs were those that had sufficient penetrance and image quality such that the components of the RUSH score could be reliably assessed. The 6-month followup visit was selected as the time point of interest for our study because it was felt to be the first clinical visit at which unhealed fractures could be declared a nonunion and, similarly, any reoperations that occurred before 6 months would likely have been for indications other than nonunion. A representative sample of 250 trial participants was included in our analysis out of a possible 1112 enrolled in the FAITH trial. Four hundred seventy-five patients were excluded for inadequate or missing 6-month followup radiographs. The remaining 387 patients had not completed their 2-year study followup or their clinical data had not yet been analyzed by the CAC.

Our primary clinical outcome was nonunion at 6 months, defined as an ununited femoral neck fracture at the 6-month assessment. We acknowledge that defining a femoral neck nonunion as any ununited fracture at 6 months is simplistic, but it was appropriate for the context of the study design. Nonunion events were obtained from two sources: (1) the treating surgeon’s assessment at the point of care; and (2) the CAC’s assessment of fracture healing based on radiographs only. The assessment of fracture healing by the local treating surgeon was used as the primary outcome defining nonunion at 6 months. We opted to use the local assessment of fracture healing because the treating surgeon had the benefit of using both the radiographic and the clinical examination to make his or her determination.

The CAC consisted of five experienced orthopaedic trauma surgeons (GPS, GDR, SL, KJ, RH) trained in outcome adjudication. The committee retrospectively viewed all available radiographs and case report forms describing study outcome events such as reoperation. The CAC made a consensus determination of either “healed” or “not healed” at each study time point based solely on radiographic parameters. A fracture was considered healed when there was complete obliteration of the fracture line on the radiograph. This was intentionally a conservative assessment because the CAC was unable to evaluate the patients clinically, unlike the local surgeon. Details surrounding the indication for reoperation were reviewed by the CAC to ensure that each reoperation met the criteria for a study event, namely, that it was unplanned and related to the femoral neck fracture.

Although we recognized that using the treating surgeon’s assessment may introduce local bias or increased intersurgeon variability in the determination of fracture healing, the additional use of the CAC data would provide a secondary assessment to further validate the findings of our study. The repeat analysis using the CAC’s assessment of healing revealed similar findings to surgeon assessment (Figs. 1, 2). As expected, the CAC’s more stringent definition of radiographic healing resulted in a larger proportion of fractures being classified as a nonunion at 6 months (CAC: n = 122 of 250 [49%], treating surgeons: n = 53 of 250 [21%]; p < 0.001); however, the mean RUSH score of the nonunion group was similar to the result using the local surgeon assessment (CAC: 22.6 ± 3.7, treating surgeons: 22.1 ± 4.0; p = 0.15).

Fig. 1

The RUSH scores based on the treating physicians’ assessment are stratified by fracture healing in the scatterplot.

Fig. 2

The RUSH scores based on the CAC’s radiographic assessment are stratified by fracture healing in the scatterplot.

Two investigators (TF, GO) reviewed the 6-month radiographs of all included patients and independently assigned a RUSH score for each patient. The reviewers were provided with the original publications describing the methods for assigning a RUSH score and a brief tutorial to promote consistency. The RUSH quantifies four measures of healing: cortical bridging, cortical fracture disappearance, trabecular consolidation, and trabecular fracture disappearance [3]. Cortical healing is assessed in four anatomic femoral neck regions (anterior, posterior, medial, lateral) and trabecular healing is measured with two assessments (fracture line disappearance and consolidation of matrix). Each of the 10 assessed dimensions of radiographic femoral neck healing is scored 1 to 3, leading to a minimum score of 10 (no signs of healing) and a maximum score of 30 (perfect healing) (Fig. 3). The average of the two reviewers’ RUSH scores was used as the final RUSH score for all analyses. Interobserver agreement of RUSH scores was assessed by the intraclass correlation coefficient (ICC 2,k) to ensure adequate agreement. An ICC of > 0.8 was used to define nearly perfect agreement, as suggested by Landis and Koch [13]. There was substantial reliability between the reviewers assigning the RUSH scores (ICC, 0.81; 95% confidence interval [CI], 0.76–0.85). The mean 6-month RUSH score for the entire cohort was 24.3 ± 3.4.
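The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes that the 10 components are the two cortical measures in each of the four regions plus the two trabecular measures; the function names, dictionary keys, and example values are hypothetical and are not part of the RUSH publications or the study data.

```python
from statistics import mean

# Hypothetical labels for the scored dimensions (assumed layout: 2 x 4 + 2 = 10).
CORTICAL_MEASURES = ("bridging", "fracture_disappearance")
CORTICAL_REGIONS = ("anterior", "posterior", "medial", "lateral")
TRABECULAR_MEASURES = ("consolidation", "fracture_disappearance")


def rush_total(cortical, trabecular):
    """Sum the 10 component scores (each 1 to 3) into a total from 10 to 30.

    cortical: dict keyed by (measure, region); trabecular: dict keyed by measure.
    """
    components = [cortical[(m, r)] for m in CORTICAL_MEASURES for r in CORTICAL_REGIONS]
    components += [trabecular[m] for m in TRABECULAR_MEASURES]
    if any(score not in (1, 2, 3) for score in components):
        raise ValueError("each component must be scored 1, 2, or 3")
    return sum(components)


def final_rush(reviewer_totals):
    """Average the reviewers' totals, as done for the final score in this study."""
    return mean(reviewer_totals)


def uniform_scores(value):
    """Build a hypothetical assessment in which every component has the same score."""
    cortical = {(m, r): value for m in CORTICAL_MEASURES for r in CORTICAL_REGIONS}
    trabecular = {m: value for m in TRABECULAR_MEASURES}
    return cortical, trabecular


if __name__ == "__main__":
    print(rush_total(*uniform_scores(1)))  # lowest possible total
    print(rush_total(*uniform_scores(3)))  # highest possible total
    print(final_rush([22, 25]))            # two hypothetical reviewer totals
```

Averaging the two reviewers' totals means the final score can take half-point values, which matters when a threshold such as "less than 18" is applied.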

Fig. 3

Radiographs taken at 6 months postinjury serve as examples for a low-scoring RUSH assessment (RUSH: 12) and a high-scoring RUSH assessment (RUSH: 30).

Statistical Analysis

All data were compiled in an Excel database (Microsoft Corp, Redmond, WA, USA) and exported to JMP 9.0 (SAS Institute Inc, Cary, NC, USA) for statistical analysis. Unless otherwise denoted, data were summarized as means with SDs or as proportions with 95% CIs. A Student’s t-test was used to test for differences in RUSH scores between subgroups. The level of significance was defined as p < 0.05. The primary analysis was performed using the treating surgeons’ assessment of fracture healing at 6 months and the average RUSH score for the corresponding 6-month hip radiographs. Secondary associations with the 6-month RUSH score were examined using the CAC’s assessment of 6-month fracture healing as well as reoperations between 6 months and 2 years postinjury, both for nonunion and for all-cause indications such as implant removal.

Receiver operating characteristic tables were computed to examine the specificity of threshold RUSH values that would correctly classify a nonunion. Because the study’s primary objective was to identify a threshold RUSH score that would be specific for nonunion, we sought to maximize the specificity of the threshold RUSH value. This would ensure all radiographs with a RUSH score below the defined threshold would be classified as nonunions. A priori, we decided that a threshold with > 98% specificity would represent a clinically acceptable margin of error and would meet our study objective. Finally, the positive predictive value (PPV = true-positives/[true-positives + false-positives]) was calculated to assess the discriminatory value of our threshold for defining nonunion.
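As an illustration of the threshold analysis described above, the following sketch computes specificity and PPV for a candidate threshold, treating a score below the threshold as a positive test for nonunion. The function names, the candidate range, the 0.98 target expressed as "at least", and the example data are all assumptions made for illustration; they are not the study's data or the JMP procedure that was actually used.

```python
def threshold_metrics(scores, nonunion, threshold):
    """Return (specificity, ppv) when 'score < threshold' is a positive test."""
    tp = fp = tn = fn = 0
    for score, is_nonunion in zip(scores, nonunion):
        test_positive = score < threshold
        if test_positive and is_nonunion:
            tp += 1
        elif test_positive and not is_nonunion:
            fp += 1
        elif not test_positive and is_nonunion:
            fn += 1
        else:
            tn += 1
    # Specificity: healed fractures correctly scored at or above the threshold.
    specificity = tn / (tn + fp) if (tn + fp) else None
    # PPV: true-positives / (true-positives + false-positives).
    ppv = tp / (tp + fp) if (tp + fp) else None
    return specificity, ppv


def highest_threshold_meeting(scores, nonunion, target=0.98, candidates=range(11, 31)):
    """Return the largest candidate threshold whose specificity is at least target."""
    for threshold in sorted(candidates, reverse=True):
        specificity, _ = threshold_metrics(scores, nonunion, threshold)
        if specificity is not None and specificity >= target:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical scores and outcomes, for illustration only.
    scores = [12, 15, 17, 19, 22, 24, 26, 28]
    nonunion = [True, True, True, True, False, False, False, False]
    print(threshold_metrics(scores, nonunion, 18))
    print(highest_threshold_meeting(scores, nonunion))
```

Lowering the threshold can only remove false-positives, so specificity rises as the threshold falls, at the cost of fewer cases meeting the definition.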

Results

The mean age of study participants was 71 ± 12 years and two-thirds of the fractures were Garden I or II fracture patterns (Table 1). A total of 121 patients received a sliding hip screw and 129 received multiple cancellous screws.

Table 1 Participant demographics (n = 250)

A RUSH score of < 18 corresponds to 100% specificity and a PPV of 100% with 13 cases meeting criteria. Increasing specificity is achieved by decreasing the RUSH threshold score for both the treating surgeon and the CAC assessments of healing (Table 2). According to the treating surgeons’ assessments, 53 patients (21%) had a femoral neck nonunion at 6 months. The mean RUSH score was 2.8 points higher (95% CI, 1.7–4.0) among healed fractures compared with ununited fractures (24.9 ± 3.0 versus 22.1 ± 4.0, respectively) (Fig. 1). Based on the treating surgeons’ assessment of fracture healing, a 6-month RUSH score of less than 16 points defines a nonunion; however, this diagnostic threshold has a poor PPV (43%) and only six cases in the entire series met this definition.

Table 2 Threshold RUSH score for defining nonunion

Patients with a RUSH score of < 18 were 10 times more likely to undergo a nonunion reoperation than individuals with higher scores (relative risk [RR], 9.9; 95% CI, 4.4–22.7). Similarly, the 18-point threshold was also predictive of reoperation for all indications (RR, 2.7; 95% CI, 1.7–4.4).
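For readers unfamiliar with how a relative risk and its confidence interval are derived from a 2 × 2 table, a minimal sketch follows. It assumes the common log-transformed (Katz) interval; the source does not state which method was used, and the counts in the example are hypothetical rather than the study's counts.

```python
import math


def relative_risk(events_below, n_below, events_above, n_above, z=1.96):
    """Relative risk of an event for patients below vs at or above a threshold.

    Returns (rr, lower, upper) using a log-transformed interval (an assumption).
    """
    risk_below = events_below / n_below
    risk_above = events_above / n_above
    rr = risk_below / risk_above
    # Standard error of log(RR) for two independent proportions.
    se_log = math.sqrt(
        1 / events_below - 1 / n_below + 1 / events_above - 1 / n_above
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 5 of 10 reoperations below the threshold, 10 of 100 above.
    rr, lower, upper = relative_risk(5, 10, 10, 100)
    print(f"RR {rr:.1f} (95% CI, {lower:.1f} to {upper:.1f})")
```

A wide interval is expected when few patients fall below the threshold, as with the 13 cases scoring below 18 in this series.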

The repeat analysis using the CAC’s assessment of healing revealed similar trends (Fig. 2). As expected, the CAC’s more stringent definition of radiographic healing resulted in a larger proportion of fractures being classified as nonunion at 6 months (n = 122 [49%]); however, the mean RUSH score was approximately 3 points higher among healed fractures versus nonunions (p < 0.001).

Discussion

The assessment of femoral neck fracture healing remains highly subjective and causes disagreements among specialists. The RUSH score is an instrument designed to describe radiographic healing of femoral neck fractures. The impetus for this study was the desire to expand the utility of the RUSH score to standardize a reproducible definition of fracture nonunion and to improve the ability to describe the continuum of radiographic healing in this metaphyseal region. Using a method that assesses cortical and trabecular healing, previous studies have demonstrated excellent inter- and intrarater reliability of the instrument; however, the RUSH score has not been used to define fracture healing. In the current study, we aimed to determine a RUSH threshold score that would correctly classify femoral neck fractures that have not healed at 6 months.

There is no gold standard for the assessment of femoral neck fracture healing and this remains problematic. Although radiographic healing has traditionally been an important outcome of interest, the current results demonstrate a large discordance between clinical practice and determining healing outcomes solely by radiographs. We believe this observation reinforces the need for orthopaedic research to be based on clinical data and patient-reported outcomes. Furthermore, both RUSH score assessors knew that all radiographs were taken at 6 months postinjury. Although this is not a limitation of the study’s internal validity, readers should recognize that the findings likely do not apply to radiographs taken earlier than 6 months postinjury. We also acknowledge that defining a femoral neck nonunion as any ununited fracture at 6 months is simplistic, but it was appropriate for the context of the study design. Additionally, although we captured a representative sample of 250 out of a possible 725 patients, the rather large number of exclusions was attributable to insufficient radiograph quality for accurate RUSH score assessment, and this highlights a possible limitation to using the RUSH. Finally, although we were only able to analyze the radiographs of 250 patients, we do not believe there is any reason to suspect this sample of the study population would lead to biases in the RUSH scores assigned or limit the external validity of our results.

Our results suggest that a RUSH score of less than 18 has a 100% PPV for defining radiographic nonunion at 6 months postinjury; however, when one defines fracture healing using the treating surgeons’ combined clinical and radiographic assessment, there is no RUSH threshold that is useful for classifying femoral neck fracture nonunions. The discrepancy between the local surgeons’ assessment and the radiograph-only assessment (CAC) highlights the heavy influence clinical evaluation imparts on a surgeon’s determination of fracture healing. The use of the CAC definition of healing was intentionally conservative; however, the assessment of the PPV is dependent on the prevalence of nonunion in the sample. Because the nonunion rate changes with certain populations, this will inevitably have an effect on the PPV of the 18-point RUSH threshold. The use of multicenter data for our study population increases the confidence that our observed prevalence of nonunion is generalizable to most populations; however, one would expect the prevalence, and therefore the PPV, to change depending on differences in the proportion of displaced fractures.

The RUSH score threshold of 18 also demonstrates utility in predicting reoperations for nonunions as well as reoperations for any other indication. Patients with RUSH scores below this threshold were 10 times more likely to undergo a nonunion reoperation than individuals with higher scores. As expected, we found lower 6-month RUSH scores among patients who underwent a reoperation after their 6-month visit, both for nonunion (19.6 ± 4.9 versus 24.7 ± 3.0) and for all-cause indications (22.9 ± 4.2 versus 24.8 ± 3.0; p = 0.002). This finding again suggests that the radiographic parameters of a femoral neck fracture at 6 months, as measured by the RUSH score, are reliable for assessing radiographic healing; however, the discrepancy when using the treating surgeons’ definition of nonunion underlines that radiographs may not tell the entire clinical picture.

The 6-month RUSH score is a reliable method for assessing radiographic healing. Our results highlight the discordance between radiographic determinations and clinician assessments of fracture healing and stress the need for clinical data to be incorporated in research studies evaluating fracture healing.