Introduction

Assessment of success in orthopaedic surgery is shifting toward patients’ resulting health-related quality of life (HRQoL) and away from the success or failure of joint implants as defined by crude endpoints like revision. Ahmad et al. [1] concluded that the ideal outcome measure for patients who have undergone total hip arthroplasty (THA) should be specific for the hip, possess a generic component, and take into consideration patient comorbidities and the need for walking aids. Although no such ideal outcome measure exists, numerous patient-reported outcome measures (PROMs) appear in the literature and practitioners have augmented the measures with different comorbidity scores. PROMs not only help assess patient eligibility for elective surgery, but also can serve as predictors of patient quality of life after hip surgery [12]. Existing comorbidities are known to influence the outcome of surgery and, depending on the severity, can preclude elective procedures such as THA. As a result, researchers and clinicians often consider comorbidity scores such as those determined using the Charlson Comorbidity Index, Charnley classification, and the American Society of Anesthesiologists physical function score, among others [2]. Comorbidities are defined as patient conditions or diseases not associated with the development or causation of the immediate disease of interest, and comorbidities can be diagnosed at different points in time, which may lead to different associations with the risk of adverse outcomes [18]. Thus, it is necessary to establish the best timeframe for calculating summary comorbidity measures.

Originally, the Swedish Hip Arthroplasty Register’s PROMs program routinely collected data regarding patient pre- and postoperative HRQoL for the purpose of monitoring the performance of the nation’s healthcare centers [22]. More recently, with healthcare centers achieving high success rates, the focus has been diverted to the goal of identifying patients who might not experience sufficient improvement in HRQoL. To this end, the Swedish Hip Arthroplasty Register proposed the construction of a decision-making support algorithm to estimate HRQoL 1 year postsurgery of patients who underwent THA. Such an instrument needed to consider both hospital- and patient-related factors. Existing patient comorbidities are known to influence the outcome of medical procedures, but there is no consensus on how best to use the available data about comorbidities in registry-based predictive models.

The objective of our study was to determine whether International Classification of Diseases, 10th Revision (ICD-10)-based comorbidity measures (Elixhauser, Charlson, and the Royal College of Surgeons [RCS] Charlson) had added value for explaining postoperative HRQoL, pain, and satisfaction with the outcomes of surgery for patients who had undergone THA when preoperative HRQoL, pain, and Charnley classification were already known. We also aimed to investigate the ideal timeframe for recording the different diagnoses that served as the basis for comorbidity measure calculations.

Materials and Methods

We performed a registry study with prospectively collected PROM data from the national Swedish Hip Arthroplasty Register. These data were then merged with diagnosis data from the Swedish National Patient Register where each of the comorbidity scores was calculated retrospectively on the basis of patient medical diagnoses contained therein.

The Swedish Hip Arthroplasty Register data initially identified 22,263 patients who had undergone THA (Fig. 1). Mean patient age at time of surgery was 68 years (± 9.9 years). The surgeries were performed between January 1, 2002, and December 28, 2007, with complete preoperative and postoperative PROM data available. If a patient had both hips operated on during the observation period, only the first hip with complete PROMs was included in the analysis. There was a slight predominance of women patients (12,696). To calculate comorbidity scores for each patient, we obtained individual diagnoses (ICD-10) from the Swedish National Patient Register. The two registers were crossmatched with the help of personal identity numbers unique to each Swedish resident to establish demographic, surgical, PROM, and comorbidity data for each patient (Table 1). Data collection and merger of the two databases was approved by the national ethical review board.

Fig. 1

This flowchart indicates patient selection from the Swedish Hip Arthroplasty Register.

Table 1 Summary of study population PROMs pre- and post-THA, Charnley classification, and ICD-10-based comorbidity measures

Data Analyses

We calculated the Elixhauser, Charlson, and the RCS Charlson Comorbidity Indices for each patient using ICD-10 diagnosis codes based on the algorithms developed by Quan et al. [19]. (For further details, we refer the reader to Gordon and collaborators [11].) The three indices identify possible comorbidities and, based on their pathophysiologic properties, group them into different categories. The Charlson Comorbidity Index recognizes 19 distinct categories; the RCS Charlson Index has 14, whereas the Elixhauser has 31 distinct categories. If a patient had any illness identified in one of the comorbidity categories (ie, the patient’s medical history included a corresponding ICD-10 classification code during the studied time period), then that category was assigned a value of 1; if no corresponding ICD-10 codes appeared in the record, the value given was zero.

The RCS Charlson and Elixhauser comorbidity scores assume that each of the distinct categories weighs equally on patient health and the final index value was the sum of the different categorical values. Elixhauser et al. questioned this approach because the impact of disease categories may vary among different outcomes [8]; however, counting comorbidities was recently found to be a viable alternative to weighted indices [11]. The Charlson Index assigns weight for each category based on the seriousness of the disease group and the final index is a weighted sum of the different categorical values [19].
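The scoring logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm of Quan et al. [19]: the category names, ICD-10 prefixes, and weights are toy placeholders chosen for the example, and the function names are hypothetical.

```python
# Toy category definitions: ICD-10 code prefixes per category.
# Illustrative and deliberately incomplete; the real code lists are in Quan et al. [19].
TOY_CATEGORIES = {
    "heart_failure": ("I50",),
    "diabetes": ("E10", "E11"),
    "renal_disease": ("N18",),
}

# Hypothetical weights, used only for the weighted (Charlson-style) variant.
TOY_WEIGHTS = {"heart_failure": 1, "diabetes": 1, "renal_disease": 2}


def category_flags(codes, categories=TOY_CATEGORIES):
    """Return 1 for each category with at least one matching code, else 0."""
    return {
        name: int(any(code.startswith(prefixes) for code in codes))
        for name, prefixes in categories.items()
    }


def unweighted_score(flags):
    """Count of flagged categories (every category weighs equally)."""
    return sum(flags.values())


def weighted_score(flags, weights=TOY_WEIGHTS):
    """Weighted sum of flagged categories."""
    return sum(weights[name] * value for name, value in flags.items())
```

A patient with the codes I50.9, N18.3, and M16.1 would be flagged in two of the three toy categories, so the count-based score and the weighted score can differ for the same record.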

We calculated the Elixhauser, Charlson, and RCS Charlson comorbidity scores based on three overlapping timeframes. First, we went back 1 year from the actual surgery date and extracted all ICD-10 diagnosis codes recorded for the patient during that period. Then, we extended the presurgery timeframe to 2 years and then to 5 years before THA. Because each longer timeframe contains the shorter ones, for every patient the 1-year comorbidity score should have been equal to or less than the 2-year comorbidity score, which in turn should have been equal to or less than the 5-year score.
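The overlapping lookback windows can be sketched as follows. This is an illustration only: the record layout (a list of date and code pairs), the function name, and the approximation of a year as 365.25 days are assumptions introduced here, not details taken from the registers.

```python
from datetime import date, timedelta


def codes_in_window(diagnoses, surgery_date, years):
    """Collect the ICD-10 codes recorded within `years` before surgery.

    `diagnoses` is a list of (date, code) pairs (assumed layout).
    A year is approximated as 365.25 days (assumption).
    """
    start = surgery_date - timedelta(days=round(365.25 * years))
    return {code for when, code in diagnoses if start <= when < surgery_date}


# Hypothetical patient record used only to show the nesting of the windows.
example_diagnoses = [
    (date(2005, 3, 1), "I50.9"),
    (date(2004, 1, 10), "E11.9"),
    (date(2001, 5, 20), "N18.3"),
    (date(1998, 1, 1), "J44.9"),
]
example_surgery = date(2005, 9, 1)
windows = {
    years: codes_in_window(example_diagnoses, example_surgery, years)
    for years in (1, 2, 5)
}
```

Each longer window is a superset of the shorter ones, so any score built by flagging categories from these code sets cannot decrease as the window is extended.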

The PROM protocol consisted of the HRQoL measure EQ-5D, a visual analog scale (VAS) for pain (pain is rated from 0 to 100, where 0 = no pain), the Charnley classification survey, and a VAS measuring patient satisfaction with the outcomes of surgery (from 0 to 100, where 0 = complete satisfaction) [3, 9]. There are three Charnley classifications of musculoskeletal impairment: Class A, which corresponds to unilateral hip disease; Class B, which corresponds to bilateral hip disease; and Class C, which indicates multiple joint disease or some other condition that inhibits the patient’s ability to walk. The surveys were administered preoperatively (excluding the satisfaction VAS) and then 1 year after THA. The protocol was implemented in the Swedish Hip Arthroplasty Register in 2002 and gradually gained nationwide coverage [22].

The impact of the comorbidity scores on the four patient-reported outcomes (PROs; EQ-5D index, EQ VAS, pain VAS, and satisfaction VAS) was modeled with linear regression. For each of the three timeframes, we built a regression model for each of the 1-year followup PROs as the dependent outcome. The 1-year timeframe model included patient preoperative Charnley class, the preoperative value of the considered PRO, and the comorbidity scores calculated on diagnoses reported 1 year before THA. The 2-year presurgery timeframe model replaced the comorbidity scores from the previous model with comorbidity scores calculated on patient diagnoses reported 2 years before THA. The 5-year, pre-THA timeframe model replaced the patient comorbidity scores from the previous model with comorbidity scores calculated on diagnoses reported 5 years before THA.

To determine the influence of the three comorbidity measures on the PROs, the global predictive power of the models was summarized by the coefficient of determination (R2), which denotes the proportion of variance in the outcome explained by the predictors. To understand the individual impact of each predictor, we assessed its partial R2 value to determine how much each variable uniquely contributed to the global R2 value. To counteract possible distortion by including all three comorbidity indices in a single model, we calculated the partial R2 value for univariate models regressing the postoperative PROMs on each of the different comorbidity indices. These models were first adjusted for Charnley classification and then were adjusted for the appropriate preoperative PROM score.
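For reference, the quantities named above can be written out. The text does not state which formula was used for the partial R2, so the second and third expressions give two common readings (the increment in R2 and the conventional partial R2) as assumptions rather than as the definition used in the study. SS denotes a sum of squares, "full" the model with all predictors, and "-j" the model with predictor j removed; these symbols are introduced here for illustration.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
\Delta R^2_j = R^2_{\mathrm{full}} - R^2_{-j}, \qquad
R^2_{\mathrm{partial},\,j} =
  \frac{SS_{\mathrm{res},-j} - SS_{\mathrm{res},\mathrm{full}}}{SS_{\mathrm{res},-j}}
```

Under either reading, a predictor that adds nothing beyond the other predictors yields a value of zero.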

To gauge if there was improvement in the predictive power with the use of a longer timeframe for calculating the comorbidity indices, we calculated the difference between the predictive power of the model based on patient 1-year comorbidity scores with the 2-year and 5-year comorbidity scores, respectively. By using 1000 nonparametric bootstraps, we built 95% confidence intervals (CIs) of the differences between the R2 values. If the interval included zero, then we concluded that there was no evidence suggesting that a longer timeframe would increase the predictive possibilities of the model. In addition, we tested to see whether we could obtain an increase in precision if the interaction between the comorbidity indices and Charnley classification was included in the model. Although we saw an interaction between Charnley Class C and the comorbidity indices as one would expect, this was not accompanied by an increase in predictive power (data not shown), and so these interactions were not pursued further.
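The bootstrap comparison can be sketched in Python. This is a simplified illustration under stated assumptions: each model is reduced to a single predictor (so R2 is the squared Pearson correlation), the interval is a plain percentile interval, and the function names, the seed, and the variable names are hypothetical. The models in the study included further predictors, and the exact bootstrap variant is not specified in the text.

```python
import random


def r_squared(x, y):
    """R2 of a one-predictor linear regression (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample: no variance to explain
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def bootstrap_r2_difference(y, x_short, x_long, n_boot=1000, seed=0):
    """Percentile 95% CI for R2(short timeframe) - R2(long timeframe).

    Patients are resampled with replacement; both models are refitted
    on the same resample so the difference is paired.
    """
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y[i] for i in idx]
        r2_short = r_squared([x_short[i] for i in idx], y_b)
        r2_long = r_squared([x_long[i] for i in idx], y_b)
        diffs.append(r2_short - r2_long)
    diffs.sort()
    lower = diffs[(n_boot * 25) // 1000]
    upper = diffs[min(n_boot - 1, (n_boot * 975) // 1000)]
    return lower, upper
```

With this sign convention, an interval lying entirely below zero corresponds to the longer timeframe explaining more variance, and an interval that includes zero gives no evidence of a gain.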

Results

Independent of the timeframe used to calculate the comorbidity measures, the models generally had low predictive power. Use of the Charnley classification, the three comorbidity scores, and the preoperative value of the outcome explained roughly 10% of the variation observed in EQ-5D index and EQ VAS; however, the pain VAS (3%) and the satisfaction VAS (1%) were insufficiently explained by the aforementioned predictors.

Predictive Value of Comorbidity Measures

Charnley classification and the preoperative scores were the strongest predictors of both measures of postoperative HRQoL (EQ-5D index and EQ VAS) and of postoperative pain (pain VAS). Similarly, Charnley classification was the strongest predictor for postoperative satisfaction with outcomes from surgery (satisfaction VAS). Regardless of the timeframe considered, the Charlson and Elixhauser comorbidity index scores had a marginal influence on the postoperative EQ-5D index (ß = −0.023; 95% CI, −0.035 to −0.011; p < 0.001; and ß = −0.010; 95% CI, −0.015 to −0.004; p < 0.001, respectively) (Fig. 2). Similarly for the postoperative EQ VAS, the Charlson and Elixhauser comorbidity index scores had a marginal influence on the outcome score (ß = −3.407; 95% CI, −4.400 to −2.414; p < 0.001; and ß = −0.886; 95% CI, −1.353 to −0.420; p < 0.001, respectively) (Fig. 3). The pain VAS was not influenced by the Charlson or RCS Charlson scores, but was marginally influenced by the Elixhauser comorbidity index score (ß = 0.522; 95% CI, 0.091–0.954; p = 0.018) (Fig. 4). The Elixhauser comorbidity index was the only score of the three comorbidity measures to influence the satisfaction VAS at 1 year (ß = 0.628; 95% CI, 0.141–1.116; p = 0.012) (Fig. 5). Finally, the low predictive power of each of the comorbidity indices on the PROM values was not found to be the result of colinearity in the tested models (Table 2).

Fig. 2

This forest plot displays the predictive power of ICD-10-based comorbidity measures, self-reported disease-specific comorbidity measure, and the preoperative EQ-5D index values of the postoperative EQ-5D index values.

Fig. 3

This forest plot displays the predictive power of ICD-10-based comorbidity measures, self-reported disease-specific comorbidity measure, and the preoperative EQ VAS values of the postoperative EQ VAS values.

Fig. 4

This forest plot displays the predictive power of ICD-10-based comorbidity measures, self-reported disease-specific comorbidity measure, and the preoperative pain VAS values of the postoperative pain VAS values.

Fig. 5

This forest plot displays the predictive power of ICD-10-based comorbidity measures and self-reported disease-specific comorbidity measure of the postoperative satisfaction VAS values.

Table 2 Predictive power of International Classification of Diseases-based comorbidity measures of the postoperative PROM values

Ideal Timeframe for Diagnosis Recording

For each of the outcomes considered (EQ-5D index, EQ VAS, pain VAS, and satisfaction VAS), there was a gradual increase in the models’ predictive power with the length of the timeframe considered for calculating the comorbidity measures. For the EQ-5D index, the predictive power of the 1-year timeframe was lower than the 2-year timeframe model (R2 difference = −0.003; 95% CI, −0.0049 to −0.0019; p = 0.001). Similarly, the 5-year timeframe model outperformed both the 1-year timeframe model (R2 difference = −0.006; 95% CI, −0.0083 to −0.0039; p = 0.001) and the 2-year timeframe model (R2 difference = −0.002; 95% CI, −0.0046 to −0.0011; p = 0.013) (Table 3). For the EQ VAS, the predictive power of the 1-year timeframe model was significantly lower than the 2-year timeframe model (R2 difference = −0.0031; 95% CI, −0.0046 to −0.0019; p = 0.001). Similarly, the 5-year timeframe model outperformed both the 1-year timeframe model (R2 difference = −0.005; 95% CI, −0.0078 to −0.0038; p = 0.001) and the 2-year timeframe model (R2 difference = −0.002; 95% CI, −0.0045 to −0.0011; p = 0.013) (Table 4). For the pain VAS, the predictive power of the 1-year timeframe model was similar to the 2-year timeframe model’s predictive power (R2 difference = −0.0004; 95% CI, −0.001 to 0.0001; p = 0.001). The 5-year timeframe model slightly outperformed both the 1-year timeframe model (R2 difference = −0.001; 95% CI, −0.0023 to −0.0004, p = 0.001) and the 2-year timeframe model (R2 difference = −0.0008; 95% CI, −0.0016 to −0.0001; p = 0.013) (Table 5). 
Finally, for the satisfaction VAS, the predictive power of the 1-year timeframe model was significantly lower than the 2-year timeframe model (R2 difference = −0.0012; 95% CI, −0.002 to −0.0006; p = 0.001) and, once again, the 5-year timeframe model outperformed both the 1-year timeframe model (R2 difference = −0.002; 95% CI, −0.0036 to −0.0012; p = 0.001) and the 2-year timeframe model (R2 difference = −0.001; 95% CI, −0.0022 to −0.0003; p = 0.013) (Table 6). Although these increases were statistically significant, their clinical importance would need to be determined for each outcome measure.

Table 3 Influence of measures on postoperative EQ-5D index values
Table 4 Influence of measures on patient postoperative EQ VAS values
Table 5 Influence of measures on postoperative pain VAS values
Table 6 Influence of measures on postoperative satisfaction VAS values

Discussion

National registry data can help us to better understand PROs. When conducting these analyses, proper risk adjustment is critical to understand how patient comorbidities influence these surgical outcomes after THA. Unfortunately, there is no consensus on how best to incorporate current comorbidity indices in registry-based predictive models. For the purpose of predicting patient HRQoL, pain, and satisfaction 1 year after undergoing THA, we found that there was no added value in the ICD-10-based comorbidity measures if the patient’s Charnley classification and preoperative HRQoL and pain measures were known regardless of the timeframe used to calculate the comorbidity measures. All three comorbidity measures considered (Elixhauser, Charlson, and the RCS Charlson) had low predictive power.

There were a few limitations to our investigation. Sweden has a very successful arthroplasty registry and PROM followup model with 98% of all THAs captured and a 90% PROM response rate at 1 year after surgery; however, not all data in our investigation could be obtained from the Swedish Hip Arthroplasty Register. The ICD-10 codes used to identify and calculate the comorbidity measures were obtained from the Swedish National Patient Register. Unfortunately, the National Patient Register only collects diagnosis codes of hospitalized patients but not diagnosis codes identified by general practitioners; therefore, some comorbid conditions may have been missed during calculation of the comorbidity measures. ICD-10 codes for general practitioners are gathered at the county level but collection of records from all 21 Swedish counties would have been cost- and time-prohibitive for our project. In addition to the limitation that comorbidities included in this study were restricted to those reported during hospitalizations, the accuracy of reporting all relevant diagnoses at discharge may be questioned, because of the challenges associated with administrative coding. We were also limited to including only patients with complete pre- and postoperative PROs. Although the Swedish Hip Arthroplasty Register had 100% nationwide coverage of all hospitals during the observation period, only 92% of the hospitals were participating in the routine PROMs followup by 2007, which meant that some hospitals were not included in our analyses [13]. Finally, an important consideration for these analyses is that THA is an elective procedure, which fundamentally limits the population for whom this treatment is available. Patients must be healthy enough to be eligible to undergo THA, as evidenced by the comorbidity indices in this population, and therefore, comorbidity measures aimed at investigating mortality are inherently unlikely to be associated with how a patient will report their HRQoL, pain, or satisfaction after the procedure.

We also acknowledge that gender and age are important confounders influencing the studied comorbidity measures and PROMs. Moreover, in this setting, the comorbidity measures (Charnley classification and the ICD-10-based comorbidity measures) act as mediators that channel some of the effects of age and gender on PROMs. We opted not to present results adjusted for age and gender to see the maximal predictive power that the comorbidity measures had on PROMs. Additionally, models regressing postoperative PROMs on gender and age, preoperative PROMs, Charnley classification, and the ICD-10-based comorbidity measures (together or one at a time) showed that neither age nor gender modified the effect or predictive power of the Charnley classification and the ICD-10-based comorbidity measures. Both age and gender had predictive power lower than 1%.

It is important to remember that the self-administered Charnley classification survey was used in our investigation. Despite the simplicity of the original Charnley classification [5], surgeon-assigned classification has a high rate of interobserver variability [2]. For example, the proportion of patients assigned to Charnley Class C was approximately 4% as reported by McGuigan et al. [16], whereas Lavernia’s group reported 30% [14]. Complicating matters further, some researchers have split Class B into two groups: those who have not had surgery on the contralateral hip (B1) and those who have (B2) [7, 21]. It was recently shown that the classification does not require splitting because patients naturally classify themselves similarly to Charnley classification definitions [10]. For these reasons, clinicians are encouraged to use the self-administered Charnley classification survey rather than assign the classification themselves.

Charnley classification and the preoperative value of the outcome examined contributed most to the models’ predictive power rather than the three comorbidity measures tested. Similarly, Lavernia et al. [14] found comorbidity scores correlated poorly with patient pain level and functional abilities after total joint arthroplasty. Interestingly, Charnley classification had stronger predictive power than the patient preoperative health state for the EQ-5D index, EQ VAS, and, most evidently, for the postoperative pain VAS. Röder et al. [20] drew similar conclusions, finding that patients with poor preoperative walking capacity and hip flexion were more likely to have poorer clinical results after THA. In contrast to our findings, however, Röder et al. found no association between patient preoperative pain level and pain alleviation. Of the three comorbidity measures investigated, Elixhauser comorbidity scores were most predictive and RCS Charlson scores the least. Based on an extensive review, Sharabiani et al. [23] found that if a comorbidity measure were necessary, the Elixhauser comorbidity score might be the best choice, but they had reservations that the results might differ based on the patient group and the outcome being investigated. However, one must consider that the predictive performance of all the comorbidity index scores depended not only on the outcome; the risk-adjustment method also had bearing [17], as was seen in patients with osteoporosis [6, 15]. Based on our results, we conclude that if consideration of patient comorbidities is necessary to achieve a particular purpose, then the Elixhauser comorbidity score would seem the best for predicting patient HRQoL, pain, and satisfaction 1 year after THA. It is important to remember that although the comorbidity indices we used had low predictive capacity with regard to the PROMs explored, Mnatzaganian et al. [17] proved the index scores to be valuable for predicting patient in-hospital mortality, adverse events, and nonroutine discharge after major orthopaedic surgery. Furthermore, our results indicated that regardless of comorbid conditions covered by the Charlson and Elixhauser indices, patients with similar self-reported preoperative HRQoL and pain as well as similar demography have similar chances to improve from surgery.

We observed a small increase in predictive power with the increase of preoperative timeframe for our study; however, despite statistical significance, the increase lacked practical importance, suggesting that comorbidities gathered 1 year before surgery were nearly as useful as those gathered much earlier. Similarly, when examining the influence of the same comorbidity indices on rates of THA reoperation, Gordon et al. found that there was little added value by extending the timeframe beyond 1 year [11]. For these reasons, it seems unnecessary to try to obtain information about comorbid conditions more than 1 year before THA if one were to include a comorbidity index as a risk adjustment for predicting outcomes after surgery.

Given the observational nature of registry studies, proper risk adjustment is critical to eliminate confounding when exploring PROs after treatment. National arthroplasty registry data provide a great way to explore the influence of demographic, surgical, and preoperative health states on outcomes after THA. The comorbidity measures explored in our analyses were originally designed to identify rates of patient mortality and hospitalization [4, 8, 23]. It may be for that reason that only weak associations were found between the measures and PROs after THA. Identification of patient-reported Charnley classification appears to be important in understanding how a patient will fare after hip arthroplasty. A patient’s Charnley classification could be an important risk-adjustment tool, which may be considered by the clinician as well as by researchers for predicting patient HRQoL, pain, and satisfaction after THA.