Background

Recent recommendations suggest that total hip arthroplasty (THA) is indicated when the patients’ functional limitations and pain levels due to hip osteoarthritis (OA) are refractory to pharmacological and non-pharmacological treatments [1, 2]. Resurfacing hip arthroplasty (HR) is an alternative to THA in patients who are younger, more active, with normal renal function and appropriate proximal femoral bone morphology and quality [3]. Both THA and HR are considered efficacious for the great majority of patients undergoing these procedures [4–7]. Although generally successful at alleviating coxarthrosis-related ailments, hip arthroplasty can yield subpar results in terms of pain and functional outcomes as well as degree of satisfaction in a non-negligible proportion of patients. For example, a recent systematic review reports that 7 to 23 % of the patients undergoing THA experience unfavourable pain outcomes 3 months to 5 years after the procedure [8]. Moreover, up to 15 % of the patients report dissatisfaction with surgery [9, 10]. To our knowledge, no formal data exist on the proportions of patients with poor pain, functional and satisfaction levels after HR. However, it can be posited that these proportions are similar to the ones observed among patients undergoing THA, as studies indicate that these outcomes are similar between the two procedures [11, 12].

In light of these observations, careful case management must be implemented in order to minimize unsuccessful outcomes. Potential interventions directed at improving surgical outcomes include patient education and intensive rehabilitation. However, identification of patients at risk of severe pain and functional limitations after THA or HR is difficult. A multitude of factors related to poor functional and pain outcomes following hip arthroplasty have been identified. These include worse preoperative levels of pain and function, lower educational level, comorbidities, presence of back pain and higher body mass index (BMI), among others [13–22]. Nevertheless, despite the quantity of evidence on potential risk factors, no definitive consensus has been reached concerning their identity and the magnitude of their association with postoperative pain, functioning and satisfaction. An algorithm that identifies with sufficient accuracy which patients present the greatest risk of unsuccessful outcomes may therefore assist in the care process. The objective of this study was to develop a preliminary prediction algorithm (PA) to identify patients at risk of unfavourable functional status, pain and joint perception one to two years following THA or HR.

Methods

Study design

This study entailed a retrospective analysis of longitudinal, prospectively collected data. The methodology adheres to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for observational cohort studies (Additional file 1).

Data collection

Our prospective arthroplasty database was consulted in order to identify patients eligible for inclusion in the study. The database contains extensive baseline and follow-up data on patients undergoing hip procedures, including THA and HR. All patients provide informed consent to participate. Independent assessors who are not involved in the medical care of the patients collect the prospective data.

Inclusion criteria were 1) patients undergoing primary unilateral THA or HR, 2) diagnosis of primary hip OA, 3) complete preoperative and one to two-year postoperative self-reported outcome questionnaire responses. The main exclusion criteria were 1) THA or HR of the contralateral hip before the relevant follow-up evaluation, 2) revision of the implant before the one to two-year follow-up, 3) diagnosis of inflammatory hip arthritis, pediatric hip disease, post-traumatic hip or any hip disease other than primary OA. Data on all patients having undergone hip interventions were assessed for inclusion. All patients were initially interviewed just before their intervention. Postoperative outcomes were collected 12 to 24 months after the surgery.

Dependent variables

Functional status and pain levels were assessed preoperatively and at follow-up with the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) [23]. The WOMAC consists of the following domains: pain (5 items), stiffness (2 items) and functional limitation (17 items). Items are scored on a 5-point Likert scale representing different degrees of intensity (none, mild, moderate, severe or extreme). The scores of each domain as well as the total score were standardized on a 0 to 100 scale, with a greater score indicating more pain, stiffness or functional limitation. The psychometric qualities of the WOMAC, including its responsiveness, convergent construct validity and reliability have been found excellent for evaluating patients with hip OA undergoing hip arthroplasty [24, 25].
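The standardization of the WOMAC domains described above can be sketched as follows. This is a minimal illustration and not the scoring code used in the study: it assumes that each item is coded from 0 (none) to 4 (extreme) and that the total score is the rescaled sum of all 24 items. The function and variable names are hypothetical.

```python
# Item counts per WOMAC domain, as stated in the text.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
MAX_ITEM_SCORE = 4  # assumption: Likert answers coded 0 (none) to 4 (extreme)


def standardize(raw_sum, n_items):
    """Rescale a raw sum of item answers to 0-100 (higher = worse)."""
    return 100.0 * raw_sum / (n_items * MAX_ITEM_SCORE)


def womac_scores(responses):
    """Return standardized domain and total scores.

    `responses` maps each domain name to its list of 0-4 item answers.
    """
    scores = {}
    total_raw = 0
    for domain, n_items in WOMAC_ITEMS.items():
        answers = responses[domain]
        if len(answers) != n_items:
            raise ValueError(f"{domain}: expected {n_items} items")
        raw = sum(answers)
        total_raw += raw
        scores[domain] = standardize(raw, n_items)
    # Assumption: the total is the rescaled sum of all 24 items.
    scores["total"] = standardize(total_raw, sum(WOMAC_ITEMS.values()))
    return scores
```

Under these assumptions, a patient answering "moderate" (coded 2) to every item would obtain a score of 50 on each domain and on the total.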

At follow-up, joint perception was measured by asking the patient a multiple-choice question: “How do you perceive your operated hip?” with the possible responses being “Like a native or natural joint”, “Like an artificial joint with no restriction”, “Like an artificial joint with minimal restriction”, “Like an artificial joint with major restriction” and “Like a non-functional joint” [26]. Joint perception has been strongly associated with validated patient-reported outcome measures and can be employed as a measure of patient satisfaction [26].

No consensus exists regarding what represents a poor outcome following hip arthroplasty. Hence, patients with suboptimal outcomes were defined as the ones in the worst quartile of the total WOMAC score at follow-up (i.e. WOMAC score > 11.5) and perceiving their hip “Like an artificial joint with minimal restriction”, “Like an artificial joint with major restriction” or “Like a non-functional joint”.
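The composite definition above can be expressed as a simple rule. The sketch below is illustrative only; the function name is hypothetical, and the cut-off of 11.5 is the worst-quartile threshold observed in this cohort rather than a general standard.

```python
# Joint perception answers counted as unfavourable in the definition above.
POOR_PERCEPTIONS = {
    "Like an artificial joint with minimal restriction",
    "Like an artificial joint with major restriction",
    "Like a non-functional joint",
}
WOMAC_CUTOFF = 11.5  # worst-quartile threshold of the follow-up total WOMAC


def is_suboptimal(total_womac, perception):
    """True only when BOTH criteria are met at follow-up."""
    return total_womac > WOMAC_CUTOFF and perception in POOR_PERCEPTIONS
```

Because both conditions must hold, a patient with a high WOMAC score who nevertheless perceives the hip as a natural joint is not classified as having a suboptimal outcome.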

Independent variables

Potential preoperative predictors

Several of the variables that were collected preoperatively and available in the database were considered as potential predictors of suboptimal hip arthroplasty outcome. Demographic variables included age and gender. Clinical variables included BMI, previous hip interventions and medical comorbidities (diabetes, gastrointestinal disease, immunosuppression secondary to corticosteroid use or other causes, cardiac disease, obesity, pulmonary disease, neurologic disease, urologic disease, and other comorbidities). Pain localization (back, radicular, buttocks, trochanter, groin, thigh, knee and/or calf) as well as whether hip pain was present at rest, after the first few steps, after a long walk and during sexual relations were also considered. Answers to the 24 individual items of the pre-operative WOMAC questionnaire were additionally included in the analysis as potential predictors.

Statistical analysis

Baseline and follow-up mean WOMAC scores along with their standard deviations were calculated. Differences between time points in total WOMAC scores and in the respective domains were assessed using paired-samples Student's t-tests, with a significance level set at 0.05.
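As an illustration of the paired comparison, the test statistic can be computed from the within-patient differences. This is a sketch using only the Python standard library, not the SPSS procedure used in the study; the function name is hypothetical, and the p-value would still have to be read from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-samples t statistic for post minus pre scores.

    Returns (mean difference, SD of differences, t statistic, degrees of freedom).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, sd_diff, t_stat, n - 1
```

A negative mean difference corresponds to a lower (better) WOMAC score at follow-up than at baseline.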

The classification and regression tree approach was used to build the PA as it is one of the most effective algorithms of recursive partitioning [27]. It is based on maximizing the within-node homogeneity by evaluating all combinations of potential predictors, thus minimizing the within-node error. The Gini impurity measure was used as a splitting criterion to develop the decision trees [28]. Data for all the patients in the training set was used to develop the PA. Firstly, all the potential predictor variables were employed to develop models using an automated approach. Secondly, a manual approach entailed the development of additional models by inputting independent variables that were judged to be more readily available and easier to employ in a clinical setting. For example, age and gender were favoured over the number of comorbidities and previous hip interventions because the latter two could be affected by a recall bias or would require extensive medical file review. The predictive values of every model were calculated along with their 95 % confidence intervals, namely sensitivity, specificity, positive and negative predictive values as well as positive and negative likelihood ratios [29]. Among all the proposed models, the one that showed the highest level of sensitivity and an acceptable level of specificity and that fit the ease-of-use criterion was selected in order to develop the screening tool. Internal validity of the model was then evaluated by the use of 1,000 bootstrap resamples [30]. All analyses were performed using IBM SPSS Statistics Version 20.0 (SPSS Inc., Chicago).
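The splitting criterion can be illustrated with a short sketch. The code below computes the Gini impurity of a node and searches one continuous predictor for the threshold that minimizes the weighted impurity of the two child nodes; a full tree repeats this search over all predictors and recursively within each child. It is a simplified illustration and not the SPSS implementation used in the study, and the function names and example values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a node holding 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n  # proportion of patients with the outcome
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values, labels):
    """Best 'value <= threshold' split of one predictor.

    Returns (threshold, weighted impurity); threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best
```

For example, with hypothetical BMI values of 20, 22, 25, 30, 32 and 35 and outcomes 0, 0, 0, 1, 1 and 1, the search selects the threshold 25, which separates the two classes completely (weighted impurity of 0).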

Ethics

The research ethics committee of our centre approved the study annually.

Results

Participants

Our database yielded 2963 entries with at least some preoperative data on hip arthroplasty procedures performed from October 2004 to February 2014. Out of these, 1207 procedures (40.7 %) fit the inclusion criteria. Incomplete preoperative and/or postoperative data required for the purposes of the current study led to the exclusion of a further 942 entries. Thus, a total of 265 primary hip arthroplasty interventions (60 classical THAs, 128 large-femoral head diameter THAs, and 77 HRs) with complete preoperative and postoperative data were included in the study (follow-up mean ± SD: 446.3 ± 171.1 days), representing a participation proportion of 22.0 %.

Table 1 shows selected characteristics of the 265 patients included in the study. The mean age of the participants was 52.0 (SD 9.0) and 67.4 % were male. The mean BMI was 28.2 (SD 5.1) and each patient had on average 0.79 comorbidities (SD 0.96).

Table 1 Selected characteristics of the participants who underwent hip arthroplasty (n = 265)

Mean follow-up was 446 (SD 171) days and ranged from 253 to 1638 days. Postoperatively, the patients had significantly improved on pain (−44.9, SD 22.6, 95 % CI −42.1 to −47.6), stiffness (−44.6, SD 25.1, 95 % CI −41.6 to −47.7), function (−43.6, SD 21.9, 95 % CI −40.9 to −46.2) as well as total WOMAC score (−43.9, SD 21.1, 95 % CI −41.4 to −46.5) (Table 2). Seventy-six patients (29 %) reported that they perceived their prosthetic joint as artificial with minimal or major restrictions (Table 3).

Table 2 Changes in WOMAC scores of the participants between preoperative measurement and following hip arthroplasty (n = 265)
Table 3 Postoperative joint perception of the patients who underwent hip arthroplasty according to the distribution of their postoperative total WOMAC scores (n = 265)

Out of the 265 patients eligible for inclusion in the study, 40 (15.1 %) had a total WOMAC score > 11.5 and perceived their joint as artificial with minimal or major restrictions. Hence, these patients were considered as having suboptimal surgical outcomes.

Final prediction algorithm

After developing several prediction rules, the algorithm with the highest level of sensitivity and an appropriate level of specificity was chosen. It consists of patient gender, age at the time of surgery, body mass index (BMI), and 3 items of the preoperative WOMAC, namely degree of pain when walking on a flat surface, degree of pain during the night and degree of difficulty with putting on socks or stockings (Fig. 1). Patients respond sequentially to the questions and their risk status is determined according to the classification algorithm (Fig. 2).

Fig. 1 Prediction algorithm to identify patients at risk of suboptimal outcomes after hip arthroplasty

Fig. 2 Graphical representation of the prediction algorithm identifying patients at risk of suboptimal surgical outcomes after hip arthroplasty

The final PA correctly identified 30 out of the 40 patients considered at risk of suboptimal outcome, and 175 out of the 225 remaining patients were correctly identified as not at risk (Table 4). Therefore, the PA had a sensitivity of 75.0 % (95 % CI: 59.8–85.8), a specificity of 77.8 % (95 % CI: 71.9–82.7) and a positive likelihood ratio of 3.38 (95 % CI: 2.49–4.57) (Table 5). The other prediction models that were also considered are presented in Additional file 2.

Table 4 Two by Two table of predicted versus actual outcomes of the prediction algorithm
Table 5 Validity measures of the prediction algorithm
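The validity measures can be recomputed from the two-by-two counts given in the text (30 true positives, 10 false negatives, 175 true negatives and 50 false positives). The sketch below is illustrative: the article does not state which interval methods were used, so the Wilson score interval for proportions and the log method for the likelihood ratio are assumptions, and the function names are hypothetical. With these counts, both methods give intervals that agree with the reported ones after rounding.

```python
import math

Z = 1.96  # normal quantile for a two-sided 95 % interval


def wilson_ci(successes, n, z=Z):
    """Wilson score interval for a proportion (assumed method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def positive_lr(tp, fn, fp, tn, z=Z):
    """Positive likelihood ratio with a log-method interval (assumed method)."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    lr = sensitivity / false_positive_rate
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


# Counts taken from the text.
TP, FN, TN, FP = 30, 10, 175, 50

sensitivity = TP / (TP + FN)             # 30 / 40
specificity = TN / (TN + FP)             # 175 / 225
sens_ci = wilson_ci(TP, TP + FN)
spec_ci = wilson_ci(TN, TN + FP)
lr_pos, lr_low, lr_high = positive_lr(TP, FN, FP, TN)
```

The same counts also yield the predictive values directly: 30 of the 80 patients flagged by the PA had a suboptimal outcome, and 175 of the 185 patients not flagged did not.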

Internal validation

The rule was validated internally using 1,000 bootstrap resamples. Table 5 shows that the bootstrap estimates of the predictive measures are close to the original ones, suggesting an appropriate accuracy of the proposed model.
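The principle of bootstrap resampling can be sketched as follows. This is a deliberately simplified illustration: it resamples the classification results of the final PA with replacement and averages the sensitivity over the resamples, whereas a complete internal validation may also rebuild the tree in each resample. The function name, the seed and the choice of sensitivity as the only measure are assumptions made for the example.

```python
import random


def bootstrap_sensitivity(tp, fn, fp, tn, n_resamples=1000, seed=0):
    """Mean sensitivity over bootstrap resamples of the classified patients."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # One (actual, predicted) pair per patient; 1 = at risk, 0 = not at risk.
    patients = [(1, 1)] * tp + [(1, 0)] * fn + [(0, 1)] * fp + [(0, 0)] * tn
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(patients, k=len(patients))  # with replacement
        predictions = [pred for actual, pred in sample if actual == 1]
        if predictions:  # skip a resample with no at-risk patient
            estimates.append(sum(predictions) / len(predictions))
    return sum(estimates) / len(estimates)


mean_sensitivity = bootstrap_sensitivity(30, 10, 50, 175)
```

A bootstrap mean close to the original estimate of 0.75 is what the comparison in Table 5 refers to; a large gap would instead point to an unstable estimate.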

Discussion

Since THA and HR can bring significant improvement in patients suffering from hip OA, careful management of subjects at risk of having unsuccessful outcomes is indicated. We aimed to develop a prediction tool in order to facilitate the preoperative identification of these patients, which could possibly improve their surgical outcomes. With a cohort of 265 patients undergoing primary hip arthroplasty for OA, we were able to create a PA identifying the patients who are at the highest risk of unsuccessful outcomes. Albeit preliminary in nature and requiring further development and validation, our PA has excellent predictive capacities, with a sensitivity of 75.0 % (95 % CI 59.8–85.8), a specificity of 77.8 % (95 % CI 71.9–82.7) and a positive likelihood ratio of 3.38 (95 % CI 2.49–4.57).

To our knowledge, one model predicting the identity of patients at risk of poor outcomes after THA has been developed [31]. Consisting of patient age, BMI and gender, the model was able to correctly predict patients’ outcomes with a sensitivity of 87.5 % (95 % CI 52.9–97.8), a specificity of 72.4 % (95 % CI 54.3–85.3) and a positive likelihood ratio of 3.17 (95 % CI 1.66–6.05). However, surgical success was determined solely based on functional outcomes (change between the preoperative and six-month Lower Extremity Functional Scale score), and the results are based on a cohort of 37 patients.

The selection of patients for inclusion in the study was based on the availability of complete data for important determinants of hip arthroplasty outcomes as reported in the literature. This allowed the development of a prediction rule that is consistent with the clinical reality. Because there is no unequivocal definition of what represents suboptimal outcome following hip replacement, several criteria of classifying patients who are at risk have been considered and different prediction models were built accordingly. The choice of the final model was based on the principle of selecting a screening tool minimizing the number of false negatives and that is easily employable in a clinical setting. Accordingly, an algorithm with a sensitivity of 75.0 % and a specificity of 77.8 % was deemed suitable. Although the positive likelihood ratio of 3.38 of the PA can be considered subpar when compared to accepted diagnostic standards, the PA performs similarly to other validated prediction models in the epidemiological literature. For example, the positive likelihood ratios of the Ottawa Knee and Ankle rules assessing the necessity of a roentgenographic evaluation in cases of acute knee and ankle injuries respectively are both inferior to the one reported by our PA [32, 33].

The algorithm with the most appropriate predictive capabilities contains two demographic variables (gender and age), one clinical variable (BMI) and 3 items of the preoperative WOMAC questionnaire (two from the pain scale and one from the function scale). All of these variables have been consistently related to hip arthroplasty outcomes [18–20, 22, 34, 35]. Moreover, the PA comprises all the predictors reported by Slaven et al. [31] in their model, namely age, gender and BMI, thus pointing towards the importance of these factors for prediction of hip arthroplasty results. It is noteworthy that the prediction of surgical outcomes in women is achieved by age and BMI, with body mass being the only modifiable risk factor. In the case of men, potential modifiable risk factors include BMI, degree of hip pain when walking on a flat surface and during the night as well as degree of difficulty putting on socks or stockings. However, caution should be used, as recursive partitioning does not imply a causative relationship between variables [36]. Indeed, interventions targeted at improving any of the items of the PA, such as weight loss in the case of high BMI, will not necessarily improve the outcome of the surgery; they will merely imply that the patient will be classified as not at risk of suboptimal outcomes by the PA. Further research on appropriate interventions to improve surgical outcomes should be undertaken.

When developing the PA, we intended for it to be a clinically pertinent tool. The decision to include patients with different types of hip arthroplasties was taken in order to generate a PA that has the ability to perform successfully in a heterogeneous population. Moreover, we included subjects with complete information one to two years following the procedure, as patients are followed closely by their surgeons during this period, and the rehabilitation process can easily be altered if the progression is judged suboptimal.

In one instance, the interpretation process may yield a counterintuitive situation. For example, it is possible, in an extreme scenario, for a 49-year-old male patient with a BMI of 22 kg/m² and with no pain when walking on a flat surface as well as no difficulty with putting on socks or stockings to be classified as at risk of suboptimal surgical outcomes. This pattern of answers was, however, shown to have the best predictive capabilities when developing the algorithm with recursive partitioning. This situation underlines the concept that a predictor is not necessarily a determinant.

Strengths of the study

The developed PA is, to our knowledge, the first one of its kind to discriminate THA or HR results based on more than one parameter, namely patients’ functional, stiffness and pain levels as well as their perception of the replaced hip joint. In the absence of an accepted standard of surgical failure, this approach increases the likelihood that the patients thus classified truly present subpar outcomes. Moreover, this classification identified 15.1 % of the patients as having unsuccessful outcomes, well in line with the published proportions of what can be considered a suboptimal outcome [8]. Finally, the rigorous statistical analysis employed in the development of the PA underlines the stringency of our approach.

Limitations of the study

Due to its retrospective design, the study has a certain risk of selection bias. Compared to the subjects excluded due to missing data, the included participants were, on average, younger, had a greater number of comorbidities and a greater proportion were male (p < 0.05, data not shown). Nevertheless, the preoperative baseline status as measured by the WOMAC domains as well as the total WOMAC score was not statistically significantly different between the included and the non-included subjects (p > 0.05, data not shown). Additionally, only 265 out of the 1207 procedures (22.0 %) that were performed during the study time period met the inclusion criteria, therefore potentially limiting the generalizability of the results. Moreover, the study population comprised patients undergoing primary unilateral hip replacement procedures, which precludes the utilization of this tool for patients undergoing revision or bilateral interventions. Recent evidence identifies other variables potentially associated with hip arthroplasty outcomes that were not included in our study, thus potentially limiting the pool of candidate predictor variables. Although there is no consensus regarding the optimal sample size for developing models employing recursive partitioning, the progressively smaller number of cases in the leaves as the tree was built may limit the reliability of the findings, prompting further development in a subsequent study. Before being employed in a clinical setting, the decision rule has to be validated in a different sample of patients. Moreover, the performance of the PA has to be compared to clinical judgement alone and its financial impacts require evaluation.

Conclusions

The developed PA may discriminate with excellent capabilities the patients undergoing hip arthroplasty who are at the highest risk of suboptimal pain, functional limitations and joint perception outcomes at an average of 15 months following the intervention. Its implementation has the potential to target susceptible individuals so as to modify their risk profile and, eventually, improve surgical results.