Introduction

Despite the overall effectiveness of TKA, a subset of patients experience unsatisfactory results with respect to pain, function, and restoration of quality of life [39, 45]. For a patient to ultimately benefit from a surgical intervention, the procedure must result in a meaningful improvement in health, pain, or function such that the patient might consider repeating the intervention. The minimum clinically important difference (MCID) can be used to specify this degree of change in health and is one way to define what constitutes a successful outcome after a surgical intervention [8, 18, 23]. Prior studies that have defined and evaluated the MCID after TKA have shown that a notable proportion of patients do not experience this degree of change postoperatively [9, 11]. Furthermore, numerous studies have documented dissatisfaction rates after TKA ranging from 15% to 20% [25, 39]. A recent appropriateness study of patients from the Osteoarthritis Initiative concluded that 34% of TKAs were performed inappropriately [36]. Regional, racial, and gender variations in patient selection throughout the United States also highlight the need to better define surgical appropriateness criteria and develop improved shared decision-making tools [14].

Improved quality of life and functional ability are often considered the most important outcomes after major joint replacement, and patients’ own assessment of these outcomes is a key element in evaluating the effectiveness of the procedure. Consequently, the focus of outcomes assessment has shifted to a more patient-centered approach with the use of patient-reported outcome measures (PROMs). Disease-specific measures such as the Knee injury and Osteoarthritis Outcome Score (KOOS) are more sensitive to change within the context of a specific illness. Generic measures such as the SF-36 and SF-12 version 2 (SF12v2) provide information on overall health status, incorporate elements of psychosocial health, and have the ability to capture the effects of medical comorbidities on quality of life [21].

There is much interest in the ability to preoperatively identify patients who are at greatest risk of unsatisfactory outcomes after TKA as well as those who will benefit most. Preoperative pain and functional status, as measured by PROMs, have been shown to predict pain and functional ability after TKA [15, 27, 40]. More specifically, although patients with higher levels of preoperative pain and disability demonstrate the greatest improvements in PROM scores, they do not achieve absolute postoperative scores comparable to patients with less preoperative pain and better baseline function [13, 24]. A substantial body of evidence has also demonstrated that mental and emotional health influence postoperative PROMs after TKA [4, 5, 13, 16, 17, 19, 24, 26, 27, 40, 43, 44]. Worse preoperative mental and emotional health is associated with smaller improvements in physical function scores after TKA [4, 5, 16, 17, 22, 27, 40]. These findings highlight the importance of considering both the physical and mental components of preoperative PROMs if they are to be incorporated into a decision-making tool. To our knowledge, this has not been done for patients undergoing TKA.

The purpose of this study was to determine the association between preoperative PROM scores and the likelihood that patients undergoing TKA would experience meaningful clinical improvement 1 year after surgery, as defined by the MCID. Specifically, we asked whether (1) calculated threshold values would be acceptably predictive and define maximum preoperative functional component scores after which the likelihood of experiencing a meaningful improvement begins to diminish; and (2) controlling for baseline mental and emotional health would improve the predictive ability of these threshold values.

Patients and Methods

Data included in this study were obtained from a longitudinally maintained, multisurgeon joint replacement outcomes registry at the primary author’s institution (JLB). The database includes clinical information and patient-reported outcomes for patients undergoing TKA. Patient-reported outcomes were collected preoperatively and 1 year after surgery as part of routine care. The database also includes patient demographic information including age, gender, and race. Patients selected for this study had a history of primary unilateral TKA and available PROM data recorded at both preoperative and 1-year postoperative time points. To ensure the analysis was performed on a homogeneous patient population, the data analysis excluded patients with pathological fracture, malignant neoplasm, or a history of a subsequent procedure on the knee. It was anticipated that 500 patients would be needed to establish appropriateness thresholds, allowing the assessment of receiver operating characteristic (ROC) curves and areas under the curve (AUCs) with a standard error of no more than 0.03 for an expected AUC of 0.7.
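The sample-size reasoning above can be illustrated with a minimal sketch. It assumes the widely used Hanley and McNeil approximation for the standard error of an AUC, which may not be the method the authors applied. The function name `hanley_mcneil_se` and the 350/150 split between patients who do and do not achieve the MCID are hypothetical and chosen only for illustration.

```python
import math


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley-McNeil approximation).

    auc   : expected area under the ROC curve
    n_pos : number of patients with the outcome (here, achieving the MCID)
    n_neg : number of patients without the outcome
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Hypothetical split of 500 patients (assumption, not study data).
se_example = hanley_mcneil_se(auc=0.7, n_pos=350, n_neg=150)
print(f"Approximate SE for AUC 0.7 with 350/150 patients: {se_example:.3f}")
```

Under this approximation the standard error depends on how the 500 patients divide between the two outcome groups, so the split should be varied when checking a planned sample size.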

In total, 562 patients with knee osteoarthritis who underwent primary TKA between 2009 and 2013 met our inclusion criteria. Average patient age was 67 years (SD ± 11), 59% of the patients included in the study were women, and 76% were white. This cohort of patients represented 75% of the 750 patients undergoing primary, unilateral TKA included in our institution’s joint replacement registry who had no history of a subsequent procedure. The remaining 188 patients were excluded from the study because they either did not have 1-year follow-up PROM scores available (60 patients) or were lost to follow-up entirely (128 patients). No differences were found between the study cohort and all patients lost to follow-up when comparing preoperative and postoperative PROM scores (Table 1).

Table 1 Comparison of study cohort to patients without 1-year PROM data

Preoperative and 1-year postoperative KOOS and SF12v2 PROMs were collected through an electronic interface or on paper by a research assistant (DP). The KOOS consists of 42 items separated into five subscales and is scored from 0 to 100 with 0 being the worst level of pain and function. The SF12v2 is a revised version of the original SF-12 with wording modifications to improve readability and ease of use. SF12v2 physical and mental composite scores (PCS and MCS, respectively) range from 0 to 100 in which a score of 0 indicates the lowest level of health. The scores of both subscales are calculated from the survey’s 12 questions. Survey questions that assess mental and emotional health address the effect of emotional problems (such as feeling depressed or anxious) on work or regular daily activities. PROM scores and SDs were calculated using the scoring algorithms for each outcomes instrument. The SF12v2 PCS and MCS components were considered as separate outcome measures and not as individual subscales.

MCID may be calculated using consensus, anchor, or distribution-based methods [23, 28]. The MCID after TKA has been defined using the WOMAC and SF-36 PROMs and may be estimated with a distribution-based method as half the SD of outcome change scores for a given instrument [11, 30, 46]. This method of calculation was chosen after evaluating several other possible estimates of the MCID, including distribution-based methods that use either 95% confidence intervals or half the interquartile range (IQR/2). MCIDs were calculated separately for KOOS and SF12v2 PCS as half the SD of all change scores for that specific PROM [30]. Anchor-based methods require a separate subjective assessment measure of a patient’s perceived benefit from an intervention, data not collected by our institution’s joint replacement registry [47]. The calculated MCID value was 10 for the KOOS and 5 for the SF12v2 PCS (Table 2). Overall, 82% of patients achieved improvement greater than the MCID on the KOOS and 69% on the SF12v2 PCS after unilateral primary TKA.
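The half-SD calculation described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the use of the sample (n - 1) standard deviation is an assumption, and the scores in the example are invented rather than taken from the registry.

```python
import statistics


def half_sd_mcid(pre_scores, post_scores):
    """Return (mcid, change_scores) using the half-SD distribution-based rule.

    Change scores are postoperative minus preoperative values, so a
    positive change indicates improvement on a 0 to 100 scale.
    """
    changes = [post - pre for pre, post in zip(pre_scores, post_scores)]
    # Assumption: sample standard deviation (n - 1 denominator).
    mcid = 0.5 * statistics.stdev(changes)
    return mcid, changes


def proportion_achieving_mcid(changes, mcid):
    """Proportion of patients whose improvement is greater than the MCID."""
    return sum(change > mcid for change in changes) / len(changes)


# Invented preoperative and 1-year postoperative scores (not study data).
pre = [40, 50, 60, 30]
post = [70, 60, 65, 80]
mcid, changes = half_sd_mcid(pre, post)
print(f"MCID: {mcid:.1f}, proportion achieving: "
      f"{proportion_achieving_mcid(changes, mcid):.2f}")
```

Because the MCID is derived from the spread of change scores in the sample itself, the same code applied to a different cohort would generally yield a different value.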

Table 2 Threshold values for univariate and multivariate analysis

A nonparametric ROC analysis was used to determine an optimal threshold value for KOOS and SF12v2 PCS separately. For each PROM, the calculated threshold value specified a preoperative score best able to predict the likelihood of a patient experiencing a MCID after TKA. The Youden index was applied to each PROM’s ROC analysis to calculate threshold values, defined as the preoperative PROM score with the highest combined sensitivity and specificity for predicting that a patient would experience the MCID [48]. The c-statistic (AUC) of each ROC analysis indicated the predictive validity of this binary classifier test for predicting the likelihood that a patient would achieve the MCID. The c-statistic ranges from 0 to 1 with 1 indicating perfect discrimination and 0.5 indicating no better than chance. Predictive models are considered reasonable when the c-statistic (AUC) is > 0.7 and excellent when > 0.8 [20]. For this study, c-statistics > 0.7 were considered acceptably predictive.
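A minimal sketch of the threshold search described above is shown below. It assumes that a preoperative score at or below the threshold is treated as a positive prediction (the patient is expected to achieve the MCID), which matches the direction of the reported findings but is not stated as an implementation detail in the text. The function names and the example data are hypothetical.

```python
def youden_threshold(scores, achieved):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    scores   : preoperative PROM scores
    achieved : 1/True if the patient achieved the MCID, else 0/False
    A score <= threshold counts as a positive prediction (assumption).
    Ties in J are resolved in favor of the lowest threshold.
    """
    pos = [s for s, a in zip(scores, achieved) if a]
    neg = [s for s, a in zip(scores, achieved) if not a]
    best = None
    for t in sorted(set(scores)):
        sensitivity = sum(s <= t for s in pos) / len(pos)
        specificity = sum(s > t for s in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (t, j, sensitivity, specificity)
    return best


def c_statistic(scores, achieved):
    """Nonparametric AUC: probability that a randomly chosen patient who
    achieved the MCID had a lower preoperative score than a randomly chosen
    patient who did not (ties count as one half)."""
    pos = [s for s, a in zip(scores, achieved) if a]
    neg = [s for s, a in zip(scores, achieved) if not a]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example data (not study data).
example_scores = [30, 40, 50, 60, 70, 80]
example_achieved = [1, 1, 1, 0, 1, 0]
print(youden_threshold(example_scores, example_achieved))
print(c_statistic(example_scores, example_achieved))
```

A different weighting of sensitivity and specificity could be substituted for Youden's J in the loop if a more sensitive threshold were preferred.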

Given the known effect of a patient’s preoperative mental and emotional health on the functional result of TKA, a multivariate analysis was performed. A two-stage hierarchical multivariate logistic regression analysis was performed to determine the relative influence of preoperative MCS score on patients’ likelihood of achieving a MCID based on their preoperative KOOS or PCS score. This analysis was necessary to control and adjust for individual patient preoperative variability to allow for comparisons between patients’ clinically meaningful improvements. New Youden thresholds for KOOS and PCS were then calculated from the fitted logistic regression equation of the predicted probability of obtaining the MCID, generating a new threshold value for each potential preoperative MCS score. These new threshold values were then used to calculate new c-statistics to determine changes in the predictive ability of the KOOS and PCS threshold values after controlling for preoperative MCS scores.
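One way to read the step of deriving a threshold for each preoperative MCS score is sketched below. It assumes a logistic model of the form logit(p) = b0 + b_koos × KOOS + b_mcs × MCS and a fixed cutoff on the predicted probability; solving for KOOS at that cutoff gives a threshold that is linear in MCS. The coefficients and the cutoff are invented for illustration and are not the fitted values from this study, and the function names are hypothetical.

```python
import math


def predicted_probability(koos, mcs, b0, b_koos, b_mcs):
    """Predicted probability of achieving the MCID under the assumed model."""
    z = b0 + b_koos * koos + b_mcs * mcs
    return 1.0 / (1.0 + math.exp(-z))


def koos_threshold_for_mcs(mcs, b0, b_koos, b_mcs, p_cut):
    """KOOS value at which the predicted probability equals p_cut.

    Rearranging logit(p_cut) = b0 + b_koos * KOOS + b_mcs * MCS gives a
    threshold that changes by -b_mcs / b_koos for each 1-point change in MCS.
    """
    logit_cut = math.log(p_cut / (1.0 - p_cut))
    return (logit_cut - b0 - b_mcs * mcs) / b_koos


# Invented coefficients and cutoff (assumptions, not fitted study values).
B0, B_KOOS, B_MCS, P_CUT = 3.0, -0.05, 0.03, 0.8

for mcs in (30, 40, 50, 60):
    threshold = koos_threshold_for_mcs(mcs, B0, B_KOOS, B_MCS, P_CUT)
    print(f"MCS {mcs}: KOOS threshold {threshold:.1f}")
```

With a negative coefficient for the functional score and a positive coefficient for MCS, the threshold rises as MCS rises, which is the qualitative pattern a model of this form would produce.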

Results

The calculated threshold values for functional outcome measures KOOS and SF12v2 PCS were 58 and 34, respectively. These threshold values defined the maximum preoperative scores after which the likelihood of a patient experiencing a minimum clinically important difference began to diminish. The KOOS threshold value of 58 proved to be acceptably predictive of a patient’s likelihood of achieving the MCID with an AUC value of 0.76 (Fig. 1A). The SF12v2 PCS threshold value did not demonstrate sufficient predictive ability with an AUC of 0.65 (Fig. 1B). The corresponding sensitivity and specificity values for each threshold ranged from 56% to 82% (Table 2).

Fig. 1A–B

The calculated threshold values, indicated by the dotted vertical lines, do not represent true cutoffs but instead serve to represent points after which a patient’s likelihood of experiencing a clinically meaningful improvement in function begins to more rapidly diminish. (A) The KOOS threshold value of 58 was acceptably predictive of a patient’s likelihood of experiencing a clinically meaningful improvement in outcome as measured by the 1-year postoperative KOOS score (AUC, 0.76). (B) The SF12v2 PCS threshold value of 34 was not acceptably predictive of a patient’s likelihood of experiencing a clinically meaningful improvement in outcome as measured by the 1-year postoperative SF12v2 PCS score (AUC, 0.65).

Multivariate analysis, adjusting for preoperative mental and emotional health, demonstrated that patients with better baseline mental and emotional health (higher preoperative MCS scores) and the greatest amount of disability (lower KOOS and PCS scores) had the highest likelihood of experiencing a clinically meaningful improvement in function after TKA. In addition, the predictive ability of both the KOOS and SF12v2 PCS threshold values improved after controlling for mental and emotional health. The KOOS c-statistic improved from 0.76 to 0.80 and the SF12v2 PCS c-statistic improved from 0.65 to 0.71 (Table 2). Only when taking into consideration a patient’s preoperative mental health are the SF12v2 PCS threshold values acceptably predictive. Using the fitted logistic regression equation, new threshold values for KOOS and SF12v2 PCS were calculated for each potential preoperative MCS value from 0 to 100. Higher preoperative SF12v2 MCS scores resulted in higher threshold values for both KOOS (Fig. 2A) and SF12v2 PCS (Fig. 2B) such that for each 10-point increase in preoperative SF12v2 MCS score, there was an approximate 6-point increase in both KOOS and SF12v2 PCS threshold values. These findings suggest that patients with better mental and emotional health are more likely to experience a clinically meaningful improvement in function after TKA despite having superior baseline function.

Fig. 2A–B

SF12v2 PCS and KOOS threshold values (represented by dashed lines) are dependent on preoperative MCS score and demonstrate a linear relationship. Postoperative data are plotted in a binned fashion, which demonstrates the likelihood of attaining a MCID across different preoperative PROM score combinations. Hexagonal cells are labeled and shaded according to the proportion of patients within that cell who obtained the MCID (absolute number of patients in parentheses). By situating patients within a specific bin, one is able to visualize an approximate likelihood of obtaining a MCID based on preoperative PROM scores in the context of calculated threshold values. (A) The predictive ability of KOOS threshold values improved from 0.76 to 0.80 after adjusting for preoperative mental and emotional health. (B) After adjusting for preoperative mental and emotional health, SF12v2 PCS threshold values demonstrated an improved predictive ability (AUC, 0.71).

Discussion

Despite the proven effectiveness of TKA, a notable subset of patients does not experience meaningful clinical improvement after surgery. Prior studies aimed at defining the MCID after TKA have demonstrated that 12% to 51% of patients do not experience this degree of improvement postoperatively with respect to pain and function [2, 7, 9, 11]. With limited appropriateness criteria, the decision to pursue surgery is complex and multifactorial for both the patient and physician. This is evidenced by reports that suggest up to 34% of TKAs are performed inappropriately [3, 33, 34, 36]. Prior studies have attempted to define explicit clinical criteria for TKA appropriateness; however, the subjective nature of the procedure’s indications requires patients to weigh the risks and benefits on the basis of their own values [10, 12]. This study is the first of which we are aware to identify an association between baseline functional status, adjusted for mental and emotional health, and the subset of patients most likely to experience a clinically meaningful improvement in function after TKA. The preoperative threshold value for KOOS, determined to be 58, is capable, with acceptable predictive ability, of differentiating patients more likely to experience meaningful improvements in function after TKA from those who are not. Additionally, threshold values for KOOS and SF12v2 PCS vary and demonstrate improved predictive ability when taking into account preoperative mental and emotional health.

This study has several limitations. The definition of a “successful” outcome is a controversial issue and varies between different patients and providers. For the purpose of this study it was defined as a change in PROM score after TKA greater than the MCID, which may not be the most appropriate definition of success. This definition excludes patient satisfaction, a separate outcome that has been shown to be dependent on patient expectations and highly variable as a result of factors specific to each individual patient [31, 37]. For this reason, patient satisfaction may be an unreliable metric to assess the impact of an intervention [37]. Additionally, the method of threshold value calculation using Youden’s index, which maximizes the combined sensitivity and specificity of the cutoff point, may not be the most clinically relevant method. This may explain the relatively high proportion of patients in our study that fall outside of the defined thresholds when compared with prior studies of TKA use and appropriateness based on clinical criteria [10, 34]. Surgeons may prefer thresholds with higher sensitivity at the expense of specificity, thereby predicting that a greater percentage of patients are likely to experience meaningful functional benefit after TKA. Furthermore, this retrospective study does not confirm the ability of threshold values to predict which patients will actually experience a clinically meaningful improvement in practice. Only future prospective studies are able to validate the clinical application of threshold values. For these reasons, the calculated threshold values should not be regarded as true appropriate use criteria, but rather as tools to enhance patient education and shared decision-making.

Values for MCID can vary substantially and are dependent on numerous factors including the method of calculation, differing patient populations, and length of patient follow-up. For instance, prior studies designed to estimate the MCID for WOMAC and SF-36 after TKA using anchor-based methods have found consistently higher values than those using distribution-based methods [7, 11, 42]. This may suggest that distribution-based methods underestimate the amount of postoperative improvement necessary to be meaningful for patients. However, an ideal means of calculating MCID with regard to TKA, or any intervention, remains to be determined [7, 9, 11]. The minimum clinically important difference for a specific intervention is ultimately defined by what is interpreted as important to a patient and is therefore not a fixed attribute. This study used a distribution-based method that, although widely used, is generally not a preferred method as a result of several limitations. Distribution-based methods are founded entirely on statistical reasoning and therefore fail to incorporate patients’ own assessment of their condition. In the current study, we performed several different distribution-based estimates of MCID to evaluate for statistical variability. MCID was calculated using three different methodologies: half the SD, 95% confidence intervals, and half the IQR. The half-SD method was chosen because it fell between the other proposed MCID values (Supplemental Table 1 [Supplemental materials are available with the online version of CORR®.]). Additionally, the 95% confidence intervals were very wide, whereas the IQR/2 criterion resulted in MCID values attained by a relatively small percentage of the patient population. Furthermore, every attempt was made to control for individual variability using multivariate techniques.

Similar to other regional and national joint replacement registries in the United States, our institution’s joint replacement registry does not collect subjective patient assessments of improvement after TKA (other than as reported in their patient-reported outcome scores). We were therefore unable to perform an anchor-based method of calculation. However, because MCID values are known to be sample-specific, we favored a method that used data from our study population over adopting MCID values defined in previous studies. Applicable to any study that uses MCID, the reader should be made aware of the associated limitations and the resulting impact on clinical applications.

Length of follow-up for this study (1 year) may be considered a limitation. However, we believe that 1 year of follow-up is appropriate given the objective of our study and evidence from prior studies related to time to full recovery after TKA. When quantified with PROMs, the greatest change with regard to pain, function, and mental health has been shown to occur within the first 6 months after TKA and to plateau by 1 year postoperatively [15, 35, 38]. One hundred eighty-eight (25%) of the patients from our institution’s joint replacement registry who met the original inclusion criteria during the years 2009 to 2013 were not included in the study. This included 128 who were lost to follow-up and 60 who did have postoperative follow-up but not at the 1-year time point. Importantly, no differences were found between the study cohort and all patients lost to follow-up when comparing preoperative and postoperative PROM scores (Table 1). Given the distribution-based method used in this study, the value of MCID is dependent on variability within our patient population and therefore could be affected by the 128 patients lost to follow-up entirely. If 1-year postoperative PROM scores for those lost to follow-up were biased as compared with our study population, either more or less improved, our calculated MCID values would be larger. Although this would result in a smaller percentage of patients within our study population experiencing clinically meaningful improvement after TKA, it would be unlikely to significantly affect threshold values, and corresponding c-statistics, because these are objective measures.

This study was performed at a single institution on a predominantly white, North American population. Accordingly, the results may not be applicable to individuals who are underrepresented in our patient population. However, both the KOOS and SF12v2 have been shown to have good applicability across varying populations and we therefore believe that our findings can be generalized. In the future, we believe that a methodology similar to the one described in this study may be applied to surgeon-specific data with the use of a computational algorithm. Such an algorithm could be incorporated into joint replacement registries as an application capable of generating surgeon-specific threshold values. Given the progressive adoption of regional and national joint replacement registries, this type of application would have limited barriers to entry and broad-reaching implications.

The threshold value for KOOS, which was a maximum of 58 out of a possible 100 points, was sufficiently predictive of attaining a MCID (AUC, 0.76). This suggests that patients with baseline KOOS scores greater than 58 are progressively less likely to experience a clinically meaningful improvement after surgery. This trend of diminishing returns with higher baseline functional capacity has been previously described [24, 26]. These findings are consistent with prior evidence suggesting that preoperative pain and functional status are predictive of functional ability after TKA [15, 27, 40]. Comparatively, the threshold value for SF12v2 PCS was not acceptably predictive. This is likely explained by the fact that generic PROMs such as the SF12v2 are less sensitive to changes in health after TKA when compared with disease-specific PROMs such as the KOOS [29].

The predictive ability of both SF12v2 PCS and KOOS threshold values improved after controlling for baseline mental and emotional health, as quantified by preoperative SF12v2 MCS scores. In fact, only when taking into consideration patients’ preoperative mental and emotional health do the SF12v2 PCS threshold values become acceptably predictive. Additionally, the multivariate analysis demonstrated that baseline SF12v2 MCS scores paralleled functional threshold values. These findings are consistent with prior evidence, which demonstrates that poorer baseline mental and emotional health is associated with smaller improvement in function after TKA [4, 5, 13, 16, 17, 19, 24, 26, 27, 40, 43, 44]. By comparison, patient comorbidities and age have little effect on PROM scores after TKA [13, 32].

For physicians, the results of this study broaden the application of widely used patient-reported outcome surveys by providing preoperative threshold values for a disease-specific PROM, KOOS, and a generic PROM, SF12v2, that have been adjusted for mental and emotional health. These data may help to identify the subset of patients with preoperative PROM scores that place them at a low likelihood of experiencing a clinically meaningful benefit after TKA. This type of information may facilitate further discussions surrounding the timing of surgery or the need for additional preoperative interventions before proceeding to surgery. More specifically, patients with low MCS scores may benefit from preoperative interventions aimed at improving mental and emotional health such as a multimodal program including cognitive therapy and education to better align patient expectations with realistic outcomes. Patients with high preoperative function may be counseled that the incremental improvement they can expect after surgery may not be sufficient to make a meaningful impact on their quality of life and therefore may not be considered worth the risk of undergoing a major surgical procedure. For patients, this information could help to give them a sense of what to expect after surgery, thereby promoting further engagement in the decision-making process. Evidence-based tools such as decision aids have been shown to be an effective means of helping patients make difficult decisions and lead to better outcomes and patient satisfaction after joint replacement surgery [1, 6, 41, 49]. Future prospective studies are needed to assess the use of preoperative PROM threshold values in shared clinical decision-making between patients with advanced knee osteoarthritis and their care team as well as their ability to increase the percentage of patients who experience a clinically meaningful improvement after TKA.