Can Preoperative Patient-reported Outcome Measures Be Used to Predict Meaningful Improvement in Function After TKA?
Cite this article as: Berliner, J.L., Brodke, D.J., Chan, V. et al. Clin Orthop Relat Res (2017) 475: 149. doi:10.1007/s11999-016-4770-y
Despite the overall effectiveness of total knee arthroplasty (TKA), a subset of patients do not experience expected improvements in pain, physical function, and quality of life as documented by patient-reported outcome measures (PROMs), which assess a patient’s physical and emotional health and pain. It is therefore important to develop preoperative tools capable of identifying patients unlikely to improve by a clinically important margin after surgery.
The purpose of this study was to determine if an association exists between preoperative PROM scores and patients’ likelihood of experiencing a clinically meaningful change in function 1 year after TKA.
A retrospective study design was used to evaluate preoperative and 1-year postoperative Knee injury and Osteoarthritis Outcome Score (KOOS) and SF-12 version 2 (SF12v2) scores from 562 patients who underwent primary unilateral TKA. This cohort represented 75% of the 750 patients who underwent surgery during that time period; a total of 188 others (25%) either did not complete PROM scores at the designated times or were lost to follow-up. Minimum clinically important differences (MCIDs) were calculated for each PROM using a distribution-based method and were used to define meaningful clinical improvement. MCID values for KOOS and SF12v2 physical component summary (PCS) scores were calculated to be 10 and 5, respectively. A receiver operating characteristic analysis was used to determine threshold values for preoperative KOOS and SF12v2 PCS scores and their respective predictive abilities. Threshold values defined the point after which the likelihood of clinically meaningful improvement began to diminish. Multivariate regression was used to control for the effect of preoperative mental and emotional health, patient attributes quantified by SF12v2 mental component summary (MCS) scores, on patients’ likelihood of experiencing meaningful improvement in function after surgery.
Threshold values for preoperative KOOS and SF12v2 PCS scores were a maximum of 58 (area under the curve [AUC], 0.76; p < 0.001) and 34 (AUC, 0.65; p < 0.001), respectively. Patients scoring above these thresholds, indicating better preoperative function, were less likely to experience a clinically meaningful improvement in function after TKA. When accounting for mental and emotional health with a multivariate analysis, the predictive ability of both KOOS and SF12v2 PCS threshold values improved (AUCs increased to 0.80 and 0.71, respectively). Better preoperative mental and emotional health, as reflected by a higher MCS score, resulted in higher threshold values for KOOS and SF12v2 PCS.
We identified preoperative PROM threshold values that are associated with clinically meaningful improvements in functional outcome after TKA. Patients with preoperative KOOS or SF12v2 PCS scores above the defined threshold values have a diminishing probability of experiencing clinically meaningful improvement after TKA. Patients with worse baseline mental and emotional health (as defined by SF12v2 MCS score) have a lower probability of experiencing clinically important levels of functional improvement after surgery. The results of this study are directly applicable to patient-centered informed decision-making tools and may be used to facilitate discussions with patients regarding the expected benefit after TKA.
Level of Evidence
Level III, prognostic study.
Despite the overall effectiveness of TKA, a subset of patients experience unsatisfactory results with respect to pain, function, and restoration of quality of life [39, 45]. For a patient to ultimately benefit from a surgical intervention, the procedure must result in a meaningful improvement in health, pain, or function such that the patient might consider repeating the intervention. The minimum clinically important difference (MCID) can be used to specify this degree of change in health and is one way to define what constitutes a successful outcome after a surgical intervention [8, 18, 23]. Prior studies that have defined and evaluated the MCID after TKA have shown that a notable proportion of patients do not experience this degree of change postoperatively [9, 11]. Furthermore, numerous studies have documented dissatisfaction rates after TKA ranging from 15% to 20% [25, 39]. A recent appropriateness study of patients from the Osteoarthritis Initiative concluded that 34% of TKAs were performed inappropriately. Regional, racial, and gender variations in patient selection throughout the United States also highlight the need to better define surgical appropriateness criteria and develop improved shared decision-making tools.
Improved quality of life and functional ability are often considered the most important outcomes after major joint replacement, and patients’ own assessment of these outcomes is a key element in evaluating the effectiveness of the procedure. Consequently, the focus of outcomes assessment has shifted to a more patient-centered approach with the use of patient-reported outcome measures (PROMs). Disease-specific measures such as the Knee injury and Osteoarthritis Outcome Score (KOOS) are more sensitive to change within the context of a specific illness. Generic measures such as the SF-36 and SF-12 version 2 (SF12v2) provide information on overall health status, incorporate elements of psychosocial health, and have the ability to capture the effects of medical comorbidities on quality of life.
There is much interest in the ability to preoperatively identify patients who are at greatest risk of unsatisfactory outcomes after TKA as well as those who will benefit most. Preoperative pain and functional status, as measured by PROMs, have been shown to predict pain and functional ability after TKA [15, 27, 40]. More specifically, although patients with higher levels of preoperative pain and disability demonstrate the greatest improvements in PROM scores, they do not achieve absolute postoperative scores comparable to patients with less preoperative pain and better baseline function [13, 24]. A substantial body of evidence has also demonstrated that mental and emotional health influence postoperative PROMs after TKA [4, 5, 13, 16, 17, 19, 24, 26, 27, 40, 43, 44]. Worse preoperative mental and emotional health is associated with smaller improvements in physical function scores after TKA [4, 5, 16, 17, 22, 27, 40]. These findings highlight the importance of considering both the physical and mental components of preoperative PROMs if they are to be incorporated into a decision-making tool. To our knowledge, this has not been done for patients undergoing TKA.
The purpose of this study was to determine the association between preoperative PROM scores and the likelihood that patients undergoing TKA would experience meaningful clinical improvement 1 year after surgery, as defined by the MCID. Specifically, we asked whether (1) calculated threshold values would be acceptably predictive and define maximum preoperative functional component scores after which the likelihood of experiencing a meaningful improvement begins to diminish; and (2) controlling for baseline mental and emotional health would improve the predictive ability of these threshold values.
Patients and Methods
Data included in this study were obtained from a longitudinally maintained joint replacement outcomes registry, including multiple surgeons, from the primary author’s institution (JLB). The database includes clinical information and patient-reported outcomes for patients undergoing TKA. Patient-reported outcomes were collected preoperatively and 1 year after surgery as part of routine care. The database also includes patient demographic information including age, gender, and race. Patients selected for this study had a history of primary unilateral TKA and available PROM data recorded at both preoperative and 1-year postoperative time points. To ensure the analysis was performed on a homogeneous patient population, the data analysis excluded patients with pathological fracture, malignant neoplasm, or a history of a subsequent procedure on the knee. It was anticipated that 500 patients would be needed to establish appropriateness thresholds, allowing the assessment of receiver operating characteristic (ROC) curves and areas under the curve (AUCs) with a standard error of no more than 0.03 for an expected AUC of 0.7.
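The sample-size reasoning above can be checked with a rough calculation. The sketch below uses the Hanley and McNeil approximation for the standard error of an AUC; the source does not say which formula the authors used, and the splits of 500 patients into those who do and do not reach the MCID are hypothetical, so this is an illustration under stated assumptions rather than a reconstruction of the authors' calculation.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil approximation).

    n_pos: number of patients with the outcome (here, reaching the MCID)
    n_neg: number of patients without it
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)

# Hypothetical splits of 500 patients (assumed, not reported in the source).
for n_pos in (250, 350, 400):
    se = hanley_mcneil_se(0.7, n_pos, 500 - n_pos)
    print(n_pos, 500 - n_pos, round(se, 4))
```

Under these assumed splits the approximate standard error stays below 0.03 for an expected AUC of 0.7. The result depends on how the outcome is distributed, so a very unbalanced cohort could exceed that bound.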
Table 1. Comparison of study cohort to patients without 1-year PROM data (columns: study cohort, missing 1-year data; rows: number of patients, preoperative SF12v2 PCS, preoperative SF12v2 MCS, postoperative SF12v2 PCS, postoperative SF12v2 MCS)
Preoperative and 1-year postoperative KOOS and SF12v2 PROMs were collected through an electronic interface or on paper by a research assistant (DP). The KOOS consists of 42 items separated into five subscales and is scored from 0 to 100 with 0 being the worst level of pain and function. The SF12v2 is a revised version of the original SF-12 with wording modifications to improve readability and ease of use. SF12v2 physical and mental component summary scores (PCS and MCS, respectively) range from 0 to 100 in which a score of 0 indicates the lowest level of health. The scores of both subscales are calculated from the survey’s 12 questions. Survey questions that assess mental and emotional health address the effect of emotional problems (such as feeling depressed or anxious) on work or regular daily activities. PROM scores and SDs were calculated using the scoring algorithms for each outcomes instrument. The SF12v2 PCS and MCS components were considered as separate outcome measures and not as individual subscales.
Threshold values for univariate and multivariate analysis
A nonparametric ROC analysis was used to determine an optimal threshold value for KOOS and SF12v2 PCS separately. For each PROM, the calculated threshold value specified the preoperative score best able to predict the likelihood of a patient experiencing an MCID after TKA. The Youden index was applied to each PROM’s ROC analysis to define the threshold value as the preoperative PROM score with the highest combined sensitivity and specificity for predicting that a patient would experience the MCID. The c-statistic (AUC) of each ROC analysis indicated the predictive validity of this binary classifier test for predicting the likelihood that a patient would achieve the MCID. The c-statistic ranges from 0 to 1 with 1 indicating perfect discrimination and 0.5 indicating no better than chance. Predictive models are considered reasonable when the c-statistic (AUC) is > 0.7 and excellent when > 0.8. For this study, c-statistics > 0.7 were considered acceptably predictive.
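The threshold selection described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy data are invented, and the classification rule (a patient is predicted to reach the MCID when the preoperative score is at or below the cutoff) is an assumption that follows from the finding that higher baseline scores leave less room to improve.

```python
def youden_threshold(scores, achieved):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    Assumed rule: predict "reaches the MCID" when score <= cutoff.
    """
    n_pos = sum(achieved)
    n_neg = len(achieved) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        # True positives: improvers at or below the cutoff.
        tp = sum(1 for s, a in zip(scores, achieved) if a and s <= cut)
        # True negatives: non-improvers above the cutoff.
        tn = sum(1 for s, a in zip(scores, achieved) if not a and s > cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

def c_statistic(scores, achieved):
    """Chance that a random improver scored lower than a random non-improver."""
    pos = [s for s, a in zip(scores, achieved) if a]
    neg = [s for s, a in zip(scores, achieved) if not a]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only (not from the study).
koos = [30, 35, 42, 48, 55, 58, 60, 66, 72, 80]
reached_mcid = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]

cut, j = youden_threshold(koos, reached_mcid)
auc = c_statistic(koos, reached_mcid)
print(cut, round(j, 3), round(auc, 3))
```

On this toy data the cutoff with the largest Youden index is 48 and the c-statistic is 0.875. A different criterion, such as requiring a minimum sensitivity, would select a different cutoff from the same ROC curve.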
Given the known effect of a patient’s preoperative mental and emotional health on the functional result of TKA, a two-stage hierarchical multivariate logistic regression analysis was performed to determine the relative influence of preoperative MCS score on patients’ likelihood of achieving an MCID based on their preoperative KOOS or PCS score. This analysis was necessary to control and adjust for individual patient preoperative variability to allow for comparisons between patients’ clinically meaningful improvements. New Youden thresholds for KOOS and PCS were then calculated from the fitted logistic regression equation of the predicted probability of obtaining the MCID, generating a new threshold value for each potential preoperative MCS score. These new threshold values were then used to calculate new c-statistics to determine changes in the predictive ability of the KOOS and PCS threshold values after controlling for preoperative MCS scores.
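One way to read the MCS-specific thresholds is to take a fitted logistic model, fix a probability cutoff, and solve for the functional score at which the predicted probability equals that cutoff for each MCS value. The sketch below does this for KOOS. The coefficients and the cutoff are hypothetical (the source does not report them), and model fitting is omitted, so this shows only the algebra of the final step.

```python
import math

# Hypothetical values, invented for illustration; not reported in the source.
B0 = 2.0         # intercept
B_KOOS = -0.06   # higher preoperative KOOS lowers the odds of reaching the MCID
B_MCS = 0.03     # higher preoperative MCS raises the odds
P_CUT = 0.70     # assumed probability cutoff (for example, chosen by the Youden index)

def predicted_probability(koos, mcs):
    """Predicted probability of reaching the MCID under the assumed model."""
    logit = B0 + B_KOOS * koos + B_MCS * mcs
    return 1 / (1 + math.exp(-logit))

def koos_threshold(mcs, p_cut=P_CUT):
    """KOOS score at which the predicted probability equals p_cut for a given MCS."""
    target_logit = math.log(p_cut / (1 - p_cut))
    return (target_logit - B0 - B_MCS * mcs) / B_KOOS

for mcs in (30, 40, 50, 60):
    print(mcs, round(koos_threshold(mcs), 1))
```

With a negative KOOS coefficient and a positive MCS coefficient, the KOOS threshold rises as MCS rises, which matches the direction reported in the abstract: better baseline mental and emotional health corresponds to higher functional thresholds.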
Despite the proven effectiveness of TKA, a notable subset of patients does not experience meaningful clinical improvement after surgery. Prior studies aimed at defining the MCID after TKA have demonstrated that 12% to 51% of patients do not experience this degree of improvement postoperatively with respect to pain and function [2, 7, 9, 11]. With limited appropriateness criteria, the decision to pursue surgery is complex and multifactorial for both the patient and physician. This is evidenced by reports that suggest up to 34% of TKAs are performed inappropriately [3, 33, 34, 36]. Prior studies have attempted to define explicit clinical criteria for TKA appropriateness; however, the subjective nature of the procedure’s indications requires patients to weigh the risks and benefits on the basis of their own values [10, 12]. This study is the first of which we are aware to identify an association between baseline functional status, adjusted for mental and emotional health, and the subset of patients most likely to experience a clinically meaningful improvement in function after TKA. The preoperative threshold value for KOOS, determined to be 58, is capable, with acceptable predictive ability, of differentiating patients more likely to experience meaningful improvements in function after TKA from those who are not. Additionally, threshold values for KOOS and SF12v2 PCS vary and demonstrate improved predictive ability when taking into account preoperative mental and emotional health.
This study has several limitations. The definition of a “successful” outcome is a controversial issue and varies between different patients and providers. For the purpose of this study it was defined as a change in PROM score after TKA greater than the MCID, which may not be the most appropriate definition of success. This definition excludes patient satisfaction, a separate outcome that has been shown to be dependent on patient expectations and highly variable as a result of factors specific to each individual patient [31, 37]. For this reason, patient satisfaction may be an unreliable metric to assess the impact of an intervention. Additionally, the method of threshold value calculation using Youden’s index, which maximizes the combined sensitivity and specificity of the cutoff point, may not be the most clinically relevant method. This may explain the relatively high proportion of patients in our study who fall outside of the defined thresholds when compared with prior studies of TKA use and appropriateness based on clinical criteria [10, 34]. Surgeons may prefer thresholds with higher sensitivity at the expense of specificity, thereby predicting that a greater percentage of patients are likely to experience meaningful functional benefit after TKA. Furthermore, this retrospective study does not confirm the ability of threshold values to predict which patients will actually experience a clinically meaningful improvement in practice. Only future prospective studies are able to validate the clinical application of threshold values. For these reasons, the calculated threshold values should not be regarded as true appropriate use criteria, but rather as tools to enhance patient education and shared decision-making.
Values for MCID can vary substantially and are dependent on numerous factors including the method of calculation, differing patient populations, and length of patient follow-up. For instance, prior studies designed to estimate the MCID for WOMAC and SF-36 after TKA using anchor-based methods have found consistently higher values than those using distribution-based methods [7, 11, 42]. This may suggest that distribution-based methods underestimate the amount of postoperative improvement necessary to be meaningful for patients. However, an ideal means of calculating MCID with regard to TKA, or any intervention, remains to be determined [7, 9, 11]. The minimum clinically important difference for a specific intervention is ultimately defined by what is interpreted as important to a patient and is therefore not a fixed attribute. This study used a distribution-based method that, although widely used, is generally not a preferred method as a result of several limitations. Distribution-based methods are founded entirely on statistical reasoning and therefore fail to incorporate patients’ own assessment of their condition. In the current study, we performed several different distribution-based estimates of MCID to evaluate for statistical variability. MCID was calculated using three different methods: half the SD, 95% confidence intervals, and half the IQR. The half-SD method was chosen because it fell between the other proposed MCID values (Supplemental Table 1 [Supplemental materials are available with the online version of CORR®.]). Additionally, the 95% confidence intervals were very wide, whereas the IQR/2 criterion resulted in MCID values attained by a relatively small percentage of the patient population. Furthermore, every attempt was made to control for individual variability using multivariate techniques.
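Two of the distribution-based estimates named above, half the SD and half the IQR, are simple to compute. The sketch below assumes the spread is taken over preoperative scores, which the source does not state, and uses invented toy data. The 95% confidence interval variant is omitted because its exact definition is not given.

```python
import statistics

def mcid_half_sd(scores):
    """Half the sample standard deviation of the scores."""
    return 0.5 * statistics.stdev(scores)

def mcid_half_iqr(scores):
    """Half the interquartile range (statistics.quantiles, default method)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return 0.5 * (q3 - q1)

def reached_mcid(pre, post, mcid):
    """True where the change in score exceeds the MCID."""
    return [(after - before) > mcid for before, after in zip(pre, post)]

# Toy data, invented for illustration only (not from the study).
pre = [30, 40, 50, 60, 70]
post = [50, 45, 70, 62, 90]

half_sd = mcid_half_sd(pre)
half_iqr = mcid_half_iqr(pre)
print(round(half_sd, 2), half_iqr)
print(reached_mcid(pre, post, half_sd))
```

On this toy data the half-IQR estimate is larger than the half-SD estimate, so fewer patients would be counted as improved under it. That is the same kind of sensitivity to method that the paragraph above describes.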
Similar to other regional and national joint replacement registries in the United States, our institution’s joint replacement registry does not collect subjective patient assessments of improvement after TKA (other than as reported in their patient-reported outcome scores). We were therefore unable to perform an anchor-based method of calculation. However, because MCID values are known to be sample-specific, we favored a method that used data from our study population over adopting MCID values defined in previous studies. Applicable to any study that uses MCID, the reader should be made aware of the associated limitations and the resulting impact on clinical applications.
Length of follow-up for this study (1 year) may be considered a limitation. However, we believe that 1 year of follow-up is appropriate given the objective of our study and evidence from prior studies related to time to full recovery after TKA. When quantified with PROMs, the greatest change with regard to pain, function, and mental health has been shown to occur within the first 6 months after TKA and plateaus 1 year postoperatively [15, 35, 38]. One hundred eighty-eight (25%) of the patients from our institution’s joint replacement registry who met the original inclusion criteria during the years 2009 to 2013 were not included in the study. This included 128 who were lost to follow-up and 60 who did have postoperative follow-up but not at the 1-year time point. Importantly, no differences were found between the study cohort and all patients lost to follow-up when comparing preoperative and postoperative PROM scores (Table 1). Given the distribution-based method used in this study, the value of MCID is dependent on variability within our patient population and therefore could be affected by the 128 patients lost to follow-up entirely. If 1-year postoperative PROM scores for those lost to follow-up were biased as compared with our study population, either more or less improved, our calculated MCID values would be larger. Although this would result in a smaller percentage of patients within our study population experiencing clinically meaningful improvement after TKA, it would be unlikely to significantly affect threshold values, and corresponding c-statistics, because these are objective measures.
This study was performed at a single institution on a predominantly white, North American population. Accordingly, the results may not be applicable to individuals who are underrepresented in our patient population. However, both the KOOS and SF12v2 have been shown to have good applicability across varying populations and we therefore believe that our findings can be generalized. In the future, we believe that a methodology similar to the one described in this study may be applied to surgeon-specific data with the use of a computational algorithm. Such an algorithm could be incorporated into joint replacement registries as an application capable of generating surgeon-specific threshold values. Given the progressive adoption of regional and national joint replacement registries, this type of application would have limited barriers to entry and broad-reaching implications.
The threshold value for KOOS, which was a maximum of 58 out of a possible 100 points, was sufficiently predictive of attaining an MCID (AUC, 0.76). This suggests that patients with baseline KOOS scores greater than 58 are progressively less likely to experience a clinically meaningful improvement after surgery. This trend of diminishing returns with higher baseline functional capacity has been previously described [24, 26]. These findings are consistent with prior evidence suggesting that preoperative pain and functional status are predictive of functional ability after TKA [15, 27, 40]. Comparatively, the threshold value for SF12v2 PCS was not acceptably predictive (AUC, 0.65). This is likely explained by the fact that generic PROMs such as the SF12v2 are less sensitive to changes in health after TKA when compared with disease-specific PROMs such as the KOOS.
The predictive ability of both SF12v2 PCS and KOOS threshold values improved after controlling for baseline mental and emotional health, as quantified by preoperative SF12v2 MCS scores. In fact, only when taking into consideration patients’ preoperative mental and emotional health do the SF12v2 PCS threshold values become acceptably predictive. Additionally, the multivariate analysis demonstrated that baseline SF12v2 MCS scores paralleled functional threshold values. These findings are consistent with prior evidence, which demonstrates that poorer baseline mental and emotional health is associated with smaller improvement in function after TKA [4, 5, 13, 16, 17, 19, 24, 26, 27, 40, 43, 44]. By comparison, patient comorbidities and age have little effect on PROM scores after TKA [13, 32].
For physicians, the results of this study broaden the application of widely used patient-reported outcome surveys by providing preoperative threshold values for a disease-specific PROM, KOOS, and a generic PROM, SF12v2, that have been adjusted for mental and emotional health. These data may help to identify the subset of patients with preoperative PROM scores that place them at a low likelihood of experiencing a clinically meaningful benefit after TKA. This type of information may facilitate further discussions surrounding the timing of surgery or the need for additional preoperative interventions before proceeding to surgery. More specifically, patients with low MCS scores may benefit from preoperative interventions aimed at improving mental and emotional health such as a multimodal program including cognitive therapy and education to better align patient expectations with realistic outcomes. Patients with high preoperative function may be counseled that the incremental improvement they can expect after surgery may not be sufficient to make a meaningful impact on their quality of life and therefore may not be considered worth the risk of undergoing a major surgical procedure. For patients, this information could help to give them a sense of what to expect after surgery, thereby promoting further engagement in the decision-making process. Evidence-based tools such as decision aids have been shown to be an effective means of helping patients make difficult decisions and lead to better outcomes and patient satisfaction after joint replacement surgery [1, 6, 41, 49]. Future prospective studies are needed to assess the use of preoperative PROM threshold values in shared clinical decision-making between patients with advanced knee osteoarthritis and their care team as well as their ability to increase the percentage of patients who experience a clinically meaningful improvement after TKA.
We thank Dana Pong for both administering PROMs and recording outcomes data in the University of California, San Francisco joint replacement registry database.