Background

To enter the era of value-based orthopaedics (“health outcomes per dollar spent”) [2, 19], clinical researchers will have to prove that each treatment produces a meaningful clinical improvement using outcomes that are relevant to patients. The American Association of Hip and Knee Surgeons has recommended the use of patient-reported outcome measures to evaluate the results of knee and hip arthroplasties [16]. Studies have focused on statistically detectable (sometimes called statistically significant) differences [35]; however, it is possible to detect statistical differences between interventions that are so small as not to be discernible to patients. Such small differences may not justify the cost or risk of the intervention. It seems far more important that treatments produce improvements large enough for patients to consider them clinically important.

For a given outcome measure, we asked: how much improvement is needed for patients to consider the difference clinically important? Stated otherwise, what is the minimum clinically important difference (MCID) for a specific outcomes measurement tool, such as the SF-36 or the Oswestry Disability Index?

Discussion

According to Cook [5], the idea of the MCID was originally conceived in 1989 by Jaeschke, who defined it as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management.” Cook’s interpretation of this seminal definition establishes two essential characteristics: (1) a minimal amount of change perceived by the patient; and (2) a change sufficiently relevant to determine a modification in patient management. The alternative term, minimal clinically important improvement (MCII), is defined as the “smallest change in measurement that signifies an important improvement” [16], and has found more support than the MCID in musculoskeletal research, especially in rheumatology [31, 32]. In the accompanying tables (Tables 1–8), we refer to the reported results as MCID or MCII as the source studies did, while acknowledging that this usage may not be consistent among them. For simplicity, we use the term MCID throughout this paper.

Table 1 Minimal clinically important improvements for the hip
Table 2 Minimum clinically important differences for the knee
Table 3 Minimum clinically important differences for the spine
Table 4 Foot and ankle scores, based on reported improvement as the anchor
Table 5 Minimum clinically important differences for the Lower Extremity Functional Scale
Table 6 Minimum clinically important differences for the hand
Table 7 Minimum clinically important differences for the shoulder and elbow
Table 8 Minimum clinically important differences for the DASH and QuickDASH scores

The application of MCIDs in clinical research has been difficult, largely owing to the various methods for estimating them [5, 14, 15]. Wright et al. [35] enumerate nine possible methods, which can be divided into two broad approaches. One approach uses distribution-based methods, which rest on statistically detectable changes. However, MCIDs calculated using statistical distributions—particularly when they represent small effect sizes—may not reflect clinically important changes. This topic is discussed in more detail in the “Myths and Misconceptions” section.
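
To make this concrete, the following minimal sketch (in Python, with entirely hypothetical scores and a hypothetical reliability coefficient, not drawn from any of the cited studies) computes two commonly used distribution-based estimates: one-half of the standard deviation of the scores [18] and the standard error of measurement:

```python
# Sketch of two common distribution-based MCID estimates.
# The scores and reliability value below are hypothetical.
import math

baseline_scores = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0, 44.0, 58.0]  # hypothetical outcome scores
test_retest_reliability = 0.90  # hypothetical reliability coefficient (e.g., an ICC)

n = len(baseline_scores)
mean = sum(baseline_scores) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in baseline_scores) / (n - 1))  # sample SD

half_sd_estimate = 0.5 * sd                        # half-SD rule of thumb [18]
sem = sd * math.sqrt(1 - test_retest_reliability)  # standard error of measurement
print(f"0.5 SD estimate: {half_sd_estimate:.1f}")
print(f"SEM-based estimate: {sem:.1f}")
```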

The other approach is to define a binary anchor based on a patient’s reported outcome—for example, was the patient satisfied, or did he or she feel that his or her health had improved? In this anchor approach, two methods commonly are used to estimate the MCID. One is to use a statistical test to estimate the difference between patients answering ‘yes’ and ‘no’ to the anchor. The other is to use a receiver operating characteristic (ROC) curve to identify the MCID as the threshold that best separates ‘yes’ from ‘no’ responses. Among studies using the ROC approach, two further alternatives exist: some focus on maximum overall accuracy, while others ascertain whether 80% specificity has been achieved. Importantly, Katz et al. [11] also warn that anchor-based approaches may be misleading in scenarios where a few patients show large benefits but most show negligible changes.
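
As an illustration of the ROC method, the sketch below uses hypothetical change scores and anchor responses (and assumes scikit-learn is available) to locate the cutoff that maximizes overall accuracy, with the 80% specificity criterion shown as an alternative:

```python
# Sketch of an anchor-based MCID estimated from an ROC curve (hypothetical data).
import numpy as np
from sklearn.metrics import roc_curve

change_scores = np.array([2, 5, 8, 1, 12, 15, 3, 9, 20, 4, 11, 7])  # change from baseline
anchor_improved = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0])    # 'yes'(1)/'no'(0) anchor

# Keep every candidate cutoff rather than only the curve's corner points.
fpr, tpr, thresholds = roc_curve(anchor_improved, change_scores, drop_intermediate=False)
n_pos = anchor_improved.sum()
n_neg = len(anchor_improved) - n_pos

# Overall accuracy at each candidate cutoff; the MCID is the cutoff maximizing it.
accuracy = (tpr * n_pos + (1 - fpr) * n_neg) / len(anchor_improved)
mcid = thresholds[np.argmax(accuracy)]
print(f"ROC-derived MCID (maximum overall accuracy): {mcid}")

# Alternative criterion: the smallest cutoff whose specificity (1 - fpr) is >= 80%.
candidates = thresholds[(1 - fpr) >= 0.80]
print(f"Smallest cutoff with >= 80% specificity: {candidates.min()}")
```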

Although we believe the anchor approach is relatively robust, we acknowledge that different calculation methods lead, unsurprisingly, to different results. Other factors affecting results include whether the calculations were based on raw outcome scores or on changes from baseline (MCID vs MCII) and the patients’ underlying diagnoses. Accordingly, we sometimes found a range of possible MCID values for the same outcomes tool (Tables 1–8).

Myths and Misconceptions

The MCID and the Minimal Detectable Change are the Same (or Even Similar)

They are not the same. By definition, the minimal detectable change (MDC) is the smallest change that can be distinguished from background variation among subjects; it may depend on the variability of the measurement in the population or on the standard error of measurement associated with the test. A statistically detectable change, however, may not be one that matters to the patient, although the two may be related. For example, Norman et al. [18] reported that for quality-of-life outcome scores across a range of conditions, the MCID was generally about half the standard deviation of the reported scores, perhaps reflecting patients’ discrimination thresholds. If the MDC is less than the MCID/MCII, a study may suggest that a treatment results in a difference in outcomes (based on the distribution) that patients cannot perceive. If the MCID/MCII is less than the MDC, we have the opposite situation, in which numerous patients report a real benefit but there is no way to verify it using hard data.
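
For reference, one commonly used distribution-based formulation expresses the MDC at the 95% confidence level in terms of the standard error of measurement (SEM):

$$\mathrm{SEM} = \mathrm{SD}\sqrt{1 - r}, \qquad \mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$$

where SD is the standard deviation of the scores, r is the test-retest reliability, and the factor of √2 reflects that two measurements, each carrying error, are being compared.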

The difference between a statistically detectable and a clinically meaningful difference is important (Fig. 1). Imagine a series of clinical studies, each of which returns an estimate of the size of a treatment’s effect, with a confidence interval drawn around that estimate. If the confidence interval crosses the vertical line representing “no change,” the result is not statistically significant, meaning that the observed “difference” may simply be the influence of chance. If the confidence interval lies entirely to the right of the line of “no change,” the effect is unlikely to be a chance finding. The clinical importance of this effect increases with its distance from the vertical line; that is, confidence intervals to the right of the line of “no change” represent “real” effects, but if they are very close to that line, those treatment effects are very small. It therefore is possible to have statistically detectable changes that are not clinically important. Intermediate situations also are possible. In one, there is no statistical effect, but we cannot exclude a clinical one; this arises when there is insufficient statistical power, commonly because too few patients were studied. In another, there is a statistical effect (the confidence interval remains entirely to the right of the line of “no change”), and the point estimate—such as the mean value on a patient-reported outcomes score, or an odds ratio—seems large enough to care about, but the left-hand boundary of the confidence interval is a very small number, suggesting that the effect may in fact not be clinically important.

Fig. 1

A comparison of clinical and statistical significance is presented. The vertical line indicates “no change” in the measured effect; the horizontal distance from that line measures the strength of the effect. Any confidence interval crossing the vertical line is not statistically significant, and any confidence interval near the line may not be clinically important.
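
The logic of Figure 1 can be expressed as a short sketch (Python; the confidence-interval endpoints and the MCID value are placeholders, and the sketch assumes improvement is expressed as a positive change):

```python
# Classify a treatment effect by statistical and clinical significance,
# given a confidence interval and a chosen MCID (all values hypothetical).
# Assumes benefit is a positive change, i.e., to the right of the "no change" line.
def classify_effect(ci_lower: float, ci_upper: float, mcid: float) -> str:
    if ci_lower <= 0 <= ci_upper:
        # The interval crosses the "no change" line: not statistically detectable.
        if ci_upper >= mcid:
            return "not statistically detectable; a clinically important effect cannot be excluded"
        return "not statistically detectable and unlikely to be clinically important"
    if ci_lower >= mcid:
        return "statistically detectable and clinically important"
    return "statistically detectable, but possibly too small to matter to patients"

print(classify_effect(ci_lower=0.5, ci_upper=3.0, mcid=5.0))   # detectable, possibly unimportant
print(classify_effect(ci_lower=6.0, ci_upper=12.0, mcid=5.0))  # detectable and important
```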

For a Specific Outcomes Tool, the MCIDs for Various Treatments of a Single Joint Will Always be the Same (or Even Similar)

One expects MCID estimates to differ depending on patients’ pathologic characteristics and comorbidities, even when the same calculation method is used for a given outcomes tool. For example, the MCID for hip osteoarthritis may vary based on whether the operation was a first-time arthroplasty or a revision, and on the timetable of recovery (Table 1). Other examples include those reported by Ozer et al. [20], who found that patients with diabetes had higher MCIDs on the Carpal Tunnel Questionnaire (Table 6), and by Wang et al. [33], who found that the MCID for Lower Extremity Functional Scale scores after treatment is at least in part related to the scores patients reported at baseline (Table 5), and that age, gender, and symptom acuity also could affect estimated MCIDs.

The MCID Can be Used as a Basis for Planning Studies

This is not so much a misconception as a caveat. Before the current work, a compendium of outcome scores was assembled by Katz et al. [11], who reviewed painful orthopaedic conditions. They found, as we have, that there is a range of MCIDs for the same condition, and that some scores depend on the initial condition of the patient. Their concern was that averaging across groups could be misleading if only a few patients change substantially and most patients change only slightly, if at all. They recommended that clinical trials comparing two treatments should compare the percentages of patients achieving the MCID.
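
Such a responder analysis might look like the following sketch (Python with SciPy; the change scores and the MCID threshold are hypothetical), which compares the proportions of patients reaching the MCID in two arms rather than comparing mean changes:

```python
# Responder analysis: compare the percentage of patients reaching the MCID
# in two treatment arms (all data hypothetical).
from scipy.stats import fisher_exact

MCID = 10.0  # hypothetical threshold for the outcome tool in question
treatment_a = [12, 4, 15, 9, 22, 11, 3, 18]  # change scores, arm A
treatment_b = [2, 8, 5, 11, 1, 6, 4, 9]      # change scores, arm B

responders_a = sum(x >= MCID for x in treatment_a)
responders_b = sum(x >= MCID for x in treatment_b)

# 2x2 table of responders vs non-responders, compared with Fisher's exact test.
table = [[responders_a, len(treatment_a) - responders_a],
         [responders_b, len(treatment_b) - responders_b]]
odds_ratio, p_value = fisher_exact(table)
print(f"Arm A responders: {responders_a}/{len(treatment_a)}")
print(f"Arm B responders: {responders_b}/{len(treatment_b)}")
print(f"Fisher exact p = {p_value:.3f}")
```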

Conclusions

The tables summarize the range of MCIDs for various outcome tools as an aid to clinicians who may be planning studies or seeking to evaluate patient outcomes in their practices. We caution, based on the findings presented here, that none of the MCID estimates can be considered definitive. However, knowing a range of probable values for differences between patient groups may be sufficient for an investigator’s purposes.

Methodologic Note

The referenced articles were found through a Boolean search of PubMed in September 2016, using the terms (“MCID” OR (“Clinically Important” AND (“Minimum” OR “Minimal”))) combined with “orthopedic”. The results were not as comprehensive as we had expected, although still broad enough to provide ample evidence of the variation in MCIDs. We focused on anchor-based methods because they are tied to patient outcomes, whereas statistical detection thresholds (distribution-based methods) may be irrelevant to the patient.