Decisions with respect to the type of anchor
In our example we used the patient’s global rating of perceived effect (GPE) as the anchor. Critics of the GPE’s reliability point out that it consists of only one question and that people’s ability to recall their previous health status is questionable. The GPE has been shown to correlate more with current than with previous health status [19, 20]. In our example the Spearman’s rho of the GPE with the changes in PI-NRS scores was 0.61. The correlations of the GPE with the baseline and 12-week values were 0.10 and 0.80, respectively. The low correlation with baseline scores is not alarming: our study sample consisted of a homogeneous group of patients who all entered the trial with severe complaints (high baseline values). During the study most patients showed a variable amount of improvement or stayed the same, leading to a more heterogeneous distribution of post-treatment values. In such a situation the correlation of the anchor with the post-treatment values will always be much higher than that with the baseline values.
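This pattern of correlations can be reproduced from patient-level data. The sketch below is purely illustrative (simulated scores on a hypothetical 0–10 pain scale, not our trial data), but it shows the typical behaviour of a homogeneous baseline sample: an anchor that mainly tracks current status correlates far more strongly with follow-up and change scores than with baseline scores.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 100

# Simulated scores on a 0-10 pain scale: a homogeneous, severely affected
# sample at baseline, with a variable amount of improvement by 12 weeks.
baseline = np.clip(rng.normal(8.0, 0.7, n), 0, 10)
improvement = rng.gamma(2.0, 1.5, n)
week12 = np.clip(baseline - improvement, 0, 10)
change = baseline - week12

# A coarse 5-category global rating that mainly tracks *current* status.
gpe = np.digitize(-week12, bins=[-8, -6, -4, -2])

rho_base, _ = spearmanr(gpe, baseline)
rho_week12, _ = spearmanr(gpe, -week12)
rho_change, _ = spearmanr(gpe, change)
print(rho_base, rho_week12, rho_change)
```

With the restricted baseline range, the correlation of the anchor with baseline scores is weak, while its correlations with the heterogeneous follow-up and change scores are strong.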
It is important to note that these critical remarks about using a global rating scale as an anchor do not disqualify the anchor-based MIC distribution method, as the method is not restricted to this specific type of anchor. Better anchors should be used if available. Cella et al. present a nice example of clinical cancer outcomes as anchors, and Kolotkin et al. chose change in body weight as an anchor in a study population of obese persons. Kosinski et al. used five different measures of rheumatoid arthritis severity as anchors, including patients’ and clinicians’ global assessments.
The choice of anchor is crucial in any anchor-based approach. In other words, the MIC greatly depends on the type of anchor and on the anchor’s definition of important change. The anchor determines whether the MIC is considered from the perspective of the patient or the clinician. As clinicians and patients do not always agree on which changes are considered important, the MIC from the patient’s perspective may differ from that from a clinician’s point of view. It is fully acceptable that clinicians and patients have different perspectives on what is important: patients may base their judgment on symptoms, and clinicians on an implicit estimation of the clinical course.
Furthermore, the anchor can be very specific or quite general. A global rating scale used as an anchor in, for example, a study on relaxation therapy for patients with angina pectoris might ask generally ‘How has your health status changed since the start of the treatment?’ or it might ask more specifically ‘Has your anxiety deteriorated, stayed the same, or improved since the last time?’. The latter question could lead to different MIC values, because anxiety is just one aspect of general health status. In general, scores on aspects of health status about which patients are less concerned must change more before the change can be considered to reflect an important improvement or deterioration in overall health status. It has been suggested that, to be adequate, an anchor should correlate at least 0.50 with the changes in the instrument’s scores [14, 24].
What is a ‘minimally important’ change?
The MIC value depends to a great degree on the anchor’s definition of minimal importance. The crucial question, then, is ‘what is a minimally important improvement/deterioration?’ Some authors tend to emphasize minimal, while others stress important. Remarkably little research has focused on the ‘importance’ of a change. If patients indicate that they have slightly changed, this is a minimal change, but it is unknown whether this amount of change is considered important by or for these patients. A current initiative at the 8th Outcome Measures in Rheumatology (OMERACT 8) conference is aimed at exploring these issues in rheumatologic disorders (http://www.omeract.org).
Some authors do consider slight improvement as measured by the anchor to be the minimally important improvement [2, 3, 26]. We [16, 27, 28] and others [15, 29–31] set the bar for minimally important improvement at much improved. We had several reasons for this choice in our primary analysis. In our opinion, it better reflects the concept of important improvement, and we expect that some patients, wanting to please their doctor or the researchers, will too readily say that they are slightly improved.
In our secondary analysis we did lower the bar for minimally important improvement to include those persons who indicated on the anchor that they had slightly improved. In that analysis, the MIC using the ROC cut-off point was again 2.5, but the MIC value using the 95% limit cut-off point was somewhat smaller, and the overlap between the two curves was substantially larger. This overlap, however, says nothing about the most adequate definition of minimally important improvement, which is, by its very nature, arbitrary.
Which cut-off point is preferred?
A challenging question is: should the ROC cut-off point or the 95% limit cut-off point be used as the MIC? With the ROC cut-off point, false positive and false negative classifications are weighted equally. If there is no a priori reason to dislike false positives more than false negatives, the ROC cut-off point is a good choice. However, if one objects to classifying patients as improved when their changes in scores fall within the measurement error of the not importantly changed patients, one might prefer the 95% limit cut-off point. Alternative cut-off points are also defensible, as long as a justification is given.
We recommend graphs of the anchor-based MIC distribution to visualize the consequences of both the ROC and the 95% limit cut-off points. The ROC cut-off point usually results in a smaller MIC value than the 95% limit cut-off point, meaning that less change is needed before it is considered important. Note that in Figure 1, in the assessment of the MIC for deterioration, the ROC cut-off point is larger (i.e. at a larger distance from zero) than the 95% limit cut-off point. This can only occur if the curves hardly overlap; in other words, the optimal cut-off point on the ROC curve has a specificity of more than 95%.
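Both cut-off points can be computed directly from the change scores of the two anchor-defined groups. The sketch below uses simulated change scores, not our trial data, and takes the 95% limit cut-off point as the one-sided 95% limit (mean + 1.645 SD) of the not importantly changed group, which is one common operationalization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated change scores, classified by the anchor (values illustrative).
improved = rng.normal(4.0, 2.0, 120)    # importantly improved patients
unchanged = rng.normal(0.5, 1.5, 80)    # not importantly changed patients

# ROC cut-off point: minimizes the total number of misclassifications,
# weighting false positives and false negatives equally.
candidates = np.unique(np.concatenate([improved, unchanged]))
errors = [(improved < c).sum() + (unchanged >= c).sum() for c in candidates]
roc_cutoff = candidates[int(np.argmin(errors))]

# 95% limit cut-off point: the one-sided 95% limit of the distribution of
# change scores among the not importantly changed patients.
limit_cutoff = unchanged.mean() + 1.645 * unchanged.std(ddof=1)
print(roc_cutoff, limit_cutoff)
```

When the two distributions overlap substantially, the ROC cut-off point falls between the two group means, while the 95% limit cut-off point is pushed further from the unchanged group’s mean.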
MIC is not an invariable characteristic
Some authors have advocated one uniform measure for the MIC, such as 0.5 points on a 7-point response scale or one SEM [32, 33]. Other studies, using an anchor-based method, have shown, however, that the MIC is not an invariable characteristic. It depends on baseline values, with higher baseline values (more severe disorders) requiring greater changes to be labeled important [8, 31, 34, 35], and even on characteristics such as age and sex. What is considered to be an MIC depends, among other things, on the anchor, on the severity of the disease, and on the intervention.
To investigate whether subgroups of patients require different MICs, we calculated the MICs for subgroups of (sub)acute and chronic patients, and for patients with high and low baseline values. One accommodation for the MIC’s dependency on baseline values is to express the MIC as a percentage of the baseline value. Farrar et al. showed that MICs for a pain intensity rating scale were more uniform when expressed as a percentage of baseline values than as absolute change. This solution, unfortunately, does not extend to other characteristics that may affect MICs.
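A minimal numeric sketch of this accommodation, using hypothetical subgroup values rather than Farrar et al.’s data: the absolute MIC grows with baseline severity, while the percentage-of-baseline MIC is roughly constant across subgroups.

```python
import numpy as np

# Hypothetical subgroup values (not Farrar et al.'s data).
baseline_means = np.array([8.0, 6.0, 4.0])   # mean baseline scores per subgroup
absolute_mics = np.array([2.6, 2.0, 1.3])    # subgroup MICs on the raw scale

# Expressed as a percentage of baseline, the MICs become nearly uniform.
percent_mics = 100 * absolute_mics / baseline_means
print(percent_mics)
```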
How to deal with different values for MIC
Once it is acknowledged that an MIC cannot be expressed as a single value, it follows that it should be expressed as a range that includes all reasonable values [23, 37, 38]. Ranges, however, require that people know when to use the larger values and when the smaller ones. People will tend to choose the smallest MIC (they want, after all, to see improvement), but the smallest value may not be the most adequate in their situation. In the case of high baseline values, for example, higher MICs apply. The challenge is to balance the clinical practicality of an easily applied single value against the validity of a harder-to-determine value within a range. We support the view of Sloan et al. that, for MICs to be accepted and used in clinical practice, a single value should be set, but with a small range around it to accommodate some variation. As, in the end, the MIC should be viewed as a tool to improve the interpretation of study (or measurement) results, strongly based on the perceptions of those involved, there is a good case for using a mix of evidence-based and consensus processes to arrive at reasonable and parsimonious choices of MIC values. The OMERACT initiative has been highly successful in organizing such processes in the field of rheumatology (see: http://www.omeract.org). These initially set MICs can always be adjusted if further research so demands.
The MIC, though important, is only one of the values that enhance our interpretation of the scores on health status instruments. Comparing scores from different patient groups and relating scores to other, better understood, clinical parameters also enhance the interpretation of these instruments. Our Table 1 is informative in that respect.
Relation of the anchor-based MIC distribution method to other methods for assessing the MIC
Authors such as Juniper et al. and Farrar et al. have defined the MIC as the mean change in scores of patients categorized by the anchor as having experienced minimally important improvement/deterioration. As can be seen in Table 1, when minimally important improvement was set at much improved, the patients in that category had a mean change score of 4.1. When the bar was lowered to slightly improved, the mean change score of persons in that category was 1.8. Note that this method does not take into account the standard deviation of these changes in scores, and that only the category of minimally important improvement is used.
Including the categories of improvement beyond minimally important would falsely increase the MIC, because patients who are considered completely recovered are more likely to have very large changes in scores. However, for the ROC analysis, considering only the category of minimally important improvement underestimates the number of false negative classifications, because the categories that indicate more than minimally important improvement may include persons who score lower than the optimal ROC cut-off point. One certainly wants to count these as false negatives. Therefore we sub-divided our total sample (except for the three deteriorated persons) into importantly improved and not importantly changed persons to determine the minimally important change.
With respect to the role of the distribution, the ROC analysis also ignores standard deviations and other distribution parameters. The ROC cut-off point is based on the minimum percentage of misclassifications on the health status instrument, with the anchor as gold standard.
The standard deviation of changes on the health status instrument only becomes important when the 95% limit cut-off point is used. Note that in that case one considers only the distribution of the persons who have not experienced a minimally important change.
Many authors have proposed distribution-based approaches to assess the MIC, most of which express the observed change in a standardized metric. The SEM, an often-used distribution-based measure, links the reliability of the health status instrument to the standard deviation of the population. The major disadvantage of all distribution-based methods is that they reveal minimally detectable change rather than minimally important change; in themselves, they cannot provide a good indication of the importance of the observed change. Although it may appear, at first glance, to make sense to base an MIC on what is detectable, this leads to the faulty reasoning that what is detectable is important and, conversely, that what is undetectable cannot be important. The latter reasoning has the unfortunate effect of making it impossible ever to conclude that an instrument is unsuitable for detecting MICs.
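For illustration, the SEM is commonly computed as SD × √(1 − r), where r is the instrument’s reliability, and the related smallest detectable change (SDC) for an individual at the 95% confidence level as 1.96 × √2 × SEM. The values below are hypothetical, not taken from our study; the point is that both quantities describe what is detectable, not what is important.

```python
import math

# Hypothetical values: SD of scores and test-retest reliability.
sd_scores = 1.8
reliability = 0.90

# Standard error of measurement and the smallest detectable change (SDC)
# for an individual patient at the 95% confidence level.
sem = sd_scores * math.sqrt(1 - reliability)
sdc = 1.96 * math.sqrt(2) * sem
print(round(sem, 2), round(sdc, 2))  # 0.57 and 1.58
```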
Statistically significant changes at group level, at individual level, and the MIC
It is widely acknowledged that statistically significant differences at group level depend largely on sample size and have little relation to MICs for individual patients. A variety of approaches to determine the statistical significance of individual change have been proposed. Our 95% limit cut-off point incorporates the concept of the statistical significance of individual change, representing a change that is statistically significantly different from that of persons who have not importantly changed. The ROC cut-off point is more liberal in this respect, and may result in MIC values that are not statistically different from the mean value of the patients who do not experience an important change.
To use MIC values at group level, for example to interpret the results of clinical trials, one should determine the proportion of patients in each treatment group who show changes larger than the MIC and compare these proportions [41, 42].
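A sketch of this group-level use, with simulated trial arms (not our data) and the MIC of 2.5 from our example: the responder proportion is computed per arm and the two proportions are compared with a chi-square test.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
mic = 2.5  # MIC from our example; the trial data below are simulated

# Simulated change scores in two trial arms of 150 patients each.
treatment = rng.normal(3.5, 2.0, 150)
control = rng.normal(1.5, 2.0, 150)

# Responders: patients whose change exceeds the MIC.
resp_t = int((treatment > mic).sum())
resp_c = int((control > mic).sum())

# Compare the responder proportions with a chi-square test.
table = [[resp_t, 150 - resp_t], [resp_c, 150 - resp_c]]
chi2, p, dof, expected = chi2_contingency(table)
print(resp_t / 150, resp_c / 150, p)
```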