Introduction

Health-Related Quality of Life (HRQoL), a multidimensional construct that assesses several domains (e.g., physical, emotional, social), is an important Patient-Reported Outcome (PRO) in clinical trials as well as in routine clinical practice, and such information is also used by health policy makers in health care resource allocation and reimbursement decisions [1,2,3]. A PRO is defined as “any report coming directly from patients about how they function or feel in relation to a health condition and its therapy” [4]. Interpreting changes in HRQoL scores obtained from PRO measures remains a challenge to their meaningful application in patient-centered care and policy [5].

Numerous clinical trials have established the importance of HRQoL in various diseases, and it is increasingly popular to evaluate generic and disease-specific HRQoL in clinical trials as a measure of patients’ subjective state of health [5, 6].

To be clinically useful, HRQoL instruments must demonstrate psychometric properties such as validity, reliability and responsiveness to change [7, 8]. Responsiveness to change is important for instruments designed to measure change over time. However, the statistical significance of a change in HRQoL scores does not necessarily imply that it is also clinically relevant [9,10,11,12]. Indeed, health policy makers need clinically meaningful results to determine whether a treatment is beneficial or harmful to patients, and to know how to interpret and implement those results in evidence-based clinical decision making [13]. Interpretation of clinical outcomes therefore should not be based solely on the presence or absence of statistically significant differences [14]. This highlights the need to define the minimal change in score considered relevant by patients and physicians, termed the Minimal Clinically Important Difference (MCID).

The MCID was first defined by Jaeschke [15] as ‘the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management’. MCID values are therefore important in interpreting the clinical relevance of observed changes, at both the individual and group levels. From the patient’s viewpoint, a meaningful change in HRQoL may be one that reflects a reduction in symptoms or an improvement in function; for the physician, however, a meaningful change may be one that indicates a change in the treatment or in the prognosis of the disease [16, 17].

Several methods for estimating the MCID have been developed, but no clear consensus exists regarding which are most suitable. Wells and colleagues published an extensive review of the available approaches and classified them into nine different methods [18].

Another review proposed three categories of methods for defining the MCID: distribution-based, opinion-based (relying upon experts) and anchor-based methods [19].

On the one hand, anchor-based methods examine the relationship between a HRQoL measure and another measure of clinical change: the anchor [20]. Anchors can be derived from clinical outcomes (laboratory values, psychological measures, and clinical rating performance measures) or from PROs (global health transition scale, patient’s self-reported evaluation of change) [20].

On the other hand, distribution-based methods use statistical properties of the distribution of outcome scores, particularly how the scores differ between patients. They may rely on the Standard Error of Measurement (SEM), Standard Deviation (SD), Effect Size (ES), Standardized Response Mean (SRM), Minimal Detectable Change (MDC), or Reliable Change Index (RCI) [20, 21].

The Delphi method has also been put forth in the literature. It involves presenting a questionnaire or interview to a panel of experts in a specific field for the purpose of obtaining a consensus [22]. The panel is provided with information on the results of a trial and is asked to provide its best estimate of the MCID. Responses are averaged, and this summary is sent back with an invitation to revise the estimates. The process continues until consensus is achieved [23].

To date, methods to determine the MCID can be divided into two well-defined categories: distribution-based and anchor-based methods [20, 24,25,26]. These two approaches are conceptually different. Distribution-based methods are the most used when no meaningful external anchor is available [20, 24,25,26]. Revicki et al. [21] recommended using the anchor-based method to produce primary evidence for the MCID of any instrument and the distribution-based method to provide secondary or supportive evidence for that MCID.

Interest in estimating the MCID for HRQoL instruments has been increasing in recent years, and several reviews have focused on MCID estimates [20, 27,28,29]. MCID values have been shown to differ by population and study context as well as by the choice of anchors. This variability highlights the need to understand how the MCID was statistically established and what kinds of anchors have been used, in order to facilitate its application in the Quality of Life field.

A systematic review was therefore conducted to describe, from a structured literature search, the different types of anchors and statistical methods used in estimating the MCID for HRQoL instruments, whether generic or disease-specific.

Materials and methods

Search strategy

A literature review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [29].

To identify a large number of studies related to the MCID, we searched PubMed and Google Scholar for articles published from 01 January 2010 to 31 December 2018 using the following query: (“MCID” OR “MID” OR “minimal clinically important difference” OR “minimal important difference” OR “minimal clinically important change” OR “clinically important change” OR “minimal clinical important difference” OR “clinical important difference” OR “meaningful change”) AND (“health related quality of life”).

A grey literature review was also performed.

We selected English- and French-language articles with an available abstract that (1) were original articles (i.e., reviews, meta-analyses, commentaries and research letters were not considered) and (2) described the anchors and statistical methods used to estimate the MCID for HRQoL instruments. Literature reviews, considered secondary research articles, were not selected, but their reference lists were screened for other pertinent articles.

Two authors (YM and EJ) independently screened the studies based on titles and abstracts. The same authors then obtained and read the selected full texts to determine eligibility. Finally, the reference lists of the retained articles were reviewed by YM and EJ for other relevant articles that might have been missed in the initial search.

Data extraction and evaluation

For each included article, we collected data about:

- The year of publication;

- The study design, of which four types were identified:

  • Prospective;

  • Retrospective;

  • Cross-sectional;

  • Clinical trials.

- The sample size (N);

- The disease;

- The HRQoL instrument: number of instruments/subscales used, generic and/or disease-specific;

- The MCID estimation method: anchor- and/or distribution-based, number of anchors, kind of anchor (subjective or clinical), cutoffs used, statistical methods, distribution criteria;

- The MCID value/range of each HRQoL instrument for each study.

The methodological quality of the included studies was independently assessed by two authors, and disagreements were resolved by discussion. Articles that met the eligibility criteria were grouped according to clinical treatment area. We then assessed the MCID anchors and calculation methods and developed tables displaying questionnaire names, calculation methods, types of anchors, and MCID values by generic and disease-specific questionnaire.

Results

Selection process and general characteristics of included studies

The literature search identified 695 articles via PubMed, and 119 additional articles were identified through complementary searches. After the selection process, this literature review included 47 articles (Fig. 1).

Fig. 1

PRISMA Flow diagram of the literature search

Our review provides an assessment of MCID for 6 generic and 18 disease-specific instruments (Table 1). Characteristics of the 47 included articles [30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76] are summarized in Table 2.

Table 1 HRQoL instruments: abbreviations and full names
Table 2 General characteristics of the included studies (N = 47)

Most studies were prospective (n = 34, 72.3%); 3 were retrospective (6.4%), 3 were cross-sectional (6.4%) and 7 were clinical trials (14.9%). Nearly 40% of the studies were conducted in the field of oncology.

In addition, 75% of the studies estimated the MCID for only one HRQoL instrument, while 25% did so for two or three instruments. Twenty-two (46.8%) studies focused only on a generic HRQoL instrument, 23 (48.9%) only on a disease-specific HRQoL instrument, and 2 (4.3%) combined both.

Methods of MCID estimation

In this review, 18 (38.3%) of the included studies used only anchor-based methods to estimate the MCID; 6 (12.8%) studies used only distribution-based methods, and 23 (48.9%) combined both to provide more accurate estimates (Table 2).

Anchor-based methods

Type of anchors

Among the 41 studies using anchor-based methods, 36 applied non-clinical anchors and only 5 applied clinical ones. The anchors adopted in the included studies are presented in Table 3. For each anchor, the authors predefined different cutoffs that varied depending on the study context.

Table 3 MCID methods estimation: anchors and statistical methods

Among the 36 studies using non-clinical anchors, 30 of them chose anchors from the viewpoint of patients, 5 from the viewpoint of physicians and 1 from the viewpoint of both.

Anchors from the patient’s point of view are based either on questions assessing how a patient feels about his or her current health status over time or on PRO measures:

  • The Global Rating of Change (GRC) scale (n = 6): used as a 15-point ordinal scale or a 7-point scale.

  • Global and transition questions: the most common was the Health Transition Item (HTI) of the SF-36 (n = 6). The other questions related to the instruments were differently applied and are described in detail in Table 3.

  • PROs such as the Pain Disability Index, the perceived recovery score of the Stroke Impact Scale, the Symptom Scale-Interview …

  • Other scales such as the Modified Rankin Scale, the Barthel Index …

Five studies used an anchor from the physician’s point of view:

  • The dichotomous physician’s global impression of treatment effectiveness (PGI): this question was a discrete choice of “effective” or “not effective” treatment.

  • The Clinical Global Impressions scales: Improvement (CGI-I) or severity (CGI-S).

  • The change in Fontaine classification: rated on a 4-point scale (much improved, improved, unchanged and worse).

Four studies used a Performance Status (PS) as clinical anchor:

  • The Karnofsky Performance Scale (KPS).

  • The World Health Organization Performance Status (WHO PS), combined with Mini-Mental State Exam (MMSE) or Weight change.

Statistical methods used for anchor-based methods

Among the 41 studies using anchor-based methods, 36 applied only one statistical method: Change Difference (CD), Receiver Operating Characteristic (ROC) curve, regression analysis (REG), Average Change (AC) or Equipercentile Linking (EL). The remaining 5 studies combined several of these methods (Fig. 2, Table 3).

Fig. 2

Review of statistical methods applied in the included studies

In most cases, the MCID was determined from the change in HRQoL score between two time points, measured from baseline (longitudinal design).

Among the 36 studies using only one statistical method, the most common were the following (Fig. 2); an illustrative sketch of these calculations is given after this list:

  • The CD method: the MCID was identified as the difference between the mean HRQoL score change of responders (defined by the anchor) and the mean score change of non-responders.

  • The ROC curve: constructed by plotting the sensitivity of the instrument (the true positive rate) against 1 − specificity (the false positive rate). Some studies [43, 51, 54, 60, 76] identified the MCID as the point closest to the upper left corner of the curve, and other studies [45, 65] identified it as the point on the curve at which sensitivity and specificity are jointly maximized (the maximum of sensitivity + specificity − 1, the Youden index). The area under the curve (AUC) was always calculated to measure the instrument’s responsiveness, with values above 0.7 considered acceptable.

  • Regression analysis of the HRQoL score (or change score) with the anchor as regressor: authors defined the MCID as the estimated coefficient of the anchor.

  • The AC method: the MCID was taken as the mean HRQoL score change observed in patients classified as responders according to the anchor.

  • The EL method: the change in HRQoL score whose percentile rank corresponds to that of the change in the anchor is interpreted as the MCID.

All 5 studies that combined methods used the same 4: the CD, ROC, AC and MDC methods.

  • The MDC (Minimal Detectable Change) is defined as the upper limit of the 95% confidence interval (CI) of the average change detected in non-responders.

Two of these 5 studies [49, 50] chose the MDC as the most appropriate method to identify the MCID, since it was the only method to provide a threshold above the 95% confidence interval of the unimproved cohort (i.e., greater than the measurement error). The three other studies [46, 52, 59] found no meaningful difference among the 4 methods for determining the true value of the MCID.
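To make these anchor-based calculations concrete, the sketch below applies the CD, AC, regression, ROC/Youden and MDC approaches to simulated change scores with a dichotomized anchor. The data, cutoffs and variable names are hypothetical and illustrative only; they are not drawn from the reviewed studies.

```python
"""Illustrative sketch of common anchor-based MCID calculations on simulated data."""
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical HRQoL change scores and a dichotomized anchor
# (1 = responder, e.g. "somewhat better" or more on a transition item; 0 = non-responder).
change = np.concatenate([rng.normal(8, 10, 120),    # responders
                         rng.normal(1, 10, 180)])   # non-responders
responder = np.concatenate([np.ones(120, int), np.zeros(180, int)])

# Average Change (AC): mean change among responders.
mcid_ac = change[responder == 1].mean()

# Change Difference (CD): mean change of responders minus mean change of non-responders.
mcid_cd = change[responder == 1].mean() - change[responder == 0].mean()

# Regression of the change score on the (0/1) anchor: with a dichotomous anchor,
# the estimated anchor coefficient equals the CD.
mcid_reg = np.polyfit(responder, change, 1)[0]

# ROC / Youden index: cutoff maximizing sensitivity + specificity - 1.
thresholds = np.unique(change)
youden = [np.mean(change[responder == 1] >= t) + np.mean(change[responder == 0] < t) - 1
          for t in thresholds]
mcid_roc = thresholds[int(np.argmax(youden))]

# MDC: upper limit of the 95% CI of the mean change among non-responders.
nr = change[responder == 0]
mcid_mdc = nr.mean() + stats.t.ppf(0.975, len(nr) - 1) * nr.std(ddof=1) / np.sqrt(len(nr))

print(f"AC={mcid_ac:.2f}  CD={mcid_cd:.2f}  REG={mcid_reg:.2f}  "
      f"ROC/Youden={mcid_roc:.2f}  MDC={mcid_mdc:.2f}")
```

As this toy example suggests, each approach answers a slightly different question about the same change-score distribution, which is one reason the reviewed studies report different MCID values even within a single cohort.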

Distribution-based methods

Among the 29 studies using distribution-based methods, 13 applied only one method, while most studies (n = 16) combined more than one distributional method (Table 3).

The most common were (Fig. 2):

  • Multiples of the Standard Deviation used as the MCID: 0.5 SD, 0.3 SD, 1/3 SD and 0.2 SD. Most authors (n = 22) used 0.5 SD of the HRQoL change score between two time points. Two or three of the multiples 0.5 SD, 0.3 SD and/or 0.2 SD were frequently combined (n = 12), and only one study used 1/3 SD. Multiples of the SD relate to effect size, from 0.2 SD (small effect) to 0.5 SD (medium effect).

  • The Standard Error of Measurement (SEM): calculated as SEM = SD × √(1 − r), where r is a reliability estimate of the HRQoL score (the ratio of the true score variance to the observed score variance, or an internal consistency measure such as Cronbach’s alpha). This precision characteristic was frequently used (n = 16), often in combination with multiples of the SD.

  • The Effect Size (ES): used in one study; it represents the standardized HRQoL score change and is commonly calculated as the score change divided by the standard deviation of the score.

  • The Minimal Detectable Change (MDC): used in two studies and calculated as 1.96 × √2 × SEM (for a 95% confidence level). The MDC represents the smallest change that exceeds measurement error at a given confidence level.

Most studies (n = 16) thereby combined fractions such as 0.2 SD, 0.3 SD or 0.5 SD and/or 1 SEM in order to provide a range of MCID values (Table 3), as illustrated in the sketch below.
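As a minimal illustration, the sketch below computes the distribution-based criteria listed above (multiples of the SD of the change score, the SEM, and the MDC) on simulated scores; the data and the assumed reliability value are hypothetical and not taken from any reviewed study.

```python
"""Illustrative sketch of distribution-based MCID criteria on simulated data."""
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical baseline and follow-up HRQoL scores.
baseline = rng.normal(50, 12, 250)
follow_up = baseline + rng.normal(3, 9, 250)
change = follow_up - baseline

# Multiples of the SD of the change score used as MCID thresholds.
sd_change = change.std(ddof=1)
mcid_0_5_sd = 0.5 * sd_change
mcid_0_3_sd = 0.3 * sd_change
mcid_0_2_sd = 0.2 * sd_change

# SEM = SD * sqrt(1 - r), with r a reliability estimate of the HRQoL score
# (here an assumed Cronbach's alpha of 0.85 and the SD of the baseline scores).
reliability = 0.85
sem = baseline.std(ddof=1) * np.sqrt(1 - reliability)

# MDC at the 95% level: 1.96 * sqrt(2) * SEM.
mdc_95 = 1.96 * np.sqrt(2) * sem

print(f"0.5SD={mcid_0_5_sd:.2f}  0.3SD={mcid_0_3_sd:.2f}  0.2SD={mcid_0_2_sd:.2f}")
print(f"SEM={sem:.2f}  MDC95={mdc_95:.2f}")
```

Reporting several of these quantities side by side, as most of the included studies did, yields a range of candidate MCID values rather than a single threshold.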

MCID values

As shown in the supplementary file (see Additional file 1), variability in MCID results was observed for each HRQoL instrument, depending on:

  • Pathology: the MCID for the SF-36 PCS ranged from 4.9 to 5.21 [55] and from 4.09 to 9.62 [59] for rheumatology and neurology populations, respectively.

  • Methodology: even for the same pathology, MCID values were variable. For example, for the EQ-5D, MCID values obtained using anchor- and/or distribution-based methods varied from 0.01 to 0.39 for patients with rheumatologic/musculoskeletal disorders [47,48,49,50, 54] and from 0.08 to 0.15 for oncology patients [31, 40, 43]. For patients with psychological disorders, the MCID ranged from 0.05 to 0.08 using anchor-based methods and from 0.04 to 0.1 using distribution-based methods [65].

  • Statistical method: the MCID for the EORTC QLQ-C30 in oncology patients ranged from − 27 to 17.5 using the CD method [30], from − 12 to 8 using the AC method [31], and from − 11.8 to 11.8 using regression analysis [37].

  • Change direction: some studies calculated the MCID irrespective of the change direction or separately for improvement and deterioration, and did not find a major impact on MCID values. For the WHOQOL-100, for example, assessed in an early-stage breast cancer population [36], the MCID for improvement ranged from 0.51 to 1.27 and for deterioration from − 1.56 to − 0.71.

Discussion

Our systematic review identified 47 studies reporting anchors and statistical methods used to estimate the MCID for generic and disease-specific HRQoL instruments. The review pointed out that interest in the MCID of HRQoL instruments has been increasing in recent years and that the largest body of work has been done in the field of oncology.

Most studies in our review (n = 41) used anchor-based methods, either alone or in combination with distribution-based methods. As discussed by Gatchel and Mayer [77], the performance of anchor-based methods depends on the choice of the external criterion as well as on the methodology used.

We observed multiple anchors chosen by authors; the most common were non-clinical and taken from the viewpoint of patients, in order to assess how a patient feels about his or her current health status over time. These anchors are well studied and applicable to a wide range of patients [78]. However, patients may be aware that their disease is deteriorating and will therefore conclude that their HRQoL is deteriorating accordingly. Furthermore, patients’ subjective experiences are shaped by the way in which people construct their memories. It is hard for people to accurately recall a previous health state; rather, they create an impression of how much they have changed by considering their present state and then retrospectively applying some idea of their change over time. Hermann [79] described the problem of “recall bias”, where events intervening between the anchor points influence the recall of the original status, while Schwartz and Sprangers [80] described “response shift”, where a patient’s response is influenced by a changing perception of their context.

Clinical anchors were not widely applied in the included studies. Changes in Performance Status (PS), in particular the KPS and the WHO PS, were chosen by authors because of their accessibility and interpretability [25]. As they do not provide MCID values per se [81], clinical anchors were applied in our review together with a distribution criterion. However, we did not find any combination with another, subjective anchor.

Authors have recommended the use of multiple independent anchors [20]. Anchors must be easily interpretable, widely used and at least moderately correlated with the instrument being explored [8, 32, 33]. According to Cohen [82], a correlation of 0.371 was recommended as the threshold defining an important association. However, anchor-based methods may be vulnerable to recall bias, and, as was evident in our review, different anchors may produce widely different estimates for the same HRQoL instrument.

Cut-offs for the different anchors were assigned differently by authors, and even for the same anchor, many cutoffs were used. There is no agreement on the exact cutoffs for anchors; they are generally assigned for research purposes, depending on the study context and the anchor used [29].

Once the anchor has been chosen, different statistical methods can be applied to estimate the MCID. The most established method in our review was the mean change score, also called the Change Difference (CD) method. The latter is based on the mean change of patients who improved; authors can therefore set its cutoffs on the basis of the change scores of patients shown to have had a small, moderate, or large change. The MCID corresponds to the difference between two adjacent levels on the anchor and therefore depends on the number of levels: the larger the number of levels, the smaller the difference between two adjacent levels, and the smaller the MCID [83].

Each statistical method has its own underlying concepts and produces a MCID value different from the other methods. Some authors pointed out that the largest threshold value is most often generated by the average change method, whereas the smallest comes from the CD and MDC methods [48,49,50].

The use of patient point-of-view anchors by most studies may therefore be explained by the lack of satisfactory objective scales, which encourages the use of subjective anchors in the first place. In addition, the CD method may have been favoured because it is simple to apply, but we cannot affirm that it is the most relevant method. We conclude that there are many faces to the MCID; it is neither a simple concept nor simple to estimate.

In addition, distribution-based methods, derived from statistical analysis, were applied in fewer studies. In accordance with the literature [24,25,26,27,28,29], the fractions most often used in our review were 0.2 SD, 0.3 SD, 0.5 SD and 1 standard error of measurement (SEM).

Some studies determined that the MCID corresponded most closely to the 0.5 SD estimate, the value around which most meaningful changes fall, as previously shown in the study by Norman et al. [84].

Distribution-based methods also produced different MCID values depending on the distributional criterion. Nevertheless, these methods do not address the question of clinical importance and ignore the aim of the MCID, which is to define clinical importance distinctly from statistical significance. Authors have recommended using them when anchor-based calculations are unavailable [20].

In this review, MCID values were defined for PRO measures of HRQoL using both anchor- and distribution-based methods. Some studies have determined the MCID for the Patient-Reported Outcomes Measurement Information System (PROMIS) instruments using the same statistical methods reported here. In recent years, the PROMIS Network (www.nihpromis.org), a National Institutes of Health Roadmap Initiative, has advanced PRO measurement by developing item banks for measuring major self-reported health domains affected by chronic illness. Further studies should therefore be developed to determine the meaningful change in HRQoL for PROMIS.

Summing up, we did not observe a single MCID value for any HRQoL instrument in our review, and several factors may explain this variability. On the one hand, we found many available methods that produced different MCID values for the same HRQoL instrument. Authors applied, for the same instrument and in the same cohort, four different methods and reported four different MCID values [48,49,50, 52, 59], which suggests that the variation cannot be explained solely by differences in disease severity or disease group, since the same cohort of patients was analyzed. On the other hand, even with the same methodology for the same instrument, MCID values varied, since the variability may be related to the study population, in particular patient demographics and baseline status. Wang et al. [85] stated that MCID scores are context-specific, depending on patient baseline and demographic characteristics. Therefore, factors affecting MCID values are specific to the population being studied and are not transferable across patient groups; this is also related to the many conceptual and methodological differences reported.

Our review has some limitations that deserve mention. First, the search strategy focused only on PubMed and Google Scholar, which might have caused some papers to be missed. However, the inclusion of grey literature is a useful source of relevant information and helps ensure a certain standard of quality of the selected papers. In addition, our review was limited to the last nine years, which may also have led to some papers being missed. To our knowledge, no review defining the MCID in the QoL field has been published recently; our objective was therefore to provide researchers with the statistical methods to be applied in further research.

The following question remains to be answered: which is the best method for estimating the MCID? Sloan (2005) [86] stated that “there are many methods available to ascertaining an MCID, none are perfect, but all are useful”. The MCID can best be estimated using a combination of anchor- and distribution-based measures to triangulate toward a single value. Using several methods makes it possible to assess the robustness of the results; this corresponds to a sensitivity analysis performed not on the data but on the methods. Anchor-based methods should be used as primary measures, with distribution-based methods as supportive measures.

Conclusion

We conclude that many methods have become available, leading to different estimates of the MCID, and that the MCID should be determined in the context of each clinical study. Therefore, in order to remain cautious when interpreting the MCID in the field of Quality of Life, close collaboration between statisticians and clinicians may be critical and necessary to reach agreement on the appropriate method for determining the MCID. Moreover, as is done for the data, a sensitivity analysis on the method, i.e. performing the analysis with several methods, is highly recommended to assess the robustness of the results.