Identifying the minimal important difference in patient-reported outcome measures in the field of people with severe mental illness: a pre–post-analysis of the Illness Management and Recovery Programme

Purpose Complementary interventions for persons with severe mental illness (SMI) focus on both personal recovery and illness self-management. This paper aimed to identify the patient-reported outcome measures (PROMs) associated with the most relevant and meaningful change in persons with SMI who attended the Illness Management and Recovery Programme (IMR). Methods The effect of the IMR was measured with PROMs concerning recovery, illness self-management, burden of symptoms and quality of life (QoL). From the QoL measures, an anchor was chosen based on the most statistically significant correlations with the PROMs. Then, we estimated the minimal important difference (MID) for all PROMs using an anchor-based method supported by distribution-based methods. The PROM with the highest outcome for effect score divided by MID (the effect/MID index) was considered to be a measure of the most relevant and meaningful change. Results All PROMs showed significant pre–post-effects. The QoL measure ‘General Health Perception (Rand-GHP)’ was identified as the anchor. Based on the anchor method, the Mental Health Recovery Measure (MHRM) showed the highest effect/MID index, which was supported by the distribution-based methods. Because of the modifying gender covariate, we stratified the MID calculations. In most MIDs, the MHRM showed the highest effect/MID indexes. Conclusion Taking into account the low sample size and the gender covariate, we conclude that the MHRM was capable of showing the most relevant and meaningful change as a result of the IMR in persons with SMI.


Introduction
In recent decades, the focus of treatment for people with severe mental illness changed from decreasing the burden of symptoms towards living a meaningful life [1]. In the 1980s, the concept of recovery was introduced, defined as a deeply personal, unique process of changing one's attitudes, values, feelings, goals, skills and/or roles [2]. In the 1990s, as a result of better general health care, life expectancy grew, illnesses became chronic, the challenge to manage chronic illnesses and their consequences increased and the term 'self-management' was introduced [3].
In the field of people with severe mental illness, self-management and symptom reduction represent the clinical orientation, whereas recovery is used as an orientation for personal issues [1,2]. In this field, a dismissive attitude towards labelling mental illnesses can be heard because of the stigmatizing tendencies [1,4]. Several interventions with a single focus on recovery have been developed to help persons with severe mental illness to choose, acquire and keep valued roles. Complementary interventions provide both illness self-management and personal recovery-orientated strategies [5]. An example of a complementary intervention is the Illness Management and Recovery (IMR) programme [6]. Internationally, the IMR programme is criticized for its overly dominant clinical orientation, and McGuire et al. [7] recommend exploring the effects of the IMR programme on recovery and severity of symptom outcomes. In different trials, the IMR programme showed effects on patient-reported outcome measures (PROMs) in domains of recovery [8][9][10], symptom reduction [8,11] and illness self-management [12][13][14].
A PROM is defined as any report coming directly from patients about how they function or feel in relation to a health condition and its therapy [15]. PROMs are considered to be able to measure clinically relevant pre-post-effects from a patient perspective. To assess if a change in pre-post-measures is relevant and meaningful, the concept of minimal important difference (MID) is introduced [16][17][18]. Guyatt et al. [19] defined the MID as the smallest difference in outcome in the domain of interest that patients perceive as important, either beneficial or harmful. Knowing that an intervention can enhance an important difference in a desired outcome domain may help patients, caregivers and professionals when considering shared decision-making processes. King [20] states that MIDs can convince clinicians to change their treatment practices and convince policy-makers to change their treatment guidelines. The concept of MID has become a standard approach in determining the clinical relevance of changes in PROMs. No scientific literature on MID for PROMs concerning people with severe mental illness is available. In this study, we want to contribute to knowledge about MID in the field of severe mental illness.
Considering the discourse of a clinical versus personal recovery orientation in the field of people with severe mental illness, this paper aimed to identify the PROM that captures the most relevant and meaningful change as a result of the IMR programme in persons with severe mental illness. Knowing which PROM this is would allow outcomes to be measured more uniformly in clinical practice and in research.

Trial design, settings and study population
We performed pre- and post-tests to measure the effect of attending the IMR programme on different PROMs. To examine which PROM captured the most relevant and meaningful change, we used the concept of the MID [17,20,21,22]. Estimating MIDs of the PROMs was based primarily on an anchor-based method but supported by distribution-based methods. We used an additional index to assess whether the effect of the IMR programme in a given PROM was large or small seen from a patient perspective, hence in terms of the MID. In this way, we arrived at the effect/MID as an index. An effect/MID of ≥ 1 indicates that the effect is at least the MID: the higher the index the more patients have a change score above the MID. Typically, within-patient changes are normally distributed around the mean change at group level. By comparing the index across different outcome measures, we could identify on which PROM the participants improved the most from the IMR programme. We considered that the PROM with the highest effect/MID index represents the most relevant and meaningful change as a result of the IMR programme. The study population consisted of participants from the e-IMR trial [23,24]. Eligible participants met the following criteria: above 18 years of age, capable of giving informed consent and meeting the Dutch severe mental illness criteria according to Delespaul [25]: being diagnosed with a psychiatric disorder that causes, and is due to, serious impairment in social and/or occupational functioning that lasts longer than at least a couple of years and necessitates coordinated multidisciplinary care. The IMR programme was delivered groupwise in Dutch mental health institutions and lasted about 1 year for all the groups. Further information on the e-IMR trial is published elsewhere [23,24].
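The effect/MID index introduced in this section can be illustrated with a minimal Python sketch. The function name, the PROM labels and the numbers are hypothetical and serve only to show the arithmetic; they are not values from this study. The sketch assumes that each effect is expressed in the direction of improvement.

```python
def effect_mid_index(effect: float, mid: float) -> float:
    """Express a pre-post effect in units of the MID (hypothetical helper)."""
    if mid <= 0:
        raise ValueError("MID must be positive")
    return effect / mid


# Illustrative (effect, MID) pairs per PROM; NOT data from this study.
illustrative = {"PROM_A": (6.0, 4.0), "PROM_B": (3.0, 5.0)}

# Index per PROM; a value of at least 1 means the effect reaches the MID.
indexes = {name: effect_mid_index(e, m) for name, (e, m) in illustrative.items()}

# The PROM with the highest index is taken to capture the most meaningful change.
best = max(indexes, key=indexes.get)
```

Because the index is dimensionless, it can be compared across PROMs that use different score ranges.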

Data collection and outcome measures
In this study, we used the pre-post-data that were gathered in the e-IMR trial between January 2015 and October 2016 [23]. The first author and research assistants sampled all data in structured interviews. Face-to-face interviews were held, because we estimated that too many participants would not respond to telephone or online questionnaires. In the population of people with severe mental illness, low computer experience, skills [26] and availability [24] exist. Furthermore, this population is known for their cognitive impairments [27]. We decided to use the advantages of face-to-face interviewing: spoken language that can be better understood; the possibility of responding to misunderstanding and probing for complete answers and using cards with answering options to overcome memorizing difficulties [28]. The interviewers tried to create an easy-going atmosphere using small talk between the different questionnaires. The interviews took 30-60 min and data were sampled using the following six PROMs:

1. Illness management was measured with the Illness Management and Recovery Scales (IMRS), consisting of 15 items. The response levels, on a 5-point scale (1-5), vary depending on the item. The total score ranges from 15 to 75 and a higher score indicates a higher level of illness management [29]. The test-retest coefficient (r xx ) for the IMRS varies between 0.79 and 0.84 [30][31][32][33].

2. Self-management was measured with the Patient Activation Measure (PAM), consisting of 13 items. The response levels, on a four-point scale, vary from 'strongly disagree' to 'strongly agree', and the fifth option is 'not applicable'. Raw scores were transformed into scores ranging between 0 and 100 and a higher score indicates a higher level of self-management [34]. Coefficient r xx for the PAM is 0.76 [35].

3. Recovery was measured with the Mental Health Recovery Measure (MHRM), consisting of 30 items. The response levels, on a five-point scale, vary from 'strongly disagree' (0) to 'strongly agree' (4), with 'neutral' in between (2). The total score ranges from 0 to 120 and a higher score indicates a higher level of recovery [36]. Coefficient r xx for the MHRM is 0.92 [37].

4. Burden of symptoms was measured with the Brief Symptom Inventory (BSI), consisting of 53 items. The response levels, on a five-point scale, vary from 'not at all' (0) to 'extremely' (4) [38]. The mean score ranges from 0 to 4 and a higher score indicates a higher level of burden and a lower level of mental health. Coefficient r xx for the BSI is 0.90 [39].

5. Quality of life (QoL) was measured with the Manchester Short Assessment of Quality of Life (MANSA), rating satisfaction with your life as a whole as the first item (MANSA-1) and 11 other items focusing on social, physical and mental health domains on a 7-point scale, varying from 'couldn't be worse' (1) to 'couldn't be better' (7). The mean score ranges from 1 to 7 and a higher score represents a higher level of QoL [40]. Coefficient r xx for the MANSA is 0.82 [41].

6. Quality of life was also measured in terms of general health with the Rand 36-Item Health Survey, consisting of 36 items assembled into nine concepts. Raw scores in all the concepts were transformed into scores ranging between 0 and 100, in steps of 5. A higher score indicates a higher level of health [42]. In this study, we used only two concepts: General Health Perception (Rand-GHP) and Health Change (Rand-HC). Rand-GHP consists of five items that estimate the participant's current perception of general health (bio-psycho-social): (1) how many times the participants' health status hindered them in social activities, which was scored on a five-point scale (all, most, some, a little, none of the time); (2) whether they estimate that they become ill more easily than other people; (3) whether they estimate that their health status is just like other people they know; (4) whether they expect their health status to decline; and (5) whether they estimate that their health status is excellent. Items 2-5 were scored on a five-point scale (definitely true, mostly true, don't know, mostly false, definitely false). Coefficient r xx is 0.80 for Rand-GHP [43].
Rand-HC consists of one item that estimates health compared to a year ago on a five-point scale (much better, better, the same, worse, much worse). At the endpoint of our study, this 'year ago' was the start of the IMR programme. Coefficient r xx is 0.40 for the Rand-HC [43].
During the pre-test, participant characteristics were also sampled: age, gender, psychiatric diagnoses conforming to DSM-IV criteria, psychiatric and somatic comorbidities, treatment history, cultural background, housing, socioeconomic status and highest education (see Table 1). At the endpoint, non-completers were identified as those who attended fewer than 50% of the IMR programme sessions.

Method for investigating pre-post-effects
Analyses were conducted using SPSS software, version 23 [44]. To determine the pre-post-effects of the IMR programme on all PROMs, we performed mixed-model multilevel regression analyses, taking into account the clustering of participants. Because the IMR programme was delivered via group sessions, the participants were clustered in these groups. This method automatically uses the 'missing at random' assumption to handle missing data. Random effects on cluster and individual participants nested within the cluster and fixed main effects for time trend were included in the model. The analyses were executed according to the intention-to-treat principle. To take into account the influence of covariates, the pre-post-effects were controlled for the participant characteristics one by one, including for noncompleters. The pre-post-effects in the PROMs were used to estimate the effect/MID index.

Methods for investigating the MID
To estimate a MID, both anchor-based and statistical distribution-based methods are recommended [17,20,21,22]. The anchor-based method uses a criterion that measures a concept that health professionals are familiar with and is widely used in assessing patients' health status [20], such as clinical endpoints, global transition questions or QoL measures [17]. As there is no such widely used criterion in mental health, we searched for a criterion in our own data that captures the richness and variation of the construct of a QoL measure [17]. We examined four QoL anchor candidates: the endpoint data on the Rand-HC global transition question and the change scores on MANSA-1, the total MANSA and Rand-GHP. Rand-HC and MANSA-1 were only used in the search for an anchor. The strength of the association between the anchor and the PROM needs to be determined, because low or no correlation can provide misleading information [17,45]. A correlation of at least 0.30-0.35 is recommended [17]; therefore, Spearman's correlation coefficients between the anchor candidates and the PROMs were calculated. Outliers should not drive a correlation to a significant level. In SPSS, scores above 2.58 times the standard deviation (SD) are assigned as probable outliers [46]. Probable outliers were assessed on their appropriateness and impact on the correlation and a decision was made about removing or recoding to a reasonable level [47,48]. The anchor candidate with the highest correlation with the change scores in most PROMs was considered to be the best anchor and was therefore used in the MID-anchor calculations.
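The anchor screening described above can be sketched in plain Python. This is an illustration only: the analyses in this study were run in SPSS, the function names are hypothetical, and reading the 2.58 rule as "more than 2.58 SD away from the mean" is an assumption.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def probable_outliers(values, threshold=2.58):
    """Values deviating more than `threshold` SDs from the mean (assumed rule)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > threshold * s]
```

An anchor candidate would then be retained when its rho with the PROM change scores reaches roughly 0.30-0.35 and does not depend on the flagged values.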
Estimation of the MID-anchor proceeds as follows: the scores on the anchor were used to categorize participants into five groups that reflected relevant and meaningful change (large negative, small negative, no, small positive, large positive); the mean of the four differences between change scores in the PROMs for two succeeding change groups is the PROM's MID-anchor [20]. The MID-anchor was used to estimate the effect/MID-anchor index.
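The MID-anchor estimate described above can be sketched as follows. The sketch assumes that the "difference between change scores for two succeeding change groups" is the difference between the group mean change scores; the function name and the example data are hypothetical.

```python
from statistics import mean

# Change groups ordered from most negative to most positive anchor change.
GROUP_ORDER = [
    "large negative",
    "small negative",
    "no",
    "small positive",
    "large positive",
]


def mid_anchor(change_by_group, order=GROUP_ORDER):
    """Mean difference in PROM change between succeeding change groups."""
    group_means = [mean(change_by_group[g]) for g in order]
    # One difference per pair of adjacent groups (four for five groups).
    diffs = [b - a for a, b in zip(group_means, group_means[1:])]
    return mean(diffs)


# Illustrative PROM change scores per anchor group; NOT data from this study.
example = {
    "large negative": [-6, -4],
    "small negative": [-2, 0],
    "no": [1, 3],
    "small positive": [4, 6],
    "large positive": [9, 11],
}
estimate = mid_anchor(example)  # group means -5, -1, 2, 5, 10
```

Note that averaging successive differences is algebraically the same as dividing the difference between the two extreme group means by the number of steps, so the estimate is sensitive to small extreme groups.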
It is recommended that the MID be estimated primarily by anchor-based methods [17,49] and that distribution-based methods be used as supportive information [17]. Therefore, we examined two statistical distribution-based MID methods based on the effect size (ES) and standard error of measurement (SEM) of the PROM [17]. The ES estimates the effect of the intervention related to the SD, with ½ES as standard for estimating the MID [17,20,21,22,50]. In MID studies, the SDs of the baseline scores and change scores are used to estimate the ES [21,50]. A change score is the endpoint score minus the baseline score of a participant. The SD of the change scores (SD c ) relates to between-patient variation in change scores. Our index (effect/MID-SD c ) is 2 × effect/SD c , where effect/SD c is also known as the standardised response mean [45,51]. To calculate the effect/MID-SD c index, we used the estimated effects from the mixed-model analyses.
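As a minimal sketch of the SD c-based index, assuming MID-SD c is half the sample SD of the change scores: the function names and example values are hypothetical, and in this study the effect came from the mixed-model analyses rather than from a raw mean change.

```python
from statistics import stdev


def mid_sd_c(change_scores):
    """MID-SD c: half the sample SD of the individual change scores."""
    return 0.5 * stdev(change_scores)


def effect_mid_sd_c_index(effect, change_scores):
    """effect / MID-SD c, which equals 2 * effect / SD c."""
    return effect / mid_sd_c(change_scores)


# Illustrative change scores (endpoint minus baseline); NOT data from this study.
changes = [0, 4, 8]  # sample SD of these three values is 4
index = effect_mid_sd_c_index(3.0, changes)
```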
The SEM is computed using the SD and the test-retest coefficient index: SD × √(1 − r xx ) [20,28,47,52]. To estimate the SEM of the PROMs used in our study, we used the SD and r xx reported in psychometric studies of the PROMs in populations comparable to ours as much as possible. A change smaller than the SEM is likely to be a result of the measure's unreliability rather than a true observed change, therefore, the PROM's MID based on the SEM (MID-SEM) is equal to the SEM of the PROM. The MID-SEM and the estimated effects from the mixed-model analyses were used to estimate the effect/MID-SEM index.
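The SEM formula can be checked against the r xx and SD values that this paper reports in its Table 5 for the referent studies. The sketch below is illustrative and the function name is hypothetical. The IMRS is left out because its rounded r xx of .80 yields about 3.13 rather than the tabulated 3.15, presumably because an unrounded coefficient was used.

```python
import math


def sem(sd: float, r_xx: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("r_xx must lie between 0 and 1")
    return sd * math.sqrt(1.0 - r_xx)


# (r_xx, SD) per PROM as listed in Table 5 of this paper.
referent = {
    "PAM": (0.76, 14.21),
    "MHRM": (0.92, 20.00),
    "BSI": (0.90, 0.72),
    "MANSA": (0.82, 1.08),
    "Rand-GHP": (0.80, 22.7),
}

# MID-SEM equals the SEM, rounded to two decimals as in the table.
mid_sem = {name: round(sem(sd, r), 2) for name, (r, sd) in referent.items()}
```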
When covariates modified the effects, we re-estimated the effect/MID indexes and stratified the participants according to the modifying covariate.

Participant flow
Ten IMR programme groups entered the trial, totalling 91 potential participants, 60 of whom (66%) participated. Baseline characteristics of the participants are presented in Table 1: 36 participants (60%) were female and 15 (25%) were lost to follow-up. In total, 45 participants completed the post-test measurements, 25 (56%) of whom were female; 14 (23%) were non-completers, meaning that they attended less than 50% of the IMR programme sessions. Table 2 shows the estimated effects, which were significant for all the PROMs: illness management (IMRS), recovery (MHRM), self-management (PAM), burden of symptoms    Table 2).

Analysis of the MID
The results of the MID analyses are shown in Table 2. The change scores on all the PROMs were normally distributed. The health change measure (Rand-HC) showed no significant correlations to the other PROMs and was therefore considered to be a non-feasible anchor. Compared to the QoL measures MANSA-1 and the MANSA, the general health measure Rand-GHP showed the most frequent, highest and statistically significant (p < 0.01) correlations with the following measures: illness management (IMRS), self-management (PAM), recovery (MHRM), burden of symptoms (BSI), and QoL (MANSA) (see Table 3). Two outliers drove the correlation between Rand-GHP and the BSI. Examination identified the outliers as true scores; to assess their impact on the correlation, they were recoded into twice the SD c (= 1.02), after which the correlation remained significant (p < 0.01; see Table 3). Rand-GHP was selected as the anchor. We categorized the participants in five change groups. The change scores in the Rand-GHP vary from − 35 to + 40 in steps of 5 (see Table 4). To form the five change groups, we estimated that each group consists of participants that have three consecutive scores in steps of 5, so the score difference between succeeding change groups was 15, which is comparable to the SD c of 17.4. In the 'no change' group, we categorized participants with change scores around zero (− 5, 0, 5). We subsequently defined the other groups and the remaining score + 40 was assigned to the 'large positive change' group. The resulting five change groups were: large negative (n = 3), small negative (n = 7), no (n = 18), small positive (n = 10), and large positive change (n = 7) (see Table 4).
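The grouping rule described above (three consecutive 5-point scores per group, with 'no change' centred on zero and + 40 added to the top group) can be written as a small Python function. The function name is hypothetical and the cut-offs are inferred from that description.

```python
def change_group(delta: int) -> str:
    """Map a Rand-GHP change score (a multiple of 5) to its change group."""
    if delta <= -25:
        return "large negative"   # -35, -30, -25
    if delta <= -10:
        return "small negative"   # -20, -15, -10
    if delta <= 5:
        return "no"               # -5, 0, 5
    if delta <= 20:
        return "small positive"   # 10, 15, 20
    return "large positive"       # 25, 30, 35 and the remaining score 40


# Every score the anchor can take between -35 and +40, in steps of 5.
groups = {delta: change_group(delta) for delta in range(-35, 45, 5)}
```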
The effect/MID-anchor index was highest in the recovery measure (MHRM), with a value of 1.29.
Regarding the supportive MIDs, the MID-SD c was calculated for the PROMs (see Table 2) and the effect/MID-SD c index was highest in the MHRM with a value of 1.33. The MID-SEM values for the PROMs were equal to the SEM, which we calculated for the PROMs with data from referent studies (see Table 5). The effect/MID-SEM index was also highest in the MHRM with a value of 1.40.
Owing to the modifying effect of the gender covariate, we also stratified the MID calculations (see Table 2). For the stratified calculations of the MID-anchor, we reorganized the change groups into four groups based on the 15-point difference, because the male/female proportions in the different change groups were skewed (see Table 4). Males were overrepresented in the negative change groups and females were overrepresented in the positive change groups. Only one male was present in the large positive change group and females were absent in the large negative change group. For the groups with male participants we merged the two positive change groups, resulting in large negative (n = 3), small negative (n = 4), no (n = 10) and positive (n = 3) change groups; for the groups with female participants, we omitted the large negative change group, resulting in small negative (n = 3), no (n = 8), small positive (n = 8) and large positive (n = 7) change groups. The mean differences between the four change groups estimated the MID-anchors for males and females separately. The effect/MID-anchor index was highest for males (0.88) in the self-management measure (PAM) and for females (2.63) in the recovery measure (MHRM).
For the supportive MIDs, the MID-SD c was stratified for males and females. The effect/MID-SD c indexes for both males (0.83) and females (1.71) were highest in the MHRM. The MID-SEM for males and females separately was not calculated, because the referent studies did not provide data for males and females. The effect/MID-SEM indexes for both males (0.89) and females (1.76) were highest in the MHRM.

Discussion
Considering the discourse of a clinical versus personal recovery orientation in the field of people with severe mental illness, this paper aimed to identify the PROM that captures the most relevant and meaningful change as a result of the IMR programme in persons with severe mental illness. In the whole study population, the recovery measure (MHRM) showed the highest effect/MID index in all the MIDs. Also, in the subgroups stratified by gender, the MHRM had the highest effect/MID index in nearly all the MIDs except for the effect/MID-anchor index for men, which was highest in the self-management measure (PAM). With certain prudence, we conclude that the MHRM captures the most relevant and meaningful change for persons with severe mental illness.
Pre-post-scores improved statistically significantly on all the PROMs. The improvements in self-management (PAM) and illness management (IMRS) are bigger than the decrease in burden of symptoms (BSI). The improvements in illness and self-management might have enhanced the participants' perceived recovery more than symptom reduction did. This matches Slade's statement that self-management is related to recovery because it can be a vital resource for supporting recovery [1]. Our findings showed that the IMR programme is capable of facilitating recovery using both illness self-management and personal recovery-orientated strategies, which is also claimed in previous research [7,53,54]. In another earlier study on this population, we saw that women scored better than men, as if women could benefit more from the IMR programme than men [23]. However, before concluding that the IMR programme should be reserved for females, we suggest re-investigating the possible difference in effect between men and women in a larger trial sample.

Table 5 SEM of the PROMs calculated with data from referent studies
PROM [reference]    r xx    SD       SEM
IMRS [31]           .80     6.99     3.15
PAM [35]            .76     14.21    6.96
MHRM [37]           .92     20.00    5.66
BSI [39]            .90     .72      .23
MANSA [41]          .82     1.08     .46
Rand-GHP [43]       .80     22.7     10.15
The overall results on effect/MID index indicate that participating in the IMR programme brought about an important change in the participants. However, on none of the PROMs did the male participants score an effect/ MID index of > 1. Considering the concept of the MID, Revicki et al. [45] state that the ½SD magnitude of change is certainly clinically significant but may not be the smallest non-ignorable difference: ½SD in an outcome measure might be too large to be considered minimally important [45]. Revicki et al.'s statement might also apply for ½SD c , because in our study, the SD c and the SD on baseline scores differed only slightly. Although the mean change in the male participants in our study was < 1MID, we saw that they improved significantly on the recovery measure (MHRM).
On comparing the three calculated MIDs in the PROMs, we conclude that they do not differ by very much. Similarity of the distribution-based MIDs would be expected when the reliability index r xx is 0.75 in a SEM calculation, because then both the MID-SD c and the MID-SEM are equal [52]. The r xx in the main PROMs in our study ranged between 0.76 and 0.92. When r xx is higher than 0.75, the MID-SEM is expected to be lower than the MID-ES. In our study, this is the case in the MHRM and the BSI but not in the QoL measure MANSA due to the difference between the SD in the reference study [41] and the SD c in our study. Nevertheless, in our study, the results in the three MIDs are reasonably consistent and therefore we can conclude that the MID-SD c and MID-SEM support the findings on the MID-anchor.
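The role of r xx = 0.75 in the comparison above follows directly from the two definitions. The derivation below is a sketch that assumes the same SD enters both MIDs; in this study the MID-SD c uses the SD c of our sample and the MID-SEM uses the SD of a referent study, so the relation holds only approximately.

```latex
\begin{align*}
\text{MID-SEM} &= SD \sqrt{1 - r_{xx}} \\
\text{MID-ES}  &= \tfrac{1}{2}\, SD \\
\text{MID-SEM} = \text{MID-ES}
  &\iff \sqrt{1 - r_{xx}} = \tfrac{1}{2}
   \iff r_{xx} = 0.75 \\
r_{xx} > 0.75 &\implies \text{MID-SEM} < \text{MID-ES}
\end{align*}
```

For example, with r xx = 0.92 the SEM is about 0.28 SD, well below ½SD.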
The anchor-based method is our preferred method, as also recommended by Revicki et al. and Johnstone et al. [17,49]. Jayadevappa et al. [15] mention there is no agreement regarding appropriate anchors. The health change (Rand-HC) 'global transition question' anchor-based method appeared to be non-feasible, which is in line with other studies that declared inaccuracy related to response shifts and recall bias [20,55,56]. Recall bias might also be responsible for Rand-HC's low test-retest coefficient (r xx = 0.40) found in the study of Van der Zee et al. [43]. Nevertheless, this global transition question is still recommended for estimating the MID.
In our study, we found the general health perception measure (Rand-GHP) to be the best anchor. Although this choice was data driven, we also considered that Rand-GHP captures the richness and variation of a construct of QoL. The five Rand-GHP questions contain important issues in estimating one's health status: global estimations of whether their health status hinders them in social activities, whether their health status is excellent, whether they expect deterioration and two questions on whether one's health differs compared to the persons they know. There is considerable evidence that evaluating oneself favourably in comparison with others is associated with having fewer health problems [57]. Social comparison also is an important behavioural change technique [58]. Because of the groupwise deliverance of the IMR programme in our study, participants became acquainted with peers. Comparing oneself to peers might be more realistic than a comparison to healthy persons. Perceiving one's health status as deteriorating is associated with a higher need for support with self-management tasks [59]. We considered that Rand-GHP is a valid measure for investigating the MID as a result of an intervention.

Strength and limitations of the study
The strength of our study is that we contributed to the scientific literature on PROMs and explored the use of MIDs in the field of severe mental illness. We need to be cautious about drawing definite conclusions based on our findings because of the relatively low sample size and the significant gender confounder. The statistical power of the results is low and our sample might not be a good representation of the population of persons with severe mental illness. More men living in a supported housing facility might coincidentally determine the variance in our observed scores. This study would need to be repeated in a confirmative trial or in other existing datasets with a bigger sample. Although our sample size was small, it was large enough (> 40) to detect correlation coefficients of 0.50 or higher with a power of 96% [47] and therefore provided a proper basis for the MID-anchor calculations. Another strength of our study is that we were able to include non-completer participants with a low attendance rate, which makes the findings more realistic.
On the one hand, our interviewer-administered method of data collection can be considered a limitation. The face-to-face interviews might have caused response bias, in terms of acquiescence bias [60], and also social desirability, which is stronger in women compared to men [61]. This might have influenced the gender effect difference in our study. Respondents may deliberately answer questions inaccurately, either by underreporting or overreporting of normative or stigmatized issues such as sexual behaviour or eating patterns [62,63]. In our data, we did not find a gender difference in the response to the item of satisfaction with their sexual life in the QoL measure (MANSA), which is an issue that could cause shame and be influenced by social desirability bias. Therefore, we could not conclude that social desirability bias led to the gender difference found in our study, nor could we definitely rule out the presence of this bias. This bias, just as with acquiescence bias, could have occurred in the baseline as well as in the endpoint interviews. Therefore, we expect that the change we saw can be considered a real change. The length of the questionnaires could have caused cognitive fatigue and biased the results, because we did not change the order of the different questionnaires. In a confirmative trial, randomizing the order might prevent this bias.
On the other hand, the face-to-face interviews might have prevented non-response bias, by preventing attrition. We estimated that too many of our participants would not respond to self-administered questionnaires. Only participants with a higher level of functioning might have completed the questionnaires, which could have caused bias. We decided we could better use the advantages of face-to-face interviews [28] as mentioned before in the 'Materials and method' section.

Conclusions
Taking into account the low sample size and the gender covariate, we conclude with certain prudence that the MHRM was capable of showing the most relevant and meaningful change in persons with severe mental illness as a result of the IMR programme.

Implications for further research
Our research can be used as an example of how to estimate MIDs in the context of people with severe mental illness. More research with a larger sample needs to be done to gain a more solid grounding for the MIDs. This research needs to account for the gender covariate. In future research on the effectiveness of interventions for people with severe mental illness, a recovery measure such as the MHRM should be used.

Implications for further practice
In the search for scientific information that can convince clinicians to change their treatment practices and convince policy-makers to change their treatment guidelines, our findings can be used, with certain prudence, in shared decision-making processes. When an outcome on recovery is desired, a person with severe mental illness can be assigned to the IMR programme. A recovery measure such as the MHRM is able to measure the effect and should be used uniformly.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.