Psychometric Properties of a Fidelity Scale for Illness Management and Recovery


This study examined the psychometric properties and feasibility of the Illness Management and Recovery (IMR) Fidelity scale. Despite widespread use of the scale, the psychometric properties have received limited attention. Trained fidelity assessors conducted assessments four times over 18 months at 11 sites implementing IMR. The IMR Fidelity scale showed excellent interrater reliability (.99), interrater item agreement (94%), internal consistency (.91–.95 at three time points), and sensitivity to change. Frequency distributions generally showed that item ratings included the entire range. The IMR Fidelity scale has excellent psychometric properties and should be used to evaluate and guide the implementation of IMR.

Trial registration: Identifier: NCT03271242.


Evidence-based practices (EBPs) require reliable and valid instruments to assess fidelity (Bond et al. 2011; Martinez et al. 2014; McHugo et al. 2007). Fidelity to interventions, defined as the degree to which an implementer follows the intervention as specified (Cross and West 2011), is one critical implementation outcome (Proctor et al. 2011).

Illness Management and Recovery (IMR) is a standardized psychosocial intervention designed to help people with serious mental illnesses manage their illness and achieve personal recovery goals (Mueser et al. 2006). Five strategies form the basis of the IMR program: psychoeducation to improve knowledge of mental illness, relapse prevention to reduce relapses and hospitalizations, behavioural training to improve medication adherence, coping skills training to reduce the severity and distress of persistent symptoms, and social training to strengthen social support. The practitioners teach these strategies using a combination of educational, motivational, and cognitive-behavioural techniques, following an accompanying workbook with educational handouts in weekly sessions over 10–12 months, either individually or in groups. The IMR program has spread world-wide (Egeland et al. 2017; Garber-Epstein et al. 2013; Pratt et al. 2011; Roosenschoon et al. 2016), including strong endorsement in Sweden (The National Board of Health and Welfare 2017). A 2014 review concluded that IMR had superior outcomes to treatment as usual, according to observer ratings of psychiatric symptoms, as well as patient and practitioner ratings (McGuire et al. 2014).

The Illness Management and Recovery Fidelity Scale (IMR fidelity) (McHugo et al. 2007) assesses the implementation of specific strategies within the IMR program together with structural and curriculum-based elements, with each item rated on a behaviorally anchored continuum from 1 = no fidelity to 5 = excellent fidelity. The item ratings are summed and averaged: a score of 4.0 or higher defines good fidelity, 3.0–4.0 fair fidelity, and less than 3.0 an absence of fidelity (Bond et al. 2009; McHugo et al. 2007). Psychometric assessment of the scale has been limited. One study demonstrated high inter-rater reliability (ICC = .97) (McHugo et al. 2007), and two studies found sensitivity to change following training and consultation (McHugo et al. 2007; Salyers et al. 2009). Nevertheless, no published study has reported a comprehensive psychometric assessment of the IMR Fidelity scale.
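The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example ratings are hypothetical, and the treatment of a score of exactly 4.0 as "good" is an assumption drawn from the "4.0 or higher" wording.

```python
from statistics import mean

N_ITEMS = 13  # the IMR Fidelity scale has 13 items


def total_fidelity(item_ratings):
    """Average the 13 item ratings (each from 1 to 5) into one site score."""
    if len(item_ratings) != N_ITEMS:
        raise ValueError("expected 13 item ratings")
    if any(r < 1 or r > 5 for r in item_ratings):
        raise ValueError("each rating must lie between 1 and 5")
    return mean(item_ratings)


def fidelity_category(score):
    """Map a site score to the benchmark categories described in the text."""
    if score >= 4.0:   # assumption: exactly 4.0 counts as good
        return "good"
    if score >= 3.0:
        return "fair"
    return "absent"


# Hypothetical ratings for one site (not study data)
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 5, 4]
score = total_fidelity(ratings)
category = fidelity_category(score)
```

With these made-up ratings the site score is 56/13, which falls in the good-fidelity band.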

This study examined the psychometric properties of the Illness Management and Recovery (IMR) Fidelity scale, including item analysis, interrater reliability, interrater item agreement, internal consistency, and sensitivity to change.



As part of a large implementation study (ClinicalTrials NCT03271242), the research team invited mental health clinics providing treatment for psychosis disorders throughout Norway to participate in a study of implementing evidence-based practices. Eleven sites from six of the 19 health trusts in Norway agreed to implement IMR and received intensive technical assistance in implementing IMR. The current paper reports the findings of a secondary data analysis of IMR fidelity assessments at these 11 sites. Prior to the study initiation, none of the sites were providing IMR. All sites committed to adopting IMR and following the program model and practice manual (Gingerich and Mueser 2011). The Regional committees for medical and health research ethics (REK 2015/2169) approved the study, which followed the principles in the Declaration of Helsinki.

Study Sites

Eight of the 11 mental health clinics were community mental health centers, one was a combined inpatient and outpatient clinic for young adults with psychosis and drug abuse problems, one was an outpatient clinic for children and adolescents, and one was an inpatient clinic for adolescents. The latter two sites enrolled youth aged 16 years and older in the IMR program. The participating clinics represented both urban and rural areas.


Each clinic received intensive technical assistance in IMR over 12 months. The technical assistance included 4 days of IMR training with a professional trainer, followed by 30-min weekly group supervision by phone for 6 months, and then every other week for another 6 months.

Each site received a fidelity assessment at baseline, and after 6, 12, and 18 months. A pair of fidelity assessors, independent from the clinical staff delivering IMR, conducted each fidelity assessment. The fidelity assessors varied across sites and assessment periods. A group of 17 researchers (psychologists, psychiatrists, nurses and other health professionals) served as the assessors. All received specific training on IMR fidelity assessment. A senior researcher served as a quality control monitor, reviewing all the fidelity assessments.

The assessors conducted full-day site visits, using an integration of four sources of information: (a) semi-structured interviews with the site leader, (b) semi-structured group interviews with the practitioners that facilitated IMR, (c) progress notes on the patients’ goals and steps towards the goals that were filled out by the practitioners prior to the site visit, and (d) handouts and written materials on the patients’ progress. The two assessors rated each program independently and then compared ratings, resolving discrepancies through discussion in order to reach consensus.


The Illness Management and Recovery Fidelity scale (McHugo et al. 2007) assesses the implementation of specific strategies within the IMR program, such as goal setting and follow-up, motivational techniques, educational techniques, cognitive-behavioral techniques, coping skills training, relapse prevention training, and behavioral tailoring for medication. It also assesses structural and curriculum-based elements, including the number of people in a group, the number of sessions held, the content modules covered, provision of educational handouts, and involvement of significant others (see “Appendix” section). The scale consists of 13 items, each scored on a 5-point scale (from 1, indicating no implementation, to 5, indicating full implementation).

A Norwegian translation agency translated the IMR Fidelity scale into Norwegian, in conjunction with the translation of the IMR manual (Egeland 2018). Two of the authors (KME and KSH) reviewed the translation in detail, repeatedly comparing it with the original version. A prior implementation project tested the translated version (Egeland et al. 2017).

Data Analyses

We examined agreement between assessors at the item level by percentage of exact agreement between pairs of assessors. We also examined mean agreement at each time period and across items for each of four time periods.
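As a minimal sketch of the item-level agreement measure, the snippet below computes the percentage of sites at which two assessors gave an item exactly the same rating. The function name and the ratings are hypothetical and are not study data.

```python
def exact_agreement(ratings_a, ratings_b):
    """Percentage of paired ratings that are identical."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical ratings of one item by two assessors at 11 sites
assessor_1 = [4, 5, 3, 4, 5, 5, 4, 2, 5, 4, 4]
assessor_2 = [4, 5, 3, 4, 4, 5, 4, 2, 5, 3, 4]
pct = exact_agreement(assessor_1, assessor_2)  # 9 of 11 pairs match
```

Averaging this percentage over items, or over time periods, gives the mean agreement figures of the kind reported in the Results.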

We calculated each assessor’s total fidelity score for each site, defined as the sum of the item ratings divided by the number of items (i.e., 13). To evaluate interrater reliability of the site fidelity ratings, we used the intraclass correlation coefficient (ICC) (McGraw and Wong 1996), based on a one-way random effects analysis of variance model for agreement between the two fidelity assessors on the IMR Fidelity scale. A single ICC was computed, combining paired ratings across all assessment points.
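The one-way random effects ICC can be illustrated as follows. The text does not say whether the single-measure or average-measure form was reported, so this sketch assumes the single-measure form, ICC(1), from McGraw and Wong (1996). The function name and the paired scores are hypothetical.

```python
from statistics import mean


def icc_oneway(rows):
    """Single-measure ICC(1) from a one-way random effects ANOVA.

    rows: one tuple per assessment, holding each assessor's total score.
    Assumes every row has the same number of raters (here, two).
    """
    n = len(rows)        # number of assessments
    k = len(rows[0])     # raters per assessment
    grand = mean([x for row in rows for x in row])
    row_means = [mean(row) for row in rows]
    # Between-assessment and within-assessment mean squares
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(rows, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical paired total scores (assessor 1, assessor 2), not study data
pairs = [(1.0, 1.0), (3.5, 3.6), (4.0, 4.0), (4.5, 4.4)]
icc = icc_oneway(pairs)
```

When the two assessors rarely differ and the assessments span a wide range of scores, as in this toy example, the coefficient approaches 1.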

After assessing interrater agreement and reliability, we used consensus ratings in all subsequent analyses. To estimate internal consistency of the IMR scale, we used Cronbach’s alpha, calculating an alpha coefficient for each time period.
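For readers less familiar with Cronbach’s alpha, the sketch below computes it from a sites-by-items matrix of consensus ratings using the standard variance formula. The function name and data are hypothetical. Returning `None` when the total scores have no variance is a choice made for this illustration; it mirrors the situation in which alpha cannot be computed because every site receives the same ratings.

```python
from statistics import variance


def cronbach_alpha(site_by_item):
    """Cronbach's alpha; rows are sites, columns are scale items.

    Returns None when the total scores have zero variance, in which
    case alpha is undefined.
    """
    k = len(site_by_item[0])                  # number of items
    columns = list(zip(*site_by_item))        # one tuple per item
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in site_by_item])
    if total_variance == 0:
        return None
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 4 sites x 3 items (the real scale has 13 items)
demo = [[4, 4, 5], [3, 3, 4], [5, 5, 5], [2, 3, 3]]
alpha = cronbach_alpha(demo)

# All sites rated 1 on every item: alpha cannot be computed
flat = cronbach_alpha([[1, 1, 1]] * 4)
```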

We next examined the item distributions at 18 months, examining mean, standard deviations, and distribution of scores across sites for full (rating = 5), adequate (4), and poor (1–3) fidelity. We also examined the distribution of site scores at 18 months.

Finally, we examined the longitudinal pattern of fidelity graphically and statistically. We examined sensitivity to change over time in IMR fidelity using a one-way repeated measures ANOVA, with Bonferroni-corrected pairwise post hoc tests for changes between baseline and each of the three follow-up assessments. We estimated the magnitude of change by calculating the standardized mean difference effect size (Cohen’s dz) for within-subjects designs (Lakens 2013). All data analyses were done using SPSS for Windows version 25.
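Cohen’s dz for a within-subjects design is the mean of the paired differences divided by their standard deviation (Lakens 2013). The following sketch shows the calculation on hypothetical site scores. The function name and the numbers are illustrative only.

```python
from statistics import mean, stdev


def cohens_dz(before, after):
    """Standardized mean difference for paired (within-subjects) scores."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)   # sample SD of the differences


# Hypothetical baseline and follow-up fidelity scores for four sites
baseline = [1.0, 1.0, 1.1, 1.0]
follow_up = [4.5, 4.0, 4.8, 3.9]
dz = cohens_dz(baseline, follow_up)
```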


Agreement Between Assessors on Individual Items

Over all items and time periods, exact agreement on items was very high, averaging 94% (see Table 2 in the “Appendix” section). The mean exact agreement declined from 99% at baseline to 90–93% thereafter. (High agreement at baseline reflected the lack of IMR implementation, with nearly all items rated 1.) At the item level, mean agreement across all four fidelity reviews was 82% for Item 13 (behavioral tailoring) and 87% for Item 7 (goal follow-up), while mean agreement on all other items exceeded 90%.

Interrater Reliability

Two fidelity assessors rated the IMR Fidelity scales on four occasions at each of the 11 participating sites. We aggregated paired ratings across four time periods to estimate interrater reliability for the 44 assessments (100% completion rate). The intraclass correlation measuring interrater reliability (assuming two assessors) was .99, indicating a very high degree of agreement. In all subsequent analyses, we report the findings based on consensus ratings.

Internal Consistency

Internal consistency (Cronbach’s alpha) was excellent at all three follow-up assessments: .91 (6 months), .94 (12 months), and .95 (18 months), suggesting that the 13 items comprising the IMR Fidelity scale measure a unitary construct. Internal consistency at baseline could not be calculated because nearly all items were rated 1 at all sites.

Item Analysis

As shown in Table 1, the item means for the 11 sites at 18 months ranged from 4.18 (Item 2: Program Length and Item 5: Involvement of Significant Others) to 4.82 (Item 12: Relapse Prevention Training). Notably, all of the items reached an average score exceeding 4.0, which is the benchmark for good fidelity. By contrast, at baseline, all mean item ratings were 1.60 or less. Thus, across the study period, fidelity assessors used the entire rating scale from 1 to 5 for all 13 items, with no evidence of restriction of range.

Table 1 Item distributions on the IMR Fidelity scale at 18 months (N = 11 sites)

Changes over Time

We inspected the longitudinal pattern of changes graphically across the 18-month period for the 11 sites, as shown in Fig. 1. Mean fidelity improved sharply between baseline and 6 months, increasing from 1.01 to 3.61, reached the benchmark for good fidelity (4.0) at 12 months (4.02), and continued to increase at 18 months (4.50). The change in IMR fidelity over time was highly significant, F(1, 10) = 148.69, p < .001. Post hoc t tests comparing baseline fidelity ratings to those at 6, 12, and 18 months confirmed sensitivity to change, with t values of 7.45 at 6 months, 8.26 at 12 months, and 12.10 at 18 months, all significant at p < .001. The standardized mean difference effect size (Cohen’s dz) was 3.65.
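The text does not state which comparison the reported dz refers to. As a rough consistency check, assuming it is the baseline versus 18-month contrast and that the t value comes from a paired test on the n = 11 sites, the standard identity between a paired t statistic and dz (Lakens 2013) gives the reported value:

```latex
d_z = \frac{t}{\sqrt{n}} = \frac{12.10}{\sqrt{11}} \approx 3.65
```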

Fig. 1 Development of IMR fidelity from baseline to 18 months

We also examined change over time looking at the percentage of sites attaining good fidelity (4.0 or higher) at each time period. At baseline, none of the sites had any IMR services whatsoever (10 sites rated 1.0 and one site rated 1.1). The number and percentage of sites attaining good fidelity were 6 (55%) at 6 months, 8 (73%) at 12 months, and 10 (91%) at 18 months. Moreover, the number and percentage attaining very good fidelity (4.5 or higher) were 2 (18%) at 6 months, 6 (55%) at 12 months, and 8 (73%) at 18 months. In summary, most sites attained good fidelity by 6 months and very good fidelity by 12 months.


Overall, the psychometric properties of the IMR Fidelity scale were excellent, with very strong interrater reliability, a high degree of agreement between assessors, and very good internal consistency at all three follow-up assessments. The scale was sensitive to change, and the entire rating scale from 1 to 5 was used.

High agreement between the assessors indicates that the scale items are easy to understand and to agree on. Use of the entire rating scale from 1 to 5 for all 13 items indicates no restriction of range and supports the scale’s sensitivity to change. Using the scale in clinics to document improvement may reinforce good clinical practice. It can also inform the need for specific training in different areas. Because fidelity monitoring leads to understanding and sustainment of practices in the clinics (Bond et al. 2009), the findings reinforce wide use of the IMR Fidelity scale for clinical purposes as well as in research.

In general, research on the psychometric properties of Fidelity scales is lacking (Martinez et al. 2014). This study therefore responds to a strong need. Other Fidelity scales should receive similar psychometric attention.

Our findings identified two items with lower (but still adequate) agreement: Item 7 (IMR goal follow-up) and Item 13 (behavioral tailoring for medication). Improving agreement on these items would require written documentation and interviews with patients (Bond et al. 2009).

Seven months after the completion of the formal study, fidelity assessors completed a survey on their experiences using the IMR Fidelity scale. Overall, assessors reported some challenges in finding the relevant data but few other difficulties. They reported that the scale was easy to score and had clear instructions. The assessors perceived that the interviews with practitioners provided the most useful source of information. Interviews with leaders and progress notes were less helpful. Nevertheless, using multiple sources (triangulation) enhances validity.

Although several studies have used the IMR Fidelity scale to measure fidelity, the current study is the first to examine psychometric properties thoroughly. Nonetheless, some limitations deserve mention. The fidelity assessments included neither interviews with patients nor observation of IMR sessions. We have not yet assessed predictive validity, the strongest evidence for the utility of a Fidelity scale. Although some studies have shown that core principles predict outcomes (Bartholomew and Kensler 2010; Hasson-Ohayon et al. 2007; McGuire et al. 2012), no published study has thus far examined the predictive validity of the IMR Fidelity scale. Experts recommend regular fidelity monitoring (Bond et al. 2009), which is always difficult to implement (Bond et al. 2014; Egeland et al. 2017; Rychener et al. 2009). The widespread use of Fidelity scales awaits electronic health records designed to facilitate quality measurement.


The IMR Fidelity scale performs well psychometrically, with excellent interrater reliability, internal consistency, sensitivity to change, and use of the full rating scale. Our study supports its use for clinical and research purposes. Other Fidelity scales need similar psychometric evaluations. Widespread use of Fidelity scales will require electronic health records designed to facilitate quality measurement.


  1. Bartholomew, T., & Kensler, D. (2010). Illness Management and Recovery in state psychiatric hospitals. American Journal of Psychiatric Rehabilitation, 13(2), 105–125.

  2. Bond, G., Becker, D. R., & Drake, R. E. (2011). Measurement of fidelity of implementation of evidence-based practices: Case example of the IPS Fidelity scale. Clinical Psychology: Science and Practice, 18(2), 126–141.

  3. Bond, G., Drake, R. E., McHugo, G. J., Peterson, A. E., Jones, A. M., & Williams, J. (2014). Long-term sustainability of evidence-based practices in community mental health agencies. Administration and Policy in Mental Health and Mental Health Services Research, 41(2), 228–236.

  4. Bond, G., Drake, R. E., McHugo, G. J., Rapp, C. A., & Whitley, R. (2009). Strategies for improving fidelity in the national evidence-based practices project. Research on Social Work Practice, 19(5), 569–581.

  5. Cross, W., & West, J. (2011). Examining implementer fidelity: Conceptualising and measuring adherence and competence. Journal of Children’s Services, 6(1), 18–33.

  6. Egeland, K. M. (2018). The role of practitioners in the implementation of evidence-based practices in mental health services: Attitudes, participation, and experiences. (PhD), University of Oslo, Oslo, Norway. Retrieved from

  7. Egeland, K. M., Ruud, T., Ogden, T., Färdig, R., Lindstrøm, J. C., & Heiervang, K. S. (2017). How to implement Illness Management and Recovery (IMR) in mental health service settings: Evaluation of the implementation strategy. International Journal of Mental Health Systems, 11(1), 13.

  8. Garber-Epstein, P., Yamin, A., & Roe, D. (2013). Promoting recovery in Israel: A decade of efforts to implement Illness Management and Recovery (IMR). World Association of Psychiatric Rehabilitation Bulletin, 31, 5–11.

  9. Gingerich, S., & Mueser, K. (2011). Illness Management and Recovery IMR: Personalized skills and strategies for those with mental illness (3rd ed.). Center City, MN: Hazelden.

  10. Hasson-Ohayon, I., Roe, D., & Kravetz, S. (2007). A randomized controlled trial of the effectiveness of the Illness Management and Recovery program. Psychiatric Services, 58(11), 1461–1466.

  11. Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.

  12. Martinez, R. G., Lewis, C. C., & Weiner, B. J. (2014). Instrumentation issues in implementation science. Implementation Science, 9(1), 118.

  13. McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46.

  14. McGuire, A. B., Kukla, M., Green, A., Gilbride, D., Mueser, K. T., & Salyers, M. P. (2014). Illness Management and Recovery: A review of the literature. Psychiatric Services, 65(2), 171–179.

  15. McGuire, A. B., Stull, L. G., Mueser, K. T., Santos, M., Mook, A., Rose, N., et al. (2012). Development and reliability of a measure of clinician competence in providing Illness Management and Recovery. Psychiatric Services (Washington, DC), 63(8), 772–778.

  16. McHugo, G. J., Drake, R. E., Whitley, R., Bond, G., Campbell, K., Rapp, C. A., et al. (2007). Fidelity outcomes in the national implementing evidence-based practices project. Psychiatric Services, 58(10), 1279–1284.

  17. Mueser, K. T., Meyer, P. S., Penn, D. L., Clancy, R., Clancy, D. M., & Salyers, M. P. (2006). The Illness Management and Recovery program: Rationale, development, and preliminary findings. Schizophrenia Bulletin, 32(Suppl 1), S32–S43.

  18. Pratt, C. W., Smith, R. C., Kazmi, A., & Ahmed, S. (2011). Introducing psychiatric rehabilitation at a psychiatric faculty in Pakistan. American Journal of Psychiatric Rehabilitation, 14, 259–271.

  19. Proctor, E. K., Silmere, H., Raghavan, R., Hovmand, P., Aarons, G., Bunger, A., et al. (2011). Outcomes for implementation research: Conceptual distinctions, measurement challenges, and research agenda. Administration and Policy In Mental Health, 38(2), 65–76.

  20. Roosenschoon, B.-J., Mulder, C. L., Deen, M. L., & van Weeghel, J. (2016). Effectiveness of Illness Management and Recovery (IMR) in the Netherlands: A randomised clinical trial. BMC Psychiatry, 16(1), 73.

  21. Rychener, M., Salyers, M. P., Labriola, S., & Little, N. (2009). Thresholds’ wellness management and recovery implementation. American Journal of Psychiatric Rehabilitation, 12(2), 172–184.

  22. Salyers, M. P., Godfrey, J. L., McGuire, A. B., Gearhart, T., Rollins, A. L., & Boyle, C. (2009). Implementing the Illness Management and Recovery program for consumers with severe mental illness. Psychiatric Services, 60(4), 483–490.

  23. The National Board of Health and Welfare. (2017). Nationella riktlinjer för vård och stöd vid schizofreni och schizofreniliknande tillstånd: Stöd för styrning och ledning. Remissversion. Stockholm, Sweden: Socialstyrelsen. Retrieved from file:///C:/Users/KREG/Downloads/


This study was funded by South-Eastern Norway Regional Health Authority (Helse Sør-Øst) (Grant No. 2015106).

Author information



Corresponding author

Correspondence to Karina Myhren Egeland.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the national research committee (REK 2015/2169) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from all individual participants included in the study.



The following tables to be included in the Appendix.

See Tables 2, 3.

Table 2 Percentage agreement between fidelity assessors on individual items
Table 3 Illness Management and Recovery (IMR) Fidelity scale rev. 3-24-05

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Cite this article

Egeland, K.M., Heiervang, K.S., Landers, M. et al. Psychometric Properties of a Fidelity Scale for Illness Management and Recovery. Adm Policy Ment Health 47, 885–893 (2020).


  • IMR Fidelity scale
  • Psychometric properties
  • Illness Management and Recovery
  • Measurement