Abstract
Workplace-based clinical supervision is common in community-based mental health care for youth and families and could be leveraged to scale and improve the implementation of evidence-based treatments (EBTs). Accurate methods are needed to measure, monitor, and support supervisor performance with limited disruption to workflow. Audit and Feedback (A&F) interventions may offer promise in this regard. The study—a randomized controlled trial with 60 clinical supervisors measured longitudinally for 7 months—had two parts: (1) psychometric evaluation of an observational coding system for measuring adherence and competence in EBT supervision and (2) evaluation of an experimental Supervisor Audit and Feedback (SAF) intervention on outcomes of supervisor adherence and competence. All supervisors recorded and uploaded weekly supervision sessions for 7 months, and those in the experimental condition received a single, monthly web-based feedback report. Psychometric performance was evaluated using measurement models grounded in Item Response Theory, and the effect of the SAF intervention was evaluated using mixed-effects regression models. The observational instrument performed well across psychometric indicators of dimensionality, rating scale functionality, and item fit; however, coder reliability was lower for competence than for adherence. Statistically significant A&F effects were largely in the expected directions and consistent with hypotheses. The observational coding system performed well, and a monthly electronic feedback report showed promise in maintaining or improving community-based clinical supervisors’ adherence and, to a lesser extent, competence. Limitations discussed include unknown generalizability to the supervision of other EBTs.
References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (1999). Standards for educational and psychological testing. American Educational Research Association.
Atkins, M. S., Shernoff, E. S., Frazier, S. L., Schoenwald, S. K., Cappella, E., Marinez-Lora, A., Mehta, T. G., Lakind, D., Cua, G., Bhaumik, R., & Bhaumik, D. (2015). Redesigning community mental health services for urban children: Supporting schooling to promote mental health. Journal of Consulting and Clinical Psychology, 83, 839–852. https://doi.org/10.1037/a0039661
Bailin, A., Bearman, S. K., & Sale, R. (2018). Clinical supervision of mental health professionals serving youth: Format and microskills. Administration and Policy in Mental Health and Mental Health Services Research, 45, 800–812. https://doi.org/10.1007/s10488-018-0865-y
Bearman, S. K., Schneiderman, R. L., & Zoloth, E. (2017). Building an evidence base for effective supervision practices: An analogue experiment of supervision to increase EBT fidelity. Administration and Policy in Mental Health and Mental Health Services Research, 44, 293–307. https://doi.org/10.1007/s10488-016-0723-8
Bearman, S. K., Weisz, J. R., Chorpita, B. F., Hoagwood, K., Ward, A., Ugueto, A. M., Bernstein, A., The Research Network on Youth Mental Health. (2013). More practice, less preach? The role of supervision processes and therapist characteristics in EBP implementation. Administration and Policy in Mental Health and Mental Health Services Research, 40, 518–529. https://doi.org/10.1007/s10488-013-0485-5
Beidas, R. S., & Kendall, P. C. (2010). Training therapists in evidence-based practice: A critical review of studies from a systems-contextual perspective. Clinical Psychology: Science and Practice, 17(1), 1–30. https://doi.org/10.1111/j.1468-2850.2009.01187.x
Beidas, R. S., Maclean, J. C., Fishman, J., Dorsey, S., Schoenwald, S. K., Mandell, D. S., Shea, J. A., McLeod, B. D., French, M. T., Hogue, A., Adams, D. R., Lieberman, A., Becker-Haimes, M., & Marcus, S. C. (2016). A randomized trial to identify accurate and cost-effective fidelity measurement methods for cognitive-behavioral therapy: Project FACTS study protocol. BMC Psychiatry, 16(323), 1–10. https://doi.org/10.1186/s12888-016-1034-z
Bickman, L. (2000). Our quality-assurance methods aren’t so sure. Behavioral Health Tomorrow, 9(3), 41–42.
Bickman, L. (2020). Improving mental health services: A 50-year journey from randomized experiments to artificial intelligence and precision mental health. Administration and Policy in Mental Health and Mental Health Services Research, 47(5), 795–843. https://doi.org/10.1007/s10488-020-01065-8
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge.
Boxmeyer, C. L., Lochman, J. E., Powell, N. R., Windle, M., & Wells, K. (2008). School counselors’ implementation of Coping Power in a dissemination field trial: Delineating the range of flexibility within fidelity. Emotional and Behavioral Disorders in Youth, 8, 79–95.
Bradshaw, T., Butterworth, A., & Mairs, H. (2007). Does structured clinical supervision during psychosocial intervention education enhance outcome for mental health nurses and the service users they work with? Journal of Psychiatric & Mental Health Nursing, 14, 4–12. https://doi.org/10.1111/j.1365-2850.2007.01021.x
Brunette, M. F., Cotes, R. O., de Nesnera, A., McHugo, G., Dzebisashvili, N., Xie, H., & Bartels, S. J. (2018). Use of academic detailing with audit and feedback to improve antipsychotic pharmacotherapy. Psychiatric Services, 69(9), 1021–1028. https://doi.org/10.1176/appi.ps.201700536
Caron, E. B., & Dozier, M. (2019). Effects of fidelity-focused consultation on clinicians’ implementation: An exploratory multiple baseline design. Administration and Policy in Mental Health and Mental Health Services Research, 46, 445–457. https://doi.org/10.1007/s10488-019-00924-3
Chambers, D. A., Glasgow, R., & Stange, K. (2013). The dynamic sustainability framework: Addressing the paradox of sustainment amid ongoing change. Implementation Science, 8(1), 117. https://doi.org/10.1186/1748-5908-8-117
Charlton, C., Rasbash, J., Browne, W., Healy, M., & Cameron, B. (2019). MLwiN (Version 3.03) [Computer software and manual]. Centre for Multilevel Modelling. Retrieved from http://www.bristol.ac.uk/cmm/software/mlwin/
Chorpita, B. F., & Regan, J. (2009). Dissemination of effective mental health treatment procedures: Maximizing the return on a significant investment. Behaviour Research and Therapy, 47, 990–993. https://doi.org/10.1016/j.brat.2009.07.002
Collyer, H., Eisler, I., & Woolgar, M. (2020). Systematic literature review and meta-analysis of the relationship between adherence, competence and outcome in psychotherapy for children and adolescents. European Child & Adolescent Psychiatry, 29(4), 417–431. https://doi.org/10.1007/s00787-018-1265-2
Colquhoun, H. L., Carroll, K., Eva, K. W., Grimshaw, J. G., Ivers, N., Michie, S., & Brehaut, J. C. (2021). Informing the research agenda for optimizing audit and feedback interventions: Results of a prioritization exercise. BMC Medical Research Methodology, 21(1), 20. https://doi.org/10.1186/s12874-020-01195-5
Colquhoun, H., Michie, S., Sales, A., Ivers, N., Grimshaw, J. M., Carroll, K., Chalifoux, M., Eva, K., & Brehaut, J. (2017). Reporting and design elements of audit and feedback interventions: A secondary review. BMJ Quality and Safety, 26, 54–60. https://doi.org/10.1136/bmjqs-2015-005004
Dorsey, S., Kerns, S. E. U., Lucid, L., Pullmann, M. D., Harrison, J. P., Berliner, L., Thompson, K., & Deblinger, E. (2018). Objective coding of content and techniques in workplace-based supervision of an EBT in public mental health. Implementation Science, 13(1), 19. https://doi.org/10.1186/s13012-017-0708-3
Dorsey, S., Pullmann, M. D., Deblinger, E., Berliner, L., Kerns, S. E., Thompson, K., Unützer, J., Weisz, J. R., & Garland, A. F. (2013). Improving practice in community-based settings: A randomized trial of supervision – study protocol. Implementation Science, 8, 89. https://doi.org/10.1186/1748-5908-8-89
Farmer, C. C., Mitchell, K. S., Parker-Guilbert, K., & Galovski, T. E. (2016). Fidelity to the cognitive processing therapy protocol: Evaluation of critical elements. Behavior Therapy, 48(2), 195–206. https://doi.org/10.1016/j.beth.2016.02.009
Fixsen, D. L., Naoom, S. F., Blase, K. A., Friedman, R. M., & Wallace, F. (2005). Implementation research: A synthesis of the literature. University of South Florida, Louis de la Parte Florida Mental Health Institute, The National Implementation Research Network (FMHI Publication #231).
Glasgow, R. E., & Riley, W. T. (2013). Pragmatic measures: What they are and why we need them. American Journal of Preventive Medicine, 45(2), 237–243. https://doi.org/10.1016/j.amepre.2013.03.010
Gleacher, A. A., Olin, S. S., Nadeem, E., Pollock, M., Ringle, V., Bickman, L., Douglas, S., & Hoagwood, K. (2016). Implementing a measurement feedback system in community mental health clinics: A case study of multilevel barriers and facilitators. Administration and Policy in Mental Health and Mental Health Services Research, 43(3), 426–440. https://doi.org/10.1007/s10488-015-0642-0
Henggeler, S. W., Melton, G. B., Brondino, M. J., Scherer, D. G., & Hanley, J. H. (1997). Multisystemic therapy with violent and chronic juvenile offenders and their families: The role of treatment fidelity in successful dissemination. Journal of Consulting and Clinical Psychology, 65, 821–833.
Henggeler, S. W., Pickrel, S. G., & Brondino, M. J. (1999). Multisystemic treatment of substance abusing and dependent delinquents: Outcomes, treatment fidelity, and transportability. Mental Health Services Research, 1, 171–184.
Henggeler, S. W., Schoenwald, S. K., Borduin, C. M., Rowland, M. D., & Cunningham, P. B. (2009). Multisystemic therapy for antisocial behavior in children and adolescents (2nd ed.). The Guilford Press.
Hogue, A., Henderson, C. E., Dauber, S., Barajas, P. C., Fried, A., & Liddle, H. A. (2008). Treatment adherence, competence, and outcome in individual and family therapy for adolescent behavior problems. Journal of Consulting and Clinical Psychology, 76(4), 544–555. https://doi.org/10.1037/0022-006X.76.4.544
Hogue, A., Liddle, H. A., & Rowe, C. (1996). Treatment adherence process research in family therapy: A rationale and some practical guidelines. Psychotherapy: Theory Research, Practice, Training, 33, 332–345. https://doi.org/10.1037/0033-3204.33.2.332
Huey, S. J., Henggeler, S. W., Brondino, M. J., & Pickrel, S. G. (2000). Mechanisms of change in multisystemic therapy: Reducing delinquent behavior through therapist adherence and improved family and peer functioning. Journal of Consulting and Clinical Psychology, 68, 451–467.
Ivers, N., Jamtvedt, G., Flottorp, S., Young, J. M., Odgaard-Jensen, J., French, S. D., O’Brien, M. A., Johansen, M., Grimshaw, J., & Oxman, A. D. (2012). Audit and feedback: Effects on professional practice and healthcare outcomes. Cochrane Database of Systematic Reviews, 6, CD000259. https://doi.org/10.1002/14651858.CD000259.pub3
Ivers, N., Sales, A., Colquhoun, H., Michie, S., Foy, R., Francis, J. J., & Grimshaw, J. M. (2014). No more ‘business as usual’ with audit and feedback interventions: Towards an agenda for a reinvigorated intervention. Implementation Science, 9, 14. https://doi.org/10.1186/1748-5908-9-14
Kamata, A. (2001). Item analysis by the hierarchical generalized linear model. Journal of Educational Measurement, 38, 79–93. https://doi.org/10.1111/j.1745-3984.2001.tb01117.x
Lambert, M. J., & Harmon, K. L. (2018). The merits of implementing routine outcome monitoring in clinical practice. Clinical Psychology: Science and Practice, 25(4), e12268. https://doi.org/10.1111/cpsp.12268
Lambert, M. J., Whipple, J. L., & Kleinstäuber, M. (2018). Collecting and delivering progress feedback: A meta-analysis of routine outcome monitoring. Psychotherapy, 55(4), 520–537. https://doi.org/10.1037/pst0000167
Landis-Lewis, Z., Kononowech, J., Scott, W. J., Hogikyan, R. V., Carpenter, J. G., Periyakoil, V. S., Miller, S. C., Levy, C., Ersek, M., & Sales, A. (2020). Designing clinical practice feedback reports: Three steps illustrated in Veterans Health Affairs long-term care facilities and programs. Implementation Science, 15, 7. https://doi.org/10.1186/s13012-019-0950-y
Landsverk, J. A., Brown, C. H., Chamberlain, P., Palinkas, L. A., Ogihara, M., Czaja, S., Goldhaber-Fiebert, J. D., Vermeer, W., Saldana, L., Rolls Reutz, J. A., & Horwitz, S. M. (2012). Design and analysis in dissemination and implementation research. In R. C. Brownson, G. A. Colditz & E. K. Proctor (Eds.), Dissemination and implementation research in health: Translating research to practice (pp. 225–260). Oxford University Press.
Lewis, C. C., Boyd, M., Puspitasari, A., Navarro, E., Howard, J., Kassab, H., Hoffman, M., Scott, K., Lyon, A., Douglas, S., Simon, G., & Kroenke, K. (2019). Implementing measurement-based care in behavioral health: A review. JAMA Psychiatry, 76(3), 324–335. https://doi.org/10.1001/jamapsychiatry.2018.3329
Linacre, J. M. (1994). Many-facet Rasch measurement. MESA Press.
Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3, 85–106.
Linacre, J. M. (2019a). FACETS Rasch measurement computer program [Computer software and manual]. Retrieved from https://www.winsteps.com/facets.htm
Linacre, J. M. (2019b). WINSTEPS Rasch measurement computer program [Computer software and manual]. Retrieved from https://www.winsteps.com/winsteps.htm
Linacre, J. M., & Wright, B. D. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370.
Lochman, J. E., Boxmeyer, C., Powell, N., Qu, L., Wells, K., & Windle, M. (2009). Dissemination of the Coping Power program: Importance of intensity of counselor training. Journal of Consulting and Clinical Psychology, 77, 397–409. https://doi.org/10.1037/a0014514
Lyon, A. R., Lewis, C. C., Boyd, M. R., Hendrix, E., & Liu, F. (2016). Capabilities and characteristics of digital measurement feedback systems: Results from a comprehensive review. Administration and Policy in Mental Health and Mental Health Services Research, 43(3), 441–466. https://doi.org/10.1007/s10488-016-0719
Maas, C. J. M., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1, 86–92. https://doi.org/10.1027/1614-2241.1.3.86
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using Many-Facet Rasch Measurement: Part I. Journal of Applied Measurement, 4, 386–422.
Newman, C. F., Reiser, R. P., & Milne, D. L. (2016). Supporting our supervisors: A summary and discussion of the special issue on CBT supervision. The Cognitive Behavior Therapist, 9, e29. https://doi.org/10.1017/S1754470X16000106
O’Donohue, W., & Maragakis, A. (Eds.). (2016). Quality improvement in behavioral health. Springer.
Ogden, T., & Hagen, K. A. (2006). Multisystemic therapy of serious behavior problems in youth: Sustainability of therapy effectiveness two years after intake. Child and Adolescent Mental Health, 11(3), 142–149.
Perepletchikova, F., Treat, T. A., & Kazdin, A. E. (2007). Treatment integrity in psychotherapy research: Analysis of the studies and examination of the associated factors. Journal of Consulting and Clinical Psychology, 75, 829–841. https://doi.org/10.1037/0022-006X.75.6.829
Rasch, G. (1980). Probabilistic models for some intelligence and achievement tests. University of Chicago Press.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage Publications.
Raudenbush, S. W., Bryk, A. S., & Congdon, R. (2013). HLM 7: Hierarchical linear & nonlinear modeling (version 7.00) [Computer software & manual]. Scientific Software International. Retrieved from https://ssicentral.com/index.php/products/hlm-general/
Ravand, H. (2015). Item response theory using hierarchical generalized linear models. Practical Assessment, Research, & Evaluation, 20(7), 1–17. https://doi.org/10.7275/s4n1-kn37
Roth, A. D., Pilling, S., & Turner, J. (2010). Therapist training and supervision in clinical trials: Implications for clinical practice. Behavioural and Cognitive Psychotherapy, 38(3), 291–302. https://doi.org/10.1017/S1352465810000068
Saldana, L., Chamberlain, P., & Chapman, J. (2016). A supervisor-targeted implementation approach to promote system change: The R3 Model. Administration and Policy in Mental Health and Mental Health Services Research, 43(6), 879–892. https://doi.org/10.1007/s10488-016-0730-9
Sale, R., Bearman, S. K., Woo, R., & Baker, N. (2021). Introducing a measurement feedback system for youth mental health: Predictors and impact of implementation in a community agency. Administration and Policy in Mental Health and Mental Health Services Research, 48(2), 327–342. https://doi.org/10.1007/s10488-020-01076-5
Schoenwald, S. K. (1998). MST personnel data inventory. Medical University of South Carolina.
Schoenwald, S. K. (2016). The Multisystemic therapy® quality assurance/quality improvement system. In W. O’Donohue & A. Maragakis (Eds.), Quality improvement in behavioral health (pp. 169–192). Springer International Publishing Switzerland.
Schoenwald, S. K., Chapman, J. E., Kelleher, K., Hoagwood, K. E., Landsverk, J., Stevens, J., Glisson, C., Rolls-Reutz, J., The Research Network on Youth Mental Health. (2008). A survey of the infrastructure for children’s mental health services: Implications for the implementation of empirically supported treatments (ESTs). Administration and Policy in Mental Health and Mental Health Services Research, 35, 84–97. https://doi.org/10.1007/s10488-007-0147
Schoenwald, S. K., & Garland, A. F. (2013). A review of treatment adherence measurement methods. Psychological Assessment, 25, 146–156. https://doi.org/10.1037/a0029715
Schoenwald, S. K., Garland, A. F., Chapman, J. E., Frazier, S. L., Sheidow, A. J., & Southam-Gerow, M. A. (2011). Toward the effective and efficient measurement of implementation fidelity. Administration and Policy in Mental Health and Mental Health Services Research, 38, 32–43. https://doi.org/10.1007/s10488-010-0321-0
Schoenwald, S. K., Mehta, T. G., Frazier, S. L., & Shernoff, E. S. (2013). Clinical supervision in effectiveness and implementation research. Clinical Psychology: Science and Practice, 20, 44–59. https://doi.org/10.1111/cpsp.12022
Schoenwald, S. K., Sheidow, A. J., & Chapman, J. E. (2009). Clinical supervision in treatment transport: Effects on adherence and outcomes. Journal of Consulting and Clinical Psychology, 77(3), 410–421. https://doi.org/10.1037/a0013788
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press.
Smith, E. V., Jr. (2002). Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3, 205–231.
Southam-Gerow, M., Chapman, J. E., Martinez, R. G., McLeod, B. D., Hogue, A., Weisz, J. R., & Kendall, P. C. (2021). Are therapist adherence and competence related to clinical outcomes in cognitive-behavioral treatment for youth anxiety? Journal of Consulting and Clinical Psychology, 89(3), 188–199. https://doi.org/10.1037/ccp0000538
Stirman, S. W., Calloway, A., Toder, K., Miller, C. J., DeVito, A. K., Meisel, S. N., Xhezo, R., Evans, A. C., Beck, A. T., & Crits-Christoph, P. (2013). Modifications to cognitive therapy by community mental health providers: Implications for effectiveness and sustainability. Psychiatric Services, 64(10), 1056–1059. https://doi.org/10.1176/appi.ps.201200456
Stirman, S. W., Finley, E. P., Shields, N., Cook, J., Haine-Schlagel, R., Burgess, J. F., Dimeff, L., Koerner, K., Suvak, M., Gutner, C. A., Gagnon, D., Masina, T., Beristianos, M., Mallard, K., Ramirez, V., & Monson, C. (2017). Improving and sustaining delivery of CPT for PTSD in mental health systems: A cluster randomized trial. Implementation Science, 12(1), 32. https://doi.org/10.1186/s13012-017-0544-5
Stirman, S. W., Kimberly, J. R., Calloway, A., Cook, N., Castro, F., & Charns, M. P. (2012). The sustainability of new programs and interventions: A review of the empirical literature and recommendations for future research. Implementation Science, 7(1), 17. https://doi.org/10.1186/1748-5908-7-17
Stirman, S. W., Marques, L., Creed, T. A., Cassidy, A. G., DeRubeis, R., Barnett, P. G., Kuhn, E., Suvak, M., Owen, J., Vogt, D., Jo, B., Schoenwald, S., Johnson, C., Mallard, K., Beristianos, M., & La Bash, H. (2018). Leveraging routine clinical materials and mobile technology to assess CBT fidelity: The Innovative Methods to Assess Psychotherapy Practices (imAPPP) study. Implementation Science, 13(1), 69. https://doi.org/10.1186/s13012-018-0756-3
Stirman, S. W., Shields, N., Deloriea, J., Landy, M. S. H., Belus, J. M., Maslej, M. M., & Monson, C. M. (2013). A randomized controlled dismantling trial of post-workshop consultation strategies to increase effectiveness and fidelity to an evidence-based psychotherapy for posttraumatic stress disorder. Implementation Science, 8, 82. https://doi.org/10.1186/1748-5908-8-82
Stone, M. H. (2003). Substantive scale construction. Journal of Applied Measurement, 4, 282–297.
Timmons-Mitchell, J., Bender, M. B., Kishna, M. A., & Mitchell, C. C. (2006). An independent effectiveness trial of multisystemic therapy with juvenile justice youth. Journal of Clinical Child and Adolescent Psychology, 35, 227–236.
Weisz, J. R., Ng, M. Y., & Bearman, S. K. (2014). Odd couple? Reenvisioning the relation between science and practice in the dissemination-implementation era. Clinical Psychological Science, 2(1), 58–74. https://doi.org/10.1177/2167702613501307
Wilson, M. (2005). Constructing measures: An item response modeling approach. Erlbaum.
Wolfe, E. W., & Smith, E. V., Jr. (2007). Instrument development tools and activities for measure validation using Rasch models. Journal of Applied Measurement, 8, 97–123.
Wright, B. D., & Masters, G. (1982). Rating scale analysis. MESA Press.
Wright, B. D., & Mok, M. (2000). Rasch models overview. Journal of Applied Measurement, 1, 83–106.
Yeaton, W. H., & Sechrest, L. (1981). Critical dimensions in the choice and maintenance of successful treatments: Strength, integrity, and effectiveness. Journal of Consulting and Clinical Psychology, 49, 156–167.
Funding
Funding for the study was provided by Grant R21/R33MH097000 from the National Institute of Mental Health. The authors wish to thank R21 project coordinator Jennifer Smith Powell and R33 project coordinator Erin McKercher for managing all aspects of the data collection efforts. The authors are grateful to the supervisors in the study, whose dedication to service includes participating in research that might improve it, and to the leadership of the provider organizations who supported that participation.
Conflicts of Interest
SKS is a co-founder and part owner of MST® Services, LLC, which has the exclusive agreement through the Medical University of South Carolina for the transfer of MST technology. She also receives royalties from Guilford Press for published volumes on MST. There is a management plan in place to ensure these conflicts do not jeopardize the objectivity of this research. She did not collect or analyze data for the study. AJS is a co-owner of Science to Practice Group, LLC, which provides the training and quality assurance for an adaptation of MST for emerging adults (MST-EA). There is a management plan in place to ensure this conflict does not jeopardize the objectivity of this research. She did not collect or analyze data for the study. PBC is a part owner of Evidence Based Services, Inc., an MST Network Partner Organization. He also receives royalties from Guilford Press for published volumes on MST. There is a management plan in place to ensure these conflicts do not jeopardize the objectivity of this research. He did not collect or analyze data for the study.
Ethics declarations
Ethical Approval
All research procedures were fully consistent with ethical guidelines and approved by the pertinent Institutional Review Boards.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Initial Development and Evaluation of the Supervisor Observational Coding System
The Supervisor Observational Coding System (SOCS) was developed in the first study (R21MH097000). The measurement development process included five steps that were based on the Standards for Educational and Psychological Testing (SEPT; AERA, APA, & NCME, 1999) and associated methods from Item Response Theory (IRT; Stone, 2003; Wilson, 2005; Wolfe & Smith, 2007). The development team for the SOCS included four MST content experts (two MST researchers and two MST Expert trainers), a fidelity measurement consultant, and a measurement development expert. The resulting instrument was pilot tested in the first study and then revised for use in the present study. The psychometric performance of the revised SOCS is detailed in the Results; the five steps of the initial measurement development process are described next.
Step 1: Define the Purpose of the Instrument and Intended Use of the Scores
The purpose of the instrument was to measure the primary outcome for the experimental Supervisor Audit and Feedback (SAF) system that is the focus of this manuscript. Additionally, the instrument and scores were intended for routine monitoring of supervisor fidelity in real-world practice settings. Importantly, the instrument was to be used with audio or video recordings rated by trained observational coders. Without separate revision and evaluation efforts, the instrument was not intended for use with self-reports, retrospective reports, or other non-observational reports from other respondents.
Step 2: Define the Main Requirements for the Instrument
The SOCS was intended to be coded in approximately real time. Most components would be rated for Adherence (i.e., whether the component was delivered), and components that were delivered would also be rated for Competence (i.e., the quality of delivery). Relatedly, some components were “always applicable” and therefore would receive only a rating for Competence. All components would need to be directly observable from audio recordings of group supervision sessions. The sessions would be structured as a series of case discussions (typically prioritized by clinical need) among a team of three to four therapists and one supervisor. Accordingly, ratings of individual case discussions were determined to be preferable to providing one set of ratings for the overall supervision session; however, scoring was not necessarily intended to occur at the level of individual case discussions.
Step 3: Define the Components of MST Supervisor Fidelity
This step involved defining a complete list of components, defining rating scale constructs and category labels for adherence and competence, and developing a coding manual for use by the observational coders. Leveraging existing MST supervision materials, candidate components were identified across three theoretical dimensions: Analytical Process (AP), Use of Principles (P), and Structure and Process (SP). A fourth dimension, Delivery Method (DM), was also defined. To ensure that the identified components would be suitable for supervisors with varying levels of adherence and competence, each was located, in a one-to-three-word description, on a hypothetical continuum of supervisors ranging from novice to expert. This continuum oriented the development team to the concepts of “difficulty” and “ability,” which are essential to IRT-based measurement. Each component was located at the point where a supervisor with the given level of adherence or competence would be expected to deliver the component on a consistent basis. Using the information that resulted from this process, a coding manual was developed. The coding manual included definitions of Adherence and Competence, a log of decision rules and modifications, and definitions of each domain and component. For each component, there was a broad definition, definitions specific to Adherence and Competence, a list of terms used by supervisors when delivering the component, examples, counter-examples, and distinctions from similar components. Across the AP, P, SP, and DM domains, the resulting instrument included 40 components: 13 for AP, 10 for P, 10 for SP, and 7 for DM. For Adherence, each component was rated on a 2-point scale (0 = Not Delivered, 1 = Delivered), and components that were delivered were also rated for Competence on a 3-point scale (1 = Low, 2 = Moderate, 3 = High). Of note, because the SP components were always applicable, all but one were rated only for Competence.
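The Adherence/Competence rating logic described above can be sketched as a small data model. This is an illustrative sketch only: the `ComponentRating` class, its field names, and the example component name are hypothetical and are not part of the SOCS itself; the sketch simply encodes the stated rules (Competence is rated only for delivered components, and always-applicable components are rated for Competence only).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the SOCS rating structure described above.
# Adherence: 0 = Not Delivered, 1 = Delivered (2-point scale).
# Competence: 1 = Low, 2 = Moderate, 3 = High, rated only for
# delivered components; "always applicable" components skip Adherence.

@dataclass
class ComponentRating:
    domain: str                      # "AP", "P", "SP", or "DM"
    component: str                   # component name (illustrative)
    always_applicable: bool = False  # true for most SP components
    adherence: Optional[int] = None  # 0/1, or None if always applicable
    competence: Optional[int] = None # 1-3, or None if not delivered

    def validate(self) -> None:
        if self.always_applicable:
            assert self.adherence is None, "always-applicable: no Adherence rating"
            assert self.competence in (1, 2, 3)
        else:
            assert self.adherence in (0, 1)
            if self.adherence == 1:
                assert self.competence in (1, 2, 3), "delivered: Competence required"
            else:
                assert self.competence is None, "not delivered: no Competence rating"

# Example: a delivered AP component (name hypothetical) rated Moderate.
r = ComponentRating(domain="AP", component="hypothesis testing",
                    adherence=1, competence=2)
r.validate()
```

Encoding the rules this way makes the asymmetry of the two scales explicit: Adherence is a precondition for Competence rather than a parallel rating.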
Step 4: Pilot Test the Coding System
Following procedures approved by the Institutional Review Board of the Medical University of South Carolina (MUSC), 30 MST supervisors, located in more than 20 sites across the US, recorded weekly supervision sessions for a period of 10 consecutive weeks. A digital recorder was provided by the study, and following each session, the recording was uploaded to a secure server at MUSC via a web-based interface. The observational coders, hired and trained for this study, were three master’s-level individuals not involved in MST. The resulting pilot data were analyzed using IRT-based Rasch measurement models, and based on these results, the instrument was revised (see Step 5). There was no evidence of additional dimensionality within the AP, P, SP, or DM domains. The rate of absolute agreement across coders ranged from 78 to 88% for Adherence but was lower for Competence, ranging from 39 to 54%. The three-point ordered categorical rating scale for Competence performed as expected with the exception of the DM domain, where only two categories were well-discriminated. For Adherence ratings in the AP, P, and SP domains, the components were well-targeted to the distribution of supervisors, with the components spanning a range of “difficulty” and assessing the full range of supervisor “ability.” For Competence ratings, the three-point scale provided good coverage of the supervisor distribution, though supervisors at the highest and lowest levels were not well-targeted. Across domains, four components evidenced unpredictable Adherence ratings, and five evidenced unpredictable Competence ratings. In each case, the pattern of misfit was suggestive of ambiguous component definitions and thresholds for endorsement.
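The coder agreement rates reported for the pilot (78–88% for Adherence, 39–54% for Competence) are rates of absolute agreement, i.e., the proportion of rated items on which coders assigned the identical category. A minimal sketch of that computation, assuming each coder's ratings are stored as a parallel list over the same items, is:

```python
from itertools import combinations

def absolute_agreement(ratings_a, ratings_b):
    """Proportion of items on which two coders gave the identical rating."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def pairwise_agreement(coders):
    """Mean absolute agreement across all coder pairs (dict: coder -> ratings)."""
    pairs = list(combinations(coders.values(), 2))
    return sum(absolute_agreement(a, b) for a, b in pairs) / len(pairs)

# Illustrative (fabricated) data: three coders rating five components
# for Adherence (0 = Not Delivered, 1 = Delivered).
coders = {
    "coder1": [1, 0, 1, 1, 0],
    "coder2": [1, 0, 1, 0, 0],
    "coder3": [1, 1, 1, 1, 0],
}
print(round(pairwise_agreement(coders), 2))  # 0.73
```

Because Competence uses three categories rather than two, chance agreement is lower and exact matches are harder to achieve, which is consistent with the lower agreement observed for Competence.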
Step 5: Refine the SOCS for Use in the Second Study
The SOCS was revised based on the psychometric results from the pilot study. The most significant change was that the DM domain was dropped, primarily to reduce coder burden. The revised instrument comprised three domains: AP with 10 components, P with 9 components, and SP with 7 components. The final components are reported in Table A1. On the revised instrument, all AP and P components were rated for both Adherence and Competence. For SP, all components, with two exceptions, were rated for Competence only.
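The Rasch measurement models used in the pilot analyses (Step 4) relate the supervisor "ability" and component "difficulty" concepts introduced in Step 3: both are placed on a common logit scale, and the probability of a component being delivered depends only on their difference. For the dichotomous Adherence ratings, a minimal sketch of that model (parameter values are illustrative, not estimates from the study) is:

```python
import math

def rasch_prob(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(Adherence rating = 1 | ability, difficulty).
    Both parameters lie on the same logit scale; when a supervisor's ability
    equals a component's difficulty, the probability of delivery is 0.5."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A supervisor whose ability matches a component's difficulty delivers it
# half the time; a 1-logit advantage raises that to about 0.73.
print(round(rasch_prob(0.0, 0.0), 2))  # 0.5
print(round(rasch_prob(1.0, 0.0), 2))  # 0.73
```

This dependence on a single difference term is what lets the analyses in Step 4 jointly locate components by "difficulty" and supervisors by "ability" on one continuum, and it is the baseline against which item misfit (unpredictable ratings) was judged.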
Appendix B
Example Feedback Reports
Cite this article
Chapman, J.E., Schoenwald, S.K., Sheidow, A.J. et al. Performance of a Supervisor Observational Coding System and an Audit and Feedback Intervention. Adm Policy Ment Health 49, 670–693 (2022). https://doi.org/10.1007/s10488-022-01191-5