Ten percent of all U.S. youths experience serious emotional disorder (SED), defined as a psychiatric disorder that causes substantial impairment in one or more functional domains (Friedman et al. 1996; Williams et al. 2018). Since the publication of Unclaimed Children (Knitzer and Olson 1982) and A System of Care for Children and Youth with SED (Stroul and Friedman 1986), a primary goal of public-serving systems has been to provide children and adolescents with SED access to effective community-based services that can prevent unnecessary out-of-home placement.

By some metrics, this movement has achieved modest success. For example, a growing number of states organize at least some of their treatment for youth with SED around community-based systems of care and/or “wraparound” frameworks (Bruns et al. 2011; Sather and Bruns 2016). Evaluations of federal demonstration projects (e.g., Urdapilleta et al. 2011), state and county studies (e.g., Kamradt et al. 2008; Rauso et al. 2009; Yoe et al. 2011), and meta-analyses of controlled research (e.g., Suter and Bruns 2009) on these models have found reductions in institutional care and subsequent costs. Meanwhile, at a population level, Case et al. (2007) found a decline in inpatient mental health treatment of children and adolescents in U.S. community hospitals between 1990 and 2000. In the child welfare sector, the number of foster children in group homes and institutions declined steadily until 2012, when rates began to increase again (Child Trends Databank 2018).

Despite evidence of gains, however, a persistent “imbalance” remains in the children’s behavioral health system (Cooper et al. 2008). Ringel and Sturm (2001) found that approximately one-third of the $12 billion spent on child mental health services was for out-of-home care, and there is little evidence that this imbalance has shifted. Soni (2014) found that mental health disorders represented the most costly health condition of childhood, due primarily to the reliance on institutional care for youths with SED. Pires et al. (2013) have documented that 38% of all Medicaid spending on children’s behavioral health is allocated to the 9% of youth with the most serious needs, mostly due to spending on residential and group placements for these children and youth. Meanwhile, child welfare out-of-home care expenses are estimated at $10 billion annually, and over $5 billion is spent in juvenile justice to provide residential placements for committed youth (Scarcella et al. 2004; Justice Policy Institute 2009). Given that the majority of youth in these sectors have mental health diagnoses (dosReis et al. 2005; Vincent et al. 2008), it is reasonable to assume that a large portion of placements in the child welfare and justice sectors are related to the presence of behavioral health needs.

Intensive Home-Based Treatments

To adhere to federal laws (e.g., the Early and Periodic Screening, Diagnostic and Treatment child health component of Medicaid; Olmstead v. L.C. 1999; the Americans with Disabilities Act of 1990), families and communities need access to effective community-based treatment alternatives that reduce the need for more restrictive residential treatment options for youth with SED. Intensive home-based treatment (IHBT) programs are typically considered the most intensive community-based treatment option in a comprehensive continuum of care for youth at risk of placement or returning home from placement.

Youth and families requiring IHBT services have multiple and complex needs that require access to an array of services and supports. IHBT is a comprehensive service that offers a range of individual treatment elements integrated into a single coordinated service. These elements typically include: (1) crisis response, stabilization, and safety planning; (2) psychoeducational skill building with youth; (3) skill building with caretakers (parenting and behavior management); (4) cognitive and emotional coping with a focus on trauma-informed care; (5) family systems therapy; and (6) resilience and support-building interventions (Kaplan and Girard 1994; Stroul 1988). The ultimate goal for IHBT is to reduce the risk of out-of-home placement for the youth, through (1) reduction in the youth’s emotional and behavioral symptomatology; (2) creation of safe, trauma-free environments; (3) improvement in family communication and relationships; (4) reduction in the magnitude and frequency of family conflicts; (5) an increase in youth and family supports, resources, and resilience; (6) reduction in family stressors; and (7) improved family problem solving and management of challenging behaviors.

Research on IHBT

The majority of research studies evaluating the impact of intensive home-based interventions have focused on stabilizing families and reducing out-of-home placements for youth involved in the child welfare and juvenile justice systems (Moffett et al. 2018). For example, Homebuilders (Kinney et al. 1977), a 4–6 week “family preservation” program, has been the subject of many controlled studies and reviews. Over time, states and other child-serving systems adapted the Homebuilders model to serve youth involved in mental health (Lindblad-Goldberg et al. 1998; Stroul 1988) and juvenile justice systems (Henggeler 2011). Multisystemic Therapy (MST; Henggeler 2011; MST Services 2020), for example, utilizes an intensive home-based service delivery model to reduce adolescent substance abuse and offending behaviors, and is one of the most extensively researched evidence-based practices for juvenile justice-involved youth, with 79 studies and 28 randomized clinical trials.

IHBT programs to reduce risk of out-of-home placement due to psychiatric impairment or SED have proliferated greatly in the U.S. over the past three decades, but controlled research on their effectiveness is less developed. IHBT was cited early in the systems of care movement as a means for addressing gaps in community-based children’s mental health services that led to unnecessary institutionalization, and was the focus of a monograph (Stroul 1988) by one of the co-authors of the original Systems of Care monograph. By 1999, 35 states offered some form of IHBT to children and youth. By 2016, IHBT programs were found to serve hundreds of thousands of children in nearly every state, at costs in the tens to hundreds of millions of dollars. For example, in FY2018, Virginia expended $145.6 million in federal (Medicaid and Children’s Health Insurance Program) and state funds on IHBT services (Virginia Department of Medical Assistance Services 2018).

However, while child welfare- and juvenile justice-derived IHBT programs have been subjected to dozens of randomized trials, a recent review found only five controlled studies of IHBT for youth with emotional and behavioral impairment (Moffett et al. 2018). Experimental evaluation of a variant of MST called MST-Psychiatric found initial gains in emotional and behavioral functioning that favored the treatment group, but a follow-up study found no maintenance of these gains at 1 year post-treatment (Henggeler et al. 1999; Schoenwald et al. 2000). A second trial was terminated early due to fidelity control and enrollment problems (Rowland et al. 2005).

Controlled research on other IHBT programs, including an intensive home-based cognitive-behavioral treatment (CBT) program in Ontario (Wilmshurst 2002) and a Home-Based Crisis Intervention for children at risk for psychiatric placement (Evans et al. 1997), found null to very small effects. Finally, Barth et al. (2007) used quasi-experimental methods to evaluate an intensive in-home program adapted from MST (Intercept), in use in 10 U.S. states, compared to comparable youth served via residential treatment. A non-significant trend of better outcomes (out-of-home placement [OHP], school attendance, avoidance of legal involvement) was found for the IHBT arm, indicating IHBT was at least as effective as residential treatment, as well as less restrictive and less expensive (Barth et al. 2007).

Thus, there are a small number of IHBT models that are well-specified and evidence based, but the research base for IHBT programs specifically targeting youths with SED and behaviors relevant to these youth is scant. Other manualized IHBT programs exist for this population (e.g., IICAPS; Woolston et al. 2007), but have not been subjected to rigorous study. Meanwhile, strictly defined and purveyor-controlled evidence-based psychosocial interventions (EBPIs) such as MST are costly and viewed by providers as difficult to implement (Borntrager et al. 2009; Chorpita et al. 2007, 2011). As a result, they have been found to account for a very small percentage of all youth served—as low as one to three percent (Bruns et al. 2016).

For all the above reasons, most IHBT programs operating in state service systems today are “home grown” programs that are less strict with respect to practice parameters, organizational requirements, training and coaching expectations, and fidelity and outcomes monitoring than manualized EBPIs (Hammond and Czyszczon 2014; Moffett et al. 2018). While such flexibility may make such home-grown IHBT programs more likely to be used in “real world” service systems, it may result in lack of clarity around effective practice elements and necessary program factors, resulting in lower quality practice and poorer youth/family outcomes.

Since the early 2000s, there has been a growing literature on the promise of promoting “common elements” of effective treatment (discrete clinical techniques or strategies extrapolated from a larger set of treatment manuals for EBPIs) that may facilitate quality and outcomes while improving flexibility (Chorpita et al. 2005). Research on broad-based applications of common elements to public systems has been encouraging (Daleiden et al. 2006; Chorpita et al. 2017), but has mostly focused on therapists working in outpatient settings, rather than intensive in-home models.

As a relevant example for IHBT, Lee et al. (2014) used methods similar to those of Chorpita et al. (2005) to aggregate and distill information from controlled research studies on interventions to prevent OHP. That study identified practice and program elements found to be “common” among manualized programs determined by research to be effective for preventing OHP for youth. However, results from this and other research have not been effectively translated into a defined set of program or practice parameters for IHBT.

While evidence-based IHBT models, such as MST, have robust and multiple quality assurance mechanisms (e.g., therapist, supervisory, and organizational measures of adherence), they account for a small proportion of all IHBT delivered. Meanwhile, only a few state IHBT programs (e.g., Massachusetts, Ohio) have developed and deployed measures of adherence to their IHBT service delivery standards. A systematic effort to synthesize both “evidence-based practice” (from studies such as that by Lee and colleagues) and “practice-based evidence” (such as from these states’ local efforts) to develop a cohesive, well-vetted set of program and practice standards for IHBT could hold promise for use in broad-based practice improvement efforts as well as research on IHBT.

The Current Study

To fill this gap for the public children’s behavioral health field, the research team sought to develop IHBT program and practice standards through a multi-step iterative process. The process we adopted draws from canonical frameworks for quality assessment in health care such as that of Donabedian (1988), which assumes outcomes are influenced by both structure (e.g., material or human resources, organizational structure) and process (what is actually done in providing care). Study methods also are informed by approaches used by entities such as the Agency for Healthcare Research and Quality (AHRQ) Subcommittee on Quality Measures for Children’s Healthcare, which identifies recommended quality measures based on literature reviews, expert input, and a systematic review process focused on both importance and feasibility (Mangione-Smith et al. 2011).

The current project used a comprehensive review of program models and prior literature reviews (e.g., Lee et al. 2014) and expert informant interviews (including representatives of states with quality frameworks) to develop an initial set of standards that was then subjected to a version of the Delphi process (Linstone and Turoff 1975) with a range of experts and informants. The goal was to develop two sets of quality standards for IHBT that align with the Donabedian (1988) framework:

  1. Program standards: Conditions at the level of organization, program, or service system (e.g., 24/7 on-call support, access to flexible funding) that describe the structure and resources of the program and that promote program-level quality and fidelity and positive youth/family outcomes.

  2. Practice standards: Distinct activities or techniques that are expected of interventionists or practitioners and proposed to promote positive outcomes (e.g., active listening, crisis plan development, parent behavior management skills training).

The current paper provides an overview of the literature review and expert interviews, followed by a full description of the methods and results of the Decision Delphi process used to refine and improve an initial set of IHBT standards over two rounds of expert input and feedback.


Step 1: Literature Review

Step 1 of the project consisted of a literature review of relevant IHBT models using relevant search terms (e.g., “home based,” “in-home,” “community-based,” “serious emotional disturbance,” “child,” “adolescent,” “psychiatric,” “intensive,” “behavioral”) in PsycINFO and Web of Science. Unlike efforts by, for example, Lee et al. (2014) and Moffett et al. (2018), this search did not aim to produce a comprehensive review of controlled research studies, but rather to surface unique IHBT models for youths with SED that included adequate descriptions of practice and program elements that could serve as the basis for language for initial program and practice standards.

The search was conducted in 2017 and sought articles published between 1995 and 2016. Reviews such as Lee et al. (2014) and Moffett et al. (2018) then provided independent summaries of models for review and specific potential program and practice elements. A summary of relevant IHBT models was compiled from this scan and reviewed by a subset of the authors (authors 4, 5, 6, and 7) for comprehensiveness. Table 1 presents a summary of relevant models the team used as the basis for constructing initial practice and program standards to be submitted for review by experts (step 2) and subsequent Delphi process (step 3).

Table 1 Interventions included in review of potential intensive home based treatment program and practice elements

Step 2: Expert Interviews

After completion of the literature review (step 1), we conducted a series of phone interviews with N = 16 IHBT, EBPI, and children’s behavioral health experts. Key informants included children’s leads from state behavioral health authorities (n = 5), researchers involved in developing, studying, and/or disseminating child EBPIs (n = 5), representatives from national and state training and consultation centers (n = 3), representatives of managed care organizations (n = 2), and youth and family advocates (n = 1).

An initial set of potential program and practice quality standards was developed and sent to informants in advance of the scheduled interview. Each respondent was asked to provide input on (1) the proposed definition of the IHBT target population (“Youth with Serious Emotional Disturbance who are at-risk of placement due to a mental health disorder”); (2) the feasibility and appropriateness of using manualized EBPIs to provide IHBT in public systems (and specific EBPIs recommended); (3) the feasibility and appropriateness of using program elements as a guide or quality assurance mechanism for IHBT (and specific program elements recommended); and (4) the feasibility and appropriateness of using common practice elements (and specific practice elements recommended). Informants were also asked to provide input on the most effective and feasible overall approach states and other purchasers might take to assure IHBT quality, as well as any other recommendations. Because results of interviews were used primarily in preparation for step 3 (Delphi process), we provide only a brief summary of findings here.

Regarding the definition of the target population, many informants observed that eligibility only for youth with SED risked being too narrow, given many effective in-home programs were built for and could benefit other populations. However, many of these same informants also agreed that funding rules required specific eligibility criteria, and that to be useful, the current project also needed to be clear on the population of focus.

Although many informants believed manualized EBPIs provide a high standard of care, the majority voiced that basing IHBT services provided in routine practice and public systems on one or more EBPIs was unlikely to be feasible or acceptable to most purchasers and providers. Many informants reported that a quality framework based on common elements was likely to be a more feasible, acceptable, and flexible way of using available evidence. As one informant put it, “train and supervise them to do things that work and measure outcomes.” Another voiced, “having some ‘off-the-shelfs’ is powerful, but [states] also need to also build their own [model] for the rest of the population.” A third said, “[EBPIs] are helpful, but also cost more, and don’t necessarily help everyone.” Other common themes focused on the need for infrastructure to aid IHBT providers, such as training and coaching for IHBT practitioners; outcomes and quality monitoring; state centers of excellence that could conduct these activities; and high-quality, frequent supervision. Although several experts noted the influence of such “outer context” (policy and financing) factors, for the purposes of the current project, none proposed a different organizational structure than focusing on the practice and organization domains.

In sum, experts reported that quality indicators or implementation standards, aligned with research-based program and practice elements, were likely to be an important route to more focused quality improvement by states and other purchasers. As such, the results supported proceeding to step 3. Informants also provided feedback on preliminary indicators of quality at the practice and program levels submitted for their review and nominated additional potential indicators of high-quality and/or research-based IHBT. This information aided in finalizing a first set of items for review and a participant list for the Delphi process.

Step 3: Delphi Process

Experts in the field of IHBT were recruited to participate in a web-based Delphi process (Linstone and Turoff 1975). The Delphi process is based on the principle that decisions from a group of informed individuals will be more accurate than from a small number of people or an unstructured group. Delphi employs a structured method through which a panel of experts answer questionnaires in two or more rounds. After each round, the facilitator reports back results from the previous round with detailed feedback from those who rendered judgments. In subsequent rounds, experts are thus encouraged to revise earlier ratings or answers in light of results from the panel overall. The process is stopped after a predefined number of rounds or when a criterion is reached based on stability of results. The Delphi process as used here adhered closely to hallmarks of a standard Delphi process, in that it relied on an external facilitator or “change agent,” employed a heterogeneous group of participants (who were nonetheless informed on the topic), allowed participants to review anonymous feedback from other participants, and sought to use no fewer than two rounds of feedback to achieve a high level of agreement (Linstone and Turoff 2002).
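The round structure described above can be expressed as a simple loop. The sketch below is illustrative only, not the study's implementation; the `rate` and `revise` callables are hypothetical stand-ins for the panel survey and the facilitator/research-team revision steps.

```python
def run_delphi(standards, rate, revise, max_rounds=2, consensus=0.75):
    """Schematic Delphi loop: collect ratings, accept items at consensus,
    revise and resubmit the rest for another round.

    `rate(standard)` returns the fraction of the panel endorsing the item;
    `revise(standard)` returns a reworded version informed by feedback.
    """
    pending = list(standards)
    approved = []
    for _ in range(max_rounds):
        if not pending:
            break                                   # stop early if nothing remains
        ratings = {s: rate(s) for s in pending}     # anonymous panel ratings
        # Items below the consensus threshold are revised and resubmitted
        pending = [revise(s) for s in pending if ratings[s] < consensus]
        approved += [s for s in ratings if ratings[s] >= consensus]
    return approved, pending
```

In the study, the loop ran for exactly two rounds, with qualitative feedback (not shown here) guiding each revision.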

For the current project, experts again included state children’s mental health directors from the National Association of State Mental Health Program Directors (NASMHPD), developers of evidence-based practice models, major providers of IHBT across the country, parent and youth leaders with perspectives on IHBT, and representatives of national and state children’s mental health technical assistance centers. Initially invited participants were also asked to identify, through a snowball process, further stakeholders with expertise in IHBT who should engage in the Delphi process.

Delphi Process Survey

Respondents were asked to rate each standard’s (1) importance and feasibility with respect to fundamental content, and (2) clarity and appropriateness of proposed wording. For content, each participant indicated whether the standard described was essential, optional, or inadvisable for IHBT programs, and provided their rationale, concerns, or other information related to the standard’s capacity to promote positive outcomes (or, in the case of program standards, effective practice and positive outcomes). For wording, each participant indicated whether, as written, the description of the activity was acceptable, acceptable with minor revisions, or unacceptable, and had the opportunity to propose alternative language for items rated acceptable with minor revisions or unacceptable. A financial incentive of $30 was provided for review of each set of standards in each round.

Criteria for Acceptance

The research team applied a set of criteria for standards to be approved. To meet criteria for “Full Approval,” 75% or more of respondents had to rate the standard’s content as “Essential” or “Optional,” and 75% or more had to rate the wording as “Acceptable.” A standard was considered to have met criteria for “Partial Approval” if it met one of the above criteria, but not both. Standards rated as “Inadvisable” for content by > 50% of respondents were removed from further consideration.
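Applied to a single standard's ratings, these criteria amount to a three-way decision rule, restated in code below. Note one assumption: the text does not specify the disposition of a standard that meets neither criterion but is not rated inadvisable by a majority, so the "Not Approved" branch is hypothetical.

```python
def classify_standard(pct_content_ok, pct_wording_ok, pct_inadvisable):
    """Apply the acceptance criteria to one standard.

    Arguments are fractions of respondents (0.0-1.0):
    - pct_content_ok:  rated content "Essential" or "Optional"
    - pct_wording_ok:  rated wording "Acceptable"
    - pct_inadvisable: rated content "Inadvisable"
    """
    if pct_inadvisable > 0.50:
        return "Removed"            # removed from further consideration
    content_met = pct_content_ok >= 0.75
    wording_met = pct_wording_ok >= 0.75
    if content_met and wording_met:
        return "Full Approval"
    if content_met or wording_met:  # one criterion, but not both
        return "Partial Approval"
    return "Not Approved"           # case not specified in the text (assumption)
```

For example, a standard rated essential/optional by 97% of respondents but with acceptable wording for only 69% (like "Comprehensiveness of intervention" in Round 2) would classify as Partial Approval.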

For standards that did not meet the above criteria for “Full Approval,” qualitative data were coded using standard techniques (e.g., Auerbach and Silverstein 2003) to identify unique pieces of feedback about the content or wording of each standard across all Delphi participants. These codes were then sorted into themes, defined as “broad units of information that consist of several codes aggregated to form a common idea” (Creswell 2013, p. 186). A summary of qualitative analyses of open-ended input (all themes/categories and the N of respondents who contributed a comment coded into that theme) was fed back to Delphi participants after each round of review, as per recommendations for effective use of the Delphi process.
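The feedback summary described here, each theme paired with the N of respondents contributing a coded comment, amounts to counting distinct respondents per theme. A minimal sketch, with the input format assumed (respondent/theme pairs produced by the coding step):

```python
from collections import defaultdict

def summarize_themes(coded_comments):
    """Tally distinct respondents per theme from coded open-ended input.

    `coded_comments` is an iterable of (respondent_id, theme) pairs, i.e.
    comments already assigned to themes during qualitative coding.
    Returns (theme, N) pairs, most-endorsed theme first.
    """
    respondents_by_theme = defaultdict(set)
    for respondent_id, theme in coded_comments:
        # A set ensures each respondent is counted once per theme,
        # even if they made several comments coded to the same theme.
        respondents_by_theme[theme].add(respondent_id)
    return sorted(((t, len(r)) for t, r in respondents_by_theme.items()),
                  key=lambda item: -item[1])
```

A summary in this shape could then be fed back to panelists between rounds alongside the quantitative ratings.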


Because literature review and expert interviews were primarily used to inform the development of items to be reviewed by experts in the Delphi process (step 3), we have briefly reported the results of steps 1 and 2 above in the “Methods” section. Results reported below thus focus only on results of the Delphi process.

Delphi Participants

A total of 150 informants were invited to review and rate the program and practice standards. Twelve invited respondents (8%) opted out due to self-reported lack of content expertise. Of the remainder, 74 invitees returned complete practice standards surveys (54%) in round 1, while 58 (42%) returned complete surveys of program standards. For round 2, we sent revised standards for both program and practice to all 74 participants who provided feedback on at least one set of standards in round 1. Thirty-eight participants (51%) returned complete surveys in round 2 for both program and practice standards. To assess possible differential attrition, independent samples t-tests with Bonferroni-corrected significance thresholds (p < .001) compared round 2 completers and non-completers on their round 1 practice standard ratings and program standard ratings. No significant differences were found between respondents who provided one versus two rounds of ratings.
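The attrition check is a two-sample mean comparison. The sketch below computes Welch's t statistic with a large-sample normal approximation for the p-value; this is an illustration only, as the study's actual analysis presumably used exact t distributions and applied the Bonferroni correction across the family of tests.

```python
from statistics import NormalDist, mean, variance

def welch_t(sample_a, sample_b):
    """Welch's independent-samples t statistic, with a two-sided p-value
    from a normal approximation (adequate only for large samples).

    Here sample_a/sample_b would be round 1 ratings of round 2
    completers versus non-completers.
    """
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)   # sample variances
    se = (va / na + vb / nb) ** 0.5                   # standard error of mean difference
    t = (mean(sample_a) - mean(sample_b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))            # normal approximation to t
    return t, p
```

With a Bonferroni correction, each such p-value would be compared against the family-wise alpha divided by the number of tests performed.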

Characteristics of Respondents

Table 2 presents a full summary of Delphi respondents. As shown, the most commonly reported roles were state service system administrator (50%) and consultant/technical assistance provider (22%). The mean number of years of experience in children’s behavioral health was 23. The majority of respondents were white (86%) and reported having a Master’s level of education or higher (95%), primarily in social work, clinical psychology, and counseling. As with round 1 ratings, we found no significant differences between reviewers who participated in one versus both rounds of feedback by role, years of experience, or any other characteristic measured.

Table 2 Characteristics of Delphi participants

Delphi Results

Program Standards, Round 1

Figures 1 and 2 present the sequence of review and revision steps and ultimate decisions for program standards across Rounds 1 and 2, respectively. As shown, 30 program standards were submitted to respondents for review in Round 1. After Round 1 of the Delphi process, 21 standards met criteria for full approval (the ≥ 75% threshold met for both content and wording), and 9 standards met criteria for partial approval (one but not both of these criteria met). The research team revised the 9 standards that received partial approval based on qualitative feedback from Delphi respondents, as well as 5 standards that the research team felt could be improved based on feedback, despite meeting criteria for approval. For a summary of all program standards reviewed, the percent of Delphi participants who approved both content and wording, and the disposition of each standard across rounds 1 and 2, see Table 3.

Fig. 1
figure 1

Delphi process flow chart for program standards (round 1)

Fig. 2
figure 2

Delphi process flow chart for program standards (round 2)

Table 3 Percent of proposed program standards approved for content and language, Delphi rounds 1 and 2

Example of Revision

An example of the feedback and editing process is provided by Program standard 25, which received partial approval. Although the majority of respondents agreed the standard should be included, approximately half objected to the original language:

Review of care plans: Each youth/family’s initial plan of care is reviewed by an expert in the IHBT practice model (ideally external to the supervisor or coach). Updated plans of care should be reviewed no less than bi-monthly.

Thematic coding of the qualitative responses revealed two major concerns: (1) requiring an external reviewer would place an undue burden on programs, since they may not have access to or funds available for such an effort, and a supervisor should therefore be adequate to fill this role; and (2) specifying a bi-monthly review was too prescriptive. This feedback was taken into account and revisions were made, resulting in the following revised standard:

Review of care plans: Each youth and caregiver’s initial plan of care is reviewed by an expert (e.g., a supervisor) in the IHBT practice model. Updated plans of care should also be regularly reviewed.

Based on input, the research team also created 4 new standards during the review phase of Round 1 (e.g., based on input about missing concepts that needed to be added, or multi-barreled standards that reviewers suggested should be the basis for two unique standards). Thus, after Round 1, 16 standards were formally accepted, and 18 standards proceeded to a second round of review and comment.

Program Standards, Round 2

After review by respondents in Round 2, 14 remaining standards met criteria for full approval, and 4 received partial approval. One of the partially approved standards, “IHBT practitioner serves a lead role,” was removed, due to multiple objections from reviewers that in complex public service systems with many other services and initiatives, system processes may dictate that another entity serve in a lead role (e.g., a care coordination unit). Another partially approved standard, “Qualified personnel,” was combined with an approved standard, “Initial apprenticeship,” to reduce redundancy.

The remaining 2 partially approved standards narrowly missed the cut-off for outright approval. Specifically, “Comprehensiveness of intervention” and “Post transition services” both met criteria for content (97.2% and 89.2%, respectively), but fell short of the 75% threshold for language (69.4% and 63.9%). The research team made minor revisions to the language of these two standards, bringing the total to 16 approved standards in Round 2. Thus, after two rounds, consensus among experts was established, resulting in 32 final program standards.

Practice Standards, Round 1

Figures 3 and 4 present the sequence of review and revision steps and ultimate decisions for practice standards across Rounds 1 and 2, respectively. As shown, 49 practice standards were submitted to respondents for review in Round 1. After Round 1 of the Delphi process, 31 standards met criteria for full approval (the ≥ 75% threshold met for both content and wording), and 18 standards met criteria for partial approval. Of the 18 standards receiving partial approval, 13 standards were revised, 4 standards were combined (becoming 2 standards), and 1 standard was removed. Of the 31 fully approved standards, reviewer feedback led to revision of one standard and removal of one standard. Two standards, “Identifies additional supports needed” and “Builds family resources and supports,” were combined into one approved standard. Thus, after Round 1, 28 standards were accepted, and 16 standards proceeded to a second round of review. For a summary of all practice standards reviewed, the percent of Delphi participants who approved both content and wording, and the disposition of each standard across rounds 1 and 2, see Table 4.

Fig. 3
figure 3

Delphi process flow chart for practice standards (round 1)

Fig. 4
figure 4

Delphi process flow chart for practice standards (round 2)

Table 4 Percent of proposed practice standards approved for content and language, Delphi rounds 1 and 2

Practice Standards, Round 2

After review by respondents in Round 2, 14 remaining standards met criteria for full approval, and 2 received partial approval. One partially approved standard, “Conducts a functional analysis,” was combined with “Creates functional understanding of behavior,” which was approved in Round 1. Similarly, “Develops individualized indicators of progress” was combined with “Conducts standardized assessment,” also approved in Round 1. Additionally, one approved standard, “Develops accommodations,” was combined with a standard approved in Round 1, “Arranges for supports from systems.” Thus, 13 new standards were accepted and 3 standards were combined with previously approved standards. After two rounds of the Delphi process, consensus among experts was established, resulting in 41 final practice standards.


Public behavioral health systems in the U.S. expend hundreds of millions of dollars on IHBT with the aim of improving outcomes for children and youth with SED and their families and reducing reliance on institutional care. As implemented in the real world, however, the quality of these services is variable. Although an array of intensive home-based EBPIs exist, their effectiveness for youths with SED is not well-established, and those that may be relevant are rarely used in public systems. Alternative approaches are needed to assure that IHBT implementation is guided by systems of continuous quality improvement (CQI). At a minimum, such systems should operationalize and continuously measure three things: program structures, practices provided, and client outcomes (Donabedian 1988).

Overview of Findings

An initial set of IHBT quality standards was positively viewed by a diverse group of experts and stakeholders, with 70% of program standards and 63% of practice standards meeting criteria for acceptability after only one round of rating and input. After revision and a second round of input, 88% of proposed program standards and 95% of practice standards reached criteria. After a last round of minor revisions, a final set of 32 program and 41 practice standards was disseminated to what had evolved into a national learning community of IHBT experts and stakeholders (see Appendix Tables 5 and 6).

The achievement of relatively high ratings of importance, acceptability, and feasibility by experts over just two rounds of feedback was facilitated by a number of factors, including prior research on common elements of effective community-based interventions for youth with SED (e.g., Lee et al. 2014), nomination by like-minded peers, and alignment of the standards with common wisdom among experts and stakeholders. Prior development of quality standards for other community children’s mental health models that had evolved out of “practice-based evidence,” such as Treatment Foster Care (Farmer et al. 2002), Wraparound (Walker and Bruns 2006), and parent peer support (Olin et al. 2015) also provided a precedent for the development of the IHBT standards that increased the credibility of the process and its results. Finally, the increasing awareness of the importance of quality metrics in children’s behavioral healthcare, and the presence of well-accepted methods for developing them (e.g., Mangione-Smith et al. 2011), aided the process.

Themes from Expert Consensus Process

For many standards, stakeholders from state behavioral health authorities and provider agencies voiced that quality indicators should be more feasible than initially presented. For example, many EBPIs reviewed in step 1 of the project mandate review of treatment plans by an external expert or coach, a step viewed as infeasible by many state representatives and providers. After editing to accommodate feedback from stakeholders, the final, revised program standard (No. 25: “Review of care plans,” see Appendix Table 5) states that “Each youth and caregiver’s initial plan of care is reviewed by an expert in the IHBT practice model,” without specifying that the expert must be external to the provider organization. This is but one of many examples where the core team balanced feedback from experts who advocated for a “gold standard,” often based in expectations for established EBPIs, against the need for feasibility. By feeding back both quantitative and qualitative input, the Delphi process allowed participants to consider their own ratings and perspectives against those of their peers. The high level of endorsement of revised standards in Round 2 suggests that the process facilitated achievement of an appropriate balance.

A second theme in the development, review, vetting, and revision of the standards concerned the overlap of functions, responsibilities, and services in public systems of care. For example, many intensive in-home EBPIs, such as MST, require the therapist to manage all aspects of the youth and family’s treatment. However, in many systems and/or for certain populations, this function may be performed by other providers, such as a Wraparound care coordinator. Similar questions arise around the IHBT program’s ability to be the first crisis response option for a family, such as when a mobile response and stabilization service (MRSS; Manley et al. 2018) exists in the system of care. When designing local continuums of care, it is important to clearly distinguish the discrete as well as overlapping roles and functions of IHBT in relation to programming such as Wraparound and MRSS, including decision protocols for determining which service has primary responsibility for overlapping functions.

Finally, application of IHBT practice standards must account for the wide range of experience and training represented in the IHBT workforce. Despite the complexity of the work, many IHBT practitioners are recent graduates of master’s or even bachelor’s programs with little experience, while others may be seasoned veterans certified in one or more specific EBPIs.

In developing and revising standards based on feedback, the research team adopted a mindset that program standards should be consistent expectations of states and other purchasers of service, written into contracts and memoranda of agreement and designed to enhance the consistency of implementation of IHBT model components. Particularly because many IHBT practitioners are new to the field, intensive supervision and ongoing training are among these expectations of IHBT programs. Practice standards, however, might better be viewed as competency targets, designed to highlight key practitioner skills and competencies that are necessary for quality delivery of IHBT. As such they become a focus for the development and enhancement of clinical competencies through ongoing training, intensive supervision, and clinical consultation. One would expect that more experienced, well-supported IHBT practitioners will meet the majority of these standards, while novice practitioners will require more time to do so.

Application of the IHBT Standards

It is our hope that the IHBT standards developed through this project can inform future efforts to improve the quality of IHBT programs and outcomes experienced by youth and families. Encouragingly, immediately after completion of this study, a national IHBT learning community of states, managed care entities, and large behavioral health provider organizations was launched that allows participants to gain insights from peers about how the standards could promote quality assurance and inform contracting, financing strategies, and investments in workforce development.

In the future, we envision translating the IHBT standards into measurement strategies that are employed in more rigorous approaches, such as quality collaboratives (Øvretveit et al. 2002) that use data on relevant metrics to identify change targets, inform improvement plans, and evaluate success or progress. Quality improvement efforts using IHBT standards could also include value-based payment (VBP) arrangements that provide additional reimbursement to providers for meeting quality and outcome benchmarks (Institute of Medicine 2007), thus serving the dual purpose of enhancing the quality of IHBT service delivery and providing additional resources to offset the added costs of delivering IHBT with fidelity. Such VBP efforts could address feasibility concerns about the standards (discussed above) by providing the resources providers need to meet the standards, rather than further adjusting research-based practices due to feasibility concerns.

Finally, standards such as those developed for IHBT can link to broader federal quality initiatives. In the child welfare sector, these standards can aid in the development and evaluation of outcomes (e.g., preventing out-of-home placement or reunifying children after placement) for programs included in state plans under the Family First Prevention Services Act (FFPSA; Lindell et al. 2020). In healthcare, among many provisions to strengthen care quality, the Children’s Health Insurance Program Reauthorization Act (CHIPRA) requires states to report on a core set of health care quality measures by 2024. Although standards such as these for IHBT extend far beyond the CHIPRA Core Set (which is still under development), federal quality initiatives may increase willingness and capacity to pursue other metric-based quality improvement efforts. Moreover, availability of IHBT—and adherence to research-based standards—holds promise for improving states’ status on specific measures in the Core Set, such as follow-up after hospitalization for mental illness or appropriate medication prescribing and monitoring.

Limitations and Need for Future Research

A typical limitation of efforts to define quality metrics or standards is that they are not fully based on rigorous research into which factors are associated with client outcomes. To bridge this gap in knowledge, and to accommodate the need to assure feasibility and acceptability, expert and stakeholder opinion is used as a key input to the process (Mangione-Smith et al. 2011). In the case of the current project, the literature on effective IHBT programs for youth with SED is particularly scant, limiting the degree to which a comprehensive, formal meta-analysis of interventions for the target population could even be conducted.

Although this represents a limitation of the current project, efforts are now under way to develop quality measures based on these indicators that can then be used in future research. For example, the relationship between standards adherence and outcomes should be evaluated in order to establish the validity of these measures and expand the research base on IHBT. This has been done previously for other models in the children’s behavioral health field that were based on “practice-based evidence.” For example, Farmer et al. (2002) developed quality measures for Treatment Foster Care based on newly established standards and found that scores from these measures were associated with youth outcomes. For other intensive models, including Wraparound (Bruns et al. 2008) and Assertive Community Treatment (Salyers et al. 2003), a combination of norm-referencing (i.e., review of score distributions across many programs) and criterion-referencing (i.e., association with outcomes) has been used to establish quality benchmarks for fidelity measures. Such research will be needed to inform the field about which standards are essential, appropriate uses of the standards, and the types of decisions that should—and should not—be made on the basis of the scores.

A second set of limitations relates to the specific methods chosen for the current study. For example, we chose to define program and practice domains based on past research and expert interviews. The results may have differed if the domains and initial standards had been defined by experts themselves via methods such as Concept Mapping (Trochim 1989). Additionally, we did not evaluate and adjust for distinct response patterns across participants (or types of participants), which could have been assessed using measurement models such as those based on item response theory (IRT; Hambleton et al. 1991). These approaches could be used in the future to validate or supplement the IHBT standards.

A third set of limitations relates to the characteristics and attrition of Delphi process participants. First, although nearly half of the experts engaged in initial interviews were EBPI developers and purveyors, the majority of Delphi participants were representatives of public service systems. This was the result of inviting all state children’s behavioral health leads to participate, in an effort to achieve external validity. However, this imbalance may have influenced results. There was also attrition from Round 1 to Round 2 of the Delphi process, with a loss of 34% of raters for the program standards and 47% for the practice standards. Although no significant differences were found between experts completing one survey versus both surveys (on either respondent characteristics or Round 1 scores), it is possible that some unmeasured factor differentiated these groups, yielding biased results in Round 2. It is also not clear why respondents were less likely to complete the second round of practice standards review, although the greater number of practice standards may have been a factor.

Finally, although consensus was not perfect after the second round of input, we chose not to conduct a third round of the Delphi process. This decision was based on the near-complete consensus achieved after Round 2 (only 4 of 34 program standards and 2 of 44 practice standards did not reach criteria). Nonetheless, it must be acknowledged that experts and stakeholders did not formally review the small number of final revisions.


Conclusion
In conducting this study, we were reminded repeatedly of the many commonly cited limitations of public systems, including funding and workforce constraints that may limit achievement of these standards. We were also reminded of how rapidly and dramatically the context of behavioral healthcare provision can change. Since completion of this study, the COVID-19 (SARS-CoV-2) pandemic has forced modifications to IHBT practice and illuminated disparities in healthcare, and a national racial justice movement has brought further attention to the need to address disparities in healthcare access and outcomes. Such events point to potential shortcomings in the IHBT standards related to structural racism and social determinants of health.

For example, in addition to standards related to identifying youth and family strengths, needs, and resources, a standard could be added to the “comprehensive contextual assessment” section of the practice standards that explicitly calls for collecting information on social determinants of health (e.g., safe housing, transportation, and neighborhood safety; racism, discrimination, and violence; education, job opportunities, and income). Similarly, program standards could call out the need for training and supervision on using such an ecological focus and incorporating anti-racist principles into the provision of services. In the Accountability section, a standard could be added related to collecting and disaggregating data on families served and providers, including race, ethnicity, and language.

Such events only reinforce the need to continually refine the standards based on research as well as feedback from diverse members of the IHBT community. As the standards are applied to CQI and workforce efforts in large-scale IHBT programs and initiatives nationally, feedback will continue to be compiled. Where appropriate, we expect that further revisions will be made to the standards based on data and lessons learned in real-world systems and programs. In the short term, our hope is that the availability of standards based on evidence of effectiveness—and shaped by experts from real-world systems—will help improve existing programs, establish new ones, and secure adequate funding and other resources to implement the highest-quality IHBT programs possible.