Children with autism spectrum disorder (ASD) exhibit difficulty learning language, with wide variation in the nature and degree of these difficulties (Kjelgaard & Tager-Flusberg, 2001; Lord et al., 2004; Tager-Flusberg et al., 2005; Thurm et al., 2007). Communication difficulties are often among the first developmental concerns that caregivers of children later diagnosed with ASD express (De Giacomo & Fombonne, 1998; Howlin & Moore, 1997; Kozlowski et al., 2011). Such concerns are consistent with the observed areas of need in the prelinguistic skills of children with ASD (e.g., joint attention, canonical babbling; Mundy et al., 1986; Patten et al., 2014; Sigman & Ruskin, 1999). Approximately 30% of children with ASD present with minimal verbal skills, using only a few words even after years of intervention (Anderson et al., 2007; Tager-Flusberg & Kasari, 2013). Other individuals with ASD achieve fluent speech with large vocabularies and complete sentences (Kjelgaard & Tager-Flusberg, 2001; Tager-Flusberg & Joseph, 2003). Pragmatic language, which encompasses the social aspects of language, has been identified as a particular area of need for children with ASD (Lord & Paul, 1997; Wilkinson, 1998). These language difficulties may have long-term negative consequences for social and vocational outcomes, including a decreased likelihood of living independently and low employment status (Billstedt et al., 2005; Howlin, 2000). Thus, determining how best to mitigate such difficulties is critical for improving the long-term outcomes of individuals with ASD.

Interventions for children with ASD vary across multiple facets, including theoretical basis, type of interventionist (e.g., clinicians, caregivers, peers, or their combination), degree to which interventions are child-led versus adult-led (i.e., directedness), and how the communication partner responds to communicative attempts. Investigations of the effectiveness of communication and language interventions have yielded widely varying results (e.g., Hampton & Kaiser, 2016; Reichow et al., 2018; Sandbank et al., 2020b). Because intervention studies vary in many factors (e.g., participant characteristics, outcome measures, and intervention features), systematic synthesis across studies is needed to draw conclusions. One synthesis approach is to evaluate interventions with specific components to identify the active ingredients of effective interventions. This approach may enable interventionists to focus on essential strategies, which is especially important when caregivers serve as the interventionist because teaching too many strategies or tasks risks overwhelming caregivers and reducing the training’s effectiveness. This systematic review and meta-analysis synthesizes and evaluates studies of interventions that use responsivity intervention strategies to target prelinguistic and language skills in children with ASD.

Responsivity Intervention Strategies

We define responsivity intervention strategies as strategies designed to support the development of turn-taking conversations by setting up the environment to increase communication, following the child’s lead, using natural reinforcement for communicative attempts, and providing targeted input. The adult adapts their responses to the child’s focus of attention and/or ongoing actions. Responsive strategies include, but are not limited to, linguistic mapping, follow-in comments, recasting, and imitating the child. Linguistic mapping occurs when an adult describes the child’s action and/or underlying message or intention (Yoder & Warren, 2002). For example, the adult says, “That’s a book,” when the child points to a book and says, “Uh.” Follow-in comments describe the child’s current focus of attention (McDuffie & Yoder, 2010). For example, the adult says, “That car is going fast,” when the child is playing with a toy car. When an adult recasts what a child says, they add grammatical or phonemic information to the child’s utterance. For example, the adult says, “That dog is big!” when the child comments, “Dog big.” When imitating a child, the adult may imitate the child’s words, sounds, gestures, and/or actions on objects. These responsivity intervention strategies can, and often do, target prelinguistic skills (e.g., joint attention and vocalizations) that are foundational to language use and conversational turn-taking.

Responsivity intervention strategies may be used independently, but they are often used within an intervention package, such as a naturalistic developmental behavioral intervention (NDBI). NDBIs combine developmental principles and applied behavior analysis (ABA) principles, follow the child’s lead, and include multiple intervention strategies to support learning and engagement (Schreibman et al., 2015). Examples include the Early Start Denver Model (ESDM), Joint Attention Symbolic Play Engagement and Regulation (JASPER), Pivotal Response Treatment (PRT), reciprocal imitation training (RIT), and Responsivity Education / Prelinguistic Milieu Teaching (RE/PMT). Responsivity intervention strategies contrast with adult-driven interventions that emphasize discrete training of specific behaviors using structured prompting procedures (e.g., discrete trial training).

Responsivity intervention strategies align with multiple theories that emphasize the bidirectional interactions between children and adults in facilitating vocal and language development, including the social feedback theory (Goldstein & Schwade, 2008; Goldstein et al., 2003), social feedback loop theory (Warlaumont et al., 2014), and transactional theory of spoken language development (Camarata & Yoder, 2002; McLean & Snyder-McLean, 1978; Sameroff & Chandler, 1975). Although the details of these theories vary modestly, they all support the use of contingent caregiver responses to children’s communicative attempts to facilitate continued growth in communication and language. Thus, these theories provide support for use of responsivity intervention strategies during language intervention for children with ASD.

The social feedback theory asserts that children produce more complex and more adult-like vocalizations when adults respond contingently to them within social interactions (e.g., smiling at, moving closer to, and/or touching the infant when they vocalize) than when they respond noncontingently (Goldstein et al., 2003). The contingent nature of the response is emphasized rather than a more general response style or the quantity of input. Intervention procedures that support adults consistently responding to child vocalizations, but not responding when the child is not producing vocalizations, would align with the social feedback theory.

The social feedback loop theory emphasizes that adults are more likely to respond to children’s speechlike utterances than to non-speechlike utterances and that children are more likely to produce speechlike utterances when their communication partner responds to their immediately preceding utterance (Warlaumont et al., 2014). The social feedback loop theory aligns with intervention approaches that increase adults’ responses to children’s utterances as well as the number of child vocalizations.

The transactional theory of spoken language development posits that caregivers provide increasingly complex input to the child as the child produces more complex communication and language acts. The relatively more complex input scaffolds continued child growth that evokes even more complex input (Camarata & Yoder, 2002). Thus, this theory supports intervention strategies that encourage adults to provide input that is contingent on and somewhat more complex than the child’s utterances.

Relevant Prior Reviews

No known prior reviews specifically address the effects of responsivity intervention strategies on the prelinguistic and language skills of children with ASD using randomized controlled trials (RCTs) and single case research design (SCRD) studies, which can address this causal question. One known systematic review and meta-analysis evaluated the effectiveness of intervention studies that addressed parent verbal responsiveness and child communication for children with or at risk for ASD (Edmunds et al., 2019). Because the meta-analysis included only five RCTs, the results must be interpreted with caution. The findings identified improvement in parent verbal responsiveness but not child communication. Some of the included studies reported benefits for child communication, but others did not. The limited number of studies available precluded more detailed analysis to explain the variation in results. Other reviews that focused on specific types of intervention (e.g., early intensive behavioral interventions [Reichow et al., 2018], parent-mediated early interventions [Oono et al., 2013], ESDM [Ryberg, 2015]) have been limited by the number and quality of relevant studies available for inclusion. These example meta-analyses included at most eight studies, with at most two being RCTs.

Taking a different approach, a few other prior reviews have examined the effects of broad language intervention for young children with ASD, regardless of intervention type. These reviews were restricted to group design studies and often included quasi-experimental studies in addition to randomized controlled trials (Hampton & Kaiser, 2016; Sandbank et al., 2020a, 2020b). Sandbank and colleagues reported a positive but small, statistically significant mean effect size for the effects of nonpharmacological early intervention on multiple areas of development, including language, for group design studies (Sandbank et al., 2020a, 2020b). Similarly, Hampton and Kaiser (2016) reported a small, significant mean overall effect size (g = 0.26, 95% CI [0.11, 0.42]) for spoken language outcomes. Some reviews also included children at risk for ASD, rather than only children diagnosed with ASD (Edmunds et al., 2019).

Factors that May Influence the Presence and Strength of Intervention Effects

Some of the reviews described above have investigated several factors that may influence the presence and strength of intervention effects. The results have often been mixed, which supports the need for continued investigation to reach a consensus. These variables include the interventionist, time in intervention, proximity of outcome measures, boundedness of outcome measures, risk of correlated measurement error, and publication bias.

Interventionist

ASD interventions may be implemented by a variety of individuals including caregivers, clinicians, and/or peers. Some interventions are implemented by multiple individuals, such as a caregiver and a clinician simultaneously with varying levels of caregiver training provided (e.g., Gengoux et al., 2019; Roberts et al., 2011; Vivanti et al., 2014). Logically, a child may benefit from both the caregiver spending relatively more time with the child during the day to implement therapeutic strategies and the clinician’s expertise implementing and adapting strategies. Both Sandbank et al. (2020b) and Hampton and Kaiser (2016) reported stronger effects for intervention implemented by caregivers and clinicians than those implemented by caregivers alone. Sandbank et al. (2020b) also identified a larger effect size for interventions implemented by clinicians alone than those by caregivers alone, but Hampton and Kaiser (2016) did not find similar differences. Fuller and Kaiser (2020) did not identify a differential effect by interventionist. These three meta-analyses included responsive language interventions, but not exclusively.

Time in Intervention

School-based speech-language pathologists report providing more intensive intervention services for children with severe communication needs (Brandel & Frome Loeb, 2011). Yet, despite its intuitive appeal, there are relatively few relevant data regarding whether more intensive intervention yields greater language gains for children with ASD (Baker, 2012; Warren et al., 2007). Several meta-analyses have failed to identify total intervention dosage as a moderator of effect size for speech-language outcomes in children with ASD (Fuller & Kaiser, 2020; Hampton & Kaiser, 2016; Sandbank et al., 2020b). The current synthesis provides an opportunity to test whether a greater amount of time in intervention improves prelinguistic and language outcomes for interventions that use responsivity intervention strategies. As described by Warren et al. (2007), intervention intensity can be quantified in multiple ways. Because we anticipated limited reporting of the details needed to calculate cumulative intervention intensity, we selected time in intervention (minutes per week multiplied by the number of weeks of intervention) as the intensity variable, as illustrated below.
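For clarity, this intensity variable reduces to a single product; in a hypothetical program delivering 60 minutes of intervention per week for 12 weeks, time in intervention would be 720 minutes:

```latex
\text{Time in intervention} \;=\; \frac{\text{minutes of intervention}}{\text{week}} \times \text{weeks of intervention},
\qquad \text{e.g., } 60 \times 12 = 720 \text{ minutes}
```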

Proximity of Outcome Measure

Proximal outcome measures assess skills taught directly during the intervention. Distal outcome measures assess skills beyond what was taught directly. As predicted, Yoder et al. (2013) found significantly greater probability of an effect on social communication for proximal outcome measures (63%) than distal outcome measures (39%) for children with ASD.

Boundedness of Outcome Measure

Boundedness of outcome measures refers to the degree to which the occurrence of the outcome behavior depends on the intervention context (e.g., same setting, materials, and/or communication partner; Yoder et al., 2013). Context-bound outcome measures are measured in situations very similar to the treatment sessions (e.g., evaluating the number of intentional communication acts during treatment sessions with the interventionist). In contrast, generalized characteristics are measured in situations that vary from the treatment context in setting, materials, and/or communication partner (e.g., number of intentional communication acts with an unfamiliar clinician during a session in which the intervention strategies are not used). Potentially context-bound outcome measures may show changes that are possibly limited to the treatment context (e.g., standardized caregiver report measure for a caregiver-implemented intervention). Yoder et al. (2013) found greater probability of a significant effect on social communication for context-bound outcome measures (82%) than generalized characteristics (33%). Boundedness also moderated the mean effect size for the effectiveness of early intervention on social communication skills of children with ASD (Fuller & Kaiser, 2020).

Risk for Correlated Measurement Error

Correlated measurement error (CME) systematically inflates the observed score for the predicted superior group or phase relative to the control group or phase (Yoder et al., 2018). Intervention studies are at risk for CME (a) when the outcome measure coder is not blind to treatment assignment and (b) when interventionists (including caregivers) both provide the intervention and serve as the examiner when the outcome measure is assessed.

Publication Bias

Publication bias occurs “when published research on a topic is systematically unrepresentative of the population of completed studies on that topic” (Rothstein, 2008, p. 61). We test for this known risk for meta-analyses by comparing effect sizes of published versus unpublished studies. This examination is a feature of well-designed meta-analyses.

The Current Literature Synthesis

The purpose of this systematic review and meta-analysis is to describe the current state of the literature on responsivity intervention strategies aimed at improving the prelinguistic and language skills of children with ASD, with the eventual aim of shaping the direction of future research and clinical practice. Most prior reviews are systematic but do not employ meta-analytic techniques (Mancil et al., 2009; McConachie & Diggle, 2007; Verschuur et al., 2014). Our review uses meta-analytic techniques to determine the mean effect size not only for group design studies, specifically RCTs, but also for SCRD studies. Including SCRD studies is important because many studies of responsive interventions have used single case designs. SCRD studies avoid the large samples that RCTs require for causal conclusions because each participant serves as their own control and specific design features control threats to internal validity (Ledford & Gast, 2018). We restricted the research synthesis to RCTs and SCRD studies because those designs permit causal conclusions, unlike quasi-experimental or other non-randomized group designs. This design requirement, combined with the quality analysis, enabled this research synthesis to focus on studies with relatively higher quality of evidence. We conducted two separate analyses: one for RCTs and a second for SCRD studies. We then descriptively discuss the results of the two analyses. The review is registered with PROSPERO (CRD42020157374).

Research Questions

To provide a comprehensive review of the literature, we included RCTs and SCRD studies that met quality criteria. We addressed two primary research questions, separately for the RCT and SCRD studies: (1) Is the mean effect size for interventions that use responsivity intervention strategies on communication and/or language skills in children with ASD greater than zero? (2) Does the mean effect size vary by interventionist, time in intervention, proximity or boundedness of the outcome measure, risk for CME, or publication status? We also assessed study quality descriptively using the Revised Cochrane risk-of-bias tool for randomized trials (RoB 2; Higgins et al., 2019) and the What Works Clearinghouse standards for SCRDs (What Works Clearinghouse, 2016). Both tools address potential bias from multiple sources including, but not limited to, study design, completeness of the data, and data analysis.

Methods

Search Strategy

Our comprehensive search strategy included multiple search methods. The main search utilized electronic databases. We searched PubMed on October 18, 2019 and the Education Database, ERIC, Health & Medical Collection, Linguistics and Language Behavior Abstracts, Linguistics Database, ProQuest Dissertations & Theses Global, Psychology Database, PsycINFO, and Social Science Database in ProQuest and the Cumulative Index of Nursing and Allied Health Literature (CINAHL) on October 19, 2019. See Supplementary Information 1 for an example search.

For supplementary searches, the first author hand searched the tables of contents from the past year for journals that contributed at least five articles to the full text screening from the main database search (i.e., Autism, Journal of Autism and Developmental Disorders, Journal of Child Psychology and Psychiatry). The first author also screened abstracts from the two most recent meetings of the Gatlinburg Conference on Intellectual and Developmental Disabilities, the International Meeting for Autism Research, and the Society for Research in Child Development to identify findings that might not yet be published. Finally, the first author scanned reference lists and conducted forward searches for included studies. The supplementary searches were completed on March 28, 2020.

The primary coder (first author) screened 100% of the identified reports. Trained research assistants independently screened 25% of the reports at the title and abstract level and the full text level. The primary coder (first author) was blind to which reports would be coded for reliability. To prevent coder drift, discrepancy discussions were completed regularly. Point-by-point agreement for inclusion or exclusion (i.e., agreements divided by total number of reports) was 89% at the title and abstract level and 87% at the full text level. We used the primary coder’s decisions for inclusion.
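For reference, point-by-point agreement was computed as the proportion of screening decisions on which the two coders matched:

```latex
\text{Point-by-point agreement} \;=\; \frac{\text{number of agreements}}{\text{total number of reports screened}} \times 100\%
```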

Inclusion Criteria

Population

Study participants had to be children diagnosed with ASD with a mean or median age under 18 years, 0 months at intervention initiation. We included numerous diagnostic search terms due to the change in diagnostic criteria and terminology in recent decades. Participants with autism spectrum disorder(s), autism, autistic disorder, pervasive developmental disorder–not otherwise specified, high-functioning autism, and Asperger’s disorder/syndrome were included if they met other inclusion criteria. We only included children at “high-risk” for ASD (e.g., infant siblings of children with ASD) if they were later diagnosed with ASD. For RCTs, each group was required to contain at least five participants to permit calculation of an effect size.

Intervention

We included studies that tested the effects of a behavioral intervention that used responsivity intervention strategies designed to improve prelinguistic and/or language skills in children with ASD. The interventionist responds to the child’s communicative attempts and provides targeted prelinguistic and/or language input. Responsivity intervention strategies include, but are not limited to, an adult or peer imitating the child’s vocalizations or spoken words, recasting the child’s verbal or nonverbal communication act, contingent responses to child vocalizations that continue the turn-taking exchange, and follow-in comments. We did not exclude studies based on the type of interventionist (e.g., caregivers, clinicians, teachers, and/or peers).

Comparison

For RCTs, the treatment group (the group that received responsivity intervention strategies) must be compared with a randomly assigned control group that does not receive responsivity intervention strategies. The control group may vary in type including, but not limited to, other intervention strategies that do not use responsivity intervention strategies, a business-as-usual condition, or a waitlist control. For the SCRD studies, a baseline or alternative intervention condition serves as the comparison, depending on the study design.

Outcomes

Studies must report at least one prelinguistic skill and/or language measure for the child participants with ASD. Outcome measures may be expressive language (e.g., expressive vocabulary, mean length of utterance, and requests), receptive language (e.g., receptive vocabulary and following directions), or prelinguistic skills (e.g., directed vocalizations, joint attention, and gestures).

For the RCTs, each report must include at least one group mean difference effect size or sufficient data to calculate one for an eligible outcome measure. For applicable SCRD studies, we calculated the between-case standardized mean difference (BC-SMD) because it applies to multiple baseline across participants studies (the most common design in this review), quantifies the magnitude and consistency of change, and is more similar to group design effect sizes than within-case effect sizes are (Hedges et al., 2012, 2013; Pustejovsky et al., 2014; Valentine et al., 2016). We present the RCT and SCRD study results separately to permit comparison of the RCT results with prior meta-analyses and to avoid differences in weighting of sample sizes across study types (Valentine et al., 2016).

Exclusion Criteria

To maintain an appropriately narrow focus, literacy, vocal stereotypy, and challenging behavior outcomes were excluded. We also excluded outcome measures that focused on the interventionist’s performance (e.g., number of adult conversational turns, prompts to the child, or use of intervention strategies). We excluded studies not written in English due to a lack of translation resources. Studies were not excluded based on the language of the participants or the publication date. At the final stage of the full text screening, we excluded SCRD studies that failed to meet quality standards from the qualitative and quantitative analyses because failing to meet those standards prevents interpretation of the findings (What Works Clearinghouse, 2016). Broadly, the following criteria must be met to demonstrate an intervention effect: (a) graphical display of the data, (b) at least three attempts to demonstrate an effect, and (c) a sufficient number of data points per phase (e.g., at least three data points per phase to meet standards with reservations and at least five data points per phase to meet standards without reservations for multiple baseline, multiple probe, and ABAB [reversal/withdrawal] designs; What Works Clearinghouse, 2016). Multiple baseline and multiple probe designs must also have sufficiently overlapping baselines across tiers. Failure to meet all of these criteria resulted in exclusion from the qualitative and quantitative analyses. For additional details, refer to the What Works Clearinghouse Study Review Guide Instructions for Reviewing Single-Case Design Studies (What Works Clearinghouse, 2016).

Study Selection

As shown in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Fig. 1), database searches yielded 7108 records and other sources yielded 149 records. After eliminating duplicates and screening the titles and abstracts, 770 records remained. During the full text screening, independent coders eliminated studies in the order listed in Fig. 1. For the SCRD studies, the final inclusion criterion was meeting quality standards with or without reservations. The search yielded 33 RCTs that were described in 45 reports and included 294 relevant effect sizes and 42 SCRD studies that were described in 47 reports. Thirty-seven SCRD studies included sufficient graphical information for visual analysis (91 relevant opportunities to detect a functional relation) and 34 permitted extractions of at least one BC-SMD effect size (69 total BC-SMD effect sizes).

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. BC-SMD = between-case standardized mean difference; RCTs = randomized controlled trials; SCRD = single case research design

Coding the Studies

All reports were coded by the primary coder (first author) and trained research assistants using a detailed coding manual (available from the first author upon request). For the RCTs, point-by-point agreement for data extraction and bias coding was 94% and 90%, respectively. For the SCRD studies, point-by-point agreement for quality coding, data extraction, and visual analysis was 80%, 92%, and 80%, respectively. Discrepancies were resolved by consensus. Consensus coding was used for all analyses.

Report-level features included publication status, report type, country, spoken language of the participants, and percent of participants who were monolingual. Effect size-level features included sample size (for the ASD group and control group for RCTs), sex, age, intervention, interventionist, time in intervention, outcome measure(s), and effect size. The total time in intervention is the number of minutes per week multiplied by the number of weeks of intervention. For caregiver-implemented interventions, the amount of time is based on structured intervention time, not all waking hours, even though a caregiver may implement at least some strategies throughout the entire day. We categorized each outcome measure as distal or proximal and as context-bound, potentially context-bound, or a generalized characteristic.

For risk of bias for the RCTs, we used the Revised Cochrane risk-of-bias tool for randomized trials (RoB 2; Higgins et al., 2019). We rated each study for low, moderate, or high risk of bias for the randomization process, deviations from intended interventions, missing outcome data, measurement of outcome, selection of the reported result, and overall. In addition, coded study quality features included risk for CME, method of handling missing data, and use of blind assessors. For quality coding of the SCRD studies, we used the guidelines provided by the What Works Clearinghouse Study Review Guide Instructions for Reviewing Single-Case Design Studies (What Works Clearinghouse, 2016). Studies that did not meet quality standards with or without reservations were excluded from the meta-analysis. The remaining studies were categorized as meeting standards with versus without reservations.

Analytic Strategies

Effect Size

For the RCTs, we calculated the standardized mean difference (d) for independent groups (i.e., mean of responsivity intervention group minus the mean of the control group divided by the within-groups standard deviation) for each relevant outcome measure (Borenstein et al., 2009). Consistent with current meta-analytic techniques, we then used a correction factor to convert to Hedges’ g to address the tendency for d to overestimate the standardized mean difference for small samples (Borenstein et al., 2009).
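In outline, the computation follows the standard formulas in Borenstein et al. (2009), with subscripts T and C denoting the treatment and control groups:

```latex
d = \frac{\bar{X}_T - \bar{X}_C}{S_{\text{within}}},
\qquad
S_{\text{within}} = \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}},
\qquad
g = \left(1 - \frac{3}{4(n_T + n_C - 2) - 1}\right) d
```

The multiplier on d is the small-sample correction factor J, which shrinks the estimate slightly and converges toward 1 as the combined sample size grows.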

For the SCRD studies, we digitized the data (Huwaldt, 2010) to convert the graphical data into numerical data. We then calculated the BC-SMD using the online single-case design hierarchical linear model (scdhlm) web application (Valentine et al., 2016). For consistency, all effect sizes were calculated with restricted maximum likelihood estimation and with fixed and random effects permitted for the baseline and intervention phases (Valentine et al., 2016).
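Although the scdhlm application fits more general models (e.g., models that include baseline or treatment-phase trends), a simplified two-level sketch conveys the logic of the BC-SMD: the treatment effect is scaled by the combined between-case and within-case variation, making the estimate analogous to a group design standardized mean difference (Hedges et al., 2012; Pustejovsky et al., 2014):

```latex
Y_{ij} = \beta_0 + \beta_1\,\mathrm{Trt}_{ij} + u_i + e_{ij},
\qquad u_i \sim N(0, \tau^2),\; e_{ij} \sim N(0, \sigma^2),
\qquad
\text{BC-SMD} = \frac{\beta_1}{\sqrt{\tau^2 + \sigma^2}}
```

Here Y_ij is the outcome for case i at session j, Trt_ij indicates the intervention phase, u_i captures between-case variation, and e_ij captures within-case variation.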

Visual Analysis for SCRD Studies

For visual analysis of the SCRDs, we followed the What Works Clearinghouse Single-Case Design Technical Documentation guidelines (Kratochwill et al., 2010) for determining whether an effect is present. The visual analysis focuses on level, trend, variability, immediacy of the effect, overlap, and consistency of data patterns across similar phases.

Robust Variance Estimation

Because traditional meta-analytic techniques assume that all effect sizes are independent, only one effect size per sample can be used. In contrast, robust variance estimation permits inclusion of multiple effect sizes per study (Hedges et al., 2010; Tanner-Smith & Tipton, 2014). We used a random effects model with approximately inverse variance weights to address the dependency of multiple effect sizes per study via the robumeta.ado file from the Stata Statistical Software Components archive.
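To illustrate the weighting logic, the following minimal Python sketch computes an intercept-only correlated-effects RVE estimate in the spirit of Hedges et al. (2010). It is not the robumeta implementation used in our analyses; the function name, the fixed τ² value, and the toy numbers in the final comment are illustrative assumptions only.

```python
import numpy as np

def rve_mean(y, v, study, tau2=0.05):
    """Intercept-only correlated-effects RVE (after Hedges et al., 2010).

    y: effect size estimates; v: their sampling variances;
    study: study identifier for each effect size;
    tau2: between-study variance (fixed placeholder here; robumeta
          estimates it from the data given an assumed within-study
          correlation, rho).
    """
    y, v, study = np.asarray(y, float), np.asarray(v, float), np.asarray(study)
    study_weights = []  # (weight, mask) pairs, one per study
    num = den = 0.0
    for s in np.unique(study):
        mask = study == s
        k = int(mask.sum())
        # Correlated-effects working model: all k effect sizes in a study
        # share one weight, 1 / (k * (mean sampling variance + tau^2)).
        w = 1.0 / (k * (v[mask].mean() + tau2))
        num += w * y[mask].sum()
        den += w * k
        study_weights.append((w, mask))
    mu = num / den  # weighted mean effect size
    # Robust (sandwich) variance: squared weighted residual sums by study,
    # so the standard error remains valid with dependent effect sizes.
    var = sum((w * (y[m] - mu).sum()) ** 2 for w, m in study_weights) / den ** 2
    return mu, var ** 0.5

# Toy illustration only (fabricated numbers, not data from this review):
# mu, se = rve_mean([0.4, 0.5, 0.2], [0.04, 0.05, 0.03], [1, 1, 2])
```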

Moderator Analyses for Putative Moderators of Intervention Effects

We used meta-regression with robust variance estimation to conduct the planned moderator analyses. To evaluate variation in effectiveness across studies that use responsivity intervention strategies, we tested six moderators, as shown in Table 1. After examining intercorrelations among the putative moderators, all moderators were tested independently.
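Schematically, each moderator analysis fits a meta-regression of the following form, where g_ij is effect size i from study j, X_j is the putative moderator, η_j is a study-level random effect, and ε_ij is sampling error; the RVE standard error of β1 provides the test of moderation:

```latex
g_{ij} = \beta_0 + \beta_1 X_j + \eta_j + \varepsilon_{ij},
\qquad H_0\colon \beta_1 = 0
```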

Table 1 Putative moderators of intervention effect by type

Results

Study Characteristics

For the RCTs, Tables 2 and 3 display participant characteristics and intervention features. At least 897 unique participants (accounting for possible overlap between studies) are included in at least one effect size. Participants’ mean age at study initiation was 43.01 months (SD = 17.97 months). A variety of interventions were implemented. Joint attention intervention / JASPER (8 studies) and PRT (6 studies) were most common. JASPER targets joint attention, play, and imitation through a combination of behavioral and developmental principles (Chang et al., 2016; Goods et al., 2013; Kasari et al., 2006). PRT is designed to target “pivotal” areas using ABA principles and to train caregivers in the strategies to do so (Hardan et al., 2015). Caregivers were the most common interventionist (17 studies). Table 4 displays details for effect size features, including outcome measures. The included studies used a wide variety of outcome measures and varied in the number of effect sizes per study, ranging from 1 to 78.

Table 2 Participant characteristics for included randomized controlled trials
Table 3 Intervention features of included randomized controlled trials
Table 4 Effect size characteristics and outcome measures for included randomized controlled trials

For the SCRD studies, Tables 5 and 6 display participant characteristics and intervention features. The studies included at least 143 unique participants, with an average of 3.40 participants per study (SD = 1.17). Only participants who contributed data to the visual analysis or to an effect size were counted. The mean age of participants prior to intervention was 54.36 months (SD = 25.72). Table 7 displays effect size features (e.g., outcome measures and results) and visual analysis results.

Table 5 Participant characteristics for included single case research design studies
Table 6 Intervention features of included single case research design studies
Table 7 Visual analysis and effect size results for included single case research design studies

Quality Indicators

For the RCTs, the overall risk of bias was judged to be high for 25 studies, moderate for 7 studies, and low for only 1 study. It should be noted that a “high” risk of bias rating for any category results in an overall risk of bias rating of “high”. See Tables 8 and 9 for details. Many studies were noted to be at risk for CME, which resulted in a high risk of bias for “Measurement of outcome.” Only seven of the RCTs provided sufficient information to determine whether there were deviations from the intended intervention (a component of “Deviations from intended interventions”). Of those, three indicated probable risk of bias. Studies were judged to deviate from the intended intervention if the mean or median procedural fidelity value was below 80%. “Sufficient information” required that procedural fidelity data be drawn from at least 20% of sessions or participants. Similar gaps in the reporting of outcome measure reliability were also observed, as shown in Table 9. The high number of studies without sufficient information about procedural fidelity and reliability reveals an area of need for improving the quality of available studies and inhibits quantitative analysis of the influence of procedural fidelity and reliability on intervention effects. No studies were at high risk of bias for the randomization process, and only three were at high risk for deviations from the intended interventions.

Table 8 Risk of bias for included randomized controlled trials

For the SCRD studies, only three studies included at least one distal outcome measure (Carpenter, 2003; Ingersoll & Wainer, 2003, 2013). All outcome measures were context-bound and at risk for CME. The high proportion of proximal and context-bound outcome measures is consistent with Yoder et al. (2013). For study quality, 25 of the 47 included reports met quality standards without reservations. As shown in Fig. 1, 44 reports that otherwise would have been included were excluded for failing to meet quality standards (listed in Supplementary Information 2). Twenty-three SCRD studies provided some type of summary value for the procedural fidelity of the interventionist (see Table 10). Nine additional SCRD studies provided fidelity data for the interventionists, but not in summative form (e.g., graphically or as a narrative description). However, only two studies (Randolph et al., 2011; Vogler-Elias, 2009) reported procedural fidelity data for the trainers (e.g., a trainer who taught a caregiver to implement the intervention). The ten remaining SCRD studies did not report procedural fidelity data. As with the RCTs, the gaps in reporting of procedural fidelity reveal an area of need for improving the quality of available studies. Relative to procedural fidelity data, the SCRD studies more consistently reported interobserver agreement (IOA) data for the outcome measure. Only one study omitted IOA data, revealing an area of strength for the included studies.

Table 9 Procedural fidelity bias rating and outcome measure reliability for included randomized controlled trials

Effect Size

We reject the null hypothesis that there is no effect of interventions using responsivity strategies on the prelinguistic and language skills of children with ASD for both the RCTs and the SCRD studies (research question 1). For the RCTs, the mean standardized group difference is g = 0.36, 95% CI [0.21, 0.51], a moderate effect size. The weighted mean effect size did not vary when we varied the assumed within-study correlation (ρ) in Stata in 0.1 increments from 0.0 to 0.9.

For the SCRD studies, the mean BC-SMD = 1.20, 95% CI [0.87, 1.54], which is large. Again, the weighted mean effect size did not vary when we varied the assumed within-study correlation (ρ) in Stata in 0.1 increments from 0.0 to 0.9. The difference in mean effect size between the RCTs and SCRDs may be due to methodological differences between group and SCRD studies; thus, the effect sizes are not directly comparable between the RCTs and the SCRD studies. Relatively large effect sizes are easier to detect through visual analysis, which may explain the publication bias toward studies with larger effects for SCRD studies (Shadish et al., 2015, 2016). As described in the Moderator Analyses section, we did identify evidence of publication bias. Other meta-analyses that combined group and SCRD studies have also reported relatively larger mean effect sizes for SCRD studies (Barton et al., 2017). Based on visual analysis of the SCRD studies, 41 graphs (45%) showed strong evidence, two (2%) showed moderate evidence, and 48 (53%) showed no evidence of a functional relation between the intervention with responsivity strategies and child prelinguistic and/or language skills. Opportunities to show a functional relation that showed strong evidence had a mean BC-SMD of 2.34 (SD = 2.18, range: 0.56 to 5.36). Those that showed no evidence had a mean BC-SMD of 0.67 (SD = 0.49, range: −0.20 to 2.69).

Moderator Analyses

The moderator analyses address our second research question about whether particular study features account for the observed heterogeneity. RCTs and SCRD studies were analyzed separately.

The Galbraith plots (Figs. 2 and 3) and τ² values (0.19 and 0.67 for the RCTs and SCRD studies, respectively) provide evidence of substantial heterogeneity. We define heterogeneity as variation in estimated ‘true effects’ (Borenstein et al., 2009). In the computation of τ², this variation is differentiated from spurious (sampling) error by considering the ratio of observed to expected variation across studies. The results show notable dispersion of the effect sizes that is assumed to be real rather than spurious error. The larger τ² value for the SCRD studies than for the RCTs indicates greater dispersion in true effects for the SCRD studies. The Galbraith plot, an alternative to the forest plot for meta-analyses with a large number of effect sizes, displays more precise estimates further from the origin. The large number of effect sizes outside the two parallel outer lines that represent the 95% confidence interval indicates substantial heterogeneity (Anzures-Cabrera & Higgins, 2010).
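The ratio of observed to expected variation can be sketched with the familiar Q statistic: with inverse-variance weights w_i = 1/v_i, Q is expected to equal its degrees of freedom (k − 1) under homogeneity, and the excess is attributed to true heterogeneity, as in the DerSimonian–Laird style estimator below (the estimator robumeta uses for dependent effect sizes differs in detail, but the logic is the same):

```latex
Q = \sum_{i=1}^{k} w_i \,(y_i - \hat{\mu})^2,
\qquad
\hat{\tau}^2 = \max\!\left(0,\; \frac{Q - (k - 1)}{C}\right),
\quad
C = \sum_i w_i - \frac{\sum_i w_i^2}{\sum_i w_i}
```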

Fig. 2

Galbraith plot for included randomized controlled trials

Fig. 3

Galbraith plot for included single case research design studies

RCTs

For the RCTs, six moderator analyses were planned (i.e., interventionist, time in intervention, proximity, boundedness, risk for CME, and publication status). Context-bound outcomes exhibited a larger mean effect size (g = 0.47) than generalized or potentially context-bound outcomes combined (g = 0.24; p < 0.05). These results indicate that participants exhibited larger changes in behaviors measured in situations very similar to the treatment sessions (i.e., context-bound) than in behaviors measured in situations that vary from the treatment context in setting, materials, and/or communication partner. No other moderator analyses yielded significant results. Due to missing details in the included reports, time in intervention could be extracted for only 18 of the 33 RCTs. As a result, the degrees of freedom were too low to complete the analysis for time in intervention. Only a few studies that included caregivers as the interventionists reported the time caregivers spent conducting the intervention (Clionsky, 2012; Dawson et al., 2010; Green et al., 2010). As an alternative intensity variable, we tested time in intervention in weeks. However, even with more studies providing that information, the degrees of freedom were still too low (i.e., < 4) for a trustworthy result. Similarly, because unpublished studies were rare (i.e., four effect sizes from two studies), the degrees of freedom for the publication status analysis were too low to interpret.

Table 11 displays results by subgroups to inform decisions regarding which moderators may warrant additional investigation. Of note, the mean effect size was greater than zero for effect sizes at risk for CME (g = 0.39), but not for those free from CME risk (g = 0.12). Except for the caregiver only subgroup, the relatively low number of studies in the interventionist subgroups resulted in low degrees of freedom and should be interpreted with caution.

Table 10 Procedural fidelity for interventionist and interobserver agreement for included single case research design studies

We calculated the correlations between each of the tested moderators to evaluate how distinct each moderator is from the others. Of all the pairs, only three exceeded r = 0.30: proximity of the outcome measure and time in intervention in weeks (r = 0.34), risk for CME and boundedness (r = 0.68), and time in intervention in weeks and time in intervention in hours (r = 0.75). Distal outcome measures were more likely to be used for studies of relatively longer duration. Studies not at risk for CME were more likely to use generalized outcome measures. The relatively high correlation between the time in intervention in weeks and time in intervention in hours is expected; time in intervention in weeks was derived as an alternative to time in intervention in hours to address missing data in the included reports.

Publication bias was not detected via the moderator analysis. However, Egger's test suggests publication bias against small studies with negative results (Fig. 4; p < 0.01). The moderator analysis for publication bias was likely limited by the relatively low number of effect sizes (i.e., four) reported from unpublished reports (Fig. 5).
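In its usual form, Egger's test regresses each standardized effect size on its precision; an intercept that differs significantly from zero indicates funnel-plot asymmetry, such as the small-study bias reported here:

```latex
\frac{y_i}{SE_i} = \beta_0 + \beta_1 \cdot \frac{1}{SE_i} + \epsilon_i,
\qquad H_0\colon \beta_0 = 0 \ \text{(no small-study asymmetry)}
```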

Fig. 4

Funnel plot of effect size (Hedges’ g) versus standard error for included randomized controlled trials

Fig. 5

Funnel plot of effect size (between-case standardized mean difference [BC-SMD]) versus standard error for included single case research design studies

SCRD Studies

For the SCRD studies, three moderator analyses (i.e., interventionist, time in intervention, and publication bias) were completed. The other moderators tested for the RCTs did not have enough variation across the SCRD studies: all of the effect sizes were at risk for CME and used context-bound outcome measures, and only seven effect sizes included distal outcome measures. For time in intervention, only 13 studies reported the necessary details. None of the moderator effects were significant. See Table 12 for the moderator analyses by subgroup. No correlations between moderators exceeded r = 0.4. We completed a follow-up analysis comparing only studies implemented by a caregiver alone or a clinician alone, the two types of interventionists with sufficient degrees of freedom for reliable results. Interventions implemented by caregivers only had a mean effect size of 0.81 versus 1.90 for those implemented by clinicians only. This difference approached, but did not reach, statistical significance (p = 0.06). Given the magnitude of the difference in mean effect sizes and the differences identified in prior meta-analyses, the role of the interventionist warrants continued evaluation, especially as the number of relevant primary studies increases. Publication bias was not detected via the moderator analysis. However, as in the RCT analysis, Egger's test suggests publication bias against small studies with negative results (p < 0.001).

Table 11 Moderator analysis results by subgroup for included randomized controlled trials
Table 12 Moderator analysis results by subgroup for included single case research design studies

Discussion

Summary of Evidence

Based on 294 effect sizes from 33 RCTs and 69 effect sizes from 34 SCRD studies that included a total of 1040 participants, the weighted mean effect size of interventions using responsivity intervention strategies on child prelinguistic and language outcomes is moderate to large. The identified mean effect size (g = 0.36) is somewhat larger than those identified by Hampton and Kaiser (2016; g = 0.26) and Sandbank et al. (2020b; g = 0.13 for receptive language, g = 0.18 for expressive language), which evaluated a wider variety of interventions on language outcomes. Visual analysis of 91 opportunities to demonstrate a functional relation from 37 SCRD studies provided somewhat weaker support: 45% of opportunities showed strong support, 2% showed moderate support, and 53% showed no support for the interventions improving child prelinguistic and/or language outcomes. Thus, heterogeneity in results is apparent in both the effect sizes and the visual analysis. Although the mean effect size for the SCRD studies was large, the nearly even split between “strong” and “no” evidence offers reason for caution in interpreting the results. Because many of the studies used a multiple baseline across participants design with three participants, the presence or absence of an effect for each participant could have a large impact on the overall judgment of a functional relation. In addition, the magnitude of effect sizes cannot be compared directly between the RCTs and SCRD studies due to methodological differences between study types.

Moderator analyses revealed that, for the RCTs, effect sizes using context-bound outcome measures had a larger mean effect size than those with potentially context-bound or generalized outcome measures. This finding is consistent with those reported by Yoder et al. (2013) and Fuller and Kaiser (2020) for social communication outcomes in children with ASD. In addition, RCTs at risk for CME exhibited a significant, positive effect size, but those free from CME risk did not. These results for the roles of boundedness and CME risk could not be replicated for the SCRD studies because all of the SCRD study effect sizes were context-bound and at risk for CME. Although some of the SCRD studies did include generalization probes (e.g., with a different communication partner or setting), in the vast majority of cases the probes were not frequent enough to meet quality standards for inclusion. Similarly, very few effect sizes from the SCRD studies included distal outcome measures; thus, proximity of the outcome measure could not be tested for the SCRD studies. Results for the RCTs and SCRD studies were consistent in that publication bias was identified by Egger's test but not by the moderator analysis. The relatively small number of unpublished studies of both types limited the moderator analysis. For both the RCTs and SCRD studies, we were unable to test for a moderating effect of time in intervention, despite attempts to use multiple intensity variables, because too few reports included key details about intensity.

In sum, these findings provide support for the use of responsivity intervention strategies for children with ASD for improving prelinguistic and language skills. As expected, findings are more robust for context-bound outcome measures than other types of outcome measures (e.g., potentially context-bound and generalized characteristics).

Limitations

Limitations of meta-analyses stem from characteristics of the primary studies as well as of the meta-analysis itself. For the current study, imprecise reporting at the primary study level, especially regarding intervention intensity, limited the analyses. Despite calls for improved reporting of intervention details, only about half of the RCTs and SCRD studies provided sufficient information to determine the time in intervention, a less precise variable than cumulative intervention intensity (Warren et al., 2007). Given the potential importance of treatment intensity for the effectiveness, cost (financial and time), and feasibility of services for children, reports of future studies would be strengthened by explicit descriptions of intensity variables. Concerns about the quality of the included studies also influenced the meta-analysis. Forty-four SCRD studies that would otherwise have been included were excluded because they failed to meet What Works Clearinghouse standards. Risk for CME was very common across the included studies and should be attended to in future studies to minimize risk for bias.

The application of meta-analytic techniques to SCRD studies is relatively new and still developing. As a result, only studies that used a multiple baseline across participants design could be included in the current quantitative meta-analysis. Future meta-analyses of responsivity intervention strategies should be considered as other analytic approaches develop. Other limitations at the meta-analytic level include the potential failure to include relevant effect sizes and the inclusion of only studies written in English. The risk of missing relevant effect sizes was minimized through multiple supplementary searches and the completion of reliability checks at all screening levels. Lastly, robust variance estimation is most effective with at least 40 studies; our analyses included 33 RCTs and 34 SCRD studies.

Strengths

Although our searches yielded effect sizes from fewer than 40 RCTs or SCRD studies, the use of robust variance estimation remains a strength of this meta-analysis because it permits the inclusion of multiple effect sizes per study, eliminating the loss of potentially important effect sizes. Second, we included both RCTs and SCRD studies, which is currently rare for systematic reviews and meta-analyses. This approach provides a more comprehensive review of the current literature base and opportunities for replication across the two study types. Third, we enhanced the quality of this meta-analysis by assessing interrater reliability at all screening levels and having two independent coders extract data (including risk of bias) for all included reports. Fourth, we considered the quality of the included studies through multiple avenues: we not only required studies to meet certain characteristics to be included but also coded for study quality features, including risk for bias and CME.

Clinical Implications

This systematic review and meta-analysis provides empirical support for the use of responsivity intervention strategies to improve the prelinguistic and language skills of children with ASD. Because the data are more robust for context-bound outcome measures (e.g., behaviors that occur during the intervention or in a very similar setting) than for generalized characteristics (e.g., use of targeted skills in a novel setting with someone other than the interventionist), gains in generalized characteristics should be monitored closely during clinical practice. Benefits of responsivity intervention strategies were observed for a wide variety of outcome measures (e.g., joint attention, use of gestures, verbal utterances, and vocalizations), which suggests that these strategies have broad application to both prelinguistic and early language skills.

Research Implications

Additional, high-quality intervention studies regarding the observed benefits of responsivity intervention strategies are needed to further delineate the specific impact of these strategies and how features of such interventions can be adjusted to maximize gains. At the primary study level, future studies would be enhanced by continued improvement of study quality, especially minimizing risk for bias and CME, and more explicit reporting of putative moderators of treatment effects.

The need to report intensity data was especially apparent. Such data are needed not only to determine whether more intensive intervention is likely to have positive or negative effects on child outcomes but also to control for intensity when investigating the role of other putative moderators, such as the interventionist. Explicit reporting will improve the effectiveness of future meta-analytic moderator analyses. Primary studies that directly address the effect of intensity on intervention outcomes are also needed.

Continued inclusion of distal outcome measures in coordination with proximal measures is also warranted. Explicitly identifying outcomes as proximal versus distal will allow readers to accurately weigh the results. Because distal measures are expected to yield smaller effect sizes than proximal measures, achieving a relatively large effect size for a distal measure should be noted. SCRD studies can be used within a programmatic line of research to guide selection of outcome measures in RCTs. For example, an SCRD may include some generalization and maintenance data with sufficient data points to determine whether those dependent variables may be suitable distal and/or generalized characteristic outcome measures for a subsequent RCT. The evidence base would also benefit from the inclusion of studies that provide specific responsivity intervention strategies outside of large treatment packages as well as explicit descriptions of strategies implemented. Such evidence would facilitate ongoing efforts to identify active ingredients of interventions and inform modifications aimed at increasing effectiveness.

Conclusions

This meta-analysis provides support for the use of responsivity intervention strategies with young children with ASD to support growth in prelinguistic and language skills. Positive results were observed for both RCTs and SCRD studies. Moderator analyses indicated the need to attend to the potential roles of CME and the boundedness of outcome measures. Concerns about study quality, risk for bias, and the omission of key intervention details were also noted. These findings can be applied to future studies to enhance the quality of the literature base and the confidence of clinical recommendations for intervention practices.