Introduction

Autism is a lifelong, pervasive neurodevelopmental condition (Elsabbagh et al., 2012) that affects abilities in different areas, including communication, interaction, behavior, and cognition (American Psychiatric Association [APA], 2013). Its prevalence has been increasing across the globe. According to recent US data, one in 44 children has autism (Maenner et al., 2018). Difficulties with social communication are one of the diagnostic criteria for autism spectrum disorder (ASD; APA, 2013). The term communication describes a broad array of verbal and nonverbal initiations and responses used in reciprocal social interactions (Wetherby et al., 2006). Children with autism have been reported to have a range of communication difficulties, including making requests, initiating attempts to get others’ attention, and responding to and sharing experiences with others (Bacon et al., 2019; Landa et al., 2007). They face challenges in nonverbal communication behaviors, including joint attention, gaze shifts, gestures, and shared positive affect, as well as in verbal communication behaviors such as initiating interaction, asking and answering questions, making comments, and engaging in conversations (Landa et al., 2007; Mohammadzaheri et al., 2021; Wetherby et al., 2004).

Dialogic reading (DR) is a shared reading practice targeting preschool-age children’s language skills, in which adults actively interact with children to encourage them to converse about the text (Whitehurst et al., 1994). In DR, the adult asks questions and takes an active listening role, while the child answers and takes an active participant role rather than being a passive listener (Whitehurst et al., 1988; Whitehurst et al., 1994). DR involves a sequence of steps that the adult follows during reading, represented by the acronym PEER. First, the adult prompts the child to participate by asking specific types of questions, represented by the acronym CROWD (Whitehurst & Lonigan, 1998). Table 1 presents descriptions and examples of these prompts. Second, once the child responds to the prompt, the adult evaluates the accuracy of the response, either praising a correct response or providing an alternative for an incorrect one. Third, the adult expands on the response by adding further information. Finally, the adult encourages the child to repeat the expanded response.

Table 1 CROWD Prompts

The effect of DR is well established in research (Towson et al., 2017): it has been found to positively affect the language and early literacy of neurotypical children (e.g., Crain-Thoreson & Dale, 1999; Hargrave & Sénéchal, 2000; Opel et al., 2009; Whitehurst et al., 1988). Towson et al. (2017) evaluated 30 studies to examine the evidence base for DR in early childhood settings. The review found that 53% of the studies examined language skills, 10% assessed early literacy skills, and 27% measured both. Twenty-seven studies (90%) reported that DR increased language and emergent literacy. Additionally, Towson et al. (2021) conducted a literature review on the use of DR with children with disabilities. This review identified 23 studies in which the interventions were used with children with disabilities such as autism, language delay, and developmental disabilities; nine of these studies included children with autism. The review reported that six studies added strategies to the original DR strategies, such as pausing and repetition. Across the 23 studies, the interventions were implemented individually and in small groups, in both home and school settings, by parents, teachers, and researchers. Overall, language and communication outcomes showed positive effects in the single-case design studies, and expressive language outcomes showed a large effect size in the group design studies.

In addition to language and early literacy, studies have also examined the effect of DR on children’s communication skills. For example, Dale et al. (1996) used DR with 33 children with language delay and their mothers to examine its effectiveness on the children’s communication and language development. Children’s responsive communication behaviors (e.g., responses to adult questions, verbal acknowledgements, nonverbal gestures, nonverbal attending) were measured. Although the results showed that children’s communication engagement (both verbal and nonverbal responses) was modest, the study as a whole found that DR had a potential positive effect on children’s communication skills. Similarly, Brannon and Dauksas (2012) investigated the effect of DR on interaction among family members. Forty family members participated with their children, who were classified as ‘at-risk’ for developmental delay based on their language, social, gross motor, and intellectual abilities. Families were divided into a DR group and a reading-as-usual group; in the latter, family members read to their children without being given instructions. The DR group used significantly more verbal interaction through a variety of literacy communication behaviors (i.e., questioning and expanding) than the reading-as-usual group, and, as a result, children in the intervention group engaged in longer conversations than children in the reading-as-usual group. However, none of these studies included children with autism.

Nevertheless, DR has been increasingly used with children with autism. Researchers have employed either the original DR intervention or an adapted version in which adaptations were added to the PEER/CROWD components to provide more support to children with autism. One such adaptation is RECALL, which stands for “Reading to Engage Children with Autism in Language and Learning.” RECALL was developed by Whalon et al. (2013) to increase emergent literacy, language, and communication, including children’s initiation of interactions and responses to the initiations of others. It employs DR strategies and prompts (PEER and CROWD) together with systematic instructional procedures, including a least-to-most prompting hierarchy, joint attention, interaction prompts, and visual supports. Another DR adaptation uses special prompts (e.g., yes/no questions and pointing) to help young children with autism who may find the CROWD prompts difficult to answer (Fleury & Schwartz, 2017).

Present Study

The US Department of Education’s What Works Clearinghouse (WWC) stated that DR has potentially positive effects on language and communication development in children with disabilities (WWC, 2007). Children with autism find it difficult to initiate and respond to communication irrespective of their language development (Whalon et al., 2013). Since this is a core difficulty in autism, DR can potentially be an effective intervention to use with this population (Alharbi, 2021). DR is designed to encourage conversation including initiations and responses between adults and children about a story. Adults can provide children with autism with opportunities to both initiate and respond by using DR techniques to prompt their engagement and participation.

To the authors’ knowledge, there are no previous systematic reviews of the effect of DR on children with autism. More specifically, no previous systematic review has examined the effect of DR on the verbal and nonverbal communicative responses and initiations of children with autism, despite the universality of their communication difficulties and the potential benefits of DR interventions for them. The purpose of this paper was, therefore, to fill this gap in the literature. More precisely, it had three aims regarding the effect of DR on the communication skills of children with autism. The first aim was to identify the characteristics of DR intervention studies that focused on the communicative initiations and responses of children with autism. These characteristics included the type of DR (original DR or an adapted version), duration, intervenor, training, children’s characteristics, adult/child ratio, and setting. The second aim was to examine the outcomes and effects of DR interventions on specific communicative skills of children with autism (initiations and responses). Outcomes denote children’s outcomes across communication, while effect refers to the overall effect size of children’s outcomes. Finally, the third aim was to investigate the quality of these studies, where quality refers to the evidence a study provides to establish its robustness.

Method

Search Procedures

This systematic literature review followed the PRISMA (2009) guidelines and the reporting standards described in “Reporting Standards for Research” (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008). A systematic search covering the period from 1990 to 2021 was conducted in two electronic databases, the Education Resources Information Center (ERIC) and PsycINFO, followed by a hand search of a generic database, Google Scholar. Only ERIC and PsycINFO were used because they are the main databases for education and psychology. ERIC is “a comprehensive, easy-to-use, searchable, Internet-based bibliographic and full-text database of education research and information” (“ERIC,” n.d., What is ERIC? section). PsycINFO is described by the APA as “the most trusted index of psychological science in the world. With more than five million interdisciplinary bibliographic records, [the] database delivers targeted discovery across the full spectrum of behavioral and social sciences” (APA, “PsycINFO,” n.d., Celebrating 55 years section).

The search structure included a set of three keywords connected by the Boolean operator “AND”: [age] AND [autism] AND [dialogic reading]. The search terms for the keywords were as follows: (child* or student* or pupil* or pre-school* or kindergarten or “young child*” or “early years” or “young people” or nursery or “foundation stage” or “reception class*”) AND (autism* or “autism spectrum disorder” or “autism spectrum condition” or ASC or ASD or Asperger* or PDD* or “pervasive developmental disorders”) AND (“dialogic reading” or “shared book” or “joint reading” or “shared interactive reading” or “interactive reading” or “shared reading” or “book reading” or “shared book reading” or “storybook reading” or “read* aloud” or “book reading” or “reading intervention*” or “literacy intervention*” or “dialogic reading programme*” or “picture book reading” or “picture-book reading” or “interactive book reading” or “parent-child book reading” or “parent-child shared book reading” or “parent-child shared reading” or “parent-child interactive reading” or “parent-child storybook reading” or “parent-child joint reading”). The search terms were based on a recent review (Towson et al., 2021) but were expanded to ensure that all relevant studies were identified.
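The three-block structure above can be sketched as a small string builder. This is an illustrative Python sketch, not the tool the authors used: the helper names `or_block` and `build_query` are hypothetical, the example uses only a subset of the listed terms, and real database interfaces differ in their quoting and truncation syntax. It also shows that a repeated term, such as the two occurrences of "book reading", adds nothing to an OR block.

```python
def or_block(terms):
    """Join one keyword block with OR, dropping exact duplicates while keeping order.

    Multi-word terms are wrapped in double quotes (phrase search); single words,
    including truncated ones such as child*, are left unquoted.
    """
    unique_terms = list(dict.fromkeys(terms))  # duplicates do not change an OR
    parts = [f'"{t}"' if " " in t else t for t in unique_terms]
    return "(" + " or ".join(parts) + ")"


def build_query(*blocks):
    """Combine keyword blocks with AND: [age] AND [autism] AND [dialogic reading]."""
    return " AND ".join(or_block(block) for block in blocks)


# Illustrative subset of the terms listed above.
age_terms = ["child*", "student*", "young child*"]
autism_terms = ["autism*", "ASD", "autism spectrum disorder"]
reading_terms = ["dialogic reading", "shared reading", "book reading", "book reading"]

print(build_query(age_terms, autism_terms, reading_terms))
```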

Inclusion and Exclusion Criteria

To be included in the review, a study had to (1) have been published in a peer-reviewed journal or be a master’s dissertation or doctoral thesis, (2) be written in English, (3) have been published between 1990 and 2021, (4) include at least one participant with autism, (5) use DR (as defined by Whitehurst et al., 1988) or an adapted version of DR as the independent variable, (6) examine aspects of children’s communicative initiations and responses (verbal and nonverbal), (7) involve one or more adults (i.e., teacher, parent, researcher, or other school staff) delivering the intervention, and (8) be an experimental study. Participants with autism were defined as those reported to have a medical diagnosis of ASD and/or an educational determination of autism (i.e., eligible for services under the autism category of the Individuals with Disabilities Education Act [IDEA], 2004). Children’s communicative responses comprised verbal and nonverbal acts. Verbal communicative responses were defined as children’s verbal responses to the intervention prompts, whether correct or incorrect, as long as incorrect responses were on topic. Nonverbal responses were defined as children pointing to the answer. Verbal initiations were defined as on-topic verbal interactions initiated with the adult through naming, comments, and questions, while nonverbal initiations included pointing to show or share information.

An experimental study was defined as one with a control condition, either one or more control groups or one or more baseline phases (Horner et al., 2005), whether it used a group research design or a single-case research design (SCRD). A baseline phase was required to have at least three data points. For an adapted version of DR, all the PEER/CROWD components needed to be present.

Studies were excluded from the review if they (1) used other shared reading interventions that might bear similarities to DR but that the authors did not identify as an adaptation of DR, (2) involved only peers delivering the intervention (because fidelity of implementation might be put at risk when peers deliver the intervention; Chang & Locke, 2016), (3) delivered the intervention solely through technology, or (4) were review papers, books or book chapters, or conference proceedings.

Screening Process

Figure 1 presents the search and screening process. The initial search identified 246 studies, which were imported into EndNote (a reference management software package). After duplicates were removed, 179 studies remained. The first author screened all of these against the inclusion criteria, examining only titles and abstracts at this stage, which left 25 studies. The first author then assessed the full texts of these 25 studies and excluded 16 of them. Two papers (Jackson et al., 2020; Whalon et al., 2013) only provided a rationale for and description of the interventions. Six papers were excluded because they were not experimental studies as defined in this review: one was an action research study (Lundy, 2020), three had no baseline at all (Balsamo, 2019; Plattos, 2011; Tan, 2014), one had no control group (Fleury & Towson, 2014), and one had fewer than three baseline data points (Ward, 2018). Seven studies did not measure any aspect of communicative initiations and responses (Coogle et al., 2020; Coogle et al., 2018; Hudson et al., 2017; Nunes et al., 2021; Pamparo, 2012; Storie et al., 2021; Towson et al., 2016). Finally, one paper (Irvine, 2018) was excluded because it reported only a secondary analysis of data from one of the nine included studies.
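The counts in this screening flow can be cross-checked with simple arithmetic. The Python sketch below only restates numbers given in the text (the category labels are shorthand, not the authors' coding terms) and derives the implied intermediate figures, such as the number of duplicates and title/abstract exclusions, which the text does not state directly.

```python
# Counts reported in the screening narrative above.
identified = 246
after_duplicates = 179
after_title_abstract = 25

not_experimental = {
    "action research": 1,
    "no baseline": 3,
    "no control group": 1,
    "fewer than three baseline points": 1,
}
full_text_exclusions = {
    "rationale/description only": 2,
    "not experimental": sum(not_experimental.values()),
    "no communicative initiations/responses measured": 7,
    "secondary analysis only": 1,
}

duplicates_removed = identified - after_duplicates                 # implied: 67
excluded_title_abstract = after_duplicates - after_title_abstract  # implied: 154
excluded_full_text = sum(full_text_exclusions.values())            # stated: 16
included = after_title_abstract - excluded_full_text               # stated: 9

print(duplicates_removed, excluded_title_abstract, excluded_full_text, included)
```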

Fig. 1 Search and screening process. Based on Moher et al. (2009)

To ensure the reliability of the decisions to exclude those articles, the first author reviewed all 16 excluded articles, and the second and third authors each independently reviewed half (eight) of them. An agreement was counted when both reviewers of an article judged that it failed to meet at least one inclusion criterion. Agreement was 100%; therefore, a total of nine studies met the inclusion criteria. To ensure greater rigor, the first author and an independent rater (a doctoral student with a background in special education and psychology) independently reviewed the full texts of these nine studies to ascertain whether they met the inclusion criteria. An agreement was counted when both raters judged that a study met all inclusion criteria. Inter-rater agreement was 100%, and all nine studies were included in the review. Table 2 provides a summary of the reliability stages.

Table 2 Reliability processes

Data Extraction and Coding Procedures

A coding sheet was developed to extract the following features of the studies: (a) research design, (b) participant characteristics (i.e., number, age, and gender), (c) intervention characteristics (i.e., intervention, intervenor, setting, group ratio, and duration), (d) dependent variables, (e) fidelity of implementation, (f) outcomes, (g) certainty of evidence (quality of evidence), and (h) intervention effectiveness. Table 3 displays a summary of all nine studies. When studies investigated other interventions in addition to DR, only data for DR were extracted. Likewise, when studies recruited participants without autism in addition to those with autism, data extraction was limited to participants who had a medical diagnosis or educational determination of ASD. Reliability of data extraction was assessed for 20% of the extracted data: the first author extracted all the data, while the second and third authors each extracted 10% of the data. An agreement was counted when both raters extracted the same data. Inter-rater agreement was 100%. Table 2 provides a summary of the data extraction reliability.

Table 3 Summary of studies’ features, outcomes and quality scores

Data Analysis

Quality Assessment and Further Inter-rater Agreements

The quality of the reviewed studies was assessed to explore their robustness. Two quality assessments were used because no single method covers all the quality aspects the authors wished to consider. The first was Reichow et al.’s (2008) method, which was specifically developed to evaluate and determine evidence-based practices in autism. The second was Terlektsi et al.’s (2019) quality assessment, which adds aspects that the authors consider particularly important for studies conducted in real-world settings (i.e., ecological validity, critical reflection on the limitations of the study, and reporting of evaluation). Reichow et al.’s (2008) method includes primary and secondary quality indicators for both group research designs and SCRD to assess the rigor of each study. Table 4 presents the quality indicators for group research, and Table 5 shows the quality indicators for SCRD. Each primary quality indicator is rated as high, acceptable, or unacceptable quality, while each secondary quality indicator is rated as showing evidence or no evidence. Based on these ratings, each study is rated as having strong, adequate, or weak research report strength. Strong research report strength indicates concrete evidence of high quality; adequate research report strength indicates strong evidence in most, but not all, areas; and weak research report strength indicates many missing elements and/or fatal flaws.

Table 4 Definition of group research quality indicators
Table 5 Definition of single subject research quality indicators

The second quality assessment was adapted from Terlektsi et al. (2019). The matrix is comprehensive, as it was based on a number of studies that evaluated evidence-based practices for children and young people with hearing impairment (Terlektsi et al., 2019) and with profound and multiple learning disabilities (Rushton et al., 2022). It uses specific criteria to examine different aspects of a study. Each aspect is assigned a score of 1 if there is only impressionistic evidence of impact, 2 if there is modest evidence of impact, or 3 if there is strong evidence of impact. Two adaptations were made to make the matrix more appropriate for studies of children with autism: (1) the generalizability component was adapted to include the characteristics of the autism population rather than describing the general population, and (2) the design component was adapted to include specific evaluative criteria for both randomized controlled trials (RCTs) and SCRD studies, as they have different design-specific quality issues. Table 6 presents the adapted quality assessment matrix. Based on the mean score across all components, a study was rated as having impressionistic to moderate quality if its mean score was between 1 and 1.9, or moderate to strong quality if it was between 2 and 3.
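The scoring rule above (per-aspect scores of 1 to 3, averaged and banded) can be sketched as follows. This is a hypothetical Python helper, not part of Terlektsi et al.'s (2019) published matrix; the number of components and the example scores are invented for illustration, and the handling of means between 1.9 and 2 is an assumption, because the stated bands leave that interval open.

```python
from statistics import mean


def overall_quality(component_scores):
    """Average per-component scores (1 = impressionistic, 2 = modest,
    3 = strong evidence of impact) and map the mean onto the two bands."""
    scores = list(component_scores)
    if not scores or any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each component must be scored 1, 2 or 3")
    m = mean(scores)
    # Assumption: the bands are given as 1-1.9 and 2-3, so any mean below 2
    # (including values such as 1.95) is placed in the lower band.
    band = "moderate to strong" if m >= 2 else "impressionistic to moderate"
    return m, band


# Hypothetical component scores for two studies (not data from this review).
print(overall_quality([3, 2, 2, 1, 3]))  # mean 2.2
print(overall_quality([1, 2, 2]))        # mean about 1.67
```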

Table 6 Quality assessment matrix

The reliability of the quality ratings was measured for both assessments. For each assessment, the first author scored the quality indicators for all nine studies from their full texts. The second and third authors then independently rated the quality of all the papers between them (for the first assessment, the second author rated five studies and the third author four; for the second assessment, the second author rated four studies and the third author five). An agreement was counted when the two raters gave the same score for an assessed aspect of a study, and a disagreement when they gave different scores. Disagreements were resolved by re-examining each aspect separately and discussing it until a mutually accepted final score was reached, so the raters fully agreed on the final scores of all reviewed studies. Inter-rater agreement was 91.8% for Reichow et al.’s (2008) evaluative method and 86.6% for Terlektsi et al.’s (2019) quality assessment (individual components), both exceeding the 80% agreement recommended by Reichow et al. (2008).
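The agreement figures above are consistent with simple item-by-item percent agreement: agreements divided by agreements plus disagreements, times 100. The text does not spell out the formula, so treat this Python sketch as an assumption about how the percentages were computed; the ratings in the example are invented, and reading "80% recommended" as "at least 80%" is likewise an assumption.

```python
def percent_agreement(scores_a, scores_b):
    """Share of assessed aspects given the same score by both raters, as a
    percentage; every disagreement counts against the total."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same, non-empty set of aspects")
    agreements = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100 * agreements / len(scores_a)


# Hypothetical ratings of ten aspects by two raters (one disagreement).
rater_1 = [3, 2, 3, 1, 2, 3, 3, 2, 1, 2]
rater_2 = [3, 2, 2, 1, 2, 3, 3, 2, 1, 2]
agreement = percent_agreement(rater_1, rater_2)
meets_threshold = agreement >= 80  # assumed reading of the 80% criterion
print(agreement, meets_threshold)
```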

Assessment of the Effectiveness of the Interventions

The aim of assessing the interventions’ effectiveness was to determine the extent to which the interventions affect the skills (e.g., verbal responses) of children with autism. Investigating how effective each intervention was and which skills were improved informed the interpretation of results in the review and provided implications for practice and future research. The effect of interventions which followed an RCT was assessed following the process that was used by El Zein et al. (2013) in their systematic review on reading comprehension interventions. Their process was selected for this review because it assessed interventions for children with autism. El Zein et al. (2013) calculated the effect sizes of RCT using Hedges’ g formula (Hedges & Olkin, 1985):

$$\text{Hedges' } g = \frac{m_2 - m_1}{SD_{\text{pooled}}}$$
$$SD_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Here, m1 and m2 are the mean outcomes of the control and experimental groups, respectively; n1 and n2 are the sample sizes of the control and experimental groups; and s1 and s2 are their standard deviations.
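A minimal Python sketch of this computation, written so that a positive value favors the experimental group (experimental mean minus control mean). The function names and example numbers are hypothetical; the optional small-sample correction is included only because it is commonly applied to Hedges' g, not because the text states that it was used here.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two groups (sample sizes n, SDs s)."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def hedges_g(m_control, s_control, n_control, m_exp, s_exp, n_exp, correct=False):
    """Standardized mean difference, experimental minus control, over the pooled SD.

    With correct=True, the small-sample factor J = 1 - 3 / (4(n1 + n2) - 9)
    is applied; this is a common refinement, not part of the formula above.
    """
    g = (m_exp - m_control) / pooled_sd(n_control, s_control, n_exp, s_exp)
    if correct:
        g *= 1 - 3 / (4 * (n_control + n_exp) - 9)
    return g


# Hypothetical summary statistics (not from the reviewed RCT).
print(hedges_g(10, 4, 12, 14, 4, 13))                # 1.0
print(hedges_g(10, 4, 12, 14, 4, 13, correct=True))  # about 0.967
```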

Tau-U was used to assess the effectiveness of SCRD studies. Tau-U is a nonparametric approach that makes it possible to analyze the effect size of SCRD. It combines non-overlap between phases and within-phase trends (Parker et al., 2007). Tau-U was used because it has strong statistical power and is considered one of the most robust effect sizes, controlling for any undesirable upward baseline trend and aligning with visual analysis (Parker et al., 2011).
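To make the idea concrete, the Python sketch below computes the pairwise comparisons behind Tau and one widely cited form of Tau-U (A vs. B minus baseline trend, as described by Parker et al., 2011): the between-phase score S counts improving minus deteriorating baseline/treatment pairs, and the baseline's own Kendall S is subtracted before dividing by the number of between-phase pairs. It is illustrative only; the function names and data are hypothetical, and the Tau-U calculator used in this review may differ in details such as variance estimates and how individual results are combined.

```python
from itertools import combinations


def kendall_s_within(phase):
    """Kendall's S inside one phase: +1 for each later point above an earlier
    point, -1 for each later point below it, 0 for ties."""
    return sum((later > earlier) - (later < earlier)
               for earlier, later in combinations(phase, 2))


def s_between(baseline, treatment):
    """Non-overlap S across phases: compares every baseline point with every
    treatment point (+1 improvement, -1 deterioration, 0 tie)."""
    return sum((b > a) - (b < a) for a in baseline for b in treatment)


def tau_nonoverlap(baseline, treatment):
    """Tau for A vs. B without any trend control."""
    return s_between(baseline, treatment) / (len(baseline) * len(treatment))


def tau_u_baseline_corrected(baseline, treatment):
    """Tau-U for A vs. B minus baseline (A) trend."""
    s = s_between(baseline, treatment) - kendall_s_within(baseline)
    return s / (len(baseline) * len(treatment))


baseline = [1, 2, 3]   # hypothetical rising baseline
treatment = [4, 5, 6]
print(tau_nonoverlap(baseline, treatment))            # 1.0
print(tau_u_baseline_corrected(baseline, treatment))  # about 0.667
```

The rising baseline lowers the corrected value because part of the apparent improvement is attributed to the trend already present before intervention.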

To calculate Tau-U, data were extracted from the graphs in the reviewed studies using GetData Graph Digitizer (Version 2.5.9), software for digitizing graphs and plots. The following steps were then taken to calculate the overall effect size for each study. First, baseline trend was checked for all participants in all studies; following the recommendation of Vannest and Ninci (2015), baseline trend was corrected when it exceeded .20. Second, the effect size between the baseline and intervention phases was calculated for each individual. Finally, the individual effect sizes were combined into an overall effect size. All calculations were done using the Tau-U calculator (Vannest et al., 2016). Acknowledging that the interpretation of effect sizes depends on context, the following guidelines were used: an improvement of 0.14 was coded as a small effect, 0.35 as a moderate effect, and 0.80 and above as a large effect. These benchmarks were based on the 25th, 50th, and 75th percentiles of the social communication outcome domain of interventions for young children with ASD (Chow et al., 2023).
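The decision rule and benchmarks above can be sketched as follows. In this hypothetical Python helper, baseline trend is measured as Kendall's S within the baseline divided by the number of pairs, and only upward trends above .20 are flagged, on the assumption that the concern is an undesirable improving baseline. How effect sizes falling between the benchmark values are labeled is also an assumption, because the text gives point values rather than ranges. The actual corrections and combined estimates in this review came from the Tau-U calculator.

```python
from itertools import combinations


def baseline_trend(baseline):
    """Kendall-type trend within the baseline: S divided by the number of
    pairs, from -1 (steadily falling) to 1 (steadily rising)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    pairs = list(combinations(baseline, 2))
    s = sum((later > earlier) - (later < earlier) for earlier, later in pairs)
    return s / len(pairs)


def needs_trend_correction(baseline, threshold=0.20):
    """Flag baselines whose upward trend exceeds the threshold (assumed one-sided)."""
    return baseline_trend(baseline) > threshold


def benchmark_label(effect):
    """Map an effect size onto the benchmarks in the text (assumed banding)."""
    if effect >= 0.80:
        return "large"
    if effect >= 0.35:
        return "moderate"
    if effect >= 0.14:
        return "small"
    return "below the small benchmark"


print(needs_trend_correction([1, 2, 2, 3]))  # True: trend = 5/6
print(needs_trend_correction([3, 1, 2]))     # False: trend = -1/3
print([benchmark_label(e) for e in (0.9, 0.5, 0.2, 0.1)])
```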

Results

Research Design

Eight of the nine studies used SCRD: multiple baseline designs across participants (Fleury & Schwartz, 2017; Fleury et al., 2014; Kang, 2017; Queiroz et al., 2020; Whalon et al., 2015), a multiple-probe design across participants (Pierson et al., 2021), an ABAB design (Jackson & Hanline, 2019), and a repeated acquisition design (Whalon et al., 2016). One study used an RCT design (Lo & Shum, 2020).

Participants

A total of 53 children with autism participated in the reviewed studies; their characteristics are presented in Table 3. Participants’ ages ranged from 3 to 8 years, with an average age of 55.4 months. Lo and Shum (2020) had 31 participants, six of whom did not have a diagnosis or an educational determination of autism but displayed significant autism symptoms as reported by clinicians. These six participants were therefore excluded from this review. In addition, because Lo and Shum (2020) reported age and gender only for all 31 participants combined, none of their participants were included in the gender and average age data. In terms of gender, 25 (47.2%) of the participants were male, 3 (5.7%) were female, and the gender of the remaining 25 (47.2%) was unknown (Lo & Shum, 2020).

Information on participants’ language/communication skills varied across the reviewed studies. Some studies applied a language/communication criterion (e.g., using phrases of at least 2–3 words), while others described participants’ language/communication abilities with or without assessment scores. In general, the information provided indicated that at least the majority of participants had language/communication abilities below those of their peers. Two studies reported that all or most of their participants had a language delay (Lo & Shum, 2020; Pierson et al., 2021). Table 3 presents more information.

Interventions

The studies were divided into two categories according to the intervention used: DR and adapted versions of DR (see Table 3). Only one study used DR without adaptations (Fleury et al., 2014). The remaining eight studies used adapted versions of DR to target the outcome measures and to incorporate practices known to support the learning of children with autism. Five of them used RECALL (Jackson & Hanline, 2019; Kang, 2017; Lo & Shum, 2020; Whalon et al., 2016; Whalon et al., 2015). RECALL combines DR procedures with a four-level least-to-most prompting hierarchy with visual supports. If the child does not respond or responds incorrectly, he or she is provided with three visual response options (level 1). If no correct response occurs, the child is provided with a binary choice (level 2). If the child still fails to respond correctly, he or she is given a direct model and asked to repeat it (level 3). If the child does not imitate the model, he or she is physically guided to point to the correct response (level 4). In addition to the RECALL prompting hierarchy, Jackson and Hanline (2019) used a concept map, a visual support that presents a topic’s main idea and provides visual links showing how the main idea is related to other concepts. Pierson et al. (2021) used an adapted DR with a system of least prompts with visual supports. When the child does not answer correctly, parents use:

“(a) provision of answer choices for participants with greater communication needs; (b) verbal prompts such as repeating the question, redirection to the task, or rephrasing the question; (c) reduction of answer choices; (d) gestural prompts such as pointing to the picture of the correct answer while verbally saying the correct answer; and (e) full physical prompts moving the child’s hand to the picture of the correct answer.” (Pierson et al., 2021, p.120).

One study (Queiroz et al., 2020) used an adapted DR, similar to RECALL, in which DR was combined with a three-level least-to-most verbal prompting hierarchy. At level 1, the question is restated to the child. At level 2, the adult prompts the child to complete an utterance that contains the answer. At level 3, the adult models the correct answer. Fleury and Schwartz (2017) used an adapted DR in which additional prompts were added to the CROWD prompts. These additional prompts, called “special prompts,” were used by the adult when the child had difficulty answering CROWD prompts. Special prompts included (a) providing the child with a choice of two responses, (b) asking the child a yes/no question, (c) requesting the child to repeat a target word, and (d) asking the child to point to the correct picture.
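The least-to-most logic shared by RECALL and the Queiroz et al. (2020) adaptation, moving to the next, more supportive level only after an incorrect or absent response, can be sketched as a simple loop. This Python sketch is illustrative: the level descriptions paraphrase the text, `least_to_most` and the simulated child are hypothetical, and real sessions involve adult judgement that a function cannot capture.

```python
RECALL_LEVELS = [
    "three visual response options",
    "binary choice",
    "direct model, child asked to repeat",
    "physical guidance to point to the correct response",
]

QUEIROZ_LEVELS = [
    "restate the question",
    "prompt the child to complete an utterance",
    "model the correct answer",
]


def least_to_most(responds_correctly, levels):
    """Escalate support one level at a time until the child responds correctly.

    responds_correctly(support) -> bool, where support is None for the initial
    CROWD prompt with no added support. Returns (level_number, support);
    level 0 means no extra support was needed, (None, None) means all levels
    were used without a correct response.
    """
    if responds_correctly(None):
        return 0, None
    for number, support in enumerate(levels, start=1):
        if responds_correctly(support):
            return number, support
    return None, None


# Hypothetical child who succeeds once offered a binary choice or more support.
def simulated_child(support):
    return support is not None and RECALL_LEVELS.index(support) >= 1


print(least_to_most(simulated_child, RECALL_LEVELS))  # (2, 'binary choice')
```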

The studies also varied in how often the same book was used in the intervention. Three studies used each book twice (Fleury & Schwartz, 2017; Fleury et al., 2014; Lo & Shum, 2020), while another three studies used the same book three times before moving to another one (Pierson et al., 2021; Queiroz et al., 2020; Whalon et al., 2015). Kang (2017) used each book two or three times. Only one study (Jackson & Hanline, 2019) used a new book in each session.

Intervenors

Teachers, researchers, and parents implemented the interventions in the reviewed studies (see Table 3). In two studies, the interventions were delivered by teachers and teaching assistants working in the children’s schools (Fleury & Schwartz, 2017; Kang, 2017), whereas researchers implemented the intervention in four studies (Fleury et al., 2014; Jackson & Hanline, 2019; Queiroz et al., 2020; Whalon et al., 2015). Finally, Pierson et al. (2021), Lo and Shum (2020), and Whalon et al. (2016) recruited parents to deliver the intervention. Although the interventions in Lo and Shum (2020) and Whalon et al. (2016) were delivered by parents, children’s responses were coded in separate reading sessions conducted by the researchers.

Intervention Training

The papers were divided into three categories based on the training protocol they reported: (a) four studies reported a detailed training protocol (Fleury & Schwartz, 2017; Kang, 2017; Lo & Shum, 2020; Pierson et al., 2021), (b) three studies reported a brief training protocol (Queiroz et al., 2020; Whalon et al., 2016; Whalon et al., 2015), and (c) two studies did not report a training protocol at all (Fleury et al., 2014; Jackson & Hanline, 2019). A study was classified as having a detailed training protocol when it reported more than one element of training, and as having a brief training protocol when it reported only one element. The studies in the first category described the training strategies used, the training duration, and the fidelity of implementation. The training strategies included explicit instruction, PowerPoint presentations, videos modelling the intervention, rehearsal, live demonstration and live coaching, and feedback. Training lasted between 1 and 4 days, with sessions lasting between 1 and 4 hours. Additionally, these four papers reported that the intervenors received feedback and coaching during the intervention phase (Fleury & Schwartz, 2017; Kang, 2017; Lo & Shum, 2020; Pierson et al., 2021). The studies in the second category provided a brief training protocol that mentioned only the training strategies without further details (e.g., direct instruction, video modelling of the intervention, and/or role-play practice). While most of the studies used in-person training, Pierson et al. (2021) trained parents via telepractice.

Fidelity of Implementation

All the papers provided information about fidelity of implementation, although the level of detail varied across studies. Seven studies described how fidelity was measured and how estimates were calculated (Fleury & Schwartz, 2017; Jackson & Hanline, 2019; Kang, 2017; Pierson et al., 2021; Queiroz et al., 2020; Whalon et al., 2016; Whalon et al., 2015). Fidelity estimates in these studies were high (86.9–100%). Fleury et al. (2014) also reported a fidelity estimate of 100%, but collected fidelity data only on the use of DR prompts (CROWD). In contrast, Lo and Shum (2020) only mentioned that the research team contacted parents by phone on a biweekly basis to monitor intervention integrity, and the study did not provide estimates of procedural fidelity.

Intervention Duration, Settings, and Adult/Child Ratio

The reviewed studies varied in the information they reported about intervention duration. Only one study provided no information on the duration of the intervention (Kang, 2017). The remaining eight studies reported at least one of the following: the number of sessions, the total duration of all sessions, the duration of each condition (baseline and intervention), or the duration of the entire intervention. Overall, interventions lasted between 4 and 12 weeks, with one to five sessions per week. Three studies (Jackson & Hanline, 2019; Lo & Shum, 2020; Queiroz et al., 2020) reported session duration, which ranged from 6 to 20 min.

With regard to setting, in five studies the intervention was conducted at school (Fleury & Schwartz, 2017; Fleury et al., 2014; Kang, 2017; Queiroz et al., 2020; Whalon et al., 2015), whereas in three studies the intervention was implemented at home (Lo & Shum, 2020; Pierson et al., 2021; Whalon et al., 2016). Finally, in Jackson and Hanline’s (2019) study, the intervention was conducted at a therapy center and a school for one child, and at home for the other child.

With respect to the adult/child ratio, the intervention was delivered one-to-one in seven studies (Fleury et al., 2014; Jackson & Hanline, 2019; Kang, 2017; Lo & Shum, 2020; Pierson et al., 2021; Queiroz et al., 2020; Whalon et al., 2016) and in small groups of two to five children in two studies (Fleury & Schwartz, 2017; Whalon et al., 2015). The reported ratios include all participating children, with and without autism.

Dependent Variables

While DR might affect a variety of skills, this review focuses only on communicative initiations and responses. The reviewed studies examined the effect of DR on a number of children’s communication skills. Specifically, all nine reviewed studies measured children’s verbal communicative acts. Five of them examined verbal responses (Fleury et al., 2014; Jackson & Hanline, 2019; Lo & Shum, 2020; Pierson et al., 2021; Whalon et al., 2016), and the other four measured both verbal responses and initiations (Fleury & Schwartz, 2017; Kang, 2017; Queiroz et al., 2020; Whalon et al., 2015). However, the studies did not define verbal responses in the same way. While most of the studies captured only correct responses, two studies (Fleury & Schwartz, 2017; Fleury et al., 2014) also captured incorrect responses if they were on topic. In contrast, all the studies that measured verbal initiations agreed that initiations had to be on topic to be captured. Nonverbal communicative acts (i.e., joint attention and pointing to show or share information) were measured in only two studies (Queiroz et al., 2020; Whalon et al., 2015).

Overall, all studies but one (Pierson et al., 2021, which reported inconsistent effects of the intervention across participants) presented promising results, indicating that the intervention had an impact on children’s communicative initiations and responses. These studies concluded that DR is a promising shared reading practice for children with autism. The effects of the interventions on each dependent variable are detailed below.

Verbal Communicative Responses and Initiations

Eight of the nine studies that measured responses to adults’ prompts reported that the intervention increased children’s verbal responses (Fleury & Schwartz, 2017; Fleury et al., 2014; Jackson & Hanline, 2019; Kang, 2017; Lo & Shum, 2020; Queiroz et al., 2020; Whalon et al., 2016; Whalon et al., 2015). Seven of these studies used SCRD, with Tau-U scores indicating moderate to large effects (see Table 3) and an overall Tau-U of .68, indicating a moderate effect. However, the effect size in the only group design study (Lo & Shum, 2020) was .07 and was not statistically significant. Pierson et al. (2021) reported inconsistent effects of the intervention on children’s responses (unprompted and prompted), with Tau-U scores of .048 for unprompted responses and −.02 for prompted responses.

Results for verbal initiations were mixed. Two studies reported increases in children’s verbal initiations (Kang, 2017; Whalon et al., 2015), with small to moderate effect sizes. By contrast, Tau-U values for Queiroz et al. (2020) and Fleury and Schwartz (2017) ranged from −.19 to −.12. The overall Tau-U (.25) indicated a small effect.

Nonverbal Communicative Responses and Initiations

Six studies included nonverbal responses in the prompting hierarchy when measuring the effect of the interventions (Fleury & Schwartz, 2017; Jackson & Hanline, 2019; Kang, 2017; Lo & Shum, 2020; Whalon et al., 2016; Whalon et al., 2015). However, in these studies nonverbal responses were combined with verbal responses, because a nonverbal response formed one level of the four-level prompting hierarchy. Therefore, no separate data on children’s nonverbal responses were reported.

In addition, Queiroz et al. (2020) and Whalon et al. (2015) examined the effect of their interventions on children’s nonverbal initiations. Queiroz et al. (2020) did not describe how they measured children’s nonverbal initiations and reported only that no effect was found (Tau-U = −.45). In contrast, Whalon et al. (2015) measured nonverbal initiations by counting the number of times each child exhibited nonverbal communication skills, including joint attention and pointing to show or share information. In that study, three children showed an increase in nonverbal initiations, whereas the fourth child showed no change; the Tau-U effect size was .35, indicating a moderate change. The overall Tau-U across Queiroz et al.’s (2020) and Whalon et al.’s (2015) studies was −.05.

Quality Assessment

According to Reichow et al.’s (2008) evaluative method, five studies had strong research report strength (Fleury & Schwartz, 2017; Kang, 2017; Lo & Shum, 2020; Pierson et al., 2021; Whalon et al., 2015). The other four studies (Fleury et al., 2014; Jackson & Hanline, 2019; Queiroz et al., 2020; Whalon et al., 2016) had adequate research report strength. On Terlektsi et al.’s (2019) assessment, all studies had a quality score between 2 and 3, indicating moderate to strong quality. The studies’ scores on both assessments are presented in Table 3.

Discussion

To the authors’ knowledge, this is the first systematic literature review of DR interventions for children with autism that examines their effect on children’s verbal and nonverbal communicative responses and initiations.

Characteristics of DR Interventions

The review identified two types of DR interventions used with children with autism: the original DR and adapted versions of DR. The results showed that both types were equally effective for children’s verbal responses. The adapted versions of DR included practices known to further support the learning of children with autism (Whalon et al., 2013). Moreover, all the adaptations except one (the concept map) were also intended to affect the outcomes measured (initiations and responses), which resulted in an improved effect. For example, among these adaptations, visual prompts in particular had positive results for children’s verbal and nonverbal initiations. This finding is expected, as individuals with autism tend to process and understand visually supported information more easily (Rao & Gagie, 2006).

Intervention duration varied between 4 and 12 weeks, with one to five sessions per week. All the interventions in this review except one (Pierson et al., 2021) were effective, including the one with the shortest duration. This result agrees with Boyle et al. (2019), whose review indicated that even a small number of shared reading sessions can benefit children with autism.

Another characteristic this review examined was the intervenors. Parents implemented the interventions in only three studies (Lo & Shum, 2020; Pierson et al., 2021; Whalon et al., 2016). A similar finding was reported in a systematic review of shared reading interventions for children with autism (Boyle et al., 2019), in which only two of the 11 reviewed studies involved parents as intervenors. Boyle et al. (2019) suggested that natural intervention agents such as parents and teachers should be encouraged to carry out interventions. The small number of DR interventions delivered by parents might be explained by the fact that, despite an evidence base for home-based, parent-mediated literacy interventions in early childhood with neurotypical populations (Barone et al., 2019), parental involvement in interventions for children with autism remains limited. Most shared reading and literacy interventions in the field of autism are school-based rather than home-based.

Finally, one study used telepractice to train and coach parents to implement DR with their children (Pierson et al., 2021). Training parents via telepractice is a particularly timely topic in light of the interruptions and restrictions caused by COVID-19 (Watkins et al., 2021). Telepractice has multiple benefits, including extending the reach of interventions across the globe and to rural locations, cost-effectiveness, and time convenience (Kossyvaki et al., 2022). On the other hand, some families might face barriers to accessing telepractice interventions, including time and participation constraints, technology deficits, and setting challenges (Frederick et al., 2022).

Characteristics of the Outcomes Measured and the Observed Effect

While it is well known that a considerable proportion (25–30%) of individuals with autism do not develop spoken language (Anderson et al., 2007; Bacon et al., 2019), eight of the nine reviewed studies measured aspects of spoken communication, and all participants in these eight studies were verbal. In most of these studies, verbal ability was an explicit inclusion criterion. Where it was not stated as such, the studies’ research questions or aims indicated that they also targeted participants with verbal abilities. One explanation for excluding nonverbal participants might be that DR was originally developed to improve children’s spoken language skills (Whitehurst et al., 1988). However, research appears to have recently begun to address this issue. Pierson et al. (2021), the most recent study in the review, included one nonverbal child and added low-tech augmentative and alternative communication (AAC) as a response mode (picture answer choices). Similarly, Storie et al. (2021) described how to pair DR with technology-enhanced AAC devices so that children with autism who have limited verbal communication can participate in DR activities.

All nine studies in this review examined the effect of DR on children’s verbal communication, using informal assessments created by the authors. Eight of the nine studies reported an increase in children’s verbal responses, which is consistent with previous DR studies with neurotypical children (e.g., Crain-Thoreson & Dale, 1999; Whitehurst et al., 1994). This is a promising result, considering that most of the participants had lower language/communication abilities than their peers. In contrast, one study (Pierson et al., 2021) found that DR had an inconsistent effect on children’s verbal responses. Pierson et al. (2021) stated that this finding might be due to the complexity of the intervention. Parents were asked to set up the book reading session, begin with anticipatory set procedures, implement the DR components, prompt using a system of least prompts after evaluating the child’s response, and manage the child’s behavior during the session. It might therefore have been difficult for parents to implement this multicomponent intervention without additional support. Indeed, parents were not able to implement some DR components and could not demonstrate 100% use of both the anticipatory set procedures and PEER (Watkins et al., 2021). Reducing the complexity of the intervention or adding telepractice coaching sessions may therefore improve parents’ implementation of DR, which could in turn improve children’s verbal responses (Watkins et al., 2021).

In contrast to verbal responses, results for verbal initiations were mixed. This might be attributed to the difficulties children with autism often have in initiating communication (Bacon et al., 2019; Stone et al., 1997). Another explanation could be that DR, through its use of prompts, was designed to encourage children to participate by answering questions rather than by initiating communication. Nonetheless, Fleury and Schwartz (2017) reported that shared reading can be used to teach children with autism to initiate communication (asking questions and making comments).

In addition to verbal communication, children’s nonverbal communication was measured in two studies (Queiroz et al., 2020; Whalon et al., 2015). Tau-U scores indicated a moderate effect for one study (Whalon et al., 2015) and no effect for the other (Queiroz et al., 2020). The better results of Whalon et al. (2015) might be explained by their addition of visual prompts to DR and by their younger sample (aged 4 to 5 years) compared with that of Queiroz et al. (aged 7 years). This result is consistent with previous findings indicating that DR is likely to work better with younger children (Mol et al., 2008). Indeed, DR was originally developed for preschool-age children (Whitehurst et al., 1994; Whitehurst et al., 1988).

Characteristics of the Research Quality

The two complementary assessments used to evaluate the quality of the studies (Reichow et al., 2008; Terlektsi et al., 2019) agreed in their evaluation of the studies. All reviewed studies had strong or adequate research report strength and provided evidence of moderate to strong quality. The studies rated as having adequate research report strength had issues with their participant characteristics or experimental control, or failed to meet many of the secondary quality indicators (e.g., fidelity, blind raters, and generalization). However, although a few studies had issues with participant characteristics, the review ensured that all studies included children with ASD, in line with the review’s inclusion criteria and Reichow et al.’s (2008) evaluative method for assessing participant characteristics (e.g., age, gender, and specific diagnostic information). In addition, all nine studies described the interventions in sufficient detail for practitioners to use them and for future research to replicate them. The studies had ecological and social validity and reported extensive results and analyses. Indeed, four of the SCRD studies in this review and the RCT study had strong research report strength, which, according to Reichow et al.’s (2008) evaluative method, indicates that DR has the empirical evidence needed to be considered an evidence-based practice for children with autism.

Group research designs were sparse in this review; only one study (Lo & Shum, 2020) used an RCT. Although the RCT is considered the gold standard of research designs, there are still arguments against using it with heterogeneous populations such as children with autism (Horner et al., 2005). Several researchers regard individualized variation in interventions as best practice for participants with autism (Barton, Lawrence, & Deurloo, 2011; Delmolino & Harris, 2011), because such variation contributes substantially to understanding individuals’ responses to interventions (Bulkeley et al., 2013).

Implications for Future Research

Research on shared reading for children with autism is still not as well established as it is for neurotypical children, which increases the need for more studies. The reviewed DR intervention studies on autism are also few and have small samples, making the need for more studies with larger samples pressing. Future research on DR should not limit its focus to DR’s impact on children’s verbal communication. Children’s nonverbal communication should also be included as a dependent variable, as this will provide opportunities to use DR with both verbal and nonverbal children and to address the needs of a wider cohort. In this review, only one study included a nonverbal child in its sample, which highlights the need to examine the effect of DR on nonverbal children with autism. Finally, parent-implemented shared reading interventions for children with autism are limited (Whalon et al., 2016); only three studies in this review were implemented by parents. In contrast, there are many DR interventions with parents of neurotypical children (Mol et al., 2008). Future research should therefore focus on training parents to use DR with their children with autism.

Implications for Practice

According to this review, DR is a promising shared reading practice for children with autism. The variety of DR adaptations allows teachers to choose the version most beneficial for their students. The adaptations include a least-to-most prompting hierarchy, a concept map, special prompts, and visual supports. Teachers are also encouraged to consider other DR adaptations that may be needed given the heterogeneous nature of their students with autism (Bulkeley et al., 2013). Moreover, when using DR, teachers are encouraged to teach children how to initiate communication in the shared reading context, a skill that has not been widely explored in the reviewed studies. Given the limited use of DR by parents of children with autism, teachers also need to support parents in using DR with their children at home and provide them with training.

Limitations

As with other reviews, the current review has some limitations. The first concerns the effectiveness of DR interventions, especially adapted DR interventions. All the adaptations applied in the reviewed studies were evidence-based practices for children with autism, so the effect of these interventions might be due to the adaptations rather than to DR itself. Second, the review used two electronic databases, one hand search in a generic database, a set of three keywords, and specific search terms. Using more databases and hand searches, or different and/or additional keywords and search terms, might have yielded more and/or different papers. However, it is unlikely that including other databases would have identified more papers, as the two chosen databases comprehensively cover the literature in psychology and education, the two disciplines most relevant to this topic.

The third limitation concerns differences in how the reviewed studies captured responses: most of the studies coded only correct responses, while two studies coded on-topic responses regardless of whether they were correct. This might mean that most of the studies did not fully capture the communicative responses between the intervenors and the children. A related limitation concerns differences in how the interventions were delivered. In most studies, the dependent variables were collected while the intervenors were delivering the intervention, whereas in two studies (Lo & Shum, 2020; Whalon et al., 2016) they were collected not in the intervenors’ sessions but in separate reading sessions conducted by researchers. While this might increase the ecological validity of those studies, it could also affect the observed effectiveness of the intervention. A similar issue applies to how often the same book was used in the intervention: one study (Jackson & Hanline, 2019) used a new book for each reading session, while the other studies used the same book two or three times.

The fourth limitation concerns Terlektsi et al.’s (2019) quality assessment. The matrix deducts points based on the number of participants when scoring the sample size and generalizability components, because it treats a small sample size as a limitation. Thus, all the SCRD studies except one (Fleury & Schwartz, 2017) received low scores on these components because they had fewer than five participants. This might be a limitation of the matrix, as the number of participants is not a relevant indicator of quality in SCRD studies. However, the review did not rely solely on Terlektsi et al.’s (2019) assessment; it also used a second method (Reichow et al., 2008) to evaluate the quality of the studies.

Finally, it is noteworthy that this review is a systematic review, not a meta-analysis. A meta-analysis is a quantitative approach that statistically analyses and combines the results of similar studies (Hedges & Cooper, 2009). A meta-analysis was not possible because the studies in the current review measured different outcomes that could not be combined (Ahn & Kang, 2018).

Conclusion

To the authors’ knowledge, this is the first systematic review focusing on DR interventions for children with autism, and more specifically on their effect on children’s verbal and nonverbal communicative responses and initiations. The review included nine studies investigating the original and adapted versions of DR. All reviewed studies provided evidence of moderate to strong quality. The review found an increase in children’s verbal responses and mixed results for verbal and nonverbal initiations. Although some effects were inconsistent, the review concluded that DR is a promising shared reading intervention that can benefit children with autism.