Introduction

A hallmark characteristic of children with autism spectrum disorder (ASD) is difficulty in developing language skills (Kjelgaard & Tager-Flusberg, 2001). Among the different domains of language, pragmatics (the social use of language) is commonly regarded as the most affected (Geurts & Embrechts, 2008). For example, children with ASD often show limited verbal and/or nonverbal initiations and responses in social interactions (Drew et al., 2007; Tager-Flusberg et al., 2005). Language difficulties among children with ASD also manifest in other linguistic domains, such as semantics (e.g., reduced vocabulary diversity and productivity), morphology (e.g., reduced use of morphemes), and syntax (e.g., limited use of sentence structures) (Boucher, 2012). A variety of measures, in the form of standardized tests or language sample analysis, have been developed to evaluate these skills (Tager-Flusberg et al., 2009). Although language difficulties have been removed from the diagnostic criteria for ASD (American Psychiatric Association, 2013), language continues to receive much attention from researchers and therapists because it is closely related to multiple areas of development in children with ASD, such as challenging behaviors (Matson et al., 2009) and anxiety (Barokova & Tager-Flusberg, 2020).

Timely and effective intervention is recommended to support the development of children with ASD. While the demand for services is immense (Brown et al., 2011), service delivery is often limited by geographical distance and travel costs (American Medical Association, 2020). In the past two decades, telehealth has emerged as an alternative to traditional face-to-face clinical services. Telehealth allows therapists to deliver services using various telecommunication technologies, such as synchronous audiovisual technologies and asynchronous transmission of therapy content (Center for Connected Health Policy, n.d.). It bridges geographical gaps and reduces client families’ financial burdens. The widespread use of digital devices and the internet has made telehealth services increasingly accessible: by 2018, 92% of U.S. households had at least one type of computing device (e.g., desktop, laptop, smartphone, or tablet), and 85% had a broadband internet subscription (U.S. Census Bureau, 2021). Since the outbreak of the COVID-19 pandemic, telehealth has become essential for meeting service needs when the need for social distancing has severely restricted in-person visits (Tohidast et al., 2020).

Despite its benefits, telehealth poses challenges for therapists and client families that may influence its effectiveness. For example, Scott Kruse et al. (2018) reviewed barriers to adopting telehealth across different telehealth fields worldwide; commonly reported barriers included limited technology literacy, limited internet access, and resistance to change. Other barriers reported in the literature include challenges in building rapport due to the lack of physical proximity (Akamoglu et al., 2018) and low confidence resulting from a lack of training (Hao et al., 2021b). Given these challenges and the growing use of telehealth for families of children with ASD, particularly since the outbreak of the COVID-19 pandemic, there is a pressing need to evaluate its effectiveness and the level of evidence supporting its use.

A few systematic reviews have summarized the general effectiveness of telehealth in improving children’s behaviors (Aresti-Bartolome & Garcia-Zapirain, 2014; Knutsen et al., 2016; Sutherland et al., 2018). However, the intervention programs and/or outcomes they covered were wide-ranging. For example, Ellison et al. (2021) reviewed a variety of telehealth intervention programs, including, but not limited to, cognitive behavioral therapy, functional communication training, applied behavior analysis, and social communication therapy. As a result, a wide range of outcomes (e.g., anxiety, challenging behaviors, sleep, attention, engagement, and language) was reported only in general terms. In addition, none of these reviews focused on language outcomes or used meta-analysis to quantify children’s changes from before to after telehealth intervention.

Social communication is a broad area in which therapists from different fields, such as speech-language pathologists, behavior analysts, and psychologists, support children with ASD (e.g., American Psychological Association, 2022; American Speech-Language-Hearing Association, n.d.-a). Language difficulties are widespread among children with ASD and limit the development of other skills. Therefore, language outcomes are an important focus for therapists of diverse disciplines who work with children with ASD (e.g., Behavior Analyst Certification Board, 2021). This study was a focused systematic review and meta-analysis investigating the effectiveness of telehealth-based social communication interventions on language skills among children with ASD. To examine the effectiveness of these telehealth intervention programs in a more fine-grained way, we categorized language outcomes into linguistic domains (i.e., phonology, morphology, syntax, semantics, and pragmatics). A quality assessment was conducted to evaluate the research rigor of the existing studies. In addition, participant and intervention characteristics were summarized.

Methods

Article Search

The systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009). In October 2020, the authors conducted a keyword search to identify relevant articles in five databases: PsycINFO, PubMed, CINAHL, ERIC (EBSCO), and Psychology and Behavioral Sciences Collection. Three clusters of search terms were applied. The first cluster defined ASD (i.e., ASD OR autis* OR development* disab* OR autistic disorder OR PDD-NOS OR Asperger). The second cluster identified telehealth studies; it included all the terms used in nine previous reviews of telehealth among children with ASD (e.g., telehealth, telemedicine, and teleconferenc*) (Akamoglu et al., 2020; Aresti-Bartolome & Garcia-Zapirain, 2014; Boisvert & Hall, 2014; Boisvert et al., 2010; Ferguson et al., 2019; Knutsen et al., 2016; Meadan & Daczewitz, 2015; Neely et al., 2017; Parsons et al., 2017). The third cluster defined the targeted intervention outcomes, children’s language skills, following Tager-Flusberg et al.’s (2009) framework of language development (i.e., gesture* OR word* OR vocabulary OR sentence* OR morpholog* OR language OR gramma*). For a full list of the three search clusters, see Appendix 1.
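The three-cluster search described above can be sketched as a Boolean query builder. This is a minimal Python sketch under stated assumptions: terms within a cluster are joined with OR and the clusters with AND (the usual way such clusters are combined, though the text does not spell out the operators); the telehealth cluster holds only the three example terms quoted above, because the full list is in Appendix 1; and phrase and truncation syntax (quotes, `*`) differs between database interfaces, so the generated string would need adapting to each one. The constant and function names are hypothetical.

```python
# Hypothetical sketch: combine three term clusters into one Boolean query string.
# Assumption: OR within a cluster, AND between clusters.

ASD_TERMS = ["ASD", "autis*", "development* disab*", "autistic disorder",
             "PDD-NOS", "Asperger"]
TELEHEALTH_TERMS = ["telehealth", "telemedicine", "teleconferenc*"]  # examples only; full list in Appendix 1
LANGUAGE_TERMS = ["gesture*", "word*", "vocabulary", "sentence*",
                  "morpholog*", "language", "gramma*"]


def quote(term: str) -> str:
    """Wrap multiword terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(*clusters: list[str]) -> str:
    """OR the terms inside each cluster, then AND the parenthesized clusters together."""
    groups = ["(" + " OR ".join(quote(t) for t in cluster) + ")" for cluster in clusters]
    return " AND ".join(groups)


query = build_query(ASD_TERMS, TELEHEALTH_TERMS, LANGUAGE_TERMS)
print(query)
```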

Because the COVID-19 pandemic could increase interest in and attention to telehealth, we repeated the same database search procedure with the same search terms in July 2021, targeting studies published during the previous year, to capture newly published telehealth studies within our research focus. We did not restrict the publication date but included only articles published in peer-reviewed English-language journals. The database search was supplemented by reviewing the references of the nine review articles mentioned above (which had different focuses from the current review but addressed telehealth among children with ASD). The study protocol was registered with the Open Science Framework.

Eligibility Criteria

To be included in the review, a study needed to include a telehealth intervention that (a) served children diagnosed with ASD (not at risk for ASD) aged 0 to 18 years; (b) was delivered via synchronous online video conferencing, an asynchronous web-based tutorial, or a combination of synchronous, asynchronous, or in-person training (American Speech-Language-Hearing Association, n.d.-b); (c) did not consist solely of mailing DVDs, USB drives, or printed instructions to the home for caregivers/children to learn on their own; (d) primarily targeted social communication skills; (e) did not focus exclusively on disruptive behaviors or anxiety; and (f) had at least one outcome measure of children’s language skills, derived from spontaneous language sample analysis and/or standardized tests. Because intervention requires extensive time with each participant, small sample sizes and a variety of study designs were anticipated. Therefore, no minimum sample size or specific study design was required.

Study Selection and Data Extraction

Study selection included title screening, abstract screening, and full-text assessment. In a Microsoft Excel spreadsheet, included studies were marked “Y,” and excluded studies were marked “N” with a brief rationale. Duplicates were identified by searching for the title of each study with Excel’s “find” function. After all duplicates were removed, two research assistants (RAs) independently screened the titles against the eligibility criteria and then met virtually to resolve disagreements through discussion. If eligibility could not be determined from the title, the two RAs marked the article “Y” so that the abstract could inform the eligibility decision. After excluding all ineligible studies, the two RAs screened the abstracts following the same procedure and excluded further studies that did not meet the eligibility criteria. Finally, the first and second authors independently assessed the full texts, resolved disagreements by reviewing the full texts together against the eligibility criteria, and determined the final list of studies.
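As a scripted alternative to searching each title with Excel’s “find” function, duplicates can be flagged by normalizing titles before comparing them. The Python sketch below is hypothetical (the record fields `title` and `db` and the example records are illustrative, not taken from the review) and catches only records whose titles match after lowercasing and removing accents, punctuation, and extra whitespace; near-duplicates with typos or different subtitles would still need a manual check.

```python
# Hypothetical sketch: flag duplicate search records by normalized title.
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Keep the first record for each normalized title; return (kept, duplicates)."""
    seen: set[str] = set()
    kept, duplicates = [], []
    for record in records:
        key = normalize_title(record["title"])
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            kept.append(record)
    return kept, duplicates


# Illustrative records exported from two databases.
records = [
    {"title": "Telehealth parent training for autism", "db": "PubMed"},
    {"title": "Telehealth Parent Training for Autism.", "db": "PsycINFO"},
    {"title": "A different study", "db": "ERIC"},
]
kept, dups = deduplicate(records)
print(len(kept), len(dups))  # 2 1
```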

After the eligible articles were identified, a table was developed for data extraction. The first and second authors independently extracted data from the identified articles, including study design, participant characteristics (i.e., sample size, direct recipients of intervention, child age and gender, and parent education), intervention characteristics (i.e., country of study, intervention program, length of intervention, and telehealth type), and child language outcomes.

Meta-analysis

Meta-analysis was planned for each linguistic domain (phonology, morphology, syntax, semantics, and pragmatics) using pre- and post-intervention data. For details about how language measures were categorized, see Appendix 3. For group studies, we extracted pre- and post-intervention means and SDs. For single-subject experimental design (SSED) studies, we extracted each child’s baseline and post-intervention means and then computed group means and SDs from these individual data. We did not include follow-up data, as they relate to the maintenance of intervention effects, which is beyond the current research focus. Nine studies presented the relevant data only in charts or as ranges (not means and SDs), from which we could not extract the needed values. We contacted the corresponding authors to request the data and received data for four studies; the remaining five studies could not be included in the meta-analysis.
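The SSED aggregation step described above can be sketched as follows, assuming each child’s baseline and post-intervention means are computed over that child’s sessions and the group SD is the sample SD (n − 1 denominator) of the child-level means. The function name and the session scores are hypothetical; in practice the child-level means would come from the published or author-provided data.

```python
# Hypothetical sketch: collapse SSED session data into group-level summaries.
from statistics import mean, stdev


def aggregate_ssed(children: dict[str, dict[str, list[float]]]) -> dict[str, tuple[float, float, int]]:
    """Return (mean, SD, n) per phase from child-level means.

    Each child contributes one mean per phase; the group SD is the sample SD
    of those child-level means (requires at least two children).
    """
    summary = {}
    for phase in ("baseline", "post"):
        child_means = [mean(sessions[phase]) for sessions in children.values()]
        summary[phase] = (mean(child_means), stdev(child_means), len(child_means))
    return summary


# Illustrative session scores (e.g., counts of initiations per probe).
children = {
    "child_A": {"baseline": [2, 3, 1], "post": [8, 9, 10]},
    "child_B": {"baseline": [4, 4], "post": [6, 8]},
    "child_C": {"baseline": [0, 1, 2], "post": [5, 7]},
}
summary = aggregate_ssed(children)
print(summary)
```

Averaging within each child first gives every child equal weight regardless of how many probes they completed, which matches the description of group statistics “based on individual data.”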

The meta-analysis was conducted with the metafor R package (Viechtbauer, 2010). Given the heterogeneity across studies, effect sizes were pooled using a random-effects model (Borenstein et al., 2009), and Hedges’ g was used to quantify intervention effects. Heterogeneity between study results was assessed with the I² statistic, with values of 50% and above indicating considerable heterogeneity (Higgins et al., 2003). To estimate the potential influence of unincluded studies, publication bias was assessed with a rank correlation test for funnel plot asymmetry (the funnel plot represents the distribution of studies), Egger’s regression test for funnel plot asymmetry (Egger et al., 1997), and the trim-and-fill method, which estimates the number of missing studies (Duval & Tweedie, 2000). Data that could not be included in the meta-analysis were summarized through visual inspection.
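The pooling steps named above can be illustrated with a small, dependency-free Python sketch. It is not the authors’ metafor code, and several choices are assumptions: pre- and post-intervention scores are treated as two independent groups with a pooled SD (a change-score approach would also need the pre-post correlation); Hedges’ small-sample correction uses the common approximation J = 1 − 3/(4·df − 1); between-study variance is estimated with the DerSimonian–Laird method, whereas metafor’s `rma()` defaults to REML, so estimates may differ slightly; and Egger’s test is shown in its classical form (ordinary least squares regression of g/SE on 1/SE, testing the intercept). The rank correlation test and trim-and-fill are not sketched, and the study summaries are invented.

```python
import math


def hedges_g(m_pre, sd_pre, n_pre, m_post, sd_post, n_post):
    """Bias-corrected standardized mean difference (post minus pre) and its variance.

    Assumption: pre and post scores are treated as independent groups.
    """
    df = n_pre + n_post - 2
    s_pooled = math.sqrt(((n_pre - 1) * sd_pre**2 + (n_post - 1) * sd_post**2) / df)
    d = (m_post - m_pre) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # approximate small-sample correction factor
    g = j * d
    var = (n_pre + n_post) / (n_pre * n_post) + g**2 / (2 * (n_pre + n_post))
    return g, var


def random_effects_dl(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects pooling with Cochran's Q and I^2 (percent)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"g": pooled, "ci": (pooled - z * se, pooled + z * se),
            "tau2": tau2, "Q": q, "I2": i2}


def egger_test(effects, variances):
    """Classical Egger test: OLS of g/SE on 1/SE. Returns (intercept, t, df).

    An intercept far from zero suggests funnel plot asymmetry; a p-value would
    come from a t distribution with len(effects) - 2 degrees of freedom.
    """
    se = [math.sqrt(v) for v in variances]
    x = [1 / s for s in se]                   # precision
    y = [e / s for e, s in zip(effects, se)]  # standardized effect
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r**2 for r in resid) / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + mx**2 / sxx))
    return intercept, intercept / se_intercept, k - 2


# Invented study summaries: (m_pre, sd_pre, n_pre, m_post, sd_post, n_post)
studies = [(10, 2, 20, 12, 2, 20), (5, 3, 8, 8, 3, 8),
           (3, 1.5, 6, 4, 1.8, 6), (7, 2.5, 12, 9.5, 2.2, 12)]
gs, vs = zip(*(hedges_g(*s) for s in studies))
result = random_effects_dl(gs, vs)
intercept, t_stat, egger_df = egger_test(gs, vs)
print(result, intercept, t_stat, egger_df)
```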

Study Quality Assessment

Quality assessment was conducted for the included studies based on Reichow et al.’s (2008) evaluation method. This tool is specifically designed for intervention studies targeting individuals with ASD and has been widely adopted by recent review studies in the field (Chang & Locke, 2016; Ferguson et al., 2019; Maggin et al., 2012). Importantly, it contains two distinctive quality assessment rubrics for group studies and SSED studies, which is especially useful for the current review, as both research designs were used in the extant literature.

This assessment tool examines research rigor by focusing on primary (e.g., appropriate description of participant characteristics, independent and dependent variables, and comparison conditions) and secondary (e.g., use of random assignment, interobserver agreement, and use of blind raters) quality indicators while distinguishing between group and SSED studies. These indicators assess both a study’s risk of bias and clinical significance of outcomes. The overall assessment of study strength (i.e., strong, adequate, or weak) was determined based on the number of quality indicators being met, accounting for the differences between group and SSED studies. A strong study should receive “1 (yes)” ratings on all primary quality indicators and at least three (for SSED studies) or four (for group studies) secondary quality indicators. Two co-authors each independently rated the quality indicators for all the studies.
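The “strong” threshold stated above can be written as a small decision rule. The Python sketch below encodes only that rule (all primary indicators rated 1, plus at least three secondary indicators for SSED studies or four for group studies); the cut-offs separating adequate from weak are not given in this section, so the function only distinguishes strong from not strong. The constant, function, and argument names are hypothetical.

```python
# Hypothetical sketch of the "strong" rule from Reichow et al.'s (2008) rubric as summarized above.
SECONDARY_REQUIRED = {"group": 4, "ssed": 3}


def is_strong(primary_met: list[bool], secondary_met: int, design: str) -> bool:
    """Apply the 'strong' rule: every primary indicator met plus enough secondary ones.

    primary_met: one entry per primary quality indicator (True = rated 1/yes).
    secondary_met: number of secondary quality indicators rated 1/yes.
    design: "group" or "ssed".
    """
    if design not in SECONDARY_REQUIRED:
        raise ValueError(f"unknown design: {design!r}")
    return all(primary_met) and secondary_met >= SECONDARY_REQUIRED[design]


print(is_strong([True] * 5, 3, "ssed"))   # True
print(is_strong([True] * 5, 3, "group"))  # False: group studies need four secondary indicators
```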

Reliability

Agreement between the two RAs/co-authors, averaged across the two database searches, was 94.7% for title screening, 93.9% for abstract screening, and 95.8% for full-text assessment. Consensus was reached through discussion between the two parties. Data extraction reliability was assessed by comparing item-by-item coding between the first and second authors: there were 234 agreements and 18 disagreements, yielding 92.9% agreement. All disagreements were resolved through discussion. For the study quality assessment, the two co-authors achieved 89% interrater agreement. Inconsistent ratings were reexamined and discussed until consensus was reached.
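The data extraction agreement reported above is the proportion of coded items on which the two authors agreed. As a worked check, with A agreements and D disagreements:

```latex
\[
\text{Agreement} = \frac{A}{A + D} \times 100\%
= \frac{234}{234 + 18} \times 100\%
= \frac{234}{252} \times 100\% \approx 92.9\%
\]
```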

Results

Details about the inclusion and exclusion procedure are presented in Fig. 1. This procedure yielded 21 eligible articles that were included in the current review.

Fig. 1 Article search and selection procedure

Table 1 presents the results of data extraction. In total, 169 children with ASD (with a primary and/or secondary diagnosis of ASD, not at risk of ASD) were included in the 21 eligible studies. Of these, 110 children were included in the four group studies (two randomized controlled trials [RCTs] and two quasi-experimental studies), which are presented at the top of Table 1. The two RCTs compared different telehealth approaches (self-directed vs. therapist-assisted) (Ingersoll et al., 2016) or different telehealth programs (Early Start Denver Model vs. a regular communication intervention model) (Vismara et al., 2018). The remaining 59 children were included in SSED studies, which are presented after the group studies in Table 1.

Table 1 Data extraction results for the included studies

Children’s ages ranged from 1;4 (years;months) to 11 years. Age was categorized by stage of child development (Centers for Disease Control and Prevention, n.d.-a): infants (under age 1), toddlers (1–3 years), preschoolers (4–6 years), middle childhood (7–12 years), and teenagers (13–18 years). Most studies included preschoolers (18 studies), followed by toddlers (12 studies) and children in middle childhood (5 studies). Although children’s ages were not restricted in the database searches, the review did not identify any study that included teenagers or infants with ASD. Four studies did not report child gender; the 17 studies that did reported 32 (25.6%) girls and 93 (74.4%) boys.
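The age bands above can be applied mechanically to ages reported in years;months format. This Python sketch is hypothetical and assumes the bands are assigned by completed years (so 3;11 counts as a toddler and 4;0 as a preschooler); the text does not say how ages falling between bands were handled.

```python
# Hypothetical sketch: assign a developmental stage to an age written as "years;months".


def parse_age(age: str) -> int:
    """Convert a 'years;months' string (e.g., '1;4') or whole years ('11') to total months."""
    years, _, months = age.partition(";")
    return int(years) * 12 + int(months or 0)


def developmental_stage(age: str) -> str:
    """Map an age to the stages used above, based on completed years (an assumption)."""
    years = parse_age(age) // 12
    if years < 1:
        return "infant"
    if years <= 3:
        return "toddler"
    if years <= 6:
        return "preschooler"
    if years <= 12:
        return "middle childhood"
    if years <= 18:
        return "teenager"
    raise ValueError("age outside the 0-18 range covered by the review")


print(developmental_stage("1;4"))  # toddler
print(developmental_stage("11"))   # middle childhood
```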

Regarding direct recipients of therapy, only one study delivered telehealth intervention directly to children with ASD (Boisvert et al., 2012). Notably, both children in that study were 11 years old, older than the children in the other studies. Of the remaining 20 studies, 14 provided telehealth training to parents only, three provided training to professionals only (e.g., onsite SLPs and special educators), two supported both parents and professionals, and one trained multiple family members of a child with ASD.

Parent education level is commonly used as an indicator of family socioeconomic status (Casagrande & Ingersoll, 2017), which may be related to the likelihood of owning digital devices and having internet access. More than one-third of the eligible studies (8 of 21) did not report parents’ level of education. Seven studies included only parents with an associate degree or higher, and five studies included parents who had completed high school or higher. One study provided incomplete information about parent education (only the percentage of parents with a college degree or higher).

Intervention Characteristics

The studies were predominantly conducted in the U.S. (18 studies; 85.7%); the remaining three were conducted in Iceland (n = 2) and Singapore (n = 1). A variety of social communication intervention programs were used; these are organized and described in Appendix 2. Commonly used programs included the Early Start Denver Model (ESDM) (4 studies), the Improving Parents as Communication Teachers (ImPACT) program (3 studies), and Naturalistic Developmental Behavioral Interventions (NDBIs) (3 studies). These programs appeared to go beyond social communication skills to address play, social-emotional, or cognitive skills; however, social communication was a primary focus. Other programs focused more narrowly on social communication and language skills, such as the internet-based modified Parent-implemented Communication Strategies (i-PiCS) and Prepare, Offer, Wait, and Respond (POWR). Three studies incorporated augmentative and alternative communication (AAC) to teach functional language that replaced children’s idiosyncratic, hard-to-interpret behaviors.

Seventeen studies provided information about hours of intervention. However, two of these provided incomplete data (reporting data for only one group/participant), and four estimated hours of intervention rather than reporting exact hours. Based on the available data, intervention ranged from an estimated 1.13 to 53 h, averaging 12.33 h per direct recipient. Eleven studies reported the overall duration of intervention, which ranged from 4.4 weeks to one year (52 weeks).

Telehealth types were categorized following ASHA (American Speech-Language-Hearing Association, n.d.-b) as synchronous (i.e., live video conferencing), asynchronous (i.e., self-paced online learning through a website or a mobile application), and hybrid (i.e., a combination of synchronous, asynchronous, or in-person services). Synchronous live video conferencing was the most commonly used approach (8 studies), followed by hybrids of in-person and synchronous services (7 studies) and of synchronous and asynchronous services (6 studies). The least prevalent were the asynchronous approach involving web- or app-based training (1 study) and the hybrid of in-person and asynchronous services (1 study: in-person sessions for initial screening and self-paced website learning for subsequent sessions). One study planned to provide in-person services but had to switch to telehealth after a few sessions due to the outbreak of the COVID-19 pandemic (Gevarter et al., 2021).
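The ASHA-based categories described above can be expressed as a small labeling function over the delivery modes a study used. In this hypothetical Python sketch, the mode names are illustrative labels; any study combining more than one mode is labeled hybrid, mirroring the definition above, and a program with no remote component is rejected because it would not count as telehealth.

```python
# Hypothetical sketch: label a delivery model from the modes it combines.
VALID_MODES = {"synchronous", "asynchronous", "in-person"}


def telehealth_type(modes: set[str]) -> str:
    """Return 'synchronous', 'asynchronous', or 'hybrid (...)' for a set of delivery modes."""
    unknown = modes - VALID_MODES
    if unknown:
        raise ValueError(f"unrecognized modes: {sorted(unknown)}")
    if not modes & {"synchronous", "asynchronous"}:
        raise ValueError("no remote component, so not a telehealth delivery model")
    if len(modes) == 1:
        return next(iter(modes))
    return "hybrid (" + " + ".join(sorted(modes)) + ")"


print(telehealth_type({"synchronous"}))               # synchronous
print(telehealth_type({"in-person", "synchronous"}))  # hybrid (in-person + synchronous)
```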

Summary of Language Outcome Measures

The eligible studies included a range of language measures (Table 1). There were two forms of assessment: standardized tests and language/play sample analyses based on video-recorded probes/sessions. Four studies used two standardized tests: the MacArthur-Bates Communicative Development Inventories (MCDI), a parent report that captures children’s skills in word comprehension, word production, gestures, and grammar; and the Vineland, an adaptive behavior test that includes receptive and expressive language subtests as well as other domains (e.g., daily living and motor skills).

Appendix 3 summarizes the measures derived from language/play sample analyses, their definitions, and their categorization by linguistic domain. These measures were wide-ranging, including initiations, responses, functional verbal utterances, requests, communication turns, mean length of utterance, and number of different words. Although data collection approaches were consistent across studies (i.e., video-recorded play/language samples), definitions of the same measure differed, sometimes only slightly, across studies; for instance, functional verbal utterances were defined differently in Vismara et al. (2009) and Vismara et al. (2018). As shown in Appendix 3, pragmatic measures were included in 18 studies, semantic measures in three studies, and measures involving more than one domain in three studies.

Meta-analysis for Pragmatic Outcomes

In the meta-analysis, we sought to reduce the heterogeneity of language outcome measures. We removed outcomes derived from standardized tests, as these measures mixed receptive and expressive language. The remaining outcome measures were all derived from language/play sample analysis and reflected expressive language only. Because very few studies examined semantics (3 studies) or mixed domains (3 studies), meta-analysis was performed only for pragmatic measures.

To avoid inflating the contribution of any individual study, only one pragmatic measure was selected from each study that reported more than one; see Appendix 3 for details about how single pragmatic measures were selected. To stay focused on the effectiveness of telehealth, data from the in-person comparison groups in Hao et al. (2021a) and Vismara et al. (2009) were not included in the meta-analysis. Although 18 studies included pragmatic measures, data from five studies remained unavailable after we contacted the corresponding authors (see Methods), leaving 13 studies, totaling 90 children with ASD, for the meta-analysis.

Children with ASD showed significant improvement in pragmatic skills from pre- to post-intervention (standardized mean difference [Hedges’ g] = 0.89; 95% CI [0.54, 1.24]; p < 0.001; see Fig. 2). An effect size of 0.89 is considered large (Cohen, 1988). No heterogeneity was detected between studies (I² = 0.00%). The rank correlation test, Egger’s regression test, and the trim-and-fill method did not detect publication bias. To rule out the possibility that our selection of pragmatic measures biased the findings, we recalculated the effect size using only the unselected pragmatic measures, which also indicated significant pre- to post-intervention change among children with ASD (Hedges’ g = 1.09; 95% CI [0.49, 1.69]; p < 0.001; I² = 37.52%).
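As a consistency check on the reported intervals, and assuming they are Wald-type 95% CIs (estimate ± 1.96 SE), the standard errors and z statistics can be backed out as follows. Both z values exceed 3.29, the two-sided critical value for p = 0.001, which agrees with the reported p < 0.001.

```latex
\[
\begin{aligned}
\text{Primary analysis:}\quad & \mathrm{SE} \approx \frac{1.24 - 0.54}{2 \times 1.96} = \frac{0.70}{3.92} \approx 0.1786, & z &\approx \frac{0.89}{0.1786} \approx 4.98 \\
\text{Unselected measures:}\quad & \mathrm{SE} \approx \frac{1.69 - 0.49}{2 \times 1.96} = \frac{1.20}{3.92} \approx 0.3061, & z &\approx \frac{1.09}{0.3061} \approx 3.56
\end{aligned}
\]
```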

Fig. 2 Forest plot of children’s progress in pragmatic skills before and after telehealth-based intervention. Squares indicate mean individual study effect sizes. The diamond indicates the cross-study summary effect size.

Outcomes in Other Linguistic Domains

Regarding the outcomes in other linguistic domains, three studies included five semantic measures, including number of different words (30 children), transition words in narrative production (1 child), number of phrases generated using AAC (1 child), number of modifiers in requests (1 child), and number of words in requests (2 children), totaling 35 children. Three studies used measures evaluating more than one linguistic domain, including mean length of utterances (30 children), individualized language targets (29 children), and number of correct responses to comprehension questions (3 children), totaling 62 children. Standardized tests (i.e., MCDI and Vineland) evaluated different linguistic domains and included 46 children across the four studies.

Visual inspection of children’s scores indicated improvement from pre- to post-intervention in all of these studies; however, the amount of progress appeared to vary across children within the same study. For example, in Baharav and Reiser (2010), one child increased from 47 to 278 words understood on the MCDI, whereas the other child’s improvement was relatively modest, from 221 to 311. Some studies also noted individual differences in suitability for telehealth. For example, Boisvert et al. (2012) reported that one child responded more favorably to telehealth than to in-person therapy, possibly because telehealth offered a more natural environment (the home setting) that reduced the child’s anxiety.

Quality Assessment Results

Table 2 presents the quality assessment results, and detailed ratings of indicators are presented in Appendices 4.1 and 4.2. Of the 21 included studies, four were group studies and 17 were SSED studies. All four group studies were rated strong. Of the 17 SSED studies, nine were rated strong, seven adequate, and one weak. The detailed ratings (Appendices 4.1 and 4.2) show that studies rated less than strong were missing at least one primary indicator, such as a sufficient description of participant characteristics, independent variables, or dependent variables. They also tended to miss secondary indicators, such as adequate information on interobserver agreement, fidelity, blind raters, or generalization/maintenance.

Table 2 Quality assessment of the 21 included studies

Discussion

We conducted a systematic review and identified 21 telehealth-based social communication intervention studies for children with ASD. Meta-analysis was initially planned for each linguistic domain; however, only pragmatics could be evaluated (using 13 studies) because few studies included measures of other domains. The results revealed significant pre- to post-intervention progress in pragmatic outcomes, providing initial evidence that telehealth social communication programs are effective in improving pragmatic skills in children with ASD. The quality assessment rated 20 of the 21 studies as strong or adequate. The results should be interpreted with caution, given the small sample size, the lack of RCTs comparing telehealth treatment with no treatment, and the lack of measures of linguistic domains other than pragmatics.

Potential Unmet Service Needs in Infants and Teenagers

Studies identified in this review primarily focused on preschoolers and toddlers with ASD, suggesting that current telehealth social communication interventions mainly target young children. Age played an important role in determining direct versus indirect telehealth delivery. Among the 21 studies, only Boisvert et al. (2012) provided direct services, to two 11-year-old children with ASD, while all the others provided indirect services (i.e., through parents or professionals) for children younger than 8 years old. This may reflect the technology demands of direct services, which require children to operate both software and hardware. Age and prior technology experience affect children’s computer literacy skills (Lane & Ziviani, 2010). In a qualitative study of children between 6 months and 6 years old, Calvert et al. (2005) found that older children were more likely to engage in activities such as turning on a computer, using the computer without sitting on a parent’s lap, and controlling a mouse. Acquiring these skills is essential for children to participate in telehealth programs independently.

This review did not identify studies that included teenagers or infants with ASD, although we did not set age restrictions in the searches. The lack of studies of children younger than one year is likely because ASD is typically diagnosed around age three, and in rural communities the average age of diagnosis can be as late as age seven (Solomon & Soares, 2020). Reasons for the lack of studies of teenagers with ASD are less apparent. Social communication services may be most commonly provided for preschoolers and elementary-aged children (Turcotte et al., 2016). We speculate that teenagers with ASD may have received many years of therapy, and by their teenage years, their goals may have been achieved or shifted to vocational skills. Yet teenagers are more likely to have accumulated technology fluency, making them more appropriate candidates than younger children for direct telehealth services. Future research should pay more attention to the unmet service needs of adolescents with ASD.

Telehealth Types

Live video conferencing was the most commonly used telehealth approach; it allows participants to receive real-time feedback on intervention strategies. Researchers also recognized the benefits of asynchronous approaches for self-paced learning, which appeared to be a good supplement for parents or professionals unable to attend real-time training due to scheduling conflicts. Several studies (e.g., Douglas et al., 2018) used in-person sessions for initial screening and baseline data collection, possibly due to concerns about building rapport. Whether rapport building is a significant challenge for telehealth across intervention phases remains an open question, and the answer may depend on client age and the type of interaction. Lincoln et al. (2015) did not report concerns about rapport building in telehealth services for school-age populations. However, from service providers’ perspective, O’Cathail et al. (2020) found that clinical practitioners (e.g., pediatricians, dietitians, and general practitioners) considered teleconsultations more appropriate for follow-up appointments. Specifically, teleconsultations created physical and emotional barriers and disrupted the flow of dialog during remote meetings, resulting in awkward and uncomfortable experiences.

Clinicians should consider family and client characteristics when choosing among synchronous, asynchronous, in-person, and hybrid service delivery approaches. Future studies should continue exploring this issue to provide more detailed guidelines for determining whether an initial in-person session is needed. Since the COVID-19 pandemic, an increasing number of families have been receiving telehealth services at home, so more support should be given to help families build a home-based therapy environment. For example, Law et al. (2018) implemented a toy preference assessment using the Reinforcer Assessment for Individuals with Severe Disabilities, which allowed families to use preferred toys during telehealth sessions.

Telehealth Intervention Programs

The commonly used programs (e.g., ESDM) target skills beyond social communication and language, incorporating intervention in other areas such as social-emotional and cognitive skills. These programs have a long and established practice history (Solomon & Soares, 2020), which may explain why they have been studied more often in telehealth interventions for children with ASD. Their broad intervention focus may reflect the specific needs of the ASD population, which include social-emotional problems, limited play skills, and cognitive deficits in addition to difficulties in social communication and language (Centers for Disease Control and Prevention, n.d.-b). Research has shown that these symptoms are interrelated (e.g., poorer language skills are related to emotional problems and challenging behaviors) (Barokova & Tager-Flusberg, 2020; Matson et al., 2009). These programs were thus designed to address a wide scope of mutually influencing issues, which may lead to better intervention outcomes.

Hours of intervention were not consistently reported in the included studies. Of the 17 studies that reported these data, six provided incomplete data or estimated hours of intervention. Fewer studies (n = 11) reported the overall duration (i.e., how many weeks or months the intervention lasted). It should be noted that the current study did not aim to examine the “dosage” effect of telehealth social communication intervention on language skills. Given the heterogeneity of language measures and intervention programs, the extant literature cannot answer this question. It would be better addressed by future reviews once a large number of children with ASD have participated in the same intervention program and been evaluated with consistent outcome measures.

Different populations or clinical diagnoses influence how therapists design telehealth programs. Challenging behaviors appeared to be a key factor to consider when delivering telehealth services to children with ASD, and our review revealed differing views on this issue. Boisvert et al. (2012) suggested that telehealth provides a more predictable social environment that can reduce anxious and adverse behaviors among children with ASD, whereas Guðmundsdóttir et al. (2019) reported that delays caused by technical problems (e.g., unstable internet connections) can trigger children’s challenging behaviors during telehealth. The discrepancy may reflect variation in children’s preferences (e.g., home or clinic settings) and in parents’ or clinicians’ internet access and technological fluency. These factors should be considered to reduce challenging behaviors when telehealth is delivered to children with ASD.

Telehealth Effectiveness on Language Skills

The meta-analysis, based on 13 studies, showed that telehealth-based social communication intervention programs resulted in significant improvement in pragmatic skills. Although we planned to evaluate different linguistic domains, the current telehealth intervention studies only allowed us to analyze pragmatic skills. Future telehealth studies among children with ASD should therefore include more diverse language measures beyond the domain of pragmatics, which would provide a more comprehensive understanding of the effectiveness of telehealth interventions on language skills. We also noticed that definitions of the same measure differed, sometimes slightly, across studies (e.g., functional verbal utterances in Vismara et al. (2009) and Vismara et al. (2018)), reflecting the subjective nature of outcome measures in the current literature. Future research should use more consistent measures to enhance cross-study comparisons.

Measures not included in the meta-analysis were examined through visual inspection. These covered semantics or mixed linguistic areas, for example, number of different words (semantics) and mean length of utterance (morphology and syntax). While all the children demonstrated progress on these measures, the degree of progress varied: some children showed rapid and substantial improvement, whereas others made only modest progress. This variability is consistent with the view that telehealth should not be regarded as appropriate for all individuals with special needs and that each client should be assessed for telehealth suitability before service delivery approaches are recommended (Colombo et al., 2020; Hao et al., 2021b).

Before drawing a conclusion about the effectiveness of telehealth on children’s language skills, particularly pragmatic skills, we note a caveat regarding the heterogeneity of the current studies. The extant studies differ in multiple aspects, such as study designs (i.e., RCT, quasi-experimental, and SSED), telehealth formats (e.g., synchronous, asynchronous, and hybrid), direct recipients of intervention (i.e., parents, professionals, and children), intervention programs, length of intervention, and the specific measures used to evaluate pragmatic and other language skills. Although we tried to reduce heterogeneity in the meta-analysis by focusing on pragmatic measures derived from language/play sample analyses, heterogeneity remained. Readers should therefore interpret the results with caution, considering the small sample size, the predominance of SSED studies, and the heterogeneity in multiple aspects.

This is the first review study focusing on the effectiveness of telehealth social communication intervention programs on language skills. Although the heterogeneity of the current literature prevents us from drawing a firm conclusion, the significant progress in pragmatic skills indicated by the meta-analysis provides initial evidence of effectiveness. We believe that future studies should implement large-scale RCTs comparing telehealth treatment with non-treatment and adopt more homogeneous designs and measures, which would likely provide more robust evidence for the use of telehealth to improve language skills.

In addition to the multiple telehealth parent training studies, five studies delivered training to professionals (e.g., special educators and local speech therapists). Among these studies, Vismara et al. (2009) compared parent-mediated and therapist-mediated interventions and found that only the children in the therapist-mediated group demonstrated significant improvement in the use of functional verbal utterances. Professionals have received more training in child development than parents and are more likely to implement intervention strategies with fidelity. Thus, although parent training predominates in the included studies, professional training may be more effective in improving children’s performance. Future studies should devote more attention and effort to training local professionals via telehealth to maximize intervention effects for children with ASD.

Quality Assessment

Results of the quality assessment showed that nearly all (20 out of 21) of the included studies were rated as adequate or strong, indicating that these studies generally followed rigorous research procedures to minimize the risk of bias and to achieve reliable and valid results. Studies rated below strong would benefit from a clear research plan guided by quality standards (e.g., Reichow et al., 2008), including important primary indicators (e.g., sufficient description of participant characteristics) and secondary indicators (e.g., interobserver agreement, generalization, and maintenance), to ensure the quality of both the clinical intervention and the research. Although most studies were rated as strong or adequate, the small total sample size, the heterogeneity of the included studies (e.g., variation in research design, participants, and outcome measures), and the lack of RCTs comparing treatment with non-treatment limit the strength of the evidence for the effectiveness of telehealth.

Limitations

Several limitations of the study should be noted. First, although we conducted a second database search to capture intervention programs implemented after the outbreak of the COVID-19 pandemic, only one study specifically noted that it switched to telehealth after the pandemic began. All the other studies were implemented before the pandemic, when telehealth was not as widely adopted. The dramatic expansion of telehealth after the pandemic outbreak was accompanied by service disruptions and an abrupt transition from in-person to telehealth services, which may have significantly affected telehealth implementation and subsequent effectiveness. There is an urgent need to examine telehealth in the context of the COVID-19 pandemic to understand its challenges, facilitators, and effectiveness, so that corresponding strategies can be suggested for its advancement.

Second, because the included studies tended to recruit families of relatively high socioeconomic status, the findings of this review may be biased. Little attention has been paid to families of low socioeconomic status in telehealth research (Parsons et al., 2017), possibly because of their limited internet access or the challenges of engaging them in telehealth intervention. However, given their potentially limited resources, these families are more likely to need services. Future research should explore effective mechanisms to engage these families in telehealth services.

Third, future reviews should expand the scope of this review. For example, we did not collect data on adverse effects of telehealth on children’s behaviors. Because telehealth may not be appropriate for families with low acceptance of it or poor internet connections, using it with these families may have negative effects. Other factors also merit investigation in future reviews; for example, linguistic and cultural diversity and conditions comorbid with ASD (e.g., medical issues) may have important influences on telehealth effectiveness. Finally, social validity, reflecting patients’ satisfaction and acceptance during telehealth implementation, is an important aspect of the outcomes and warrants attention in future studies.

Conclusion

Telehealth has become increasingly important for the delivery of social communication interventions to families of children with ASD, yet its effectiveness on children’s language skills across different linguistic domains has remained unclear. In this study, we addressed this gap by conducting a systematic review of 21 eligible telehealth-based social communication intervention studies that measured language skills in children with ASD. The findings delineated participant and intervention characteristics of the extant telehealth social communication interventions. The research quality assessment indicated that most studies were rated as strong or adequate. Although we planned to conduct a meta-analysis for measures in each linguistic domain, the current studies only allowed a meta-analysis of pragmatic measures across 13 studies. The results revealed significant pre- versus post-intervention progress in the domain of pragmatics, providing initial evidence for the use of telehealth to improve pragmatic skills in children with ASD. Future studies should reduce heterogeneity in research designs, intervention programs, and outcome measures. RCTs comparing telehealth treatment with non-treatment are warranted to provide more robust evidence for the use of telehealth to improve language skills.