Introduction

Autism spectrum disorder (ASD) is a set of heterogeneous neurodevelopmental disorders associated with genetic, environmental and immune factors, characterized by early-appearing social communication deficits and restricted, repetitive sensory-motor behaviors (Lord et al., 2018). The median worldwide prevalence of ASD in children is 1 in 100 (Elsabbagh et al., 2012; Zeidan et al., 2022), and the prevalence among children aged 8 years in the United States is 1 in 44 (Maenner et al., 2021). ASD begins in early childhood, with abnormal patterns that persist throughout life (Baxter et al., 2015). In addition to the core symptoms, more than 70% of people with ASD have concurrent conditions, such as atypical language development and abilities, motor abnormalities, and other psychiatric disorders (Lai et al., 2014). The high morbidity and disability rates place a heavy burden on people with ASD and their families and impose a substantial economic burden on society (Buescher et al., 2014; Hyman et al., 2020; Leigh & Du, 2015). Therefore, effective early interventions for ASD are critical to clinical and public health practice.

Currently, there is no definitive curative treatment for ASD. Behavioral therapy, the standard-of-care treatment for core symptoms, is most effective if started early in life (Lai et al., 2014). However, the benefits of early intensive behavioral therapy are limited to a small number of individuals; most people with ASD require lifelong supportive care (Frye, 2022). In recent years, physical activity has attracted attention as a potential therapeutic strategy for many chronic diseases (Pedersen & Saltin, 2015). Specifically, physical activity, as a non-pharmacological intervention, has been reported to improve ASD symptoms (Toscano et al., 2021). Previous systematic reviews and pairwise meta-analyses have shown a positive relationship between physical activity and improvements in motor function (Healy et al., 2018; Ruggeri et al., 2020; Zhang et al., 2020), executive function (Liang et al., 2022) and communication (Chan et al., 2021; Huang et al., 2020), as well as reductions in social interaction deficits (Howells et al., 2019; Sowa & Meulenbroek, 2012) and stereotyped behaviors (Bremer et al., 2016; Ferreira et al., 2019; Tarr et al., 2020; Teh et al., 2021) in children with ASD. However, a meta-analysis by Monteiro et al. (2022) found that physical activity had no statistically significant effect on motor skills in children with ASD. Moreover, previous meta-analyses compared physical activity interventions only with non-physical activity interventions; the question of which specific types of physical activity are more effective in children with ASD has received limited attention. There is therefore a need to identify and evaluate the efficacy of different interventions capable of improving autistic symptoms.

Given the paucity of studies that directly compare different interventions, it is difficult to establish the superiority of any one intervention with traditional pairwise meta-analyses. To determine the most appropriate type of physical activity for autism-related health outcomes, it is necessary to establish evidence concerning the comparative efficacy of all relevant physical activity strategies (Su et al., 2020). Network meta-analysis (NMA) can overcome this problem by incorporating data from direct comparisons (as in pairwise meta-analyses) and indirect comparisons of all available treatment options via the network of treatments (Su et al., 2020). In addition, NMA allows estimation of the relative effectiveness and rank ordering of all the interventions (Rochwerg et al., 2018).

In the present study, we performed a systematic review and network meta-analysis to compare the efficacy of different types of physical activity for motor function, social function, communication, and stereotyped behaviors in children with ASD and ranked the optimal physical activity interventions.

Methods

Registration

This review was registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42022316836) and adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension statement for network meta-analyses (PRISMA-NMA) (Hutton et al., 2015).

Search Strategy

An electronic search of PubMed, Web of Science, EBSCO, and Cochrane Library was conducted to identify all related articles published from 1943, when ASD was first reported (Kanner, 1943; Lai et al., 2014), to May 25, 2023. The search was restricted to English-language, peer-reviewed articles. The search strategy was developed and refined from previously published meta-analyses (Howells et al., 2019; Tan et al., 2016).

The search was carried out using key phrases and Medical Subject Heading (MeSH) terms (Supplementary Table 1). The detailed search strategy is presented in Supplementary Table 2. In addition to searching the databases, the reference lists of included articles and previously published systematic reviews were manually searched to identify articles that met the inclusion criteria.

Study Selection

EndNote X8 literature management software (Thomson ResearchSoft, Stamford, Connecticut, USA) was used to manage the literature search records. One reviewer screened all search results to exclude duplicates. Two reviewers then independently screened the titles and abstracts of the remaining studies. Studies that potentially met the inclusion criteria, and studies on which the two reviewers disagreed, underwent full-text evaluation. The two reviewers independently assessed the full texts against the inclusion and exclusion criteria, and any discrepancies were adjudicated by a third reviewer.

Inclusion and Exclusion Criteria

Studies were required to be published in a peer-reviewed journal and written in English (conference abstracts and registered trials were excluded). All other inclusion criteria were in accordance with the PICOS (Participants, Interventions, Comparators, Outcomes and Study Design) framework.

The participants of interest were children under the age of 18 diagnosed with ASD, including disorders previously known as autism, Asperger’s, and pervasive developmental disorders not otherwise specified (American Psychiatric Association, 2013; Howells et al., 2019; Tan et al., 2016).

Eligible studies focused on any type of physical activity intervention. Physical activity was defined as activity that is planned, structured, repetitive, and purposeful (Caspersen et al., 1985). Studies were excluded if the physical activity intervention was combined with non-physical activities (e.g., behavioral intervention, pharmacotherapy) or if the intervention involved unsupervised training sessions. Comparator groups received no treatment, were placed on a waiting list, or received usual care.

Studies were required to include at least one of the following outcome measures of autistic symptoms: motor function, social function, communication, and stereotyped behaviors. We excluded studies that did not assess autistic symptoms based on baseline and post-intervention change scores, studies that had incomplete data, and studies for which the full text could not be obtained after contacting the authors.

For the study design, included studies were randomized controlled trials (RCTs) or controlled trials (CTs) that investigated the effects of physical activity interventions on children with ASD. Studies that had no control group or did not report any comparative results between groups were excluded.

Data Extraction

Study characteristics and outcome statistics were extracted independently by two reviewers using a data extraction form, including the relevant publication information (author, year, and country), study design (RCT or CT), participant characteristics (sample size, sex, age), diagnostic criteria for ASD, severity, intervention components (intervention design, sample size, frequency, duration), and outcome measures. When the content of the interventions in several groups was relatively homogeneous, we classified them as the same intervention. For example, structured circuit exercise (Arslan et al., 2020), inclusive physical activity (Sansi et al., 2021) and exercise-based intervention (Toscano et al., 2018), all of which aim to develop children's fundamental movement skills, were categorized as fundamental motor skill intervention (FMS) because of their homogeneity in form and content. Pre- and post-intervention means and standard deviations were extracted to calculate effect sizes (mean differences). When results were presented only in figures (rather than as numerical data within the text), data were extracted using WebPlotDigitizer (Version 4.5, https://automeris.io/WebPlotDigitizer/). If the full text of a study was not available, its authors were contacted at least twice over a 4-week period.
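As an illustration of how pre- and post-intervention summary statistics can be converted into an effect size, the following Python sketch computes change scores and a standardized mean difference for one hypothetical two-arm study. The pre-post correlation `r` (set to 0.5), the use of Cohen's d without a small-sample correction, the function names, and all numbers are assumptions made for illustration; they are not taken from the included studies and do not reproduce the software routines used in this review.

```python
import math


def change_score(pre_mean, pre_sd, post_mean, post_sd, r=0.5):
    """Return (mean change, SD of change) from pre/post summary statistics.

    r is the assumed pre-post correlation. It is NOT reported in the text;
    0.5 is only a placeholder for illustration.
    """
    mean_change = post_mean - pre_mean
    sd_change = math.sqrt(pre_sd**2 + post_sd**2 - 2 * r * pre_sd * post_sd)
    return mean_change, sd_change


def smd_with_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Standardized mean difference (Cohen's d) with an approximate 95% CI."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    # Large-sample approximation to the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    half_width = z * math.sqrt(var_d)
    return d, d - half_width, d + half_width


# Hypothetical arms: (pre mean, pre SD, post mean, post SD), 15 children each.
int_change, int_sd = change_score(20.0, 4.0, 26.0, 4.0)
con_change, con_sd = change_score(21.0, 4.0, 22.0, 4.0)

d, ci_low, ci_high = smd_with_ci(int_change, int_sd, 15, con_change, con_sd, 15)
print(f"SMD = {d:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A different assumed correlation changes the SD of the change score and therefore the SMD, so in practice the choice of `r` would need to be justified or varied in a sensitivity analysis.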

Risk of Bias Assessment

Risk of bias for each individual study was assessed independently by two reviewers using version 2 of the Cochrane risk-of-bias tool (RoB 2) (Sterne et al., 2019), which examines five domains: randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For each domain, studies were classified as having a low risk, some concerns, or a high risk of bias. The randomization process was assessed only for RCTs. Any disagreements in reviewers’ judgments were resolved by a third reviewer.

Statistical Analysis

The network meta-analysis followed the current PRISMA-NMA guidelines and was conducted within a frequentist framework using Stata software (Version 16.0, StataCorp, College Station, Texas, USA) (Shim et al., 2017). A network plot was created to visualize comparative relationships among the different interventions (Salanti et al., 2008). As the outcome measures were continuous but measured using different instruments, we used the standardized mean difference (SMD) and 95% confidence interval (CI) as effect estimates. The direction of effect sizes was recoded such that a positive effect size indicated a greater improvement in the outcome measure in the intervention group relative to the control group. Random-effects models were used to account for heterogeneity across studies, considering the differences in study design and outcome measures. The I² statistic was used to rate heterogeneity as low (< 25%), moderate (25%–50%) or high (> 50%) (Higgins et al., 2003). Where heterogeneity was high, a leave-one-out sensitivity analysis was performed. Global consistency was evaluated by fitting consistency and inconsistency network meta-analysis models. Local inconsistency was evaluated as the difference between direct and indirect estimates in all closed loops in the network, and the inconsistency of the model was assessed using node splitting (Chaimani et al., 2013). When the p-value for inconsistency exceeded 0.05, the consistency model was used to calculate the effect sizes of the different interventions and to evaluate the rank probabilities (Higgins et al., 2012). The surface under the cumulative ranking curve (SUCRA) was used to rank the interventions. SUCRA values range from 0 to 100%; a larger SUCRA indicates a more effective intervention (Mbuagbaw et al., 2017; Shim et al., 2017). Furthermore, comparison-adjusted funnel plots were generated and visually inspected for asymmetry to assess publication bias.
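To make the SUCRA concept concrete, the sketch below computes SUCRA from a set of rank probabilities using its standard definition: the sum of the cumulative rank probabilities over the first a − 1 ranks, divided by a − 1, where a is the number of treatments. The three treatments and their rank probabilities are hypothetical; in practice these probabilities would be estimated from the fitted network model rather than entered by hand.

```python
def sucra(rank_probs):
    """Compute SUCRA for one treatment.

    rank_probs[k] is the probability that the treatment has rank k + 1,
    where rank 1 is best. The probabilities should sum to 1.
    """
    n_treatments = len(rank_probs)
    cumulative = 0.0
    total = 0.0
    # Sum the cumulative probabilities for ranks 1 .. a - 1.
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (n_treatments - 1)


# Hypothetical rank probabilities for three treatments (not from this review).
rank_probabilities = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.2, 0.5, 0.3],
    "C": [0.1, 0.3, 0.6],
}

scores = {name: sucra(p) for name, p in rank_probabilities.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: SUCRA = {100 * scores[name]:.1f}%")
```

A treatment that is certain to be best has a SUCRA of 100% and one that is certain to be worst has a SUCRA of 0%, so SUCRA summarizes the whole ranking distribution rather than only the probability of being ranked first.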

Results

Study Selection

The flow of the systematic review is presented in Fig. 1. The electronic database search yielded 5846 records after duplicates were removed. Additionally, 43 records were located via the reference lists of previously published systematic reviews. The examination of titles and abstracts resulted in the retrieval of 229 full-text records. Following full-text review, 37 studies were included in qualitative analysis. Of these studies, 29 (78.4%) were deemed eligible for the network meta-analyses (motor function, n = 12; social function, n = 17; communication, n = 8; stereotyped behavior, n = 5).

Fig. 1

PRISMA flow chart of the study selection process

Characteristics of the Studies

A detailed summary of each included study (n = 37) is presented in Supplementary Table 3. These studies were conducted in Australia (n = 2), China (n = 8), Canada (n = 1), Hungary (n = 1), Iran (n = 12), Israel (n = 1), Italy (n = 2), Portugal (n = 1), Spain (n = 1), Turkey (n = 2), and the United States (n = 6). In total, 26 RCTs and 11 CTs published between 2009 and 2021 were included. Only two of the included studies (Hassani et al., 2020; Rafiei Milajerdi et al., 2021) had three arms.

A total of 1200 participants (631 in intervention groups and 569 in control groups), with mean ages ranging from 4.3 (SD = 0.2) to 10.8 (SD = 2.4) years, were investigated across the 37 studies. Of these, 971 (80.9%) were male. Six studies (Arslan et al., 2020; Borgi et al., 2016; Moradi et al., 2020; Nazemzadegan et al., 2016; Pan, 2010; Pan et al., 2017) included only males, one study (Najafabadi et al., 2018) did not report sex, and the remainder included both sexes. Twenty-two studies reported the Diagnostic and Statistical Manual of Mental Disorders (DSM) as the diagnostic criterion for ASD, three studies (Gabriels et al., 2015; Ketcheson et al., 2016; Rafiei Milajerdi et al., 2021) reported the Autism Diagnostic Observation Schedule (ADOS), and 12 studies did not report a diagnostic criterion. Five studies (Bahrami et al., 2012, 2016; Moradi et al., 2020; Movahedi et al., 2013; Sarabzadeh et al., 2019) used the Gilliam Autism Rating Scale (GARS) to assess the severity of ASD, five studies (Cai et al., 2020a, b; García-Gómez et al., 2014; Wang et al., 2020; Yang et al., 2021) used the Childhood Autism Rating Scale (CARS), two studies (Caputo et al., 2018; Ketcheson et al., 2016) used the ADOS, two studies (Howells et al., 2020, 2021) used the Social Responsiveness Scale (SRS), one study (Sotoodeh et al., 2017) used the Autism Diagnostic Interview (ADI), and 22 studies did not report severity.

The intervention duration ranged from 4 to 48 weeks, with more than three-quarters (75.6%) of studies reporting interventions that lasted more than 12 weeks. The average number of sessions per week was 2.7 (SD = 1.5), the mean length per session was 56.5 (SD = 35.4) min, and the maximum duration per day was 4 h. Further details relating to the interventions are reported in Supplementary Table 3.

In terms of intervention categories, the studies covered 17 physical activity interventions: aquatic exercise (AE) (studies: n = 3, participants: n = 28), fundamental motor skill intervention (FMS) (studies: n = 6, participants: n = 69), gymnastic exercises (GE) (studies: n = 1, participants: n = 15), I can have physical literacy (ICPL) (studies: n = 1, participants: n = 11), Kinect (studies: n = 1, participants: n = 20), Kata techniques (KT) (studies: n = 3, participants: n = 45), mini-basketball training program (MBTP) (studies: n = 4, participants: n = 63), outdoor adventure program (OAP) (studies: n = 1, participants: n = 30), organized football program (OFP) (studies: n = 2, participants: n = 35), perceptual-motor exercises (PME) (studies: n = 1, participants: n = 25), Qigong sensory training (QST) (studies: n = 1, participants: n = 25), Sport, Play, and Active Recreation for Kids (SPARK) (studies: n = 3, participants: n = 42), Tai Chi Chuan (studies: n = 2, participants: n = 21), trampoline-based training (TBT) (studies: n = 1, participants: n = 8), therapeutic horseback riding (THR) (studies: n = 6, participants: n = 144), table tennis exercise (TTE) (studies: n = 1, participants: n = 11), and yoga training program (YTP) (studies: n = 2, participants: n = 39).

Risk of Bias Assessment

The summary of the risk of bias is presented in Supplementary Table 4. The quality of the included studies was generally not high (Fig. 2). Just over half (51.4%) of the included studies had a low overall risk of bias. Twenty-four (92.3%) of the included RCTs had a low risk of bias in the randomization process. More than half (51.4%) of the studies had a high risk of bias in measurement of the outcome. Some autism behavioral assessment instruments are reported by parents or caregivers, and most of the studies involved parents during the implementation of the physical activity intervention, so the assessment of outcomes may have been influenced by knowledge of the intervention received. All included studies had a low risk of bias for selection of the reported result, missing outcome data, and deviations from intended interventions.

Fig. 2

Risk of bias of included studies. Percentage of studies with low, unclear, and high risk of bias for each domain of Version 2 of the Cochrane risk-of-bias tool (RoB 2). The randomization process is rated only for randomized controlled trials

Results of the Network Meta-analysis

Motor Function

Seventeen studies (Arslan et al., 2020; Borgi et al., 2016; Bremer et al., 2015; Caputo et al., 2018; Fragala-Pinkham et al., 2011; Gabriels et al., 2015; Hassani et al., 2020; Howells et al., 2021; Ketcheson et al., 2016; Lourenço et al., 2015; Najafabadi et al., 2018; Pan et al., 2017; Rafiei Milajerdi et al., 2021; Sansi et al., 2021; Sarabzadeh et al., 2019; Steiner & Kertesz, 2015; Zamani Jam et al., 2018) assessed motor function; five of these (Arslan et al., 2020; Howells et al., 2021; Ketcheson et al., 2016; Najafabadi et al., 2018; Sansi et al., 2021) were described qualitatively, and 12 studies involving 11 intervention types were eligible for network meta-analysis. The network geometry shows which interventions were compared directly in the included studies and which could only be compared indirectly (Fig. 3). In the motor function network, SPARK was the only intervention with direct comparisons against other active interventions (ICPL and Kinect). The global consistency of direct and indirect effects in pairwise and multi-arm comparisons was tested using the inconsistency network model. The test for inconsistency between direct and indirect estimates was not significant (p = 0.76), indicating that indirect estimates were consistent with direct evidence (Supplementary Table 5). Loop-specific heterogeneity was explored with an inconsistency plot (Supplementary Fig. 1). The inconsistency factor (IF) did not indicate high inconsistency (IF = 0.35). The heterogeneity term, τ², was 0.000 for the network, and therefore the assumption of consistency was upheld.
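The idea of comparing direct and indirect evidence within a closed loop can be sketched as follows. Assuming the common adjusted indirect comparison approach, the effect of A versus B is estimated through a shared comparator C, and the inconsistency factor is taken as the absolute difference between the direct and indirect estimates. All effect sizes and standard errors below are hypothetical, and the function names are illustrative only; this is not the routine used to produce the results reported above.

```python
import math


def indirect_estimate(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of A vs B through a common comparator C."""
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac**2 + se_bc**2)  # variances of independent estimates add
    return d_ab, se_ab


def inconsistency_factor(d_direct, se_direct, d_indirect, se_indirect):
    """Absolute direct-indirect difference, its standard error, and z statistic."""
    diff = abs(d_direct - d_indirect)
    se = math.sqrt(se_direct**2 + se_indirect**2)
    return diff, se, diff / se


# Hypothetical SMDs (standard errors) for one closed loop A-B-C.
d_ind, se_ind = indirect_estimate(d_ac=0.9, se_ac=0.3, d_bc=0.4, se_bc=0.4)
inc, se_inc, z = inconsistency_factor(0.8, 0.5, d_ind, se_ind)

print(f"indirect A vs B = {d_ind:.2f} (SE {se_ind:.2f})")
print(f"inconsistency factor = {inc:.2f}, z = {z:.2f}")
print("significant inconsistency" if z > 1.96 else "no significant inconsistency")
```

Because the indirect estimate inherits the uncertainty of both of its component comparisons, its standard error is larger than either, which is one reason inconsistency tests in sparse networks tend to have low power.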

Fig. 3

Network meta-analysis graphs for comparison of different interventions on motor function, social function, communication, and stereotyped behavior. The size of the nodes relates to the number of participants in that intervention type, and the thickness of lines between the interventions relates to the number of studies for that comparison. CON = Control group; AE = Aquatic exercise; FMS = Fundamental motor skill intervention; GE = Gymnastic exercises; ICPL = I can have physical literacy; KT = Kata techniques; MBTP = Mini-basketball training program; OFP = Organized football program; OAP = Outdoor adventure program; PME = Perceptual-motor exercises; SPARK = Sport, Play, and Active Recreation for Kids; TBT = Trampoline-based training; THR = Therapeutic horseback riding; TTE = Table tennis exercise; YTP = Yoga training program

For the 11 interventions compared on motor function, there are 55 pairwise relative effect estimates (11 × 10 / 2). These are presented in a square matrix called a league table, which contains the relative effectiveness of all possible pairs of interventions. The results for motor function using network meta-analysis are presented in Fig. 4 and Supplementary Fig. 2. Compared with controls, THR (SMD: 2.08, 95% CI: 1.24 to 2.91) and Tai Chi Chuan (SMD: 4.47, 95% CI: 2.33 to 6.61) yielded significantly better outcomes, while the other types of intervention did not differ significantly from controls in improving motor function. Furthermore, Tai Chi Chuan was significantly more effective than the other interventions in improving motor function. AE (SMD: -1.80, 95% CI: -3.10 to -0.51), SPARK (SMD: -1.65, 95% CI: -2.86 to -0.44), Kinect (SMD: -2.07, 95% CI: -3.46 to -0.68), and GE (SMD: -1.87, 95% CI: -3.37 to -0.38) were significantly less effective than THR in improving motor function. The ranking of intervention effects based on SUCRA is shown in Fig. 5 and Supplementary Fig. 3. Tai Chi Chuan was ranked as the most effective intervention (SUCRA: 99.4%), followed by THR (SUCRA: 83.1%). Kinect ranked lowest for efficacy among the interventions (SUCRA: 18.6%), and all of the interventions ranked higher than the control group. Furthermore, the comparison-adjusted funnel plot is provided in Supplementary Fig. 4, and no significant asymmetry was found.
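The size of the league table and the reading of its entries can be checked with a short sketch. The first part counts the unordered pairs among 11 nodes (the node labels T1 to T11 are placeholders, not the actual intervention names). The second part applies the usual rule that a 95% CI excluding zero indicates a statistically significant difference, using only the estimates quoted in the paragraph above.

```python
from itertools import combinations
from math import comb

# Placeholder labels for 11 nodes in the motor function network.
nodes = [f"T{i}" for i in range(1, 12)]
pairs = list(combinations(nodes, 2))
print(len(pairs), comb(len(nodes), 2))

# (SMD, lower 95% limit, upper 95% limit) as quoted in the text.
reported = {
    "THR vs CON": (2.08, 1.24, 2.91),
    "Tai Chi Chuan vs CON": (4.47, 2.33, 6.61),
    "AE vs THR": (-1.80, -3.10, -0.51),
    "SPARK vs THR": (-1.65, -2.86, -0.44),
    "Kinect vs THR": (-2.07, -3.46, -0.68),
    "GE vs THR": (-1.87, -3.37, -0.38),
}


def excludes_zero(lower, upper):
    """True when the confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


for label, (smd, lower, upper) in reported.items():
    status = "significant" if excludes_zero(lower, upper) else "not significant"
    print(f"{label}: SMD {smd:+.2f} [{lower:.2f}, {upper:.2f}] -> {status}")
```

All six quoted intervals exclude zero, which matches their description as significant differences.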

Fig. 4

League table representing summary estimates from network meta-analysis. Comparisons of the effects (SMD (95% CI)) of different interventions on motor function, social function, communication, and stereotyped behavior using network meta-analysis. Significant differences are highlighted by bold type. CON = Control group; AE = Aquatic exercise; FMS = Fundamental motor skill intervention; GE = Gymnastic exercises; ICPL = I can have physical literacy; KT = Kata techniques; MBTP = Mini-basketball training program; OFP = Organized football program; OAP = Outdoor adventure program; PME = Perceptual-motor exercises; SPARK = Sport, Play, and Active Recreation for Kids; TBT = Trampoline-based training; THR = Therapeutic horseback riding; TTE = Table tennis exercise; YTP = Yoga training program

Fig. 5

The surface under cumulative ranking curves (SUCRAs) for the assessment of different interventions on motor function, social function, communication, and stereotyped behavior. Rank 1 indicates the probability that a treatment is best. CON = Control group; AE = Aquatic exercise; FMS = Fundamental motor skill intervention; GE = Gymnastic exercises; ICPL = I can have physical literacy; KT = Kata techniques; MBTP = Mini-basketball training program; OFP = Organized football program; OAP = Outdoor adventure program; PME = Perceptual-motor exercises; SPARK = Sport, Play, and Active Recreation for Kids; TBT = Trampoline-based training; THR = Therapeutic horseback riding; TTE = Table tennis exercise; YTP = Yoga training program

Data from pairwise meta-analysis provided evidence of considerable heterogeneity (I² = 77%) among the three studies (Borgi et al., 2016; Gabriels et al., 2015; Steiner & Kertesz, 2015) on the effect of THR on motor function. Sensitivity analysis showed that heterogeneity dropped to 0% after removal of the study that differed most from the others (Steiner & Kertesz, 2015). No single study was influential enough to change the conclusions (Supplementary Table 6).
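The heterogeneity statistic and the leave-one-out procedure can be illustrated with a minimal sketch. It computes I² from Cochran's Q under inverse-variance weighting, classifies it using the thresholds stated in the Methods, and then recomputes I² with each study omitted in turn. The three effect sizes and standard errors are invented to show the pattern of one outlying study; they are not the data from the THR studies.

```python
def i_squared(effects, ses):
    """I-squared (%) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:  # also covers q == 0, avoiding division by zero
        return 0.0
    return 100.0 * (q - df) / q


def classify(i2):
    """Thresholds from the Methods: low < 25%, moderate 25-50%, high > 50%."""
    if i2 < 25:
        return "low"
    if i2 <= 50:
        return "moderate"
    return "high"


def leave_one_out(effects, ses):
    """I-squared recomputed with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_ses = ses[:i] + ses[i + 1:]
        results.append(i_squared(kept_effects, kept_ses))
    return results


# Hypothetical SMDs and standard errors; the third study is the outlier.
effects = [0.5, 0.5, 2.0]
ses = [0.3, 0.3, 0.3]

overall = i_squared(effects, ses)
print(f"all studies: I2 = {overall:.0f}% ({classify(overall)})")
for i, value in enumerate(leave_one_out(effects, ses), start=1):
    print(f"without study {i}: I2 = {value:.0f}% ({classify(value)})")
```

In this toy example, heterogeneity disappears only when the outlying study is removed, which is the same pattern that identifies a single study as the main source of heterogeneity.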

Social Function

Twenty studies (Bass et al., 2009; Borgi et al., 2016; Cai et al., 2020a, b; Caputo et al., 2018; Gabriels et al., 2015; García-Gómez et al., 2014; Howells et al., 2020; Koenig et al., 2012; Movahedi et al., 2013; Najafabadi et al., 2018; Pan, 2010; Sansi et al., 2021; Sotoodeh et al., 2017; Steiner & Kertesz, 2015; Wang et al., 2020; Yang et al., 2021; Zachor et al., 2017; Zhao & Chen, 2018; Zhao et al., 2021) reported social function; three of these (García-Gómez et al., 2014; Koenig et al., 2012; Sansi et al., 2021) were described qualitatively, and 17 studies comprising nine interventions and control arms were eligible for network meta-analysis (Fig. 3). According to the results of the network meta-analysis (Fig. 4, Supplementary Fig. 5), KT (SMD: 1.40, 95% CI: 0.59 to 2.20), MBTP (SMD: 0.67, 95% CI: 0.30 to 1.04), FMS (SMD: 1.20, 95% CI: 0.60 to 1.81), and THR (SMD: 1.18, 95% CI: 0.91 to 1.45) had a significantly greater effect on enhancing social function compared with the control condition. THR significantly improved social function compared with other interventions except for SPARK (SMD: -0.41, 95% CI: -1.26 to 0.44) and FMS (SMD: 0.02, 95% CI: -0.64 to 0.68). In addition, KT (SMD: 1.16, 95% CI: 0.07 to 2.25) and FMS (SMD: 0.97, 95% CI: 0.02 to 1.92) also showed greater efficacy than YTP. The results of the SUCRAs (Fig. 5, Supplementary Fig. 6) indicated that KT (SUCRA: 89.0%) had the greatest likelihood of being the most effective intervention for improving social functioning, followed by THR (SUCRA: 84.0%) and FMS (SUCRA: 82.6%). YTP (SUCRA: 23.3%) was most likely to be the least effective for social function, and all interventions ranked higher than the control group. The comparison-adjusted funnel plots indicated that there was no significant publication bias (Supplementary Fig. 7).

We found high heterogeneity (I² = 83%) among the five studies (Bass et al., 2009; Borgi et al., 2016; Gabriels et al., 2015; Steiner & Kertesz, 2015; Zhao et al., 2021) of THR on social function (Supplementary Table 6). Sensitivity analysis showed that heterogeneity dropped to 0% after removal of the two studies (Borgi et al., 2016; Steiner & Kertesz, 2015) that differed most from the others. In the remaining three studies (Bass et al., 2009; Gabriels et al., 2015; Zhao et al., 2021), the intervention duration ranged from 10 to 16 weeks, whereas Steiner and Kertesz (2015) used a shorter intervention period (4 weeks) and Borgi et al. (2016) used a longer one (25 weeks). Furthermore, two studies (Bass et al., 2009; Gabriels et al., 2015) used the Social Responsiveness Scale (SRS) as the outcome measure, while the other three studies used the Vineland Adaptive Behavior Scale (VABS) (Borgi et al., 2016), Pedagogical Analysis and Curriculum (PAC) (Steiner & Kertesz, 2015), and the Social Skills Improvement System Rating Scales (SSIS) (Zhao et al., 2021). The heterogeneity may therefore reflect differences in intervention duration and outcome measures across studies. Sensitivity analyses showed that no single study was influential enough to change the conclusions.

Communication

Ten studies (Bahrami et al., 2016; Caputo et al., 2018; Gabriels et al., 2015; Howells et al., 2020; Koenig et al., 2012; Silva et al., 2009; Sotoodeh et al., 2017; Steiner & Kertesz, 2015; Zhao & Chen, 2018; Zhao et al., 2021) reported communication, two studies (Koenig et al., 2012; Silva et al., 2009) were qualitatively described, and eight studies including seven intervention types were eligible for network meta-analysis (Fig. 3). The network meta-analysis (Fig. 4, Supplementary Fig. 8) revealed that THR (SMD: 0.43, 95% CI: 0.14 to 0.71) and FMS (SMD: 0.90, 95% CI: 0.32 to 1.49) were statistically superior to the control group in improving communication. According to the SUCRAs (Fig. 5, Supplementary Fig. 9), the physical activity intervention ranked as having the greatest likelihood of being the most effective was FMS (SUCRA: 85.9%). YTP (SUCRA: 28.9%) was most likely to be the least effective for communication. All of the physical activity interventions ranked better than the control group. The comparison-adjusted funnel plots indicated that there was no significant publication bias (Supplementary Fig. 10).

We found high heterogeneity (I² = 91%) among the three studies (Gabriels et al., 2015; Steiner & Kertesz, 2015; Zhao et al., 2021) on the effect of THR on communication (Supplementary Table 6), and sensitivity analysis indicated that the heterogeneity persisted when any single study was removed. Examination of the three studies suggested that the heterogeneity may be caused by differences in the number of sessions and in the outcome measures. Gabriels et al. (2015) reported 10 sessions of THR intervention and used the VABS to measure communication ability; Steiner and Kertesz (2015) conducted 4 intervention sessions and used the PAC scale for measurement; Zhao et al. (2021) conducted 32 intervention sessions and used the SSIS to assess communication. No single study was influential enough to change the conclusions.

Stereotyped Behavior

Seven studies reported stereotyped behavior (Bahrami et al., 2012; Gabriels et al., 2015; Koenig et al., 2012; Moradi et al., 2020; Nazemzadegan et al., 2016; Tabeshian et al., 2021; Wang et al., 2020), and five studies (Bahrami et al., 2012; Gabriels et al., 2015; Moradi et al., 2020; Nazemzadegan et al., 2016; Wang et al., 2020) involving five interventions and control arms were eligible for network meta-analysis (Fig. 3). The results indicated that, compared with no intervention, KT (SMD: 0.90, 95% CI: 0.15 to 1.66) and PME (SMD: 0.82, 95% CI: 0.24 to 1.40) significantly alleviated stereotyped behaviors (Fig. 4, Supplementary Fig. 11). The results of the SUCRAs (Fig. 5, Supplementary Fig. 12) revealed that, in comparison with the control condition, KT (SUCRA: 81.0%) was the best intervention, and THR (SUCRA: 30.6%) was most likely to be the least effective for reducing stereotyped behaviors. All of the physical activity interventions ranked better than the control group. The comparison-adjusted funnel plots indicated that there was no significant publication bias (Supplementary Fig. 13).

Discussion

Summary of Evidence

In this first network meta-analysis of physical activity interventions in children with ASD, our purpose was to evaluate the efficacy of different interventions for motor function, social function, communication, and stereotyped behavior in children with ASD. According to the results of the NMA, Tai Chi Chuan appeared to be the most promising intervention for motor function, KT might be optimal for social functioning and stereotyped behaviors, and FMS should be considered as an optimal intervention for improving communication.

Comparisons with Previous Studies and Theoretical Implications of Results

Consistent with the results of this study, previous systematic reviews and meta-analyses showed robust benefits of different physical activities (e.g., fundamental motor skill intervention, aquatic, and simulated horse-riding) on motor function in children with ASD (Case & Yun, 2019; Healy et al., 2018; Ruggeri et al., 2020; Sam et al., 2015; Sowa & Meulenbroek, 2012).

Despite the numerous reported benefits of physical activity, physical activity levels in children with ASD are significantly lower than those of typically developing peers (Borremans et al., 2010; Macdonald et al., 2011). This physical inactivity could be attributed to their social and communication impairments as well as problems in behavioral, sensory, and motor domains (Damme et al., 2015; Srinivasan et al., 2014). In addition, individuals with ASD typically have difficulties with balance, postural stability, gait, joint flexibility and movement speed, which may be exacerbated by reduced opportunities to participate in physical activity (Lang et al., 2010). Given the potential physical, psychological and behavioral benefits of increased exercise for people with ASD, there is a need to better understand which specific physical activity interventions are most appropriate.

In the network meta-analysis for motor function, the SUCRA for Tai Chi Chuan reached 99.4%, suggesting that it is likely to be the most effective intervention. Tai Chi Chuan incorporates balance and mental stimulation and focuses on promoting balance, proprioceptive function and body awareness, which may contribute to improving cognitive and behavioral problems in individuals with ASD. Additionally, the repetitive sit-to-stand movements, balance, relaxation, and mental stimulation tasks in Tai Chi Chuan, as well as sensory-motor exercises, can better integrate the left and right hemispheres and increase environmental perception, which can lead to increased tolerance thresholds and decreased anxiety (Gatts, 2008; Sarabzadeh et al., 2019). Furthermore, from the perspective of neuropathology, Tai Chi Chuan, as a moderate-intensity physical activity, stimulates the secretion of cytokines or metabolic hormones from several organs that play an important role in regulating neuronal metabolism, neuroinflammation and neuroplasticity underlying changes in brain function (Bay & Pedersen, 2020; Ignácio et al., 2019; Murphy et al., 2020). Tai Chi Chuan can therefore be considered as an intervention to increase the physical activity level and improve the motor function of children with ASD. In addition, THR also showed notable beneficial effects on motor function in our study. THR is an animal-assisted therapy that can improve motor function through the rhythmic equine movements imposed on the participant’s body, which improve balance, muscle symmetry, coordination, and posture. We also found that THR can improve social and communication deficits, possibly because of the recognized ability of some animals to interact positively with people, which can potentially counteract the social withdrawal of these individuals (Borgi et al., 2016).

Motor and social communication impairments in children with ASD are linked. Physical activities not only enable children to engage in social and communicative behavior, but also improve their ability to receive perceptual information from their surroundings. Motor clumsiness may contribute to missed opportunities and reduced contact with coordinated and agile peers, and may lead to delayed social skills and long-term social impairment (Bhat et al., 2011). The results of the SUCRAs revealed that FMS (SUCRA: 85.9%) appears to have optimal benefits for communication. Participating in FMS (e.g., group ball exercise) allows children with ASD to experience a fun activity with their peers and may promote the development of social, emotional, and communication skills (Zhao & Chen, 2018). Several systematic reviews (Bremer et al., 2016; Chan et al., 2021; Howells et al., 2019; Huang et al., 2020) have also demonstrated the efficacy of physical activity interventions (e.g., jogging, horseback riding, martial arts, and swimming) in improving the communication and social functioning of individuals with ASD.

The results of the SUCRAs indicated that KT (SUCRA: 89.0%) had the greatest likelihood of being the most effective intervention for improving social functioning. KT may provide a great deal of social interaction for children with ASD, which can lead to a reduction in their social dysfunction. From a neurochemical perspective, abnormal levels of neurotransmitters such as oxytocin and serotonin in individuals with ASD are associated with social functioning (Blatt, 2010), and KT may improve the synthesis and metabolism of neurotransmitters in the brain, which may in turn decrease social dysfunction in children with ASD. In addition, according to the SUCRAs, KT (SUCRA: 81.0%) was also the most beneficial intervention for improving stereotyped behavior. Several studies (Ferreira et al., 2019; Huang et al., 2020; Petrus et al., 2008; Tarr et al., 2020; Teh et al., 2021) also reported positive effects of physical activity on alleviating stereotyped behaviors in individuals with ASD. One possible explanation is that fatigue resulting from physical activity ameliorates maladaptive behaviors (Lang et al., 2010; Sefen et al., 2020). Stereotypic behaviors are often considered to be maintained by the pleasurable intrinsic effects that the behavior itself has on the individual. The physical stimulation obtained through exercise may be similar to that obtained through stereotypic behavior in some children (Rapp et al., 2004). KT may involve body mechanics similar to those of stereotypic behaviors and may therefore produce similar internal states; it is therefore possible that stereotypic behaviors are reallocated to physical activities that receive other reinforcement.
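For readers less familiar with the ranking metric, the SUCRA values discussed above (e.g., 99.4% for Tai Chi Chuan on motor function) summarize each intervention's rank probabilities in a single number between 0 and 1. The following is a minimal sketch of the standard SUCRA definition, not the code used in this review; the function name `sucra`, the treatment labels and the rank probabilities are hypothetical and are not data from the included studies.

```python
from itertools import accumulate


def sucra(rank_probs):
    """Return the SUCRA for one treatment.

    rank_probs[k] is the probability that the treatment holds rank k + 1,
    where rank 1 is the best. SUCRA is the mean of the cumulative rank
    probabilities over the first (a - 1) ranks, with a = number of treatments.
    """
    a = len(rank_probs)
    if a < 2:
        raise ValueError("SUCRA needs at least two competing treatments")
    if abs(sum(rank_probs) - 1.0) > 1e-6:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = list(accumulate(rank_probs))
    # The last cumulative value is always 1 and is excluded by definition.
    return sum(cumulative[:-1]) / (a - 1)


# Hypothetical rank probabilities for three made-up treatments (illustration only).
example_rank_probs = {
    "treatment_A": [0.9, 0.1, 0.0],
    "treatment_B": [0.1, 0.6, 0.3],
    "treatment_C": [0.0, 0.3, 0.7],
}

for name, probs in example_rank_probs.items():
    print(f"{name}: SUCRA = {sucra(probs):.1%}")
```

A SUCRA close to 100% means the treatment is almost always ranked near the top, and a value close to 0% means it is almost always ranked near the bottom. SUCRA describes relative ranking only and carries no information about the size or precision of the underlying effect estimates.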

The quality of some of the studies included in this systematic review is limited, which may introduce bias and affect the credibility of the results. Similarly, the previous meta-analyses were also limited by the existing studies, and the included studies in the analysis had the same limitations of high risk of bias, small number of participants and low quality of studies (Chan et al., 2021; Howells et al., 2019; Zhang et al., 2020). While many of the existing studies in this area have apparent biases and design limitations, they nonetheless provide insight into the positive effects of physical activity on children with ASD.

The network meta-analysis conducted in our study extends the previous findings by providing details on which type of physical activity is most appropriate for children with ASD. However, the variation in the measurement instruments for the outcomes (motor function, social function, communication and stereotyped behavior) across studies may be a contributing factor to the wide heterogeneity in the results of this meta-analysis. We selected the standardized mean difference (SMD) as the effect size to minimize the effect of differences in measurement instruments and units across studies on the results (Higgins & Green, 2011). The application of standardized assessment instruments in future studies could increase the reliability and consistency of results.
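To illustrate why an SMD makes results from different instruments comparable, the sketch below computes one common form of the SMD, Hedges' adjusted g, from group summary statistics. Whether this exact variant matches the one used in our analysis is an assumption; the function name `hedges_g` and all numbers are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction.

    The mean difference is divided by the pooled standard deviation, then
    multiplied by the approximate correction factor J = 1 - 3 / (4 * df - 1),
    where df = n_t + n_c - 2.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    j = 1 - 3 / (4 * df - 1)
    return d * j


# Hypothetical example: the same effect measured on two instruments whose
# scores differ by a factor of 10 gives the same SMD.
g_scale_a = hedges_g(10.0, 2.0, 15, 8.0, 2.0, 15)
g_scale_b = hedges_g(100.0, 20.0, 15, 80.0, 20.0, 15)
print(round(g_scale_a, 3), round(g_scale_b, 3))
```

Because the mean difference is expressed in standard deviation units, the scale of the instrument cancels out. Differences in what the instruments actually measure are not removed, which is one reason heterogeneity can persist.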

Strengths and Limitations

To our knowledge, our review is the first to use network meta-analysis to quantitatively compare the effectiveness of different physical activity interventions for children with ASD. This design can provide indirect comparisons and allow firmer conclusions to be drawn about the comparative effectiveness and relative rankings of different interventions. The current review conducted a systematic literature search, using standardized inclusion and exclusion criteria to include relevant studies and to determine effect sizes from available data.

However, this network meta-analysis has several noteworthy limitations. First, the measurement instruments for the outcomes and the characteristics of the interventions (e.g., duration and length of the experimental period) varied across the included studies. These differences may cause heterogeneity and reduce the precision of the overall effect estimates. Using SMDs reduces the influence of differing measurement scales, although it does not remove heterogeneity itself. In addition, we performed sensitivity analysis and found that the studies contributing to heterogeneity did not alter the conclusions. Our network meta-analysis also could not explore the most effective exercise dose, another important factor in the effectiveness of rehabilitation, because of limitations in the design of the included studies. Furthermore, the sample sizes of the included studies were relatively small; only eight of the included studies had more than 30 participants in each comparison group, which introduced imprecision into the indirect comparisons. Second, some comparisons were limited by the number of included studies; there were not enough studies to enable us to directly compare the effects of different interventions. Additionally, the network meta-analysis included only five intervention studies that used stereotypical behaviors as an outcome measure, and the sample size of each was relatively small. Third, some publication bias may have been caused by our exclusion of studies published in languages other than English, although comparison-adjusted funnel plots did not indicate publication bias. Finally, the high risk of bias in some of the included studies could threaten the overall validity of the current findings. On the one hand, some of the included studies had a high risk of bias in the randomization process, which may introduce selection bias, in that some participants were more likely to be exposed to a particular intervention than others. On the other hand, most of the included studies had a high risk of bias in the measurement of the outcome, which may result from the outcome assessors’ knowledge of participants’ group allocation and their expectations of the intervention’s effects. This can affect the objectivity of outcome assessment (e.g., parents of participants in the intervention group are more likely to see outcome improvements in their children).

Implications and Future Research

We believe that future research is required to address several important issues. First, exercise intervention models for the ASD population are still not described in sufficient detail, particularly regarding the intensity, volume, and frequency of exercise. It is therefore not possible to determine which exercise prescription is optimal for children with ASD. We recommend that future studies examine the effects of different exercise intervention protocols. Second, the high risk of bias in the included studies highlights the need for more high-quality RCTs with larger numbers of participants across different intervention types, more rigorous randomization processes, and blinded outcome assessment. Such trials will contribute to more reliable and accurate results on the effects of physical activity on motor functioning, social functioning, communication and stereotyped behaviors in children with ASD.

Additionally, future large-scale studies with direct comparisons between different interventions are needed to validate the effectiveness of the best interventions we identified. We also recommend that stereotyped behavior should be the focus of experimental outcomes in future RCTs. Finally, we suggest that standardized assessment instruments be considered, as the majority of studies included in the current review assessed outcomes via different instruments. Moreover, regularly updating network meta-analyses on this topic is crucial for researchers, as new data are continually released.

Conclusions

This network meta-analysis provides valuable information for the clinical application of physical activity as a strategy complementary to other therapies in the management of ASD. Our review suggests that Tai Chi Chuan may be the most promising intervention for motor function, that KT may be optimal for improving social functioning and stereotyped behaviors, and that FMS may be optimal for improving communication. It is crucial that therapists work with people with ASD to identify a modality suited to their capabilities and interests to increase the likelihood of efficacy.