Introduction

Autism spectrum disorder (ASD) is characterised by deficits in social skills and communication, restricted interests and repetitive patterns of behaviour (American Psychiatric Association, 2013). The overall estimated prevalence of ASD is 27.6 per 1000 (one in 36) in children aged 8 years, varying from 23.1 to 44.9 per 1000 across the sites of the CDC-established Autism and Developmental Disabilities Monitoring network (Maenner et al., 2023). ASD typically manifests in the first years of life with a wide range of clinical presentations, and it persists throughout life.

ASD diagnosis is still based solely on behavioural observation and anamnestic data, according to clinical diagnostic criteria. Indeed, the neurobiological bases and the etiopathogenesis of the disorder are still unknown and appear to be extraordinarily complex (Lord et al., 2020). Early diagnosis is critical because it allows earlier intervention, which has been shown to be one of the factors that can influence developmental trajectories (Lord et al., 2020).

Conventional therapeutic strategies in ASD aim to promote social engagement, interaction and communication. A growing literature addresses evidence-based interventions in ASD, namely manualised interventions, which are mainly behavioural and developmental approaches such as applied behaviour analysis (Politte et al., 2015) or the Early Start Denver Model (Sam et al., 2020). However, there is a gap between current knowledge of evidence-based practices (EBPs) and their routine use by clinical practitioners (Sam et al., 2020). On the one hand, research settings differ from daily clinical practice: most interventions are delivered by researchers in tightly controlled scenarios under highly standardised conditions (Hume et al., 2021) that cannot be replicated in daily clinical practice. In addition, most studies use single-case designs, which makes their effect size difficult to assess (Wong et al., 2015). Nevertheless, many EBP reviews still include these studies so as not to ignore the largest body of literature on focused interventions for ASD (Hume et al., 2021). On the other hand, the outcomes considered range from core deficits of autism, such as communication and social behaviour, to task-specific skills such as joint attention (Wong et al., 2015; Hume et al., 2021), which makes comparison among studies harder. Moreover, for each of these skills, many different metrics are used as outcome measures, including standardised and validated scales administered by researchers, questionnaires completed by parents or caregivers and discrete observational measures of a target behaviour (Wong et al., 2015; Hume et al., 2021).

Alongside conventional therapies, alternative methods are being developed, such as computer-based and tablet-based applications that take advantage of the interests of children with ASD. Among these technologies, the use of robots, in particular socially assistive robots (SARs), has gained attention over the last 20 years.

SARs can be used to ease interaction with human users, providing assistance with measurable progress and enhancing therapy, eventually promoting learning and improving quality of life (Papadopoulos et al., 2020). SARs have already been used with older adults to increase the frequency of physical exercise and to monitor self-medication and social interaction (Martinez-Martin et al., 2020). In ASD specifically, robots have been introduced as a “bridge” to facilitate social communication and interaction, or as mediators for the recognition and codification of emotions and feelings (Syriopoulou-Delli et al., 2020).

Nowadays, robotic systems are being designed to support clinicians in diagnosis and therapy protocols, with the ultimate goal of entering daily life at home or at school. Regarding therapy, which is expected to have the highest impact, most studies have aimed at improving social skills. However, recent studies have also used SARs to train motor functions, which can be altered in patients with ASD, as described in Jouaiti and Hénaff (2019). Robots behave repetitively and have a stylised appearance, which makes them more predictable and emotionally more comfortable for children with ASD (Diehl et al., 2012). Moreover, they can smoothly execute one task at a time, making the learning process more focused and simpler (Syriopoulou-Delli et al., 2020). Finally, unlike other rehabilitation technologies such as computer-mediated games and virtual reality, they are concrete objects occupying a physical space in the therapy room (Winkielman et al., 2016). In this way, skills acquired in therapy may be more readily generalised and transferred to everyday life.

Although some studies in this field have already demonstrated positive outcomes, a comprehensive analysis of the applications of robotics for autism is still missing. Some reviews focus only on SAR pilot studies (Martinez-Martin et al., 2020; Saleh et al., 2020), while others are restricted to studies with robust evidence of the effectiveness of robot-mediated therapies (Salimi et al., 2021). For example, Mazon et al. (2019) consider only randomised controlled trials and studies with a control group, to determine the statistical significance and feasibility of robot-mediated therapies. However, this choice results in a small sample (10 studies), which is compensated for by including studies on other technologies, such as virtual scenarios for ASD rehabilitation trials. The same approach is used by DiPietro et al. (2019), who analyse 18 studies and give a general overview of the therapy targets and purposes of each study. By widening the inclusion criteria (including studies published since 1900 rather than since 2015, as in DiPietro et al. (2019)), Ismail et al. (2019) increase the sample size to 41. This review focuses more on the clinical aspects of these therapies, while, from the technological point of view, only the type of robot used is described. In contrast, Syriopoulou-Delli et al. (2020) focus not on the design of the clinical protocol but on the outcome metrics of the interventions. In this latter work, studies are grouped according to the improvements achieved in eye contact, verbal communication, imitation and other social skills. The results are described, but a comprehensive analysis of the robotic interventions is missing.

With this work, we provide a comprehensive review and analysis of the literature to fill the gaps found in previous reviews, presenting evidence on how robots can be used in current clinical practice. We included studies from multiple areas of expertise (clinical and engineering), merging their results to provide a complete picture of the potential impact on patients. This review aims to gather the heterogeneous results available in the literature and to identify critical issues, in order to point the way for future studies. By analysing both the technology-focused and the clinically focused studies of the last 5 years, we discuss the main technological challenges of robot-mediated ASD therapies. In this way, we specifically aim to extract guidelines defining (i) the best interaction scenarios and (ii) the best outcome measures to support the widespread adoption of robots for people with ASD.

To achieve these goals, our methodology draws on the different approaches of the reviews in the literature. In the first part, we define the screening criteria used to select the papers included in this review. We then define global categories to evaluate the different studies in terms of technological and clinical challenges. In the second part, we perform a meta-analysis of the papers with the strongest level of evidence. In the subsequent section, we present the main results of our methodology. Finally, we discuss these results in light of our two principal aims and outline possible directions for future studies.

Methods

Database Searching and Studies Identification

We searched the Web of Science (https://apps.webofknowledge.com/), PubMed (https://pubmed.ncbi.nlm.nih.gov/) and Scopus (https://www.scopus.com/) databases. The keywords used were “Robots” and “Autism”. We deliberately restricted the search to these two keywords to provide a full overview of all the possible interaction scenarios and outcome measures when robots are used in the lives of people with autism, rather than focusing only on specific applications. Figure 1 shows an exponential increase in the number of studies on this topic recently published in Web of Science. Considering that the most cited and most recent review on this topic was that of Pennisi et al. (2016), and that we wanted to analyse the recent growth of this topic, we focused our search on studies published between January 2016 and October 2020.

Fig. 1

Evolution of the number of studies published in Web of Science with the keywords “Robots” and “Autism”

We then removed duplicates shared across the databases.

Screening Criteria

After a first screening of titles and abstracts, non-relevant studies were excluded. We then applied the following exclusion criteria:

  • Not related to autism: We excluded all studies whose main focus was not autism. Some studies mentioned future applications to autism but were centred on other disorders, for example, cerebral palsy.

  • Without robot: The studies had to use at least one robot. This criterion is intended to exclude studies in which avatars or videos were used ahead of a future implementation with robots.

  • Insufficient testing: To allow a minimal quantitative analysis, we excluded studies in which fewer than two participants took part in the robotic interactions.

  • Cumulative studies: We also excluded studies that presented the same study design and participants as other studies.

  • Aggregated disorders: Studies focused mainly on autism that also included participants with other disorders were excluded if the results for the participants with autism were unclear or aggregated with those for the other disorders.

  • Other reasons: The study did not clearly describe the role of the robot, or the robot was used as a model of the disorder rather than for interaction.
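The exclusion criteria above can be read as a sequence of yes/no checks applied to each record. The Python sketch below is a hypothetical illustration of that logic, not the procedure or tool used for this review: the Record fields are assumptions standing in for judgements the reviewers make when reading a paper, and each excluded record keeps only the first criterion it meets, so that exclusion counts do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical screening record; field names are illustrative, not the review's form."""
    title: str
    autism_focus: bool           # main focus of the study is autism
    uses_robot: bool             # at least one robot is used
    n_participants: int          # people who actually interacted with the robot
    duplicate_cohort: bool       # same design and participants as another study
    asd_results_separable: bool  # ASD results reported apart from other disorders
    robot_role_clear: bool       # robot role described and used for interaction


# One predicate per exclusion criterion; True means "exclude".
# Dict order is the order in which the criteria are checked.
EXCLUSION_CRITERIA = {
    "not related to autism": lambda r: not r.autism_focus,
    "without robot": lambda r: not r.uses_robot,
    "insufficient testing": lambda r: r.n_participants < 2,
    "cumulative study": lambda r: r.duplicate_cohort,
    "aggregated disorders": lambda r: not r.asd_results_separable,
    "other reasons": lambda r: not r.robot_role_clear,
}


def screen(records):
    """Return (included, excluded); each exclusion keeps only the first matching reason."""
    included, excluded = [], []
    for r in records:
        reason = next((name for name, hit in EXCLUSION_CRITERIA.items() if hit(r)), None)
        if reason is None:
            included.append(r)
        else:
            excluded.append((r, reason))
    return included, excluded


# Hypothetical records, not studies from this review.
records = [
    Record("A", True, True, 12, False, True, True),
    Record("B", True, True, 1, False, True, True),
    Record("C", False, True, 8, False, True, True),
]
included, excluded = screen(records)
```

Here record A passes every check, B is excluded for insufficient testing and C for not being related to autism.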

Fig. 2

PRISMA chart of the proposed review

Evaluation Categories

The eligible studies were classified according to seven main criteria:

  • Purpose

  • Robots

  • Human–robot interaction

  • Session scenario

  • Sessions timing

  • Participants

  • Evaluation measures

For some of these areas, multiple categories were identified a priori, while for others the division emerged from the exploratory review. Specifically, the main purposes of the studies were therapy, diagnosis, teaching and technological platform development, which we named design. Following previous reviews, the type of robot used in each study was categorised as humanoid or non-humanoid (Pennisi et al., 2016; Jouaiti and Hénaff, 2019; Syriopoulou-Delli et al., 2020). Regarding human–robot interaction, we analysed the type of sensors used during the interaction, the strategies used to control the robot and the type of feedback provided during the interaction. According to Dautenhahn (2020), five control strategies can be distinguished:

  • Wizard of Oz, where the robot is fully remotely controlled by a hidden operator.

  • Hybrid control, where the robot has some autonomous behaviours, but overall control is exercised through an interface operated by an adult or a child.

  • Semi-autonomous control, where the robot behaves autonomously, but all the decisions require the approval of the supervising adult.

  • Fully autonomous strategy, where the robot behaves autonomously with no intervention by any operator.

  • Autonomous and adaptive strategy, where the robot perceives the reactions of the subjects and adapts its behaviours according to them.
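Because the five strategies differ along a few questions (does the robot act on its own, is it driven through an interface, does an adult approve its decisions, does it adapt to the subject), they can be coded as ordered categories of increasing autonomy. The Python sketch below assumes a hypothetical set of four yes/no features coded per study; the order of the checks is our reading of the definitions above, not a scheme taken from Dautenhahn (2020).

```python
from enum import IntEnum


class Control(IntEnum):
    """Control strategies, ordered by increasing robot autonomy."""
    WIZARD_OF_OZ = 1
    HYBRID = 2
    SEMI_AUTONOMOUS = 3
    FULLY_AUTONOMOUS = 4
    AUTONOMOUS_ADAPTIVE = 5


def classify(has_autonomous_behaviours: bool,
             operator_drives_interface: bool,
             decisions_need_approval: bool,
             adapts_to_user: bool) -> Control:
    """Map hypothetical coded study features to one control category."""
    if not has_autonomous_behaviours:
        return Control.WIZARD_OF_OZ         # a hidden operator controls everything
    if operator_drives_interface:
        return Control.HYBRID               # some autonomy, overall control via interface
    if decisions_need_approval:
        return Control.SEMI_AUTONOMOUS      # autonomous, but an adult approves decisions
    if adapts_to_user:
        return Control.AUTONOMOUS_ADAPTIVE  # perceives reactions and adapts behaviour
    return Control.FULLY_AUTONOMOUS         # no operator intervention at all
```

Using an IntEnum keeps the autonomy ordering explicit, so comparisons such as Control.HYBRID < Control.SEMI_AUTONOMOUS hold by construction.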

We also examined the session design, namely the therapy location (research laboratory, clinical, home or school setting) and the session features, i.e. dyadic or triadic interaction, number of sessions, number of sessions per week and session duration. Then, we analysed the main characteristics of the participants in terms of age and number of children. Finally, we extracted the inclusion criteria used to recruit the target population and the evaluation measures used to monitor the evolution of the children’s performance across sessions.

Cross Categories Analysis

To obtain a deeper overview of the current literature and highlight critical aspects to investigate in future studies, we cross-tabulated several pairs of categories to observe the relationships between them:

  • Purpose vs control strategy

  • Setting vs control strategy

  • Purpose vs evaluation measures
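Each of these comparisons amounts to a contingency table between two coded categories. The Python sketch below builds such a table with standard-library counters and normalises it per row, so that categories with different numbers of studies can be compared; the row normalisation is an assumption, and the study records and field names are hypothetical placeholders, not data from this review.

```python
from collections import Counter


def cross_tab(studies, row_key, col_key):
    """Count studies for each (row category, column category) pair."""
    return Counter((s[row_key], s[col_key]) for s in studies)


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    totals = Counter()
    for (row, _), n in table.items():
        totals[row] += n
    return {(row, col): 100 * n / totals[row] for (row, col), n in table.items()}


# Hypothetical coded studies; the values are placeholders.
studies = [
    {"purpose": "design", "control": "Wizard of Oz", "setting": "laboratory"},
    {"purpose": "design", "control": "Wizard of Oz", "setting": "laboratory"},
    {"purpose": "design", "control": "fully autonomous", "setting": "laboratory"},
    {"purpose": "diagnosis", "control": "semi-autonomous", "setting": "clinic"},
    {"purpose": "therapy", "control": "autonomous adaptive", "setting": "home"},
]

table = cross_tab(studies, "purpose", "control")
pct = row_percentages(table)
```

Passing "setting" instead of "purpose" as row_key gives the setting vs control strategy table with the same code.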

Randomised Controlled Trials and a Meta-Analysis

After this comprehensive overview of the topic, we focused on the therapy studies with the highest level of evidence, namely randomised controlled trials. Firstly, we performed a qualitative analysis, focusing on the categories previously presented and especially on the outcome measures used. We checked whether the results on these measures were significantly in favour of the robotic group. In particular, the clinical scales were evaluated in terms of reliability, to identify those that could be used in future studies. We then classified all the outcome measures according to the International Classification of Functioning, Disability and Health (ICF) framework (World Health Organization, 2001) to understand how they related to the daily life of people with ASD.

Table 1 Most common evaluation metrics on the main areas of therapy

Secondly, we performed a quantitative analysis (meta-analysis) of the studies that reported complete numerical results. For each study, we extracted the number of participants and the means and standard deviations of the continuous outcome measures. We also assessed each article for risk of bias. We then grouped the five studies according to the ICF classification of their outcomes in order to reduce heterogeneity. Treatment effects were analysed using the standardised mean difference with 95% confidence intervals.
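For continuous outcomes reported as mean, standard deviation and sample size per arm, a common way to compute the standardised mean difference is Hedges' g, i.e. Cohen's d with a small-sample correction, together with a normal-approximation 95% confidence interval. The text above does not state which SMD variant or pooling model was used, so the Python sketch below assumes Hedges' g and an inverse-variance fixed-effect pool; the numbers in the example are invented for illustration.

```python
import math


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g (bias-corrected standardised mean difference) and its variance.

    Group 1 is taken as the robot group and group 2 as the comparison group
    (assumed convention: positive g favours the robot group).
    """
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    d = (m1 - m2) / s_pooled                 # Cohen's d
    j = 1 - 3 / (4 * df - 1)                 # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def ci95(effect, var):
    """Normal-approximation 95% confidence interval."""
    half_width = 1.96 * math.sqrt(var)
    return effect - half_width, effect + half_width


def pool_fixed(estimates):
    """Inverse-variance fixed-effect pooling of (effect, variance) pairs (assumed model)."""
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * g for w, (g, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1 / sum(weights)


# Invented example: 10 children per arm, equal SDs, robot group 2 points higher.
g, var = hedges_g(12.0, 4.0, 10, 10.0, 4.0, 10)
low, high = ci95(g, var)
```

In this invented example g is about 0.48, but the 95% interval spans zero, which illustrates how wide the intervals become with ten participants per arm.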

Results

The initial search with the keywords described in “Methods” returned 804 papers. After removing duplicates and non-relevant studies, 439 papers remained. After applying the screening criteria, 146 studies were selected for the analysis. A full overview of this process is presented as a PRISMA chart in Fig. 2. This chart already reveals a considerable number of studies (n=106) that designed robotic interaction protocols for people with autism but never tested them, or tested them with only one person.

Tables 1 and 2 of the Supplementary Material describe each selected paper according to the categories defined previously: Table 1 focuses on the technical categories and Table 2 on the clinical categories. In this section, we first present a statistical analysis of each category, followed by a meta-analysis of the randomised controlled trials collected for this review.

Table 2 Types of quantitative measurements

Purpose

The main purpose of the analysed studies was therapy, namely the development of new robotic therapies for ASD (Fig. 3) to improve specific skills such as emotion recognition (Marino et al., 2020), joint attention (Ali et al., 2019; Cao et al., 2019; Mehmood et al., 2019) and sensory processing (Javed et al., 2019). Other works focused on a global improvement of social skills (Feng et al., 2017; David et al., 2020; Yoshikawa et al., 2019) or on communicative skills, through gesture training (So et al., 2019a, 2018b) and imitation training (Di Nuovo et al., 2020), which are intrinsically related to motor skills training.

Another goal of the analysed studies was supporting the diagnostic process (Petric et al., 2017; Petric and Kovačić, 2019; Del Coco et al., 2018; Ramírez-Duque et al., 2020; Kumazaki et al., 2019b). Since ASD diagnosis is based on behaviour, some works developed automated robot-based protocols that can ease at least part of the diagnostic process. Other studies aimed at teaching people with ASD specific academic skills or subjects, such as programming and mathematics (Saadatzi et al., 2018; Choi et al., 2016; Clabaugh et al., 2019; Arshad et al., 2020). Finally, 30 studies (21% of the analysed studies) focused on the design of specific cutting-edge features (such as gaze analysis or social differences in engagement) to be integrated into larger systems, paving the way for future studies (Hirokawa et al., 2019; Rudovic et al., 2017; Wan et al., 2019; van Straten et al., 2017; Roberts-Yates et al., 2019).

Fig. 3

Purpose of the analysed studies

Robots

Humanoid robots are used in 75% of the considered studies, mainly because of their resemblance to humans and their ability to perform different tasks. Their uses range from imitation to emotion recognition. They usually have simpler expressions than humans, which can ease work with subjects with ASD (Pennisi et al., 2016). NAO is the most used humanoid robot (Fig. 4) (Alnajjar et al., 2020; Baraka et al., 2020; Amanatiadis et al., 2020; Petric and Kovacic, 2020; Qidwai et al., 2020; Billing et al., 2020; Chung, 2020; Korte et al., 2020; Barnes et al., 2021; So et al., 2020), probably because it is commercially available and thus more accessible. It has 25 degrees of freedom over the full body and several sensors (touch sensors, microphones and two cameras), which make it well suited to therapeutic sessions. Moreover, it has 16 eye LEDs and two loudspeakers, which are useful for multisensory interaction.

The next most frequently used robots in the analysed studies are Zeno and Actroid F. Zeno differs from NAO in its ability to display several facial expressions, and it has therefore been used for emotion recognition therapies (Schadenberg et al., 2020; Salvador et al., 2016; Wijayasinghe et al., 2016; Chevalier et al., 2017a; Marinoiu et al., 2018; Del Coco et al., 2018; Palestra et al., 2016). Actroid F is a female android robot that closely resembles a human interviewer in hair, facial features and voice. It has therefore been used to train adolescents with ASD in social interaction, namely for job interviews (Kumazaki et al., 2019a, c, 2018a, 2017; Yoshikawa et al., 2019).

CommU and Kaspar appear in the same number of studies. Both move only the upper part of the body. Kaspar can modulate its facial expressions and react to touch (Zaraki et al., 2020; Robins et al., 2017; Zaraki et al., 2018), while CommU has a simpler design whose main strength is the high number of degrees of freedom of its eyes, which makes it well suited to joint-attention tasks (Kumazaki et al., 2019c, 2018a, b).

Fig. 4

Humanoid robots used in the literature

Fig. 5

Types of controllers used

Human–Robot Interaction

Human–robot interaction is analysed according to the three aspects described in “Methods”: control strategies, type of sensors and type of feedback.

Control Strategies

Robot control has evolved in recent years towards increasing autonomy and adaptability. Wizard of Oz control is adopted in Rudovic et al. (2018); Kumazaki et al. (2019a); Yoshikawa et al. (2019); Anzalone et al. (2019); Valadão et al. (2016); Ishak et al. (2019). A hybrid strategy is instead applied in Costa et al. (2018); Bharatharaj et al. (2017b); Desideri et al. (2017); Bharatharaj et al. (2017a); Desideri et al. (2018).

At a higher level of autonomy, Cao et al. (2019); Melo et al. (2019); Anzalone et al. (2019); Petric and Kovačić (2019); Cai et al. (2019) use a semi-autonomous strategy, while Giannopulu et al. (2018); Silva et al. (2019); Ponce et al. (2017); Matsuda et al. (2017) prefer fully autonomous control. Finally, among the autonomous and adaptive systems, Clabaugh et al. (2019) train mathematics skills in children with ASD, with robots adapting the difficulty of the game to each child’s performance and attention.

Fig. 5 shows that only a small number of studies use an autonomous and adaptive control strategy (Jain et al., 2020; Zheng et al., 2020; Yun et al., 2016), compared with other forms of control; this field is thus still evolving. Given the variability of symptoms among people with ASD, flexible platforms should be preferred: they should adapt exercises and protocols to each person and maintain engagement throughout the therapy.
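To make the difference between fixed and adaptive control concrete, here is a minimal Python sketch of an adaptive rule of the kind described above, where task difficulty follows the child’s performance and attention. The thresholds, the 1-to-5 difficulty scale and the attention score in [0, 1] are hypothetical, and this is not the controller used by Clabaugh et al. (2019) or any other cited study.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveDifficulty:
    """Hypothetical rule: step difficulty up after consecutive successes,
    step it down after a failure or when the attention score is low."""
    level: int = 1
    min_level: int = 1
    max_level: int = 5
    successes_to_advance: int = 2
    attention_threshold: float = 0.5           # attention score assumed in [0, 1]
    _streak: int = field(default=0, init=False)

    def update(self, success: bool, attention: float) -> int:
        """Record one trial outcome and return the difficulty for the next trial."""
        if not success or attention < self.attention_threshold:
            # Ease off to keep the child engaged; never go below the minimum level.
            self.level = max(self.min_level, self.level - 1)
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.successes_to_advance:
                self.level = min(self.max_level, self.level + 1)
                self._streak = 0
        return self.level
```

For example, two attentive successes raise the level from 1 to 2, and a success with low attention brings it back to 1; a Wizard of Oz setup would instead rely on the operator to make this adjustment.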

Type of Sensors

As for sensors, cameras are mainly used to monitor the sessions. They are either integrated into the robot or independent, and their number ranges from one to four. Depending on the purpose of the study, other sensors have been used, namely depth cameras (such as the Microsoft Kinect) for imitation training (Mehmood et al., 2019; Taheri et al., 2018; Wijayasinghe et al., 2016), gaze trackers (such as Tobii) for joint attention training and monitoring (Cao et al., 2019; Yoshikawa et al., 2019), surface EMG for facial expression detection (Kim et al., 2018) and EEG to evaluate correlations with electrophysiological activity (Ali et al., 2019; Arpaia et al., 2020).

Type of Feedback

Robots can also provide feedback during exercises. The principal type of feedback is vocal (Saadatzi et al., 2018; Di Nuovo et al., 2020; David et al., 2020; Wan et al., 2019; Zheng et al., 2018) and mainly positive, rewarding the subjects when they accomplish a task. Some other studies also provide feedback through lights (Bharatharaj et al., 2017a; Boccanfuso et al., 2017; Ackovska et al., 2017) or movements (Clabaugh et al., 2019; Zheng et al., 2018; Scassellati et al., 2018; Axelsson et al., 2019). Different types of feedback are sometimes combined incrementally, varying and adding feedback modalities to sustain the subjects’ attention (Xiaofeng Liu et al., 2016; Ali et al., 2019).

Fig. 6

Most common types of interaction

Sessions Scenario

In terms of session organisation, most studies involve only the robot and the subject with ASD in a dyadic interaction (Fig. 6) (Rudovic et al., 2018; Kumazaki et al., 2019a; Choi et al., 2016; Yun et al., 2017; Melo et al., 2019; Casas-Bocanegra et al., 2020; Alhaddad et al., 2018; Rakhymbayeva et al., 2020; Bernardo et al., 2016; Dickstein-Fischer et al., 2017). However, some studies design triadic interactions between the robot, the patient with ASD and another agent who actively participates in the therapy protocol. This third agent can be the therapist, the researcher or a parent (triadic human) (Chung, 2019; Silva et al., 2019; Taheri et al., 2019; Scassellati et al., 2018; Albo-Canals et al., 2018; Taheri et al., 2018; Golestan et al., 2017; Srinivasan et al., 2016; Yun et al., 2016; Zaraki et al., 2020; Attawibulkul et al., 2019; Cervera et al., 2019; Sperati et al., 2020). In very few studies, the third agent is another robot (triadic robot) (Ali et al., 2019; So et al., 2019a; Mehmood et al., 2019; Chevalier et al., 2017a; So et al., 2020). This option is rarer because, in triadic interactions, the robot preferably acts as a facilitator (providing easier and more direct learning opportunities) or a mediator (giving hints in cooperation activities) of human-to-human interaction, supporting the learning process.

Most experiments in the analysed studies were conducted either in a research laboratory (Chevalier et al., 2016; Zhang et al., 2019a; Taheri et al., 2018; Javed et al., 2018; Short et al., 2017; Zheng et al., 2016; Ghorbandaei Pour et al., 2018) or in a clinical setting (Kostrubiec and Kruck, 2020; Sperati et al., 2020; So et al., 2018b; Amanatiadis et al., 2017; Chevalier et al., 2017b). Few studies have been carried out in homes (Jain et al., 2020; Ponce et al., 2017; Scassellati et al., 2018; Clabaugh et al., 2019) or schools (Zhang et al., 2019b; Albo-Canals et al., 2018; Simut et al., 2016; Knight et al., 2019; Fuglerud and Solheim, 2018; Krichmar and Chou, 2018), since these environments are less controllable owing to multiple noise sources and variability factors (Fig. 7). Indeed, schools and homes are not usually ideal places for data acquisitions aimed at assessing the functionality of robotic platforms. Firstly, these acquisitions require several sensors (see “Type of Sensors”), which are highly sensitive to noise. Secondly, camera-based recognition algorithms perform worse in the presence of multiple objects and people, which are common in homes and classrooms but not in research or clinical settings. Finally, in a clinical or research setting an expert can always monitor the experiment and the robot’s behaviour, which is not possible in a home setting. Despite these difficulties, efforts are being made to move platforms already tested in research settings into home settings, in order to achieve a greater impact on subjects with ASD through more intensive and immersive daily training.

Fig. 7

Settings where the experiments are executed

Fig. 8

Design criteria for control strategies, categorised by purpose (a) and by session setting (b)

Technological Design Challenges

The technological design of robotic platforms for ASD therapy usually depends on the purpose of the study and the application environment. Regarding the choice of control strategy, Fig. 8a shows that autonomous controllers are still rarely used in pure design studies, where Wizard of Oz is preferred to assess the functioning of the robotic platform. However, autonomous controllers are crucial in therapy and teaching scenarios. For diagnosis, by contrast, semi-autonomous control is one of the principal choices since, for ethical reasons, a physician should always confirm the diagnosis of a child with autism.

Moreover, the application scenario is a fundamental factor in controller design: autonomy becomes necessary in more complex environments such as homes (Fig. 8b). The flexibility provided by the autonomous and adaptive strategy is essential in a home setting, where neither the researcher nor the physician can be present to change the protocol or the robot’s behaviour to cope with varying scenarios. In a clinical setting, on the other hand, there is often an observation room, so the Wizard of Oz strategy is easier to implement. However, the lack of adaptability of this strategy can limit the impact of robotic therapies compared with other ASD interventions (Srinivasan et al., 2016). This is why approaches with greater autonomy are being implemented.

Sessions Timing

Some studies do not report the number or duration of sessions. Several studies include fewer than five sessions, most of them a single session (Taheri et al., 2019; Askari et al., 2018; Moghadas and Moradi, 2018; Nakadoi, 2017; Suzuki and Lee, 2016; Wanglavan et al., 2019; Guedjou et al., 2017), as they are small pilot studies aimed at evaluating the feasibility of the platform and protocol and the subjects’ engagement in these novel robotic approaches (Fig. 9a). So et al. (2019b); Han et al. (2016); Cervera et al. (2019); So et al. (2018a) applied robotic therapies for more than 15 sessions. Across the different studies, the maximum number of sessions was 40, in Han et al. (2016).

As for session duration, most sessions are short (less than 15 min) to avoid losing the attention of subjects with ASD (Fig. 9b) (Nuovo et al., 2018; Bharatharaj et al., 2017c; Golestan et al., 2017; Tariq et al., 2016; Boccanfuso et al., 2016; Attawibulkul et al., 2019; Chevalier et al., 2016). A few studies, such as Yun et al. (2016); Mavadati et al. (2016); Korte et al. (2020); Sakka et al. (2018), planned sessions longer than 30 min, and Costa et al. (2018); Pop et al. (2017); Arias and Madrid (2017); Koch et al. (2017) reported a maximum duration of 120 min. In Arias and Madrid (2017), 15-min pauses were added to switch from one task to the next.

Future studies should increase the session duration and the number of sessions in the study protocol to better understand the long-term impact of robot-mediated therapy on subjects with ASD. We expect that improvements in social and communicative skills can be detected only after several therapy sessions.

Fig. 9

Session timing variables, namely number of sessions (a), their duration (b) and their frequency (c)

Moreover, comparing older with more recent works, we found a tendency towards more frequent sessions. Most contemporary studies still plan sessions once a week or once every 2 weeks (Fig. 9c) (Chung, 2019; Yun et al., 2017; Moorthy and Pugazhenthi, 2016; Beer et al., 2016; Zheng et al., 2020; Marino et al., 2020; Carlson et al., 2018; Wong et al., 2016; Hudson and Lewis, 2020). However, given the evidence that prolonged and repeated exposure to given stimuli increases the chances of learning and potentially the number of skills learnt, experimenters have started to increase the frequency to twice a week or more (Nuovo et al., 2018; Clabaugh et al., 2019; Kumazaki et al., 2019a; Cao et al., 2019; So et al., 2019b; Saadatzi et al., 2018; Scassellati et al., 2018; Albo-Canals et al., 2018; Bharatharaj et al., 2017a, c; Boccanfuso et al., 2017; Choi et al., 2016; Srinivasan et al., 2016; Han et al., 2016; Yun et al., 2016; Marino et al., 2020; Fuglerud and Solheim, 2018; Schadenberg et al., 2020; Telisheva et al., 2019; Carlson et al., 2018; So et al., 2018a, b).

Only 12% of the studies collected follow-up outcome measures to test the generalisation and persistence of the skills acquired during the robotic therapy or teaching sessions over a longer time scale, or the robustness of a diagnosis (Chung, 2019; Han et al., 2016; Anzalone et al., 2019; Scassellati et al., 2018; Moorthy and Pugazhenthi, 2017). In these cases, the follow-up took place 2 weeks or more after the end of the acquisitions.

Fig. 10

Participants variables, namely number of participants in the studies (a), their range of ages (b) and the control groups that have participated (c)

Participants

Most studies still have very small samples, with fewer than ten participants, because they are mainly pilot studies and proofs of concept, as shown before (Fig. 10a) (Arent et al., 2019; Telisheva et al., 2019; Ahmad et al., 2017; Javed et al., 2020; Aryania et al., 2020; Palestra et al., 2017; Nunez et al., 2018). Therefore, they have low statistical power. The small number of recruited children is often a consequence of the attempt to recruit a homogeneous group, to minimise the effects of the phenotypic variability of ASD, which can be a challenging confounding factor. To understand the impact of robotic therapies on subjects with ASD, some studies are recruiting larger samples (more than 30 participants) to achieve statistical significance (Wan et al., 2019; Anzalone et al., 2019; Rudovic et al., 2017). For this purpose, it is necessary not only to obtain larger samples but also to include a control group (Fig. 10c). Only 13% of the studies included a control group of subjects with ASD following standard ASD protocols (Aryania et al., 2020; Billing et al., 2020; Qidwai et al., 2020). Other studies selected neurotypical people as a control group, to find differences between the two populations and to show that the robotic systems can capture these differences (Javed et al., 2020; Petric and Kovacic, 2020; Kumazaki et al., 2019c). In Ramírez-Duque et al. (2020), there is also a comparison with a group of people with Down syndrome, to test whether the robotic diagnostic system can help identify the differences between the two groups.
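The link between sample size and statistical power can be made concrete with a rough calculation. The Python sketch below uses the normal approximation for a two-sided comparison of two group means; the effect size d = 0.8 (conventionally “large”) and the group sizes are illustrative assumptions, not values drawn from the reviewed studies, and for very small groups the approximation is optimistic compared with an exact t-test calculation.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    d is the assumed true standardised effect size. The normal approximation
    ignores rejections in the wrong direction and overstates power for very
    small groups.
    """
    z = NormalDist()                              # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)             # about 1.96 for alpha = 0.05
    shift = d * (n_per_group / 2) ** 0.5          # expected z-statistic under the effect
    return z.cdf(shift - z_crit)
```

With d = 0.8, five children per group give roughly 24% power, whereas 30 per group give roughly 87%, so even a large true effect is likely to be missed in a typical pilot study.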

Regarding age, most of the studies focus on primary-school children (Short et al., 2017; Choi et al., 2016; Xiaofeng Liu et al., 2016; Pérez et al., 2019; Silva et al., 2018; Suzuki et al., 2017), probably because of their better compliance with humanoid robots and better task comprehension (Fig. 10b). However, some effort has been made to introduce robots to preschoolers, given the evidence that early intervention can be crucial (Nie et al., 2018; Zheng et al., 2018; Del Coco et al., 2018). This emphasis on early intervention could also explain why high-school participants and adults are included in only a few studies.

Different tools were used as inclusion criteria. Some studies included patients based on the diagnostic criteria of the fourth and fifth editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV and DSM-5) (So et al., 2019a; David et al., 2020; Giannopulu et al., 2018), and others based on the Childhood Autism Rating Scale (Rudovic et al., 2017; Giannopulu et al., 2016; Schadenberg et al., 2020). Some others used diagnostic tools such as the Autism Diagnostic Observation Schedule (ADOS) (Kumazaki et al., 2018b; van Straten et al., 2017; Askari et al., 2018) or the Autism Diagnostic Interview-Revised (Zhang et al., 2019a; Aryania et al., 2020; Nuovo et al., 2018), and others collected anamnestic data and symptoms through parent questionnaires, such as the Social Responsiveness Scale (Zhang et al., 2019b; Chung, 2019).

Evaluation Measures

For therapy purposes, the evaluation measures used across studies are numerous and substantially different, which makes comparison between studies difficult. Table 1 shows this variability for the most common therapy areas addressed with robotics in autism (joint attention, motor therapy and emotion recognition). Most of these measures cannot be considered systematic outcomes, because they are not collected at the beginning and end of the study to evaluate the response to therapy (pre/post outcomes). Instead, they are internal variables that track the progress of the exercises during the human–robot interaction within the therapy.

These metrics can be divided into qualitative and quantitative ones. The qualitative metrics include reports of emotional feelings (Giannopulu et al., 2018; Yun et al., 2016; Giannopulu et al., 2020) and questionnaires answered by the parents, which are common to the three areas presented in Table 1 (Wan et al., 2019; Taheri et al., 2019; Askari et al., 2018). The quantitative metrics are divided into three subtypes: manual, automatic and physiological.

As described in Table 2, most quantitative measurements are collected manually by different observers, using either clinical scales such as the Early Social Communication Scale (Carlson et al., 2018; Nie et al., 2018; So et al., 2020) or exercise-specific parameters such as imitation accuracy (Cao et al., 2019; Mehmood et al., 2019; Zheng et al., 2016; Moorthy and Pugazhenthi, 2016). In other cases, the measures are acquired automatically by the robotic system, which makes them more objective and easier to obtain, and allows them to be included in the robot’s control loop. For example, Zheng et al. (2020) monitored, in real time, the number of times a child looked at a target in a joint attention task. Other examples of automatic measures are the fixation time on a given target (Cao et al., 2019) and the performance of a gesture, evaluated through dynamic time warping between the movement of the child and that of the robot (Wijayasinghe et al., 2016). Moreover, new techniques allow the recognition of the child’s emotions (angry, sad, happy, afraid) (Bharatharaj et al., 2017c) or engagement in a given task (Feng et al., 2017; Jain et al., 2020) by combining information about attention to the task, proximity of the child to the robot and facial expressions. Recently, there has been growing interest in more accurate evaluations, including measures of physiological parameters to assess stress, such as salivary cortisol (Bharatharaj et al., 2017a) and heart rate variability (Giannopulu et al., 2016), or measures of attentiveness through EEG power density (Mehmood et al., 2019).
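Dynamic time warping, mentioned above for gesture evaluation, aligns two sequences that may unfold at different speeds before measuring how far apart they are. The following is a minimal, generic sketch, not the implementation of Wijayasinghe et al. (2016): it assumes each movement is recorded as a sequence of equally sized feature vectors (for example, joint angles sampled per frame) and uses the Euclidean distance between frames. A lower score means the child's movement is closer to the robot's demonstration.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one time step of a movement, e.g. joint angles


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames with the same number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(child: Sequence[Frame], robot: Sequence[Frame]) -> float:
    """Cumulative cost of the best monotonic alignment of two trajectories."""
    if not child or not robot:
        raise ValueError("both trajectories must contain at least one frame")
    n, m = len(child), len(robot)
    inf = float("inf")
    # cost[i][j]: minimal cost of aligning child[:i] with robot[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(child[i - 1], robot[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance child only (robot frame reused)
                cost[i][j - 1],      # advance robot only (child frame reused)
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


robot_demo = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
slow_child = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
print(dtw_distance(slow_child, robot_demo))  # 0.0
```

Because warping absorbs differences in tempo, a slower but otherwise identical imitation scores 0.0, whereas a movement with the wrong amplitude does not. In practice, the raw score is often normalised by path or sequence length so that trials of different durations can be compared.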

Overall, researchers are trying to convert manual and qualitative evaluations into automatic and quantitative measures. The increasing introduction of automatic measures at the design stage is visible in Fig. 11. Automatic measurements are even more important for diagnostic purposes, since they would reduce the workload of clinicians, who have to observe, code and interpret several behaviours of the child simultaneously to provide a diagnosis (Petric and Kovačić, 2019).

Fig. 11

Purpose vs type of quantitative measurements. Automatic measures include both features computed from behavioural signals (e.g. motor signals) and measurements of physiological parameters

Randomised Controlled Trials

To investigate the potential of robots in autism more deeply, we evaluated the results of the ten randomised controlled trials (RCTs) included in this review, all aimed at testing robotic therapies for ASD. Table 3 of the Supplementary Material presents the full results. Several studies address the three therapeutic purposes described previously (Section 3.1): Pop et al. (2017) and Marino et al. (2020) focus on emotions; Zheng et al. (2020) and Srinivasan et al. (2016) focus on joint attention; and So and colleagues concentrate on gesture production and recognition to improve the social skills of children with ASD (So et al., 2018a, b, 2019a, 2020). Korte et al. (2020) explore a different direction for social skills training, focusing on the importance of self-initiations for the communicative skills of children with ASD. Yun et al. (2017) also evaluate the effect of social skills training, combining eye contact training with facial and emotion recognition tasks.

All the studies included a group receiving robotic therapy (the robotic group) and a control group. In some studies, the control group consisted of children with ASD who received no therapy within the study while waiting to be admitted to the robotic therapy (wait-list group) (So et al., 2018a, b, 2020; Zheng et al., 2020). In some cases, additional control groups were included to allow further comparison with the children with ASD in the robotic group, e.g. a group receiving a different therapy (rhythm therapy) in Srinivasan et al. (2016) and a neurotypical group in So et al. (2018b). Tables 3 and 4 report the outcome measures of the RCTs and indicate those showing significant pre-test to post-test effects in the robotic group.

Table 3 Outcome measures of randomised controlled trials—part I
Table 4 Outcome measures of randomised controlled trials—part II
Table 5 Reliability of the clinical scales used

In total, 109 children were assigned to the robotic groups and 104 to the control groups. Seven of the ten studies had at least one outcome measure with a significant effect in favour of the robotic group (\(p<0.05\) in parametric or non-parametric statistical tests) (So et al., 2018a, b, 2020; Korte et al., 2020; Marino et al., 2020; Pop et al., 2017; Yun et al., 2017).

Five of these seven papers assessed whether the effect was maintained in follow-up analyses (So et al., 2020; Korte et al., 2020; So et al., 2018b, a; Yun et al., 2017). Yun et al. (2017) reported no effect on eye contact at follow-up. In contrast, So et al. (2018b) and So et al. (2020) found that all the measures with a positive effect in favour of the robotic group maintained this effect 2 weeks later. In addition, Korte et al. (2020) found that the parent-reported Social Responsiveness Scale score decreased significantly at follow-up (lower scores indicate fewer social difficulties), despite no significant change at the immediate post-test. The authors therefore concluded that, unlike the control condition, robotic therapy promoted a long-term improvement.

By contrast, three studies reported no significant outcome measures in favour of the robotic group. Srinivasan et al. (2016) reported reduced joint attention in the robotic group, but the authors ascribed this to the Wizard of Oz control strategy (in which a hidden human operates the robot remotely), which probably made the exercises boring and insufficiently engaging. For this reason, they point to robot autonomy as one of the main directions for future work. Zheng et al. (2020) found no overall evidence in favour of the robotic group. To better understand the results, they divided the robotic group into responders and non-responders to robotic therapy, and found significant effects within each subgroup. These observations suggest the importance of individualised and personalised interventions for children with ASD, given the heterogeneity that characterises this condition. A similar problem is reported in So et al. (2019a), where the considerable variability of the results was ascribed to differences in autism severity, cognitive functioning and communication skills among the recruited patients, which masked any significant effect of the robotic therapy.

Looking more closely at the measures used, most of the chosen clinical scales have high test-retest and inter-rater reliability (Table 5). The consistency of these measures therefore supports the results obtained for the robotic therapy. Moreover, outcome measures were classified according to the International Classification of Functioning, Disability and Health (ICF) framework (World Health Organization, 2001). Most of the outcomes fall under the ICF domain of Activities and Participation (Tables 3 and 4). These studies therefore suggest that robotic therapy can have an impact on the daily life of people with ASD.
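Inter-rater reliability, as summarised in Table 5, quantifies how consistently different observers score the same behaviour. Which statistic each scale reports is not specified here; as one common example for categorical codings, the sketch below computes Cohen's kappa for two raters, assuming each rater assigns one category label per observed item. The labels in the usage lines are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items coded identically
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's category frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codings of four trials ("contact" = eye contact observed)
a = ["contact", "contact", "none", "none"]
b = ["contact", "none", "none", "none"]
print(cohens_kappa(a, b))  # 0.5
```

Here the raters agree on three of four trials (observed agreement 0.75), but half of that agreement would be expected by chance from their marginal frequencies, giving κ = 0.5. Test-retest reliability of continuous scale scores is more often assessed with an intraclass correlation coefficient, which is not shown.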

Meta-Analysis

Of the selected randomised controlled trials, only five reported full numerical results, and these were combined in a meta-analysis. To reduce heterogeneity, the outcome measures of the five studies were grouped according to ICF categories. We therefore performed two separate meta-analyses: one for ICF category d720 Complex interpersonal interactions and another for ICF category d160 Focusing attention. Treatment effects were expressed as standardised mean differences (SMDs) with 95% confidence intervals.
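The standardised mean difference expresses each trial's between-group difference in units of the pooled standard deviation, so that outcomes measured on different scales can be combined. The review does not state the exact estimator or pooling model used; the sketch below assumes Hedges' g (the SMD with a small-sample bias correction), computed from each group's mean, standard deviation and size, and DerSimonian-Laird random-effects pooling, a common choice when outcomes are heterogeneous.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Effect:
    g: float    # Hedges' g: bias-corrected standardised mean difference
    var: float  # sampling variance of g


def hedges_g(m1: float, s1: float, n1: int,
             m2: float, s2: float, n2: int) -> Effect:
    """SMD of group 1 (robotic) vs group 2 (control); positive favours
    group 1 when higher scores are better."""
    # Pooled standard deviation of the two groups
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Small-sample correction factor J
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return Effect(g=j * d, var=j ** 2 * var_d)


def pool_random_effects(effects: List[Effect]) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird pooling: (pooled SMD, 95% CI low, 95% CI high, p)."""
    w = [1 / e.var for e in effects]  # inverse-variance weights
    fixed = sum(wi * e.g for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the between-study variance tau^2 (truncated at zero)
    q = sum(wi * (e.g - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if len(effects) > 1 else 0.0
    w_re = [1 / (e.var + tau2) for e in effects]
    pooled = sum(wi * e.g for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(pooled) / se / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, p
```

For instance, `hedges_g(10, 2, 10, 8, 2, 10)` (hypothetical means 10 and 8, common SD 2, ten children per arm) gives g = 68/71 ≈ 0.958, slightly shrunk from the uncorrected d = 1. When the between-study variance is estimated as zero, the random-effects result coincides with the fixed-effect inverse-variance estimate.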

The first category, d720 Complex interpersonal interactions, refers to the ability to sustain structured interactions with others in a contextually and socially appropriate manner (World Health Organization, 2001). Four trials, with a total of 96 children with ASD, evaluated this category. The forest plot in Fig. 12a shows a trend in favour of the experimental group (standardised mean difference 0.22; 95% confidence interval (\(-\)0.52, 0.97)) that is not statistically significant (\(p=0.55\)).

The second ICF category analysed, d160 Focusing attention, refers to the ability to sustain attention by filtering out distracting noises (World Health Organization, 2001). Only two studies (Yun et al., 2017; Zheng et al., 2020), with a total of 35 children with ASD, evaluated the ability to direct attention using similarly defined outcome measures, namely the target hit rate and the frequency of eye contact. The forest plot in Fig. 12b shows a trend in favour of the experimental group (standardised mean difference 0.18; 95% confidence interval (\(-\)0.49, 0.84)) that is not statistically significant (\(p=0.60\)).
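As a rough consistency check on the two pooled estimates reported for d720 and d160, a two-sided p-value can be back-calculated from an SMD and its 95% confidence interval, assuming the interval was built symmetrically on a normal scale (estimate ± 1.96 × SE). Because the published values are rounded to two decimals, the recovered p-values are only approximate.

```python
import math


def p_from_ci(est: float, lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Approximate two-sided p-value for an estimate from its 95% CI.

    Assumes CI = est +/- z_crit * SE on a normal scale,
    so SE = (upper - lower) / (2 * z_crit).
    """
    se = (upper - lower) / (2 * z_crit)
    z = est / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# Pooled SMDs and 95% CIs as reported for the two ICF categories
p_d720 = p_from_ci(0.22, -0.52, 0.97)
p_d160 = p_from_ci(0.18, -0.49, 0.84)
print(f"d720: p ~ {p_d720:.2f}, d160: p ~ {p_d160:.2f}")
```

This gives p ≈ 0.56 for d720 and p ≈ 0.60 for d160, in line with the reported values once rounding is allowed for; equivalently, both intervals include zero, which already implies p > 0.05.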

In terms of risk of bias, two of the studies had a high risk of bias and three were rated as raising “Some concerns”, which reveals a lack of high-quality evidence in this field. The most problematic criteria were those related to the randomisation process and to deviations from the intended interventions. Bias in the randomisation process stemmed from the lack of information on allocation concealment: most studies did not report how the allocation sequence was generated or how it was concealed from the participants. Regarding the second criterion, in all the studies participants were not restricted from receiving other concurrent interventions. Although this creates a risk of bias, it is an intrinsic requirement of institutional ethics committees for innovative technologies, since children cannot interrupt their rehabilitation process for a technology whose impact is unknown.

Fig. 12

Meta-analysis of the selected randomised controlled trials: (a) ICF category d720 Complex interpersonal interactions; (b) ICF category d160 Focusing attention

Discussion and Future Work

This review shows a growing interest in the use of robotic strategies for ASD in recent years. We included and analysed studies from multiple fields of expertise and backgrounds, resulting in a highly heterogeneous pool of studies. Despite this heterogeneity, the overall synthesis provided by this review can help the diverse professionals working in the field to improve collaboration and cross-fertilisation, and to move towards more effective clinical studies with significant impact. In particular, based on the results of our analysis, engineers should be more aware of the importance of early intervention and of the adaptability and flexibility of the platforms they develop, while clinicians should introduce more quantitative and standardised measurements into their clinical practice, benefiting from new technologies. The overall semantic approach of the review provides a common framework for multidisciplinary stakeholders, supporting a transdisciplinary reading of the literature, and highlights several crucial issues for future research aimed at translation into clinical practice.

From the engineering point of view, humanoid robots have been used in most ASD studies. Their resemblance to humans helps engage patients while training motor and social skills, facilitating the generalisation process. To date, most of the analysed studies use robots for rehabilitation therapy, while a few recent papers address the potential use of robots as part of the diagnostic process (Petric and Kovačić, 2019; Del Coco et al., 2018; Askari et al., 2018; Ramírez-Duque et al., 2020; Moghadas and Moradi, 2018; Petric et al., 2017; Wijayasinghe et al., 2016; Arent et al., 2019; Zhang et al., 2019a; Petric and Kovacic, 2020). From the clinical point of view, many studies use small samples that are not homogeneous in age, together with different training protocols and outcome measures. There is also little variety in control groups and a limited number of randomised controlled trials with follow-up. Our meta-analysis found no significant effect, although both analyses showed a trend in favour of the robotic group, which suggests promising, if preliminary, results.

More generally, future work needs more standardised protocols, from both the clinical and the engineering points of view, in terms of session timing (duration, frequency, follow-up) and rehabilitation setting. Since human–robot interaction can have an impact on people with ASD at all ages, it would be interesting to study age-targeted training protocols, for example for preschool children, given the importance of early intervention for developmental trajectories. Moreover, these studies should have larger sample sizes to support stronger conclusions. We are aware that robots are still costly and that clinical trials require significant effort, but overcoming the small size of studies is particularly important in this field.

More specifically, future studies should include interaction scenarios with the following characteristics:

  • Automatic and adaptive robot controllers: There is a trend towards more automatic robot controllers. This is fundamental to adapt training protocols to the specific needs of each person on the autism spectrum, modulating the difficulty of the exercise and the feedback according to the subject’s characteristics and responsiveness, and making the therapy personalised and/or incrementally challenging. Such adaptation has been identified as crucial to individualise the learning process and to sustain attention and engagement. Despite some preliminary studies, further technological effort is required, together with closer collaboration between engineers and therapists in developing robotic platforms.

  • Robust and transparent monitoring systems: The robot should be equipped with more sensors to evaluate the participant’s status and progress. The monitoring system may differ considerably depending on the application, but it should be robust and transparent to the participants.
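As one illustration of the kind of adaptive difficulty control described in the first point above, the sketch below implements a simple staircase rule: the exercise level rises after a run of consecutive successes and falls after any failure. It is a hypothetical example, not a controller taken from the reviewed studies; the class name, the number of levels and the two-success threshold are assumptions, and the success signal could come from an automatic measure such as the target-hit detection described earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StaircaseController:
    """Hypothetical difficulty controller for a robot-led exercise.

    Difficulty rises after `up_after` consecutive successes and drops
    after any failure, always staying within [min_level, max_level].
    """
    level: int = 1
    min_level: int = 1
    max_level: int = 5
    up_after: int = 2
    streak: int = field(default=0, repr=False)
    history: List[int] = field(default_factory=list)

    def update(self, success: bool) -> int:
        """Record one trial outcome and return the level for the next trial."""
        if success:
            self.streak += 1
            if self.streak >= self.up_after:
                self.level = min(self.level + 1, self.max_level)
                self.streak = 0  # require a fresh run before the next increase
        else:
            self.level = max(self.level - 1, self.min_level)
            self.streak = 0
        self.history.append(self.level)
        return self.level


ctrl = StaircaseController()
print([ctrl.update(s) for s in [True, True, True, True, False, True]])
```

Resetting the streak after every level change prevents a single lucky run from pushing the child up several levels at once; in a real system, the thresholds and the range of levels would need to be set together with therapists.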

The outcome measures of future studies should be:

  • Quantitative and automatically extracted from the robot: Quantitative automatic measures are essential for adaptive robot controllers and allow the metrics used in different studies to be harmonised, enabling better comparison between them.

  • Validated with clinical knowledge: The additional data collected through more robust monitoring systems do not automatically provide additional information about the participant’s experience. Dedicated cross-validation between clinical scores, clinical experience-based evaluation and the parameters extracted from the data is mandatory to define meaningful standard measures.

We believe that these guidelines are essential to obtain more evidence-based therapeutic protocols and, ultimately, to bring robots into home and school environments, where they can have a direct impact on the daily life of people with ASD.