Introduction

Autism and Social Differences

Effective social communication depends on a range of social skills, a term which is complex and has been variably defined across the literature, but is broadly considered to encompass verbal and non-verbal behaviours that enable positive social interactions (Gresham & Elliott, 1984). This multifaceted concept includes innate skills, such as eye tracking and a preference for faces, as well as more complex, learned behaviours, such as adjusting behaviour to suit particular social contexts (Spence, 2003). An exhaustive list of skills considered to be social is difficult to establish, due to the number of cognitive and behavioural processes drawn upon in any given social situation.

Autistic children’s social skills do not always align with those of neurotypical people. Their skills are different in terms of social-emotional reciprocity (e.g. social approach and initiation of interactions), nonverbal communicative behaviours (e.g. eye contact, and use of facial expressions), and the development, maintenance and understanding of relationships (e.g. reduced interest in peers and difficulties making friends) (Diagnostic and Statistical Manual of Mental Disorders, 5th ed.; DSM-5; American Psychiatric Association [APA], 2013). These differences manifest differently for each individual (Happé et al., 2006) and can have negative secondary effects on wellbeing, learning and relationships. For example, social skills which do not align with neurotypical people have been shown to predict social anxiety (Bellini, 2006) and are associated with social rejection from peers (Ochs et al., 2001). This is not through lack of interest in social contact (Jaswal & Akhtar, 2019); autistic young people often express a desire for social interaction and report being more lonely than their neurotypical peers (Bauminger & Kasari, 2000). Furthermore, social differences in autistic people can have a negative effect on academic and occupational outcomes (Howlin, 2003; Welsh et al., 2001).

Theories of Social Difference in Autism

A number of theories might explain the social differences observed in autism. One theory proposes that autistic people can find it harder to make predictions about others’ mental states (theory of mind) and so may find it difficult to encode socially relevant information (Baron-Cohen et al., 1985). Alternatively, it may be that difficulty understanding others’ interpretations of the world is due to an underlying difference in executive functioning, such as rule learning (J. Russell, 1997). This can account for the finding that some autistic children struggle with theory of mind style tasks, even when the social element is removed (Russell et al., 1991). In particular, predictive processing might be different in autism. Pellicano and Burr (2012) propose that autistic people perceive the world more accurately, as they are less biased by prior experience when making predictions. At times this may be a strength, but could make some elements of social interaction more difficult, such as predicting when another person will speak (Stark et al., 2021). Another cognitive theory of autism is ‘weak central coherence’ (Frith, 1989) which suggests that autistic people have a preference for local processing rather than striving for overall meaning. This may lead to differences in the interpretation of social behaviours. However, more recent evidence suggests that this theory sits alongside social differences, rather than explains them (Happé & Frith, 2006). These cognitive theories are in line with evidence that social desire is not affected by autism (Happé & Frith, 1996).

In contrast, social motivation theory proposes that autistic people have differences in the psychological dispositions and biological mechanisms that bias neurotypical individuals to orient towards social stimuli, seek and derive pleasure from social interactions, and work to maintain social bonds (Chevallier et al., 2012). However, this contrasts with many accounts of autistic people, overlooks the role of other people’s perceptions and responses, and ignores the many other explanations of social differences in autism (Jaswal & Akhtar, 2019). Furthermore, this theory has led to differences such as limited eye contact being prioritised as intervention targets, despite evidence for the positive effects of gaze aversion on concentration and regulation of emotions (Robledo et al., 2012).

Social Skills Interventions

A number of interventions have sought to teach social skills to autistic children, due to the secondary benefits on wellbeing and quality of life shown by previous research (McConnell, 2002). A range of outcomes have been targeted, such as turn taking, initiating interaction, recognising emotions and understanding theory of mind (Begeer et al., 2011; Matson et al., 2007; Rieth et al., 2014). Well-established interventions include Social Stories™ (Karkhaneh et al., 2010), video modelling (Sng et al., 2014), pivotal response training (Koegel et al., 2016), social skills groups (Reichow et al., 2012) and peer-mediated behavioural interventions (Laushey & Heflin, 2000). There is increasing evidence that, at least in some contexts, these interventions can help autistic children to utilise neurotypical social skills, but the duration, generalisability and precise mechanism of change remain unclear (McConnell, 2002). Furthermore, little acknowledgement has been given to the fact that it may be effortful, and perhaps emotionally draining, for autistic people to operationalise these skills.

When considering social skills interventions, it is important to acknowledge that social differences in autism do not necessarily equate to a ‘difficulty’. Many argue that difficulties arise due to a lack of accommodation of autistic differences in a neurotypical society (Brownlow, 2010), and emerging research is suggestive of a ‘double empathy problem’, whereby communication breakdowns between autistic and non-autistic people are a two-way issue resulting from disjuncture in reciprocity between the two differently disposed social actors (Edey et al., 2016; Milton, 2012; Sheppard et al., 2016). Social skills training can therefore perpetuate the idea that autistic people need to adapt, without consideration of the role of the contexts in which autistic people live, learn and work in understanding, supporting and responding to autistic communication. However, many autistic people already find ways to overcome social challenges and seek support in doing this (Cresswell et al., 2019). Social skills interventions are one way in which services can support people to overcome these challenges, particularly in circumstances where it is not possible to adapt the environment e.g. public places.

Extended Reality Technologies

Over the last three decades, social skills interventions have begun to capitalise on the potential advantages of digital technologies. Computerised approaches can support social skills in situ (Vygotsky & Luria, 1994) or improve traditional training approaches, due to the attentional and motivational advantages of interactive technologies (Golan & Baron-Cohen, 2006). More recently, these advantages have been exploited using extended reality (XR), i.e. the merging of physical and virtual realities (Riva et al., 2016). This encompasses augmented reality (AR), in which virtual information is overlaid onto the real world, virtual reality (VR), in which users are fully immersed in a simulated environment and mixed reality (MR), in which digital and real-world objects interact in real-time. By merging physical and virtual realities, XR could reduce the salience of potentially distressing stimuli, making interactions less overwhelming for some autistic people, thus providing an effective space to experiment with social skills (Thye et al., 2018). In virtual social settings, there is no risk of embarrassment in front of one’s peers, yet the degree of realism may increase the likelihood of generalisability (Strickland et al., 1996).

Extended reality interventions have been successfully implemented in a number of therapeutic contexts including stroke rehabilitation (Laver et al., 2017) and exposure therapies for anxiety disorders (Bouchard et al., 2017; Vincelli et al., 2003). Applications for autistic people are also emerging, although the majority of interventions to date have used virtual learning environments (VLEs), a non-immersive technology in which users interact with simulated social situations using a desktop computer (e.g. Didehbani et al., 2016). More immersive technologies tend to require a head-mounted display (HMD) or Smartglasses, computer glasses that change what the wearer sees. Despite the potential challenges for those with tactile hypersensitivities, they have been shown to be well-tolerated, enjoyable and engaging in this population (Keshav et al., 2017; Newbutt et al., 2016), and effective in teaching other skills, such as taking public transport (Simoes et al., 2018).

Aims of this Review

Previous reviews have shown that XR interventions are feasible and sometimes beneficial for teaching autistic children a range of skills, although recognise that the quality of the literature is varied (e.g. Berenguer et al., 2020; Mesa-Gresa et al., 2018). However, no review to date has investigated the specific potential of XR for social skills. Furthermore, previous reviews of VR interventions have tended to include non-immersive technologies, such as VLEs. This review takes a focused approach by reviewing the evidence base for interventions in which one’s perception of reality is altered. Social skills interventions have most commonly been implemented in children but we chose to include young adults, based on evidence that brain maturation, particularly that of the prefrontal cortex which coordinates the goal directed behaviour needed to implement social skills, continues to develop into a person’s early twenties (Johnson et al., 2009). This is also in line with current plans to extend children’s mental health services in England up to 25 years (NHS Long Term Plan, 2019). Due to the infancy of this research area, we chose not to restrict inclusion on the basis of study design.

It was considered important to understand the effectiveness of these interventions because of the secondary benefits social skills can have on indicators of wellbeing, such as increased peer interactions (Mandelberg et al., 2014) and reduced anxiety and low mood (Hillier et al., 2011; Schohl et al., 2014). Secondly, the realism of immersive VR offers a unique potential to improve upon existing social skills interventions which do not always generalise to real contexts (Bellini et al., 2007). Thirdly, XR is at a point of rapid expansion, with VR headsets becoming increasingly widespread. If social skills interventions are going to capitalise on this expansion, it is important to consider whether there is any good quality evidence for their effectiveness and whether investment in these technologies will actually lead to meaningful benefits for autistic young people.

To this end, this review aims to answer the following questions:

  1. Is there any evidence that extended reality interventions are effective in helping autistic children and young people to enhance their social skills?

  2. What is the quality of the evidence for observed effects?

  3. To what extent have observed effects been shown to lead to meaningful change for autistic people?

Method

This quantitative systematic review uses a synthesis without meta-analysis methodology, and information is reported in line with Cochrane guidance (Garritty et al., 2021). The protocol was pre-registered with PROSPERO (ID: CRD42021229442; available at https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=229442). Five online databases (PsycINFO, MEDLINE, PubMed, EMBASE, CINAHL) were searched using key terms listed in Appendix 28. All retrieved datasets were downloaded into a Zotero library and exported to Excel for first author screening by title and abstract, then by full text. Another author (EM) screened 26.5% of studies at initial screening and 25% of studies at full text screening. All texts accepted at final screening were forward and backward citation searched using Scopus.

Screening and Selection

Studies were screened for eligibility according to the following inclusion criteria:

  • The paper was published in English, in a peer-reviewed journal to ensure a minimum quality level.

  • The paper was published after 1990, when AR was developed and VR headsets became widely commercially available.

  • The paper included an active intervention using XR. Studies described as VR but which used a desktop computer or motion based video game were not included for review as they do not extend one’s perception of reality.

  • The study targeted one or more social skills.

  • The paper reported a quantitative measure of social skills, at least pre and post intervention. This included in vivo outcomes, as well as validated and idiosyncratic measures of social skills.

  • Study participants were children and young adults up to the age of 25.

  • Study participants were autistic.

Where papers reported on more than one study, they were included based on the study which met the inclusion criteria. Borderline cases were discussed amongst the researchers.

Study Selection

The initial search identified 592 records, 324 of which were retained after duplicates were removed (Fig. 1). Two hundred seventy-five studies were excluded at the abstract stage, with a 96.5% inter-rater reliability (kappa = .87). Thirty-one studies were excluded at the full text screen, with a 91.7% inter-rater reliability (kappa = .83). Disagreements were resolved through further screening at the full text stage and discussion with other authors. Seven additional records were identified through hand searching and citation chaining (Scopus search conducted May 3, 2021). A total of 17 studies were included in the review and a number of data items were extracted from each study. Where it was not possible to extract data directly, study authors were contacted, or available information was used to calculate an effect size. 50% of data extraction was duplicated by a second reviewer (EM).
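The two agreement statistics reported above (percentage agreement and Cohen's kappa) can be illustrated with a minimal Python sketch. It assumes two raters each recording an "include" or "exclude" decision for the same set of records; the function names and the example decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of records on which both raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions per label.
    p_expected = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical screening decisions for ten records (illustration only).
first_author = ["exclude"] * 6 + ["include"] * 4
second_rater = ["exclude"] * 5 + ["include"] * 5

agreement = percent_agreement(first_author, second_rater)
kappa = cohens_kappa(first_author, second_rater)
```

In this made-up example the raters disagree on one record in ten. Kappa is lower than raw agreement because half of the agreement would be expected by chance given the raters' marginal decision rates, which is why both figures are usually reported together.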

Fig. 1 Study selection process

Data Synthesis

Data were synthesised and organised according to the social communication criteria of DSM-5’s autism diagnostic criteria (APA, 2013), as the effect of XR on specific social skills was the primary area of interest. In line with Siddaway et al. (2019), findings were integrated and critiqued, not merely summarised. The high number of single case studies meant that it was not possible to conduct a meta-analysis but key information is summarised in tables, with a summary of effect (Cohen’s d) reported where enough information was provided in the paper or could be obtained through correspondence with the study authors. All studies were included for synthesis, irrespective of quality, but due to the range of study quality, summary plots were not developed, to avoid misrepresentation of the evidence base.
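For readers unfamiliar with Cohen's d, the following Python sketch shows one common way of computing it from summary statistics: the difference between two group means divided by their pooled standard deviation. This is an illustration only. The text does not specify which variant of d was calculated for each study, pre-post and single case designs often use a different denominator, and the function name and example values are hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference for two independent groups (pooled SD)."""
    if n_1 + n_2 <= 2:
        raise ValueError("at least three observations in total are required")
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


# Hypothetical post-intervention scores (illustration only):
# intervention group M = 12, SD = 4, n = 10; control group M = 10, SD = 4, n = 10.
d = cohens_d(12.0, 4.0, 10, 10.0, 4.0, 10)
```

With these made-up values the two-point mean difference is half of the pooled standard deviation of 4, giving d = 0.5.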

Quality Assessment

The quality and risk of bias of each study was evaluated using a quality checklist for healthcare intervention studies, developed by Downs and Black (1998). The tool was selected due to its applicability to both randomised and non-randomised studies. All papers were rated by the first author and 30% were duplicated by another author (EM), with an interrater reliability of 90.4% (kappa = 0.86). Discrepancies were resolved through discussion. Some items were not applicable to small-N research and so a summary score relative to the number of items assessed was generated for each paper to allow for direct comparison across studies. This was calculated by dividing the score obtained by the total possible score according to study design. A risk of bias score was generated by the same method, using only items assessing bias (14 to 26). A full summary of the assessment approach is provided in Appendix 31. Given that a number of items were not applicable to single case and small-N studies, the Single Case Experimental Design (SCED) Scale (Tate et al., 2008) was also used to qualitatively assess these studies (Appendix 32).
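The relative scoring described above can be sketched in Python as follows. The sketch assumes each study's ratings are stored as a mapping from checklist item number to a pair of (score awarded, maximum score), with non-applicable items simply omitted; this data layout, the function name and the example ratings are hypothetical, and only the item range 14 to 26 is taken from the text.

```python
def relative_score(scores, items=None):
    """Score obtained divided by total possible score over applicable items.

    scores: {item_number: (awarded, maximum)}, applicable items only.
    items:  optional collection of item numbers to restrict the calculation to.
    """
    selected = {
        item: pair
        for item, pair in scores.items()
        if items is None or item in items
    }
    total_possible = sum(maximum for _, maximum in selected.values())
    if total_possible == 0:
        raise ValueError("no applicable items selected")
    total_awarded = sum(awarded for awarded, _ in selected.values())
    return total_awarded / total_possible


# Items assessing bias, as stated in the text (14 to 26 inclusive).
BIAS_ITEMS = range(14, 27)

# Hypothetical ratings for one small-N study (illustration only).
example_study = {
    1: (1, 1),
    2: (1, 1),
    5: (1, 2),
    14: (0, 1),
    15: (1, 1),
    20: (1, 1),
}

quality = relative_score(example_study)           # 5 of 7 possible points
bias = relative_score(example_study, BIAS_ITEMS)  # 2 of 3 possible points
```

Dividing by the total possible score for the items actually assessed, rather than by the full checklist total, is what allows studies with different numbers of applicable items to be compared on the same 0 to 1 scale.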

Results

Study Characteristics

Across the 17 studies (summarised in Table 1), 225 participants (15.5% female) aged 2 to 16 were included. Sample sizes ranged from one to 94 participants (M = 13.24, SD = 18.02). All participants were autistic; however, only 11 studies reported that this was a clinical diagnosis according to DSM and only three confirmed diagnosis using a screening tool: Ip et al. (2018) using the Childhood Autism Spectrum Test (Williams et al., 2005); Liu et al. (2017) using the Social Communication Questionnaire (Chandler et al., 2007); and Lorenzo et al. (2019) using the Autism Spectrum Inventory (Rivière, 2002). Nine studies specified an intelligence quotient (IQ), with a mean or lower limit of at least 70. The studies used a range of designs, although a majority (10) were small-N, with either multiple baseline or pre-post measures. Only five studies used a control (Crowell et al., 2020; Herrero & Lorenzo, 2020; Ip et al., 2018; Lorenzo et al., 2016, 2019). Seven studies explicitly reported being at the preliminary or feasibility stage of the research.

Table 1 Summary of included studies

Fourteen unique interventions were evaluated (seven VR; six AR; one MR) and are described in Table 2. Seven adapted existing social skills interventions, such as video modelling (Chen et al., 2016), collaborative game play (Mora-Guiard et al., 2017) and Social Stories™ (Herrero & Lorenzo, 2020). Rationales for the use of XR included distraction reduction; the potential for engagement and sustained attention; freedom to practice social interactions without risk; and customisability, i.e. the capacity to emphasise social stimuli and adapt difficulty level. Interventions most often took place in schools, and ranged from one 15-min session (Crowell et al., 2020) to eighty 25-min sessions (Lorenzo et al., 2013). Some studies gave a rationale for these design decisions, such as short sessions due to attentional capacity, but the majority gave no justification for the frequency or duration of XR use. Adverse outcomes were not routinely assessed but one study commented that some children had difficulties with restlessness and tolerance of the headset (Ravindran et al., 2019) and another used ‘reinforcement candy’ to encourage ‘emotional stability’ during the intervention (Cheng et al., 2015).

Table 2 Description of AR, VR and MR interventions and their reported purpose

Outcomes

Social-Emotional Reciprocity

Fifteen studies targeted social-emotional reciprocity: six using AR, seven using VR and two using MR. Specific targets included identifying and responding to social greetings; motivation to communicate with a conversation partner; understanding and use of social initiations; and sharing emotions and affect with others. Given the multiple countries in which these studies took place, it is possible that reciprocal behaviours considered socially appropriate differed according to cultural norms. Participants ranged from 6 to 16 years, although the majority of participants were primary school aged.

AR interventions included Smartglasses, which gave real-time prompts in social situations (Liu et al., 2017; Sahin et al., 2018; Vahabzadeh et al., 2018), a motion based game, which provided opportunities to practice reciprocal behaviours (Lee, 2020), and a concept map enhanced with AR to aid learning (Lee et al., 2018). One study used an AR smartphone app to enhance child-therapist interactions, but the specific effects of AR were unclear (Lorenzo et al., 2019). VR interventions were less varied, with the majority simulating social situations in which participants could practice specific reciprocal behaviours (Cheng et al., 2015; Herrero & Lorenzo, 2020; Ip et al., 2018; Lorenzo et al., 2013, 2016; Ravindran et al., 2019). One study utilised a specific role playing game within the VR (Tsai et al., 2020). Conversely, the MR intervention (Crowell et al., 2020; Mora-Guiard et al., 2017) used mixed reality objects to digitally emphasise the benefits of collaboration with a partner.

Outcomes were most commonly measured by presenting participants with a range of social situations (sometimes in the form of Social Stories™) and asking them to describe, and/or demonstrate, how they might respond (Cheng et al., 2015; Herrero & Lorenzo, 2020; Lee, 2020; Lee et al., 2018; Tsai et al., 2020). These studies all concluded that social-emotional reciprocity could be improved using their specific XR intervention; however, the criteria for rating responses were not validated or tested for age appropriateness using a non-autistic sample. Furthermore, one study reported that participants were given prompts when responding but did not specify a protocol for this (Cheng et al., 2015). Another reported improvements in social reciprocity but assessed behaviours using a novel and unvalidated joint attention assessment (Ravindran et al., 2019). A number of studies measured outcomes by observing changes in behaviours during the XR intervention. Again, it was unclear whether the coded behaviours were age-appropriate, or had any relation to social skills in the real world. Of studies measuring in situ behaviours, two concluded that behaviours improved after the intervention (Herrero & Lorenzo, 2020; Mora-Guiard et al., 2017) and one found no impact of XR on social reciprocity when compared to an active control (Crowell et al., 2020). Others did not set clear hypotheses regarding primary outcome measures, making it difficult to draw conclusions on their efficacy (Lorenzo et al., 2013, 2016). Only four of the studies targeting social-emotional reciprocity utilised validated outcome measures (Ip et al., 2018; Liu et al., 2017; Sahin et al., 2018; Vahabzadeh et al., 2018): the Aberrant Behaviour Checklist (ABC; Aman et al., 1985), Social Responsiveness Scale (SRS-2; Constantino & Gruber, 2012) and PEP-3 Psychoeducational Profile (Schopler et al., 2005). All of these studies showed an improvement in scores before and after the intervention, with a maximum follow up of 2 weeks (Ip et al., 2018). However, only in one instance was this in comparison to a control (Ip et al., 2018). One measure was unsuitable for the age of the study participants (Ip et al., 2018) and another indicated a 100% reduction in social difficulties (Liu et al., 2017) despite theoretical understandings of autism as a lifelong neurodevelopmental difference characterised by differences in social communication. One study measured outcomes using the Autism Spectrum Inventory (IDEA; Rivière, 2002). It is unclear whether this is validated as the measure is in Spanish, but no improvements were observed when compared with an active control, and again, the theoretical justification for use of this measure was limited (Lorenzo et al., 2019).

Overall, a number of studies have targeted elements of social-emotional reciprocity and almost all claim evidence of effectiveness (see Table 3). However, there is significant risk of bias in how outcomes were measured and the better quality studies tended to be single case. It is therefore difficult to draw conclusions about the efficacy of XR for improving this aspect of social communication. Furthermore, despite some qualitative reports, the studies did not attempt to determine generalisability and lack of longer-term follow-up makes it difficult to determine the extent to which improvements were sustained.

Table 3 Summary of results

Non-verbal Communicative Behaviours

No studies targeted non-verbal communication in isolation but three studies (two VR and one AR) included non-verbal behaviours as outcome measures. Herrero and Lorenzo (2020) used an idiosyncratic measure of ‘non-verbal behaviours’ (e.g. use of facial cues, imitation and gestures) to suggest that use of their VR intervention was beneficial for 7- to 12-year-olds. However, all but one of the participants were rated as having ‘fair’ or ‘good’ non-verbal communication prior to the intervention and the measure used was not validated. Two studies concluded that their respective interventions increased eye contact: Ravindran et al. (2019) coded the amount of eye contact used by the 9- to 16-year-old participants during a joint attention assessment and Liu et al. (2017) utilised an idiosyncratic caregiver measure.

Given the limited number of studies which have investigated non-verbal communicative behaviours, it is difficult to draw conclusions about XR’s efficacy for this domain of social communication. This is compounded by use of unvalidated measures, small sample sizes and unclear theoretical rationale for anticipated improvements in non-verbal behaviours. Only one study measured generalisability (caregiver report).

Developing, Maintaining and Understanding Relationships

Two AR studies primarily targeted the development, maintenance and understanding of relationships (Chen et al., 2015, 2016) and one VR study included some relational outcome measures (Ip et al., 2018). The AR studies targeted emotion recognition which was measured according to participants’ ability to identify emotions in a story. According to an unvalidated assessment method, participants improved between baseline and follow up. Informal parent reports suggested that improvements corresponded with real world change in ability to identify emotions; however, this was not formally measured. In contrast, Ip et al. (2018) found no evidence of improvements on this domain after use of VR, when compared with a waitlist control. They targeted the application of social skills to real life (e.g. ability to maintain relationships), measured using a domain of the Adaptive Behaviour Assessment System (ABAS-II; Harrison & Oakland, 2003), and emotion recognition, which was measured using the Eyes and Faces Tests (Baron-Cohen et al., 1997, 2001). Mixed outcomes from this small number of studies mean it is not possible to conclude whether XR is effective for improving relational social communication.

Related Skills

Two studies included outcomes which would theoretically benefit social skills but do not fall under the DSM-5 social communication criteria (APA, 2013), such as emotion regulation (Ip et al., 2018) and flexibility to changes during social situations (Herrero & Lorenzo, 2020). When compared with a waitlist control, the former showed an increase in emotion regulation ratings after use of VR, as measured by the PEP-3 (Schopler et al., 2005), and the latter showed improvements in flexibility to change across all participants in the intervention group, according to idiosyncratic measures.

Quality of Included Studies

Quality ratings ranged from 0.31 to 0.67 (where possible scores are between 0 and 1) with an average quality rating of 0.50. Risk of bias scores ranged from 0.17 to 0.83 with an average of 0.47. Although the adapted method of scoring makes it difficult to give a precise assessment of quality, comparisons with previous applications of this checklist suggest that the studies were generally of poor to fair quality (Hooper et al., 2008). Higher scores were typically given for clarity of reporting, such as describing the main aims, interventions and findings. The majority of studies took place in representative locations, and there was little evidence of non-compliance. Lower quality studies tended to have non-representative samples, unclear recruitment strategies and lack of reporting on attrition. Furthermore, several studies did not attempt to blind those involved in rating outcomes. It was particularly notable that no studies gave justification for their selected sample size.

Further quality appraisal using the SCED Scale indicated that the majority of small-N studies were truly experimental in design (7 of 10) and conducted sufficient sampling at baseline and treatment phases; some of these studies also performed statistical analysis. However, a lack of precise and repeatable outcome measures was a significant weakness across small-N studies. In particular, outcomes tended to be rated by one, non-independent assessor and where there were multiple assessors, interrater reliability was not calculated. No attempt was made to show generalisation across settings or therapists.

Discussion

This systematic review identified 17 studies that investigated the potential of XR interventions to enhance the social skills of autistic children. To date, the majority of interventions have targeted social-emotional reciprocity, with relatively little attention given to the non-verbal and relationship aspects of social-communication. The overall quality of the research is relatively low, perhaps reflecting the infancy of the research area. The majority of studies have small sample sizes and a number are not truly experimental, making it difficult to draw firm conclusions about the efficacy of XR, particularly in comparison to current, less costly social skills interventions. Significant heterogeneity means it is not possible to determine the ‘active ingredient’ of XR interventions. Furthermore, only limited attempts have been made as yet to determine generalisability and there is limited exploration of whether the statistically significant changes observed by many of the studies were also clinically significant i.e. led to improvements in everyday social or occupational skills, or other important areas of functioning.

Limitations of the Current Literature

Given the relative novelty of this research, it is important to consider the limitations of the current evidence base, to be improved upon in future research. In particular, current evidence is limited in its ethical considerations, theoretical grounding, robustness of study design and sample representativeness. It is also unclear as yet, how easily and cost-effectively XR interventions could be scaled to clinical practice.

Ethical Considerations

Social skills interventions are primarily important because of the negative secondary effects of social difficulties and so quality of life has been an important outcome measure across the literature (e.g. Baghdadli et al., 2013; Mitchel et al., 2010). However, none of the studies included in this review measured quality of life, nor did they consider the importance of the intervention target to the participants themselves. Krasny et al. (2003) emphasise that interventions are most efficacious when children understand the relevance of the social skill and work towards individualised goals. Future interventions could capitalise on the customisability of XR to tailor interventions to individual needs. It is perhaps telling that the majority of interventions have targeted social reciprocity which, by definition, impacts those around autistic people. There are few attempts to directly target development and maintenance of relationships, despite this being what some autistic people have expressed as most important (Cresswell et al., 2019). Some studies did use participatory research methods, i.e. the incorporation of the views of autistic people into the design and implementation of the research (Cornwall & Jewkes, 1995), although the extent of involvement is somewhat limited. Increased meaningful involvement from autistic people within this field of research would help to ensure outputs are relevant and beneficial to the people for whom they are designed (Fletcher-Watson et al., 2019).

Future research should also make a more comprehensive attempt to measure adverse events, particularly given that one study which did comment on negative outcomes noted problems with restlessness and discomfort, while another reported using positive reinforcement to keep children engaged. Moreover, commercially available VR headsets typically include safety notices regarding their use in children, emphasising the need for robust assessment of potential harm. Future studies could improve this through quantitative measurement of participant experience, such as whether the tool was straightforward to use and physically and sensorially comfortable, and analysis of the relationship to outcomes, i.e. whether more positive experiences are associated with greater improvements. As well as the potential adverse effects of XR, future studies should consider the wider impact of teaching children to implement neurotypical social skills. Many autistic people already invest significant time and energy in monitoring and modifying their behaviour in order to align with social norms, known as ‘social camouflaging’. Evidence suggests that this hiding of one’s true self can be mentally and physically draining, requires excessive concentration and self-control, and causes discomfort (Hull et al., 2017; Mandy, 2019). Further exploration is needed of whether social skills training simply teaches children to mask their true selves, and if so, serious attention should be given to the potential deleterious impact on wellbeing. Perhaps greater research focus could instead be given to how settings, such as schools, could adapt to become more accommodating for autistic people, rather than placing all emphasis on changing autistic children to manage in a neurotypical world. This would be more consistent with the shared responsibility for communication between autistic and non-autistic people highlighted by the double empathy problem (Milton, 2012).

Theoretical Grounding

Despite the number of theoretical explanations for social differences in autism, the studies reviewed in this research give relatively little theoretical justification to the design of their interventions. As a result, XR has been applied in a number of different ways, often with no clear rationale for why design decisions were made. Greater attention to theory when designing interventions might further improve efficacy. This has proved beneficial for non-XR studies, such as the Mind Reading intervention which draws on the empathising-systemising theory by exploiting participants’ relative strength in systemising when teaching emotion recognition (Golan & Baron-Cohen, 2006). Limited consideration of the mechanism by which change might occur may also account for the lack of specificity in some of the research. In particular, some studies measured a number of outcomes with no clear indication of how these related to the purpose or design of the intervention. Theoretical understandings of the mechanism of change could also inform intervention duration, which varied significantly across the studies. While it seems theoretically unlikely that clinically significant changes could be made in the time of the shortest intervention (15 min), strong theoretical justification should be given before subjecting children to high numbers of intervention sessions.

Study Design

Quality appraisal highlighted a number of methodological limitations of the current research. For example, the majority of studies did not attempt to blind the individuals who coded behavioural outcomes and, in fact, behaviours were often rated by researchers or parents who knew the purpose of the study, both of whom are likely to have been highly invested in positive outcomes. In particular, parents of autistic children have been shown to report positive effects when told their child is receiving treatment, even in the absence of an intervention (Jones et al., 2017). It is also possible that improvements occurred as a result of repetition of the outcome measures themselves. For example, a number of studies used questions based on Social Stories™ to measure outcomes and, as Social Stories™ are a social skills intervention in themselves, this may have contributed to improvements. Similarly, when outcomes were measured by in vivo performance, improvements may have been due to practice effects, rather than social skill development. Greater use of validated measures of social skills, applied by raters who are blinded to the study purpose and intervention allocation, would address some of these issues. Future research should also ensure that potential confounders, such as whether participants are receiving any other therapies, are fully considered.

Appropriately for the stage of research, the majority of studies were small-N. However, in larger trials, no details were given about the determination of sample size, and so it may be that those which did not find an effect were underpowered. In one study, a significant proportion of the sample were excluded from analysis based on age; however, the reason for this is not explained, which diminishes the validity of the results. To determine effectiveness, more large-scale trials are required: these should be sufficiently powered and ideally pre-registered, to reduce the likelihood of data dredging and subsequent false positives. More experimental, controlled studies would also make it easier to determine not only whether XR interventions are effective, but whether they offer any real improvement to social skills, over and above current methods.
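To illustrate what "sufficiently powered" might mean in practice, the following is a minimal sketch of an a priori sample size calculation for a two-arm trial. It is not drawn from any of the reviewed studies. It assumes a two-sided comparison of two independent group means with equal group sizes, uses the standard normal approximation, and the function name `n_per_group` and the default values (alpha of 0.05, power of 0.80) are illustrative assumptions only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm of a two-arm trial.

    Hypothetical helper, using the normal approximation for a two-sided
    comparison of two independent means with equal group sizes:
        n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2
    where d is the standardised mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Conventional small, medium and large standardised effects.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, a medium effect (d = 0.5) would require roughly 63 participants per group, and a small effect (d = 0.2) close to 400. An exact calculation based on the t distribution gives slightly larger figures, and designs with repeated measures or clustering would need a different approach.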

Representativeness of Sample

Current research is limited in the extent to which participants are representative of the autistic population as a whole. Around 25% of autistic people are female (Loomes et al., 2017) but only five of the included studies met this threshold. Similarly, up to 50% of the autistic population are estimated to have an Intellectual Disability (ID; Charman et al., 2011; Loomes et al., 2017) yet a number of the included studies excluded participants with an ID or stated a mean IQ within the average range. Limited justification is given for this and it compounds evidence that people with ID are consistently excluded from autism research (G. Russell et al., 2019). Future research should seek to recruit participants who truly reflect the autistic population as a whole, or provide strong justification for exclusion.

Limitations of the Review

The findings of this review should be viewed in the context of its limitations. Firstly, an effect size could only be given for a limited number of studies, making it impossible to compare the efficacy of different interventions using meta-analysis. Instead, this study used a systematic but non-analytic approach to synthesis. Vote counting and summary plots were avoided to reduce the potential misrepresentation of efficacy, given the number of low-quality studies. This could perhaps have been improved by only including studies which meet a particular quality threshold. However, this was not deemed feasible for this review given the small number of studies. As the research develops and more experimental, controlled studies are carried out, more systematic methods of synthesis will be possible. Furthermore, publication bias will have significantly impacted the number of studies showing positive effects. Grey literature was not included, to ensure included studies were of peer-review quality, but this may have increased the positive results bias. When determining studies to be included, there were some borderline cases, where insufficient descriptions of study interventions made it difficult to establish whether the XR was immersive. This was resolved through discussion amongst researchers, but could have been improved by contacting study authors for more information.

The conclusions that can be drawn from this review are also constrained by the limited amount of data, both in terms of number of studies and overall sample size. The maximum age of participants across studies was 16 and so it is unclear whether XR interventions could be effective for helping older teenagers and young adults to enhance their social skills. Additionally, only VR studies which altered one’s perception of reality were included and so the efficacy of immersive VR could not be compared to that of VLEs, which may be more cheaply and easily implemented in clinical practice.

Conclusion

The potential to digitally alter how one perceives the world offers exciting new possibilities in developing effective and engaging social skills interventions for autistic children. However, enthusiasm for XR interventions should be viewed with caution. While there is some indication that interventions are feasible to implement with autistic children, limited evidence exists for their effectiveness in bringing about meaningful, longstanding improvements in everyday functioning. Furthermore, no assessment has been made of the potential emotional cost of autistic children implementing behaviours which do not come naturally to them. This review demonstrates the need for theoretically grounded interventions, designed with the interests of autistic people at the forefront. Controlled trials and larger sample sizes, as well as other improvements to study design, are required to draw firm conclusions about the efficacy of XR interventions and their generalisability, before potentially high-cost scaling to routine clinical services is considered.