Are Extended Reality Interventions Effective in Helping Autistic Children to Enhance Their Social Skills? A Systematic Review

Autistic children’s social skills do not always align with those of their neurotypical peers and research suggests that this can negatively impact quality of life. This review aimed to assess the effectiveness of extended reality (XR) interventions in helping autistic children to enhance their social skills. Five electronic databases were systematically searched and seventeen studies were identified. The majority targeted social-emotional reciprocity and were of relatively low quality. There was insufficient evidence to determine whether effects were generalisable, sustained or important to autistic people. Research in this field is in its infancy and evidence of effectiveness should be viewed with caution. Future studies should aim for high-quality, theory-driven research, and involve autistic people to ensure meaningful outcomes. PROSPERO ID: CRD42021229442


Autism and Social Differences
Effective social communication depends on a range of social skills, a term which is complex and has been variably defined across the literature, but is broadly considered to encompass verbal and non-verbal behaviours that enable positive social interactions (Gresham & Elliott, 1984). This multifaceted concept includes innate skills, such as eye tracking and a preference for faces, as well as more complex, learned behaviours, such as adjusting behaviour to suit particular social contexts (Spence, 2003). An exhaustive list of skills considered to be social is difficult to establish, due to the number of cognitive and behavioural processes drawn upon in any given social situation.
Autistic children's social skills do not always align with those of neurotypical people. Their skills are different in terms of social-emotional reciprocity (e.g. social approach and initiation of interactions), nonverbal communicative behaviours (e.g. eye contact, and use of facial expressions), and the development, maintenance and understanding of relationships (e.g. reduced interest in peers and difficulties making friends) (Diagnostic and Statistical Manual of Mental Disorders, 5th ed.; DSM-5; American Psychiatric Association [APA], 2013). These differences manifest differently for each individual (Happé et al., 2006) and can have negative secondary effects on wellbeing, learning and relationships. For example, social skills which do not align with those of neurotypical people have been shown to predict social anxiety (Bellini, 2006) and are associated with social rejection from peers (Ochs et al., 2001). This is not through lack of interest in social contact (Jaswal & Akhtar, 2019); autistic young people often express a desire for social interaction and report being more lonely than their neurotypical peers (Bauminger & Kasari, 2000). Furthermore, social differences in autistic people can have a negative effect on academic and occupational outcomes (Howlin, 2003; Welsh et al., 2001).

Theories of Social Difference in Autism
A number of theories might explain the social differences observed in autism. One theory proposes that autistic people can find it harder to make predictions about others' mental states (theory of mind) and so may find it difficult to encode socially relevant information (Baron-Cohen et al., 1985). Alternatively, it may be that difficulty understanding others' interpretations of the world is due to an underlying difference in executive functioning, such as rule learning (J. Russell, 1997). This can account for the finding that some autistic children struggle with theory of mind style tasks, even when the social element is removed (Russell et al., 1991). In particular, predictive processing might be different in autism. Pellicano and Burr (2012) propose that autistic people perceive the world more accurately, as they are less biased by prior experience when making predictions. At times this may be a strength, but could make some elements of social interaction more difficult, such as predicting when another person will speak (Stark et al., 2021). Another cognitive theory of autism is 'weak central coherence' (Frith, 1989) which suggests that autistic people have a preference for local processing rather than striving for overall meaning. This may lead to differences in the interpretation of social behaviours. However, more recent evidence suggests that this theory sits alongside social differences, rather than explains them (Happé & Frith, 2006). These cognitive theories are in line with evidence that social desire is not affected by autism (Happé & Frith, 1996).
In contrast, social motivation theory proposes that autistic people have differences in the psychological dispositions and biological mechanisms that bias neurotypical individuals to orient towards social stimuli, seek and derive pleasure from social interactions, and work to maintain social bonds (Chevallier et al., 2012). However, this contrasts with many accounts of autistic people, overlooks the role of other people's perceptions and responses, and ignores the many other explanations of social differences in autism (Jaswal & Akhtar, 2019). Furthermore, this theory has led to differences such as limited eye contact being prioritised as intervention targets, despite evidence for the positive effects of gaze aversion on concentration and regulation of emotions (Robledo et al., 2012).

Social Skills Interventions
A number of interventions have sought to teach social skills to autistic children, due to the secondary benefits on wellbeing and quality of life shown by previous research (McConnell, 2002). A range of outcomes have been targeted, such as turn taking, initiating interaction, recognising emotions and understanding theory of mind (Begeer et al., 2011; Matson et al., 2007; Rieth et al., 2014). Well-established interventions include Social Stories™ (Karkhaneh et al., 2010), video modelling (Sng et al., 2014), pivotal response training (Koegel et al., 2016), social skills groups (Reichow et al., 2012) and peer-mediated behavioural interventions (Laushey & Heflin, 2000). There is increasing evidence that, at least in some contexts, these interventions can help autistic children to utilise neurotypical social skills, but the duration, generalisability and precise mechanism of change remain unclear (McConnell, 2002). Furthermore, little acknowledgement has been given to the fact that it may be effortful, and perhaps emotionally draining, for autistic people to operationalise these skills.
When considering social skills interventions, it is important to acknowledge that social differences in autism do not necessarily equate to a 'difficulty'. Many argue that difficulties arise due to a lack of accommodation of autistic differences in a neurotypical society (Brownlow, 2010), and emerging research is suggestive of a 'double empathy problem', whereby communication breakdowns between autistic and non-autistic people are a two-way issue resulting from disjuncture in reciprocity between the two differently disposed social actors (Edey et al., 2016; Milton, 2012; Sheppard et al., 2016). Social skills training can therefore perpetuate the idea that autistic people need to adapt, without consideration of the role of the contexts in which autistic people live, learn and work in understanding, supporting and responding to autistic communication. However, many autistic people already find ways to overcome social challenges and seek support in doing this (Cresswell et al., 2019). Social skills interventions are one way in which services can support people to overcome these challenges, particularly in circumstances where it is not possible to adapt the environment, e.g. public places.

Extended Reality Technologies
Over the last three decades, social skills interventions have begun to capitalise on the potential advantages of digital technologies. Computerised approaches can support social skills in situ (Vygotsky & Luria, 1994) or improve traditional training approaches, due to the attentional and motivational advantages of interactive technologies (Golan & Baron-Cohen, 2006). More recently, these advantages have been exploited using extended reality (XR), i.e. the merging of physical and virtual realities (Riva et al., 2016). This encompasses augmented reality (AR), in which virtual information is overlaid onto the real world; virtual reality (VR), in which users are fully immersed in a simulated environment; and mixed reality (MR), in which digital and real-world objects interact in real time. By merging physical and virtual realities, XR could reduce the salience of potentially distressing stimuli, making interactions less overwhelming for some autistic people, thus providing an effective space to experiment with social skills (Thye et al., 2018). In virtual social settings, there is no risk of embarrassment in front of one's peers, yet the degree of realism may increase the likelihood of generalisability (Strickland et al., 1996).
Extended reality interventions have been successfully implemented in a number of therapeutic contexts including stroke rehabilitation (Laver et al., 2017) and exposure therapies for anxiety disorders (Bouchard et al., 2017; Vincelli et al., 2003). Applications for autistic people are also emerging, although the majority of interventions to date have used virtual learning environments (VLEs), a non-immersive technology in which users interact with simulated social situations using a desktop computer (e.g. Didehbani et al., 2016). More immersive technologies tend to require a head-mounted display (HMD) or Smartglasses, computer glasses that change what the wearer sees. Despite the potential challenges for those with tactile hypersensitivities, these devices have been shown to be well-tolerated, enjoyable and engaging in this population (Keshav et al., 2017; Newbutt et al., 2016), and effective in teaching other skills, such as taking public transport (Simoes et al., 2018).

Aims of this Review
Previous reviews have shown that XR interventions are feasible and sometimes beneficial for teaching autistic children a range of skills, although they recognise that the quality of the literature is varied (e.g. Berenguer et al., 2020; Mesa-Gresa et al., 2018). However, no review to date has investigated the specific potential of XR for social skills. Furthermore, previous reviews of VR interventions have tended to include non-immersive technologies, such as VLEs. This review takes a focused approach by reviewing the evidence base for interventions in which one's perception of reality is altered. Social skills interventions have most commonly been implemented in children but we chose to include young adults, based on evidence that brain maturation, particularly that of the prefrontal cortex which coordinates the goal-directed behaviour needed to implement social skills, continues to develop into a person's early twenties (Johnson et al., 2009). This is also in line with current plans to extend children's mental health services in England up to 25 years (NHS Long Term Plan, 2019). Due to the infancy of this research area, we chose not to restrict inclusion on the basis of study design.
It was considered important to understand the effectiveness of these interventions because of the secondary benefits social skills can have on indicators of wellbeing, such as increased peer interactions (Mandelberg et al., 2014) and reduced anxiety and low mood (Hillier et al., 2011; Schohl et al., 2014). Secondly, the realism of immersive VR offers a unique potential to improve upon existing social skills interventions which do not always generalise to real contexts (Bellini et al., 2007). Thirdly, XR is at a point of rapid expansion, with VR headsets becoming increasingly widespread. If social skills interventions are going to capitalise on this expansion, it is important to consider whether there is any good quality evidence for their effectiveness and whether investment in these technologies will actually lead to meaningful benefits for autistic young people.
To this end, this review aims to answer the following questions:

Screening and Selection
Studies were screened for eligibility according to the following inclusion criteria:
• The paper was published in English, in a peer-reviewed journal to ensure a minimum quality level.
• The paper was published after 1990, when AR was developed and VR headsets became widely commercially available.
• The paper included an active intervention using XR. Studies described as VR but which used a desktop computer or motion based video game were not included for review as they do not extend one's perception of reality.
• The study targeted one or more social skills.
• The paper reported a quantitative measure of social skills, at least pre and post intervention. This included in vivo outcomes, as well as validated and idiosyncratic measures of social skills.
• Study participants were children and young adults up to the age of 25.
• Study participants were autistic.
Where papers reported on more than one study, they were included based on the study which met the inclusion criteria. Borderline cases were discussed amongst the researchers.

Study Selection
The initial search identified 592 records, 324 of which were retained after duplicates were removed (Fig. 1). Two hundred seventy-five studies were excluded at the abstract stage, with a 96.5% inter-rater reliability (kappa = .87). Thirty-one studies were excluded at the full text screen, with a 91.7% inter-rater reliability (kappa = .83). Disagreements were resolved through further screening at the full text stage and discussion with other authors. Seven additional records were identified through hand searching and citation chaining (Scopus search conducted May 3, 2021). A total of 17 studies were included in the review and a number of data items were extracted from each study. Where it was not possible to extract data directly, study authors were contacted, or available information was used to calculate an effect size. Fifty percent of data extraction was duplicated by a second reviewer (EM).
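The screening figures above pair a raw percentage agreement with Cohen's kappa, which corrects that agreement for the level expected by chance. The following is a minimal sketch of how the two quantities relate for two raters making include/exclude decisions. The function name and the counts are hypothetical, chosen only for illustration; they are not the review's screening data.

```python
def agreement_and_kappa(both_include, a_only, b_only, both_exclude):
    """Return (proportion agreement, Cohen's kappa) for two raters
    making include/exclude decisions on the same set of records."""
    n = both_include + a_only + b_only + both_exclude
    observed = (both_include + both_exclude) / n
    # Chance agreement, from each rater's marginal inclusion rate.
    a_inc = (both_include + a_only) / n
    b_inc = (both_include + b_only) / n
    expected = a_inc * b_inc + (1 - a_inc) * (1 - b_inc)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (NOT taken from the review).
observed, kappa = agreement_and_kappa(
    both_include=40, a_only=5, b_only=5, both_exclude=150
)
print(f"agreement = {observed:.1%}, kappa = {kappa:.2f}")
```

Because most records are excluded by both raters at the screening stage, chance agreement is high, which is why kappa is typically noticeably lower than the raw percentage.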

Data Synthesis
Data were synthesised and organised according to the social communication criteria of DSM-5's autism diagnostic criteria (APA, 2013), as the effect of XR on specific social skills was the primary area of interest. In line with Siddaway et al. (2019), findings were integrated and critiqued, not merely summarised. The high number of single case studies meant that it was not possible to conduct a meta-analysis but key information is summarised in tables, with a summary of effect (Cohen's d) reported where enough information was provided in the paper or could be obtained through correspondence with the study authors. All studies were included for synthesis, irrespective of quality, but due to the range of study quality, summary plots were not developed, to avoid misrepresentation of the evidence base.
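For readers unfamiliar with Cohen's d, the sketch below shows one common way of computing it. The review does not state which variant was used, so this assumes the independent-groups form with a pooled standard deviation; the function name and the scores are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference (group_a minus group_b),
    using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical outcome scores (NOT taken from any included study).
intervention = [14, 16, 15, 17, 18]
control = [10, 12, 11, 13, 14]
d = cohens_d(intervention, control)
print(f"Cohen's d = {d:.2f}")
```

With very small samples, as in many of the included studies, d is estimated imprecisely, which is one reason the effect sizes are best read alongside the quality ratings.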

Quality Assessment
The quality and risk of bias of each study was evaluated using a quality checklist for healthcare intervention studies, developed by Downs and Black (1998). The tool was selected due to its applicability to both randomised and nonrandomised studies. All papers were rated by the first author and 30% were duplicated by another author (EM), with an interrater reliability of 90.4% (kappa = 0.86). Discrepancies were resolved through discussion. Some items were not applicable to small-N research and so a summary score relative to the number of items assessed was generated for each paper to allow for direct comparison across studies. This was calculated by dividing the score obtained by the total possible score according to study design. A risk of bias score was generated by the same method, using only items assessing bias (14 to 26). A full summary of the assessment approach is provided in Appendix 31. Given that a number of items were not applicable to single case and small-N studies, the Single Case Experimental Design (SCED) Scale (Tate et al., 2008) was also used to qualitatively assess these studies (Appendix 32).
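The relative scoring described above (score obtained divided by the total possible score for the items that apply to a given design) can be sketched as follows. The item numbers, maximum scores and ratings are hypothetical placeholders rather than the actual checklist weights or any study's ratings; only the range 14 to 26 for the bias items comes from the text.

```python
def relative_score(item_scores, max_scores):
    """Score obtained divided by total possible score, counting only
    applicable items. A score of None marks an item as not applicable."""
    applicable = [i for i, s in item_scores.items() if s is not None]
    obtained = sum(item_scores[i] for i in applicable)
    possible = sum(max_scores[i] for i in applicable)
    return obtained / possible


# Hypothetical ratings for one study (item number -> awarded score).
max_scores = {1: 1, 2: 1, 3: 1, 14: 1, 15: 1, 16: 1}
scores = {1: 1, 2: 0, 3: 1, 14: None, 15: 1, 16: 0}

overall = relative_score(scores, max_scores)

# Risk of bias uses the same method, restricted to items 14 to 26.
bias_items = {i: s for i, s in scores.items() if 14 <= i <= 26}
bias = relative_score(bias_items, max_scores)

print(f"quality = {overall:.2f}, risk of bias = {bias:.2f}")
```

Dropping inapplicable items from both the numerator and the denominator is what allows a small-N study to be compared with a controlled trial on the same 0 to 1 scale.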

Study Characteristics
Across the 17 studies (summarised in  (Chandler et al., 2007); and Lorenzo et al. (2019) using the Autism Spectrum Inventory (Rivière, 2002). Nine studies specified an intelligence quotient (IQ), with a mean or lower limit of at least 70. The studies used a range of designs, although a majority (10) were small-N, with either multiple baseline or pre-post measures. Only five studies used a control (Crowell et al., 2020; Herrero & Lorenzo, 2020; Ip et al., 2018; Lorenzo et al., 2016, 2019). Seven studies explicitly reported being at the preliminary or feasibility stage of the research. Fourteen unique interventions were evaluated (seven VR; six AR; one MR) and are described in Table 2. Seven adapted existing social skills interventions, such as video modelling (Chen et al., 2016), collaborative game play (Mora-Guiard et al., 2017) and Social Stories™ (Herrero & Lorenzo, 2020). Rationales for the use of XR included distraction reduction; the potential for engagement and sustained attention; freedom to practice social interactions without risk; and customisability, i.e. the capacity to emphasise social stimuli and adapt difficulty level. Interventions most often took place in schools, and ranged from one 15-min session (Crowell et al., 2020) to eighty 25-min sessions (Lorenzo et al., 2013). Some studies gave a rationale for these design decisions, such as short sessions due to attentional capacity, but the majority gave no justification for the frequency or duration of XR use. Adverse outcomes were not routinely assessed but one study commented that some children had difficulties with restlessness and tolerance of the headset (Ravindran et al., 2019) and another used 'reinforcement candy' to encourage 'emotional stability' during the intervention (Cheng et al., 2015).

Social-Emotional Reciprocity
Fifteen studies targeted social-emotional reciprocity: six using AR, seven using VR and two using MR. Specific targets included identifying and responding to social greetings; motivation to communicate with a conversation partner; understanding and use of social initiations; and sharing emotions and affect with others. Given the multiple countries in which these studies took place, it is possible that reciprocal behaviours considered socially appropriate differed according to cultural norms. Participants ranged from 6 to 16 years, although the majority of participants were primary school aged.
AR interventions included Smartglasses, which gave real-time prompts in social situations (Liu et al., 2017; Sahin et al., 2018; Vahabzadeh et al., 2018), a motion based game, which provided opportunities to practice reciprocal behaviours (Lee, 2020), and a concept map enhanced with AR to aid learning (Lee et al., 2018). One study used an AR smartphone app to enhance child-therapist interactions, but the specific effects of AR were unclear (Lorenzo et al., 2019). VR interventions were less varied, with the majority simulating social situations in which participants could practice specific reciprocal behaviours (Cheng et al., 2015; Herrero & Lorenzo, 2020; Ip et al., 2018; Lorenzo et al., 2013, 2016; Ravindran et al., 2019). One study utilised a specific role playing game within the VR (Tsai et al., 2020). Conversely, the MR intervention (Crowell et al., 2020; Mora-Guiard et al., 2017) used mixed reality objects to digitally emphasise the benefits of collaboration with a partner.
Outcomes were most commonly measured by presenting participants with a range of social situations (sometimes in the form of Social Stories™) and asking them to describe, and/or demonstrate, how they might respond (Cheng et al., 2015; Herrero & Lorenzo, 2020; Lee, 2020; Lee et al., 2018; Tsai et al., 2020). These studies all concluded that social-emotional reciprocity could be improved using their specific XR intervention; however, the criteria for rating responses were not validated or tested for age appropriateness using a non-autistic sample. Furthermore, one study reported that participants were given prompts when responding but did not specify a protocol for this (Cheng et al., 2015). Another reported improvements in social reciprocity but assessed behaviours using a novel and unvalidated joint attention assessment (Ravindran et al., 2019). A number of studies measured outcomes by observing changes in behaviours during the XR intervention. Again, it was unclear whether the coded behaviours were age-appropriate, or had any relation to social skills in the real world. Of studies measuring in situ behaviours, two concluded that behaviours improved after the intervention (Herrero & Lorenzo, 2020; Mora-Guiard et al., 2017) and one found no impact of XR on social reciprocity when compared to an active control (Crowell et al., 2020). Others did not set clear hypotheses regarding primary outcome measures, making it difficult to draw conclusions on their efficacy (Lorenzo et al., 2013, 2016). Only four of the studies targeting social-emotional reciprocity utilised validated outcome measures (Ip et al., 2018; Liu et al., 2017; Sahin et al., 2018; Vahabzadeh et al., 2018): the Aberrant Behaviour Checklist (ABC; Aman et al., 1985), Social Responsiveness Scale (SRS-2; Constantino & Gruber, 2012) and PEP-3 Psychoeducational Profile (Schopler et al., 2005). All of these studies showed an improvement in scores from before to after the intervention, with a maximum follow up of 2 weeks (Ip et al., 2018). However, only in one instance was this in comparison to a control (Ip et al., 2018). One measure was unsuitable for the age of the study participants (Ip et al., 2018) and another indicated a 100% reduction in social difficulties (Liu et al., 2017) despite theoretical understandings of autism as a lifelong neurodevelopmental difference characterised by differences in social communication. One study measured outcomes using the Autism Spectrum Inventory (IDEA; Rivière, 2002). It is unclear whether this is validated as the measure is in Spanish, but no improvements were observed when compared with an active control, and again, the theoretical justification for use of this measure was limited (Lorenzo et al., 2019). Overall, a number of studies have targeted elements of social-emotional reciprocity and almost all claim evidence of effectiveness (see Table 3). However, there is significant risk of bias in how outcomes were measured and the better quality studies tended to be single case. It is therefore difficult to draw conclusions about the efficacy of XR for improving this aspect of social communication. Furthermore, despite some qualitative reports, the studies did not attempt to determine generalisability and lack of longer-term follow-up makes it difficult to determine the extent to which improvements were sustained.

Table footnote: It was not possible to determine whether this is a validated measure due to it having been published in Spanish.

Table 2 (excerpt)
Users score points by directing their gaze towards their caregiver's face. Emotion Game asks children to identify the facial expression of the person in front of them by tilting their head towards the correct emoticon. Points are gained for correct answers.
Quiver Vision: A smartphone application is used to augment real life pictures so that they are perceived as 3D objects. In Lorenzo et al. (2019), participants engage in activities which involve interacting with the AR components alongside the therapist, such as touching different AR objects according to the therapist's instructions.
VR 3D-SU System: Users view and interact with animated social events using an immersive VR headset. Based on their responses to questions about the events, users are rewarded with a message and applause or instructed to try again. Using a headset is thought to reduce distraction and encourage sustained attention towards social events.
Cave Automatic Virtual Environment (CAVE) VR: Users are immersed in a number of social scenarios within a 'cave' of multiple large projection screens. This allows children to practice their responses free from risk of embarrassment. Ip et al. (2018)
Users are immersed in a number of scenarios using two large projection screens and movement detection software. Lorenzo et al. (2013) use this system to allow children to rehearse their responses to a range of classroom based tasks such as initiating conversation with a classmate. In Lorenzo et al. (2016), participants follow an evaluator's guidance on how to behave in a range of social situations, e.g. attending a birthday party. This approach is thought to be beneficial due to its realism and the capacity to present information clearly without irrelevant distractors.

Non-verbal Communicative Behaviours
No studies targeted non-verbal communication in isolation but three studies (two VR and one AR) included non-verbal behaviours as outcome measures. Herrero and Lorenzo (2020) used an idiosyncratic measure of 'non-verbal behaviours' (e.g. use of facial cues, imitation and gestures) to suggest that use of their VR intervention was beneficial for 7- to 12-year-olds. However, all but one of the participants were rated as having 'fair' or 'good' non-verbal communication prior to the intervention and the measure used was not validated. Two studies concluded that their respective interventions increased eye contact: Ravindran et al. (2019) coded the amount of eye contact used by the 9- to 16-year-old participants during a joint attention assessment and Liu et al. (2017) utilised an idiosyncratic caregiver measure.
Given the limited number of studies which have investigated non-verbal communicative behaviours, it is difficult to draw conclusions about XR's efficacy for this domain of social communication. This is compounded by use of unvalidated measures, small sample sizes and unclear theoretical rationale for anticipated improvements in non-verbal behaviours. Only one study measured generalisability (caregiver report).

Developing, Maintaining and Understanding Relationships
Two AR studies primarily targeted the development, maintenance and understanding of relationships (Chen et al., 2015, 2016) and one VR study included some relational outcome measures (Ip et al., 2018). The AR studies targeted emotion recognition, which was measured according to participants' ability to identify emotions in a story. According to an unvalidated assessment method, participants improved between baseline and follow up. Informal parent reports suggested that improvements corresponded with real world change in ability to identify emotions; however, this was not formally measured. In contrast, Ip et al. (2018) found no evidence of improvements on this domain after use of VR, when compared with a waitlist control. They targeted the application of social skills to real life (e.g. ability to maintain relationships), measured using a domain of the Adaptive Behaviour Assessment System (ABAS-II; Harrison & Oakland, 2003), and emotion recognition, which was measured using the Eyes and Faces Tests (Baron-Cohen et al., 1997, 2001). Mixed outcomes from this small number of studies mean it is not possible to conclude whether XR is effective for improving relational social communication.

Related Skills
Two studies included outcomes which would theoretically benefit social skills but do not fall under the DSM-5 social communication criteria (APA, 2013), such as emotion regulation (Ip et al., 2018) and flexibility to changes during social situations (Herrero & Lorenzo, 2020). When compared with a waitlist control, the former showed an increase in emotion regulation ratings after use of VR, as measured by the PEP-3 (Schopler et al., 2005), and the latter showed improvements in flexibility to change across all participants in the intervention group, according to idiosyncratic measures.

Quality of Included Studies
Quality ratings ranged from 0.31 to 0.67 (where possible scores are between 0 and 1) with an average quality rating of 0.50. Risk of bias scores ranged from 0.17 to 0.83 with an average of 0.47. Although the adapted method of scoring makes it difficult to give a precise assessment of quality, comparisons with previous applications of this checklist suggest that the studies were generally of poor to fair quality (Hooper et al., 2008). Higher scores were typically given for clarity of reporting, such as describing the main aims, interventions and findings. The majority of studies took place in representative locations, and there was little evidence of non-compliance. Lower quality studies tended to have nonrepresentative samples, unclear recruitment strategies and lack of reporting on attrition. Furthermore, several studies did not attempt to blind those involved in rating outcomes.
It was particularly notable that no studies gave justification for their selected sample size. Further quality appraisal using the SCED Scale indicated that the majority of small-N studies were truly experimental in design (7 of 10) and conducted sufficient sampling at baseline and treatment phases; some of these studies also performed statistical analysis. However, a lack of precise and repeatable outcome measures was a significant weakness across small-N studies. In particular, outcomes tended to be rated by one, non-independent assessor and where there were multiple assessors, interrater reliability was not calculated. No attempt was made to show generalisation across settings or therapists.

Discussion
This systematic review identified 17 studies that investigated the potential of XR interventions to enhance the social skills of autistic children. To date, the majority of interventions have targeted social-emotional reciprocity, with relatively little attention given to the non-verbal and relationship aspects of social communication. The overall quality of the research is relatively low, perhaps reflecting the infancy of the research area. The majority of studies have small sample sizes and a number are not truly experimental, making it difficult to draw firm conclusions about the efficacy of XR, particularly in comparison to current, less costly social skills interventions. Significant heterogeneity means it is not possible to determine the 'active ingredient' of XR interventions. Furthermore, only limited attempts have been made as yet to determine generalisability and there is limited exploration of whether the statistically significant changes observed by many of the studies were also clinically significant, i.e. led to improvements in everyday social or occupational skills, or other important areas of functioning.

Limitations of the Current Literature
Given the relative novelty of this research, it is important to consider the limitations of the current evidence base, to be improved upon in future research. In particular, current evidence is limited in its ethical considerations, theoretical grounding, robustness of study design and sample representativeness. It is also unclear, as yet, how easily and cost-effectively XR interventions could be scaled to clinical practice.

Ethical Considerations
Social skills interventions are primarily important because of the negative secondary effects of social difficulties and so quality of life has been an important outcome measure across the literature (e.g. Baghdadli et al., 2013; Mitchel et al., 2010). However, none of the studies included in this review measured quality of life, nor did they consider the importance of the intervention target to the participants themselves. Krasny et al. (2003) emphasise that interventions are most efficacious when children understand the relevance of the social skill and work towards individualised goals. Future interventions could capitalise on the customisability of XR to tailor interventions to individual needs. It is perhaps telling that the majority of interventions have targeted social reciprocity which, by definition, impacts those around autistic people. There are few attempts to directly target development and maintenance of relationships, despite this being what some autistic people have expressed as most important (Cresswell et al., 2019). Some studies did use participatory research methods, i.e. the incorporation of the views of autistic people into the design and implementation of the research (Cornwall & Jewkes, 1995), although the extent of involvement is somewhat limited. Increased meaningful involvement from autistic people within this field of research would help to ensure outputs are relevant and beneficial to the people for whom they are designed (Fletcher-Watson et al., 2019).
Future research should also make a more comprehensive attempt to measure adverse events, particularly given that one study which did comment on negative outcomes, noted problems with restlessness and discomfort, while another reported using positive reinforcement to keep children engaged.Moreover, commercially available VR headsets typically include safety notices regarding their use in children, emphasising the need for robust assessment of potential harm.Future studies could improve this through quantitative measurement of participant experience, such as whether the tool was straightforward to use and physically and sensorially comfortable, and analysis of the relationship to outcomes i.e. whether more positive experiences are associated with greater improvements.As well as the potential adverse effects of XR, future studies should consider the wider impact of teaching children to implement neurotypical social skills.Many autistic people already invest significant time and energy in monitoring and modifying their behaviour in order to align with social norms, known as 'social camouflaging'.Evidence suggests that this hiding of one's true self can be mentally and physically draining, and requires excessive concentration, self-control and discomfort (Hull et al., 2017;Mandy, 2019).Further exploration is needed of whether social skills training simply teaches children to mask their true selves, and if so, serious attention should be given to the potential deleterious impact on wellbeing.Perhaps greater research focus could instead be given to how settings, such as schools, could adapt to become more accommodating for autistic people, rather than placing all emphasis on changing autistic children to manage in a neurotypical world.This would be more consistent with the shared responsibility for communication between autistic and non-autistic people highlighted by the double empathy problem (Milton, 2012).

Theoretical Grounding
Despite the number of theoretical explanations for social differences in autism, the studies reviewed in this research give relatively little theoretical justification to the design of their interventions. As a result, XR has been applied in a number of different ways, often with no clear rationale for why design decisions were made. Greater attention to theory when designing interventions might further improve efficacy. This has proved beneficial for non-XR studies, such as the Mind Reading intervention, which draws on the empathising-systemising theory by exploiting participants' relative strength in systemising when teaching emotion recognition (Golan & Baron-Cohen, 2006). Limited consideration of the mechanism by which change might occur may also account for lack of specificity of some of the research. In particular, some studies measured a number of outcomes with no clear indication of how these related to the purpose or design of the intervention. Theoretical understandings of the mechanism of change could also inform intervention duration, which varied significantly across the studies. While it seems theoretically unlikely that clinically significant changes could be made in the time of the shortest intervention (15 min), strong theoretical justification should be given before subjecting children to high numbers of intervention sessions.

Study Design
Quality appraisal highlighted a number of methodological limitations of the current research. For example, the majority of studies did not attempt to blind the individuals who coded behavioural outcomes and, in fact, behaviours were often rated by researchers or parents who knew the purpose of the study, both of whom are likely to have been highly invested in positive outcomes. In particular, parents of autistic children have been shown to report positive effects when told their child is receiving treatment, even in the absence of an intervention (Jones et al., 2017). It is also possible that improvements occurred as a result of repetition of the outcome measures themselves. For example, a number of studies used questions based on Social Stories™ to measure outcomes, and as a social skills intervention in itself, this may have contributed to improvements. Similarly, when outcomes were measured by in vivo performance, improvements may have been due to practice effects, rather than social skill development. Greater use of validated measures of social skills, applied by raters who are blinded to the study purpose and intervention allocation, would improve some of these issues. Future research should also ensure that potential confounders, such as whether participants are receiving any other therapies, are fully considered.
Appropriately for the stage of research, the majority of studies were small-N. However, in larger trials, no details were given about the determination of sample size, and so it may be that those which did not find an effect were underpowered. In one study, a significant proportion of the sample were excluded from analysis based on age; however, the purpose of this is not explained, diminishing the validity of the results. To determine effectiveness, more large-scale trials are required: these should be sufficiently powered and ideally pre-registered, to reduce the likelihood of data dredging and subsequent false positives. More experimental, controlled studies would also make it easier to determine, not only whether XR interventions are effective, but whether they offer any real improvement to social skills, over and above current methods.
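To illustrate what "sufficiently powered" might involve in practice, the following is a minimal sketch of an a priori sample size calculation for a two-group comparison of means. It is not drawn from any of the included studies: the function name `n_per_group` is hypothetical, the default alpha and power values are conventional assumptions, and the formula is the standard normal approximation, which slightly underestimates the sample size that an exact t-test calculation would give.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means with standardised effect size d.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    (illustrative assumption; not the method of any reviewed study).
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes chosen for illustration only.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, a medium effect (d = 0.5) would call for roughly 63 participants per group, which is considerably more than the sample sizes typical of the studies reviewed here.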

Representativeness of Sample
Current research is limited in the extent to which participants are representative of the autistic population as a whole. Around 25% of autistic people are female (Loomes et al., 2017) but only five of the included studies met this threshold. Similarly, up to 50% of the autistic population are estimated to have an Intellectual Disability (ID; Charman et al., 2011; Loomes et al., 2017) yet a number of the included studies excluded participants with an ID or stated a mean IQ within the average range. Limited justification is given for this and it compounds evidence that people with ID are consistently excluded from autism research (G. Russell et al., 2019). Future research should seek to recruit participants who truly reflect the autistic population as a whole, or provide strong justification for exclusion.

Limitations of the Review
The findings of this review should be viewed in the context of its limitations. Firstly, an effect size could only be given for a limited number of studies, making it impossible to compare the efficacy of different interventions using meta-analysis. Instead, this study used a systematic but non-analytic approach to synthesis. Vote counting and summary plots were avoided to reduce the potential misrepresentation of efficacy, given the number of low quality studies. This could perhaps have been improved by only including studies which meet a particular quality threshold. However, this was not deemed feasible for this review given the small number of studies. As the research develops and more experimental, controlled studies are carried out, more systematic methods of synthesis will be possible. Furthermore, publication bias will have significantly impacted the number of studies showing positive effects. Grey literature was not included, to ensure included studies were of peer-review quality, but this may have increased the positive results bias. When determining studies to be included, there were some borderline cases, where insufficient descriptions of study interventions made it difficult to establish whether the XR was immersive. This was resolved through discussion amongst researchers but could have been improved by contacting study authors for more information.
The conclusions that can be drawn from this review are also constrained by the limited amount of data, both in terms of number of studies and overall sample size. The maximum age of participants across studies was 16 and so it is unclear whether XR interventions could be effective for helping older teenagers and young adults to enhance their social skills. Additionally, only VR studies which altered one's perception of reality were included and so the efficacy of immersive VR could not be compared to that of VLEs, which may be more cheaply and easily implemented in clinical practice.

Conclusion
The potential to digitally alter how one perceives the world offers exciting new possibilities in developing effective and engaging social skills interventions for autistic children. However, enthusiasm for XR interventions should be viewed with caution. While there is some indication that interventions are feasible to implement with autistic children, limited evidence exists for their effectiveness in bringing about meaningful, longstanding improvements in everyday functioning. Furthermore, no assessment has been made of the potential emotional cost of autistic children implementing behaviours which do not come naturally to them. This review demonstrates the need for theoretically grounded interventions, designed with the interests of autistic people at the forefront. Controlled trials and larger sample sizes, as well as other improvements to study design, are required to draw firm conclusions about the efficacy of XR interventions and their generalisability, before potentially high cost scaling to routine clinical services is considered.

Appendix 1
Search terms*
*Keyword and thesaurus searches were also conducted for each of the above terms, where possible within each database.

Appendix 2
The Downs and Black Checklist (below) was used to assess study quality. The scoring of item 27, referring to the power of the study, was modified. Instead of rating according to an available range of study powers, we rated whether or not a power calculation was performed. Items which were not applicable to single case and small-N studies are marked *. Items which were not applicable to studies which did not use statistical analysis are marked †. Questions 14 and 22 were not applicable to studies which did not have a control intervention. All other items were rated as Yes (scored as 1) or No (scored as 0) or Unable to Determine (scored as 0). Items 14 to 26 specifically assess bias and so were used to generate a risk of bias score.
1. Is the hypothesis/aim/objective of the study clearly described?
2. Are the main outcomes to be measured clearly described in the Introduction or Methods section? If the main outcomes are first mentioned in the Results section, the question should be answered no.
3. Are the characteristics of the participants included in the study clearly described? In cohort studies and trials, inclusion and/or exclusion criteria should be given. In case-control studies, a case-definition and the source for controls should be given.
4. Are the interventions of interest clearly described? Treatments and placebo (where relevant) that are to be compared should be clearly described.
5. Are the distributions of principal confounders in each group of subjects to be compared clearly described? A list of principal confounders is provided.
6. Are the main findings of the study clearly described? Simple outcome data (including denominators and numerators) should be reported for all major findings so that the reader can check the major analyses and conclusions.
7. Does the study provide estimates of the random variability in the data for the main outcomes? In non-normally distributed data the inter-quartile range of results should be reported. In normally distributed data the standard error, standard deviation or confidence intervals should be reported. If the distribution of the data is not described, it must be assumed that the estimates used were appropriate and the question should be answered Yes.
8. Have all important adverse events that may be a consequence of the intervention been reported? This should be answered yes if the study demonstrates that there was a comprehensive attempt to measure adverse events. (A list of possible adverse events is provided).
9. Have the characteristics of patients lost to follow up been described? This should be answered yes where there were no losses to follow up or where losses were so small that findings would be unaffected by their inclusion. This should be answered no, where the study does not report the number of patients lost to follow up. *
10. Have actual probability values been reported (e.g. 0.035 rather than <.05) for the main outcomes except where the probability value is less than 0.001? †
11. Were the subjects asked to participate in the study representative of the entire population from which they were recruited? The study must identify the source population for participants and describe how the participants were selected. Participants would be representative if they comprised the entire source population, an unselected sample of consecutive participants, or a random sample. Random sampling is only feasible where a list of all members of the relevant population exists. Where a study does not report the proportion of the source population from which the patients are derived, the question should be answered as unable to determine.
12. Were those subjects who were prepared to participate representative of the entire population from which they were recruited? The proportion of those asked who agreed should be stated. Validation that the sample was representative would include demonstrating that the distribution of the main confounding factors was the same in the study sample and the source population.
13. Were the staff, places and facilities where the patients were treated, representative of the treatment the majority of patients receive? For the question to be answered yes the study should demonstrate that the intervention was representative of that in use in the source population. The question should be answered no if, for example, the intervention was undertaken in a specialist centre unrepresentative of the hospitals most of the source population would attend.
14. Was an attempt made to blind study subjects to the intervention they have received? For studies where the patients would have no way of knowing which intervention they received, this should be answered yes. *
15. Was an attempt made to blind those measuring the outcomes of the intervention?
16. If any of the results of the study were based on 'data dredging', was this made clear? Any analysis that had not been planned at the outset of the study should be clearly indicated. If no retrospective unplanned subgroup analyses were reported, then answer yes.
17. In trials and cohort studies, do the analyses adjust for different lengths of follow up of patients, or in the case-control studies, is the time period between the intervention and outcome the same for cases and controls? Where follow-up was the same for all study patients, the answer should be yes. If different lengths of follow-up were adjusted for by, for example, survival analysis, the answer should be yes. Studies where differences in follow-up are ignored should be answered no. **
18. Were the statistical tests used to assess the main outcomes appropriate? The statistical techniques must be appropriate to the data. For example, non-parametric methods should be used for small sample sizes. Where little statistical analysis has been undertaken but where there is no evidence of bias, the question should be answered yes. If the distribution of the data (normal or not) is not described it must be assumed that the estimates used were appropriate and the question should be answered yes.
19. Was compliance with the intervention/s reliable? Where there was non-compliance with the allocated treatment or where there was contamination of one group, the question should be answered no. For studies where the effect of any misclassification was likely to bias any association to the null, the question should be answered yes.
20. Were the main outcome measures used accurate (valid and reliable)? For studies where the outcome measures are clearly described, the question should be answered yes. For studies which refer to other work or that demonstrated the outcome measures are accurate, the question should be answered as yes.
21. Were the participants in different intervention groups (trials and cohort studies) or were the cases and controls (case-control studies) recruited from the same population? For example, patients for all comparison groups should be selected from the same hospital. The question should be answered unable to determine for cohort and case-control studies where there is no information concerning the source of patients included in the study. *†
22. Were study participants in different intervention groups (trials and cohort studies) or were the cases and controls (case-control studies) recruited over the same period of time? For a study which does not specify the time period over which patients were recruited, the question should be answered as unable to determine. *
23. Were study subjects randomised to intervention groups? Studies which state patients were randomised should be answered yes except where method of randomisation would not ensure random allocation. For example, alternate allocation would score no because it is predictable. *†
24. Was the randomised intervention assignment concealed from both patients and health care staff until recruitment was complete and irrevocable? All non-randomised studies should be answered no. If assignment was concealed from patients but not from staff, it should be answered no. *†
25. Was there adequate adjustment for confounding in the analyses from which the main findings were drawn? This question should be answered no for trials if: the main conclusions of the study were based on analyses of treatment rather than intention to treat; the distribution of known confounders in the different treatment groups was not described; or the distribution of known confounders differed between the treatment groups but was not taken into account in the analyses. In non-randomised studies, if the effect of the main confounders was not investigated or confounding was demonstrated but no adjustment was made in the final analyses the question should be answered as no. *†
26. Were losses of patients to follow up taken into account? If the numbers of patients lost to follow-up are not reported, the question should be answered as unable to determine. If the proportion lost to follow up was too small to affect the main findings, the question should be answered yes.
27. Is there any evidence that a power calculation or reasonable equivalent was used to determine sample size?
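The scoring rules described for this checklist could be sketched in code along the following lines. This is an illustrative sketch only: the function name `checklist_score`, the rating labels and the example ratings are hypothetical, and expressing each score as a proportion of applicable items is an assumption made here for illustration rather than a method stated in this appendix.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical encoding of a rating: "yes", "no", "utd" (unable to determine),
# or None when the item is not applicable to the study design.
Rating = Optional[str]

ALL_ITEMS = range(1, 28)    # checklist items 1 to 27 inclusive
BIAS_ITEMS = range(14, 27)  # items 14 to 26 inclusive, used for risk of bias


def checklist_score(ratings: Dict[int, Rating],
                    items: Iterable[int]) -> Tuple[int, int, float]:
    """Return (raw score, number of applicable items, proportion).

    Yes scores 1; No and Unable to Determine score 0. Items rated None,
    or missing from `ratings`, are treated as not applicable and skipped.
    Dividing by the number of applicable items is an assumption.
    """
    raw = 0
    applicable = 0
    for item in items:
        rating = ratings.get(item)
        if rating is None:
            continue  # not applicable to this study design
        if rating not in ("yes", "no", "utd"):
            raise ValueError(f"item {item}: unknown rating {rating!r}")
        applicable += 1
        if rating == "yes":
            raw += 1
    proportion = raw / applicable if applicable else float("nan")
    return raw, applicable, proportion


if __name__ == "__main__":
    # Made-up ratings for a hypothetical small-N study without a control group.
    example = {1: "yes", 2: "yes", 3: "no", 14: None, 15: "no",
               16: "yes", 17: "utd", 27: "no"}
    print("overall:", checklist_score(example, ALL_ITEMS))
    print("risk of bias items:", checklist_score(example, BIAS_ITEMS))
```

Skipping non-applicable items, rather than scoring them as 0, keeps single case and small-N studies from being penalised for criteria that cannot apply to their design.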

Fig. 1 Study selection process

improvements in social reciprocity and inflexibility to change post-intervention for all participants (+)
Observed improvements in non-verbal communication post-intervention for 6 of 7 participants (social reciprocity post-intervention (+) (d = 0.50)
Statistically significant improvement in emotion regulation post-intervention (+) (d = 0.39)
No significant difference in emotion recognition (Faces Test) post-intervention (-) (d = 0.26)
No significant difference in emotion recognition (Eyes Test) post-intervention (-) (d = 0.19)
No significant difference in everyday social skills post-intervention (-) (d = 0)
Statistically significant effect of group (intervention vs waitlist control) on improvements in emotion regulation (+) (d = 0.54) and social reciprocity (+) (d = 0.67)
Lorenzo et al. (2016) 0.37 0.46 No clear primary outcome measure. Study reported greater overall improvements in the intervention group, compared with the non-immersive control (+)
Ravindran et al. (2019) 0.42 0.33 Observed increases in frequency of interactions and amount of eye contact after using the VR system (+)
Crowell et al. (2020) 0.46 0.33 No significant difference in social initiations using MR, compared with non-digital control (-) (d = 0.06). Statistically significantly more social responses in control condition, compared with MR intervention (-) (d = 0.14)
Mora-Guiard et al. (2017) 0.39 0.33 Observed increases in initiations, acts and responses while using the MR system, between sessions understanding of social initiations between baseline and intervention for all participants. Sustained at maintenance for 2 of the 3 participants (+/-)
Statistically significant increase in therapist ratings of role play performance at intervention and maintenance (improvement in emotion recognition at follow-up in comparison to baseline (+)

1. Is there any evidence that extended reality interventions are effective in helping autistic children and young people to enhance their social skills?
2. What is the quality of the evidence for observed effects?
3. To what extent have observed effects been shown to lead to meaningful change for autistic people?

Table 1
… participants (15.5% female) aged 2 to 16 were included. Sample sizes ranged from one to 94 participants (M = 13.24, SD = 18.02). All participants were autistic; however, only 11 studies reported that this was a clinical diagnosis according to DSM and only three confirmed diagnosis using a screening tool: Ip et al. (2018) using the Childhood Autism Spectrum Test (Williams et al., 2005); Liu et al. (2017) using the Social Communication Questionnaire

Table 1
Summary of included studies

Table 2
Description of AR, VR and MR interventions and their reported purpose

Intervention: Description

AR

AR based Self-Facial Modelling Learning System: This system is designed for children to interact with while wearing masks which represent particular emotions. AR superimposes facial expressions onto a participant's view of themselves, to show the emotion most appropriate to the situation. This augmentation of the participant's view of themselves is thought to promote learning.

AR Video Modelling With Storybook Learning System: A tablet computer is used to augment a physical storybook with short video clips, which interactively emphasise social details. This draws attention to social elements of the story and is thought to be more engaging than traditional video modelling approaches.

Real Time Kinect Skeletal Tracking System: This system augments a virtual character with a user's body gestures and superimposes a range of real life backgrounds. This enables children to practice a range of body gestures which might be appropriate to different situations, e.g. shaking hands or waving arms. The primary benefit is that it is real time but without the anxiety associated with real world interactions.

AR Concept Map Training System: A tablet computer is used to augment a physical concept map with 3D greeting animations and graphics to indicate relationships between people and degrees of closeness in relationships. The purpose is to help children to visually learn about basic social relations and appropriate greeting behaviours.

Empowered Brain/Brain Power System: Children wear a pair of Smartglasses which can evaluate looking behaviours and superimpose digital text and images onto the world, so that social cues are given in real time according to the user's ability. It includes a number of applications: Face Game (also termed Face2Face) uses an AR game to draw attention to human faces with visual overlays such as arrows and cartoon masks which gradually fade.

Table 2
(continued)

in two school environments which are viewed through a head mounted display. Children can socially interact with a number of avatars, whose responses are controlled by the researchers. The system is reported to be highly immersive and flexible to allow for adaptation and promote real world generalisability.

MR

Lands of Fog: Users collaborate with a partner to hunt for insects in a dense layer of virtual fog using a physical butterfly net. Explorative and collaborative efforts are rewarded with an upbeat tune or new feature. The approach is thought to be effective in encouraging collaboration as users can clearly see the actions of themselves and others, unlike in traditional collaborative game play.

Table 3
Summary of results