Systematic Review of Virtual Reality in Behavioral Interventions for Individuals with Autism

Virtual reality technology can simulate real-life environments and has been used to facilitate behavioral interventions for people with autism. This systematic review aims to evaluate the role of virtual reality (VR) technology in the context of behavioral interventions designed to increase behaviors that support more independent functioning (e.g., vocational skills, adaptive behavior) or decrease challenging behaviors that interfere with daily functioning for individuals with autism. We conducted a systematic search of four databases followed by a reference search of the articles identified through the database search. We also conducted a quality review using the evaluative method for evaluating and determining evidence-based practices in autism. We identified 23 studies, with the majority (n = 18; 78.26%) utilizing group experimental or quasi-experimental research designs and the remaining studies (n = 5; 21.74%) utilizing single-case research designs. Across studies, the targeted behaviors included vocational skills (n = 10), safety skills (n = 4), functional behaviors (n = 2), and challenging behavior (n = 7). Of the 23 studies, 11 met the quality criteria to be classified as "strong" or "adequate" and can offer evidence on the integration of VR technology into behavioral interventions. The use of VR to provide behavioral interventions to teach driving skills and interview skills can be considered an evidence-based practice.

needs of those with ASD can best be supported through interventions, particularly those considered to be evidence-based practices. Evidence-based practices are interventions and teaching methods that are substantially supported through research to produce positive outcomes (Hume et al., 2021). One evidence-based practice aimed at improving socially valid behavioral concerns is applied behavior analysis (ABA). ABA is a branch of the science of behavior analysis that relies on assessment to inform individualized interventions designed to reduce or strengthen human behavior (Cooper et al., 2020). There is a strong body of evidence demonstrating its use in supporting individuals with ASD (Leaf et al., 2020; Tiura et al., 2017). Evidence-based practices for individuals with ASD that are aligned with the principles of ABA include discrete trial training, functional communication training, prompting, and self-monitoring, to name a few (Hume et al., 2021; Ivy & Schreck, 2016). However, as our society shifts to a more technological world, there has been an increase in the use of technology-based interventions to support individuals with ASD (Thai & Nathan-Roberts, 2018).
Technology-aided instruction and intervention (TAII) is an evidence-based practice for individuals with ASD in which technology is the primary component of the intervention (Hume et al., 2021). This can include the use of more common technology, such as computers or mobile device applications, as well as more advanced technology, like virtual reality (VR), augmented reality (AR), mixed reality (MR), or robots. For example, Rosenbloom et al. (2016) investigated the effectiveness of a commercially developed mobile device application, I-Connect (Wills & Mason, 2014), for self-monitoring in the general education classroom. The results of the study indicated strong outcomes for increasing on-task behavior and reducing disruptive behaviors, as well as good social validity ratings from the participant and the teacher. TAII has been demonstrated to be effective at supporting a variety of needs (e.g., communication, academics, adaptive skills) for individuals with ASD from infancy to adulthood (Hume et al., 2021). Advanced technology, like VR, has the potential to support TAII and has become more accessible over the last decade.
VR is a three-dimensional, computer-generated visual experience that can replicate real life and that the user can interact with (Clay et al., 2021; Lorenzo et al., 2016). It allows for various levels of immersion for the user: non-immersive (e.g., desktop display), semi-immersive (e.g., large screen display), and immersive (e.g., head-mounted display; Di Natale et al., 2020). There are several advantages to the use of VR interventions for individuals with ASD, which align with the principles and dimensions of behavior analysis. A clear benefit is its ability to emulate a real-world setting (i.e., environmental arrangement) and offer experiences that cannot otherwise be captured through typical teaching methods like text (i.e., written instructions) or videos (Bailenson et al., 2008). VR allows for repeated practice of skills that may be difficult, or dangerous, to practice in real life (e.g., safety skills; Karami et al., 2021). This is extremely advantageous for individuals with ASD, as it may reduce the stress associated with learning adaptive and functional skills through more traditional means (Didehbani et al., 2016). It allows for individualization, as the implementer can adapt the user's experience (e.g., appearance of the VR environment displayed, complexity of the task) to meet their specific needs (Bailenson et al., 2008). Real-time feedback allows for the programming of specific contingencies and reinforcement schedules to facilitate learning (Clay et al., 2021; Karami et al., 2021). Because VR can mimic the real world, it likely supports generalization; while this could be achieved using in vivo teaching methods, VR enables possibilities beyond face-to-face teaching. VR also enables the tracking of a user's movements (Bailenson et al., 2008), which provides valuable information as to how the user responds to, interprets, and interacts with the world (Lorenzo et al., 2016) and is beneficial for data-based decision-making.
Additionally, individuals with ASD have shown a strong preference for technology (Valencia et al., 2019), potentially contributing to the social validity of this type of intervention.
Despite an increase in research regarding VR and ASD, there are a limited number of systematic literature reviews to further inform research and practice. For example, Barton et al. (2017) focused broadly on technology-based interventions, which included information regarding VR. However, their review did not capture the extant literature on VR, given this was not its focus and search terms encapsulating VR were not specifically used. To date, two existing reviews specific to the use of VR as an intervention for people with ASD have been identified. Both Mesa-Gresa et al. (2018) and Lorenzo et al. (2019) provided analyses of this topic, with Lorenzo et al. (2019) being specific to immersive VR. Together, these reviews analyzed 43 published articles from 1990 to 2018. While these reviews give researchers and practitioners an understanding of how VR has been used, neither assessed the rigor of the research using a quality evaluative method (e.g., Council for Exceptional Children Standards; Reichow et al., 2008). This is an important consideration, as synthesizing the literature and evaluating the effectiveness of interventions without consideration for the quality of the research (e.g., threats to internal validity) is less helpful in informing practice (Kratochwill et al., 2013).
Additionally, practitioners use systematic reviews to inform decisions related to identifying target behaviors and selecting between intervention options. Therefore, more focused reviews may provide an easier way for them to access the research. Currently, to the authors' best knowledge, four focused reviews exist on VR for people with ASD. Specifically, two of these reviews focused on social communication skills (Irish, 2013; Vasquez et al., 2015), covering a range of such skills (e.g., social interactions, social conventions), and two reviews more broadly focused on VR interventions to teach social skills, such as joint attention, pretend play, and social interactions (Dechsling et al., 2021; Parsons & Mitchell, 2002). A focus in these areas is not surprising given the diagnostic criteria for ASD relate to deficits in social communication. However, individuals with ASD often have difficulty acquiring skills that facilitate independence (e.g., adaptive behavior and vocational skills). One of the primary benefits of VR is the ability to design an accessible and safe practice space to develop skills that are needed to promote independence and autonomy (Irish, 2013; Parsons & Mitchell, 2002). Thus, it is exciting to see an increase in published research using VR-based interventions to teach functional and adaptive behaviors (Didehbani et al., 2016; Karami et al., 2021).
Lastly, although some reviews of the VR literature have been published, most are limited in that they did not utilize systematic search procedures or provide an assessment of the methodological rigor (e.g., Reichow et al., 2008) of the included studies. The absence of a measure of methodological rigor limits the certainty of evidence and complicates interpretation of a review's conclusions. While there is a need to address the communication deficits of those with ASD, it is also imperative to teach adaptive and functional skills in order to promote independence and autonomy.
There is potential for VR to enable people with ASD to have meaningful opportunities to learn and generalize adaptive and functional skills to their everyday lives. The current review aims to (1) evaluate the existing body of literature utilizing behavioral interventions delivered using VR to increase behaviors that support more independent functioning (e.g., vocational skills, adaptive behavior) or decrease challenging behaviors that interfere with daily functioning for individuals with ASD and (2) assess the methodological rigor of the literature to inform future research and practice for the use of VR when targeting adaptive and functional skills.

Search Procedures
The researchers completed a systematic search of the following databases: PsycINFO, Medline, Psychology and Behavioral Sciences Collection, and ERIC. These searches were conducted by combining a term describing ASD (i.e., "Autis*," "Developmental disab*," "Asperger," "ASD") with a term describing VR (i.e., "virtual reality") and a term describing intervention (i.e., "intervention," "treatment"). The original database search was conducted in September 2021 and yielded 191 articles after the removal of duplicates (see Fig. 1 for graphic results). Following the database searches, the third author completed an initial screening of each article by title and abstract and excluded articles that did not include the use of VR and ASD (n = 129). An additional search using Google Scholar was then conducted to identify any additional articles. After excluding duplicates, a total of 34 articles were added to the total article list for application of the inclusion criteria. In total, the researchers reviewed the full text of 67 articles using the inclusion and exclusion criteria. This resulted in 51 articles being excluded and 16 articles being included in the review. Following the database searches, ancestral and forward searches of the included articles were conducted. The ancestral searches consisted of reviewing the references of the included articles and extracting relevant studies if titles contained any of the search terms (defined above). Forward searches were conducted via Google Scholar by searching the record of each included article using the "cited by" button. Relevant articles citing an included article were then reviewed for possible inclusion. All articles extracted from the ancestral and forward searches were added to a Microsoft Excel™ spreadsheet and underwent review for inclusion. A total of seven additional articles were retrieved from the ancestral and forward searches, bringing the final number of included articles to 23.

Inclusion and Exclusion Criteria
To be included in this review, articles had to meet the following criteria: (a) be peer-reviewed and published in English, (b) include at least one participant with ASD, (c) implement a behavioral intervention designed to facilitate independence by increasing functional/adaptive behaviors or decreasing challenging/interfering behaviors, (d) utilize an experimental or quasi-experimental research design to evaluate the effects of the intervention on the target behaviors, (e) use a form of VR to facilitate the therapeutic intervention, and (f) provide quantitative data pertaining to the participant's acquisition of adaptive and functional target behaviors (e.g., teaching air travel behavior, treatment of phobia, learning pedestrian safety skills). Studies that did not collect data on an adaptive or functional target behavior (e.g., social skills, communication skills) or did not include a therapeutic intervention as the independent variable were excluded. Functional and adaptive behaviors were broadly defined as behaviors that fall within the categories of vocational, domestic, personal, community, or leisure skills (Ivy & Schreck, 2016). Skills targeted within the studies were evaluated by cross-checking the dependent variable with commonly used behavior-analytic assessments (e.g., Essential for Living, Assessment of Functional Living Skills) to determine inclusion (McGreevy et al., 2012; Partington & Mueller, 2012). For example, Fitzgerald et al. (2018) conducted a study that evaluated the use of VR and video modeling to teach paper-folding tasks (e.g., making a paper boat).
However, this study was excluded because, although folding paper might have some functional and adaptive applications (e.g., folding paper menus as a job skill or as a leisure skill), no direct functional or adaptive context was provided, and the study's focus was primarily a comparison of two intervention methods (i.e., VR versus video modeling). Additionally, the researchers included any study utilizing a quasi-experimental, group comparison experimental, or single-case experimental design. Studies evaluating only the social validity of VR interventions or participants' perspectives, without an evaluation of intervention outcomes, were excluded. For example, McCleery et al. (2020) evaluated the usability and feasibility of an immersive VR program to teach police interaction skills to participants with autism but did not measure skills gained from the intervention program and thus was excluded from this review given the lack of outcome data. Finally, any study that discussed the development or architecture of technologies but did not provide quantitative data on the effects of an intervention on target dependent variables was excluded. For example, Trepagnier et al. (2005) discussed multiple computer-based and virtual environment technologies in development but did not utilize those technologies in an experiment. After application of the inclusion criteria, a total of 23 articles were included in this review.

Descriptive Synthesis
The raters coded each article on the following variables: (a) number of participants, (b) participant characteristics (age, gender, diagnosis), (c) dependent variable, (d) independent variable, (e) technology utilized (description of the VR technology), (f) experimental design, and (g) study outcomes. Raters coded the total number of participants, including participants both with and without ASD. Raters provided a narrative description of the dependent variables, independent variables, and technology used. Raters coded the experimental design as group experimental, quasi-experimental, or single-case experimental. Finally, raters coded the study outcomes according to how the author(s) reported the outcomes for the target dependent variable(s).

Quality Evaluation Method
Articles were grouped based on the experimental design (i.e., single-case research versus group experimental/quasi-experimental) to facilitate the quality evaluation. Primary indicators of quality include evaluation of the descriptions included in a study, such as participant information, independent variable, dependent variable, and use of statistical tests. The two lead authors then evaluated each study according to the corresponding rubric from the Reichow et al. (2008) evaluative method for single-case or group experimental research designs. Reichow's evaluative method was chosen over other quality evaluative methods (e.g., Council for Exceptional Children Standards) because it includes procedures to evaluate both single-case and group experimental research, evaluates internal and external validity, and was developed specifically for research on individuals with autism (Cook et al., 2015; Reichow et al., 2008). Additionally, Reichow's evaluative method is well established in the literature as an aid in identifying practices that meet the standards to be classified as an evidence-based practice (EBP; Lynch et al., 2018; Reichow, 2012).

Interrater Reliability
During the review for inclusion, two raters coded 100% of the articles (n = 74). To evaluate the reliability of the application of the inclusion and exclusion criteria, interrater reliability (IRR) was calculated as percent agreement: the total number of agreements divided by the total number of agreements plus disagreements, multiplied by 100. Agreement on inclusion was obtained for 89.19% of the studies (n = 66). Disagreements were reviewed and discussed by the raters until consensus was established, for a final agreement of 100%.
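As a minimal sketch of the percent-agreement calculation described above (the function name and rounding are illustrative, not from the original study):

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Interrater reliability (IRR) as simple percent agreement:
    agreements / (agreements + disagreements) * 100."""
    return agreements / (agreements + disagreements) * 100

# Inclusion screening reported above: agreement on 66 of 74 articles.
irr = percent_agreement(66, 74 - 66)
print(round(irr, 2))  # 89.19
```

The same formula applies to the item-level agreement reported in the data extraction and quality evaluation sections.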

Data Extraction
Two raters independently coded 50% (n = 12) of the 23 included articles, which were assigned using a randomizer application (i.e., www.random.org). Each article was coded across three categories, for a total of 36 items for which reliability was evaluated (i.e., 12 articles with three categories each). Agreement was established on 33 of the items. IRR was calculated as percent agreement: the total number of items with agreement divided by the total number of items, multiplied by 100. The initial IRR was 91.67%. Disagreements were reviewed and discussed by the raters, for a final IRR of 100%. The final table was then reviewed by the remaining authors to ensure accuracy.

Quality Evaluation
Twelve of the 23 articles (52.17%) were independently reviewed by the two lead authors to establish IRR. The 12 articles included seven group experimental/quasi-experimental design studies and four single-case research studies. There were 12 indicators per article for a total of 24 items for which reliability was evaluated. Agreement was established on 21 of the 23 total items (91%). Disagreements were discussed by the authors until consensus was reached, for a final IRR of 100%.

Results
The 23 articles included in this review were summarized by dependent variable, intervention components, behavioral components, and technology used. Table 1 provides the data summary of each study.

Participants
Across the 23 included studies, there were a total of 888 participants (excluding the staff participants included in Smith et al., 2021a) with an approximate mean age of 17.47 years (range = 4 to 29.4 years). A total of 519 participants were reported as having a formal diagnosis of autism, ASD, Asperger's, or pervasive developmental disorder-not otherwise specified (PDD-NOS). The majority of the included participants were described as having moderate to high functioning levels. Furthermore, most studies included specific participant characteristics or inclusion criteria such as unimpaired cognition/average IQ (e.g., Maskey et al., 2019a; Ward & Esposito, 2019), minimum reading level ability (e.g., Genova et al., 2021), verbal fluency/spoken language abilities (e.g., Dixon et al., 2020; Maskey et al., 2014), ability to follow directions (e.g., Maskey et al., 2019b, 2019c), normal vision and hearing (e.g., Smith et al., 2021a), and no severe physical, medical, or psychiatric condition that would interfere (e.g., Cox et al., 2017; Johnston et al., 2020). Only one study reported the inclusion of participants (n = 3) who did not have spoken language. Interestingly, none of the included studies reported screening for seizure disorders, and of the studies that reported specific exclusion criteria (n = 5), seizure disorders were not specifically listed.

Dependent Variables
Of the 23 studies, 43.48% (n = 10) taught vocational-related skills. Specifically, nine of these studies (e.g., Burke et al., 2018; Genova et al., 2021) targeted job interview skills and one targeted general vocational skills (i.e., Bozgeyikli et al., 2017). Of the functional behaviors targeted, 30.43% (n = 7) of the studies focused on the treatment of challenging behavior, such as the treatment of fears and phobias (e.g., Maskey et al., 2014, 2019a; Meindl et al., 2019) or hypersensitivity to environmental sounds (e.g., crying babies, barking dogs, sirens; Johnston et al., 2020) that caused anxiety, disruption to daily life, and/or challenging behavioral reactions (e.g., agitation, hiding, panic attacks, refusing to go outside). Four (17.39%) studies focused on safety-related skills, such as pedestrian safety (i.e., Dixon et al., 2020), driving skills (i.e., Cox et al., 2017; Wade et al., 2016), and transportation use (i.e., Simões et al., 2018). One study targeted general functioning skills, namely understanding the context and characteristics of common objects (i.e., Wang & Reid, 2013). And lastly, one study focused on increasing exercise engagement (i.e., McMahon et al., 2020). See Table 1 for a summary of each study.

Behavior Analytic Components Embedded within VR
A combination of behavior analytic components, such as antecedent interventions, prompting, reinforcement, or corrective feedback, was utilized by all the included studies.
For nine of the studies (39.13%), the VR system primarily provided the learning stimuli, prompts, and consequence variables (e.g., reinforcement or feedback), and in some cases a researcher or therapist provided pre-training on the use of the VR system. For five of the studies (21.74%), a combination of VR system and therapist implementation was used. For example, in most of the studies utilizing VR within the context of job interview training, the VR system was primarily used for practice interviews, and additional instruction on related interview skills was provided by a therapist (e.g., Smith et al., 2021a; Strickland et al., 2013). Lastly, in eight studies (34.78%), the VR system was utilized primarily to present the learning stimuli needed for teaching the targeted skill, with a therapist delivering instruction, prompting, and reinforcement. For example, Dixon et al. (2020) used the VR system within the context of pedestrian safety (visual and auditory stimuli), with a therapist delivering questions related to the safety of the situation (e.g., "Is there a moving car?") and providing reinforcement for the participant's responses. See Table 1 for a summary of each study.

Generalization Measurement
The majority of the studies included some type of generalization assessment. Eight of the studies (34.78%) provided a real-life opportunity to assess generalization. Specifically, two studies included real-world practice (e.g., Meindl et al., 2019). For example, Meindl et al. (2019) conducted real-life blood draws in medical environments and assessed generalization across two different nurses and to the participant's other arm. Of the studies that did not collect generalization data, six studies (26.09%) provided mock interviews (e.g., Burke et al., 2018; Genova et al., 2021) rather than assessing generalization to real-life contexts. Five studies (21.74%) collected self-reports of phobias/fears that had been treated at various follow-up points (e.g., 6 weeks, 6 months, 12 months; e.g., Maskey et al., 2014, 2019a, 2019c). However, four studies (17.39%) did not assess any dimension of generalization, simulated practice, or self-reported effects past treatment (e.g., Bozgeyikli et al., 2017; Simões et al., 2018; Strickland et al., 2013).

VR Technology
All studies used software to create the virtual environments, but some used additional hardware displays and interfaces to increase the level of immersion. Non-immersive VR was the most commonly utilized configuration, used by 43.48% of the included studies (n = 10). This configuration is the least immersive and generally relied on a standard desktop-sized computer monitor with basic inputs from the user (e.g., desktop keyboard or controller; Bamodu & Ye, 2013). Semi-immersive VR was used by 30.43% (n = 7). This configuration relied on external equipment, such as sensors for interaction (e.g., XBOX Kinect, Leap Motion) and projectors or large screens to display the VR simulation (e.g., Blue Room advanced VRE), to create a sense of deeper immersion and interactivity within the VR simulation (Bamodu & Ye, 2013). Lastly, fully immersive VR was used by 30.43% (n = 7) of the included studies. This setup entailed both advanced VR hardware (e.g., motion tracking, head-mounted display, Oculus Touch controllers) and software (e.g., Unity game engine) to create more advanced 3D virtual environments (Bamodu & Ye, 2013). See Table 1 for a summary of each study.

Quality Ratings and Evaluation of Evidence
There were 18 group experimental design studies and five single-case experimental design studies (n = 23 studies total). Overall, the raters identified two of the studies (8.70%; Cox et al., 2017; Wade et al., 2016) as meeting criteria to be classified as "strong" and nine (39.13%; Dixon et al., 2020; Genova et al., 2021; Meindl et al., 2019; Simões et al., 2018; Smith et al., 2021a; Smith et al., 2021b; Wang & Reid, 2013) as meeting criteria to be classified as "adequate." The remaining studies (n = 12; 52.17%) did not meet criteria and cannot offer evidence toward the research question (i.e., Bozgeyikli et al., 2017; Burke et al., 2018; Burke et al., 2021; Genova et al., 2021; Johnston et al., 2020; Maskey et al., 2014; Maskey et al., 2019a; Maskey et al., 2019b; Maskey et al., 2019c; McMahon et al., 2020; Strickland et al., 2013; Ward & Esposito, 2019). Of the 18 group experimental design studies, the raters classified two (11.11%) as "strong," six (33.33%) as "adequate," and ten (55.56%) as "weak." Of the five single-case experimental design studies, the raters classified zero as "strong," three (60%; Dixon et al., 2020; Meindl et al., 2019; Wang & Reid, 2013) as "adequate," and two (40%; Johnston et al., 2020; McMahon et al., 2020) as "weak." Across the between-group design studies, the most common "unacceptable" primary indicator rating was related to the dependent variable, meaning at least three of the four features were missing (i.e., variables defined with precision; measurement details sufficient for replication; measures linked to the dependent variable; data collected at appropriate times for analysis). Across the secondary indicators, the most common criterion not met was random assignment. For the single-case studies, there were no commonalities in the primary indicators.
Rather, the most common "unacceptable" secondary indicator ratings were related to calculation of the kappa statistic, use of blind raters, and collection of generalization and maintenance data. See Tables 2 and 3 for quality ratings of each study. Taken as a whole, the two studies (Cox et al., 2017; Wade et al., 2016) identified as "strong" quality studies were conducted by two different research teams, at two different locations, with 71 different participants. Both used VR in the context of teaching driving skills, and together they meet the qualifications to be considered an evidence-based practice. With four between-group experimental studies (Smith et al., 2021a, 2021b) conducted by two different research labs rated as "adequate," the use of VR in the context of teaching interview skills can also be considered an evidence-based practice.
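The classification percentages reported above follow directly from the study counts; a quick arithmetic check (a minimal sketch in Python, using only the counts stated in this review) reproduces them:

```python
# Counts taken from the quality ratings reported above:
# 23 studies total; 2 rated "strong," 9 "adequate," 12 "weak."
counts = {"strong": 2, "adequate": 9, "weak": 12}
total = sum(counts.values())
percentages = {label: round(100 * n / total, 2) for label, n in counts.items()}
print(total)        # 23
print(percentages)  # {'strong': 8.7, 'adequate': 39.13, 'weak': 52.17}
```

Note that 2/23 rounds to 8.70%, which is why the "strong" percentage is reported as 8.70% rather than a different figure.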

Discussion
This review synthesized 23 studies investigating the use of VR to provide behavioral interventions designed to increase behaviors that support independent functioning or decrease behaviors that interfere with daily functioning for individuals with autism. Of those studies, ten targeted vocational skills, seven targeted challenging behaviors, four targeted safety skills (e.g., driving, airplane travel, pedestrian safety), and two targeted general functional skills (e.g., engagement, executive functioning).
In terms of quality ratings, two of the studies met the quality criteria for a classification of "strong" and nine met the quality criteria for a classification of "adequate." This literature base supports the use of VR as an evidence-based modality for behavioral interventions teaching driving and interview skills. There is also a need for replication of both single-case and between-group experimental designs, as well as an increase in the rigor of design methodology.
The first aim of this review was to evaluate the existing body of literature utilizing behavioral interventions delivered through VR to increase behaviors that support independent functioning and address challenging behaviors that interfere with daily functioning. The literature base highlighted the use of VR to simulate daily environments (e.g., interview settings and streets) within the teaching environment. Simulating these daily environments can enhance the safety and generalizability of training and intervention overall. For example, the use of VR to simulate a street by Dixon et al. (2020) removes the inherent danger associated with street crossings while allowing the trainee to develop autonomy. Similarly, the use of VR to simulate driving conditions by Cox et al. (2017) allows for safe practice environments that protect the trainee, instructor, other drivers, and pedestrians. In particular, VR environments can reduce the risks associated with skill acquisition when practice is not feasible in real-world environments. For example, when practicing safely crossing the street in a VR environment, there are no real risks if the user does not wait for the crosswalk signal, whereas in the real environment an individual could be hit by a car.
Another potential benefit of VR-based interventions is the ability to customize the intervention based on the user's progress in skill acquisition, such as embedding prompts that highlight the salient cues in the environment that should evoke a specific behavioral response from the user. For example, Cox et al. (2017) included extra stimulus cues within the VR driving simulation, triggered by the user's eye gaze, to highlight driving hazards that should evoke driver attention and defensive driving maneuvers. This type of component can help the VR interaction adapt to the user, thus providing a more tailored intervention and user experience.
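The gaze-contingent prompting logic described above can be illustrated with a minimal sketch. This is a hypothetical simplification, not code from any reviewed study: all names, the 2-second prompt delay, and the "highlight" cue are illustrative assumptions. The idea is simply that hazards the user has not yet looked at within a time limit receive an extra stimulus cue.

```python
# Hypothetical sketch of gaze-contingent prompting: hazards not attended to
# within a delay receive an extra visual cue. Names and thresholds are
# illustrative assumptions, not drawn from the reviewed studies.
from dataclasses import dataclass

@dataclass
class Hazard:
    name: str
    attended: bool = False   # set True once the user's gaze lands on the hazard
    prompted: bool = False   # set True once the system adds an extra cue

def update_prompts(hazards, gaze_target, elapsed_s, prompt_delay_s=2.0):
    """Mark attended hazards; flag unattended ones for an extra stimulus cue."""
    cues = []
    for h in hazards:
        if h.name == gaze_target:
            h.attended = True
        elif not h.attended and not h.prompted and elapsed_s >= prompt_delay_s:
            h.prompted = True
            cues.append(f"highlight:{h.name}")  # e.g., render a halo in the scene
    return cues

hazards = [Hazard("pedestrian"), Hazard("merging_car")]
# The user looks at the pedestrian; 3 s pass without gaze reaching the merging car.
cues = update_prompts(hazards, gaze_target="pedestrian", elapsed_s=3.0)
print(cues)  # ['highlight:merging_car']
```

In a full system this loop would run every frame with live eye-tracking input, and the delay and cue type could themselves be individualized to the user, which is the tailoring the paragraph above describes.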
VR interventions can also allow for extra practice with a variety of exemplars (i.e., multiple exemplar training) to better promote generalization of skills. Furthermore, VR can facilitate generalization to the natural environment because it allows for programming of the relevant stimuli that would occur within that environment (Stokes & Baer, 1977). For example, one included study programmed for generalization within its sessions; specifically, the last session was conducted at the airport to provide a real-world rehearsal of the air travel skills targeted during the VR-based intervention. This study highlights the utility and efficacy of VR-based interventions as well as the need to evaluate the transfer of skills to the "real" environment. However, given the lack of assessment of generalization to real environments among the studies included in this review, more analysis is needed to evaluate the generalization of VR-trained skills.
Lastly, some of the studies included in this review indicated the effectiveness of using lower cost VR systems, which may increase the feasibility of VR-based interventions within clinical applications. For example, one study used an iPhone X with a Google Cardboard device to deliver the VR-based intervention, and several studies used a commercially available Internet software program (i.e., Molly Porter by SIMmerson Immersive Simulations) to provide mock interviews for developing interview skills (i.e., Genova et al., 2021; Smith et al., 2021a, 2021b; Ward & Esposito, 2019). Although low-tech solutions may be readily available, research is still needed to evaluate the costs and benefits of various VR technologies as they relate to the skills being taught, the needs of the individual, and the programming of relevant environmental variables to best promote generalization of skills to real-world environments.

The secondary aim of this review was to assess the methodological rigor of the literature to inform future research and practice for the use of VR when targeting adaptive and functional skills. A major strength of this literature base is the inclusion of both single-case experiments and between-group experiments. This literature base was able to establish the use of VR as an evidence-based modality for providing behavioral interventions to teach driving and interview skills. Unfortunately, over half of the included studies did not meet quality criteria to contribute to the knowledge base. This indicates a need for further replication with a focus on methodological quality for VR-based modalities in the context of behavioral interventions. In particular, description of the dependent variable is crucial to replication of research but was a limitation of this literature base (Kazdin, 2011).
The use of designs with control conditions and random assignment would also enhance the rigor of this literature base.
While the research evaluated in this review indicates that VR is a platform conducive to integrating behavior-analytic strategies into effective interventions, there are a few considerations worthy of discussion. First, there is a need for decision-making frameworks to help inform practitioners and service providers which equipment options allow for individualization, or which technology options best align with the characteristics and needs of the individuals we serve. For example, Simões et al. (2018) provided differentiation across the technology used. Specifically, four of the participants in the study did not use the VR head-mounted display due to vision impairments; however, the desktop configuration was still conducive for those users to participate in the VR intervention. This highlights the need for a clear decision-making framework for the technology selected in VR-based interventions.
Second, there is a need for cross-field collaboration to ensure that VR interventions have the programming capacity for individualization, systematic teaching procedures, and reinforcement contingencies that are transferable to real environments. In many of the studies included in this review, therapists/researchers were still providing prompts and reinforcement rather than these elements being seamlessly incorporated into the VR system. For example, Dixon et al. (2020) had participants vocally state whether it was safe to cross the road, rather than crossing the street in the VR environment, and real-time videos of the participants' communities were used rather than a virtual world that would allow for participant interaction. Such uses of VR may be limiting in that participants do not get a fully immersive experience in which the real-world behaviors (e.g., crossing the street safely) contact reinforcement. Furthermore, it may be difficult from a programming perspective for a therapist to adjust the stimulus presentation within a virtual environment; thus, collaboration at the programming level is needed to ensure the relevant variations to the virtual environment are included. This may indicate a lack of collaboration between technology developers and behavior analysts. As such, future research should consider the benefits of cross-field collaboration to improve the quality and efficacy of VR-based interventions.

Third, there is a need to evaluate other skills taught using behavioral interventions for which VR could provide a better context for developing effective interventions. Given the few studies focused on safety skills in the current literature, this seems like an important area that could benefit individuals who are working to develop these functional skills.
For example, abduction prevention could be an area where VR-based interventions might provide more effective training than role-playing or social stories-based interventions, since the virtual environment could include relevant signals with multiple exemplars and provide repeated practice opportunities (e.g., Ledbetter-Cho et al., 2016).
For practitioners, it is important to highlight the use of evidence-based practices (EBPs) when developing interventions for individuals with autism. Given the range of technology options for VR-based interventions, consideration should be given to prerequisite skills for both using the technology and performing the skill targeted within the intervention. Thus, assessment should be used to guide intervention plans. For example, if using VR goggles, it would be important to conduct direct assessment to ensure the user has the necessary skills and that the VR experience is enjoyable and does not cause issues such as motion sickness. Practitioners would also want to ensure that generalization of the skill is accounted for within the intervention and transfers easily to the real world. This may also require incorporating other stakeholders within the intervention phases to ensure the technology used is feasible for everyone involved. As VR technology continues to advance, research is needed to provide a clear framework for collaboration and decision-making to help progress and extend VR-based interventions.