1 Introduction

Schizophrenia is a chronic, heterogeneous, and complex mental disorder. Three main groups of symptoms have been identified: positive (e.g., hallucinations, delusions, and disorganized speech and motor behavior), negative (e.g., emotional flattening, apathy, and alogia), and cognitive symptoms (American Psychiatric Association 2013; Kahn et al. 2015).

Cognitive disturbances are recognized as a major feature of schizophrenia and are considered an entity separate from positive and negative symptoms (Halverson et al. 2019; Hasson-Ohayon et al. 2018). These impairments are heterogeneous; however, the most consistently reported domains are attention, memory, processing speed, executive functions, and social cognition (SC) (Barrera 2006; Kenney et al. 2015; Kern and Horan 2010).

1.1 Social cognition (SC) and schizophrenia

SC refers to a set of processes that underlie social interactions. These processes facilitate the perception and interpretation of, and the generation of responses to, the intentions, dispositions, and behaviors of others, taking contextual factors into account to make such inferences (Adolphs 2001; Kennedy and Adolphs 2012). The social brain, that is, the set of brain structures associated with SC, has been reported to comprise a wide variety of cortical and subcortical structures (Bickart et al. 2014; Kennedy and Adolphs 2012; Porcelli et al. 2019).

In schizophrenia specifically, disturbances have been found in different networks that integrate cortical and subcortical areas of the social brain, affecting the various SC processes (Campos et al. 2016; Lee et al. 2004). These deficits are present before the onset of the first psychotic episode and tend to remain persistent and stable throughout the course of the illness (Solís-Vivanco et al. 2020). Deficits in SC may lead to functional impairments, and it has been proposed that SC mediates the relationship between neurocognition and functional outcomes (Fett et al. 2011; Halverson et al. 2019).

An expert consensus identified four SC domains relevant for research in patients with schizophrenia: emotional perception, attributional styles, social perception, and theory of mind (ToM) (Pinkham et al. 2014). Emotional perception refers to the ability to recognize the emotional states of others from facial, postural, and prosodic cues (Green and Leitman 2008; McDonald and Cassel 2017). Social perception can be defined as the ability to identify and comprehend social cues (e.g., verbal messages, paralinguistic information, and nonverbal behaviors), thereby supporting the integration of social knowledge with information obtained from the context (Singhai et al. 2020). Attributional styles refer to the way people interpret and explain the causes of positive or negative social interactions and events (Pinkham et al. 2014; Singhai et al. 2020). Finally, ToM can be defined as the ability to interpret the behaviors and speech of others in terms of their intentions, knowledge, and beliefs (Schaafsma et al. 2015). ToM is a complex construct that comprises several subdomains; for a detailed account of its characteristics, see Byom and Mutlu (2013).

1.2 SC interventions in schizophrenia

To date, many approaches and interventions for SC have been developed. Social Cognition Training (SCT) is a broad term encompassing different psychosocial approaches that target SC impairments (Nijman et al. 2020a). SCT approaches can be categorized as targeted interventions (i.e., training one SC domain), comprehensive interventions (i.e., training two or more SC domains), and broad-based approaches (i.e., integrating targeted or comprehensive SC interventions with other psychosocial and/or neurocognitive interventions). The number and duration of sessions vary, with total intervention length ranging from 2 h (Kayser et al. 2006) to more than 6 months (Hogarty et al. 2006, 2015). Delivery formats are also highly variable and include slides, photographs, computers, videos, and audio recordings. With respect to efficacy, Nijman et al. (2020a) conducted a network meta-analysis of the effects of SCT on neuropsychological measures of SC and on social functioning in patients with psychotic disorders. They reported that targeted interventions were effective for basic SC processes, such as emotional perception, which enable more complex processes such as ToM to take place. Broad-based SCT, in turn, was the most effective, with moderate to large effect sizes for social perception and social functioning.

1.3 Virtual reality (VR) as a potential intervention tool

Despite evidence that SCT interventions produce brain changes and improve several SC domains, these approaches still do not fully achieve generalization of the gains to activities of daily living (Campos et al. 2016). These limitations could be associated with several factors, such as the delivery format (group or individual), the type of materials (e.g., photos, slides, videos), the duration of the intervention, and its intensity (e.g., once or twice a week) (Lahera et al. 2021; Nijman et al. 2020a). There is now evidence that the effect of cognitive interventions on social functioning improves when they include the systematic application of cognitive processes in real-life situations (Medalia and Saperstein 2013).

Virtual reality (VR) has been proposed as a tool for obtaining ecologically valid results in scenarios that closely resemble real life (Alcañiz et al. 2018). VR immerses participants in virtual environments designed to represent the real world as closely as possible, facilitating interaction between the environment, the avatars, and the participant. It makes it possible to adjust the difficulty level to each patient’s skills, to give immediate feedback, and to record the patient’s responses automatically (Peyroux and Franck 2016). It also allows traditional training methods to be implemented in more relaxed, nonclinical virtual environments. In addition, as continual technological advances make VR more accessible, affordable, and widely researched, it may become well positioned as a home-based treatment approach (Bohil et al. 2011; Mishkind et al. 2017; Turner and Casey 2014).

VR can be used in two ways: non-immersive and immersive. Non-immersive VR is the most basic form, requiring only a flat screen, computer monitor, or game console; external perceptual stimuli are not restricted, and users interact with the environment by means of a joystick, mouse, keyboard, or other specialized controllers. By contrast, immersive VR aims to perceptually replace stimuli from the outside world with elements of the virtual world (Rizzo et al. 2018). Several technologies are available to deliver VR; the most widely used are desktop or laptop computers operated with a keyboard, mouse, and/or joystick, Head-Mounted Displays (HMDs), and Cave Automatic Virtual Environments (CAVEs). Virtual environments are designed to simulate everyday experiences, providing highly controlled conditions throughout the intervention together with high ecological validity, thereby increasing the transfer of skills learned in the intervention to real life (Gainsford et al. 2020).

1.4 VR for SC interventions in schizophrenia

The first VR adaptations focused on SC were created for autism spectrum disorders, and early studies in this population reported positive results, mainly for ToM and emotion recognition (Kandalaft et al. 2013; Strickland et al. 1996). VR applications have also been developed for schizophrenia; however, most studies have focused on improving social skills, such as conversation and job interview skills, rather than specifically targeting impaired SC processes such as ToM or emotion recognition (Gainsford et al. 2020).

As evidence on SC training delivered through VR (SCT-VR) in schizophrenia is limited and few approaches exist, it is crucial to identify and explore the characteristics of existing interventions. This will allow researchers and clinicians to learn which SCT-VR interventions have been developed and applied in patients with schizophrenia, and to identify their specific tasks, stimuli, and characteristics, so that these can be tested in other contexts or used to create new SCT-VR proposals.

Immersive and non-immersive VR appear to differ in their effects: motor and cognitive interventions using immersive VR have shown a greater impact than those using non-immersive VR (Ren et al. 2024; Ventura et al. 2019). Furthermore, certain SC processes may benefit from increased intensity, even in interventions of short duration; this has been observed primarily in targeted interventions addressing fundamental SC processes such as emotion recognition (Nijman et al. 2020a). It is also crucial to understand the characteristics of the studies in which SCT-VR has been implemented, including the study design and the number of participants supporting the results. The year of publication and geographical location are relevant as well, as they show how the field has developed over time and which regions have contributed the most evidence. Although studies are few, it is important to understand their main findings, from feasibility to intervention outcomes.

In addition, a fundamental goal of psychosocial interventions in schizophrenia is to foster motivation for the intervention and its tasks; Medalia and Saperstein (2011) reported that treatment motivation increases adherence. When VR is used in cognitive interventions, certain tasks and stimuli may increase patient motivation and thus reduce the likelihood of dropout. Freeman et al. (2022) reported that patients with psychosis had overall positive opinions of VR interventions and experienced minimal adverse effects. VR could therefore become a useful tool for cognitive and social enhancement in severe psychiatric conditions such as schizophrenia. Given the current state of the art, a scoping review is a suitable method to examine the extent of the existing literature, the characteristics of the studies, and the SCT-VR interventions themselves. It also allows us to summarize findings from a heterogeneous body of knowledge and to identify gaps in the literature, supporting the planning and conduct of future research.

Therefore, the aim of this scoping review was to explore and describe the characteristics of SCT-VR interventions in schizophrenia. A preliminary search of MEDLINE, the Cochrane Database of Systematic Reviews, and JBI Evidence Synthesis yielded no current or underway systematic reviews or scoping reviews on the topic. To achieve this aim, we formulated the following research questions:

  1. How many articles have been published? Is there a time trend for publications? In which countries have they been published?

  2. What are the characteristics of the studies (e.g., study design, number of participants)?

  3. What are the characteristics of the intervention? What type of intervention are they using (i.e., targeted, integrative, broad-based)? What processes are they targeting? Which VR modality are they using (i.e., immersive or non-immersive)? How long does the intervention last?

  4. What are the key outcomes of the SCT-VR interventions?

2 Methods

2.1 Protocol and registration

The scoping review protocol is registered with the Open Science Framework (https://doi.org/10.17605/OSF.IO/MNJ9Y). We followed the PRISMA Extension for Scoping Reviews (PRISMA-ScR) checklist (Tricco et al. 2018) in reporting this review. The research team added a research question, beyond those in the registered protocol, on the primary findings of the analyzed articles.

2.2 Eligibility criteria

The inclusion criteria were structured according to the Population, Concept, and Context (PCC) framework recommended by the Joanna Briggs Institute for scoping reviews (Peters et al. 2020a).

Population

Individuals at clinical high risk (CHR) for psychosis or with schizophrenia-spectrum disorders (i.e., schizophrenia, schizoaffective disorder, delusional disorder, schizophreniform disorder, brief psychotic disorder, or first-episode psychosis), of any age and sex. The population was deliberately broadened to the whole schizophrenia spectrum to capture a greater number of studies; furthermore, evidence suggests that SC impairments are present in various disorders within the psychotic spectrum (Bora and Pantelis 2016).

Concept

Social cognition training with VR (SCT-VR), which refers to the implementation of social cognition training (SCT) through interactive virtual environments.

Context

All settings were considered, and no geographical or temporal restrictions were applied. Original research articles of any design were included: single-case studies; cross-sectional and longitudinal studies with or without a comparison group; randomized controlled trials (RCTs) and RCT protocols; or any other method. These broad criteria were chosen because this scoping review aims to provide as much information as possible about the characteristics of the studies and of SCT-VR interventions. Unpublished studies were also included to increase the number of available sources and to provide valuable information for answering the research questions. Finally, because this review does not aim to determine the effectiveness of these interventions in people on the schizophrenia spectrum, a broad range of study designs could be included.

2.2.1 Exclusion criteria

Poster presentations, systematic reviews, and any study not describing the intervention were excluded, as they did not provide sufficient information to answer the research questions. Studies that did not use VR, and those that used social skills training, were also excluded, because SCT and social skills training differ in conceptual framework, methods, and the processes involved (for a review of VR interventions for social skills, see Oliveira et al. 2021). Finally, studies were excluded if more than 50% of participants did not have a schizophrenia-spectrum disorder.

2.3 Information sources

The search strategy aimed to locate both published and unpublished studies and was developed by ELR in collaboration with the research group. An initial limited search of MEDLINE and PsycInfo was undertaken to identify articles on the topic. The text words in the titles and abstracts of relevant articles, and the index terms used to describe them, were used to develop a full search strategy (see Additional file 1). The search strategy, including all identified keywords and index terms, was adapted for each included database. The reference lists of all included sources of evidence were screened for additional studies. The search was not limited by date or language. The databases searched were MEDLINE (OVID), PsycInfo (OVID), Web of Science (CLARIVATE), and CINAHL (EMBASE). The final search results were exported to Covidence (Veritas Health Innovation, Melbourne, Australia, available at www.covidence.org) and duplicates were removed.

2.4 Search strategy

The search was conducted on 1 September 2022 and updated on 14 February 2024. The final search can be found in the Supplemental material. The core search string was: (Schizophrenia OR Psychotic Disorders OR schizo*) AND (virtual reality OR Virtual Real* OR VR OR virtual enviro* OR virtual character* OR VCs OR avatar*) AND (Social Cognition OR emotion recognition OR emotion processing OR theory of mind OR social perception).
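To make the logic of the search string explicit (synonyms within a concept block are combined with OR, and the three blocks are combined with AND), a minimal Python sketch follows. It is illustrative only: the helper `build_query` and the block names are hypothetical, and database-specific syntax such as field tags, phrase quoting, and controlled vocabulary (e.g., MeSH headings), which real database interfaces typically require, is not modeled.

```python
# Illustrative sketch (hypothetical helper): assemble a Boolean search string
# from concept blocks. Terms within a block are OR-ed; blocks are AND-ed.
# Field tags, phrase quoting and index terms are intentionally not modelled.

POPULATION_TERMS = ["Schizophrenia", "Psychotic Disorders", "schizo*"]
VR_TERMS = ["virtual reality", "Virtual Real*", "VR", "virtual enviro*",
            "virtual character*", "VCs", "avatar*"]
SC_TERMS = ["Social Cognition", "emotion recognition", "emotion processing",
            "theory of mind", "social perception"]


def build_query(*blocks: list[str]) -> str:
    """Join the terms of each block with OR, then join the blocks with AND."""
    if not blocks or any(not terms for terms in blocks):
        raise ValueError("every concept block needs at least one term")
    groups = ["(" + " OR ".join(terms) + ")" for terms in blocks]
    return " AND ".join(groups)


if __name__ == "__main__":
    print(build_query(POPULATION_TERMS, VR_TERMS, SC_TERMS))
```

Run as a script, this prints the three-block string given above; adapting it to a specific database would mainly mean changing how each term is rendered (for example, quoting multi-word phrases).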

2.5 Selection of sources of evidence

Following a pilot test, two independent reviewers (JLP and DGS) screened titles and abstracts against the inclusion and exclusion criteria, obtaining a kappa coefficient of 0.37. Potentially relevant sources were retrieved in full text, and their citation details were imported into Covidence. The same two reviewers (JLP and DGS) then independently assessed the full texts of the selected citations in detail against the inclusion criteria, obtaining a kappa coefficient of 0.44. The main sources of disagreement during full-text review were study designs and types of intervention. Given these low levels of agreement, disagreements on study selection and data extraction were resolved through open discussion until consensus was reached, involving additional reviewers (DPF and AMM) when needed. Reasons for excluding full-text sources that did not meet the inclusion criteria were recorded and reported.
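For readers less familiar with the statistic, the sketch below computes Cohen's kappa, the usual chance-corrected agreement index for two raters, on the assumption that this is the kappa reported above. It uses only the Python standard library; the function name and the include/exclude decisions in the example are invented for illustration and are not data from this review.

```python
from collections import Counter
from collections.abc import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # p_o: observed proportion of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance, from each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical screening decisions (I = include, E = exclude) for 10 records.
reviewer_1 = ["I", "I", "E", "E", "E", "E", "E", "E", "E", "E"]
reviewer_2 = ["I", "E", "I", "E", "E", "E", "E", "E", "E", "E"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.375
```

In this invented example the reviewers agree on 8 of 10 records (80%), yet kappa is only 0.375, because much of that agreement would be expected by chance when most records are excluded. This prevalence effect is one reason kappa tends to be modest during screening even when raw agreement looks reasonable.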

2.6 Data charting process

A data-charting form was jointly developed by two reviewers (DPF and AMM) to determine which variables to extract, and this form was uploaded into Covidence. The tool captured relevant information on key study characteristics and detailed information on studies of SCT-VR in patients with schizophrenia. Two reviewers (ASM and DPF) independently charted data from each eligible article, and any disagreements were resolved through discussion between them or, if needed, adjudication by a third reviewer (AMM).

2.7 Data items

We extracted data on article characteristics (e.g., country, date, funder, type of study design), type of SCT (e.g., targeted, integrative, broad-based), SC processes (e.g., ToM, social perception, empathy, emotion recognition, attributional styles), type of VR modality (e.g., immersive, non-immersive), type of VR technology used (e.g., HMD, CAVE, computer, laptop) and intervention characteristics (e.g., duration of the intervention, type of stimulus, tasks, group or individual).

2.8 Synthesis of results

The source selection process was summarized and presented in a PRISMA-ScR flow diagram (Tricco et al. 2018). The PCC framework and review questions were used to guide the data synthesis process. To address the review questions, we used descriptive statistics and narrative reports, and the results are presented in tables and figures. Several team meetings were held to discuss data synthesis plans and key findings.

3 Results

3.1 Selection of sources of evidence

A total of 1,407 records were identified, of which 235 were duplicates. In the screening phase, 1,150 records were excluded and 22 reports were assessed for eligibility. Of these, 10 were excluded because they did not fulfill the inclusion criteria, leaving 12 studies for further analysis (Fig. 1).
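The counts in the selection flow can be checked arithmetically. The short Python sketch below (the function name `prisma_flow` is hypothetical) uses only the figures stated above and recovers the number of unique records screened after duplicate removal, which the text does not give explicitly.

```python
def prisma_flow(identified: int, duplicates: int,
                excluded_screening: int,
                excluded_full_text: int) -> tuple[int, int, int]:
    """Return (records screened, reports assessed in full text, studies included)."""
    screened = identified - duplicates          # after duplicate removal
    assessed = screened - excluded_screening    # passed title/abstract screening
    included = assessed - excluded_full_text    # passed full-text assessment
    return screened, assessed, included


print(prisma_flow(1407, 235, 1150, 10))  # (1172, 22, 12)
```

The result matches the 22 reports assessed for eligibility and the 12 included studies, and implies that 1,172 unique records entered title and abstract screening.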

Fig. 1

PRISMA flow diagram of the study selection process

3.2 Characteristics of sources of evidence

3.2.1 Methodological characteristics of the studies

Table 1 shows the characteristics of the reports included in the review. All papers were recent (< 10 years old), and most SCT-VRs were developed in Europe and applied in clinical contexts (e.g., hospitals and other health institutions). Study designs were variable; the most common were proof-of-concept or pilot studies (4 out of 12; 33.3%) and RCTs (3 out of 12; 25%). The interventions were applied, or were intended to be applied, to patients with schizophrenia (6 out of 12; 50%) or psychotic disorder (6 out of 12; 50%).

Table 1 Characteristics of the reports included in the review

As mentioned above, the included reports were recent, spanning 2016 to 2023. Notably, most manuscripts were published between 2020 and 2023 (10 out of 12; 83.3%) (Fig. 2).

Fig. 2

Number of publications per year

3.2.2 Characteristics of the interventions

Regarding SCT-VR features, most interventions were immersive and used an HMD (11 out of 12; 91.7%). Integrative interventions were applied more frequently (7 out of 12; 58.3%) than targeted (4 out of 12; 33.3%) and broad-based interventions (1 out of 12; 8.3%). The SC domain most frequently trained was ToM (9 out of 12; 75%), followed by emotion perception/recognition (8 out of 12; 66.7%) and social perception (7 out of 12; 58.3%). The number of sessions ranged from 9 to 16, and session duration ranged from 45 to 120 min. Most interventions were applied once a week (7 out of 12; 58.3%) and delivered individually (11 out of 12; 91.7%) (Table 2). All interventions were guided by a trained therapist.
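All percentages in this section are proportions of the 12 included reports. The sketch below, a hypothetical Python helper named `percent`, computes them from the counts with rounding to one decimal place; the counts for the SC domains are back-calculated from the reported percentages (e.g., 75% of 12 reports is 9), so they are an inference rather than figures copied from Table 2.

```python
TOTAL_REPORTS = 12

# Counts out of 12 reports, as described in this section (SC-domain counts
# back-calculated from the reported percentages).
COUNTS = {
    "immersive VR": 11,
    "integrative": 7,
    "targeted": 4,
    "broad-based": 1,
    "ToM": 9,
    "emotion perception/recognition": 8,
    "social perception": 7,
    "once a week": 7,
    "individual delivery": 11,
}


def percent(count: int, total: int = TOTAL_REPORTS, digits: int = 1) -> float:
    """Percentage of `total`, rounded to `digits` decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, digits)


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count}/{TOTAL_REPORTS} = {percent(count)}%")
```

For example, 11 of 12 reports corresponds to 91.7% and 8 of 12 to 66.7% when rounded to one decimal place. Python's `round` resolves exact ties toward the even digit, but with a denominator of 12 no percentage falls exactly on a half, so the output matches conventional rounding.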

Table 2 Characteristics of SCT-VR interventions

3.3 Results of individual sources of evidence

3.3.1 Description of the interventions and outcomes

Based on the selected studies, six different interventions could be identified (five training programs and one video-based task), which are described below.

VR-SOAP (Meins et al. 2023)

VR-SOAP consists of five modules that address the identified causes of impaired social functioning: four optional modules (Modules 1 to 4) and one fixed module (Module 5). In sessions 1 and 2, the patient and therapist discuss the baseline assessment and formulate goals concerning social contacts, leisure activities, and/or social participation. After session 2, they select two of the optional VR modules (4 sessions each): Module 1, Negative symptoms domain (behavioral activation and planning in social situations); Module 2, Social cognition domain (recognizing facial emotions, interpreting social situations, mentalizing); Module 3, Paranoid ideation and social anxiety domain (behavioral experiments testing harm expectancies, exposure exercises, dropping safety behaviors); and Module 4, Self-esteem and self-stigma domain (improving positive self-image in interactions, practicing disclosure of mental illness). The fixed Module 5, Communication and interaction skills domain, involves practicing conversations, job interviews, and attending a party, with a focus on social skills. No results have been reported yet, since the only available study is a protocol for a single-blind multicenter randomized controlled trial.

DiSCoVR (Nijman et al. 2019, 2020b, 2023b, 2023a)

DiSCoVR aims to improve SC processes through an immersive VR paradigm. It is a comprehensive intervention composed of 16 individual sessions of 45 to 60 min delivered over 8 weeks (2 sessions per week). The virtual environments are presented with an Oculus Rift VR headset. During the sessions, the therapist controls the VR software, using one monitor to observe the participant’s view and another to control the virtual environment. The intervention starts with basic SC processes, and difficulty and complexity increase across its 3 modules. Module 1 (sessions 1 to 5) trains facial emotion recognition: participants select, in a forced-choice format, the emotion expressed on the avatar’s face. Module 2 (sessions 6 to 9) trains ToM and social perception: participants observe interactions between avatars that contain misunderstandings, hints, true and false beliefs, and social mistakes. Finally, Module 3 (sessions 10 to 16) focuses on applying the previously trained skills in interactions with an avatar controlled by the therapist. DiSCoVR was reported to be feasible and acceptable among patients with psychosis, and its most frequently mentioned strength was the opportunity to practice personalized social situations. A significant effect was found on the Ekman 60 Faces total score and on social functioning scores, but not on ToM measures.

RC2S (Peyroux and Franck 2016)

RC2S aims to improve SC processes through a non-immersive VR paradigm. It is a comprehensive intervention composed of 14 individual weekly sessions of 60 to 90 min, and it trains emotion recognition, social perception, attributional styles, and ToM. The training is individualized and flexible, being tailored to the participant’s personal goals. It consists of 2 preparation sessions, 10 cognitive remediation sessions, and 2 transfer sessions. Each cognitive remediation session has four parts: (1) the patient builds a coherent mental representation of the proposed situation; (2) in the VR environment, the participant helps an avatar through a social situation by choosing a behavior pattern (passive, aggressive, or assertive); the scene follows a predefined but flexible decision tree, so the patient’s choices shape the avatar’s progress and the subsequent interactions, and this part is recorded for later analysis; (3) using the recording, the participant and the therapist analyze the participant’s behavior in each interaction; and (4) the patient chooses homework tasks in collaboration with the therapist to foster motivation. Tasks are adapted to the patient’s needs and daily contexts. All situations are ranked by increasing difficulty, based on both their emotional or affective nature and the complexity of the characters’ interactions. Peyroux and Franck (2016) reported significant improvements in emotion recognition and ToM measures.

SCIT-VR (Shen et al. 2022; Thompson et al. 2020)

This program is a VR adaptation of Social Cognition and Interaction Training (SCIT; Penn et al. 2007) that aims to train emotion recognition and attributional styles. SCIT-VR comprises 10 sessions: 2 individual sessions followed by 8 group sessions. The individual sessions last approximately 30 min; in them, the researcher helps the participant set up and become familiar with the technology required to join the virtual group sessions. This includes providing a headset, helping with the login procedure, setting up the participant’s avatar, and familiarizing them with the Second Life© environment. The eight group sessions last 60 min each: the first three focus on emotion recognition, the next three on attributional styles, and the last two on skill acquisition using strategies derived from cognitive behavioral therapy (CBT). Thompson et al. (2020) reported that SCIT-VR was feasible, acceptable, and safe, with high levels of participant satisfaction, and found pre-post changes in emotion recognition scores. Shen et al. (2022) reported that the Chinese adaptation of SCIT-VR was feasible and found significant changes in emotion recognition and ToM measures; participants also had higher social functioning scores after the intervention than at baseline.

VR-ToMIS (Vass et al. 2021a, 2021b, 2022)

This training aims to improve ToM through an immersive VR paradigm; it is a targeted intervention consisting of nine weekly sessions of 50 min each. The first session helps patients understand the method and how to use the program. Each of the remaining eight sessions begins with a brief introduction reviewing the activities carried out between sessions (homework and monitoring of behavioral changes) and the key procedures for change, followed by three steps: (1) the patient takes part in simulated social interactions with an avatar in immersive VR environments, in conversations that include double-meaning sentences, overstatements, and irony; (2) after each simulation, the patient completes a task in which they visualize the avatar’s inferred emotions using the Temporal Disc Controller (TDC; Csukly et al. 2004); and (3) the simulation experience is discussed with a trained therapist right after each task, who uses cognitive and metacognitive techniques to help the patient recognize congruencies between cognition and behavior and develop appropriate behavioral strategies. After the third step, the participant can virtually test the learned reactions by repeating the three steps as many times as the session allows. VR-ToMIS was found to be feasible and acceptable among patients with schizophrenia, and no side effects were reported during treatment. It was also associated with improvements in negative symptoms and in several ToM measures. Among quality-of-life measures, only the social relationships domain showed a significant change, which remained notable at follow-up.

Video task (Dumont et al. 2022)

Dumont et al. (2022) developed five social situations presented as immersive 360° videos. The videos aim to train the detection of social cues needed to interpret social situations adequately. Four of the five videos depicted ambiguous social situations (involving self-reference or intentionality). Participants receive a brief description before each video, watch the 360° video, and then answer questions about the situation and the actors’ intentions. In the validation phase, all scenarios generated an adequate sense of presence and were rated as highly realistic. Three scenarios elicited biases and, consequently, moderate levels of anger. No results have been reported on the feasibility or efficacy of this intervention.

Table 3 Outcomes of SCT-VR interventions

4 Discussion

4.1 Summary of evidence

The aim of this scoping review was to explore and describe the characteristics of SCT-VR interventions in patients with schizophrenia. Twelve manuscripts fulfilled the inclusion criteria and allowed us to answer the research questions. Our findings are discussed below following the order of those research questions.

4.2 How many articles were found? Available literature, timeframe, and location

According to our results, the development of SCT-VR programs for patients with schizophrenia is a very recent research field. This is supported by the fact that all papers considered in this review were published less than 10 years ago (2016–2023) and that the number of original research manuscripts is limited (12 studies). These findings are consistent with the general development of research on SC in schizophrenia: according to Green et al. (2019), the number of published studies on this topic has grown remarkably over the last 15 years. Early studies focused on methods for assessing SC domains, and in the last decade intervention designs have been explored as well. For SCT-VR, the number of publications per year has shown an upward trend. This trend may be attributed to continuous technological development, which has lowered VR costs and made it more accessible; ongoing upgrades are also expected to facilitate the development of more immersive and interactive interventions (Chan et al. 2023). Furthermore, the growing interest in VR makes its potential scope in cognitive interventions more visible.

We found that most studies were conducted in Europe. A possible explanation is that the International Social Cognition Network (ISCON), which included the European Social Cognition Network (ESCON) research group, was founded in 2003. Such initiatives could have stimulated this research area particularly in Europe compared with other geographical areas. Today, the ISCON committee is made up of researchers from universities in Chicago, Cologne, Toronto, Washington, and North Carolina, so interest in this research area could be expanding worldwide (International Social Cognition Network 2023).

4.3 What are the characteristics of the studies? Studies’ designs, participants, and assessment tools

The twelve included manuscripts are highly heterogeneous in terms of experimental design, ranging from single case reports to randomized controlled single-blind clinical trials. Most studies were proof-of-concept or pilot studies, which reflects the recent nature of research on SCT-VR. However, RCT designs are becoming more common in the field, suggesting that the evidence on SCT-VR efficacy will be more consolidated in the coming years. As expected, the number of participants varied considerably, from 1 to 116. It must be noted, however, that the sample of 116 participants (Meins et al. 2023) corresponds to a protocol, so actual data on SCT-VR come from samples of 1 to 87 participants, which makes it difficult to draw robust conclusions. Furthermore, participants' diagnoses varied among studies. Half of the studies included (or intended to include) patients with an established diagnosis of schizophrenia or schizoaffective disorder, whilst the other half included (or intended to include) patients with psychotic disorders. Such diagnostic variability is quite common in psychiatric research. The main factor usually considered to differentiate between schizophrenia and psychosis is chronicity. Some chronic psychiatric conditions, such as bipolar disorder, schizophrenia, or schizoaffective disorder, may begin with a psychotic episode, and a definitive diagnosis can only be reached over time, since these clinical entities are characterized by variable courses and by the presence of other symptoms that support each diagnosis (Barciela and Kahn 2021).

However, the clinical heterogeneity of the psychotic spectrum implies that the severity of SC deficits varies considerably across diagnoses. For instance, a meta-analysis comparing SC deficits in patients with schizophrenia and bipolar disorder found that the former performed worse than the latter on SC tests (Bora and Pantelis 2016), showing that therapeutic needs vary significantly across the psychotic spectrum. Thus, diagnostic heterogeneity among studies is another factor that prevents us from drawing definite conclusions on the use of VR in SC training. Nevertheless, VR tasks are flexible in that their difficulty level can be adjusted throughout the intervention program according to the severity of the deficit and the needs of the patient.

It is worth noting that the tools used to assess SC also vary considerably across studies. This is expected, because each SCT-VR program aimed to improve different aspects of the SC construct. Furthermore, until recent years, no gold-standard tools had been proposed to measure SC in schizophrenia and related disorders. Although the MATRICS Consensus Cognitive Battery (MCCB, Nuechterlein et al. 2008) is considered the gold-standard neuropsychological measure of cognition in schizophrenia, its SC subtest (the Managing Emotions branch of the MSCEIT) focuses on emotional processing, thus narrowing SC assessment to this specific domain. Fortunately, in 2022, an SC battery for schizophrenia spectrum disorders was developed to address this gap and to standardize SC measurement in this population (Halverson et al. 2022).

4.4 What are the characteristics of the intervention?

As expected, wide variability was observed among the SCT-VR programs (i.e., DiSCoVR, RC2S, SCIT-VR, VR-ToMIS, VR-SOAP and 360 video task) in terms of number of sessions, intensity, and trained SC domain. For instance, DiSCoVR is delivered twice a week, whilst RC2S and VR-ToMIS require one session per week. Session duration varies from 30 to 120 min. These variables are of special interest in the VR context, because there are insufficient data to establish the optimal VR exposure that would prevent fatigue or adverse effects associated with VR exposure, such as cybersickness (Saredakis et al. 2020). More research is therefore needed to determine the VR-related conditions that ensure a beneficial therapeutic effect with a low risk of VR-related adverse effects. In this regard, Krohn et al. (2020) proposed a multidimensional evaluation framework for VR applications in clinical neuropsychology (VR-Check). This framework can be useful for assessing existing paradigms, as well as for guiding the development of new cognitive paradigms for neuropsychological assessment or intervention.

Most SCT-VR programs used integrative and targeted approaches. Although broad-based interventions have been reported to have a greater effect (Nijman et al. 2020a), only one program was based on this approach, and it is described in an RCT protocol that has not yet been completed. Interestingly, all SCT-VR programs but one (Thompson et al. 2020) were delivered individually. The exception (SCIT-VR) was adapted from an original non-VR intervention, so a mixed modality (i.e., individual and group) was applied. One of the challenges of SC intervention programs is precisely training patients in adaptive social interaction skills, so ecological practice of the learned abilities with others should be mandatory. This raises the following question: which delivery modality (i.e., individual vs. group) is more effective for SC training in schizophrenia? A systematic review by Lockwood and Page (2004) reported that, in other psychological interventions, each delivery modality contributes distinct benefits. For instance, individually delivered interventions can improve general functioning and enhance specific abilities and insight, whilst group interventions improve general psychiatric symptomatology, decrease social anxiety, and enhance social interactions. For SCT-VR, there is still insufficient evidence to answer this question. In line with the findings of Lockwood and Page (2004), we might hypothesize that specific abilities could be better trained and modeled in individual sessions, while the generalization of trained abilities could be developed through group sessions. Moreover, little is known about the therapeutic role of the avatars used in VR settings. The effect size and scope of interaction training through an avatar, and whether it captures the complexity of real social interactions sufficiently to promote generalization, remain unclear. Future research on SCT-VR must address these questions.

Despite the variability observed among studies, some consistencies were also found. For instance, the most commonly used intervention modality was immersive VR. A systematic review conducted by Bisso et al. (2020) reported that immersive VR therapies applied to patients within the schizophrenia spectrum are well tolerated and safe, with long-lasting positive effects and no significant side effects associated with VR exposure. These promising data on immersive VR in this population may have encouraged its use in a variety of interventions, including SCT-VR.

4.5 SCT-VR outcomes

It is important to note that this study did not include a quality assessment of the articles, as this was not one of the main objectives of this scoping review. Therefore, the results described below should be treated with caution. As mentioned before, the selected articles were heterogeneous in terms of population, study design, and objectives. Moreover, some studies did not report results because they were protocols. Despite this heterogeneity, the results can be classified into two groups: (1) feasibility, motivation, and satisfaction, and (2) efficacy of the interventions.

4.5.1 Feasibility, motivation and satisfaction

The studies indicate that SCT-VR is feasible for patients with psychotic and schizophrenia spectrum disorders, as measured through open-ended and Likert-type questionnaires and dropout rates. Shen et al. (2022) reported no significant difference in dropout rates between SCT-VR and traditional SCT, although the traditional intervention group had a higher voluntary dropout rate. Overall, the studies report minimal cybersickness, good tolerability, and a strong sense of presence in SCT-VR. These findings are consistent with those of several systematic reviews (Bisso et al. 2020; Chan et al. 2023; Freeman et al. 2022; Lan et al. 2023). Lan et al. (2023) conducted a systematic review on the use of VR for the diagnosis and treatment of psychotic disorders and found that VR interventions are feasible and accepted, and that they do not induce clinically significant symptoms in patients with psychotic disorders. Most studies report high scores on measures of treatment satisfaction and motivation. This is consistent with Bisso et al. (2020), who found that patients reported higher motivation for VR treatment than for traditional treatment. These results are encouraging, suggesting that patients are motivated and enjoy the intervention tasks. Treatment motivation has been associated with higher treatment adherence (Medalia and Saperstein 2011), which can improve the effectiveness of interventions (Altman et al. 2023). Since apathy is one of the main symptoms of schizophrenia, it is crucial to develop interventions that increase participants' motivation. The use of VR and gamification can provide stimuli and elements that may help compensate for the lack of motivation towards treatment (Sardi et al. 2017; Shen et al. 2022; Vajawat et al. 2021).

4.5.2 Efficacy of SCT-VR

Regarding the efficacy of the interventions, the early stage of the field and the heterogeneity of the studies make it impossible to draw firm conclusions. Additionally, the reported outcomes differed: some studies reported only measures of feasibility and tolerability, while others included measures of SC, neurocognition, and psychiatric symptomatology. Few studies reported outcomes on social functioning and quality of life. Regarding emotion recognition, most studies reported significant improvements (Peyroux 2016; Shen et al. 2022; Thompson et al. 2020). These results are comparable to those of traditional SCT, in which emotion recognition is consistently one of the processes that benefit from intervention (Nijman et al. 2020a). Regarding the effect of SCT-VR on ToM, the results are variable. Some interventions, such as those by Peyroux (2016) and Vass et al. (2022), report significant improvement in ToM tasks. However, other studies, including those by Nijman et al. (2023a), Shen et al. (2022), and Thompson et al. (2020), did not find significant improvements in ToM. It is worth noting that VR-ToMIS, a targeted intervention for ToM, had the largest effect on ToM. This contrasts with reports on traditional interventions, in which ToM is said to benefit more from broad-based than from targeted approaches (Nijman et al. 2020a). Interestingly, only one study (Shen et al. 2022) reported significant improvement in social functioning, and it did not find any significant differences between the VR group and the traditional group. This finding is in line with the meta-analysis by Nijman et al. (2020a), which found that traditional SCT has a greater impact on basic SC processes than on more complex processes and social functioning. Yeo et al. (2022) reported a similar pattern in another meta-analysis, showing that SCT has a greater effect on emotion recognition than on ToM and social functioning. This may be because SCT has a greater impact on processes that require less integration of complex social information. One hypothesized strength of SCT-VR is the potential for generalization and transfer of improvements to everyday life (Gainsford et al. 2020; Nijman et al. 2020a). However, it remains unclear whether SCT-VR improves social functioning, as the available data are inconclusive. Given these characteristics, it is crucial to further investigate the effectiveness of SCT-VR using designs that minimize the risk of biased results, such as RCTs. Additionally, future studies should compare the effects of traditional SCT and SCT-VR.

5 Limitations

The main limitation of the present review is that no assessment of the effectiveness or methodological quality of the analyzed studies was performed, so no conclusions about their clinical and therapeutic value can be drawn. As mentioned before, the therapeutic use of VR to address SC deficits in schizophrenia is a very recent field, so only a small number of studies were retrieved and analyzed, providing an interesting but limited state-of-the-art synthesis on this topic. Finally, we did not contact other researchers or field experts on the topic while preparing the present manuscript, so we cannot rule out the existence of other similar scoping reviews conducted during the same period.

6 Conclusions

The review of the available literature on SCT-VR for patients with schizophrenia shows that it is an emerging scientific field, which is why current research focuses on developing intervention protocols and providing early evidence about the interventions. Although the results of various studies are encouraging, the intervention programs are heterogeneous in terms of the domains they target, the characteristics of the programs and sessions, and the populations studied. Therefore, no definitive conclusion can be drawn on the effectiveness of this tool, as many of the studies provide only preliminary data. Future studies should evaluate variables associated with intervention efficacy, such as participants' intrinsic motivation, task motivation, intervention intensity, and delivery modality. This will aid in the development of interventions that take these factors into account. It is important to continue generating evidence on the efficacy of these interventions on SC, symptomatology, and social functioning. This can be achieved through randomized controlled trials or single-case experimental designs, which can quantify efficacy with lower risk of bias. Additionally, comparative studies between SCT-VR and traditional SCT are needed to identify the strengths and limitations of both paradigms.