1 Introduction

Virtual reality (VR) is gaining importance in therapies dedicated to enhancing health care and well-being (Escalona et al. 2019; Saab et al. 2019). In addition, avatar-based applications in psychology and psychiatry are particularly beneficial (Chandrasiri et al. 2020; Prudenzi et al. 2019; Aiken and Berry 2019), as they serve several functions that encourage the involvement of both patients with mental health problems and health professionals (Rehm et al. 2016). From the patients’ point of view, avatar-based applications have been shown to help reduce communication barriers and promote treatment adherence through anonymity. From the therapists’ point of view, avatars make it relatively simple to control and manipulate treatment stimuli.

Schizophrenia is one of the most disabling psychiatric disorders, classified as one of the major diseases contributing to the global burden of disease (Salomon et al. 2012). It affects approximately 1% of the population and is associated with a significant and persistent functional disability. This disability manifests itself in a variety of areas such as the maintenance of interpersonal relationships, the ability to carry out activities of daily living independently, and the performance of pleasurable and leisure activities (Bellack et al. 2006).

The disorder includes different psychopathological dimensions, including the positive-symptom dimension that encompasses the presence of delusions and hallucinations (van Os and Kapur 2009). A hallucination is typically defined as a sensory perception in the absence of an appropriate external stimulus. This perception is lived with the force and impact of a real perception and cannot be voluntarily controlled by the person experiencing it (Fernández-Sotos et al. 2020a; McCarthy-Jones and Resnick 2014). For the person experiencing them, hallucinations can have a strong negative meaning, which may imply a worse prognosis with a higher risk of suicide, neurocognitive deficits and worse functioning (Waters and Fernyhough 2016).

Although effective and acceptably tolerated antipsychotic drugs are now available, a high proportion of patients have persistent hallucinations (Dellazizzo et al. 2018). Cognitive psychotherapeutic interventions are advocated as a first line of treatment for patients who have exhausted the psychopharmacological route. These interventions are usually motivated by inefficacy, intolerance or active rejection of the antipsychotic medication (Fernández-Sotos et al. 2020b). The basis of cognitive behavioral therapy for psychosis (CBT-p) is that it is not the voice or its content that generates distress in the patient, but the beliefs the patient holds about it, specifically in relation to its identity, intention, power and control (Garety et al. 2007). The therapy has been shown to be effective in more than 30 randomized controlled trials in improving symptoms and functioning, with effect sizes on hallucinations ranging from small to large (Wykes et al. 2007; Birchwood et al. 2014; Hazell et al. 2018). Therefore, numerous clinical guidelines and international protocols recommend CBT-p for patients with schizophrenia in all phases of the disease.

The interpersonal dimension of auditory verbal hallucinations is increasingly recognized (Hayward et al. 2011). A recent review explored the relevance of interpersonal perspectives to the understanding and treatment of auditory hallucinations (Tavares 2017). It concluded that the experience of auditory hallucinations can be understood within interpersonal frameworks. In addition, the way in which patients relate to their voices shares many characteristics with real-world interpersonal relationships, so that encouraging dialogue between patients and their voices may help them develop more constructive relationships and reduce their distress (Hayward et al. 2011; Romme et al. 1991). Therefore, within the cognitive approaches, a specific therapeutic approach has been developed for the treatment of auditory hallucinations. It consists of the patient imaginatively visualizing the person or source of these hallucinations. With the help of the therapist, he or she is confronted with these “embodied” hallucinations.

Recently, new technologies have been incorporated into this type of psychotherapeutic approach. Several research groups have added to their therapy a computer avatar representing the heard voice. In the AVATAR therapy (Leff et al. 2013; Craig et al. 2017), patients are assisted by the therapists in the use of an editor tool, so that they can give physical form to the voice they hear by customizing the head of an avatar. In du Sert et al. (2018), the therapy is similar but uses a full-body avatar editor instead of just a head. In this way, following the therapy model described above, the patient is helped to deal with the representation (now computer-generated rather than imaginary) of his/her hallucinations. The AVATAR therapy group has recently published a study demonstrating the efficacy of their approach with the incorporation of the computer avatar (Craig et al. 2017). These previous works report interesting results on the application of therapies with avatars. However, to the best of our knowledge, neither their tools nor the process that was followed to create the avatars have been described in detail. More importantly, these instruments have not been evaluated by patients or therapists.

Following this line of research, our team is devoted to a progressive implementation of a computer tool for the creation of auditory hallucination avatars [e.g., (Fernández-Caballero et al. 2017a, b)]. It consists of a full-body avatar editor. The main difference with the one in du Sert et al. (2018) is that they seem to edit the avatars using a plugin within the Unity editor (there is no screenshot or description of this process), so it is not clear how easy their system is for therapists to use. Moreover, the plugin they used to configure the avatars (Morph3D) seems to have been abandoned by its creators.

Our approach also presents a series of advantages with respect to traditional therapies. We have created a portable, standalone editing tool that requires no prior knowledge, which makes it easy to handle for both therapists and patients. According to Freeman (2020), a key advantage of VR is that individuals know that a computer environment is not real, but their minds and bodies behave as if it were real. People face difficult situations much more easily in VR than in real life and are able to try out more engaging and appealing therapeutic strategies. In our case, the tool allows for direct interaction between the patient and his/her avatar hallucination. Thanks to the objectification of the hallucination, the approach contributes to reducing the patient’s beliefs about the omnipotence of the hallucination. In the same way, the supposed power of the voice and its actions are mitigated.

Despite the existence of tools using configurable avatars to represent auditory hallucinations, the usability and acceptance of these tools have not yet been evaluated. To our knowledge, there is no published evidence that the software tools used in the computer-based therapies described so far (Leff et al. 2013; du Sert et al. 2018) have been evaluated in sufficient detail. Therefore, as a first step in the creation of a useful and usable therapy based on avatars, the main objectives of this work have been (a) to apply the tool to patients with schizophrenia who suffer from auditory hallucinations, (b) to evaluate the acceptance and usefulness of the tool from the practitioners’ point of view and (c) to evaluate the process of creation and the product from the patients’ perspective.

2 Material and methods

2.1 Avatar creation system

Embodied avatars with an acceptable level of visual realism are needed in order to design the avatars that represent auditory hallucinations. Patients will be asked about the physical characteristics of both the face and body that they believe would correspond to the voice they hear, in an attempt to recreate what is only in their minds. The avatar creation system has been conceived as a desktop VR application out of concern that patients might go through an excessively unpleasant moment during the process of shaping their auditory hallucinations.

The decision to reduce the level of immersion in this first contact with the embodiment of the hallucinations was driven by the therapists who collaborated in this research. It is an attempt to reduce the shock that patients might experience if they faced their embodied hallucination through head-mounted displays used as immersive VR screens. In later stages of the therapy, it is planned to gradually increase the level of immersion as patients confront their avatars and gain strength against them. In summary, our approach is that patients start confronting their avatars on a computer screen and then increase the degree of immersion through the use of virtual and augmented reality displays.

This type of avatar creation system is common in video games, especially in Role-Playing Games (RPGs) and Massively Multiplayer Online Role-Playing Games (MMORPGs), in which players can customize the character that represents them inside the game. But character customization is not only used for entertainment purposes. Previous studies have shown that these customization options improve the learning effects of e-learning software (Lin et al. 2017) and increase the appeal of different serious applications (Reski and Alissandrakis 2020; Ravyse et al. 2017). This supports the hypothesis of using avatars to treat auditory hallucinations. As a result of this interest, there are also standalone third-party tools devoted to the design of virtual characters to be used for multiple purposes. Examples are Adobe Fuse CC (https://www.adobe.com/products/fuse.html), Character Creator (https://charactercreator.org/), Poser (https://www.posersoftware.com/) and Unity Multipurpose Avatar (UMA) (https://assetstore.unity.com/), among others.

It was our objective to provide the therapists with a standalone tool that they could use for the creation of the avatars without any technical support. This means that they should be able to obtain the intended avatar instantly so that it could be used for the future therapy straightaway. This was an important limitation that forced us to avoid the use of third-party software for avatar creation (such as Fuse, Character Creator or Poser), as these tools would normally require an external person to take the designed avatar and recompile the therapy application. In addition, these tools can be considered very complex for people without a technical or artistic background, as they offer hundreds of options and menus that exceed the needs of the avatar creation system proposed in this work. Therefore, the modifications to the avatar mesh need to be done inside the application. UMA is the only one that provides this possibility out of the box, allowing the designed avatar to be used straightaway in further stages of the therapy without any technical assistance. However, the base avatars that come with this tool do not look as good as those of the other alternatives, which mostly rely on laser scans of real people.

The pipeline for the creation of the avatars used in this paper is depicted in Fig. 1. The pipeline is divided into four stages: (1) the selection of the base characters, (2) the inclusion of predefined animations, (3) the creation of the customization possibilities that would allow the creation of different characters from the basic ones and (4) their inclusion in an interactive desktop VR application where inexperienced users (therapists) could embody the patients’ hallucinations. The following subsections describe each step and the design decisions made.

Fig. 1 Pipeline for the creation of the base avatars

2.1.1 Character 3D models, rigging and animation

In our search for more realistic characters, we opted to select characters from a third-party tool (Adobe Fuse CC) and modify them to include all the configuration possibilities (modifications of face and body) required by the tool implementing the therapy. As an initial step, two of the characters included in Adobe Fuse CC (one male and one female) were selected. They were formed by 29,855 triangles (13,902 constituting the head) for the female character and 35,946 triangles (14,074 representing the head) for the male character (both hairless). The textures used for the skin had a resolution of \(2048 \times 2048\) pixels and were provided in separate files for diffuse, normal, ambient occlusion, and specularity and roughness texture maps. The textures for clothing had a resolution of \(512 \times 512\) pixels and were also delivered in separate files with the same set of texture maps.

After their selection, they were exported into Adobe Mixamo, an online rigging and animation tool. There, the two character models were rigged and provided with a set of predefined animations, although only the idle animation was used at this stage.

2.1.2 Blendshape creation

After the 3D models of the characters were selected and provided with basic animations, their configuration possibilities were discussed in several meetings between the engineering and psychiatric teams. In these meetings, the parts of the face and body that would allow modification were discussed and agreed upon.

Once the configuration possibilities were agreed upon, they were implemented in the form of blendshapes using 3D authoring software (Autodesk 3D Studio Max). Blendshape animation is a technique that allows the creation of several vertex configurations inside a mesh that can be individually activated and animated. Each of these blendshapes consisted of the modification of one characteristic of the face or body of the characters. For example, a blendshape for increasing the size of the characters’ nose stores the final position of the affected vertices. A value of 0 would represent the original nose size, while a value of 100 would represent the maximum size of the nose, allowing the selection of any intermediate step. Following this approach, 44 blendshapes were manually created, 40 for the face and 4 for the body (as summarized in Fig. 2). For the face, they were grouped into 8 for the head, 12 for the mouth area (mouth, lips, cheeks and jaw), 10 for the eyes area (eyes, eyebrows and ears) and 10 for the nose.
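The 0 to 100 weighting described above can be illustrated with a short sketch. It is not the implementation used in the tool; it assumes that each blendshape stores target vertex positions, that a weight interpolates linearly between base and target, and that several active blendshapes add their offsets to the base mesh. The function name `apply_blendshapes`, the blendshape name `nose_size` and the two-vertex mesh are hypothetical.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]


def apply_blendshapes(
    base: List[Vertex],
    targets: Dict[str, List[Vertex]],
    weights: Dict[str, float],
) -> List[Vertex]:
    """Return the deformed mesh for the given blendshape weights (0 to 100).

    Assumption: each active blendshape contributes
    (weight / 100) * (target - base), and contributions are summed.
    """
    result = [list(v) for v in base]
    for name, weight in weights.items():
        if not 0.0 <= weight <= 100.0:
            raise ValueError(f"weight for {name!r} must be within 0..100")
        t = weight / 100.0
        for i, (b, tv) in enumerate(zip(base, targets[name])):
            for axis in range(3):
                result[i][axis] += t * (tv[axis] - b[axis])
    return [(v[0], v[1], v[2]) for v in result]


# Hypothetical two-vertex "nose": only the second vertex moves.
base_mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shape_targets = {"nose_size": [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]}

# A weight of 50 places the moving vertex halfway between base and target.
print(apply_blendshapes(base_mesh, shape_targets, {"nose_size": 50}))
```

With a weight of 0 the function returns the base mesh unchanged, and with 100 it returns the stored target positions, which matches the slider semantics described in the text.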

Fig. 2 List of blendshapes designed to allow the avatar creation

Figure 3 provides an example of some of the blendshapes created for the nose. The left part of the image shows the application of the blendshapes on the neutral face, while the right part shows the combination of the blendshapes displayed on each row. Each of the 4 blendshapes designed for the body modified several parts of the body at the same time, in order to allow the representation of fat and thin characters, old characters and muscular bodies, and the addition of African features.

The blendshapes created could be used in combination with each other, as also illustrated in Fig. 3, except for opposites (thin and fat bodies, big and small heads, etc.). All these blendshapes were created for every mesh that composes the character because, for example, increasing the size of the eyes affects the face, the eyes and the eyelashes meshes. These meshes all need to be modified synchronously in order to obtain a proper result when increasing the eye size.

Fig. 3 Example of some of the nose blendshapes created (left) and their combination (right)

2.1.3 Real-time avatar creation

Once all the blendshapes were added to both characters, they were imported into the game engine Unity, where the full therapy will be implemented in the future. To increase the realism of the characters, different diffuse and normal textures were used for young and old characters. Moreover, three different hair styles were added for each character, and the possibility to customize colors was added for the skin, hair, eyes and clothes.

As a result of meetings with experienced psychiatrists, different accessories were added in order to represent fantasy characters such as angels (aureole and feather wings), demons (horns and bat-like wings) and blurred shadows. They were included because they are a common representation of the voices among the patients they normally treat in their daily work. All other essential instructions from psychiatrists were based on the recommendations of the CBT-p. In this way, two different user interfaces were created in Unity to facilitate the avatar creation task:

  • Interface to obtain an initial avatar: In order to facilitate the task of designing the virtual characters, an initial set of questions was added to the avatar configuration system. The main aim of these questions was to gather generic features in order to obtain an initial version of the character that could subsequently receive further modifications.

    The initial questions were structured in a three-screen wizard dialog: one screen for general features of the character (gender, age, racial group, skin color and complexion), another one for the head features (height, width, size, hair color, hair style) and one for other controls (eye size and color, nose length and width and mouth and lips size). In order to make this initial selection of features for the avatar, a set of predefined sizes and colors are provided to choose from, as can be seen on the left side of Fig. 4.

  • Interface to fine tune the initial avatar: The avatar obtained from the questions described in the previous point is very generic, since only some of the customizable features have been established. Thus, the next step in the creation of the avatar that represents a patient’s auditory hallucination is to refine it. For this purpose, a simple user interface has been designed in blocks of common features: general controls, head, eyes, nose, mouth and clothes. Each of them contains a set of user interface controls (sliders, buttons, color pickers, etc.) to easily modify the features of the avatar using the blendshapes created for them. The right side of Fig. 4 shows a screenshot of the user interface for fine tuning the avatar. It was also possible to orbit around the avatar’s head and place the camera at any angle. The avatar follows the camera with its eyes and turns its head about 45 degrees to maintain eye contact. This, together with an idle animation, gives the impression that the avatar is alive and aware of the spectator’s presence.

    Additional customization features were added at this point, such as the possibility to select different hair styles and colors, skin color, the color of the iris and the sclera, eye brightness, beard/makeup and a fantasy look (angel, demon and blurred shadow). We tried to prevent the background against which the avatar is presented from generating emotions in the patient. Therefore, a neutral background (plain gray) was selected instead of the black one used in previous works (Leff et al. 2013; du Sert et al. 2018), which could be uncomfortable or negative for some patients (Tao et al. 2015).
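The gaze behavior described in the fine-tuning interface can be sketched as follows. This is only an illustration under stated assumptions, not the tool's code: it assumes that "about 45 degrees" is a limit on the head's horizontal rotation and that the eyes cover whatever angle remains. The function names, the coordinate convention (avatar facing +z, y up) and the constant `MAX_HEAD_YAW_DEG` are hypothetical.

```python
import math
from typing import Tuple

MAX_HEAD_YAW_DEG = 45.0  # "about 45 degrees" in the text; exact value assumed


def camera_yaw_deg(
    avatar_pos: Tuple[float, float, float],
    camera_pos: Tuple[float, float, float],
) -> float:
    """Horizontal angle of the camera relative to the avatar's forward axis (+z)."""
    dx = camera_pos[0] - avatar_pos[0]
    dz = camera_pos[2] - avatar_pos[2]
    return math.degrees(math.atan2(dx, dz))


def head_and_eye_yaw(
    yaw_to_camera_deg: float,
    max_head_yaw_deg: float = MAX_HEAD_YAW_DEG,
) -> Tuple[float, float]:
    """Split the yaw needed to face the camera between head and eyes.

    The head turns up to the limit; the eyes take the remainder
    (an assumed division of labor, not stated in the paper).
    """
    head = max(-max_head_yaw_deg, min(max_head_yaw_deg, yaw_to_camera_deg))
    eyes = yaw_to_camera_deg - head
    return head, eyes


# Camera orbiting to the avatar's side: head saturates, eyes do the rest.
yaw = camera_yaw_deg((0.0, 0.0, 0.0), (2.0, 0.0, 1.0))
print(head_and_eye_yaw(yaw))
```

A real implementation would also limit the eye rotation and smooth the motion over time; both are omitted here for brevity.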

Fig. 4 User interface displaying some of the questions used to obtain an initial avatar (left) and user interface used to fine tune it (right)

2.2 Procedure

This was a 4-month cross-sectional and multi-center study, including three Spanish recruitment centers: “Hospital Universitario 12 de Octubre” (H12O), Madrid; “Complejo Asistencial Benito Menni” (CABM), Madrid; and “Hospital Virgen de la Luz” (HVL), Cuenca.

Meetings with psychiatrists were organized in the three recruitment centers to explain the project, with the goal of identifying and recruiting patients. Psychiatrists participating in the experiment signed an informed consent form. Recruitment was based on the psychiatrists’ clinical evaluation, so they decided when a patient was able to participate in the experiment. In that case, the patient was asked to sign an informed consent form. The session was then organized in coordination with our research team.

A session consisted of a single individual task of 35 to 40 minutes duration in which the psychiatrist and the patient had to collaborate in the design of an avatar. Using the avatar setup tool, the patient was asked to design an avatar to embody his/her auditory hallucinations. Before starting the process of creating the hallucination avatar, the patients were informed of the possibility of leaving the study at any time.

In order to obtain an initial avatar that would be modified to resemble a patient’s hallucination, during the sessions the participant responded to the initial series of questions described in Sect. 2.1.3 and illustrated in the left screenshot of Fig. 4. The questions were formulated orally by the therapist who was interacting with the computer. Afterwards, the initial avatar was presented using the fine-tuning interface (screenshot on the right in Fig. 4). At this stage, the avatar was presented to the patient on a 27-inch screen displaying the modifications undertaken by the therapist. This fine tuning of the avatar continued until the patient felt that it was similar enough to what he/she imagined his/her voice would look like.

Finally, some time after the patients’ sessions were completed, they were contacted again to notify them of their participation and offered the possibility of erasing all captured data.

2.3 Participants and data collection

A total of 20 psychiatrists agreed to participate in the study (15 female and 5 male). Their mean age was \(M=37\) years (SD = 10.5, Max = 58 and Min = 26). They were able to involve 29 patients (17 patients from H12O, 7 from CABM and 5 from HVL). Table 1 summarizes the sociodemographic data of the patients.

Table 1 Patients’ sociodemographic data

All of them met DSM-5 diagnostic criteria for schizophrenia, as evaluated with the Structured Clinical Interview for DSM-5 (First et al. 2015). Patients who did not believe their auditory hallucinations came from human-like characters were excluded from the study. In addition, patients with a relatively short course of illness were preferred, since any therapy (psychotherapy or pharmacological intervention) is known to be more effective the lower the patient’s cognitive impairment. The ideal would have been to work with first-episode patients or patients with no more than 5 years of illness duration. In the final sample, the duration of the disorder had a mean of 6.55 years (SD = 3.78) and a median of 7 years (IQR = 5). Fourteen patients were in an outpatient regime (7 recruited in different mental health centers and 7 in a day hospital) and 15 in a hospital regime (5 recruited in short hospitalization units, 7 in average stay units and 3 in the prolonged psychiatric care unit).

Given the type of experiment described in this paper, no control group has been used to conduct a comparison. Indeed, it is not a therapy or treatment that is being evaluated, but rather the process of creating the avatars, as well as their potential usefulness and their acceptance by therapists and patients.

Several online questionnaires were used for data collection. For the therapists, questionnaires based on the well-known USE (Usefulness, Satisfaction and Ease of use) (Lund 2001) and UTAUT2 (Acceptance and Use of Technology) (Venkatesh et al. 2012) were used to evaluate the usability of the tool and their intention to adopt this technology, respectively. The original UTAUT2 paper describes a set of constructs that influence the Behavioral intention and the Use behavior of a new technology. From these constructs, we did not include Price value, Habit and Use behavior because the technology presented is in an early prototype state. Both questionnaires have been widely used to evaluate medical applications and the acceptance of new technology in health environments (Huang 2020; Jaana et al. 2019; Kontogiannatou et al. 2019; Sezgin et al. 2018).

A different approach was followed for the patients so that they would not have to use a computer to fill in the questionnaires. Instead, the therapists interviewed them, asked them the questions and filled in the questionnaires for them. Given the difficulty of finding existing questionnaires for assessing the avatar creation process (ACP) and the created avatar result (CAR), custom-made questionnaires inspired by previous avatar studies (Marcos et al. 2010; McDonnell and Breidt 2010; Nowak and Biocca 2003) were designed for this purpose.

Each item was assessed on a 5-point Likert scale in the patients’ questionnaires and on a 7-point Likert scale in the therapists’ questionnaires.

2.4 Data analysis

The participant sample has been described using means and standard deviations for continuous variables such as age and duration of the disorder, and frequencies and percentages for nominal variables such as race and education level.

The analysis performed for the data gathered from therapists and patients had some similarities. The results obtained from the questionnaires (ACP, CAR, USE and UTAUT2) were presented using descriptive statistics. They included measures of central tendency (mean and median) and dispersion (standard deviation -SD- and interquartile range -IQR-). Important results were highlighted using percentages of responses in a given range of answers (positive responses or negative responses). Additionally, stacked horizontal histograms were used to describe the frequency of responses for the questionnaires in a graphical way. SPSS 24 and Microsoft Excel were used for these purposes.
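The descriptive statistics listed above can be reproduced for a single questionnaire item with a few lines of code. The sketch below uses Python's standard `statistics` module rather than SPSS or Excel, so quartile conventions (and therefore the IQR) may differ slightly from the values reported in the tables. The function name `describe_likert` and the sample responses are hypothetical and are not data from the study.

```python
import statistics
from typing import Dict, Sequence, Tuple


def describe_likert(
    responses: Sequence[int],
    highlight_range: Tuple[int, int],
) -> Dict[str, float]:
    """Central tendency, dispersion and share of responses in a range.

    highlight_range is inclusive, e.g. (4, 5) for the positive answers
    on a 5-point scale or (6, 7) on a 7-point scale.
    """
    lo, hi = highlight_range
    # Quartiles with the "inclusive" method; other tools may use other rules.
    q1, _, q3 = statistics.quantiles(responses, n=4, method="inclusive")
    in_range = sum(1 for r in responses if lo <= r <= hi)
    return {
        "mean": statistics.mean(responses),
        "median": statistics.median(responses),
        "sd": statistics.stdev(responses),  # sample SD (n - 1 denominator)
        "iqr": q3 - q1,
        "pct_in_range": 100.0 * in_range / len(responses),
    }


# Hypothetical answers of eight respondents to one 5-point item.
example_item = [5, 4, 4, 3, 5, 2, 4, 5]
print(describe_likert(example_item, (4, 5)))
```

Applying the same function per item and per block of items would yield the kind of summaries reported in the results tables.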

Apart from these descriptive statistics, the avatars created by the patients were described in terms of their main features, analyzing the most prevalent ones.

3 Results

3.1 Patients’ questionnaires

The results of the patients’ answers to the ACP questionnaire are summarized in Table 2 and Fig. 5, which show the questions and descriptive statistics. The frequency of responses between 4 and 5 is 78%. This result is represented graphically in Fig. 5, where the bars representing 4 and 5 account for more than half of the responses for all questions. Three out of the four questions obtain similar results. The third one (ACP3), related to the difficulty of representing the avatar for the voice, obtains the least favorable responses (52% of the respondents found it difficult to create an avatar for their voice -values of 4 or 5-, while 24% found it easy -values of 1 or 2-). There was an open question for the patients to include comments about the avatar creation process. Some of them would have preferred to have more hair styles and eyebrows available to choose from (6 participants), others wanted to have different clothes to dress the avatars (2), missed the possibility of changing the makeup color (1), or wanted the avatar to be transparent (1). Only one participant stated that the creation process was hard because of how frightened he/she was of the avatar. Finally, some of them would also have liked the avatar to have a voice.

Table 2 Statistics of questionnaire on the avatar creation process (ACP) (responses range from “1 - Strongly disagree” to “5 - Strongly agree”)

Fig. 5 Frequency of responses to each item in the avatar creation process (ACP) questionnaire. Responses from “1 - Strongly disagree” to “5 - Strongly agree”

The questions for the CAR questionnaire can be found in Table 3 together with mean and median values. Figure 6 presents the histograms with the frequency of responses for each item of the questionnaire. The patients liked the avatar having the appearance of a human being (100% of the responses between 4 -agree- and 5 -strongly agree- for CAR1). They also liked the way the avatar looked at them, gazing at the camera position while it orbited the avatar (93% of the responses agreed or strongly agreed for CAR2), and the way it turned its head toward the camera (97% between 4 and 5 for CAR3). CAR4 and CAR5 are related to the feeling of being in front of the virtual representation of their voices. 38% of the patients were afraid of the avatar (values of 1 or 2), and 38% were not (values of 4 or 5). Moreover, while 69% did not feel threatened by the avatar being on a computer screen (values of 4 or 5), a minority reported feeling threatened by it (14% responded 1 or 2). Finally, CAR6 and CAR7 were related to future interaction with the avatar. 90% of the patients would like to talk to the avatar (responses from 4 to 5), and 55% of them would even be willing to face the avatar. For this group of questions, 77% of the responses were agree (4) or strongly agree (5) (Table 3).

Table 3 Statistics of questionnaire on the created avatar result (CAR) (responses range from “1 - Strongly disagree” to “5 - Strongly agree”)

Fig. 6 Frequency of responses to each item in the created avatar result (CAR) questionnaire. Responses from “1 - Strongly disagree” to “5 - Strongly agree”

3.2 Avatars created

Most avatars were identified as human, a large part of them representing people close to the patient, for example, family members. The second most frequent type of avatar designed was that of mystical or religious figures such as demons, angels or the Virgin. Apart from that, some other patients identified their voices as coming from blurred and dark shadows.

The most prevalent characteristics in the created avatars were male gender, an age between 30 and 40 years, dark eyes, dark hair, dark clothes, prominent facial characteristics and a muscular physical complexion. The vast majority of participants designed their avatar at a very close distance from them. The most identified race was Caucasian, coinciding with the race of most of the participants. However, it is surprising that 8 patients designed avatars of other races (African and from the gypsy ethnic group).

A large majority of the participants agreed that “their voices” were ego-dystonic, so the process of creating the avatar was stressful. Some examples of avatars created by the participants using our tool during the experiment are shown in Fig. 7.

Fig. 7 Example of the avatars created by patients using our tool during this experiment

3.3 Therapists’ questionnaires

The results for the USE questionnaire are summarized in Table 4 and illustrated graphically with the histograms of Fig. 8. The mean value of the responses obtained for the Usefulness block of questions is 6.20 (SD = 1.02), which shows that therapists largely agreed with the usefulness of the avatar creation system (83% of the responses ranged between agree -6- and strongly agree -7-). The only item that differs is the question “Usefulness6: The avatar creation system would save me time”, in which only 60% of the responses ranged between 6 and 7. For the rest of the items, this percentage is over 75%. The result is similar for the Ease of use block (\(M=6.21\), SD = 1.09 and 84% of the responses between 6 and 7). The question that differs this time is “Ease of use4: It requires the fewest steps possible to accomplish what I want to do with it” (55% of the responses between 6 and 7).

Ease of learning is the block of questions that obtained the best results (\(M=6.53\), SD = 0.59 and 95% of the responses between agree and strongly agree). It is worth noting that the medians for the questions of this block are the highest possible (7). Finally, the Satisfaction block of questions also shows a high percentage of positive answers (81% for agree or strongly agree and \(M=6.09\), SD = 1.20). Only two questions have slightly different results, “Satisfaction5: The avatar creation system is wonderful” and “Satisfaction6: I feel I need to use the avatar creation system”. The first one had a percentage of positive answers of 75% (\(M=6.05\), SD = 0.89), and for the second the percentage goes down to 35% (\(M=4.45\), SD = 1.79), while it is 20% for negative answers (disagree -2- or strongly disagree -1-). This is the only question that obtains a percentage of negative answers over 5%. The remaining questions obtained a percentage of positive answers over 85%.

Table 4 Statistics of the USE questionnaire (responses range from “1 - Strongly disagree” to “7 - Strongly agree”)
Fig. 8 Frequency of responses to each item in the USE questionnaire. Responses from strongly agree (7) to strongly disagree (1)

The results of the technology acceptance questionnaire (UTAUT2) can be found in Table 5 and Fig. 9. Partial least squares structural equation modeling (PLS-SEM) is the multivariate technique commonly employed to study UTAUT2 models. However, even though it works well with small sample sizes, ours is well below the minimum recommended in the literature to achieve enough statistical power (Hair Jr et al. 2016). This prevented us from using PLS-SEM to analyse the captured data and draw conclusions about the influence of the constructs on Behavioral intention, and led us to use other techniques to infer some information from the results.
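To give a rough idea of the sample size issue, one commonly cited heuristic in the PLS-SEM literature is the “10-times rule”, which suggests a minimum sample of ten times the largest number of structural paths directed at any single construct. The sketch below applies that heuristic only; it is not a power analysis and not the criterion used in this study. The function name and the model structure (four constructs pointing at behavioral intention) are assumptions made for illustration.

```python
def ten_times_rule_min_n(paths_into_construct):
    """Minimum sample size under the '10-times' rule of thumb.

    paths_into_construct maps each endogenous construct to the number
    of structural paths directed at it.
    """
    if not paths_into_construct:
        raise ValueError("at least one endogenous construct is required")
    return 10 * max(paths_into_construct.values())


# Hypothetical structural model (an assumption): four constructs
# pointing at behavioral intention.
hypothetical_paths = {"behavioral_intention": 4}

required = ten_times_rule_min_n(hypothetical_paths)
available = 20  # number of therapists who took part in the study

print(f"required >= {required}, available = {available}")
print("heuristic satisfied:", available >= required)
```

Even under this lenient heuristic, a sample of 20 respondents would fall short for a model of that size; formal power analyses typically call for more.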

The descriptive statistics for behavioral intention show a relatively high agreement of the therapists with the three indicators (\(M=5.93\), SD = 1.44, with 67% of the responses within the range 6–7). The percentage of responses in the range 6–7 for each question is 82% for BI1, 64% for BI2 and 55% for BI3, while the percentage in the range 1–2 is 0% for BI1, and 5% for BI2 and BI3. This suggests that the respondents would intend to use the avatar creation system if it were available to them. Performance expectancy results (\(M=6.41\), SD = 0.85) show that therapists found the avatar creation system useful in their daily work (PE1), believing that it would benefit the patients’ progress (PE2), help accomplish objectives more quickly (PE3), and help them to be more productive (PE4). The results are similar for Effort expectancy (\(M=6.55\), SD = 0.61), indicating that learning (EE1, EE4) and using the system (EE2, EE3) was not a problem for the therapists, possibly because they felt they had the knowledge needed to use it (FC2). Hedonic motivation (\(M=6.52\), SD = 0.67) shows a high degree of enjoyment in the use of the avatar creation system. The percentage of responses in the range 6–7 is over 80% for all the indicators except PE1 (73%).

Table 5 Statistics of the UTAUT2 questionnaire (responses range from “1 - Strongly disagree” to “7 - Strongly agree”)
Fig. 9 Frequency of responses to each item in the UTAUT2 questionnaire. Responses from strongly agree (7) to strongly disagree (1)

4 Discussion

The current study aimed to assess the utility and usability of an avatar creation system to embody auditory hallucinations from the point of view of both the patients and the therapists. Although tools of this kind are not new and have recently been used in experimental therapies (Ward et al. 2020), we are not aware of any evaluation of the tools themselves beyond studies of the results of applying the therapy. Therefore, the strength of this study lies in the involvement of both patients and therapists in refining the tools that will support a future therapy employing avatars, ensuring that it is tailored to the needs of both types of stakeholders.

Patients found the system complete and useful for representing their voices, with the final avatar being very similar to how they imagined their voice would look. They found the movements of the avatars natural and had the feeling that they were face to face with the virtual representation of their voices. On the other hand, the avatar creation process was stressful, and most of the participants found the task difficult. In a low percentage of the participants (\(14\%\)), a marked discomfort was evident during the design, with ideas related to feeling threatened by their avatar. In these cases, they were offered the option of leaving the study; however, all decided to continue after a short break. This finding supports the design decision to use a desktop VR application as opposed to more immersive alternatives. As other authors have pointed out, the stressful task of creating the avatar of the hallucination is, in part, positive and necessary when performing an intervention, working with the content of the voices, and mobilizing the patient’s emotions (Leff et al. 2013; Craig et al. 2017).

Almost every patient indicated that they would be willing to have a conversation with the avatar, while slightly more than half of them would be willing to confront it with professional accompaniment in a controlled environment. This is a positive result, as the patients willing to move forward in the therapy will allow therapists to work with the embodied voices. As previous studies indicate, the “reification” of voices, that is, turning them into an object that can be modified during therapy (Leff et al. 2013; Craig et al. 2017), gives patients a greater sense of control and power over their voices. The results also show the engaging effect of using new technologies with patients.

The avatars resulting from the creation process are also in line with previous research about auditory delusions. Hallucinations can be operationalized as intrusive memories related to traumatic personal experiences from the past, especially in childhood (Kaney et al. 1999). In fact, beliefs about hallucinations are interpreted taking as reference both previous life experiences and the contents learned from the society that surrounds the patient (Morrison 2001). It is common for patients with schizophrenia to identify their voices with people from their closest environment (family, friends, neighbors). In our experiment, 8 out of the 29 avatars created represented real people. The influence of society is also often reflected in the characteristics of the avatars created. Clear examples are the influence of religion (including fear of the supernatural and hell) and the identification of the color black with “dirty” or “evil”, or with fear of coexistence with other races or ethnic groups.

Patients interacted with the avatar creation system through the therapists. Therefore, the usability of the system was assessed with the therapists using the USE questionnaire. The results for the four blocks of questions that compose this questionnaire were very positive, as the size of the blue and green bars in Fig. 8 shows. Accordingly, we can conclude that the therapists found the system useful for the treatment of patients with schizophrenia, easy to use and learn, and fun, highlighting their intention to use it when considering it effective for patients with schizophrenia and resistant voices.

A slightly lower score was obtained for the “Usefulness6” item; that is, a sizeable share of therapists did not agree with the statement that the tool would save them time in treating patients (15% felt neutral -gave a score of 4- and 20% somewhat disagreed -gave a score of 3-). This can be explained by the fact that the voice avatar therapy requires a number of sessions to be carried out in parallel with the usual therapy sessions. This “extra” time could nevertheless save time in the medium to long term in treating the positive symptoms of patients with schizophrenia. Similarly, “Ease of use4” obtained a less positive score than the rest of its block: 20% felt neutral, and 10% considered that the application could have been designed with fewer steps to achieve the final avatar design. Finally, regarding the item “Satisfaction6”, 35% of the therapists indicated that they did not feel the need to use the avatar creation system (answers between 1 and 3). This may be because the full therapy is not yet implemented and, despite finding the tool useful, they are unsure of the results they could ultimately achieve.

The intention to use this technology was also evaluated, since none of the therapists involved in the experiment had any previous experience with it. Performance expectancy is normally considered the strongest predictor of behavioral intention (Venkatesh et al. 2012). A close look at the results for this construct shows that most of the responses lie within the range 5–7, with median values of 6 and 7. This suggests that the majority of the respondents considered the performance of the system to be adequate for its use with patients and capable of making progress with them. Similar conclusions can be drawn from Effort expectancy. The answers to the Ease of learning questions are even more positive (median of 7). This indicates that learning to use the system was easy (no one had previous experience with it) and that the interaction was simple and clear, which is also in accordance with the results obtained from the USE questionnaire.

The only indicator of the Facilitating conditions construct is also related to use: it was easy for the therapists to get used to the system because they had the knowledge needed to use it. Linked with these results, Hedonic motivation, despite being associated with consumer behavior and being considered to have less impact in determining technology use (Venkatesh et al. 2012), still shows that the respondents enjoyed using the avatar creation system, which reinforces the results on its utility and use. Therefore, we can conclude that therapists found the system useful, simple and fun, highlighting their intention to use it when considering it effective for patients with schizophrenia suffering from resistant voices.

4.1 Limitations

Although the results of evaluating both the patients’ and the therapists’ experience are promising, the sample size (29 patients and 20 therapists) limits the inferences that can be drawn from these results. Despite recruiting patients from several hospitals in different Spanish locations, the sample size compromises the possibility of generalizing the findings. The same applies to the therapists, so a wider experiment involving a greater number of them would be desirable. Moreover, this prevented us from using multivariate techniques to draw conclusions about the impact of the UTAUT2 constructs (or other variables such as age or experience) on the behavioral intention. Further knowledge about this would help to understand how to improve the avatar creation system in order to obtain better results from its application.

Regarding the questionnaires and the data analysis, even though most of the study was based on valid and reliable questionnaires (USE and UTAUT2), custom-made questionnaires were created to evaluate the experience of the patients with the avatar creation system. Therefore, further research is needed to establish their validity and reliability.

In addition, some limitations on the avatar creation system were identified by the patients during their use of the system. The most relevant one is the lack of voice, as some of the patients pointed out. This capability of the system was left out of this experiment on purpose, as the main objective was to evaluate the creation of the physical appearance of the avatar. Therefore, the next step in the creation of our tool, before it can be used in therapies, would be to provide the avatars with a voice that could be customized for each patient. We believe this is a paramount step in the creation of a useful therapy and will be addressed in the near future.

Patients also missed the possibility of selecting a different look for the eyebrows, more hair styles, makeup customization, a wider range of clothing to choose from and the inclusion of accessories (such as a police officer hat). We will use this valuable feedback to improve our avatar creation system.

5 Conclusions

This paper has introduced an avatar creation system aimed at embodying the auditory hallucinations of patients with schizophrenia. Although avatars have already been used in experimental therapies (Leff et al. 2013), the tools used have not been evaluated in depth by either patients or therapists. As in any other software development discipline, the involvement of the end users in the development of the tools is vital to obtain usable tools and, ultimately, to achieve their acceptance.

The main objective of this paper was to gain insight into the acceptance of tools of this kind among medical practitioners, as well as to evaluate the avatar creation process and its result from the perspective of the patients. Therefore, this work has described the evaluation carried out on our avatar creation system.

The hallucination avatar creation tool was administered to 29 patients diagnosed with schizophrenia and active voice hearing, and a total of 20 therapists agreed to participate. Our tool showed promising results in terms of acceptance and credibility among patients with schizophrenia. In the case of the therapists, the results were equally positive in terms of usability and, more importantly, in terms of their intention to use the proposed tool. While usability provides information on whether the users can achieve their goals efficiently, effectively and satisfactorily, assessing acceptance is even more important when a new technology is being developed. Such assessment indicates whether the beneficiaries will eventually use it, that is, whether they will accept such technology and incorporate it into their daily work with patients.

In terms of future work, the effectiveness of this approach should be examined in a larger study before further dissemination. Moreover, motivated by the promising preliminary results presented in this paper, we will continue the development of our cognitive therapy tool for stressful voices. The therapy will make use of the avatar creation system described here to embody the patient’s hallucinations and help them gain more control over the voices they hear. This includes, as suggested by some participants in the experiment, enhancing the avatar creation tool with the design of the avatar’s voice. Other feedback gathered during the experiment will also be used to improve the tool.

Since auditory hallucinations are prominent in many psychiatric disorders, the proposed tool could be adapted to treat other mental disorders like bipolar disorder, borderline personality disorder, major depressive disorder, posttraumatic stress disorder and schizoaffective disorder, where hallucinations are also relatively common. Moreover, not all auditory hallucinations are associated with mental illness, and studies show that a range of organic brain disorders is also associated with hallucinations, including temporal lobe epilepsy, delirium, dementia, focal brain lesions, neuroinfections and cerebral tumors (Waters 2010). This fact opens an important line of research beyond schizophrenia and other mental illnesses, reaching a potentially considerable number of people.