4.1 Psychological Therapies for Voice Hearing: An Overview

While attempts to understand voice hearing need to acknowledge the complexity and diversity of the experience (Woods et al. 2014), the majority of hearers describe voices that take the form of a characterized “other” with whom a personally meaningful relationship develops (Beavan 2011; McCarthy-Jones et al. 2014). AVATAR therapy is part of a new and exciting wave of therapies which adopt an explicitly relational and dialogic approach to working with distressing voices. To understand the AVATAR approach, it is important to consider its position in the evolution of psychological interventions for distressing voices.

Developments in psychological approaches to working with voices have been articulated in a recent review by an international collaboration of experts (Thomas et al. 2014). Following an early focus on functional-analytic approaches such as coping strategy enhancement (Tarrier et al. 1993), cognitive conceptualizations of psychosis came to prominence (Garety et al. 2001; Morrison 2001). A key premise of cognitive models, in keeping with a continuum view of psychosis, is that the presence of voices in isolation is not sufficient to determine the transition to clinical psychosis (i.e., “need for care”), a position that is supported by evidence of non-distressing voices in the general population (de Leede-Smith and Barkus 2013; Johns et al. 2014). Instead, cognitive models propose that beliefs and appraisals play a central role in the development and persistence of positive symptoms of psychosis (Garety et al. 2007). Put simply, the way in which individuals make sense of, and respond to, their voices can determine whether voices remain benign (even life-enhancing) or alternatively result in distress, impairment, and a need for clinical care. Drawing on work in the field of anxiety disorders, Morrison has conceptualized distressing voices as occurring when “intrusions” into awareness are subject to “culturally unacceptable” misinterpretation, a stage of meaning-making that is influenced by the person’s prior life experiences together with beliefs about the self, the world, and others (Morrison 2001). Seminal early work by Paul Chadwick and Max Birchwood has demonstrated that beliefs about voices (specifically regarding identity, power, intention, and control) are key to understanding distress and maladaptive responding in the context of voices (Birchwood and Chadwick 1997; Chadwick and Birchwood 1994).
Factors such as mood and physiology, safety behaviors (including hypervigilance), meta-cognitive processes, and faulty self- and social knowledge are viewed as key maintenance processes which fuel distressing appraisals and beliefs about voices (Morrison 2001). Consequently, a range of cognitive-behavioral therapy approaches have been developed, which intervene at the level of individual “meaning-making” (i.e., beliefs and appraisals) and target the key maintenance processes outlined above (Thomas et al. 2014).

In more recent times Birchwood and colleagues (2000, 2004) have integrated their original cognitive model of voices with a “social mentalities” approach which proposes that humans have evolved mechanisms for recognizing dominant-subordinate interactions, i.e., their social rank (Gilbert and Allan 1994; Gilbert et al. 2001). Beliefs about the power of the voice are essentially viewed as a differential judgment the hearer makes regarding their power (or more usually lack of power) in relation to the voice, i.e., a relational judgment. Individuals who have experienced powerlessness and inferiority in social relationships have been found to be more likely to report similar experiences during the voice interaction (Birchwood et al. 2000). It is argued that negative experiences within social relationships establish social schemata that drive the subsequent appraisals of voices and ultimately lead to significant levels of distress and depression (Birchwood et al. 2004). Recent reviews have provided support for the hypothesis that social schema may mediate the appraisal-distress relationship with the implication that therapies could benefit from targeting social and interpersonal variables (Mawson et al. 2010; Paulik 2012). These theoretical developments have informed a specific cognitive therapy for command hallucinations (CTHC; Birchwood et al. 2014; Trower et al. 2004). A randomized controlled trial of this approach (the COMMAND trial; Birchwood et al. 2014) has recently reported a reduction in the rate of compliance behavior to the voices compared with the treatment as usual group (odds ratio 0.45) along with an associated reduction in the specific treatment target (the power difference between the perceived threat of the voice and the hearer’s ability to mitigate this threat). 
Interventions such as CTHC answer an identified need for the development of more targeted therapeutic approaches to specific experiences and symptoms of psychosis and the putative mechanisms of persistence and distress (Garety and Freeman 2013; Thomas et al. 2014).

The approach of Birchwood and colleagues provides a bridge between early formulation-based approaches centering on intra-psychological (cognitive, affective, and behavioral) processes and a new wave of relational approaches which focus on the interpersonal relationship between the voice-hearer and the voice (Corstens et al. 2012; Hayward et al. 2014; Leff et al. 2013). Relating therapy (Hayward et al. 2009) specifically applies Birtchnell’s (1996) interpersonal model to the voice-hearer relationship, identifying key interpersonal dimensions of power and proximity (Birtchnell 1996). While interpersonal power can be viewed as analogous to the social rank characterization of dominant-subordinate interactions, maladaptive relationships along the proximity dimension are defined by opposing poles of “withdrawal/self-isolation” versus “over-involvement/intrusiveness” (Hayward et al. 2011). Therapy begins by exploration of similarities between the person’s relationship with their voice and other social relationships (in line with the mirroring noted above by Birchwood and colleagues 2000). Following this awareness-building stage, sessions move on to explore different ways of relating to the voice using assertiveness training (including role-play and empty chair work; Chadwick 2006) with the aim (again consistent with CTHC) of increasing the person’s appraisal of control within the relationship.

“Talking with voices” (Corstens et al. 2012) is a relational approach which emphasizes the importance of understanding voices (and voice relationships) within the person’s biographical context (Longden et al. 2012). Voices are understood as a reflection of conflict within a person’s life story, a conflict that becomes manifest in the voice that the person hears. The approach involves a “facilitator” engaging in dialogue with the voice(s), asking direct questions the answers to which are relayed back via the voice-hearer. Rather than seeking to eradicate the experience of voice hearing, or indeed target specific cognitive mechanisms, this approach aims to provide an opportunity to resolve social-emotional dilemmas in order to achieve a sense of acceptance or mastery over previously distressing, disempowering experiences.

4.2 AVATAR Therapy: History, Method, and Evidence So Far

AVATAR therapy (Leff et al. 2013) is a recent relational approach which draws on the theoretical and clinical developments outlined above, within the context of a novel therapeutic milieu. Using specially designed computer software, the clients create a visual representation of the entity (human or nonhuman) that they believe is talking to them. Additional software is used to transform the voice of the therapist to match the pitch and tone of the voice heard by the person; the two processes are finally combined to produce a computer simulation (a virtual agent or “avatar”) through which the therapist can have a dialogue with the person. In addition to the time taken to create the avatar, therapy comprises approximately six 45-minute sessions, of which around 15 minutes is spent in dialogue with the avatar. The therapist (sitting in a separate room to the participant and communicating through linked computers) promotes a dialogue between the participant and the avatar, one goal of which is that the hearer will experience more power and control within the relationship (Leff et al. 2014). The sessions are audio recorded and provided to the participant on an MP3 player for continued use at home.

AVATAR therapy can be embraced within the generation of virtual reality-based psychological therapies using technology to integrate real-time graphics, sounds, and other sensory inputs to create a computer-generated world with which the user can interact (Gregg and Tarrier 2007). Although AVATAR therapy is not provided in a complex immersive environment, the platform uses virtual reality to create and allow the person to access and visualize the abstract nonphysical information of his/her voice. One could define it as a virtual embodiment of the experience: to give a physical representation to the personified but disembodied voice. This visualization of the voice may facilitate two essential processes in the AVATAR therapy: (a) validation of the experience and (b) the flow of dialogue with the voice through the sessions while modifying the type of relationship between the voice and the participant. This virtual embodiment of the experience is achieved by matching the voice of the avatar to the current auditory hallucination and, in early sessions, by the avatar using verbatim statements from the voice, as reported by the voice-hearer. These add realism to the experience and seem to be a key aspect of the therapy.

Morrison and colleagues showed that approximately 75 % of people with psychosis could identify images that occurred spontaneously in relation to their voices (e.g., having an image of the perceived source of a voice when hearing it) (Morrison et al. 2002). They also reported that some of the voice-hearers used their images as evidence to support their beliefs about voices (e.g., believing that a voice is omnipotent, powerful, and omniscient because they have a concurrent image of God or the Devil). They concluded that working with these images, for example, altering the content or meaning of the image, could result in a reduction of the distress associated with them and even increase the sense of control over the images (Morrison 2010). The exposure to the experience of seeing an image and hearing an avatar uttering the same statements as the voice in therapy sessions, along with the modification of the relationship with the voice, may be contributing to the reduction of the voice’s associated distress and to the disconfirmation of maladaptive beliefs about the voice. The mechanism of this may be anxiety related: the therapy may be reducing cognitive avoidance of fear-relevant information (i.e., the voice and its content) and also reducing anxiety as a direct consequence of exposure (Foa and Kozak 1986). Re-listening to MP3 recordings of each dialogue between sessions may facilitate this exposure process.

In line with the early theoretical work outlined above, the voicing/characterization of the avatar reflects a detailed understanding of the person’s beliefs about the voices (e.g., regarding identity, power, intention, and the consequences of resistance; Chadwick and Birchwood 1994). This includes an assessment of how a person’s cultural background influences what participants think is the origin of their voices, which also informs how the therapist should enact the avatar in dialogue with the participant. Therapists on the trial have been required to enact spiritual entities located within different systems of beliefs, sometimes in combination (including among others Islamic, Christian, Spiritualist, and region-specific African and Rastafarian beliefs). In addition, avatars representing characterized people gain validity when voiced to reflect the cultural norms of the experienced other (this typically proceeds via a synthesis of the therapist’s existing cultural competence together with assessment of the beliefs and assumptions of the voice-hearer). A specific example was the enactment (by author TC) of an avatar viewed as a local Rastafarian drug dealer, where it was important that the avatar was voiced from a position reflecting the basic tenets of Rastafarianism, specifically views about cannabis. Differing representations of the self and related beliefs about the reason for persecution (Trower and Chadwick 1995), together with comorbidity of depression and associated negative self-schemata (Vorontsova et al. 2013), also impact on the nature of the dialogue (e.g., assertiveness work and relinquishing of a “victim role” or work on attribution of guilt and self-blame). Within the AVATAR therapy approach, the person’s relationship with their voice is fundamentally viewed in the context of their current and previous significant relationships (Birchwood et al. 2000, 2004).
The possible role of early trauma is sensitively addressed from the first meeting and in line with the “talking with voices” approach (Corstens et al. 2012); unresolved social and emotional issues that may be relevant to the person’s experience of voice hearing are considered throughout the therapy. The nature of the relationship as it varies along dimensions of interpersonal power and proximity (Birtchnell 1996) also influences the evolving dialogue. While all dialogues (particularly early sessions) involve negotiation of a transfer of power and control from voice/avatar to hearer, relationships characterized by “withdrawal” require an initial “turning to face” the previously avoided experience, while “clinging” relationships typically necessitate a process of disengagement (i.e., “not getting drawn in” to what might be termed as the habitual “dance of distress”). Such strategies share some commonalities with an acceptance and commitment therapy approach to working with psychosis (Bach and Hayes 2002; Gaudiano and Herbert 2006) insomuch as relationships characterized by “withdrawal” and “clinging” could be viewed as involving unhelpful levels of experiential avoidance and cognitive fusion, respectively. Following the initial assertiveness phase, the avatar’s character gradually changes to become conciliatory or even helpful. This initiates a second phase which focuses on issues of self-esteem and identity, work that is consistent with other recent approaches emphasizing the importance of self-esteem and self-compassion in working with distressing voices (Mayhew and Gilbert 2008; van der Gaag et al. 2012). Specific work on self-esteem typically includes asking the person to get friends and family to provide a list of their strengths and best qualities which can then be used in dialogue with the avatar. 
For some people the extent of current social isolation means that it can be difficult to identify someone to provide the list (in such cases it can be obtained from a trusted professional or the therapist may “work up” the list in collaboration with the person). For those who can identify someone to provide a list, the simple act of hearing a positive view from someone else can be a powerful (and surprising) experience. For others the discussion of positive qualities triggers embarrassment and awkwardness, and for some hearing positive qualities spoken aloud can seem an almost aversive experience (reflecting, in our view, the extent of the dissonance between this positive information and the ingrained negative view of the self). Given these potential challenges, as in the earliest assertiveness sessions, it can often be necessary to engage in preparatory role-play with the therapist before attempting to raise the topic with the avatar. The final sessions of AVATAR therapy often involve discussion around hopes for the future and are influenced by consideration of the personal meaning of recovery in the context of the voice hearing experience (Romme et al. 2009).

An abbreviated outline of the evolution of a typical dialogue is given in Fig. 4.1.

Fig. 4.1 Example of a dialogue

In an initial pilot study (Leff et al. 2013), 26 patients were randomized to therapy (n = 14) or a waiting list control group (n = 12). Therapy was provided for a maximum of seven sessions lasting 30 minutes. While the control group reported no change over time, those receiving AVATAR therapy reported an average reduction of 8.7 points (p = 0.0003) in the total score of the PSYRATS-AH rating scale for auditory hallucinations (Haddock et al. 1999) with three participants reporting a complete cessation of voices. Participants in the therapy arm also reported an average 5.9 point (p = 0.0004) reduction in scores on the omnipotence and malevolence subscales of the revised Beliefs About Voices Questionnaire (BAVQ-R; Chadwick et al. 2000).

AVATAR therapy is currently being examined in a larger, well-powered, methodologically rigorous clinical trial (n = 142), in which a comparison is made between the effects of AVATAR therapy and supportive counseling, the control group chosen to take account of nonspecific elements of therapy exposure (ISRCTN: 65314790). Early qualitative impressions from the trial therapy team indicate that the virtual reality aspects of the setup, fostering a sense of “presence,” facilitate a dialogue whereby affect is “on line,” with participant reports of high ecological validity of the avatar. Some respondents report that the experience with the avatar is “100 % like hearing my troubling voice,” potentially conferring benefits over existing helpful techniques such as role-play and “empty chair” work (e.g., Chadwick 2006; Hayward et al. 2009). In order to record and evaluate this reported verisimilitude, we have incorporated an adapted version of the Sense of Presence Questionnaire (Slater et al. 1994) that evaluates the participant’s sense of “hearing the voice” and their perception of the avatar as their “voice talking to me.” As we are also interested in the persecutory experience and level of anxiety when confronting the avatar, we have adapted the State Social Paranoia Scale (Freeman et al. 2007) to measure persecutory and positive thoughts about the experience (e.g., “the avatar was trying to irritate me” and “the avatar was friendly towards me”). Visual analogue scales are used to capture reported anxiety and perceived hostility of the avatar at the end of every therapy session.

4.3 Challenges of AVATAR Therapy

For good or bad (or sometimes both) the relationship with the voice often forms a key part of the person’s life and in many cases represents the main source of current social relating. As such, the meaning and implications of changes in this important relationship require sensitive, open-minded discussion between the voice-hearer and therapist as part of the person’s engagement with the therapy. While the pilot study provided evidence of reductions in voice frequency and intensity, AVATAR therapy, in common with other psychological approaches and hearing voices networks, targets the reduction of distress and disruption to the life of the voice-hearer. Ultimately the aim is for the person to begin to experience a sense of power and control within their relationships (with their voice and other people) such that they emerge more confident in their ability to navigate their social world and engage with the possibility of a different, more positive future.

As is apparent from all we have said to this point, a key component of therapy is the ability of the therapist to understand the nature and possible purpose of the person’s voice and to deliver a “realistic” enactment of this entity during the dialogue. In the initial sessions, the therapist is required to use verbatim statements delivered with the prosodic features (including tone and rhythm) and force that the hearer usually hears from their persecutory voice. This presents a number of immediate challenges. The necessity to speak these typically abusive, threatening, and overtly hostile comments (including racist terms) directly to the person (albeit via a modified voice transform) sits uneasily with all the instincts and training of therapists. Early sessions aim to strike a balance between creating and dialoguing using a realistic representation of the voice experience while ensuring that the person feels sufficiently safe to approach something which may feel frightening and trigger concerns about possible voice retaliation. Getting the balance right can be tricky. On the one hand, there is the risk that the person is unable to tolerate the session (for example, one person terminated a session saying “…I have to put up with this rubbish day and night; this is just too much…”); on the other hand, there is the risk that the enactment is so mild that the experience is perceived as contrived and unrealistic (for example, prompting a comment such as “oh, my voice would never speak like that…”). In practice such occurrences have been rare, probably because a great deal of effort is put into preparing the participant for the sessions including role-play and rehearsal of responses before the first encounter with the avatar. A related challenge is presented when the voice is experienced with a particular accent, where again the immersive reality of the experience is enhanced if the therapist is able to do a fair imitation of the accent.
Interestingly, when the balance is right, the immersive experience appears remarkably high with several participants commenting that they felt they were really in a dialogue with their voice.

A typical therapy experience for participants who engage with the approach involves some initial (often marked) anxiety (in particular preceding the first confrontation with the avatar) followed, during the first session debrief, by a reported sense of relief, achievement, power, and even liberation. Over time the reported in-session anxiety typically reduces and the participant is able to reflect with the therapist on a significant challenge which has been faced and overcome. For participants who choose not to continue with sessions (to date approximately 20 % of those who attend at least one session), reported reasons for discontinuation are varied (and in some cases simply logistical). Withdrawal factors that are related to the therapy typically involve the person finding early sessions overly stressful (including in rare instances increased hostility, threats, and commands not to continue from voices) or the participant not seeing how the approach could help in terms of their voices. It is worth noting that a similar dropout rate is seen in the supportive counseling control group (approximately 18 %) and is comparable to that seen in several other exposure-based therapies, including those for PTSD (Imel et al. 2013; van den Berg et al. 2015).

One of the most significant challenges for the therapist is the transition from the initial, largely verbatim sessions where the task is mainly to establish a fair simulacrum of the voice hearing experience toward a second more dialogic phase in which the character of the avatar shifts to being less threatening and more considerate of the individual. This dialogic shift has to be appropriately timed, should be in response to changes in the preceding dialogue, and should be based on the therapist’s understanding of the participant’s key beliefs about the origin, nature, and function of the voice in their life. A key task for the therapist (in keeping with cognitive approaches to working with psychosis more generally) would be to determine whether the evolving dialogue is situated “within the belief” (e.g., someone definitively identifying the voice as caused by an external entity, e.g., a demon, a school bully, or a drug dealer) or whether the dialogue is evolving toward an understanding of the voice experience as having its origin within the self (e.g., the identification of the voice/avatar as representing low self-esteem or “memory echoes” of past bullying/abuse/trauma). In the former case (i.e., “working within”) a rationale for the diminishing presence or power of the “other entity” is negotiated, e.g., the bully who accepts the person is now too strong to be pushed around. In the latter case (i.e., a more internal attribution of voice) an understanding that the avatar/voice content represents “the negative things I think about myself” can be developed with the implication that the necessary change is for the person to begin viewing themselves in a more positive and compassionate way (this is framed as a process of change as opposed to a “quick fix” particularly for the many participants in the trial with significant abuse and bullying histories). 
The therapist aims to avoid “forcing” the dialogue into one direction or the other but rather adapts their approach to connect with the person’s evolving understanding of their voice and what would constitute a positive change in the relationship with their voice.

It should be noted that even assuming it is possible to deliver a realistic voice hearing experience, the task of transforming the avatar experience to becoming less hostile and more under the control of the voice-hearer is no guarantee that the actual voice hearing experience will similarly moderate. In cases where the avatar has transitioned, while the day-to-day voice remains hostile, the avatar typically suggests the person tries the strategies that worked in earlier sessions with their day-to-day voices reinforcing key messages that have emerged from the dialogue (e.g., relating to the person’s strength, resilience, and positive qualities). Throughout the trial, the therapy team has considered other potential adverse reactions to the therapy including the risk that the avatar voice becomes incorporated in a negative way into the voice-hearer’s experience/beliefs or that the avatar computer system is seen as the source of the voice hearing. Neither of these has yet been observed though in one instance a person reported hearing their avatar’s voice in a helpful way outside of a session. A number of people have also reported completely new, benign/reassuring content from their voices by the end of therapy which they view as a positive change. As is customary in any clinical trial of a new therapy, all possible negative outcomes are recorded and monitored and will form a key component of the final report of the trial.

4.4 The Future: Implementing AVATAR Therapy in Routine Care

The current clinical trial is being provided in NHS facilities and the majority of the participants are receiving continuing care from secondary psychiatric services. The number and spacing of sessions are such that the therapy is easily fitted in to the wider care program and is delivered alongside routine case management and medical treatment. At present the delivery system is fairly cumbersome, requiring the fixed installation of two desktop computers that are hardwired, but in fact, the software is capable of running on laptops or tablet computers over the hospital intranet given the appropriate data protection and governance permissions. With such a configuration it would be entirely feasible to incorporate this therapy into routine outpatient clinical settings, and sessions could be tailored to integrate with other components of overall care. This is indeed the future pathway envisaged. A component of the current project led by our colleagues in University College London is the development of a portable multi-platform system, available for future research and ultimately clinical use. This more flexible system should be available in 2016.

The larger potential barrier to routine implementation lies in identifying, training, and supervising clinical staff to deliver the therapy. This represents a major challenge more generally for psychological therapies for psychosis (Haddock et al. 2014; Prytys et al. 2011). All the AVATAR therapy to date has been delivered by very experienced clinicians, all of whom have considerable prior training in psychological therapies, and the group meets regularly for peer supervision, which is essential. Initial training involved each of the therapists working with two patients outside of the clinical trial and was provided by Professor Julian Leff against an outline manual that has subsequently been elaborated as we all gain experience across the trial. All therapists come from a background where the clinical formulation of a person’s problems is seen as essential for therapy to proceed, and this has undoubtedly determined the elaboration of the therapy model as we deliver it.

It is difficult to envisage AVATAR therapy being delivered by novice therapists without competency in clinical formulation and familiarity with a variety of psychotherapeutic techniques. On the other hand, our current “homework” task of listening to MP3 recordings of the therapy sessions could certainly be enhanced through the use of more sophisticated tablet-based software that included the visual imagery (e.g., using augmented reality on a smartphone or tablet) and, perhaps in time, also could be programmed as a self-help top-up to practice the use of key assertive phrases in standing up to prespecified content. We are of the opinion that these represent methods of augmenting the standard one-on-one delivery of the therapy rather than offering a separate self-help alternative given the often strong emotional responses and the consequent importance of in-session monitoring and therapeutic work before and after the active dialogue.

In addition to the use of this therapy as a “stand-alone” intervention for people with diagnoses of schizophrenia and other psychoses, our experience suggests that the approach could be easily adapted for voice-hearers with other conditions. We also believe that AVATAR therapy may be a very helpful component of a broader therapeutic approach, included, for example, within a typical 16-session course of CBT where the voices reflect just one component of the individual’s experience. For example, during the trial training phase, TW saw an individual for 6 sessions of AVATAR therapy following a period of individual CBTp (approximately 20 sessions), which had taken place around 6 months earlier. The participant and therapist experience suggested that the two approaches operated in a complementary fashion with benefits that generalized from the voices to broader distressing persecutory beliefs.

Another frequently asked question is whether AVATAR could be helpful for people with psychosis who do not want to take medication. The inclusion criteria for the current trial are that participants hear voices despite continuing to take medication. As a result, we have excluded a small number of referrals of young people from early intervention services who were being managed off medication. We believe that this cautious approach is the right one at this stage of development of AVATAR. Should evidence from a number of trials show a low risk of adverse clinical effects, it would be appropriate to move toward a carefully conducted clinical trial as has been implemented for CBTp (Morrison et al. 2015). Another important though rather obvious consequence of the approach is that it provides a unique opportunity for the participant to share the voice hearing experience with the therapist and others – a feature that has been commented upon favorably by several participants, a number of whom have decided to play the sessions to friends and families. Working with and through an avatar provides an opportunity for the therapist to reflect on what living with such hostility on an ongoing basis might actually be like. In this way, empathy is taken from an abstract clinical plane and brought closer to the experience of people living with distressing voices.

4.5 Conclusion

AVATAR therapy is part of a new and exciting wave of therapies which adopt an explicitly relational and dialogic approach to working with distressing voices experienced by individuals suffering from psychosis. Although this work needs replication in future trials, initial data on the use of this novel approach offer many opportunities both in terms of the delivery of therapy and in elaborating our understanding of the phenomenology of “voice hearing.” AVATAR therapy also has the potential to be applied to other mental health disorders and conditions, although this will of course require further work and adaptation.