1 Introduction

When qualified as “subjective”, tinnitus is an auditory perception experienced in the absence of any external or internal auditory stimulus (perception of a sound without a recordable source). Subjective tinnitus (ST) must be differentiated from objective tinnitus, which is induced by the perception of an internal vascular or muscular bruit that can actually be recorded (Shulman 1987). ST is generally a simple percept, commonly described as a hissing or a buzzing. It cannot be clinically demonstrated (Savastano 2004). ST is a very common symptom, affecting at least 10% of the general population, and may become disabling when chronic. Approximately 1% of the population is very severely affected (Axelsson and Ringdahl 1989). For such patients, impairment of daily activities, attention deficit (Delb et al. 2008), sleep disorders (Cronlein et al. 2007) and mood disorders induced by tinnitus perception have a major negative impact on quality of life (Bartels et al. 2008) and impose a substantial economic and social burden (Vio and Holme 2005). Despite its high incidence, the pathophysiology of ST remains incompletely understood (Bauer 2004), and causally oriented treatment is still lacking, even though various kinds of pharmacological agents have been proposed to treat ST, with poor evidence-based validation (Dobie 1999).

Nonetheless, it emerges that ST perception is the consequence of central reorganization within the cortico-subcortical neural circuitry (Møller 2001) linked to deafferentation after peripheral cochlear or auditory nerve damage (Eggermont 2007). The involvement of auditory and non-auditory cerebral structures is then related both to the conscious perception of ST and to ST-related distress or associated neuro-psychiatric symptoms (de Ridder 2005). Similar clinical patterns are notably present in post-amputation chronic pain syndrome (Møller 2007). Following this analogy, and because techniques of immersion in virtual reality (VR) have demonstrated theoretical and practical value in the treatment of chronic pain (Cole et al. 2009; Murray et al. 2007), we thought it legitimate to adapt these techniques for patients with ST. The purpose is to act on subcortical mechanisms of integration by allowing the patient to willingly manipulate his tinnitus in a visual and auditory 3D virtual environment, in order to control or “master” it.

The aim of this paper is to describe this first attempt to apply 3D visual and auditory VR environments to unilateral subjective tinnitus sufferers. We will first describe the motivation for tinnitus treatment by explaining tinnitus related distress pathophysiology. Then, we will focus on VR in health care as a framework to our project, with special interest in clinical situations analogous to those experienced by tinnitus sufferers. We will then propose how VR techniques could be successfully adapted to tinnitus sufferers and describe the VR setup specifications and the different virtual environments (VEs) that were developed. The paper ends with the description of the future clinical trial, which will be conducted to test 3D visual and auditory VR as a new therapeutic tool for unilateral subjective tinnitus sufferers.

2 Subjective tinnitus pathophysiology

ST, which develops in the course of numerous otological diseases, is most often associated with hearing loss (>80% of cases; Weisz et al. 2006; Norena et al. 2002). This high incidence of associated hearing loss can be explained by the presence of peripheral lesions (i.e., of the cochlea or auditory nerve) in the main pathologies causing ST (e.g., sudden hearing loss, acute or chronic noise-induced hearing loss, presbyacusis, Ménière’s disease) (de Ridder et al. 2004; Mrena et al. 2007; Herraiz et al. 2006). Even though this peripheral damage is clearly responsible for the onset of ST (Eggermont 1990), the mechanisms that produce and sustain chronic disabling ST are probably multiple and still poorly understood (Husain 2007). There is currently a growing consensus that dysfunctional neural plasticity processes are involved in the pathophysiology of chronic ST. The previously mentioned analogies with phantom limb pain suggest that chronic ST is a “phantom auditory perception” reflecting the maladaptive efforts of auditory and non-auditory brain circuitries to adjust to auditory deafferentation (de Ridder and van de Heyning 2007). Several recent functional imaging studies using MEG (Weisz et al. 2005), PET (Eichhammer et al. 2007) and fMRI (Smits et al. 2007) have demonstrated that ST is associated with neuroplastic alterations in the central auditory system and/or functionally connected areas. Electrophysiological studies have shown increased firing rates and neuronal synchrony, associated with reduced alpha and enhanced gamma activity within primary and secondary auditory cortices, that could be correlated, in humans, with ST-related psychological distress (Weisz et al. 2007).

Moreover, again in close analogy with chronic pain syndromes, incapacitating ST is often associated with stress (Welch and Dawes 2008) and psychopathological conditions of the anxiety–depression type, inducing avoidance or anticipatory behavior and even phobic reactions (Belli et al. 2008). This could be related to the involvement of non-auditory cortical areas, as described by various global neurophysiological or psychological models of tinnitus generation. According to the seminal neurophysiological model described by Jastreboff, chronic tinnitus and the resulting discomfort are understood to result from an acquired central mechanism, triggered by a complex pathological process (cochlear lesion + stress) and then sustained by an automatic system of detection and permanent analysis of the signal (reinforcement) (Jastreboff 1990). Chronic activation of the limbic circuits (emotion, memorization) is thought to lead to a conscious perception that becomes progressively more uncomfortable. Based on this model, a therapeutic protocol called tinnitus retraining therapy (TRT) aims at enhancing “habituation” (i.e., the global process of neglecting meaningless signals) (Jastreboff and Jastreboff 2006). Counseling to make the problem more manageable, combined with prolonged daily exposure (8 h a day for 12–24 months) to white noise, is supposed to “retrain” the central auditory pathways through neural plasticity and thereby alleviate the discomfort related to tinnitus and hyperacusis (i.e., intolerance to normal environmental sounds) (Jastreboff 2007).

On the other hand, more recent psychological models have underlined the relevance of cognitive distortions or negative automatic thoughts (“I cannot help it”, “I have lost silence forever”, etc.) and of the consequent inappropriate behaviors (use of ear plugs, anxious or phobic reactions to ST perception) that promote the persistence of ST-related discomfort (Andersson and McKenna 2006). For this reason, cognitive and behavioral therapies have been widely and successfully included in the multidisciplinary therapeutic management of tinnitus patients (Caffier et al. 2006; Londero et al. 2006).

In summary, the following factors seem to be implicated in the pathophysiology of ST:

  • Peripheral lesions (cochlea, auditory nerve),

  • Hyperactivity of the auditory pathways sometimes associated with excessive synchronization of influx and/or failure of inhibitory mechanisms,

  • Reorganization of the auditory cortex as a result of chronic auditory pathway malfunctioning,

  • Attention capture and psychological distress as a correlate of non-auditory brain areas involvement.

3 Virtual reality in clinical therapy and rehabilitation

Most often, VR integrates real-time computer graphics, body tracking devices and visual displays to immerse a user in a computer-generated VE. Other sensory interfaces, such as force or tactile feedback systems, can also be used. All these interfaces enable the user to become an active participant within a virtual world and give the user a sense of presence in the virtual environment (see Loomis 1992 for a phenomenological description of presence). The setting in which the user performs an action can be controlled by the experimenter, recorded and measured. The unique features and flexibility of VR give it extraordinary potential for use in health research.

For instance, VR has been employed as an alternative for in vivo exposure for the treatment of different phobias for the past decade with a positive outcome (see Riva 2005 for a review). Advantages of VR have readily been used for motor rehabilitation, providing participants with repetitive practice, feedback about performance contributing to the desired effect and a motivational context in which actions are embedded (Holden 2005; Weiss et al. 2004).

A very interesting aspect of VR for health research is its use to reduce phantom limb pain. When an amputee sees a virtual limb placed at the position of his phantom limb, and manages to transfer his sensations to this virtual limb, he also reports a decrease in phantom pain (Murray et al. 2007). Moreover, when the movements of the virtual limb are driven by motion capture data measured on the patient’s stump instead of the opposite remaining limb, patients gain agency for the movement they see and feel embodied within the limb (Cole et al. 2009). A critical aspect of pain management is therefore the sensation of agency, which is quite easy to provide with VR.

Regarding mental health applications, the goal of immersion in a VE is to give the user a compelling illusion that he is actually moving in the VE and no longer in the physical world. The number of sensory modalities through which the user is coupled to the VE is a main factor contributing to the feeling of presence (Sheridan 1992). Thus, “multisensory” involvement is key for VR. Furthermore, perceptual and cognitive mechanisms are tuned to process multisensory signals. Encoding, storing and retrieving perceptual information operate by default in a multisensory environment, and unisensory processing is suboptimal. It is therefore all the more important for a therapeutic procedure to allow the patient to combine information across the senses about a common source, in order to improve the localization and discrimination of virtual objects and to speed up reactions to them. In spite of that, VR technologies rarely integrate the auditory modality, which is the only sense through which we communicate with the whole space around us. The visual and auditory channels can easily be exploited in 3D virtual environments (VEs), but it is quite uncommon to include 3D auditory rendering in a therapeutic application.

VR technology could be an attractive tool for tinnitus sufferers to gain agency over their sensation, provided it delivers multisensory information and accurate 3D rendering. Indeed, in the natural environment, auditory localization is reinforced by visual cues. Several studies have shown that early blind subjects exhibit less precise auditory localization than sighted subjects, suggesting that the auditory system may require visual feedback for calibration (Lewald et al. 2002; Zwiers et al. 2001). In a VE aiming at giving patients the opportunity to gain agency over their tinnitus sensation, auditory and visual information about the same objects would need to be accurately co-localized.

Such a project, therefore, involves technologies, models and applications linked to the introduction of 3D sound in VEs. Auditory augmentation of visual environments is known to improve presence and immersion (Hendrix and Barfield 1996). To create such environments and the corresponding content, several concepts and technologies need to be researched, developed and/or integrated.

4 Virtual reality techniques applied to tinnitus patients

The aim of the application is to offer the tinnitus patient the possibility of manipulating an auditory–visual tinnitus avatar within a VE in order to gain control over the ST perception. The application is therefore based on immersion in a visual VE coupled with accurate auditory spatial rendering, as well as on natural sensorimotor interaction provided through the use of two trackers. The overall procedure comprises, first, the creation of an auditory avatar of the patient’s ST and, second, its inclusion in an interactive auditory–visual VE where the different audio components are spatialized according to the patient’s navigation and manipulation. The auditory avatar stimulus is created following the frequency pattern of the patient’s ST. The “spatialization” process is based on binaural technology using a database of either generic or individual head-related transfer functions (HRTFs). The task of the subject is to navigate in the VE and to steer the visual and auditory avatar to place it in different positions (in terms of either distance or direction). In this protocol, the expected success of the habituation process thus rests essentially on the principle of integration of visual, auditory and proprioceptive information.

5 Avatar creation

First, an acoustic model of the perceived tinnitus has to be established. The spectral characterization of ST and the creation of a credible auditory image of the tinnitus (the tinnitus avatar) are not straightforward. The signal has to match the spectrum and intensity of the tinnitus percept. Using custom-made software, a training procedure first teaches patients frequency and loudness matching. Then, by means of a similar graphical interface (Fig. 1), patients are asked to adjust a sound played into their contralateral ear so that it matches their tinnitus in frequency and loudness. It should be noted that, for some patients, fitting the frequency and loudness parameters of the tinnitus avatar can even lead to the perception of a tinnitus image located in the middle of the head. This observation reveals that a fusion process can occur between the subjective unilateral tinnitus and the avatar stimulus presented in the contralateral ear.

Fig. 1

User interface of the Avatar creation application, allowing the patient to match the spectral content (sinusoidal sound plus band-pass noise), the pitch and the intensity of the auditory avatar to his subjective tinnitus
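
As a rough illustration of the kind of stimulus named in the caption of Fig. 1 (a sinusoid plus band-pass noise, adjustable in pitch and intensity), the following Python sketch synthesizes such a signal. It is not the custom-made software described in the text: the function name `synthesize_avatar`, every parameter and default value, the approximation of band-pass noise by a sum of random-phase sinusoids, and the expression of level in dB relative to full scale are all assumptions made for illustration.

```python
import math
import random


def synthesize_avatar(freq_hz, level_db, noise_mix=0.5, bandwidth_hz=500.0,
                      duration_s=0.1, sample_rate=44100, n_partials=32, seed=0):
    """Return mono samples of a tone plus narrow-band noise (hypothetical helper).

    freq_hz   : matched tinnitus pitch (tone frequency and noise band centre)
    level_db  : matched loudness, here in dB relative to full scale (assumption)
    noise_mix : 0.0 = pure tone, 1.0 = noise only
    """
    rng = random.Random(seed)
    # Noise band centred on the matched pitch, kept below the Nyquist frequency.
    low = max(freq_hz - bandwidth_hz / 2.0, 1.0)
    high = min(freq_hz + bandwidth_hz / 2.0, sample_rate / 2.0 - 1.0)
    # Band-pass noise approximated by random-frequency, random-phase partials.
    partials = [(rng.uniform(low, high), rng.uniform(0.0, 2.0 * math.pi))
                for _ in range(n_partials)]
    gain = 10.0 ** (level_db / 20.0)
    samples = []
    for i in range(round(duration_s * sample_rate)):
        t = i / sample_rate
        tone = math.sin(2.0 * math.pi * freq_hz * t)
        noise = sum(math.sin(2.0 * math.pi * f * t + p) for f, p in partials)
        noise /= math.sqrt(n_partials)  # keep the noise power independent of n_partials
        samples.append(gain * ((1.0 - noise_mix) * tone + noise_mix * noise))
    return samples
```

In a matching session, `freq_hz`, `noise_mix` and `level_db` would be the quantities the patient adjusts until the sound in the contralateral ear resembles the tinnitus.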

6 Auditory VR and visual 3D environments

This acoustic model is used as a tinnitus avatar in different auditory and visual VEs designed in 3DVIA Virtools, a VR development platform. Patients are equipped with a head-mounted display coupled with an infrared camera sensor system and are immersed in virtual scenes in which they can move forward by pressing a mouse button. Patients have to turn about their own vertical axis in order to change the direction of heading and displacement in the VE. The soundscape associated with the VE is updated in real time according to their movement and is delivered through headphones. An additional marker attached to the tip of a rod allows the patient to control the virtual position of the tinnitus avatar by moving the rod around his head. Two types of applications have been developed, which differ in their purpose, in the associated task of the patient and, consequently, in the way the sound-source spatialization is processed. The first application is based on a pointing task, through which data about ST perception are gathered. The second is a navigation task in different VEs. The specific spatialization requirements of these applications made it necessary to replace the native 3D audio library functions provided by 3DVIA Virtools with customized spatialization algorithms derived from the Spat~ library (Jot 1999) and implemented in the real-time audio platform Max/MSP.
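
The real-time update of the soundscape relies on knowing where each source lies relative to the tracked head. The Python sketch below shows one simple way to express this in the horizontal plane. It is an illustration only and does not reproduce the Virtools or Max/MSP implementation: the function name `source_relative_to_head`, the 2D simplification, and the angle conventions (yaw and azimuth in degrees, counterclockwise positive, azimuth 0 straight ahead) are assumptions.

```python
import math


def source_relative_to_head(head_xy, head_yaw_deg, source_xy):
    """Return (azimuth_deg, distance) of a source as seen from the listener.

    Assumed conventions: world x/y axes span the horizontal plane, yaw is
    measured counterclockwise from the +x axis, and azimuth is positive to
    the listener's left, wrapped to the interval [-180, 180).
    """
    dx = source_xy[0] - head_xy[0]
    dy = source_xy[1] - head_xy[1]
    distance = math.hypot(dx, dy)
    world_bearing = math.degrees(math.atan2(dy, dx))
    azimuth = (world_bearing - head_yaw_deg + 180.0) % 360.0 - 180.0
    return azimuth, distance
```

With these conventions, a source located to the listener's left has an azimuth of 90 degrees, and the same source has an azimuth of 0 once the listener has turned 90 degrees toward it. The resulting angle and distance are the kind of values a spatialization stage would need on every tracking frame.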

6.1 Pointing task

The aim of this application is to evaluate the capacity of the patient to merge his unilateral ST with the contralateral tinnitus avatar and to perceive it as a phantom source in a given location/direction. For this application, the tinnitus avatar is attached to an invisible and fixed reference position in the virtual scene, and the signal is presented only to the contralateral ear. Its level depends on both the distance and the orientation of the patient relative to the reference position, in order to create a virtual interaural level difference (ILD) between the ST and the tinnitus avatar. The patient’s task is to navigate in the virtual scene along an imposed path. From time to time, the patient is asked to point in the subjective direction of the tinnitus avatar. The path is organized in such a way that the level of the contralateral tinnitus avatar covers a range of ±20 dB around the perceived level of the ST. The pointing data recorded during the session are expected to sample the subjective ILD localization function in the frequency region of the tinnitus and to show whether this function evolves over time.
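
The dependence of the avatar level on distance and orientation can be pictured with the following Python sketch. The text does not give the actual level law, so everything here is hypothetical: the function name `avatar_level_offset_db`, the inverse-distance attenuation, the sinusoidal orientation term, and the default values `ref_distance_m` and `max_orientation_db`. Only the ±20 dB range around the perceived tinnitus level is taken from the description above.

```python
import math


def avatar_level_offset_db(distance_m, azimuth_deg, ref_distance_m=1.0,
                           max_orientation_db=10.0, limit_db=20.0):
    """Level of the avatar relative to the matched tinnitus level, in dB.

    Hypothetical law: inverse-distance attenuation (about -6 dB per doubling
    of distance) plus a sinusoidal orientation term, where azimuth_deg is
    positive when the reference position lies on the side of the ear that
    receives the avatar. The result is clamped to +/- limit_db.
    """
    distance_term = -20.0 * math.log10(max(distance_m, 1e-3) / ref_distance_m)
    orientation_term = max_orientation_db * math.sin(math.radians(azimuth_deg))
    offset = distance_term + orientation_term
    return max(-limit_db, min(limit_db, offset))
```

Under these assumptions, a patient facing the reference position at the reference distance hears the avatar at the matched tinnitus level (offset 0 dB), and walking the imposed path sweeps the offset between the two clamping limits.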

6.2 Navigation task

The main rehabilitation application consists of a navigation task through different auditory and visual VEs. Three VEs were chosen as representative of realistic situations (countryside, urban and indoor scenes) and are populated with a collection of auditory sources (animals, cars, domestic noises, etc.) and a sound ambience (background noise, reverberation). The task of the patient is to locate different visuo-auditory landmarks distributed throughout the virtual scene and to “visit” each of them, one at a time, i.e., to get close to them and wander around. During their navigation, patients are invited to move the tinnitus avatar around their head to find the most comfortable position in terms of auditory sensations (Fig. 2). When the tinnitus avatar enters the viewing frustum, it is represented by animated sparkles. This visual representation is used to elicit integration between the auditory, visual and proprioceptive sensory modalities and to improve the patients’ agency over their tinnitus sensation.

Fig. 2

Illustration of the city VE showing one of the auditory–visual landmarks (cube #4) disseminated in the scene. The animated sparkles are the visual manifestation of the tinnitus avatar associated to the rod manipulated by the patient

The auditory sources associated with the landmarks and the sound ambiences are spatialized according to the location and orientation of the patient in the VE. Landmark sources are rendered through binaural technology (Wightman and Kistler 1989) using generic head-related transfer functions (HRTFs). The HRTFs, derived from measurements on a human or a dummy head, make it possible to reconstruct at the listener’s ears the perceptual cues responsible for localizing sound in direction, i.e., the interaural time delay (ITD), the interaural level difference (ILD) and the spectral cues, which are decisive for localizing sound in the vertical plane. The tinnitus avatar is also spatialized through binaural rendering. Moreover, as its position is slaved to the rod held in the patient’s hand, the binaural rendering takes advantage of the near-field and geometrical compensation proposed by Brungart (1999). To improve the immersion of the listener in the auditory environment, ambient sounds encoded in first-order ambisonics format are added to the soundscape. Ambisonics is a scalable audio format (Malham and Myatt 1995; Daniel and Moreau 2004) that encodes the spatial information of a sound scene along the three directions of space (left/right, front/back and up/down). At playback, a real-time spatial transformation is applied to the sound scene to compensate for the head rotation of the listener, and an ambisonic-to-binaural decoder is used for headphone reproduction.
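
The head-rotation compensation applied to the ambisonic scene can be sketched as follows for a rotation about the vertical axis. This is a minimal Python illustration, not the Spat~ processing used in the application: the function names are hypothetical, the traditional B-format convention is assumed (W weighted by 1/√2, X pointing forward, Y to the left, Z upward, azimuth counterclockwise positive), and only yaw is handled.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return (w, x, y, z)


def compensate_head_yaw(bformat, head_yaw_deg):
    """Rotate the sound scene by the opposite of the listener's head yaw.

    W (omnidirectional) and Z (vertical) are unaffected by a rotation about
    the vertical axis; X and Y are mixed by a 2D rotation matrix.
    """
    w, x, y, z = bformat
    phi = math.radians(-head_yaw_deg)
    x_rot = x * math.cos(phi) - y * math.sin(phi)
    y_rot = x * math.sin(phi) + y * math.cos(phi)
    return (w, x_rot, y_rot, z)
```

For example, a source encoded at 90 degrees to the left ends up on the forward X axis after the listener turns the head 90 degrees to the left, which is what keeps the ambience stable in the virtual world while the head moves. The rotated scene would then be passed to the binaural decoder.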

6.3 Virtual reality setup specifications

We used an nVisor SX head-mounted display (NVIS; 1280 × 1024 (SXGA) resolution, 60° diagonal FOV) with stereoscopic viewing. The positions and orientations of the patient’s head and of the tip of the rod (Fig. 2) are measured by optical motion capture (Optitrack FLEX V120, 120 fps). Sounds are rendered through BeyerDynamic DT990 circumaural open headphones.

Tracking data acquisition and processing, image synthesis and sound spatialization run in parallel on a 3 GHz Core 2 Duo computer equipped with 2 GB of RAM, an NVIDIA Quadro FX4600 graphics card and an RME Fireface 400 audio interface.

7 Clinical validation

The therapeutic usefulness of these VR procedures will subsequently be tested in the controlled clinical trial described hereafter, which has been approved by the local ethics committee. Eight consecutive VR sessions will act as a habituation protocol intended to enhance the dissociation between the tinnitus percept and its mental representation, by working on the patient’s ability to progressively control the localization of the tinnitus avatar in both direction and distance in order to move it “off limits” at will. The study design is an open randomized therapeutic trial comparing two therapeutic strategies, VR and cognitive behavioral therapy (CBT), and including an observational control arm. The control arm is a waiting list, which will be used to evaluate the natural history of the symptoms over a period of several months. CBT is considered a standard tinnitus treatment. The research hypothesis is that the treatment of tinnitus by VR is at least as effective as treatment with standard CBT. The primary goal is to evaluate the efficacy of VR on the intensity of the discomfort induced by tinnitus, as measured by a validated questionnaire (French translation of the Subjective Tinnitus Severity Scale) (Meric et al. 1996). The secondary goals are to evaluate the efficacy of VR in reducing (1) the perceived intensity of ST, as measured by a visual analog scale and tinnitus matching, (2) the intrusiveness of associated hyperacusis, as measured by the auditory sensitivity scale, and (3) the psychological impact of ST, as measured by the Hospital Anxiety and Depression Scale. The difference between treatments will be assessed on the difference in mean relative changes, $\Delta = \bar{d}_{VR} - \bar{d}_{CBT}$, which is assumed to follow a normal distribution. The expected standard deviation of the relative change, based on previous data from the CBT group, is 31.91% (Londero et al. 2006).
The sample size calculation is based on the primary endpoint, which is the mean relative change in intensity score, to test the null hypothesis $H_0$ that the two treatments are not equivalent, $|\Delta| \geq \Delta_L = 15.95\%$, versus the alternative hypothesis $H_1$ that the two treatments are equivalent, $|\Delta| < \Delta_L = 15.95\%$.

The two-tailed equivalency interval is then defined as follows:

$$-\Delta_L < \Delta < +\Delta_L, \qquad \Delta_L = 15.95\%$$

A sample size of at least 63 patients per group would provide a power (1−β) of at least 0.8 to demonstrate, with the risk for a type I error of 5%, that the two treatments are equivalent. For the observational control group, a theoretical sample size of 30 subjects should provide a power of 91% to demonstrate a mean relative change in severity score of 20%, by comparing the untreated controls before and after management, with each subject serving as his own control.
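
The figure of 63 patients per group can be checked against the usual normal-approximation formula for comparing two means, n = 2 (z_{1−α/2} + z_{1−β})² σ² / ΔL², using the standard deviation (31.91%) and margin (15.95%) given above. The text does not state which formula was actually used, so the Python sketch below is only a plausibility check under that assumption; the function name `per_group_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def per_group_sample_size(sd, margin, alpha=0.05, power=0.80):
    """Patients per group under the normal-approximation two-sample formula.

    Assumed formula (not confirmed by the text):
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sd**2 / margin**2
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided 5% risk
    z_power = NormalDist().inv_cdf(power)              # power of 0.8
    return math.ceil(2.0 * ((z_alpha + z_power) * sd / margin) ** 2)


n = per_group_sample_size(sd=31.91, margin=15.95)
print(n)  # 63
```

With these inputs the formula gives about 62.8, which rounds up to 63 and matches the sample size quoted above.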

8 Discussion

According to the International Association for the Study of Pain, pain is “an unpleasant sensory and emotional experience associated with actual or potential tissue damage”. ST perfectly meets this definition, corroborating the similarity of the two pathological perceptions (Møller 2007). Whatever the mechanism, pain and tinnitus are perceptions, and thus totally subjective phenomena with a varying degree of tolerance; both are potentially unbearable and are known to lead to suicide (Andersson 2003). Clinical data suggest other similarities between pain and tinnitus. First, in chronic neuropathic pain, the painful perception can be triggered by a cutaneous stimulation that normally causes only a somesthetic perception: this phenomenon is called allodynia and can be compared to the hyperacusis experienced by tinnitus sufferers (Norena and Chery-Croze 2007; Nelson and Chen 2004). Second, there is no strong clinical argument in favor of hyperstimulation of the auditory sensory pathway (i.e., “nociceptive” by analogy with pain) as the mechanism underlying tinnitus. This is obviously the case when tinnitus persists after total hearing loss (Baguley et al. 2005). On the contrary, various phenomena acting through peripheral and central neuronal modifications may underlie mechanisms similar to those involved in chronic neuropathic pain (Viirre 2007). A possible loss of peripheral or central inhibitory control may be the cause of central hyperexcitability (Norena and Eggermont 2003). Third, another important notion common to both pain and tinnitus perception is that of central integration. In each case, the transmission is not passive but is subject to multiple modulations by means of various control systems (attentional, emotional). Some authors consider the possible involvement of plurisensory ascending pathways connecting multiple cerebral zones, in particular the limbic system, which is highly involved in emotional reactions (Cacace 2003).
Similarly, functional imaging shows that pain and tinnitus activate the same cerebral structures (Møller 2007). Such central dysfunction is a potential target for any kind of modulation based on neural plasticity.

Furthermore, there is a complex relationship between tinnitus and psychological status that is still subject to debate. If it is obvious that tinnitus is not a psychiatric symptom, it is also clear that tinnitus may have detrimental psychological consequences (Andersson 2002). Patients with prior psychological impairment who develop tinnitus may have an amplified perception of it and suffer from greater negative consequences (Folmer et al. 2008; Langguth et al. 2007).

Progressive exposure to virtual reality has been found to be useful both in post-amputation pain syndromes and in various kinds of anxiety-related states (arachnophobia, post-traumatic stress disorder, etc.; see Riva 2005 for a review). Furthermore, it has been demonstrated that supplying 3D auditory information in addition to visual information increases efficacy and improves patient adherence to the experimental conditions (Viaud-Delmon et al. 2004, 2006). Moreover, even though VR has never been used as a therapeutic tool in the tinnitus field, auditory training with auditory object identification, which dynamically engages attention and requires patients’ active participation, has already been shown to reduce tinnitus perception (Searchfield et al. 2007). It therefore seems plausible that immersion in VR can contribute to tinnitus treatment by promoting plasticity through the active manipulation of a 3D auditory object linked to a visual representation. The idea is to work at a psycho-sensory level to trigger a low-level “recalibration” allowing the patient to separate the representation of tinnitus from its perception. The global aim is to develop the patients’ ability to assume an active role in controlling tinnitus, gaining agency for the movements they both see and hear. In practice, by modulating the perception of tinnitus in the short term, the patient is able to vary the localization of the tinnitus in near space and to make it interact with the various auditory VR environments. As previously demonstrated for phantom limb pain, the theoretical goal is to make it evident that the perception is some kind of “illusion”, thus depriving it of its aggressive characteristics. Since ST currently lacks effective treatment, such an innovative method could, after appropriate clinical testing, represent a useful step toward a possible cure for tinnitus sufferers.

Although VR techniques are very attractive for health care, they must be adapted before they can be put to practical clinical use. In the case of tinnitus patients, special effort has to be dedicated to monitoring 3D audio features. One of the foreseeable difficulties in modulating the localization of tinnitus in direction and distance is that the temporal and frequency characteristics of tinnitus are most often unfavorable. Tinnitus generally consists of a narrow band of high frequencies perceived as stationary. It is thus difficult for the auditory system to exploit all the cues that make localization possible. It is likely that, in most conditions, only the ILD cue will effectively be useful for localization.
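
One way to see why time-based cues are of little help for a stationary high-frequency sound is to estimate the frequency above which the interaural phase of a steady tone becomes ambiguous, i.e., where half a period is shorter than the largest interaural delay the head can produce. The Python sketch below uses the classical spherical-head (Woodworth) approximation; the head radius of 8.75 cm and the speed of sound of 343 m/s are conventional values assumed here, not values taken from the study, and the function names are hypothetical.

```python
import math


def max_itd_s(head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Largest interaural time delay for a spherical head (source at 90 degrees).

    Woodworth approximation: ITD = (a / c) * (theta + sin(theta)),
    evaluated at theta = pi / 2.
    """
    return head_radius_m / speed_of_sound_m_s * (math.pi / 2.0 + 1.0)


def phase_ambiguity_frequency_hz(head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Frequency above which half a period is shorter than the largest ITD."""
    return 1.0 / (2.0 * max_itd_s(head_radius_m, speed_of_sound_m_s))
```

With these assumed values, the largest delay is roughly 0.66 ms and the ambiguity limit lies below 1 kHz, well under the high-frequency region in which tinnitus is generally matched. A stationary signal also offers no onsets from which a delay could be estimated, which is consistent with the expectation that only the ILD remains usable.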

To increase the chances of successfully obtaining habituation, it is important to add a second, simultaneous representation of the tinnitus using a visual avatar, in order to further promote externalization and spatial anchoring. For example, the tendency toward both intracranial and ambiguous localization (front–back confusion) observed when generic HRTFs are used for binaural synthesis tends to decrease in the presence of a coherent visual stimulus. In a similar way, the interaction between auditory and proprioceptive factors could play a major role in the process of dissociating the perception of the tinnitus from its mental representation in space. Such interaction can be promoted through direct control of the localization of the avatar by the patient, using a sensor held in his hand and represented in space, or by linking the acoustic scene to the rotational movements of the listener’s head. In a more sophisticated version, by creating an interaction that is visual, auditory and idiothetic (based on proprioceptive and vestibular cues generated by the movements of the patient), the patient, equipped with a wireless tracking device, could wander freely in space, approaching or moving away from the sound avatar.

9 Conclusion

Subjective tinnitus (ST) is a complex symptom that still lacks cause-oriented therapy. Virtual reality (VR) and interactive multimedia technologies have proven effective in different clinical situations analogous to ST, such as phantom limb pain. The 3D visual and auditory VR system described here, in terms of both setup and execution procedures, is the first to be tailored to tinnitus sufferers. Even though further clinical research is warranted to demonstrate its clinical relevance in alleviating ST perception or ST-related distress, VR could, in the near future, represent a useful step toward a possible cure for ST sufferers. While interactive technologies will benefit neurological, psychiatric and tinnitus patients, they in turn will contribute greatly to the development of the technology, thereby benefiting all people.