Introduction

Tinnitus aurium is a ‘phantom’ auditory experience referring to a person’s perception of a ringing, hissing or buzzing sound despite there being no such sound present in the external world. An estimated 10–15% of people in the UK population suffer from tinnitus [1] including as many as one in three older adults [2]. For a significant number of people, tinnitus has a detrimental impact on their daily lives [3]. However, tinnitus remains poorly understood and there is no uniformly effective therapy. The exploration of novel therapies for tinnitus, such as auditory training, is welcome, especially if these therapies show promise in terms of efficacy and reducing the cost of current service provision [4, 5].

Current therapies for tinnitus are based on a number of assumptions and observations. It has been proposed that tinnitus persists because the sound lacks any context or behaviourally relevant meaning that might permit the brain to ignore it over time [6, 7]. Indeed, Cuny et al. [8] suggest that without intervention, individuals fail to habituate to tinnitus and so develop a heightened awareness of it. Sound generators are one therapeutic intervention used to facilitate habituation to the tinnitus sound. These devices can be set to deliver a continuous low-level noise that helps to reduce the salience of the tinnitus and to divert attention away from it. Psychological approaches, such as relaxation therapy and cognitive behavioural therapy, are also advocated to alleviate the psychological impact of tinnitus [9]. It has been suggested that sound therapy results in little or no benefit over and above that offered by the psychological component of tinnitus therapy [10]. Tinnitus retraining therapy [11] also offers a combination of habituation and counselling, but its efficacy has not been definitively documented [12].

Tinnitus typically co-occurs with some degree of sensorineural hearing loss [13], suggesting that it arises from problems in the peripheral auditory system. Indeed, dominant models derived from neurophysiological data in animals consider tinnitus to be a maladaptive neuroplastic response to deafness, in terms of abnormal temporal firing [14, 15] or elevated spontaneous activity [16–18]. In cases of severe steep-sloping high-frequency hearing loss, those frequencies at the audiogram edge may become disproportionately over-represented in the brain regions that represent sound frequency, perhaps through a process of cortical rewiring or the unmasking of latent cochlear inputs to regions newly deprived of direct inputs [19, 20]. However, the supporting evidence in humans for a direct link between cortical plasticity and tinnitus is unclear at present [21].

In the audiology clinic, hearing-aid fitting is a common intervention for sensorineural hearing loss. Although the device is prescribed primarily to alleviate deafness, enhancing the sound world of patients with demonstrable hearing loss can play a very important role in alleviating tinnitus [22–25]. The neurophysiological effect of an enriched sound environment is not well established. One claim is that it prevents cortical tonotopic map reorganisation that occurs following sensory deprivation [26]. However, the research evidence from this study is not directly transferable to the audiology clinic because the research study exposed test animals to a highly unusual procedure: a continuous sound enrichment composed of multiple high-frequency tones was presented over a number of weeks, commencing immediately after the noise trauma that induced the deafness. Instead, clinical intuition favours the explanation that patients simply pay less attention to their tinnitus when external auditory stimuli are more audible [24].

It has been proposed that auditory perceptual training might provide a direct and frequency-specific method for inducing neuroplasticity by expanding the cortical representation of the trained frequency. The neural effect of repeated exposure to the same sound stimulus was first demonstrated electrophysiologically by Recanzone et al. [27]. Monkeys trained to discriminate a target frequency had significantly larger areas of primary auditory cortex tuned to that frequency than those who were untrained or those who were trained to discriminate a different target frequency. Several other animal experiments have confirmed that auditory training is associated with plastic changes in neural representations [28–31].

Given these neurophysiological perspectives, it is understandable how the ability to directly modify the cortical code for sound represents an enticing therapeutic target. The premise for perceptual training is that it facilitates the renormalisation of aberrant neural activity. Indeed, Herraiz et al. [32] explicitly reasoned that auditory training with an active listening task would alter the cortical ‘map’ that is associated with tinnitus generation. In the somatosensory domain, a tactile discrimination training task (identifying the frequency and location of non-painful electric stimuli) was shown to be effective at reducing phantom limb pain, and this reduction in pain correlated with somatosensory cortex reorganisation as measured by electroencephalography (EEG) [33]. For tinnitus, the motivation for auditory perceptual training is that it can be tailored to redress the abnormal representations of particular frequencies in the central auditory system.

This review evaluates current evidence for a beneficial effect of auditory perceptual training on tinnitus. It addresses two objectives: (1) what is the effect of auditory training on tinnitus-related problems, and (2) could auditory training provide an effective clinical management strategy? In discussing these objectives, we also clarify the relative strengths and weaknesses of the existing body of research and suggest appropriate future research directions.

Methods

The Centre for Reviews and Dissemination (part of the National Institute for Health Research, UK) recommends core principles and methods for conducting a systematic review of health interventions [34]. This document guided our protocol for identifying the search strategy, study selection, data extraction, quality assessment and data synthesis.

Systematic Search Strategy and Study Selection

Our search strategy used electronic databases, supplemented at a later stage by searching reference lists of relevant studies and hand searching key journals. At the first stage, databases were searched using the keywords ‘tinnitus and learning’ or ‘tinnitus and training’. Databases were Cambridge Scientific Abstracts (including Medline, Biological Sciences, and Toxline), PubMed, and Web of Science. Keywords were always combined so that papers were identified only if they examined learning or training and also referred to tinnitus. Articles were selected irrespective of whether the effect of auditory perceptual training was the primary research focus. The database search in October 2009 revealed 316 articles of possible relevance. The results of this first search were screened to remove duplicates and, where possible, to also remove publications that were not available in English, were not peer-reviewed, or did not report human studies. This first evaluation procedure yielded a set of 94 articles for the next stage.

The 94 abstracts were assessed independently by two of the co-authors to select those that met a number of inclusion criteria. Inclusion criteria were identified according to the Participants, Intervention, Control, Outcome, Study design (PICOS) formula (Centre for Reviews and Dissemination [34]). In our case, participants were adults with chronic tinnitus, the intervention was an active listening task, the controlled design compared different types of training, a ‘trained’ group and a ‘not trained’ group, or a before and after training comparison, the outcome was a change in tinnitus or quality of life and the study design was at least a repeated measures design. Whenever an abstract contained insufficient information for evaluating it against the PICOS criteria, the article was automatically passed to the next stage for a full text review. This second evaluation procedure yielded a set of 25 articles that were judged worthy of further review. Full texts of the 25 articles were again reviewed according to the PICOS criteria for final inclusion. This third evaluation procedure excluded 16 of the articles: 4 were available only in German, 10 did not include any active listening task, 1 did not present any data and 1 did not involve participants who had tinnitus. Two of the remaining publications were from Dohrmann and colleagues [35, 36], who reported results from the same study. These two publications were treated as a single publication. This process gave eight studies for inclusion [32, 35–42]. Additional searches were informed by these selected publications and were conducted in January 2010. We searched the reference lists of our selected publications and hand-searched the content lists of relevant audiology/psychology journals published over the previous 6 months. This led to the identification of 13 further abstracts, two of which met the PICOS criteria. One was a self-citation [43] which did not include our keywords in the title or abstract, and the other was a study accessed via Epub, ahead of print [44]. We searched the reference lists of these two additional articles but did not identify any further relevant references.
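The selection flow described above can be checked with a short tally. The following is only an illustrative sketch: the counts are those stated in the text, whereas the variable names are our own and are not part of the review protocol.

```python
# Tally of the study-selection flow, using the counts stated in the text.
# Variable names are illustrative assumptions, not part of the protocol.

records_identified = 316       # database search, October 2009
after_first_screen = 94        # duplicates, non-English, non-peer-reviewed, non-human removed
after_abstract_review = 25     # abstracts meeting (or possibly meeting) the PICOS criteria

# Reasons for exclusion at the full-text stage
full_text_exclusions = {
    "available only in German": 4,
    "no active listening task": 10,
    "no data presented": 1,
    "participants did not have tinnitus": 1,
}

publications_remaining = after_abstract_review - sum(full_text_exclusions.values())

# The two publications by Dohrmann and colleagues describe the same study,
# so they count as one study.
studies_from_databases = publications_remaining - 1

# Two further studies came from reference lists and hand searching (January 2010).
studies_from_additional_search = 2

total_studies = studies_from_databases + studies_from_additional_search

print(f"Publications after full-text review: {publications_remaining}")
print(f"Studies from database search: {studies_from_databases}")
print(f"Total studies reviewed: {total_studies}")
```

The totals agree with the eight studies from the database search and the ten studies carried forward to data extraction.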

Data Extraction

Data extraction was performed using a reporting form specifically developed for this purpose. Extracted data comprised the following details: (i) study design; (ii) participants (inclusion and exclusion criteria, sample size, tinnitus characteristics and hearing loss); (iii) training task and stimulus (including amount of training); (iv) compliance and follow-up (including dropout or exclusion); (v) outcome measures; and (vi) findings (outcomes, statistical comparisons, and significance levels). The reporting form for data extraction was piloted on two publications, by two co-authors, and revised as necessary. Using this revised form, data from all ten studies were then extracted independently by two co-authors. Any differences in reporting were reconciled by jointly revisiting the relevant publication.

Quality Assessment and Data Synthesis

The aim of assessing study quality is to establish how near ‘the truth’ its findings are likely to be and whether its findings are relevant to people with tinnitus. A subset of extracted data was used to assess study quality, performed at the same time as data extraction by two co-authors. The quality assessment tool was based on Jadad et al. [45] and Oxman et al. [46], using a procedure consistent with Ref. [34]. Eight scales provided an overall numerical quality score for each study. Seven of these eight ‘quality criteria’ represented internal validity (evidence of bias): (i) study design; validity, randomisation and control, (ii) blinding of the participants and investigators to the treatment or expected outcome, (iii) use of appropriate and validated outcome measures, (iv) matching of intervention task and control, (v) matching of participants in each group, (vi) measurement and description of compliance, and (vii) evidence of funding bias. The eighth criterion represented external validity (generalizability). For each of the eight criteria, a score of 2 was awarded if the publication met the criterion to a high standard, a score of 1 if it partially met the criterion, or a score of 0 if it was flawed or if the relevant information was not stated. Two co-authors scored studies independently and then agreed on a final score for each after discussion. This numerical scoring method gave a quality score between 0 and 16. Overall, study quality was graded as very low (0–4), low (5–8), moderate (9–12) or high (13–16). Table 1 reports the qualitative descriptors for these grades according to Oxman et al. [46].
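The scoring scheme described above can be summarised as a minimal sketch. The criterion labels, the function names and the example scores are illustrative assumptions; they do not reproduce the scores of any reviewed study.

```python
# Sketch of the quality-scoring scheme: eight criteria, each scored 0, 1 or 2,
# summed to a total between 0 and 16 and mapped onto four grades.
# Criterion labels and the example scores below are hypothetical.

CRITERIA = (
    "study design",
    "blinding",
    "outcome measures",
    "matched training task and control",
    "matched participant groups",
    "compliance reported",
    "funding bias",
    "external validity",
)


def overall_score(scores: dict) -> int:
    """Sum the eight criterion scores after checking each is 0, 1 or 2."""
    if set(scores) != set(CRITERIA):
        raise ValueError("a score is required for each of the eight criteria")
    if any(value not in (0, 1, 2) for value in scores.values()):
        raise ValueError("each criterion must be scored 0, 1 or 2")
    return sum(scores.values())


def quality_grade(total: int) -> str:
    """Map a total score (0-16) onto the grade bands given in the text."""
    if not 0 <= total <= 16:
        raise ValueError("total score must lie between 0 and 16")
    if total <= 4:
        return "very low"
    if total <= 8:
        return "low"
    if total <= 12:
        return "moderate"
    return "high"


# Hypothetical study scoring 1 on every criterion
example_scores = {criterion: 1 for criterion in CRITERIA}
example_total = overall_score(example_scores)
print(example_total, quality_grade(example_total))
```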

Table 1 Grading of evidence quality to support confidence in the study’s findings and estimation of effect (adapted from Oxman et al. [46])

A quantitative synthesis (meta-analysis) of comparative publications could not be carried out because the outcome measures were not sufficiently similar across all studies. Instead, quality assessment was incorporated into a narrative synthesis to help interpret and explain differences in results across studies.

Results

Data Extraction

A clear descriptive summary of the included studies was achieved by tabulating details of study design, participants, training task/stimulus, compliance and follow-up, outcome measures and findings (Table 2).

Table 2 Descriptive synthesis of data extracted from the ten studies selected for detailed review

Study Design

The PICOS selection criteria required a repeated measures design. Hence, at minimum, studies reported before and after measurements in a tinnitus treatment group [i.e. 40, 42, 43] or an individual case study [41], without direct comparison to control participants. Five of the remaining studies report a controlled design, comparing auditory training with a different treatment group. Only one study [44] reported a randomised allocation of participants to such groups. This issue raises a key concern about internal reliability which is discussed later in the section on study quality.

Participants

Typically, both treatment and control groups comprised people with tinnitus. However, tinnitus severity was highly variable. For example, participants recruited by Herraiz et al. [44] reported a within-group tinnitus handicap of 32 ± 21 points, measured using the tinnitus handicap inventory questionnaire [47]. This range spans two categories of tinnitus handicap (mild and moderate). Participants who scored above 56 points (severe tinnitus) were excluded. Flor et al. [37] recruited participants presenting with symptoms which spanned three categories of tinnitus severity (slight to severe, measured using the Goebel-Hiller tinnitus questionnaire [48]). The range of tinnitus severity influences the extent to which the results are generalisable to a typical clinical caseload, and the implication is discussed under the later subheading ‘External validity’.

Some studies did not fully report the characteristics of the control group, and so it is not possible to determine whether the procedures for matching participants were carried out in a systematic manner. The randomisation procedure employed by Herraiz et al. [44] produced treatment and waiting list control groups that were evenly matched according to tinnitus severity, handicap and loudness, but not according to tinnitus pitch. For example, all participants who matched their tinnitus to 3 kHz were allocated to the same group. In an earlier study [39] from the same group, participants were matched evenly according to age, gender, and hearing loss. Again, the groups were not matched according to tinnitus pitch; there were four participants in the control group whose tinnitus matched to 4 kHz, but none in the treatment group.

Training Task and Stimulus

The PICOS selection criteria required an active listening task, although the nature of the training task and stimulus differed somewhat across studies. Five studies used a single-frequency pitch discrimination task. Whether this was administered using an adaptive procedure [32, 35–37, 41] or a procedure with fixed difficulty across trials [44] appeared to have no clear influence over outcome. There has also been some investigation of whether or not the choice of target frequency determines outcome. The question regarding choice of training frequency is relevant to the goal of relating study results back to the prediction based on the neurophysiological model that perceptual training might facilitate the renormalisation of aberrant neural activity that gives rise to tinnitus. Two of the studies reported that the training frequency had no significant effect on any of the outcome measures [32, 37]. In contrast, the study by Herraiz et al. [44] concluded that training at a frequency that was similar (but not identical) to the tinnitus pitch was most effective, a result that in our opinion does not easily support the neurophysiological model. The neurophysiological model might also predict a relationship between the amount of training and the size of the benefit. Our review showed that the duration and frequency of the auditory perceptual training regime ranged across studies from 10 min twice daily for 1 month [32] to 2 h per day for 1 month [37]. However, the results indicate no reliable effect of training duration on tinnitus outcome.

Although active listening tasks were used in all ten training regimes, few authors reported measures of change in performance in these tasks. Noreña et al. [41] made repeated measures of frequency-discrimination thresholds at all four trained frequencies. They found a significant effect of training at the highest trained frequency (6.5 kHz) but not at lower training frequencies. Searchfield et al. [42] trained participants on a sound identification and localisation task whose aim was to indirectly influence tinnitus by enhancing the focus of attention towards environmental sounds. Despite recording a number of measures of attention, none showed a significant change after training.

Compliance and Follow-up

Compliance was under-reported. However, two of the more recent studies by Herraiz and colleagues reported a high degree of compliance with the training protocol; 98% [38] and 95% [44]. Herraiz et al. [44] also reported a low drop-out rate of 9%, whilst Flor et al. [37] reported a dropout of 14% due to worsening tinnitus. It is possible that compliance is biased by the conduct of the study. For example, neither participants nor experimenters appeared to be blind to the treatment allocation, nor was it clearly stated that participants were unaware which training group was expected to improve. The implication of any blinding bias is discussed later. Information on follow-up is also scant. There was no evidence that any of the improvements reported had any longer-term impact beyond the end of training.

Outcome Measures

Unfortunately, there are no standardised measures for evaluating the effectiveness of tinnitus interventions [49], and so it is perhaps not surprising that no two research groups have used the same set of measures. The primary outcome measure was often a self-reported rating of tinnitus severity (using a validated questionnaire or a visual analogue scale) or a personal statement regarding change in tinnitus (Table 2). Given the theoretical rationale of auditory perceptual training is the alteration of any aberrant neural activity that might be giving rise to tinnitus, then perhaps the most relevant outcome measure is a change in tinnitus percept. Only four studies used a psychoacoustic measure of tinnitus percept, namely, tinnitus loudness [40, 42, 43], pitch match [41], and minimum masking level [42].

We note that some of the studies reporting psychoacoustic measures post-training did use a number of self-reported measures at the pre-training stage [40, 42]. Additional interpretive leverage, such as levels of clinical significance, could be gained if studies assessed the efficacy of the intervention using a combination of psychoacoustic and self-reported outcome measures. A combination of positive outcomes would be the ideal goal where training is demonstrated to reduce the tinnitus sensation and this, in turn, alleviates tinnitus distress.

Findings

This section considers self-reported and psychoacoustic outcome measures in turn. Overall, five out of six studies reporting some type of self-reported outcome measure of tinnitus handicap or severity demonstrated a positive benefit of training. Effects of training have typically been evaluated by a repeated measures test (using the tinnitus handicap inventory or Goebel-Hiller tinnitus questionnaire scores). Several studies have reported no reliable effect of training on Goebel-Hiller tinnitus questionnaire scores [35–37], although Flor et al. [37] did report a significant mean reduction of 5.4 Goebel-Hiller tinnitus questionnaire points for a subgroup of participants who completed the lengthiest amount of training. Herraiz and colleagues [32, 38, 39, 44] favoured the use of the tinnitus handicap inventory as an outcome measure in all four of their studies and tested the change in tinnitus handicap score across the participant groups. We discuss these results in more detail because they highlight the importance of statistical power in determining repeatability of findings. Their first study [38] found that the change in score was not significantly different for the auditory training group (mean change was −4.1 points) compared to that of the control group (mean change was +0.1 points). With a larger number of participants, Herraiz and colleagues [39] did show a statistically significant effect of training for the same task. However, the mean changes in tinnitus handicap inventory score differed little from those reported in the previous study (−4.9 points for the training group and +1.5 points for the control group), suggesting that the earlier null result might have been due to lack of statistical power. The two more recent studies by the same group [32, 44] confirmed significant effects of auditory discrimination training in terms of a 9.4- and 7.4-point reduction in tinnitus handicap inventory score, respectively.

A number of studies also evaluated self-reported outcomes using some form of non-validated personal statement of tinnitus change (e.g. ‘my tinnitus is better’ or ‘my tinnitus is worse’) [32, 37–39, 44], often in conjunction with a questionnaire-based score of tinnitus severity or handicap. However, these personal statements do not always concur with those outcomes based on the questionnaires [38]. Self-stated benefits are much more likely to demonstrate an effect than a comparable validated outcome measure. For example, Marshall [50] reported a 40% bias of positive results in randomised controlled trials where self-styled outcome measures were used instead of validated measures.

All four studies reporting some type of psychoacoustic outcome measure demonstrated a significant change. Changes in tinnitus loudness, matched frequency spectrum, and minimum masking level have again typically been evaluated by a repeated measures test. Noreña et al. [41] reported a significant change of tinnitus percept in the trained ear after frequency-discrimination training. After training, a reduction in the contribution of high-frequency components to the participant’s tinnitus was observed, whereby frequencies above 8 kHz no longer contributed to the reported spectrum. This effect might be interpreted as a benefit because it, in effect, completely extinguished a component of the participant’s tinnitus. The authors did, however, speculate that training might have refined the participant’s matching of their tinnitus (i.e. changed cognitive representations) rather than changing the tinnitus itself.

One study evaluated training using more than one psychoacoustic outcome measure and so it is informative to explore the degree of convergence between results. Although Searchfield et al. [42] reported a significant reduction in the average minimum level of a noise required to mask tinnitus of 13.2 dB, training did not show a concomitant reduction in tinnitus loudness. This discrepancy is unexpected and perhaps indicates that at least one of the outcome measures is not reliable. Certainly, participants showed a large variability in terms of the effect of training on tinnitus loudness. The findings reported by Ince et al. [40, 43] also highlight some temporal instability in the measure, since there was a large within-session reduction in tinnitus loudness, which did not always carry over to the subsequent training session.

Study Quality and Data Synthesis

Numerical quality scores allocated for the eight quality criteria are summarised in Table 3 and are discussed item by item below.

Table 3 Assessment of eight quality criteria, use of power calculations, type of outcome measures, and the resulting overall quality rating for each of the ten studies

Study Design

Randomisation is the preferred or ‘gold standard’ design in therapeutic or ‘effectiveness of an intervention’ research [46, 51, 52]. Only one study, from Herraiz et al. [44], was reported as a randomised controlled trial, where participants were randomly allocated to different treatment groups (scoring a 2 for this criterion). Without randomisation, one cannot be confident that the observed effects are truly attributable to the auditory training intervention, rather than to intrinsic differences between groups or spontaneous recovery over time. Other studies scored 1 because they met the minimum criterion of a repeated measures design. Although studies implemented a repeated measures design, most of the authors did not confirm that the measure was stable before the start of treatment. Without any evidence of the test–retest reliability, we cannot be certain that any changes observed during training are specific to the intervention. Alternative interpretations of change might be an ‘anticipation of treatment’ effect or a ‘therapeutic relationship’ effect and these cannot be ruled out.

Blinding

Blinding of the participants and the experimenters to the treatment allocation is the most preferable or ‘gold standard’ design that would achieve a quality score of 2. No study reported the proper use of participant and experimenter blinding, and most scored zero on this criterion. One study described a partial degree of blinding, but this was limited to their neurofeedback group and did not apply to their auditory training group [35, 36]. We allocated a score of 1 to the study by Flor et al. [37]. In this case, it is reasonable to assume that the participants and experimenters were unaware of the group classification since this particular outcome was the result of a post hoc reallocation of participants according to the amount of training completed.

Outcome Measures

Since none of the studies took the opportunity to directly associate physiological change (as evidenced by a change in tinnitus sensation) with a change in a measure of tinnitus handicap or severity, all scored 1 for this criterion.

Matched Training Task and Control

Most studies report some form of control for the training task, either a group who did not train (scored as 1), a group who completed a different training regime (scored as 2), or in the case of Noreña et al. [41], the untrained ear of the trained participant (scored as 1). Some concern about the appropriateness of the control has been expressed by a number of the authors. In those studies where all participant groups complete some form of potentially therapeutic intervention, there is little control over those benefits that might be due to non-specific (e.g. interpersonal) factors or placebo. Flor et al. [37] acknowledge this limitation. In studies from Herraiz et al. [38, 39, 44] where the control group remained on the waiting list for the duration of the ‘treatment’ period, there is again no control over non-specific factors such as spontaneous recovery. In their two most recent publications [32, 44], Herraiz and colleagues expressed the future intention to exert better experimental control by requiring the control group to complete a passive listening task or an unrelated active listening task. This control is to be welcomed as it will strengthen the quality of evidence for a specific auditory training benefit.

Matched Participant Groups

Studies not reporting a control group scored 0 for this criterion. Studies that reported control groups but did not provide sufficient detail to allow comparison with the intervention group were scored as 1 [32, 35, 36]. Noreña et al. [41] was also scored as 1 as the trained and untrained condition referred to the same individual. Only two studies [37, 39] provided sufficient detail to measure similarity between groups and hence scored 2. In terms of internal validity, it is interesting to note here the general inconsistency in quality, even between studies conducted by the same author. One example relates to the quality of reporting control group characteristics by Herraiz and colleagues in three studies [32, 38, 39] which scored 1, 0 and 2, respectively.

Compliance Reported

Only Herraiz et al. [44] provided a comprehensive description of compliance, scoring 2 for this criterion. Five studies reported on some aspect of compliance, although not always in a quantifiable way, scoring 1. Compliance was not reported in four studies, scoring 0.

Evidence of Funding Bias

None of the reviewed studies suggested any evidence of a potential bias relative to their funding source; all therefore scored 2 for this criterion.

External Validity

Case reports scored 0 for this criterion, whilst most group studies scored 1 because they involved a clinical cohort. Studies recruiting sufficient numbers of participants with mild to moderate tinnitus have greater external validity than those recruiting people with severe tinnitus, because 66% of a typical clinical caseload reflects mild to moderate tinnitus handicaps (i.e. tinnitus handicap inventory scores between 18 and 56), whilst only 17% reflects severe tinnitus handicaps (tinnitus handicap inventory scores >56) [53].

Despite appropriate participant characteristics, none of the studies reported the use of a power calculation to determine sample size. Small sample sizes limit how well findings can be generalised to a typical caseload because it is not possible to accurately estimate the within- and between-group variance. Two example studies estimate the required sample size for reliably assessing tinnitus interventions. Landgrebe et al. [54] determined that a sample size of 68 participants per group was required to show a statistical and clinically relevant change in Goebel-Hiller tinnitus questionnaire score, more than twice the number studied by Dohrmann et al. [35, 36]. Using tinnitus handicap inventory scores to calculate their required sample, Gudex et al. [53] estimated that they needed 50 participants per group to be confident of a statistically significant and clinically relevant change in tinnitus handicap. None of our selected studies using the tinnitus handicap inventory as an outcome measure recruited treatment groups of this size. For Newman and colleagues [55], a change of at least 20 points in an individual tinnitus handicap inventory score was needed to be considered ‘clinically relevant’ because the 95% confidence interval is of that magnitude. Again, none of our included studies using the tinnitus handicap inventory as an outcome measure reported score changes of this magnitude.
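To illustrate how a power calculation of this kind links sample size to the expected variability and the smallest change worth detecting, the following minimal sketch uses the standard normal-approximation formula for comparing two group means, n per group = 2 (z(1 − α/2) + z(power))² (SD / difference)². It is not the method used by Landgrebe et al. [54] or Gudex et al. [53]; the function name, the default significance level and power, and the input values are all hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def participants_per_group(sd: float, min_difference: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal-approximation formula; sd is the assumed between-participant
    standard deviation of the outcome, min_difference the smallest
    between-group difference considered clinically relevant.
    """
    if sd <= 0 or min_difference <= 0:
        raise ValueError("sd and min_difference must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / min_difference) ** 2)


# Hypothetical inputs (not taken from any reviewed study):
# a standard deviation of 20 points and a target difference of 10 points.
example_n = participants_per_group(sd=20.0, min_difference=10.0)
print(example_n)
```

Under this approximation, halving the difference to be detected roughly quadruples the number of participants required per group, which is one reason why small studies can yield null results for modest changes in questionnaire score.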

Study quality was evaluated by comparing the overall score with the grading presented in Table 1. The overall scores enable us to address the first objective concerning the evidence that auditory training has an effect on tinnitus-specific outcome measures. The studies were judged to provide either low or moderate levels of evidence that auditory perceptual training may be of benefit in relieving tinnitus symptoms. Hence, we must conclude that further studies of auditory training interventions are likely to impact on our confidence in the estimate of the treatment effect and may even change the estimate of that effect [46].

Discussion

The first objective of this systematic review was to evaluate the evidence that auditory training influences tinnitus-related problems. Overall, 9 out of the 10 studies reported a statistically significant change. However, our preceding narrative synthesis of the quality assessment concluded that this evidence is not of sufficient quality to be confident that auditory training represents a strategy for improving the percept and severity or handicap of tinnitus. The evidence is simply not yet robust enough to guide treatment.

Turning now to the second objective concerning whether auditory training could provide an effective clinical management strategy, we discuss key issues that are fundamental to determining the efficacy of auditory perceptual training for tinnitus and, subsequently, to realising its development into a clinically useful tool. Whilst most studies reported some significant effect of auditory training on their chosen outcome measures, none explored the implications of their findings for the desired endpoint of this line of research, i.e. tangible benefit for the person who has tinnitus. This leaves open questions about the nature of any translational benefit of auditory training for tinnitus, and whether any benefits are specific to auditory training. We suggest a number of practical recommendations for future research to be considered when designing a high-quality randomised controlled trial.

First, it is important to evaluate whether or not the appropriate outcome measure is being used. The selection of a primary outcome measure should be driven by the desired observation. McFerran and Baguley [56] propose that, from the perspective of the audiologist, inhibiting tinnitus should be the aim of clinical intervention. Such a viewpoint would suggest a psychoacoustic measure of tinnitus (such as tinnitus loudness or minimum masking level) as the primary outcome measure rather than a self-reported measure (such as a reduction in tinnitus distress). However, based on this review, we recommend that a combination of psychoacoustic and self-reported outcome measures is implemented so that we can understand how the intervention reduces tinnitus percept and alleviates tinnitus distress.

Second, confidence in the stability of pre-training measures is crucial in order to be able to attribute patient benefit to the specific intervention. This issue is perhaps of greatest concern for self-reported outcome measures using questionnaires about tinnitus handicap or distress, because clinical intuition suggests that the psychological response often naturally diminishes over time. For example, Dohrmann et al. [35, 36] provide information on the test–retest reliability of the Goebel-Hiller tinnitus questionnaire by measuring tinnitus severity twice before training. The score reduced from 34.5 to 29.5, with no further change after training. A drop of 5 points equates to a clinically meaningful shift in tinnitus severity from moderate to mild, which the authors attribute to an ‘anticipation of treatment’ effect. Hence, designs that do not confirm a stable baseline measure cannot attribute any change after auditory training to a specific benefit of the intervention. To rule out non-specific benefits, we recommend applying a test–retest procedure during the pre-training stage or using a crossover comparison design.

Third, it is important to evaluate whether or not the outcome reflects a clinically significant change that shifts someone from one category of tinnitus severity to a less severe category. A change in Goebel-Hiller tinnitus questionnaire score of 5 points is considered clinically relevant and observable in a clinical population [54, 57]. Indeed, the two studies using this questionnaire reported a mean change of at least this magnitude [35–37]. Herraiz et al. [32, 38, 39, 44] favoured the tinnitus handicap inventory, and a change of between 4.9 and 9.4 points was reported as significant between the training and control groups. The tinnitus handicap inventory is a well-validated questionnaire for an initial assessment of tinnitus severity, but is less sensitive as a measurement tool for treatment outcome. The 95% confidence interval of the tinnitus handicap inventory score is 20 points [55], meaning that a change of at least 20 points is required for the difference to be considered clinically significant. Hence, across the studies from Herraiz et al., auditory perceptual training would appear not to facilitate a significant clinical benefit. Moreover, details of individual benefit were not provided, so the number of participants who reduced their tinnitus handicap by 20 points or more cannot be assessed. In the future, we hope that the work from Henry et al. [58] will deliver a standardised tinnitus outcome measure for clinicians and researchers. These authors are developing a tinnitus functional index questionnaire, specifically to have high discriminative and evaluative validity for the assessment of tinnitus treatment outcome. Until this becomes available, we advise against the use of the tinnitus handicap inventory as a self-reported outcome measure of change.
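The distinction between a group mean change and individual clinically significant change can be made concrete with a short sketch. The scores below are invented purely for illustration and do not come from any reviewed study; the function name is hypothetical, and the 20-point threshold is the tinnitus handicap inventory criterion cited above [55].

```python
from statistics import mean

THI_MIN_CLINICAL_CHANGE = 20  # points; criterion cited in the text from [55]


def summarise_change(pre, post, threshold=THI_MIN_CLINICAL_CHANGE):
    """Return (mean reduction, number of participants improving by >= threshold).

    `pre` and `post` are paired questionnaire scores for the same participants,
    where a lower score indicates less tinnitus handicap.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per participant")
    reductions = [before - after for before, after in zip(pre, post)]
    responders = sum(r >= threshold for r in reductions)
    return mean(reductions), responders


# Invented scores for four hypothetical participants
pre_scores = [60, 48, 72, 30]
post_scores = [38, 44, 50, 28]
mean_reduction, n_responders = summarise_change(pre_scores, post_scores)
print(mean_reduction, n_responders)
```

In this invented example the mean reduction falls below the 20-point criterion even though some individuals exceed it, which is the kind of information that cannot be recovered when only group means are reported.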

Fourth, given the commitment of time and resources that auditory perceptual training can involve, it is important that any benefit is proven to be maintained beyond the training period. Longer-term outcomes are largely unreported in the reviewed studies and follow-up assessment should be a priority for future research.

Fifth, research to date has focused on the alleviation of tinnitus symptoms and has ignored usability issues relating to the training technologies. Whilst proof of concept is certainly an essential part of this aspect of translational research, to be broadly successful, any training regime must be intrinsically motivating [59, 60]. Indeed, a recent review of the efficacy of auditory training in adults highlighted that training needs to be engaging and to promote a desire to train through feedback or reward [61].

Conclusion

To date, the published evidence describing the effects of auditory training interventions for tinnitus is of low to moderate quality. We have therefore identified a need for randomised controlled studies that will generate high-quality unbiased and generalisable evidence for whether or not auditory perceptual training has a clinically relevant effect on tinnitus. Only when this evidence exists can particular forms or regimes of training be conclusively included or excluded from future research, and there can be a move towards large-scale clinical trials. As it stands, the question of whether auditory training might be developed as a clinical tool to manage tinnitus remains open to future findings. Our perspective on future research makes the following recommendations:

  • Use a combination of psychoacoustic and self-reported outcome measures

  • Establish confidence in the stability of the pre-training measures

  • Use a self-reported outcome measure that is sensitive to change (i.e. not the tinnitus handicap inventory)

  • Demonstrate longer-term benefits

  • Ensure the training regime is intrinsically motivating