Introduction

Meniere’s disease (MD) is an inner ear disorder characterised by the clinical presentation of episodic vertigo, low- to mid-frequency hearing loss and fluctuating aural symptoms, with a potentially devastating impact on quality of life. Prevalence as high as 513/100,000 has been reported in population-based studies [1]. A series of diagnostic criteria have been proposed by international societies, most recently in 1995 and 2015 [4, 6], which are largely based on the subjective reporting of symptoms and audiometry [2,3,4,5,6,7,8]. However, MD may have variable manifestations, with the cardinal symptoms present in only 40% of patients with early disease [9, 10]. The ability of clinical criteria to capture atypical phenotypes [11] and to distinguish MD from alternative diagnoses has also been questioned [8, 12, 13]. Nevertheless, there have been no other reliable diagnostic methods until recently [14, 15], with the conventional role of MRI being to exclude other pathologies.

Endolymphatic hydrops (EH) refers to the expansion of the endolymphatic space (ES) of the inner ear at the expense of the surrounding perilymphatic space (PS) and is considered to be the histological hallmark of MD [4, 16]. The MRI depiction of EH with delayed post-gadolinium MRI was first described in 2007 [17]. The ability of MRI to demonstrate EH and diagnose MD with both intra-tympanic and intravenous contrast administration has been evaluated in subsequent studies [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87]. The use of delayed post-gadolinium MRI has been a major advance in otological imaging, with increasing worldwide application and a consequent shift in the diagnostic paradigm of MD. Some recent MD classifications [7] have even incorporated MRI within the diagnostic criteria [8, 88].

The reporting of EH on MRI is generally based on the evaluation of descriptors and semi-quantitative grading scales [19, 22, 24, 31, 46, 79, 94,95,96,97] but there is little consensus on which of these perform best in distinguishing affected ears. Despite previous systematic reviews on the subject [89,90,91,92,93], there have been no attempts to determine the pooled diagnostic performance of MRI descriptors. Meta-analysis would provide greater certainty as to how MRI should be interpreted to optimally corroborate the diagnosis of MD. Furthermore, whilst previous systematic reviews have been applied to a narrowly defined reference clinical standard such as “definite MD”, another question relates to whether MRI is diagnostically useful in atypical or monosymptomatic forms, such as cochlear MD (cMD) and vestibular MD (vMD) [98].

This systematic review and meta-analysis aimed to determine the diagnostic performance of MR descriptors in distinguishing ears with clinical MD, and how this differs between the MD clinical subcategories.

Method

This study followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [99] and was registered on the International Prospective Register of Systematic Reviews (PROSPERO), CRD42022299285.

Search strategy

The search strategy was based on PICOS (population; intervention; comparator; outcome; study design). Population was defined as ears with MD symptoms; intervention as delayed (3–6 h for intravenous; 24 h for intratympanic) post-gadolinium MRI; comparator (reference standard) as clinical criteria for MD; outcome as qualitative or semi-quantitative MRI descriptors for MD; and study design as case-controlled cross-sectional studies [100]. Search terms were adapted after a pilot search to include relevant synonyms, before being subjected to Peer Review of Electronic Search Strategies (PRESS) [101]. Searches were performed in MEDLINE, EMBASE, Web of Science, Scopus, Cochrane Register of Controlled Trials and LILACS databases (supplementary 1). The search was performed from 2000 onwards. The searches were finally updated on 17/02/2022. Manual forward and backward searches were performed for all eligible and review articles. The five most frequently cited journals were hand-searched (2010–2021) and grey literature interrogated. Mendeley Reference Manager was used to collate the literature and duplicate studies were manually removed.

Selection of studies

Two independent reviewers (S.C./I.P.) applied a piloted screening tool to the titles and abstracts with the following inclusion criteria: defined MD ear disease group (supplementary 2); potential inclusion of control ears without MD; analysis of delayed post-gadolinium MRI. Case studies, review articles, foreign language literature and clearly duplicate studies were excluded. The full text was then independently assessed for eligibility by both reviewers according to the PICOS criteria. Inclusion required the extraction of 2-by-2 contingency tables, comparing the presence of MRI descriptors in MD ears (supplementary 3) with either asymptomatic ears contralateral to MD, asymptomatic ears in other subjects, or ears with an alternative audio-vestibular condition. Reasons for exclusions are listed in supplementary 4. Discrepancies were resolved by discussion. Authors were contacted to address any missing data from conference abstracts and full papers, and for clarification regarding potential overlapping data.

Data extraction

The same reviewers independently extracted data regarding (a) study characteristics: authors, year of publication, study centre and period, retrospective v prospective, sample size of MD and controls, gadolinium concentration, agent and route of administration, MRI system strength and sequences; (b) control type; (c) demographic and clinical characteristics: age and sex of the MD group, duration of MD, unilateral or bilateral MD, and clinical diagnostic criteria; and (d) MRI descriptors or grading scale analysed and number of observers.

Contingency tables (2-by-2) were constructed comparing the presence of clinical MD (reference test) to the presence of each MRI descriptor (index test).
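As an illustration of how a single 2-by-2 table yields the accuracy measures used in this review, a minimal sketch is given below. It is not the code used for the meta-analysis: the class name `TwoByTwo` is hypothetical, the counts are invented for illustration and are not taken from any eligible study, and the confidence interval uses the standard Wald approximation on the log scale.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TwoByTwo:
    """Ears cross-classified by index test (MRI descriptor) and
    reference standard (clinical MD). Hypothetical helper class."""
    tp: int  # MD ears with the descriptor present
    fp: int  # control ears with the descriptor present
    fn: int  # MD ears with the descriptor absent
    tn: int  # control ears with the descriptor absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1 - self.sensitivity()) / self.specificity()

    def dor(self) -> float:
        # Diagnostic odds ratio: (TP x TN) / (FP x FN)
        return (self.tp * self.tn) / (self.fp * self.fn)

    def dor_ci95(self) -> Tuple[float, float]:
        # Wald 95% interval computed on the log(DOR) scale
        se = math.sqrt(1 / self.tp + 1 / self.fp + 1 / self.fn + 1 / self.tn)
        log_dor = math.log(self.dor())
        return math.exp(log_dor - 1.96 * se), math.exp(log_dor + 1.96 * se)


# Hypothetical counts, chosen only for illustration
table = TwoByTwo(tp=40, fp=5, fn=10, tn=45)
low, high = table.dor_ci95()
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
print(f"LR+ = {table.lr_positive():.2f}, LR- = {table.lr_negative():.2f}")
print(f"DOR = {table.dor():.1f} (95% CI: {low:.1f}, {high:.1f})")
```

The sketch fails with a division by zero when any cell is empty; a continuity correction (adding 0.5 to each cell) is a common workaround, although whether one was applied in this review is not stated.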

Quality assessment

The methodological quality of the eligible studies was evaluated with a tailored Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) tool [102] by the two reviewers independently. Review specific guidance was developed with respect to the signalling questions (supplementary 5).

Statistical analysis

Bivariate diagnostic random-effects meta-analysis was conducted with R 4.2.1 (package “meta”) to evaluate the diagnostic performance of each MRI descriptor. The results were tabulated and presented with receiver operating characteristic (ROC) plots and corresponding forest plots [103]. Sensitivity, specificity, diagnostic odds ratio (DOR), area under the curve (AUC), and positive and negative likelihood ratios were calculated after pooling of true positive, true negative, false positive and false negative values. Heterogeneity was assessed using Cochran’s Q test, which tests the equality of sensitivities and specificities among the studies based on a chi-square distributed statistic (p < 0.001).
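To make the heterogeneity test concrete, the sketch below computes one simple chi-square statistic for the equality of proportions (for example, sensitivities) across studies. This is an assumed form that is consistent with the description above, not necessarily the exact statistic computed by the R package; the function name `q_equal_proportions` and the study counts are hypothetical.

```python
def q_equal_proportions(events, totals):
    """Chi-square statistic for H0: all studies share a single proportion.

    events[i]: e.g. true positives in study i
    totals[i]: e.g. number of MD ears in study i
    Returns (statistic, degrees_of_freedom).
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching counts from at least two studies")
    pooled = sum(events) / sum(totals)
    if pooled in (0.0, 1.0):
        raise ValueError("statistic undefined when pooled proportion is 0 or 1")
    # Sum of squared deviations from the expected count, scaled by the
    # binomial variance under the pooled proportion
    q = sum(
        (e - n * pooled) ** 2 / (n * pooled * (1 - pooled))
        for e, n in zip(events, totals)
    )
    return q, len(events) - 1


# Hypothetical true positive counts and MD ear totals from three studies
q, df = q_equal_proportions(events=[45, 30, 18], totals=[50, 50, 20])
print(f"Q = {q:.2f} on {df} degrees of freedom")
```

Under the null hypothesis the statistic is compared with a chi-square distribution on k - 1 degrees of freedom, where k is the number of studies; a large value indicates that the study-level sensitivities (or specificities) differ by more than chance alone would explain.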

Meta-regression used a random effects model with restricted maximum likelihood estimation and the Kruskal–Wallis test evaluated (p < 0.05) differences in sensitivity and specificity between subgroups. The diagnostic performance of MRI descriptors for distinguishing ears with MD according to the current reference standard of the 2015 Barany criteria (“definite 2015”) was compared with that for each of the other clinical classifications: “definite 1995”, “probable 2015”, “probable 1995”, “possible 1995”, “cMD” and “vMD” (supplementary 2). Subgroup analysis of diagnostic performance for each MRI descriptor was performed for the “definite 2015” category, those clinical classifications in which diagnostic performance significantly differed (p < 0.05) and any other mono-symptomatic clinical classifications.

The following variables were also analysed for their potential influence on diagnostic performance: (a) control group type (asymptomatic ears contralateral to MD v asymptomatic ears in other subjects v ears with other audio-vestibular conditions); (b) route of gadolinium administration (IV v IT); (c) number of image reviewers (single v multiple observers); (d) analysed on an ear basis v patient basis; (e) sequences or post-processing depicting different bone signal (intermediate v low); (f) low risk of bias (any domain v none); (g) high applicability (any v none); (h) study design (prospective v other, consecutive recruitment v other). Deeks’ funnel plots [104] depicted publication bias and sample size effect.

Results

Systematic review

Figure 1 is a study flow diagram documenting the search results and reasons for exclusion at each stage. The screening tool indicated 256 potentially relevant articles. After full text review, 72 studies were considered eligible.

Fig. 1 Flow chart summary of the literature search and systematic review process

Study characteristics

The characteristics of all eligible studies are documented in Table 1 and supplementary 6 [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87]. There was a total of 3073 MD ears (mean age 40.2–67.2 years). The clinical classifications applied were as follows (supplementary 2): “definite MD” (2015, n = 37; 1995 n = 23), “probable MD” (2015, n = 13; 1995, n = 16), “possible MD” (n = 16), “cMD” (n = 11) and “vMD” (n = 7) (Table 1). The commonest control group type was asymptomatic ears contralateral to MD (n = 59) (Table 1). Gadolinium administration was described as intravenous (n = 57), intra-tympanic (n = 14) or both (n = 1), whilst MRI was most frequently performed at 3 T (n = 71) and with a 3D FLAIR sequence (n = 62) (supplementary 6). Multiple observers were documented in 40/72 studies, with inter-observer agreement statistics presented in 18/72 (kappa range 0.59–0.93) (supplementary 6). After clarification with authors, six studies were deemed to have partly overlapped data sets (supplementary 6). The meta-analysis therefore included 66 unique studies.

Table 1 Study characteristics

Categorisation of MRI descriptors

The two reviewers selected eleven MRI descriptors for analysis since they could be derived from at least four eligible studies. There were nine individual descriptors (Fig. 2) with two further combinations of descriptors. The presence of MRI descriptors was usually extracted from grading systems applied to the eligible studies [19, 22, 24, 46, 79, 91, 95] (supplementary 3). Due to subtle differences in some grading systems, descriptor definitions were adapted to capture the breadth of data across multiple studies (supplementary 7). Vestibular MRI descriptors of EH were as follows: “any vestibular EH”, “> 33% area of ES relative to total vestibular fluid area” (Fig. 2F), “> 50% area of ES relative to total vestibular fluid area” (Fig. 2G), “saccule to utricle ratio inversion (SURI) or higher vestibular grade” (Fig. 2E), “fused utricle and saccule” (Fig. 2F) and “enhancing PS of the vestibule not visible” (Fig. 2G). Cochlear MRI descriptors of EH were “any cochlear EH” and “highest grade cochlear EH” (Fig. 2G). “Increased ipsilateral perilymphatic enhancement (PLE)” was additionally evaluated (Fig. 2H). Two MRI descriptors used a combination of features: when there was either “any vestibular EH” or “any cochlear EH”, it was termed “any EH”, and with additional increased PLE it was termed “increased ipsilateral PLE or any EH”.

Fig. 2 Illustrations of the MRI descriptors. a T2 SPACE axial image is unable to distinguish the endolymphatic from the perilymphatic space and demonstrates the inner ear structures as high signal throughout. The cochlea (vertical arrow) and the vestibule (horizontal arrow) are indicated. b Delayed post-gadolinium 3D REAL IR axial image in a normal ear shows the endolymphatic structures of the saccule (vertical arrow) and the utricle (horizontal arrow) within the vestibule, with the saccule being the smaller structure. The low signal endolymph is clearly distinguished from the surrounding enhancing perilymph. Schematic representations of (c) the normal endolymphatic structures and (d) the hydropic (dilated) endolymphatic structures in a MD ear (permission to use from Miss Irumee Pai). The lines depict the level of the axial sections which encompass the utricle (U) and saccule (S) in the other images. Delayed post-gadolinium 3D REAL IR axial images in e to h depict the MR descriptors in ears with MD. e “Saccule to utricle ratio inversion (SURI)”. There is inversion of the saccule to utricle ratio, with the saccule (vertical arrow) being larger than the utricle (horizontal arrow). f “Fused utricle and saccule”. The low-signal saccule and utricle are seen to be merged (horizontal arrow). There is also borderline “> 33% area of ES relative to total vestibular fluid area” but it does not reach “> 50% area of ES relative to total vestibular fluid area”. g “Enhancing PS of the vestibule not visible” and “highest grade cochlear EH”. Severe EH is demonstrated with replacement of the vestibular perilymph by non-enhancing endolymph (horizontal arrow) and there is also “> 50% area of ES relative to total vestibular fluid area”. There is severe cochlear hydrops (vertical arrows) as indicated by the non-enhancing cochlear duct replacing the scala vestibuli enhancement. h A right MD ear demonstrating “increased ipsilateral perilymphatic enhancement (PLE)”. The degree of perilymphatic enhancement within the inferior segment of the right basal turn (open arrow) is increased in the right symptomatic MD ear relative to the contralateral left asymptomatic ear (filled arrow)

Quality of studies

QUADAS-2 evaluation showed high bias across all four domains in 22/66 studies, whilst only 3/66 studies demonstrated high bias in ≤ 1 domain (Fig. 3(A)). “Patient selection” always resulted in high bias since all studies were case-controlled. High bias was reported for “conduct and interpretation of test” since most studies only analysed MD cohorts and observers could not be blinded. “Reference standard conduct and interpretation” resulted in high bias when clinical classifications other than “definite 2015” were applied. “Flow and timing” bias occurred when multiple clinical criteria were evaluated within the study, thus not applying the same reference standard. There was applicability concern in ≤ 1 domain in 64/66 studies (Fig. 3(B)). The principal applicability concern was introduced in the “patient selection” domain when only a narrow range of clinical diagnostic criteria were studied.

Fig. 3 Bar charts demonstrate (a) the risk of bias and (b) applicability concerns derived from the QUADAS-2 tool for the 66 eligible studies included in the meta-analysis

Diagnostic performance of individual MRI descriptors for MD

Pooled sensitivity, specificity, DOR, positive likelihood ratio and negative likelihood ratio are presented in Table 2. Forest plots for MRI descriptors are shown in Fig. 4, whilst summary ROC curves are shown in supplementary Fig. 1.

Table 2 Pooled sensitivity, specificity, diagnostic odds ratio (DOR), area under the curve (AUC) and likelihood ratios for MRI descriptors. Data in parentheses are 95% confidence intervals. DOR > 15 are highlighted in bold type
Fig. 4 Forest plots with sensitivity and specificity for each MRI descriptor (a–k), incorporating all relevant reports and with pooled values

All MRI descriptors were highly informative, with DORs ranging from 8.0 (6.1, 10.4) for “highest grade cochlear EH” to 131.7 (66.9, 259.2) for “increased ipsilateral PLE”. Five of the 11 MRI descriptors achieved a pooled specificity of > 90%: “SURI or higher vestibular grade” (92%; 95% CI: 90%, 93%), “fused utricle and saccule” (96%; 95% CI: 93%, 97%), “enhancing PS of the vestibule not visible” (99%; 95% CI: 97%, 99%), “increased ipsilateral PLE” (98%; 95% CI: 96%, 99%) and “increased ipsilateral PLE or any EH” (91%; 95% CI: 85%, 95%). Of these, the highest sensitivity was achieved with “increased ipsilateral PLE or any EH” (87%; 95% CI: 79%, 92%), which demonstrated a pooled DOR of 64.8 (95% CI: 29.7, 141.2). The other MRI descriptors with a sensitivity greater than 80% were “> 33% area of ES relative to total vestibular fluid area” (83%; 95% CI: 81%, 85%) and “any EH” (81%; 95% CI: 79%, 82%).

Heterogeneity

All MRI descriptors demonstrated heterogeneity of sensitivity (Cochran’s Q test, p < 0.001). There were 4/11 MRI descriptors judged to show consistent specificity; however, 7/11 were heterogeneous predictors (Cochran’s Q test, p < 0.001) (supplementary 8). This heterogeneity is also reflected in the forest plots (Fig. 4) and Deeks’ funnel plots (supplementary Fig. 2).

Clinical classifications and other covariates

The results of subgroup analysis for the clinical classifications and the meta-regression for other co-variates are shown in Tables 3 and 4. When “definite 2015” MD classification was used, “increased ipsilateral PLE or any EH” achieved improved sensitivity of 89% (95% CI: 83%, 95%) and specificity of 91% (95% CI: 86%, 96%). There was no significant difference in diagnostic performance for any MRI descriptors between “definite 2015” and either “probable 2015” or “definite 1995” clinical classifications.

Table 3 Subgroup analysis for diagnostic performance of MRI descriptors in definite 2015, cMD and vMD
Table 4 p values and significant co-variates on meta-regression

With respect to the monosymptomatic classifications, the diagnostic performance of “highest grade cochlear EH” (sensitivity 41%, specificity 94%, DOR 10.12) and “any EH” descriptors (sensitivity 69%, specificity 79%, DOR 8.37) did not significantly differ between “cMD” and “definite 2015” MD ears (p = 0.3; p = 0.09). As for vMD, the MRI descriptors “any EH” (sensitivity 20%, specificity 87%, DOR 1.75), “any vestibular EH” (sensitivity 40%, specificity 82%, DOR 3.15) and “any cochlear EH” (sensitivity 40%, specificity 84%, DOR 3.54) demonstrated low sensitivity and the diagnostic performance was inferior to “definite 2015” MD.

The meta-regression showed that the control group type had no significant influence on the diagnostic performance. Regarding other covariates, “any EH” and “any vestibular EH” showed superior diagnostic performance with multiple observers or intra-tympanic gadolinium administration. Superior diagnostic performance was achieved with sequences or post-processing which depicted bone as intermediate signal for four MRI descriptors (Table 4).

Deeks’ funnel plots demonstrated a small studies effect (p < 0.05) for three MRI descriptors.

Discussion

Despite increasing clinical application and impact on the diagnostic paradigm of Meniere’s disease (MD), there remains inconsistency in how delayed post-gadolinium MRI is interpreted and applied in clinical settings. This systematic review and meta-analysis evaluated 11 MRI descriptors for their ability to distinguish MD ears as defined by various clinical criteria. All descriptors were considered highly informative with DORs ranging from 8.0 (6.1, 10.4) to 131.7 (66.9, 259.2). “Increased ipsilateral perilymphatic enhancement (PLE)”, alone or in combination with “any endolymphatic hydrops (EH)”, demonstrated the highest DORs. This combination achieved the highest sensitivity (87%; 95% CI: 79%, 92%) whilst maintaining high specificity (91%; 95% CI: 85%, 95%) for MD, although it was only evaluated in three studies. Evaluation of EH for MD diagnosis was best attained with MRI features assessing the endolymphatic space alone, rather than comparing it with the perilymphatic area. Such descriptors with high DORs of 19.9 (15.5, 25.6) and 27.8 (16.6, 46.5) were “saccule to utricle ratio inversion or higher vestibular grade” and “fused utricle and saccule”. Diagnostic performance did not differ across definite 2015, probable 2015 and definite 1995 clinical classifications for any MRI descriptor. “Highest grade cochlear EH” and “any EH” performed similarly in monosymptomatic cochlear MD to the clinical reference standard of “definite 2015” MD (p = 0.3; p = 0.09). Sequences or post-processing which depicted bone as intermediate signal demonstrated superior diagnostic performance for four MRI descriptors. High risk of bias and heterogeneity were noted across the eligible studies included in the meta-analysis.

The current study differs in several respects from previous systematic reviews of delayed post-gadolinium MRI in MD [89,90,91,92,93]. Firstly, our contemporary literature search resulted in 72 eligible studies compared with 11–43 studies in previous reviews, providing sufficient data to enable a meta-analysis and pooled statistics for the first time. Secondly, this review evaluated 11 MRI descriptors compared with 1–4 descriptors in prior publications. Finally, inclusion and subgroup analysis of all clinical classifications explored the diagnostic performance of MRI in different symptomatic presentations.

The appropriate selection of specific versus sensitive MRI descriptors for the diagnosis of MD may depend on the clinical setting, as illustrated by a comparison of different vestibular EH descriptors. For instance, when low risk treatment or non-destructive interventions (e.g., intratympanic steroids) are being considered, overdiagnosis may be acceptable, so “> 33% area of endolymph relative to total fluid area” would be a reasonable descriptor due to its higher pooled sensitivity (83%), despite low specificity (75%). Conversely, if vestibular-destructive procedures or trials of new interventions with potential morbidity are envisaged, then application of highly specific descriptors would be more appropriate. Regarding potential application to automated MRI analysis, it is of interest that vestibular MRI descriptors evaluating endolymphatic appearances alone demonstrated superior diagnostic performance, since current techniques focus on comparison with the perilymphatic space area.
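The trade-off between sensitive and specific descriptors can be illustrated by converting pooled sensitivity and specificity into post-test probabilities through likelihood ratios. The sketch below uses the pooled values reported in this review for two descriptors; the pre-test probability of 50% and the function name `post_test_probability` are hypothetical, and applying pooled estimates to an individual patient is a simplification.

```python
def post_test_probability(pre_test_prob, sensitivity, specificity, test_positive=True):
    """Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    if test_positive:
        lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity   # negative likelihood ratio
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


PRE_TEST = 0.50  # hypothetical pre-test probability of MD, not from the review

# (pooled sensitivity, pooled specificity) as reported in the Results
descriptors = {
    "> 33% area of ES relative to total vestibular fluid area": (0.83, 0.75),
    "increased ipsilateral PLE or any EH": (0.87, 0.91),
}

for name, (sens, spec) in descriptors.items():
    if_present = post_test_probability(PRE_TEST, sens, spec, test_positive=True)
    if_absent = post_test_probability(PRE_TEST, sens, spec, test_positive=False)
    print(f"{name}: {if_present:.0%} if present, {if_absent:.0%} if absent")
```

With these assumptions, the more specific descriptor gives a higher probability of MD when it is present, which is the property that matters before a destructive intervention, whereas a sensitive descriptor is more useful for lowering the probability of MD when it is absent.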

Evaluation of MRI descriptors across the whole range of symptomatic presentations provided evidence for their diagnostic performance in differing clinical phenotypes [8]. The meta-regression indicated that most descriptors had a similar ability to diagnose MD when applying the current “definite 2015” reference standard or alternative clinical classifications, supporting the role of MRI in wider clinical situations. This extended to monosymptomatic presentations for some descriptors, with the presence of “any EH” being able to detect cMD ears with DOR of 8.37 (4.34, 16.12).

There are limitations to the current study, with respect to both the review process and the evidence available. Firstly, although the risk of missing data was minimised as far as possible, a body of non-English language literature (36 screened studies) was not reviewed. Secondly, since only a limited range of MRI descriptors were applied in individual studies, it was not feasible to perform head-to-head comparisons, introducing bias due to the indirect comparison of individual MRI descriptors. Thirdly, as the eligible studies principally focused on definite MD, the subgroup analysis of atypical forms of MD for less frequently analysed MRI descriptors yielded insufficient numbers for pooling data. Fourthly, it would have been pertinent to perform meta-regression for the “increased ipsilateral PLE” descriptor with respect to the control group of “other audio-vestibular disorders” (since it may also occur with differential diagnoses such as perilymphatic fistula) and for constant versus variable flip angle FLAIR sequences (since this influences the degree of PLE); however, this was precluded due to the limited number of eligible studies. Fifthly, variations in sensitivity (all descriptors) and specificity (7/11 descriptors) led to significant heterogeneity across studies. Meta-regression demonstrated that this was at least partly due to variable clinical classifications, MRI technique, analysis, study design, applicability and bias. Finally, the high level of bias should be considered. In particular, all eligible studies were case-controlled, potentially resulting in an overestimation of diagnostic accuracy [105].

Conclusion

This systematic review and meta-analysis evaluated the relative performance of MRI descriptors for the diagnosis of MD. “Increased ipsilateral PLE” was a key descriptor, and in combination with EH, it achieved the best balance of sensitivity and specificity. MRI descriptors of EH which did not rely on a comparison with perilymphatic area showed the best diagnostic performance for MD. MRI diagnosis of EH can be usefully applied across a range of clinical classifications including monosymptomatic cMD. Future research and meta-analysis would benefit from consensus on standardised MRI descriptors and a minimum clinical data set, whilst MRI descriptors should also be evaluated for prognosis and prediction of treatment response through longitudinal studies.