Background

Chronic otitis media (COM) is a long-term inflammation of the middle ear and mastoid air cells, characterized by recurrent purulent discharge through a tympanic membrane perforation. According to the WHO [1], the prevalence of COM in China ranges from 0.5 to 4%. A significant proportion of the affected population developed the chronic condition as a sequela of acute otitis media (AOM) in early childhood, and have thus suffered prolonged and cumulative impacts of the disease [2].

COM not only afflicts patients with recurrent or unremitting ear symptoms, including aural drainage and varying degrees of hearing impairment, but also affects their mental state by generating anxiety or even social alienation [3]. Impaired health-related quality of life (HRQoL) often complicates patients’ perspectives on the treatment outcome, which may not necessarily be consistent with the physicians' [4]. Therefore, patients’ participation in evaluation should be valued as an important supplement to physicians’ viewpoints and physiological evidence. To study HRQoL, patient-reported outcome measures (PROMs) are often used as assessment tools. These validated questionnaires, built on clinimetric and psychometric paradigms, allow patients to describe their health conditions directly and comprehensively, and can be applied in both clinical practice and research settings [5, 6].

Currently, the Chinese version of the Chronic Ear Survey (CCES) [7], adapted from the original English Chronic Ear Survey (CES) [8], has been validated and applied nationwide as a Chinese-language PROM to evaluate HRQoL of adult patients with COM [9]. This 13-item questionnaire helps physicians to investigate the health consequences and treatment effectiveness in COM cases along three dimensions: i. Activity Restriction, ii. Symptoms and iii. Medical Resources Utilization [8]. The structure and certain items of the CES have been drawn on by several newer instruments [10,11,12]. However, the CES does not include any questions regarding the onset of tinnitus or psychological burden [13], which are common complaints of COM patients that might seriously compromise their HRQoL [3]. Recently, the Zurich Chronic Middle Ear Inventory (ZCMEI-21) [10] has been developed as a new questionnaire enabling a comprehensive evaluation of HRQoL. The ZCMEI-21 is divided into four subscales: i. Ear Signs and Symptoms, ii. Hearing, iii. Psychosocial Impact, and iv. Medical Resources. Each subscale contains questions evaluating somatic or psychosocial outcomes, scored from 0 (absence) to 4 (extreme severity).

Researchers worldwide have expressed an increasing need for a universal, disease-specific PROM [13,14,15]. Instruments applicable to different cultural settings allow for consistent trans-national data compilation and comparison. To standardize the reporting of quality-of-life outcomes in COM, the original German ZCMEI-21 has already been adapted into Japanese [16], English [17] and Italian versions [18]. The aims of the present study are to translate the ZCMEI-21 into Chinese (ZCMEI-21-Chn) and to validate the Chinese-language instrument in its cultural context.

Methods

Patients and study centers

Inclusion criteria were i. diagnosis of COM with or without cholesteatoma (otitis media chronica cholesteatomatosa [OMCC] and/or otitis media chronica simplex [OMCS]), ii. adult age, iii. Mandarin Chinese as native language. A total of 223 patients were recruited via convenience sampling to complete the ZCMEI-21-Chn, along with the EQ-5D-5L questionnaire, during their outpatient visits at three referral centers between November 2018 and January 2019. Of the 223, 15 were excluded due to missing data, leaving a total of 208 (93.3%) patients included in the present study (Department of Otolaryngology - Head and Neck Surgery, Peking University Third Hospital, Beijing, PR China: n = 61; Department of Otorhinolaryngology Head and Neck Surgery, People’s Liberation Army General Hospital, Beijing, PR China: n = 71; Department of Otorhinolaryngology, Tongren Hospital, Beijing, PR China: n = 76). The detailed study design and inclusion criteria are described in Fig. 1.

Fig. 1 Study design for the translation and validation process of ZCMEI-21-Chn

Translation process

Following the ISPOR Principles of Good Practice [19], we standardized the translation and cultural adaptation process into forward, backward, and pretest steps, similar to the procedure of other ZCMEI-21 validation studies (see Fig. 1). The original inventory was first translated by two authorized translators independently. A native Chinese-speaking otologist with high proficiency in German then revised and merged these two translations into a reconciled version, ZCMEI-21-Chn v1. A pilot test of v1 was conducted with 5 subjects, followed by cognitive debriefing and a consensus meeting with the development team, though this is not explicitly recommended by ISPOR. V1 was then modified into ZCMEI-21-Chn v2 based on the feedback from the respondents. A third, professional translation agency with a medical background later translated ZCMEI-21-Chn v2 back into German. Certain items were culturally adapted in the back-translated version. For instance, on noticing that a significant proportion of the respondents were unable to connect their experience of dizziness with “the loss of balance control” originally stated in Question 5, we altered the description of this item to “Have you been experiencing dizziness or loss of balance?”. The back translation was reviewed against the original German version and revised for minor differences, before being subjected to a second cognitive debriefing process with another 5 subjects, which led to no further adjustments. The final version of ZCMEI-21-Chn used in the following validation process was provided in paper-based form.

Validation process

Questionnaire survey

The final version of ZCMEI-21-Chn, along with the 5-level EuroQol five-dimensional questionnaire (EQ-5D-5L [20], referred to as EQ-5D), was administered at the clinics to patients meeting the abovementioned inclusion criteria. Concurrent audiometric data were obtained from all the recruited cases. The layout of the original questionnaire [10] was adopted in the Chinese version of ZCMEI-21. For validation purposes, we included an extra question directly assessing the general quality of life (Question 22, “My ear illness is worsening my quality of life…not at all/ mildly/ moderately/ severely/ very severely”).

The EQ-5D is a preference-based instrument used worldwide [21, 22] to assess generic HRQoL [23], and it comprises the EQ-5D descriptive system and the EQ Visual Analogue Scale (EQ VAS). The former defines health in terms of 5 dimensions: i. Mobility, ii. Self-Care, iii. Usual Activities, iv. Pain/Discomfort and v. Anxiety/Depression; each dimension is described at five levels, corresponding to no, slight, moderate, severe and extreme problems. The latter, EQ VAS, measures the self-rated health state on the day of interview, ranging from 0 to 100 (corresponding to “worst” to “best imaginable health state”). The Chinese version of the EQ-5D questionnaire, administered in the validation study, has been validated with a full set of rescaled EQ descriptive system scores [24], ranging from −0.391 to 1.

The role that cultural differences play in shaping patients’ perception of items originally developed in a foreign setting was taken into consideration during the process. In addition, the results of the other international validation studies on the adapted versions of ZCMEI-21 led us to hypothesize that the correlation between the generic HRQoL scores and the COM-specific scale scores would be moderate and positive.

Quality control was performed throughout the present study. Participants were provided with detailed verbal and written instructions on completing the scale if needed. All researchers received unified training in the standard procedure for recording the questionnaires. Each questionnaire was double-entered and checked promptly.

Statistical analysis

Statistical analysis was performed using SAS (version 9.4, SAS Institute Inc., Cary, NC, USA), R Software (version 3.6.3, The R Foundation for Statistical Computing), Mplus (version 7.4, Muthén & Muthén, CA, USA) and GraphPad Software (version 8, GraphPad Software, La Jolla, CA, USA). A two-tailed p-value less than 0.05 was considered statistically significant for all analyses. Values were reported as mean (SD) or as absolute number and percentage. The frequency distribution of the ZCMEI-21-Chn total scores was inspected through both a graphical approach and a normality test. A bell-shaped histogram fitting a normal probability curve and a p > 0.05 in the D’Agostino-Pearson normality test indicated a Gaussian distribution of the data. Items with an item-total correlation (ITC), corrected for overlap, of ≥0.3 were deemed “strong items” [10].
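The corrected item-total correlation described above can be illustrated with a minimal sketch. It assumes that "corrected for overlap" means each item is correlated with the sum of the remaining items; the function names and the toy responses are hypothetical and are not taken from the study data or its SAS/R code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the total of all OTHER items.

    responses: list of respondents, each a list of item scores (0-4).
    """
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Remove the item's own score from the total to avoid overlap.
        rest = [sum(row) - row[j] for row in responses]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical toy data: 5 respondents, 4 items (not study data).
toy = [
    [0, 1, 0, 4],
    [1, 1, 2, 0],
    [2, 3, 2, 3],
    [3, 3, 4, 1],
    [4, 4, 4, 2],
]
itc = corrected_item_total(toy)
strong = [r >= 0.3 for r in itc]  # threshold for a "strong item" [10]
```

In this toy example the last item runs against the rest of the scale and falls below the 0.3 threshold, which is the pattern that would flag an item for closer inspection.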

Before structural detection, sampling adequacy was confirmed via the Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test of sphericity. A KMO value ≥0.80 and a p < 0.05 in Bartlett’s test indicated suitability of the data for factor analysis. The developers of the ZCMEI-21 suggested a hypothesized structural model comprising 4 dimensions that also supported an overall score of the scale [10]. Therefore, a bifactor model and a second-order model were examined through fit indices via confirmatory factor analysis (CFA) [25]. Based on theoretical considerations and statistical indications, the models were modified to obtain a fitting solution. Cutoff levels of the fit indices were: Root Mean Square Error of Approximation (RMSEA) < 0.06; both Comparative Fit Index (CFI) and Non-Normed Fit Index (NNFI), also known as the Tucker-Lewis Index (TLI), > 0.95 [26]. The fit of the modified model was reanalyzed to establish statistical superiority over the original model via a chi-square difference test and a comparison of AIC and BIC. Coefficient omega hierarchical (ωH) and explained common variance (ECV) were calculated to estimate the proportion of variance attributable to the single general target trait (general factor, G) [27], and to measure the unidimensionality of the scale [25], respectively.
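How ωH and ECV are derived from a bifactor loading matrix can be sketched as follows. This is a minimal illustration using the commonly cited formulas for standardized loadings with uncorrelated factors; the function name and the loadings are hypothetical and are not the values reported in Table 5 or the output of Mplus.

```python
def omega_h_and_ecv(general, specific):
    """Compute omega hierarchical and explained common variance.

    general:  standardized loadings on the general factor G, one per item.
    specific: list of group factors, each a dict {item_index: loading}.
    Assumes standardized loadings and mutually uncorrelated factors.
    """
    n_items = len(general)
    sum_general = sum(general)
    # Squared sum of loadings within each group factor.
    group_terms = sum(sum(f.values()) ** 2 for f in specific)
    # Unique variance of each item = 1 - communality.
    uniqueness = 0.0
    for i in range(n_items):
        communality = general[i] ** 2 + sum(f.get(i, 0.0) ** 2 for f in specific)
        uniqueness += 1.0 - communality
    omega_h = sum_general ** 2 / (sum_general ** 2 + group_terms + uniqueness)

    general_sq = sum(l ** 2 for l in general)
    specific_sq = sum(l ** 2 for f in specific for l in f.values())
    ecv = general_sq / (general_sq + specific_sq)
    return omega_h, ecv


# Hypothetical loadings: 4 items, 2 group factors (not study data).
g_loadings = [0.6, 0.6, 0.6, 0.6]
group_factors = [{0: 0.5, 1: 0.5}, {2: 0.5, 3: 0.5}]
omega_h, ecv = omega_h_and_ecv(g_loadings, group_factors)
```

ωH relates the general factor to the variance of the total score, whereas ECV relates it only to the variance shared among items, which is why the two coefficients can differ noticeably for the same model.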

The pure-tone average (PTA) at speech frequencies (0.5 kHz, 1 kHz, 2 kHz, and 4 kHz) [28] was collected from the patients’ concurrent audiometry testing. Criterion validity of the ZCMEI-21-Chn was assessed through correlations with the PTAs of the worse- and better-hearing ears.
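The PTA computation and the worse/better ear assignment can be sketched as below. The sketch assumes an unweighted mean of the four thresholds and that the worse-hearing ear is the one with the higher PTA; the threshold values and names are hypothetical.

```python
# Speech frequencies used for the pure-tone average, in kHz.
SPEECH_FREQS_KHZ = (0.5, 1.0, 2.0, 4.0)


def pure_tone_average(thresholds_db_hl):
    """Unweighted mean of the thresholds (dB HL) at the speech frequencies."""
    values = [thresholds_db_hl[f] for f in SPEECH_FREQS_KHZ]
    return sum(values) / len(values)


# Hypothetical audiogram for one patient (not study data).
left_ear = {0.5: 40, 1.0: 45, 2.0: 50, 4.0: 65}
right_ear = {0.5: 15, 1.0: 20, 2.0: 20, 4.0: 25}

pta_left = pure_tone_average(left_ear)
pta_right = pure_tone_average(right_ear)

# A higher PTA means poorer hearing, so the worse ear has the larger value.
pta_worse = max(pta_left, pta_right)
pta_better = min(pta_left, pta_right)
```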

Convergent validity was assessed by examining the correlations between the total ZCMEI-21-Chn scores, the additional question (Question 22) that directly addressed HRQoL, and the EQ-5D descriptive system and VAS scores, using Pearson’s correlation analysis. Reliability of the ZCMEI-21-Chn was assessed with Cronbach’s α and test-retest reliability, with the acceptance threshold set at ≥ 0.70 [7, 29].
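As an illustration of the internal-consistency measure named above, the following minimal sketch computes Cronbach’s α from item variances and the variance of the total score. The function name and the toy responses are hypothetical; sample variances (n − 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    n_items = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of the total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy data: 4 respondents, 3 items (not study data).
toy = [
    [0, 1, 1],
    [1, 1, 2],
    [2, 3, 2],
    [4, 3, 3],
]
alpha = cronbach_alpha(toy)
acceptable = alpha >= 0.70  # acceptance threshold used in this study
```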

The sample size for the validation survey was determined based on a subject-to-item ratio of 10:1, i.e. 21 × 10 = 210 cases [30].

Results

Detailed characteristics of the 208 respondents are listed in Table 1. Questions 8–10 assessing hearing impairment in detail were automatically skipped by 25 patients [10], who claimed, in Question 7, no detectable hearing impairment within the last 2 weeks.

Table 1 Patient characteristics of the validation cohort

Single-item statistics showed well-distributed answers covering the full range (0–4) in every question. Detailed descriptive statistics of the individual items and subscales are listed in Table 2. The item-total correlation (ITC), corrected for overlap with the scale total, as one of the criteria for a strong item, was above 0.3 for all items except Question 5 [10]. Correlations among the four subscales ranged from 0.18 to 0.61 (p < 0.001). Each subscore was moderately to strongly correlated with the ZCMEI-21-Chn total scores (see Table 3).

Table 2 Descriptive statistics of the individual items (ZCMEI-21-Chn)
Table 3 Descriptive statistics and correlation of subscales and total scores of ZCMEI-21-Chn

ZCMEI-21-Chn total scores followed a Gaussian distribution, as suggested by both the histogram (Fig. 2) and the D’Agostino-Pearson normality test (p = 0.06). No significant differences in ZCMEI-21-Chn total scores among the three study centers were observed (p = 0.07, F(2,190) = 2.644, one-way ANOVA; Fig. 3).

Fig. 2 ZCMEI-21-Chn total scores distribution and best-fitting Gaussian curve (bin-width on x-axis: 5)

Fig. 3 ZCMEI-21-Chn total scores at the three referral centers

The KMO test and Bartlett’s test of sphericity yielded a KMO value of 0.87 and a p-value less than 0.001, respectively. These results confirmed that our data were suitable for investigating the construct via factor analysis. The fit statistics of the hypothesized bifactor model with four domain-specific factors and the corresponding second-order model with four lower-order factors are reported in Table 4. These fit statistics indicated the need for post-hoc modification of both models. We obtained alternative bifactor and second-order solutions by deleting Question 5. Among all observed variables, item q5 had the lowest factor loadings on the latent variables and was also the item most poorly understood by the Chinese patient group. The chi-square difference test suggested that the modified bifactor model (Fig. 4) was a statistically better fit (Δχ2(18) = 32.96, p < 0.05) [31, 32] than the hypothesized construct. Also, the RMSEA, NNFI and CFI of the modified bifactor model all fell within the acceptable range, while those of the modified second-order model did not. Coefficient ωH of the general factor in the fitted model was 0.65, and ECV was 0.47. For detailed fit statistics and the loading matrices of the hypothesized and the trimmed models, please refer to Tables 4 and 5 and Fig. 4.
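The reported chi-square difference can be checked with a short worked sketch. It uses the closed-form upper-tail probability of the chi-square distribution, which holds only for an even number of degrees of freedom (as is the case for df = 18); the function name is hypothetical, and this is not the routine used by the statistical packages named in the Methods.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which is valid only for even, positive df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires an even, positive df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return exp(-half) * total


# Values reported in the text: delta chi-square = 32.96 with 18 df.
p_value = chi2_sf_even_df(32.96, 18)
significant = p_value < 0.05
```

With these inputs the p-value lies between 0.01 and 0.025, consistent with the reported p < 0.05.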

Table 4 Fit indices of the hypothesized and the modified models
Fig. 4 Modified bifactor model that best fits the data

Table 5 Factor loadings from the hypothesized and the modified bifactor models of the ZCMEI-21-Chn

Cronbach’s α of the ZCMEI-21-Chn was 0.88, and the α of every subscale was above 0.70 except for the ear symptoms dimension. Fifty-three patients who had not undergone significant clinical change since their last visit were re-administered the scale after a three- to four-week interval. The test-retest reliability coefficient was also 0.88.

For validity evaluation, we observed a moderate correlation (r = 0.40, p < 0.001) between the question directly addressing HRQoL (Question 22) and the total scores of ZCMEI-21-Chn. The EQ descriptive scores were moderately correlated with the ZCMEI-21-Chn total scores (r = 0.57, p < 0.001). Yet, a weaker correlation was found between the EQ-5D VAS and the ZCMEI-21-Chn total scores (r = 0.30, p < 0.0001). Next, the ZCMEI-21-Chn factor scores were correlated with the EQ-5D scores. The subscale representing psychosocial impact was strongly correlated with the EQ-5D descriptive system scores.

The correlations between the ZCMEI-21-Chn and the audiometric data are shown in Table 6. Both worse-hearing ear PTA [53.84 (23.96) dB HL] and better-hearing ear PTA [25.82 (19.44) dB HL] correlated significantly with the hearing-related items and the total scores.

Table 6 Correlation between ZCMEI-21-Chn and audiometric data

Lastly, the total and subscale scores of the ZCMEI-21-Chn and the EQ-5D descriptive system score showed a good level of comparability with the respective scores reported in the original validation study. Table 7 offers an overview of the commonalities across the validation studies of the various adapted versions of the ZCMEI.

Table 7 Descriptive statistics and correlation of the ZCMEI-21-Chn and the EQ-5D-5L, Cronbach’s α in the present study and the original validation study of the German-language ZCMEI-21, ZCMEI-21-Jap, ZCMEI-21-E, ZCMEI-21-It

Discussion

In the present study, we translated the ZCMEI-21 into Chinese following international guidelines. Next, we validated the ZCMEI-21-Chn in a multi-center study. A Cronbach’s α and a test-retest reliability coefficient of 0.88 each indicated a good level of reliability of the entire questionnaire. Despite a lack of clear recommendations for translated versions of an established scale, α ≥ 0.70 is a commonly used threshold for a reliable measure in population studies [7]. With sampling adequacy for factor analysis confirmed by the KMO and Bartlett’s tests, CFA was performed to seek structural models that fit the data. The results of CFA suggested that the modified bifactor model provided a significantly better fit to the data than the original hypothesized models. An ωH of 0.65 provided quantitative evidence that the scale scores generalize to a relatively high extent to a latent variable (general factor, G) common to all the observed variables (except item 5). However, the domain-specific factors account for 53% (1 − ECV) of the common variance, indicating that both an overall score and a combination of the subscores are meaningful interpretations of the scale [33].

Within CFA, the fifth item (q5), which asked patients about their feelings of dizziness or loss of balance control, had already been revised during the prior translation stage because it was hard to understand. The revised q5 nevertheless remained poorly understood, with the lowest loadings on both F1 (the symptom dimension, 0.06) and G (general factor, 0.18). The fit statistics of the model with complete deletion of q5 were superior to those of the model that only removed q5 from F1. Meanwhile, the loading of item 6 (tinnitus) on F2 (hearing dimension) was 0.08, and its loading on G was 0.45 in the original bifactor model. We also tried to modify the model by removing q6 from F2, either alone or in combination with deleting q5. However, the fit indices of both models were unacceptable, which suggested that further post-hoc modification would make neither theoretical nor statistical sense.

Furthermore, the loading of item 18 (protection from water) on G was lower than those of the other items included in the psychosocial dimension (F3). Consistent with the findings during the survey, daily precautions against water seemed to bother Chinese COM patients far less than the inconvenience brought by ear drainage or hearing impairment. These patients commonly responded to item 18 with “I have got used to wearing these earplugs during showers” or “It’s fine that I have quit swimming ever since the onset of the disease”. Such findings, which may reflect the role of cultural context, motivate our future studies.

In addition, the fourth dimension (medical resources) demonstrated good internal reliability, yet showed the weakest correlation with both the ZCMEI-21-Chn total scores and the EQ-5D scores, as well as a non-significant correlation with the PTA of either the better- or the worse-hearing ear. On the one hand, this may suggest that seeking treatment played a comparatively limited part in hampering the general HRQoL of Chinese COM patients; on the other hand, the frequency of clinic appointments and medication use might be affected not only by the severity of hearing impairment or other complications, but also by the uneven distribution and accessibility of medical resources across the country.

The moderate correlation between the total scores and the question directly addressing HRQoL might be explained by underlying variation in how living standards are perceived and what is expected of them in the Chinese cultural context. Weak to moderate correlations between the ZCMEI-21-Chn total and subscale scores and the EQ-5D scores were well within expectation, since generic PROMs are reportedly less sensitive to self-rated HRQoL than disease-specific measures in hearing-impaired or COM patients [34, 35]. Notably, the correlations of the EQ VAS scores with the ZCMEI-21-Chn total and subscale scores were overall weaker than those of the EQ-5D descriptive system scores, possibly implying that the Chinese patients were unfamiliar with using a visual analogue scale to rate their health state.

Although no statistical comparison was performed, qualitative comparison revealed that the ZCMEI-21-Chn scores were higher, both overall and in each subscale, than those of the original German-language ZCMEI-21 and of the other international adapted versions. This might be attributable to the relative scarcity of primary care accessible to the Mandarin-speaking population, in which case only patients at a rather advanced stage of the disease would seek professional help at a tertiary referral center in the capital. Thus, assessing the baseline ZCMEI-21-Chn scores of the Chinese COM population for comparison with those of other countries requires larger-scale and more representative sampling in future studies.

There are a number of limitations in this study that ought to be acknowledged. First, the EQ-5D was applied to test the convergent validity of the ZCMEI-21-Chn, yet no instrument was used to study the heterotrait correlations required for discriminant validity. Nonetheless, administering extra questionnaires during clinical visits at China’s overloaded tertiary referral centers seems infeasible. Future efforts will be devoted to issuing extended versions of the ZCMEI-21-Chn for use on electronic devices, to allow comprehensive assessment and a better experience for the patients.

The assumptions under which Cronbach’s α is a consistent estimator of reliability may not be entirely met in this study. For example, the varying factor loadings in the fitted models were contrary to tau equivalence. Correlated errors might arise from the order of items on the scale, speeded tests and so on [36]. Moreover, the multidimensionality and the use of a Pearson correlation matrix may also bias the estimates of Cronbach’s α. Given these unrealistic assumptions, a 5-level Likert, multidimensional scale like the ZCMEI-21-Chn may require alternative indicators of internal consistency, e.g. McDonald’s omega or coefficients from G-theory, in future research. Cronbach’s α was nevertheless retained in the present study to allow comparison with the other international versions of the ZCMEI-21.

Another limitation of our study is that, without an analysis of differential item functioning (DIF), the statistical results reported in this article may only serve as a rough guide to the relationship between the current study and the original research. Additionally, the sampling was neither randomized nor representative, but rather purposive, which possibly resulted in heterogeneity of the subjects, e.g. in ethnicity or education level. To further adapt the ZCMEI-21-Chn, additional studies are needed that focus on balancing the potential ethnic influence.

Conclusion

We translated and culturally adapted the ZCMEI-21 into Chinese, and demonstrated the ZCMEI-21-Chn to be a reliable and valid self-reported outcome measure. Scores of the entire scale as well as of each dimension can be used to evaluate HRQoL in adult Chinese patients with COM. To deepen health professionals’ understanding of disease impacts and treatment effectiveness, our future efforts include implementing the electronic version of the ZCMEI-21-Chn in clinical routine and enhancing standardized data aggregation on a global scale.