Introduction

Functional dyspepsia (FD) is characterized by symptoms including bothersome postprandial fullness, early satiation, epigastric pain and epigastric burning, with no evidence of organic disease [1]. FD is common worldwide; reported prevalence reaches up to 29.2 % of the global population [2, 3]. FD symptoms often impair several aspects of patients’ health-related quality of life (HRQL), including abdominal pain and indigestion, emotional distress, problems with food and drink, and impaired vitality, and they impose a heavy economic burden [4, 5]. Consequently, HRQL is a critical endpoint in assessing the clinical outcomes of FD.

Numerous disease-specific scales have been developed for FD, including the quality of life in duodenal ulcer patients (QLDUP) [6], quality of life in reflux and dyspepsia (QOLRAD) [7], functional digestive disorders quality of life questionnaire (FDDQL) [8, 9], quality of life in peptic disease (QPD) [10], Nepean dyspepsia index (NDI) [11], and severity of dyspepsia assessment (SODA) [12]. So far, none of these scales has been translated into Chinese and validated. Because FD is also very common in China, with a prevalence of up to 18.92 % [13], and few useful HRQL instruments for FD are available for clinical research and practice, a Chinese FD HRQL instrument is urgently needed.

The functional digestive disorders quality of life questionnaire (FDDQL) is a disease-specific scale originally developed in French and validated by Chassany Olivier et al. in 1998 [8, 9]. It aims to measure the specific physical, psychological and perceptual impacts of FD and irritable bowel syndrome. The scale uses 5-point Likert responses and contains 43 items in eight domains: activities (8 items), anxiety (5 items), diet (6 items), sleep (3 items), discomfort (9 items), health perceptions (6 items), coping with disease (3 items) and impact of stress (3 items). A higher overall score indicates a better HRQL status. After its validity and reliability were evaluated, it was translated into Italian, Hungarian and Spanish, adapted for US, English, French and Canadian patients [8, 14], and applied in many clinical trials [15–17]. This study aims to translate and cross-culturally adapt the English version of FDDQL into Chinese (Mandarin).

Methods

Participants

Expert Panel

The study group included one coordinator, four translators, three gastroenterologists, two nurses, two HRQL experts and one secretary. The group conducted and participated in each research stage under the guidance of the moderator (Prof. Liu Feng-bin) throughout the research process and translation procedures.

Patients

According to the Rome III diagnostic criteria, functional dyspepsia (FD) must include one or more of the following: (a) bothersome postprandial fullness, (b) early satiation, (c) epigastric pain, and (d) epigastric burning, together with no evidence of structural disease (including at upper endoscopy) that is likely to explain the symptoms. The FD patients were divided into Postprandial Distress Syndrome (PDS) and Epigastric Pain Syndrome (EPS) as defined by Rome III.

The inclusion criteria were: (a) FD as defined by Rome III, (b) age between 18 and 70 years, and (c) literacy in Chinese. The following were excluded: pregnant or lactating women, and FD patients with disturbance of consciousness, mental illness or other specific diseases that prevented them from comprehending the scale. The diagnostic, inclusion and exclusion criteria applied to every step of the study.

Permissions for the Use and Translation of FDDQL

The User Agreement (see Appendix 1) and Translation Agreement (see Appendix 2) for the English version of FDDQL (see Appendix 3) were obtained from MAPI RESEARCH TRUST by the corresponding author (Prof. Liu Feng-bin) on December 23rd 2008. Permission for the study was also obtained from the Research Ethics Committees of Guangzhou University of Chinese Medicine.

Forward Translations

Two Chinese translators (A and B) fluent in English independently translated the complete English version of FDDQL, including item content, response options and instructions, into Chinese (FWT-A and FWT-B). Translator A was a physician and researcher. His work was intended to produce a translation with more reliable equivalence from a measurement perspective. Translator B (naive translator) had no medical background. His task focused more on highlighting ambiguous meanings in the original questionnaire. Both produced written reports summarizing all difficulties encountered, choices made and remaining uncertainties.

Synthesis of the Translations

The aim was to arrive at a single version accepted by most participants. Coordinated by Prof. Liu Feng-bin, the two translations (FWT-A and FWT-B) were merged into one forward translation (FWT-A/B). All agreements and differences were identified, even very small ones such as a single word or punctuation mark. The agreements were accepted for further processing, whereas the differences were discussed item by item by the study group and three FD patients in multi-wave focus group meetings. If a disagreement was too difficult to resolve, alternative wording was suggested in the provisional forward translation for resolution through the backward translation process.

Seven items (Q17, Q19, Q20, Q31, Q33, Q39, Q40) were fully consistent between the two translations in this step. The other 36 items differed to some degree, mostly in synonyms, adjectives, prepositions, punctuation, word order and attributive adjuncts. For example, for Q2 ‘have you had to disrupt your daily activities?’, FWT-A read “您的消化问题会影响日常活动吗?” and FWT-B read “您的消化问题会扰乱您的日常活动吗?”. The words “影响” (affect) and “扰乱” (disrupt) are near-synonyms in Chinese, and FWT-B additionally qualified “日常活动” (daily activities) with “您的” (your). The two translations were highly similar. Two waves of focus group meetings were held, and no disagreement proved too difficult to resolve (see Table 1; FWT-A, FWT-B and FWT-A/B are available on request).

Table 1 The qualitative research procedure for functional digestive disorders quality of life questionnaire (FDDQL) translation and the results

Backward Translations

Blinded to the original English version of FDDQL, two translators (C and D) with a high level of fluency in English independently translated the single forward translation (FWT-A/B) back into English (BWT-C and BWT-D). Translator C was a physician and researcher whose objective was to provide equivalence from a more clinical perspective. Translator D (naive translator) had no medical background. His work aimed to detect the more subtle differences in meaning from the original and to offer a translation that reflected common Chinese usage.

With the help of a translation coordinator, the agreements and differences between the backward English translations and the original questionnaire were identified. Multi-wave discussions were then held until a single version (BWT-C/D) accepted by most participants was approved for pilot testing. Challenging phrases, uncertainties and the rationale for final decisions were recorded. An expert review (coordinated by Prof. Liu Feng-bin) was conducted to discuss and resolve any ambiguities in each translation version, and the pilot-testing version of FDDQL was then produced. The results are shown in Table 1. The BWT-C, BWT-D and BWT-C/D are available upon request.

Pilot Testing

The study group individually interviewed 30 FD patients with different educational levels using a semi-structured questionnaire. The interview focused on items that were difficult, confusing or offensive, and on alternative wordings. The study group then discussed the disagreements, comprehension, interpretability and suggestions for improvement, and the field-testing version of the Chinese FDDQL was produced (see Table 1). All the modified versions are available upon request.

Field Testing

Field testing was conducted to collect answers to each question for psychometric validation. A total of 327 consecutive adult patients diagnosed with FD were asked to participate in the study, and 300 of them completed the survey. Enrolment started in November 2009 and ended in April 2010 among patients attending the inpatient and outpatient departments of the First Affiliated Hospital of Guangzhou University of Chinese Medicine. All participants completed the Chinese version of FDDQL and a demographic questionnaire (covering age, gender, residence, highest education level, disease duration and disease subtype) once they were enrolled. The reliability, validity and responsiveness of the Chinese version of FDDQL, together with individual item properties under item response theory (IRT) and differential item functioning (DIF) analysis, were then tested using the collected questionnaires.

Of these, 100 FD patients were asked to answer the questionnaire a second time, after an interval of 1 or 2 days, to assess the test–retest reliability of the FDDQL. In addition, 100 participants who had not previously received therapy and were about to start it were asked to answer the questionnaire twice, once before therapy and again 2 weeks after beginning it, to assess the responsiveness of FDDQL. All the patients received the same therapeutic regimen. The 2-week period was adopted because clinical experience has shown that patients’ health status usually improves significantly with correct interventions in this interval.

To assess the criterion validity of FDDQL, the Chronic Gastritis Subscale of the Gastroenteric Disease Patient-Reported Outcome Scale (GEDPRO-CG, Chinese Version) was also completed at the first interview by at least 100 FD patients. GEDPRO-CG is a 31-item self-administered instrument that assesses the health status of chronic gastritis and FD patients. It uses 5-point Likert responses and contains four domains: physical (19 items), psychological (4 items), independent (4 items) and environment (4 items). Each item is scored from 1 (best) to 5 (worst), with higher scores indicating worse health status. Previous studies showed that it had good reliability, validity, responsiveness and item properties [18]. Furthermore, 110 healthy people were asked to answer FDDQL to assess its discriminant validity; of those, 100 completed the survey.

Data Analysis

Demographic and clinical variables of the participants were summarized using descriptive analyses. For reliability, internal consistency reliability, test–retest reliability and split-half reliability were examined. A Cronbach’s alpha coefficient of ≥ 0.70 was considered acceptable for internal consistency. A correlation coefficient of ≥ 0.70 was considered acceptable for test–retest reliability. The half-tests were created by assigning the odd-numbered items to one half and the even-numbered items to the other. The correlation between the scores of the two halves was adjusted to full scale length using the Spearman-Brown formula. A coefficient of ≥ 0.70 was considered acceptable for split-half reliability [19].
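The two item-based reliability coefficients described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the function names and the toy data are hypothetical, sample variances are assumed, and item scores are assumed to be already coded in the same direction.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / (sx * sy) ** 0.5


def split_half_reliability(rows):
    """Odd/even split, then the Spearman-Brown step-up 2r / (1 + r)."""
    odd_half = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson(odd_half, even_half)
    return 2 * r / (1 + r)


def is_acceptable(coefficient, threshold=0.70):
    """Apply the 0.70 acceptability cut-off stated in the text."""
    return coefficient >= threshold


# Hypothetical toy data: three respondents, two items.
toy = [[1, 2], [2, 1], [3, 3]]
print(cronbach_alpha(toy), split_half_reliability(toy))
```

For a two-item scale such as the toy data, the two coefficients coincide when the item variances are equal, which is a convenient sanity check.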

For validity, construct validity, criterion validity and discriminant validity were examined. For construct validity, correlation analysis and confirmatory factor analysis (CFA) were performed to test the hypothesized domain structure. A higher correlation of an item with its own domain than with other domains indicates good construct validity. Model fit statistics for the overall scale and for every domain were examined in CFA, as well as standardized regression coefficients (factor loadings) for each item. Good model fit is indicated when the Bentler comparative fit index (CFI) is above 0.90. In addition, a root mean square error of approximation (RMSEA) below 0.05 indicates good model fit, and below 0.08 acceptable model fit [20]. Criterion validity was calculated with Pearson correlation coefficients among all domains of FDDQL and GEDPRO-CG. Correlation values between 0.10 and 0.29 were considered weak, between 0.30 and 0.49 moderate, and between 0.50 and 1.00 strong. Discriminant validity was measured by the between-groups comparison of FD patients and healthy people. Responsiveness was measured by the within-group comparison of scores before and after treatment in FD patients.
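The cut-offs quoted above can be written as two small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the absolute value of the correlation is used, values below 0.10 are labelled "negligible" (a label the text does not define), and combining CFI and RMSEA into a single verdict is an illustrative choice rather than a rule from the study.

```python
def correlation_strength(r):
    """Classify a correlation by absolute size using the cut-offs in the text."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "weak"
    return "negligible"  # assumption: the text names no label below 0.10


def model_fit(cfi, rmsea):
    """Combine the CFI (> 0.90) and RMSEA (< 0.05 good, < 0.08 acceptable) rules."""
    if cfi <= 0.90:
        return "poor"
    if rmsea < 0.05:
        return "good"
    if rmsea < 0.08:
        return "acceptable"
    return "poor"


print(correlation_strength(-0.73), model_fit(0.902, 0.076))
```

Using `>=` on the lower bounds closes the small gaps between the published ranges (for example between 0.29 and 0.30).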

Item response theory (IRT) is a mathematical model-based approach used to understand the relationships between individuals’ HRQL (the latent trait) and their response patterns [21]. In IRT, the number of item parameters to be estimated determines which statistical model is used. IRT models can be divided into two families: unidimensional and multidimensional. Multidimensional IRT models describe response data hypothesized to arise from multiple traits. The FDDQL data were fitted to the partial credit model (PCM). Person separation index (PSI) values of 0.90 or greater indicate excellent measurement properties, and individual item fit residual statistics were considered acceptable when the value ranged from −2.5 to +2.5. The item fit residual statistics (Fit Resid) were tested with the chi-square test with Bonferroni correction [22].
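As an illustration of the partial credit model named above, the sketch below computes the category probabilities for one item in the standard Rasch-family form, where the probability of category x is proportional to the exponential of the cumulative sum of (θ − δ_j) over the first x thresholds. The function names and threshold values are hypothetical; this is not the estimation routine of the RUMM software, only the response function it is built on.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person location (latent trait); thresholds: item step
    parameters delta_1..delta_m. Returns m + 1 probabilities.
    """
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    largest = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - largest) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Expected item score (categories numbered 0..m) at a given theta."""
    probs = pcm_probabilities(theta, thresholds)
    return sum(category * p for category, p in enumerate(probs))


# Hypothetical thresholds for a 5-category (5-point Likert) item.
steps = [-1.0, -0.5, 0.5, 1.0]
print(pcm_probabilities(0.0, steps))
print(expected_score(-1.0, steps), expected_score(1.0, steps))
```

A 5-point item has four thresholds, and the expected score rises with θ, which matches the reading of low-threshold items as those on which a high score is easier to obtain.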

Differential item functioning (DIF) of each item was also evaluated. An item was regarded as having DIF if the response distributions differed between groups of people with the same HRQL (latent trait). If the item displayed a constant difference between groups across the whole range of HRQL, it was considered to display uniform DIF. If the differences occurred only at certain levels of HRQL, the item displayed non-uniform DIF. Both uniform and non-uniform DIF were checked.

Data description, reliability, validity and responsiveness of FDDQL were analyzed with SPSS 11.0. CFA was conducted using Lisrel software (version 8.7) [23]. IRT and DIF analyses were performed with the Rasch Unidimensional Measurement Models software (RUMM 2020) [24]. All statistical tests were two-tailed, and the level of significance was set at 5 %.

Results

Socio-Demographic and Disease Characteristics

A total of 327 FD patients and 110 healthy people were enrolled in the field study. Of those, 27 patients and ten healthy people who did not complete the survey because of inadequate time were excluded. In total, 300 FD patients and 100 healthy people were included in the data analysis. Of the patients, 100 each were included in the test–retest reliability, criterion validity and responsiveness analyses. The socio-demographic and disease characteristics of the different participant groups are shown in Table 2. There were no missing data in the item responses. The average completion time of FDDQL was 12.45 ± 3.13 min.

Table 2 The socio-demographic and disease characteristics of participants in field testing

Reliability

Three hundred patients’ data were used for internal consistency reliability and split-half reliability analysis, and 100 were used for test–retest reliability analysis. The global Cronbach’s α of the Chinese version of FDDQL was 0.932, and coefficients of eight domains ranged from 0.676 to 0.817. The global split-half reliability coefficient was 0.823 and coefficients of eight domains ranged from 0.703 to 0.820. As for test–retest reliability, all domains’ coefficients were greater than 0.9 except the health perceptions domain (r = 0.738) (Table 3).

Table 3 Scale reliability of Chinese version of functional digestive disorders quality of life questionnaire (FDDQL) (43 items, 8 domains)

Validity

Construct Validity

Items–domains correlation analysis showed that all items correlated more strongly with their own domains than with other domains (Table 4). CFA showed that the CFI of the global FDDQL was 0.902 and the RMSEA was 0.076, and the CFI values of the activities (0.950), anxiety (0.970), diet (0.960), sleep (0.965), discomfort (0.910), health perceptions (0.940), coping with disease (0.920) and impact of stress (0.905) domains were all greater than 0.9 (see Fig. 1; the complete CFA results are available upon request).

Table 4 Items–domains correlation analysis of Chinese version of functional digestive disorders quality of life questionnaire (FDDQL) (43 items, 8 domains, N = 300)

Fig. 1 Confirmatory factor analysis of global functional digestive disorders quality of life questionnaire (FDDQL) (43 items, question 1–43)

Criterion Validity

The criterion validity of the Chinese version of FDDQL was assessed by its correlations with GEDPRO-CG. Higher scores on FDDQL indicate better quality of life, whereas higher scores on GEDPRO-CG indicate worse health status. Consequently, strong negative correlations indicate good criterion validity. Almost all the Spearman rank correlation coefficients (86.11 %) were statistically significant (p < 0.05). The strongest correlation was between the activities domain of FDDQL and the psychological domain of GEDPRO-CG (r = −0.73), and the weakest was between the impact of stress domain of FDDQL and the psychological domain of GEDPRO-CG (r = −0.13) (Table 5).

Table 5 Criterion validity analysis between the Chinese version of the functional digestive disorders quality of life questionnaire (FDDQL) and chronic gastritis subscale in the gastroenteric disease patient-reported outcome scale (GEDPRO-CG, Chinese version, 31 items, 4 domains) (N = 100)

Discriminant Validity

The discriminant validity was assessed by comparing FDDQL scores between FD patients and healthy people. There were no significant differences in age (p = 0.766), gender (p = 0.686), residence (p = 0.286) or highest education level (p = 0.365) between the FD patients and healthy people. Scores for each domain ranged from 0 (poor quality of life) to 100 (good quality of life), and the healthy people had higher mean scale scores. The differences between FD patients and healthy people were significant in all domains (p < 0.001) (Table 6).

Table 6 Discriminant validity analysis of the Chinese version of the functional digestive disorders quality of life questionnaire (FDDQL) (43 items, 8 domains) with functional dyspepsia patients and healthy people (N = 100)

Responsiveness

The mean changes in FDDQL domain scores from baseline to 2 weeks were statistically significant (p < 0.05) (Table 7). Of those, the sleep domain demonstrated the greatest change in patient-perceived quality of life, with a mean change score of 10.42 ± 1.19 (p < 0.001). The effect size (ES) of FDDQL from baseline to 2 weeks was 0.49 and the standardized response mean (SRM) was 1.04.
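The text does not state how the two responsiveness indices were computed. A minimal sketch under the conventional definitions, which are an assumption here, takes ES as the mean change divided by the standard deviation of baseline scores and SRM as the mean change divided by the standard deviation of the change scores. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change, follow-up minus baseline."""
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of baseline scores (assumed definition)."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores (assumed definition)."""
    change = change_scores(baseline, follow_up)
    return mean(change) / stdev(change)


# Hypothetical scores on a 0-100 scale for three patients.
before = [40, 50, 60]
after = [50, 62, 68]
print(effect_size(before, after), standardized_response_mean(before, after))
```

Under these definitions the SRM exceeds the ES whenever change scores vary less across patients than baseline scores do, which is the pattern in the reported values (0.49 and 1.04).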

Table 7 Responsiveness analysis of the Chinese version of the functional digestive disorders quality of life questionnaire (FDDQL) (43 items, 8 domains) (N = 100)

Item Response Theory and Differential Item Functioning Analysis

All items were fitted in the IRT analysis using the partial credit model (PCM). The PSI was 0.920. The threshold estimates of the items, shown in the third column of Table 8, were normally distributed with a mean of 0 and an SD of 1.27. Item 31 had the lowest threshold estimate (Q31 = −2.04), meaning that “have you been satisfied with your digestion?” was the easiest item on which FD patients could obtain a high score. Item 3 had the highest threshold estimate (Q3 = 2.30), meaning that FD patients had the greatest difficulty obtaining a high score on “have you had any difficulties carrying out your leisure activities”. The fit residuals of all items were between −2.5 and 2.5, with no statistical significance, which indicated that the data were consistent with the theoretical model (the fourth and fifth columns of Table 8). All item factor loadings were statistically significant, and almost all were greater than 0.4 (see the second column of Table 8). The structural plot of observed and latent variables is shown in Fig. 1. Neither uniform nor non-uniform DIF was found for any item of the Chinese version of FDDQL across genders or age groups (≤30, 31–44, ≥45 years).

Table 8 Confirmatory factor analysis, item response theory and differential item functioning analysis of the Chinese version of the functional digestive disorders quality of life questionnaire (FDDQL) (43 items, 8 domains) (N = 300)

Discussion

The abdominal pain or discomfort caused by functional dyspepsia (FD) interferes with daily activities and brings considerable anxiety and depression to patients. Assessing the HRQL of FD patients is therefore essential. The FDDQL scale was developed by a collaboration of French, English and German researchers and has been widely used in many countries. To date, it has been translated into English (for Canada, UK, USA), French (for Canada), German (for Germany), Hungarian, Italian (for Italy), Russian (for Russia) and Spanish (for Spain). Given the growing number of FD patients, a scale with adequate psychometric characteristics for measuring quality of life needs to be developed or introduced, which made the development of the Chinese version of the FDDQL necessary. Patients’ self-evaluation of their HRQL may provide insight into appropriate measures for treatment and care. This study describes the translation of FDDQL into Chinese and its validation (see Appendices 4 and 5).

Psychometric Properties

The Chinese version of the FDDQL has good reliability. Internal consistency analysis showed that the Cronbach’s α of the global FDDQL was excellent (0.932), with each domain greater than 0.7 except sleep (0.676). The lower value for sleep may be due to its small number of items (3 items). The results were consistent with previous studies in which Cronbach’s α ranged from 0.69 to 0.89 [3]. The split-half reliability coefficient of the Chinese version of FDDQL was 0.823, with each domain greater than 0.7. In the test–retest reliability analysis, all the coefficients of FDDQL domains were greater than 0.9 except coping with disease (0.738).

In the validity analysis, the correlation coefficients of all items with their own domains were higher than those with the other domains. In addition, the confirmatory factor analysis (CFA) model was used to reflect the relationship between the latent variables and the items. The coefficient of determination in the CFA was 0.42, meaning that the structural model explained 42 % of the variation of the dependent variable. The CFI of the overall model was 0.902 and the RMSEA was 0.076, which indicated that the model was consistent with the theoretical construct. Criterion validity was mainly supported by the pattern of correlations between FDDQL and GEDPRO-CG. The GEDPRO-CG scale was developed following a standard procedure and contains physical, psychological, independent and environment domains. The physical and psychological domains of FDDQL correlated significantly and strongly with the physical and psychological domains of GEDPRO-CG, in contrast to the independent and environment domains of GEDPRO-CG. This was consistent with the original research [3]. The discriminant capacity of the FDDQL questionnaire was also excellent, because the patients reported significantly lower scores than healthy people.

The responsiveness of FDDQL was also confirmed. After FD patients received treatment, their symptoms and psychological status improved, and almost all domain scores increased significantly. The result was similar to that of a previous study in which scores in most FDDQL domains increased significantly after a 7-day intervention [25].

The person separation index (PSI) of FDDQL was 0.920. The threshold estimates of the items were normally distributed with a mean of 0 and an SD of 1.27. The fit residuals of the items were between −2.5 and 2.5, with no statistical significance. All the items were invariant (no item had uniform or non-uniform DIF) across genders and age groups (≤30, 31–44, ≥45 years old). This means that FD patients of different genders and age groups respond similarly when their disease is of similar severity.

Strengths and Limitations

The main strength of this study is that the questionnaire was psychometrically evaluated for internal consistency reliability, test–retest reliability, split-half reliability, construct validity, criterion validity, discriminant validity and responsiveness, and with confirmatory factor analysis, item response theory and differential item functioning analysis. The assessment was comprehensive, and all the results indicated that the Chinese version of FDDQL has good properties. The other strength is that the questionnaire was translated with a rigorous procedure comprising study group establishment, acquisition of permissions, forward translations, synthesis of the translations, backward translations, pilot testing and field testing. The comprehensive psychometric evaluation and rigorous translation procedure support the scientific soundness of the Chinese version of FDDQL.

The main limitation of the study is that irritable bowel syndrome (IBS) patients, another intended population for FDDQL, were not included; however, further studies with IBS patients are in progress.

Conclusion

The Chinese (Mandarin) version of the functional digestive disorders quality of life questionnaire (Chi-FDDQL) was translated according to the standard process, specifically forward translation, backward translation, pilot testing and field testing. The survey data indicate that Chi-FDDQL has good reliability, validity and responsiveness, as well as good psychometric characteristics in item response theory and differential item functioning analysis. We recommend Chi-FDDQL for measuring the health status of Chinese FD patients.