Background

Inflammatory bowel diseases (IBD), which encompass Crohn’s disease (CD) and ulcerative colitis (UC), are characterized by chronic, uncontrolled and relapsing inflammation of the gastrointestinal tract. Health-related quality of life (HRQoL) is defined as a broad, multidimensional concept comprising patients’ physical health (including disease), psychological state, level of independence, social relationships, personal beliefs and relationship to their environment [1, 2]. Evaluating the HRQoL of patients with IBD in clinical research and clinical practice enhances the understanding of the impact of the disease and of the effects of treatments. Thus, HRQoL should be recognized as an important outcome indicator by patients and their clinicians.

To date, a large number of IBD-specific HRQoL instruments have been developed and validated for IBD patients [3,4,5,6,7]. These instruments have been used to assess patients’ understanding of IBD symptoms and the subjective perception of the illness in clinical practice and research [3, 4]. They have also been used to compare the effects of treatment strategies and to provide evidence for health policy makers [3,4,5].

Several researchers have reviewed the instruments that measure the HRQoL of patients with IBD [3,4,5,6,7,8]. However, these reviews covered only some of the instruments and commonly overlooked others. The measurement properties of the instruments, and the methodological quality with which those properties were evaluated, should be assessed systematically for clinical practitioners and researchers. We therefore aimed to collect all eligible IBD-specific HRQoL instruments and to critically appraise and compare their measurement properties, in order to help clinicians and researchers select an appropriate instrument.

Methods

Inclusion and exclusion criteria

This study was conducted following the guideline of the preferred reporting items for systematic reviews and meta-analysis (PRISMA statement) [9]. Articles were included if they fulfilled the following criteria: (1) Types of patients: Patients diagnosed with CD, UC or IBD were enrolled. Patients with other diseases (infectious colitis, ischemic colitis, irritable bowel syndrome, etc.) were excluded. (2) Types of instruments: HRQoL instruments developed and validated for patients with CD, UC or IBD were eligible. HRQoL was defined as a broad, multidimensional concept comprising patients’ physical health (including disease), psychological state, level of independence, social relationships, personal beliefs and relationship to their environment. Both self-administered and rater-administered instruments were included. Instruments for child or adult patients were included. (3) Types of languages: Only full-text articles published in English were included. General HRQoL instruments, such as the SF-36, were excluded. Disease-specific instruments not related or only partially related to IBD, such as the gastrointestinal quality of life index [10], were also excluded.

Literature search

The following electronic databases were searched for English-language articles: Medline (via PubMed) and EMBASE. The search period was from the inception of the databases to May 31st, 2016. The search strategy for Medline (see Additional file 1: Appendix S1) consisted of 3 groups of search terms covering: (1) IBD, UC or CD; (2) HRQoL; and (3) measurement properties. The latter two filters were developed according to the syntax established by Kotecha et al. [11].
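A three-block strategy of this kind is usually assembled by joining synonyms within a block with OR and joining the blocks with AND. The sketch below illustrates only that structure. The term lists and the helper names `or_block` and `build_query` are hypothetical placeholders; they are not the terms in Additional file 1: Appendix S1 and not the filters of Kotecha et al.

```python
# Illustrative only: these term lists are placeholders, NOT the strategy
# reported in Additional file 1: Appendix S1.

def or_block(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*blocks):
    """Join the concept blocks with AND so a record must match every block."""
    return " AND ".join(or_block(block) for block in blocks)


disease = ["inflammatory bowel disease", "ulcerative colitis", "Crohn disease"]
construct = ["quality of life", "HRQoL"]
properties = ["reliability", "validity", "responsiveness"]

query = build_query(disease, construct, properties)
print(query)
```

Keeping each concept in its own list makes it straightforward to adapt one block (for example, the measurement-property filter) without touching the others.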

In addition, Google Scholar was used to search for relevant articles and literature. The citations of the reviews and the references of included articles were also checked. The patient-reported outcome and quality of life instruments database (website: https://eprovide.mapi-trust.org/) was searched for eligible instruments. Two review authors (XLC, FBL) independently performed the literature search. Disagreements between the two authors were resolved by discussion with another author (LHZ).

Literature extraction

A set of questions regarding the characteristics of the instruments was drafted. The questions were as follows: Which type of disease does the instrument assess (IBD, UC or CD)? How is the instrument administered (self-administered or rater-administered)? How long does it take to complete (completion time)? Over what period does it measure the HRQoL of the patients (recall period)? How many items does it contain? What is the form of the items (response options, including Likert or visual analogue scale [VAS])? What is the range of the scores? What domains does it contain? Are classical test theory and item response theory applied? Data about the first author, year of publication, the full and abbreviated names of the instrument and the country of origin (the first version) were also collected.

The methodological quality of measurement properties was assessed according to the consensus-based standards for the selection of health measurement instruments (COSMIN) checklist with a 4-point scale [12,13,14]. The COSMIN checklist covers the following measurement properties: internal consistency, reliability (test-retest reliability), measurement error, content validity, structural validity, hypothesis testing, cross-cultural validity, criterion validity and responsiveness. For each instrument, the measurement properties were rated as “poor”, “fair”, “good” or “excellent” based on predefined criteria [12,13,14]. The definitions of the measurement properties based on the COSMIN checklist are shown in Additional file 1: Appendix S2. The following measurement properties of the instruments were also evaluated: reliability (internal consistency, test-retest reliability), content validity (interviews/focus groups, pilot test), structural validity (convergent/divergent, discriminant), criterion validity and responsiveness.
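The text above does not spell out how the item-level ratings within one COSMIN box are combined into a single rating. The published COSMIN 4-point scoring system takes the lowest item rating in the box ("worst score counts"), and the minimal sketch below assumes that rule. The function name `box_score` and the example ratings are hypothetical and are not data from this review.

```python
# Assumption: a box rating is the lowest of its item ratings
# ("worst score counts"), as in the published COSMIN 4-point scoring system.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def box_score(item_ratings):
    """Return the lowest rating among the items of one COSMIN box."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one box of one instrument.
example_items = ["excellent", "good", "fair", "excellent"]
print(box_score(example_items))  # fair
```

Under this rule a single weakly reported design item caps the rating of the whole box, which is one reason older validation studies can score low.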

The methodological quality of measurement properties was based on the original version, except that cross-cultural validity was based on the translated versions. Two of the three review authors (XLC, LHZ or YW) independently performed the article selection, screened and extracted the characteristics of the instruments and assessed the measurement properties. Disagreements between the two authors were resolved by discussion with another author (TWL or XYL).

Results

In total, 2075 articles were identified through the search, and 155 potentially relevant articles were included for full-text evaluation (Fig. 1). After manual evaluation of the full text, 19 IBD-specific HRQoL instruments were identified. Four of them, the Crohn’s and colitis quality of life questionnaire [15], the inflammatory bowel disease impact and symptom scales [16], the Crohn’s disease patient-reported outcomes [17] and the ulcerative colitis patient-reported outcomes [18], were excluded due to the lack of full text. Finally, 15 articles investigating 15 IBD-specific instruments were included [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33]. Among them, three instruments were for paediatric IBD patients, and the others were for adult IBD patients.

Fig. 1 Flow chart of the search strategy

The basic characteristics of the included instruments are shown in Table 1. The quality-of-life index for pediatric inflammatory bowel disease (IMPACT) [19, 34], IMPACT-II [20, 35] and IMPACT-III [21, 36] formed the IMPACT series, which was developed for paediatric IBD patients. The 32-item inflammatory bowel disease questionnaire (IBDQ-32) [22, 37], the 36-item inflammatory bowel disease questionnaire (IBDQ-36) [24], the short inflammatory bowel disease questionnaire (SIBDQ) [23, 38] and the 9-item inflammatory bowel disease questionnaire (IBDQ-9) [25] formed the IBDQ series. All the instruments were developed for patients with IBD, except the Crohn’s life impact questionnaire (CLIQ), which was developed for patients with CD [33]. All of the instruments were developed in North American and European countries. All of the instruments were self-administered. Four instruments also had rater-administered versions [22, 23, 25, 29]. Nine instruments used Likert scales as response options [21,22,23,24,25, 27, 28, 31, 32], and the others used VAS.

Table 1 Characteristics of the included instruments

The number of domains in the 15 instruments varied from 1 to 6 (Table 2). Among the paediatric IBD instruments, the IMPACT series contained four domains: IBD-related symptoms, physical functioning, emotional functioning and social functioning. Among the adult IBD instruments, some contained the above four domains, whereas others contained only one or two domains. In total, 55 domains were obtained from all the instruments. (1) Among them, 19 domains were about IBD-related symptoms, which contained bowel or intestinal symptoms (10 domains), systemic symptoms or impairment (6 domains), other symptoms (2 domains) and disease complications (1 domain). (2) Fifteen domains were related to physical functioning or general wellbeing, comprising general quality of life or general wellbeing (5 domains), body image or body stigma (4 domains), functional functioning or impairment (2 domains), energy (2 domains), activity limitations (1 domain) and sexual intimacy (1 domain). (3) Nine domains were about emotional functioning, comprising emotional functioning or impairment (6 domains), disease-related worry (2 domains) and embarrassment (1 domain). (4) Ten domains were about social functioning, containing social functioning (6 domains), social impairment (2 domains) and treatment (2 domains). The remaining two domains were information [31] and the total score of the IBDQ-9 (unidimensional) [25].
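The domain counts reported in this paragraph can be cross-checked with a short tally. The sketch below uses only the figures stated in the text. The dictionary layout and the label "other" for the two remaining domains are choices made for the illustration.

```python
# Domain counts as reported in the paragraph above; "other" is an
# illustrative label for the two remaining domains.
domain_counts = {
    "IBD-related symptoms": {
        "bowel or intestinal symptoms": 10,
        "systemic symptoms or impairment": 6,
        "other symptoms": 2,
        "disease complications": 1,
    },
    "physical functioning or general wellbeing": {
        "general quality of life or general wellbeing": 5,
        "body image or body stigma": 4,
        "functional functioning or impairment": 2,
        "energy": 2,
        "activity limitations": 1,
        "sexual intimacy": 1,
    },
    "emotional functioning": {
        "emotional functioning or impairment": 6,
        "disease-related worry": 2,
        "embarrassment": 1,
    },
    "social functioning": {
        "social functioning": 6,
        "social impairment": 2,
        "treatment": 2,
    },
    "other": {
        "information": 1,
        "IBDQ-9 total score": 1,
    },
}

# Sum the sub-domains of each category, then sum the categories.
category_totals = {name: sum(sub.values()) for name, sub in domain_counts.items()}
grand_total = sum(category_totals.values())

for name, total in category_totals.items():
    print(f"{name}: {total}")
print(f"all domains: {grand_total}")
```

The category totals (19, 15, 9, 10 and 2) add up to the 55 domains reported.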

Table 2 Domains of the included instruments

The methodological quality of measurement properties based on the COSMIN checklist with 4-point scale ratings is shown in Table 3. All of the instruments were developed and assessed based on classical test theory. Item response theory was also used in the IBDQ-9 and CLIQ. (1) Most of the instruments scored “excellent” or “good” for content validity. The items of these instruments were mainly derived from interviews with patients, review of the literature and professional experience. A pilot study was used to ensure the applicability of the items in seven instruments. The domains of these instruments mainly covered IBD-related symptoms and physical, emotional and social functioning (Table 2). For example, the IBDQ-32 contained bowel symptoms, systemic symptoms, emotional and social domains [22]. (2) Most of the instruments scored “good” or “fair” for internal consistency, reliability, structural validity, hypotheses testing and criterion validity. For example, structural validity was rated in 12 instruments. Among them, two instruments scored “excellent” [25, 33], three scored “good” [21, 26, 31], five scored “fair” [19, 20, 22,23,24] and two scored “poor” [29, 30]. (3) Most of the instruments scored “fair” or “poor” for measurement error, responsiveness and cross-cultural validity. The reasons for responsiveness scoring “fair” or “poor” included the following: the magnitude of the correlations or differences was not stated; and the criterion for change could not be considered a reasonable gold standard. The reasons for cross-cultural validity scoring “poor” or “fair” included the following: whether the two translators worked independently was not reported; whether the items were translated forward and backward was not reported; how differences between the original and translated versions were resolved was not described in detail; the cultural relevance of the translation was not checked; and differential item functioning between language groups was not assessed.

Table 3 COSMIN checklist with 4-point scale ratings of the included instruments

The measurement properties of the instruments are shown in Table 4. (1) The IMPACT series instruments (IMPACT, IMPACT-II and IMPACT-III) were used to assess the HRQoL of paediatric IBD patients. The IMPACT series instruments, especially IMPACT-II and IMPACT-III, had good content validity and were translated into other languages. They were easily administered and contained the main domains (symptoms, physical, emotional and social domains). (2) The IBDQ-32 had good measurement properties (content validity) and was proven to be valid, reliable and responsive. The IBDQ-32 contained the main domains: symptom, social and emotional domains. Furthermore, the IBDQ-32 was the most widely used instrument and was translated and back-translated into a variety of languages. (3) The rating form of IBD patient concerns (RFIPC) had good content validity and internal consistency and acceptable responsiveness. Although the original version did not report responsiveness, its responsiveness was confirmed in the translated version [39]. The RFIPC contained symptoms and physical domains but did not contain emotional or social domains. (4) The SIBDQ, IBDQ-9, Cleveland global quality of life (CGQL), short health scale (SHS), Edinburgh inflammatory bowel disease questionnaire (EIBDQ) and Crohn’s and ulcerative colitis questionnaire (CUCQ) were short instruments, which were all easily administered and could be completed in a short time. The IBDQ-9, SIBDQ, CUCQ and SHS had good measurement properties. The SIBDQ and IBDQ-9 were short versions of the IBDQ-32 and IBDQ-36, respectively. The SIBDQ was used in the UK, the US, Germany and Spain [40,41,42,43]. The SIBDQ contained symptoms, emotional and social domains. The IBDQ-9, which contained only one domain (total score), was used in Spain and Iran [25, 44]. The SHS contained symptom burden, general wellbeing, disease-related worry and social functioning. The SHS was used in England, Norway and Sweden [45,46,47]. The CUCQ was used only in the UK and should be further evaluated in other languages [32]. (5) For the IBDQ-36, the Cleveland clinic questionnaire for inflammatory bowel disease (CCQIBD) and Padova inflammatory bowel disease quality of life (PIBDQL), limited evidence was available for their measurement properties.

Table 4 Measurement properties of the included instruments

The translated versions of the instruments are shown in Table 5. (1) For the instruments of paediatric IBD, the IMPACT-II had 3 translated versions [48,49,50]. The IMPACT-III had 4 translated versions [51,52,53,54]. (2) For the instruments of adult IBD, the IBDQ-32 and RFIPC were the most widely used worldwide. The IBDQ-32 has been translated and validated in 93 languages [55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70] and was found to be reliable and valid in some languages. The IBDQ-32 was also used as an important outcome in randomized controlled trials [71,72,73,74,75]. The RFIPC had at least 6 translated versions [76,77,78,79,80,81,82]. The SIBDQ had 4 translated versions [40,41,42,43]. The IBDQ-9 [44], IBDQ-36 [83, 84], PIBDQL [85] and CGQL [86] also had translated versions.

Table 5 Translated versions of the instruments

Discussion

The present review provides an overview of 15 IBD-specific HRQoL instruments with respect to their measurement properties and methodological quality based on the COSMIN checklist.

According to the results of the COSMIN checklist, most of the instruments were not evaluated for all of the measurement properties. Only content validity was assessed properly in most of the included instruments. Most of the instruments scored “good” or “fair” for internal consistency, reliability, structural validity, hypotheses testing and criterion validity. Evidence regarding measurement error, responsiveness and cross-cultural validity was limited or rated poor, either because the studies did not reach the required criteria or because they reported insufficient information. Our results were consistent with COSMIN-based appraisals of other instruments, such as irritable bowel syndrome-specific QOL instruments [87]; rheumatoid arthritis-specific QOL instruments [88]; and QOL instruments for infants, children and adolescents with eczema [89]. Most of the IBD-specific instruments did not show adequate methodological quality. One reason for this was that most of the IBD-specific HRQoL instruments were developed before 2010, whereas the COSMIN guidelines were developed around 2010 [12,13,14]. Therefore, older articles could not follow the COSMIN guidelines, and their measurement properties might be underestimated.

Based on the results of the measurement properties and translated versions of the included instruments, some instruments had good psychometric characteristics and were widely used. (1) For paediatric IBD-specific instruments, most of the measurement properties were tested properly, especially for the IMPACT-III [21]. The IMPACT-III had the same items as the IMPACT-II. However, the IMPACT-III used a 0–4 Likert scale, which was easily understood by children. The IMPACT-III had at least 4 translated versions [51,52,53,54]. The IMPACT-III was recommended to assess the HRQoL of paediatric IBD patients. (2) For the adult IBD instruments, the IBDQ-32 and SIBDQ (short version of the IBDQ-32) had good measurement properties. The two instruments had excellent content validity and proved to be valid, reliable and responsive. They contained symptoms, emotional and social domains and were used widely. The IBDQ-32 has been translated and validated in 93 languages. The SIBDQ was used in the UK, the US, Germany and Spain [40,41,42,43]. The IBDQ-9, CGQL, SHS, EIBDQ and CUCQ were all short instruments, which had relatively high methodological quality. However, they had fewer translated versions. The IBDQ-36, CCQIBD, PIBDQL, CGQL and EIBDQ had the lowest measurement properties. The PIBDQL and CGQL instruments were developed and assessed in IBD patients receiving surgery, and they were translated into other languages. The EIBDQ had not been translated into other languages, which limited its use.

Compared with reviews of IBD-specific instruments published by other authors [3,4,5,6,7,8], our review had the following advantages. (1) Our review included more eligible IBD-specific HRQoL instruments. For example, the review conducted by Alrubaiy et al. enrolled 10 instruments [8]. Among them, only five were HRQoL instruments, while the others were burden or disability instruments, such as the Crohn’s disease burden questionnaire, the IBD disability score and the IBD disability index. (2) Our review fully evaluated the measurement properties, including content validity, internal consistency, test-retest reliability, measurement error, convergent/divergent validity, discriminant validity, criterion validity, cross-cultural validity and responsiveness. Previous reviews did not evaluate criterion validity, discriminant validity or cross-cultural validity for each instrument [8]. Criterion validity and discriminant validity are important features of an instrument. Criterion validity reflects the extent to which scores on a particular instrument relate to a gold standard. Discriminant validity refers to how well the scale can discriminate between different features of the participants.

All of the IBD-specific instruments were developed in North American and European countries. This is likely because the highest incidence and prevalence rates of IBD are in Europe and North America [90]. Another reason might be associated with the popularity of the QOL concepts and the standard procedure for QOL development [91, 92]. In developing countries, researchers mainly focused on translating and back-translating the IBD-specific instruments and used them to assess the QOL of IBD patients.

Although there was a lack of consensus regarding the specific domains among all of the instruments, the common domains measured in the instruments were identified: IBD-related symptoms, physical functioning or general wellbeing, emotional functioning and social functioning. These domains were consistent with the concepts of the common scales, such as the WHOQOL and FACT-G [92,93,94]. The typical manifestations of IBD include diarrhea with blood, fever, abdominal pain and malnutrition. Because these symptoms occur most frequently, the symptom domains contribute the most important information to the IBD-specific instruments.

The limitations of this study were as follows: (1) Non-English articles were not included because of the language restriction; thus, the restriction may have limited the evidence, particularly negative evidence, available to this study; (2) Only articles on the original-language versions were used to assess the measurement properties of the included instruments, and articles on translated versions were not used for this assessment; and (3) Some articles about clinical trials may have been excluded from this review, which limited the ability to examine responsiveness.

Conclusions

This review guides the use of IBD-specific HRQoL instruments and helps clinicians and researchers choose appropriate IBD instruments. The measurement properties scored low for some IBD-specific HRQoL instruments. Based on the characteristics, measurement properties and applications of the instruments, the IBDQ-32 was the most widely used and had the strongest evidence of being reliable, valid and responsive for adult IBD patients. As a short instrument, the SIBDQ also had good measurement properties and was widely used. The IMPACT-III had good measurement properties and was widely used for paediatric IBD patients. For new instruments to be used worldwide, they should be developed according to standard procedures (for example, the COSMIN), and their measurement properties should achieve excellent or good ratings. New instruments for IBD should take into account IBD-related symptoms and physical, emotional and social domains.