Introduction

Injuries are a leading cause of death in children aged 1–18 years [1, 2]. The survival rate of children with major trauma is about 80% [2, 3]. Injuries can cause severe functional impairment and psychosocial problems in both the short and the long term [4–9]. Despite the prominent role of major trauma in mortality and morbidity in children, relatively little research has addressed the quality of life of children after major trauma. Most studies focus on the consequences of brain injury [10–12], whereas the quality of life in pediatric major trauma remains relatively unexplored. Van der Sluis and colleagues described the long-term outcome in pediatric polytrauma patients in 1997 [13]. Nine years after trauma, the RAND-36 was administered to patients 18 years of age or older. The quality of life reported by these patients did not differ from that of a healthy reference population. Holbrook et al. recently studied the quality of life in adolescents 3, 6, 12, 18, and 24 months after major trauma with the Quality of Well-being Scale [14]. Significant deficits in quality of well-being were found in adolescents after major trauma compared with US norms for healthy adolescents.

Many terms are used to describe quality of life in health care, for example health-related quality of life (HRQL), well-being, health status, and functional status. In this review, the definition of HRQL as described by the World Health Organization (WHO) is adopted. The WHO defines HRQL as individuals’ perception of their position in life in the context of the culture and value systems in which they live, and in relation to their goals, expectations, standards, and concerns [15]. To study HRQL in pediatric trauma patients, a decision first has to be made about which measure to use. Currently, many HRQL measures for children are available. Some measures are disease specific, whereas others are generic. Unfortunately, no trauma-specific HRQL measure has been developed for children, leaving generic measures as the first choice. Comparison of the available measures enables a well-considered decision.

The aim of this review is to provide an overview of the available measures of HRQL for long-term follow-up in children after major trauma so that measures can be selected that are suitable for a large age range, are valid and reliable, and cover a substantial amount of the content of the International Classification of Functioning, Disability, and Health (ICF) of the WHO [16].

Methods

Literature search

Medline and EMBASE databases were searched in all years up to October 2007 for measures of HRQL in children. The following search was entered: [(child* OR pediatr* OR paediatr* OR adolesc*) AND (quality of life OR health status) AND (psychometr* OR validity OR reliability OR cronbach OR test–retest)]. In Medline the extension (Title/abstract) was added to all terms to specify the search. Inclusion criteria were: (1) validation study of a generic HRQL measure in children in a Western country, (2) the measure is suitable for children in the age range of 5–18 years, (3) the paper is written in English or Dutch, (4) the measure has an English or Dutch version, (5) the measure has both validity and reliability reported.
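The structure of the reported search (three OR-groups joined by AND) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the bracketed form of the Title/abstract restriction is an assumption, because the exact tag syntax that was entered is not reported here.

```python
# Term groups copied from the search string reported above.
POPULATION = ["child*", "pediatr*", "paediatr*", "adolesc*"]
CONSTRUCT = ["quality of life", "health status"]
PSYCHOMETRICS = ["psychometr*", "validity", "reliability", "cronbach", "test–retest"]


def or_group(terms, field=None):
    """Join terms with OR; optionally append a field tag to every term."""
    tagged = [f"{term}[{field}]" if field else term for term in terms]
    return "(" + " OR ".join(tagged) + ")"


def build_query(groups, field=None):
    """Join the OR-groups with AND, mirroring the structure of the reported search."""
    return " AND ".join(or_group(group, field) for group in groups)


GROUPS = [POPULATION, CONSTRUCT, PSYCHOMETRICS]

# Query without field restriction.
embase_query = build_query(GROUPS)

# Assumption: the Medline restriction is written as a bracketed tag on each term.
medline_query = build_query(GROUPS, field="Title/Abstract")

print(embase_query)
print(medline_query)
```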

Measure comparison

The measures were reviewed on four levels: (1) age range, (2) reliability, (3) validity, and (4) the content related to the ICF. The underlying idea for the measure comparison on these four levels is as follows:

Age range

When a measure is suitable for a large age range, fewer measures are needed to study a cohort, so a better comparison can be made over time and between subjects of different ages. In this review, a large age range is defined as at least 10 years covered. Measures were selected that were suitable for children who were >5 years old, because it is hypothesized that 5 years after trauma most of the recovery that can be expected has taken place and that the child is in a relatively stable situation. Some measures have different versions for different age categories. When these versions were similar in content, the age ranges of the different versions were combined. When they had a different number of questions or a different scoring system, the versions were considered separate measures and the age ranges were not combined.
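The age-range rule can be sketched as below. Two points are assumptions rather than statements from the text: the span is counted with both end ages included (which is consistent with a 6–15 year range being classified as covering 10 years in the Results), and the version ranges in the usage example are invented for illustration only.

```python
MINIMUM_YEARS = 10  # "large age range" as defined in this review


def years_covered(low, high):
    """Number of ages covered, counting both end ages (assumption)."""
    return high - low + 1


def has_large_age_range(low, high, minimum=MINIMUM_YEARS):
    """True when the validated range covers at least `minimum` years."""
    return years_covered(low, high) >= minimum


def combined_range(version_ranges):
    """Combine the ranges of versions that are similar in content.

    `version_ranges` is a list of (low, high) tuples, one per version.
    Versions with different item counts or scoring systems should not be
    passed together; they are treated as separate measures.
    """
    lows = [low for low, _ in version_ranges]
    highs = [high for _, high in version_ranges]
    return min(lows), max(highs)


# Hypothetical measure with two similar versions (illustrative values).
low, high = combined_range([(8, 12), (13, 18)])
print(low, high, has_large_age_range(low, high))
```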

Reliability

On the second level, the internal consistency and the test–retest reliability of the HRQL measures were compared. In this review, a measure was considered reliable when it reached at least group comparison level for internal consistency (Cronbach’s alpha > 0.70) [17] and had substantial test–retest reliability [kappa, intraclass correlation coefficients (ICC), Spearman or Pearson correlation coefficient >0.60] [18]. A measure was found reliable when at least 80% of the measurements of reliability exceeded the set levels.
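A minimal sketch of this decision rule follows. It assumes that the 80% requirement is applied separately to the internal consistency coefficients and to the test–retest coefficients; the text does not state whether the two sets are pooled. The function names and the example coefficients are hypothetical.

```python
ALPHA_MIN = 0.70       # group comparison level for Cronbach's alpha
RETEST_MIN = 0.60      # substantial test-retest reliability
REQUIRED_SHARE = 0.80  # share of coefficients that must exceed the level


def share_above(values, threshold):
    """Fraction of reported coefficients strictly above the threshold."""
    if not values:
        raise ValueError("no coefficients reported")
    return sum(value > threshold for value in values) / len(values)


def is_reliable(alphas, retest_coefficients):
    """Apply the 80% rule to each type of reliability (assumption: not pooled)."""
    return (
        share_above(alphas, ALPHA_MIN) >= REQUIRED_SHARE
        and share_above(retest_coefficients, RETEST_MIN) >= REQUIRED_SHARE
    )


# Invented coefficients for illustration only.
example_alphas = [0.82, 0.75, 0.71, 0.68, 0.90]  # 4 of 5 exceed 0.70
example_retest = [0.65, 0.72, 0.58]              # 2 of 3 exceed 0.60
print(is_reliable(example_alphas, example_retest))
```

In this invented example the internal consistency criterion is met (80%), but the test–retest criterion is not (67%), so the measure would not be classified as reliable.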

Validity

Comparison of the validity of a measure is a complicated matter, because there are many ways to describe it. Validity can be divided into content and construct validity. A method often used to describe content validity of an HRQL measure is the ability to differentiate between healthy subjects and children with a disease. Construct validity can be described, for example, by factor analysis, by the correlation of a measure with other instruments that aim to measure similar or different constructs, and by the correlation with preknown information or clinical symptoms. In this review, an attempt was made to give an overview of the content and construct validity for all included HRQL measures.

Content related to the ICF

The fourth and final level of comparison included the content of the questionnaires. This comparison was made in light of the ICF [16] (Fig. 1). It is a model in which health condition is defined by three domains: body functions and structures, activities, and participation. These domains are divided into chapters, as listed in Table 5. To compare the content of the questionnaires, all items were placed in one of the ICF chapters. If an item encompassed different constructs, the item was placed in more than one chapter. For example, the fourth item of the Health Utilities Index Mark 2 (HUI 2) “Learns and remembers school work normally for age,” encompassed two constructs: “learn” and “remember.” These two constructs were placed in two different chapters, namely, the first chapter of activities and participation and the first chapter of body functions and structures, respectively. If the content of the item did not fit in one of the chapters, the item was placed in the category “other”.

Fig. 1 International Classification of Functioning, Disability and Health (ICF) of the World Health Organization (WHO) [12]

Placement of items in the chapters of the ICF was done by three researchers independently. One of them (LJ) placed the items of all measures, whereas the other two (MK and MB) each placed the items of seven measures. As a result, every item was placed by two researchers. In case of disagreement, a discussion followed, led by a fourth independent person (JWG), who made the final decision on the ICF chapter in which the item was placed. The number of chapters covered by the items was used as a measure for covering the ICF. In this review, a measure was found to represent the ICF substantially when the items covered more than six chapters.
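The placement and counting procedure can be sketched as follows. The chapter labels ("AP-1", "BFS-1"), the item identifiers, and the function names are hypothetical; only the HUI 2 example item and the "more than six chapters" rule come from the text. The sketch also assumes that items in the category "other" do not count toward coverage.

```python
OTHER = "other"


def resolve_placements(rater_a, rater_b, adjudicated):
    """Return the final chapter set per item.

    `rater_a` and `rater_b` map an item id to a frozenset of chapter labels.
    Where the two raters disagree, the adjudicator's decision is used.
    """
    final = {}
    for item, chapters_a in rater_a.items():
        if chapters_a == rater_b[item]:
            final[item] = chapters_a
        else:
            final[item] = adjudicated[item]
    return final


def chapters_covered(final):
    """Distinct ICF chapters covered by the items (assumption: 'other' excluded)."""
    covered = set()
    for chapters in final.values():
        covered |= {chapter for chapter in chapters if chapter != OTHER}
    return covered


def covers_icf_substantially(final, more_than=6):
    """True when the items cover more than six chapters."""
    return len(chapters_covered(final)) > more_than


# HUI 2 item 4 ("learn" and "remember") falls in two chapters, as described above.
rater_a = {"hui2_item4": frozenset({"AP-1", "BFS-1"}), "item_x": frozenset({"AP-4"})}
rater_b = {"hui2_item4": frozenset({"AP-1", "BFS-1"}), "item_x": frozenset({OTHER})}
adjudicated = {"item_x": frozenset({"AP-4"})}  # hypothetical final decision

final = resolve_placements(rater_a, rater_b, adjudicated)
print(sorted(chapters_covered(final)), covers_icf_substantially(final))
```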

Results

The search in the Medline and EMBASE databases rendered 1,235 hits and 21 related articles. Seventy-nine papers met the inclusion criteria, describing 14 measures in total. The number of references per measure varied between 1 and 26. The included measures are Child Health and Illness Profile Adolescent and Child Edition (CHIP-AE [19–21], CHIP-CE [22, 23]), Child Health Questionnaire Child and Parent Forms (CHQ-CF87 [24–30], CHQ-PF50 [26, 27, 30–53], CHQ-PF28 [54–57]), DISABKIDS [58, 59], Functional Status II(R) (FS II(R)) [60, 61], HUI 2 [32, 52, 62–64], KIDSCREEN 52 [65, 66] and KIDSCREEN 27 [67, 68], KINDL [25, 69–72], Pediatric Quality of Life Inventory 4.0 (PedsQL [10, 73–91]), TNO-AZL Child Quality of Life questionnaire (TACQOL [92–95]), and Youth Quality of Life Instrument–Research Version (YQOL-R [96]).

Comparison of the age range and other general characteristics

CHIP, CHQ, DISABKIDS, KIDSCREEN, PedsQL, and TACQOL have different versions for different age categories. Besides language adaptations, the age-adapted versions of CHIP and CHQ also have different numbers of items and different scoring systems. Therefore, the child and adolescent editions of CHIP and the child and parent forms of CHQ were considered separate measures. The number of items and the scoring system of the different versions of DISABKIDS, KIDSCREEN, PedsQL, and TACQOL are similar. Therefore, these versions were considered one measure, and their age ranges were combined.

The measures that are suitable for the largest age range are HUI 2 and PedsQL. They are both validated for children between 2 and 18 years old. CHQ-PF50/28 (5–18 years), DISABKIDS (4–16 years), FS II(R) (0–12 years), KIDSCREEN 52/27 (8–18 years), and TACQOL (6–15 years) are also validated for an age range of 10 years or more. Measures suitable for an age range of less than 10 years are CHIP-AE/CE (11–17 / 6–11 years), YQOL-R (12–18 years), CHQ-CF (10–18 years), and KINDL (8–16 years). The measures with a large age range are all proxy reported or clinician administered, except for PedsQL, DISABKIDS, KIDSCREEN, and TACQOL, which also have a self-report version. The minimal age limit used for self-report measures varies between 8 and 11 years. The proxy-report measures and the clinician-administered measures are suitable for children of all ages.

The number of items varies enormously between measures. HUI 2 contains fewer than ten items, whereas CHIP-AE and TACQOL contain more than 100 items, resulting in large differences in the time needed to complete the questionnaire. Short measures take only 5 min or less, whereas the larger measures take 10–45 min to complete. Measures that take 20 min or more to complete are CHIP-AE/CE and CHQ-CF87/PF50. Items are placed in a varying number of domains, with a median of six domains. PedsQL and YQOL-R have only four domains, whereas CHQ has 13 domains. General characteristics of all measures are summarized in Tables 1 and 2.

Table 1 General characteristics of health-related quality of life (HRQL) measures in children: number and titles of the domains
Table 2 General characteristics of health-related quality of life (HRQL) measures in children: validated age range, how to report, rating scale, number of items, time needed to complete the measure

Comparison of reliability

Internal consistency for the total score is reported for FS II(R), KINDL, PedsQL, and YQOL-R. In KINDL, PedsQL proxy-report version, and YQOL-R, all Cronbach alphas for the total score exceeded the 0.70 level of group comparison. In the PedsQL self-report version and in FS II(R), 95% and 63% of the alphas for the total score were >0.70, respectively. Internal consistency for the domains is reported for all measures except for HUI 2. In CHIP-AE/CE, DISABKIDS, FS II(R), KIDSCREEN 52/27, and YQOL-R, all alphas for the domains exceeded the 0.70 level. Measures with nearly all alphas for the domains >0.70 were CHQ-CF/PF50 (93% and 86%) and the proxy- and self-report version of PedsQL (95% and 84%). Measures with <80% of alphas for domains >0.70 were TACQOL (69%), CHQ-PF28 (53%), and KINDL (33%).

ICC, Pearson correlation coefficients, and kappas were used to report test–retest reliability in the reviewed articles. Test–retest reliability for the total score is reported for FS II(R), HUI 2, PedsQL, and YQOL-R. All measured coefficients for the total score exceeded the 0.60 level. Test–retest reliability for the domains is reported for all measures except for FS II(R), KIDSCREEN 52, KINDL, and PedsQL self-report version. All coefficients for the test–retest reliability of the domains exceeded the 0.60 level for CHIP-AE/CE, DISABKIDS, KIDSCREEN 27, PedsQL proxy-report version, and YQOL-R. HUI 2 has 80% of its reported coefficients >0.60. Measures with <80% of coefficients >0.60 were TACQOL (73%), CHQ-PF50 (65%), CHQ-CF87 (60%), and CHQ-PF28 (50%). Reliability for all measures is summarized in Table 3.

Table 3 Internal consistency and test–retest reliability for health-related quality of life (HRQL) measures in children

Comparison of validity

Validity was assessed and reported differently in all studies, so a comparison was difficult to make. For most measures, content validity was assessed by the ability to differentiate between healthy subjects and children with a disease. All measures were able to do so in a variety of diseases, except for KINDL, which could not differentiate between healthy and chronically ill children. No information about content validity was reported for CHIP-CE and KIDSCREEN 52. Construct validity was assessed by factor analysis, by correlation with other instruments that aim to measure similar or different constructs, and by correlation with preknown information or clinical symptoms. Factor analysis was performed for CHIP-AE/CE, CHQ-PF50, KIDSCREEN 27, KINDL, PedsQL, and TACQOL and revealed that most items of these measures load most highly on their conceptually derived scale. A summary of the information on content and construct validity for all measures is reported in Table 4.

Table 4 Content and construct validity for health-related quality of life (HRQL) measures in children

Covering the ICF

Measures that covered more than six chapters of the ICF domains were CHIP-AE/CE, CHQ-CF87/PF50, DISABKIDS, KIDSCREEN 52, PedsQL, and TACQOL. CHQ-PF28, HUI 2, and KIDSCREEN 27 covered six chapters; YQOL-R covered five chapters; KINDL covered four chapters; and FS II(R) covered three chapters. CHIP-AE covered the ICF domain body functions & structures best, with all chapters represented in the measure. Only one to four of the chapters of body functions & structures were covered by the other measures. CHIP-AE/CE, CHQ-CF87, and TACQOL covered the ICF domains activities and participation best, with seven of nine chapters represented in the measures. Measures with less than half of the chapters of activities and participation covered were CHQ-PF28, FS II(R), HUI 2, KINDL, and YQOL-R (see Table 5).

Table 5 Number of items on the chapters of International Classification of Functioning, Disability and Health (ICF) of pediatric health-related quality of life (HRQL) measures

Discussion

The 14 measures that resulted from the literature search performed differently on all four aspects examined in this review. Measures that performed best on one level were outperformed on other levels and vice versa. For the purpose of this review, a measure should meet the criteria on all four aspects to be found suitable for measuring HRQL in children after major trauma. Most measures met the first criterion: “suitable for an age range of at least 10 years.” Measures that did not meet this criterion were CHIP-AE/CE, CHQ-CF87, KINDL, and YQOL-R. The second criterion was group comparison level for the internal consistency (α > 0.70) and substantial test–retest reliability (kappa, ICC, Spearman or Pearson correlation coefficient >0.60) in at least 80% of the measurements of reliability. Measures that did not meet this criterion were CHQ-CF87/PF50/PF28, FS II(R), KINDL, and TACQOL. The third aspect examined was the content and construct validity of the measures, which was confirmed for all measures. The fourth and final criterion was that the items covered more than six chapters of ICF domains. This criterion was met by all measures except for CHQ-PF28, FS II(R), HUI 2, KIDSCREEN 27, KINDL, and YQOL-R. The measures that met all four criteria were therefore DISABKIDS, KIDSCREEN 52, and PedsQL 4.0.
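The screening described in this paragraph amounts to removing every measure that fails at least one criterion. The sketch below encodes only the outcomes listed above; the variable names are illustrative, and PedsQL 4.0 is written as "PedsQL".

```python
ALL_MEASURES = {
    "CHIP-AE", "CHIP-CE", "CHQ-CF87", "CHQ-PF50", "CHQ-PF28", "DISABKIDS",
    "FS II(R)", "HUI 2", "KIDSCREEN 52", "KIDSCREEN 27", "KINDL", "PedsQL",
    "TACQOL", "YQOL-R",
}

# Measures that did not meet each criterion, as listed in the paragraph above.
FAILED = {
    "age range": {"CHIP-AE", "CHIP-CE", "CHQ-CF87", "KINDL", "YQOL-R"},
    "reliability": {"CHQ-CF87", "CHQ-PF50", "CHQ-PF28", "FS II(R)", "KINDL", "TACQOL"},
    "validity": set(),  # validity was confirmed for all measures
    "ICF coverage": {"CHQ-PF28", "FS II(R)", "HUI 2", "KIDSCREEN 27", "KINDL", "YQOL-R"},
}

# A measure is suitable only if it appears in none of the failure sets.
failed_any = set().union(*FAILED.values())
suitable = sorted(ALL_MEASURES - failed_any)

print(suitable)
```

Listing failures per criterion rather than a single pass/fail flag also shows why a measure was excluded; KINDL, for example, appears in three of the four sets.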

Two earlier reviews also came to a recommendation after comparing the general characteristics and psychometric properties of pediatric HRQL measures. Willis et al. assessed outcome measures in pediatric trauma populations [97]. They recommended PedsQL 4.0 for children >2 years of age because it captured both functional and QOL information, was quick to administer, covered a large age range, and had a self- and parent-proxy-report version. Eiser et al. reviewed generic and disease-specific measures of QOL in 2001 [98]. They recommended PedsQL for brief assessment during a regular clinic visit and CHQ where the goal is to improve family functioning or school integration. Two other measures that performed well in the current review, DISABKIDS and KIDSCREEN 52, were unfortunately not included in these two earlier reviews. In 2007, the European Consumer Safety Association (ECSA) developed guidelines for conducting follow-up studies measuring injury-related disability [99]. They chose EuroQoL-5D (EQ-5D) in combination with HUI 3 as the preferred common core to measure functional outcome after injury in patients aged 5 years or older. The ECSA assessed the content of the measures in relation to ICF domains but did not consider their psychometric properties. EQ-5D and HUI 3 were not included in this review because the measures were developed for adults and have not yet been sufficiently validated in children.

Strengths and limitations

The two largest biomedical databases (Medline and EMBASE) were searched for validation studies of HRQL measures for children. Despite the extensive search strategy, some relevant related articles were not found initially. Perhaps the addition of more synonyms for quality of life could have overcome this limitation. Another option is to search more databases, for example, the psychological database PsycINFO. However, it seems that no measures were missed. The pediatric HRQL measures included in the most recent review articles corresponded mostly with the measures that were included in this review [97, 100–102]. The How are You (HAY) was excluded because it also contained disease-specific questions. The Exeter HRQL and the Generic Child Questionnaire (GCQ) were excluded because psychometric properties were reported insufficiently. No measures were included that had not been reviewed previously.

The articles were screened for meeting the inclusion criteria. Because no trauma-specific HRQL measure is available for children, generic measures were selected. To make comparison of psychometric properties possible, only measures for which validity and reliability were reported were included. Results were limited to validation studies performed in Western countries, because culture is hypothesized to have a great impact on the psychometric properties of the measure. Only measures that have an English or Dutch version were included, because the English language is most used in Western society, and Dutch is the language of interest of the research group. Another reason was that, for comparison of the content of the measures, the researcher should fully understand the items. Two French questionnaires, the Vecú de Santé Perçué Adolescent (VSP-A) and the Duke Health Profile (DUKE HP), were therefore excluded.

The number of available references for each measure is quite variable in this review. Some measures were assessed on the basis of only one reference, whereas other measures have 26 references for assessment. More references lead to a more reliable assessment of the psychometric properties of the measure. Unfortunately, internal consistency was not reported for HUI 2, no test–retest was reported for KINDL, and no content validity was reported for KIDSCREEN 52. It is questionable whether the reported information on content validity (the ability to discriminate between health and disease) is really that relevant in a trauma population. It seems much more important for an HRQL measure to distinguish between subjects with injuries of different severity levels. Unfortunately, this information is lacking in the current literature for all the included HRQL measures. In fact, PedsQL 4.0 is the only HRQL measure that has been validated in children after trauma at all [10]. Comparison of general characteristics and covering ICF chapters of activities and participation are not influenced by the number of references.

A strength of this review is the comparison of HRQL measures on four levels: age range, reliability, validity, and content related to ICF. Earlier reviews on generic HRQL measures in children report general characteristics and psychometric properties [97, 98, 100–104]. The age range for which the measure is suitable, domain titles, number of items, and time needed to complete the questionnaire are often described. Internal consistency of the measures is reported by Ravens-Sieberer et al., Willis et al., Rajmil et al., and Connolly et al., and the last two also reported test–retest reliability [97, 101–103]. Only Rajmil et al. report on the content of the measures [102]. They placed the dimensions of the questionnaires in one of three domains: physical, psychological, or social. No previous review compares HRQL measures for children on all four levels examined in our review. An interesting concept that was considered as a fifth level in this review was the responsiveness of the measure. Terwee et al. divided responsiveness of HRQL instruments into three categories: (1) the ability to detect change in general, (2) the ability to detect clinically important change, and (3) the ability to detect real changes in the concept being measured [105]. They also identified 31 measures of responsiveness after an extensive literature search.

All items of HRQL measures were placed into the chapters of ICF domains. This provided a clear overview of the content of the measures related to ICF. Most measures covered the chapters of activities and participation much better than the chapters of body functions & structures. This implies that in children, activities and participation are considered of more importance for HRQL than are body functions & structures. Sometimes placement of an item was difficult, because multiple interpretations of the item were possible. Especially when it came to cognitive functions, distinction between body functions and activities was often not very clear. Furthermore, many items could not be placed in one of the ICF chapters. The constructs measured by these items were often too broad to be placed in one chapter. Also, items about personal or environmental factors, feelings, and emotions could not be placed in the chapters of the three ICF domains. The fact that many items could not be placed in ICF implies that HRQL is a broader concept than health status as defined by ICF.

Conclusion

Based on the results of this review, DISABKIDS, KIDSCREEN 52, and PedsQL 4.0 seem to be most suitable to measure HRQL of children over the long term after major trauma. They cover a large age range, have good psychometric properties, and cover the ICF content substantially.