Highlights

  1. 1.

    Mental health problems have detrimental effects on people’s quality of life (QoL).

  2. 2.

    None of the currently available instruments to measure QoL was perfect for use in economic evaluation of child mental health care

  3. 3.

    All instruments had disadvantages, ranging from lack of psychometric research, no proxy version, not being suitable for young children, no age-specific value set, to insufficient focus on relevant domains.

The World Health Organization (WHO) has categorized mental health problems among the most disabling in the world [1]. Furthermore, the incidence of mental health problems has been increasing [2]. Around 20% of the working age population in Organization for Economic Co-operation and Development (OECD) countries is currently suffering from a mental disorder, and over the life course 40% is affected [2]. Many mental health disorders have their origin in childhood and adolescence [3]. Serious and common long-term effects such as substance abuse [4], poor work [5] and academic performance [6], problems with peer and romantic relations [7], and development of other psychiatric disorders do occur [8]. Consequently, mental health problems have detrimental effects on people’s quality of life (QoL) [9,10,11].

The WHO defines QoL as “individuals’ perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards, and concerns” [12]. At any given time, social, psychological, and biological factors determine a persons’ mental health, and this can affect a persons’ QoL. The definition of QoL is broad and related to several aspects, including physical health, psychological state, level of independence, social relationships, personal beliefs, and their relationship to salient features of their environment [13]. Thus, a measure for QoL should capture multiple domains and cannot be considered a single concept.

Assessing QoL is important, not only in clinical practice and research, but also in the field of health economics. The latter obviously prompted by an increased interest in the societal impact of interventions and the growing attention for economic evaluations in child and adolescent mental health care, given the chance of life-long reduction of cost associated with mental health problems in children. Policy makers increasingly base their decisions on outcomes of economic evaluations [14]. Therefore, a standardized method for performing economic evaluations in pediatric mental health care is of great significance. However, methods and instruments used in economic evaluations have traditionally been developed for the somatic (health) care, and mostly for an adult population. Moreover, very different aspects of QoL are considered relevant in this field, although the term used (i.e., QoL) is the same. As a result, performing and interpreting standardized and reliable economic evaluations in this sector remains challenging.

Problems in assessing quality of life in children with psychiatric disorders

A major concern in measuring QoL in children with mental health issues is that many instruments available to measure QoL in children have been derived from adult versions [15]. Factors that might affect an appropriate understanding of instruments measuring QoL are language development, cognitive development, and type of disorder [16, 17]. Often, it is assumed that measuring QoL in children below the age of eight is not feasible and reliable. Proxy versions of instruments can be used in this group, but these have limitations as well. Where possible, it is recommended to let an individual report on their own QoL, perhaps with an addition of a proxy version of the questionnaire. An instrument should consider the cognitive age of the child, as some children develop at a slower pace than other children. The self-assessed version of the instrument should be understandable for children and their proxies, and the proxy version of the instrument should be available to adequately assess QoL in children too young or otherwise unable to complete a self-assessed version.

With this review, we aim to provide an overview of existing generic measures of QoL suitable for economic evaluations in children with mental health or psychosocial problems. We will include both preference-based measures (those with a value set (i.e., a collection of values for all possible states) suitable for economic evaluations) and profile-based measures (which provide different profiles or domains of QoL instead of a single score). A systematic review of psychometric properties in children with mental health issues of the identified instruments will be provided. Finally, the instruments will be scored using an in-house quality rating (available in Additional file 1) and the scoring results will be summarized visually in a decision tree. This decision tree can aid in a well-informed decision for choosing an instrument to measure QoL in children with mental health or psychosocial problems.

Methods

First, we undertook a systematic review of reviews (meta-review) (A.) of QoL instruments from which we identified all relevant instruments (B.). Next, we performed a systematic review of the psychometric properties of the identified instruments (C.). Lastly, the results were summarized in a decision tree (D.).

A. Meta-review of quality of life instruments

First, several databases were searched. For scientific literature we searched PubMed (Medline), PsycInfo, Embase, Econlit, and Web of Science. For grey literature we searched Google Scholar, Google, Cosmin, Picarta, and several online repositories for instruments (Kenniscentrum meetinstrumenten VUMC (http://www.kmin-vumc.nl, Proqolid, PROM, PROMIS). Search terms for the reviews can be found in Additional file 1. Thereafter, reference lists of relevant literature were checked for missing information.

Reviews concerning QoL instruments were included if they were aimed at studies for children below the age of 18, were aimed at QoL instruments that could be used in social or cognitive development, or in relation to psychiatric disorders of children, and were written in English. Reviews were excluded if they focused on curative or palliative treatment of somatic illnesses and conditions, screening or diagnostic intervention, or vaccinations. Furthermore, we searched recent articles which were not included in reviews for possible newly developed instruments. Selection and screening of the QoL reviews was performed by two authors (LS and APG), disagreement was resolved by consensus.

B. Identification of QoL instruments

The identified reviews were searched for relevant instruments. Instruments for QoL were included if they fulfilled the following criteria: the instrument should be available in English, the instrument should be aimed at children below the age of 18, the instrument should be a measure of generic health related quality of life suitable for use in social or cognitive development, or in relation to psychiatric disorders of children. Furthermore, we excluded instruments that were aimed at one specific disorder (disease specific instruments).

C. Systematic review of psychometric properties of QoL instruments

Subsequently, for each of the identified instruments a systematic review was performed to assess the psychometric properties of the instrument. Databases (PubMed, PsycInfo, Econlit, Web of Science and EMBASE) were searched for relevant studies using the following search terms and their synonyms (instruments/ questionnaires AND psychometric quality AND child/adolescence) combined with search terms specific for each of the instruments (abbreviations and full instrument name). A full overview of the search terms can be found in Additional file 1. Furthermore, reference lists of identified studies and reviews where checked for missing studies.

Studies were included if the psychometric research was performed in healthy individuals below the age of 18 years old or children with psychosocial, cognitive or psychiatric problems. Studies were excluded if they were not written in English or Dutch, or focused solely on children with somatic difficulties and did not include a healthy control group or group with psychosocial, cognitive or psychiatric problems group. Selection and screening of the studies was performed by either APG or LS. Psychometric properties (i.e. internal consistency, reliability, measurement error, content validity, structural validity, hypotheses testing, cross cultural validity, criterion validity, responsiveness, and feasibility) were scored (yes, explored this characteristic/ no, did not look at this characteristic) using the definitions provided by COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN). A summary of the definitions used can be found in the Additional file 1.

D. Quality scoring based on results

Quality of all instruments was scored based on several elements often described in literature. This led to a quality score per instrument. We used an in-house measure of quality that scored the quality of the instruments based on the number of relevant domains for mental health (including both functional as pathology domains), number of psychometric studies in general population children, number of psychometric studies in children with mental health or psychosocial problems, psychometric quality of instruments in children with mental health of psychosocial problems, and the existence of a value set. Further, we assessed the quality of the instrument with a self-developed quality score instrument and summarized the results in a decision tree that can be used to identify the best instruments for measuring quality of life in children with mental health disorders. Criteria and full summary per instrument can be found in Additional file 1.

Results

A. Review of reviews- QoL

A total of 1636 reviews were identified. After the first selection based on title and abstract 43 reviews remained. No additional reviews were identified through our grey literature search. From these 43 reviews, 14 were not suitable for this review (reasons presented in PRISMA flow chart in Additional file 1), which led to 29 reviews included in this review of reviews.

B. Identification of QoL instruments

Of these 29 reviews, a total of 22 unique instruments were identified, see Table 1 for a summary. Of these 22 instruments, 14 had a proxy- and a self-report version, three instruments only had a proxy version and five only a self- report version. All identified instruments were available in English. An overview of the domains of QoL according to the WHO the instruments covered can be found in Fig. 1. A summary of the properties of the identified instruments can be found in Table 1.

Table 1 Summary Table of identified instruments to measure quality of life in children with mental health problems
Fig. 1
figure 1

Domains measured in quality of life instruments for children. Definition of QoL according to the World Health Organization. The X-axis represents the percentage of questionnaires that included at least 1 question on the specific domain

C. Systematic review of psychometric quality of QoL instruments

A total of 195 papers were identified that fulfilled our inclusion criteria concerning psychometric research. A summary of the type of psychometric research in children can be found in Fig. 2. PRISMA flow charts for all searches are available in Additional file 1. A summary per instrument of all psychometric research on these instruments (n = 195) can be found in Additional file 1. Of the 195 studies 30 (15.4%) focused on psychometric properties of the identified instruments in children with impaired social or cognitive development or psychiatric problems. Ten out of 22 instruments had no information on their psychometric properties in children with mental health problems (i.e., 16D, 17D, AQOL, AHUM, CHSCS-PS, GCQ, HUI2/3, ITQOL, QOLPAV, TACQOL). Thirty papers investigated the psychometric properties in children with mental health problems, these 30 papers are discussed below.

Fig. 2
figure 2

Type of psychometric research of all identified studies. COSMIN definitions were used to score these items. X axis represents percentage of identified studies

Child health and illness profile (CHIP)

The CHIP had questionable to excellent internal consistency (Cronbach’s alphas between 0.65–0.92 for the CHIP-AE [85], Cronbach’s alphas above 0.7 for the CHIP-CD/PRF [79] and Cronbach’s alphas between 0.71–0.82 for the CHIP-CE [76]) and fair to excellent test-retest reliability (ICC’s between 0.57–0.93) [85] in children with mental health problems. Structural validity was confirmed using linear principal factor model [79] and confirmatory factor analysis [76]. The questionnaires’ hypotheses testing abilities by investigating the discriminatory validity between age groups [85], genders [85], and illness groups [85], and by investigating the concurrent validity (comparison to ADHD-RS; r = −.35 [76] and r between −.18 and-.48 [79], and the SDQ r between-.28 and − .65 [79], CGI-.15 and − .30 [79], and FSI .28 and-.63 [79]).

Child health utility index 9 dimensions (CHU9D)

Psychometric research into the CHU9D has been conducted in two studies, one with overweight children [77] and one community sample receiving mental health services [86]. The CHU9D has acceptable internal consistency (Cronbach’s alpha of 0.78). Its hypotheses testing abilities were examined by convergence with the strengths and difficulties questionnaire (SDQ; r = 0.49) [77] and PedsQL (r = 0.47) [86] and discriminant validity between different weight and ethnic groups [77].

Child health questionnaire (CHQ)

The CHQ was developed on a sample of children with ADHD by Landgraf et al. [87]. The CHQ-CF87 has moderate to good internal consistency (Cronbach’s alphas between 0.63–0.89) [87], hypotheses testing was assessed by known groups analyses between a school, ADHD, and end-stage renal disorder sample, different age groups and gender [87]. The CHQ-PF50 has a poor to excellent internal consistency in ADHD (Cronbach’s alphas of 0.54–0.90) [88]. Measurement error was assessed by investigating the standard error of measurement. Hypotheses testing was confirmed through significant Pearson correlation coefficients between the CHQ-PF50 and other clinical measures (ADHD-RS, CPRS, CGI-ADHD-S, CGI-ADHD-I) [88].

Child quality of life questionnaire (CQOL)

The CQOL has good internal consistency in children with psychiatric disorders (Cronbach’s alphas of 0.81–0.87). Reliability was assessed by means of test-retest correlations (r = 0.4–0.7) and intra-rater correlations (0.57). Reliability of individual domains was very variable, but the combined scores of the CQOL was of acceptable reliability [80].

EuroQol five dimensions-youth (EQ-5D-Y)

The EQ-5D-Y has very variable test-retest reliability (ICC’s, between 0.25 and 1) [89, 90]. Structural validity was confirmed through principal component analysis [91]. Hypotheses testing was assessed through discriminant validity between groups with asthma, diabetes, rheumatic disorder, and speech or hearing disorder. Concurrent validity was examined by looking at the correlation between the EQ-5D-Y and the TACQOL (low to moderate correlations) [89, 90], ADHD-RS (index scores between r = 0.31–0.27) [92], the CHQ-PF50 scale (index scores between r = 0.11–0.64) [92], clinical outcome scores [93] and KIDSCREEN-10 (strong correlation with index scores, but low correlations between domains and items) [91]. Responsiveness was examined by comparing those responding to treatment and those not responding to treatment [91], and by investigating changes in scores of patients who improved according to the Clinical Global Impression – of Improvement (CGI-I) scale versus those who did not improve [93].

Secnik et al. [94] developed a value set for children with ADHD based on standard gamble utility interviews with parents of children with ADHD.

KIDSCREEN

Development and pilot testing of the KIDSCREEN took place using a sample of more than 3000 European children and adolescents from the 13 different countries [95]. For all versions psychometric research has been conducted into the internal consistency, reliability, structural validity, and hypotheses testing in 34 different studies. The KIDSCREEN-52 has also been evaluated based on its content validity, and the KIDSCREEN-27 as well as the KIDSCREEN-52 have been evaluated in terms of feasibility. Research by Bouwmans et al. [91] and Clark et al. [96] used a sample of children with psychosocial problems. Bouwmans et al. (2014) assessed the KIDSCREEN-10 in children with ADHD in terms of structural validity through principal component analyses, responsiveness through comparing children who were responsive to treatment and those who were not, and hypotheses testing through concurrent validity by comparing the KIDSCREEN-10 to the EQ-5D (r = 0.56). Clark et al. (2015) analyzed the KIDSCREEN-52 and found acceptable to good internal consistency (Cronbach’s alphas of 0.72–0.89 for the child-version and 0.78–0.92 for the parent-version). Intra-rater reliability was poor to good (ICC’s between parents and their children between − 0.17 and 0.66). Hypotheses testing was analyzed by means of concurrent validity (comparison with ABAS-II; low correlations).

Questionnaire for measuring health-related quality of life in children and adolescent - revised version (KINDL-R)

The KINDL-R has poor to good internal consistency (Cronbach’s alphas for the Chinese child-version of the Kid KINDL of 0.47–0.77 and 0.55–0.79 for the parent-version [97]; Cronbach’s alphas of 0.53–0.82 for the child version and 0.62–0.86 for the parent version for the kid and kiddo-KINDL [98]).

Principal component analysis [97] and confirmatory factor analysis [98] confirmed its structural validity. Hypotheses testing was assessed by discriminant validity between healthy groups and groups suffering from global development delay and differences between age and sex groups, but did not find significant differences [97]. Differences were found between children with and without special health care needs and concurrent validity by comparing the instruments with corresponding SDQ scales (r = 0.33–0.49) [98].

Multidimensional students’ life satisfaction scale (MSLSS)

Research of Athay [99] assessed the psychometric quality of the brief MSLSS in a sample of children with psychosocial problems and found acceptable internal consistency (Cronbach’s alphas of 0.77) and a standard error of measurement of 0.4. Structural validity was confirmed by performing confirmatory factor analysis. Hypotheses testing was evaluated, showing some evidence for construct validity (a correlation with children hope and symptom severity), and discriminant validity (increased score with treatment, differences between different age groups and gender differences) [99].

Pediatric quality of life inventory (PedsQL)

The PedsQL has acceptable to good internal consistency in children with ADHD, and in children with intellectual disabilities (all Cronbach’s alphas above .70) [73, 100,101,102], but in Dutch children with psychiatric disorders unacceptable to questionable internal validity for children 6–7 (Cronbach’s alphas of 0.40–0.63), questionable to good internal consistency for children 8–12 (0.63–0.85) and 13–18 (0.57–0.87) years old and parents (0.69–0.87) for parents of children of all ages [103]. It has excellent interparent reliability (ICC’s of 0.86–0.91) [103], but poor inter-rater reliability (ICC’s between the self-administration version and the parent version of 0.13–0.35) [100]. Structural validity was confirmed through exploratory factor analyses [73, 102], and confirmatory factor analysis [103]. The PedsQL’s hypotheses testing abilities were examined by looking at convergent validity (comparison to the CBCL [103]; (r = 0.24 children-rated and r = − 0.62 for parent-rated), and the SDQ [102] questionnaire (r = − 0.70–0.27). Parent-child agreement was moderate (r = 0.59–0.69) [101]. Discriminant validity was examined by assessing whether the PedsQL could distinguish between several known groups [73, 100,101,102,103]. Feasibility of the PedsQL was assessed by looking at the percentage of missing values which was less than 4.0% [101, 102].

Quality of well-being scale (QWB)

The QWB has good internal consistency (Cronbach’s alphas of 0.83 and 0.84) and excellent intra-rater reliability (ICC = 0.77). Hypotheses testing was evaluated with construct validity (confirmed by comparing the QWB-SA mental health scale to the mental health scales of the SF-36 (r = 0.66–0.72), EQ-5D (r = 0.61), HUI (r = 0.59–0.63), and POMS (r = 0.77)) [104].

TNO AZL preschool quality of life (TAPQOL)

The TAPQOL has fair to good internal consistency in children with language delays (Cronbach’s alphas of 0.63–0.82) and a low percentage of missing values (1.9–6.7%). Structural validity was confirmed by performing factor analysis and hypotheses testing was evaluated using known groups, receiver operating characteristics curves and comparison to a questionnaire for language delays [105].

Youth quality of life instrument (YQOL)

The YQOL has acceptable to excellent internal consistency (Cronbach’s alphas between 0.77–0.96) [63, 106] and good to excellent test-retest reliability (ICC = 0.74–0.85) [63, 106]. Hypotheses testing was assessed by comparing the YQOL to the Children’s Depression Inventory (r = 0.58) [63], the Functional Disability Inventory (r = 0.26) [63], the KINDL (r = 0.73) [63] and PedsQL’s comparable dimensions (r = 0.21–0.53) [106]. Discriminant validity was assessed by comparing known groups [63, 106].

Quality scoring of instruments

All instruments were scored on quality using an in-home instrument available in Additional file 1. The full quality score per instrument is available in the Additional file 1. A summary score per instrument is available in Table 1. The highest scoring instrument was the CHU9D with a score of 7 out of 10 points, and the lowest scoring instrument was the GCQ with 0 out of 10 points. These results led to a decision aid (Fig. 3) in which the instruments are sorted by quality score. Highest quality scores are ranked first.

Fig. 3
figure 3

Decision tree for choosing a quality of life instrument for children with mental health problems. Instruments are rated and ordered according to a rating system available in Additional file 1. Equal quality scores are represented by equal numbers. The higher the number the better the quality rating

Discussion

We found that none of the instruments was perfect for use in economic evaluation of child and adolescent mental health care as all instruments had disadvantages, ranging from lack of psychometric research, no proxy version, not being suitable for young children, no age-specific value set for children under 18, to insufficient focus on relevant domains (e.g. social and emotional domains). While around 50% of instruments had items that assessed social relations or psychological state, most just included a relatively general question probing a single aspect of psychosocial related problems. To fully assess the impact of psychosocial and mental health problems on quality of life, it is of the utmost importance that the outcome reflects all aspects of QoL that are affected, and not merely physical domains.

When one wants to perform a cost-utility analysis, most guidelines [107, 108], recommend to use the EQ-5D-Y. The advantage of this instrument is that both a proxy and a self-report version are available. A major disadvantage is that there is only an adult value set available. Studies have shown that the adult value set is not suitable for use in children and adolescents, given that health states described for adults are valued differently by children [109]. Different aspects are relevant for QoL in children, adolescents, or adults, making it questionable whether the adult items are relevant and important for QoL in children. Another major disadvantage to using the EQ-5D-Y for cost-utility analysis of child mental health care is the lack of questions that portrayed psychosocial problems. Only feelings of anxiety or depression are assessed with the EQ-5D-Y, which leaves externalizing and social problems neglected. Our review highlights the CHU9D as a more suitable instrument for measuring QoL if one plans to perform an economic evaluation, and the CHIP as a general measure for QOL in children with mental health and psychosocial problems.

Often, it is assumed measuring QoL in children below the age of 8 is not feasible and reliable. Proxy versions of instruments can be used in this age group, but these have their limitations as well. Some studies have reported poor to fair agreements between self and proxy versions of instruments (e.g., 35, 49, 50). Possibly, this difference is due to a different meaning of certain concepts for children than for adults. Moreover, it is unclear what determines high QoL in young children and it is hard to assess what high QoL is at a young age. Another problem associated with the use of proxy measures is that a proxy rater (often a parent) is close to the child thus the proxy’s interpretation of the QoL of the child may be affected by the child’s problems, leading to incorrect approximations of the child’s QoL. Where possible, it is recommended to let an individual report on their own QoL, possibly with an addition of a proxy version of the questionnaire. An instrument should consider the cognitive age of the child [16], at this moment none of the identified instruments does this. Another problem in current instruments is the poor to fair agreement between self and proxy versions of instruments [98, 110, 111]. Other studies reported moderate to high agreement [19, 101] between self and parent versions of questionnaires, but found large differences dependent on the domain, with higher correlations in physical domains [38]. However, most psychosocial interventions are aimed at changes in psychosocial domains, therefore one does not expect change in physical domains. Future research should focus on making age adjustable versions of questionnaires, assessing domains suitable for children with mental health disorders.

Interestingly, studies that compared generic QoL instruments with disease specific instruments measuring symptoms of mental health disorders found mostly weak to moderate correlations between the two [63, 76, 77, 79, 88, 92, 98, 102,103,104, 106]. These significant but relatively low correlations indicate that generic QoL instruments and disease specific instruments measure separate but related constructs. This indicates the added benefit of generic measures of QoL on top of disease specific measures in both research and clinical practice, since this gives a more complete overview of the child’s state. However, at this moment a perfect instrument for this purpose does not exists since most QoL measures are developed for children with somatic problems. The development of instruments that are suitable to measure QoL in children suffering from psychosocial or mental health problems is of utmost importance.

While this review provides a thorough overview of available instruments to measure QoL in children with psychosocial or mental health problems, some limitations should be noted. We did not have the resources to hold focus groups or interviews, in which children participate to assess the relevance of all items of instruments for use in children with mental health or psychosocial problems. To comprehensively assess which domains are relevant for children and adolescents compared to adults, children’s own appraisal of relevant domains, should be included in a measure for QoL for children (see also [112]). These focus groups or interviews should be aimed at assessing the relevance of certain domains and exploration of additional relevant domains in different age groups, and perhaps even different psychiatric classifications.

We did however, rate the inclusion of relevant domains based on the WHO definition. Additionally, we assessed the quality of the instruments with a newly developed, as we felt this fulfilled our requirements better than any existing instruments. The combination of quality assessment for both clinical practice and economic evaluations is relatively new, and therefore no available instrument met our criteria. While our assessment is transparent, an existing instrument could have led to different ratings. Furthermore, since many excellent reviews already summarized relevant instruments to measure QoL in children with mental health and psychosocial problems, we decided to perform a meta-review, and not a systematic search of individual studies. This approach could have caused us to overlook relevant instruments. Furthermore, we included children below the age of 18, but there is a growing international movement toward youth mental health services, which typically spans adolescence and young adulthood (ages 12–24). Future research is warranted on suitable instruments to measure QoL in this age group. Lastly, while we did a thorough search through all relevant databases and grey literature, we only included English or Dutch language articles.

Conclusions

Despite these limitations, this review provides an overview of the generic instruments available to measure QoL in children with mental health problems and their psychometric properties. This led to a decision aid which incorporates the results of the current study (Fig. 3), to aid in the choice of an instrument for QoL in children with mental health or psychosocial problems. Future research should focus on making age adjustable versions of questionnaires that take cognitive age into account, assessing domains suitable for children with mental health disorders.