Introduction

During the past two decades, several measures have become available to describe generic health-related quality of life in pediatrics, but adolescent self-report questionnaires received relatively little attention until now [1, 2]. The Child Health Questionnaire (CHQ) is one of the most widely used pediatric health-related quality of life measures and has been translated into 21 languages (32 countries). There is a form for parents and also a self-report form for adolescents, the Child Health Questionnaire-Child Form (CHQ-CF) [2–7]. The CHQ covers physical and psychosocial aspects of health, and includes the impact of child health problems or handicaps on family life [3]. This study focuses on the evaluation of missing answers at the item level, distribution of the scale scores, reliability, and validity of the CHQ-CF in an adolescent population.

We expect the commonly used paper format of the CHQ and other health questionnaires to be increasingly replaced by internet versions, especially in adolescent populations that are accustomed to the use of computers and the internet [8]. From the perspective of clinicians and researchers, the advantages of using the internet include avoiding paperwork, online data entry, and procedures designed to reduce the number of missing answers and the length of questionnaires [9, 10].

In general, the mode of questionnaire administration (e.g., written questionnaire, face-to-face interview, telephone interview, computer questionnaire) may affect the participation rate, number of missing answers, psychometric properties, and actual scores [11–14]. With regard to health questionnaires, several studies demonstrated some differences between the commonly used paper versions and computer versions of the same questionnaires [15–17]. Especially in studies comparing paper and computer questionnaires on sensitive topics, administration via computer was found to increase reporting of, e.g., drug use or unsafe sexual behaviors, as this medium is apparently perceived as providing more privacy than a paper form [18–20].

With regard to online, i.e., internet or web-based administration of health questionnaires, several studies have demonstrated that online health questionnaires are feasible in various settings, especially among adolescents [21, 22]. However, very few randomized studies have evaluated whether psychometric properties and scores differ between the paper and the internet mode of administration of the same health questionnaire [23–26].

In this study, we compared indicators of the feasibility, reliability, and validity of the CHQ-CF in a subgroup of adolescents who completed the standard paper version of the CHQ-CF with the same indicators in another subgroup of adolescents who completed a newly developed internet version of the questionnaire. Additionally, we compared the mean CHQ-CF scores and distributions of the scale scores between both subgroups. A randomized parallel group design was applied in a large adolescent population (13–17 years old), ensuring that both subgroups were comparable.

The study assessed and compared the paper and internet mode of CHQ-CF administration with regard to the following indicators:

  (a) the number of missing answers (indicator of feasibility),

  (b) the distribution of the scale scores including mean scale scores in the whole sample and in gender and age specific subgroups,

  (c) the internal consistency reliability of multi-item scales (indicator of reliability),

  (d) the ability of the CHQ-CF to discriminate between subgroups with and without self-reported chronic conditions (indicator of construct validity).

Methods

Study population

In 2003, 1,071 students in 55 classes of various educational levels in the 3rd year of seven secondary schools (13–17 years old) in the area of Vlaardingen (metropolitan area) and Harderwijk (rural area), The Netherlands, were invited to complete the Child Health Questionnaire Child Form (CHQ-CF). The parents and students each received written information about the study several weeks before data collection; parents could refuse their child’s participation, and participation by the students was voluntary.

Data collection

The CHQ-CF consists of 87 items with 4, 5, or 6 response options divided over 10 multi-item scales and two single-item scales (Table 1) [3]. To reduce respondent burden, the item “change-in-health” was not fielded in this study, and the CHQ-CF scales “role functioning-emotional” and “role functioning-behavioral” were combined into a single scale. The combination of the two role functioning scales is a departure from the CHQ-CF instructions that makes the test analogous to the parent form of the CHQ in this regard [3]. For each scale, items were summed (some after recoding or recalibration) and transformed into a scale from 0 (worst possible score) to 100 (best possible score) [3]. Items on standard socio-demographic variables and the prevalence of seven chronic conditions were included in the questionnaire. From the conventional paper format, using the same wording of the items and instructions, an internet version of the questionnaire was developed through a generic internet tool using PHP (4.0.1), MySQL (3.22), and JavaScript (1.3) [27]. The internet version of the questionnaire listed the items of each CHQ-CF scale on a separate web page. The internet version did not allow the respondent to select more than one answer to each item of the CHQ-CF, and it checked the questionnaire for missing answers before the respondent could “logout”. If one or more of the items were not answered, the internet version prompted the respondent to go back to complete those items; however, if the user failed to “logout” properly, missing answers would remain.
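The 0 to 100 transformation of a summed scale score can be illustrated with a minimal sketch. It assumes a plain linear rescaling between the lowest and highest possible raw sums and that all items of a scale share the same response range; the item recoding and recalibration steps prescribed by the CHQ manual [3] are not reproduced here, and the function name and parameters are hypothetical.

```python
def transform_to_0_100(item_scores, min_per_item, max_per_item):
    """Linearly rescale a summed raw scale score to the 0-100 range.

    Assumption: every item is already recoded so that a higher value
    means better health, and all items share the same response range.
    """
    if not item_scores:
        raise ValueError("at least one item score is required")
    if max_per_item <= min_per_item:
        raise ValueError("max_per_item must exceed min_per_item")

    n_items = len(item_scores)
    raw_sum = sum(item_scores)
    lowest_possible = n_items * min_per_item
    highest_possible = n_items * max_per_item

    # 0 = worst possible score, 100 = best possible score
    return 100.0 * (raw_sum - lowest_possible) / (highest_possible - lowest_possible)


if __name__ == "__main__":
    # Three hypothetical items, each answered on a 1-5 response scale
    print(transform_to_0_100([3, 3, 3], min_per_item=1, max_per_item=5))
```

With this rescaling, a respondent who picks the lowest option on every item scores 0 and one who picks the highest option on every item scores 100, which is what makes floor and ceiling effects directly visible in the score distribution.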

Table 1 CHQ-CF scales, items per scale, and interpretation of low and high scores

Randomization

Within each school class, students were randomly assigned to either the paper or the internet mode of administration using SPSS-generated random numbers. Students completed the questionnaires, either on paper or online in a classroom with computers linked to the internet, under the supervision of a research assistant; the students were allowed adequate privacy.

Analysis

Preparatory secondary vocational education was labelled as “lower secondary education”, secondary schools that prepare students for higher professional training as “intermediate secondary education”, and university preparatory secondary education as “higher secondary education”. Differences between the characteristics of the participants allocated to the paper versus the internet versions of the questionnaires were tested by Student’s t-test and the χ2 test. We assessed the frequency of missing answers to CHQ-CF items; the difference in the number of missing answers between the two formats was assessed by the Mann-Whitney U test. We assessed the distributions of the CHQ-CF scale scores to evaluate floor and ceiling effects (≥ 25% of the respondents having the lowest/highest score) for both modes of administration. Differences between CHQ-CF scale scores by format in the total sample were assessed by Mann-Whitney U tests. Additionally, after transforming the scale scores into ranks, ANOVA was applied to test whether the mode of questionnaire administration interacted with the variables gender (male n = 432; female n = 501) and age (13–14 year olds, n = 399; 15–17 year olds, n = 534). Cohen’s effect sizes, defined as $d = [\mathrm{Mean}(a) - \mathrm{Mean}(b)]/SD$, where the denominator was the pooled standard deviation $SD = \sqrt{[(n_a - 1)SD_a^2 + (n_b - 1)SD_b^2] / [(n_a - 1) + (n_b - 1)]}$, were applied to indicate the relative magnitude of score differences between modes of administration. Here, the letters “a” and “b” refer to the subgroups administered the paper and internet forms of the test, respectively [28]. Following Cohen’s suggested guidelines, 0.20 ≤ d < 0.50 indicated a “small effect”, 0.50 ≤ d < 0.80 a “medium effect”, and d ≥ 0.80 a “large effect” [28]; Norman et al. have suggested that, in general, d = 0.50 can be considered as a threshold for a “minimally important difference” (MID) [29].
Cronbach’s α was applied to evaluate the internal consistency reliability of CHQ-CF multi-item scales by format; α of 0.70 or higher was considered to indicate sufficient internal consistency reliability [30]. We applied statistical tests of the hypothesis that the Cronbach’s α reliability coefficients of CHQ multi-item scales in the sample administered the test on paper were equal to those administered the test online [31]. We applied item-level discriminant tests to evaluate whether the CHQ-CF items represent separate domains. For each mode of questionnaire administration, we evaluated whether (on average) correlation coefficients (Pearson-r correlation coefficients) between the items and their own scale score (without the item under consideration) were higher than the correlation coefficients between these items and any other scale. The average Pearson-r correlation coefficients were calculated by applying Fisher’s z transformations [32]; we tested whether the differences between the average Pearson-r correlation coefficients in the subgroup administered the paper form and in the subgroup administered the test online were statistically significant [33]. We assessed the CHQ-CF’s ability to discriminate between subgroups with 0, 1 or 2, and 3 or more chronic conditions, after having transformed the CHQ-CF scale scores into ranks, by ANOVA with the independent variables “number of chronic conditions”, “mode of questionnaire administration”, and the interaction term “number of chronic conditions” × “mode of questionnaire administration”. Cohen’s effect sizes $d = [\mathrm{Mean}(a) - \mathrm{Mean}(b)]/SD$ in the condition subgroup were calculated for 1 or 2 versus 0 conditions, and for ≥ 3 versus 0 conditions. The designations “a” and “b” refer to the subgroups without chronic conditions and those with chronic conditions, respectively [28].
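The effect size, reliability, and averaged-correlation computations described above can be sketched as follows. This is an illustrative sketch only, not the SPSS procedures used in the study: the function names are hypothetical, Cronbach’s α is computed with the standard variance-based formula, the averaged correlation is taken as the back-transformed mean of Fisher z values, and the “negligible” label for d < 0.20 is a convention added here for completeness.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two groups (sample variances, n - 1)."""
    na, nb = len(a), len(b)
    numerator = (na - 1) * variance(a) + (nb - 1) * variance(b)
    return math.sqrt(numerator / ((na - 1) + (nb - 1)))


def cohens_d(a, b):
    """Cohen's d = [Mean(a) - Mean(b)] / pooled SD."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def label_effect(d):
    """Label the magnitude of d using Cohen's suggested guidelines."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumed label for d < 0.20


def cronbach_alpha(items):
    """Cronbach's alpha for one multi-item scale.

    `items` is a list of item columns; each column holds the scores of
    all respondents on that item (same respondent order in every column).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def average_correlation(correlations):
    """Average Pearson r values via Fisher's z transformation."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(mean(z_values))


if __name__ == "__main__":
    paper = [2, 4, 6]      # made-up scores, for illustration only
    internet = [1, 3, 5]
    d = cohens_d(paper, internet)
    print(d, label_effect(d))
    print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
    print(average_correlation([0.5, 0.5]))
```

The sketch uses only the Python standard library so that each formula stays visible; the data in the usage section are invented and carry no relation to the study results.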

All analyses were done using SPSS, Version 11.0.1. The medical ethical committee of the Erasmus MC-University Medical Center Rotterdam approved the study.

Results

Participants and randomization

The participation rate was 87%. The age range of the participants was 13–17 years (mean age 14.7 years; SD 0.68), 54% were female, 93% were born in the Netherlands, and the majority attended lower secondary education (Table 2). The prevalence of self-reported chronic conditions was as follows: asthma, 8%; allergies, 25%; hearing problems, 7%; visual problems, 8%; headaches or migraine, 17%; chronic lower back pain, 17%; and depression or anxiety attacks, 8% (Table 2). These characteristics were equally distributed in the groups assigned to the paper and internet versions of the questionnaires (P ≥ 0.05; Table 2). The demographic characteristics of the participants (age, gender, country of birth, and educational level) reflected those of the general population of Dutch adolescents [34].

Table 2 Characteristics of study participants (total sample: n = 933; participants assigned to the paper questionnaire: n = 475; participants assigned to the internet questionnaire: n = 458)

Difference in the number of missing answers between different modes of CHQ-CF administration

At the item level, use of the paper version of the CHQ-CF resulted in more missing answers (0–.89% per item; mean 0.54%) compared with the internet version (0–.22% per item; mean 0.04%; P < 0.01).

CHQ-CF scores by mode of administration

A ceiling effect was observed for four CHQ-CF scales in the subgroup that completed the paper questionnaire, and for three CHQ-CF scales in the subgroup that completed the internet questionnaire (Table 3). Four CHQ-CF scales, i.e., “general behavior”, “role functioning-physical”, “mental health”, and “family activities”, resulted in statistically significantly higher scores for paper versus internet administration (P < 0.05), but the effect sizes (d) were ≤ 0.21 (Table 3). The mode of questionnaire administration did not interact significantly with gender (P ≥ 0.05 regarding all scales), nor with age (P ≥ 0.05 regarding six scales), except for the CHQ-CF scales “role functioning-emotional/behavioral” (P < 0.05), “mental health” (P < 0.05), “self esteem” (P < 0.05), and “general health” (P < 0.01). Regarding these four CHQ-CF scales, administration of the paper version resulted in lower scores than online administration (or nearly equal scores in the case of “mental health”) in the subgroup of 13–14 year olds, while in the subgroup of 15–17 year olds, paper administration resulted in higher scores compared with internet administration; the Cohen’s effect sizes (d) for these differences, regardless of sign, were ≤ 0.21 (data not shown).

Table 3 Comparison of mean scores, distributions of the scale scores, and other psychometric properties of CHQ-CF scales in subgroups with paper (n = 475) and internet modes (n = 458) of questionnaire administration

Internal consistency reliability of scales by mode of administration

Cronbach’s αs for the two formats were adequate for all CHQ-CF scales, except “physical functioning” in the subgroup administered the paper version of the questionnaire (α = 0.69). The two “role functioning” scales and “mental health” showed statistically significantly higher Cronbach’s αs in the subgroup administered the paper version of the questionnaire compared with the αs in the subgroup administered the internet version (P < 0.01 and P < 0.05, respectively) (Table 3). All multi-item scales, regarding both modes of administration, showed higher average (corrected) item-own scale correlation coefficients than average item-other-scale correlation coefficients. The two “role functioning” scales showed statistically significantly higher average item-own scale correlation coefficients in the subgroup administered the paper version of the questionnaire compared with the item-own scale correlation coefficients in the subgroup administered the internet version (P < 0.01) (Table 3).

Construct validity by mode of administration

All mean CHQ-CF scale scores were lower in the subgroup with one or two reported conditions and in the subgroup with three or more reported conditions when either was compared with the subgroup with no reported conditions. For both modes of questionnaire administration, and for all CHQ-CF scales, the more chronic conditions that were reported, the higher the effect sizes compared with the subgroup with no chronic conditions. ANOVA showed statistically significant CHQ-CF score differences by “number of chronic conditions” for all scales (P < 0.01) (Table 4). The mode of questionnaire administration did not interact significantly with the variable “number of chronic conditions” (P ≥ 0.05 for all scales) (Table 4).

Table 4 Ability of the CHQ-CF scales to discriminate between subgroups differing in the participants’ number of chronic conditions, for the group that was assigned to complete the paper version (n = 475) and for the group that was assigned to complete the internet version (n = 458)

Discussion and conclusions

In this study we applied a randomized design to compare the results of the Child Health Questionnaire-Child Form (CHQ-CF) administered by a paper questionnaire and by an online questionnaire. The results provided support for the feasibility, internal consistency reliability, and construct validity of the CHQ-CF scales. Both modes of questionnaire administration yielded comparable scale scores and showed comparable psychometric properties. Additionally, the study provided reference/norm scores for clinical studies (general population of 13–17 year olds).

Strengths of the current study

The participation rate was high. Study group characteristics (age, gender, country of birth, and educational levels) were representative of those of the general population of Dutch adolescents [34]. Randomization to either the paper or internet mode of administration of the CHQ-CF was successful with respect to the evaluated characteristics.

Limitations

We applied a randomized parallel group design that allows for the comparison of indicators of feasibility, reliability, and validity at the group level between a subgroup that completed a paper version and a subgroup that completed an internet version of the CHQ-CF. However, this did not allow an evaluation of whether the same person would provide equivalent or different answers to the same CHQ-CF questionnaire administered by the alternative mode, which would require a randomized crossover design [25, 35]. Such an evaluation at the individual level requires the respondent to forget all previously provided answers at the second assessment, e.g., by waiting 1 or 2 weeks between the two measurements. It also requires that there is no effect from having previously completed a CHQ-CF questionnaire by any mode, paper or internet, at the second assessment, and that scores by the same mode of administration after a relatively short interval, in the absence of changes in health status, are exactly the same. However, in an evaluation of retesting with the same paper version of the CHQ-CF after 2 weeks, 5 out of 10 CHQ-CF scales showed statistically significantly higher scores at the second measurement with Cohen’s effect sizes ranging from 0.25 to 0.40, while intraclass correlation coefficients between the first and second measurement ranged from 0.06 to 0.84 [7]. Furthermore, in a randomized crossover design, “carry-over” effects may be present, i.e., completing an internet version before a paper version may have a different effect on the second assessment than completing a paper version before an internet version [35]. Despite the logistical and the above-mentioned methodological challenges, we recommend that future studies comparing the paper and internet versions of the CHQ-CF apply a randomized crossover design to evaluate congruency of answers to CHQ-CF items at the individual level.

In this study, internet and paper questionnaires were completed in a controlled environment with adequate privacy and supervision. This may not be the case during future applications. We are unaware of the impact less privacy during completion of the questionnaires may have, but this would apply to both the paper and the internet versions of the questionnaire.

For both modes of questionnaire administration, we did not evaluate correlations between CHQ-CF scores and a relevant parent-rated questionnaire such as the CHQ-PF50 [2, 3]. Test-retest reliability of the CHQ-CF and its responsiveness and sensitivity to changes in health were not evaluated in the current study. The CHQ-CF has a relatively large number of items; therefore, we recommend developing a shorter version in the future.

Psychometric properties

The psychometric properties, with only a few exceptions, were equal between the two modes of questionnaire administration. The Cronbach’s α of the scale “physical functioning” in the subgroup administered the paper version of the questionnaire was just under 0.70, and the difference with the α in the subgroup administered the internet version was not statistically significant.

Missing values

Compared with the paper version, the internet version was successful in reducing the quantity of missing data. Theoretically, differences in selective partial non-response between formats might have contributed to differences in scores that were reported in this study. In our study, in the subgroup (n = 86) that had at least one missing answer to a paper CHQ-CF item, all scale means were somewhat lower than in the subgroup (n = 389) with no missing answers, but these differences were not significant (P ≥ 0.05). Thus, missing answers are unlikely to be a source of the observed score differences.

Score differences between modes of questionnaire administration

Recently, Ritter et al. found no statistically significant score differences between internet and paper modes of administration for 16 health-related measures, but the study was conducted in an opportunity sample retrieved from the internet, which limits its generalizability [23]. In a randomized internet-paper comparison among adolescents concerning various health measures other than the CHQ-CF, only one statistically significant score difference was reported among 21 topics [24]. In another randomized adolescent study, a medical consumption index and 11 indicators of fruit consumption and determinants of fruit consumption did not show statistically significant score differences between internet and paper administration of the questionnaire, except for one measure that showed small score differences between modes of administration [25]. The International Study of Asthma and Allergies in Childhood (ISAAC) questionnaire did not show statistically significant score differences between internet and paper administration in two randomized adolescent studies [25, 26].

In our study, in the whole sample, the paper version resulted in slightly, yet statistically significantly, higher scores for 4 of 10 CHQ-CF scales compared with the internet version. One plausible explanation is chance: given multiple comparisons, there is a 1-in-20 chance of a false association for each comparison (Type I error at α = 0.05) [36]. A commonly used Bonferroni correction for 10 comparisons would imply an adjusted α = 0.05/10 = 0.005 [36]; at α = 0.005, only one score difference (regarding the scale “general behavior”) was significant. Furthermore, given Cohen’s suggested guidelines for the interpretation of effect sizes, three of the four statistically significant differences between modes of administration can be considered negligible (d ≤ 0.12), and one difference regarding the CHQ-CF scale “general behavior” (d = 0.21) can be considered small [28]; all effect sizes were far below d = 0.50, which was suggested as an approximate threshold for “minimally important differences” by Norman et al. [29]. This study provides no explanations for the established small score differences between paper and internet administration, or for the established statistically significant, but small interaction effects of administration mode with age in the case of four CHQ-CF scales.
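The multiple-comparison arithmetic above can be made concrete with a short sketch. The function names are hypothetical, and the family-wise error rate shown assumes that the comparisons are independent, which is unlikely to hold exactly for correlated CHQ-CF scales; it is included only to illustrate why an unadjusted α = 0.05 across 10 scales invites chance findings.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive across all comparisons.

    Assumption: the comparisons are statistically independent.
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


if __name__ == "__main__":
    # 10 scale comparisons at the conventional alpha of 0.05
    print(bonferroni_alpha(0.05, 10))
    print(familywise_error_rate(0.05, 10))
```

Under the independence assumption, testing 10 scales at α = 0.05 gives roughly a 40% chance of at least one false positive, whereas the Bonferroni-adjusted threshold of 0.005 keeps that chance below 5%.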

Conclusions

With increasing application of online health questionnaires rather than questionnaires on paper, especially in adolescent populations, it should be noted that comparison of results requires that the scores between these modes of administration do not show meaningful statistically significant differences. This study showed that, overall, paper and internet versions of the CHQ-CF yielded only a few, negligible or small, differences. Paper and internet modes of CHQ-CF administration may be combined in a single study, although researchers should consider the possibility of minor score differences depending on the mode of administration for some scales. We recommend repeated studies in other populations, including clinical populations, to confirm or reject our results.