Background

In recent years, medical practitioners and researchers have come to accept that patient-reported outcomes (PROs) should be incorporated into clinical practice and research, in addition to objective clinical outcomes such as laboratory values. Health-related quality of life (HRQOL), which is a PRO, assesses the aspects of quality of life affected by diseases and their treatment [1,2,3]. Aspects such as the level of physical, psychological, and social well-being are also called generic HRQOL. Generic HRQOL can be compared between individuals with and without health problems. Hence, it can be used for the health evaluation of various populations [4, 5]. HRQOL has therefore established itself as an important outcome in clinical research, clinical practice, and community health [6,7,8].

Self-report is the standard method of measuring HRQOL. The same is true for children, but HRQOL for children who are too young to self-report or cognitively impaired requires parent-report [9]. There are several scales that provide parent-report for HRQOL for children older than 2 years (e.g., Pediatric Quality of Life Inventory (PedsQL) Generic Core Scales [10]), but HRQOL scales available for infants younger than 2 years are limited. In addition to the PedsQL Infant Scales (PedsQL-I) [11], the TNO/AZL Preschool Children Quality of Life Questionnaire (TAPQOL) [10, 12] and the Infant and Toddler Quality of Life Questionnaire (ITQOL) [13, 14] are used. However, the TAPQOL is not available until after the first year of life, and the ITQOL has 103 items, making it burdensome for the respondent. The PedsQL-I can be used from the first month of life, and having only a small number of items (36 items (1–12 months) or 45 items (13–24 months)), it is considered to be a simple and highly useful HRQOL scale.

One of the advantages of the PedsQL-I is its suitability for longitudinal HRQOL tracking: the two-factor structure of the PedsQL-I with physical and psychosocial health summary scores is similar in structure to the PedsQL Generic Core Scales [5, 10], which can be used from age 2–25 [15, 16]. Therefore, it can be used with conceptual consistency in longitudinal quality of life studies that follow patients for several years. The purpose of the present study was to develop a Japanese version of the PedsQL-I and to investigate its feasibility, reliability, and validity. In particular, we examined the relative and transitional validity of the PedsQL-I at the age boundary to determine its usefulness in longitudinal follow-up studies that continue beyond age two.

Methods

Scale development

Dr. James W. Varni (JWV) permitted the translation of the PedsQL-I [11] into Japanese. Two researchers separately translated the original scale using an approved translation procedure [17]. We reconciled these translations into a single Japanese version, preserving the wording between this scale and the Japanese version of PedsQL Generic Core Scales [18]. This preservation is needed because consistency between PedsQL-I and the PedsQL Generic Core Scales Toddler version (PedsQL-T) is necessary for tracking HRQOL over time. A native English translator, who was blinded to the original version, back-translated the reconciled version into English. Seven Japanese health professionals (pediatric nurses, clinical psychologists, and a nurse-midwife, including bilingual/bicultural researchers) compared the back-translated version with the original English version and made minor amendments to the reconciled Japanese version to produce a pilot questionnaire [19].

Ten native Japanese-speaking parents raising infants aged 1–24 months participated in the pilot test between November 2013 and January 2014 [19]. The data obtained from the pilot test were used to produce a final Japanese version of the PedsQL-I. No words or phrases required modification after the pilot test. In addition, JWV confirmed the conceptual and linguistic equivalence between the Japanese version and the original version.

Study population

Parents with children aged 1–30 months were recruited from eight day care centers and one pediatric clinic located in Tokyo using distributed flyers. Recruitment was carried out between July and September 2014. Parents whose ability to read, understand and communicate in Japanese was judged as insufficient by the day care center/pediatric clinic staff were excluded. The present study included parents with children aged 25–30 months to examine transitional validity at the age boundary.

We calculated sample size based on known-group validity. Assuming a 1:5 ratio between those with and without a disease, it was calculated that 38 infants and 190 infants, respectively, were needed to detect a moderate difference (effect size d = 0.5) with a power of 0.8 at a significance level of 0.05.
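As a rough check of this calculation, the sketch below applies the standard normal-approximation formula for comparing two group means with unequal allocation. The choice of formula, the function name `two_group_sample_size`, and its parameters are assumptions made for illustration; the method and software used for the original calculation are not stated here.

```python
from math import ceil
from statistics import NormalDist


def two_group_sample_size(d, alpha=0.05, power=0.8, ratio=5.0):
    """Approximate group sizes for a two-sided comparison of two means.

    d     : standardized effect size (Cohen's d)
    ratio : size of the larger group divided by the smaller group
    Uses the normal approximation (an assumption, not the paper's stated method).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    # Smaller group: n1 = (1 + 1/ratio) * (z_alpha + z_beta)^2 / d^2
    n_small = ceil((1 + 1 / ratio) * (z_alpha + z_beta) ** 2 / d ** 2)
    n_large = ceil(ratio * n_small)
    return n_small, n_large


print(two_group_sample_size(0.5))
```

With d = 0.5, a two-sided significance level of 0.05, a power of 0.8, and a 1:5 ratio, this approximation gives 38 and 190, in line with the figures above.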

Procedure

The study collaborators distributed the questionnaires and return envelopes. Participants received a detailed information sheet informing them of the content and extent of the study, and their completing the questionnaire was taken as providing informed consent. The questionnaires were either returned by mail or collected from designated questionnaire return boxes placed in the day care centers and the pediatric clinic. Parents with children aged 1–24 months were informed of the retest procedure during the first test. Participants who consented to the retest were sent retest questionnaires in self-addressed envelopes 1–2 weeks after the initial questionnaire, asking participants to return the retest within a week of receipt. Upon completion of the study, a summary of the study results was sent to the heads of the day care centers and the pediatric clinic.

Measurement

The PedsQL-I has two age-appropriate versions, for ages 1–12 months and 13–24 months [11]. The 1–12 months version and the 13–24 months version include the same five scales (36 and 45 items, respectively): Physical Functioning (6 and 9 items, respectively), Physical Symptoms (both include 10 items), Emotional Functioning (both include 12 items), Social Functioning (4 and 5 items, respectively), and Cognitive Functioning (4 and 9 items, respectively). The PedsQL-I also includes the Physical Health Summary score (the mean of the item scores included in the Physical Functioning and Physical Symptoms Scales) and the Psychosocial Health Summary score (the mean of the item scores included in the Emotional, Social, and Cognitive Functioning Scales). Respondents are asked to describe the extent to which each item had troubled their children over the past 1 month (the PedsQL’s standard recall period). A 5-point Likert response scale is used (0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; 4 = almost always a problem). Items are reverse-scored and linearly transformed to a 0–100 scale (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0), where higher scores indicate a better HRQOL. Scale scores and the total score are computed as the sum of the items divided by the number of items answered. If more than 50% of the items are missing or incomplete, the scale score is not computed.
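The scoring rules described above can be sketched as follows. This is a minimal illustration, not the official scoring program: the function names are hypothetical, and missing responses are assumed to be represented as `None`.

```python
from typing import Optional, Sequence


def transform_item(raw: int) -> float:
    """Reverse-score a 0-4 response onto the 0-100 scale (0 -> 100, 4 -> 0)."""
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError(f"response must be 0-4, got {raw!r}")
    return 100.0 - 25.0 * raw


def scale_score(raw_items: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the transformed items that were answered.

    Returns None when more than 50% of the items are missing.
    """
    if not raw_items:
        return None
    answered = [transform_item(r) for r in raw_items if r is not None]
    missing = len(raw_items) - len(answered)
    if missing > 0.5 * len(raw_items):
        return None
    return sum(answered) / len(answered)


def summary_score(*scales: Sequence[Optional[int]]) -> Optional[float]:
    """Summary score: mean over all items of the constituent scales."""
    pooled = [item for scale in scales for item in scale]
    return scale_score(pooled)


# Hypothetical responses for a 4-item scale with one missing answer.
print(scale_score([0, 1, None, 2]))  # mean of 100, 75, 50
```

Pooling the items before averaging, rather than averaging the scale scores, follows the description of the summary scores as the mean of the item scores in the constituent scales.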

The PedsQL-T consists of 21 items that belong to one of the four following scales: Physical Functioning (8 items), Emotional Functioning (5 items), Social Functioning (5 items), and School Functioning (3 items) [18]. The PedsQL-T also includes the Physical Health Summary score (same as the Physical Functioning scale) and the Psychosocial Health Summary score (the mean of the item scores included in the Emotional, Social, and School Functioning Scales). These scale scores and the total score are computed in the same manner as those of the PedsQL-I.

Kessler-6 (K6) consists of six items and was used to screen parents for psychological distress [20]. Respondents rate how often they felt (1) nervous, (2) hopeless, (3) restless or fidgety, (4) so depressed that nothing could cheer them up, (5) that everything was an effort, and (6) worthless over the past one month. A 5-point Likert response scale was used (0 = none of the time; 1 = a little of the time; 2 = some of the time; 3 = most of the time; 4 = all of the time). Responses to the six items were summed up to yield a K6 score between 0 and 24, with higher scores indicating a greater tendency towards psychological distress.

Participants were asked about their age, gender, familial relations to their child, and economic status. They also answered the items regarding characteristics of their children (age, gender, place where a child spends daytime, and presence of acute, chronic illness and/or other disease). We also added one question to the retest questionnaire: ‘Has a significant event affecting you happened since responding to the initial questionnaire?’.

Participants with children aged 1–18 months completed the PedsQL-I and participants with children aged 19–30 months completed the PedsQL-I and PedsQL-T. All participants answered the items of K6.

Statistical analyses

All analyses were performed using IBM SPSS software, version 21 (SPSS, Inc., Chicago, IL, USA) and R version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) [21]. The level of significance was set at 0.05.

Score distributions for the PedsQL-I were summarized as mean, standard deviation, minimum and maximum scores, and percentages of floor (0) and ceiling (100) scores, separately for children aged 1–12 months and 13–24 months. We defined a high floor/ceiling effect as more than 20% and an especially high floor/ceiling effect as more than 50%. We assessed correlations between subscales of the PedsQL-I in each of the two age groups by calculating Pearson’s product–moment correlation coefficient.

Feasibility was determined based on the percentage of missing values. Independence of easily missed items was assessed by Cochran’s Q test.

Reliability was assessed based on internal consistency and test–retest reliability among children aged 1–24 months. Good internal consistency was defined as a Cronbach’s alpha exceeding 0.70. To determine test–retest reliability, intraclass correlation coefficients (ICC) between the initial test and retest scores in a one-way random effects model were calculated; an ICC value of 0.40 represented moderate, 0.60 good, and 0.80 high agreement [22]. A paired t test between the initial test and retest scores was used to check whether or not the PedsQL scores had changed.
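The two reliability statistics named above can be illustrated with a short sketch. It assumes complete data arranged as one row per respondent, and it uses the single-measure form of the one-way random effects ICC; whether the single-measure or average-measure form was reported is not stated here, so that choice is an assumption. The function names are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(rows: Sequence[Sequence[float]]) -> float:
    """Single-measure ICC from a one-way random effects model.

    rows are respondents; columns are repeated measurements (test, retest).
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    subject_means = [mean(row) for row in rows]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    msw = sum((x - m) ** 2 for row, m in zip(rows, subject_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical test-retest scores for three respondents.
print(round(icc_oneway([(1, 2), (3, 4), (5, 6)]), 3))
```

Against the thresholds given above, an alpha above 0.70 would be read as good internal consistency and an ICC of 0.80 or more as high agreement.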

Validity was assessed based on factorial validity, known-groups validity, concurrent validity, convergent and discriminant validity, and relative validity. Factorial validity and known-groups validity were assessed among children aged 1–24 months. Concurrent validity, and convergent and discriminant validity were assessed among children aged 19–24 months. Relative validity was assessed among children aged 19–30 months.

We used multi-trait analysis to assess factorial validity [23]. The Pearson’s product–moment correlation coefficients of an item of PedsQL-I with its own scale (corrected for overlap) and other scales were calculated. We used pairwise correlations because only participants with infants aged 13–24 months answered 3 items of the Physical Functioning Scale, one item of the Social Functioning Scale, and 5 items of the Cognitive Functioning Scale. We hypothesized that an item correlates more strongly with its own scale than other scales. Scaling success for any scale was also assessed as the number of convergent correlation coefficients significantly higher than the discriminant correlation coefficient divided by the total number of correlations.

To assess known-groups validity, we calculated the regression coefficients for the presence of acute or chronic illness with the Physical Health Summary, the Psychosocial Health Summary, and the total scores of the PedsQL-I using a linear mixed model. In this model, we controlled for the child’s age and gender and the parent’s age, gender, and K6 score as fixed effects and for the day care centers/pediatric clinic as a random effect. The presence of acute illness was defined as a child’s having only acute illness, and the presence of chronic illness was defined as a child’s having chronic illness regardless of having acute illness. We hypothesized that the PedsQL-I would demonstrate a lower Physical Health Summary score and total score under acute illness and lower Physical and Psychosocial Health Summary scores and total score under chronic illness. Before conducting the analysis, we checked the interaction between infant age and disease presence for both the 1–12 months and the 13–24 months version using a regression model, but the interaction was not statistically significant. Therefore, we conducted this analysis using the combined versions.

To assess concurrent validity, we calculated Pearson’s product–moment correlation coefficients to confirm that the Physical Health Summary, the Psychosocial Health Summary, and the total score on the PedsQL-I were positively correlated with the same scale scores on the PedsQL-T. Correlation coefficients of 0.10 represent small, 0.30 medium, and 0.50 large correlations [24].

Convergent and discriminant validity were examined by calculating Pearson’s product–moment correlation coefficient between scales from PedsQL-I and PedsQL-T. We hypothesized that the correlation of the Physical Functioning and Physical Symptoms Scale of the PedsQL-I would be highest with the Physical Functioning Scale of PedsQL-T, the Emotional Functioning Scale of PedsQL-I with that of PedsQL-T, and the Social Functioning Scale of PedsQL-I with that of PedsQL-T.

Relative validity was assessed via the ratio of squared t values. Among infants, the PedsQL-I should be better able than the PedsQL-T to show HRQOL differences between known groups. If the PedsQL-T could differentiate HRQOL between known groups better than the PedsQL-I, the suitability of the PedsQL-I for infants would be questionable (because the PedsQL-T could take the place of the PedsQL-I). We calculated the t value for the difference in PedsQL-I Scale scores between 19–24-month-old infants with and without any (acute, chronic, and/or other) disease, and compared it with the t value for the corresponding difference in PedsQL-T scores. A ratio of squared t values (PedsQL-I to PedsQL-T) larger than 1 means that the PedsQL-I is relatively valid as an HRQOL scale for infants compared to the PedsQL-T. As a control, we also calculated the ratio of squared t values among 25- to 30-month-old toddlers.
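The ratio of squared t values can be sketched as below. The type of t test is not specified here, so Welch's unequal-variance t statistic is used as an assumption; the function names and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for the difference in means (assumed test type)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def relative_validity(healthy_i, ill_i, healthy_t, ill_t) -> float:
    """Ratio of squared t values: PedsQL-I (numerator) to PedsQL-T (denominator)."""
    t_infant = welch_t(healthy_i, ill_i)
    t_toddler = welch_t(healthy_t, ill_t)
    return t_infant ** 2 / t_toddler ** 2


# Hypothetical scores: the infant scale separates the groups more sharply.
ratio = relative_validity([90, 95, 100], [70, 75, 80],
                          [90, 95, 100], [80, 85, 90])
print(ratio)
```

A value above 1 indicates that the scale in the numerator discriminates between the known groups more efficiently than the scale in the denominator.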

We wanted to answer the question of whether the results measured by the PedsQL-I were ready to be taken over by the PedsQL-T when an infant visiting a follow-up clinic (or participating in a longitudinal study) matured. The transitional validity was determined by estimating two linear models of the PedsQL score depending on age: the PedsQL-I Scale scores among 19–24-month-old infants and the PedsQL-T Scale scores among 25–30-month-old toddlers. From these models, we obtained two estimates of PedsQL score at 24.5 months old. We hypothesized that the two estimates coincided with each other and tested it by bootstrapping. Similarly, we also checked the score difference between 1–12-month-old and 13–24-month-old infants at 12.5 months old.
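One way to carry out such a comparison is sketched below. It fits an ordinary least-squares line of score on age within each group, predicts the score at the transition age, and forms a percentile bootstrap interval for the difference by resampling respondents within each group. The resampling scheme, the percentile interval, the number of replicates, and all names are assumptions for illustration; the details of the original bootstrap procedure are not given here.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Obs = Tuple[float, float]  # (age in months, score)


def predict_at(ages: Sequence[float], scores: Sequence[float], age0: float) -> float:
    """Least-squares line of score on age, evaluated at age0."""
    a_bar, s_bar = mean(ages), mean(scores)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    if sxx == 0:  # all ages identical in this resample: fall back to the mean
        return s_bar
    slope = sum((a - a_bar) * (s - s_bar) for a, s in zip(ages, scores)) / sxx
    return s_bar + slope * (age0 - a_bar)


def _difference(infant: List[Obs], toddler: List[Obs], age0: float) -> float:
    inf_ages, inf_scores = zip(*infant)
    tod_ages, tod_scores = zip(*toddler)
    return predict_at(inf_ages, inf_scores, age0) - predict_at(tod_ages, tod_scores, age0)


def transition_difference(infant: List[Obs], toddler: List[Obs],
                          age0: float = 24.5, n_boot: int = 2000, seed: int = 0):
    """Point estimate and 95% percentile bootstrap interval of the difference."""
    rng = random.Random(seed)
    estimate = _difference(infant, toddler, age0)
    boots = sorted(
        _difference(rng.choices(infant, k=len(infant)),
                    rng.choices(toddler, k=len(toddler)), age0)
        for _ in range(n_boot)
    )
    lower = boots[int(0.025 * n_boot)]
    upper = boots[int(0.975 * n_boot) - 1]
    return estimate, lower, upper
```

An interval that spans zero is consistent with the hypothesis that the two estimates coincide at the transition age. The same function could be applied at 12.5 months by passing the scores from the 1–12 months and 13–24 months groups and setting `age0=12.5`.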

Results

Participant characteristics

We distributed questionnaires to 340 participants, of which 190 (55.9%) were returned. Seven questionnaires were excluded because participants with children aged 1–12 months had mistakenly completed the PedsQL-I 13–24 months version. A total of 183 questionnaires (53.8%) were analyzed. This sample consisted of 50 participants with infants aged 1–12 months, 95 participants with infants aged 13–24 months, and 38 participants with infants aged 25–30 months (Table 1). Thirty-four percent of children aged 1–12 months and 78.9% of children aged 13–24 months spent daytime at a day care center. Far more children than we had assumed had an acute disease (75/145, 52%, including cold, herpangina, or otitis media) and/or a chronic disease (18/145, 12%, including allergy, atopic dermatitis, or asthma).

Table 1 Socio-demographic characteristics

Scale descriptions

More than half of the participants reported the maximum possible score on the Social (82.0% and 72%) and Cognitive Functioning Scale (80.0% and 63%) for the 1–12 months and the 13–24 months version of the PedsQL-I Scales, respectively (Table 2). There was no floor effect in any scale. All Pearson’s correlation coefficients between subscales on PedsQL-I were significant in children aged both 1–12 months and 13–24 months (Table 3).

Table 2 Score distribution of the Japanese version of the PedsQL Infant Scales
Table 3 Correlations between subscales on the Japanese version of the PedsQL Infant Scales

Feasibility

Missing items were independent of each other in the 1–12 months version of PedsQL-I (P = 0.35), but there were significant differences in the rate of missing values between items in the 13–24 months version of PedsQL-I (P = 0.002) (Additional file 1: Table S1). Some participants did not report the items of the PedsQL-I Cognitive Functioning Scale, for example, “Difficulty pointing to his/her body parts when asked,” “Difficulty naming familiar objects,” and “Difficulty repeating words.” Many participants reported that the items of the PedsQL-I were intelligible.

Reliability

The scales were internally consistent for participants with infants aged 1–12 months and 13–24 months (Cronbach’s coefficient α 0.88–0.98 and 0.85–0.97, respectively) (Table 2). We distributed retest questionnaires to 66 participants, of which 56 (84.8%) were returned. Three retest questionnaires were excluded because participants had mistakenly completed the incorrect version of the PedsQL-I in the first test. Fifty-three questionnaires (80.3%) were analyzed for test–retest reliability. In a comparison of respondent characteristics for retest and non-retest participants (Fisher’s exact test or Welch’s t test), retest participants tended to report that their child had acute disease at the first test. There was no significant difference between the initial scale scores of the PedsQL-I and the retest scores (Table 4). The intraclass correlation coefficients between the initial test and the retest all exceeded 0.40, indicating moderate to high agreement.

Table 4 Test–retest reliability of the Japanese version of the PedsQL Infant Scales

Validity

With respect to factorial validity, the convergent correlation coefficients of the scales varied from 0.26 to 0.77, and the discriminant correlation coefficients from 0.07 to 0.64. The scaling success rate ranged from 80 to 100% (Table 5). Because a considerable proportion of children had the same best score (100), they carried a large weight in the computation of Cronbach’s alpha, test–retest ICC, and correlations. Therefore, we also calculated Cronbach’s alpha, test–retest ICC, and correlations for the subset of participants without the best score of 100 (Additional file 1: Table S2).

Table 5 Factorial validity of the Japanese version of the PedsQL Infant Scales

Validation for known-groups showed that the Physical Health Summary score was sensitive to both acute and chronic disease, the Psychosocial Health Summary score was sensitive to neither acute nor chronic disease, and the total score was sensitive to only acute disease (Table 6). Although the Psychosocial Health Summary score and the total score were insensitive to chronic illness, the results showed a 3.0- and 5.3-point reduction to chronic illness, respectively.

Table 6 Known-groups validity of the Japanese version of the PedsQL Infant Scales

With respect to concurrent validity, correlation coefficients between the PedsQL-I and the PedsQL-T were as follows: Physical Health Summary scores (0.61), Psychosocial Health Summary scores (0.60), and total scores (0.74) (Table 7).

Table 7 Concurrent validity of the Japanese version of the PedsQL Infant Scales

The PedsQL-I scores showed convergent and discriminant validity against the PedsQL-T, with a Pearson’s correlation coefficient for each score as follows: Physical Functioning (0.94), Physical Symptoms (0.64), Emotional Functioning (0.83), and Social Functioning (0.44) (Table 8). Contrary to our hypothesis, the PedsQL-I Social Functioning Scale correlated better with the PedsQL-T Emotional Functioning Scale than with the PedsQL-T Social Functioning Scale.

Table 8 Convergent and discriminative validity of the Japanese version of the PedsQL Infant Scales

The relative validity of the PedsQL-I total score and Physical Summary score is presented in Table 9. The PedsQL-I was better able to distinguish healthy infants from ill infants than the PedsQL-T with regard to the total score (relative efficiency [RE] = 1.74) and the Physical Summary score (RE = 5.43). Contrary to our expectation, the PedsQL-I Psychosocial Summary score was less able to distinguish healthy infants from ill infants than the PedsQL-T (RE = 0.85).

Table 9 Relative validity of the Japanese version of the PedsQL Infant Scales

Regarding transitional validity of the PedsQL total score, the mean difference between the estimated score of the PedsQL-I and that of the PedsQL-T at the transition point was 1.0 (95% confidence interval (CI) −6.2 to 7.2, P = 0.77). For the PedsQL Physical Summary score, the mean difference of the estimates at the transition point was −2.1 and the CI spanned zero (95% CI −12.7 to 5.9, P = 0.71). Similarly, the two estimates of the PedsQL Psychosocial Summary score had no significant mean difference (3.2, 95% CI −2.2 to 8.7, P = 0.25). Regarding the boundary at age 12.5 months, the total, Physical Summary, and Psychosocial Summary scores also showed no significant mean differences (−2.6, 95% CI −10.8 to 5.5, P = 0.53; −0.3, 95% CI −11.3 to 10.5, P = 0.95; and −4.5, 95% CI −11.6 to 2.6, P = 0.21, respectively).

Discussion

This study shows that the Japanese version of the PedsQL-I is a feasible, reliable, and satisfactory measure of HRQOL for infants aged 1–24 months.

High ceiling effects were observed; however, they had no substantial effect on psychometric properties, as can be judged from the computations done on the subset of infants without the best score. The especially high ceiling effects observed with the Social Functioning Scale and the Cognitive Functioning Scale were likely due to participants evaluating their own children, who were healthy enough to live at home or attend day care. Parents tend to view their children’s receptiveness to their own cues positively. Relatively high ceiling effects for the Social and Cognitive Functioning Scales have been reported previously for infants and children [11, 25, 26]; in the present study, however, they were particularly high. The Social, Cognitive, and Emotional Functioning Scales constitute the Psychosocial Health Summary in the PedsQL-I. The Psychosocial Health Summary had no high ceiling effect. It is possible that infants’ psychosocial aspects are broad and undifferentiated.

Some participants did not report items 6, 7, and 8 of the Cognitive Functioning Scale in the 13–24 months version. For item 6 (“Difficulty pointing to his/her body parts when asked”), respondents might have found it difficult to articulate an answer, which may then have led them to leave the subsequent item 7 (“Difficulty naming familiar objects”) and item 8 (“Difficulty repeating words”) unanswered. This may be partly because very unhealthy (e.g., hospitalized) infants did not participate in this study, and partly because of individual variability in development, i.e., some parents might consider these items less applicable to their own children. However, the overall missing rate was only 0.7%, suggesting good feasibility, and we do not consider that these items should be omitted. The larger number of items in the PedsQL-I than in the PedsQL-T may be necessary to capture infant HRQOL, whose aspects are broad and undifferentiated.

The internal consistency reliability of the total score and all five subscale scores of the PedsQL-I was good, and the test–retest reliability was also satisfactory. Together, these findings suggest sufficient reliability of the PedsQL-I. Validity was examined through convergent validity, discriminant validity, and the scaling success rate. The scaling success rate was 80% or higher for all subscales, which supports factorial validity [27]. Validation for known-groups showed that the Physical Health Summary score was sensitive to both acute and chronic disease; however, contrary to our hypothesis, the Psychosocial Health Summary score was sensitive to neither acute nor chronic disease. This study did not include infants who were hospitalized or institutionalized. Therefore, it is possible that the difference between the groups was smaller than expected. This may have caused the lack of a statistically significant difference. Another reason might be the lack of power due to the small sample size resulting from an unexpectedly low questionnaire response rate.

Regarding concurrent validity, each score of the PedsQL-I showed a high correlation to PedsQL-T. However, the PedsQL-I Social Functioning Scale correlated better with the PedsQL-T Emotional Functioning Scale than with the PedsQL-T Social Functioning Scale. For children in infancy and early childhood, emotional functioning is consistently based on the parent–child relationship. On the other hand, social relationships for children expand rapidly. Reflecting this, in contrast to the PedsQL-I Social Functioning scale, which asks about parents' perceptions of their infants' reactions to their parents, the PedsQL-T Social Functioning scale asks about parents' perceptions of their children's interactions with other children. We considered this to be a reason for the result.

The relative validity analysis yielded a ratio of squared t values of 1.74 for the total score, which supports the use of this scale in infants up to 24 months of age. Further, there was no significant difference between the scores estimated at the age transition points. To our knowledge, the present study is the first to show that it is appropriate to use the total score of the PedsQL-I as a continuous value across the age groups of the first and second years of life. In longitudinal HRQOL studies that follow patients from infancy, the PedsQL-I should be useful for the initial assessment of HRQOL, and the results could be readily taken over by the PedsQL-T as an infant matures.

A strength of the present study was that we included participants from diverse socio-demographic backgrounds. However, the study has several limitations. First, our sample was limited to participants from Tokyo, and information on non-participants was not available. Thus, the results may not be generalizable to all of Japan. Second, the small sample size is a limitation of this study. In particular, it was difficult to recruit the planned number of known-group infants for each age group because of the unexpectedly large number of infants with an acute or chronic disease and an unexpectedly low questionnaire response rate.

Third, the high ceiling effects observed in the present study might limit the usefulness of the scale in discerning health improvements in healthy or mildly ill infants. Fourth, we could not use comparable supporting measures (e.g., the TAPQOL) to match each domain of the PedsQL-I because Japanese versions have not been developed. Fifth, acute and chronic diseases of the infants in the present study were parent-reported and not verified. Further research should evaluate the use of the PedsQL-I in well-defined infant populations, including hospitalized and institutionalized infants.

The present study taught us several points about infant HRQOL research. One notable point is that 50.0–71.1% of children in different age groups had an acute disease. Children in these age groups tend to develop acute conditions easily but recover within a few days. Therefore, it might be more reasonable to use a 7-day recall rather than a 1-month recall. Further, we assumed the factor structure of the PedsQL-I original version. As our sample size did not allow us to conduct exploratory or confirmatory factor analysis, we used multi-trait analysis, but this may suffer from single-rater bias. Parents have been thought of as the best reporters to assess their children’s HRQOL [28]. In the future, however, additional forms of QOL evaluation, such as dyad comparison of mother- and father-report, and multi-view, which includes the view of surrounding family members and child health professionals, as well as the growing infants’ own perspective, might bring further understanding to this challenging field.

Conclusions

The PedsQL-I is suitable for assessing health-related quality of life in healthy infants aged 1–24 months. We expect the PedsQL-I to be useful for studying HRQOL in a variety of infants (with various diseases) and for conducting follow-up studies in conjunction with the PedsQL-T. Future research will help expand our understanding of infant HRQOL and develop strategies to improve it.