Key Points for Decision Makers

The Growth Hormone Deficiency-Child Impact Measure (GHD-CIM) is a unique observer-reported outcome (ObsRO) measure developed to assess the impact of growth hormone deficiency (GHD) on children and adolescents in the domains of physical functioning and social and emotional well-being, based on the parent’s/guardian’s observations of the child’s daily life and health.

The GHD-CIM was found to be a reliable and valid ObsRO measure for parents of children with GHD aged 4 to < 13 years.

Incorporating the brief, 11-item GHD-CIM assessment into clinical practice could allow clinicians to assess a patient’s response to therapy.

1 Background

Growth hormone deficiency (GHD) results when the pituitary gland does not produce enough growth hormone (GH) to stimulate the body to grow, and manifests as an abnormally slow rate of growth in both early and later childhood [1]. Childhood GHD prevalence is within the range of 1.8–2.9 per 10,000 in Europe and the US [2,3,4]. In most cases of childhood-onset GHD the cause is not known [1]; these cases are classified as idiopathic isolated GHD, without any pituitary structural abnormalities or other concomitant hormone deficiencies. GHD impacts multiple aspects of daily life for children with the condition. The burden of GHD extends beyond small stature and reduced growth to include poor energy, decreased muscle strength/endurance, other physical limitations, and social and/or emotional impacts [5]. With GH replacement treatment, many children with GHD can reach a normal height for their family [3]. Treatment involves GH injections, typically administered once daily and continuing until the child reaches their final height. Recombinant GH has been in use clinically for over 30 years and is generally considered safe: major adverse effects are rare [6], and minor adverse effects (for example, rash and pain at the injection site) are possible [7].

Unfortunately, no disease-specific measures exist to assess the impact of GHD on children and adolescents. Although the Quality of Life in Short Stature Youth (QoLISSY) questionnaire is also used to assess outcomes in GHD, it has been validated for children with short stature and is not specific to child GHD. To address this gap, the Growth Hormone Deficiency-Child Impact Measure (GHD-CIM) was developed as a disease impact measure. Development of the GHD-CIM included systematic reviews of the GHD literature and existing patient-reported outcome (PRO), observer-reported outcome (ObsRO), and clinician-reported outcome instruments, expert advice, and direct patient and parent/guardian input through qualitative concept elicitation and cognitive debriefing interviews conducted by trained individuals with backgrounds in qualitative interviewing and in the native language of the host country [5].

Based on the conceptual development phase, a 33-item validation-ready measure was developed to examine the option of having two versions: a PRO completed by children with GHD aged 9 to < 13 years, and an ObsRO completed by parents/guardians of children with GHD aged 4 to < 9 years [5]. The validation data would then be used to determine whether a PRO was appropriate for children with GHD aged 9 to < 13 years.

The initial review of the validation study data found that the child data had high ceiling effects not seen in the observer data, other than in the Physical Functioning (PHYS) domain; however, the PHYS domain had poor correlation between observer and child. Considering the data as a whole (ceiling effects, item functioning, inconsistency of findings), it was determined that a PRO version for children aged 9 to < 13 years was not psychometrically sound. The decision was therefore made to have only an ObsRO version of the GHD-CIM, which would be suitable for children aged 4 to < 13 years. This manuscript presents the validation data supporting the ObsRO version of the GHD-CIM.

2 Methods

The conceptual and psychometric validation of the GHD-CIM followed the scientific principles of PRO and ObsRO measures development according to the US FDA [8] and the European Medicines Agency [9], as well as guidance provided by the International Society for Pharmacoeconomics and Outcomes Research [10]. The study was conducted in accordance with the Declaration of Helsinki [11] and the Guidelines for Good Pharmacoepidemiology Practices [12]. In the US, ethics approval was obtained from the Copernicus Group Institutional Review Board (IRB; tracking number TBG1-15-428). In the UK, independent Research Ethics Committee approval was obtained from the National Health Service Health Research Authority (IRAS Project ID 219425, REC Reference 17/LO/0075). The FDA 21 Code of Federal Regulations §50 and §56 [13] were followed, and signed consent/assent was obtained before any study-related activities were initiated.

2.1 Validation Phase

A non-interventional, multicenter, clinic-based study was conducted in 30 private-practice and large institutional (academic/hospital) sites in the US and the UK. Sites identified both children on ongoing treatment and treatment-naïve children with GHD from their current caseload or by chart review, and confirmed their GHD diagnosis and other eligibility criteria (refer to Sect. 2.1.1 for specific information regarding participants). The decision to initiate GH treatment or stay on already initiated treatment was made by the physician as per usual care and independent of the patient’s decision to participate in the study. Participants were withdrawn if the child never started treatment, or stopped treatment at any time during the study.

Sites recruited two populations: (1) prepubertal children aged 9 to < 13 years at enrollment with a confirmed GHD diagnosis who answered questions on their own (PRO); and (2) parents/guardians of prepubertal younger children aged 4 to < 9 years at enrollment with a confirmed GHD diagnosis who answered questions as an observer (ObsRO) for children not able to answer for themselves; each population was divided into treatment-naïve or maintenance groups. Overall, 243 participants (145 children and 98 parents/guardians) were eligible, participated in the validation survey, and were included in the preliminary data analysis used to make the decision about whether to proceed with a PRO version for children aged 9 to < 13 years.

2.1.1 Inclusion and Exclusion Criteria

To be eligible, children in the treatment-naïve group had to have (1) a GHD diagnosis previously confirmed by a GH stimulation test, defined as a peak GH level of ≤ 7.0 ng/mL; (2) no prior exposure to GH therapy, with a decision made to initiate prescription GH therapy within 3 months; and (3) annualized height velocity below the 25th percentile for chronological age [14]. Children in the maintenance group had to have (1) a GHD diagnosis previously confirmed by a GH stimulation test, defined as a peak GH level of ≤ 10.0 ng/mL; and (2) be receiving prescription GH therapy for at least 6 months. In addition, all children had to have a body mass index greater than the 5th percentile and less than the 95th percentile.

Parent/guardian participants were eligible if their treatment-naïve or maintenance child met the diagnostic and medical criteria noted above; however, their child’s age had to be 4 to < 9 years. Additionally, all parent/guardian participants were required to live in the same residence as the child with GHD at least 50% of the time.

Child and parent/guardian participants were excluded from the study if they had previously participated in the study, had mental incapacity, unwillingness or language barriers precluding adequate understanding/cooperation, or were likely to be non-compliant in respect to the study conduct, per the physician. Child and parent/guardian participants were also excluded if the child (1) had any clinically significant abnormality likely to affect growth or the ability to evaluate growth, e.g. chromosomal abnormalities and medical syndromes, significant spinal abnormalities; (2) was born small for gestational age (birth weight and/or birth length less than − 2 standard deviations (SDs) for gestational age); (3) was diagnosed with diabetes mellitus or fasting blood glucose ≥ 126 mg/dL (7.0 mmol/L), or HbA1c ≥ 6.5% at enrollment; (4) had current inflammatory disease(s) requiring systemic corticosteroid or glucocorticoid treatment for longer than 2 weeks within the last 3 months prior to enrollment; (5) required glucocorticoid therapy and was taking a dose of > 400 µg/day of inhaled budesonide or equivalents for longer than 1 month the year prior to enrollment; (6) had concomitant administration of other treatments that may have an effect on growth (hormone replacement therapies were allowed for inclusion); or (7) had any disorder that, per investigator opinion, might jeopardize the subject’s safety or protocol compliance.

2.1.2 Validation Study Visits and Assessments

At the in-person baseline visit, all participants signed an informed consent/assent and completed a paper validation battery, which included sociodemographic items and relevant medical history (such as age, sex, race, household income, comorbid medical conditions, and GHD history), the GHD-CIM, and the one-item Patient Global Impression of Severity (PGIS) scale. The battery also included the QoLISSY [15], DISABKIDS (DCGM-37) [16], Child Sheehan Disability Scale (CSDS) [17], and Diabetes Fear of Injecting and Self-Testing Questionnaire (D-FISQ) [18]/fear of self-injecting (FSI) subscale. Clinicians completed the Clinician Global Impression of Severity (CGIS) scale and clinical measurements of height/weight.

Follow-up assessments were completed by either a mailed survey sent by clinic staff to maintenance participants, including the GHD-CIM and items covering any changes in treatment or major life events since the last assessment, or, for treatment-naïve participants, by weekly telephone calls by clinic staff to assess minimal improvement using the Patient Global Rating of Change (PGRC), a two-item scale assessing change and whether change (since treatment start) was meaningful to the child. Treatment-naïve participants had up to two additional in-person study visits at their clinic to complete the GHD-CIM, PGIS, and PGRC items covering GHD overall, GHD symptoms, physical functioning, social well-being, and emotions (minimally important difference [MID] assessment and week 12 visits). If an in-person follow-up visit was impossible, clinic staff mailed the week 12 measures to the participant for completion. At follow-up clinic visits, clinicians completed the CGIS and global rating of change, GH treatment start date, and any change in treatment, along with height/weight measurements. Only treatment-naïve children who initiated treatment were included in the study.

2.2 Statistical Analysis Methods

All analyses were conducted following an a priori validation statistical analysis plan. All statistical tests used a significance level of 0.05 (two-sided) unless otherwise noted. Statistical tests involving multiple comparisons (e.g. analysis of variance [ANOVA] models with multiple groups) included Scheffe post hoc tests. Statistical analyses were conducted using SPSS [19].

2.2.1 Sociodemographic and Clinical Characteristics

Descriptive statistics were calculated for demographic and clinical variables to describe the study sample.

2.2.2 Descriptive Characteristics: Measure Items

Descriptive statistics were calculated for the individual item responses. The ceiling effect threshold for closer examination was set at 50%.
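The ceiling-effect screen can be illustrated with a minimal Python sketch. It assumes that the ceiling is the best possible response (coded 0 here), that missing responses are excluded from the denominator, and that an item is flagged when the share is at or above 50%; the response data are hypothetical, and none of these details are specified beyond the 50% threshold.

```python
CEILING_THRESHOLD = 0.50  # threshold for closer examination, from the text


def ceiling_share(item_responses, best_value=0):
    """Share of answered responses at the best possible value.

    Assumption: None marks a missing or 'Don't Know' response and is
    excluded from the denominator.
    """
    answered = [r for r in item_responses if r is not None]
    at_ceiling = sum(1 for r in answered if r == best_value)
    return at_ceiling / len(answered)


# Hypothetical responses (0-4) from ten parents to a single item
demo_item = [0, 0, 0, 1, 2, 0, None, 3, 0, 4]
share = ceiling_share(demo_item)  # 5 of 9 answered responses are 0
needs_review = share >= CEILING_THRESHOLD
```

Whether the comparison should be strict or inclusive at exactly 50% is a design choice that the analysis plan would settle.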

Item-to-item correlation was examined using a Pearson’s correlation matrix of each item in the GHD-CIM. A reliability analysis was conducted for all item pairs, and focus was given to correlation coefficients > 0.50, indicating potential redundancy between the items [20].

Item-to-total correlation was examined using Pearson’s correlation between each item score and the total score. A bivariate Pearson’s correlation was calculated for each item score against the total score (excluding the item of interest), and any item with a value < 0.40 [21] was examined since this indicates that it may not be sufficiently associated with the remaining items in the hypothesized scale.
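As an illustration of the corrected item-to-total correlation (each item against the sum of the remaining items), the following Python sketch uses only the standard library. The response matrix is hypothetical and assumes complete data; the study's own calculations were run in SPSS.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the total of the remaining items.

    responses: one list of item scores per respondent (no missing values).
    """
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total excluding item j
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical 0-4 responses from five respondents to three items
demo = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 1],
    [3, 3, 4],
    [4, 4, 3],
]
r_values = corrected_item_total(demo)
# Items below the 0.40 threshold named in the text would be examined further
flagged = [j for j, r in enumerate(r_values) if r < 0.40]
```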

2.2.3 Rasch Measurement Theory Analysis

Rasch measurement theory analyses were used to examine the ordering of item response options and the scale unidimensionality. Analyzing data according to the Rasch model provides a range of details for checking whether or not summing the scores is justified by the data.

2.2.4 Item Reduction

Items were considered for deletion for reasons of high correlations with other items or total score, floor or ceiling effects, poor fit, or conceptual relevance considerations.

2.2.5 Factor Analyses

Exploratory factor analysis procedures were performed on the correlation matrices derived from the items comprising the measure. Rotational methods (Varimax with Kaiser Normalization) were employed to achieve a meaningful set of factors. The appropriate number of factors to be extracted was determined as a function of the proportion of common variance accounted for, residuals analysis, and scree plot examination, along with clinical and theoretical interpretability. Standardized factor loadings of at least 0.40 were considered acceptable.

A confirmatory factor analysis was also conducted to verify the final factor structure. The following fit indices were used to test and confirm the relationship between the observed variables and their underlying latent constructs: comparative fit index, goodness-of-fit index, and root mean square error of approximation.

2.2.6 Reliability

Cronbach’s alpha was used to assess internal consistency reliability [22]. This statistic is used to analyze additive scales to determine to what degree the items within the scale are associated. A high internal consistency suggests that the scale or subdomain is measuring a single construct. Alpha values range from 0.00 to 1.00; a minimum alpha of 0.70 is necessary to claim the instrument is internally consistent (alpha values between 0.80 and 0.90 are preferred).
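For readers unfamiliar with the statistic, Cronbach's alpha can be sketched in a few lines of Python. The example assumes complete data and sample variances (n − 1 denominator), and the response matrix is hypothetical.

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix without missing data."""
    k = len(responses[0])  # number of items in the scale
    item_variances = [sample_variance([row[j] for row in responses]) for j in range(k)]
    total_variance = sample_variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0-4 responses from five respondents to three items
demo = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 1],
    [3, 3, 4],
    [4, 4, 3],
]
alpha = cronbach_alpha(demo)
meets_threshold = alpha >= 0.70  # minimum value named in the text
```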

Test–retest reliability was assessed using the intraclass correlation coefficient (ICC; 2-way mixed-effect model with absolute agreement), with the retest administered approximately 2 weeks after baseline in a subsample from the maintenance group who indicated experiencing no change on the Changes Since Last Assessment items (major life events or treatment).
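A sketch of an absolute-agreement ICC for two occasions is given below. It assumes the single-measurement form computed from two-way ANOVA mean squares; the text does not state whether single or average measures were used, and the scores are hypothetical.

```python
def icc_absolute_agreement(first, second):
    """Absolute-agreement ICC, single measurement, two occasions per subject.

    Assumption: single-measures form; the source does not specify this.
    """
    n, k = len(first), 2
    rows = list(zip(first, second))
    grand = (sum(first) + sum(second)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(first) / n, sum(second) / n]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical baseline and 2-week retest domain scores for four parents
baseline_scores = [20, 35, 50, 65]
retest_scores = [25, 30, 55, 60]
icc = icc_absolute_agreement(baseline_scores, retest_scores)
```

Unlike a consistency ICC, the absolute-agreement form penalizes a systematic shift between the two occasions through the between-occasions mean square.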

2.2.7 Convergent Validity

To assess convergent construct validity for each domain and total score of the measure, Spearman correlations (due to non-normal distributions) were computed to measure the association between GHD-CIM scores and the other measures included in the study. Convergent validity was considered supported when scores were substantially correlated (≥ 0.40) with items or instruments measuring similar concepts. When more than one hypothesis per domain was proposed, at minimum one hypothesis should be met to claim convergent validity has been shown. The a priori hypotheses that were tested were:

  • GHD-CIM Total will be correlated with the QoLISSY Total score.

  • GHD-CIM Total will be correlated with overall GHD interference rating.

  • GHD-CIM Physical Functioning (PHYS) will be correlated with the QoLISSY Physical score.

  • GHD-CIM PHYS will be correlated with the overall physical functioning rating.

  • GHD-CIM Social Well-Being (SWB) will be correlated with the QoLISSY Social score.

  • GHD-CIM SWB will be correlated with the overall social well-being rating.

  • GHD-CIM Emotional Well-being (EWB) will be correlated with the DISABKIDS Emotional score.

  • GHD-CIM EWB will be correlated with the overall emotional well-being rating.
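The convergent validity check described above can be sketched as a Spearman correlation compared against the 0.40 threshold. The example is pure Python, uses hypothetical scores, and assumes the threshold applies to the magnitude of the coefficient, since a measure scored in the opposite direction yields a negative correlation.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: higher GHD-CIM = greater impact, higher comparator = better
ghd_cim_total = [10, 25, 40, 55, 70, 30]
comparator_total = [90, 80, 60, 50, 30, 55]
rho = spearman(ghd_cim_total, comparator_total)
hypothesis_supported = abs(rho) >= 0.40
```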

2.2.8 Known-Groups Validity

Known-groups validity was also tested for each domain and the total score based on a priori hypotheses using a two-tailed test at a p < 0.05 level. The a priori hypotheses that were tested were:

  • Total: Children (maintenance group) who start GH treatment earlier will have better total scores on the GHD-CIM.

  • PHYS: Increases in height (treatment-naïve) are significantly related to greater improvements in physical functioning.

  • SWB: Children with better coping related to their height will have better social well-being.

  • EWB: Children with more positive emotions related to their height will have greater emotional well-being.

2.2.9 Sensitivity to Change

Formal responsiveness was not assessed within this study’s protocol as it was neither a treatment intervention nor a randomized clinical trial. However, to assess potential sensitivity to change, we examined the 12-week GHD-CIM follow-up scores of the treatment-naïve group who initiated treatment based on usual care. This provides an idea of the magnitude of change from baseline to follow-up on a new GH treatment. An exploratory analysis of sensitivity to change was conducted using distributional methods, with the standardized effect size (the mean change score divided by the SD of the baseline score), the preferred approach [23], serving as the effect size index. Higher absolute values of the effect size indicated a greater sensitivity to change. Standardized effect sizes of 0.2–0.5 were regarded as ‘small’, 0.5–0.8 as ‘moderate’, and those above 0.8 as ‘large’ [24].
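The standardized effect size and its interpretation bands can be sketched as follows. The sketch assumes the sample SD of the baseline scores, classifies by absolute value, and treats the shared boundaries (0.5 and 0.8) as belonging to the ‘moderate’ band; the label for values below 0.2 and all scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean change (follow-up minus baseline) divided by the baseline SD."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def effect_size_label(effect_size):
    """Classify by magnitude; boundary handling is an assumption."""
    size = abs(effect_size)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"  # hypothetical label; the text names no band here


# Hypothetical 0-100 domain scores at baseline and week 12 (lower = less impact)
baseline = [40, 50, 60, 70, 80]
week_12 = [35, 45, 50, 65, 70]
es = standardized_effect_size(baseline, week_12)
label = effect_size_label(es)
```

Because higher GHD-CIM scores represent greater impact, an improvement appears as a negative effect size, which is why the classification uses the absolute value.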

2.2.10 Interpretation of Meaningful Change

To examine meaningful within-patient change, anchor-based methods were used, with the primary anchor being subjective perceptions of disease severity (PGIS), but also examining more objective, clinician perceptions (CGIS). This analysis used only the treatment-naïve patients who indicated having an improvement in these anchors. Meaningful change was defined as the difference between these two momentary assessments of GHD severity, with differences anchored to changes in one response option (e.g. severe to moderate) or two response options (e.g. severe to mild) [25].

2.2.11 Scoring

Factor analysis informed the measurement model and domain structure. The GHD-CIM is scored by summing the items for each domain and converting the sum to a 0- to 100-point scale, with higher scores representing a greater impact. Three positively framed items (PHYS) are reverse-scored. Missing items are allowed (‘Don’t Know’ responses are treated as missing) and are accounted for in the scoring: at least three of four items are needed to score PHYS, three of four for EWB, and two of three for SWB. If a domain has missing items but still meets this minimum, the transformation calculation must be adjusted for the number of items included in the domain score. The overall score is calculated as the mean of the three domain scores; if any domain score cannot be calculated due to missing data, an overall score is not calculated.
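The scoring rules can be illustrated with a short Python sketch. The item keys, the choice of which three PHYS items are reverse-scored, and the exact form of the 0–100 transformation (sum of answered items divided by the maximum possible for those items) are assumptions made for illustration; only the item counts, minimum-answered rules, and 0–4 response range come from the text.

```python
MAX_RESPONSE = 4  # items use response options 0-4

# Hypothetical item keys; the text gives only the item counts per domain.
DOMAINS = {
    "PHYS": {"items": ["phys1", "phys2", "phys3", "phys4"], "min_answered": 3},
    "EWB": {"items": ["ewb1", "ewb2", "ewb3", "ewb4"], "min_answered": 3},
    "SWB": {"items": ["swb1", "swb2", "swb3"], "min_answered": 2},
}
# Assumption: which three PHYS items are positively framed is not stated here.
REVERSE_SCORED = {"phys1", "phys2", "phys3"}


def domain_score(responses, domain):
    """Return the 0-100 domain score, or None if too few items were answered."""
    spec = DOMAINS[domain]
    answered = []
    for item in spec["items"]:
        value = responses.get(item)  # None = missing or 'Don't Know'
        if value is None:
            continue
        if item in REVERSE_SCORED:
            value = MAX_RESPONSE - value
        answered.append(value)
    if len(answered) < spec["min_answered"]:
        return None
    # Rescale by the number of items actually answered
    return 100.0 * sum(answered) / (MAX_RESPONSE * len(answered))


def overall_score(responses):
    """Mean of the three domain scores; None if any domain is unscorable."""
    scores = [domain_score(responses, d) for d in DOMAINS]
    if any(s is None for s in scores):
        return None
    return sum(scores) / len(scores)


# Hypothetical parent responses with one missing PHYS and one missing SWB item
demo = {
    "phys1": 4, "phys2": 3, "phys3": None, "phys4": 1,
    "ewb1": 1, "ewb2": 0, "ewb3": 2, "ewb4": 1,
    "swb1": 2, "swb2": None, "swb3": 2,
}
phys = domain_score(demo, "PHYS")
overall = overall_score(demo)
```

Dividing by the number of answered items keeps a domain with one missing item on the same 0–100 scale as a complete domain, which is one way to implement the adjustment the text describes.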

3 Results

Given the decision was made, after examining the preliminary data, to only proceed with the ObsRO version, a total of 98 parents/guardians were included in the final psychometric validation analysis set used for the findings reported in this study.

3.1 Statistical Analysis

3.1.1 Sociodemographic and Clinical Characteristics

Most parent participants were from the US (90.8%). The mean child age was 6.7 years, ranging between 4 and 9 years. The parents’ mean age was 38.6 years (range 25–53). Children were predominantly male (65.3%) and White (82.7%).

The mean age of the children at diagnosis was 5.1 years (range 0.1–8), and the mean age when a child first started taking GHD medication was 5.2 years (range 0.1–8). A small percentage of children were taken off GHD medication (4.2%) for an average of 4.3 weeks. Most subjects (79.6%) used a pen for medication injection, with approximately 4.1% using a needle/syringe.

Over half (53.1%) of the children had no other health conditions and less than one-quarter of the children (20.4%) had been prescribed other medications. Table 1 presents the demographic and clinical characteristics of parent participants and their children.

Table 1 Demographic and clinical characteristics

3.1.2 Item Reduction

The GHD-CIM was examined for item characteristics, including floor and ceiling effects, missing data, item-to-item correlations, and item-to-total correlations; 22 items were dropped during item reduction. Four of the dropped items were positively framed, overlapped conceptually with other items, and showed response patterns that differed from those of the other items; they may have added confusion for children completing the questionnaire. The other 18 dropped items were deemed to be conceptually redundant with other items and/or had high ceiling effects. The final GHD-CIM included 11 items.

3.1.3 Descriptive Characteristics of Growth Hormone Deficiency-Child Impact Measure (GHD-CIM) Items

The full range of response options (0–4) was used for 8 of the 11 items. For the items ‘Energy’ and ‘Upset’, only responses 0–2 were used, as no one answered ‘None/A Little’ for Energy or ‘All of the Time/Often’ for Upset, and for the item ‘Treated Differently by Children’ only responses 0–3 were used, as no one answered ‘All of the Time’. Even where the full range of responses was used, the overall trend was toward the ‘better’ end of the scale. Ceiling effects, i.e. responses of ‘Not at All/Never/None’ (where respondents could not get any better), were evident. Consequently, the means and medians were lower than expected.

Mean scores ranged from 0.50 (how often did your child feel upset?) to 2.15 (how often did people think your child was younger than they are?).

For the total sample, missing data (including missing and ‘Don’t Know’ responses) were minimal.

For most items, item-to-total correlations showed acceptable associations between each item against the remainder of the items as a total score (excluding that item). Two items with lower-than-expected associations (< 0.30) were ‘Often Upset’ (0.281) and ‘Worried About Growing’ (0.184).

Table 2 shows the descriptive statistics for individual responses on the GHD-CIM items.

Table 2 GHD-CIM item characteristics (total sample n = 98)

3.1.4 Rasch Measurement Theory Analysis

Item thresholds show that most items (30 of 33) were disordered, i.e. the threshold values between adjacent pairs of response options were disordered by magnitude. The person–item distribution showed that while the items covered a wide range (from difficult to not difficult), the persons were more clustered to the right side, indicating a fairly ‘healthy’ population.

3.1.5 Factor Analyses

An exploratory factor analysis (principal components analysis) was performed (n = 98) on the final 11-item measure. As seen in Table 3, three factors were presented. When evaluating the items within each factor, there was concordance, along with some differences, with the original conceptual framework. All items comprising the EWB and SWB were factored into those domains; however, the items within ‘Symptoms’ and ‘Physical Functioning’ were factored into a single component. It was determined that this was conceptually consistent and that symptom items (e.g. tiredness, energy) could be conceptualized as physical functioning items. This resulted in a change to the hypothesized conceptual framework by combining the ‘Symptoms’ and ‘Physical Functioning’ items into the single PHYS domain.

Table 3 GHD-CIM factor analysis

A post hoc confirmatory factor analysis was also performed on the GHD-CIM using IBM® SPSS® Amos™ [26, 27]. Adequate fit indices were seen, i.e. comparative fit index (0.984), goodness-of-fit index (0.984), root mean square residual (0.0486), and root mean square error of approximation (0.045) [28, 29]. Additionally, a higher order factor analysis was conducted on the three subscale scores to determine the ability to create an overall score of treatment burden. The subscales factored into a single component, with 61.3% of total variance explained.

3.1.6 Reliability

All Cronbach’s alpha coefficients exceeded the threshold of 0.70, indicating internally consistent scales (see Table 4). Test–retest reliability was assessed using the ICC in a subsample from the maintenance group who indicated experiencing no change on the Changes Since Last Assessment items. ICCs for the EWB, SWB, and overall scores were above 0.70; the ICC for the PHYS was lower than desired (0.66).

Table 4 Evidence for internal consistency of the GHD-CIM

3.1.7 Convergent Validity

For the eight convergent validity hypotheses, associations were significant for seven of eight comparisons (with PHYS being lower than expected). Six associations exceeded the threshold of 0.40 in magnitude, as expected. Significant correlations over 0.40 in magnitude were found for the overall score with QoLISSY total score (ρ = −0.78) and overall GHD interference rating (0.44); SWB score with QoLISSY social score (ρ = −0.79) and overall GHD social well-being rating (0.49); and EWB score with DISABKIDS emotional score (0.62) and overall GHD emotional well-being score (0.55).

3.1.8 Known-Groups Validity

The known-groups a priori validity hypotheses for SWB and EWB were significant (p < 0.05). Additionally, all domains and the total score were able to discriminate between the levels of coping (Fig. 1) and emotional well-being (Fig. 2). For the hypothesis on age at treatment initiation, trends (GHD impacts worsening as age at the start of treatment increased) were seen but were non-significant. The hypothesized association between improvements in height and better physical functioning was observed.

Fig. 1 Evidence for known-groups validity of the GHD-CIM based on the QoLISSY Coping domain. Significance for emotional well-being and social well-being was p < 0.05; physical functioning and overall were not significant. Assessed using the QoLISSY Coping domain. GHD-CIM Growth Hormone Deficiency-Child Impact Measure, QoLISSY Quality of Life in Short Stature Youth

Fig. 2 Evidence for known-groups validity of the GHD-CIM based on the QoLISSY Emotional domain. Significance for all GHD-CIM domains was p < 0.01. Assessed using the QoLISSY Emotional domain. GHD-CIM Growth Hormone Deficiency-Child Impact Measure, QoLISSY Quality of Life in Short Stature Youth

3.1.9 Sensitivity to Change

For the treatment-naïve participants who completed a follow-up assessment 12 weeks post-baseline, marked improvements were noted for the SWB, EWB, and overall scores (ranging between − 6.4 and − 8.6 points on a 0- to 100-point scale). The PHYS domain did not show an improvement over the 12 weeks. Associated effect sizes (mean change divided by the baseline SD) ranged from − 0.26 (PHYS) to − 0.45 (EWB). By the thresholds given in Sect. 2.2.9, these fall in the ‘small’ range, with EWB approaching the ‘moderate’ threshold, indicating that the GHD-CIM is sensitive to change.

3.1.10 Interpretation of Meaningful Change

Given the study design (observational with an understanding that changes over time may not be assessed), there was a small sample that indicated some improvement over the 12 weeks. Using the anchors of PGIS and CGIS, GHD-CIM scores were calculated for the groups who had 1- and 2-point improvements in the PGIS and CGIS scales. Table 5 shows that changes for the GHD-CIM total and domain scores, in all but one case (CGIS for SWB), were larger, as expected, for the two-category improvements than for the one-category improvements. Smaller amounts of change, overall, were seen for PHYS. Overall score differences ranged from 5.1 to 8.3 points; PHYS (1.6–7.6 points); EWB (9.5–12.3 points); and SWB (4.1–9.2 points). A preliminary estimate of the MID for the overall score is suggested to be 5 points. Similarly, based on these results, the preliminary estimate would be 5 for PHYS, 7 for EWB, and 5 for SWB.

Table 5 Meaningful change thresholds (within-person)

3.2 Theoretical Model

Based on the validation study findings, the preliminary theoretical model of the impact of GHD on children was revised and is shown in Fig. 3. The final 11-item GHD-CIM is shown in Fig. 4.

Fig. 3 GHD-CIM theoretical model. GHD-CIM Growth Hormone Deficiency-Child Impact Measure, GHD growth hormone deficiency

Fig. 4 GHD-CIM conceptual framework. GHD-CIM Growth Hormone Deficiency-Child Impact Measure

The final validated 11-item GHD-CIM is an ObsRO for parents of children aged 4 to < 13 years.

4 Discussion

Psychometric analysis for evaluating measurement properties is an iterative process that considers both conceptual relevance and psychometric properties. The analyses used to evaluate item performance of the GHD-CIM were in accordance with classical psychometric theory [30].

After examination of the preliminary data, we concluded that given the high ceiling effects for the PRO version, the ObsRO version was more appropriate as a valid and reliable measure of the impact of GHD on children aged 4 to < 13 years. We believe the ceiling effects were indicative of children with GHD having short stature their entire lives and possibly having reduced insight due to accommodation to their condition. Additionally, given that many clinical trials for GHD continue for more than 1 year until a child reaches puberty around age 13 years, a child entering the study at age <9 years, where an ObsRO would be required, will of course be older for future assessments. Using the same ObsRO version over the life of the study for all assessments will facilitate rater consistency over time.

Results showed that the GHD-CIM ObsRO version has acceptable measurement properties in terms of item-to-item correlations, item-to-total correlations, and test–retest reliability; the factor structure was confirmed for a three-domain measure (PHYS, SWB, and EWB), and an overall score was justified. Reliability was acceptable, with good internal consistency and adequate to good test–retest reliability. A priori criteria for convergent validity were met for seven of the eight hypotheses across the domains and the total score (with PHYS being lower than expected). Known-groups validity was confirmed for EWB and SWB, and trends were found for the PHYS and overall score. All domains and the overall score were also found to discriminate between levels of emotional functioning, with better emotional functioning evidenced for those with less GHD impact.

As with all studies there are limitations that should be noted. First, it was difficult to recruit parents/guardians of children who started treatment at an early age due to US real-world clinical practice patterns. Future studies in countries where treatment is generally started at an earlier age are warranted. Furthermore, this population appears to be on the healthier end of the spectrum, as indicated by the larger-than-expected floor/ceiling effects. With the recruitment of a more diverse population (including more severely symptomatic subjects), the person distribution within the Rasch analysis would be expanded. Additionally, although there was evidence of responsiveness of the GHD-CIM, this was a short study (12 weeks) and treatment benefit may take longer to manifest. In addition, further analyses with more robust sample sizes within the improvement categories would need to be conducted to establish meaningful change thresholds. In future studies that will be testing change, given the effect size we have seen for the total score (− 0.44), calculations would dictate a sample size of approximately 122 (α = 0.05, statistical power at 80%, using percentiles of the effect size distribution). Lastly, it would be helpful to conduct qualitative studies to assess patient perception of meaningful changes and where they occur on the 0–100 scale in studies where change in disease condition is evaluated.

The GHD-CIM is the first GHD-specific measure of the impact of this disease. A disease-specific measure in pediatric GHD can be another useful clinical tool for monitoring patients to assess the impact of treatment, as well as to facilitate healthcare provider–patient communication. The measure is brief and covers three broad domains in areas that are typically not extensively covered in routine clinical practice. From a practical perspective, the measure could be completed by the parent/guardian after the patient has been checked in and is waiting for the provider. A simple score could be calculated and incorporated into the patient’s medical record, and then, after a period, the measure could be repeated. That would provide the clinician with two key data points on GH therapy: (1) annualized height velocity as a primary endpoint, and (2) quality of life (QOL) data; both are important in assessing a patient’s response to therapy. There are also research implications. As new long-acting GH therapies are currently in clinical trials, a QOL measure would serve as additional clinical data. The GHD-CIM is also intended to be used in research to assess the impact of new therapies and better understand the burden of disease.

5 Conclusion

The GHD-CIM ObsRO is a well-validated, adjunctive tool to assess disease-specific functioning that is currently not being adequately evaluated in research or clinical practice, providing a more complete patient-centric picture of the GH therapy experience.