Key Points for Decision Makers

For survey scores to be interpretable, the minimal important difference (MID) must be quantified. The MID is the smallest change in a patient's score that indicates a real change between two time points.

The MID allows both researchers and clinicians to judge whether treatment plans are effective.

The MID of the Treatment-Related Impact Measure–Adult Growth Hormone Deficiency (TRIM-AGHD) has been determined to be an improvement of 10 points in the total score.

1 Background

Adult growth hormone deficiency (AGHD) is a rare, debilitating disease; the incidence of genuine adult-onset growth hormone deficiency (GHD) is estimated at 10 per million [1]. More than 50,000 adults in the United States have GHD, and approximately 6000 new cases are reported each year (a figure that includes children with GHD who are transitioning to adulthood) [2].

AGHD is associated with reduced muscle mass and muscle strength, reduced bone mass or osteoporosis, and increased body fat [3, 4]. AGHD is also associated with increased cardiovascular morbidity and mortality [5]. Findings on the role of height as a predictor of health-related quality of life (HRQoL) in AGHD are inconclusive [6, 7], suggesting that other factors may be more central to understanding HRQoL in this population. AGHD is further associated with impaired concentration, memory loss, dissatisfaction with body image, and decreased quality of life (QoL) [8]. Important areas of impact include energy or vitality, mood, social isolation, and self-control [9, 10], and adults with AGHD may also experience psychological impairments such as depression, anxiety, and social isolation [11,12,13,14]. The negative impacts of AGHD on physiology, psychological well-being, cognitive functioning, and QoL are well documented; however, studies suggest that growth hormone (GH) treatment can mitigate these poorer health outcomes [15, 16]. For example, a recent study of the long-term effects of GH treatment in adults with GHD found that QoL improved progressively and then stabilized up to the 6-year follow-up [17].

The Treatment-Related Impact Measure–Adult Growth Hormone Deficiency (TRIM-AGHD), a patient-reported outcome (PRO) measure of the impact of GHD on adult functioning and wellbeing, was previously developed according to the US Food and Drug Administration (FDA) guidelines for the development of PRO measures for use in clinical trials and labelling [18] and was found to be conceptually sound and psychometrically reliable and valid [19]. Development began with qualitative interviews to elicit concepts, followed by quantitative analyses for item reduction and for defining the measurement model. The TRIM-AGHD is a 27-item measure with four domains (Energy, Psychological, Cognitive, and Physical). Each domain is primarily scored separately on a 0–100 scale, with lower scores indicating a better health state. As supported by the results of a higher-order factor analysis, a total score (range 0–100) can also be reported [19].

As an additional step in the validation of the TRIM-AGHD, this study assessed the measure's sensitivity to change (responsiveness) and its minimal important difference (MID). Assessing these psychometric properties is important scientific practice for a newly developed PRO because it establishes that the measure is meaningful and interpretable [20, 21]. Without this information, it would be difficult for clinicians and researchers to judge whether treatment plans are effective and, if so, whether that effectiveness is meaningful to patients. The MID can be estimated using both anchor-based and distribution-based approaches [22]. The FDA recommends that distribution-based methods for determining the clinical significance of score changes be considered supportive and not the sole basis for a responder definition [18]. Anchor-based methods assess responsiveness in relation to an independent measure (e.g., an external rating) to quantify the meaning of a particular degree of change in the health construct [23,24,25,26]. Distribution-based methods rely on the distribution of scores within a population and define a clinically significant change as one at least as large as a statistical parameter of the group data, such as variability (e.g., standard deviation) or reliability (e.g., Cronbach’s α). When several of these approaches are used to determine the MID of a scale, a range of values rather than a single point estimate is expected [25]. However, a narrower range, or even a single point estimate, would be more useful than the broad range produced by multiple estimates. A triangulation approach that combines anchor-based and distribution-based estimates (both half a standard deviation and the standard error of measurement [SEm]) can be used to converge on a reasonable MID [27].

The purpose of this study was to calculate an MID for the TRIM-AGHD using a triangulation method and to examine the measure's responsiveness. The primary hypothesis was that treatment-related impact, as assessed by the TRIM-AGHD, would improve as patients' evaluations of their condition improved, for example, on the Patient Global Rating of Change (PGRC), one of the measures used in this study.

2 Methods

This was a prospective, non-interventional, observational, clinic-based survey study of GHD patients who were starting a new treatment for their GHD at the time of enrolment. Enrolments and study assessments were conducted between March 2014 and December 2015. Eligible patients were recruited by physicians at four study sites (one academic and three private-practice settings) in the US (Los Angeles, CA; Oklahoma City, OK; Salt Lake City, UT; and Dearborn, MI) from their current caseload or identified by chart review. Patients were invited to participate by designated site personnel who had completed telephone-based training on the study protocol and recruitment procedures. The decision to initiate treatment, and the choice of treatment, were made by the physician as per usual care and independently of the patient’s decision to participate in the study. So as not to influence treatment choice, patients were asked to participate in the study only after the patient and physician had agreed on the decision to treat GHD and on the specific treatment.

Patients were included in the study if they were male or female and aged 23 to 79 years; were able to speak, read, and write English; had either adult or childhood onset of GHD and a confirmed GHD diagnosis; were GHD treatment naïve (not currently on a prescription treatment for GHD and not on one for at least the previous 6 months); were beginning a new prescription GHD treatment that they expected to continue for a minimum of 6 months; and completed the informed consent before any study-related activities. Patients were excluded if they had been on a prescription medication for treatment of GHD in the past 6 months; had a Beck Depression Inventory II (BDI-II) score > 25 at enrolment; were female and pregnant, intending to become pregnant, breastfeeding, or not using adequate contraceptive methods; had an acute severe illness associated with weight loss (defined as a loss of more than 5.0% of total body weight) in the last 6 months; had active Cushing’s syndrome within the last 24 months; had overt diabetes mellitus; had a mental incapacity, unwillingness, or language barrier precluding adequate understanding or cooperation; or had previously participated in this study.

At the in-person baseline visit, all eligible and interested patients signed the informed consent and were then enrolled after completing the screening process. During screening, each patient's physician confirmed the GHD diagnosis and all eligibility criteria, including the BDI-II score. Enrolled patients completed the baseline questionnaire battery, which included the TRIM-AGHD, a one-item Patient Global Impression (PGI) of severity, and a brief demographic form. The clinician completed a Clinician Global Impression (CGI) of severity and a brief medical information sheet, which recorded physical measurements (e.g., blinded waist circumference) and details of the prescribed GHD medication.

At the follow-up visit, patients completed the follow-up questionnaire, which included the TRIM-AGHD, the PGI, and the PGRC. The PGRC has 15 response options ranging from ‘a very great deal better’ to ‘a very great deal worse’, with ‘no change’ in the middle. In this analysis, minimal improvement was defined as a response of ‘Almost the same/hardly better at all’, ‘A little better’, or ‘Somewhat better’ [28]. The clinician completed the CGI and measured waist circumference (blinded).

All patients were expected to complete a follow-up visit at week 8 to assess the MID. After the first six patients had been assessed at week 8, the schedule of assessments was modified because two of them reported more than minimal improvement on the anchor questionnaire (PGRC). Subsequently, assessments took place between approximately weeks 4 and 8 to detect the earliest minimal improvement.

Because seven of the first 62 patients assessed reported more than minimal improvement on the PGRC at week 4 ± 1 day, the assessment strategy was altered again to better capture the earliest time point of minimal improvement in the last 34 patients. These patients were monitored weekly by telephone starting 1 week after the baseline visit. During these calls, which were conducted by study-site personnel, all patients completed the PGRC. If patients reported minimal improvement in their disease status between weeks 2 and 7, they were brought back to the clinic for the follow-up visit as soon as possible, within 5 business days, for further evaluation with the PGI, CGI, and TRIM-AGHD. For this subset of participants, if no minimal improvement was reported by week 7, the follow-up assessment was completed at week 8 ± 1 week.
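The telephone-monitoring rule used for the last 34 patients can be expressed as a small decision function. This is a minimal sketch, not the study's procedure manual: the function name `follow_up_plan`, the dictionary input, and the exact response strings are hypothetical; only the three minimal-improvement categories trigger an early visit; and a report at week 1, which falls outside the stated 2 to 7 week window, is simply ignored here.

```python
# Hypothetical labels for the PGRC categories counted as minimal improvement.
MINIMAL_IMPROVEMENT = {
    "Almost the same/hardly better at all",
    "A little better",
    "Somewhat better",
}


def follow_up_plan(weekly_pgrc: dict[int, str]) -> str:
    """Return the follow-up rule for a telephone-monitored patient.

    weekly_pgrc maps the week of each monitoring call (1, 2, ...) to the
    PGRC response given on that call.
    """
    for week in sorted(weekly_pgrc):
        # Only minimal improvement reported in weeks 2-7 triggers an early visit.
        if 2 <= week <= 7 and weekly_pgrc[week] in MINIMAL_IMPROVEMENT:
            return f"clinic visit within 5 business days of week {week} call"
    # No qualifying report by week 7: default follow-up visit.
    return "clinic visit at week 8 ± 1 week"
```

For example, a patient who reports ‘No change’ at weeks 1 and 2 and ‘A little better’ at week 3 would be scheduled for a clinic visit within 5 business days of the week 3 call.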

Patients were treated as per usual care with no intervention by the study. This study was conducted in accordance with the Declaration of Helsinki and the Guidelines for Good Pharmacoepidemiology Practices [29, 30] and was approved by Copernicus Group IRB (approval TBG1-13-475). In addition, the COSMIN checklist (COnsensus-based Standards for the selection of health Measurement INstruments) [31, 32] was reviewed for design requirements in the assessment of responsiveness.

2.1 Statistical Methods

Sensitivity to change, the ability of an instrument to detect small but important changes, was evaluated using the effect size (ES) [33], calculated as the mean change from baseline to endpoint divided by the standard deviation (SD) at baseline. Higher ES values indicate greater sensitivity to change. Cohen [34] provides guidance on interpreting ES magnitude: an ES of 0.20 is considered a small change, 0.50 a moderate change, and 0.80 a large change.
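The ES calculation and Cohen's benchmarks can be sketched in a few lines of Python. This is an illustrative sketch under two assumptions not stated in the text: change is coded as baseline minus follow-up so that an improvement on the TRIM-AGHD (where lower scores are better) yields a positive ES, and the baseline SD is the sample SD (n − 1 denominator, as computed by `statistics.stdev`). The label for values below 0.20 is likewise our own.

```python
import statistics


def effect_size(baseline: list[float], follow_up: list[float]) -> float:
    """ES = mean baseline-to-endpoint change / SD at baseline.

    Change is coded as baseline minus follow-up, so improvement on the
    TRIM-AGHD (lower scores are better) gives a positive ES.
    """
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return statistics.fmean(changes) / statistics.stdev(baseline)


def cohen_label(es: float) -> str:
    """Classify |ES| using Cohen's 0.20 / 0.50 / 0.80 benchmarks."""
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "below small"
```

Applied to the total-score ES reported in the Results (1.38), `cohen_label` returns "large".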

The MID of the TRIM-AGHD was assessed using both anchor-based and distribution-based techniques [22, 23, 25]. Distribution-based methods included (i) 0.5 SD of the change between assessments [35] and (ii) the SEm, which is the observed SD multiplied by the square root of 1 minus the reliability (where reliability is represented by Cronbach’s alpha coefficient) [36, 37]. For the anchor-based calculation, the MID was assessed using PGRC reports, which are intended to capture patients' perceptions of change due to treatment. Minimally important changes were calculated between the baseline assessment and the follow-up assessment at the time each patient reported an improvement in their disease status (up to 8 weeks). Change was inherent in the PGRC item, with patients indicating whether their GHD condition had stayed about the same, improved, or worsened. The anchor-based MID was calculated as the difference in change scores between the improved group (‘Almost the same/hardly better at all’, ‘A little better’, and ‘Somewhat better’) and the group who stayed the same. Because none of these methods is more psychometrically robust than the others, and to arrive at a single estimate, we averaged the MIDs obtained with the different methods and rounded the result to the nearest integer. By triangulating in this way, the approach takes into account both what would be considered clinically meaningful and what patients perceive as beneficial.
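The three MID estimates and the triangulation step described above can be sketched in Python. This is a minimal illustration, not the study's analysis code: the function names and response labels are hypothetical; change scores are assumed to be coded so that improvement is positive (baseline minus follow-up, since lower TRIM-AGHD scores are better); and the SD passed to the SEm is left to the caller because the text does not specify which observed SD was used. Rounding is done half-up because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math
import statistics

# Hypothetical labels: the study's minimal-improvement categories and the
# reference ("stayed the same") category.
MINIMAL_IMPROVEMENT = {
    "Almost the same/hardly better at all",
    "A little better",
    "Somewhat better",
}
NO_CHANGE = "No change"


def half_sd_mid(changes: list[float]) -> float:
    """Distribution-based MID: half the SD of the change scores."""
    return 0.5 * statistics.stdev(changes)


def sem_mid(observed_sd: float, alpha: float) -> float:
    """Distribution-based MID: SEm = SD * sqrt(1 - Cronbach's alpha)."""
    return observed_sd * math.sqrt(1.0 - alpha)


def anchor_mid(changes: list[float], pgrc: list[str]) -> float:
    """Anchor-based MID: mean change in the minimally improved group minus
    mean change in the no-change group (worsened patients are ignored)."""
    improved = [c for c, r in zip(changes, pgrc) if r in MINIMAL_IMPROVEMENT]
    same = [c for c, r in zip(changes, pgrc) if r == NO_CHANGE]
    return statistics.fmean(improved) - statistics.fmean(same)


def triangulated_mid(*estimates: float) -> tuple[float, int]:
    """Average the individual MID estimates and round half up to an integer."""
    mean = statistics.fmean(estimates)
    return mean, math.floor(mean + 0.5)
```

For instance, `triangulated_mid(2.66, 8.09, 20.43)` averages the three total-score estimates reported in the Results and rounds the mean (about 10.39) to 10.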

3 Results

3.1 Sample Characteristics

Ninety-eight patients were confirmed eligible and enrolled in the study; two were withdrawn (one was lost to follow-up and one discontinued medication). Data from the 96 patients with post-baseline values formed the full analysis set used for the MID determination. These patients completed questionnaires, including the TRIM-AGHD, at baseline and at follow-up between 4 and approximately 8 weeks, and 247 monitoring telephone calls were completed between the baseline and follow-up visits. The mean time between baseline (treatment initiation) and the follow-up visit was 6.57 weeks. As seen in Table 1, mean age was 49.7 years (range 29–68), 65.6% of patients were female, and 76.0% were Caucasian. Global impression of severity at baseline was rated as ‘Very severe’ by 85.4% of patients and 89.6% of their clinicians. The most common cause of GHD was idiopathic (46.9%).

Table 1 Demographic and clinical characteristics

3.2 Responsiveness and MID

At follow-up, the TRIM-AGHD was highly responsive to treatment (ES > 0.80), with a total score effect size of 1.38 (subscales ranged from 1.22 to 1.36; see Table 2). For the distribution-based MID calculations, 0.5 SD and SEm were examined. As shown in Table 2, the 0.5 SD for the TRIM-AGHD total score was 8.09 (subscales ranged from 8.44 to 9.18), and the SEm for the total score was 2.66 (subscales ranged from 3.55 to 4.57). The anchor-based estimates derived from the PGRC were larger. Most patients reported getting ‘Better’ (59.4%) or staying ‘About the same’ (34.4%); only six patients (6.3%) reported a worsening of their GHD. As shown in Table 3, the difference in TRIM-AGHD total score change between the ‘Better’ group and the ‘About the same’ group was 20.43 (subscales ranged from 19.63 to 21.80).

Table 2 TRIM-AGHD: sensitivity to change
Table 3 TRIM-AGHD change scores by Patient Global Rating of Change (PGRC)

3.3 MID Estimate

Averaging the MIDs from all three approaches (SEm, 0.5 SD, and the anchor-based PGRC estimate) gave converged MID values of 11.45 for Energy, 10.65 for Psychological, 11.28 for Cognitive, and 11.11 for Physical, and 10.40 for the total score. Rounded to the nearest integer, these values suggest an MID of 11 points for each subscale (Energy, Psychological, Cognitive, and Physical) and an MID of 10 points for the total score. These values are believed to be clinically meaningful and would be perceived as beneficial from the patient’s viewpoint.
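For the total score, the averaging step can be written out using the three estimates reported in Section 3.2, where $\Delta_{\mathrm{PGRC}}$ denotes the anchor-based difference between the ‘Better’ and ‘About the same’ groups. Averaging the published (rounded) values gives 10.39 rather than the reported 10.40; the small discrepancy presumably reflects rounding of the inputs, and both values round to 10.

```latex
\mathrm{MID}_{\mathrm{total}}
  = \frac{\mathrm{SEm} + 0.5\,\mathrm{SD} + \Delta_{\mathrm{PGRC}}}{3}
  = \frac{2.66 + 8.09 + 20.43}{3}
  = \frac{31.18}{3}
  \approx 10.39 \;\rightarrow\; 10
```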

3.4 Waist Circumference

TRIM-AGHD scores were also evaluated in relation to changes in waist circumference, comparing patients whose waist circumference increased (by 0.50 to 3.25 inches), decreased (by 0.40 to 10.00 inches), or remained the same. As seen in Fig. 1, TRIM-AGHD scores were sensitive to changes in waist circumference. TRIM-AGHD scores improved in all groups, but patients whose waist circumference decreased had significantly greater improvements (p < 0.01) than patients whose waist circumference increased.

Fig. 1 TRIM-AGHD change by waist circumference. TRIM-AGHD Treatment-Related Impact Measure–Adult Growth Hormone Deficiency

4 Discussion

As the number of clinical trials in AGHD increases, determining the MID of the instruments used to measure treatment response in GHD is important for conducting and interpreting meaningful future clinical trials. It is also important for clinicians who treat these patients to assess treatment benefit over the course of treatment so that targeted treatment strategies can be implemented. For adults with GHD, improvement in height is not an appropriate endpoint for assessing treatment benefit. Incorporating the TRIM-AGHD in routine clinical visits can help both patients and their healthcare providers better understand the effects of treatment and accurately assess change over time. Little work on MID estimation has been done in the context of AGHD, and this study represents a step forward in understanding how to interpret the magnitude of change with a given therapy in both clinical and research settings.

We focused on anchor-based and distribution-based approaches to defining the MID. One limitation is that different anchors or anchor types may produce different MID values [38]; similarly, different distribution-based statistics may produce different MIDs. MID values may also differ depending on whether anchor data are collected prospectively or retrospectively [39], and an MID determined by anchor-based methods may fall within the instrument’s random variation [23]. Distribution-based methods can define only a minimum value below which a change in score may be due to measurement error [40]; they provide no information on clinical importance.

This study further evaluated the TRIM-AGHD with respect to its sensitivity to changes experienced by patients. The TRIM-AGHD was highly responsive in GHD patients starting GH treatment, as evidenced by effect sizes exceeding 1.00 in 93.8% of patients. Distribution-based and anchor-based approaches were used together to converge on the MID. Notably, the anchor-based estimate using patient-reported perceived change was 20.43, whereas the distribution-based values were smaller (2.66 for SEm and 8.09 for 0.5 SD), suggesting that even with a small treatment effect, patients report large improvements in functioning and wellbeing. Regarding the relationship between the SEm and the MID, some studies show excellent agreement between the SEm and minimal differences, whereas others show weaker agreement [41]. Given the variation between distribution-based and anchor-based estimates, our goal was to triangulate the methods, as no one method is more robust than the others. In this way, each method contributes to the MID estimate until further studies are available.

When these results were evaluated against the COSMIN ‘responsiveness’ checklist [32], the study methodology was rated as good or excellent on each criterion. However, as with all studies, this one has limitations. The study was US-based and the data came predominantly from one site; findings in other countries, especially those with differing cultural beliefs, may differ. Although the sample was small, we believe it was adequate for the psychometric tests used to evaluate the MID, because the majority (59%) of the sample reported the minimal level of improvement needed to calculate an MID with each of the methods used. Additionally, GHD etiology was unknown for almost half of the sample, a greater percentage than would be expected. This is most likely because etiology was self-reported by patients, who may not have been aware of it. However, because patients were recruited by physicians from their own practices and a confirmed diagnosis of GHD was an inclusion criterion, we do not believe that the unknown etiology reflects an unclear diagnosis.

Another limitation concerns recent treatments for AGHD and how quickly they act. In this study, patients reported treatment benefit soon after starting treatment. In fact, one-third (33.3%) of patients already reported more than a minimal amount of improvement at their first report of improvement, as early as week 4. Anchor-based approaches are not optimal in such situations, because minimal changes are often not captured in time and more subtle changes are missed. To address this, we used the first three of the seven possible levels of reported improvement. This may explain why the PGRC-based estimates are larger than the distribution-based estimates.

Additionally, concomitant treatment was not investigated in this study; we believe this would be worth investigating in future studies. Understanding the relationship between patient- and physician-reported MIDs would also be of interest.

5 Conclusions

The suggested MID for the TRIM-AGHD based on this study is an improvement of 10 points in the total score. Improvements that meet or exceed this threshold should be considered clinically relevant and important. Having patients complete the TRIM-AGHD in research or clinical settings can therefore provide a valuable tool for assessing patient-reported treatment benefit. Given the measure’s high responsiveness to treatment, applying an MID of 10 points when interpreting change in total scores can help researchers better assess the full range of differences when comparing treatments and help clinicians better assess whether treatment is effective for a given patient.
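Applying the suggested threshold to an individual patient's scores can be sketched as a simple responder check. This is an illustrative sketch only: the function name `is_responder` and its default are hypothetical, and it relies on the stated scoring convention that lower TRIM-AGHD scores indicate a better health state, so an improvement is a decrease in score. The 11-point domain MID from the Results can be passed through the `mid` parameter.

```python
# Suggested MID for the TRIM-AGHD total score, in points (from this study).
TOTAL_SCORE_MID = 10.0


def is_responder(baseline_total: float, follow_up_total: float,
                 mid: float = TOTAL_SCORE_MID) -> bool:
    """Return True if the score improved by at least `mid` points.

    Lower TRIM-AGHD scores indicate a better health state, so the
    improvement is baseline minus follow-up.
    """
    return baseline_total - follow_up_total >= mid
```

For example, a total score that falls from 55 to 45 points meets the 10-point threshold, whereas a fall from 55 to 46 points does not.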