Responsiveness is defined as an instrument’s ability to accurately detect change that has occurred [1, 2]. Internal responsiveness is defined as an instrument’s ability to change during a prespecified time frame. External responsiveness is the extent to which a measure’s degree of change corresponds to an external reference value or measure (assesses an instrument’s ability to reflect both change and no change in the external standard) [3, 4].

This study examined the responsiveness of two patient-reported outcome (PRO) measures, the Treatment Related Impact Measure-Diabetes (TRIM-D) and Treatment Related Impact Measure-Diabetes Device (TRIM-DD), which were developed as disease-specific PRO measures to assess the impact of diabetes treatment for both type 1 and type 2 diabetes and across the spectrum of pharmacological treatments and delivery methods [5]. The TRIM-D is a 28-item measure with 5 domains assessing Treatment Burden, Daily Life, Diabetes Management, Compliance, and Psychological Health. The TRIM-DD is an 8-item measure with 2 domains assessing Device Bother and Device Function. Both measures can be scored independently for each domain or as a total score. Higher scores indicate a better health state. The item generation and preliminary validation were conducted following FDA guidelines for PRO measure development [1]. Initial validation data for the TRIM-D and TRIM-DD were collected via an online, cross-sectional survey of 507 US patients. The cross-sectional validation showed that both measures have acceptable psychometric properties [5].

The purpose of the current study was to continue the validation process by examining the measures’ responsiveness and to confirm the measurement model under randomized controlled trial (RCT) conditions.


The data used to assess responsiveness came from a multi-center, randomized, open-label, 2 × 12 week period cross-over study of two prefilled pens in subjects with type 1 or 2 diabetes. All subjects were using insulin by vial/syringe before inclusion in the study and were pen-naïve. Data for these analyses came from all patients who had completed the TRIM-D and TRIM-DD at randomization (baseline) and at the time of cross-over (week 12). Non-superiority for glucose control between groups was hypothesized. The study was approved by Sterling IRB (approval #2925), and all persons gave informed consent.


Analyses were conducted according to an a priori statistical analysis plan. All statistical tests were two-tailed and conducted with an alpha level of 0.05 as minimal threshold for significance. As the TRIM-D and TRIM-DD are intended to be used as either a total score or as independent domains, change scores were examined for both the totals and domain scores.

Responsiveness analyses

To examine internal responsiveness, t tests were used to examine differences in TRIM scores between baseline and week 12 (time of cross-over), with the expectation that significant improvement over time would be shown. Effect size (ES), measured by Cohen’s d, was calculated as the mean change in score divided by the standard deviation of the baseline TRIM scores. ES was categorized: small, 0.2–0.3; medium, 0.4–0.7; and large, 0.8 or above [6].
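As an illustration only, the internal responsiveness calculations described above can be sketched in a few lines of Python. This is not the study's analysis code: the function names are hypothetical, the scores are made-up values, and rounding the ES to one decimal before applying the bands is an assumption made here because the published bands leave gaps (for example, between 0.3 and 0.4). The sketch returns only the paired t statistic; a P value would additionally require the t distribution (for example, via `scipy.stats.ttest_rel`).

```python
import math
import statistics


def paired_t_statistic(baseline, followup):
    """Return (t, degrees of freedom) for a paired t test on follow-up minus baseline."""
    diffs = [f - b for b, f in zip(baseline, followup)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def effect_size(baseline, followup):
    """Mean change in score divided by the SD of the baseline scores."""
    mean_change = statistics.mean(followup) - statistics.mean(baseline)
    return mean_change / statistics.stdev(baseline)


def categorize_es(es):
    """Apply the bands from the text; rounding to one decimal is an assumption."""
    magnitude = round(abs(es), 1)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.4:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "below small"


# Made-up domain scores for six hypothetical subjects (not study data).
baseline_scores = [50, 60, 55, 70, 65, 45]
week12_scores = [62, 66, 70, 75, 80, 50]

t_value, df = paired_t_statistic(baseline_scores, week12_scores)
es = effect_size(baseline_scores, week12_scores)
print(f"t({df}) = {t_value:.2f}, ES = {es:.2f} ({categorize_es(es)})")
```

Dividing by the baseline standard deviation, rather than the standard deviation of the change scores, expresses the change in units of between-subject variability at study entry.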

External responsiveness was examined by testing the hypothesis that there would be a linear relationship between the TRIMs and treatment satisfaction (TS) as assessed by the insulin treatment satisfaction questionnaire (ITSQ) [7]. The ITSQ, a disease-specific PRO assessing insulin TS, has been shown to be reliable and valid [7, 8]. Pearson correlation coefficients between the change in ITSQ overall summary score (from baseline to week 12) and the change in each item and domain of the TRIM-D and TRIM-DD were examined.
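The external responsiveness analysis amounts to correlating two sets of change scores. A minimal sketch follows, with hypothetical function names and made-up scores; it is not the study's analysis code and omits the significance test.

```python
import math


def change_scores(baseline, followup):
    """Week 12 score minus baseline score for each subject."""
    return [f - b for b, f in zip(baseline, followup)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Made-up scores for five hypothetical subjects (not study data).
itsq_change = change_scores([40, 50, 45, 60, 55], [50, 55, 60, 62, 70])
trim_change = change_scores([50, 60, 55, 70, 65], [62, 66, 70, 75, 80])
print(f"r = {pearson_r(itsq_change, trim_change):.2f}")
```

Correlating change with change, rather than the scores at a single time point, is what distinguishes this analysis from a cross-sectional test of convergent validity.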

Confirmatory analyses of measurement model

A confirmatory factor analysis (CFA) was conducted using the Bentler comparative fit index (CFI) and root mean square error of approximation (RMSEA) to determine the goodness of fit between the models previously identified [5] and the current sample data. The criteria used to indicate acceptable fit were a CFI of at least 0.90 [9] and an RMSEA of 0.06 or less [9].
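For readers unfamiliar with these indices, the sketch below shows how CFI and RMSEA are commonly computed from chi-square statistics and how the two cut-offs above would be applied. The formulas are the standard textbook definitions, not necessarily those of the software used in this study (some packages use N rather than N − 1 in the RMSEA denominator); the function names and all numeric inputs are hypothetical.

```python
import math


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from model and baseline (null) model chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, 0.0)
    denominator = max(d_model, d_baseline)
    if denominator == 0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - d_model / denominator


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation, using the N - 1 convention."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def acceptable_fit(cfi_value, rmsea_value):
    """Apply the criteria stated in the text: CFI >= 0.90 and RMSEA <= 0.06."""
    return cfi_value >= 0.90 and rmsea_value <= 0.06


# Hypothetical chi-square values; only the sample size (242) comes from the text.
cfi_value = cfi(chi2_model=120.0, df_model=80, chi2_baseline=900.0, df_baseline=105)
rmsea_value = rmsea(chi2_model=120.0, df_model=80, n=242)
print(f"CFI = {cfi_value:.3f}, RMSEA = {rmsea_value:.3f}, "
      f"acceptable = {acceptable_fit(cfi_value, rmsea_value)}")
```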

Internal consistency reliability was examined and compared with the original sample with Cronbach’s alpha, a statistic calculated from the pairwise correlations between items. Alphas range between zero and one, with coefficients of greater than 0.70 indicating acceptable reliability [10].
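A minimal sketch of the alpha calculation is given below, with hypothetical function names and made-up item responses. Two common forms are shown because the text does not state which was used: the standardized form, based on the mean pairwise inter-item correlation (closest to the description above), and the more widely reported variance-based form.

```python
import math
import statistics


def _pearson_r(x, y):
    """Pearson correlation between two equal-length lists of item scores."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def standardized_alpha(items):
    """Alpha from the mean pairwise inter-item correlation.

    `items` is a list of item columns, each a list of respondent scores.
    """
    k = len(items)
    correlations = [
        _pearson_r(items[i], items[j]) for i in range(k) for j in range(i + 1, k)
    ]
    mean_r = statistics.mean(correlations)
    return k * mean_r / (1 + (k - 1) * mean_r)


def cronbach_alpha(items):
    """Variance-based alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(statistics.variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))


# Made-up responses from five hypothetical subjects to three items (not study data).
example_items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
print(f"standardized alpha = {standardized_alpha(example_items):.2f}")
print(f"variance-based alpha = {cronbach_alpha(example_items):.2f}")
```

The two forms coincide when all items have equal variances and can diverge otherwise, so the form should be stated when alphas are compared across samples.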


In the cross-over study, 242 subjects completed the TRIM-D and TRIM-DD at baseline and week 12 (Table 1).

Table 1 Sample description

Responsiveness analyses

Internal responsiveness

All TRIM-D and TRIM-DD domains and overall total scores and most individual items (TRIM-D: 23/28; TRIM-DD: 6/8) changed significantly after 12 weeks of randomized treatment. For the Treatment Burden, Diabetes Management, Daily Life, and total TRIM-D, these significant change scores were associated with large to moderate ES. For the Psychological Health and Compliance domains, the significant change scores were associated with a small ES. Score changes ranged from 18.6 (ES 0.84, TRIM-D Treatment Burden) to 3.1 (ES 0.17, TRIM-D Psychological Health). For the TRIM-DD domains and total score, large changes (9.4–10.1) along with moderate ES (0.43–0.56) were seen (Table 2).

Table 2 Responsiveness of the TRIM-Diabetes and TRIM-Diabetes Device items and domains

External responsiveness

Strong associations were found between the change in the ITSQ overall summary score and the changes in the TRIM-D Total score (r = 0.72, P < 0.001) and the TRIM-DD Total score (r = 0.68, P < 0.001). Moderate to strong correlations were noted between the change in the ITSQ overall summary score and the changes in items from the following domains: Treatment Burden (r ranging between 0.32 and 0.53), Daily Life (0.37–0.45), Diabetes Management (0.22–0.38), Psychological Health (0.35–0.51), Device Function (0.30–0.51), and Device Bother (0.40–0.57). Lower associations were noted between the ITSQ score and the Compliance domain (0.14–0.25).

Confirmatory measurement model analyses

Fit statistics

The model fit statistics for the TRIM-D and TRIM-DD Total domains were confirmed and are presented in Table 3.

Table 3 TRIM-Diabetes and TRIM-Diabetes Device measurement model properties

Internal consistency

All alphas for the TRIM-D and TRIM-DD (overall score and all domains) were above 0.70 indicating acceptable internal consistency. Additionally, the confirmatory RCT sample alphas were similar to the development coefficients (within 0.1).


These analyses found that the TRIM total scores as well as all domain scores were significantly responsive over time and were able to reflect differing levels of change in an external criterion. Thus, internal and external responsiveness for the TRIM-D and TRIM-DD have been confirmed in an RCT sample. The measurement model was confirmed for all domains, although fit statistics were lower than expected for the Daily Life and Diabetes Management domains. Given that these domains were shown to have a strong factor structure in the development of the measures [3], this finding may be specific to this trial design or sample. Further testing of the TRIM-D domain structure in other trials is warranted to confirm these findings.

The total score and all domain scores of the TRIMs were significantly responsive over time, with the Treatment Burden domain showing the greatest responsiveness and the Psychological Health domain the least. Additionally, the greatest number of individual items that were not responsive over time came from the Psychological Health domain. These findings should be interpreted in light of the study’s design. Given that all patients received the same insulin treatment, it is understandable that the psychological component of treatment, which is often driven by treatment efficacy, would be the least responsive. However, the Psychological Health domain as a whole still changed significantly, which suggests that the insulin pen delivery system does contribute positively to the psychological impact of treatment.

As expected, given that the study was a device cross-over with non-superiority for drug effect, the Treatment Burden domain, the domain which should be most impacted by delivery mode, was the most responsive domain. These findings underscore the importance of understanding the independent contribution of domains, given the specific study design and hypotheses, in order to optimally identify, a priori, domains of a measure which will be responsive to change. As the TRIMs were developed and validated for stand-alone use of each domain as well as the total score, future use of the TRIMs can and should take independent domain responsiveness into consideration when making these a priori hypotheses.

Certain study limitations should be considered in interpreting results. To assess external responsiveness, the ITSQ, a PRO measure rather than a clinical measure, was used as the reference value. It was not possible to use a clinical reference value due to two factors. First, HbA1c ≤9% was a study eligibility criterion and the majority of patients entered the study in good or adequate HbA1c control (61%, <7.5%). Thus, only a limited number of patients could change from inadequate to adequate glucose control. In fact, in this sample, only 11 patients (4.8%) changed from poor control at randomization (>7.0%) to adequate control (<7.0%) over the 12-week period. Second, the study was designed as a non-inferiority trial to examine differences in insulin delivery mode rather than drug treatment efficacy, and all patients received the same insulin treatment during the study. Thus, no differences in glucose control were expected or found. As a result of these design features, the sample of patients who had a significant improvement or worsening of HbA1c was too small to conduct responsiveness analyses using a clinical reference value. Further, the fact that a majority of these patients were in good control at study start may limit the generalizability of findings.

Validation is an iterative process. This study continues that process for the TRIM-Diabetes and TRIM-Diabetes Device measures. To date, all evidence supports the use of these measures in future clinical trials.