Estimating meaningful change for The Impact of Weight on Self-Perception (IW-SP) questionnaire among people with type 2 diabetes

Purpose The Impact of Weight on Self-perception Questionnaire (IW-SP) is a three-item patient-reported outcome measure (PROM) instrument assessing the impact of body weight on self-perception. To date no published threshold for meaningful change exists. The objective of this study was to estimate the minimal important change (MIC) for the IW-SP among people with type 2 diabetes. Methods Responder analyses were conducted using anchor- and distribution-based approaches with existing clinical trial data (SURPASS-2). As SURPASS-2 did not include a priori anchors, a set of alternative exploratory anchors were identified based on the MICs and items from two conceptually related measures used in the trial as well as percent change in body weight. Exploratory anchors with change estimates that were sufficiently related to change in IW-SP (r ≥ 0.30) and were not redundant with other anchors were retained for the MIC analyses. The analyses were conducted in two stages (estimation = 2/3 of sample) to derive initial IW-SP MIC estimates, and a subsequent confirmation stage (remaining 1/3 of sample). Results While the most conceptually related anchors and items performed best in responsiveness analyses, all anchors resulted in a similar estimate of minimal meaningful change for the IW-SP total score: a 1-point change in raw units (1–5-point scale), corresponding to a 25-point change for transformed scores (0–100 scale). Distribution-based analyses supported these MIC estimates. Results were similar across both stages for all analyses. Conclusion The MIC for the IW-SP for patients with T2D is a 25-point change on the transformed score. Supplementary Information The online version contains supplementary material available at 10.1007/s11136-023-03476-0.


Introduction
The association between obesity and diabetes is well established [1][2][3], and approximately 90% of patients with type 2 diabetes (T2D) are living with obesity or overweight [4].Weight management is a part of diabetes control, and some diabetes medications result in weight loss.Self-perception related to body weight is an important concept for T2D as it may be associated with patients' motivation to adhere to the medications and support efforts to reach and maintain a healthy body weight.
The Impact of Weight on Self-perception (IW-SP) [5,6] is a patient-reported outcome measure (PROM) that assesses self-perception associated with one's body weight.This measure has been used in clinical trials, however to our knowledge, a minimal important change (MIC) [5][6][7] has not been established to guide the interpretation of score changes.The MIC refers to the minimal meaningful withinpatient group mean change over time [8].We will use MIC, but this and related concepts are also referred to by other names, including minimally important difference (MID) and minimal clinically important change (MCIC), and simply meaningful change.The MIC is crucial to interpreting whether changes in self-perception related to weight loss, as measured by the IW-SP, represent changes that are meaningful to patients.
The primary objective of this work was to derive an MIC for improvement in the IW-SP total score among individuals with T2D using data from the SURPASS-2 trial [9].The current investigation was focused on improvements in the IW-SP, since the majority of participants in the clinical trial dataset lost weight.

Measures
The IW-SP, and two other patient-reported outcomes (PROs) instruments were used in this study to help determine the MIC of the IW-SP.

Impact of Weight on Self-perception Questionnaire (IW-SP)
The IW-SP contains three items [6]: 1. How often do you feel unhappy with your appearance due to your weight? 2. When going out in public, how often do you feel selfconscious due to your weight? 3. When comparing yourself to others, how often do you feel unhappy, due to your weight?
Each item is rated on a five-point scale: 1 = "always," 2 = "frequently," 3 = "sometimes," 4 = "rarely," and 5 = "never."The IW-SP total scores are derived by summing the item scores and dividing by the number of items.
The score can be transformed to a range from 0 to 100.Higher IW-SP scores correspond to better self-perception, and the items apply to the present [6].

Ability to perform physical activities of daily living (APPADL)
The ability to perform physical activities of daily living (APPADL) contains seven items that assess how difficult it is for patients to engage in certain activities considered integral to normal daily life, such as walking, standing, and climbing stairs [10].Items are scored on a five-point numeric rating scale, where 5 = "not at all difficult" and 1 = "unable to do."A raw overall score is calculated by summing scores of the seven items and applying a linear transformation to yield a raw score ranging from 0 to 100.
Higher scores indicate better ability to perform activities of daily living, and the items apply to the present.

Impact of weight on quality of life-lite clinical trials version (IWQOL-Lite-CT)
The impact of weight on quality of life-lite clinical trials version (IWQOL-Lite-CT) is a 20-item, obesity-specific PROM developed for use in obesity clinical trials.It assesses three primary composites of obesity-related health-related quality of life (HRQoL): physical (seven items), psychosocial (13 items), and a five-item subset of the physical compositethe physical function composite score.Items in the physical function composite describe physical impacts related to general and specific physical activities.All items are rated on either a five-point frequency scale ("never" to "always") or a five-point truth scale ("not at all true" to "completely true") [11,12].
The IWQOL-Lite-CT composites and total scores are scored by averaging the responses of items within each scale, and then multiplying by the total number of items in the scale.Higher scores indicate better quality of life, and the recall period is 1 week.

Analyses
Anchor-based methods, recommended by the United States (US) Food and Drug Administration (FDA) to interpret PROM scores, were considered the primary approach, and distribution-based methods, recommended as supportive, were secondary [13,14].Anchor-based methods are those that explore the association between the targeted concept of a PROM (in this case, self-perception associated with one's body weight, as measured by the IW-SP) and the same or a closely related concept measured by an independent instrument, the anchor.Anchors are typically single-item stems with a response scale that has descriptions accompanying each response option, so that meaningfulness to patients can be interpreted.Using this method, changes seen in the PROM are compared-or anchored-to changes on the anchoring instrument.An anchor must be conceptually or logically related to the concept measured by the PROM of interest and show an association with that PROM.
Analyses estimating an MIC for the IW-SP used SUR-PASS-2 clinical trial data [9].The SURPASS-2 study was an open-label, 40-week phase 3 trial of patients with T2D randomly assigned to receive tirzepatide at a dose of 5 mg, 10 mg, or 15 mg or semaglutide at a dose of 1 mg.The SURPASS-2 trial did not include a global scale related to patient impressions of change or severity in self-perception of body weight.In the absence of a specific anchor, the current study used multiple exploratory anchors to gather a body of evidence supporting a meaningful change in the IW-SP.The exploratory anchors that were considered included weight loss and scales from conceptually related PROMs that were completed by participants in the trial, the APPADL and the three composites and total score of the IWQOL-Lite-CT.In measuring one's perceived quality of life (QoL) associated with body weight, the IWQOL-Lite-CT was seen as logically related to self-perception of body weight.The APPADL, in measuring ADLs associated with obesity, is likely more distally related to self-perception and body weight, but a relationship may exist such that those losing weight improve in their ADLs as well as in their selfperception of body weight.Both PROMs have established MICs in people with either T2D or obesity.In addition, the IWQOL-Lite-CT also has two individual items that measure self-perceptions associated with body weight that were used as anchors: (Item 7) I feel less confident because of my weight (never, rarely, sometimes, usually, always) and (Item 20) I feel frustrated or upset with myself about my weight (not at all true, a little true, moderately true, mostly true, completely true).
Both anchor-and distribution-based approaches were used for the current analyses, first in 2/3 of the sample (estimation sample), then these initial results were confirmed in the other 1/3 of the sample (confirmation sample).The estimation and confirmation subsamples were created by random assignment.As most participants in the SURPASS-2 trial lost weight (approximately 73%), these analyses focused on change in terms of weight loss and improvement in PROM scores.Note that the process of deriving anchors was only performed during the estimation stage.
All analyses were conducted without consideration for treatment assignment in the trial, as the objective was to assess the meaningfulness of improvement in perception of body weight, irrespective of what drives weight loss.Change for all analyses was calculated using data from baseline and the trial endpoint (week 40), as these were the only visits where PROs were administered.All analyses were conducted using SAS Software (version 9.4).

Identifying potential anchors
The first step was defining prospective anchors, which could be used to group trial participants in terms of meaningful improvement.MIC values for weight loss from the literature show 5% and 10% change as frequently used indices of meaningful change in body weight [10,15,16], so both estimates were considered.The MIC for the APPADL was previously estimated as a range based on multiple approaches, from 6 to 14 points on a scale from 0 to 100 [10].The midpoint of this range, 10 points, was used in the current analysis.The MICs for the IWQOL-Lite-CT were established by the developers as follows: physical composite (13.5 points), physical function composite (14.6), psychosocial composite (16.2 points), and total score (16.6 points); all scales range from 0 to 100 [17].Change in Item 7 and Item 20 of the IWQOL-Lite-CT are discrete integers ranging from − 4 to 4, so the smallest change possible (one point) was used.
Finally, the percent of weight change that corresponds to the MICs of the APPADL and the IWQOL-Lite-CT Scales were estimated.These values were compared with the 5% and 10% values.These were estimated using regression, where change in the respective scale was the independent variable, percent change in body weight was the dependent variable, and the value of percent change in body weight predicted by each respective scale's MIC was the value considered as a potential anchor.

Selection of anchors
First, only variables showing a minimum relationship of |0.30| (Spearman's rho) in change [18] distributions (baseline to trial endpoint) were further included as anchors in the subsequent analyses.Second, redundant anchors for weight were eliminated.That is, potential anchors that grouped subjects by weight loss were eliminated if they converged on the same quantity of weight loss For example, seven anchors using percent change in weight-two from the literature [10,15,16] and five predicted by scale MICs-were explored as potential anchors for further analyses.A ≤ 5% point was identified a priori as the cut-off for redundant anchors, as 5% was the smallest MIC for weight change identified in our literature searches.

Application of anchors to form groups
The next step was to apply the anchors to form participant groups based on change in MIC.As the objective was to estimate the minimum meaningful change in the IW-SP, these analyses were performed using half MICs for all anchors, save for the IWQOL-Lite-CT items.Half MIC intervals were used in order to define a reasonable range within each change category for each of the anchor variables, as few (if any) people would have experienced exactly a 1 MIC change for each anchor variable.The intervals around the half MICs were shifted, such that groupings encompassed 1.0 times each anchors' actual MIC (Table 1-middle column).
For the IWQOL-Lite-CT individual items, the "no change" group comprised participants indicating 0 points of change between baseline and trial endpoint, and the additional groups indicated improvements of 1-4 points between baseline and trial endpoint (five groups in all; Table 1-last column).

Assessing meaningful change and responsiveness
Once anchors were selected, group differences by anchor category were tested to establish responsiveness with the prediction that group differences on the IW-SP scores would be a function of differences in the conceptually related anchors.Responsiveness refers to the extent to which the instrument can detect true change in participants known to have changed on the concept of interest [19].IW-SP responsiveness analyses compared change scores between participants with different degrees of anchor change using general linear models (analysis of covariance [ANCOVA]), controlling for age, gender, and baseline body mass index for each anchor.
These analyses also allowed assessment of the IW-SP's ability to reliably discriminate groups defined by differing increments of change, adding evidence for the reliability of the estimated MIC.

Distribution-based approaches
Distribution-based methods for estimating the magnitude of meaningful change in PROM scores utilize statistical parameters from the clinical trial population, where a PROM change score is expressed relative to some measure of variability.
The distribution-based approach was a calculation of standard deviation (SD) units using the baseline score.The half SD has been shown to provide a reasonable approximation of a meaningful change in PROMs [20].The standard error of measurement (SEM) [21] was not included, as the time interval between baseline and trial endpoint was too long to estimate test-retest reliability, which is required for calculation of the SEM.

Triangulation
A single estimate for an MIC for improvement in the IW-SP was achieved by triangulating findings of the anchor-and distribution-based analyses, supported by visualization of data on cumulative distribution function and probability density function plots [18,22,23].Special attention was paid to the IW-SP change score that corresponded to a one-category improvement (0.75 to < 1.25 MIC improvement) when there was a statistically significant group difference between IW-SP scores corresponding to that anchor category and the adjacent "no change" anchor category.MIC estimates for the IW-SP from the confirmation and estimation stages are presented and discussed together, allowing the extent of the convergence of estimates to be evaluated.That is, comparison of the estimation and confirmation stage estimates is itself a form of triangulation.
The MIC for improvement in the IW-SP total score at this stage was expressed in terms of the transformed scale and the raw scale.The latter is more interpretable as it aligns directly with the actual IW-SP responses.Following this process, a final MIC estimate was produced and rounded to the nearest mathematically possible change value for the IW-SP total score, both in terms of raw and transformed scales.

Sample description
The total sample was N = 1878: n = 1252 in the estimation group and n = 626 in the confirmation group (Table 2).The groups were similar in terms of sociodemographics, IW-SP, and anchor scores (Tables 2, 3, S1).
As more than 25% of participants had baseline scores at ceiling for both the estimation and confirmation stages (31.7% and 31.2%,respectively), a sensitivity analysis was

Anchor selection and anchor-based analyses
Regression analyses to identify change in body weight corresponding to each of the anchor scale's MICs converged on a similar estimate, falling between 10.2 and 10.8%, so only 10% change was included in the subsequent analyses.The correlations between change distributions for each potential anchor and the IW-SP showed that all prospective anchors exceeded the minimum requirement of ρ =|0.30| except for percent change in weight, which did not reach the threshold (ρ = − 0.23; Table 3).The relationships mostly varied as a function of conceptual relatedness, as the APPADL was the weakest after percent change in weight (ρ = 0.35), and the IWQOL-Lite-CT Psychosocial composite was the strongest (ρ = 0.57).While percent change in weight did not meet criteria, and is not considered an anchor, it was included in subsequent reporting as a reference to the prior literature.Following these analyses, the exploratory anchors submitted to meaningful change, responsiveness, and triangulation analyses were as follows: IWQOL-Lite-CT Physical composite, Physical Function composite, Psychosocial composite, and Total Score, and the IWQOL-Lite-CT Item 7 and Item 20, and the APPADL.
With few exceptions, the IW-SP total score changed in the expected direction and consistently across anchor categories for both the estimation (Table 4) and confirmation (Table S2) analyses, where self-perception regarding weight improved as anchor categories improved.All prospective anchoring scales showed significant omnibus effects across anchor categories (all p < 0.0001) for both the estimation (Table 4) and confirmation (Table S2) groups.The important anchor category corresponding to 1 MIC of change (0.75 to < 1.25 MIC) showed relatively high variability at confirmation and estimation stages, as CVs were near 1 or > 1 for the anchor categories based on summary scores (0.89 to 1.32), though CVs for individual Item 7 and Item 20 were lower (0.59 to 0.66).
Pairwise follow-up tests for significant omnibus effects, however, differentiated the prospective anchor scales.The APPADL MIC and weight change (5% and 10%) only showed significant pairwise differences when comparing patients categorized as not improving or improving very little vs. those who improved markedly (e.g., improvement     The composites and items of the IWQOL-Lite-CT, by contrast, revealed better responsiveness of the IW-SP, as pairwise comparisons using anchors based on these scales were able to significantly differentiate IW-SP change scores when comparing most anchor-based groupings of participants.This includes the comparisons of the IW-SP scores corresponding to < 0.25 MIC vs. 0.75 to < 1.25 MIC (i.e., Group 1 vs. Group 3, Table 1) for all of the IWQOL-Lite-CT composites in the estimation analysis, and for the IWQOL-Lite-CT Psychosocial composite and IWQOL-Lite-CT total score for the confirmation analysis.
Of the IWQOL-Lite-CT composites, however, only the participant groupings based on the IWQOL-Lite-CT Psychosocial composite-Item 7 (I feel less confident because of my weight) and Item 20 (I feel frustrated or upset with myself about my weight)-were able to differentiate IW-SP change scores from adjacent anchor categories.IW-SP scores corresponding to the anchor category of 0.75 to < 1.25 MIC of the psychosocial score were significantly improved compared with those corresponding to the < 0.25 to < 0.75 MIC category (i.e., Group 2 vs. Group 3, Table 1), and this was the case for both the estimation and confirmation analyses (all p < 0.05).
Similarly, IW-SP change scores corresponding to a 1-point improvement on both Item 7 and Item 20 were significantly improved compared to scores corresponding to no change on the items, and this was the case for both the estimation and confirmation analyses (all p < 0.01).That is, the scores corresponding to the category reflecting the minimal amount of improvement was significantly improved compared to the scores corresponding to the category reflecting no change.Most other comparisons for Item 7 and Item 20 showed significant pairwise comparisons for both the estimation and confirmation groups.These results support that these two items are likely the best anchors available.This is not surprising given that these two items are also most closely conceptually aligned with the IW-SP.
While all anchors showed evidence for the responsiveness of the IW-SP to change, these results indicate that the comparisons of anchor categories representing minimal possible change were reliable for conceptually related scales: the IWQOL-Lite-CT Psychosocial composite, and the two items comprising that scale, Item 7 and Item 20.

Distribution-based approaches
Distribution-based estimates can be found in Table 5.Estimates are very similar across the estimation and confirmation groups.These results are supportive of those found in anchor-based analyses, where half SD is slightly lower at approximately 15 points.

Triangulation
Triangulation results can be found in Table 5, which shows the IW-SP change score that corresponds to the 0.75 to < 1.25 MIC of change in each respective anchor, 1-point improvement compared to no change for the IWQOL-Lite-CT items, and the distribution-based estimate of half SD.That is, this table shows the change in the IW-SP total score that corresponds to the minimum change for each anchor.
Results in Table 5 are given in 0-100 transformed score units as well as in IW-SP Total raw scale units (i.e., the mean of the three IW-SP items, each of which are on a 1-5-point scale).Triangulation is discussed in terms of raw units of the IW-SP.While the transformed score is useful for easy calculation and formation of heuristics, it is more intuitive to present evidence for the MIC in terms of the actual participant responses.Change in the IW-SP is also included in terms of the total number of changes in response categories across the three IW-SP items (raw score/0.33)needed to produce the corresponding total score change.
There is a high degree of consistency between the estimation and confirmation groups, with no estimate differing by more than 0.2 points, and the average difference is near 0. Estimates of the MIC for improvement in the IW-SP total score range from 0.60 points to 1.08 points across 16 estimates and two approaches (anchor-based and distribution-based).
When examining the responsiveness results, it's clear that the IW-SP was more sensitive to small anchor category changes when anchored by the composites of IWQOL-Lite-CT, and sensitivity was greatest when anchored by the Psychosocial composite and two of the most relevant items that comprise that composite (Item 7 and Item 20).This is as expected as these are conceptually the closest to the IW-SP.However, the range of MIC estimates when considering these scales does not markedly differ from the other estimates, ranging from 0.83 points to 1.08 points for anchorbased estimates.
Rounding up to be conservative, an MIC of a 1-point improvement for the IW-SP total raw score is supported by these analyses, equivalent to a 25-point1 improvement of the transformed score.This is the equivalent of 3 points of response category improvement across the three items of the three IW-SP items (i.e., this corresponds to patients improving on average by 1 category across each of the three IW-SP questions).This estimate is also supported by CDF plots; Fig. 1 shows the CDF plots for select anchors, including the Psychosocial Scale and Items 7 and 20 from the IWQOL-Lite-CT (%weight loss is included as a reference).The degree to which these curves are separate from one another, and do not overlap, reflects how distinct the change scores from one anchor category are from the others.

Discussion
Both the estimation and confirmation stages of the current study arrived at an MIC of a 25-point improvement for the IW-SP, corresponding to a one-category improvement, on average, across each of the three IW-SP questions.While these estimates were stable across the study stages, both groups showed considerable individual variability where the impact of body weight on body image is concerned.Considering the convergence of estimates for the most conceptually relevant anchors, the IWQOL-Lite-CT Psychosocial Scale and Items 7 and 20, the MIC might also be expressed as a range from 0.83 to 1.08 in raw scores, or 21 to 27 transformed points, as the more single point estimate above may miss meaningful change that is present.Many participants did not report having any issues with their self-perception, even when they had very high body mass indexes.It is important to note that for these people, or any people with scores greater than 75 on the IW-SP at baseline, there is insufficient room to demonstrate a meaningful change on the IW-SP.Accordingly, future studies using the IW-SP as a key endpoint will need to consider baseline IW-SP score as an inclusion criteria to ensure that participants have scores that will allow for a meaningful change in the IW-SP (i.e., they must have scores < 75 at baseline).
The results of this study should be interpreted with consideration for the following limitations.First, there was not a conceptually a-priori appropriate global scales to anchor the IW-SP.In addition, it was an exploratory post hoc analysis of available clinical trial data, though the process for identifying and selecting anchors was prespecified.Use of clinical trial data for estimating psychometric properties is a common approach in PROM development and validation.This study focused on the MIC for improvement, as   the MIC for deterioration in self-perception, measured by the IW-SP due to weight gain, cannot be established based on the results of the current study.Finally, it is important to note that this MIC is intended for classifying minimal within-person change over time, for example as a threshold for determining the number of responders, rather than as a value for interpreting between-group mean differences [8].Interestingly, the conceptually related IWQOL-Lite-CT Items 7 and 20 showed greater sensitivity in responsiveness analyses but also showed higher MIC estimates.This may not be surprising given that the PROM-derived anchors used existing MIC estimates reflect the smallest meaningful difference to patients, and are comprised multiple items measuring a range of the facets associated with the respective concept.The single items on the other hand used the smallest change possible on their response scale, which may not represent a minimum change to patients, and were limited to the concepts of confidence (Item 7) and frustration (Item 20) with body weight.It may be that the response scales are not sensitive enough or do not capture the full range of concepts related to self-image associated with body weight.
The results of this study yield an estimate of the MIC for the IW-SP, which was derived based on a triangulation of many estimates from multiple anchors and distributionbased estimates.These estimates all converged quite closely and consistently on the identified value and reflect reasonable changes in the IW-SP scale (about a one-category change across each of the three items) that could conceivably reflect meaningful improvements from the patient perspective.

APPADL
ability to perform physical activities of daily living, IW-SP Impact of Weight on Self-perception Questionnaire, IWQOL-Lite-CT Impact of weight on quality of life-lite clinical trials version, SD standard deviation a Corresponding to 0.75 to < 1.25 MIC for scales, 1-point improvement for IWQOL Items 7 and 20, and half SD for distribution-based analysis b Showed significant difference in responsiveness analysis between 0.75 to < 1.25 MIC vs. < 0.25 MIC c Showed significant difference in responsiveness analysis between adjacent anchor categories: 0.75 to < 1.25 MIC vs. 0.25 to < .75MIC for the IWQOL-Lite-CT Psychosocial Scale; No change vs. 1-point improvement for the IWQOL-Lite-CT Items d Weight Change included for reference, but not included in calculations as it did not meet a priori criteria for inclusion as anchor Scale Transformed score improvement (0-100 points) IW-SP raw scale units improvement (1-5-point scale) Total # of response categories Improved across all three items Estimation Confirmation Difference Estimation Confirmation Difference Estimation Confirmation Difference IWQOL-Lite-CT Physical b

Table 1
Interval groupings of participants for IW-SP MIC estimation analyses IW-SP Impact of Weight on Self-perception Questionnaire, MIC minimal important change *The estimated change in IW-SP that corresponded to this level of change for each anchor variable were of primary interest in the current study as this represents the change in the IW-SP score that corresponds to a 1 MIC improvement in the continuous anchors, or a 1-point improvement in the individual items a Each group corresponds to change in units of respective PROM MIC as follows: APPADL MIC = 10 points; IWQOL-Lite-CT physical composite MIC = 13.5 points, physical function MIC = 14.6, psychosocial composite MIC = 16.2 points, total score MIC = 16.6 points b Groups correspond to all possible improvements in item response scale

Table 3
Descriptive summary of PROM and weight change a scores APPADL ability to perform physical activities of daily living, IW-SP Impact of Weight on Self-perception, IWQOL-Lite-CT Impact of weight on quality of life-lite clinical trials version, SD standard deviation Spearman's rho, change in prospective anchors correlated with change in IW-SP scores, baseline to endpoint b

Table 5
Triangulation: IW-SP Change Scores Corresponding to Minimal Important Change a in Anchor-and Distribution-Based Analyses