Background

Duchenne muscular dystrophy (DMD) is a rare, X-linked neuromuscular disease characterized by progressive muscle weakness [1, 2], affecting approximately 1 in 3,500-5,050 live male births [3, 4, 5]. DMD is typically diagnosed in early childhood and presents with a progressive, irreversible course, including loss of ambulation followed by loss of upper limb, respiratory, and cardiac function [6,7,8]. Patients with DMD generally pass away prematurely in the third decade of life [4, 9]. Although there is no cure for DMD, management includes corticosteroids, physical therapy, assisted ventilation, and spinal surgery [8]. Exon-skipping therapies and gene therapy are available to a subset of patients [10,11,12,13,14].

Measuring health-related quality of life (HRQoL) is critical for providing insight into the overall burden of chronic health conditions, such as DMD [15]. Due to its clinical severity, the HRQoL among patients with DMD is substantially lower than that of the general population, decreasing as the disease progresses [16].

HRQoL can be quantified using health state utilities (i.e., preference for a particular health status). Utility estimates, which are presented on a cardinal scale of 0 to 1 with 0 representing death and 1 representing perfect health, are derived through a variety of methods broadly categorized as either direct or indirect measures [17]. The Health Utilities Index (HUI) Questionnaire is one of the most commonly used approaches for estimating utilities in DMD [18].

Health state utilities are used to calculate quality-adjusted life years (QALYs), the most common health metric used in health economic evaluations. Cost-effectiveness analyses (CEAs) aim to maximize health benefits given available resources and could be used to aid in reimbursement or resource allocation decisions [17, 19]. Over the last 40 years, the number of published CEAs has increased substantially [20, 21]. Within the United States (US), CEAs are also used to inform the value of drugs and emerging health technologies, and are cited in practice guidelines or insurance policies [19].
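As a minimal sketch of how utilities enter a QALY calculation, the Python snippet below multiplies the utility of each health state by the years spent in it and sums the results. The function name `qalys` and the example durations are hypothetical and chosen only for illustration; discounting, which many CEAs apply, is omitted.

```python
def qalys(states):
    """Sum utility x years over (utility, years) pairs, without discounting."""
    return sum(utility * years for utility, years in states)

# Hypothetical path: 5 years at utility 0.8, then 10 years at utility 0.5.
example_path = [(0.8, 5.0), (0.5, 10.0)]
total = qalys(example_path)
print(round(total, 2))
```

Because the utility weights scale every year lived, even small differences in the utility assigned to a health state accumulate over long time horizons.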

Despite its importance in understanding patients’ burden and influencing decision-making, measuring the HRQoL in DMD remains a challenge. Available research is limited due to small patient populations and limited standardized utility estimation methods for pediatric populations [22, 23], and systematic reviews have noted the relative lack of published studies evaluating DMD-specific utility values [4, 24]. In these studies, health states generally focus on ambulatory ability (i.e., early ambulatory [EA], late ambulatory [LA], early non-ambulatory [ENA], and late non-ambulatory [LNA]). Thus, it is important to contextualize available utility estimates for DMD health states relative to other diseases. Using utilities of clinically similar conditions could be a strategy for overcoming limited available estimates. This study aims to catalog and compare existing published utilities for DMD with those reported for comparable conditions.

Methods

Resources and utility selection

Published utility estimates were identified using the Tufts CEA Registry [21], a comprehensive database of CEAs published from 1976 to the present. The CEA Registry focuses on a subset of cost-utility analyses that quantify health benefits in QALYs, which account for changes in both longevity and utility. At the time of this analysis, there were over 33,000 utility weights across more than 10,000 CEAs published from 1976 to 2021 within the registry [21]. Eligible utility weight records included those derived from various countries, diseases, utility instruments, preference weights, and respondent types (i.e., proxy or self).

Analysis One: identify and catalog comparable utility estimates to DMD

The first analysis consisted of identifying health states that have comparable HRQoL utility estimates with published US DMD utility estimates. First, utility estimates for four major DMD health states were obtained through the only available US study published at the time of the analysis: EA (utility, 0.73), LA (0.64), ENA (0.21), and LNA (0.18) [25]. These values were derived using the HUI Questionnaire Mark 3.0, administered online. Next, a minimal clinically important difference (MCID) of ± 0.03 was applied to each estimate to establish a utility range (Table 1). This range was deemed reasonable for generic multi-attribute preference-based measures because health states that differ by less than 0.03 units cannot be reliably differentiated, whereas changes greater than 0.03 are generally regarded as a significant change [17, 26, 27]. The registry was then searched to identify other health states with associated utilities that fell within each range. Utilities from all instruments, not only the HUI, were included in the analysis, although the registry did not capture the instrument when it was unspecified in the original study. Utility records were excluded if the health state was unspecified (i.e., intervention-specific utilities with no designated clinical condition), a disutility, or a duplicate record (i.e., records with identical health states and utility estimates).
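The range construction and screening steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the record layout (dictionaries with `health_state`, `utility`, and `is_disutility` keys) and all function names are hypothetical, and the actual registry search was not necessarily performed this way.

```python
MCID = 0.03  # margin applied to each published estimate

# Published US DMD utilities quoted in the text.
DMD_UTILITIES = {"EA": 0.73, "LA": 0.64, "ENA": 0.21, "LNA": 0.18}

def utility_ranges(estimates, margin=MCID):
    """Return {state: (low, high)}, rounded to two decimals."""
    return {
        state: (round(u - margin, 2), round(u + margin, 2))
        for state, u in estimates.items()
    }

def matching_states(utility, ranges):
    """List every DMD health state whose range contains the utility."""
    return [s for s, (low, high) in ranges.items() if low <= utility <= high]

def eligible(records):
    """Drop unspecified health states, disutilities, and duplicates.

    A duplicate is a record with the same health state and utility
    as one already kept. The record keys are hypothetical.
    """
    seen = set()
    kept = []
    for record in records:
        key = (record["health_state"], record["utility"])
        if not record["health_state"] or record["is_disutility"] or key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept

RANGES = utility_ranges(DMD_UTILITIES)
print(RANGES)
print(matching_states(0.20, RANGES))
```

Note that the ENA and LNA ranges overlap, so a single utility can match both states in this sketch.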

Table 1 Health states with comparable utility estimates to published US DMD utility estimates (Analysis One)

Analysis Two: identify and catalog comparable health states to DMD

The second analysis consisted of identifying HRQoL utility estimates for health states that are similar to the clinical conditions of DMD. The registry was searched using pre-defined search terms, such as difficulty walking, joint and muscle weakness, loss of ambulation, and the use of a wheelchair (Table S1). These search terms were derived by the co-authors based on their understanding of DMD.

A clinical expert in DMD presentation (coauthor RS) next mapped each identified health state to five possible categories based on the degree of clinical similarity: EA, LA, ENA, LNA, and undetermined. Health states were first classified as ambulant or non-ambulant. Ambulant health states were then classified as either EA or LA depending on their severity. More severe ambulant health states, in which individuals were able to walk independently but may have been falling or requiring assistance for stairs, were classified as LA. Health states that required assistance to walk or stand or that used a wheelchair were classified as non-ambulant. The non-ambulant health states were further classified into ENA or LNA based on severity of symptoms. If respiratory support was required (such as in spinal muscular atrophy type 1), the health state was classified as similar to LNA. Health states, such as “secondary progressive multiple sclerosis expanded disability status scale 6<: walking aid or wheelchair required and restricted to bed,” were too broad, covering LA, ENA, and LNA health states; therefore, they were classified as LA, ENA, and LNA and given a low relevancy score. Others that were too general (e.g., “severe musculoskeletal pain”) did not provide enough information to be classified into a health state.

Once the health state was mapped, the clinical expert gave a “relevancy score” on a scale of 0 to 10 based on how clinically similar the health state was to the DMD health state: score of 10, absolute relevance; score of < 10 to ≥ 8, substantial relevance; score of < 8 to ≥ 6, moderate relevance; <6 to ≥ 4, fair relevance; and < 4, slight-to-poor relevance. Health states were not specifically restricted to those plausible for a DMD model but had to reflect the signs and/or symptoms that would be relevant to DMD. For example, the multiple sclerosis health state, “relapsing-remitting multiple sclerosis, expanded disability status scale: confined to bed and can still communicate and eat,” had a score of 10 and was mapped to the DMD LNA state. In addition, patients may experience the same condition in multiple DMD health states; therefore, some health states were mapped to ≥ 2 DMD health states. For example, a “gait disturbance” health state from a CEA studying surgical treatments for essential tremor [28] received a relevancy score of 5 for both the EA and LA states. This indicated some ambiguity of the health state being mapped, but general relevance with a DMD health state.

These relevancy scores were assigned using subjective, albeit expert, judgment. Therefore, only the health states with a relevancy score of 10 were used in the base case analysis, while those with relevancy scores ≥ 8 were used in a sensitivity analysis.
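The relevancy bands and the base case and sensitivity selections can be expressed compactly in code. The Python sketch below is illustrative only: the function names and the `(health_state, score)` tuple layout are hypothetical, and the scores themselves came from expert judgment rather than any algorithm.

```python
def relevance_level(score):
    """Map a 0-10 relevancy score to the bands described in the text."""
    if not 0 <= score <= 10:
        raise ValueError("relevancy score must be between 0 and 10")
    if score == 10:
        return "absolute"
    if score >= 8:
        return "substantial"
    if score >= 6:
        return "moderate"
    if score >= 4:
        return "fair"
    return "slight-to-poor"

def select(scored_states, minimum):
    """Keep (health_state, score) pairs with a score at or above the minimum."""
    return [(state, score) for state, score in scored_states if score >= minimum]

# Hypothetical scored health states, for illustration only.
scored = [("state A", 10), ("state B", 8), ("state C", 5)]
base_case = select(scored, 10)    # relevancy score of 10
sensitivity = select(scored, 8)   # relevancy score >= 8
print(base_case, sensitivity)
```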

Data extraction and statistical analyses

The following data were extracted from each utility record for both analyses: study reference information (e.g., primary author, journal, year of publication), health state, and utility estimate. When available, baseline clinical and demographic characteristics, sample size, utility instrument, and respondent type (i.e., patient, parent/caregiver; and proxy or direct) were also extracted. Data from eligible studies were extracted using Stata® (Version 15.1) [29] and recorded in Microsoft Excel® (2013) [30]. Table S2 provides a detailed list of all data extracted for each identified utility record. After all data were extracted, descriptive statistics were calculated, including mean, median, standard deviation, and interquartile range. Summary statistics were stratified by the four DMD health states.
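For readers who wish to reproduce this kind of summary, the following Python sketch computes the same descriptive statistics per health state using only the standard library. The utility values are hypothetical placeholders, not study data, and the quartile convention (the default "exclusive" method of `statistics.quantiles`) is an assumption; the study used Stata, whose percentile definition may differ.

```python
import statistics

def summarize(utilities):
    """Return n, mean, median, sample SD, and (Q1, Q3) for a list of utilities."""
    q1, _median, q3 = statistics.quantiles(utilities, n=4)
    return {
        "n": len(utilities),
        "mean": statistics.mean(utilities),
        "median": statistics.median(utilities),
        "sd": statistics.stdev(utilities),
        "iqr": (q1, q3),
    }

# Hypothetical utility values grouped by DMD health state (not study data).
records_by_state = {
    "EA": [0.70, 0.72, 0.74, 0.76],
    "LNA": [0.15, 0.17, 0.19, 0.21],
}

summary = {state: summarize(values) for state, values in records_by_state.items()}
for state, stats in summary.items():
    print(state, stats)
```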

Results

Analysis One: Health states with utility estimates comparable to DMD health states

Analysis One identified 7,914 potentially relevant utility records; 3,606 (45.6%) were deemed ineligible and excluded from the study (i.e., non-specified health states, disutilities, or duplicate records). The remaining 4,308 (54.4%) unique utility records were included and mapped to the relevant DMD health state based on the utility estimate for each identified health state: EA (0.70–0.76), LA (0.61–0.67), ENA (0.18–0.24), and LNA (0.15–0.21). These utility records were extracted from 2,322 unique CEA studies. Substantially more utility records were identified for the ambulatory phases (EA, n = 2,174; LA, n = 1,841) than for the non-ambulatory phases (ENA, n = 223; LNA, n = 217). Because the MCID utility ranges for the ENA and LNA phases overlap, health states with utility estimates ranging from 0.18 to 0.21 (n = 147) were included in both the ENA and LNA datasets.
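The record counts reported above can be reconciled with simple arithmetic: the per-state counts sum to more than the number of unique records because the 147 overlapping records appear in both the ENA and LNA datasets. The short Python check below uses only the figures quoted in the text; the variable names are illustrative.

```python
identified = 7914
excluded = 3606
per_state = {"EA": 2174, "LA": 1841, "ENA": 223, "LNA": 217}
overlap_ena_lna = 147  # records counted in both the ENA and LNA datasets

included = identified - excluded
unique_from_states = sum(per_state.values()) - overlap_ena_lna
excluded_pct = round(100 * excluded / identified, 1)

print(included, unique_from_states, excluded_pct)
```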

The identified utilities included a variety of health states representing a wide range of acute and chronic diseases (Table 1). Whereas myocardial infarction, coronary heart disease, chronic obstructive pulmonary disease, locoregional breast cancer, and type 2 diabetes were the most common health states that fell within EA ranges, non-small cell lung cancer, end-stage renal disease, stroke, and hip fracture were the most common that fell within LA ranges. Due to the substantially lower utility range thresholds, health states for the non-ambulatory phases represented more severe, terminal, or chronic conditions, including severe stroke, chronic liver disease, terminal breast cancer, and Alzheimer’s disease. Table 2 provides summary statistics of the identified utility records. Table S3 provides health states, utility values, and study characteristics that specifically utilized the HUI instrument, which was also utilized in the study from which the DMD health state utilities were obtained.

Table 2 Health states with comparable utility estimates to DMD (Analysis One)

A third of the utilities were derived from US-based CEAs (Table 2). For the non-ambulatory phases, 62 (4.3%) of the identified utilities from US-based CEAs had estimates ranging from 0.18 to 0.21 and were included in both ENA and LNA datasets. The majority of identified utility records from US-based CEAs originated from secondary sources (i.e., extrapolated from other studies and then included in the CEA model); however, it was unclear whether the utility was originally estimated from US populations. Of those utilities from US-based CEAs, 34 (2.3%) were estimated using the HUI instrument (EA, n = 16/706 [2.3%]; LA, n = 12/642 [1.9%]; ENA, n = 6/103 [5.8%]).

Of the 4,015 identified ambulatory DMD health state utilities, fewer than a fifth mentioned the age of the population (EA, n = 471/2,174 [21.7%]; LA, n = 218/1,841 [11.8%]). Among those utilities that reported an age, populations ranged from newborns/infants to > 95 years of age, although most utilities represented adult populations ≥ 18 years (EA, n = 442 [93.8%]; LA, n = 208 [95.4%]). Only 51 utilities represented pediatric populations (i.e., ≤ 17 years) (EA, n = 37 [7.9%]; LA, n = 14 [6.4%]). A handful included both pediatric and adult populations (EA, n = 8 [1.7%]; LA, n = 4 [1.8%]). For non-ambulatory phase utilities, fewer than a fifth mentioned the population age (ENA, n = 37/223 [16.6%]; LNA, n = 34/217 [15.7%]), with 26 of those utilities included in both datasets. Table 2 shows summary statistics of utility records for pediatric populations. Most of these were ambulatory rather than non-ambulatory utilities.

Analysis Two: Health states with clinical similarities to DMD health states

Using the pre-defined search terms listed in Table S1, 236 eligible utility records with health state descriptions that included one of those terms were identified. After the clinical expert mapped each to a DMD health state based on clinical similarity, 48.3% (n = 114) of the health states received a relevancy score of 10 (i.e., absolutely relevant to the clinical conditions of the mapped DMD health state), with median utility estimates of 0.71 (n = 35) for those mapped to EA, 0.54 (n = 31) for LA, 0.39 (n = 39) for ENA, and 0.30 (n = 9) for LNA (Table 3). The median was considered rather than the mean because the distributions of utility values within each sample were highly skewed, as seen in the interquartile ranges. The median utility estimates deviated from published utility estimates for DMD by varying degrees. Health states clinically similar to the ENA phase exhibited the greatest deviation: the median utility estimate (0.39) was 85.7% higher than the published DMD estimate (0.21) (Fig. 1A). The median for health states clinically similar to the LNA health state (0.30) was 66.7% higher than the published DMD estimate (0.18). Health states clinically similar to the EA phase exhibited the smallest difference (0.73 vs. 0.71, 2.7% decrease) (Table 3; Fig. 1A).
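The percentage deviations reported here follow from the relative difference between each median and the corresponding published DMD estimate. The Python sketch below reproduces that arithmetic from the values quoted in the text and also flags which absolute differences exceed the 0.03 margin used in Analysis One; the function and variable names are illustrative.

```python
PUBLISHED = {"EA": 0.73, "LA": 0.64, "ENA": 0.21, "LNA": 0.18}
MEDIANS = {"EA": 0.71, "LA": 0.54, "ENA": 0.39, "LNA": 0.30}
MCID = 0.03

def percent_difference(median, published):
    """Relative difference of the median from the published estimate, in percent."""
    return (median - published) / published * 100.0

differences = {s: percent_difference(MEDIANS[s], PUBLISHED[s]) for s in PUBLISHED}
exceeds_mcid = {s: abs(MEDIANS[s] - PUBLISHED[s]) > MCID for s in PUBLISHED}

for state in PUBLISHED:
    print(state, round(differences[state], 1), exceeds_mcid[state])
```

A positive value indicates that the clinically similar health states carry a higher median utility than the published DMD estimate, and a negative value indicates a lower one.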

Table 3 Health state utility estimates for health states clinically similar to DMD (Analysis Two)
Table 4 Health state utility estimates for health states clinically similar to DMD extracted from US-based CEAs (Analysis Two)
Fig. 1 Differences in utility estimates for health states clinically similar to DMD and existing DMD utility estimates. (A) Changes in utility estimates compared to existing utility estimates for each DMD health state for the total sample of utilities with a relevancy score of 10 (n = 114). (B) Changes in utility estimates compared to existing utility estimates for each DMD health state for the sample of utilities from a US-based CEA study with a relevancy score of 10 (n = 31). CEA, cost-effectiveness analysis; DMD, Duchenne muscular dystrophy; HRQoL, Health-related quality of life; ∆, change in utility estimate; Mdn, median; US, United States

From the 236 health states identified in Analysis Two, 26.3% (n = 62) were from US-based CEAs. However, 79.0% (n = 49) of those utilities originated from a secondary source, which may or may not have been a US-based utility study. The difference in utility value was more pronounced for utilities extracted from US-based CEAs than for those in the overall analysis. The US-based median utility estimates with a relevancy score of 10 (n = 31) deviated substantially from published DMD health states, with the LNA and ENA health states exhibiting the greatest difference (127.8% and 114.3% increase, respectively) (Table 4; Fig. 1B).

When expanding the criteria to include relevancy scores ≥ 8, 64.8% (n = 153) of the health states were considered substantially or absolutely relevant to their mapped DMD health state. This more relaxed criterion mapped 39 additional health states. With the exception of the LNA health state, the median utility estimates did not differ from those obtained from the more stringent criterion (i.e., relevancy score of 10) (Table 3, Figure S1). For the LNA health state, the gap closed considerably between the median estimates of the sensitivity analysis (0.20) and the published DMD estimate (0.18). The number of mapped health states nearly doubled for this DMD phase, resulting in a highly skewed distribution of utility estimates (interquartile range, 0.05–0.41). When the relevancy score criterion was relaxed to ≥ 8 for the US-based CEA utility estimates, there were still considerable differences compared to the DMD estimates for the ENA and LNA phases (0.45 vs. 0.21 and 0.40 vs. 0.18, respectively) (Table 4, Figure S2).

There was uncertainty surrounding the clinical similarity of the 136 health states that received a relevancy score of < 8. Figure S3 provides a distribution of relevancy scores, stratified by level of relevance and each mapping category. Due to highly ambiguous descriptions, the clinical similarity of 12.7% (n = 30) of the health states could not be determined. For example, “musculoskeletal pain” is a common problem experienced by patients with DMD to varying degrees across all ambulatory phases although not routinely assessed [31]. Thus, the health state could not be mapped with any level of certainty to a specific DMD health state. Figures S4 and S5 and Table S4 show the statistics of health states with relevancy scores ≥ 6 or ≥ 4.

Only 6.4% (n = 15) of the health states identified in Analysis Two mentioned the age of the populations, ranging from newborns/infants to adults ≥ 60 years. Most represented adult populations of an unspecified age range (80.0%, n = 12). Two health states represented adults ≥ 50 years, while only one health state represented newborns/infants.

Discussion

This study implemented novel search strategies to catalog and compare a variety of health state utilities to those existing for DMD. To the authors’ awareness, this is the first study to apply this methodology to the Tufts CEA Registry database, and it provides a blueprint for future studies, particularly those seeking context for utilities in rare diseases where estimates are limited or based on small sample sizes. In Analysis One, which identified health states with comparable utility values to the four DMD health states, there were substantially more utility values found within the range for EA and LA utility estimates compared to the range for ENA and LNA utility estimates. These data highlight that more diseases have utility estimates closer to 1 than to 0. A sample of health states with similar utility values to the four DMD health states demonstrates a wide range of health conditions and provides context for the DMD utility estimates. In Analysis Two, which identified health states that were clinically similar to DMD health states albeit to varying degrees, nearly half were deemed absolutely relevant to a single DMD state. However, the utility values deviated from published DMD utility values. For health states clinically similar to the ENA and LNA health states, utility estimates were higher than the published DMD estimates; for those clinically similar to the EA phase, estimates were close to the published value.

Cataloging health states with similar utility estimates to DMD revealed contextual factors, such as clinical similarity, that are worth examining. Although health states identified in Analysis One share similar utility estimates to the US DMD population, there is a contrast when considering patient experiences, including the impact of the disease on various aspects of health which resulted in the corresponding utility estimate. While disease characteristics are unique, it is interesting to observe the various diseases with utilities close to the published DMD utilities. Of note, the published utility estimates for non-ambulatory patients with DMD were close to estimates for fatal conditions and events (e.g., terminal cancers or severe strokes). Among health states whose utilities were derived from the HUI instrument, the utility for moderate-to-advanced Alzheimer’s disease corresponded to those for non-ambulatory patients with DMD. However, for health states with utility estimates similar to the EA phase, diseases and severity varied and included cerebral palsy, single amputations, hearing loss, and myocardial infarctions.

In Analysis Two, there was substantial variability between cataloged and existing utility estimates for the DMD US population, demonstrating the heterogeneity of available utility estimates. Median estimates from clinically similar health states (score of 10) to the LA, ENA, and LNA phases exhibited utility value differences greater than 0.03 (i.e., an MCID) compared with the published DMD estimates. In fact, health states mapped to the ENA phase had a median utility estimate that was nearly twice that of the published DMD estimate. While wide variability was observed between non-DMD and published DMD utility estimates, that variability is not solely due to the disease or condition itself. Utility values for a given health state are not absolute as they are dependent on the methodology and the type of respondent [17, 32]. Utility values can be estimated using a variety of methods, either direct or indirect [17]. This is particularly noteworthy in diseases that include a loss of mobility, where the impact of this loss on utility estimates can vary widely depending on the utility instrument [33]. Additionally, parents or caregivers are often the proxy respondents for young patients or those suffering from impairments too severe to express their own preferences [17, 22, 34]. Using public or proxy preferences for chronic conditions may not provide an accurate reflection of the patients’ experiences because patients learn to adapt to their disease over time [35, 36]. Thus, there can be substantial utility value variability for the exact same health state. The standard gamble method in direct utility elicitation exercises consistently provides higher utility estimates compared to a time trade-off approach [37]. Prior studies have also documented variation in HRQoL estimates for similar health states among a wide range of diseases and conditions [32]. For example, a “severe angina” health state utility estimate can range from 0.354 to 0.707 depending on severity, treatment type, patient group, and/or methodological characteristics (e.g., utility instrument, proxy measures, country) [32].

Since HRQoL estimates are an essential component of value assessments, variability in utility values can impact the informative power and comparability of economic evaluations [32]. For example, CEA results that use QALYs could differ dramatically depending on the utility values used. Furthermore, there may be difficulties in measuring utility estimates for rare diseases or pediatric populations [4, 22,23,24]. Researchers conducting CEAs may not have the resources to gather and estimate original HRQoL estimates in these challenging situations and may opt to leverage existing utility values from other sources for the CEA models [17, 32]. In fact, the overwhelming majority of health states cataloged in this study were extrapolated from secondary sources. However, definitions of DMD health states can vary because patients may experience similar symptoms in multiple health states, and there may be inconsistent definitions for the high-level health states [38, 39]. This can result in ambiguity in utility estimates, highlighted by the utility range overlap between the ENA and LNA health states. This has implications for developing informative, comparable economic evaluations, which are becoming key to health care decision making [20, 40].

Utility estimates are also used for calculations of disease severity based on absolute or proportional QALY shortfalls [17, 41]. Some health technology assessment bodies consider QALY shortfalls in their value assessments to adjust the willingness-to-pay threshold, including the National Institute for Health and Care Excellence (NICE) in the United Kingdom [42], the National Health Care Institute in the Netherlands [43], and the Norwegian Institute of Public Health [44]. A new methodology, the Generalized Risk-Adjusted Cost-Effectiveness (GRACE), adjusts the willingness-to-pay threshold in a CEA based on numerous factors, including the severity of disease as reflected in the expected utility [45]. These changes to traditional CEA based on the QALY or expected utility highlight the importance of accurate utility measures.
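To make the shortfall concepts concrete, the Python sketch below computes absolute and proportional QALY shortfalls in their commonly used form: the absolute shortfall is the expected QALYs without the condition minus the expected QALYs with it, and the proportional shortfall is that loss divided by the expected QALYs without the condition. The input values and function names are hypothetical, and the snippet does not reproduce the thresholds or weighting rules of any specific agency.

```python
def absolute_shortfall(qalys_without, qalys_with):
    """Expected QALYs lost because of the condition."""
    return qalys_without - qalys_with

def proportional_shortfall(qalys_without, qalys_with):
    """Share of expected remaining QALYs lost because of the condition."""
    if qalys_without <= 0:
        raise ValueError("qalys_without must be positive")
    return absolute_shortfall(qalys_without, qalys_with) / qalys_without

# Hypothetical inputs: 40 expected QALYs without the condition, 10 with it.
abs_loss = absolute_shortfall(40.0, 10.0)
prop_loss = proportional_shortfall(40.0, 10.0)
print(abs_loss, prop_loss)
```

Because the expected QALYs with the condition depend directly on the utility values assigned to its health states, a change in those utilities shifts both shortfall measures.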

Understanding patient preferences is critical to developing robust value assessments, especially given that patients’ feelings about their condition (e.g., losing the ability to walk) may not align with those of the general public [46,47,48,49]. For example, an analysis of patients with DMD observed noticeably higher visual analog scale (VAS) scores compared to HUI and EQ-5D scores, especially for non-ambulatory patients [49]. The results of this study could provide contextual considerations for existing utility estimates and corresponding value assessments. Extracting health states with similar utility estimates or clinically similar health states could be a good starting point. Although the focus of this study is on DMD, the methodology could be generalizable to other diseases with large variability or paucity of utility data (e.g., rare diseases, severe diseases, or diseases with heterogeneity in rate of progression). Specifically, decisionmakers may choose to assess treatments using utility values from clinically similar health states; however, further research is needed to assess decisionmakers’ willingness to apply this novel approach in practice. Furthermore, researchers could build models with alternative outcomes to examine the potential impact on reimbursement or resource allocation decisions. This is especially relevant to DMD where promising, novel treatments, such as gene therapy, are emerging [14, 50].

This study had some limitations. First, data for utility records sometimes lacked the specificity needed to extract information on population characteristics (e.g., age, sex, country) if the authors of the original CEA study failed to include detailed information on the health states and elicitation methods. Second, mapping the health states in Analysis Two to a DMD health state was based on subjective (albeit expert) assessments. As a result, the scaled score from 0 to 10 allowed some room for uncertainty, and sensitivity analyses were conducted on health states with relevancy scores ≥ 8. Furthermore, differences in the utility values found in Analysis Two could be due to several different factors given that utility estimates can vary depending on the population and methodology employed. Further studies are necessary to examine utility values by respondent and utility instrument utilized. Finally, the pre-defined search terms used in Analysis Two may not have been comprehensive.

Conclusion

This study identified heterogeneity in health state utility estimates for conditions with similar utility estimates to existing DMD utilities as well as significant variability in utility estimates for health states clinically similar to DMD. This methodology could yield contextual factors, such as clinical comparisons, that could inform economic evaluations but requires careful evaluation of the types of utility instruments, tariffs, and respondent type (proxy or self).