Key Summary Points

Why carry out the study?

Data from insurance claims and electronic health records (EHRs) have been used to assess disease burden and progression at the population level for Duchenne muscular dystrophy (DMD). However, no studies have been done to assess the suitability of these data sources for longitudinal assessment of individuals with DMD.

Two large closed-claims databases and one open-claims database with some linked EHRs were examined for data availability at the individual level for 54 outcomes relevant to DMD.

What was learned from this study?

The results of this study demonstrate the challenges that can occur when using these data sources to longitudinally examine DMD-relevant outcomes at the individual level.

Some insurance claims data included information on clinical and functional events that occur at later stages of DMD. As an example, depending on the database, 45.5–48.5% of patients had at least one claim filed for a cardiomyopathy or heart failure clinical event.

Very limited EHR data (on 2% or less of patients) were available that indicated tests were ordered for clinical measures, biomarkers, or functional assessments. Among patients with data that indicated a test was completed, for each test, only a subset had test results available. No data were available for patient-reported outcomes.

Introduction

Background

Duchenne muscular dystrophy (DMD) is a rare and fatal progressive neuromuscular disease caused by mutations in the gene responsible for producing dystrophin [1,2,3]. The progressive nature of DMD results in the loss of functional abilities caused by increasing muscle weakness and deterioration. Individuals with DMD first experience symptoms such as motor developmental delays and decreased walking ability in the early ambulatory phase between the ages of five and seven, though recent publications have suggested symptoms can manifest earlier [4,5,6]. Late ambulatory phase functional and clinical events then manifest between the ages of eight and eleven and are characterized by an increasing loss of ability to walk and part-time wheelchair use [7, 8]. Patients eventually reach the early nonambulatory phase in their early teens when they experience functional and clinical events such as loss of ambulation (LOA), defined in clinical research as full-time wheelchair use, and increasing loss of upper limb function. Patients then progress to the late nonambulatory phase later in their teens, which includes inability to perform activities of daily living, respiratory impairment that requires ventilatory support, increasing cardiac dysfunction, heart failure, and, eventually, premature death by the third decade of life [1, 3, 4, 9]. The clinical burden of DMD is compounded by the humanistic impact for patients and caregivers, who experience deteriorating health-related quality of life (HRQoL) as DMD transitions to more severe health states [4, 10,11,12]. The significant impacts of DMD highlight the importance of measuring disease progression for individuals over time.

Holistic understanding of a patient’s clinical history helps clinicians to optimize medical decision-making, ultimately improving patient outcomes. For example, appropriate initiation of glucocorticoids has been shown to prolong function and ambulation, with some studies demonstrating improved outcomes associated with early initiation [13,14,15]. As cardiovascular complications are a leading cause of mortality in DMD, most individuals are treated prophylactically with therapies for cardiomyopathy such as angiotensin-converting enzyme inhibitors or angiotensin receptor blockers as early as age ten [8, 16, 17]. Precision genetic ribonucleic acid (RNA) therapies targeting the root cause of the disease have also been approved for a subset of amenable patients in an effort to slow the progression of the disease [18,19,20]. Despite DMD’s predictable course, there is significant heterogeneity among individuals in the timing and severity of functional and clinical events, making ongoing clinical evaluation to monitor the rate of disease progression necessary [4, 17]. While genetic mutations partly explain interpatient heterogeneity, with some resulting in more rapid disease progression, the underlying causes are not fully known [14, 21,22,23].

Real-world data (RWD) present an opportunity to understand the progression and burden of DMD. These data include information on diagnoses, performed procedures, dispensed medications, and inpatient stays. Sources for RWD include product and disease registries and patient cohorts, which are suited for this purpose but are associated with high costs, time, and logistical complexity, and are not always representative of the general population. Insurance claims data (with and without electronic health record [EHR] data) are another form of RWD that has been used to describe patient populations, treatment patterns, healthcare resource use, and aspects of disease progression at the cohort level [23,24,25,26,27,28]. The ability and reliability of insurance claims (with and without linked EHRs) to accurately report the progression of DMD at the individual patient level have not been determined.

Objectives

The objective of this initiative was to comprehensively examine US insurance claims and EHRs for the availability and reliability of DMD outcomes data to describe functional status and disease progression at the individual patient level over time. It was hypothesized that many DMD-relevant outcomes related to functional and clinical outcomes would be missing or underrepresented in these data sources.

Materials and Methods

Data Sources

“Closed” insurance claims data, derived from individual payers and inclusive of all relevant records for healthcare encounters for a given individual, were used from two sources. The first was Merative’s MarketScan Commercial databases, a set of large, nationally representative healthcare databases with data for employer-sponsored, privately insured employees and their families. The second was Merative’s MarketScan Multistate Medicaid claims databases, which include demographic and clinical information, inpatient and outpatient utilization data, and outpatient prescription data for 17 million individuals enrolled in Medicaid across multiple states in the USA. Clarivate, an “open” claims database where records of healthcare encounters are derived from numerous sources including EHRs, was used as a third data source and has at least one claim filed for 200 million patients across inpatient, outpatient, or pharmacy settings [29]. MarketScan and Clarivate data used for this study are de-identified and did not require institutional review board review. The authors obtained permission to access and use the data from the owners of the MarketScan and Clarivate databases.

Patient Selection

The study period was defined as April 1, 2013, through March 31, 2018 (MarketScan Commercial); January 1, 2013, through June 30, 2018 (MarketScan Medicaid); and January 1, 2011 through December 14, 2021 (Clarivate). The index date was defined as the date when the first eligible inpatient or outpatient visit with a relevant diagnostic code or medication prescription record appeared in the datasets during their respective study periods. More detailed information on diagnostic codes can be found in the Supplementary Materials section.

Patients were included if they were male, 30 years of age or younger, and met at least one of the following criteria at any time during the study period:

  • International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnostic code for hereditary progressive muscular dystrophy (359.1)

  • ICD-10-CM code for muscular dystrophy (G71.0 or, if on or after October 1, 2018, G71.01) in claims filed on or after October 1, 2015

  • Systematized Nomenclature of Medicine (SNOMED) code for DMD (76670001) in the Clarivate dataset

Patients were excluded if they met at least one of the following criteria:

  • ≥ 2 medical claims for ventilator use separated by ≥ 180 days, before 6 years of age

  • ≥ 1 medical claim with a Current Procedural Terminology (CPT) code or Healthcare Common Procedure Coding System (HCPCS) code for ankle foot orthosis or lower extremity surgery, before 3 years of age

  • ≥ 1 medical claim for a power, power-assist, and/or manual wheelchair, before 5 years of age

  • ≥ 1 medication fill (National Drug Code [NDC] 64406005801) or an injection code (HCPCS J2326, C9489) for nusinersen

In addition to the above exclusions, specific exclusion criteria were applied to the Clarivate dataset. These criteria included the presence of SNOMED code 387732009 for Becker muscular dystrophy (BMD). An exception was made for patients with BMD with a DMD-specific medication, ensuring that patients with DMD-related indications were appropriately included in the analysis even if they would have otherwise been excluded.

Patients with a record of a prescribed DMD-specific phosphorodiamidate morpholino oligomer (PMO) therapy [18, 19, 30] were included in the cohort regardless of other inclusion/exclusion criteria. All included patients were followed from their index date until death (if known), deregistration, or the end of the study period.
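The selection rules above can be sketched in code. The following is a minimal illustration only, not the study's actual implementation: the `Claim` and `Patient` record layouts, the `kind` labels, and the choice to evaluate age at the first qualifying diagnosis are all assumptions introduced here. The diagnosis and nusinersen codes are the ones quoted in the criteria; the Clarivate-specific BMD exclusion is omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import date

NUSINERSEN_CODES = {"64406005801", "J2326", "C9489"}  # NDC and HCPCS codes quoted in the text


@dataclass
class Claim:
    # Hypothetical flattened record; real claims tables are structured differently.
    service_date: date
    code: str
    kind: str = "diagnosis"  # or "ventilator", "wheelchair", "afo_or_le_surgery", "drug", "pmo"


@dataclass
class Patient:
    sex: str
    birth_date: date
    claims: list = field(default_factory=list)


def age_at(p: Patient, when: date) -> float:
    # Approximate age in years on a given date.
    return (when - p.birth_date).days / 365.25


def is_dmd_diagnosis(c: Claim) -> bool:
    if c.kind != "diagnosis":
        return False
    if c.code in {"359.1", "76670001"}:  # ICD-9-CM and SNOMED codes
        return True
    if c.code == "G71.0":  # ICD-10-CM, claims on or after October 1, 2015
        return c.service_date >= date(2015, 10, 1)
    if c.code == "G71.01":  # ICD-10-CM, claims on or after October 1, 2018
        return c.service_date >= date(2018, 10, 1)
    return False


def is_eligible(p: Patient) -> bool:
    # A PMO therapy record admits the patient regardless of other criteria.
    if any(c.kind == "pmo" for c in p.claims):
        return True
    dx_dates = sorted(c.service_date for c in p.claims if is_dmd_diagnosis(c))
    # Assumption: "30 years of age or younger" is checked at the first qualifying diagnosis.
    if p.sex != "M" or not dx_dates or age_at(p, dx_dates[0]) >= 31:
        return False
    if any(c.code in NUSINERSEN_CODES for c in p.claims):
        return False
    if any(c.kind == "wheelchair" and age_at(p, c.service_date) < 5 for c in p.claims):
        return False
    if any(c.kind == "afo_or_le_surgery" and age_at(p, c.service_date) < 3 for c in p.claims):
        return False
    vent = sorted(c.service_date for c in p.claims
                  if c.kind == "ventilator" and age_at(p, c.service_date) < 6)
    # Two or more ventilator claims before age 6 that are at least 180 days apart.
    if len(vent) >= 2 and (vent[-1] - vent[0]).days >= 180:
        return False
    return True
```

Under these assumptions, a male patient born in 2008 with a single G71.0 claim in 2016 would be retained, while the same patient with an additional nusinersen injection code would be excluded.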

DMD-Relevant Outcomes

A list of clinical and functional outcomes was ascertained from published systematic literature reviews (SLRs) [25, 31]. A total of 54 DMD-relevant outcomes were identified and independently assigned to one of five mutually exclusive categories: functional and clinical events, clinical measures, biomarkers, functional outcomes, and patient-reported outcomes (PROs) (Fig. 1). These categories provided a framework for assessing a dataset’s ability to capture and monitor DMD-relevant outcomes in individual patients. Data sources for clinical measures, functional outcomes, and PROs were identified from assessments or tests that measured the outcome of interest.

Fig. 1

DMD-relevant outcomes for which insurance claims and EHR data availability and suitability were examined

Each dataset was examined for data relevant to the prespecified outcome using a comprehensive approach that accounted for structured and unstructured fields. Structured fields were defined as having data organized in a standardized, predefined format, and included ICD diagnosis and procedure codes, CPT and HCPCS procedure and/or equipment codes, and NDCs for medication dispensations. Unstructured fields were defined as having data that were not organized in a predefined manner and included clinical notes. The examination of structured and unstructured fields for available data of interest relied on direct or indirect methods depending on the DMD-relevant outcome of interest. Data were identified directly if there were codes or keywords available that directly represented an outcome of interest (e.g., diagnoses of cardiomyopathy). If unavailable, then data for an outcome of interest were identified indirectly using codes or keywords that could be used as a proxy (e.g., records of cardioprotective medication use, which could indirectly suggest that a cardiomyopathy diagnosis had occurred).
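The direct-then-indirect lookup described above can be illustrated with a short sketch. The code lists here are illustrative assumptions only (two ICD-10-CM cardiomyopathy codes and two hypothetical drug-class labels); the lists actually used in the study are given in its Supplementary Material.

```python
# Illustrative lists only; not the study's code lists.
DIRECT_CODES = {"cardiomyopathy": {"I42.0", "I42.9"}}            # example ICD-10-CM diagnosis codes
INDIRECT_PROXIES = {"cardiomyopathy": {"ACE_INHIBITOR", "ARB"}}  # hypothetical drug-class labels


def identify(outcome, patient_codes):
    """Return 'direct', 'indirect', or 'none' for one patient and one outcome."""
    codes = set(patient_codes)
    # Direct evidence takes precedence over a proxy.
    if codes & DIRECT_CODES.get(outcome, set()):
        return "direct"
    if codes & INDIRECT_PROXIES.get(outcome, set()):
        return "indirect"
    return "none"
```

A patient with only a cardioprotective medication record would be labeled `indirect`, which flags that the diagnosis is inferred rather than observed.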

Functional and clinical events were defined as events that patients cannot regress from once they occur, and included cardiomyopathy/heart failure, LOA, mortality, respiratory insufficiency, scoliosis, and wheelchair use. Cardiomyopathy and heart failure data were identified directly with ICD codes specific to each one, and indirectly using codes for cardioprotective medications. Data availability for respiratory insufficiency was identified directly with ICD diagnosis codes for respiratory failure, and indirectly with HCPCS, CPT, and ICD procedural codes related to pulmonary management, tracheostomy, or assisted ventilation. Scoliosis data were identified directly using ICD diagnosis codes for scoliosis and HCPCS, CPT, and ICD procedural codes for spinal surgery. Despite the known lack of mortality data in most USA-based insurance claims datasets, mortality was recorded if indicated in the evaluated fields [32, 33]. Data availability and suitability were not assessed for LOA as billing codes cannot reliably determine full-time wheelchair use. Wheelchair purchase and repair data availability were assessed to inform future research but considered to be insufficient proxies for part-time or full-time wheelchair use. Although a published cross-sectional study validated algorithms for identifying nonambulatory status in individuals with DMD, the algorithms did not determine the extent of mobility loss, the timing of LOA, or disease progression until LOA is reached [28]. One additional item was assessed that does not fall under the definition of functional and clinical events but is clinically significant for patients with DMD: ventilation use [23]. Ventilation use data were also checked to inform future research, and were identified directly using HCPCS, CPT, and ICD procedural codes for tracheostomy or assisted ventilation. For more information on codes used to identify DMD-relevant outcomes for functional and clinical events, refer to the Supplementary Material.

Data availability for clinical measures and biomarkers, which are administered clinically, was examined by identifying procedure codes that indicated an assessment or test was ordered. Structured fields in Clarivate EHR data were examined for the results of ordered tests or assessments; no similar fields for test results exist within the claims datasets. Data availability for biomarkers was assessed by searching for documentation of brain/B-type natriuretic peptide, creatine kinase (CK), or dystrophin levels. Functional outcomes, which are measured through clinically administered assessments, and PROs, which are not often administered in routine clinical practice, did not have structured fields available.

Unstructured data fields were available in the Clarivate EHR data and examined by keyword, searching for information relevant to a test or assessment for clinical measures, biomarkers, functional outcomes, or PROs. The availability of data for these outcomes was assessed by reviewing information for items such as test or assessment results, panel names, test order names, and vital statistics.

Analysis Overview

Age at index date and median duration of follow-up from the index date were estimated for the cohorts from each dataset and summarized by age group using frequency and percentage of the population included in the dataset. Follow-up was summarized using median and interquartile range. Cumulative annual attrition was determined using a Kaplan–Meier curve that estimated the percentage of patients remaining in a dataset by the end of each post-index year. For each identified outcome of interest, the insurance claims and EHR data were assessed for availability and feasibility of longitudinal patient-level tracking of disease progression using the algorithm illustrated in Fig. 2. No additional statistical tests were conducted.
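As an illustration of the attrition estimate, the Kaplan–Meier (product-limit) calculation can be sketched in a few lines. This is a minimal sketch under stated assumptions: the input layout and the function name `km_retention` are hypothetical, and it assumes that leaving the dataset is the event while patients still present at the end of observation are censored. How the study classified each exit is not specified here.

```python
def km_retention(follow_up, horizons):
    """Kaplan-Meier estimate of the fraction of patients still in a dataset.

    follow_up: list of (years_observed, left_dataset) pairs. left_dataset=True
    means the patient dropped out at that time (the event); False means the
    patient was still present when observation ended (censored).
    horizons: post-index time points, in years, at which to report retention.
    """
    event_times = sorted({t for t, left in follow_up if left})
    retention = {}
    for h in horizons:
        surv = 1.0
        for t in event_times:
            if t > h:
                break
            # Patients observed at least until t are at risk at t.
            at_risk = sum(1 for d, _ in follow_up if d >= t)
            dropped = sum(1 for d, left in follow_up if left and d == t)
            surv *= 1 - dropped / at_risk
        retention[h] = surv
    return retention
```

For a toy cohort of four patients, `km_retention([(0.5, True), (1.5, True), (2.0, False), (3.0, True)], [1, 2, 3])` gives retention of 0.75 at the end of year 1, about 0.5 at year 2, and 0 at year 3. The censored patient at 2.0 years leaves the risk set without counting as attrition.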

Fig. 2

Algorithm used to assess the feasibility of insurance claims and EHR data to assess and track progression of DMD-relevant outcomes over time

Data relevant to the DMD outcomes of interest were also assessed for suitability of estimating disease severity and onset. Data were assessed for indicators that could distinguish more severe from less severe symptoms as well as indication of onset.

Finally, the overall feasibility of using available data to assess disease progression at the individual level was examined. The reporting of actual test results was required to inform whether data on the measures’ outcomes existed (as opposed to a reference to an assessment being requested or performed without the corresponding results). The prevalence of assessable measures was summarized overall for each dataset.
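The requirement that actual results be reported can be expressed as a tiered check. The sketch below is not the algorithm in Fig. 2; it is a simplified illustration, with hypothetical record keys (`ordered`, `completed`, `result`), of the distinction between a test being ordered, a test being completed, and a result being available.

```python
from enum import IntEnum


class Evidence(IntEnum):
    NONE = 0
    ORDERED = 1     # a code or note shows the test was requested
    COMPLETED = 2   # the record shows the test was performed
    RESULT = 3      # a score or value is recorded


def best_evidence(records):
    """records: iterable of dicts with hypothetical keys 'ordered', 'completed', 'result'."""
    level = Evidence.NONE
    for r in records:
        if r.get("result") is not None:
            level = max(level, Evidence.RESULT)
        elif r.get("completed"):
            level = max(level, Evidence.COMPLETED)
        elif r.get("ordered"):
            level = max(level, Evidence.ORDERED)
    return level


def assessable(records):
    # Only a recorded result makes a measure usable for tracking progression.
    return best_evidence(records) == Evidence.RESULT
```

Under this scheme, a patient whose records show only that a test was ordered or completed would not count toward the prevalence of assessable measures.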

Results

Population Characteristics

Patients meeting inclusion criteria were tabulated, resulting in an observed count of 1964 in MarketScan Commercial, 2007 in MarketScan Medicaid, and 10,639 in Clarivate. The median follow-up was 1.9 years in MarketScan Commercial, 3.4 years in MarketScan Medicaid, and 6 years in Clarivate. The Clarivate cohort included a greater proportion of younger patients compared to both MarketScan cohorts. Approximately 50% of the Clarivate cohort consisted of patients aged 10 years or younger, compared to 30.0% for the MarketScan cohorts (Table 1). Data fields for, and patient-level information on, genetics, race and ethnicity, and socioeconomic status were limited across datasets.

Table 1 Patient characteristics and length of follow-up in MarketScan Commercial, MarketScan Medicaid, and Clarivate datasets

Higher cumulative attrition rates were observed in the MarketScan Commercial cohort compared to the MarketScan Medicaid and Clarivate cohorts. For example, at the end of the fourth year post-index, 37.4% of patients remained in the MarketScan Commercial dataset compared to 60.3% in the MarketScan Medicaid dataset and 68.1% in the Clarivate dataset. The Clarivate cohort had the lowest attrition for each year post-index, particularly after the first year, in comparison to the MarketScan Medicaid and Commercial cohorts. During the fifth year post-index, 31.7% of individuals in the MarketScan Commercial cohort and 35.1% of the MarketScan Medicaid cohort remained. In contrast, retention within the Clarivate cohort was substantially higher (59.1%).

DMD-Relevant Outcomes

Functional and Clinical Events

Out of the six functional and clinical events, data were available for five and four events within the MarketScan and Clarivate datasets, respectively (Tables 2, 3). Cardiomyopathy was observed in the highest proportion of patients compared to other events in every dataset. The prevalence of claims for cardiomyopathy or heart failure was relatively consistent across the datasets, with 45.5% (MarketScan Commercial), 48.0% (Clarivate), and 48.5% (MarketScan Medicaid) of patients having at least one filed claim. Respiratory insufficiency was observed in the second highest proportion of patients across all three datasets, with 31.9% (Clarivate), 34.2% (MarketScan Commercial), and 38.1% (MarketScan Medicaid) of patients having at least one filed claim. Although no data were available that could appropriately measure frequency or intensity of wheelchair use, 34.0%, 45.2%, and 56.3% of patients in the Clarivate, MarketScan Commercial, and MarketScan Medicaid datasets, respectively, were observed to have at least one claim for wheelchair purchase or repair. At least one claim for ventilation use was observed for 23.8% (MarketScan Commercial), 27.9% (MarketScan Medicaid), and 28.2% (Clarivate) of patients included in the datasets. Limited data were available to indicate mortality, and in MarketScan these were restricted to inpatient claims filed before 2016 (Table 4).

Table 2 Availability and reliability of direct and indirect outcomes data in the MarketScan claims and Clarivate datasets
Table 3 DMD-specific outcomes with data in the MarketScan and Clarivate datasets that are available and reliable
Table 4 Proportion of patients with available data for key functional and clinical events in MarketScan Commercial, MarketScan Medicaid, and Clarivate datasets
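The proportions reported for functional and clinical events are counts of patients with at least one qualifying claim, not counts of claims. A minimal sketch of that calculation follows; the input layout and the function name `pct_with_any_claim` are assumptions made for illustration.

```python
def pct_with_any_claim(claims, cohort_ids, event):
    """Percentage of cohort patients with at least one claim for `event`.

    claims: iterable of (patient_id, event_label) pairs.
    cohort_ids: identifiers of every patient in the dataset's cohort.
    """
    cohort = set(cohort_ids)
    # A set of patient IDs counts each patient once, however many claims they have.
    with_event = {pid for pid, label in claims if label == event and pid in cohort}
    return 100 * len(with_event) / len(cohort)
```

Because repeated claims for the same patient collapse to a single count, the figure describes how many patients have any record of the event and says nothing about its timing or severity.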

Clinical Measures

Although the datasets contained some records of clinical measures, there were substantial limitations to their availability and completeness (Table 2). Fewer than 2% of patients in the Clarivate dataset had data available for any individual clinical measure, with only a subset of these having measure scores or results available. Among these patients, records were available that indicated seven of the 13 identified assessments measuring clinical measures of interest were performed: echocardiogram, cardiac magnetic resonance imaging (MRI), forced vital capacity (FVC), peak flow, pulse oximetry, bone mineral density, and left ventricular test. Additional claims were identified in the MarketScan Commercial datasets indicating that assessments for spirometry and cardiac MRI were ordered. No information indicated whether these tests were performed, and no test results were available. Consequently, there was no opportunity to understand whether the test indicated abnormal findings or determined clinical or functional severity.

Biomarkers

Records were available that indicated tests were ordered for all three biomarkers of interest in the MarketScan datasets (Table 2). Additional information on test results, or on whether the biomarker tests were completed, was unavailable. For each biomarker of interest (brain/B-type natriuretic peptide, CK, and dystrophin), fewer than 2% of patients in Clarivate had at least one claim to indicate a test was ordered. Only five of these patients, less than 0.05%, had information available that indicated a dystrophin test was completed, and only two of these patients had test results available.

Functional Outcomes and PROs

In the MarketScan databases, structured codes for documenting the administration of assessments that measured functional outcomes relevant to DMD were not available (Table 2). Similarly, in the Clarivate claims dataset, the availability of data on these assessments was very limited. Two records for the six-minute walk test (6MWT) were identified for a single patient. No records for other ambulatory assessments, such as the North Star Ambulatory Assessment (NSAA), were identified. No data were available across any of the datasets for any of the PROs of interest.

Discussion

In prior DMD research, RWD have been used to estimate the prevalence of selected outcomes at the cohort level, and document an increase in disease burden as DMD progresses [23,24,25, 28, 34, 35]. There is growing interest in potentially using RWD for tracking individual trajectories for clinical and economic insights. To do this successfully, RWD need to be able to provide information on the occurrence, timing, and severity of relevant DMD outcomes over a long enough period of time to observe disease progression [36].

This study comprehensively examined the availability and suitability of longitudinal, patient-level RWD for understanding the trajectory of DMD-specific outcomes for functional and clinical events, clinical measures, biomarkers, functional outcomes, and PROs. In doing so, this study better characterized the significant challenges associated with using insurance claims and EHR data to longitudinally assess DMD outcomes of clinical importance at the individual level.

Data availability for DMD-specific outcomes was limited to very few functional and clinical events, restricted to more severe stages of DMD. As a result, available data are insufficient and unreliable for assessing disease progression at the patient level. Few billing codes exist for earlier ambulatory milestones such as delays in walking or gait problems. The reason for a provider’s choice of a billing code, the timing of functional and clinical events, and disease severity cannot be determined. For example, an observed code for a heart failure medication does not necessarily indicate whether a physician prescribed the medication for prophylactic or acute treatment [8, 16, 37]. Clinical measures and biomarker data were largely absent across all examined datasets. While a small number of patients had codes indicating a test was ordered, the results were not routinely recorded, preventing further assessment for abnormal findings or determination of clinical or functional severity of disease. Data routinely collected in clinical trials for biomarkers, functional assessments, or validated PROs were unavailable across all datasets, highlighting the lack of opportunity to assess patient HRQoL and function. This is a significant finding because it indicates that claims datasets, with or without EHR data, are unsuitable for identifying where individual patients with DMD are in their disease trajectory. Assessments such as the 6MWT or NSAA are commonly used in clinical trials because they are effective at measuring disease severity and identifying prevalent and incident functional events. However, such measures may not be routinely collected in clinical practice [38].

There were challenges with the data available to assess LOA and wheelchair use that should be noted. While claims for purchases and repairs of wheelchairs were identified, they can be viewed as an insufficient proxy for actual frequency of wheelchair use, which is necessary for determining LOA [23]. Wheelchair billing codes are designed to account for associated costs incurred by the healthcare system and do not document frequency of use, which is needed for a nuanced understanding of an individual’s disease stage. Similarly, wheelchair billing codes do not capture patient use of wheelchairs obtained privately outside of the healthcare system [34]. Taken together, records to better understand LOA, as defined in clinical research (full-time wheelchair use), were not available [39, 40]. Functional and clinical events data for DMD milestones that typically occur after LOA, particularly around cardiovascular and pulmonary events, are better captured but also face some limitations. Although billing codes were used to determine data availability for cardiomyopathy or heart failure, claims data alone cannot confirm whether care was provided to prevent, manage, or acutely treat cardiomyopathy. Additionally, understanding whether billing codes, such as those for scoliosis, generally documented before or during the early mid-teens, reflect incident diagnosis of the condition was difficult. Identifying incident events for chronic conditions, including scoliosis, requires a sufficient baseline period in addition to adequate follow-up time. For DMD, patient age and sufficient follow-up time could help in understanding whether a functional or clinical event is incident. However, this effort is challenging for two reasons: the structure of claims data, and the way data was captured across the databases. 
Together with the absence of functional outcomes and PROs, these limitations to data availability have important implications for using claims and EHR data to understand crucial aspects of patient status or disease progression. Lastly, the unavailability of genetic data, race and ethnicity, and socioeconomic status makes it challenging to obtain further clinical or economic insights, particularly at time of diagnosis [13].
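The dependence of incident-event identification on a baseline period can be made concrete with a small sketch. The 365-day baseline window, the function name, and the labels are assumptions chosen for illustration; the study does not specify a window.

```python
from datetime import date, timedelta


def classify_first_claim(enroll_start, first_claim, baseline_days=365):
    """Label the first observed claim for a chronic condition such as scoliosis.

    'incident' requires at least `baseline_days` of observed, claim-free time
    before the claim. With a shorter baseline, the condition may predate the
    patient's entry into the dataset, so the label is 'indeterminate'.
    """
    if first_claim is None:
        return "no_claim"
    if first_claim - enroll_start >= timedelta(days=baseline_days):
        return "incident"
    return "indeterminate"
```

With short median follow-up and high attrition, many first claims would fall inside the baseline window and remain indeterminate, which is the practical difficulty described above.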

Existing insurance claims and EHR data infrastructures are increasingly used to inform innovative access and reimbursement models such as outcomes-based agreements (OBAs) [41,42,43,44]. However, implementing OBAs requires the ability to define and easily measure health outcomes applicable to most patients and that can be observed at the individual level in real-world settings over a period of time [15, 41,42,43,44,45]. These findings suggest that there are challenges for assessing individual patient-level data in DMD if depending on claims or EHR data alone.

Study Strengths and Limitations

This study’s strengths included use of multiple, large real-world datasets that represented multiple payer types and included both children and young adults in the USA. Some EHR data were also included, which contained information not found in insurance claims. The examination of RWD availability and suitability for longitudinally assessing outcomes in individual patients was comprehensive, covering DMD-related outcomes identified from multiple SLRs.

A few study limitations should be noted. The study sample may not be representative of real-world DMD population segments associated with each institution, payer type, or jurisdiction. As a result, study findings may not be generalizable to all commercial and Medicaid payer segments in the USA. Commercial and Medicaid MarketScan data structure and categorization limitations were consistent with what is expected of insurance claims. The open nature of Clarivate’s EHR-linked claims data means that not all relevant claims for an individual were necessarily captured. This introduces significant challenges in understanding whether claims for a functional or clinical event are incident. The inclusion of LOA, as defined in this study, can be viewed as a limitation. Clinical research uses this same definition for LOA to establish treatment efficacy. However, this may not be applicable to clinical practice settings, where LOA can be measured differently. Lastly, the SLRs used to inform which DMD-relevant outcomes to use in the dataset examination did not identify the Patient-Reported Outcomes Measurement Information System (PROMIS), a measure used in clinical research and more recently in clinical practice, as a PRO. To address this limitation, PROMIS data availability was checked for in the EHR unstructured fields and found to be absent.

Directions for Future Research

The challenges of using existing RWD sources to understand patient health and disease progression in DMD at the individual level suggest that other data collection methodologies and sources could be needed. These may include other insurance claims databases with linked EHRs, or patient/caregiver-reported outcome studies and other observational studies of longitudinal, patient-level data. Retrospective chart reviews are a potential alternative for generating real-world evidence when outcomes of interest are routinely collected in clinical practice but not represented in claims or EHR systems. The costs, time, and logistical complexity associated with these other data collection approaches can present challenges for longitudinal patient-level research. Emerging data sources such as electronic PROs (ePROs), wearables, and other digital clinical or functional assessment tools can be further explored as potential RWD solutions for assessing individual DMD outcomes.

Conclusion

Results from this study suggest that insurance claims and open claims with some linked EHR data do not contain many of the most relevant outcomes for understanding and holistically evaluating the progression and burden of DMD at the individual patient level.