Introduction

In X-linked myotubular myopathy (XLMTM), mutations in the MTM1 gene, which encodes the myotubularin protein, lead to myotubularin deficiency and a rare, life-threatening congenital myopathy [1] with an estimated incidence of 1 in 40,000–50,000 newborn males [2]. Myotubularin is required for normal development, maturation, and maintenance of skeletal muscle cells; thus, deficiency affects skeletal muscles throughout the body [1, 3, 4]. Historically, approximately half of affected male infants died before 18 months of age, most often from respiratory failure [5,6,7]. These children rely on intensive medical intervention for survival [5,6,7].

Natural history studies have demonstrated multisystem involvement in XLMTM and cumulative morbidities that contribute to extensive disease burden. These include respiratory failure and resulting need for invasive ventilator support, feeding and swallowing impairments and associated aspiration risks leading to need for gastrostomy tube feeding, inability to ambulate independently requiring wheelchair use, absent or delayed attainment of major motor milestones, scoliosis and other musculoskeletal morbidities, and hepatobiliary disease and hepatic vascular abnormalities, such as peliosis [5, 6, 8,9,10].

Presently, there are no approved treatments that alter the disease pathology in XLMTM and no established consensus on disease management. Multidisciplinary supportive care for the aforementioned multisystemic manifestations of chronic, profound muscle weakness is required to prolong life and support mobility [5,6,7]. Because XLMTM involves multiple organ systems, its management is broad and complex, and diagnosis, which is established or confirmed by genetic testing, can involve numerous diagnostic procedures and extensive health care resource utilization. Direct medical costs for children with XLMTM were recently estimated at $897,978 per patient in the first year of life and nearly $1.5 million in total for patients who survive the first 4 years [11].

Although significant numbers of carrier females experience variable degrees of disability [12], we limited this analysis to males with XLMTM as they represent the large majority of affected individuals and experience complete penetrance of pathogenic mutations in the hemizygous state [2]. To further refine our understanding of the disease burden imposed by XLMTM, we applied a novel approach to claims analysis that combined a comprehensive insurance claims database with artificial intelligence capabilities. The analysis quantified multidisciplinary disease management; multiple comorbid conditions; and the need for hospitalization, invasive ventilation, and feeding support. In addition, the release of an International Classification of Diseases, 10th Revision (ICD-10) code for XLMTM in October 2020 allowed for more precise identification and assessment of these patients.

Methods

Data collection

We set out to identify a known cohort of male XLMTM patients within IPM.ai’s (Florham Park, NJ, USA) licensed patient-level data, consisting of US medical and pharmacy claims for more than 300 million unique patients across commercial and Centers for Medicare & Medicaid Services (CMS) payors. Typically, Boolean logic can be applied to medical coding (diagnosis, prescription, and procedure codes) using the relevant ICD-10 code to arrive at the closest approximation of the confirmed patient cohort. However, until the approval of a specific ICD-10 code for XLMTM in October 2020, XLMTM was most commonly grouped with other myopathies under the G71.2 (congenital myopathies) ICD-10 code, which is not specific to XLMTM. Therefore, we needed to leverage an external data source to more accurately define an XLMTM patient cohort.

A de-identified dataset of XLMTM patients from an existing research registry of molecularly confirmed XLMTM patients at Boston Children’s Hospital was utilized to define a derivation set of patients. In order to maintain patient privacy, the statistical authors (IPM.ai, Florham Park, NJ, USA) engaged a third-party application (Datavant, San Francisco, CA, USA) to complete HIPAA-compliant patient tokenization (i.e., the process of replacing sensitive data with non-sensitive placeholders called “tokens”). The datasets in this publication were linked using privacy-preserving record linkage provided by Datavant [13]. Datavant’s solution uses personally identifiable information to create encrypted tokens, which are de-identified, unique and irreversible patient keys. Tokens are used to match individual records across different data sets without ever exposing the personally identifiable information of the patient to whom each record belongs. Thus, protected health information was not shared between the patient registry and the IPM.ai claims dataset.
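
The general idea of token-based record linkage can be illustrated with a minimal Python sketch. It is not Datavant's method, which is not described in this paper; the keyed SHA-256 hash, the choice of fields, the normalization rules, the shared key, and all patient details below are assumptions made up for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lower-case, trim, and strip accents so that trivial formatting
    differences between sources do not change the resulting token."""
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_token(secret_key: bytes, first: str, last: str, sex: str, dob: str) -> str:
    """Return a keyed, one-way token for one record (hypothetical scheme)."""
    message = "|".join(normalize(field) for field in (first, last, sex, dob))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Each data holder tokenizes its own records with the same key (made-up data).
KEY = b"example-shared-secret"
registry_tokens = {make_token(KEY, "José", "Smith", "M", "2015-03-02")}
claims_tokens = {
    make_token(KEY, " jose", "SMITH", "m", "2015-03-02"),
    make_token(KEY, "Alex", "Jones", "M", "2012-11-30"),
}

# Only tokens are compared; the identifying fields never leave their source.
matched = registry_tokens & claims_tokens
print(len(matched))
```

In this sketch the two spellings of the same hypothetical patient produce the same token, so the record matches across datasets even though neither side sees the other's identifiers.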

The dataset consisted of 113 de-identified patient tokens, of which 76 matched the claims dataset. By engaging a genetic testing company (Invitae, San Francisco, CA, USA), Astellas Gene Therapies also purchased data on XLMTM patients confirmed through diagnostic testing. Through the same third-party tokenization software, IPM.ai incrementally identified four additional XLMTM patients, increasing the study population to 80 patient tokens matched in the claims dataset.

Finally, since the approval of ICD-10 diagnosis code G71.220 for XLMTM in October 2020, 135 male patients have been coded with G71.220, 23 of whom were previously identified through patient registry or genetic testing tokens, yielding a net increase of 112 patients for a total analysis population of 192 unique patients.
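
The cohort arithmetic above amounts to a union of overlapping patient sets. The Python sketch below reproduces the reported counts with synthetic identifiers; the identifier names are placeholders, and which specific patients overlapped is an assumption made only so the numbers work out.

```python
# Synthetic identifiers only; the set sizes mirror the counts reported in the text.
registry_matched = {f"tok{i}" for i in range(76)}        # registry tokens found in claims
genetic_testing_new = {f"tok{i}" for i in range(76, 80)}  # four additional patients
token_cohort = registry_matched | genetic_testing_new

# 135 males coded G71.220, 23 of whom were already in the token cohort.
icd_coded = {f"tok{i}" for i in range(23)} | {f"icd{i}" for i in range(112)}

newly_identified = icd_coded - token_cohort       # patients added by the ICD-10 code
analysis_population = token_cohort | icd_coded    # each patient counted once

print(len(token_cohort), len(newly_identified), len(analysis_population))
```

Using a set union rather than adding the raw counts is what keeps the 23 overlapping patients from being counted twice.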

Medical code grouping

The research team analyzed individual medical codes as governed by Healthcare Common Procedure Coding System (HCPCS), Current Procedural Terminology (CPT), and the ICD-10. While each code carries specific information on care delivered to the patient, it is often the case that a diagnosis may have multiple sub-codes. Reporting the frequency with which a cohort of patients presents with a diagnosis of interest requires grouping like codes together, so as to track the overall occurrence of the disease. For example, a patient with scoliosis may have in their history a code for “Scoliosis, unspecified” (M41.9), as well as “Other forms of scoliosis, lumbar region” (M41.86). If each code were reported independently, one could not ascertain whether the same patient appeared with one or both codes. To ensure the number of patients with each comorbidity or procedure was as accurate and comprehensive as possible, a cross-functional team of physicians reviewed and aligned on the individual medical codes that were deemed appropriate to group together.
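
The effect of grouping can be shown with a short Python sketch. The grouping table and claim rows are hypothetical and far smaller than the physician-reviewed code sets used in the study; only the scoliosis codes and the congenital myopathy codes named in this paper are used.

```python
from collections import defaultdict

# Hypothetical grouping table mapping individual codes to a reporting group.
CODE_GROUPS = {
    "M41.9": "Scoliosis",
    "M41.86": "Scoliosis",
    "G71.2": "Congenital myopathy",
    "G71.220": "Congenital myopathy",
}

# Made-up claim rows: (patient identifier, diagnosis code).
claims = [
    ("p1", "M41.9"),
    ("p1", "M41.86"),
    ("p2", "M41.9"),
    ("p3", "G71.2"),
    ("p1", "G71.220"),
]

patients_by_group = defaultdict(set)
for patient_id, code in claims:
    group = CODE_GROUPS.get(code)
    if group is not None:
        patients_by_group[group].add(patient_id)  # a set counts each patient once

group_counts = {group: len(ids) for group, ids in patients_by_group.items()}
print(group_counts)
```

Counting per individual code would report two patients for M41.9 and one for M41.86, three in total, although only two distinct patients in this example have scoliosis.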

Physician identifier remapping

Health care providers are most commonly identified in data using their National Provider Identifier (NPI), a unique 10-digit identification number containing their profile detail, including name, address, and self-reported specialty. It is not unusual for provider specialty data to be inaccurate, whether due to further sub-specialization or original mis-capture of the data. The research team conducted a process of reconciling NPI details by constructing a web-scraping tool and overriding provider specialty using the most recent data captured on their practice biography. All data contained in the analysis—namely, care utilization by specialty—are based on this revised NPI information.
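
The override step can be sketched in Python as follows. The web-scraping tool itself is not described in this paper, so the sketch assumes its output is already available as a simple lookup from NPI to corrected specialty; the NPI values and specialties are placeholders, not real providers.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Provider:
    npi: str        # 10-digit National Provider Identifier (placeholder values below)
    specialty: str  # self-reported specialty from the provider registry


def remap_specialties(providers, overrides):
    """Return new provider records, replacing the specialty wherever a
    corrected value exists and keeping the self-reported value otherwise."""
    return [replace(p, specialty=overrides.get(p.npi, p.specialty)) for p in providers]


providers = [
    Provider("0000000001", "Pediatrics"),
    Provider("0000000002", "Neurology"),
]
# Hypothetical corrections taken from practice biographies.
overrides = {"0000000001": "Pediatric Pulmonology"}

remapped = remap_specialties(providers, overrides)
print([p.specialty for p in remapped])
```

Returning new records instead of editing in place keeps the original self-reported specialty available for auditing.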

Approval for use of Boston Children’s Hospital Registry

One of the authors (Dr. Beggs) directs a research study with an umbrella protocol (Protocol number 03-08-128R) approved by the Boston Children’s Hospital institutional review board (IRB) entitled “Molecular Analysis of Neuromuscular Disease.” Under the terms of this protocol, the informed consent form specifies that de-identified data and samples may be shared with researchers outside Boston Children’s Hospital for related studies of neuromuscular disease led by other qualified investigators, academic institutions, and businesses. To permit use of the tokenization software behind the Boston Children’s Hospital secure firewall, the protocol and software package underwent security review by the Boston Children’s Hospital Clinical Research Informatics Team, which approved its use for the described purpose of de-identifying the Beggs Study cohort of XLMTM patients. That review was accepted by the IRB, which subsequently approved an amendment to the IRB protocol allowing tokenization and sharing of tokens with IPM.ai and Astellas Gene Therapies (formerly Audentes Therapeutics). All storage, manipulation, and processing of patient identifiers to generate tokens were performed on secure encrypted computing devices protected by the Boston Children’s Hospital institutional firewall.

Statistical analysis

HIPAA suppression logic was applied to every count of three or fewer patients to remove the risk of reidentification. All such counts are reported as “≤ 3”, and no efforts were made to reidentify patients or reverse engineer any patient volumes. The absence of a code does not necessarily mean that the event did not occur, because coverage gaps are common in open claims datasets. Unless suppressed for compliance purposes, all cohort n sizes are indicated. Where n sizes are 30 or fewer patients, analyses should be interpreted as directional. All analyses were performed on an open claims dataset, with queries written in SQL. Descriptive statistics (frequencies and proportions) were generated for categorical variables.
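
The two reporting rules in this section can be written as a short Python sketch. The study's queries were written in SQL and are not reproduced here; the function names, and the choice to report a count of zero as “≤ 3” as well, are assumptions made for illustration.

```python
SUPPRESSION_THRESHOLD = 3   # counts at or below this value are masked
DIRECTIONAL_THRESHOLD = 30  # cohorts at or below this size are directional only


def display_count(n: int) -> str:
    """Return the reportable form of a patient count."""
    if n <= SUPPRESSION_THRESHOLD:
        return "≤ 3"
    return str(n)


def is_directional(n: int) -> bool:
    """True when a cohort is small enough to be interpreted as directional."""
    return n <= DIRECTIONAL_THRESHOLD


for count in (2, 4, 30, 146):
    print(display_count(count), is_directional(count))
```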

Results

Patients and providers

Our analysis included 192 males with a diagnosis of XLMTM, including 80 patient tokens and 112 patients classified with the new XLMTM ICD-10 code (Fig. 1). Fifty percent of patients were 0 to 4 years of age when they received their first congenital myopathy diagnosis (Fig. 2). At the time of the last reported claim for each patient, 121 patients (63%) were < 18 years old, 31 patients (16%) were 18–24 years old, 20 patients (10%) were 25–34 years old, and 20 patients (10%) were ≥ 35 years old.

Fig. 1 Flow diagram illustrating methods of ascertainment of study cohort subpopulations and the flow of information through the tokenization and data extraction process

Fig. 2 Age at first known congenital myopathy diagnosis in XLMTM patients (N = 184). Actual first congenital myopathy claim could be at an earlier age depending on the amount of patient history in the dataset. Eight patients did not have a congenital myopathy diagnosis

XLMTM patients with a congenital myopathy diagnosis received care from a variety of specialty practices (Table 1). Approximately half were seen in pulmonology (53%) and pediatric practices (47%) and about one-third were seen by neurology (34%) and critical care medicine (31%). Regionally, most of the health care providers caring for XLMTM patients were located in the eastern half of the US and along the west coast (Additional file 1: Fig. S1).

Table 1 Specialists seen by XLMTM patients with a congenital myopathy diagnosis (N = 124)

During the study period from 2016 to 2020, the number of patients with claims each year increased steadily from 120 to 154 and the average number of claims per patient per year increased from 93 to 134 (Fig. 3).

Fig. 3 Claims activity by year

Diagnostic and therapy utilization

The most common conditions and procedures related to XLMTM (Table 2) reflect high disease burden, including respiratory events (82%), ventilation management (82%), tracheostomy (64%), feeding difficulties (81%), feeding support (72%), gastrostomy (69%), scoliosis (42%), other myopathies (77%), wheelchair equipment (56%), and critical and emergency care (71%). Among patients who experienced respiratory events, most experienced acute respiratory distress or arrest (54%), nearly all experienced chronic respiratory events (96%), and others had medical coding that classified respiratory issues as acute and chronic (46%). Medical codes were classified as acute when explicitly stated as “acute” in the ICD-9 or ICD-10 code; medical codes were classified as chronic when explicitly stated as “chronic” in ICD-9 or ICD-10 coding or could not be ascertained from the code description as acute or chronic; any codes described as “acute and chronic” were labeled as such.

Table 2 Common conditions and procedures/services related to XLMTM in the overall XLMTM population with a diagnosis of congenital myopathy

The most commonly prescribed medications (Fig. 4) were those often used to treat respiratory illness and infection, including albuterol (38%), amoxicillin-clavulanate (32%), fluticasone propionate (31%), and amoxicillin (30%). The most frequently used diagnosis and procedure codes (Table 3), however, were for investigation of various cholestatic and liver abnormalities, including gamma-glutamyl transferase [25]; abnormal serum enzyme, other or unspecified [21]; jaundice [17]; elevation of aspartate aminotransferase (AST), alanine aminotransferase (ALT), or lactate dehydrogenase (LDH) [12]; obstruction of gallbladder or bile duct [11]; and cholelithiasis [10]. Abnormal liver function/imaging was present in 16 (8%) patients, disorders of bilirubin metabolism in 9 (5%) patients, and the combination of abnormal serum enzyme levels and disorders of bilirubin metabolism in 27 (14%) patients.

Fig. 4 Common medications

Table 3 Frequency of applicable diagnosis and procedure codes searched across 192 patients

Inpatient claims

The total number of observed annual inpatient claims increased from 5,914 in 2016 to 11,513 in 2020 and the average number of inpatient claims per patient annually increased from 57 to 92 over the same time period (Table 4).

Table 4 Average number of annual inpatient claims across all patients in aggregate

Hospitalizations, excluding normal birth deliveries, were analyzed in aggregate (n = 146), then further segmented into unplanned hospitalizations (n = 107) and planned hospitalizations (n = 121) (coding provided in Additional file 1: Table S1). Approximately half of patients (55%, n = 80) experienced their first hospitalization between 0 and 4 years of age. Across all age groups, 76% of patients experienced any hospitalization, 55% an unplanned hospitalization, and 63% a planned hospitalization (Fig. 5). The greatest proportion of patients experienced 1 to 2 hospitalizations (31% any, 27% unplanned, and 32% planned) (Fig. 6).

Fig. 5 Age at first hospitalization (any, unplanned, and planned)

Fig. 6 Number of hospitalizations (any, unplanned, and planned)

Discussion

This study utilized a novel claims data analysis involving a comprehensive insurance claims database and artificial intelligence capabilities to quantify the disease burden of XLMTM. Results document the substantial disease burden of XLMTM, including extensive health care resources needed to care for these patients, specifically respiratory and ventilator support, tracheostomy, gastrostomy tube feeding, wheelchair use, critical and emergency care, medications for respiratory illness and infections, and hospitalizations. Caring for these patients required multispecialty involvement (pulmonology, pediatrics, neurology, and critical care medicine) and involved high rates of multiple hospitalizations, surgical procedures, and use of assistive devices. Notably, the average numbers of inpatient claims per year for patients aged 0 to 4 and 5 to 12 years were 23.2 and 23.8, respectively. Among age- and gender-matched controls in the IPM.ai claims database representing the general population, the average number of inpatient claims per year was 0.3 for both age groups (Table 5). Interestingly, the number of claims increased between 2016 and 2020. It is not apparent from the data what might account for this, but the shift from ICD-9 to ICD-10 in October 2015 may have increased granularity in the later years, and the new XLMTM-specific ICD-10 code and increased genetic testing in general might have contributed to improved data capture.

Table 5 Average inpatient claims per year for patients with XLMTM and the general population

These findings are largely consistent with the high disease burden associated with need for mechanical ventilation, gastrostomy tube feeding, and wheelchair requirement reflected in the limited natural history studies of XLMTM [5,6,7,8,9], including the RECENSUS retrospective chart review of 112 males with XLMTM. This is likely due, in part, to a high degree of patient overlap between the RECENSUS study cohort [5, 6] and this claims analysis data set (76 of the patient tokens in our study were also RECENSUS patients). However, the RECENSUS study focused on disease manifestations and recorded medical management and was limited by incomplete availability of data, while our all-claims analysis represents a more complete overview of all aspects of the patients’ medical care. For example, in the RECENSUS study, codes for diagnosis of hepatobiliary disease were among the most common ICD-10 codes utilized despite hepatobiliary disease not being a primary manifestation of XLMTM. Use of a certain ICD-10 code, however, does not mean the patient has the disease of interest. Physicians often refer XLMTM patients to gastroenterology for liver evaluation, using a code for liver abnormality despite the absence of overt liver disease. This finding of high use of hepatobiliary ICD-10 codes highlights an area requiring further study in light of recent reports of hepatobiliary abnormalities as an under-appreciated aspect of XLMTM, including cholestasis [14, 15], hypertransaminemia and hyperbilirubinemia with hepatopathy and cholestasis [16, 17].

Our claims analysis expands on the findings from an economic analysis by Sacks et al. showing very high medical costs and intense health care resource utilization associated with caring for patients with XLMTM [11]. First hospitalization between 0 and 4 years of age for 55% of patients (n = 80) in our study supports Sacks et al.’s estimate of mean monthly per patient direct medical costs being highest in the first year of life ($74,831), with declining costs over the second, third, and fourth years of life ($23,207, $13,044, and $9,440, respectively) [11]. In addition, inpatient admissions ($69,025) accounted for the majority of the mean monthly per patient direct medical costs [11]. Both studies identified the major burden of ventilator support and ventilation management. Moreover, our study quantified the extensive use of tracheostomy, gastrostomy tube feeding, wheelchairs, occupational and physical therapy, and home care services. The Sacks et al. analysis estimated the mean monthly per patient prescription medication cost at $540 [11], and we have shown that the most common medications prescribed for these patients are medications for respiratory illnesses and infections.

The use of de-identified tokens for record matching across research consortia and between identified research databases and anonymized public databases has been growing [18], and a recent study reported 99% precision for matching among 20,002 record pairs when first name, last name, gender, and date of birth were tokenized [19]. However, as this analysis employed a novel use of artificial intelligence capabilities on both commercial and CMS payor databases, namely Medicaid claims, it has inherent limitations. Capture of open claims data relies on billing codes that may not accurately reflect the care delivered or the clinical reasoning for ordering a given test or procedure. In addition, the existing challenges of using nonspecific congenital myopathy ICD-10 codes in research on XLMTM were further complicated by the release of the new XLMTM-specific diagnosis code during the study time frame. Given the newness of this code, it will be important to validate how rigorously it is being applied, and whether care providers are limiting its use to patients with genetically confirmed diagnoses. Finally, the need to group like codes together into a code set for reporting some data limited the granularity of our analysis. Similarly, our age-group coding was structured to group events within the patients’ first 4 years of life in order to avoid reporting distinct patient counts of fewer than four patients. However, this approach precludes looking specifically at coding for events within the first year of life, which could be useful in the setting of XLMTM research. We also saw an unexpectedly high number (n = 20) of XLMTM patients older than 35 years, 17 of whom were identified on the basis of the ICD-10 code, which raises the question of whether some older males with congenital myopathy on muscle biopsy are being coded as XLMTM without the benefit of genetic testing. Thus, although at least three patients with molecular confirmation of MTM1 mutations were over the age of 35, analysis of age at diagnosis and first hospitalization (Figs. 2 and 5) should be considered in light of the possibility that some of the other 17 might reflect overuse of the XLMTM ICD-10 diagnostic code.

Due to the nature of open claims databases, our findings likely underestimate the XLMTM disease burden. While the IPM.ai dataset (and others like it) represents CMS and privately insured patients, there are coverage gaps when using open claims data. In particular, the available in-hospital data often include only admission and discharge data, and the events happening within the hospital stay are not captured. Nonetheless, our analysis is more broadly representative of the general US population than the more idiosyncratic RECENSUS dataset, which was based on a research program from a specific patient registry.

While currently there are no disease-modifying therapies approved to treat XLMTM, several novel therapeutic strategies have shown promise in pre-clinical disease models, including tamoxifen [20], PIK3C2B inhibition [21], dynamin 2 reduction [22], and intramuscular injections of either adeno-associated virus (AAV)-shRNA or AAV-mediated gene replacement therapy (resamirigene bilparvovec) [23,24,25,26]. Our improved understanding of the unmet medical burden reflected in these claims data will enable rigorous and appropriate risk–benefit considerations as these and other therapies are developed and become available.

Conclusions

To our knowledge, this is the first analysis to combine a comprehensive claims database with artificial intelligence capabilities to quantify the extensive disease burden and health care resources needed to provide patients with XLMTM the care they rely on for survival. These data will be valuable for assessing healthcare resource utilization for future disease-modifying therapies relative to the current supportive care.