Key Points for Decision Makers

Patient characteristics associated with high healthcare costs in generalized myasthenia gravis (gMG) in the USA included MG exacerbation at presentation, hospitalizations before MG diagnosis, and immunoglobulin and monoclonal antibody use after diagnosis.

Identifying and targeting amenable characteristics that are predictors of high costs in gMG may reduce cost for payers and provide priority guidance for healthcare providers treating the disease.

1 Introduction

Myasthenia gravis (MG) is an autoimmune disease characterized by fluctuating muscle weakness resulting from binding of autoantibodies to components at neuromuscular junctions, primarily acetylcholine receptor [1, 2]. The prevalence of MG in the USA was estimated at 37 per 100,000 persons, according to a claims-based analysis using population estimates from the 2021 US census [3]. Although symptoms caused by ocular muscle weakness such as eyelid drooping and double vision are the most common at MG presentation, about 85% of patients with ocular onset eventually develop symptoms in non-ocular muscles and are said to have generalized MG (gMG) [1]. gMG symptoms may manifest as fatigue, difficulty speaking and swallowing, and weakness in neck, limb, and respiratory muscles. Acute exacerbation of gMG symptoms can lead to myasthenic crisis, a life-threatening condition that requires intensive care and mechanical ventilation [1, 4].

The management of gMG generally consists of symptomatic treatment with acetylcholinesterase inhibitors to restore muscle strength; immunosuppression with corticosteroids and/or non-steroidal immunosuppressants should be used in those who do not achieve satisfactory outcomes with symptomatic treatment [1, 5]. Thymectomy should be considered early in the disease to minimize immunotherapy dose or duration [6]. For muscle-specific tyrosine kinase–positive gMG, rituximab may be considered following initial immunotherapy. For severe, refractory, acetylcholine receptor–positive gMG, eculizumab may be used [6]. Plasma exchange (PLEX) and intravenous immunoglobulin (IVIg) are the mainstay for managing myasthenic crises [5]. Newer gMG therapies approved or under investigation include monoclonal antibodies targeting different immune system components such as immune checkpoints, cytokines, and the neonatal Fc receptor [7,8,9].

gMG is a chronic condition that can be costly to manage [10,11,12]. Prior studies have found that key cost drivers of MG or gMG management include use of PLEX and IVIg, high-dose steroids, MG exacerbations and crisis, and hospitalizations [11,12,13,14]. Nevertheless, no studies have assessed whether patient characteristics present at gMG diagnosis or in the first year following diagnosis predict subsequent high management costs. Knowledge of potential cost predictors may help identify unmet needs and inform strategies for cost savings for patients with gMG and the healthcare system both in the short term and over time. Notably, costs associated with gMG may vary not only with disease severity, as reflected in the treatments needed, but also with demographic characteristics and specific comorbidities [15]. The complex relationships among demographic characteristics, baseline comorbidities, gMG clinical presentations, and treatment patterns need to be taken into account when attempting to identify potential predictors of high costs among patients with gMG.

While conventional regression-based variable selection or elimination methods alone are not equipped to simultaneously consider a large number of characteristics, machine learning (ML) methods, specifically the random forest (RF) model, are well suited for such variable selection [16]. An RF model builds many decision trees to capture the effects of, and interactions between, characteristics in the data [16]; hence, it can leverage the richness of available demographic characteristics, treatments, healthcare resource use (HRU), and cost data to capture unique aspects of the profiles of patients with gMG. In contrast, a regression-based model isolates the risk associated with each predictor. Because the two approaches are complementary, combining an RF model with regression allows estimation of both the magnitude and the direction of each association. Using this combined RF and regression approach, the current study aimed to identify predictors of high healthcare costs from a payer’s perspective after the first diagnosis of MG, as well as 12 months after first diagnosis, in a real-world US cohort.

2 Methods

2.1 Data Source

De-identified US administrative claims data from the IQVIA PharMetrics® Plus database (2017–2021) were used. IQVIA PharMetrics® Plus is a health plan claims database composed of fully adjudicated medical and pharmacy claims for more than 210 million unique enrollees since 2006. Data contributors to the database are largely commercial health plans. It is representative of the commercially insured US national population for patients under 65 years of age. It contains a longitudinal view of inpatient and outpatient services, prescription and office/outpatient administered drugs, costs, and detailed enrollment information. Data are de-identified and comply with the patient requirements of the Health Insurance Portability and Accountability Act (HIPAA) of 1996. Per HIPAA, there are no restrictions on the use or disclosure of de-identified health information (Title 45 of CFR, Part 164.502(d)(2), 164.514(a) and (b)). No review by an institutional review board was required per Title 45 of CFR, Part 46.104(d)(4) because the data were de-identified.

2.2 Study Design

A retrospective cohort design was used to select patients with gMG during the intake period that spanned from 1 January 2018 through 31 December 2021. The index date was defined as the date of the first diagnosis of MG. The 12-month baseline period before the index date was used to describe patient characteristics, i.e., potential predictors of high costs, including demographics, comorbidities, treatments, HRU, and characteristics of gMG on the index date.

To be able to evaluate the association of high costs with gMG-related treatments and clinical events occurring not only before but also after the first MG diagnosis, a subgroup of patients with > 12 months of follow-up after the index date was also identified. Among this subgroup, potential characteristics (i.e., treatments, HRU, and gMG-related characteristics and clinical events) were measured during the 12-month landmark period after the index date, in addition to the baseline period.

Healthcare costs were measured from the index date until the earlier of the end of continuous health plan eligibility or the end of data availability (main cohort). Additionally, for the subgroup, healthcare costs were measured from the end of the landmark period (i.e., from month 13 post-index) until the earlier of the end of continuous health plan eligibility or the end of data availability.

2.3 Patient Selection Criteria

As there is no specific diagnosis code for gMG, the condition was identified on the basis of criteria suggested by Phillips et al. [12] that were further refined in this study and involved a combination of MG diagnoses, treatment settings, and physician specialty. Specifically, patients were included in the study if they (1) had ≥ 1 claim with a principal diagnosis for MG [International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM): G70.00, G70.01, or G70.2] in an inpatient setting or ≥ 2 claims on separate days with a diagnosis of MG in any position in an outpatient, emergency, or other setting; (2) had ≥ 1 diagnosis of MG by a neurologist; (3) were ≥ 18 years old on the index date; and (4) had ≥ 12 months of continuous health plan eligibility (with both medical and pharmacy coverage) before the index date. To be included in the subgroup, patients were additionally required to have > 12 months of continuous health plan eligibility (with both medical and pharmacy coverage) after the index date.
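The claims-based part of these criteria can be sketched in Python as follows. This is only an illustration of criteria (1) and (2): the record layout (`setting`, `date`, `dx`, `specialty`), the function name, and the convention that the first entry of `dx` is the principal diagnosis are assumptions made here, not the structure of the source database. The age and continuous-eligibility criteria are omitted.

```python
from datetime import date

# ICD-10-CM codes listed in the selection criteria
MG_CODES = {"G70.00", "G70.01", "G70.2"}

def meets_claim_criteria(claims):
    """Check criteria (1) and (2): qualifying MG claims and >= 1 MG diagnosis by a neurologist.

    Each claim is a dict with hypothetical keys: 'setting', 'date', 'dx', 'specialty'.
    Assumption: dx[0] is the principal diagnosis.
    """
    mg_claims = [c for c in claims if MG_CODES & set(c["dx"])]
    # >= 1 inpatient claim with MG as the principal diagnosis
    inpatient_principal = any(
        c["setting"] == "inpatient" and c["dx"][0] in MG_CODES for c in mg_claims
    )
    # >= 2 non-inpatient claims on separate days with MG in any position
    other_days = {c["date"] for c in mg_claims if c["setting"] != "inpatient"}
    # >= 1 MG diagnosis recorded by a neurologist
    neurologist = any(c["specialty"] == "neurology" for c in mg_claims)
    return (inpatient_principal or len(other_days) >= 2) and neurologist

example = [
    {"setting": "outpatient", "date": date(2019, 1, 5), "dx": ["G70.00"], "specialty": "neurology"},
    {"setting": "outpatient", "date": date(2019, 2, 9), "dx": ["I10", "G70.00"], "specialty": "internal medicine"},
]
eligible = meets_claim_criteria(example)
```

Two outpatient claims on the same day would not satisfy criterion (1) in this sketch, because the dates are collected into a set before counting.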

2.4 Study Outcome Measures

All-cause healthcare costs, including pharmacy and medical costs, during the follow-up period were measured from a payer’s perspective and inflated to 2021 US dollars. Patients with high follow-up costs in this study were defined as those with mean monthly all-cause healthcare costs at the 85th percentile (i.e., $9404) or above. The 85th percentile was chosen as the threshold of high costs for two reasons: (1) it is meaningfully far from the mean (approximately double the mean of $5567) to expect a difference in characteristics between patients with and without high costs and (2) the split between patients with high costs (top 15%; N = 411) and without (bottom 85%; N = 2328) produced samples sufficient for training, validation, and testing with an ML algorithm (Supplementary Fig. S1).
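The percentile-based definition of high costs can be illustrated with a short Python sketch. The cost values below are made up, the function name is hypothetical, and the interpolation rule (`method="inclusive"`) is an assumption, since the study does not state which percentile definition was applied.

```python
from statistics import quantiles

def flag_high_cost(monthly_costs, percentile=85):
    """Return (cutoff, flags); flags[i] is True when cost i is at or above the percentile cutoff."""
    # quantiles(n=100) returns 99 cut points; position percentile - 1 is the requested percentile
    cutoff = quantiles(monthly_costs, n=100, method="inclusive")[percentile - 1]
    return cutoff, [cost >= cutoff for cost in monthly_costs]

# Hypothetical mean monthly all-cause costs (USD) for 20 patients
costs = [100 * i for i in range(1, 21)]
cutoff, flags = flag_high_cost(costs)
n_high = sum(flags)  # number of patients labeled as high cost
```

With these 20 evenly spaced values, the top 3 patients (15%) are flagged, mirroring the top 15% versus bottom 85% split described above.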

A similar rationale was used to identify high costs after the 12-month landmark period; high costs represented mean monthly all-cause healthcare costs at the 90th percentile (i.e., $9519) or above. The threshold was meaningfully far from the mean (i.e., approximately double the mean of $4139) and produced a sufficient split between patients with high costs (top 10%; N = 164) and without (bottom 90%; N = 1474). Of note, the monthly all-cause healthcare costs after the landmark period at the 85th percentile were $5884, which was deemed not far enough from the mean to be meaningful (Supplementary Fig. S2).

2.5 Statistical Analyses

All analyses were conducted using SAS Enterprise Guide 7.1 (SAS Institute, Cary, NC).

2.5.1 Descriptive Analysis of Patient Characteristics

Patient demographic, clinical characteristics, and treatment patterns were summarized descriptively using means, standard deviations, and medians for continuous variables and frequency counts and percentages for categorical variables.

2.5.2 RF Model Development and Performance Evaluation

Two separate RF models were developed. One model evaluated predictors of high costs any time after the index date on the basis of factors observed during the baseline period in the main cohort. The other evaluated predictors of high costs after the landmark period on the basis of factors observed during both the baseline and landmark periods in the subgroup. Models were trained to predict whether a patient would incur high costs using the list of potential predictors (i.e., patient characteristics captured in the claims data) as features. All potential predictors were assessed using a correlation matrix to check for highly correlated features. Features with an absolute Pearson correlation coefficient ≥ 0.8 were removed to improve the interpretability of both the ML algorithm and the regression model [17, 18].
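A minimal Python sketch of such a correlation screen is given below. The feature names and values are hypothetical, and the rule for deciding which member of a correlated pair to drop (keep the feature that appears first) is an assumption made here; the study does not describe that choice.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_correlated(features, threshold=0.8):
    """Keep features in order, dropping any with |r| >= threshold against an already kept feature."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical baseline features for six patients
features = {
    "ip_admissions": [0, 1, 0, 2, 1, 3],
    "ip_days": [0, 3, 0, 7, 2, 9],       # strongly correlated with ip_admissions
    "op_visits": [5, 2, 8, 3, 9, 4],
}
kept = drop_correlated(features)
```

In this toy data, `ip_days` is removed because its correlation with `ip_admissions` exceeds the 0.8 cutoff, while `op_visits` is retained.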

For each RF model, a training set consisting of 80% of the data was used to train the RF model. The data contained information on all potential predictors and a binary indicator of whether the patient incurred high costs. To refine the ML algorithm, model hyperparameters (i.e., number of trees, tree depth, number of features to be chosen at each tree split) were chosen using grid search with repeated tenfold cross-validation with the objective of maximizing the area under the receiver operating characteristic curve (AUROC). The final model for the main cohort was trained using 2250 trees, a maximum depth of 9, and up to 10 features randomly chosen at each split. For the subgroup, the model was trained using 2750 trees, a maximum depth of 5, and up to 12 features [19]. The remaining 20% of the data were used as a test set to evaluate the model’s predictive performance on the basis of AUROC, in which the receiver operating characteristic curve was generated by plotting the true positive rate (i.e., sensitivity) versus the false positive rate (i.e., 1 − specificity) resulting from all possible threshold values. A model that can only randomly predict a patient’s probability of having high costs would have an AUROC of 0.5, whereas a model with perfect classification would have an AUROC of 1.0.
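To make the AUROC concrete, the sketch below computes it in plain Python through its rank interpretation: the probability that a randomly chosen high-cost patient receives a higher predicted score than a randomly chosen patient without high costs, which equals the area under the curve traced over all thresholds. The labels and scores are invented for illustration, and this is not the software routine used in the study.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 1 = high cost, 0 = not high cost
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
result = auroc(labels, scores)

# A model that gives every patient the same score cannot discriminate at all
chance = auroc(labels, [0.5] * len(labels))
```

Here one negative (score 0.7) outranks one positive (score 0.6), so 14 of the 15 positive-negative pairs are ordered correctly; the constant-score model yields 0.5, matching the chance level described above.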

2.5.3 Determination of Feature Importance

The most important predictors of high costs in the RF models were determined using Shapley importance [16, 20], which ranks features on the basis of their mean absolute Shapley value. The mean absolute Shapley value shows, on average, how much each characteristic impacts the predicted probability of high costs in the positive or negative direction from the background probability [e.g., a mean absolute Shapley value of 0.01 indicates that, on average, the predicted probability of a patient having high costs increases or decreases by one percentage point from the background probability of having high cost (i.e., 15% for the main cohort, 10% for the subgroup) due to the given predictor].
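The idea of a mean absolute Shapley value can be sketched with an exact calculation on a toy model. Everything below is illustrative: `toy_model`, its coefficients, and the three features are hypothetical, and features outside a coalition are set to a single reference patient rather than averaged over a background sample, which is a simplification of what Shapley-based tools for RF models typically do.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one patient; features outside a coalition take the baseline value."""
    n = len(x)

    def value(coalition):
        return predict([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def mean_abs_shapley(predict, patients, baseline):
    """Average the absolute Shapley value of each feature over all patients."""
    all_phi = [shapley_values(predict, p, baseline) for p in patients]
    return [sum(abs(phi[i]) for phi in all_phi) / len(patients) for i in range(len(baseline))]

def toy_model(f):
    # Hypothetical predicted probability of high costs, with one interaction term
    ip_admission, exacerbation, diabetes = f
    return (0.10 + 0.20 * ip_admission + 0.10 * exacerbation
            + 0.10 * ip_admission * exacerbation + 0.05 * diabetes)

baseline = [0, 0, 0]
patients = [[1, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
phi_first = shapley_values(toy_model, patients[0], baseline)
importance = mean_abs_shapley(toy_model, patients, baseline)
```

For each patient the Shapley values sum to the difference between that patient's prediction and the baseline prediction, and the interaction term is shared equally between the two features involved. Ranking features by `importance` gives the kind of ordering reported as Shapley importance.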

2.5.4 Quantification of the Magnitude and Direction of the RF Model–Identified Predictors

As Shapley values generated from the RF models do not quantify the magnitude or the direction of the association in terms of risk between the outcome and a chosen predictor, multivariable modified Poisson regression models with robust error variance were used, with high costs as the dependent variable and the top 50 predictors with the highest Shapley values as independent variables. The modified Poisson regression with robust error variance directly estimates the risk of having high costs for each predictor and is preferred over conventional logistic regression since the rare outcome assumption is not met [21]. Two such regression models were fitted, one for the main cohort and one for the subgroup of patients with > 12 months of follow-up. The final regression models were adjusted for all selected predictors.
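In general form, a modified Poisson model of this kind can be written as below. The notation is introduced here for illustration and does not appear in the study: Y_i indicates whether patient i has high costs, x_ij is the value of predictor j for patient i, p is the number of predictors (50 in this analysis), and the beta terms are regression coefficients.

```latex
\log \Pr(Y_i = 1 \mid \mathbf{x}_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\mathrm{RR}_j = e^{\beta_j},
\qquad
\text{percent change in risk} = 100 \times (\mathrm{RR}_j - 1)
```

Under this reading, an adjusted risk ratio of 1.78 would correspond to a 78% higher likelihood of high costs, and a risk ratio below 1 to a lower likelihood. The robust (sandwich) variance estimator is needed because a binary outcome does not follow a Poisson distribution, so the model-based standard errors would otherwise be too wide.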

3 Results

3.1 Patient Characteristics and Potential Predictors of High Follow-up Costs in gMG

A total of 2739 patients with gMG were included (Table 1). Patients had a mean age of 56.2 years, 50.6% were female, and 58.1% were commercially insured. During baseline, 81.6% of patients had a comorbid condition included in the Elixhauser comorbidity index [22], including hypertension (48.7%), obesity (27.2%), and diabetes (21.5%); 78.1% had weakness or fatigue; and 47.8% had a psychiatric condition included in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).

Table 1 Patient demographic and clinical characteristics

At index, 16.0% of patients were diagnosed with MG with exacerbation and 60.9% were first diagnosed with MG by a neurologist. There were 41.5% of patients with baseline use of corticosteroids, and 18.9% were treated with an acetylcholinesterase inhibitor. Nearly all (97.6%) patients had ≥ 1 outpatient visit, 48.7% had ≥ 1 emergency department visit, and 15.8% had ≥ 1 inpatient admission at baseline. The mean baseline all-cause monthly healthcare costs were $1633.

The subgroup of patients with > 12 months of follow-up included 1638 patients and yielded similar distributions of characteristics to the main cohort during the baseline period (Table 1). Additionally, during the 12-month landmark period, 68.2% of patients were treated with an acetylcholinesterase inhibitor, 56.9% with corticosteroids, 21.2% with non-steroidal immunosuppressants, 11.9% with immunoglobulin, 3.2% with plasmapheresis, and 2.6% with monoclonal antibodies (rituximab and eculizumab, in nearly equal parts). Of note, newer treatments approved for gMG at the end of 2021 (i.e., efgartigimod) or after 2021 (ravulizumab, rozanolixizumab) were not captured in the data.

The mean all-cause monthly healthcare costs were $1456 during baseline and $4022 during the landmark period.

3.2 Predictors of High Follow-up Costs Identified in the RF Models

The main cohort had a mean follow-up length of 18.5 months, with a maximum of 48 months. For the subgroup, the mean follow-up length after the landmark period was 14.8 months, with a maximum of 36 months. Both RF models had good predictive power. The model using baseline characteristics to predict costs after the index date had an AUROC of 0.74. The model using both baseline and landmark period characteristics to predict costs after the landmark had an AUROC of 0.75 (Table 2). As determined by Shapley importance, the most important predictors of high costs after the index date were baseline inpatient admissions (Shapley value: 0.0126; i.e., having a previous inpatient admission during the baseline period changed the probability of high costs from the background probability by 1.26 percentage points), MG with exacerbation on the index date (0.0107), baseline numbers of outpatient visits (0.0091) and days on corticosteroids (0.0093), baseline comorbidities [diabetes (0.0084), fluid and electrolyte disorders (0.0074), psychiatric conditions (0.0068)], and diagnosing physician specialty [neurology (0.0056), cardiology (0.0055)] (Fig. 1a).

Table 2 Random forest performance metrics
Fig. 1

a Top 50 predictors in order of Shapley importance (%) from the main cohort (notes 1, 2). b Top 50 predictors in order of Shapley importance (%) from the subgroup (notes 1, 3). AChEI, acetylcholinesterase inhibitor; CS, corticosteroid; DSM-5, Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition; ED, emergency department; ICU, intensive care unit; Ig, immunoglobulin; HRU, healthcare resource utilization; IP, inpatient; gMG, generalized myasthenia gravis; OP, outpatient; PLEX, plasma exchange. 1. Shapley importance ranks features on the basis of their mean absolute Shapley value. The mean absolute Shapley value shows, on average, how much each variable shifts the predicted probability of high costs from the background probability, in the positive or negative direction (e.g., a Shapley value of 1% indicates that, on average, the predicted probability of high costs increases or decreases by one percentage point). 2. Characteristics identified during the baseline period or on the index date. 3. Characteristics identified during the landmark period, unless indicated otherwise

The most important predictors of costs after the landmark period were characteristics during the landmark period (i.e., the 12 months after the first MG diagnosis), including immunoglobulin use (0.0188), number of outpatient visits (0.0162), number of MG-related medical encounters on separate dates (0.0154), number of days on corticosteroids (0.0094), and ≥ 4 unique gMG-related therapies (0.0085; Fig. 1b).

3.3 Association of Predictors with High Follow-up Costs in Regression Analyses

For the model using baseline characteristics to predict costs after the index date, adjusting for the top 50 predictors identified in the RF model, regression analyses showed that patients with a cardiologist as the first diagnosing physician for gMG were 177% more likely to have high total monthly healthcare costs during the follow-up period compared with those first diagnosed by a neurologist. Patients who initially presented with exacerbated MG were 78% more likely to have high follow-up costs, compared with those without (all p < 0.05; Fig. 2a). Certain baseline comorbidities were also associated with high follow-up costs in gMG. The likelihood of high follow-up costs among patients with gMG increased by 74% for those with metastatic cancer, 54% for those with weakness or fatigue, 42% for those with an autoimmune disorder, and 36% for those with diabetes (all p < 0.05; Fig. 2a). In addition, baseline corticosteroid claims and previous hospitalization were associated with high costs. For every 10 additional corticosteroid claims during the baseline period, the likelihood of having high follow-up costs increased by 33% among patients with gMG; and patients with ≥ 1 inpatient admission during the baseline period were 27% more likely to have high follow-up costs (all p < 0.05; Fig. 2a).

Fig. 2

a Regression results with the top 50 predictors for the main cohort (notes 1, 2). b Regression results with the top 50 predictors for the subgroup (notes 1, 2). AChEI, acetylcholinesterase inhibitor; AIC, Akaike information criterion; CI, confidence interval; CS, corticosteroid; DSM-5, Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition; ED, emergency department; ICU, intensive care unit; IP, inpatient; MG, myasthenia gravis; OP, outpatient; PLEX, plasma exchange; RF, random forest; RR, risk ratio. 1. Characteristics are selected by the RF algorithm and based on the Shapley importance index. 2. The modified Poisson regression was adjusted for the top 50 most predictive model features of the RF model. 3. RRs, 95% CI, and p-values are estimated on the basis of a multivariable modified Poisson regression with robust error variance. RRs > 1 indicate that patients with that characteristic are at a higher risk of having high cost, compared with those without such characteristic. 4. Includes four patients with an unknown payer type. 5. The deviance goodness-of-fit test was used to determine model fit for the modified Poisson regression. The null hypothesis is that the model is correctly specified

For the model using both baseline and landmark period characteristics to predict costs after the landmark, adjusting for all top 50 variables, post-MG diagnosis use of immunoglobulin and monoclonal antibodies increased the likelihood of high costs by 261% and 135%, respectively. Patients with systemic lupus erythematosus, any DSM-5 diagnosis, or diabetes as a comorbid condition during the baseline period were 119%, 42%, and 42% more likely to have high costs, respectively (all p ≤ 0.05; Fig. 2b).

4 Discussion

In this retrospective cohort study, RF and Poisson regression models were used to identify predictors of high healthcare costs in gMG. In RF analysis, among the main predictors of high costs occurring before or at the first MG diagnosis were presentation of MG with an exacerbation, baseline inpatient admission, and baseline number of days with corticosteroids. After the initial MG diagnosis, immunoglobulin and monoclonal antibody use, numbers of all-cause outpatient visits, and number of MG-related encounters were the main predictors of high costs. Adjusting for top predictors identified in RF models in regression analysis, initial presentation of MG with an exacerbation, inpatient admission preceding MG diagnosis, and post-diagnosis immunoglobulin and monoclonal antibody use substantially increased the risk of high costs when keeping other characteristics constant.

It is not unexpected that RF and a parametric multivariable regression model such as Poisson regression may yield different results. RF can identify a characteristic that is predictive when interacting with others but not on its own, while a multivariable regression model isolates the individual effect of each characteristic [23]. This may lead to differences in predictive importance between the RF and Poisson models. For instance, certain top predictors in the RF models were not evident as high-cost predictors in the Poisson models, likely because these predictors have strong interactions with other important features. Conversely, certain comorbidities, such as metastatic cancer and weakness or fatigue, were associated with high costs in regression models but were not among the most important predictors in the RF models. This suggests that these comorbidities may have a large individual effect on high costs. The use of both ML and conventional regression models in this study accounts for the complex and heterogeneous characteristics of real-world patients with gMG that have both strong interaction and individual effects while providing interpretable associations with high healthcare costs.

Prior studies have reported cost drivers associated with MG or gMG management [10,11,12]; however, information on predictors of high costs based on patient characteristics present both before and after gMG diagnosis has been lacking. Combining baseline patient characteristics and post-diagnosis clinical management information, the current findings suggest that uncontrolled gMG symptoms and comorbidities are key contributors to high costs, and there is a clear need to prevent progression of gMG as well as to manage comorbid conditions to minimize follow-up healthcare costs. Specifically, pre-diagnosis inpatient admission and initial MG exacerbation at presentation as drivers of costs may imply diagnostic difficulty, which may partially be due to muscle weakness being a common symptom shared by many disorders [24]. A prior US physician survey reported that one in five patients with MG were initially misdiagnosed with another condition; the survey also found an average 4.8-month delay from symptom onset to first consultation, and an additional 3.9-month delay in receiving a diagnosis, at which time about one-quarter of patients were experiencing moderate-to-severe muscle weakness [25]. Patients with poor control of gMG symptoms at diagnosis will most likely need more intensive management, including higher dose and longer use of corticosteroids and monoclonal antibodies; these patients may also experience more acute MG events (i.e., more exacerbation and crisis) that require immunoglobulin or mechanical ventilation. Immunoglobulin use stood out as a strong predictor of costs, a finding that aligns with a previous study showing increased immunoglobulin use during an exacerbation or crisis leading to high HRU and costs [13]. Another study found that immunoglobulin accounts for the highest proportion of pharmacy costs despite being used by only a small fraction of patients with gMG [12]. Collectively, the consequences of poorly controlled gMG substantially increase the costs to patients and the healthcare system; hence, improving disease awareness to facilitate early diagnosis as well as implementing strategies for early disease control to prevent gMG progression may help reduce costs. Future studies may also assess whether newer gMG treatments (e.g., complement C5 inhibitors, neonatal Fc receptor antagonists) may lead to better clinical outcomes and potential cost savings in the longer term.

Several comorbidities were found to increase the cost burden in gMG. Research has suggested that patients with gMG are at an increased risk of multiple comorbidities, which can complicate diagnosis and management and hinder the provision of specialized gMG treatment, thereby increasing costs [26]. In this study, diabetes and psychiatric conditions, which affected more than 20% and nearly 50% of the current patient sample, respectively, were significantly associated with high follow-up costs. Notably, clustering of autoimmune disorders is common in patients with a history of another autoimmune disease, including gMG [27]. The high costs associated with baseline autoimmune disorders and systemic lupus erythematosus found in this study therefore further highlight the need to monitor and co-manage these comorbidities as part of a gMG management strategy to reduce subsequent costs along the disease course.

Together, the ML models in the current study identify characteristics of high-cost patients with gMG, which may help inform payers on potential cost-saving strategies by addressing the unmet care needs of this population. Treating physicians may prioritize patients with gMG who present with comorbidities, certain treatment patterns, and other amenable characteristics associated with high follow-up costs to provide more targeted management, which may in turn reduce subsequent clinical, humanistic, and economic burden to the patients and the healthcare system at large.

4.1 Limitations

The findings of this study should be interpreted in light of certain limitations. First, the results may not be generalizable to patients without insurance or those with health insurance other than those included in the database (i.e., commercial, Medicare Advantage, Medicaid, or self-insured plans). Second, due to the lack of a specific diagnosis code, gMG in this study was identified on the basis of a refinement of previously suggested criteria [12], which were based on information available in claims data and may not accurately capture all patients with gMG. Although the selection criteria required a diagnosis by a neurologist to maximize the likelihood of a gMG diagnosis, misidentification might still have occurred. Third, the models are subject to omitted variable bias as certain clinical characteristics (e.g., MG serotypes identified through antibody tests and impacting symptoms, treatment response, and disease progression) were sparsely populated in the data and were omitted from the model despite being potential predictors of high costs [28, 29]. Future studies using electronic health records of patients with gMG may provide insights on how these clinical characteristics may predict healthcare costs. Fourth, due to left censoring prior to the start of current continuous health plan eligibility, it was not possible to ascertain whether all patients included in this study were newly diagnosed with gMG. Finally, analyses of administrative claims data depend on correct diagnosis, procedure, and drug codes, and coding inaccuracies may lead to case misidentification.

5 Conclusions

Characteristics related to the first presentation of MG (exacerbations, physician specialty), previous inpatient admission, diagnosis of diabetes, number of corticosteroid claims, and post-diagnosis use of immunoglobulin and monoclonal antibodies were strong predictors of high costs after the first MG diagnosis, both when interacting with other characteristics and individually. These predictors are not only useful for payers to identify patients with gMG who might incur higher healthcare costs, but may also serve as priority guidance for physicians treating gMG for more targeted management. Collectively, these findings suggest that there is a clinical and economic need to manage acute symptomatic events and comorbidities, and to switch treatments from immunoglobulin, to reduce healthcare costs after gMG diagnosis.