The concentration of health care spending—5% of patients incur roughly half of total spending1,2,3—has prompted an intense focus on improving the quality and value of health care delivery for high-cost patients.3,4,5 Current efforts to describe and segment high-cost patients have focused predominately on fee-for-service (FFS) Medicare populations.3,6,7 Despite the fact that one in three Medicare beneficiaries is now enrolled in a Medicare Advantage (MA) plan,8 less is known about high-cost MA patients.

Better understanding the clinical composition of high-cost patient populations has the potential to improve care management program design. Researchers and policymakers have suggested that suboptimal patient targeting—singular interventions are frequently applied to diverse patient populations—may be driving the limited effectiveness of current care management approaches.3,7,9,10

Descriptive analyses have shown that within high-cost patient populations, there is substantial variation in demographics, functional status, diagnoses, and disease burden.6,7,11,12,13 Improving care management program effectiveness, therefore, may require identifying distinct subgroups of high-cost patients, and tailoring interventions to meet the unique needs of each group. Existing frameworks for identifying subgroups of high-cost patients are derived from expert opinion.3,6,7,14 There may be an opportunity to supplement these approaches by identifying subgroups exclusively based on the analysis of variation within patient data.

In this study, we aimed to (1) describe the demographic and clinical characteristics of a high-cost MA population, (2) use cluster analysis to derive high-cost patient subgroups from large volumes of clinical and claims data, and (3) explore whether these subgroups were meaningfully associated with patterns of utilization, spending, and mortality.


Study Population

We obtained data for patients enrolled in MA plans offered by CareMore Health System in 2014 (n = 93,047). CareMore, a subsidiary of Anthem, Inc., offered MA plans in California, Arizona, and Virginia in 2014. We excluded patients who were not continuously enrolled for the entire year (n = 27,163), those who died during 2014 (n = 3649), those who did not have any approved medical claims during 2014 (n = 1867), and those enrolled in an institutional special needs plan (n = 1981). The final study population consisted of 61,546 patients.


We extracted demographic, clinical, utilization, spending, and mortality data for the study population from CareMore’s electronic data warehouse (EDW). The EDW aggregates data from electronic medical records (EMR) and administrative sources. Data were obtained for the years 2013–2015.

A full description of study variables is provided in Appendix 1. Briefly, we grouped variables into the following categories: demographics, chronic conditions, active diagnoses, procedures, laboratory, pharmacy, mortality, utilization, and spending. Demographic variables included age and gender. Chronic conditions were assessed individually according to the Elixhauser15 classification using prior year (2013) ICD-9 diagnosis codes. Active diagnoses were determined using 2014 ICD-9 codes and grouped according to Agency for Healthcare Research and Quality (AHRQ) Clinical Classifications Software (CCS) categories.16 Procedures were determined using 2014 CPT codes and grouped according to AHRQ CCS categories.17 Laboratory variables were assessed directly from the EMR. Pharmacy variables were calculated using pharmacy fill data, and included the number of unique medications as well as adherence, which was defined as the proportion of days covered (PDC)18 among a limited set of common outpatient medications. Subsequent year (2015) mortality was assessed directly from the EDW. Utilization and spending variables were assessed directly from paid claims. Preventable spending was calculated using the AHRQ Prevention Quality Indicators algorithm19 for inpatient spending, and the algorithm created by Billings et al.20 for emergency department spending, both of which have been validated and used in prior work segmenting high-cost Medicare populations.21,22
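
As an illustration of the adherence measure, the sketch below computes a PDC value from pharmacy fill records. It is not the study's implementation: the fill record layout (fill date and days supplied), the calendar-year observation window, and the choice to count overlapping fills only once rather than carrying surplus supply forward are all assumptions made for this example, and the function name is hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Return the share of days in [start, end] with medication on hand.

    fills: iterable of (fill_date, days_supply) tuples (assumed layout).
    Overlapping fills are counted once (an assumption of this sketch).
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    window_days = (end - start).days + 1
    return len(covered) / window_days


# Hypothetical patient with three 90-day fills and gaps between them.
fills = [
    (date(2014, 1, 1), 90),
    (date(2014, 4, 15), 90),
    (date(2014, 9, 1), 90),
]
pdc = proportion_of_days_covered(fills, date(2014, 1, 1), date(2014, 12, 31))
print(round(pdc, 2))  # 270 covered days out of 365
```

Under these assumptions, a patient with uninterrupted supply would have a PDC of 1.0, and gaps between fills lower the value proportionally.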

Descriptive Analyses

We defined “high-cost” patients as those in the top decile of spending in 2014 (n = 6154). First, we described demographic, chronic condition, pharmacy, utilization, and spending variables across the entire study population. We then compared these variables between high-cost and non-high-cost subgroups using t tests for continuous variables and χ² tests for categorical variables.
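
The high-cost definition and a two-group comparison of a continuous variable can be sketched as follows. This is a minimal example under stated assumptions: the top decile is taken as the floor(n/10) highest spenders (which matches 6154 of 61,546), ties are broken by input order, and Welch's unequal-variance t statistic is used because the text does not specify which t test variant was applied. The function names and the toy data are hypothetical.

```python
import math
import statistics


def flag_top_decile(spending):
    """Return booleans marking the floor(n/10) highest spenders."""
    n_high = len(spending) // 10
    order = sorted(range(len(spending)), key=lambda i: spending[i], reverse=True)
    high = set(order[:n_high])
    return [i in high for i in range(len(spending))]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Toy data: 20 patients with increasing spending and made-up ages.
spending = [1000 * (i + 1) for i in range(20)]
ages = [75, 74, 76, 73, 77, 72, 78, 74, 75, 73,
        76, 74, 72, 77, 75, 73, 76, 74, 69, 71]

flags = flag_top_decile(spending)
high_ages = [age for age, is_high in zip(ages, flags) if is_high]
other_ages = [age for age, is_high in zip(ages, flags) if not is_high]
t_stat = welch_t(high_ages, other_ages)
```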

Identifying Subgroups

Clustering is an unsupervised machine learning technique that groups observations (e.g., patients) according to similarities among measured characteristics. Clustering algorithms iteratively group observations into subgroups until finding the allocation that maximizes both intra-group similarity and inter-group differences (see an accompanying article23 for more information).

The dataset used for cluster analysis included demographics, chronic conditions, active diagnoses, procedures, laboratory, and pharmacy variables for the subset of high-cost patients (n = 6154). Utilization and spending variables were not used for clustering, allowing for comparison of utilization and spending across clusters (see below).

To perform cluster analysis, we began by analytically reducing the number of variables in the dataset, a task known in computer science as dimension reduction. We removed variables with extremely low variance and those that were highly correlated with other variables. A total of 161 variables remained (full list provided in Appendix 2). We then used a non-linear dimension reduction algorithm24,25 to create a low-dimension representation of the dataset. Additional detail regarding dimension reduction is provided in Appendix 3 and an accompanying article.23 Finally, we applied a density-based clustering algorithm, Ordering Points To Identify the Clustering Structure (OPTICS),26,27 to the low-dimension dataset. We required each subgroup to contain at least 62 patients (1% of the high-cost population) so that the subgroups would be operationally meaningful. Our rationale for choosing the OPTICS algorithm and information on tuning parameters are described in an accompanying article.23
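
The variable screening and the minimum subgroup size can be sketched as below. This is an illustration only, not the study's code: the variance and correlation thresholds (0.01 and 0.9), the greedy keep-first rule for correlated pairs, and all function names are assumptions, and the non-linear dimension reduction step is omitted because the algorithm is not named in this section. The final function shows how a density-based clustering call might look with scikit-learn's `OPTICS` class; setting `min_samples` equal to the minimum subgroup size is a guess, since the tuning parameters are described only in the accompanying article.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_variables(columns, min_variance=0.01, max_abs_corr=0.9):
    """Drop near-constant variables, then variables highly correlated
    with one already kept. Thresholds are hypothetical."""
    kept = []
    for name, values in columns.items():
        if statistics.pvariance(values) < min_variance:
            continue  # near-constant variable
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with a variable already kept
        kept.append(name)
    return kept


def min_cluster_size(n_patients, fraction=0.01):
    """Smallest subgroup size that is at least `fraction` of the population."""
    return math.ceil(n_patients * fraction)


def cluster_optics(embedding, min_size):
    """Cluster a low-dimension embedding with OPTICS (requires scikit-learn).
    Imported lazily so the rest of the sketch runs without it."""
    from sklearn.cluster import OPTICS
    model = OPTICS(min_samples=min_size, min_cluster_size=min_size)
    return model.fit_predict(embedding)  # label -1 marks unassigned points


# Toy data: 'dialysis' duplicates 'esrd'; 'rare_dx' never varies.
columns = {
    "age": [70, 65, 80, 72, 68, 75],
    "esrd": [1, 0, 0, 1, 0, 1],
    "dialysis": [1, 0, 0, 1, 0, 1],
    "rare_dx": [0, 0, 0, 0, 0, 0],
}
kept = filter_variables(columns)
size_floor = min_cluster_size(6154)
```

In this toy example only `age` and `esrd` survive screening, and 1% of 6154 patients rounds up to 62. Density-based methods such as OPTICS can leave some observations unlabeled, which scikit-learn reports with the label -1.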

Subgroup Analysis

To describe the clinical composition of the resultant subgroups, we first calculated high-cost population means and subgroup-specific means for each variable used in clustering. We then calculated standardized ratios of subgroup means to population means, such that larger numbers represented variables for which the subgroup deviated most from the broader high-cost population. We assigned a clinical descriptive label to each subgroup based on the variables with the highest standardized ratios as well as variables for which the ratios varied most among subgroups. Given the numerous variables, we chose to present the ten variables with the largest standardized ratios (labeled as “distinguishing factors”) for each subgroup. Appendix 4 contains a complete list of standardized ratios for all subgroups.
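
The ranking of distinguishing factors can be illustrated with a short sketch. It assumes the simplest reading of the text, a plain ratio of the subgroup mean to the high-cost population mean for each variable; any additional standardization used in the study is not described in this section and is not reproduced. The variable names and values below are hypothetical.

```python
def top_distinguishing_factors(population_means, subgroup_means, k=10):
    """Rank variables by subgroup mean divided by population mean.

    Variables with a population mean of zero are skipped because the
    ratio is undefined (a choice made for this sketch).
    """
    ratios = {
        name: subgroup_means[name] / pop_mean
        for name, pop_mean in population_means.items()
        if pop_mean > 0
    }
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical means for a high-cost population and one subgroup.
population_means = {"dialysis_procedure": 0.05, "heart_failure": 0.40, "age": 70.0}
subgroup_means = {"dialysis_procedure": 0.90, "heart_failure": 0.50, "age": 66.0}

factors = top_distinguishing_factors(population_means, subgroup_means, k=2)
names = [name for name, _ in factors]
```

Here the dialysis procedure variable ranks first because its subgroup mean is many times the population mean, while age, with a ratio below 1, falls outside the top two.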

Next, we compared spending, utilization, and mortality across subgroups. We calculated 2014 utilization rates, average spending, composition of spending, and rates of preventable spending among each subgroup. To better understand the trajectory of spending for subgroups, we calculated average spending, preventable spending, and the prevalence of persistent high-cost status in 2015. We defined persistent high-cost status as remaining in the top decile of total spending in 2015. Finally, we calculated 2015 mortality rates among each subgroup. We excluded patients not continuously enrolled in a CareMore MA plan from the 2015 analyses (n = 1430).
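
Persistent high-cost status, as defined above, can be sketched as follows. The example assumes that the 2015 top decile is computed among all patients with 2015 spending data and that patients without 2015 data are excluded from the denominator; the text does not state the exact reference population, so both are assumptions, as are the function names and toy values.

```python
def top_decile_ids(spending_by_id):
    """Return the ids of the floor(n/10) highest spenders."""
    n_high = len(spending_by_id) // 10
    ranked = sorted(spending_by_id, key=spending_by_id.get, reverse=True)
    return set(ranked[:n_high])


def persistence_rate(spend_index_year, spend_next_year):
    """Share of index-year high-cost patients who are again in the top
    decile the next year, among those with next-year data."""
    high_index = top_decile_ids(spend_index_year)
    followed = high_index & set(spend_next_year)
    if not followed:
        return None
    high_next = top_decile_ids(spend_next_year)
    return len(followed & high_next) / len(followed)


# Toy data: 20 patients; ids 18 and 19 are high-cost in the index year.
spend_2014 = {i: 1000 * (i + 1) for i in range(20)}
spend_2015 = dict(spend_2014)
spend_2015[18] = 500      # drops out of the top decile
spend_2015[19] = 50_000   # remains in the top decile

rate = persistence_rate(spend_2014, spend_2015)
```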

Data preparation was performed in SAS version 9.4 (SAS Institute, Cary, NC). R version 3.2.5 was used for all other analyses.


High-Cost Patient Characteristics

In a national Medicare Advantage population (n = 61,546), the highest cost 10% of patients (n = 6154) accounted for 55% of total population spending in 2014, with average annual spending of $55,696 per patient. Among high-cost patients in 2014, 64% were persistently high-cost (that is, they remained in the top 10% of spending in 2015).

Table 1 describes utilization patterns for high-cost and non-high-cost patients. Compared to non-high-cost patients, high-cost patients had higher average rates of inpatient (IP) admissions (1.7 vs. 0.1), IP days (12.9 vs. 0.9), and emergency department (ED) visits (2.4 vs. 0.4). Total annual spending was roughly 11 times higher among high-cost patients ($55,696 vs. $5071), and the rate of preventable spending was also notably higher (7.1% vs. 3.6%).
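
The reported group sizes and average spending figures can be cross-checked against the 55% spending share with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the result is approximate because the reported averages are rounded.

```python
n_total = 61_546     # study population
n_high = 6_154       # high-cost patients (top decile)
mean_high = 55_696   # average 2014 spending, high-cost patients ($)
mean_other = 5_071   # average 2014 spending, non-high-cost patients ($)

spend_high = n_high * mean_high
spend_other = (n_total - n_high) * mean_other

# Share of total spending attributable to high-cost patients.
share_high = spend_high / (spend_high + spend_other)

# Ratio of average spending between the two groups.
ratio = mean_high / mean_other

print(f"share of spending: {share_high:.1%}, spending ratio: {ratio:.1f}")
```

The implied share rounds to 55%, in line with the figure reported for the high-cost group.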

TABLE 1 Characteristics of the Study Population, by High-Cost Status

High-cost patients also differed substantially from non-high-cost patients across demographic and clinical characteristics (Table 1). High-cost patients were younger (average age 70.9 vs. 73.5) and more likely to be male (49.6% vs. 42.9%). High-cost patients had more co-occurring chronic conditions on average (9.0 vs. 4.4) as well as significantly higher rates of all individually assessed chronic conditions, including congestive heart failure (44% vs. 11%), vascular disease (60% vs. 28%), and renal failure (69% vs. 39%). Among high-cost patients, rates of polypharmacy were higher (14.6 vs. 7.9 average prescriptions), and medication adherence was lower (0.7 vs. 0.8 PDC).

High-Cost Patient Subgroups

Cluster analysis identified ten subgroups of high-cost patients. The number of patients in each subgroup ranged from 56 to 3686, and 382 patients were not assigned to any subgroup. Differentiation across subgroups was driven predominately by comorbidities and procedures; laboratory values and demographics were less important (Table 2). There were divergent patterns of index year utilization and spending (Table 3) as well as spending and mortality trajectories (Table 4) across subgroups. Each subgroup and associated patterns of utilization, spending, and mortality are described below. Comparative statements are relative to other high-cost patients, not the entire study population.

TABLE 2 Description of High-Cost Patient Subgroups
TABLE 3 Utilization and Spending for High-Cost Patient Subgroups, 2014
TABLE 4 Utilization and Spending Trajectories for High-Cost Patient Subgroups, 2014–2015

Acute Exacerbations of Chronic Disease (Mixed)

Distinguished by procedures associated with acute hospitalizations. These patients also had higher-than-average rates of cerebrovascular disease, chronic obstructive pulmonary disease, ischemic heart disease, congestive heart failure, and behavioral health disorders. Although rates of utilization and spending were close to average, this subgroup was characterized by higher-than-average rates of preventable spending in 2014 (7.9%) and 2015 (4.3%).

End-Stage Renal Disease

Distinguished by active diagnoses, chronic conditions, and procedures related to end-stage renal disease (ESRD) and dialysis. These patients were among the highest cost (average 2014 spending $74,385), were among the most likely to be persistently high-cost (97.3%), and had a high rate of mortality in 2015 (18.5%). Inpatient and ED utilization were substantially below average.

Recurrent Gastrointestinal Bleed

Distinguished by active diagnoses and procedures related to recurrent gastrointestinal bleed (GIB), as well as chronic conditions representing the sequelae of recurrent GIB. Rates of ED and inpatient utilization were among the highest (2.74 inpatient admissions, 3.25 ED visits, and 18.96 inpatient days) in 2014, but these patients were among the least likely to remain persistently high-cost (34.6%).

Orthopedic Trauma (Trauma)

Distinguished by active diagnoses and procedures related to fractures and other traumatic events. These patients had the highest rates of inpatient utilization (26.09 inpatient days), but were the least likely to be persistently high-cost (11.8%).

Vascular Disease (Vascular)

Distinguished by active diagnoses and procedures related to peripheral vascular disease. Rates of utilization and spending in 2014 were among the lowest; rates of persistently high-cost status were close to average.

Surgical Infections and Other Complications (Complications)

Distinguished by active diagnoses and procedures related to surgical wounds, infections, and other iatrogenic complications. Patients in this subgroup had higher-than-average rates of inpatient utilization and total spending (19.69 inpatient days and total spending of $60,103), but were less likely to be persistently high-cost (31.1%). The rate of mortality in 2015 was among the highest (19.7%).

Cirrhosis with Hepatitis C (Liver)

Distinguished by active diagnoses and chronic conditions related to the diagnosis, management, and sequelae of hepatitis C infection. These patients had the highest average spending in 2014 ($78,706), which was driven predominantly by prescription drug costs (77.0% of total spending). Patients in this subgroup were more likely than average to remain high-cost (58.6%) and had the lowest rate of mortality in 2015 (0.0%).

ESRD with Increased Medical and Behavioral Comorbidity (ESRD+)

Distinguished by diagnoses, chronic conditions, and procedures related to ESRD and dialysis. Compared to the ESRD subgroup, there were higher rates of congestive heart failure, behavioral health disorders, liver failure, and cerebrovascular disease. Patients in this subgroup also had higher rates of preventable spending (4.8% vs. 3.0%) than those in the ESRD subgroup. These patients were the most likely to remain high-cost in 2015 (100.0%) and had the highest rate of mortality in 2015 (25.8%).

Cancer with High-Cost Imaging and Radiation Therapy (Oncology)

Distinguished by metastatic and non-metastatic cancer diagnoses, imaging procedures related to disease staging and surveillance, and brachytherapy. Patients were predominately male (88%) and the most common oncologic diagnosis was prostate cancer. Rates of 2014 utilization (3.25 inpatient days), spending ($46,240), persistent high-cost status (20.0%), and mortality in 2015 (5.5%) were among the lowest.

Neurologic Disorders (Neurologic)

Distinguished by active diagnoses and chronic conditions encompassing neurologic disorders (most notably multiple sclerosis) and neurologic diagnostic procedures. These patients had among the lowest rates of 2014 inpatient utilization, but were among the most likely to remain high-cost (75.6%), with spending driven by prescription drugs (51.7% of total spending).


We found that health care spending was highly concentrated in a national Medicare Advantage population. The highest cost 10% of patients accounted for 55% of total spending, a level of spending concentration similar to that of FFS Medicare.2,28 The majority of high-cost patients were persistently high-cost—65% remained in the highest cost decile the following year. This is in contrast to FFS Medicare, where rates of persistently high-cost status range from 25 to 45%.2,21,28

High-cost patients in this study had roughly twice as many co-occurring chronic conditions as non-high-cost patients, which is consistent with recent research in a FFS Medicare population.7 High-cost patients had higher rates of all comorbid conditions assessed, including diabetes, congestive heart failure, chronic obstructive pulmonary disease, hypertension, depression, and renal failure. Rates of comorbid conditions were higher than those previously described for high-cost FFS Medicare beneficiaries.2,21,28,29

To better understand the composition of this high-cost population, we used cluster analysis to identify subgroups of patients according to similarities across 161 demographic and clinical variables. We identified ten subgroups: acute exacerbations of chronic disease (mixed); end-stage renal disease (ESRD); recurrent gastrointestinal bleed (GIB); orthopedic trauma (trauma); vascular disease (vascular); surgical infections and other complications (complications); cirrhosis with hepatitis C (liver); ESRD with increased medical and behavioral comorbidity (ESRD+); cancer with high-cost imaging and radiation therapy (oncology); and neurologic disorders (neurologic). We found that these subgroups, while identified using only clinical and demographic data, had markedly different patterns of utilization, spending, and mortality.

Taken together, our findings hold important implications for the design and implementation of care management programs. First, these results add to a growing awareness of the heterogeneity of high-cost populations.6,7,11,22,30,31 Traditional narratives describe high-cost patient populations as being composed of individuals with multiple, poorly controlled chronic conditions, often with coincident frailty and behavioral health disorders.4 Though we found this description to be true in aggregate (i.e., higher-than-average rates of comorbidity among high-cost patients), it obscures substantial heterogeneity within the high-cost population. The mixed subgroup (roughly 60% of patients) more closely resembled the aforementioned narrative, but the remaining 40% of patients had disparate diagnoses and clinical compositions, often dominated by either a single condition or a single acute event.

The subgroups we identified share some similarities with high-cost patient subgroups identified in other populations. For example, research in FFS Medicare and single-health system populations has described subgroups of high-cost patients similar to the acute/mixed,6,7,30,31,32 ESRD/ESRD+,6,7 vascular,6 GIB,32 complications,33 trauma,30 and neurologic30 subgroups in this study. The incomplete overlap suggests that certain high-cost patient subgroups can be generalized across populations, but also that there is significant variability among different populations.

Second, we found disparate patterns of utilization, spending, and mortality across subgroups, suggesting that uniform care management strategies and interventions are likely to be insufficient. Traditional care management approaches—in which nurse care managers or other allied health professionals assist patients with disease management and medication adherence to reduce the risk of destabilization and inpatient utilization4,30—hold promise for subgroups with multi-morbidity and persistently high spending (e.g., ESRD and ESRD+). Interestingly, patients in the acute/mixed subgroup were relatively unlikely to remain high-cost (35.8%), casting doubt on the efficacy of traditional care management programs in reducing spending in this subgroup, and pointing to the importance of better identifying patients at risk of being persistently high-cost.

Among the other subgroups identified, traditional care management approaches are unlikely to be effective. For example, the neurologic and liver subgroups had among the highest rates of spending and persistently high spending. Within these subgroups, spending was driven predominately by prescription drug costs, indicating that the rational use and pricing of specialty pharmaceuticals may be the most effective strategies for reducing spending. For subgroups defined by acute events (trauma, GIB, complications), there may be limited opportunities to improve care and reduce spending. High rates of mortality among the ESRD, ESRD+, and GIB subgroups should prompt a focus on recognizing, and addressing, life-limiting illness with palliative care and other interventions.

This study has several limitations. First, our study population consisted of patients enrolled in MA plans offered by a single health insurer. As such, the subgroups we identified may not be generalizable to other populations, including other MA or FFS Medicare populations. However, less is known about MA populations, so our analysis begins to fill an important gap. Second, patterns of spending, utilization, and mortality among the clinical phenotypes we identified could be impacted by existing care management programs at CareMore34 and, therefore, may not be generalizable to other populations. Third, we did not have access to patient-level data on important social determinants of health (e.g., income, education, social isolation), despite a growing appreciation for the impact of these factors on spending and outcomes, especially among high-cost patients. Finally, we used the OPTICS algorithm for cluster analysis. Different clustering algorithms are likely to produce different results. However, as discussed in an accompanying article,23 we believe OPTICS is the optimal algorithm for clustering high-cost patient populations.