There is national interest in improving care for patients with complex needs, for whom there are both humanitarian and financial imperatives to do better.1, 2 In an environment of increasing value-based payment and risk-sharing, strong incentives encourage the development of complex care models, which often rely on enhancing the relationship between high-needs patients and their ambulatory care providers.3,4,5 Yet few interventions have demonstrated reliable evidence of success. The reasons are myriad, but the heterogeneity of high-cost patients (and their needs) is one important factor.

A better understanding of complex patients is foundational to understanding and harnessing improvement opportunities.1 Published descriptions of patients with high costs are mixed in terms of rigor and frequently are limited by simplified frameworks, such as using decision trees to assign patients to hypothesized groups. These approaches can fail to recognize subgroups that are not intuitive.

One pathway forward is the application of patient segmentation methods. Segmentation methods define subtypes within a heterogeneous population, using data both to identify the groups that exist and to assign individuals to groups. This family of approaches is being deployed in varied industries,6 and is increasingly recognized as a key strategy to improve healthcare.7

Recently, several authors have used segmentation methods to cluster patients based on clinical conditions8,9,10 or patterns of utilization.11, 12 However, many of the studies focused on high-need patients have been restricted to specific populations or settings,13 such as Medicare8, 11, 12, 14 or Medicaid15, 16 beneficiaries, or patients of a safety net hospital,9, 17 and have used self-reported data or limited data sets to create patient clusters.

This paper goes beyond the existing literature by presenting a data-driven characterization of high-cost adults using comprehensive clinical data from a diverse, all-payer population of health plan members. We hypothesized that interventions for high-cost patients should be tailored based on clinical needs, and therefore focused our segmentation on patterns of chronic and acute conditions. While we did not set out to understand appropriateness of high spending in our population, we establish a granular description of high-cost adults that lays the foundation for subsequent work.


Study Design, Setting, and Data Sources

This is a retrospective observational cohort study of patients from Kaiser Permanente Southern California (KPSC). KPSC is an integrated healthcare delivery system and health plan with a comprehensive electronic health record (EHR). In 2010, KPSC cared for over 4.4 million members in 14 hospitals and over 220 medical offices across Southern California.18 The population of members is generally representative of the underlying population19 and includes Medicare and Medicaid beneficiaries as well as people with commercial and employer-sponsored coverage. We drew existing data from 2010 through 2014 from Kaiser Permanente’s internal data infrastructure (see eMethods in the online supplement).


We defined the high-cost population as the top 1% of continuously enrolled adult KPSC members with respect to total healthcare expenditures during 2010. To obtain the cohort, we first limited the adult membership of KPSC in 2010 (N = 2,714,005) to those with continuous coverage (N = 2,118,343) and then selected the top 1% (N = 21,183; eFigure 1).
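The cohort definition above can be illustrated with a minimal sketch. It is not the study's actual extraction code: the record fields (`id`, `age`, `continuous`, `spend`), the age threshold of 18 for "adult", and rounding the 1% cutoff down are all assumptions made for illustration.

```python
import math

ADULT_AGE = 18  # assumption: the source does not state the age cutoff


def select_high_cost(members, top_fraction=0.01):
    """Return continuously enrolled adults in the top `top_fraction` by spend.

    `members` is a list of dicts with hypothetical keys:
    'id', 'age', 'continuous' (bool), and 'spend' (total annual cost).
    """
    # Step 1: restrict to adults with continuous coverage.
    eligible = [m for m in members if m["age"] >= ADULT_AGE and m["continuous"]]
    # Step 2: rank from highest to lowest total spending.
    ranked = sorted(eligible, key=lambda m: m["spend"], reverse=True)
    # Step 3: keep the top fraction (rounded down, at least one slot).
    n_top = max(1, math.floor(len(ranked) * top_fraction))
    return ranked[:n_top]
```

Under this rounding rule, 1% of 2,118,343 continuously enrolled adults gives 21,183 members, which matches the cohort size reported above.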


We used the Centers for Medicare and Medicaid Services’ (CMS’s) publicly available ICD-9 grouping methodology, the Hierarchical Condition Categories (HCCs) system,20 to develop indicator variables that captured the presence or absence of acute and chronic conditions based on diagnostic codes for each patient. We collapsed some very rare condition categories (CCs) together (eMethods), resulting in a final list of 53 CCs used to define patient classes. We did not apply the hierarchies available in the CMS-HCC algorithm, but rather leveraged only the validated ICD-9 grouping function.
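A minimal sketch of how such indicator variables could be built is shown below. The diagnosis codes and category names are placeholders rather than real ICD-9 codes or CMS condition categories, and the function name `build_indicators` is hypothetical. Consistent with the description above, no hierarchy is applied: a patient keeps a flag for every category in which any of their codes falls.

```python
def build_indicators(patient_codes, code_to_category, categories):
    """Return {patient_id: [0/1, ...]} with one flag per condition category.

    patient_codes:    {patient_id: iterable of diagnosis codes}
    code_to_category: {diagnosis code: condition category} (grouping step;
                      rare categories can be collapsed by mapping them to
                      the same combined category name)
    categories:       ordered list of the category names used as indicators
    """
    index = {cc: j for j, cc in enumerate(categories)}
    indicators = {}
    for patient_id, codes in patient_codes.items():
        row = [0] * len(categories)
        for code in codes:
            cc = code_to_category.get(code)  # unmapped codes are ignored
            if cc in index:
                row[index[cc]] = 1  # presence/absence only, no hierarchy
        indicators[patient_id] = row
    return indicators


# Placeholder inputs for illustration only.
toy_mapping = {"dx_001": "CC_A", "dx_002": "CC_A", "dx_003": "CC_B"}
toy_patients = {"p1": ["dx_001", "dx_003"], "p2": ["dx_999"], "p3": []}
toy_flags = build_indicators(toy_patients, toy_mapping, ["CC_A", "CC_B", "CC_C"])
```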

Statistical Methods

Our primary analysis used a multivariate structural equations modeling technique called latent class analysis (LCA). LCA is based on the theory that an unobservable (“latent”) variable (i.e., class membership) can be measured through patterns of association in a set of observable “manifest” variables (in our case, the set of 53 CCs capturing patients’ acute and chronic conditions). The investigator must identify the set of observed data on which to compose the classes.
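For readers less familiar with LCA, the standard form of the model for binary indicators can be written as follows. The notation is ours and is not taken from the source; it assumes the usual local independence condition (indicators are independent within a class) and no covariates. Here y_{ij} is the 0/1 indicator for condition category j in patient i, J is the number of indicators (53 in this study), C is the number of latent classes, \pi_c is the proportion of the population in class c, and \rho_{jc} is the probability that condition j is present in class c.

```latex
P(\mathbf{Y}_i = \mathbf{y}_i)
  = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J}
    \rho_{jc}^{\,y_{ij}} \left(1 - \rho_{jc}\right)^{1 - y_{ij}},
\qquad \sum_{c=1}^{C} \pi_c = 1 .
```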

LCA categorizes a population into underlying subgroups,11, 21, 22 with the goal of arriving at the most “parsimonious and interpretable set of classes”.8, 23, 24 Increasingly complex models are generated by adding to the number of latent classes, and model-fit indexes are used to compare models until the best-fitting model (i.e., with the lowest Bayesian information criterion, and best interpretability and parsimony) is found.21
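The BIC comparison described above can be sketched as follows. This is an illustration, not the Mplus procedure used in the study: the parameter count assumes an unconstrained model with binary indicators and no covariates, and the function names are hypothetical. The log-likelihoods would come from the fitted models; none are reported here.

```python
import math


def lca_free_parameters(n_classes, n_indicators):
    """Free parameters in an unconstrained LCA with binary indicators:
    (C - 1) class proportions plus C * J conditional probabilities."""
    return (n_classes - 1) + n_classes * n_indicators


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def lowest_bic(log_likelihoods, n_indicators, n_obs):
    """Return (best number of classes, {n_classes: BIC}).

    log_likelihoods: {n_classes: maximized log-likelihood of that model}
    """
    scores = {
        c: bic(ll, lca_free_parameters(c, n_indicators), n_obs)
        for c, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores
```

Under these assumptions, a seven-class model with 53 indicators has 6 + 7 × 53 = 377 free parameters. The BIC is only one input: as noted above, interpretability and parsimony also inform the final choice, so the lowest-BIC model is a candidate rather than an automatic selection.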

Class membership is probabilistic. The model generates posterior probabilities of class membership, which sum to one for each individual. Once the best-fitting model is selected, individuals are assigned to the specific class for which their probability of membership is greatest.21, 25 The average of the posterior probabilities among members of each class can be used to assess quality of model fit. The classes are mutually exclusive and collectively exhaustive (eMethods).
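The assignment rule and the fit diagnostic described above can be sketched as follows, assuming the standard binary-indicator LCA with local independence. The function names and toy parameter values are illustrative and are not estimates from the study.

```python
import math


def posterior_probabilities(y, class_priors, item_probs):
    """Posterior P(class | y) for one patient's 0/1 indicator vector `y`.

    class_priors[c] is the class proportion (must be > 0);
    item_probs[c][j] is P(indicator j = 1 | class c), strictly in (0, 1).
    """
    log_joint = []
    for prior, probs in zip(class_priors, item_probs):
        total = math.log(prior)
        for y_j, p in zip(y, probs):
            total += math.log(p if y_j else 1.0 - p)
        log_joint.append(total)
    # Normalize in log space so that many indicators do not underflow.
    peak = max(log_joint)
    weights = [math.exp(v - peak) for v in log_joint]
    norm = sum(weights)
    return [w / norm for w in weights]


def modal_class(posterior):
    """Index of the class with the greatest posterior probability."""
    return max(range(len(posterior)), key=lambda c: posterior[c])


def average_posterior_by_class(posteriors):
    """Mean posterior probability of the assigned class among its members."""
    sums, counts = {}, {}
    for post in posteriors:
        c = modal_class(post)
        sums[c] = sums.get(c, 0.0) + post[c]
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```

Because each patient's posterior probabilities sum to one and each patient is assigned to exactly one class, the resulting classes are mutually exclusive and collectively exhaustive by construction.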

After completing the LCA, we used demographic, health behavior, utilization, spending, and survival data to describe and interpret the resulting classes (but not to compose the classes). For comparison with the top 1% of adults, we also drew a 10% simple random sample from the bottom 99% of spenders (“all other adults”). We used the MPLUS and SAS statistical software packages for all analyses. This study was approved by the appropriate institutional review boards.


In aggregate, the top 1% of patients accounted for 22% of total healthcare spending on continuously enrolled adult KPSC members during 2010. Table 1 shows basic descriptive statistics for high-cost adults versus all other adults, for contextual understanding. Annual per patient spending among the top 1% ranged from $40,000 to $3.6 million. Mean annual spending was $77,600 (median $57,500), which was more than 26 times the mean among the bottom 99% of spenders ($2900; median $1200).

Mean age among high-cost adults was 62 years, compared to a mean of 49 years in all other adults (Table 1). Forty-eight percent of high-cost adults were female (vs. 53% of all other adults), 51% were White (vs. 39%), and 51% were Medicare beneficiaries (vs. 16%). Diabetes was the most common condition among high-cost adults (34% of the top 1% vs. 7% of all other adults).

Table 1 Characteristics of Patients with High Costs Versus All Other Adults

Segmentation by Clinical Conditions

In the primary analysis, we tested LCA models with three to nine classes (eMethods) and selected a seven-class model as the best-fit solution based on fit statistics, interpretability, and parsimony.21, 23 In the final model, class sizes ranged from 5% (N = 1124) to 33% (N = 7014) of the high-cost population (eTable 1). The average of the posterior probabilities of class membership ranged from 0.79 to 0.89 (eTable 1), indicating good model performance.26

Figure 1 displays the prevalence of the model indicators conditional on class membership; eTable 2 presents the same data in tabular format. Conditions were considered distinctive if they were highly prevalent within a class or highly concentrated in a specific class (even if not highly prevalent).

Figure 1

Prevalence of condition categories within latent classes of high-cost adults. Highest-prevalence conditions within classes (columns), and conditions that are concentrated in specific classes more than others (rows) are distinctive class features. Figure shows the percent of patients within each class having each condition category. Categories are not mutually exclusive.

Some conditions were ubiquitous across all of the classes: diabetes, vascular diseases, and heart conditions were present to some extent across nearly every class. Within this context, three classes contained distinct groups of patients with specific constellations of chronic conditions: the end-stage renal disease (ESRD), cardio-vascular and pulmonary diseases, and diabetes with multiple comorbid conditions (MCCs) classes represented 12%, 17%, and 8% of the population, respectively (Fig. 1, eTable 1). The ESRD class contained patients with dialysis status and accompanying chronic comorbidities (Fig. 1, eTable 2). The cardio-vascular and pulmonary disease group had a high prevalence of heart conditions (e.g., congestive heart failure, coronary artery disease, vascular disease) and chronic obstructive pulmonary disease and low prevalence of diabetes. In contrast, patients in the diabetes with MCCs class had diabetes and commonly co-occurring comorbidities (e.g., dialysis status, congestive heart failure), as well as complications of diabetes (e.g., wound-healing and amputation codes, retinopathy).

Another class had acute exacerbations of chronic illness. This class made up 11% of the sample (eTable 1) and was the oldest class with a mean age of 71 years (eTable 3). In this class, we observed common chronic conditions, plus high prevalence of acute illnesses that are frequently observed as patients deteriorate and are proximate causes of death, such as sepsis and shock, acute renal failure, cardio-respiratory failure and arrest, intestinal obstruction/perforation, and artificial openings for feeding or elimination (Fig. 1, eTable 2). This class had the greatest number of individual members with ten or more comorbid conditions (Fig. 2). A somewhat larger class contained patients receiving highly specialized treatments (14% of the sample) such as organ transplantation and patients with immune disorders and cancers (Fig. 1, eTable 2). The smallest class contained patients with neurologic and catastrophic conditions (5% of the sample) such as stroke or post-stroke paralysis, seizure disorders, trauma such as severe head injuries and burns, and paralysis and coma (Fig. 1, eTable 2).

Figure 2

Distribution of number of condition categories, by latent class. The largest class (class 7) is dominated by patients with very few HCCs, while > 50% of patients in classes 4 and 6 have eight or more HCCs. The remaining classes have similar and more moderate comorbidity burdens and vary in class size. Figure shows the count of patients in each class according to their total number of condition categories, among the 53 condition categories used in the analysis.

The largest class (33% of the sample) had few comorbidities (eTable 1). They were the youngest class with a mean age of 54 years and included more women and people with commercial insurance (eTable 3). The preponderance of patients in this class (68%) had two or fewer co-occurring conditions (Fig. 2), which represented a markedly lower burden of illness than among the other six classes.


After 4 years, 69% of high-cost adults from 2010 were alive, compared to 98% of all other adults (Fig. 3). Survival among high-cost adults varied widely between the classes. Four-year survival was best (88%) for those with few comorbidities and worst for the acute exacerbations of chronic illness (43%) and diabetes with MCCs (46%) classes. The remaining classes had survival in the range of 62 to 73% at 4 years (Fig. 3).

Figure 3

Survival at 1 year and 4 years, by latent class. More than 80% of patients in all classes except class 6 survive at least 1 year after the high-cost year, but survival at 4 years is more variable. Classes 4 and 6 have the worst survival at the end of 2014, while Class 7 has the best. Figure shows the percent of patients within each class who were surviving at 1 year (December 31, 2011) and at 4 years (December 31, 2014); survival through the end of 2010 was a criterion for inclusion in this study. Survival data are based on vital statistics linked to Kaiser Permanente data systems; survival is known for all patients including those lost to follow-up after exiting Kaiser Permanente coverage.

Healthcare Spending

Figure 4 shows 2010 spending in aggregate for each class as a whole (panel A) and as an average (spending per person) within each class (panel B), broken down by type of care. While average spending was lowest in the few comorbidities class ($65,000 per person in 2010), their total aggregate spending as a class ($456 million) was the highest because of the large class size (33% of the sample).

Figure 4

Total aggregate (Panel A) and average per patient (Panel B) spending by type of care, for latent classes, 2010. Total aggregate spending is greatest in class 7 despite low comorbidity burden, due to the large class size. In all classes except class 1, hospital costs dominate both total and average per patient spending. Outpatient drug spending is greatest in class 7, at $67.6 million in aggregate and $9600 per person. Home health, hospice, and skilled nursing costs are highest in classes 2 and 6, as expected based on clinical profiles. Panels show total spending in 2010 by all patients within each class (“Total aggregate”) and average per patient spending in 2010 within each class (“Average”); data labels show totals for each column. “All Other” costs include laboratory, radiology, home health, home health prescriptions, skilled nursing facility, hospice, and ambulance costs. Cost data are calculated using Kaiser Permanente’s standard cost accounting methodology and capture all utilization by members through both network and non-network providers; costs are rounded to the hundreds place.

In contrast, the acute exacerbation of chronic illness and neurologic and catastrophic classes had the highest average per person spending ($109,900 and $98,200, respectively). Patients in the ESRD and cardio-vascular and pulmonary classes had lower average per person spending ($66,000 and $67,500, respectively). For all classes except ESRD, hospital services accounted for most of total spending (Fig. 4).

Persistence of high spending was limited over four subsequent years. Quarterly median individual spending by class is shown in Figure 5. Substantial regression to the mean after 2010 is apparent in all classes except the ESRD and diabetes with MCCs classes. In every class, there remained individual patients with very high spending at points in time. However, only 2126 patients (10%) sustained top 1%-level spending over all 5 years (data not shown).

Figure 5

Quarterly median per patient spending from 2010 through 2014, by latent class, with 25th, 75th, and 90th percentiles. Class 1 and class 4 show stable median spending over 4 years of follow-up, while spending in all other classes shows rapid regression to the mean. However, the distribution of spending remains wide throughout the follow-up time in every class. Figure displays quarterly individual-level spending data for 2010 through 2014, showing the 25th, 50th (median), 75th, and 90th percentiles of costs within each latent class. Top 1%-level spending during 2010 was a criterion for inclusion in the study; in quarters after 2010, median spending declined rapidly across most classes. Decedents and patients who were lost to follow-up are included in the statistics up until the time of their exit from the population; thereafter (in the quarters in which they have missing data), they are excluded from the statistics. Patients with coverage and known spending of $0 in given quarters are included in statistics.
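The inclusion rules in this caption (patients drop out of the statistics once they exit the population, while covered patients with $0 spending remain) can be sketched as follows. The data layout and the linear-interpolation percentile rule are assumptions, since the source does not specify how percentiles were computed.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no observations in this quarter")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def quarterly_summary(spending_by_quarter, quantiles=(25, 50, 75, 90)):
    """Percentiles of per patient spending for each quarter.

    spending_by_quarter: {quarter label: [spend or None, one per patient]}
    None marks a patient who has exited (death or loss to follow-up) and
    is excluded; 0.0 marks a covered patient with no spending and is kept.
    """
    summary = {}
    for quarter, values in spending_by_quarter.items():
        observed = sorted(v for v in values if v is not None)
        summary[quarter] = {q: percentile(observed, q) for q in quantiles}
    return summary
```

Keeping $0 quarters while dropping missing ones matters for interpretation: the former pull the median down as spending regresses, whereas the latter shrink the denominator as patients die or leave coverage.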


A rich understanding of the segmentation of high-cost patients is foundational to any strategy to improve their care.1 In this analysis of the top 1% in a large integrated healthcare system, we took advantage of an all-payer population with comprehensive data across the continuum of care. We developed a detailed characterization of high-cost adults using a person-centered, data-driven approach. Our analysis is the largest to our knowledge, demonstrating the feasibility of LCA with a large population and more than 50 variables used in fitting the model.

We identified seven classes of high-cost patients, clustering based on categories of diagnosis codes alone. The constellations of conditions present in each of the latent classes showed clinically intuitive patterns of comorbidity. As opposed to classifications based on expert judgment, LCA directly reflects the patterns of comorbidity that exist in the data, and offers quantitative information about the quality of the model fit. Of the 21,183 participants in our study, more than 85% had 0.70 or greater posterior probability of membership in the class to which they were assigned, and all classes exceeded the in-class average membership probability threshold established in the literature. Compared to alternative empirical approaches, LCA has advantages such as the option to include covariates and the ability to prospectively assign other subjects to proven clusters if they have measurements for the same set of manifest data.

Our results both confirm the prior literature and add nuance to the clinical segmentation of high-cost adults. In particular, we demonstrate that chronically ill high-cost patients are not monolithic, but rather segment into distinctive subtypes. We suggest that it is the combinations of comorbidities, rather than specific individual conditions, that characterize these high-cost patients. One potential policy action supported by our findings is continued movement toward clinical practice guidelines that address or incorporate multi-morbidity, to improve clarity about evidence-based management of these complex patients.27

Several of the classes of high-cost patients that arose in our analysis align with the existing literature, such as groups characterized by catastrophic injury or illness, neurological disease, renal disease, and minimal comorbidity.8, 17 However, our work makes an important contribution as we used an all-payer population (vs. Medicare-only data) and an expansive list of condition indicators (vs. a limited list) based on a comprehensive inventory of diagnosis codes (vs. self-report). Closest to our findings are the subgroups reported by Whitson et al. in an LCA of 14,052 Medicare beneficiaries based on self-reported variables capturing 13 chronic conditions.8 However, the model reported by Whitson et al. had high misclassification error, while ours did not. A comparison of the subtypes of patients based on clinical conditions in their analysis and ours is provided in the online supplement (eTable 4).

In most of the classes we identified, average spending rapidly decreased after the high-cost year. This pattern may leave little opportunity (or cause) to intervene, since for many patients, costs resolve on their own over time. Failure to recognize this pattern could lead evaluators of pilot programs to incorrectly attribute cost reductions to their interventions. Future work should better understand individual trajectories of spending and their predictors, so that interventions can be planned for groups of patients who are expected to have sustained patterns of high spending.

Spending patterns over time were more persistent in the ESRD and diabetes with MCCs classes in our analysis, both of which contained substantial numbers of patients receiving dialysis. Dialysis costs made up a large share of total spending for the ESRD class, but hospital costs dominated in the diabetes with MCCs class. These findings suggest a need both for upstream work to prevent and delay the transition to dialysis status and to ensure optimal starts, and for focused effort to optimize care for these complex patients in terms of coordination, efficiency, and shared decision-making. Future work should investigate the level of diabetes control in this population, to understand whether interventions to improve self-management might be indicated.

Our study has limitations. We focused on the top 1% of patients, which is a narrow definition of patients with complex needs (but yielded a very large cohort in our population). We lack comparable cost data from prior to 2010 because of a transition to a new medical records and billing system and therefore cannot discern whether 2010 was the first high-cost year for patients in our study. Kaiser Permanente’s cost data may not be readily generalizable to other providers, but are based on a consistent cost accounting methodology (see eMethods) and can be used to understand relative differences in spending within our membership. Furthermore, our population, while more general and representative than many studied previously (which are often payor-specific), may not be directly generalizable to the US population as a whole because of the integrated system in which members receive care.

The LCA methodology used in our analysis is a data-driven segmentation approach,21 but it does rely on the investigator to select the set of manifest data for fitting the model. We used diagnosis-based condition indicators, a clinician-oriented approach. Future analyses should test other sets of manifest data for identifying clusters of complex patients. Other industries have begun assembling broader datasets, which could add substantially to our understanding of this population if new data (e.g., social needs and care preference data) and existing cross-sectoral data (e.g., from correctional authorities and social services) could be leveraged.28 Our analysis is the largest to our knowledge, demonstrating the feasibility of LCA with a large population and many indicator variables.

To minimize ascertainment bias, we required continuous enrollment during the study period. While this approach might lack sensitivity for conditions untreated during the study year, it should identify chronic conditions that persist over time, particularly in our population of high-use patients. It also identifies acute events such as trauma or heart attacks.

We did not set out to distinguish appropriate from inappropriate care or spending, and our results cannot be used to make direct inferences about opportunities for improvement. Much of the care reflected in our analysis may represent high-quality care that is concordant with patient preferences. However, a recent study focused on the top 10% of Medicare fee-for-service beneficiaries concluded that there is some potentially preventable spending within the high-cost population, particularly among frail elders.29 In our analysis, many patients have advanced illness or experience an overwhelming cascade of illness, and survival is poor. Thus, we would suggest that interventions in this population should focus on improving the delivery of person-centered, high-quality, and goal-concordant care that maximizes dignity, rather than the ubiquitous focus on costs as the primary outcome. Given the dynamic nature of this population, carefully controlled studies are essential to demonstrate the impact of any intervention.1, 28

In summary, our study offers a detailed descriptive characterization of high-cost patients, distinguishing seven distinctive classes of patients with various combinations of acute and chronic conditions. While the classes were composed on the basis of clinical conditions alone, they differed substantially with respect to four-year survival and future spending. This work builds on and extends conceptual patient taxonomies and analyses driven by ex ante categorization, and it lays the foundation for future work to explore actionable insights. Additional efforts are needed to differentiate appropriate and inappropriate spending and design intervention strategies accordingly, to meet the needs of these challenging populations.