INTRODUCTION

Health systems are increasingly adopting intensive primary care and care coordination programs to improve outcomes for high-need, high-cost (HNHC) patients, the 5% of patients who account for over 50% of health care costs.1 However, research on such programs has shown mixed results: they improve patient satisfaction but have limited impact on quality of life, illness control, and need for acute care services.2, 3 As a group, HNHC patients are defined by their utilization of care rather than by their clinical conditions. Yet, to better manage HNHC patients, clinicians need to match patients to care models tailored to their clinical conditions.4 Here, we used an open-source machine learning method to describe subgroups of HNHC patients, based on their clinical characteristics, in an urban Medicaid population in the Mount Sinai Health System (MSHS).

METHODS

Study Population

We examined administrative claims from 34,764 patients insured by a Medicaid managed care organization operating in New York and New Jersey who were admitted to at least one MSHS hospital between January 1, 2014, and December 31, 2015. This study was approved by the Icahn School of Medicine at Mount Sinai Institutional Review Board (IRB-16-01066).

High-Need, High-Cost Criteria

We selected patients aged 18 years and older who fulfilled either of two inclusion criteria: (1) at least three admissions within any 12-month period between 2014 and 2015, or (2) at least two admissions within the same time period, with at least one serious mental health condition as a primary diagnosis. We chose this definition based on Johnson et al.5 Hospitalizations were identified from ICD-9 primary diagnosis codes in inpatient hospital claims; secondary and tertiary diagnoses were treated as comorbidities.
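
As an illustration only, the inclusion rule above could be sketched in Python as follows. This is not the study's code. The function name `is_hnhc`, the input layout (a list of admission date and mental-health-flag pairs), the reading of "any 12-month period" as a rolling 365-day window, and the requirement that the serious mental health diagnosis fall inside that same window are all assumptions made for the sketch.

```python
from datetime import date


def is_hnhc(age, admissions, window_days=365):
    """Return True if a patient meets either inclusion criterion.

    admissions: list of (admit_date, smi_primary_dx) tuples, where
    smi_primary_dx is True when the primary diagnosis of that admission
    is a serious mental health condition (hypothetical input layout).
    """
    if age < 18:
        return False
    adm = sorted(admissions)  # chronological order
    start = 0
    for end in range(len(adm)):
        # Shrink the window until it spans fewer than window_days days.
        while (adm[end][0] - adm[start][0]).days >= window_days:
            start += 1
        window = adm[start:end + 1]
        # Criterion 1: three or more admissions in the window.
        if len(window) >= 3:
            return True
        # Criterion 2: two or more admissions, one with a serious
        # mental health primary diagnosis.
        if len(window) >= 2 and any(smi for _, smi in window):
            return True
    return False


# Made-up example: two admissions six months apart, one flagged.
example = [(date(2014, 3, 1), False), (date(2014, 9, 1), True)]
meets_criteria = is_hnhc(40, example)
```

A sliding window is used so that every run of admissions falling within one 12-month span is examined exactly once, ending at its latest admission.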

Data Preparation

Using Medicaid claims data, we created a dataset of patient features consisting of ICD-9-based clinical condition categories, 31 electronic health record clinical codes, and demographic variables, including age, sex, and neighborhood of residence. We used the Clinical Classifications Software scheme to categorize each primary diagnosis ICD-9 code into one of 250 clinical condition categories.5 For each clinical condition category, we created a variable equal to one when any of a patient’s claim line items included a primary diagnosis code that fell into that category, and zero otherwise.
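
The construction of the binary condition variables can be sketched as below. This is a minimal illustration, not the study's pipeline: the function name `build_indicator_matrix`, the input layout, and the three-entry code-to-category mapping are hypothetical, and the toy mapping is not the actual Clinical Classifications Software crosswalk.

```python
def build_indicator_matrix(claim_lines, icd9_to_category):
    """Build one 0/1 row per patient, one column per condition category.

    claim_lines: iterable of (patient_id, primary_icd9_code) pairs.
    icd9_to_category: dict mapping an ICD-9 code to a category label.
    Returns (categories, rows) where rows maps patient_id to a list of
    0/1 indicators aligned with the sorted category labels.
    """
    categories = sorted(set(icd9_to_category.values()))
    column = {label: i for i, label in enumerate(categories)}
    rows = {}
    for patient_id, code in claim_lines:
        row = rows.setdefault(patient_id, [0] * len(categories))
        label = icd9_to_category.get(code)
        if label is not None:  # codes outside the mapping are ignored
            row[column[label]] = 1
    return categories, rows


# Toy mapping for illustration only (assumed labels, not real CCS output).
toy_mapping = {
    "296.20": "mood_disorders",
    "250.00": "diabetes",
    "428.0": "heart_failure",
}
toy_claims = [("p1", "296.20"), ("p1", "250.00"), ("p2", "428.0")]
categories, rows = build_indicator_matrix(toy_claims, toy_mapping)
```

Each variable records only whether a category ever appeared as a primary diagnosis for the patient, so repeated claims in the same category do not change the row.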

Data Analysis

Clustering is an unsupervised machine learning method for exploring non-parametric patterns within data that may not be discernible by parametric multivariate regression methods. We used affinity propagation (AP), a clustering algorithm that does not require the number of clusters in the data set to be specified a priori.6

We ran the analysis with the apcluster package in R (version 3.3.1) using RStudio (version 0.99.903). For ease of interpretation, we focused on the top 25 clusters by size. The results were interpreted for clinical salience by investigators with clinical expertise (SN and JS).
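
To make the mechanics of AP concrete, the following is a minimal pure-Python sketch of the general message-passing scheme. It is not the apcluster implementation used in this study. The similarity measure (negative squared Euclidean distance), the preference (median of the off-diagonal similarities), the damping factor, the fixed iteration count, and the fallback to a single exemplar are assumptions chosen for illustration; none of these settings is reported above.

```python
from statistics import median


def affinity_propagation(X, damping=0.5, iterations=200):
    """Cluster the rows of X; return (exemplar indices, label per row)."""
    n = len(X)
    # Similarity s(i, k): negative squared Euclidean distance (assumed).
    S = [[-sum((p - q) ** 2 for p, q in zip(X[i], X[k])) for k in range(n)]
         for i in range(n)]
    # Preference s(k, k): median off-diagonal similarity (assumed).
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:
        # Simplification for this sketch: fall back to one exemplar.
        exemplars = [max(range(n), key=lambda k: score[k])]
    labels = [i if i in exemplars
              else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Made-up binary feature rows: two groups, each with a central pattern.
toy = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
]
exemplars, labels = affinity_propagation(toy)
```

The number of clusters is never passed in; it emerges from which points end with a positive combined self-responsibility and self-availability. The naive loops here are cubic in the number of patients per iteration, so this form is suited to toy inputs only.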

RESULTS

Cohort Characteristics

There were 2397 patients in our cohort. The average age of patients was 46.5 (standard deviation [SD] 15.0) years and 56% were female (Table 1). The average number of admissions was 79 (SD 45.2) and the average total cost of care was $50,700 (SD $68,300).

Table 1 Characteristics of Top 25 Clusters

Clinical and Cost Characteristics for Top 25 Clusters

Table 1 presents the main findings. The two largest clusters were characterized by depression and other mood disorders. Twelve of the top 25 clusters were primarily mental health and substance use conditions. Other prominent clusters included pregnancy- and birth-related complications, heart conditions, and diabetes.

There was surprisingly large variation in average costs of care across the top 25 clusters, ranging from $7,000 to $76,600 per patient per year.

DISCUSSION

We used an open-source machine learning method to describe different subgroups of HNHC patients based on their clinical characteristics. The largest HNHC patient subgroups were characterized by mental and behavioral health conditions. We found marked heterogeneity in HNHC patient costs across the different subgroups. We also identified an unexpected patient population: patients with pregnancy-related complications.