INTRODUCTION
Health systems are increasingly adopting intensive primary care and care coordination programs to improve outcomes for high-need, high-cost (HNHC) patients, the 5% of patients who account for over 50% of health care costs.1 However, research on such programs has shown mixed results: they improve patient satisfaction but have limited impact on quality of life, illness control, and need for acute care services.2, 3 As a group, HNHC patients are defined by their utilization of care rather than by their clinical conditions. Yet, to better manage HNHC patients, clinicians need to match patients to care models tailored to their clinical conditions.4 Here, we used an open-source machine learning method to describe subgroups of HNHC patients, based on their clinical characteristics, in an urban Medicaid population in the Mount Sinai Health System (MSHS).
METHODS
Study Population
We examined administrative claims from 34,764 patients insured by a Medicaid managed care organization operating in New York and New Jersey who were admitted to at least one MSHS hospital between 1/1/2014 and 12/31/2015. This study was approved by the Icahn School of Medicine at Mount Sinai Institutional Review Board (IRB-16-01066).
High-Need, High-Cost Criteria
We selected patients aged 18 years and older who met either of two inclusion criteria: (1) at least three admissions within any 12-month period between 2014 and 2015, or (2) at least two admissions within the same time period, with at least one serious mental health condition as a primary diagnosis. We based this definition on Johnson et al.5 Hospitalizations were identified from ICD-9 primary diagnosis codes in inpatient hospital claims; secondary and tertiary diagnoses were treated as comorbidities.
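The two inclusion criteria can be illustrated with a short sketch. This is not the authors' code: the function name, the tuple layout of an admission, and the approximation of a "12-month period" as a 365-day window starting at each admission are all assumptions made for illustration, as is the reading that the serious mental health diagnosis must fall inside the same window.

```python
from datetime import date, timedelta

# Assumption: a "12-month period" is approximated as a rolling 365-day window.
WINDOW = timedelta(days=365)

def meets_hnhc_criteria(age, admissions):
    """Return True if a patient satisfies either inclusion criterion.

    admissions: list of (admit_date, has_smi_primary) tuples, where
    has_smi_primary flags a serious mental health primary diagnosis.
    Both names are hypothetical; the paper does not describe a data layout.
    """
    if age < 18:
        return False
    ordered = sorted(admissions)
    # Starting a window at each admission covers every possible 12-month span.
    for i, (start, _) in enumerate(ordered):
        window = [adm for adm in ordered[i:] if adm[0] - start < WINDOW]
        if len(window) >= 3:
            return True  # criterion 1: three or more admissions
        if len(window) >= 2 and any(smi for _, smi in window):
            return True  # criterion 2: two or more, one with an SMI primary diagnosis
    return False

three_admissions = [(date(2014, 1, 10), False),
                    (date(2014, 6, 1), False),
                    (date(2014, 12, 20), False)]
two_with_smi = [(date(2014, 3, 1), True), (date(2014, 9, 1), False)]
two_without_smi = [(date(2014, 3, 1), False), (date(2014, 9, 1), False)]
two_far_apart = [(date(2014, 1, 1), True), (date(2015, 6, 1), False)]
```

In this sketch, `three_admissions` and `two_with_smi` qualify, while `two_without_smi` and `two_far_apart` do not.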
Data Preparation
Using Medicaid claims data, we created a dataset of patient features consisting of ICD-9-based clinical condition categories, 31 electronic health record clinical codes, and demographic variables, including age, sex, and neighborhood of residence. We used the Clinical Classification Software scheme to categorize each primary diagnosis ICD-9 code into one of 250 clinical condition categories.6 For each clinical condition category, we created a variable that was equal to one when a patient’s claim line item included a primary diagnosis code that fell into that category, and zero otherwise.
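As a rough illustration of the indicator variables described above, the following sketch builds one binary row per patient. The diagnosis codes, category labels, and function name are placeholders rather than real ICD-9 or Clinical Classification Software values, and skipping unmapped codes is an assumption of the sketch.

```python
def build_indicator_matrix(claim_lines, code_to_category, categories):
    """Map each patient to a 0/1 vector with one entry per condition category.

    claim_lines: iterable of (patient_id, primary_diagnosis_code) pairs.
    code_to_category: lookup from diagnosis code to condition category.
    All names and values here are hypothetical placeholders.
    """
    column = {cat: j for j, cat in enumerate(categories)}
    rows = {}
    for patient_id, primary_code in claim_lines:
        category = code_to_category.get(primary_code)
        if category is None:
            continue  # assumption: codes without a mapping are skipped
        row = rows.setdefault(patient_id, [0] * len(categories))
        row[column[category]] = 1  # set to one if any claim line falls in the category
    return rows

categories = ["mood_disorders", "diabetes", "heart_failure"]
code_to_category = {"CODE_A": "mood_disorders",
                    "CODE_B": "diabetes",
                    "CODE_C": "mood_disorders"}
claim_lines = [("p1", "CODE_A"), ("p1", "CODE_C"),
               ("p2", "CODE_B"), ("p1", "CODE_B")]
matrix = build_indicator_matrix(claim_lines, code_to_category, categories)
```

Repeated claim lines in the same category leave the indicator at one, so each row records whether a category ever appeared as a primary diagnosis, not how often.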
Data Analysis
Clustering is an unsupervised machine learning method for exploring non-parametric patterns within data that may not be discernible by parametric multivariate regression methods. We used affinity propagation (AP), a clustering algorithm that does not require the number of clusters in the data set to be known a priori.6
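To make the idea concrete, the sketch below implements the standard AP message-passing updates (responsibilities and availabilities with damping) in plain Python. It is not the apcluster implementation and does not reproduce the study's settings: the similarity measure (negative squared Euclidean distance), the preference value, the damping factor, the iteration count, and the toy two-dimensional points are all assumptions chosen for illustration.

```python
def negative_squared_distance(x, y):
    return -sum((a - b) ** 2 for a, b in zip(x, y))

def affinity_propagation(points, preference, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(points)
    # Similarity matrix; the diagonal holds the shared "preference" value.
    S = [[preference if i == k else negative_squared_distance(points[i], points[k])
          for k in range(n)] for i in range(n)]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when its self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return []
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = affinity_propagation(points, preference=-10.0)
```

The number of clusters is never passed in; it emerges from the messages, with the preference value controlling how readily points become exemplars. In an analysis like the one described here, each patient's feature vector would take the place of a toy point.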
We used the apcluster package in R (version 3.3.1) with RStudio (version 0.99.903) for our analysis. For ease of interpretation, we focused on the top 25 clusters by size. The results were interpreted for clinical salience by investigators with clinical expertise (SN and JS).
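Selecting the largest clusters amounts to counting members per cluster label and keeping the most frequent labels. A minimal sketch, assuming cluster assignments are available as a flat list of labels (the function name and toy labels are hypothetical):

```python
from collections import Counter

def top_clusters(labels, k=25):
    """Return the k largest clusters as (cluster_label, size) pairs, largest first."""
    return Counter(labels).most_common(k)

toy_labels = ["a", "b", "a", "c", "a", "b"]
largest_two = top_clusters(toy_labels, k=2)
```

Here `largest_two` holds cluster "a" with three members followed by cluster "b" with two.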
RESULTS
Cohort Characteristics
There were 2397 patients in our cohort. The average age of patients was 46.5 (standard deviation [SD] 15.0) years and 56% were female (Table 1). The average number of admissions was 79 (SD 45.2) and total cost of care was $50,700 (SD $68,300).
Clinical and Cost Characteristics for Top 25 Clusters
Table 1 presents the main findings. The two largest clusters were characterized by depression and other mood disorders. Twelve of the top 25 clusters were primarily mental health and substance use conditions. Other prominent clusters included pregnancy- and birth-related complications, heart conditions, and diabetes.
Average costs of care varied widely across the top 25 clusters, ranging from $7,000 to $76,600 per patient per year.
DISCUSSION
We used an open-source machine learning method to describe different subgroups of HNHC patients based on their clinical characteristics. The largest HNHC patient subgroups were characterized by mental and behavioral health conditions. We found marked heterogeneity in HNHC patient costs across the different subgroups. We also identified an unexpected patient population: patients with pregnancy-related complications.
References
1. Blumenthal D, Chernof B, Fulmer T, Lumpkin J, Selberg J. Caring for high-need, high-cost patients—an urgent priority. N Engl J Med 2016;375(10):909–911.
2. Bleich SN, Sherrod C, Chiang A, et al. Systematic Review of Programs Treating High-Need and High-Cost People With Multiple Chronic Diseases or Disabilities in the United States, 2008-2014. Prev Chronic Dis 2015;12:E197.
3. McCarthy D, Ryan J, Klein S. Models of Care for High-Need, High-Cost Patients: An Evidence Synthesis. Issue Brief (Commonw Fund) 2015;31:1–19.
4. Anderson GF, Ballreich J, Bleich S, et al. Attributes common to programs that successfully treat high-need, high-cost individuals. Am J Manag Care 2015;21(11):e597–600.
5. Johnson TL, Rinehart DJ, Durfee J, et al. For many patients who use large amounts of health care services, the need is intense yet temporary. Health Aff (Millwood) 2015;34(8):1312–1319.
6. Agency for Healthcare Research and Quality. Clinical Classifications Software (CCS) for ICD-9-CM. Available at: https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp. Accessed 4 Jan 2019.
Ethics declarations
Conflict of Interest
The authors declare that they do not have a conflict of interest.
Dr. Doupe and Ms. Villanueva were affiliated with the Department of Health System Design and Global Health, Arnhold Institute for Global Health, and Icahn School of Medicine at Mount Sinai during the time the work was conducted.
Cite this article
Nuti, S.V., Doupe, P., Villanueva, B. et al. Characterizing Subgroups of High-Need, High-Cost Patients Based on Their Clinical Conditions: a Machine Learning-Based Analysis of Medicaid Claims Data. J GEN INTERN MED 34, 1406–1408 (2019). https://doi.org/10.1007/s11606-019-04941-8