Background

Population segmentation divides patients into parsimonious, relatively homogeneous subgroups, or segments, based on their healthcare requirements. This can aid healthcare resource planning and the development of intervention programs targeted at specific patient subgroups [1, 2]. When the current and future healthcare requirements of each segment are understood, more targeted and efficient care can be delivered to that segment. This is especially critical in Singapore, with its rapidly ageing population and increasing chronic disease burden [3]. Healthcare expenditure is projected to triple, from Singapore Dollars (SGD) $4 billion (USD $2.98 billion) in 2011 to SGD $12 billion (USD $8.94 billion) in 2020 [4]. Healthcare in Singapore is mainly the responsibility of the Singapore Ministry of Health (MOH). For Singapore citizens and permanent residents, MOH uses a mixed financing system that includes national healthcare insurance schemes and deductions from the compulsory savings plan, the Central Provident Fund (CPF) [5]. Delivering effective, targeted care for an ageing population while coping with rising healthcare costs requires a deep understanding of the population's health characteristics and healthcare needs. Population segmentation is a critical first step in developing effective healthcare policy. It gives policy makers more detailed information about the health characteristics and healthcare needs of each population segment, so that health intervention programs can be tailored to different segments. This, in turn, leads to better policy decisions on healthcare resource allocation and planning.

There are two major approaches to population segmentation. The first is a data-driven approach, in which segments are derived by statistical analysis of empirical health data (e.g. cluster analysis, latent class analysis, classification trees). The second is an expert-defined approach, in which segments are decided through expert review and consensus on the current evidence in the literature. The two approaches are not mutually exclusive, and a hybrid approach may combine data with expert input. Examples of the data-driven approach include Lafortune's latent class analysis of data from a trial [6], Liu et al.'s study of participants in the Taiwan National Health Insurance survey [7] and Van der Laan et al.'s demand-driven segmentation model [8]. In these studies, health-related data, including medical, behavioral, functional and socio-demographic data, were used to derive segments and to profile the characteristics of each segment.

Alternatively, segments can be defined a priori through expert review and consensus on the current evidence in the literature. Published expert-defined approaches include Lynn et al.'s Bridges to Health person-centered segmentation framework [9], Kaiser Permanente's Senior Segmentation Algorithm for persons aged 65 years or older [10] and the National Academy of Medicine Patient Taxonomy [11]. In previous work, we assessed the feasibility of segmenting a general patient population into six segments defined by Singapore Health Systems Regional Health System (SingHealth RHS) experts. We found the framework feasible as a proof of concept for identifying patient segments with distinct healthcare utilization and mortality patterns [12]. However, that study could not assess whether segment membership predicts long-term healthcare utilization and mortality. A segmentation approach needs to be validated, and adjusted where necessary, before it is applied clinically or in policy within a healthcare system [13]. In our policy context, this means validating the segmentation against long-term healthcare utilization and mortality. This is also a critical gap in the literature, because it is not clear whether segments produced by expert-defined approaches differ in long-term healthcare utilization and mortality.

In this study, we aimed to address this critical gap by assessing the ability of our expert-defined segmentation approach to predict 3-year healthcare utilization (defined as hospital admissions, emergency department attendances and specialist outpatient clinic attendances) and mortality.

Methods

Study design

We conducted a retrospective study to segment all adult patients (≥ 21 years of age in 2012) who utilized healthcare services at SingHealth RHS in 2012. Patients younger than 21 years were excluded. This study was approved by the SingHealth Centralized Institutional Review Board (CIRB 2016/2294). De-identified data from 2012 to 2015 were extracted from the electronic health records (EHRs) using Oracle Business Intelligence Enterprise Edition (OBIEE) software [14]. The extracted variables included socio-demographic data, chronic diseases, healthcare utilization (hospital admissions, emergency department attendances and specialist outpatient clinic attendances) and mortality.

Segmentation classification

A previously described segmentation framework was used [12]. The experts who developed the framework are senior health administrators with extensive experience in both health policy and clinical care, which helps ensure its relevance to policy and implementation in our healthcare system. Patients were segmented into six non-overlapping subgroups: Mostly Healthy, Stable Chronic, Serious Acute, Complex Chronic without Frequent Hospital Admissions, Complex Chronic with Frequent Hospital Admissions, and End of Life. The definitions of the segments, with examples, are given in Additional file 1 and Additional file 2. We defined frequent hospital admissions as 3 or more hospital admissions in the past 12 months, a proxy for high-cost users [15,16,17,18].
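The full segment definitions are in Additional files 1 and 2 and are not reproduced here. The sketch below therefore covers only the one rule stated in this paragraph: flagging frequent hospital admissions (3 or more in the past 12 months) to split Complex Chronic patients into two segments. It is a minimal Python illustration under stated assumptions, not the study's implementation. The function names, the 365-day window standing in for "past 12 months", and the 31 December 2012 reference date are all hypothetical.

```python
from datetime import date, timedelta

# Threshold stated in the text: 3 or more admissions in the past 12 months.
FREQUENT_ADMISSION_THRESHOLD = 3


def count_recent_admissions(admission_dates, reference_date, window_days=365):
    """Count admissions dated from reference_date - window_days up to
    reference_date, both ends inclusive (the 365-day window is an assumed
    stand-in for "past 12 months")."""
    start = reference_date - timedelta(days=window_days)
    return sum(1 for d in admission_dates if start <= d <= reference_date)


def has_frequent_admissions(admission_dates, reference_date):
    """True if the patient meets the frequent-admission criterion."""
    count = count_recent_admissions(admission_dates, reference_date)
    return count >= FREQUENT_ADMISSION_THRESHOLD


def split_complex_chronic(admission_dates, reference_date):
    """Hypothetical helper: assumes the patient has ALREADY been classified as
    Complex Chronic under the expert-defined rules (not reproduced here)."""
    if has_frequent_admissions(admission_dates, reference_date):
        return "Complex Chronic with Frequent Hospital Admissions"
    return "Complex Chronic without Frequent Hospital Admissions"


# Example: three admissions during 2012, assessed at 31 December 2012.
example = [date(2012, 2, 10), date(2012, 6, 3), date(2012, 11, 20)]
print(split_complex_chronic(example, date(2012, 12, 31)))
```

Keeping the threshold as a named constant makes it easy to test alternative cut-offs for "frequent" in a sensitivity analysis.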

Statistical analysis

We first compared socio-demographics and hospital utilization in the baseline year 2012 across segments, using Chi-square tests for categorical variables and one-way ANOVA for continuous variables. All patients entered the study on 1st January 2013. Survival time was calculated as the number of days from entry to death for patients who died on or before 31st December 2015. Censored patients were assigned 1094 days, the number of days from entry to 31st December 2015. Kaplan-Meier survival curves were plotted, and differences between the curves were assessed with the log-rank test. To determine whether hospital utilization from 2013 to 2015 differed between segments, we first conducted bivariate analyses of population segment against hospital utilization using ANOVA or Chi-square tests. The utilization counts were over-dispersed, with most patients having no utilization. We therefore modelled hospital utilization with negative binomial regression, using the Mostly Healthy segment as the reference group and adjusting for age, gender, ethnicity and past hospital utilization. Survival time was used as the exposure variable in the negative binomial models. We also conducted two-degree-of-freedom Chi-square tests between each pair of segments to test for significant differences in hospital utilization. All analyses were performed in STATA/IC 13.1.
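The survival-time definition and the Kaplan-Meier estimator described above can be sketched in a few lines of standard-library Python. This is an illustration only, since the analysis itself was run in STATA. The function names are hypothetical. The sketch assumes that a missing death date means the patient was alive at the end of follow-up, and that no recorded death precedes study entry.

```python
from datetime import date

STUDY_ENTRY = date(2013, 1, 1)   # day 0 for every patient
STUDY_END = date(2015, 12, 31)   # administrative censoring date


def survival_time(death_date=None):
    """Return (days, event), with event = 1 for a death on or before
    31 Dec 2015 and 0 for a censored patient.

    Assumptions: death_date is None when no death was recorded, and any
    recorded death_date is on or after STUDY_ENTRY.
    Censored patients get (STUDY_END - STUDY_ENTRY).days, i.e. 1094 days.
    """
    if death_date is not None and death_date <= STUDY_END:
        return (death_date - STUDY_ENTRY).days, 1
    return (STUDY_END - STUDY_ENTRY).days, 0


def kaplan_meier(times, events):
    """Product-limit estimator.

    Returns a list of (t, S(t)) pairs, one per distinct time with >= 1 death.
    Subjects censored at t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject (death or censored) whose time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve
```

With day 0 at 1 January 2013, survival at the end of each study year corresponds to the last estimate on the curve at or before day 364 (end of 2013), day 729 (end of 2014) and day 1094 (end of 2015).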

Results

Patient baseline characteristics and acute hospital utilization

A total of 819,993 patients were included and classified into the six segments, in the proportions shown in Table 1. The overall mean age of the study population was 49.8 years (standard deviation [SD] 17.2). There were more female than male patients (58% vs. 42%). Age and gender differed significantly between segments (p < 0.001).

Table 1 Demographics and Healthcare Utilization of Patients by Segments in Baseline Year 2012

Hospital utilization in 2012 increased progressively from the Mostly Healthy segment to the Complex Chronic with Frequent Hospital Admissions segment (Table 1). Differences between the six segments in emergency department (ED) visits, specialist outpatient clinic (SOC) visits and hospital admissions were all statistically significant (p < 0.001). Unsurprisingly, patients in the Complex Chronic with Frequent Hospital Admissions segment had more frequent admissions, as this was a criterion for inclusion in the segment. However, the same pattern of increased utilization was also seen for SOC and ED attendances, suggesting that this segment has increased healthcare utilization across multiple settings.

Bivariate analyses of segments and hospital utilization from year 2013 to 2015

Hospital utilization from 2013 to 2015 followed the same trend as in 2012. ED visits, SOC visits and hospital admissions increased from the Mostly Healthy segment to the Complex Chronic with Frequent Hospital Admissions segment (Table 2). Patients in the End of Life segment had the most SOC visits of all six segments (mean 43.2, SD 50.8). However, they had significantly fewer ED visits (mean 0.88, SD 1.67) and hospital admissions (mean 1.33, SD 2.10) than patients in the Complex Chronic with Frequent Hospital Admissions segment (ED visits: mean 4.00, SD 7.29; hospital admissions: mean 4.49, SD 6.32). Hospital utilization differed significantly across the six segments (p < 0.001).
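The over-dispersion that motivated the negative binomial model can be checked directly against the means and SDs reported above. A Poisson count has variance equal to its mean, so a variance-to-mean ratio well above 1 indicates extra-Poisson variation. In the common NB2 parameterization, the variance is μ + αμ², which can absorb that extra variation. The sketch below reuses only the figures from this paragraph; the dictionary labels and the helper name are ours.

```python
# Means and SDs of 2013-2015 utilization counts reported in the text above.
reported = {
    "End of Life, SOC visits": (43.2, 50.8),
    "End of Life, ED visits": (0.88, 1.67),
    "End of Life, hospital admissions": (1.33, 2.10),
    "Complex Chronic with Frequent Hospital Admissions, ED visits": (4.00, 7.29),
    "Complex Chronic with Frequent Hospital Admissions, hospital admissions": (4.49, 6.32),
}


def dispersion_ratio(mean, sd):
    """Variance-to-mean ratio; about 1 for Poisson-distributed counts."""
    return sd ** 2 / mean


for label, (mean, sd) in reported.items():
    print(f"{label}: variance/mean = {dispersion_ratio(mean, sd):.1f}")
```

Every reported ratio exceeds 1. For example, ED visits in the Complex Chronic with Frequent Hospital Admissions segment give 7.29² / 4.00 ≈ 13.3, far above the value a Poisson model would imply.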

Table 2 Bivariate Analyses of Segments versus Hospital Utilization Rates from Year 2013 to 2015

Multivariable negative binomial regression on hospital utilization from year 2013 to 2015

Compared with the Mostly Healthy segment, patients in all other segments had significantly more ED visits (p < 0.001) after adjusting for age, gender, ethnicity and hospital utilization in 2012 (Table 3). Patients in the Complex Chronic with Frequent Hospital Admissions segment had 14.5 times as many ED visits as patients in the Mostly Healthy segment (incidence rate ratio [IRR] 14.5, 95% confidence interval [CI]: 13.49–15.64). Patients in the End of Life segment also had a markedly higher rate of ED visits than patients in the Mostly Healthy segment (IRR 9.56, 95% CI: 8.51–10.75).
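The negative binomial model uses a log link with survival time as the exposure. Each segment coefficient β therefore exponentiates to an incidence rate ratio, and a Wald 95% CI is symmetric on the log scale. The sketch below shows that arithmetic under the assumption of a Wald interval, using hypothetical helper names and a hypothetical baseline rate. It then uses the arithmetic to back out the log-scale standard error implied by the reported ED-visit IRR for the Complex Chronic with Frequent Hospital Admissions segment.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def irr_with_ci(beta, se, z=Z_95):
    """Exponentiate a log-link coefficient into an IRR with a Wald 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Recover the log-scale standard error implied by a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def expected_count(baseline_rate_per_day, irr, exposure_days):
    """Log-link mean with exposure and a single segment indicator:
    mu = exposure * exp(b0 + b1) = exposure * exp(b0) * IRR.
    baseline_rate_per_day (exp(b0)) is a hypothetical input."""
    return exposure_days * baseline_rate_per_day * irr


# Reported ED-visit IRR: 14.5 (95% CI 13.49-15.64).
se = se_from_ci(13.49, 15.64)
midpoint = math.sqrt(13.49 * 15.64)  # geometric midpoint of the CI
print(f"log-scale SE ~ {se:.3f}, CI midpoint ~ {midpoint:.2f}")
```

The geometric midpoint of the reported interval is about 14.5, consistent with the point estimate after rounding. This is the kind of quick check that can catch transcription errors in a results table.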

Table 3 Multivariable Negative Binomial Regression on Hospital Utilization from Year 2013 to 2015

For SOC visits, all other segments had significantly higher utilization than the Mostly Healthy segment (all p < 0.001). After adjusting for baseline variables and hospital utilization in 2012, patients in the End of Life segment had 11.50 times the SOC utilization of patients in the Mostly Healthy segment (IRR 11.50, 95% CI: 10.68–12.39). Patients in the Complex Chronic with Frequent Hospital Admissions segment also had significantly higher utilization than patients in the Mostly Healthy segment (IRR 7.71, 95% CI: 7.31–8.13).

Lastly, compared with the Mostly Healthy segment, all other segments had significantly more inpatient admissions (all IRRs > 1, p < 0.001). Patients in the Complex Chronic with Frequent Hospital Admissions segment had the highest IRR for hospital admissions from 2013 to 2015 (IRR 22.66, 95% CI: 21.07–24.37). Patients in the End of Life segment had the second highest (IRR 16.18, 95% CI: 14.49–18.07).

For each model, the pairwise Chi-square tests showed significant differences between every pair of segments (p < 0.001).

Analysis of survival time

Day 0 was 1st January 2013. At the end of 2013, survival rates in the End of Life and Complex Chronic with Frequent Hospital Admissions segments were 74.6% and 81.7%, respectively. Survival rates in the Complex Chronic without Frequent Hospital Admissions, Stable Chronic, Serious Acute and Mostly Healthy segments were all > 95%.

At the end of the second year (2014), survival rates in the End of Life and Complex Chronic with Frequent Hospital Admissions segments were 64.6% and 71.0%, respectively. Survival rates in the Complex Chronic without Frequent Hospital Admissions, Stable Chronic, Serious Acute and Mostly Healthy segments were all > 93%.

At the end of 3 years (end of 2015), patients in the End of Life segment had the lowest survival rate (58.2%), followed by patients in the Complex Chronic with Frequent Hospital Admissions segment (62.6%). Throughout 2013–2015, survival rates in the Mostly Healthy, Serious Acute and Stable Chronic segments were indistinguishable from each other and higher than in the other three segments (Additional file 3). The log-rank test for equality of the six survival distributions showed a statistically significant difference between segments (p < 0.001).
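The log-rank test compares observed with expected deaths at each event time. The study tested all six segments jointly, which gives a chi-square statistic with 5 degrees of freedom. For brevity, the sketch below implements only the two-group case (1 degree of freedom) in standard-library Python. It is illustrative only and is not the STATA routine used in the analysis. The function name is ours, and its O(n × T) risk-set counting is intended for small examples.

```python
import math
from collections import Counter


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    times_* are follow-up times, events_* are 1 (death) or 0 (censored).
    Returns (chi-square statistic with 1 df, p-value).
    """
    deaths_a = Counter(t for t, e in zip(times_a, events_a) if e)
    deaths_b = Counter(t for t, e in zip(times_b, events_b) if e)
    event_times = sorted(set(deaths_a) | set(deaths_b))

    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0   # hypergeometric variance, summed over event times
    for t in event_times:
        # Risk sets: everyone whose follow-up time is >= t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        n = n_a + n_b
        d = deaths_a[t] + deaths_b[t]
        o_minus_e += deaths_a[t] - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With six groups, the same observed-minus-expected quantities are computed for each group. They are then combined with their covariance matrix into a single statistic with 5 degrees of freedom.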

Discussion

Our findings indicate that our previously developed six-segment framework predicts long-term healthcare utilization and mortality. Healthcare utilization and mortality increased with the complexity of the segments, suggesting that our segmentation approach was able to discriminate between patients with different healthcare needs and risks of mortality. Patients in the Complex Chronic with Frequent Hospital Admissions segment represented 0.5% of the study population. However, in the 3 years (2013–2015) following the initial healthcare encounter in 2012, they had the highest risk of hospital admissions and ED visits per patient and the second highest risk of SOC visits. Moreover, more than one in three patients in this segment (37.4%) died within the next 3 years. This suggests that patients in this segment have a high healthcare burden, which warrants further investigation into disease management, psychosocial environment and quality of community care within the segment. Equally notable is the End of Life segment, which had the most SOC visits. This is likely due to the nature of patients in this segment, many of whom have metastatic cancer with frequent outpatient appointments.

For the Mostly Healthy, Serious Acute and Stable Chronic segments, survival rates were similar from 2013 to 2015, although healthcare utilization rose across these segments over the same period. This distinction matters for population health management, which considers not only survival but also healthcare resource consumption and service planning. In a healthcare system where rising healthcare spending is a particular concern, trends in healthcare resource consumption are of direct interest to policy makers.

Our approach has several strengths. Firstly, this simple categorization can be easily replicated in most healthcare systems, as the variables and healthcare utilization measures used in our study are commonly available elsewhere. Some recently implemented segmentation frameworks, such as those used in British Columbia, Canada [19] and Northern London, UK [20], draw on similar domains of information to our framework. Our study identified six distinct segments with different long-term healthcare utilization and mortality, but we are aware that patients within each segment may still have differing healthcare needs. The current segmentation approach is not designed to guide the treatment of a specific disease in a specific patient during a single healthcare encounter, which requires a management plan individualized to each patient-provider pair. It is more relevant at the policy level, for planning which types of health services each segment needs at the population level. Our segmentation framework is practical, with each segment corresponding to a predominant site of care and bundle of interventions. For example, patients in the Mostly Healthy and Serious Acute segments mainly require community-based health promotion activities and lifestyle interventions. This can guide population health policy towards allocating more resources to developing preventive services and health promotion. Patients in the Stable Chronic segment mainly require primary care to avoid progression to complications. Patients in the Complex Chronic with Frequent Hospital Admissions and Complex Chronic without Frequent Hospital Admissions segments may benefit from more intensive, multidisciplinary case management. For the End of Life segment, hospice care is typically needed to manage symptoms and to avoid events such as unnecessary hospitalizations, which may be expensive and potentially risky.
Knowing that an End of Life segment exists, and what proportion of the entire patient population belongs to it, helps healthcare policy makers allocate appropriate health resources. These resources include advance care planning, shared care with appropriate specialists and/or team-based care, and community case coordinators to optimize quality of life.

Our study has several limitations. First, variables in our dataset were restricted to those routinely collected in our EHRs. We were therefore unable to refine the segmentation using functional status and socioeconomic variables, which play important roles in health-related behavior and health services utilization [21]. Second, our population database cannot account for cross-utilization of healthcare services outside SingHealth or for out-of-hospital deaths.

Data-driven segmentation approaches offer an attractive alternative means of generating evidence-based insights into a population's health status. These approaches include unsupervised techniques, such as cluster analysis and latent class analysis, and supervised techniques, such as classification and regression. A key strength of data-driven approaches is their potential to group patients by their similarity across multiple dimensions or characteristics [22]. Latent classes or clusters that are not otherwise apparent can then be identified from these shared characteristics. Data-driven frameworks are easy to standardize and explicit in methodology, but they may not always be relevant and practical at the policy and implementation level in a particular healthcare system. Expert-driven methods are more likely to be feasible to implement and to carry policy implications, but they may lack the rich insights available from large volumes of health data. Each healthcare system must decide whether to adopt an expert-driven, data-driven or hybrid approach, taking into consideration the scientific evidence and its own policy context and priorities.

Conclusion

In this study, we demonstrated the ability of an expert-driven segmentation framework to predict longitudinal healthcare utilization and mortality.