1 Introduction

Current healthcare systems face many changes, such as aging population, increased use of technologies, and high expectations from citizens. These changes have transformed healthcare settings into patient-centered and value-based models. However, the effort to improve quality of health while limiting costs is a major roadblock for various stakeholders [1]. In this context, data and information can help healthcare providers to deliver optimum health outcomes in unprecedented ways. The availability of electronic clinical data has skyrocketed because of the development of information technology and the use of electronic medical records (EMRs). Given the huge amount of available data, traditional software is no longer computationally sufficient to store, manage, and analyse high volumes of information. The implementation of machine learning (ML) algorithms, such as decision trees (DTs), neural networks (NNs), and other techniques, is necessary in converting data into actionable insights for automated decision making and precision medicine. By 2025, the data subject to analysis are expected to balloon to 5.2 ZB, of which 1.4 ZB will be attributed to ML systems [2].

Predictive modeling is at the forefront of improving quality of care. With ML models, massive amounts of data can be analyzed to predict outcomes for individual patients. Predicting a patient’s risk of readmission is an exemplar application for ML models. Hospital readmission can be defined as the admission of a patient to a hospital within a specified time interval (typically 30 days) after that patient’s previous discharge [3]. Readmission can be costly, with the associated cost of care rising above $10,000 per patient [4]. In the United States, the Affordable Care Act established the Hospital Readmissions Reduction Program (HRRP), which aims to reduce readmission rates by imposing penalties on hospitals with excess readmissions. The total penalties reach $500 million of hospitals’ overall Medicare payments annually [5]. Identifying patients at high risk of getting readmitted is a key strategy to reduce the number of hospital readmissions and the associated costs. Accurate predictions from algorithms are important in supporting care providers in their decision regarding whether a patient is ready for discharge or should be targeted for interventions. In this way, such predictions mitigate the risk of unplanned readmission and curb increased healthcare costs.

Many risk factors have been highlighted to be associated with a high risk of readmission. Common associations include sociodemographic factors such as high age and poor living conditions [6], patient comorbidities [7, 8], premature discharge [9], insufficient post-discharge support [10], complications from previous medical care, and adverse drug events [11]. Among these factors, clinical errors could be a major contributor to avoidable readmissions. Of the 20% of patients who experience adverse events (AE) following discharge, three-quarters of cases are related to medications, with diagnostic or therapeutic errors contributing to more than one-fifth of AE [11]. Examples of medication errors include patients being discharged without prescriptions for necessary medications, improper dosage, and inadequate monitoring for drug side effects. Therapeutic errors can be attributed to the failure to adequately monitor treatments. Preventing these errors is crucial to ensure patient safety after hospital discharge. In addition, targeted interventions to prevent readmission can be provided to patients by ensuring safe care transitions prior to discharge. Interventions such as medication reconciliation, structured discharge summaries, facilitated communication between hospital and primary care providers, and patient and family education have been shown to have positive impacts on readmission rates [12]. Predictive modeling is therefore necessary for healthcare facilities to identify patients at a high risk of readmission.

1.1 Work Motivation

Mining clinical data for insights and modeling is a tedious aspect of developing analytic solutions in healthcare. The area under the receiver operating characteristic curve (AUC) is a standard measure of prediction performance that indicates the ability of a model to discriminate between two or more target populations. Generally, a model with an AUC of 0.5 performs no better than chance, an AUC of 0.7–0.8 is modest and acceptable, and an AUC of 0.8–0.9 indicates excellent discriminative ability; higher values are rarely observed and are believed to carry a high risk of overfitting [13]. Despite increasing pressure to mitigate readmission rates, existing works have often reported moderate performance (AUC ≤ 0.75) in identifying positive cases, even with the emergence of ML algorithms [14, 15]. In 2018, Artetxe and colleagues [16] presented what is probably the most recent systematic review that covers a general overview of prediction models in the field. The study reported that traditional statistical models, such as logistic regression (LR) and survival analysis, are still widely used in health analytics, with other complex ML techniques showing promising results over classical methods. Emerging algorithms such as deep learning (DL) also show immense potential in yielding excellent results. Nevertheless, further studies are needed to assess the real impact of complex models in the domain of readmission prediction.
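To make the interpretation of AUC concrete, the following minimal sketch computes it as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient. The function name `auc_score` and the toy labels and scores are hypothetical illustrations and are not taken from any reviewed study.

```python
def auc_score(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case is scored higher; tied scores count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readmitted within 30 days, 0 = not readmitted.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
toy_auc = auc_score(y_true, y_score)  # 8 of 9 pairs are ranked correctly
print(f"AUC = {toy_auc:.3f}")
```

A model that assigns the same score to every patient yields 0.5 under this definition, which corresponds to the chance-level performance described above.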

The present study aims to:

  (1) Present the current trends in the predictive approach to readmission research by describing the recent methods used for model building.

  (2) Investigate the impact of complex models on predictive performance.

  (3) Discuss some of the challenges in using these models as a decision support tool in future.

The paper is organized as follows. Section 2 describes the various reviews that provide rich insights into the use of predictive modeling in readmission research. Section 3 covers a general introduction to the methods of prediction. A full description of the models is beyond the scope of this study. Section 4 describes the data sources used for clinical predictive modeling. Sections 5 and 6, respectively, focus on existing methods for predicting readmission in particular patient subpopulations (disease-specific) and on models that fit the entire dataset. Section 7 highlights the discussion for readmission research. Section 8 concludes the study.

1.2 Methodology

In this work, related review studies on readmission published from 2010 onwards were identified using the following search strings: “readmission,” “rehospitalization,” “predictive,” “prediction,” and “review.” For the list of studies to be included from Section 3 onwards, a literature search was performed using the following search query: (“Readmission” OR “Rehospitalization”) AND (“Machine learning” OR “Deep learning” OR “Prediction” OR “Predictive” OR “Predicting” OR “Predict” OR “Model” OR “Modeling”). We restricted our search to works published in the past two years, i.e., 2019 and 2020, to investigate the current trends in readmission research. Studies that were published in the English language and met the following inclusion criteria were retained for further analyses.

The eligibility criteria were as follows:

  (1) The objective of the study should answer a simple question: What is the probability of readmission of an adult patient at any time point during the hospitalization period?

  (2) The original research should present the development of prediction models by using traditional regression methods, such as LR and Cox regression; ML models, such as artificial neural networks (ANNs), DTs, and support vector machine (SVM); or other complex DL algorithms, such as deep neural networks (DNNs).

  (3) The full text of the article should be available to allow quality assessment.

  (4) The titles or abstracts of the articles should show the relevant search terms, and the search terms should have the same meaning as intended for the review in the current work.

  (5) The outcomes of the studies must be the prediction of the likelihood of patients’ readmission with conditions and procedures specified in HRRP, hospital-wide, and cardiovascular-, psychiatric-, and diabetes-related studies. Studies targeting disease-specific post-surgery readmission were excluded because their predictors are unique to each surgical procedure.

Throughout the selection process, a citation management tool (EndNote X9; Thomson Reuters Corporation, New York, NY) was used to manage all citations. Unlike conventional systematic review studies, the current work aims to answer the following questions: “What are the current trends in modeling readmission?” “Do complex models perform better than simple ones?” “Do state-of-the-art algorithms bring high value?” “How should complex event predictions be dealt with?”

2 Related Works

The use of widely available clinical data for readmission modeling has gained increasing attention from researchers. The increasing adoption of EMRs has created opportunities for researchers to leverage patient-centered records, which are often poorly understood. An EMR contains a patient’s medical history, including demographic information, medical diagnoses, laboratory test results, treatment plans, and medications. According to the National Physician Survey, about 65% of physicians have indicated that EMRs improve the quality of patient care [17]. The effect of EMRs on the clinical workflow has been positive. However, few studies have discussed a complete set of techniques that can be explored to mine EMRs for readmission modeling. Table 1 shows a number of reviews in the readmission field, along with their research summaries.

Table 1 Studies on readmission predictive modeling

Many related review studies have reported moderate predictive performance with AUC ≤ 0.70. Although the predictive ability of readmission risk models has improved in recent years to AUCs above 0.70, complex ML models struggle to remain parsimonious because of the lack of transparency in the feature selection process. Moreover, performance varies greatly depending on the target population because of different risk factors. Previously published review studies assessed various predictive models up to January 2019 [14] regardless of the methodology used for study selection. Gaps exist in the knowledge about the recent trends in this field of research and about models that leverage newer techniques, such as DL, for prediction. The current study focuses on identifying the approaches used to predict high-risk patients in the last two years.

3 Modeling the Likelihood of Clinical Outcomes

Most studies modeled readmission as a dichotomous outcome, in which the target regarding readmission can be true or false within a certain time frame [21]. Survival analysis is another important method, which estimates the time (in days) from the previous discharge to readmission. Regardless of the nature of the modeling, all algorithms are designed to detect the complex relationships between explanatory variables and observations. Traditional rule-based scoring, such as the LACE score and HOSPITAL score [22], is the simplest to employ. A higher LACE score indicates a higher risk of readmission. Despite the plethora of work that used the LACE model in developing readmission risk prediction models, the optimum cutoff score to capture high-risk patients varies according to study populations. Thus, models based on clinical rules help in facility-level decision making. In predictive methods based on multivariable modeling, the relationships between individual independent variables and the desired health outcome events can be examined. Learning-based prediction is crucial to predict readmission at the individual level. Figure 1 depicts an overview of the learning-based approaches in predictive modeling.

Fig. 1 General overview of techniques used for predictive modeling

3.1 Statistical Learning

LR is a fundamental model that uses a logistic function to solve classification problems. As the most commonly used method for predicting binary outcomes, LR can be used to identify the weight (positive or negative) of the relationship between independent variables (features) and the dependent variable (study outcome). Penalized regressions, such as ridge [23], lasso [24], and elastic net [25], are useful as they provide an approach to variable selection other than statistical significance. These penalizations address the overfitting issue by shrinking the magnitude of variable coefficients. Overall, the goal of LR is to find the best fitting model to describe the relationship between a set of predictors and target variables.
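The shrinkage effect of penalization can be illustrated with a minimal sketch that fits LR by batch gradient descent with an optional ridge (L2) penalty. Only ridge is shown; lasso and elastic net use different penalty terms and are not covered here. The function `fit_logistic`, its hyperparameters, and the two standardized toy features are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, l2=0.0, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.
    l2 is the ridge penalty strength; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Log-loss gradient plus the ridge term l2 * w_j.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def norm(w):
    return math.sqrt(sum(v * v for v in w))


# Hypothetical standardized features (e.g., age and number of prior admissions).
X = [[-1.2, 0.5], [-0.7, -0.3], [-0.1, 0.8], [0.2, -0.9], [0.6, 0.1], [1.2, -0.2]]
y = [0, 0, 1, 0, 1, 1]

w_plain, b_plain = fit_logistic(X, y, l2=0.0)
w_ridge, b_ridge = fit_logistic(X, y, l2=1.0)
print("unpenalized coefficients:", w_plain)
print("ridge coefficients:      ", w_ridge)
```

In this sketch, the sign of each coefficient gives the direction of the association, and the ridge coefficients have a smaller overall magnitude than the unpenalized ones.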

An alternative statistical-based approach to model prognosis is survival analysis, which aims to predict the time to readmission. A key aspect of survival analysis is censored observation, in which readmission has not occurred during the study period. Without the presence of censoring, standard LR could be used. Traditionally, the Cox proportional hazard (CPH) model has been the most widely used model to analyse censored data, but the CPH model often works for small datasets and does not scale well to high dimensions and large volumes of clinical data [26].
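The role of censoring can be illustrated with the Kaplan–Meier estimator, a simple nonparametric method that is used here in place of the CPH model purely for brevity. In the sketch below, censored patients remain in the risk set until their follow-up ends but never count as readmission events. The function name and the toy follow-up times are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, readmission_free_probability)] at each distinct event time.

    observed[i] is True if readmission occurred at durations[i] and False if
    the patient was censored (follow-up ended without a readmission)."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical days from discharge to readmission or to the end of follow-up.
durations = [5, 12, 12, 20, 30, 30]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"day {t}: estimated readmission-free probability = {s:.3f}")
```

Dropping the censored patients entirely, or treating them as never readmitted, would bias the estimate, which is why censoring calls for survival methods.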

3.2 Machine Learning

Unlike statistical learning, ML is an interdisciplinary field that combines elements of statistics, mathematics, and computer science. It provides an approach to develop machines that are “intelligent” enough to perform complex tasks. This approach is commonly known as artificial intelligence (AI). AI is able to imitate human intelligence, driven by advanced algorithms and careful training over a large pool of data [27]. The idea of ML is to learn from examples and experiences (data), which differs from rule-based symbolic AI. During training, a model learns an optimized function on the basis of data, which it then uses to draw predictions for specific tasks. Such a data-driven approach is now the state-of-the-art methodology in various domains, such as computer vision [28], natural language processing [29], and real industrial clinical applications [30]. Standard ML techniques can be broadly classified into three main categories, namely, SVM, naïve Bayes (NB), and tree-based methods.

In healthcare, regression techniques often serve as baseline models for clinical tasks. However, a potential disadvantage is their inferior performance [31]. The use of various ML models in clinical settings is often a sensible approach. SVM was first introduced in 1995 and is a powerful learning approach to classification [32]. SVM is similar to LR in that the end output of training is a hyperplane that separates data points into two or more categories. However, unlike LR, SVM is a non-probabilistic classifier. Given a set of inputs, SVM attempts to determine which of two possible classes each input belongs to; the decision boundary represents the largest separation between the two classes. This feature also means that the hyperplane may not be defined by a simple function. SVM also works well on data that are not linearly separable. This is realized using a method called the “kernel trick,” which implicitly maps the data into a higher-dimensional kernel space. In this kernel space, data that are non-separable in the original space become separable with a linear hyperplane. Linear, polynomial, and radial basis functions are the commonly used kernels. The advantage of using SVM is that overfitting is unlikely to occur because the model only needs to learn the hyperplane in the mapped space. The disadvantage is its inferior interpretability as a “black box” model.
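The kernel trick can be illustrated with a degree-2 polynomial kernel: evaluating the kernel on the original inputs gives the same value as an inner product in an explicitly mapped feature space, so the mapping never has to be computed. The sketch below, with hypothetical function names and toy inputs, checks this identity and shows a one-dimensional example that becomes separable after mapping. It does not train an SVM.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs whose inner product equals poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly_kernel(x, z)      # computed in the original 2-D space
k_explicit = dot(phi(x), phi(z))    # computed in the mapped 3-D space

# A 1-D toy case: no single threshold on v separates the classes,
# but a threshold on the mapped feature v^2 does.
values = [-2.0, -0.5, 0.5, 2.0]
labels = [1, 0, 0, 1]
separable_after_mapping = all((v * v > 1.0) == bool(l) for v, l in zip(values, labels))
print(k_implicit, k_explicit, separable_after_mapping)
```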

NB is a probabilistic classifier based on Bayes’ rule that has been used in research for over 50 years [33]. Most predictive models in ML generate a number between 0 and 1 to order the instances, typically from the most likely to the least likely to be positive. A Bayes-based probabilistic model assigns the posterior probability of a class given a set of features. The name “naive” comes from the assumption that individual features are independent of one another given the class. Although this independence assumption is often violated, NB classifiers still tend to perform well in practice [34]. Compared with other models, NB is simple, computationally efficient, and robust to noise and missing data [35]. It is also a natural fit for medical prognosis and diagnosis in real applications as it uses all available attributes for prediction [36]. Physicians reason similarly in diagnosis, in which every piece of information is crucial.
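As a minimal sketch of how the posterior is obtained under the independence assumption, the example below trains an NB classifier on binary features with Laplace smoothing and evaluates the posterior probability of readmission for one patient. The two features (a prior admission flag and a polypharmacy flag), the data, and the function names are hypothetical.

```python
from collections import Counter


def train_nb(rows, labels, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) for binary features,
    using Laplace smoothing with pseudo-count alpha."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    feat_counts = {c: [0] * n_features for c in class_counts}
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[c][j] += v
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {
        c: [(feat_counts[c][j] + alpha) / (class_counts[c] + 2 * alpha)
            for j in range(n_features)]
        for c in class_counts
    }
    return priors, cond


def posterior(row, priors, cond):
    """P(class | features), multiplying per-feature likelihoods as if the
    features were independent given the class."""
    joint = {}
    for c, prior in priors.items():
        p = prior
        for j, v in enumerate(row):
            p *= cond[c][j] if v else 1.0 - cond[c][j]
        joint[c] = p
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}


# Hypothetical features per patient: [prior_admission, polypharmacy].
rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0]  # 1 = readmitted
priors, cond = train_nb(rows, labels)
post = posterior([1, 1], priors, cond)
print(f"P(readmitted | both risk flags present) = {post[1]:.3f}")
```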

DT is a type of rule-based classifier that generates predictions with IF–THEN rules [37]. A DT consists of nodes (a dyadic Boolean operator to split data points on the basis of the satisfaction of condition rules), branches (outcomes of splitting according to instances’ feature values), and leaf nodes (final class assigned to instances from the entire decision-making process). Using the decision algorithm, data points are split at each node in the way that yields the largest information gain. Because of this process, DTs are nonparametric, which means that no distribution assumption is needed for input data. Hence, DTs are different from other classifiers that make certain assumptions, such as linearity. Their nonparametric nature gives DTs high adaptability to fit numerical or categorical feature types across different datasets. Thus, tree-based classifiers often perform relatively well. However, single DTs are unstable as small variations in datasets result in large differences in the tree structure and in model predictions. To compensate for this issue, ensemble learning outputs predictions by leveraging multiple classifiers. Two popular ensemble learning techniques are bagging and boosting [38]. In the bagging procedure, multiple random subsets (bags) of data are created by sampling with replacement. A base model is built on each individual subset, and the models are trained in parallel. Boosting involves training a number of individual models sequentially. Instead of running models in parallel, each subsequent model in boosting attempts to rectify the errors of the previous model by assigning high weights to misclassified data points (corrective boosting). Examples include random forest (RF), which is based on bagged DTs, and AdaBoost and gradient boosting, which are boosting learners. Overall, the biggest advantage of DTs is their interpretability for actionable decisions. However, for complex datasets, trees can become too large to visualize and interpret, and they are prone to overfitting [39].
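The node-splitting step can be sketched as follows, assuming Shannon entropy as the impurity measure (one common choice; other criteria such as the Gini index are also used). The function names and the toy length-of-stay data are hypothetical, and the sketch covers a single split rather than a full tree or an ensemble.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        ent -= p * math.log2(p)
    return ent


def information_gain(values, labels, threshold):
    """Reduction in entropy obtained by splitting on value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Candidate thresholds are the observed values, excluding the largest."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical length of stay (days) and 30-day readmission labels.
length_of_stay = [2, 3, 4, 8, 9, 10]
readmitted = [0, 0, 0, 1, 1, 1]
split = best_threshold(length_of_stay, readmitted)
gain = information_gain(length_of_stay, readmitted, split)
print(f"IF length_of_stay <= {split} THEN not readmitted (gain = {gain:.2f} bits)")
```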

3.3 Deep Learning

One major limitation of conventional ML models is the need to perform complex data preprocessing to extract requisite predictive features [40]. Therefore, significant domain knowledge and feature extraction expertise are required in model training. DL is a promising ML tool that can learn abstract features directly from raw data sources. A single well-trained network can yield state-of-the-art results in many fields without domain experts. DL is an extremely powerful tool in terms of processing and learning complex data to solve complicated tasks on the basis of inputs. However, it is not a one-size-fits-all tool in biomedical analytics applications [41].

The basic building block of DL models is the feedforward NN. An NN model consists of a series of interconnected processing nodes, each of which applies an activation function to convert its inputs into outputs. The overall idea of training an NN model relies on updating the weights of each node so that the cost, i.e., the deviation of predictions from true labels, is minimized. This process is commonly carried out with gradient descent. Various advanced optimization algorithms have been proposed for effective training; they include RMSProp, AdaGrad, and Adam [42]. The multilayer perceptron (MLP) is the simplest form of NN and has three types of layers, namely, input, hidden, and output layers. Each layer can be composed of one or several neurons.
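The weight-update idea can be sketched with a small MLP that has one hidden layer and is trained by plain per-sample gradient descent on a cross-entropy cost. Adaptive optimizers such as Adam are not shown. The network size, learning rate, random seed, and the toy binary data are assumptions chosen only for illustration.

```python
import math
import random


def forward(x, W1, b1, w2, b2):
    """Input -> tanh hidden layer -> sigmoid output probability."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    p = 1.0 / (1.0 + math.exp(-(sum(w * hi for w, hi in zip(w2, h)) + b2)))
    return h, p


def train(X, y, hidden=3, lr=0.5, epochs=500, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        running = 0.0
        for x, t in zip(X, y):
            h, p = forward(x, W1, b1, w2, b2)
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            running -= t * math.log(p_safe) + (1 - t) * math.log(1.0 - p_safe)
            delta_out = p - t  # gradient of the cost w.r.t. the output pre-activation
            for k in range(hidden):
                # Backpropagate through w2[k] (before it is updated) and tanh.
                delta_h = delta_out * w2[k] * (1.0 - h[k] ** 2)
                w2[k] -= lr * delta_out * h[k]
                for j in range(d):
                    W1[k][j] -= lr * delta_h * x[j]
                b1[k] -= lr * delta_h
            b2 -= lr * delta_out
        losses.append(running / len(X))  # mean cost seen during this epoch
    return (W1, b1, w2, b2), losses


# Hypothetical binary inputs; the label is 1 if either risk flag is present.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]
params, losses = train(X, y)
print(f"mean cost: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```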

An NN with multiple hidden layers is called a DNN. With multiple layers, a DNN can extract more abstract features than a single-layer NN. The two commonly used DNN models with varying architectures are the recurrent neural networks (RNNs) and convolutional neural networks (CNNs). RNNs are a special type of NN that is designed to effectively process sequential information [43]. In RNNs, the hidden state at time t is computed by combining the current input and the hidden state at t − 1. Thus, the relationship between historical events and future outcomes can be established. This property is important in modeling long-term dependencies in clinical care as historical illnesses and procedures may critically affect future outcomes, such as readmission and mortality. Two prominent RNN variants, namely, long short-term memory (LSTM) [44] and gated recurrent unit (GRU) [45], are widely used by researchers. Their gated mechanisms are designed to tackle the vanishing gradient problem of the vanilla RNN. The gate operations in LSTM and GRU control how much information should be stored in the current state, how much is forgotten, and what information is to be passed down to the next step. In this way, the LSTM and GRU are able to learn long-term dependencies. In terms of structural differences, the GRU has fewer parameters than the LSTM and thus performs computations faster [46]. However, no firm conclusion has been reached as to which of the two RNN variants is better, as also reported in another work [47]. Researchers usually conduct multiple experiments to identify which model works best for their use case.
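The recurrence described above can be written as h_t = tanh(W_x x_t + W_h h_{t−1} + b) for a vanilla RNN. The sketch below applies this update with fixed, hypothetical weights (no training and no LSTM or GRU gates) to show that two visit sequences ending in the same input still produce different final hidden states when their histories differ.

```python
import math


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN update: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(z))
    return h_t


def run_rnn(sequence, W_x, W_h, b):
    """Process a sequence from a zero initial state; return all hidden states."""
    h = [0.0] * len(b)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states


# Hypothetical fixed weights for 2 input features and 2 hidden units.
W_x = [[0.5, -0.3], [0.2, 0.8]]
W_h = [[0.1, 0.4], [-0.6, 0.3]]
b = [0.0, 0.0]

# Two hypothetical visit sequences that differ only in the first visit.
seq_a = [[1.0, 0.0], [0.0, 1.0]]
seq_b = [[0.0, 0.0], [0.0, 1.0]]
final_a = run_rnn(seq_a, W_x, W_h, b)[-1]
final_b = run_rnn(seq_b, W_x, W_h, b)[-1]
print(final_a, final_b)
```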

The CNN is a special algorithm that can yield good results in image classification problems. Instead of modeling the full temporal sequence, the CNN captures local temporal dependencies among clinical data [48]; specifically, convolutional layers with filters are applied to each region of the input (local features). Originally invented for computer vision, CNN models have also been shown to be effective in handling laboratory results [49], medical feature embedding [50], and text classification [51]. Overall, the CNN is effective in mining semantic clues in contextual windows. An advantage of the convolutional operations in the CNN is that they are more parallelizable than the operations in the RNN, thus making the CNN relatively quick to train. At every time step, the CNN depends only on the local context rather than all the past states as in the RNN. One persistent downside of the CNN is its inability to model long-term semantic dependencies in a sequence. Nevertheless, the CNN performs especially well in extracting features and in tasks in which feature recognition in the text is important.
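The local nature of convolution can be sketched with a single one-dimensional filter that slides over a sequence of values, followed by a ReLU activation and global max pooling. The filter weights here are set by hand to respond to a sharp rise between consecutive measurements; in a trained CNN they would be learned. The data and function names are hypothetical.

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution (cross-correlation, as is common in DL libraries):
    each output depends only on a local window of len(kernel) inputs."""
    k = len(kernel)
    return [
        sum(kernel[j] * sequence[i + j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    return max(values)


# Hypothetical daily laboratory values for one patient.
labs = [1.0, 1.1, 1.0, 2.5, 2.6, 1.2]
rise_filter = [-1.0, 1.0]  # responds to an increase between consecutive days

feature_map = relu(conv1d(labs, rise_filter))
strongest_rise = global_max_pool(feature_map)
position = feature_map.index(strongest_rise)
print(f"largest day-to-day rise = {strongest_rise} between days {position} and {position + 1}")
```

Because each window is processed independently of the others, all positions can be computed in parallel, unlike the step-by-step recurrence of the RNN.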

Several challenges could hinder the efficacy of DL methods. Current challenges include poor data quality, inconsistent patient information, and lack of model interpretability [52, 53]. Model transparency also becomes a roadblock when putting these models into real use, in which case the mechanisms on how they operate cannot be easily understood. These difficulties need to be dealt with for DL to bring direct clinical impacts.

4 Data Sources of Clinical Predictive Modeling

Clinical data stored in EMRs can be classified into two types: structured and unstructured data. Structured data contain demographic information (such as age, nationality, address), basic information (such as height and weight), vital signs, laboratory results, drugs taken, comorbidity, and treatments/procedures. This type of data is generally stored in fixed-mode databases. Hospitals can choose their desired database systems from different vendors, and different systems can have different levels of retrieval capabilities.

Although ML technology was developed largely for structured data fields, over 80% of medical data, such as clinical notes, remain unstructured [54]. Unstructured text is one type of narrative data, and it contains rich health information, such as history, diagnoses, symptoms, radiology reports, daily nursing notes, discharge records, and prescriptions. Clinical narratives provide a comprehensive picture of a patient by storing extensive valuable medical information. Text mining approaches are required to discover the hidden knowledge underlying unstructured clinical notes.

5 Results

A total of 255 articles were gathered for possible inclusion in this review (Fig. 2). After the first level of title and abstract screening, 195 articles were retained for the Stage 1 review. A further review at Stage 2 excluded 115 articles that did not fulfill the predefined eligibility criteria. In the Stage 3 screening, 23 articles were excluded for being outside the prediction scope. The final set included 57 articles; 2 articles were review articles described in the Related Works section, and 55 were included for the assessment of study outcomes.

Fig. 2 Screening flow diagram of the study selection process

6 Application to Readmission

Identifying patients at high risk of getting readmitted is a key strategy to reduce the number of hospital readmissions and the associated costs. Some measures of hospital readmissions are as follows: (1) condition-based, such as acute myocardial infarction (AMI), heart failure (HF), chronic obstructive pulmonary disease (COPD), and pneumonia; (2) procedure-based, such as coronary artery bypass grafting (CABG) and total joint arthroplasty (TJA); (3) hospital-wide all-cause readmission. The final 55 articles that met the inclusion criteria were divided on the basis of their study population cohorts into hospital-wide populations (including intensive care unit and emergency department readmission, n = 19) and patient-specific populations (n = 36). Figure 3 depicts the number of papers (only the studies included in the review) by prediction cohort. Most patient-specific models focused on heart conditions [55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]. The remaining studies worked on readmission among patients with diabetes [73,74,75,76,77,78,79], psychiatric conditions [80,81,82,83], TJA [84, 85], COPD [86, 87], CABG [88, 89], and pneumonia [90].

Fig. 3 Distribution of publications related to prediction by population cohort

6.1 Diagnosis-Specific Readmission

Table 2 summarizes the characteristics and predictive performances of diagnosis-specific models according to their population cohorts. Prediction performance was assessed on the basis of AUC, a commonly used metric in binary classification, and sensitivity, which indicates the ability to detect readmission. Sensitivity is important in clinical settings as a low number of false negatives shows the ability of a model to detect the relevant readmissions within a population.
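For reference, sensitivity (recall) is the number of true positives divided by the sum of true positives and false negatives. The minimal sketch below computes it together with specificity from binary labels and thresholded predictions; the function name and toy data are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = readmitted) and model predictions after thresholding.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy example, one of the four readmitted patients is missed, which gives a sensitivity of 0.75.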

Table 2 Characteristics and performance of prediction models for selected studies in terms of various diagnosis settings

Among all diagnosis-specific studies, half of them (50%) built models to predict readmission among patients with heart-specific conditions. Among the 18 studies focused on heart conditions, 13 papers predicted readmissions among the HF population [56,57,58,59,60, 63, 65,66,67,68,69,70,71], 2 worked on AMI cohorts [61, 62], 2 developed models on general cardiovascular disease patients [55, 72], and 1 worked on the stroke population [64]. As shown in Fig. 3, this set is followed by readmission related to diabetes (n = 7), psychiatry (n = 4), COPD (n = 2), CABG (n = 2), TJA (n = 2), and pneumonia (n = 1).

Readmission risk has been modeled from different perspectives. Even with the emergence of ML algorithms, 29 out of 36 articles adopted traditional statistical methods. Among these studies, ~ 90% used LR either as a baseline [56, 58, 60, 62,63,64, 68, 73, 74, 76,77,78, 83, 85,86,87] or as the main model in prediction [60, 69, 71, 82, 88,89,90], and 3 studies derived their own risk scores on the basis of LR variable coefficients [61, 66, 84]. In the remaining 3 papers, the prognosis of readmission was carried out with Cox regression survival analysis. DT and its variants, such as RF, GBM, AdaBoost, and CHAID, together with SVM, remain popular models for predictive modeling, with 19 studies leveraging them in diagnosis-specific readmission [58, 62,63,64, 67, 68, 70, 73,74,75,76, 78,79,80,81, 83, 85,86,87]. NB and KNN were less commonly used. As a potential approach to improving the predictive ability of models, NN-based models were adopted by nearly half of the studies (n = 15) in their high-risk identification [56,57,58,59, 62, 64, 67, 68, 70, 73, 76,77,78, 86, 87]. Of these NN models, ANN or MLP have been widely used, and only 5 articles explored the applicability of RNN- and CNN-based models [56, 57, 67, 86, 87].

Different demographic, clinical, laboratory, and socioeconomic features were included in models to predict readmission. Only 2 studies used unstructured data, namely, physicians’ notes and discharge summaries, in their analysis of patient history embedded within clinical prose [67, 69]. The choice of readmission threshold is also an important aspect that influences study outcomes. According to Table 2, a 30-day period is the most widely used threshold, and it was used by 72% of the reviewed papers (n = 26), although some time spans ranged from 90 days up to 2 years. Model performance varied greatly across different studies. The average AUCs among the 30-day readmission studies for heart condition, diabetes, psychiatry, and CABG were 0.633, 0.969, 0.761, and 0.677, respectively.

Only 2 studies worked on TJA by using 90 days as a threshold, and only 1 of them reported an AUC, which was 0.665 [84]. In another study, 180-day COPD readmission was modeled with a discrimination of 0.737 [87]. One study worked on pneumonia-related readmission; however, AUC was not reported. With regard to AUC, most studies reported modest scores, with only 26% of 30-day models achieving a discrimination ability of above 0.75.

6.2 Hospital-Wide Readmission

For hospital-wide readmission, model types are described herein with their corresponding performances on the basis of readmission occurring within 30 days. For studies that used different readmission thresholds, the time spans were included in the descriptions.

In terms of hospital-wide prediction, Zebin and colleagues [91] proposed an LSTM + CNN model and achieved an AUC and recall of 0.821 and 0.742, respectively. AdaBoost showed a similar performance of 0.76 for both metrics in another work [92]. Pauly et al. [93] derived a rule-based risk score on the basis of the coefficients of the LR model and achieved moderate discrimination ability (AUC = 0.74). The authors of [94, 95] used LR and produced low to moderate performance (AUC = 0.712 and 0.661, respectively). GBM gave an AUC of 0.699 in predicting readmission among patients in skilled nursing facilities [96]. However, LR with lasso was shown to perform better than GBM in another study [97]. A poor discrimination of 0.60 was observed with LR prediction based on claims data available during admission [98]. Flaks-Manov et al. [99] employed the previously validated Preadmission Readmission Detection Model and added hospital data. Their model performed moderately with an AUC of 0.68. Lin et al. [100] used an advanced DL model to capture sudden fluctuations in clinical data, and LSTM was able to identify readmission at a sensitivity rate of 0.742 and AUC of 0.791. Table 3 shows the list of eligible studies on hospital-wide readmission.

Table 3 Characteristics and performance of prediction models for hospital-wide readmission

In a 90-day hospital readmission problem, the LR model did not perform sufficiently well (0.65 AUC) as a screening tool [101]. The study of Barbieri et al. [102] showed promising results with regard to the use of the RNN with code embeddings computed by neural ordinary differential equations; the study achieved a sensitivity of 0.672 (AUC = 0.739). Yu and Xie [103] proposed an ensemble model that combines the weight boosting model with the stacking algorithm; they improved the recall to 0.891 and AUC to 0.879. One study attained a high discrimination of 0.866 with gradient-boosted trees [104], with LR obtaining a comparable performance. The HOSPITAL score model was shown to output a similar performance as in original studies (AUC = 0.66) [105]. Hammer et al. [106] used LR to derive a score-based model that yielded a discrimination of 0.78 among intensive care unit (ICU) patients. The AUC for the model derived with LR in one study was 0.71, but the model performed well with mortality prediction [107]. Saleh et al. [108] assessed how well a 30-day model predicts 7-day readmission and showed that a 7-day model had a similar discrimination of 0.66 to LR. Li and colleagues [109] explored different ML algorithms and reported an AUC of 0.79 at admission with RF and 0.88 at discharge with GBM.

Generally, all the articles shown in Table 3 included structured data in their modeling. A 30-day period was the most commonly used threshold in hospital-wide readmission (90% of studies); the exceptions were one article that focused on 90-day readmission [101] and another in which the threshold was not specified [106]. Of the 19 studies, 5 worked on ICU readmission with a mean AUC of 0.725. Mišić et al. [104] attempted to predict postoperative readmissions with a high discrimination of 0.866. The remaining studies developed general models with a mean AUC of 0.73. Unlike in diagnosis-specific prediction, about 42% of the hospital-wide models produced AUCs above 0.75.

6.3 COVID-19

Coronavirus disease 2019 (COVID-19) is a complex clinical illness with potential complications that might impact quality of life and require ongoing care [110, 111]. Although numerous patients have survived it, concerns remain about outcomes after initial hospitalization. Nearly 1 in 10 patients were readmitted within 2 months after receiving inpatient care for COVID-19 [112]. Within 10 days of discharge, the rate of readmission or death after COVID-19 has been shown to be higher than that after pneumonia or HF [113]. Understanding the risk factors underlying readmission can assist clinicians in making informed decisions on the discharge process. In addition, relevant health authorities can arrange proper healthcare planning so that hospitals have sufficient resources for the acute care of patients. This aspect is crucial because the high hospital attendance rate of COVID-19 has reduced the capability of hospitals to treat other serious diseases [114].

To understand the causes of readmission from various aspects, this section describes studies that focused on the risk factors associated with readmission among COVID-19 patients. Jeon et al. [115] used an LR model to analyse the factors affecting readmission. The results of the model showed that patients who were male, were 65 years of age or older, received medical benefits, or had a shorter length of stay were at high risk of readmission. Another study performed a statistical analysis and found that the percentage of hypertension and malignancy cases was relatively high among readmitted patients [116]. Lavery et al. [112] similarly identified that older age increases the odds of readmission, as does the presence of chronic conditions, i.e., COPD, HF, diabetes, chronic kidney disease, and obesity (body mass index ≥ 30 kg/m²). With LR and statistical testing, Parra et al. [117] found increased risk among immunocompromised patients and those who presented with fever within 48 h prior to discharge.

Overall, COVID-19 infection is still prevalent, and further outbreaks remain possible. Further research on readmitted patients should be encouraged to refine the relevant risk factors and thereby help discharge patients safely. As the 30-day readmission rate is a common quality indicator for diseases such as HF and COPD, comprehensive studies are necessary to reveal the predictors of COVID-19 readmission and to investigate the usefulness of this readmission rate in representing the quality of care among COVID-19 patients. Nevertheless, the studies included in this work suggest the need for continued health interventions to prevent adverse post-discharge events, such as readmission among older patients and those with underlying medical comorbidities.

7 Discussion

This overview included 55 studies that reported the development of readmission risk prediction models regardless of their readmission threshold, model type, and population cohort. These studies were analyzed to answer the initial problem statement: (1) “What are the current trends in modeling readmission?” (2) “Do complex models perform better than simple ones?” (3) “How should complex event predictions be dealt with?” This overview study presents the current trends in such models in the readmission domain.

The penalties charged to hospitals with high readmission rates in the HRRP (particularly for the six condition- or procedure-specific 30-day readmission measures) have increased the number of papers related to this issue. In this work, 36 of 55 unique models (65%) were found to be specific to certain diseases. Among these models, 35% predicted readmission among patients hospitalized with HF. Out of all the included studies, only 2 leveraged unstructured clinical notes to discover salient information that could be missing in structured data. Most studies (74.5%) included regression models either as a baseline or as a main method in modeling readmission as a dichotomous target. A total of 4 studies used LR to derive simple score-based models that are easy to use. A total of 3 disease-specific studies modeled outcomes as a survival function with Cox proportional hazard regression, which allowed flexibility in handling censored data. In terms of ML methods, tree-based models were the most frequently used (44% of all studies and 75% of studies using ML techniques). In fact, RF was the most utilized algorithm among tree-based models (79%). Of the studies using ML techniques, 62.5% used NN-based classifiers, followed by SVM (40.62%). This work observed a substantial growth in the literature that incorporated NNs in prediction relative to the 23.5% adoption rate reported by a recent review in 2018 [16].
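To illustrate how a simple score-based model can be derived from LR, the following minimal sketch converts coefficients into integer points. The predictor names, the coefficient values, and the scaling rule (dividing by the smallest absolute coefficient and rounding) are illustrative assumptions; they are not taken from any of the reviewed studies, which may have used different derivation rules.

```python
# Hypothetical LR coefficients (log-odds); names and values are illustrative only.
coefficients = {
    "age_65_or_older": 0.42,
    "prior_admissions_ge_2": 0.81,
    "length_of_stay_ge_7_days": 0.39,
    "heart_failure": 0.64,
}


def coefficients_to_points(coefs):
    """Scale each coefficient by the smallest absolute one and round to an integer."""
    base = min(abs(value) for value in coefs.values())
    return {name: round(value / base) for name, value in coefs.items()}


def risk_score(patient, points):
    """Sum the points of every predictor that is present for the patient."""
    return sum(pts for name, pts in points.items() if patient.get(name, False))


points = coefficients_to_points(coefficients)

# Hypothetical patient: 65 or older with heart failure, no other risk factors.
patient = {"age_65_or_older": True, "heart_failure": True}
score = risk_score(patient, points)
print(points)
print(score)
```

A score of this kind trades some discrimination for ease of use at the bedside, because the points can be summed by hand without access to the underlying model.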

AUC is the de facto metric for measuring the discrimination ability of prediction models. Sensitivity is another important measure because it reflects how many readmitted patients a model misses. Of all studies, 48 (87.3%) and 28 (50.9%) reported the AUCs and sensitivities of the developed models, respectively. As the performance of the models varies depending on the readmission threshold, this work compared the results in terms of AUCs for 30-day readmission (which covered ~78% of the selected studies, that is, 26 disease-specific studies and 17 hospital-wide studies). However, only 21 disease-specific models reported AUCs. As depicted in Fig. 4, risk models derived among hospital-wide populations tend to have slightly higher discrimination, with a mean AUC of 0.7412 (median: 0.716), than disease-specific models (mean AUC of 0.7368, median: 0.6987). Figure 5 summarizes the relationship between predictive performance and different models. Prediction models were categorized into three classes, i.e., regression (LR and score-based), ML (NB, SVM, tree-based), and NN (MLP, RNN, CNN). Models using ML were found to perform better than those that use regression, whereas complex NN models did not achieve better discrimination in either readmission population. There exists great variability among ML and NN models in disease-specific readmission, while regression exhibits comparable variability with a mean AUC of 0.68. That the adoption of complex models does not lead to substantial improvements in performance was also demonstrated by some studies. For example, LR and ML models, such as SVM and RF, showed performance comparable to that of LSTM [91, 100]. In studies involving hospital-wide predictions, boosted trees were applied and exhibited the best performance in terms of AUC [103, 109]. In diagnosis-specific populations, tree-based models, particularly RF, consistently performed exceptionally well.
In addition to their great predictive ability, DT models facilitate interpretable decision analysis, which is particularly beneficial in clinical settings where physicians need to identify which features are used to inform decisions. Models such as GBM can even handle a mix of discrete and continuous predictors and allow for missing values of predictors. Prediction of general readmission typically starts with the inclusion of demographic predictors (such as age and gender), clinical biomarkers (comorbidity and illness severity), length of stay, and number of hospital stays before index admission. The inclusion of patient information in EMRs, such as medications, laboratory results, and surgery-related data, has the potential to improve the predictive ability of models [92, 94, 104, 106]. However, adding more variables increases the complexity of models and could pose a challenge in usage and implementation. Moreover, not all data are readily available across different EMRs in different health institutions [14]. Most notably, a number of studies suggested that existing predictors are sufficient to discern the highest-risk patients [93, 96, 97, 99, 105]. Readmissions associated with specific diseases are difficult to compare because they are affected by different risk factors [109]. There is also value in venturing into unstructured notes because most data are stored in such formats. Further research is needed to verify the potential benefits of linking unstructured data to high-risk readmissions.
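For readers less familiar with the two metrics compared throughout this discussion, the following minimal sketch computes AUC and sensitivity in plain Python. AUC is computed as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient (ties count as one half). The labels, predicted risks, and the 0.5 decision threshold are hypothetical and chosen only for illustration.

```python
def auc(y_true, y_score):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity(y_true, y_score, threshold=0.5):
    """Fraction of truly readmitted patients flagged at the given threshold."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return tp / (tp + fn)


# Hypothetical labels (1 = readmitted) and predicted risks for eight patients.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.3, 0.4, 0.6, 0.2, 0.7, 0.1, 0.4]

model_auc = auc(y_true, y_score)
model_sensitivity = sensitivity(y_true, y_score)
print(model_auc, model_sensitivity)
```

Note that AUC is independent of the decision threshold, whereas sensitivity changes with it; this is one reason why the two metrics are reported side by side.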

Fig. 4

Studies reporting AUC for 30-day readmission in diagnosis-specific populations (21 models) and hospital-wide prediction (17 models). Only 8 of the 38 studies showed a discriminatory power of > 0.8

Fig. 5

Comparison of AUC performances of regression, ML, and NN models for hospital-wide and diagnosis-specific readmission. There exists greater variability among disease-specific models compared to hospital-wide models. The p-values from t-tests are reported for the comparison between regression and ML (0.001) and between regression and NN (0.051)

Despite the efforts exerted to model readmission, the usefulness of these models in real clinical practice remains relatively understudied. Only 2 of the included studies discussed the potential applicability of their developed models. Through financial analysis, Ashfaq et al. [57] estimated possible cost savings of up to $2 million from model implementation (given an intervention success rate of 0.5 and an intervention cost of $700). They suggested that a highly precise model will yield significant cost savings if interventions prove effective. Another study demonstrated a potential cost saving of $400K [85]. In the future, the clinical utility of readmission risk prediction models will need further attention. This can be achieved by deploying models in real use cases and comparing monthly intervention costs with the readmission costs of control groups.
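The cost trade-off described above can be illustrated with a simplified calculation. The following sketch is an assumption made for illustration and does not reproduce the financial analysis of [57]: it uses the intervention success rate (0.5) and intervention cost ($700) quoted above, takes $10,000 as a rough readmission cost in line with the figure cited in the Introduction [4], and assumes a hypothetical cohort of 1,000 flagged patients and a hypothetical model precision of 0.4.

```python
def net_savings(n_flagged, precision, success_rate, intervention_cost, readmission_cost):
    """Avoided readmission costs minus the cost of intervening on every flagged patient."""
    prevented = n_flagged * precision * success_rate
    return prevented * readmission_cost - n_flagged * intervention_cost


def break_even_precision(success_rate, intervention_cost, readmission_cost):
    """Minimum precision at which the interventions pay for themselves."""
    return intervention_cost / (success_rate * readmission_cost)


# Hypothetical cohort size and precision; the other values follow the text above.
savings = net_savings(
    n_flagged=1000,
    precision=0.4,
    success_rate=0.5,
    intervention_cost=700,
    readmission_cost=10_000,
)
threshold = break_even_precision(
    success_rate=0.5, intervention_cost=700, readmission_cost=10_000
)
print(savings, threshold)
```

Under these assumptions, savings scale linearly with precision, which is consistent with the observation that a highly precise model yields greater savings when interventions are effective.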

A worthwhile endeavor is to apply regression and tree-based models in any readmission modeling task. First, these models perform well. Second, they offer transparency in feature selection and in the relative contribution of individual variables. Although NNs are often regarded as having the greatest potential to boost accuracy, they do not necessarily outperform simple approaches, and they have a black box nature. Moreover, future studies could consider the following aspects that can effectively contribute to clinical outcomes: the final best-performing model should be presented in a simple way to ensure the interpretability of results; clinical data that can be easily collected by health systems should be included; reporting should include not only the discrimination ability of the model but also its usefulness in clinical adoption; and models should be able to identify high-risk patients as early as during the index hospital stay instead of upon discharge.

The main strength of the current study was that it involved a comprehensive search that covered the latest articles in the readmission literature. Nevertheless, this overview study has certain limitations. First, non-indexed studies might have been missed in this work. Second, studies marked as meeting or conference abstracts were excluded. These articles would have been selected if full texts were published later. Third, as predictive performance varies greatly according to study population, caution should be used when comparing models across different populations, especially for studies focused on specific diseases. Further descriptions of disease-specific models could provide a comprehensive understanding of the risk factors and characteristics of certain diseases.

8 Conclusion

This overview included 55 articles that developed prediction models for readmission risk. LR remains the most commonly used approach. ML, particularly tree-based models, is a promising technique that can improve predictive ability. In the last two years, NNs, such as MLP, RNN, and CNN, have been increasingly employed. The results showed that the performance of models varies according to the target population. Overall, ML models tend to outperform statistical models. However, state-of-the-art DL does not guarantee excellent results, and its black box nature reduces the likelihood of buy-in from investors. Although the availability of an enormous volume of electronic clinical data might further improve predictive ability, the performance of any model is limited by the absence of relevant data. Although some features, such as social factors, have been shown to be associated with a high risk of readmission, these data are not readily available in health institutions. Around 95% of past approaches rely on a structured representation of patient data for the development of risk models. However, considerable amounts of patient information are stored in unstructured data fields. Further studies are needed to investigate the value of unstructured prose in the readmission literature. Despite models showing predictive capabilities, most studies lacked methodologies for demonstrating the clinical usefulness of models, such as how models reduce readmission rates and induce cost savings. Thus, even a model with an AUC of 0.90 may not be useful if its clinical utility remains unverified.

For the successful application of ML, the process of feature extraction, manipulation, and selection is critically important. A comparative study of feature engineering techniques, such as missing value imputation and variable selection or reduction, could be useful for future research in this field. Another major challenge for ML classifiers is the imbalanced dataset. Future research should extend the current work by comparing the numerous class-balancing techniques dedicated to tackling this problem.
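As one example of a class-balancing technique, the following minimal sketch implements random oversampling, in which minority-class records are duplicated at random until every class matches the size of the largest class. It is only one of many possible techniques and is not attributed to any reviewed study; the feature rows (age and number of prior admissions) and labels are hypothetical.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap to the largest class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical training data: [age, prior admissions]; 1 = readmitted (minority class).
rows = [[71, 3], [54, 0], [63, 1], [48, 0], [59, 1],
        [80, 4], [45, 0], [66, 2], [52, 0], [57, 1]]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Resampling of this kind should be applied to the training split only, so that evaluation metrics such as AUC and sensitivity still reflect the original class distribution.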