Background

The rapidly aging population is one of the most important issues worldwide. In the United States, older adults (≥65 years old) were 15.2% of the total population in 2016, are projected to be 20% in 2030, and 23.5% in 2060 [1]. Taiwan is one of the rapidly aging countries in the world. In 2018, the number of deaths was nearly equal to that of births [2]. Older adults represented 14% of the total population in 2018, and are projected to be 20% in 2025 [2].

Influenza is a life-threatening disease for the older population. An Asian study revealed that older adults contributed to 70–90% of total deaths [3]. The mortality rate of influenza in older adults was nearly 39-fold that of the population aged 40–64 years old [3]. Influenza-related complications, including cardiorespiratory diseases, pneumonia, chronic obstructive pulmonary disease (COPD), and ischemic heart diseases, are the common causes of death [3].

Because of limited medical resources during the influenza season, predicting outcomes in older adults with influenza and their subsequent disposition becomes a critical issue. In our previous study, we recruited 409 older patients with influenza for developing a Geriatric Influenza Death Score (GID score) [4]. This study identified five independent mortality predictors: severe coma (Glasgow Coma Scale [GCS] ≤8), past histories of malignancy and coronary artery disease (CAD), elevated C-reactive protein (CRP) levels (> 10 mg/dl), and bandemia (> 10% band cells) [4]. Three mortality risk and disposition groups were formed according to five predictors: (1) low risk (1.1%; 95% confidence interval [CI], 0.5–3.0%); (2) moderate risk (16.7, 95% CI, 9.3–28.0%); and (3) high risk (40, 95% CI, 19.8–64.2%). The GID score has an area under the receiver operating characteristic curve of 0.86, and Hosmer-Lemeshow goodness of fit of 0.578 [4].

Although the GID score is a potentially good clinical decision rule (CDR) in older adults with influenza, it has the limitations of the small size of derivation sample and lacks both automation and feedback in real time to clinicians [5]. Artificial intelligence (AI) is defined as that uses computer techniques, including machine learning (ML) and deep learning (DL) to represent intelligent behavior [6]. In recent years, a great deal of evidence showed that AI could handle more variables that are already available through electronic health records (EMRs) and may better predict patient outcomes [5]. We performed searches on Google Scholar and PubMed using the keywords “AI,” “death,” “influenza,” “machine learning,” “mortality,” “older adult,” and “outcome,” but we did not find any AI application in this field. Therefore, we conducted the present study for clarifying the issue and applying it in the hospital information system (HIS) to assist decision making in real time.

Methods

Study design, setting, and participants

We included emergency physicians, information engineers, data scientists, quality managers, and nurse practitioners to establish a multi-disciplinary team for this project (Fig. 1). After our literature review, we decided to use the previous study about predicting mortality in older ED patients with influenza as the main reference [4]. We identified all older patients (≥65 years old) with influenza who visited the ED between January 1, 2009, and December 31, 2018, from the EMRs of three hospitals: Chi Mei Medical Center, Chi Mei Hospital, Liouying, and Chi Mei Hospital, Chiali. The present study hospitals are not the hospitals for developing the GID score. The criteria of influenza are defined as the diagnosis of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) of 487 or 488 or a prescription of Oseltamivir, Peramivir, or Relenza in the index ED visit.

Fig. 1
figure 1

Flowchart of the application of ML for predicting outcomes in older ED patients with influenza. ED, emergency department; KNN, K-nearest neighbors; SVM, support vector machine; LightGBM, light gradient boosting machine; MLP, multilayer perceptron; XGBoost, Extreme Gradient Boosting; AI, artificial intelligence

Definitions of feature variables

We adopted 10 potential risk factors proposed in the previous study for predicting mortality in the older patients with influenza as the feature variables for the ML or DL in this study [4]: (1) tachypnea (respiratory rate > 20/min); (2) severe coma (GCS ≤8, 3) history of hypertension; (4) history of CAD; (5) history of malignancy; (6) bedridden; (7) leukocytosis (WBC > 12,000 cells/mm); (8) bandemia (> 10% band cells); (9) anemia (hemoglobin < 12 mg/dL); and (10) elevated CRP (> 10 mg/dL).

In addition, we also recruited age, sex, vital signs, and past histories of hypertension (ICD-9: 401–405), diabetes (ICD-9-CM: 250), COPD (ICD-9-CM: 496), CAD (ICD-9-CM: 410–414), stroke (ICD-9: 436–438), malignancy (ICD-9: 140–208), congestive heart failure (CHF, ICD-9-CM: 428), dementia (ICD-9: 290), bedridden, feeding with a nasogastric tube, and nursing home resident, laboratory data including white blood cell count (WBC), bandemia, hemoglobin, platelet, serum creatinine, CRP, procalcitonin, glucose, Na, K, GOT, and GPT for this study. The patients who did not have a record of subsequent follow-up were excluded. Missing laboratory data were treated as the normal values (i.e., respiratory rate: 12/min, GCS: 15, WBC: 7000 cells/mm, band form: 0%, hemoglobin: 12 mg/dL, and CRP: 2.5 mg/dL).

Outcome measurements

The outcome measurements were binary coded as the follows: (1) hospitalization; (2) complications with pneumonia (ICD-9-CM: 480–486): (3) complications with sepsis or septic shock (ICD-9-CM: 038, 790.7, 995.91, 995.91, 785.52); (4) admitted to intensive care unit (ICU); and (5) in-hospital mortality.

Ethical statement

The present study was approved and granted permission to access the raw data by the institutional review board in the Chi Mei Medical Center. Because this study is retrospective and it contains de-identified information, informed consent from the participants was waived. The waiver does not affect the rights and welfare of the participants.

Data processing, model comparison, and application in the HIS

First, we extracted, transformed, and validated the data from the HIS into a data mart. Missing and ambiguous data were carefully processed at this step. Second, we randomly split the data to two dataset (70%/30%) and used the synthetic minority oversampling technique (SMOTE) to enlarge the first dataset (70%) as training dataset because of imbalanced outcome samples. The second dataset (30%) is used as testing dataset without any resampling. Third, according the optimal modeling result with testing dataset, we compared accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and the area under the curve (AUC) among the analyses of the random forest, logistic regression, K-nearest neighbors (KNN), support vector machine (SVM), light gradient boosting machine (LightGBM), multilayer perceptron (MLP, a kind of DL), and Extreme Gradient Boosting (XGBoost). In this step, we conducted grid search with hyper-parameters for each algorithm to obtain the optimal models (hyper-parameter ranges for each algorithm were summarized in Supplementary Table 1). Then, we selected the best algorithm to develop the prediction model for each outcome. Fourth, we deployed the model in the AI web service and integrated it with the HIS in the ED. After two-months of pilot testing and validating, we launched the prediction application in the HIS to assist physicians for decision making in real time.

Patient and public involvement

Patients and the public were not be involved in this study.

Results

In total, we recruited 5508 older ED patients into the present study. The mean ± standard deviation (SD) of age was 76.61 ± 7.44 years old, and the female proportion was 50.67% (Table 1). The proportion of the three age subgroups were young elderly (43.06%), moderately elderly (40.56%), and old elderly (16.38%). The mean ± SD of respiratory rate and GCS were 19.16 ± 3.94 breaths/min and 14.41 ± 1.84, respectively. The histories of ED patients were hypertension (56.05%), CAD (19.64%), malignancy (14.32%), and bedridden (31.94%). The mean ± SD of WBC, hemoglobin, and CRP were 8670.00 ± 4220.00 cells/mm3, 12.42 ± 1.95 mg/dL, and 42.06 ± 50.98 mg/dL, respectively. The proportions of patient outcomes were hospitalization (47.33%), complications with pneumonia (37.71%), complications with sepsis or septic shock (5.57%), admitted to ICU (1.07%), and in-hospital mortality (2.20%).

Table 1 Characteristics of older ED patients with influenza in this study

Comparisons of predictive accuracies among the random forest, logistic regression, KNN, SVM, LightGBM, MLP, and XGBoost revealed that the random forest model had the best AUC for hospitalization, pneumonia, and sepsis or septic shock than did other models in the testing dataset (Table 2 and Supplementary Fig. 1). The XGBoost had the best AUC for ICU admission (0.902) and logistic regression had the best AUC for in-hospital mortality (0.889). Table 3 summarized the best AUC for each outcome in the testing dataset, which was adopted for building the prediction model in further. Feature importance according to a random forest, logistic regression, LightGBM, and XGBoost for predicting the five outcomes was also reported (Supplementary Fig. 2).

Table 2 Comparisons of predictive accuracies among random forest, logistic regression, KNN, SVM, LightGBM, MLP, and XGBoost in the outcomes of testing dataset of older ED patients with influenza
Table 3 Evaluation report using the best model with the SMOTE preprocessing algorithm on the outcomes of testing dataset of older ED patients with influenza

We applied the best algorithm for predicting outcomes in older ED patients in the HIS to assist decision making in real time. An AI button was set up in the HIS of the ED (Supplementary Fig. 3). When the clinician presses the AI button, the AI application automatically catch the feature variables from the HIS and pops up a screen of the prediction result within 1 sec (Supplementary Fig. 4). The prediction result shows a personalized prediction for hospitalization, complications with pneumonia, complications with sepsis or septic shock, admitted to ICU, and in-hospital mortality. Using five-level Likert, a mean of 4.6 was responded by 101 times of use, which indicates that the AI prediction is useful for the clinicians.

Discussion

The present study revealed that the random forest had the best AUC for predicting hospitalization, pneumonia, and sepsis or septic shock, XGBoost had the best AUC for predicting ICU admission, and logistic regression had the best AUC for predicting in-hospital mortality in older ED patients with influenza. The predictions are very fast, in real time, and actionable, which provide prognostic information to assist in decision making, including disposition and outcome explanation.

Using AI prediction for assisting decision-making is an appealing idea [7]. Because of the increased availability of EMRs and advancement of computer performance and algorithm, AI prediction based on the medical big data becomes a promising way for healthcare [8]. In recent years, the rapid progression of cloud and IoT (internet of things) by healthcare monitor and wearable sensor networks also greatly support the development of real-time AI prediction [8]. Therefore, the AI-based tools, which are designed to improve diagnosis, care planning, and outcome will be incorporated into healthcare services in the near future [9]. Many regulations about AI use in healthcare need to be developed, including establishment of normative standard, evaluation guidelines, and monitoring and reporting systems [9].

The adopted feature variables in this study, including comorbidities and abnormal vital signs and laboratory data, are the risk factors for poor outcomes. The more feature variables, the poorer outcome in the result of AI prediction.

The random forest is superior to the traditional model (i.e., logistic regression) for developing CDR in predicting hospitalization, pneumonia, sepsis or septic shock, and ICU admission. One possible reason for the lower predictive accuracies of logistic regression is that it lacks external validation [5]. Traditional CDRs, including the GID score, are typically developed by gathering data at one or more hospitals, and then using both to derive and validate a model from a chosen set of predictors. The developed CDRs are then used in other hospitals, different from the original study hospital [10]. A recent study reviewed 127 new prediction models and showed that external independent validation was uncommon in predictive models [11]. Predictive performance in external validation tends to be worse than the original study [11]. In contrast to the GID score derived from other hospitals, we used local real-world big data in multi-centers to make predictions about the local population, which improves accuracy over the traditionally derived model. The variables used in the present study are structured data from the local EMR without being subjected to ambiguous clinical definitions or biases of data collection.

The random forest model is an ensemble learning method for classification and regression [12, 13]. It combines many binary decision trees, which are built by several bootstrapped learning samples, and chooses a subset of variables randomly at each node [12, 13]. Each tree in the random forest will vote for some input x, then the voting majority of trees will determine the output of the classifier [14]. The random forest can use a large number of trees in the ensemble to handle high dimensional data [14]. The random forest is a common method adopted for predicting outcomes and selecting predictors in the ED. A study about predicting in-hospital mortality in ED patients with sepsis revealed that the AUC of the random forest was 0.86, superior to the CART (classification and regression tree) model (0.69); logistic regression model (0.76); CURB-65 (Confusion, Urea, Respiratory rate, Blood pressure plus age ≥ 65 years old) (0.73); MEDS (mortality in emergency department sepsis) (0.71); and mREMS (modified rapid emergency medicine score) (0.72) [5]. A study used the random forest to select the most relevant variables for major adverse cardiac events in ED patients with chest pain [12]. They found that the selection predictor by the random forest is promising in discovering a few relevant and significant predictors [12].

The SMOTE adopted in the present study is the most common and effective method of oversampling for adjusting imbalanced data [15]. SMOTE solves the problems of both high-class skew and high sparsity and works in the “feature space” rather than “data space” [16]. By taking each minority class sample and the K-nearest neighbors, SMOTE creates synthetic samples for effectively forcing the decision region of the minority class to become more general [16]. Without duplicating the data, SMOTE increases the data space and amplifies the features of the minority class [16]. Studies with SMOTE preprocessing in health care are also acceptable [17, 18].

According to our literature review, the present study has the strength of being the first real-time prediction application in the HIS using ML for older ED patients with influenza. The limitations are as follows. First, interpretability and inferences about variables are the problems of ML. Second, we did not compare the predictive accuracy between this model and the physician’s judgment. Further studies about this issue, as well as the impact of this model, are warranted. Third, variable selection was not conducted in this study. We decided to adopt 10 potential risk factors proposed in the previous study for increasing the explainability for AI models. In the future, including as many variables as possible and reducing the number by running proper variable selection algorithms are needed. Fourth, the application may not be generalized to other hospitals because it needs building an infrastructure to make real-time predictive analytics a reality.

Conclusions

We developed the first real-time prediction application in the HIS for predicting outcomes in older ED patients with influenza using a big data-driven and machine learning approach. This real-time prediction is a promising way to assist the physician’s decision making and explanations to patients and their families. Further studies about the predictive accuracy between this model and both the physician’s judgment, impact of the application, and including as many variables as possible and reducing the number by running proper variable selection algorithms are needed.