1 Introduction

According to the World Health Organization (WHO), cardiovascular disease (CVD) is the leading cause of death worldwide [1]. In 2015, 17.7 million people died of CVD, accounting for 31% of all deaths worldwide. The Framingham Heart Study, published in the 1960s [2], established the concept of risk factors for CVD, and CVD risk prediction models based on these risk factors have been studied ever since. Data mining approaches to predicting CVD occurrences in acute myocardial infarction (AMI) patients fall into two categories: regression-based and machine learning–based methods. Currently available regression-based cardiovascular risk prediction models include the Framingham risk score (FRS) [3,4,5], QRISK [6, 7], TIMI [8, 9], and GRACE [10,11,12]. These models were developed for clinical use. They convert risk factors into risk indices derived from thorough study of, and computation on, their development datasets.

Many smartphone apps also offer technological features that promote physical activity [13].

Machine learning techniques have been used effectively to collect medical data from diverse sources [14]. Various machine learning–based classification methods have also been applied to predict CVD occurrences, including random forests (RF) [15], neural networks (NN) [16,17,18], and support vector machines (SVM) [19, 20]. Unstructured data from CVD patients were used to identify early heart failure with a naïve Bayes machine learning approach [21]. Machine learning–based approaches are regarded as a way to overcome the limitations of traditional regression-based prediction models. Their main strengths are the ability to find associations between different diseases, high prediction accuracy, and a strong ability to handle missing values and outliers using data mining tools and techniques. They can also perform large-scale data analysis with small or incomplete training datasets and many variables, which are weaknesses of regression-based models such as logistic regression and Cox proportional hazards regression [22].

Nevertheless, previous prediction models of CVD occurrence in AMI patients face several challenges. First, most previous regression-based CVD prediction models perform poorly in the prognosis and diagnosis of CVD occurrences in patients at moderate risk. For example, approximately half of myocardial infarctions (MIs) and strokes occur in people who are not predicted to be at risk of CVD [23]. Moreover, previous CVD risk guidelines showed low prediction accuracy, so patients at moderate risk sometimes received unnecessary treatment because of the limited knowledge captured by the prediction models. Second, regression-based CVD prediction models implicitly assume that each risk factor is related to the occurrence of major adverse cardiac events (MACE) in AMI patients, and they oversimplify the non-linear interactions among risk factors [22]. Third, conventional regression-based CVD prediction tools include major risk factors such as age, blood pressure, heart rate, diabetes, cholesterol, smoking, and history of heart disease, whereas machine/deep learning approaches have identified different risk factors [24]. Fourth, machine/deep learning–based studies have been conducted in various medical areas but have mainly focused on analyzing medical images using convolutional neural networks (CNNs) [25]. In particular, no deep learning–based model had been developed to predict MACE occurrences from clinical data in AMI patients, and knowledge of such prediction models in previous research was insufficient.

Therefore, this paper proposes a deep learning–based model for predicting MACE occurrences during the 1-, 6-, and 12-month follow-up in AMI clinical data, with the aim of developing a knowledge mining–based clinical decision support system and assessing the risk of patients after hospital discharge. The main steps of this work are as follows. First, our deep learning–based prediction model of MACE occurrences is generated from the training dataset using hyperparameter values obtained by random grid search. This yields many preliminary prediction models, each consisting of 20 input neurons, three hidden layers, and one output neuron. Second, the preliminary prediction models are evaluated on the validation dataset, and the best model is selected. Third, the selected model is finally evaluated on the test dataset. Lastly, the accuracy of the proposed model is compared with that of the gradient boosting machine (GBM), generalized linear model (GLM), and GRACE models during the 1-, 6-, and 12-month follow-up after discharge.

2 Method and materials

2.1 Data preparation

The Korea Acute Myocardial Infarction Registry (KAMIR) was the first nationwide, multicenter online registry designed to describe the characteristics and clinical outcomes of patients presenting with myocardial infarction (MI), and it reflects current management of patients with acute myocardial infarction (AMI) in Korea [26]. The registry includes 52 community and university hospitals capable of primary percutaneous coronary intervention (PCI). Data were collected retrospectively at each site by trained study coordinators using a standardized protocol. For the experiment, this paper used 14,885 AMI subjects enrolled in KAMIR from 1 October 2005 to 28 February 2008. Among them, we selected 10,813 patients aged 20 to 100 years with 1-year follow-up MACE data after hospital discharge. After excluding 4,072 records with missing values or with no “MACE” value during the follow-up period, the experimental dataset consisted of 10,813 records × 49 columns, with 21 numerical, 26 categorical, and 4 discrete variables. Before the prediction model of MACE occurrences was created, the dataset was preprocessed by creating new variables (age, survival time) and re-evaluating variables (Killip class, LVF, etc.).
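The cohort selection above can be expressed as a filter over patient records. The sketch below is illustrative only. It assumes each record is a Python dict with hypothetical keys `age` and `mace` (plus other variables), with `None` marking a missing value. The actual KAMIR field names and cleaning rules are not given here.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def select_cohort(records: List[Record], min_age: int = 20, max_age: int = 100) -> List[Record]:
    """Keep patients aged min_age..max_age with no missing values and a recorded MACE outcome.

    The keys "age" and "mace" are hypothetical placeholders, not the KAMIR schema.
    """
    selected = []
    for rec in records:
        if rec.get("age") is None or rec.get("mace") is None:
            continue  # exclude records without an age or without a MACE outcome
        if any(value is None for value in rec.values()):
            continue  # exclude records with any other missing value
        if not (min_age <= rec["age"] <= max_age):
            continue  # exclude patients outside the age range
        selected.append(rec)
    return selected
```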

2.2 Applied risk factors

The KAMIR data schema has 51 variables, whose names and types are listed in Table 1. They are divided into two domains: demographic characteristics and clinical findings. The demographic characteristics are age, gender, height, weight, blood pressure, heart rate, Killip class, heart rhythm, hypertension, diabetes mellitus, pain, dyslipidemia, smoking history, family history of heart disease, history of ischemic heart disease, comorbidities, final diagnosis, and MACE. The clinical findings include glucose, creatinine, creatine kinase (CK), CK-MB, Troponin-I, maximum Troponin-I, maximum Troponin-T, total cholesterol, triglyceride, HDL-cholesterol, LDL-cholesterol, high-sensitivity C-reactive protein (hs-CRP), N-terminal pro-brain natriuretic peptide (NT-proBNP), and glycated hemoglobin. The primary endpoints in this study are major adverse cardiac events (MACE) occurring during the 1-, 6-, and 12-month follow-up after discharge. We defined MACE as cardiac death, non-cardiac death, repeat PCI (re-PCI), or coronary artery bypass grafting (CABG). A patient was counted as having experienced MACE if at least one such event occurred during the follow-up period. The total number of events was not counted.
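The "at least one event" rule for the MACE endpoint can be sketched as follows. This uses a hypothetical representation, not the registry's format. Each patient's events are assumed to be (event type, days after discharge) pairs, and the 1-, 6-, and 12-month horizons are approximated as 30, 182, and 365 days.

```python
from typing import Dict, List, Tuple

# Event types counted as MACE in the text; the string labels are illustrative.
MACE_EVENTS = {"cardiac_death", "non_cardiac_death", "re_pci", "cabg"}

# Assumed day cut-offs for the 1-, 6-, and 12-month follow-up windows.
HORIZONS_DAYS = {1: 30, 6: 182, 12: 365}


def mace_labels(events: List[Tuple[str, int]]) -> Dict[int, int]:
    """Return {months: 0 or 1}: 1 if at least one MACE event occurred within that horizon.

    Several events in the same window still give a single positive label.
    """
    labels = {}
    for months, cutoff in HORIZONS_DAYS.items():
        hit = any(kind in MACE_EVENTS and 0 <= day <= cutoff for kind, day in events)
        labels[months] = int(hit)
    return labels
```

Because the windows are nested, a patient who is positive at 1 month is also positive at 6 and 12 months.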

Table 1 Variables applied to the prediction model of MACE occurrences

2.3 Training, validation, and test datasets

When a prediction model has many parameters or is non-linear, as a neural network is, its performance on the training data can appear high because of overfitting. When overfitting occurs, prediction accuracy on the training data is high, but performance on the test data can drop sharply. Therefore, to measure performance objectively, the proposed prediction model was evaluated on data not used for training, namely the validation and test datasets. The experimental data were divided by random sampling without replacement into a training dataset (60%) for model learning, a validation dataset (20%) for model selection, and a test dataset (20%) for evaluating the final selected model, on the basis of knowledge-based observations of the dataset.
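A minimal sketch of the 60/20/20 split by random sampling without replacement, using only the Python standard library. The fixed seed and the rounding of split sizes are assumptions, since the text does not state them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_valid_test(records: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split records 60/20/20 without replacement.

    Each record lands in exactly one subset; the remainder left by rounding goes to test.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for a reproducible split
    shuffled = rng.sample(list(records), len(records))  # a random permutation
    n_train = int(0.6 * len(shuffled))
    n_valid = int(0.2 * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

With 10,813 records, this rounding gives 6,487 training, 2,162 validation, and 2,164 test records.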

2.4 Applied machine learning algorithms

To implement the prediction model of MACE occurrences in AMI patients, this paper applied three machine learning algorithms: deep neural networks (DNN) [27], gradient boosting machine (GBM) [28], and generalized linear model (GLM) [29]. First, a DNN is an artificial neural network (ANN) with multiple hidden layers between the input and output layers. Ours has three hidden layers, which allow it to model non-linear patterns in unstructured data. In this paper, we use a deep feedforward network, the quintessential deep learning model. An ANN models the relationship between the input and output layers using an architecture inspired by the human brain. Each layer comprises numerous neurons (nodes), and the ANN is trained with the backpropagation algorithm. Second, GBM combines boosting with gradient descent. It fits an initial model, fits a second model to the residuals, and combines the two. Whenever residuals remain in the combined model, a new model is fitted to them, and the final prediction model is obtained by repeating this process until the residuals can no longer be reduced. GBM is competitive for both regression and classification, especially for classifying less clean data. It builds a forward stagewise additive model by gradient descent in function space, fitting regression trees on different features in a distributed way [30]. We performed 50 boosting stages for GBM training, using the deviance loss function for optimization and the Friedman mean squared error criterion to measure split quality. Third, GLM extends the linear regression model so that it can be applied even when the dependent variable is not normally distributed.
GLM combines traditional statistical methods with machine learning techniques. The dependent variable is related to a linear combination of the independent variables through a specified link function, and hyperparameter values are chosen via grid search. The GLM family includes Bayesian regression, ridge regression, elastic net, and lasso estimators, which are computed by coordinate descent and least angle regression. We used the stochastic gradient descent (SGD) algorithm to train the GLM model. The prediction models were implemented in RStudio, using the caret and H2O packages for the prediction and regression algorithms.
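The residual-fitting loop described for GBM above can be illustrated with a toy implementation. This is a simplified sketch, not the H2O or caret implementation. It uses squared-error loss (for which the negative gradient is simply the residual) instead of deviance loss, a single numeric feature, depth-1 regression trees (stumps), and an assumed learning rate of 0.1. Only the 50 stages come from the text.

```python
from typing import Callable, List, Optional, Tuple

Predictor = Callable[[float], float]


def fit_stump(xs: List[float], residuals: List[float]) -> Predictor:
    """Fit a depth-1 regression tree: one threshold split minimising squared error."""
    best: Optional[Tuple[float, float, float, float]] = None  # (sse, threshold, left_mean, right_mean)
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    if best is None:  # all xs identical: no split possible, predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def gradient_boost(xs: List[float], ys: List[float], n_stages: int = 50,
                   learning_rate: float = 0.1) -> Predictor:
    """Build an additive model stage by stage, each stage fitting the current residuals."""
    base = sum(ys) / len(ys)  # stage 0: constant model
    stumps: List[Predictor] = []
    preds = [base] * len(ys)
    for _ in range(n_stages):
        # With squared-error loss the negative gradient equals the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)
```

Shrinking each stage by the learning rate is why many small stages are needed. On a toy step function, each stage removes only 10% of the remaining residual.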

2.5 Generation of the prediction model

The main steps of the proposed deep neural network (DNN)–based prediction model are shown in Fig. 1. First, twenty preliminary DNN prediction models are generated from the training dataset using hyperparameter values chosen by random grid search [31, 32]. The DNN is a deep feedforward network [33] consisting of 20 input neurons and three hidden layers. Its models are generated from combinations of hyperparameter values selected by random grid search (RGS) rather than by hand tuning. Every node in each layer applies an activation function, and the network's final output is a yes/no prediction. Activation functions are typically non-linear, such as sigmoid, softmax, tanh, ReLU, and Leaky ReLU. We applied the ReLU activation function because of its effectiveness: it replaces negative values with zero and leaves positive values unchanged. The Adam optimizer, a stochastic gradient–based optimizer, was used to optimize the weights. During model learning and generation, early stopping and 10-fold cross-validation [34, 35] were used to select the optimal number of iterations automatically. Second, the performance of the preliminary prediction models was evaluated on the validation dataset. Third, the best hyperparameter values were selected and the best prediction model was generated from them. This model was then evaluated on the test dataset to estimate its generalization performance; the test dataset serves as new data that the prediction model has never seen. Lastly, performance metrics such as accuracy, sensitivity, specificity, and the area under the ROC curve (AUC) were applied to evaluate the proposed model.
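The random grid search step can be sketched as sampling hyperparameter combinations from a full grid without replacement, training and scoring each candidate, and keeping the best. The search space and its keys (`hidden`, `l1`, `input_dropout`, `rate`) are hypothetical. `train_and_score` stands in for training a DNN with early stopping and returning its validation AUC. None of these names come from the paper or from a specific library.

```python
import itertools
import random
from typing import Any, Callable, Dict, List, Tuple

Params = Dict[str, Any]

# Hypothetical search space: keys and values are illustrative only.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "hidden": [(50, 50, 50), (100, 100, 100), (200, 200, 200)],  # three hidden layers
    "l1": [0.0, 1e-5, 1e-4],
    "input_dropout": [0.0, 0.1, 0.2],
    "rate": [1e-3, 5e-3],
}


def random_grid_search(space: Dict[str, List[Any]], n_models: int,
                       train_and_score: Callable[[Params], float],
                       seed: int = 0) -> Tuple[Params, float, List[Tuple[float, Params]]]:
    """Score n_models randomly chosen grid points; return (best params, best score, all scores)."""
    keys = sorted(space)
    grid = [dict(zip(keys, values)) for values in itertools.product(*(space[k] for k in keys))]
    rng = random.Random(seed)
    candidates = rng.sample(grid, min(n_models, len(grid)))  # no combination is tried twice
    scored = [(train_and_score(params), params) for params in candidates]
    best_score, best_params = max(scored, key=lambda pair: pair[0])  # higher score is better
    return best_params, best_score, scored
```

Sampling 20 of the 54 combinations mirrors the twenty preliminary models. If the score function returns validation AUC, `best_params` is the configuration that would be retrained and then checked once on the test dataset.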

Fig. 1 Main steps of building the prediction model of MACE occurrences in AMI patients using a deep learning algorithm

2.6 Performance measures

We used the validation and test datasets to evaluate the MACE occurrence prediction models. Results are reported in terms of accuracy, sensitivity, specificity, and the area under the ROC curve (AUC), comparing actual outcomes with predicted outcomes.
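The four evaluation measures can be computed directly from actual and predicted outcomes. In this sketch, labels are 0/1 with 1 meaning MACE occurred. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), which equals the area under the ROC curve. The pairwise loop is O(positives × negatives) and is meant for clarity, not speed. Any decision threshold used to turn scores into predictions is an assumption; the text does not state one.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

With imbalanced classes, accuracy and specificity can be dominated by the majority class, which is why sensitivity and AUC are reported alongside them.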

2.7 Statistical analysis

Continuous variables (e.g., age and BP) were analyzed using a t test, and categorical variables (e.g., gender, DM, and smoking) were analyzed using the chi-square test. Statistical significance was set at p < 0.05. All statistical analyses and data processing were performed using R version 3.4.0.
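For a binary categorical variable compared between two groups (for example, DM by gender), the chi-square test reduces to a 2×2 contingency table. The sketch below computes the Pearson statistic and its 1-degree-of-freedom p-value, using the identity that a chi-square variable with 1 df is a squared standard normal. The counts in any example are hypothetical, not KAMIR data.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square test without continuity correction for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A difference is called significant when `p_value < 0.05`. R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default (`correct = TRUE`), so its p-values are slightly larger than those of this uncorrected version.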

3 Results

3.1 Baseline characteristics

After pre-processing, 10,813 AMI patients were selected for the experimental data; their baseline characteristics are shown in Table 2. AMI occurred more frequently in younger males than in females. Pain and ST-elevation myocardial infarction (STEMI) were more frequent in males, whereas dyspnea and higher Killip class were more frequent in females. Clinical findings were higher in AMI patients than in healthy controls, especially CK, maximum CK-MB, maximum TnI, maximum TnT, NT-proBNP, and BNP. In medical history, smoking, family history of heart disease, and complications were more frequent in males, whereas previous angina before MI, hypertension, and diabetes mellitus (DM) were more frequent in females. Thrombolysis and PCI were performed much more often in males than in females.

Table 2 Baseline characteristics of all subjects (10,813 patients)

3.2 Variable significance in prediction models

The significance of every variable in each prediction model was expressed on a relative scale from 0 to 1, where the most instrumental variable has a significance of 1 and the least instrumental has 0. Table 3 shows the top eight instrumental variables each model uses to predict MACE occurrence during the 1-, 6-, and 12-month follow-up after discharge in AMI patients. The primary risk factors differed by model: (1) complications, CABG, pain, and history of DM in the deep learning model; (2) complications, statin use, NT-proBNP, and angiographic findings in the GBM model; and (3) NT-proBNP, history of DM, and family history of heart disease (HD) in the GLM model. The machine learning–based models shared “complications” as a common variable at 1- and 6-month follow-up after discharge, and all models except GLM identified “age” as an instrumental variable at 12 months. Note that the primary risk factors depended on the model, and their significance was not always related to the model's prediction accuracy. Furthermore, the primary risk factors in the machine learning–based models were very different from those in traditional regression-based models.
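The 0-to-1 scale described above corresponds to min-max scaling of each model's raw importance scores. How the raw scores are produced differs by model and is not specified here, so the sketch takes them as input. Variable names and values in any example are hypothetical. Ranking the scaled values and keeping the first eight gives a per-model list of the kind summarized in Table 3.

```python
from typing import Dict, List, Tuple


def scale_importances(raw: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale raw importances so the top variable is 1 and the lowest is 0."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {name: 1.0 for name in raw}  # degenerate case: all variables equally important
    return {name: (value - lo) / (hi - lo) for name, value in raw.items()}


def top_k(scaled: Dict[str, float], k: int = 8) -> List[Tuple[str, float]]:
    """Rank variables by scaled importance in descending order and keep the first k."""
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)[:k]
```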

Table 3 Descending ranking of top eight primary risk factors in four prediction models of the MACE occurrence during 1, 6, and 12-month follow-up after discharge: DNN, GBM, GLM, and GRACE

3.3 Comparisons of AUC in prediction models

To evaluate the prediction models of MACE occurrences in AMI patients, we compared their accuracy, sensitivity, specificity, and AUC. Table 4 shows the performance of all prediction models on these evaluation indicators. The accuracy of all prediction models except GRACE was highest at 1-month follow-up after discharge, and the DNN model achieved more than 95% accuracy at 1, 6, and 12 months. Sensitivity was highest for the GBM model at 12 months, and specificity exceeded 95% in all models. The DNN model showed the highest AUC at 1-month follow-up after discharge, the GBM model at 6 months, and the DNN and GBM models (tied) at 12 months.

Table 4 Comparison of the performance of four prediction models of MACE occurrence during 1-, 6-, and 12-month follow-up after hospital discharge

4 Discussion

This paper proposed a deep learning–based prediction model of MACE occurrences during the 1-, 6-, and 12-month follow-up after discharge in AMI patients, and our results can be summarized as follows. This paper is the first attempt at a deep learning–based prediction model of MACE occurrence. DNNs are a highly successful, brain-inspired technique for learning the relationship between input data and target data. First, the primary risk factors in the machine/deep learning models were CABG, complications (especially complications with DM), and pain. In all models except GLM, the attribute “age” was the most instrumental variable for predicting MACE occurrences at 12-month follow-up. Second, the AUC values for MACE occurrence at 1, 6, and 12 months after discharge were 0.97, 0.94, and 0.96 for DNN and 0.96, 0.95, and 0.96 for GBM, respectively. These machine learning–based models therefore outperformed the regression-based GRACE model (0.75, 0.72, and 0.76) by a wide margin. The DNN model also achieved more than 95% accuracy at 1, 6, and 12 months after discharge.

These results show that machine/deep learning–based models are suitable for predicting MACE occurrences at 1, 6, and 12 months after discharge, and in particular that their prediction accuracy is superior to that of the established regression-based model. The deep learning–based model greatly improved prediction accuracy by automatically determining, in real time and according to the characteristics of the input data, the primary risk factors required for MACE prediction. In this paper, we found new primary risk factors for predicting MACE occurrences during 1-, 6-, and 12-month follow-up after discharge, in place of traditional risk factors such as smoking history, high blood pressure, high blood cholesterol, diabetes, physical inactivity, overweight or obesity, family history of heart disease, and chest pain. The deep learning–based model thus showed that significant risk factors need not be identified in advance to predict MACE occurrences in AMI patients. The proposed model matched or outperformed the other machine learning models and was more successful than the regression-based model at predicting MACE occurrence in AMI clinical data.

Our research has some potential limitations. First, KAMIR may not be representative of AMI patients worldwide because it includes only Korean AMI patients. Second, the experimental data may be biased because the registry lacks details of the dosage and duration of drugs, including beta blockers, for each patient, and because follow-up is limited to a short period of 1 year after hospital discharge.

5 Conclusion

This paper presented the first deep learning–based prediction model of MACE occurrences during 1-year follow-up after discharge in AMI clinical data, and it greatly improved prediction accuracy on the KAMIR dataset through knowledge-based personal computation. Compared with the traditional GRACE risk prediction tool, the machine/deep learning–based prediction models achieved substantially higher prediction accuracy. In particular, the proposed deep learning–based model achieved an accuracy of 95% or higher at 1, 6, and 12 months after discharge. The deep learning approach is therefore expected to provide a more efficient diagnosis and prediction tool for MACE occurrences in AMI patients through knowledge-based personal computation in the future.