1 Introduction

Cardiovascular diseases (CVDs) are the most widespread chronic diseases worldwide and have been the leading cause of morbidity and death over the last decade [1]. According to the World Health Organization (WHO), 17.9 million people die from CVDs every year, representing 32% of all deaths worldwide. The number of CVD cases is rising rapidly, and by 2030 the yearly death toll is expected to reach approximately 22.2 million people [2, 3]. A report by the Centers for Disease Control and Prevention confirms this expected increase in mortality, stating that one person dies from CVDs every 40 s [4]. In Egypt as well, CVDs have been the leading cause of death over the last 30 years, accounting for an estimated 46.2% of all deaths in 2017 [5].

CVDs are an umbrella term for a group of disorders of the heart and blood vessels, including (1) congestive heart failure, (2) coronary heart disease, (3) congenital heart disease, (4) cerebrovascular disease, and (5) rheumatic heart disease [6]. Four out of every five CVD deaths result from strokes and heart attacks. Heart disease can therefore be considered the most life-threatening chronic disease, and much of its risk stems from its silent nature: it is often not diagnosed until the symptoms of heart failure (or a heart attack) appear [7]. In heart disease, the heart fails to perform its normal function of supplying blood to the rest of the body because of blockage of the coronary arteries, which are responsible for supplying blood to the heart itself [8]. The typical symptoms include (1) shortness of breath, (2) body weakness, (3) confusion, and (4) fainting. The risk increases for people with aggravating factors including (1) an unhealthy diet, (2) smoking, (3) fitness issues, (4) high blood pressure, (5) lack of exercise, and (6) a high cholesterol level [9].

The early and accurate prediction of heart disease is crucial to enhance the survival rate and reduce mortality. It supports healthcare professionals in their decisions by providing an accurate and efficient diagnosis and treatment for patients [10]. One approach to the early and accurate prediction of heart disease is machine intelligence, which can be achieved using machine learning (ML) algorithms and deep learning (DL) approaches [11]. Various types of heart disease data, such as images, waves, and sounds, can be used for this purpose [12].

Image data can be analyzed and its features extracted to train an ML (or DL) approach, such as a CNN, to determine whether the images belong to a diseased or a healthy patient [13]. Heart disease can also be detected by gathering features from cardiac sounds and feeding them to a DL or ML algorithm; alternatively, the cardiac sounds can be converted into numerical data that serve as the input of a DL approach to check whether the patient has heart disease [14]. Another type of data deployed for heart disease detection is Electrocardiogram (ECG) and Electroencephalogram (EEG) waves, which can be analyzed and labeled as the input of a Recurrent Neural Network (RNN) model, or from which features can be extracted and converted into numerical data used as the input of an ML algorithm [15]. ML algorithms such as support vector machines and decision trees play an essential role in accurately predicting the existence of heart disease by analyzing medical data, whether voice or images, numerically [16,17,18]. DL approaches such as convolutional neural networks (CNN) can also analyze such data efficiently and can deal with large datasets [19,20,21].

1.1 Paper contributions

The current study focuses mainly on developing a hybrid system of ML algorithms and CNN models to predict and detect the existence of heart disease accurately based on the analysis of medical voice records and images. The suggested approach can aid healthcare professionals in improving the medical care provided to patients. The contributions of the current study can be summarized in the following points:

  • Proposing a hybrid system of ML algorithms and a DL approach for predicting heart disease.

  • Analyzing different types of datasets including medical images and voice records.

  • Suggesting a hybrid DL and Aquila Optimizer (AO) approach for the learning and optimization processes.

  • Reporting state-of-the-art performance metrics compared with other related studies and approaches.

1.2 Paper organization

The rest of this paper is organized as follows: In the next section, the related studies that contribute to heart disease diagnosis and prediction are described. Section 3 depicts the basic concepts regarding voice feature techniques, ML algorithms, Convolutional Neural Network (CNN), Metaheuristic Optimization using the Aquila Optimizer (AO), Image Augmentation, and Data Normalization. In Sect. 4, the approach suggested in this work for the heart disease learning and optimization phase is discussed. Section 5 illustrates the experiments and the reported results of the different approaches. Finally, Sect. 6 presents the main conclusion and future work.

2 Related work

In this section, the existing studies and research papers related to heart disease diagnosis and prediction based on various types of medical data are introduced. The related studies are split into studies that focused on (1) deep learning approaches, (2) machine learning algorithms, and (3) hybrid approaches.

2.1 Deep learning-based studies

Brunese et al. [22] proposed a methodology for detecting heart disease from cardiac sounds using DL. They used deep neural networks (DNN) to extract a set of features, analyzed the cardiac sounds, and determined whether they belonged to healthy patients or to patients with heart disease. In their experiments, 176 heartbeats were considered, of which 145 were related to heart disease patients and only 31 to healthy patients. The overall accuracy was 98%. Miao et al. [23] developed a DNN for predicting and diagnosing coronary heart disease using a Multi-Layer Perceptron (MLP), regularization, and dropout. They utilized 303 instances with attributes from patients of the Cleveland Clinic Foundation and achieved 83.67% accuracy and 93.51% sensitivity.

Abdel-Alim et al. [24] proposed a heart disease diagnosis system that uses an ANN to classify several heart disorders from heart sounds. The used dataset contained 850 cases, partitioned into 650 cases for ANN training and 200 cases for testing. They utilized different techniques to perform the diagnosis process, such as (1) the Fast Fourier Transform, (2) the Discrete Wavelet Transform, and (3) Linear Prediction Coding, and achieved a recognition rate of 95%. Ali et al. [19] suggested a smart monitoring system for predicting heart disease through DL approaches, feature selection, feature fusion, and weighting techniques on the Cleveland and Hungarian datasets. The proposed approach achieved an accuracy of 98.5%. Zhang et al. [25] carried out ECG classification using CNNs to identify heart disease. They used a dataset consisting of 102,548 heartbeats and achieved 97.7%, 97.6%, and 97.6% for positive predictive rate, sensitivity, and F1-score, respectively.

Zhang et al. [26] suggested an approach for diagnosing heart disease through signal processing and DL models that predict the disease from ECG signals. The used dataset contained 8524 single-lead episodic ECG records, and they reported an F1-score of 0.87. Kwon et al. [27] developed a DL approach for predicting mortality among heart disease patients from their ECGs. Among 25,776 cases, 1026 patients died, and the model achieved more accurate results than existing ML models. Sajeev et al. [28] proposed a DL-based heart disease prediction system that could determine the disease risk probabilities of patients, achieving an accuracy of 94% and an Area Under the Curve (AUC) score of 0.964.

Rath et al. [29] carried out heart disease detection on ECG samples through a DL model based on LSTM and a Generative Adversarial Network (GAN) to achieve the best efficiency. The results reported a best accuracy of 99.2%, F1-score of 0.987, and AUC score of 0.984. Darmawahyuni et al. [30] developed a framework for detecting coronary heart disease based on a DNN and the UCI repository heart disease dataset. They achieved a specificity of 92%, sensitivity of 99%, and accuracy of 96%.

2.2 Machine learning-based studies

Jindal et al. [31] proposed a heart disease prediction system using ML algorithms to predict whether a patient has heart disease. They relied on the medical history of each patient in a dataset containing 13 medical attributes for 304 patients, collected from the UCI repository. They used ML algorithms such as (1) K-Nearest Neighbor (KNN), (2) Logistic Regression (LR), and (3) Random Forest Classifier (RFC). Among these algorithms, KNN achieved the best accuracy with a value of 88.52%. They also built a combined model from the used ML algorithms, which achieved 87.5% accuracy and outperformed their related studies.

Muhammad et al. [32] developed an intelligent computational model for the early and accurate detection and diagnosis of heart disease based on ML algorithms. They utilized many ML algorithms, such as (1) RFC, (2) Artificial Neural Network (ANN), (3) Support Vector Machine (SVM), (4) LR, (5) KNN, (6) Naïve Bayes (NB), (7) Extra-Tree Classifier (ETC), (8) Gradient Boosting (GB), (9) AdaBoost (AB), and (10) Decision Tree (DT), on the Cleveland and Hungarian heart disease datasets available in the UCI repository. They compared the algorithms using performance evaluation metrics; the best were ETC and GB, with overall accuracies of 94.41% and 93.36%, respectively.

Pugazhenthi et al. [33] developed a framework for detecting ischemic heart disease from medical images using ML algorithms such as (1) MLP, (2) SVM, and (3) the C5 classifier. The reported results showed that the highest accuracy, 92.1%, was obtained by SVM. Alarsan et al. [34] developed an approach for heart disease detection based on ECG classification, using ML algorithms on features extracted for the classification process. They used a dataset containing 205,146 records for 51 patients and deployed ML algorithms such as (1) RFC, (2) DT, and (3) Gradient-Boosted Trees (GDB). The highest accuracy, 97.98%, was obtained by the GDB algorithm. Nikhar et al. [35] proposed a methodology for predicting heart disease using ML algorithms on the Cleveland heart disease database, which contains 303 records with 76 medical attributes. They performed the experiments using NB and DT, which achieved the highest accuracies.

Patel et al. [36] built a heart disease prediction system by utilizing ML algorithms and data mining techniques on the Cleveland database of the UCI repository, which has 303 instances. The used algorithms, RFC and Logistic Model Tree (LMT), performed the prediction process effectively. Singh et al. [37] proposed a heart disease prediction system using ML approaches. They compared various ML algorithms, such as KNN, SVM, DT, and LR, on a dataset collected from the UCI repository. The results showed that the highest accuracy was achieved by KNN with 87%, followed by SVM with 83%, DT with 79%, and LR with 78%. Krishnan et al. [38] proposed a system for predicting the probability of heart disease based on ML approaches such as DT and NB. They used data from the UCI repository containing 300 instances with 14 clinical parameters. The DT algorithm had the highest accuracy with 91%.

2.3 Hybrid-based studies

Pasha et al. [39] proposed a framework for predicting cardiovascular disease using DL techniques and different algorithms such as (1) SVM, (2) DT, (3) KNN, and (4) ANN. They collected a dataset containing attributes related to heart disease from Kaggle and compared the algorithms to identify the best one, which was ANN with an overall accuracy of 85.24%. Raza et al. [40] developed a framework for classifying heartbeat sound signals using DL approaches. They utilized a recurrent neural network (RNN) built from long short-term memory (LSTM), dense, dropout, and SoftMax layers, and also deployed MLP, DT, and RFC models. The results showed that the RNN was the most efficient of them, with an accuracy of 80.80%. Arabasadi et al. [10] proposed a Computer-Aided System (CAS) for heart disease detection based on a hybrid model using Neural Networks (NN) and Genetic Algorithms (GA). They used a dataset containing information on 303 patients and achieved 93.85% accuracy, 97% sensitivity, and 92% specificity.

Sajja et al. [41] proposed a DL approach for the early prediction of cardiovascular diseases based on CNNs. They used a dataset from the UCI repository and compared traditional algorithms, namely (1) LR, (2) KNN, (3) SVM, (4) NB, and (5) NN, with the proposed approach, which reported the best accuracy of 94.78%. Haq et al. [42] proposed a hybrid intelligent system for the prediction of heart disease based on ML algorithms to distinguish healthy people from heart disease patients by analyzing the Cleveland heart disease dataset. They utilized 3 feature selection algorithms, 7 classifiers, performance evaluation metrics, and cross-validation. The results showed that the best algorithms were LR and SVM with accuracies of 89% and 88%, respectively. Gavhane et al. [43] suggested a symptom-based heart disease prediction framework using ML algorithms such as NN and MLP, and found NN to be the most accurate algorithm for the prediction process. Sharma et al. [44] suggested a framework for heart disease prediction using a DNN on the UCI heart disease repository. They utilized different algorithms, namely (1) KNN, (2) SVM, (3) NB, and (4) RFC, for the classification process, and used Talos optimization with the DNN, which achieved the best accuracy of 90.76%.

2.4 Related studies summarization

Table 1 summarizes the discussed related studies published in 2021 and 2020, while Table 2 summarizes those published in 2019 or earlier. The studies are ordered in descending order by publication year.

Table 1 Related Studies in (2021 and 2020) Summarization
Table 2 Related Studies in (2019 or before) Summarization

2.5 Plan of solution

The current study proposes a hybrid approach for heart disease learning and optimization through various phases (as shown in Fig. 1). The first phase handles dataset collection for the Classifying Heart Sounds Challenge dataset and the medical images. The second phase pre-processes these data, including data augmentation and scale conversion techniques. The third phase is the optimization phase, which involves the structure, the learning process, and the data augmentation approaches, utilizing the pre-trained CNN models and ML algorithms. The fourth phase covers the numerical and graphical feature extraction techniques. The fifth phase represents the classification process, which is based on DL approaches via transfer learning and on ML algorithms. Finally, the sixth phase involves measuring the performance of the ML and DL approaches through various experiments and calculated performance metrics.

Fig. 1
figure 1

The Suggested Framework Parts Summarization

3 Background

This section provides the main background that can help the reader become familiar with the parts of the suggested hybrid approach. It is divided into the following points:

  • Voice feature extraction techniques.

  • Machine learning algorithms.

  • Convolutional neural network.

  • Metaheuristic optimization using the Aquila Optimizer (AO).

  • Image augmentation.

  • Data normalization.

3.1 Voice feature extraction techniques

Feature extraction is one of the important steps in the learning process of the algorithms: it minimizes calculations, selects the most informative features of the dataset, and provides the information required to train the model [45]. Features can be extracted from different types of data such as audio, images, and waves [46]. In the current study, features are extracted numerically and graphically from the audio records. There are many audio feature extraction techniques, but the ones used in the current study are (1) Mel-Frequency Cepstral Coefficients (MFCC), (2) Mel-Spectrogram, (3) Zero Crossing Rate (ZCR), (4) Root Mean Square Energy (RMSE), (5) spectral-based, (6) Tonnetz, and (7) chroma-based techniques [47].

3.1.1 Mel-frequency cepstral coefficients (MFCC)

Mel-Frequency Cepstral Coefficients (MFCC) is the most common technique used for extracting audio features, both numerical and graphical [48]. In MFCC, the signal is framed and a Hamming window is used to reshape the signal into very small windows [49]. Figure 2 shows the steps of extracting the MFCC features [50]. MFCC uses the Discrete Cosine Transform (DCT) internally; if the DCT type is 3, it is referred to here as the HTK-style MFCC, while if the DCT type is 2, it is referred to as the Slaney-style MFCC [51].

Fig. 2
figure 2

The Steps of Extracting the MFCC Features from an Audio Record
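As a minimal illustration (an assumption on our part, not the authors' exact pipeline), the two MFCC variants mentioned above can be obtained with librosa, which exposes the DCT type as a parameter; the file name, sampling rate, and number of coefficients are illustrative only.

```python
import librosa

# Load one heart-sound record (path and sampling rate are illustrative)
y, sr = librosa.load("heartbeat.wav", sr=22050)

# DCT type 2: referred to in the text as the Slaney-style MFCC
mfcc_slaney = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, dct_type=2)

# DCT type 3: referred to in the text as the HTK-style MFCC
mfcc_htk = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, dct_type=3)

print(mfcc_slaney.shape, mfcc_htk.shape)  # both are (n_mfcc, n_frames)
```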

3.1.2 Mel-spectrogram (MS)

Mel-Spectrogram (MS) is one of the most efficient techniques for audio processing, extracting features from audio, and transforming them into feature images [52]. Figure 3 shows the steps of extracting the MS features [53].

Fig. 3
figure 3

The Steps of Extracting the MS Features from an Audio Record

3.1.3 Zero-crossing rate (ZCR)

Zero-Crossing Rate (ZCR) is a feature extraction technique that measures how often the signal changes from positive through zero to negative or vice versa, and is used for recognizing voiced and unvoiced signals. ZCR is based on counting the number of times the waveform crosses zero within a specific time frame [54]. Equation 1 shows how to calculate the ZCR value.

$$\begin{aligned} \text {ZCR}=\frac{1}{\left( 2\times M\right) }\times \sum _{k=1}^{M}{|\hbox {sign}\left( a[k]\right) -\hbox {sign}\left( a[k-1]\right) |} \end{aligned}$$
(1)

where k is the sample index, M is the number of samples in the frame, and sign(a[k]) is calculated using Eq. 2.

$$\begin{aligned} \text {sign}\left( a\left[ k\right] \right) = {\left\{ \begin{array}{ll} 1, &{} \text {if } a\left[ k\right] \ge 0\\ -1, &{} \text {if } a\left[ k\right] <0 \end{array}\right. } \end{aligned}$$
(2)
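A direct NumPy sketch of Eqs. 1 and 2 is given below; it assumes that M counts the consecutive sample pairs of the frame, which is one possible reading of the definition above.

```python
import numpy as np

def zero_crossing_rate(a):
    """ZCR of a 1-D frame a, following Eqs. (1) and (2)."""
    a = np.asarray(a, dtype=float)
    s = np.where(a >= 0, 1, -1)    # sign(a[k]) as defined in Eq. (2)
    M = len(a) - 1                 # number of consecutive sample pairs (assumed)
    return np.abs(np.diff(s)).sum() / (2 * M)

print(zero_crossing_rate([0.3, -0.2, 0.1, 0.4, -0.5]))  # 3 crossings over 4 pairs -> 0.75
```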

3.1.4 Chroma-based techniques

There are many chroma-based techniques, but the ones used in the current study are (1) chroma-only, (2) Short-Time Fourier Transform (STFT), (3) Constant-Q Transform chromagram (CQT), and (4) Chroma Energy Normalized Statistics (CENS). The Short-Time Fourier Transform (STFT) is a sequence of Fourier transforms of an audio signal that enables time-frequency analysis in situations where the frequency content of the signal changes over time. It is a fixed-resolution method that segments the signal into time intervals and takes the Fourier Transform of every segment [55].

The Constant-Q Transform (CQT) is a wavelet-like transform that maps a time-domain signal to the time-frequency domain. The center frequencies of the frequency bins are spaced so that their Q-factors are equal; consequently, the frequency resolution is better for low frequencies, whereas the time resolution is better for high frequencies. The CQT gives better results when logarithmic frequency mapping and low frequencies are of concern [56]. Chroma Energy Normalized Statistics (CENS) is a group of scalable sound features utilized for sound matching. It computes the short-time energy spread of the signal. CENS is used for extracting chroma features that capture the melodic and harmonic characteristics of sounds and represent a short time window of the sound [57].

3.1.5 Root mean square energy (RMSE)

RMSE is the square root of the mean of the squared signal amplitudes over a short-time window of the sound wave. RMSE plays an essential role as a loudness indicator: the higher the energy, the louder the sound. RMSE has been utilized in sound segmentation and genre classification [58]. Computing the RMSE value directly from the voice records is faster as it does not require any STFT calculations; however, using a spectrogram can give a more accurate representation of the energy over time since its frames can be windowed. Equation 3 shows how to calculate the RMSE value.

$$\begin{aligned} \text {RMSE}=\sqrt{\frac{1}{N}\times \sum _{k=1}^{N}{x_k^2}} \end{aligned}$$
(3)

where N is the number of samples and \(x_k\) is the k-th sample of the signal.

3.1.6 Tonnetz

Tonnetz is used to analyze the tonal centroid features of audio signals and to learn the features extracted from the audio files [59].

3.1.7 Spectral-based techniques

There are many spectral-based techniques, but the ones used in the current study are (1) spectral centroid, (2) spectral bandwidth, (3) spectral contrast, (4) spectral flatness, and (5) roll-off frequency. The spectral centroid is a measure of spectral shape and position: it characterizes the shape of the spectrum of the waveform, indicates the brightness of the sound, and identifies the frequency band where most of the energy is concentrated. Hence, a high spectral centroid value means that more of the signal energy is concentrated in the higher frequencies [60].

The spectral bandwidth is the spectral range of interest around the centroid; it is derived from the spectral centroid as the weighted mean of the distances of the frequency bands from it. The bandwidth is directly proportional to the spread of energy across the frequency bands [61]. The spectral contrast is the difference between the valleys and peaks in the spectrum. It carries additional spectral information and represents the relative spectral characteristics: it normalizes the sound, keeping the main peaks of a sound signal constant while attenuating the valleys in the spectrum [62].

Spectral flatness estimates the characterization of an audio spectrum, i.e., the uniformity of the signal energy distribution and the noisiness of the energy spectrum in the frequency domain. A high spectral flatness value indicates that the spectrum has similar energy in all spectral bands, while a low value means that the spectral energy has low uniformity in the frequency domain [63]. The roll-off frequency is the frequency below which 95% of the energy of each signal lies. It is used to differentiate between unvoiced and voiced speech; unvoiced speech has a high level of energy in the high-frequency part of the spectrum [64].
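To show how the above numerical features could be gathered per record, the sketch below uses librosa's built-in extractors and averages every frame-wise feature into a fixed-length vector; the mean-pooling step and the parameter values are assumptions, not the authors' exact settings.

```python
import numpy as np
import librosa

def numerical_features(path):
    """Sketch: one fixed-length feature vector per (sub-)record."""
    y, sr = librosa.load(path, sr=22050)
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
        librosa.feature.melspectrogram(y=y, sr=sr),
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.rms(y=y),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_bandwidth(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.spectral_flatness(y=y),
        librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.95),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.chroma_cqt(y=y, sr=sr),
        librosa.feature.chroma_cens(y=y, sr=sr),
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
    ]
    # Collapse the frame axis so every record yields the same number of features
    return np.hstack([f.mean(axis=1) for f in feats])
```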

3.2 Machine learning algorithms

ML algorithms are programs with a specific way of adjusting their parameters (i.e., weights) based on feedback from their previous experience in predicting the related dataset [65]. In this work, five ML algorithms are deployed to detect the existence of heart disease: (1) KNN, (2) DT, (3) AB, (4) RFC, and (5) the Extra Trees Classifier (ETC).

3.2.1 K-nearest neighbour (KNN)

KNN is one of the most widely used ML algorithms for versatile problems such as regression and classification, but it is most commonly utilized for classification [66]. It is one of the simplest and easiest algorithms to implement; however, it is computationally expensive [67]. KNN works by storing the available cases and classifying new ones according to the majority vote of their k nearest neighbors [68]. A distance function is used to find the distances between a query and all cases in the data; the algorithm then chooses the closest cases to the query and takes the most frequent label among them [69]. The “k” in the KNN algorithm represents the number of nearest neighbors used for dealing with new cases. If the “k” value is high, the algorithm overlooks classes with few samples; if the “k” value is low, it can be sensitive to outliers [70].

3.2.2 Decision trees (DT)

It is a decision support technique with a tree-like structure consisting of three parts: (1) leaf nodes, (2) root nodes, and (3) decision nodes [71]. The algorithm splits the training dataset into branches that further split into other branches. The nodes in the DT represent the attributes used for predicting the outcome, and the decision nodes provide the links to the leaves (as shown in Fig. 4).

Fig. 4
figure 4

A sample of the decision trees (DT) with its components

The decision nodes and root nodes represent the features in the dataset [72]. Hence, the DT algorithm provides various outputs, and the highest one is selected as the final output. From the DT algorithm, a model can be built that predicts the value of the target variable from decision rules inferred from the training data [73]. The tree representation of the algorithm helps in understanding the problem and reaching the optimal solution, so it is one of the easiest and simplest models to implement [71].

3.2.3 Random forest classifier (RFC)

It is an ML algorithm representing a collection of DTs; it combines the outputs of multiple DTs to obtain a single, more accurate result. When a new case described by the DT attributes needs to be classified, each tree gives a classification, i.e., votes for a class, and the forest selects the class with the most votes among the trees [74]. The RFC algorithm is very flexible, easy to implement and understand, and can achieve a stable prediction output [75]. Its training process is based on the bagging method, combining learning models to improve the overall result [75]. Figure 5 shows a sample of the RFC and its inner components.

Fig. 5
figure 5

A Sample of the Random Forest Classifier (RFC) with its Components

3.2.4 Extra trees classifier (ETC)

It is an ML ensemble algorithm that combines the predictions of many decision trees by averaging them for regression tasks or by majority vote for classification problems. ETC is related to Random Forests and bagging. Additional trees are added until the model performance stabilizes, and the predictions of the various trees are aggregated to obtain the best one [76]. ETC is one of the fastest and most accurate ML algorithms and is based on randomization and optimization [77].

3.2.5 AdaBoost (AB)

It is a boosting approach utilized as an ensemble algorithm in ML and as a supervisory layer for other algorithms. It works by growing learners sequentially: a model is built from the training data, and another model is then built that attempts to correct the errors of the first. Models are added until the training set is predicted efficiently or the maximum number of models is reached [78]. AdaBoost turns a set of weak classifiers into a strong classifier, and predictions are performed based on the weighted average of the weak classifiers. AdaBoost relies on the performance of each stump, updating the sample weights and changing the training set depending on the results of the previous ones [79].
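The five classifiers described in this subsection are all available in scikit-learn; the self-contained sketch below, with synthetic data and illustrative hyperparameter values, shows how they could be instantiated and compared. It is only a sketch of the general setup, not the tuned configuration of this study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier)

# Synthetic stand-in for the extracted numerical voice features
X, y = make_classification(n_samples=500, n_features=30, n_classes=5,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15,
                                                    shuffle=True, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(weights="distance", algorithm="ball_tree"),
    "DT": DecisionTreeClassifier(criterion="entropy", splitter="best"),
    "RFC": RandomForestClassifier(n_estimators=50, criterion="entropy"),
    "ETC": ExtraTreesClassifier(n_estimators=100, criterion="entropy"),
    "AB": AdaBoostClassifier(n_estimators=50),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, round(clf.score(X_test, y_test), 3))
```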

3.3 Convolutional neural network (CNN)

The Convolutional Neural Network (CNN) is one of the most powerful deep learning tools: it takes an input image, extracts features from it using filters (or kernels), and transfers it to lower dimensions without losing essential information. CNNs have demonstrated their ability to classify images effectively, as they can learn the intrinsic and latent image features, and are therefore the most popular choice for this task [80]. A CNN model has multiple layers, starting with the input layer, followed by convolutional, pooling, fully connected, batch normalization, and activation layers, and ending with the output layer [81]. A CNN’s architecture is composed of multiple layers as follows:

The input layer contains the input image and holds its pixel values. The convolutional layer is applied to the input image and extracts different levels of features using kernels (filters) with specific widths and heights; it determines the output neurons that are connected to a specific region of the input data. Multiple convolution operations are applied by sliding the filters over the input to extract feature maps at various levels and stack them to form the convolutional layer output [82]. The pooling layer down-samples the input and reduces the number of parameters, aiming to decrease the training time and reduce overfitting without losing important information; it can also affect the performance of the training process [83, 84].

The fully connected (FC) layer is a flattened feed-forward layer that supports the classification process after pooling. After the down-sampling and feature extraction processes, nonlinear combinations of features are learned from the output of the convolutional layers. All neurons in the fully connected layer are connected to the neurons in the previous and next layers, and a nonlinear activation function can be used to make predictions and classify the input data into different classes [85, 86]. The batch normalization layer is one of the main layers in the CNN architecture; it makes the model perform better and the training process faster by (1) allowing an extensive range of learning rates and (2) re-parametrizing the optimization problem, which makes the process more stable and smoother and helps avoid convergence to a local minimum [87]. The activation layer produces the output of each node using one of several activation (transfer) functions, including the Sigmoid, Hyperbolic Tangent (Tanh), Rectified Linear Unit (ReLU), Leaky ReLU, Exponential Linear Unit, Scaled Exponential Linear Unit, and SoftMax functions [88, 89].
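The layer types described above can be wired together in Keras as in the toy sketch below; the filter counts, input shape, and the assumption of five output classes are illustrative only and do not reflect the architectures tuned later in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100, 100, 3)),          # input layer (pixel values)
    layers.Conv2D(32, (3, 3), padding="same"),  # convolutional layer
    layers.BatchNormalization(),                # batch normalization layer
    layers.Activation("relu"),                  # activation layer
    layers.MaxPooling2D((2, 2)),                # pooling layer (down-sampling)
    layers.Flatten(),
    layers.Dense(64, activation="relu"),        # fully connected layer
    layers.Dropout(0.5),                        # dropout regularization
    layers.Dense(5, activation="softmax"),      # output layer (5 classes assumed)
])
model.summary()
```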

3.3.1 Transfer learning

Transfer learning is an ML concept in which a pre-trained model built for a specific task is reused for another task; it can be summarized as knowledge transfer [90]. The main idea of reusing pre-trained models for a new task that lacks much labeled data is to have a starting point instead of building the model from scratch and creating large amounts of labeled data, which is very expensive [91]. Transfer learning is very popular in the DL field because of its advantages, including better performance and considerable time savings during training, which can lead to rapid progress [92]. Many pre-trained CNN models were trained on the ImageNet image database [93]; the ones used in the current study are VGG16, VGG19, ResNet50, ResNet101, MobileNet, MobileNetV2, MobileNetV3Small, and MobileNetV3Large.

VGG is one of the popular pre-trained models used for image classification because of its simplicity. There are several versions of the VGG architecture, published by Oxford University researchers [94]. Despite the model's simplicity, it is very expensive in terms of computation and memory. The current study uses two variants of VGG, distinguished by their number of layers: VGG16 and VGG19. VGG has a competitive advantage over other models in that it uses only \(3 \times 3\) convolution filters, and it achieved a 9.9% top-five error on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [95]. ResNet is a pre-trained deep residual network model proposed by the Microsoft Research team in “Deep Residual Learning for Image Recognition” [96]. ResNet is a very deep model that is easy to optimize and can increase the accuracy with the depth of the network. It uses forward and backward propagation techniques and the ReLU activation function [97], and achieved a 7.8% top-five error on ILSVRC.

MobileNet is one of the more recently proposed pre-trained models, with many modifications and advantages over previous models. It was proposed by the Google Research team [98, 99] and is suitable for mobile and embedded applications. It consists of blocks of 3 layers each, including a residual block with a stride of one and blocks with a stride of two, and utilizes depth-wise separable convolution modules [100]. It can handle many tasks at the same time and has the smallest memory footprint compared to the other models, so it is simple, without the complexity or the large number of parameters that can affect the overall performance. The model achieved a 0.901 top-five accuracy on ILSVRC [101]. Table 3 compares the discussed and used pre-trained CNN models.

Table 3 Comparing the used pre-trained CNN models
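As a hedged illustration of the transfer-learning idea (not the exact architecture tuned in this study), a pre-trained ImageNet backbone such as VGG16 can be frozen and topped with a small classification head in Keras; the input shape and five-class head are assumptions taken from the rest of the paper.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# ImageNet weights are reused as a fixed feature extractor; only the head is trained
base = VGG16(weights="imagenet", include_top=False, input_shape=(100, 100, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),  # five heart-sound classes assumed
])
```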

3.3.2 Parameters optimization

Parameter optimization is the process of updating the parameters (i.e., weights) of the model to obtain more accurate results and reduce the losses [103]. The parameter optimizers used in the current study are Adam [104], NAdam [105], AdaGrad [106], AdaDelta [107], AdaMax [108], RMSProp [109], SGD [110], Ftrl [111], SGD Nesterov [112], RMSProp Centered [113], and Adam AMSGrad [114].

3.3.3 Hyperparameters

The loss function has a critical role in evaluating the proposed solution and calculating the model errors [115]. It determines how good the model is, so that the parameters can be changed to improve the model performance and minimize the overall loss. It can be viewed as a penalty for failing to reach the desired output: if the deviation of the value predicted by the model from the desired value is large, the function gives a high loss value, and a smaller one otherwise [116]. The losses used in the current study are Categorical Crossentropy [117], Categorical Hinge [118], KLDivergence [119], Poisson [120], Squared Hinge [121], and Hinge [122].

Batch size is the number of data records used to train the model in every iteration; it influences the model generalization, the parameter values, and the convergence of the loss function, and plays an important role in making the learning process quicker and more stable [123]. Dropout is a regularization technique applied during training to any or all hidden layers of the CNN architecture. It plays an important role in preventing and addressing overfitting to keep performance at an optimal level, and it can improve the generalization ability by randomly setting the output of a given neuron to 0 [124].
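The sketch below shows where the optimizer, loss function, batch size, and dropout appear in a Keras training setup; the dummy tensors and the specific choices (Adam, categorical cross-entropy, batch size 32) are illustrative assumptions rather than the settings selected by the optimization process described later.

```python
import numpy as np
import tensorflow as tf

# Dummy tensors standing in for pre-processed feature images and one-hot labels
x_train = np.random.rand(64, 100, 100, 3).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 5, 64), 5)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(100, 100, 3)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),                    # dropout regularization
    tf.keras.layers.Dense(5, activation="softmax"),
])

# The optimizer and loss function are the tuned hyperparameters; batch size
# controls how many records feed each parameter update
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=2, verbose=0)
```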

3.4 Metaheuristic optimization using aquila optimizer (AO)

Metaheuristic optimization is one of the most popular choices for modeling and solving complex optimization problems that are difficult to solve in traditional ways. “Meta” in metaheuristic refers to a higher level that performs better than simple heuristics; it relies on a tradeoff between global exploration and local search [125]. Metaheuristic algorithms have two essential parts: diversification and intensification. Diversification generates various solutions for exploring the search space, whereas intensification focuses the search on a local region by exploiting information where a good solution has been found. Metaheuristic optimization is utilized to find the optimal solution for many challenging optimization problems, provided that the right algorithm is chosen for the case at hand [126].

The Aquila Optimizer (AO) is a novel metaheuristic optimization method. The optimization process of the AO algorithm is performed in four ways: (1) choosing the search space through a high soar with a vertical stoop (Eq. 4), (2) exploring within the chosen search space through contour flight with a short glide attack (Eq. 5), (3) exploiting the converged search space through low flight with a descent attack (Eq. 6), and (4) swooping by walking and grabbing the prey (Eq. 7).

$$\begin{aligned} X\left( t+1\right)= & {} X_\mathrm{best}\left( t\right) \times \left( 1-\frac{t}{T}\right) \nonumber \\&+ \left( \frac{\sum _{i=1}^{N}{X\left( t\right) }}{N} - X_\mathrm{best}\left( t\right) \times \hbox {rand} \right) \end{aligned}$$
(4)
$$\begin{aligned} X\left( t+1\right)= & {} X_\mathrm{best}\left( t\right) \times \text {Levy}\left( D\right) + X_R\left( t\right) + \left( r_1 + U \times D_1\right) \nonumber \\&\times \left( \cos \left( -\omega \times D_1 + 1.5 \times \pi \right) - \sin \left( -\omega \times D_1\right. \right. \nonumber \\&\left. \left. + 1.5 \times \pi \right) \right) \times \hbox {rand} \end{aligned}$$
(5)
$$\begin{aligned} X\left( t+1\right)= & {} \left( X_\mathrm{best}\left( t\right) - \frac{\sum _{i=1}^{N}{X\left( t\right) }}{N}\right) \times \alpha - \hbox {rand} \nonumber \\&+ \left( \left( UB - LB\right) \times \hbox {rand} + LB\right) \times \sigma \end{aligned}$$
(6)
$$\begin{aligned} X\left( t+1\right)= & {} QF \times X_\mathrm{best}\left( t\right) - X\left( t\right) \times \hbox {rand} \times \left( 2 \times \hbox {rand} - 1\right) \nonumber \\&- \text {Levy}\left( D\right) \times 2 \times \left( 1 - \frac{t}{T}\right) + \hbox {rand} \times \left( 2 \times \hbox {rand} - 1\right) \end{aligned}$$
(7)

where \(X(t+1)\) is the solution of the next iteration, N is the population size, t is the iteration number, T is the total number of iterations, rand is a random number in the range [0, 1], \(X_R(t)\) is a random solution in the current iteration t, \(X_\mathrm{best}(t)\) is the best solution in the current iteration t, D is the dimension space size, \(\text {Levy}(D)\) is the levy flight distribution function, \(r_1\) is a value in the range [1, 20], U equals 0.00565, \(D_1\) is a value in the range [1, D], QF is the quality function, \(\alpha \) (and \(\sigma \)) equal to 0.1, UB is the upper bound, and LB is the lower bound. The fixed values are taken from the original AO paper.

The optimization procedure in AO starts by generating a random predefined set of candidate solutions, called the population; through repeated iterations, the AO search strategies explore the position of the best (or a near-optimal) solution, and every solution updates its position depending on the best solution found so far [127]. A series of experiments was conducted to validate the optimizer’s ability to find the best solution for various optimization tasks. AO performance can be further enhanced by combining it with flight, mutation, levy, stochastic (and evolutionary) components and with global (or local) search [128].
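The following is a minimal, simplified sketch of the AO update rules in Eqs. 4-7 for a generic minimization problem; it is not the implementation used in this study. The constant values follow the original AO paper, while the scalar treatment of \(D_1\) and the 50/50 choice between the paired strategies are simplifying assumptions.

```python
import math
import numpy as np

def levy(dim, beta=1.5, rng=None):
    """Levy-flight step vector (Mantegna's algorithm)."""
    rng = rng or np.random.default_rng()
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = rng.normal(0, sigma, dim), rng.normal(0, 1, dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def aquila_optimizer(fitness, dim, lb, ub, pop_size=10, max_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(pop_size, dim))          # random initial population
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmin()].copy(), fit.min()
    alpha = delta = 0.1                                    # fixed AO constants
    U, omega, r1 = 0.00565, 0.005, 10
    for t in range(1, max_iter + 1):
        QF = t ** ((2 * rng.random() - 1) / (1 - max_iter) ** 2)   # quality function
        mean_X = X.mean(axis=0)
        for i in range(pop_size):
            rand = rng.random()
            if t <= (2 / 3) * max_iter:                    # exploration stage
                if rng.random() < 0.5:                     # Eq. 4: high soar
                    new = best * (1 - t / max_iter) + (mean_X - best * rand)
                else:                                      # Eq. 5: contour flight
                    D1 = rng.integers(1, dim + 1)
                    theta = -omega * D1 + 1.5 * math.pi
                    new = (best * levy(dim, rng=rng) + X[rng.integers(pop_size)]
                           + (r1 + U * D1) * (math.cos(theta) - math.sin(theta)) * rand)
            else:                                          # exploitation stage
                if rng.random() < 0.5:                     # Eq. 6: low flight
                    new = ((best - mean_X) * alpha - rand
                           + ((ub - lb) * rand + lb) * delta)
                else:                                      # Eq. 7: swooping
                    new = (QF * best - X[i] * rand * (2 * rand - 1)
                           - levy(dim, rng=rng) * 2 * (1 - t / max_iter)
                           + rand * (2 * rand - 1))
            new = np.clip(new, lb, ub)
            f_new = fitness(new)
            if f_new < fit[i]:                             # keep improvements only
                X[i], fit[i] = new, f_new
                if f_new < best_fit:
                    best, best_fit = new.copy(), f_new
    return best, best_fit

# Example: minimize the sphere function in 5 dimensions
print(aquila_optimizer(lambda x: float(np.sum(x ** 2)), dim=5, lb=-10, ub=10))
```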

3.5 Image augmentation

Data Augmentation (DA) is the process of enlarging a dataset into a larger, richer, and more diverse one [129]. DA can increase the performance of the CNN model through better generalization and greater data variety, enabling the model to detect or classify objects in an image in different orientations and dimensions [130]. This process is a pre-processing step, as it is applied only to the training subset of the dataset to increase its size and variation. DA can be performed using different transformation techniques, including (1) flipping the image, (2) zooming it in or out, (3) rotating the image by a specific degree, (4) shifting the image, (5) cropping the image, (6) changing the brightness of the image, and (7) shearing the image horizontally (or vertically) [131].

Flipping is done by flipping the image vertically (or horizontally), depending on the object’s location in the image. Rotation is done by rotating the image by a specific degree. Shearing is done by shifting one part of the image. Cropping is applied by removing columns (or rows) of pixels from the image so that the object appears in different locations. Shifting is done by moving the pixels along the width (or height) of the image in one direction, vertically or horizontally, without affecting the dimensions of the image. Brightness changing makes the image lighter (or darker) to enable the model to recognize different lighting levels. Zooming is done by zooming the image in (or out) within a specific range, and it can be applied to each axis of the image independently [132].
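A sketch of such an augmentation pipeline using Keras' ImageDataGenerator is shown below; the specific ranges are illustrative assumptions rather than the values tuned in this study, and cropping is omitted since this particular API does not provide it.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,             # rotation by up to +/- 20 degrees
    width_shift_range=0.1,         # horizontal shifting
    height_shift_range=0.1,        # vertical shifting
    shear_range=0.1,               # shearing
    zoom_range=0.2,                # zooming in or out
    brightness_range=(0.8, 1.2),   # brightness changes
    horizontal_flip=True,          # flipping
    vertical_flip=True,
)
# augmenter.flow(...) or augmenter.flow_from_directory(...) then streams
# augmented batches for the training subset only
```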

3.6 Data normalization (DN)

Data Normalization (DN) is one of the pre-processing techniques that map attribute values to a known range or scale to improve the performance of ML algorithms. There are different DN techniques, but the ones used in the current study are (1) Standard Scaler, (2) Min-Max Scaler, (3) Max-Abs Scaler, and (4) Normalization [133].

3.6.1 Standard scaler

It is one of the DN techniques applied to the feature vectors. It standardizes the features by removing the mean (making it equal to zero) and scaling each feature to unit variance. Equation 8 shows how to calculate it.

$$ {\text{out}} = \frac{{{\text{in-mean}}}}{{{\text{std}}}} $$
(8)

where out is the scaled value, in is the input value, mean is the mean, and std is the standard deviation.

3.6.2 Min-max scaler

It transforms the dataset values into the range between 0 and 1, where the smallest value is mapped to 0 and the largest value is mapped to 1. Equation 9 shows how to calculate it.

$${\text{out}} = \frac{{{\text{in}}- {\text{in}}_{{{\text{min}}}} }}{{{\text{in}}_{{{\text{max}}}} - {\text{in}}_{{{\text{min}}}} }}$$
(9)

where \(in_{max}\) is the maximum value and \(in_{min}\) is the minimum value.

3.6.3 Max-abs scaler

It is similar to the min-max scaler except that it maps the values into the range between \(-1\) and 1 by dividing each feature by its maximum absolute value; it scales the data but does not shift (center) it. The maximum absolute value of any feature therefore equals 1. Equation 10 shows how to calculate it.

$$ {\text{out}} = \frac{{{\text{in}}}}{{{\text{|in}}_{{\max }} |}} $$
(10)

3.6.4 Normalization

It is deployed by squeezing the data between 0 and 1. It is very useful in classification and for data containing negative values [134]. Equation 11 shows how to calculate it.

$$ {\text{out}} = \frac{{{\text{in}}}}{{{\text{in}}_{{{\text{max}}}} }} $$
(11)
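For illustration, the first three techniques map directly to scikit-learn scalers, and the last can be approximated with a row-wise max Normalizer; the toy matrix below is only for demonstration and the row-wise behaviour of Normalizer is an approximation of Eq. 11, not an exact match.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler, Normalizer

X = np.array([[1.0, -5.0], [3.0, 0.0], [5.0, 10.0]])       # toy feature matrix

print(StandardScaler().fit_transform(X))    # Eq. (8): zero mean, unit variance per column
print(MinMaxScaler().fit_transform(X))      # Eq. (9): each column mapped to [0, 1]
print(MaxAbsScaler().fit_transform(X))      # Eq. (10): divide by the max absolute value
print(Normalizer(norm="max").fit_transform(X))  # close to Eq. (11), but applied per row
```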

4 Suggested approach

The current study suggests a framework for heart disease learning and optimization. It is divided into four major phases (or layers): (1) dataset collection, (2) pre-processing (segmentation and feature extraction), (3) learning and hyperparameter optimization, and (4) export and statistics. The framework flow is summarized in Fig. 6.

Fig. 6
figure 6

The 4-Phases Suggested Framework Flow Summarization

In summary, the input layer accepts the voice records. These records flow sequentially to the pre-processing phase, which is partitioned into two sub-phases; its target is to segment the records into sub-records of equal duration and to extract features from them numerically and graphically. These features and graphs are the inputs of the third phase, whose role is to learn and optimize the selected model. The optimized model is exported in the last phase, together with the training, validation, and testing statistics and figures. The phases are discussed in the following subsections.

4.1 Dataset collection phase

The Classifying Heart Sounds Challenge dataset [135] is used in the current study. It comprises two challenges, which the authors combined into one dataset. It has five classes (i.e., categories): (1) Murmur, (2) Normal, (3) Artifact, (4) Extra Heart Sound, and (5) Extrasystole. The data consist of voice records with the extensions “wav”, “aif”, and “aiff”. Table 4 shows the categories and the corresponding number of records.

Table 4 The used dataset classes and the corresponding number of records

4.2 Pre-processing phase

The dataset is pre-processed in two sub-phases. The first sub-phase segments the records into sub-records of equal duration. The second sub-phase extracts the features, numerically for the used ML techniques and graphically for the pre-trained CNN models.

4.2.1 Voice segmentation

The records should be segmented into a fixed time duration such as 1 s or 3 s. The approach suggested in the current study is to segment the records with different time durations in both directions and concatenate them. How does this happen? For each record in the dataset, a specified time window moves from the beginning of the record and splits it into sub-records. For example, if the record’s duration is 9 s and the allowed time window is 1 s, then 9 sub-records are generated; if the allowed time window is 2 s, then 4 sub-records are generated. What about the remaining small time segment? It is ignored. In the last example, there are only 4 generated sub-records, and hence the remaining 1 s is neglected as it is shorter than the allowed time window (i.e., 2 s). How is the number of segments obtained? Eq. 12 shows how to calculate the number of segments for a record.

$$\begin{aligned} \hbox {no}_\mathrm{segments}=\Bigl \lfloor \left( \frac{\mathrm{duration}_\mathrm{record}}{\mathrm{duration}_\mathrm{window}}\right) \Bigr \rfloor \end{aligned}$$
(12)

Algorithm 1 shows the followed pseudocode function during the segmentation process for a single record.

figure a

What are the suggested time durations for segmentation? It is worth mentioning that the current study suggests segmenting each record with 1 s, 3 s, 5 s, 7 s, and 9 s time durations in both directions, and the whole set of segmented records is also concatenated into a single dataset. How is the segmentation done in both directions? For each sub-record, the voice is reversed, so two sub-records are generated from a single one. Figure 7 summarizes this process. The overall number of generated datasets is 11 (i.e., 2 for each time duration and 1 concatenated).
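A minimal sketch of this segmentation step is given below; the use of librosa for loading and the exact function signature are assumptions rather than the authors' code.

```python
import librosa

def segment_record(path, window_seconds):
    """Split one record into equal-length sub-records (Eq. 12); the leftover tail is dropped.
    Each sub-record is also reversed so that both directions are generated."""
    y, sr = librosa.load(path, sr=None)
    win = int(window_seconds * sr)
    n_segments = len(y) // win           # floor(duration_record / duration_window)
    segments = []
    for i in range(n_segments):
        sub = y[i * win:(i + 1) * win]
        segments.append(sub)             # forward direction
        segments.append(sub[::-1])       # reverse direction
    return segments

# e.g., sub-records for the five suggested windows of one record
# all_segments = {w: segment_record("record.wav", w) for w in (1, 3, 5, 7, 9)}
```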

Fig. 7
figure 7

Example on the segmentation process on a single record for a specific allowed time duration

4.2.2 Numerical features pre-processing

The numerical features are extracted from each segment of each record. The numerical voice feature extraction techniques used in the current study are (1) MFCC (HTK-style and Slaney-style), (2) Mel-Spectrogram, (3) ZCR, (4) RMSE, (5) spectral-based (spectral centroid, spectral bandwidth, spectral contrast, spectral flatness, and roll-off frequency), (6) Tonnetz (normal and harmonic), and (7) chroma-based (chroma-only, STFT, CQT, and CENS) techniques. Table 5 shows the used feature techniques and the corresponding number of extracted numerical features.

Table 5 The used feature techniques and the corresponding number of extracted numerical features

The segmentation process, as mentioned, is applied to each record for the time windows 1, 3, 5, 7, and 9 s in both directions (i.e., forward and reverse), and the whole set of segmented records is concatenated into one dataset, so the number of generated numerical datasets is 11. Algorithm 2 shows the pseudocode function followed during the numerical feature extraction process for all records.

figure b

Table 6 shows the generated numerical datasets and the corresponding number of records for each.

Table 6 The generated numerical datasets with the corresponding number of records

4.2.3 Graphical features pre-processing

The graphical features are extracted as images from each segment (i.e., sub-record) of each record, similarly to the numerical ones. The graphical voice feature extraction techniques used in the current study are (1) MFCC (HTK-style and Slaney-style), (2) Mel-Spectrogram, (3) Spectrogram, and (4) STFT. Table 7 shows the number of generated images for each category; the number of generated images is the same for each technique. Figure 8 shows samples from each category for each technique.

Table 7 The generated images for each class
Fig. 8
figure 8

Samples from the extracted images for each technique and class

4.3 Learning and optimization phase

Algorithm 3 shows the pseudocode of the learning and optimization processes using the pre-trained CNN models and ML algorithms. It accepts three inputs: (1) the selected model, (2) the dataset, and (3) the experimental configurations (from Table 9, which is discussed in the experiments section). Internally, it (1) splits the dataset into training, testing, and validation subsets, (2) checks whether the model is an ML algorithm or not, (3) if the model is an ML algorithm, applies the grid search optimization algorithm to find the best combination that leads the ML model to the top-1 performance metrics, and (4) if the model is a pre-trained CNN model, applies the AO metaheuristic optimizer to find the best solution that leads the CNN model to the top-1 performance metrics.

figure c
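A hedged sketch of this dispatch logic is shown below for the ML branch; the function name, the 85/15 split, and the use of scikit-learn's GridSearchCV with 5-fold cross-validation mirror the description above, while the AO branch is only indicated.

```python
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV, train_test_split

def learn_and_optimize(model, X, y, param_grid=None):
    """Sketch of Algorithm 3: grid search for ML models, AO for pre-trained CNNs."""
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.15, shuffle=True)            # 85% train+validation / 15% test
    if isinstance(model, BaseEstimator):               # ML branch
        search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
        search.fit(X_trainval, y_trainval)
        return search.best_estimator_, search.best_estimator_.score(X_test, y_test)
    # CNN branch: the AO metaheuristic (Sect. 3.4) searches the training
    # hyperparameters instead of a grid; omitted here for brevity
    raise NotImplementedError("CNN + AO branch not sketched")
```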

4.4 Export and statistics phase

In the current phase, the optimized model is exported for future or production use. Different statistics are calculated, such as accuracy, precision, and F1-score, and learning curves and figures are generated and stored. The current study calculates different state-of-the-art performance metrics: accuracy, F1-score, recall, specificity, area under the curve (AUC), sensitivity, intersection over union (IoU), Dice coefficient, and precision. They are summarized in Table 8.

Table 8 Summarization of the Performance Metrics

5 Experiments and discussions

The experiments are divided into two categories: (1) experiments related to the extracted numerical features using the ML algorithms and (2) experiments related to the images and extracted graphs using the pre-trained CNN models.

5.1 Experiments configurations

Python is the programming language used in the current study. The learning and optimization environments are Google Colab (with its GPU) and a Toshiba Qosmio X70-A with 32 GB RAM and an Intel Core i7 processor. TensorFlow, Keras, NumPy, OpenCV, Pandas, and Matplotlib are the used Python packages [136]. The dataset split ratio is set to 85% (training and validation) and 15% (testing), and dataset shuffling is applied. The images are resized to (100, 100, 3) in RGB. Table 9 summarizes the configurations of the experiments.

Table 9 The used experiments configurations

5.2 ML experiments

The current subsection presents and discusses the experiments related to the 321 extracted numerical features using the mentioned ML algorithms (i.e., DT, AB, RFC, ETC, and KNN). For each ML algorithm, 11 experiments are applied on the 1, 3, 5, 7, 9 s, and mixed durations in the forward and reverse directions. The algorithms are optimized using grid search with 5 cross-validation folds to find the combinations with the highest metrics. The metrics (i.e., accuracy, precision, recall, and F1-score) are captured and reported. It is worth mentioning that the word “files” refers to the 1, 3, 5, 7, 9 s, and mixed durations shown in Table 6.

5.2.1 K-nearest neighbor experiment

Table 10 summarizes the reported results of the KNN experiment, sorted in descending order by accuracy. It shows that the “Ball Tree” algorithm and “Distance” weights are the best among the other variations. The “Max-Abs” scaler is the best in 7 files, while the “0” variance threshold is the best in 8 files. The maximum reported accuracy, precision, recall, and F1-score are all 100%. The 9 s segmentation duration is the best in both directions, while the concatenated dataset reported only 99.97%. Figure 9 shows the accuracy, precision, recall, and F1-score curves for the different files.

Table 10 Summarization of the reported results of the KNN experiment
Fig. 9
figure 9

The accuracy, precision, recall, and F1-score KNN curves of the different files

5.2.2 Decision tree (DT) experiment

Table 11 summarizes the reported results of the DT experiment, sorted in descending order by accuracy. It shows that the “Best” splitter and the “Entropy” criterion are the best among the other variations. The “Normalize” scaler is the best in 4 files, while the “0.001” variance threshold is the best in 4 files. The maximum reported accuracy, precision, recall, and F1-score are all 99.89%. The concatenated dataset is the best among the datasets. Figure 10 shows the accuracy, precision, recall, and F1-score curves for the different files.

Table 11 Summarization of the reported results of the DT experiment
Fig. 10
figure 10

The Accuracy, precision, recall, and F1-score DT curves of the different files

5.2.3 AdaBoost (AB) experiment

Table 12 summarizes the reported results of the AB experiment, sorted in descending order by accuracy. It shows that “50” estimators is the best among the other variations. The “Normalize” scaler is the best in 7 files, while the “0.01” variance threshold is the best in 5 files. The maximum reported accuracy, precision, recall, and F1-score are all 62.61%. The 9 s segmentation duration in the forward direction is the best, while the concatenated dataset reported only 60.47%. Figure 11 shows the accuracy, precision, recall, and F1-score curves for the different files.

Table 12 Summarization of the reported results of the AB experiment
Fig. 11
figure 11

The accuracy, precision, recall, and F1-score AB curves of the different files

5.2.4 Random forest classifier (RFC) experiment

Table 13 summarizes the reported results of the RFC experiment, sorted in descending order by accuracy. It shows that the “Entropy” criterion and “50” estimators are the best among the other variations. The “Max-Abs” scaler is the best in 4 files, while the “0.005” variance threshold is the best in 6 files. The maximum reported accuracy, precision, recall, and F1-score are all 100%. The 3N, 9N, and concatenated files are the best. Figure 12 shows the accuracy, precision, recall, and F1-score curves for the different files.

Table 13 Summarization of the reported results of the RFC experiment
Fig. 12
figure 12

The accuracy, precision, recall, and F1-score RFC curves of the different files

5.2.5 Extra trees classifier (ETC) experiment

Table 14 summarizes the reported results of the ETC experiment, sorted in descending order by accuracy. It shows that the “Entropy” criterion and “100” estimators are the best among the other variations. The “Max-Abs” scaler is the best in 5 files, while the “0.01” variance threshold is the best in 4 files. The maximum reported accuracy, precision, recall, and F1-score are all 100%. The 5N, 9N, and concatenated files are the best. Figure 13 shows the accuracy, precision, recall, and F1-score curves for the different files.

Table 14 Summarization of the reported results of the ETC experiment
Fig. 13
figure 13

The accuracy, precision, recall, and F1-score ETC curves of the different files

5.2.6 ML experiments summarization

Table 15 summarizes the best-reported results of the ML numerical experiments with respect to the top-1 accuracy, and Table 16 summarizes the best-reported results with respect to the concatenated dataset. Figure 14 compares the two tables (i.e., the top-1 and concatenated accuracies) and shows that the concatenated dataset performs better than the other datasets. The current study therefore recommends concatenating the records segmented with different time durations in both directions.

Table 15 Summarization of the reported results of All Experiment concerning the Top-1 accuracy
Table 16 Summarization of the reported results of all experiment concerning the concatenated dataset
Fig. 14
figure 14

Comparison between the Top-1 and concatenated accuracies

5.3 CNN experiments

The current subsection presents and discusses the experiments on the images and the extracted graphical features using the mentioned pre-trained CNN models (i.e., VGG16, VGG19, ResNet50, ResNet101, MobileNet, MobileNetV2, MobileNetV3Small, and MobileNetV3Large) and the AO meta-heuristic optimizer. The number of epochs is set to 5. The number of AO iterations and the population size are set to 15 and 10, respectively; hence, 150 records are reported. The captured metrics are the loss, accuracy, F1-score, recall, specificity, AUC, IoU coefficient, Dice coefficient, and precision, as mentioned in the experiments’ configurations subsection [137].
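
To make this setup concrete, the following is a minimal Keras sketch of transfer learning with one of the listed backbones (VGG16) and a subset of the captured metrics; the head architecture, input size, class count, and loss are assumptions, the IoU and Dice coefficients would require custom metric functions (not shown), and the AO search around these hyperparameters is likewise omitted.

# Hypothetical transfer-learning sketch with a frozen VGG16 backbone; the head,
# input size, class count, and metric subset are illustrative assumptions.
import tensorflow as tf

NUM_CLASSES = 2              # assumed: diseased vs. healthy
INPUT_SHAPE = (224, 224, 3)  # assumed input size for the graphical features

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
base.trainable = False       # transfer learning: keep the pre-trained weights fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",  # assumed; the study tunes such choices with AO
    metrics=["accuracy",
             tf.keras.metrics.Precision(name="precision"),
             tf.keras.metrics.Recall(name="recall"),
             tf.keras.metrics.AUC(name="auc")],
)
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # 5 epochs, as in the experiments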

5.3.1 MFCC using Slaney experiment

Table 17 summarizes the reported results of the MFCC using Slaney experiment, sorted vertically in descending order by accuracy. It shows that the VGG16 model reports the highest accuracy, 99.17%. Figure 15 shows the accuracy, F1-score, recall, specificity, AUC, sensitivity, IoU, Dice, and precision curves of the different pre-trained CNN models.

Table 17 Summarization of the reported results of the MFCC using Slaney experiment
Fig. 15 The MFCC using Slaney curves of the different pre-trained CNN models
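
For the graphical features themselves, a minimal librosa sketch of rendering an MFCC image with the Slaney-style mel filterbank (librosa's default, htk=False) could look like the following; the file path, figure size, and number of coefficients are placeholder assumptions.

# Hypothetical sketch of rendering MFCCs with the Slaney mel filterbank as an image.
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("heart_sound.wav", sr=None)               # placeholder record
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, htk=False)  # htk=False -> Slaney-style filterbank

plt.figure(figsize=(4, 4))
librosa.display.specshow(mfcc, sr=sr, x_axis="time")
plt.axis("off")
plt.savefig("mfcc_slaney.png", bbox_inches="tight", pad_inches=0)
plt.close()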

5.3.2 MFCC using HTK experiment

Table 18 summarizes the reported results of the MFCC using HTK experiment, sorted vertically in descending order by accuracy. It shows that the ResNet50 model reports the highest accuracy, 98.25%. Figure 16 shows the accuracy, F1-score, recall, specificity, AUC, sensitivity, IoU, Dice, and precision curves of the different pre-trained CNN models.

Table 18 Summarization of the reported results of the MFCC using HTK experiment
Fig. 16 The MFCC using HTK curves of the different pre-trained CNN models
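
Under the same placeholder assumptions as the previous sketch, switching to the HTK-style filterbank would only change the htk flag.

# Hypothetical HTK variant of the previous MFCC sketch; only the filterbank flag changes.
import librosa

y, sr = librosa.load("heart_sound.wav", sr=None)                  # placeholder record
mfcc_htk = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, htk=True)  # htk=True -> HTK-style filterbank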

5.3.3 STFT experiment

Table 19 summarizes the reported results of the STFT experiment, sorted vertically in descending order by accuracy. It shows that the VGG19 model reports the highest accuracy, 98.78%. Figure 17 shows the accuracy, F1-score, recall, specificity, AUC, sensitivity, IoU, Dice, and precision curves of the different pre-trained CNN models.

Table 19 Summarization of the reported results of the STFT experiment
Fig. 17 The STFT curves of the different pre-trained CNN models
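
A comparable hedged sketch for the STFT images, assuming librosa's default FFT parameters and a dB-scaled magnitude, might be the following.

# Hypothetical sketch of rendering a dB-scaled STFT magnitude as an image.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("heart_sound.wav", sr=None)  # placeholder record
stft_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

plt.figure(figsize=(4, 4))
librosa.display.specshow(stft_db, sr=sr, x_axis="time", y_axis="hz")
plt.axis("off")
plt.savefig("stft.png", bbox_inches="tight", pad_inches=0)
plt.close()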

5.3.4 Mel-specgram experiment

Table 20 summarizes the reported results of the Mel-Specgram experiment, sorted vertically in descending order by accuracy. It shows that the ResNet50 model reports the highest accuracy, 98.68%. Figure 18 shows the accuracy, F1-score, recall, specificity, AUC, sensitivity, IoU, Dice, and precision curves of the different pre-trained CNN models.

Table 20 Summarization of the reported results of the mel-specgram experiment
Fig. 18 The mel-specgram curves of the different pre-trained CNN models
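
The Mel-Specgram images could be produced in a similar fashion with librosa's mel spectrogram; the number of mel bands and the dB scaling are assumptions.

# Hypothetical sketch of rendering a dB-scaled mel spectrogram as an image.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("heart_sound.wav", sr=None)              # placeholder record
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # 128 mel bands assumed
mel_db = librosa.power_to_db(mel, ref=np.max)

plt.figure(figsize=(4, 4))
librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel")
plt.axis("off")
plt.savefig("mel_specgram.png", bbox_inches="tight", pad_inches=0)
plt.close()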

5.3.5 Specgram experiment

Table 21 summarizes the reported results of the Specgram experiment, sorted vertically in descending order by accuracy. It shows that the ResNet50 model reports the highest accuracy, 99.00%. Figure 19 shows the accuracy, F1-score, recall, specificity, AUC, sensitivity, IoU, Dice, and precision curves of the different pre-trained CNN models.

Table 21 Summarization of the reported results of the specgram experiment
Fig. 19 The specgram curves of the different pre-trained CNN models
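
Finally, a plain spectrogram (specgram) image can be obtained, for instance, with matplotlib's built-in specgram; this is an assumed implementation, not necessarily the one used in the study.

# Hypothetical sketch of rendering a plain spectrogram image with matplotlib.
import librosa
import matplotlib.pyplot as plt

y, sr = librosa.load("heart_sound.wav", sr=None)  # placeholder record

plt.figure(figsize=(4, 4))
plt.specgram(y, Fs=sr)                            # matplotlib's built-in spectrogram
plt.axis("off")
plt.savefig("specgram.png", bbox_inches="tight", pad_inches=0)
plt.close()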

5.3.6 CNN experiments summarization

Table 22 summarizes the best-reported results of the performed CNN experiments. The best overall accuracy among the applied CNN experiments is 99.17%, reported by VGG16 in the MFCC using Slaney experiment, and the average accuracy is 98.78%. Applying augmentation and the “Poisson” loss function is recommended by 3 of the experiments.

Table 22 Summarization of the reported results of all experiments
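
Since augmentation and the “Poisson” loss appear among the recommended settings, the following is a minimal hedged Keras sketch of how they could be combined with one of the listed backbones; the specific augmentation layers, their parameters, and the backbone choice are assumptions.

# Hypothetical sketch of on-the-fly image augmentation plus the Poisson loss in Keras.
import tensorflow as tf

augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # augmentation operations are illustrative
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    augmentation,  # applied on the fly during training only
    tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # assumed: diseased vs. healthy
])

model.compile(optimizer="adam", loss=tf.keras.losses.Poisson(), metrics=["accuracy"])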

5.4 Error analysis

The authors investigated the reasons behind the misclassification rates in the reported results. These reasons can be: (1) the size of the dataset is not large enough, (2) the dataset is imbalanced, as shown in Table 4, (3) there is a degree of similarity between multiple rows after applying the segmentation, and (4) the complexity of some models is not sufficient for generalization.

5.5 Related studies comparisons

Table 23 shows a comparison between the suggested approach and related studies concerning the same datasets.

Table 23 Comparison between the suggested approach and related studies

6 Study limitations

The results of the suggested framework are encouraging, but there are still some limitations. First, only voice records are used. Second, only eight transfer learning CNN models are selected among the available models. Third, the study does not include the use of Long Short-Term Memory networks (or their variations) for the frequency-based data. Fourth, the current study does not utilize graph neural networks, which can be used in future studies [146]. However, the results of the current study are promising, and the proposed framework can be applied in hospitals.

7 Conclusions and future work

With the application of artificial intelligence in medical diagnosis, the detection of diseases has become more accurate. In this work, a framework for the detection of one of the widely spread diseases (i.e., cardiovascular diseases) is proposed. The reason behind this choice is the high morbidity and mortality rates due to these diseases. The hybrid framework uses medical voice records for the detection of heart diseases. The layers of the suggested framework are the Segmentation Layer, the Features Extraction Layer, the Learning and Optimization Layer, and the Export and Statistics Layer. The Segmentation Layer is the layer in which the different records are segmented with specific durations; a novel segmentation technique using variable durations in the forward and backward directions is proposed. In the Features Extraction Layer, numerical and graphical features are extracted from the resulting datasets. These features are passed to the Learning and Optimization Layer, where the numerical features are passed to 5 different machine learning (ML) algorithms with the Grid Search optimization algorithm, while the graphical features are passed to 8 different convolutional neural networks (CNN) with the Aquila Optimizer (AO) using transfer learning. Different performance metrics are used in the Export and Statistics Layer to validate the performance of the proposed framework. The best-reported metrics are 100% accuracy, precision, recall, and F1-score using ML algorithms such as ETC and RFC. Also, the proposed approach achieves 99.17% accuracy using CNN.

7.1 Future work

In future work, the authors will apply the suggested approach to different dataset types, such as waves. Also, the datasets can be handled and utilized using different optimization methods. Finally, graph neural networks and LSTM networks can be utilized.