Introduction

Cardiovascular disease (CVD) is one of the most life-threatening diseases in the world. The World Health Organization (WHO) and the Global Burden of Disease (GBD) study report cardiovascular disease as the leading cause of death worldwide [40, 56]. The WHO projects that CVD will affect almost 23.6 million people by the year 2030. In some industrialized countries such as the United States of America, CVD accounts for about 1 in 4 deaths [34]. The Middle East and North Africa (MENA) region has an even higher share, at 39.2% of total mortality [20]. Hence, early and accurate diagnosis and the provision of appropriate treatments are key to reducing the number of deaths caused by cardiovascular diseases. Availability of such services is essential for those at high risk of developing heart disease [29].

Many features contribute to heart disease prediction. Researchers in the past focused mainly on identifying significant features to use in their heart disease prediction models [8]. Less attention was given to determining the relationships between these features and to identifying their level of priority [32] within the prediction model. To address the issues that hinder early and accurate diagnosis, many data mining studies have been conducted [9, 16, 28].

Weighted Association Rule Mining (WARM) is a data mining technique used to discover the relationships between features and to determine mining rules that lead to certain predictions [22]. The weight used in this mining technique provides users with a convenient way to indicate the importance of the features that contribute to heart disease and helps obtain more accurate rules [4]. In many prediction models, different features have different importance; hence, different weights are assigned to features based on their predictive capabilities [48]. Failing to determine the correct weight therefore means failing to capture the importance of the features.

Past research has used Weighted Associative Rule Mining (WARM) in heart disease prediction [18, 31, 46, 48, 50]. However, the prediction models reported in these studies still demand further exploration in terms of the number of features used, the strength of these features and the evaluation of the scores obtained. In this research, we propose an algorithm to compute the weight of each feature that contributes to heart disease prediction. We experimented on all features as well as on selected significant features using WARM. The results showed that the significant features outperformed all features, with the highest confidence score of 98% in predicting heart disease. To the best of our knowledge, this study is the first to use strength scores of significant predictors in WARM.

The rest of the paper is organized as follows: Sect. 2 presents the background of the study followed by Sect. 3 on research objectives. Section 4 presents the methodology and Sect. 5 displays the results obtained by this research. Section 6 includes the discussions and Sect. 7 benchmarks this research against previous studies. Finally, Sect. 8 concludes the research with a summary of the findings and future work.

Related works

CVDs are disorders of the heart and blood vessels and include coronary heart disease, cerebrovascular disease and other conditions. Heart attacks and strokes are the main causes of mortality in cardiovascular disease, at a rate nearing one out of three [6]. With such a high mortality rate, diagnosis and prevention measures need to be performed effectively and efficiently. Many data mining techniques have been used to help address these issues (Amin et al. [8]). Most past research looked into identifying features that contribute to better heart disease prediction accuracy [9]. However, very little research has looked into the relationships that exist between these features. The relationship between the features that contribute to heart disease prediction can be obtained by using the Associative Rule Mining (ARM) technique [11]. The ARM technique is popular for transactional and relational datasets. The hidden knowledge in large datasets such as business transactions sparked the interest of many business owners in understanding the patterns that can help them improve their business decisions (Agarwal and Mithal [1]), for instance, discovering frequently bought items in market basket analysis. This analysis looks at the various items found in customers' shopping carts and identifies the associations between them. A good example: customers looking to purchase milk are likely to purchase bread on the same trip to the supermarket. This approach is also widely used in the healthcare industry, specifically in privacy preservation of healthcare data [15], predicting cancer-associated protein interactions [12], predicting obstructive sleep apnea [43] and predicting co-diseases in thyroid patients [23].

ARM is also used in heart disease prediction. Table 1 shows the studies that used ARM in heart disease prediction. Akbaş et al. [3], Shuriyaa and Rajendranb [42], Srinivas et al. [49], Khare and Gupta [24] and Lakshmi and Reddy [27] used ARM on the UCI dataset. Some of the studies listed in Table 1 used private datasets from hospitals and heart centres. Although the scores obtained on these datasets are high (99% by Sonet et al. [45], 100% by Thanigaivel and Kumar [52]), the studies have a limitation in terms of reproducibility, as the datasets are not open for access. Akbaş et al. [3], on the other hand, obtained a confidence score of 97.8% using the UCI dataset. However, that confidence score was for predicting people with no risk of heart disease.

Table 1 Studies on Heart Disease Prediction using ARM

Weighted Associative Rule Mining (WARM) is an extension of ARM in which weights are assigned to differentiate the importance of the features mined. Let T = {r1, r2, r3, …, ri} be the training dataset, with a weight associated with each {attribute, attribute value} pair. Every record ri is a set of values with a weight attached to each feature of the tuple. In a weighted framework, each record is thus a set of triples {ai, vi, wi}, where feature ai has value vi and weight wi, with 0 < wi ≤ 1.
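The weighted-record structure described above can be sketched as follows (the feature names, values and weights here are illustrative only, not taken from the paper's tables):

```python
# A weighted record is a set of triples {a_i, v_i, w_i}: feature a_i takes
# value v_i and carries weight w_i, with 0 < w_i <= 1.
record = [
    ("cp", "asymptomatic", 0.15),   # (feature, value, weight) - illustrative
    ("thal", "reversible", 0.12),
    ("sex", "male", 0.17),
]

# Sanity check: every weight lies in the interval (0, 1].
assert all(0 < w <= 1 for _, _, w in record)
```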

Assigning a correct weight to each feature is a hard task, and different fields calculate feature weights differently. For instance, according to Malarvizhi and Sathiyabhama [30], in web mining a visitor's page dwelling time is one way of calculating weightage. WARM is widely used in research on shopping basket scenarios and in predicting customers' behaviour. Chengis et al. [10] investigated assigning weights before and after ARM. WARM was also used in predicting disease comorbidities using clinical as well as molecular data (Lakshmi and Vadivu [26]). The technique has also been used in predicting breast cancer [5]. Recent research by Park and Lim [39] used it to reduce design failures in pre-alarming systems in the shipbuilding industry.

However, not many researchers have focused on applying WARM to cardiovascular disease. Table 2 shows studies on heart disease prediction using WARM. In several of them, the weight of features was not precisely calculated (Jabbar et al. [21], Sundar et al. [50], Soni and Vyas [48]). Soni et al. [47] proposed a new framework, an associative classifier that used WARM. Different weights were assigned to different attributes based on their predictive capability, and their theoretical model yielded a confidence score of 79.5%. Soni and Vyas [48] also applied WARM and likewise achieved a confidence level of 79.5%; their research assigned weights based on age range, smoking habits, hypertension and BMI range. On the other hand, Soni et al. [46] assigned weights to each attribute based on advice obtained from medical experts. They presented an intelligent and effective heart attack prediction system using a weighted associative classifier, achieving a maximum confidence score of 80%. Meanwhile, Sundar et al. [50] developed a system using two data mining techniques, Naïve Bayes and WARM. Their experiments showed that WARM achieved a confidence score of 84%, outperforming Naïve Bayes, which obtained only 78%. Chauhan et al. [11] also used WARM in predicting heart disease and obtained an accuracy score of 60.4%. Kharya et al. [25] applied a Weighted Bayesian Association Rule Mining algorithm, combining WARM with a Bayesian approach, to a heart disease dataset; however, they failed to report the results obtained in their study. Ibrahim and Sivabalakrishnan [19] used a Random Walker Memetic algorithm-based WARM for predicting coronary disease and obtained an accuracy of 95% on the UCI heart disease dataset.

Table 2 Studies on Heart Disease Prediction using WARM

Despite the existing WARM-based research on predicting heart disease, none of it focused on identifying the important features whose use would contribute to better prediction performance. The weight of each feature plays an equally important role in deciding which feature has the highest impact (strength) in predicting heart disease. The right weights for the identified significant features yield an effective prediction model. Thus, this research focuses on identifying the weights of significant features and utilizing the generated scores in predicting heart disease.

Research objectives

The main objectives of this research are as follows:

To compute the weight of significant features in heart disease prediction.

To predict heart disease using the computed weight of significant features (using WARM).

To evaluate the performance of WARM in predicting heart disease.

Proposed methodology

This section describes in detail the methodology used, as shown in Fig. 1. It contains 5 main stages: data pre-processing, feature selection, feature weight computation, applying WARM and model evaluation.

Fig. 1
figure 1

Methodology

Dataset

This research uses the heart disease dataset obtained from the UCI Machine Learning Repository [13]. The UCI Machine Learning Repository is one of the largest dataset collections, hosting over 417 datasets. The Cleveland dataset from the repository is a heart disease dataset widely used by researchers to date (Amin et al. [8]). This research uses this dataset, which contains 303 rows. The dataset provides 76 features, of which 14 attributes, including the class label, are used. The 14 features together with their descriptions and data types are shown in Table 3.

Table 3 Features description

Experimental Setup

In this research, Weka 3.8 was used to conduct the experiments. The retrieved Cleveland dataset went through a pre-processing phase. The significant features were selected from the 14 factors of the Cleveland dataset (Amin [7]). Next, the weight of each significant feature was computed and assigned accordingly. WARM was then applied to the heart disease dataset to generate rules. Finally, evaluation was performed to obtain the confidence scores of the best rules generated using WARM based on significant features. Each process is explained in detail in the following sections.

Data Pre-Processing

In the data pre-processing phase, all records with missing values, amounting to 6 instances, were deleted from the dataset. Based on Table 3, there are 13 ordinary attributes ('age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal') and 1 class label ('goal'), which refers to the criticality level of heart disease in patients. It ranges from 0 to 4, in which 0 refers to 'No Heart Disease' and the other values indicate the presence of heart disease at different criticality levels. Since this research aims at predicting the presence of heart disease and not its criticality level, the range from 1 to 4 is normalized to 1, indicating the presence of heart disease, while 0 represents its absence. Data transformation is also performed to convert the data into nominal form; this is required, as WARM works on nominal data only. The ranges formed for each feature are indicated in Table 4.
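The pre-processing steps above (dropping the 6 records with missing values and collapsing goal levels 1–4 into a single positive class) can be sketched in Python; the row layout below is a simplified stand-in for the actual Cleveland columns:

```python
# Minimal pre-processing sketch, assuming rows are read from
# processed.cleveland.data as lists of strings ('?' marks a missing value).

def preprocess(rows):
    clean = []
    for row in rows:
        # Drop any record with a missing value (6 such instances in Cleveland).
        if "?" in row:
            continue
        # Normalize the class label (last column): criticality levels 1-4
        # become 1 (disease present), 0 stays 0 (no heart disease).
        row = row[:-1] + ["1" if int(float(row[-1])) > 0 else "0"]
        clean.append(row)
    return clean

# Three toy records: healthy, level-2 disease, and one with a missing value.
sample = [
    ["63", "1", "4", "0"],   # goal = 0 -> stays 0
    ["41", "0", "2", "2"],   # goal = 2 -> normalized to 1
    ["55", "1", "?", "1"],   # missing value -> dropped
]
print(preprocess(sample))  # [['63', '1', '4', '0'], ['41', '0', '2', '1']]
```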

Table 4 Ranges formed for features

Feature Selection

Features were selected based on the experiments conducted by Amin et al. [8], who used the same (UCI) dataset. They performed a set of experiments covering 8100 combinations of features with 7 different classification models (K-NN, Decision Tree, Naïve Bayes, Logistic Regression, Neural Network and Vote) to identify significant features. Table 5 shows the features obtained from the highest-performing run of each classification model. The highlighted columns indicate the features that appeared more than 10 times and were thus selected as significant features. The selected 8 features are Sex, CP, Fbs, Exang, Oldpeak, Slope, CA and Thal.

Table 5 Selecting significant features from the result of the highest performance

Feature weight computation

This section explains how the weights of the features were calculated. The fundamental premise of WARM is that different features in a dataset have different importance in predicting heart disease. The weight of each feature ranges from 0 to 1: a weight closer to 1 indicates a more significant feature, while a weight closer to 0 indicates one of the least significant features in heart disease prediction.

Calculate feature weight

The first step was to calculate the individual feature weights. Let R = {n0, n1, n2, …, ni} be the set of feature scores, with n > 0. In this experiment, the total number of features is 13; after feature selection, it is reduced to 8 (Sex, CP, Fbs, Exang, Oldpeak, Slope, CA and Thal). W(n) is the weight of a feature, where n represents the feature's score:

$$W\left( n \right) = \frac{n}{{\sum\nolimits_{{n_{j} \in R}} {n_{j} } }} = \frac{n}{{n_{0} + n_{1} + \cdots + n_{i} }}$$
(1)

For example, the value of Sex as displayed in Table 5 is 20, and the sum of all the significant feature values (Sex, CP, Fbs, Exang, Oldpeak, Slope, CA and Thal) is 121, calculated as (20 + 18 + 12 + 12 + 14 + 12 + 19 + 14 = 121). Thus, the weight of 'Sex' (weight of feature, WOF) is:

$${\text{WOF}} = {\text{W}}\left( {20} \right) = 20/121 = 0.17$$

Table 6 displays the calculated weights for each of the significant features. All weights were computed accordingly. From the distribution of the weights, CA has the greatest strength, followed by Sex, CP, Oldpeak and Thal, while Fbs, Exang and Slope share a similar weight of 0.09 each.

Table 6 Weight of the significant features
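Eq. (1) applied to the Table 5 counts can be sketched as follows (the counts are the ones quoted in the worked example above):

```python
# Feature counts from Table 5: how often each significant feature appeared
# in the best-performing models of Amin et al. [8].
counts = {"Sex": 20, "CP": 18, "Fbs": 12, "Exang": 12,
          "Oldpeak": 14, "Slope": 12, "CA": 19, "Thal": 14}

def feature_weights(counts):
    # Eq. (1): W(n) = n / (n0 + n1 + ... + ni), summed over all features in R.
    total = sum(counts.values())  # 121 for the 8 significant features
    return {f: round(n / total, 2) for f, n in counts.items()}

weights = feature_weights(counts)
print(weights["Sex"])  # 0.17, matching the worked example for 'Sex'
```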

Calculate feature value weight

This section explains how feature value weights are computed. Feature values are the values a feature can take; for instance, the feature values for Sex are male and female. Let A be the number of records holding a given feature value and (A ∪ B) be the total number of records.

Table 7 shows the total count for each value of each feature in the UCI dataset. The male value is represented by 203 records and the female value by 94 records, giving a total of 297 records from the UCI dataset. To calculate the weight of each feature value, let A be the count of the selected value and B the count of the remaining values:

$${\text{W}}_{{({\text{value}} = {\text{A}})}} = \frac{A}{A \cup B}$$
(2)
$$\begin{aligned} & {\mathbf{Gender}}\;{\mathbf{male}}\;{\mathbf{value}}:\;{\text{W}}_{(203)} = 203/297 = 0.68 \\ & {\mathbf{Gender}}\;{\mathbf{female}}\;{\mathbf{value}}:\;{\text{W}}_{(94)} = 94/297 = 0.32 \\ \end{aligned}$$
Table 7 Total count of each feature value
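Eq. (2) for the gender example can be sketched as follows (the counts are the totals quoted above):

```python
def value_weight(count_a, count_b):
    # Eq. (2): W(value=A) = |A| / (|A| + |B|), where B covers the records
    # taking the other values of the same feature.
    return round(count_a / (count_a + count_b), 2)

# 203 male and 94 female records, 297 in total after cleaning.
male_w = value_weight(203, 94)
female_w = value_weight(94, 203)
print(male_w, female_w)  # 0.68 0.32
```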

Figure 2 shows the comparison of the percentage of males and females in the Cleveland heart disease dataset.

Fig. 2
figure 2

Comparison of the percentage of males and females in the Cleveland heart disease dataset

Calculate total weight for feature

This section explains how the total weight of a feature is computed. The feature weight (W(n)) and the feature value weight (W(value)) together give the total weight (W(t)) for the feature. The computation is shown below.

$${\text{W}}\left( {\text{t}} \right) = {\text{W}}\left( {\text{n}} \right)*{\text{W}}\left( {{\text{value}}} \right)$$
(3)

Example of calculating the total weight of feature W (t):

$$\begin{aligned} & {\mathbf{Male}}:\;{\text{W}}\;\left( {{\text{total}}\;{\text{Male}}} \right) = 0.14*0.68 = 0.0952 \\ & {\mathbf{Female}}:\;{\text{W}}\;\left( {{\text{total}}\;{\text{Female}}} \right) = 0.14*0.32 = 0.0448 \\ \end{aligned}$$
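Eq. (3) simply multiplies the two weights; a minimal sketch using the paper's own numbers for the Sex feature:

```python
def total_weight(feature_w, value_w):
    # Eq. (3): W(t) = W(n) * W(value).
    return feature_w * value_w

# The paper's example uses a feature weight of 0.14 for Sex together with
# the male/female value weights of 0.68 and 0.32.
male_total = total_weight(0.14, 0.68)     # 0.0952
female_total = total_weight(0.14, 0.32)   # 0.0448
```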

Algorithm

This section details the algorithm used to obtain the weighted score of each feature in predicting heart disease. The algorithm is stated as follows:

figure a
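Since the algorithm figure is not reproduced in the text, the following is a reconstruction, under our reading of Eqs. (1)–(3) rather than the paper's exact listing, of the overall weighting procedure: compute W(n) per feature, W(value) per feature value, and their product W(t).

```python
def feature_total_weights(feature_counts, value_counts):
    """Sketch of the weighting algorithm reconstructed from Eqs. (1)-(3):
    for every feature n and each of its values v,
    W_t(n, v) = W(n) * W(value=v)."""
    grand_total = sum(feature_counts.values())
    totals = {}
    for feat, n in feature_counts.items():
        w_n = n / grand_total                        # Eq. (1)
        n_records = sum(value_counts[feat].values())
        for val, c in value_counts[feat].items():
            w_v = c / n_records                      # Eq. (2)
            totals[(feat, val)] = w_n * w_v          # Eq. (3)
    return totals

# Toy example: a single feature with two values (3 male vs 1 female records).
demo = feature_total_weights({"Sex": 1}, {"Sex": {"male": 3, "female": 1}})
print(demo[("Sex", "male")])  # 0.75
```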

Apply WARM

Not all features in the heart disease dataset have the same level of significance in predicting the risk of heart disease; thus, different weights are assigned based on their predictive capability. These values are then imported into Weka 3.8 to run WARM using the Apriori algorithm.

Apriori algorithm

The Apriori algorithm is a well-known approach in WARM, first proposed by Agrawal and Srikant [2]. The algorithm starts from a dataset of transactions and constructs frequent item sets whose support is at least a user-specified threshold. In the algorithmic process of Apriori, an item set X of length k is frequent only if every subset of X of length k − 1 is also frequent. This property results in a substantial reduction of the search space and allows rule discovery in a computationally feasible time. For a frequent item set f and a subset s, Apriori generates a rule of the form s => (f − s) if and only if the confidence of the rule is above the user-defined threshold. Confidence is essentially the accuracy of the rule and is used in Apriori to rank the rules (Agrawal & Srikant [2]; Mutter et al. [51]).
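The frequent-itemset stage of Apriori can be sketched as follows; this is a minimal unweighted version for illustration, not the Weka implementation used in the experiments, and the transactions are toy data:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: returns frequent itemsets whose support
    (fraction of transactions containing them) is >= min_support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    # L1: frequent single items.
    freq = {frozenset([i]): sum(i in t for t in transactions) / n
            for i in items}
    freq = {s: sup for s, sup in freq.items() if sup >= min_support}
    result, current = dict(freq), freq
    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # using the Apriori property (every subset must be frequent).
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            sup = sum(c <= set(t) for t in transactions) / n
            if sup >= min_support:
                current[c] = sup
        result.update(current)
        k += 1
    return result

# Toy transactions over hypothetical feature-value items.
tx = [["cp", "thal"], ["cp", "slope"], ["cp", "thal", "slope"]]
freq = apriori(tx, min_support=0.5)
# {'cp'}, {'thal'}, {'slope'}, {'cp','thal'} and {'cp','slope'} survive;
# {'thal','slope'} falls below the 0.5 support threshold.
```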

Weighted confidence

The confidence level shows how often a rule holds true. Let Y be the 'goal'; then the weighted confidence of a rule X → Y is the ratio of the weighted support of \(\left( {X \cup Y} \right)\) to the weighted support of (X).

$${\text{Weighted}}\;{\text{Confidence }} = \left( {\frac{{Weighted \;Support \left( {X \cup Y} \right)}}{Weighted \;support \left( X \right)}} \right)$$
(4)

For instance, the rule {sex = Male, CA = 3} → {heart disease} has a confidence of 0.2/0.2 = 1.0. This means a male patient with 3 CA (major vessels coloured by fluoroscopy) has a 100% chance of having heart disease.
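Eq. (4) can be sketched with per-record weights. The paper does not spell out its exact weighted-support formulation, so the record-weight version below is one common WARM choice, and the toy records mirror the {sex = Male, CA = 3} example:

```python
def weighted_support(itemset, transactions, rec_weights):
    # Weighted support of an itemset: the weight mass of the records
    # containing it over the total weight mass (one common WARM
    # formulation; the paper leaves its exact scheme implicit).
    total = sum(rec_weights)
    hit = sum(w for t, w in zip(transactions, rec_weights) if itemset <= t)
    return hit / total

def weighted_confidence(x, y, transactions, rec_weights):
    # Eq. (4): WConf(X -> Y) = WSup(X union Y) / WSup(X).
    return (weighted_support(x | y, transactions, rec_weights)
            / weighted_support(x, transactions, rec_weights))

# Toy records: every record matching {sex=Male, CA=3} also has heart
# disease, so the rule's weighted confidence is 1.0, in the same spirit
# as the text's 0.2/0.2 example.
tx = [{"sex=Male", "CA=3", "HD"}, {"sex=Male", "CA=3", "HD"}, {"sex=Female"}]
w = [0.1, 0.1, 0.3]
conf = weighted_confidence(frozenset({"sex=Male", "CA=3"}),
                           frozenset({"HD"}), tx, w)
```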

Evaluation

This phase generates rules based on the Apriori algorithm in Weighted Associative Rule Mining. Two sets of rules and confidence scores were generated for the following:

(i) All features—this includes all 13 features.

(ii) Selected significant features (8 features).

The following section provides a detailed explanation of the results obtained, namely the rules and confidence scores.

Results (rules and confidence level generated)

The rules and confidence levels generated for all the (13) features and for the (8) selected significant features are shown in this section.

All features

Table 8 shows the top 20 rules and confidence scores obtained for all the features using WARM. The rules were sorted by the highest confidence scores.

Table 8 Rules generated from all the features using WARM

The highest confidence level achieved for predicting the risk of having heart disease is 96%, and the number of features used to generate this rule is 3 (CP, Slope and Thal). This can be clearly seen in Table 8 (Rule Number 7). The rule states that if the value of Chest Pain (CP) is asymptomatic, the Slope is flat and the value of Thallium (Thal) is reversible, then the patient has a very high tendency (confidence level = 96%) of being at risk of heart disease. All the highlighted rows in Table 8 show the rules that contribute to predicting the risk of having heart disease. Further, Table 9 summarizes the frequency of each feature used in the heart-disease-predicting rules of Table 8, showing the rule number and the features used in each of the top 20 rules. Of the top 20 rules, only 6 predict heart disease; the others are non-sick rules, which predict no heart disease.

Table 9 Summary of the frequency of each feature contained in the rules that predict heart disease (all features)

Although all 13 features were used for rule and confidence score generation, as shown in Table 8, only 9 features appear in the heart-disease-predicting rules among the top 20. The most significant feature in predicting heart disease is CP, which exists in all 6 rules that predict heart disease. Thal and Oldpeak exist in 4 of these 6 rules.

Selected significant features

This section focuses on the rules and confidence scores obtained with the selected significant features. Table 10 shows the top 20 rules generated from the significant features using WARM. The confidence score obtained in predicting the risk of having heart disease using the 8 selected significant features is a comparatively high 98%. The rule with the top confidence score states:

Table 10 Rules generated from 8 significant features using weighted associative rule mining

CP = asymptomatic, Exang = Yes, Oldpeak = greaterThanZero, Thal = reversible ==> class_HD = Heart Disease

which means that if Chest Pain (CP) is asymptomatic, exercise-induced angina (Exang) is present, Oldpeak (ST depression induced by exercise relative to rest) is present and the Thallium heart scan (Thal) is reversible, then the patient is diagnosed as having heart disease. From the top 20 rules generated, 11 predict heart disease, as highlighted in Table 10. Table 11 summarizes the frequency of each feature contained in the rules that predict heart disease. In total, 11 of the 20 rules generated using significant features predict the presence of heart disease. The most significant feature, present in all the positive rules predicting heart disease, is Chest Pain (CP). The Thallium heart scan (Thal) appears in 9 of the 11 rules and Oldpeak (ST depression induced by exercise relative to rest) in 7.

Table 11 Summary of the frequency of each feature contained in the rules that predict heart disease (8 selected features)

Discussions

The implementation of WARM on the selected significant features achieved the highest confidence score in predicting heart disease, 98%, compared to 96% obtained with all features. It can be concluded that WARM predicts the risk of having heart disease well. Of the top 20 rules generated from all features, only 6 predicted heart disease, whereas 11 of the top 20 rules generated from the selected 8 features did.

Studying the top 20 rules generated revealed some significant information. These findings were validated by a cardiologist:

  • Asymptomatic chest pain, positive exercise-induced angina, Oldpeak > 0 and reversible thallium heart scan implies the presence of heart disease.

    CP = asymptomatic, Exang = Yes, Oldpeak = greaterThanZero, Thal = reversible ==> class_HD = Heart Disease

  • Asymptomatic chest pain is one of the most important features as it appears in all the rules generated in detecting heart disease.

  • Reversible thallium heart scan and Oldpeak greater than zero are positively correlated with heart disease.

  • Males are more prone to heart disease than females, as all the sick rules stated the sex as male and the healthy rules stated it as female.

  • There is a strong negative correlation between CA and Thal for heart disease prediction.

  • The most common features in the healthy rules are Sex = Female, Exang (exercise-induced angina) = No and CA (number of major vessels coloured by fluoroscopy) = Zero. A patient is predicted as not having heart disease if the patient is female, angina is not induced by exercise and no major vessels are coloured by fluoroscopy.

  • Slope is not featured in any of the healthy rules.

  • This study determined the processes involved in obtaining significant features and devised a scoring mechanism to obtain the strength of each feature. This enables the correct weight to be imposed on each of the significant features used in WARM for predicting heart disease. The confidence score obtained in this study is the highest obtained in heart disease prediction using WARM on the UCI dataset. This study can serve as a guide for computing the strength scores of significant features in other heart disease datasets.

Comparative analysis with existing work

This section compares the proposed work with existing works using WARM. The results obtained in this research show that the weighted scores imposed on WARM for the 8 significant features yield the highest confidence score, 98%, compared with existing studies. Figure 3 shows the confidence scores of all the existing studies on WARM that used the UCI Cleveland heart disease dataset in comparison with the proposed work. Both of our experiments, using all features and using significant features, achieved a significant improvement in confidence score over previous studies. The use of the significant feature scores in WARM provides the highest confidence of 98% in predicting heart disease.

Fig. 3
figure 3

Result comparison on WARM using UCI Cleveland heart disease dataset

Table 12 presents a comparative analysis of WARM using significant features versus existing ARM results in heart disease prediction. The rules that gave the highest confidence scores were retrieved and compared in this table. Research by Said et al. [41] and Khare and Gupta [24] showed lower confidence scores than this research. Although Sonet et al. [45] obtained a confidence score of 99%, the rule generated for this score is questionable: it states that if a patient has diabetes, then the patient will have heart disease. Although the risk of heart disease is proven to be higher in diabetic patients, this rule cannot be generalized to all diabetic patients; it likely results from bias in their dataset. The dataset used in their study was collected from 4 different medical institutions, contains only 131 records in total, and is not openly available. Moreover, it contains different features from the dataset used in this study.

Table 12 Comparative Analysis of Weighted Associative analysis and Associative Rule Mining in predicting heart disease

This study also benchmarked the rules generated from the UCI dataset by past research against the rules generated in our study. The extracted healthy rules are shown in Table 13 and the sick rules in Table 14. Table 13 shows that our experiment with 8 significant features obtained the optimum confidence score of 100% for healthy rules. The rule retrieved for this states that if the sex is female, the chest pain is non-anginal and the thallium heart scan is normal, then the person is predicted not to have heart disease.

Table 13 Healthy rules extractions
Table 14 Sick rules extractions

Table 14 shows the sick rules, together with the highest confidence scores of this research in comparison with other research on associative and weighted associative rule mining for heart disease prediction. This study achieved a confidence score of 98%, better than all the other predicted sick rules. To the best of our knowledge, the significant features' weighted scores in our study beat the scores obtained by all other research using ARM and WARM to predict heart disease.

Conclusion

This research contributed the highest confidence score obtained using significant features in WARM for heart disease prediction. Assigning appropriate weight scores has been shown to improve the confidence level of the prediction. A set of significant features with different weights representing the strength of each feature was used in heart disease prediction. To the best of our knowledge, this is the first study that made use of significant features in executing WARM. This research also contributed a list of the top rules in predicting heart disease based on the UCI dataset, and it is the first to benchmark the healthy and sick rules with the highest confidence scores. Future research may look into predicting the risk levels of heart disease, as this will help medical practitioners and patients gauge disease severity. The algorithm used in this study for measuring weight can be further explored on other datasets to cater to other prediction models using the weighted approach. The machine learning techniques used in the feature selection phase of this research were limited to the most popular techniques in heart disease prediction research; future researchers should explore other machine learning techniques for selecting the significant features.