Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Background

In the broader healthcare domain, prediction carries more value than explanation, given the cost of delays in healthcare services. The literature offers various risk prediction models for cardiovascular diseases (CVDs) to support early risk assessment. However, the substantial increase in CVD-related mortality is challenging global health systems, especially in developing countries. This situation gives researchers an opportunity to improve CVD prediction models with new features and risk computation methods. This study assesses how well nonclinical features, which are readily available in any healthcare system, predict CVDs when used with advanced and flexible machine learning (ML) algorithms.

Methods

A gender-matched case–control study was conducted at the largest public sector cardiac hospital in Pakistan, and data were collected from 460 subjects. The dataset comprised eight nonclinical features. Four supervised ML algorithms were trained and tested to predict CVD status, with traditional logistic regression (LR) as the baseline model. The models were validated using a train–test split (70:30) and tenfold cross-validation.
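The abstract does not say how the 70:30 split or the ten folds were drawn. The following minimal Python sketch shows one common way to assign each of the 460 subjects to training and test partitions. It assumes stratification by CVD status and a fixed random seed; both are assumptions, and the function names are hypothetical, not taken from the study:

```python
import random

def stratified_train_test_split(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx), keeping each class's share in both parts.

    Stratifying by outcome (case vs control) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])   # held-out subjects of this class
        train_idx.extend(idx[n_test:])  # remaining subjects go to training
    return sorted(train_idx), sorted(test_idx)

def stratified_k_folds(labels, k=10, seed=42):
    """Yield (train_idx, test_idx) for k folds; every subject is tested exactly once."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets a similar case:control mix.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        yield train_idx, test_idx
```

With a hypothetical 1:1 labelling of the 460 subjects (230 cases, 230 controls), the split yields 322 training and 138 test subjects, and each of the ten folds holds out 46 subjects. Stratifying keeps the case:control ratio stable across partitions, which matters for a sample of this size.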

Results

Random forest (RF), a nonlinear ML algorithm, outperformed the other ML algorithms and LR. The area under the curve (AUC) of RF was 0.851 with the train–test split and 0.853 with tenfold cross-validation. The nonclinical features yielded acceptable accuracy (at least 71%) across the LR and ML models, demonstrating their predictive capability for risk estimation.
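The AUC reported for RF can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it that way (the Mann–Whitney formulation, with ties counted as one half) and also computes accuracy at a 0.5 cutoff. The function names and the cutoff are illustrative assumptions, not the authors' settings:

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random case > score of a random control); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above control
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of subjects whose thresholded score matches the true label.

    The 0.5 threshold is an assumption of this sketch.
    """
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)
```

For example, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four case–control pairs are ranked correctly, so `roc_auc` returns 0.75, while `accuracy` at 0.5 gives 0.5. The abstract does not state whether the cross-validated AUC of 0.853 was pooled over all held-out predictions or averaged across folds. Either quantity can be computed with this function.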

Conclusion

The satisfactory performance of the nonclinical features indicates that these features, together with flexible computational methods, can strengthen existing risk prediction models and support better healthcare services.

Fig. 1
Fig. 2

Author information

Corresponding author

Correspondence to Noryanti Muhammad.

Ethics declarations

Conflict of interest

None of the authors has a conflict of interest to declare.

About this article

Cite this article

Sajid, M.R., Muhammad, N., Zakaria, R. et al. Nonclinical Features in Predictive Modeling of Cardiovascular Diseases: A Machine Learning Approach. Interdiscip Sci Comput Life Sci 13, 201–211 (2021). https://doi.org/10.1007/s12539-021-00423-w

  • DOI: https://doi.org/10.1007/s12539-021-00423-w