
Auto-MeDiSine: an auto-tunable medical decision support engine using an automated class outlier detection method and AutoMLP

  • Original Article
Neural Computing and Applications

Abstract

Advances in data analysis techniques have driven growing efforts to build more accurate decision support systems for disease prediction. According to the World Health Organization, diabetes-related illness and mortality are increasing, which makes early diagnosis particularly important. In this paper, we present Auto-MeDiSine, a framework that combines an automated version of enhanced class outlier detection using a distance-based algorithm (AutoECODB) with an ensemble of automatic multilayer perceptrons (AutoMLP). AutoECODB builds on ECODB by automating parameter tuning to optimize the outlier detection process. AutoECODB cleanses the dataset by removing outliers, and the preprocessed dataset is then used to train a prediction model based on an ensemble of AutoMLPs. We perform three sets of experiments on the publicly available Pima Indian Diabetes Dataset: (1) Auto-MeDiSine is compared with state-of-the-art methods reported in the literature, achieving an accuracy of 88.7%; (2) AutoMLP is compared with other learners, including individual learners (with a focus on neural network-based ones) and ensemble learners; and (3) AutoECODB is compared with other preprocessing methods. Furthermore, to validate the generality of the framework, Auto-MeDiSine is tested on another publicly available dataset, the BioStat Diabetes Dataset, where it outperforms previously reported results with an accuracy of 97.1%.
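The abstract describes the AutoECODB cleansing step only at a high level. To illustrate the general idea (score each instance by how well its label agrees with its nearest neighbours, drop the lowest-scoring instances, and choose the parameters automatically), here is a minimal, standard-library-only Python sketch. It is an assumption-laden illustration, not the authors' AutoECODB: the score below is a simplified, label-agreement-only stand-in for ECODB's class outlier factor, and the function names `class_outlier_scores`, `remove_class_outliers`, `auto_tune` and the leave-one-out 1-NN evaluator `loo_1nn_accuracy` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]


def knn_indices(X: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i itself.
    Ties are broken by index so results are deterministic."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]


def class_outlier_scores(X: Sequence[Point], y: Sequence[int], k: int) -> List[float]:
    """Share of each instance's k nearest neighbours that carry the same label.
    Hypothetical simplification of a class outlier factor: low score = likely class outlier."""
    scores = []
    for i in range(len(X)):
        nbrs = knn_indices(X, i, k)
        same = sum(1 for j in nbrs if y[j] == y[i])
        scores.append(same / len(nbrs) if nbrs else 1.0)
    return scores


def remove_class_outliers(
    X: Sequence[Point], y: Sequence[int], k: int, n: int
) -> Tuple[List[Point], List[int], List[int]]:
    """Drop the n lowest-scoring instances; return cleaned X, cleaned y and dropped indices."""
    scores = class_outlier_scores(X, y, k)
    order = sorted(range(len(X)), key=lambda i: (scores[i], i))
    dropped = sorted(order[:n])
    drop_set = set(dropped)
    keep = [i for i in range(len(X)) if i not in drop_set]
    return [X[i] for i in keep], [y[i] for i in keep], dropped


def loo_1nn_accuracy(X: Sequence[Point], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (hypothetical evaluator)."""
    if len(X) < 2:
        return 0.0
    correct = sum(y[knn_indices(X, i, 1)[0]] == y[i] for i in range(len(X)))
    return correct / len(X)


def auto_tune(
    X: Sequence[Point],
    y: Sequence[int],
    ks: Sequence[int],
    ns: Sequence[int],
    evaluate: Callable[[Sequence[Point], Sequence[int]], float],
) -> Optional[Tuple[int, int, float]]:
    """Grid-search (k, n) and keep the pair whose cleaned data scores best under `evaluate`.
    Caution: scoring the cleaned data itself is optimistic and favours removing more points."""
    best: Optional[Tuple[int, int, float]] = None
    for k in ks:
        for n in ns:
            Xc, yc, _ = remove_class_outliers(X, y, k, n)
            score = evaluate(Xc, yc)
            if best is None or score > best[2]:
                best = (k, n, score)
    return best


if __name__ == "__main__":
    # Two well-separated clusters; the last point sits in the class-0 cluster but is labelled 1.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (0.5, 0.5)]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(remove_class_outliers(X, y, k=3, n=1)[2])  # -> [8]
    print(auto_tune(X, y, ks=[3], ns=[0, 1], evaluate=loo_1nn_accuracy))  # -> (3, 1, 1.0)
```

In the demo, the planted mislabelled point is the only instance whose three nearest neighbours all disagree with its label, so it is the one removed. Because the evaluator here scores the cleaned data itself, the grid search is biased toward removing more instances; a held-out validation split or a cap on `n` would be a more defensible choice in practice. In the framework described above, the cleaned data would then be used to train the classifier (an ensemble of AutoMLPs).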




Notes

  1. http://www.who.int/diabetes/en/.

  2. http://archive.ics.uci.edu/ml/.

  3. http://archive.ics.uci.edu/ml/.


Author information

Corresponding author

Correspondence to Haider Abbas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.



About this article


Cite this article

Jahangir, M., Afzal, H., Ahmed, M. et al. Auto-MeDiSine: an auto-tunable medical decision support engine using an automated class outlier detection method and AutoMLP. Neural Comput & Applic 32, 2621–2633 (2020). https://doi.org/10.1007/s00521-019-04137-5



  • DOI: https://doi.org/10.1007/s00521-019-04137-5
