
On the goodness of fit of parametric and non-parametric data mining techniques: the case of malaria incidence thresholds in Uganda

  • Original Paper
  • Published:
Health and Technology

Abstract

This study sought to identify which class of data mining technique, parametric or non-parametric, best fits predictions on an imbalanced malaria incidence dataset. The researchers compared two parametric techniques, naïve Bayes and logistic regression, against two non-parametric techniques, support vector machines and artificial neural networks. The goodness of fit and predictive performance of each model were assessed using 10-fold and 5-fold cross-validation, with evaluation on an independent validation dataset, to determine which model best fits the predictions on the imbalanced malaria incidence dataset. The 10-fold cross-validation outperformed the 5-fold cross-validation on all performance metrics. With 10-fold cross-validation, the naïve Bayes classifier attained an accuracy of 69%, a sensitivity of 90.9%, a specificity of 55.6%, a precision of 55.6% and an F-measure of 69.0%; logistic regression achieved an accuracy of 65.5%, a sensitivity of 83.3%, a specificity of 52.9%, a precision of 55.6% and an F-measure of 66.7%; support vector machines achieved an accuracy of 82.8%, a sensitivity of 88.2%, a specificity of 75.0%, a precision of 83.3% and an F-measure of 85.7%; and artificial neural networks achieved an accuracy of 89.7%, a sensitivity of 94.1%, a specificity of 83.3%, a precision of 88.9% and an F-measure of 91.4%. The non-parametric techniques, artificial neural networks and support vector machines, outperformed the parametric techniques, including the stronger of the two, naïve Bayes (F-measure 69.0%), in making predictions from the imbalanced malaria incidence dataset, registering higher F-measure values of 91.4% and 85.7%, respectively.
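The five metrics quoted above are standard confusion-matrix quantities. The F-measure is the harmonic mean of precision and sensitivity (recall); because it does not use true negatives, it is less inflated than accuracy by a large majority class, which is one reason it is commonly preferred for imbalanced data. The Python sketch below is an illustration only, not the authors' code: the `ConfusionCounts`, `metrics`, `f_measure` and `REPORTED` names are hypothetical. It defines the metrics from true/false positive and negative counts and recomputes each model's F-measure from the precision and sensitivity reported in the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type).

    Which class counts as 'positive' is the analyst's choice; here it is
    assumed to be the class of interest (e.g. incidence above a threshold).
    """
    tp: int  # positives correctly predicted as positive
    fn: int  # positives wrongly predicted as negative
    tn: int  # negatives correctly predicted as negative
    fp: int  # negatives wrongly predicted as positive


def f_measure(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall (same units in, same out)."""
    return 2 * precision * recall / (precision + recall)


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity, precision and F-measure as fractions."""
    total = c.tp + c.fn + c.tn + c.fp
    sensitivity = c.tp / (c.tp + c.fn)  # recall / true-positive rate
    specificity = c.tn / (c.tn + c.fp)  # true-negative rate
    precision = c.tp / (c.tp + c.fp)    # positive predictive value
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure(precision, sensitivity),
    }


# Values taken from the abstract, in percent:
# model -> (precision, sensitivity, reported F-measure)
REPORTED = {
    "naive Bayes": (55.6, 90.9, 69.0),
    "logistic regression": (55.6, 83.3, 66.7),
    "SVM": (83.3, 88.2, 85.7),
    "ANN": (88.9, 94.1, 91.4),
}

if __name__ == "__main__":
    # Recompute each F-measure from the reported precision and sensitivity.
    for model, (p, r, f_reported) in REPORTED.items():
        print(f"{model}: recomputed F = {f_measure(p, r):.1f}%, reported {f_reported}%")
```

Running the script prints recomputed F-measures that agree with the reported values to one decimal place for all four models. This is a useful internal-consistency check when reading results tables.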


Fig. 1


Availability of data and material

The data were sourced from the Ministry of Health (www.health.go.ug), the Uganda Bureau of Statistics (www.ubos.org) and the Uganda National Meteorological Authority (www.unma.go.ug), and have been made available as supplementary material.

Code availability

The program scripts and code are available from the first author upon request.

Notes

  1. www.unma.go.ug

  2. www.ubos.org


Acknowledgements

The authors extend their appreciation to Mr. Douglas Candia and Mr. Frank Namugera, who contributed to improving this research. This research was partly funded by Makerere University through the Staff Development, Welfare and Retirement Benefits Committee (SDWRBC).

Funding

This study was partly funded by Makerere University through the Staff Development, Welfare and Retirement Benefits Committee (SDWRBC).

Author information


Contributions

FFB was involved in drafting the proposal, data collection, data preprocessing, data analysis, model design and writing the manuscript. JN, PN and RW supervised the work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Francis Fuller Bbosa.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Bbosa, F.F., Nabukenya, J., Nabende, P. et al. On the goodness of fit of parametric and non-parametric data mining techniques: the case of malaria incidence thresholds in Uganda. Health Technol. 11, 929–940 (2021). https://doi.org/10.1007/s12553-021-00551-9


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s12553-021-00551-9

Keywords
