An assessment of random forest technique using simulation study: illustration with infant mortality in Bangladesh

Abstract

We assessed several machine learning techniques for predicting infant mortality (<1 year) in Bangladesh. The decision tree (DT), random forest (RF), support vector machine (SVM) and logistic regression (LR) approaches were evaluated through simulations using accuracy, sensitivity, specificity, precision, F1-score, the receiver operating characteristic (ROC) curve and k-fold cross-validation. The Boruta algorithm and the chi-square (\(\chi ^2\)) test were used to select features for infant mortality. Overall, the RF technique (Boruta: accuracy = 0.8890, sensitivity = 0.0480, specificity = 0.9789, precision = 0.1960, F1-score = 0.0771, AUC = 0.6590; \(\chi ^2\): accuracy = 0.8856, sensitivity = 0.0536, specificity = 0.9745, precision = 0.1837, F1-score = 0.0828, AUC = 0.6480) showed higher predictive performance for infant mortality than the other approaches. Age at first marriage and birth, body mass index (BMI), birth interval, place of residence, religion, administrative division, parents' education, mother's occupation, media exposure, wealth index, gender of child, birth order, children ever born, toilet facility and cooking fuel were identified as potential determinants of infant mortality in Bangladesh. The findings may help women, stakeholders and policy-makers take steps to reduce infant mortality by raising awareness, expanding community-level educational programs and implementing public health interventions.
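The evaluation measures named in the abstract all derive from a binary confusion matrix, and the F1-score is the harmonic mean of precision and sensitivity. The following is a minimal Python sketch, not the authors' code (the reference list cites R for the analysis). The class name, function name and example counts are hypothetical, and treating infant death as the positive class is an assumption. It recomputes the F1-scores from the precision and sensitivity values reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts; positive class = infant death (assumed)."""
    tp: int  # deaths predicted as deaths
    fp: int  # survivals predicted as deaths
    tn: int  # survivals predicted as survivals
    fn: int  # deaths predicted as survivals

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)


def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts (not from the paper) for an imbalanced outcome:
# 100 deaths among 1000 infants, most of them missed by the classifier.
example = ConfusionMatrix(tp=5, fp=20, tn=880, fn=95)
print(example.accuracy(), example.sensitivity(), example.specificity())

# F1 recomputed from the RF precision and sensitivity reported in the abstract.
rf_boruta_f1 = f1_score(precision=0.1960, sensitivity=0.0480)
rf_chi2_f1 = f1_score(precision=0.1837, sensitivity=0.0536)
print(round(rf_boruta_f1, 4), round(rf_chi2_f1, 4))
```

The Boruta figure reproduces the reported F1-score of 0.0771. The chi-square figure comes out near 0.0830 against the reported 0.0828, a gap that may reflect rounding of the inputs or averaging across simulation runs. The hypothetical counts show how accuracy can stay high while sensitivity stays low when the positive class is rare, which is the pattern in the reported RF results.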

Data availability

The secondary data are available at https://dhsprogram.com/data/available-datasets.cfm.

Acknowledgements

The authors acknowledge the Bangladesh Demographic and Health Survey (BDHS) authority for making the secondary data freely available.

Funding

There is no funding for this work.

Author information

Corresponding author

Correspondence to Atikur Rahman.

Ethics declarations

Conflict of interest

No conflict of interest exists among the authors.

Ethical approval

The National Research Ethics Committee of the Bangladesh Medical Research Council and the ICF Macro Institutional Review Board approved the BDHS 2017–2018. Interviews were conducted with the prior written consent of survey participants.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This paper is an extended version of “Machine Learning Algorithm for Analysing Infant Mortality in Bangladesh” by Rahman et al. which appeared in the Proceedings of the 10th International Conference on Health Information Science (HIS 2021), Melbourne, Australia, October 2021.

About this article

Cite this article

Rahman, A., Hossain, Z., Kabir, E. et al. An assessment of random forest technique using simulation study: illustration with infant mortality in Bangladesh. Health Inf Sci Syst 10, 12 (2022). https://doi.org/10.1007/s13755-022-00180-0

  • DOI: https://doi.org/10.1007/s13755-022-00180-0

Keywords

  • Machine learning
  • Random forest
  • Boruta algorithm
  • Chi-square
  • Infant mortality