Analysis of big data for prediction of provider-initiated preterm birth and spontaneous premature deliveries and ranking the predictive features
The high worldwide rate of preterm birth (birth before 37 weeks of gestation) and its negative outcomes for pregnant women and newborns make it necessary to predict preterm birth and to identify its main risk factors. Previous studies have divided premature deliveries into two categories: provider-initiated preterm birth (with medical intervention to terminate the pregnancy early) and spontaneous preterm birth (without any intervention). The main aim of this study is to propose methods for predicting provider-initiated preterm birth and spontaneous premature delivery, and for ranking the predictive features.
Data from the national databank of maternal and neonatal records (IMAN registry) are used in this study. The collected data contain information on more than 1,400,000 deliveries described by 112 features. Among them, 116,080 preterm births occurred, of which 11,799 were provider-initiated preterm births and 104,281 were spontaneous premature deliveries. The data can be considered big data because of the large number of records, the large number of features, and the unbalanced distribution of the data among the three classes of term, provider-initiated preterm, and spontaneous preterm birth. Therefore, the data need to be analyzed with big data algorithms. In this paper, MapReduce-based machine learning algorithms, named MR-PB-PFS, are proposed for this purpose. The Map phase uses parallel feature selection and classification methods to score the features. The Reduce phase aggregates the feature scores obtained in the Map phase and assigns final scores to the features. Moreover, the classifiers trained in the Map phase are aggregated in the Reduce phase according to two different ensemble rules.
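The Map/Reduce split described above can be illustrated with a minimal, single-process sketch. It assumes binary features and a binary label, and uses a hypothetical univariate score (`rate_difference`, the absolute difference in the positive-class rate between the two feature values) as a stand-in for the feature selection and classification methods of MR-PB-PFS, which the abstract does not specify. Averaging scores across partitions in the Reduce step, and the two combination rules shown (plurality voting over predicted labels and averaging of predicted class probabilities), are also assumptions; the abstract states only that two different ensemble rules are used.

```python
from collections import Counter, defaultdict
from statistics import mean


def rate_difference(x, y):
    """Hypothetical feature score: |P(y=1 | x=1) - P(y=1 | x=0)| for binary x and y."""
    with_feature = [yi for xi, yi in zip(x, y) if xi == 1]
    without_feature = [yi for xi, yi in zip(x, y) if xi == 0]
    if not with_feature or not without_feature:
        return 0.0  # feature is constant in this partition, so it carries no signal here
    return abs(sum(with_feature) / len(with_feature)
               - sum(without_feature) / len(without_feature))


def map_phase(partition, score_fn):
    """Map: score every feature on one data partition and emit (feature_index, score) pairs."""
    X, y = partition
    n_features = len(X[0])
    return [(j, score_fn([row[j] for row in X], y)) for j in range(n_features)]


def reduce_phase(mapped_outputs):
    """Reduce: group scores by feature, average them (assumed rule), and rank features."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for feature, score in pairs:
            grouped[feature].append(score)
    final_scores = {feature: mean(scores) for feature, scores in grouped.items()}
    return sorted(final_scores.items(), key=lambda item: item[1], reverse=True)


def majority_vote(labels):
    """Ensemble rule A (assumed): plurality vote over labels predicted by per-partition classifiers."""
    return Counter(labels).most_common(1)[0][0]


def mean_probability(prob_vectors):
    """Ensemble rule B (assumed): average per-partition class probabilities, return argmax index."""
    n_classes = len(prob_vectors[0])
    averaged = [mean(p[c] for p in prob_vectors) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: averaged[c])


# Toy example: two partitions (as if handled by two mappers), two binary features.
partitions = [
    ([[1, 0], [1, 1], [0, 0], [0, 1]], [1, 1, 0, 0]),
    ([[1, 1], [0, 0], [1, 0], [0, 1]], [1, 0, 1, 0]),
]
mapped = [map_phase(p, rate_difference) for p in partitions]  # independent, so parallelizable
print(reduce_phase(mapped))  # [(0, 1.0), (1, 0.0)]
```

Because each mapper only sees its own partition, the Map calls are independent and could be distributed; only the small (feature, score) lists and the trained per-partition models need to reach the Reduce step.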
Experimental results show that the best performance of the proposed models for preterm birth prediction is an accuracy of 81% and an area under the receiver operating characteristic curve (AUC) of 68%. The top features identified in this study for predicting term, provider-initiated preterm, and spontaneous premature birth are having pregnancy risk factors, having gestational diabetes, having cardiovascular disease, maternal underlying diseases, and maternal age. Chronic blood pressure is a highly ranked feature for preterm birth prediction, and father's nationality is highly important for discriminating provider-initiated from spontaneous premature delivery.
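The two reported metrics can be computed as in the minimal sketch below for a binary split (for example, preterm versus term). The abstract does not state how accuracy and AUC were computed for the three-class problem (for example, one-vs-rest averaging), so the binary setting, the 0.5 threshold, and the function names are assumptions. AUC is computed with the rank-based (Mann-Whitney) formulation, which needs one sort rather than a comparison of every positive/negative pair, a practical advantage for a registry with more than a million records.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Binary ROC AUC via the Mann-Whitney U statistic; tied scores get their average rank."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]            # predicted probability of the positive class
y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 decision threshold
print(accuracy(y_true, y_pred), roc_auc(y_true, scores))  # 0.75 0.75
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, so the two figures can differ noticeably on unbalanced data.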
Identifying pregnant women at high risk of spontaneous premature or therapeutic preterm delivery with the proposed model can help to (1) reduce the probability of premature birth through monitoring and management of the main risk factors and/or (2) educate these women to care for the premature newborn. Managing and monitoring the top features that discriminate term, provider-initiated preterm, and spontaneous premature birth, or their associated factors, can reduce preterm labor or its negative outcomes.
Keywords: Preterm birth prediction; Big data; Map-reduce; Feature selection; Ensemble classifier
We thank Dr. Mohammad Heidarzadeh and the Neonatal Health Office of the Ministry of Health and Medical Education of Iran for granting us access to the IMAN registry dataset and for allowing us to conduct research on this dataset and publish the results.
Conceptualization: TK, NK and MMS. Data curation: TK, NK and MMS. Formal analysis: TK, NK and MMS. Funding acquisition: there is no funding. Investigation: TK and NK. Methodology: TK, NK and MMS. Project administration: TK. Software: TK and NK. Supervision: TK and MMS. Validation: TK, NK and MMS. Visualization: TK and NK. Writing (original draft): TK, NK and MMS. Writing (review and editing): TK and MMS.
Compliance with ethical standards
Conflict of interest
The authors declare that there are no conflicts of interest.