Archives of Gynecology and Obstetrics, Volume 300, Issue 6, pp 1565–1582

Analysis of big data for prediction of provider-initiated preterm birth and spontaneous premature deliveries and ranking the predictive features

  • Toktam Khatibi
  • Naghme Kheyrikoochaksarayee
  • Mohammad Mehdi Sepehri
Maternal-Fetal Medicine



The high worldwide rate of preterm birth (birth before 37 weeks of gestation) and its negative outcomes for pregnant women and newborns make it necessary to predict preterm birth and identify its main risk factors. Previous studies have divided premature deliveries into provider-initiated preterm birth (medical intervention to terminate the pregnancy early) and spontaneous preterm birth (no intervention). The main aim of this study is to propose methods for predicting provider-initiated preterm birth and spontaneous premature delivery and for ranking the predictive features.


Data from the national databank of maternal and neonatal records (IMAN registry) are used in this study. The collected data cover more than 1,400,000 deliveries with 112 features. Among them, 116,080 are preterm births (11,799 provider-initiated preterm births and 104,281 spontaneous premature deliveries). The data can be considered big data because of the large number of records, the large number of features, and the unbalanced distribution of records across the three classes of term, provider-initiated preterm, and spontaneous preterm birth. The data therefore need to be analyzed with big data algorithms. In this paper, MapReduce-based machine learning algorithms named MR-PB-PFS are proposed for this purpose. The Map phase uses parallel feature selection and classification methods to score the features. The Reduce phase aggregates the feature scores obtained in the Map phase and assigns final scores to the features. Moreover, the classifiers trained in the Map phase are aggregated in the Reduce phase using two different ensemble rules.
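The Map and Reduce roles described above can be illustrated with a minimal, single-process sketch. It is not the MR-PB-PFS implementation: the feature score (absolute difference of class-conditional means), the one-feature threshold classifier, the averaging of scores, and the majority-vote ensemble rule are all assumptions chosen for brevity, and the toy records are invented. In a real MapReduce deployment each partition would be processed by a separate worker.

```python
from statistics import mean

def map_partition(rows, labels):
    """Map step (illustrative): score each feature on one partition and fit a tiny model.

    Assumed score: |mean of feature in class 1 - mean of feature in class 0|.
    Assumed model: a threshold rule on the best-scoring feature.
    Each partition must contain both classes, otherwise mean() raises an error.
    """
    n_features = len(rows[0])
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = [abs(mean(r[j] for r in pos) - mean(r[j] for r in neg))
              for j in range(n_features)]
    best = max(range(n_features), key=scores.__getitem__)
    mu_pos = mean(r[best] for r in pos)
    mu_neg = mean(r[best] for r in neg)
    threshold = (mu_pos + mu_neg) / 2
    direction = 1 if mu_pos >= mu_neg else -1
    return scores, (best, threshold, direction)

def reduce_scores(per_partition_scores):
    """Reduce step (illustrative): average each feature's score over all partitions."""
    return [mean(col) for col in zip(*per_partition_scores)]

def predict_one(model, row):
    """Apply one partition's threshold rule to a single record."""
    best, threshold, direction = model
    return 1 if direction * (row[best] - threshold) > 0 else 0

def ensemble_predict(models, row):
    """Reduce step for classifiers (assumed rule): strict majority vote."""
    votes = [predict_one(m, row) for m in models]
    return int(sum(votes) * 2 > len(votes))

# Invented toy data: two partitions, three binary features, binary label.
partitions = [
    ([[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]], [1, 1, 0, 0]),
    ([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0]], [1, 1, 0, 0]),
]

# The Map phase runs sequentially here; on a cluster it would run in parallel.
mapped = [map_partition(rows, labels) for rows, labels in partitions]
final_scores = reduce_scores([scores for scores, _ in mapped])
models = [model for _, model in mapped]
ranking = sorted(range(len(final_scores)), key=lambda j: -final_scores[j])

print("final feature scores:", final_scores)
print("feature ranking (best first):", ranking)
print("ensemble prediction for [1, 0, 0]:", ensemble_predict(models, [1, 0, 0]))
```

Averaging is only one way to combine per-partition scores; rank-based or weighted aggregation would fit the same Map and Reduce structure.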


Experimental results show that the best performance of the proposed models for preterm birth prediction is an accuracy of 81% and an area under the receiver operating characteristic curve (AUC) of 68%. The top features identified in this study for predicting term, provider-initiated preterm, and spontaneous premature birth are the presence of pregnancy risk factors, gestational diabetes, cardiovascular disease, maternal underlying diseases, and maternal age. Chronic blood pressure is a high-ranking feature for preterm birth prediction, and father's nationality is highly important for discriminating provider-initiated from spontaneous premature delivery.
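For readers less familiar with the two reported metrics, the following sketch shows how accuracy and AUC are commonly computed for a binary term versus preterm task. The labels, scores, and the 0.5 decision threshold are invented for illustration and are unrelated to the study's data. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented, imbalanced toy example: 8 negative (term) and 2 positive (preterm) records.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.2, 0.6, 0.1, 0.3, 0.5, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print("accuracy:", acc)
print("AUC:", auc)
```

Because accuracy depends on the chosen threshold and on class proportions while AUC depends only on how the scores rank the cases, the two numbers can differ noticeably on unbalanced data.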


Identifying pregnant women at high risk of spontaneous premature or therapeutic preterm delivery with the proposed model can help to (1) reduce the probability of premature birth through monitoring and management of the main risk factors and/or (2) educate these women to care for a premature newborn. Monitoring and managing the top features that discriminate term, provider-initiated preterm, and spontaneous premature birth, or their associated factors, can reduce preterm labor or its negative outcomes.


Preterm birth prediction, Big data, Map-reduce, Feature selection, Ensemble classifier



We thank Dr. Mohammad Heidarzadeh and the Neonatal Health Office of the Health and Medical Education Ministry of Iran for granting us access to the IMAN registry dataset and for allowing us to conduct research on it and publish the results.

Author contributions

Conceptualization: TK, NK and MMS. Data curation: TK, NK and MMS. Formal analysis: TK, NK and MMS. Funding acquisition: there is no funding. Investigation: TK and NK. Methodology: TK, NK and MMS. Project administration: TK. Software: TK and NK. Supervision: TK and MMS. Validation: TK, NK and MMS. Visualization: TK and NK. Writing (original draft): TK, NK and MMS. Writing (review and editing): TK and MMS.

Compliance with ethical standards

Conflict of interest

The authors declare that there are no conflicts of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Healthcare Systems Engineering, Faculty of Industrial and Systems Engineering, Tarbiat Modares University (TMU), Tehran, Iran
  2. Healthcare Systems Engineering, Hospital Management Research Center (HMRC), Iran University of Medical Sciences (IUMS), Tehran, Iran
  3. Industrial and Systems Engineering, Tarbiat Modares University (TMU), Tehran, Iran
