Abstract
To predict preterm birth (PTB) in multiparous women, comparing machine learning approaches with traditional logistic regression. A population-based cohort study was conducted using data from the Ontario Better Outcomes Registry and Network (BORN). The cohort included all multiparous women who delivered a singleton birth at 20–42 weeks’ gestation in an Ontario hospital between April 1, 2012 and March 31, 2014. The primary outcome was PTB < 37 weeks, with spontaneous PTB analyzed as a secondary outcome. Stepwise logistic regression and the Boruta machine learning were used to select the important variables during the first and second trimester. For building prediction models, the whole data set were divided for the two independent parts: two-third for training the classifiers (Logistic regression, random forests, decision trees, and artificial neural networks) and one-third for model validation. Then, the training data set were balanced by random over sampling technique. The best hyper parameters were obtained by the tenfold cross validation. The performance of all models was evaluated by sensitivity, specificity, positive predictive value, negative predictive value, and the area under the receiver operating characteristics (AUC). The cohort included 145,846 births, of which 8125 (5.57%) were preterm. In first-trimester models, the strongest predictors of PTB were previous PTB, preexisting diabetes, and abnormal pregnancy‐associated plasma protein-A. In the testing data set, the highest predictive ability was seen for artificial neural networks, with an area under the receiver operating characteristic curve (AUC) of 68.8% (95% CI 67.6–70.1%). In second-trimester models, addition of infant sex, attendance at first-trimester appointment, medication exposure, and abnormal alpha-fetoprotein concentrations increased the AUC to 72.1% (95% CI 71.1–73.1%) with logistic regression. With the inclusion of the variable complications during pregnancy, the AUC increased to 80.5% (95% CI 79.6–81.5%) using logistic regression. For both overall and spontaneous PTB, during both the first and second trimesters, models yielded negative predictive values of 97%. Overall, machine learning and logistic regression produced similar performance for prediction of PTB. For overall and spontaneous PTB, both first- and second-trimester models provided negative predictive values of ~ 97%, higher than that of fetal fibronectin.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Introduction
An estimated 15 million infants are born preterm (less than 37 weeks of completed gestation) each year worldwide, of whom approximately 1 million do not survive the first year of life1,2,3. Complications of preterm birth (PTB) can lead to many short- and long-term medical conditions in the child, including respiratory distress syndrome, chronic lung disease, cardiovascular disorders, asthma, and loss of hearing and vision4. Consequently, the economic costs and public health burden of PTB are substantial. Yearly costs have been estimated at £2.946 billion in the UK5 and $8 billion CAD in Canada6, where PTB has been linked to two-thirds of infant deaths7. Compared to the UK, Canada, and Western European countries, the US has a higher rate of PTB (10.02% in 2018)8,9, with annual costs of $26.2 billion10.
In light of the potential to reduce infant and childhood morbidity and mortality, reduction of PTB is a major public health priority11. Accurate and reliable prediction of PTB would enable preventative interventions to diminish morbidity and mortality12, thus benefitting pregnant women, neonates, healthcare systems, and society as a whole. However, recent studies have shown that prediction of PTB is a challenging task, with prediction models to date yielding an area under the receiver operating characteristic curve (AUC) of around 70%13.
Machine learning techniques present the potential for improved prediction of PTB. Use of machine learning has grown rapidly in a wide range of medical disciplines including cancer prediction, cardiovascular diagnostics, image analysis14,15, and maternal postpartum complications16, and machine learning was highlighted as one of the most important advances of 2019 for prenatal diagnosis17. The main advantage of machine learning methods over other methods such as linear regression is the possibility of relaxing assumptions regarding multicollinearity, additivity, and distribution18.
Machine learning can be conducted using multiple types of learning algorithms, among which decision trees, random forests, and artificial neural networks are commonly used in medical disciplines19. Decision trees constitute the most straightforward algorithm and provide a visual representation of the relation between predictors and outcome variables. However, the high variance in the results of the decision trees can in some cases be improved by using random forests, which aggregate results of randomly generated decision trees to produce a more accurate model20. Artificial neural networks present a third option that has been broadly used in medical studies21 and performs well when complex and non-linear associations exist between variables22. Briefly, artificial neural networks use predictors as inputs and connect them to multiple hidden layer combinations by assigning suitable weights to predict the outcome23.
This study identified predictors of PTB in a large cohort of multiparous women using multiple state-of-the-art machine learning algorithms and compared results obtained from these algorithms with those from traditional logistic regression analyses. Predictors of PTB that have not been commonly considered in prior prediction models, such as proteins used for screening for placental diseases and Down Syndrome, were included. Also the predictive models were developed using training datasets and then tested using validation data, as is recommended by guidelines24,25.
Methods
Data and population
A population-based retrospective cohort study was performed using data from Ontario’s Better Outcomes Registry and Network (BORN)26. A description of the reliability of BORN to provide accurate data is provided by Miao et al.26 In the present study, multiparous women with a singleton birth at 20–42 weeks’ gestation delivered in an Ontario hospital between April 1, 2012, and March 31, 2014, were included. The primary outcome, PTB, was defined as gestational age at birth less than 37 weeks. Spontaneous PTB was also considered as a secondary outcome. Spontaneous PTB was identified by births categorized as not “induced”, not “caesarean section” and not “augmented labor”27.
The potential predictors of PTB available in the BORN database were identified based on knowledge to date regarding the etiology of PTB11,28. Potential predictors considered for first-trimester models included maternal age, height, pre-pregnancy body mass index (BMI), pre-existing physical health conditions (Table S1), pre-existing mental health conditions (Table S2), socioeconomic variables (Table S3), smoking status, alcohol consumption during pregnancy, folic acid use, gravidity, number of prior abortions (including miscarriages), number of prior PTBs, number of prior term births, number of prior vaginal births, number of prior caesarean births, history of vaginal birth following caesarean section, history of stillbirth, gestational weight gain during the first trimester, diabetes, use of assisted reproductive technologies, and antenatal health care provider. The impact of pregnancy-associated plasma protein-A concentration and free beta-subunit of human chorionic gonadotropin were also considered as potential markers of placental diseases including preeclampsia29, as well as nuchal translucencies30.
In second-trimester models, all of the variables from the first trimester were considered. Moreover, the protein concentrations of dimeric inhibin-A32, unconjugated estriol33, human chorionic gonadotropin34, and alpha-fetoprotein35 were added. Furthermore, the intention to breastfeed, attendance at first-trimester appointments, hypertensive disorder of pregnancy, gestational diabetes, infection, medication exposure, fetal sex, pregnancy complications31 were included as potential predictors of PTB during the second trimester as well. Complications of pregnancy contained more than 600 categories, which were combined into three groups based on the expertise of our in-house maternal–fetal specialist: No complications, Moderate complications, and severe complications (including hypertensive, placental abnormalities, and maternal problems during the pregnancy, such as antepartum hemorrhaging). Also, biomarker concentrations and nuchal translucencies were categorized into three groups: normal, abnormal, and missing (see Table S4 for cutoffs).
Descriptive analyses
The associations between predictors and outcome variables were measured by applying chi-square and t-tests for categorical and continuous predictors respectively, with statistical significance defined as p < 0.05. Then crude odds ratios (ORs) between individual predictors and the primary and secondary outcomes were estimated using univariate logistic regression.
Model selection and evaluation
First the whole cleaned data set was divided into two portions: two-thirds of the data were used for model building (training), while the remaining one-third of the data were used for model testing (validation). For each of the algorithms, the tenfold cross-validation was used in the balanced (random oversampling36,37) training data to identify the optimal model producing the highest area under the receiver operating characteristic curve (AUC). Finally the model performance is assessed in the validation data by sensitivity, specificity, positive predictive value, negative predictive value, and AUC38.
Machine learning algorithms were executed using R software (version 3.5.2) and the caret39 package. The receiver operating characteristic curves (ROCs) and AUCs were obtained by using the pROC package in R. Variable selection for the machine learning models was performed using the Boruta package40. Boruta computes the importance of each variable as a measure of predictive ability. The higher the importance of a variable, the more predictive power in the model41. For missing values of prediction variables, ten multiple imputations were produced by chained equations42 using the R package MICE43 and substituted missing values with the average and mode for continuous and categorical variables, respectively. Variable selection for the logistic regression analyses was performed using the stepAIC function in the MASS package.
Consent to participate
Informed consent was obtained from all subjects and/or their legal guardian(s).
Ethics approval
This study followed the guidelines for the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis25. The Hamilton Integrated Research Ethics Board approved the study before its commencement (approval # 14-714-C).
Results
The cohort included 145,846 births, of which 8125 (5.57%) were preterm (Table 1). Nearly 30% of the study population was over 35 years of age. The mean maternal pre-pregnancy BMI was 25.9 ± 6.6 kg/m2, and the mean maternal height was 163.6 cm ± 7.3 cm. Twenty-one percent of women had at least one medical condition and approximately 40% had at least one previous abortion (including miscarriages) (Table S5). Of those patients who had experienced prior abortions (including miscarriages), 4.9% had more than three. Overall, 2.0% of pregnant women had a history of stillbirth, while 8.6% had a history of PTB and 24.8% had a previous caesarean section.
Univariable analysis
Twenty-nine potential first-trimester predictors of PTB were analyzed (Table S6). Patients who were older than 35 years of age, shorter than 160 cm, obese prior to pregnancy, smokers, patients who conceived using ovulation induction including IVF, and those who had prior medical conditions including diabetes or low pregnancy-associated plasma protein-A concentrations were more likely to experience PTB than patients without these conditions. Furthermore, patients with a history of abortion, PTB, or caesarean section had a higher likelihood of experiencing PTB than patients who did not have a history of such conditions. Results of the univariate analysis for second-trimester predictors are shown in Table S7. Patients who did not attend their first trimester clinical evaluation, failed to enroll in prenatal education programs, or had abnormal protein (pregnancy-associated plasma protein-A, dimeric inhibin-A, unconjugated estriol, human chorionic gonadotropin) levels were more likely to experience PTB than a term birth. Furthermore, patients pregnant with a male baby, having a hypertensive disorder, or experiencing severe complications during pregnancy were at a higher risk of PTB than term birth.
Multivariable analysis
Stepwise logistic regression identified 18 significant first-trimester predictors of PTB (Fig. S1). Among these, the highest adjusted OR was seen for prior PTB (aOR for one previous PTB: 3.91; 95% confidence interval (CI) 3.52–4.33; OR for two previous preterm births: 4.33; 95% CI 3.59–5.20; OR for three or more previous preterm births: 6.37; 95% CI 4.48–8.94). Diabetes and abnormal pregnancy-associated plasma protein-A concentrations had the second highest ORs. Overall, women were at increased risk of PTB if they were shorter than 160 cm, underweight (< 18.5 kg/m2), had a lower education level, a history of health concerns, history of miscarriage or caesarean section, or experienced excess gestational weight gain. In contrast, patients with a history of term birth were less likely to give birth preterm.
In the second-trimester model, stepwise logistic regression identified 23 significant predictors of PTB (Figs. S1, S2), including all 18 predictors identified in the first-trimester model. The five additional predictors identified were fetal sex, attendance at the first-trimester clinical evaluation, pregnancy complications, medication exposure, and abnormal alpha-fetoprotein concentration. Severe complications during pregnancy exhibited the highest adjusted OR (12.37; 95% CI 11.61–13.18) in the second-trimester model. Patients who were pregnant with a male fetus were also at a higher risk of experiencing PTB. Conversely, patients exposed to medications, including vitamins and herbal supplements, were less likely to experience PTB.
Machine learning (Boruta method) identified 24-predictors of PTB during the first trimester (Table S8). As with logistic regression analyses, machine learning methods found that a history of PTB was the strongest predictor of subsequent PTB. However, in second-trimester models, machine learning methods did not identify fetal sex, prior abortions (including miscarriages), or maternal height as important predictors of PTB (Table S9). As with logistic regression analyses, complications during pregnancy, hypertensive disorder of pregnancy, and prior PTB were the most important predictors identified by machine learning.
Performance of prediction models in the validation set
Among first-trimester models, the random forests model had the highest sensitivity, specificity, and AUC in training samples (Table S10). However, in the testing data, artificial neural networks yielded the highest AUC (68.8%, 95% CI 67.6–70.1%), sensitivity (54.5%, 95% CI 52.4–56.7%) and negative predictive value (96.6%, 95% CI 96.3–96.8%) (Fig. 1, Table 2).
The AUC for both machine learning methods and logistic regression models rose substantially with inclusion of second-trimester predictors, with logistic regression yielding the highest AUC (80.5%, 95% CI 79.6–81.5%). The notable rise in AUC between first- and second-trimester models stemmed largely from the addition of pregnancy complications. Accordingly, a sensitivity analysis without this variable was performed. This analysis yielded an AUC of 72.1% (95% CI 71.1–73.1%) (Fig. 2). Among the second-trimester models, artificial neural networks yielded the highest AUC, while also maintaining a high negative predictive value of 97.4% (95% CI 97.3–97.6%). However, both artificial neural networks and random forests exhibited overfitting (high performance in training data but low performance in testing data) issue.
Prediction of spontaneous PTB
In first-trimester models predicting spontaneous PTB, logistic regression and artificial neural networks yielded similar AUCs (71.0% and 70.9% respectively) (Fig. S3). However, in second-trimester models, the AUC was slightly higher for artificial neural networks (73.8%, 95% CI 72.3–75.3%) compared to logistic regression (70.8%, 95% CI 69.1–72.4%) and other methods (Fig. S4). Artificial neural networks and logistic regression both yielded negative predictive values for spontaneous preterm birth of ~ 97% in first- and second-trimester models (Table S11).
Discussion
Main findings
Using data from a population-based retrospective cohort of multiparous women, a set of prediction models for PTB developed and validated demonstrated moderate predictive power44,45, with an AUC of 68.8% in the first trimester. This study identified 18 first-trimester predictor variables, among which history of PTB, diabetes, and abnormal pregnancy-associated plasma protein-A concentrations were the strongest predictors. In second-trimester models, four additional significant predictors were identified (infant sex, antenatal care in the first trimester, medication exposure, and abnormal alpha-fetoprotein concentrations), resulting in a maximum AUC of 72.1%, which is considered an acceptable prediction accuracy38,45. Inclusion of data on complications during pregnancy yielded an AUC of 80.5%, which is indicative of an acceptable prediction model39,46.
For both the first and second trimesters, my models for overall and spontaneous PTB yielded negative predictive values higher than that of fetal fibronectin, whose negative predictive value was identified as 93% in a recent systematic review46. Moreover, my second-trimester models generated using artificial neural networks yielded sensitivity and specificity similar to the overall sensitivity (58%) and specificity (84%) of fetal fibronectin46.
Strengths and limitations
This study has several important strengths. First, logistic regression was applied and three of the most commonly used machine learning approaches to predict PTB. Such an approach contrasts with that used in previous studies, which have primarily focused on one or two machine learning methods47,48,49,50,51,52,53. Moreover, new prediction variables from both the first and second trimesters were considered, which have also not been consistently included in previous studies8,9,10,11,12,13,14,48,49,50,51,54,55,56,57,58,59. The proposed models yielded higher negative predictive values than fetal fibronectin for the prediction of PTB46. Accordingly, eventual use of these models in clinical practice could reduce non-essential hospitalization and other interventions, in turn reducing costs and hospital staff burden60.
An additional strength of this study is the use of a relatively large cohort compared to previous studies8,9,10,11,12,13,14. Also, the prediction variables that have not been considered by previous studies, including plasma proteins and first-trimester gestational weight gain, were examined in this study. As suggested by Courtney et al.61, prenatal factors such as maternal health behaviors and medical history were also included in our study. Finally, an additional drawback of previous studies is the lack of information provided to address the issue of imbalanced classes8,9,10,11,12,13,14 which was covered by random-oversampling method.
Despite using state-of-the-art machine learning algorithms, the predictive power of the proposed models was limited, and the AUCs varied from 69 to 73% when information on pregnancy complications was not included. The inclusion of information on fetal fibronectin and ultrasonography (cervical length), which are important predictors of PTB62, as well as data on preventative measures used by patients with high-risk pregnancies63,64, may improve predictive power. However, it should be noted that progesterone was rarely administered in Ontario during the timeframe of the data used in our study65.
Conclusion
In a large cohort of multiparous women, machine learning (artificial neural networks) and logistic regression methods yielded generally similar accuracy for the prediction of PTB. However, for spontaneous PTB, artificial neural networks provided slightly better results than logistic regression. For overall and spontaneous PTB, both first- and second-trimester models provided very high negative predictive values, higher than that of fetal fibronectin.
Data availability
Data can be available from corresponding author upon reasonable request.
References
Blencowe, H. et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: A systematic analysis and implications. Lancet 379(9832), 2162–2172 (2012).
Chawanpaiboon, S. et al. Global, regional, and national estimates of levels of preterm birth in 2014: A systematic review and modelling analysis. Lancet Glob. Health 7(1), e37–e46 (2019).
Liu, L. et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: An updated systematic analysis. The Lancet 385(9966), 430–440 (2015).
Saigal, S. & Doyle, L. W. An overview of mortality and sequelae of preterm birth from infancy to adulthood. Lancet 371(9608), 261–269 (2008).
Mangham, L. J., Petrou, S., Doyle, L. W., Draper, E. S. & Marlow, N. The cost of preterm birth throughout childhood in England and Wales. Pediatrics 123(2), e312–e327 (2009).
Lim, G. et al. CIHI survey: Hospital costs for preterm and small-for-gestational age babies in Canada. Healthc. Q. 12(4), 20–24 (2009).
Government of Canada Invests in Better Health for Premature Babies. https://www.newswire.ca/news-releases/government-of-canada-invests-in-better-health-for-premature-babies-622089233.html (2019).
Bronstein, J. M., Wingate, M. S. & Brisendine, A. E. Why is the US preterm birth rate so much higher than the rates in Canada, Great Britain, and Western Europe? Int. J. Health Serv. 48(4), 622–640 (2018).
Preterm Birth Rates are from the National Center for Health Statistics, 2018 Final Natality Data [Internet]. MARCH OF DIMES. https://www.marchofdimes.org/mission/reportcard.aspx (2019).
The Impact of Premature Birth on Society. https://www.marchofdimes.org/mission/the-economic-and-societal-costs.aspx (2020).
Frey, H. A. & Klebanoff, M. A. The epidemiology, etiology, and costs of preterm birth. Semin. Fetal Neonat. Med. 21(2), 68–73 (2016).
Suff, N., Story, L. & Shennan, A. The prediction of preterm delivery: What is new? Semin. Fetal Neonat. Med. 24(1), 27–32 (2019).
Ferrero, D. M. et al. Cross-country individual participant analysis of 4.1 million singleton births in 5 countries with very high human development index confirms known associations but provides no biologic explanation for 2/3 of all preterm births. PLoS ONE 11(9), e0162506 (2016).
Jalali, A., Simpao, A. F., Gálvez, J. A., Licht, D. J. & Nataraj, C. Prediction of periventricular leukomalacia in neonates after cardiac surgery using machine learning algorithms. J. Med. Syst. 42(10), 177 (2018).
Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015).
Betts, K. S., Kisely, S. & Alati, R. Predicting common maternal postpartum complications: Leveraging health administrative data and machine learning. BJOG 126(6), 702–709 (2019).
Chitty, L. S. et al. In case you missed it: The prenatal diagnosis editors bring you the most significant advances of 2019. Prenat. Diagn. 40(3), 287–293 (2020).
Hastie, T., Tibshirani, R. & Friedman, J. Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd edn. (Springer, 2009).
Wiemken, T. L. & Kelley, R. R. Machine learning in epidemiology and health outcomes research. Annu. Rev. Public Health 41, 1 (2020).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning: With Applications in R 7th edn. (Springer, 2019).
DeGregory, K. W. et al. A review of machine learning in obesity. Obes. Rev. 19(5), 668–685 (2018).
Hassanipour, S. et al. Comparison of artificial neural network and logistic regression models for prediction of outcomes in trauma patients: A systematic review and meta-analysis. Injury 50(2), 244–250 (2019).
Lisboa, P. J. & Taktak, A. F. G. The use of artificial neural networks in decision support in cancer: A systematic review. Neural Netw. 19(4), 408–415 (2006).
Grant, S. W., Collins, G. S. & Nashef, S. A. M. Statistical primer: Developing and validating a risk prediction model. Eur. J. Cardiothorac. Surg. 54(2), 203–208 (2018).
Moons, K. G. M. et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Ann. Intern. Med. 162(1), W1–W73 (2015).
Miao, Q., Fell, D. B., Dunn, S. & Sprague, A. E. Agreement assessment of key maternal and newborn data elements between birth registry and Clinical Administrative Hospital Databases in Ontario, Canada. Arch. Gynecol. Obstet. 300(1), 135–143 (2019).
Maghsoudlou, S., Beyene, J., Yu, Z. M. & McDonald, S. D. Phenotypic classification of preterm birth among multiparous women: A population-based cohort study. J. Obstet. Gynaecol. Can. 41(10), 1433–1443 (2019).
Goldenberg, R. L., Culhane, J. F., Iams, J. D. & Romero, R. Epidemiology and causes of preterm birth. The Lancet 371(9606), 75–84 (2008).
Pillay, P., Moodley, K., Moodley, J. & Mackraj, I. Placenta-derived exosomes: Potential biomarkers of preeclampsia. Int. J. Nanomed. 12, 8009–8023 (2017).
Bilagi, A. et al. Association of maternal serum PAPP-A levels, nuchal translucency and crown-rump length in first trimester with adverse pregnancy outcomes: Retrospective cohort study. Prenat. Diagn. 37(7), 705–711 (2017).
Vahanian, S. A., Lavery, J. A., Ananth, C. V. & Vintzileos, A. Placental implantation abnormalities and risk of preterm delivery: A systematic review and metaanalysis. Am. J. Obstetr. Gynecol. 213(4), S78–S90 (2015).
Jelliffe-Pawlowski, L. L. et al. Association of early preterm birth with abnormal levels of routinely collected first and second trimester biomarkers. Am. J. Obstet. Gynecol. 208(6), 492 (2013).
Olsen, R. N., Dunsmoor-Su, R., Capurro, D., McMahon, K. & Gravett, M. G. Correlation between spontaneous preterm birth and mid-trimester maternal serum estriol. J. Matern. Fetal Neonatal Med. 27(4), 376–380 (2014).
Bernstein, P. S. et al. Beta-human chorionic gonadotropin in cervicovaginal secretions as a predictor of preterm delivery. Am. J. Obstet. Gynecol. 179(4), 870–873 (1998).
Waller, D. K., Lustig, L. S., Cunningham, G. C., Feuchtbaum, L. B. & Hook, E. B. The association between maternal serum alpha-fetoprotein and preterm birth, small for gestational age infants, preeclampsia, and placental complications. Obstet. Gynecol. 88(5), 816–822 (1996).
Fotouhi, S., Asadi, S. & Kattan, M. W. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J. Biomed. Inform. 90, 103089 (2019).
Sun, Y., Wong, A. K. C. & Kamel, M. S. Classification of imbalanced data: A review. Int. J. Pattern Recogn. Artif. Intell. 23(04), 687–719 (2009).
Hosmer, D. W. & Lemeshow, S. Applied Logistic Regression 2nd edn. (Wiley, 2000).
Wing, M. K. C. et al. caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret (Accessed 30 September 2019) (2019).
Kursa, M. B. & Rudnicki, W. R. Boruta: Wrapper Algorithm for All Relevant Feature Selection. https://CRAN.R-project.org/package=Boruta (Accessed 1 October 2019) (2018).
Witold, R. & Rudnicki, M. B. K. Feature selection with the boruta package. J. Stat. Softw. 36(11), 1–13 (2010).
Stuart, E. A., Azur, M., Frangakis, C. & Leaf, P. Multiple imputation with large data sets: A case study of the children’s mental health initiative. Am. J. Epidemiol. 169(9), 1133–1139 (2009).
van Buuren, S. & Groothuis-Oudshoorn, K. mice: Multivariate imputation by chained equations in R. J. Stat. Softw. 45(1), 1–67 (2011).
Fischer, J. E., Bachmann, L. M. & Jaeschke, R. A readers’ guide to the interpretation of diagnostic test properties: Clinical example of sepsis. Intens. Care Med. 29(7), 1043–1051 (2003).
Akobeng, A. K. Understanding diagnostic tests 3: Receiver operating characteristic curves. Acta Paediatr. 96(5), 644–647 (2007).
Melchor, J. C., Khalil, A., Wing, D., Schleussner, E. & Surbek, D. Prediction of preterm delivery in symptomatic women using PAMG-1, fetal fibronectin and phIGFBP-1 tests: Systematic review and meta-analysis. Ultrasound Obstet. Gynecol. 52(4), 442–451 (2018).
Grobman, W. A. et al. Prediction of spontaneous preterm birth among nulliparous women with a short cervix. J. Ultrasound Med. 35(6), 1293–1297 (2016).
Fergus, P. et al. Prediction of preterm deliveries from EHG signals using machine learning. PLoS ONE 8(10), e77154 (2013).
Meertens, L. J. E. et al. Prediction models for the risk of spontaneous preterm birth based on maternal characteristics: A systematic review and independent external validation. Acta Obstetr. Gynecol. Scand. 97(8), 907–920 (2018).
Lee, K. A. et al. A model for prediction of spontaneous preterm birth in asymptomatic women. J. Womens Health (Larchmt) 20(12), 1825–1831 (2011).
Catley, C., Frize, M., Walker, R. C. & Petriu, D. C. Predicting high-risk preterm birth using artificial neural networks. IEEE Trans. Inf. Technol. Biomed. 10(3), 540–549 (2006).
Podda, M. et al. A machine learning approach to estimating preterm infants survival: Development of the preterm infants survival assessment (PISA) predictor. Sci. Rep. 8(1), 13743 (2018).
Care, A. G. et al. Predicting preterm birth in women with previous preterm birth and cervical length ≥ 25 mm. Ultrasound Obstet. Gynecol. 43(6), 681–686 (2014).
Vovsha, I. et al. Using Kernel methods and model selection for prediction of preterm birth. In Machine Learning for Healthcare Conference 55–72. http://proceedings.mlr.press/v56/Vovsha16.html (Accessed 21 January 2019) (2016).
Weber, A. et al. Application of machine-learning to predict early spontaneous preterm birth among nulliparous non-Hispanic black and white women. Ann. Epidemiol. 28(11), 783–789 (2018).
Podda, M. et al. A machine learning approach to estimating preterm infants survival: Development of the preterm infants survival assessment (PISA) predictor. Sci. Rep. 8, 213 (2018).
Omani-Samani, R. et al. Cross-sectional study of associations between prior spontaneous abortions and preterm delivery. Int. J. Gynaecol. Obstet. 140(1), 81–86 (2018).
Goodwin, L. K. et al. Data mining methods find demographic predictors of preterm birth. Nurs. Res. 50(6), 340–345 (2001).
Luo, W. Preterm Birth Prediction: Deriving Stable and Interpretable Rules from High Dimensional Data (2016).
Melchor, J. C. et al. Predictive performance of PAMG-1 vs fFN test for risk of spontaneous preterm birth in symptomatic women attending an emergency obstetric unit: Retrospective cohort study. Ultrasound Obstet. Gynecol. 51(5), 644–649 (2018).
Courtney, K. L., Stewart, S., Popescu, M. & Goodwin, L. K. Predictors of preterm birth in birth certificate data. Stud. Health Technol. Inform. 136, 555–560 (2008).
Iams, J. D., Paraskos, J., Landon, M. B., Teteris, J. N. & Johnson, F. F. Cervical sonography in preterm labor. Obstet. Gynecol. 84(1), 40–46 (1994).
Jarde, A. et al. Effectiveness of progesterone, cerclage and pessary for preventing preterm birth in singleton pregnancies: A systematic review and network meta-analysis. BJOG 124(8), 1176–1189 (2017).
Jarde, A., Lutsiv, O., Beyene, J. & McDonald, S. D. Vaginal progesterone, oral progesterone, 17-OHPC, cerclage, and pessary for preventing preterm birth in at-risk singleton pregnancies: An updated systematic review and network meta-analysis. BJOG 126(5), 556–567 (2019).
Feng, Y. Y. et al. What interventions are being used to prevent preterm birth and when? J. Obstet. Gynaecol. Can. 40(5), 547–554 (2018).
Acknowledgements
The author extends sincere gratitude to the editor and reviewers for their invaluable insights and constructive feedback, which greatly enhanced the quality of this academic paper. Additionally, the author would like to thank Dr. Sarah McDonald and Dr. Joseph Beyene for their valuable inputs and support in conducting this research.
Author information
Authors and Affiliations
Contributions
Reza Belaghi conducted the statistical analysis, interpretation of the results, manuscript writing, and submission.
Corresponding author
Ethics declarations
Competing interests
The author declares no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Belaghi, R.A. Prediction of preterm birth in multiparous women using logistic regression and machine learning approaches. Sci Rep 14, 21967 (2024). https://doi.org/10.1038/s41598-024-60097-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-60097-4
- Springer Nature Limited