Introduction

Preterm birth imposes a substantial disease burden on infants and children worldwide. Each year, about fifteen million infants are born preterm, and preterm birth accounts for about one million deaths among children aged 0–4 years1,2. The burden of preterm birth, including its mortality and morbidity, appears to be increasing in most countries3. Although neonatal intensive care has advanced over the last few decades, extremely preterm (EP, less than 28 weeks of gestation) and extremely low birth weight (ELBW, less than 1000 g) infants remain at high risk of death, neonatal complications and long-term neurodevelopmental impairment4,5,6, which also poses a great challenge for the health care system7. Small for gestational age (SGA) is another significant contributor to the global burden of preterm birth8,9. It is therefore important to identify the risk factors for these adverse birth outcomes.

Meanwhile, air pollution has become an important health issue, increasing global mortality and morbidity. In 2016, ambient air pollution caused 4.2 million premature deaths worldwide10,11. This is particularly relevant in Korea, where air pollution, including particulate matter (PM), has increased rapidly over the past few decades along with industrialization and urbanization. In particular, maternal exposure to PM has been reported as a major cause of adverse health outcomes. Maternal exposure to PM2.5 (fine inhalable particles with diameters < 2.5 μm) or PM10 (inhalable particles with diameters < 10 μm) has been positively associated with preterm birth12,13,14,15,16,17 and low birth weight18,19,20. However, little is known about the associations between maternal PM exposure and more severe adverse birth outcomes such as EP, ELBW or SGA. Furthermore, no previous study has used machine learning to predict adverse birth outcomes among very low birth weight (VLBW, less than 1500 g) infants. This study therefore used machine learning and a national prospective cohort registry database to identify the main predictors of adverse birth outcomes in VLBW infants, including PM10 as a marker of air pollution.

Results

Table 1 presents descriptive statistics for participants' adverse birth outcomes and their predictors. Among the 10,423 participants, 3961 (38.0%) had GA < 28, 1919 (18.4%) had GA < 26, 3960 (38.0%) had BW < 1000, 1658 (15.9%) had BW < 750 and 2242 (21.5%) were SGA. The mean maternal age was 31 years (standard deviation 4.28). Monthly PM10 concentrations (μg/m³) were 52 (January), 49 (February), 60 (March), 57 (April), 60 (May), 43 (June), 35 (July), 34 (August), 34 (September), 38 (October), 47 (November) and 48 (December). Yearly values for 2013, 2014, 2015, 2016 and 2017 were, respectively: PM10 (μg/m³), 48, 46, 48, 48 and 47; average temperature (°C), 12.85, 14.10, 13.85, 14.80 and 14.60; minimum temperature (°C), 7.35, 8.55, 8.50, 10.05 and 9.30; and maximum temperature (°C), 19.05, 20.50, 20.05, 20.45 and 20.65. Table 2 presents the results of the univariate analysis. P values were below 0.10 for the following variables: birth-month with GA < 28, GA < 26, BW < 750 and SGA; male sex with GA < 28, GA < 26, BW < 1000 and BW < 750 (positive association); number of fetuses with GA < 28, GA < 26, BW < 1000 and BW < 750 (negative association); maternal age with SGA (positive association); and PM10 month with BW < 1000 and BW < 750 (positive association).
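The univariate screening flags predictors whose P value falls below 0.10. The text does not name the test used, so the following is only a hedged sketch, assuming a Pearson chi-square test (no continuity correction) on a 2×2 table of a binary predictor against a binary outcome. The function name and the counts are hypothetical toy values, not KNN data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]]: rows = predictor (no, yes), columns = outcome (no, yes).
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi-square = Z**2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: predictor no/yes (rows) vs outcome no/yes (columns).
stat, p = chi_square_2x2([[30, 70], [50, 50]])
print(f"chi2 = {stat:.3f}, P = {p:.4f}, passes P < 0.10 screen: {p < 0.10}")
```

A screening threshold of 0.10 rather than 0.05 is a deliberately lenient filter: it keeps borderline variables in view instead of discarding them before the multivariable models are fitted.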

Table 1 Descriptive statistics: adverse birth outcomes and categorical predictors.
Table 2 Univariate analysis.

The random forest performed best among the six models (Table 3). Its areas under the receiver-operating-characteristic curve (AUC), 0.73 (PM10 excluded) and 0.72 (PM10 included), were higher than those of logistic regression, 0.69 (PM10 excluded) and 0.68 (PM10 included). Based on random forest variable importance with PM10 excluded (Table 4), the main predictors of adverse birth outcomes were maternal age (0.2276), birth-year (0.1216), birth-month (0.1165), sex (0.0415), number of fetuses (0.0404), primipara (0.0373), maternal education (0.0361), pregnancy-induced hypertension (0.0325), chorioamnionitis (0.0319) and antenatal steroid (0.0307). Likewise, with PM10 included (Table 5, Fig. 1), the major predictors were maternal age (0.2131), birth-month (0.0767), PM10 month (0.0656), sex (0.0428), number of fetuses (0.0424), primipara (0.0395), maternal education (0.0352), pregnancy-induced hypertension (0.0347), chorioamnionitis (0.0336) and antenatal steroid (0.0318). These importance values were pooled across the five adverse birth outcomes. Birth-year no longer appeared among the major predictors once PM10 month was included in the model. Finally, Table 6 shows how variable importance rankings vary among the adverse birth outcomes; in each of the five outcome columns, top-5 and top-10 rankings are highlighted in orange and light blue, respectively. Maternal age, birth-month and PM10 month were the first, second and third most important predictors across the board. However, some predictors were outside the top 5 on average but within the top 5 for certain outcomes: primipara (6th on average) ranked 4th for BW < 1000 and 5th for BW < 750; pregnancy-induced hypertension (8th on average) ranked 4th for SGA; and chorioamnionitis (9th on average) ranked 4th for GA < 28. Maternal education also ranked within the top 10 across the board.
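An AUC such as 0.72 can be read as the probability that a randomly chosen infant with the outcome receives a higher predicted score than a randomly chosen infant without it, with ties counted as one half. The minimal pure-Python sketch below computes the AUC directly from that pairwise (Mann-Whitney) definition. The function name, labels and scores are hypothetical toy values, not output from the study's models.

```python
def auc_pairwise(labels, scores):
    """Area under the ROC curve as P(score_pos > score_neg), ties counted as 0.5.

    labels: 0/1 outcome (e.g., GA < 28 no/yes); scores: predicted probabilities.
    Compares every positive with every negative, so it is O(n_pos * n_neg):
    fine for illustration, slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, not study output.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
print(auc_pairwise(labels, scores))
```

Unlike accuracy, this quantity does not depend on a classification cutoff, which is one reason it is a common basis for comparing models such as the random forest and logistic regression.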

Table 3 Model Performance.
Table 4 Random forest variable importance for adverse birth outcomes: PM10 excluded.
Table 5 Random forest variable importance for adverse birth outcomes: PM10 included.
Figure 1 Random forest variable importance plots for adverse birth outcomes: PM10 included. PM, particulate matter; PROM, prelabor rupture of membranes.

Table 6 Random forest variable importance rankings for adverse birth outcomes: PM10 included.

Discussion

This study used machine learning and a national prospective cohort registry database (Korean Neonatal Network, KNN) to provide the most comprehensive investigation of the predictors of five adverse birth outcomes in VLBW infants. Among the six prediction models, the random forest performed best (accuracy 0.79, area under the receiver-operating-characteristic curve 0.72). According to random forest variable importance, the major predictors of adverse birth outcomes were maternal age, birth-month, PM10 month, sex, number of fetuses, primipara, maternal education, pregnancy-induced hypertension, chorioamnionitis and antenatal steroid.

In this study, birth-month and PM10 were associated with EP, ELBW and SGA. Month-to-month environmental changes, such as air pollution, ambient temperature and the synthesis of vitamin D from sunlight, could affect adverse birth outcomes. Recent studies report positive associations between adverse birth outcomes and maternal exposure to PM during a specific trimester or the entire pregnancy21,22. A meta-analysis reported that PM10 exposure increased extremely preterm birth21, and the incidence of preterm birth was particularly increased in Asia, where air pollution concentrations are high22. Unlike previous studies, this study found that high air pollution exposure in the birth month, rather than during a specific trimester or the entire pregnancy, was strongly associated with adverse birth outcomes. This result is consistent with previous studies reporting that maternal exposure to PM during the birth month was a major risk factor for adverse birth outcomes23,24. A time-series study by Liu et al. showed that short-term exposure to air pollution in the week before childbirth increased preterm birth23. In addition, Trasande et al.'s air pollution model showed that PM2.5 concentration in the birth month was associated with low birth weight and VLBW24. Possible pathways between maternal PM exposure and adverse birth outcomes include oxidative stress, placental dysfunction, endothelial dysfunction and abnormal fetal growth25,26, but this issue requires further examination.

Male sex was the fourth most important predictor of adverse birth outcomes in VLBW infants in this study. This result is consistent with a recent review27, which proposes the following mechanisms: in the presence of a male fetus, the trophoblast creates a more pro-inflammatory environment, and both the maturation of the fetal hypothalamic–pituitary–adrenal axis and the expression of placental genes follow sex-dependent patterns.

The number of fetuses was negatively associated with adverse birth outcomes in VLBW infants in this study. This finding disagrees with a recent review reporting that the number of fetuses is a strong risk factor for adverse birth outcomes28. One possible explanation for this discrepancy is that pregnancies with multiple fetuses in this study received more active prenatal screening and monitoring: such pregnancies often involve in vitro fertilization, which can be associated with high socioeconomic status and active prenatal care29.

Maternal age was the most important predictor of adverse birth outcomes in VLBW infants, but in the univariate analysis it was significant only for SGA. Because the participants of this study were limited to VLBW infants, further investigation in broader populations is needed for a more conclusive result.

Some predictors were outside the top 5 on average but within the top 5 for certain adverse birth outcomes in this study: pregnancy-induced hypertension (eighth on average) ranked fourth for SGA, and chorioamnionitis (ninth on average) ranked fourth for GA < 28. Chorioamnionitis is considered a major predictor of preterm birth30,31. It is thought to stimulate pro-inflammatory cytokines, uterine contraction and preterm labor, or to promote matrix metalloproteinase activation, fetal membrane degradation and preterm birth32. As for pregnancy-induced hypertension, early-onset preeclampsia has been reported to cause abnormal placentation and impaired spiral artery remodeling, which in turn restrict blood flow through the uterine arteries and lead to abnormal fetal growth33,34,35.

Limitations

First, this study did not analyze possible mediating effects among predictors. Second, this study used binary categories of adverse birth outcomes (no, yes); extending them to multiple categories could provide more clinical insight. Third, investigating the mechanisms linking PM to adverse birth outcomes was beyond the scope of this study; this area has received little attention and needs further examination. Fourth, combining different machine learning methods for different types of adverse birth outcome data would break new ground on this topic. Fifth, this study did not consider indoor environmental factors, which have been reported as major predictors of adverse birth outcomes along with PM20,36.

Conclusions

The current study is the first to use machine learning to evaluate predictors of adverse birth outcomes in VLBW infants, including PM10 month. Among VLBW infants, adverse birth outcomes such as EP, ELBW or SGA were strongly associated with PM10 month as well as with maternal and fetal predictors. Clinical and/or policy measures addressing these predictors are needed to prevent adverse birth outcomes.

Methods

Participants and variables

Data comprised 10,423 VLBW infants from the Korean Neonatal Network (KNN) database between January 2013 and December 2017. The KNN was launched in April 2013 as a national prospective cohort registry of VLBW infants admitted or transferred to neonatal intensive care units across South Korea; it currently covers 74 neonatal intensive care units. It collects perinatal and neonatal data on VLBW infants according to a standardized operating procedure37.

Five adverse birth outcomes were defined as binary dependent variables (no, yes): gestational age less than 28 weeks (GA < 28), GA less than 26 weeks (GA < 26), birth weight less than 1000 g (BW < 1000), BW less than 750 g (BW < 750) and SGA. Thirty-three predictors were included: male sex (no, yes), birth-year (2013, 2014, 2015, 2016, 2017), birth-month (1, 2, …, 12), birth-season-spring (no, yes), birth-season-summer (no, yes), birth-season-autumn (no, yes), birth-season-winter (no, yes), number of fetuses (1, 2, 3, 4 or more), in vitro fertilization (no, yes), gestational diabetes mellitus (no, yes), overt diabetes mellitus (no, yes), pregnancy-induced hypertension (no, yes), chronic hypertension (no, yes), chorioamnionitis (no, yes), prelabor rupture of membranes (no, yes), prelabor rupture of membranes > 18 h (no, yes), antenatal steroid (no, yes), cesarean section (no, yes), oligohydramnios (no, yes), polyhydramnios (no, yes), maternal age (years), primipara (no, yes), maternal education (elementary, junior high, senior high, college or higher), maternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), paternal education (elementary, junior high, senior high, college or higher), paternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), unmarried (no, yes), congenital infection (no, yes), PM10 year (PM10 for each year), PM10 month (PM10 for each birth-month), temperature average (for each year), temperature min (for each year) and temperature max (for each year). PM10 and temperature data were obtained from the Korea Meteorological Administration (PM10 https://data.kma.go.kr/data/climate/selectDustRltmList.do?pgmNo=68; temperature https://web.kma.go.kr/weather/climate/past_cal.jsp). The definition of each variable is given in Text S1 (supplementary text).
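The four threshold-based outcomes follow directly from gestational age and birth weight, and the season indicators follow from birth-month. The sketch below is hedged in three ways. The record layout and field names (`Infant`, `ga_weeks`, `bw_grams`, `birth_month`, `sga`) are hypothetical. SGA is taken as a precomputed flag because its reference chart is defined in Text S1, not here. The month-to-season mapping (spring = March to May, and so on) is an assumed meteorological convention, not the study's stated definition.

```python
from dataclasses import dataclass

# ASSUMED meteorological seasons; the study's own definition is in Text S1.
SEASONS = {
    "spring": {3, 4, 5},
    "summer": {6, 7, 8},
    "autumn": {9, 10, 11},
    "winter": {12, 1, 2},
}


@dataclass
class Infant:  # hypothetical record layout, not the KNN schema
    ga_weeks: float   # gestational age in weeks
    bw_grams: float   # birth weight in grams
    birth_month: int  # 1..12
    sga: bool         # small for gestational age, assumed precomputed


def derive_variables(inf: Infant) -> dict:
    """Return the five binary outcomes and four birth-season indicators (1 = yes)."""
    if not 1 <= inf.birth_month <= 12:
        raise ValueError("birth_month must be in 1..12")
    out = {
        "GA<28": int(inf.ga_weeks < 28),
        "GA<26": int(inf.ga_weeks < 26),
        "BW<1000": int(inf.bw_grams < 1000),
        "BW<750": int(inf.bw_grams < 750),
        "SGA": int(inf.sga),
    }
    for name, months in SEASONS.items():
        out[f"birth-season-{name}"] = int(inf.birth_month in months)
    return out


print(derive_variables(Infant(ga_weeks=25.5, bw_grams=820, birth_month=3, sga=False)))
```

The outcomes are nested (every GA < 26 case is also GA < 28, and every BW < 750 case is also BW < 1000), which is why they were modeled as separate binary variables rather than as mutually exclusive classes.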

Statistical analysis

Six machine learning models, namely the artificial neural network, decision tree, logistic regression, naïve Bayes, random forest and support vector machine, were used to predict the adverse birth outcomes38,39,40,41,42,43. A decision tree consists of three elements: a test on an independent variable (intermediate node), an outcome of the test (branch) and a value of the dependent variable (terminal node). A naïve Bayesian classifier performs classification on the basis of Bayes' theorem, which states that the probability of the dependent variable given certain values of the independent variables can be calculated from the probabilities of the independent variables given each value of the dependent variable, together with the prior probability of each value of the dependent variable; the classifier is "naïve" because it assumes that the independent variables are conditionally independent given the dependent variable. A random forest is a collection of many decision trees, each grown on a bootstrap sample of the data, whose majority vote determines the predicted value of the dependent variable ("bootstrap aggregation"); in addition, each split in a tree considers only a random subset of the independent variables. Consider, for example, a random forest with 1000 decision trees and original data with 10,000 participants. Training and testing this random forest take two steps. First, new data with 10,000 participants are created by random sampling with replacement, and a decision tree is grown on these new data. Some participants in the original data are left out of the new data, and these leftovers are called out-of-bag data. This process is repeated 1000 times, yielding 1000 new datasets, 1000 decision trees and 1000 out-of-bag datasets. Second, each participant is predicted only by the trees for which that participant was out-of-bag; the majority vote of these trees is taken as the final prediction for that participant, and the out-of-bag error is calculated as the proportion of participants whose final prediction is wrong38,39.
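The two steps above (bootstrap sampling, then out-of-bag voting) can be sketched in plain Python. This is a deliberately simplified illustration, not the study's R implementation. Each "tree" is a one-split stump instead of a fully grown tree, the stump searches only a random subset of `m_features` predictors, and the function names and the ten-participant dataset are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split decision tree (a "stump") by exhaustive search.

    Tries every observed value of each candidate feature as a threshold and keeps
    the split with the fewest training errors; each side predicts its majority class.
    """
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_vote = Counter(left).most_common(1)[0][0]  # left is never empty here
            right_vote = Counter(right).most_common(1)[0][0] if right else left_vote
            errors = sum(y != left_vote for y in left) + sum(y != right_vote for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_vote, right_vote)
    _, f, t, lv, rv = best
    return lambda r: lv if r[f] <= t else rv


def random_forest_oob_error(rows, labels, n_trees=200, m_features=1, seed=0):
    """Bagged stumps with random feature subsets; returns the out-of-bag error."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        # Step 1: bootstrap sample of size n, drawn with replacement.
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        feats = rng.sample(range(p), m_features)  # random subset of predictors
        tree = fit_stump([rows[i] for i in boot], [labels[i] for i in boot], feats)
        # Step 2: the tree votes only for participants left out of its sample.
        for i in range(n):
            if i not in in_bag:
                oob_votes[i][tree(rows[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    wrong = sum(oob_votes[i].most_common(1)[0][0] != labels[i] for i in scored)
    return wrong / len(scored)


# Hypothetical toy data: two predictors, outcome = 1 when the first predictor >= 5.
rows = [(x, 2 * x + 1) for x in range(10)]
labels = [int(x >= 5) for x in range(10)]
print(random_forest_oob_error(rows, labels))
```

Because each participant is scored only by trees that never saw it, the out-of-bag error behaves like a built-in validation estimate and needs no separate hold-out set.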

A support vector machine estimates a line or higher-dimensional plane, called a "hyperplane", that separates the sub-groups of data with the greatest gap (margin); the observations closest to this hyperplane, which determine its position, are called "support vectors". An artificial neural network consists of "neurons", information units connected through weights. In general, an artificial neural network includes one input layer, one, two or three intermediate layers and one output layer. Neurons in one layer are linked to neurons in the next layer through "weights", which denote the strengths of these linkages. The "feedforward" operation begins at the input layer, runs through the intermediate layers and ends at the output layer. It is followed by learning: the weights are updated according to their contributions to the gap between the actual and predicted final outputs. This "backpropagation" operation begins at the output layer, runs through the intermediate layers and ends at the input layer. The two processes are repeated until the performance measure reaches a certain limit38,39. Data on 10,423 observations with full information were divided into training and validation sets at a 70:30 ratio (7296 vs. 3127). Accuracy, the proportion of correct predictions among the 3127 validation observations, was used as the criterion for validating the models. Random forest variable importance, the contribution of each variable to the performance of the random forest as measured by the Gini index (mean decrease in Gini impurity), was used to examine the major predictors of adverse birth outcomes in VLBW infants, including PM10. The random split and analysis were repeated 50 times, and the results were averaged over the 50 repetitions44,45. The analysis was performed with RStudio 1.3.959 (RStudio Inc., Boston, United States) between August 1 and September 30, 2021.
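The validation scheme (a 70:30 random split, accuracy on the validation set, repeated 50 times and averaged) can be sketched as follows. This is a minimal illustration under stated assumptions. The `predict(train_idx, valid_idx)` interface is hypothetical. The majority-class model is a placeholder baseline, not one of the six models in the study. The split is assumed to be unstratified, since the text does not say otherwise.

```python
import random


def split_sizes(n, train_frac=0.7):
    """Sizes of the training and validation sets for a random split."""
    n_train = round(n * train_frac)
    return n_train, n - n_train


def repeated_holdout_accuracy(labels, predict, n_repeats=50, train_frac=0.7, seed=0):
    """Mean validation accuracy over repeated random train/validation splits.

    predict(train_idx, valid_idx) must return predicted labels for valid_idx;
    this interface is hypothetical and stands in for any fitted model.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_train, _ = split_sizes(n, train_frac)
    accuracies = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repetition
        train_idx, valid_idx = idx[:n_train], idx[n_train:]
        preds = predict(train_idx, valid_idx)
        correct = sum(p == labels[i] for p, i in zip(preds, valid_idx))
        accuracies.append(correct / len(valid_idx))
    return sum(accuracies) / len(accuracies)


def majority_class_model(labels):
    """Placeholder model: predicts the most frequent training label for everyone."""
    def predict(train_idx, valid_idx):
        ones = sum(labels[i] for i in train_idx)
        guess = int(2 * ones > len(train_idx))
        return [guess] * len(valid_idx)
    return predict


print(split_sizes(10423))  # -> (7296, 3127), the 70:30 split reported in the text
toy_labels = [1] * 38 + [0] * 62  # hypothetical outcome vector, not KNN data
print(repeated_holdout_accuracy(toy_labels, majority_class_model(toy_labels)))
```

A majority-class baseline is worth reporting alongside accuracy: when an outcome such as GA < 28 occurs in 38% of infants, always predicting "no" already scores about 0.62, so a model's accuracy is only informative relative to that floor.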

Ethics statement

The KNN registry was approved by the institutional review board (IRB) at each participating hospital (IRB No. of Korea University Anam Hospital: 2013AN0115). Informed consent was obtained from the parent(s) of each infant registered in the KNN. All methods were carried out in accordance with the IRB-approved protocol and in compliance with relevant guidelines and regulations.

The institutional review boards of the KNN participating hospitals were those of the following hospitals: Gachon University Gil Medical Center, The Catholic University of Korea Bucheon ST. Mary’s Hospital, The Catholic University of Korea Seoul ST. Mary’s Hospital, The Catholic University of Korea ST. Vincent’s Hospital, The Catholic University of Korea Yeouido ST. Mary’s Hospital, The Catholic University of Korea Uijeongbu ST. Mary’s Hospital, Gangnam Severance Hospital, Kyung Hee University Hospital at Gangdong, GangNeung Asan Hospital, Kangbuk Samsung Hospital, Kangwon National University Hospital, Konkuk University Medical Center, Konyang University Hospital, Kyungpook National University Hospital, Gyeongsang National University Hospital, Kyung Hee University Medical Center, Keimyung University Dongsan Medical Center, Korea University Guro Hospital, Korea University Ansan Hospital, Korea University Anam Hospital, Kosin University Gospel Hospital, National Health Insurance Service Ilsan Hospital, Daegu Catholic University Medical Center, Dongguk University Ilsan Hospital, Dong-A University Hospital, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Pusan National University Hospital, Busan ST. Mary’s Hospital, Seoul National University Bundang Hospital, Samsung Medical Center, Samsung Changwon Medical Center, Seoul National University Hospital, Asan Medical Center, Sungae Hospital, Severance Hospital, Soonchunhyang University Hospital Bucheon, Soonchunhyang University Hospital Seoul, Soonchunhyang University Hospital Cheonan, Ajou University Hospital, Pusan National University Children’s Hospital, Yeungnam University Hospital, Ulsan University Hospital, Wonkwang University School of Medicine & Hospital, Wonju Severance Christian Hospital, Eulji University Hospital, Eulji General Hospital, Ewha Womans University Medical Center, Inje University Busan Paik Hospital, Inje University Sanggye Paik Hospital, Inje University Ilsan Paik Hospital, Inje University Haeundae Paik Hospital, Inha University Hospital, Chonnam National University Hospital, Chonbuk National University Hospital, Cheil General Hospital & Women’s Healthcare Center, Jeju National University Hospital, Chosun University Hospital, Chung-Ang University Hospital, CHA Gangnam Medical Center, CHA University, CHA Bundang Medical Center, CHA University, Chungnam National University Hospital, Chungbuk National University, Kyungpook National University Chilgok Hospital, Kangnam Sacred Heart Hospital, Kangdong Sacred Heart Hospital, Hanyang University Guri Hospital, and Hanyang University Medical Center.