Background

More than 20 million low birth weight (LBW, < 2500 g) infants are born annually [1]. LBW infants are at increased risk for mortality and serious adverse neurodevelopmental outcomes, making LBW a major global public health problem [2]. In addition to mortality risks, LBW infants often need advanced medical care after birth to treat problems associated with prematurity (e.g., respiratory distress syndrome, infections, feeding problems) or problems associated with being born small for gestational age (SGA; e.g., hypoglycemia, hypothermia, poor postnatal growth). Because few centers in low- and middle-income countries (LMICs) can provide advanced neonatal care, directing that care to LBW infants is a critical part of improving health outcomes for this population.

Identification of pregnant women at risk of delivering LBW infants prior to birth could facilitate referral of these women to delivery centers with advanced neonatal care, thereby reducing neonatal mortality related to LBW. Machine learning, or predictive modeling, has been successful at identifying high-risk groups for certain health outcomes [3,4,5,6] and therefore could be a useful tool to risk-stratify pregnant women in low-resource settings [7]. If a machine learning tool could reliably identify women with pregnancies at high risk of LBW, it could be packaged in a user-friendly interface to help providers make decisions about referring these women to delivery centers with advanced neonatal care. Prior studies have investigated machine learning techniques for the prediction of birth weight, but most have used smaller datasets, ranging from fewer than 100 to approximately 50,000 women [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Predictive modeling tools based on high-quality data from larger datasets are needed to more accurately predict LBW in low-resource settings.

The Eunice Kennedy Shriver National Institute of Child Health and Human Development Global Network for Women’s and Children’s Health Research (GN) maintains a Maternal and Newborn Health Registry (MNHR) documenting pregnancy characteristics and outcomes for over 30,000 mother/infant dyads annually in seven LMICs. This large, high-quality dataset is a unique resource for investigating predictive models for LBW in low-resource settings. For this study, our goal was to determine the pregnancy characteristics associated with greater probability of delivering LBW infants using the GN MNHR dataset. We also aimed to develop and compare the performance of five predictive models to identify LBW infants using the MNHR data. Understanding these predictors may assist in identifying who will need additional care at delivery, facilitating timely advanced care for LBW infants and thereby reducing long-term morbidity and mortality. We hypothesized that predictive model analysis would identify previously known prenatal predictors associated with LBW (e.g., infection and hypertension/eclampsia) as well as new predictors not previously considered or fully explored in prior analyses.

Methods

We used the GN MNHR dataset for this study, which includes data from eight GN research sites in seven LMICs (the Democratic Republic of the Congo, Zambia, Kenya, Guatemala, India [2 sites: Belagavi, Nagpur], Pakistan, and Bangladesh) [28]. The MNHR contains maternal, pregnancy, and delivery characteristics collected by trained research staff using medical record abstraction and in-person interviews with pregnant women. In the MNHR, birth weights are measured on all livebirths and stillbirths, and fresh stillbirths are defined as having no signs of maceration, such as skin or soft tissue changes including skin sloughing or discoloration. The MNHR is approved by the appropriate institutional review boards or research ethics committees at each participating institution. The MNHR undergoes routine quality assurance processes [29] and is registered at clinicaltrials.gov (NCT01073475).

For this study, we included singleton livebirths and fresh stillbirths in the GN MNHR who were not lost to follow-up prior to delivery and who delivered at or after 20 weeks (in keeping with the MNHR definition of stillbirth occurring at or after 20 weeks) between January 2017 and December 2020 [28, 30]. We excluded maternal deaths prior to delivery, miscarriages, medical terminations of pregnancy (MTP), macerated stillbirths or stillbirths of unknown type, unknown birth outcomes, multiples, births with missing LBW status, and births with any predictive model covariate missing.

Outcome and variable definitions

Our primary outcome was LBW, defined as birth weight < 2500 g by measured weight when available, or otherwise by estimated weight. We selected LBW as a surrogate for preterm birth given the lack of reliable gestational age dating for the total birth population. We evaluated candidate predictors from the variables collected in the MNHR, focusing on characteristics that do not require lab tests or ultrasound, which may not be available in all resource-poor settings. We selected characteristics or complications present prior to the time of delivery, since our focus was to build a predictive model that could direct care before delivery. We evaluated the maternal characteristics of age (< 20 years old, 20–35 years old, > 35 years old), education (no formal education, primary/secondary education, University+), parity (0, 1, 2, 3, 4+), height, maternal weight, socioeconomic status (SES) score (< 34, 34–65, 66+, where lower scores indicate lower household assets and SES status) [31] and previous livebirth (yes, no, no previous pregnancy lasting 20+ weeks). Of note, SES data collection in the MNHR was initiated in 2017, but site initiation varied throughout the year. We also evaluated pregnancy characteristics including the number of antenatal care visits (0, 1–3, 4+), use of iron supplementation, use of vitamin or calcium supplementation, hypertensive disorders (systolic blood pressure ≥ 140 mmHg and diastolic blood pressure ≥ 90 mmHg on two or more occasions after 20 weeks of pregnancy, proteinuria, or generalized seizures in the setting of preeclampsia), severe antepartum hemorrhage (vaginal bleeding after 22 weeks of pregnancy and before the onset of labor that is > 1,000 mL or heavy enough to soak a pad or cloth in less than five minutes), and severe infection during pregnancy (serious illness with symptoms that can include fever, chills, rapid breathing, rapid heart rate, confusion, disorientation, hypotension, and cold, clammy skin).

Analytic methods

We completed exploratory data analysis of study outcomes, maternal characteristics, and pregnancy characteristics, looking for predictors that were highly correlated with each other, had little or no variation, or were missing for many subjects. We generated descriptive statistics of frequencies for categorical variables and count, mean, and standard deviation for continuous variables.

We prepared the data for the models by excluding participants missing one or more of the predictors. The binary outcome for the predictive models was LBW. The variables described above were included as predictors. We prepared data and descriptive tables using SAS 9.4 and ran predictive models using Scikit-learn in Python 3. We picked Belagavi as the reference site because this site generally enrolled women earlier than the other sites, thus representing the ‘best case scenario’ for having information available early in pregnancy. We picked parity of one as the reference because nulliparous women are at greater risk for poor outcomes and, of the remaining parity groups, parity = 1 had the largest sample size.
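As an illustration, a minimal data-preparation sketch in Python (using pandas alongside Scikit-learn) might look as follows; the file name and column names are hypothetical placeholders rather than the actual MNHR variable names.

import pandas as pd

# Hypothetical analysis file; the real MNHR extract and variable names differ.
df = pd.read_csv("mnhr_analysis_subset.csv")

predictors = ["site", "maternal_age_cat", "education", "parity_cat", "height_cm",
              "weight_kg", "ses_cat", "previous_livebirth", "anc_visits_cat",
              "iron_supp", "vit_ca_supp", "htn_disorder", "severe_aph",
              "severe_infection"]

# Exclude participants missing the outcome or any predictor (complete-case analysis).
df = df.dropna(subset=predictors + ["lbw"])

# One-hot encode categorical predictors and drop the chosen reference levels
# (e.g., Belagavi for site, parity = 1) so coefficients are interpreted
# against those reference groups.
X = pd.get_dummies(df[predictors],
                   columns=["site", "maternal_age_cat", "education", "parity_cat",
                            "ses_cat", "previous_livebirth", "anc_visits_cat"])
X = X.drop(columns=["site_Belagavi", "parity_cat_1"], errors="ignore")
y = df["lbw"].astype(int)  # 1 = birth weight < 2500 g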

We developed and tested five predictive models: decision tree and random forest (both tree-based models), logistic regression, K-nearest neighbors, and support vector machines. The decision tree model, based on classification and regression trees (CART), splits the subjects into consecutive sub-groups based on the most important predictors at each node of the tree. The random forest model avoids the overfitting that may occur with a single decision tree by using multiple trees. For both tree-based models, we used the Gini impurity criterion to determine which predictors yielded the most information for each classification split. Third, we employed a regularized logistic regression model, which used an L2 (ridge) regularization penalty to avoid overfitting. For our fourth model, we used K-nearest neighbors with weights set by distance. Finally, our fifth model type was support vector machines, for which we ran linear, degree 2 polynomial, and radial basis function kernels. For all models except K-nearest neighbors, we addressed the class imbalance in the study outcome by using balanced class weights.
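A minimal sketch of how these five model families could be instantiated with Scikit-learn is shown below, reflecting the settings described above (Gini impurity, L2 penalty, distance weighting, the three kernels, and balanced class weights); any remaining hyperparameter values are illustrative rather than the tuned values used in the study.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    # Tree-based models splitting on Gini impurity, with balanced class weights
    "decision_tree": DecisionTreeClassifier(criterion="gini", class_weight="balanced"),
    "random_forest": RandomForestClassifier(criterion="gini", class_weight="balanced"),
    # L2 (ridge) regularized logistic regression
    "logistic": LogisticRegression(penalty="l2", class_weight="balanced", max_iter=1000),
    # Distance-weighted K-nearest neighbors (no class weighting option)
    "knn": KNeighborsClassifier(weights="distance"),
    # Support vector machines with linear, degree-2 polynomial, and RBF kernels
    "svm_linear": SVC(kernel="linear", class_weight="balanced", probability=True),
    "svm_poly2": SVC(kernel="poly", degree=2, class_weight="balanced", probability=True),
    "svm_rbf": SVC(kernel="rbf", class_weight="balanced", probability=True),
}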

To develop the models, we split the data into a training dataset (75% of available data) and a test dataset (25% of available data). Hyperparameters were tuned using tenfold grid-search cross validation with scoring = 'roc_auc'. In addition to hyperparameter tuning, we varied the cut point for the probability used to classify an outcome as LBW for the logistic regression model from 0.1 to 0.9. We trained the models on the training data and validated the models on the test data. To validate the models, we generated predictive accuracy measures, including calculating area under the curve (AUC) and producing receiver operator characteristic (ROC) curves. We calculated precision (positive predictive value), recall (sensitivity), and f1 scores using the classification_report() method. To further evaluate model performance, we generated calibration curves. We assessed permutation-based feature importance using the Scikit-Learn permutation_importance() method, which randomly shuffles each feature and computes the change in the model’s performance; the features that affect performance the most are the most important. Additionally, we created partial dependence plots of the probability of LBW based on the predictors for the models that performed the best.
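The following sketch illustrates this training and validation workflow in Scikit-learn, assuming the X, y, and models objects from the sketches above; the parameter grid is a placeholder rather than the grid actually searched in the study.

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import roc_auc_score, roc_curve, classification_report
from sklearn.calibration import calibration_curve
from sklearn.inspection import permutation_importance

# 75% training / 25% test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Tenfold grid-search cross validation scored on ROC AUC (logistic regression shown)
grid = GridSearchCV(models["logistic"], param_grid={"C": [0.01, 0.1, 1, 10]},
                    cv=10, scoring="roc_auc")
grid.fit(X_train, y_train)
best_model = grid.best_estimator_

# Validate on the held-out test data
proba = best_model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))
fpr, tpr, _ = roc_curve(y_test, proba)  # points for the ROC curve

# Vary the classification cut point rather than using only the default of 0.5
for cut in (0.1, 0.3, 0.5, 0.7, 0.9):
    print("cut point", cut)
    print(classification_report(y_test, (proba >= cut).astype(int)))

# Calibration curve and permutation-based feature importance
frac_positive, mean_predicted = calibration_curve(y_test, proba, n_bins=10)
importance = permutation_importance(best_model, X_test, y_test,
                                    scoring="roc_auc", n_repeats=10, random_state=0)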

Results

Of the 179,953 women screened in the MNHR from January 2017 to December 2020, 145,206 (80.7%) women were eligible, consented, were not lost to follow-up prior to delivery, and delivered a singleton fresh stillbirth or livebirth with known LBW status and non-missing covariates (Fig. 1, Table 1). The most common reasons for exclusion were miscarriage and MTP in the Asian sites (total of 17.7% of Belagavi, 7.1% of Nagpur, 9.4% of Pakistani and 4% of Bangladeshi deliveries). The most common missing covariates were SES (11.2% missing overall) and maternal height (4% of subjects in Kenya) (data not shown). Other exclusions (maternal death prior to delivery, macerated stillbirth or unknown stillbirth type, unknown birth outcome, multiples, and LBW status missing) occurred in < 2% of deliveries at each site. Of the analysis subset, 2,268 were fresh stillbirths and 142,938 were livebirths; 13.8% were LBW (of which 98.9% were measured weights). The Asian sites had the highest LBW rates of 19.2% or more, and Zambia and Kenya had the lowest rates at 6.4% and 3.8%, respectively.

Fig. 1 CONSORT diagram depicting reasons for exclusion and outcome for analysis population

Table 1 Screening, exclusion and outcome characteristics for analysis population from 2017–2020

We present the maternal and pregnancy characteristics in Table 2 for the analysis population. The majority of women included were 20–35 years old. Other maternal and pregnancy characteristics varied by site. Maternal and pregnancy characteristics by LBW status are provided in Supplement Table 1. Mothers of LBW infants were shorter (152 vs 155 cm), weighed less (49 vs 54 kg), and were less likely to have taken calcium or vitamin supplementation (14 vs 83%) than mothers who did not have a LBW infant. Mothers of LBW infants were also more likely to experience a complication of pregnancy, such as a hypertensive disorder (5.7 vs 1.7%), severe antepartum hemorrhage (2.3 vs 0.4%), severe infection of pregnancy (2.9 vs 1.1%), or fresh stillbirth (6.4 vs 0.8%).

Table 2 Maternal and pregnancy characteristics by site for the analysis population

The Pearson correlations for the variables included in the models were calculated (data not shown). Correlated variables included parity and previous livebirth (r = −0.72), SES and clinical site (r = 0.54), parity and age (r = 0.5), and previous livebirth and age (r = −0.45). The distributions of the outcome and predictors among the training dataset (N = 108,904) and test dataset (N = 36,302) were similar (data not shown).

The predictive accuracy measures for the five models are provided in Table 3. The logistic regression model performed slightly better than the other models, with an AUC score of 0.72 and an accuracy score of 61%. The positive predictive value (model precision) for logistic regression was 22% and the sensitivity (model recall) was 72%. The harmonic mean of precision and recall (model f1-score) was 34%. For logistic regression, the default cut point value of 0.5 yielded AUC and accuracy scores as good as those of the other cut point values (data not shown). The support vector machine linear kernel model performed similarly to the logistic regression and tree-based models. The polynomial and radial basis function support vector machines performed similarly to the linear support vector machine (data not shown). The k-nearest neighbors model performed differently, with an AUC value of 0.58 and an accuracy score of 83%. Although the accuracy for this model was higher, its positive predictive value and sensitivity were 20% and 7%, respectively. The receiver operator characteristic (ROC) curves for the predictive models are provided in Fig. 2.

Table 3 Predictive accuracy measures by predictive model
Fig. 2 Receiver operator characteristic (ROC) curves for the predictive models

Figure 3 depicts calibration curves for each of the predictive models. The Y-axis is the true fraction of newborns who are low birth weight (LBW) and the X-axis is the model-predicted probability of being LBW. The worst performing model was k-nearest neighbors; the near-horizontal line for this curve indicates the model will predict a consistent LBW percentage of around 15% regardless of the true incidence of LBW. The best performing model was the linear support vector machine, which predicts nearly perfectly for the lowest incidence rates and begins to diverge around 40% incidence.

Fig. 3 Calibration curves for the predictive models

Figure 4 illustrates the permutation-based feature importance for the logistic regression model, and partial dependence plots provide the directionality of these risk factors. For the logistic model, the most important variable relative to the other variables in predicting LBW was clinical site. The partial dependence plots show a higher probability of LBW for those not in the African sites, which coincides with the descriptive statistics showing that the African sites had LBW rates approximately one-third those of the Asian sites. Following clinical site, the variables in order of importance that resulted in a higher probability of LBW were lower maternal weight, 0–3 antenatal care visits, hypertensive disorder, severe antepartum hemorrhage, severe infection during pregnancy, and lower maternal height. The random forest and linear support vector machine models also found similar variables to be the most important in predicting LBW. The most important variables for each model are provided in Table 3. Table 4 provides regression coefficients and the model intercept for the logistic regression model.
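For illustration, rankings and partial dependence plots of this kind can be produced with Scikit-learn; the sketch below assumes the fitted best_model, the held-out test data, and the importance result from the earlier sketches, and the plotted feature names are hypothetical.

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Rank features by the mean drop in ROC AUC when each is permuted
order = importance.importances_mean.argsort()[::-1]
for i in order[:10]:
    print(X_test.columns[i], round(importance.importances_mean[i], 4))

# Partial dependence of the predicted LBW probability on two continuous predictors
PartialDependenceDisplay.from_estimator(best_model, X_test,
                                        features=["weight_kg", "height_cm"])
plt.show()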

Fig. 4 Permutation-based feature importance for the logistic regression model. The permutation-based importance was implemented with the Scikit-Learn permutation_importance() method, which randomly shuffles each feature and computes the change in the model’s performance; the features that affect performance the most are the most important. The score indicates how the variable compares to other variables in the model, so a high score for any level of a categorical variable indicates the entire variable is important. For clinical sites, the reference group is Belagavi, India. For maternal age, the reference group is 20–35 years. For maternal education, the reference group is University+. For parity, the reference group is parity of 1. For socio-economic status, the reference group is 66+. For previous livebirth, yes is the reference group. For antenatal care visits, the reference group is 4+ visits

Table 4 Logistic regression intercept and model coefficients for predictive model of low birth weight

Discussion

We report a rate of LBW of 13.8% among the eight GN sites from 2017–2020, ranging from 3.8% (Kenya) to approximately 20% (in each Asian site). We found that mothers of LBW infants were more likely to experience a complication of pregnancy, such as hypertensive disorder, severe antepartum hemorrhage, severe infection of pregnancy, or fresh stillbirth. We used five predictive modeling strategies to identify pregnancy characteristics that predict the delivery of LBW infants. Of the five models tested, the logistic regression model performed the best, with an AUC of 0.72 and an accuracy of 61%. All of the top-performing models identified clinical site, maternal weight, antenatal care, hypertensive disorders, and severe antepartum hemorrhage as key variables in predicting LBW.

Our logistic regression model had reasonable performance in predicting LBW using maternal and pregnancy characteristics available prior to delivery. If we had created a model that predicted every outcome to be non-LBW, our accuracy would have been 86%, given the 14% incidence of LBW in our sample; however, the recall of such a model would be 0. The recall, or sensitivity (proportion of true positives correctly identified), of our logistic regression model was 0.72. Since this model is intended to identify women with high-risk pregnancies for referral, a preferable model is one that errs on the side of over-identification (more false positives, lower specificity) rather than under-identification (more false negatives, lower sensitivity). Our logistic regression model also had a precision, or positive predictive value (proportion of positives reported that are true positives), of 0.22. This relatively low precision means more false positives, i.e., women incorrectly identified as high-risk for LBW who go on to deliver a non-LBW infant. While over-predicting which women are at high risk of delivering a LBW infant could put strain on an under-resourced health system, this is a reasonable allowance for a screening test intended to direct women to increased surveillance.
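As a quick illustration of the baseline comparison above, the following sketch uses simulated labels at a 14% incidence (not the study data) to show that an all-negative classifier attains roughly 86% accuracy but zero recall.

import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.14).astype(int)  # ~14% simulated LBW incidence
y_all_negative = np.zeros_like(y_true)             # predict non-LBW for everyone

print(accuracy_score(y_true, y_all_negative))      # ~0.86
print(recall_score(y_true, y_all_negative))        # 0.0: no LBW infant identified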

A comparison of our results to those from prior studies illustrates the importance of studying critical variables in various populations/datasets and comparing across models. Our logistic regression model performed similarly to the predictive model for LBW that included different factors associated with LBW from a case–control study in North India with 500 neonates. That study identified inadequate maternal weight gain, inadequate maternal protein intake, prior preterm infant, prior LBW infant, anemia and passive smoking as factors significantly associated with LBW [26]. Their predictive model had a sensitivity of 72% and specificity of 64%. Another model, using the Bangladeshi Demographic and Health Survey data identified alive child, education, height, region, twin child and wealth index as significant risk factors for LBW [27]. This logistic regression-based classifier had an AUC of 0.59 and accuracy of 87.6%. A United Arab Emirates study from a dataset of 821 women evaluated 30 machine learning algorithms for LBW classification, and found that logistic regression with SMOTE oversampling techniques achieved an accuracy of 90.24% and recall of 90.2%, with critical variables of diabetes, hypertension, and gestational age [8]. Developing the best predictive model may require expanding data collection to include additional relevant predictors from a variety of prospective modeling studies, which would lead to better overall model performance.

In low-resource settings where prenatal ultrasound is infrequently available to evaluate fetal weight, identification of LBW in advance of delivery using predictive modeling could have a substantial impact on care. Our top-performing models identified a consistent cluster of variables available prior to delivery as important predictors of delivering a LBW infant, including low maternal weight, hypertensive disorder, and severe antepartum hemorrhage. These factors are detectable at a time when referral is still feasible, and thus could be incorporated into a clinical tool to predict LBW. In particular, maternal malnutrition is a major, potentially modifiable predictor of LBW identifiable early in pregnancy. Limited antenatal care was also identified as a risk factor, but this variable is confounded by the higher number of preterm infants who are LBW, since preterm delivery truncates the usual number of antenatal care visits. We suspect that improving data collection in these key domains could improve the reliability of the predictive model. For example, inclusion of additional clinical information, such as specific maternal blood pressure values, might improve the accuracy of the predictive tools and thereby enhance their clinical utility. Our predictive modeling study is the first step in the development of a clinical tool to support decisions regarding referral of pregnancies at high risk for LBW in low-resource settings. While our study did not identify novel predictors of LBW, a clinical decision support tool incorporating these results could enhance care by standardizing referral decisions related to the anticipated delivery of a LBW infant.

Our study also identified clinical site as a consistent predictor of LBW across our top-performing models. It is important to recognize that sites are not necessarily reflective of care across a country; future studies could consider analysis by geographic clusters with similar LBW rates as an alternative approach. Ultimately, the influence of site on prediction of LBW suggests that clinical tools to predict LBW should be developed within the site in which they will be used. Our analysis provides a rubric for the development of similar tools in new sites, identifying an important set of predictors for collection that are not related to site, and indicating that a traditional logistic model is sufficient for analysis. Given the importance of site in the model, additional research could also focus on understanding how site differences are related to measurable characteristics, with replication of the modeling using these new characteristics to improve predictive performance and reduce the importance of site identifiers in the model.

Our study has several notable strengths. We used high-quality, robust, prospectively collected research data from the NICHD GN MNHR. This unique, population-based dataset contains maternal characteristics, pregnancy characteristics, and delivery outcomes collected for a large number of women in seven LMICs in Latin America, Africa, and Asia. Given the paucity of detailed health records in LMICs, the MNHR is an exceptional resource with which to build a predictive model. We assessed and compared the performance of five different predictive models using independent training and test data. The side-by-side comparison showed that the logistic regression model performed similarly to the random forest and linear support vector machine models, which is encouraging since logistic regression models are widely used and less complex. However, our study also had some limitations. We were limited to the data collected in the MNHR to build the predictive models. We had missing data for socio-economic status and maternal height (primarily Kenya) in early 2017. Maternal weight and clinical site were both predictors of LBW but were related, with lower average weights in the Asian sites than in Guatemala and the African sites. While using BMI instead of weight might account for some of the difference in weight across sites, we chose to maintain maternal height and weight as separate variables in our modeling since the MNHR includes sites where stunting or underweight are serious issues. We did not have information from clinical records such as maternal blood pressure, fundal height, or other features that might have improved the precision of our model. Despite these limitations, we believe that the variables collected approximate the typical data that might be readily available in a low-resource area, where clinical variables might be difficult to obtain. We limited the analysis to five model types; other models, such as extreme gradient boosting, may have performed better.

Conclusion

We identified several predictive modeling strategies that risk-stratify women in LMICs based on their risk of delivering a LBW infant, using clinical variables readily available prior to delivery in low-resource settings. The creation of these predictive models is an important first step in the development of a clinical decision support tool to prompt early referral of women at high risk of delivering a LBW infant in LMICs. Such a clinical tool could facilitate standardized referral of these women before delivery, directing the limited resource of advanced neonatal care to the infants at highest risk. Timely, advanced care for LBW infants could reduce mortality and serious morbidity in these infants.