Background

Preterm birth (PTB) is conventionally defined as a birth that occurs before 37 completed weeks of gestation [1, 2]. Globally, complications arising from preterm birth were the leading cause of child (less than 5 years of age) mortality in 2016, accounting for 35% of neonatal deaths [3]. PTB is a unique disease in the way it is defined by the duration of gestation and not by a pathological process. The duration of gestation is the period between the date of conception and date of delivery. While the date of delivery can be documented with fair accuracy, ascertaining the date of conception is challenging. The estimation of gestational age (GA) during the antenatal period also called as the dating of pregnancy has been conventionally done using the first day of the recall-based last menstrual period (LMP) or measurement of foetal biometry by ultrasonography (USG) [4, 5]. Each of these methods poses a unique set of challenges. The accuracy of dating by LMP method is dependent on accurate recall, and regularity of menstrual cycle [4, 6] which, is affected by numerous physiological and pathological conditions such as obesity [7], polycystic ovarian syndrome [8], breastfeeding [9] and use of contraceptive methods [10].

The USG method is based on foetal biometry using crown-rump length (CRL) in the first trimester. Several formulae exist to estimate GA using CRL, including Hadlock formula [11], based on a US population-based study widely used in India [12]. However, the choice of dating formula might influence dating accuracy, as these formulae have been developed from studies that differed both in the study population and study design [13]. The error and bias due to the choice of dating formula need to be quantitatively studied to estimate the rate of PTB in a specific population [14]. In addition to its public health importance, accurate dating is essential for clinical decision making during the antenatal period, such as scheduling monitoring visits and recommending appropriate antenatal care [4].

This study first quantified the discrepancy between LMP and USG-based (Hadlock) dating methods during the first trimester in an Indian population. We characterised how each method could contribute to the discrepancy in calculating the GA. We then built a population-specific model from the GARBH-Ini cohort (Interdisciplinary Group for Advanced Research on BirtH outcomes - DBT India Initiative), Garbhini-GA1, and compared its performance with the published ‘high quality’ formulae for the first-trimester dating [13] – McLennan and Schluter [15], Robinson and Fleming [16], Sahota [17] and Verburg [18], INTERGROWTH-21st [19], and Hadlock’s formula [11] (Table S1). Finally, we quantified the implications of the choice of dating methods on PTB rates in our study population.

Methods

Study design

GARBH-Ini is a collaborative program, initiated by Translational Health Science and Technology Institute, Faridabad with partners from Regional Centre of Biotechnology, Faridabad; National Institute of Biomedical Genomics, Kalyani; Civil Hospital, Gurugram; Safdarjung hospital, New Delhi. The GARBH-Ini cohort is a prospective observational cohort of pregnant women initiated in May 2015 at the District Civil Hospital that serves a mostly rural and semi-urban population in the Gurugram district, Haryana, India. The cohort study’s objective is to develop an effective risk stratification that facilitates timely referral for women at high risk of PTB, particularly in low- and middle-income countries. Women in the GARBH-Ini cohort are enrolled within 20 weeks of gestation and are followed three times during pregnancy till delivery and once postpartum [20]. After a verbal consent to be interviewed, informed consent for screening is obtained for women at < 20-weeks of gestational age (GA) calculated by the last menstrual period. A dating ultrasound is performed within the week to confirm a viable intrauterine pregnancy with  < 20-weeks GA using standard foetal biometric parameters. A time-series data on a large set of clinical and socioeconomic variables are collected across pregnancy to help stratify women into defined risk groups for PTB. The dating ultrasound is performed by a qualified radiologist specifically trained in the study protocol. The clinical and demographic information is collected by trained, dedicated research staff under medically qualified research officers’ supervision. The data acquisition protocols and quality control measures are detailed elsewhere [20].

Sampling strategy and participant datasets derived for the study

This analysis’s samples were derived from the first 3499 participants enrolled in the GARBH-Ini study (between May 2015 to November 2017). We included 1721 participants (Np = 1721), enrolled < 14 weeks of gestation and who had information on the LMP, CRL with singleton pregnancy which advanced beyond 20 weeks of gestation, i.e. the pregnancy did not end in a spontaneous abortion or major congenital abnormalities which required medical termination of pregnancy. If a participant was enrolled < 11 weeks, dating ultrasound was done upon enrolment when CRL was measured for the first time. The same participant was asked to come for another ultrasound between 11 and 14 weeks of gestation to assess foetal morphology during which another CRL measurement was taken. If more than one scan was performed for a participant, data from both the scans were included as unique observations (No). Therefore, 1721 participants contributed a total of 2562 observations (No = 2562) that was used for further analyses, and this dataset of observations was termed as the TRAINING DATASET (Fig. 1). This dataset was used to develop a population-based dating model named Garbhini-GA1, for the first trimester.

Fig. 1
figure 1

Outline of the data selection process for different datasets – (a) TRAINING DATASET and (b) TEST DATASET. Coloured boxes indicate the datasets used in the analysis. The names of each of the dataset are indicated below the box. Exclusion criteria for each step are indicated. Np indicates the number of participants included or excluded by that particular criterion and No indicates the number of unique observations derived from the participants in a dataset

It is essential to independently evaluate models on data that was not used for building the model in order to eliminate any biases that may have been incorporated due to the iterative learning process of the model building dataset and estimate the expected performance when applying the model on new data in the real world. We used an unseen TEST DATASET created from 999 participants enrolled after the initial set of 3499 participants in this cohort (Fig. 1). The TEST DATASET was obtained by applying identical processing steps as described for the TRAINING DATASET (No = 808 from Np = 559; Fig. 1).

Assessment of LMP and CRL

The date of LMP was ascertained from the participant’s recall of the first day of the last menstrual period. CRL from an ultrasound image (GE Voluson E8 Expert, General Electric Healthcare, Chicago, USA) was captured in the midline sagittal section of the whole foetus by placing the callipers on the outer margin skin borders of the foetal crown and rump ([20], see Supplementary Figure S5). The CRL measurement was done thrice on three different ultrasound images, and the average of the three measurements was considered for estimation of CRL-based GA. Under the supervision of medically qualified researchers, study nurses documented the clinical and sociodemographic characteristics [20].

Development and validation of the population-specific gestational dating model

The gold standard or ground truth for development of first-trimester dating model was derived from a subset of participants with the most reliable GA based on last menstrual period. We used two approaches to create subsets from the TRAINING DATASET for developing the first-trimester population-based dating formula. The first approach excluded participants with potentially unreliable LMP or high risk of foetal growth restriction such as smoking, alcohol and tobacco consumption and under/overweight mothers, giving us the CLINICALLY-FILTERED DATASET (No = 980 from Np = 650; Fig. 1, Table S2). We included participants with medical complications and those who delivered preterm in our training dataset to improve representativeness of our model.

The second approach used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method to remove outliers based on noise in the data points. DBSCAN identifies noise by classifying points into clusters if there are a sufficient number of neighbours that lie within a specified Euclidean distance or if the point is adjacent to another data point meeting the criteria [21]. DBSCAN was used to identify and remove outliers in the TRAINING DATASET using the parameters for distance cut-off (epsilon, eps) 0.5 and the minimum number of neighbours (minpoints) 20. A range of values for eps and minpoints did not markedly change the clustering result (Table S3). The resulting dataset that retained reliable data points for the analysis was termed as the DBSCAN DATASET (No = 2156 from Np = 1476; Fig. 1).

The use of CRL for dating of pregnancy is restricted to the first trimester of pregnancy in clinical practice. This is because of the technical difficulties in obtaining accurate CRL measurements beyond this period. The same was practised in the GARBH-Ini cohort as it is an observational study. When an ultrasonographic examination was performed during early pregnancy, the radiologist refrained from measuring CRL if she/he was not assured of its accuracy. Instead, the radiologist measured the other foetal biometry (biparietal diameters, abdominal and head circumference and femur length to ascertain the gestational age). This resulted in a dataset with GA by CRL truncated at 14 weeks of gestation. When used for training models, such a truncated dataset may lead to inaccuracies in the model fitting particularly at the margins of the distribution around 14 weeks [22]. We considered multiple approaches used in the literature [22] and overcame this by supplementing our dataset with simulated observations from the Hadlock dataset, which measured the relationship between CRL and GA in the range of 15–18 weeks [11]. This supplemented dataset was used to build fractional polynomial models of GA as a function of CRL (see Figure S1, Table S7).

Development of a first trimester dating formula was done by fitting fractional polynomial regression models of GA (weeks) as a function of CRL (cm) on CLINICALLY-FILTERED and DBSCAN datasets. The performance of the chosen formula was validated in the TEST DATASET.

In addition to CRL as a primary indicator, a list of 282 candidate variables was explored by feature selection methods on the DBSCAN DATASET to identify other variables which may be predictive of GA during the first trimester. These methods helped to find uncorrelated, non-redundant features that might improve GA prediction accuracy (Table S4). First, the feature selection was done using Boruta [23], a random forest classifier, which identified six features and second, by implementing Generalised Linear Modelling (GLM) that identified two candidate predictors of GA. A union of these features (Table S5), gave a list of six candidate predictors. Equations were generated using all combinations of these predictors in the form of linear, logarithmic, polynomial and fractional power equations. The best fit model was termed Garbhini-GA1 formula and was validated for its performance in the TEST DATASET.

Comparison of LMP- and USG-based dating methods during the first-trimester

We calculated the difference between LMP- and USG-based GA for each participant and studied the distribution of the differences by Bland-Altman (BA) analysis [24]. Additionally, we estimated the effect of factors that could contribute to the discrepancy between GA by LMP and ultrasound. This may be due to an unreliable LMP or foetal growth restriction. We repeated the comparative analysis in our population’s subsets with accurate LMP and no risk factors for foetal growth restriction (see Additional file 1).

The mean difference between the methods and the limits of agreement (LoA) for 95% CI were reported. The PTB rates with LMP- and USG-based methods were reported per 100 live births with 95% CI. We compared different USG-based formulae using correlation analysis.

The data analyses were carried out in R versions 3.6.1 and 3.5.0. DBSCAN was implemented using the package dbscan, and the random forests feature selection was performed using the Boruta package [23]. Statistical analysis for comparing PTB rate as estimated using different dating formulae was carried out using standard t-test with or without Bonferroni multiple testing correction or using Fisher’s Exact test wherever appropriate.

Results

Description of participants included in the study

The median age of the participants enrolled in the cohort was 23.0 years (IQR 21.0–26.0), with the median weight and height as 47.0 kg (IQR 42.5–53.3) and 153.0 cm (IQR 149.2–156.8), respectively and with 59.93% of the participants having a normal first trimester BMI (median 20.09, IQR 18.27–22.59). Almost half of them were primigravida. Most of the participants (98.20%) were from middle or lower socioeconomic strata [25]. The participants selected for this analysis had a median GA of 11.71 weeks (IQR 9.29–13.0). The other baseline characteristics are given in Table 1.

Table 1 Baseline characteristics of the participants included in the TRAINING DATASET (Np = 1721) to compare different methods of dating

Comparison of USG-Hadlock and LMP-based methods for estimation of GA in the first trimester

The mean difference between USG-Hadlock and LMP-based dating at the time of enrolment was found to be − 0.44 ± 2.02 weeks (Fig. 2a) indicating that the LMP-based method overestimated GA by nearly 3 days. The LoA determined by BA analysis was − 4.39, 3.51 weeks, with 8.82% of participants falling beyond these limits (Fig. 2b) suggesting a high imprecision in both the methods. The LoA between USG-Hadlock and LMP-based dating marginally narrowed when tested on participants with reliable LMP (LoA -4.22, 3.28) or those with low-risk of foetal growth restriction (LoA -4.13, 3.21). The wide LoA that persisted despite ensuring reliable LMP and standardised CRL measurements represent the residual imprecision due to unknown factors in GA’s estimation.

Fig. 2
figure 2

a Distribution of the difference between USG- and LMP-based GA. The x-axis is the difference between USG and LMP-based GA in weeks, and the y-axis is the number of observations. b BA analysis to evaluate the bias between USG and LMP-based GA. The x-axis is mean of Hadlock and LMP-based GA in weeks, and the y-axis is the difference between Hadlock and LMP-based GA in weeks. Regression line with 95% CI is shown. c Comparison of individual-level classification of preterm birth by Hadlock- and LMP-based methods. Green (term birth for both), red (preterm birth for both), blue (term birth for LMP but preterm birth for Hadlock) and purple (term for Hadlock but preterm for LMP)

Development of Garbhini-GA1 formula for first-trimester dating

To remove noise from the TRAINING DATASET for building population-specific first-trimester dating models, two methods were used – clinical criteria-based filtering and DBSCAN (Fig. 1). When clinical criteria (Fig. 1) were used, more than two-third observations (68.46%) were excluded (Fig. 3a). However, when DBSCAN was implemented, less than one-sixth observations (15.85%) were removed (Fig. 3b). Models for first-trimester dating using CLINICALLY-FILTERED and DBSCAN datasets with CRL as the only predictor was done using fractional polynomial regression to identify the best predictive model (Figure S2). The DBSCAN approach provided a more accurate dataset (i.e. no artefacts as observed in the CLINICALLY-FILTERED DATASET) with lesser outliers. We, therefore, used DBSCAN DATASET for building dating models. Comparison among various dating models showed that the best regression coefficient (R2) was for quadratic regression (R2 = 0.86, Table S6). This provided the basis for using the following quadratic formula as the final model for estimating GA in the first trimester and was termed as Garbhini-GA1 formula:

$$ \mathit{\mathsf{GA}}=-\mathsf{0.02294}{\left(\mathit{\mathsf{CRL}}\right)}^{\mathsf{2}}+\mathsf{1.15018}\left(\mathit{\mathsf{CRL}}\right)+\mathsf{6.73526} $$
Fig. 3
figure 3

Comparison of data chosen to be reference data for the development of dating formula by (a) clinical and (b) data-driven (DBSCAN) approaches. The x-axis is CRL in cm, and the y-axis is GA in weeks (LMP-based are datapoints, Garbhini-GA1 is regression line). After filtering, the data points selected (TRUE) are coloured black and points not selected (FALSE) are white

where GA is in weeks, and CRL is in cm.

A multivariate dating model including CRL and the six additional predictors identified by data-driven approaches (GLM and Random forests): resident state, weight, BMI, abdominal girth, age, and maternal education, did not improve the performance of the CRL-based dating model (Figure S3, Table S6).

Comparison of published formulae and Garbhini-GA1 formula for estimation of GA

The actual test of the validity of a formula is to estimate GA reliably in an unseen sample population. We tested the published formulae’s performance (Table S1) and Garbhini-GA1 formula independently on the TEST DATASET (Figure S4). It was observed that Garbhini-GA1 had an R2 value of 0.58 (Table S8). All other formulae performed identically to Garbhini-GA1 on the TEST DATASET (Table S8). Furthermore, all possible pairwise BA analysis of these formulae (including Garbhini-GA1) showed that the mean difference of estimated GA varied from − 0.17 to 0.50 weeks (Table 2). This result shows that Garbhini-GA1 performs equally well as other formulae.

Table 2 Pairwise comparison of mean difference (LoA) between different first-trimester dating formulae (Difference: Column formula - Row formula). Values shown in white are for the TRAINING DATASET (No = 2562) and values shown in grey are for the TEST DATASET (No = 808) (see Methods for details)

Impact of the choice of USG dating formula on the estimation of the rate of PTB

The PTB rates estimated using different methods ranged between 11.27 and 16.5% with Garbhini-GA1 estimating the least (11.27%; CI 9.70, 13.00), followed by LMP (13.99%; CI 12.25, 15.86), Hadlock (14.53%; CI 12.77, 16.43), and Robinson-Fleming formula being the highest (16.50%; CI 14.64, 18.49). Among all pairwise comparisons performed, the differences in PTB rates estimated by Garbhini-GA1 compared with Robinson-Fleming or McLennan-Schluter were statistically significant (Fisher’s Exact test with Bonferroni correction for p < 0.05, Table S9). Furthermore, Garbhini-GA1 formula had the highest sensitivity and balanced accuracy (Table S10).

When these methods were used to determine PTB at an individual level, the Jaccard similarity coefficient (a statistic used for gauging the similarity and diversity of sample sets) ranged between 0.49–0.98 (Table 3). Interestingly, even though the two most used methods of dating, LMP and USG-Hadlock had similar PTB rates (13.99 and 14.53%, respectively) at the population-level, the Jaccard similarity coefficient was only 0.49 suggesting a poor agreement between the methods at an individual-level (Fig. 2c, Table 3).

Table 3 The Jaccard similarity coefficient of PTB classification between each pair of the method

Discussion

Principal findings

This study’s primary objectives were to compare different methods and formulae used for GA estimation during the first trimester, develop a population-specific dating model for the first trimester, and study the differences in PTB rate estimation using these formulae. Our findings show that the LMP-based method overestimates GA by 3 days compared to the USG (Hadlock) method. While this bias does not impact at the population level with similar overall PTB rates determined by both methods, interestingly, there is less than 50% agreement between these methods on who are classified as preterm at an individual level.

This is consistent with the pattern observed in a recent study from a Zambian cohort [26]. The Hadlock formula for USG-based estimation of GA was developed on a Caucasian population and has been used for several decades globally [12]. We developed and tested population-specific dating formula to estimate GA in an Indian setting. The CRL-based Garbhini-GA1 formula performed the best and addition of other clinical and sociodemographic predictors identified from machine learning tools did not improve the performance of CRL-based Garbhini-GA1 formula. While most of the dating formulae estimated similar PTB rates, Garbhini-GA1 formula estimated the lowest PTB rate and had the best sensitivity to determine preterm birth.

Strengths of the study

The Garbhini-GA1 formula developed from Indian population overcomes the low representativeness of existing dating formulae. Using advanced data-driven approaches, we evaluated multiple combinations of various clinical and sociodemographic parameters to estimate gestational age. We conclusively show that CRL is the sufficient parameter for first-trimester dating of pregnancy and the addition of other clinical or social parameters do not improve the performance of the dating model. Further, to build Garbhini-GA1 formula, we used a data-driven approach to remove outliers that retained more observations for building the model than would have been possible if the clinical criteria-based method had been used to develop the reference standard. Another important strength of our study is the standardised measurement of CRL. This reduces the imprecision to the minimum and makes USG-based estimation of gestational age accurate.

Limitations of the data

For the development of Garbhini-GA1 model, it would have been ideal to have used documented LMP collected pre-conceptionally. Since our GARBH-Ini cohort enrols participants in the first trimester of pregnancy, clinical criteria based on data collected using a questionnaire was used to derive a subset of participants with reliable LMP. This was relatively incomplete as we had residual imprecision, which was not accounted for by the clinical criteria. We tried to overcome this limitation by using data-driven approaches to improve precision.

To address the truncation problem [22], we supplemented observations simulated from Hadlock distribution. While it is possible that the supplemented data points from the Hadlock formula could be different from our population data, since CRL is not measured beyond 14 weeks as standard clinical practice, this is the best possible way to address this issue.

There is some evidence to demonstrate an association between CRL (early suboptimal growth) and early preterm birth [27]. To assess CRL as a metric of early foetal growth, we need an alternative accurate dating method as a good reference for comparison. The last menstrual period is the alternative used for such evaluations. In this manuscript, we have demonstrated the inaccuracy of LMP-based dating method in our population, making it difficult to assess the influence of early suboptimal growth, reflected by CRL, on our model.

Interpretation

The LMP-based dating is prone to errors from recall and irregularity of menstrual cycles due to physiological causes and pathological conditions. The overestimation of GA by the LMP-based method seen in our cohort has been reported in other populations from Africa and North America [26, 28]. However, the magnitude of overestimation varies, as seen in studies done earlier [26, 28, 29]. These differences could be attributed to the precision and accuracy with which these cohorts’ participants recalled their LMP. In our study, the bias in LMP-based dating was not reflected in the population-level PTB rates; however, at an individual level, LMP and USG-Hadlock had less than 50% agreement in the classification of PTB. Such considerable discordance is concerning as the clinical decisions during the early neonatal period largely depend on GA at birth. Further, any clinical and epidemiological research studying the risk factors and complications of PTB will be influenced by choice of dating method.

As shown by BA analysis, Garbhini-GA1 formula based on first-trimester CRL of our study population can be interchangeably used with Hadlock, INTERGROWTH-21st, Verburg and Sahota but not with McLennan-Schluter and Robinson-Fleming formulae. We get similar GA estimates using Hadlock, INTERGROWTH-21st, Verburg and Sahota formulae, which indicates that GA estimate using CRL is robust. However, even minimal difference in GA estimation leads to significantly different preterm estimates. The higher sensitivity of Garbhini-GA1 formula to classify PTB in our study population is encouraging but should be externally validated in other populations within the country before it can be recommended for application. It would be useful to evaluate the performance of population-specific formulae for second and third trimesters of gestation as ethnic differences in foetal growth might manifest more during this period.

In this study, we aimed to strike a balance between developing an accurate model and retaining the representativeness to the general population. We did not exclude medical complications of pregnancy in order to ensure a large unselected population-based cohort so that our model was more representative of the general population that would be encountered by an obstetrician in India.

Conclusions

LMP overestimates GA by 3 days compared to USG-Hadlock method, and only half of the preterm birth were classified correctly by both these methods. CRL-based USG method is the best for GA estimation in the first trimester, and the addition of clinical and demographic features does not improve its accuracy. Garbhini-GA1 formula is an Indian-population based formula for estimating GA in the first trimester based on CRL as the prime parameter. It has better sensitivity than the more commonly used Hadlock formula in estimating the PTB rate. Our results reinforce the need to develop population-specific GA formulae. These results need to be further validated in subsequent multi-ethnic cohorts before being applied for broader use.