Introduction

First trimester miscarriage is the most common complication in early pregnancy. Increased maternal age is the principal risk factor [1,2,3], and its impact is fuelled by delayed childbearing in many societies [4]. Maternal age is also associated with an increased frequency of the ominous symptoms of pain and bleeding [5]. Therefore, an increasing proportion of pregnant women seek clinical reassurance of viability from an early pregnancy unit [3]. Moreover, readily available and dependable home tests for human chorionic gonadotropin (hCG) and easily accessible transvaginal sonography (TVS) have contributed to the increased incidence of reassurance visits [6]. If the initial TVS is inconclusive, current management often requires at least a two-day observation period to follow the development of hCG, and a follow-up scan [7]. Consequently, couples frequently describe severe anxiety [8, 9], and tests to ameliorate the uncertainties of first trimester viability are warranted [10]. The 2021 article series from the Lancet shed light on the significant lack of evidence surrounding miscarriage, emphasizing the need for a global reform of its care [11,12,13].

Early pregnancy diagnostics are usually based on demographic, biochemical or ultrasonic data. Combining all three data sources may improve the prediction of outcome compared with one-parameter diagnostics. However, when Pillai and colleagues [14] reviewed the performance of such combinations, they found the studies too few and too heterogeneous to conduct a relevant meta-analysis. Furthermore, currently available studies considering gestational age (GA)-dependent risk factors for miscarriage use logistic regression models that do not acknowledge the possibly changing risk throughout the first trimester [15,16,17,18,19,20].

Based on the Prospective Early Pregnancy cohort of women with naturally conceived and expectedly healthy pregnancies, this study aimed to describe the best combination of variables for assessment of the dynamically altering risks of miscarriage during the first trimester.

Methods

The PEP Cohort

From June 2016 to March 2017, the Prospective Early Pregnancy cohort (PEP cohort) recruited Danish women in early pregnancy for serial TVS and blood samples every second week until 11–14 weeks’ gestation, with the primary aim of developing a biobank of data to be used in research intended to improve prediction of early pregnancy outcome. Inclusion criteria were a positive urine-based hCG test and < 8 weeks’ gestation in women older than 18 years. Exclusion criteria comprised a history of recurrent pregnancy loss (≥ 3 consecutive losses), known anatomical abnormalities of the uterus or tubes, discovery of multiple gestations, and pregnancies from medically assisted reproductive (MAR) treatment. At the first visit, participants were excluded if heavy vaginal bleeding, pain that prompted any surgical intervention or uterine anatomical anomalies were diagnosed. From TVS, an intrauterine pregnancy (IUP) was defined as an intrauterine yolk sac within the gestational sac or a single embryo with fetal heartbeat. If the first scan was inconclusive, the patient had a pregnancy of unknown location (PUL) and was booked for another visit per protocol. If the second scan remained inconclusive, a third visit was set up. If the scan was still inconclusive at this stage, the patient was excluded from the project and referred to the local early pregnancy unit for final diagnosis and treatment. Participants contacted the study if symptoms developed between visits, and emergency consultations could be planned to ascertain viability. The woman informed the investigator if a miscarriage was diagnosed. Ectopic pregnancies were to be excluded by a priori decision, but none occurred.
The final outcome was either a biochemical pregnancy (positive test, no pregnancy seen), anembryonic IUP or missed miscarriage (yolk sac but no embryo or embryo ≥ 6 mm without a heartbeat) or a spontaneous miscarriage (unprompted vaginal bleeding with or without conceptus remnants, preceded by IUP) [21] or confirmed viability at the TVS from 11–14 weeks’ gestation. A postpartum follow-up of predefined pregnancy and delivery data was completed in December 2018.

Baseline Variables

Using a pre-enrollment validated digital questionnaire in an electronic case report form (SMART-TRIAL, Medei, Aalborg, Denmark), participants recorded demographic, medical history, lifestyle, and socioeconomic data. Participants also graded their state of physical health on a 1–5 Likert scale (5 being best), rating their own subjective feeling about their physical health. All male partners were asked to complete the same questionnaire. At the next visit, all answers were reviewed for validation.

Transvaginal Sonography and Bleeding Assessment

GA was determined from the self-reported last menstrual period (LMP), as participants were often recruited before the crown-rump length (CRL) could be measured. When a CRL (in mm) was available from TVS (GE Voluson i BT14, GE Healthcare, Solingen, Germany), it was recorded and used for calculations [22]. Mean gestational sac diameter (MSD) was calculated as the sum of the sac diameters in three orthogonal planes (in mm) divided by three. All scans were carried out by the same investigator (MD, OBGYN resident, 2 years of TVS training). The woman quantified any observed bleeding using a pictorial blood assessment chart (PBAC) [23] showing the degree of staining of a disposable sanitary product from 0 to 4.

Blood Samples and Laboratory Procedures

Blood was drawn using Vacutainer serum separator or K2EDTA plasma tubes (BD Diagnostics, Franklin Lakes, NJ, USA). The blood was centrifuged (Hettich Rotina 380 R, Andreas Hettich GmbH, Tuttlingen, Germany), aliquoted into polypropylene RNase- and DNase-free microcentrifuge tubes and stored at −80 °C. The samples were analyzed eight months after the last patient had completed the trial. The PEP cohort biobank procedure has previously been published in more detail by our group in a study of pregnancy-specific reference intervals of 29 commonly used analytes [24]. All these analytes were available for this study and were used for its exploratory aim: the development of early pregnancy prediction models (Supplementary Table 1).

Statistical Analysis

Two-sample group comparisons were performed using either the Chi-squared test or t-test. For asymmetrically distributed data, the Kruskal–Wallis test was used. Baseline variables without time-dependence were modeled by logistic regression and reported as univariate odds ratios (OR) with 95% confidence intervals (CI) of miscarriage, also after multivariate adjustment according to backwards elimination of the individual variables (aOR). Time-dependent variables were visualized in scatterplots as functions of gestational age with outcome-specific mean curves obtained from generalized additive models.
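As an illustration of how a univariate odds ratio and its 95% confidence interval arise, the sketch below computes both from a 2×2 table using the Wald method. For a single binary predictor this coincides with the estimate from a univariate logistic regression. The function name and the counts are hypothetical and are not taken from the study data.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: exposed, event       b: exposed, no event
    c: unexposed, event     d: unexposed, no event
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not study data)
or_est, ci_low, ci_high = odds_ratio_wald(10, 20, 27, 146)
print(f"OR {or_est:.2f}, 95% CI [{ci_low:.2f}; {ci_high:.2f}]")
```

Adjusted odds ratios (aOR) require fitting a multivariable model and cannot be obtained from a single 2×2 table in this way.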

The effects of the time-dependent variables on the probability of miscarriage were assessed using joint modeling of longitudinal and survival data [25]. In the joint models, the longitudinal sub-models were generalized linear mixed models with natural cubic spline parameterizations of the means and random effects. Cox proportional hazards regression with gestational age as the time scale was used for the survival sub-models, in which the subject-specific means from the longitudinal sub-models entered as time-dependent covariates. Dichotomized age (below or above 35 years) was also included in the Cox model as a constant covariate. The level of hCG was transformed using the base-2 logarithm, and bleeding was dichotomized as bleeding or no bleeding.
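One practical consequence of the base-2 logarithm is that a hazard ratio for the transformed hCG refers to a doubling of hCG. The sketch below shows how such a per-doubling hazard ratio rescales to any other fold change under the proportional hazards assumption. The function name and the numerical value are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def hazard_ratio_for_fold_change(hr_per_doubling, fold_change):
    """Rescale a hazard ratio per doubling (covariate on a log2 scale).

    With covariate x = log2(hCG) and coefficient beta, HR per doubling is
    exp(beta). A k-fold change shifts x by log2(k), so the corresponding
    hazard ratio is exp(beta * log2(k)) = hr_per_doubling ** log2(k).
    """
    if hr_per_doubling <= 0 or fold_change <= 0:
        raise ValueError("hazard ratio and fold change must be positive")
    return hr_per_doubling ** math.log2(fold_change)


# Hypothetical value for illustration only (not a study estimate)
HR_PER_DOUBLING = 0.5
hr_quadrupling = hazard_ratio_for_fold_change(HR_PER_DOUBLING, 4.0)
hr_no_change = hazard_ratio_for_fold_change(HR_PER_DOUBLING, 1.0)
print(hr_quadrupling, hr_no_change)
```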

Univariate (including one time-dependent covariate) and multivariate (including three time-dependent covariates) joint models were considered, and the effects of the time-dependent covariates on the probability of miscarriage were given as hazard ratios (HR). For the multivariate joint models, all possible combinations of three time-dependent covariates were analyzed giving a total of 56 models. The ability to predict miscarriage in each model was compared using WAIC (Watanabe–Akaike information criterion) [26]. This criterion provides an estimate of the out-of-sample prediction accuracy and was used to rank the models with respect to their predictive performance.
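For readers unfamiliar with WAIC, the following minimal sketch computes it on the deviance scale from a matrix of pointwise log-likelihoods evaluated at posterior draws, where lower values indicate better expected out-of-sample prediction. It is a generic textbook formulation written in plain Python, not the routine used for the reported models, and the input values are hypothetical.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC on the deviance scale.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s. At least two draws are needed for the variance term.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # Numerically stable log of the mean likelihood across draws
        m = max(column)
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # Sample variance of the log-likelihood across draws
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)


# Hypothetical log-likelihoods: 3 posterior draws, 2 observations
model_a = [[-1.0, -2.0], [-1.1, -2.1], [-0.9, -1.9]]
model_b = [[-2.0, -3.0], [-2.1, -3.1], [-1.9, -2.9]]
ranked = sorted({"A": waic(model_a), "B": waic(model_b)}.items(), key=lambda kv: kv[1])
print(ranked)  # models ordered from lowest to highest WAIC
```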

Univariate joint models were fitted using the R package JM [27], while multivariate joint models were fitted using the rstanarm package [28]. Statistical significance was set at 5%.

Results

Study Population

In Fig. 1 we present the flow-chart of participants in the PEP cohort from the initial expression of interest to the number of women who reached the final outcome. Of 218 interested women, 203 were included in the analyses. Viable IUP after 11–14 weeks’ gestation was seen in 166 women (82%), and 37 women (18%) miscarried. The study accumulated 715 visits (3–4 visits per participant). Two women were diagnosed with fetal chromosomal abnormalities after the first trimester and had induced abortions; they remained in the analyses as ongoing pregnancies. Live births of 164 healthy singleton neonates were recorded. In 90% (95% CI 85–95%) of women the presence of a fetal heart rate before 8 weeks’ gestation was followed by a subsequent live birth.

Fig. 1

Flowchart of participants in the Prospective Early Pregnancy (PEP) cohort from expressed interest to final outcome. Red boxes show drop-outs due to various reasons from analyses

Baseline Demographics and Risks

Women who miscarried were on average 2 years older than women with ongoing pregnancies (31 ± 5 vs. 29 ± 4 years, p = 0.021). In uni- and multivariate analyses, each additional year of maternal age raised the odds of miscarriage by 9% and 18%, respectively (OR 1.1, 95% CI [1.0;1.2], p = 0.02; aOR 1.2, 95% CI [1.1; 1.3], p < 0.01). Table 1 shows the baseline maternal and paternal characteristics divided by outcome, and Fig. 2 depicts the Kaplan–Meier survival curve with the event of miscarriage according to the three age groups. Women > 35 years were particularly at risk in the earliest part of the pregnancy but also showed an overall increased risk (aOR 7.5, 95% CI [2.3; 26], p < 0.01), while women aged 30–35 years had the same odds of miscarriage as younger women.

Table 1 Baseline maternal and paternal characteristics compared by ongoing pregnancy or miscarriage. Odds ratios (OR) of miscarriage from uni- and multivariable logistic regression
Fig. 2

Kaplan Meier survival curve with the event of miscarriage in three maternal age groups according to gestational age. Also showing the log rank test significance for between-group comparison

Odds of miscarriage were significantly increased in the adjusted model for obese (> 30 kg/m2) women (aOR 3.4, 95% CI [1.1; 10], p = 0.03) and reduced for women with two or more previous deliveries (aOR 0.1, 95% CI [0.01; 0.6], p = 0.02) (Table 1). Unknown status of menstrual cycles was more common in the miscarriage group (14% vs. 2%, p = 0.008), and the odds of miscarriage were increased (OR 5.9, 95% CI [1.5; 25], p = 0.01). Odds of miscarriage were significantly increased if a woman graded her own physical health as 4 compared to 5 (reference group) (OR 3.4, 95% CI [1.1; 15], p = 0.05). The only significant paternal factor was age (p = 0.02). The male partner was on average three years older than the woman (33 ± 6 vs. 30 ± 5 years).

Variables According to Gestational Age

Before 7 weeks’ gestation, the CRL and MSD had similar trajectories regardless of pregnancy outcome. After this point, their trajectories differed significantly, as clearly visualized by the differences between the green (healthy pregnancies) and red (miscarriages) data points of Fig. 3. In Table 2 we provide a detailed evaluation of the measured levels of estradiol, progesterone and hCG in quantiles before and after 7 weeks’ gestation. The odds of a live birth typically increased when comparing lower (reference) to higher quantiles, both before and after 7 weeks’ gestation.

Fig. 3

Connected observations per participant for CRL, MSD, progesterone, estradiol, albumin, CA125 and hCG colored by miscarried (red, dashed line) or ongoing (green, full line) pregnancy with a thicker-line smoothed mean and 95% CI according to gestational age

Table 2 Odds of a live birth by quantiles of biochemical measurands before and after 7 weeks' gestation

In Table 3 we provide unadjusted HRs with 95% CIs for each investigated variable from the three sources: baseline, sonography and blood. Increased MSD, CRL, estradiol, progesterone and log2hCG showed a significantly reduced risk of miscarriage in the serially collected GA-dependent unadjusted data (p < 0.001 for all mentioned). Of the remaining 26 investigated analytes: albumin, CA125, cholesterol, CRP and creatinine were also significantly associated with miscarriage. Dichotomized maternal age showed a 2.6-fold increased HR (p = 0.012) for miscarriage in the ≥ 35 years old group. Bleeding was also dichotomized as present or absent with a 1.1-fold increased miscarriage HR (p = 0.008) if present.

Table 3 Unadjusted hazard ratio (HR) with 95% CI for miscarriage in all statistically significant variables. Categorical variables (italics) show the presence effect. Continuous variables show the effect of a one-unit (or otherwise specified) increase (CRL: crown-rump-length, MSD: mean gestational sac diameter, CA: cancer antigen, hCG: human chorionic gonadotropin)

Figure 3 shows all eight selected variables for modelling, colored by outcome and plotted according to GA with lines between the values for each woman.

Combined Biomarkers

From the unadjusted HR in Table 3, the following factors were selected for the modelling of dynamic survival probability, combining no more than three variables: CRL, MSD, bleeding, estradiol, progesterone, albumin, CA125 and hCG. Dichotomized age was added to all models for a total of 56 combinations. The results of each model are available in Supplementary Table 2 and ranked from 1–56 by decreasing prediction accuracy (increasing values of WAIC). From all models, the best combination of variables for prediction of miscarriage was age, hCG, CRL and bleeding. The second-best model was the sonography-absent model of maternal age, bleeding, hCG, and estradiol.
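The count of 56 follows from choosing three of the eight candidate variables, with dichotomized age added to every model. A short enumeration can confirm this; the variable labels below are shorthand for the covariates named in the text, and the code is only an illustrative sketch of the model space, not the fitting procedure.

```python
from itertools import combinations
from math import comb

# The eight time-dependent candidates named in the text
CANDIDATES = [
    "CRL", "MSD", "bleeding", "estradiol",
    "progesterone", "albumin", "CA125", "hCG",
]

# Each model: dichotomized age plus three time-dependent covariates
models = [("age",) + trio for trio in combinations(CANDIDATES, 3)]

print(len(models), comb(len(CANDIDATES), 3))
```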

In Fig. 4 (see the figure legend for all details) we provide a theoretical example of survival probability calculations from updated values between the first and a follow-up visit. Panels A and B use the best model of hCG, CRL and bleeding and show the change in survival probability from an insufficient hCG development. Panels C and D use the second-best model combining bleeding, hCG and estradiol and show the change in survival probability from an insufficient estradiol development.

Fig. 4

Pregnancy survival probability curve for a theoretical 36-year-old woman who presents in week 6 with vaginal bleeding, a CRL of 5 mm and hCG of 15,000 U/L (A). One week later (B), the CRL had increased sufficiently to 8 mm, but the probability of survival is shown according to two types of hCG increase (sufficient: full line at 30,000 U/L vs. insufficient: dashed line at 16,000 U/L). The second-best combination of variables predicted the outcome for the same woman at her first visit (week 5, estradiol at 1.0 nmol/L) (C) and one week later (D) according to different estradiol increases (full line, 2.0 nmol/L vs. dashed line, 1.1 nmol/L)

Discussion

Our study used a longitudinal dataset from the first trimester and introduced a novel approach to predicting outcomes using joint statistical modeling. The longitudinal data highlighted key differences between the investigated variables according to GA. We found that a combination of maternal age, bleeding, hCG, and CRL was the most accurate predictor of outcome. When CRL was unavailable, estradiol provided a useful alternative.

Interestingly, our study found that estradiol was a more effective predictor of outcome than progesterone, which was unexpected based on previous research [3, 19, 20]. Although the ideal tool for predicting miscarriage has yet to be developed, it has been the subject of many investigations. Previous studies have primarily focused on progesterone as the main biochemical predictor of miscarriage. However, in our data, the best combined model including progesterone was ranked 20th (with CRL and hCG). By examining women throughout the first trimester, we found that the dynamic alterations of hCG and estradiol provided a more accurate description of miscarriage prediction. Whittaker et al. conducted a prospective cohort study from 1978–1985, which resembled our design, to describe the first trimester gestational trajectory of serially collected estradiol, progesterone, and hCG related to miscarriage [29]. Their findings corroborate ours, indicating that dynamic changes in estradiol and progesterone are important in identifying impending miscarriages before 7 weeks of gestation. Moreover, a 2016 meta-analysis also supports the potential utility of estradiol in predicting miscarriage, and a recent retrospective study identified estradiol as the most effective marker of miscarriage prediction [7, 30]. In a recent ESHRE abstract, a plausible explanation of serum estradiol and outcome prediction was proposed. The study collected paired vaginal microbiome and serum estradiol (and progesterone) samples from 100 women during the first trimester. It was found that women who had a live birth despite having low levels of pregnancy-favorable Lactobacillus species [31] had significantly higher levels of estradiol compared to those with similar favorable Lactobacillus species depletion who miscarried. This was not the case for progesterone. 
Additionally, longitudinal samples from these high-estradiol women who had a live birth showed increasing levels of Lactobacillus species later in pregnancy [32]. Estradiol has been shown to increase glycogen storage in the vaginal mucosal lining, hypothetically supporting the dominance of Lactobacillus species [31].

Considering sonographic markers of viability, Bottomley et al. developed a prediction model for women with an intrauterine pregnancy of uncertain viability without blood sampling. Combining maternal age, bleeding assessment, GA, MSD and presence of a yolk sac provided an AUC of 0.77 [18]. The authors re-tested the original models and updated with a new GA-independent version in an external prospective validation study and confirmed the results at an AUC of 0.85 [16]. Finally, the group used the model in the original cohort of all consecutively examined women (irrespective of first TVS findings and therefore resembling our population), and got an AUC of 0.92 in 1435 participants [17]. Although these studies demonstrate accurate prediction of viability with non-invasive methods, they are subject to potential inaccuracies due to patient- and operator-dependent factors such as bleeding quantification and TVS quality. Moreover, their estimates were based on logistic regression models using only one or two values from each participant, which may introduce selection bias. In contrast, our study utilized individualized time-varying hazard ratios, providing a more precise assessment of risk.

The coveted ideal prediction model for fetal viability would likely combine multiple data sources. In 2003, Elson et al. developed a logistic regression-based model including serum progesterone, MSD and maternal age in women with an intrauterine pregnancy of uncertain viability. The area under the receiver operating characteristic (ROC) curve (AUC) was 0.97 [20], and a follow-up trial found an AUC of 0.85 [19]. In the latter trial, testing was more likely to be carried out in women with an expected higher risk of miscarriage, which may explain the lower performance. Also, as the test was still at an experimental stage, it was only used in about 9% of eligible patients, adding to the selection bias.

The recruitment of patients in the PEP cohort before the onset of symptoms allowed us to evaluate risk factors for miscarriage prior to entry into care [33]. In line with previous reports, we found an increased risk of miscarriage with maternal age ≥ 35 years [1, 3]. This may be explained by increased aneuploidy in older women, and these losses also occurred more frequently before 7 weeks’ gestation (Fig. 2) [34]. Obesity (BMI > 30 kg/m2) increased, and the birth of two or more children decreased, the risk of miscarriage [35, 36]. Ideally, our study setup and power would have allowed for an evaluation of the effect modification of BMI on all serially collected variables, but this was not possible with the available sample size. A 2014 systematic review clearly documented an association between smoking and miscarriage [37]. Probably due to a lack of statistical power, our study found no association between smoking status and risk of miscarriage. Unknown menstrual cycle status increased the risk of miscarriage but with wide confidence limits. The nine women who gave this answer had all become pregnant shortly after giving birth or after finishing long-term use of hormonal contraception. When asked about usual menstrual pattern, one woman reported irregular periods. Backwards elimination removed this variable from the adjusted calculations. Collectively, we consider this a chance finding of minor importance.

The risk of miscarriage increased with lower physical well-being scores. At baseline (before the outcome was known), more women in the group with a viable fetus reported perfect physical satisfaction (grade 5). The motivation for giving lower grades was not explored further but may be linked to unhealthy behavior known to increase the risk of miscarriage (like smoking or being overweight). Further, older women who know that the risk of miscarriage increases with age may apply this concern when describing their own health. This variable was also of minor statistical importance and excluded from the adjusted calculations. This finding is consistent with Maconochie et al.’s retrospective analysis of more than 6,000 women’s latest pregnancy, which found a lower risk of miscarriage in those who felt “well enough to fly or have sex” [36]. No paternal variables were significant.

Our data only allowed a combination of three variables (besides maternal age) for estimation of dynamically altering risk. From the statistically significant covariates, we selected the feasible variables of CRL, MSD, bleeding (dichotomized), estradiol, progesterone and hCG for modelling. We added CA-125 as a novel candidate that has previously shown discriminative potential in a 2001 study by Schmidt et al. and was found to be the best marker of threatened miscarriage in a 2016 Human Reproduction Update paper. Our study did not corroborate this finding. Albumin is known to decrease significantly in a healthy first trimester pregnancy, as data from the PEP cohort have shown previously. It was therefore added to explore a possibility for differentiation, which was not found [7, 38,39,40,41]. In 2019, a prospective study found that dichotomized bleeding was equal to grading the amount of first trimester bleeding in prediction of later gestational pathology [42]. The unreliable graded estimation of bleeding was therefore removed, reducing patient-subjective recall bias. Given that heavy bleeding is a serious risk factor for miscarriage, we would have included the graded data if statistically possible [18].

Fetal ploidy was unknown, as testing of the products of conception was not feasible. Moreover, our aim was to develop a model that could alleviate some of the anxiety of early pregnancy, and the currently available diagnostic work-up does not distinguish by ploidy. Individualized genetic counselling may change this in the future [43, 44] alongside technological advancements in the artificial intelligence interpretation of early pregnancy ultrasonography [45]. Fetal heart rate (FHR) quantification has been shown to be a useful predictor of viability [14] but was left out for safety reasons, as the heat induced by Doppler-based estimation of FHR may affect developing embryos before 10 weeks’ gestation according to the manufacturer.

The overall sample size was insufficient for an estimate of each model’s ability to predict positive and negative cases (e.g. AUC). This unfortunate limitation was primarily due to the challenging logistics of serial collection, although we note that our aim was to evaluate potential models. Any follow-up trials for validation of the models should take great care to secure an appropriate sample size.

Developing tools for prediction of outcome comes at a price: a model may predict an increased risk of miscarriage in an otherwise low-risk pregnancy and thereby cause more anxiety, contrary to the intention behind its development. Implementation of such models should therefore focus on women with a clear indication for testing, such as previous miscarriages or maternal age > 35 years.

In our prospective cohort of naturally conceived, serially followed and expectedly healthy pregnancies, maternal age, bleeding, hCG, and CRL provided the best model. A TVS-independent model of maternal age, bleeding, hCG, and estradiol also performed well. Surprisingly, estradiol was a better predictor of miscarriage than progesterone.