Background

Preterm birth (PTB) is defined as delivery before 37 gestational weeks and is a leading cause of infant morbidity and mortality [1,2,3,4]. It is estimated that 15 million babies are born preterm annually and the PTB rate ranges between 5 and 18% worldwide [3]. Specifically, the prevalence of PTB varies by geographic location ranging from 12 to 13% in the USA [1, 2] and from 5 to 9% in Europe [2]. Advances in neonatology and the administration of corticosteroids before birth have significantly improved the prognosis of babies born preterm [5]. Even though vigorous research was carried out over the last 40 years, which costed millions of dollars and focused on the prediction and prevention of preterm birth its incidence remains relatively unchanged [5]. The most probable explanation is that preterm birth is a syndrome, and many different causes may act synergistically to its manifestation [5].

Numerous systematic reviews and meta-analyses have assessed various, non-genetic risk factors of preterm labor. Several environmental and clinical parameters such as present pregnancy characteristics, previous pregnancy history [4], infections [6, 7], environmental exposures, pharmaceutical factors [8], and surgical interventions have been proposed as plausible factors related to PTB [9]. To date, there is no assessment of the epidemiological quality of this literature. Identifying robust risk factors for PTB should help us define a study population for specific interventions, allocate available resources effectively, allow for risk-specific treatment, and understand the mechanisms leading to PTB [1].

To our knowledge, there is no previous effort to summarize existing evidence of meta-analyses of non-genetic risk factors for PTB. We conducted an umbrella review across published meta-analyses of observational studies, including topics related to a wide range of risk factors including obstetric history, medical history, drugs, socioeconomic status indicators, and environmental and dietary risk factors, with the goal of mapping the existing evidence and critically evaluating the reported associations. We applied stringent criteria to assess potential systematic biases.

Methods

We conducted an umbrella review which is a comprehensive and systematic approach that collects and critically evaluates all systematic reviews and meta-analyses performed on a specific research topic [10]. We used previously described, standardized methods that have been already used in published umbrella reviews referring to risk factors related to various outcomes [11,12,13,14] and have been elaborated below.

A protocol for this umbrella review was registered in the International prospective register of systematic reviews (PROSPERO 2021 CRD42021227296).

Search strategy

Two researchers (A.E., I.M.) independently searched PubMed and Scopus databases from inception to February 2021, to identify systematic reviews with meta-analyses of studies that examine the association between risk factors and preterm birth. The search strategy included combinations of the Medical Subject Headings (MESH) terms, keywords, and word variants for terms “preterm birth” AND (“systematic review” OR “meta-analysis”). Titles and abstracts were screened, and potentially eligible articles were retrieved for full-text evaluation. A detailed description of our search strategy is provided in Additional file 1.

Eligibility criteria and data extraction

We included systematic reviews with meta-analyses investigating the association between various types of exposures as risk factors for PTB. Specifically, we included studies with singleton pregnancies and studies where PTB was evaluated as the primary outcome. Case reports or series and individual participant data meta-analyses were excluded. We also excluded studies that set time limits on time span or were performed on a restricted setting (i.e., conducted for one specific country). Studies that assessed PTB as a secondary outcome were also excluded. We excluded meta-analyses that assessed PTB as a secondary outcome for two reasons; first, in any analysis of a secondary outcome, there is a possibility of lack of power to detect an effect, given that studies design their power calculations based on a primary outcome. Therefore, any assessment of effect size, heterogeneity, and other statistics would be meaningless under the lack of power. Second, some components of the grading of evidence, such as publication bias, are assessed based on the primary outcome of the studies and could not be evaluated for secondary outcomes. Furthermore, we excluded studies including multiple pregnancies and studies that assessed genetic or over -omics features as risk factor for PTB. All studies were compared to avoid the possibility of duplicate or overlapping samples. If more than one meta-analysis referring to the same research question was eligible, parameters such as the largest amount of component studies with data on individual studies’ effect sizes, publication year, and in some cases the number of participants on individual studies were considered to retain the appropriate one for the main analysis.

Publications whom estimates of the studied associations, such as relative risks (RR) and 95% confidence intervals (CIs) were not reported or could not be retrieved/calculated were excluded from the analysis. For the non-environmental risk factors, we also excluded meta-analyses that did not provide the number of cases in the exposed and non-exposed groups, which is used for the calculation of the excess significance test. For the environmental risk factors, since most commonly they report the results as per unit(s) increase in exposure and the entire population is exposed, we included them even if they did not report the number of cases and total sample size. As most of the included meta-analyses did not report the number of cases or the sample size of the studies included, we were unable to estimate the power of each meta-analysis and the excess significance test.

Eligible articles were screened by four independent reviewers (AE/IM and EB/TK). Any disagreement between reviewers was resolved by consensus or after the evaluation of a third author (SP or EE). The data of eligible studies were extracted in a predefined data extraction form recording for each study the first author, journal, year of publication, the examined risk factors, and the number of reviewed studies. Either the study-specific relative risk estimates (risk ratio, odds ratio, hazard ratio, incidence rate ratio) and the confidence intervals were extracted or the mean and the standard deviation for continuous outcomes were also noted in this form. We also extracted the exposed and control groups used, outcome assessed, study population, exposure characteristics, number of studies in the meta-analysis, meta-analysis metric and method, effect estimate with the corresponding 95% confidence interval, number of cases and total sample size, I2 metric and the corresponding χ2 P-value for the Q test, and Egger’s regression P-value.

Assessment of summary effect and heterogeneity

We re-calculated summary effects and 95% confidence intervals (CIs) for each meta-analysis via fixed and random effects model [15, 16]. 95% prediction intervals (PI) were also computed for the summary random-effects estimates, which further account for between-study heterogeneity indicating the uncertainty for the effect that would be expected in a new study examining the same association [17, 18]. A PI describes the variability of the individual study estimates around the summary effect size and represents the range in which the effect estimate of a new study is expected to lie.

The largest study was considered to be the most precise if there was a difference between the point estimate and the upper or lower 95% confidence interval less than 0.20 [19]. If the largest study presented a statistically significant effect, then we recorded this as a part of the grading criteria.

Between-study heterogeneity was assessed and P-value of the χ2-based Cochran Q test and the I2 metric for inconsistency (reflecting either diversity or bias) was reported, too. I2 metric was used to indicate the ratio of between-study variance over the sum of within- and between-study variances, ranging from 0 to 100% [20]. Values exceeding 50% or 75% are usually considered to represent large or very large heterogeneity, respectively [21]. 95% confidence intervals were calculated as per Ioannidis et al. [21].

Assessment of small-study effects

Small studies tend to give substantially larger estimates of effect size when compared to larger studies. We evaluated the evidence of the presence of the small-study effects, to identify publication and other selective reporting biases. They can also reflect genuine heterogeneity, chance, or other reasons for differences between small and large studies [22]. We evaluated whether smaller (less precise) studies lead to inflated effect estimates compared to larger studies. We used the regression asymmetry test proposed by Egger, which examines the potential existence of small-study effects via funnel plot asymmetry [23]. Egger’s test fits a linear regression of the study estimates on their standard errors weighted by their inverse variance. Indication of small-study effects based on Egger’s asymmetry test was claimed when P-value ≤ 0.10. This is considered as an indication of publication bias; indication of small-study effects based on Egger’s asymmetry test was claimed when P-value ≤ 0.10 and the random effects summary estimate was larger compared to the point estimate of the largest (most precise) study in the meta-analysis.

Excess statistical significance evaluation

The excess significant test was applied to evaluate the existence of a relative excess of significant findings in the published literature for any reason (e.g., publication bias, selective reporting of outcomes or analyses) [24]. This is a binomial test evaluating whether the number of positive studies in a meta-analysis was too large according to the power that these studies have to detect plausible effects at α = 0.05. The power of each component study was calculated using the fixed-effects summary, the random effects summary, or the effect size of the largest study (smallest SE) as the plausible effect size [13] using an algorithm using non-central t distribution to calculate the power of each study [25]. Excess statistical significance for single meta-analyses was claimed at P < 0.10 (one-sided P < 0.05, with observed > expected as previously proposed), given the power to detect a specific excess will be low, especially with few positive studies [24].

Grading of evidence

We followed a 4-level grading (robust, highly suggestive, suggestive, and weak) to evaluate the strength of the evidence based on the following criteria: number of cases, summary random-effects P-value, between-studies heterogeneity, 95% PI, small-study effects bias, and excess statistical significance [26]. This grading approach based on these parameters was used because it allows for an objective, standardized classification of the level of evidence and has been previously shown to provide consistent results with other more subjective grading schemes [26,27,28,29,30].

Briefly, meta-analyses were considered to be supported by robust evidence if the association was supported by more than 1000 cases, a highly significant association (the random effects model had a P-value ≤ 10−6, a threshold that is considered to substantially reduce false positive findings) [31,32,33], there was absence of high heterogeneity based on I2 < 50%, the 95% PI excluded the null value, and there was no evidence of small-study effects or excess statistical significance. Highly suggestive evidence required more than 1000 cases, a highly significant association (a random-effects P-value ≤ 10−6), and the largest study in the meta-analysis was significant at P < 0.05. Associations based on meta-analyses with a random-effects P-value ≤ 10−3 and included more than 1000 cases [31,32,33] were graded as suggestive evidence. The remaining significant associations at P < 0.05 were graded as weak evidence. We need to highlight that this specific grading scheme focuses on the reduction of false-positive findings and the evaluation of potential biases in the studied associations. Therefore, the set of criteria used here is not ideal for a detailed evaluation of non-significant associations and to distinguish insufficient evidence from robust evidence of no association. That would require a different approach and another set of criteria altogether that would focus on the power of the meta-analyses to observe a significant effect, which was beyond the scope of our review. This grading system has been evaluated [34] and showed that these criteria may offer relatively independent and complimentary insights into the evidence of an observational association. Other systems such as GRADE [35] and ROBIS [36] have focused mainly on evaluating randomized evidence from RCTs or non-randomized studies of intervention.

Statistical analyses were performed using STATA version 14 (StataCorp, TX, USA).

Results

Description of eligible meta-analyses

The search identified 2985 items, of which 2420 were excluded after a review of the title and abstract (Fig. 1, PRISMA flowchart). Of the remaining 565 articles that were reviewed in full text, eight articles did not report the appropriate information for the calculation of excess of statistical significance (either because the total sample size was missing or the study-specific relative risk estimates were missing), and 134 articles were excluded because a larger systematic review or meta-analysis investigating the same risk factor was available. From the 219 comparisons, we further excluded the ones that included one or two studies (53 comparisons). Therefore, 218 articles were analyzed, of which 133 were systematic reviews without any quantitative component and 84 were meta-analyses. The 84 eligible meta-analyses [4, 6,7,8,9, 37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115] included data on 166 comparisons and 1480 primary studies.

Fig. 1
figure 1

Flow diagram for the selection of included studies

Summary effect sizes and significant findings

Three to 152 studies, with a median of 11 studies, were included per meta-analysis. The median number of cases and total population in each study were 91 and 1004, respectively. The median number of cases and total population in each meta-analysis was 7266 and 94,907, respectively. The number of cases was greater than 1000 in 94 comparisons. Overall, 570 (38,5%) individual studies observed statistically significant results at P < 0.05. Thirty-nine meta-analyses used the Newcastle-Ottawa scale to assess qualitatively the included primary studies. One meta-analysis used assessment criteria for non-randomized observational studies adapted from Duckitt and Harrington, 3 meta-analyses used the Methodological Index for Non-Randomized Studies (MINORS), and 37 meta-analyses used other assessment tools. Four meta-analyses did not perform any quality assessment. Details of the 166 comparisons that included 1480 individual study estimates are summarized in Additional file 2.

Of the 166 comparisons, 99 (59.3%) had statistically significant findings at P < 0.05 using the random-effects model, of which 93 reported an increased risk and six a decreased risk for preterm birth. The associations identified to decrease the risk of PTB are the following: preconception care vs no care [47], magnesium supplementation vs placebo [72], single vs double embryo transfer [87], high gestational weight gain vs normal gestational weight gain [106], interpregnancy interval following miscarriage < 6 months vs > 6 months [108], and greenery including only a 100-m normalized difference vegetation index (NDVI) buffer [113]. Of these, a total of 61 (36.8%) associations presented statistically significant effect at P < 0.001, while only 41 (24.7%) remained significant after the application of a more stringent P-value threshold of P < 10−6 (Table 1).

Table 1 Assessment across the statistically significant associations for preterm birth

Between-study heterogeneity and prediction intervals

Forty-four (26.5%) comparisons had large (I2 ≥ 50% and ≤ 75%) and 47 comparisons (28.3%) had very large (I2 > 75%) heterogeneity estimates (see Additional file 2). When calculating the 95% PIs, the null value was excluded in only thirty-one (18.7%) comparisons.

Small-study effects

Evidence for statistically significant small-study effects (Egger test P < 0.10 and random-effects summary estimate larger compared with the point estimate of the largest study in the meta-analysis) was identified in 37 (22.3%) comparisons (see Additional file 2).

Test of excess statistical significance

Evidence of excess-statistical-significance bias was observed in 12 (7.2%) associations, with statistically significant (P < 0.05) excess of positive studies under any of the three assumptions for the plausible effect size, i.e., the fixed-effects summary, random-effects summary or results of the largest study (see Additional file 2). In addition, the observed and expected number of positive studies showed that, overall, the excess of positive results was driven by meta-analyses with large estimates of heterogeneity (I2 > 50%).

Grading of evidence

The summary of the epidemiological credibility for 166 associations of risk factors for PTB is shown in Additional file 2. Seven of the 166 associations (4.2%) were supported by robust evidence (amphetamines, fetus with isolated single umbilical artery, maternal personality disorder, sleep-disordered breathing (SDB), prior induced termination of pregnancy (I-TOP) with vacuum aspiration (VA), low gestational weight gain, and interpregnancy interval (IPI) following miscarriage less than 6 months) (see Additional file 3). Twenty-six associations (15.7%) were supported by highly suggestive evidence referring to obstetric history, medical history, social and economic profile, and drugs (see Table 1). Fifteen associations (9%) were supported by suggestive evidence, including pre-gravid oral contraceptive use, marijuana, severe maternal morbidity (SMM), periodontal disease, women of short stature, antipsychotics during pregnancy, Trichomonas vaginalis infection, blastocyst-stage embryo transfer (vs cleavage stage embryo-transfer), fresh blastocyst transfer (for PTB), fresh blastocyst transfer (for very PTB), HPV infection (crude), HPV infection (age-adjusted), > 1 prior surgical I-TOP, any type of treatment for cervical intraepithelial neoplasia (CIN) with a cone depth ≥ 20 mm (compared to untreated CIN), and greenery.

Regarding the environmental risk factors, higher residential greenness did not technically qualify to be categorized as robust evidence because the random effects P-value was 3.25 × 10−6 but fulfilled all other criteria. The rest of the associations regarding different levels of exposure to air pollutants [particulate matter with aerodynamic diameter less than or equal to 2.5 μm (PM2.5), nitrogen dioxide (NO2)] in all windows of exposure were classified as weak.

Discussion

In this umbrella review, we evaluated the current evidence, derived from meta-analyses of observational studies on the association between various risk factors and PTB. Overall, from the 166 associations that have been examined, only 4.2% had epidemiologically robust results with no suggestion of bias, as can be inferred by substantial heterogeneity between studies, small-study effects, and excess significance bias. Seven risk factors were supported by robust evidence, including amphetamine exposure [53], isolated single umbilical artery [60], maternal personality disorder [70], sleep-disordered breathing measured with objective assessment [80], prior induced termination of pregnancy with vacuum aspiration compared to no termination [98], low gestational weight gain compared to normal weight gain [107], and interpregnancy interval following miscarriage less than 6 months compared to more than 6 months [108]. Several others had highly suggestive evidence including intimate partner violence [39] and unmarried women [52], cancer survivors [41], African/Black race [49], placental previa [43], hemorrhagic and hepatic disorders [68], endometriosis [8], chronic kidney disease [104], and treatments for CIN [112].

Interpretation in the light of evidence

Some of the risk factors identified from our analysis as robust are well-known risk factors and have been incorporated into the screening processes during prenatal visits such as illicit drug use, ultrasonographic markers, and reproductive history [5]. Nevertheless, we identified a few that are not receiving the attention they should during prenatal visits even though they demonstrate robust evidence.

This includes maternal psychosocial profile and sleeping quality that are either rarely screened during prenatal visits or not considered by clinicians as risk factors for PTB. Traditionally, emphasis was given to factors such as cervix length and history of PTB and their obstetric management [5]. Screening and early intervention on maternal personality disorders and SDB during pregnancy should be further evaluated at prenatal visits and potentially contribute to PTB prevention.

Interpregnancy intervals

One association with highly controversial evidence is interpregnancy interval following a miscarriage and the risk of preterm birth. The World Health Organization (WHO) encourages women who experienced a previous miscarriage to wait for a minimum of 6 months before the next conception to achieve optimal outcome and reduce obstetric complications such as preterm birth [116]. Contrary to the findings of the research on which WHO based its recommendations, some studies reported that the risk of adverse obstetric outcomes including preterm birth is lower in women who conceived less than 6 months after a pregnancy loss [117,118,119], while synthesizing all available data provided the same conclusion [108]. This meta-analysis included eight studies and performed two analyses: one including the study of Conde Agudelo 2004 [120] and one excluding it, and robust results were obtained after excluding this study. While this was a large retrospective study on which the WHO guidelines for delaying pregnancy for at least 6 months [116] are based, it did not differentiate between induced and spontaneous abortions and used data from many countries where induced abortion is illegal [120], therefore should be interpreted with caution. More recent studies have criticized methods used in the previous studies; therefore, the question remains open as to the causal effect of short interpregnancy intervals after miscarriage on adverse obstetric outcomes remains unknown [121, 122]. After a miscarriage, there is a very small burden on the folate reserve, and thus, miscarriage is not very likely to lead to folate deficiency in the postpartum period, so miscarriage and delivery later in pregnancy may have differential effects on subsequent pregnancy [123, 124]. This could explain the reduced risk of adverse outcomes in a short IPI after a miscarriage [123] but not after delivery. In support of this hypothesis, there is evidence to suggest that late miscarriages (after 12 weeks of gestation) are associated with worse outcomes in the subsequent pregnancy [124]. In addition, most women who attempt another pregnancy soon after a miscarriage are likely to be motivated to take better care of their health and consequently result in better pregnancy outcomes [125,126,127]. Another plausible reason may be that those who conceive soon after a miscarriage are naturally more fertile and younger and consequently have better pregnancy outcomes. Therefore, even though the characteristics of the meta-analysis included in our assessment classified this association as robust for a protective effect, given the complex causal structure of these associations, interpretations should be made with caution.

Sleep disorders and mental health

Another risk factor with robust evidence was sleep-disordered breathing. This meta-analysis clearly demonstrated the increased risk profile of women who experience SDB not only for preterm birth but for other adverse pregnancy outcomes [80]. Regarding plausible mechanisms, the association between SDB and intermittent maternal hypoxia as well as the link with conditions synonymous with impaired placental function such as pre-eclampsia suggests a multifactorial cause, with both physiologic changes associated with pregnancy and placental dysfunction involved [80]. This robust association has clear implications for obstetric practice. First, given the rapidly increasing worldwide obesity rates, SDB is likely to become more prevalent in the pregnant population and it should be introduced in screening. Second, the increased risk for both adverse intrapartum and perinatal outcomes demonstrated in this review strongly supports the need for increased surveillance in women who experience SDB during pregnancy. Third, public health education programs must take into account the specific maternal and perinatal risks and promote education about the significance of obstructive sleep apnea symptoms and the need for women to discuss this with their obstetric caregivers. Screening for sleeping habits and suggesting more frequent follow-up for women with such symptoms have the potential to reduce the burden of PTB.

In alignment with this suggestion, women with personality disorders could be identified early through mental health screening, where targeted health interventions and multidisciplinary management can be implemented to reduce adverse outcomes for the baby/child and woman. This early identification and support also have the potential to enable the prevention of maladaptive development trajectories within the mother-infant relationship [128, 129].

The ability to modify those factors, mainly those related to mental health and sleep quality screening, through clinical interventions or public health policy measures remains to be established. Nevertheless, we need to highlight that there is no guarantee that even a convincing observational association for a modifiable risk factor would necessarily translate into large preventive benefits for preterm birth if these risk factors were to be modified [8].

Clinical practice and medical history

Another association that fulfilled all criteria for a robust association is the prior I-TOP with VA. Concerns have been expressed regarding the validity of the reported association mainly due to the quality of the primary studies [130]. Many of them did not adjust for strong confounders such as parity, prior PTB, race, and smoking [98, 130]. The analysis of primary studies that reported data on cofounders and adjusted the risk estimated on these cofounders, revealed a greater increase in the PTB incidence [98]. This is supported by the fact that women who underwent an I-TOP usually have a low socioeconomic status and are likely to be exposed to a variety of factors related to PTB [98]. Moreover, abortion is a reported outcome that is accompanied by social stigma and, therefore, can be omitted from the medical history, leading to a high risk of differential misclassification. This highlights the need to thoroughly examine the other possible biases that can be identified in a meta-analysis even in the case that the epidemiologic criteria classify an association as robust.

Furthermore, it is important that clinical examination and medical history includes risk factors which are not well known, identified in meta-analysis with highly suggestive evidence. Regarding highly suggestive evidence, there were a few that are well known and used to classify pregnancies as high risk for PTB such as therapies for cervical intraepithelial neoplasia, advanced maternal age, placental pathology, race, first trimester bleeding, and maternal comorbidities. There were also included factors that are not routinely screened in the obstetric population such as intimate partner violence, cancer survivors, and being unmarried.

Obesity is generating an unfavorable metabolic environment from early gestation; therefore, initiation of interventions for weight loss during pregnancy might be belated to prevent or reverse adverse effects, which highlights the need for weight management strategies before conception [75, 106, 107, 131]. Moreover, obesity is becoming a global epidemic, while assessing the strength of evidence that supports the impact of overweight and obesity in comorbidities such as sleep-disordered breathing could allow not only the identification of women at high risk for adverse outcomes including PTB, but also better prevention. PTB does not only increase the risk for maternal and infant complications, but also significantly increases a woman’s risk of cardiovascular disease (CVD) after pregnancy; therefore, primary prevention of obesity could lead to multiple benefits [132,133,134,135].

Environment

Regarding environmental risk factors, increased residential greenness was associated with a protective effect on the risk of PTB. Although this finding was categorized as having suggestive evidence, the P-value of the random effect estimate was very close to the stringent threshold of < 10−6. Acknowledging the detrimental projected effect of climate change in greenness and given that it is one of the few protective risk factors for PTB [113], serious efforts should be made to maintain and grow residential greenness. Possible mechanisms include among others amelioration of the effects of air pollutants, reduction of stress, and increase in physical activity [113]. There was also suggestive evidence for early pregnancy exposure to PM2.5 and the risk of PTB. This association has been debated in the literature with conflicting results about the timing and magnitude of effect and is less robust than other associations that have been shown to have strong evidence for associations [136] such as birth weight.

Strengths and limitations

To our knowledge, this umbrella review represents the most comprehensive overview of published literature of observational studies to date investigating associations between a wide array of risk factors and PTB. The epidemiological robustness of meta-analyses of observational studies was assessed against a transparent and replicable set of statistical criteria. In addition, we performed a deeper assessment of these associations and assessed their potential to test causal assumptions. Our assessment has certain limitations. Umbrella reviews focus on existing systematic reviews and meta-analyses and therefore some studies may have not been included either because the original systematic reviews did not identify them, they were too recent to be included, or they did not provide the data to be included. In the current assessment, we used all available data from observational studies; therefore, the meta-analysis estimates may partly reflect the biases in the primary studies. Statistical tests of bias in the body of evidence (small-study effect and excess significance tests) offer hints of bias, not definitive proof thereof, while the Egger test is difficult to interpret when the between-study heterogeneity is large. These tests have low power if the meta-analyses include less than 10 studies and they may not identify the exact source of bias [22, 24, 34]. More specifically, in our study, all robust evidence applied to meta-analyses with less than 10 studies; therefore, the results of publication bias should be interpreted with caution. Furthermore, we did not appraise the quality of the individual studies on our own, since this should be included in the original meta-analysis and it was beyond the scope of the current umbrella review. However, we recorded whether and how they performed a quality assessment of the synthesized studies. Lastly, we cannot exclude the possibility of selective reporting for some associations in several studies. For example, perhaps some risk factors were more likely to be reported, if they had statistically significant results. Diving deeper into the associations that were classified as robust, we detected some issues beyond the prespecified criteria that are traditionally applied for umbrella reviews.

Therefore, it is recommended that future umbrella reviews perform a comprehensive assessment of the associations beyond the classic criteria.

Conclusions

The present umbrella review of meta-analyses identified 166 unique risk factors for preterm birth. Our analysis identified seven risk factors with robust evidence and strong epidemiological credibility pertaining to isolated single umbilical artery, amphetamine exposure, maternal personality disorder, sleep-disordered breathing, induced termination of pregnancy with vacuum aspiration, low gestational weight gain, and interpregnancy interval following miscarriage of less than 6 months, but the results should be interpreted with caution. As previously suggested, the use of standardized definitions and protocols for exposures, outcomes, and statistical analyses may diminish the threat of biases, enhance comparability of different studies examining risk factors, and promote the development and training of prediction models that could identify high-risk populations and promote public health.