Background

In the early 1990s, the “fetal origin hypothesis” of adult diseases was proposed to explain the observed associations between low birth weight (BW) and cardiovascular diseases in adult life [1–5]. Barker, who first observed these associations, hypothesized that fetal under-nutrition may lead to disproportionate fetal growth and program later coronary heart disease risk [6].

Since then, the importance of early life and the intrauterine environment in relation to later disease has been widely acknowledged and studied [1, 6–10]. BW is considered a marker of the intrauterine environment and has been extensively studied in epidemiological research, both in terms of its predictors and, mainly, in relation to subsequent disease. The examined phenotypes expanded beyond cardiovascular conditions into a wide range of outcomes and traits, including respiratory disease [8, 11], cancer [12, 13] and psychiatric outcomes [14]. At the same time, acknowledging its importance, WHO included low BW (<2500 g) as one of its 2025 targets, namely a 30 % reduction in the number of infants born with a BW below 2500 g by 2025 [15]. During the last two decades, interest in the potential health risks associated with high BW (>4000 g) has also emerged, and associations between high BW and the risk of adverse health outcomes have been studied in an increasing number of scientific papers.

Interpreting associations between BW and the occurrence of health problems later in life is, however, challenging and linked to a series of methodological limitations [16]. Despite the attention that BW has received in public health policy and epidemiological research, a comprehensive assessment of the proposed associations between BW and future disease is lacking. In the current study, we applied the methodology of umbrella reviews to map all the outcomes that have been associated with low and high BW and we applied a standardized approach to assess the credibility of the findings in order to identify which associations are supported by robust evidence.

Methods

Literature search and eligibility criteria

We performed an umbrella review, which is a comprehensive and systematic collection and evaluation of multiple systematic reviews and meta-analyses performed on a specific research topic [17]. We followed a standardized procedure that has already been applied in the appraisal of observational associations in other research fields [18–21]. We systematically searched PubMed from inception to December 24, 2015, to identify systematic reviews and meta-analyses of observational studies examining associations of BW with medical conditions, traits and biomarkers. We used the following search algorithm: (“birth weight” OR “birth size” OR “small for gestational age” OR “large for gestational age” OR “fetal growth restriction” OR “intra-uterine growth restriction”) AND (systematic review* OR systematic literature review* OR meta-analys*). We excluded meta-analyses examining genetic or environmental determinants of BW. We further excluded the meta-analyses of individual participant data that did not report the study-specific estimates and pooled analyses that only summarized evidence across a non-systematically selected number of cohort studies or that did not present the study-specific effect estimates of component studies [22–27]. We did not apply any limitation based on language of publication.

Data extraction

Two independent researchers extracted the data (LB, CK), and in the case of discrepancies, the final decision was that of a third researcher (IT). From each eligible article, we recorded the first author, journal, year of publication, examined outcomes and number of studies included. We also extracted the study-specific effect sizes (risk ratio, odds ratio, hazard ratio, mean difference and regression coefficient) along with the corresponding 95 % confidence intervals and the number of cases and controls in each study for each association. Whenever the sample sizes were not available through the meta-analysis, we retrieved the original reports to record them. Further, when multiple comparisons were available for a particular phenotype (e.g. < 2500 g vs. ≥ 2500 g and < 2500 g vs. 2500–4000 g) we always preferred to extract information on < 2500 g versus ≥ 2500 g and > 4000 g versus ≤ 4000 g in the case of low BW and high BW, respectively. However, when this comparison was not available, we extracted the comparison reported by the meta-analysis. For the excluded meta-analyses assessing an overlapping association, we recorded the level of comparison and the summary effect estimate along with the 95 % confidence interval. Additionally, we scrutinized the full-text of the eligible papers to examine whether their authors discussed the potential effect of gestational age in the association of BW with subsequent health outcomes.

Statistical analysis

For each meta-analysis, we estimated the summary effect size and its 95 % confidence interval with both fixed-effects and random-effects models [28, 29]. We also estimated the 95 % prediction interval, which further accounts for between-study heterogeneity and evaluates the uncertainty for the effect that would be expected in a new study addressing that same association [30, 31].
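The pooling step described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: it assumes inverse-variance weighting with the DerSimonian-Laird estimator of between-study variance (the text cites the models without naming the estimator), and the function name `pool` and its parameter `t_crit` are hypothetical. The prediction interval uses the usual form, summary ± t × sqrt(τ² + SE²), with the t quantile supplied by the caller.

```python
import math
from statistics import NormalDist


def pool(log_effects, std_errors, t_crit=None):
    """Pool log effect sizes; return fixed- and random-effects summaries.

    t_crit (hypothetical parameter): 97.5th percentile of the t distribution
    with k - 2 degrees of freedom, needed only for the prediction interval.
    """
    k = len(log_effects)
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum_w

    # Cochran's Q and the DerSimonian-Laird estimate of tau^2 (assumed estimator)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    se_random = math.sqrt(1.0 / sum(w_re))

    z = NormalDist().inv_cdf(0.975)
    result = {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum_w),
        "random": random,
        "se_random": se_random,
        "tau2": tau2,
        "ci_random": (random - z * se_random, random + z * se_random),
    }
    if t_crit is not None:
        # 95 % prediction interval for the effect expected in a new study
        half = t_crit * math.sqrt(tau2 + se_random ** 2)
        result["prediction_interval"] = (random - half, random + half)
    return result
```

With SciPy available, the quantile could be obtained as `scipy.stats.t.ppf(0.975, k - 2)`; it is passed in here only to keep the sketch free of third-party dependencies.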

In the case of meta-analyses with continuous outcomes, the standardized mean difference was transformed to an odds ratio with an established formula [32]. Between-study heterogeneity was assessed by the I² metric [33]. I² ranges between 0 % and 100 % and is the ratio of between-study variance over the sum of the within-study and between-study variances [34]. Values exceeding 50 % or 75 % are usually judged to represent large or very large heterogeneity, respectively.
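The two quantities in this paragraph can be illustrated with a short sketch. Two assumptions are made that the text does not confirm: the conversion is taken to be the logistic-distribution formula ln(OR) = (π/√3) × SMD, since the source cites the formula without stating it, and I² is computed from Cochran's Q as (Q − df)/Q, the usual estimate of the variance ratio described above. Function names are hypothetical.

```python
import math


def smd_to_log_or(smd, se_smd):
    """Convert a standardized mean difference and its SE to the log odds ratio scale.

    Assumes the logistic-distribution conversion ln(OR) = pi / sqrt(3) * SMD.
    """
    factor = math.pi / math.sqrt(3)
    return smd * factor, se_smd * factor


def i_squared(q, k):
    """I^2 in percent from Cochran's Q and the number of studies k."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0


def heterogeneity_label(i2):
    """Label I^2 using the 50 % and 75 % cut-offs quoted in the text."""
    if i2 > 75:
        return "very large"
    if i2 >= 50:
        return "large"
    return "not large"
```

For example, `math.exp(smd_to_log_or(0.5, 0.1)[0])` gives the odds ratio that corresponds to an SMD of 0.5 under this assumed conversion.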

We assessed whether there was evidence for small-study effects (i.e. whether smaller studies tend to give substantially larger estimates of effect size compared with larger studies) with the regression asymmetry test proposed by Egger et al. [35, 36]. A P value less than 0.10 with a more conservative effect in the largest study than in random-effects meta-analysis was judged to be evidence for small-study effects.
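A minimal sketch of the regression asymmetry test follows. It regresses the standard normal deviate (effect divided by its standard error) on precision (1 divided by the standard error) and examines the intercept. To stay within the standard library, it returns the t statistic and degrees of freedom and leaves the P value to the caller. The helper `largest_study_more_conservative` encodes one reading of the second criterion, namely that the most precise study lies closer to the null than the random-effects summary; that reading and all names are assumptions.

```python
import math


def egger_test(effects, std_errors):
    """Egger regression: returns (intercept, se_intercept, t_statistic, df)."""
    k = len(effects)
    x = [1.0 / se for se in std_errors]                # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standard normal deviate
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual variance and the standard error of the OLS intercept
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + mx ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, k - 2


def largest_study_more_conservative(effects, std_errors, random_summary):
    """True if the most precise study is closer to the null (0 on the log scale)."""
    largest = min(range(len(effects)), key=lambda i: std_errors[i])
    return abs(effects[largest]) < abs(random_summary)
```

A two-sided P value for the intercept could then be taken from a t distribution with `df` degrees of freedom, for instance with `scipy.stats.t.sf`, and compared against the 0.10 threshold.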

We applied the excess statistical significance test, which assesses whether the observed number of studies with nominally significant results is larger than their expected number [37]. This test assesses whether the number of positive studies among those in a meta-analysis is too large based on the power that these studies have to detect plausible effects at an α of 0.05. The expected number of studies with significant results is calculated in each meta-analysis by the sum of the statistical power estimates for each component study. The power of each component study was estimated using the effect size of the largest study (smallest SE) in a meta-analysis and the power calculation was based on an algorithm using a non-central t distribution [38, 39]. Excess statistical significance for single meta-analyses was claimed at P < 0.10 [37]. For four associations, the power calculations and the excess statistical significance test were not performed, because the sample sizes of the component studies could be retrieved neither from the meta-analysis papers nor from the original reports.
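The logic of the excess significance test can be sketched as below. Note one deliberate substitution: power is approximated with a two-sided normal (z) test rather than the non-central t algorithm the study used, so the numbers will differ slightly from those obtained with that algorithm. The comparison of observed and expected counts uses a chi-square statistic with one degree of freedom, which is an assumed form of the test; all function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def study_power(theta, se, alpha=0.05):
    """Approximate power of a two-sided z test to detect effect theta (log scale)."""
    z = _N.inv_cdf(1 - alpha / 2)
    d = abs(theta) / se
    return (1 - _N.cdf(z - d)) + _N.cdf(-z - d)


def excess_significance(effects, std_errors, alpha=0.05):
    """Return (observed, expected, p_value) for the excess significance test."""
    n = len(effects)
    # Plausible effect: the estimate from the largest study (smallest SE)
    largest = min(range(n), key=lambda i: std_errors[i])
    theta = effects[largest]

    expected = sum(study_power(theta, se, alpha) for se in std_errors)
    z = _N.inv_cdf(1 - alpha / 2)
    observed = sum(1 for e, se in zip(effects, std_errors) if abs(e) / se > z)

    if expected >= n:  # every study has power of about 1: no excess is possible
        return observed, expected, 1.0
    diff2 = (observed - expected) ** 2
    chi2 = diff2 / expected + diff2 / (n - expected)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return observed, expected, p_value
```

Under the rule quoted above, excess significance would be flagged when `observed > expected` and `p_value < 0.10`.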

Finally, we identified the associations that had the strongest validity and were not suggestive of bias. Specifically, we considered as convincing the associations that met the following criteria: significance under the random-effects model at P < 1 × 10⁻⁶, more than 1000 cases, not large between-study heterogeneity (I² < 50 %), 95 % prediction interval excluding the null value, and no evidence of small-study effects and excess significance bias. Additionally, the associations with a statistically significant effect at P < 1 × 10⁻⁶, more than 1000 cases, and a statistically significant effect in the largest study were characterized as having highly suggestive evidence. We considered as suggestive the associations that had more than 1000 cases and a statistically significant effect under the random-effects model at P < 1 × 10⁻³. The remaining statistically significant associations at P < 0.05 under the random-effects model were graded as weak associations.
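The grading rules above translate directly into a small decision function. The sketch below assumes the criteria are checked from the strongest grade to the weakest, so that an association receives the highest grade it qualifies for; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Association:
    p_random: float                  # P value under the random-effects model
    n_cases: int                     # total number of cases
    i2: float                        # I^2 in percent
    pi_excludes_null: bool           # 95 % prediction interval excludes the null
    small_study_effects: bool        # evidence of small-study effects
    excess_significance: bool        # evidence of excess significance bias
    largest_study_significant: bool  # largest study nominally significant


def grade(a: Association) -> str:
    """Assign the evidence grade described in the text (strongest first)."""
    if a.p_random >= 0.05:
        return "not significant"
    if (a.p_random < 1e-6 and a.n_cases > 1000 and a.i2 < 50
            and a.pi_excludes_null
            and not a.small_study_effects
            and not a.excess_significance):
        return "convincing"
    if a.p_random < 1e-6 and a.n_cases > 1000 and a.largest_study_significant:
        return "highly suggestive"
    if a.p_random < 1e-3 and a.n_cases > 1000:
        return "suggestive"
    return "weak"
```

An association with a very small P value but fewer than 1000 cases falls through to "weak" under these rules, which matches the wording that all remaining nominally significant associations are graded as weak.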

The statistical analyses were performed with STATA version 12.0 and the power calculations were performed using STATA version 12.0 and G*Power version 3.1.

Results

Overall, the literature search identified 1520 articles, of which 39 articles, published between 2005 and 2015, were deemed eligible (Fig. 1). Sixty-three papers were screened by full-text. Of these, 10 examined the same or related phenotypes in the same population (defined as overlapping meta-analysis), six were individual participant data meta-analyses that did not report the study-specific effect estimates, and 12 were systematic reviews without a quantitative synthesis. The 39 eligible papers included 78 different meta-analyses (Table 1): 28 assessing low BW, four assessing small-for-gestational age infants, 18 assessing high BW, and 28 assessing a dose–response association between BW and subsequent health outcomes. A wide range of health outcomes has been studied ranging from anthropometry and metabolic disease, cardiovascular disease and cardiovascular risk factors, various cancers, respiratory diseases and allergies, musculoskeletal traits, and perinatal outcomes. Both neonatal and childhood conditions as well as adult diseases have been extensively examined (Table 1). Only two eligible papers had access to raw data of primary studies and performed an individual-level data meta-analysis [8, 40].

Fig. 1 Flow chart of literature search

Table 1 Quantitative synthesis, bias assessment and credibility assessment of 74 associations between different comparisons of birth weight and health outcomes or traits

Overall, we identified more than one published meta-analysis for 25 outcomes, i.e. meta-analyses examining the same exposure (birth weight) and the same outcome. Overlapping meta-analyses provided concordant results, with the exception of two pairs (diastolic blood pressure and overweight), in which the summary effect was in the opposite direction to that of the meta-analysis included in our umbrella review (the largest, most recently published meta-analysis). Six other meta-analyses differed from the most recent one in the statistical significance of the summary effect (Additional file 1: Table S1).

Associations with low BW

Across 28 meta-analyses examining low BW as a dichotomous trait, the median number of cases was 5766 (interquartile range (IQR), 1574–11,729), while the median number of datasets was 11 (IQR, 8–16). Overall, 21 out of 28 associations had more than 1000 cases, 17 of 28 meta-analyses presented a nominally significant effect (P < 0.05) and 10 of them had a significant effect at P < 0.001. Only seven meta-analyses, examining the association of low BW with perinatal mortality in developing countries, wheezing disorders in childhood, being overweight or obese in adulthood, coronary heart disease, intelligence in adolescence, all-cause mortality, and chronic kidney disease, were statistically significant at P < 1 × 10⁻⁶ under the random-effects model (Table 1). The largest study had a standard error of less than 0.10 in 17 meta-analyses and a more conservative effect compared to the random-effects model in 15 meta-analyses. Four meta-analyses (perinatal mortality in developing countries, coronary heart disease, school-age asthma, all-cause mortality) had a 95 % prediction interval excluding the null value. Five associations had large heterogeneity estimates (50 % ≤ I² ≤ 75 %), and 10 associations had very large heterogeneity estimates (I² > 75 %). On bias assessment, seven associations had evidence for small-study effects (chronic kidney disease, coronary heart disease, diastolic blood pressure, intelligence in adolescence, medulloblastoma, wheezing disorders in childhood, and being overweight or obese in adulthood), and four associations (chronic kidney disease, diastolic blood pressure, intelligence in adolescence, and testicular cancer) had hints for excess significance bias (Table 1, Additional file 2: Table S2).

Associations with high BW

Across 18 meta-analyses examining high BW as a dichotomous trait, the median number of cases was 6115 (IQR, 3153–10,642), 16 meta-analyses were supported by more than 1000 cases, and the median number of datasets was 10 (IQR, 8–14). Ten associations presented a significant effect at P < 0.05, but only three associations (acute lymphoblastic leukaemia, all types of leukaemia, and being overweight or obese in adulthood) remained statistically significant after the application of a more conservative significance threshold (P < 1 × 10⁻⁶). The largest study had a standard error of less than 0.10 in four meta-analyses and a more conservative effect compared to the random-effects model in 12 meta-analyses. Only four meta-analyses (all types of leukaemia, neuroblastoma, type 1 diabetes mellitus, and being overweight or obese in adulthood) had a 95 % prediction interval excluding the null value (Table 1). The heterogeneity estimate was large (50 % ≤ I² ≤ 75 %) in seven meta-analyses and only one meta-analysis presented very large heterogeneity (I² > 75 %). Two associations presented hints for both small-study effects and excess significance bias (acute lymphoblastic leukaemia and all types of leukaemia), another two associations had only small-study effects (bone tumour and non-Hodgkin lymphoma in childhood), and two additional associations had hints for excess significance bias (acute myeloid leukaemia and testicular cancer).

Dose–response associations with BW

Across 28 meta-analyses, the median number of cases was 6747 (IQR, 3945–11,326) and the median number of datasets was 8 (IQR, 6–16). Overall, 17 associations were significant at P < 0.05, but only six associations survived the application of a more stringent threshold (P < 1 × 10⁻⁶). The largest study had a standard error of less than 0.10 in 21 meta-analyses and a more conservative effect compared to the random-effects model in 20 meta-analyses. Only six associations (all-cause mortality, bone mineral concentration in hip, coronary heart disease, melanoma, mortality from cardiovascular diseases, and waist-to-hip ratio) presented a 95 % prediction interval excluding the null value (Table 1). Five associations presented large heterogeneity, and one association had very large heterogeneity. Hints for small-study effects and excess statistical significance were present in two (bone mineral concentration in lumbar spine, coronary heart disease) and eight meta-analyses (all-cause mortality, acute lymphoblastic leukaemia, all types of leukaemia, bone mineral concentration in lumbar spine, breast cancer, coronary heart disease, mortality from cancer, and waist-to-hip ratio), respectively (Table 1, Additional file 2: Table S2).

BW relative to gestational age

Three papers performed four meta-analyses examining associations between small-for-gestational-age infants (defined as BW below the 10th percentile for the gestational age) and the risk for acute lymphoblastic leukaemia, childhood stunting and depression. No meta-analyses on large-for-gestational-age infants were identified. Under the random-effects model, three associations had a statistically significant effect at P < 1 × 10⁻⁶ and a 95 % prediction interval excluding the null value (acute lymphoblastic leukaemia and childhood stunting in infants with low and normal BW; Table 1). Only one association had large between-study heterogeneity, whereas none of the examined associations presented evidence for small-study effects or excess significance bias.

Despite the importance of gestational age for BW, only four of the 36 papers examining low BW, high BW or dose–response relationships with BW (pertaining to seven meta-analyses) presented subgroup analyses restricted to studies that provided gestational age-adjusted estimates (Table 1) [14, 41–43]. None of these analyses observed a statistically significant difference in the summary effect between the studies adjusting for gestational age and the unadjusted studies. Additionally, 18 (46 %) papers mentioned that the observed effect might differ from the true effect because gestational age was not considered as an adjustment variable in several observational studies. Twenty papers (51 %) reported which observational studies adjusted for gestational age in their statistical models.

Assessment of epidemiological credibility

Twenty-eight of 78 associations (36 %) did not present a significant summary effect at P < 0.05. Of the remaining 50 associations, only four presented convincing evidence by having more than 1000 cases, not large heterogeneity, 95 % prediction interval excluding the null value, a significant summary effect at P < 1 × 10⁻⁶, and absence of small-study effects and excess significance bias (Table 2). These associations pertained to all-cause mortality for low versus normal BW, bone mineral concentration in hip and mortality from cardiovascular diseases per 1 kg increase in BW, and childhood stunting for small- versus adequate-for-gestational-age infants with BW ≥ 2500 g. Notably, apart from the meta-analyses on stunting, which included gestational age in the definition of the examined phenotype (small-for-gestational-age), none of the other three meta-analyses with convincing evidence restricted their analyses to studies with adjustment for gestational age. Eleven additional associations had highly suggestive evidence (more than 1000 cases, a significant summary effect at P < 1 × 10⁻⁶ and largest study with a significant effect). These associations examined perinatal mortality in developing countries, wheezing disorders, being overweight or obese in adulthood, coronary heart disease for the comparison of < 2500 g versus ≥ 2500 g, intelligence in adolescence for the comparison of low BW versus normal BW, all types of leukaemia, being overweight or obese in adulthood for the comparison of > 4000 g versus ≤ 4000 g, muscle strength and coronary heart disease for the comparison of increase per 1 kg in BW, and maternal cardiovascular mortality and paternal cardiovascular mortality for the comparison of increase per 1 SD in BW. Fourteen associations presented suggestive evidence and 13 associations had weak evidence (Table 2).

Table 2 Summary of evidence grading for meta-analyses associating different contrasts of birth weight and risk of future disease

Discussion

Our work constitutes the first comprehensive mapping and appraisal of the association between BW and the risk of subsequent health outcomes, as provided by published systematic reviews and meta-analyses of observational studies. Overall, 78 associations have been examined, including a diverse range of outcomes: cardiovascular, cancer, metabolic, respiratory and mortality outcomes, and disease traits and biomarkers. Despite the common belief that the intrauterine environment as assessed by BW is associated with many diseases and disease traits in adult life [1, 6–10], our comprehensive assessment shows that convincing evidence exists only for the association of low BW with increased risk of all-cause mortality, and for the associations of each 1 kg increase in BW with higher bone mineral concentration in hip and lower risk of mortality from cardiovascular diseases. Furthermore, the association between small-for-gestational-age and childhood stunting in low- and middle-income countries was supported by convincing evidence. There was no convincing evidence supporting associations between high BW and later outcomes; however, the associations with overweight or obesity in later life and all types of leukaemia were highly suggestive.

The associations between BW and cardiovascular disease were amongst the first to be observed in the medical literature [1–5] and our data suggest that the current evidence is highly suggestive. Both meta-analyses looking at low (<2500 g) versus high (≥4000 g) BW and those examining per 1 SD increase in BW showed highly significant summary effects and small between-study heterogeneity. However, both associations presented evidence for small-study effects and the dose–response association additionally had hints for excess significance bias. The latter may have resulted in inflated effect estimates for an association with cardiovascular disease that needs cautious interpretation [35, 44]. Although studies have adjusted for a range of confounders, including socioeconomic status, not all studies adjusted for gestational age, which is an important confounder; this, as well as other unrecognized confounders, could explain the observed association. In addition, the mechanisms underlying this association remain unclear despite many hypotheses having been suggested, including the hypothesis that intrauterine under-nutrition leads to fetal adaptation, which is subsequently related to adverse cardiovascular risk in later life [10]. However, others have provided evidence that at least some of the association between the BW of individuals and their later risk of cardiovascular disease may be genetic and therefore not modifiable via interventions that target the intrauterine environment [45]. The causal pathway linking BW to cardiovascular risk needs further elucidation to allow evidence-based public health interventions.

The observed increased risk of cardiovascular disease associated with lower BW is likely to be a main contributor to the inverse association of BW with all-cause mortality; an association supported by convincing evidence in our assessment [42]. The higher incidence of perinatal mortality in the low BW group is also likely contributing to the all-cause mortality association with low BW, but only to a small extent. Babies born with a BW below 2500 g had increased perinatal mortality, an association supported by a very large summary effect estimate and a very small P value [46]. However, the meta-analysis on perinatal mortality was focused exclusively on developing countries. Therefore, the effect estimate might be exaggerated due to lack of neonatal intensive care units or difficult access to specialized healthcare facilities in these countries [47]. These data could not be generalised to other settings where high-quality healthcare is available.

The association between low BW and low bone mineral concentration in later life is less well studied compared to other outcomes and current data stem from six studies contributing to the meta-analysis [48]. Although the association with bone mineral concentration in hip showed convincing evidence, cautious interpretation is required, as data on osteoporotic fractures have not been reviewed and meta-analyses on other anatomical sites (e.g. lumbar spine) showed evidence for excess significance bias and no convincing associations.

Comparisons between BW and later overweight and obesity do not support a detrimental health effect of low BW. BW less than 2500 g was found to be protective for being overweight or obese, whereas BW greater than 4000 g was linked with an increased risk for being overweight or obese in adult life [43]. These associations were supported by highly suggestive evidence, but they also displayed very large between-study heterogeneity. Heterogeneity could be due to biased results in some of the included studies, but it could also reflect genuine differences across studies [35]. BW distributions are remarkably different across developed and developing countries [49], and the associations between BW and later adiposity may differ in these populations, contributing to the heterogeneity of the observed results. High BW is potentially causally associated with maternal BMI and glucose levels [50, 51]; however, the extent to which it could be modified through lifestyle or pharmacological interventions merits further investigation. Long-term follow-up of interventions during pregnancy would strengthen the available evidence, particularly on the link between high BW and subsequent risk of childhood and adulthood obesity [52–54].

Although 29 associations focused on outcomes related to different types of cancer, high BW was found to be a risk factor only for developing leukaemia [13]. The associated summary effect estimate might be inflated by the presence of small-study effects and excess significance bias. However, the statistical heterogeneity was not large, the 95 % prediction interval excluded the null value and the association was highly significant. Similarly, despite diabetes being central in the “fetal origin hypothesis” [7], its association with high and low BW has weak evidence in the literature and is only suggestive of a direct association with high BW in line with the obesity-associated evidence.

Despite intensive research on BW reflected by the large number of meta-analyses identified, there were only three papers that performed meta-analyses of studies assessing low BW in relation to gestational age [40, 55, 56], whereas no single meta-analysis on large-for-gestational-age neonates was identified. As BW and gestational age are highly correlated, analyses which consider size-for-gestational-age rather than BW adjusted for gestational age have been proposed as a more appropriate alternative [57, 58]. Among the examined phenotypes in relation to small-for-gestational-age, the association between small-for-gestational-age without low BW and childhood stunting in low- and middle-income countries showed convincing evidence. However, those results require cautious interpretation as the analyses were stratified by BW and the association between small-for-gestational-age with low BW and childhood stunting showed a much weaker effect estimate and was only supported by weak evidence. Additionally, those analyses focused on low- and middle-income countries, limiting the generalisability of those results but at the same time also highlighting the need for interventions during the pregnancy period in these populations [40]. The remaining meta-analyses included a mixture of studies that did and did not adjust their analyses for gestational age and, hence, the current literature is inconclusive on the effects of BW relative to gestational age.

In the present study, we applied the umbrella review approach summarising data from already published systematic reviews and meta-analyses. This approach takes full advantage of the existing meta-analyses to perform a standardised methodological process for the assessment of the epidemiological credibility of the findings. However, our study has some caveats. First, the Egger test and excess statistical significance test offer hints of bias, and not proof thereof, while the Egger test is difficult to interpret when the between-study heterogeneity is large. Further, our excess significance estimates were based on the largest study of each meta-analysis and they might be conservative, because often these studies were not necessarily very large or might have had inherent biases themselves. Furthermore, we did not appraise the quality of the primary studies, because this was beyond the scope of this umbrella review. This should be the aim of the original systematic reviews and meta-analyses, which should examine the methodological characteristics of the component studies.

Conclusions

Our study maps the current status of evidence on 78 associations of BW with various health outcomes, traits and biomarkers. Of them, only three examined the effects of BW in relation to gestational age through size-at-birth defined phenotypes. Our results show that the range of outcomes associated with BW is narrow and smaller than described under the fetal origin of disease hypothesis. Currently, there is only weak evidence that BW constitutes an effective target for public policy interventions aimed at long-term health and disease.