Caffeine intake during pregnancy and adverse birth outcomes: a systematic review and dose–response meta-analysis
- Cite this article as:
- Greenwood, D.C., Thatcher, N.J., Ye, J. et al. Eur J Epidemiol (2014) 29: 725. doi:10.1007/s10654-014-9944-x
Caffeine is commonly consumed during pregnancy and crosses the placenta, producing fetal serum concentrations similar to the mother’s, but studies of birth outcomes show conflicting findings. We systematically searched Medline and Embase for relevant publications. We conducted meta-analyses of dose–response curves for associations between caffeine intake and spontaneous abortion, stillbirth, preterm delivery, low birth weight and small for gestational age (SGA) infants. Meta-analyses included 60 unique publications from 53 cohort and case–control studies. An increment of 100 mg/day of caffeine was associated with a 14 % (95 % CI 10–19 %) increase in risk of spontaneous abortion, 19 % (5–35 %) for stillbirth, 2 % (−2 to 6 %) for preterm delivery, 7 % (1–12 %) for low birth weight, and 10 % (6–14 %) for SGA. There was substantial heterogeneity in all models, partly explained by adjustment for smoking and previous obstetric history, but not by prospective assessment of caffeine intake. There was evidence of small-study effects such as publication bias. Greater caffeine intake is associated with an increase in spontaneous abortion, stillbirth, low birth weight, and SGA, but not preterm delivery. There is no identifiable threshold below which the associations are not apparent, but the associations are generally modest in size within the range of usual intake and could potentially be explained by bias in study design or publication. There is therefore insufficient evidence to support further reductions in the maximum recommended intake of caffeine, but maintaining current recommendations is a wise precaution.
Keywords: Caffeine, Pregnancy, Miscarriage, Stillbirth, Preterm birth, Birth weight, Small for gestational age infant, Meta-analysis
Caffeine is present in many drinks and foods consumed during pregnancy, most notably tea, coffee, colas, energy drinks and chocolate. Although amounts vary with brand, brewing method and country, on average one 260 ml (10 oz) mug of coffee contains around 100 mg caffeine. Caffeine is also a constituent of several common over-the-counter and prescription medications. Given its widespread presence, most pregnant women consume at least small amounts. However, its effects on the developing fetus are still not fully understood.
Animal studies suggest possible adverse effects on reproductive outcomes, including fetal growth, but these may not be relevant to humans because caffeine metabolism varies greatly across species [2, 3]. In humans, studies have reached mixed conclusions, partly because caffeine intake is difficult to measure, but also because of other clinical influences on fetal growth and birth outcomes [4, 5, 6]. Only one randomised controlled trial of caffeine reduction during pregnancy has been conducted to date, concluding that “moderate reductions” in caffeine intake (of around 200 mg/day) do not substantially alter birthweight or length of gestation. Because the intervention began after the first trimester, did not investigate the more important outcome of spontaneous abortion, and targeted only one source of caffeine (coffee), questions over the safe consumption of caffeine in pregnancy remain.
Because of possible associations with restricted fetal growth, birth defects, miscarriage and stillbirth in humans, guidance in several countries, including the US and UK, has continued to consider it a wise precaution to limit caffeine intake to less than 200 mg/day immediately before and during pregnancy [2, 8, 9, 10]. In the UK, guidance has recently changed to reduce the recommended maximum intake to this level. However, because several major studies have reached differing conclusions, neither the strength of any association nor the possibility of a threshold effect has been fully quantified [2, 8, 12].
To inform guidance, and in the light of recent large prospective studies, we aimed to pool information from the observational studies reporting on the association between caffeine and the adverse birth outcomes of spontaneous abortion, stillbirth, preterm birth, low birth weight and small for gestational age. By estimating the dose–response slope over all categories of intake, we aimed to avoid a problem present in many systematic reviews of observational studies, which compare only the extreme categories. In particular, we aimed to quantify the strength of any association and to identify any possible threshold effects by modelling nonlinear dose–response curves.
Data sources and searches
We conducted a comprehensive systematic literature search for all case–control and cohort studies providing evidence on dietary caffeine intake and adverse pregnancy outcomes, including spontaneous abortion or miscarriage, stillbirth, preterm birth, low birth weight and small for gestational age (SGA) infants. We searched the MEDLINE and EMBASE online databases for studies published in any language up to 15th May 2014 (detailed search strategy in online table 1). We also hand-searched the reference lists of included studies and relevant review articles. The guidelines for Meta-analysis Of Observational Studies in Epidemiology (MOOSE) were followed throughout the design, conduct, analysis and reporting of this review. A detailed protocol was produced for this review but is not yet available for download; the methods are therefore described in detail in this paper.
We screened titles and abstracts to remove publications that were clearly not relevant, such as editorials and single case reports. We obtained full-text versions of potentially relevant articles. Relevant articles were identified independently by five members of the review team (DCG, JY, LG, GK, LGK), and the first author resolved any disagreements. Only case–control and cohort studies, including nested case–control and case-cohort studies, were eligible for inclusion.
Inclusion criteria were: dietary assessment of maternal caffeine intake during pregnancy (not before); publication in any language; assessment of caffeine or coffee intake with more than two categories of exposure; and at least one of the following outcomes: spontaneous abortion (fetal loss before 24 weeks of gestation), stillbirth (fetal loss after 24 weeks of gestation), preterm birth (before 37 weeks), low birth weight (<2500 g), or small for gestational age infant (birth weight below the 10th percentile for gestational age). Studies had to report an estimate of relative risk (RR) with a measure of uncertainty such as a 95 % CI.
Where results from the same study were presented in several papers, the results based on the larger sample, with the most complete assessment of caffeine intake (i.e. total caffeine rather than just caffeine from coffee), or that with the most appropriate adjustment for confounding, were used.
Where caffeine-containing beverages were not reported as mg of caffeine per day, a serving of coffee was assumed to contain approximately 100 mg caffeine, and any caffeine-containing beverage not separated by source (e.g. coffee, tea or cola) was assumed to contain 60 mg caffeine on average. These assumptions are broadly in line with those made by the included studies that estimated caffeine intake from reported diet.
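The conversion rule above is a simple lookup from servings to mg/day. A minimal Python sketch follows; the function and dictionary names, and the "unspecified" label for beverages not separated by source, are illustrative assumptions rather than anything used in the review.

```python
# Assumed caffeine content per serving, as stated in the methods above:
# coffee ~100 mg; a caffeinated beverage not separated by source ~60 mg.
# The dictionary keys are hypothetical labels.
MG_PER_SERVING = {"coffee": 100.0, "unspecified": 60.0}


def servings_to_mg_per_day(servings_per_day: float, beverage: str) -> float:
    """Convert a reported number of daily servings to mg caffeine/day."""
    try:
        return servings_per_day * MG_PER_SERVING[beverage]
    except KeyError:
        raise ValueError(f"no assumed caffeine content for {beverage!r}") from None


# Hypothetical study reporting 0, 1.5 and 3.5 cups of coffee per day
doses = [servings_to_mg_per_day(c, "coffee") for c in (0, 1.5, 3.5)]
print(doses)  # [0.0, 150.0, 350.0]
```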
To be included in linear dose–response meta-analyses, studies needed to present estimates of RR with associated CIs for at least three categories of intake, alongside a quantified measure of caffeine or coffee intake and sufficient detail on the numbers of cases and non-cases. Studies already presenting results as a linear dose–response (e.g. RR per 100 mg/day) were also included. Where studies presented results in both ways, the linear dose–response estimate was taken as the more accurate. To be included in nonlinear dose–response meta-analyses, more than three categories of intake were required; results presented only as linear dose–response trends could not be used.
Data extraction and quality assessment
From each publication identified, we extracted: authors, publication year, geographical region of the study, numbers of cases and non-cases, whether dietary assessment was prospective (before delivery) or retrospective, whether caffeine assessment was based on multiple dietary sources or just coffee, level of dietary exposure (mean, median, midpoint or range for each category, or unit of increment for continuous estimates), estimated RRs with confidence intervals, and characteristics controlled for by modelling, matching or stratification. Data extraction was carried out by JY and DCG; its accuracy was checked by DCG, and a sample was also checked by JC.
We assessed the methodological quality of studies using the Newcastle–Ottawa scale for case–control or cohort studies, as appropriate, presented as a risk of bias table. For cohorts, stars were awarded in the “selection” category for participants being representative pregnancies in terms of caffeine intake, sampling of unexposed participants from the same community, detailed caffeine assessment, and demonstration that adverse pregnancy outcomes were not present at the start of the study. Comparability stars were awarded for adjustment for age and for smoking. These should be interpreted as indicating risk of bias in the estimate rather than in the study itself, as some estimates were derived from unadjusted descriptive statistics of secondary exposures. Outcome stars were awarded for outcomes based on medical records rather than self-report, for follow-up to the end of pregnancy, and for at least 70 % follow-up. For case–control studies, stars were awarded for similar criteria: selection stars for independently validated cases, cases representative of all incident adverse outcomes, appropriately sourced controls, and controls demonstrably free from the adverse outcome. Comparability stars were awarded as for cohorts. For case–control studies, the remaining quality assessment was based on the exposure definition rather than the outcome definition, with exposure stars awarded for caffeine assessment blind to case–control status, the same method of caffeine assessment for cases and controls, and the same response rate in both groups (within 10 percentage points of each other). All studies were included, regardless of perceived quality.
Data synthesis and analysis
To pool results from studies that categorised exposure differently, we derived a linear dose–response trend for each study using Greenland and Longnecker’s method. This method estimates study-specific linear dose–response slopes and their confidence intervals from the results presented for each category of caffeine intake. Because these slopes all estimate the same quantity, they can then be combined into an overall pooled estimate using standard meta-analysis methods.
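The core of this trend estimation is a generalised least squares (GLS) fit through the origin that accounts for the correlation between category-specific log RRs, which all share one reference group. The Python sketch below is a simplified version, not the full Greenland–Longnecker procedure: it assumes the reference-category case and non-case counts are known rather than fitted, and uses the case–control form of the off-diagonal covariance (1/A0 + 1/B0). The function name and example numbers are hypothetical.

```python
import numpy as np


def gls_dose_response_slope(doses, log_rr, se, a0, b0):
    """Log-linear dose-response slope from category-specific log RRs (sketch).

    doses  : dose of each non-reference category, relative to the reference
    log_rr : log RR of each non-reference category vs the reference
    se     : standard error of each log RR (e.g. derived from reported 95% CIs)
    a0, b0 : cases and non-cases in the reference category. The full
             Greenland-Longnecker method uses *fitted* counts; here they are
             taken as given (a simplifying assumption).
    Returns (slope per unit dose, standard error of the slope).
    """
    x = np.asarray(doses, dtype=float)
    y = np.asarray(log_rr, dtype=float)
    se = np.asarray(se, dtype=float)
    # Shared reference group -> correlated estimates; off-diagonal covariance
    # approximated by 1/a0 + 1/b0, diagonal by the reported variances.
    cov = np.full((len(y), len(y)), 1.0 / a0 + 1.0 / b0)
    np.fill_diagonal(cov, se ** 2)
    cinv_x = np.linalg.solve(cov, x)   # C^-1 x
    info = x @ cinv_x                  # x' C^-1 x
    slope = (cinv_x @ y) / info        # x' C^-1 y / x' C^-1 x (C symmetric)
    return float(slope), float(info ** -0.5)


# Hypothetical study: RR 1.10 at 100 mg/day and 1.21 at 200 mg/day vs 0 mg/day
slope, se_slope = gls_dose_response_slope(
    doses=[100, 200], log_rr=np.log([1.10, 1.21]), se=[0.08, 0.10], a0=400, b0=800)
print(round(float(np.exp(100 * slope)), 3))  # RR per 100 mg/day: 1.1
```

Because the GLS estimator is unbiased for an exact linear relationship, a study whose log RRs lie on a line through the origin recovers that slope whatever the covariance; the covariance mainly affects the standard error.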
To derive the dose–response trend, we used the mean or median caffeine intake for each category where this was presented, and the category midpoint where intake was reported only as a range. When the highest or lowest category of intake was unbounded, we estimated its midpoint by assuming the category had the same width as the adjacent category. If the distribution of cases was not presented in the publication, we estimated the initial numbers from the definitions of the quantiles, assuming equal numbers in each category.
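The midpoint rules can be sketched as follows. This hypothetical helper covers only the range-to-midpoint case (reported means or medians would be used directly), and clipping an implied negative lower bound to 0 mg/day is an added assumption not stated in the methods.

```python
from typing import List, Optional, Sequence, Tuple

# (lower, upper) bounds in mg/day; None marks an open-ended category
Bound = Tuple[Optional[float], Optional[float]]


def assign_category_doses(bounds: Sequence[Bound]) -> List[float]:
    """Return a representative dose (mg/day) for each ordered intake category.

    Bounded categories get their midpoint. An open-ended lowest or highest
    category is closed by assuming the same width as its neighbour; an
    implied lower bound below 0 mg/day is clipped to 0 (an assumption).
    """
    cats = [list(b) for b in bounds]
    if len(cats) < 2:
        raise ValueError("need at least two categories")

    def width(cat):
        lo, hi = cat
        if lo is None or hi is None:
            raise ValueError("adjacent category must be bounded")
        return hi - lo

    if cats[0][0] is None:
        cats[0][0] = max(0.0, cats[0][1] - width(cats[1]))
    if cats[-1][1] is None:
        cats[-1][1] = cats[-1][0] + width(cats[-2])
    return [(lo + hi) / 2 for lo, hi in cats]


# Hypothetical categories: 0-50, 50-150, 150-300 and >300 mg/day
print(assign_category_doses([(0, 50), (50, 150), (150, 300), (300, None)]))
# [25.0, 100.0, 225.0, 375.0]
```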
When studies already presented a linear dose–response trend based on a continuous measure of caffeine intake, with a measure of precision such as a confidence interval or standard error, we used this rather than deriving the trend indirectly. Where results were presented for two distinct subgroups of women, we first combined the subgroup results using a fixed effects meta-analysis before combining them with the other studies, so that between-study heterogeneity was estimated appropriately. Where results were presented for two distinct sub-outcomes in the same women, these were first combined using an approach suggested by Hamling et al. This combines the fitted cell counts for the two sub-outcomes while keeping the exposure category totals fixed, producing a single result without double counting. We then pooled the estimated dose–response trends from all studies using a random effects model that takes into account the anticipated between-study heterogeneity.
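The two pooling steps can be sketched in Python. This text does not name the random effects estimator; the DerSimonian–Laird method used here is an assumption, as are the function names and the illustrative inputs (log RRs per 100 mg/day with their standard errors).

```python
import math
from typing import Sequence, Tuple


def fixed_effect(est: Sequence[float], se: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    w = [1.0 / s ** 2 for s in se]
    pooled = sum(wi * e for wi, e in zip(w, est)) / sum(w)
    return pooled, math.sqrt(1.0 / sum(w))


def random_effects_dl(est: Sequence[float], se: Sequence[float]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooled estimate, its SE, and tau^2."""
    w = [1.0 / s ** 2 for s in se]
    fe, _ = fixed_effect(est, se)
    q = sum(wi * (e - fe) ** 2 for wi, e in zip(w, est))  # Cochran's Q
    df = len(est) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * e for wi, e in zip(w_re, est)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


# Hypothetical log RRs per 100 mg/day. One study reports two separate
# subgroups of women, which are combined with a fixed effect first.
sub_est, sub_se = fixed_effect([0.10, 0.20], [0.05, 0.05])
pooled, pooled_se, tau2 = random_effects_dl([sub_est, 0.05, 0.30], [sub_se, 0.04, 0.06])
print(f"RR per 100 mg/day = {math.exp(pooled):.2f}, tau^2 = {tau2:.4f}")
```

Combining the subgroups first means the study contributes a single estimate, so its internal subgroup difference is not mistaken for between-study heterogeneity.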
We also explored potential nonlinear associations between caffeine intake and birth outcomes. For each study that presented results for more than three categories of intake (the minimum required to model a nonlinear curve), we fitted a restricted cubic spline with three knots fixed at the 10th, 50th and 90th percentiles of the total distribution of intake. We then combined the study-specific estimates using multivariate meta-analysis.
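A three-knot restricted cubic spline has one linear and one nonlinear term. The sketch below builds that basis in Python using Harrell's parameterisation, scaled by (t3 - t1)^2; the scaling (which changes only the coefficient, not the fitted curve), the function name and the example intakes are assumptions, and the subsequent multivariate meta-analysis of the study-specific coefficients is not shown.

```python
import numpy as np


def rcs_basis_3knots(x, knots):
    """Restricted cubic spline basis for three knots t1 < t2 < t3.

    Column 0 is x itself; column 1 is the nonlinear term, which is zero below
    t1 and linear above t3 (its cubic and quadratic parts cancel there).
    """
    x = np.asarray(x, dtype=float)
    t1, t2, t3 = knots

    def pos_cube(u):
        return np.clip(u, 0.0, None) ** 3  # truncated power (u)_+^3

    nonlinear = (pos_cube(x - t1)
                 - pos_cube(x - t2) * (t3 - t1) / (t3 - t2)
                 + pos_cube(x - t3) * (t2 - t1) / (t3 - t2)) / (t3 - t1) ** 2
    return np.column_stack([x, nonlinear])


# Hypothetical pooled intake values (mg/day); knots at the 10th, 50th and
# 90th percentiles of their distribution
intake = np.arange(0, 601, 10)
knots = np.percentile(intake, [10, 50, 90])
basis = rcs_basis_3knots(intake, knots)
print(knots)
```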
We quantified the percentage of total variation in study estimates attributable to between-study heterogeneity (I2) and tested for between-study heterogeneity using Cochran’s Q. We assessed the methodological quality of studies using the Newcastle–Ottawa scale, presented as a risk of bias table. We also performed a limited number of pre-defined subgroup analyses to explore aspects of study quality that may have contributed to the differences in results across studies, such as retrospective or prospective assessment of intake, source of caffeine, geographic location, and adjustment for pre-specified potential confounders. Although caffeine metabolism may be moderated by CYP1A2 activity, smoking status, or nausea, too few studies reported stratified analyses to investigate effect modification by these factors. We conducted sensitivity analyses excluding potentially highly influential studies and studies that might be considered materially different; these are reported in the relevant sections of the results. We investigated potential small-study effects, such as publication bias, using contour-enhanced funnel plots. However, with small numbers of included studies, analyses of sources of heterogeneity and of small-study effects lacked power. All analyses were conducted using Stata version 13.1.
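I2 follows directly from Cochran's Q and its degrees of freedom as (Q - df)/Q, truncated at zero. As a sketch, the Python snippet below recomputes I2 from the Q and df values reported in the results of this review; because those Q values are rounded, agreement with the reported percentages is approximate rather than exact.

```python
def i_squared(q: float, df: int) -> float:
    """Higgins' I^2 = (Q - df) / Q, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# (Q, df) pairs as reported in the results of this review
reported = {
    "spontaneous abortion": (230, 25),
    "stillbirth": (22, 4),
    "preterm delivery": (38, 14),
    "low birth weight": (40, 10),
    "small for gestational age": (39, 14),
}
for outcome, (q, df) in reported.items():
    # prints 89, 82, 63, 75 and 64 %, matching the reported I2 values
    print(f"{outcome}: I2 = {100 * i_squared(q, df):.0f} %")
```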
One publication presented only miscarriage combined with stillbirth. These two outcomes could have very different risk factors, but because stillbirth is rare relative to miscarriage, this publication was included in the meta-analysis of miscarriage, with a sensitivity analysis to confirm that this did not materially influence the pooled estimate. One study of low birth weight also included some small for gestational age infants. It was included in the low birth weight analysis because low birth weight infants formed the majority of events, with a sensitivity analysis to confirm that this did not substantially change the results. One study provided additional information on miscarriage and stillbirth separately, which was incorporated into the meta-analyses. One publication provided results for the association between caffeine and preterm delivery separately for small for gestational age and normal for gestational age infants; these two subgroups were first combined using a fixed effects meta-analysis before pooling with the other studies. One study presented the association between caffeine and preterm delivery with premature rupture of the membranes separately from preterm delivery without premature rupture of the membranes; these were first combined using the approach suggested by Hamling et al. before pooling with the other studies. One publication was based on women with type 1 diabetes; because this is not a general population, a sensitivity analysis was conducted to confirm that it did not materially influence the pooled estimate. Where a publication presented results for different definitions of small for gestational age, the definition most consistent with the other studies was used. However, one publication with a combined outcome of ultrasound-diagnosed intrauterine growth restriction and low birth weight could not be included in either category.
Spontaneous abortion
There was substantial heterogeneity between the studies (I2 = 89 %; 95 % CI 85–92 %; Q = 230; df = 25; P < .001). Excluding the study that had combined miscarriage and stillbirth into a single outcome left the estimated between-study variance unchanged and resulted in a negligible change to the pooled estimate (RR = 1.14; 95 % CI 1.09–1.18; P < .001). Excluding the study with the greatest influence more than halved the estimated between-study variance but only marginally reduced the pooled estimate (RR = 1.11; 95 % CI 1.07–1.14; P < .001); this study was therefore excluded from the subgroup analyses.
There was no evidence that any study characteristics investigated, such as retrospective caffeine assessment or adjustment for specific potential confounders, were associated with higher or lower estimates in subgroup analyses (online table 3a). However, there was evidence of considerable asymmetry in the funnel plot (online figure 1a), to the extent that small-study effects such as publication bias cannot be ruled out for this outcome.
The nonlinear dose–response meta-analysis showed a small but consistently increasing incidence of miscarriage associated with increased daily caffeine intake (Fig. 2b). There was little evidence of any nonlinear association, such as a threshold effect, in the plot.
Stillbirth
Our literature search identified eight unique publications from five studies investigating the association between caffeine and stillbirth. Data were extracted from all five of these studies (three cohort, two case–control): three from Europe, one from the US and one from Uruguay. The pooled estimate of RR from linear dose–response meta-analysis was 1.19 (95 % CI 1.05–1.35) per 100 mg/day of caffeine (P = .007) (Fig. 2c). All the identified studies were included in the meta-analyses.
There was substantial heterogeneity between the studies (I2 = 82 %; 95 % CI 59–92 %; Q = 22; df = 4; P < .001). Studies that only considered coffee had substantially lower estimates than those that considered caffeine from multiple sources (P = .02). Stratifying on this study characteristic reduced the percentage of total variation in estimates attributable to between-study heterogeneity to <50 % in both groups (online table 3b). Again there was evidence of asymmetry in the funnel plot (online figure 1b), but the number of studies was small.
The nonlinear dose–response meta-analysis showed a small but consistently increasing incidence of stillbirth associated with increased daily caffeine intake (Fig. 2d). There was little evidence of any nonlinear association.
Preterm delivery
The literature search identified 21 unique publications from 20 studies investigating the association between caffeine and preterm delivery. Data were extracted from 15 of these studies (8 cohort, 7 case–control): 6 from Europe and 9 from the US. The pooled estimate of RR from linear dose–response meta-analysis was 1.02 (95 % CI 0.98–1.06) per 100 mg/day of caffeine (P = .42) (Fig. 2e). Of the five studies that could not be included in meta-analyses, four suggested some positive association [40, 41, 42, 43], and one did not.
There was substantial heterogeneity between the studies (I2 = 63 %; 95 % CI 34–79 %; Q = 38; df = 14; P = .001). Studies with prospective dietary assessment of caffeine intake tended to have more positive associations with preterm delivery than those with retrospective assessment (P = .04) (online table 3c). There was no evidence of any asymmetry in the funnel plot (online figure 1c).
The nonlinear dose–response meta-analysis showed an essentially flat, straight line, with no evidence of any nonlinear association (Fig. 2f).
Low birth weight
There was substantial heterogeneity between the studies (I2 = 75 %; 95 % CI 55–86 %; Q = 40; df = 10; P < .001). Excluding the study with additional small for gestational age infants had a negligible impact on the pooled estimate and the estimated heterogeneity (RR = 1.06; 95 % CI 1.01–1.12; P = .02). Studies that adjusted for maternal education or socio-economic factors had substantially lower estimates than those that did not (P = .02) (online table 3d). There was some evidence of asymmetry in the funnel plot (online figure 1d), leaving open the possibility of small-study effects such as publication bias.
The nonlinear dose–response meta-analysis showed a small but consistently increasing incidence of low birth weight associated with increased daily caffeine intake (Fig. 3b). There was little evidence of any nonlinear association.
Small for gestational age
A total of 18 unique publications from 18 studies were identified investigating the association between caffeine and small for gestational age infants. Data were extracted from 15 studies (10 cohort, 5 case–control) that could be included in meta-analysis: 6 from Europe and 9 from the US. The pooled estimate of RR from linear dose–response meta-analysis was 1.10 (95 % CI 1.06–1.14) per 100 mg/day of caffeine (P < .001) (Fig. 3c). The three studies that could not be included in meta-analysis provided mixed results: one suggested an association, one did not, and a third suggested no overall association but some evidence of effect modification by smoking.
There was substantial heterogeneity between the studies (I2 = 64 %; 95 % CI 38–80 %; Q = 39; df = 14; P < .001). Studies that adjusted for smoking (P = .04) and studies that adjusted for previous adverse pregnancy outcomes (P = .05) tended to have lower estimates than those that did not adjust for these potential confounders (online table 3e). There was some evidence of asymmetry in the funnel plot (online figure 1e), leaving open the possibility of small-study effects such as publication bias.
The nonlinear dose–response meta-analysis showed a small but consistently increasing incidence of small for gestational age infants associated with increased daily caffeine intake (Fig. 3d). There was little evidence of any nonlinear association.
We have, for the first time, quantified with precision the association between caffeine and adverse birth outcomes, based on 60 publications from 53 separate cohort and case–control studies of adverse pregnancy outcomes. The meta-analyses of caffeine intake included nearly 15,000 cases of miscarriage from 180,000 women, 700 stillbirths from 120,000 women, 8,000 preterm deliveries from nearly 110,000 women, 5,000 low birth weight infants from nearly 78,000 women, and nearly 12,000 small for gestational age infants from 160,000 women. The evidence covers a variety of countries with different levels of intake, including non-consumers and categories consuming over 1,000 mg/day. This pooled evidence allows the associations between caffeine intake during pregnancy and these adverse outcomes, including the shape of the dose–response curve, to be described in greater detail than was previously possible.
A small but quantifiable association was observed between caffeine intake during pregnancy and the incidence of miscarriage, stillbirth and low birth weight, with a similarly sized association for small for gestational age. There was no evidence of an association between caffeine intake and preterm delivery. For all outcomes the dose–response curves were fairly linear, with no evidence of any “threshold effect” or “plateau”. Heterogeneity was generally high, and little of it was explained by the aspects of study design or analysis we investigated.
The associations are relatively modest within the range of intakes consumed by the majority of women in the included studies, and within the range of intake currently recommended during pregnancy in most countries. In addition, the associations are small relative to some established risk factors such as maternal smoking, but similar to others such as secondhand smoke. It is therefore important to interpret any public health implications regarding caffeine intake in the context of known lifestyle risk factors.
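To relate the per-100 mg/day estimates to a particular intake, a log-linear trend implies RR(d) = RR100^(d/100), and the confidence limits scale the same way on the log scale. The Python sketch below applies this to the pooled estimates from the abstract for an intake of 200 mg/day versus none. It assumes the linear trend holds from 0 mg/day and is an illustration, not a result reported in the review.

```python
import math


def rescale_rr(rr_per_100mg: float, dose_mg: float) -> float:
    """RR for a daily dose vs 0 mg/day, assuming a log-linear dose-response."""
    return math.exp(math.log(rr_per_100mg) * dose_mg / 100.0)


# Pooled RR per 100 mg/day (estimate, lower and upper 95% CI) from the abstract
pooled = {
    "spontaneous abortion": (1.14, 1.10, 1.19),
    "stillbirth": (1.19, 1.05, 1.35),
    "low birth weight": (1.07, 1.01, 1.12),
    "small for gestational age": (1.10, 1.06, 1.14),
}
for outcome, ci in pooled.items():
    rr, lo, hi = (rescale_rr(v, 200) for v in ci)
    print(f"{outcome}: RR at 200 mg/day = {rr:.2f} ({lo:.2f}-{hi:.2f})")
```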
It is also important to interpret these results alongside the clinical implications of the outcomes. Whilst the consequences of small for gestational age are less severe than those of miscarriage or stillbirth, small for gestational age has been associated with an increased risk of perinatal mortality and morbidity, including perinatal asphyxia. There is also a body of literature suggesting that it is associated with adverse effects in adult life [50, 51], such as increased incidence of obesity, hypertension, hypercholesterolemia, cardiovascular disease, and type 2 diabetes [52, 53, 54]. If shown to be causal, even a small association could therefore be important from a public health perspective.
Given the observational nature of the evidence, we cannot draw inferences about whether the associations identified in this review are causal. Meta-analyses of observational studies are prone to the same biases as the studies they pool. In particular, case–control studies are susceptible to selection bias and recall bias, and all the studies are susceptible to uncontrolled confounding. In addition, all these observational studies are liable to bias from measurement error, because they estimate dietary caffeine intake from self-report. One issue common to the majority of studies was the lack of an objective measure of exposure to tobacco smoke. Smoking is a potentially very strong confounder: smokers both consume more caffeine than non-smokers (because smoking alters CYP1A2 activity, leading to faster caffeine clearance) and have much higher rates of adverse birth outcomes. It is therefore important to measure smoking objectively, using a repeated biomarker such as cotinine. Otherwise, measurement error in smoking status leaves its confounding effect only partially controlled, which in this case could exaggerate the associations with caffeine.
The large heterogeneity observed in the meta-analyses also calls for caution in interpreting the results. Earlier meta-analyses also observed substantial heterogeneity [4, 5, 57], which may be explained in part by their pooling studies that used different categories of intake. We tried to avoid this by placing each study on the same scale, pooling dose–response trends instead. Beyond this, we investigated other potential sources of heterogeneity through a small number of subgroup analyses specified in advance, exploring whether study characteristics were associated with the observed differences in results. These included study design, the method of caffeine intake assessment and adjustment for pre-specified potential confounders such as smoking. Although each of these was associated with some of the heterogeneity, this was not consistent across outcomes. Heterogeneity, though generally high, mostly reflected variation in the size of the association rather than whether there was an association. Evidence of small-study effects such as publication bias was also observed in the meta-analyses of miscarriage, stillbirth, low birth weight and small for gestational age.
It is possible that a reduction in caffeine intake is a marker of a healthier pregnancy, and that caffeine is not the cause of the adverse outcomes [58, 59]. None of the studies reviewed in this paper adequately addressed this issue; simply adjusting for nausea does not correct for this potential bias, and subgroup analysis suggested it made little difference to the estimates. This potential bias therefore remains the most prominent argument against a causal role of caffeine. Pragmatic trials are not immune to it either, since greater compliance with the intervention may itself be associated with a healthier pregnancy.
Only one large double-blind randomised controlled trial of caffeine reduction during pregnancy and subsequent birthweight has been conducted to date. Over 1,000 Danish women were recruited, each consuming over three cups of coffee a day, and randomised to either caffeinated or decaffeinated coffee. However, the trial did not assess the important outcomes of miscarriage or stillbirth, ignored caffeine intake during the first trimester when caffeine consumption changes markedly and the majority of fetal deaths occur [7, 29], and did not measure compliance through objective biomarkers of caffeine intake. In addition, the intervention focussed on coffee intake rather than caffeine as a whole, whilst there is evidence from other countries that cola drinks, tea and chocolate may all contribute at least as much caffeine to the diet during pregnancy [6, 29]. These features of the trial limit the extent to which its results can contribute towards discussion of caffeine intake as a whole, or the association with miscarriage and stillbirth. In the absence of any other substantive trial data, our meta-analysis of observational studies provides a valuable resource.
If the observed association is causal, it may be due to caffeine itself, to one of its metabolites, or to a combination of them. Of the four primary routes of caffeine metabolism in humans, 3-demethylation to paraxanthine by CYP1A2 is quantitatively the most important. CYP1A2 activity varies between individuals, and there is considerable inter-individual variation in caffeine metabolism. Measures of caffeine consumption therefore do not necessarily reflect the levels of caffeine and its metabolites in the maternal or fetal circulation. A small number of studies have measured caffeine and its metabolites in maternal or umbilical cord blood rather than assessing caffeine consumption [55, 61, 62], though, given the range of possible exposures, such studies were not the focus of this review. Linking phenotype with genotype is, however, an area for future research.
Given the observational nature of the studies, the heterogeneity and small-study effects, it is not possible to conclude that these associations are causal. The modest sizes of the associations are such that it is possible they could be explained by any or all of these potential biases. However, the plausible biological mechanisms, the evidence from animal studies, the mounting evidence from different observational human studies, and the dose–response slopes, provide some evidence to support the current recommendations limiting caffeine intake during pregnancy, such as restricting to less than 200 mg/day, as a precaution in case the associations really are causal. Whilst the associations are modest in size, they are potentially important at a public health level, and for infants already at elevated risk of adverse outcomes.
In summary, combining results from a large number of studies has allowed the associations between caffeine intake and adverse pregnancy outcomes to be quantified precisely, revealing a modest but statistically significant association that could only be adequately quantified by pooling results. A number of questions remain. These include establishing causality: whether caffeine or one of its metabolites is the causal agent, or whether the associations are completely explained by publication bias or by caffeine intake being a marker of a healthy pregnancy. Whilst these issues are unresolved, our results confirm the precautionary guidance adopted by countries recommending limiting caffeine consumption during pregnancy.
This review was funded by the Food Standards Agency (Contract T01033). We would like to acknowledge the contribution of Alastair Hay, Kay White and Nigel Simpson from the University of Leeds for comments on preliminary analyses and Gary Welsh from the Food Standards Agency Information Services and the University of Leeds Health Sciences Library for assistance with the literature searches.
Conflict of interests
The authors have no competing interests.