Key Points

Seven published systematic reviews have provided evidence on different aspects of the use of caffeine for the treatment of apnea in premature infants in the neonatal intensive care setting.

Three of these provide high-quality evidence that caffeine is as effective as the other available options for apnea and is better than no treatment.

The four remaining systematic reviews suggest that a higher dose or earlier administration of caffeine enhances its effect. However, this suggestion, and the safety of these strategies, rests on poor-quality evidence.

Hence, while caffeine is advisable for the treatment of apnea, there is currently no good evidence to support higher-dose or earlier administration of caffeine; these strategies require confirmation through further evidence.

1 Background

Apnea of prematurity (AOP) is a common developmental complication in preterm infants. It may have different causes and is mostly classified into two types: central apnea, caused by absent or insufficient respiratory drive owing to immaturity of the brainstem, and obstructive apnea, caused by obstruction of the infant's (upper) airway. A combination of the two, mixed apnea, can also occur [1]. Other specific causes of neonatal apnea include brain tissue damage, respiratory disease, infection, gastroesophageal reflux, cardiac problems, and metabolic disorders [2]. Prolonged apnea can lead to hypoxemia and reflex bradycardia, which may require active resuscitation to reverse.

As respiratory stimulants, caffeine, theophylline, and aminophylline have been used for AOP for more than 40 years. Caffeine, a methylxanthine derivative, has been used in the neonatal intensive care unit (NICU) to treat AOP since the mid-1970s [3, 4]. The Caffeine for Apnea of Prematurity (CAP) trial reported that caffeine reduced the duration of ventilation and oxygen dependency, reduced disability, and improved disability-free survival [5]. At 11 years of age, the CAP trial also showed a significant benefit in motor skills in the caffeine group compared with the placebo group [6]. In the United States, more than 300,000 infants are born late preterm each year, and nearly 12% of them experience apnea before discharge [7]. In one study, the estimated inpatient hospital cost due to delayed discharge was as high as US$2422 per patient [8].

Our preliminary literature search suggested that numerous systematic reviews (SRs) have been published to provide evidence on the usefulness of caffeine for apnea. The number of these SRs, and their variability in focus and structure, can make the evidence difficult to access and interpret, so these reviews often fail to support healthcare decision-making efficiently. The systematic overview of SRs is a relatively recent study design that addresses the growing problem of information overload by filtering large volumes of evidence, thereby improving access to evidence and better informing healthcare decision-making [9].

In this systematic overview of SRs, we aimed to summarize the main reported outcomes of caffeine for apnea management in published SRs that considered all types of study designs.

2 Methodology

The current review is a systematic overview that follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines (Appendix 1; see the Electronic Supplementary Material).

2.1 Identification and Selection of Systematic Reviews

Using an extensive search strategy, we searched EMBASE, the Cochrane Database of Systematic Reviews (CDSR), and PubMed from inception to January 2020 with variations of the key terms "apnea," "caffeine," "methylxanthine," "intensive care, neonatal," and "infant, newborn." As examples, the search strategies for EMBASE and PubMed are provided in Appendix 2 (see the Electronic Supplementary Material). To identify relevant literature the database search might have missed, we also searched the grey literature via Google Scholar and the reference lists of relevant reviews. SRs of neonates in whom caffeine was used to treat apnea were considered for inclusion. To be included, a study identified as an SR and/or meta-analysis had to (1) systematically identify the evidence on the use of methylxanthines, (2) summarize the different outcomes from different sources, and (3) synthesize summative evidence for each outcome. No search restrictions, including on language or publication year, were imposed.

We excluded expert opinions, earlier versions of SRs that had since been updated, narrative reviews, and SRs of caffeine use for the prevention of neonatal apnea.

2.2 Selection of Studies

Two reviewers independently screened titles and abstracts for inclusion and exclusion. Potentially eligible studies then underwent full-text screening, including assessment against the pre-specified definition of an SR. Full-text screening was also conducted by two reviewers, and any discrepancies were resolved by consulting a third author.

2.3 Data Abstraction

Two authors independently extracted the data from the included studies. We extracted data related to the study characteristics, including study design, patient characteristics, intervention, comparator, outcome measures, effect estimates, type, and formulation.

2.4 Quality and Risk of Bias Assessment

Two reviewers independently assessed the quality and risk of bias of the included SRs, item by item, using the tools described in Sects. 2.4.1 and 2.4.2. Any disagreements were resolved by consensus and, if needed, adjudicated by a third author.

2.4.1 Quality of Methods Assessment

A Measurement Tool to Assess Systematic Reviews (AMSTAR-2) [10] is a 16-item instrument for appraising the methodological quality of SRs, with good inter-rater agreement, reliability, construct validity, and feasibility. Scoring was performed using the online AMSTAR-2 checklist (https://amstar.ca/Amstar_Checklist.php). Following the AMSTAR-2 guidance document, the overall methodological quality of each SR was rated as high, moderate, low, or critically low.
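The AMSTAR-2 guidance derives the overall rating from the number of critical flaws and non-critical weaknesses rather than from a summed score. The sketch below encodes that decision rule under stated assumptions: items 2, 4, 7, 9, 11, 13, and 15 are treated as the critical domains suggested by the tool's developers (review teams may designate a different set), and any item not fully satisfied counts as a weakness. The function name `amstar2_overall` and its input format are hypothetical, and raters may still depart from the rule when several non-critical weaknesses accumulate.

```python
# Hypothetical helper encoding the AMSTAR-2 overall-confidence rule.
# Assumption: the critical domains are the default set suggested by the
# AMSTAR-2 developers; a review team may choose a different set.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})


def amstar2_overall(weak_items: set[int]) -> str:
    """Map the set of unmet AMSTAR-2 items (numbered 1-16) to an overall rating."""
    if any(not 1 <= item <= 16 for item in weak_items):
        raise ValueError("AMSTAR-2 items are numbered 1 to 16")
    critical_flaws = len(weak_items & CRITICAL_ITEMS)
    non_critical = len(weak_items - CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"  # more than one critical flaw
    if critical_flaws == 1:
        return "low"  # one critical flaw, with or without non-critical weaknesses
    if non_critical > 1:
        return "moderate"  # more than one non-critical weakness
    return "high"  # no or one non-critical weakness


# Example: weaknesses on items 2, 4, 7, 10, and 13 (four critical, one non-critical)
print(amstar2_overall({2, 4, 7, 10, 13}))  # critically low
```

Representing weaknesses as a set of item numbers keeps the rule a pair of set operations, which makes the distinction between critical and non-critical items explicit.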

2.4.2 Risk of Bias Assessment

Two reviewers independently assessed the risk of bias of all included SRs using the ROBIS tool [11]. Before using the tool, the reviewers read the ROBIS guidance document and familiarized themselves with the assessment procedure. The overall risk of bias of each SR was rated as "high risk of bias", "unclear risk of bias", or "low risk of bias" depending on the ratings of the signaling questions (SQs).

2.5 Data Analysis

Data were reported descriptively and graphically using Microsoft Excel 2016. Ethical approval was not required because the study analyzed published SRs and involved no human or animal participants. Because fewer than ten SRs were included, we did not assess the association between SR characteristics and quality.

3 Results

Our literature search yielded 979 records; after the removal of duplicates, 559 were screened on titles and abstracts. Potentially relevant studies underwent full-text screening against the pre-specified eligibility criteria. Finally, seven SRs with meta-analyses (SRMAs) were included in our overview (Fig. 1).

Fig. 1

Systematic reviews inclusion. CDSR Cochrane Database of Systematic Reviews

A total of 63,315 neonates were included across the studies considered in the seven SRMAs. The interventions studied were caffeine (high maintenance dose, low maintenance dose, high loading and maintenance doses, low loading and maintenance doses, and standard loading and maintenance doses), theophylline, doxapram, methylxanthines as a class, and placebo. The main characteristics of the SRMAs are reported in Table 1.

Table 1 Summary characteristics of included systematic reviews

3.1 Summary of Included SRMAs

A 2018 SRMA by Chen et al. [12] compared the efficacy and safety of high (10–20 mg/kg daily) versus low (5–10 mg/kg daily) maintenance doses of caffeine citrate for the treatment of apnea in premature infants. This review included 13 randomized controlled trials (RCTs) comprising 1515 infants. Compared with the low-dose group, the high-dose group had a higher effective treatment rate (risk ratio [RR] 1.37, 95% confidence interval [CI] 1.18–1.60) and a higher success rate of ventilator removal (RR 1.74, 95% CI 1.04–2.90). The high-dose group also had a lower extubation failure rate (RR 0.50, 95% CI 0.35–0.71), a lower frequency of apnea (weighted mean difference [WMD] −1.55, 95% CI −2.72 to −0.39), a shorter apnea duration (WMD −4.85, 95% CI −8.29 to −1.40), and a lower incidence of bronchopulmonary dysplasia (BPD) (RR 0.79, 95% CI 0.68–0.91). However, the incidence of tachycardia was higher (RR 2.02, 95% CI 1.30–3.12). Heterogeneity was absent or moderate for all assessed outcomes. There were no significant group differences in adverse events such as retinopathy of prematurity (ROP), necrotizing enterocolitis (NEC), intraventricular hemorrhage (IVH), and periventricular leukomalacia (PVL), or in in-hospital death.
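Most effects in this section are reported as risk ratios with 95% CIs: an RR below 1 means the outcome was less frequent in the first group, and a CI that excludes 1 indicates a difference at the conventional 5% level. The sketch below shows the standard log-scale (Wald) calculation for a single two-arm trial. The counts are hypothetical and are not taken from any included review; the pooled estimates reported by the SRMAs combine several such trials and are not reproduced here. The function name `risk_ratio_ci` is illustrative.

```python
import math


def risk_ratio_ci(events_a: int, total_a: int, events_b: int, total_b: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio of group A versus group B with a Wald 95% CI on the log scale."""
    if min(events_a, events_b) == 0:
        raise ValueError("zero-event cells need a continuity correction")
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 30/100 extubation failures on a high dose vs 60/100 on a low dose
rr, lower, upper = risk_ratio_ci(30, 100, 60, 100)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")  # RR 0.50 (95% CI 0.36 to 0.70)
```

The interval is computed on the log scale and then exponentiated, which is why reported CIs for ratios are asymmetric around the point estimate.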

Another SRMA [13] studied the effects of early (0–2 days of life) compared with late (≥ 3 days of life) caffeine administration in very low birth weight infants. This review included four retrospective cohort studies and one RCT, with 59,136 participants. Meta-analyses of these studies showed that the risks of death (odds ratio [OR] 0.90, 95% CI 0.82–0.98), BPD (OR 0.51, 95% CI 0.39–0.65), and BPD or death (OR 0.52, 95% CI 0.38–0.71) were lower in the early caffeine group than in the late caffeine group. However, significant heterogeneity was observed for both the BPD and death outcomes. The adverse events PVL (OR 0.56, 95% CI 0.49–0.63), ROP requiring laser photocoagulation (OR 0.44, 95% CI 0.22–0.89), patent ductus arteriosus (PDA) requiring treatment (OR 0.40, 95% CI 0.38–0.42), and IVH (OR 0.54, 95% CI 0.36–0.80) were also less frequent in the early caffeine group than in the late caffeine group. Early caffeine use was not associated with an increased risk of NEC (OR 0.97, 95% CI 0.71–1.33) or of NEC requiring surgery (OR 1.06, 95% CI 0.65–1.74). Pooled analysis also indicated that early caffeine use did not significantly reduce the duration of mechanical ventilation (MV) (standardized mean difference [SMD] −0.16, 95% CI −0.44 to 0.11). Sensitivity analyses showed that no single study accounted for the heterogeneity.

A Cochrane review by Henderson-Smart et al. [14] examined the effect of caffeine compared with theophylline on the risk of apnea in preterm infants with recurrent apnea. Five trials, with 108 infants, were included. There were no differences between groups in the treatment failure rate (less than 50% reduction in apnea/bradycardia) after 1–3 days of treatment or in the mean apnea rate after 5–7 days of treatment. Dose changes due to tachycardia or feed intolerance were less frequent in the caffeine group (RR 0.17, 95% CI 0.04–0.72).

Another review by the same authors [15] assessed the effect of doxapram compared with methylxanthines in preterm infants with recurrent apnea. This review included four trials comprising 91 infants with recurrent apnea. No difference was detected between groups in the incidence of treatment failure within 48 h (RR 0.91, 95% CI 0.45–1.85), and there was no heterogeneity. In the included trials, no infant in either treatment group received MV. None of the studies reported safety data.

The same research group conducted another Cochrane review [16], which assessed the effects of methylxanthine treatment, compared with placebo or no treatment, on the incidence of apnea. Both theophylline and caffeine resulted in significantly fewer treatment failures and less use of intermittent positive pressure ventilation than placebo. Death before discharge was rare, and its rate did not differ between the methylxanthine and control groups. One trial reported tachycardia in two infants in the theophylline group. The postmenstrual age at last oxygen use (mean difference [MD] −0.90 weeks, 95% CI −1.54 to −0.26), age at last endotracheal tube use (MD −0.60 weeks, 95% CI −1.03 to −0.17), and age at last positive pressure ventilation (MD −0.90 weeks, 95% CI −1.32 to −0.48) were lower in the caffeine group.

An SRMA by Vliegenthart et al. [17] compared high versus standard caffeine regimens in infants with a gestational age < 32 weeks, with loading doses of 10–80 versus 10–30 mg/kg and maintenance doses of 5–30 versus 2.5–10 mg/kg/day, respectively. This review included six RCTs comprising 620 infants. Meta-analysis showed that infants allocated to a higher caffeine dose had significantly lower rates of BPD (RR 0.72, 95% CI 0.54–0.97), of the combined outcome of BPD or mortality (RR 0.76, 95% CI 0.59–0.98), and of failure to extubate (typical relative risk [TRR] 0.51, 95% CI 0.37–0.70). There were no differences between the groups in the adverse events NEC, spontaneous intestinal perforation, hyperglycemia, ROP, and IVH. Heterogeneity was observed and was attributed to inconsistent definitions of high and standard caffeine dosing.
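Several reviews above report pooled estimates, such as the typical relative risk, together with statements about heterogeneity. The sketch below illustrates one common approach, fixed-effect inverse-variance pooling of log risk ratios with Cochran's Q and the I² statistic, assuming each trial is summarized only by its RR and 95% CI. It is not the analysis of any included SRMA: Cochrane reviews often use Mantel-Haenszel weighting for dichotomous outcomes, and random-effects models are usual when heterogeneity is substantial. The function name and the trial values are hypothetical.

```python
import math


def pool_fixed_effect(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of risk ratios.

    studies: list of (rr, lower, upper) tuples, each with a 95% CI.
    Returns (pooled_rr, lower, upper, cochran_q, i_squared_percent).
    """
    log_rrs, weights = [], []
    for rr, lower, upper in studies:
        # Recover the standard error of log(RR) from the width of its 95% CI
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        log_rrs.append(math.log(rr))
        weights.append(1 / se ** 2)
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total_w
    se_pooled = math.sqrt(1 / total_w)
    # Cochran's Q and I^2 quantify between-study heterogeneity
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(studies) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return (math.exp(pooled), math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled), q, i2)


# Two hypothetical trials with clearly different effects
rr, lo, hi, q, i2 = pool_fixed_effect([(0.30, 0.20, 0.45), (1.20, 0.90, 1.60)])
print(f"pooled RR {rr:.2f} ({lo:.2f} to {hi:.2f}), I^2 = {i2:.0f}%")
```

With these inputs I² is very high, which is the situation in which a single fixed-effect estimate is hard to interpret and inconsistent dose definitions, as noted above, become a plausible explanation.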

The most recent SRMA [18] included six RCTs (816 preterm infants) that compared high- and low-dose caffeine, with loading doses above versus below 20 mg/kg and maintenance doses above versus below 10 mg/kg/day, respectively. Mortality did not differ significantly between groups (RR 0.85, 95% CI 0.53–1.38). However, high-dose caffeine was associated with fewer cases of extubation failure (RR 0.51, 95% CI 0.36–0.71), fewer apneas (MD −5.68, 95% CI −6.15 to −5.22), less BPD (RR 0.76, 95% CI 0.60–0.96), and a shorter duration of MV (MD −1.69, 95% CI −2.13 to −1.25), although heterogeneity was high. There were no differences in the main adverse events reported in the other reviews [12, 17]. Tachycardia was more frequent with the higher dose, but it did not lead to discontinuation of caffeine treatment. A higher caffeine dose was also possibly associated with increased cerebellar hemorrhage, but this was suggested only when the high dose was combined with early administration.

3.2 Methodological Quality of SRMAs

Three SRMAs were of high quality [14–16], two were of moderate quality [12, 18], one was of low quality [17], and one was of critically low quality [13]. The extent to which quality items were lacking varied across SRMAs (Fig. 2).

Fig. 2

Quality assessment of included systematic reviews (SRs) based on A Measurement Tool to Assess Systematic Reviews (AMSTAR-2)

The only aspect of caffeine use supported solely by low-quality evidence was the timing of administration, which was addressed only by Park et al. [13]. This is of particular concern because there is no agreed standardized protocol for the optimal timing of caffeine therapy, despite suggestions that earlier treatment is associated with greater benefit. The Park et al. review was rated critically low because several key methodological items were poorly executed. In AMSTAR-2 terms, these items were: use of a comprehensive literature search strategy; accounting for risk of bias in individual studies when interpreting the results of the review; an explicit statement that the review methods were established before the review was conducted, with justification of any significant deviations from the protocol; a list of excluded studies with justification for the exclusions; and reporting of the sources of funding for the studies included in the review.

3.3 Risk of Bias in SRMAs

Overall, the risk of bias was low in four SRMAs, unclear in two, and high in one. The high overall rating [12] arose from domain 2 alone. Domain-level ratings of unclear risk of bias occurred in three SRMAs [12, 13, 17], in domains 2 and 3. The high risk of bias in domain 2 was driven mainly by SQ 2.4, whereas the unclear ratings in domains 2 and 3 were driven mainly by SQs 2.3 and 3.5. Domains 1 and 4 were at low risk of bias in all SRMAs (Fig. 3).

Fig. 3

Risk of bias assessment of included systematic reviews (SRs) based on ROBIS

4 Discussion

Our overview summarizes all the evidence available from SRMAs on the treatment of apnea in infants with caffeine. Seven SRMAs were published over the last two decades, including updated versions of three Cochrane reviews and one non-Cochrane review. All were based on RCTs except one, which also included observational studies. The SRMAs varied with respect to interventions, comparators, and outcomes.

Three of the seven included SRMAs evaluated caffeine against a comparator. One of these compared caffeine with theophylline [14] and concluded that the two drugs have similar short-term effects on apnea rates and treatment failure. Caffeine, however, had some therapeutic advantages over theophylline, with fewer dose changes due to tachycardia or feed intolerance. These results agree with more recent individual studies that were not included in the SRMA. Jeong et al. [19] likewise reported that caffeine is efficacious in terms of short-term treatment outcomes and easier to administer than theophylline. These conclusions are also supported by a recently published RCT by Zulqarnain et al. among 100 infants in Pakistan [20]. Another recent study, by Shivakumar et al. [21], compared caffeine with aminophylline and found them equally effective.

Indeed, caffeine is currently one of the most commonly prescribed medications for AOP. It is the first-choice methylxanthine because of its efficacy, better tolerability, wider therapeutic margin, and longer half-life [22]. Theophylline, another methylxanthine, is administered more than once a day and may cause adverse events that require closer monitoring of serum levels, whereas caffeine is given once a day, rarely causes toxic effects, and has a relatively wide therapeutic index [23].

In the remaining two comparative SRMAs, caffeine was evaluated as part of a methylxanthine group (caffeine and theophylline), compared against doxapram in one and against placebo or no treatment (control) in the other. Methylxanthines, chiefly theophylline and caffeine, have been used since the 1970s to stimulate breathing and reduce apnea [24–26].

Doxapram was first tested in 1985 [27] as an alternative to caffeine for breathing problems in neonates. It acts on both the peripheral chemoreceptors and the central nervous system to improve breathing efforts. However, because it is associated with reduced cerebral blood flow, doxapram is no longer recommended and is reserved for neonates whose severe apneic events are not controlled by methylxanthines and continuous positive airway pressure (CPAP) [28, 29]. One included SRMA [15], comparing intravenous doxapram with methylxanthines, found no significant difference in the incidence of treatment failure within 48 h. In another included Cochrane review [16], methylxanthines were associated with fewer treatment failures and less need for positive pressure ventilation than control; this SRMA also analyzed mortality and found no reduction in early mortality.

Importantly, all the comparative SRMAs of caffeine were of high methodological quality and at low risk of bias.

One SRMA compared early with late administration of caffeine [13], and the outcomes favored early administration with respect to mortality, BPD, PVL, ROP requiring laser photocoagulation, PDA requiring treatment, and BPD or death. Early administration, however, did not significantly reduce the duration of MV. This SRMA was also of critically low quality and at unclear risk of bias.

Three of the included SRMAs compared different caffeine dose regimens, i.e., higher versus lower doses. According to the "Consensus Guidelines for Management of Apnea of Prematurity UCSF (NC)2" (Northern California Neonatology Consortium) [30], the loading dose should be 20 mg/kg intravenously (IV), and the maintenance dose can be up to 10 mg/kg IV or orally (PO). All three SRMAs defined the low maintenance dose within similar ranges: 5–10 mg/kg/day [12], ≤ 10 mg/kg/day [18], and 2.5–10 mg/kg/day [17]. However, they defined the high maintenance dose and the loading dose differently: the high maintenance dose was 10–20 mg/kg/day [12], > 10 mg/kg/day [18], or 5–30 mg/kg/day [17], and the loading dose had no specified limit [12], was > 20 mg/kg [18], or was 10–80 mg/kg [17]. Regardless of definition, higher caffeine doses were associated with greater effectiveness for most outcomes, including effective treatment rate, apnea frequency and duration, BPD rate, duration of MV, extubation failure, BPD or death, and successful ventilator removal. The SRMAs comparing dose regimens were of low to moderate quality and at low, unclear, or high risk of bias.

Therapeutic drug monitoring (TDM) is frequently performed when administering methylxanthines; it helps clinicians adjust the dose and duration of treatment so that drug concentrations stay within the therapeutic range, potentially avoiding supratherapeutic toxicity and subtherapeutic treatment failure [31]. None of the SRMAs collected data on peak and trough levels of caffeine or other methylxanthines. Although the current overview confirms that caffeine is preferred over theophylline for the management of AOP, the clinical utility of routine TDM remains controversial [32]. Whereas serum levels must be measured in neonates treated with theophylline because of its smaller margin of safety and more variable absorption, caffeine levels are monitored only when signs of toxicity are suspected [31]. In an observational study of 101 preterm neonates with a median gestation of 28 weeks, Natarajan et al. [31] reported that caffeine doses of 2.5–10.9 mg/kg resulted in plasma concentrations of 3–23.8 mg/L and that 94.8% of concentrations fell within the normal reference range of 5.1–20 mg/L. Another prospective study of 154 preterm neonates with a mean gestation of 29 weeks, given a 20 or 25 mg/kg caffeine loading dose followed by a maintenance dose of 6 mg/kg/day, found that by 14 days of life serum concentrations no longer depended on gestational age, weight, or postnatal age, suggesting that routine measurement of serum caffeine concentrations in preterm infants is unlikely to be necessary [33]. Regarding toxicity, which is less likely with standard caffeine doses than with other drugs of the same class, an RCT reported that TDM is unlikely to be needed in infants with levels within the normal therapeutic range of 5.5–23.7 mg/L. However, TDM may be required when toxicity is suspected or when there is no clinical response [34].

Although some SRMAs demonstrated that caffeine reduces the duration of MV, none reported whether caffeine reduces the need for CPAP, a non-invasive alternative to mechanical ventilation that requires no endotracheal tube and is preferred by clinicians because it avoids neonatal discomfort. In other literature, caffeine has been suggested to improve respiratory function by enhancing CPAP success [35]; however, recent evidence showed that early administration of caffeine (within the first 3 days of life) does not reduce the risk of CPAP or extubation failure [36].

Adverse events of caffeine were a focus only in the SRMAs that evaluated different administration regimens [12, 13, 17, 18]. In an SRMA of critically low quality and unclear risk of bias, early administration of caffeine was associated with fewer adverse events than late administration, without an increased risk of NEC [13]. In two SRMAs of moderate quality at low to high risk of bias, higher caffeine doses were associated with a higher rate of tachycardia [12, 18]. In SRMAs of low to moderate quality at low to high risk of bias, there was no difference between high and low caffeine doses with respect to other adverse events [12, 17, 18]. The adverse events of interest in the SRMAs comparing regimens included PVL, ROP, PDA, IVH, NEC, spontaneous intestinal perforation, and hyperglycemia.

Only one publication, by Chen et al., reported a greater number of IVH cases with high doses of caffeine (n = 422) [12]. Although no specific risk factor for this was identified, neonates receiving high-dose caffeine have been reported to have a higher incidence of seizures than those receiving standard doses (58% vs. 40%), with a seizure burden ranging from 0 to 2174 s in the high-dose group versus 0 to 240 s in the standard-dose group [37]. Given this, and because the Chen et al. review reported no data on the neonates' seizure status, an association between seizures and the increased number of IVH cases cannot be excluded.

This overview has some limitations. Searching with index terms beyond those used in this study, or with other combinations of them, is always possible and might identify additional studies. In addition, because a primary study could have been included in more than one SRMA, data may have been double counted across the reported meta-analyses. We did not explore such overlap in this study.
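Overlap of primary studies across reviews is often quantified in overviews with the corrected covered area (CCA) proposed by Pieper and colleagues (2014), who suggested reading roughly 0–5% as slight, 6–10% as moderate, 11–15% as high, and above 15% as very high overlap. The sketch below computes CCA from a hypothetical mapping of reviews to their included trials; it was not applied in this overview, and the function name and trial identifiers are placeholders.

```python
def corrected_covered_area(reviews: dict[str, set[str]]) -> float:
    """Corrected covered area (CCA) of a citation matrix, as a percentage.

    reviews maps each review to the set of primary studies it includes.
    CCA = (N - r) / (r * c - r), where N counts every inclusion (with
    double counting), r is the number of unique primary studies, and c is
    the number of reviews.
    """
    c = len(reviews)
    n = sum(len(studies) for studies in reviews.values())
    r = len(set().union(*reviews.values()))
    if c < 2 or r == 0:
        raise ValueError("CCA needs at least two reviews and one primary study")
    return 100 * (n - r) / (r * c - r)


# Hypothetical example: two reviews sharing two of four primary studies
print(corrected_covered_area({"review_A": {"t1", "t2", "t3"},
                              "review_B": {"t2", "t3", "t4"}}))  # 50.0
```

Subtracting r in both numerator and denominator removes each study's first appearance, so CCA is 0% when no trial appears in more than one review.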

5 Conclusion

This overview of SRMAs indicates, on the basis of evidence that is limited in quantity but high in quality, that caffeine is efficacious and safe for reducing apnea and stimulating breathing in infants. However, owing to the limited quantity and quality of the relevant evidence, no robust conclusions can be drawn regarding the comparative effectiveness and safety of different timings and doses of caffeine administration. Larger, long-term trials are needed to confirm the various aspects of caffeine use in neonatal apnea, particularly the optimal regimen.