Introduction

Invasive mechanical ventilation (MV) is a life support procedure regularly employed as an auxiliary ventilation method in intensive care [1, 2]. Mechanical ventilators have been utilized since the 1960s to support respiratory failure and improve survival in low-birth-weight preterm infants [3]. Despite the ability of endotracheal intubation and MV to save lives, they are associated with several complications, such as bacterial colonization, sepsis, ventilator-associated pneumonia, and airway trauma. Moreover, prolonged MV increases the risk of potential complications, including bronchopulmonary dysplasia (BPD), neurodevelopmental disorders, periventricular hemorrhage, subglottic stenosis, laryngeal edema, diaphragmatic atrophy, emphysema, pneumothorax and reduced postnatal growth [4,5,6,7]. Therefore, to minimize these risks and complications, it is the current consensus in clinical practice to withdraw from the ventilator as soon as possible and shorten the duration of invasive mechanical ventilation [8].

The scientific basis for determining when a patient is ready for extubation is still imprecise, despite significant advancements in MV and post-extubation respiratory support in neonatology. Clinical judgment, personal experience, bedside observation of blood gases, oxygen requirements, and ventilator settings are typically used to make decisions on whether to extubate [9, 10]. Consequently, there are significant practical differences and a paucity of protocols to simplify the management of all components of the peri-extubation process, with decisions often being physician-dependent rather than evidence-based, which may lead to inappropriate extubation [11,12,13]. The rate of extubation failure (EF) increases from 20% in infants born at 28–31 weeks gestational age to more than 60% in very preterm infants born at less than 28 weeks gestational age [14,15,16,17] for several reasons, including frequent or severe apneas, residual lung disease, immature respiratory drive and presence of unstable patent ductus arteriosus. EF not only prolongs the duration of mechanical ventilation but is independently associated with increased mortality, morbidity, length of hospital stay, and healthcare costs [17,18,19]. Therefore, it is critical and challenging to determine the optimal timing of extubation in mechanically ventilated neonates.

Identifying factors associated with failed extubation may help reduce the duration of mechanical ventilation, avoid reintubation, improve outcomes, and design future studies of ventilated preterm infants. This study aimed to identify potential predictors of EF in newborns.

Methods

The review followed the PRISMA reporting guidelines, a 27-item checklist designed to improve the reporting of systematic reviews [20], and was registered with PROSPERO (registration number: CRD42023415289). All analyses were based on previously published studies and therefore did not require ethical approval or patient consent.

Search strategy

A systematic literature search was conducted in PubMed, Web of Science, Embase, and Cochrane Library for studies published in English from the inception of each database to March 2023, using keywords, Medical Subject Headings (MeSH), and other index terms, as well as combinations of these terms and appropriate synonyms. The search terms focused on “newborn infant,” “newborn,” “neonate,” “extubation failure,” “EF,” “extubation outcome,” “extubation readiness,” “risk factors,” “influencing factors,” “predictors,” and their synonyms (see the Supplementary Material for the complete search strategy). Additionally, we manually searched the reference lists of all selected studies for any further relevant studies meeting our inclusion criteria.

Inclusion criteria

  • Newborns were the majority of the study population, including preterm infants and low-birth-weight infants (LBWI).

  • Extubation failure/success as the primary outcome indicator and predictors of extubation failure/success as the primary study objective.

  • Prospective or retrospective cohort study.

  • Studies published in English.

Exclusion criteria

  • Adults, children, or adolescents were the majority of the study’s population.

  • Focus on specific disease areas, such as congenital heart disease, laryngotracheoplasty, burns, or other surgical intubation.

  • Accidental extubation, treatment failure, intubation time, or death as the primary outcome indicator.

  • Clinical trials, because trial populations are not representative of other mechanically ventilated newborns.

  • Abstracts, clinical trial registries, and medical record reports.

  • Conference proceedings, review articles, letters, and editorials.

  • Animal or in vitro studies.

  • Unavailable original literature.

  • Not published in English.

Data extraction

Two reviewers extracted the data using a pre-designed Microsoft Excel 2019 spreadsheet. The extraction procedure was conducted independently, with a third senior reviewer mediating disputes when necessary. Data were collected on the following characteristics for each included study.

  1. Basic information: first author, country, year of publication, study duration, and study design.

  2. Demographic characteristics: exclusion criteria of the study, sample size, number and rate of EF.

  3. Assessment of reintubation: type of respiratory support provided after extubation and criteria for reintubation.

  4. Description of EF: the primary definition and time frame used to classify infants into extubation success or failure were recorded.

  5. Predictors of EF.

Quality assessment

Two reviewers independently assessed the methodological quality of each study, and disagreements were settled by consensus through a panel discussion. The risk of bias for each included study was assessed using the Risk of Bias Assessment for Nonrandomized Studies tool [21]. This tool was selected because all included studies were nonrandomized and because it evaluates six domains of risk of bias: 1) selection of participants, 2) confounding variables, 3) measurement of exposure, 4) blinding of outcome assessments, 5) incomplete outcome data, and 6) selective outcome reporting. If a study received low risk ratings in all six domains, it was considered to be at low risk of bias. If at least one domain was rated as unclear risk (and no domain was rated as high risk), the study was considered to be at moderate risk of bias. If at least one domain was rated as high risk, the study was considered to be at high risk of bias.
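The overall rating rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the tool itself: the domain identifiers and the function name `overall_risk_of_bias` are hypothetical labels chosen for this example.

```python
from typing import Dict

# The six domains named in the text; the identifiers are illustrative only.
DOMAINS = (
    "selection_of_participants",
    "confounding_variables",
    "measurement_of_exposure",
    "blinding_of_outcome_assessments",
    "incomplete_outcome_data",
    "selective_outcome_reporting",
)

VALID_RATINGS = ("low", "unclear", "high")


def overall_risk_of_bias(ratings: Dict[str, str]) -> str:
    """Combine six per-domain ratings into 'low', 'moderate', or 'high'."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    values = [ratings[d] for d in DOMAINS]
    if any(v not in VALID_RATINGS for v in values):
        raise ValueError("each rating must be 'low', 'unclear', or 'high'")
    if "high" in values:       # any high-risk domain -> high overall
        return "high"
    if "unclear" in values:    # no high, at least one unclear -> moderate
        return "moderate"
    return "low"               # all six domains rated low
```

Checking for a high rating before an unclear one mirrors the text: a single high-risk domain determines the overall rating regardless of the other domains.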

To verify the accuracy of the assessment, a third reviewer independently extracted data from five randomly chosen studies and assessed their methodological quality and risk of bias.

Qualitative synthesis and quantitative meta-analysis

Each reported predictor was synthesized qualitatively based on the total number of low and moderate risk of bias studies evaluating the factor and the percentage of studies showing positive correlation, marking it as definite, likely, unclear, or not a risk factor (Table 1). For each risk factor, adjusted or unadjusted odds ratios were recorded when available. For predictors with sufficiently homogeneous definitions and reference ranges, a quantitative meta-analysis of low and moderate risk of bias studies was implemented to estimate a combined OR.

Table 1 Defining the strength of a risk factor

Data analysis was performed using RevMan 5.4 software provided by the Cochrane Collaboration. The generic inverse variance method was used for the meta-analysis of both predictors and EF rates [22]. This method requires only effect estimates and their standard errors (SEs). The SEs were estimated by back-transforming the 95% confidence limits using the standard normal distribution. Heterogeneity among the included studies was assessed with the I2 test. If P ≥ 0.05 and I2 < 50%, heterogeneity was considered low and a fixed-effects model was used; conversely, if P < 0.05 or I2 ≥ 50%, heterogeneity was considered substantial and a random-effects model was used.
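As a rough illustration of these steps, the following Python sketch back-transforms a 95% CI for an OR into a standard error on the log scale, pools log-ORs by inverse variance, and applies the model-selection rule stated above. It is not the RevMan implementation. The function names are hypothetical, the random-effects estimate assumes the DerSimonian-Laird between-study variance (the text does not state which estimator was used), and the P value of the heterogeneity test is taken as an input rather than computed.

```python
import math
from typing import Dict, List, Tuple

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def log_or_and_se(or_value: float, ci_low: float, ci_high: float) -> Tuple[float, float]:
    """Back-transform a 95% CI on the OR scale to (log-OR, SE of log-OR)."""
    if min(or_value, ci_low, ci_high) <= 0:
        raise ValueError("OR and confidence limits must be positive")
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(or_value), se


def inverse_variance_pool(effects: List[float], ses: List[float]) -> Dict[str, float]:
    """Pool effects (e.g. log-ORs) with fixed- and random-effects weights."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching SEs")
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q and I2 (percentage of variability beyond sampling error)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Assumed DerSimonian-Laird estimate of the between-study variance
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return {
        "fixed": fixed, "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random, "random_se": math.sqrt(1.0 / sum_re),
        "Q": q, "I2": i2, "tau2": tau2,
    }


def choose_model(p_value: float, i2: float) -> str:
    """Rule from the text: fixed effects only if P >= 0.05 and I2 < 50%."""
    return "fixed" if p_value >= 0.05 and i2 < 50.0 else "random"


if __name__ == "__main__":
    # Hypothetical ORs with 95% CIs, for illustration only (not study data).
    studies = [(2.0, 1.0, 4.0), (1.5, 0.9, 2.5), (3.0, 1.2, 7.5)]
    pairs = [log_or_and_se(*s) for s in studies]
    result = inverse_variance_pool([p[0] for p in pairs], [p[1] for p in pairs])
    print("pooled OR (fixed):", math.exp(result["fixed"]))
    print("I2 (%):", result["I2"])
```

Pooling is done on the log scale because the sampling distribution of the log-OR is approximately normal; the pooled estimate is exponentiated only for reporting.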

Results

A total of 2356 articles were identified from the literature search of the databases, of which 627 duplicates were removed. After screening the remaining 1729 articles for title and abstract, 101 articles were selected for full-text retrieval. Following the eligibility assessment, 32 articles met the inclusion criteria. The references of the selected articles were also examined, and a full-text search was conducted for nine articles, resulting in the inclusion of two articles that met the eligibility criteria. Ultimately, 34 articles were identified for inclusion in this review, with 25 studies contributing to the qualitative synthesis and 24 to the quantitative meta-analysis. Study identification is summarized in Fig. 1.
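The counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers reported above; the function and key names are illustrative.

```python
def study_flow() -> dict:
    """Recompute the study-selection counts reported in the text."""
    identified = 2356          # records found in the database search
    duplicates = 627           # duplicates removed
    screened = identified - duplicates
    eligible_from_search = 32  # met inclusion criteria after full-text review
    from_reference_lists = 2   # added after checking reference lists
    included = eligible_from_search + from_reference_lists
    return {"screened": screened, "included": included}


if __name__ == "__main__":
    print(study_flow())
```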

Fig. 1 Flow chart of the systematic literature search

Of these studies, 19 were prospective [17, 19, 23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39] and the remaining 15 were retrospective [10, 16, 40,41,42,43,44,45,46,47,48,49,50,51,52]; seven were multicenter studies [16, 17, 19, 23, 37, 39, 49] and 27 were single-center studies [10, 24,25,26,27,28,29,30,31,32,33,34,35,36, 38, 40,41,42,43,44,45,46,47,48, 50,51,52]. The sample size ranged from 34 to 926, with the two largest studies including 394 newborns [40] and 926 newborns [17], respectively. Three of these studies constructed clinical prediction models [16, 42, 49]. The basic characteristics of the included literature are given in Table 2.

Table 2 Summary characteristics of the 34 studies included in this review

Study populations and EF rates in included studies

The studies investigated newborns across a range of gestational ages and birth weights: 16 studies enrolled preterm infants; 12 enrolled LBWI; three applied other requirements for gestational age or birth weight; and the remaining three included all eligible newborns without restrictions on gestational age or birth weight.

The combined EF rate was 26.5% (95% confidence interval [CI]: 23.1–30.6%), with high heterogeneity (I2 = 88%). The combined EF rate was 26.5% (95% CI: 21.9–31.5%; I2 = 77%) in female infants and 32.9% (95% CI: 28.1–37.5%; I2 = 78%) in male infants. EF rates for very low birth weight (VLBW) infants and extremely preterm infants were available from 12 articles with 1698 infants and 7 articles with 1873 infants, respectively. The combined EF rates in VLBW infants and extremely preterm infants were 24.2% (95% CI: 20.0–28.6%; I2 = 75%) and 40.1% (95% CI: 35.5–44.8%; I2 = 69%), respectively (Fig. S1-5 in the supplementary material).

Definition of EF and criteria for reintubation

There was heterogeneity in the definition of EF among the included studies (Table 2). In all studies, EF was defined as reintubation, but the time range used to define EF ranged from 48 h to 7 days. Of these, eight studies defined EF as reintubation within 48 h with a combined EF rate of 25.9% (95% CI: 20.0–32.9%; I2 = 68%), 14 studies defined EF as reintubation within 72 h with a combined EF rate of 26.5% (95% CI: 20.6–33.8%; I2 = 88%), three studies defined EF as reintubation within 5 days with a combined EF rate of 36.7% (95% CI: 27.0–47.9%; I2 = 91%), and nine studies defined EF as reintubation within 7 days with a combined EF rate of 24.2% (95% CI: 17.4–32.9%; I2 = 90%) (Fig. S6 in the supplementary material).

There was no uniform standard for reintubation criteria. Among studies that specified indications for reintubation, the most commonly used were a fraction of inspired oxygen (FiO2) > 0.5–0.7; partial pressure of carbon dioxide (PCO2) > 55–65 mmHg with persistent acidosis (pH < 7.20–7.25); frequent or severe apnea; and increased work of breathing.

Quality of the EF studies

Included studies differed in their methodological quality (Fig. 2, and Fig. S7 in the supplementary material). Ten studies were classified as low risk in all six domains and were considered to be at overall low risk of bias. Nine studies were considered to be at overall high risk of bias, of which seven were related to confounding variables, one was related to selection of participants, and one was related to outcome assessments. The remaining 15 studies had at least one unclear risk in six domains and were categorized as having a moderate risk of bias.

Fig. 2 Summary of risk of bias in the included studies

Predictors of EF in included studies

The 34 included studies described 43 statistically significant risk factors for EF. These variables were divided into six major categories: intrinsic factors (47.1%, 16/34); maternal factors (11.8%, 4/34); diseases and adverse conditions of the newborn (17.6%, 6/34); treatment of the newborn (8.8%, 3/34); characteristics before and after extubation (38.2%, 13/34); and clinical scores and composite indicators (58.8%, 20/34). Details of the risk factors identified in each study are presented in Table 3.

Table 3 Predictors of extubation failure

Five variables were found to be definite predictors of EF, based on either all low and moderate risk of bias studies showing a positive association (if at least three studies) or the majority of low and moderate risk of bias studies showing a positive association (if at least five studies). Definite predictors were gestational age, sepsis, pre-extubation pH, pre-extubation fraction of inspired oxygen (FiO2), and respiratory severity score (RSS). Eight variables were considered likely to be associated with EF: age at extubation, anemia, inotropic use, mean airway pressure (MAP), pre-extubation PCO2, mechanical ventilation duration, Apgar score, and spontaneous breathing trial (SBT). Twenty-two variables showed conflicting results in studies with low and moderate risk of bias, or were positive in only one study, and were therefore considered to have an unclear association with EF: birth weight, sex, small for gestational age (SGA), mode of delivery, premature rupture of membranes (PROM), maternal chorioamnionitis, antenatal steroids, pneumonia, necrotizing enterocolitis (NEC), severe respiratory distress syndrome (RDS), arterial hypotension in the first 3 days of life, administration of ≥ 2 doses of surfactant, caffeine administration, unsuccessful enteral feeding, ventilator inspiratory pressure, post-extubation pH < 7.3, metabolic acidosis with pH < 7.25 in the first 3 days of life, peak FiO2 within the first 24 h of age, post-extubation HCO3 < 18 mmol/L, lung ultrasound severity score (LUSS), heart rate variability (HRV), and electrical activity of the diaphragm (Edi).
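The rule for a definite predictor stated at the start of this paragraph can be expressed as a small check. The sketch below is illustrative only: the function name is hypothetical, "majority" is assumed to mean strictly more than half (Table 1 may specify a different threshold), and the rules for the likely, unclear, and not-a-risk-factor grades are not reproduced because they are not spelled out in the text.

```python
def is_definite_predictor(n_studies: int, n_positive: int) -> bool:
    """Apply the 'definite' rule to low/moderate risk of bias studies.

    n_studies:  studies evaluating the factor
    n_positive: studies among them showing a positive association
    """
    if not 0 <= n_positive <= n_studies:
        raise ValueError("n_positive must be between 0 and n_studies")
    # All studies positive, with at least three studies.
    all_positive = n_studies >= 3 and n_positive == n_studies
    # Assumption: 'majority' means more than half, with at least five studies.
    majority_positive = n_studies >= 5 and n_positive > n_studies / 2
    return all_positive or majority_positive
```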

Meta-analysis was performed for 19 predictors for which at least two low or moderate risk of bias studies used homogeneous predictor definitions and reference ranges (Figs. 3, 4, 5, 6, 7).

Fig. 3 Meta-analysis of intrinsic factors. Forest plots of the odds ratios (ORs) included in the quantitative meta-analysis and the associated overall ORs. For each OR, the size of the red square is proportional to the corresponding study weight. Diamonds represent the overall ORs. I2 represents the fraction of variability among the individual ORs that cannot be explained by sampling variability

Fig. 4 Meta-analysis of maternal factors

Fig. 5 Meta-analysis of disease and treatment factors

Fig. 6 Meta-analysis of characteristics before and after extubation

Fig. 7 Meta-analysis of clinical scores and composite indicators

Discussion

Our study is the first systematic review of predictors of EF in newborns. Through a qualitative synthesis of 43 predictors and a quantitative meta-analysis of 19 factors from the 34 included studies, we identified five definite factors, eight likely factors, and 22 unclear factors related to EF. Definite factors included gestational age, sepsis, pre-extubation pH, pre-extubation FiO2, and RSS. Likely factors included age at extubation, anemia, inotropic use, MAP, pre-extubation PCO2, mechanical ventilation duration, Apgar score, and SBT. The results of our systematic review provide a comprehensive, up-to-date summary of the evidence, which can inform the determination of the optimal timing of extubation in mechanically ventilated newborns and the development of interventions to reduce and prevent EF.

Extubation failure reflects a complex pathophysiological process in which multiple factors are implicated, such as weak respiratory drive, imbalance between the capacity of the respiratory muscles and the loads imposed on them (lung and chest wall elastic loads, airway and tissue resistance), and inability to keep the airway open. In the last few years, there has been an increasing tendency to extubate intubated newborns early after initial respiratory management. Unfortunately, these fragile infants are often at risk of reintubation shortly after the withdrawal of mechanical ventilation. In our included studies, the reintubation rate was 26.5% (95% CI: 23.1–30.6%) for the first extubation, rising to 40.1% (95% CI: 35.5–44.8%) in extremely preterm infants. The main causes of reintubation for EF were frequent apnea, increased respiratory workload, and hypoxemia [53]. There is evidence that EF is strongly associated with adverse clinical outcomes and mortality [10, 16, 19, 40, 44].

Our results are consistent with previous findings that gestational age is one of the most critical risk factors for EF. Immature infants are at higher risk for EF than mature infants because lung maturity, respiratory patterns, and respiratory muscle strength change with increasing gestational age. Spaggiari et al. [50] found that the risk of EF decreased by 27% for every additional week of gestational age. Although another study found no significant association between gestational age and EF, this may be due to differences in the composition of the study population [49]. In China, termination of pregnancy at a gestational age of less than 28 weeks is defined as miscarriage in obstetrics. Fetuses of younger gestational age or lower viability at birth are more likely to be aborted or to have treatment abandoned by their parents. Therefore, the younger the gestational age, the less likely it is that the infant will be transferred to the NICU and receive mechanical ventilation. The development of the brain is critical for the control and regulation of breathing, and a study by Williams et al. [38] showed that higher age at extubation was strongly associated with extubation success (ES) due to a more mature brain. This contradicts the findings of Dimitriou et al. [27] and He et al. [43], who suggested that prolonged ventilation before extubation causes disuse atrophy of the diaphragm, resulting in subsequent EF. Moreover, male infants are more susceptible to EF than female infants [39], and the pooled EF rate in our meta-analysis was also higher in male infants. However, no studies have explained the causes of this phenomenon.

Several maternal characteristics affect newborn conditions and outcomes, including mode of delivery, PROM, and maternal infections. Alaa et al. [46] and Teixeira et al. [32] reported that vaginal delivery was significantly associated with EF, although the precise mechanism of this association remains unclear. One possible explanation is that vaginal delivery may occur without the necessary medical care, which could worsen the infant's clinical condition. Furthermore, vaginal delivery itself is in most cases a spontaneous preterm birth due to an underlying inflammatory or infectious disease. Cesarean delivery may be associated with better prenatal care, which may moderate the risk factors associated with preterm birth, thereby reducing the risk of complications and increasing the chances of survival of preterm infants. The impact of PROM on extubation outcomes is also controversial. In 40–70% of patients of low gestational age, PROM is associated with histological chorioamnionitis [54, 55]. Some studies argue that chorioamnionitis may lead to early lung maturation by increasing surface-active substances and reducing pro-inflammatory mediators in the airways [56]. However, we are currently unable to confirm a link between chorioamnionitis and ES. Further studies are required to evaluate whether PROM, independently of chorioamnionitis, can activate unidentified mechanisms that affect lung maturation.

Neonatal diseases and adverse conditions are not only major causes of prolonged hospitalization or even death of newborns but may also be important risk factors for EF. Infection is one of the most common adverse events in hospitalized newborns and poses a threat to all newborns [57]. In our study, sepsis was a common infection affecting EF. First, inflammatory factors can attack immature lung tissue in an inflammatory storm. Once alveolar cells and interstitial lung tissue are destroyed by inflammation, ventilatory function and pulmonary vascular hemodynamics are compromised, and this damage may be irreversible. Second, sepsis may be complicated by encephalitis, leading to central respiratory dysfunction. In addition, three studies reported anemia as a predictor of EF [24, 28, 49]. A possible explanation is that low hemoglobin (Hb) levels reduce oxygen delivery to the respiratory center of the brain and impair the ability of the lungs to deliver oxygen to tissues, which may increase the burden on the heart and lungs and lead to EF.

A systematic review [58] of interventions to improve ES in neonates showed that methylxanthines improved rates of ES and caffeine given pre-extubation reduced time spent in oxygen and rates of death or disability. However, caffeine was routinely used in most studies we included, so no significant differences were found. There is no unified standard for the specific dosage and duration of administration, and further studies are needed. Two studies [40, 48] showed a significant association between the use of inotropes and EF. Early hypotension or blood pressure instability has previously been documented to impact lung and brain development, while it is unclear through which mechanisms the use of inotropes in early life affects EF [17]. In addition, the early introduction of total enteral feeding may affect extubation outcomes by enhancing micronutrient delivery, promoting gut development and maturation, stimulating microbiome development, reducing inflammation, and enhancing brain growth and neurodevelopment [48, 59].

Currently, the decision to extubate relies on clinical judgment through the interpretation of ventilatory support, blood gas values, and overall clinical stability of the neonate [37]. Pre-extubation pH, pre-extubation FiO2, pre-extubation PCO2, and MAP are important markers of extubation readiness and significant predictors of EF [16, 17, 19, 31, 39, 41,42,43, 49]. A lower pH indicates that the oxygen exchange capacity of the lungs is not meeting the body's demand for oxygen. Mechanically ventilated infants with severe hypercapnia are unlikely to produce sufficient spontaneous tidal volume for ES. Blood gas data before extubation are highly dependent on ventilator settings, and Wang et al. [51] found that post-extubation arterial blood gas results were more valuable in predicting EF than pre-extubation data. However, accurate thresholds for the above predictors are currently lacking, and they require additional confirmation in multicenter studies with large sample sizes. In addition, long-term mechanical ventilation may damage respiratory muscle strength and neural development [5]. Spaggiari et al. [50] reported that the risk of EF increases by 5% for every additional day of mechanical ventilation, which is similar to the results of Devadas et al. [24]. In the past decade, prompt weaning and early extubation to non-invasive respiratory support have been the focus and ultimate goal. Continuous positive airway pressure (CPAP) is the most commonly used respiratory support in clinical practice. The latest Cochrane systematic review [60] shows that nasal intermittent positive pressure ventilation (NIPPV) is more effective than nasal CPAP in reducing the incidence of EF and the need for reintubation within 48 h to one week, but does not significantly reduce chronic lung disease or mortality. Noninvasive high-frequency oscillatory ventilation (NHFOV) is an unconventional noninvasive ventilation mode that is considered a possible improvement over CPAP. However, there is still a lack of uniform standards and consensus on which noninvasive modality to use for respiratory support after extubation.

ES depends on the adequacy of respiratory drive, the capacity of the respiratory muscles, and the load imposed upon them. Given this, EF is more likely to be predicted by a composite evaluation than by a univariate index. RSS is a simple, non-invasive, easy-to-use tool for assessing respiratory failure. It has been effectively used to indicate the severity of lung disease in several large multicenter studies [61, 62], with five studies showing that high RSS or RSS/kg values before extubation strongly correlate with EF [10, 41, 42, 47, 48]. A low Apgar score indicates that early neonatal hypoxia may be prolonged and cause significant hypoxic damage to the brain and lungs, affecting the recovery of respiratory function, which may lead to difficulty in extubation and prolong the use of the ventilator [17, 39, 41, 49]. The SBT was developed for the adult population to assess the patient's ability to breathe spontaneously with minimal or no support. Incorporating SBT into weaning protocols is an accepted common practice in adult populations [63]. Despite the widespread use of SBT in neonatal intensive care units (NICUs) worldwide, few robust studies have been conducted in neonatal populations. A systematic review evaluating the accuracy of all extubation readiness tests in preterm infants, including SBT, concluded that there is insufficient evidence to support using SBT in preterm infants [64]. Additionally, loss of variability is a common occurrence in disease and is often predictive of poor outcomes. Heart rate variability and respiratory variability are two attractive tools: they are simple and noninvasive to measure and can be automated and performed at the bedside [23, 52]. However, the predictive value of variability needs to be tested in a larger population and through a randomized controlled trial design.

The strengths of this systematic review include the systematic approach to identifying all publications that reported predictors of EF in newborns and the division of predictors into six major categories to provide a logical progression of possible factors of EF. However, the results of this systematic review and meta-analysis must be considered in the context of several limitations. First, the definition of EF was not standardized across studies. In addition, while the search strategy was comprehensive and rigorous, it may still have missed some studies. Finally, because most of the included studies were retrospective, causal assertions could not be made regarding the predictors of EF.

Conclusions

In summary, we identified several of the most critical factors affecting extubation in the published studies, including gestational age, sepsis, pre-extubation pH, pre-extubation FiO2, and RSS. However, none of the included studies took into account how sociodemographic factors, such as family income and the mental and physical health of the parents, can affect EF. In addition, the skill level of the NICU team and antenatal and delivery room management may affect both severity of illness and extubation outcome. Therefore, well-designed and more extensive prospective studies investigating the predictors of EF are still needed. Additionally, consensus on the definition of EF is needed to better compare results and to improve the reliability of meta-analyses. In recent years, machine learning (ML) methods have been increasingly applied to a variety of challenging medical issues. ML can help improve the reliability, performance, predictability, and accuracy of diagnostic systems for many diseases. Since many confounding factors affect extubation outcomes, future research should explore the possibility of combining various tools and developing ML-based predictive models to predict EF more robustly.