Key Summary Points

There is a lack of consensus on what constitutes best practice when assessing extubation readiness in children.

The review aims to evaluate diagnostic accuracies of pre-extubation assessments in predicting extubation failure (EF) in children.

The compliance rate oxygenation pressure (CROP) index had the highest sensitivity [0.57, 95% confidence interval (CI) 0.40–0.73] and area under the curve (AUC, 0.98). The spontaneous breathing trial (SBT) had the highest specificity (0.93, 95% CI 0.92–0.94).

Heterogeneity of EF definitions found in this systematic review calls for consensus in the definition for future research.

Future studies should explore combining various tools to develop a model that predicts EF more robustly, as extubation failure is likely multifactorial.

Introduction

Intubation and mechanical ventilation (MV) are common procedures performed in the pediatric intensive care unit (PICU) [1,2,3]. Prolonged intubation and MV are associated with increased hospitalization and healthcare costs [4]. Conversely, extubation failure (EF) is associated with adverse outcomes such as increased mortality and higher costs [5]. Hence, there is a need to better identify patients who are ready for timely extubation, to minimize these risks [4].

Balancing timely extubation against the risks of prolonged intubation is challenging in critically ill children. Many pediatric MV practices have been adopted from adult data or are based on individual experience and clinical judgement [6]. A recently published pediatric ventilator liberation guideline aims to address this concern [7], as the lack of clinical guidelines has previously led to variations in practice, which affect timely extubation and therefore MV duration.

We conducted this systematic review to qualitatively synthesize data from randomized controlled trials (RCTs) and observational studies in the current medical literature on pre-extubation assessments and evaluate the diagnostic accuracies of these assessments in predicting EF in critically ill children.

Methods

This systematic review and meta-analysis was conducted in close accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [9]. Our study protocol was registered in PROSPERO (https://www.crd.york.ac.uk/prospero/; ID no. CRD42020180409).

Definitions

In our review, a pre-extubation assessment is defined as any objective data collected before the patient was extubated that aided the decision to extubate. Singular pre-extubation assessments referred to individual parameters (e.g., vital signs or ventilator settings/measurements). Collective pre-extubation assessments referred to indices combining two or more individual parameters. These pre-extubation assessments were obtained without necessarily changing the ventilator settings. For the purpose of this review, spontaneous breathing trial (SBT) refers to a systematic assessment of whether a patient can independently maintain adequate gas exchange without excessive respiratory effort if extubated; this frequently involves changing the ventilator settings to provide minimal or no support [7].

Eligibility Criteria

We included RCTs or observational studies that examined the association between pre-extubation assessments and extubation outcomes in intubated patients ≤ 18 years old admitted to the PICU. Post-hoc analyses of RCTs (if there were no duplicate data), quality improvement projects, studies from prospective databases, and studies with retrospective and/or prospective arms were included. For this review, we focused on PICU patients with respiratory failure. Studies that focused on specialized subsets of patients (e.g., post-cardiac surgery, burns, and laryngotracheoplasty) were excluded.

We excluded studies that were conducted in neonatal or adult intensive care units. We also excluded studies on tracheostomy decannulation, effects of corticosteroids or sedative medications on extubation outcome, and weaning protocols only (without the use of any pre-extubation tests). Abstracts, clinical trial registries, case reports, case series, and non-English-language publications were excluded.

Search Strategy

A systematic search of PubMed, EMBASE, Web of Science, CINAHL, and Cochrane was performed from inception of each database to 15 July 2021. Our search strategy (Table S1) was created with the assistance of a medical librarian. The references from the database search were imported into Covidence (Australia) for screening [10]. Titles and abstracts of imported references were then screened by four independent reviewers (P.N., H.L.T., Y.-J.M., V.L.). The initial title and abstract screening required two “Yes” votes for the study to be included, and the full text screening required one vote. Any disagreements were resolved by discussion and, if needed, a vote by a third-party reviewer (J.H.L.).

Data Extraction

Data extraction was performed using a standard data collection form (P.N. and H.L.T.), and results were tracked in Microsoft Excel (Microsoft Corporation, USA) [11]. Missing data were requested from respective authors. If there was no reply, these studies were excluded. Extracted data included authors’ name, year of publication, study location, inclusion and exclusion criteria of studies, definition of EF, EF rates, and reports of diagnostic accuracies of pre-extubation assessments.

Risk of Bias Assessments

The appropriate risk of bias assessment was performed according to study design (P.N., H.L.T., and Y.-J.M.). Any disagreements were resolved by discussion and, if needed, a vote by a third-party reviewer (J.H.L.). The Cochrane risk-of-bias tool for randomized trials (RoB 2) was utilized to assess the methodological quality for RCTs [12]. RCTs were categorized on the basis of risk of bias as poor-, moderate-, or high-quality studies. For observational studies, Newcastle–Ottawa Quality Assessment Scale (NOS) was used to assess quality [13]. The NOS for cohort studies was used to assess risk of bias for quality improvement projects, secondary analysis of RCTs, and prospective studies [14]. Studies that scored eight to nine stars, six to seven stars, and less than five stars were considered as high, moderate, and low quality, respectively [14, 15].
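The star thresholds above can be written as a small lookup. This is a minimal sketch in Python; the function name `nos_quality_band` is hypothetical and is not part of the NOS tool. The text assigns no band to a score of exactly five stars, so the sketch returns `None` for that value rather than guessing.

```python
from typing import Optional


def nos_quality_band(stars: int) -> Optional[str]:
    """Map a Newcastle-Ottawa Scale star count (0-9) to the quality band
    described in the text. Hypothetical helper for illustration only."""
    if not 0 <= stars <= 9:
        raise ValueError("NOS scores range from 0 to 9 stars")
    if stars >= 8:
        return "high"      # eight to nine stars
    if stars >= 6:
        return "moderate"  # six to seven stars
    if stars < 5:
        return "low"       # less than five stars
    return None            # exactly five stars: no band is stated in the text
```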

Meta-analysis

Pooled EF rates and mean differences between extubation success and failure groups for each pre-extubation assessment were analyzed using the Comprehensive Meta Analysis Software Version 3.0 (USA) [16]. Statistical heterogeneity was determined and presented as I2. Pre-extubation assessments (e.g., SBT) that were used as part of routine care and were not specifically examined as a predictor of extubation outcome were not included in the meta-analysis. Studies that had common units of measurement were combined, while studies with incompatible data or units of measurement were excluded from the meta-analysis. The primary outcome for our review was EF, as defined by the study authors. As we expected heterogeneity in the definition of EF across studies, we used the random-effects model for our meta-analysis.
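To make the random-effects model and the I2 statistic concrete, the following is a minimal Python sketch of inverse-variance pooling. The analysis itself was run in commercial software, and the text does not state which between-study variance estimator was applied; the DerSimonian-Laird estimator used here is an assumption, chosen only because it is a common default. The function name `random_effects_pool` is hypothetical.

```python
import math
from typing import List, Tuple


def random_effects_pool(
    effects: List[float], variances: List[float]
) -> Tuple[float, float, float, float]:
    """Pool study-level effects with a random-effects model.

    Assumes the DerSimonian-Laird estimator for the between-study
    variance (tau^2); this choice is an assumption, not taken from the review.
    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures the observed dispersion around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: share of total variation attributed to heterogeneity, not chance.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2, i2
```

A 95% CI follows as the pooled effect plus or minus 1.96 times the standard error. Proportions such as EF rates are usually transformed (for example, to the logit scale) before pooling; that step is not shown here.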

To evaluate the performance of each pre-extubation assessment, we also calculated the pooled sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR−), and diagnostic odds ratio (DOR) with their associated 95% confidence intervals (CI) (MetaDisc; Clinical Biostatistics Unit of the Ramón y Cajal Hospital, Madrid, Spain) [17]. A summary receiver operating characteristic (SROC) curve and its area under curve (AUC) were calculated. D+ was defined as extubation failure. This diagnostic meta-analysis required three or more studies that reported diagnostic accuracy results to generate the SROC curve.
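For a single study, these measures all derive from one 2×2 table in which a positive test (for example, a failed SBT) is compared against D+ (extubation failure). The Python sketch below shows the arithmetic only; the names `TwoByTwo` and `diagnostic_accuracy` are hypothetical, pooling across studies is not shown, and the sketch assumes no zero cells and a sensitivity and specificity below 1.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TwoByTwo:
    tp: int  # test positive (e.g., failed SBT) and extubation failed
    fp: int  # test positive but extubation succeeded
    fn: int  # test negative (e.g., passed SBT) but extubation failed
    tn: int  # test negative and extubation succeeded


def diagnostic_accuracy(t: TwoByTwo) -> Dict[str, float]:
    """Single-study diagnostic accuracy measures, with D+ = extubation failure."""
    sensitivity = t.tp / (t.tp + t.fn)      # share of failures the test flags
    specificity = t.tn / (t.tn + t.fp)      # share of successes the test clears
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg                   # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": dor,
    }
```

Under this convention, a patient who passes the test but then fails extubation counts as a false negative, which lowers sensitivity.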

Subgroup analyses were performed to explore whether the diagnostic accuracy results would change (1) based on the technique of SBT when pre-extubation assessments were measured [e.g., presence of pressure support, continuous positive airway pressure (CPAP), T-piece] and (2) based on different definitions used for EF [e.g., reintubation only, reintubation and use of non-invasive ventilation (NIV) post-extubation].

On the basis of a previous systematic review on the same topic conducted in a predominantly adult population [18], we anticipated garnering a large pool of data. In view of this, we focused our diagnostic meta-analysis on pre-extubation assessments that consider a combination of parameters [e.g., rapid shallow breathing index (RSBI) and SBT].

Ethics

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.

Results

Systematic Review (Qualitative Synthesis)

After removal of duplicates, a total of 11,663 publications were screened and 41 studies were included in our systematic review (Fig. 1). Owing to the large pool of data, we have provided supplemental tables and figures (Tables S1–12 and Fig. S1) to further illustrate the results and discussion. Please refer to the electronic supplementary material for more details. There were 6 RCTs, 24 prospective studies, 6 retrospective studies, 3 secondary analyses, and 2 quality improvement projects. The largest share of studies (18/41) was conducted in North America (Table 1 and Fig. S1). All included RCTs studied SBT [19,20,21,22,23,24].

Fig. 1 PRISMA flow diagram

Table 1 Characteristics of included studies

A total of 8111 patients were included. Mean age was 2.86 [95% confidence interval (CI) 2.48–3.23] years, I2 = 98.6% [4, 19,20,21,22,23, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52]. Mean duration of MV prior to the extubation attempt was 5.91 (95% CI 4.85–6.97) days, I2 = 99.5% [4, 20,21,22,23,24,25,26,27,28,29,30,31,32, 34,35,36,37,38, 41,42,43, 46,47,48,49, 51,52,53,54]. Among the studies that presented the indications for intubation (17/41 studies) [4, 19, 20, 22, 23, 25,26,27, 39, 41, 43, 46, 51,52,53, 55, 56], respiratory pathology was cited as the most common indication (n = 3070/4131). In studies that presented reasons for admission (22/41 studies) [19,20,21, 23, 24, 29,30,31,32,33,34,35, 37, 42, 44, 45, 47,48,49,50, 53, 54], respiratory diseases were also the most common reason for PICU admission (n = 1372/3917).

Fifty-five unique pre-extubation assessments were identified. We categorized these into four categories: singular pre-extubation assessments, collective pre-extubation assessments, SBT, and others (Table 2, detailed description in Tables S2–S7).

Table 2 Pre-extubation assessments identified in the literature

Outcome Definition

Among the included studies, there was heterogeneity in the definition of EF (Table 3). Thirteen studies (total patients, n = 2581) included patients who required NIV post-extubation in their EF count, whereas 28 studies (n = 5955) defined EF as reintubation only. The timeframe used to define EF ranged from 24 to 72 h. The most common EF definition was reintubation within 48 h (16 studies, n = 3593). The overall combined EF rate of all included studies was 13.6% (95% CI 11.3–16.2%, I2 = 85.6%) (Fig. 2). Studies that included patients who required NIV post-extubation as part of the EF count (13 studies, n = 2581) had a combined EF rate of 15.8% (95% CI 13–19.1%, I2 = 66.7%), whereas studies that defined EF as reintubation only (28 studies, n = 5955) had a combined EF rate of 12.9% (95% CI 10.2–16.2%, I2 = 88.5%). Studies that defined EF within 24 h (10 studies, n = 2417) had a combined EF rate of 10.6% (95% CI 8.2–13.7%, I2 = 68.7%), and those that defined EF within 48 h (26 studies, n = 3388) had a combined EF rate of 13.6% (95% CI 11–16.8%, I2 = 82.4%).

Table 3 Outcome definition

Fig. 2 Extubation failure rates of included studies

Risk of Bias Assessment

For RCTs, three were ranked as having high risk of bias, two raised some concerns, and one had low risk of bias (Table S8). For observational studies, overall quality was moderate to high. Using the NOS cohort tool (Table S9), there were 16 and 14 studies of high and moderate quality, respectively. Using the NOS case–control tool (Table S10), two studies were of high quality, two studies of moderate quality, and one study of low quality.

Singular Pre-extubation Assessments

The most common vital sign examined was the respiratory rate (RR). Thirteen studies examined RR as a pre-extubation assessment (n = 1945). The combined mean difference in RR between the extubation success and EF groups was −7.32 (95% CI −10.61 to −4.02) breaths/min (p-value < 0.001) [21, 25, 27, 30, 33, 35, 37, 41]. The most common blood test examined was the partial pressure of arterial carbon dioxide (PaCO2). Ten studies examined PaCO2 as a pre-extubation assessment (n = 1379). The combined mean difference in PaCO2 between the extubation success and EF groups was −1.68 (95% CI −5.13 to 1.78) mmHg (p-value = 0.34) [25, 27,28,29,30, 35, 37, 42]. The most common ventilator parameter examined was tidal volume. Thirteen studies examined tidal volume as a pre-extubation assessment (n = 1945). The combined mean difference in tidal volume between the extubation success and EF groups was 1.1 (95% CI 0.37–1.83) ml/kg (p-value < 0.05) [20, 21, 25, 27, 29, 30, 33, 35, 37, 41, 42].

Collective Pre-extubation Assessments

The most common index examined was the rapid shallow breathing index (RSBI). Nine studies examined RSBI as a pre-extubation assessment (n = 1400). The mean difference in RSBI between the extubation success group and the EF group was −3.13 (95% CI −5.01 to −1.26) breaths/min/ml/kg (p-value < 0.05) [21, 25, 27, 35, 37, 41]. Seven of the nine studies presented diagnostic accuracy results and were included in the diagnostic meta-analysis (Table S11). Noizet et al. [30] was not included in the diagnostic meta-analysis as it was the only study that reported age-adjusted data and was therefore not compatible with the rest of the studies, which reported unadjusted data. The pooled sensitivity and specificity were 0.44 (95% CI 0.35–0.54) and 0.81 (95% CI 0.78–0.84), respectively. A subgroup analysis of studies that measured RSBI with CPAP support during SBT resulted in a modest change in the pooled sensitivity and specificity, to 0.58 (95% CI 0.41–0.75) and 0.8 (95% CI 0.75–0.84), respectively.
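For readers less familiar with the index, the unit breaths/min/ml/kg reflects a respiratory rate divided by a weight-normalized tidal volume. The Python sketch below shows this calculation; the function name `rsbi` and its parameter names are hypothetical, and individual studies may have measured or adjusted the inputs differently (for example, the age-adjusted data noted above).

```python
def rsbi(resp_rate_bpm: float, tidal_volume_ml: float, weight_kg: float) -> float:
    """Weight-normalized rapid shallow breathing index in breaths/min/ml/kg.

    Illustrative sketch: respiratory rate divided by tidal volume per kg
    of body weight. Higher values indicate faster, shallower breathing.
    """
    if resp_rate_bpm <= 0 or tidal_volume_ml <= 0 or weight_kg <= 0:
        raise ValueError("all inputs must be positive")
    tidal_volume_ml_per_kg = tidal_volume_ml / weight_kg
    return resp_rate_bpm / tidal_volume_ml_per_kg
```

A negative mean difference (success minus failure), as reported above, is consistent with this reading: patients who failed extubation tended to have higher RSBI values.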

The most sensitive index reported was the compliance rate oxygenation pressure (CROP) index. Four studies examined the CROP index as a pre-extubation assessment (n = 401) and were included in the diagnostic meta-analysis (Table 4). Of the three pre-extubation assessments included in our diagnostic meta-analysis, CROP had the highest sensitivity (0.57, 95% CI 0.40–0.73) and highest AUC (0.98). The combined mean difference in CROP index between the extubation success and EF groups was 0.5 (95% CI 0.26–0.74) ml/kg/breath/min (p-value < 0.001) [35, 37].
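The CROP index combines the four quantities in its name. The Python sketch below uses the commonly cited form of the index (dynamic compliance × maximal inspiratory pressure × arterial-to-alveolar oxygen tension ratio, divided by respiratory rate), with compliance normalized to body weight so that the result is in ml/kg/breath/min. This form is an assumption; the included studies may have defined or measured the components differently, and the function and parameter names are hypothetical.

```python
def crop_index(
    dyn_compliance_ml_per_cmh2o_per_kg: float,
    max_insp_pressure_cmh2o: float,
    pao2_mmhg: float,
    alveolar_po2_mmhg: float,
    resp_rate_bpm: float,
) -> float:
    """CROP index in ml/kg/breath/min (commonly cited form, assumed here).

    Maximal inspiratory pressure is often recorded as a negative number,
    so its magnitude is used.
    """
    if alveolar_po2_mmhg <= 0 or resp_rate_bpm <= 0:
        raise ValueError("alveolar PO2 and respiratory rate must be positive")
    oxygenation_ratio = pao2_mmhg / alveolar_po2_mmhg
    return (
        dyn_compliance_ml_per_cmh2o_per_kg
        * abs(max_insp_pressure_cmh2o)
        * oxygenation_ratio
        / resp_rate_bpm
    )
```

Because compliance, inspiratory pressure, and oxygenation sit in the numerator and respiratory rate in the denominator, higher values indicate better reserve, which matches the positive mean difference in favor of the extubation success group.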

Table 4 Pooled diagnostic meta-analysis

Spontaneous Breathing Trial

Five of 13 studies (n = 1847) presented diagnostic accuracy results for the SBT and were included in the diagnostic meta-analysis. SBT had the highest specificity (0.93, 95% CI 0.92–0.94) (Table 4) in the prediction of EF. The SBT techniques used in these five studies were heterogeneous (Table S12). Table 5 presents the results of the subgroup analyses and the characteristics of studies within each analysis. Generally, there were no statistically significant changes in the pooled meta-analysis when we limited the analysis to studies that conducted SBT with pressure support compared with when we included all SBT studies.

Table 5 Subgroup analyses for spontaneous breathing trials (SBT) studies

Other Pre-extubation Assessments

There were six studies on diaphragm ultrasound (n = 338), and it was the most common pre-extubation assessment studied from 2020 to 2021. The common parameter of interest within the six studies on diaphragm ultrasound was diaphragm thickening fraction (DTF). The combined mean difference in DTF between the extubation success and EF groups was 24.8% (95% CI −35.5% to 85.1%) (p-value = 0.42) [47, 48, 52, 53]. Two of the six studies presented diagnostic accuracy results and revealed that, with 20–23.2% as the cutoff, DTF predicted extubation failure with 57.1–100% sensitivity and 76.2–89.4% specificity [51, 53].

Discussion

Our systematic review revealed that many parameters have been investigated as potential tools to aid in the assessment for extubation failure/success. These included RR, PaCO2, tidal volume, RSBI, CROP index, and SBT. Among the pre-extubation assessments that have been described in the literature, CROP index had the highest sensitivity and highest AUC; SBT had the highest specificity. In addition, there was heterogeneity in the definition of EF in our final 41 included studies.

Our systematic review differs from a previous review that explored methods used to predict extubation outcome in the PICU [57]. The previous review included only studies that defined EF within 48 h and did not perform a meta-analysis. We were broader in our EF definition, as we included EF as defined in each study, and hence included more studies in our review. In addition, we performed a meta-analysis with a random-effects model to account for the heterogeneity in the methods and EF definition. Our study revealed a wide range in the definition of EF across all studies, in both what constituted EF and the duration used to define EF. The most common definition was reintubation alone within 48 h (16/41). We did not find statistically significant differences in the combined EF rates across the various definitions (Fig. 2). Owing to the insufficient number of studies, we were not able to examine the effects of the different outcome definitions on the diagnostic accuracy results of pre-extubation assessments. Standardization of the EF definition in future primary studies will provide a more robust overall meta-analysis result. On the basis of our review, we propose that future studies consider reintubation alone as their primary outcome of interest. Placing patients on NIV post-extubation should not be considered as EF for several reasons. For patients with an underlying progressive disease, using NIV could be a new baseline for the patient and should not be considered as EF. Some patients who require NIV post-extubation may only require minimal short-term noninvasive support as they are clinically improving, and clinicians may have preplanned NIV for these patients in an effort to extubate them in a more timely manner.

We found heterogeneity in the timeframe for EF definition, with the most common timeframe being 48 h. Similar to our findings, a prior systematic review in an adult population also revealed heterogeneity in the timeframe used in the EF definition (ranging from within 24 to 72 h) [58]. The Brazilian Guidelines for Mechanical Ventilation for the adult population define EF as reintubation within 48 h of extubation [57]. Arbitrarily setting the EF timeframe at 48 h may increase the risk of capturing patients who are reintubated for reasons other than premature extubation. In an observational study of 16 PICUs, the participating PICUs defined EF as reintubation within 24 h, as > 80% of failures occurred within this time period [3]. Future studies are needed to better define the most appropriate timeframe for EF definition, describe the different causes of EF across different timepoints, delineate the reasons for reintubation, and determine whether this could be attributed to premature extubation.

We focused our meta-analysis on indices and SBT as it is more common and practical to consider extubation on the basis of patients’ clinical status as a whole rather than isolated parameters, which makes diagnostic accuracy results of parameters in isolation difficult to interpret. Our meta-analysis results showed that RSBI, CROP index, and SBT are poor predictors of EF (sensitivity range 0.14–0.57, LR+ range 2.23–3.50). For SBT, this could be because the majority of studies conducted SBT with pressure support. There have been previous discussions on whether pre-extubation assessments should be conducted without pressure support. Conducting SBT with pressure support may overestimate extubation readiness as it underestimates work of breathing [8, 59]. A prospective trial also showed that using CPAP of 5 cm of water (cmH2O), as compared with pressure support, can better predict effort of breathing post-extubation [60]. Four out of six SBT studies included in the diagnostic meta-analysis conducted SBT with pressure support. This may explain why SBT had a low pooled sensitivity; pressure support overestimated the patients’ extubation readiness, resulting in increased false-negative results (i.e., passing SBT, but failing extubation). Unfortunately, there were not enough studies on SBT without pressure support to perform a meaningful comparison of meta-analysis results. In the neonatal intensive care setting, a meta-analysis of studies of SBT with CPAP (without pressure support) in preterm infants reported a pooled sensitivity of 0.4 (95% CI 0.24–0.58) and specificity of 0.97 (95% CI 0.85–0.99) [61]. Both these numbers are fairly similar to what we found. Hence, regardless of whether SBT is conducted with pressure support or CPAP, these data taken together may indicate that SBT, with its low sensitivity, is innately poor at detecting patients who are at risk for EF. This suggests that bedside providers may need another test in addition to SBT, or may need to explore other SBT techniques, to reduce the risk of extubation failure.
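One way to read the LR+ range quoted above is to convert it into a post-test probability of EF. The Python sketch below applies the standard odds form of Bayes' theorem. Using the overall pooled EF rate of 13.6% as the pre-test probability is an assumption made for illustration only, and the function name `post_test_probability` is hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio into a
    post-test probability (post-test odds = pre-test odds x LR)."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Assumed pre-test probability: the overall pooled EF rate in this review.
    pooled_ef_rate = 0.136
    for lr_pos in (2.23, 3.50):  # lower and upper ends of the reported LR+ range
        p = post_test_probability(pooled_ef_rate, lr_pos)
        print(f"LR+ {lr_pos:.2f}: post-test probability of EF = {p:.1%}")
```

Under that assumption, a positive result raises the probability of EF from 13.6% to roughly 26–36%, so most patients with a positive result would still be extubated successfully.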

A review in an adult population studying the predictive value of RSBI suggested that serial RSBI and the rate of change in RSBI are more predictive than a single RSBI measurement [62]. In addition, a prior prospective observational study showed that maximum RSBI rate of < 20% predicted extubation failure with a sensitivity of 88.8% and specificity of 88.8% [63], whereas a singular RSBI recorded at 120 min of SBT had a sensitivity and specificity of 33% and 76%, respectively. Most of the studies included in our review used a single pre-extubation RSBI measurement, which gave similar results when predicting extubation failure (sensitivity 0.44, specificity 0.81). Future studies can consider investigating the rate of change of indices such as RSBI and CROP index, instead of single measurements, in predicting extubation failure in the pediatric population.

We found that the CROP index had the highest AUC of the SROC curve, at 0.98. This suggests that the CROP index distinguishes patients with EF from those with extubation success well. However, owing to the limited number of studies, we were unable to perform a meta-regression to identify the pooled cutoff value for the CROP index. Future studies should explore the predictive ability of the CROP index so that results can be more robustly analyzed.

Current pre-extubation assessments such as RSBI, CROP index, and SBT have limitations as they do not account for many other factors such as sedation, risk of upper airway obstruction, or cough or gag reflex that may affect extubation outcome [8]. This is congruent with a recent pediatric clinical guideline, which suggested protocolizing an extubation readiness test (ERT) bundle that considers the factors above in addition to performing SBT [7]. Future studies should explore the utility of a more robust model that combines risk factors and pre-extubation assessments to predict EF in a holistic manner. Performing a multiple logistic regression to identify both risk factors and pre-extubation assessments that have significant associations with extubation failure would help future investigators to formulate a more comprehensive model.

Limitations of the Review

Our review aimed to be comprehensive by expanding the scope beyond common predictors of extubation readiness and extubation failure definitions. Indeed, we have identified numerous parameters that have been studied thus far, and to the best of our knowledge, this is one of the largest systematic reviews on this topic to date. Most of the identified studies were conducted in North and South America (26/41), limiting the generalizability of our results in geographical regions where disease epidemiology and care provision may be different. This geographical bias could be due to our inclusion criteria excluding studies not published in English. There is high statistical and clinical heterogeneity across our included studies. We attempted to address these by applying the random-effects model in our meta-analysis and performing certain subgroup analyses. Our subgroup analyses conducted for SBT (Table 5) suggest that the heterogeneity could be attributed to clinical heterogeneity (e.g., SBT technique or outcome definition) as there is a general downtrend in I2 when limited to studies with same techniques or definitions.

Considerations for Future Research

Although observational studies were rated as moderate to high quality in the risk of bias assessment, the reliability of the results is limited because the NOS tool does not capture risk of bias in the domain of outcome assessment well. In the context of our review, it is important that outcome assessors are blinded to the results of the pre-extubation assessment and that reintubation follows a specific set of criteria rather than being physician dependent. However, according to the NOS tool, a point is given to this domain as long as written documentation of the outcome can be found in the publication [13]. Hence, future studies should aim to blind outcome assessors to the collected data and to prespecify criteria for reintubation, as clinicians are likely to differ in their thresholds for rescue interventions such as reintubation or application of NIV. Future research should strongly consider investigating SBT without pressure support (i.e., SBT on T-piece or CPAP) and comparing its diagnostic accuracy with that of SBT with pressure support. This will provide critical data to inform best practice when using SBT at the bedside. Also, as the CROP index had the highest AUC, more studies should be conducted to validate the diagnostic accuracy of this index in predicting EF in children.

Conclusion

Our systematic review showed that numerous studies have been conducted on this important aspect of PICU care, albeit with much heterogeneity. There is a need to find consensus on the definition of EF to allow better comparison of outcomes and increase the reliability of future meta-analyses. Owing to the many confounding factors that affect extubation outcome, future studies should explore combining various tools to develop a model that predicts EF more robustly.