Background

Although the widespread use of mechanical ventilation (MV) in the intensive care unit (ICU) saves hundreds of lives daily, prolonged MV can lead to increased mortality and morbidity [1,2,3]. Weaning should therefore be considered as early as possible. However, premature withdrawal can result in extubation failure, which is also associated with increased morbidity and mortality [1, 4, 5].

Several ventilatory indices have been developed to identify the right time to extubate patients who have required endotracheal intubation and MV, but none has proved sufficiently accurate [6]. More recently, lung and diaphragm ultrasound methods have been introduced to assess lung aeration patterns and diaphragm function. Bouhemad [7] was the first to propose the lung ultrasound (LUS) score for quantifying lung aeration in patients with ventilator-associated pneumonia. In later articles, this score was used to predict weaning outcome [8,9,10,11], with promising results. Several parameters measured with diaphragm ultrasound have been proposed for the same purpose [11,12,13,14,15,16,17,18,19,20,21]. These include diaphragm thickness, diaphragm movement or excursion during the respiratory cycle [22], and diaphragm thickening or thickening fraction (TI). Some studies have found diaphragm excursion and thickness to be of low predictive value in the assessment of diaphragm function [18, 19, 23], and a recent meta-analysis corroborates this and supports TI as the better ultrasound predictor of weaning outcome [24].

These data suggest that TI and LUS are good non-invasive indicators of weaning outcome. However, the reliability and accuracy of these studies are limited, mainly owing to small sample sizes, unrepresentative patient spectra and study heterogeneity [24]. The aim of this study is to assess the reliability and accuracy of lung and diaphragm ultrasound for predicting successful weaning in general critical care patients on MV.

Methods

Design

We performed two independent studies: a cross-sectional concordance study between two sonographers (interobserver agreement study) and a prospective cohort study to assess the accuracy of lung and diaphragm ultrasound for predicting weaning and extubation outcome (predictive accuracy study).

Population

For the interobserver agreement study, we included 50 patients (with or without MV), who were consecutively admitted to the ICU of our hospital from December 2016 to February 2017, and who required a thoracic ultrasound examination for clinical reasons.

For the predictive accuracy study, we consecutively included all patients on MV admitted to the ICU from 15 January 2016 to 15 April 2017 who had signed the informed consent form (Additional file 1) and met the following inclusion criteria: (1) over 18 years of age; (2) more than 24 h on MV; (3) ready for weaning.

We applied the same exclusion criteria for both studies: (1) spinal cord injury higher than T8; (2) arrhythmias and haemodynamic instability; (3) terminal extubation; (4) pregnancy; (5) pneumothorax, pneumomediastinum, thoracostomy, chest tube or chest injuries that prevent ultrasound; (6) pleural lesions or pleurodesis.

Measurements/procedures

Ultrasound technique

Two sonographers trained in lung and diaphragm ultrasound according to international recommendations [25] (Additional files 1, 2 and 3) performed the ultrasound measurements. For lung ultrasound, they used a 2–4 MHz convex probe in B mode, as described in other studies [7, 8]. The scoring system adopted distinguishes four aeration patterns: normal aeration (N; presence of lung sliding with A lines and fewer than two isolated B lines), moderate loss of lung aeration (B1; more than two well-defined B lines), severe loss of lung aeration (B2; multiple coalescing B lines) and pulmonary consolidation (C; presence of a tissue pattern). Scores of 0–3 were attributed to the four categories respectively (0 points for N, 1 point for B1, 2 points for B2 and 3 points for C), and for each region the worst visible pattern was recorded. Rather than the original LUS score, we applied a modified procedure (LUSm) that evaluates four lung regions on each side instead of the standard six. The intention of this modification was to avoid having to move critically ill patients, thus preventing the associated complications and facilitating the examination for the operators. We assessed four areas: anterior–superior, anterior–inferior, lateral and postero-basal. According to Lichtenstein [26], the postero-basal area is where most of the pathology of critically ill patients occurs. The total LUSm score over the eight areas can therefore range from 0 to 24 points.
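The LUSm scoring rule described above (worst pattern per region, 0–3 points, summed over four regions per side) can be sketched as follows. This is a minimal illustration, assuming each region's observed patterns are recorded as the labels N, B1, B2 and C; the names `PATTERN_POINTS`, `region_score` and `lusm_score` are hypothetical and not part of the study protocol.

```python
# Points per ultrasound pattern, as described in the text:
# N = normal aeration, B1 = moderate loss, B2 = severe loss, C = consolidation.
PATTERN_POINTS = {"N": 0, "B1": 1, "B2": 2, "C": 3}

# The four regions examined on each side (eight regions in total).
REGIONS = ("anterior-superior", "anterior-inferior", "lateral", "postero-basal")
SIDES = ("right", "left")


def region_score(patterns_seen):
    """Score one region by its worst (highest-scoring) visible pattern."""
    if not patterns_seen:
        raise ValueError("at least one pattern must be recorded per region")
    return max(PATTERN_POINTS[p] for p in patterns_seen)


def lusm_score(exam):
    """Sum the worst-pattern scores of the 8 LUSm regions (range 0-24).

    `exam` maps (side, region) -> list of pattern labels seen in that region.
    """
    expected_keys = {(side, region) for side in SIDES for region in REGIONS}
    if set(exam) != expected_keys:
        raise ValueError("exam must cover exactly the 8 LUSm regions")
    return sum(region_score(patterns) for patterns in exam.values())


if __name__ == "__main__":
    # Hypothetical examination: normal everywhere except the postero-basal areas.
    exam = {(side, region): ["N"] for side in SIDES for region in REGIONS}
    exam[("right", "postero-basal")] = ["B1", "C"]  # worst pattern C -> 3 points
    exam[("left", "postero-basal")] = ["B2"]        # -> 2 points
    print(lusm_score(exam))  # 5
```

Keeping the per-region worst pattern separate from the summation makes it easy to check that a score outside 0–24 can only come from an incomplete or mislabelled examination.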

For the diaphragm ultrasound examination, the sonographers measured diaphragm thickness using a 7–10 MHz linear probe in B mode (Micromax® Sonosite), following the technique described in other studies [11,12,13,14,15,16,17,18, 20, 27]. The right hemidiaphragm was visualised in the zone of apposition, on the midaxillary line between the 8th and 10th intercostal spaces, with the patient in a semi-recumbent position (20°–40°). The diaphragm was viewed in M-mode as a hypoechoic structure between two echogenic lines (the diaphragmatic pleura and the peritoneal membrane). The sonographers captured at least three M-mode images during spontaneous breathing, measuring diaphragm thickness at end-expiration and end-inspiration. TI was calculated for each of three measurements using the formula (end-inspiratory diaphragm thickness − end-expiratory diaphragm thickness)/end-inspiratory diaphragm thickness, and the three values were averaged.
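A minimal sketch of the TI calculation, using the formula exactly as stated above (difference divided by end-inspiratory thickness) and averaging over repeated captures. The function names and the thickness values are hypothetical, chosen only for illustration.

```python
from statistics import mean


def thickening_index(end_insp_mm, end_exp_mm):
    """TI for one M-mode capture, using the formula given in the text:
    (end-inspiratory thickness - end-expiratory thickness) / end-inspiratory thickness.
    """
    if end_insp_mm <= 0 or end_exp_mm <= 0:
        raise ValueError("diaphragm thickness must be positive")
    return (end_insp_mm - end_exp_mm) / end_insp_mm


def averaged_ti(captures):
    """Average TI over repeated captures, given as (end_insp_mm, end_exp_mm) pairs."""
    if not captures:
        raise ValueError("need at least one capture")
    return mean(thickening_index(insp_mm, exp_mm) for insp_mm, exp_mm in captures)


if __name__ == "__main__":
    # Hypothetical thicknesses (mm) from three breaths.
    captures = [(2.5, 2.0), (2.4, 1.8), (2.6, 2.0)]
    print(f"TI = {averaged_ti(captures):.1%}")  # TI = 22.7%
```

Computing TI per capture and then averaging (rather than averaging thicknesses first) mirrors the description of averaging three TI measurements.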

Interobserver agreement study

Both sonographers took TI and LUSm measurements in the same sample of 50 stable patients, less than 5 h apart. This sample was distinct from that of the predictive accuracy study.

Predictive accuracy study

In the predictive accuracy study, once patients were ready to start weaning according to the international consensus conference criteria [28], the ventilator was set to 8 cm H2O pressure support (PS) and 5 cm H2O positive end-expiratory pressure (PEEP), and ultrasound and ventilatory measurements were taken. The ventilatory measurements were recorded automatically by the ventilator (model: GE Datex-Ohmeda Engström Carestation). The spontaneous breathing trial (SBT) was then continued with a T-tube or with 8 cm H2O PS and 5 cm H2O PEEP, at the discretion of the attending physician, who also judged whether the patient had passed the SBT; patients who passed were extubated. The medical team was blinded to the ultrasound results, and the research team played no role in the patients' weaning. According to the 2007 international consensus conference [28], weaning failure is defined as either failure of the SBT or failure of extubation. Extubation failure is defined as reintubation, non-invasive ventilatory support or death within 48 h of extubation.
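The outcome definitions above can be expressed as a small classification function. This is a sketch under the assumption that each patient's course is summarised by a few yes/no fields; `WeaningCourse`, `classify` and `weaning_failed` are hypothetical names, not part of any study software.

```python
from dataclasses import dataclass


@dataclass
class WeaningCourse:
    """Hypothetical record of one patient's weaning attempt."""
    passed_sbt: bool
    # Events within 48 h of extubation (only relevant if the SBT was passed,
    # since patients who passed were extubated).
    reintubated_48h: bool = False
    noninvasive_support_48h: bool = False
    died_48h: bool = False


def classify(course: WeaningCourse) -> str:
    """Label the outcome using the definitions given in the text."""
    if not course.passed_sbt:
        return "SBT failure"
    if course.reintubated_48h or course.noninvasive_support_48h or course.died_48h:
        return "extubation failure"
    return "weaning success"


def weaning_failed(course: WeaningCourse) -> bool:
    # Weaning failure = failure of the SBT or failure of extubation.
    return classify(course) != "weaning success"


if __name__ == "__main__":
    cohort = [
        WeaningCourse(passed_sbt=False),
        WeaningCourse(passed_sbt=True, noninvasive_support_48h=True),
        WeaningCourse(passed_sbt=True),
    ]
    for c in cohort:
        print(classify(c), weaning_failed(c))
```

Checking SBT failure first reflects the sequence in the protocol: only patients who pass the SBT are extubated and can then fail extubation.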

Statistical analysis

Data were expressed as medians and interquartile ranges (IQR) or percentages. To compare continuous variables, we used the unpaired Student’s t test, Mann–Whitney U test and Wilcoxon test. For categorical variables, we applied the Chi-square or Fisher’s exact test.

To evaluate interobserver agreement for LUSm, we calculated the quadratic-weighted kappa coefficient (which is comparable to the intraclass correlation coefficient, ICC); for TI, we used the ICC and the Bland–Altman method.
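For readers unfamiliar with these agreement statistics, a minimal stdlib sketch of the quadratic-weighted kappa and of Bland–Altman limits of agreement follows. The analysis itself was done in StatsDirect; this is not its implementation, the conventional 1.96 multiplier for 95% limits is an assumption, and the example readings are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def quadratic_weighted_kappa(rater_a, rater_b):
    """Cohen's kappa with quadratic weights for two raters' ordinal scores.

    Uses kappa_w = 1 - sum(w * observed) / sum(w * expected) with
    w = (i - j)**2; the usual 1/(k - 1)**2 weight scaling cancels out.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # Observed mean squared disagreement between paired ratings.
    observed = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b)) / n
    # Expected squared disagreement if the raters were independent
    # but kept their own marginal score distributions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        na * nb * (i - j) ** 2
        for i, na in count_a.items()
        for j, nb in count_b.items()
    ) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - observed / expected


def bland_altman(x, y, z=1.96):
    """Return (bias, lower limit, upper limit) of agreement: bias +/- z * SD."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical paired readings from two sonographers.
    lusm_a, lusm_b = [3, 7, 12, 5, 9], [4, 7, 11, 5, 10]
    ti_a, ti_b = [0.30, 0.22, 0.35, 0.18, 0.27], [0.28, 0.25, 0.33, 0.20, 0.26]
    print(round(quadratic_weighted_kappa(lusm_a, lusm_b), 3))
    print([round(v, 3) for v in bland_altman(ti_a, ti_b)])
```

Quadratic weights penalise a 3-point LUSm disagreement nine times more than a 1-point one, which is why this kappa behaves much like an ICC for ordinal scores.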

In the predictive accuracy study, we calculated the areas under the receiver operating characteristic (ROC) curves (AUCs) and the corresponding sensitivities, specificities and likelihood ratios (LR+ and LR−) at the optimal cut-off points, to determine the predictive value of TI and LUSm for weaning and extubation success. We developed a predictive model using binary logistic regression, with the ultrasound measures (LUSm and TI) as independent variables, to predict successful weaning.
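The ROC quantities above can be illustrated with a short sketch. The text does not state how the optimal cut-off was chosen; this example assumes Youden's index (sensitivity + specificity − 1), and the function names and data are hypothetical. It uses the fact that the empirical AUC equals the Mann–Whitney U statistic divided by the product of the group sizes.

```python
def auc(success_scores, failure_scores):
    """Empirical ROC AUC: P(success score > failure score), ties counted as 1/2.

    This equals the Mann-Whitney U statistic divided by n_success * n_failure.
    Assumes higher scores predict success (negate LUSm before calling).
    """
    wins = 0.0
    for s in success_scores:
        for f in failure_scores:
            if s > f:
                wins += 1.0
            elif s == f:
                wins += 0.5
    return wins / (len(success_scores) * len(failure_scores))


def metrics_at_cutoff(success_scores, failure_scores, cutoff):
    """Predict success when score >= cutoff; return (sens, spec, LR+, LR-)."""
    sens = sum(s >= cutoff for s in success_scores) / len(success_scores)
    spec = sum(f < cutoff for f in failure_scores) / len(failure_scores)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return sens, spec, lr_pos, lr_neg


def youden_cutoff(success_scores, failure_scores):
    """Cut-off that maximises Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(success_scores) | set(failure_scores))

    def youden_j(cutoff):
        sens, spec, _, _ = metrics_at_cutoff(success_scores, failure_scores, cutoff)
        return sens + spec - 1

    return max(candidates, key=youden_j)


if __name__ == "__main__":
    # Hypothetical TI values (fractions) for successfully weaned / failed patients.
    ti_sw, ti_fw = [0.30, 0.28, 0.35, 0.22], [0.18, 0.25, 0.20]
    cut = youden_cutoff(ti_sw, ti_fw)
    print(round(auc(ti_sw, ti_fw), 3), cut, metrics_at_cutoff(ti_sw, ti_fw, cut))
    # For LUSm (lower = better), pass negated scores so that
    # "-LUSm >= -c" means "LUSm <= c predicts success".
    lusm_sw, lusm_fw = [3, 5, 6, 9], [8, 10, 7]
    print(round(auc([-x for x in lusm_sw], [-x for x in lusm_fw]), 3))
```

With these toy values the TI cut-off of 0.28 gives sensitivity 0.75 and specificity 1.0, so LR+ is infinite; with real data, small samples make such extreme ratios a warning sign of imprecision rather than a result.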

We used the StatsDirect v3.0.194 package to perform the statistical analysis.

Ethical aspects

The research ethics committee of Elche General University Hospital approved the study and all enrolled patients gave their informed consent.

Results

In the interobserver agreement study, the quadratic-weighted kappa for LUSm was 0.95 (95% CI 0.92, 0.98), indicating almost perfect interobserver agreement. For TI, the ICC was 0.78 (95% CI 0.65, 0.87), indicating moderate to good interobserver agreement, and the Bland–Altman method showed interobserver differences of ±12.5% (Fig. 1).

Fig. 1 Bland–Altman plot of interobserver differences in TI measurement

Over the study period, 139 patients underwent MV, of whom 52 did not meet the inclusion criteria (48 deaths before attempted weaning, 2 self-extubations, 2 on MV for less than 24 h) and 17 were not included for reasons beyond the research team's control (eight weaned gradually from MV, two without informed consent, two transferred to another hospital, four eligible patients about whom the investigator was not notified, one non-functioning ultrasound scanner). The baseline characteristics of the 69 patients recruited are shown in Table 1. Pressure support was used in 49% of SBTs, a T-tube in 42% and both methods in 9%. Eight patients failed the SBT and 61 were extubated, of whom 17 failed extubation. In total, therefore, 25 patients failed weaning. Most patients who failed extubation recovered with non-invasive ventilation (NIV) and high-flow nasal cannula (HFNC); only five patients (8.2% of those extubated) were reintubated (Fig. 2).
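The outcome counts reported above fit together as follows. This worked arithmetic uses only the reported counts and makes the denominators explicit: the 8.2% reintubation rate refers to the 61 extubated patients, whereas the SBT and extubation failure percentages quoted in the Discussion appear to refer to all 69 recruited patients.

```python
# Counts reported for the predictive accuracy cohort.
recruited = 69
failed_sbt = 8
extubated = recruited - failed_sbt          # 61
failed_extubation = 17
reintubated = 5

# Weaning failure = SBT failure or extubation failure.
failed_weaning = failed_sbt + failed_extubation  # 25

print(f"SBT failure: {failed_sbt / recruited:.1%} of recruited")
print(f"Extubation failure: {failed_extubation / recruited:.1%} of recruited, "
      f"{failed_extubation / extubated:.1%} of extubated")
print(f"Weaning failure: {failed_weaning} patients ({failed_weaning / recruited:.1%})")
print(f"Reintubation: {reintubated / extubated:.1%} of extubated")
```

Relative to the 61 extubated patients, the 17 extubation failures correspond to 27.9% rather than 24.6%, so the choice of denominator matters when comparing with other studies.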

Table 1 Characteristics of patients included in the study
Fig. 2 Flowchart of patients

When the successfully weaned (SW) group was compared with the group that failed weaning (FW) (Table 1), the FW group showed longer time on MV, more cases of chronic obstructive pulmonary disease (COPD), higher LUSm and mortality, and lower TI and SpO2. The median differences in TI and LUSm between the SW and FW groups were 11% and 3 points, respectively.

The area under the ROC curve for predicting weaning success was 0.80 for LUSm (95% CI 0.69, 0.91), 0.71 for TI (95% CI 0.58, 0.84) (Fig. 3) and 0.83 for both combined (Fig. 4). Table 2 shows the sensitivity, specificity and likelihood ratios at the optimal cut-off points for successful weaning. The area under the ROC curve for predicting extubation success was 0.78 for LUSm (95% CI 0.64, 0.91) and 0.76 for TI (95% CI 0.61, 0.90) (Fig. 3). Table 3 shows the sensitivity, specificity and likelihood ratios for extubation success at the same cut-off points as in Table 2.

Fig. 3 ROC curves for the predictive value of TI in successful weaning (SW) (+) and successful extubation (SE) (□), and of LUSm in SW (○) and SE (△). In SW: LUSm AUC 0.80; TI AUC 0.71. In SE: LUSm AUC 0.78; TI AUC 0.76

Fig. 4 ROC curve for the predictive value of TI plus LUSm in successful weaning (SW). AUC 0.83

Table 2 Comparison with other studies of predictive value of TI and LUS for successful weaning
Table 3 Predictive value of LUSm and TI for successful extubation

Discussion

According to our data, the reproducibility of ultrasound is excellent for LUSm and moderate to good for TI. Regarding the prognostic accuracy of ultrasound for weaning outcome, we found that patients with TI below 24% or LUSm greater than 7 points are at high risk of weaning failure, with an AUC of 0.80 for LUSm and 0.71 for TI. We found similar values for extubation outcome.
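Taken literally, the cut-offs above define an either/or screening rule, sketched below. This is an illustration only: the study evaluated each predictor's cut-off separately, and the sensitivity and specificity of this combined rule were not reported. The function name is hypothetical; note the strict inequalities, so a TI of exactly 24% or an LUSm of exactly 7 is not flagged.

```python
# Cut-offs reported in the text (TI expressed as a fraction).
TI_CUTOFF = 0.24    # TI below 24% -> high risk
LUSM_CUTOFF = 7     # LUSm greater than 7 points -> high risk


def high_risk_of_weaning_failure(ti: float, lusm: int) -> bool:
    """Flag a patient if either ultrasound criterion is met."""
    return ti < TI_CUTOFF or lusm > LUSM_CUTOFF


if __name__ == "__main__":
    print(high_risk_of_weaning_failure(0.30, 5))   # False: both criteria favourable
    print(high_risk_of_weaning_failure(0.30, 8))   # True: LUSm above 7
    print(high_risk_of_weaning_failure(0.20, 4))   # True: TI below 24%
```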

Mean time on MV [29, 30], mortality in our study population (16%) [4] and the SBT failure rate (11.6%) [31] were consistent with previously published results. Extubation failure occurred in 24.6% of the patients, a slightly higher proportion than the 10% to 20% reported in a number of other studies [1, 5, 32,33,34,35]. Of the 17 patients who failed extubation, only five (8.2% of extubated patients) required reintubation, a lower rate than reported in other studies [36, 37]. This suggests that NIV plays a decisive role in reducing the need for reintubation without increasing morbidity or mortality [37, 38]. Therefore, although NIV after extubation is conceptually defined as a criterion of weaning failure [28], we consider that recovery with NIV in fact constitutes a success in terms of the patient's clinical situation. FW patients were more likely to have COPD, which is expected, as COPD is a risk factor for extubation failure [39, 40]. None of the standard weaning predictors assessed in our study (PIMax, RSBI, P0.1) proved useful. Likewise, the largest study of weaning predictors [6], which included about 500 patients and assessed many indices (minute volume, respiratory rate, PaO2, RSBI, PIMax, maximum expiratory pressure, dynamic respiratory compliance and the CROP index), found that none of them had value as a predictor of weaning.

The results obtained for LUSm are consistent with previously reported data [8,9,10,11]. Our cut-off point is lower (7 LUSm points) because we assessed eight lung areas, whereas the other studies assessed 12. Our aim was to assess all the areas normally affected in critically ill patients [26] while simplifying the technique so that the patient did not have to be moved and the associated complications could be avoided. We therefore consider LUSm a useful new proposal that benefits both the patient and the operator. For TI, we found a cut-off point of 24% for predicting successful weaning, within the range of values reported in other studies (20–36%) [11,12,13,14,15,16,17,18,19,20]. The LUS and TI variables tested in those studies showed higher predictive value for weaning success (Table 2), but several factors may have influenced this result. In some of those studies, patient selection was very strict, resulting in homogeneous samples with specific characteristics (patients with COPD [15], tracheotomised patients with prior weaning failure [13], patients with ICU-acquired weakness [17]). Since these samples already had a higher probability of weaning failure, their results cannot be generalised to the whole population of critical care patients. Other limitations of the reviewed studies included exclusion of patients who died [20], diaphragm ultrasound performed while the patient was on MV rather than during the SBT [20], use of a non-validated probe for diaphragm measurement [18], intervals of up to 36 h between the ultrasound scan and extubation [12], and use of SBT failure rather than extubation failure as the endpoint [13]. Only one of the reviewed studies included a reproducibility analysis comparing TI measurements by two observers, with results very similar to ours (ICC 0.81) [14].

We chose not to restrict our study to specific patient populations, in order to obtain a heterogeneous sample and generalisable results. The ultrasound measurements were taken within the first minutes of pressure support ventilation, the physicians were blinded to the ultrasound data, no patients were lost to follow-up, and the median time between SBT and extubation was 120 min.

One limitation of our study is the small sample size, which led to imprecise results with broad confidence intervals, especially for TI. Further studies with larger samples of patients are required to establish the true predictive power of these ultrasound techniques.

It should be noted that the interobserver difference in TI measurement (±12.5%) was greater than the median difference in TI between the SW and FW groups (11%). In our sample, therefore, TI measurements carried a degree of uncertainty. This is probably because, with the formula used to calculate TI, a difference of a tenth of a millimetre in the measurement of inspiratory or expiratory diaphragm thickness has a considerable effect on the result. None of the reviewed studies considered this possibility. We believe the problem may be overcome with better ultrasound equipment and a more effective TI measuring technique, so that the true value of this parameter can be established.
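The sensitivity of TI to small thickness errors can be made concrete. Since TI = (T_insp − T_exp)/T_insp = 1 − T_exp/T_insp, an error δ in the end-expiratory thickness shifts TI by δ/T_insp; for a hypothetical 2.5 mm diaphragm, 0.1 mm moves TI by 4 percentage points. The sketch below uses the formula from the Methods with hypothetical thicknesses.

```python
def thickening_index(end_insp_mm, end_exp_mm):
    # Formula from the Methods: (inspiratory - expiratory) / inspiratory thickness.
    return (end_insp_mm - end_exp_mm) / end_insp_mm


# Hypothetical "true" thicknesses in mm.
end_insp, end_exp = 2.5, 2.0
scenarios = {
    "reference": (end_insp, end_exp),
    "expiratory 0.1 mm low": (end_insp, end_exp - 0.1),
    "inspiratory 0.1 mm high": (end_insp + 0.1, end_exp),
    "both errors": (end_insp + 0.1, end_exp - 0.1),
}
for label, (insp_mm, exp_mm) in scenarios.items():
    print(f"{label:>24}: TI = {thickening_index(insp_mm, exp_mm):.1%}")
```

In this example, two 0.1 mm errors in opposite directions move TI from 20.0% to 26.9%, across the 24% cut-off, which is of the same order as the 11% group difference.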

In the future, to allow results to be compared across studies, we believe several items should be standardised: the ultrasound technique, the definitions of weaning failure and extubation failure, the time between ultrasound and extubation, blinding of ultrasound findings, and the weaning protocol. If results are to be applied to the whole population of general critical care patients on MV, a heterogeneous sample should be used. Studies in this domain should also assess reproducibility by measuring interobserver agreement in ultrasound measurements. In addition, it may be useful to consider other parameters, such as time on MV, disease severity and comorbidities, alongside lung and diaphragm ultrasound measurements, to better predict weaning success.

Conclusions

In our study, interobserver agreement was excellent for LUSm measurements and moderate to good for TI measurements. TI showed a degree of uncertainty for predicting weaning outcome, but its overall predictive value was acceptable. LUSm performed better in this regard. Lung and diaphragm ultrasound are promising techniques for predicting weaning outcome, but more studies are required to verify their reproducibility. These studies should have a standardised design and should assess interobserver agreement of ultrasound techniques. Using an unselected sample would help ensure that results can be generalised to all patients on MV.