Introduction

The assessment of left ventricular ejection fraction (LVEF) is part of the point-of-care echocardiographic evaluation of critically ill patients [1,2,3]. Its manual measurement is time-consuming and operator dependent. Machine learning algorithms have recently been developed to facilitate, automate, and decrease the variability of echocardiographic measurements [4,5,6,7]. Several algorithms have been designed specifically for the real-time assessment of LVEF [8,9,10]. They have been trained to recognize specific ultrasound images, enable instantaneous image quality control, and measure LVEF automatically in just a few seconds. However, clinical validation studies remain scarce and have been conducted in ambulatory cardiac patients [8,9,10].

In critically ill patients, we compared real-time LVEF measurements taken with a new machine learning algorithm to reference manual measurements taken by experts in echocardiography.

Methods

We prospectively studied critically ill patients who required an echocardiographic evaluation during their ICU stay and in whom it was possible to obtain transthoracic images enabling a manual and quantitative evaluation of left ventricular systolic function. Real-time LVEF measurements were taken with a machine learning algorithm (Real-Time EF, GE Healthcare, Chicago, USA) installed on a cart-based ultrasound system (Venue, GE Healthcare). The real-time LVEF software is a neural network algorithm that has been trained with thousands of cardiac images to automatically detect the 4-chamber view of the heart, locate landmarks on the left ventricular wall, and detect end-diastolic and end-systolic times from the mitral valve motion. Once the endocardial border is detected, the algorithm provides immediate user feedback regarding image quality using color-coding. When image quality is considered acceptable (green or yellow endocardial border displayed on screen), left ventricular volumes are automatically estimated using the single-plane Simpson disk method, enabling LVEF calculation from real-time end-diastolic and end-systolic volumes.
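The volume and LVEF calculation described above can be illustrated with a minimal sketch. It assumes the textbook form of the single-plane method of disks, in which the ventricle is sliced into equal-height disks along its long axis and each disk is treated as a circular cylinder. The function names, the number of disks, and the input values are hypothetical, and the sketch is not the vendor's implementation.

```python
import math


def simpson_single_plane_volume(diameters_cm, long_axis_cm):
    """Approximate a ventricular volume (mL) by the single-plane method of disks.

    diameters_cm: endocardial diameters measured at equally spaced levels
                  along the long axis (one value per disk).
    long_axis_cm: length of the ventricular long axis.
    Assumption: each disk is a circular cylinder of height long_axis / n.
    """
    n = len(diameters_cm)
    if n == 0 or long_axis_cm <= 0:
        raise ValueError("need at least one diameter and a positive long axis")
    disk_height = long_axis_cm / n
    # 1 cm^3 equals 1 mL, so the sum is directly a volume in mL.
    return sum(math.pi / 4.0 * d ** 2 * disk_height for d in diameters_cm)


def ejection_fraction(edv_ml, esv_ml):
    """LVEF (%) from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Hypothetical volumes, for illustration only.
lvef = ejection_fraction(edv_ml=120.0, esv_ml=54.0)
```

In practice the diameters would come from the traced endocardial border at end-diastole and end-systole; here they are plain lists so that the arithmetic can be checked by hand.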

Real-time LVEF measurements obtained by a novice (LVEFNov) and by an expert (LVEFExp) were compared with LVEF measurements taken manually by an expert in critical care echocardiography (LVEFRef). Seven novices (all residents in our department and beginners in echocardiography) and two experts (senior intensivists with the European Diploma in Advanced Critical Care Echocardiography) participated in data collection. Measurements taken in triplicate were averaged for comparisons, and the intra-operator reproducibility was assessed by calculating the coefficient of variation (standard deviation divided by the mean) expressed as a percentage.
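As an illustration of the reproducibility metric, the following sketch computes the coefficient of variation for one set of triplicate measurements. The text does not state whether the sample or the population standard deviation was used; the sample form is assumed here, and the function name and values are hypothetical.

```python
import statistics


def coefficient_of_variation(measurements):
    """Coefficient of variation (%) = standard deviation / mean * 100.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    mean = statistics.mean(measurements)
    if mean == 0:
        raise ValueError("mean of measurements must be non-zero")
    return 100.0 * statistics.stdev(measurements) / mean


# Hypothetical triplicate LVEF measurements (%) for one patient and one operator.
triplicate = [50.0, 55.0, 60.0]
averaged_lvef = statistics.mean(triplicate)    # value used for comparisons
cv_percent = coefficient_of_variation(triplicate)
```

The per-patient coefficients can then be summarized as mean ± SD across the cohort, which is how the reproducibility figures are reported in the Results.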

The quality of echo images was classified as good, fair, or poor by the experts, and as green (optimal), yellow (acceptable), or red (not acceptable for real-time LVEF measurements) by the machine learning algorithm.

Results are expressed as mean ± standard deviation (SD). Agreement between real-time and reference LVEF measurements was tested using the Bland–Altman method. Statistical comparisons were made with a t-test. A p value < 0.05 was considered statistically significant.
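For readers less familiar with the Bland–Altman method, the sketch below shows the usual computation: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias ± 1.96 times the standard deviation of those differences. The function name and the data are hypothetical and are not study data.

```python
import statistics


def bland_altman(test_values, reference_values):
    """Return (bias, sd, lower_limit, upper_limit) for paired measurements.

    Differences are computed as test minus reference.
    Assumption: limits of agreement are bias +/- 1.96 * sample SD.
    """
    if len(test_values) != len(reference_values) or len(test_values) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [t - r for t, r in zip(test_values, reference_values)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired LVEF values (%), real-time versus reference.
real_time = [50.0, 60.0, 40.0, 55.0]
reference = [52.0, 58.0, 41.0, 53.0]
bias, sd, lower, upper = bland_altman(real_time, reference)
```

In this framework the bias quantifies accuracy and the width of the limits of agreement quantifies precision.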

Results

We prospectively enrolled 95 patients (mean age 60 ± 17 years) over a 9-month period. Most patients were admitted for medical reasons and 32 (34%) were mechanically ventilated at the time of the ultrasound evaluation (Additional file 1: Table S1). Reference LVEF ranged from 26 to 80% (mean 54 ± 12%) and the reproducibility of manual measurements was 9 ± 6%. Thirty patients (32%) had an LVEFRef < 50% (left ventricular systolic dysfunction).

Real-time LVEFExp ranged from 31 to 68% (mean 54 ± 10%). We observed a strong relationship (r = 0.86, p < 0.001) between reference and real-time LVEFExp (Fig. 1). The average difference (bias) between real-time LVEFExp and reference LVEF was 0 ± 6% with 95% limits of agreement of −12 to +11% (Fig. 1). The intra-operator reproducibility of measurements was better for real-time LVEFExp than for reference manual measurements (5 ± 4% vs. 9 ± 6%, p < 0.001). The sensitivity and specificity of real-time LVEFExp to detect systolic dysfunction were 70% and 98%, respectively.

Fig. 1 Correlation and Bland–Altman comparison between reference left ventricular ejection fraction measurements taken by experts (LVEFRef) and real-time measurements taken with a machine learning algorithm. Left: real-time measurements taken by experts (LVEFExp), right: real-time measurements taken by novices (LVEFNov)

Real-time LVEFNov ranged from 28 to 70% (mean 54 ± 9%). We observed a strong relationship (r = 0.81, p < 0.001) between LVEFRef and real-time LVEFNov (Fig. 1). The average difference (bias) between real-time LVEFNov and LVEFRef was 0 ± 7% with 95% limits of agreement of −14 to +13% (Fig. 1). The intra-operator reproducibility of measurements was better for real-time LVEFNov than for reference manual measurements (6 ± 5% vs. 9 ± 6%, p < 0.001). The sensitivity and specificity of real-time LVEFNov to detect systolic dysfunction were 73% and 98%, respectively.
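The sensitivity and specificity figures can be understood with the following minimal sketch. It assumes that the same 50% cut-off used to define systolic dysfunction on the reference measurement is also applied to the real-time measurement, which the text does not state explicitly. The function name and the data are hypothetical and are not study data.

```python
def sensitivity_specificity(reference, test, threshold=50.0):
    """Sensitivity and specificity of `test` for detecting LVEF < threshold.

    reference: reference LVEF values (%) defining true dysfunction.
    test:      real-time LVEF values (%) for the same patients.
    Assumption: the same threshold is applied to both measurements.
    """
    tp = fn = tn = fp = 0
    for ref, measured in zip(reference, test):
        has_dysfunction = ref < threshold
        flagged = measured < threshold
        if has_dysfunction and flagged:
            tp += 1
        elif has_dysfunction:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without dysfunction")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical paired values (%), for illustration only.
reference_lvef = [30.0, 45.0, 48.0, 55.0, 60.0, 65.0]
real_time_lvef = [35.0, 52.0, 47.0, 56.0, 49.0, 70.0]
sens, spec = sensitivity_specificity(reference_lvef, real_time_lvef)
```

A missed case arises whenever a patient with a reference LVEF just below the cut-off is measured just above it, so moderate sensitivity is compatible with a small average bias.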

According to experts’ judgement, the quality of echo images was good, fair, and poor in 41, 43, and 11 patients, respectively. The average difference (bias) between real-time and reference LVEF measurements was comparable when images were of good quality (n = 41) and of fair or poor quality (n = 54), both for experts and novices (Table 1). Results did not change significantly after excluding the 11 patients with poor image quality (Table 1).

Table 1 Main results in subgroups based on image quality

According to the machine learning algorithm, the quality of echo images was green, yellow, and red flagged in 80, 15 and 0 patients, respectively. Results did not change significantly after excluding the 15 patients in whom images were non-optimal/yellow flagged (Table 1).

The average difference (bias) between real-time and reference LVEF measurements was slightly higher in mechanically ventilated (n = 32) than in non-mechanically ventilated patients (n = 63), both for experts (−2 ± 7% vs. 0 ± 5%) and novices (−1 ± 8% vs. 0 ± 6%). However, observed differences did not reach statistical significance.

Discussion

An increasing number of anesthesiologists and intensivists have been trained to perform qualitative echocardiographic assessments [1,2,3]. However, quantitative evaluations remain challenging for many, particularly for novices. In the present study, we tested an artificial intelligence-enabled tool specifically designed to facilitate and automate the bedside measurement of LVEF. Our findings suggest that this tool enables a clinically acceptable estimation of LVEF when compared to manual measurements. They also suggest that the real-time LVEF tool enables novices to assess LVEF with better reproducibility than experts can achieve manually.

Several machine learning algorithms have been designed to assess LVEF from a parasternal long-axis view or from an apical 2- or 4-chamber view [8,9,10]. Comparison studies published so far have yielded promising results. Indeed, close correlations and good agreement have been reported between LVEF measurements taken by skilled operators and by machine learning algorithms, particularly when the algorithm detects and analyzes the apical 4-chamber view [9, 10]. However, clinical validation studies remain scarce and have been conducted in ambulatory cardiac patients. Our study appears to be the first evaluation done in critically ill patients, in whom transthoracic echocardiography is often challenging, in particular when patients are mechanically ventilated. Our findings suggest that the real-time LVEF algorithm may help clinicians, including beginners in echocardiography, to accurately measure LVEF in just a few seconds. Such a tool may further increase the adoption of point-of-care echocardiographic evaluations in critically ill patients.

Our study has limitations. Because ultrasound evaluations are time-consuming, we studied hemodynamically stable patients to ensure comparability between measurements taken at each step of the evaluation (LVEF measurements were first taken by a trainee, then by an expert both manually and with the automatic method). Also, we did not assess the ability of the new real-time LVEF method to track changes in LVEF. Only a small number of patients had severely impaired left ventricular systolic function (LVEFRef < 30%, n = 4) or a hyperkinetic ventricle (LVEFRef > 70%, n = 2). Therefore, future studies will need to assess the clinical value of the real-time LVEF algorithm during hemodynamic instability, in patients with a very low or supranormal LVEF, and during therapeutic interventions (e.g., inotropic stimulation) known to induce significant changes in systolic function.

Conclusion

Machine learning-enabled real-time measurements of LVEF were strongly correlated with manual measurements obtained by experts. The accuracy of real-time LVEF measurements was excellent, and the precision was fair. The reproducibility of LVEF measurements was better with the machine learning system, including for novices. The specificity to detect left ventricular systolic dysfunction was excellent both for experts and novices, whereas the sensitivity could be improved. Studies are needed to confirm our findings in mechanically ventilated patients with cardiogenic shock or hyperdynamic states.