Introduction

Echocardiography, a widely used imaging modality for assessing cardiac structure and function, involves the acquisition of images and subsequent measurement of various parameters [1]. However, the traditional interpretation of echocardiographic images requires manual analysis by trained experts, leading to time-consuming and operator-dependent results [2, 3]. The use of artificial intelligence (AI) in medical imaging has attracted attention due to its potential to improve examination efficiency, consistency, and accuracy over human interpretation [4, 5]. Several studies have demonstrated the potential of deep learning algorithms in classifying echocardiographic images based on specific view classifications, quantification of cardiac volumes, and assessment of cardiac systolic function [6,7,8,9,10]. In previous studies, several groups developed and externally validated an automated deep learning-based workflow for the classification and annotation of echocardiographic images [11, 12].

Therefore, there is a need to explore the use of AI algorithms to automate the measurement process and potentially reduce the overall examination time. The hypothesis of this study is that the implementation of AI algorithms for echocardiographic parameter measurement, after converting dynamic images to DICOM data, will lead to a significant reduction in measurement time compared to manual measurements performed by human experts. The AI's ability to automatically recognize and measure various cardiac parameters is expected to expedite the analysis process and provide efficient and reliable results, potentially revolutionizing the field of echocardiography [13]. We designed a prospective, single-center pilot study to compare the time required for measurement and report creation using conventional manual methods versus fully automated DICOM reading software (US2.ai).

Methods

Study population

The study enrolled patients who underwent echocardiographic evaluation performed by a single sonographer. Patients with arrhythmia and those with poor image quality were also included. Figure 1 shows the study workflow for echocardiographic parameter measurements. In all echocardiographic examinations, no measurements were performed during recording, and the focus was solely on image acquisition. Subsequently, the time required for measurement and for the creation of echocardiographic reports was recorded for both the experienced human examiner using the manual method and the fully automated analysis software. The start point for the manual measurement was defined as "The initial measurement image appears" and the endpoint was "Completion of all measurements". For the manual report creation step, the start point was defined as "Entering initial measurement values" and the endpoint was "Completion of report comments entry". For the fully automated software, report creation started at "Patient screen appears" and ended with "Confirmation and modification fully completed". The fully automated software process was conducted by the same examiner, approximately one month after the manual measurements were taken.

Fig. 1

Flow chart of echocardiographic parameter measurement. The study population and echocardiographic parameter measurement. The same 2D videos and images were used for measurements by both human examiners and AI

Echocardiographic image acquisition

Echocardiography was performed using commercially available ultrasound machines. Image acquisition was performed by an experienced technician certified as an echocardiography technologist by the Japanese Society of Echocardiography. Only the images necessary for the study parameters were recorded, and no measurements were performed during the examination. Image quality was categorized into three grades by the same observer for subsequent assessment. The quality of left ventricle (LV) images was assessed by considering the visibility of segments and the extent of endocardial border delineation in three apical views. The evaluation criteria were as follows: good (0–2 segments poorly visible), fair (3–5 segments poorly visible), and poor (> 5 segments poorly visible). Doppler image quality was evaluated using a similar three-grade system based on the clarity of the Doppler envelopes: good (clear envelopes), fair (partially clear envelopes), and poor (unclear envelopes).
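The LV grading rule described above can be expressed as a small function. This is an illustrative sketch only: the function name `grade_lv_image_quality` and the use of an integer count of poorly visible segments as input are assumptions for illustration, not part of the study's tooling.

```python
def grade_lv_image_quality(poorly_visible_segments: int) -> str:
    """Map the number of poorly visible LV segments to a quality grade.

    Thresholds follow the criteria stated in the text:
    good (0-2 segments), fair (3-5 segments), poor (> 5 segments).
    """
    if poorly_visible_segments < 0:
        raise ValueError("segment count cannot be negative")
    if poorly_visible_segments <= 2:
        return "good"
    if poorly_visible_segments <= 5:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # Hypothetical segment counts, used only to exercise the function.
    for count in (0, 3, 6):
        print(count, grade_lv_image_quality(count))
```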

Manual assessment

All parameters were selected and measured following the routine examination protocols at our facility in accordance with the guidelines recommended by the American Society of Echocardiography [14]. Apical two- and four-chamber images were included. The biplane method of disks in two dimensions was used to calculate the volumes of the LV. The LV ejection fraction (LVEF) was determined using these volumes. Left atrial (LA) volume was also calculated using the biplane method of disks in two dimensions. Echocardiographic images were obtained for measurements of various parameters, including the interventricular septal thickness in diastole (IVSd), left ventricular internal diameter in diastole and systole (LVIDd, LVIDs), left ventricular posterior wall thickness in diastole (LVPWd), left ventricular mass index (LVMi), relative wall thickness (RWT), the left ventricular end diastolic and systolic volume by the modified Simpson's biplane method (LVEDV and LVESV MOD biplane), and the left ventricular ejection fraction by the modified Simpson's biplane method (LVEF MOD biplane). Images were included for Doppler parameter measurements such as the left and right ventricular outflow tract peak velocities (LVOT Vmax, RVOT Vmax), the aortic valve peak velocity (AoV Vmax), the mitral valve E and A wave velocities (MV-E, MV-A), the deceleration time (DecT), the early and late diastolic tissue Doppler velocities at the lateral and septal mitral annulus (eʹ lateral, eʹ septal, aʹ lateral, aʹ septal), the systolic tissue Doppler velocities at the lateral and septal mitral annulus (sʹ lateral, sʹ septal), the tricuspid regurgitant peak velocity (TR Vmax), the tricuspid annular plane systolic excursion (TAPSE), and the systolic, early diastolic and late diastolic tissue Doppler velocity at the tricuspid annulus (sʹ TAM, eʹ TAM, aʹ TAM). 
In the manual method, the findings from echocardiographic examinations, including measured values, were documented in text format to ensure clear understanding in a clinical context. These reports contained information about the presence of left ventricular and left atrial enlargement, right ventricular and right atrial enlargement, left ventricular wall hypertrophy, and wall motion abnormalities. Furthermore, the reports covered diastolic function, valvular disease, and pulmonary hypertension.

Fully automated software

The study employed US2.ai, a fully automated DICOM reading software known for its speed and compatibility with various echocardiography devices [15]. The software processes 2D and Doppler images and generates complete reports in real time without user interaction (zero-click). The variables it reports are measurements deemed clinically important by international societies (European Association of Cardiovascular Imaging [EACVI] [16], American Society of Echocardiography [ASE] [14]) for a comprehensive adult transthoracic echocardiogram. The software can produce measurements equivalent to expert readings. However, manual adjustments were made in the following cases: (1) when measurements were missing despite the presence of images, and (2) when inaccuracies in measurements or misidentification of images were identified. The time required for these corrections was included in the measurement time.

In the report creation process, LV systolic function, LV diastolic function, LV geometry, RV function, LV and RV size, LA and RA size, the presence of aortic stenosis, pulmonary hypertension, as well as clinical considerations were automatically evaluated. All findings were accompanied by comments following multiple guidelines based on the acquired values. Any missing comments, such as asynergy or valve regurgitation, were manually added by a human in this study. In our study, we defined 'negative report complexity' as cases where the patient's cardiac function is appropriate for their age and does not show any significant abnormalities. Conversely, 'positive report complexity' was assigned to scenarios involving more complicated clinical conditions. This encompasses patients with reduced EF, heart failure, valvular heart disease, pulmonary hypertension, or a combination of these issues. Moreover, cases presenting findings not automatically detected by the software, such as significant valve regurgitation, were also categorized under positive report complexity. Once a report containing clinical results in a format comparable (or not inferior) to those obtained through the manual method was generated, the report creation process was deemed complete.

Statistical analysis

Continuous data were expressed as mean ± standard deviation (SD) and categorical data as absolute numbers and percentages. Student's t-test was used to compare continuous variables, and the Chi-square test was used to compare categorical variables. Agreement between expert human and fully automated measurements for continuous variables was assessed using the intraclass correlation coefficient (ICC). Statistical analyses were performed using SPSS 21.0 (SPSS, Chicago, IL, USA) and MedCalc 19.5.6 (Mariakerke, Belgium). A P value < 0.05 was considered statistically significant.
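To illustrate how an ICC quantifies agreement between two sets of measurements, the following sketch computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single measure). The text does not state which ICC form was used in SPSS or MedCalc, so the choice of ICC(2,1) is an assumption, and the function name `icc_2_1` and the example values are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings[i][j] is the value for subject i from rater j, for example
    rater 0 = human expert and rater 1 = automated software.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical paired LVEF values (%), [expert, software]; not study data.
    example = [[60.0, 58.0], [45.0, 47.0], [35.0, 36.0], [65.0, 63.0]]
    print(round(icc_2_1(example), 3))
```

Values close to 1 indicate that the two methods give nearly interchangeable measurements, whereas a systematic offset between raters lowers this absolute-agreement form of the ICC.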

Sample size determination

We performed sample size calculations using the following methodology. The total time for the manual process was estimated at 23 min based on input from multiple individuals. Furthermore, drawing insights from various sources, we projected an examination time of 15 min when utilizing AI, indicating an expected time difference of 8 min compared to the manual method. The SD for the manual process was assumed to be 10 min. Additionally, we hypothesized that the AI method would consistently yield time savings compared to the manual approach. The objective was to determine the minimum required sample size to detect this difference, considering an 80% statistical power and a significance level of 5%. Employing a paired t-test model, we utilized the mean difference and SD between manual and AI measurements for our calculations. Based on our analyses, we concluded that a sample size of approximately 21 participants per group would yield statistically significant results when the AI method is employed and the SD is approximately 15.
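The paired-design calculation described above can be sketched with a normal approximation, n = ((z_alpha + z_beta) × SD / mean difference)². This is not necessarily the authors' exact procedure: the function name `paired_sample_size` is hypothetical, the normal approximation slightly underestimates the t-based requirement, and the text leaves open both the SD of the paired differences (10 and 15 are mentioned) and whether the test was one- or two-sided, so the sketch exposes both as parameters.

```python
import math
from statistics import NormalDist


def paired_sample_size(
    mean_diff: float,
    sd_diff: float,
    alpha: float = 0.05,
    power: float = 0.80,
    two_sided: bool = True,
) -> int:
    """Approximate number of pairs needed to detect mean_diff in a paired design.

    Uses the normal approximation n = ((z_alpha + z_beta) * sd_diff / mean_diff)^2,
    rounded up to the next whole subject.
    """
    if mean_diff == 0 or sd_diff <= 0:
        raise ValueError("mean_diff must be non-zero and sd_diff positive")
    std_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(power)
    n = ((z_alpha + z_beta) * sd_diff / abs(mean_diff)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Expected difference of 8 min (23 min manual vs 15 min AI), as in the text.
    for sd in (10, 15):
        for two_sided in (True, False):
            n = paired_sample_size(8, sd, two_sided=two_sided)
            print(f"SD={sd} min, two_sided={two_sided}: n={n}")
```

Running the sketch shows how strongly the required sample size depends on the assumed SD and on the sidedness of the test.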

Results

Clinical backgrounds

We investigated a cohort of 23 subjects (mean age, 57 ± 17 years; 30% male), comprising the required minimum sample size of 21 cases and 2 additional preliminary cases. Patient details are provided in Supplementary Table 1 and Supplementary Table 2. The distribution of echocardiogram requests in the study cohort was as follows: 13 cases were designated for screening, 4 cases pertained to diagnosed and monitored ischemic heart disease, and 2 cases sought an assessment for arrhythmia. Furthermore, there was one case each of hypertrophic cardiomyopathy, severe pulmonary hypertension, and severe aortic stenosis. Additionally, one case required an echocardiogram to be conducted in the intensive care unit. The echocardiographic image quality was categorized as follows: 16 cases were rated as good, 6 cases as fair, and 1 case as poor.

Measurement and report creation by AI

Table 1 indicates the count of measurements successfully captured by AI across various echocardiographic parameters. The following parameters were recognized and measured by the AI with a success rate of 100%: LVIDd, LVIDs, IVSd, LVPWd, LV mass, RWT, LVEDV MOD biplane, LVESV MOD biplane, LVEF MOD biplane, LAESV MOD biplane, MV-E, DecT, E/eʹ lateral, sʹ lateral, eʹ lateral, TR Vmax, LVOTd, LVOT Vmax, RVOT Vmax, and AoV Vmax. However, AI was unable to evaluate E/A, MV-A, aʹ lateral, sʹ TAM, and eʹ TAM in one case. Additionally, the E/eʹ mean, eʹ lateral, eʹ septal, aʹ septal, and aʹ TAM measurements were not recognized by AI, resulting in inaccuracies in two cases. Adjustments were made for all parameters that deviated significantly from expert measurements. ICC analysis indicated a high level of agreement between expert human and fully automated measurements for all these parameters (all p values < 0.05).

Table 1 The comparison of echocardiographic parameters and time required between manual and the AI

Time required for AI and manual methods

As shown in Supplementary Table 1 and Supplementary Table 2, AI achieved time savings in 96% of cases for measurements and in 100% of cases for report creation compared to the manual method. Table 1 presents a comparison of measurement and report creation times between the manual and AI methods. The manual method required an average measurement time of 325 ± 94 s, while AI took 159 ± 66 s (p < 0.01). In the report creation step, the average time for manual report creation was 429 ± 128 s, whereas AI needed only 71 ± 39 s (p < 0.01). Overall, AI significantly reduced the total time required for measurement and report creation compared to the manual method (230 ± 83 vs 754 ± 206 s, p < 0.01). As depicted in Fig. 2, AI reduced the average time for measurement and report creation per case by 524 s (70%) compared to the manual method.
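The absolute and relative reductions can be recomputed directly from the mean times reported above. The snippet below is a simple arithmetic check; the variable and function names are chosen for illustration only.

```python
# Mean times in seconds, taken from the text above.
manual_measure_s, manual_report_s = 325, 429
ai_measure_s, ai_report_s = 159, 71


def reduction(manual_s: float, ai_s: float) -> tuple[float, float]:
    """Return (seconds saved, percent saved relative to the manual time)."""
    saved = manual_s - ai_s
    return saved, 100 * saved / manual_s


manual_total_s = manual_measure_s + manual_report_s
ai_total_s = ai_measure_s + ai_report_s

saved_total_s, saved_total_pct = reduction(manual_total_s, ai_total_s)
saved_measure_s, saved_measure_pct = reduction(manual_measure_s, ai_measure_s)
saved_report_s, saved_report_pct = reduction(manual_report_s, ai_report_s)

if __name__ == "__main__":
    print(f"measurement: {saved_measure_s} s saved ({saved_measure_pct:.1f}%)")
    print(f"report:      {saved_report_s} s saved ({saved_report_pct:.1f}%)")
    print(f"total:       {saved_total_s} s saved ({saved_total_pct:.1f}%)")
```

The component means sum to the reported totals (754 s manual, 230 s AI), and the total saving of 524 s corresponds to roughly 70% of the manual time.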

Fig. 2

Time difference in echocardiographic measurement and report creation between Human and AI. Compared to the time required for measurements and report creation by humans, using AI enabled an average reduction of 70% in time

Impact of AI on measurement and report creation time

The median AI measurement time was 217 s, leading to the division of patients into two groups based on this median value. Table 2 shows the characteristics of the two groups based on AI time. The group with faster measurements showed significantly fewer modified indications compared to the group with longer measurements (2.3 ± 1.9 vs. 5.2 ± 2.6, p < 0.01). The faster measurement group showed a lower percentage of patients with fair or poor image quality (9% vs. 50%, p = 0.02) and more than mild pericardial effusion (0% vs 33%, p = 0.02) compared to the longer measurement group. Additionally, the number of diagnoses in patients' reports was lower in the faster measurement group (0.7 ± 0.9 vs 2.8 ± 2.1, p < 0.01). Even when Doppler image envelopes were classified as 'fair', they usually matched expert measurements, rarely requiring further adjustments or re-measurements (9% vs. 17%, p = 0.30).

Table 2 Characteristics of groups with and without time requirements

The impact of AI on measurement and report creation time is shown in Fig. 3. In cases with fair or poor image quality (n = 7), the number of corrections to the automated analysis results was higher, and the measurement time was significantly longer than in cases with good image quality (n = 16) (217 ± 51 vs. 133 ± 55 s, p < 0.01). However, no significant difference in report creation time was observed based on image quality (72 ± 43 vs. 70 ± 34 s, p = 0.25) (Fig. 3A).

Fig. 3

Impact of AI on measurement and report creation time. A Difference in image quality; B difference with and without findings

Regarding the influence of report complexity, no significant difference in measurement time was found between cases with negative (n = 14) and positive (n = 9) report complexity (158 ± 67 vs. 160 ± 64 s, p = 0.18). However, report creation time was significantly longer in cases with positive report complexity than in cases with negative report complexity (99 ± 37 vs. 54 ± 29 s, p < 0.01), as it took longer for the human to confirm the findings presented by AI (Fig. 3B).

Discussion

This study conducted a comparative analysis between manual and AI methods in echocardiography, involving 23 consecutive patients. The fully automated AI software exhibited significant potential, reducing echocardiographic analysis time by 70% without compromising accuracy. Patients with faster AI measurements showed a higher frequency of good image quality and a lower number of diagnoses. In cases with fair/poor image quality, more corrections were required, leading to an increase in measurement time. The importance of precise image acquisition by humans was evident, as the obtained measurements directly influenced the report creation process. Overall, the implementation of AI has demonstrated the potential for reducing examination time in the field of echocardiography, thereby making a substantial contribution to enhancing examination efficiency.

A comprehensive and efficient tool for time savings

In this study, the average time for manual acquisition of routine images was approximately 5–6 min. However, performing measurements for all relevant cardiac parameters after image acquisition can be time-intensive, often taking 15 min or more, depending on case complexity and the measurer's experience. This can impose a significant burden on examiners. With the US2.ai cloud-based analysis tool, the automated measurement itself takes less than 1 min after image upload. While there are several reports on time reduction using AI for faster examinations through a semi-automatic approach [17, 18], there are no studies on the extent of time reduction achieved by fully automated software compared to expert manual report creation. Additionally, AI demonstrates nearly 100% recognition and measurement capabilities for the majority of parameters. The ICCs between AI and expert human measurements were significant, indicating a high level of agreement. These findings suggest that AI performs the measurements with good reproducibility and accuracy.

The applicability of AI in echocardiography

While AI demonstrates high efficiency and accuracy in the majority of cases, it is essential to acknowledge its limitations in certain specific situations or patient characteristics. Particularly, when dealing with fair or poor image quality, the automated analysis necessitated more adjustments to its initial measurements compared to cases with good image quality. Consequently, the time required for automated measurements significantly increased for cases with fair or poor image quality in contrast to those with good image quality. This underscores that the precision of automated measurements could be affected by image quality, emphasizing the need for additional refinements when images are less than optimal. A notable aspect of our study was ensuring measurement accuracy, particularly in Doppler imaging. We found that Doppler images rated as 'fair' for envelope clarity typically matched expert assessments, lessening the need for re-analysis. However, when B-mode images are unclear during Doppler measurements, the automated analysis software may sometimes select incorrect parameters, highlighting the importance of image clarity in both Doppler and B-mode echocardiography for reliable evaluations. Another important finding is that, for ultrasound AI tools, image quality plays a crucial role in obtaining reliable automatic measurements [13, 19]. This presents a challenge, as ultrasound outcomes are generally influenced by operator skill. While accurate AI tools can be advantageous for less experienced users, the fundamental prerequisite for accurate automatic measurements remains good image quality. This could pose challenges for less experienced examiners. If AI tools perform well exclusively in the hands of experts, their overall utility might come into question. Hence, users should strive to capture optimal images to ensure precise measurements. Nevertheless, despite best efforts, instances of suboptimal image quality may arise.
In such scenarios, it is recommended not to solely depend on AI-generated measurements but to adopt a collaborative approach with AI in image evaluation. By working in tandem with AI, more favorable outcomes can be achieved, guaranteeing accurate image assessments for each patient.

Implications for clinical practice

AI systems consistently produce standardized results, in contrast to the potential variability in outcomes among human echocardiography technicians arising from differences in experience and skill. Additionally, human attempts to expedite the process may introduce measurement errors. Therefore, the use of AI can improve consistency in test outcomes and reduce the risk of misdiagnosis. Furthermore, integrating AI into echocardiography automates and accelerates result analysis, considerably speeding up report generation compared to conventional methods. This time-saving benefit allows healthcare professionals to allocate more attention to critical responsibilities such as patient examinations and care. Notably, this advancement also positively impacts patients. AI-enabled rapid delivery of echocardiogram results shortens waiting times and alleviates anxiety. This fosters a smoother and less stressful medical encounter, ultimately enhancing the overall patient experience. In summary, AI implementation offers multifaceted patient advantages, providing swifter and more dependable results while bolstering healthcare service efficiency and quality.

Limitations

There are several limitations to this study. First, the study was conducted at a single center, which might restrict the broader applicability of the results. Additionally, the study included only 23 consecutive individuals, resulting in a small sample size that limits the generalizability of the findings to outside populations. Furthermore, the study was carried out by a specific echocardiogram technician, and the results were not compared to those obtained by other examiners, which prevents the assessment of inter-examiner variability or interference.

Moreover, while the study included patients with arrhythmias and poor image quality, it did not consider other diseases or specific clinical situations, potentially limiting the conclusions regarding the applicability of the findings to specific diseases. Another significant limitation is related to the measurement and interpretation time. In our study, the process of importing data into the analysis software and anonymizing it was manually performed, which was time-consuming. However, it is important to note that this issue has been resolved in commercial devices. Additionally, we integrated image interpretation into the measurement process, which hindered our ability to independently assess the interpretation time, particularly in the context of AI methods. Consequently, this approach restricted our capacity to clearly evaluate how interpretation time influences the overall efficiency of measurement and reporting. We identify this as an important focus for future research.

Conclusions

The fully automated AI software showcases substantial potential for decreasing echocardiographic analysis time while upholding accuracy. This potential offers significant benefits to clinical workflow and efficiency, positively impacting patients and healthcare providers alike. In summary, AI's capacity to expedite and refine echocardiographic interpretation presents a noteworthy stride in medical diagnostics, ultimately resulting in enhanced patient care.