Introduction

Echocardiography offers several advantages over other imaging modalities for diagnosing inferior vena cava (IVC) abnormalities, including high temporal resolution, non-invasiveness, low cost, and portability [1]. However, manual echocardiography analysis by a trained sonographer is time-consuming, costly, and may exhibit poor reproducibility [1, 2]. Automated analysis can streamline clinical workflows, provide consistent results, and may subsequently enhance clinical decision making [1, 2].

The IVC returns deoxygenated blood from the lower extremities and abdomen to the right atrium. Studies [3, 4] have reported that the diameter of the IVC (dIVC) and its change with inspiration (i.e., IVC collapsibility) can be captured non-invasively using ultrasound imaging and used to determine fluid status in critically ill patients and in acute heart failure (HF), and this assessment is now a routine part of clinical echo exams. The collapsibility index of the inferior vena cava (cIVC) is visually estimated based on the changes in IVC diameter with inspiration. The current practice of measuring dIVC and cIVC and estimating right atrial pressure (RAP) involves several steps. First, a sonographer manually selects a subcostal long-axis view of the IVC with high visual quality from an echocardiography study that may contain over a hundred views. Then, the dIVC is measured perpendicular to the long axis of the IVC within 1.0 to 2.0 cm of the cavo-atrial junction [4]. The cIVC is measured as the difference between the maximum and minimum IVC diameters during inspiration. Finally, the measured dIVC and collapsibility can be used to readily estimate RAP using the American Society of Echocardiography (ASE) guidelines or other guidelines [4, 5].

Although this approach for estimating RAP based on echocardiographic IVC is considered the current standard, the manual calculation of RAP [6,7,8] may have low reproducibility and weak correlation to actual RAP values. For example, Magnino et al. [6] found that the r-squared values for IVC diameter and collapsibility were 0.19 or lower, and that the estimates were within 2.5 mmHg of the actual values only 34% of the time. Such inaccuracy may lead to an overestimation of pulmonary pressure, which could result in inappropriate diuretic treatment choices and ultimately increase uncertainty in clinical outcomes. Therefore, there is a need to develop new approaches for more accurate and objective IVC analysis and RAP estimation. In this work, we hypothesize that advances in machine learning (ML) and artificial intelligence (AI) techniques enable the development of a novel fully automated, reproducible, and scalable pipeline for echocardiographic dIVC and cIVC analysis and RAP estimation that could be used by experts and non-experts alike, in both high- and low-resource primary care settings.

ML algorithms have been widely used for the automated analysis and interpretation of echocardiography [2] as well as for the automated quantification of cardiac measurements, including ejection fraction [9, 10], the thickness of the left ventricle (LV) and surrounding walls [11, 12], LV strain [13], and Doppler velocities [14]. Despite the clinical significance of IVC collapsibility analysis and RAP estimation, only a handful of studies [15,16,17] have proposed automated solutions. For example, Mesin et al. [17] proposed a semi-automated method that uses a support vector machine (SVM) for RAP classification. The proposed method achieved 71% accuracy, which is higher than manual estimation using published guidelines (61% accuracy). Instead of using traditional ML methods, recent studies [15, 16] employed advanced deep learning (DL) methods such as long short-term memory (LSTM) networks for predicting fluid responsiveness in critically ill patients based on the analysis of IVC collapsibility. Other methods for automated IVC analysis can be found in [8, 18, 19].

Although the automated analysis of IVC and RAP has been explored, current methods (1) are applied directly to a manually pre-selected video or single-frame image from the echo video containing the IVC view; (2) have large and complex models that limit their use on handheld devices; and (3) are designed for a “closed-world” environment where the training data is fixed with no opportunity for the AI to learn new knowledge, which limits their usability and generalizability in real-world clinical settings. To mitigate these challenges, our work presents a multi-stage AI system that can estimate dIVC, cIVC, and RAP. The proposed system employs a lightweight and open-world ML architecture to rapidly generate dIVC, cIVC, and RAP values. The lightweight design facilitates integration into handheld devices, which can enhance accessibility. The open-world design makes the system more robust in detecting and learning new, unpredictable cases or scenarios in real-world clinical settings. We believe that our system is the first with these capabilities to estimate RAP reliably and automatically. We evaluate the performance of the proposed system on routine clinical echocardiograms and validate its accuracy against measurements made by human experts. Finally, we describe some limitations as well as potential applications of the proposed system.

Fig. 1
figure 1

Automated pipeline for echocardiography assessment of dIVC, cIVC, and RAP estimation

Materials and methods

All echocardiography exams were performed at the Clinical Center of the National Institutes of Health (NIH). This project was reviewed by the NIH Office of Human Subjects Research Protections, which determined that the activities proposed did not require IRB review or approval because the project does not qualify as human subjects research (45 CFR 46.102) as defined in the federal regulations. De-identified echo videos/images of 255 adult participants were included in this study.

Echocardiography dataset

Each echocardiogram study consists of a collection of videos and still images showing various cardiac views including the parasternal long axis view (PLAX), Doppler, and apical four-chamber view (A4C) in addition to the IVC subcostal view. These were acquired using diverse echocardiography devices including iE33, GE E9, and GE Vivid E95. The manual measurements of IVC diameter were provided by board-certified echocardiographers following conventional methodology in current clinical practice.

For all downstream analyses, DICOM-formatted videos were converted into multidimensional numeric arrays of pixel intensities. The acquired IVC videos have a spatial resolution of \(800 \times 600\) pixels. The individual dimensions of the arrays represent time, the x and y coordinates in space, and additional dimensions (channels) that encode color information. We divided the entire dataset (n = 255) into a training set (\(\approx\) 70%, n = 177 patients) and a hold-out testing set (\(\approx\) 30%, n = 78 patients). The training set was further divided into training and validation subsets using 10-fold cross-validation. The training set was used for the main tasks of our pipeline: (1) IVC view classification and quality assessment; (2) IVC segmentation; and (3) dIVC and cIVC quantification.
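The patient-level split described above can be sketched as follows; the fold construction is a simplified illustration, and the exact randomization and fold assignment used in our pipeline may differ:

```python
import numpy as np

def split_patients(n_patients, train_frac=0.7, n_folds=10, seed=0):
    """Patient-level split into train/hold-out sets, then k CV folds on train."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_patients)
    n_train = int(round(train_frac * n_patients))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    folds = np.array_split(train_idx, n_folds)  # ~equal-sized validation folds
    return train_idx, test_idx, folds

train_idx, test_idx, folds = split_patients(255)
```

Splitting at the patient level (rather than the frame level) prevents frames from the same patient from leaking between the training and testing sets.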

Automated pipeline for echocardiography IVC collapsibility and RAP estimation

The proposed pipeline is divided into multiple stages, which are depicted in Fig. 1. The pipeline starts with image quality assessment and subcostal IVC view retrieval, followed by region segmentation, quantification, collapsibility analysis, and RAP estimation. The image analysis algorithms for each of the individual stages in the pipeline are described in the following subsections.

Image analysis: quality assessment & view retrieval

The image retrieval algorithm retrieves a specific view with acceptable quality. Our algorithm utilizes a lightweight model with a shared encoder and two “heads”: the network shares a common backbone of weights, from which two task-specific branches each compute a separate but related output for the same input image. In our algorithm, the first head performs the quality assessment task and the second head performs the view classification task. The view classification head detects an IVC view within a given echo study, while the quality assessment head labels a given IVC view as good or bad quality. Both heads run in parallel, which enhances the efficiency of the image retrieval component. The entire network with the shared encoder and two heads has a small size and high inference speed (0.22 s), enabling its use in resource-limited settings and on handheld devices. Further discussion of this two-head model, along with a visualization, is provided in Appendix A.
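As a rough illustration of the shared-encoder design (not our actual network, whose architecture is described in Appendix A; the layer sizes here are hypothetical), a single forward pass through one shared representation feeding two task heads might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical dimensions: a flattened image feature of size 256 and a
# 64-unit shared encoder feeding two binary-classification heads.
W_enc = rng.normal(size=(256, 64)) * 0.05
W_view = rng.normal(size=(64, 2)) * 0.05     # head 1: IVC vs. non-IVC view
W_quality = rng.normal(size=(64, 2)) * 0.05  # head 2: good vs. bad quality

def two_head_forward(x):
    """One forward pass: the encoder runs once, both heads reuse its output."""
    z = relu(x @ W_enc)             # shared representation
    view_logits = z @ W_view        # view-classification head
    quality_logits = z @ W_quality  # quality-assessment head
    return view_logits, quality_logits

x = rng.normal(size=(256,))
view, quality = two_head_forward(x)
```

Because the encoder runs only once per image, the marginal cost of the second task is just one small head, which is what keeps the combined model lightweight.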

Conventional machine learning models require samples of a specific set of classes (e.g., IVC view vs. non-IVC view) to be available during training. This assumption, known as a closed-world assumption, may be too strict for real-world environments that are open and often contain unseen examples. To overcome this challenge, our classification algorithm is designed to run efficiently in open-world clinical settings. It utilizes an OpenMax function [20] instead of a Softmax function. The OpenMax function can label new (previously unseen) classes of images as “unknown”, whereas a critical limitation of the Softmax function is that it always assigns a new class to one of the known classes. Further details on OpenMax can be found in [21] and are provided in Appendix B.
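To illustrate the open-set idea, the following is a deliberately simplified distance-threshold sketch. The actual OpenMax function [20] recalibrates activations with per-class Weibull models rather than a fixed threshold, so the threshold and toy activation vectors here are illustrative assumptions only:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def open_set_predict(activation, class_means, classes, dist_threshold=2.0):
    """Reject far-from-training activations as 'unknown'; otherwise classify.

    Simplified stand-in for OpenMax: here a fixed distance threshold to the
    nearest class-mean activation vector plays the role of the per-class
    Weibull calibration used by the real algorithm.
    """
    dists = np.linalg.norm(class_means - activation, axis=1)
    if dists.min() > dist_threshold:
        return "unknown"
    # Closer class means get higher probability.
    return classes[int(np.argmax(softmax(-dists)))]

classes = ["IVC view", "non-IVC view"]
means = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy mean activation vectors
print(open_set_predict(np.array([0.9, 0.1]), means, classes))  # "IVC view"
print(open_set_predict(np.array([8.0, 8.0]), means, classes))  # "unknown"
```

A plain Softmax over the same activations would be forced to pick one of the two known classes even for the far-away input.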

The significance of open-world active learning can be demonstrated in several IVC applications. One possible application would be to group rare cases of IVC morphology that might not exist in the training data. Examples of these rare cases include a very dilated IVC with poor collapsibility due to heart failure or a very small IVC with complete collapsibility due to dehydration. Another “unknown” cluster may be IVC images that appear to collapse but represent artifacts in which the image appears to go out of plane of the ultrasound beam due to respiration movement.

Image analysis: IVC region contouring

To obtain the contour of the IVC region, we applied a lightweight segmentation algorithm [22]. Compared to other segmentation algorithms, this algorithm has a smaller size and a high inference speed (> 60 frames per second), enabling its use in resource-limited settings and on handheld devices. Further details, along with a visualization of the segmentation algorithm, can be found in Appendix C.

After segmenting the IVC region, the segmented region was cleaned to remove any isolated and unneeded pixels and to retain a single closed region. To delineate the IVC contour, we used the Moore-Neighbor tracing algorithm modified with Jacob’s stopping criterion [23]. After automated IVC region segmentation and contour delineation, the delineated contour was used to compute the IVC thickness or diameter in every frame of a given clip, including the end-diastolic (largest dIVC) and end-systolic (smallest dIVC) frames, as described next.
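A minimal sketch of the cleaning step, assuming the goal is simply to keep the largest connected component of the predicted mask (our exact post-processing may include additional morphological operations):

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask):
    """Keep only the largest connected component of a binary mask."""
    labeled, n = ndimage.label(mask)
    if n == 0:
        return mask
    # Pixel count of each labeled component (labels start at 1).
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    largest = 1 + int(np.argmax(sizes))
    return (labeled == largest).astype(mask.dtype)

mask = np.zeros((8, 8), dtype=np.uint8)
mask[1:5, 1:5] = 1   # main region (16 pixels)
mask[6, 6] = 1       # isolated speckle pixel
cleaned = clean_mask(mask)
print(cleaned.sum())  # 16: the speckle is removed
```

Removing such speckle before contour tracing matters because the Moore-Neighbor walk assumes a single connected boundary.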

Image analysis: automated cIVC tracking

The delineated region, generated as described above, is divided into equal segments (or sectors). To compute dIVC, we automatically generate the major axis of the sub-segment located approximately 2 cm proximal to the ostium of the right atrium and compute the Euclidean distance between its endpoints. Finally, we convert the computed pixel distance into millimeters (mm) as described in Appendix D. The computed dIVC values are plotted over frames and smoothed with a Savitzky-Golay filter [24] to obtain the dIVC curve. From this curve, the absolute maximum (highest peak) and absolute minimum (lowest valley) are easily detected and used to compute the collapsibility percentage, cIVC, from the difference between the maximum peak and minimum valley. Specifically, cIVC is calculated as:

$$\begin{aligned} cIVC = \frac{dIVC_{max} - dIVC_{min}}{dIVC_{max}} \times 100\% \end{aligned}$$
(1)
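A sketch of the curve smoothing and collapsibility computation, using SciPy's Savitzky-Golay filter on a synthetic dIVC curve (the filter window, polynomial order, and toy curve are illustrative assumptions, not our pipeline's tuned values):

```python
import numpy as np
from scipy.signal import savgol_filter

def civc_from_curve(divc_raw, window=11, polyorder=3):
    """Smooth a per-frame dIVC curve and compute the collapsibility index (Eq. 1)."""
    divc = savgol_filter(divc_raw, window, polyorder)
    d_max, d_min = divc.max(), divc.min()
    return (d_max - d_min) / d_max * 100.0

# Synthetic example: dIVC oscillating between ~1.0 and ~2.0 cm with noise,
# so the true collapsibility is about (2.0 - 1.0) / 2.0 = 50%.
frames = np.arange(300)
rng = np.random.default_rng(0)
divc_raw = 1.5 + 0.5 * np.sin(2 * np.pi * frames / 100) + rng.normal(0, 0.02, 300)
civc = civc_from_curve(divc_raw)
```

Smoothing before taking the extrema prevents a single noisy frame from inflating the measured collapsibility.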

Image analysis: automated RAP estimation

After the automated cIVC analysis, RAP is computed from the automatically generated dIVC and cIVC values using two different criteria, namely the ASE Criterion and the NIH Criterion, as detailed in Table 1.

ASE Criterion This criterion follows ASE guidelines [4, 5] for classifying RAP into 3 classes: 3, 8, and 15 mmHg based on dIVC and cIVC. A RAP of 3 mmHg is considered a normal or low pressure, indicating that the heart is functioning normally and there is no excessive pressure in the atrium; a RAP of 8 mmHg is considered slightly elevated, which can be caused by a variety of conditions such as heart failure, pulmonary hypertension, or fluid overload; a RAP of 15 mmHg or higher is considered severely elevated, which can indicate more severe heart failure, pulmonary embolism, or other serious cardiac conditions.

NIH Criterion This criterion follows a site-specific (NIH) guideline for classifying RAP as: 5, 10, 15, and 20 mmHg based on dIVC and cIVC. A RAP of 5 mmHg is considered normal or low pressure; a RAP of 10 mmHg is considered slightly elevated; a RAP of 15 mmHg is considered moderately elevated; and a RAP of 20 mmHg or higher is considered severely elevated. Note that this criterion (NIH) has four RAP categories while the ASE criterion has three, due to historic precedent in this NIH echo lab.
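As an illustration of the rule-based mapping, the ASE Criterion can be sketched with the commonly cited guideline cut-offs (dIVC 2.1 cm, cIVC 50%); the exact thresholds used in our pipeline are those of Table 1, and the site-specific NIH criterion is not reproduced here:

```python
def rap_ase(divc_cm, civc_pct):
    """Map dIVC (cm) and cIVC (%) to a RAP estimate per the ASE-style rules.

    Thresholds (2.1 cm, 50%) are the commonly cited ASE cut-offs and are
    assumptions here; consult Table 1 / the guideline text for exact values.
    """
    if divc_cm <= 2.1 and civc_pct > 50:
        return 3    # normal / low
    if divc_cm > 2.1 and civc_pct < 50:
        return 15   # severely elevated
    return 8        # intermediate

print(rap_ase(1.7, 60))  # 3 mmHg
print(rap_ase(2.5, 30))  # 15 mmHg
print(rap_ase(2.5, 60))  # 8 mmHg (discordant dIVC and cIVC)
```

The NIH criterion would follow the same pattern with four output classes (5, 10, 15, 20 mmHg) and site-specific thresholds.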

Table 1 Different criteria (ASE and NIH) for RAP estimation

Statistical analysis

Data are expressed as mean ± standard deviation (SD) unless otherwise specified. All dIVC measurements are treated as continuous variables, while cIVC and RAP values are treated as categorical data. The Shapiro-Wilk test was used to assess the normality of the dIVC distribution. Automated versus manual reference measurements of dIVC were compared using a two-tailed, paired Student’s t-test (or the Mann-Whitney U-test if not normal), and a chi-square test was used for the cIVC and RAP comparisons.

Additionally, Pearson correlation coefficient [25] and Bland-Altman [26] analyses were performed to assess the agreement between the automated dIVC measurements and those estimated by experts. To assess the agreement between the manual and automated RAP estimates, we used the confusion matrix, also known as a contingency table, for categorical comparison. Various statistics were computed from the confusion matrix, such as macro accuracy, sensitivity, specificity, and F1-score, to further evaluate how well the automated RAP values agree with the manual reference values.
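A sketch of the agreement statistics for the continuous dIVC comparison, computing the Pearson correlation together with the Bland-Altman bias and 95% limits of agreement (the toy measurement values are illustrative):

```python
import numpy as np

def agreement_stats(auto, manual):
    """Pearson r plus Bland-Altman bias and 95% limits of agreement."""
    auto, manual = np.asarray(auto, float), np.asarray(manual, float)
    r = np.corrcoef(auto, manual)[0, 1]
    diff = auto - manual
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # 95% limits of agreement
    return r, bias, (bias - half_width, bias + half_width)

# Toy dIVC values in cm (automated vs. manual reference).
auto = [1.8, 2.1, 1.2, 2.4, 1.6]
manual = [1.7, 2.2, 1.3, 2.3, 1.6]
r, bias, (lo, hi) = agreement_stats(auto, manual)
```

Correlation alone can hide a systematic offset, which is why the Bland-Altman bias and limits are reported alongside r.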

Table 2 Demographic and clinical information for study participants

Results

Our dataset includes a similar distribution of male (128, \(\approx\) 50%) and female (127, \(\approx\) 50%) patients. The mean age was 44.99 ± 17.84 years. The mean weight and height were 76.99 ± 24.70 kg and 1.69 ± 0.14 m, respectively. The mean body mass index (BMI) was 27.51 ± 15.55 kg/m\(^2\). The patients in our study had a range of clinical conditions including autoinflammatory disease, breast cancer, bronchiectasis, carcinoid, and sickle cell disease (SCD). The racial distribution is as follows: White (\(\approx\) 66%), Hispanic (\(\approx\) 11%), Black or African American (\(\approx\) 16%), Asian (\(\approx\) 5%), and Mixed (\(\approx\) 2%). Table 2 summarizes the demographic and clinical information for the study participants.

Automated IVC retrieval and quality assessment

The downstream goal of IVC quantification requires accurate selection of the subcostal IVC view from the other views in each echocardiography study. Although others have previously published approaches in this area [14, 21, 27], our pipeline includes an automated stage to distinguish the IVC view from other echocardiographic views, as well as subclasses of IVC views (e.g., a view with artifact or a very dilated IVC), with an accuracy of 0.97, a precision of 0.96, a sensitivity of 0.97, and an F1-score of 0.96.

Another important step prior to automated IVC quantification is the quality assessment of the view. Several studies (e.g., [28]) have reported that accurate echocardiography analysis is highly dependent on image quality, and that poor-quality images impair echocardiography quantification. Therefore, we utilized a lightweight algorithm for assessing the quality of the IVC image prior to boundary delineation and thickness quantification. Our quality assessment algorithm achieved an accuracy of 0.94 ± 0.10, a precision of 0.94 ± 0.04, a sensitivity of 0.95 ± 0.09, and an F1-score of 0.95 ± 0.06. Figure 2 shows examples of echocardiography images classified as IVC with good and bad quality.

Fig. 2
figure 2

Left: example of good quality IVC view retrieved automatically from a set of other views; right: example of IVC view retrieved automatically from a set of other views and labeled as unusable (bad quality) as the IVC’s boundary is not clear

Differences between manual and automated segmentation

To assess the differences between the manual and automated IVC region segmentation, we used intersection over union (IoU) [29] and dice similarity coefficient (DSC) [29]. IoU and DSC are measures of overlap between two sets of data and can be used to quantify the similarity or difference between manual and automated regions. The IoU is calculated as the intersection of the two regions divided by the union of the two regions while the DSC is calculated as twice the intersection of the two regions divided by the sum of the two regions. A high IoU or DSC indicates a strong agreement between the manual and automated regions, while a low score indicates disagreement or differences between the two regions.
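The two overlap measures can be computed directly from binary masks; a minimal sketch with toy masks:

```python
import numpy as np

def iou_dsc(a, b):
    """IoU and Dice similarity coefficient for two binary masks, per the text."""
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    iou = inter / union
    dsc = 2 * inter / (a.sum() + b.sum())
    return iou, dsc

# Two 6x6-pixel square masks offset by one pixel (25 px overlap).
a = np.zeros((10, 10), bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), bool); b[3:9, 3:9] = True
iou, dsc = iou_dsc(a, b)
print(round(iou, 3), round(dsc, 3))  # 0.532 0.694
```

Note that DSC is always at least as large as IoU for the same pair of masks, which is why the reported DSC (0.98) exceeds the IoU (0.96).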

Our segmentation algorithm achieved excellent performance segmenting the IVC region with an IoU score of 0.96 ± 0.03 and a DSC score of 0.98 ± 0.05. In addition, our lightweight segmentation algorithm achieved an inference speed of > 60 frames per second (FPS). Figure 3 shows an example of the automatically segmented region along with the automatically generated contour.

Fig. 3
figure 3

Left: the automatically generated mask overlayed on the original image; right: the contour of the overlayed mask

dIVC and cIVC tracking

The automatically delineated region in each frame is used to derive dIVC by measuring the IVC diameter 2 cm from the right atrial junction. Because this calculation is performed in every frame, a dIVC value is obtained across the full sequence. Figure 4 shows an example of the absolute maximum and minimum dIVC as well as the dIVC curve over frames.

Fig. 4
figure 4

Top left: automated dIVC in frame 253 of a given video with a dIVC of 1.73 cm (maximum dIVC); top right: automated dIVC in frame 183 of a given video with a dIVC of 0.475 cm (minimum dIVC); bottom: dIVC in each frame of a given clip, where the peaks indicate maximum dIVC and valleys indicate minimum dIVC

For a quantitative comparison of the automated versus manual dIVC measurements, the t-test shows no significant difference between the two groups (p = 0.70). To assess the agreement between the manual and automated dIVC, we used Pearson correlation and Bland-Altman plots. Figure 5 shows excellent agreement between the automated dIVC measurements and those estimated by experts based on correlation (r = 0.96) and Bland-Altman plots. These results suggest that the automated method is accurate and enables assessment of the IVC diameter in each frame.

To further evaluate the automated method, we performed a variability analysis of measuring dIVC at different locations. Our results showed strong agreement with the manual reference standard at the 3 cm location (r = 0.95) as well as at the 1 cm location (r = 0.87). It is important to note that our automated algorithm can measure dIVC at various other spatial locations, including the IVC-right atrium junction and locations 4 cm and 5 cm caudal to the junction. This feature enables variational analysis at different locations and time points, resulting in more reliable measurements. However, for this study, we only compared the manual and automated dIVC measurements at 2 cm from the right atrial junction, since manual measurements were only available at that location.

For the subsequent comparison of cIVC, the manual and automated values were based on the dIVC measured at 2 cm from the right atrial junction, since manual measurements were performed at that location. Recall that cIVC is estimated by plugging \(dIVC_{max}\) and \(dIVC_{min}\) into Eq. 1. For the cIVC comparison, the chi-square test showed a significant association between the automated and manual estimates (\(p<0.01\)). After the automated dIVC and cIVC measurements were derived, they were used to generate the automated RAP estimates.

Fig. 5
figure 5

Correlation and Bland-Altman plots between automated and human dIVC (at the 2 cm location)

Automated versus manual RAP estimation

To compare the automated and manual RAP values, the chi-square test showed a significant association between the manual and automated RAP estimates by both the ASE Criterion (\(p<0.01\)) and the NIH Criterion (\(p<0.01\)).

Figure 6 shows the confusion matrices for the RAP estimates by both the ASE and NIH criteria. The matrices show moderate to strong agreement for both criteria. The accuracy, precision, recall, and F1-score for both criteria are presented in Table 3. Based on the findings from the table and the corresponding figure, the ASE Criterion slightly outperforms the NIH Criterion. One possible explanation is that the ASE Criterion has fewer classes (3, 8, and 15 mmHg) than the NIH Criterion (5, 10, 15, and 20 mmHg), which could affect the performance of individual classes and consequently the overall performance. For example, the performance for the RAP class of 10 mmHg is lower than for the other RAP classes, which lowers the overall performance of the NIH Criterion. Despite the difference in performance between the ASE and NIH criteria, both achieved promising results and show that our AI-based method can estimate RAP values reliably. It is important to note that our study primarily assessed the automated system’s ability to estimate RAP values using dIVC and cIVC. We have not yet compared these automated estimates to gold-standard invasive RAP measurements; such a comparison is planned for our subsequent research.

Table 3 Performance of automated RAP estimation using ASE and NIH criteria; the macro* accuracy of ASE Criterion is 0.90 and the macro accuracy of NIH Criterion is 0.85
Fig. 6
figure 6

First row: non-normalized confusion matrices for automated RAP estimation using ASE Criterion (left) and NIH Criterion (right). Second row: normalized confusion matrices for automated RAP estimation using ASE Criterion (left) and NIH Criterion (right). The normalized confusion matrix is generated by dividing each value in the confusion matrix by the sum of the corresponding row

Discussion

This work presents an AI system for the automated measurement of dIVC and cIVC and the subsequent estimation of RAP values. Our results show strong agreement between the automated measurements and those determined by human experts. Unlike current practice and existing automated methods, which predominantly compute dIVC in specified frames, our multi-stage AI system offers comprehensive echocardiography analysis in the open world with minimal computational overhead. The system’s performance is robust, achieving a Pearson correlation coefficient of 0.96 and an F1-score of 0.85, exceeding the referenced studies on these metrics.

Comparing our approach to the literature: in [15], a semi-automated LSTM-based architecture was trained on 220 videos, yielding moderate agreement (Fleiss’ kappa, k = 0.45) with expert IVC values. Subsequent work by the same authors [16] employed this method to predict fluid responsiveness, achieving an AUC-ROC of 0.70 in 175 critically ill patients. In [17], Mesin et al. applied an edge-tracking and machine learning method to a dataset of 170 patients with specific heart-related conditions; the proposed method achieved moderate performance with an accuracy of 71% (SVM).

In contrast, our multi-stage AI system covers the entire echocardiography analysis spectrum, from echo view selection to quantification. Further, our system takes into account challenges such as computational resource demands and data distribution shifts, incorporating efficient algorithms and adapting to open-world data changes. Additionally, a key strength of our system is its speed. Our algorithm not only performs echo view selection, quality assessment, and boundary tracing but also completes a comprehensive end-to-end echo analysis, quantifying parameters such as dIVC, cIVC, and RAP, in less than a second (800 ms on average). This streamlined efficiency is particularly beneficial in busy clinical settings. For context, we consulted an experienced cardiac sonographer from our echo lab, who estimated that a visual assessment of dIVC takes around 3 s, nearly three times longer than our system’s full analysis. The manual measurement of dIVC is even more time-consuming, averaging between 6 and 7 s, roughly 6 to 7 times longer than our automation. This automation could encourage the adoption of quantitative measurements in clinical settings. As highlighted in [30], the extended duration required for manual tracing of cardiac borders has perpetuated the reliance on visual assessments in busy echocardiographic laboratories. Our innovation therefore represents a potential shift towards faster and more efficient analyses.

In addition to efficiency, our automated algorithm performs temporal analysis of dIVC over all video frames and at different sites. The analysis of all frames could provide information about temporal changes during respiration over multiple cardiac cycles. The temporal analysis feature of our system could also motivate sonographers to record longer echocardiography spans, potentially shedding light on the impact of respiratory changes on cardiac function. In addition to the temporal analysis of dIVC and cIVC, our work investigated the automated estimation of noninvasive RAP measurements using two criteria, namely the ASE criterion and the NIH criterion. However, this work can be extended to estimate the invasive gold-standard RAP measurement. Further, it could be extended to include other criteria and guidelines. There are several reasons for updating or customizing RAP estimation guidelines and recommendations to the specific resources and requirements of each organization or association, including differences in healthcare systems, new research findings, and variations in patient populations. In the future, we plan to (1) evaluate the proposed system in estimating the gold-standard invasive RAP measurement, and (2) develop an automated algorithm that enables cardiologists to choose the best-suited RAP estimation criterion based on patient characteristics and to compare the results obtained from different guidelines, enhancing accuracy, flexibility, and standardization of care.

Nonetheless, the present study was constrained by some limitations. First, it was conducted at a single center and the study cohort was made up of patients who underwent echocardiography examination for any reason, which might impact the results when the system is applied to patient groups with specific diseases such as pulmonary hypertension, cardiac tamponade, fluid overload, or patients with critically ill conditions.

Second, while our AI algorithm provides consistent results (it reproduces the exact same measurements for dIVC, cIVC, and RAP in each run), our current study did not comprehensively assess inter-observer and intra-observer variability. Such an assessment is crucial for determining reproducibility, which can be influenced by human variability. Existing literature has underscored the potential benefits of AI-driven reproducibility in echocardiography. For instance, Nolan et al. [31] reported that automated systems tend to exhibit consistent measurements across different cases, largely eliminating the variability often encountered in manual measurements. Such findings suggest that AI-driven methods like ours could offer a notable advantage in enhancing the consistency of echocardiographic assessments. However, we recognize the importance of comparing the consistent output of our algorithm against the variability seen in manual measurements, and of conducting a formal reproducibility analysis.

Finally, a pivotal limitation of our study is that it did not compare the automated RAP values against gold-standard invasive RAP measurements. While our system demonstrated reliability in IVC measurements relative to the manual method, its capability for direct RAP estimation against the gold standard remains unvalidated. Acknowledging this limitation, our immediate next step will be to validate the performance of our system against invasive gold-standard RAP measurements. In our continued research, we also aim to evaluate the proposed system across different centers, conduct thorough intra- and inter-observer variability evaluations, compare the time required for automated and manual IVC analysis, and provide a more nuanced comparison between automated and manual IVC measurements across different spatial locations.

Conclusion

We present an AI-based system dedicated to automating dIVC and cIVC measurements, which has the potential to refine the current clinical practice of IVC analysis. Specifically, our solution provides a fully automated, cost-effective, and quantitative tool for dIVC and cIVC analysis that could be used in clinical settings and point-of-care testing. Moreover, it offers the capability to conduct variational analysis of dIVC across diverse spatial locations and temporal points, thereby ensuring more consistent measurements. While there may be potential implications for RAP estimation, the primary intent of our system is to augment the current practice of IVC analysis. Such improvements could potentially lead to better clinical decision-making and improved patient outcomes.