Evaluation of a Semi-automatic Right Ventricle Segmentation Method on Short-Axis MR Images

The purpose of this study was to evaluate a semi-automatic right ventricle segmentation method for short-axis cardiac cine MR images that segments all right ventricle contours in a cardiac phase using one seed contour. Twenty-eight consecutive short-axis, four-chamber, and tricuspid valve view cardiac cine MRI examinations of healthy volunteers were used. Two independent observers performed the manual and semi-automatic segmentations of the right ventricles. Analyses were based on the ventricular volume and ejection fraction of the right heart chamber. Reproducibility of the manual and semi-automatic segmentations was assessed using intra- and inter-observer variability. Validity of the semi-automatic segmentations was analyzed with reference to the manual segmentations. The inter- and intra-observer variability of the manual segmentations ranged between 0.8 and 3.2%. The semi-automatic segmentations were highly correlated with the manual segmentations (R² 0.79–0.98), with median differences of 0.9–4.8% for the volume parameters and of 3.3% for the ejection fraction. In comparison to the manual segmentation, the semi-automatic segmentation produced contours with median Dice metrics of 0.95 and 0.87 and median Hausdorff distances of 5.05 and 7.35 mm for contours at end-diastolic and end-systolic phases, respectively. The inter- and intra-observer variability of the semi-automatic segmentations was lower than that observed in the manual segmentations. Both manual and semi-automatic segmentations performed better at the end-diastolic phase than at the end-systolic phase. The investigated semi-automatic segmentation method produced a valid and reproducible alternative to manual right ventricle segmentation.


Introduction
Assessment of ventricular morphology and function is important in the management of patients with cardiovascular disease. Magnetic resonance imaging (MRI) has been the preferred imaging modality for quantitative analysis of the ventricles [1,2]. These functional cardiac analyses are usually performed by measuring ventricular volumes at certain cardiac phases, such as the end-diastolic (ED) and end-systolic (ES) phases, and subsequently calculating the ejection fraction (EF). Ventricular evaluation typically requires the ventricular borders to be segmented before further analysis and calculations can be performed. Traditionally in the clinical setting, ventricular border segmentation is performed manually, which is known to be time-consuming [3], prone to intra- and inter-observer variability [3][4][5], and dependent on user experience [3,5,6]. Therefore, efforts have been made to develop automatic segmentation methods, which have been shown to reduce segmentation time [7][8][9] with comparable or even lower variability than manual segmentations [9,10].
Automatic segmentation methods have been shown to be beneficial in left ventricle (LV) evaluations on short-axis cine images [8,11,12]. Meanwhile, automatic segmentation algorithms for the right ventricle (RV) are less available than for the LV [13]. RV segmentation is more challenging due to its high shape variability and complex movement [14,15], resulting in a lower performance in terms of variability for both manual and automatic RV segmentations as compared to LV segmentations [3,16].
Various RV segmentation algorithms have been developed to overcome the inherent difficulties in RV segmentation, ranging from image-driven to model-based algorithms and from semi-automatic algorithms requiring multiple user inputs to fully automatic ones [14]. While model-based algorithms can be quite powerful, image-driven algorithms are generally regarded as more robust against pathological and image acquisition variations. Because RV morphology varies with its pathological condition [17], robust segmentation algorithms are needed. Semi-automatic algorithms have been shown to outperform fully automatic ones, despite the user interactions needed [14].
In this study, we aim to evaluate a newly developed image-driven semi-automatic RV segmentation method for cardiac short-axis MR images that segments all RV contours in a cardiac phase with a minimal user input of one seed contour and with the essential restriction of no manual corrections. Validity and reproducibility of the semi-automatic segmentation will be compared against the manual segmentation.

Study Population
Twenty-eight consecutive volunteers were included in the current study. This study was conducted according to the principles of the Declaration of Helsinki (October 2013) and in accordance with the Medical Research Involving Human Subjects Act (WMO). The study received approval from the local institutional review board and each subject gave informed consent.

MRI
Cardiovascular magnetic resonance (CMR) imaging was performed using a Signa 1.5 T scanner (GE Medical Systems, Milwaukee, WI, USA) with a dedicated 16-channel phased-array cardiac surface coil. A cine volumetric dataset was acquired in short-axis, four-chamber view, and tricuspid valve view directions using a 2D steady-state free precession acquisition sequence with the following imaging parameters: flip angle 45°, echo time (TE) set at minimal full, repetition time (TR) 3 ms, 8 mm slice thickness, 2 mm interslice gap, number of excitations 0.75, phase field of view percentage 0.65, 12 views/segment, and a matrix of 256 × 256 (resulting in an in-plane resolution between 1.09 and 1.56 mm/pixel). Twenty-four phases per cardiac cycle were reconstructed retrospectively.

Image Analysis
Both manual and semi-automatic segmentations were performed within the CAAS MRV software package (version 4.1; Pie Medical Imaging, BV, Maastricht, the Netherlands) on the short-axis view. The basal slice was inferred from the position of the tricuspid annulus on the four-chamber view [3,14] and the tricuspid valve view. The apical slice was chosen to be the last slice showing detectable RV activity [14]. The ED and ES phases and the apical and basal slice selections at ED and ES phases were set to be the same for both manual and semi-automatic segmentations. The segmentations were performed at ED and ES phases, on every slice between the apical and basal slices.

Manual Segmentation
Two experienced observers with 7 and 2 years of CMR imaging experience (first and second observer, respectively) independently performed manual segmentations of RV endocardial contours to derive the inter-observer variability. The datasets were anonymized before being presented to the observers. The first observer performed the manual segmentations once (resulting in measurement M1), which served as the reference. The second observer performed the manual segmentations twice (resulting in measurements M2a and M2b), in two sessions separated by a 2-week period, to derive the intra-observer variability (see Table 1 for the overview of measurements). Papillary muscles and trabeculations were treated as part of the blood pool volume.

Semi-automatic Segmentation
The semi-automatic RV segmentation algorithm is based on the cellular automata framework, which allows every voxel to be labeled as foreground or background based on its signal intensity similarity and its distance to the seeds [18]. This labeling process is implemented using parallel computation techniques, so that high computation performance can be achieved. The segmentation algorithm requires prior information on the ED and ES phases, and on the apical and basal slices for both the ED and ES phases. At the ED and ES phases, the user is asked to provide a rough RV endocardial contour as a seed in one slice between the identified apical and basal slices. The segmentation is initiated at the slice where the user roughly defined the RV seed contour, which is used by the algorithm during the foreground labeling process. Meanwhile, the background labeling is determined by the algorithm based on features derived from the image itself and cardiac movement extracted from the short-axis slice. After optimizing the seed contour, the resulting RV endocardial contours are propagated towards the apical and the basal slices, taking into account possible misalignment between slices and the RV geometry at a specific cardiac phase (relative to ED and ES phases).
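To make the cellular automata idea more concrete, the following is a minimal sketch of a generic GrowCut-style labeling on a single 2D slice. It is not the implementation of the evaluated software: the function name `grow_labels`, the label constants, the 4-neighborhood, the linear similarity function, and the explicit passing of background seeds are all assumptions made for illustration (in the method described above, background seeds are derived automatically). The influence of distance to the seeds is only implicit here, through the attenuation of strength each time a label crosses a dissimilar pixel.

```python
FOREGROUND = 1  # hypothetical label for RV blood pool
BACKGROUND = 2  # hypothetical label for everything else


def grow_labels(image, seeds, max_iters=100):
    """Label a 2D intensity grid from seed pixels (GrowCut-style sketch).

    image: list of rows of intensities.
    seeds: dict mapping (row, col) to FOREGROUND or BACKGROUND.
    Returns a grid of labels (0 = still unlabeled).
    """
    rows, cols = len(image), len(image[0])
    flat = [v for row in image for v in row]
    span = (max(flat) - min(flat)) or 1.0  # avoid division by zero

    label = [[0] * cols for _ in range(rows)]
    strength = [[0.0] * cols for _ in range(rows)]
    for (r, c), lab in seeds.items():
        label[r][c] = lab
        strength[r][c] = 1.0  # seeds start with full strength

    for _ in range(max_iters):
        new_label = [row[:] for row in label]
        new_strength = [row[:] for row in strength]
        changed = False
        for r in range(rows):
            for c in range(cols):
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if label[nr][nc] == 0:
                        continue  # unlabeled neighbors cannot attack
                    # Similar intensities give a similarity close to 1.
                    similarity = 1.0 - abs(image[r][c] - image[nr][nc]) / span
                    attack = similarity * strength[nr][nc]
                    if attack > new_strength[r][c]:
                        new_label[r][c] = label[nr][nc]
                        new_strength[r][c] = attack
                        changed = True
        label, strength = new_label, new_strength
        if not changed:
            break  # automaton has converged
    return label


if __name__ == "__main__":
    toy_slice = [[100, 100, 0, 0],
                 [100, 100, 0, 0]]
    toy_seeds = {(0, 0): FOREGROUND, (0, 3): BACKGROUND}
    print(grow_labels(toy_slice, toy_seeds))
```

Because every pixel is updated from the previous state only, each iteration is independent per pixel, which is what makes this family of algorithms amenable to the parallel computation mentioned above.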
The same two observers performed the semi-automatic RV segmentations, with the same number of segmentations as for the manual ones (resulting in measurements A1, A2a, and A2b, respectively; see Table 1 for the overview of measurements). The observers performed the segmentations independently and were blinded to the segmentation results until all the data were ready to be processed. Adhering to the common workflow of cardiac examinations, where the LV examination is performed prior to the RV examination, the LV was already segmented before the semi-automatic RV segmentations. The same LV segmentations were used for all measurements. We stress that, for the current study, no manual corrections were performed afterward and the resulting RV contours were used as is.
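Ventricular volumes at ED and ES phases were automatically calculated by the software using Simpson's rule:

$$V = \sum_{i=1}^{n} \mathrm{Area}_i \times \mathrm{Thickness}_i$$

where i is the slice level, n is the number of slices, Area_i is the area covered by the RV endocardial contour at the ith slice level, and Thickness_i is the slice thickness at the ith slice level (including the interslice gap). EF was also automatically calculated using the following equation:

$$\mathrm{EF} = \frac{V_{\mathrm{ED}} - V_{\mathrm{ES}}}{V_{\mathrm{ED}}} \times 100\%$$

where V_ED and V_ES are the ventricular volumes at the ED and ES phases, respectively.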
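As a worked illustration of the slice summation and the standard EF definition, the sketch below computes a volume from per-slice areas and an EF from two volumes. The function names and the example areas are hypothetical; the 10 mm effective thickness corresponds to the 8 mm slice thickness plus the 2 mm interslice gap reported in the MRI section. It is not the software's own code.

```python
from typing import Sequence


def ventricular_volume_ml(areas_mm2: Sequence[float],
                          thickness_mm: Sequence[float]) -> float:
    """Sum of (contour area x slice thickness), converted from mm^3 to mL."""
    if len(areas_mm2) != len(thickness_mm):
        raise ValueError("one thickness value is needed per slice area")
    volume_mm3 = sum(a * t for a, t in zip(areas_mm2, thickness_mm))
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


def ejection_fraction_percent(edv_ml: float, esv_ml: float) -> float:
    """EF = (EDV - ESV) / EDV, expressed as a percentage."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


if __name__ == "__main__":
    # Hypothetical contour areas for three slices, in mm^2.
    areas = [1200.0, 1500.0, 900.0]
    thickness = [10.0] * len(areas)  # 8 mm slice + 2 mm gap
    print(ventricular_volume_ml(areas, thickness))
    print(ejection_fraction_percent(150.0, 60.0))
```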

Statistical Analysis
To assess the performance of the semi-automatic segmentation algorithm, the derived ED and ES ventricular volumes for the right endocardium as well as the EF values were compared with the manually derived values of the first observer (measurement A1 vs measurement M1). The appropriate term to express the level of agreement between the semi-automatic and manual segmentation is "validity" instead of "accuracy", because no gold standard exists for RV evaluation [19]. The validity was expressed as the mean and standard deviation, 95% limits of agreement (calculated as the mean ± 1.96 * standard deviation), the median, and the interquartile range of the paired differences in each data set. The percentage difference relative to the average value of the manual volumes was also calculated. A preliminary Shapiro-Wilk test on the measurements showed that some of them were not normally distributed. Therefore, a two-tailed Wilcoxon signed-rank test was performed to determine the statistical significance of the observed difference, with P < 0.05 considered to indicate a significant difference. Bland-Altman analysis was also performed to visualize the observed differences.

Table 1. Overview of measurements and analyses

Measurements
  Manual measurements
    M1    Manual segmentation by the first observer
    M2a   First attempt of manual segmentation by the second observer
    M2b   Second attempt of manual segmentation by the second observer
  Semi-automatic measurements
    A1    Semi-automatic segmentation by the first observer
    A2a   First attempt of semi-automatic segmentation by the second observer
    A2b   Second attempt of semi-automatic segmentation by the second observer
Analyses
  Manual analyses
    Inter-observer variability   M1 vs M2a
    Intra-observer variability   M2a vs M2b
  Semi-automatic analyses
    Inter-observer variability   A1 vs A2a
    Intra-observer variability   A2a vs A2b
    Validity                     A1 vs M1

Contours obtained by the semi-automatic segmentation algorithm were also evaluated against those of the manual segmentation, using two metrics: the Dice metric (DM) and the Hausdorff distance (HD). DM is a measure of area overlap between two contours, using the following equation:

$$\mathrm{DM} = \frac{2\,|A \cap B|}{|A| + |B|}$$

where A and B are the regions enclosed by the two tested contours and |·| denotes area. The DM ranges from 0 (no overlap) to 1 (perfect overlap). Meanwhile, HD is a measure of the maximum distance between two contours expressed in mm, using the following equation:

$$\mathrm{HD}(X, Y) = \max\left( \max_{x \in X} \min_{y \in Y} d(x, y),\; \max_{y \in Y} \min_{x \in X} d(x, y) \right)$$

where X and Y are the two tested contours, x and y are individual points of X and Y, respectively, and d(x,y) is the Euclidean distance between x and y.
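The two contour metrics can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the evaluation code used in the study: regions are represented as sets of pixel coordinates (so area is a pixel count), contours as lists of 2D points already expressed in mm, and the function names are hypothetical.

```python
import math


def dice_metric(region_a, region_b):
    """Dice overlap of two regions given as sets of (row, col) pixels."""
    total = len(region_a) + len(region_b)
    if total == 0:
        raise ValueError("both regions are empty")
    return 2.0 * len(region_a & region_b) / total


def hausdorff_distance(contour_x, contour_y):
    """Symmetric Hausdorff distance between two lists of (x, y) points."""
    if not contour_x or not contour_y:
        raise ValueError("contours must not be empty")

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(contour_x, contour_y),
               directed(contour_y, contour_x))


if __name__ == "__main__":
    a = {(0, 0), (0, 1)}
    b = {(0, 0)}
    print(dice_metric(a, b))
    print(hausdorff_distance([(0, 0), (1, 0)], [(0, 0), (4, 0)]))
```

The brute-force pairwise search is quadratic in the number of contour points, which is acceptable for single-slice contours but would need a spatial index for dense point sets.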
To put the observed differences between segmentations in perspective, we compared them against the inter- and intra-observer variability found from the manual segmentations by the two observers. Inter-observer variability of the manual segmentations was obtained by comparing the segmentation results of the first observer with the first result of the second observer (measurement M1 vs M2a), and intra-observer variability by comparing the two segmentation results of the second observer (measurement M2a vs M2b).
To assess the reproducibility of the semi-automatic segmentation algorithm, inter- and intra-observer variability of the semi-automatic segmentations were obtained in a similar way as for the manual segmentations, i.e., by comparing the semi-automatic segmentation results of the first observer with the first result of the second observer (measurements A1 vs A2a) and by comparing both semi-automatic segmentation results of the second observer (measurements A2a vs A2b), respectively. Validity and reproducibility of the semi-automatic segmentations and reproducibility of the manual segmentations were compared. Table 1 contains the overview of analyses performed in this study.
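The paired-difference summaries used for both the validity and the variability comparisons (mean, standard deviation, 95% limits of agreement, median, interquartile range, and percentage difference) could be computed along the following lines. This is a sketch using only the Python standard library; the function name, the dictionary keys, and the choice of the "inclusive" quartile method are assumptions, and the study itself used Microsoft Excel. The Wilcoxon signed-rank test is not reproduced here; a statistics package such as SciPy (`scipy.stats.wilcoxon`) would be needed for that step.

```python
import statistics


def paired_difference_summary(test, reference):
    """Summarize paired differences (test - reference) between two measurements."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [t - r for t, r in zip(test, reference)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    median_diff = statistics.median(diffs)
    reference_mean = statistics.mean(reference)
    return {
        "mean": mean_diff,
        "sd": sd_diff,
        # 95% limits of agreement: mean +/- 1.96 * SD
        "limits_of_agreement": (mean_diff - 1.96 * sd_diff,
                                mean_diff + 1.96 * sd_diff),
        "median": median_diff,
        "iqr": (q1, q3),
        # Median difference relative to the average reference value, in %
        "median_percent": 100.0 * median_diff / reference_mean,
    }


if __name__ == "__main__":
    # Hypothetical ED volumes in mL: semi-automatic vs manual reference.
    semi_automatic = [102.0, 98.0, 105.0, 95.0]
    manual = [100.0, 100.0, 100.0, 100.0]
    for key, value in paired_difference_summary(semi_automatic, manual).items():
        print(key, value)
```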
All statistical analyses were performed using Microsoft Excel 2013 (Microsoft Corp., Redmond, WA). A post hoc statistical power analysis was also performed using the G*Power software [20].

Results
Table 2 shows the characteristics of the study participants, including their ED volumes, ES volumes, and EF measurements. These values are concordant with previously reported RV parameters of healthy subjects [21]. The overall mean age was 30.5 ± 6.5 years and 14 volunteers (50%) were male.
A typical result of the RV semi-automatic segmentation is shown in Fig. 1. When the LV epicardial contour is available, the RV endocardial contour is shown as attached to the LV epicardial contour, with one of the attach points representing [...]

Table 2 (fragment; row and column labels not recoverable): 55.3 ± 6.5, 53.8 ± 6.9, 56.9 ± 5.8. Values are presented as means ± standard deviation. N number of participants, ED end-diastolic, ES end-systolic, EF ejection fraction.

Manual RV Segmentation Reproducibility
The reproducibility analysis of the manual segmentation is presented in Table 3. All inter- and intra-observer variability results were statistically significant, with the exception of the inter-observer EF variability. The inter-observer variability was larger than the intra-observer variability for ED volumes, but smaller for ES volumes and EF.

Semi-automatic RV Segmentation Validity
The validity of the semi-automatic segmentations is presented in Tables 4 and 5, and Fig. 2. The semi-automatic segmentation showed good agreement with the manual segmentation (Table 4), with an excellent linear correlation for both ED and ES volumes (R² of 0.98 and 0.91, respectively) and a slightly lower but still good correlation for EF (R² of 0.79). The ED volumes had a median difference of less than 2 mL (0.91%), ES volumes showed an underestimation with a median difference of less than 4 mL (−4.84%), and comparison of EF resulted in an overestimation with a median difference of less than 2% (or 3.27% relatively). In comparison to the reproducibility of manual segmentation, the median differences in ED volumes between semi-automatic and manual segmentations were smaller than the manual inter-observer variability. However, in all parameters, the interquartile ranges were larger than those of the manual intra- and inter-observer variability. Post hoc statistical power analysis on the Wilcoxon signed-rank test yielded 0.62, 0.32, and 0.74 for testing the differences of ED volumes, ES volumes, and EF, respectively. Yet looking in more detail (Table 5), the semi-automatic contours showed good overlap with the manual contours (median DM of 0.95 and 0.87 for ED and ES contours, respectively) and revealed small deviations (median HD of 5.05 mm and 7.35 mm for ED and ES contours, respectively).

Semi-automatic RV Segmentation Reproducibility
For the semi-automatic contour detection, no significant differences could be observed between the first and second observers. The first and second measurements of the second observer showed significant differences in ES volumes and EF. The semi-automatic segmentations were highly reproducible (Table 6), with the largest median difference being lower than 0.5% for all parameters. The inter- and intra-observer variability of the semi-automatic segmentations was noticeably smaller than that of the manual segmentations.

Table 4 footnote: Value and percentage are presented in median (25th to 75th percentile) and in mean ± SD (95% limits of agreement, calculated as mean ± 1.96 * SD). Two-tailed Wilcoxon signed-rank test calculated validity of semi-automatic segmentation with P < 0.05 indicating statistical significance. SD standard deviation, ED end-diastolic, ES end-systolic, EF ejection fraction.

Table footnote (inter- and intra-observer variability): Value and percentage are presented in median (25th to 75th percentile). Two-tailed Wilcoxon signed-rank test calculated inter- and intra-observer variability with P < 0.05 indicating statistical significance. ED end-diastolic, ES end-systolic, EF ejection fraction.

Discussion
The semi-automatic segmentation results were highly correlated with the manual segmentation results, with lower inter- and intra-observer variability than observed in the manual segmentations. The reproducibility of the manual segmentations was in line with previously reported values (Table 7). In comparison to the reproducibility of these previous studies, the validity level of the semi-automatic segmentations was generally on par or better and the inter- and intra-observer variability were considerably lower. Out of all the values listed in Table 7, the results presented in the current study are most comparable to the results of Caudron et al. [3], since there was consensus within and between the observer(s) on basal and apical slices and ES phase.
Difficulties in RV segmentation at ES phases are well known [3,14] and attributed to partial volume effects [14] and to the more complex anatomical RV structure [24], especially at ES phase, where maximum contraction of the right ventricle results in more compacted trabeculations and papillary muscles, limiting the segmentation process. Despite the slight underestimation of the volume measurements at ES phases in our semi-automatic segmentation method, the validity of EF still fell on average within a 2% range, with 95% limits of agreement smaller than ± 10% (as differences in EF values). In a recent RV segmentation challenge, held at the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2012 conference [14] and joined by seven imaging groups, the best performing algorithm produced EF measurements with validity in the range of 6%, with the general results producing 95% limits of agreement around ± 20% (as differences in EF values). Several newly developed automatic RV segmentation algorithms using the same dataset as the RV segmentation challenge have been published [28][29][30][31]. An improved EF validity of around 2%, with 95% limits of agreement slightly higher than ± 10%, has been reported [28].

Table 5 footnote: Dice metric and Hausdorff distance are presented in median (25th to 75th percentile) and in mean ± SD (95% limits of agreement, calculated as mean ± 1.96 * SD). Two-tailed Wilcoxon signed-rank test calculated validity of semi-automatic segmentation with P < 0.05 indicating statistical significance. SD standard deviation, ED end-diastolic, ES end-systolic.

Manual segmentations are known to be time-consuming, taking from 5 min [25] up to 54 min per patient [5] depending on factors such as user experience and contouring methods. (Semi-)automatic segmentation methods that assist an analyst during this process within a small amount of time and yield valid and reproducible results will be very beneficial for the physician in the clinical workflow. Our segmentation method yields valid and reproducible results from one roughly drawn user seed contour, with approximately 1 second of computation time to segment one cardiac phase. These virtues are preferred in clinical settings for robustness and reduction in examination time [14]. We would argue that the functionality provided by the currently evaluated semi-automatic segmentation method offers an optimal balance between ease of use and algorithm performance.
Evaluating LV and RV concurrently is beneficial because problems with either ventricle frequently involve the other [32]. The common approach for concurrent LV and RV evaluation is to show both the LV epicardial and RV endocardial contours simultaneously at the septum [1,33], which might introduce an error when there is an overlap or a gap between the two contours, i.e., the left ventricular epicardial border may extend into the RV or vice versa. In the currently investigated software package, the RV endocardial contour is attached to the LV epicardial contour, such that the RV endocardial and the LV epicardial contours effectively share the septum and avoid the aforementioned problem. Another advantage of this approach is that the structures attached to the septum, such as the septomarginal trabecula [34], are automatically included in the right endocardial area. A previous study has shared this way of reasoning and presented a similar approach [23]. It is debatable whether to include or exclude the trabeculations and papillary muscles in RV cavity delineations. However, inclusion of these structures is recommended to promote reproducibility [24].
The semi-automatic RV segmentation method was evaluated on short-axis cine MRI images. The use of short-axis image orientation for RV analysis promotes efficiency because both ventricles can be analyzed with one image set [35]. However, since short-axis cine MRI is designed for LV analysis, it may not be fully optimized for RV analysis. One of the drawbacks is that the tricuspid valve may not be present in the imaging plane, making it difficult to distinguish ventricles from atria at the basal slices [35][36][37] and to localize the RV outflow tract, which may be out of plane [38]. Difficulties in segmenting these two structures at the basal region have been reported as one of the contributing factors in lower reproducibility in RV segmentation [25]. The Society for Cardiovascular Magnetic Resonance recommends the use of transaxial cine MRI images for RV volumetric analysis [39]. Despite the difficulties in distinguishing the border between blood and myocardium at the inferior RV wall, RV segmentation on transaxial cine MRI images has been shown to provide higher reproducibility, probably due to the ease of locating the pulmonary and tricuspid valves [35]. However, such improvement might be too small to be clinically significant and to warrant an extra RV examination on transaxial images in addition to the normal CMR examination on short-axis images [40]. Several alternative imaging orientations have been suggested to improve RV segmentation reproducibility, such as a modified RV short-axis view oriented to the RV outflow [36], which shares the advantage of easily locating the tricuspid valve, or an acquisition of six slices rotated along the long axis of the RV, each forming a 30° wedge with the next [38]. Variations of short-axis plane orientations exist, and a short-axis orientation perpendicular to the septum has been recommended to obtain optimal RV and LV measurements [41].
One way to mitigate the problem of choosing the most basal slice in short-axis images is to use another image orientation perpendicular to them, such as the four-chamber view [3,14]. Accordingly, this study used the four-chamber view as well as the tricuspid valve view, which clearly depicted the tricuspid annulus, to locate the basal slice in the short-axis view. One of the evaluation parameters in this study is the reproducibility of the semi-automatic segmentation method. By pre-selecting the basal slices and using the same pre-selected basal slice level for all measurements of the same dataset, one source of variability was removed [3], so that the reproducibility analysis could focus on the performance of the method.
There are several limitations in our study. First, the data used for evaluation were acquired from healthy volunteers using one specific set of image acquisition protocols and one MRI scanner. Various cardiovascular diseases can affect RV morphology and structures [24,42], which may hamper the performance of automatic segmentation methods. However, the employed algorithm relies on features present in the image data itself, and it has been pointed out [14] that such a method should be invariant to pathological cases and image acquisitions. Nevertheless, future validation studies are still needed to evaluate the performance of this segmentation method on a wider variety of datasets.

Conclusions
In conclusion, the investigated semi-automatic RV segmentation method produced a valid and reproducible alternative to manual RV segmentation, with a limited number of user interactions and limited computation time.