Model and training
The automated segmentation model was based on the neural network architecture described by Han [15], a modified U-net [16], for which excellent performance has previously been demonstrated in medical image segmentation. The model makes use of residual modules [17], which improve gradient flow between adjacent layers and increase classification accuracy. A diagram of the model’s architecture is shown in Fig. 1.
The machine learning algorithm was initially trained in a derivation cohort, consisting of consecutive patients (n = 150) who underwent clinical CMR (with aortic valve PC-CMR) between January and November 2017. For each exam, manual segmentation maps were generated from PC-CMR: This entailed labeling pixels in the magnitude images as either valve or non-valve using 3DSlicer [18], an open-source medical image post-processing application. Prior to neural network processing, input images were resampled and (if necessary) zero-padded to 256 × 256 pixels. Pixel intensity values in the magnitude images were then rescaled to values between zero and one. Model training was performed using magnitude images as input and corresponding ground-truth manual segmentation maps as output (Fig. 2). The training set contained a total of 4345 unique images from 150 patients. Aggressive data augmentation was employed at batch time in the form of random zoom, rotation, crop (224 × 224), horizontal/vertical flip, and addition of Gaussian noise. A weighted softmax/cross entropy loss function was used for training as follows:
$$ \mathrm{loss}\left(x,i\right) = -w\left[i\right]\,\ln\frac{e^{x\left[i\right]}}{\sum_{j=1}^{C} e^{x\left[j\right]}} $$
(1)
where x is the output logit vector at a given pixel, i the true class label, w the vector of class weights, and C the number of classes. Weighting was employed to combat class imbalance given that the vast majority of pixels in each image were non-valve. A class weight of 0.2 was empirically assigned to the non-valve class and 0.8 to the valve class. RMSProp was used to apply incremental parameter updates.
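As a concrete illustration, the per-pixel loss in eq. 1 can be sketched in NumPy (the function name and two-class weight layout are ours, not taken from the published code):

```python
import numpy as np

def weighted_ce_loss(x, i, w):
    """Weighted softmax/cross-entropy loss at a single pixel (eq. 1).

    x : logit vector over C classes at the pixel
    i : true class label
    w : vector of per-class weights
    """
    x = np.asarray(x, dtype=float)
    # Log-softmax with max subtraction for numerical stability.
    z = x - x.max()
    log_softmax = z - np.log(np.exp(z).sum())
    return -w[i] * log_softmax[i]

# Class weights from the text: 0.2 for non-valve (class 0), 0.8 for valve (class 1).
w = np.array([0.2, 0.8])
```

With equal logits, the softmax assigns probability 1/2 to each class, so the loss reduces to w[i]·ln 2, showing how the 0.8 valve weight penalizes valve-pixel errors four times more heavily than non-valve errors.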
The model was built in Python using the deep learning framework PyTorch. Training and testing were performed on a workstation with four CPU cores, 64 GB of system memory, and a graphics processing unit (GPU) with 11 GB of video memory (NVIDIA [Santa Clara, California, USA] GTX 1080 Ti). Software code pertaining to both training and testing of the machine learning model can be found online at: https://github.com/akbratt/PC_AutoFlow.
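The preprocessing described above (zero-padding to 256 × 256 and rescaling intensities to the unit interval) can be sketched as follows. This is a minimal illustration: it assumes the input already fits within the target size (resampling of larger images is omitted), and the batch-time augmentations are not shown.

```python
import numpy as np

def preprocess(img, size=256):
    """Zero-pad a magnitude image to size x size and rescale intensities to [0, 1].

    Assumes the input is no larger than the target size; resampling is omitted.
    """
    h, w = img.shape
    out = np.zeros((size, size), dtype=float)
    # Center the original image within the zero-padded canvas.
    top, left = (size - h) // 2, (size - w) // 2
    out[top:top + h, left:left + w] = img.astype(float)
    # Min-max rescale to [0, 1]; guard against constant images.
    lo, hi = out.min(), out.max()
    return (out - lo) / (hi - lo) if hi > lo else out
```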
Flow calculation
Data was extracted from PC-CMR DICOM files. Established algorithms were used to convert raw phase map pixel intensities to velocities as follows:
$$ v = P \cdot M \cdot ASF $$
(2)
such that
$$ ASF=\frac{10\pi R}{VENC} $$
(3)
where P and M represent raw pixel values from the phase and magnitude maps, respectively, ASF is an amplitude scaling factor, R is a reconstruction scaling factor specified in the DICOM header, and VENC is an adjustable scanner parameter representing the maximum measurable flow velocity.
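Eqs. 2 and 3 can be sketched as a single conversion step (the function name and argument order are illustrative, not from the published code):

```python
import numpy as np

def phase_to_velocity(P, M, R, venc):
    """Convert raw phase-map pixel values to velocities per eqs. 2-3.

    P, M : raw pixel values (or arrays) from the phase and magnitude maps
    R    : reconstruction scaling factor from the DICOM header
    venc : velocity-encoding limit (cm/s), the maximum measurable velocity
    """
    asf = 10 * np.pi * R / venc   # eq. 3: amplitude scaling factor
    return P * M * asf            # eq. 2
```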
Flow was calculated from the automated segmentation map of a given phase contrast scan as:
$$ NetFlow = \sum \limits_{n=0}^{N}\sum \limits_{i=0}^{I} S_{n,i}\, V_{n,i}\, a\, \Delta t $$
(4)
where n is the segment index, N the number of temporal segments in the scan, I the number of pixels in each segment, S the binary segmentation map, V the velocity map calculated using eqs. 2 and 3, a the pixel area (in cm²), and Δt the time interval between segments.
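Eq. 4 translates directly into a NumPy reduction. This sketch assumes the segmentation and velocity maps are stacked along a leading temporal axis:

```python
import numpy as np

def net_flow(S, V, pixel_area_cm2, dt_s):
    """Net forward flow per eq. 4.

    S : binary segmentation maps, shape (N, H, W), one per temporal segment
    V : velocity maps (cm/s), same shape as S
    pixel_area_cm2 : area of a single pixel (cm^2)
    dt_s : time interval between temporal segments (s)
    """
    # Masked velocity summed over all pixels and segments, scaled by area and dt.
    return float((S * V).sum() * pixel_area_cm2 * dt_s)
```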
Study population
An independent validation cohort was thereafter employed to test the algorithm, comprising CAD patients (n = 190) enrolled in two prospective (Cornell) institutional protocols focused on LV remodeling. PC-CMR exams were performed using a standardized protocol, in which PC-CMR datasets were acquired (through plane) at the level of the aortic valve leaflet tips and cine-CMR datasets (for assessment of systolic function) were acquired in contiguous short axis slices (6 mm slice thickness, 4 mm gap) throughout the LV. CMR exams in the validation cohort were performed using 1.5T and 3T CMR scanners (84% 1.5T, 16% 3T; General Electric Healthcare, Waukesha, Wisconsin, USA). Typical PC-CMR parameters were as follows: flip angle = 20 deg., VENC = 150–350 cm/sec, TR [1.5T] = 8 msec, TE [1.5T] = 3.7 msec, TR [3T] = 5 msec, TE [3T] = 3.6 msec. Transthoracic echocardiography was performed within one week of CMR (99% within 24 h) in accordance with a standardized protocol as previously detailed for each of the two prospective studies from which the current cohort was derived [19, 20]. Clinical and demographic information was prospectively acquired at the time of study enrollment.
This research protocol was performed with approval of the Weill Cornell Institutional Review Board (IRB), which approved retrospective analysis of pre-existing datasets utilized for model training (derivation cohort). Validation cohort patients provided written informed consent for research participation.
Volume overlap and surface distance metric analysis
The automated segmentation model was evaluated in terms of volume overlap and surface distance metrics by comparing automated segmentation maps to corresponding ground-truth manual segmentation maps. Volume overlap and surface distance metrics were tested on all scans in the derivation cohort (n = 150) using six-fold cross-validation. Cross-validation is a procedure whereby data is randomly split into non-overlapping subsets such that a model can be trained on all but one subset and tested on the remaining subset. In this case, a different model instance was trained and tested for each of the six hold-out subsets, and test metrics were averaged per-case for the entire dataset.
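The six-fold split described above can be sketched as follows (a minimal illustration; the published code may partition cases differently):

```python
import numpy as np

def six_fold_splits(n_cases, n_folds=6, seed=0):
    """Randomly partition case indices into non-overlapping folds and, for each
    fold, yield (train_indices, test_indices): train on all but one subset,
    test on the held-out subset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cases)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test
```

For the 150-patient derivation cohort this yields six hold-out subsets of 25 cases each, with every case appearing in exactly one test set.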
Volume overlap metrics (Dice and Jaccard coefficients) consider only pixels that are labeled as valve. These coefficients take on values between zero and one such that a value of one is perfect overlap between segmentation maps and zero is no overlap. Distance metrics (Hausdorff [HD] and average symmetric surface [ASSD] distances) operate on binary surface plots generated from volumetric segmentations by zeroing any valve pixels with no neighboring non-valve pixels. Equations for volume overlap and surface distance metrics are shown in Additional file 1.
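The Dice and Jaccard coefficients can be computed from a pair of binary maps as follows (a minimal sketch; the surface distance metrics are omitted):

```python
import numpy as np

def dice_jaccard(a, b):
    """Dice and Jaccard overlap between two binary segmentation maps.

    Both range from 0 (no overlap) to 1 (perfect overlap) and consider
    only pixels labeled as valve.
    """
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    dice = 2 * inter / (a.sum() + b.sum())
    jaccard = inter / union
    return dice, jaccard
```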
Flow comparisons / algorithm evaluation
Net forward trans-aortic flow calculated via fully automated machine learning segmentation was compared to that generated by manual segmentation, which was performed by an experienced (level III trained) physician (JWW). Flow differences between manual and machine learning were compared to those between manual and a conventional (commercially available) automated algorithm (Cardiac VX, General Electric Healthcare, Waukesha, Wisconsin, USA): The commercial algorithm requires a user to manually contour a single temporal segment; the segmentation mask is then propagated to all other temporal segments with automatic adjustments to account for valve motion and deformation. To directly test incremental utility of the machine learning approach, analyses using the commercial algorithm were in no way adjusted following initial segmentation.
Intra- and inter-reader reproducibility for manual, conventional automated, and machine learning segmentation were determined via analysis of a random subset of 20 patients.
External validation
Data from an additional institution (Duke) was used to further test robustness of the model, representing 130 CMR scans acquired using different vendor (Siemens, Munich, Germany) equipment (53% 3T, 47% 1.5T) in a cohort enriched for patients (n = 40) with clinically documented aortic valve pathology (bicuspid aortic valve [BAV] or aortic stenosis [AS]; 25% mild / 35% moderate / 40% severe AS). To do so, a new instance of the model was trained on a dataset consisting of the entire derivation cohort (n = 150) as well as 50 exams from the external validation cohort (including 10 with AS and 10 with BAV; total n = 200). The model was then tested on the remainder of the external validation cohort (n = 80), including an equivalent number of patients with aortic valve pathology (n = 10 AS, n = 10 BAV). The Duke IRB provided approval for analysis of de-identified datasets for research purposes.
Statistical methods
Comparisons between groups were made using Student’s t-test (expressed as mean ± standard deviation [SD]) for continuous variables. Inter- and intra-observer agreement between methods was assessed using the method of Bland and Altman [21], which yielded mean difference as well as limits of agreement between methods (mean ± 1.96 SD). Bivariate correlation coefficients, intra-class correlation coefficients, and linear regression equations were used to evaluate associations between variables. Statistical calculations were performed using SPSS 24.0 (Statistical Package for the Social Sciences, International Business Machines, Inc., Armonk, New York, USA), SciPy [22], and Excel (Microsoft Inc., Redmond, Washington, USA). Two-sided p < 0.05 was considered indicative of statistical significance.
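The Bland-Altman quantities used here (mean difference and limits of agreement, mean ± 1.96 SD) can be sketched as:

```python
import numpy as np

def bland_altman(x, y):
    """Mean difference and 95% limits of agreement between two measurement
    methods, per Bland and Altman: mean difference +/- 1.96 SD."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    mean_diff = d.mean()
    sd = d.std(ddof=1)  # sample SD of the paired differences
    return mean_diff, (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)
```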