Background

Myocardial strain imaging for the assessment of contractile function has multiple applications. It is used for the detection of cardiotoxicity due to chemotherapy [1], in cardiac resynchronization therapy (CRT) as a predictor of response and outcomes [2, 3] and for detecting late-activating regions [4, 5], as a predictor of major adverse cardiac events after myocardial infarction [6], for the detection of ischemia [7], and in other areas. For some applications, global strain assessments may be adequate; however, for others, such as detecting late-activating regions or imaging regions of infarction or ischemia, segmental strain quantification is essential. Furthermore, reproducible techniques for segmental strain could potentially demonstrate its importance in certain applications where global strain is currently thought to be sufficient, such as cardiotoxicity, where dysfunction may not be uniform throughout the myocardium but may instead affect some regions more than others [8].

Strain reproducibility, i.e., the ability to obtain a consistent strain result when a measurement is repeated, is important for all of the applications discussed. The assessment of myocardial strain by cardiovascular magnetic resonance (CMR) takes multiple steps, including those that are related to image acquisition and those that occur during image analysis, and these steps may be influenced by practices and conditions that may vary between different users, different sites, and different imaging scans and sessions. For these reasons, reproducibility studies should try to account for as many of these variables as possible.

While a number of studies have shown high reproducibility of global or whole-slice strain for multiple CMR strain imaging methods including feature tracking [9, 10], conventional tissue tagging [10, 11], harmonic phase (HARP) imaging [12], strain-encoded imaging (SENC) [10, 13], and displacement encoding with stimulated echoes (DENSE) [11, 14, 15], achieving highly reproducible results for segmental strain is more challenging. For example, it has been shown that the reproducibility of segmental strain assessed using feature tracking is in the fair-poor range [16,17,18,19,20]. Previously, two single-center studies reported that DENSE at 1.5 T provides highly reproducible measurements of segmental end-systolic circumferential shortening (Ecc) in healthy subjects [15, 21], and another 1.5 T study showed excellent intra-observer and inter-observer reproducibility of DENSE segmental Ecc in myocardial infarction patients [22]. The study by Lin et al. [21] in particular showed excellent reproducibility at one center of DENSE for different image acquisition sessions (inter-session reproducibility), the same user performing image analysis at different times (intra-user reproducibility), and for different users performing image analysis (inter-user reproducibility). Recent improvements to DENSE include the use of outer volume suppression and optimization of parameters for 3 T [23]. The present study sought to evaluate the reproducibility of 3 T DENSE at multiple centers, for multiple analysts (including human analysts at the same and different centers and analysis by fully-automated deep learning), and for successive scans in healthy subjects and in patients with heart disease, with a particular focus on segmental Ecc.

Materials and methods

Study sites and subjects

Six centers participated in this study (St. Francis Hospital, New York, USA; the Royal Brompton Hospital, London, UK; Stanford University, Palo Alto, USA; University Hospital, Saint-Etienne, France; Emory University, Atlanta, USA; and the University of Virginia, Charlottesville, USA). All sites had prior experience with cardiac cine DENSE imaging. A total of 81 subjects participated, including 60 healthy subjects and 21 heart disease patients (Table 1). The patient group included subjects with (a) heart failure with left bundle branch block (n = 9), (b) ischemic heart disease (n = 3), (c) amyloidosis (n = 4), (d) cardiomyopathy (n = 3), (e) atrial fibrillation (n = 1), and (f) hypertension (n = 1). Only adults (> 18 years) were included, and subjects with CMR contraindications (e.g., implantable devices, cerebral aneurysm clips, cochlear implants, etc.) were excluded. Additional exclusion criteria for the healthy control group included a history of cardiovascular disease, hypertension and smoking. All CMR studies were performed in accordance with each site’s protocols that were approved by their respective institutional review boards for research involving human subjects, and all subjects provided informed consent. All CMR was performed using 3 T CMR scanners (MAGNETOM Prisma or Skyra, Siemens Healthineers, Erlangen, Germany) with 4–32 channel phased-array radiofrequency coils.

Table 1 Description of Subjects

CMR protocol

For each subject, cine DENSE images were acquired in three short-axis planes at basal, mid-ventricular and apical levels, and in the four-chamber long-axis view. Short-axis cine balanced steady-state free precession (bSSFP) images covering the left ventricle (LV) were also acquired for all subjects for the quantification of LV end diastolic volume (LVEDV), LV end systolic volume (LVESV), and LV ejection fraction (LVEF). Healthy subjects did not receive gadolinium. For patients undergoing clinical exams, DENSE imaging was completed prior to administration of gadolinium. A standardized spiral cine DENSE [24] acquisition protocol with outer volume suppression [23] was used with the following parameters: slice thickness = 8 mm, TR = 15 ms, TE = 1.26 ms, temporal resolution of 15 ms (with view sharing), pixel size = 3.4 × 3.4 mm², FOV = 200 mm², region of signal generation = 120 × 120 mm², 2D in-plane displacement encoding using the simple three-point method [25, 26], displacement-encoding frequency = 0.1 cycles/mm, ramped flip angle with final flip angle of 15°, fat suppression, and a total of 4 spiral interleaves with 2 interleaves acquired per heartbeat. Each cine DENSE acquisition was performed during end-expiratory breath holding over 14 cardiac cycles, which consisted of 12 cardiac cycles for acquiring DENSE data and 2 cardiac cycles for acquiring B0 field map data, which were used to correct the spiral data for off-resonance, assuming a linear variation in B0 across the field of view. The cine bSSFP protocol was not standardized among the participating sites.

During one imaging session, each subject underwent two separate DENSE scans in order to assess inter-scan reproducibility. Each subject was taken out of the scanner and repositioned between the scans. We refer to the separate scans as Scan A and Scan B.

Strain analysis of DENSE images

Well-established methods were used for strain analysis of DENSE images. Segmentation of the LV myocardium was performed semiautomatically using motion-guided segmentation [27], and manual correction was applied if needed. Next, a phase-unwrapping algorithm was applied to LV myocardial pixels, and Lagrangian displacement and strain were subsequently calculated [28]. For short-axis images, the Lagrangian strain tensor was projected to the circumferential and radial directions relative to the LV center of mass to compute circumferential and radial strains (Ecc and Err, respectively). For long-axis images, the Lagrangian strain tensor was projected to the longitudinal direction to compute longitudinal strain (Ell). For short-axis imaging, whole-slice and segmental strain analyses were performed, where segmental analysis used the American Heart Association 16-segment model [29]. For four-chamber long-axis images, the Ell values from the two mid-ventricular segments (American Heart Association segments 9 and 12) were averaged to compute a single global Ell value. Global torsion was derived by computing twist for the three short-axis slices and then computing the change in twist along the longitudinal direction. All sites were provided written, video and/or live instructions in order to standardize the strain analysis process.
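The projection and torsion steps described above can be illustrated with a minimal Python sketch. It is not the analysis software used in this study: the function names are hypothetical, the strain tensor uses the standard Green–Lagrange definition E = ½(FᵀF − I), and torsion is taken as the least-squares slope of twist versus slice position, which is an assumption about how "change in twist along the longitudinal direction" might be computed.

```python
import numpy as np

def lagrangian_strain(F):
    """Green-Lagrange strain tensor E = 0.5 * (F^T F - I) for a 2x2 deformation gradient F."""
    F = np.asarray(F, dtype=float)
    return 0.5 * (F.T @ F - np.eye(2))

def project_strain(E, point, center):
    """Project a 2x2 strain tensor onto the circumferential and radial
    directions defined relative to the LV center of mass.
    Returns (Ecc, Err)."""
    r = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    e_r = r / np.linalg.norm(r)          # radial unit vector
    e_c = np.array([-e_r[1], e_r[0]])    # circumferential unit vector (radial rotated by 90 degrees)
    E = np.asarray(E, dtype=float)
    return float(e_c @ E @ e_c), float(e_r @ E @ e_r)

def torsion_deg_per_cm(twist_deg, slice_pos_cm):
    """Assumed definition: slope of a straight-line fit of twist (degrees)
    against slice position along the long axis (cm)."""
    slope, _intercept = np.polyfit(slice_pos_cm, twist_deg, 1)
    return float(slope)

# Illustrative values only (not study data): 30% radial stretch and
# 15% circumferential shortening at a point on the x-axis.
E = lagrangian_strain(np.diag([1.3, 0.85]))
ecc, err = project_strain(E, point=(1.0, 0.0), center=(0.0, 0.0))
torsion = torsion_deg_per_cm([-3.0, 0.0, 3.0], [0.0, 2.0, 4.0])
```

With these made-up inputs the circumferential strain is negative (shortening) and the radial strain is positive (thickening), matching the sign convention of the Ecc and Err values reported in this study.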

Reproducibility of strain analysis

To investigate the reproducibility of performing strain analysis, each site assigned two users to analyze DENSE images. The users had between 0 and 10 years of experience performing DENSE analysis. To assess intra-user reproducibility, the first user of each site analyzed Scan A data twice, with a 2–3 week interval between analysis sessions. To assess inter-user reproducibility within each site, User 1 and User 2 at each site analyzed Scan A datasets. To assess inter-scan reproducibility, User 1 analyzed Scan A and Scan B images at each site. One site, the University of Virginia, assigned one user (D.A.) to analyze the Scan A images of all other sites in order to assess the inter-site reproducibility of strain analysis. This user is referred to as User UVA. As fully automatic deep learning (DL) methods have recently been developed for whole-slice and segmental Ecc analysis of short-axis DENSE images [30], inter-user reproducibility was also assessed for DL vs. User 1 of all sites.
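The comparison design described above can be summarized as a small lookup table. The sketch below is illustrative only: the `Analysis` type and `COMPARISONS` name are hypothetical, and two pairings are assumptions not stated explicitly in the text, namely that User UVA is compared against User 1 of each site and that the DL comparison uses Scan A.

```python
from typing import NamedTuple

class Analysis(NamedTuple):
    analyst: str   # "User 1", "User 2", "User UVA", or "DL"
    scan: str      # "A" or "B"
    run: int = 1   # 1 = first analysis, 2 = repeat analysis 2-3 weeks later

# Each reproducibility comparison pairs two analyses of DENSE data.
COMPARISONS = {
    "intra-user": (Analysis("User 1", "A", 1), Analysis("User 1", "A", 2)),
    "inter-user-same-site": (Analysis("User 1", "A"), Analysis("User 2", "A")),
    # Assumption: User UVA is paired with User 1 of the originating site.
    "inter-user-different-site": (Analysis("User 1", "A"), Analysis("User UVA", "A")),
    # Assumption: the deep learning analysis is run on Scan A.
    "inter-user-human-DL": (Analysis("User 1", "A"), Analysis("DL", "A")),
    "inter-scan": (Analysis("User 1", "A"), Analysis("User 1", "B")),
}

for name, (first, second) in COMPARISONS.items():
    print(f"{name}: {first.analyst} (Scan {first.scan}, run {first.run}) "
          f"vs {second.analyst} (Scan {second.scan}, run {second.run})")
```

Only the inter-scan comparison involves different image data; the other four compare analyses of the same Scan A images.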

Statistical analysis

Correlations between continuous variables were assessed using the squared Pearson’s correlation coefficient, r². Bland–Altman analysis was used to assess the agreement between different measurements. Reproducibility was quantified using the coefficient of variation (CV) and the intra-class correlation coefficient (ICC). CV was considered excellent for CV ≤ 10%, good for 10% < CV ≤ 20%, fair for 20% < CV ≤ 40%, and poor for CV > 40% [21]. ICC values were considered excellent for ICC > 0.74, good for 0.60 ≤ ICC ≤ 0.74, fair for 0.40 ≤ ICC ≤ 0.59, and poor for ICC < 0.40 [13]. ICC and CV values are presented for whole-slice or global Ecc, Err, Ell, and torsion, and for segmental Ecc. Student t-tests were two-tailed with a significance level of p < 0.05.
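As an illustration of the agreement analysis and grading thresholds described above, a minimal Python sketch follows. The function names are hypothetical. The Bland–Altman limits use the conventional bias ± 1.96 standard deviations of the paired differences, which is an assumption since the exact formula is not stated here, and ICC values are assumed to be reported to two decimals so that the grading bands are contiguous.

```python
import numpy as np

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.
    Limits of agreement are taken as bias +/- 1.96 * SD of the differences
    (conventional choice; assumed here)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)

def classify_cv(cv_percent):
    """Grade a coefficient of variation given in percent."""
    if cv_percent <= 10:
        return "excellent"
    if cv_percent <= 20:
        return "good"
    if cv_percent <= 40:
        return "fair"
    return "poor"

def classify_icc(icc):
    """Grade an intra-class correlation coefficient (two-decimal values assumed)."""
    if icc > 0.74:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"

# Illustrative strain values only (not study data).
bias, lower, upper = bland_altman([-0.18, -0.16, -0.20], [-0.17, -0.17, -0.19])
print(classify_cv(11.4), classify_icc(0.77))
```

For example, a CV of 11.4% falls in the good range while an ICC of 0.77 falls in the excellent range.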

Results

Table 1 provides an overview of the study participants. From the 81 subjects, three short-axis slices were discarded due to poor image quality (phase signal-to-noise ratio [30] less than 12), resulting in 240 short-axis slices that underwent strain analysis and providing 240 strain values for whole-slice Ecc and Err reproducibility calculations, 78 values for torsion reproducibility calculations, and 1,278 strain values for segmental end-systolic Ecc reproducibility calculations. For long-axis imaging, eight slices were discarded due to poor image quality (phase signal-to-noise ratio less than 12 or a field of view that was smaller than the LV), resulting in 73 global Ell values for reproducibility calculations. Example end-systolic short-axis and long-axis DENSE images from healthy subjects are shown in Fig. 1 along with displacement and strain maps and strain–time curves. This figure also shows displacement and strain data analyzed by the same user twice and by two different users at the same site. Example images and Ecc maps from a heart failure patient with left bundle branch block are shown in Fig. 2, as are Ecc-time curves generated by two users from different sites, demonstrating inter-user-different-site reproducibility.

Fig. 1

Demonstrations of intra-user and inter-user-same-site reproducibility. Example end-systolic short-axis displacement encoding with stimulated echoes (DENSE) magnitude (A) and phase (E) images of a healthy subject and the corresponding displacement maps (B, F), Ecc maps (C, G), and segmental circumferential strain (Ecc)-time curves (D, H) resulting from analysis by the same user at two different times (B, C, D vs. F, G, H). Also shown are example end-systolic long-axis DENSE magnitude (I) and phase (M) images of a healthy subject and the corresponding displacement maps (J, N), Ell maps (K, O), and global longitudinal strain (Ell)-time curves (L, P) resulting from analysis by two different users from the same site (J, K, L vs. N, O, P)

Fig. 2

Demonstration of inter-user-different-site reproducibility. Example end-systolic short-axis DENSE magnitude (A) and phase (E) images of a patient with heart failure and left bundle branch block are shown as are the corresponding displacement maps (B, F), Ecc maps (C, G), and segmental Ecc-time curves (D, H) resulting from analysis by users at two different sites (B, C, D vs. F, G, H)

Reproducibility of Ecc

The global Ecc values, averaged over all three short-axis slices, were −0.18 ± 0.03 and −0.15 ± 0.05 (p < 0.05) for healthy subjects and for heart disease patients, respectively (see Table 1 for greater detail). For whole-slice Ecc, Fig. 3 (column 1) shows Bland–Altman plots for intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL, and inter-scan comparisons, demonstrating narrow limits of agreement and small biases for all cases. For segmental Ecc, Fig. 4 shows Bland–Altman plots for intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL, and inter-scan comparisons, demonstrating slightly wider but still narrow limits of agreement and small biases. As shown in Table 2, for whole-slice Ecc the mean coefficient of variation was 5.0% or lower for intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL, and inter-scan comparisons, and the mean ICC was 0.85–0.93 for all comparisons. As also shown in Table 2, for segmental Ecc the coefficient of variation was 8.9% or lower for intra-user, inter-user-same-site, inter-user-different-site, and inter-user-Human-DL comparisons, and was 11.4% for the inter-scan case. The ICC was 0.86 or higher for all comparisons except inter-scan, where it was 0.77. For segmental Ecc, bullseye plots of coefficient of variation and ICC for intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL, and inter-scan comparisons are shown in Figs. 5 and 6, respectively. These results indicate excellent reproducibility of whole-slice Ecc for all five comparisons (intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL, and inter-scan). For segmental Ecc, reproducibility was excellent for the intra-user, inter-user-same-site, inter-user-different-site, and inter-user-Human-DL cases and good–excellent for the inter-scan case, where the coefficient of variation was in the good range and the ICC was in the excellent range.

Fig. 3

Bland–Altman plots for whole-slice or global Ecc, Ell, radial strain (Err) and torsion showing agreement of intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL and inter-scan comparisons

Fig. 4

Bland–Altman plots for segmental Ecc, showing agreement of intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL and inter-scan comparisons

Table 2 Summary of correlation, Bland–Altman, CV and ICC results. Global Ecc and Err values are averaged over 3 slices (base, mid-ventricle and apex)

Fig. 5

Bull’s eye plots of the coefficient of variation for segmental Ecc for intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL and inter-scan comparisons

Fig. 6

Bull’s eye plots of the intraclass correlation coefficient for segmental Ecc for intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL and inter-scan comparisons

Reproducibility of global Ell

The mean global Ell values were −0.15 ± 0.02 and −0.14 ± 0.04 for healthy subjects and for heart disease patients, respectively. For global Ell, Fig. 3 (column 2) shows Bland–Altman plots for intra-user, inter-user-same-site, inter-user-different-site, and inter-scan comparisons, demonstrating narrow limits of agreement for all cases. As shown in Table 2, for global Ell the coefficient of variation was 5.5% or lower for intra-user, inter-user-same-site, inter-user-different-site, and inter-scan comparisons, and the ICC was 0.84 or higher for all comparisons, indicating excellent intra-user, inter-user-same-site, inter-user-different-site, and inter-scan reproducibility of global Ell.

Reproducibility of whole-slice Err

The mean global Err values were 0.35 ± 0.16 and 0.28 ± 0.15 (p < 0.05) for healthy subjects and for heart disease patients, respectively. For whole-slice Err, Fig. 3 (column 3) shows Bland–Altman plots for intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL, and inter-scan comparisons, demonstrating wider limits of agreement for all cases as compared to the corresponding plots for Ecc and Ell. As shown in Table 2, for whole-slice Err the coefficient of variation was in the range of 19.7–47% (good to poor) for intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL, and inter-scan comparisons, and the ICC was in the range of 0.74–0.92 (good to excellent) for all comparisons.

Reproducibility of torsion

The mean global torsion values were 2.79 ± 0.75°/cm and 2.43 ± 1.56°/cm for healthy subjects and for heart disease patients, respectively. For global torsion, Fig. 3 (column 4) shows Bland–Altman plots for intra-user, inter-user-same-site, inter-user-different-site, inter-user-Human-DL, and inter-scan comparisons, demonstrating narrow limits of agreement for all cases. As shown in Table 2, for torsion the coefficient of variation was in the range of 2.5–21.2% (excellent to fair) for intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL, and inter-scan comparisons, and the ICC was in the range of 0.86–0.99 (excellent) for all comparisons.

Discussion

Major findings

For spiral cine DENSE at 3 T, segmental Ecc was shown to provide excellent intra-user, inter-user-same-site, inter-user-different-site, and inter-user-human-DL reproducibility and good–excellent inter-scan reproducibility, with CV in the good range and ICC in the excellent range for the inter-scan case. Whole-slice Ecc was shown to provide excellent intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL and inter-scan reproducibility. Also, global Ell was shown to provide excellent intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL and inter-scan reproducibility, and the reproducibility of torsion was good–excellent for all comparisons. For whole-slice Err, CV was typically in the fair–good range, and ICC was in the good–excellent range. This worse reproducibility of Err compared to that of Ecc and Ell is typical of all CMR strain methods that compute Err from short-axis images. It occurs because only a few pixels span the LV wall in the radial direction, which presents challenges for the computation of strain in that direction. Due to its lower reproducibility, clinical decision making should rely less on radial strain and, given their higher reproducibility, more on circumferential and longitudinal strain.

We generally found that intra-user reproducibility was best, followed by inter-user-same-site and inter-user-different-site, and then by inter-scan. These results are not surprising, as differences in performing image analysis between users may be greater than differences between the same user at different points in time, and there may be greater differences between different users at different sites compared to different users at a common site. Inter-scan differences may reflect differences in slice position or other factors that may differ between scans, leading to lower reproducibility than cases where the same images were analyzed at different times or by different users. With regard to the fully-automated DL method, its reproducibility was generally very similar to that of inter-user-same-site, which is consistent with and extends the results of the recent study that developed these methods [30]. For segmental Ecc, we observed that lower ICC values were seen in the lateral wall, whereas higher CV values were seen in the septum. This occurred because a high segmental ICC value requires a fairly wide range of strain data in that segment. For the patients in our study, most of the segmental dysfunction occurred in the septal segments, leading to a wide range of Ecc values in the septum but a very narrow range in the lateral wall. This spatial distribution of segmental dysfunction also explains why CV values were higher in the septum. Since the computation of CV includes dividing by the mean strain, higher CVs occurred in the septal segments, where some end-systolic Ecc values were near zero. Specifically, with regard to computation of CV for any heart segment, we consider that there are two strain observations. First, we have the strain values for all myocardial points in a specific segment provided by one observation. Second, we have the strain values for the same myocardial points in the same segment provided by another observation. Then, the mean strain is the average of the strains from the two different observations for each myocardial point. The reported CV for each segment is the mean coefficient of variation from all points within that segment.
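The per-segment CV computation described above can be sketched in Python as follows. This is a hedged illustration rather than the study's code: the function name `segment_cv` is hypothetical, and the per-point spread is assumed to be the sample standard deviation of the two observations, which the text does not specify.

```python
import numpy as np

def segment_cv(obs1, obs2):
    """Mean coefficient of variation (percent) over all myocardial points
    in one segment, given two strain observations of the same points."""
    pair = np.stack([np.asarray(obs1, dtype=float), np.asarray(obs2, dtype=float)])
    mean = pair.mean(axis=0)        # per-point mean of the two observations
    sd = pair.std(axis=0, ddof=1)   # per-point sample SD (assumed definition)
    cv = sd / np.abs(mean)          # undefined if a per-point mean is exactly zero
    return float(100.0 * cv.mean())

# Illustrative values only: the same absolute disagreement (0.02) at a
# normally contracting point and at a point with near-zero strain.
cv_normal = segment_cv([-0.20], [-0.18])
cv_near_zero = segment_cv([-0.03], [-0.01])
print(round(cv_normal, 1), round(cv_near_zero, 1))
```

The same absolute difference between observations yields a much larger CV when the mean strain is close to zero, which is the mechanism proposed above for the higher CV values in dysfunctional septal segments.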

Many cardiac imaging modalities report high reproducibility of global strain, including speckle-tracking echocardiography (STE) [31] and computed tomography-based feature tracking (CT-FT) [32,33,34]. With regard to the reproducibility of segmental strain, the present DENSE circumferential strain data compare favorably to STE and CT-FT. While there is a limited amount of published data, for STE a few studies investigating the interobserver reproducibility of various segmental strains (longitudinal and circumferential) report ICC values in the range of 0.77–0.82 [35, 36], whereas the present study shows an ICC of 0.86 for DENSE Ecc. For Bland–Altman limits of agreement for segmental strain, the values for STE are approximately twice as wide as for cine DENSE [36]. General recommendations for STE are that, while global strain is highly reproducible, segmental strain measurements have a higher degree of measurement variability and caution should be applied for clinical use [31]. For CT-FT, global strain also has high intraobserver and interobserver reproducibility [32,33,34]; however, the reproducibility of segmental strain is low, with ICC values less than 0.75 in most segments [34].

The present results contribute to a thorough sequence of studies validating DENSE for the measurement of displacement and strain. Spottiswoode et al. validated DENSE measurements of displacement in a rotating phantom for displacement values on the order of 2–20 mm [28], and Nwotchouang et al. recently validated DENSE measurements of displacement in a phantom designed for much smaller displacements, on the order of 20–200 µm [37]. Cowan et al. carefully validated DENSE measurements of strain using a deformable phantom, showing accuracy of strain that was similar to conventional tagging. The same study showed better reproducibility in healthy subjects of whole-slice Ecc and Err for DENSE than myocardial tagging [11]. Lin et al. showed the high reproducibility of DENSE for segmental Ecc at a single center, and these results included inter-session reproducibility with imaging sessions separated by days (not just minutes as in the present study) [21]. Verzhbinsky et al. used simulated phantom data to evaluate the methods used to compute strain from DENSE phase data and rigorously demonstrated their validity and accuracy [38]. Carruth et al. recently extended the findings of high reproducibility of DENSE strain to the right ventricle [14]. The demonstrated accuracy and reproducibility of DENSE may explain why Ecc measured by DENSE outperforms Ecc measured by feature tracking in clinical applications such as predicting CRT response [3] and predicting post-infarct LV remodeling and cardiac events [6].

Only 1.2% of short-axis slices and 9.9% of long-axis slices were not suitable for strain analysis. All sites had less prior experience with long-axis compared to short-axis DENSE utilizing outer volume suppression and a reduced field of view, which led to the higher percentage of poor image quality for long-axis imaging. Specifically, mistakes were made when using outer volume suppression for long-axis imaging such that the region of signal generation did not cover the entire LV. With greater experience, these mistakes were readily avoided.
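The exclusion percentages above follow from the slice counts reported in the Results. The short Python check below reproduces the arithmetic; the variable names are illustrative, and it assumes three short-axis slices and one long-axis slice were acquired per subject, as described in the CMR protocol.

```python
subjects = 81

# Short-axis: three slices per subject, three discarded for poor image quality.
sa_acquired = subjects * 3
sa_discarded = 3
sa_analyzed = sa_acquired - sa_discarded

# Long-axis: one four-chamber slice per subject, eight discarded.
la_acquired = subjects
la_discarded = 8
la_analyzed = la_acquired - la_discarded

pct_sa = round(100 * sa_discarded / sa_acquired, 1)
pct_la = round(100 * la_discarded / la_acquired, 1)

print(sa_analyzed, la_analyzed)  # 240 73
print(pct_sa, pct_la)            # 1.2 9.9
```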

Limitations

Our study investigated inter-scan reproducibility where subjects were removed from the scanner, repositioned, and rescanned, but did not include inter-session reproducibility where subjects underwent DENSE CMR on different days. This limitation occurred because the study design involved adding DENSE scans to clinical patient scans. With the reliance on clinical scans, it was not possible to schedule additional sessions for the evaluation of inter-session reproducibility. The present study did not investigate the reproducibility of DENSE at 1.5 T; however, the single-center study by Lin et al. was performed at 1.5 T [21]. Only one vendor was included because the cine DENSE sequence is only available for Siemens CMR scanners. While we evaluated whole-slice Err, we did not seek to show good-to-excellent reproducibility of segmental Err because prior data from Lin et al. strongly suggest that the reproducibility of segmental Err would be poor-to-fair [21]. We did not investigate the reproducibility of segmental Ell for DENSE because we have less experience with and less standardization of methods for long-axis DENSE imaging and image analysis.

Conclusions

In a multi-center study, 3 T CMR DENSE was shown to provide highly reproducible whole-slice and segmental Ecc, global Ell, and torsion myocardial strain data in healthy subjects and patients with heart disease with regard to intra-user, inter-user-same-site, inter-user-different-site, inter-user-human-DL and inter-scan measurements. Fully-automated DL image analysis methods applied to short-axis DENSE images provide excellent reproducibility, equivalent to that of an expert user for whole-slice and segmental Ecc and for torsion. These findings may facilitate future clinical applications that would benefit from reproducible segmental strain imaging.