Background

Global longitudinal strain (GLS) is an important prognostic biomarker in the evaluation of left ventricular (LV) function [1,2,3]. Visual assessment of wall motion abnormalities is fast, but it depends on the experience of the observer and lacks objective quantification [4, 5]. In echocardiography, GLS derived by automated speckle tracking has been shown to be superior in detecting and quantifying subtle impairment of LV systolic function [2], and to provide a higher predictive value for mortality than LV ejection fraction (LV-EF) in the presence of regional wall motion abnormalities [1] or of isolated GLS reduction in the presence of global (diffuse) systolic impairment [3].

Feature tracking (FT) by cardiac magnetic resonance (CMR), or CMR-FT, is conceptually similar to speckle tracking in that it provides a quantitative assessment of myocardial deformation [6]. Despite the methodological differences in image acquisition and postprocessing, the similarity extends to the use of routinely acquired cine (steady-state free precession, SSFP) images, which avoids the need for additional dedicated sequences, such as tagging [7, 8]. Numerous studies have reported on validation and agreement with other deformation techniques (reviewed elsewhere [8, 9]); in summary, they reveal that measurements derived by CMR-FT are not easily transferable, in either scale or precision. One reason for the differences is the number of methodologically different software solutions, which use differing approaches to automatic contour placement and tracking, as well as different underlying algorithms for strain calculation [7]. Moreover, despite the high reproducibility of GLS in some single-centre studies, there remains a remarkable inter-centre difference even when the same vendor is used, reflecting an important source of operator-induced, or random, variability [10, 11]. We hypothesized that standardization of user postprocessing may reduce the random component of variability. Improved precision of measurements may support transferability of CMR-FT and allow comparison of intervendor and intercentre results, despite the systematic differences that arise from the different image processing algorithms used by vendors. In this study, we undertook a systematic analysis and elimination of potential errors to guide the development of a standardized operating procedure for local reads, by comparison of two different vendors.

Methods

Anonymized datasets were sourced from a prospective, longitudinal, observational, multicentre, investigator-led study [12,13,14]. Groups of unrelated subjects with either known ischaemic heart disease (IHD) or non-ischaemic dilated cardiomyopathy (NIDCM) were composed to examine the influence of regional wall motion abnormalities (RWMA) and diffuse myocardial impairment, respectively, on the postprocessing of GLS. The control group consisted of subjects with normal blood pressure, a low pretest likelihood of cardiomyopathy, normal LV mass, volumes and global systolic function, no myocardial scar on late gadolinium enhancement (LGE) imaging, and no regular medication. Clinical metadata, including systolic/diastolic blood pressure (BP) and body mass index (BMI), were recorded. Exclusion criteria for all subjects were the generally accepted contraindications to CMR (implantable non-MR-safe devices, cerebral aneurysm clips, cochlear implants).

CMR image analysis and acquisition

The CMR protocol and details of image acquisition and postprocessing have been reported previously [12,13,14]. All subjects included in this study underwent CMR on a 3.0 T clinical scanner (Skyra, Siemens). Cine images were obtained using a balanced steady-state free precession (SSFP) sequence in combination with parallel imaging (SENSitivity Encoding, factor 2) and retrospective gating during expiratory breath-hold (TE/TR/flip angle: 1.7 ms/3.4 ms/60°, spatial resolution 1.8 × 1.8 × 8 mm, temporal resolution of 25 frames/cycle). Routine CMR analysis of cardiac volumes, function and mass was performed on a stack of gapless short-axis (SAX) cine slices using the commercially available software Medis Suite MR v2.1 (Medis medical imaging systems, Leiden, The Netherlands) [15]. Left ventricular endocardial borders were drawn manually at end-diastole and end-systole. The papillary muscles were traced and included as part of the LV cavity volume. Left ventricular end-diastolic (EDV) and end-systolic (ESV) volumes were determined using Simpson’s rule. Ejection fraction (EF) was computed as (EDV − ESV)/EDV. All volumetric indices were normalized to body surface area (BSA).
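The ejection fraction and BSA indexing described above can be illustrated with a short sketch. The function names and the example volumes are hypothetical and are not taken from the study data; the BSA value is passed in directly because the text does not state which BSA formula was used.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return LV ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must lie between 0 and EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def index_to_bsa(value: float, bsa_m2: float) -> float:
    """Normalize a volume or mass to body surface area (e.g. ml -> ml/m^2)."""
    if bsa_m2 <= 0:
        raise ValueError("BSA must be positive")
    return value / bsa_m2


# Illustrative values only (not study data).
ef_percent = ejection_fraction(edv_ml=200.0, esv_ml=120.0)
edv_index = index_to_bsa(200.0, bsa_m2=2.0)
```

The explicit parentheses matter: without them, EDV − ESV/EDV would subtract a ratio from a volume rather than dividing the stroke volume by EDV.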

Single-slice long-axis (LAX) cine views (2-, 3- and 4-chamber) were used for GLS analysis by CMR-FT. Images were analysed off-line using two commercially available software packages: Medis Suite MR v2.1 (Medis medical imaging systems, Leiden, The Netherlands) and CVI42 version 5.6.6 (Circle Cardiovascular Imaging Inc., Calgary, Canada). GLS was calculated as the average of the three LAX views and expressed as an absolute global peak systolic strain.
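As a minimal sketch of this averaging step, the snippet below takes one peak systolic longitudinal strain value per LAX view and returns the absolute mean. The view labels, the function name and the example values are assumptions made for illustration; neither vendor's export format is implied.

```python
from statistics import mean
from typing import Mapping

REQUIRED_VIEWS = ("2ch", "3ch", "4ch")  # hypothetical labels for the three LAX views


def global_longitudinal_strain(peak_strain_by_view: Mapping[str, float]) -> float:
    """Average the per-view peak systolic strain (%) and return its absolute value."""
    missing = [v for v in REQUIRED_VIEWS if v not in peak_strain_by_view]
    if missing:
        raise ValueError(f"missing views: {missing}")
    return abs(mean(peak_strain_by_view[v] for v in REQUIRED_VIEWS))


# Illustrative values only: longitudinal shortening is conventionally negative.
gls = global_longitudinal_strain({"2ch": -18.0, "3ch": -20.0, "4ch": -19.0})
```

Taking the absolute value after averaging is one reading of "expressed as an absolute" strain; when all three views have the same sign, it gives the same result as averaging the absolute values.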

Optimization and derivation of standardized postprocessing

All readers involved in this project had extensive previous experience of 2D speckle tracking by echocardiography. After training in the vendor-recommended postprocessing approaches (handbooks, website information, webinars, training with application specialists), a series of stepwise adjustments was examined by trial and error; a step was implemented if it was shown to reduce the intra- or inter-observer bias. Subsequently, datasets were analysed separately by two independent observers, resulting in three sets for each software package, or six outputs for each subject and implementation series. The readers were blinded to their own previous measurements and to each other’s measurements. Intraobserver measurements were repeated after an interval of 4 weeks.

Optimization steps were based on a modification of a previously described approach [11], as well as on comparative evaluation of both readers’ tracking results (Fig. 1). Common tracking errors and systematic differences were identified, and agreements were drawn up and tested to support a reproducible approach for assessing GLS. Vendor-specific approaches to contour manipulation were necessary. Using CVI42, the epicardial and endocardial contours were manually delineated in all analysed sections, with the initial contours set at end-diastole. The epicardial contours were purposefully placed slightly inside the myocardium rather than at the epicardial border, to reduce a common tracking failure that results from placing the contours onto boundary points of the pericardium (Fig. 2). Similarly, papillary muscles were avoided when placing the endocardial contours. Using Medis, the endocardial contours in all analysed sections were delineated at end-systole, whereas the epicardial contours (as well as all end-diastolic contours) were generated automatically. In most instances, the epicardial contour required manipulation by the observer, similar to the steps described above. Care was taken to place the epicardial contour slightly within the myocardium to avoid tracking of the pericardium. Papillary muscles were excluded when placing the endocardial contour. After completion of the automated tracking process, the overall quality of contour placement was re-evaluated by the respective observer. Where tracking was inadequate, defined as apparent deviation of the contours from the endocardial and epicardial borders on visual assessment, the contours were manually corrected and the automated algorithm was reapplied (up to a maximum of 2 runs). Segmental regions of interest (ROIs) that persistently tracked poorly were excluded from analysis. If persistent poor tracking involved more than two segments in a single view, the patient’s case was excluded from the subsequent analysis.
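The exclusion rule at the end of this procedure can be written down as a small decision function. This is only a sketch of the rule as stated in the text: the input is assumed to be the set of segments per view that still track poorly after the permitted correction runs, and the function name, view labels and segment labels are hypothetical.

```python
from typing import Dict, Mapping, Set, Tuple

# From the text: a case is dropped when MORE than two segments in any single
# view still track poorly after manual correction and re-tracking.
MAX_POOR_SEGMENTS_PER_VIEW = 2


def triage_case(
    poor_segments_by_view: Mapping[str, Set[str]],
) -> Tuple[bool, Dict[str, Set[str]]]:
    """Return (include_case, segments_to_exclude_by_view) for one subject."""
    for segments in poor_segments_by_view.values():
        if len(segments) > MAX_POOR_SEGMENTS_PER_VIEW:
            # Too many persistently poor segments in one view: exclude the case.
            return False, {}
    # Otherwise keep the case and drop only the persistently poor segments.
    to_exclude = {
        view: set(segments)
        for view, segments in poor_segments_by_view.items()
        if segments
    }
    return True, to_exclude


# Illustrative input: one poorly tracked segment in the 2-chamber view.
include, excluded = triage_case(
    {"2ch": {"apical_anterior"}, "3ch": set(), "4ch": set()}
)
```

Encoding the rule explicitly is one way to make sure that both observers apply the same threshold when deciding between discarding a segment and discarding a case.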

Fig. 1 Steps for CMR feature tracking

Fig. 2 Examples of tracking problems and solutions. a: boundary points have accidentally been placed in the pericardium; b: the lateral wall boundary points do not move in correlation with the ventricular contraction; c and d: boundary points were placed correctly and lateral wall tracking was improved

Statistical analysis

Statistical analysis was performed using SPSS, version 24. Normality of distributions was tested using the Shapiro-Wilk test. Categorical data are expressed as counts (percentages), and continuous variables as mean ± SD or median (range), as appropriate. Mean difference (MD) ± SD was calculated for each group by subtracting the values measured by two different observers and dividing the sum of these differences by the number of subjects measured in the respective group. Comparisons between groups were performed using Student’s t-test or one-way ANOVA for normally distributed variables, and the chi-squared and Mann-Whitney tests for non-normally distributed variables. Fisher’s exact tests were used to compare proportions. Inter- and intra-observer variability was assessed using intra-class correlation coefficients (ICC; two-way mixed model with absolute agreement between measures), the coefficient of variation (CV) and Bland-Altman plots. The effect size between controls and patients based on GLS was assessed using Cohen’s d, as well as receiver operating characteristic (ROC) curve analysis with calculation of the area under the curve (AUC); AUCs before and after standardization were compared using z-test statistics. Cohen’s d was calculated by comparing the mean GLS values and standard deviations, pre- and post-standardization, with those of the respective control group. The AUCs were calculated by integrating the curve obtained by plotting the sensitivity of GLS on the y-axis against 1 − specificity on the x-axis for discrimination between health and disease. All tests were two-tailed, and p-values < 0.05 were considered statistically significant.
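The effect-size and AUC calculations can be sketched in a few lines of standard-library Python. This is not the SPSS procedure used in the study: the pooled-SD form of Cohen’s d is an assumption (the text does not specify the variant), the decision rule "absolute GLS at or below a threshold indicates disease" is likewise assumed, and all function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def cohens_d(controls: Sequence[float], patients: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n1, n2 = len(controls), len(patients)
    s1, s2 = stdev(controls), stdev(patients)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(controls) - mean(patients)) / pooled_sd


def roc_auc(controls: Sequence[float], patients: Sequence[float]) -> float:
    """Area under the ROC curve by trapezoidal integration.

    Each observed absolute GLS value is used as a threshold; a subject is
    called 'diseased' when GLS <= threshold (assumption: lower absolute GLS
    means worse function).
    """
    thresholds = sorted(set(controls) | set(patients))
    points = [(0.0, 0.0)]  # (1 - specificity, sensitivity)
    for t in thresholds:
        sensitivity = sum(p <= t for p in patients) / len(patients)
        false_positive_rate = sum(c <= t for c in controls) / len(controls)
        points.append((false_positive_rate, sensitivity))
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between adjacent points
    return auc


# Illustrative absolute GLS values in % (not study data).
example_controls = [20.0, 21.0, 19.5, 22.0]
example_patients = [12.0, 15.0, 14.0, 18.0]
d = cohens_d(example_controls, example_patients)
auc = roc_auc(example_controls, example_patients)
```

Because the trapezoids give half credit to tied values, this construction matches the rank-based (Mann-Whitney) interpretation of the AUC as the probability that a randomly chosen control has a higher absolute GLS than a randomly chosen patient.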

Results

Baseline characteristics are displayed in Table 1. The final analysis included 44 subjects: 15 patients with IHD and regional wall motion abnormalities, 19 patients with NIDCM and 10 healthy controls. Quantitative analysis was not diagnostic in 12 subjects, whereas in 17 (39%) subjects 1 ROI had to be discarded. The results of GLS measurements before and after standardization are provided in Table 2 for both vendors. There were significant differences between controls and all patients, as well as patient subgroups, in most CMR measurements, including GLS (p < 0.005) (summarised in Table 2); these differences were accentuated by the standardization procedures, as shown by a significant increase in Cohen’s d and AUCs (Fig. 3, Table 3). The benefit of standardization was also evident in the improved CV and ICC agreement between observers and between the two vendors. Results of the Bland-Altman analyses of inter- and intraobserver reproducibility for both vendors, before and after standardization, are displayed in Table 2 (plots in Figs. 4, 5 and 6). Limits of agreement and the coefficient of variation were reduced after standardization within each vendor, albeit more strongly for interobserver than intraobserver agreement (p = 0.03 vs <0.001, respectively) and more for Medis than CVI42 (Medis: p < 0.001; CVI42: p < 0.028).

Table 1 Subject characteristics
Table 2 Results of pre and post-standardization analyses
Fig. 3 ROC curve analyses for discrimination between health and disease

Table 3 Results of ROC analyses for separation between the groups prior to and after standardization (AUC, 95%CI)
Fig. 4 Reproducibility results for intervendor agreement between Medis and CVI42. a: Bland-Altman plot pre-standardization (mean = 2.14; limits of agreement: +1.96s = 9.21, -1.96s = -4.33); b: Bland-Altman plot post-standardization (mean = 1.54; limits of agreement: +1.96s = 5.76, -1.96s = -2.69)

Fig. 5 Bland-Altman plots for observer agreement with Medis. a: pre-standardization intraobserver (mean = -0.18; limits of agreement: +1.96s = 3.96, -1.96s = -4.32); b: post-standardization intraobserver (mean = 0.23; limits of agreement: +1.96s = 1.44, -1.96s = -0.98); c: pre-standardization interobserver (mean = 0.29; limits of agreement: +1.96s = 7.84, -1.96s = -7.25); d: post-standardization interobserver (mean = 0.33; limits of agreement: +1.96s = 1.76, -1.96s = -1.10)

Fig. 6 Bland-Altman plots for observer agreement with CVI42. a: pre-standardization intraobserver (mean = -1.16; limits of agreement: +1.96s = 3.30, -1.96s = -5.63); b: post-standardization intraobserver (mean = -0.54; limits of agreement: +1.96s = 0.80, -1.96s = -1.89); c: pre-standardization interobserver (mean = -0.57; limits of agreement: +1.96s = 2.05, -1.96s = -3.18); d: post-standardization interobserver (mean = -0.39; limits of agreement: +1.96s = 1.83, -1.96s = -2.60)
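The means and limits of agreement quoted in the Bland-Altman captions follow the usual construction: the mean of the paired differences ± 1.96 times their standard deviation. A minimal sketch is given below; the use of the sample standard deviation (n − 1 denominator), the function name and the example readings are assumptions, not details taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(
    first: Sequence[float], second: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (first - second); the limits
    of agreement are bias -/+ 1.96 * SD of those differences.
    """
    if len(first) != len(second):
        raise ValueError("paired measurements must have equal length")
    if len(first) < 2:
        raise ValueError("at least two pairs are required")
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    sd = stdev(differences)  # sample SD (assumed)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative GLS readings (%) from two hypothetical observers.
observer_1 = [20.0, 18.0, 16.0]
observer_2 = [19.0, 18.0, 17.0]
bias, lower, upper = bland_altman(observer_1, observer_2)
```

Narrower limits after standardization, as reported in the captions, correspond to a smaller standard deviation of the paired differences, that is, to less random disagreement between readings.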

The common sources of poor reproducibility included:

  • mis-selection of images (e.g. in case of repeated acquisitions due to artefacts, poor breath-holding, arrhythmia);

  • the definition of end-diastole and end-systole (improved by visual determination of frames with biggest/smallest volume and closed heart valves);

  • the placement of endo- and epicardial contours, improved by:

    • avoiding the dark pericardial rim or effusion (poor tracking);

    • excluding the papillary muscles;

    • re-evaluating whether tracking was sufficient.

Mean GLS values for all subjects differed by 2 percentage points on average between the two vendors (p = 0.023), and also at the subgroup level (p < 0.05), despite relatively high agreement in overall measurements (r = 0.85, p < 0.001). Although the inter-vendor difference persisted post-standardization, it was smaller than in the initial results, substantiating a fixed element of systematic difference between measurements derived with the two vendors.

Discussion

The results of our study reveal that the post-processing approach to GLS matters with respect to the reproducibility of measurements and the detection of an effective difference in GLS between controls and patients. We demonstrate that standardization of GLS post-processing helps to reduce the random component of variability, introduced by inconsistencies between and within observers, while the fixed systematic inter-vendor bias due to vendor-related differences in image processing algorithms remained. We further show that the greater precision of measurements affords a greater effect size, and thus improved discrimination between controls and subgroups of patients, which was not vendor-dependent. The results of our study provide a proof of concept that standardization of GLS measurements is an essential step in ensuring the reliable quantification of myocardial deformation between different observers and users and across vendors.

Previous studies leading up to the present work have highlighted the differing normal values, as well as the differing results for intra- and interobserver reproducibility and for agreement between vendors and centres (summarised in Table 1S from PMID:19789193) [16]. In summary, the reported intraobserver CV ranges between 5.2 and 12.3%, and the interobserver CV between 10.9 and 15.4%; our initial results agree well with these previous reports. However, we have shown that following the standardization protocol considerably improves the reproducibility of measurements, in the study group as a whole as well as in the subgroups. By employing a standardization protocol, CMR-FT can become an objective and reproducible method for the quantification of LV deformation. Whereas the burden of contour manipulation may at first appear substantial, we have narrowed it down to a few essential and systematic steps that are predictive of tracking failure, together with vendor-specific contour placement, which led to considerable improvement. This information is important as it may guide the necessary optimization steps of CMR-FT software, in order to adequately serve the clinical routine. Diversity of normal values is an often-noted limitation of CMR-FT, yet the range of the mean values reported thus far is admittedly narrow (19–21.3%, Table 1S), also reproduced by the recent meta-analysis by Vo et al., 20.1% [11]. The majority of the previous studies used TomTec-based software [6, 7], a product using the same tracking algorithm as the Medis software, and our Medis-derived values in controls reproduce these previous reports. The systematically higher measurements with CVI42 signal a very different image-processing approach; yet the high inter-vendor agreement of measurements suggests that although the software packages may be employing different algorithms, they track similar features of myocardial deformation.

The benefits of standardization can further be seen in the marked improvement of the CV, ICC and Bland-Altman plots, reflecting the effect of harmonization on both intra- and inter-observer variability. Reduction of the mean differences and limits of agreement translates into smaller dispersion of the GLS measurements, an effect that is greater for interobserver reproducibility. Applying the standardized steps improved results for both vendors; vendor-specific steps clearly helped to reduce intervendor bias, again indicating a random variability component or, in other words, the many ways in which different observers could potentially use the different software packages. Our findings emphasize the role of clear and documented instructions and their unconditional implementation, in support of multi-user transferability in routine clinical practice.

The detection of early disease relies on the precision of the technique, which can control for misclassification against healthy subjects. The patients and groups in our study were selected to be representative of the common clinical scenarios in which GLS is known to be complementary to the assessment of global LV function, e.g. the mid-range LV-EF of 30–50% [17]. Compared to controls, both patient groups were older and had significantly but similarly reduced global systolic function. In both groups, GLS values were significantly lower than in controls. Of note, comparison of GLS measurements between the IHD and NIDCM groups revealed significantly lower GLS in the NIDCM group (p < 0.001). This is an important observation, which is in part explained by the considerably higher LV volumes in the NIDCM group, indicating the presence of global remodelling and, consequently, operation at much higher loading. In the IHD group, GLS is reduced considerably, but only owing to severe regional impairment, whereas the preserved myocardium at first compensates with a hypertrophic response and not a change of loading [1, 18]. Given the rather homogeneous presentation of cases within the model disease groups, the AUCs for separation of patient groups from healthy controls were excellent before and after standardization, although the additional improvement remains notable.

The introduction of CMR-FT was long hailed as a much-needed clinical application that reuses the routine cine acquisitions, while reducing the need for additional imaging sequences that encode the changes associated with myocardial deformation, such as tagging. The overall viability of this technology appeared to depend on the availability of quick, sleek and, foremost, accurate offline postprocessing, which gave rise to several dedicated software products for CMR-FT. Yet the results of CMR-FT analyses vary from vendor to vendor and remain highly observer-dependent. Several solutions have been proposed, foremost the averaging of results of repeated analyses to increase intra-vendor reproducibility [11]. Our results reveal that the benefit of such a solution is doubtful, the doubling or tripling of analysis time notwithstanding, as the high variability primarily arises from tracking failure of automatically detected (auto-) contours, which cannot be improved by repetition, but only by manipulation of contour placement on post-processed SSFP images. In our study, this approach turned out to influence the accuracy of CMR-FT most strongly, and several reasons underlie this observation. There are many independent variables that cannot be improved by repetitive tracking, including image quality, frame rate, slice geometry (e.g. cutting through the papillary muscles), and observer and centre experience. Image quality will suffer with poor breath-holding capacity, mitral annular calcification, pericardial effusion, mis-triggering, low frame rate and imperfect slice positioning, and will result in poor auto-tracking due to difficult endo- and epicardial border definition and misallocation of placed boundary points. Institutional structures that mandate standardized approaches and provide adequate training will have a high impact on reproducibility and precision.

Conclusion

Standardization of GLS postprocessing helps to reduce the random component of variability, introduced by inconsistencies within and between observers, and to some extent also intervendor variability. There remains a fixed systematic inter-vendor bias due to vendor-related differences in image processing algorithms. Greater precision of measurements affords an improved effect size, and thus improved discrimination between controls and subgroups of patients, irrespective of the choice of postprocessing software or underlying pathophysiology. The results of our study provide a proof of concept that standardization of GLS measurements is an essential step in ensuring the reliable quantification of myocardial deformation between different observers and users and across vendors, and in the implementation of CMR-FT in clinical routine.

Additional file 1: Video GLS.