Background

Pulmonary arterial hypertension (PAH), a disease of the pulmonary vasculature that leads to right heart failure and death, commonly complicates systemic sclerosis (SSc) [1]. Given its high morbidity and mortality, current guidelines recommend screening for PAH in SSc patients [2]. The screening algorithms used for early detection of PAH in SSc rely upon two-dimensional echocardiography (2DE), which, despite well-described limitations, has high specificity and positive predictive value [2, 3]. In addition, 2DE is recommended as part of follow-up evaluation and for risk assessment [4]. Clinically relevant metrics that are frequently used for early detection, follow-up, and risk assessment include right ventricular systolic pressure (RVSP, an estimate of pulmonary artery systolic pressure (PASP)) [5], tricuspid annular plane systolic excursion (TAPSE) [6, 7], tissue Doppler of the tricuspid annulus S’ velocity [8], and fractional area change (FAC) [9, 10]. In addition, measures of RV contractile function utilizing speckle-tracking echocardiography (STE) have demonstrated regional abnormalities in RV contractile function in SSc [11], as well as value in assessing response to therapy and predicting mortality [12, 13]. Despite compelling evidence supporting the clinical relevance of these echocardiographic measures, to our knowledge, no study has specifically defined the repeatability, reproducibility, reliability, or the minimal detectable difference (MDD), the smallest change in a measurement of interest that is greater than the within-subject variability and measurement error, in SSc or other populations. Defining the MDD is vital to characterizing the responsiveness of a measurement to an intervention and addresses a critical knowledge gap in the assessment of outcome measures in SSc-PAH [14].

In the present study, we sought to assess the repeatability, reproducibility, and reliability of echo-derived measures of RV function and to define the MDD. We hypothesized that TAPSE would have the least measurement error based upon our prior experience in SSc populations [6]. This work was presented at the 2020 American Thoracic Society Scientific Sessions, Philadelphia, PA, in abstract form.

Methods

Patient population

Our study was approved by the Johns Hopkins Medicine Institutional Review Board. We prospectively enrolled prevalent SSc patients ≥18 years old from May 2017 to October 2018. All participants met the 1980 [15] and/or 2013 American College of Rheumatology classification criteria for SSc [16]. The Johns Hopkins Scleroderma Center’s standard clinical practice is to perform annual pulmonary function tests (PFTs) and 2DE to screen for cardiopulmonary complications [17]. Patients with significant chronic obstructive or interstitial lung disease, portal hypertension, severe obstructive sleep apnea, left-sided heart failure, or chronic thromboembolic disease were excluded [18]. Ever-usage of medications such as disease-modifying drugs, calcium channel blockers, and PAH therapies (endothelin receptor antagonists, phosphodiesterase type 5 inhibitors, and prostacyclin analogs or receptor agonists) was recorded.

Patients without evidence of resting PH or RV dysfunction by 2DE defined as resting RVSP < 35 mmHg and TAPSE ≥ 1.6 cm, FAC ≥ 35%, and tissue Doppler S’ ≥ 9.5 cm/s were recruited as cohort 1. Consecutive SSc patients with right heart catheterization (RHC)-proven PAH who were clinically stable on PAH-directed therapies were recruited as cohort 2. PAH was defined by a mean pulmonary artery pressure (mPAP) > 20 mmHg and pulmonary vascular resistance (PVR) > 3 Wood units (WU) with pulmonary artery wedge pressure ≤ 15 mmHg based on 2018 revised 6th World Symposium on Pulmonary Hypertension definition [4].
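The cohort definitions above can be expressed as two simple predicates. The following is a minimal sketch only, assuming that all four echocardiographic criteria for cohort 1 must hold simultaneously; the class and function names (`EchoScreen`, `Hemodynamics`, `eligible_cohort1`, `meets_pah_definition`) are hypothetical and were not part of the study protocol.

```python
from dataclasses import dataclass


@dataclass
class EchoScreen:
    """Resting 2DE values used to screen for cohort 1 (hypothetical container)."""
    rvsp_mmhg: float
    tapse_cm: float
    fac_pct: float
    s_prime_cm_s: float


@dataclass
class Hemodynamics:
    """Right heart catheterization values (hypothetical container)."""
    mpap_mmhg: float
    pvr_wu: float
    pawp_mmhg: float


def eligible_cohort1(e: EchoScreen) -> bool:
    # No resting PH or RV dysfunction: every threshold must be satisfied
    # (assumption: the criteria in the text are conjunctive).
    return (
        e.rvsp_mmhg < 35
        and e.tapse_cm >= 1.6
        and e.fac_pct >= 35
        and e.s_prime_cm_s >= 9.5
    )


def meets_pah_definition(h: Hemodynamics) -> bool:
    # Hemodynamic definition quoted in the text:
    # mPAP > 20 mmHg, PVR > 3 WU, wedge pressure <= 15 mmHg.
    return h.mpap_mmhg > 20 and h.pvr_wu > 3 and h.pawp_mmhg <= 15
```

Cohort 2 additionally required clinical stability on PAH-directed therapy, which this sketch does not model.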

Echocardiographic acquisition and measurements

Echocardiograms were performed using a Canon Artida ultrasound machine (Canon Healthcare, Tustin, CA) with subjects in the left lateral decubitus position; images were acquired at 70–90 frames per second at end-expiration. 2DE-directed methods were used to obtain linear and volumetric measurements of the RV chamber in accordance with American Society of Echocardiography (ASE) guidelines [19]. Right atrial area (RAA) was estimated using volumetric area from the apical 4-chamber view. RV function was assessed using TAPSE, tissue Doppler S’ velocity of the lateral tricuspid annulus, and FAC [20]. Tricuspid regurgitant (TR) velocity was used to estimate RVSP and PASP using the modified Bernoulli equation and adding estimated RA pressure based on inferior vena cava dimension and collapsibility with sniff [21, 22]. Right ventricular-arterial coupling was estimated by the ratio of TAPSE to PASP (TAPSE:PASP) [23].
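For readers less familiar with the pressure estimate, the calculation can be sketched as follows. This assumes the standard form of the modified Bernoulli equation (pressure gradient in mmHg equal to 4 times the square of the peak TR velocity in m/s) and takes the estimated RA pressure as an input rather than deriving it from inferior vena cava findings. The function names are illustrative; TAPSE is entered in mm so that the ratio is in mm/mmHg, the unit reported in the Results.

```python
def rvsp_mmhg(tr_peak_velocity_m_s: float, ra_pressure_mmhg: float) -> float:
    """Estimate RVSP as the TR pressure gradient plus estimated RA pressure."""
    # Modified Bernoulli equation: gradient (mmHg) = 4 * v^2, with v in m/s.
    gradient = 4.0 * tr_peak_velocity_m_s ** 2
    return gradient + ra_pressure_mmhg


def tapse_pasp_ratio(tapse_mm: float, pasp_mmhg: float) -> float:
    """TAPSE:PASP ratio in mm/mmHg, a noninvasive estimate of RV-arterial coupling."""
    return tapse_mm / pasp_mmhg


# Illustrative values only (not study data).
example_rvsp = rvsp_mmhg(3.0, 5.0)               # 4 * 3.0^2 + 5 = 41 mmHg
example_ratio = tapse_pasp_ratio(20.0, example_rvsp)
```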

2DE measures were obtained by a single certified cardiac sonographer at two time points, evaluation A and evaluation B, separated by 1 h in a semi-fasting state to limit biologic variability. Two echocardiographers, blinded to patient information, timing (i.e., before/after the fixed one-hour interval), and clinical variables, performed 2DE analysis using Synapse Cardiovascular Software (FUJIFILM Medical Systems, V4.0.8, USA).

STE-based longitudinal systolic strain analysis of the RV free wall was performed using commercially available strain software (Epsilon EchoInsight Version 3.1.0.3358, Milwaukee, WI). From the 4-chamber apical view, peak systolic longitudinal strain of the RV free wall was obtained by tracing endocardial borders in end-systolic still frames and manually adjusted to ensure adequate border delineation and segment thickness [24]. Peak longitudinal systolic strain was defined as the difference in shortening from the region of interest relative to original length, and expressed as a negative percentage. Global RVLSS was defined as the average of regional strain from the basal, midventricular, and apical RV free wall segments.
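The strain definitions above reduce to simple arithmetic, sketched below. The segment lengths and strain values are illustrative rather than study data, the function names are hypothetical, and the commercial software's internal tracking algorithm is not represented.

```python
def longitudinal_strain_pct(baseline_length: float, systolic_length: float) -> float:
    """Strain as the change in length relative to the original length, in percent.

    Systolic shortening yields a negative value.
    """
    return (systolic_length - baseline_length) / baseline_length * 100.0


def global_rvlss(basal_pct: float, mid_pct: float, apical_pct: float) -> float:
    """Global RV free wall strain as the mean of the three regional values."""
    return (basal_pct + mid_pct + apical_pct) / 3.0


# Illustrative values only: a segment shortening from 10 to 8 length units.
segment_strain = longitudinal_strain_pct(10.0, 8.0)   # about -20%
global_strain = global_rvlss(-24.0, -21.0, -18.0)     # -21%
```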

Analytic approach

Categorical variables were expressed as absolute number and percentage. Continuous variables were expressed as mean ± standard deviation (SD) if normally distributed by the Shapiro-Wilk test and as median (interquartile range, IQR) if not normally distributed. To assess statistical differences in all echo measures between evaluation A and evaluation B, within disease group (cohort 1 and cohort 2), and overall, across both cohorts, ANOVA tests for normally distributed variables and Friedman tests for non-normally distributed variables were performed.

Comparisons between cohort 1 and cohort 2 were performed with independent sample t-tests or Mann-Whitney-Wilcoxon rank sum test as appropriate. Fisher exact test was used to analyze differences in categorical variables between cohorts. A P-value < 0.05 was considered as statistically significant.

All examinations were analyzed twice on two different days by the same physician (MM) blinded to cohort and timing of the echocardiogram to determine intraobserver agreement. Examinations were then analyzed by a second reader (VM), also blinded to cohort, timing, and previous interpretations to determine interobserver agreement.

Repeatability, defined as the assessment of repeated measures on the same patient by the same operator on the same device under ideal conditions, was determined by calculating the SD for each patient between measurement A and B with a coefficient of variation (CV) (defined as SD divided by mean value, expressed as a percentage) [25]. Bland-Altman analysis was performed to assess for the level of agreement in measures between evaluation A and evaluation B and exclude the presence of proportional bias between approaches [26]. Reproducibility, defined as variations in measurements made on a subject under changing conditions, was assessed by intraclass correlation coefficient (ICC) for interobserver reproducibility both within disease group (cohort 1 and cohort 2) and in the overall cohort [25]. Reliability, defined as the degree to which the variability of the measurement compares to the inherent or true variability between subjects, was assessed by ICC for intraobserver agreement both within disease group (cohort 1 and cohort 2) and in the overall cohort.
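The repeatability and agreement calculations can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the per-patient SD is taken as the sample SD of the two evaluations, the Bland-Altman limits use the conventional mean difference ± 1.96 × SD of the differences, and proportional bias is summarized by the unstandardized slope from a least-squares regression of the differences on the pairwise means. All names and values are hypothetical.

```python
import statistics


def within_subject_sd_cv(eval_a: float, eval_b: float) -> tuple[float, float]:
    """Per-patient SD of the two evaluations and CV (SD / mean, in percent)."""
    values = [eval_a, eval_b]
    sd = statistics.stdev(values)
    return sd, sd / statistics.mean(values) * 100.0


def bland_altman(eval_a: list[float], eval_b: list[float]) -> dict[str, float]:
    """Mean difference, limits of agreement, and proportional-bias slope."""
    diffs = [a - b for a, b in zip(eval_a, eval_b)]
    means = [(a + b) / 2.0 for a, b in zip(eval_a, eval_b)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    # Least-squares slope of differences regressed on pairwise means.
    mean_of_means = statistics.mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd_diff,
        "upper": bias + 1.96 * sd_diff,
        "slope": sxy / sxx,
    }


# Illustrative TAPSE values in cm (not study data).
tapse_a = [2.0, 1.8, 2.4, 1.6]
tapse_b = [2.1, 1.7, 2.3, 1.7]
agreement = bland_altman(tapse_a, tapse_b)
```

A slope close to zero with a non-significant P-value would be read as no proportional bias; the significance test itself is omitted from this sketch.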

MDD, defined as the minimal change in a measurement that is greater than the within subject variability and measurement error, was calculated for individual subjects between the two different observations. MDD was calculated as standard error of the measurement (SEM) × 1.96 and is reported by cohort [27]. SEM was calculated as the SD of the differences between the two observations for all participants divided by the square root of the sample size. Pearson’s correlation coefficient was used to analyse the relationship between TAPSE and other 2DE-based measures of RV function after normality of data distribution was assessed. Statistical analysis was performed using the SPSS statistical package (SPSS Inc., Chicago, IL, USA, version 20).
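The MDD calculation as described above can be sketched in a few lines. This follows the formula exactly as stated in the text (SEM as the SD of the paired differences divided by the square root of the sample size, and MDD as 1.96 × SEM). The function name and input values are illustrative, and other definitions of SEM and MDD that appear in the literature are not implemented here.

```python
import math
import statistics


def minimal_detectable_difference(obs_1: list[float], obs_2: list[float]) -> float:
    """MDD = 1.96 * SEM, with SEM = SD(differences) / sqrt(n), as stated in the text."""
    diffs = [a - b for a, b in zip(obs_1, obs_2)]
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return 1.96 * sem


# Illustrative paired TAPSE observations in cm (not study data).
first_obs = [2.0, 1.8, 2.4, 1.6]
second_obs = [2.1, 1.7, 2.3, 1.7]
mdd_tapse = minimal_detectable_difference(first_obs, second_obs)
```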

Results

Patient population

A total of 20 patients were included (Table 1). Cohort 1 consisted of 10 SSc patients without PAH who were, on average, 60.9 ± 8.0 years of age, and mostly women (80%). Cohort 2 consisted of 10 SSc-PAH patients who were mostly women (80%) and on average 61.7 ± 8.9 years of age with WHO functional class 2 symptoms. Across groups, most SSc patients in our pooled cohort had the limited disease subtype. Further details of SSc-defining characteristics are shown in Table 1. Hemodynamics from the SSc with PAH group were consistent with PAH of moderate severity. Of note, several patients in the SSc without PAH cohort received PAH-specific therapy for non-PAH indications: ERA and prostacyclin analogs for management of Raynaud’s phenomenon and digital ulcers; PDE5I for erectile dysfunction.

Table 1 Clinical and hemodynamic characteristics of the study population

Echocardiographic measures

All measures were obtained and available for interpretation and analysis, apart from one patient from cohort 2 with an inadequate TR Doppler signal. Severity of TR differed between cohorts, with 10% of cohort 1 with moderate or severe TR compared to 50% of cohort 2. Conventional 2DE and STE-derived data are described in Table 2. The ratio of TAPSE to PASP, a noninvasive marker of RV contractile response to load, is also reported [23].

Table 2 Conventional echocardiographic and speckle-tracking derived measures of the study population

Repeatability and agreement

After determining normal distribution for both cohort 1 (SSc without known PAH) and cohort 2 (SSc-PAH), ANOVA was performed to compare measures between evaluation A and evaluation B for each subject within each cohort and demonstrated no significant differences in echocardiographic parameters of RV morphology and function between evaluations in both groups, with the exception of midventricular RVOT diameter, RVOT VTI, TR peak velocity, and RVSP (P < 0.05) as shown in Table 2.

Table 3 details the SD and CV of repeated measures between evaluations within the same subject. SD of repeated measures were the lowest for TAPSE, FAC, and tissue Doppler S’, especially for cohort 1. SD of repeated measures were also low for RVOT VTI and global RVLSS but were lower for subjects in cohort 2 compared to cohort 1, suggesting less variability of these measures in the PAH cohort. SD of repeated measures between evaluations were the highest for RVSP regardless of cohort (14.3; 5.3; 24.2 for total SSc population, cohort 1, and cohort 2, respectively). As also shown in Table 3, CV was the lowest for TAPSE (0.9%; 0.6%; 1.1% for total SSc population, cohort 1, and cohort 2, respectively), TAPSE:PASP (1.7%; 2.2%; 0.6%), S’ wave (3.2%; 2.5%; 4.2%), RVOT VTI (6.0%; 6.9%; 5.0%), and global RVLSS (9.7%; 11.7%; 6.6%), while FAC (21.3%; 17.0%; 27.1%) and RVSP (38.0%; 19.6%; 48.6%) showed the highest values.

Table 3 Standard deviation and coefficient of variation of repeated echocardiographic measures by PAH status

Bland-Altman analysis for agreement by reader revealed no significant proportional bias for TAPSE, FAC, and global RVLSS (Fig. 1).

Fig. 1

Bland-Altman analysis of agreement between Evaluation A and Evaluation B for tricuspid annular plane systolic excursion (TAPSE), fractional area change (FAC, %) and global right ventricular longitudinal strain (RVLSS). The black line represents the mean of the differences between Evaluation A and Evaluation B. The grey dashed lines represent the 95% confidence interval (CI). A: The black line represents the mean of the differences between TAPSE at Evaluation A and Evaluation B. The grey dashed lines represent the 95% CI (0.51071 and -0.57771 respectively). Unstandardized beta coefficient -0.282, P=0.06 (no proportional bias). B: The black line represents the mean of the differences between FAC at Evaluation A and Evaluation B. The grey dashed lines represent the 95% CI (14.7392 and -11.3522 respectively). Unstandardized beta coefficient -0.139, P=0.316 (no proportional bias); C: The black line represents the mean of the differences between global RVLSS at Evaluation A and Evaluation B. The grey dashed lines represent the 95% CI (6.10929 and -4.10929 respectively). Unstandardized beta coefficient -0.110, P=0.348 (no proportional bias)

Reliability

Reliability as assessed by intra-observer variability was excellent across the total population (Table 4, panel A). In cohort 1, ICC was excellent, defined as ICC > 0.9, for FAC (0.930; 95% CI 0.816–0.973), tissue Doppler S’ velocities (0.975; 95% CI 0.937–0.990), global RVLSS (0.967; 95% CI 0.917–0.987), and RVSP (0.950; 95% CI 0.950–0.992) [28]. ICC was good, defined as ICC between 0.75 and 0.9, for TAPSE and RVOT VTI [28]. In cohort 2, ICC was excellent for TAPSE (0.970; 95% CI 0.926–0.988), FAC (0.952; 95% CI 0.880–0.981), RVOT VTI (0.963; 95% CI 0.900–0.986), global RVLSS (0.950; 95% CI 0.775–0.990), and RVSP (0.993; 95% CI 0.983–0.997) and good for tissue Doppler S’ velocity. ICC for TAPSE to PASP was not determined since it is a derived measure.

Table 4 Intra- and inter-observer variability

Reproducibility

Reproducibility as assessed by inter-observer variability was excellent across the pooled cohort for TAPSE, tissue Doppler S’ velocity, global RVLSS, and RVSP and good for both FAC and RVOT VTI (Table 4, panel B). By group, ICC was excellent for RVSP across cohort 1 (0.965; 95% CI 0.912–0.986) and cohort 2 (0.986; 95% CI 0.966–0.995) and for TAPSE (0.940; 95% CI 0.848–0.976), tissue Doppler S’ (0.980; 95% CI 0.950–0.992), and global RVLSS (0.980; 95% CI 0.949–0.992) in cohort 2. The lowest interobserver ICC agreements were observed for TAPSE, FAC, and RVOT VTI in the SSc patients without PAH. ICC for TAPSE to PASP was not determined since it is a derived measure.

Minimal detectable difference

The MDD in the overall population was 0.11 cm for TAPSE, 2.9% for FAC, 1.3 cm/s for tissue Doppler S’ velocity, and 1.1% for global free wall RVLSS (Table 5). Notably, the absolute MDD values were higher for TAPSE, FAC, and RVSP in cohort 2 (SSc-PAH) compared to cohort 1 (SSc without PAH). MDD for the TAPSE:PASP ratio, an echo-derived estimation of ventriculo-arterial coupling, was 0.1 mm/mmHg across the pooled cohort and higher at 0.16 mm/mmHg in the SSc without PAH group compared to 0.11 mm/mmHg in the SSc with PAH group.

Table 5 Minimum detectable difference for each echocardiographic measure of right ventricular function

Discussion

In the present study, we sought to define the performance of echocardiographic measures of RV function in SSc patients with and without PAH. Under rigorous study conditions, we demonstrate high degrees of reproducibility, reliability, and repeatability for most measures. Bland-Altman analysis for agreement by reader revealed no significant proportional bias for TAPSE, FAC, and global RVLSS. Importantly, we define the MDD of RV functional measures for SSc patients with and without PAH. To our knowledge, this is the first study to establish key characteristics of echocardiographic measures of RV function. Furthermore, although our study included only SSc patients, our findings are likely applicable to other forms of PAH and thus have important implications for evaluation and management of this disease.

Echocardiography is integral in the screening for cardiopulmonary complications in SSc as well as serial monitoring of disease progression and treatment response in SSc-PAH [5]. However, despite its widespread use, no echocardiographic measures of RV function have been fully validated, as noted by the Expert Panel on Outcome Measures in PAH-SSc [14]. This lack of validation represents a major knowledge gap in the non-invasive assessment of RV function [29]. In the current study, we define the reproducibility, reliability, and repeatability of several echocardiographic measures of RV function to address this gap and to fulfil imaging standards defined as essential components for quality assurance and appropriate integration into study design and clinical trial analysis [30, 31]. We employed techniques to limit variability across various study aspects including technical components, uniformity of ultrasound equipment and analytical software, utilization of a single sonographer trained in acquisition of echocardiographic data for clinical trials, oversight of image acquisition and quality assurance, and analysis by two expert echocardiographers blinded to clinical variables and timing of 2DE examination to limit inter- and intra-observer variability [30]. In addition, echocardiograms were performed in a semi-fasting state within one hour to further limit biological variability. Thus, our study meets the imaging standards necessary to define measurement characteristics of RV function in echocardiography.

Our study demonstrates excellent repeatability as assessed by CV for most echocardiographic measures in both the SSc without and SSc with PAH groups. However, in the SSc-PAH group (cohort 2), both FAC (CV 27.08%) and RVSP (CV 48.61%) had CVs that exceeded commonly used thresholds for acceptable variability of 15–20%, suggesting poor repeatability of these measures [32]. Our data show good-to-excellent levels of reliability and reproducibility for all parameters, based on ICC values for inter- and intra-observer agreement, though greater variability in most measures was noted in SSc-PAH patients compared to SSc patients without PAH. Differences in the variability of non-invasive measures of RV function between healthy controls and PAH patients have previously been demonstrated in a study of cardiac magnetic resonance imaging [33] and may be explained by physiologic adaptations of increasing RV end-systolic and end-diastolic volumes to maintain stroke volume in response to increased RV afterload. This leads to increases in RV dilatation, which would necessarily cause increased variability in echo-based measures of RV function in PAH patients in whom RV afterload is elevated [34]. Interestingly, the lowest interobserver ICC agreements were observed for TAPSE, FAC, and RVOT VTI in the SSc patients without PAH, which may suggest decreased sensitivity of these non-invasive parameters at lower afterloads.

Prior studies in PAH populations (not solely comprised of SSc-PAH) have not routinely reported ICC as a measure of reproducibility or reliability; thus, direct comparison to these studies is challenging. Hinderliter et al. reported the reproducibility of select measures in a PAH population from a randomized controlled trial of epoprostenol by comparing the repeated interpretations of a selection of 17 baseline echocardiograms [35]. The difference (mean ± SE) between the two interpretations for echo-based measurements obtained in that study was: 1.4 ± 0.2 cm²/m for indexed RV end-diastolic area; 4.7 ± 0.1% for FAC; and 0.08 ± 0.01 m/s for peak TR velocity [31]. Similarly, Nath and colleagues reported the reproducibility of RV size and RV function by comparing interpretations of echocardiograms from 10 subjects who were randomly selected from a cohort study of PAH patients and found that the interobserver agreement was 80% for RV size and 70% for RV function [36]. Unfortunately, neither details regarding the metrics used to define RV size and RV function nor the method by which reproducibility was calculated were presented. Furthermore, as echocardiographic parameters that integrate the RV contractile response to pulmonary vascular load such as TAPSE to PASP have emerged as important predictors of PAH in SSc [23, 37, 38], it is increasingly important to define their repeatability, reproducibility, and reliability. To our knowledge, no other study has rigorously evaluated other aspects of test characteristics for echocardiographic measures of RV function.

A key and novel component of the present study is the identification of the MDD for echocardiographic RV functional measures. Although not equivalent to the minimum clinically important difference, an MDD represents a key reference point upon which the lower bound of clinically relevant changes can be estimated [39, 40]. The MDD provides a framework for understanding if an observed change in a measure is related to inherent variability of the measure or if it represents real change from baseline. To put our MDD estimates in a clinical context, we reviewed several clinical studies that focused on the role of echocardiography in screening for PAH in SSc and assessing response to pulmonary vasodilator therapy in SSc-PAH patients. In a single observational study of 277 SSc patients unselected for PAH in whom changes in echo-based RV measures were assessed over a median follow-up of 3.3 years, the investigators found that an average decline in TAPSE of 0.14 cm, decrease in FAC of 1%, and increase in RVSP of 6 mmHg was associated with increased mortality [5]. The changes in TAPSE and RVSP associated with clinical outcomes in this study exceed the MDD for these measures as estimated by our current study; however, the change in FAC is significantly lower than the estimated MDD found in the current study, suggesting that changes in this range (1%) are within the range of measurement error and do not represent true change. In studies of various populations of PAH patients examining improvements in RV functional measures with pulmonary vasodilator therapy, changes in TAPSE of 0.2–0.56 cm and changes in FAC of approximately 3% were observed [41,42,43]. In an open-label clinical trial of combination oral therapy for treatment-naïve SSc-PAH patients, we have previously shown improvement in TAPSE by 0.55 cm, FAC by 11.8%, global RVLSS by 4.8%, and TAPSE:PASP by 0.36 ± 0.24 over 36 weeks [13, 44]. The magnitude of the observed changes in each of these studies greatly exceeds the MDDs reported in the current study, thereby confirming these changes as potentially clinically relevant.
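The comparisons made in this paragraph amount to checking whether the magnitude of an observed change exceeds the MDD for that measure. A minimal sketch, using only the pooled-cohort MDD values and the published changes quoted in the text, is given below; the helper name `exceeds_mdd` and the dictionary layout are hypothetical, and exceeding the MDD indicates only that a change is larger than measurement error, not that it is clinically important.

```python
def exceeds_mdd(observed_change: float, mdd: float) -> bool:
    """True if the magnitude of the change is larger than the MDD."""
    return abs(observed_change) > mdd


# Pooled-cohort MDD values reported in the Results.
MDD = {"TAPSE_cm": 0.11, "FAC_pct": 2.9, "RVLSS_pct": 1.1}

# Changes quoted in the Discussion: the observational study [5]
# (declines entered as negative values) and the combination therapy trial [13, 44].
observational = {"TAPSE_cm": -0.14, "FAC_pct": -1.0}
therapy_trial = {"TAPSE_cm": 0.55, "FAC_pct": 11.8, "RVLSS_pct": 4.8}

observational_flags = {k: exceeds_mdd(v, MDD[k]) for k, v in observational.items()}
therapy_flags = {k: exceeds_mdd(v, MDD[k]) for k, v in therapy_trial.items()}
```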

There are several limitations to the present study. First, the relatively small sample size may influence the robustness of the reported measurement characteristics, which thus require confirmation in larger cohorts. However, we did perform bootstrap analyses of the MDD calculations to estimate a population MDD with confidence intervals. These analyses show that the bootstrap-estimated population MDD and our study sample MDD are similar and, importantly, that the confidence intervals are narrow and thus consistent with the presented estimates of MDD (Supplement Table 1). This suggests that a larger sample size would be unlikely to yield different results. Second, cohorts were frequency matched by age and gender; however, the predominance of women may impact the generalizability of our findings. Third, while we attempted to control for biovariability by conducting the study in a semi-fasting state at a fixed 1-h time interval, there may have been unanticipated biological factors that affected our findings. We also did not control for scleroderma disease duration and medications in our cross-sectional study design. Lastly, some measures of RV function, such as eccentricity index, were not included as part of our study protocol.

Conclusions

In conclusion, these data on the variability of echo-based measures of RV function in SSc patients are highly relevant to the use and interpretation of these measures. The MDD for these measures offers an essential framework upon which estimation of clinically relevant changes can be based to inform clinical decision-making, such as referral for RHC and escalation of therapies, not only in SSc but also in other forms of PAH. Further prospective studies are needed to establish the role of these echocardiographic measures in the management of SSc patients at risk for and with known PAH and for patients with other forms of PAH.