Introduction

Transcatheter aortic valve implantation (TAVI) is the treatment of choice for inoperable, a recommended alternative to surgery in high-risk, and a potential option in intermediate-risk patients with symptomatic severe aortic stenosis [1].

Paravalvular leak (PVL) is an important limitation of TAVI as compared to surgical valve replacement [2]. Proper annular sizing [3, 4] and the use of more efficient paravalvular sealing technologies [5–8] have led to a significant reduction in the incidence of greater than mild PVL. However, mild PVL is still a common complication of the second/third generation transcatheter aortic valves [5–8], and has been linked to worse prognosis [9]. Moreover, as TAVI extends to younger patients, more bicuspid anatomy and native valvular regurgitation will be encountered, increasing the potential risk of PVL [10, 11]. Furthermore, excellent paravalvular sealing (at least comparable to that of surgical bioprostheses) will be a prerequisite before lower risk patients can be offered TAVI as a recommended option.

Recently, PVL has been reported to regress up to 1 year compared with discharge after TAVI with the self-expanding CoreValve [12]. On the other hand, structural deterioration and new onset valve regurgitation are being increasingly reported [13], further emphasizing the importance of reproducible long-term surveillance.

Ironically, data on the incidence [14], the fate [12, 15, 16] and the consequences [9, 12, 17] of PVL tend to be inconsistent, reflecting, in part, the poor inter- and intra-technique reproducibility of PVL assessment.

We sought to investigate and propose an approach to improve the reproducibility of the echocardiographic assessment of PVL severity.

Methods

The study protocol was approved by the institutional review board and all patients provided written informed consent. The study consisted of three phases. In the first phase, 50 randomly-selected post-TAVI transthoracic 2D echocardiograms were independently analyzed by four cardiologists (BR, ES, MA and OS) of variable experience (5–19 years in echocardiography and 1–10 years in analysis of TAVI echocardiograms) blinded to patients’ clinical and procedural data. A summary (mean ± standard deviation) of the individual-observer measurements is provided in Table S1. In 35 echocardiograms, a reread by the same observer (BR) was performed at a median interval of 5 months to investigate intra-observer reproducibility. Eleven parameters of PVL severity (Table 1) were analyzed in accordance with the guidelines of the American Society of Echocardiography (ASE) and the European Association of Echocardiography (EAE) for the evaluation of native [18]/prosthetic [19] aortic regurgitation (AR). Regurgitation volume was calculated as the difference between the stroke volumes at the left and right ventricular outflow tracts, derived from left ventricular outflow tract diameter (LVOTd) and velocity time integral (VTILVOT) and right ventricular outflow tract diameter (RVOTd) and VTI (VTIRVOT).
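The regurgitation volume computation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a circular outflow tract cross-section (the usual continuity-equation convention) and uses the standard definition of regurgitation fraction (regurgitation volume divided by LVOT stroke volume), which the text does not spell out. The function names and the input values are hypothetical.

```python
import math

def stroke_volume_ml(diameter_cm: float, vti_cm: float) -> float:
    """Stroke volume (mL) = circular cross-sectional area (cm^2) x VTI (cm)."""
    area_cm2 = math.pi * (diameter_cm / 2.0) ** 2
    return area_cm2 * vti_cm

def regurgitation(lvot_d_cm, vti_lvot_cm, rvot_d_cm, vti_rvot_cm):
    """Return (regurgitation volume in mL, regurgitation fraction)."""
    sv_lvot = stroke_volume_ml(lvot_d_cm, vti_lvot_cm)
    sv_rvot = stroke_volume_ml(rvot_d_cm, vti_rvot_cm)
    volume = sv_lvot - sv_rvot      # difference of the two stroke volumes
    fraction = volume / sv_lvot     # assumed standard definition
    return volume, fraction

# Hypothetical measurements, for illustration only
rv, rf = regurgitation(lvot_d_cm=2.0, vti_lvot_cm=25.0,
                       rvot_d_cm=2.2, vti_rvot_cm=16.0)
print(f"regurgitation volume = {rv:.1f} mL, fraction = {rf:.2f}")
```

Because the diameter enters squared and the result is a difference of two computed quantities, small measurement differences in any of the four inputs carry through to the final value.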

Table 1 Echocardiographic parameters of PVL severity included in the reproducibility analysis

In the second phase, data on the inter- and intra-observer reproducibility of the individual parameters were used to generate a reproducible PVL grading scheme. Parameters with the best inter- and intra-observer agreement and the least variability were chosen.

In the third phase, PVL severity was graded by the four observers in the 50 echocardiograms using the tailored scheme. The latter combined several qualitative and semiquantitative parameters of PVL severity. The qualitative features were initially used to categorize patients into clear none-trace PVL, clear severe PVL or an intermediate category. For cases in the intermediate category, we used three semiquantitative parameters to allocate patients into one of four “granular” [20] sub-classes: mild, mild-to-moderate, moderate, and moderate-to-severe. The latter were then collapsed into two classes (mild and moderate), yielding a 4-class (none-trace, mild, moderate, and severe) final scale. We used the cut-points defined by the ASE/EAE guidelines [18], and experts’ consensus [21] and opinion [22]. In the first 15 studies, independent assessment by the four observers was routinely followed by a consensus grading to align the interpretation of qualitative parameters. More than 1-class disagreement (across the 6 subclasses) in the independent assessments occurred in only two cases. Those 15 cases were subsequently excluded from statistical analysis, which was confined to 35 independently-adjudicated cases. Echocardiographic studies varied in image quality but were adequate for grading of PVL (no transvalvular regurgitation was observed), using at least two parameters of severity.
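The collapse from the granular scale to the final 4-class scale, and the check for more than 1-class disagreement, can be sketched as below. This is illustrative only. The exact mapping of the intermediate sub-classes is an assumption (mild-to-moderate folded into mild, moderate-to-severe folded into moderate), since the text states only that four sub-classes were collapsed into two. The cut-points of Table 4 are not reproduced, and the example readings are hypothetical.

```python
# Ordered granular scale: two clear categories plus four intermediate sub-classes
GRANULAR = ["none-trace", "mild", "mild-to-moderate",
            "moderate", "moderate-to-severe", "severe"]

# ASSUMPTION: this mapping is not given in the text; it is one plausible collapse.
COLLAPSE = {
    "none-trace": "none-trace",
    "mild": "mild",
    "mild-to-moderate": "mild",
    "moderate": "moderate",
    "moderate-to-severe": "moderate",
    "severe": "severe",
}

def collapse(grade: str) -> str:
    """Map a granular grade onto the final 4-class scale."""
    return COLLAPSE[grade]

def max_class_disagreement(grades) -> int:
    """Largest gap, in granular classes, between any two observers."""
    ranks = [GRANULAR.index(g) for g in grades]
    return max(ranks) - min(ranks)

# Hypothetical independent readings of one study by four observers
readings = ["mild", "mild-to-moderate", "mild", "moderate"]
print([collapse(g) for g in readings])
print(max_class_disagreement(readings))
```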

Statistical analysis

Continuous variables are summarized as mean ± standard deviation (SD) and categorical variables as frequency/percentage of the studied group. Intra- and inter-observer agreement of numerical parameters was expressed as the intraclass correlation coefficient (ICC). For inter-observer ICC, pairwise comparisons of the four observers (6 comparisons) were averaged. The p value for the averaged ICC was determined according to the degrees of freedom (number of pairs). Intra- and inter-observer variability was expressed as a coefficient of variation (CV), calculated as the SD of the inter-/intra-observer differences divided by the population mean. For intra-observer rereads, differences were obtained by subtracting the second from the first observation. For inter-observer comparisons, differences were obtained by subtracting the average observation (ȳj) from the individual observation (yij). Differences among observers were plotted using the method proposed by Jones et al. [23] for graphical assessment of agreement with the mean between multiple observers. In this method, dij = yij − ȳj (y-axis) is plotted against ȳj (x-axis), where y refers to the measurements, ȳ refers to the mean measurement, i refers to observers and j refers to subjects (so ȳj is the mean of the measurements for subject j). The 95 % limits of agreement (95 % LOA) with the mean are estimated as ±1.96 × s, where s is an estimate of the SD of inter-observer differences (for the four observers) and is calculated as the square root of the variance of the differences.
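The inter-observer differences, limits of agreement with the mean and CV described above can be sketched in a few lines. This is a simplified illustration, not the SPSS procedure used in the study: s is taken here as the sample SD of all pooled differences, which is an assumption and may differ from the variance estimator of Jones et al. [23]. The function name and the example data are hypothetical.

```python
import numpy as np

def agreement_with_mean(y):
    """y: measurements with shape (observers, subjects).

    Returns the per-subject mean (x-axis), the differences d_ij (y-axis),
    the 95 % limits of agreement with the mean, and the CV.
    """
    y = np.asarray(y, dtype=float)
    subject_mean = y.mean(axis=0)      # mean over observers for each subject j
    d = y - subject_mean               # d_ij = y_ij - subject mean
    s = np.sqrt(d.var(ddof=1))         # ASSUMPTION: pooled sample SD of all d_ij
    loa = 1.96 * s
    cv = s / y.mean()                  # SD of differences / population mean
    return subject_mean, d, (-loa, loa), cv

# Hypothetical data: 4 observers (rows) x 3 subjects (columns)
y = [[10.0, 20.0, 31.0],
     [12.0, 22.0, 29.0],
     [14.0, 24.0, 30.0],
     [12.0, 22.0, 30.0]]
x_axis, d, (lower, upper), cv = agreement_with_mean(y)
print(x_axis, lower, upper, cv)
```

Plotting `d` against `x_axis`, with horizontal lines at `lower` and `upper`, gives the kind of display the text refers to as a modified Bland–Altman plot.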

Inter-observer and intra-observer agreement on categorical parameters and inter-observer agreement on the PVL grade were expressed as kappa coefficient (κ).
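For reference, the unweighted Cohen's kappa for one pair of observers can be computed as sketched below. The text does not state how kappa was obtained across the four observers, so applying this pairwise (and, for example, averaging over the six pairs) is an assumption. The function name and the example grades are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical grades from two observers on six studies
a = ["mild", "mild", "moderate", "none-trace", "mild", "severe"]
b = ["mild", "moderate", "moderate", "none-trace", "mild", "severe"]
kappa = cohen_kappa(a, b)
print(round(kappa, 3))
```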

Statistical analysis was performed with SPSS 23 (IBM, Armonk, NY, USA). All probability values were two-tailed, and a p value <0.05 was considered significant.

Results

Inter- and intra-observer ICC was high (0.73–0.99) and CV was low (0.01–0.47) for color Doppler parameters (except PVL short-axis area) and continuous-wave Doppler parameters (Tables 2, 3; Fig. 1).

Table 2 Indices of inter-observer variability and agreement for eleven parameters of PVL severity
Table 3 Indices of intra-observer variability and agreement for parameters of PVL severity
Fig. 1

Modified Bland–Altman plots of inter-observer (4 observers; A, B, C and D) variability and limits of agreement for PVL jet circumferential extent, breadth, short-axis area, pressure half time and velocity time integral and valve stent eccentricity. As visually displayed in the plots, absolute differences (between the individual measurements and the average of all measurements) tended to increase proportionately with increasing average of the measurements (on the X-axis). AR aortic regurgitation, CV coefficient of variation, LOA limit of agreement, PVL paravalvular leak, ROA regurgitant orifice area

Quantitative Doppler parameters, PVL short-axis area and valve stent eccentricity index had lower ICC and higher CV. For quantitative Doppler parameters, the inter-observer CV was generally low for the individual measurements including LVOTd (0.04), VTILVOT (0.04), RVOTd (0.08), VTIRVOT (0.07), and VTIAR (0.04). Variability, however, markedly increased when computations were applied to calculate LVOT stroke volume (CV = 0.16), RVOT stroke volume (CV = 0.25), effective regurgitant orifice area (CV = 0.54), regurgitation volume (CV = 0.67) and fraction (CV = 0.82) (Fig. 2). Kappa coefficient for aortic diastolic flow reversal was low for inter- (κ = 0.25) and intra-observer (κ = 0.5) comparisons (p > 0.05 for both).
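One way to see why variability grows with each computation is first-order error propagation for a difference of two quantities. The sketch below is illustrative only. It takes the stroke volume CVs reported above (0.16 and 0.25), but the mean stroke volumes are hypothetical and the two errors are assumed to be independent, so the output is not expected to reproduce the reported CV of regurgitation volume. The function name is also hypothetical.

```python
import math

def cv_of_difference(mean_a, cv_a, mean_b, cv_b):
    """CV of (a - b), assuming independent errors in a and b."""
    sd_a = cv_a * mean_a
    sd_b = cv_b * mean_b
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2)   # variances add for a difference
    return sd_diff / (mean_a - mean_b)           # divided by a much smaller mean

# ASSUMPTION: hypothetical mean stroke volumes (mL); CVs taken from the text
cv_rv = cv_of_difference(mean_a=80.0, cv_a=0.16, mean_b=65.0, cv_b=0.25)
print(f"illustrative CV of regurgitation volume = {cv_rv:.2f}")
```

The absolute errors of the two stroke volumes add, while the denominator shrinks to their small difference, so the relative variability of the derived quantity exceeds that of either input.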

Fig. 2

Modified Bland–Altman plots of inter-observer (4 observers; A, B, C and D) variability and limits of agreement of quantitative Doppler parameters of PVL severity. Variability increased (higher CV and wider 95 % LOA) as basic measurements were subjected to computations. CV coefficient of variation, LOA limit of agreement, LVOTd left ventricular outflow tract diameter, LVSV stroke volume at the left ventricular outflow tract, RF regurgitation fraction, RV regurgitation volume, RVOTd right ventricular outflow tract diameter, RVSV stroke volume at the right ventricular outflow tract, VTI velocity time integral

Based on the reproducibility of the individual parameters, the grading scheme (Table 4) was set up and combined six qualitative and three semiquantitative reproducible parameters.

Table 4 The final PVL grading scheme set-up after considering the reproducibility of the individual parameters

Table S2 shows the number of patients in each of the PVL grades as defined by the four observers using this scheme. Inter-observer grade agreement was achieved in 86 % of cases with a kappa coefficient of 0.79 (Table 5).

Table 5 Inter-observer agreement on PVL grade*

Discussion

The main findings of the present study are that: (1) color Doppler and continuous wave Doppler parameters are more reproducible than other parameters of PVL severity, especially those entailing complex computations (quantitative Doppler); and that (2) a simplified 2-step granular scheme combining reproducible qualitative and semiquantitative parameters improves the inter-observer reproducibility of PVL grading.

The reported rates of PVL in different TAVI trials and registries ranged from 40 to 67 % for trivial to mild and from 7 to 27 % for moderate to severe AR [14, 24]. In recently published data from a large series treated with a balloon-expandable valve, the incidence of moderate-severe PVL was reported to be 27 % [24], more than twofold the incidence reported in former clinical trials utilizing the same valve technology [2]. Those discrepancies are largely attributable to the low reproducibility of the currently used methods to quantitate PVL.

In a random sample from the PARTNER trial, a highly-confident grading of PVL was possible in 62 % of studies, while it was low/uninterpretable in 13 % [20]. In spite of applying different approaches (one that heavily weighs jet circumferential extent vs. a multiparametric multi-window approach) and schemes (condensed vs. granular classification), interobserver PVL grade agreement (39–61 %) and weighted kappa estimates (0.48–0.52) were modest [20].

Our approach was to first investigate the reproducibility of the individual parameters in order to set up a scheme that combines the most reproducible ones. To improve practicality, quick qualitative features were primarily used to broadly categorize patients. Afterwards, reproducible semiquantitative parameters were applied in a granular manner. The latter concept (granular classification) was previously shown to improve reproducibility of PVL grading and can easily be collapsed into the ordinary 4-class scheme [20]. The 4-class scheme is more familiar to clinicians and more aligned with other techniques (e.g. angiography and magnetic resonance imaging). This approach resulted in inter-observer agreement on the PVL grade in 86 % of cases, no disagreement of more than one grade, and a kappa statistic of 0.79, denoting excellent reproducibility [25].

Color and continuous-wave Doppler parameters showed favorable reproducibility, while aortic flow and quantitative Doppler parameters were less reproducible. Altiok et al. [26] reported intra- and inter-observer variability of 73.5 ± 52.2 and 108 ± 64.7 % for regurgitation volume and 75.2 ± 55.9 and 120.3 ± 62.3 % for regurgitation fraction of post-TAVI PVL. Notably, in the present study, the component basic measurements of quantitative Doppler criteria showed good reproducibility. Variability, however, was inflated as computations were applied and increased further as the computations became more complex (Fig. 2).

It is widely believed, with little supportive evidence, that the hemodynamics of post-TAVI AR differ from those of chronic native AR [27]. Accordingly, the use of Doppler parameters sensitive to hemodynamics (including continuous-wave Doppler [CWD] parameters) in the assessment of post-TAVI AR is subject to expert criticism. On the other hand, two arguments supporting the use of CWD are worth discussing. First, available data support the correlation between the invasively measured transvalvular diastolic pressure gradient and patients’ outcomes [28]. Second, an index that accounts for the hemodynamics on either side of the aortic valve (stiff aorta and small stiff ventricle) should more accurately reflect the hemodynamic significance of an AR jet. It is therefore more relevant to set up TAVI-specific cut-points of pressure half time as a hemodynamic index of AR severity than to preclude its use. Specific cut-points of severity (reflecting the different hemodynamics of PVL from chronic native AR) were thus adopted in the present analysis, but are yet to be further validated.

An interesting, counterintuitive finding of the present analysis is that intra-observer agreement and variability were very close to the inter-observer values for most parameters. Likewise, results were similar across the four observers despite their wide range of experience. Both findings indicate that the variability reported here is inherent to the parameters of interest, with minimal influence of the setting of analysis.

Limitations

All included echocardiographic studies involved a self-expanding transcatheter aortic valve. Although applicable to other valve types, the findings should be generalized with caution. The cut-points used in classifying the severity of PVL are inadequately validated in TAVI patients [21]. Accuracy of those parameters is, however, beyond the scope of the present study.

Conclusion

Reproducibility of PVL assessment by transthoracic echocardiography can be improved by using a simplified approach combining reproducible color and continuous wave Doppler parameters.