Introduction

Cardiac amyloidosis (CA) is an infiltrative disease of insoluble amyloid proteins in the myocardium. Acquired monoclonal immunoglobulin light-chain (AL) and transthyretin (TTR)-related diseases are the most frequent CA causes. Owing to these infiltrations, ventricular wall thickness and stiffness progress, and thereby CA mimics left ventricular hypertrophy (LVH). Thus, CA can be underdiagnosed in LVH patients [1]. Although CA has a poorer prognosis than other diseases with LVH, CA is currently pharmacologically treatable [2]. A definitive CA diagnosis requires proof of amyloid depositions in cardiomyocytes by endomyocardial biopsy, which may have fatal complications [3]. Bone scintigraphy, including technetium pyrophosphate (99mTc-PYP) scintigraphy, and cardiac magnetic resonance imaging (CMR) are useful for the non-invasive diagnosis of CA; however, they are costly and not available at all facilities [1]. Therefore, CA should be appropriately screened in LVH patients to determine those requiring further work-up.

Symptoms and physical findings are fundamental during CA screening. However, they require specialization, and their interobserver variations are large. Biomarkers, such as B-type natriuretic peptide (BNP) or troponin, are reproducible, widely used, and sensitive, but they are limited by their low specificity for CA [4, 5]. Electrocardiogram (ECG) and echocardiography are real-time diagnostic tools for providing differential diagnosis. Deformation parameters, including the relative apical sparing patterns of longitudinal strain (RASP), help diagnose CA with better precision than conventional parameters [6]. Accordingly, we hypothesized that the inclusion of deformation parameters into established diagnostic parameters would create a risk score for CA screening in LVH patients. We aimed to 1) investigate incremental benefits of echocardiographic parameters, including deformation parameters, over conventional diagnostic parameters for CA screening in patients with LVH; 2) determine the risk score for CA screening using all these variables; and 3) externally validate the score.

Methods

Study population

We retrospectively studied 323 consecutive LVH patients who underwent echocardiography and detailed work-up (biopsy, 99mTc-PYP scintigraphy, or CMR) at Ehime University Hospital or Uwajima City Hospital between June 2006 and 2019. LVH was defined as mean left ventricular (LV) wall thickness > 10 mm for men and > 9 mm for women on echocardiography [7]. We excluded patients with ischemic heart disease or severe aortic stenosis; thereafter, 295 patients were enrolled in the final analysis. This study was conducted according to the Declaration of Helsinki and approved by the ethics committee of Ehime University Graduate School of Medicine (IRB: 1905015); the informed consent process used the opt-out method on our hospital websites.

Clinical and electrocardiographic data

Clinical and electrocardiographic data at the closest time to echocardiography were collected by reviewing the medical chart (Supplemental Method 1 and 2).

Standard transthoracic echocardiography

Conventional echocardiographic parameters and the parameters that are relatively specific to CA were measured, based on the recommendations of the American Society of Echocardiography and several references (Supplemental Method 3).

Strain imaging

The global longitudinal strain (GLS), ejection fraction strain ratio, and left atrial (LA) strain were measured using offline speckle-tracking analysis (Supplemental Method 4).

RASP

Quantitatively assessed RASP (qRASP) was calculated by the previously reported formula: qRASP = [average apical LS]/[average basal LS + average mid LS] [6]. qRASP is consistent but requires offline calculation. Some concerns remain regarding the following: 1) dependency on the midventricular strain value, 2) offset of variation of the strain value based on the use of average values, 3) false positive results due to increased strain value of the entire left ventricle, and 4) no established threshold for assessment. Owing to these potential limitations of qRASP assessment, we recently introduced a semi-quantitative method of RASP assessment (sRASP) [8, 9]. Currently, GE and Philips have adopted a color range divided into eight equal parts from −20% (red) to 20% (blue) when the strain value is represented on a bull’s eye plot. sRASP was defined as reduction in LS of > −10% in ≥ 5 (of 6) basal segments relative to LS of < −15% in at least one apical segment.
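The two RASP definitions above can be illustrated with a minimal Python sketch. The function names, the segment lists, and the example strain values are hypothetical and are not taken from the study; the number of apical segments depends on the segmentation model used by the vendor software, so it is not enforced here.

```python
from statistics import mean


def q_rasp(basal, mid, apical):
    """Quantitative RASP: mean apical LS / (mean basal LS + mean mid LS)."""
    return mean(apical) / (mean(basal) + mean(mid))


def s_rasp(basal, apical):
    """Semi-quantitative RASP as defined in the text.

    Positive when at least 5 of the 6 basal segments show reduced strain
    (LS > -10%) and at least one apical segment is preserved (LS < -15%).
    """
    if len(basal) != 6:
        raise ValueError("expected six basal segments")
    reduced_basal = sum(1 for ls in basal if ls > -10.0)
    preserved_apex = any(ls < -15.0 for ls in apical)
    return reduced_basal >= 5 and preserved_apex


# Hypothetical segmental longitudinal strain values (%), for illustration only.
basal = [-6.0, -7.0, -5.0, -8.0, -9.0, -11.0]
mid = [-10.0, -11.0, -9.0, -12.0, -10.0, -8.0]
apical = [-18.0, -20.0, -16.0, -14.0]

print(round(q_rasp(basal, mid, apical), 2))
print(s_rasp(basal, apical))
```

Because sRASP relies only on threshold counts, it can be read directly from the color-coded bull's eye plot, whereas qRASP requires the segmental values to be averaged.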

Outcomes

The primary outcome was CA diagnosis by biopsy or 99mTc-PYP scintigraphy. Histological CA diagnosis was defined by positive Congo red staining with typical apple green birefringence in each specimen. In most histologically diagnosed CA patients, distinction between AL and TTR-associated amyloidosis (AL-CA and TTR-CA) was performed based on genotyping and/or immunohistochemistry. Patients with CA who showed amyloid infiltration by extra-cardiac biopsy had the diagnosis confirmed by ruling out other causes of LVH using clinical data, echocardiography, CMR, or 99mTc-PYP scintigraphy. Additionally, 99mTc-PYP scintigraphy is relatively specific for TTR-CA imaging [10]. Accordingly, 99mTc-PYP scintigraphy was scored using the following grading system: grade 0, no cardiac uptake; grade 1, mild uptake less than bone; grade 2, moderate uptake equal to bone; and grade 3, high uptake greater than bone [11]. Essentially, the non-invasive diagnosis of TTR-CA using 99mTc-PYP scintigraphy also requires a monoclonal protein assay [10], but since it was not available to all patients in this retrospective study, we expediently defined TTR-CA as cases measuring ≥2 on this score using 99mTc-PYP scintigraphy.

The secondary outcomes were all-cause death and admission for unexpected heart failure after the index echocardiographic examination. Medical records were used to conduct follow-up assessments. Patients were censored at the time of the outcome or at the end of follow-up (December 31, 2019).

Base model parameters for CA screening

From the aspect of external validity, age (≥65 [men], ≥70 [women]), low voltage on ECG, and LV posterior wall thickness (PWT) ≥14 mm were selected as conventional parameters comprising the base model, referring to previous reports on CA screening models [4, 12, 13, 14]. LV wall thickness is a fundamental characteristic of CA [4, 12, 13]. One study demonstrated that PWT was a more useful parameter than interventricular septal wall thickness; therefore, we used PWT [4]. Physical findings and biomarkers were not adopted as model candidates owing to the necessity of expertise and subunit variety.

Validation

A separate validation group of LVH patients undergoing detailed diagnostic tests (n = 242) between June 2006 and 2019 was obtained from three other centers (Kitaishikai Hospital, Ehime Prefectural Central Hospital, and Ehime Prefectural Imabari Hospital). After applying the same exclusion criteria as in the derivation cohort, 178 LVH patients were included.

Statistical analysis

Overall, < 5% of data in the derivation and validation cohorts were missing from patients’ records, except for BNP (8%), NT-proBNP (97%), troponin I (60%), troponin T (97%), serum albumin (6%), HbA1c (9%), total cholesterol (13%), PQ duration (8%), LA reservoir strain (11%), and LA booster strain (25%). Inter- and intraobserver variability of CA specific echocardiographic parameters was assessed using the kappa statistic and intraclass correlation coefficients. Measurements were performed in 30 randomly selected patients by one blinded sonographer and then repeated on more than 14 separate days by two blinded sonographers at Kitaishikai Hospital. The two readers used the same designated movies for assessing consistency. In the strain analysis, the variability included placing the region of interest in the automatically determined cardiac cycle by using a software program. In these 30 selected patients, the assessment time for sRASP and qRASP after constructing a bull’s eye plot for the evaluation of GLS were also measured by the two readers and then averaged.
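For a binary reading such as sRASP (present or absent), agreement between two readers can be summarized with Cohen's kappa. The following is a minimal sketch of that statistic only; the function name and the example readings are hypothetical, and the study's own analysis was performed in R.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical sRASP readings (1 = present, 0 = absent) from two readers.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(round(cohen_kappa(reader_1, reader_2), 3))
```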

Categorical variables were expressed as number of events and percentage. Continuous data were expressed as median and interquartile range and compared using Mann-Whitney U test, and categorical variables were compared using χ2 or Fisher’s exact tests.

Continuous echocardiographic variables were binarized with external cutoff points to avoid the best clinical scenario and construct a simple, general-purpose, easily implemented scoring system. Cut-off points of each parameter were as follows: PWT ≥14 mm [12], LV ejection fraction ≤55% [5], E/e’ > 12 [5], LA volume index ≥47 mL/m2 [15], anterior mitral valve leaflet thickness ≥ 5 mm [16], interatrial septal thickness ≥ 4 mm [17], right ventricular wall thickness ≥ 6 mm [18], GLS ≥ -16% [19], GLS ≥ -17% [5], ejection fraction strain ratio ≥ 3.9% [5], LA reservoir strain < 19% [17], qRASP > 0.87 [15], qRASP > 0.90 [5], and qRASP > 1.00 [6].

A receiver operating characteristic curve (ROC) was used to compare discriminative abilities between the base model and base model plus each echocardiographic parameter for identifying CA. The discrimination ability of each model was estimated as the area under the curve (AUC) using the probability model calculated from multivariable logistic regression for identifying CA. A comparison of AUCs was performed using the method of DeLong et al. [20]. The sensitivity and specificity at the maximal Youden index were measured.
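As an illustration of the two quantities used here, the sketch below computes an AUC from pairwise comparisons and finds the cut-off that maximizes the Youden index (sensitivity + specificity − 1). It does not implement the DeLong comparison, and the function names and example data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a case scores higher than a non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Ties between a case and a non-case count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden(scores, labels):
    """Return (J, threshold, sensitivity, specificity) at the maximal J.

    A case is called positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


# Hypothetical scores and CA labels (1 = CA), for illustration only.
scores = [0, 0, 0, 1, 1, 1, 2, 3, 3, 4]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(auc(scores, labels))
print(best_youden(scores, labels))
```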

The score parameters comprised four parameters, including three base model parameters and the categorical echocardiographic parameter with maximum discriminatory power. Multivariable logistic regression analysis was performed to assess associations between CA diagnosis and the score parameters. The parameter with the lowest regression coefficient among these four variables in the multivariable logistic model was assigned a numeric value of 1, and the other three variables were assigned scores based on values of their regression coefficients relative to those of the lowest value and rounded to the nearest integer. The score was derived by summing the assigned numeric scores. The developed score was validated in the validation sample. Additionally, discrimination ability of the developed score was compared with that of the conventional Rahman’s model comprising interventricular septal thickness > 1.98 cm and low voltage on ECG [13], because subjects, design, and outcome of their study were relatively similar to those of our study. Moreover, to validate the CA screening score more rigorously, discriminative ability of the score was evaluated in selected patients (i.e., biopsy-based patients and patients without atrial fibrillation). Furthermore, discrimination ability of the score for identifying CA subtypes (AL-CA and TTR-CA) was assessed in all enrolled patients.
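The point-assignment rule described above (smallest coefficient = 1 point; the others scaled relative to it and rounded) can be sketched as follows. The coefficient values are hypothetical placeholders, not the estimates reported in Table 3, and half-up rounding is an assumption because the text only states "rounded to the nearest integer".

```python
import math


def points_from_coefficients(coefficients):
    """Convert logistic regression coefficients into integer score points.

    The smallest coefficient is assigned 1 point; every other coefficient
    is divided by the smallest one and rounded to the nearest integer.
    """
    smallest = min(coefficients.values())
    if smallest <= 0:
        raise ValueError("all coefficients must be positive")
    # floor(x + 0.5) rounds halves upward (an assumption, see lead-in).
    return {
        name: math.floor(beta / smallest + 0.5)
        for name, beta in coefficients.items()
    }


# Hypothetical coefficients, for illustration only.
example = {"age": 1.2, "low_voltage": 1.5, "pwt_ge_14mm": 1.4, "srasp": 1.7}
print(points_from_coefficients(example))
```

When all coefficients are of similar magnitude, as in this made-up example, every parameter receives 1 point and the total score ranges from 0 to the number of parameters.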

Additionally, the association of the score with adverse events was assessed using univariable Cox proportional hazards models and Kaplan-Meier curves. No significant violations of assumption of proportional hazards were noted. Differences in survival between groups were assessed using the log-rank test.
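For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate can be sketched in a few lines. This is a simplified illustration with hypothetical follow-up data; it does not reproduce the Cox model or the log-rank test, and the function name is an assumption.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event occurred at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up times (years) and event indicators.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```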

Statistical significance was defined as p < 0.05. Statistical analysis was performed using the R statistical package ver. 3.5.3 (R Foundation for Statistical Computing, Vienna, Austria, available online at http://www.R-project.org).

Results

Outcomes and patient characteristics

Of the 295 LVH patients, 54 were diagnosed with CA. Among these, 48 and 6 patients had biopsy-proven and 99mTc-PYP scintigraphy-proven amyloidosis, respectively. Biopsies were obtained from the myocardium in 38 patients. Twenty-two (41%) and 23 (43%) patients were diagnosed with AL-CA and TTR-CA, respectively. CA type could not be identified in nine patients (17%) who were diagnosed before 2010. They were older and frail; therefore, there was no indication for active treatment at the time, and their CA type was not investigated. The etiology of LVH in the remaining 241 patients was hypertrophic cardiomyopathy (n = 120), hypertensive heart disease (n = 74), dilated cardiomyopathy (n = 16), cardiac sarcoidosis (n = 14), valvular heart disease (n = 12), LV non-compaction cardiomyopathy (n = 2), Fabry disease (n = 2), and mitochondrial cardiomyopathy (n = 1).

Table 1 summarizes the baseline clinical and echocardiographic parameters of the enrolled patients. CA patients were significantly older and had lower voltage, thicker LV wall, more deteriorated LV diastolic functional and strain imaging parameters, and higher frequency of RASP than those without CA.

Table 1 Baseline characteristics in patients with and without cardiac amyloidosis

Incremental benefits of echocardiographic parameters over the base model

The discriminative ability of the base screening model comprising age (≥65 [men], ≥70 [women]), low voltage on ECG, and PWT ≥14 mm for identifying CA was acceptable. Of the binarized echocardiographic parameters, only RASP showed an incremental benefit over the base model (Table 2). Additionally, we inspected the additive value of the continuous variables over the base model to confirm whether the cutoff value used for each echo parameter was appropriate. Of the continuous echocardiographic parameters, adding qRASP resulted in the largest discriminatory power (Supplemental Table 1).

Table 2 Incremental benefits of categorical echocardiographic parameters over the base model

Development of the CA screening score

Accordingly, we created the diagnostic CA screening model using three base model parameters plus RASP (Table 3). We selected sRASP of the categorical RASP parameters because it could be quickly assessed online at the patient’s bedside, and it demonstrated similar discriminatory ability to other categorized qRASPs [18]. All parameters were independently associated with CA. Each parameter was assigned 1 point based on its relative effect. A score was constructed by adding the numeric values of factors identified in each patient, and the score range was 0–4. The mean score was 0.8 ± 0.9. Using the ROC analysis to identify CA, the score showed optimal discriminative ability, significantly better than that of the conventional Rahman’s model (Fig. 1, left). A total score of ≥2 showed optimal sensitivity (66%), specificity (95%), positive predictive value (74%), and negative predictive value (92%). The prevalence of CA clearly increased as the sum of the screening score increased (Fig. 2, left).
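The score and the accuracy measures reported at the ≥2 cut-off can be sketched as follows. The parameter thresholds are those stated in the text; the function names, argument names, and the 2 × 2 counts in the example are hypothetical and are not study data.

```python
def screening_score(age, sex, low_voltage, pwt_mm, srasp):
    """CA screening score (0-4): one point per positive parameter.

    age         : years; cut-off is >=65 for men and >=70 for women
    sex         : "M" or "F"
    low_voltage : True if low voltage is present on ECG
    pwt_mm      : LV posterior wall thickness in mm (cut-off >=14)
    srasp       : True if semi-quantitative RASP is present
    """
    age_cutoff = 65 if sex == "M" else 70
    return (
        int(age >= age_cutoff)
        + int(bool(low_voltage))
        + int(pwt_mm >= 14)
        + int(bool(srasp))
    )


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical patient: 72-year-old man, low voltage, PWT 15 mm, sRASP present.
score = screening_score(72, "M", True, 15.0, True)
print(score, "-> further work-up suggested" if score >= 2 else "-> low score")

# Hypothetical 2 x 2 counts for a score >= 2, for illustration only.
print(diagnostic_metrics(tp=40, fp=10, fn=20, tn=130))
```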

Table 3 Multivariable logistic regression analysis of the base model parameters and semi-quantitative relative apical strain pattern for identifying cardiac amyloidosis (n = 287; cardiac amyloidosis = 53)
Fig. 1 Comparison of the screening score with Rahman’s model in the derivation (left) and validation groups (right). AUC, area under the curve; CI, confidence interval; PWT, posterior wall thickness; RASP, relative apical sparing pattern

Fig. 2 Association between the prevalence of cardiac amyloidosis and the screening score in the derivation (left) and validation groups (right)

Validation

Score accuracy was investigated in the validation cohort. Patients’ characteristics were similar to those of the derivation cohort (Supplemental Table 2). Of 178 LVH patients, 56 were diagnosed with CA, including 26 and 30 patients with biopsy-proven and 99mTc-PYP scintigraphy-proven amyloidosis, respectively. Eleven patients (20%) were diagnosed with AL-CA and 44 (78%) with TTR-CA. CA type could not be determined in one patient (2%). The mean score of this cohort was 1.1 ± 1.0. In the validation cohort as well, the score showed optimal discriminative ability, significantly better than that of Rahman’s model (Fig. 1, right). Similarly, a total score of ≥2 showed optimal sensitivity (71%), specificity (93%), positive predictive value (77%), and negative predictive value (85%). Furthermore, the positive relationship between CA prevalence and score was similar to that in the derivation cohort (Fig. 2, right).

Discriminative ability of the score incorporating binarized qRASP instead of sRASP for identifying CA

The measurement method of sRASP has not been fully validated compared to that of qRASP. Therefore, we inspected the discriminative ability of the score incorporating binarized qRASP, instead of sRASP. In this situation, qRASP > 0.90 was chosen because it showed greater additive value over the base model in the derivation cohort, rather than qRASP > 1.00 and > 0.87. qRASP > 0.90 was assigned 1 point and was used as a score component instead of sRASP. The score incorporating qRASP > 0.90 exhibited the same discrimination ability as that incorporating sRASP in both cohorts (Supplemental Fig. 1).

Discriminative ability of the score for identifying CA in selected patients

Analyses in selected patients with a definitive histological diagnosis or in patients without atrial fibrillation, who usually provide more accurate echo results, may contribute to validation of the present results. We therefore investigated the discriminative ability of the score in biopsy-proven patients (n = 204) and patients without atrial fibrillation (n = 336) who were successfully assessed using the score. Of these selected patients, CA was diagnosed in 74 and 68 patients, respectively. Even in these patients, the AUC of the score was almost equivalent to that in all enrolled patients and significantly better than that of Rahman’s model (Supplemental Fig. 2).

Discriminative ability of the score for identifying CA subtypes

The histological feature of cardiac involvement in TTR-CA and AL-CA is different [21]. In all enrolled patients with successful score assessment (n = 461), we performed ROC analysis to assess the discriminative ability of the score for identifying CA subtypes. The score discriminated AL-CA (n = 33) more accurately than Rahman’s model, but its discriminative ability was modest (Fig. 3, left). For this discrimination, a total score of ≥2 showed optimal sensitivity (46%) and specificity (82%). However, the score still accurately discriminated TTR-CA (n = 67) (Fig. 3, right). For this discrimination, a total score of ≥2 showed optimal sensitivity (80%) and specificity (90%).

Fig. 3 Comparison of the screening score with Rahman’s model for identifying cardiac amyloidosis subtypes in enrolled patients. AUC, area under the curve; AL-CA, light-chain types of cardiac amyloidosis; CI, confidence interval; PWT, posterior wall thickness; RASP, relative apical sparing pattern; TTR-CA, transthyretin types of cardiac amyloidosis

Predictive ability of adverse events with the score

Of patients with the score and follow-up data (n = 456; median follow-up: 2.6 years, IQR: 0.8–5.8 years), 27 (6%) suffered all-cause death, 79 (17%) presented with admission for unexpected heart failure, and 106 (23%) experienced the composite of these events. The score was significantly associated with the adverse outcome (hazard ratio, 2.12; 95% confidence interval, 1.74–2.59; p < 0.01). In the Kaplan-Meier curves, the incidence of adverse outcomes significantly increased as the score increased (log-rank test, p < 0.01) (Fig. 4).

Fig. 4 Kaplan-Meier estimates of time to the occurrence of adverse events according to the screening score

Reproducibility

Reproducibility data are summarized in Supplemental Table 3. GLS, qRASP, and sRASP demonstrated excellent consistency. After constructing a bull’s eye plot for the evaluation of GLS, the averaged assessment times for sRASP were significantly shorter than those for qRASP (1 ± 2 s vs. 63 ± 6 s, p < 0.01).

Discussion

In this study, we investigated the incremental benefits of echocardiographic deformation parameters versus established parameters for CA screening, determined the resultant risk score for CA screening, and externally validated the score in LVH patients. We developed a risk score, comprising four parameters (age, low voltage in electrocardiography, PWT ≥14 mm, and RASP), which has potential utility in the risk stratification and management of LVH patients.

Strength of the present CA screening score

According to an expert consensus, the first screening tests in suspected CA patients are symptoms, ECG, echocardiography, CMR, and biomarkers [22]. However, symptoms are highly subjective, and CMR is relatively expensive and not widely accessible. Biomarkers are reproducible, widely used and sensitive, but their limitation is a low specificity for CA. Conversely, ECG and echocardiography are commonly used in various clinical settings, and their parameters are highly reproducible. Therefore, referring to previous consensus reports describing CA screening models, we selected the three indices (age, low voltage, and PWT) with high reproducibility and versatility as basic model parameters for CA screening [4, 12, 13, 14]. The high discriminatory power of the base model (AUC in the derivation cohort: 0.82) may indicate that the parameter selection was relatively appropriate.

Several strain indices (RASP, ejection fraction strain ratio, and LA strain) are more useful in CA screening than conventional indices [6, 15, 16]. Here, the incremental benefit of several binary echocardiographic indicators over the base model in CA identification was analyzed. RASP showed the most additional value, and the result was the same even using continuous variables. Accordingly, our CA screening score comprised base model parameters and RASP. The discrimination ability of the score was significantly better than that of the conventional Rahman’s model, and it was well-validated even in the selected cases, such as biopsied patients and patients without atrial fibrillation. This finding may indicate that the score would be highly versatile in clinical practice. Additionally, an increased score was significantly associated with a poorer prognosis, likely reflected by the fact that CA has a poorer prognosis than other hypertrophic diseases [23]. This result may demonstrate the validity of the prediction accuracy of the present score.

Recently, Boldrini et al. reported a multiparametric echocardiographic score for the diagnosis of CA in patients with LVH in an international cohort study [24]. Their score adopted longitudinal strain and the systolic apex-to-base ratio, which is similar to RASP, and demonstrated an excellent CA discrimination ability; this might support the usefulness of our score using RASP. We incorporated semi-quantitatively assessed RASP into the score. This parameter has high reproducibility, does not require offline analysis, and can be evaluated simultaneously with GLS measurement [8]. This would make the present score practical in clinical settings. Nonetheless, sRASP has not been fully validated yet. However, even when the binary RASP obtained using the conventional quantitative method was incorporated into the score instead of sRASP, the discrimination ability of the score was similar to that of the score incorporating sRASP. Thus, it may be feasible to adopt the binary variable of qRASP as a component of the score, instead of sRASP, especially when evaluating RASP on non-General Electric machines.

Differences in discriminative abilities for identifying CA subtypes

Although AL-CA and TTR-CA have different pathological conditions, it is difficult to distinguish these etiologies at an early stage in clinical practice [21]. Therefore, in the present study, we created a score to screen for both CA phenotypes to differentiate them from other causes of LV hypertrophy. Consequently, the present score was more suitable for TTR-CA than for AL-CA screening. There are a few possible reasons for this result. First, the model components of the score may have an impact. AL-CA is often diagnosed at a younger age than TTR-CA [1]. Moreover, the LV wall in AL-CA is generally thinner than that in TTR-CA [1]. Conversely, low voltage on electrocardiogram is more common in AL than in TTR, but the incidence of RASP seems similar between AL and TTR [1, 25]. Therefore, the combination of model components may have been more advantageous in screening TTR-CA than AL-CA. Second, this result may be related to the enrollment of patients with already increased LV wall thickness. Therefore, some AL-CA patients, who generally have thinner walls than those with TTR-CA [1], were not included, which may have led to selection bias.

Limitations

Our data should be interpreted while considering the limitations. First, biopsy data were not available for all cases. However, all patients were diagnosed with a detailed work-up by cardiology specialists. Also, the accuracy of the scoring system was consistent between overall and biopsied patients. Second, a monoclonal protein assay was not available to all patients. Therefore, in our sample, TTR-CA patients diagnosed by 99mTc-PYP scintigraphy might overlap with other diseases, especially AL-CA. However, the purpose of our score is to screen for all types of CA, and the discrimination ability of the score in sub-group analysis of only biopsy-proven cases was almost equivalent to that in all enrolled patients. Third, some laboratory data useful for CA screening (troponin T, NT-proBNP, and serum kappa/lambda free light chain ratio) were unavailable because the measurement facilities were limited. Consequently, the accuracy of the present score could not be compared with models using these laboratory data [4, 5]. Furthermore, the present score was more suitable for TTR-CA than AL-CA. Therefore, in order to identify AL-CA, it may be important to use the AL score by Boldrini et al. and serum free light chain assay in combination with our score [24]. Fourth, RASP was difficult to assess in 11 cases with arrhythmia or poor echocardiographic imaging (2%). However, a prospective assessment may reduce this rate. Fifth, strain analysis is not available at all facilities. Recently, Aimo et al. proposed a CA screening score that uses only relative wall thickness and E/e’ without strain analysis, which could be an alternative method in this situation [26]. Sixth, all echocardiographic measurements were calculated as the average value in three cardiac cycles. However, the measurements should be averaged over five cycles in some situations, for example, when atrial fibrillation is present. Seventh, the target of this study was patients with already increased LV wall thickness. Therefore, the validity of the present score in individuals with normal LV thickness is unknown. However, increased LV wall thickness is a major characteristic of CA, and the enrollment method of this study seems mostly relevant to a real clinical setting. Eighth, echocardiographic examinations were performed using three different ultrasound machines. The difference may partially affect the reproducibility of the data. Finally, this study was retrospective in design. The retrospective analysis had limitations with respect to potential confounders and risk for bias. Thus, larger multicenter prospective studies are warranted to confirm our results.

Conclusion

We developed a CA screening score incorporating RASP, which presents better accuracy than that of the conventional prediction model. This score can identify patients who require subsequent work-up, including biopsy and scintigraphy, and consequently facilitate early pharmacological intervention and improve their prognosis. However, symptoms and biomarkers are fundamental assessment methods to screen CA. The present score should be considered as an additional tool to the biohumoral assessment.